Paper Digest: NeurIPS 2024 Papers & Highlights
Note: NeurIPS-2024 accepted more than 4,500 papers; this page includes only 500 of them, selected by our daily paper digest ranking algorithm. To browse all accepted papers or learn more about the NeurIPS-2024 statistics, readers can read All 4,500 NeurIPS-2024 accepted papers on a separate page, which takes quite some time to load. On this page, readers are also able to filter papers by keyword. For example, using ‘related code’ as the filter keyword will produce a list of all papers with code available to download.
To search or review papers within NIPS-2024 related to a specific topic, please use the search by venue (NIPS-2024), review by venue (NIPS-2024) and question answering by venue (NIPS-2024) services. To browse papers by author, here is a list of all ~17,000 authors (NIPS-2024). You may also like to explore our “Best Paper” Digest (NeurIPS), which lists the most influential NeurIPS papers since 1987.
This list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that empowers you to write, review, get answers and more. Try us today and unlock the full potential of our services for free!
TABLE 1: Paper Digest: NeurIPS 2024 Papers & Highlights
# | Paper | Author(s)
---|---|---
1 | SGLang: Efficient Execution of Structured Language Model Programs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce SGLang, a system for efficient execution of complex language model programs. |
Lianmin Zheng; Liangsheng Yin; Zhiqiang Xie; Chuyue (Livia) Sun; Jeff Huang; Cody Hao Yu; Shiyi Cao; Christos Kozyrakis; Ion Stoica; Joseph Gonzalez; Clark Barrett; Ying Sheng; |
2 | You Don’t Need Data-Augmentations in Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we challenge the importance of invariance and data-augmentation in JEAs at scale. |
Théo Moutakanni; Maxime Oquab; Marc Szafraniec; Maria Vakalopoulou; Piotr Bojanowski; |
3 | The Mamba in The Llama: Distilling and Accelerating Hybrid Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent research suggests that state-space models (SSMs) like Mamba can be competitive with Transformer models for language modeling with advantageous deployment characteristics. Given the focus and expertise on training large-scale Transformer models, we consider the challenge of converting these pretrained models into SSMs for deployment. |
Junxiong Wang; Daniele Paliotta; Avner May; Alexander Rush; Tri Dao; |
4 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop three main techniques to speed up attention on Hopper GPUs: exploiting asynchrony of the Tensor Cores and TMA to (1) overlap overall computation and data movement via warp-specialization and (2) interleave block-wise matmul and softmax operations, and (3) block quantization and incoherent processing that leverages hardware support for FP8 low-precision. |
Jay Shah; Ganesh Bikshandi; Ying Zhang; Vijay Thakkar; Pradeep Ramani; Tri Dao; |
5 | Improving Alignment and Robustness with Short Circuiting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that short-circuits models as they respond with harmful outputs. |
Andy Zou; Long Phan; Justin Wang; Derek Duenas; Maxwell Lin; Maksym Andriushchenko; J. Zico Kolter; Matt Fredrikson; Dan Hendrycks; |
6 | Repurposing Language Models Into Embedding Models: Finding The Compute-Optimal Recipe Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study how to contrastively train text embedding models in a compute-optimal fashion, given a suite of pretrained decoder-only language models. |
Albert Q. Jiang; Alicja Ziarko; Bartosz Piotrowski; Wenda Li; Mateja Jamnik; Piotr Miłoś; |
7 | Multi-language Diversity Benefits Autoformalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we create mma, a large, flexible, multi-language, and multi-domain dataset of informal-formal pairs, by using a language model to translate in the reverse direction, that is, from formal mathematical statements into corresponding informal ones. |
Albert Q. Jiang; Wenda Li; Mateja Jamnik; |
8 | The FineWeb Datasets: Decanting The Web for The Finest Text Data at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce FineWeb, a 15-trillion token dataset derived from 96 Common Crawl snapshots that produces better-performing LLMs than other open pretraining datasets. |
Guilherme Penedo; Hynek Kydlíček; Loubna Ben allal; Anton Lozhkov; Margaret Mitchell; Colin Raffel; Leandro Von Werra; Thomas Wolf; |
9 | Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, we propose a natural bidirectional extension of the Mamba model Hydra, parameterized as a quasiseparable matrix mixer, which demonstrates superior performance over other sequence models including Transformers on non-causal tasks. |
Sukjun Hwang; Aakash Lahoti; Ratish Puduppully; Tri Dao; Albert Gu; |
10 | Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a method that is able to distill a pre-trained Transformer architecture into alternative architectures such as state space models (SSMs). |
Aviv Bick; Kevin Li; Eric Xing; J. Zico Kolter; Albert Gu; |
11 | Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we find that DAA methods deteriorate not only across a wide range of KL-budgets, but also often before even a single epoch of the dataset is completed. Through extensive empirical experimentation, this work formulates the reward over-optimization or hacking problem for DAAs and explores its consequences across objectives, training regimes, and model scales. |
Rafael Rafailov; Yaswanth Chittepu; Ryan Park; Harshit Sushil Sikchi; Joey Hejna; Brad Knox; Chelsea Finn; Scott Niekum; |
12 | MINT-1T: Scaling Open-Source Multimodal Data By 10x: A Multimodal Dataset with One Trillion Tokens Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, we introduce MINT-1T, the most extensive and diverse open-source Multimodal INTerleaved dataset to date. |
Anas Awadalla; Le Xue; Oscar Lo; Manli Shu; Hannah Lee; Etash Guha; Sheng Shen; Mohamed Awadalla; Silvio Savarese; Caiming Xiong; Ran Xu; Yejin Choi; Ludwig Schmidt; |
13 | QUEEN: QUantized Efficient ENcoding for Streaming Free-viewpoint Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel framework for QUantized and Efficient ENcoding (QUEEN) for streaming FVV using 3D Gaussian Splatting (3D-GS). |
Sharath Girish; Tianye Li; Amrita Mazumdar; Abhinav Shrivastava; david luebke; Shalini De Mello; |
14 | Yo’LLaVA: Your Personalized Language and Vision Assistant Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Similarly, when looking at a friend’s image, the interest lies in seeing their activities (e.g., *my friend* is holding a cat), rather than merely observing generic human actions (e.g., *a man* is holding a cat). In this paper, we introduce the novel task of personalizing LMMs, so that they can have conversations about a specific subject. |
Thao Nguyen; Haotian Liu; Yuheng Li; Mu Cai; Utkarsh Ojha; Yong Jae Lee; |
15 | Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce MEGALODON, a neural architecture for efficient sequence modeling with unlimited context length. |
Xuezhe Ma; Xiaomeng Yang; Wenhan Xiong; Beidi Chen; LILI YU; Hao Zhang; Jonathan May; Luke Zettlemoyer; Omer Levy; Chunting Zhou; |
16 | Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite its widespread use, the way preference-based learning is applied varies wildly, with differing data, learning algorithms, and evaluations used, making disentangling the impact of each aspect difficult. In this work, we identify four core aspects of preference-based learning: preference data, learning algorithm, reward model, and policy training prompts. We systematically investigate the impact of these components on downstream model performance and suggest a recipe for strong learning from preference feedback. |
Hamish Ivison; Yizhong Wang; Jiacheng Liu; Zeqiu Wu; Valentina Pyatkin; Nathan Lambert; Noah Smith; Yejin Choi; Hannaneh Hajishirzi; |
17 | ReVideo: Remake A Video with Motion and Content Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel attempt to Remake a Video (ReVideo) which stands out from existing methods by allowing precise video editing in specific areas through the specification of both content and motion. |
Chong Mou; Mingdeng Cao; Xintao Wang; Zhaoyang Zhang; Ying Shan; Jian Zhang; |
18 | LLM Circuit Analyses Are Consistent Across Training and Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we track how model mechanisms, operationalized as circuits, emerge and evolve across 300 billion tokens of training in decoder-only LLMs, in models ranging from 70 million to 2.8 billion parameters. |
Curt Tigges; Michael Hanna; Qinan Yu; Stella Biderman; |
19 | Stylus: Automatic Adapter Selection for Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Stylus, which efficiently selects and automatically composes task-specific adapters based on a prompt’s keywords. |
Michael Luo; Justin Wong; Brandon Trabucco; Yanping Huang; Joseph Gonzalez; zhifeng Chen; Ruslan Salakhutdinov; Ion Stoica; |
20 | VHELM: A Holistic Evaluation of Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, they differ in their evaluation procedures and the scope of the evaluation, making it difficult to compare models. To address these issues, we extend the HELM framework to VLMs to present the Holistic Evaluation of Vision Language Models (VHELM). |
Tony Lee; Haoqin Tu; Chi Heem Wong; Wenhao Zheng; Yiyang Zhou; Yifan Mai; Josselin Roberts; Michihiro Yasunaga; Huaxiu Yao; Cihang Xie; Percy Liang; |
21 | Neural Model Checking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a machine learning approach to model checking hardware designs. |
Mirco Giacobbe; Daniel Kroening; Abhinandan Pal; Michael Tautschnig; |
22 | Observational Scaling Laws and The Predictability of Language Model Performance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an alternative, *observational* approach that bypasses model training and instead builds scaling laws from ~80 publicly available models. |
Yangjun Ruan; Chris Maddison; Tatsunori Hashimoto; |
23 | LocCa: Visual Pretraining with Location-aware Captioners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This opens up the largely-unexplored potential of using natural language as a flexible and powerful interface for handling diverse pretraining tasks. In this paper, we demonstrate this with a novel visual pretraining paradigm, LocCa, that incorporates location-aware tasks into captioners to teach models to extract rich information from images. |
Bo Wan; Michael Tschannen; Yongqin Xian; Filip Pavetic; Ibrahim Alabdulmohsin; Xiao Wang; André Susano Pinto; Andreas Steiner; Lucas Beyer; Xiaohua Zhai; |
24 | Parameter-Inverted Image Pyramid Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, image pyramids process multiple resolutions of images using the same large-scale model, which requires significant computational cost. To overcome this issue, we propose a novel network architecture known as the Parameter-Inverted Image Pyramid Networks (PIIP). |
Xizhou Zhu; Xue Yang; Zhaokai Wang; Hao Li; Wenhan Dou; Junqi Ge; Lewei Lu; Yu Qiao; Jifeng Dai; |
25 | Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. |
Shengbang Tong; Ellis Brown; Lvhui Chen; Sanghyun Woo; Adithya Jairam Vedagiri IYER; Sai Charitha Akula; Shusheng Yang; Jihan Yang; Manoj Middepogu; Ziteng Wang; Xichen Pan; Rob Fergus; Yann LeCun; Saining Xie; |
26 | Chain-of-Thought Reasoning Without Prompting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our study takes a novel approach by asking: Can LLMs reason effectively without any prompting? |
Xuezhi Wang; Denny Zhou; |
27 | Privacy Backdoors: Enhancing Membership Inference Through Poisoning Pre-trained Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we unveil a new vulnerability: the privacy backdoor attack. |
Yuxin Wen; Leo Marchyok; Sanghyun Hong; Jonas Geiping; Tom Goldstein; Nicholas Carlini; |
28 | Humanoid Locomotion As Next Token Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language. |
Ilija Radosavovic; Bike Zhang; Baifeng Shi; Jathushan Rajasegaran; Sarthak Kamat; Trevor Darrell; Koushil Sreenath; Jitendra Malik; |
29 | Image2Struct: A Benchmark for Evaluating Vision-Language Models in Extracting Structured Information from Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce three tasks in the domain of web pages, LaTeX, and music and two new metrics that allow efficient and automatic comparison between a pair of images. |
Josselin Roberts; Tony Lee; Chi Heem Wong; Michihiro Yasunaga; Yifan Mai; Percy Liang; |
30 | LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite this progress, few public benchmarks are available to measure such development. To mitigate this gap, we introduce LongVideoBench, a question-answering benchmark that features video-language interleaved inputs up to an hour long. |
Haoning Wu; DONGXU LI; Bei Chen; Junnan Li; |
31 | MAmmoTH2: Scaling Instructions from The Web Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a paradigm to efficiently harvest 10 million naturally existing instruction data from the pre-training web corpus to enhance LLM reasoning. |
Xiang Yue; Tianyu Zheng; Ge Zhang; Wenhu Chen; |
32 | Learning-to-Cache: Accelerating Diffusion Transformer Via Layer Caching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we make an interesting and somewhat surprising observation: through introducing a caching mechanism, the computation of a large proportion of layers in the diffusion transformer can be readily removed, even without updating the model parameters. |
Xinyin Ma; Gongfan Fang; Michael Bi Mi; Xinchao Wang; |
33 | The Art of Saying No: Contextual Noncompliance in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should *not* comply with user requests. |
Faeze Brahman; Sachin Kumar; Vidhisha Balachandran; Pradeep Dasigi; Valentina Pyatkin; Abhilasha Ravichander; Sarah Wiegreffe; Nouha Dziri; Khyathi Chandu; Jack Hessel; Yulia Tsvetkov; Noah Smith; Yejin Choi; Hannaneh Hajishirzi; |
34 | Efficient LLM Jailbreak Via Adaptive Dense-to-sparse Constrained Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel token-level attack method, Adaptive Dense-to-Sparse Constrained Optimization (ADC), which has been shown to successfully jailbreak multiple open-source LLMs. |
Kai Hu; Weichen Yu; Tianjun Yao; Xiang Li; Wenhe Liu; Lijun Yu; Yining Li; Kai Chen; Zhiqiang Shen; Matt Fredrikson; |
35 | JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: And third, numerous works are not reproducible, as they withhold adversarial prompts, involve closed-source code, or rely on evolving proprietary APIs. To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as *jailbreak artifacts*; (2) a jailbreaking dataset comprising 100 behaviors—both original and sourced from prior work—which align with OpenAI’s usage policies; (3) a standardized evaluation framework at https://github.com/JailbreakBench/jailbreakbench that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard at https://jailbreakbench.github.io/ that tracks the performance of attacks and defenses for various LLMs. |
Patrick Chao; Edoardo Debenedetti; Alexander Robey; Maksym Andriushchenko; Francesco Croce; Vikash Sehwag; Edgar Dobriban; Nicolas Flammarion; George J. Pappas; Florian Tramer; Hamed Hassani; Eric Wong; |
36 | Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate an alternative approach involving multiple experts for denoising, and introduce Remix-DiT, a novel method designed to enhance output quality at a low cost. |
Gongfan Fang; Xinyin Ma; Xinchao Wang; |
37 | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces MaskLLM, a learnable pruning method that establishes Semi-structured (or “N:M”) Sparsity in LLMs, aimed at reducing computational overhead during inference. |
Gongfan Fang; Hongxu Yin; Saurav Muralidharan; Greg Heinrich; Jeff Pool; Jan Kautz; Pavlo Molchanov; Xinchao Wang; |
38 | Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These methods require efficient approximations, and although learning a network that directly predicts the desired output is a promising solution, training such models with exact labels is often infeasible. We therefore explore training amortized models with noisy labels, and we find that this is inexpensive and surprisingly effective. |
Ian Covert; Chanwoo Kim; Su-In Lee; James Zou; Tatsunori Hashimoto; |
39 | GenAI Arena: An Open Evaluation Platform for Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes an open platform, GenAI Arena, to evaluate different image and video generative models, where users can actively participate in evaluating these models. |
Dongfu Jiang; Max KU; Tianle Li; Yuansheng Ni; Shizhuo Sun; Rongqi Fan; Wenhu Chen; |
40 | TaskBench: Benchmarking Large Language Models for Task Automation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, there is a lack of systematic and standardized benchmarks to promote the development of LLMs in task automation. To address this, we introduce TaskBench to evaluate the capability of LLMs in task automation. |
Yongliang Shen; Kaitao Song; Xu Tan; Wenqi Zhang; Kan Ren; Siyu Yuan; Weiming Lu; Dongsheng Li; Yueting Zhuang; |
41 | SafeSora: Towards Safety Alignment of Text2Video Generation Via A Human Preference Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate the risk of harmful outputs from large vision models (LVMs), we introduce the *SafeSora* dataset to promote research on aligning text-to-video generation with human values. |
Josef Dai; Tianle Chen; Xuyao Wang; Ziran Yang; Taiye Chen; Jiaming Ji; Yaodong Yang; |
42 | What Matters When Building Vision-language Models? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the abundance of literature on this subject, we observe that critical decisions regarding the design of VLMs are often not justified. We argue that these unsupported decisions impede progress in the field by making it difficult to identify which choices improve model performance. |
Hugo Laurençon; Leo Tronchon; Matthieu Cord; Victor Sanh; |
43 | Rethinking Score Distillation As A Bridge Between Image Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, SDS has a number of characteristic artifacts that limit its utility in general-purpose applications. In this paper, we make progress toward understanding the behavior of SDS and its variants by viewing them as solving an optimal-cost transport path from some current source distribution to a target distribution. |
David McAllister; Songwei Ge; Jia-Bin Huang; David Jacobs; Alexei Efros; Aleksander Holynski; Angjoo Kanazawa; |
44 | TurboHopp: Accelerated Molecule Scaffold Hopping with Consistency Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the practical application of 3D-SBDD generative models is hampered by their slow processing speeds. To address this bottleneck, we introduce TurboHopp, an accelerated pocket-conditioned 3D scaffold hopping model that merges the strategic effectiveness of traditional scaffold hopping with rapid generation capabilities of consistency models. |
Kiwoong Yoo; Owen Oertell; Junhyun Lee; Sanghoon Lee; Jaewoo Kang; |
45 | Graph-based Uncertainty Metrics for Long-form Language Model Generations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent advancements in Large Language Models (LLMs) have significantly improved text generation capabilities, but these systems are still known to hallucinate, and granular uncertainty estimation for long-form LLM generations remains challenging. In this work, we propose Graph Uncertainty, which represents the relationship between LLM generations and claims within them as a bipartite graph and estimates the claim-level uncertainty with a family of graph centrality metrics. |
Mingjian Jiang; Yangjun Ruan; Prasanna Sattigeri; Salim Roukos; Tatsunori Hashimoto; |
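For readers who want a concrete feel for the bipartite construction in the entry above, here is a minimal sketch. It assumes sampled generations have already been decomposed into claims and matched to the generations that support them; the `supports` input and function names are hypothetical, and degree centrality stands in for the family of centrality metrics the paper studies.

```python
import networkx as nx

def claim_uncertainty(num_generations, supports):
    """Build a bipartite generation-claim graph and score each claim with
    a centrality metric (degree centrality as a simple stand-in)."""
    g = nx.Graph()
    gen_nodes = [f"gen{i}" for i in range(num_generations)]
    g.add_nodes_from(gen_nodes, bipartite=0)
    for gen_idx, claim in supports:  # (generation index, claim text) pairs
        g.add_node(claim, bipartite=1)
        g.add_edge(f"gen{gen_idx}", claim)
    centrality = nx.degree_centrality(g)
    # Claims supported by more sampled generations score higher,
    # i.e., they carry lower estimated uncertainty.
    return {n: centrality[n] for n, d in g.nodes(data=True) if d.get("bipartite") == 1}

scores = claim_uncertainty(3, [(0, "Paris is in France"),
                               (1, "Paris is in France"),
                               (2, "Paris is the largest EU city")])
```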
46 | Fractal Patterns May Illuminate The Success of Next-Token Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the fractal structure of language, aiming to provide a precise formalism for quantifying properties that may have been previously suspected but not formally shown. |
Ibrahim Alabdulmohsin; Vinh Tran; Mostafa Dehghani; |
47 | Can LLMs Implicitly Learn Numeric Parameter Constraints in Data Science APIs? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this assumption has not been rigorously studied in the literature. In this paper, we empirically investigate the proficiency of LLMs to handle these implicit numerical constraints when generating DS programs. |
Yinlin Deng; Chunqiu Steven Xia; Zhezhen Cao; Meiziniu Li; LINGMING ZHANG; |
48 | Iterative Reasoning Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoning steps that lead to the correct answer. |
Richard Yuanzhe Pang; Weizhe Yuan; He He; Kyunghyun Cho; Sainbayar Sukhbaatar; Jason Weston; |
49 | Geometric-Averaged Preference Optimization for Soft Preference Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihood in the loss function. |
Hiroki Furuta; Kuang-Huei Lee; Shixiang (Shane) Gu; Yutaka Matsuo; Aleksandra Faust; Heiga Zen; Izzeddin Gur; |
50 | Fully Transparent Self-Alignment for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SelfCodeAlign, the first fully transparent and permissive pipeline for self-aligning code LLMs without extensive human annotations or distillation. |
Yuxiang Wei; Federico Cassano; Jiawei Liu; Yifeng Ding; Naman Jain; Zachary Mueller; Harm de Vries; Leandro Von Werra; Arjun Guha; LINGMING ZHANG; |
51 | Are More LLM Calls All You Need? Towards The Scaling Properties of Compound AI Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we initiate the study of scaling properties of compound inference systems. |
Lingjiao Chen; Jared Quincy Davis; Boris Hanin; Peter Bailis; Ion Stoica; Matei A Zaharia; James Zou; |
52 | Large Scale Transfer Learning for Tabular Data Via Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, while recent foundation models have reduced the need for developing task-specific datasets and predictors in domains such as language modeling and computer vision, this transfer learning paradigm has not had similar impact in the tabular domain. In this work, we seek to narrow this gap and present TabuLa-8B, a language model for tabular prediction. |
Josh Gardner; Juan Perdomo; Ludwig Schmidt; |
53 | DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we introduce DiscoveryWorld, a virtual environment that enables benchmarking an agent’s ability to perform complete cycles of novel scientific discovery in an inexpensive, simulated, multi-modal, long-horizon, and fictional setting. |
Peter A Jansen; Marc-Alexandre Côté; Tushar Khot; Erin Bransom; Bhavana Dalvi Mishra; Bodhisattwa Prasad Majumder; Oyvind Tafjord; Peter Clark; |
54 | Learning to Reason Via Program Generation, Emulation, and Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To adapt the COGEX model to a new task, we introduce a method for performing program search to find a single program whose pseudo-execution yields optimal performance when applied to all the instances of a given dataset. |
Nathaniel Weir; Muhammad Khalifa; Linlu Qiu; Orion Weller; Peter Clark; |
55 | QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. |
Saleh Ashkboos; Amirkeivan Mohtashami; Maximilian Croci; Bo Li; Pashmina Cameron; Martin Jaggi; Dan Alistarh; Torsten Hoefler; James Hensman; |
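The computational-invariance trick behind rotation-based quantization schemes like QuaRot can be illustrated in a few lines. This is a toy sketch, not the paper's method: QuaRot uses Hadamard rotations fused into the model's weights, whereas the code below uses a generic random orthogonal matrix and naive per-tensor 4-bit quantization.

```python
import torch

def random_orthogonal(n, seed=0):
    """Generic random orthogonal matrix (QuaRot itself uses Hadamard
    rotations absorbed into adjacent weight matrices)."""
    gen = torch.Generator().manual_seed(seed)
    q, _ = torch.linalg.qr(torch.randn(n, n, generator=gen))
    return q

def quantize_sym(x, bits=4):
    """Naive symmetric per-tensor quantization to the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max() / qmax
    return (x / scale).round().clamp(-qmax, qmax) * scale

# Computational invariance: (x @ Q) @ (Q.T @ W) == x @ W for orthogonal Q,
# so rotating activations and weights leaves the layer unchanged in exact
# arithmetic while spreading outliers, which reduces 4-bit quantization error.
d = 64
w, x = torch.randn(d, d), torch.randn(8, d)
q_rot = random_orthogonal(d)
y_rotated = quantize_sym(x @ q_rot) @ quantize_sym(q_rot.T @ w)
y_plain = quantize_sym(x) @ quantize_sym(w)
```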
56 | Depth Anything V2 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Without pursuing fancy techniques, we aim to reveal crucial findings to pave the way towards building a powerful monocular depth estimation model. |
Lihe Yang; Bingyi Kang; Zilong Huang; Zhen Zhao; Xiaogang Xu; Jiashi Feng; Hengshuang Zhao; |
57 | I Don’t Know: Explicit Modeling of Uncertainty with An [IDK] Token Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel calibration method that can be used to combat hallucinations. |
Roi Cohen; Konstantin Dobler; Eden Biran; Gerard de Melo; |
58 | DFBA: Data Free Backdoor Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose DFBA, a novel retraining-free and data-free backdoor attack without changing the model architecture. |
Bochuan Cao; Jinyuan Jia; Chuxuan Hu; Wenbo Guo; Zhen Xiang; Jinghui Chen; Bo Li; Dawn Song; |
59 | Tree of Attacks: Jailbreaking Black-Box LLMs Automatically Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present *Tree of Attacks with Pruning* (TAP), an automated method for generating jailbreaks that only requires black-box access to the target LLM. |
Anay Mehrotra; Manolis Zampetakis; Paul Kassianik; Blaine Nelson; Hyrum Anderson; Yaron Singer; Amin Karbasi; |
60 | VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new benchmark designed to advance the development of general-purpose, large-scale vision-language models for remote sensing images. |
Xiang Li; Jian Ding; Mohamed Elhoseiny; |
61 | Quantifying The Bitter Lesson: How Safety Benchmarks Measure Capabilities Instead of Safety Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the spirit of the Bitter Lesson, we ask whether such effort is wasteful. To quantify this question, we leverage spectral analysis to measure an underlying capabilities component, the direction in benchmark-performance-space which explains most variation in model performance. |
Richard Ren; Steven Basart; Adam Khoja; Alexander Pan; Alice Gatti; Long Phan; Xuwang Yin; Mantas Mazeika; Gabe Mukobi; Ryan Kim; Stephen Fitz; Dan Hendrycks; |
62 | Smoothie: Label Free Language Model Routing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Smoothie, a weak supervision-inspired routing approach that requires no labeled data. |
Neel Guha; Mayee Chen; Trevor Chow; Ishan Khare; Christopher Ré; |
63 | Even Sparser Graph Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We establish theoretical conditions when a narrow network’s attention scores can match those of a wide network, and show that Spexphormer achieves good performance with drastically reduced memory requirements on various graph datasets. |
Hamed Shirzad; Honghao Lin; Balaji Venkatachalam; Ameya Velingker; David Woodruff; Danica J. Sutherland; |
64 | Do Multimodal Foundation Models Understand Enterprise Workflows? A Benchmark for Business Process Management Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our contributions are: (1) a dataset containing 2928 documented workflow demonstrations; (2) 6 novel BPM tasks sourced from real-world applications ranging from workflow documentation to knowledge transfer to process improvement; and (3) an automated evaluation harness. |
Michael Wornow; Avanika Narayan; Ben Viggiano; Ishan Khare; Tathagat Verma; Tibor Thompson; Miguel Hernandez; Sudharsan Sundar; Chloe Trujillo; Krrish Chawla; Rongfei Lu; Justin Shen; Divya Nagaraj; Joshua Martinez; Vardhan Agrawal; Althea Hudson; Nigam Shah; Christopher Ré; |
65 | SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate how interface design affects the performance of language model agents. As a result of this exploration, we introduce SWE-agent: a system that enables language model agents to autonomously use computers to solve software engineering tasks. |
John Yang; Carlos Jimenez; Alexander Wettig; Kilian Lieret; Shunyu Yao; Karthik Narasimhan; Ofir Press; |
66 | What Can Foundation Models’ Embeddings Do? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further unleash the power of foundation models, we present FIND, a generalized interface for aligning foundation models’ embeddings with unified image and dataset-level understanding spanning modality and granularity. |
Xueyan Zou; Linjie Li; Jianfeng Wang; Jianwei Yang; Mingyu Ding; Junyi Wei; Zhengyuan Yang; Feng Li; Hao Zhang; Shilong Liu; Arul Aravinthan; Yong Jae Lee; Lijuan Wang; |
67 | 3DCoMPaT200: Language Grounded Large-Scale 3D Vision Dataset for Compositional Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To foster richer and fine-grained part-level 3D understanding, we introduce 3DCoMPaT200, a large-scale dataset tailored for compositional understanding of object parts and materials, comprising 200 object categories, an object vocabulary approximately 5 times larger than 3DCoMPaT's, and almost 4 times as many part categories. |
Mahmoud Ahmed; Xiang Li; Arpit Prajapati; Mohamed Elhoseiny; |
68 | You Only Cache Once: Decoder-Decoder Architectures for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. |
Yutao Sun; Li Dong; Yi Zhu; Shaohan Huang; Wenhui Wang; Shuming Ma; Quanlu Zhang; Jianyong Wang; Furu Wei; |
69 | Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose end-to-end (e2e) sparse dictionary learning, a method for training SAEs that ensures the features learned are functionally important by minimizing the KL divergence between the output distributions of the original model and the model with SAE activations inserted. |
Dan Braun; Jordan Taylor; Nicholas Goldowsky-Dill; Lee Sharkey; |
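A minimal sketch of the training signal described in the entry above, assuming logits from the original model and from the model with SAE activations spliced in are available; the L1 sparsity term and its coefficient are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def e2e_sae_loss(logits_orig, logits_with_sae, sae_acts, sparsity_coef=1e-3):
    """End-to-end SAE objective sketch: match the original model's output
    distribution via KL divergence while encouraging sparse SAE
    activations with an L1 penalty (penalty term is an assumption)."""
    log_p_sae = F.log_softmax(logits_with_sae, dim=-1)
    p_orig = F.softmax(logits_orig, dim=-1)
    kl = F.kl_div(log_p_sae, p_orig, reduction="batchmean")
    return kl + sparsity_coef * sae_acts.abs().sum(dim=-1).mean()
```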
70 | Benchmarking LLMs Via Uncertainty Quantification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current evaluation platforms, such as the widely recognized HuggingFace open LLM leaderboard, neglect a crucial aspect — uncertainty, which is vital for thoroughly assessing LLMs. To bridge this gap, we introduce a new benchmarking approach for LLMs that integrates uncertainty quantification. |
Fanghua Ye; Mingming Yang; Jianhui Pang; Longyue Wang; Derek Wong; Emine Yilmaz; Shuming Shi; Zhaopeng Tu; |
71 | Learning Segmentation from Point Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a way to train a segmentation network using long-term point trajectories as a supervisory signal to complement optical flow. |
Laurynas Karazija; Iro Laina; Christian Rupprecht; Andrea Vedaldi; |
72 | Sparse Maximal Update Parameterization: A Holistic Approach to Sparse Training Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Without stable dynamics and effective training recipes, it is costly to test sparsity at scale, which is key to surpassing dense networks and making the business case for sparsity acceleration in hardware. A holistic approach is needed to tackle these challenges, and we propose SμPar as one such approach. SμPar ensures activations, gradients, and weight updates all scale independently of sparsity level. |
Nolan Dey; Shane Bergsma; Joel Hestness; |
73 | DataComp-LM: In Search of The Next Generation of Training Sets for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce DataComp for Language Models, a testbed for controlled dataset experiments with the goal of improving language models. |
Amro Abbas; Alon Albalak; Kushal Arora; Hritik Bansal; Yonatan Bitton; Yair Carmon; Khyathi Chandu; Mayee Chen; Giannis Daras; Achal Dave; Alex Dimakis; Alaaeldin El-Nouby; Fartash Faghri; Alex Fang; Samir Yitzhak Gadre; Josh Gardner; Saurabh Garg; Dhruba Ghosh; Aaron Gokaslan; Dirk Groeneveld; Etash Guha; Suchin Gururangan; Reinhard Heckel; Cheng-Yu Hsieh; Gabriel Ilharco; Maor Ivgi; Jenia Jitsev; Matt Jordan; Sham Kakade; Sedrick Scott Keh; Maciej Kilian; Pang Wei Koh; Thomas Kollar; Jeffrey Li; Kyle Lo; Kalyani Marathe; Jean Mercat; Niklas Muennighoff; Marianna Nezhurina; Thao Nguyen; Sewoong Oh; Hadi Pouransari; Sarah Pratt; Sunny Sanyal; Ludwig Schmidt; Vaishaal Shankar; Rulin Shao; Georgios Smyrnis; Luca Soldaini; Shuran Song; Alexander Toshev; Igor Vasiljevic; Stephanie Wang; Mitchell Wortsman; Rui Xin; Luke Zettlemoyer; Hanlin Zhang; Jieyu Zhang; |
74 | A Careful Examination of Large Language Model Performance on Grade School Arithmetic Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there is growing concern that some of this performance actually reflects dataset contamination, where data closely resembling benchmark questions leaks into the training data, instead of true reasoning ability. To investigate this claim rigorously, we commission Grade School Math 1000 (GSM1k). |
Hugh Zhang; Jeff Da; Dean Lee; Vaughn Robinson; Catherine Wu; William Song; Tiffany Zhao; Pranav Raja; Charlotte Zhuang; Dylan Slack; Qin Lyu; Sean Hendryx; Russell Kaplan; Michele Lunati; Summer Yue; |
75 | Neural Gaffer: Relighting Any Object Via Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel end-to-end 2D relighting diffusion model, called Neural Gaffer, that takes a single image of any object and can synthesize an accurate, high-quality relit image under any novel environmental lighting condition, simply by conditioning an image generator on a target environment map, without an explicit scene decomposition. |
Haian Jin; Yuan Li; Fujun Luan; Yuanbo Xiangli; Sai Bi; Kai Zhang; Zexiang Xu; Jin Sun; Noah Snavely; |
76 | Achieving Efficient Alignment Through Learned Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce *Aligner*, a novel and simple alignment paradigm that learns the correctional residuals between preferred and dispreferred answers using a small model. |
Jiaming Ji; Boyuan Chen; Hantao Lou; Donghai Hong; Borong Zhang; Xuehai Pan; Tianyi (Alex) Qiu; Juntao Dai; Yaodong Yang; |
77 | WizardArena: Post-training Large Language Models Via Simulated Offline Chatbot Arena Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate the manual and temporal costs associated with post-training, this paper introduces a Simulated Chatbot Arena named WizardArena, which is fully based on and powered by open-source LLMs. |
Haipeng Luo; Qingfeng Sun; Can Xu; Pu Zhao; Qingwei Lin; Jian-Guang Lou; Shifeng Chen; Yansong Tang; Weizhu Chen; |
78 | Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our key insight is that an evaluator (reward model) trained on supervisions for easier tasks can be effectively used for scoring candidate solutions of harder tasks and hence facilitating easy-to-hard generalization over different levels of tasks. |
Zhiqing Sun; Longhui Yu; Yikang Shen; Weiyang Liu; Yiming Yang; Sean Welleck; Chuang Gan; |
79 | Interpreting The Weight Space of Customized Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate the space of weights spanned by a large collection of customized diffusion models. |
Amil Dravid; Yossi Gandelsman; Kuan-Chieh Wang; Rameen Abdal; Gordon Wetzstein; Alexei Efros; Kfir Aberman; |
80 | Make Your LLM Fully Utilize The Context Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We hypothesize that it stems from insufficient explicit supervision during the long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on this intuition, our study presents **information-intensive (IN2) training**, a purely data-driven solution to overcome lost-in-the-middle. |
Shengnan An; Zexiong Ma; Zeqi Lin; Nanning Zheng; Jian-Guang Lou; Weizhu Chen; |
81 | Multistep Distillation of Diffusion Models Via Moment Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new method for making diffusion models faster to sample. |
Tim Salimans; Emiel Hoogeboom; Thomas Mensink; Jonathan Heek; |
82 | Query-Based Adversarial Prompt Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We improve on prior work with a query-based attack that leverages API access to a remote language model to construct adversarial examples that cause the model to emit harmful strings with (much) higher probability than with transfer-only attacks. |
Jonathan Hayase; Ema Borevković; Nicholas Carlini; Florian Tramer; Milad Nasr; |
83 | Evaluating Copyright Takedown Methods for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs. We propose CoTaEval, an evaluation framework to assess the effectiveness of copyright takedown methods, the impact on the model’s ability to retain uncopyrightable factual knowledge from the copyrighted content, and how well the model maintains its general utility and efficiency. |
Boyi Wei; Weijia Shi; Yangsibo Huang; Noah Smith; Chiyuan Zhang; Luke Zettlemoyer; Kai Li; Peter Henderson; |
84 | Exploring Context Window of Large Language Models Via Decomposed Positional Vectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we explore the positional information within and beyond the context window for deciphering the underlying mechanism of LLMs. |
Zican Dong; Junyi Li; Xin Men; Xin Zhao; Bingning Wang; Zhen Tian; weipeng chen; Ji-Rong Wen; |
85 | Visual Autoregressive Modeling: Scalable Image Generation Via Next-Scale Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine next-scale prediction or next-resolution prediction, diverging from the standard raster-scan next-token prediction. |
Keyu Tian; Yi Jiang; Zehuan Yuan; BINGYUE PENG; Liwei Wang; |
86 | Vision Model Pre-training on Interleaved Image-Text Data Via Latent Compression Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the recent success of compression learning in natural language processing, we propose a novel vision model pre-training method called Latent Compression Learning (LCL) for interleaved image-text data. |
CHENYU YANG; Xizhou Zhu; Jinguo Zhu; Weijie Su; Junjie Wang; Xuan Dong; Wenhai Wang; Bin Li; Jie Zhou; Yu Qiao; Jifeng Dai; |
87 | Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning Via The Lens of Representation Complexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work investigates the potential hierarchy of representation complexity among these RL paradigms. |
Guhao Feng; Han Zhong; |
88 | Video Diffusion Models Are Training-free Motion Interpreter and Controller Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Leveraging MOFT, we propose a novel training-free video motion control framework. |
Zeqi Xiao; Yifan Zhou; Shuai Yang; Xingang Pan; |
89 | DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With this question in mind, we further conduct qualitative and quantitative pre-experiments, which validate the negative impact of detection-segmentation imbalance issue on the model performance. To address this issue, this paper proposes DI-MaskDINO model, the core idea of which is to improve the final performance by alleviating the detection-segmentation imbalance. |
Zhixiong Nan; Li Xianghong; Tao Xiang; Jifeng Dai; |
90 | Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To reduce the within-class similarity, we introduce class-wise supervision during the image synthesizing process by batching the samples within classes, instead of across classes. |
Lingao Xiao; Yang He; |
91 | VisionLLM V2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present VisionLLM v2, an end-to-end generalist multimodal large language model (MLLM) that unifies visual perception, understanding, and generation within a single framework. |
Jiannan Wu; Muyan Zhong; Sen Xing; Zeqiang Lai; Zhaoyang Liu; Wenhai Wang; Zhe Chen; Xizhou Zhu; Lewei Lu; Tong Lu; Ping Luo; Yu Qiao; Jifeng Dai; |
92 | Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they often fail to produce realistic geometric details, resulting in overly smooth surfaces or geometric details inaccurately baked in albedo maps. To address this, we introduce a new method that incorporates touch as an additional modality to improve the geometric details of generated 3D assets. |
Ruihan Gao; Kangle Deng; Gengshan Yang; Wenzhen Yuan; Jun-Yan Zhu; |
93 | NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing benchmarks focus on outdated content and limited fields, facing difficulties in real-time updating and leaving new terms unexplored. To address this problem, we propose an adaptive benchmark, NewTerm, for real-time evaluation of new terms. |
Hexuan Deng; Wenxiang Jiao; Xuebo Liu; Min Zhang; Zhaopeng Tu; |
94 | Learning 1D Causal Visual Representation with De-focus Attention Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The issue of over-focus hinders the model’s ability to extract diverse visual features and to receive effective gradients for optimization. To address this, we propose De-focus Attention Networks, which employ learnable bandpass filters to create varied attention patterns. |
Tao Chenxin; Xizhou Zhu; Shiqian Su; Lewei Lu; Changyao Tian; Xuan Luo; Gao Huang; Hongsheng Li; Yu Qiao; Jie Zhou; Jifeng Dai; |
95 | Boosting Text-to-Video Generative Model with MLLMs Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building upon this finding, we utilize MLLMs to perform fine-grained video preference annotations across two dimensions, resulting in the creation of VideoPrefer, which includes 135,000 preference annotations. Utilizing this dataset, we introduce VideoRM, the first general-purpose reward model tailored for video preference in the text-to-video domain. |
Xun Wu; Shaohan Huang; Guolong Wang; Jing Xiong; Furu Wei; |
96 | Multi-Head Mixture-of-Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Multi-Head Mixture-of-Experts (MH-MoE). |
Xun Wu; Shaohan Huang; Wenhui Wang; Shuming Ma; Li Dong; Furu Wei; |
97 | Multimodal Large Language Models Make Text-to-Image Generative Models Align Better Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite these advances, current human preference datasets are either prohibitively expensive to construct or suffer from a lack of diversity in preference dimensions, resulting in limited applicability for instruction tuning in open-source text-to-image generative models and hinder further exploration. To address these challenges and promote the alignment of generative models through instruction tuning, we leverage multimodal large language models to create VisionPrefer, a high-quality and fine-grained preference dataset that captures multiple preference aspects. |
Xun Wu; Shaohan Huang; Guolong Wang; Jing Xiong; Furu Wei; |
98 | Mind’s Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Humans possess a remarkable ability to create mental images of unseen objects and actions through a process known as the Mind’s Eye, enabling the imagination of the unseen world. Inspired by this cognitive capacity, we propose Visualization-of-Thought (VoT) prompting. |
Wenshan Wu; Shaoguang Mao; Yadong Zhang; Yan Xia; Li Dong; Lei Cui; Furu Wei; |
99 | Learning Scene-specific Descriptions Via Adaptive Renormalization for Open-vocabulary Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current approaches for open-vocabulary scene graph generation (OVSGG) use vision-language models such as CLIP and follow a standard zero-shot pipeline – computing similarity between the query image and the text embeddings for each category (i.e., text classifiers). In this work, we argue that the text classifiers adopted by existing OVSGG methods, i.e., category-/part-level prompts, are scene-agnostic as they remain unchanged across contexts. |
Guikun Chen; Jin Li; Wenguan Wang; |
100 | Unlocking The Potential of Global Human Expertise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is difficult to identify, combine, and refine complementary information in an increasingly large and diverse knowledge base. This paper argues that artificial intelligence (AI) can play a crucial role in this process. |
Elliot Meyerson; Olivier Francon; Darren Sargent; Babak Hodjat; Risto Miikkulainen; |
101 | Visual Sketchpad: Sketching As A Visual Chain of Thought for Multimodal Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Sketchpad, a framework that gives multimodal LMs a visual sketchpad and tools to draw on the sketchpad. |
Yushi Hu; Weijia Shi; Xingyu Fu; Dan Roth; Mari Ostendorf; Luke Zettlemoyer; Noah Smith; Ranjay Krishna; |
102 | RL-GPT: Integrating Reinforcement Learning and Code-as-policy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To seamlessly integrate both modalities, we introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent. |
Shaoteng Liu; Haoqi Yuan; Minda Hu; Yanwei Li; Yukang Chen; Shu Liu; Zongqing Lu; Jiaya Jia; |
103 | LLM Evaluators Recognize and Favor Their Own Generations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate if self-recognition capability contributes to self-preference. |
Arjun Panickssery; Samuel Bowman; Shi Feng; |
104 | SimPO: Simple Preference Optimization with A Reference-Free Reward Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose SimPO, a simpler yet more effective approach. |
Yu Meng; Mengzhou Xia; Danqi Chen; |
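As context for the entry above, a minimal sketch of SimPO's reference-free objective: the implicit reward is the length-normalized log-likelihood of a response under the policy (no reference model), and a target margin separates chosen from rejected responses. Hyperparameter values and tensor names here are illustrative, not the paper's tuned settings.

```python
import torch
import torch.nn.functional as F

def simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
               beta=2.0, gamma=1.0):
    """SimPO-style objective: average per-token log-probability acts as
    an implicit reward, with a target reward margin gamma."""
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    return -F.logsigmoid(reward_chosen - reward_rejected - gamma).mean()

# Toy usage with summed log-probabilities of two preference pairs.
loss = simpo_loss(torch.tensor([-42.0, -55.0]), torch.tensor([-60.0, -58.0]),
                  torch.tensor([20.0, 25.0]), torch.tensor([22.0, 24.0]))
```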
105 | Finding Transformer Circuits With Edge Pruning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we frame circuit discovery as an optimization problem and propose _Edge Pruning_ as an effective and scalable solution. |
Adithya Bhaskar; Alexander Wettig; Dan Friedman; Danqi Chen; |
106 | Dissecting The Failure of Invariant Learning on Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a Structural Causal Model (SCM) to theoretically dissect the performance of two prominent invariant learning methods–Invariant Risk Minimization (IRM) and Variance-Risk Extrapolation (VREx)–in node-level OOD settings. |
Qixun Wang; Yifei Wang; Yisen Wang; Xianghua Ying; |
107 | CALVIN: Improved Contextual Video Captioning Via Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Scene descriptions, especially in movies, require a deeper contextual understanding, unlike general-purpose video captioning. To address this challenge, we propose a model, CALVIN, a specialized video LLM that leverages previous movie context to generate fully contextual scene descriptions. |
Gowthami Somepalli; Arkabandhu Chowdhury; Jonas Geiping; Basri Ronen; Tom Goldstein; David Jacobs; |
108 | Algorithmic Capabilities of Random Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To what extent do they depend on the supervisory signal provided to models, and to what extent are they attributable to behavior already present in models at the beginning of training? To investigate these questions, we study what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized, so that the only input–output mappings learnable from data are those already implemented (up to a choice of encoding scheme) by the randomly initialized model. |
Ziqian Zhong; Jacob Andreas; |
109 | CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose CharXiv, a comprehensive evaluation suite involving 2,323 natural, challenging, and diverse charts from scientific papers. |
Zirui Wang; Mengzhou Xia; Luxi He; Howard Chen; Yitao Liu; Richard Zhu; Kaiqu Liang; Xindi Wu; Haotian Liu; Sadhika Malladi; Chevalier; Sanjeev Arora; Danqi Chen; |
110 | Chain of Thoughtlessness? An Analysis of CoT in Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a case study of chain of thought on problems from Blocksworld, a classical planning domain, and examines the performance of two state-of-the-art LLMs across two axes: generality of examples given in prompt, and complexity of problems queried with each prompt. |
Kaya Stechly; Karthik Valmeekam; Subbarao Kambhampati; |
111 | Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using image models naively for solving inverse video problems often suffers from flickering, texture-sticking, and temporal inconsistency in generated videos. To tackle these problems, in this paper, we view frames as continuous functions in the 2D space, and videos as a sequence of continuous warping transformations between different frames. |
Giannis Daras; Weili Nie; Karsten Kreis; Alex Dimakis; Morteza Mardani; Nikola Kovachki; Arash Vahdat; |
112 | HYDRA: Model Factorization Framework for Black-Box LLM Personalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing solutions have primarily focused on prompt design to incorporate user-specific profiles and behaviors; however, such approaches often struggle to generalize effectively due to their inability to capture shared knowledge among all users. To address these challenges, we propose HYDRA, a model factorization framework that captures both user-specific behavior patterns from historical data and shared general knowledge among all users to deliver personalized generation. |
Yuchen Zhuang; Haotian Sun; Yue Yu; Rushi Qiang; Qifan Wang; Chao Zhang; Bo Dai; |
113 | WildVision: Evaluating Vision-Language Models in The Wild with Human Preferences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our comprehensive analysis of 20K real-world interactions reveals important insights into the failure cases of top-performing VLMs. |
Yujie Lu; Dongfu Jiang; Wenhu Chen; William Yang Wang; Yejin Choi; Bill Yuchen Lin; |
114 | BitDelta: Your Fine-Tune May Only Be Worth One Bit Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We explore this assumption by decomposing the weights of fine-tuned models into their pre-trained components and an additional delta. We introduce a simple method, BitDelta, which successfully quantizes this delta down to 1 bit without compromising performance. |
James Liu; Guangxuan Xiao; Kai Li; Jason Lee; Song Han; Tri Dao; Tianle Cai; |
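The decomposition described here can be sketched in a few lines: keep the pre-trained weights in full precision and replace the fine-tuning delta with its sign matrix and a single scale. A minimal sketch follows; the mean-absolute-value scale is the least-squares optimum for a fixed sign matrix, while the paper additionally calibrates scales by distillation.

```python
import torch

def bitdelta_compress(w_base: torch.Tensor, w_fine: torch.Tensor):
    """Replace the fine-tuning delta with a sign matrix and one scale.
    mean(|delta|) minimizes ||delta - scale * sign(delta)||_F for a fixed sign."""
    delta = w_fine - w_base
    sign = torch.sign(delta)        # 1 bit per parameter
    scale = delta.abs().mean()      # single scalar per weight matrix
    return sign, scale

def bitdelta_apply(w_base: torch.Tensor, sign: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Reconstruct an approximate fine-tuned weight on the fly."""
    return w_base + scale * sign
```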
115 | MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces MMLU-Pro, an enhanced dataset designed to extend the mostly knowledge-driven MMLU benchmark by integrating more challenging, reasoning-focused questions and expanding the choice set from four to ten options. |
Yubo Wang; Xueguang Ma; Ge Zhang; Yuansheng Ni; Abhranil Chandra; Shiguang Guo; Weiming Ren; Aaran Arulraj; Xuan He; Ziyan Jiang; Tianle Li; Max KU; Wang; Alex Zhuang; Rongqi Fan; Xiang Yue; Wenhu Chen; |
116 | Building on Efficient Foundations: Effective Training of LLMs with Structured Feedforward Layers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our study focuses on transformer-based LLMs, specifically targeting the computationally intensive feedforward networks (FFN), which are less studied than attention blocks. |
Xiuying Wei; Skander Moalla; Razvan Pascanu; Caglar Gulcehre; |
117 | Be Like A Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate training data exposure without sacrificing model performance, we introduce a simple but subtle modification to the standard next-token prediction objective for autoregressive LLMs that we call the goldfish loss. |
Abhimanyu Hans; John Kirchenbauer; Yuxin Wen; Neel Jain; Hamid Kazemi; Prajwal Singhania; Siddharth Singh; Gowthami Somepalli; Jonas Geiping; Abhinav Bhatele; Tom Goldstein; |
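Since the goldfish loss is a masked next-token objective, a short sketch conveys it: drop a deterministic subset of tokens from the cross-entropy so the model never receives supervision on a complete verbatim passage. The modulo-based mask below is a simplified stand-in for the paper's hash-based mask; `k` controls how often tokens are dropped.

```python
import torch
import torch.nn.functional as F

def goldfish_loss(logits: torch.Tensor, labels: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Next-token cross-entropy that ignores a deterministic 1-in-k subset of
    tokens, so no passage is ever fully supervised (and hence harder to
    regurgitate verbatim at inference time)."""
    labels = labels.reshape(-1).clone()
    drop = (torch.arange(labels.numel(), device=labels.device) % k) == 0
    labels[drop] = -100  # ignore_index: no gradient from these positions
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), labels, ignore_index=-100)
```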
118 | Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its success, two primary limitations have been identified: the inefficacy of Exponential Moving Average (EMA) from I-JEPA in preventing entire collapse and the inadequacy of I-JEPA prediction in accurately learning the mean of patch representations. Addressing these challenges, this study introduces a novel framework, namely C-JEPA (Contrastive-JEPA), which integrates the Image-based Joint-Embedding Predictive Architecture with the Variance-Invariance-Covariance Regularization (VICReg) strategy. |
Shentong Mo; Shengbang Tong; |
119 | Crafting Interpretable Embeddings By Asking LLMs Questions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM. |
Vinamra Benara; Chandan Singh; John Morris; Richard Antonello; Ion Stoica; Alexander Huth; Jianfeng Gao; |
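QA-Emb has an unusually direct construction, so a sketch is easy: each embedding dimension is the yes/no answer an LLM gives to one question about the text. In the snippet below, `ask_llm` is a hypothetical stand-in for any chat-completion call; the prompt template is illustrative.

```python
from typing import Callable, List

def qa_embed(text: str, questions: List[str], ask_llm: Callable[[str], str]) -> List[float]:
    """Interpretable embedding: dimension i is the LLM's yes/no answer to
    questions[i] about the text."""
    emb = []
    for q in questions:
        ans = ask_llm(f"Text: {text}\nQuestion: {q}\nAnswer yes or no.")
        emb.append(1.0 if ans.strip().lower().startswith("yes") else 0.0)
    return emb
```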
120 | Who’s Asking? User Personas and The Mechanics of Latent Misalignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite investments in improving model safety, studies show that misaligned capabilities remain latent in safety-tuned models. In this work, we shed light on the mechanics of this phenomenon. |
Asma Ghandeharioun; Ann Yuan; Marius Guerard; Emily Reif; |
121 | MathPile: A Billion-Token-Scale Pretraining Corpus for Math Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce MathPile, a diverse and high-quality math-centric corpus comprising about 9.5 billion tokens. |
Zengzhi Wang; Xuefeng Li; Rui Xia; Pengfei Liu; |
122 | Self-Retrieval: End-to-End Information Retrieval with One Large Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce \emph{Self-Retrieval}, a novel end-to-end LLM-driven information retrieval architecture. |
Qiaoyu Tang; Jiawei Chen; Zhuoqun Li; Bowen Yu; Yaojie Lu; ChengFu; Haiyang Yu; Hongyu Lin; Fei Huang; Ben He; Xianpei Han; Le Sun; Yongbin Li; |
123 | Knowledge Circuit in Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we delve into the computation graph of the language model to uncover the knowledge circuits that are instrumental in articulating specific knowledge. |
Yunzhi Yao; Ningyu Zhang; Zekun Xi; Mengru Wang; Ziwen Xu; Shumin Deng; Huajun Chen; |
124 | MInference: Accelerating Pre-filling for Long-Context LLMs Via Dynamic Sparse Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods for speeding up pre-filling often fail to maintain acceptable accuracy or efficiency when applied to long-context LLMs. To address this gap, we introduce MInference, a sparse calculation method designed to accelerate pre-filling of long-sequence processing. |
Huiqiang Jiang; Yucheng LI; Chengruidong Zhang; Qianhui Wu; Xufang Luo; Surin Ahn; Zhenhua Han; Amir Abdi; Dongsheng Li; Chin-Yew Lin; Yuqing Yang; Lili Qiu; |
125 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Building on the recent progress in open-source LLMs, our proposed prompting novelty, and some brute-force scaling, we construct OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs. |
Shubham Toshniwal; Ivan Moshkov; Sean Narenthiran; Daria Gitman; Fei Jia; Igor Gitman; |
126 | UltraEdit: Instruction-based Fine-Grained Image Editing at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents UltraEdit, a large-scale (~ 4M editing samples), automatically generated dataset for instruction-based image editing. |
Haozhe Zhao; Xiaojian (Shawn) Ma; Liang Chen; Shuzheng Si; Rujie Wu; Kaikai An; Peiyu Yu; Minjia Zhang; Qing Li; Baobao Chang; |
127 | Refusal in Language Models Is Mediated By A Single Direction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that refusal is mediated by a one-dimensional subspace, across 13 popular open-source chat models up to 72B parameters in size. |
Andy Arditi; Oscar Obeso; Aaquib Syed; Nina Panickssery; Daniel Paleka; Wes Gurnee; Neel Nanda; |
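A one-dimensional refusal subspace implies an equally simple intervention: project the direction out of the residual-stream activations. The sketch below shows that directional ablation for a single activation tensor; in the paper the ablation is applied across layers and token positions.

```python
import torch

def ablate_direction(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Directional ablation: remove the component of the activations along the
    (unit-normalized) refusal direction, h <- h - (h . r) r."""
    r = direction / direction.norm()
    return hidden - (hidden @ r).unsqueeze(-1) * r
```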
128 | JiuZhang3.0: Efficiently Improving Mathematical Reasoning By Training Small Data Synthesis Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To reduce the cost, we propose an efficient approach that trains a small LLM on openly available texts for math problem synthesis, generating sufficient high-quality pre-training data. |
Kun Zhou; Beichen Zhang; jiapeng wang; Zhipeng Chen; Xin Zhao; Jing Sha; Zhichao Sheng; Shijin Wang; Ji-Rong Wen; |
129 | Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Gated Sparse Autoencoder (Gated SAE), which achieves a Pareto improvement over training with prevailing methods. |
Senthooran Rajamanoharan; Arthur Conmy; Lewis Smith; Tom Lieberum; Vikrant Varma; Janos Kramar; Rohin Shah; Neel Nanda; |
130 | IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce IQA-EVAL, an automated framework for Interactive Question Answering evaluation. More specifically, we introduce an LLM-based Evaluation Agent (LEA) that can: (1) simulate human behaviors to generate interactions with IQA models; and (2) automatically evaluate the generated interactions. |
Ruosen Li; Ruochen Li; Barry Wang; Xinya Du; |
131 | MEQA: A Benchmark for Multi-hop Event-centric Question Answering with Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel semi-automatic question generation strategy by composing event structures from information extraction (IE) datasets and present the first Multi-hop Event-centric Question Answering (MEQA) benchmark. |
Ruosen Li; Zimu Wang; Son Tran; Lei Xia; Xinya Du; |
132 | Stabilize The Latent Space for Image Autoregressive Modeling: A Unified Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This finding contrasts sharply with the field of NLP, where the autoregressive model GPT has established a commanding presence. To address this discrepancy, we introduce a unified perspective on the relationship between latent space and generative models, emphasizing the stability of latent space in image generative modeling. |
Yongxin Zhu; Bocheng Li; Hang Zhang; Xin Li; Linli Xu; Lidong Bing; |
133 | How Do Large Language Models Handle Multilingualism? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To verify $\texttt{MWork}$, we introduce Parallel Language-specific Neuron Detection ($\texttt{PLND}$) to identify activated neurons for inputs in different languages without any labeled data. |
Yiran Zhao; Wenxuan Zhang; Guizhen Chen; Kenji Kawaguchi; Lidong Bing; |
134 | KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Quantization is a promising approach for compressing KV cache activations; however, existing solutions fail to represent activations accurately in sub-4-bit precision. Our work, KVQuant, facilitates low precision KV cache quantization by incorporating several novel methods: (i) Per-Channel Key Quantization, where we adjust the dimension along which we quantize the Key activations to better match the distribution; (ii) Pre-RoPE Key Quantization, where we quantize Key activations before the rotary positional embedding to mitigate its impact on quantization; (iii) Non-Uniform KV Cache Quantization, where we derive per-layer sensitivity-weighted non-uniform datatypes that better represent the distributions; and (iv) Per-Vector Dense-and-Sparse Quantization, where we isolate outliers separately for each vector to minimize skews in quantization ranges. |
Coleman Hooper; Sehoon Kim; Hiva Mohammadzadeh; Michael Mahoney; Sophia Shao; Kurt Keutzer; Amir Gholami; |
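Of the four techniques listed, per-channel Key quantization is the easiest to illustrate: give every Key channel its own quantization range rather than sharing one per token. The sketch below is a generic per-channel asymmetric fake-quantizer written for clarity (it returns dequantized values), not the paper's implementation.

```python
import torch

def quantize_keys_per_channel(keys: torch.Tensor, n_bits: int = 3) -> torch.Tensor:
    """Asymmetric fake-quantization of Key activations with one (scale, zero)
    pair per channel. keys: [seq_len, head_dim]; ranges are taken along the
    sequence so each channel keeps its own distribution."""
    qmax = 2 ** n_bits - 1
    lo = keys.amin(dim=0, keepdim=True)
    hi = keys.amax(dim=0, keepdim=True)
    scale = (hi - lo).clamp(min=1e-8) / qmax
    q = ((keys - lo) / scale).round().clamp(0, qmax)
    return q * scale + lo  # dequantized, for inspecting the quantization error
```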
135 | Sirius: Contextual Sparsity with Correction for Efficient LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces \textsc{Sirius}, an efficient correction mechanism that enables accurate LLM inference with contextual sparsity. |
Yang Zhou; Zhuoming Chen; Zhaozhuo Xu; Victoria Lin; Beidi Chen; |
136 | S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-tuning By Structured Sparsity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this limitation, we investigate sparse fine-tuning and observe a remarkable improvement in generalization ability. Utilizing this key insight, we propose a family of Structured Sparse Fine-Tuning (S$^{2}$FT) methods for LLMs, which concurrently achieve state-of-the-art fine-tuning performance, training efficiency, and inference scalability. |
Xinyu Yang; Jixuan Leng; Geyang Guo; Jiawei Zhao; Ryumei Nakada; Linjun Zhang; Huaxiu Yao; Beidi Chen; |
137 | Sequoia: Scalable and Robust Speculative Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Sequoia, a scalable and robust algorithm for speculative decoding. |
Zhuoming Chen; Avner May; Ruslan Svirschevski; Yu-Hsun Huang; Max Ryabinin; Zhihao Jia; Beidi Chen; |
138 | Confidence Regulation Neurons in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study investigates two critical components believed to influence this uncertainty: the recently discovered entropy neurons and a new set of components that we term token frequency neurons. |
Alessandro Stolfo; Ben Wu; Wes Gurnee; Yonatan Belinkov; Xingyi Song; Mrinmaya Sachan; Neel Nanda; |
139 | FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a framework called FlowTurbo to accelerate the sampling of flow-based models while still enhancing the sampling quality. |
Wenliang Zhao; Minglei Shi; Xumin Yu; Jie Zhou; Jiwen Lu; |
140 | Counterfactual PPO Enhanced Shared Reflector for LLM-based Multi-agent Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework, named COPPER, to enhance the collaboration ability of multi-agent systems through a learnable self-reflection mechanism. |
Xiaohe Bo; Zeyu Zhang; Quanyu Dai; Xueyang Feng; Lei Wang; Rui Li; Xu Chen; Ji-Rong Wen; |
141 | OneBit: Towards Extremely Low-bit Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper boldly quantizes the weight matrices of LLMs to 1-bit, paving the way for the extremely low bit-width deployment of LLMs. To this end, we introduce a 1-bit model compression framework named OneBit, including a novel 1-bit parameter representation method to better quantize LLMs as well as an effective parameter initialization method based on matrix decomposition to improve the convergence speed of the quantization framework. |
Yuzhuang Xu; Xu Han; Zonghan Yang; Shuo Wang; Qingfu Zhu; Zhiyuan Liu; Weidong Liu; Wanxiang Che; |
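The matrix-decomposition initialization can be sketched compactly: approximate a weight matrix by its sign matrix modulated by a rank-1 outer product fitted to the magnitudes. The snippet below is one reading of that idea, using the top singular pair of |W|; treat the details as an assumption rather than the paper's exact recipe.

```python
import torch

def sign_rank1_init(w: torch.Tensor):
    """Approximate w as sign(w) * outer(a, b): the sign matrix costs 1 bit per
    weight, and the two value vectors come from the top singular pair of |w|."""
    sign = torch.sign(w)
    u, s, vh = torch.linalg.svd(w.abs(), full_matrices=False)
    a = u[:, 0] * s[0].sqrt()
    b = vh[0, :] * s[0].sqrt()
    return sign, a, b  # reconstruction: sign * torch.outer(a, b)
```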
142 | One-Shot Safety Alignment for Large Language Models Via Optimal Dualization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a dualization perspective that reduces constrained alignment to an equivalent unconstrained alignment problem. |
Xinmeng Huang; Shuo Li; Edgar Dobriban; Osbert Bastani; Hamed Hassani; Dongsheng Ding; |
143 | Super Consistency of Neural Network Landscapes and Learning Rate Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: From an optimization perspective, this phenomenon is puzzling, as it implies that the loss landscape is consistently similar across very different model sizes. In this work, we study the landscape through the lens of the Hessian, with a focus on its largest eigenvalue (i.e. the sharpness), and find that certain spectral properties under $\mu$P are largely independent of the width and depth of the network along the training trajectory. |
Lorenzo Noci; Alexandru Meterez; Thomas Hofmann; Antonio Orvieto; |
144 | Not All Tokens Are What You Need for Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our initial analysis examines token-level training dynamics of language models, revealing distinct loss patterns for different tokens. Leveraging these insights, we introduce a new language model called Rho-1. |
Zhenghao Lin; Zhibin Gou; Yeyun Gong; Xiao Liu; yelong shen; Ruochen Xu; Chen Lin; Yujiu Yang; Jian Jiao; Nan Duan; Weizhu Chen; |
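The token-selective idea behind Rho-1 can be sketched as a filtered cross-entropy: score each token by its excess loss over a frozen reference model and back-propagate only through the highest-scoring fraction. The snippet below is a simplified rendering under that assumption; `keep_frac` and the scoring rule are illustrative.

```python
import torch
import torch.nn.functional as F

def selective_lm_loss(logits, ref_logits, labels, keep_frac: float = 0.6):
    """Train only on 'useful' tokens: rank tokens by excess loss over a frozen
    reference model and average the loss over the top fraction."""
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         labels.reshape(-1), reduction="none")
    ref_ce = F.cross_entropy(ref_logits.reshape(-1, ref_logits.size(-1)),
                             labels.reshape(-1), reduction="none")
    excess = (ce - ref_ce).detach()             # used for selection only
    k = max(1, int(keep_frac * excess.numel()))
    keep = excess.topk(k).indices
    return ce[keep].mean()
```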
145 | Trajectory Flow Matching with Applications to Clinical Time Series Modelling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current algorithms for training Neural SDEs require backpropagation through the SDE dynamics, greatly limiting their scalability and stability. To address this, we propose \textbf{Trajectory Flow Matching} (TFM), which trains a Neural SDE in a \textit{simulation-free} manner, bypassing backpropagation through the dynamics. |
Xi (Nicole) Zhang; Yuan Pu; Yuki Kawamura; Andrew Loza; Yoshua Bengio; Dennis Shung; Alexander Tong; |
146 | Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To guide progress in interpretable dictionary learning, we introduce a new SAE training technique, $p$-annealing, which demonstrates improved performance on our metric. |
Adam Karvonen; Benjamin Wright; Can Rager; Rico Angell; Jannik Brinkmann; Logan Smith; Claudio Mayrink Verdun; David Bau; Samuel Marks; |
147 | GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel approach for single-shot novel view synthesis, a semantic-preserving generative warping framework that enables T2I generative models to learn where to warp and where to generate, through augmenting cross-view attention with self-attention. |
Junyoung Seo; Kazumi Fukuda; Takashi Shibuya; Takuya Narihira; Naoki Murata; Shoukang Hu; Chieh-Hsin Lai; Seungryong Kim; Yuki Mitsufuji; |
148 | Scaling Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we push forward the frontier of SLT by scaling pretraining data, model size, and number of translation directions. |
Biao Zhang; Garrett Tanzer; Orhan Firat; |
149 | ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present ZipCache, an accurate and efficient KV cache quantization method for large language models (LLMs). |
Yefei He; Luoming Zhang; Weijia Wu; Jing Liu; Hong Zhou; Bohan Zhuang; |
150 | Amortized Planning with Large-Scale Transformers: A Case Study on Chess Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper uses chess, a landmark planning problem in AI, to assess transformers’ performance on a planning task where memorization is futile – even at large scale. |
Anian Ruoss; Grégoire Delétang; Sourabh Medapati; Jordi Grau-Moya; Kevin Li; Elliot Catt; John Reid; Cannada Lewis; Tim Genewein; Joel Veness; |
151 | DoFIT: Domain-aware Federated Instruction Tuning with Alleviated Catastrophic Forgetting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This leads to domain-information catastrophic forgetting in collaborative training and therefore makes the model perform sub-optimally on individual domains. To address this issue, we introduce DoFIT, a new Domain-aware FIT framework that alleviates catastrophic forgetting through two new designs. |
Binqian Xu; Xiangbo Shu; Haiyang Mei; Zechen Bai; Basura Fernando; Mike Zheng Shou; Jinhui Tang; |
152 | HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Notably, we introduce a rule-based camera trajectory generation method, enabling the synthetic pipeline to incorporate diverse and precise camera motion annotation, which can rarely be found in real-world data. |
Zhenzhi Wang; Yixuan Li; Yanhong Zeng; Youqing Fang; Yuwei Guo; Wenran Liu; Jing Tan; Kai Chen; Bo Dai; Tianfan Xue; Dahua Lin; |
153 | Scalable Optimization in The Modular Norm Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When ramping up the width of a single layer, graceful scaling of training has been linked to the need to normalize the weights and their updates in the natural norm particular to that layer. In this paper, we significantly generalize this idea by defining the modular norm, which is the natural norm on the full weight space of any neural network architecture. |
Jeremy Bernstein; Tim Large; Yang Liu; Jacob Huh; Hyojin Bahng; Phillip Isola; |
154 | Paloma: A Benchmark for Evaluating Language Model Fit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Perplexity Analysis for Language Model Assessment (Paloma), a benchmark to measure LM fit to 546 English and code domains, instead of assuming perplexity on one distribution extrapolates to others. |
Ian Magnusson; Akshita Bhagia; Valentin Hofmann; Luca Soldaini; Ananya Harsh Jha; Oyvind Tafjord; Dustin Schwenk; Evan Walsh; Yanai Elazar; Kyle Lo; Dirk Groeneveld; Iz Beltagy; Hannaneh Hajishirzi; Noah Smith; Kyle Richardson; Jesse Dodge; |
155 | AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, current evaluation frameworks mostly focus on the final success rate, revealing few insights during the process and failing to provide a deep understanding of the model abilities. To address these challenges, we introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to the analytical evaluation of LLM agents. |
Ma Chang; Junlei Zhang; Zhihao Zhu; Cheng Yang; Yujiu Yang; Yaohui Jin; Zhenzhong Lan; Lingpeng Kong; Junxian He; |
156 | Analysing The Generalisation and Reliability of Steering Vectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the reliability and generalisation properties of this approach are unknown. In this work, we rigorously investigate these properties, and show that steering vectors have substantial limitations both in- and out-of-distribution. |
Daniel Tan; David Chanin; Aengus Lynch; Brooks Paige; Dimitrios Kanoulas; Adrià Garriga-Alonso; Robert Kirk; |
157 | Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose multiple metrics to rigorously quantify agents’ performance and alignment with the assigned role. |
Sahar Abdelnabi; Amr Gomaa; Sarath Sivaprasad; Schönherr; Mario Fritz; |
158 | Invisible Image Watermarks Are Provably Removable Using Generative AI Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: They also prevent people from misusing images, especially those generated by AI models. We propose a family of regeneration attacks to remove these invisible watermarks. |
Xuandong Zhao; Kexun Zhang; Zihao Su; Saastha Vasan; Ilya Grishchenko; Christopher Kruegel; Giovanni Vigna; Yu-Xiang Wang; Lei Li; |
159 | A Universal Growth Rate for Learning with Smooth Surrogate Losses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a comprehensive analysis of the growth rate of $H$-consistency bounds (and excess error bounds) for various surrogate losses used in classification. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
160 | Turning Indirect Knowledge Into Direct Demonstrations for Computer Agents at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Synatra, an approach that effectively transforms indirect knowledge into direct supervision at scale. |
Tianyue Ou; Frank F. Xu; Aman Madaan; Jiarui Liu; Robert Lo; Abishek Sridhar; Sudipta Sengupta; Dan Roth; Graham Neubig; Shuyan Zhou; |
161 | Connecting The Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Could an LLM infer the dangerous knowledge by piecing together these hints? As a step towards answering this question, we study \textit{inductive out-of-context reasoning} (OOCR). |
Johannes Treutlein; Dami Choi; Jan Betley; Cem Anil; Samuel Marks; Roger Grosse; Owain Evans; |
162 | Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To quantify situational awareness in LLMs, we introduce a range of behavioral tests, based on question answering and instruction following. |
Rudolf Laine; Bilal Chughtai; Jan Betley; Kaivalya Hariharan; Mikita Balesni; Jérémy Scheurer; Marius Hobbhahn; Alexander Meinke; Owain Evans; |
163 | Fine-grained Analysis of In-context Linear Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a stronger characterization of the optimization and generalization landscape of ICL through contributions on architectures, low-rank parameterization, and correlated designs: (1) We study the landscape of 1-layer linear attention and 1-layer H3, a state-space model. |
Yingcong Li; Ankit Rawat; Samet Oymak; |
164 | Metric Flow Matching for Smooth Interpolations on The Data Manifold Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Metric Flow Matching (MFM), a novel simulation-free framework for conditional flow matching where interpolants are approximate geodesics learned by minimizing the kinetic energy of a data-induced Riemannian metric. |
Kacper Kapusniak; Peter Potaptchik; Teodora Reu; Leo Zhang; Alexander Tong; Michael Bronstein; Joey Bose; Francesco Di Giovanni; |
165 | Large Language Model Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study how to perform unlearning, i.e. forgetting undesirable (mis)behaviors, on large language models (LLMs). |
Yuanshun Yao; Xiaojun Xu; Yang Liu; |
166 | Large Language Model-Driven Audio Codec Is A Few-Shot Audio Task Learner Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a novel and LLMs-driven audio codec model, LLM-Codec, to transfer the audio modality into the textual space, \textit{i.e.} representing audio tokens with words or sub-words in the vocabulary of LLMs, while keeping high audio reconstruction quality. |
Dongchao Yang; Haohan Guo; Yuanyuan Wang; Rongjie Huang; Xiang Li; Xu Tan; Xixin Wu; Helen Meng; |
167 | Preference Learning Algorithms Do Not Learn Preference Rankings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the conventional wisdom that preference learning trains models to assign higher likelihoods to more preferred outputs than less preferred outputs, measured via *ranking accuracy*. |
Angelica Chen; Sadhika Malladi; Lily Zhang; Xinyi Chen; Qiuyi (Richard) Zhang; Rajesh Ranganath; Kyunghyun Cho; |
168 | The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About The Subjective and Multicultural Alignment of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, open questions remain about the methods (how), domains (where), people (who) and objectives (to what end) of feedback processes. To navigate these questions, we introduce PRISM, a new dataset which maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, to their contextual preferences and fine-grained feedback in 8,011 live conversations with 21 LLMs. |
Hannah Rose Kirk; Alexander Whitefield; Paul Rottger; Andrew M. Bean; Katerina Margatina; Rafael Mosquera; Juan Ciro; Max Bartolo; Adina Williams; He He; Bertie Vidgen; Scott Hale; |
169 | Segment Anything Without Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Unsupervised SAM (UnSAM), a segment anything model for interactive and automatic whole-image segmentation which does not require human annotations. |
Xudong Wang; Jingfeng Yang; Trevor Darrell; |
170 | Smoothed Energy Guidance: Guiding Diffusion Models By Attenuating Energy Curvature of Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Smoothed Energy Guidance (SEG), a novel training- and condition-free approach that leverages the energy-based perspective of the self-attention mechanism to enhance image generation. |
Susung Hong; |
171 | Transcoders Find Interpretable LLM Feature Circuits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this, we explore **transcoders**, which seek to faithfully approximate a densely activating MLP layer with a wider, sparsely-activating MLP layer. We successfully train transcoders on language models with 120M, 410M, and 1.4B parameters, and find them to perform at least on par with SAEs in terms of sparsity, faithfulness, and human-interpretability. |
Jacob Dunefsky; Philippe Chlenski; Neel Nanda; |
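Because a transcoder is a wide, sparsely-activating MLP trained to imitate a dense one, its training objective fits in a few lines: reconstruction error against the original MLP's outputs plus an L1 sparsity penalty on the hidden code. The sketch below makes those assumptions explicit; the paper's exact loss weighting may differ.

```python
import torch
import torch.nn as nn

class Transcoder(nn.Module):
    """A wider, sparsely-activating stand-in for a dense MLP layer."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        h = torch.relu(self.enc(x))   # sparse, nonnegative feature code
        return self.dec(h), h

def transcoder_loss(tc: Transcoder, mlp_in, mlp_out, l1: float = 1e-3):
    """Imitate the original MLP's input->output map, with an L1 penalty that
    keeps the hidden code sparse (and hence more interpretable)."""
    pred, h = tc(mlp_in)
    return (pred - mlp_out).pow(2).mean() + l1 * h.abs().mean()
```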
172 | LiT: Unifying LiDAR Languages with LiDAR Translator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These gaps, akin to language barriers, hinder the synergistic use of diverse LiDAR datasets, limiting the scalability and unification of perception models. To address this challenge, we present the \textit{LiDAR Translator (LiT)}, a novel framework designed to unify LiDAR data into a single target “language”. |
Yixing Lao; Tao Tang; Xiaoyang Wu; Peng Chen; Kaicheng Yu; Hengshuang Zhao; |
173 | Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods for identifying adversarial prompts tend to focus on specific domains, lack diversity, or require extensive human annotations. To address these limitations, we present Rainbow Teaming, a novel black-box approach for producing a diverse collection of adversarial prompts. |
Mikayel Samvelyan; Sharath Chandra Raparthy; Andrei Lupu; Eric Hambro; Aram Markosyan; Manish Bhatt; Yuning Mao; Minqi Jiang; Jack Parker-Holder; Jakob Foerster; Tim Rocktäschel; Roberta Raileanu; |
174 | PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, quantifying the privacy norm awareness of LMs and the emerging privacy risk in LM-mediated communication is challenging due to (1) the contextual and long-tailed nature of privacy-sensitive cases, and (2) the lack of evaluation approaches that capture realistic application scenarios. To address these challenges, we propose PrivacyLens, a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories, enabling multi-level evaluation of privacy leakage in LM agents’ actions. |
Yijia Shao; Tianshi Li; Weiyan Shi; Yanchen Liu; Diyi Yang; |
175 | Out-of-Distribution Detection with A Single Unconditional Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To that end, we introduce our method, Diffusion Paths (DiffPath), in this work. |
Alvin Heng; alexandre thiery; Harold Soh; |
176 | Scaling Transformer Neural Networks for Skillful and Reliable Medium-range Weather Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we introduce Stormer, a simple transformer model that achieves state-of-the-art performance on weather forecasting with minimal changes to the standard transformer backbone. |
Tung Nguyen; Rohan Shah; Hritik Bansal; Troy Arcomano; Romit Maulik; Rao Kotamarthi; Ian Foster; Sandeep Madireddy; Aditya Grover; |
177 | The Road Less Scheduled Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing learning rate schedules that do not require specification of the optimization stopping step $T$ are greatly out-performed by learning rate schedules that depend on $T$. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems. |
Aaron Defazio; Xingyu Yang; Ahmed Khaled; Konstantin Mishchenko; Harsh Mehta; Ashok Cutkosky; |
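The schedule-free idea admits a compact sketch: gradients are taken at an interpolation between a fast SGD iterate and its running average, and the average (not the fast iterate) is what you evaluate, so no stopping time $T$ is ever specified. The snippet below follows one reading of that update form, with plain Python lists standing in for parameter tensors; the learning rate and momentum values are illustrative.

```python
def schedule_free_sgd_step(x, z, grad_fn, t, lr=0.1, beta=0.9):
    """One schedule-free SGD step on parameter lists. Gradients are evaluated
    at y, an interpolation of the fast iterate z and the running average x;
    x is the point you deploy, and no schedule or horizon is needed."""
    y = [(1 - beta) * zi + beta * xi for zi, xi in zip(z, x)]
    g = grad_fn(y)                                 # list of gradients at y
    z = [zi - lr * gi for zi, gi in zip(z, g)]
    c = 1.0 / (t + 1)                              # uniform averaging weight
    x = [(1 - c) * xi + c * zi for xi, zi in zip(x, z)]
    return x, z
```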
178 | Diversity Is Not All You Need: Training A Robust Cooperative Agent Needs Specialist Partners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a principled method for quantifying both the diversity and specialization of a partner population based on the concept of mutual information. |
Rujikorn Charakorn; Poramate Manoonpong; Nat Dilokthanakul; |
179 | Enhancing Large Language Models Through Adaptive Tokenizers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a simple but effective method to learn a tokenizer specifically engineered for seamless integration with LLMs. |
Mengyu Zheng; Hanting Chen; Tianyu Guo; Chong Zhu; Binfan Zheng; Chang Xu; Yunhe Wang; |
180 | An Image Is Worth 32 Tokens for Reconstruction and Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these 2D tokenizations face challenges in managing the inherent redundancies present in images, where adjacent regions frequently display similarities. To overcome this issue, we introduce **T**ransformer-based 1-D**i**mensional **Tok**enizer (TiTok), an innovative approach that tokenizes images into 1D latent sequences. |
Qihang Yu; Mark Weber; Xueqing Deng; Xiaohui Shen; Daniel Cremers; Liang-Chieh Chen; |
181 | From An Image to A Scene: Learning to Imagine The World from A Million 360° Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we introduce 360-1M, a 360° video dataset consisting of 1 million videos, and a process for efficiently finding corresponding frames from diverse viewpoints at scale. |
Matthew Wallingford; Anand Bhattad; Aditya Kusupati; Vivek Ramanujan; Matt Deitke; Aniruddha Kembhavi; Roozbeh Mottaghi; Wei-Chiu Ma; Ali Farhadi; |
182 | Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fulfill the paramount need of comprehensive, realistic, and fair testing environments for Full Self-Driving (FSD), we present Bench2Drive, the first benchmark for evaluating E2E-AD systems’ multiple abilities in a closed-loop manner. |
Xiaosong Jia; Zhenjie Yang; Qifeng Li; Zhiyuan Zhang; Junchi Yan; |
183 | TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Notably, TabPFN achieves very strong performance on small tabular datasets but is not designed to make predictions for datasets of size larger than 1000. In this work, we overcome these limitations and substantially improve the performance of PFNs via context optimization. |
Benjamin Feuer; Robin Schirrmeister; Valeriia Cherepanova; Chinmay Hegde; Frank Hutter; Micah Goldblum; Niv Cohen; Colin White; |
184 | Recurrent Neural Networks: Vanishing and Exploding Gradients Are Not The End of The Story Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The recent success of state-space models (SSMs), a subclass of RNNs, to overcome such difficulties challenges our theoretical understanding. In this paper, we delve into the optimization challenges of RNNs and discover that, as the memory of a network increases, changes in its parameters result in increasingly large output variations, making gradient-based learning highly sensitive, even without exploding gradients. |
Nicolas Zucchet; Antonio Orvieto; |
185 | DARG: Dynamic Evaluation of Large Language Models Via Adaptive Reasoning Graph Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity. |
Zhehao Zhang; Jiaao Chen; Diyi Yang; |
186 | Transformers Can Do Arithmetic with The Right Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. |
Sean McLeish; Arpit Bansal; Alex Stein; Neel Jain; John Kirchenbauer; Brian Bartoldson; Bhavya Kailkhura; Abhinav Bhatele; Jonas Geiping; Avi Schwarzschild; Tom Goldstein; |
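The proposed fix is an embedding added to each digit token that encodes its offset from the start of its number, so digits of equal significance can be aligned. The sketch below computes that offset with a simple scan over an `is_digit` mask and adds a learned embedding; the mask, the scan, and the absence of the paper's randomized training offsets are all simplifications.

```python
import torch
import torch.nn as nn

class DigitOffsetEmbedding(nn.Module):
    """Add to each digit token an embedding of its offset from the start of
    its number; non-digit tokens are left unchanged."""
    def __init__(self, max_digits: int, d_model: int):
        super().__init__()
        self.emb = nn.Embedding(max_digits, d_model)

    def forward(self, token_emb: torch.Tensor, is_digit: torch.Tensor) -> torch.Tensor:
        # offset within each contiguous run of digits, 0 at the first digit
        pos = torch.zeros_like(is_digit, dtype=torch.long)
        for t in range(1, is_digit.size(-1)):
            run = is_digit[..., t] & is_digit[..., t - 1]
            pos[..., t] = torch.where(run, pos[..., t - 1] + 1,
                                      torch.zeros_like(pos[..., t]))
        return token_emb + self.emb(pos) * is_digit.unsqueeze(-1)
```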
187 | InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work presents InterpBench, a collection of semi-synthetic yet realistic transformers with known circuits for evaluating these techniques. |
Rohan Gupta; Iván Arcuschin Moreno; Thomas Kwa; Adrià Garriga-Alonso; |
188 | FasterDiT: Towards Faster Diffusion Transformers Training Without Architecture Modification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to accelerate DiT training without any architectural modification. |
Jingfeng Yao; Cheng Wang; Wenyu Liu; Xinggang Wang; |
189 | OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing benchmarks either lack an interactive environment or are limited to environments specific to certain applications or domains, failing to reflect the diverse and complex nature of real-world computer use, thereby limiting the scope of tasks and agent scalability. To address this issue, we introduce OSWorld, the first-of-its-kind scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating systems such as Ubuntu, Windows, and macOS. |
Tianbao Xie; Danyang Zhang; Jixuan Chen; Xiaochuan Li; Siheng Zhao; Ruisheng Cao; Jing Hua Toh; Zhoujun Cheng; Dongchan Shin; Fangyu Lei; Yitao Liu; Yiheng Xu; Shuyan Zhou; Silvio Savarese; Caiming Xiong; Victor Zhong; Tao Yu; |
190 | Many-Shot In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While promising, many-shot ICL can be bottlenecked by the available amount of human-generated outputs. To mitigate this limitation, we explore two new settings: (1) Reinforced ICL, which uses model-generated chain-of-thought rationales in place of human rationales, and (2) Unsupervised ICL, where we remove rationales from the prompt altogether and prompt the model only with domain-specific inputs. |
Rishabh Agarwal; Avi Singh; Lei Zhang; Bernd Bohnet; Luis Rosias; Stephanie Chan; Biao Zhang; Ankesh Anand; Zaheer Abbas; Azade Nova; John Co-Reyes; Eric Chu; Feryal Behbahani; Aleksandra Faust; Hugo Larochelle; |
191 | ReFT: Representation Finetuning for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, much prior interpretability work has shown that representations encode rich semantic information, suggesting that editing representations might be a more powerful alternative. We pursue this hypothesis by developing a family of Representation Finetuning (ReFT) methods. |
Zhengxuan Wu; Aryaman Arora; Zheng Wang; Atticus Geiger; Dan Jurafsky; Christopher D Manning; Christopher Potts; |
192 | FLAME : Factuality-Aware Alignment for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study how to make the LLM alignment process more factual, by first identifying factors that lead to hallucination in both alignment steps: supervised fine-tuning (SFT) and reinforcement learning (RL). |
Sheng-Chieh Lin; Luyu Gao; Barlas Oguz; Wenhan Xiong; Jimmy Lin; Scott Yih; Xilun Chen; |
193 | CorDA: Context-Oriented Decomposition Adaptation of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable adapters from weight decomposition oriented by the context of downstream task or world knowledge. |
Yibo Yang; Xiaojie Li; Zhongzhu Zhou; Shuaiwen Song; Jianlong Wu; Liqiang Nie; Bernard Ghanem; |
194 | Continual Audio-Visual Sound Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel continual audio-visual sound separation task, aiming to continuously separate sound sources for new classes while preserving performance on previously learned classes, with the aid of visual guidance. |
Weiguo Pian; Yiyang Nan; Shijian Deng; Shentong Mo; Yunhui Guo; Yapeng Tian; |
195 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hypothesizing that difficult queries are crucial to learn complex reasoning, we propose Difficulty-Aware Rejection Tuning (DART), a method that allocates difficult queries more trials during the synthesis phase, enabling more extensive training on difficult samples. |
Yuxuan Tong; Xiwen Zhang; Rui Wang; Ruidong Wu; Junxian He; |
196 | MotionBooth: Motion-Aware Customized Text-to-Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements. |
Jianzong Wu; Xiangtai Li; Yanhong Zeng; Jiangning Zhang; Qianyu Zhou; Yining Li; Yunhai Tong; Kai Chen; |
197 | LLM Dataset Inference: Detect Datasets, Not Strings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose a new *dataset inference* method to accurately identify the datasets used to train large language models. |
Pratyush Maini; Hengrui Jia; Nicolas Papernot; Adam Dziedzic; |
198 | BAKU: An Efficient Transformer for Multi-Task Policy Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present BAKU, a simple transformer architecture that enables efficient learning of multi-task robot policies. |
Siddhant Haldar; Zhuoran Peng; Lerrel Pinto; |
199 | Revisiting Self-Supervised Heterogeneous Graph Learning from Spectral Clustering Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, while existing SHGL methods share an essential similarity with clustering approaches, they encounter two significant limitations: (i) noise introduced into graph structures during the message-passing process weakens node representations, and (ii) cluster-level information may be inadequately captured and leveraged, diminishing performance in downstream tasks. In this paper, we address these limitations by theoretically revisiting SHGL from the spectral clustering perspective and introducing a novel framework enhanced by rank and dual consistency constraints. |
YUJIE MO; Zhihe Lu; Runpeng Yu; Xiaofeng Zhu; Xinchao Wang; |
200 | Reprogramming Pretrained Target-Specific Diffusion Models for Dual-Target Drug Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to design dual-target drugs with diffusion models that are trained on single-target protein-ligand complex pairs. |
Xiangxin Zhou; Jiaqi Guan; Yijia Zhang; Xingang Peng; Liang Wang; Jianzhu Ma; |
201 | Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we exploit advancements in Multimodal Large Language Models (MLLMs), particularly GPT-4V, to present a novel approach, Make-it-Real: 1) We demonstrate that GPT-4V can effectively recognize and describe materials, allowing the construction of a detailed material library. |
Ye Fang; Zeyi Sun; Tong Wu; Jiaqi Wang; Ziwei Liu; Gordon Wetzstein; Dahua Lin; |
202 | MADiff: Offline Multi-agent Learning with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, despite the effectiveness shown for single-agent learning, it remains unclear how DMs can operate in multi-agent problems, where independently modeling each agent’s trajectory can hardly achieve the coordination that teamwork requires. In this paper, we propose MADiff, a novel generative multi-agent learning framework to tackle this problem. |
Zhengbang Zhu; Minghuan Liu; Liyuan Mao; Bingyi Kang; Minkai Xu; Yong Yu; Stefano Ermon; Weinan Zhang; |
203 | Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the great potential of these text-space GFMs, current research in this field is hampered by two problems. First, the absence of a comprehensive benchmark with unified problem settings hinders a clear understanding of the comparative effectiveness and practical value of different text-space GFMs. Second, there is a lack of sufficient datasets to thoroughly explore the methods’ full potential and verify their effectiveness across diverse settings. To address these issues, we conduct a comprehensive benchmark providing novel text-space datasets and comprehensive evaluation under unified problem settings. |
Zhikai Chen; Haitao Mao; Jingzhe Liu; Yu Song; Bingheng Li; Wei Jin; Bahare Fatemi; Anton Tsitsulin; Bryan Perozzi; Hui Liu; Jiliang Tang; |
204 | XLSTM: Extended Long Short-Term Memory Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Firstly, we introduce exponential gating with appropriate normalization and stabilization techniques. Secondly, we modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, (ii) mLSTM that is fully parallelizable with a matrix memory and a covariance update rule. Integrating these LSTM extensions into residual block backbones yields xLSTM blocks that are then residually stacked into xLSTM architectures. |
Maximilian Beck; Korbinian Pöppel; Markus Spanring; Andreas Auer; Oleksandra Prudnikova; Michael Kopp; Günter Klambauer; Johannes Brandstetter; Sepp Hochreiter; |
205 | Learning to Cooperate with Humans Using Generative Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By sampling from the latent space, we can use the generative model to produce different partners to train Cooperator agents. |
Yancheng Liang; Daphne Chen; Abhishek Gupta; Simon Du; Natasha Jaques; |
206 | FinBen: A Holistic Financial Benchmark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical aspects: information extraction (IE), textual analysis, question answering (QA), text generation, risk management, forecasting, and decision-making. |
Qianqian Xie; Weiguang Han; Zhengyu Chen; Ruoyu Xiang; Xiao Zhang; Yueru He; Mengxi Xiao; Dong Li; Yongfu Dai; Duanyu Feng; Yijing Xu; Haoqiang Kang; Ziyan Kuang; Chenhan Yuan; Kailai Yang; Zheheng Luo; Tianlin Zhang; Zhiwei Liu; GUOJUN XIONG; Zhiyang Deng; Yuechen Jiang; Zhiyuan Yao; Haohang Li; Yangyang Yu; Gang Hu; Huang Jiajia; Xiaoyang Liu; Alejandro Lopez-Lira; Benyou Wang; Yanzhao Lai; Hao Wang; Min Peng; Sophia Ananiadou; Jimin Huang; |
207 | Catastrophic Goodhart: Regularizing RLHF with KL Divergence Does Not Mitigate Heavy-tailed Reward Misspecification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, if error is heavy-tailed, some policies obtain arbitrarily high reward despite achieving no more utility than the base model—a phenomenon we call catastrophic Goodhart. We adapt a discrete optimization method developed for adversarial attacks to measure the tails of open-source reward models, finding that they are consistent with light-tailed error. |
Thomas Kwa; Adrià Garriga-Alonso; |
208 | G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, we develop a flexible question-answering framework targeting real-world textual graphs, applicable to multiple applications including scene graph understanding, common sense reasoning, and knowledge graph reasoning. |
Xiaoxin He; Yijun Tian; Yifei Sun; Nitesh Chawla; Thomas Laurent; Yann LeCun; Xavier Bresson; Bryan Hooi; |
209 | Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel markerless algorithm to track 3D human poses in severe occlusion and close interaction to obtain our annotations with minimal manual intervention. |
Rawal Khirodkar; Jyun-Ting Song; Jinkun Cao; Zhengyi Luo; Kris Kitani; |
210 | Learn More, But Bother Less: Parameter Efficient Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel parameter-efficient approach for continual learning in LLMs, which empirically investigates knowledge transfer from previously learned tasks to new tasks through low-rank matrix parameters, enhancing the learning of new tasks without significant interference. |
Fuli Qiao; Mehrdad Mahdavi; |
211 | Aligning to Thousands of Varying Preferences Via System Message Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A major challenge in adopting a more individualized approach to LLM alignment is scalability, as it involves repeatedly acquiring preference data and training new reward models and LLMs for each individual’s preferences. To address these challenges, we propose a new paradigm where users specify what they value most within the system messages steering the LLM’s generation behavior to better align with the user’s intentions. |
Seongyun Lee; Sue Park; Seungone Kim; Minjoon Seo; |
212 | InterDreamer: Less Supervision for More Generalizable Text-Driven 3D Human-Object Interaction Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper takes the initiative and showcases the potential of generating human-object interactions without direct training on text-interaction pair data. Our key insight in achieving this is that interaction semantics and dynamics can be decoupled. |
ziyin wang; Sirui Xu; Yu-Xiong Wang; Liangyan Gui; |
213 | No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we empirically study representation dynamics in Proximal Policy Optimization (PPO) on the Atari and MuJoCo environments, revealing that PPO agents are also affected by feature rank deterioration and loss of plasticity. |
Skander Moalla; Andrea Miele; Razvan Pascanu; Caglar Gulcehre; |
214 | RoPINN: Region Optimized Physics-Informed Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, since PDEs are usually defined on continuous domains, solely optimizing models on scattered points may be insufficient to obtain an accurate solution for the whole domain. To mitigate this inherent deficiency of the default scatter-point optimization, this paper proposes and theoretically studies a new training paradigm as region optimization. |
Haixu Wu; Huakun Luo; Yuezhou Ma; Jianmin Wang; Mingsheng Long; |
215 | LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low Resource and Extinct Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the LingOly benchmark, a novel benchmark for advanced reasoning abilities in large language models. |
Andrew M. Bean; Simeon Hellsten; Harry Mayne; Jabez Magomere; Ethan Chi; Ryan Chi; Scott Hale; Hannah Rose Kirk; |
216 | Normalization and Effective Learning Rates in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to make the learning rate schedule explicit with a simple re-parameterization which we call Normalize-and-Project (NaP), which couples the insertion of normalization layers with weight projection, ensuring that the effective learning rate remains constant throughout training. |
Clare Lyle; Zeyu Zheng; Khimya Khetarpal; James Martens; Hado van Hasselt; Razvan Pascanu; Will Dabney; |
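Once normalization layers make the network scale-invariant in its weights, the projection half of NaP reduces to rescaling weight matrices back to a fixed norm after every optimizer step, which pins the effective learning rate. The helper below sketches that projection under the simplifying assumption that every matrix-shaped parameter should be projected; in practice gains, biases, and the output layer are typically excluded.

```python
import torch

@torch.no_grad()
def project_weights(model: torch.nn.Module, target_norm: float = 1.0) -> None:
    """After each optimizer step, rescale weight matrices back to a fixed
    norm. With normalization layers in place, the function the network
    computes is unchanged by this rescaling, but the effective learning
    rate stays constant instead of decaying as weight norms grow."""
    for p in model.parameters():
        if p.dim() >= 2:  # crude filter: project matrices, skip gains/biases
            p.mul_(target_norm / p.norm().clamp(min=1e-8))
```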
217 | Faster Neighborhood Attention: Reducing The O(n^2) Cost of Self Attention at The Threadblock Level Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to massively improve upon existing infrastructure by providing two new methods for implementing neighborhood attention. |
Ali Hassani; Wen-Mei Hwu; Humphrey Shi; |
218 | Talking Heads: Understanding Inter-Layer Communication in Transformer Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that it is possible to manipulate the internal model representations as well as edit model weights, based on the mechanism we discover, in order to significantly improve performance on our synthetic Laundry List task, which requires recall from a list, often improving task accuracy by over 20\%. |
Jack Merullo; Carsten Eickhoff; Ellie Pavlick; |
219 | Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present Vitron, a universal pixel-level vision LLM designed for comprehensive understanding, generating, segmenting, and editing of both static images and dynamic videos. |
Hao Fei; Shengqiong Wu; Hanwang Zhang; Tat-Seng Chua; Shuicheng Yan; |
220 | Code Agents Are State of The Art Software Testers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel benchmark based on popular GitHub repositories, containing real-world issues, ground-truth patches, and golden tests. |
Niels Mündler; Mark Müller; Jingxuan He; Martin Vechev; |
221 | Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. |
Boyuan Chen; Diego Martí Monsó; Yilun Du; Max Simchowitz; Russ Tedrake; Vincent Sitzmann; |
222 | Optimal Multiclass U-Calibration Error and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of online multiclass U-calibration, where a forecaster aims to make sequential distributional predictions over $K$ classes with low U-calibration error, that is, low regret with respect to all bounded proper losses simultaneously. |
Haipeng Luo; Spandan Senapati; Vatsal Sharan; |
223 | BertaQA: How Much Do Language Models Know About Local Culture? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This raises the question of how well these models perform on topics relevant to other cultures, whose presence on the web is not that prominent. To address this gap, we introduce BertaQA, a multiple-choice trivia dataset that is parallel in English and Basque. |
Julen Etxaniz; Gorka Azkune; Aitor Soroa; Oier Lacalle; Mikel Artetxe; |
224 | Mr.Bean: A Comprehensive Meta-Reasoning Benchmark for Analyzing Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present a process-based benchmark, Mr. Bean, that demands meta-reasoning skills, where LMs are asked to locate and analyse potential errors in automatically generated reasoning steps. |
Zhongshen Zeng; Yinhong Liu; Yingjia Wan; Jingyao Li; Pengguang Chen; Jianbo Dai; Yuxuan Yao; Rongwu Xu; Zehan Qi; Wanru Zhao; Linling Shen; Jianqiao Lu; Haochen Tan; Yukang Chen; Hao Zhang; Zhan Shi; Bailin Wang; Zhijiang Guo; Jiaya Jia; |
225 | On The Inductive Bias of Stacking Towards Improving Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although efficient for training, the model biases induced by such growing approaches are largely unexplored. In this work, we examine this fundamental aspect of gradual stacking, going beyond its efficiency benefits. |
Nikunj Saunshi; Stefani Karp; Shankar Krishnan; Sobhan Miryoosefi; Sashank Jakkam Reddi; Sanjiv Kumar; |
226 | Near-Minimax-Optimal Distributional Reinforcement Learning with A Generative Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions in the generative model regime (up to logarithmic factors), the first result of this kind for any distributional RL algorithm. |
Mark Rowland; Kevin Li; Remi Munos; Clare Lyle; Yunhao Tang; Will Dabney; |
227 | Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the need for pluralistic alignment, we develop a novel class of multi-modal RLHF methods. |
Sriyash Poddar; Yanming Wan; Hamish Ivison; Abhishek Gupta; Natasha Jaques; |
228 | TAPVid-3D: A Benchmark for Tracking Any Point in 3D Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new benchmark, TAPVid-3D, for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). |
Skanda Koppula; Ignacio Rocco; Yi Yang; joseph heyward; Joao Carreira; Andrew Zisserman; Gabriel Brostow; Carl Doersch; |
229 | Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While some defenses have been proposed, they have not been adapted to newly proposed attacks and more challenging threat models. To address this, we propose an optimization-based objective for defending LLMs against jailbreaking attacks and an algorithm, Robust Prompt Optimization (RPO), to create robust system-level defenses. |
Andy Zhou; Bo Li; Haohan Wang; |
230 | Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore a variety of discrete latent representations, including textual descriptions, detection bounding boxes, object blobs, and visual tokens. |
Jiatao Gu; Ying Shen; Shuangfei Zhai; Yizhe Zhang; Navdeep Jaitly; Joshua Susskind; |
231 | A General Protocol to Probe Large Vision Models for 3D Physical Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our objective in this paper is to probe large vision models to determine to what extent they ‘understand’ different physical properties of the 3D scene depicted in an image. |
Guanqi Zhan; Chuanxia Zheng; Weidi Xie; Andrew Zisserman; |
232 | Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current methods, including manual design, procedural generation, diffusion-based scene generation, and large language model (LLM) guided scene design, are hindered by limitations such as excessive human effort, reliance on predefined rules or training datasets, and limited 3D spatial reasoning ability. Since pre-trained 2D image generative models better capture scene and object configuration than LLMs, we address these challenges by introducing Architect, a generative framework that creates complex and realistic 3D embodied environments leveraging diffusion-based 2D image inpainting. |
Yian Wang; Xiaowen Qiu; Jiageng Liu; Zhehuan Chen; Jiting Cai; Yufei Wang; Tsun-Hsuan Johnson Wang; Zhou Xian; Chuang Gan; |
233 | ActionAtlas: A VideoQA Benchmark for Fine-grained Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our world is full of varied actions and moves in specialized fields that we, as humans, seek to identify and learn about. To evaluate the effectiveness of multi-modal models in helping us recognize such fine-grained actions, we introduce ActionAtlas, a video question answering (VideoQA) benchmark on fine-grained action recognition with short videos across various sports. |
Mohammadreza (Reza) Salehi; Jae Sung Park; Aditya Kusupati; Ranjay Krishna; Yejin Choi; Hannaneh Hajishirzi; Ali Farhadi; |
234 | A Practitioner’s Guide to Real-World Continual Multimodal Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: How to best update foundation models, in cases beyond small edits but not warranting re-pretraining, remains unclear. This work aims to provide extensive guidance on effective continual model updates in such scenarios. |
Karsten Roth; Vishaal Udandarao; Sebastian Dziadzio; Ameya Prabhu; Mehdi Cherti; Oriol Vinyals; Olivier Henaff; Samuel Albanie; Matthias Bethge; Zeynep Akata; |
235 | Normalization Layer Per-Example Gradients Are Sufficient to Predict Gradient Noise Scale in Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Observing the tensor contractions required to compute per-example gradient norms, we propose a method with minimal FLOPs in 3D or greater tensor regimes by simultaneously computing the norms while computing the parameter gradients. |
Gavia Gray; aman tiwari; Shane Bergsma; Joel Hestness; |
236 | Iteratively Refined Behavior Regularization for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration. |
Yi Ma; Jianye Hao; Xiaohan Hu; YAN ZHENG; Chenjun Xiao; |
237 | Offline Behavior Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose two naive OBD objectives, DBC and PBC, which measure distillation performance via the decision difference between policies trained on distilled data and either offline data or a near-expert policy. |
Shiye Lei; Sen Zhang; Dacheng Tao; |
238 | Motion Graph Unleashed: A Novel Approach to Video Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce motion graph, a novel approach to address the video prediction problem, i.e., predicting future video frames from limited past data. |
Yiqi Zhong; Luming Liang; Bohan Tang; Ilya Zharkov; Ulrich Neumann; |
239 | T2V-Turbo: Breaking The Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to break the quality bottleneck of a video consistency model (VCM) to achieve both fast and high-quality video generation. |
Jiachen Li; Weixi Feng; Tsu-Jui Fu; Xinyi Wang; S Basu; Wenhu Chen; William Yang Wang; |
240 | CV-VAE: A Compatible Video VAE for Latent Generative Video Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, since current diffusion-based approaches are often implemented using pre-trained text-to-image (T2I) models, directly training a video VAE without considering compatibility with existing T2I models creates a latent space gap between them, and bridging this gap requires huge computational resources even with the T2I models as initialization. To address this issue, we propose a method for training a video VAE of latent video models, namely CV-VAE, whose latent space is compatible with that of a given image VAE, e.g., the image VAE of Stable Diffusion (SD). |
Sijie Zhao; Yong Zhang; Xiaodong Cun; Shaoshu Yang; Muyao Niu; Xiaoyu Li; Wenbo HU; Ying Shan; |
241 | Neural Network Learns Low-dimensional Polynomials with SGD Near The Information-theoretic Limit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the link function $\sigma_*:\mathbb{R}\to\mathbb{R}$ is an unknown degree-$q$ polynomial with information exponent $p$ (defined as the lowest degree in the Hermite expansion). |
Kazusato Oko; Denny Wu; Jason Lee; Taiji Suzuki; |
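For readers who want to see this setting concretely, the snippet below is a minimal sketch of the data-generating process from the highlight above, assuming for illustration that the link function $\sigma_*$ is the degree-3 Hermite polynomial (so the information exponent is $p = 3$); the dimensions and the choice of link are ours, not the paper's.

```python
import numpy as np

# Minimal sketch of the single-index setup described above (illustrative only):
# f_*(x) = sigma_*(<x, theta>) with isotropic Gaussian inputs in R^d.
# We assume sigma_* = He_3, the probabilists' Hermite polynomial of degree 3,
# so the information exponent p (lowest degree in the Hermite expansion) is 3.

d, n = 64, 10_000
rng = np.random.default_rng(0)

theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)      # unit-norm index direction

X = rng.standard_normal((n, d))     # isotropic Gaussian data in R^d
z = X @ theta                       # one-dimensional projection <x, theta>
y = z**3 - 3 * z                    # He_3(z): a degree-3 link with p = 3
```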
242 | LM-HT SNN: Enhancing The Performance of SNN to ANN Counterpart Through Learnable Multi-hierarchical Threshold Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we rigorously analyze the relationship among the multi-threshold model, vanilla spiking model and quantized ANNs from a mathematical perspective, then propose a novel LM-HT model, which is an equidistant multi-threshold model that can dynamically regulate the global input current and membrane potential leakage on the time dimension. |
Zecheng Hao; Xinyu Shi; Yujia Liu; Zhaofei Yu; Tiejun Huang; |
243 | No-Regret Learning for Fair Multi-Agent Social Welfare Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the fundamental role of NSW in the fairness literature, it is more than natural to ask whether no-regret fair learning with NSW as the objective is possible. In this work, we provide a complete answer to this question in various settings. |
Mengxiao Zhang; Ramiro Deo-Campo Vuong; Haipeng Luo; |
244 | Contextual Multinomial Logit Bandits with General Value Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we consider both the stochastic and the adversarial settings, and propose a suite of algorithms, each with a different computation-regret trade-off. |
Mengxiao Zhang; Haipeng Luo; |
245 | Zero-shot Image Editing with Reference Imitation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently. |
Xi Chen; Yutong Feng; Mengting Chen; Yiyang Wang; Shilong Zhang; Yu Liu; Yujun Shen; Hengshuang Zhao; |
246 | Implicit Bias of Mirror Flow on Separable Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We examine the continuous-time counterpart of mirror descent, namely mirror flow, on classification problems which are linearly separable. |
Scott Pesme; Radu-Alexandru Dragomir; Nicolas Flammarion; |
247 | WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce WildGuard—an open, light-weight moderation tool for LLM safety that achieves three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model responses, and (3) determining model refusal rate. |
Seungju Han; Kavel Rao; Allyson Ettinger; Liwei Jiang; Bill Yuchen Lin; Nathan Lambert; Nouha Dziri; Yejin Choi; |
248 | GSDF: 3DGS Meets SDF for Improved Neural Rendering and Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although both neural implicit surfaces and explicit Gaussian primitives have advanced with neural rendering techniques, current methods impose strict constraints on density fields or primitive shapes, which enhances the affinity for geometric reconstruction at the sacrifice of rendering quality. To address this dilemma, we introduce GSDF, a dual-branch architecture combining 3D Gaussian Splatting (3DGS) and neural Signed Distance Fields (SDF). |
Mulin Yu; Tao Lu; Linning Xu; Lihan Jiang; Yuanbo Xiangli; Bo Dai; |
249 | Universal In-Context Approximation By Prompting Fully Recurrent Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We demonstrate that RNNs, LSTMs, GRUs, Linear RNNs, and linear gated architectures such as Mamba and Hawk/Griffin can also serve as universal in-context approximators. To streamline our argument, we introduce a programming language called LSRL that compiles to these fully recurrent architectures. |
Aleksandar Petrov; Tom Lamb; Alasdair Paren; Philip Torr; Adel Bibi; |
250 | Understanding Emergent Abilities of Language Models from The Loss Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to study emergent abilities through the lens of pre-training loss, instead of model size or training compute. |
Zhengxiao Du; Aohan Zeng; Yuxiao Dong; Jie Tang; |
251 | Counter-Current Learning: A Biologically Plausible Dual Network Approach for Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the counter-current exchange mechanisms observed in biological systems, we propose counter-current learning (CCL), a biologically plausible framework for credit assignment in deep learning. |
Chia-Hsiang Kao; Bharath Hariharan; |
252 | Revisiting Few-Shot Object Detection with Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we revisit the task of few-shot object detection (FSOD) in the context of recent foundational VLMs. |
Anish Madan; Neehar Peri; Shu Kong; Deva Ramanan; |
253 | Prospective Representation Learning for Non-Exemplar Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose a Prospective Representation Learning (PRL) scheme to prepare the model for handling conflicts in advance. |
Wuxuan Shi; Mang Ye; |
254 | Fast Sampling Via Discrete Non-Markov Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a discrete non-Markov diffusion model, which admits an accelerated reverse sampling for discrete data generation. |
Zixiang Chen; Angela Yuan; Yongqian Li; Yiwen Kou; Junkai Zhang; Quanquan Gu; |
255 | GrootVL: Tree Topology Is All You Need in State Space Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, limited by the inherent geometric constraints of sequences, it still falls short in modeling long-range dependencies. To address this issue, we propose the GrootVL network, which first dynamically generates a tree topology based on spatial relationships and input features. |
Yicheng Xiao; Lin Song; shaoli huang; Jiangshan Wang; Siyu Song; Yixiao Ge; Xiu Li; Ying Shan; |
256 | UQE: A Query Engine for Unstructured Databases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections. |
Hanjun Dai; Bethany Wang; Xingchen Wan; Bo Dai; Sherry Yang; Azade Nova; Pengcheng Yin; Mangpo Phothilimthana; Charles Sutton; Dale Schuurmans; |
257 | Exploring Molecular Pretraining Model at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an innovative molecular pretraining model that leverages a two-track transformer to effectively integrate features at the atomic level, graph level, and geometry structure level. |
ji xh; Zhen Wang; Zhifeng Gao; Hang Zheng; Linfeng Zhang; Guolin Ke; |
258 | Adaptive Proximal Gradient Method for Convex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore two fundamental first-order algorithms in convex optimization, namely, gradient descent (GD) and proximal gradient method (ProxGD). |
Yura Malitsky; Konstantin Mishchenko; |
259 | Language Models As Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We examine the property by considering lossless gradient compression — a critical application in distributed learning — that depends heavily on precise probability modeling. To achieve this, we introduce LM-GC, a novel method that integrates LLMs with arithmetic coding. |
Hui-Po Wang; Mario Fritz; |
260 | Towards Visual Text Design Transfer Across Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions. |
Yejin Choi; Jiwan Chung; Sumin Shim; Giyeong Oh; Youngjae Yu; |
261 | Efficient Lifelong Model Evaluation in An Era of Rapid Progress Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, with repeated testing, the risk of overfitting grows as algorithms over-exploit benchmark idiosyncrasies. In our work, we seek to mitigate this challenge by compiling ever-expanding large-scale benchmarks called Lifelong Benchmarks. |
Ameya Prabhu; Vishaal Udandarao; Philip Torr; Matthias Bethge; Adel Bibi; Samuel Albanie; |
262 | ARC: A Generalist Graph Anomaly Detector with In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current GAD methods necessitate training specific to each dataset, resulting in high training costs, substantial data requirements, and limited generalizability when applied to new datasets and domains. To address these limitations, this paper proposes ARC, a generalist GAD approach that enables a “one-for-all” GAD model to detect anomalies across various graph datasets on-the-fly. |
Yixin Liu; Shiyuan Li; Yu Zheng; Qingfeng Chen; Chengqi Zhang; Shirui Pan; |
263 | Adam with Model Exponential Moving Average Is Effective for Nonconvex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we offer a theoretical analysis of two modern optimization techniques for training large and complex models: (i) adaptive optimization algorithms, such as Adam, and (ii) the model exponential moving average (EMA). |
Kwangjun Ahn; Ashok Cutkosky; |
264 | TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unexpectedly, we discover that each component presents substantial challenges for current LLMs. These insights lead us to propose a focused modeling framework, which we refer to as IE as a tool. |
Avi Caciularu; Alon Jacovi; Eyal Ben-David; Sasha Goldshtein; Tal Schuster; Jonathan Herzig; Gal Elidan; Amir Globerson; |
265 | Vivid-ZOO: Multi-View Video Generation with Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel diffusion-based pipeline that generates high-quality multi-view videos centered around a dynamic 3D object from text. |
Bing Li; Cheng Zheng; Wenxuan Zhu; Jinjie Mai; Biao Zhang; Peter Wonka; Bernard Ghanem; |
266 | OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents OmniJARVIS, a novel Vision-Language-Action (VLA) model for instruction-following agents in open-world Minecraft. |
Zihao Wang; Shaofei Cai; Zhancun Mu; Haowei Lin; Ceyao Zhang; Xuejie Liu; Qing Li; Anji Liu; Xiaojian (Shawn) Ma; Yitao Liang; |
267 | Understanding The Differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While connections between these approaches exist, such models are commonly developed in isolation and there is a lack of theoretical understanding of the shared principles underpinning these architectures and their subtle differences, greatly influencing performance and scalability. In this paper, we introduce the Dynamical Systems Framework (DSF), which allows a principled investigation of all these architectures in a common representation. |
Jerome Sieber; Carmen Amo Alonso; Alexandre Didier; Melanie Zeilinger; Antonio Orvieto; |
268 | SemCoder: Training Code Language Models with Comprehensive Semantics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to bridge the gap between Code LLMs’ reliance on static text data and the need for thorough semantic understanding for complex tasks like debugging and program repair. |
Yangruibo Ding; Jinjun Peng; Marcus Min; Gail Kaiser; Junfeng Yang; Baishakhi Ray; |
269 | MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new variant of the Adam optimizer called MicroAdam that specifically minimizes memory overheads, while maintaining theoretical convergence guarantees. |
Ionut-Vlad Modoranu; Mher Safaryan; Grigory Malinovsky; Eldar Kurtić; Thomas Robert; Peter Richtarik; Dan Alistarh; |
270 | Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On a linear model with cross-entropy loss, we show that class imbalance leads to imbalanced, correlated gradients and Hessians that have been hypothesized to benefit Adam. |
Frederik Kunstner; Robin Yadav; Alan Milligan; Mark Schmidt; Alberto Bietti; |
271 | Universal Neural Functionals Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work proposes an algorithm that automatically constructs permutation equivariant models, which we refer to as universal neural functionals (UNFs), for any weight space. |
Allan Zhou; Chelsea Finn; James Harrison; |
272 | Enhancing Protein Mutation Effect Prediction Through A Retrieval-Augmented Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing models struggle to effectively extract mutation-related local structure motifs from protein databases, which hinders their predictive accuracy and robustness. To tackle this problem, we design a novel retrieval-augmented framework for incorporating similar structure information in known protein structures. |
Ruihan Guo; Rui Wang; Ruidong Wu; Zhizhou Ren; Jiahan Li; Shitong Luo; Zuofan Wu; Qiang Liu; Jian Peng; Jianzhu Ma; |
273 | Aligning LLM Agents By Learning Latent Preference from User Edits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a learning framework, PRELUDE, which infers a description of the user’s latent preference from historic edit data and uses it to define a prompt policy that drives future response generation. |
Ge Gao; Alexey Taymanov; Eduardo Salinas; Paul Mineiro; Dipendra Misra; |
274 | AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A major challenge in current cloud removal research is the absence of a comprehensive benchmark and a sufficiently large and diverse training dataset. To address this problem, we introduce AllClear, the largest public dataset for cloud removal, featuring 23,742 globally distributed regions of interest (ROIs) with diverse land-use patterns, comprising 4 million images in total. |
Hangyu Zhou; Chia-Hsiang Kao; Cheng Perng Phoo; Utkarsh Mall; Bharath Hariharan; Kavita Bala; |
275 | Cardinality-Aware Set Prediction and Top-$k$ Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a detailed study of cardinality-aware top-$k$ classification, a novel approach that aims to learn an accurate top-$k$ set predictor while maintaining a low cardinality. |
Corinna Cortes; Anqi Mao; Christopher Mohri; Mehryar Mohri; Yutao Zhong; |
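To make the accuracy-cardinality tension above concrete, here is a toy set predictor that grows the prediction set until it covers a target probability mass; `adaptive_top_k` is a hypothetical helper of ours, not the cardinality-aware predictor learned in the paper.

```python
import numpy as np

def adaptive_top_k(probs: np.ndarray, coverage: float = 0.9) -> np.ndarray:
    """Return the smallest prediction set whose total probability mass
    reaches `coverage` (toy illustration, not the paper's method)."""
    order = np.argsort(probs)[::-1]        # classes by decreasing probability
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, coverage)) + 1
    return order[:k]                       # indices of the predicted set

# A confident distribution yields a small set; a flat one yields a larger set.
print(adaptive_top_k(np.array([0.85, 0.10, 0.03, 0.02])))  # -> [0, 1]
print(adaptive_top_k(np.array([0.30, 0.28, 0.22, 0.20])))  # -> [0, 1, 2, 3]
```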
276 | Statistical Estimation in The Spiked Tensor Model Via The Quantum Approximate Optimization Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the performance of the QAOA on the spiked tensor model, a statistical estimation problem that exhibits a large computational-statistical gap classically. |
Leo Zhou; Joao Basso; Song Mei; |
277 | Deep Graph Mating Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the first learning-free model reuse task within the non-Euclidean domain, termed Deep Graph Mating (Grama). |
Yongcheng Jing; Seok-Hee Hong; Dacheng Tao; |
278 | SpeechAlign: Speech Language Models Can Self-Improve Via Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SpeechAlign, an iterative self-improvement strategy that aligns speech language models to human preferences. |
Dong Zhang; Zhaowei Li; Shimin Li; Xin Zhang; Pengyu Wang; Yaqian Zhou; Xipeng Qiu; |
279 | Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through extensive experiments on several chat models (Meta’s Llama 2-Chat, Mistral AI’s Mistral 7B Instruct v0.2, and OpenAI’s GPT-3.5 Turbo), this paper uncovers that the prompt templates used during fine-tuning and inference play a crucial role in preserving safety alignment, and proposes the “Pure Tuning, Safe Testing” (PTST) strategy — fine-tune models without a safety prompt, but include it at test time. |
Kaifeng Lyu; Haoyu Zhao; Xinran Gu; Dingli Yu; Anirudh Goyal; Sanjeev Arora; |
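The PTST recipe above reduces to building prompts differently at fine-tuning time and at inference time; the sketch below shows only that shape, with hypothetical template strings of our own (the actual chat templates differ per model).

```python
# Illustrative-only sketch of "Pure Tuning, Safe Testing" (PTST): omit the
# safety prompt when assembling fine-tuning examples, but prepend it when
# assembling inference-time prompts. All template strings are placeholders.
SAFETY_PROMPT = "You are a helpful, honest, and harmless assistant."

def build_training_example(user_msg: str, response: str) -> str:
    # Pure Tuning: no safety prompt in the fine-tuning data.
    return f"[USER] {user_msg}\n[ASSISTANT] {response}"

def build_inference_prompt(user_msg: str) -> str:
    # Safe Testing: include the safety prompt at test time.
    return f"[SYSTEM] {SAFETY_PROMPT}\n[USER] {user_msg}\n[ASSISTANT]"
```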
280 | Calibrated Self-Rewarding Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These approaches are resource-intensive and may not effectively reflect the target LVLM’s preferences, making the curated preferences easily distinguishable. Our work addresses these challenges by proposing the Calibrated Self-Rewarding (CSR) approach, which enables the model to self-improve by iteratively generating candidate responses, evaluating the reward for each response, and curating preference data for fine-tuning. |
Yiyang Zhou; Zhiyuan Fan; Dongjie Cheng; Sihan Yang; Zhaorun Chen; Chenhang Cui; Xiyao Wang; Yun Li; Linjun Zhang; Huaxiu Yao; |
281 | Transfer Q-star : Principled Decoding for LLM Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose $\texttt{Transfer Q}^*$, which implicitly estimates the optimal value function for a target reward $r$ through a baseline model $\rho_{\texttt{BL}}$ aligned with a baseline reward $r_{\texttt{BL}}$ (which can be different from the target reward $r$). |
Souradip Chakraborty; Soumya Suvra Ghosal; Ming Yin; Dinesh Manocha; Mengdi Wang; Amrit Singh Bedi; Furong Huang; |
282 | Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an innovative framework termed Image Textualization, which automatically produces high-quality image descriptions by leveraging existing multi-modal large language models (MLLMs) and multiple vision expert models in a collaborative manner. |
Renjie Pi; Jianshu Zhang; Jipeng Zhang; Rui Pan; Zhekai Chen; Tong Zhang; |
283 | SyncVIS: Synchronized Video Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite remarkable progress, existing works follow asynchronous designs, which model video sequences via either video-level queries only or query-sensitive cascade structures, resulting in difficulties when handling complex and challenging video scenarios. In this work, we analyze the cause of this phenomenon and the limitations of current solutions, and propose to conduct synchronized modeling via a new framework named SyncVIS. |
Rongkun Zheng; Lu Qi; Xi Chen; Yi Wang; Kun Wang; Yu Qiao; Hengshuang Zhao; |
284 | Training-Free Visual Prompt Learning for Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a training-free method to inject visual referring into Multimodal Large Language Models (MLLMs) through learnable visual token optimization. |
Mingrui Wu; Xinyue Cai; Jiayi Ji; Jiale Li; Oucheng Huang; Gen Luo; Hao Fei; GUANNAN JIANG; Xiaoshuai Sun; Rongrong Ji; |
285 | Decoupling Semantic Similarity from Spatial Alignment for Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Revisiting the established similarity calculations for RSMs, we expose their sensitivity to spatial alignment. In this paper, we propose to solve this through semantic RSMs, which are invariant to spatial permutation. |
Tassilo Wald; Constantin Ulrich; Priyank Jaini; Gregor Koehler; David Zimmerer; Stefan Denner; Fabian Isensee; Michael Baumgartner; Klaus Maier-Hein; |
286 | Variational Distillation of Diffusion Policies Into Mixture of Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE) through variational inference. |
Hongyi Zhou; Denis Blessing; Ge Li; Onur Celik; Xiaogang Jia; Gerhard Neumann; Rudolf Lioutikov; |
287 | GTBench: Uncovering The Strategic Reasoning Capabilities of LLMs Via Game-Theoretic Evaluations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper evaluates LLMs’ reasoning abilities in competitive environments through game-theoretic tasks, e.g., board and card games that require pure logic and strategic reasoning to compete with opponents. |
Jinhao Duan; Renming Zhang; James Diffenderfer; Bhavya Kailkhura; Lichao Sun; Elias Stengel-Eskin; Mohit Bansal; Tianlong Chen; Kaidi Xu; |
288 | Probing The Decision Boundaries of In-context Learning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new mechanism to probe and understand in-context learning from the lens of decision boundaries for in-context binary classification. |
Siyan Zhao; Tung Nguyen; Aditya Grover; |
289 | MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing publicly available datasets are inadequate for generating Sora-like videos, as they mainly contain short videos with low motion intensity and brief captions. To address these issues, we propose MiraData, a high-quality video dataset that surpasses previous ones in video duration, caption detail, motion strength, and visual quality. |
Xuan Ju; Yiming Gao; Zhaoyang Zhang; Ziyang Yuan; Xintao Wang; AILING ZENG; Yu Xiong; Qiang Xu; Ying Shan; |
290 | Who Evaluates The Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce T2IScoreScore (TS2), a curated set of semantic error graphs containing a prompt and a set of increasingly erroneous images. |
Michael Saxon; Fatima Jahara; Mahsa Khoshnoodi; Yujie Lu; Aditya Sharma; William Yang Wang; |
291 | Improved Distribution Matching Distillation for Fast Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is not only computationally expensive for large-scale text-to-image synthesis, but it also limits the student’s quality, tying it too closely to the teacher’s original sampling paths. We introduce DMD2, a set of techniques that lift this limitation and improve DMD training. |
Tianwei Yin; Michaël Gharbi; Taesung Park; Richard Zhang; Eli Shechtman; Fredo Durand; Bill Freeman; |
292 | Mixture of Tokens: Continuous MoE Through Cross-Example Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the observation that the adaptation of fully continuous methods has been an overarching trend in deep learning, we develop Mixture of Tokens (MoT), a simple, continuous architecture that is capable of scaling the number of parameters similarly to sparse MoE models. |
Szymon Antoniak; Michał Krutul; Maciej Pióro; Jakub Krajewski; Jan Ludziejewski; Kamil Ciebiera; Krystian Król; Tomasz Odrzygóźdź; Marek Cygan; Sebastian Jaszczur; |
293 | Visual CoT: Advancing Multi-Modal Language Models with A Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Importantly, we propose a multi-turn processing pipeline that dynamically focuses on visual inputs and provides interpretable thoughts. |
Hao Shao; Shengju Qian; Han Xiao; Guanglu Song; ZHUOFAN ZONG; Letian Wang; Yu Liu; Hongsheng Li; |
294 | Learning Action and Reasoning-Centric Image Editing from Videos and Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose a new automatic metric that focuses on discriminative understanding. |
Benno Krojer; Dheeraj Vattikonda; Luis Lara; Varun Jampani; Eva Portelance; Chris Pal; Siva Reddy; |
295 | The Iterative Optimal Brain Surgeon: Faster Sparse Recovery By Leveraging Second-Order Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, these results still lack a solid theoretical understanding, and it is unclear whether they can be improved by leveraging connections to the wealth of work on sparse recovery algorithms. In this paper, we draw new connections between these two areas and present new sparse recovery algorithms inspired by the OBS framework that come with theoretical guarantees under reasonable assumptions and have strong practical performance. |
Diyuan Wu; Ionut-Vlad Modoranu; Mher Safaryan; Denis Kuznedelev; Dan Alistarh; |
296 | Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods typically focus solely on adapting VLMs from a single modality and fail to accumulate task-specific knowledge as more samples are processed. To address this, we introduce Dual Prototype Evolving (DPE), a novel test-time adaptation approach for VLMs that effectively accumulates task-specific knowledge from multi-modalities. |
Ce Zhang; Simon Stepputtis; Katia Sycara; Yaqi Xie; |
297 | Predicting Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, despite sustained widespread interest, a rigorous understanding of why transformer scaling laws exist is still missing. To answer this question, we establish novel statistical estimation and mathematical approximation theories for transformers when the input data are concentrated on a low-dimensional manifold. |
Alexander Havrilla; Wenjing Liao; |
298 | CALE: Continuous Arcade Learning Environment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Continuous Arcade Learning Environment (CALE), an extension of the well-known Arcade Learning Environment (ALE) [Bellemare et al., 2013]. |
Jesse Farebrother; Pablo Samuel Castro; |
299 | HiCoM: Hierarchical Coherent Motion for Dynamic Streamable Scenes with 3D Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an efficient framework, dubbed HiCoM, with three key components. |
Qiankun Gao; Jiarui Meng; Chengxiang Wen; Jie Chen; Jian Zhang; |
300 | Vector Quantization Prompting for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these prompts are continuous and lack sufficient abstraction for task knowledge representation, making them less effective for continual learning. To address these challenges, we propose VQ-Prompt, a prompt-based continual learning method that incorporates Vector Quantization (VQ) into end-to-end training of a set of discrete prompts. |
Li Jiao; Qiuxia LAI; YU LI; Qiang Xu; |
301 | Transformer Efficiently Learns Low-dimensional Target Functions In-context Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study ICL of a nonlinear function class via a transformer with a nonlinear MLP layer: given a class of single-index target functions $f_*(x) = \sigma_*(\langle x,\beta\rangle)$, where the index features $\beta\in\mathbb{R}^d$ are drawn from a rank-$r$ subspace, we show that a nonlinear transformer optimized by gradient descent on the empirical loss learns $f_*$ in-context with a prompt length that only depends on the dimension of the function class $r$; in contrast, an algorithm that directly learns $f_*$ on the test prompt yields a statistical complexity that scales with the ambient dimension $d$. |
Kazusato Oko; Yujin Song; Taiji Suzuki; Denny Wu; |
302 | Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Diffusion-of-Thought (DoT), a novel approach that integrates diffusion models with Chain-of-Thought, a well-established technique for improving the reasoning ability of autoregressive language models. |
Jiacheng Ye; Shansan Gong; Liheng Chen; Lin Zheng; Jiahui Gao; Han Shi; Chuan Wu; Xin Jiang; Zhenguo Li; Wei Bi; Lingpeng Kong; |
303 | When Does Perceptual Alignment Benefit Vision Representations? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we investigate how aligning vision model representations to human perceptual judgments impacts their usability in standard computer vision tasks. |
Shobhita Sundaram; Stephanie Fu; Lukas Muttenthaler; Netanel Tamir; Lucy Chai; Simon Kornblith; Trevor Darrell; Phillip Isola; |
304 | Is Value Function Learning Really The Main Bottleneck of Offline RL? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to understand bottlenecks in current offline RL algorithms. |
Seohong Park; Kevin Frans; Sergey Levine; Aviral Kumar; |
305 | SplitNeRF: Split Sum Approximation Neural Field for Joint Geometry, Illumination, and Material Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel approach for digitizing real-world objects by estimating their geometry, material properties, and environmental lighting from a set of posed images with fixed lighting. |
Jesus Zarzar; Bernard Ghanem; |
306 | InfLLM: Training-Free Long-Context Extrapolation for LLMs with An Efficient Context Memory Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we unveil the intrinsic capacity of LLMs for understanding extremely long sequences without any fine-tuning. |
Chaojun Xiao; Pengle Zhang; Xu Han; Guangxuan Xiao; Yankai Lin; Zhengyan Zhang; Zhiyuan Liu; Maosong Sun; |
307 | Gorilla: Teaching LLMs to Use Tools Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop Gorilla, a finetuned LLaMA model that surpasses the performance of GPT-4 on writing API calls. |
Shishir G Patil; Tianjun Zhang; Xin Wang; Joseph Gonzalez; |
308 | Reranking Laws for Language Generation: A Communication-Theoretic Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A simple and often used strategy is to first let the LLM generate multiple hypotheses and then employ a reranker to choose the best one. In this paper, we draw a parallel between this strategy and the use of redundancy to decrease the error rate in noisy communication channels. |
António Farinhas; Haau-Sing Li; André Martins; |
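The generate-then-rerank strategy analyzed above is easy to state in code; the best-of-$n$ loop below uses placeholder `generate` and `score` callables and sketches only the strategy itself, not the paper's communication-theoretic analysis.

```python
from typing import Callable, List

def best_of_n(generate: Callable[[], str],
              score: Callable[[str], float],
              n: int) -> str:
    """Draw n hypotheses from the generator and return the one the
    reranker scores highest (the generate-then-rerank strategy)."""
    hypotheses: List[str] = [generate() for _ in range(n)]
    return max(hypotheses, key=score)
```

Increasing $n$ here plays the same role as adding redundancy in a noisy communication channel, which is the parallel the paper draws.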
309 | Robust Reinforcement Learning from Corrupted Human Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For various reasons, e.g., personal bias, context ambiguity, lack of training, etc., human annotators may give incorrect or inconsistent preference labels. To tackle this challenge, we propose a robust RLHF approach, $R^3M$, which models potentially corrupted preference labels as sparse outliers. |
Alexander Bukharin; Ilgee Hong; Haoming Jiang; Zichong Li; Qingru Zhang; Zixuan Zhang; Tuo Zhao; |
310 | Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Solutions to this multi-video generation problem could enable large-scale 3D scene generation with editable camera trajectories, among other applications. We introduce collaborative video diffusion (CVD) as an important step towards this vision. |
Zhengfei Kuang; Shengqu Cai; Hao He; Yinghao Xu; Hongsheng Li; Leonidas Guibas; Gordon Wetzstein; |
311 | Needle In A Multimodal Haystack Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Needle In A Multimodal Haystack (MM-NIAH), the first benchmark specifically designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents. |
Weiyun Wang; Shuibo Zhang; Yiming Ren; Yuchen Duan; Tiantong Li; Shuo Liu; Mengkang Hu; Zhe Chen; Kaipeng Zhang; Lewei Lu; Xizhou Zhu; Ping Luo; Yu Qiao; Jifeng Dai; Wenqi Shao; Wenhai Wang; |
312 | A Tractable Inference Perspective of Offline RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While it is still possible to approximate such queries, we observe that such crude estimates undermine the benefits brought by expressive sequence models. To overcome this problem, this paper proposes Trifle (Tractable Inference for Offline RL), which leverages modern tractable generative models to bridge the gap between good sequence models and high expected returns at evaluation time. |
Xuejie Liu; Anji Liu; Guy Van den Broeck; Yitao Liang; |
313 | StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple but effective self-attention mechanism, termed Consistent Self-Attention, that boosts the consistency between the generated images. |
Yupeng Zhou; Daquan Zhou; Ming-Ming Cheng; Jiashi Feng; Qibin Hou; |
314 | PaGoDA: Progressive Growing of A One-Step Generator from A Low-Resolution Diffusion Teacher Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this approach, the resolution of the generator is fundamentally limited by that of the teacher DM. To overcome this limitation, we propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a technique to progressively grow the resolution of the generator beyond that of the original teacher DM. |
Dongjun Kim; Chieh-Hsin Lai; Wei-Hsiang Liao; Yuhta Takida; Naoki Murata; Toshimitsu Uesaka; Yuki Mitsufuji; Stefano Ermon; |
315 | VMamba: Visual State Space Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we transplant Mamba, a state-space language model, into VMamba, a vision backbone that works in linear time complexity. |
Liu Yue; Yunjie Tian; Yuzhong Zhao; Hongtian Yu; Lingxi Xie; Yaowei Wang; Qixiang Ye; Jianbin Jiao; Yunfan Liu; |
316 | TableRAG: Million-Token Tabular Reasoning with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods often require the entire table as input, leading to scalability challenges due to the positional bias or context length constraints. In response to these challenges, we introduce TableRAG, a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. |
Si-An Chen; Lesly Miculicich; Julian Eisenschlos; Zifeng Wang; Zilong Wang; Yanfei Chen; YASUHISA FUJII; Hsuan-Tien Lin; Chen-Yu Lee; Tomas Pfister; |
317 | ODGS: 3D Scene Reconstruction from Omnidirectional Images with 3D Gaussian Splattings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ‘ODGS’, which includes a new rasterization appropriate for omnidirectional image projection. |
Suyoung Lee; Jaeyoung Chung; Jaeyoo Huh; Kyoung Mu Lee; |
318 | LSH-MoE: Communication-efficient MoE Training Via Locality-Sensitive Hashing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose LSH-MoE, a communication-efficient MoE training framework using locality-sensitive hashing (LSH). |
Xiaonan Nie; Liu Qibin; Fangcheng Fu; Shenhan Zhu; Xupeng Miao; Xiaoyang Li; Yang Zhang; Shouda Liu; Bin CUI; |
319 | WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce WildTeaming, an automatic red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics, and then composes selections of multiple mined tactics for systematic exploration of novel and even more challenging jailbreaks. |
Liwei Jiang; Kavel Rao; Seungju Han; Allyson Ettinger; Faeze Brahman; Sachin Kumar; Niloofar Mireshghallah; Ximing Lu; Maarten Sap; Nouha Dziri; Yejin Choi; |
320 | Towards Flexible Visual Relationship Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the complexity and interconnectedness of these tasks, it is crucial to have a flexible framework that can effectively address these tasks in a cohesive manner. In this work, we propose Flex-VRS, a single model that seamlessly integrates the above three aspects in standard and promptable visual relationship segmentation, and further possesses the capability for open-vocabulary segmentation to adapt to novel scenarios. |
Fangrui Zhu; Jianwei Yang; Huaizu Jiang; |
321 | Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, commonly used contrastively trained representations such as in CLIP have been shown to fail at enabling embodied agents to gain a sufficiently fine-grained scene understanding—a capability vital for control. To address this shortcoming, we consider representations from pre-trained text-to-image diffusion models, which are explicitly optimized to generate images from text prompts and as such, contain text-conditioned representations that reflect highly fine-grained visuo-spatial information. |
Gunshi Gupta; Karmesh Yadav; Yarin Gal; Dhruv Batra; Zsolt Kira; Cong Lu; Tim G. J. Rudner; |
322 | Rethinking Weight Decay for Robust Fine-Tuning of Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a new weight decay technique, Selective Projection Decay (SPD), that selectively imposes a strong penalty on certain layers while allowing others to change freely. |
Junjiao Tian; Chengyue Huang; Zsolt Kira; |
323 | NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that VLMs still struggle with natural images and questions that humans can easily answer, which we term natural adversarial samples. |
Baiqi Li; Zhiqiu Lin; WENXUAN PENG; Jean de Dieu Nyandwi; Daniel Jiang; Zixian Ma; Simran Khanuja; Ranjay Krishna; Graham Neubig; Deva Ramanan; |
324 | UNITS: A Unified Multi-Task Time Series Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce UniTS, a multi-task time series model that uses task tokenization to express predictive and generative tasks within a single model. |
Shanghua Gao; Teddy Koker; Owen Queen; Tom Hartvigsen; Theodoros Tsiligkaridis; Marinka Zitnik; |
325 | WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we systematically explore how model weights interact with unlearning processes in LLMs and we design the weight attribution-guided LLM unlearning method, WAGLE, which unveils the interconnections between ‘influence’ of weights and ‘influence’ of data to forget and retain in LLM generation. |
Jinghan Jia; Jiancheng Liu; Yihua Zhang; Parikshit Ram; Nathalie Baracaldo; Sijia Liu; |
326 | Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Constrained Human-AI Cooperation (CHAIC), an inclusive embodied social intelligence challenge for testing social perception and cooperation in embodied agents. |
Weihua Du; Qiushi Lyu; Jiaming Shan; Zhenting Qi; Hongxin Zhang; Sunli Chen; Andi Peng; Tianmin Shu; Kwonjoon Lee; Behzad Dariush; Chuang Gan; |
327 | Lorentz-Equivariant Geometric Algebra Transformers for High-Energy Physics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the Lorentz Geometric Algebra Transformer (L-GATr), a new multi-purpose architecture for high-energy physics. |
Jonas Spinner; Victor Breso; Pim de Haan; Tilman Plehn; Jesse Thaler; Johann Brehmer; |
328 | Conservative Fine-Tuning of Diffusion Models from Offline Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In offline scenarios, existing approaches tend to suffer from overoptimization, as they may be misled by the reward model in out-of-distribution regions. To address this, we introduce a conservative fine-tuning approach, BRAID, by optimizing a conservative reward model, which includes additional penalization outside of offline data distributions. |
Masatoshi Uehara; Yulai Zhao; Ehsan Hajiramezanali; Gabriele Scalia; Gokcen Eraslan; Avantika Lal; Sergey Levine; Tommaso Biancalani; |
329 | Few-Shot Task Learning Through Inverse Generative Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Learning the intents of an agent, defined by its goals or motion style, is often extremely challenging from just a few examples. We refer to this problem as task concept learning, and present our approach, Few-Shot Task Learning Through Inverse Generative Modeling (FTL-IGM), which learns new task concepts by leveraging invertible neural generative models. |
Aviv Netanyahu; Yilun Du; Jyothish Pari; Josh Tenenbaum; Tianmin Shu; Pulkit Agrawal; |
330 | SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing question-answering (QA) datasets based on scientific papers are limited in scale and focus solely on textual content. To address this limitation, we introduce SPIQA (Scientific Paper Image Question Answering), the first large-scale QA dataset specifically designed to interpret complex figures and tables within the context of scientific research articles across various domains of computer science. |
Shraman Pramanick; Rama Chellappa; Subhashini Venugopalan; |
331 | Theoretical Guarantees in KL for Diffusion Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main contribution of this paper is to provide relatively mild assumptions on $\nu^\star$, $\mu$ and $\pi$ to obtain non-asymptotic guarantees for Diffusion Flow Matching (DFM) models using as a bridge the conditional distribution associated with the Brownian motion. |
Marta Gentiloni Silveri; Alain Durmus; Giovanni Conforti; |
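For background on the bridge referenced above: the conditional law of a standard Brownian motion pinned at both endpoints is the classical Brownian bridge, whose standard form (background only, not this paper's specific assumptions) is:

```latex
% Standard Brownian motion (X_t) on [0,1], conditioned on its endpoints:
X_t \mid (X_0 = x_0,\, X_1 = x_1) \;\sim\; \mathcal{N}\bigl((1-t)\,x_0 + t\,x_1,\; t(1-t)\,I\bigr), \qquad t \in [0,1].
```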
332 | TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion yet facilitates end-to-end inference through joint probability. |
Chenyang Le; Yao Qian; Dongmei Wang; Long Zhou; Shujie LIU; Xiaofei Wang; Midia Yousefi; Yanmin Qian; Jinyu Li; Michael Zeng; |
333 | Neural Isometries: Taming Transformations for Equivariant ML Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Neural Isometries, an autoencoder framework which learns to map the observation space to a general-purpose latent space wherein encodings are related by isometries whenever their corresponding observations are geometrically related in world space. |
Thomas Mitchel; Michael Taylor; Vincent Sitzmann; |
334 | Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through a new analysis of BC with the logarithmic loss, we show that it is possible to achieve horizon-independent sample complexity in offline IL whenever (i) the range of the cumulative payoffs is controlled, and (ii) an appropriate notion of supervised learning complexity for the policy class is controlled. |
Dylan J Foster; Adam Block; Dipendra Misra; |
335 | Multiple Physics Pretraining for Spatiotemporal Surrogate Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. |
Michael McCabe; Bruno Régaldo-Saint Blancard; Liam Parker; Ruben Ohana; Miles Cranmer; Alberto Bietti; Michael Eickenberg; Siavash Golkar; Geraud Krawezik; Francois Lanusse; Mariel Pettee; Tiberiu Tesileanu; Kyunghyun Cho; Shirley Ho; |
336 | Decoupled Kullback-Leibler Divergence Loss Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we delve deeper into the Kullback–Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss that consists of 1) a weighted Mean Square Error ($\mathbf{w}$MSE) loss and 2) a Cross-Entropy loss incorporating soft labels. |
Jiequan Cui; Zhuotao Tian; Zhisheng Zhong; Xiaojuan Qi; Bei Yu; Hanwang Zhang; |
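For orientation on KL-based losses: the wMSE + soft-label cross-entropy decomposition is the paper's own finer-grained result, but it helps to recall the elementary identity KL(p‖q) = CE(p, q) − H(p), where the cross-entropy term already involves soft labels. A minimal numeric check of that identity (illustrative only; the paper's weighted-MSE term is not reproduced here):

```python
# Numeric check of the standard identity KL(p || q) = CE(p, q) - H(p).
import numpy as np

rng = np.random.default_rng(0)
logits_p, logits_q = rng.normal(size=5), rng.normal(size=5)
p = np.exp(logits_p) / np.exp(logits_p).sum()
q = np.exp(logits_q) / np.exp(logits_q).sum()

kl = np.sum(p * (np.log(p) - np.log(q)))   # KL divergence
ce = -np.sum(p * np.log(q))                # cross-entropy with soft labels p
h = -np.sum(p * np.log(p))                 # entropy of p
assert np.isclose(kl, ce - h)
```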
337 | Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The paper gives evidence that LLMs also have metacognitive knowledge, including the ability to name the skills and procedures to apply to a given task. |
Aniket Didolkar; Anirudh Goyal; Nan Rosemary Ke; Siyuan Guo; Michal Valko; Timothy Lillicrap; Danilo Jimenez Rezende; Yoshua Bengio; Michael Mozer; Sanjeev Arora; |
338 | ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose ConceptMix, a scalable, controllable, and customizable benchmark consisting of two stages: (a) With categories of visual concepts (e.g., objects, colors, shapes, spatial relationships), it randomly samples an object and $k$-tuples of visual concepts to generate text prompts with GPT-4o for image generation. |
Xindi Wu; Dingli Yu; Yangsibo Huang; Olga Russakovsky; Sanjeev Arora; |
339 | Base of RoPE Bounds Context Length Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We revisit the role of RoPE in LLMs, propose a novel property of long-term decay, and derive that the \textit{base of RoPE bounds context length}: there is an absolute lower bound on the base value required to obtain a given context length capability. |
Xin Men; Mingyu Xu; Qingyu Zhang; Bingning Wang; Hongyu Lin; Xianpei Han; weipeng chen; |
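For background on why the base matters: RoPE rotates feature pairs at per-dimension frequencies $\theta_i = \text{base}^{-2i/d}$, so the base sets how slowly the low-frequency dimensions rotate and hence how far apart relative positions remain distinguishable. A minimal sketch of standard RoPE (background for the paper's bound, not the bound itself):

```python
# Minimal standard RoPE sketch; the paper's bound relates `base` to the
# maximum usable context length via the long-term decay of these rotations.
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    """Rotation angles theta_i * position, with theta_i = base^(-2i/dim)."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    return np.outer(positions, inv_freq)               # (n_pos, dim/2)

def apply_rope(x, positions, base=10000.0):
    """Rotate consecutive feature pairs of x by position-dependent angles."""
    ang = rope_angles(positions, x.shape[-1], base)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# A larger base rotates low-frequency dimensions more slowly, preserving
# the ability to distinguish distant relative positions.
q = np.ones((1, 64))
print(apply_rope(q, np.array([4096]), base=10000.0)[0, :4])
print(apply_rope(q, np.array([4096]), base=500000.0)[0, :4])
```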
340 | Fight Back Against Jailbreaking Via Prompt Adversarial Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, motivated by adversarial training paradigms for achieving reliable robustness, we propose an approach named **Prompt Adversarial Tuning (PAT)** that trains a prompt control attached to the user prompt as a guard prefix. |
Yichuan Mo; Yuji Wang; Zeming Wei; Yisen Wang; |
341 | PointMamba: A Simple State Space Model for Point Cloud Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks. |
Dingkang Liang; Xin Zhou; Wei Xu; xingkui zhu; Zhikang Zou; Xiaoqing Ye; Xiao Tan; Xiang Bai; |
342 | Breaking Determinism: Fuzzy Modeling of Sequential Recommendation Using Discrete State Space Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by fuzzy information processing theory, this paper introduces the DDSR model, which uses fuzzy sets of interaction sequences to overcome the limitations and better capture the evolution of users’ real interests. |
Wenjia Xie; Hao Wang; Luankang Zhang; Rui Zhou; Defu Lian; Enhong Chen; |
343 | Approaching Human-Level Forecasting with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters. |
Danny Halawi; Fred Zhang; Chen Yueh-Han; Jacob Steinhardt; |
344 | InterControl: Zero-shot Human Interaction Generation By Controlling Every Joint Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions to maintain the desired distance between joint pairs. |
Zhenzhi Wang; Jingbo Wang; Yixuan Li; Dahua Lin; Bo Dai; |
345 | Reversing The Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel unlearning framework called Unlearning from Logit Difference (ULD), which introduces an assistant LLM that aims to achieve the opposite of the unlearning goals: remembering the forget documents and forgetting the retain knowledge. |
Jiabao Ji; Yujian Liu; Yang Zhang; Gaowen Liu; Ramana Kompella; Sijia Liu; Shiyu Chang; |
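One plausible minimal reading of the logit-difference idea, with the combination rule and `alpha` as illustrative assumptions rather than the paper's exact formulation:

```python
# Hypothetical sketch: the assistant is trained for the reversed objectives
# (remember the forget set, forget the retain set), so subtracting its
# logits at decoding time suppresses forget-set content. `alpha` and this
# exact combination rule are assumptions for illustration.
import numpy as np

def uld_next_token_logits(main_logits: np.ndarray,
                          assistant_logits: np.ndarray,
                          alpha: float = 1.0) -> np.ndarray:
    """Down-weight tokens the assistant model assigns high probability."""
    return main_logits - alpha * assistant_logits
```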
346 | Emu3D: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Emu3D, a significant advancement in text-to-3D which produces faithful, high-quality meshes with full material control. |
Yawar Siddiqui; Filippos Kokkinos; Tom Monnier; Mahendra Kariya; Yanir Kleiman; Emilien Garreau; Oran Gafni; Natalia Neverova; Andrea Vedaldi; David Novotny; Roman Shapovalov; |
347 | One-Step Effective Diffusion Network for Real-World Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Meanwhile, the random noise introduces uncertainty into the output, which is undesirable for image restoration tasks. To address these issues, we propose a one-step effective diffusion network, namely OSEDiff, for the Real-ISR problem. |
Rongyuan Wu; Lingchen Sun; Zhiyuan Ma; Lei Zhang; |
348 | Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the long-tail distribution of singular values in the delta weights, we propose a delta quantization approach using mixed-precision. |
Bowen Ping; Shuo Wang; Hanqing Wang; Xu Han; Yuzhuang Xu; Yukun Yan; Yun Chen; Baobao Chang; Zhiyuan Liu; Maosong Sun; |
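Read at face value, the highlight suggests decomposing the delta weights by SVD and spending more bits on the few large singular directions than on the long tail. A rough sketch under that reading (the rank cut-off, bit widths, and `quantize` helper are illustrative assumptions, not the paper's scheme):

```python
# Rough sketch: SVD the delta weights, keep the dominant singular directions
# at high precision and the long tail at low precision.
import numpy as np

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform quantization to 2**bits levels over the tensor's range."""
    lo, hi = x.min(), x.max()
    levels = 2 ** bits - 1
    return np.round((x - lo) / (hi - lo + 1e-12) * levels) / levels * (hi - lo) + lo

def compress_delta(w_finetuned, w_base, hi_rank=8, hi_bits=8, lo_bits=2):
    delta = w_finetuned - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    head = (u[:, :hi_rank] * s[:hi_rank]) @ vt[:hi_rank]   # large singular values
    tail = delta - head                                    # long-tail remainder
    return quantize(head, hi_bits) + quantize(tail, lo_bits)
```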
349 | CosAE: Learnable Fourier Series for Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Cosine Autoencoder (CosAE), a novel, generic Autoencoder that seamlessly leverages the classic Fourier series with a feed-forward neural network. |
Sifei Liu; Shalini De Mello; Jan Kautz; |
350 | Explaining Text Datasets with Language Parameters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To make model parameters directly interpretable, we introduce a family of statistical models—including clustering, time-series, and classification models—parameterized by *natural language predicates*. |
Ruiqi Zhong; Heng Wang; Dan Klein; Jacob Steinhardt; |
351 | One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos. |
Zechen Bai; Tong He; Haiyang Mei; Pichao WANG; Ziteng Gao; Joya Chen; liulei; Zheng Zhang; Mike Zheng Shou; |
352 | MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce MeshFormer, a sparse-view reconstruction model that explicitly leverages 3D native structure, input guidance, and training supervision. |
Minghua Liu; Chong Zeng; Xinyue Wei; Ruoxi Shi; Linghao Chen; Chao Xu; Mengqi Zhang; Zhaoning Wang; Xiaoshuai Zhang; Isabella Liu; Hongzhi Wu; Hao Su; |
353 | Q-VLM: Post-training Quantization for Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a post-training quantization framework of large vision-language models (LVLMs) for efficient multi-modal inference. |
Changyuan Wang; Ziwei Wang; Xiuwei Xu; Yansong Tang; Jie Zhou; Jiwen Lu; |
354 | FuseFL: One-Shot Federated Learning Through The Lens of Causality with Progressive Model Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the performance of advanced OFL methods lags far behind that of normal FL. In this work, we provide a causal view and find that this performance drop of OFL methods stems from an isolation problem: locally trained, isolated models in OFL easily fit spurious correlations due to data heterogeneity. |
Zhenheng Tang; Yonggang Zhang; Peijie Dong; Yiu-ming Cheung; Amelie Zhou; Bo Han; Xiaowen Chu; |
355 | TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Even though SRL has become the foundation of almost all geospatial artificial intelligence (GeoAI) research, we have not yet seen significant efforts to develop an extensive deep learning framework and benchmark to support SRL model development and evaluation. To fill this gap, we propose TorchSpatial, a learning framework and benchmark for location (point) encoding, which is one of the most fundamental data types of spatial representation learning. |
Nemin Wu; Qian Cao; Zhangyu Wang; Zeping Liu; Yanlin Qi; Jielu Zhang; Joshua Ni; X. Yao; Hongxu Ma; Lan Mu; Stefano Ermon; Tanuja Ganu; Akshay Nambi; Ni Lao; Gengchen Mai; |
356 | Generalizable Implicit Motion Modeling for Video Frame Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing paradigms either simply consider linear combinations of bidirectional flows or directly predict bilateral flows with the condition of timestamps, lacking the capability of effectively modeling spatiotemporal dynamics in real-world videos. To address this limitation, in this study, we introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. |
Zujin Guo; Wei Li; Chen Change Loy; |
357 | Weak-to-Strong Search: Align Large Language Models Via Searching Over Small Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce *weak-to-strong search*, framing the alignment of a large language model as a test-time greedy search to maximize the log-likelihood difference between small tuned and untuned models while sampling from the frozen large model. |
Zhanhui Zhou; Zhixuan Liu; Jie Liu; Zhichen Dong; Chao Yang; Yu Qiao; |
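Taken literally, the highlight describes scoring continuations sampled from the frozen large model by the tuned-minus-untuned log-likelihood of the small models. A hedged one-step sketch (all callables are assumed interfaces, not a real library API):

```python
# Hedged sketch of one greedy search step: draw k candidate continuations
# from the frozen large model, keep the one with the largest
# tuned-minus-untuned log-likelihood under the two small models.
def weak_to_strong_step(prefix, sample_from_large, logp_tuned, logp_untuned, k=8):
    candidates = [sample_from_large(prefix) for _ in range(k)]
    return max(candidates,
               key=lambda c: logp_tuned(prefix + c) - logp_untuned(prefix + c))
```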
358 | Pandora’s Box: Towards Building Universal Attackers Against Real-World Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the research gap and practical demands, in this paper, we make the first attempt to build a universal attacker against real-world LVLMs, focusing on two critical aspects: (i) restricting access to only the LVLM inputs and outputs. |
Daizong Liu; Mingyu Yang; Xiaoye Qu; Pan Zhou; Xiang Fang; Keke Tang; Yao Wan; Lichao Sun; |
359 | Semi-Truths: A Large-Scale Dataset for Testing Robustness of AI-Generated Image Detectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Do they exhibit biases towards specific scenes or data distributions? To address these questions, we introduce Semi-Truths, featuring 27,635 real images, 245,360 masks, and 850,226 AI-augmented images with varying degrees of targeted and localized edits, created using diverse augmentation methods, diffusion models, and data distributions. |
Anisha Pal; Julia Kruk; Mansi Phute; Manognya Bhattaram; Diyi Yang; Duen Horng Chau; Judy Hoffman; |
360 | Demystify Mamba in Vision: A Linear Attention Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the similarities and disparities between Mamba and linear attention Transformer, providing comprehensive analyses to demystify the key factors behind Mamba’s success. |
Dongchen Han; Ziyi Wang; Zhuofan Xia; Yizeng Han; Yifan Pu; Chunjiang Ge; Jun Song; Shiji Song; Bo Zheng; Gao Huang; |
361 | Bridging The Divide: Reconsidering Softmax and Linear Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take a step forward to close the gap between the linear and Softmax attention with novel theoretical analyses, which demystify the core factors behind the performance deviations. |
Dongchen Han; Yifan Pu; Zhuofan Xia; Yizeng Han; Xuran Pan; Xiu Li; Jiwen Lu; Shiji Song; Gao Huang; |
362 | Segment Any Change Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the segment any change models (AnyChange), a new type of change detection model that supports zero-shot prediction and generalization on unseen change types and data distributions. |
Zhuo Zheng; Yanfei Zhong; Liangpei Zhang; Stefano Ermon; |
363 | Clustering in Causal Attention Masking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents a modification of the self-attention dynamics proposed in Geshkovski et al. to better reflect the practically relevant causally masked attention used in transformer architectures for generative AI. |
Nikita Karagodin; Yury Polyanskiy; Philippe Rigollet; |
364 | MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce \textit{model-aware data selection with data influence models (MATES)}, where a data influence model continuously adapts to the evolving data preferences of the main pretraining model, thus selecting data most effective for the model’s current learning progress. |
Zichun Yu; Spandan Das; Chenyan Xiong; |
365 | NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This has resulted in an inability to draw clear conclusions from the rapidly growing body of research on end-to-end autonomous driving. In this paper, we present NAVSIM, a middle ground between these evaluation paradigms, where we use large datasets in combination with a non-reactive simulator to enable large-scale real-world benchmarking. |
Daniel Dauner; Marcel Hallgarten; Tianyu Li; Xinshuo Weng; Zhiyu Huang; Zetong Yang; Hongyang Li; Igor Gilitschenski; Boris Ivanovic; Marco Pavone; Andreas Geiger; Kashyap Chitta; |
366 | DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, recreating existing figures that are not stored in formats preserving semantic information is equally complex. To tackle this problem, we introduce DeTikZify, a novel multimodal language model that automatically synthesizes scientific figures as semantics-preserving TikZ graphics programs based on sketches and existing figures. |
Jonas Belouadi; Simone Ponzetto; Steffen Eger; |
367 | Learning from Highly Sparse Spatio-temporal Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a theoretical analysis revealing that such iterative models are not only susceptible to data sparsity but also to graph sparsity, causing unstable performances on different datasets. To overcome these limitations, we introduce a novel method named One-step Propagation and Confidence-based Refinement (OPCR). |
Leyan Deng; Chenwang Wu; Defu Lian; Enhong Chen; |
368 | Generative Hierarchical Materials Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we formulate end-to-end language-to-structure generation as a multi-objective optimization problem, and propose Generative Hierarchical Materials Search (GenMS) for controllable generation of crystal structures. |
Sherry Yang; Simon Batzner; Ruiqi Gao; Muratahan Aykol; Alexander Gaunt; Brendan C McMorrow; Danilo Jimenez Rezende; Dale Schuurmans; Igor Mordatch; Ekin Dogus Cubuk; |
369 | CAPE: Context-Adaptive Positional Encoding for Length Extrapolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Context-Adaptive Positional Encoding (CAPE) method, which dynamically and semantically adjusts based on input context and learned fixed priors. |
Chuanyang Zheng; Yihang Gao; Han Shi; Minbin Huang; Jingyao Li; Jing Xiong; Xiaozhe Ren; Michael Ng; Xin Jiang; Zhenguo Li; Yu Li; |
370 | SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new framework for zero-shot object navigation. |
Hang Yin; Xiuwei Xu; Zhenyu Wu; Jie Zhou; Jiwen Lu; |
371 | Small Steps No More: Global Convergence of Stochastic Gradient Bandits for Arbitrary Learning Rates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a new understanding of the stochastic gradient bandit algorithm by showing that it converges to a globally optimal policy almost surely using \emph{any} constant learning rate. |
Jincheng Mei; Bo Dai; Alekh Agarwal; Sharan Vaswani; Anant Raj; Csaba Szepesvari; Dale Schuurmans; |
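The algorithm in question is the classical softmax gradient bandit (as in Sutton and Barto, here in a common baseline-free form); the paper's result concerns the update below with an arbitrary constant learning rate:

```python
# Classical stochastic gradient bandit with a softmax policy; the paper
# shows this style of update converges to the optimal arm almost surely
# for any constant learning rate.
import numpy as np

def gradient_bandit(pull_arm, n_arms, steps=10_000, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_arms)                       # action preferences
    for _ in range(steps):
        pi = np.exp(theta - theta.max()); pi /= pi.sum()
        a = rng.choice(n_arms, p=pi)
        r = pull_arm(a)                            # stochastic reward in [0, 1]
        grad = -pi; grad[a] += 1.0                 # grad of log pi(a) wrt theta
        theta += lr * r * grad                     # REINFORCE-style update
    return theta

# Two Bernoulli arms; theta should come to favor the second arm.
arm_rng = np.random.default_rng(1)
theta = gradient_bandit(lambda a: float(arm_rng.random() < (0.2, 0.8)[a]), n_arms=2)
```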
372 | QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the problem of sampling a set of high-quality and diverse translations. |
Gonçalo Faria; Sweta Agrawal; António Farinhas; Ricardo Rei; José de Souza; André Martins; |
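A generic Metropolis-Hastings loop in which a quality estimate plays the role of negative energy gives the flavor of the title; this schematic assumes a symmetric proposal and is not the paper's exact design:

```python
# Schematic Metropolis-Hastings over translation hypotheses, with a quality
# estimate as negative energy. Assumes a symmetric proposal distribution.
import math, random

def mh_sample(init, propose, quality, temp=1.0, steps=100):
    y = init
    for _ in range(steps):
        y_new = propose(y)                 # e.g., edit part of the hypothesis
        accept = min(1.0, math.exp((quality(y_new) - quality(y)) / temp))
        if random.random() < accept:
            y = y_new
    return y
```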
373 | Optimal Ablation for Model Internals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue for the adoption of optimal ablation of activations for studying model internals and show that it has theoretical and empirical advantages over popular methods for component ablation. |
Maximilian Li; Lucas Janson; |
374 | Probabilistic Emulation of A Global Climate Model with Spherical DYffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present the first conditional generative model able to produce global climate ensemble projections that are accurate and physically consistent. |
Salva Rühling Cachay; Brian Henn; Oliver Watt-Meyer; Christopher S. Bretherton; Rose Yu; |
375 | Can Models Learn Skill Composition from Examples? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we employ a setup akin to Skill-Mix to evaluate the capacity of smaller models to learn compositional generalization from examples. |
Haoyu Zhao; Simran Kaur; Dingli Yu; Anirudh Goyal; Sanjeev Arora; |
376 | Image Understanding Makes for A Good Tokenizer for Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the potential of IU models to improve IG performance remains uncharted. We address this issue using a token-based IG framework, which relies on effective tokenizers to project images into token sequences. |
Luting Wang; Yang Zhao; Zijian Zhang; Jiashi Feng; Si Liu; Bingyi Kang; |
377 | FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we formulate a more effective approach to decompose the aesthetics of a picture into specific visual attributes, letting users apply characteristics like lighting, texture, and dynamics from different images. |
Tong Wu; Yinghao Xu; Ryan Po; Mengchen Zhang; Guandao Yang; Jiaqi Wang; Ziwei Liu; Dahua Lin; Gordon Wetzstein; |
378 | FilterNet: Harnessing Frequency Filters for Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore a novel perspective, bringing signal processing techniques to deep time series forecasting. |
Kun Yi; Wei Fan; Qi Zhang; Hui He; Jingru Fei; Shufeng Hao; Defu Lian; |
379 | PeRFlow: Piecewise Rectified Flow As Universal Plug-and-Play Accelerator Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models. |
Hanshu Yan; Xingchao Liu; Jiachun Pan; Jun Hao Liew; Qiang Liu; Jiashi Feng; |
380 | MoVA: Adapting Mixture of Vision Experts to Multimodal Context Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In the coarse-grained stage, we design a context-aware expert routing strategy to dynamically select the most suitable vision experts according to the user instruction, input image, and expertise of vision experts. |
ZHUOFAN ZONG; Bingqi Ma; Dazhong Shen; Guanglu Song; Hao Shao; DONGZHI JIANG; Hongsheng Li; Yu Liu; |
381 | MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Meta-Objective Aligner (MetaAligner), the first policy-agnostic and generalizable method for multi-objective preference alignment. |
Kailai Yang; Zhiwei Liu; Qianqian Xie; Jimin Huang; Tianlin Zhang; Sophia Ananiadou; |
382 | Consistency Purification: Effective and Efficient Diffusion Purification Towards Certified Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate that an ideal purification pipeline should generate purified images that lie on the data manifold and are semantically aligned with the original images (for effectiveness), and should do so in a single step (for efficiency). |
Yiquan Li; Zhongzhu Chen; Kun Jin; Jiongxiao Wang; Bo Li; Chaowei Xiao; |
383 | CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle the two challenges, we propose CoMat, an end-to-end diffusion model fine-tuning strategy with the image-to-text concept matching mechanism. |
DONGZHI JIANG; Guanglu Song; Xiaoshi Wu; Renrui Zhang; Dazhong Shen; ZHUOFAN ZONG; Yu Liu; Hongsheng Li; |
384 | Panacea: Pareto Alignment Via Preference Adaptation for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents Panacea, an innovative approach that reframes alignment as a multi-dimensional preference optimization problem. |
Yifan Zhong; Chengdong Ma; Xiaoyuan Zhang; Ziran Yang; Haojun Chen; Qingfu Zhang; Siyuan Qi; Yaodong Yang; |
385 | Adaptive Preference Scaling for Reinforcement Learning with Human Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to various reasons, however, such data typically takes the form of rankings over pairs of trajectory segments, which fails to capture the varying strengths of preferences across different pairs. In this paper, we propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO), designed to address this uncertainty in preference strength. |
Ilgee Hong; Zichong Li; Alexander Bukharin; Yixiao Li; Haoming Jiang; Tianbao Yang; Tuo Zhao; |
386 | CogVLM: Visual Expert for Pretrained Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce CogVLM, a powerful open-source visual language foundation model. |
Weihan Wang; Qingsong Lv; Wenmeng Yu; Wenyi Hong; Ji Qi; Yan Wang; Junhui Ji; Zhuoyi Yang; Lei Zhao; Song XiXuan; Jiazheng Xu; Keqin Chen; Bin Xu; Juanzi Li; Yuxiao Dong; Ming Ding; Jie Tang; |
387 | MambaTalk: Co-Speech Gesture Generation with Selective State Space Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we explore the potential of state space models (SSMs). |
Zunnan Xu; Yukang Lin; Haonan Han; Sicheng Yang; Ronghui Li; Yachao Zhang; Xiu Li; |
388 | Data-Efficient Learning with Neural Programs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an algorithm for learning neural programs, called ISED, that only relies on input-output samples of black-box components. |
Alaia Solko-Breslin; Seewon Choi; Ziyang Li; Neelay Velingker; Rajeev Alur; Mayur Naik; Eric Wong; |
389 | Zero-Shot Scene Reconstruction from Single Images with Deep Prior Assembly Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present deep prior assembly, a novel framework that assembles diverse deep priors from large models for scene generation from single images in a zero-shot manner. |
Junsheng Zhou; Yu-Shen Liu; Zhizhong Han; |
390 | Unveiling Encoder-Free Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we bridge the gap between encoder-based and encoder-free models and present a simple yet effective training recipe towards pure LVLMs. |
Haiwen Diao; Yufeng Cui; Xiaotong Li; Yueze Wang; Huchuan Lu; Xinlong Wang; |
391 | Selective Attention: Enhancing Transformer Through Principled Context Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While self-attention has enjoyed major success, it notably treats all queries $q$ in the same way by applying the mapping $V^\top\text{softmax}(Kq)$, where $V,K$ are the value and key matrices respectively. In this work, we argue that this uniform treatment hinders the ability to control contextual sparsity and relevance. |
Xuechen Zhang; Xiangyu Chang; Mingchen Li; Amit Roy-Chowdhury; Jiasi Chen; Samet Oymak; |
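Writing out the quoted mapping for a single query makes the uniform treatment concrete (plain softmax attention, before the paper's modification):

```python
# Plain single-query softmax attention, i.e. the quoted map V^T softmax(Kq);
# the paper's point is that applying this same map uniformly to every query
# limits control over contextual sparsity and relevance.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

n, d = 6, 4
K = np.random.randn(n, d)    # keys, one row per context token
V = np.random.randn(n, d)    # values
q = np.random.randn(d)       # a single query
out = V.T @ softmax(K @ q)   # (d,) attention output for this query
```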
392 | Stochastic Optimal Control Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffusion models. |
Carles Domingo i Enrich; Jiequn Han; Brandon Amos; Joan Bruna; Ricky T. Q. Chen; |
393 | Improved Off-policy Training of Diffusion Samplers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. |
Marcin Sendera; Minsu Kim; Sarthak Mittal; Pablo Lemos; Luca Scimeca; Jarrid Rector-Brooks; Alexandre Adam; Yoshua Bengio; Nikolay Malkin; |
394 | The Fine-Grained Complexity of Gradient Computation for Training Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show nearly identical results for the harder-seeming problem of computing the gradient of the loss function of a one-layer attention network, and thus for the entire process of LLM training. |
Josh Alman; Zhao Song; |
395 | Scalable and Effective Arithmetic Tree Generation for RL-Driven Adder and Multiplier Designs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To boost computing performance, this work focuses on the two most common and fundamental arithmetic modules, adders and multipliers. |
Yao Lai; Jinxin Liu; David Pan; Ping Luo; |
396 | GenRec: Unifying Video Generation and Recognition with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Video diffusion models are able to generate high-quality videos by learning strong spatial-temporal priors on large-scale datasets. In this paper, we aim to investigate whether such priors derived from a generative process are suitable for video recognition, and eventually joint optimization of generation and recognition. |
Zejia Weng; Xitong Yang; Zhen Xing; Zuxuan Wu; Yu-Gang Jiang; |
397 | SafeWorld: Geo-Diverse Safety Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On top of it, we propose a multi-dimensional automatic safety evaluation framework that assesses the contextual appropriateness, accuracy, and comprehensiveness of responses. |
Da Yin; Haoyi Qiu; Kung-Hsiang Huang; Kai-Wei Chang; Nanyun Peng; |
398 | BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such a paradigm complicates the model architecture, and the manual separation of history and future trajectories leads to low data utilization. To address these challenges, we propose Behavior Generative Pre-trained Transformers (BehaviorGPT), a decoder-only, autoregressive architecture designed to simulate the sequential motion of multiple agents. |
Zikang Zhou; HU Haibo; Xinhong Chen; Jianping Wang; Nan Guan; Kui Wu; Yung-Hui Li; Yu-Kai Huang; Chun Jason Xue; |
399 | EGODE: An Event-attended Graph ODE Framework for Modeling Rigid Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach named Event-attended Graph ODE (EGODE) for effective rigid dynamics modeling. |
Jingyang Yuan; Gongbo Sun; Zhiping Xiao; Hang Zhou; Xiao Luo; Junyu Luo; Yusheng Zhao; Wei Ju; Ming Zhang; |
400 | XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, traditional techniques such as global feature alignment or vision-language model distillation tend to impose only approximate correspondence, struggling notably with delineating fine-grained segmentation boundaries. To address this gap, we propose a more meticulous mask-level alignment between 3D features and the 2D-text embedding space through a cross-modal mask reasoning framework, XMask3D. |
Ziyi Wang; Yanbo Wang; Xumin Yu; Jie Zhou; Jiwen Lu; |
401 | Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, training on long sequences becomes computationally prohibitive due to the quadratic cost of attention. In this study, we introduce dataset decomposition, a novel variable sequence length training technique, to tackle these challenges. |
Hadi Pouransari; Chun-Liang Li; Jen-Hao Chang; Pavan Kumar Anasosalu Vasu; Cem Koc; Vaishaal Shankar; Oncel Tuzel; |
402 | Towards Neuron Attributions in Multi-Modal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, while neuron attribution has made significant progress in deciphering text-only LLMs, its application to Multimodal LLMs (MLLMs) remains less explored. To address this gap, we propose a novel Neuron Attribution method tailored for MLLMs, termed NAM. |
Junfeng Fang; Zac Bi; Ruipeng Wang; Houcheng Jiang; Yuan Gao; Kun Wang; An Zhang; Jie Shi; Xiang Wang; Tat-Seng Chua; |
403 | Metric Transforms and Low Rank Representations of Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new linear-algebraic tool based on Abelian group representation theory, and use it to address three key problems in machine learning. |
Timothy Chu; Josh Alman; Gary L. Miller; Shyam Narayanan; Mark Sellke; Zhao Song; |
404 | Differentiable Structure Learning with Partial Orders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main difficulty lies in adapting these constraints, typically suited for the space of total orderings, to the continuous optimization context of structure learning in the graph space. To bridge this gap, this paper formalizes a set of equivalent constraints that map partial orders onto graph spaces and introduces a plug-and-play module for their efficient application. |
Taiyu Ban; Lyuzhou Chen; Xiangyu Wang; Xin Wang; Derui Lyu; Huanhuan Chen; |
405 | R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: By carefully deriving X-ray rasterization functions, we discover a previously unknown \emph{integration bias} in the standard 3DGS formulation, which hampers accurate volume retrieval. To address this issue, we propose a novel rectification technique via refactoring the projection from 3D to 2D Gaussians. |
Ruyi Zha; Tao Jun Lin; Yuanhao Cai; Jiwen Cao; Yanhao Zhang; Hongdong Li; |
406 | GPT As Visual Explainer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Language Model as Visual Explainer (\texttt{LVX}), a systematic approach for interpreting the internal workings of vision models using a tree-structured linguistic explanation, without the need for model training. |
Xingyi Yang; Xinchao Wang; |
407 | A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for RL with general function approximation. |
Heyang Zhao; Jiafan He; Quanquan Gu; |
408 | Unveiling The Power of Diffusion Features For Personalized Segmentation and Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a significant flaw in these models is evident: they struggle to locate a desired instance when other instances within the same class are presented. In this paper, we explore text-to-image diffusion models for these tasks. |
Dvir Samuel; Rami Ben-Ari; Matan Levy; Nir Darshan; Gal Chechik; |
409 | Empowering and Assessing The Utility of Large Language Models in Crop Science Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, their untapped potential in crop science presents an opportunity for advancement. To narrow this gap, we introduce CROP, which includes a novel instruction tuning dataset specifically designed to enhance LLMs’ professional capabilities in the crop science sector, along with a benchmark that serves as a comprehensive evaluation of LLMs’ understanding of the domain knowledge. |
Hang Zhang; Jiawei SUN; Renqi Chen; Wei Liu; Zhonghang Yuan; Xinzhe Zheng; Zhefan Wang; Zhiyuan Yang; Hang Yan; Han-Sen Zhong; Xiqing Wang; Fan Yang; Nanqing Dong; Wanli Ouyang; |
410 | Can LLMs Learn By Teaching? A Preliminary Study Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: If yes, we can potentially unlock the possibility of continuously advancing the models without solely relying on human-produced data or stronger models. In this paper, we provide a preliminary exploration of this ambitious agenda. |
Xuefei Ning; Zifu Wang; Shiyao Li; Zinan Lin; Peiran Yao; Tianyu Fu; Matthew Blaschko; Guohao Dai; Huazhong Yang; Yu Wang; |
411 | Alleviating Distortion in Image Generation Via Multi-Resolution Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. |
Qihao Liu; Zhanpeng Zeng; Ju He; Qihang Yu; Xiaohui Shen; Liang-Chieh Chen; |
412 | SimGen: Simulator-conditioned Driving Scene Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a simulator-conditioned scene generation framework called SimGen that can learn to generate diverse driving scenes by mixing data from the simulator and the real world. |
Yunsong Zhou; Michael Simon; Zhenghao (Mark) Peng; Sicheng Mo; Hongzi Zhu; Minyi Guo; Bolei Zhou; |
413 | Parallelizing Linear Transformers with The Delta Rule Over Sequence Length Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work describes a hardware-efficient algorithm for training a generalized variant of linear Transformers (of which DeltaNet is a special case) which exploits the WY representation for computing products of Householder matrices. |
Songlin Yang; Bailin Wang; Yu Zhang; Yikang Shen; Yoon Kim; |
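For context, the naive sequential delta-rule recurrence that the paper parallelizes over sequence length (fast-weight formulation; the hardware-efficient WY-based algorithm itself is more involved):

```python
# Naive O(T) sequential delta-rule recurrence (fast-weight form):
#   S_t = S_{t-1} + beta_t * (v_t - S_{t-1} k_t) k_t^T,   o_t = S_t q_t
# The paper's contribution is computing this in parallel over the sequence
# via the WY representation of products of Householder-like matrices.
import numpy as np

def delta_rule_sequential(Q, K, V, beta):
    T, d = Q.shape
    S = np.zeros((d, d))                                 # fast-weight state
    out = np.zeros_like(V)
    for t in range(T):
        S = S + beta[t] * np.outer(V[t] - S @ K[t], K[t])
        out[t] = S @ Q[t]
    return out
```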
414 | A Closer Look at Deep Learning Phenomena Through A Telescoping Lens Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate that this model presents a pedagogical formalism allowing us to isolate components of the training process even in complex contemporary settings, providing a sharp lens to reason about the effects of design choices such as architecture and optimization strategy, and reveals surprising parallels between neural network learning and gradient boosting. |
Alan Jeffares; Alicia Curth; Mihaela van der Schaar; |
415 | Communication Bounds for The Distributed Experts Problem Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the experts problem in the distributed setting where an expert’s cost needs to be aggregated across multiple servers. |
Zhihao Jia; Qi Pang; Trung Tran; David Woodruff; Zhihao Zhang; Wenting Zheng; |
416 | Implicit Multimodal Alignment: On The Generalization of Frozen LLMs to Multimodal Inputs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we expose frozen LLMs to image, video, audio and text inputs and analyse their internal representations in an attempt to understand their generalization beyond textual inputs. |
Mustafa Shukor; Matthieu Cord; |
417 | Mitigating Fine-tuning Based Jailbreak Attack with Backdoor Enhanced Safety Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To effectively defend against the FJAttack with limited safety examples under LMaaS, we propose the Backdoor Enhanced Safety Alignment method inspired by an analogy with the concept of backdoor attacks. |
Jiongxiao Wang; Jiazhao LI; Yiquan Li; Xiangyu Qi; Junjie Hu; Sharon Li; Patrick McDaniel; Muhao Chen; Bo Li; Chaowei Xiao; |
418 | Game-Traversal-Benchmark: Evaluating Planning Abilities Of Large Language Models Via Traversing 2D Game Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: They have also shown potential outside the natural language domain, but can LLMs plan? There has been a debate around this question. We contribute to this debate by proposing Game-Traversal-Benchmark (GTB), a benchmark consisting of diverse 2D grid-based game maps to evaluate the planning and reasoning abilities of an LLM. |
Muhammad Umair Nasir; Steven James; Julian Togelius; |
419 | How Do Large Language Models Acquire Factual Knowledge During Pretraining? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the recent observation that large language models (LLMs) can store substantial factual knowledge, there is a limited understanding of the mechanisms of how they acquire factual knowledge through pretraining. This work addresses this gap by studying how LLMs acquire factual knowledge during pretraining. |
Hoyeon Chang; Jinho Park; Seonghyeon Ye; Sohee Yang; Youngkyung Seo; Du-Seong Chang; Minjoon Seo; |
420 | One-shot Federated Learning Via Synthetic Distiller-Distillate Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, they may encounter scalability issues with complex datasets due to inherent two-step information loss: first, during local training (from data to model), and second, when transferring knowledge to the server model (from model to inversed data). In this paper, we propose FedSD2C, a novel and practical one-shot FL framework designed to address these challenges. |
JUNYUAN ZHANG; Songhua Liu; Xinchao Wang; |
421 | RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve perception robustness, we leverage the recent advances in automotive radars and introduce a novel approach that utilizes 4D imaging radar sensors for 3D occupancy prediction. |
Fangqiang Ding; Xiangyu Wen; Yunzhou Zhu; Yiming Li; Chris Xiaoxuan Lu; |
422 | FlexPlanner: Flexible 3D Floorplanning Via Deep Reinforcement Learning in Hybrid Action Space with Multi-Modality Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, they typically face difficulties in aligning the cross-die modules in 3D ICs due to their heuristic representations, which could potentially result in severe data transfer failures. To address these issues, we propose FlexPlanner, a flexible learning-based method in hybrid action space with multi-modality representation to simultaneously handle position, aspect ratio, and alignment of blocks. |
Ruizhe Zhong; Xingbo Du; Shixiong Kai; Zhentao Tang; Siyuan Xu; Jianye Hao; Mingxuan Yuan; Junchi Yan; |
423 | Learning Cooperative Trajectory Representations for Motion Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a forecasting-oriented representation paradigm to utilize motion and interaction features from cooperative information. |
Hongzhi Ruan; Haibao Yu; Wenxian Yang; Siqi Fan; Zaiqing Nie; |
424 | Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we observe significant limitations in the diversity of questions and breadth of subjects covered by these benchmarks. To address this issue, we present the MATH-Vision (MATH-V) dataset, a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts sourced from real math competitions. |
Ke Wang; Junting Pan; Weikang Shi; Zimu Lu; Houxing Ren; Aojun Zhou; Mingjie Zhan; Hongsheng Li; |
425 | Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose UniKE, a novel multimodal editing method that establishes a unified perspective and paradigm for intrinsic knowledge editing and external knowledge resorting. |
Kaihang Pan; Zhaoyu Fan; Juncheng Li; Qifan Yu; Hao Fei; Siliang Tang; Richang Hong; Hanwang Zhang; QIANRU SUN; |
426 | LACIE: Listener-Aware Finetuning for Calibration in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To calibrate both implicit and explicit confidence markers, we introduce a pragmatic, listener-aware finetuning method (LACIE) that directly models the listener, considering not only whether an answer is right, but whether it will be accepted by a listener. |
Elias Stengel-Eskin; Peter Hase; Mohit Bansal; |
427 | AnonFair: A Flexible Toolkit for Algorithmic Fairness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present AnonFair, a new open source toolkit for enforcing algorithmic fairness. |
Eoin Delaney; Zihao Fu; Chris Russell; |
428 | SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Spatial Region GPT (SpatialRGPT) to enhance VLMs’ spatial perception and reasoning capabilities. |
AnChieh Cheng; Hongxu Yin; Yang Fu; Qiushan Guo; Ruihan Yang; Jan Kautz; Xiaolong Wang; Sifei Liu; |
429 | Learning Partitions from Context Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we study the problem of learning the structure of a discrete set of tokens from their interaction with other tokens. |
Simon Buchholz; |
430 | APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents APIGen, an automated data generation pipeline designed to produce verifiable high-quality datasets for function-calling applications. |
Zuxin Liu; Thai Hoang; Jianguo Zhang; Ming Zhu; Tian Lan; Shirley kokane; Juntao Tan; Weiran Yao; Zhiwei Liu; Yihao Feng; Rithesh R N; Liangwei Yang; Silvio Savarese; Juan Carlos Niebles; Huan Wang; Shelby Heinecke; Caiming Xiong; |
431 | Multi-Label Learning with Stronger Consistency Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a detailed study of surrogate losses and algorithms for multi-label learning, supported by $H$-consistency bounds. |
Anqi Mao; Yutao Zhong; Mehryar Mohri; |
432 | Realizable $H$-Consistent and Bayes-Consistent Loss Functions for Learning to Defer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a comprehensive study of surrogate loss functions for learning to defer. |
Anqi Mao; Yutao Zhong; Mehryar Mohri; |
433 | CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation. |
Leying Zhang; Yao Qian; Long Zhou; Shujie LIU; Dongmei Wang; Xiaofei Wang; Midia Yousefi; Yanmin Qian; Jinyu Li; Lei He; sheng zhao; Michael Zeng; |
434 | L4GM: Large 4D Gaussian Reconstruction Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input — in a single feed-forward pass that takes only a second. |
Jiawei Ren; Cheng Xie; Ashkan Mirzaei; hanxue liang; xiaohui zeng; Karsten Kreis; Ziwei Liu; Antonio Torralba; Sanja Fidler; Seung Wook Kim; Huan Ling; |
435 | CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce CARES and aim to comprehensively evaluate the Trustworthiness of Med-LVLMs across the medical domain. |
Peng Xia; Ze Chen; Juanxi Tian; Yangrui Gong; Ruibo Hou; Yue Xu; Zhenbang Wu; Zhiyuan Fan; Yiyang Zhou; Kangyu Zhu; Wenhao Zheng; Zhaoyang Wang; Xiao Wang; Xuchao Zhang; Chetan Bansal; Marc Niethammer; Junzhou Huang; Hongtu Zhu; Yun Li; Jimeng Sun; Zongyuan Ge; Gang Li; James Zou; Huaxiu Yao; |
436 | Guiding A Diffusion Model with A Bad Version of Itself Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We make the surprising observation that it is possible to obtain disentangled control over image quality without compromising the amount of variation by guiding generation using a smaller, less-trained version of the model itself rather than an unconditional model. |
Tero Karras; Miika Aittala; Tuomas Kynkäänniemi; Jaakko Lehtinen; Timo Aila; Samuli Laine; |
437 | Elo Uncovered: Robustness and Best Practices in Language Model Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct an extensive evaluation of Elo behaviour, illustrating that individual Elo computations exhibit volatility, and investigate the impact of varying the Elo rating system’s hyperparameters. |
Meriem Boubdir; Edward Kim; Beyza Ermis; Sara Hooker; Marzieh Fadaee; |
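For reference, the standard Elo update whose hyperparameters (notably the K-factor) the paper stress-tests:

```python
# Standard Elo update: logistic expected score, then a K-weighted correction.
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta
```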
438 | Data Mixture Inference Attack: BPE Tokenizers Reveal Training Data Compositions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we tackle a task which we call *data mixture inference*, which aims to uncover the distributional make-up of the pretraining data. |
Jonathan Hayase; Alisa Liu; Yejin Choi; Sewoong Oh; Noah Smith; |
439 | Theoretical and Empirical Insights Into The Origins of Degree Bias in Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We validate our theoretical findings on 8 common real-world networks, and based on our theoretical and empirical insights, describe a roadmap to alleviate degree bias. |
Arjun Subramonian; Jian Kang; Yizhou Sun; |
440 | Diffusion Models Are Certifiably Robust Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we prove that diffusion classifiers possess $O(1)$ Lipschitzness, and establish their certified robustness, demonstrating their inherent resilience. |
Huanran Chen; Yinpeng Dong; Shitong Shao; Hao Zhongkai; Xiao Yang; Hang Su; Jun Zhu; |
441 | EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To explore the feasibility of training a text-to-image generation model comparable to advanced models using publicly available resources, we introduce EvolveDirector. |
Rui Zhao; Hangjie Yuan; Yujie Wei; Shiwei Zhang; Yuchao Gu; Lingmin Ran; Xiang Wang; Jay Zhangjie Wu; David Junhao Zhang; Yingya Zhang; Mike Zheng Shou; |
442 | SAFE: Slow and Fast Parameter-Efficient Tuning for Continual Learning with Pre-Trained Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, freezing the parameters in incremental sessions hinders models’ plasticity to novel concepts not covered in the first session. To solve the above issues, we propose a Slow And Fast parameter-Efficient tuning (SAFE) framework. |
Linglan Zhao; Xuerui Zhang; Weiran Huang; Ke Yan; Shouhong Ding; |
443 | A Simple Image Segmentation Framework Via In-Context Examples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods still struggle with task ambiguity in in-context segmentation, as not all in-context examples can accurately convey the task information. In order to address this issue, we present SINE, a simple image $\textbf{S}$egmentation framework utilizing $\textbf{in}$-context $\textbf{e}$xamples. |
Yang Liu; Chenchen Jing; Hengtao Li; Muzhi Zhu; Hao Chen; Xinlong Wang; Chunhua Shen; |
444 | Stress-Testing Capability Elicitation With Password-Locked Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the conditions under which fine-tuning-based elicitation suffices to elicit capabilities. |
Ryan Greenblatt; Fabien Roger; Dmitrii Krasheninnikov; David Krueger; |
445 | Consistency Diffusion Bridge Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, DDBM’s sampling process typically requires hundreds of network evaluations to achieve decent performance, which may impede their practical deployment due to high computational demands. In this work, inspired by the recent advance of consistency models in DMs, we tackle this problem by learning the consistency function of the probability-flow ordinary differential equation (PF-ODE) of DDBMs, which directly predicts the solution at a starting step given any point on the ODE trajectory. |
Guande He; Kaiwen Zheng; Jianfei Chen; Fan Bao; Jun Zhu; |
446 | Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Visualized In-Context Text Processing (VisInContext), which processes long in-context text using visual tokens. |
Jinpeng Wang; Linjie Li; Yiqi Lin; Min Li; Lijuan Wang; Mike Zheng Shou; |
447 | $\textit{Bifröst}$: 3D-Aware Image Composing with Language Instructions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces $\textit{Bifröst}$, a novel 3D-aware framework that is built upon diffusion models to perform instruction-based image composition. |
Lingxiao Li; Kaixiong Gong; Wei-Hong Li; xili dai; Tao Chen; Xiaojun Yuan; Xiangyu Yue; |
448 | ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel multi-modal Zero-shot Offboard Panoptic Perception (ZOPP) framework for autonomous driving scenes. |
Tao MA; Hongbin Zhou; Qiusheng Huang; Xuemeng Yang; Jianfei Guo; Bo Zhang; Min Dou; Yu Qiao; Botian Shi; Hongsheng Li; |
449 | Geometric Trajectory Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose geometric trajectory diffusion models (GeoTDM), the first diffusion model for modeling the temporal distribution of 3D geometric trajectories. |
Jiaqi Han; Minkai Xu; Aaron Lou; Haotian Ye; Stefano Ermon; |
450 | Lexicon3D: Probing Visual Encoding Models for Complex 3D Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the optimal scene encoding strategies for various scenarios remain unclear, particularly compared to their image-based counterparts. To address this issue, we present a comprehensive study that probes various visual encoding models for 3D scene understanding, identifying the strengths and limitations of each model across different scenarios. |
Yunze Man; Shuhong Zheng; Zhipeng Bao; Martial Hebert; Liangyan Gui; Yu-Xiong Wang; |
451 | PediatricsGPT: Large Language Models As Chinese Medical Assistants for Pediatric Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the continuous pre-training phase, we introduce a hybrid instruction pre-training mechanism to mitigate the internal-injected knowledge inconsistency of LLMs for medical domain adaptation. |
Dingkang Yang; Jinjie Wei; Dongling Xiao; Shunli Wang; Tong Wu; Gang Li; Mingcheng Li; Shuaibing Wang; Jiawei Chen; Yue Jiang; Qingyao Xu; Ke Li; Peng Zhai; Lihua Zhang; |
452 | HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed Via Gaussian Splatting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new framework, High Dynamic Range Gaussian Splatting (HDR-GS), which can efficiently render novel HDR views and reconstruct LDR images with a user-input exposure time. |
Yuanhao Cai; Zihao Xiao; Yixun Liang; Minghan Qin; Yulun Zhang; Xiaokang Yang; Yaoyao Liu; Alan Yuille; |
453 | Transferable Boltzmann Generators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, flow matching has been employed to train Boltzmann Generators for small molecular systems in Cartesian coordinates. We extend this work and propose the first framework for Boltzmann Generators that are transferable across chemical space, such that they predict zero-shot Boltzmann distributions for test molecules without being retrained for these systems. |
Leon Klein; Frank Noe; |
454 | PromptFix: You Prompt and We Fix The Photo Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, the stochastic nature of the diffusion process leads to deficiencies in image generation or editing tasks that require the detailed preservation of the generated images. To address these limitations, we propose PromptFix, a comprehensive framework that enables diffusion models to follow human instructions to perform a wide variety of image-processing tasks. |
Yongsheng Yu; Ziyun Zeng; Hang Hua; Jianlong Fu; Jiebo Luo; |
455 | HEST-1k: A Dataset For Spatial Transcriptomics and Histology Image Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we introduce HEST-1k, a collection of 1,108 spatial transcriptomic profiles, each linked to a whole-slide image (WSI) and metadata. |
Guillaume Jaume; Paul Doucet; Andrew Song; Ming Y. Lu; Cristina Almagro Pérez; Sophia Wagner; Anurag Vaidya; Richard Chen; Drew Williamson; Ahrong Kim; Faisal Mahmood; |
456 | MiniCache: KV Cache Compression in Depth Dimension for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a simple yet effective approach, called MiniCache, to compress the KV cache across layers from a novel depth perspective, significantly reducing the memory footprint for LLM inference. |
Akide Liu; Jing Liu; Zizheng Pan; Yefei He; Reza Haffari; Bohan Zhuang; |
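The highlight for entry 456 only states that the KV cache is compressed across the depth dimension. As a rough, hypothetical illustration of what depth-wise cache sharing can look like, here is a minimal Python sketch in which pairs of adjacent deep layers share one averaged (K, V) entry; the function name, the simple averaging rule, and the `start_layer` cutoff are our own assumptions, not MiniCache's actual merging algorithm.

```python
import torch

def merge_kv_across_depth(kv_layers, start_layer=16):
    """Toy depth-wise KV-cache compression: beyond `start_layer`, each pair
    of adjacent layers shares a single averaged (K, V) entry, roughly
    halving deep-layer cache memory.

    kv_layers: list of (K, V) tensor pairs, each [batch, heads, seq, dim].
    """
    merged = list(kv_layers[:start_layer])        # shallow layers kept intact
    i = start_layer
    while i < len(kv_layers):
        if i + 1 < len(kv_layers):
            (k1, v1), (k2, v2) = kv_layers[i], kv_layers[i + 1]
            merged.append(((k1 + k2) / 2, (v1 + v2) / 2))  # one entry serves two layers
            i += 2
        else:
            merged.append(kv_layers[i])           # odd layer out is kept as-is
            i += 1
    return merged

# Example: 32 layers with batch=1, 8 heads, 128 tokens, head dim 64.
kv = [(torch.randn(1, 8, 128, 64), torch.randn(1, 8, 128, 64)) for _ in range(32)]
compressed = merge_kv_across_depth(kv)            # 16 + 8 = 24 entries instead of 32
```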
457 | Improving Context-Aware Preference Modeling for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we contribute several *context-conditioned* preference datasets and accompanying experiments that investigate the ability of language models to evaluate context-specific preferences. |
Silviu Pitis; Ziang Xiao; Nicolas Le Roux; Alessandro Sordoni; |
458 | Unleashing The Potential of The Diffusion Model in Few-shot Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Drawing from the extensive potential unveiled by the Diffusion Model in both semantic correspondence and open vocabulary segmentation, our work initiates an investigation into employing the Latent Diffusion Model for Few-shot Semantic Segmentation. |
Muzhi Zhu; Yang Liu; Zekai Luo; Chenchen Jing; Hao Chen; Guangkai Xu; Xinlong Wang; Chunhua Shen; |
459 | Self-Refining Diffusion Samplers: Enabling Parallelization Via Parareal Iterations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, we introduce Self-Refining Diffusion Samplers (SRDS) that retain sample quality and can improve latency at the cost of additional parallel compute. |
Nikil Selvam; Amil Merchant; Stefano Ermon; |
460 | Text to Blind Motion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce BlindWays, the first multimodal motion benchmark for pedestrians who are blind. |
Hee Jae Kim; Kathakoli Sengupta; Masaki Kuribayashi; Hernisa Kacorri; Eshed Ohn-Bar; |
461 | Learning De-Biased Representations for Remote-Sensing Imagery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose debLoRA—a generic training approach that works with any LoRA variants to yield debiased features. |
Zichen Tian; Zhaozheng Chen; Qianru Sun; |
462 | DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present DreamScene4D, the first approach to generate 3D dynamic scenes of multiple objects from monocular videos via 360-degree novel view synthesis. |
Wen-Hsuan Chu; Lei Ke; Katerina Fragkiadaki; |
463 | Learning 3D Garment Animation from Trajectories of A Piece of Cloth Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, instead of using garment-wise supervised learning, we adopt a disentangled scheme to learn how to animate observed garments. |
Yidi Shao; Chen Change Loy; Bo Dai; |
464 | Measuring Per-Unit Interpretability at Scale Without Humans Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the first scalable method to measure the per-unit interpretability in vision DNNs. |
Roland S. Zimmermann; David Klindt; Wieland Brendel; |
465 | Spatio-Spectral Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, key limitations of *ℓ*-step MPGNNs are that their receptive field is typically limited to the *ℓ*-hop neighborhood of a node and that information exchange between distant nodes is limited by over-squashing. Motivated by these limitations, we propose *Spatio-Spectral Graph Neural Networks (S²GNNs)* – a new modeling paradigm for Graph Neural Networks (GNNs) that synergistically combines spatially and spectrally parametrized graph filters. |
Simon Geisler; Arthur Kosmala; Daniel Herbst; Stephan Günnemann; |
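To make the combination in entry 465 concrete, here is a toy one-layer sketch (the names, shapes, and additive combination are our assumptions, not the S²GNN parameterization): a spatial message-passing filter with a one-hop receptive field is summed with a spectral filter applied in the Laplacian eigenbasis, whose receptive field is global.

```python
import torch

def spatio_spectral_layer(X, A_hat, U, lam, W_spatial, gain):
    """Toy layer combining a spatial (message-passing) graph filter with a
    spectral filter applied in the graph Laplacian's eigenbasis.

    X: (n, d) node features;  A_hat: (n, n) normalized adjacency.
    U: (n, n) Laplacian eigenvectors;  lam: (n,) eigenvalues.
    W_spatial: (d, d) weights;  gain: maps eigenvalues to per-frequency gains.
    """
    spatial = A_hat @ X @ W_spatial                      # local, 1-hop aggregation
    spectral = U @ (gain(lam).unsqueeze(1) * (U.T @ X))  # global, frequency-domain filter
    return spatial + spectral

# e.g. gain = lambda lam: torch.exp(-lam)  # a smooth low-pass spectral filter
```

The real S²GNNs are more refined, but the sketch shows why a spectral branch sidesteps the ℓ-hop receptive-field limit: it mixes information across all nodes at once.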
466 | No Zero-Shot Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is unclear how meaningful the notion of zero-shot generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream concepts targeted during zero-shot evaluation. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets? |
Vishaal Udandarao; Ameya Prabhu; Adhiraj Ghosh; Yash Sharma; Philip Torr; Adel Bibi; Samuel Albanie; Matthias Bethge; |
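The title gives away the headline finding: zero-shot performance scales log-linearly with pretraining concept frequency, so linear gains require exponentially more data. A minimal sketch (setup and names ours) of how one would test such a relationship:

```python
import numpy as np

def fit_log_linear(concept_freq, zero_shot_acc):
    """Fit accuracy ~ a * log(frequency) + b across concepts. A good fit
    with a > 0 indicates log-linear scaling: each additive accuracy gain
    costs a multiplicative increase in pretraining concept frequency.
    """
    slope, intercept = np.polyfit(np.log(concept_freq), np.asarray(zero_shot_acc), deg=1)
    return slope, intercept
```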
467 | The Best of Both Worlds: Toward An Honest and Helpful Large Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose a training-free method named Curiosity-Driven Prompting, which enables LLMs to express their internal confusion and uncertainty about the given query and then optimize their responses. |
Gao Chujie; Qihui Zhang; Dongping Chen; Yue Huang; Siyuan Wu; Zhengyan Fu; Yao Wan; Xiangliang Zhang; Lichao Sun; |
468 | Breaking The Multi-Task Barrier in Meta-Reinforcement Learning with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is difficult to scale towards more general behavior without confronting challenges in multi-task optimization, but few solutions are compatible with meta-RL’s goal of learning from large training sets of unlabeled tasks. To address this challenge, we revisit the idea that multi-task RL is bottlenecked by imbalanced training losses created by uneven return scales across different tasks. |
Jake Grigsby; Justin Sasek; Samyak Parajuli; Ikechukwu D. Adebi; Amy Zhang; Yuke Zhu; |
469 | Federated Fine-tuning of Large Language Models Under Heterogeneous Tasks and Client Resources Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While promising, it raises significant challenges due to the heterogeneous resources and data distributions of clients. This study introduces FlexLoRA, a simple yet effective aggregation scheme for LLM fine-tuning, which mitigates the bucket effect in traditional FL that restricts the potential of clients with ample resources by tying them to the capabilities of the least-resourced participants. |
Jiamu Bai; Daoyuan Chen; Bingchen Qian; Liuyi Yao; Yaliang Li; |
470 | Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that the semantic structure of CLIP’s latent space can be leveraged to provide interpretability, at no cost to downstream performance, by decomposing representations into semantic concepts. |
Usha Bhalla; Alex Oesterling; Suraj Srinivas; Flavio Calmon; Himabindu Lakkaraju; |
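Entry 470's decomposition can be pictured as sparse coding over a dictionary of concept embeddings. Below is our own minimal rendition using an off-the-shelf nonnegative lasso solver; SpLiCE's actual dictionary construction and optimization are specified in the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_concept_decomposition(embedding, concept_dict, alpha=0.01):
    """Decompose an embedding into a sparse, nonnegative combination of
    concept vectors by solving a nonnegative lasso over the dictionary.

    embedding:    (d,) array, e.g. a CLIP image embedding.
    concept_dict: (n_concepts, d) array of concept embeddings D.
    """
    x = np.asarray(embedding)
    solver = Lasso(alpha=alpha, positive=True, fit_intercept=False, max_iter=5000)
    solver.fit(np.asarray(concept_dict).T, x)   # columns of D.T are the concepts
    return solver.coef_                         # (n_concepts,) sparse weights

# w = sparse_concept_decomposition(x, D); np.argsort(w)[::-1][:5] gives the top concepts.
```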
471 | Invariant Tokenization for Language Model Enabled Crystal Materials Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prior studies used the crystallographic information file (CIF) stream, which fails to ensure SE(3) and periodic invariance and may not lead to unique sequence representations for a given crystal structure. Here, we propose a novel method, known as Mat2Seq, to tackle this challenge. |
Keqiang Yan; Xiner Li; Hongyi Ling; Kenna Ashen; Carl Edwards; Raymundo Arroyave; Marinka Zitnik; Heng Ji; Xiaofeng Qian; Xiaoning Qian; Shuiwang Ji; |
472 | A Simplicity Bias in The Learning Dynamics of Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To conduct this analysis, we develop a procedure to generate *clones* of a given natural language data set, which capture the interactions between tokens up to a specified order. |
Riccardo Rende; Federica Gerace; Alessandro Laio; Sebastian Goldt; |
473 | CountGD: Multi-Modal Open-World Counting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The goal of this paper is to improve the generality and accuracy of open-vocabulary object counting in images. |
Niki Amini-Naieni; Tengda Han; Andrew Zisserman; |
474 | Poseidon: Efficient Foundation Models for PDEs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Poseidon, a foundation model for learning the solution operators of PDEs. |
Maximilian Herde; Bogdan Raonic; Tobias Rohner; Roger Käppeli; Roberto Molinaro; Emmanuel de Bézenac; Siddhartha Mishra; |
475 | AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present AMOR, an agent framework based on open-source LLMs, which reasons with external knowledge bases and adapts to specific domains through human supervision of the reasoning process. |
Jian Guan; Wei Wu; Zujie Wen; Peng Xu; Hongning Wang; Minlie Huang; |
476 | CAT3D: Create Anything in 3D with Multi-View Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Advances in 3D reconstruction have enabled high-quality 3D capture, but require a user to collect hundreds to thousands of images to create a 3D scene. We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. |
Ruiqi Gao; Aleksander Holynski; Philipp Henzler; Arthur Brussee; Ricardo Martin Brualla; Pratul Srinivasan; Jonathan Barron; Ben Poole; |
477 | G2D: From Global to Dense Radiography Representation Learning Via Vision-Language Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This focus hinders the learning of dense (pixel-level) visual features and is suboptimal for dense prediction tasks (e.g., medical image segmentation). To address this challenge, we propose a novel medical VLP framework, named **Global to Dense level representation learning (G2D)**, which aims to learn global and dense visual features simultaneously using only image-text pairs without extra annotations. |
Che Liu; Cheng Ouyang; Sibo Cheng; Anand Shah; Wenjia Bai; Rossella Arcucci; |
478 | HelpSteer 2: Open-source Dataset for Training Top-performing Reward Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve upon both generated responses and attribute labeling quality, we release HelpSteer2, a permissively licensed preference dataset (CC-BY-4.0). |
Zhilin Wang; Yi Dong; Olivier Delalleau; Jiaqi Zeng; Gerald Shen; Daniel Egert; Jimmy Zhang; Makesh Narsimhan Sreedhar; Oleksii Kuchaiev; |
479 | Benchmarking Complex Instruction-Following with Multiple Constraints Composition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a hierarchical taxonomy for complex instructions, including 4 constraint types, 19 constraint dimensions, and 4 composition types, and manually collect a high-quality dataset accordingly. |
Bosi Wen; Pei Ke; Xiaotao Gu; Lindong Wu; Hao Huang; Jinfeng Zhou; Wenchuang Li; Binxin Hu; Wendy Gao; Jiaxing Xu; Yiming Liu; Jie Tang; Hongning Wang; Minlie Huang; |
480 | Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Web-scale visual entity recognition, the task of associating images with their corresponding entities within vast knowledge bases like Wikipedia, presents significant challenges due to the lack of clean, large-scale training data. In this paper, we propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation. |
Mathilde Caron; Alireza Fathi; Cordelia Schmid; Ahmet Iscen; |
481 | Learning to Assist Humans Without Inferring Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Theoretically, our work connects ideas from information theory, neuroscience, and reinforcement learning, and charts a path for representations to play a critical role in solving assistive problems. |
Vivek Myers; Evan Ellis; Benjamin Eysenbach; Sergey Levine; Anca Dragan; |
482 | UPS: Unified Projection Sharing for Lightweight Single-Image Super-resolution and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel Unified Projection Sharing algorithm (UPS) to decouple feature extraction from similarity modeling, achieving notable performance. |
Kun Zhou; Xinyu Lin; Zhonghang Liu; Xiaoguang Han; Jiangbo Lu; |
483 | Full-Atom Peptide Design with Geometric Latent Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a generative model for full-atom Peptide design with Geometric LAtent Diffusion (PepGLAD). |
Xiangzhe Kong; Yinjun Jia; Wenbing Huang; Yang Liu; |
484 | Exocentric-to-Egocentric Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Exo2Ego-V, a novel exocentric-to-egocentric diffusion-based video generation method for daily-life skilled human activities where sparse 4-view exocentric viewpoints are configured 360° around the scene. |
Jia-Wei Liu; Weijia Mao; Zhongcong XU; Jussi Keppo; Mike Zheng Shou; |
485 | Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Vista, a generalizable driving world model with high fidelity and versatile controllability. |
Shenyuan Gao; Jiazhi Yang; Li Chen; Kashyap Chitta; Yihang Qiu; Andreas Geiger; Jun Zhang; Hongyang Li; |
486 | Most Influential Subset Selection: Challenges, Promises, and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct a comprehensive analysis of the prevailing approaches in MISS, elucidating their strengths and weaknesses. |
Yuzheng Hu; Pingbang Hu; Han Zhao; Jiaqi Ma; |
487 | SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for The Legal Domain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SaulLM-medium and SaulLM-large, two large language model (LLM) families tailored for the legal sector. |
Pierre Colombo; Telmo Pessoa Pires; Malik Boudiaf; Rui Melo; Gabriel Hautreux; Etienne Malaboeuf; Johanne Charpentier; Dominic Culver; Michael Desa; |
488 | MediQ: Question-Asking LLMs for Adaptive and Reliable Medical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MEDIQ, a framework to simulate realistic clinical interactions, which incorporates a Patient System and an adaptive Expert System. |
Shuyue Stella Li; Vidhisha Balachandran; Shangbin Feng; Jonathan Ilgen; Emma Pierson; Pang Wei Koh; Yulia Tsvetkov; |
489 | A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel multi-view stereo (MVS) framework that removes the need for a depth-range prior. |
Yitong Dong; Yijin Li; Zhaoyang Huang; Weikang Bian; Jingbo Liu; Hujun Bao; Zhaopeng Cui; Hongsheng Li; Guofeng Zhang; |
490 | Breaking The False Sense of Security in Backdoor Defense Through Re-Activation Attack Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More practically, we extend our backdoor re-activation to the black-box scenario, where the defense model can only be queried by the adversary during inference, and develop two effective methods, i.e., query-based and transfer-based backdoor re-activation attacks. |
Mingli Zhu; Siyuan Liang; Baoyuan Wu; |
491 | InfoRM: Mitigating Reward Hacking in RLHF Via Information-Theoretic Reward Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This issue primarily arises from reward misgeneralization, where reward models (RMs) compute reward using spurious features that are irrelevant to human preferences. In this work, we tackle this problem from an information-theoretic perspective and propose a framework for reward modeling, namely InfoRM, by introducing a variational information bottleneck objective to filter out irrelevant information. |
Yuchun Miao; Sen Zhang; Liang Ding; Rong Bao; Lefei Zhang; Dacheng Tao; |
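For context on entry 491: a generic information-bottleneck objective (written in our notation, not necessarily the paper's exact variational form) trains the reward model's internal representation Z to keep what predicts the preference label Y while compressing away the rest of the input X, which is what "filtering out irrelevant information" refers to:

```latex
% Generic information-bottleneck objective: keep mutual information with
% the preference label Y, penalize mutual information with the raw input X.
\max_{p(z \mid x)} \; I(Z; Y) \;-\; \beta \, I(Z; X)
```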
492 | Fishers and Hessians of Continuous Relaxations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore a technique for using the empirical Fisher matrices and Hessians of relaxations to alleviate the training bottleneck that arises from vanishing and exploding gradients in the objective function. |
Felix Petersen; Christian Borgelt; Tobias Sutter; Hilde Kuehne; Oliver Deussen; Stefano Ermon; |
493 | TrAct: Making First-layer Pre-Activations Trainable Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose performing gradient descent on the embeddings produced by the first layer of the model. |
Felix Petersen; Christian Borgelt; Stefano Ermon; |
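One way to make entry 493 concrete: take the gradient step in the space of first-layer pre-activations, then map the updated activations back to weights by ridge-regularized least squares. This is only an illustrative sketch under our own assumptions, not the paper's actual algorithm or derivation.

```python
import torch

def first_layer_activation_step(W, x, grad_z, lr=0.1, ridge=1e-3):
    """Do gradient descent on first-layer pre-activations z = x @ W.T,
    then recover weights that approximately produce the updated z.

    W:      (d_out, d_in) first-layer weights.
    x:      (batch, d_in) input batch.
    grad_z: (batch, d_out) gradient of the loss w.r.t. z.
    """
    z_target = x @ W.T - lr * grad_z                 # step in activation space
    # Ridge least squares: find W' minimizing ||x @ W'.T - z_target||^2 + ridge*||W'||^2.
    xtx = x.T @ x + ridge * torch.eye(x.shape[1])
    return torch.linalg.solve(xtx, x.T @ z_target).T  # (d_out, d_in)
```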
494 | Convolutional Differentiable Logic Gate Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, an approach for learning logic gate networks directly via a differentiable relaxation was proposed. Logic gate networks are faster than conventional neural network approaches because their inference only requires logic gate operators such as NAND, OR, and XOR, which are the underlying building blocks of current hardware and can be efficiently executed. We build on this idea, extending it with deep logic gate tree convolutions, logical OR pooling, and residual initializations. |
Felix Petersen; Hilde Kuehne; Christian Borgelt; Julian Welzel; Stefano Ermon; |
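As background for entry 494, here is a minimal sketch of a differentiable logic-gate neuron, restricted to 4 of the 16 possible two-input gates and using standard probabilistic relaxations (the paper's parameterization is richer): each neuron learns a softmax mixture over candidate gates, and after training only the argmax gate is kept, so inference runs on hardware-native logic ops.

```python
import torch
import torch.nn as nn

class SoftLogicGate(nn.Module):
    """A differentiable two-input 'neuron': a softmax-weighted mixture of
    relaxed logic gates, valid for inputs in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(4))   # one weight per candidate gate

    def forward(self, a, b):
        gates = torch.stack([
            a * b,              # AND
            a + b - a * b,      # OR
            a + b - 2 * a * b,  # XOR
            1 - a * b,          # NAND
        ], dim=-1)
        return (gates * torch.softmax(self.logits, dim=-1)).sum(dim=-1)

# Training stays fully differentiable; afterwards, gate.logits.argmax()
# discretizes the neuron to a single AND/OR/XOR/NAND op for fast inference.
gate = SoftLogicGate()
y = gate(torch.rand(8), torch.rand(8))
```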
495 | Zero-Shot Image Segmentation Via Recursive Normalized Cut on Diffusion Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider a diffusion UNet encoder as a foundation vision encoder and we introduce DiffCut, an unsupervised zero-shot segmentation method that solely harnesses the output features from the final self-attention block. |
Paul Couairon; Mustafa Shukor; Jean-Emmanuel Haugeard; Matthieu Cord; Nicolas Thome; |
496 | OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on the finding that existing tokenizers are tailored to either image or video inputs, this paper presents OmniTokenizer, a transformer-based tokenizer for joint image and video tokenization. |
Junke Wang; Yi Jiang; Zehuan Yuan; Bingyue Peng; Zuxuan Wu; Yu-Gang Jiang; |
497 | InstructG2I: Synthesizing Images from Multimodal Attributed Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we approach an overlooked yet critical task Graph2Image: generating images from multimodal attributed graphs (MMAGs). |
Bowen Jin; Ziqi Pang; Bingjun Guo; Yu-Xiong Wang; Jiaxuan You; Jiawei Han; |
498 | On The Scalability of GNNs for Molecular Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, structure-based architectures such as Graph Neural Networks (GNNs) are yet to show the benefits of scale mainly due to the lower efficiency of sparse operations, large data requirements, and lack of clarity about the effectiveness of various architectures. We address this drawback of GNNs by studying their scaling behavior. |
Maciej Sypetkowski; Frederik Wenkel; Farimah Poursafaei; Nia Dickson; Karush Suri; Philip Fradkin; Dominique Beaini; |
499 | Efficient LLM Scheduling By Learning to Rank Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, most LLM serving systems employ a simple First-come-first-serve (FCFS) scheduling strategy, leading to Head-Of-Line (HOL) blocking and reduced throughput and service quality. In this paper, we reexamine this assumption — we show that, although predicting the exact generation length of each request is infeasible, it is possible to predict the relative ranks of output lengths in a batch of requests, using learning to rank. |
Yichao Fu; Siqi Zhu; Runlong Su; Aurick Qiao; Ion Stoica; Hao Zhang; |
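The key observation in entry 499 is that a scheduler needs only the relative order of generation lengths, not their exact values. A toy sketch (helper names ours) of replacing FCFS with a learned-ranking, approximately shortest-job-first policy:

```python
def rank_based_schedule(requests, predict_rank_score):
    """Order pending requests by a learned score trained (e.g., with a
    pairwise ranking loss) to be larger for longer expected generations.
    Processing lower-scored (shorter) requests first approximates
    shortest-job-first and reduces head-of-line blocking.
    """
    return sorted(requests, key=predict_rank_score)

# FCFS would process `requests` in arrival order; with ranking, short
# generations no longer wait behind long ones.
```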
500 | Antigen-Specific Antibody Design Via Direct Energy-based Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle antigen-specific antibody sequence-structure co-design as an optimization problem towards specific preferences, considering both rationality and functionality. |
Xiangxin Zhou; Dongyu Xue; Ruizhe Chen; Zaixiang Zheng; Liang Wang; Quanquan Gu; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~4,500 papers), please visit Paper Digest: NeurIPS-2024 (Full List).