Paper Digest: ICML 2024 Highlights
Note: ICML 2024 accepted more than 2,600 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can choose to read All 2,600 ICML-2024 papers on a separate page.
To search or review papers within ICML-2024 related to a specific topic, please use the search by venue (ICML-2024), review by venue (ICML-2024) and question answering by venue (ICML-2024) services. To browse papers by author, here is a list of all authors (ICML-2024). You may also like to explore our “Best Paper” Digest (ICML), which lists the most influential ICML papers since 2004.
Based in New York, Paper Digest is dedicated to producing high-quality text analysis results that people can actually use on a daily basis. Since 2018, we have been serving users across the world with a number of exclusive services to track, search, review and rewrite scientific literature.
You are welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: ICML 2024 Highlights
# | Paper | Author(s)
---|---|---
1 | Better & Faster Large Language Models Via Multi-token Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency. |
Fabian Gloeckle; Badr Youbi Idrissi; Baptiste Roziere; David Lopez-Paz; Gabriel Synnaeve; |
2 | Transformers Are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While Transformers have been the main architecture behind deep learning’s success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured *semiseparable matrices*. |
Tri Dao; Albert Gu; |
3 | Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite its better theoretical properties and conceptual simplicity, rectified flow is not yet decisively established as standard practice. In this work, we improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales. |
Patrick Esser; Sumith Kulal; Andreas Blattmann; Rahim Entezari; Jonas Müller; Harry Saini; Yam Levi; Dominik Lorenz; Axel Sauer; Frederic Boesel; Dustin Podell; Tim Dockhorn; Zion English; Robin Rombach; |
4 | Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Medusa, an efficient method that augments LLM inference by adding extra decoding heads to predict multiple subsequent tokens in parallel. |
Tianle Cai; Yuhong Li; Zhengyang Geng; Hongwu Peng; Jason D. Lee; Deming Chen; Tri Dao; |
5 | Improving Factuality and Reasoning in Language Models Through Multiagent Debate Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer. |
Yilun Du; Shuang Li; Antonio Torralba; Joshua B. Tenenbaum; Igor Mordatch; |
6 | How Language Model Hallucinations Can Snowball Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A major risk of using language models in practical applications is their tendency to hallucinate incorrect statements. To study this, we construct three question-answering datasets where LMs often state an incorrect answer which is followed by an explanation with at least one incorrect claim. |
Muru Zhang; Ofir Press; William Merrill; Alisa Liu; Noah A. Smith; |
7 | Chatbot Arena: An Open Platform for Evaluating LLMs By Human Preference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper describes the platform, analyzes the data we have collected so far, and explains the tried-and-true statistical methods we are using for efficient and accurate evaluation and ranking of models. |
Wei-Lin Chiang; Lianmin Zheng; Ying Sheng; Anastasios Nikolas Angelopoulos; Tianle Li; Dacheng Li; Banghua Zhu; Hao Zhang; Michael Jordan; Joseph E. Gonzalez; Ion Stoica; |
8 | R2E: Turning Any Github Repository Into A Programming Agent Environment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Repository to Environment (R2E), a framework that can turn any GitHub repository into a test environment to evaluate the performance of code-generating systems, both static and interactive. |
Naman Jain; Manish Shetty; Tianjun Zhang; King Han; Koushik Sen; Ion Stoica; |
9 | Premise Order Matters in Reasoning with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We first examine the effect of premise ordering on deductive reasoning on a variety of LLMs, and our evaluation shows that even if the model performance is decent on the optimal order, permuting the premise order can cause a performance drop of over 30%. In addition, we release the benchmark R-GSM, based on GSM8K, to examine the ordering effect for mathematical problem-solving, and we again observe a significant drop in accuracy, relative to the original GSM8K benchmark. |
Xinyun Chen; Ryan Andrew Chi; Xuezhi Wang; Denny Zhou; |
10 | Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. |
Chrisantha Fernando; Dylan Sunil Banarse; Henryk Michalewski; Simon Osindero; Tim Rocktäschel; |
11 | Stay on Topic with Classifier-Free Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate across a wide array of benchmarks that CFG can be used broadly as an inference-time technique in pure language modeling. |
Guillaume Sanchez; Alexander Spangher; Honglu Fan; Elad Levi; Stella Biderman; |
12 | RLAIF Vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Across the tasks of summarization, helpful dialogue generation, and harmless dialogue generation, we show that RLAIF achieves comparable performance to RLHF. Furthermore, we take a step towards self-improvement by demonstrating that RLAIF can outperform a supervised fine-tuned baseline even when the AI labeler is the same size as the policy, or even the exact same checkpoint as the initial policy. |
Harrison Lee; Samrat Phatale; Hassan Mansoor; Thomas Mesnard; Johan Ferret; Kellie Ren Lu; Colton Bishop; Ethan Hall; Victor Carbune; Abhinav Rastogi; Sushant Prakash; |
13 | DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, encoding a complex, potentially multimodal data distribution into a single *continuous* Gaussian distribution arguably represents an unnecessarily challenging learning problem. We propose ***Dis**crete-**Co**ntinuous Latent Variable **Diff**usion Models (DisCo-Diff)* to simplify this task by introducing complementary *discrete* latent variables. |
Yilun Xu; Gabriele Corso; Tommi Jaakkola; Arash Vahdat; Karsten Kreis; |
14 | ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This further hinders feedback learning as well as alignment research within the open-source community. To address this issue, we explore how to go beyond human feedback and collect high-quality AI feedback automatically for a scalable alternative. |
Ganqu Cui; Lifan Yuan; Ning Ding; Guanming Yao; Bingxiang He; Wei Zhu; Yuan Ni; Guotong Xie; Ruobing Xie; Yankai Lin; Zhiyuan Liu; Maosong Sun; |
15 | Grokking Group Multiplication with Cosets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on previous work, we completely reverse engineer fully connected one-hidden layer networks that have “grokked” the arithmetic of the permutation groups $S_5$ and $S_6$. The models discover the true subgroup structure of the full group and converge on neural circuits that decompose the group arithmetic using the permutation group’s subgroups. |
Dashiell Stander; Qinan Yu; Honglu Fan; Stella Biderman; |
16 | Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, for the first time, we propose a general and principled approach to optimizing the sampling schedules of DMs for high-quality outputs, called Align Your Steps. |
Amirmojtaba Sabour; Sanja Fidler; Karsten Kreis; |
17 | Disentangled 3D Scene Generation with Layout Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a method to generate 3D scenes that are disentangled into their component objects. |
Dave Epstein; Ben Poole; Ben Mildenhall; Alexei A Efros; Aleksander Holynski; |
18 | Stealing Part of A Production Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI’s ChatGPT or Google’s PaLM-2. |
Nicholas Carlini; Daniel Paleka; Krishnamurthy Dj Dvijotham; Thomas Steinke; Jonathan Hayase; A. Feder Cooper; Katherine Lee; Matthew Jagielski; Milad Nasr; Arthur Conmy; Eric Wallace; David Rolnick; Florian Tramèr; |
19 | Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We primarily question whether the use of large Web-scraped datasets *should* be viewed as differential-privacy-preserving. |
Florian Tramèr; Gautam Kamath; Nicholas Carlini; |
20 | Fast Adversarial Attacks on Language Models In One GPU Minute Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel class of fast, beam search-based adversarial attack (BEAST) for Language Models (LMs). |
Vinu Sankar Sadasivan; Shoumik Saha; Gaurang Sriramanan; Priyatham Kattakinda; Atoosa Chegini; Soheil Feizi; |
21 | Training Large Language Models for Reasoning Through Reverse Curriculum Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose **R**$^3$: Learning **R**easoning through **R**everse Curriculum **R**einforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models. |
Zhiheng Xi; Wenxiang Chen; Boyang Hong; Senjie Jin; Rui Zheng; Wei He; Yiwen Ding; Shichun Liu; Xin Guo; Junzhe Wang; Honglin Guo; Wei Shen; Xiaoran Fan; Yuhao Zhou; Shihan Dou; Xiao Wang; Xinbo Zhang; Peng Sun; Tao Gui; Qi Zhang; Xuanjing Huang; |
22 | Compositional Image Decomposition with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a method to decompose an image into such compositional components. |
Jocelin Su; Nan Liu; Yanbo Wang; Joshua B. Tenenbaum; Yilun Du; |
23 | Learning Iterative Reasoning Through Energy Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks by formulating reasoning and decision-making problems with energy-based optimization. |
Yilun Du; Jiayuan Mao; Joshua B. Tenenbaum; |
24 | Position: Compositional Generative Modeling: A Single Model Is Not All You Need Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that we should instead construct large generative systems by composing smaller generative models together. |
Yilun Du; Leslie Pack Kaelbling; |
25 | Potential Based Diffusion Motion Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new approach towards learning potential based motion planning, where we train a neural network to capture and learn easily optimizable potentials over motion planning trajectories. |
Yunhao Luo; Chen Sun; Joshua B. Tenenbaum; Yilun Du; |
26 | LESS: Selecting Influential Data for Targeted Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose LESS, an optimizer-aware and practically efficient algorithm to estimate data influences and perform **L**ow-rank gradi**E**nt **S**imilarity **S**earch for instruction data selection. To facilitate future work, we release code and data at [princeton-nlp/LESS](https://github.com/princeton-nlp/LESS). |
Mengzhou Xia; Sadhika Malladi; Suchin Gururangan; Sanjeev Arora; Danqi Chen; |
27 | Position: Data-driven Discovery with Large Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We instead advocate for fail-proof tool integration, along with active user moderation through feedback mechanisms, to foster data-driven scientific discoveries with efficiency and reproducibility. |
Bodhisattwa Prasad Majumder; Harshit Surana; Dhruv Agarwal; Sanchaita Hazra; Ashish Sabharwal; Peter Clark; |
28 | Genie: Generative Interactive Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Genie, the first *generative interactive environment* trained in an unsupervised manner from unlabelled Internet videos. |
Jake Bruce; Michael D Dennis; Ashley Edwards; Jack Parker-Holder; Yuge Shi; Edward Hughes; Matthew Lai; Aditi Mavalankar; Richie Steigerwald; Chris Apps; Yusuf Aytar; Sarah Maria Elisabeth Bechtle; Feryal Behbahani; Stephanie C.Y. Chan; Nicolas Heess; Lucy Gonzalez; Simon Osindero; Sherjil Ozair; Scott Reed; Jingwei Zhang; Konrad Zolna; Jeff Clune; Nando de Freitas; Satinder Singh; Tim Rocktäschel; |
29 | Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a framework called Patchscopes and show how it can be used to answer a wide range of questions about an LLM’s computation. |
Asma Ghandeharioun; Avi Caciularu; Adam Pearce; Lucas Dixon; Mor Geva; |
30 | Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, future superhuman models will behave in complex ways too difficult for humans to reliably evaluate; humans will only be able to *weakly supervise* superhuman models. We study an analogy to this problem: can weak model supervision elicit the full capabilities of a much stronger model? |
Collin Burns; Pavel Izmailov; Jan Hendrik Kirchner; Bowen Baker; Leo Gao; Leopold Aschenbrenner; Yining Chen; Adrien Ecoffet; Manas Joglekar; Jan Leike; Ilya Sutskever; Jeffrey Wu; |
31 | Language Models with Conformal Factuality Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose conformal factuality, a framework that can ensure high probability correctness guarantees for LMs by connecting language modeling and conformal prediction. |
Christopher Mohri; Tatsunori Hashimoto; |
32 | Equivariant Graph Neural Operator for Modeling 3D Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Equivariant Graph Neural Operator (EGNO), a novel and principled method that directly models dynamics as trajectories instead of just next-step prediction. |
Minkai Xu; Jiaqi Han; Aaron Lou; Jean Kossaifi; Arvind Ramanathan; Kamyar Azizzadenesheli; Jure Leskovec; Stefano Ermon; Anima Anandkumar; |
33 | Position: Open-Endedness Is Essential for Artificial Superhuman Intelligence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, the creation of open-ended, ever self-improving AI remains elusive. In this position paper, we argue that the ingredients are now in place to achieve *open-endedness* in AI systems with respect to a human observer. |
Edward Hughes; Michael D Dennis; Jack Parker-Holder; Feryal Behbahani; Aditi Mavalankar; Yuge Shi; Tom Schaul; Tim Rocktäschel; |
34 | Magicoder: Empowering Code Generation with OSS-Instruct Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Magicoder, a series of fully open-source (code, weights, and data) Large Language Models (LLMs) for code that significantly closes the gap with top code models while having no more than 7B parameters. |
Yuxiang Wei; Zhe Wang; Jiawei Liu; Yifeng Ding; Lingming Zhang; |
35 | HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods. To address this issue, we introduce HarmBench, a standardized evaluation framework for automated red teaming. |
Mantas Mazeika; Long Phan; Xuwang Yin; Andy Zou; Zifan Wang; Norman Mu; Elham Sakhaee; Nathaniel Li; Steven Basart; Bo Li; David Forsyth; Dan Hendrycks; |
36 | MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose MM-Vet, an evaluation benchmark that examines large multimodal models (LMMs) on complicated multimodal tasks. |
Weihao Yu; Zhengyuan Yang; Linjie Li; Jianfeng Wang; Kevin Lin; Zicheng Liu; Xinchao Wang; Lijuan Wang; |
37 | NExT-GPT: Any-to-Any Multimodal LLM Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fill the gap, we present an end-to-end general-purpose any-to-any MM-LLM system, NExT-GPT. |
Shengqiong Wu; Hao Fei; Leigang Qu; Wei Ji; Tat-Seng Chua; |
38 | Linguistic Calibration of Long-Form Generations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This issue can be mitigated by having the LM verbally convey the probability that its claims are correct, but existing models cannot produce long-form text with calibrated confidence statements. Through the lens of decision-making, we define linguistic calibration for long-form generations: an LM is linguistically calibrated if its generations enable its users to make calibrated probabilistic predictions. |
Neil Band; Xuechen Li; Tengyu Ma; Tatsunori Hashimoto; |
39 | Rolling Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores Rolling Diffusion: a new approach that uses a sliding window denoising process. |
David Ruhe; Jonathan Heek; Tim Salimans; Emiel Hoogeboom; |
40 | Debating with More Persuasive LLMs Leads to More Truthful Answers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In anticipation of this, we ask: can weaker models assess the correctness of stronger models? We investigate this question in an analogous setting, where stronger models (experts) possess the necessary information to answer questions and weaker models (non-experts) lack this information. |
Akbir Khan; John Hughes; Dan Valentine; Laura Ruis; Kshitij Sachan; Ansh Radhakrishnan; Edward Grefenstette; Samuel R. Bowman; Tim Rocktäschel; Ethan Perez; |
41 | IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We significantly improve multi-view generation by considering video instead of image generators. |
Luke Melas-Kyriazi; Iro Laina; Christian Rupprecht; Natalia Neverova; Andrea Vedaldi; Oran Gafni; Filippos Kokkinos; |
42 | Prismatic VLMs: Investigating The Design Space of Visually-Conditioned Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the volume of new releases, key design decisions around image preprocessing, architecture, and optimization are under-explored, making it challenging to understand what factors account for model performance, a challenge further complicated by the lack of objective, consistent evaluations. To address these gaps, we first compile a suite of standardized evaluations spanning visual question answering, object localization, and challenge sets that probe properties such as hallucination; evaluations that provide fine-grained insight into VLM capabilities. |
Siddharth Karamcheti; Suraj Nair; Ashwin Balakrishna; Percy Liang; Thomas Kollar; Dorsa Sadigh; |
43 | Learning to Route Among Specialized Experts for Zero-Shot Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose $\textbf{P}$ost-$\textbf{H}$oc $\textbf{A}$daptive $\textbf{T}$okenwise $\textbf{G}$ating $\textbf{O}$ver an $\textbf{O}$cean of $\textbf{S}$pecialized $\textbf{E}$xperts (**PHATGOOSE**), which learns to route among specialized modules that were produced through parameter-efficient fine-tuning. |
Mohammed Muqeeth; Haokun Liu; Yufan Liu; Colin Raffel; |
44 | Mechanistic Design and Scaling of Hybrid Architectures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation. We set out to simplify this process by grounding it in an end-to-end mechanistic architecture design (MAD) pipeline, encompassing small-scale capability unit tests predictive of scaling laws. |
Michael Poli; Armin W Thomas; Eric Nguyen; Pragaash Ponnusamy; Björn Deiseroth; Kristian Kersting; Taiji Suzuki; Brian Hie; Stefano Ermon; Christopher Re; Ce Zhang; Stefano Massaroli; |
45 | Does Label Smoothing Help Deep Partial Label Learning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In theory, we prove lower and upper bounds of the expected risk to show that label smoothing can help deep PLL. |
Xiuwen Gong; Nitin Bisht; Guandong Xu; |
46 | Learning to Explore in POMDPs with Informational Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we design a POMDP agent that gathers information about the hidden state, using ideas from the meta-exploration literature. |
Annie Xie; Logan Mondal Bhamidipaty; Evan Zheran Liu; Joey Hong; Sergey Levine; Chelsea Finn; |
47 | Fundamental Limitations of Alignment in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a theoretical approach called Behavior Expectation Bounds (BEB) which allows us to formally investigate several inherent characteristics and limitations of alignment in large language models. |
Yotam Wolf; Noam Wies; Oshri Avnery; Yoav Levine; Amnon Shashua; |
48 | Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that the reliance on self-attention for visual representation learning is not necessary and propose a new generic vision backbone with bidirectional Mamba blocks (Vim), which marks the image sequences with position embeddings and compresses the visual representation with bidirectional state space models. |
Lianghui Zhu; Bencheng Liao; Qian Zhang; Xinlong Wang; Wenyu Liu; Xinggang Wang; |
49 | Online Conformal Prediction with Decaying Step Sizes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a method for online conformal prediction with decaying step sizes. |
Anastasios Nikolas Angelopoulos; Rina Barber; Stephen Bates; |
50 | Monitoring AI-Modified Content at Scale: A Case Study on The Impact of ChatGPT on AI Conference Peer Reviews Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). |
Weixin Liang; Zachary Izzo; Yaohui Zhang; Haley Lepp; Hancheng Cao; Xuandong Zhao; Lingjiao Chen; Haotian Ye; Sheng Liu; Zhi Huang; Daniel McFarland; James Y. Zou; |
51 | Graph Positional and Structural Encoder Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we present the Graph Positional and Structural Encoder (GPSE), the first-ever graph encoder designed to capture rich PSE representations for augmenting any GNN. |
Semih Cantürk; Renming Liu; Olivier Lapointe-Gagné; Vincent Létourneau; Guy Wolf; Dominique Beaini; Ladislav Rampášek; |
52 | Model Alignment As Prospect Theoretic Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using a Kahneman-Tversky model of human utility, we propose a HALO that directly maximizes the utility of generations instead of maximizing the log-likelihood of preferences, as current methods do. |
Kawin Ethayarajh; Winnie Xu; Niklas Muennighoff; Dan Jurafsky; Douwe Kiela; |
53 | Neural Operators with Localized Integral and Differential Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a principled approach to operator learning that can capture local features under two frameworks by learning differential operators and integral operators with locally supported kernels. |
Miguel Liu-Schiaffini; Julius Berner; Boris Bonev; Thorsten Kurth; Kamyar Azizzadenesheli; Anima Anandkumar; |
54 | Position: The No Free Lunch Theorem, Kolmogorov Complexity, and The Role of Inductive Biases in Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Whereas no free lunch theorems seemingly indicate that individual problems require specialized learners, we explain how tasks that often require human intervention, such as picking an appropriately sized model when labeled data is scarce or plentiful, can be automated into a single learning algorithm. |
Micah Goldblum; Marc Anton Finzi; Keefer Rowan; Andrew Gordon Wilson; |
55 | Q-Probe: A Lightweight Approach to Reward Maximization for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an approach called Q-probing to adapt a pre-trained language model to maximize a task-specific reward function. |
Kenneth Li; Samy Jelassi; Hugh Zhang; Sham M. Kakade; Martin Wattenberg; David Brandfonbrener; |
56 | Test-Time Model Adaptation with Only Forward Passes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In light of this, existing methods are often infeasible since they heavily depend on computation-intensive backpropagation for model updating, which may not be supported. To address this, we propose a test-time Forward-Optimization Adaptation (FOA) method. |
Shuaicheng Niu; Chunyan Miao; Guohao Chen; Pengcheng Wu; Peilin Zhao; |
57 | Solving Poisson Equations Using Neural Walk-on-Spheres Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Neural Walk-on-Spheres (NWoS), a novel neural PDE solver for the efficient solution of high-dimensional Poisson equations. |
Hong Chul Nam; Julius Berner; Anima Anandkumar; |
58 | StrokeNUWA—Tokenizing Strokes for Vector Graphic Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we introduce StrokeNUWA, a pioneering work exploring a better visual representation, stroke tokens, for vector graphics, which is inherently rich in visual semantics, naturally compatible with LLMs, and highly compressed. |
Zecheng Tang; Chenfei Wu; Zekai Zhang; Minheng Ni; Shengming Yin; Yu Liu; Zhengyuan Yang; Lijuan Wang; Zicheng Liu; Juntao Li; Nan Duan; |
59 | NExT: Teaching Large Language Models to Reason About Code Execution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, large language models (LLMs) of code are typically trained on the surface textual form of programs, and thus may lack a semantic understanding of how programs execute at run-time. To address this issue, we propose NExT, a method to teach LLMs to inspect the execution traces of programs (variable states of executed lines) and reason about their run-time behavior through chain-of-thought (CoT) rationales. |
Ansong Ni; Miltiadis Allamanis; Arman Cohan; Yinlin Deng; Kensen Shi; Charles Sutton; Pengcheng Yin; |
60 | Interpretability Illusions in The Generalization of Simplified Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we illustrate an important caveat to this assumption: even if the simplified representations can accurately approximate the full model on the training set, they may fail to accurately capture the model’s behavior out of distribution. We illustrate this by training Transformer models on controlled datasets with systematic generalization splits, including the Dyck balanced-parenthesis languages and a code completion task. |
Dan Friedman; Andrew Kyle Lampinen; Lucas Dixon; Danqi Chen; Asma Ghandeharioun; |
61 | Language Models As Science Tutors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this, we introduce TutorEval and TutorChat. Our datasets build on open-source materials, and we release our models, data, and evaluations publicly. |
Alexis Chevalier; Jiayi Geng; Alexander Wettig; Howard Chen; Sebastian Mizera; Toni Annala; Max Aragon; Arturo Rodriguez Fanlo; Simon Frieder; Simon Machado; Akshara Prabhakar; Ellie Thieu; Jiachen T. Wang; Zirui Wang; Xindi Wu; Mengzhou Xia; Wenhan Xia; Jiatong Yu; Junjie Zhu; Zhiyong Ren; Sanjeev Arora; Danqi Chen; |
62 | The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. We release our benchmark and code publicly at https://wmdp.ai. |
Nathaniel Li; Alexander Pan; Anjali Gopal; Summer Yue; Daniel Berrios; Alice Gatti; Justin D. Li; Ann-Kathrin Dombrowski; Shashwat Goel; Gabriel Mukobi; Nathan Helm-Burger; Rassin Lababidi; Lennart Justen; Andrew Bo Liu; Michael Chen; Isabelle Barrass; Oliver Zhang; Xiaoyuan Zhu; Rishub Tamirisa; Bhrugu Bharathi; Ariel Herbert-Voss; Cort B Breuer; Andy Zou; Mantas Mazeika; Zifan Wang; Palash Oswal; Weiran Lin; Adam Alfred Hunt; Justin Tienken-Harder; Kevin Y. Shih; Kemper Talley; John Guan; Ian Steneker; David Campbell; Brad Jokubaitis; Steven Basart; Stephen Fitz; Ponnurangam Kumaraguru; Kallol Krishna Karmakar; Uday Tupakula; Vijay Varadharajan; Yan Shoshitaishvili; Jimmy Ba; Kevin M. Esvelt; Alexandr Wang; Dan Hendrycks; |
63 | How Learning By Reconstruction Produces Uninformative Features For Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the interpretability benefits of reconstruction and generation, we identify a misalignment between learning to reconstruct and learning for perception. |
Randall Balestriero; Yann LeCun; |
64 | Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the Hourglass Diffusion Transformer (HDiT), an image-generative model that exhibits linear scaling with pixel count, supporting training at high resolution (e.g. $1024 \times 1024$) directly in pixel-space. |
Katherine Crowson; Stefan Andreas Baumann; Alex Birch; Tanishq Mathew Abraham; Daniel Z Kaplan; Enrico Shippole; |
65 | RLVF: Learning from Verbal Feedback Without Overgeneralization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new method Contextualized Critiques with Constrained Preference Optimization (C3PO) to learn from high-level verbal feedback while reducing overgeneralization compared to current work. |
Moritz Pascal Stephan; Alexander Khazatsky; Eric Mitchell; Annie S Chen; Sheryl Hsu; Archit Sharma; Chelsea Finn; |
66 | Distinguishing The Knowable from The Unknowable with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the feasibility of identifying *epistemic* uncertainty (reflecting a lack of knowledge), as opposed to *aleatoric* uncertainty (reflecting entropy in the underlying distribution), in the outputs of large language models (LLMs) over free-form text. |
Gustaf Ahdritz; Tian Qin; Nikhil Vyas; Boaz Barak; Benjamin L. Edelman; |
67 | MathScale: Scaling Instruction Tuning for Mathematical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data using frontier LLMs (e.g., GPT-3.5). As a result, we create a mathematical reasoning dataset (MathScaleQA) containing two million math question-answer pairs. |
Zhengyang Tang; Xingxing Zhang; Benyou Wang; Furu Wei; |
68 | InstructSpeech: Following Speech Editing Instructions Via Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we construct triplet paired data (instruction, input speech, output speech) to alleviate data scarcity and train a multi-task large language model named InstructSpeech. |
Rongjie Huang; Ruofan Hu; Yongqi Wang; Zehan Wang; Xize Cheng; Ziyue Jiang; Zhenhui Ye; Dongchao Yang; Luping Liu; Peng Gao; Zhou Zhao; |
69 | Position: Video As The New Language for Real-World Decision Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet video data captures important information about the physical world that is difficult to express in language. To address this gap, we discuss an under-appreciated opportunity to extend video generation to solve tasks in the real world. |
Sherry Yang; Jacob C Walker; Jack Parker-Holder; Yilun Du; Jake Bruce; Andre Barreto; Pieter Abbeel; Dale Schuurmans; |
70 | Retrieval-Augmented Score Distillation for Text-to-3D Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce a novel framework for retrieval-based quality enhancement in text-to-3D generation. |
Junyoung Seo; Susung Hong; Wooseok Jang; Inès Hyeonsu Kim; Min-Seop Kwak; Doyup Lee; Seungryong Kim; |
71 | PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel visual prompting approach for VLMs that we call Prompting with Iterative Visual Optimization (PIVOT), which casts tasks as iterative visual question answering. |
Soroush Nasiriany; Fei Xia; Wenhao Yu; Ted Xiao; Jacky Liang; Ishita Dasgupta; Annie Xie; Danny Driess; Ayzaan Wahid; Zhuo Xu; Quan Vuong; Tingnan Zhang; Tsang-Wei Edward Lee; Kuang-Huei Lee; Peng Xu; Sean Kirmani; Yuke Zhu; Andy Zeng; Karol Hausman; Nicolas Heess; Chelsea Finn; Sergey Levine; brian ichter; |
72 | Connect Later: Improving Fine-tuning for Robustness with Targeted Augmentations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better leverage pretraining for distribution shifts, we propose the Connect Later framework, which fine-tunes the model with targeted augmentations designed with knowledge of the shift. |
Helen Qu; Sang Michael Xie; |
73 | Modeling Caption Diversity in Contrastive Vision-Language Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. |
Samuel Lavoie; Polina Kirichenko; Mark Ibrahim; Mido Assran; Andrew Gordon Wilson; Aaron Courville; Nicolas Ballas; |
74 | Position: Levels of AGI for Operationalizing Progress on The Path to AGI Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. |
Meredith Ringel Morris; Jascha Sohl-Dickstein; Noah Fiedel; Tris Warkentin; Allan Dafoe; Aleksandra Faust; Clement Farabet; Shane Legg; |
75 | UniAudio: Towards Universal Audio Generation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As recent research on large language models (LLMs) has demonstrated their strong ability to handle multiple tasks, this work presents UniAudio, an LLM-based audio generation model that supports a wide range of audio generation tasks. |
Dongchao Yang; Jinchuan Tian; Xu Tan; Rongjie Huang; Songxiang Liu; Haohan Guo; Xuankai Chang; Jiatong Shi; sheng zhao; Jiang Bian; Zhou Zhao; Xixin Wu; Helen M. Meng; |
76 | Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce *Generalized **E**ncoding-**D**ecoding **D**iffusion **P**robabilistic **M**odels* (EDDPMs), which integrate the core capabilities for broad applicability and enhanced performance. |
Guangyi Liu; Yu Wang; Zeyu Feng; Qiyu Wu; Liping Tang; Yuan Gao; Zhen Li; Shuguang Cui; Julian McAuley; Zichao Yang; Eric P. Xing; Zhiting Hu; |
77 | Revisiting The Role of Language Priors in Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study $\textit{generative VLMs}$ that are trained for next-word generation given an image. |
Zhiqiu Lin; Xinyue Chen; Deepak Pathak; Pengchuan Zhang; Deva Ramanan; |
78 | MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses the growing need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. |
Zechun Liu; Changsheng Zhao; Forrest Iandola; Chen Lai; Yuandong Tian; Igor Fedorov; Yunyang Xiong; Ernie Chang; Yangyang Shi; Raghuraman Krishnamoorthi; Liangzhen Lai; Vikas Chandra; |
79 | InstructRetro: Instruction Tuning Post Retrieval-Augmented Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce Retro 48B, the largest LLM pretrained with retrieval. |
Boxin Wang; Wei Ping; Lawrence McAfee; Peng Xu; Bo Li; Mohammad Shoeybi; Bryan Catanzaro; |
80 | Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to shed light on LLMs' inner mechanisms through the lens of geometry. |
Randall Balestriero; Romain Cosentino; Sarath Shekkizhar; |
81 | Compositional Text-to-Image Generation with Dense Blob Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To leverage the compositionality of large language models (LLMs), we introduce a new in-context learning approach to generate blob representations from text prompts. |
Weili Nie; Sifei Liu; Morteza Mardani; Chao Liu; Benjamin Eckart; Arash Vahdat; |
82 | Position Paper: On The Societal Impact of Open Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Foundation models are powerful technologies: how they are released publicly directly shapes their societal impact. In this position paper, we focus on *open* foundation models, defined here as those with broadly available model weights (e.g., Llama 3, Stable Diffusion XL). |
Sayash Kapoor; Rishi Bommasani; Kevin Klyman; Shayne Longpre; Ashwin Ramaswami; Peter Cihon; Aspen K Hopkins; Kevin Bankston; Stella Biderman; Miranda Bogen; Rumman Chowdhury; Alex Engler; Peter Henderson; Yacine Jernite; Seth Lazar; Stefano Maffulli; Alondra Nelson; Joelle Pineau; Aviya Skowron; Dawn Song; Victor Storchan; Daniel Zhang; Daniel E. Ho; Percy Liang; Arvind Narayanan; |
83 | Scalable Pre-training of Large Autoregressive Image Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces AIM, a collection of vision models pre-trained with an autoregressive objective. |
Alaaeldin El-Nouby; Michal Klein; Shuangfei Zhai; Miguel Ángel Bautista; Vaishaal Shankar; Alexander T Toshev; Joshua M. Susskind; Armand Joulin; |
84 | Unlocking The Power of Spatial and Temporal Information in Medical Multimodal Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the Med-ST framework for fine-grained spatial and temporal modeling to exploit information from multiple spatial views of chest radiographs and temporal historical records. |
Jinxia Yang; Bing Su; Xin Zhao; Ji-Rong Wen; |
85 | QuRating: Selecting High-Quality Data for Training Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce QuRating, a method for selecting pre-training data that can capture human intuitions about data quality. |
Alexander Wettig; Aatmik Gupta; Saumya Malik; Danqi Chen; |
86 | MC-GTA: Metric-Constrained Model-Based Clustering Using Goodness-of-fit Tests with Autocorrelations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The model-based variations of these clustering algorithms (e.g. TICC and STICC) achieve SOTA performance, yet suffer from computational instability and complexity by using a metric-constrained Expectation-Maximization procedure. In order to address these two problems, we propose a novel clustering algorithm, MC-GTA (**M**odel-based **C**lustering via **G**oodness-of-fit **T**ests with **A**utocorrelations). |
Zhangyu Wang; Gengchen Mai; Krzysztof Janowicz; Ni Lao; |
87 | Assessing The Brittleness of Safety Alignment Via Pruning and Low-Rank Modifications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop methods to identify critical regions that are vital for safety guardrails, and that are disentangled from utility-relevant regions at both the neuron and rank levels. |
Boyi Wei; Kaixuan Huang; Yangsibo Huang; Tinghao Xie; Xiangyu Qi; Mengzhou Xia; Prateek Mittal; Mengdi Wang; Peter Henderson; |
88 | In-Context Unlearning: Language Models As Few-Shot Unlearners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new class of unlearning methods for LLMs called “In-Context Unlearning.” |
Martin Pawelczyk; Seth Neel; Himabindu Lakkaraju; |
89 | Position Paper: Scaling Simulation Is Neither Necessary Nor Sufficient for In-the-Wild Robot Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a structured critique of robotic simulations for real-world manipulation, by arguing that scaling simulators is neither necessary nor sufficient for making progress in general-purpose real-world robotic manipulation agents that are compliant with human preferences. |
Homanga Bharadhwaj; |
90 | In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, in-context learning has seen limited effectiveness in many settings, is difficult to quantitatively control and takes up context window space. To overcome these limitations, we propose an alternative approach that recasts in-context learning as in-context vectors (ICV). |
Sheng Liu; Haotian Ye; Lei Xing; James Y. Zou; |
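The in-context vector idea, summarizing demonstrations as a single latent direction that steers later hidden states, can be sketched numerically. The construction below (mean difference of hidden states, a scalar `alpha`) is a simplified assumption for illustration, not the paper's exact ICV recipe:

```python
import numpy as np

def in_context_vector(h_pos, h_neg):
    """Distill demonstrations into one direction: the difference between the
    mean hidden state of desired-style demos and undesired-style demos.
    (A hypothetical sketch; the paper's construction may use specific layers
    or other aggregation.)"""
    return h_pos.mean(axis=0) - h_neg.mean(axis=0)

def steer(hidden, icv, alpha=0.1):
    """Shift every token's hidden state along the in-context vector,
    freeing the context window of explicit demonstrations."""
    return hidden + alpha * icv

rng = np.random.default_rng(1)
d = 16
h_pos = rng.normal(loc=0.5, size=(5, d))   # states from desired-style demos
h_neg = rng.normal(loc=-0.5, size=(5, d))  # states from undesired-style demos
icv = in_context_vector(h_pos, h_neg)
steered = steer(rng.normal(size=(7, d)), icv, alpha=0.2)
```

Because the vector is added to activations rather than prepended as tokens, its strength can be dialed up or down via `alpha`, which is the "quantitative control" the highlight refers to.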
91 | Neural Networks Learn Statistics of Increasing Complexity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The _distributional simplicity bias_ (DSB) posits that neural networks learn low-order moments of the data distribution first, before moving on to higher-order correlations. In this work, we present compelling new evidence for the DSB by showing that networks automatically learn to perform well on maximum-entropy distributions whose low-order statistics match those of the training set early in training, then lose this ability later. |
Nora Belrose; Quintin Pope; Lucia Quirke; Alex Troy Mallen; Xiaoli Fern; |
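A concrete way to read the DSB probe above: the maximum-entropy distribution matching a dataset's first two moments is a Gaussian with the same mean and covariance, so moment-matched samples can be drawn as below. This is a generic sketch of the probing idea, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
# Some non-Gaussian "training data" (exponential marginals).
data = rng.exponential(size=(1000, 3))

# Match only the low-order statistics: mean and covariance.
mu = data.mean(axis=0)
cov = np.cov(data, rowvar=False)

# Gaussian samples with identical first and second moments; a DSB-style test
# would evaluate a network on these early vs. late in training.
moment_matched = rng.multivariate_normal(mu, cov, size=1000)
```

Early in training a network that has only learned low-order moments should treat `moment_matched` much like `data`; later, higher-order structure (here, the skew of the exponential) should separate them.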
92 | Offline Training of Language Model Agents with Functions As Learnable Weights Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To facilitate the development of LLM agents, we present a novel paradigm of training LLM agents without modifying the LLM weights, which is particularly useful when the LLM weights are inaccessible or difficult to modify. |
Shaokun Zhang; Jieyu Zhang; Jiale Liu; Linxin Song; Chi Wang; Ranjay Krishna; Qingyun Wu; |
93 | Chain of Code: Reasoning with A Language Model-Augmented Code Emulator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Chain of Code (CoC), a simple yet surprisingly effective extension that improves LM code-driven reasoning. |
Chengshu Li; Jacky Liang; Andy Zeng; Xinyun Chen; Karol Hausman; Dorsa Sadigh; Sergey Levine; Li Fei-Fei; Fei Xia; brian ichter; |
94 | Prodigy: An Expeditiously Adaptive Parameter-Free Learner Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Prodigy, an algorithm that provably estimates the distance to the solution $D$, which is needed to set the learning rate optimally. |
Konstantin Mishchenko; Aaron Defazio; |
95 | Projecting Molecules Into Synthesizable Chemical Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel framework that is capable of generating new chemical structures while ensuring synthetic accessibility. |
Shitong Luo; Wenhao Gao; Zuofan Wu; Jian Peng; Connor W. Coley; Jianzhu Ma; |
96 | Learning Divergence Fields for Shift-Robust Graph Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging generalization problem with interdependent data. |
Qitian Wu; Fan Nie; Chenxiao Yang; Junchi Yan; |
97 | Graph Neural Networks Use Graphs When They Shouldn’t Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While GNNs have the ability to ignore the graph-structure in such cases, it is not clear that they will. In this work, we show that GNNs actually tend to overfit the given graph-structure in the sense that they use it even when a better solution can be obtained by ignoring it. |
Maya Bechler-Speicher; Ido Amos; Ran Gilad-Bachrach; Amir Globerson; |
98 | Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. |
Abhimanyu Hans; Avi Schwarzschild; Valeriia Cherepanova; Hamid Kazemi; Aniruddha Saha; Micah Goldblum; Jonas Geiping; Tom Goldstein; |
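The core mechanism, contrasting two closely related language models, can be sketched as scoring a text's negative log-likelihood under one model against the cross-entropy between the two models' next-token predictions. The helper below is an illustrative approximation of that contrast on toy logits, not the paper's exact Binoculars score or thresholds:

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def softmax(z):
    return np.exp(log_softmax(z))

def contrast_score(token_ids, logits_m1, logits_m2):
    """Contrast two related LMs (hedged sketch of the Binoculars idea):
    per-token NLL under model 1, normalized by the cross-entropy of model 2's
    predicted distribution scored with model 1's log-probs."""
    logp1 = log_softmax(logits_m1)                        # (T, V)
    nll = -logp1[np.arange(len(token_ids)), token_ids].mean()
    xent = -(softmax(logits_m2) * logp1).sum(axis=-1).mean()
    return nll / xent  # relatively low scores suggest machine-generated text

rng = np.random.default_rng(0)
T, V = 8, 20
logits_m1 = rng.normal(size=(T, V))
logits_m2 = logits_m1 + 0.1 * rng.normal(size=(T, V))  # a "closely related" model
tokens = rng.integers(0, V, size=T)
score = contrast_score(tokens, logits_m1, logits_m2)
```

The appeal of this style of detector is that it needs only two forward passes over frozen pre-trained models, with no detector training.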
99 | Explorations of Self-Repair in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We highlight two different mechanisms that contribute to self-repair, including changes in the final LayerNorm scaling factor and sparse sets of neurons implementing Anti-Erasure. |
Cody Rushing; Neel Nanda; |
100 | Position: A Roadmap to Pluralistic Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, aligning models to serve *pluralistic* human values remains an open research question. In this piece, we propose a roadmap to pluralistic alignment, specifically using large language models as a test bed. |
Taylor Sorensen; Jared Moore; Jillian Fisher; Mitchell L Gordon; Niloofar Mireshghallah; Christopher Michael Rytting; Andre Ye; Liwei Jiang; Ximing Lu; Nouha Dziri; Tim Althoff; Yejin Choi; |
101 | AI Alignment with Changing and Influenceable Reward Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing AI alignment approaches assume that preferences are static, which is unrealistic: our preferences change, and may even be influenced by our interactions with AI systems themselves. To clarify the consequences of incorrectly assuming static preferences, we introduce Dynamic Reward Markov Decision Processes (DR-MDPs), which explicitly model preference changes and the AI’s influence on them. |
Micah Carroll; Davis Foote; Anand Siththaranjan; Stuart Russell; Anca Dragan; |
102 | RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX. |
Yao Mu; Junting Chen; Qing-Long Zhang; Shoufa Chen; Qiaojun Yu; Chongjian GE; Runjian Chen; Zhixuan Liang; Mengkang Hu; Chaofan Tao; Peize Sun; Haibao Yu; Chao Yang; Wenqi Shao; Wenhai Wang; Jifeng Dai; Yu Qiao; Mingyu Ding; Ping Luo; |
103 | Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first theoretically study non-greedy layer-wise training and show that the convergence cannot be assured when the local gradient in a module w.r.t. its input is not reconciled with the local gradient in the previous module w.r.t. its output. Inspired by the theoretical result, we further propose a local training strategy that successively regularizes the gradient reconciliation between neighboring modules without breaking gradient isolation or introducing any learnable parameters. |
Yibo Yang; Xiaojie Li; Motasem Alfarra; Hasan Abed Al Kader Hammoud; Adel Bibi; Philip Torr; Bernard Ghanem; |
104 | In-Context Sharpness As Alerts: An Inner Representation Perspective for Hallucination Mitigation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we aim to understand the underlying mechanisms of LLM hallucinations from the perspective of *inner representations*. |
Shiqi Chen; Miao Xiong; Junteng Liu; Zhengxuan Wu; Teng Xiao; Siyang Gao; Junxian He; |
105 | SILVER: Single-loop Variance Reduction and Application to Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a single-loop variance-reduced gradient estimator named SILVER (SIngle-Loop VariancE-Reduction) for finite-sum non-convex optimization, which does not require multiple full gradients but nevertheless achieves the optimal gradient complexity. |
Kazusato Oko; Shunta Akiyama; Denny Wu; Tomoya Murata; Taiji Suzuki; |
106 | Regression with Multi-Expert Deferral Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel framework of *regression with deferral*, which involves deferring the prediction to multiple experts. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
107 | $H$-Consistency Guarantees for Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a detailed study of $H$-consistency bounds for regression. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
108 | Learning to Model The World With Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While current agents can learn to execute simple language instructions, we aim to build agents that leverage diverse language—language like “this button turns on the TV” or “I put the bowls away”—that conveys general knowledge, describes the state of the world, provides interactive feedback, and more. Our key idea is that *agents should interpret such diverse language as a signal that helps them predict the future*: what they will observe, how the world will behave, and which situations will be rewarded. |
Jessy Lin; Yuqing Du; Olivia Watkins; Danijar Hafner; Pieter Abbeel; Dan Klein; Anca Dragan; |
109 | Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an unsupervised adversarial fine-tuning scheme to obtain a robust CLIP vision encoder, which yields robustness on all vision down-stream tasks (LVLMs, zero-shot classification) that rely on CLIP. |
Christian Schlarmann; Naman Deep Singh; Francesco Croce; Matthias Hein; |
110 | Data-free Distillation of Diffusion Models with Bootstrapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing distillation methods either require significant amounts of offline computation for generating synthetic training data from the teacher model, or need to perform expensive online learning with the help of real data. In this work, we present a novel technique called BOOT, that overcomes these limitations with an efficient data-free distillation algorithm. |
Jiatao Gu; Chen Wang; Shuangfei Zhai; Yizhe Zhang; Lingjie Liu; Joshua M. Susskind; |
111 | LLM Maybe LongLM: SelfExtend LLM Context Window Without Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we argue that LLMs themselves have inherent capabilities to handle long contexts without fine-tuning. |
Hongye Jin; Xiaotian Han; Jingfeng Yang; Zhimeng Jiang; Zirui Liu; Chia-Yuan Chang; Huiyuan Chen; Xia Hu; |
112 | Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to their *myopic perspective*, they escalate the number of query requests, leading to increased costs, memory, and computational overheads. Addressing this, we propose the *Algorithm of Thoughts*—a novel strategy that propels LLMs through algorithmic reasoning pathways. |
Bilgehan Sel; Ahmad Tawaha; Vanshaj Khattar; Ruoxi Jia; Ming Jin; |
113 | Skill Set Optimization: Reinforcing Language Model Behavior Via Transferable Skills Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Skill Set Optimization (SSO) for improving LLM actor performance through constructing and refining sets of transferable skills. |
Kolby Nottingham; Bodhisattwa Prasad Majumder; Bhavana Dalvi Mishra; Sameer Singh; Peter Clark; Roy Fox; |
114 | Universality of Linear Recurrences Followed By Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that combining MLPs with either real or complex linear diagonal recurrences leads to arbitrarily precise approximation of regular causal sequence-to-sequence maps. |
Antonio Orvieto; Soham De; Caglar Gulcehre; Razvan Pascanu; Samuel L Smith; |
115 | Don’t Trust Your Eyes: on The (un)reliability of Feature Visualizations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. |
Robert Geirhos; Roland S. Zimmermann; Blair Bilodeau; Wieland Brendel; Been Kim; |
116 | EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present EE-LLM, a framework for large-scale training and inference of early-exit large language models (LLMs). |
Yanxi Chen; Xuchen Pan; Yaliang Li; Bolin Ding; Jingren Zhou; |
117 | CogBench: A Large Language Model Walks Into A Psychology Lab Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces *CogBench*, a benchmark that includes ten behavioral metrics derived from seven cognitive psychology experiments. |
Julian Coda-Forno; Marcel Binz; Jane X Wang; Eric Schulz; |
118 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present MMT-Bench, a comprehensive benchmark designed to assess LVLMs across massive multimodal tasks requiring expert knowledge and deliberate visual recognition, localization, and reasoning. |
Kaining Ying; Fanqing Meng; Jin Wang; Zhiqian Li; Han Lin; Yue Yang; Hao Zhang; Wenbo Zhang; Yuqi Lin; Shuo Liu; jiayi lei; Quanfeng Lu; Runjian Chen; Peng Xu; Renrui Zhang; Haozhe Zhang; Peng Gao; Yali Wang; Yu Qiao; Ping Luo; Kaipeng Zhang; Wenqi Shao; |
119 | LoCoCo: Dropping In Convolutions for Long Context Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper tackles the memory hurdle of processing long context sequences in Large Language Models (LLMs), by presenting a novel approach, Dropping In Convolutions for **Lo**ng **Co**ntext **Co**mpression (**LoCoCo**). |
Ruisi Cai; Yuandong Tian; Zhangyang Wang; Beidi Chen; |
120 | Image Hijacks: Adversarial Images Can Control Generative Models at Runtime Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we focus on the image input to a vision-language model (VLM). |
Luke Bailey; Euan Ong; Stuart Russell; Scott Emmons; |
121 | Can AI Assistants Know What They Don’t Know? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Therefore, in this paper, we ask the question **Can AI assistants know what they don’t know and express this awareness through natural language?** To investigate this, we construct a model-specific “I don’t know” (Idk) dataset. |
Qinyuan Cheng; Tianxiang Sun; Xiangyang Liu; Wenwei Zhang; Zhangyue Yin; Shimin Li; Linyang Li; Zhengfu He; Kai Chen; Xipeng Qiu; |
122 | Discrete Diffusion Modeling By Estimating The Ratios of The Data Distribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Crucially, standard diffusion models rely on the well-established theory of score matching, but efforts to generalize this to discrete structures have not yielded the same empirical gains. In this work, we bridge this gap by proposing score entropy, a novel loss that naturally extends score matching to discrete spaces, integrates seamlessly to build discrete diffusion models, and significantly boosts performance. |
Aaron Lou; Chenlin Meng; Stefano Ermon; |
123 | Prompting A Pretrained Transformer Can Be A Universal Approximator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Formally, we ask whether prompting and prefix-tuning a pretrained model can universally approximate sequence-to-sequence functions. This paper answers in the affirmative and demonstrates that much smaller pretrained models than previously thought can be universal approximators when prefixed. |
Aleksandar Petrov; Philip Torr; Adel Bibi; |
124 | SHINE: Shielding Backdoors in Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SHINE, a backdoor shielding method specific for DRL. |
Zhuowen Yuan; Wenbo Guo; Jinyuan Jia; Bo Li; Dawn Song; |
125 | Executable Code Actions Elicit Better LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work proposes to use executable Python **code** to consolidate LLM agents’ **act**ions into a unified action space (**CodeAct**). |
Xingyao Wang; Yangyi Chen; Lifan Yuan; Yizhe Zhang; Yunzhu Li; Hao Peng; Heng Ji; |
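A minimal CodeAct-style loop simply executes the Python the agent emits and feeds the printed output back as the next observation. The sketch below uses a toy stand-in for the LLM policy and is an assumption-laden illustration of the unified code action space, not the paper's agent:

```python
import io
import contextlib

def code_act_step(policy, observation, env_globals):
    """One step of a CodeAct-style loop (generic sketch): the policy emits
    Python source as its action, we execute it against a shared namespace,
    and whatever it prints becomes the next observation."""
    code = policy(observation)           # `policy` stands in for an LLM call
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, env_globals)          # code *is* the action space
    return buf.getvalue()

# Toy "policy": always acts by computing a sum and printing the result.
obs = code_act_step(lambda obs: "print(sum(range(10)))", "start", {})
# obs == "45\n"
```

Because actions are arbitrary code sharing one namespace, a single action can compose tools, loop, and branch, which is the consolidation the highlight describes; a real deployment would sandbox `exec` rather than run it directly.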
126 | Position: A Safe Harbor for AI Evaluation and Red Teaming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose that major generative AI developers commit to providing a legal and technical safe harbor, protecting public interest safety research and removing the threat of account suspensions or legal reprisal. |
Shayne Longpre; Sayash Kapoor; Kevin Klyman; Ashwin Ramaswami; Rishi Bommasani; Borhane Blili-Hamelin; Yangsibo Huang; Aviya Skowron; Zheng Xin Yong; Suhas Kotha; Yi Zeng; Weiyan Shi; Xianjun Yang; Reid Southen; Alexander Robey; Patrick Chao; Diyi Yang; Ruoxi Jia; Daniel Kang; Alex Pentland; Arvind Narayanan; Percy Liang; Peter Henderson; |
127 | OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: One more important contribution of this study is an in-depth analysis of the routing mechanisms within our OpenMoE models, leading to three significant findings: Context-Independent Specialization, Early Routing Learning, and Drop-towards-the-End. |
Fuzhao Xue; Zian Zheng; Yao Fu; Jinjie Ni; Zangwei Zheng; Wangchunshu Zhou; Yang You; |
128 | Robust Yet Efficient Conformal Prediction Sets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We derive provably robust sets by bounding the worst-case change in conformity scores. Our tighter bounds lead to more efficient sets. We cover both continuous and discrete (sparse) data and our guarantees work both for evasion and poisoning attacks (on both features and labels). |
Soroush H. Zargarbashi; Mohammad Sadegh Akhondzadeh; Aleksandar Bojchevski; |
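The object these robust guarantees attach to is the standard split conformal prediction set. As a point of reference, here is a minimal sketch of plain (non-robust) split conformal prediction in Python; the threshold rule and set construction are the standard ones, but the toy calibration scores and `alpha` below are illustrative, not from the paper:

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Finite-sample-conservative (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # conservative quantile index
    return np.sort(cal_scores)[k - 1]

def prediction_set(class_scores, tau):
    # keep every class whose nonconformity score falls within the threshold
    return [c for c, s in enumerate(class_scores) if s <= tau]

# calibration nonconformity scores, e.g. 1 - softmax prob of the true class
cal = np.array([0.1, 0.3, 0.2, 0.05, 0.4, 0.15, 0.25, 0.35, 0.12, 0.22])
tau = conformal_threshold(cal, alpha=0.1)
print(prediction_set([0.05, 0.5, 0.3], tau))  # → [0, 2]
```

With n = 10 calibration scores and alpha = 0.1, the conservative quantile index is ceil(11 × 0.9) = 10, so the threshold is the largest calibration score; robust variants like the paper's instead bound how much an attacker can shift these scores.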
129 | An Embodied Generalist Agent in 3D World Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce LEO, an embodied multi-modal generalist agent that excels in perceiving, grounding, reasoning, planning, and acting in the 3D world. We collect large-scale datasets comprising diverse object-level and scene-level tasks, which require considerable understanding of and interaction with the 3D world. |
Jiangyong Huang; Silong Yong; Xiaojian Ma; Xiongkun Linghu; Puhao Li; Yan Wang; Qing Li; Song-Chun Zhu; Baoxiong Jia; Siyuan Huang; |
130 | Iterated Denoising Energy Matching for Sampling from Boltzmann Densities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient—and no data samples—to train a diffusion-based sampler. |
Tara Akhound-Sadegh; Jarrid Rector-Brooks; Joey Bose; Sarthak Mittal; Pablo Lemos; Cheng-Hao Liu; Marcin Sendera; Siamak Ravanbakhsh; Gauthier Gidel; Yoshua Bengio; Nikolay Malkin; Alexander Tong; |
131 | Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct our analysis both in terms of a compute budget and real-world costs and find that LLM researchers expecting reasonably large inference demand (~1B requests) should train models smaller and longer than Chinchilla-optimal. |
Nikhil Sardana; Jacob Portes; Sasha Doubov; Jonathan Frankle; |
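The train-vs-serve trade-off can be illustrated with the usual back-of-the-envelope FLOPs accounting (roughly 6ND for training and 2N per generated token at inference). This cost model and the example numbers are illustrative assumptions, not figures from the paper:

```python
def total_flops(n_params, train_tokens, inference_tokens):
    """Approximate lifetime compute: ~6*N*D to train, ~2*N per served token."""
    return 6 * n_params * train_tokens + 2 * n_params * inference_tokens

# Once inference demand is large, a smaller model trained on more tokens
# can be cheaper overall than a compute-optimal-for-training one.
chinchilla_style = total_flops(70e9, 1.4e12, 2e12)  # 70B, ~Chinchilla tokens
small_longer = total_flops(30e9, 4.0e12, 2e12)      # smaller, trained longer
print(chinchilla_style > small_longer)  # True
```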
132 | Position: Fundamental Limitations of LLM Censorship Necessitate New Approaches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present fundamental limitations of verifying the semantic properties of LLM outputs and identifying compositional threats, illustrating inherent challenges of current approaches to censoring LLM outputs. |
David Glukhov; Ilia Shumailov; Yarin Gal; Nicolas Papernot; Vardan Papyan; |
133 | Q-Align: Teaching LMMs for Visual Scoring Via Discrete Text-Defined Levels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While recent studies have demonstrated the exceptional potential of large multi-modality models (LMMs) across a wide range of related fields, in this work we explore how to teach them to perform visual rating aligned with human opinions. Observing that human raters only learn and judge discrete text-defined levels in subjective studies, we propose to emulate this subjective process and teach LMMs with text-defined rating levels instead of scores. |
Haoning Wu; Zicheng Zhang; Weixia Zhang; Chaofeng Chen; Liang Liao; Chunyi Li; Yixuan Gao; Annan Wang; Erli Zhang; Wenxiu Sun; Qiong Yan; Xiongkuo Min; Guangtao Zhai; Weisi Lin; |
134 | Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel in-context learning framework, FeatLLM, which employs LLMs as feature engineers to produce an input data set that is optimally suited for tabular predictions. |
Sungwon Han; Jinsung Yoon; Sercan O Arik; Tomas Pfister; |
135 | DITTO: Diffusion Inference-Time T-Optimization for Music Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Diffusion Inference-Time T-Optimization (DITTO), a general-purpose framework for controlling pre-trained text-to-music diffusion models at inference-time via optimizing initial noise latents. |
Zachary Novack; Julian McAuley; Taylor Berg-Kirkpatrick; Nicholas J. Bryan; |
136 | COALA: A Practical and Vision-Centric Federated Learning Platform Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present COALA, a vision-centric Federated Learning (FL) platform, and a suite of benchmarks for practical FL scenarios, which we categorize as task, data, and model levels. |
Weiming Zhuang; Jian Xu; Chen Chen; Jingtao Li; Lingjuan Lyu; |
137 | Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper analyzes potential reasons behind these issues and designs an improved reward-learning algorithm termed ‘Iterative Data Smoothing’ (IDS). |
Banghua Zhu; Michael Jordan; Jiantao Jiao; |
138 | Adaptive Sampling of K-Space in Magnetic Resonance for Rapid Pathology Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Adaptive Sampling for MR (ASMR), a sampling method that learns an adaptive policy to sequentially select k-space samples to optimize for target disease detection. |
Chen-Yu Yen; Raghav Singhal; Umang Sharma; Rajesh Ranganath; Sumit Chopra; Lerrel Pinto; |
139 | Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel representation-based approach to measure the domain gap, where the representation is learned through a contrastive objective by sampling transitions from different domains. |
Xiaoyu Wen; Chenjia Bai; Kang Xu; Xudong Yu; Yang Zhang; Xuelong Li; Zhen Wang; |
140 | RoSA: Accurate Parameter-Efficient Fine-Tuning Via Robust Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new PEFT method called Robust Adaptation (RoSA) inspired by robust principal component analysis that jointly trains *low-rank* and *highly sparse* components on top of a set of fixed pretrained weights to efficiently approximate the performance of a full fine-tuning (FFT) solution. |
Mahdi Nikdan; Soroush Tabesh; Elvir Crnčević; Dan Alistarh; |
141 | Extreme Compression of Large Language Models Via Additive Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we revisit the problem of “extreme” LLM compression—defined as targeting extremely low bit counts, such as 2 to 3 bits per parameter—from the point of view of classic methods in Multi-Codebook Quantization (MCQ). |
Vage Egiazarian; Andrei Panferov; Denis Kuznedelev; Elias Frantar; Artem Babenko; Dan Alistarh; |
142 | InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM. |
Lichang Chen; Jiuhai Chen; Tom Goldstein; Heng Huang; Tianyi Zhou; |
143 | Full-Atom Peptide Design Based on Multi-modal Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present *PepFlow*, the first multi-modal deep generative model grounded in the flow-matching framework for the design of full-atom peptides that target specific protein receptors. |
Jiahan Li; Chaoran Cheng; Zuofan Wu; Ruihan Guo; Shitong Luo; Zhizhou Ren; Jian Peng; Jianzhu Ma; |
144 | Evaluation of Test-Time Adaptation Under Computational Time Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel online evaluation protocol for Test Time Adaptation (TTA) methods, which penalizes slower methods by providing them with fewer samples for adaptation. |
Motasem Alfarra; Hani Itani; Alejandro Pardo; shyma yaser alhuwaider; Merey Ramazanova; Juan Camilo Perez; zhipeng cai; Matthias Müller; Bernard Ghanem; |
145 | SAPG: Split and Aggregate Policy Gradients Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we show that current RL methods, e.g. PPO, fail to reap the benefits of parallelized environments beyond a certain point, and their performance saturates. To address this, we propose a new on-policy RL algorithm that can effectively leverage large-scale environments by splitting them into chunks and fusing them back together via importance sampling. |
Jayesh Singla; Ananye Agarwal; Deepak Pathak; |
146 | Privacy Backdoors: Stealing Data with Corrupted Pretrained Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Practitioners commonly download pretrained machine learning models from open repositories and finetune them to fit specific applications. We show that this practice introduces a new risk of privacy backdoors. |
Shanglun Feng; Florian Tramèr; |
147 | ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new policy optimization with function approximation algorithm for constrained MDPs with the average criterion. |
Akhil Agnihotri; Rahul Jain; Haipeng Luo; |
148 | WARM: On The Benefits of Weight Averaged Reward Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify two primary challenges when designing RMs to mitigate reward hacking: distribution shifts during the RL process and inconsistencies in human preferences. As a solution, we propose Weight Averaged Reward Models (WARM), first fine-tuning multiple RMs, then averaging them in the weight space. |
Alexandre Rame; Nino Vieillard; Leonard Hussenot; Robert Dadashi; Geoffrey Cideron; Olivier Bachem; Johan Ferret; |
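The weight-space averaging step at the heart of WARM is simple to sketch. Below is a minimal, framework-free illustration in which plain parameter dicts stand in for reward-model state; the `average_weights` helper and toy values are ours, not the paper's code, and it assumes the reward models share an identical architecture so parameters align:

```python
def average_weights(state_dicts):
    """Elementwise average of parameter dicts (name -> list of floats)."""
    n = len(state_dicts)
    return {
        key: [sum(sd[key][i] for sd in state_dicts) / n
              for i in range(len(state_dicts[0][key]))]
        for key in state_dicts[0]
    }

# three toy "reward models", each fine-tuned separately
rm1 = {"w": [1.0, 2.0]}
rm2 = {"w": [3.0, 4.0]}
rm3 = {"w": [5.0, 6.0]}
warm = average_weights([rm1, rm2, rm3])
print(warm["w"])  # [3.0, 4.0]
```

Unlike prediction-ensembling, this yields a single model, so reward inference costs the same as a single RM.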
149 | Agent Instructs Large Language Models to Be General Zero-Shot Reasoners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks. |
Nicholas Crispino; Kyle Montgomery; Fankun Zeng; Dawn Song; Chenguang Wang; |
150 | In-Context Language Learning: Architectures and Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study ICL through the lens of a new family of model problems we term in-context language learning (ICLL). |
Ekin Akyürek; Bailin Wang; Yoon Kim; Jacob Andreas; |
151 | Transolver: A Fast Transformer Solver for PDEs on General Geometries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Going beyond superficial and unwieldy meshes, we present Transolver based on a more foundational idea, which is learning intrinsic physical states hidden behind discretized geometries. |
Haixu Wu; Huakun Luo; Haowen Wang; Jianmin Wang; Mingsheng Long; |
152 | TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present TROVE, a training-free method of inducing a verifiable and efficient toolbox of functions by using, growing, and periodically trimming the toolbox as it generates solutions. |
Zhiruo Wang; Graham Neubig; Daniel Fried; |
153 | ODIN: Disentangled Reward Mitigates Hacking in RLHF Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the issue of reward hacking on the response length, a challenge emerging in Reinforcement Learning from Human Feedback (RLHF) on LLMs. |
Lichang Chen; Chen Zhu; Jiuhai Chen; Davit Soselia; Tianyi Zhou; Tom Goldstein; Heng Huang; Mohammad Shoeybi; Bryan Catanzaro; |
154 | On The Duality Between Sharpness-Aware Minimization and Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, considering the duality between SAM and AT, we investigate the adversarial robustness derived from SAM. |
Yihao Zhang; Hangzhou He; Jingyu Zhu; Huanran Chen; Yifei Wang; Zeming Wei; |
155 | A Decoder-only Foundation Model for Time-series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset. |
Abhimanyu Das; Weihao Kong; Rajat Sen; Yichen Zhou; |
156 | Physics of Language Models: Part 3.1, Knowledge Storage and Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, do they answer such questions based on exposure to similar questions during training (i.e., cheating), or by genuinely learning to extract knowledge from sources like Wikipedia? In this paper, we investigate this issue using a controlled biography dataset. |
Zeyuan Allen-Zhu; Yuanzhi Li; |
157 | Auditing Private Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the first framework for auditing private prediction where we instantiate adversaries with varying poisoning and query capabilities. |
Karan Chadha; Matthew Jagielski; Nicolas Papernot; Christopher A. Choquette-Choo; Milad Nasr; |
158 | Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To demonstrate the challenge of defending finetuning interfaces, we introduce covert malicious finetuning, a method to compromise model safety via finetuning while evading detection. |
Danny Halawi; Alexander Wei; Eric Wallace; Tony Tong Wang; Nika Haghtalab; Jacob Steinhardt; |
159 | Feedback Loops With Language Models Drive In-Context Reward Hacking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that feedback loops can cause in-context reward hacking (ICRH), where the LLM at test-time optimizes a (potentially implicit) objective but creates negative side effects in the process. |
Alexander Pan; Erik Jones; Meena Jagadeesan; Jacob Steinhardt; |
160 | ContPhy: Continuum Physical Concept Learning and Reasoning from Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Continuum Physical Dataset (ContPhy), a novel benchmark for assessing machine physical commonsense. |
Zhicheng Zheng; Xin Yan; Zhenfang Chen; Jingzhou Wang; Qin Zhi Eddie Lim; Joshua B. Tenenbaum; Chuang Gan; |
161 | RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning Via Generative Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present RoboGen, a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation. |
Yufei Wang; Zhou Xian; Feng Chen; Tsun-Hsuan Wang; Yian Wang; Katerina Fragkiadaki; Zackory Erickson; David Held; Chuang Gan; |
162 | SqueezeLLM: Dense-and-Sparse Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate that the main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, specifically for single batch inference. |
Sehoon Kim; Coleman Richard Charles Hooper; Amir Gholami; Zhen Dong; Xiuyu Li; Sheng Shen; Michael W. Mahoney; Kurt Keutzer; |
163 | Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on a memory bottleneck imposed by the key-value (KV) cache, a computational shortcut that requires storing previous KV pairs during decoding. |
Harry Dong; Xinyu Yang; Zhenyu Zhang; Zhangyang Wang; Yuejie Chi; Beidi Chen; |
164 | Repeat After Me: Transformers Are Better Than State Space Models at Copying Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Transformers are the dominant architecture for sequence modeling, but there is growing interest in models that use a fixed-size latent state that does not depend on the sequence length, which we refer to as “generalized state space models” (GSSMs). In this paper, we show that while GSSMs are promising in terms of inference-time efficiency, they are limited compared to transformer models on tasks that require copying from the input context. |
Samy Jelassi; David Brandfonbrener; Sham M. Kakade; eran malach; |
165 | StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, they have largely overlooked the underlying coherence between the augmented domains, which in turn leads to inferior results in real-world scenarios. In this paper, we propose a simple yet effective scheme, termed *StyDeSty*, to explicitly account for the alignment of the source and pseudo domains in the process of data augmentation, enabling them to interact with each other in a self-consistent manner and further giving rise to a latent domain with strong generalization power. |
Songhua Liu; Xin Jin; Xingyi Yang; Jingwen Ye; Xinchao Wang; |
166 | Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce three customized ensemble strategies, each tailored to one specific scenario. |
Zhihe Lu; Jiawang Bai; Xin Li; Zeyu Xiao; Xinchao Wang; |
167 | GPTSwarm: Language Agents As Optimizable Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs. |
Mingchen Zhuge; Wenyi Wang; Louis Kirsch; Francesco Faccio; Dmitrii Khizbullin; Jürgen Schmidhuber; |
168 | Visual Representation Learning with Stochastic Frame Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is because of the under-determined nature of frame prediction; multiple potential futures can arise from a single current frame. To tackle this challenge, in this paper, we revisit the idea of stochastic video generation that learns to capture uncertainty in frame prediction and explore its effectiveness for representation learning. |
Huiwon Jang; Dongyoung Kim; Junsu Kim; Jinwoo Shin; Pieter Abbeel; Younggyo Seo; |
169 | Large Language Models Are Geographically Biased Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to study what LLMs know about the world we live in through the lens of geography. |
Rohin Manvi; Samar Khanna; Marshall Burke; David B. Lobell; Stefano Ermon; |
170 | DoRA: Weight-Decomposed Low-Rank Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). |
Shih-yang Liu; Chien-Yi Wang; Hongxu Yin; Pavlo Molchanov; Yu-Chiang Frank Wang; Kwang-Ting Cheng; Min-Hung Chen; |
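The decomposition behind DoRA can be sketched in a few lines of NumPy: the merged weight is a learned per-column magnitude times the unit-norm direction of the pretrained weight plus a low-rank update. The shapes, initialization, and `dora_merge` helper below are illustrative assumptions, not the official implementation:

```python
import numpy as np

def dora_merge(W0, B, A, magnitude):
    """Merged weight = magnitude * column-normalized (W0 + B @ A)."""
    V = W0 + B @ A                                   # direction term
    col_norm = np.linalg.norm(V, axis=0, keepdims=True)
    return magnitude * (V / col_norm)

rng = np.random.default_rng(0)
W0 = rng.standard_normal((6, 4))                     # frozen pretrained weight
B = rng.standard_normal((6, 2)) * 0.01               # trainable low-rank factors
A = rng.standard_normal((2, 4)) * 0.01               # (rank 2)
m = np.linalg.norm(W0, axis=0, keepdims=True)        # magnitude, init from W0
W = dora_merge(W0, B, A, m)
# each merged column's norm equals its learned magnitude entry
```

Separating magnitude from direction lets the two be updated with different dynamics, which is the paper's route to closing the gap with full fine-tuning.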
171 | GaLore: Memory-Efficient LLM Training By Gradient Low-Rank Projection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Gradient Low-Rank Projection (GaLore), a training strategy that allows full-parameter learning but is more memory-efficient than common low-rank adaptation methods such as LoRA. |
Jiawei Zhao; Zhenyu Zhang; Beidi Chen; Zhangyang Wang; Anima Anandkumar; Yuandong Tian; |
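GaLore's core mechanic, projecting gradients into a low-rank subspace obtained from an SVD and applying updates through it, can be sketched in NumPy. The helper names, rank, and learning rate here are illustrative; the actual method also keeps optimizer state (e.g. Adam moments) in the small subspace and refreshes the projector periodically:

```python
import numpy as np

def make_projector(grad, rank):
    """Orthonormal basis for the top-`rank` left-singular subspace of a gradient."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank]                               # shape (m, rank)

def galore_step(weight, grad, proj, lr):
    low_rank_grad = proj.T @ grad                    # (rank, n): what gets stored
    return weight - lr * (proj @ low_rank_grad)      # project back, apply update

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
G = rng.standard_normal((8, 4))
P = make_projector(G, rank=2)
W_next = galore_step(W, G, P, lr=0.1)
# the applied update has rank at most 2, but W itself remains full-parameter
```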
172 | CompeteAI: Understanding The Competition Dynamics of Large Language Model-based Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we seek to examine the competition dynamics in LLM-based agents. |
Qinlin Zhao; Jindong Wang; Yixuan Zhang; Yiqiao Jin; Kaijie Zhu; Hao Chen; Xing Xie; |
173 | 3D-VLA: A 3D Vision-Language-Action Generative World Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose 3D-VLA by introducing a new family of embodied foundation models that seamlessly link 3D perception, reasoning, and action through a generative world model. |
Haoyu Zhen; Xiaowen Qiu; Peihao Chen; Jincheng Yang; Xin Yan; Yilun Du; Yining Hong; Chuang Gan; |
174 | Position: Key Claims in LLM Research Have A Long Tail of Footnotes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We contribute a definition of LLMs, critically examine five common claims regarding their properties (including ‘emergent properties’), and conclude with suggestions for future research directions and their framing. |
Anna Rogers; Sasha Luccioni; |
175 | Gradual Divergence for Seamless Adaptation: A Novel Domain Incremental Learning Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a novel DIL method named *DARE*, featuring a three-stage training process: Divergence, Adaptation, and REfinement. |
Kishaan Jeeveswaran; Elahe Arani; Bahram Zonooz; |
176 | Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through an extensive empirical analysis of image and language data, we demonstrate that small batch sizes do *not* confer any implicit bias advantages in online learning. |
Nikhil Vyas; Depen Morwani; Rosie Zhao; Gal Kaplun; Sham M. Kakade; Boaz Barak; |
177 | MusicFlow: Cascaded Flow Matching for Text Guided Music Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MusicFlow, a cascaded text-to-music generation model based on flow matching. |
K R Prajwal; Bowen Shi; Matthew Le; Apoorv Vyas; Andros Tjandra; Mahi Luthra; Baishan Guo; Huiyu Wang; Triantafyllos Afouras; David Kant; Wei-Ning Hsu; |
178 | The Illusion of State in State-Space Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our analysis reveals that the expressive power of SSMs is limited very similarly to transformers: SSMs cannot express computation outside the complexity class $\mathsf{TC}^0$. |
William Merrill; Jackson Petty; Ashish Sabharwal; |
179 | Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Language Agent Tree Search (LATS) — the first general framework that synergizes the capabilities of LMs in reasoning, acting, and planning. |
Andy Zhou; Kai Yan; Michal Shlapentokh-Rothman; Haohan Wang; Yu-Xiong Wang; |
180 | Slow and Steady Wins The Race: Maintaining Plasticity with Hare and Tortoise Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce the Hare & Tortoise, inspired by the brain’s complementary learning system. |
Hojoon Lee; Hyeonseo Cho; Hyunseung Kim; Donghu Kim; Dugki Min; Jaegul Choo; Clare Lyle; |
181 | On Prompt-Driven Safeguarding for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate how LLMs’ behavior (i.e., complying with or refusing user queries) is affected by safety prompts from the perspective of model representation. |
Chujie Zheng; Fan Yin; Hao Zhou; Fandong Meng; Jie Zhou; Kai-Wei Chang; Minlie Huang; Nanyun Peng; |
182 | Controlled Decoding from Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We pose a tokenwise RL objective and propose a modular solver for it, called *controlled decoding (CD)*. |
Sidharth Mudgal; Jong Lee; Harish Ganapathy; YaGuang Li; Tao Wang; Yanping Huang; Zhifeng Chen; Heng-Tze Cheng; Michael Collins; Trevor Strohman; Jilin Chen; Alex Beutel; Ahmad Beirami; |
183 | A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: DP-SGD introduces new hyperparameters and complicates existing ones, forcing researchers to painstakingly tune hyperparameters with hundreds of trials, which in turn makes it impossible to account for the privacy cost of HPO without destroying the utility. We propose an adaptive HPO method that uses cheap trials (in terms of privacy cost and runtime) to estimate optimal hyperparameters and scales them up. |
Ashwinee Panda; Xinyu Tang; Saeed Mahloujifar; Vikash Sehwag; Prateek Mittal; |
184 | Adaptive Hierarchical Certification for Segmentation Using Randomized Smoothing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel, more practical setting, which certifies pixels within a multi-level hierarchy, and adaptively relaxes the certification to a coarser level for unstable components classic methods would abstain from, effectively lowering the abstain rate whilst providing more certified semantically meaningful information. We mathematically formulate the problem setup, introduce an adaptive hierarchical certification algorithm and prove the correctness of its guarantees. |
Alaa Anani; Tobias Lorenz; Bernt Schiele; Mario Fritz; |
185 | Stealthy Imitation: Reward-guided Environment-free Policy Stealing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Stealthy Imitation, the first attack designed to steal policies without access to the environment or knowledge of the input range. |
Zhixiong Zhuang; Maria-Irina Nicolae; Mario Fritz; |
186 | Linear Alignment: A Closed-form Solution for Aligning Human Preferences Without Tuning and Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce \textit{Linear Alignment}, a novel algorithm that aligns language models with human preferences in one single inference step, eliminating the reliance on data annotation and model training. |
Songyang Gao; Qiming Ge; Wei Shen; Shihan Dou; Junjie Ye; Xiao Wang; Rui Zheng; Yicheng Zou; Zhi Chen; Hang Yan; Qi Zhang; Dahua Lin; |
187 | RoboDreamer: Learning Compositional World Models for Robot Imagination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is heavily limiting in decision-making, where we seek a powerful world model to synthesize plans of unseen combinations of objects and actions in order to solve previously unseen tasks in new environments. To resolve this issue, we introduce RoboDreamer, an innovative approach for learning a compositional world model by factorizing the video generation. |
Siyuan Zhou; Yilun Du; Jiaben Chen; YANDONG LI; Dit-Yan Yeung; Chuang Gan; |
188 | Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data. |
Fengdi Che; Chenjun Xiao; Jincheng Mei; Bo Dai; Ramki Gummadi; Oscar A Ramirez; Christopher K Harris; A. Rupam Mahmood; Dale Schuurmans; |
189 | Transformers, Parallel Computation, and Logarithmic Depth Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that a constant number of self-attention layers can efficiently simulate—and be simulated by—a constant number of communication rounds of *Massively Parallel Computation*. |
Clayton Sanford; Daniel Hsu; Matus Telgarsky; |
190 | Selecting Large Language Model to Fine-tune Via Rectified Scaling Law Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given constrained resources, fine-tuning all models and making selections afterward is unrealistic. In this work, we formulate this resource-constrained selection task into predicting fine-tuning performance and illustrate its natural connection with Scaling Law. |
Haowei Lin; Baizhou Huang; Haotian Ye; Qinyu Chen; Zihao Wang; Sujian Li; Jianzhu Ma; Xiaojun Wan; James Zou; Yitao Liang; |
191 | Mean-field Chaos Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new class of score-based generative models (SGMs) designed to handle high-cardinality data distributions by leveraging concepts from mean-field theory. |
Sungwoo Park; Dongjun Kim; Ahmed Alaa; |
192 | Image Fusion Via Vision-Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods predominantly focus on pixel-level and semantic visual features for recognition, but often overlook the deeper text-level semantic information beyond vision. Therefore, we introduce a novel fusion paradigm named image Fusion via vIsion-Language Model (FILM), for the first time, utilizing explicit textual information from source images to guide the fusion process. |
Zixiang Zhao; Lilun Deng; Haowen Bai; Yukun Cui; Zhipeng Zhang; Yulun Zhang; Haotong Qin; Dongdong Chen; Jiangshe Zhang; PENG WANG; Luc Van Gool; |
193 | How Do Nonlinear Transformers Learn and Generalize in In-Context Learning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To the best of our knowledge, this paper provides the first theoretical analysis of the training dynamics of Transformers with nonlinear self-attention and nonlinear MLP, together with the ICL generalization capability of the resulting model. |
Hongkang Li; Meng Wang; Songtao Lu; Xiaodong Cui; Pin-Yu Chen; |
194 | What Improves The Generalization of Graph Transformers? A Theoretical Dive Into The Self-attention and Positional Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised node classification, comprising a self-attention layer with relative positional encoding and a two-layer perceptron. |
Hongkang Li; Meng Wang; Tengfei Ma; Sijia Liu; ZAIXI ZHANG; Pin-Yu Chen; |
195 | Improving Fine-grained Understanding in Image-text Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SPARse fine-grained Contrastive alignment (SPARC), a simple method for pretraining more fine-grained multimodal representations from image-text pairs. |
Ioana Bica; Anastasija Ilic; Matthias Bauer; Goker Erdogan; Matko Bošnjak; Christos Kaplanis; Alexey A. Gritsenko; Matthias Minderer; Charles Blundell; Razvan Pascanu; Jovana Mitrovic; |
196 | KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: From this analysis, we developed a tuning-free 2bit KV cache quantization algorithm, named KIVI. |
Zirui Liu; Jiayi Yuan; Hongye Jin; Shaochen Zhong; Zhaozhuo Xu; Vladimir Braverman; Beidi Chen; Xia Hu; |
197 | Trustless Audits Without Revealing Data or Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that it is possible to simultaneously allow model providers to keep their models and data secret while allowing other parties to trustlessly audit properties of the model and data. |
Suppakit Waiwitlikhit; Ion Stoica; Yi Sun; Tatsunori Hashimoto; Daniel Kang; |
198 | Position: On The Possibilities of AI-Generated Text Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce guidelines on the required text data quantity, either through sample size or sequence length, for reliable AI text detection, through derivations of sample complexity bounds. |
Souradip Chakraborty; Amrit Bedi; Sicheng Zhu; Bang An; Dinesh Manocha; Furong Huang; |
199 | Error Feedback Can Accurately Compress Preconditioners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, existing approaches for accurate full-matrix preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate Curvature (M-FAC), suffer from massive storage costs when applied even to small-scale models, as they must store a sliding window of gradients, whose memory requirements are multiplicative in the model dimension. In this paper, we address this issue via a novel and efficient error-feedback technique that can be applied to compress preconditioners by up to two orders of magnitude in practice, without loss of convergence. |
Ionut-Vlad Modoranu; Aleksei Kalinov; Eldar Kurtic; Elias Frantar; Dan Alistarh; |
200 | SPADE: Sparsity-Guided Debugging for Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we demonstrate, for the first time, that sparsity can instead be incorporated into the interpretation process itself, as a sample-specific preprocessing step. |
Arshia Soltani Moakhar; Eugenia Iofinova; Elias Frantar; Dan Alistarh; |
201 | Large Scale Dataset Distillation with Domain Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce **D**ataset **D**istillation with **D**omain **S**hift (**D3S**), a scalable distillation algorithm, made by reframing the dataset distillation problem as a *domain shift* one. |
Noel Loo; Alaa Maalouf; Ramin Hasani; Mathias Lechner; Alexander Amini; Daniela Rus; |
202 | Rejuvenating Image-GPT As Strong Visual Representation Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper enhances image-GPT (iGPT), one of the pioneering works that introduce autoregressive pretraining to predict the next pixels for visual representation learning. |
Sucheng Ren; Zeyu Wang; Hongru Zhu; Junfei Xiao; Alan Yuille; Cihang Xie; |
203 | Deep Networks Always Grok and Here Is Why Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the new concept of delayed robustness, whereby a DNN groks adversarial examples and becomes robust, long after interpolation and/or generalization. |
Ahmed Imtiaz Humayun; Randall Balestriero; Richard Baraniuk; |
204 | On The Expressive Power of Spectral Invariant Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this work is to gain a deep theoretical understanding of the expressive power obtainable when using spectral features. |
Bohang Zhang; Lingxiao Zhao; Haggai Maron; |
205 | GenCO: Generating Diverse Designs with Combinatorial Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, many design settings arising in industrial design, material science, computer graphics and more require that the generated objects satisfy hard combinatorial constraints or meet objectives in addition to modeling a data distribution. To address this, we propose GenCO, a generative framework that guarantees constraint satisfaction throughout training by leveraging differentiable combinatorial solvers to enforce feasibility. |
Aaron M Ferber; Arman Zharmagambetov; Taoan Huang; Bistra Dilkina; Yuandong Tian; |
206 | Feedback Efficient Online Fine-Tuning of Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel reinforcement learning procedure that efficiently explores on the manifold of feasible samples. |
Masatoshi Uehara; Yulai Zhao; Kevin Black; Ehsan Hajiramezanali; Gabriele Scalia; Nathaniel Lee Diamant; Alex M Tseng; Sergey Levine; Tommaso Biancalani; |
207 | Understanding Reasoning Ability of Language Models From The Perspective of Reasoning Paths Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning. To understand how pre-training with a next-token prediction objective contributes to the emergence of such reasoning capability, we propose that we can view an LM as deriving new conclusions by aggregating indirect reasoning paths seen at pre-training time. |
Xinyi Wang; Alfonso Amayuelas; Kexun Zhang; Liangming Pan; Wenhu Chen; William Yang Wang; |
208 | Rethinking Decision Transformer Via Hierarchical Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we introduce a general sequence modeling framework for studying sequential decision making through the lens of \emph{Hierarchical RL}. |
Yi Ma; Jianye HAO; Hebin Liang; Chenjun Xiao; |
209 | Stop Regressing: Training Value Functions Via Classification for Scalable Deep RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. |
Jesse Farebrother; Jordi Orbay; Quan Vuong; Adrien Ali Taiga; Yevgen Chebotar; Ted Xiao; Alex Irpan; Sergey Levine; Pablo Samuel Castro; Aleksandra Faust; Aviral Kumar; Rishabh Agarwal; |
210 | Optimal Eye Surgeon: Finding Image Priors Through Sparse Generators at Initialization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Optimal Eye Surgeon (OES), a framework for pruning and training deep image generator networks. |
Avrajit Ghosh; Xitong Zhang; Kenneth K. Sun; Qing Qu; Saiprasad Ravishankar; Rongrong Wang; |
211 | Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, in contrast to integrating visual prompts into inputs, we regard visual prompts as additional knowledge that facilitates language models in addressing tasks associated with visual information. |
Shibo Jie; Yehui Tang; Ning Ding; Zhi-Hong Deng; Kai Han; Yunhe Wang; |
212 | Position: The Platonic Representation Hypothesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that representations in AI models, particularly deep networks, are converging. |
Minyoung Huh; Brian Cheung; Tongzhou Wang; Phillip Isola; |
213 | Trainable Transformer in Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new efficient construction, Transformer in Transformer (in short, TINT), that allows a transformer to simulate and fine-tune more complex models during inference (e.g., pre-trained language models). |
Abhishek Panigrahi; Sadhika Malladi; Mengzhou Xia; Sanjeev Arora; |
214 | Towards A Self-contained Data-driven Global Weather Forecasting Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to couple the AI forecasting model, FengWu, with 4DVar to build a self-contained data-driven global weather forecasting framework, FengWu-4DVar. |
Yi Xiao; LEI BAI; Wei Xue; Hao Chen; Kun Chen; kang chen; Tao Han; Wanli Ouyang; |
215 | Switchable Decision: Dynamic Neural Generation Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a switchable decision to accelerate inference by dynamically assigning computation resources for each data instance. |
Shujian Zhang; Korawat Tanwisuth; Chengyue Gong; Pengcheng He; Mingyuan Zhou; |
216 | Amortizing Pragmatic Program Synthesis with Rankings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a general method of amortizing the slow, exact RSA synthesizer. |
Yewen Pu; Saujas Vaduguru; Priyan Vaithilingam; Elena Glassman; Daniel Fried; |
217 | Mastering Robot Manipulation with Multimodal Prompts Through Pretraining and Multi-task Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we tackle the problem of training a robot to understand multimodal prompts, interleaving vision signals with text descriptions. |
Jiachen Li; Qiaozi Gao; Michael Johnston; Xiaofeng Gao; Xuehai He; Hangjie Shi; Suhaila Shakiah; Reza Ghanadan; William Yang Wang; |
218 | Dynamic Evaluation of Large Language Models By Meta Probing Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose meta probing agents (MPA), a general dynamic evaluation protocol inspired by psychometrics to evaluate LLMs. |
Kaijie Zhu; Jindong Wang; Qinlin Zhao; Ruochen Xu; Xing Xie; |
219 | MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Drawing inspiration from the concept of LLM-as-a-Judge within LLMs, this paper introduces a novel benchmark, termed MLLM-as-a-Judge, to assess the ability of MLLMs in assisting judges across diverse modalities, encompassing three distinct tasks: Scoring Evaluation, Pair Comparison, and Batch Ranking. |
Dongping Chen; Ruoxi Chen; Shilin Zhang; Yaochen Wang; Yinuo Liu; Huichi Zhou; Qihui Zhang; Yao Wan; Pan Zhou; Lichao Sun; |
220 | Exploiting Code Symmetries for Learning Program Semantics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a group-theoretic framework that defines code symmetries as semantics-preserving transformations, where forming a code symmetry group enables precise and efficient reasoning of code semantics. |
Kexin Pei; Weichen Li; Qirui Jin; Shuyang Liu; Scott Geng; Lorenzo Cavallaro; Junfeng Yang; Suman Jana; |
221 | Self-Rewarding Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that during Iterative DPO training, not only does instruction following ability improve, but also the ability to provide high-quality rewards to itself. |
Weizhe Yuan; Richard Yuanzhe Pang; Kyunghyun Cho; Xian Li; Sainbayar Sukhbaatar; Jing Xu; Jason E Weston; |
222 | Understanding Adam Optimizer Via Online Learning of Updates: Adam Is FTRL in Disguise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide a different perspective based on online learning that underscores the importance of Adam’s algorithmic components. |
Kwangjun Ahn; Zhiyu Zhang; Yunbum Kook; Yan Dai; |
223 | Learning Causal Relations from Subsampled Time Series with Two Time-Slices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies the causal relations from subsampled time series, in which measurements are sparse and sampled at a coarser timescale than the causal timescale of the underlying system. |
Anpeng Wu; Haoxuan Li; Kun Kuang; Zhang Keli; Fei Wu; |
224 | MaxMin-RLHF: Alignment with Diverse Human Preferences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we first derive an impossibility result of alignment with single reward RLHF, thereby highlighting its insufficiency in representing diverse human preferences. Next, we propose to learn a mixture of reward models via an expectation-maximization algorithm and solve a MaxMin alignment objective inspired by the Egalitarian principle in social choice theory to better honor diverse human preferences. |
Souradip Chakraborty; Jiahao Qiu; Hui Yuan; Alec Koppel; Dinesh Manocha; Furong Huang; Amrit Bedi; Mengdi Wang; |
225 | A Minimaximalist Approach to Reinforcement Learning from Human Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present *Self-Play Preference Optimization* (SPO), an algorithm for reinforcement learning from human feedback. |
Gokul Swamy; Christoph Dann; Rahul Kidambi; Steven Wu; Alekh Agarwal; |
226 | Position: LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a vision of LLM-Modulo Frameworks that combine the strengths of LLMs with external model-based verifiers in a tighter bi-directional interaction regime. |
Subbarao Kambhampati; Karthik Valmeekam; Lin Guan; Mudit Verma; Kaya Stechly; Siddhant Bhambri; Lucas Paul Saldyt; Anil B Murthy; |
227 | Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, SSMs fall short in tasks involving non-standard retrieval functionality. To address these limitations, we introduce a hybrid model, MambaFormer, that combines Mamba with attention blocks, surpassing individual models in tasks where they struggle independently. |
Jongho Park; Jaeseung Park; Zheyang Xiong; Nayoung Lee; Jaewoong Cho; Samet Oymak; Kangwook Lee; Dimitris Papailiopoulos; |
228 | Memory Consolidation Enables Long-Context Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While various attempts have been made to extend this context, this has often come at the cost of both conceptual and computational complexity. We propose to instead re-purpose existing pre-trained video transformers by simply fine-tuning them to attend to memories derived non-parametrically from past activations. |
Ivana Balazevic; Yuge Shi; Pinelopi Papalampidi; Rahma Chaabouni; Skanda Koppula; Olivier J Henaff; |
229 | HumanTOMATO: Text-aligned Whole-body Motion Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous works on text-driven motion generation tasks mainly have two limitations: they ignore the key role of fine-grained hand and face controlling in vivid whole-body motion generation, and lack a good alignment between text and motion. To address such limitations, we propose a Text-aligned whOle-body Motion generATiOn framework, named HumanTOMATO, which is the first attempt to our knowledge towards applicable holistic motion generation in this research area. |
Shunlin Lu; Ling-Hao Chen; Ailing Zeng; Jing Lin; Ruimao Zhang; Lei Zhang; Heung-Yeung Shum; |
230 | Variational Schrödinger Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the variational Schrödinger diffusion model (VSDM), where the forward process is a multivariate diffusion and the variational scores are adaptively optimized for efficient transport. |
Wei Deng; Weijian Luo; Yixin Tan; Marin Biloš; Yu Chen; Yuriy Nevmyvaka; Ricky T. Q. Chen; |
231 | A Dense Reward View on Aligning Text-to-Image Diffusion with Preference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we take on a finer dense reward perspective and derive a tractable alignment objective that emphasizes the initial steps of the T2I reverse chain. |
Shentao Yang; Tianqi Chen; Mingyuan Zhou; |
232 | BBox-Adapter: Lightweight Adapting for Black-Box Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Consequently, adapting these black-box LLMs is only possible through their API services, raising concerns about transparency, privacy, and cost. To address these challenges, we introduce BBox-Adapter, a novel lightweight adapter for black-box LLMs. |
Haotian Sun; Yuchen Zhuang; Wei Wei; Chao Zhang; Bo Dai; |
233 | GNNs Also Deserve Editing, and They Need It More Than Once Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we delve into the specific reasons behind the difficulty of editing GNNs in succession and observe the root cause to be model overfitting. |
Shaochen Zhong; Duy Le; Zirui Liu; Zhimeng Jiang; Andrew Ye; Jiamu Zhang; Jiayi Yuan; Kaixiong Zhou; Zhaozhuo Xu; Jing Ma; Shuai Xu; Vipin Chaudhary; Xia Hu; |
234 | Position: AI/ML Influencers Have A Place in The Academic Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As the number of accepted papers at AI and ML conferences reaches into the thousands, it has become unclear how researchers access and read research publications. In this paper, we investigate the role of social media influencers in enhancing the visibility of machine learning research, particularly the citation counts of papers they share. |
Iain Weissburg; Mehir Arora; Xinyi Wang; Liangming Pan; William Yang Wang; |
235 | Split-Ensemble: Efficient OOD-aware Ensemble Via Task and Model Splitting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we improve on uncertainty estimation without extra OOD data or additional inference costs using an alternative *Split-Ensemble* method. |
Anthony Chen; Huanrui Yang; Yulu Gan; Denis A Gudovskiy; Zhen Dong; Haofan Wang; Tomoyuki Okuno; Yohei Nakata; Kurt Keutzer; Shanghang Zhang; |
236 | Human Alignment of Large Language Models Through Online Preference Optimisation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, our contribution is two-fold. First, we show the equivalence between two recent alignment methods, namely Identity Policy Optimisation (IPO) and Nash Mirror Descent (Nash-MD). Second, we introduce a generalisation of IPO, named IPO-MD, that leverages the regularised sampling approach proposed by Nash-MD. |
Daniele Calandriello; Zhaohan Daniel Guo; Remi Munos; Mark Rowland; Yunhao Tang; Bernardo Avila Pires; Pierre Harvey Richemond; Charline Le Lan; Michal Valko; Tianqi Liu; Rishabh Joshi; Zeyu Zheng; Bilal Piot; |
237 | Recovering The Pre-Fine-Tuning Weights of Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This practice is considered safe, as no current method can recover the unsafe, *pre-fine-tuning* model weights. In this paper, we demonstrate that this assumption is often false. |
Eliahu Horwitz; Jonathan Kahana; Yedid Hoshen; |
238 | Beyond Regular Grids: Fourier-Based Neural Operators on Arbitrary Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Leveraging the observation that a limited set of Fourier (Spectral) modes suffice to provide the required expressivity of a neural operator, we propose a simple method, based on the efficient direct evaluation of the underlying spectral transformation, to extend neural operators to arbitrary domains. |
Levi E. Lingsch; Mike Yan Michelis; Emmanuel de Bezenac; Sirani M. Perera; Robert K. Katzschmann; Siddhartha Mishra; |
239 | LangCell: Language-Cell Pre-training for Cell Identity Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, they have to be fine-tuned for downstream tasks and struggle when lacking labeled data with the desired semantic labels. To address this issue, we propose an innovative solution by constructing a unified representation of single-cell data and natural language during the pre-training phase, allowing the model to directly incorporate insights related to cell identity. |
Suyuan Zhao; Jiahuan Zhang; Yushuai Wu; YIZHEN LUO; Zaiqing Nie; |
240 | Decouple Then Classify: A Dynamic Multi-view Labeling Strategy with Shared and Specific Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In literature, most existing methods randomly label samples with a given ratio, but achieve unpromising and unstable results due to the randomness, especially in multi-view settings. To address this issue, we propose a Dynamic Multi-view Labeling Strategy with Shared and Specific Information. |
Xinhang Wan; Jiyuan Liu; Xinwang Liu; Yi Wen; Hao Yu; Siwei Wang; Shengju Yu; Tianjiao Wan; Jun Wang; En Zhu; |
241 | TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To emphasize temporal correlation modeling, this paper proposes TimeSiam as a simple but effective self-supervised pre-training framework for time series based on Siamese networks. |
Jiaxiang Dong; Haixu Wu; Yuxuan Wang; Yun-Zhong Qiu; Li Zhang; Jianmin Wang; Mingsheng Long; |
242 | Gated Linear Attention Transformers with Hardware-Efficient Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work describes a hardware-efficient algorithm for linear attention that trades off memory movement against parallelizability. |
Songlin Yang; Bailin Wang; Yikang Shen; Rameswar Panda; Yoon Kim; |
243 | A Distributional Analogue to The Successor Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. |
Harley Wiltzer; Jesse Farebrother; Arthur Gretton; Yunhao Tang; Andre Barreto; Will Dabney; Marc G Bellemare; Mark Rowland; |
244 | Distributional Bellman Operators Over Mean Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions. |
Li Kevin Wenliang; Gregoire Deletang; Matthew Aitchison; Marcus Hutter; Anian Ruoss; Arthur Gretton; Mark Rowland; |
245 | Position: Graph Foundation Models Are Already Here Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Drawing inspiration from existing foundation models in the CV and NLP domains, we propose a novel perspective for the GFM development by advocating for a “graph vocabulary”, in which the basic transferable units underlying graphs encode the invariance on graphs. |
Haitao Mao; Zhikai Chen; Wenzhuo Tang; Jianan Zhao; Yao Ma; Tong Zhao; Neil Shah; Mikhail Galkin; Jiliang Tang; |
246 | MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, MorphVAE was introduced as the sole learning-based method, but its generated morphologies lack plausibility, i.e., they do not appear realistic enough and most of the generated samples are topologically invalid. To fill this gap, this paper proposes **MorphGrower**, which mimics the natural growth mechanism of neurons for generation. |
Nianzu Yang; Kaipeng Zeng; Haotian Lu; Yexin Wu; Zexin Yuan; Danni Chen; Shengdian Jiang; Jiaxiang Wu; Yimin Wang; Junchi Yan; |
247 | Refined Coreset Selection: Towards Minimal Coreset Size Under Model Performance Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Practitioners regularly desire to identify the smallest possible coreset in realistic scenarios while maintaining comparable model performance, to minimize costs and maximize acceleration. Motivated by this desideratum, for the first time, we pose the problem of refined coreset selection, in which the minimal coreset size under model performance constraints is explored. |
Xiaobo Xia; Jiale Liu; Shaokun Zhang; Qingyun Wu; Hongxin Wei; Tongliang Liu; |
248 | UP2ME: Univariate Pre-training to Multivariate Fine-tuning As A General-purpose Framework for Multivariate Time Series Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a general-purpose framework, named UP2ME (**U**nivariate **P**re-training to **M**ultivariate Fin**e**-tuning). |
Yunhao Zhang; Minghao Liu; Shengyang Zhou; Junchi Yan; |
249 | MEMORYLLM: Towards Self-Updatable Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce MEMORYLLM, a model that comprises a transformer and a fixed-size memory pool within the latent space of the transformer. |
Yu Wang; Yifan Gao; Xiusi Chen; Haoming Jiang; Shiyang Li; Jingfeng Yang; Qingyu Yin; Zheng Li; Xian Li; Bing Yin; Jingbo Shang; Julian McAuley; |
250 | Floating Anchor Diffusion Model for Multi-motif Scaffolding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Floating Anchor Diffusion (FADiff) model. |
Ke Liu; Weian Mao; Shuaike Shen; Xiaoran Jiao; Zheng Sun; Hao Chen; Chunhua Shen; |
251 | Do Language Models Exhibit The Same Cognitive Biases in Problem Solving As Human Learners? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the biases of LLMs in relation to those known in children when solving arithmetic word problems. We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features. |
Andreas Opedal; Alessandro Stolfo; Haruki Shirakami; Ying Jiao; Ryan Cotterell; Bernhard Schölkopf; Abulhair Saparov; Mrinmaya Sachan; |
252 | FedREDefense: Defending Against Model Poisoning Attacks for Federated Learning Using Model Update Reconstruction Error Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing defenses, typically relying on cross-client/global information to mitigate these attacks, fall short when faced with non-IID data distributions and/or a large number of malicious clients. To address these challenges, we present FedREDefense. |
Yueqi XIE; Minghong Fang; Neil Zhenqiang Gong; |
253 | Towards Efficient Exact Optimization of Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose efficient exact optimization (EXO) of the alignment objective. |
Haozhe Ji; Cheng Lu; Yilin Niu; Pei Ke; Hongning Wang; Jun Zhu; Jie Tang; Minlie Huang; |
254 | RICE: Breaking Through The Training Bottlenecks of Reinforcement Learning with Explanation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose RICE, an innovative refining scheme for reinforcement learning that incorporates explanation methods to break through the training bottlenecks. |
Zelei Cheng; Xian Wu; Jiahao Yu; Sabrina Yang; Gang Wang; Xinyu Xing; |
255 | Soft Prompt Recovers Compressed LLMs, Transferably Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, gaining such efficiency benefits often simultaneously demands extensive engineering efforts and intricate designs to mitigate the performance decline. In this work, we leverage *(Soft) Prompt Tuning* in its most vanilla form and discover that such conventionally learned soft prompts can recover the performance of compressed LLMs. |
Zhaozhuo Xu; Zirui Liu; Beidi Chen; Shaochen Zhong; Yuxin Tang; Jue WANG; Kaixiong Zhou; Xia Hu; Anshumali Shrivastava; |
256 | GLoRe: When, Where, and How to Improve LLM Reasoning Via Global and Local Refinements Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Stepwise ORMs (**SORMs**) which are trained, only on synthetic data, to approximate the expected future reward of the optimal policy, or V*, as a form of process-based reward modeling. We generate training data for both models synthetically by reusing data used to train the SORM. |
Alexander Havrilla; Sharath Chandra Raparthy; Christoforos Nalmpantis; Jane Dwivedi-Yu; Maksym Zhuravinskyi; Eric Hambro; Roberta Raileanu; |
257 | Generative Active Learning for Long-tailed Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore how to perform active learning specifically for generated data in the long-tailed instance segmentation task. |
Muzhi Zhu; Chengxiang Fan; Hao Chen; Yang Liu; Weian Mao; Xiaogang Xu; Chunhua Shen; |
258 | DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current solutions are inadequate: state-of-the-art black-box models do not supply explanations, post-hoc explainers for black-box models lack faithfulness guarantees, and self-interpretable models greatly compromise accuracy. To address these issues, we propose DISCRET, a self-interpretable ITE framework that synthesizes faithful, rule-based explanations for each sample. |
Yinjun Wu; Mayank Keoliya; Kan Chen; Neelay Velingker; Ziyang Li; Emily J Getzen; Qi Long; Mayur Naik; Ravi B Parikh; Eric Wong; |
259 | Towards Compositionality in Concept Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that existing unsupervised concept extraction methods find concepts which are not compositional. To automatically discover compositional concept representations, we identify two salient properties of such representations, and propose Compositional Concept Extraction (CCE) for finding concepts which obey these properties. |
Adam Stein; Aaditya Naik; Yinjun Wu; Mayur Naik; Eric Wong; |
260 | Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a variational Bayesian explanation framework, dubbed ProbAbilistic Concept Explainers (PACE), which models the distributions of patch embeddings to provide trustworthy post-hoc conceptual explanations. |
Hengyi Wang; Shiwei Tan; Hao Wang; |
261 | Understanding and Diagnosing Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, it is crucial to develop techniques that aim to understand the sensitivities in the learnt representations of neural network policies. To achieve this we introduce a theoretically founded method that provides a systematic analysis of the unstable directions in the deep neural policy decision boundary across both time and space. |
Ezgi Korkmaz; |
262 | Prompt-tuning Latent Diffusion Models for Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors. |
Hyungjin Chung; Jong Chul Ye; Peyman Milanfar; Mauricio Delbracio; |
263 | Language Models Are Super Mario: Absorbing Abilities from Homologous Models As A Free Lunch Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we unveil that Language Models (LMs) can acquire new capabilities by assimilating parameters from homologous models without retraining or GPUs. |
Le Yu; Bowen Yu; Haiyang Yu; Fei Huang; Yongbin Li; |
264 | Algorithm and Hardness for Dynamic Attention Maintenance in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by previous theoretical studies of the static version of the attention multiplication problem [Zandieh, Han, Daliri, and Karbasi ICML 2023, Alman and Song NeurIPS 2023], we formally define a dynamic version of the attention matrix multiplication problem. |
Jan van den Brand; Zhao Song; Tianyi Zhou; |
265 | Critical Windows: Non-asymptotic Theory for Feature Emergence in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While this is advantageous for interpretability as it implies one can localize properties of the generation to a small segment of the trajectory, it seems at odds with the continuous nature of the diffusion. We propose a formal framework for studying these windows and show that for data coming from a mixture of strongly log-concave densities, these windows can be provably bounded in terms of certain measures of inter- and intra-group separation. |
Marvin Li; Sitan Chen; |
266 | HyperFields: Towards Zero-Shot Generation of NeRFs from Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce HyperFields, a method for generating text-conditioned Neural Radiance Fields (NeRFs) with a single forward pass and (optionally) some fine-tuning. |
Sudarshan Babu; Richard Liu; Avery Zhou; Michael Maire; Greg Shakhnarovich; Rana Hanocka; |
267 | TVE: Learning Meta-attribution for Transferable Vision Explainer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This limitation results in explaining various tasks being time- and resource-consuming. To address this problem, we introduce a **Transferable Vision Explainer** (TVE) that can effectively explain various vision models in downstream tasks. |
Guanchu Wang; Yu-Neng Chuang; Fan Yang; Mengnan Du; Chia-Yuan Chang; Shaochen Zhong; Zirui Liu; Zhaozhuo Xu; Kaixiong Zhou; Xuanting Cai; Xia Hu; |
268 | Equivariant Deep Weight Space Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prior research has mainly focused on solving relaxed versions of the alignment problem, leading to either time-consuming methods or sub-optimal solutions. To accelerate the alignment process and improve its quality, we propose a novel framework aimed at learning to solve the weight alignment problem, which we name Deep-Align. |
Aviv Navon; Aviv Shamsian; Ethan Fetaya; Gal Chechik; Nadav Dym; Haggai Maron; |
269 | From Self-Attention to Markov Models: Unveiling The Dynamics of Generative Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study learning a 1-layer self-attention model from a set of prompts and the associated outputs sampled from the model. |
Muhammed Emrullah Ildiz; Yixiao Huang; Yingcong Li; Ankit Singh Rawat; Samet Oymak; |
270 | How to Escape Sharp Minima with Random Perturbations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main component of the algorithm is to use gradients computed from randomly perturbed iterates to estimate a direction that leads to flatter minima. For the setting where the cost function is an empirical risk over training data, we present a faster algorithm that is inspired by a recently proposed practical algorithm called sharpness-aware minimization, supporting its success in practice. |
Kwangjun Ahn; Ali Jadbabaie; Suvrit Sra; |
271 | Auto-Regressive Next-Token Predictors Are Universal Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a theoretical framework for studying auto-regressive next-token predictors. |
Eran Malach; |
272 | MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The core thesis of this paper is that text instructions can enable retrieving images with richer relations beyond visual similarity. To show this, we introduce MagicLens, a series of self-supervised image retrieval models that support open-ended instructions. |
Kai Zhang; Yi Luan; Hexiang Hu; Kenton Lee; Siyuan Qiao; Wenhu Chen; Yu Su; Ming-Wei Chang; |
273 | DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a new auto-regressive denoising pre-training strategy, which allows for more stable and efficient pre-training on PDE data and generalizes to various downstream tasks. |
Zhongkai Hao; Chang Su; Songming Liu; Julius Berner; Chengyang Ying; Hang Su; Anima Anandkumar; Jian Song; Jun Zhu; |
274 | Plug-in Performative Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we study a general protocol for making use of possibly misspecified models in performative prediction, called plug-in performative optimization. |
Licong Lin; Tijana Zrnic; |
275 | GPT-4V(ision) Is A Generalist Web Agent, If Grounded Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore the potential of LMMs like GPT-4V as a generalist web agent that can follow natural language instructions to complete tasks on any given website. |
Boyuan Zheng; Boyu Gou; Jihyung Kil; Huan Sun; Yu Su; |
276 | PIDformer: Transformer Meets Control Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address two main shortcomings of transformer architectures: input corruption and rank collapse in their output representation. |
Tam Minh Nguyen; Cesar A Uribe; Tan Minh Nguyen; Richard Baraniuk; |
277 | CasCast: Skillful High-resolution Precipitation Nowcasting Via Cascaded Modelling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose CasCast, a cascaded framework composed of a deterministic and a probabilistic part to decouple the predictions for mesoscale precipitation distributions and small-scale patterns. |
Junchao Gong; Lei Bai; Peng Ye; Wanghan Xu; Na Liu; Jianhua Dai; Xiaokang Yang; Wanli Ouyang; |
278 | Receptive Fields As Experts in Convolutional Neural Architectures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Mixture of Receptive Fields (MoRF) instead of using a single receptive field. |
Dongze Lian; Weihao Yu; Xinchao Wang; |
279 | What Is Dataset Distillation Learning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we posit and answer three questions about the behavior, representativeness, and point-wise information content of distilled data. |
William Yang; Ye Zhu; Zhiwei Deng; Olga Russakovsky; |
280 | ReGAL: Refactoring Programs to Discover Generalizable Abstractions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Generating redundant code from scratch is both inefficient and error-prone. To address this, we propose Refactoring for Generalizable Abstraction Learning (ReGAL), a gradient-free method for learning a library of reusable functions via code refactorization, i.e., restructuring code without changing its execution output. |
Elias Stengel-Eskin; Archiki Prasad; Mohit Bansal; |
281 | Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This oversight and the requirement for annotated samples for downstream tasks limit eSSL’s versatility. In this work, we address these issues with the **M**ultimodal **E**CG **R**epresentation **L**earning (**MERL**) framework. |
Che Liu; Zhongwei Wan; Cheng Ouyang; Anand Shah; Wenjia Bai; Rossella Arcucci; |
282 | Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate a class of XOR-type classification tasks with label-flipping noises. |
Xuran Meng; Difan Zou; Yuan Cao; |
283 | Score Identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. |
Mingyuan Zhou; Huangjie Zheng; Zhendong Wang; Mingzhang Yin; Hai Huang; |
284 | Understanding The Effects of Iterative Prompting on Truthfulness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work provides a nuanced understanding of iterative prompting and introduces novel approaches to enhance the truthfulness of LLMs, thereby contributing to the development of more accurate and trustworthy AI systems. |
Satyapriya Krishna; Chirag Agarwal; Himabindu Lakkaraju; |
285 | Asymptotics of Feature Learning in Two-layer Networks After One Gradient-step Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. |
Hugo Cui; Luca Pesce; Yatin Dandi; Florent Krzakala; Yue Lu; Lenka Zdeborova; Bruno Loureiro; |
286 | Symmetry Induces Structure and Constraint of Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we unveil the importance of the loss function symmetries in affecting, if not deciding, the learning behavior of machine learning models. |
Liu Ziyin; |
287 | Position: What Can Large Language Models Tell Us About Time Series Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that current LLMs have the potential to revolutionize time series analysis, thereby promoting efficient decision-making and advancing towards a more universal form of time series analytical intelligence. |
Ming Jin; YiFan Zhang; Wei Chen; Kexin Zhang; Yuxuan Liang; Bin Yang; Jindong Wang; Shirui Pan; Qingsong Wen; |
288 | Foundation Policies with Hilbert Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While a number of methods have been proposed to enable generic self-supervised RL, based on principles such as goal-conditioned RL, behavioral cloning, and unsupervised skill learning, such methods remain limited in terms of either the diversity of the discovered behaviors, the need for high-quality demonstration data, or the lack of a clear adaptation mechanism for downstream tasks. In this work, we propose a novel unsupervised framework to pre-train generalist policies that capture diverse, optimal, long-horizon behaviors from unlabeled offline data such that they can be quickly adapted to any arbitrary new tasks in a zero-shot manner. |
Seohong Park; Tobias Kreiman; Sergey Levine; |
289 | Vague Prototype-Oriented Diffusion Model for Multi-Class Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In such a challenging setting, widely used reconstruction-based networks persistently grapple with the identical shortcut problem, wherein the infiltration of abnormal information from the condition biases the output towards an anomalous distribution. In response to this critical challenge, we introduce a Vague Prototype-Oriented Diffusion Model (VPDM) that extracts only fundamental information from the condition to prevent the occurrence of the identical shortcut problem from the input layer. |
Yuxin Li; Yaoxuan Feng; Bo Chen; Wenchao Chen; Yubiao Wang; Xinyue Hu; Baolin Sun; Chunhui Qu; Mingyuan Zhou; |
290 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion. |
Xing Han Lu; Zdeněk Kasner; Siva Reddy; |
291 | Stereo Risk: A Continuous Modeling Approach to Stereo Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Stereo Risk, a new deep-learning approach to solve the classical stereo-matching problem in computer vision. |
Ce Liu; Suryansh Kumar; Shuhang Gu; Radu Timofte; Yao Yao; Luc Van Gool; |
292 | Robust Classification Via A Single Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To better harness the expressive power of diffusion models, this paper proposes Robust Diffusion Classifier (RDC), a generative classifier that is constructed from a pre-trained diffusion model to be adversarially robust. |
Huanran Chen; Yinpeng Dong; Zhengyi Wang; Xiao Yang; Chengqi Duan; Hang Su; Jun Zhu; |
293 | SelfVC: Voice Conversion With Iterative Refinement Using Self Transformations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, instead of explicitly disentangling attributes with loss terms, we present a framework to train a controllable voice conversion model on entangled speech representations derived from self-supervised learning (SSL) and speaker verification models. |
Paarth Neekhara; Shehzeen Samarah Hussain; Rafael Valle; Boris Ginsburg; Rishabh Ranjan; Shlomo Dubnov; Farinaz Koushanfar; Julian McAuley; |
294 | DiffDA: A Diffusion Model for Weather-scale Data Assimilation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose DiffDA as a denoising diffusion model capable of assimilating atmospheric variables using predicted states and sparse observations. |
Langwen Huang; Lukas Gianinazzi; Yuejiang Yu; Peter Dominik Dueben; Torsten Hoefler; |
295 | Fair Off-Policy Learning from Observational Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework for fair off-policy learning: we learn decision rules from observational data under different notions of fairness, where we explicitly assume that observational data were collected under a different — potentially discriminatory — behavioral policy. |
Dennis Frauen; Valentyn Melnychuk; Stefan Feuerriegel; |
296 | A Graph Is Worth $K$ Words: Euclideanizing Graph Using Pure Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce GraphsGPT, featuring a Graph2Seq encoder that transforms non-Euclidean graphs into learnable Graph Words in the Euclidean space, along with a GraphGPT decoder that reconstructs the original graph from Graph Words to ensure information equivalence. |
Zhangyang Gao; Daize Dong; Cheng Tan; Jun Xia; Bozhen Hu; Stan Z. Li; |
297 | Transferring Knowledge From Large Foundation Models to Small Downstream Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This procedure also precludes combining multiple pre-trained models that learn complementary information. To address these shortcomings, we introduce Adaptive Feature Transfer (AFT). |
Shikai Qiu; Boran Han; Danielle C. Maddix; Shuai Zhang; Bernie Wang; Andrew Gordon Wilson; |
298 | Compute Better Spent: Replacing Dense Layers with Structured Matrices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we systematically explore structured matrices as replacements for dense matrices. |
Shikai Qiu; Andres Potapczynski; Marc Anton Finzi; Micah Goldblum; Andrew Gordon Wilson; |
299 | On Convergence of Incremental Gradient for Non-convex Smooth Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, if $n$ is the training set size, we improve the optimization term of the convergence guarantee to reach accuracy $\epsilon$ by a factor of $n$, from $O \left( \frac{n}{\epsilon} \right)$ to $O \left( \frac{1}{\epsilon} \right)$. |
Anastasia Koloskova; Nikita Doikov; Sebastian U Stich; Martin Jaggi; |
300 | Demystifying SGD with Doubly Stochastic Gradients Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we establish the convergence of doubly SGD with independent minibatching and random reshuffling under general conditions, which encompasses dependent component gradient estimators. |
Kyurae Kim; Joohwan Ko; Yian Ma; Jacob R. Gardner; |
301 | SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current post-training pruning methods, while reducing the sizes of LLMs, often fail to maintain their original performance. To address these challenges, this paper introduces SPP, a **S**parsity-**P**reserved **P**arameter-efficient fine-tuning method. |
Xudong Lu; Aojun Zhou; Yuhui Xu; Renrui Zhang; Peng Gao; Hongsheng Li; |
302 | Graph Structure Extrapolation for Out-of-Distribution Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to achieve graph OOD generalization with the novel design of non-Euclidean-space linear extrapolation. |
Xiner Li; Shurui Gui; Youzhi Luo; Shuiwang Ji; |
303 | Decomposing Uncertainty for Large Language Models Through Input Clarification Ensembling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling, which can be applied to any pre-trained LLM. |
Bairu Hou; Yujian Liu; Kaizhi Qian; Jacob Andreas; Shiyu Chang; Yang Zhang; |
304 | PrE-Text: Training Language Models on Private Federated Data in The Age of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite this, on-device training has several drawbacks: (1) most user devices are too small to train large models on-device, (2) on-device training is communication- and computation-intensive, and (3) on-device training can be difficult to debug and deploy. To address these problems, we propose Private Evolution-Text (PrE-Text), a method for generating differentially private (DP) synthetic textual data. |
Charlie Hou; Akshat Shrivastava; Hongyuan Zhan; Rylan Conway; Trang Le; Adithya Sagar; Giulia Fanti; Daniel Lazar; |
305 | Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first propose to universally represent an arbitrary 3D complex as a geometric graph of sets, shedding light on encoding all types of molecules with one model. We then propose a Generalist Equivariant Transformer (GET) to effectively capture both domain-specific hierarchies and domain-agnostic interaction physics. |
Xiangzhe Kong; Wenbing Huang; Yang Liu; |
306 | Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce R-Bench, a novel benchmark for evaluating Vision Relationship Hallucination. |
Mingrui Wu; Jiayi Ji; Oucheng Huang; Jiale Li; Yuhang Wu; Xiaoshuai Sun; Rongrong Ji; |
307 | Data-Efficient Molecular Generation with Hierarchical Textual Inversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Developing an effective molecular generation framework even with a limited number of molecules is often important for its practical deployment, e.g., drug discovery, since acquiring task-related molecular data requires expensive and time-consuming experimental costs. To tackle this issue, we introduce Hierarchical Textual Inversion for Molecular Generation (HI-Mol), a novel data-efficient molecular generation method. |
Seojin Kim; Jaehyun Nam; Sihyun Yu; Younghoon Shin; Jinwoo Shin; |
308 | Understanding Stochastic Natural Gradient Variational Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its wide usage, little is known about the non-asymptotic convergence rate in the *stochastic* setting. We aim to lessen this gap and provide a better understanding. |
Kaiwen Wu; Jacob R. Gardner; |
309 | How to Leverage Diverse Demonstrations in Offline Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a simple yet effective data selection method that identifies positive behaviors based on their *resultant states* – a more informative criterion enabling explicit utilization of dynamics information and effective extraction of both expert and beneficial diverse behaviors. |
Sheng Yue; Jiani Liu; Xingyuan Hua; Ju Ren; Sen Lin; Junshan Zhang; Yaoxue Zhang; |
310 | OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study offline-to-online Imitation Learning (IL) that pretrains an imitation policy from static demonstration data, followed by fast finetuning with minimal environmental interaction. |
Sheng Yue; Xingyuan Hua; Ju Ren; Sen Lin; Junshan Zhang; Yaoxue Zhang; |
311 | Block Acceleration Without Momentum: On Optimal Stepsizes of Block Gradient Descent for Least-Squares Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such a choice for BGD has not yet been shown to theoretically justify its empirical superiority over GD, as existing convergence rates for BGD have worse constants than those for GD in the deterministic case. To discover such a theoretical justification, we set up a simple environment where we consider BGD applied to least-squares with two blocks of variables. |
Liangzu Peng; Wotao Yin; |
312 | Autaptic Synaptic Circuit Enhances Spatio-temporal Predictive Learning of Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: They lack the ability to effectively model long-term temporal dependencies and facilitate spatial information interaction, which is crucial for tackling complex, dynamic spatio-temporal prediction tasks. To tackle these challenges, this paper draws inspiration from the concept of autaptic synapses in biology and proposes a novel Spatio-Temporal Circuit (STC) model. |
Lihao Wang; Zhaofei Yu; |
313 | Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With no existing approaches to control diversity to a set value, current solutions focus on blindly promoting it via intrinsic rewards or additional loss functions, effectively changing the learning objective and lacking a principled measure for it. To address this, we introduce Diversity Control (DiCo), a method able to control diversity to an exact value of a given metric by representing policies as the sum of a parameter-shared component and dynamically scaled per-agent components. |
Matteo Bettini; Ryan Kortvelesy; Amanda Prorok; |
314 | Robustness of Nonlinear Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of unsupervised representation learning in slightly misspecified settings, and thus formalize the study of robustness of nonlinear representation learning. |
Simon Buchholz; Bernhard Schölkopf; |
315 | Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first framework for training diffusion models that provably sample from the uncorrupted distribution given only noisy training data, solving an open problem in Ambient diffusion. |
Giannis Daras; Alex Dimakis; Constantinos Costis Daskalakis; |
316 | Deep Demonstration Tracing: Learning Generalizable Imitator Policy for Runtime Imitation from A Single Demonstration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a new algorithm called Deep Demonstration Tracing (DDT). |
Xiong-Hui Chen; Junyin Ye; Hang Zhao; Yi-Chen Li; Xu-Hui Liu; Haoran Shi; Yu-Yan Xu; Zhihao Ye; Si-Hang Yang; Yang Yu; Kai Xu; Zongzhang Zhang; Anqi Huang; |
317 | OODRobustBench: A Benchmark and Large-Scale Analysis of Adversarial Robustness Under Distribution Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This omission is concerning as such distribution shifts are unavoidable when methods are deployed in the wild. To address this issue we propose a benchmark named OODRobustBench to comprehensively assess OOD adversarial robustness using 23 dataset-wise shifts (i.e. naturalistic shifts in input distribution) and 6 threat-wise shifts (i.e., unforeseen adversarial threat models). |
Lin Li; Yifei Wang; Chawin Sitawarin; Michael W. Spratling; |
318 | Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to enrich preference queries to ask (1) which features of a given example are preferable, in addition to (2) comparisons between objects. |
Andi Peng; Yuying Sun; Tianmin Shu; David Abel; |
319 | NExT-Chat: An LMM for Chat, Detection and Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to enhance visual comprehension, recent studies have equipped LMMs with region-level understanding capabilities by representing object bounding box coordinates as a series of text sequences (pix2seq). In this paper, we introduce a novel paradigm for object location modeling called the pix2emb method, where we ask the LMM to output the location embeddings and then decode them with different decoders. |
Ao Zhang; Yuan Yao; Wei Ji; Zhiyuan Liu; Tat-Seng Chua; |
320 | Reinformer: Max-Return Sequence Modeling for Offline RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the concept of max-return sequence modeling which integrates the goal of maximizing returns into existing sequence models. |
Zifeng Zhuang; Dengyun Peng; Jinxin Liu; Ziqi Zhang; Donglin Wang; |
321 | Superpoint Gaussian Splatting for Real-Time High-Fidelity Dynamic Scene Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current methods mainly utilize NeRF-based methods to represent the static scene and an additional time-variant MLP to model scene deformations, resulting in relatively low rendering quality as well as slow inference speed. To tackle these challenges, we propose a novel framework named Superpoint Gaussian Splatting (SP-GS). |
Diwen Wan; Ruijie Lu; Gang Zeng; |
322 | A Language Model’s Guide Through Latent Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While the focus of previous work has largely been on *truthfulness*, in this paper we extend this framework to a richer set of concepts such as *appropriateness*, *humor*, *creativity* and *quality*, and explore to what degree current detection and guidance strategies work in these challenging settings. |
Dimitri von Rütte; Sotiris Anagnostidis; Gregor Bachmann; Thomas Hofmann; |
323 | Learning Temporal Distances: Contrastive Successor Features Can Provide A Metric Structure for Decision-Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we build on prior work in contrastive learning and quasimetrics to show how successor features learned by contrastive learning (after a change of variables) form a temporal distance that does satisfy the triangle inequality, even in stochastic settings. |
Vivek Myers; Chongyi Zheng; Anca Dragan; Sergey Levine; Benjamin Eysenbach; |
324 | DiJiang: Efficient Large Language Models Through Compact Kernelization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present DiJiang, a novel Frequency Domain Kernelization approach that enables the transformation of a pre-trained vanilla Transformer into a linear-complexity model with little training cost. |
Hanting Chen; Zhicheng Liu; Xutao Wang; Yuchuan Tian; Yunhe Wang; |
325 | Chain-of-Thought Predictive Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel hierarchical imitation learning method that utilizes sub-optimal demos. |
Zhiwei Jia; Vineet Thumuluri; Fangchen Liu; Linghao Chen; Zhiao Huang; Hao Su; |
326 | Information Complexity of Stochastic Convex Optimization: Applications to Generalization, Compression, and Tracing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the interplay between memorization and learning in the context of *stochastic convex optimization* (SCO). |
Idan Attias; Gintare Karolina Dziugaite; Mahdi Haghifam; Roi Livni; Daniel M. Roy; |
327 | Testing The Feasibility of Linear Programs with Bandit Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While the recent literature has seen a surge in the study of constrained bandit problems, all existing methods for these begin by assuming the feasibility of the underlying problem. We initiate the study of testing such feasibility assumptions, and in particular address the problem in the linear bandit setting, thus characterising the costs of feasibility testing for an unknown linear program using bandit feedback. |
Aditya Gangrade; Aditya Gopalan; Venkatesh Saligrama; Clayton Scott; |
328 | Online Linear Regression in Dynamic Environments Via Discounting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop algorithms for online linear regression which achieve optimal static and dynamic regret guarantees *even in the complete absence of prior knowledge*. We present a novel analysis showing that a discounted variant of the Vovk-Azoury-Warmuth forecaster achieves dynamic regret of the form $R_{T}(\vec{u})\le O\Big(d\log(T)\vee \sqrt{dP_{T}^{\gamma}(\vec{u})T}\Big)$, where $P_{T}^{\gamma}(\vec{u})$ is a measure of variability of the comparator sequence, and show that the discount factor achieving this result can be learned on-the-fly. |
Andrew Jacobsen; Ashok Cutkosky; |
329 | Interpreting and Improving Diffusion Models from An Optimization Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, learning to denoise is approximately learning to project. In this paper, we use this observation to interpret denoising diffusion models as approximate gradient descent applied to the Euclidean distance function. |
Frank Permenter; Chenyang Yuan; |
330 | Knowledge Graphs Can Be Learned with Just Intersection Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we recognize the critical importance of the intersection among the $k$-hop neighborhoods of the head, relation, and tail when determining the validity of a triple. |
Duy Le; Shaochen Zhong; Zirui Liu; Shuai Xu; Vipin Chaudhary; Kaixiong Zhou; Zhaozhuo Xu; |
331 | HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task. |
Shengchao Hu; Ziqing Fan; Li Shen; Ya Zhang; Yanfeng Wang; Dacheng Tao; |
332 | Q-value Regularized Transformer for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Fortunately, Dynamic Programming (DP) methods offer a solution by leveraging a value function to approximate optimal future returns for each state, although these techniques are prone to unstable learning behaviors, particularly in long-horizon and sparse-reward scenarios. Building upon these insights, we propose the Q-value regularized Transformer (QT), which combines the trajectory modeling ability of the Transformer with the predictability of optimal future returns from DP methods. |
Shengchao Hu; Ziqing Fan; Chaoqin Huang; Li Shen; Ya Zhang; Yanfeng Wang; Dacheng Tao; |
333 | Learning to Predict Mutational Effects of Protein-Protein Interactions By Microenvironment-aware Hierarchical Prompt Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first construct a hierarchical prompt codebook to record common microenvironmental patterns at different structural scales independently. Then, we develop a novel codebook pre-training task, namely masked microenvironment modeling, to model the joint distribution of each mutation with their residue types, angular statistics, and local conformational changes in the microenvironment. |
Lirong Wu; Yijun Tian; Haitao Lin; Yufei Huang; Siyuan Li; Nitesh V Chawla; Stan Z. Li; |
334 | How to Trace Latent Generative Model Generated Images Without Artificial Watermark? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we ask whether it is possible to effectively and efficiently trace the images generated by a specific latent generative model without the aforementioned requirements. |
Zhenting Wang; Vikash Sehwag; Chen Chen; Lingjuan Lyu; Dimitris N. Metaxas; Shiqing Ma; |
335 | On The Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we discover an intriguing linear phenomenon in models that are initialized from a common pretrained checkpoint and finetuned on different tasks, termed as Cross-Task Linearity (CTL). |
Zhanpeng Zhou; Zijun Chen; Yilan Chen; Bo Zhang; Junchi Yan; |
336 | MILP-FBGen: LP/MILP Instance Generation with Feasibility/Boundedness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a diffusion-based LP/MILP instance generative framework called MILP-FBGen. |
Yahong Zhang; Chenchen Fan; Donghui Chen; Congrui Li; Wenli Ouyang; Mingda Zhu; Junchi Yan; |
337 | Adapting Static Fairness to Sequential Decision-Making: Bias Mitigation Strategies Towards Equal Long-term Benefit Rate Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address biases in sequential decision-making, we introduce a long-term fairness concept named Equal Long-term Benefit Rate (ELBERT). |
Yuancheng Xu; Chenghao Deng; Yanchao Sun; Ruijie Zheng; Xiyao Wang; Jieyu Zhao; Furong Huang; |
338 | Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing research on video understanding still struggles to achieve in-depth comprehension and reasoning in complex videos, primarily due to the under-exploration of two key bottlenecks: fine-grained spatial-temporal perceptive understanding and cognitive-level video scene comprehension. This paper bridges the gap by presenting a novel solution. |
Hao Fei; Shengqiong Wu; Wei Ji; Hanwang Zhang; Meishan Zhang; Mong-Li Lee; Wynne Hsu; |
339 | BiLLM: Pushing The Limit of Post-Training Quantization for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing quantization techniques fall short of maintaining LLM performance under ultra-low bit-widths. In response to this challenge, we present BiLLM, a groundbreaking 1-bit post-training quantization scheme tailored for pretrained LLMs. |
Wei Huang; Yangdong Liu; Haotong Qin; Ying Li; Shiming Zhang; Xianglong Liu; Michele Magno; Xiaojuan Qi; |
340 | Self-Infilling Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding. |
Lin Zheng; Jianbo Yuan; Zhi Zhang; Hongxia Yang; Lingpeng Kong; |
341 | The Linear Representation Hypothesis and The Geometry of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address two closely related questions: What does linear representation actually mean? |
Kiho Park; Yo Joong Choe; Victor Veitch; |
342 | Hyperbolic Active Learning for Semantic Segmentation Under Domain Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a hyperbolic neural network approach to pixel-level active learning for semantic segmentation. |
Luca Franco; Paolo Mandica; Konstantinos Kallidromitis; Devin Guillory; Yu-Teng Li; Trevor Darrell; Fabio Galasso; |
343 | Fool Your (Vision And) Language Model with Embarrassingly Simple Permutations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we highlight a specific vulnerability in popular models, namely permutation sensitivity in multiple-choice question answering (MCQA). |
Yongshuo Zong; Tingyang Yu; Ruchika Chavhan; Bingchen Zhao; Timothy Hospedales; |
344 | Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current vision large language models (VLLMs) exhibit remarkable capabilities yet are prone to generate harmful content and are vulnerable to even the simplest jailbreaking attacks. Our initial analysis finds that this is due to the presence of harmful data during vision-language instruction fine-tuning, and that VLLM fine-tuning can cause forgetting of safety alignment previously learned by the underpinning LLM. |
Yongshuo Zong; Ondrej Bohdal; Tingyang Yu; Yongxin Yang; Timothy Hospedales; |
345 | Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these challenges, we first introduce LoCoV1, a 12-task benchmark constructed to measure long-context retrieval where chunking is not possible or not effective. We next present the M2-BERT retrieval encoder, an 80M parameter state-space encoder model built from the Monarch Mixer architecture, capable of scaling to documents up to 32K tokens long. |
Jon Saad-Falcon; Daniel Y Fu; Simran Arora; Neel Guha; Christopher Re; |
346 | Discovering Bias in Latent Space: An Unsupervised Debiasing Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This vulnerability often stems from the model’s preference or bias towards specific input characteristics, such as option position or superficial image features in multi-modal settings. We propose to rectify this bias directly in the model’s internal representation. |
Dyah Adila; Shuai Zhang; Boran Han; Bernie Wang; |
347 | S3GCL: Spectral, Swift, Spatial Graph Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, prevailing GCL methods confront two primary challenges: 1) They predominantly operate under homophily assumptions, focusing on low-frequency signals in node features while neglecting heterophilic edges that connect nodes with dissimilar features. 2) Their reliance on neighborhood aggregation for inference leads to scalability challenges and hinders deployment in real-time applications. In this paper, we introduce S3GCL, an innovative framework designed to tackle these challenges. |
Guancheng Wan; Yijun Tian; Wenke Huang; Nitesh V Chawla; Mang Ye; |
348 | Subgoal-based Demonstration Learning for Formal Theorem Proving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to improve the performance of LLMs in formal theorem proving by thoroughly examining the structure and organization of demonstrative in-context examples. |
Xueliang Zhao; Wenda Li; Lingpeng Kong; |
349 | Unsupervised Representation Learning of Brain Activity Via Bridging Voxel Activity and Functional Connectivity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing studies have focused on either (1) voxel-level activity, where only a single weight relating the voxel activity to the task (i.e., aggregation of voxel activity over a time window) is considered, missing their temporal dynamics, or (2) functional connectivity of the brain at the level of regions of interest, missing voxel-level activities. We bridge this gap and design BrainMixer, an unsupervised learning framework that effectively utilizes both functional connectivity and associated time series of voxels to learn voxel-level representation in an unsupervised manner. |
Ali Behrouz; Parsa Delavari; Farnoosh Hashemi; |
350 | Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). |
Zhenyu He; Guhao Feng; Shengjie Luo; Kai Yang; Liwei Wang; Jingjing Xu; Zhi Zhang; Hongxia Yang; Di He; |
351 | Non-Vacuous Generalization Bounds for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. |
Sanae Lotfi; Marc Anton Finzi; Yilun Kuang; Tim G. J. Rudner; Micah Goldblum; Andrew Gordon Wilson; |
352 | Behavior Generation with Latent Actions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Vector-Quantized Behavior Transformer (VQ-BeT), a versatile model for behavior generation that handles multimodal action prediction, conditional generation, and partial observations. |
Seungjae Lee; Yibin Wang; Haritheja Etukuru; H. Jin Kim; Nur Muhammad Mahi Shafiullah; Lerrel Pinto; |
353 | Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Hierarchical State-Space models (HiSS), a conceptually simple, new technique for continuous sequential prediction. |
Raunaq Bhirangi; Chenyu Wang; Venkatesh Pattabiraman; Carmel Majidi; Abhinav Gupta; Tess Hellebrekers; Lerrel Pinto; |
354 | Robust Stable Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we propose a training framework with modified SNN neurons that reduces the mean square of membrane potential perturbation, aiming to enhance the robustness of SNNs. |
Jianhao Ding; Zhiyu Pan; Yujia Liu; Zhaofei Yu; Tiejun Huang; |
355 | Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Lightning Attention, the first linear attention implementation that maintains a constant training speed for various sequence lengths under fixed memory consumption. |
Zhen Qin; Weigao Sun; Dong Li; Xuyang Shen; Weixuan Sun; Yiran Zhong; |
356 | Flextron: Many-in-One Flexible Large Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Flextron, a network architecture and post-training model optimization framework supporting flexible model deployment. |
Ruisi Cai; Saurav Muralidharan; Greg Heinrich; Hongxu Yin; Zhangyang Wang; Jan Kautz; Pavlo Molchanov; |
357 | Position: What Makes An Image Realistic? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we introduce the notion of a *universal critic*, which unlike adversarial critics does not require adversarial training. |
Lucas Theis; |
358 | X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods have limitations such as oversaturation and low-quality output. To address these challenges, we propose X-Oscar, a progressive framework for generating high-quality animatable avatars from text prompts. |
Yiwei Ma; Zhekai Lin; Jiayi Ji; Yijun Fan; Xiaoshuai Sun; Rongrong Ji; |
359 | OSN: Infinite Representations of Dynamic 3D Scenes from Monocular Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to learn all plausible 3D scene configurations that match the input video, instead of just inferring a specific one. |
Ziyang Song; Jinxi Li; Bo Yang; |
360 | Enabling Few-Shot Learning with PID Control: A Layer Adaptive Optimizer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by classical proportional-integral-derivative (PID) control theory, this study introduces a Layer-Adaptive PID (LA-PID) Optimizer, a MAML-based optimizer that employs efficient parameter optimization methods to dynamically adjust task-specific PID control gains at each layer of the network, conducting a first-principles analysis of optimal convergence conditions. |
Le Yu; Xinde Li; Pengfei Zhang; Zhentong Zhang; Fir Dunkin; |
361 | QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose Quest, a query-aware KV cache selection algorithm. |
Jiaming Tang; Yilong Zhao; Kan Zhu; Guangxuan Xiao; Baris Kasikci; Song Han; |
362 | FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present FightLadder, a real-time fighting game platform, to empower competitive MARL research. Along with the platform, we provide implementations of state-of-the-art MARL algorithms for competitive games, as well as a set of evaluation metrics to characterize the performance and exploitability of agents. |
Wenzhe Li; Zihan Ding; Seth Karten; Chi Jin; |
363 | Provable Representation with Efficient Planning for Partially Observable Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Partially Observable Markov Decision Processes (POMDPs), on the other hand, provide a general framework that allows for partial observability to be accounted for in *learning, exploration and planning*, but presents significant computational and statistical challenges. To address these difficulties, we develop a representation-based perspective that leads to a coherent framework and tractable algorithmic approach for practical reinforcement learning from partial observations. |
Hongming Zhang; Tongzheng Ren; Chenjun Xiao; Dale Schuurmans; Bo Dai; |
364 | Hypergraph-enhanced Dual Semi-supervised Graph Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study semi-supervised graph classification, which aims at accurately predicting the categories of graphs in scenarios with limited labeled graphs and abundant unlabeled graphs. |
Wei Ju; Zhengyang Mao; Siyu Yi; Yifang Qin; Yiyang Gu; Zhiping Xiao; Yifan Wang; Xiao Luo; Ming Zhang; |
365 | CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To pursue more efficient vision-language Transformers, this paper introduces Cross-Guided Ensemble of Tokens (CrossGET), a general acceleration framework for vision-language Transformers. |
Dachuan Shi; Chaofan Tao; Anyi Rao; Zhendong Yang; Chun Yuan; Jiaqi Wang; |
366 | FreeBind: Free Lunch in Unified Multimodal Space Via Knowledge Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose FreeBind, an idea that treats multimodal representation spaces as basic units, and freely augments a pre-trained unified space by integrating knowledge from extra expert spaces via “space bonds”. |
Zehan Wang; Ziang Zhang; Xize Cheng; Rongjie Huang; Luping Liu; Zhenhui Ye; Haifeng Huang; Yang Zhao; Tao Jin; Peng Gao; Zhou Zhao; |
367 | Decoding Compressed Trust: Scrutinizing The Trustworthiness of Efficient LLMs Under Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that quantization is currently a more effective approach than pruning in achieving efficiency and trustworthiness simultaneously. |
Junyuan Hong; Jinhao Duan; Chenhui Zhang; Zhangheng Li; Chulin Xie; Kelsey Lieberman; James Diffenderfer; Brian R. Bartoldson; Ajay Kumar Jaiswal; Kaidi Xu; Bhavya Kailkhura; Dan Hendrycks; Dawn Song; Zhangyang Wang; Bo Li; |
368 | Spider: A Unified Framework for Context-dependent Concept Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a unified model with a single set of parameters, Spider, which only needs to be trained once. |
Xiaoqi Zhao; Youwei Pang; Wei Ji; Baicheng Sheng; Jiaming Zuo; Lihe Zhang; Huchuan Lu; |
369 | How Free Is Parameter-Free Stochastic Optimization? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of parameter-free stochastic optimization, inquiring whether, and under what conditions, do fully parameter-free methods exist: these are methods that achieve convergence rates competitive with optimally tuned methods, without requiring significant knowledge of the true problem parameters. |
Amit Attia; Tomer Koren; |
370 | Tuning-Free Stochastic Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider in particular algorithms that can match optimally-tuned Stochastic Gradient Descent (SGD). |
Ahmed Khaled; Chi Jin; |
371 | On The Origins of Linear Representations in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: An array of recent works have argued that high-level semantic concepts are encoded linearly in the representation space of large language models. In this work, we study the origins of such linear representations. |
Yibo Jiang; Goutham Rajendran; Pradeep Kumar Ravikumar; Bryon Aragam; Victor Veitch; |
372 | Improving Transformers with Dynamically Composable Multi-Head Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Dynamically Composable Multi-Head Attention (DCMHA), a parameter and computation efficient attention architecture that tackles the shortcomings of MHA and increases the expressive power of the model by dynamically composing attention heads. |
Da Xiao; Qingye Meng; Shengping Li; Xingyuan Yuan; |
373 | See More Details: Efficient Image Super-Resolution By Experts Mining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we introduce SeemoRe, an efficient SR model employing expert mining. |
Eduard Zamfir; Zongwei Wu; Nancy Mehta; Yulun Zhang; Radu Timofte; |
374 | Weisfeiler Leman for Euclidean Equivariant Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on our results, we develop our WeLNet architecture, which sets new state-of-the-art results on the N-Body dynamics task and the GEOM-QM9 molecular conformation generation task. |
Snir Hordan; Tal Amir; Nadav Dym; |
375 | A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Currently, popular deep learning-based approaches rely primarily on generative models that yield exact sample likelihoods. This work introduces a method that lifts this restriction and opens the possibility to employ highly expressive latent variable models like diffusion models. |
Sebastian Sanokowski; Sepp Hochreiter; Sebastian Lehner; |
376 | Scaling Laws for Fine-Grained Mixture of Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we analyze their scaling properties, highlighting certain arbitrary assumptions present in the existing literature. |
Jan Ludziejewski; Jakub Krajewski; Kamil Adamczewski; Maciej Pióro; Michał Krutul; Szymon Antoniak; Kamil Ciebiera; Krystian Król; Tomasz Odrzygóźdź; Piotr Sankowski; Marek Cygan; Sebastian Jaszczur; |
377 | Provably Scalable Black-Box Variational Inference with Structured Variational Families Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore a theoretical middle ground *between* mean-field variational families and full-rank families: *structured* variational families. |
Joohwan Ko; Kyurae Kim; Woo Chang Kim; Jacob R. Gardner; |
378 | Towards Theoretical Understandings of Self-Consuming Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper tackles the emerging challenge of training generative models within a self-consuming loop, wherein successive generations of models are recursively trained on mixtures of real and synthetic data from previous generations. We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models, including parametric and non-parametric models. |
Shi Fu; Sen Zhang; Yingjie Wang; Xinmei Tian; Dacheng Tao; |
379 | Diagnosing The Compositional Knowledge of Vision Language Models from A Game-Theoretic View Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose evaluation methods from a novel game-theoretic view to assess the vulnerability of VLMs on different aspects of compositional understanding, e.g., relations and attributes. |
Jin Wang; Shichao Dong; Yapeng Zhu; Kelu Yao; Weidong Zhao; Chao Li; Ping Luo; |
380 | Representation Surgery for Multi-Task Model Merging Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a representation surgery solution called “Surgery” to reduce representation bias in the merged model. |
Enneng Yang; Li Shen; Zhenyi Wang; Guibing Guo; Xiaojun Chen; Xingwei Wang; Dacheng Tao; |
381 | Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conversely, First-Order Model-Based Reinforcement Learning (FO-MBRL) methods employing differentiable simulation provide gradients with reduced variance but are susceptible to sampling error in scenarios involving stiff dynamics, such as physical contact. This paper investigates the source of this error and introduces Adaptive Horizon Actor-Critic (AHAC), an FO-MBRL algorithm that reduces gradient error by adapting the model-based horizon to avoid stiff dynamics. |
Ignat Georgiev; Krishnan Srinivasan; Jie Xu; Eric Heiden; Animesh Garg; |
382 | Position: Do Pretrained Transformers Learn In-Context By Gradient Descent? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct comprehensive empirical analyses on language models pre-trained on natural data (LLaMa-7B). |
Lingfeng Shen; Aayush Mishra; Daniel Khashabi; |
383 | TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Diffusion models have achieved notable success in image generation, but they remain highly vulnerable to backdoor attacks, which compromise their integrity by producing specific undesirable outputs when presented with a pre-defined trigger. In this paper, we investigate how to protect diffusion models from this dangerous threat. |
Yichuan Mo; Hui Huang; Mingjie Li; Ang Li; Yisen Wang; |
384 | Unified Training of Universal Time Series Forecasting Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, constructing such a model poses unique challenges specific to time series data: (i) cross-frequency learning, (ii) accommodating an arbitrary number of variates for multivariate time series, and (iii) addressing the varying distributional properties inherent in large-scale data. To address these challenges, we present novel enhancements to the conventional time series Transformer architecture, resulting in our proposed **M**asked Enc**o**der-based Un**i**ve**r**s**a**l T**i**me Series Forecasting Transformer (**Moirai**). |
Gerald Woo; Chenghao Liu; Akshat Kumar; Caiming Xiong; Silvio Savarese; Doyen Sahoo; |
385 | How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While numerous empirical studies have aimed at understanding this phenomenon, a rigorous theoretical framework to quantify it is still missing. In this paper, we consider spurious features that are uncorrelated with the learning task, and we provide a precise characterization of how they are memorized via two separate terms: _(i)_ the _stability_ of the model with respect to individual training samples, and _(ii)_ the _feature alignment_ between the spurious pattern and the full sample. |
Simone Bombari; Marco Mondelli; |
386 | Towards Understanding The Word Sensitivity of Attention Layers: A Study Via Random Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that attention layers enjoy high WS, namely, there exists a vector in the space of embeddings that largely perturbs the random attention features map. |
Simone Bombari; Marco Mondelli; |
387 | An Information-Theoretic Analysis of In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce new information-theoretic tools that lead to a concise yet general decomposition of error for a Bayes optimal predictor into two components: meta-learning error and intra-task error. |
Hong Jun Jeon; Jason D. Lee; Qi Lei; Benjamin Van Roy; |
388 | Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an opposing operation simplification strategy to increase the diversity of the population. |
Peijie Dong; Lujun Li; Zhenheng Tang; Xiang Liu; Xinglin Pan; Qiang Wang; Xiaowen Chu; |
389 | Transforming and Combining Rewards for Aligning Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A common approach for aligning language models to human preferences is to first learn a reward model from preference data, and then use this reward model to update the language model. We study two closely related problems that arise in this approach. |
Zihao Wang; Chirag Nagpal; Jonathan Berant; Jacob Eisenstein; Alexander Nicholas D’Amour; Sanmi Koyejo; Victor Veitch; |
390 | The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to optimize a Beta-weighting loss with an entropy minimization regularizer during AT to improve CP-efficiency, where the Beta-weighting loss is shown to be an upper bound of PSS at the population level by our theoretical analysis. |
Ziquan Liu; Yufei Cui; Yan Yan; Yi Xu; Xiangyang Ji; Xue Liu; Antoni B. Chan; |
391 | Second-Order Uncertainty Quantification: A Distance-Based Approach Highlight: In the past couple of years, various approaches to representing and quantifying different types of predictive uncertainty in machine learning, notably in the setting of classification, have been proposed on the basis of second-order probability distributions, i.e., predictions in the form of distributions on probability distributions. In light of criticisms of such measures, we propose a set of formal criteria that meaningful uncertainty measures for predictive uncertainty based on second-order distributions should obey. |
Yusuf Sale; Viktor Bengs; Michele Caprio; Eyke Hüllermeier; |
392 | Position: Towards Implicit Prompt For Text-To-Image Models Highlight: We present a benchmark named ImplicitBench and conduct an investigation on the performance and impacts of implicit prompts with popular T2I models. |
Yue Yang; Yuqi Lin; Hong Liu; Wenqi Shao; Runjian Chen; Hailong Shang; Yu Wang; Yu Qiao; Kaipeng Zhang; Ping Luo; |
393 | Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design Highlight: In this paper, we extend the Vendi scores—a family of interpretable similarity-based diversity metrics—to account for quality. |
Quan Nguyen; Adji Bousso Dieng; |
394 | Scalable and Flexible Causal Discovery with An Efficient Test for Adjacency Highlight: Here we build a scalable and flexible method to evaluate if two variables are adjacent in a causal graph, the Differentiable Adjacency Test (DAT). |
Alan Nawzad Amin; Andrew Gordon Wilson; |
395 | Learning to Play Atari in A World of Tokens Highlight: In this work, we introduce discrete abstract representations for transformer-based learning (DART), a sample-efficient method utilizing discrete representations for modeling both the world and learning behavior. |
Pranav Agarwal; Sheldon Andrews; Samira Ebrahimi Kahou; |
396 | Online Learning and Information Exponents: The Importance of Batch Size & Time/Complexity Tradeoffs Highlight: We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates. |
Luca Arnaboldi; Yatin Dandi; Florent Krzakala; Bruno Loureiro; Luca Pesce; Ludovic Stephan; |
397 | Comparing Graph Transformers Via Positional Encodings Highlight: A priori, it is unclear which method is better for maximizing the power of the resulting graph transformer. In this paper, we aim to understand the relationship between these different types of positional encodings. |
Mitchell Black; Zhengchao Wan; Gal Mishne; Amir Nayyeri; Yusu Wang; |
398 | Bayesian Knowledge Distillation: A Bayesian Perspective of Distillation with Uncertainty Quantification Highlight: In particular, we develop an innovative method named Bayesian Knowledge Distillation (BKD) to provide a transparent interpretation of the working mechanism of KD, and a suite of Bayesian inference tools for the uncertainty quantification of the student model. |
Luyang Fang; Yongkai Chen; Wenxuan Zhong; Ping Ma; |
399 | HarmonyDream: Task Harmonization Inside World Models Highlight: In this paper, through a dedicated empirical investigation, we gain a deeper understanding of the role each task plays in world models and uncover the overlooked potential of sample-efficient MBRL by mitigating the domination of either observation or reward modeling. |
Haoyu Ma; Jialong Wu; Ningya Feng; Chenjun Xiao; Dong Li; Jianye Hao; Jianmin Wang; Mingsheng Long; |
400 | Multimodal Prototyping for Cancer Survival Prediction Highlight: However, this process generates many tokens, which leads to high memory requirements for computing attention and complicates post-hoc interpretability analyses. Instead, we hypothesize that we can: (1) effectively summarize the morphological content of a WSI by condensing its constituting tokens using morphological prototypes, achieving more than $300\times$ compression; and (2) accurately characterize cellular functions by encoding the transcriptomic profile with biological pathway prototypes, all in an unsupervised fashion. |
Andrew H. Song; Richard J. Chen; Guillaume Jaume; Anurag Jayant Vaidya; Alexander Baras; Faisal Mahmood; |
401 | Nash Learning from Human Feedback Highlight: In this study, we introduce an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. |
Remi Munos; Michal Valko; Daniele Calandriello; Mohammad Gheshlaghi Azar; Mark Rowland; Zhaohan Daniel Guo; Yunhao Tang; Matthieu Geist; Thomas Mesnard; Côme Fiegel; Andrea Michi; Marco Selvi; Sertan Girgin; Nikola Momchev; Olivier Bachem; Daniel J Mankowitz; Doina Precup; Bilal Piot; |
402 | Graph2Tac: Online Representation Learning of Formal Math Concepts Highlight: Furthermore, lemmas with close proximity regularly exhibit similar proof structures. We show that this _locality_ property can be exploited through online learning techniques to obtain solving agents that far surpass offline learners when asked to prove theorems in an unseen mathematical setting. |
Lasse Blaauwbroek; Mirek Olšák; Jason Rute; Fidel Ivan Schaposnik Massolo; Jelle Piepenbrock; Vasily Pestun; |
403 | CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay Highlight: In this paper, we approach the ARC as a programming-by-examples problem, and introduce a novel and scalable method for language model self-improvement called Code Iteration (CodeIt). |
Natasha Butt; Blazej Manczak; Auke Wiggers; Corrado Rainone; David W. Zhang; Michaël Defferrard; Taco Cohen; |
404 | Generalized Preference Optimization: A Unified Approach to Offline Alignment Highlight: We propose generalized preference optimization (GPO), a family of offline losses parameterized by a general class of convex functions. |
Yunhao Tang; Zhaohan Daniel Guo; Zeyu Zheng; Daniele Calandriello; Remi Munos; Mark Rowland; Pierre Harvey Richemond; Michal Valko; Bernardo Avila Pires; Bilal Piot; |
405 | ConvNet Vs Transformer, Supervised Vs CLIP: Beyond ImageNet Accuracy Highlight: In this work, we conduct an in-depth comparative analysis of model behaviors beyond ImageNet accuracy, for both ConvNet and Vision Transformer architectures, each across supervised and CLIP training paradigms. |
Kirill Vishniakov; Zhiqiang Shen; Zhuang Liu; |
406 | Learning to Continually Learn with The Bayesian Principle Highlight: In this work, we adopt the meta-learning paradigm to combine the strong representational power of neural networks and simple statistical models’ robustness to forgetting. |
Soochan Lee; Hyeonseong Jeon; Jaehyeon Son; Gunhee Kim; |
407 | Near-Linear Time Approximation Algorithms for K-means with Outliers Highlight: To address the issue of aspect ratio dependency on the running time, we propose sampling-based algorithms with almost linear running time in the data size, where a crucial component of our approach is an algorithm called Fast-Sampling. |
Junyu Huang; Qilong Feng; Ziyun Huang; Jinhui Xu; Jianxin Wang; |
408 | ViP: A Differentially Private Foundation Model for Computer Vision Highlight: In this work, we propose as a mitigation measure a recipe to train foundation vision models via self-supervised learning with differential privacy (DP) guarantee. |
Yaodong Yu; Maziar Sanjabi; Yi Ma; Kamalika Chaudhuri; Chuan Guo; |
409 | Total Variation Floodgate for Variable Importance Inference in Classification Highlight: Focusing on classification problems, we define the expected total variation (ETV), which is an intuitive and deterministic measure of variable importance that does not rely on any model assumption. We then introduce algorithms for statistical inference on the ETV under design-based/model-X assumptions. |
Wenshuo Wang; Lucas Janson; Lihua Lei; Aaditya Ramdas; |
410 | AlphaZero-Like Tree-Search Can Guide Large Language Model Decoding and Training Highlight: As a result, these methods cannot benefit from in-domain training and rely only on the pretraining process — they will not work in domains where the pre-trained LLM does not have enough knowledge to serve as an effective value function or in domains that require long-horizon planning. To address these limitations, we present an AlphaZero-like tree-search learning framework for LLMs (termed TS-LLM), systematically illustrating how tree-search with a learned value function can guide LLM decoding. |
Ziyu Wan; Xidong Feng; Muning Wen; Stephen Marcus McAleer; Ying Wen; Weinan Zhang; Jun Wang; |
411 | Prototypical Transformer As Unified Motion Learners Highlight: In this work, we introduce the Prototypical Transformer (ProtoFormer), a general and unified framework that approaches various motion tasks from a prototype perspective. |
Cheng Han; Yawen Lu; Guohao Sun; James Chenhao Liang; Zhiwen Cao; Qifan Wang; Qiang Guan; Sohail Dianat; Raghuveer Rao; Tong Geng; Zhiqiang Tao; Dongfang Liu; |
412 | Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities Highlight: In this paper, we propose Audio Flamingo, a novel audio language model with 1) strong audio understanding abilities, 2) the ability to quickly adapt to unseen tasks via in-context learning and retrieval, and 3) strong multi-turn dialogue abilities. |
Zhifeng Kong; Arushi Goel; Rohan Badlani; Wei Ping; Rafael Valle; Bryan Catanzaro; |
413 | Long Is More for Alignment: A Simple But Tough-to-Beat Baseline for Instruction Fine-Tuning Highlight: LIMA (NeurIPS 2023) and AlpaGasus (ICLR 2024) are state-of-the-art methods for selecting such high-quality examples, either via manual curation or using GPT-3.5-Turbo as a quality scorer. We show that the extremely simple baseline of selecting the 1,000 instructions with the longest responses—which intuitively contain more learnable information and are harder to overfit—from standard datasets can consistently outperform these sophisticated methods according to GPT-4 and PaLM-2 as judges, while remaining competitive on the Open LLM benchmarks that test factual knowledge. |
Hao Zhao; Maksym Andriushchenko; Francesco Croce; Nicolas Flammarion; |
414 | TravelPlanner: A Benchmark for Real-World Planning with Language Agents Highlight: Are these language agents capable of planning in more complex settings that are out of the reach of prior AI agents? To advance this investigation, we propose TravelPlanner, a new planning benchmark that focuses on travel planning, a common real-world planning scenario. |
Jian Xie; Kai Zhang; Jiangjie Chen; Tinghui Zhu; Renze Lou; Yuandong Tian; Yanghua Xiao; Yu Su; |
415 | Training-Free Long-Context Scaling of Large Language Models Highlight: Given the expensive overhead of finetuning large-scale models with longer sequences, we propose a training-free approach named Dual Chunk Attention (DCA), which enables Llama2 70B to support context windows of up to 100k tokens. |
Chenxin An; Fei Huang; Jun Zhang; Shansan Gong; Xipeng Qiu; Chang Zhou; Lingpeng Kong; |
416 | Sliced Wasserstein with Random-Path Projecting Directions Highlight: In this work, we propose an optimization-free slicing distribution that provides a fast sampling for the Monte Carlo estimation of expectation. |
Khai Nguyen; Shujian Zhang; Tam Le; Nhat Ho; |
417 | MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models Highlight: Moreover, these multi-agent approaches fail to provide a final, single model for efficient inference. To address this, we introduce MAGDi, a new method for structured distillation of the reasoning interactions between multiple LLMs into smaller LMs. |
Justin Chen; Swarnadeep Saha; Elias Stengel-Eskin; Mohit Bansal; |
418 | Self-Consistency Training for Density-Functional-Theory Hamiltonian Prediction Highlight: In this work, we highlight that Hamiltonian prediction possesses a self-consistency principle, based on which we propose self-consistency training, an exact training method that does not require labeled data. |
He Zhang; Chang Liu; Zun Wang; Xinran Wei; Siyuan Liu; Nanning Zheng; Bin Shao; Tie-Yan Liu; |
419 | How Do Large Language Models Navigate Conflicts Between Honesty and Helpfulness? Highlight: How do large language models (LLMs) handle such nuanced trade-offs? To address this question, we use psychological models and experiments designed to characterize human behavior to analyze LLMs. |
Ryan Liu; Theodore Sumers; Ishita Dasgupta; Thomas L. Griffiths; |
420 | Knowledge-aware Reinforced Language Models for Protein Directed Evolution Highlight: In this paper, we introduce a novel Knowledge-aware Reinforced Language Model (KnowRLM) for MLDE. |
Yuhao Wang; Qiang Zhang; Ming Qin; Xiang Zhuang; Xiaotong Li; Zhichen Gong; Zeyuan Wang; Yu Zhao; Jianhua Yao; Keyan Ding; Huajun Chen; |
421 | AI Control: Improving Safety Despite Intentional Subversion Highlight: To do so, safety measures either aim at making LLMs try to avoid harmful outcomes or aim at preventing LLMs from causing harmful outcomes, even if they try to cause them. In this paper, we focus on this second layer of defense. |
Ryan Greenblatt; Buck Shlegeris; Kshitij Sachan; Fabien Roger; |
422 | Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining Highlight: In this paper, we establish the first theoretical comparisons between two leading generative SSL paradigms: autoregressive SSL and masked SSL. |
Qi Zhang; Tianqi Du; Haotian Huang; Yifei Wang; Yisen Wang; |
423 | Active Statistical Inference Highlight: Inspired by the concept of active learning, we propose active inference—a methodology for statistical inference with machine-learning-assisted data collection. |
Tijana Zrnic; Emmanuel Candes; |
424 | Accelerating Iterative Retrieval-augmented Language Model Serving with Speculation Highlight: This paper introduces RaLMSpec, a framework that accelerates iterative retrieval-augmented language model (RaLM) serving with *speculative retrieval* and *batched verification*. |
Zhihao Zhang; Alan Zhu; Lijie Yang; Yihua Xu; Lanting Li; Phitchaya Mangpo Phothilimthana; Zhihao Jia; |
425 | The Perception-Robustness Tradeoff in Deterministic Image Restoration Highlight: We study the behavior of deterministic methods for solving inverse problems in imaging. |
Guy Ohayon; Tomer Michaeli; Michael Elad; |
426 | Generalization Analysis of Stochastic Weight Averaging with General Sampling Highlight: To address the theoretical challenges, we adopt mathematical induction to find a recursive representation that bounds the gradient at each step. Based on this, we establish stability bounds supporting sampling with and without replacement in the non-convex setting. |
Peng Wang; Li Shen; Zerui Tao; Shuaida He; Dacheng Tao; |
427 | Scalable Multiple Kernel Clustering: Learning Clustering Structure from Expectation Highlight: In this paper, we derive an upper bound of the difference between a kernel matrix and its expectation under a mild assumption. |
Weixuan Liang; En Zhu; Shengju Yu; Huiying Xu; Xinzhong Zhu; Xinwang Liu; |
428 | DFA-RAG: Conversational Semantic Router for Large Language Model with Definite Finite Automaton Highlight: This paper introduces the retrieval-augmented large language model with Definite Finite Automaton (DFA-RAG), a novel framework designed to enhance the capabilities of conversational agents using large language models (LLMs). |
Yiyou Sun; Junjie Hu; Wei Cheng; Haifeng Chen; |
429 | Equivariance Via Minimal Frame Averaging for More Symmetries and Efficiency Highlight: Here, we propose Minimal Frame Averaging (MFA), a mathematical framework for constructing provably minimal frames that are exactly equivariant. |
Yuchao Lin; Jacob Helwig; Shurui Gui; Shuiwang Ji; |
430 | A Space Group Symmetry Informed Network for O(3) Equivariant Crystal Tensor Prediction Highlight: To this end, we propose a General Materials Tensor Network (GMTNet), which is carefully designed to satisfy the required symmetries. |
Keqiang Yan; Alexandra Saxton; Xiaofeng Qian; Xiaoning Qian; Shuiwang Ji; |
431 | Borda Regret Minimization for Generalized Linear Dueling Bandits Highlight: In this paper, we study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score while minimizing the cumulative regret. |
Yue Wu; Tao Jin; Qiwei Di; Hao Lou; Farzad Farnoud; Quanquan Gu; |
432 | SurfPro: Functional Protein Design Based on Continuous Surface Highlight: We are motivated by a chemical intuition that both geometric structure and biochemical properties are critical to a protein’s function. In this paper, we propose SurfPro, a new method to generate functional proteins given a desired surface and its associated biochemical properties. |
Zhenqiao Song; Tinglin Huang; Lei Li; Wengong Jin; |
433 | The Pitfalls of Next-Token Prediction Highlight: Can a mere next-token predictor faithfully model human thinking? Our work is aimed at crystallizing this intuitive concern, which is currently fragmented in the literature. |
Gregor Bachmann; Vaishnavh Nagarajan; |
434 | What’s The Score? Automated Denoising Score Matching for Nonlinear Diffusions Highlight: In this work, we introduce a family of tractable denoising score matching objectives, called local-DSM, built using local increments of the diffusion process. |
Raghav Singhal; Mark Goldstein; Rajesh Ranganath; |
435 | Break The Sequential Dependency of LLM Inference Using Lookahead Decoding Highlight: In this paper, we introduce Lookahead decoding, an exact, parallel decoding algorithm that accelerates LLM decoding without needing auxiliary models or data stores. |
Yichao Fu; Peter Bailis; Ion Stoica; Hao Zhang; |
436 | When and How Does In-Distribution Label Help Out-of-Distribution Detection? Highlight: We employ a graph-theoretic approach, rigorously analyzing the separability of ID data from OOD data in a closed-form manner. |
Xuefeng Du; Yiyou Sun; Yixuan Li; |
437 | On The Embedding Collapse When Scaling Up Recommendation Models Highlight: In this paper, we identify the embedding collapse phenomenon as the inhibition of scalability, wherein the embedding matrix tends to occupy a low-dimensional subspace. |
Xingzhuo Guo; Junwei Pan; Ximei Wang; Baixu Chen; Jie Jiang; Mingsheng Long; |
438 | LLaGA: Large Language and Graph Assistant Highlight: To this end, we introduce the **L**arge **L**anguage **a**nd **G**raph **A**ssistant (**LLaGA**), an innovative model that effectively integrates LLM capabilities to handle the complexities of graph-structured data. |
Runjin Chen; Tong Zhao; Ajay Kumar Jaiswal; Neil Shah; Zhangyang Wang; |
439 | PID: Prompt-Independent Data Protection Against Latent Diffusion Models Highlight: Furthermore, considering the visual encoder’s independence from textual prompts, we delve into the visual encoder and thoroughly investigate how manipulating the visual encoder affects the few-shot fine-tuning process of LDMs. Drawing on these insights, we propose a simple yet effective method called Prompt-Independent Defense (PID) to safeguard privacy against LDMs. |
Ang Li; Yichuan Mo; Mingjie Li; Yisen Wang; |
440 | An Empirical Study of Realized GNN Expressiveness Highlight: Previous research has attempted to use datasets for measurement, but faces problems with difficulty (any model surpassing 1-WL has nearly 100% accuracy), granularity (models tend to be either 100% correct or near random guess), and scale (only several essentially different graphs involved). To address these limitations, we study the realized expressive power that a practical model instance can achieve using a novel expressiveness dataset, BREC, which poses greater difficulty (with up to 4-WL-indistinguishable graphs), finer granularity (enabling comparison of models between 1-WL and 3-WL), and a larger scale (consisting of 800 1-WL-indistinguishable graphs that are non-isomorphic to each other). |
Yanbo Wang; Muhan Zhang; |
441 | WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? Highlight: To this end, we propose WorkArena, a remote-hosted benchmark of 33 tasks based on the widely-used ServiceNow platform. |
Alexandre Drouin; Maxime Gasse; Massimo Caccia; Issam H. Laradji; Manuel Del Verme; Tom Marty; David Vazquez; Nicolas Chapados; Alexandre Lacoste; |
442 | Averaging $n$-step Returns Reduces Variance in Reinforcement Learning Highlight: Because general compound returns can be expensive to implement, we introduce two-bootstrap returns which reduce variance while remaining efficient, even when using minibatched experience replay. |
Brett Daley; Martha White; Marlos C. Machado; |
443 | NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models Highlight: Motivated by this, we propose a TTS system with novel factorized diffusion models to generate natural speech in a zero-shot way. Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model, which generates attributes in each subspace following its corresponding prompt. |
Zeqian Ju; Yuancheng Wang; Kai Shen; Xu Tan; Detai Xin; Dongchao Yang; Eric Liu; Yichong Leng; Kaitao Song; Siliang Tang; Zhizheng Wu; Tao Qin; Xiangyang Li; Wei Ye; Shikun Zhang; Jiang Bian; Lei He; Jinyu Li; Sheng Zhao; |
444 | PolySketchFormer: Fast Transformers Via Sketching Polynomial Kernels Highlight: Recent theoretical results indicate the intractability of sub-quadratic softmax attention approximation under reasonable complexity assumptions. This paper addresses this challenge by first demonstrating that polynomial attention with high degree can effectively replace softmax without sacrificing model quality. |
Praneeth Kacham; Vahab Mirrokni; Peilin Zhong; |
445 | Few-Shot Character Understanding in Movies As An Assessment to Meta-Learning of Theory-of-Mind Highlight: Our extensive human study verifies that humans are capable of solving our problem by inferring characters’ mental states based on their previously seen movies. |
Mo Yu; Qiujing Wang; Shunchi Zhang; Yisi Sang; Kangsheng Pu; Zekai Wei; Han Wang; Liyan Xu; Jing Li; Yue Yu; Jie Zhou; |
446 | Reinforcement Learning Within Tree Search for Fast Macro Placement Highlight: However, existing RL-based techniques are hindered by their low sample efficiency, requiring numerous online rollouts or substantial offline expert data to achieve bootstrap, which are often impractical in industrial scenarios. To address this challenge, we propose a novel sample-efficient framework, namely **EfficientPlace**, for fast macro placement. |
Zijie Geng; Jie Wang; Ziyan Liu; Siyuan Xu; Zhentao Tang; Mingxuan Yuan; Jianye Hao; Yongdong Zhang; Feng Wu; |
447 | Bias of Stochastic Gradient Descent or The Architecture: Disentangling The Effects of Overparameterization of Neural Networks Highlight: The goal of this paper is to disentangle the factors that influence generalization stemming from optimization and architectural choices by studying *random* and *SGD-optimized* networks that achieve zero training error. |
Amit Peleg; Matthias Hein; |
448 | Enhancing Trajectory Prediction Through Self-Supervised Waypoint Distortion Prediction Highlight: To this end, we propose a novel approach called SSWDP (Self-Supervised Waypoint Distortion Prediction). |
Pranav Singh Chib; Pravendra Singh; |
449 | MS-TIP: Imputation Aware Pedestrian Trajectory Prediction Highlight: In this work, we propose the MultiScale hypergraph for Trajectory Imputation and Prediction (MS-TIP), a novel approach that simultaneously addresses the imputation of missing observations and the prediction of future trajectories. |
Pranav Singh Chib; Achintya Nath; Paritosh Kabra; Ishu Gupta; Pravendra Singh; |
450 | Using Uncertainty Quantification to Characterize and Improve Out-of-Domain Learning for PDEs Highlight: To address this limitation, we show that ensembling several NOs can identify high-error regions and provide good uncertainty estimates that are well-correlated with prediction errors. Based on this, we propose a cost-effective alternative, DiverseNO, that mimics the properties of the ensemble by encouraging diverse predictions from its multiple heads in the last feed-forward layer. |
S Chandra Mouli; Danielle C. Maddix; Shima Alizadeh; Gaurav Gupta; Andrew Stuart; Michael W. Mahoney; Bernie Wang; |
451 | ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models Highlight: In this paper, we introduce ConTextual, a novel dataset featuring human-crafted instructions that require context-sensitive reasoning for text-rich images. |
Rohan Wadhawan; Hritik Bansal; Kai-Wei Chang; Nanyun Peng; |
452 | Stochastic Positional Embeddings Improve Masked Image Modeling Highlight: In this work, we propose to incorporate location uncertainty into MIM by using stochastic positional embeddings (StoP). |
Amir Bar; Florian Bordes; Assaf Shocher; Mido Assran; Pascal Vincent; Nicolas Ballas; Trevor Darrell; Amir Globerson; Yann LeCun; |
453 | Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines Highlight: In this paper we focus on Generative Masked Language Models (GMLMs), a non-autoregressive paradigm in which we train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model. |
Yuchen Li; Alexandre Kirchmeyer; Aashay Mehta; Yilong Qin; Boris Dadachev; Kishore Papineni; Sanjiv Kumar; Andrej Risteski; |
454 | Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models By Finding Problematic Prompts Highlight: In this work, we propose **Prompting4Debugging (P4D)** as a debugging and red-teaming tool that automatically finds problematic prompts for diffusion models to test the reliability of a deployed safety mechanism. |
Zhi-Yi Chin; Chieh Ming Jiang; Ching-Chun Huang; Pin-Yu Chen; Wei-Chen Chiu; |
455 | Provable Interactive Learning with Hindsight Instruction Feedback Highlight: Next, we study a specialized setting where the underlying instruction-response distribution can be decomposed as a low-rank matrix. We introduce an algorithm called LORIL for this setting and show that it is a no-regret algorithm whose regret scales with $\sqrt{T}$ and depends on the _intrinsic rank_ but not on the agent’s response space. |
Dipendra Misra; Aldo Pacchiano; Robert E. Schapire; |
456 | Learning from Streaming Data When Users Choose Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The service providers’ models influence which service the user will choose at the next time step, and the user’s choice, in return, influences the model update, leading to a feedback loop. In this paper, we formalize the above dynamics and develop a simple and efficient decentralized algorithm to locally minimize the overall user loss. |
Jinyan Su; Sarah Dean; |
457 | Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We further identify two underlying causes of this inefficiency: the redundant inversion of noisy backgrounds and the unintended inversion of spurious correlations—a phenomenon we term “hallucination” in model inversion. To address these limitations, we propose a novel sparse model inversion strategy, as a plug-and-play extension to speed up existing dense inversion methods with no need for modifying their original loss functions. |
Zixuan Hu; Yongxian Wei; Li Shen; Zhenyi Wang; Lei Li; Chun Yuan; Dacheng Tao; |
458 | CaM: Cache Merging for Memory-efficient LLMs Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This perturbation escalates with the compression ratio, which can precipitate a marked deterioration in LLM inference performance. This paper introduces Cache Merging (CaM) as a solution to mitigate this challenge. |
Yuxin Zhang; Yuxuan Du; Gen Luo; Yunshan Zhong; Zhenyu Zhang; Shiwei Liu; Rongrong Ji; |
459 | A Doubly Recursive Stochastic Compositional Gradient Descent Method for Federated Multi-Level Compositional Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the convergence rate of existing federated two-level compositional optimization learning algorithms fails to achieve linear speedup with respect to the number of workers under heterogeneous settings. After identifying the reason for this failure, we develop a novel federated stochastic multi-level compositional optimization algorithm by introducing a novel Jacobian-vector product estimator. |
Hongchang Gao; |
460 | Pairwise Alignment Improves Graph Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel, theoretically principled method, Pairwise Alignment (Pair-Align) to counter graph structure shift by mitigating conditional structure shift (CSS) and label shift (LS). |
Shikun Liu; Deyu Zou; Han Zhao; Pan Li; |
461 | Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Jetfire, an efficient and accurate INT8 training method specific to transformers. |
Haocheng Xi; Yuxiang Chen; Kang Zhao; KAI JUN TEH; Jianfei Chen; Jun Zhu; |
462 | Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we confront the reward overoptimization problem in diffusion model alignment through the lenses of both inductive and primacy biases. |
Ziyi Zhang; Sen Zhang; Yibing Zhan; Yong Luo; Yonggang Wen; Dacheng Tao; |
463 | Foundations of Testing for Finite-Sample Causal Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the canonical setup in theoretical causal discovery literature, where one assumes causal sufficiency and access to the graph skeleton. |
Tom Yan; Ziyu Xu; Zachary Chase Lipton; |
464 | Beyond The Federation: Topology-aware Federated Learning for Generalization to Unseen Clients Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve OOF-resiliency in a scalable manner, we propose Topology-aware Federated Learning (TFL) that leverages client topology – a graph representing client relationships – to effectively train robust models against OOF data. |
Mengmeng Ma; Tang Li; Xi Peng; |
465 | Prompt-guided Precise Audio Editing with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel approach, referred to as **PPAE**, which serves as a general module for diffusion models and enables precise audio editing. |
Manjie Xu; Chenxing Li; Duzhen Zhang; Dan Su; Wei Liang; Dong Yu; |
466 | Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods replay offline data directly in the online phase, resulting in a significant data distribution shift that causes inefficiency in online fine-tuning. To address this issue, we introduce an innovative approach, **E**nergy-guided **DI**ffusion **S**ampling (EDIS), which utilizes a diffusion model to extract prior knowledge from the offline dataset and employs energy functions to distill this knowledge for enhanced data generation in the online phase. |
Xu-Hui Liu; Tian-Shuo Liu; Shengyi Jiang; Ruifeng Chen; Zhilong Zhang; Xinwei Chen; Yang Yu; |
467 | Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we suggest investigating internal activations and quantifying LLM’s truthfulness using the local intrinsic dimension (LID) of model activations. |
Fan Yin; Jayanth Srinivasa; Kai-Wei Chang; |
468 | KernelWarehouse: Rethinking The Design of Dynamic Convolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, no prior research has explored the setting $n > 100$ (an order of magnitude larger than the typical setting $n < 10$) to push forward the performance boundary of dynamic convolution while enjoying parameter efficiency. To fill this gap, in this paper, we propose KernelWarehouse, a more general form of dynamic convolution, which redefines the basic concepts of “kernels”, “assembling kernels” and “attention function” through the lens of exploiting convolutional parameter dependencies within the same layer and across neighboring layers of a ConvNet. |
Chao Li; Anbang Yao; |
469 | Translation Equivariant Transformer Neural Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new family of TNPs that incorporate *translation equivariance*. |
Matthew Ashman; Cristiana Diaconu; Junhyuck Kim; Lakee Sivaraya; Stratis Markou; James Requeima; Wessel P Bruinsma; Richard E. Turner; |
470 | Sequential Disentanglement By Extracting Static Information From A Single Sequence Element Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose a novel and simple architecture that mitigates information leakage by offering a simple and effective subtraction inductive bias while conditioning on a single sample. |
Nimrod Berman; Ilan Naiman; Idan Arbiv; Gal Fadlon; Omri Azencot; |
471 | Improving Sharpness-Aware Minimization By Lookahead Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent studies have shown that SAM may suffer from convergence instability and oscillate around saddle points, resulting in slow convergence and inferior performance. To address this problem, we propose the use of a lookahead mechanism to gather more information about the landscape by looking further ahead, and thus find a better trajectory to converge. |
Runsheng Yu; Youzhi Zhang; James Kwok; |
472 | Causal Bandits: The Pareto Optimal Frontier of Adaptivity, A Reduction to Linear Bandits, and Limitations Around Unknown Marginals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the problem of adapting to the presence or absence of causal structure in multi-armed bandit problems. |
Ziyi Liu; Idan Attias; Daniel M. Roy; |
473 | Differentially Private Decentralized Learning with Random Walks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we characterize the privacy guarantees of decentralized learning with random walk algorithms, where a model is updated by traveling from one node to another along the edges of a communication graph. |
Edwige Cyffers; Aurélien Bellet; Jalaj Upadhyay; |
474 | How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study how well LLMs can negotiate with each other. |
Federico Bianchi; Patrick John Chia; Mert Yuksekgonul; Jacopo Tagliabue; Dan Jurafsky; James Zou; |
475 | Absolute Policy Optimization: Enhancing Lower Probability Bound of Performance with High Confidence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, contemporary state-of-the-art algorithms within this category primarily emphasize improvement in expected performance, lacking the ability to control over the worst-case performance outcomes. To address this limitation, we introduce a novel objective function, optimizing which leads to guaranteed monotonic improvement in the lower probability bound of performance with high confidence. |
Weiye Zhao; Feihan Li; Yifan Sun; Rui Chen; Tianhao Wei; Changliu Liu; |
476 | Junk DNA Hypothesis: Pruning Small Pre-Trained Weights *Irreversibly* and *Monotonically* Impairs “Difficult” Downstream Tasks in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present *Junk DNA Hypothesis* by adopting a novel *task-centric* angle for the pre-trained weights of large language models (LLMs). |
Lu Yin; AJAY KUMAR JAISWAL; Shiwei Liu; Souvik Kundu; Zhangyang Wang; |
477 | Navigating Scaling Laws: Compute Optimality in Adaptive Model Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This leads to the notion of a ‘compute-optimal’ model, i.e. a model that allocates a given level of compute during training optimally to maximize performance. In this work, we extend the concept of optimality by allowing for an ‘adaptive’ model, i.e. a model that can change its shape during training. |
Sotiris Anagnostidis; Gregor Bachmann; Imanol Schlag; Thomas Hofmann; |
478 | Meta-Learners for Partially-Identified Treatment Effects Across Multiple Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we focus on the widespread setting where the observational data come from multiple environments, such as different hospitals, physicians, or countries. |
Jonas Schweisthal; Dennis Frauen; Mihaela van der Schaar; Stefan Feuerriegel; |
479 | Re-Dock: Towards Flexible and Realistic Molecular Docking with Diffusion Bridge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While deep learning has shown promise, existing methods often depend on holo-protein structures (docked, and not accessible in realistic tasks) or neglect pocket sidechain conformations, leading to limited practical utility and unrealistic conformation predictions. To fill these gaps, we introduce an under-explored task, named flexible docking to predict poses of ligand and pocket sidechains simultaneously and introduce Re-Dock, a novel diffusion bridge generative model extended to geometric manifolds. |
Yufei Huang; Odin Zhang; Lirong Wu; Cheng Tan; Haitao Lin; Zhangyang Gao; Siyuan Li; Stan Z. Li; |
480 | Human-like Category Learning By Injecting Ecological Priors from Large Language Models Into Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate that large language models can generate cognitive tasks, specifically category learning tasks, that match the statistics of real-world tasks, thereby addressing the first challenge. |
Akshay Kumar Jagadish; Julian Coda-Forno; Mirko Thalmann; Eric Schulz; Marcel Binz; |
481 | DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This poses a challenge for offline RL algorithms, as agents must acquire the ability to transit to high-reward regions. To address this issue, we introduce Diffusion-based Trajectory Stitching (DiffStitch), a novel diffusion-based data augmentation pipeline that systematically generates stitching transitions between trajectories. |
Guanghe Li; Yixiang Shan; Zhengbang Zhu; Ting Long; Weinan Zhang; |
482 | Value-Evolutionary-Based Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Value-Evolutionary-Based Reinforcement Learning (VEB-RL) that focuses on the integration of EAs with value-based RL. |
Pengyi Li; Jianye HAO; Hongyao Tang; YAN ZHENG; Fazl Barez; |
483 | Surface-VQMAE: Vector-quantized Masked Auto-encoders on Molecular Surfaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, comparatively little effort has been devoted to incorporating the abundant protein surface information for analyzing proteins’ biological functions, in juxtaposition to amino acid sequences and 3D structures. We propose a novel surface-based unsupervised learning algorithm termed Surface-VQMAE to overcome this obstacle. |
Fang Wu; Stan Z. Li; |
484 | Toward Adaptive Reasoning in Large Language Models with Thought Rollback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a new reasoning framework, called *Thought Rollback* (TR), allowing LLMs to adaptively build thought structure while maintaining effective reasoning toward problem-solving under hallucinations. |
Sijia Chen; Baochun Li; |
485 | Towards Resource-friendly, Extensible and Stable Incomplete Multi-view Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Incomplete multi-view clustering (IMVC) methods typically encounter three drawbacks: (1) intense time and/or space overheads; (2) intractable hyper-parameters; (3) non-zero variance results. With these concerns in mind, we give a simple yet effective IMVC scheme, termed ToRES. |
Shengju Yu; Zhibin Dong; Siwei Wang; Xinhang Wan; Yue Liu; Weixuan Liang; Pei Zhang; Wenxuan Tu; Xinwang Liu; |
486 | Graph Mixup on Approximate Gromov–Wasserstein Geodesics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Though efforts have been made, most of the existing graph mixup methods neglect the intrinsic geodesic guarantee, thereby generating inconsistent sample-label pairs. To address this issue, we propose GeoMix to mixup graphs on the Gromov-Wasserstein (GW) geodesics. |
Zhichen Zeng; Ruizhong Qiu; Zhe Xu; Zhining Liu; Yuchen Yan; Tianxin Wei; Lei Ying; Jingrui He; Hanghang Tong; |
487 | Graph As Point Set Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, this paper introduces a novel graph-to-set conversion method that bijectively transforms interconnected nodes into a set of independent points and then uses a set encoder to learn the graph representation. |
Xiyuan Wang; Pan Li; Muhan Zhang; |
488 | Leveraging Self-Consistency for Data-Efficient Amortized Bayesian Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method to improve the efficiency and accuracy of amortized Bayesian inference by leveraging universal symmetries in the joint probabilistic model of parameters and data. |
Marvin Schmitt; Desi R. Ivanova; Daniel Habermann; Ullrich Koethe; Paul-Christian Bürkner; Stefan T. Radev; |
489 | OMPO: A Unified Framework for RL Under Policy and Dynamics Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching. |
Yu Luo; Tianying Ji; Fuchun Sun; Jianwei Zhang; Huazhe Xu; Xianyuan Zhan; |
490 | Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This motivates a new possibility of harnessing the emergent outperforming offline optimal policy to improve online policy learning. Based on this insight, we present Offline-Boosted Actor-Critic (OBAC), a model-free online RL framework that elegantly identifies the outperforming offline policy through value comparison, and uses it as an adaptive constraint to guarantee stronger policy learning performance. |
Yu Luo; Tianying Ji; Fuchun Sun; Jianwei Zhang; Huazhe Xu; Xianyuan Zhan; |
491 | Outlier-aware Slicing for Post-Training Quantization in Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses a critical challenge in PTQ: **the severe impact of outliers on the accuracy of quantized transformer architectures.** Specifically, we introduce the concept of “reconstruction granularity” as a novel solution to this issue, which has been overlooked in previous works. |
Yuexiao Ma; Huixia Li; Xiawu Zheng; Feng Ling; Xuefeng Xiao; Rui Wang; Shilei Wen; Fei Chao; Rongrong Ji; |
492 | On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key contribution is the characterization of a phase transition behavior in the efficiency of all possible modern Hopfield models based on the norm of patterns. |
Jerry Yao-Chieh Hu; Thomas Lin; Zhao Song; Han Liu; |
493 | Outlier-Efficient Hopfield Layers for Large Transformer-Based Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce an Outlier-Efficient Modern Hopfield Model (termed `OutEffHop`) and use it to address the outlier inefficiency problem of training gigantic transformer-based models. |
Jerry Yao-Chieh Hu; Pei-Hsuan Chang; Haozheng Luo; Hong-Yu Chen; Weijian Li; Wei-Po Wang; Han Liu; |
494 | Calibration Bottleneck: Over-compressed Representations Are Less Calibratable Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the observations, this paper introduces a weak classifier hypothesis, i.e., given a weak classification head that has not been over-trained, the representation module can be better learned to produce more calibratable features. Consequently, we propose a progressively layer-peeled training (PLP) method to exploit this hypothesis, thereby enhancing model calibratability. |
Deng-Bao Wang; Min-Ling Zhang; |
495 | Contrastive Learning for Clinical Outcome Prediction with Partial Data Sources Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this assumption is often challenged by the fact that real-world clinical datasets originate from various data sources (with distinct sets of covariates), which, though available for training (in a research or retrospective setting), are more realistically only partially available (a subset of such sets) at inference time when deployed. So motivated, we introduce Contrastive Learning for clinical Outcome Prediction with Partial data Sources (CLOPPS), that trains encoders to capture information across different data sources and then leverages them to build classifiers restricting access to a single data source. |
Meng Xia; Jonathan Wilson; Benjamin Goldstein; Ricardo Henao; |
496 | DecisionNCE: Embodied Multimodal Representations Via Implicit Preference Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing methods approach these via separate objectives, which often reach sub-optimal solutions. In this paper, we propose a universal unified objective that can simultaneously extract meaningful task progression information from image sequences and seamlessly align them with language instructions. |
Jianxiong Li; Jinliang Zheng; Yinan Zheng; Liyuan Mao; Xiao Hu; Sijie Cheng; Haoyi Niu; Jihao Liu; Yu Liu; Jingjing Liu; Ya-Qin Zhang; Xianyuan Zhan; |
497 | Self-Driven Entropy Aggregation for Byzantine-Robust Heterogeneous Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While various robust aggregations have been proposed to defend against such attacks, they are subject to certain assumptions: homogeneous private data and related proxy datasets. To address these limitations, we propose Self-Driven Entropy Aggregation (SDEA), which leverages the random public dataset to conduct Byzantine-robust aggregation in heterogeneous federated learning. |
Wenke Huang; Zekun Shi; Mang Ye; He Li; Bo Du; |
498 | How Graph Neural Networks Learn: Lessons from Training Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For graph neural networks (GNNs), considerable advances have been made in formalizing what functions they can represent, but whether GNNs will learn desired functions during the optimization process remains less clear. To fill this gap, we study their training dynamics in function space. |
Chenxiao Yang; Qitian Wu; David Wipf; Ruoyu Sun; Junchi Yan; |
499 | SSL4Q: Semi-Supervised Learning of Quantum Data with Application to Quantum State Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SSL4Q, which achieves (for the first time) semi-supervised learning specifically designed for quantum state classification. |
Yehui Tang; Nianzu Yang; Mabiao Long; Junchi Yan; |
500 | A Unified Adaptive Testing System Enabled By Hierarchical Structure Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a unified data-driven ATS framework that conceptualizes the various testing formats as a hierarchical test structure search problem. |
Junhao Yu; Yan Zhuang; Zhenya Huang; Qi Liu; Xin Li; Rui LI; Enhong Chen; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~2,600 papers), please visit Paper Digest: ICML-2024 (Full List).