Paper Digest: ICML 2024 Highlights
Note: ICML 2024 accepted more than 2,600 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can choose to read All 2,600 ICML-2024 papers on a separate page.
To search or review papers within ICML-2024 related to a specific topic, please use the search by venue (ICML-2024), review by venue (ICML-2024) and question answering by venue (ICML-2024) services. To browse papers by author, here is a list of all authors (ICML-2024). You may also like to explore our “Best Paper” Digest (ICML), which lists the most influential ICML papers since 2004.
Based in New York, Paper Digest is dedicated to producing high-quality text analysis results that people can actually use on a daily basis. Since 2018, we have been serving users across the world with a number of exclusive services to track, search, review and rewrite scientific literature.
You are welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: ICML 2024 Highlights
# | Paper | Author(s)
---|---|---
1 | Better & Faster Large Language Models Via Multi-token Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample efficiency. |
Fabian Gloeckle; Badr Youbi Idrissi; Baptiste Roziere; David Lopez-Paz; Gabriel Synnaeve; |
2 | Transformers Are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While Transformers have been the main architecture behind deep learning’s success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention, connected through various decompositions of a well-studied class of structured *semiseparable matrices*. |
Tri Dao; Albert Gu; |
3 | Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite its better theoretical properties and conceptual simplicity, rectified flow is not yet decisively established as standard practice. In this work, we improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales. |
Patrick Esser; Sumith Kulal; Andreas Blattmann; Rahim Entezari; Jonas Müller; Harry Saini; Yam Levi; Dominik Lorenz; Axel Sauer; Frederic Boesel; Dustin Podell; Tim Dockhorn; Zion English; Robin Rombach; |
4 | Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Medusa, an efficient method that augments LLM inference by adding extra decoding heads to predict multiple subsequent tokens in parallel. |
Tianle Cai; Yuhong Li; Zhengyang Geng; Hongwu Peng; Jason D. Lee; Deming Chen; Tri Dao; |
5 | Improving Factuality and Reasoning in Language Models Through Multiagent Debate Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer. |
Yilun Du; Shuang Li; Antonio Torralba; Joshua B. Tenenbaum; Igor Mordatch; |
6 | How Language Model Hallucinations Can Snowball Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A major risk of using language models in practical applications is their tendency to hallucinate incorrect statements. To study this, we construct three question-answering datasets where LMs often state an incorrect answer which is followed by an explanation with at least one incorrect claim. |
Muru Zhang; Ofir Press; William Merrill; Alisa Liu; Noah A. Smith; |
7 | Chatbot Arena: An Open Platform for Evaluating LLMs By Human Preference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper describes the platform, analyzes the data we have collected so far, and explains the tried-and-true statistical methods we are using for efficient and accurate evaluation and ranking of models. |
Wei-Lin Chiang; Lianmin Zheng; Ying Sheng; Anastasios Nikolas Angelopoulos; Tianle Li; Dacheng Li; Banghua Zhu; Hao Zhang; Michael Jordan; Joseph E. Gonzalez; Ion Stoica; |
8 | R2E: Turning Any Github Repository Into A Programming Agent Environment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Repository to Environment (R2E), a framework that can turn any GitHub repository into a test environment to evaluate the performance of code-generating systems, both static and interactive. |
Naman Jain; Manish Shetty; Tianjun Zhang; King Han; Koushik Sen; Ion Stoica; |
9 | Premise Order Matters in Reasoning with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We first examine the effect of premise ordering on deductive reasoning on a variety of LLMs, and our evaluation shows that even if the model performance is decent on the optimal order, permuting the premise order can cause a performance drop of over 30%. In addition, we release the benchmark R-GSM, based on GSM8K, to examine the ordering effect for mathematical problem-solving, and we again observe a significant drop in accuracy, relative to the original GSM8K benchmark. |
Xinyun Chen; Ryan Andrew Chi; Xuezhi Wang; Denny Zhou; |
10 | Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. |
Chrisantha Fernando; Dylan Sunil Banarse; Henryk Michalewski; Simon Osindero; Tim Rocktäschel; |
11 | Stay on Topic with Classifier-Free Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate across a wide array of benchmarks that CFG can be used broadly as an inference-time technique in pure language modeling. |
Guillaume Sanchez; Alexander Spangher; Honglu Fan; Elad Levi; Stella Biderman; |
12 | RLAIF Vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Across the tasks of summarization, helpful dialogue generation, and harmless dialogue generation, we show that RLAIF achieves comparable performance to RLHF. Furthermore, we take a step towards self-improvement by demonstrating that RLAIF can outperform a supervised fine-tuned baseline even when the AI labeler is the same size as the policy, or even the exact same checkpoint as the initial policy. |
Harrison Lee; Samrat Phatale; Hassan Mansoor; Thomas Mesnard; Johan Ferret; Kellie Ren Lu; Colton Bishop; Ethan Hall; Victor Carbune; Abhinav Rastogi; Sushant Prakash; |
13 | DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, encoding a complex, potentially multimodal data distribution into a single *continuous* Gaussian distribution arguably represents an unnecessarily challenging learning problem. We propose ***Dis**crete-**Co**ntinuous Latent Variable **Diff**usion Models (DisCo-Diff)* to simplify this task by introducing complementary *discrete* latent variables. |
Yilun Xu; Gabriele Corso; Tommi Jaakkola; Arash Vahdat; Karsten Kreis; |
14 | ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This further hinders feedback learning as well as alignment research within the open-source community. To address this issue, we explore how to go beyond human feedback and collect high-quality AI feedback automatically for a scalable alternative. |
Ganqu Cui; Lifan Yuan; Ning Ding; Guanming Yao; Bingxiang He; Wei Zhu; Yuan Ni; Guotong Xie; Ruobing Xie; Yankai Lin; Zhiyuan Liu; Maosong Sun; |
15 | Grokking Group Multiplication with Cosets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on previous work, we completely reverse engineer fully connected one-hidden layer networks that have “grokked” the arithmetic of the permutation groups $S_5$ and $S_6$. The models discover the true subgroup structure of the full group and converge on neural circuits that decompose the group arithmetic using the permutation group’s subgroups. |
Dashiell Stander; Qinan Yu; Honglu Fan; Stella Biderman; |
16 | Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, for the first time, we propose a general and principled approach to optimizing the sampling schedules of DMs for high-quality outputs, called Align Your Steps. |
Amirmojtaba Sabour; Sanja Fidler; Karsten Kreis; |
17 | Disentangled 3D Scene Generation with Layout Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a method to generate 3D scenes that are disentangled into their component objects. |
Dave Epstein; Ben Poole; Ben Mildenhall; Alexei A Efros; Aleksander Holynski; |
18 | Stealing Part of A Production Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI’s ChatGPT or Google’s PaLM-2. |
Nicholas Carlini; Daniel Paleka; Krishnamurthy Dj Dvijotham; Thomas Steinke; Jonathan Hayase; A. Feder Cooper; Katherine Lee; Matthew Jagielski; Milad Nasr; Arthur Conmy; Eric Wallace; David Rolnick; Florian Tramèr; |
19 | Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We primarily question whether the use of large Web-scraped datasets *should* be viewed as differential-privacy-preserving. |
Florian Tramèr; Gautam Kamath; Nicholas Carlini; |
20 | Fast Adversarial Attacks on Language Models In One GPU Minute Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel class of fast, beam search-based adversarial attack (BEAST) for Language Models (LMs). |
Vinu Sankar Sadasivan; Shoumik Saha; Gaurang Sriramanan; Priyatham Kattakinda; Atoosa Chegini; Soheil Feizi; |
21 | Training Large Language Models for Reasoning Through Reverse Curriculum Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose **R**$^3$: Learning **R**easoning through **R**everse Curriculum **R**einforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models. |
Zhiheng Xi; Wenxiang Chen; Boyang Hong; Senjie Jin; Rui Zheng; Wei He; Yiwen Ding; Shichun Liu; Xin Guo; Junzhe Wang; Honglin Guo; Wei Shen; Xiaoran Fan; Yuhao Zhou; Shihan Dou; Xiao Wang; Xinbo Zhang; Peng Sun; Tao Gui; Qi Zhang; Xuanjing Huang; |
22 | Compositional Image Decomposition with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a method to decompose an image into such compositional components. |
Jocelin Su; Nan Liu; Yanbo Wang; Joshua B. Tenenbaum; Yilun Du; |
23 | Learning Iterative Reasoning Through Energy Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks by formulating reasoning and decision-making problems with energy-based optimization. |
Yilun Du; Jiayuan Mao; Joshua B. Tenenbaum; |
24 | Position: Compositional Generative Modeling: A Single Model Is Not All You Need Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that we should instead construct large generative systems by composing smaller generative models together. |
Yilun Du; Leslie Pack Kaelbling; |
25 | Potential Based Diffusion Motion Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new approach towards learning potential based motion planning, where we train a neural network to capture and learn easily optimizable potentials over motion planning trajectories. |
Yunhao Luo; Chen Sun; Joshua B. Tenenbaum; Yilun Du; |
26 | LESS: Selecting Influential Data for Targeted Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose LESS, an optimizer-aware and practically efficient algorithm to estimate data influences and perform **L**ow-rank gradi**E**nt **S**imilarity **S**earch for instruction data selection. To facilitate future work, we release code and data at [princeton-nlp/LESS](https://github.com/princeton-nlp/LESS). |
Mengzhou Xia; Sadhika Malladi; Suchin Gururangan; Sanjeev Arora; Danqi Chen; |
27 | Position: Data-driven Discovery with Large Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We instead advocate for fail-proof tool integration, along with active user moderation through feedback mechanisms, to foster data-driven scientific discoveries with efficiency and reproducibility. |
Bodhisattwa Prasad Majumder; Harshit Surana; Dhruv Agarwal; Sanchaita Hazra; Ashish Sabharwal; Peter Clark; |
28 | Genie: Generative Interactive Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Genie, the first *generative interactive environment* trained in an unsupervised manner from unlabelled Internet videos. |
Jake Bruce; Michael D Dennis; Ashley Edwards; Jack Parker-Holder; Yuge Shi; Edward Hughes; Matthew Lai; Aditi Mavalankar; Richie Steigerwald; Chris Apps; Yusuf Aytar; Sarah Maria Elisabeth Bechtle; Feryal Behbahani; Stephanie C.Y. Chan; Nicolas Heess; Lucy Gonzalez; Simon Osindero; Sherjil Ozair; Scott Reed; Jingwei Zhang; Konrad Zolna; Jeff Clune; Nando de Freitas; Satinder Singh; Tim Rocktäschel; |
29 | Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a framework called Patchscopes and show how it can be used to answer a wide range of questions about an LLM’s computation. |
Asma Ghandeharioun; Avi Caciularu; Adam Pearce; Lucas Dixon; Mor Geva; |
30 | Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, future superhuman models will behave in complex ways too difficult for humans to reliably evaluate; humans will only be able to *weakly supervise* superhuman models. We study an analogy to this problem: can weak model supervision elicit the full capabilities of a much stronger model? |
Collin Burns; Pavel Izmailov; Jan Hendrik Kirchner; Bowen Baker; Leo Gao; Leopold Aschenbrenner; Yining Chen; Adrien Ecoffet; Manas Joglekar; Jan Leike; Ilya Sutskever; Jeffrey Wu; |
31 | Language Models with Conformal Factuality Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose conformal factuality, a framework that can ensure high probability correctness guarantees for LMs by connecting language modeling and conformal prediction. |
Christopher Mohri; Tatsunori Hashimoto; |
32 | Equivariant Graph Neural Operator for Modeling 3D Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Equivariant Graph Neural Operator (EGNO), a novel and principled method that directly models dynamics as trajectories instead of just next-step prediction. |
Minkai Xu; Jiaqi Han; Aaron Lou; Jean Kossaifi; Arvind Ramanathan; Kamyar Azizzadenesheli; Jure Leskovec; Stefano Ermon; Anima Anandkumar; |
33 | Position: Open-Endedness Is Essential for Artificial Superhuman Intelligence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, the creation of open-ended, ever self-improving AI remains elusive. In this position paper, we argue that the ingredients are now in place to achieve *open-endedness* in AI systems with respect to a human observer. |
Edward Hughes; Michael D Dennis; Jack Parker-Holder; Feryal Behbahani; Aditi Mavalankar; Yuge Shi; Tom Schaul; Tim Rocktäschel; |
34 | Magicoder: Empowering Code Generation with OSS-Instruct Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Magicoder, a series of fully open-source (code, weights, and data) Large Language Models (LLMs) for code that significantly closes the gap with top code models while having no more than 7B parameters. |
Yuxiang Wei; Zhe Wang; Jiawei Liu; Yifeng Ding; Lingming Zhang; |
35 | HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods. To address this issue, we introduce HarmBench, a standardized evaluation framework for automated red teaming. |
Mantas Mazeika; Long Phan; Xuwang Yin; Andy Zou; Zifan Wang; Norman Mu; Elham Sakhaee; Nathaniel Li; Steven Basart; Bo Li; David Forsyth; Dan Hendrycks; |
36 | MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose MM-Vet, an evaluation benchmark that examines large multimodal models (LMMs) on complicated multimodal tasks. |
Weihao Yu; Zhengyuan Yang; Linjie Li; Jianfeng Wang; Kevin Lin; Zicheng Liu; Xinchao Wang; Lijuan Wang; |
37 | NExT-GPT: Any-to-Any Multimodal LLM Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fill the gap, we present an end-to-end general-purpose any-to-any MM-LLM system, NExT-GPT. |
Shengqiong Wu; Hao Fei; Leigang Qu; Wei Ji; Tat-Seng Chua; |
38 | Linguistic Calibration of Long-Form Generations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This issue can be mitigated by having the LM verbally convey the probability that its claims are correct, but existing models cannot produce long-form text with calibrated confidence statements. Through the lens of decision-making, we define linguistic calibration for long-form generations: an LM is linguistically calibrated if its generations enable its users to make calibrated probabilistic predictions. |
Neil Band; Xuechen Li; Tengyu Ma; Tatsunori Hashimoto; |
39 | Rolling Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores Rolling Diffusion: a new approach that uses a sliding window denoising process. |
David Ruhe; Jonathan Heek; Tim Salimans; Emiel Hoogeboom; |
40 | Debating with More Persuasive LLMs Leads to More Truthful Answers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In anticipation of this, we ask: can weaker models assess the correctness of stronger models? We investigate this question in an analogous setting, where stronger models (experts) possess the necessary information to answer questions and weaker models (non-experts) lack this information. |
Akbir Khan; John Hughes; Dan Valentine; Laura Ruis; Kshitij Sachan; Ansh Radhakrishnan; Edward Grefenstette; Samuel R. Bowman; Tim Rocktäschel; Ethan Perez; |
41 | IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We significantly improve multi-view generation by considering video instead of image generators. |
Luke Melas-Kyriazi; Iro Laina; Christian Rupprecht; Natalia Neverova; Andrea Vedaldi; Oran Gafni; Filippos Kokkinos; |
42 | Prismatic VLMs: Investigating The Design Space of Visually-Conditioned Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the volume of new releases, key design decisions around image preprocessing, architecture, and optimization are under-explored, making it challenging to understand what factors account for model performance, a challenge further complicated by the lack of objective, consistent evaluations. To address these gaps, we first compile a suite of standardized evaluations spanning visual question answering, object localization, and challenge sets that probe properties such as hallucination; evaluations that provide fine-grained insight into VLM capabilities. |
Siddharth Karamcheti; Suraj Nair; Ashwin Balakrishna; Percy Liang; Thomas Kollar; Dorsa Sadigh; |
43 | Learning to Route Among Specialized Experts for Zero-Shot Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose $\textbf{P}$ost-$\textbf{H}$oc $\textbf{A}$daptive $\textbf{T}$okenwise $\textbf{G}$ating $\textbf{O}$ver an $\textbf{O}$cean of $\textbf{S}$pecialized $\textbf{E}$xperts (**PHATGOOSE**), which learns to route among specialized modules that were produced through parameter-efficient fine-tuning. |
Mohammed Muqeeth; Haokun Liu; Yufan Liu; Colin Raffel; |
44 | Mechanistic Design and Scaling of Hybrid Architectures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation. We set out to simplify this process by grounding it in an end-to-end mechanistic architecture design (MAD) pipeline, encompassing small-scale capability unit tests predictive of scaling laws. |
Michael Poli; Armin W Thomas; Eric Nguyen; Pragaash Ponnusamy; Björn Deiseroth; Kristian Kersting; Taiji Suzuki; Brian Hie; Stefano Ermon; Christopher Re; Ce Zhang; Stefano Massaroli; |
45 | Does Label Smoothing Help Deep Partial Label Learning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In theory, we prove lower and upper bounds of the expected risk to show that label smoothing can help deep PLL. |
Xiuwen Gong; Nitin Bisht; Guandong Xu; |
46 | Learning to Explore in POMDPs with Informational Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we design a POMDP agent that gathers information about the hidden state, using ideas from the meta-exploration literature. |
Annie Xie; Logan Mondal Bhamidipaty; Evan Zheran Liu; Joey Hong; Sergey Levine; Chelsea Finn; |
47 | Fundamental Limitations of Alignment in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a theoretical approach called Behavior Expectation Bounds (BEB) which allows us to formally investigate several inherent characteristics and limitations of alignment in large language models. |
Yotam Wolf; Noam Wies; Oshri Avnery; Yoav Levine; Amnon Shashua; |
48 | Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that the reliance on self-attention for visual representation learning is not necessary and propose a new generic vision backbone with bidirectional Mamba blocks (Vim), which marks the image sequences with position embeddings and compresses the visual representation with bidirectional state space models. |
Lianghui Zhu; Bencheng Liao; Qian Zhang; Xinlong Wang; Wenyu Liu; Xinggang Wang; |
49 | Online Conformal Prediction with Decaying Step Sizes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a method for online conformal prediction with decaying step sizes. |
Anastasios Nikolas Angelopoulos; Rina Barber; Stephen Bates; |
50 | Monitoring AI-Modified Content at Scale: A Case Study on The Impact of ChatGPT on AI Conference Peer Reviews Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). |
Weixin Liang; Zachary Izzo; Yaohui Zhang; Haley Lepp; Hancheng Cao; Xuandong Zhao; Lingjiao Chen; Haotian Ye; Sheng Liu; Zhi Huang; Daniel McFarland; James Y. Zou; |
51 | Graph Positional and Structural Encoder Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we present the Graph Positional and Structural Encoder (GPSE), the first-ever graph encoder designed to capture rich PSE representations for augmenting any GNN. |
Semih Cantürk; Renming Liu; Olivier Lapointe-Gagné; Vincent Létourneau; Guy Wolf; Dominique Beaini; Ladislav Rampášek; |
52 | Model Alignment As Prospect Theoretic Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using a Kahneman-Tversky model of human utility, we propose a HALO that directly maximizes the utility of generations instead of maximizing the log-likelihood of preferences, as current methods do. |
Kawin Ethayarajh; Winnie Xu; Niklas Muennighoff; Dan Jurafsky; Douwe Kiela; |
53 | Neural Operators with Localized Integral and Differential Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a principled approach to operator learning that can capture local features under two frameworks by learning differential operators and integral operators with locally supported kernels. |
Miguel Liu-Schiaffini; Julius Berner; Boris Bonev; Thorsten Kurth; Kamyar Azizzadenesheli; Anima Anandkumar; |
54 | Position: The No Free Lunch Theorem, Kolmogorov Complexity, and The Role of Inductive Biases in Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Whereas no free lunch theorems seemingly indicate that individual problems require specialized learners, we explain how tasks that often require human intervention, such as picking an appropriately sized model when labeled data is scarce or plentiful, can be automated into a single learning algorithm. |
Micah Goldblum; Marc Anton Finzi; Keefer Rowan; Andrew Gordon Wilson; |
55 | Q-Probe: A Lightweight Approach to Reward Maximization for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an approach called Q-probing to adapt a pre-trained language model to maximize a task-specific reward function. |
Kenneth Li; Samy Jelassi; Hugh Zhang; Sham M. Kakade; Martin Wattenberg; David Brandfonbrener; |
56 | Test-Time Model Adaptation with Only Forward Passes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In light of this, existing methods are often infeasible since they heavily depend on computation-intensive backpropagation for model updating, which may not be supported. To address this, we propose a test-time Forward-Optimization Adaptation (FOA) method. |
Shuaicheng Niu; Chunyan Miao; Guohao Chen; Pengcheng Wu; Peilin Zhao; |
57 | Solving Poisson Equations Using Neural Walk-on-Spheres Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Neural Walk-on-Spheres (NWoS), a novel neural PDE solver for the efficient solution of high-dimensional Poisson equations. |
Hong Chul Nam; Julius Berner; Anima Anandkumar; |
58 | StrokeNUWA—Tokenizing Strokes for Vector Graphic Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we introduce StrokeNUWA, a pioneering work exploring a better visual representation, stroke tokens, for vector graphics, which is inherently rich in visual semantics, naturally compatible with LLMs, and highly compressed. |
Zecheng Tang; Chenfei Wu; Zekai Zhang; Minheng Ni; Shengming Yin; Yu Liu; Zhengyuan Yang; Lijuan Wang; Zicheng Liu; Juntao Li; Nan Duan; |
59 | NExT: Teaching Large Language Models to Reason About Code Execution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, large language models (LLMs) of code are typically trained on the surface textual form of programs, and thus may lack a semantic understanding of how programs execute at run-time. To address this issue, we propose NExT, a method to teach LLMs to inspect the execution traces of programs (variable states of executed lines) and reason about their run-time behavior through chain-of-thought (CoT) rationales. |
Ansong Ni; Miltiadis Allamanis; Arman Cohan; Yinlin Deng; Kensen Shi; Charles Sutton; Pengcheng Yin; |
60 | Interpretability Illusions in The Generalization of Simplified Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we illustrate an important caveat to this assumption: even if the simplified representations can accurately approximate the full model on the training set, they may fail to accurately capture the model’s behavior out of distribution. We illustrate this by training Transformer models on controlled datasets with systematic generalization splits, including the Dyck balanced-parenthesis languages and a code completion task. |
Dan Friedman; Andrew Kyle Lampinen; Lucas Dixon; Danqi Chen; Asma Ghandeharioun; |
61 | Language Models As Science Tutors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this, we introduce TutorEval and TutorChat. Our datasets build on open-source materials, and we release our models, data, and evaluations publicly. |
Alexis Chevalier; Jiayi Geng; Alexander Wettig; Howard Chen; Sebastian Mizera; Toni Annala; Max Aragon; Arturo Rodriguez Fanlo; Simon Frieder; Simon Machado; Akshara Prabhakar; Ellie Thieu; Jiachen T. Wang; Zirui Wang; Xindi Wu; Mengzhou Xia; Wenhan Xia; Jiatong Yu; Junjie Zhu; Zhiyong Ren; Sanjeev Arora; Danqi Chen; |
62 | The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To guide progress on unlearning, we develop RMU, a state-of-the-art unlearning method based on controlling model representations. We release our benchmark and code publicly at https://wmdp.ai. |
Nathaniel Li; Alexander Pan; Anjali Gopal; Summer Yue; Daniel Berrios; Alice Gatti; Justin D. Li; Ann-Kathrin Dombrowski; Shashwat Goel; Gabriel Mukobi; Nathan Helm-Burger; Rassin Lababidi; Lennart Justen; Andrew Bo Liu; Michael Chen; Isabelle Barrass; Oliver Zhang; Xiaoyuan Zhu; Rishub Tamirisa; Bhrugu Bharathi; Ariel Herbert-Voss; Cort B Breuer; Andy Zou; Mantas Mazeika; Zifan Wang; Palash Oswal; Weiran Lin; Adam Alfred Hunt; Justin Tienken-Harder; Kevin Y. Shih; Kemper Talley; John Guan; Ian Steneker; David Campbell; Brad Jokubaitis; Steven Basart; Stephen Fitz; Ponnurangam Kumaraguru; Kallol Krishna Karmakar; Uday Tupakula; Vijay Varadharajan; Yan Shoshitaishvili; Jimmy Ba; Kevin M. Esvelt; Alexandr Wang; Dan Hendrycks; |
63 | How Learning By Reconstruction Produces Uninformative Features For Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the interpretability benefits of reconstruction and generation, we identify a misalignment between learning to reconstruct and learning for perception. |
Randall Balestriero; Yann LeCun; |
64 | Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the Hourglass Diffusion Transformer (HDiT), an image-generative model that exhibits linear scaling with pixel count, supporting training at high resolution (e.g. $1024 \times 1024$) directly in pixel-space. |
Katherine Crowson; Stefan Andreas Baumann; Alex Birch; Tanishq Mathew Abraham; Daniel Z Kaplan; Enrico Shippole; |
65 | RLVF: Learning from Verbal Feedback Without Overgeneralization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new method Contextualized Critiques with Constrained Preference Optimization (C3PO) to learn from high-level verbal feedback while reducing overgeneralization compared to current work. |
Moritz Pascal Stephan; Alexander Khazatsky; Eric Mitchell; Annie S Chen; Sheryl Hsu; Archit Sharma; Chelsea Finn; |
66 | Distinguishing The Knowable from The Unknowable with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the feasibility of identifying *epistemic* uncertainty (reflecting a lack of knowledge), as opposed to *aleatoric* uncertainty (reflecting entropy in the underlying distribution), in the outputs of large language models (LLMs) over free-form text. |
Gustaf Ahdritz; Tian Qin; Nikhil Vyas; Boaz Barak; Benjamin L. Edelman; |
67 | MathScale: Scaling Instruction Tuning for Mathematical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data using frontier LLMs (e.g., GPT-3.5). As a result, we create a mathematical reasoning dataset (MathScaleQA) containing two million math question-answer pairs. |
Zhengyang Tang; Xingxing Zhang; Benyou Wang; Furu Wei; |
68 | InstructSpeech: Following Speech Editing Instructions Via Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we construct triplet paired data (instruction, input speech, output speech) to alleviate data scarcity and train a multi-task large language model named InstructSpeech. |
Rongjie Huang; Ruofan Hu; Yongqi Wang; Zehan Wang; Xize Cheng; Ziyue Jiang; Zhenhui Ye; Dongchao Yang; Luping Liu; Peng Gao; Zhou Zhao; |
69 | Position: Video As The New Language for Real-World Decision Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet video data captures important information about the physical world that is difficult to express in language. To address this gap, we discuss an under-appreciated opportunity to extend video generation to solve tasks in the real world. |
Sherry Yang; Jacob C Walker; Jack Parker-Holder; Yilun Du; Jake Bruce; Andre Barreto; Pieter Abbeel; Dale Schuurmans; |
70 | Retrieval-Augmented Score Distillation for Text-to-3D Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce a novel framework for retrieval-based quality enhancement in text-to-3D generation. |
Junyoung Seo; Susung Hong; Wooseok Jang; Inès Hyeonsu Kim; Min-Seop Kwak; Doyup Lee; Seungryong Kim; |
71 | PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel visual prompting approach for VLMs that we call Prompting with Iterative Visual Optimization (PIVOT), which casts tasks as iterative visual question answering. |
Soroush Nasiriany; Fei Xia; Wenhao Yu; Ted Xiao; Jacky Liang; Ishita Dasgupta; Annie Xie; Danny Driess; Ayzaan Wahid; Zhuo Xu; Quan Vuong; Tingnan Zhang; Tsang-Wei Edward Lee; Kuang-Huei Lee; Peng Xu; Sean Kirmani; Yuke Zhu; Andy Zeng; Karol Hausman; Nicolas Heess; Chelsea Finn; Sergey Levine; brian ichter; |
72 | Connect Later: Improving Fine-tuning for Robustness with Targeted Augmentations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better leverage pretraining for distribution shifts, we propose the Connect Later framework, which fine-tunes the model with targeted augmentations designed with knowledge of the shift. |
Helen Qu; Sang Michael Xie; |
73 | Modeling Caption Diversity in Contrastive Vision-Language Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. |
Samuel Lavoie; Polina Kirichenko; Mark Ibrahim; Mido Assran; Andrew Gordon Wilson; Aaron Courville; Nicolas Ballas; |
74 | Position: Levels of AGI for Operationalizing Progress on The Path to AGI Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. |
Meredith Ringel Morris; Jascha Sohl-Dickstein; Noah Fiedel; Tris Warkentin; Allan Dafoe; Aleksandra Faust; Clement Farabet; Shane Legg; |
75 | UniAudio: Towards Universal Audio Generation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As recent research on large language models (LLMs) has demonstrated their strong ability to handle multiple tasks, this work presents UniAudio, an LLM-based audio generation model that supports a wide range of audio generation tasks. |
Dongchao Yang; Jinchuan Tian; Xu Tan; Rongjie Huang; Songxiang Liu; Haohan Guo; Xuankai Chang; Jiatong Shi; sheng zhao; Jiang Bian; Zhou Zhao; Xixin Wu; Helen M. Meng; |
76 | Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce *Generalized **E**ncoding-**D**ecoding **D**iffusion **P**robabilistic **M**odels* (EDDPMs), which integrate the core capabilities for broad applicability and enhanced performance. |
Guangyi Liu; Yu Wang; Zeyu Feng; Qiyu Wu; Liping Tang; Yuan Gao; Zhen Li; Shuguang Cui; Julian McAuley; Zichao Yang; Eric P. Xing; Zhiting Hu; |
77 | Revisiting The Role of Language Priors in Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study $\textit{generative VLMs}$ that are trained for next-word generation given an image. |
Zhiqiu Lin; Xinyue Chen; Deepak Pathak; Pengchuan Zhang; Deva Ramanan; |
78 | MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses the growing need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. |
Zechun Liu; Changsheng Zhao; Forrest Iandola; Chen Lai; Yuandong Tian; Igor Fedorov; Yunyang Xiong; Ernie Chang; Yangyang Shi; Raghuraman Krishnamoorthi; Liangzhen Lai; Vikas Chandra; |
79 | InstructRetro: Instruction Tuning Post Retrieval-Augmented Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce Retro 48B, the largest LLM pretrained with retrieval. |
Boxin Wang; Wei Ping; Lawrence McAfee; Peng Xu; Bo Li; Mohammad Shoeybi; Bryan Catanzaro; |
80 | Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to shed light on LLMs' inner mechanisms through the lens of geometry. |
Randall Balestriero; Romain Cosentino; Sarath Shekkizhar; |
81 | Compositional Text-to-Image Generation with Dense Blob Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To leverage the compositionality of large language models (LLMs), we introduce a new in-context learning approach to generate blob representations from text prompts. |
Weili Nie; Sifei Liu; Morteza Mardani; Chao Liu; Benjamin Eckart; Arash Vahdat; |
82 | Position Paper: On The Societal Impact of Open Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Foundation models are powerful technologies: how they are released publicly directly shapes their societal impact. In this position paper, we focus on *open* foundation models, defined here as those with broadly available model weights (e.g., Llama 3, Stable Diffusion XL). |
Sayash Kapoor; Rishi Bommasani; Kevin Klyman; Shayne Longpre; Ashwin Ramaswami; Peter Cihon; Aspen K Hopkins; Kevin Bankston; Stella Biderman; Miranda Bogen; Rumman Chowdhury; Alex Engler; Peter Henderson; Yacine Jernite; Seth Lazar; Stefano Maffulli; Alondra Nelson; Joelle Pineau; Aviya Skowron; Dawn Song; Victor Storchan; Daniel Zhang; Daniel E. Ho; Percy Liang; Arvind Narayanan; |
83 | Scalable Pre-training of Large Autoregressive Image Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces AIM, a collection of vision models pre-trained with an autoregressive objective. |
Alaaeldin El-Nouby; Michal Klein; Shuangfei Zhai; Miguel Ángel Bautista; Vaishaal Shankar; Alexander T Toshev; Joshua M. Susskind; Armand Joulin; |
84 | Unlocking The Power of Spatial and Temporal Information in Medical Multimodal Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the Med-ST framework for fine-grained spatial and temporal modeling to exploit information from multiple spatial views of chest radiographs and temporal historical records. |
Jinxia Yang; Bing Su; Xin Zhao; Ji-Rong Wen; |
85 | QuRating: Selecting High-Quality Data for Training Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce QuRating, a method for selecting pre-training data that can capture human intuitions about data quality. |
Alexander Wettig; Aatmik Gupta; Saumya Malik; Danqi Chen; |
86 | MC-GTA: Metric-Constrained Model-Based Clustering Using Goodness-of-fit Tests with Autocorrelations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The model-based variations of these clustering algorithms (e.g. TICC and STICC) achieve SOTA performance, yet suffer from computational instability and complexity by using a metric-constrained Expectation-Maximization procedure. In order to address these two problems, we propose a novel clustering algorithm, MC-GTA (**M**odel-based **C**lustering via **G**oodness-of-fit **T**ests with **A**utocorrelations). |
Zhangyu Wang; Gengchen Mai; Krzysztof Janowicz; Ni Lao; |
87 | Assessing The Brittleness of Safety Alignment Via Pruning and Low-Rank Modifications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop methods to identify critical regions that are vital for safety guardrails, and that are disentangled from utility-relevant regions at both the neuron and rank levels. |
Boyi Wei; Kaixuan Huang; Yangsibo Huang; Tinghao Xie; Xiangyu Qi; Mengzhou Xia; Prateek Mittal; Mengdi Wang; Peter Henderson; |
88 | In-Context Unlearning: Language Models As Few-Shot Unlearners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new class of unlearning methods for LLMs called “In-Context Unlearning.” |
Martin Pawelczyk; Seth Neel; Himabindu Lakkaraju; |
89 | Position Paper: Scaling Simulation Is Neither Necessary Nor Sufficient for In-the-Wild Robot Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a structured critique of robotic simulations for real-world manipulation, by arguing that scaling simulators is neither necessary nor sufficient for making progress in general-purpose real-world robotic manipulation agents that are compliant with human preferences. |
Homanga Bharadhwaj; |
90 | In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, in-context learning has seen limited effectiveness in many settings, is difficult to quantitatively control and takes up context window space. To overcome these limitations, we propose an alternative approach that recasts in-context learning as in-context vectors (ICV). |
Sheng Liu; Haotian Ye; Lei Xing; James Y. Zou; |
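The in-context vector idea, summarizing demonstrations as a single latent direction that steers later hidden states, can be sketched numerically. The construction below (mean difference of hidden states, a scalar `alpha`) is a simplified assumption for illustration, not the paper's exact ICV recipe:

```python
import numpy as np

def in_context_vector(h_pos, h_neg):
    """Distill demonstrations into one direction: the difference between the
    mean hidden state of desired-style demos and undesired-style demos.
    (A hypothetical sketch; the paper's construction may use specific layers
    or other aggregation.)"""
    return h_pos.mean(axis=0) - h_neg.mean(axis=0)

def steer(hidden, icv, alpha=0.1):
    """Shift every token's hidden state along the in-context vector,
    freeing the context window of explicit demonstrations."""
    return hidden + alpha * icv

rng = np.random.default_rng(1)
d = 16
h_pos = rng.normal(loc=0.5, size=(5, d))   # states from desired-style demos
h_neg = rng.normal(loc=-0.5, size=(5, d))  # states from undesired-style demos
icv = in_context_vector(h_pos, h_neg)
steered = steer(rng.normal(size=(7, d)), icv, alpha=0.2)
```

Because the vector is added to activations rather than prepended as tokens, its strength can be dialed up or down via `alpha`, which is the "quantitative control" the highlight refers to.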
91 | Neural Networks Learn Statistics of Increasing Complexity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The _distributional simplicity bias_ (DSB) posits that neural networks learn low-order moments of the data distribution first, before moving on to higher-order correlations. In this work, we present compelling new evidence for the DSB by showing that networks automatically learn to perform well on maximum-entropy distributions whose low-order statistics match those of the training set early in training, then lose this ability later. |
Nora Belrose; Quintin Pope; Lucia Quirke; Alex Troy Mallen; Xiaoli Fern; |
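A concrete way to read the DSB probe above: the maximum-entropy distribution matching a dataset's first two moments is a Gaussian with the same mean and covariance, so moment-matched samples can be drawn as below. This is a generic sketch of the probing idea, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
# Some non-Gaussian "training data" (exponential marginals).
data = rng.exponential(size=(1000, 3))

# Match only the low-order statistics: mean and covariance.
mu = data.mean(axis=0)
cov = np.cov(data, rowvar=False)

# Gaussian samples with identical first and second moments; a DSB-style test
# would evaluate a network on these early vs. late in training.
moment_matched = rng.multivariate_normal(mu, cov, size=1000)
```

Early in training a network that has only learned low-order moments should treat `moment_matched` much like `data`; later, higher-order structure (here, the skew of the exponential) should separate them.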
92 | Offline Training of Language Model Agents with Functions As Learnable Weights Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To facilitate the development of LLM agents, we present a novel paradigm of training LLM agents without modifying the LLM weights, which is particularly useful when the LLM weights are inaccessible or difficult to modify. |
Shaokun Zhang; Jieyu Zhang; Jiale Liu; Linxin Song; Chi Wang; Ranjay Krishna; Qingyun Wu; |
93 | Chain of Code: Reasoning with A Language Model-Augmented Code Emulator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Chain of Code (CoC), a simple yet surprisingly effective extension that improves LM code-driven reasoning. |
Chengshu Li; Jacky Liang; Andy Zeng; Xinyun Chen; Karol Hausman; Dorsa Sadigh; Sergey Levine; Li Fei-Fei; Fei Xia; brian ichter; |
94 | Prodigy: An Expeditiously Adaptive Parameter-Free Learner Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Prodigy, an algorithm that provably estimates the distance to the solution $D$, which is needed to set the learning rate optimally. |
Konstantin Mishchenko; Aaron Defazio; |
95 | Projecting Molecules Into Synthesizable Chemical Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel framework that is capable of generating new chemical structures while ensuring synthetic accessibility. |
Shitong Luo; Wenhao Gao; Zuofan Wu; Jian Peng; Connor W. Coley; Jianzhu Ma; |
96 | Learning Divergence Fields for Shift-Robust Graph Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging generalization problem with interdependent data. |
Qitian Wu; Fan Nie; Chenxiao Yang; Junchi Yan; |
97 | Graph Neural Networks Use Graphs When They Shouldn’t Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While GNNs have the ability to ignore the graph-structure in such cases, it is not clear that they will. In this work, we show that GNNs actually tend to overfit the given graph-structure in the sense that they use it even when a better solution can be obtained by ignoring it. |
Maya Bechler-Speicher; Ido Amos; Ran Gilad-Bachrach; Amir Globerson; |
98 | Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. |
Abhimanyu Hans; Avi Schwarzschild; Valeriia Cherepanova; Hamid Kazemi; Aniruddha Saha; Micah Goldblum; Jonas Geiping; Tom Goldstein; |
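The core mechanism, contrasting two closely related language models, can be sketched as scoring a text's negative log-likelihood under one model against the cross-entropy between the two models' next-token predictions. The helper below is an illustrative approximation of that contrast on toy logits, not the paper's exact Binoculars score or thresholds:

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def softmax(z):
    return np.exp(log_softmax(z))

def contrast_score(token_ids, logits_m1, logits_m2):
    """Contrast two related LMs (hedged sketch of the Binoculars idea):
    per-token NLL under model 1, normalized by the cross-entropy of model 2's
    predicted distribution scored with model 1's log-probs."""
    logp1 = log_softmax(logits_m1)                        # (T, V)
    nll = -logp1[np.arange(len(token_ids)), token_ids].mean()
    xent = -(softmax(logits_m2) * logp1).sum(axis=-1).mean()
    return nll / xent  # relatively low scores suggest machine-generated text

rng = np.random.default_rng(0)
T, V = 8, 20
logits_m1 = rng.normal(size=(T, V))
logits_m2 = logits_m1 + 0.1 * rng.normal(size=(T, V))  # a "closely related" model
tokens = rng.integers(0, V, size=T)
score = contrast_score(tokens, logits_m1, logits_m2)
```

The appeal of this style of detector is that it needs only two forward passes over frozen pre-trained models, with no detector training.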
99 | Explorations of Self-Repair in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We highlight two different mechanisms that contribute to self-repair, including changes in the final LayerNorm scaling factor and sparse sets of neurons implementing Anti-Erasure. |
Cody Rushing; Neel Nanda; |
100 | Position: A Roadmap to Pluralistic Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, aligning models to serve *pluralistic* human values remains an open research question. In this piece, we propose a roadmap to pluralistic alignment, specifically using large language models as a test bed. |
Taylor Sorensen; Jared Moore; Jillian Fisher; Mitchell L Gordon; Niloofar Mireshghallah; Christopher Michael Rytting; Andre Ye; Liwei Jiang; Ximing Lu; Nouha Dziri; Tim Althoff; Yejin Choi; |
101 | AI Alignment with Changing and Influenceable Reward Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing AI alignment approaches assume that preferences are static, which is unrealistic: our preferences change, and may even be influenced by our interactions with AI systems themselves. To clarify the consequences of incorrectly assuming static preferences, we introduce Dynamic Reward Markov Decision Processes (DR-MDPs), which explicitly model preference changes and the AI’s influence on them. |
Micah Carroll; Davis Foote; Anand Siththaranjan; Stuart Russell; Anca Dragan; |
102 | RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX. |
Yao Mu; Junting Chen; Qing-Long Zhang; Shoufa Chen; Qiaojun Yu; Chongjian GE; Runjian Chen; Zhixuan Liang; Mengkang Hu; Chaofan Tao; Peize Sun; Haibao Yu; Chao Yang; Wenqi Shao; Wenhai Wang; Jifeng Dai; Yu Qiao; Mingyu Ding; Ping Luo; |
103 | Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first theoretically study non-greedy layer-wise training and show that the convergence cannot be assured when the local gradient in a module w.r.t. its input is not reconciled with the local gradient in the previous module w.r.t. its output. Inspired by the theoretical result, we further propose a local training strategy that successively regularizes the gradient reconciliation between neighboring modules without breaking gradient isolation or introducing any learnable parameters. |
Yibo Yang; Xiaojie Li; Motasem Alfarra; Hasan Abed Al Kader Hammoud; Adel Bibi; Philip Torr; Bernard Ghanem; |
104 | In-Context Sharpness As Alerts: An Inner Representation Perspective for Hallucination Mitigation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we aim to understand the underlying mechanisms of LLM hallucinations from the perspective of *inner representations*. |
Shiqi Chen; Miao Xiong; Junteng Liu; Zhengxuan Wu; Teng Xiao; Siyang Gao; Junxian He; |
105 | SILVER: Single-loop Variance Reduction and Application to Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a single-loop variance-reduced gradient estimator named SILVER (SIngle-Loop VariancE-Reduction) for finite-sum non-convex optimization, which does not require multiple full gradients but nevertheless achieves the optimal gradient complexity. |
Kazusato Oko; Shunta Akiyama; Denny Wu; Tomoya Murata; Taiji Suzuki; |
106 | Regression with Multi-Expert Deferral Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel framework of *regression with deferral*, which involves deferring the prediction to multiple experts. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
107 | $H$-Consistency Guarantees for Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a detailed study of $H$-consistency bounds for regression. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
108 | Learning to Model The World With Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While current agents can learn to execute simple language instructions, we aim to build agents that leverage diverse language—language like “this button turns on the TV” or “I put the bowls away”—that conveys general knowledge, describes the state of the world, provides interactive feedback, and more. Our key idea is that *agents should interpret such diverse language as a signal that helps them predict the future*: what they will observe, how the world will behave, and which situations will be rewarded. |
Jessy Lin; Yuqing Du; Olivia Watkins; Danijar Hafner; Pieter Abbeel; Dan Klein; Anca Dragan; |
109 | Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an unsupervised adversarial fine-tuning scheme to obtain a robust CLIP vision encoder, which yields robustness on all vision down-stream tasks (LVLMs, zero-shot classification) that rely on CLIP. |
Christian Schlarmann; Naman Deep Singh; Francesco Croce; Matthias Hein; |
110 | Data-free Distillation of Diffusion Models with Bootstrapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing distillation methods either require significant amounts of offline computation for generating synthetic training data from the teacher model, or need to perform expensive online learning with the help of real data. In this work, we present a novel technique called BOOT, that overcomes these limitations with an efficient data-free distillation algorithm. |
Jiatao Gu; Chen Wang; Shuangfei Zhai; Yizhe Zhang; Lingjie Liu; Joshua M. Susskind; |
111 | LLM Maybe LongLM: SelfExtend LLM Context Window Without Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we argue that LLMs themselves have inherent capabilities to handle long contexts without fine-tuning. |
Hongye Jin; Xiaotian Han; Jingfeng Yang; Zhimeng Jiang; Zirui Liu; Chia-Yuan Chang; Huiyuan Chen; Xia Hu; |
112 | Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to their *myopic perspective*, they escalate the number of query requests, leading to increased costs, memory, and computational overheads. Addressing this, we propose the *Algorithm of Thoughts*—a novel strategy that propels LLMs through algorithmic reasoning pathways. |
Bilgehan Sel; Ahmad Tawaha; Vanshaj Khattar; Ruoxi Jia; Ming Jin; |
113 | Skill Set Optimization: Reinforcing Language Model Behavior Via Transferable Skills Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Skill Set Optimization (SSO) for improving LLM actor performance through constructing and refining sets of transferable skills. |
Kolby Nottingham; Bodhisattwa Prasad Majumder; Bhavana Dalvi Mishra; Sameer Singh; Peter Clark; Roy Fox; |
114 | Universality of Linear Recurrences Followed By Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that combining MLPs with either real or complex linear diagonal recurrences leads to arbitrarily precise approximation of regular causal sequence-to-sequence maps. |
Antonio Orvieto; Soham De; Caglar Gulcehre; Razvan Pascanu; Samuel L Smith; |
115 | Don’t Trust Your Eyes: on The (un)reliability of Feature Visualizations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. |
Robert Geirhos; Roland S. Zimmermann; Blair Bilodeau; Wieland Brendel; Been Kim; |
116 | EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present EE-LLM, a framework for large-scale training and inference of early-exit large language models (LLMs). |
Yanxi Chen; Xuchen Pan; Yaliang Li; Bolin Ding; Jingren Zhou; |
117 | CogBench: A Large Language Model Walks Into A Psychology Lab Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces *CogBench*, a benchmark that includes ten behavioral metrics derived from seven cognitive psychology experiments. |
Julian Coda-Forno; Marcel Binz; Jane X Wang; Eric Schulz; |
118 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present MMT-Bench, a comprehensive benchmark designed to assess LVLMs across massive multimodal tasks requiring expert knowledge and deliberate visual recognition, localization, and reasoning. |
Kaining Ying; Fanqing Meng; Jin Wang; Zhiqian Li; Han Lin; Yue Yang; Hao Zhang; Wenbo Zhang; Yuqi Lin; Shuo Liu; jiayi lei; Quanfeng Lu; Runjian Chen; Peng Xu; Renrui Zhang; Haozhe Zhang; Peng Gao; Yali Wang; Yu Qiao; Ping Luo; Kaipeng Zhang; Wenqi Shao; |
119 | LoCoCo: Dropping In Convolutions for Long Context Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper tackles the memory hurdle of processing long context sequences in Large Language Models (LLMs), by presenting a novel approach, Dropping In Convolutions for **Lo**ng **Co**ntext **Co**mpression (**LoCoCo**). |
Ruisi Cai; Yuandong Tian; Zhangyang Wang; Beidi Chen; |
120 | Image Hijacks: Adversarial Images Can Control Generative Models at Runtime Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we focus on the image input to a vision-language model (VLM). |
Luke Bailey; Euan Ong; Stuart Russell; Scott Emmons; |
121 | Can AI Assistants Know What They Don’t Know? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Therefore, in this paper, we ask the question **Can AI assistants know what they don’t know and express this awareness through natural language?** To investigate this, we construct a model-specific “I don’t know” (Idk) dataset. |
Qinyuan Cheng; Tianxiang Sun; Xiangyang Liu; Wenwei Zhang; Zhangyue Yin; Shimin Li; Linyang Li; Zhengfu He; Kai Chen; Xipeng Qiu; |
122 | Discrete Diffusion Modeling By Estimating The Ratios of The Data Distribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Crucially, standard diffusion models rely on the well-established theory of score matching, but efforts to generalize this to discrete structures have not yielded the same empirical gains. In this work, we bridge this gap by proposing score entropy, a novel loss that naturally extends score matching to discrete spaces, integrates seamlessly to build discrete diffusion models, and significantly boosts performance. |
Aaron Lou; Chenlin Meng; Stefano Ermon; |
123 | Prompting A Pretrained Transformer Can Be A Universal Approximator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Formally, we ask whether prompting and prefix-tuning a pretrained model can universally approximate sequence-to-sequence functions. This paper answers in the affirmative and demonstrates that much smaller pretrained models than previously thought can be universal approximators when prefixed. |
Aleksandar Petrov; Philip Torr; Adel Bibi; |
124 | SHINE: Shielding Backdoors in Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SHINE, a backdoor shielding method specific for DRL. |
Zhuowen Yuan; Wenbo Guo; Jinyuan Jia; Bo Li; Dawn Song; |
125 | Executable Code Actions Elicit Better LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work proposes to use executable Python **code** to consolidate LLM agents’ **act**ions into a unified action space (**CodeAct**). |
Xingyao Wang; Yangyi Chen; Lifan Yuan; Yizhe Zhang; Yunzhu Li; Hao Peng; Heng Ji; |
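A minimal CodeAct-style loop simply executes the Python the agent emits and feeds the printed output back as the next observation. The sketch below uses a toy stand-in for the LLM policy and is an assumption-laden illustration of the unified code action space, not the paper's agent:

```python
import io
import contextlib

def code_act_step(policy, observation, env_globals):
    """One step of a CodeAct-style loop (generic sketch): the policy emits
    Python source as its action, we execute it against a shared namespace,
    and whatever it prints becomes the next observation."""
    code = policy(observation)           # `policy` stands in for an LLM call
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, env_globals)          # code *is* the action space
    return buf.getvalue()

# Toy "policy": always acts by computing a sum and printing the result.
obs = code_act_step(lambda obs: "print(sum(range(10)))", "start", {})
# obs == "45\n"
```

Because actions are arbitrary code sharing one namespace, a single action can compose tools, loop, and branch, which is the consolidation the highlight describes; a real deployment would sandbox `exec` rather than run it directly.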
126 | Position: A Safe Harbor for AI Evaluation and Red Teaming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose that major generative AI developers commit to providing a legal and technical safe harbor, protecting public interest safety research and removing the threat of account suspensions or legal reprisal. |
Shayne Longpre; Sayash Kapoor; Kevin Klyman; Ashwin Ramaswami; Rishi Bommasani; Borhane Blili-Hamelin; Yangsibo Huang; Aviya Skowron; Zheng Xin Yong; Suhas Kotha; Yi Zeng; Weiyan Shi; Xianjun Yang; Reid Southen; Alexander Robey; Patrick Chao; Diyi Yang; Ruoxi Jia; Daniel Kang; Alex Pentland; Arvind Narayanan; Percy Liang; Peter Henderson; |
127 | OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: One more important contribution of this study is an in-depth analysis of the routing mechanisms within our OpenMoE models, leading to three significant findings: Context-Independent Specialization, Early Routing Learning, and Drop-towards-the-End. |
Fuzhao Xue; Zian Zheng; Yao Fu; Jinjie Ni; Zangwei Zheng; Wangchunshu Zhou; Yang You; |
128 | Robust Yet Efficient Conformal Prediction Sets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We derive provably robust sets by bounding the worst-case change in conformity scores. Our tighter bounds lead to more efficient sets. We cover both continuous and discrete (sparse) data and our guarantees work both for evasion and poisoning attacks (on both features and labels). |
Soroush H. Zargarbashi; Mohammad Sadegh Akhondzadeh; Aleksandar Bojchevski; |
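The object these robust guarantees attach to is the standard split conformal prediction set. As a point of reference, here is a minimal sketch of plain (non-robust) split conformal prediction in Python; the threshold rule and set construction are the standard ones, but the toy calibration scores and `alpha` below are illustrative, not from the paper:

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Finite-sample-conservative (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # conservative quantile index
    return np.sort(cal_scores)[k - 1]

def prediction_set(class_scores, tau):
    # keep every class whose nonconformity score falls within the threshold
    return [c for c, s in enumerate(class_scores) if s <= tau]

# calibration nonconformity scores, e.g. 1 - softmax prob of the true class
cal = np.array([0.1, 0.3, 0.2, 0.05, 0.4, 0.15, 0.25, 0.35, 0.12, 0.22])
tau = conformal_threshold(cal, alpha=0.1)
print(prediction_set([0.05, 0.5, 0.3], tau))  # → [0, 2]
```

With n = 10 calibration scores and alpha = 0.1, the conservative quantile index is ceil(11 × 0.9) = 10, so the threshold is the largest calibration score; robust variants like the paper's instead bound how much an attacker can shift these scores.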
129 | An Embodied Generalist Agent in 3D World Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce LEO, an embodied multi-modal generalist agent that excels in perceiving, grounding, reasoning, planning, and acting in the 3D world. We collect large-scale datasets comprising diverse object-level and scene-level tasks, which require considerable understanding of and interaction with the 3D world. |
Jiangyong Huang; Silong Yong; Xiaojian Ma; Xiongkun Linghu; Puhao Li; Yan Wang; Qing Li; Song-Chun Zhu; Baoxiong Jia; Siyuan Huang; |
130 | Iterated Denoising Energy Matching for Sampling from Boltzmann Densities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient—and no data samples—to train a diffusion-based sampler. |
Tara Akhound-Sadegh; Jarrid Rector-Brooks; Joey Bose; Sarthak Mittal; Pablo Lemos; Cheng-Hao Liu; Marcin Sendera; Siamak Ravanbakhsh; Gauthier Gidel; Yoshua Bengio; Nikolay Malkin; Alexander Tong; |
131 | Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct our analysis both in terms of a compute budget and real-world costs and find that LLM researchers expecting reasonably large inference demand (~1B requests) should train models smaller and longer than Chinchilla-optimal. |
Nikhil Sardana; Jacob Portes; Sasha Doubov; Jonathan Frankle; |
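The train-vs-serve trade-off can be illustrated with the usual back-of-the-envelope FLOPs accounting (roughly 6ND for training and 2N per generated token at inference). This cost model and the example numbers are illustrative assumptions, not figures from the paper:

```python
def total_flops(n_params, train_tokens, inference_tokens):
    """Approximate lifetime compute: ~6*N*D to train, ~2*N per served token."""
    return 6 * n_params * train_tokens + 2 * n_params * inference_tokens

# Once inference demand is large, a smaller model trained on more tokens
# can be cheaper overall than a compute-optimal-for-training one.
chinchilla_style = total_flops(70e9, 1.4e12, 2e12)  # 70B, ~Chinchilla tokens
small_longer = total_flops(30e9, 4.0e12, 2e12)      # smaller, trained longer
print(chinchilla_style > small_longer)  # True
```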
132 | Position: Fundamental Limitations of LLM Censorship Necessitate New Approaches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present fundamental limitations of verifying the semantic properties of LLM outputs and identifying compositional threats, illustrating inherent challenges of current approaches to censoring LLM outputs. |
David Glukhov; Ilia Shumailov; Yarin Gal; Nicolas Papernot; Vardan Papyan; |
133 | Q-Align: Teaching LMMs for Visual Scoring Via Discrete Text-Defined Levels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While recent studies have demonstrated the exceptional potential of large multi-modality models (LMMs) across a wide range of related fields, in this work we explore how to teach them to perform visual rating aligned with human opinions. Observing that human raters only learn and judge discrete text-defined levels in subjective studies, we propose to emulate this subjective process and teach LMMs with text-defined rating levels instead of scores. |
Haoning Wu; Zicheng Zhang; Weixia Zhang; Chaofeng Chen; Liang Liao; Chunyi Li; Yixuan Gao; Annan Wang; Erli Zhang; Wenxiu Sun; Qiong Yan; Xiongkuo Min; Guangtao Zhai; Weisi Lin; |
134 | Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel in-context learning framework, FeatLLM, which employs LLMs as feature engineers to produce an input data set that is optimally suited for tabular predictions. |
Sungwon Han; Jinsung Yoon; Sercan O Arik; Tomas Pfister; |
135 | DITTO: Diffusion Inference-Time T-Optimization for Music Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Diffusion Inference-Time T-Optimization (DITTO), a general-purpose framework for controlling pre-trained text-to-music diffusion models at inference-time via optimizing initial noise latents. |
Zachary Novack; Julian McAuley; Taylor Berg-Kirkpatrick; Nicholas J. Bryan; |
136 | COALA: A Practical and Vision-Centric Federated Learning Platform Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present COALA, a vision-centric Federated Learning (FL) platform, and a suite of benchmarks for practical FL scenarios, which we categorize as task, data, and model levels. |
Weiming Zhuang; Jian Xu; Chen Chen; Jingtao Li; Lingjuan Lyu; |
137 | Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper analyzes potential reasons behind these issues and designs an improved reward-learning algorithm termed ‘Iterative Data Smoothing’ (IDS). |
Banghua Zhu; Michael Jordan; Jiantao Jiao; |
138 | Adaptive Sampling of K-Space in Magnetic Resonance for Rapid Pathology Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Adaptive Sampling for MR (ASMR), a sampling method that learns an adaptive policy to sequentially select k-space samples to optimize for target disease detection. |
Chen-Yu Yen; Raghav Singhal; Umang Sharma; Rajesh Ranganath; Sumit Chopra; Lerrel Pinto; |
139 | Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel representation-based approach to measure the domain gap, where the representation is learned through a contrastive objective by sampling transitions from different domains. |
Xiaoyu Wen; Chenjia Bai; Kang Xu; Xudong Yu; Yang Zhang; Xuelong Li; Zhen Wang; |
140 | RoSA: Accurate Parameter-Efficient Fine-Tuning Via Robust Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new PEFT method called Robust Adaptation (RoSA) inspired by robust principal component analysis that jointly trains *low-rank* and *highly sparse* components on top of a set of fixed pretrained weights to efficiently approximate the performance of a full fine-tuning (FFT) solution. |
Mahdi Nikdan; Soroush Tabesh; Elvir Crnčević; Dan Alistarh; |
141 | Extreme Compression of Large Language Models Via Additive Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we revisit the problem of “extreme” LLM compression—defined as targeting extremely low bit counts, such as 2 to 3 bits per parameter—from the point of view of classic methods in Multi-Codebook Quantization (MCQ). |
Vage Egiazarian; Andrei Panferov; Denis Kuznedelev; Elias Frantar; Artem Babenko; Dan Alistarh; |
142 | InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM. |
Lichang Chen; Jiuhai Chen; Tom Goldstein; Heng Huang; Tianyi Zhou; |
143 | Full-Atom Peptide Design Based on Multi-modal Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present *PepFlow*, the first multi-modal deep generative model grounded in the flow-matching framework for the design of full-atom peptides that target specific protein receptors. |
Jiahan Li; Chaoran Cheng; Zuofan Wu; Ruihan Guo; Shitong Luo; Zhizhou Ren; Jian Peng; Jianzhu Ma; |
144 | Evaluation of Test-Time Adaptation Under Computational Time Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel online evaluation protocol for Test Time Adaptation (TTA) methods, which penalizes slower methods by providing them with fewer samples for adaptation. |
Motasem Alfarra; Hani Itani; Alejandro Pardo; shyma yaser alhuwaider; Merey Ramazanova; Juan Camilo Perez; zhipeng cai; Matthias Müller; Bernard Ghanem; |
145 | SAPG: Split and Aggregate Policy Gradients Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we show that current RL methods, e.g. PPO, fail to reap the benefits of parallelized environments beyond a certain point, and their performance saturates. To address this, we propose a new on-policy RL algorithm that can effectively leverage large-scale environments by splitting them into chunks and fusing them back together via importance sampling. |
Jayesh Singla; Ananye Agarwal; Deepak Pathak; |
146 | Privacy Backdoors: Stealing Data with Corrupted Pretrained Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Practitioners commonly download pretrained machine learning models from open repositories and finetune them to fit specific applications. We show that this practice introduces a new risk of privacy backdoors. |
Shanglun Feng; Florian Tramèr; |
147 | ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new policy optimization with function approximation algorithm for constrained MDPs with the average criterion. |
Akhil Agnihotri; Rahul Jain; Haipeng Luo; |
148 | WARM: On The Benefits of Weight Averaged Reward Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify two primary challenges when designing RMs to mitigate reward hacking: distribution shifts during the RL process and inconsistencies in human preferences. As a solution, we propose Weight Averaged Reward Models (WARM), first fine-tuning multiple RMs, then averaging them in the weight space. |
Alexandre Rame; Nino Vieillard; Leonard Hussenot; Robert Dadashi; Geoffrey Cideron; Olivier Bachem; Johan Ferret; |
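The weight-space averaging step at the heart of WARM is simple to sketch. Below is a minimal, framework-free illustration in which plain parameter dicts stand in for reward-model state; the `average_weights` helper and toy values are ours, not the paper's code, and it assumes the reward models share an identical architecture so parameters align:

```python
def average_weights(state_dicts):
    """Elementwise average of parameter dicts (name -> list of floats)."""
    n = len(state_dicts)
    return {
        key: [sum(sd[key][i] for sd in state_dicts) / n
              for i in range(len(state_dicts[0][key]))]
        for key in state_dicts[0]
    }

# three toy "reward models", each fine-tuned separately
rm1 = {"w": [1.0, 2.0]}
rm2 = {"w": [3.0, 4.0]}
rm3 = {"w": [5.0, 6.0]}
warm = average_weights([rm1, rm2, rm3])
print(warm["w"])  # [3.0, 4.0]
```

Unlike prediction-ensembling, this yields a single model, so reward inference costs the same as a single RM.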
149 | Agent Instructs Large Language Models to Be General Zero-Shot Reasoners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks. |
Nicholas Crispino; Kyle Montgomery; Fankun Zeng; Dawn Song; Chenguang Wang; |
150 | In-Context Language Learning: Architectures and Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study ICL through the lens of a new family of model problems we term in-context language learning (ICLL). |
Ekin Akyürek; Bailin Wang; Yoon Kim; Jacob Andreas; |
151 | Transolver: A Fast Transformer Solver for PDEs on General Geometries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Going beyond superficial and unwieldy meshes, we present Transolver based on a more foundational idea, which is learning intrinsic physical states hidden behind discretized geometries. |
Haixu Wu; Huakun Luo; Haowen Wang; Jianmin Wang; Mingsheng Long; |
152 | TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present TROVE, a training-free method of inducing a verifiable and efficient toolbox of functions by using, growing, and periodically trimming the toolbox as it generates solutions. |
Zhiruo Wang; Graham Neubig; Daniel Fried; |
153 | ODIN: Disentangled Reward Mitigates Hacking in RLHF Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the issue of reward hacking on the response length, a challenge emerging in Reinforcement Learning from Human Feedback (RLHF) on LLMs. |
Lichang Chen; Chen Zhu; Jiuhai Chen; Davit Soselia; Tianyi Zhou; Tom Goldstein; Heng Huang; Mohammad Shoeybi; Bryan Catanzaro; |
154 | On The Duality Between Sharpness-Aware Minimization and Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, considering the duality between SAM and AT, we investigate the adversarial robustness derived from SAM. |
Yihao Zhang; Hangzhou He; Jingyu Zhu; Huanran Chen; Yifei Wang; Zeming Wei; |
155 | A Decoder-only Foundation Model for Time-series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset. |
Abhimanyu Das; Weihao Kong; Rajat Sen; Yichen Zhou; |
156 | Physics of Language Models: Part 3.1, Knowledge Storage and Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, do they answer such questions based on exposure to similar questions during training (i.e., cheating), or by genuinely learning to extract knowledge from sources like Wikipedia? In this paper, we investigate this issue using a controlled biography dataset. |
Zeyuan Allen-Zhu; Yuanzhi Li; |
157 | Auditing Private Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the first framework for auditing private prediction where we instantiate adversaries with varying poisoning and query capabilities. |
Karan Chadha; Matthew Jagielski; Nicolas Papernot; Christopher A. Choquette-Choo; Milad Nasr; |
158 | Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To demonstrate the challenge of defending finetuning interfaces, we introduce covert malicious finetuning, a method to compromise model safety via finetuning while evading detection. |
Danny Halawi; Alexander Wei; Eric Wallace; Tony Tong Wang; Nika Haghtalab; Jacob Steinhardt; |
159 | Feedback Loops With Language Models Drive In-Context Reward Hacking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that feedback loops can cause in-context reward hacking (ICRH), where the LLM at test-time optimizes a (potentially implicit) objective but creates negative side effects in the process. |
Alexander Pan; Erik Jones; Meena Jagadeesan; Jacob Steinhardt; |
160 | ContPhy: Continuum Physical Concept Learning and Reasoning from Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Continuum Physical Dataset (ContPhy), a novel benchmark for assessing machine physical commonsense. |
Zhicheng Zheng; Xin Yan; Zhenfang Chen; Jingzhou Wang; Qin Zhi Eddie Lim; Joshua B. Tenenbaum; Chuang Gan; |
161 | RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning Via Generative Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present RoboGen, a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation. |
Yufei Wang; Zhou Xian; Feng Chen; Tsun-Hsuan Wang; Yian Wang; Katerina Fragkiadaki; Zackory Erickson; David Held; Chuang Gan; |
162 | SqueezeLLM: Dense-and-Sparse Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate that the main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, specifically for single batch inference. |
Sehoon Kim; Coleman Richard Charles Hooper; Amir Gholami; Zhen Dong; Xiuyu Li; Sheng Shen; Michael W. Mahoney; Kurt Keutzer; |
163 | Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on a memory bottleneck imposed by the key-value (KV) cache, a computational shortcut that requires storing previous KV pairs during decoding. |
Harry Dong; Xinyu Yang; Zhenyu Zhang; Zhangyang Wang; Yuejie Chi; Beidi Chen; |
164 | Repeat After Me: Transformers Are Better Than State Space Models at Copying Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Transformers are the dominant architecture for sequence modeling, but there is growing interest in models that use a fixed-size latent state that does not depend on the sequence length, which we refer to as “generalized state space models” (GSSMs). In this paper, we show that while GSSMs are promising in terms of inference-time efficiency, they are limited compared to transformer models on tasks that require copying from the input context. |
Samy Jelassi; David Brandfonbrener; Sham M. Kakade; eran malach; |
165 | StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, they have largely overlooked the underlying coherence between the augmented domains, which in turn leads to inferior results in real-world scenarios. In this paper, we propose a simple yet effective scheme, termed *StyDeSty*, to explicitly account for the alignment of the source and pseudo domains in the process of data augmentation, enabling them to interact with each other in a self-consistent manner and further giving rise to a latent domain with strong generalization power. |
Songhua Liu; Xin Jin; Xingyi Yang; Jingwen Ye; Xinchao Wang; |
166 | Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce three customized ensemble strategies, each tailored to one specific scenario. |
Zhihe Lu; Jiawang Bai; Xin Li; Zeyu Xiao; Xinchao Wang; |
167 | GPTSwarm: Language Agents As Optimizable Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs. |
Mingchen Zhuge; Wenyi Wang; Louis Kirsch; Francesco Faccio; Dmitrii Khizbullin; Jürgen Schmidhuber; |
168 | Visual Representation Learning with Stochastic Frame Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is because of the under-determined nature of frame prediction; multiple potential futures can arise from a single current frame. To tackle this challenge, in this paper, we revisit the idea of stochastic video generation that learns to capture uncertainty in frame prediction and explore its effectiveness for representation learning. |
Huiwon Jang; Dongyoung Kim; Junsu Kim; Jinwoo Shin; Pieter Abbeel; Younggyo Seo; |
169 | Large Language Models Are Geographically Biased Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to study what LLMs know about the world we live in through the lens of geography. |
Rohin Manvi; Samar Khanna; Marshall Burke; David B. Lobell; Stefano Ermon; |
170 | DoRA: Weight-Decomposed Low-Rank Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). |
Shih-yang Liu; Chien-Yi Wang; Hongxu Yin; Pavlo Molchanov; Yu-Chiang Frank Wang; Kwang-Ting Cheng; Min-Hung Chen; |
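The decomposition behind DoRA can be sketched in a few lines of NumPy: the merged weight is a learned per-column magnitude times the unit-norm direction of the pretrained weight plus a low-rank update. The shapes, initialization, and `dora_merge` helper below are illustrative assumptions, not the official implementation:

```python
import numpy as np

def dora_merge(W0, B, A, magnitude):
    """Merged weight = magnitude * column-normalized (W0 + B @ A)."""
    V = W0 + B @ A                                   # direction term
    col_norm = np.linalg.norm(V, axis=0, keepdims=True)
    return magnitude * (V / col_norm)

rng = np.random.default_rng(0)
W0 = rng.standard_normal((6, 4))                     # frozen pretrained weight
B = rng.standard_normal((6, 2)) * 0.01               # trainable low-rank factors
A = rng.standard_normal((2, 4)) * 0.01               # (rank 2)
m = np.linalg.norm(W0, axis=0, keepdims=True)        # magnitude, init from W0
W = dora_merge(W0, B, A, m)
# each merged column's norm equals its learned magnitude entry
```

Separating magnitude from direction lets the two be updated with different dynamics, which is the paper's route to closing the gap with full fine-tuning.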
171 | GaLore: Memory-Efficient LLM Training By Gradient Low-Rank Projection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Gradient Low-Rank Projection (GaLore), a training strategy that allows full-parameter learning but is more memory-efficient than common low-rank adaptation methods such as LoRA. |
Jiawei Zhao; Zhenyu Zhang; Beidi Chen; Zhangyang Wang; Anima Anandkumar; Yuandong Tian; |
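GaLore's core mechanic, projecting gradients into a low-rank subspace obtained from an SVD and applying updates through it, can be sketched in NumPy. The helper names, rank, and learning rate here are illustrative; the actual method also keeps optimizer state (e.g. Adam moments) in the small subspace and refreshes the projector periodically:

```python
import numpy as np

def make_projector(grad, rank):
    """Orthonormal basis for the top-`rank` left-singular subspace of a gradient."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank]                               # shape (m, rank)

def galore_step(weight, grad, proj, lr):
    low_rank_grad = proj.T @ grad                    # (rank, n): what gets stored
    return weight - lr * (proj @ low_rank_grad)      # project back, apply update

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
G = rng.standard_normal((8, 4))
P = make_projector(G, rank=2)
W_next = galore_step(W, G, P, lr=0.1)
# the applied update has rank at most 2, but W itself remains full-parameter
```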
172 | CompeteAI: Understanding The Competition Dynamics of Large Language Model-based Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we seek to examine the competition dynamics in LLM-based agents. |
Qinlin Zhao; Jindong Wang; Yixuan Zhang; Yiqiao Jin; Kaijie Zhu; Hao Chen; Xing Xie; |
173 | 3D-VLA: A 3D Vision-Language-Action Generative World Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose 3D-VLA by introducing a new family of embodied foundation models that seamlessly link 3D perception, reasoning, and action through a generative world model. |
Haoyu Zhen; Xiaowen Qiu; Peihao Chen; Jincheng Yang; Xin Yan; Yilun Du; Yining Hong; Chuang Gan; |
174 | Position: Key Claims in LLM Research Have A Long Tail of Footnotes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We contribute a definition of LLMs, critically examine five common claims regarding their properties (including ‘emergent properties’), and conclude with suggestions for future research directions and their framing. |
Anna Rogers; Sasha Luccioni; |
175 | Gradual Divergence for Seamless Adaptation: A Novel Domain Incremental Learning Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a novel DIL method named *DARE*, featuring a three-stage training process: Divergence, Adaptation, and REfinement. |
Kishaan Jeeveswaran; Elahe Arani; Bahram Zonooz; |
176 | Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through an extensive empirical analysis of image and language data, we demonstrate that small batch sizes do *not* confer any implicit bias advantages in online learning. |
Nikhil Vyas; Depen Morwani; Rosie Zhao; Gal Kaplun; Sham M. Kakade; Boaz Barak; |
177 | MusicFlow: Cascaded Flow Matching for Text Guided Music Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MusicFlow, a cascaded text-to-music generation model based on flow matching. |
K R Prajwal; Bowen Shi; Matthew Le; Apoorv Vyas; Andros Tjandra; Mahi Luthra; Baishan Guo; Huiyu Wang; Triantafyllos Afouras; David Kant; Wei-Ning Hsu; |
178 | The Illusion of State in State-Space Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our analysis reveals that the expressive power of SSMs is limited very similarly to transformers: SSMs cannot express computation outside the complexity class $\mathsf{TC}^0$. |
William Merrill; Jackson Petty; Ashish Sabharwal; |
179 | Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Language Agent Tree Search (LATS) — the first general framework that synergizes the capabilities of LMs in reasoning, acting, and planning. |
Andy Zhou; Kai Yan; Michal Shlapentokh-Rothman; Haohan Wang; Yu-Xiong Wang; |
180 | Slow and Steady Wins The Race: Maintaining Plasticity with Hare and Tortoise Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce the Hare & Tortoise, inspired by the brain’s complementary learning system. |
Hojoon Lee; Hyeonseo Cho; Hyunseung Kim; Donghu Kim; Dugki Min; Jaegul Choo; Clare Lyle; |
181 | On Prompt-Driven Safeguarding for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate how LLMs’ behavior (i.e., complying with or refusing user queries) is affected by safety prompts from the perspective of model representation. |
Chujie Zheng; Fan Yin; Hao Zhou; Fandong Meng; Jie Zhou; Kai-Wei Chang; Minlie Huang; Nanyun Peng; |
182 | Controlled Decoding from Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We pose a tokenwise RL objective and propose a modular solver for it, called *controlled decoding (CD)*. |
Sidharth Mudgal; Jong Lee; Harish Ganapathy; YaGuang Li; Tao Wang; Yanping Huang; Zhifeng Chen; Heng-Tze Cheng; Michael Collins; Trevor Strohman; Jilin Chen; Alex Beutel; Ahmad Beirami; |
183 | A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: DP-SGD introduces new hyperparameters and complicates existing ones, forcing researchers to painstakingly tune hyperparameters with hundreds of trials, which in turn makes it impossible to account for the privacy cost of HPO without destroying the utility. We propose an adaptive HPO method that uses cheap trials (in terms of privacy cost and runtime) to estimate optimal hyperparameters and scales them up. |
Ashwinee Panda; Xinyu Tang; Saeed Mahloujifar; Vikash Sehwag; Prateek Mittal; |
184 | Adaptive Hierarchical Certification for Segmentation Using Randomized Smoothing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel, more practical setting, which certifies pixels within a multi-level hierarchy, and adaptively relaxes the certification to a coarser level for unstable components classic methods would abstain from, effectively lowering the abstain rate whilst providing more certified semantically meaningful information. We mathematically formulate the problem setup, introduce an adaptive hierarchical certification algorithm and prove the correctness of its guarantees. |
Alaa Anani; Tobias Lorenz; Bernt Schiele; Mario Fritz; |
185 | Stealthy Imitation: Reward-guided Environment-free Policy Stealing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Stealthy Imitation, the first attack designed to steal policies without access to the environment or knowledge of the input range. |
Zhixiong Zhuang; Maria-Irina Nicolae; Mario Fritz; |
186 | Linear Alignment: A Closed-form Solution for Aligning Human Preferences Without Tuning and Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce \textit{Linear Alignment}, a novel algorithm that aligns language models with human preferences in one single inference step, eliminating the reliance on data annotation and model training. |
Songyang Gao; Qiming Ge; Wei Shen; Shihan Dou; Junjie Ye; Xiao Wang; Rui Zheng; Yicheng Zou; Zhi Chen; Hang Yan; Qi Zhang; Dahua Lin; |
187 | RoboDreamer: Learning Compositional World Models for Robot Imagination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is heavily limiting in decision-making, where we seek a powerful world model to synthesize plans of unseen combinations of objects and actions in order to solve previously unseen tasks in new environments. To resolve this issue, we introduce RoboDreamer, an innovative approach for learning a compositional world model by factorizing the video generation. |
Siyuan Zhou; Yilun Du; Jiaben Chen; YANDONG LI; Dit-Yan Yeung; Chuang Gan; |
188 | Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data. |
Fengdi Che; Chenjun Xiao; Jincheng Mei; Bo Dai; Ramki Gummadi; Oscar A Ramirez; Christopher K Harris; A. Rupam Mahmood; Dale Schuurmans; |
189 | Transformers, Parallel Computation, and Logarithmic Depth Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that a constant number of self-attention layers can efficiently simulate—and be simulated by—a constant number of communication rounds of *Massively Parallel Computation*. |
Clayton Sanford; Daniel Hsu; Matus Telgarsky; |
190 | Selecting Large Language Model to Fine-tune Via Rectified Scaling Law Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given constrained resources, fine-tuning all models and making selections afterward is unrealistic. In this work, we formulate this resource-constrained selection task into predicting fine-tuning performance and illustrate its natural connection with Scaling Law. |
Haowei Lin; Baizhou Huang; Haotian Ye; Qinyu Chen; Zihao Wang; Sujian Li; Jianzhu Ma; Xiaojun Wan; James Zou; Yitao Liang; |
191 | Mean-field Chaos Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new class of score-based generative models (SGMs) designed to handle high-cardinality data distributions by leveraging concepts from mean-field theory. |
Sungwoo Park; Dongjun Kim; Ahmed Alaa; |
192 | Image Fusion Via Vision-Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods predominantly focus on pixel-level and semantic visual features for recognition, but often overlook the deeper text-level semantic information beyond vision. Therefore, we introduce a novel fusion paradigm named image Fusion via vIsion-Language Model (FILM), for the first time, utilizing explicit textual information from source images to guide the fusion process. |
Zixiang Zhao; Lilun Deng; Haowen Bai; Yukun Cui; Zhipeng Zhang; Yulun Zhang; Haotong Qin; Dongdong Chen; Jiangshe Zhang; PENG WANG; Luc Van Gool; |
193 | How Do Nonlinear Transformers Learn and Generalize in In-Context Learning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To the best of our knowledge, this paper provides the first theoretical analysis of the training dynamics of Transformers with nonlinear self-attention and nonlinear MLP, together with the ICL generalization capability of the resulting model. |
Hongkang Li; Meng Wang; Songtao Lu; Xiaodong Cui; Pin-Yu Chen; |
194 | What Improves The Generalization of Graph Transformers? A Theoretical Dive Into The Self-attention and Positional Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised node classification, comprising a self-attention layer with relative positional encoding and a two-layer perceptron. |
Hongkang Li; Meng Wang; Tengfei Ma; Sijia Liu; ZAIXI ZHANG; Pin-Yu Chen; |
195 | Improving Fine-grained Understanding in Image-text Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SPARse fine-grained Contrastive alignment (SPARC), a simple method for pretraining more fine-grained multimodal representations from image-text pairs. |
Ioana Bica; Anastasija Ilic; Matthias Bauer; Goker Erdogan; Matko Bošnjak; Christos Kaplanis; Alexey A. Gritsenko; Matthias Minderer; Charles Blundell; Razvan Pascanu; Jovana Mitrovic; |
196 | KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: From this analysis, we developed a tuning-free 2bit KV cache quantization algorithm, named KIVI. |
Zirui Liu; Jiayi Yuan; Hongye Jin; Shaochen Zhong; Zhaozhuo Xu; Vladimir Braverman; Beidi Chen; Xia Hu; |
197 | Trustless Audits Without Revealing Data or Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that it is possible to simultaneously allow model providers to keep their models and data secret while allowing other parties to trustlessly audit properties of the model and data. |
Suppakit Waiwitlikhit; Ion Stoica; Yi Sun; Tatsunori Hashimoto; Daniel Kang; |
198 | Position: On The Possibilities of AI-Generated Text Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce guidelines on the required text data quantity, either through sample size or sequence length, for reliable AI text detection, through derivations of sample complexity bounds. |
Souradip Chakraborty; Amrit Bedi; Sicheng Zhu; Bang An; Dinesh Manocha; Furong Huang; |
199 | Error Feedback Can Accurately Compress Preconditioners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, existing approaches for accurate full-matrix preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate Curvature (M-FAC), suffer from massive storage costs when applied even to small-scale models, as they must store a sliding window of gradients, whose memory requirements are multiplicative in the model dimension. In this paper, we address this issue via a novel and efficient error-feedback technique that can be applied to compress preconditioners by up to two orders of magnitude in practice, without loss of convergence. |
Ionut-Vlad Modoranu; Aleksei Kalinov; Eldar Kurtic; Elias Frantar; Dan Alistarh; |
200 | SPADE: Sparsity-Guided Debugging for Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we demonstrate, for the first time, that sparsity can instead be incorporated into the interpretation process itself, as a sample-specific preprocessing step. |
Arshia Soltani Moakhar; Eugenia Iofinova; Elias Frantar; Dan Alistarh; |
201 | Large Scale Dataset Distillation with Domain Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce **D**ataset **D**istillation with **D**omain **S**hift (**D3S**), a scalable distillation algorithm, made by reframing the dataset distillation problem as a *domain shift* one. |
Noel Loo; Alaa Maalouf; Ramin Hasani; Mathias Lechner; Alexander Amini; Daniela Rus; |
202 | Rejuvenating Image-GPT As Strong Visual Representation Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper enhances image-GPT (iGPT), one of the pioneering works that introduce autoregressive pretraining to predict the next pixels for visual representation learning. |
Sucheng Ren; Zeyu Wang; Hongru Zhu; Junfei Xiao; Alan Yuille; Cihang Xie; |
203 | Deep Networks Always Grok and Here Is Why Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the new concept of delayed robustness, whereby a DNN groks adversarial examples and becomes robust, long after interpolation and/or generalization. |
Ahmed Imtiaz Humayun; Randall Balestriero; Richard Baraniuk; |
204 | On The Expressive Power of Spectral Invariant Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this work is to gain a deep theoretical understanding of the expressive power obtainable when using spectral features. |
Bohang Zhang; Lingxiao Zhao; Haggai Maron; |
205 | GenCO: Generating Diverse Designs with Combinatorial Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, many design settings arising in industrial design, material science, computer graphics and more require that the generated objects satisfy hard combinatorial constraints or meet objectives in addition to modeling a data distribution. To address this, we propose GenCO, a generative framework that guarantees constraint satisfaction throughout training by leveraging differentiable combinatorial solvers to enforce feasibility. |
Aaron M Ferber; Arman Zharmagambetov; Taoan Huang; Bistra Dilkina; Yuandong Tian; |
206 | Feedback Efficient Online Fine-Tuning of Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel reinforcement learning procedure that efficiently explores on the manifold of feasible samples. |
Masatoshi Uehara; Yulai Zhao; Kevin Black; Ehsan Hajiramezanali; Gabriele Scalia; Nathaniel Lee Diamant; Alex M Tseng; Sergey Levine; Tommaso Biancalani; |
207 | Understanding Reasoning Ability of Language Models From The Perspective of Reasoning Paths Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning. To understand how pre-training with a next-token prediction objective contributes to the emergence of such reasoning capability, we propose that we can view an LM as deriving new conclusions by aggregating indirect reasoning paths seen at pre-training time. |
Xinyi Wang; Alfonso Amayuelas; Kexun Zhang; Liangming Pan; Wenhu Chen; William Yang Wang; |
208 | Rethinking Decision Transformer Via Hierarchical Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we introduce a general sequence modeling framework for studying sequential decision making through the lens of \emph{Hierarchical RL}. |
Yi Ma; Jianye HAO; Hebin Liang; Chenjun Xiao; |
209 | Stop Regressing: Training Value Functions Via Classification for Scalable Deep RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. |
Jesse Farebrother; Jordi Orbay; Quan Vuong; Adrien Ali Taiga; Yevgen Chebotar; Ted Xiao; Alex Irpan; Sergey Levine; Pablo Samuel Castro; Aleksandra Faust; Aviral Kumar; Rishabh Agarwal; |
210 | Optimal Eye Surgeon: Finding Image Priors Through Sparse Generators at Initialization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Optimal Eye Surgeon (OES), a framework for pruning and training deep image generator networks. |
Avrajit Ghosh; Xitong Zhang; Kenneth K. Sun; Qing Qu; Saiprasad Ravishankar; Rongrong Wang; |
211 | Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, in contrast to integrating visual prompts into inputs, we regard visual prompts as additional knowledge that facilitates language models in addressing tasks associated with visual information. |
Shibo Jie; Yehui Tang; Ning Ding; Zhi-Hong Deng; Kai Han; Yunhe Wang; |
212 | Position: The Platonic Representation Hypothesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that representations in AI models, particularly deep networks, are converging. |
Minyoung Huh; Brian Cheung; Tongzhou Wang; Phillip Isola; |
213 | Trainable Transformer in Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new efficient construction, Transformer in Transformer (in short, TINT), that allows a transformer to simulate and fine-tune more complex models during inference (e.g., pre-trained language models). |
Abhishek Panigrahi; Sadhika Malladi; Mengzhou Xia; Sanjeev Arora; |
214 | Towards A Self-contained Data-driven Global Weather Forecasting Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to couple the AI forecasting model, FengWu, with 4DVar to build a self-contained data-driven global weather forecasting framework, FengWu-4DVar. |
Yi Xiao; LEI BAI; Wei Xue; Hao Chen; Kun Chen; kang chen; Tao Han; Wanli Ouyang; |
215 | Switchable Decision: Dynamic Neural Generation Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a switchable decision to accelerate inference by dynamically assigning computation resources for each data instance. |
Shujian Zhang; Korawat Tanwisuth; Chengyue Gong; Pengcheng He; Mingyuan Zhou; |
216 | Amortizing Pragmatic Program Synthesis with Rankings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a general method of amortizing the slow, exact RSA synthesizer. |
Yewen Pu; Saujas Vaduguru; Priyan Vaithilingam; Elena Glassman; Daniel Fried; |
217 | Mastering Robot Manipulation with Multimodal Prompts Through Pretraining and Multi-task Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we tackle the problem of training a robot to understand multimodal prompts, interleaving vision signals with text descriptions. |
Jiachen Li; Qiaozi Gao; Michael Johnston; Xiaofeng Gao; Xuehai He; Hangjie Shi; Suhaila Shakiah; Reza Ghanadan; William Yang Wang; |
218 | Dynamic Evaluation of Large Language Models By Meta Probing Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose meta probing agents (MPA), a general dynamic evaluation protocol inspired by psychometrics to evaluate LLMs. |
Kaijie Zhu; Jindong Wang; Qinlin Zhao; Ruochen Xu; Xing Xie; |
219 | MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Drawing inspiration from the concept of LLM-as-a-Judge within LLMs, this paper introduces a novel benchmark, termed MLLM-as-a-Judge, to assess the ability of MLLMs in assisting judges across diverse modalities, encompassing three distinct tasks: Scoring Evaluation, Pair Comparison, and Batch Ranking. |
Dongping Chen; Ruoxi Chen; Shilin Zhang; Yaochen Wang; Yinuo Liu; Huichi Zhou; Qihui Zhang; Yao Wan; Pan Zhou; Lichao Sun; |
220 | Exploiting Code Symmetries for Learning Program Semantics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a group-theoretic framework that defines code symmetries as semantics-preserving transformations, where forming a code symmetry group enables precise and efficient reasoning of code semantics. |
Kexin Pei; Weichen Li; Qirui Jin; Shuyang Liu; Scott Geng; Lorenzo Cavallaro; Junfeng Yang; Suman Jana; |
221 | Self-Rewarding Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that during Iterative DPO training, not only does instruction following ability improve, but also the ability to provide high-quality rewards to itself. |
Weizhe Yuan; Richard Yuanzhe Pang; Kyunghyun Cho; Xian Li; Sainbayar Sukhbaatar; Jing Xu; Jason E Weston; |
222 | Understanding Adam Optimizer Via Online Learning of Updates: Adam Is FTRL in Disguise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide a different perspective based on online learning that underscores the importance of Adam’s algorithmic components. |
Kwangjun Ahn; Zhiyu Zhang; Yunbum Kook; Yan Dai; |
223 | Learning Causal Relations from Subsampled Time Series with Two Time-Slices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies the causal relations from subsampled time series, in which measurements are sparse and sampled at a coarser timescale than the causal timescale of the underlying system. |
Anpeng Wu; Haoxuan Li; Kun Kuang; Zhang Keli; Fei Wu; |
224 | MaxMin-RLHF: Alignment with Diverse Human Preferences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we first derive an impossibility result of alignment with single reward RLHF, thereby highlighting its insufficiency in representing diverse human preferences. Next, we propose to learn a mixture of reward models via an expectation-maximization algorithm and solve a MaxMin alignment objective inspired by the Egalitarian principle in social choice theory to better honor diverse human preferences. |
Souradip Chakraborty; Jiahao Qiu; Hui Yuan; Alec Koppel; Dinesh Manocha; Furong Huang; Amrit Bedi; Mengdi Wang; |
225 | A Minimaximalist Approach to Reinforcement Learning from Human Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present *Self-Play Preference Optimization* (SPO), an algorithm for reinforcement learning from human feedback. |
Gokul Swamy; Christoph Dann; Rahul Kidambi; Steven Wu; Alekh Agarwal; |
226 | Position: LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a vision of LLM-Modulo Frameworks that combine the strengths of LLMs with external model-based verifiers in a tighter bi-directional interaction regime. |
Subbarao Kambhampati; Karthik Valmeekam; Lin Guan; Mudit Verma; Kaya Stechly; Siddhant Bhambri; Lucas Paul Saldyt; Anil B Murthy; |
227 | Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, SSMs fall short in tasks involving non-standard retrieval functionality. To address these limitations, we introduce a hybrid model, MambaFormer, that combines Mamba with attention blocks, surpassing individual models in tasks where they struggle independently. |
Jongho Park; Jaeseung Park; Zheyang Xiong; Nayoung Lee; Jaewoong Cho; Samet Oymak; Kangwook Lee; Dimitris Papailiopoulos; |
228 | Memory Consolidation Enables Long-Context Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While various attempts have been made to extend this context, this has often come at the cost of both conceptual and computational complexity. We propose to instead re-purpose existing pre-trained video transformers by simply fine-tuning them to attend to memories derived non-parametrically from past activations. |
Ivana Balazevic; Yuge Shi; Pinelopi Papalampidi; Rahma Chaabouni; Skanda Koppula; Olivier J Henaff; |
229 | HumanTOMATO: Text-aligned Whole-body Motion Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous works on text-driven motion generation tasks mainly have two limitations: they ignore the key role of fine-grained hand and face controlling in vivid whole-body motion generation, and lack a good alignment between text and motion. To address such limitations, we propose a Text-aligned whOle-body Motion generATiOn framework, named HumanTOMATO, which is the first attempt to our knowledge towards applicable holistic motion generation in this research area. |
Shunlin Lu; Ling-Hao Chen; Ailing Zeng; Jing Lin; Ruimao Zhang; Lei Zhang; Heung-Yeung Shum; |
230 | Variational Schrödinger Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the variational Schrödinger diffusion model (VSDM), where the forward process is a multivariate diffusion and the variational scores are adaptively optimized for efficient transport. |
Wei Deng; Weijian Luo; Yixin Tan; Marin Biloš; Yu Chen; Yuriy Nevmyvaka; Ricky T. Q. Chen; |
231 | A Dense Reward View on Aligning Text-to-Image Diffusion with Preference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we take on a finer dense reward perspective and derive a tractable alignment objective that emphasizes the initial steps of the T2I reverse chain. |
Shentao Yang; Tianqi Chen; Mingyuan Zhou; |
232 | BBox-Adapter: Lightweight Adapting for Black-Box Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Consequently, adapting these black-box LLMs is only possible through their API services, raising concerns about transparency, privacy, and cost. To address these challenges, we introduce BBox-Adapter, a novel lightweight adapter for black-box LLMs. |
Haotian Sun; Yuchen Zhuang; Wei Wei; Chao Zhang; Bo Dai; |
233 | GNNs Also Deserve Editing, and They Need It More Than Once Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we delve into the specific reasons behind the difficulty of editing GNNs in succession and observe the root cause to be model overfitting. |
Shaochen Zhong; Duy Le; Zirui Liu; Zhimeng Jiang; Andrew Ye; Jiamu Zhang; Jiayi Yuan; Kaixiong Zhou; Zhaozhuo Xu; Jing Ma; Shuai Xu; Vipin Chaudhary; Xia Hu; |
234 | Position: AI/ML Influencers Have A Place in The Academic Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As the number of accepted papers at AI and ML conferences reaches into the thousands, it has become unclear how researchers access and read research publications. In this paper, we investigate the role of social media influencers in enhancing the visibility of machine learning research, particularly the citation counts of papers they share. |
Iain Weissburg; Mehir Arora; Xinyi Wang; Liangming Pan; William Yang Wang; |
235 | Split-Ensemble: Efficient OOD-aware Ensemble Via Task and Model Splitting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we improve on uncertainty estimation without extra OOD data or additional inference costs using an alternative *Split-Ensemble* method. |
Anthony Chen; Huanrui Yang; Yulu Gan; Denis A Gudovskiy; Zhen Dong; Haofan Wang; Tomoyuki Okuno; Yohei Nakata; Kurt Keutzer; Shanghang Zhang; |
236 | Human Alignment of Large Language Models Through Online Preference Optimisation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, our contribution is two-fold. First, we show the equivalence between two recent alignment methods, namely Identity Policy Optimisation (IPO) and Nash Mirror Descent (Nash-MD). Second, we introduce a generalisation of IPO, named IPO-MD, that leverages the regularised sampling approach proposed by Nash-MD. |
Daniele Calandriello; Zhaohan Daniel Guo; Remi Munos; Mark Rowland; Yunhao Tang; Bernardo Avila Pires; Pierre Harvey Richemond; Charline Le Lan; Michal Valko; Tianqi Liu; Rishabh Joshi; Zeyu Zheng; Bilal Piot; |
237 | Recovering The Pre-Fine-Tuning Weights of Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This practice is considered safe, as no current method can recover the unsafe, *pre-fine-tuning* model weights. In this paper, we demonstrate that this assumption is often false. |
Eliahu Horwitz; Jonathan Kahana; Yedid Hoshen; |
238 | Beyond Regular Grids: Fourier-Based Neural Operators on Arbitrary Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Leveraging the observation that a limited set of Fourier (Spectral) modes suffice to provide the required expressivity of a neural operator, we propose a simple method, based on the efficient direct evaluation of the underlying spectral transformation, to extend neural operators to arbitrary domains. |
Levi E. Lingsch; Mike Yan Michelis; Emmanuel de Bezenac; Sirani M. Perera; Robert K. Katzschmann; Siddhartha Mishra; |
239 | LangCell: Language-Cell Pre-training for Cell Identity Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, they have to be fine-tuned for downstream tasks and struggle when lacking labeled data with the desired semantic labels. To address this issue, we propose an innovative solution by constructing a unified representation of single-cell data and natural language during the pre-training phase, allowing the model to directly incorporate insights related to cell identity. |
Suyuan Zhao; Jiahuan Zhang; Yushuai Wu; YIZHEN LUO; Zaiqing Nie; |
240 | Decouple Then Classify: A Dynamic Multi-view Labeling Strategy with Shared and Specific Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In literature, most existing methods randomly label samples with a given ratio, but achieve unpromising and unstable results due to the randomness, especially in multi-view settings. To address this issue, we propose a Dynamic Multi-view Labeling Strategy with Shared and Specific Information. |
Xinhang Wan; Jiyuan Liu; Xinwang Liu; Yi Wen; Hao Yu; Siwei Wang; Shengju Yu; Tianjiao Wan; Jun Wang; En Zhu; |
241 | TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To emphasize temporal correlation modeling, this paper proposes TimeSiam as a simple but effective self-supervised pre-training framework for time series based on Siamese networks. |
Jiaxiang Dong; Haixu Wu; Yuxuan Wang; Yun-Zhong Qiu; Li Zhang; Jianmin Wang; Mingsheng Long; |
242 | Gated Linear Attention Transformers with Hardware-Efficient Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work describes a hardware-efficient algorithm for linear attention that trades off memory movement against parallelizability. |
Songlin Yang; Bailin Wang; Yikang Shen; Rameswar Panda; Yoon Kim; |
243 | A Distributional Analogue to The Successor Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. |
Harley Wiltzer; Jesse Farebrother; Arthur Gretton; Yunhao Tang; Andre Barreto; Will Dabney; Marc G Bellemare; Mark Rowland; |
244 | Distributional Bellman Operators Over Mean Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions. |
Li Kevin Wenliang; Gregoire Deletang; Matthew Aitchison; Marcus Hutter; Anian Ruoss; Arthur Gretton; Mark Rowland; |
245 | Position: Graph Foundation Models Are Already Here Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Drawing inspiration from existing foundation models in the CV and NLP domains, we propose a novel perspective for the GFM development by advocating for a “graph vocabulary”, in which the basic transferable units underlying graphs encode the invariance on graphs. |
Haitao Mao; Zhikai Chen; Wenzhuo Tang; Jianan Zhao; Yao Ma; Tong Zhao; Neil Shah; Mikhail Galkin; Jiliang Tang; |
246 | MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, MorphVAE was introduced as the sole learning-based method, but its generated morphologies lack plausibility, i.e., they do not appear realistic enough and most of the generated samples are topologically invalid. To fill this gap, this paper proposes **MorphGrower**, which mimics the natural growth mechanism of neurons for generation. |
Nianzu Yang; Kaipeng Zeng; Haotian Lu; Yexin Wu; Zexin Yuan; Danni Chen; Shengdian Jiang; Jiaxiang Wu; Yimin Wang; Junchi Yan; |
247 | Refined Coreset Selection: Towards Minimal Coreset Size Under Model Performance Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Practitioners regularly desire to identify the smallest possible coreset in realistic scenarios while maintaining comparable model performance, to minimize costs and maximize acceleration. Motivated by this desideratum, for the first time, we pose the problem of refined coreset selection, in which the minimal coreset size under model performance constraints is explored. |
Xiaobo Xia; Jiale Liu; Shaokun Zhang; Qingyun Wu; Hongxin Wei; Tongliang Liu; |
248 | UP2ME: Univariate Pre-training to Multivariate Fine-tuning As A General-purpose Framework for Multivariate Time Series Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a general-purpose framework, named UP2ME (**U**nivariate **P**re-training to **M**ultivariate Fin**e**-tuning). |
Yunhao Zhang; Minghao Liu; Shengyang Zhou; Junchi Yan; |
249 | MEMORYLLM: Towards Self-Updatable Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce MEMORYLLM, a model that comprises a transformer and a fixed-size memory pool within the latent space of the transformer. |
Yu Wang; Yifan Gao; Xiusi Chen; Haoming Jiang; Shiyang Li; Jingfeng Yang; Qingyu Yin; Zheng Li; Xian Li; Bing Yin; Jingbo Shang; Julian McAuley; |
250 | Floating Anchor Diffusion Model for Multi-motif Scaffolding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Floating Anchor Diffusion (FADiff) model. |
Ke Liu; Weian Mao; Shuaike Shen; Xiaoran Jiao; Zheng Sun; Hao Chen; Chunhua Shen; |
251 | Do Language Models Exhibit The Same Cognitive Biases in Problem Solving As Human Learners? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the biases of LLMs in relation to those known in children when solving arithmetic word problems. We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features. |
Andreas Opedal; Alessandro Stolfo; Haruki Shirakami; Ying Jiao; Ryan Cotterell; Bernhard Schölkopf; Abulhair Saparov; Mrinmaya Sachan; |
252 | FedREDefense: Defending Against Model Poisoning Attacks for Federated Learning Using Model Update Reconstruction Error Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing defenses, typically relying on cross-client/global information to mitigate these attacks, fall short when faced with non-IID data distributions and/or a large number of malicious clients. To address these challenges, we present FedREDefense. |
Yueqi XIE; Minghong Fang; Neil Zhenqiang Gong; |
253 | Towards Efficient Exact Optimization of Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose efficient exact optimization (EXO) of the alignment objective. |
Haozhe Ji; Cheng Lu; Yilin Niu; Pei Ke; Hongning Wang; Jun Zhu; Jie Tang; Minlie Huang; |
254 | RICE: Breaking Through The Training Bottlenecks of Reinforcement Learning with Explanation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose RICE, an innovative refining scheme for reinforcement learning that incorporates explanation methods to break through the training bottlenecks. |
Zelei Cheng; Xian Wu; Jiahao Yu; Sabrina Yang; Gang Wang; Xinyu Xing; |
255 | Soft Prompt Recovers Compressed LLMs, Transferably Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, gaining such efficiency benefits often simultaneously demands extensive engineering efforts and intricate designs to mitigate the performance decline. In this work, we leverage *(Soft) Prompt Tuning* in its most vanilla form and discover that such conventionally learned soft prompts can recover the performance of compressed LLMs. |
Zhaozhuo Xu; Zirui Liu; Beidi Chen; Shaochen Zhong; Yuxin Tang; Jue WANG; Kaixiong Zhou; Xia Hu; Anshumali Shrivastava; |
256 | GLoRe: When, Where, and How to Improve LLM Reasoning Via Global and Local Refinements Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Stepwise ORMs (**SORMs**) which are trained, only on synthetic data, to approximate the expected future reward of the optimal policy, or V*, as a form of process-based reward modeling. We generate training data for both models synthetically by reusing data used to train the SORM. |
Alexander Havrilla; Sharath Chandra Raparthy; Christoforos Nalmpantis; Jane Dwivedi-Yu; Maksym Zhuravinskyi; Eric Hambro; Roberta Raileanu; |
257 | Generative Active Learning for Long-tailed Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore how to perform active learning specifically for generated data in the long-tailed instance segmentation task. |
Muzhi Zhu; Chengxiang Fan; Hao Chen; Yang Liu; Weian Mao; Xiaogang Xu; Chunhua Shen; |
258 | DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current solutions are inadequate: state-of-the-art black-box models do not supply explanations, post-hoc explainers for black-box models lack faithfulness guarantees, and self-interpretable models greatly compromise accuracy. To address these issues, we propose DISCRET, a self-interpretable ITE framework that synthesizes faithful, rule-based explanations for each sample. |
Yinjun Wu; Mayank Keoliya; Kan Chen; Neelay Velingker; Ziyang Li; Emily J Getzen; Qi Long; Mayur Naik; Ravi B Parikh; Eric Wong; |
259 | Towards Compositionality in Concept Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that existing unsupervised concept extraction methods find concepts which are not compositional. To automatically discover compositional concept representations, we identify two salient properties of such representations, and propose Compositional Concept Extraction (CCE) for finding concepts which obey these properties. |
Adam Stein; Aaditya Naik; Yinjun Wu; Mayur Naik; Eric Wong; |
260 | Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a variational Bayesian explanation framework, dubbed ProbAbilistic Concept Explainers (PACE), which models the distributions of patch embeddings to provide trustworthy post-hoc conceptual explanations. |
Hengyi Wang; Shiwei Tan; Hao Wang; |
261 | Understanding and Diagnosing Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, it is crucial to develop techniques that aim to understand the sensitivities in the learnt representations of neural network policies. To achieve this we introduce a theoretically founded method that provides a systematic analysis of the unstable directions in the deep neural policy decision boundary across both time and space. |
Ezgi Korkmaz; |
262 | Prompt-tuning Latent Diffusion Models for Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors. |
Hyungjin Chung; Jong Chul Ye; Peyman Milanfar; Mauricio Delbracio; |
263 | Language Models Are Super Mario: Absorbing Abilities from Homologous Models As A Free Lunch Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we unveil that Language Models (LMs) can acquire new capabilities by assimilating parameters from homologous models without retraining or GPUs. |
Le Yu; Bowen Yu; Haiyang Yu; Fei Huang; Yongbin Li; |
264 | Algorithm and Hardness for Dynamic Attention Maintenance in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by previous theoretical studies of the static version of the attention multiplication problem [Zandieh, Han, Daliri, and Karbasi ICML 2023, Alman and Song NeurIPS 2023], we formally define a dynamic version of the attention matrix multiplication problem. |
Jan van den Brand; Zhao Song; Tianyi Zhou; |
265 | Critical Windows: Non-asymptotic Theory for Feature Emergence in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While this is advantageous for interpretability as it implies one can localize properties of the generation to a small segment of the trajectory, it seems at odds with the continuous nature of the diffusion. We propose a formal framework for studying these windows and show that for data coming from a mixture of strongly log-concave densities, these windows can be provably bounded in terms of certain measures of inter- and intra-group separation. |
Marvin Li; Sitan Chen; |
266 | HyperFields: Towards Zero-Shot Generation of NeRFs from Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce HyperFields, a method for generating text-conditioned Neural Radiance Fields (NeRFs) with a single forward pass and (optionally) some fine-tuning. |
Sudarshan Babu; Richard Liu; Avery Zhou; Michael Maire; Greg Shakhnarovich; Rana Hanocka; |
267 | TVE: Learning Meta-attribution for Transferable Vision Explainer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This limitation results in explaining various tasks being time- and resource-consuming. To address this problem, we introduce a **Transferable Vision Explainer** (TVE) that can effectively explain various vision models in downstream tasks. |
Guanchu Wang; Yu-Neng Chuang; Fan Yang; Mengnan Du; Chia-Yuan Chang; Shaochen Zhong; Zirui Liu; Zhaozhuo Xu; Kaixiong Zhou; Xuanting Cai; Xia Hu; |
268 | Equivariant Deep Weight Space Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prior research has mainly focused on solving relaxed versions of the alignment problem, leading to either time-consuming methods or sub-optimal solutions. To accelerate the alignment process and improve its quality, we propose a novel framework aimed at learning to solve the weight alignment problem, which we name Deep-Align. |
Aviv Navon; Aviv Shamsian; Ethan Fetaya; Gal Chechik; Nadav Dym; Haggai Maron; |
269 | From Self-Attention to Markov Models: Unveiling The Dynamics of Generative Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study learning a 1-layer self-attention model from a set of prompts and the associated outputs sampled from the model. |
Muhammed Emrullah Ildiz; Yixiao Huang; Yingcong Li; Ankit Singh Rawat; Samet Oymak; |
270 | How to Escape Sharp Minima with Random Perturbations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main component of the algorithm is to use gradients computed from randomly perturbed iterates to estimate a direction that leads to flatter minima. For the setting where the cost function is an empirical risk over training data, we present a faster algorithm that is inspired by a recently proposed practical algorithm called sharpness-aware minimization, supporting its success in practice. |
Kwangjun Ahn; Ali Jadbabaie; Suvrit Sra; |
271 | Auto-Regressive Next-Token Predictors Are Universal Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a theoretical framework for studying auto-regressive next-token predictors. |
Eran Malach; |
272 | MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The core thesis of this paper is that text instructions can enable retrieving images with richer relations beyond visual similarity. To show this, we introduce MagicLens, a series of self-supervised image retrieval models that support open-ended instructions. |
Kai Zhang; Yi Luan; Hexiang Hu; Kenton Lee; Siyuan Qiao; Wenhu Chen; Yu Su; Ming-Wei Chang; |
273 | DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a new auto-regressive denoising pre-training strategy, which allows for more stable and efficient pre-training on PDE data and generalizes to various downstream tasks. |
Zhongkai Hao; Chang Su; Songming Liu; Julius Berner; Chengyang Ying; Hang Su; Anima Anandkumar; Jian Song; Jun Zhu; |
274 | Plug-in Performative Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we study a general protocol for making use of possibly misspecified models in performative prediction, called plug-in performative optimization. |
Licong Lin; Tijana Zrnic; |
275 | GPT-4V(ision) Is A Generalist Web Agent, If Grounded Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore the potential of LMMs like GPT-4V as a generalist web agent that can follow natural language instructions to complete tasks on any given website. |
Boyuan Zheng; Boyu Gou; Jihyung Kil; Huan Sun; Yu Su; |
276 | PIDformer: Transformer Meets Control Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address two main shortcomings of transformer architectures: input corruption and rank collapse in their output representation. |
Tam Minh Nguyen; Cesar A Uribe; Tan Minh Nguyen; Richard Baraniuk; |
277 | CasCast: Skillful High-resolution Precipitation Nowcasting Via Cascaded Modelling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose CasCast, a cascaded framework composed of a deterministic and a probabilistic part to decouple the predictions for mesoscale precipitation distributions and small-scale patterns. |
Junchao Gong; Lei Bai; Peng Ye; Wanghan Xu; Na Liu; Jianhua Dai; Xiaokang Yang; Wanli Ouyang; |
278 | Receptive Fields As Experts in Convolutional Neural Architectures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Mixture of Receptive Fields (MoRF) instead of using a single receptive field. |
Dongze Lian; Weihao Yu; Xinchao Wang; |
279 | What Is Dataset Distillation Learning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we posit and answer three questions about the behavior, representativeness, and point-wise information content of distilled data. |
William Yang; Ye Zhu; Zhiwei Deng; Olga Russakovsky; |
280 | ReGAL: Refactoring Programs to Discover Generalizable Abstractions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Generating redundant code from scratch is both inefficient and error-prone. To address this, we propose Refactoring for Generalizable Abstraction Learning (ReGAL), a gradient-free method for learning a library of reusable functions via code refactorization, i.e., restructuring code without changing its execution output. |
Elias Stengel-Eskin; Archiki Prasad; Mohit Bansal; |
281 | Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This oversight and the requirement for annotated samples for downstream tasks limit eSSL’s versatility. In this work, we address these issues with the **M**ultimodal **E**CG **R**epresentation **L**earning (**MERL**) framework. |
Che Liu; Zhongwei Wan; Cheng Ouyang; Anand Shah; Wenjia Bai; Rossella Arcucci; |
282 | Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate a class of XOR-type classification tasks with label-flipping noises. |
Xuran Meng; Difan Zou; Yuan Cao; |
283 | Score Identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. |
Mingyuan Zhou; Huangjie Zheng; Zhendong Wang; Mingzhang Yin; Hai Huang; |
284 | Understanding The Effects of Iterative Prompting on Truthfulness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work provides a nuanced understanding of iterative prompting and introduces novel approaches to enhance the truthfulness of LLMs, thereby contributing to the development of more accurate and trustworthy AI systems. |
Satyapriya Krishna; Chirag Agarwal; Himabindu Lakkaraju; |
285 | Asymptotics of Feature Learning in Two-layer Networks After One Gradient-step Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. |
Hugo Cui; Luca Pesce; Yatin Dandi; Florent Krzakala; Yue Lu; Lenka Zdeborova; Bruno Loureiro; |
286 | Symmetry Induces Structure and Constraint of Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we unveil the importance of the loss function symmetries in affecting, if not deciding, the learning behavior of machine learning models. |
Liu Ziyin; |
287 | Position: What Can Large Language Models Tell Us About Time Series Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that current LLMs have the potential to revolutionize time series analysis, thereby promoting efficient decision-making and advancing towards a more universal form of time series analytical intelligence. |
Ming Jin; YiFan Zhang; Wei Chen; Kexin Zhang; Yuxuan Liang; Bin Yang; Jindong Wang; Shirui Pan; Qingsong Wen; |
288 | Foundation Policies with Hilbert Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While a number of methods have been proposed to enable generic self-supervised RL, based on principles such as goal-conditioned RL, behavioral cloning, and unsupervised skill learning, such methods remain limited in terms of either the diversity of the discovered behaviors, the need for high-quality demonstration data, or the lack of a clear adaptation mechanism for downstream tasks. In this work, we propose a novel unsupervised framework to pre-train generalist policies that capture diverse, optimal, long-horizon behaviors from unlabeled offline data such that they can be quickly adapted to any arbitrary new tasks in a zero-shot manner. |
Seohong Park; Tobias Kreiman; Sergey Levine; |
289 | Vague Prototype-Oriented Diffusion Model for Multi-Class Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In such a challenging setting, widely used reconstruction-based networks persistently grapple with the identical shortcut problem, wherein the infiltration of abnormal information from the condition biases the output towards an anomalous distribution. In response to this critical challenge, we introduce a Vague Prototype-Oriented Diffusion Model (VPDM) that extracts only fundamental information from the condition to prevent the occurrence of the identical shortcut problem from the input layer. |
Yuxin Li; Yaoxuan Feng; Bo Chen; Wenchao Chen; Yubiao Wang; Xinyue Hu; Baolin Sun; Chunhui Qu; Mingyuan Zhou; |
290 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion. |
Xing Han Lu; Zdeněk Kasner; Siva Reddy; |
291 | Stereo Risk: A Continuous Modeling Approach to Stereo Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Stereo Risk, a new deep-learning approach to solve the classical stereo-matching problem in computer vision. |
Ce Liu; Suryansh Kumar; Shuhang Gu; Radu Timofte; Yao Yao; Luc Van Gool; |
292 | Robust Classification Via A Single Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To better harness the expressive power of diffusion models, this paper proposes Robust Diffusion Classifier (RDC), a generative classifier that is constructed from a pre-trained diffusion model to be adversarially robust. |
Huanran Chen; Yinpeng Dong; Zhengyi Wang; Xiao Yang; Chengqi Duan; Hang Su; Jun Zhu; |
293 | SelfVC: Voice Conversion With Iterative Refinement Using Self Transformations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, instead of explicitly disentangling attributes with loss terms, we present a framework to train a controllable voice conversion model on entangled speech representations derived from self-supervised learning (SSL) and speaker verification models. |
Paarth Neekhara; Shehzeen Samarah Hussain; Rafael Valle; Boris Ginsburg; Rishabh Ranjan; Shlomo Dubnov; Farinaz Koushanfar; Julian McAuley; |
294 | DiffDA: A Diffusion Model for Weather-scale Data Assimilation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose DiffDA as a denoising diffusion model capable of assimilating atmospheric variables using predicted states and sparse observations. |
Langwen Huang; Lukas Gianinazzi; Yuejiang Yu; Peter Dominik Dueben; Torsten Hoefler; |
295 | Fair Off-Policy Learning from Observational Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework for fair off-policy learning: we learn decision rules from observational data under different notions of fairness, where we explicitly assume that observational data were collected under a different — potentially discriminatory — behavioral policy. |
Dennis Frauen; Valentyn Melnychuk; Stefan Feuerriegel; |
296 | A Graph Is Worth $K$ Words: Euclideanizing Graph Using Pure Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce GraphsGPT, featuring a Graph2Seq encoder that transforms non-Euclidean graphs into learnable Graph Words in the Euclidean space, along with a GraphGPT decoder that reconstructs the original graph from Graph Words to ensure information equivalence. |
Zhangyang Gao; Daize Dong; Cheng Tan; Jun Xia; Bozhen Hu; Stan Z. Li; |
297 | Transferring Knowledge From Large Foundation Models to Small Downstream Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This procedure also precludes combining multiple pre-trained models that learn complementary information. To address these shortcomings, we introduce Adaptive Feature Transfer (AFT). |
Shikai Qiu; Boran Han; Danielle C. Maddix; Shuai Zhang; Bernie Wang; Andrew Gordon Wilson; |
298 | Compute Better Spent: Replacing Dense Layers with Structured Matrices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we systematically explore structured matrices as replacements for dense matrices. |
Shikai Qiu; Andres Potapczynski; Marc Anton Finzi; Micah Goldblum; Andrew Gordon Wilson; |
299 | On Convergence of Incremental Gradient for Non-convex Smooth Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, if $n$ is the training set size, we improve the optimization term of the convergence guarantee to reach accuracy $\epsilon$ by a factor of $n$, from $O \left( \frac{n}{\epsilon} \right)$ to $O \left( \frac{1}{\epsilon} \right)$. |
Anastasia Koloskova; Nikita Doikov; Sebastian U Stich; Martin Jaggi; |
300 | Demystifying SGD with Doubly Stochastic Gradients Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we establish the convergence of doubly SGD with independent minibatching and random reshuffling under general conditions, which encompasses dependent component gradient estimators. |
Kyurae Kim; Joohwan Ko; Yian Ma; Jacob R. Gardner; |
301 | SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current post-training pruning methods, while reducing the sizes of LLMs, often fail to maintain their original performance. To address these challenges, this paper introduces SPP, a **S**parsity-**P**reserved **P**arameter-efficient fine-tuning method. |
Xudong Lu; Aojun Zhou; Yuhui Xu; Renrui Zhang; Peng Gao; Hongsheng Li; |
302 | Graph Structure Extrapolation for Out-of-Distribution Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to achieve graph OOD generalization with the novel design of non-Euclidean-space linear extrapolation. |
Xiner Li; Shurui Gui; Youzhi Luo; Shuiwang Ji; |
303 | Decomposing Uncertainty for Large Language Models Through Input Clarification Ensembling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling, which can be applied to any pre-trained LLM. |
Bairu Hou; Yujian Liu; Kaizhi Qian; Jacob Andreas; Shiyu Chang; Yang Zhang; |
304 | PrE-Text: Training Language Models on Private Federated Data in The Age of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite this, on-device training has several drawbacks: (1) most user devices are too small to train large models on-device, (2) on-device training is communication- and computation-intensive, and (3) on-device training can be difficult to debug and deploy. To address these problems, we propose Private Evolution-Text (PrE-Text), a method for generating differentially private (DP) synthetic textual data. |
Charlie Hou; Akshat Shrivastava; Hongyuan Zhan; Rylan Conway; Trang Le; Adithya Sagar; Giulia Fanti; Daniel Lazar; |
305 | Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first propose to universally represent an arbitrary 3D complex as a geometric graph of sets, shedding light on encoding all types of molecules with one model. We then propose a Generalist Equivariant Transformer (GET) to effectively capture both domain-specific hierarchies and domain-agnostic interaction physics. |
Xiangzhe Kong; Wenbing Huang; Yang Liu; |
306 | Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce R-Bench, a novel benchmark for evaluating Vision Relationship Hallucination. |
Mingrui Wu; Jiayi Ji; Oucheng Huang; Jiale Li; Yuhang Wu; Xiaoshuai Sun; Rongrong Ji; |
307 | Data-Efficient Molecular Generation with Hierarchical Textual Inversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Developing an effective molecular generation framework even with a limited number of molecules is often important for its practical deployment, e.g., drug discovery, since acquiring task-related molecular data requires expensive and time-consuming experimental costs. To tackle this issue, we introduce Hierarchical Textual Inversion for Molecular Generation (HI-Mol), a novel data-efficient molecular generation method. |
Seojin Kim; Jaehyun Nam; Sihyun Yu; Younghoon Shin; Jinwoo Shin; |
308 | Understanding Stochastic Natural Gradient Variational Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its wide usage, little is known about the non-asymptotic convergence rate in the *stochastic* setting. We aim to lessen this gap and provide a better understanding. |
Kaiwen Wu; Jacob R. Gardner; |
309 | How to Leverage Diverse Demonstrations in Offline Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a simple yet effective data selection method that identifies positive behaviors based on their *resultant states* – a more informative criterion enabling explicit utilization of dynamics information and effective extraction of both expert and beneficial diverse behaviors. |
Sheng Yue; Jiani Liu; Xingyuan Hua; Ju Ren; Sen Lin; Junshan Zhang; Yaoxue Zhang; |
310 | OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study offline-to-online Imitation Learning (IL) that pretrains an imitation policy from static demonstration data, followed by fast finetuning with minimal environmental interaction. |
Sheng Yue; Xingyuan Hua; Ju Ren; Sen Lin; Junshan Zhang; Yaoxue Zhang; |
311 | Block Acceleration Without Momentum: On Optimal Stepsizes of Block Gradient Descent for Least-Squares Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such a choice for BGD has not yet been shown to theoretically justify its empirical superiority over GD, as existing convergence rates for BGD have worse constants than those for GD in the deterministic case. To discover such a theoretical justification, we set up a simple environment where we consider BGD applied to least-squares with two blocks of variables. |
Liangzu Peng; Wotao Yin; |
312 | Autaptic Synaptic Circuit Enhances Spatio-temporal Predictive Learning of Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: They lack the ability to effectively model long-term temporal dependencies and facilitate spatial information interaction, which is crucial for tackling complex, dynamic spatio-temporal prediction tasks. To tackle these challenges, this paper draws inspiration from the concept of autaptic synapses in biology and proposes a novel Spatio-Temporal Circuit (STC) model. |
Lihao Wang; Zhaofei Yu; |
313 | Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With no existing approaches to control diversity to a set value, current solutions focus on blindly promoting it via intrinsic rewards or additional loss functions, effectively changing the learning objective and lacking a principled measure for it. To address this, we introduce Diversity Control (DiCo), a method able to control diversity to an exact value of a given metric by representing policies as the sum of a parameter-shared component and dynamically scaled per-agent components. |
Matteo Bettini; Ryan Kortvelesy; Amanda Prorok; |
314 | Robustness of Nonlinear Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of unsupervised representation learning in slightly misspecified settings, and thus formalize the study of robustness of nonlinear representation learning. |
Simon Buchholz; Bernhard Schölkopf; |
315 | Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first framework for training diffusion models that provably sample from the uncorrupted distribution given only noisy training data, solving an open problem in Ambient diffusion. |
Giannis Daras; Alex Dimakis; Constantinos Costis Daskalakis; |
316 | Deep Demonstration Tracing: Learning Generalizable Imitator Policy for Runtime Imitation from A Single Demonstration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a new algorithm called Deep Demonstration Tracing (DDT). |
Xiong-Hui Chen; Junyin Ye; Hang Zhao; Yi-Chen Li; Xu-Hui Liu; Haoran Shi; Yu-Yan Xu; Zhihao Ye; Si-Hang Yang; Yang Yu; Kai Xu; Zongzhang Zhang; Anqi Huang; |
317 | OODRobustBench: A Benchmark and Large-Scale Analysis of Adversarial Robustness Under Distribution Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This omission is concerning as such distribution shifts are unavoidable when methods are deployed in the wild. To address this issue we propose a benchmark named OODRobustBench to comprehensively assess OOD adversarial robustness using 23 dataset-wise shifts (i.e. naturalistic shifts in input distribution) and 6 threat-wise shifts (i.e., unforeseen adversarial threat models). |
Lin Li; Yifei Wang; Chawin Sitawarin; Michael W. Spratling; |
318 | Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to enrich preference queries to ask (1) which features of a given example are preferable, in addition to (2) comparisons between objects. |
Andi Peng; Yuying Sun; Tianmin Shu; David Abel; |
319 | NExT-Chat: An LMM for Chat, Detection and Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to enhance visual comprehension, recent studies have equipped LMMs with region-level understanding capabilities by representing object bounding box coordinates as a series of text sequences (pix2seq). In this paper, we introduce a novel paradigm for object location modeling called the pix2emb method, where we ask the LMM to output the location embeddings and then decode them with different decoders. |
Ao Zhang; Yuan Yao; Wei Ji; Zhiyuan Liu; Tat-Seng Chua; |
320 | Reinformer: Max-Return Sequence Modeling for Offline RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the concept of max-return sequence modeling which integrates the goal of maximizing returns into existing sequence models. |
Zifeng Zhuang; Dengyun Peng; Jinxin Liu; Ziqi Zhang; Donglin Wang; |
321 | Superpoint Gaussian Splatting for Real-Time High-Fidelity Dynamic Scene Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current methods mainly utilize NeRF-based methods to represent the static scene and an additional time-variant MLP to model scene deformations, resulting in relatively low rendering quality as well as slow inference speed. To tackle these challenges, we propose a novel framework named Superpoint Gaussian Splatting (SP-GS). |
Diwen Wan; Ruijie Lu; Gang Zeng; |
322 | A Language Model’s Guide Through Latent Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While the focus of previous work has largely been on *truthfulness*, in this paper we extend this framework to a richer set of concepts such as *appropriateness*, *humor*, *creativity* and *quality*, and explore to what degree current detection and guidance strategies work in these challenging settings. |
Dimitri von Rütte; Sotiris Anagnostidis; Gregor Bachmann; Thomas Hofmann; |
323 | Learning Temporal Distances: Contrastive Successor Features Can Provide A Metric Structure for Decision-Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we build on prior work in contrastive learning and quasimetrics to show how successor features learned by contrastive learning (after a change of variables) form a temporal distance that does satisfy the triangle inequality, even in stochastic settings. |
Vivek Myers; Chongyi Zheng; Anca Dragan; Sergey Levine; Benjamin Eysenbach; |
324 | DiJiang: Efficient Large Language Models Through Compact Kernelization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present DiJiang, a novel Frequency Domain Kernelization approach that enables the transformation of a pre-trained vanilla Transformer into a linear-complexity model with little training cost. |
Hanting Chen; Zhicheng Liu; Xutao Wang; Yuchuan Tian; Yunhe Wang; |
325 | Chain-of-Thought Predictive Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel hierarchical imitation learning method that utilizes sub-optimal demos. |
Zhiwei Jia; Vineet Thumuluri; Fangchen Liu; Linghao Chen; Zhiao Huang; Hao Su; |
326 | Information Complexity of Stochastic Convex Optimization: Applications to Generalization, Compression, and Tracing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the interplay between memorization and learning in the context of *stochastic convex optimization* (SCO). |
Idan Attias; Gintare Karolina Dziugaite; Mahdi Haghifam; Roi Livni; Daniel M. Roy; |
327 | Testing The Feasibility of Linear Programs with Bandit Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While the recent literature has seen a surge in the study of constrained bandit problems, all existing methods for these begin by assuming the feasibility of the underlying problem. We initiate the study of testing such feasibility assumptions, and in particular address the problem in the linear bandit setting, thus characterising the costs of feasibility testing for an unknown linear program using bandit feedback. |
Aditya Gangrade; Aditya Gopalan; Venkatesh Saligrama; Clayton Scott; |
328 | Online Linear Regression in Dynamic Environments Via Discounting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop algorithms for online linear regression which achieve optimal static and dynamic regret guarantees *even in the complete absence of prior knowledge*. We present a novel analysis showing that a discounted variant of the Vovk-Azoury-Warmuth forecaster achieves dynamic regret of the form $R_{T}(\vec{u})\le O\Big(d\log(T)\vee \sqrt{dP_{T}^{\gamma}(\vec{u})T}\Big)$, where $P_{T}^{\gamma}(\vec{u})$ is a measure of variability of the comparator sequence, and show that the discount factor achieving this result can be learned on-the-fly. |
Andrew Jacobsen; Ashok Cutkosky; |
329 | Interpreting and Improving Diffusion Models from An Optimization Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, learning to denoise is approximately learning to project. In this paper, we use this observation to interpret denoising diffusion models as approximate gradient descent applied to the Euclidean distance function. |
Frank Permenter; Chenyang Yuan; |
330 | Knowledge Graphs Can Be Learned with Just Intersection Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we recognize the critical importance of the intersection among the $k$-hop neighborhoods of the head, relation, and tail when determining the validity of a triple. |
Duy Le; Shaochen Zhong; Zirui Liu; Shuai Xu; Vipin Chaudhary; Kaixiong Zhou; Zhaozhuo Xu; |
331 | HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task. |
Shengchao Hu; Ziqing Fan; Li Shen; Ya Zhang; Yanfeng Wang; Dacheng Tao; |
332 | Q-value Regularized Transformer for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Fortunately, Dynamic Programming (DP) methods offer a solution by leveraging a value function to approximate optimal future returns for each state, although these techniques are prone to unstable learning behaviors, particularly in long-horizon and sparse-reward scenarios. Building upon these insights, we propose the Q-value regularized Transformer (QT), which combines the trajectory modeling ability of the Transformer with the predictability of optimal future returns from DP methods. |
Shengchao Hu; Ziqing Fan; Chaoqin Huang; Li Shen; Ya Zhang; Yanfeng Wang; Dacheng Tao; |
333 | Learning to Predict Mutational Effects of Protein-Protein Interactions By Microenvironment-aware Hierarchical Prompt Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first construct a hierarchical prompt codebook to record common microenvironmental patterns at different structural scales independently. Then, we develop a novel codebook pre-training task, namely masked microenvironment modeling, to model the joint distribution of each mutation with their residue types, angular statistics, and local conformational changes in the microenvironment. |
Lirong Wu; Yijun Tian; Haitao Lin; Yufei Huang; Siyuan Li; Nitesh V Chawla; Stan Z. Li; |
334 | How to Trace Latent Generative Model Generated Images Without Artificial Watermark? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we ask whether it is possible to effectively and efficiently trace the images generated by a specific latent generative model without the aforementioned requirements. |
Zhenting Wang; Vikash Sehwag; Chen Chen; Lingjuan Lyu; Dimitris N. Metaxas; Shiqing Ma; |
335 | On The Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we discover an intriguing linear phenomenon in models that are initialized from a common pretrained checkpoint and finetuned on different tasks, termed as Cross-Task Linearity (CTL). |
Zhanpeng Zhou; Zijun Chen; Yilan Chen; Bo Zhang; Junchi Yan; |
336 | MILP-FBGen: LP/MILP Instance Generation with Feasibility/Boundedness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a diffusion-based LP/MILP instance generative framework called MILP-FBGen. |
Yahong Zhang; Chenchen Fan; Donghui Chen; Congrui Li; Wenli Ouyang; Mingda Zhu; Junchi Yan; |
337 | Adapting Static Fairness to Sequential Decision-Making: Bias Mitigation Strategies Towards Equal Long-term Benefit Rate Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address biases in sequential decision-making, we introduce a long-term fairness concept named Equal Long-term Benefit Rate (ELBERT). |
Yuancheng Xu; Chenghao Deng; Yanchao Sun; Ruijie Zheng; Xiyao Wang; Jieyu Zhao; Furong Huang; |
338 | Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing research on video understanding still struggles to achieve in-depth comprehension and reasoning in complex videos, primarily due to the under-exploration of two key bottlenecks: fine-grained spatial-temporal perceptive understanding and cognitive-level video scene comprehension. This paper bridges the gap by presenting a novel solution. |
Hao Fei; Shengqiong Wu; Wei Ji; Hanwang Zhang; Meishan Zhang; Mong-Li Lee; Wynne Hsu; |
339 | BiLLM: Pushing The Limit of Post-Training Quantization for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing quantization techniques fall short of maintaining LLM performance under ultra-low bit-widths. In response to this challenge, we present BiLLM, a groundbreaking 1-bit post-training quantization scheme tailored for pretrained LLMs. |
Wei Huang; Yangdong Liu; Haotong Qin; Ying Li; Shiming Zhang; Xianglong Liu; Michele Magno; Xiaojuan Qi; |
340 | Self-Infilling Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding. |
Lin Zheng; Jianbo Yuan; Zhi Zhang; Hongxia Yang; Lingpeng Kong; |
341 | The Linear Representation Hypothesis and The Geometry of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address two closely related questions: What does linear representation actually mean? |
Kiho Park; Yo Joong Choe; Victor Veitch; |
342 | Hyperbolic Active Learning for Semantic Segmentation Under Domain Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a hyperbolic neural network approach to pixel-level active learning for semantic segmentation. |
Luca Franco; Paolo Mandica; Konstantinos Kallidromitis; Devin Guillory; Yu-Teng Li; Trevor Darrell; Fabio Galasso; |
343 | Fool Your (Vision And) Language Model with Embarrassingly Simple Permutations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we highlight a specific vulnerability in popular models, namely permutation sensitivity in multiple-choice question answering (MCQA). |
Yongshuo Zong; Tingyang Yu; Ruchika Chavhan; Bingchen Zhao; Timothy Hospedales; |
344 | Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current vision large language models (VLLMs) exhibit remarkable capabilities yet are prone to generate harmful content and are vulnerable to even the simplest jailbreaking attacks. Our initial analysis finds that this is due to the presence of harmful data during vision-language instruction fine-tuning, and that VLLM fine-tuning can cause forgetting of safety alignment previously learned by the underpinning LLM. |
Yongshuo Zong; Ondrej Bohdal; Tingyang Yu; Yongxin Yang; Timothy Hospedales; |
345 | Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these challenges, we first introduce LoCoV1, a 12-task benchmark constructed to measure long-context retrieval where chunking is not possible or not effective. We next present the M2-BERT retrieval encoder, an 80M parameter state-space encoder model built from the Monarch Mixer architecture, capable of scaling to documents up to 32K tokens long. |
Jon Saad-Falcon; Daniel Y Fu; Simran Arora; Neel Guha; Christopher Re; |
346 | Discovering Bias in Latent Space: An Unsupervised Debiasing Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This vulnerability often stems from the model’s preference or bias towards specific input characteristics, such as option position or superficial image features in multi-modal settings. We propose to rectify this bias directly in the model’s internal representation. |
Dyah Adila; Shuai Zhang; Boran Han; Bernie Wang; |
347 | S3GCL: Spectral, Swift, Spatial Graph Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, prevailing GCL methods confront two primary challenges: 1) They predominantly operate under homophily assumptions, focusing on low-frequency signals in node features while neglecting heterophilic edges that connect nodes with dissimilar features. 2) Their reliance on neighborhood aggregation for inference leads to scalability challenges and hinders deployment in real-time applications. In this paper, we introduce S3GCL, an innovative framework designed to tackle these challenges. |
Guancheng Wan; Yijun Tian; Wenke Huang; Nitesh V Chawla; Mang Ye; |
348 | Subgoal-based Demonstration Learning for Formal Theorem Proving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to improve the performance of LLMs in formal theorem proving by thoroughly examining the structure and organization of demonstrative in-context examples. |
Xueliang Zhao; Wenda Li; Lingpeng Kong; |
349 | Unsupervised Representation Learning of Brain Activity Via Bridging Voxel Activity and Functional Connectivity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing studies have focused on either (1) voxel-level activity, where only a single weight relating the voxel activity to the task (i.e., aggregation of voxel activity over a time window) is considered, missing their temporal dynamics, or (2) functional connectivity of the brain at the level of regions of interest, missing voxel-level activities. We bridge this gap and design BrainMixer, an unsupervised learning framework that effectively utilizes both functional connectivity and associated time series of voxels to learn voxel-level representation in an unsupervised manner. |
Ali Behrouz; Parsa Delavari; Farnoosh Hashemi; |
350 | Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). |
Zhenyu He; Guhao Feng; Shengjie Luo; Kai Yang; Liwei Wang; Jingjing Xu; Zhi Zhang; Hongxia Yang; Di He; |
351 | Non-Vacuous Generalization Bounds for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. |
Sanae Lotfi; Marc Anton Finzi; Yilun Kuang; Tim G. J. Rudner; Micah Goldblum; Andrew Gordon Wilson; |
352 | Behavior Generation with Latent Actions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Vector-Quantized Behavior Transformer (VQ-BeT), a versatile model for behavior generation that handles multimodal action prediction, conditional generation, and partial observations. |
Seungjae Lee; Yibin Wang; Haritheja Etukuru; H. Jin Kim; Nur Muhammad Mahi Shafiullah; Lerrel Pinto; |
353 | Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Hierarchical State-Space models (HiSS), a conceptually simple, new technique for continuous sequential prediction. |
Raunaq Bhirangi; Chenyu Wang; Venkatesh Pattabiraman; Carmel Majidi; Abhinav Gupta; Tess Hellebrekers; Lerrel Pinto; |
354 | Robust Stable Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we propose a training framework with modified SNN neurons that reduces the mean square of membrane potential perturbation, aiming to enhance the robustness of SNNs. |
Jianhao Ding; Zhiyu Pan; Yujia Liu; Zhaofei Yu; Tiejun Huang; |
355 | Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Lightning Attention, the first linear attention implementation that maintains a constant training speed for various sequence lengths under fixed memory consumption. |
Zhen Qin; Weigao Sun; Dong Li; Xuyang Shen; Weixuan Sun; Yiran Zhong; |
356 | Flextron: Many-in-One Flexible Large Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Flextron, a network architecture and post-training model optimization framework supporting flexible model deployment. |
Ruisi Cai; Saurav Muralidharan; Greg Heinrich; Hongxu Yin; Zhangyang Wang; Jan Kautz; Pavlo Molchanov; |
357 | Position: What Makes An Image Realistic? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we introduce the notion of a *universal critic*, which unlike adversarial critics does not require adversarial training. |
Lucas Theis; |
358 | X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods have limitations such as oversaturation and low-quality output. To address these challenges, we propose X-Oscar, a progressive framework for generating high-quality animatable avatars from text prompts. |
Yiwei Ma; Zhekai Lin; Jiayi Ji; Yijun Fan; Xiaoshuai Sun; Rongrong Ji; |
359 | OSN: Infinite Representations of Dynamic 3D Scenes from Monocular Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to learn all plausible 3D scene configurations that match the input video, instead of just inferring a specific one. |
Ziyang Song; Jinxi Li; Bo Yang; |
360 | Enabling Few-Shot Learning with PID Control: A Layer Adaptive Optimizer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by classical proportional-integral-derivative (PID) control theory, this study introduces a Layer-Adaptive PID (LA-PID) Optimizer, a MAML-based optimizer that employs efficient parameter optimization methods to dynamically adjust task-specific PID control gains at each layer of the network, conducting a first-principles analysis of optimal convergence conditions. |
Le Yu; Xinde Li; Pengfei Zhang; Zhentong Zhang; Fir Dunkin; |
361 | QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose Quest, a query-aware KV cache selection algorithm. |
Jiaming Tang; Yilong Zhao; Kan Zhu; Guangxuan Xiao; Baris Kasikci; Song Han; |
362 | FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present FightLadder, a real-time fighting game platform, to empower competitive MARL research. Along with the platform, we provide implementations of state-of-the-art MARL algorithms for competitive games, as well as a set of evaluation metrics to characterize the performance and exploitability of agents. |
Wenzhe Li; Zihan Ding; Seth Karten; Chi Jin; |
363 | Provable Representation with Efficient Planning for Partially Observable Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Partially Observable Markov Decision Processes (POMDPs), on the other hand, provide a general framework that allows for partial observability to be accounted for in *learning, exploration and planning*, but presents significant computational and statistical challenges. To address these difficulties, we develop a representation-based perspective that leads to a coherent framework and tractable algorithmic approach for practical reinforcement learning from partial observations. |
Hongming Zhang; Tongzheng Ren; Chenjun Xiao; Dale Schuurmans; Bo Dai; |
364 | Hypergraph-enhanced Dual Semi-supervised Graph Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study semi-supervised graph classification, which aims at accurately predicting the categories of graphs in scenarios with limited labeled graphs and abundant unlabeled graphs. |
Wei Ju; Zhengyang Mao; Siyu Yi; Yifang Qin; Yiyang Gu; Zhiping Xiao; Yifan Wang; Xiao Luo; Ming Zhang; |
365 | CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To pursue more efficient vision-language Transformers, this paper introduces Cross-Guided Ensemble of Tokens (CrossGET), a general acceleration framework for vision-language Transformers. |
Dachuan Shi; Chaofan Tao; Anyi Rao; Zhendong Yang; Chun Yuan; Jiaqi Wang; |
366 | FreeBind: Free Lunch in Unified Multimodal Space Via Knowledge Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose FreeBind, an idea that treats multimodal representation spaces as basic units, and freely augments a pre-trained unified space by integrating knowledge from extra expert spaces via “space bonds”. |
Zehan Wang; Ziang Zhang; Xize Cheng; Rongjie Huang; Luping Liu; Zhenhui Ye; Haifeng Huang; Yang Zhao; Tao Jin; Peng Gao; Zhou Zhao; |
367 | Decoding Compressed Trust: Scrutinizing The Trustworthiness of Efficient LLMs Under Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that quantization is currently a more effective approach than pruning in achieving efficiency and trustworthiness simultaneously. |
Junyuan Hong; Jinhao Duan; Chenhui Zhang; Zhangheng Li; Chulin Xie; Kelsey Lieberman; James Diffenderfer; Brian R. Bartoldson; Ajay Kumar Jaiswal; Kaidi Xu; Bhavya Kailkhura; Dan Hendrycks; Dawn Song; Zhangyang Wang; Bo Li; |
368 | Spider: A Unified Framework for Context-dependent Concept Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a unified model with a single set of parameters, Spider, which only needs to be trained once. |
Xiaoqi Zhao; Youwei Pang; Wei Ji; Baicheng Sheng; Jiaming Zuo; Lihe Zhang; Huchuan Lu; |
369 | How Free Is Parameter-Free Stochastic Optimization? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of parameter-free stochastic optimization, inquiring whether, and under what conditions, do fully parameter-free methods exist: these are methods that achieve convergence rates competitive with optimally tuned methods, without requiring significant knowledge of the true problem parameters. |
Amit Attia; Tomer Koren; |
370 | Tuning-Free Stochastic Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider in particular algorithms that can match optimally-tuned Stochastic Gradient Descent (SGD). |
Ahmed Khaled; Chi Jin; |
371 | On The Origins of Linear Representations in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: An array of recent works have argued that high-level semantic concepts are encoded linearly in the representation space of large language models. In this work, we study the origins of such linear representations. |
Yibo Jiang; Goutham Rajendran; Pradeep Kumar Ravikumar; Bryon Aragam; Victor Veitch; |
372 | Improving Transformers with Dynamically Composable Multi-Head Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Dynamically Composable Multi-Head Attention (DCMHA), a parameter and computation efficient attention architecture that tackles the shortcomings of MHA and increases the expressive power of the model by dynamically composing attention heads. |
Da Xiao; Qingye Meng; Shengping Li; Xingyuan Yuan; |
373 | See More Details: Efficient Image Super-Resolution By Experts Mining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we introduce SeemoRe, an efficient SR model employing expert mining. |
Eduard Zamfir; Zongwei Wu; Nancy Mehta; Yulun Zhang; Radu Timofte; |
374 | Weisfeiler Leman for Euclidean Equivariant Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on our results, we develop our WeLNet architecture, which sets new state-of-the-art results on the N-Body dynamics task and the GEOM-QM9 molecular conformation generation task. |
Snir Hordan; Tal Amir; Nadav Dym; |
375 | A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Currently, popular deep learning-based approaches rely primarily on generative models that yield exact sample likelihoods. This work introduces a method that lifts this restriction and opens the possibility to employ highly expressive latent variable models like diffusion models. |
Sebastian Sanokowski; Sepp Hochreiter; Sebastian Lehner; |
376 | Scaling Laws for Fine-Grained Mixture of Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we analyze their scaling properties, highlighting certain arbitrary assumptions present in the existing literature. |
Jan Ludziejewski; Jakub Krajewski; Kamil Adamczewski; Maciej Pióro; Michał Krutul; Szymon Antoniak; Kamil Ciebiera; Krystian Król; Tomasz Odrzygóźdź; Piotr Sankowski; Marek Cygan; Sebastian Jaszczur; |
377 | Provably Scalable Black-Box Variational Inference with Structured Variational Families Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore a theoretical middle ground *between* mean-field variational families and full-rank families: *structured* variational families. |
Joohwan Ko; Kyurae Kim; Woo Chang Kim; Jacob R. Gardner; |
378 | Towards Theoretical Understandings of Self-Consuming Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper tackles the emerging challenge of training generative models within a self-consuming loop, wherein successive generations of models are recursively trained on mixtures of real and synthetic data from previous generations. We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models, including parametric and non-parametric models. |
Shi Fu; Sen Zhang; Yingjie Wang; Xinmei Tian; Dacheng Tao; |
379 | Diagnosing The Compositional Knowledge of Vision Language Models from A Game-Theoretic View Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose evaluation methods from a novel game-theoretic view to assess the vulnerability of VLMs on different aspects of compositional understanding, e.g., relations and attributes. |
Jin Wang; Shichao Dong; Yapeng Zhu; Kelu Yao; Weidong Zhao; Chao Li; Ping Luo; |
380 | Representation Surgery for Multi-Task Model Merging Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a representation surgery solution called “Surgery” to reduce representation bias in the merged model. |
Enneng Yang; Li Shen; Zhenyi Wang; Guibing Guo; Xiaojun Chen; Xingwei Wang; Dacheng Tao; |
381 | Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conversely, First-Order Model-Based Reinforcement Learning (FO-MBRL) methods employing differentiable simulation provide gradients with reduced variance but are susceptible to sampling error in scenarios involving stiff dynamics, such as physical contact. This paper investigates the source of this error and introduces Adaptive Horizon Actor-Critic (AHAC), an FO-MBRL algorithm that reduces gradient error by adapting the model-based horizon to avoid stiff dynamics. |
Ignat Georgiev; Krishnan Srinivasan; Jie Xu; Eric Heiden; Animesh Garg; |
382 | Position: Do Pretrained Transformers Learn In-Context By Gradient Descent? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conduct comprehensive empirical analyses on language models pre-trained on natural data (LLaMa-7B). |
Lingfeng Shen; Aayush Mishra; Daniel Khashabi; |
383 | TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Diffusion models have achieved notable success in image generation, but they remain highly vulnerable to backdoor attacks, which compromise their integrity by producing specific undesirable outputs when presented with a pre-defined trigger. In this paper, we investigate how to protect diffusion models from this dangerous threat. |
Yichuan Mo; Hui Huang; Mingjie Li; Ang Li; Yisen Wang; |
384 | Unified Training of Universal Time Series Forecasting Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, constructing such a model poses unique challenges specific to time series data: (i) cross-frequency learning, (ii) accommodating an arbitrary number of variates for multivariate time series, and (iii) addressing the varying distributional properties inherent in large-scale data. To address these challenges, we present novel enhancements to the conventional time series Transformer architecture, resulting in our proposed **M**asked Enc**o**der-based Un**i**ve**r**s**a**l T**i**me Series Forecasting Transformer (**Moirai**). |
Gerald Woo; Chenghao Liu; Akshat Kumar; Caiming Xiong; Silvio Savarese; Doyen Sahoo; |
385 | How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While numerous empirical studies have aimed at understanding this phenomenon, a rigorous theoretical framework to quantify it is still missing. In this paper, we consider spurious features that are uncorrelated with the learning task, and we provide a precise characterization of how they are memorized via two separate terms: _(i)_ the _stability_ of the model with respect to individual training samples, and _(ii)_ the _feature alignment_ between the spurious pattern and the full sample. |
Simone Bombari; Marco Mondelli; |
386 | Towards Understanding The Word Sensitivity of Attention Layers: A Study Via Random Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that attention layers enjoy high WS, namely, there exists a vector in the space of embeddings that largely perturbs the random attention features map. |
Simone Bombari; Marco Mondelli; |
387 | An Information-Theoretic Analysis of In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce new information-theoretic tools that lead to a concise yet general decomposition of error for a Bayes optimal predictor into two components: meta-learning error and intra-task error. |
Hong Jun Jeon; Jason D. Lee; Qi Lei; Benjamin Van Roy; |
388 | Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an opposing operation simplification strategy to increase the diversity of the population. |
Peijie Dong; Lujun Li; Zhenheng Tang; Xiang Liu; Xinglin Pan; Qiang Wang; Xiaowen Chu; |
389 | Transforming and Combining Rewards for Aligning Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A common approach for aligning language models to human preferences is to first learn a reward model from preference data, and then use this reward model to update the language model. We study two closely related problems that arise in this approach. |
Zihao Wang; Chirag Nagpal; Jonathan Berant; Jacob Eisenstein; Alexander Nicholas D’Amour; Sanmi Koyejo; Victor Veitch; |
390 | The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to optimize a Beta-weighting loss with an entropy minimization regularizer during AT to improve CP-efficiency, where the Beta-weighting loss is shown to be an upper bound of PSS at the population level by our theoretical analysis. |
Ziquan Liu; Yufei Cui; Yan Yan; Yi Xu; Xiangyang Ji; Xue Liu; Antoni B. Chan; |
391 | Second-Order Uncertainty Quantification: A Distance-Based Approach Highlight: In the past couple of years, various approaches to representing and quantifying different types of predictive uncertainty in machine learning, notably in the setting of classification, have been proposed on the basis of second-order probability distributions, i.e., predictions in the form of distributions on probability distributions. In light of criticisms of such measures, we propose a set of formal criteria that meaningful uncertainty measures for predictive uncertainty based on second-order distributions should obey. |
Yusuf Sale; Viktor Bengs; Michele Caprio; Eyke Hüllermeier; |
392 | Position: Towards Implicit Prompt For Text-To-Image Models Highlight: We present a benchmark named ImplicitBench and conduct an investigation on the performance and impacts of implicit prompts with popular T2I models. |
Yue Yang; Yuqi Lin; Hong Liu; Wenqi Shao; Runjian Chen; Hailong Shang; Yu Wang; Yu Qiao; Kaipeng Zhang; Ping Luo; |
393 | Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design Highlight: In this paper, we extend the Vendi scores—a family of interpretable similarity-based diversity metrics—to account for quality. |
Quan Nguyen; Adji Bousso Dieng; |
394 | Scalable and Flexible Causal Discovery with An Efficient Test for Adjacency Highlight: Here we build a scalable and flexible method to evaluate if two variables are adjacent in a causal graph, the Differentiable Adjacency Test (DAT). |
Alan Nawzad Amin; Andrew Gordon Wilson; |
395 | Learning to Play Atari in A World of Tokens Highlight: In this work, we introduce discrete abstract representations for transformer-based learning (DART), a sample-efficient method utilizing discrete representations for modeling both the world and learning behavior. |
Pranav Agarwal; Sheldon Andrews; Samira Ebrahimi Kahou; |
396 | Online Learning and Information Exponents: The Importance of Batch Size & Time/Complexity Tradeoffs Highlight: We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates. |
Luca Arnaboldi; Yatin Dandi; Florent Krzakala; Bruno Loureiro; Luca Pesce; Ludovic Stephan; |
397 | Comparing Graph Transformers Via Positional Encodings Highlight: A priori, it is unclear which method is better for maximizing the power of the resulting graph transformer. In this paper, we aim to understand the relationship between these different types of positional encodings. |
Mitchell Black; Zhengchao Wan; Gal Mishne; Amir Nayyeri; Yusu Wang; |
398 | Bayesian Knowledge Distillation: A Bayesian Perspective of Distillation with Uncertainty Quantification Highlight: In particular, we develop an innovative method named Bayesian Knowledge Distillation (BKD) to provide a transparent interpretation of the working mechanism of KD, and a suite of Bayesian inference tools for the uncertainty quantification of the student model. |
Luyang Fang; Yongkai Chen; Wenxuan Zhong; Ping Ma; |
399 | HarmonyDream: Task Harmonization Inside World Models Highlight: In this paper, through a dedicated empirical investigation, we gain a deeper understanding of the role each task plays in world models and uncover the overlooked potential of sample-efficient MBRL by mitigating the domination of either observation or reward modeling. |
Haoyu Ma; Jialong Wu; Ningya Feng; Chenjun Xiao; Dong Li; Jianye Hao; Jianmin Wang; Mingsheng Long; |
400 | Multimodal Prototyping for Cancer Survival Prediction Highlight: However, this process generates many tokens, which leads to high memory requirements for computing attention and complicates post-hoc interpretability analyses. Instead, we hypothesize that we can: (1) effectively summarize the morphological content of a WSI by condensing its constituting tokens using morphological prototypes, achieving more than $300\times$ compression; and (2) accurately characterize cellular functions by encoding the transcriptomic profile with biological pathway prototypes, all in an unsupervised fashion. |
Andrew H. Song; Richard J. Chen; Guillaume Jaume; Anurag Jayant Vaidya; Alexander Baras; Faisal Mahmood; |
401 | Nash Learning from Human Feedback Highlight: In this study, we introduce an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. |
Remi Munos; Michal Valko; Daniele Calandriello; Mohammad Gheshlaghi Azar; Mark Rowland; Zhaohan Daniel Guo; Yunhao Tang; Matthieu Geist; Thomas Mesnard; Côme Fiegel; Andrea Michi; Marco Selvi; Sertan Girgin; Nikola Momchev; Olivier Bachem; Daniel J Mankowitz; Doina Precup; Bilal Piot; |
402 | Graph2Tac: Online Representation Learning of Formal Math Concepts Highlight: Furthermore, lemmas with close proximity regularly exhibit similar proof structures. We show that this _locality_ property can be exploited through online learning techniques to obtain solving agents that far surpass offline learners when asked to prove theorems in an unseen mathematical setting. |
Lasse Blaauwbroek; Mirek Olšák; Jason Rute; Fidel Ivan Schaposnik Massolo; Jelle Piepenbrock; Vasily Pestun; |
403 | CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay Highlight: In this paper, we approach the ARC as a programming-by-examples problem, and introduce a novel and scalable method for language model self-improvement called Code Iteration (CodeIt). |
Natasha Butt; Blazej Manczak; Auke Wiggers; Corrado Rainone; David W. Zhang; Michaël Defferrard; Taco Cohen; |
404 | Generalized Preference Optimization: A Unified Approach to Offline Alignment Highlight: We propose generalized preference optimization (GPO), a family of offline losses parameterized by a general class of convex functions. |
Yunhao Tang; Zhaohan Daniel Guo; Zeyu Zheng; Daniele Calandriello; Remi Munos; Mark Rowland; Pierre Harvey Richemond; Michal Valko; Bernardo Avila Pires; Bilal Piot; |
405 | ConvNet Vs Transformer, Supervised Vs CLIP: Beyond ImageNet Accuracy Highlight: In this work, we conduct an in-depth comparative analysis of model behaviors beyond ImageNet accuracy, for both ConvNet and Vision Transformer architectures, each across supervised and CLIP training paradigms. |
Kirill Vishniakov; Zhiqiang Shen; Zhuang Liu; |
406 | Learning to Continually Learn with The Bayesian Principle Highlight: In this work, we adopt the meta-learning paradigm to combine the strong representational power of neural networks and simple statistical models’ robustness to forgetting. |
Soochan Lee; Hyeonseong Jeon; Jaehyeon Son; Gunhee Kim; |
407 | Near-Linear Time Approximation Algorithms for K-means with Outliers Highlight: To address the issue of aspect ratio dependency on the running time, we propose sampling-based algorithms with almost linear running time in the data size, where a crucial component of our approach is an algorithm called Fast-Sampling. |
Junyu Huang; Qilong Feng; Ziyun Huang; Jinhui Xu; Jianxin Wang; |
408 | ViP: A Differentially Private Foundation Model for Computer Vision Highlight: In this work, we propose as a mitigation measure a recipe to train foundation vision models via self-supervised learning with differential privacy (DP) guarantee. |
Yaodong Yu; Maziar Sanjabi; Yi Ma; Kamalika Chaudhuri; Chuan Guo; |
409 | Total Variation Floodgate for Variable Importance Inference in Classification Highlight: Focusing on classification problems, we define the expected total variation (ETV), which is an intuitive and deterministic measure of variable importance that does not rely on any model assumption. We then introduce algorithms for statistical inference on the ETV under design-based/model-X assumptions. |
Wenshuo Wang; Lucas Janson; Lihua Lei; Aaditya Ramdas; |
410 | AlphaZero-Like Tree-Search Can Guide Large Language Model Decoding and Training Highlight: As a result, these methods cannot benefit from in-domain training and rely only on the pretraining process — they will not work in domains where the pre-trained LLM does not have enough knowledge to serve as an effective value function or in domains that require long-horizon planning. To address these limitations, we present an AlphaZero-like tree-search learning framework for LLMs (termed TS-LLM), systematically illustrating how tree-search with a learned value function can guide LLM decoding. |
Ziyu Wan; Xidong Feng; Muning Wen; Stephen Marcus McAleer; Ying Wen; Weinan Zhang; Jun Wang; |
411 | Prototypical Transformer As Unified Motion Learners Highlight: In this work, we introduce the Prototypical Transformer (ProtoFormer), a general and unified framework that approaches various motion tasks from a prototype perspective. |
Cheng Han; Yawen Lu; Guohao Sun; James Chenhao Liang; Zhiwen Cao; Qifan Wang; Qiang Guan; Sohail Dianat; Raghuveer Rao; Tong Geng; Zhiqiang Tao; Dongfang Liu; |
412 | Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities Highlight: In this paper, we propose Audio Flamingo, a novel audio language model with 1) strong audio understanding abilities, 2) the ability to quickly adapt to unseen tasks via in-context learning and retrieval, and 3) strong multi-turn dialogue abilities. |
Zhifeng Kong; Arushi Goel; Rohan Badlani; Wei Ping; Rafael Valle; Bryan Catanzaro; |
413 | Long Is More for Alignment: A Simple But Tough-to-Beat Baseline for Instruction Fine-Tuning Highlight: LIMA (NeurIPS 2023) and AlpaGasus (ICLR 2024) are state-of-the-art methods for selecting such high-quality examples, either via manual curation or using GPT-3.5-Turbo as a quality scorer. We show that the extremely simple baseline of selecting the 1,000 instructions with the longest responses—which intuitively contain more learnable information and are harder to overfit—from standard datasets can consistently outperform these sophisticated methods according to GPT-4 and PaLM-2 as judges, while remaining competitive on the Open LLM benchmarks that test factual knowledge. |
Hao Zhao; Maksym Andriushchenko; Francesco Croce; Nicolas Flammarion; |
414 | TravelPlanner: A Benchmark for Real-World Planning with Language Agents Highlight: Are these language agents capable of planning in more complex settings that are out of the reach of prior AI agents? To advance this investigation, we propose TravelPlanner, a new planning benchmark that focuses on travel planning, a common real-world planning scenario. |
Jian Xie; Kai Zhang; Jiangjie Chen; Tinghui Zhu; Renze Lou; Yuandong Tian; Yanghua Xiao; Yu Su; |
415 | Training-Free Long-Context Scaling of Large Language Models Highlight: Given the expensive overhead of finetuning large-scale models with longer sequences, we propose a training-free approach named Dual Chunk Attention (DCA), which enables Llama2 70B to support context windows of up to 100k tokens. |
Chenxin An; Fei Huang; Jun Zhang; Shansan Gong; Xipeng Qiu; Chang Zhou; Lingpeng Kong; |
416 | Sliced Wasserstein with Random-Path Projecting Directions Highlight: In this work, we propose an optimization-free slicing distribution that provides a fast sampling for the Monte Carlo estimation of expectation. |
Khai Nguyen; Shujian Zhang; Tam Le; Nhat Ho; |
417 | MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models Highlight: Moreover, these multi-agent approaches fail to provide a final, single model for efficient inference. To address this, we introduce MAGDi, a new method for structured distillation of the reasoning interactions between multiple LLMs into smaller LMs. |
Justin Chen; Swarnadeep Saha; Elias Stengel-Eskin; Mohit Bansal; |
418 | Self-Consistency Training for Density-Functional-Theory Hamiltonian Prediction Highlight: In this work, we highlight that Hamiltonian prediction possesses a self-consistency principle, based on which we propose self-consistency training, an exact training method that does not require labeled data. |
He Zhang; Chang Liu; Zun Wang; Xinran Wei; Siyuan Liu; Nanning Zheng; Bin Shao; Tie-Yan Liu; |
419 | How Do Large Language Models Navigate Conflicts Between Honesty and Helpfulness? Highlight: How do large language models (LLMs) handle such nuanced trade-offs? To address this question, we use psychological models and experiments designed to characterize human behavior to analyze LLMs. |
Ryan Liu; Theodore Sumers; Ishita Dasgupta; Thomas L. Griffiths; |
420 | Knowledge-aware Reinforced Language Models for Protein Directed Evolution Highlight: In this paper, we introduce a novel Knowledge-aware Reinforced Language Model (KnowRLM) for MLDE. |
Yuhao Wang; Qiang Zhang; Ming Qin; Xiang Zhuang; Xiaotong Li; Zhichen Gong; Zeyuan Wang; Yu Zhao; Jianhua Yao; Keyan Ding; Huajun Chen; |
421 | AI Control: Improving Safety Despite Intentional Subversion Highlight: To do so, safety measures either aim at making LLMs try to avoid harmful outcomes or aim at preventing LLMs from causing harmful outcomes, even if they try to cause them. In this paper, we focus on this second layer of defense. |
Ryan Greenblatt; Buck Shlegeris; Kshitij Sachan; Fabien Roger; |
422 | Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining Highlight: In this paper, we establish the first theoretical comparisons between two leading generative SSL paradigms: autoregressive SSL and masked SSL. |
Qi Zhang; Tianqi Du; Haotian Huang; Yifei Wang; Yisen Wang; |
423 | Active Statistical Inference Highlight: Inspired by the concept of active learning, we propose active inference—a methodology for statistical inference with machine-learning-assisted data collection. |
Tijana Zrnic; Emmanuel Candes; |
424 | Accelerating Iterative Retrieval-augmented Language Model Serving with Speculation Highlight: This paper introduces RaLMSpec, a framework that accelerates iterative retrieval-augmented language model (RaLM) serving with *speculative retrieval* and *batched verification*. |
Zhihao Zhang; Alan Zhu; Lijie Yang; Yihua Xu; Lanting Li; Phitchaya Mangpo Phothilimthana; Zhihao Jia; |
425 | The Perception-Robustness Tradeoff in Deterministic Image Restoration Highlight: We study the behavior of deterministic methods for solving inverse problems in imaging. |
Guy Ohayon; Tomer Michaeli; Michael Elad; |
426 | Generalization Analysis of Stochastic Weight Averaging with General Sampling Highlight: To address the theoretical challenges, we adopt mathematical induction to find a recursive representation that bounds the gradient at each step. Based on this, we establish stability bounds supporting sampling with and without replacement in the non-convex setting. |
Peng Wang; Li Shen; Zerui Tao; Shuaida He; Dacheng Tao; |
427 | Scalable Multiple Kernel Clustering: Learning Clustering Structure from Expectation Highlight: In this paper, we derive an upper bound of the difference between a kernel matrix and its expectation under a mild assumption. |
Weixuan Liang; En Zhu; Shengju Yu; Huiying Xu; Xinzhong Zhu; Xinwang Liu; |
428 | DFA-RAG: Conversational Semantic Router for Large Language Model with Definite Finite Automaton Highlight: This paper introduces the retrieval-augmented large language model with Definite Finite Automaton (DFA-RAG), a novel framework designed to enhance the capabilities of conversational agents using large language models (LLMs). |
Yiyou Sun; Junjie Hu; Wei Cheng; Haifeng Chen; |
429 | Equivariance Via Minimal Frame Averaging for More Symmetries and Efficiency Highlight: Here, we propose Minimal Frame Averaging (MFA), a mathematical framework for constructing provably minimal frames that are exactly equivariant. |
Yuchao Lin; Jacob Helwig; Shurui Gui; Shuiwang Ji; |
430 | A Space Group Symmetry Informed Network for O(3) Equivariant Crystal Tensor Prediction Highlight: To this end, we propose a General Materials Tensor Network (GMTNet), which is carefully designed to satisfy the required symmetries. |
Keqiang Yan; Alexandra Saxton; Xiaofeng Qian; Xiaoning Qian; Shuiwang Ji; |
431 | Borda Regret Minimization for Generalized Linear Dueling Bandits Highlight: In this paper, we study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score while minimizing the cumulative regret. |
Yue Wu; Tao Jin; Qiwei Di; Hao Lou; Farzad Farnoud; Quanquan Gu; |
432 | SurfPro: Functional Protein Design Based on Continuous Surface Highlight: We are motivated by a chemical intuition that both geometric structure and biochemical properties are critical to a protein’s function. In this paper, we propose SurfPro, a new method to generate functional proteins given a desired surface and its associated biochemical properties. |
Zhenqiao Song; Tinglin Huang; Lei Li; Wengong Jin; |
433 | The Pitfalls of Next-Token Prediction Highlight: Can a mere next-token predictor faithfully model human thinking? Our work is aimed at crystallizing this intuitive concern, which is currently fragmented in the literature. |
Gregor Bachmann; Vaishnavh Nagarajan; |
434 | What’s The Score? Automated Denoising Score Matching for Nonlinear Diffusions Highlight: In this work, we introduce a family of tractable denoising score matching objectives, called local-DSM, built using local increments of the diffusion process. |
Raghav Singhal; Mark Goldstein; Rajesh Ranganath; |
435 | Break The Sequential Dependency of LLM Inference Using Lookahead Decoding Highlight: In this paper, we introduce Lookahead decoding, an exact, parallel decoding algorithm that accelerates LLM decoding without needing auxiliary models or data stores. |
Yichao Fu; Peter Bailis; Ion Stoica; Hao Zhang; |
436 | When and How Does In-Distribution Label Help Out-of-Distribution Detection? Highlight: We employ a graph-theoretic approach, rigorously analyzing the separability of ID data from OOD data in a closed-form manner. |
Xuefeng Du; Yiyou Sun; Yixuan Li; |
437 | On The Embedding Collapse When Scaling Up Recommendation Models Highlight: In this paper, we identify the embedding collapse phenomenon as the inhibition of scalability, wherein the embedding matrix tends to occupy a low-dimensional subspace. |
Xingzhuo Guo; Junwei Pan; Ximei Wang; Baixu Chen; Jie Jiang; Mingsheng Long; |
438 | LLaGA: Large Language and Graph Assistant Highlight: To this end, we introduce the **L**arge **L**anguage **a**nd **G**raph **A**ssistant (**LLaGA**), an innovative model that effectively integrates LLM capabilities to handle the complexities of graph-structured data. |
Runjin Chen; Tong Zhao; Ajay Kumar Jaiswal; Neil Shah; Zhangyang Wang; |
439 | PID: Prompt-Independent Data Protection Against Latent Diffusion Models Highlight: Furthermore, considering the visual encoder’s independence from textual prompts, we delve into the visual encoder and thoroughly investigate how manipulating the visual encoder affects the few-shot fine-tuning process of LDMs. Drawing on these insights, we propose a simple yet effective method called Prompt-Independent Defense (PID) to safeguard privacy against LDMs. |
Ang Li; Yichuan Mo; Mingjie Li; Yisen Wang; |
440 | An Empirical Study of Realized GNN Expressiveness Highlight: Previous research has attempted to use datasets for measurement, but faces problems with difficulty (any model surpassing 1-WL has nearly 100% accuracy), granularity (models tend to be either 100% correct or near random guess), and scale (only several essentially different graphs involved). To address these limitations, we study the realized expressive power that a practical model instance can achieve using a novel expressiveness dataset, BREC, which poses greater difficulty (with up to 4-WL-indistinguishable graphs), finer granularity (enabling comparison of models between 1-WL and 3-WL), and a larger scale (consisting of 800 1-WL-indistinguishable graphs that are non-isomorphic to each other). |
Yanbo Wang; Muhan Zhang; |
441 | WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? Highlight: To this end, we propose WorkArena, a remote-hosted benchmark of 33 tasks based on the widely-used ServiceNow platform. |
Alexandre Drouin; Maxime Gasse; Massimo Caccia; Issam H. Laradji; Manuel Del Verme; Tom Marty; David Vazquez; Nicolas Chapados; Alexandre Lacoste; |
442 | Averaging $n$-step Returns Reduces Variance in Reinforcement Learning Highlight: Because general compound returns can be expensive to implement, we introduce two-bootstrap returns which reduce variance while remaining efficient, even when using minibatched experience replay. |
Brett Daley; Martha White; Marlos C. Machado; |
443 | NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models Highlight: Motivated by this, we propose a TTS system with novel factorized diffusion models to generate natural speech in a zero-shot way. Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model, which generates attributes in each subspace following its corresponding prompt. |
Zeqian Ju; Yuancheng Wang; Kai Shen; Xu Tan; Detai Xin; Dongchao Yang; Eric Liu; Yichong Leng; Kaitao Song; Siliang Tang; Zhizheng Wu; Tao Qin; Xiangyang Li; Wei Ye; Shikun Zhang; Jiang Bian; Lei He; Jinyu Li; Sheng Zhao; |
444 | PolySketchFormer: Fast Transformers Via Sketching Polynomial Kernels Highlight: Recent theoretical results indicate the intractability of sub-quadratic softmax attention approximation under reasonable complexity assumptions. This paper addresses this challenge by first demonstrating that polynomial attention with high degree can effectively replace softmax without sacrificing model quality. |
Praneeth Kacham; Vahab Mirrokni; Peilin Zhong; |
445 | Few-Shot Character Understanding in Movies As An Assessment to Meta-Learning of Theory-of-Mind Highlight: Our extensive human study verifies that humans are capable of solving our problem by inferring characters’ mental states based on their previously seen movies. |
Mo Yu; Qiujing Wang; Shunchi Zhang; Yisi Sang; Kangsheng Pu; Zekai Wei; Han Wang; Liyan Xu; Jing Li; Yue Yu; Jie Zhou; |
446 | Reinforcement Learning Within Tree Search for Fast Macro Placement Highlight: However, existing RL-based techniques are hindered by their low sample efficiency, requiring numerous online rollouts or substantial offline expert data to achieve bootstrap, which are often impractical in industrial scenarios. To address this challenge, we propose a novel sample-efficient framework, namely **EfficientPlace**, for fast macro placement. |
Zijie Geng; Jie Wang; Ziyan Liu; Siyuan Xu; Zhentao Tang; Mingxuan Yuan; Jianye Hao; Yongdong Zhang; Feng Wu; |
447 | Bias of Stochastic Gradient Descent or The Architecture: Disentangling The Effects of Overparameterization of Neural Networks Highlight: The goal of this paper is to disentangle the factors that influence generalization stemming from optimization and architectural choices by studying *random* and *SGD-optimized* networks that achieve zero training error. |
Amit Peleg; Matthias Hein; |
448 | Enhancing Trajectory Prediction Through Self-Supervised Waypoint Distortion Prediction Highlight: To this end, we propose a novel approach called SSWDP (Self-Supervised Waypoint Distortion Prediction). |
Pranav Singh Chib; Pravendra Singh; |
449 | MS-TIP: Imputation Aware Pedestrian Trajectory Prediction Highlight: In this work, we propose the MultiScale hypergraph for Trajectory Imputation and Prediction (MS-TIP), a novel approach that simultaneously addresses the imputation of missing observations and the prediction of future trajectories. |
Pranav Singh Chib; Achintya Nath; Paritosh Kabra; Ishu Gupta; Pravendra Singh; |
450 | Using Uncertainty Quantification to Characterize and Improve Out-of-Domain Learning for PDEs Highlight: To address this limitation, we show that ensembling several NOs can identify high-error regions and provide good uncertainty estimates that are well-correlated with prediction errors. Based on this, we propose a cost-effective alternative, DiverseNO, that mimics the properties of the ensemble by encouraging diverse predictions from its multiple heads in the last feed-forward layer. |
S Chandra Mouli; Danielle C. Maddix; Shima Alizadeh; Gaurav Gupta; Andrew Stuart; Michael W. Mahoney; Bernie Wang; |
451 | ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models Highlight: In this paper, we introduce ConTextual, a novel dataset featuring human-crafted instructions that require context-sensitive reasoning for text-rich images. |
Rohan Wadhawan; Hritik Bansal; Kai-Wei Chang; Nanyun Peng; |
452 | Stochastic Positional Embeddings Improve Masked Image Modeling Highlight: In this work, we propose to incorporate location uncertainty into MIM by using stochastic positional embeddings (StoP). |
Amir Bar; Florian Bordes; Assaf Shocher; Mido Assran; Pascal Vincent; Nicolas Ballas; Trevor Darrell; Amir Globerson; Yann LeCun; |
453 | Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines Highlight: In this paper we focus on Generative Masked Language Models (GMLMs), a non-autoregressive paradigm in which we train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model. |
Yuchen Li; Alexandre Kirchmeyer; Aashay Mehta; Yilong Qin; Boris Dadachev; Kishore Papineni; Sanjiv Kumar; Andrej Risteski; |
454 | Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models By Finding Problematic Prompts Highlight: In this work, we propose **Prompting4Debugging (P4D)** as a debugging and red-teaming tool that automatically finds problematic prompts for diffusion models to test the reliability of a deployed safety mechanism. |
Zhi-Yi Chin; Chieh Ming Jiang; Ching-Chun Huang; Pin-Yu Chen; Wei-Chen Chiu; |
455 | Provable Interactive Learning with Hindsight Instruction Feedback Highlight: Next, we study a specialized setting where the underlying instruction-response distribution can be decomposed as a low-rank matrix. We introduce an algorithm called LORIL for this setting and show that it is a no-regret algorithm whose regret scales with $\sqrt{T}$ and depends on the _intrinsic rank_ but not on the agent’s response space. |
Dipendra Misra; Aldo Pacchiano; Robert E. Schapire; |
456 | Learning from Streaming Data When Users Choose Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The service providers’ models influence which service the user will choose at the next time step, and the user’s choice, in return, influences the model update, leading to a feedback loop. In this paper, we formalize the above dynamics and develop a simple and efficient decentralized algorithm to locally minimize the overall user loss. |
Jinyan Su; Sarah Dean; |
457 | Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We further identify two underlying causes of this inefficiency: the redundant inversion of noisy backgrounds and the unintended inversion of spurious correlations—a phenomenon we term “hallucination” in model inversion. To address these limitations, we propose a novel sparse model inversion strategy, as a plug-and-play extension to speed up existing dense inversion methods with no need for modifying their original loss functions. |
Zixuan Hu; Yongxian Wei; Li Shen; Zhenyi Wang; Lei Li; Chun Yuan; Dacheng Tao; |
458 | CaM: Cache Merging for Memory-efficient LLMs Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This perturbation escalates with the compression ratio, which can precipitate a marked deterioration in LLM inference performance. This paper introduces Cache Merging (CaM) as a solution to mitigate this challenge. |
Yuxin Zhang; Yuxuan Du; Gen Luo; Yunshan Zhong; Zhenyu Zhang; Shiwei Liu; Rongrong Ji; |
459 | A Doubly Recursive Stochastic Compositional Gradient Descent Method for Federated Multi-Level Compositional Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the convergence rate of existing federated two-level compositional optimization learning algorithms fails to achieve linear speedup with respect to the number of workers under heterogeneous settings. After identifying the reason for this failure, we develop a novel federated stochastic multi-level compositional optimization algorithm by introducing a novel Jacobian-vector product estimator. |
Hongchang Gao; |
460 | Pairwise Alignment Improves Graph Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel, theoretically principled method, Pairwise Alignment (Pair-Align) to counter graph structure shift by mitigating conditional structure shift (CSS) and label shift (LS). |
Shikun Liu; Deyu Zou; Han Zhao; Pan Li; |
461 | Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Jetfire, an efficient and accurate INT8 training method specific to transformers. |
Haocheng Xi; Yuxiang Chen; Kang Zhao; KAI JUN TEH; Jianfei Chen; Jun Zhu; |
462 | Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we confront the reward overoptimization problem in diffusion model alignment through the lenses of both inductive and primacy biases. |
Ziyi Zhang; Sen Zhang; Yibing Zhan; Yong Luo; Yonggang Wen; Dacheng Tao; |
463 | Foundations of Testing for Finite-Sample Causal Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the canonical setup in theoretical causal discovery literature, where one assumes causal sufficiency and access to the graph skeleton. |
Tom Yan; Ziyu Xu; Zachary Chase Lipton; |
464 | Beyond The Federation: Topology-aware Federated Learning for Generalization to Unseen Clients Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve OOF-resiliency in a scalable manner, we propose Topology-aware Federated Learning (TFL) that leverages client topology – a graph representing client relationships – to effectively train robust models against OOF data. |
Mengmeng Ma; Tang Li; Xi Peng; |
465 | Prompt-guided Precise Audio Editing with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel approach, referred to as **PPAE**, which serves as a general module for diffusion models and enables precise audio editing. |
Manjie Xu; Chenxing Li; Duzhen Zhang; Dan Su; Wei Liang; Dong Yu; |
466 | Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods replay offline data directly in the online phase, resulting in a significant data distribution shift that causes inefficiency in online fine-tuning. To address this issue, we introduce an innovative approach, **E**nergy-guided **DI**ffusion **S**ampling (EDIS), which utilizes a diffusion model to extract prior knowledge from the offline dataset and employs energy functions to distill this knowledge for enhanced data generation in the online phase. |
Xu-Hui Liu; Tian-Shuo Liu; Shengyi Jiang; Ruifeng Chen; Zhilong Zhang; Xinwei Chen; Yang Yu; |
467 | Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we suggest investigating internal activations and quantifying LLM’s truthfulness using the local intrinsic dimension (LID) of model activations. |
Fan Yin; Jayanth Srinivasa; Kai-Wei Chang; |
468 | KernelWarehouse: Rethinking The Design of Dynamic Convolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, no prior research has explored the setting $n > 100$ (an order of magnitude larger than the typical setting $n < 10$) to push forward the performance boundary of dynamic convolution while enjoying parameter efficiency. To fill this gap, in this paper, we propose KernelWarehouse, a more general form of dynamic convolution, which redefines the basic concepts of “kernels”, “assembling kernels” and “attention function” through the lens of exploiting convolutional parameter dependencies within the same layer and across neighboring layers of a ConvNet. |
Chao Li; Anbang Yao; |
469 | Translation Equivariant Transformer Neural Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new family of TNPs that incorporate *translation equivariance*. |
Matthew Ashman; Cristiana Diaconu; Junhyuck Kim; Lakee Sivaraya; Stratis Markou; James Requeima; Wessel P Bruinsma; Richard E. Turner; |
470 | Sequential Disentanglement By Extracting Static Information From A Single Sequence Element Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose a novel and simple architecture that mitigates information leakage by offering a simple and effective subtraction inductive bias while conditioning on a single sample. |
Nimrod Berman; Ilan Naiman; Idan Arbiv; Gal Fadlon; Omri Azencot; |
471 | Improving Sharpness-Aware Minimization By Lookahead Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent studies have shown that SAM may suffer from convergence instability and oscillate around saddle points, resulting in slow convergence and inferior performance. To address this problem, we propose the use of a lookahead mechanism to gather more information about the landscape by looking further ahead, and thus find a better trajectory to converge. |
Runsheng Yu; Youzhi Zhang; James Kwok; |
472 | Causal Bandits: The Pareto Optimal Frontier of Adaptivity, A Reduction to Linear Bandits, and Limitations Around Unknown Marginals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the problem of adapting to the presence or absence of causal structure in multi-armed bandit problems. |
Ziyi Liu; Idan Attias; Daniel M. Roy; |
473 | Differentially Private Decentralized Learning with Random Walks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we characterize the privacy guarantees of decentralized learning with random walk algorithms, where a model is updated by traveling from one node to another along the edges of a communication graph. |
Edwige Cyffers; Aurélien Bellet; Jalaj Upadhyay; |
474 | How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study how well LLMs can negotiate with each other. |
Federico Bianchi; Patrick John Chia; Mert Yuksekgonul; Jacopo Tagliabue; Dan Jurafsky; James Zou; |
475 | Absolute Policy Optimization: Enhancing Lower Probability Bound of Performance with High Confidence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, contemporary state-of-the-art algorithms within this category primarily emphasize improvement in expected performance, lacking the ability to control over the worst-case performance outcomes. To address this limitation, we introduce a novel objective function, optimizing which leads to guaranteed monotonic improvement in the lower probability bound of performance with high confidence. |
Weiye Zhao; Feihan Li; Yifan Sun; Rui Chen; Tianhao Wei; Changliu Liu; |
476 | Junk DNA Hypothesis: Pruning Small Pre-Trained Weights *Irreversibly* and *Monotonically* Impairs “Difficult” Downstream Tasks in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present *Junk DNA Hypothesis* by adopting a novel *task-centric* angle for the pre-trained weights of large language models (LLMs). |
Lu Yin; AJAY KUMAR JAISWAL; Shiwei Liu; Souvik Kundu; Zhangyang Wang; |
477 | Navigating Scaling Laws: Compute Optimality in Adaptive Model Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This leads to the notion of a ‘compute-optimal’ model, i.e. a model that allocates a given level of compute during training optimally to maximize performance. In this work, we extend the concept of optimality by allowing for an ‘adaptive’ model, i.e. a model that can change its shape during training. |
Sotiris Anagnostidis; Gregor Bachmann; Imanol Schlag; Thomas Hofmann; |
478 | Meta-Learners for Partially-Identified Treatment Effects Across Multiple Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we focus on the widespread setting where the observational data come from multiple environments, such as different hospitals, physicians, or countries. |
Jonas Schweisthal; Dennis Frauen; Mihaela van der Schaar; Stefan Feuerriegel; |
479 | Re-Dock: Towards Flexible and Realistic Molecular Docking with Diffusion Bridge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While deep learning has shown promise, existing methods often depend on holo-protein structures (docked, and not accessible in realistic tasks) or neglect pocket sidechain conformations, leading to limited practical utility and unrealistic conformation predictions. To fill these gaps, we introduce an under-explored task, named flexible docking to predict poses of ligand and pocket sidechains simultaneously and introduce Re-Dock, a novel diffusion bridge generative model extended to geometric manifolds. |
Yufei Huang; Odin Zhang; Lirong Wu; Cheng Tan; Haitao Lin; Zhangyang Gao; Siyuan Li; Stan Z. Li; |
480 | Human-like Category Learning By Injecting Ecological Priors from Large Language Models Into Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate that large language models can generate cognitive tasks, specifically category learning tasks, that match the statistics of real-world tasks, thereby addressing the first challenge. |
Akshay Kumar Jagadish; Julian Coda-Forno; Mirko Thalmann; Eric Schulz; Marcel Binz; |
481 | DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This poses a challenge for offline RL algorithms, as agents must acquire the ability to transit to high-reward regions. To address this issue, we introduce Diffusion-based Trajectory Stitching (DiffStitch), a novel diffusion-based data augmentation pipeline that systematically generates stitching transitions between trajectories. |
Guanghe Li; Yixiang Shan; Zhengbang Zhu; Ting Long; Weinan Zhang; |
482 | Value-Evolutionary-Based Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Value-Evolutionary-Based Reinforcement Learning (VEB-RL) that focuses on the integration of EAs with value-based RL. |
Pengyi Li; Jianye HAO; Hongyao Tang; YAN ZHENG; Fazl Barez; |
483 | Surface-VQMAE: Vector-quantized Masked Auto-encoders on Molecular Surfaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, comparatively little effort has been devoted to incorporating the abundant protein surface information for analyzing proteins’ biological functions, in juxtaposition to amino acid sequences and 3D structures. We propose a novel surface-based unsupervised learning algorithm termed Surface-VQMAE to overcome this obstacle. |
Fang Wu; Stan Z. Li; |
484 | Toward Adaptive Reasoning in Large Language Models with Thought Rollback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a new reasoning framework, called *Thought Rollback* (TR), allowing LLMs to adaptively build thought structure while maintaining effective reasoning toward problem-solving under hallucinations. |
Sijia Chen; Baochun Li; |
485 | Towards Resource-friendly, Extensible and Stable Incomplete Multi-view Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Incomplete multi-view clustering (IMVC) methods typically encounter three drawbacks: (1) intense time and/or space overheads; (2) intractable hyper-parameters; (3) non-zero variance results. With these concerns in mind, we give a simple yet effective IMVC scheme, termed ToRES. |
Shengju Yu; Zhibin Dong; Siwei Wang; Xinhang Wan; Yue Liu; Weixuan Liang; Pei Zhang; Wenxuan Tu; Xinwang Liu; |
486 | Graph Mixup on Approximate Gromov–Wasserstein Geodesics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Though efforts have been made, most of the existing graph mixup methods neglect the intrinsic geodesic guarantee, thereby generating inconsistent sample-label pairs. To address this issue, we propose GeoMix to mixup graphs on the Gromov-Wasserstein (GW) geodesics. |
Zhichen Zeng; Ruizhong Qiu; Zhe Xu; Zhining Liu; Yuchen Yan; Tianxin Wei; Lei Ying; Jingrui He; Hanghang Tong; |
487 | Graph As Point Set Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, this paper introduces a novel graph-to-set conversion method that bijectively transforms interconnected nodes into a set of independent points and then uses a set encoder to learn the graph representation. |
Xiyuan Wang; Pan Li; Muhan Zhang; |
488 | Leveraging Self-Consistency for Data-Efficient Amortized Bayesian Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method to improve the efficiency and accuracy of amortized Bayesian inference by leveraging universal symmetries in the joint probabilistic model of parameters and data. |
Marvin Schmitt; Desi R. Ivanova; Daniel Habermann; Ullrich Koethe; Paul-Christian Bürkner; Stefan T. Radev; |
489 | OMPO: A Unified Framework for RL Under Policy and Dynamics Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching. |
Yu Luo; Tianying Ji; Fuchun Sun; Jianwei Zhang; Huazhe Xu; Xianyuan Zhan; |
490 | Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This motivates a new possibility of harnessing the emergent outperforming offline optimal policy to improve online policy learning. Based on this insight, we present Offline-Boosted Actor-Critic (OBAC), a model-free online RL framework that elegantly identifies the outperforming offline policy through value comparison, and uses it as an adaptive constraint to guarantee stronger policy learning performance. |
Yu Luo; Tianying Ji; Fuchun Sun; Jianwei Zhang; Huazhe Xu; Xianyuan Zhan; |
491 | Outlier-aware Slicing for Post-Training Quantization in Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses a critical challenge in PTQ: **the severe impact of outliers on the accuracy of quantized transformer architectures.** Specifically, we introduce the concept of “reconstruction granularity” as a novel solution to this issue, which has been overlooked in previous works. |
Yuexiao Ma; Huixia Li; Xiawu Zheng; Feng Ling; Xuefeng Xiao; Rui Wang; Shilei Wen; Fei Chao; Rongrong Ji; |
492 | On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key contribution is the characterization of a phase transition behavior in the efficiency of all possible modern Hopfield models based on the norm of patterns. |
Jerry Yao-Chieh Hu; Thomas Lin; Zhao Song; Han Liu; |
493 | Outlier-Efficient Hopfield Layers for Large Transformer-Based Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce an Outlier-Efficient Modern Hopfield Model (termed `OutEffHop`) and use it to address the outlier inefficiency problem of training gigantic transformer-based models. |
Jerry Yao-Chieh Hu; Pei-Hsuan Chang; Haozheng Luo; Hong-Yu Chen; Weijian Li; Wei-Po Wang; Han Liu; |
494 | Calibration Bottleneck: Over-compressed Representations Are Less Calibratable Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the observations, this paper introduces a weak classifier hypothesis, i.e., given a weak classification head that has not been over-trained, the representation module can be better learned to produce more calibratable features. Consequently, we propose a progressively layer-peeled training (PLP) method to exploit this hypothesis, thereby enhancing model calibratability. |
Deng-Bao Wang; Min-Ling Zhang; |
495 | Contrastive Learning for Clinical Outcome Prediction with Partial Data Sources Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this assumption is often challenged by the fact that real-world clinical datasets originate from various data sources (with distinct sets of covariates), which, though available for training (in a research or retrospective setting), are more realistically only partially available (a subset of such sets) at inference time when deployed. So motivated, we introduce Contrastive Learning for clinical Outcome Prediction with Partial data Sources (CLOPPS), that trains encoders to capture information across different data sources and then leverages them to build classifiers restricting access to a single data source. |
Meng Xia; Jonathan Wilson; Benjamin Goldstein; Ricardo Henao; |
496 | DecisionNCE: Embodied Multimodal Representations Via Implicit Preference Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing methods approach these via separate objectives, which often reach sub-optimal solutions. In this paper, we propose a universal unified objective that can simultaneously extract meaningful task progression information from image sequences and seamlessly align them with language instructions. |
Jianxiong Li; Jinliang Zheng; Yinan Zheng; Liyuan Mao; Xiao Hu; Sijie Cheng; Haoyi Niu; Jihao Liu; Yu Liu; Jingjing Liu; Ya-Qin Zhang; Xianyuan Zhan; |
497 | Self-Driven Entropy Aggregation for Byzantine-Robust Heterogeneous Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While various robust aggregations have been proposed to defend against such attacks, they are subject to certain assumptions: homogeneous private data and related proxy datasets. To address these limitations, we propose Self-Driven Entropy Aggregation (SDEA), which leverages the random public dataset to conduct Byzantine-robust aggregation in heterogeneous federated learning. |
Wenke Huang; Zekun Shi; Mang Ye; He Li; Bo Du; |
498 | How Graph Neural Networks Learn: Lessons from Training Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For graph neural networks (GNNs), considerable advances have been made in formalizing what functions they can represent, but whether GNNs will learn desired functions during the optimization process remains less clear. To fill this gap, we study their training dynamics in function space. |
Chenxiao Yang; Qitian Wu; David Wipf; Ruoyu Sun; Junchi Yan; |
499 | SSL4Q: Semi-Supervised Learning of Quantum Data with Application to Quantum State Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SSL4Q, which achieves (for the first time) semi-supervised learning specifically designed for quantum state classification. |
Yehui Tang; Nianzu Yang; Mabiao Long; Junchi Yan; |
500 | A Unified Adaptive Testing System Enabled By Hierarchical Structure Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a unified data-driven ATS framework that conceptualizes the various testing formats as a hierarchical test structure search problem. |
Junhao Yu; Yan Zhuang; Zhenya Huang; Qi Liu; Xin Li; Rui LI; Enhong Chen; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~2,600 papers), please visit Paper Digest: ICML-2024 (Full List).