Paper Digest: NeurIPS 2023 Highlights
Note: NeurIPS 2023 accepted more than 3,500 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can choose to read All 3,500 NeurIPS-2023 papers on a separate page, which takes quite some time to load.
To search or review papers within NIPS-2023 related to a specific topic, please use the search by venue (NIPS-2023), review by venue (NIPS-2023) and question answering by venue (NIPS-2023) services. To browse papers by author, here is a list of all authors (NIPS-2023). You may also like to explore our “Best Paper” Digest (NeurIPS), which lists the most influential NeurIPS papers since 1987.
Based in New York, Paper Digest is dedicated to producing high-quality text analysis results that people can actually use on a daily basis. Since 2018, we have been serving users across the world with a number of exclusive services to track, search, review and rewrite scientific literature.
You are welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: NeurIPS 2023 Highlights
Paper | Author(s) |
---|---|
1 | Toolformer: Language Models Can Teach Themselves to Use Tools. Highlight: In this paper, we show that LMs can teach themselves to *use external tools* via simple APIs and achieve the best of both worlds. |
Timo Schick; Jane Dwivedi-Yu; Roberto Dessi; Roberta Raileanu; Maria Lomeli; Eric Hambro; Luke Zettlemoyer; Nicola Cancedda; Thomas Scialom; |
2 | The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data Only. Highlight: However, as larger models requiring pretraining on trillions of tokens are considered, it is unclear how scalable curation is, and whether we will run out of unique high-quality data soon. At variance with previous beliefs, we show that properly filtered and deduplicated web data alone can lead to powerful models, even significantly outperforming models trained on The Pile. |
Guilherme Penedo; Quentin Malartic; Daniel Hesslow; Ruxandra Cojocaru; Hamza Alobeidli; Alessandro Cappelli; Baptiste Pannier; Ebtesam Almazrouei; Julien Launay; |
3 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. Highlight: In this paper, we conduct a systematic and comprehensive study on vision-language instruction tuning based on the pretrained BLIP-2 models. |
Wenliang Dai; Junnan Li; DONGXU LI; Anthony Meng Huat Tiong; Junqi Zhao; Weisheng Wang; Boyang Li; Pascale N Fung; Steven Hoi; |
4 | QLoRA: Efficient Finetuning of Quantized LLMs. Highlight: We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. |
Tim Dettmers; Artidoro Pagnoni; Ari Holtzman; Luke Zettlemoyer; |
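As a concrete illustration of the recipe (4-bit NF4 storage with double quantization, backpropagation through frozen quantized weights into LoRA adapters), here is a minimal sketch assuming the Hugging Face transformers, peft, and bitsandbytes libraries; the model name and LoRA hyperparameters are illustrative placeholders, not the paper's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NormalFloat storage with double quantization; compute in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", quantization_config=bnb_config)  # illustrative base model

# Gradients flow through the frozen 4-bit weights into small LoRA adapters
lora_config = LoraConfig(r=16, lora_alpha=32,
                         target_modules=["q_proj", "v_proj"],
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters are trainable
```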
5 | Direct Preference Optimization: Your Language Model Is Secretly A Reward Model. Highlight: However, RLHF is a complex and often unstable procedure, first fitting a reward model that reflects the human preferences, and then fine-tuning the large unsupervised LM using reinforcement learning to maximize this estimated reward without drifting too far from the original model. In this paper, we leverage a mapping between reward functions and optimal policies to show that this constrained reward maximization problem can be optimized exactly with a single stage of policy training, essentially solving a classification problem on the human preference data. |
Rafael Rafailov; Archit Sharma; Eric Mitchell; Christopher D Manning; Stefano Ermon; Chelsea Finn; |
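The single training stage reduces to a logistic loss on preference pairs. A minimal sketch of the widely used form of the objective, assuming summed log-probabilities of each response under the policy and a frozen reference model have already been computed (all names are illustrative):

```python
import torch.nn.functional as F

def dpo_loss(pi_chosen_logps, pi_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit reward of a response: beta * log(pi_theta / pi_ref)
    chosen = beta * (pi_chosen_logps - ref_chosen_logps)
    rejected = beta * (pi_rejected_logps - ref_rejected_logps)
    # Binary classification on preference pairs: -log sigmoid(reward margin)
    return -F.logsigmoid(chosen - rejected).mean()
```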
6 | Reflexion: Language Agents with Verbal Reinforcement Learning. Highlight: We propose Reflexion, a novel framework to reinforce language agents not by updating weights, but instead through linguistic feedback. |
Noah Shinn; Federico Cassano; Ashwin Gopinath; Karthik Narasimhan; Shunyu Yao; |
7 | LIMA: Less Is More for Alignment. Highlight: Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. |
Chunting Zhou; Pengfei Liu; Puxin Xu; Srinivasan Iyer; Jiao Sun; Yuning Mao; Xuezhe Ma; Avia Efrat; Ping Yu; LILI YU; Susan Zhang; Gargi Ghosh; Mike Lewis; Luke Zettlemoyer; Omer Levy; |
8 | Self-Refine: Iterative Refinement with Self-Feedback. Highlight: Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement. |
Aman Madaan; Niket Tandon; Prakhar Gupta; Skyler Hallinan; Luyu Gao; Sarah Wiegreffe; Uri Alon; Nouha Dziri; Shrimai Prabhumoye; Yiming Yang; Shashank Gupta; Bodhisattwa Prasad Majumder; Katherine Hermann; Sean Welleck; Amir Yazdanbakhsh; Peter Clark; |
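A minimal sketch of the generate-feedback-refine loop, assuming a hypothetical `llm(prompt)` completion function; the prompts and stopping test are simplified placeholders, not the paper's templates:

```python
def self_refine(llm, task, max_iters=3):
    answer = llm(f"Task: {task}\nAnswer:")
    for _ in range(max_iters):
        feedback = llm(f"Task: {task}\nAnswer: {answer}\n"
                       "Give concrete, actionable feedback on this answer:")
        if "looks good" in feedback.lower():  # simplified stop condition
            break
        answer = llm(f"Task: {task}\nAnswer: {answer}\nFeedback: {feedback}\n"
                     "Rewrite the answer, applying the feedback:")
    return answer
```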
9 | Vicuna Evaluation: Exploring LLM-as-a-Judge and Chatbot Arena. Highlight: To address this, we explore using strong LLMs as judges to evaluate these models on more open-ended questions. We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement biases, as well as limited reasoning ability, and propose solutions to mitigate some of them. |
Lianmin Zheng; Wei-Lin Chiang; Ying Sheng; Siyuan Zhuang; Zhanghao Wu; Yonghao Zhuang; Zi Lin; Zhuohan Li; Dacheng Li; Eric Xing; Hao Zhang; Joseph Gonzalez; Ion Stoica; |
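One mitigation for position bias, judging each pair in both orders and accepting only consistent verdicts, fits in a few lines. A minimal sketch assuming a hypothetical `judge(question, first, second)` call that returns "1", "2", or "tie":

```python
def position_debiased_verdict(judge, question, answer_a, answer_b):
    v1 = judge(question, answer_a, answer_b)
    v2 = judge(question, answer_b, answer_a)  # same pair, positions swapped
    if v1 == "1" and v2 == "2":
        return "a"   # answer_a wins in both orders
    if v1 == "2" and v2 == "1":
        return "b"   # answer_b wins in both orders
    return "tie"     # inconsistent or tied verdicts count as a tie
```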
10 | Language Is Not All You Need: Aligning Perception with Language Models. Highlight: In this work, we introduce KOSMOS-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). |
Shaohan Huang; Li Dong; Wenhui Wang; Yaru Hao; Saksham Singhal; Shuming Ma; Tengchao Lv; Lei Cui; Owais Khan Mohammed; Barun Patra; Qiang Liu; Kriti Aggarwal; Zewen Chi; Nils Bjorck; Vishrav Chaudhary; Subhojit Som; XIA SONG; Furu Wei; |
11 | Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation. Highlight: The ability to collect a large dataset of human preferences from text-to-image users is usually limited to companies, making such datasets inaccessible to the public. To address this issue, we create a web app that enables text-to-image users to generate images and specify their preferences. |
Yuval Kirstain; Adam Polyak; Uriel Singer; Shahbuland Matiana; Joe Penna; Omer Levy; |
12 | Mathematical Capabilities of ChatGPT. Highlight: We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. |
Simon Frieder; Luca Pinchetti; Chevalier; Ryan-Rhys Griffiths; Tommaso Salvatori; Thomas Lukasiewicz; Philipp Petersen; Julius Berner; |
13 | Segment Everything Everywhere All at Once. Highlight: In this work, we present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image. |
Xueyan Zou; Jianwei Yang; Hao Zhang; Feng Li; Linjie Li; Jianfeng Wang; Lijuan Wang; Jianfeng Gao; Yong Jae Lee; |
14 | Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting. Highlight: We demonstrate that CoT explanations can be heavily influenced by adding biasing features to model inputs (e.g., by reordering the multiple-choice options in a few-shot prompt to make the answer always "(A)"), which models systematically fail to mention in their explanations. |
Miles Turpin; Julian Michael; Ethan Perez; Samuel Bowman; |
15 | AlpacaFarm: A Simulation Framework for Methods That Learn from Human Feedback. Highlight: Replicating and understanding this instruction-following process faces three major challenges: the high cost of data collection, the lack of trustworthy evaluation, and the absence of reference method implementations. We address these bottlenecks with AlpacaFarm, a simulator that enables research and development for learning from feedback at a low cost. |
Yann Dubois; Xuechen Li; Rohan Taori; Tianyi Zhang; Ishaan Gulrajani; Jimmy Ba; Carlos Guestrin; Percy Liang; Tatsunori Hashimoto; |
16 | Fine-Grained Human Feedback Gives Better Rewards for Language Model Training. Highlight: In this paper, we use fine-grained human feedback (e.g., which sentence is false, which sub-sentence is irrelevant) as an explicit training signal. |
Zeqiu Wu; Yushi Hu; Weijia Shi; Nouha Dziri; Alane Suhr; Prithviraj (Raj) Ammanabrolu; Noah Smith; Mari Ostendorf; Hannaneh Hajishirzi; |
17 | ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation. Highlight: We present a comprehensive solution to learn and improve text-to-image models from human preference feedback. |
Jiazheng Xu; Xiao Liu; Yuchen Wu; Yuxuan Tong; Qinkai Li; Ming Ding; Jie Tang; Yuxiao Dong; |
18 | Visual Instruction Tuning. Highlight: We present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and an LLM for general-purpose visual and language understanding. |
Haotian Liu; Chunyuan Li; Qingyang Wu; Yong Jae Lee; |
19 | Perfect Linear Concept Erasure in Closed Form. Highlight: We prove that a rank $k-1$ orthogonal projection is sufficient to perfectly guard a $k$-class concept from all linear adversaries with convex loss functions, and provide the formula in closed form. |
Nora Belrose; David Schneider-Joseph; Shauli Ravfogel; Ryan Cotterell; Edward Raff; Stella Biderman; |
20 | Scaling Data-Constrained Language Models. Highlight: We propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters. |
Niklas Muennighoff; Alexander Rush; Boaz Barak; Teven Le Scao; Nouamane Tazi; Aleksandra Piktus; Thomas Wolf; Colin Raffel; Sampo Pyysalo; |
21 | Faith and Fate: Limits of Transformers on Compositionality. Highlight: As a measure of compositional complexity, we introduce computation graphs to systematically quantify the level of complexity, and break down reasoning steps into intermediate sub-procedures. |
Nouha Dziri; Ximing Lu; Melanie Sclar; Xiang (Lorraine) Li; Liwei Jiang; Bill Yuchen Lin; Sean Welleck; Peter West; Chandra Bhagavatula; Ronan Le Bras; Jena Hwang; Soumya Sanyal; Xiang Ren; Allyson Ettinger; Zaid Harchaoui; Yejin Choi; |
22 | StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners. Highlight: We show that (1) when the generative model is properly configured, training self-supervised methods on synthetic images can match or beat the real image counterpart; (2) by treating the multiple images generated from the same text prompt as positives for each other, we develop a multi-positive contrastive learning method, which we call StableRep. |
Yonglong Tian; Lijie Fan; Phillip Isola; Huiwen Chang; Dilip Krishnan; |
23 | Dissecting Knowledge Distillation: An Exploration of Its Inner Workings and Applications. Highlight: Do its data invariance properties become similar? Our work presents a comprehensive study to try to answer these questions. |
Utkarsh Ojha; Yuheng Li; Anirudh Sundara Rajan; Yingyu Liang; Yong Jae Lee; |
24 | Data Selection for Language Models Via Importance Resampling. Highlight: Instead, we extend the classic importance resampling approach used in low dimensions for LM data selection. We propose Data Selection with Importance Resampling (DSIR), an efficient and scalable framework that estimates importance weights in a reduced feature space for tractability and selects data with importance resampling according to these weights. |
Sang Michael Xie; Shibani Santurkar; Tengyu Ma; Percy Liang; |
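A minimal sketch of the idea: estimate target and raw distributions in a hashed feature space, weight each raw example by its log-likelihood ratio, and resample. The unigram features here are a simplification (the paper uses hashed n-grams), and all names are illustrative:

```python
import numpy as np

def hashed_unigram_dist(texts, dim=10_000):
    counts = np.ones(dim)                    # add-one smoothing
    for text in texts:
        for tok in text.split():
            counts[hash(tok) % dim] += 1
    return counts / counts.sum()

def dsir_select(raw_texts, target_texts, k, dim=10_000):
    log_ratio = np.log(hashed_unigram_dist(target_texts, dim)) \
              - np.log(hashed_unigram_dist(raw_texts, dim))
    # Log importance weight of each raw example under the feature-space model
    scores = np.array([sum(log_ratio[hash(t) % dim] for t in x.split())
                       for x in raw_texts])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    idx = np.random.choice(len(raw_texts), size=k, replace=False, p=probs)
    return [raw_texts[i] for i in idx]
```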
25 | Visual Instruction Inversion: Image Editing Via Image Prompting. Highlight: We present a method for image editing via visual prompting. |
Thao Nguyen; Yuheng Li; Utkarsh Ojha; Yong Jae Lee; |
26 | SceneScape: Text-Driven Consistent Scene Generation. Highlight: We present a method for text-driven perpetual view generation: synthesizing long-term videos of various scenes solely from an input text prompt describing the scene and camera poses. |
Rafail Fridman; Amit Abecasis; Yoni Kasten; Tali Dekel; |
27 | Tree of Thoughts: Deliberate Problem Solving with Large Language Models. Highlight: This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. |
Shunyu Yao; Dian Yu; Jeffrey Zhao; Izhak Shafran; Tom Griffiths; Yuan Cao; Karthik Narasimhan; |
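A minimal breadth-first sketch of the search over thoughts, assuming hypothetical `propose(state, k)` (sample k candidate next thoughts) and `score(state)` (LM-rated promise of a partial solution) functions; the paper also explores depth-first and other variants:

```python
def tree_of_thoughts(propose, score, root="", depth=3, breadth=5, keep=2):
    """Beam-style BFS: expand each partial solution, keep the best few."""
    frontier = [root]
    for _ in range(depth):
        candidates = [state + "\n" + thought
                      for state in frontier
                      for thought in propose(state, breadth)]
        frontier = sorted(candidates, key=score, reverse=True)[:keep]
    return max(frontier, key=score)
```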
28 | Paraphrasing Evades Detectors of AI-generated Text, But Retrieval Is An Effective Defense. Highlight: To increase the robustness of AI-generated text detection to paraphrase attacks, we introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider. |
Kalpesh Krishna; Yixiao Song; Marzena Karpinska; John Wieting; Mohit Iyyer; |
29 | DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining. Highlight: In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), which first trains a small proxy model using group distributionally robust optimization (Group DRO) over domains to produce domain weights (mixture proportions) without knowledge of downstream tasks. We then resample a dataset with these domain weights and train a larger, full-sized model. |
Sang Michael Xie; Hieu Pham; Xuanyi Dong; Nan Du; Hanxiao Liu; Yifeng Lu; Percy Liang; Quoc V Le; Tengyu Ma; Adams Wei Yu; |
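The domain-weight update at the core of this procedure can be read as exponentiated-gradient ascent on per-domain excess loss (proxy loss minus a reference model's loss). A minimal sketch under that reading; the step size and smoothing are illustrative, not the paper's settings:

```python
import numpy as np

def update_domain_weights(weights, proxy_losses, ref_losses, lr=1.0, smooth=1e-3):
    """One exponentiated-gradient step on per-domain excess loss."""
    excess = np.maximum(proxy_losses - ref_losses, 0.0)  # reducible loss per domain
    weights = weights * np.exp(lr * excess)              # upweight hard domains
    weights /= weights.sum()
    uniform = np.ones_like(weights) / len(weights)
    weights = (1 - smooth) * weights + smooth * uniform  # smooth for stability
    return weights / weights.sum()
```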
30 | Scalable 3D Captioning with Pretrained Models. Highlight: We introduce Cap3D, an automatic approach for generating descriptive text for 3D objects. |
Tiange Luo; Chris Rockwell; Honglak Lee; Justin Johnson; |
31 | Emergent and Predictable Memorization in Large Language Models. Highlight: The prevalence of such undesirable memorization can pose issues for model trainers, and may even require discarding an otherwise functional model. We therefore seek to predict which sequences will be memorized before a large model’s full train-time by extrapolating the memorization behavior of lower-compute trial runs. |
Stella Biderman; USVSN PRASHANTH; Lintang Sutawika; Hailey Schoelkopf; Quentin Anthony; Shivanshu Purohit; Edward Raff; |
32 | HuggingGPT: Solving AI Tasks with ChatGPT and Its Friends in Hugging Face. Highlight: Considering large language models (LLMs) have exhibited exceptional ability in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks, with language serving as a generic interface to empower this. Based on this philosophy, we present HuggingGPT, a framework that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., Hugging Face) to solve AI tasks. |
Yongliang Shen; Kaitao Song; Xu Tan; Dongsheng Li; Weiming Lu; Yueting Zhuang; |
33 | Self-Supervised Learning with Lie Symmetries for Partial Differential Equations. Highlight: In this work, we learn general-purpose representations of PDEs from heterogeneous data by implementing joint embedding methods for self-supervised learning (SSL), a framework for unsupervised representation learning that has had notable success in computer vision. |
Grégoire Mialon; Quentin Garrido; Hannah Lawrence; Danyal Rehman; Bobak Kiani; Yann LeCun; |
34 | OpenProteinSet: Training Data for Structural Biology at Scale. Highlight: Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. |
Gustaf Ahdritz; Nazim Bouatta; Sachin Kadyan; Lukas Jarosch; Dan Berenberg; Ian Fisk; Andrew Watkins; Stephen Ra; Richard Bonneau; Mohammed AlQuraishi; |
35 | Towards Automated Circuit Discovery for Mechanistic Interpretability. Highlight: This work proposes a novel algorithm, Automatic Circuit DisCovery (ACDC), to automate the identification of the important units in the network. |
Arthur Conmy; Augustine Mavor-Parker; Aengus Lynch; Stefan Heimersheim; Adrià Garriga-Alonso; |
36 | Does Localization Inform Editing? Surprising Differences in Causality-Based Localization Vs. Knowledge Editing in Language Models. Highlight: In this paper, we find that we can change how a fact is stored in a model by editing weights that are in a different location than where existing methods suggest that the fact is stored. |
Peter Hase; Mohit Bansal; Been Kim; Asma Ghandeharioun; |
37 | Diffusion Self-Guidance for Controllable Image Generation. Highlight: We introduce self-guidance, a method that provides precise control over properties of the generated image by guiding the internal representations of diffusion models. |
Dave Epstein; Allan Jabri; Ben Poole; Alexei Efros; Aleksander Holynski; |
38 | ToolkenGPT: Augmenting Frozen Language Models with Massive Tools Via Tool Embeddings. Highlight: Although the latter method offers adaptability to new tools, it struggles with the inherent context length constraint of LLMs when many new tools are presented, and mastering a new set of tools with few-shot examples remains challenging, resulting in suboptimal performance. To address these limitations, we propose a novel solution, named **ToolkenGPT**, wherein LLMs effectively learn to master tools by predicting them as tokens through **tool embeddings** for solving complex tasks. |
Shibo Hao; Tianyang Liu; Zhen Wang; Zhiting Hu; |
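A minimal sketch of the core idea, tool embeddings as extra output-vocabulary rows on a frozen LM head, so only the tool rows receive gradients; the class layout and initialization details are illustrative, not the paper's implementation:

```python
import torch
import torch.nn as nn

class ToolkenHead(nn.Module):
    """Frozen vocabulary head plus trainable per-tool embeddings."""
    def __init__(self, lm_head: nn.Linear, num_tools: int):
        super().__init__()
        self.lm_head = lm_head
        for p in self.lm_head.parameters():
            p.requires_grad = False          # the LM stays frozen
        self.toolkens = nn.Parameter(        # one learnable row per tool
            torch.randn(num_tools, lm_head.in_features) * 0.02)

    def forward(self, hidden):               # hidden: (..., d_model)
        word_logits = self.lm_head(hidden)   # logits over the original vocab
        tool_logits = hidden @ self.toolkens.T
        return torch.cat([word_logits, tool_logits], dim=-1)
```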
39 | Inference-Time Intervention: Eliciting Truthful Answers from A Language Model. Highlight: We introduce Inference-Time Intervention (ITI), a technique designed to enhance the truthfulness of large language models (LLMs). |
Kenneth Li; Oam Patel; Fernanda Viégas; Hanspeter Pfister; Martin Wattenberg; |
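A minimal sketch of such an intervention as a PyTorch forward hook that shifts a module's activations along a precomputed truth-correlated probe direction; it assumes the hooked module returns a plain tensor, and `alpha` and the choice of module are illustrative assumptions:

```python
import torch

def add_truthfulness_shift(module, direction, alpha=5.0):
    """Shift the module's output along a unit direction at inference time."""
    direction = direction / direction.norm()
    def hook(mod, inputs, output):
        # Shift every token position's activation along the probe direction
        return output + alpha * direction.to(output.dtype)
    return module.register_forward_hook(hook)  # call .remove() to undo
```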
40 | Objaverse-XL: A Colossal Universe of 3D Objects. Highlight: In this work, we present Objaverse-XL, a dataset of over 10 million 3D objects. |
Matt Deitke; Ruoshi Liu; Matthew Wallingford; Huong Ngo; Oscar Michel; Aditya Kusupati; Alan Fan; Christian Laforte; Vikram Voleti; Samir Yitzhak Gadre; Eli VanderBilt; Aniruddha Kembhavi; Carl Vondrick; Georgia Gkioxari; Kiana Ehsani; Ludwig Schmidt; Ali Farhadi; |
41 | Stable Bias: Evaluating Societal Representations in Diffusion Models. Highlight: This evaluation, however, is made more difficult by the synthetic nature of these systems’ outputs: common definitions of diversity are grounded in social categories of people living in the world, whereas the artificial depictions of fictive humans created by these systems have no inherent gender or ethnicity. To address this need, we propose a new method for exploring the social biases in TTI systems. |
Sasha Alexandra Luccioni; Christopher Akiki; Margaret Mitchell; Yacine Jernite; |
42 | MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion. Highlight: This paper introduces MVDiffusion, a simple yet effective multi-view image generation method for scenarios where pixel-to-pixel correspondences are available, such as perspective crops from panorama or multi-view images given depth/pose. |
Shitao Tang; Fuyang Zhang; Jiacheng Chen; Peng Wang; Yasutaka Furukawa; |
43 | Language Models Can Solve Computer Tasks. Highlight: However, previous approaches to this problem require large amounts of expert demonstrations and task-specific reward functions, both of which are impractical for new tasks. In this work, we show that a pre-trained large language model (LLM) agent can execute computer tasks guided by natural language using a simple prompting scheme where the agent recursively criticizes and improves its output (RCI). |
Geunwoo Kim; Pierre Baldi; Stephen McAleer; |
44 | Learning Universal Policies Via Text-Guided Video Generation. Highlight: Recent progress in text-guided image synthesis has yielded models with an impressive ability to generate complex novel images, exhibiting combinatorial generalization across domains. Motivated by this success, we investigate whether such tools can be used to construct more general-purpose agents. |
Yilun Du; Mengjiao (Sherry) Yang; Bo Dai; Hanjun Dai; Ofir Nachum; Josh Tenenbaum; Dale Schuurmans; Pieter Abbeel; |
45 | Holistic Evaluation of Text-to-Image Models. Highlight: However, existing evaluations primarily focus on image-text alignment and quality. To address this limitation, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). |
Tony Lee; Michihiro Yasunaga; Chenlin Meng; Yifan Mai; Joon Sung Park; Agrim Gupta; Yunzhi Zhang; Deepak Narayanan; Hannah Teufel; Marco Bellagente; Minguk Kang; Taesung Park; Jure Leskovec; Jun-Yan Zhu; Fei-Fei Li; Jiajun Wu; Stefano Ermon; Percy Liang; |
46 | Generating Images with Multimodal Language Models. Highlight: We propose a method to fuse frozen text-only large language models (LLMs) with pre-trained image encoder and decoder models, by mapping between their embedding spaces. |
Jing Yu Koh; Daniel Fried; Russ Salakhutdinov; |
47 | Simple and Controllable Music Generation. Highlight: We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. |
Jade Copet; Felix Kreuk; Itai Gat; Tal Remez; Gabriel Synnaeve; Yossi Adi; Alexandre Defossez; |
48 | Structural Pruning for Diffusion Models. Highlight: The impressive capability of these models, however, often entails significant computational overhead during both training and inference. To tackle this challenge, we present Diff-Pruning, an efficient compression method tailored for learning lightweight diffusion models from pre-existing ones, without the need for extensive re-training. |
Gongfan Fang; Xinyin Ma; Xinchao Wang; |
49 | Where Are We in The Search for An Artificial Visual Cortex for Embodied Intelligence? Highlight: We present the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) or visual ‘foundation models’ for Embodied AI. |
Arjun Majumdar; Karmesh Yadav; Sergio Arnaud; Jason Yecheng Ma; Claire Chen; Sneha Silwal; Aryan Jain; Vincent-Pierre Berges; Tingfan Wu; Jay Vakil; Pieter Abbeel; Jitendra Malik; Dhruv Batra; Yixin Lin; Oleksandr Maksymets; Aravind Rajeswaran; Franziska Meier; |
50 | Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from The Data Perspective. Highlight: In this work, we aim to advance our understanding by presenting a straightforward and unified explanation from the data perspective. |
Huayang Li; Tian Lan; Zihao Fu; Deng Cai; Lemao Liu; Nigel Collier; Taro Watanabe; Yixuan Su; |
51 | Are Aligned Neural Networks Adversarially Aligned? Highlight: They respond helpfully to user questions, but when asked to perform some behavior that would cause harm, will politely refuse. We study to what extent these models are aligned even when interacting with an adversarial user who constructs worst-case adversarial example inputs. |
Nicholas Carlini; Florian Tramer; Daphne Ippolito; Ludwig Schmidt; Milad Nasr; Matthew Jagielski; Pang Wei Koh; Irena Gao; Christopher A. Choquette-Choo; |
52 | Patch N’ Pack: NaViT, A Vision Transformer for Any Aspect Ratio and Resolution. Highlight: However, models such as the Vision Transformer (ViT) offer flexible sequence-based modeling, and hence varying input sequence lengths. We take advantage of this with NaViT (Native Resolution ViT) which uses sequence packing during training to process inputs of arbitrary resolutions and aspect ratios. |
Mostafa Dehghani; Basil Mustafa; Josip Djolonga; Jonathan Heek; Matthias Minderer; Mathilde Caron; Andreas Steiner; Joan Puigcerver; Robert Geirhos; Ibrahim Alabdulmohsin; Avital Oliver; Piotr Padlewski; Alexey Gritsenko; Mario Lucic; Neil Houlsby; |
53 | Counterfactual Memorization in Neural Language Models. Highlight: We formulate a notion of counterfactual memorization which characterizes how a model’s predictions change if a particular document is omitted during training. |
Chiyuan Zhang; Daphne Ippolito; Katherine Lee; Matthew Jagielski; Florian Tramer; Nicholas Carlini; |
54 | Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes. Highlight: In practice, the societal impact of machine learning is determined by the surrounding context of machine learning deployments. To capture this, we introduce *ecosystem-level analysis*: rather than analyzing a single model, we consider the collection of models that are deployed in a given context. |
Connor Toups; Rishi Bommasani; Kathleen Creel; Sarah Bana; Dan Jurafsky; Percy Liang; |
55 | Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond. Highlight: In this work, we show that commonly used homophily measures have critical drawbacks preventing the comparison of homophily levels across different datasets. |
Oleg Platonov; Denis Kuznedelev; Artem Babenko; Liudmila Prokhorenkova; |
56 | Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs. Highlight: Unfortunately, access to LLMs today is largely restricted to black-box text generation APIs; raw runtimes measured through this interface do not satisfy these desiderata: model providers can implement software and hardware optimizations orthogonal to the model, and shared infrastructure introduces performance contention. We propose a new metric for inference efficiency that puts models on equal footing as though they were served on uniform hardware and software and without performance contention. |
Deepak Narayanan; Keshav Santhanam; Peter Henderson; Rishi Bommasani; Tony Lee; Percy Liang; |
57 | Lexinvariant Language Models. Highlight: First, we prove that we can construct a lexinvariant LM to converge to the true language model at a uniform rate that is polynomial in terms of the context length, with a constant factor that is sublinear in the vocabulary size. Second, to build a lexinvariant LM, we simply encode tokens using random Gaussian vectors, such that each token maps to the same representation within each sequence but different representations across sequences. |
Qian Huang; Eric Zelikman; Sarah Chen; Yuhuai Wu; Gregory Valiant; Percy Liang; |
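A minimal sketch of that encoding: each sequence draws its own Gaussian codebook, so a token's representation is stable within a sequence but varies across sequences (dimensions and scaling are illustrative):

```python
import torch

def lexinvariant_embed(token_ids, vocab_size, d_model):
    """token_ids: (batch, seq_len) -> embeddings: (batch, seq_len, d_model)."""
    batch = token_ids.shape[0]
    # One independent random Gaussian codebook per sequence
    codebooks = torch.randn(batch, vocab_size, d_model) / d_model ** 0.5
    index = token_ids.unsqueeze(-1).expand(-1, -1, d_model)
    return torch.gather(codebooks, 1, index)
```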
58 | Distributed Inference and Fine-tuning of Large Language Models Over The Internet. Highlight: In this work, we investigate methods for cost-efficient inference and fine-tuning of LLMs, comparing local and distributed strategies. |
Alexander Borzunov; Dmitry Baranchuk; Tim Dettmers; Max Ryabinin; Younes Belkada; Artem Chumachenko; Pavel Samygin; Colin Raffel; |
59 | Red Teaming Deep Neural Networks with Feature Synthesis Tools. Highlight: Our key insight is that we can train models that respond to specific triggers (e.g., a specific patch inserted into an image) with specific outputs (i.e. a label) and then evaluate interpretability tools based on whether they help humans identify these triggers. |
Stephen Casper; Tong Bu; Yuxiao Li; Jiawei Li; Kevin Zhang; Kaivalya Hariharan; Dylan Hadfield-Menell; |
60 | InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback. Highlight: While LLMs have recently exhibited promising coding capabilities, current coding benchmarks mostly consider a static instruction-to-code sequence transduction process, which has the potential for error propagation and a disconnect between the generated code and its final execution environment. To address this gap, we introduce InterCode, a lightweight, flexible, and easy-to-use framework for constructing interactive code environments with multiple types of feedback signals. |
John Yang; Akshara Prabhakar; Karthik Narasimhan; Shunyu Yao; |
61 | LLaVA-Med: Training A Large Language-and-Vision Assistant for Biomedicine in One Day. Highlight: In this paper, we propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images. |
Chunyuan Li; Cliff Wong; Sheng Zhang; Naoto Usuyama; Haotian Liu; Jianwei Yang; Tristan Naumann; Hoifung Poon; Jianfeng Gao; |
62 | Statistical Knowledge Assessment for Generative Language Models. Highlight: Given varying prompts, does a GLM consistently generate factually correct answers? In this paper, we introduce a statistical knowledge assessment framework guided by latent variables and the KaRR metric, which quantifies a model’s knowledge by computing its continuous probability across diverse text forms. |
Qingxiu Dong; Jingjing Xu; Lingpeng Kong; Zhifang Sui; Lei Li; |
63 | Jailbroken: How Does LLM Safety Training Fail? Highlight: Large language models trained for safety and harmlessness remain susceptible to adversarial misuse, as evidenced by the prevalence of "jailbreak" attacks on early releases of ChatGPT that elicit undesired behavior. Going beyond recognition of the issue, we investigate why such attacks succeed and how they can be created. |
Alexander Wei; Nika Haghtalab; Jacob Steinhardt; |
64 | Likelihood-Based Diffusion Language Models. Highlight: In this work, we take the first steps towards closing the perplexity gap between autoregressive and diffusion-based language models, with the goal of building and releasing a diffusion model which outperforms the smallest widely-adopted autoregressive model (GPT-2 124M). |
Ishaan Gulrajani; Tatsunori Hashimoto; |
65 | OpenMask3D: Open-Vocabulary 3D Instance Segmentation. Highlight: While such a representation can be directly employed to perform semantic segmentation, existing methods have limitations in their ability to handle object instances. In this work, we address this limitation and propose OpenMask3D, a zero-shot approach for open-vocabulary 3D instance segmentation. |
Ayca Takmaz; Elisabetta Fedele; Robert Sumner; Marc Pollefeys; Federico Tombari; Francis Engelmann; |
66 | VisIT-Bench: A Dynamic Benchmark for Evaluating Instruction-Following Vision-and-Language Models. Highlight: We introduce VisIT-Bench, a robust benchmark for diverse real-life vision-language instructions across 70 tasks, from recognition to reasoning. |
Yonatan Bitton; Hritik Bansal; Jack Hessel; Rulin Shao; Wanrong Zhu; Anas Awadalla; Josh Gardner; Rohan Taori; Ludwig Schmidt; |
67 | How Far Can Camels Go? Exploring The State of Instruction Tuning on Open Resources. Highlight: In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. |
Yizhong Wang; Hamish Ivison; Pradeep Dasigi; Jack Hessel; Tushar Khot; Khyathi Chandu; David Wadden; Kelsey MacMillan; Noah Smith; Iz Beltagy; Hannaneh Hajishirzi; |
68 | Why Diffusion Models Memorize and How to Mitigate Copying. Highlight: Images generated by diffusion models like Stable Diffusion are increasingly widespread. Recent works and even lawsuits have shown that these models are prone to replicating their training data, unbeknownst to the user. In this paper, we first analyze this memorization problem in text-to-image diffusion models. |
Gowthami Somepalli; Vasu Singla; Micah Goldblum; Jonas Geiping; Tom Goldstein; |
69 | 3D-LLM: Injecting The 3D World Into Large Language Models. Highlight: In this work, we propose to inject the 3D world into large language models, and introduce a whole new family of 3D-LLMs. |
Yining Hong; Haoyu Zhen; Peihao Chen; Shuhong Zheng; Yilun Du; Zhenfang Chen; Chuang Gan; |
70 | Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning. Highlight: In this paper, we devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning capabilities. |
Mitsuhiko Nakamoto; Yuexiang Zhai; Anikait Singh; Max Sobol Mark; Yi Ma; Chelsea Finn; Aviral Kumar; Sergey Levine; |
71 | Emergent Correspondence from Image Diffusion. Highlight: In this paper, we find that correspondence emerges in diffusion models without any explicit supervision. |
Luming Tang; Menglin Jia; Qianqian Wang; Cheng Perng Phoo; Bharath Hariharan; |
72 | Fine-Tuning Language Models with Just Forward Passes. Highlight: In this work, we propose a memory-efficient zeroth-order optimizer (MeZO), adapting the classical ZO-SGD method to operate in-place, thereby fine-tuning LMs with the same memory footprint as inference. |
Sadhika Malladi; Tianyu Gao; Eshaan Nichani; Alex Damian; Jason Lee; Danqi Chen; Sanjeev Arora; |
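A minimal sketch of the in-place SPSA estimate at the heart of this approach: the random direction is regenerated from a seed rather than stored, keeping memory at inference level; `loss_fn(model)` is a hypothetical closure returning a scalar minibatch loss, and the hyperparameters are illustrative:

```python
import torch

@torch.no_grad()
def mezo_step(model, loss_fn, lr=1e-6, eps=1e-3, seed=0):
    """One zeroth-order update: two forward passes, no stored gradients."""
    def perturb(scale):
        torch.manual_seed(seed)  # regenerates the identical direction z
        for p in model.parameters():
            p.add_(torch.randn_like(p), alpha=scale)

    perturb(+eps)
    loss_plus = loss_fn(model)       # loss at theta + eps * z
    perturb(-2 * eps)
    loss_minus = loss_fn(model)      # loss at theta - eps * z
    perturb(+eps)                    # restore the original theta

    g = float(loss_plus - loss_minus) / (2 * eps)  # projected-gradient estimate
    torch.manual_seed(seed)
    for p in model.parameters():
        p.add_(torch.randn_like(p), alpha=-lr * g)
```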
73 | LayoutGPT: Compositional Visual Planning and Generation with Large Language Models. Highlight: We propose LayoutGPT, a method to compose in-context visual demonstrations in style sheet language to enhance visual planning skills of LLMs. |
Weixi Feng; Wanrong Zhu; Tsu-Jui Fu; Varun Jampani; Arjun Akula; Xuehai He; S Basu; Xin Eric Wang; William Yang Wang; |
74 | What Makes Good Examples for Visual In-Context Learning? Highlight: To demystify in-context learning in computer vision, we conduct extensive research and identify a critical problem: downstream performance is highly sensitive to the choice of visual in-context examples. To address this problem, we propose a prompt retrieval framework specifically for large vision models, allowing the selection of in-context examples to be fully automated. |
Yuanhan Zhang; Kaiyang Zhou; Ziwei Liu; |
75 | Stable and Low-precision Training for Large-scale Vision-language Models. Highlight: We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models. |
Mitchell Wortsman; Tim Dettmers; Luke Zettlemoyer; Ari Morcos; Ali Farhadi; Ludwig Schmidt; |
76 | Battle of The Backbones: A Large-Scale Comparison of Pretrained Models Across Computer Vision Tasks. Highlight: Battle of the Backbones (BoB) makes this choice easier by benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more. |
Micah Goldblum; Hossein Souri; Renkun Ni; Manli Shu; Viraj Prabhu; Gowthami Somepalli; Prithvijit Chattopadhyay; Adrien Bardes; Mark Ibrahim; Judy Hoffman; Rama Chellappa; Andrew Wilson; Tom Goldstein; |
77 | LLM-Pruner: On The Structural Pruning of Large Language Models. Highlight: With LLM being a general-purpose task solver, we explore its compression in a task-agnostic manner, which aims to preserve the multi-task solving and language generation ability of the original LLM. |
Xinyin Ma; Gongfan Fang; Xinchao Wang; |
78 | RAPHAEL: Text-to-Image Generation Via Large Mixture of Diffusion Paths. Highlight: We introduce a text-conditional image diffusion model, termed RAPHAEL, to generate highly artistic images, which accurately portray the text prompts, encompassing multiple nouns, adjectives, and verbs. |
Zeyue Xue; Guanglu Song; Qiushan Guo; Boxiao Liu; Zhuofan Zong; Yu Liu; Ping Luo; |
79 | Guide Your Agent with Adaptive Multimodal Rewards. Highlight: In this paper, we instead propose to utilize the knowledge captured within large vision-language models for improving the generalization capability of control agents. |
Changyeon Kim; Younggyo Seo; Hao Liu; Lisa Lee; Jinwoo Shin; Honglak Lee; Kimin Lee; |
80 | Does Progress on ImageNet Transfer to Real-world Datasets? Highlight: In particular, we study datasets collected with the goal of solving real-world tasks (e.g., classifying images from camera traps or satellites), as opposed to web-scraped benchmarks collected for comparing models. |
Alex Fang; Simon Kornblith; Ludwig Schmidt; |
81 | The Impact of Positional Encoding on Length Generalization in Transformers. Highlight: In this paper, we conduct a systematic empirical study comparing the length generalization performance of decoder-only Transformers with five different position encoding approaches including Absolute Position Embedding (APE), T5’s Relative PE, ALiBi, and Rotary, in addition to Transformers without positional encoding (NoPE). |
Amirhossein Kazemnejad; Inkit Padhi; Karthikeyan Natesan Ramamurthy; Payel Das; Siva Reddy; |
82 | TextDiffuser: Diffusion Models As Text Painters. Highlight: Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text. To address this issue, we introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds. |
Jingye Chen; Yupan Huang; Tengchao Lv; Lei Cui; Qifeng Chen; Furu Wei; |
83 | Paxion: Patching Action Knowledge in Video-Language Foundation Models. Highlight: Despite recent video-language models’ (VidLM) impressive performance on various benchmark tasks, our diagnostic tasks reveal their surprising deficiency (near-random performance) in action knowledge, suggesting that current models rely on object recognition abilities as a shortcut for action understanding. To remedy this, we propose a novel framework, **Paxion**, along with a new **Discriminative Video Dynamics Modeling (DVDM)** objective. |
Zhenhailong Wang; Ansel Blume; Sha Li; Genglin Liu; Jaemin Cho; Zineng Tang; Mohit Bansal; Heng Ji; |
84 | C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models. Highlight: We present C-Eval, the first comprehensive Chinese evaluation suite designed to assess advanced knowledge and reasoning abilities of foundation models in a Chinese context. |
Yuzhen Huang; Yuzhuo Bai; Zhihao Zhu; Junlei Zhang; Jinghan Zhang; Tangjun Su; Junteng Liu; Chuancheng Lv; Yikai Zhang; jiayi lei; Yao Fu; Maosong Sun; Junxian He; |
85 | Neural Functional Transformers. Highlight: Nevertheless, constructing expressive and efficient neural functional architectures that can handle high-dimensional weight-space objects remains challenging. This paper uses the attention mechanism to define a novel set of permutation equivariant weight-space layers and composes them into deep equivariant models called neural functional Transformers (NFTs). |
Allan Zhou; Kaien Yang; Yiding Jiang; Kaylee Burns; Winnie Xu; Samuel Sokota; J. Zico Kolter; Chelsea Finn; |
86 | Permutation Equivariant Neural Functionals. Highlight: We approach the design of neural functionals through the lens of symmetry, in particular by focusing on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order. We introduce a framework for building *permutation equivariant* neural functionals, whose architectures encode these symmetries as an inductive bias. |
Allan Zhou; Kaien Yang; Kaylee Burns; Adriano Cardace; Yiding Jiang; Samuel Sokota; J. Zico Kolter; Chelsea Finn; |
87 | Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision. Highlight: However, this dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision and the related issues on quality, reliability, diversity, self-consistency, and undesirable biases. To address these challenges, we propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision. |
Zhiqing Sun; Yikang Shen; Qinhong Zhou; Hongxin Zhang; Zhenfang Chen; David Cox; Yiming Yang; Chuang Gan; |
88 | Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion. Highlight: In this work, we investigate the task of text-to-image (T2I) synthesis under the abstract-to-intricate setting, i.e., generating intricate visual content from simple abstract text prompts. |
Shengqiong Wu; Hao Fei; Hanwang Zhang; Tat-Seng Chua; |
89 | Focused Transformer: Contrastive Training for Context Scaling. Highlight: We pinpoint a key challenge, referred to as the distraction issue, where keys associated with distinct semantic values may overlap, making them challenging to differentiate. To address this issue, we propose the Focused Transformer (FoT), a method that utilizes a training process inspired by contrastive learning. |
Szymon Tworkowski; Konrad Staniszewski; Mikołaj Pacek; Yuhuai Wu; Henryk Michalewski; Piotr Miłoś; |
90 | Language Models Augmented with Decoupled Memory. Highlight: Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Decoupled-Memory-Augmented LLMs (DeMA), which enables LLMs to memorize long history. |
Weizhi Wang; Li Dong; Hao Cheng; Xiaodong Liu; Xifeng Yan; Jianfeng Gao; Furu Wei; |
91 | Extensible Prompts for Language Models on Zero-shot Language Style Customization. Highlight: We propose eXtensible Prompt (X-Prompt) for prompting a large language model (LLM) beyond natural language (NL). |
Tao Ge; Hu Jing; Li Dong; Shaoguang Mao; Yan Xia; Xun Wang; Si-Qing Chen; Furu Wei; |
92 | PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning About Change. Highlight: There is a strong need for systematic and extensible planning benchmarks with sufficient diversity to evaluate whether LLMs have innate planning capabilities. Motivated by this, we propose PlanBench, an extensible benchmark suite based on the kinds of domains used in the automated planning community, especially in the International Planning Competition, to test the capabilities of LLMs in planning or reasoning about actions and change. |
Karthik Valmeekam; Matthew Marquez; Alberto Olmo; Sarath Sreedharan; Subbarao Kambhampati; |
93 | On The Planning Abilities of Large Language Models – A Critical Investigation. Highlight: Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper, we set out to investigate their planning capabilities. |
Karthik Valmeekam; Matthew Marquez; Sarath Sreedharan; Subbarao Kambhampati; |
94 | Scaling in Depth: Unlocking Robustness Certification on ImageNet. Highlight: This paper investigates strategies for expanding certifiably robust training to larger, deeper models. |
Kai Hu; Andy Zou; Zifan Wang; Klas Leino; Matt Fredrikson; |
95 | Grounding Neural Inference with Satisfiability Modulo Theories. Highlight: In this paper we present a set of techniques for integrating Satisfiability Modulo Theories (SMT) solvers into the forward and backward passes of a deep network layer, called SMTLayer. |
Matt Fredrikson; Kaiji Lu; Somesh Jha; Saranya Vijayakumar; Vijay Ganesh; Zifan Wang; |
96 | Benchmarking Distribution Shift in Tabular Data with TableShift. Highlight: As a consequence, the robustness of tabular models to distribution shift is poorly understood. To address this issue, we introduce TableShift, a distribution shift benchmark for tabular data. |
Josh Gardner; Zoran Popovic; Ludwig Schmidt; |
97 | Improving Multimodal Datasets with Image Captioning. Highlight: Our work focuses on caption quality as one major source of noise, and studies the effectiveness of generated captions in increasing the utility of web-scraped datapoints with nondescript text. |
Thao Nguyen; Samir Yitzhak Gadre; Gabriel Ilharco; Sewoong Oh; Ludwig Schmidt; |
98 | Improving CLIP Training with Language Rewrites. Highlight: In this paper, we introduce Language augmented CLIP (LaCLIP), a simple yet highly effective approach to enhance CLIP training through language rewrites. |
Lijie Fan; Dilip Krishnan; Phillip Isola; Dina Katabi; Yonglong Tian; |
99 | Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification. Highlight: We propose Embroid, a method which computes multiple representations of a dataset under different embedding functions, and uses the consistency between the LM predictions for neighboring samples to identify mispredictions. |
Neel Guha; Mayee Chen; Kush Bhatia; Azalia Mirhoseini; Frederic Sala; Christopher Ré; |
100 | OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents. Highlight: We introduce the OBELICS dataset, an open web-scale filtered dataset of interleaved image-text documents comprising 141 million web pages extracted from Common Crawl, 353 million associated images, and 115 billion text tokens. |
Hugo Laurençon; Lucile Saulnier; Leo Tronchon; Stas Bekman; Amanpreet Singh; Anton Lozhkov; Thomas Wang; Siddharth Karamcheti; Alexander Rush; Douwe Kiela; Matthieu Cord; Victor Sanh; |
101 | Optimizing Prompts for Text-to-Image Generation. Highlight: Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts. |
Yaru Hao; Zewen Chi; Li Dong; Furu Wei; |
102 | RealTime QA: What’s The Answer Right Now? Highlight: We introduce RealTime QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version). |
Jungo Kasai; Keisuke Sakaguchi; yoichi takahashi; Ronan Le Bras; Akari Asai; Xinyan Yu; Dragomir Radev; Noah Smith; Yejin Choi; Kentaro Inui; |
103 | Tracr: Compiled Transformers As A Laboratory for Interpretability. Highlight: We show how to compile human-readable programs into standard decoder-only transformer models. |
David Lindner; Janos Kramar; Sebastian Farquhar; Matthew Rahtz; Tom McGrath; Vladimir Mikulik; |
104 | VisionLLM: Large Language Model Is Also An Open-Ended Decoder for Vision-Centric Tasks. Highlight: In this work, we present an LLM-based framework for vision-centric tasks, termed VisionLLM. |
Wenhai Wang; Zhe Chen; Xiaokang Chen; Jiannan Wu; Xizhou Zhu; Gang Zeng; Ping Luo; Tong Lu; Jie Zhou; Yu Qiao; Jifeng Dai; |
105 | GenEval: An Object-focused Framework for Evaluating Text-to-image Alignment. Highlight: In this paper, we introduce GenEval, an object-focused framework to evaluate compositional image properties such as object co-occurrence, position, count, and color. |
Dhruba Ghosh; Hannaneh Hajishirzi; Ludwig Schmidt; |
106 | What Is The Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models. Highlight: We show that with the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters (i.e., the product of all layer matrices), which in turn leads to better generalization. |
Khashayar Gatmiry; Zhiyuan Li; Tengyu Ma; Sashank Reddi; Stefanie Jegelka; Ching-Yao Chuang; |
107 | Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, LLMs have inherent limitations as they are incapable of accessing up-to-date information (stored on the Web or in task-specific knowledge bases), using external tools, and performing precise mathematical and logical reasoning. In this paper, we present Chameleon, an AI system that mitigates these limitations by augmenting LLMs with plug-and-play modules for compositional reasoning. |
Pan Lu; Baolin Peng; Hao Cheng; Michel Galley; Kai-Wei Chang; Ying Nian Wu; Song-Chun Zhu; Jianfeng Gao; |
108 | Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We use a linear assignment algorithm to place images into longer bodies of text using CLIP features, a process that we show outperforms alternatives. |
Wanrong Zhu; Jack Hessel; Anas Awadalla; Samir Yitzhak Gadre; Jesse Dodge; Alex Fang; Youngjae Yu; Ludwig Schmidt; William Yang Wang; Yejin Choi; |
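The placement step can be illustrated with an off-the-shelf linear-assignment solver; the random unit vectors below are hypothetical stand-ins for CLIP features of a document's images and text chunks.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 32));  img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(7, 32));  txt /= np.linalg.norm(txt, axis=1, keepdims=True)

sim = img @ txt.T                             # cosine similarity, images x chunks
rows, cols = linear_sum_assignment(-sim)      # maximise total matched similarity
for i, j in zip(rows, cols):
    print(f"image {i} -> text chunk {j} (sim {sim[i, j]:+.2f})")
```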
109 | Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose using reinforcement learning (RL) to fine-tune text-to-image models. |
Ying Fan; Olivia Watkins; Yuqing Du; Hao Liu; Moonkyung Ryu; Craig Boutilier; Pieter Abbeel; Mohammad Ghavamzadeh; Kangwook Lee; Kimin Lee; |
110 | Provably Bounding Neural Network Preimages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present the INVPROP algorithm for verifying properties over the preimage of a linearly constrained output set of a neural network, which can be combined with branch-and-bound to increase precision. |
Christopher Brix; Suhas Kotha; Huan Zhang; J. Zico Kolter; Krishnamurthy Dvijotham; |
111 | DataComp: In Search of The Next Generation of Multimodal Datasets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the machine learning ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl. |
Samir Yitzhak Gadre; Gabriel Ilharco; Alex Fang; Jonathan Hayase; Georgios Smyrnis; Thao Nguyen; Ryan Marten; Mitchell Wortsman; Dhruba Ghosh; Jieyu Zhang; Eyal Orgad; Rahim Entezari; Giannis Daras; Sarah Pratt; Vivek Ramanujan; Yonatan Bitton; Kalyani Marathe; Stephen Mussmann; Richard Vencu; Mehdi Cherti; Ranjay Krishna; Pang Wei Koh; Olga Saukh; Alexander Ratner; Shuran Song; Hannaneh Hajishirzi; Ali Farhadi; Romain Beaumont; Sewoong Oh; Alex Dimakis; Jenia Jitsev; Yair Carmon; Vaishaal Shankar; Ludwig Schmidt; |
112 | On The Connection Between Pre-training Data Diversity and Fine-tuning Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Pre-training has been widely adopted in deep learning to improve model performance, especially when the training data for a target task is limited. In our work, we seek to understand the implications of this training strategy on the generalization properties of downstream models. |
Vivek Ramanujan; Thao Nguyen; Sewoong Oh; Ali Farhadi; Ludwig Schmidt; |
113 | Ordering-based Conditions for Global Convergence of Policy Gradient Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove that, for finite-arm bandits with linear function approximation, the global convergence of policy gradient (PG) methods depends on inter-related properties between the policy update and the representation. |
Jincheng Mei; Bo Dai; Alekh Agarwal; Mohammad Ghavamzadeh; Csaba Szepesvari; Dale Schuurmans; |
114 | Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact, an entire family of generative models can be constructed by varying this choice. |
Arpit Bansal; Eitan Borgnia; Hong-Min Chu; Jie Li; Hamid Kazemi; Furong Huang; Micah Goldblum; Jonas Geiping; Tom Goldstein; |
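The degradation-agnostic sampler this observation leads to is short enough to sketch directly. Below, a toy mean-interpolation "blur" plays the role of the degradation D, and an oracle restorer stands in for the learned restoration network; only the update rule x_{s-1} = x_s - D(x0_hat, s) + D(x0_hat, s-1) reflects the paper's sampling scheme.

```python
import numpy as np

def degrade(x0, t, T):
    """Toy noiseless degradation: interpolate toward the image mean."""
    a = t / T
    return (1 - a) * x0 + a * x0.mean()

def cold_sample(x_T, restore, T):
    """Cold-diffusion sampling: x_{s-1} = x_s - D(x0_hat, s) + D(x0_hat, s-1)."""
    x = x_T
    for s in range(T, 0, -1):
        x0_hat = restore(x, s)
        x = x - degrade(x0_hat, s, T) + degrade(x0_hat, s - 1, T)
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 8))
x = cold_sample(degrade(x0, 16, 16), restore=lambda x, s: x0, T=16)
print(np.abs(x - x0).max())   # ~0: the sampler exactly inverts the degradation
```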
115 | Collaborative Development of NLP Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the exhaustive delineation of a concept is challenging, and an improper approach can create shortcuts or interfere with original data or other concepts. To address these challenges, we introduce CoDev, a framework that enables multi-user interaction with the model, thereby mitigating individual limitations. |
Fereshte Khani; Marco Tulio Ribeiro; |
116 | Text Alignment Is An Efficient Unified Model for Massive NLP Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose text alignment as an efficient unified model for a wide range of crucial tasks involving text entailment, similarity, question answering (and answerability), factual consistency, and so forth. |
Yuheng Zha; Yichi Yang; Ruichen Li; Zhiting Hu; |
117 | Proximity-Informed Calibration for Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the empirical findings, we propose ProCal, a plug-and-play algorithm with a theoretical guarantee to adjust sample confidence based on proximity. |
Miao Xiong; Ailin Deng; Pang Wei Koh; Jiaying Wu; Shen Li; Jianqing Xu; Bryan Hooi; |
118 | LinkerNet: Fragment Poses and Linker Co-Design with 3D Equivariant Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address a more general problem where the poses of the fragments are *unknown* in 3D space. |
Jiaqi Guan; Xingang Peng; PeiQi Jiang; Yunan Luo; Jian Peng; Jianzhu Ma; |
119 | Language Models Meet World Models: Embodied Experiences Enhance Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The limitation arises from the fact that LMs are trained only on written text and miss essential embodied knowledge and skills. In this paper, we propose a new paradigm of enhancing LMs by finetuning them with world models, to gain diverse embodied knowledge while retaining their general language capabilities. |
Jiannan Xiang; Tianhua Tao; Yi Gu; Tianmin Shu; Zirui Wang; Zichao Yang; Zhiting Hu; |
120 | The Learnability of In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a first-of-its-kind PAC-based framework for in-context learnability, and use it to provide the first finite-sample complexity results for the in-context learning setup. |
Noam Wies; Yoav Levine; Amnon Shashua; |
121 | Isotropic Loss Design for Non-contrastive SSL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we analytically study learning dynamics under cosine similarity in the eigenspace of the predictor network and show that collapse is avoided through implicit variance regularization similar to Euclidean loss but with fundamentally different dynamics. |
Manu Srinath Halvagal; Axel Laborieux; Friedemann Zenke; |
122 | DreamHuman: Animatable 3D Avatars from Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present *DreamHuman*, a method to generate realistic animatable 3D human avatar models entirely from textual descriptions. |
Nikos Kolotouros; Thiemo Alldieck; Andrei Zanfir; Eduard Bazavan; Mihai Fieraru; Cristian Sminchisescu; |
123 | Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, if we want to make use of the semantic knowledge in a language model while still situating it in an embodied setting, we must construct an action sequence that is both likely according to the language model and also realizable according to grounded models of the environment. We frame this as a problem similar to probabilistic filtering: decode a sequence that both has high probability under the language model and high probability under a set of grounded model objectives. |
Wenlong Huang; Fei Xia; Dhruv Shah; Danny Driess; Andy Zeng; Yao Lu; Pete Florence; Igor Mordatch; Sergey Levine; Karol Hausman; brian ichter; |
124 | Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the belief that the inductive bias of a model architecture is more important than the bias mitigation strategy, we take a different approach to bias mitigation. |
Samuel Dooley; Rhea Sukthanker; John Dickerson; Colin White; Frank Hutter; Micah Goldblum; |
125 | Setting The Trap: Capturing and Defeating Backdoor Threats in PLMs Through Honeypots Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, our objective is to develop a backdoor-resistant tuning procedure that yields a backdoor-free model, no matter whether the fine-tuning dataset contains poisoned samples. |
Ruixiang Tang; Jiayi Yuan; Yiming Li; Zirui Liu; Rui Chen; Xia Hu; |
126 | Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The impact of time discretization on RL methods has not been fully characterized in existing theory, but a more detailed analysis of its effect could reveal opportunities for improving data-efficiency. We address this gap by analyzing Monte-Carlo policy evaluation for LQR systems and uncover a fundamental trade-off between approximation and statistical error in value estimation. |
Zichen Zhang; Johannes Kirschner; Junxi Zhang; Francesco Zanini; Alex Ayoub; Masood Dehghan; Dale Schuurmans; |
127 | SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This hackability is so dire that blind models with no access to the image outperform state-of-the-art vision-language models. To remedy this rampant vulnerability, we introduce *SugarCrepe*, a new benchmark for vision-language compositionality evaluation. |
Cheng-Yu Hsieh; Jieyu Zhang; Zixian Ma; Aniruddha Kembhavi; Ranjay Krishna; |
128 | Self-Chained Image-Language Model for Video Localization and Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although humans often find a video moment to focus on and rewind the moment to answer questions, training a query-aware video moment localizer often requires expensive annotations and high computational costs. To address this issue, we propose Self-Chained Video Localization-Answering (SeViLA), a novel framework that leverages a single image-language model (BLIP-2) to tackle both temporal keyframe localization and question answering on videos. |
Shoubin Yu; Jaemin Cho; Prateek Yadav; Mohit Bansal; |
129 | Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Chain-of-thought (CoT) is a method that enables language models to handle complex reasoning tasks by decomposing them into simpler steps. Despite its success, the underlying mechanics of CoT are not yet fully understood. In an attempt to shed light on this, our study investigates the impact of CoT on the ability of transformers to in-context learn a simple-to-study yet general family of compositional functions: multi-layer perceptrons (MLPs). |
Yingcong Li; Kartik Sreenivasan; Angeliki Giannou; Dimitris Papailiopoulos; Samet Oymak; |
130 | A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers, using real datasets and multiple methods for generating extraneous features. |
Valeriia Cherepanova; Gowthami Somepalli; Jonas Geiping; C. Bayan Bruss; Andrew Wilson; Tom Goldstein; Micah Goldblum; |
131 | What You See Is What You Read? Improving Text-Image Alignment Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study methods for automatic text-image alignment evaluation. |
Michal Yarom; Yonatan Bitton; Soravit Changpinyo; Roee Aharoni; Jonathan Herzig; Oran Lang; Eran Ofek; Idan Szpektor; |
132 | Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Scaling laws have been recently employed to derive compute-optimal model size (number of parameters) for a given compute duration. We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers. |
Ibrahim Alabdulmohsin; Lucas Beyer; Alexander Kolesnikov; Xiaohua Zhai; |
133 | Scaling Open-Vocabulary Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Major challenges in scaling self-training are the choice of label space, pseudo-annotation filtering, and training efficiency. We present the OWLv2 model and OWL-ST self-training recipe, which address these challenges. |
Matthias Minderer; Alexey Gritsenko; Neil Houlsby; |
134 | Any-to-Any Generation Via Composable Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Composable Diffusion (CoDi), a novel generative model capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities. |
Zineng Tang; Ziyi Yang; Chenguang Zhu; Michael Zeng; Mohit Bansal; |
135 | DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper broadens the current scope of neural solvers for NP-complete (NPC) problems by introducing a new graph-based diffusion framework, namely DIFUSCO. |
Zhiqing Sun; Yiming Yang; |
136 | Meta-in-context Learning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the present paper, we demonstrate that the in-context learning abilities of large language models can be recursively improved via in-context learning itself. |
Julian Coda-Forno; Marcel Binz; Zeynep Akata; Matt Botvinick; Jane Wang; Eric Schulz; |
137 | Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evaluations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We identify that the distribution shift settings in previous studies commonly lack adequate challenges, hindering the accurate evaluation of OOD robustness. To address these issues, we propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts. |
Lifan Yuan; Yangyi Chen; Ganqu Cui; Hongcheng Gao; FangYuan Zou; Xingyi Cheng; Heng Ji; Zhiyuan Liu; Maosong Sun; |
138 | Segment Anything in 3D with NeRFs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper aims to generalize SAM to segment 3D objects. |
Jiazhong Cen; Zanwei Zhou; Jiemin Fang; chen yang; Wei Shen; Lingxi Xie; Dongsheng Jiang; XIAOPENG ZHANG; Qi Tian; |
139 | Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Diffusion Hyperfeatures, a framework for consolidating multi-scale and multi-timestep feature maps into per-pixel feature descriptors that can be used for downstream tasks. |
Grace Luo; Lisa Dunlap; Dong Huk Park; Aleksander Holynski; Trevor Darrell; |
140 | EmbodiedGPT: Vision-Language Pre-Training Via Embodied Chain of Thought Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities. |
Yao Mu; Qinglong Zhang; Mengkang Hu; Wenhai Wang; Mingyu Ding; Jun Jin; Bin Wang; Jifeng Dai; Yu Qiao; Ping Luo; |
141 | Is Your Code Generated By ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such limitation in the existing benchmarks begs the following question: In the era of LLMs, is the code generated really correct? To answer this, we propose EvalPlus – a code synthesis benchmarking framework to rigorously evaluate the functional correctness of LLM-synthesized code. |
Jiawei Liu; Chunqiu Steven Xia; Yuyao Wang; LINGMING ZHANG; |
142 | ResShift: Efficient Diffusion Model for Image Super-resolution By Residual Shifting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing acceleration sampling techniques inevitably sacrifice performance to some extent, leading to over-blurry SR results. To address this issue, we propose a novel and efficient diffusion model for SR that significantly reduces the number of diffusion steps, thereby eliminating the need for post-acceleration during inference and its associated performance deterioration. |
Zongsheng Yue; Jianyi Wang; Chen Change Loy; |
143 | Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In CARP, we test four LLMs with CoT prompting, and find that they are all prone to making mistakes at the early steps of the solution, leading to incorrect answers. Based on this finding, we propose a new approach that can deliberate the reasoning steps with tool interfaces, namely **DELI**. |
Beichen Zhang; Kun Zhou; Xilin Wei; Xin Zhao; Jing Sha; Shijin Wang; Ji-Rong Wen; |
144 | Simplifying and Empowering Transformers for Large-Graph Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we critically demonstrate that even a single attention layer can deliver surprisingly competitive performance across node property prediction benchmarks where node numbers range from thousand-level to billion-level. |
Qitian Wu; Wentao Zhao; Chenxiao Yang; Hengrui Zhang; Fan Nie; Haitian Jiang; Yatao Bian; Junchi Yan; |
145 | Are Diffusion Models Vision-And-Language Reasoners? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, unlike discriminative vision-and-language models, it is a non-trivial task to subject these diffusion-based generative models to automatic fine-grained quantitative evaluation of high-level phenomena such as compositionality. Towards this goal, we make two innovations. First, we transform diffusion-based models (in our case, Stable Diffusion) for any image-text matching (ITM) task using a novel method called DiffusionITM. Second, we introduce the Generative-Discriminative Evaluation Benchmark (GDBench) with 7 complex vision-and-language tasks, bias evaluation, and detailed analysis. |
Benno Krojer; Elinor Poole-Dayan; Vikram Voleti; Chris Pal; Siva Reddy; |
146 | Autodecoding Latent 3D Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such data is scarce for 3D generation, prohibiting the learning of large-scale diffusion models for 3D synthesis. We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core. |
Evangelos Ntavelis; Aliaksandr Siarohin; Kyle Olszewski; Chaoyang Wang; Luc V Gool; Sergey Tulyakov; |
147 | Textually Pretrained Speech Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose TWIST, a method for training SpeechLMs using a warm start from a pretrained textual language model. |
Michael Hassid; Tal Remez; Tu Anh Nguyen; Itai Gat; Alexis CONNEAU; Felix Kreuk; Jade Copet; Alexandre Defossez; Gabriel Synnaeve; Emmanuel Dupoux; Roy Schwartz; Yossi Adi; |
148 | Learning New Dimensions of Human Visual Similarity Using Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a perceptual metric that assesses images holistically. |
Stephanie Fu; Netanel Tamir; Shobhita Sundaram; Lucy Chai; Richard Zhang; Tali Dekel; Phillip Isola; |
149 | MADLAD-400: Monolingual And Document-Level Large Audited Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MADLAD-400, a manually audited, general domain 3T token monolingual dataset based on CommonCrawl, spanning 419 languages. |
Sneha Kudugunta; Isaac Caswell; Biao Zhang; Xavier Garcia; Derrick Xin; Aditya Kusupati; Romi Stella; Ankur Bapna; Orhan Firat; |
150 | Symbolic Discovery of Optimization Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. |
Xiangning Chen; Chen Liang; Da Huang; Esteban Real; Kaiyuan Wang; Hieu Pham; Xuanyi Dong; Thang Luong; Cho-Jui Hsieh; Yifeng Lu; Quoc V Le; |
151 | Towards Revealing The Mystery Behind Chain of Thought: A Theoretical Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the enormous empirical success, the underlying mechanisms behind CoT and how it unlocks the potential of LLMs remain elusive. In this paper, we take a first step towards theoretically answering these questions. |
Guhao Feng; Yuntian Gu; Haotian Ye; Bohang Zhang; Di He; Liwei Wang; |
152 | Timewarp: Transferable Acceleration of Molecular Dynamics By Learning Time-Coarsened Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present *Timewarp*, an enhanced sampling method which uses a normalising flow as a proposal distribution in a Markov chain Monte Carlo method targeting the Boltzmann distribution. |
Leon Klein; Andrew Foong; Tor Fjelde; Bruno Mlodozeniec; Marc Brockschmidt; Sebastian Nowozin; Frank Noe; Ryota Tomioka; |
153 | On Efficient Training Algorithms For Transformer Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we revisit three algorithms: layer stacking, layer dropping, and selective backpropagation. |
Jean Kaddour; Oscar Key; Piotr Nawrot; Pasquale Minervini; Matt Kusner; |
154 | Real-World Image Variation By Aligning Diffusion Inversion Chain Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our investigation uncovers that this domain gap originates from a gap between the latent distributions of different diffusion processes. To address this issue, we propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL) that utilizes diffusion models to generate image variations from a single image exemplar. |
Yuechen Zhang; Jinbo Xing; Eric Lo; Jiaya Jia; |
155 | Tree-Ring Watermarks: Invisible Fingerprints for Diffusion Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel technique called Tree-Ring Watermarking that robustly fingerprints diffusion model outputs. |
Yuxin Wen; John Kirchenbauer; Jonas Geiping; Tom Goldstein; |
156 | Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We describe an easy-to-use approach to automatically optimize hard text prompts through efficient gradient-based optimization. |
Yuxin Wen; Neel Jain; John Kirchenbauer; Micah Goldblum; Jonas Geiping; Tom Goldstein; |
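The core trick, optimizing a continuous prompt while always evaluating it through its nearest hard tokens, fits in a few lines of PyTorch. The embedding table, sizes, and cosine objective below are toy stand-ins for a real model and task, not the authors' implementation.

```python
import torch

vocab, dim = 100, 16
E = torch.randn(vocab, dim)                   # frozen token-embedding table (toy)
target = torch.randn(dim)                     # toy objective: match this direction
P = torch.randn(5, dim, requires_grad=True)   # continuous prompt, 5 tokens
opt = torch.optim.Adam([P], lr=0.1)

for step in range(200):
    ids = torch.cdist(P.detach(), E).argmin(dim=1)   # project to nearest tokens
    hard = E[ids]
    prompt = hard + (P - P.detach())                 # forward hard, backward soft
    loss = -torch.nn.functional.cosine_similarity(prompt.mean(0), target, dim=0)
    opt.zero_grad(); loss.backward(); opt.step()

print("discovered hard prompt (token ids):", ids.tolist())
```

Because the loss is always computed through the projected hard embeddings, the final token ids form a directly usable discrete prompt.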
157 | Reward Imputation with Sketching for Contextual Batched Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient approach called Sketched Policy Updating with Imputed Rewards (SPUIR) that completes the unobserved rewards using sketching, which approximates the full-information feedback. |
Xiao Zhang; Ninglu Shao; Zihua Si; Jun Xu; Wenhan Wang; Hanjing Su; Ji-Rong Wen; |
158 | REASONER: An Explainable Recommendation Dataset with Comprehensive Labeling Ground Truths Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the past few years, while a lot of promising explainable recommender models have been proposed, the datasets used to evaluate them still suffer from several limitations: for example, the explanation ground truths are not labeled by real users, and the explanations are mostly single-modal and cover only one aspect. To bridge these gaps, in this paper, we build a new explainable recommendation dataset which, to our knowledge, is the first to provide a large amount of real-user-labeled multi-modal and multi-aspect explanation ground truths. |
Xu Chen; Jingsen Zhang; Lei Wang; Quanyu Dai; Zhenhua Dong; Ruiming Tang; Rui Zhang; Li Chen; Xin Zhao; Ji-Rong Wen; |
159 | Pengi: An Audio Language Model for Audio Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Pengi, a novel Audio Language Model that leverages Transfer Learning by framing all audio tasks as text-generation tasks. |
Soham Deshmukh; Benjamin Elizalde; Rita Singh; Huaming Wang; |
160 | Solving Inverse Problems Provably Via Posterior Sampling with Latent Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first framework to solve general inverse problems leveraging pre-trained *latent* diffusion models. |
Litu Rout; Negin Raoof; Giannis Daras; Constantine Caramanis; Alex Dimakis; Sanjay Shakkottai; |
161 | On The Exploitability of Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate how an adversary can exploit instruction tuning by injecting specific instruction-following examples into the training data that intentionally change the model’s behavior. |
Manli Shu; Jiongxiao Wang; Jonas Geiping; Chaowei Xiao; Tom Goldstein; |
162 | VisoGender: A Dataset for Benchmarking Gender Bias in Image-text Pronoun Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce VisoGender, a novel dataset for benchmarking gender bias in vision-language models. |
Siobhan Mackenzie Hall; Fernanda Gonçalves Abrantes; Hanwen Zhu; Grace Sodunke; Aleksandar Shtedritski; Hannah Rose Kirk; |
163 | Nonparametric Identifiability of Causal Representations from Unknown Interventions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our goal is to identify both the ground truth latents and their causal graph up to a set of ambiguities which we show to be irresolvable from interventional data. |
Julius von Kügelgen; Michel Besserve; Liang Wendong; Luigi Gresele; Armin Kekić; Elias Bareinboim; David Blei; Bernhard Schölkopf; |
164 | StyleDrop: Text-to-Image Synthesis of Any Style Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce *StyleDrop*, a method that enables the synthesis of images that faithfully follow a specific style using a text-to-image model. |
Kihyuk Sohn; Lu Jiang; Jarred Barber; Kimin Lee; Nataniel Ruiz; Dilip Krishnan; Huiwen Chang; Yuanzhen Li; Irfan Essa; Michael Rubinstein; Yuan Hao; Glenn Entis; Irina Blok; Daniel Castro Chin; |
165 | Multi-Objective Agency Requires Non-Markovian Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a practical non-Markovian aggregation scheme that overcomes the impossibility with only one additional parameter for each objective. |
Silviu Pitis; |
166 | Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To generate a semantically coherent video exhibiting a rich portrayal of temporal semantics, such as the whole process of a flower blooming rather than a set of “moving images”, we propose a novel Free-Bloom pipeline that harnesses large language models (LLMs) as the director to generate a semantically coherent prompt sequence, while pre-trained latent diffusion models (LDMs) serve as the animator to generate high-fidelity frames. |
Hanzhuo Huang; Yufan Feng; Cheng Shi; Lan Xu; Jingyi Yu; Sibei Yang; |
167 | VPP: Efficient Universal 3D Generation Via Voxel-Point Progressive Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the characteristics of different representations, we propose VPP, a voxel-point progressive representation for both efficient and universal 3D generation. |
Zekun Qi; Muzhou Yu; Runpei Dong; Kaisheng Ma; |
168 | Bridging Discrete and Backpropagation: Straight-Through and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This limitation poses challenges for problems involving discrete latent variables. To address this issue, we propose a novel approach to approximate the gradient of parameters involved in generating discrete latent variables. |
Liyuan Liu; Chengyu Dong; Xiaodong Liu; Bin Yu; Jianfeng Gao; |
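For reference, the classic straight-through estimator that this line of work builds on and improves looks as follows in PyTorch; the downstream objective is a toy placeholder, and the paper's proposed estimator refines this baseline rather than reproducing it.

```python
import torch

logits = torch.randn(4, requires_grad=True)
probs = torch.softmax(logits, -1)
sample = torch.multinomial(probs, 1).squeeze()        # discrete latent
hard = torch.nn.functional.one_hot(sample, 4).float()
z = hard + probs - probs.detach()    # forward: hard one-hot; backward: through probs

loss = (z * torch.arange(4.0)).sum() # toy downstream objective
loss.backward()
print(logits.grad)                   # nonzero gradient despite the discrete sample
```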
169 | Self-Supervised Learning of Representations for Space Generates Multi-Modular Grid Cells Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We define a novel self-supervised learning (SSL) framework for properly arranging representations in an abstract coding space, and show that it can produce grid codes when constrained to perform high-efficiency representation of space with recurrent neural networks. |
Rylan Schaeffer; Mikail Khona; Tzuhsuan Ma; Cristobal Eyzaguirre; Sanmi Koyejo; Ila Fiete; |
170 | Are Emergent Abilities of Large Language Models A Mirage? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present our alternative explanation in a simple mathematical model, then test it in three complementary ways: we (1) make, test and confirm three predictions on the effect of metric choice using the InstructGPT/GPT-3 family on tasks with claimed emergent abilities, (2) make, test and confirm two predictions about metric choices in a meta-analysis of emergent abilities on BIG-Bench; and (3) show how to choose metrics to produce never-before-seen seemingly emergent abilities in multiple vision tasks across diverse deep networks. |
Rylan Schaeffer; Brando Miranda; Sanmi Koyejo; |
171 | Unlimiformer: Long-Range Transformers with Unlimited Length Input Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Unlimiformer: a general approach that wraps any existing pretrained encoder-decoder transformer, and offloads the cross-attention computation to a single $k$-nearest-neighbor ($k$NN) index, while the returned $k$NN distances are the attention dot-product scores. |
Amanda Bertsch; Uri Alon; Graham Neubig; Matthew Gormley; |
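The retrieval step at the heart of this approach can be sketched in NumPy: each decoder query attends only to its top-k encoder states. Here an exact top-k search stands in for the kNN index, and all shapes and data are illustrative.

```python
import numpy as np

def knn_cross_attention(queries, keys, values, k=32):
    """Cross-attention restricted to each query's k best-scoring keys."""
    scores = queries @ keys.T                          # dot products, (q, n)
    topk = np.argpartition(-scores, k, axis=1)[:, :k]  # indices of the k largest
    out = np.zeros_like(queries)
    for i, idx in enumerate(topk):
        s = scores[i, idx]
        w = np.exp(s - s.max()); w /= w.sum()          # softmax over retrieved keys
        out[i] = w @ values[idx]
    return out

rng = np.random.default_rng(0)
q, n, d = 4, 10_000, 64                                # 10k encoder states
out = knn_cross_attention(rng.normal(size=(q, d)),
                          rng.normal(size=(n, d)),
                          rng.normal(size=(n, d)))
print(out.shape)                                       # (4, 64)
```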
172 | OpenAGI: When LLM Meets Domain Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce **OpenAGI**, an open-source AGI research platform designed for multi-step, real-world tasks. |
Yingqiang Ge; Wenyue Hua; Kai Mei; jianchao ji; Juntao Tan; Shuyuan Xu; Zelong Li; Yongfeng Zhang; |
173 | A Case for Reframing Automated Medical Image Classification As Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent work has drastically reduced the cost of training segmentation networks. In light of this recent work, we reexamine the choice of training classification vs. segmentation models. |
Sarah Hooper; Mayee Chen; Khaled Saab; Kush Bhatia; Curtis Langlotz; Christopher Ré; |
174 | Controlling Text-to-Image Diffusion By Orthogonal Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: How to effectively guide or control these powerful models to perform different downstream tasks becomes an important open problem. To tackle this challenge, we introduce a principled finetuning method, Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. |
Zeju Qiu; Weiyang Liu; Haiwen Feng; Yuxuan Xue; Yao Feng; Zhen Liu; Dan Zhang; Adrian Weller; Bernhard Schölkopf; |
175 | H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Often, a large amount of transient state information, referred to as the KV cache, is stored in GPU memory in addition to model parameters, scaling linearly with the sequence length and batch size. In this paper, we introduce a novel approach for implementing the KV cache which significantly reduces its memory footprint. |
Zhenyu Zhang; Ying Sheng; Tianyi Zhou; Tianlong Chen; Lianmin Zheng; Ruisi Cai; Zhao Song; Yuandong Tian; Christopher Ré; Clark Barrett; Zhangyang Atlas Wang; Beidi Chen; |
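A minimal sketch of a heavy-hitter eviction policy in this spirit: keep the most recent tokens plus the tokens with the largest accumulated attention mass, up to a fixed cache budget. The budget split and the toy statistics are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def evict_kv(accumulated_attention, budget, recent):
    """Return indices of cached tokens to keep: a recency window plus the
    heavy hitters ranked by accumulated attention, capped at `budget`."""
    n = len(accumulated_attention)
    keep = set(range(max(0, n - recent), n))            # always keep newest tokens
    for t in np.argsort(-accumulated_attention):        # then add heavy hitters
        if len(keep) >= budget:
            break
        keep.add(int(t))
    return sorted(keep)

acc = np.array([5.1, 0.2, 3.3, 0.1, 0.4, 2.2, 0.3, 0.2])  # per-token attention mass
print(evict_kv(acc, budget=5, recent=2))                   # [0, 2, 5, 6, 7]
```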
176 | High-Fidelity Audio Compression with Improved RVQGAN Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To that end, we introduce a high-fidelity universal neural audio compression algorithm that achieves ~90x compression of 44.1 kHz audio into tokens at just 8 kbps bandwidth. |
Rithesh Kumar; Prem Seetharaman; Alejandro Luebs; Ishaan Kumar; Kundan Kumar; |
177 | Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It remains a challenging task since (1) it is hard to distinguish concealed objects from the background due to the intrinsic similarity and (2) the sparsely-annotated training data only provide weak supervision for model learning. In this paper, we propose a new WSCOS method to address these two challenges. |
Chunming He; Kai Li; Yachao Zhang; Guoxia Xu; Longxiang Tang; Yulun Zhang; Zhenhua Guo; Xiu Li; |
178 | On Evaluating Adversarial Robustness of Large Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose evaluating the robustness of open-source large VLMs in the most realistic and high-risk setting, where adversaries have only black-box system access and seek to deceive the model into returning the targeted responses. |
Yunqing Zhao; Tianyu Pang; Chao Du; Xiao Yang; Chongxuan LI; Ngai-Man (Man) Cheung; Min Lin; |
179 | NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To enable systematic research progress on 3D reconstruction from casual image captures, we propose NAVI: a new dataset of category-agnostic image collections of objects with high-quality 3D scans, along with per-image 2D-3D alignments providing near-perfect ground-truth camera parameters. |
Varun Jampani; Kevis-kokitsi Maninis; Andreas Engelhardt; Arjun Karpur; Karen Truong; Kyle Sargent; Stefan Popov; Andre Araujo; Ricardo Martin Brualla; Kaushal Patel; Daniel Vlasic; Vittorio Ferrari; Ameesh Makadia; Ce Liu; Yuanzhen Li; Howard Zhou; |
180 | SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition, designed to excel in action planning for complex interactive reasoning tasks. |
Bill Yuchen Lin; Yicheng Fu; Karina Yang; Prithviraj (Raj) Ammanabrolu; Faeze Brahman; Shiyu Huang; Chandra Bhagavatula; Yejin Choi; Xiang Ren; |
181 | Bypass Exponential Time Preprocessing: Fast Neural Network Training Via Weight-Data Correlation Preprocessing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a new preprocessing method that simply stores the weight-data correlation in a tree data structure in order to quickly, dynamically detect which neurons fire at each iteration. |
Josh Alman; Jiehao Liang; Zhao Song; Ruizhe Zhang; Danyang Zhuo; |
182 | TART: A Plug-and-play Transformer Module for Task-agnostic Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This raises an intriguing question: Are LLMs actually capable of learning how to reason in a task-agnostic manner? We answer this in the affirmative and, as a proof of concept, propose TART which generically improves an LLM’s reasoning abilities using a synthetically trained reasoning module. |
Kush Bhatia; Avanika Narayan; Christopher De Sa; Christopher Ré; |
183 | Skill-it! A Data-driven Skills Framework for Understanding and Training Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using our proposed framework, we introduce an online data sampling algorithm, Skill-It, over mixtures of skills for learning skills more quickly for both continual pre-training and fine-tuning regimes, where we aim to learn multiple skills in the former and an individual skill in the latter. |
Mayee Chen; Nicholas Roberts; Kush Bhatia; Jue WANG; Ce Zhang; Frederic Sala; Christopher Ré; |
184 | Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we explore Monarch Mixer (M2), a new architecture that uses the same sub-quadratic primitive along both sequence length and model dimension. |
Dan Fu; Jessica Grogan; Isys Johnson; Simran Arora; Evan Sabri Eyuboglu; Armin Thomas; Benjamin Spector; Michael Poli; Atri Rudra; Christopher Ré; |
185 | VidChapters-7M: Video Chapters at Scale Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This important topic has been understudied due to the lack of publicly released datasets. To address this issue, we present VidChapters-7M, a dataset of 817K user-chaptered videos including 7M chapters in total. |
Antoine Yang; Arsha Nagrani; Ivan Laptev; Josef Sivic; Cordelia Schmid; |
186 | How Does GPT-2 Compute Greater-than?: Interpreting Mathematical Abilities in A Pre-trained Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the basic mathematical abilities often acquired by pre-trained language models. |
Michael Hanna; Ollie Liu; Alexandre Variengien; |
187 | Multi-scale Diffusion Denoised Smoothing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the trade-off between accuracy and certified robustness of denoised smoothing: for example, we ask which representation of the diffusion model would maximize the certified robustness of denoised smoothing. |
Jongheon Jeong; Jinwoo Shin; |
188 | PyNeRF: Pyramidal Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple modification to grid-based models by training model heads at different spatial grid resolutions. |
Haithem Turki; Michael Zollhöfer; Christian Richardt; Deva Ramanan; |
189 | UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop a unified corrector (UniC) that can be applied after any existing DPM sampler to increase the order of accuracy without extra model evaluations, and derive a unified predictor (UniP) that supports arbitrary order as a byproduct. |
Wenliang Zhao; Lujia Bai; Yongming Rao; Jie Zhou; Jiwen Lu; |
190 | PIXIU: A Comprehensive Benchmark, Instruction Dataset and Large Language Model for Finance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces PIXIU, a comprehensive framework including the first financial LLM based on fine-tuning LLaMA with instruction data, the first instruction data with 128K data samples to support the fine-tuning, and an evaluation benchmark with 8 tasks and 15 datasets. |
Qianqian Xie; Weiguang Han; Xiao Zhang; Yanzhao Lai; Min Peng; Alejandro Lopez-Lira; Jimin Huang; |
191 | AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To account for the inherent sequential characteristic of natural language, we introduce Auto-Regressive Diffusion (AR-Diffusion). |
Tong Wu; Zhihao Fan; Xiao Liu; Yeyun Gong; yelong shen; Jian Jiao; Hai-Tao Zheng; Juntao Li; zhongyu wei; Jian Guo; Nan Duan; Weizhu Chen; |
192 | BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite initial defenses proposed in recent studies, these methods have very limited generalizability and scalability. To address this issue, we propose BIRD, a technique to detect and remove backdoors from a pretrained DRL policy in a clean environment without requiring any knowledge about the attack specifications and accessing its training process. |
Xuan Chen; Wenbo Guo; Guanhong Tao; Xiangyu Zhang; Dawn Song; |
193 | Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In an effort to further advance semi-supervised generative and classification tasks, we propose a simple yet effective training strategy called *dual pseudo training* (DPT), built upon strong semi-supervised learners and diffusion models. |
Zebin You; Yong Zhong; Fan Bao; Jiacheng Sun; Chongxuan LI; Jun Zhu; |
194 | Synthetic Pretraining for Few-shot Black-Box Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the more challenging yet realistic setting of few-shot black-box optimization, where only a few labeled data points are available. |
Tung Nguyen; Sudhanshu Agrawal; Aditya Grover; |
195 | ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce ClimateLearn, an open-source PyTorch library that vastly simplifies the training and evaluation of machine learning models for data-driven climate science. |
Tung Nguyen; Jason Jewik; Hritik Bansal; Prakhar Sharma; Aditya Grover; |
196 | EvoPrompting: Language Models for Code-Level Neural Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the recent impressive accomplishments of language models (LMs) for code generation, we explore the use of LMs as general adaptive mutation and crossover operators for an evolutionary neural architecture search (NAS) algorithm. |
Angelica Chen; David Dohan; David So; |
197 | Training-Free Composition of Parameter-Efficient Modules with Arithmetic Operation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In PEFT, a lightweight module is learned on each dataset while the underlying pretrained language model remains unchanged, resulting in multiple compact modules representing diverse skills when applied to various domains and tasks. In this paper, we propose to compose these parameter-efficient modules through linear arithmetic operations in the weight space, thereby integrating different module capabilities. |
Jinghan Zhang; shiqi chen; Junteng Liu; Junxian He; |
198 | Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Uni-ControlNet, a novel approach that allows for the simultaneous utilization of different local controls (e.g., edge maps, depth map, segmentation masks) and global controls (e.g., CLIP image embeddings) in a flexible and composable manner within one model. |
Shihao Zhao; Dongdong Chen; Yen-Chun Chen; Jianmin Bao; Shaozhe Hao; Lu Yuan; Kwan-Yee K. Wong; |
199 | Saddle-to-Saddle Dynamics in Diagonal Linear Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we fully describe the trajectory of gradient flow over 2-layer diagonal linear networks for the regression setting in the limit of vanishing initialisation. |
Scott Pesme; Nicolas Flammarion; |
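The described dynamics are easy to reproduce numerically: with near-vanishing initialisation, gradient descent on the reparameterisation w = u * v tends to recover the ground-truth coordinates one at a time, plateauing at saddles in between. This toy simulation (gradient descent standing in for gradient flow, with hypothetical sizes and learning rate) is only meant to visualise the phenomenon.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 10
X = rng.normal(size=(n, d))
w_star = np.zeros(d); w_star[:3] = [3.0, 2.0, 1.0]   # sparse ground truth
y = X @ w_star

alpha, lr = 1e-3, 2e-3                               # near-vanishing initialisation
u = np.full(d, alpha); v = np.full(d, alpha)
for t in range(9001):
    w = u * v
    g = X.T @ (X @ w - y) / n                        # gradient w.r.t. w
    u, v = u - lr * g * v, v - lr * g * u
    if t % 1500 == 0:
        print(t, np.round(w[:4], 2))                 # coordinates activate one by one
```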
200 | Diversify Your Vision Datasets with Automatic Diffusion-based Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce ALIA (Automated Language-guided Image Augmentation), a method which utilizes large vision and language models to automatically generate natural language descriptions of a dataset’s domains and augment the training data via language-guided image editing. |
Lisa Dunlap; Alyssa Umino; Han Zhang; Jiezhi Yang; Joseph Gonzalez; Trevor Darrell; |
201 | Backprop-Free Dataset Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, in contrast to time-consuming forward-backward passes, we introduce a backprop-free approach to dataset distillation with significantly improved efficiency. |
Songhua Liu; Xinchao Wang; |
202 | DiffComplete: Diffusion-based Generative 3D Shape Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new diffusion-based approach for shape completion on 3D range scans. |
Ruihang Chu; Enze Xie; Shentong Mo; Zhenguo Li; Matthias Niessner; Chi-Wing Fu; Jiaya Jia; |
203 | ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild. |
Chun-Han Yao; Amit Raj; Wei-Chih Hung; Michael Rubinstein; Yuanzhen Li; Ming-Hsuan Yang; Varun Jampani; |
204 | PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present PrimDiffusion, the first diffusion-based framework for 3D human generation. |
Zhaoxi Chen; Fangzhou Hong; Haiyi Mei; Guangcong Wang; Lei Yang; Ziwei Liu; |
205 | Fast Attention Requires Bounded Entries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate whether faster algorithms are possible by \emph{implicitly} making use of the matrix $A$. |
Josh Alman; Zhao Song; |
206 | Image Captioners Are Scalable Vision Learners Too Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: At the same time, image captioning on this type of data is commonly considered an inferior pretraining strategy. In this paper, we perform a fair comparison of these two pretraining strategies, carefully matching training data, compute, and model capacity. |
Michael Tschannen; Manoj Kumar; Andreas Steiner; Xiaohua Zhai; Neil Houlsby; Lucas Beyer; |
207 | FELM: Benchmarking Factuality Evaluation of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This direction remains under-explored, resulting in substantial impediments to the progress of factuality evaluators. To mitigate this issue, we introduce a benchmark for Factuality Evaluation of large Language Models, referred to as FELM. |
shiqi chen; Yiran Zhao; Jinghan Zhang; I-Chun Chern; Siyang Gao; Pengfei Liu; Junxian He; |
208 | Fair Graph Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the proposed coherence metric, we introduce a framework for fair graph distillation using a bi-level optimization algorithm. |
Qizhang Feng; Zhimeng Jiang; Ruiquan Li; Yicheng Wang; Na Zou; Jiang Bian; Xia Hu; |
209 | Learning Transformer Programs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a procedure for training Transformers that are mechanistically interpretable by design. |
Dan Friedman; Alexander Wettig; Danqi Chen; |
210 | BeaverTails: A Human-Preference Dataset for LLM Harmlessness Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety alignment in large language models (LLMs). |
Jiaming Ji; Mickel Liu; Josef Dai; Xuehai Pan; Chi Zhang; Ce Bian; Boyuan Chen; Ruiyang Sun; Yizhou Wang; Yaodong Yang; |
211 | Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an environment suite called Safety-Gymnasium, which encompasses safety-critical tasks in both single and multi-agent scenarios, accepting vector and vision-only input. |
Jiaming Ji; Borong Zhang; Jiayi Zhou; Xuehai Pan; Weidong Huang; Ruiyang Sun; Yiran Geng; Josef Dai; Yaodong Yang; |
212 | ClusterFormer: Clustering As A Universal Visual Learner Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents ClusterFormer, a universal vision model built on the clustering paradigm with the Transformer. |
James Liang; Yiming Cui; Qifan Wang; Tong Geng; Wenguan Wang; Dongfang Liu; |
213 | Meet in The Middle: A New Pre-training Paradigm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce “Meet in the Middle” (MIM), a new pre-training paradigm that improves data efficiency by training in two directions, left-to-right and right-to-left, and encouraging the respective models to agree on their token distribution for each position. |
Anh Nguyen; Nikos Karampatziakis; Weizhu Chen; |
214 | Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The recent proliferation of large-scale text-to-image models has led to growing concerns that such models may be misused to generate harmful, misleading, and inappropriate content. Motivated by this issue, we derive a technique inspired by continual learning to selectively forget concepts in pretrained deep generative models. |
Alvin Heng; Harold Soh; |
215 | Laying The Foundation for An Instruction-Following Generalist Agent in Minecraft Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study introduces an instruction-tuned Video Pretraining (VPT) model for Minecraft called STEVE-1, demonstrating that the unCLIP approach, utilized in DALL•E 2, is also effective for creating instruction-following sequential decision-making agents. |
Shalev Lifshitz; Keiran Paster; Harris Chan; Jimmy Ba; Sheila McIlraith; |
216 | Mixture Weight Estimation and Model Prediction in Multi-source Multi-target Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of learning a model from multiple sources with the goal of performing well on a new target distribution. |
Yuyang Deng; Ilja Kuzborskij; Mehrdad Mahdavi; |
217 | Distributed Personalized Empirical Risk Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To learn personalized models at scale, we propose a distributed algorithm that replaces the standard model averaging with model shuffling to simultaneously optimize PERM objectives for all devices. |
Yuyang Deng; Mohammad Mahdi Kamani; Pouria Mahdavinia; Mehrdad Mahdavi; |
218 | H3T: Efficient Integration of Memory Optimization and Parallelism for Large-scale Transformer Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a framework to automatically find an efficient integration of memory optimization and parallelism for High-Throughput Transformer Training (named H3T), which is rarely considered by existing efforts for training big Transformer-based models. |
Yuzhong Wang; Xu Han; Weilin Zhao; Guoyang Zeng; Zhiyuan Liu; Maosong Sun; |
219 | The Clock and The Pizza: Two Stories in Mechanistic Explanation of Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex: small changes to model hyperparameters and initializations can induce discovery of qualitatively different algorithms from a fixed training set, and even learning of multiple different solutions in parallel. |
Ziqian Zhong; Ziming Liu; Max Tegmark; Jacob Andreas; |
220 | SyncDiffusion: Coherent Montage Via Synchronized Joint Diffusions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches, which focus on seamless montage generation, often yield incoherent outputs by blending different scenes within a single image. To overcome this limitation, we propose SyncDiffusion, a plug-and-play module that synchronizes multiple diffusions through gradient descent from a perceptual similarity loss. |
Yuseung Lee; Kunho Kim; Hyunjin Kim; Minhyuk Sung; |
221 | M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce M3Exam, a novel benchmark sourced from real and official human exam questions for evaluating LLMs in a multilingual, multimodal, and multilevel context. |
Wenxuan Zhang; Mahani Aljunied; Chang Gao; Yew Ken Chia; Lidong Bing; |
222 | Learning Visual Prior Via Generative Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work aims to explicitly learn the visual prior and enable the customization of sampling. |
Jinheng Xie; Kai Ye; Yudong Li; Yuexiang Li; Kevin Qinghong Lin; Yefeng Zheng; Linlin Shen; Mike Zheng Shou; |
223 | Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Therefore, it becomes increasingly important to study the properties of human texts that are invariant across text domains and varying levels of writer proficiency, can be easily calculated for any language, and can robustly separate natural and AI-generated texts regardless of the generation model and sampling method. In this work, we propose such an invariant of human texts, namely the intrinsic dimensionality of the manifold underlying the set of embeddings of a given text sample. |
Eduard Tulchinskii; Kristian Kuznetsov; Laida Kushnareva; Daniil Cherniavskii; Sergey Nikolenko; Irina Piontkovskaya; Serguei Barannikov; Evgeny Burnaev; |
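The paper's estimator is persistent-homology-based; purely as a simpler illustration of what "intrinsic dimension of an embedding cloud" means, below is the standard TwoNN estimator of Facco et al., where mu_i is the ratio of each point's second- to first-nearest-neighbour distance and the maximum-likelihood estimate is d = N / sum_i log(mu_i):
```python
import numpy as np
from scipy.spatial.distance import cdist

def two_nn_dimension(X):
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)
    r = np.sort(D, axis=1)[:, :2]       # 1st and 2nd nearest-neighbour distances
    mu = r[:, 1] / r[:, 0]
    return len(X) / np.log(mu).sum()    # TwoNN maximum-likelihood estimate

# Sanity check: a 2-D plane embedded in 50-D space has intrinsic dimension 2.
rng = np.random.default_rng(0)
Z = rng.normal(size=(1500, 2)) @ rng.normal(size=(2, 50))
print(two_nn_dimension(Z))              # close to 2
```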
224 | Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Diff-Foley, a synchronized Video-to-Audio synthesis method with a latent diffusion model (LDM) that generates high-quality audio with improved synchronization and audio-visual relevance. |
Simian Luo; Chuanhao Yan; Chenxu Hu; Hang Zhao; |
225 | ForecastPFN: Synthetically-Trained Zero-Shot Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take a different approach and devise ForecastPFN, a zero-shot forecasting model that is trained purely on a novel synthetic data distribution. |
Samuel Dooley; Gurnoor Singh Khurana; Chirag Mohapatra; Siddartha V Naidu; Colin White; |
226 | Quantizable Transformers: Removing Outliers By Helping Attention Heads Do Nothing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve the exact zeros needed in the attention matrix for a no-update, the input to the softmax is pushed to be larger and larger during training, causing outliers in other parts of the network. Based on these observations, we propose two simple (independent) modifications to the attention mechanism – _clipped softmax_ and _gated attention_. |
Yelysei Bondarenko; Markus Nagel; Tijmen Blankevoort; |
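A minimal sketch of a clipped softmax consistent with this description: stretch the softmax output slightly beyond [0, 1] and clip, so exact zeros become reachable without extreme logits. The stretch parameters here are illustrative, not the paper's tuned values.
```python
import numpy as np

def clipped_softmax(logits, zeta=1.0, gamma=-0.05):
    # Stretch the softmax output to [gamma, zeta] (gamma < 0), then clip to
    # [0, 1], so attention heads can emit exact zeros with moderate logits.
    s = np.exp(logits - logits.max(axis=-1, keepdims=True))
    s = s / s.sum(axis=-1, keepdims=True)
    return np.clip((zeta - gamma) * s + gamma, 0.0, 1.0)

print(clipped_softmax(np.array([2.0, 0.5, -1.0])))  # weakest entry is exactly 0
```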
227 | Evaluating The Moral Beliefs Encoded in LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we design a survey, a set of evaluation metrics, and a statistical workflow on how to elicit the moral beliefs encoded in an LLM. |
Nino Scherrer; Claudia Shi; Amir Feder; David Blei; |
228 | Inverse Preference Learning: Preference-based RL Without A Reward Function Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead of using highly complex architectures, we develop a new and parameter-efficient algorithm, Inverse Preference Learning (IPL), specifically designed for learning from offline preference data. Our key insight is that for a fixed policy, the $Q$-function encodes all information about the reward function, effectively making them interchangeable. |
Joey Hejna; Dorsa Sadigh; |
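A tabular illustration of that insight, under the assumption of known dynamics: for a fixed policy pi, the Bellman equation Q(s,a) = r(s,a) + gamma * E_{s'}[V_pi(s')] can be inverted to read the reward off the Q-function. Shapes and names below are hypothetical, not the paper's API.
```python
import numpy as np

gamma, nS, nA = 0.99, 5, 3
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # transition probs P[s, a, s']
r = rng.normal(size=(nS, nA))                  # ground-truth reward
pi = np.full((nS, nA), 1.0 / nA)               # fixed (uniform) policy

Q = np.zeros((nS, nA))
for _ in range(3000):                          # solve for Q_pi by iteration
    V = (pi * Q).sum(axis=1)                   # V_pi(s) = E_{a~pi}[Q(s, a)]
    Q = r + gamma * P @ V                      # Bellman backup

r_hat = Q - gamma * P @ (pi * Q).sum(axis=1)   # invert the Bellman equation
assert np.allclose(r_hat, r, atol=1e-6)        # the reward is fully recovered
```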
229 | Sharpness-Aware Minimization Leads to Low-Rank Features Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While its generalization improvement is well-known and is the primary motivation, we uncover an additional intriguing effect of SAM: reduction of the feature rank which happens at different layers of a neural network. We show that this low-rank effect occurs very broadly: for different architectures such as fully-connected networks, convolutional networks, vision transformers and for different objectives such as regression, classification, language-image contrastive training. |
Maksym Andriushchenko; Dara Bahri; Hossein Mobahi; Nicolas Flammarion; |
230 | Graph Contrastive Learning with Stable and Scalable Spectral Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing spectral-based graph views either ignore the eigenvectors that encode valuable positional information or suffer from high complexity when trying to address the instability of spectral features. To tackle these challenges, we first design an informative, stable, and scalable spectral encoder, termed EigenMLP, to learn effective representations from the spectral features. |
Deyu Bo; Yuan Fang; Yang Liu; Chuan Shi; |
231 | T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new approach, Generative mOdel finetuning with Reward-driven Sample selection (GORS), to boost the compositional text-to-image generation abilities of pretrained text-to-image models. |
Kaiyi Huang; Kaiyue Sun; Enze Xie; Zhenguo Li; Xihui Liu; |
232 | Towards Label-free Scene Understanding By Vision Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the potential of vision foundation models in enabling networks to comprehend 2D and 3D worlds without labelled data. |
Runnan Chen; Youquan Liu; Lingdong Kong; Nenglun Chen; Xinge ZHU; Yuexin Ma; Tongliang Liu; Wenping Wang; |
233 | Efficient Diffusion Policies For Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, Diffusion-QL suffers from two critical limitations. 1) It is computationally inefficient to forward and backward through the whole Markov chain during training. 2) It is incompatible with maximum likelihood-based RL algorithms (e.g., policy gradient methods) as the likelihood of diffusion models is intractable. Therefore, we propose efficient diffusion policy (EDP) to overcome these two challenges. |
Bingyi Kang; Xiao Ma; Chao Du; Tianyu Pang; Shuicheng Yan; |
234 | DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5, considering diverse perspectives – including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness. |
Boxin Wang; Weixin Chen; Hengzhi Pei; Chulin Xie; Mintong Kang; Chenhui Zhang; Chejian Xu; Zidi Xiong; Ritik Dutta; Rylan Schaeffer; Sang Truong; Simran Arora; Mantas Mazeika; Dan Hendrycks; Zinan Lin; Yu Cheng; Sanmi Koyejo; Dawn Song; Bo Li; |
235 | Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we uncover two factors limiting the VL models’ compositional reasoning performance. |
Sivan Doveh; Assaf Arbelle; Sivan Harary; Roei Herzig; Donghyun Kim; Paola Cascante-Bonilla; Amit Alfassy; Rameswar Panda; Raja Giryes; Rogerio Feris; Shimon Ullman; Leonid Karlinsky; |
236 | PolyDiffuse: Polygonal Shape Reconstruction Via Guided Set Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents *PolyDiffuse*, a novel structured reconstruction algorithm that transforms visual sensor data into polygonal shapes with Diffusion Models (DM), an emerging machinery amid exploding generative AI, while formulating reconstruction as a generation process conditioned on sensor data. |
Jiacheng Chen; Ruizhi Deng; Yasutaka Furukawa; |
237 | EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We are one of the first to fully release such a model for coded EHR data; in contrast, most prior models released for clinical data (e.g. GatorTron, ClinicalBERT) only work with unstructured text and cannot process the rich, structured data within an EHR. We provide an end-to-end pipeline for the community to validate and build upon its performance. |
Michael Wornow; Rahul Thapa; Ethan Steinberg; Jason Fries; Nigam Shah; |
238 | Towards Optimal Caching and Model Selection for Large Model Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, the large-scale deployment of these models is hindered by the significant resource requirements during inference. In this paper, we study two approaches for mitigating these challenges: employing a cache to store previous queries and learning a model selector to choose from an ensemble of models for query processing. |
Banghua Zhu; Ying Sheng; Lianmin Zheng; Clark Barrett; Michael Jordan; Jiantao Jiao; |
239 | Doubly-Robust Self-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce doubly-robust self-training, an innovative semi-supervised algorithm that provably balances between two extremes. |
Banghua Zhu; Mingyu Ding; Philip Jacobson; Ming Wu; Wei Zhan; Michael Jordan; Jiantao Jiao; |
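One natural doubly-robust combination in this spirit (a sketch under our own assumptions, not the paper's reference objective): train on pseudo-labels over all points, debiased by the labeled subset, so that accurate pseudo-labels let all data contribute while poor ones make the estimator fall back to the trusted labeled loss.
```python
import numpy as np

def doubly_robust_loss(loss_pseudo_all, loss_pseudo_labeled, loss_true_labeled):
    # loss_pseudo_all:     per-point loss vs. pseudo-labels on ALL points
    # loss_pseudo_labeled: per-point loss vs. pseudo-labels on labeled points
    # loss_true_labeled:   per-point loss vs. true labels on labeled points
    # Accurate pseudo-labels: the last two terms nearly cancel -> use all data.
    # Poor pseudo-labels: the first two cancel in expectation (for a random
    # labeled subset) -> fall back to the trusted labeled loss.
    return (loss_pseudo_all.mean()
            - loss_pseudo_labeled.mean()
            + loss_true_labeled.mean())
```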
240 | From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Models to Pre-trained Machine Reader Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Pre-trained Machine Reader (PMR), a novel method for retrofitting pre-trained masked language models (MLMs) to pre-trained machine reading comprehension (MRC) models without acquiring labeled data. |
Weiwen Xu; Xin Li; Wenxuan Zhang; Meng Zhou; Wai Lam; Luo Si; Lidong Bing; |
241 | Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the uncertain temporal asynchrony and limited communication conditions that are present in traffic environments can lead to fusion misalignment and constrain the exploitation of infrastructure data. To address these issues in vehicle-infrastructure cooperative 3D (VIC3D) object detection, we propose the Feature Flow Net (FFNet), a novel cooperative detection framework. |
Haibao Yu; Yingjuan Tang; Enze Xie; Jilei Mao; Ping Luo; Zaiqing Nie; |
242 | Annotator: A Generic Active Learning Baseline for LiDAR Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents Annotator, a general and efficient active learning baseline, in which a plain voxel-centric online selection strategy is tailored to probe and annotate the salient and exemplar voxel grids within each LiDAR scan, broadening the potential of segmentation performance even under distribution shift. |
Binhui Xie; Shuang Li; Qingju Guo; Chi Liu; Xinjing Cheng; |
243 | Learning Mask-aware CLIP Representations for Zero-Shot Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This issue mainly relates to the fact that CLIP is trained with image-level supervision. To alleviate this issue, we propose a simple yet effective method, named Mask-aware Fine-tuning (MAFT). |
Siyu Jiao; Yunchao Wei; Yaowei Wang; Yao Zhao; Humphrey Shi; |
244 | Res-Tuning: A Flexible and Efficient Tuning Paradigm Via Unbinding Tuner from Backbone Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work offers a new tuning paradigm, dubbed Res-Tuning, which intentionally *unbinds* tuners from the backbone. |
Zeyinzi Jiang; Chaojie Mao; Ziyuan Huang; Ao Ma; Yiliang Lv; Yujun Shen; Deli Zhao; Jingren Zhou; |
245 | AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous works mainly focus on the self-supervised pre-training pipeline, meaning that they perform the pre-training and fine-tuning on the same benchmark, which makes it difficult to attain performance scalability and cross-dataset applicability for the pre-training checkpoint. In this paper, for the first time, we are committed to building a large-scale pre-training point-cloud dataset with diverse data distribution, and meanwhile learning generalizable representations from such a diverse pre-training dataset. |
Jiakang Yuan; Bo Zhang; Xiangchao Yan; Botian Shi; Tao Chen; Yikang LI; Yu Qiao; |
246 | To Repeat or Not To Repeat: Insights from Scaling LLM Under Token-Crisis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we empirically investigate three key aspects under this approach. |
Fuzhao Xue; Yao Fu; Wangchunshu Zhou; Zangwei Zheng; Yang You; |
247 | Rubik’s Cube: High-Order Channel Interactions with A Hierarchical Receptive Field Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most of these methods, e.g., convolution and the FFN architecture of transformers, only take implicit advantage of the first-order channel interaction and have yet to fully tap into its potential for high-order modeling. To address this, our study delves into modeling channel-dimension relationships, and proposes a simple yet effective and efficient high-order channel-wise operator for image restoration. |
Naishan Zheng; man zhou; Chong Zhou; Chen Change Loy; |
248 | When Do Neural Nets Outperform Boosted Trees on Tabular Data? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite recent advances in neural nets (NNs) for tabular data, there is still an active discussion on whether or not NNs generally outperform gradient-boosted decision trees (GBDTs) on tabular data, with several recent works arguing either that GBDTs consistently outperform NNs on tabular data, or vice versa. In this work, we take a step back and question the importance of this debate. |
Duncan McElfresh; Sujay Khandagale; Jonathan Valverde; Vishak Prasad C; Ganesh Ramakrishnan; Micah Goldblum; Colin White; |
249 | Large Language Models Implicitly Learn to Straighten Neural Sentence Trajectories to Construct A Predictive Representation of Natural Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We quantify straightness using a 1-dimensional curvature metric, and provide support for the trajectory straightening hypothesis across four results: i) In trained models, the curvature progressively decreases from the first to the middle layers of the network. |
Eghbal Hosseini; Evelina Fedorenko; |
250 | 4D Panoptic Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To solve PSG-4D, we propose PSG4DFormer, a Transformer-based model that can predict panoptic segmentation masks, track masks along the time axis, and generate the corresponding scene graphs via a relation component. |
Jingkang Yang; Jun CEN; WENXUAN PENG; Shuai Liu; Fangzhou Hong; Xiangtai Li; Kaiyang Zhou; Qifeng Chen; Ziwei Liu; |
251 | Binarized Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The rapid scaling of language models is motivating research using low-bitwidth quantization. In this work, we propose a novel binarization technique for Transformers applied to machine translation (BMT), the first of its kind. |
Yichi Zhang; Ankush Garg; Yuan Cao; Lukasz Lew; Behrooz Ghorbani; Zhiru Zhang; Orhan Firat; |
252 | Provable Convergence Guarantees for Black-box Variational Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While black-box variational inference is widely used, there is no proof that its stochastic optimization succeeds. We suggest this is due to a theoretical gap in existing stochastic optimization proofs—namely the challenge of gradient estimators with unusual noise bounds, and a composite non-smooth objective. |
Justin Domke; Robert Gower; Guillaume Garrigos; |
253 | (S)GD Over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over $2$-layer diagonal linear networks. |
Mathieu Even; Scott Pesme; Suriya Gunasekar; Nicolas Flammarion; |
254 | Data Quality in Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take the first step toward formalizing data quality for imitation learning through the lens of distribution shift: a high quality dataset encourages the policy to stay in distribution at test time. |
Suneel Belkhale; Yuchen Cui; Dorsa Sadigh; |
255 | An Information Theory Perspective on Variance-Invariance-Covariance Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an information-theoretic perspective on the VICReg objective. |
Ravid Shwartz-Ziv; Randall Balestriero; Kenji Kawaguchi; Tim G. J. Rudner; Yann LeCun; |
256 | LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. |
Neel Guha; Julian Nyarko; Daniel Ho; Christopher Ré; Adam Chilton; Aditya K; Alex Chohlas-Wood; Austin Peters; Brandon Waldon; Daniel Rockmore; Diego Zambrano; Dmitry Talisman; Enam Hoque; Faiz Surani; Frank Fagan; Galit Sarfaty; Gregory Dickinson; Haggai Porat; Jason Hegland; Jessica Wu; Joe Nudell; Joel Niklaus; John Nay; Jonathan Choi; Kevin Tobia; Margaret Hagan; Megan Ma; Michael Livermore; Nikon Rasumov-Rahe; Nils Holzenberger; Noam Kolt; Peter Henderson; Sean Rehaag; Sharad Goel; Shang Gao; Spencer Williams; Sunny Gandhi; Tom Zur; Varun Iyer; Zehua Li; |
257 | Hierarchical Open-vocabulary Universal Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a decoupled text-image fusion mechanism and representation learning modules for both “things” and “stuff”. |
Xudong Wang; Shufan Li; Konstantinos Kallidromitis; Yusuke Kato; Kazuki Kozuka; Trevor Darrell; |
258 | How2comm: Communication-Efficient and Collaboration-Pragmatic Multi-Agent Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the advancements in previous efforts, challenges remain due to various noises in the perception procedure, including communication redundancy, transmission delay, and collaboration heterogeneity. To tackle these issues, we propose How2comm, a collaborative perception framework that seeks a trade-off between perception performance and communication bandwidth. |
Dingkang Yang; Kun Yang; Yuzheng Wang; Jing Liu; Zhi Xu; Peng Zhai; Lihua Zhang; Rongbin Yin; |
259 | Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Mind-Video that learns spatiotemporal information from continuous fMRI data of the cerebral cortex progressively through masked brain modeling, multimodal contrastive learning with spatiotemporal attention, and co-training with an augmented Stable Diffusion model that incorporates network temporal inflation. |
Zijiao Chen; Jiaxin Qing; Juan Helen Zhou; |
260 | MetaBox: A Benchmark Platform for Meta-Black-Box Optimization with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this field is hindered by the lack of a unified benchmark. To fill this gap, we introduce MetaBox, the first benchmark platform expressly tailored for developing and evaluating MetaBBO-RL methods. |
Zeyuan Ma; Hongshu Guo; Jiacheng Chen; Zhenrui Li; Guojun Peng; Yue-Jiao Gong; Yining Ma; Zhiguang Cao; |
261 | Convolutional Neural Operators for Robust and Accurate Learning of PDEs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although very successfully used in conventional machine learning, convolution-based neural network architectures — believed to be inconsistent in function space — have been largely ignored in the context of learning solution operators of PDEs. Here, we present novel adaptations for convolutional neural networks to demonstrate that they are indeed able to process functions as inputs and outputs. |
Bogdan Raonic; Roberto Molinaro; Tim De Ryck; Tobias Rohner; Francesca Bartolucci; Rima Alaifari; Siddhartha Mishra; Emmanuel de Bézenac; |
262 | Benchmark of Machine Learning Force Fields for Semiconductor Simulations: Datasets, Metrics, and Comparative Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a comprehensive benchmark suite which consists of two semiconductor material datasets, 10 MLFF models, and 6 evaluation metrics. |
Geonu Kim; Byunggook Na; Gunhee Kim; Hyuntae Cho; Seungjin Kang; Hee Sun Lee; Saerom Choi; Heejae Kim; Seungwon Lee; Yongdeok Kim; |
263 | Alexa Arena: A User-Centric Interactive Platform for Embodied AI Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Alexa Arena, a user-centric simulation platform for Embodied AI (EAI) research. |
Qiaozi Gao; Govindarajan Thattai; Suhaila Shakiah; Xiaofeng Gao; Shreyas Pansare; Vasu Sharma; Gaurav Sukhatme; Hangjie Shi; Bofei Yang; Desheng Zhang; Lucy Hu; Karthika Arumugam; Shui Hu; Matthew Wen; Dinakar Guthy; Shunan Chung; Rohan Khanna; Osman Ipek; Leslie Ball; Kate Bland; Heather Rocker; Michael Johnston; Reza Ghanadan; Dilek Hakkani-Tur; Prem Natarajan; |
264 | CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. |
Guohao Li; Hasan Hammoud; Hani Itani; Dmitrii Khizbullin; Bernard Ghanem; |
265 | $k$-Means Clustering with Distance-Based Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we initiate the study of Euclidean clustering with Distance-based privacy. |
Alessandro Epasto; Vahab Mirrokni; Shyam Narayanan; Peilin Zhong; |
266 | Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this comes at the expense of the practicality and efficiency offered by prompting. Therefore, we propose to privately learn to prompt. |
Haonan Duan; Adam Dziedzic; Nicolas Papernot; Franziska Boenisch; |
267 | Contrastive Lift: 3D Object Instance Segmentation By Slow-Fast Contrastive Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instance segmentation in 3D is a challenging task due to the lack of large-scale annotated datasets. In this paper, we show that this task can be addressed effectively by instead leveraging 2D pre-trained models for instance segmentation. |
Yash Bhalgat; Iro Laina; João Henriques; Andrea Vedaldi; Andrew Zisserman; |
268 | HeadSculpt: Crafting 3D Head Avatars with Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is primarily due to the inherited limitations from the pre-trained 2D image diffusion models, which become more pronounced when it comes to 3D head avatars. In this work, we address these challenges by introducing a versatile coarse-to-fine pipeline dubbed HeadSculpt for crafting (i.e., generating and editing) 3D head avatars from textual prompts. |
Xiao Han; Yukang Cao; Kai Han; Xiatian Zhu; Jiankang Deng; Yi-Zhe Song; Tao Xiang; Kwan-Yee K. Wong; |
269 | Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Voicebox, the most versatile text-guided generative model for speech at scale. |
Matthew Le; Bowen Shi; Apoorv Vyas; Brian Karrer; Leda Sari; Yossi Adi; Vimal Manohar; Jay Mahadeokar; Wei-Ning Hsu; |
270 | Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training By Diminishing Bias Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel framework named Unifying Cross-Lingual Medical Vision-Language Pre-Training (**Med-UniC**), designed to integrate multi-modal medical data from the two most prevalent languages, English and Spanish. |
Zhongwei Wan; Che Liu; Mi Zhang; Jie Fu; Benyou Wang; Sibo Cheng; Lei Ma; César Quilodrán-Casas; Rossella Arcucci; |
271 | VideoComposer: Compositional Video Synthesis with Motion Controllability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on the paradigm of compositional generation, this work presents VideoComposer that allows users to flexibly compose a video with textual conditions, spatial conditions, and more importantly temporal conditions. |
Xiang Wang; Hangjie Yuan; Shiwei Zhang; Dayou Chen; Jiuniu Wang; Yingya Zhang; Yujun Shen; Deli Zhao; Jingren Zhou; |
272 | Subject-driven Text-to-Image Generation Via Apprenticeship Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present SuTI, a Subject-driven Text-to-Image generator that replaces subject-specific fine-tuning with *in-context* learning. |
wenhu chen; Hexiang Hu; Yandong Li; Nataniel Ruiz; Xuhui Jia; Ming-Wei Chang; William Cohen; |
273 | Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel and affordable solution for the effective VL adaption of LLMs, called Mixture-of-Modality Adaptation (MMA). |
Gen Luo; Yiyi Zhou; Tianhe Ren; Shengxin Chen; Xiaoshuai Sun; Rongrong Ji; |
274 | Data Portraits: Recording Foundation Model Training Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Even while these models are now key in AI system building, it can be difficult to answer the straightforward question: has the model already encountered a given example during training? We therefore propose a widespread adoption of Data Portraits: artifacts that record training data and allow for downstream inspection. |
Marc Marone; Benjamin Van Durme; |
275 | SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate scaling up EHPS towards the first generalist foundation model (dubbed SMPLer-X), with up to ViT-Huge as the backbone and training with up to 4.5M instances from diverse data sources. |
Zhongang Cai; Wanqi Yin; Ailing Zeng; CHEN WEI; Qingping SUN; Wang Yanjun; Hui En Pang; Haiyi Mei; Mingyuan Zhang; Lei Zhang; Chen Change Loy; Lei Yang; Ziwei Liu; |
276 | Puzzlefusion: Unleashing The Power of Diffusion Models for Spatial Puzzle Solving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an end-to-end neural architecture based on Diffusion Models for spatial puzzle solving, particularly jigsaw puzzle and room arrangement tasks. |
Sepidehsadat (Sepid) Hossieni; Mohammad Amin Shabani; Saghar Irandoust; Yasutaka Furukawa; |
277 | What Can We Learn from Unlearnable Datasets? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To emphasize why linearly separable perturbations should not be relied upon, we propose an orthogonal projection attack which allows learning from unlearnable datasets published in ICML 2021 and ICLR 2023. |
Pedro Sandoval-Segura; Vasu Singla; Jonas Geiping; Micah Goldblum; Tom Goldstein; |
278 | Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the problem of planning in Minecraft, a popular, democratized yet challenging open-ended environment for developing multi-task embodied agents. |
Zihao Wang; Shaofei Cai; Guanzhou Chen; Anji Liu; Xiaojian (Shawn) Ma; Yitao Liang; |
279 | Annotating 8,000 Abdominal CT Volumes for Multi-Organ Segmentation in Three Weeks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a systematic and efficient method to expedite the annotation process for organ segmentation. |
Chongyu Qu; Tiezheng Zhang; Hualin Qiao; jie liu; Yucheng Tang; Alan Yuille; Zongwei Zhou; |
280 | ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to model the 3D parameter as a random variable instead of a constant as in SDS and present *variational score distillation* (VSD), a principled particle-based variational framework to explain and address the aforementioned issues in text-to-3D generation. |
Zhengyi Wang; Cheng Lu; Yikai Wang; Fan Bao; Chongxuan LI; Hang Su; Jun Zhu; |
281 | Testing The General Deductive Reasoning Capacity of Large Language Models Using OOD Examples Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To measure the general deductive reasoning ability of LLMs, we test on a broad set of deduction rules and measure their ability to generalize to more complex proofs from simpler demonstrations from multiple angles: depth-, width-, and compositional generalization. |
Abulhair Saparov; Richard Yuanzhe Pang; Vishakh Padmakumar; Nitish Joshi; Mehran Kazemi; Najoung Kim; He He; |
282 | Margin Maximization in Attention Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we initiate the study of a softmax-attention model $f(X)=v^\top X^\top \text{softmax}(XW^\top p)$, where $X$ is the tokenized input, $v$ is the value weights, $W$ is the key-query weights, and $p$ is a tunable token/prompt. |
Davoud Ataee Tarzanagh; Yingcong Li; Xuechen Zhang; Samet Oymak; |
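A minimal NumPy rendering of this model with hypothetical shapes, just to make the roles of $X$, $v$, $W$, and $p$ concrete:
```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical sizes: T tokens of dimension d.
T, d = 8, 16
rng = np.random.default_rng(0)
X = rng.normal(size=(T, d))   # tokenized input
v = rng.normal(size=d)        # value weights
W = rng.normal(size=(d, d))   # key-query weights
p = rng.normal(size=d)        # tunable token/prompt

attn = softmax(X @ W.T @ p)   # attention weights over the T tokens
f = v @ (X.T @ attn)          # scalar f(X) = v^T X^T softmax(X W^T p)
print(f)
```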
283 | LLMScore: Unveiling The Power of Large Language Models in Text-to-Image Synthesis Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose LLMScore, a new framework that offers evaluation scores with multi-granularity compositionality. |
Yujie Lu; Xianjun Yang; Xiujun Li; Xin Eric Wang; William Yang Wang; |
284 | Online Map Vectorization for Autonomous Driving: A Rasterization Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current map vectorization methods often exhibit deviations, and the existing evaluation metric for map vectorization lacks sufficient sensitivity to detect these deviations. To address these limitations, we propose integrating the philosophy of rasterization into map vectorization. |
Gongjie Zhang; Jiahao Lin; Shuang Wu; yilin song; Zhipeng Luo; Yang Xue; Shijian Lu; Zuoguan Wang; |
285 | PromptIR: Prompting for All-in-One Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a prompt-based learning approach, PromptIR, for All-In-One image restoration that can effectively restore images from various types and levels of degradation. |
Vaishnav Potlapalli; Syed Waqas Zamir; Salman Khan; Fahad Shahbaz Khan; |
286 | Learning in The Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We consider the learning of a single-index target function $f_*: \mathbb{R}^d\to\mathbb{R}$ under spiked covariance data: $f_*(\boldsymbol{x}) = … |
Jimmy Ba; Murat Erdogdu; Taiji Suzuki; Zhichao Wang; Denny Wu; |
287 | Disentangled Wasserstein Autoencoder for Protein Engineering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Identifying and modifying those functional sites is critical for protein engineering but computationally non-trivial, and requires significant domain knowledge. To automate this process from a data-driven perspective, we propose a disentangled Wasserstein autoencoder with an auxiliary classifier, which isolates the function-related patterns from the rest with theoretical guarantees. |
Tianxiao Li; Hongyu Guo; Filippo Grazioli; Mark Gerstein; Martin Renqiang Min; |
288 | Recommender Systems with Generative Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel generative retrieval approach, where the retrieval model autoregressively decodes the identifiers of the target candidates directly. |
Shashank Rajput; Nikhil Mehta; Anima Singh; Raghunandan Hulikal Keshavan; Trung Vu; Lukasz Heldt; Lichan Hong; Yi Tay; Vinh Tran; Jonah Samost; Maciej Kula; Ed Chi; Mahesh Sathiamoorthy; |
289 | AndroidInTheWild: A Large-Scale Dataset For Android Device Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new dataset for mobile device control, AndroidInTheWild, which is orders of magnitude larger than current datasets. |
Christopher Rawles; Alice Li; Oriana Riva; Daniel Rodriguez; Timothy Lillicrap; |
290 | Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose ARCO, a semi-supervised contrastive learning (CL) framework with stratified group theory for medical image segmentation. |
Chenyu You; Weicheng Dai; Yifei Min; Fenglin Liu; David Clifton; S. Kevin Zhou; Lawrence Staib; James Duncan; |
291 | Generator Born from Classifier Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we make a bold attempt toward an ambitious task: given a pre-trained classifier, we aim to reconstruct an image generator, without relying on any data samples. |
Runpeng Yu; Xinchao Wang; |
292 | Complex Query Answering on Eventuality Knowledge Graph with Implicit Logical Constraints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, in this paper, we propose a new framework to leverage neural methods to answer complex logical queries based on an EVKG, which can satisfy not only traditional first-order logic constraints but also implicit logical constraints over eventualities concerning their occurrences and orders. |
Jiaxin Bai; Xin Liu; Weiqi Wang; Chen Luo; Yangqiu Song; |
293 | (Provable) Adversarial Robustness for Group Equivariant Tasks: Graphs, Point Clouds, Molecules, and More Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For the first time, we propose a sound notion of adversarial robustness that accounts for task equivariance. |
Jan Schuchardt; Yan Scholten; Stephan Günnemann; |
294 | Dynamically Masked Discriminator for GANs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel method for GANs from the viewpoint of online continual learning. |
Wentian Zhang; Haozhe Liu; Bing Li; Jinheng Xie; Yawen Huang; Yuexiang Li; Yefeng Zheng; Bernard Ghanem; |
295 | LANCE: Stress-testing Visual Models By Generating Language-guided Counterfactual Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an automated algorithm to stress-test a trained visual model by generating language-guided counterfactual test images (LANCE). |
Viraj Prabhu; Sriram Yenamandra; Prithvijit Chattopadhyay; Judy Hoffman; |
296 | In-Context Impersonation Reveals Large Language Models’ Strengths and Biases Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In everyday conversations, humans can take on different roles and adapt their vocabulary to their chosen roles. We explore whether LLMs can take on, that is impersonate, different roles when they generate text in-context. |
Leonard Salewski; Isabel Rio-Torto; Stephan Alaniz; Eric Schulz; Zeynep Akata; |
297 | PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a large-scale analysis of common temporal rollout strategies, identifying the neglect of non-dominant spatial frequency information, often associated with high frequencies in PDE solutions, as the primary pitfall limiting stable, accurate rollout performance. |
Phillip Lippe; Bas Veeling; Paris Perdikaris; Richard Turner; Johannes Brandstetter; |
298 | RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present RenderMe-360, a comprehensive 4D human head dataset to drive advances in head avatar algorithms across different scenarios. |
Dongwei Pan; Long Zhuo; Jingtan Piao; Huiwen Luo; Wei Cheng; Yuxin WANG; Siming Fan; Shengqi Liu; Lei Yang; Bo Dai; Ziwei Liu; Chen Change Loy; Chen Qian; Wayne Wu; Dahua Lin; Kwan-Yee Lin; |
299 | In-Context Learning Unlocked for Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models. |
Zhendong Wang; Yifan Jiang; Yadong Lu; yelong shen; Pengcheng He; Weizhu Chen; Zhangyang Atlas Wang; Mingyuan Zhou; |
300 | Beta Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce beta diffusion with multiplicative transitions over time as a novel method for generative modeling of range-bounded data supported over disjoint regions. |
Mingyuan Zhou; Tianqi Chen; Huangjie Zheng; Zhendong Wang; |
301 | Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training time costs while improving data efficiency, which thus helps democratize diffusion model training to broader users. |
Zhendong Wang; Yifan Jiang; Huangjie Zheng; Peihao Wang; Pengcheng He; Zhangyang Atlas Wang; Weizhu Chen; Mingyuan Zhou; |
302 | FaceComposer: A Unified Framework for Versatile Facial Content Creation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents FaceComposer, a unified generative model that accomplishes a variety of facial content creation tasks, including text-conditioned face synthesis, text-guided face editing, face animation, etc. |
Jiayu Wang; Kang Zhao; Yifeng Ma; Shiwei Zhang; Yingya Zhang; Yujun Shen; Deli Zhao; Jingren Zhou; |
303 | Neural Priming for Sample-Efficient Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Neural Priming, a technique for adapting large pretrained models to distribution shifts and downstream tasks given few or no labeled examples. |
Matthew Wallingford; Vivek Ramanujan; Alex Fang; Aditya Kusupati; Roozbeh Mottaghi; Aniruddha Kembhavi; Ludwig Schmidt; Ali Farhadi; |
304 | Learning Threshold Neurons Via Edge of Stability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take a step towards understanding genuinely non-convex training dynamics with large learning rates by performing a detailed analysis of gradient descent for simplified models of two-layer neural networks. |
Kwangjun Ahn; Sebastien Bubeck; Sinho Chewi; Yin Tat Lee; Felipe Suarez; Yi Zhang; |
305 | FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This lack of fine-grained controllability limits the usefulness of motion generation for a broader audience. To tackle these challenges, we present FineMoGen, a diffusion-based motion generation and editing framework that can synthesize fine-grained motions, with spatial-temporal composition to the user instructions. |
Mingyuan Zhang; Huirong Li; Zhongang Cai; Jiawei Ren; Lei Yang; Ziwei Liu; |
306 | Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop MAE as a unified, modality-agnostic SSL framework. |
Huiwon Jang; Jihoon Tack; Daewon Choi; Jongheon Jeong; Jinwoo Shin; |
307 | Norm-guided Latent Space Exploration for Text-to-image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this issue, we propose a novel method for interpolating between two seeds and demonstrate that it defines a new non-Euclidean metric that takes into account a norm-based prior on seeds. We describe a simple yet efficient algorithm for approximating this metric and use it to further define centroids in the latent seed space. |
Dvir Samuel; Rami Ben-Ari; Nir Darshan; Haggai Maron; Gal Chechik; |
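The norm-based prior can be motivated with a quick experiment: i.i.d. Gaussian seeds in $d$ dimensions concentrate on a shell of radius about $\sqrt{d}$, and naive linear interpolation pulls midpoints off that shell. Spherical interpolation (slerp) is a simple norm-aware baseline; the paper's approximated metric and centroid construction go further, so this sketch only illustrates the motivation.
```python
import numpy as np

d = 10_000
rng = np.random.default_rng(0)
x, y = rng.normal(size=d), rng.normal(size=d)

lerp_mid = 0.5 * (x + y)
omega = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
slerp_mid = (np.sin(0.5 * omega) * (x + y)) / np.sin(omega)

print(np.linalg.norm(x) / np.sqrt(d))          # ~1.0: on the typical shell
print(np.linalg.norm(lerp_mid) / np.sqrt(d))   # ~0.71: pulled off the shell
print(np.linalg.norm(slerp_mid) / np.sqrt(d))  # ~1.0: slerp stays on it
```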
308 | Do Imperceptible Perturbations Really Prevent Unauthorized Data Usage in Diffusion-based Image Generation Systems? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that those existing methods provide a false sense of protection. |
Bochuan Cao; Changjiang Li; Ting Wang; Jinyuan Jia; Bo Li; Jinghui Chen; |
309 | Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel exploration technique that maximizes the value-conditional state entropy, which separately estimates the state entropies that are conditioned on the value estimates of each state, then maximizes their average. |
Dongyoung Kim; Jinwoo Shin; Pieter Abbeel; Younggyo Seo; |
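A rough sketch of the idea: partition visited states by their value estimates, compute a $k$-NN state-entropy bonus within each partition, and average. Binning by value quantiles is our own simplification for illustration, not the paper's exact conditioning scheme.
```python
import numpy as np

def value_conditional_entropy_bonus(states, values, k=5, n_bins=4):
    # Score each state by the log-distance to its k-th nearest neighbour
    # *within its own value bin* -- a standard k-NN entropy proxy, here
    # conditioned on the value estimate.
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    which = np.digitize(values, edges)
    bonus = np.zeros(len(states))
    for b in range(n_bins):
        idx = np.where(which == b)[0]
        if len(idx) <= k:
            continue
        S = states[idx]
        d = np.linalg.norm(S[:, None] - S[None, :], axis=-1)
        kth = np.sort(d, axis=1)[:, k]   # self-distance 0 occupies column 0
        bonus[idx] = np.log(kth + 1e-8)
    return bonus
```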
310 | Molecule Joint Auto-Encoding: Self-Supervised Learning of 2D and 3D Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a pretraining method for molecule joint auto-encoding (MoleculeJAE). |
weitao Du; Jiujiu Chen; Xuecang Zhang; Zhi-Ming Ma; Shengchao Liu; |
311 | Reverse Engineering Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Understanding the learned representation and underlying mechanisms of Self-Supervised Learning (SSL) often poses a challenge. In this paper, we ‘reverse engineer’ SSL, conducting an in-depth empirical analysis of its learned internal representations, encompassing diverse models, architectures, and hyperparameters. |
Ido Ben-Shaul; Ravid Shwartz-Ziv; Tomer Galanti; Shai Dekel; Yann LeCun; |
312 | Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that in stochastic optimization, models can handle noisy gradients as long as the gradient estimator is unbiased with reasonable variance. Following this motivation, we propose a new family of unbiased estimators, called WTA-CRS, for matrix products with reduced variance, which only requires storing the sub-sampled activations for calculating the gradient. |
Zirui Liu; Guanchu Wang; Shaochen (Henry) Zhong; Zhaozhuo Xu; Daochen Zha; Ruixiang Tang; Zhimeng Jiang; Kaixiong Zhou; Vipin Chaudhary; Shuai Xu; Xia Hu; |
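For context, the textbook unbiased column-row sampling estimator of a matrix product looks as follows; the paper's winner-take-all variant refines this baseline to cut variance further, so treat this only as the generic construction it builds on.
```python
import numpy as np

def sampled_matmul(A, B, k, rng):
    # Approximate A @ B from k sampled column-row (outer-product) pairs.
    n = A.shape[1]
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = p / p.sum()                       # importance of each column-row pair
    idx = rng.choice(n, size=k, p=p)
    # Rescaling by 1 / (k * p_i) makes the estimator unbiased: E[est] = A @ B.
    return (A[:, idx] / (k * p[idx])) @ B[idx, :]

rng = np.random.default_rng(0)
A, B = rng.normal(size=(32, 256)), rng.normal(size=(256, 32))
est = np.mean([sampled_matmul(A, B, 64, rng) for _ in range(500)], axis=0)
print(np.abs(est - A @ B).max())          # error shrinks as trials average out
```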
313 | Object-Centric Slot Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the feasibility and potential of integrating diffusion models into object-centric learning and investigate the pros and cons of this approach. |
Jindong Jiang; Fei Deng; Gautam Singh; Sungjin Ahn; |
314 | Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Subsequently, we analyze the sufficient conditions to guarantee fairness (i.e., low demographic parity) for the target dataset, including fairness for the source dataset, and low prediction difference between the source and target dataset for each sensitive attribute group. Motivated by these sufficient conditions, we propose robust fairness regularization (RFR) by considering the worst case within the model weight perturbation ball for each sensitive attribute group. |
Zhimeng Jiang; Xiaotian Han; Hongye Jin; Guanchu Wang; Rui Chen; Na Zou; Xia Hu; |
315 | INSPECT: A Multimodal Dataset for Patient Outcome Prediction of Pulmonary Embolisms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of pulmonary embolism (PE) patients, along with ground truth labels for multiple outcomes. |
Shih-Cheng Huang; Zepeng Huo; Ethan Steinberg; Chia-Chun Chiang; Curtis Langlotz; Matthew Lungren; Serena Yeung; Nigam Shah; Jason Fries; |
316 | Language Models Are Visual Reasoning Coordinators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Cola, a novel paradigm that coordinates multiple VLMs for visual reasoning. |
Liangyu Chen; Bo Li; Sheng Shen; Jingkang Yang; Chunyuan Li; Kurt Keutzer; Trevor Darrell; Ziwei Liu; |
317 | Adaptive Online Replanning with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore how we may effectively replan with diffusion models. |
Siyuan Zhou; Yilun Du; Shun Zhang; Mengdi Xu; Yikang Shen; Wei Xiao; Dit-Yan Yeung; Chuang Gan; |
318 | DiffVL: Scaling Up Soft Body Manipulation Using Vision-Language Driven Differentiable Physics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce DiffVL, a method that enables non-expert users to communicate soft-body manipulation tasks — a combination of vision and natural language, given in multiple stages — that can be readily leveraged by a differentiable physics solver. |
Zhiao Huang; Feng Chen; Yewen Pu; Chunru Lin; Hao Su; Chuang Gan; |
319 | Guiding Diffusion Models for Versatile Face Restoration Via Partial Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce *partial guidance*, a fresh perspective that is more adaptable to real-world degradations compared to existing works. |
Peiqing Yang; Shangchen Zhou; Qingyi Tao; Chen Change Loy; |
320 | Scaling Riemannian Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, the additional geometric complexity renders the diffusion transition term inexpressible in closed form, so prior methods resort to imprecise approximations of the score matching training objective that degrade performance and preclude applications in high dimensions. In this work, we reexamine these approximations and propose several practical improvements. |
Aaron Lou; Minkai Xu; Adam Farris; Stefano Ermon; |
321 | Neural Oscillators Are Universal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Coupled oscillators are being increasingly used as the basis of machine learning (ML) architectures, for instance in sequence modeling, graph representation learning and in physical neural networks that are used in analog ML devices. We introduce an abstract class of *neural oscillators* that encompasses these architectures and prove that neural oscillators are universal, i.e., they can approximate any continuous and causal operator mapping between time-varying functions, to desired accuracy. |
Samuel Lanthaler; T. Konstantin Rusch; Siddhartha Mishra; |
322 | Improving Few-Shot Generalization By Exploring and Exploiting Auxiliary Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we focus on Few-shot Learning with Auxiliary Data (FLAD), a training paradigm that assumes access to auxiliary data during few-shot learning in hopes of improving generalization. |
Alon Albalak; Colin Raffel; William Yang Wang; |
323 | JourneyDB: A Benchmark for Generative Image Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Synthetic images, in comparison to real data, encompass a higher level of diversity in terms of both content and style, thereby presenting significant challenges for the models to fully grasp. In light of this challenge, we introduce a comprehensive dataset, referred to as JourneyDB, that caters to the domain of generative images within the context of multi-modal visual understanding. |
Junting Pan; Keqiang Sun; Yuying Ge; Hao Li; Haodong Duan; Xiaoshi Wu; Renrui Zhang; Aojun Zhou; Zipeng Qin; Yi Wang; Jifeng Dai; Yu Qiao; Hongsheng Li; |
324 | Decision Stacks: Flexible Reinforcement Learning Via Modular Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Decision Stacks, a generative framework that decomposes goal-conditioned policy agents into 3 generative modules. |
Siyan Zhao; Aditya Grover; |
325 | Debiasing Scores and Prompts of 2D Diffusion for View-consistent Text-to-3D Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore existing frameworks for score-distilling text-to-3D generation and identify the main causes of the view inconsistency problem—the embedded bias of 2D diffusion models. Based on these findings, we propose two approaches to debias the score-distillation frameworks for view-consistent text-to-3D generation. |
Susung Hong; Donghoon Ahn; Seungryong Kim; |
326 | Bounce: A Reliable Bayesian Optimization Algorithm for Combinatorial and Mixed Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fill the need for a reliable algorithm for combinatorial and mixed spaces, this paper proposes Bounce that relies on a novel map of various variable types into nested embeddings of increasing dimensionality. |
Leonard Papenmeier; Luigi Nardi; Matthias Poloczek; |
327 | Window-Based Distribution Shift Detection for Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we study the case of monitoring the healthy operation of a deep neural network (DNN) receiving a stream of data, with the aim of detecting input distributional deviations that potentially degrade the quality of the network’s predictions. |
Guy Bar Shalom; Yonatan Geifman; Ran El-Yaniv; |
328 | Syntactic Binding in Diffusion Models: Enhancing Attribute Correspondence Through Attention Map Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As one notable example, a query like “a yellow tomato and a red lemon” may incorrectly produce an image of a yellow lemon and a red tomato. To remedy this issue, we propose SynGen, an approach which first syntactically analyses the prompt to identify entities and their modifiers, and then uses a novel loss function that encourages the cross-attention maps to agree with the linguistic binding reflected by the syntax. |
Royi Rassin; Eran Hirsch; Daniel Glickman; Shauli Ravfogel; Yoav Goldberg; Gal Chechik; |
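The first stage is plain dependency parsing. A sketch with spaCy (assuming the small English model is installed) that pairs each entity with its adjectival modifiers on the example above; the paper's full syntactic analysis covers more relations than this:
```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("a yellow tomato and a red lemon")
# Each adjectival-modifier arc links a modifier token to its head noun.
pairs = [(tok.head.text, tok.text) for tok in doc if tok.dep_ == "amod"]
print(pairs)  # [('tomato', 'yellow'), ('lemon', 'red')]
```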
329 | Segment Anything in High Quality Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose HQ-SAM, equipping SAM with the ability to accurately segment any object, while maintaining SAM’s original promptable design, efficiency, and zero-shot generalizability. |
Lei Ke; Mingqiao Ye; Martin Danelljan; Yifan liu; Yu-Wing Tai; Chi-Keung Tang; Fisher Yu; |
330 | Equivariant Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce equivariant flow matching, a new training objective for equivariant CNFs that is based on the recently proposed optimal transport flow matching. |
Leon Klein; Andreas Krämer; Frank Noe; |
331 | The Best of Both Worlds in Network Population Games: Reaching Consensus and Convergence to Equilibrium Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that smooth fictitious play, a well-known learning model in game theory, can achieve both consensus and convergence to equilibrium in diverse multi-agent settings. |
Shuyue Hu; Harold Soh; Georgios Piliouras; |
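As a reminder of the learning model involved, here is a minimal two-player sketch of smooth fictitious play: each player softmax-responds to the opponent's empirical mixed strategy so far. The game (matching pennies) and temperature are illustrative, not the paper's setting.
```python
import numpy as np

def smooth_br(u, temp=0.1):
    # Entropy-smoothed best response: a softmax over expected payoffs.
    z = np.exp((u - u.max()) / temp)
    return z / z.sum()

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies, player 1's payoffs
rng = np.random.default_rng(0)
counts = [np.ones(2), np.ones(2)]         # empirical action counts per player
for _ in range(20_000):
    emp = [c / c.sum() for c in counts]
    a1 = rng.choice(2, p=smooth_br(A @ emp[1]))     # respond to P2's average
    a2 = rng.choice(2, p=smooth_br(-A.T @ emp[0]))  # zero-sum: P2's payoff -A^T
    counts[0][a1] += 1
    counts[1][a2] += 1
print(counts[0] / counts[0].sum())  # -> close to the mixed Nash (0.5, 0.5)
```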
332 | Segment Any Point Cloud Sequences By Distilling Vision Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce ***Seal***, a novel framework that harnesses VFMs for segmenting diverse automotive point cloud sequences. |
Youquan Liu; Lingdong Kong; Jun CEN; Runnan Chen; Wenwei Zhang; Liang Pan; Kai Chen; Ziwei Liu; |
333 | Synthetic Experience Replay Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we leverage the tremendous recent progress in generative modeling and propose Synthetic Experience Replay (SynthER), a diffusion-based approach to flexibly upsample an agent’s collected experience. |
Cong Lu; Philip Ball; Yee Whye Teh; Jack Parker-Holder; |
334 | Have It Your Way: Individualized Privacy Assignment for DP-SGD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, setting a uniform privacy budget across all points may be overly conservative for some users or, conversely, not sufficiently protective for others. In this paper, we capture these preferences through individualized privacy budgets. |
Franziska Boenisch; Christopher Mühl; Adam Dziedzic; Roy Rinberg; Nicolas Papernot; |
335 | Mitigating Test-Time Bias for Fair Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: So motivated, we introduce a straightforward technique, Post-hoc Bias Mitigation (PBM), that post-processes the outputs from the pre-trained vision-language model. |
Fanjie Kong; Shuai Yuan; Weituo Hao; Ricardo Henao; |
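The highlight leaves the post-processing unspecified; one simple way to picture a post-hoc mitigation of this kind (an assumption for illustration, not the authors' exact procedure) is to re-rank the scored retrieval list so the returned top-k is balanced across a sensitive attribute:

```python
def balanced_topk(items, scores, groups, k):
    """Post-process retrieval output: pick the top-k while alternating
    across sensitive-attribute groups, so each group is represented
    (roughly) equally. items/scores/groups are parallel lists."""
    order = sorted(range(len(items)), key=lambda i: -scores[i])
    buckets = {}
    for i in order:                          # per-group, score-sorted queues
        buckets.setdefault(groups[i], []).append(i)
    picked, group_cycle = [], sorted(buckets)
    while len(picked) < k and any(buckets.values()):
        for g in group_cycle:
            if buckets[g] and len(picked) < k:
                picked.append(buckets[g].pop(0))
    return [items[i] for i in picked]

# Toy usage: raw scores favour group "a"; the balanced top-4 mixes groups.
items  = ["i1", "i2", "i3", "i4", "i5", "i6"]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
groups = ["a", "a", "a", "b", "b", "b"]
print(balanced_topk(items, scores, groups, k=4))  # ['i1', 'i4', 'i2', 'i5']
```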
336 | Fixing Unsupervised Depth Estimation for Dynamical Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This ambiguity causes depth estimators to predict erroneous depth for moving objects. To resolve this issue, we present a unifying approach for jointly learning monocular depth estimation, 3D independent flow fields, and motion segmentation from unlabeled monocular videos. |
Yihong Sun; Bharath Hariharan; |
337 | Unified Segment-to-Segment Framework for Simultaneous Sequence Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a unified segment-to-segment framework (Seg2Seg) for simultaneous sequence generation, which learns the mapping in an adaptive and unified manner. |
Shaolei Zhang; Yang Feng; |
338 | ProteinBench: Benchmarking Protein Design on Diverse Tasks, Models, and Metrics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ProteinBench, a new benchmark for protein design, which comprises extended protein design tasks, integrated models, and diverse evaluation metrics. |
Zhangyang Gao; Cheng Tan; Yijie Zhang; Xingran Chen; Lirong Wu; Stan Z. Li; |
339 | Propagating Knowledge Updates to LMs Through Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate that a context distillation-based approach can both impart knowledge about entities *and* propagate that knowledge to enable broader inferences. |
Shankar Padmanabhan; Yasumasa Onoe; Michael Zhang; Greg Durrett; Eunsol Choi; |
340 | GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate that, we propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual adapter by explicitly modeling the dual-modality structure knowledge (i.e., the correlation of different semantics/classes in textual and visual modalities) with a dual knowledge graph. |
Xin Li; Dongze Lian; Zhihe Lu; Jiawang Bai; Zhibo Chen; Xinchao Wang; |
341 | Localized Symbolic Knowledge Distillation for Visual Commonsense Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We build a Localized Visual Commonsense model which allows users to specify (multiple) regions as input. |
Jae Sung Park; Jack Hessel; Khyathi Chandu; Paul Pu Liang; Ximing Lu; Qiuyuan Huang; Peter West; Jianfeng Gao; Ali Farhadi; Yejin Choi; |
342 | Adversarial Training for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the pursuit of fixing adversarial training, (1) we show and overcome fundamental theoretical as well as practical limitations of the adopted graph learning setting in prior work; (2) we reveal that more flexible GNNs based on learnable graph diffusion are able to adjust to adversarial perturbations, while the learned message passing scheme is naturally interpretable; (3) we introduce the first attack for structure perturbations that, while targeting multiple nodes at once, is capable of handling global (graph-level) as well as local (node-level) constraints. |
Lukas Gosch; Simon Geisler; Daniel Sturm; Bertrand Charpentier; Daniel Zügner; Stephan Günnemann; |
343 | Incentives in Federated Learning: Equilibria, Dynamics, and Mechanisms for Welfare Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we model a collaborative FL framework, where every agent attempts to achieve an optimal trade-off between her learning payoff and data sharing cost. |
Aniket Murhekar; Zhuowen Yuan; Bhaskar Ray Chaudhury; Bo Li; Ruta Mehta; |
344 | Deep Language Networks: Joint Prompt Training of Stacked LLMs Using Variational Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: By viewing large language models (LLMs) as stochastic layers in a deep network, where the tunable parameters are the prompts at each layer, we chain multiple LLMs, feeding the output of the one at layer $l$ to the one at layer $l+1$, jointly training them using variational inference. |
Alessandro Sordoni; Eric Yuan; Marc-Alexandre Côté; Matheus Pereira; Adam Trischler; Ziang Xiao; Arian Hosseini; Friederike Niedtner; Nicolas Le Roux; |
345 | A Unified Conditional Framework for Diffusion-based Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a unified conditional framework based on diffusion models for image restoration. |
Yi Zhang; Xiaoyu Shi; Dasong Li; Xiaogang Wang; Jian Wang; Hongsheng Li; |
346 | CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods, reliant on retrieval from extensive databases or pre-trained shape embeddings, often overlook scene-object and object-object relationships, leading to inconsistent results due to their limited generation capacity. To address this issue, we present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes, which are semantically realistic and conform to commonsense. |
Guangyao Zhai; Evin Pınar Örnek; Shun-Cheng Wu; Yan Di; Federico Tombari; Nassir Navab; Benjamin Busam; |
347 | A Fractional Graph Laplacian Approach to Oversmoothing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we generalize the concept of oversmoothing from undirected to directed graphs. |
Sohir Maskey; Raffaele Paolino; Aras Bacho; Gitta Kutyniok; |
348 | Resolving Interference When Merging Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we demonstrate that prior merging techniques inadvertently lose valuable information due to two major sources of interference: (a) interference due to redundant parameter values and (b) disagreement on the sign of a given parameter’s values across models. To address this, we propose our method, TrIm, Elect Sign & Merge (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign. |
Prateek Yadav; Derek Tam; Leshem Choshen; Colin Raffel; Mohit Bansal; |
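The three steps map directly onto tensor operations. Below is a minimal NumPy sketch of a TIES-style merge over task vectors (fine-tuned weights minus base weights); the trim fraction and the sign election via summed deltas are illustrative choices, not the authors' reference implementation.

```python
import numpy as np

def ties_merge(base, finetuned, keep_frac=0.2):
    """Merge fine-tuned models in the spirit of TIES-Merging:
    (1) trim small-magnitude deltas, (2) elect a sign per parameter,
    (3) average only the deltas that agree with the elected sign."""
    deltas = [ft - base for ft in finetuned]              # task vectors

    # (1) Trim: zero out all but the top keep_frac largest-magnitude entries.
    trimmed = []
    for d in deltas:
        k = max(1, int(keep_frac * d.size))
        thresh = np.sort(np.abs(d).ravel())[-k]
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))

    # (2) Elect sign: per parameter, the sign of the summed trimmed deltas.
    elected_sign = np.sign(np.sum(trimmed, axis=0))

    # (3) Disjoint merge: average only entries agreeing with the elected sign.
    stacked = np.stack(trimmed)
    agree = (np.sign(stacked) == elected_sign) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    return base + (stacked * agree).sum(axis=0) / counts

# Toy usage: three fine-tuned variants of a 4-parameter model; the sign
# conflict in the third coordinate is resolved before averaging.
base = np.zeros(4)
fts = [base + np.array([0.9, -0.1, 0.5, 0.0]),
       base + np.array([1.1, 0.2, -0.6, 0.0]),
       base + np.array([0.8, 0.1, 0.7, 0.1])]
print(ties_merge(base, fts, keep_frac=0.5))
```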
349 | AdANNS: A Framework for Adaptive Semantic Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we argue that instead of rigid representations, different stages of ANNS can leverage _adaptive representations_ of varying capacities to achieve significantly better accuracy-compute trade-offs, i.e., stages of ANNS that can get away with more approximate computation should use a lower-capacity representation of the same data point. |
Aniket Rege; Aditya Kusupati; Sharan Ranjit S; Alan Fan; Qingqing Cao; Sham Kakade; Prateek Jain; Ali Farhadi; |
350 | Mirror Diffusion Models for Constrained and Watermarked Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Mirror Diffusion Models (MDM), a new class of diffusion models that generate data on convex constrained sets without losing any tractability. |
Guan-Horng Liu; Tianrong Chen; Evangelos Theodorou; Molei Tao; |
351 | Recurrent Hypernetworks Are Surprisingly SOTA in Meta-RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct an extensive empirical investigation and suggest a method that works without the need for additional tuning. |
Jacob Beck; Risto Vuorio; Zheng Xiong; Shimon Whiteson; |
352 | CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These highly-accurate models are challenging to deploy, as they appear harder to compress using standard techniques such as pruning. We address this issue by introducing the Correlation Aware Pruner (CAP), a new unstructured pruning framework which significantly pushes the compressibility limits for state-of-the-art architectures. |
Denis Kuznedelev; Eldar Kurtić; Elias Frantar; Dan Alistarh; |
353 | (Un)interpretability of Transformers: A Case Study with Dyck Grammars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, through a combination of theoretical results and carefully controlled experiments on synthetic data, we take a critical view of methods that exclusively focus on individual parts of the model, rather than considering the network as a whole. |
Kaiyue Wen; Yuchen Li; Bingbin Liu; Andrej Risteski; |
354 | FC-CLIP: Open-Vocabulary Panoptic Segmentation with A Single Frozen Convolutional CLIP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By contrast, we propose to build everything into a single-stage framework using a shared Frozen Convolutional CLIP backbone, which not only significantly simplifies the current two-stage pipeline, but also remarkably yields a better accuracy-cost trade-off. |
Qihang Yu; Ju He; Xueqing Deng; Xiaohui Shen; Liang-Chieh Chen; |
355 | Landscape Surrogate: Learning Decision Losses for Mathematical Optimization Under Partial Information Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The training is further challenged by sparse gradients of $\mathbf{g}$, especially for combinatorial solvers. To address these challenges, we propose using a smooth and learnable **Landscape Surrogate** $\mathcal{M}$ as a replacement for $f\circ \mathbf{g}$. |
Arman Zharmagambetov; Brandon Amos; Aaron Ferber; Taoan Huang; Bistra Dilkina; Yuandong Tian; |
356 | Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as implicit topic models. On this premise, we propose an algorithm to select optimal demonstrations from a set of annotated data with a small LLM, then directly generalize the selected demonstrations to larger LLMs. |
Xinyi Wang; Wanrong Zhu; Michael Saxon; Mark Steyvers; William Yang Wang; |
357 | Where Did I Come From? Origin Attribution of AI-Generated Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods only focus on specific types of generative models and require additional procedures during the training phase or generation phase. This makes them unsuitable for pre-trained models that lack these specific operations and may impair generation quality. To address this problem, we first develop an alteration-free and model-agnostic origin attribution method via reverse-engineering on image generation models, i.e., inverting the input of a particular model for a specific image. |
Zhenting Wang; Chen Chen; Yi Zeng; Lingjuan Lyu; Shiqing Ma; |
358 | An Inverse Scaling Law for CLIP Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a surprising finding that there exists an inverse scaling law for CLIP training, whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training. |
Xianhang Li; Zeyu Wang; Cihang Xie; |
359 | Learning Generalizable Agents Via Saliency-guided Features Decorrelation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose Saliency-Guided Features Decorrelation (SGFD) to eliminate these correlations through sample reweighting. |
Sili Huang; Yanchao Sun; Jifeng Hu; Siyuan Guo; Bo Yang; Hechang Chen; Yi Chang; Lichao Sun; |
360 | DreamWaltz: Make A Scene with Complex 3D Animatable Avatars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present DreamWaltz, a novel framework for generating and animating complex avatars given text guidance and parametric human body prior. |
Yukun Huang; Jianan Wang; Ailing Zeng; He CAO; Xianbiao Qi; Yukai Shi; Zheng-Jun Zha; Lei Zhang; |
361 | MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To ease image generation, we propose MultiFusion that allows one to express complex and nuanced concepts with arbitrarily interleaved inputs of multiple modalities and languages. |
Marco Bellagente; Hannah Teufel; Manuel Brack; Björn Deiseroth; Felix Friedrich; Constantin Eichenberg; Andrew Dai; Robert Baldock; Souradeep Nanda; Koen Oostermeijer; Andres Felipe Cruz-Salinas; Patrick Schramowski; Kristian Kersting; Samuel Weinbach; |
362 | Efficient Neural Music Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present **M**e**L**o**D**y (**M** for music; **L** for LM; **D** for diffusion), an LM-guided diffusion model that generates music audio of state-of-the-art quality while requiring 95.7% or 99.6% fewer forward passes than MusicLM for sampling 10s or 30s of music, respectively. |
Max W. Y. Lam; Qiao Tian; Tang Li; Zongyu Yin; Siyuan Feng; Ming Tu; Yuliang Ji; Rui Xia; Mingbo Ma; Xuchen Song; Jitong Chen; Wang Yuping; Yuxuan Wang; |
363 | A Metadata-Driven Approach to Understand Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a \emph{metadata-driven} approach to analyze the sensitivity of GNNs to graph data properties, motivated by the increasing availability of graph learning benchmarks. |
Ting Wei Li; Qiaozhu Mei; Jiaqi Ma; |
364 | Swap Agnostic Learning, or Characterizing Omniprediction Via Multicalibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce and study the notion of Swap Agnostic Learning. |
Parikshit Gopalan; Michael Kim; Omer Reingold; |
365 | Object Reprojection Error (ORE): Camera Pose Benchmarks from Lightweight Tracking Annotations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel evaluation protocol, Object Reprojection Error (ORE) to benchmark camera trajectories; ORE computes reprojection error for static objects within the video and requires only lightweight object tracklet annotations. |
Xingyu Chen; Weiyao Wang; Hao Tang; Matt Feiszli; |
366 | RoboDepth: Robust Out-of-Distribution Depth Estimation Under Corruptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Common corruptions, however, tend to occur in practical scenarios, especially for safety-critical applications like autonomous driving. To fill in this gap, we present a comprehensive robustness test suite dubbed RoboDepth consisting of 18 corruptions from three categories: i) weather and lighting conditions; ii) sensor failure and movement; and iii) data processing issues. |
Lingdong Kong; Shaoyuan Xie; Hanjiang Hu; Lai Xing Ng; Benoit Cottereau; Wei Tsang Ooi; |
367 | Transformer-based Planning for Symbolic Regression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these models primarily rely on supervised pretraining goals borrowed from text generation and overlook equation-specific objectives like accuracy and complexity. To address this, we propose TPSR, a Transformer-based Planning strategy for Symbolic Regression that incorporates Monte Carlo Tree Search into the transformer decoding process. |
Parshin Shojaee; Kazem Meidani; Amir Barati Farimani; Chandan Reddy; |
368 | DreamSparse: Escaping from Plato’s Cave with 2D Diffusion Model Given Sparse Views Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore leveraging the strong 2D priors in pre-trained diffusion models for synthesizing novel view images. |
Paul Yoo; Jiaxian Guo; Yutaka Matsuo; Shixiang (Shane) Gu; |
369 | Interpretability at Scale: Identifying Causal Mechanisms in Alpaca Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In the present paper, we scale DAS significantly by replacing the remaining brute-force search steps with learned parameters — an approach we call Boundless DAS. |
Zhengxuan Wu; Atticus Geiger; Christopher Potts; Noah Goodman; |
370 | QuIP: 2-Bit Quantization of Large Language Models With Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from *incoherent* weight and Hessian matrices, i.e., from the weights and the directions in which it is important to round them accurately being unaligned with the coordinate axes. |
Jerry Chee; Yaohui Cai; Volodymyr Kuleshov; Christopher De Sa; |
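The incoherence idea is easy to picture: conjugate the weight matrix by random orthogonal matrices so that no coordinate direction is special, quantize in the rotated basis, then rotate back. The toy sketch below substitutes plain round-to-nearest for QuIP's adaptive, Hessian-aware rounding, so it only illustrates the rotation trick.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Draw a random orthogonal matrix via QR of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return Q

def incoherent_quantize(W, n_levels=4, seed=0):
    """Rotate W by random orthogonal U, V (incoherence processing),
    quantize the rotated weights on a uniform grid, and rotate back.
    QuIP itself uses adaptive rounding with Hessian information."""
    rng = np.random.default_rng(seed)
    U = random_orthogonal(W.shape[0], rng)
    V = random_orthogonal(W.shape[1], rng)
    Wr = U @ W @ V.T                                  # incoherent basis
    scale = np.abs(Wr).max() / (n_levels / 2)
    Wq = np.round(Wr / scale) * scale                 # round-to-nearest
    return U.T @ Wq @ V                               # back to original basis

W = np.random.default_rng(1).normal(size=(16, 16))
W_hat = incoherent_quantize(W)
print(np.linalg.norm(W - W_hat) / np.linalg.norm(W))  # relative error
```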
371 | Text-to-Image Diffusion Models Are Zero-Shot Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, what knowledge their representations capture is not fully understood, and they have not been thoroughly explored on downstream tasks. We investigate diffusion models by proposing a method for evaluating them as zero-shot classifiers. |
Kevin Clark; Priyank Jaini; |
372 | Towards Consistent Video Editing with Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite their low data and computation requirements, these methods often generate editing results that are inconsistent with the text prompt as well as across the temporal sequence, limiting their real-world applications. In this paper, we propose to address the above issue with a novel EI$^2$ model towards **E**nhancing v**I**deo **E**diting cons**I**stency of TTI-based frameworks. |
Zicheng Zhang; Bonan Li; Xuecheng Nie; Congying Han; Tiande Guo; Luoqi Liu; |
373 | Towards Self-Interpretable Graph-Level Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate a new challenging problem, explainable GLAD, where the learning objective is to predict the abnormality of each graph sample with corresponding explanations, i.e., the vital subgraph that leads to the predictions. |
Yixin Liu; Kaize Ding; Qinghua Lu; Fuyi Li; Leo Yu Zhang; Shirui Pan; |
374 | Towards Robust and Expressive Whole-body Human Pose and Shape Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework to enhance the robustness of whole-body pose and shape estimation. |
Hui En Pang; Zhongang Cai; Lei Yang; Tianwei Zhang; Qingyi Tao; Zhonghua Wu; Ziwei Liu; |
375 | InsActor: Instruction-driven Physics-based Characters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present **InsActor**, a principled generative framework that leverages recent advancements in diffusion-based human motion models to produce instruction-driven animations of physics-based characters. |
Jiawei Ren; Mingyuan Zhang; Cunjun Yu; Xiao Ma; Liang Pan; Ziwei Liu; |
376 | One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds Without Per-Shape Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel method that takes a single image of any object as input and generates a full 360-degree 3D textured mesh in a single feed-forward pass. |
Minghua Liu; Chao Xu; Haian Jin; Linghao Chen; Mukund Varma T; Zexiang Xu; Hao Su; |
377 | OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce OpenShape, a method for learning multi-modal joint representations of text, image, and point clouds. |
Minghua Liu; Ruoxi Shi; Kaiming Kuang; Yinhao Zhu; Xuanlin Li; Shizhong Han; Hong Cai; Fatih Porikli; Hao Su; |
378 | OpenIllumination: A Multi-Illumination Dataset for Inverse Rendering Evaluation on Real Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce OpenIllumination, a real-world dataset containing over 108K images of 64 objects with diverse materials, captured under 72 camera views and a large number of different illuminations. |
Isabella Liu; Linghao Chen; Ziyang Fu; Liwen Wu; Haian Jin; Zhong Li; Chin Ming Ryan Wong; Yi Xu; Ravi Ramamoorthi; Zexiang Xu; Hao Su; |
379 | Foundation Model Is Efficient Multimodal Multitask Model Selector Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although recent advanced approaches employ lightweight metrics to measure models’ transferability, they often depend heavily on prior knowledge of a single task, making them inapplicable in a multi-modal multi-task scenario. To tackle this issue, we propose an efficient multitask model selector (EMMS), which employs large-scale foundation models to transform diverse label formats such as categories, texts, and bounding boxes of different downstream tasks into a unified noisy label embedding. |
Fanqing Meng; Wenqi Shao; Zhanglin Peng; Chonghe Jiang; Kaipeng Zhang; Yu Qiao; Ping Luo; |
380 | Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To support 3D occupancy prediction, we develop a label generation pipeline that produces dense, visibility-aware labels for any given scene. |
Xiaoyu Tian; Tao Jiang; Longfei Yun; Yucheng Mao; Huitong Yang; Yue Wang; Yilun Wang; Hang Zhao; |
381 | Understanding The Latent Space of Diffusion Models Through The Lens of Riemannian Geometry Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through analysis, we show that 1) the model focuses on low-frequency components early in the generative process and attunes to high-frequency details later; 2) at early timesteps, different samples share similar tangent spaces; and 3) the simpler the dataset a DM is trained on, the more consistent its tangent space is at each timestep. |
Yong-Hyun Park; Mingi Kwon; Jaewoong Choi; Junghyo Jo; Youngjung Uh; |
382 | Preference-grounded Token-level Guidance for Language Model Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: There is, therefore, a *granularity mismatch* between the preference and the LM training losses, which may complicate the learning problem. In this paper, we address this issue by developing an alternate training process, where we iterate between grounding the sequence-level preference into token-level training guidance, and improving the LM with the learned guidance. |
Shentao Yang; Shujian Zhang; Congying Xia; Yihao Feng; Caiming Xiong; Mingyuan Zhou; |
383 | Few-shot Generation Via Recalling The Episodic-Semantic Memory Like Human Being Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the cognitive systems of human beings, in this work we carefully design a variational structured memory module (VSM), which can simultaneously store both episodic and semantic memories to assist existing generative models in efficiently recalling memory during generation. |
Zhibin Duan; Zhiyi Lv; Chaojie Wang; Bo Chen; Bo An; Mingyuan Zhou; |
384 | What Can A Single Attention Layer Learn? A Study Through The Random Features Lens Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a rigorous theoretical study on the learning and generalization of a single multi-head attention layer, with a sequence of key vectors and a separate query vector as input. |
Hengyu Fu; Tianyu Guo; Yu Bai; Song Mei; |
385 | LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To advance research in LLDM, we introduce LIBERO, a novel benchmark of lifelong learning for robot manipulation. |
Bo Liu; Yifeng Zhu; Chongkai Gao; Yihao Feng; Qiang Liu; Yuke Zhu; Peter Stone; |
386 | ELDEN: Exploration Via Local Dependencies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new way of defining interesting states for environments with factored state spaces and complex chained dependencies, where an agent’s actions may change the state of one factor that, in turn, may affect the state of another factor. |
Zizhao Wang; Jiaheng Hu; Roberto Martín-Martín; Peter Stone; |
387 | Language Model Tokenizers Introduce Unfairness Between Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. |
Aleksandar Petrov; Emanuele La Malfa; Philip Torr; Adel Bibi; |
388 | What Indeed Can GPT Models Do in Chemistry? A Comprehensive Benchmark on Eight Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, rather than pursuing state-of-the-art performance, we aim to evaluate capabilities of LLMs in a wide range of tasks across the chemistry domain. |
Taicheng Guo; Kehan Guo; Bozhao Nan; Zhenwen Liang; Zhichun Guo; Nitesh Chawla; Olaf Wiest; Xiangliang Zhang; |
389 | Geometric Analysis of Matrix Sensing Over Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we consider the problem of matrix sensing over graphs (MSoG). |
Haixiang Zhang; Ying Chen; Javad Lavaei; |
390 | Transformers Learn to Implement Preconditioned Gradient Descent for In-context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Going beyond the question of expressivity, we ask: *can transformers learn to implement such algorithms by training over random problem instances?* To our knowledge, we make the first theoretical progress toward this question via an analysis of the loss landscape for linear transformers trained over random instances of linear regression. |
Kwangjun Ahn; Xiang Cheng; Hadi Daneshmand; Suvrit Sra; |
391 | Fast Optimal Locally Private Mean Estimation Via Random Projections Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new algorithmic framework, namely ProjUnit, for private mean estimation that yields algorithms that are computationally efficient, have low communication complexity, and incur optimal error up to a $1+o(1)$-factor. |
Hilal Asi; Vitaly Feldman; Jelani Nelson; Huy Nguyen; Kunal Talwar; |
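A minimal sketch of the project-privatize-average structure, assuming unit-norm client vectors, a shared random orthonormal projection, and a simple Gaussian local randomizer in the projected space (the paper's optimal randomizers differ; parameter names are ours):

```python
import numpy as np

def projunit_mean(X, k, noise_scale, seed=0):
    """Estimate the mean of unit-norm rows of X: every client projects its
    vector to k dims with a shared random orthonormal map, privatizes the
    projection locally, and the server averages and lifts back to d dims."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(d, k)))      # shared d x k projection
    reports = [Q.T @ x + rng.normal(scale=noise_scale, size=k) for x in X]
    z_bar = np.mean(reports, axis=0)
    return (d / k) * (Q @ z_bar)                      # unbiased lift to R^d

# Toy usage: 500 clients clustered around a common unit vector.
rng = np.random.default_rng(1)
mu = np.ones(32) / np.sqrt(32)
X = mu + 0.1 * rng.normal(size=(500, 32))
X /= np.linalg.norm(X, axis=1, keepdims=True)
print(np.linalg.norm(projunit_mean(X, k=8, noise_scale=0.3) - mu))
```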
392 | Diverse Conventions for Human-AI Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a technique for generating diverse conventions by (1) maximizing their rewards during self-play, while (2) minimizing their rewards when playing with previously discovered conventions (cross-play), stimulating conventions to be semantically different. |
Bidipta Sarkar; Andy Shih; Dorsa Sadigh; |
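The two criteria combine into a single per-convention training signal. A schematic sketch, where `selfplay_return` and `crossplay_return` are assumed stand-ins for environment rollouts:

```python
def convention_objective(policy, pool, selfplay_return, crossplay_return,
                         alpha=1.0):
    """Training signal for a new convention: high reward with itself,
    low reward when paired with previously discovered conventions,
    pushing it to be semantically different from the pool."""
    obj = selfplay_return(policy, policy)
    for prev in pool:
        obj -= alpha * crossplay_return(policy, prev) / max(len(pool), 1)
    return obj

# Toy usage: "policies" are just integers; reward is agreement.
agree = lambda a, b: 1.0 if a == b else 0.0
pool = [0, 1]
print(convention_objective(2, pool, agree, agree))  # 1.0: a novel convention
print(convention_objective(0, pool, agree, agree))  # 0.5: clashes with pool
```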
393 | Enhancing Motion Deblurring in High-Speed Scenes with Spike Streams Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach that integrates the two modalities from two branches, leveraging spike streams as auxiliary visual cues for guiding deblurring in high-speed motion scenes. |
Shiyan Chen; Jiyuan Zhang; Yajing Zheng; Zhaofei Yu; Tiejun Huang; |
394 | Ambient Diffusion: Learning Clean Distributions from Corrupted Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the first diffusion-based framework that can learn an unknown distribution using only highly-corrupted samples. |
Giannis Daras; Kulin Shah; Yuval Dagan; Aravind Gollakota; Alex Dimakis; Adam Klivans; |
395 | Martingale Diffusion Models: Mitigating Sampling Drift By Learning to Be Consistent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, the standard training objective via Denoising Score Matching (DSM) is only designed to optimize over non-drifted data. To train on drifted data, we propose to enforce a \emph{Martingale} property (MP) which states that predictions of the model on its own generated data follow a Martingale, thus being consistent with the outputs that it generates. |
Giannis Daras; Yuval Dagan; Alex Dimakis; Constantinos Daskalakis; |
396 | Intriguing Properties of Quantization at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we ask: _are quantization cliffs in performance solely a factor of scale?_ |
Arash Ahmadian; Saurabh Dash; Hongyu Chen; Bharat Venkitesh; Zhen Stephen Gou; Phil Blunsom; Ahmet Üstün; Sara Hooker; |
397 | Learning to Reason and Memorize with Self-Notes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large language models have been shown to struggle with multi-step reasoning, and do not retain previous reasoning steps for future use. We propose a simple method for solving both of these problems by allowing the model to take Self-Notes. |
Jack Lanchantin; Shubham Toshniwal; Jason Weston; Arthur Szlam; Sainbayar Sukhbaatar; |
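One way to picture the mechanism: while reading the input, the model may emit a special token to open a note, write a few reasoning tokens into the context, and then read on, so intermediate inferences persist for later steps. In the sketch below, `model_step` and the special tokens are assumed names, not the paper's API.

```python
def read_with_self_notes(model_step, input_ids, start_note, end_note,
                         note_budget=32):
    """Process the input token by token; after each input token the model
    may open a note (by emitting start_note) and write reasoning tokens
    into the context before reading on, so intermediate inferences are
    retained for later use."""
    ctx = []
    for tok in input_ids:
        ctx.append(tok)
        if model_step(ctx) == start_note:        # model asks to take a note
            ctx.append(start_note)
            for _ in range(note_budget):
                note_tok = model_step(ctx)
                ctx.append(note_tok)
                if note_tok == end_note:
                    break
    return ctx                                   # input enriched with notes

# Toy usage: a "model" that opens a note after every token equal to 7.
START, END = -1, -2
toy_step = lambda ctx: START if (ctx and ctx[-1] == 7) else END
print(read_with_self_notes(toy_step, [1, 7, 3], START, END))
```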
398 | A Randomized Approach for Tight Privacy Accounting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new differential privacy paradigm called estimate-verify-release (EVR), which tackles the challenges of providing a strict upper bound for the privacy parameter in DP compositions by converting an *estimate* of privacy parameter into a formal guarantee. |
Jiachen T. Wang; Saeed Mahloujifar; Tong Wu; Ruoxi Jia; Prateek Mittal; |
399 | A Privacy-Friendly Approach to Data Valuation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We first emphasize the inherent privacy risks of KNN-Shapley, and demonstrate the significant technical challenges in adapting KNN-Shapley to accommodate differential privacy (DP). To overcome these challenges, we introduce TKNN-Shapley, a refined variant of KNN-Shapley that is privacy-friendly, allowing for straightforward modifications to incorporate a DP guarantee (DP-TKNN-Shapley). |
Jiachen T. Wang; Yuqing Zhu; Yu-Xiang Wang; Ruoxi Jia; Prateek Mittal; |
400 | Joint Learning of Label and Environment Causal Independence for Graph Out-of-Distribution Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to simultaneously incorporate label and environment causal independence (LECI) to fully make use of label and environment information, thereby addressing the challenges faced by prior methods on identifying causal and invariant subgraphs. |
Shurui Gui; Meng Liu; Xiner Li; Youzhi Luo; Shuiwang Ji; |
401 | Satisfiability-Aided Language Models Using Declarative Prompting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new satisfiability-aided language modeling (SATLM) approach for improving the reasoning capabilities of LLMs. |
Xi Ye; Qiaochu Chen; Isil Dillig; Greg Durrett; |
402 | Can Language Models Teach? Teacher Explanations Improve Student Performance Via Theory of Mind Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Next, when the teacher is constrained by a budget, we decompose the teaching problem along two axes for better efficiency: (1) deciding when it is worth explaining a data point, and (2) understanding how the teacher should personalize explanations to better teach the student. We tackle both these problems by proposing a Theory of Mind approach, in which the teacher builds two few-shot mental models of the student. |
Swarnadeep Saha; Peter Hase; Mohit Bansal; |
403 | Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, Ito [2021] took the first step to remove such an undesirable uniqueness assumption for one particular FTRL algorithm with the 1/2-Tsallis entropy regularizer. In this work, we significantly improve and generalize this result, showing that uniqueness is unnecessary for FTRL with a broad family of regularizers and a new learning rate schedule. |
Tiancheng Jin; Junyan Liu; Haipeng Luo; |
404 | Regret Matching$^+$: (In)Stability and Fast Convergence in Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We then provide two fixes: restarting and chopping off the positive orthant that RM$^+$ works in. We show that these fixes are sufficient to get $O(T^{1/4})$ individual regret and $O(1)$ social regret in normal-form games via RM$^+$ with predictions. |
Gabriele Farina; Julien Grand-Clément; Christian Kroer; Chung-Wei Lee; Haipeng Luo; |
405 | S-CLIP: Semi-supervised Vision-Language Pre-training Using Few Specialist Captions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these models often struggle when applied to specialized domains like remote sensing, and adapting to such domains is challenging due to the limited number of image-text pairs available for training. To address this, we propose S-CLIP, a semi-supervised learning method for training CLIP that utilizes additional unpaired images. |
Sangwoo Mo; Minkyu Kim; Kyungmin Lee; Jinwoo Shin; |
406 | SpokenWOZ: A Large-Scale Speech-Text Dataset for Spoken Task-Oriented Dialogue in Multiple Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the limitations, we introduce SpokenWOZ, a large-scale speech-text dataset for spoken TOD, containing 8 domains, 203k turns, 5.7k dialogues and 249 hours of audio from human-to-human spoken conversations. |
Shuzheng Si; Wentao Ma; Haoyu Gao; Yuchuan Wu; Ting-En Lin; Yinpei Dai; Hangyu Li; Rui Yan; Fei Huang; Yongbin Li; |
407 | Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, due to the rapid evolution of the field and the knowledge gap between the science (e.g., physics, chemistry, and biology) and machine learning communities, a benchmarking study on geometrical representation for such data has not been conducted. To address such an issue, in this paper, we first provide a unified view of the current symmetry-informed geometric methods, classifying them into three main categories: invariance, equivariance with spherical frame basis, and equivariance with vector frame basis. |
Shengchao Liu; Weitao Du; Yanjing Li; Zhuoxinran Li; Zhiling Zheng; Chenru Duan; Zhi-Ming Ma; Omar Yaghi; Animashree Anandkumar; Christian Borgs; Jennifer Chayes; Hongyu Guo; Jian Tang; |
408 | DriveMax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, realistic simulation requires accurate modeling of multi-agent interactive behaviors to be trustworthy, behaviors which can be highly nuanced and complex. To address these challenges, we introduce DriveMax, a new data-driven simulator for autonomous driving in multi-agent scenes, designed for large-scale simulation and testing. |
Cole Gulino; Justin Fu; Wenjie Luo; George Tucker; Eli Bronstein; Yiren Lu; Jean Harb; Xinlei Pan; Yan Wang; Xiangyu Chen; John Co-Reyes; Rishabh Agarwal; Rebecca Roelofs; Yao Lu; Nico Montali; Paul Mougin; Zoey Yang; Brandyn White; Aleksandra Faust; Rowan McAllister; Dragomir Anguelov; Benjamin Sapp; |
409 | Rethinking The Role of Token Retrieval in Multi-Vector Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to simplify the multi-vector retrieval by rethinking the role of token retrieval. |
Jinhyuk Lee; Zhuyun Dai; Sai Meher Karthik Duddu; Tao Lei; Iftekhar Naim; Ming-Wei Chang; Vincent Zhao; |
410 | Feature-Learning Networks Are Consistent Across Widths At Realistic Scales Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets. |
Nikhil Vyas; Alexander Atanasov; Blake Bordelon; Depen Morwani; Sabarish Sainathan; Cengiz Pehlevan; |
411 | Self-supervised Neural Maps for Visual Positioning and Semantic Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SNAP, a deep network that learns rich 2D _neural_ maps from ground-level and overhead images. |
Paul-Edouard Sarlin; Eduard Trulls; Marc Pollefeys; Simon Lynen; Jan Hosang; |
412 | Elastic Decision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. |
Yueh-Hua Wu; Xiaolong Wang; Masashi Hamaya; |
413 | Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose multidimensional backtracking, an extension of the backtracking line-search to find good diagonal preconditioners for smooth convex problems. |
Frederik Kunstner; Victor Sanches Portella; Mark Schmidt; Nicholas Harvey; |
414 | Clifford Group Equivariant Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Clifford Group Equivariant Neural Networks: a novel approach for constructing $\mathrm{E}(n)$-equivariant networks. |
David Ruhe; Johannes Brandstetter; Patrick Forré; |
415 | Direct Diffusion Bridge Using Data Consistency for Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Several recent works have tried to alleviate this problem by building a diffusion process, directly bridging the clean and the corrupted for specific inverse problems. In this paper, we first unify these existing works under the name Direct Diffusion Bridges (DDB), showing that while motivated by different theories, the resulting algorithms only differ in the choice of parameters. |
Hyungjin Chung; Jeongsol Kim; Jong Chul Ye; |
416 | Mixed Samples As Probes for Unsupervised Model Selection in Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose MixVal, a novel target-only method that employs *mixup* to synthesize in-between target samples for validation. |
Dapeng Hu; Jian Liang; Jun Hao Liew; Chuhui Xue; Song Bai; Xinchao Wang; |
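A minimal sketch of the mixup-probing idea: interpolate pairs of unlabeled target samples and score a candidate model by how often its prediction on the mixed input matches the hard pseudo-label of the dominant component. The mixing ratio and the consistency measure here are our assumptions, not the paper's exact protocol.

```python
import numpy as np

def mixval_score(predict, X, lam=0.55, seed=0):
    """Score a model by prediction consistency on mixup-interpolated
    target samples: a mixed input should be classified as the pseudo-label
    of its dominant (weight-lam) component. Higher = more consistent."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    X_mix = lam * X + (1 - lam) * X[idx]       # in-between target samples
    pseudo = predict(X).argmax(axis=1)         # hard pseudo-labels
    mixed_pred = predict(X_mix).argmax(axis=1)
    return (mixed_pred == pseudo).mean()

# Toy usage with a linear "model" on random target features.
rng = np.random.default_rng(1)
W = rng.normal(size=(16, 3))
predict = lambda X: X @ W
X_target = rng.normal(size=(200, 16))
print(mixval_score(predict, X_target))
```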
417 | Frequency-Enhanced Data Augmentation for Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast to conventional approaches, which primarily focus on the spatial domain exploration, we propose a paradigm shift toward the Fourier domain. |
Keji He; Chenyang Si; Zhihe Lu; Yan Huang; Liang Wang; Xinchao Wang; |
418 | Grammar Prompting for Domain-Specific Language Generation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We explore *grammar prompting* as a simple approach for enabling LLMs to use external knowledge and domain-specific constraints, expressed through a grammar in Backus–Naur Form (BNF), during in-context learning. |
Bailin Wang; Zi Wang; Xuezhi Wang; Yuan Cao; Rif A. Saurous; Yoon Kim; |
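A minimal sketch of the prompt assembly: each exemplar pairs a BNF grammar with an (input, program) demonstration, so the LLM conditions program generation on an explicit grammar. The toy grammar and field names below are made up for illustration.

```python
def grammar_prompt(grammar_bnf, demos, query):
    """Assemble a grammar-prompting few-shot prompt: every exemplar pairs
    a BNF grammar with an (input, program) demonstration, so the LLM
    learns to generate programs under an explicit grammar."""
    parts = []
    for inp, g, prog in demos:
        parts.append(f"Input: {inp}\nGrammar:\n{g}\nProgram: {prog}")
    parts.append(f"Input: {query}\nGrammar:\n{grammar_bnf}\nProgram:")
    return "\n\n".join(parts)

# Toy usage with a two-rule arithmetic grammar.
toy_bnf = '<expr> ::= <num> | <expr> "+" <expr>\n<num> ::= "1" | "2"'
demos = [("add one and two", toy_bnf, "1 + 2")]
print(grammar_prompt(toy_bnf, demos, "add two and two"))
```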
419 | Context-TAP: Tracking Any Point Demands Context Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel framework Context-TAP, which effectively improves point trajectory accuracy by aggregating spatial context features in videos. |
Weikang Bian; Zhaoyang Huang; Xiaoyu Shi; Yitong Dong; Yijin Li; Hongsheng Li; |
420 | Language Models Can Improve Event Prediction By Few-Shot Abductive Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Large language models have shown astonishing performance on a wide range of reasoning tasks. In this paper, we investigate whether they could reason about real-world events and help improve the prediction accuracy of event sequence models. |
Xiaoming Shi; Siqiao Xue; Kangrui Wang; Fan Zhou; James Zhang; Jun Zhou; Chenhao Tan; Hongyuan Mei; |
421 | Penalising The Biases in Norm Regularisation Enforces Sparsity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Beyond simple intuitions, the relation between regularising parameters’ norm and obtained estimators remains theoretically misunderstood. For one hidden ReLU layer networks with unidimensional data, this work shows the parameters’ norm required to represent a function is given by the total variation of its second derivative, weighted by a $\sqrt{1+x^2}$ factor. |
Etienne Boursier; Nicolas Flammarion; |
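In symbols, the stated result reads (our transcription; see the paper for the precise function class): for a one-hidden-ReLU-layer network $f_\theta$ on unidimensional data, $\min_{\theta : f_\theta = f} \|\theta\|^2 \propto \int \sqrt{1+x^2}\,\mathrm{d}|f''|(x)$, where $\mathrm{d}|f''|$ denotes the total-variation measure of the second derivative. The $\sqrt{1+x^2}$ weighting is the contribution of the penalised biases, which, per the title, is what enforces sparsity.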
422 | DeWave: Discrete Encoding of EEG Waves for EEG to Text Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These event markers may not be readily available or could be challenging to acquire during real-time inference, and the sequence of eye fixations may not align with the order of spoken words. To tackle these issues, we introduce a novel framework, DeWave, that integrates discrete encoding sequences into open-vocabulary EEG-to-text translation tasks. |
Yiqun Duan; Charles Chau; Zhen Wang; Yu-Kai Wang; Chin-teng Lin; |
423 | Visual Programming for Step-by-Step Text-to-Image Generation and Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing work focuses on equipping LMs with visual understanding, we propose two novel interpretable/explainable visual programming frameworks for T2I generation and evaluation. |
Jaemin Cho; Abhay Zala; Mohit Bansal; |
424 | Random-Access Infinite Context Length for Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel approach that allows access to the complete context while retaining random-access flexibility, closely resembling running attention on the entire context. |
Amirkeivan Mohtashami; Martin Jaggi; |
425 | Scissorhands: Exploiting The Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by an interesting observation of the attention scores, we hypothesize the persistence of importance: only pivotal tokens, which had a substantial influence at one step, will significantly influence future generations. Based on our empirical verification and theoretical analysis around this hypothesis, we propose Scissorhands, a system that maintains the memory usage of the KV cache at a fixed budget without finetuning the model. |
Zichang Liu; Aditya Desai; Fangshuo Liao; Weitao Wang; Victor Xie; Zhaozhuo Xu; Anastasios Kyrillidis; Anshumali Shrivastava; |
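As a toy illustration of the persistence-of-importance idea, the sketch below keeps the KV cache within a fixed budget by evicting the tokens with the smallest accumulated attention mass; the scoring proxy and eviction policy are simplifications, not the paper's exact system.

```python
import numpy as np

def evict_kv_cache(keys, values, attn_history, budget):
    """Keep only the `budget` cached tokens with the largest accumulated
    attention scores (a proxy for "pivotal" tokens), per the persistence-
    of-importance hypothesis. attn_history[i] accumulates the attention
    mass token i has received over past decoding steps."""
    if keys.shape[0] <= budget:
        return keys, values, attn_history
    keep = np.argsort(attn_history)[-budget:]   # indices of pivotal tokens
    keep = np.sort(keep)                        # preserve sequence order
    return keys[keep], values[keep], attn_history[keep]

# Toy usage: 6 cached tokens, budget of 4.
rng = np.random.default_rng(0)
K, V = rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
scores = np.array([0.9, 0.05, 0.4, 0.01, 0.7, 0.3])  # accumulated attention
K, V, scores = evict_kv_cache(K, V, scores, budget=4)
print(K.shape)  # (4, 8)
```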
426 | One-Pass Distribution Sketch for Measuring Data Heterogeneity in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a one-pass distribution sketch to represent the client data distribution. |
Zichang Liu; Zhaozhuo Xu; Benjamin Coleman; Anshumali Shrivastava; |
427 | Benchmarking Robustness to Adversarial Image Obfuscations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To reach this goal, these malicious actors may obfuscate policy-violating images (e.g., overlaying harmful images with carefully selected benign images or visual patterns) to prevent machine learning models from reaching the correct decision. In this paper, we invite researchers to tackle this specific issue and present a new image benchmark. |
Florian Stimberg; Ayan Chakrabarti; Chun-Ta Lu; Hussein Hazimeh; Otilia Stretcu; Wei Qiao; Yintao Liu; Merve Kaya; Cyrus Rashtchian; Ariel Fuxman; Mehmet Tek; Sven Gowal; |
428 | Enhancing User Intent Capture in Session-Based Recommendation with Attribute Patterns Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the Frequent Attribute Pattern Augmented Transformer (FAPAT) that characterizes user intents by building attribute transition graphs and matching attribute patterns. |
Xin Liu; Zheng Li; Yifan Gao; Jingfeng Yang; Tianyu Cao; Zhengyang Wang; Bing Yin; Yangqiu Song; |
429 | One Less Reason for Filter Pruning: Gaining Free Adversarial Robustness with Structured Grouped Kernel Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we answer the questions by fairly and comprehensively investigating the adversarial performance of 10+ popular structured pruning methods. |
Shaochen (Henry) Zhong; Zaichuan You; Jiamu Zhang; Sebastian Zhao; Zachary LeClaire; Zirui Liu; Vipin Chaudhary; Shuai Xu; Xia Hu; |
430 | Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel alternative paradigm that constructs an explicit world (domain) model in planning domain definition language (PDDL) and then uses it to plan with sound domain-independent planners. |
Lin Guan; Karthik Valmeekam; Sarath Sreedharan; Subbarao Kambhampati; |
431 | Learning to Compress Prompts with Gist Tokens Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Finetuning and distillation methods allow for specialization of LMs without prompting, but require retraining the model for each task. To avoid this trade-off entirely, we present gisting, which trains an LM to compress prompts into smaller sets of gist tokens which can be cached and reused for compute efficiency. |
Jesse Mu; Xiang Li; Noah Goodman; |
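The trick can be realized as little more than an attention mask during finetuning: positions after the gist tokens may attend to the gists but not to the raw prompt, forcing the prompt's content to be compressed into the gist activations. A minimal mask-construction sketch (the [prompt | gist | rest] layout and the 0/1 convention are our assumptions):

```python
import numpy as np

def gist_mask(n_prompt, n_gist, n_rest):
    """Build a (T, T) attention mask for [prompt | gist | rest]:
    1 = may attend. Causal everywhere, but positions after the gist
    tokens are blocked from attending to the raw prompt, so the prompt
    must be "cached" inside the gist token activations."""
    T = n_prompt + n_gist + n_rest
    mask = np.tril(np.ones((T, T), dtype=int))       # causal mask
    mask[n_prompt + n_gist:, :n_prompt] = 0          # rest cannot see prompt
    return mask

print(gist_mask(n_prompt=3, n_gist=2, n_rest=2))
```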
432 | A Generative Model of The Hippocampal Formation Trained with Theta Driven Local Learning Rules Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we introduce a biologically plausible model of the hippocampal formation tantamount to a Helmholtz machine that we apply to a temporal stream of inputs. |
Tom M George; Kimberly Stachenfeld; Caswell Barry; Claudia Clopath; Tomoki Fukai; |
433 | ZipLM: Inference-Aware Structured Pruning of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The breakthrough performance of large language models (LLMs) comes with major computational footprints and high deployment costs. In this paper, we progress towards resolving this problem by proposing a novel structured compression approach for LLMs, called ZipLM. |
Eldar Kurtić; Elias Frantar; Dan Alistarh; |
434 | Knowledge Distillation Performs Partial Variance Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we shed new light on the inner workings of this method, by examining it from an optimization perspective. |
Mher Safaryan; Alexandra Peste; Dan Alistarh; |
435 | Vulnerabilities in Video Quality Assessment Models: The Challenge of Adversarial Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we make the first attempt to evaluate the robustness of NR-VQA models against adversarial attacks, and propose a patch-based random search method for black-box attack. |
Aoxiang Zhang; Yu Ran; Weixuan Tang; Yuan-Gen Wang; |
436 | PAC-Bayes Generalization Certificates for Learned Inductive Conformal Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we use PAC-Bayes theory to obtain generalization bounds on both the coverage and the efficiency of set-valued predictors which can be directly optimized to maximize efficiency while satisfying a desired test coverage. |
Apoorva Sharma; Sushant Veer; Asher Hancock; Heng Yang; Marco Pavone; Anirudha Majumdar; |
437 | DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce DoWG (Distance over Weighted Gradients), a new parameter-free optimizer that combines adaptive gradient weighting with distance estimation. |
Ahmed Khaled; Konstantin Mishchenko; Chi Jin; |
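The name spells out the update: a running distance estimate weights a running sum of squared gradient norms. A minimal sketch, assuming the recursion $\bar r_t = \max(\bar r_{t-1}, \lVert x_t - x_0\rVert)$, $v_t = v_{t-1} + \bar r_t^2 \lVert g_t\rVert^2$, and step size $\bar r_t^2/\sqrt{v_t}$ (our reading of the method; consult the paper for the exact scheme):

```python
import numpy as np

def dowg(grad_fn, x0, steps=100, r_eps=1e-4):
    """Parameter-free gradient descent in the DoWG style: the step size is
    a running distance estimate squared, divided by the square root of a
    distance-weighted sum of squared gradient norms."""
    x = x0.copy()
    r = r_eps          # estimated distance travelled (never decreases)
    v = 0.0            # weighted sum of squared gradient norms
    for _ in range(steps):
        g = grad_fn(x)
        r = max(r, np.linalg.norm(x - x0))
        v += r**2 * np.linalg.norm(g)**2
        x = x - (r**2 / np.sqrt(v)) * g
    return x

# Toy usage: minimize f(x) = ||x - 3||^2 without tuning a learning rate.
grad = lambda x: 2.0 * (x - 3.0)
print(dowg(grad, np.zeros(2)))   # approaches [3, 3]
```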
438 | Mitigating Over-smoothing in Transformers Via Regularized Nonlocal Functionals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that self-attention layers in transformers minimize a functional which promotes smoothness, thereby causing token uniformity. |
Tam Nguyen; Tan Nguyen; Richard Baraniuk; |
439 | DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose DASpeech, a non-autoregressive direct S2ST model which realizes both fast and high-quality S2ST. |
Qingkai Fang; Yan Zhou; Yang Feng; |
440 | Meta-Adapter: An Online Few-shot Learner for Vision-Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, few-shot learning methods based on CLIP typically require offline fine-tuning of the parameters on few-shot samples, resulting in longer inference time and the risk of overfitting in certain domains. To tackle these challenges, we propose the Meta-Adapter, a lightweight residual-style adapter, to refine the CLIP features guided by the few-shot samples in an online manner. |
Cheng Cheng; Lin Song; Ruoyi Xue; Hang Wang; Hongbin Sun; Yixiao Ge; Ying Shan; |
441 | Benchmarking Large Language Models on CMExam – A Comprehensive Chinese Medical Exam Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, evaluating LLMs in the medical field is challenging due to the lack of standardized and comprehensive datasets. To address this gap, we introduce CMExam, sourced from the Chinese National Medical Licensing Examination. |
Junling Liu; Peilin Zhou; Yining Hua; Dading Chong; Zhongyu Tian; Andrew Liu; Helin Wang; Chenyu You; Zhenhua Guo; Lei Zhu; Michael Li; |
442 | Can LLM Already Serve As A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most of the prevalent benchmarks, i.e., Spider, and WikiSQL, focus on database schema with few rows of database contents leaving the gap between academic study and real-world applications. To mitigate this gap, we present BIRD, a BIg benchmark for laRge-scale Database grounded in text-to-SQL tasks, containing 12,751 pairs of text-to-SQL data and 95 databases with a total size of 33.4 GB, spanning 37 professional domains. |
Jinyang Li; Binyuan Hui; Ge Qu; Binhua Li; Jiaxi Yang; Bowen Li; Bailin Wang; Bowen Qin; Ruiying Geng; Nan Huo; Xuanhe Zhou; Ma Chenhao; Guoliang Li; Kevin Chang; Fei Huang; Reynold Cheng; Yongbin Li; |
443 | MomentDiff: Generative Video Moment Retrieval from Random to Real Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To evaluate the influence of the temporal location biases, we propose two “anti-bias” datasets with location distribution shifts, named Charades-STA-Len and Charades-STA-Mom. |
Pandeng Li; Chen-Wei Xie; Hongtao Xie; Liming Zhao; Lei Zhang; Yun Zheng; Deli Zhao; Yongdong Zhang; |
444 | CP-SLAM: Collaborative Neural Point-based SLAM System Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a collaborative implicit neural simultaneous localization and mapping (SLAM) system with RGB-D image sequences, which consists of complete front-end and back-end modules including odometry, loop detection, sub-map fusion, and global refinement. |
Jiarui Hu; Mao Mao; Hujun Bao; Guofeng Zhang; Zhaopeng Cui; |
445 | Thought Cloning: Learning to Think While Acting By Imitating Human Thinking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We hypothesize one reason for such cognitive deficiencies is that they lack the benefits of thinking in language and that we can improve AI agents by training them to *think like humans do*. We introduce a novel Imitation Learning framework, Thought Cloning, where the idea is to not just clone the behaviors of human demonstrators, *but also the thoughts humans have as they perform these behaviors*. |
Shengran Hu; Jeff Clune; |
446 | Shaped Attention Mechanism in The Infinite Depth-and-Width Limit at Initialization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite depth and width. |
Lorenzo Noci; Chuning Li; Mufan Li; Bobby He; Thomas Hofmann; Chris Maddison; Dan Roy; |
447 | Mass-Producing Failures of Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Deployed multimodal models can fail in ways that evaluators did not anticipate. In order to find these failures before deployment, we introduce MultiMon, a system that automatically identifies systematic failures—generalizable, natural-language descriptions that describe categories of individual failures. |
Shengbang Tong; Erik Jones; Jacob Steinhardt; |
448 | Bounding Training Data Reconstruction in DP-SGD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent works provide evidence that if one does not need to protect against membership attacks but instead only wants to protect against training data reconstruction, then the utility of private models can be improved because less noise is required to protect against these more ambitious attacks. We investigate this question further in the context of DP-SGD, a standard algorithm for private deep learning, and provide an upper bound on the success of any reconstruction attack against DP-SGD together with an attack that empirically matches the predictions of our bound. |
Jamie Hayes; Borja Balle; Saeed Mahloujifar; |
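As a quick reminder of the algorithm this bound targets, here is a minimal sketch of one DP-SGD step on a toy logistic-regression objective; the clipping norm `C`, the noise multiplier `sigma`, and the model itself are illustrative choices, not values from the paper.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, C=1.0, sigma=1.0, rng=np.random.default_rng(0)):
    """One DP-SGD step: clip each per-example gradient to norm C, sum,
    add Gaussian noise of scale sigma * C, then average and descend."""
    grads = []
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-xi @ w))      # sigmoid prediction
        g = (p - yi) * xi                      # per-example gradient
        grads.append(g * min(1.0, C / (np.linalg.norm(g) + 1e-12)))  # clip
    noise = rng.normal(0.0, sigma * C, size=w.shape)
    g_priv = (np.sum(grads, axis=0) + noise) / len(X)
    return w - lr * g_priv

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 5))
y = (X @ np.ones(5) > 0).astype(float)
w = np.zeros(5)
for _ in range(100):
    w = dp_sgd_step(w, X, y)
```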
449 | AllSim: Systematic Simulation and Benchmarking of Repeated Resource Allocation Policies in Multi-User Systems with Varying Resources Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a key limitation has been the absence of good methods and test-beds for benchmarking these policies; almost all resource allocation policies are benchmarked in environments which are either completely synthetic or do not allow _any_ deviation from historical data. In this paper we introduce AllSim, a benchmarking environment for realistically simulating the impact and utility of policies for resource allocation in systems in which users compete for such scarce resources. |
Jeroen Berrevoets; Daniel Jarrett; Alex Chan; Mihaela van der Schaar; |
450 | Scaling Laws for Language Encoding Models in FMRI Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we tested whether larger open-source models such as those from the OPT and LLaMA families are better at predicting brain responses recorded using fMRI. |
Richard Antonello; Aditya Vaidya; Alexander Huth; |
451 | DaTaSeg: Taming A Universal Multi-Dataset Multi-Task Segmentation Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Observing the close relationship among panoptic, semantic and instance segmentation tasks, we propose to train a universal multi-dataset multi-task segmentation model: DaTaSeg. |
Xiuye Gu; Yin Cui; Jonathan Huang; Abdullah Rashwan; Xuan Yang; Xingyi Zhou; Golnaz Ghiasi; Weicheng Kuo; Huizhong Chen; Liang-Chieh Chen; David Ross; |
452 | Deep Reinforcement Learning with Plasticity Injection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces plasticity injection, a minimalistic intervention that increases the network plasticity without changing the number of trainable parameters or biasing the predictions. |
Evgenii Nikishin; Junhyuk Oh; Georg Ostrovski; Clare Lyle; Razvan Pascanu; Will Dabney; Andre Barreto; |
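A minimal sketch of the injection trick as we read the highlight: freeze the old head, then add a fresh trainable copy minus a frozen duplicate of that copy, so predictions are unchanged at injection time and the trainable parameter count stays constant. The linear head below is a toy stand-in of ours, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=(8, 4))    # old head, frozen at injection time
phi = rng.normal(size=(8, 4))      # fresh head, the only trainable parameters
phi_frozen = phi.copy()            # frozen duplicate of the fresh head

def head(x):
    # Since phi == phi_frozen at injection, the last two terms cancel
    # and the output equals the original prediction x @ theta.
    return x @ theta + x @ phi - x @ phi_frozen

x = rng.normal(size=(2, 8))
assert np.allclose(head(x), x @ theta)   # predictions are not biased
```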
453 | Twisting Towards Perfection: Asymptotically Exact Conditional Sampling in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce the Twisted Diffusion Sampler, or TDS, a sequential Monte Carlo (SMC) algorithm that targets the conditional distributions of diffusion models. |
Luhuan Wu; Brian Trippe; Christian Naesseth; John Cunningham; David Blei; |
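Since the highlight names SMC as the underlying machinery, the skeleton below shows a generic SMC loop (propagate, reweight, resample) as a reference point; the `transition` and `potential` functions are toy placeholders, and the twisting functions that make TDS asymptotically exact are not reproduced here.

```python
import numpy as np

def smc(transition, potential, n_particles=256, n_steps=50, seed=0):
    """Generic SMC: propagate particles, reweight them by a potential,
    and resample whenever the effective sample size collapses."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_particles)
    logw = np.zeros(n_particles)
    for t in range(n_steps):
        x = transition(x, t, rng)                    # propagate
        logw = logw + potential(x, t)                # reweight
        w = np.exp(logw - logw.max())
        w = w / w.sum()
        if 1.0 / np.sum(w ** 2) < n_particles / 2:   # low effective sample size
            idx = rng.choice(n_particles, size=n_particles, p=w)
            x, logw = x[idx], np.zeros(n_particles)
    return x

# Toy run: dynamics drifting toward 0 with a potential favoring x near 1.
samples = smc(lambda x, t, rng: 0.9 * x + 0.1 * rng.normal(size=x.shape),
              lambda x, t: -0.5 * (x - 1.0) ** 2)
```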
454 | Benchmarking Foundation Models with Language-Model-as-an-Examiner Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most of these works focus on proposing new datasets; however, we see two main issues within previous benchmarking pipelines, namely testing leakage and evaluation automation. In this paper, we propose a novel benchmarking framework, Language-Model-as-an-Examiner, where the LM serves as a knowledgeable examiner that formulates questions based on its knowledge and evaluates responses in a reference-free manner. |
Yushi Bai; Jiahao Ying; Yixin Cao; Xin Lv; Yuze He; Xiaozhi Wang; Jifan Yu; Kaisheng Zeng; Yijia Xiao; Haozhe Lyu; Jiayin Zhang; Juanzi Li; Lei Hou; |
455 | Connecting Multi-modal Contrastive Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a novel training-efficient method for learning MCR without paired data called Connecting Multi-modal Contrastive Representations (C-MCR). |
Zehan Wang; Yang Zhao; Xize Cheng; Haifeng Huang; Jiageng Liu; Aoxiong Yin; Li Tang; Linjun Li; Yongqi Wang; Ziang Zhang; Zhou Zhao; |
456 | Two-Stage Learning to Defer with Multiple Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study a two-stage scenario for learning to defer, which we argue is crucial in practice for many applications. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
457 | Structured Prediction with Stronger Consistency Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These loss functions readily lead to new structured prediction algorithms with stronger theoretical guarantees, based on their minimization. We describe efficient algorithms for minimizing several of these surrogate losses, including a new *structured logistic loss*. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
458 | $H$-Consistency Bounds: Characterization and Extensions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present new and tight $H$-consistency bounds for both the family of constrained losses and that of comp-sum losses, which covers the familiar cross-entropy, or logistic loss applied to the outputs of a neural network. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
459 | Structure-free Graph Condensation: From Large-scale Graphs to Condensed Graph-free Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we advocate a new Structure-Free Graph Condensation paradigm, named SFGC, to distill a large-scale graph into a small-scale graph node set without explicit graph structures, i.e., graph-free data. |
Xin Zheng; Miao Zhang; Chunyang Chen; Quoc Viet Hung Nguyen; Xingquan Zhu; Shirui Pan; |
460 | GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Evaluating the performance of graph neural networks (GNNs) is an essential task for practical GNN model deployment and serving, as deployed GNNs face significant performance uncertainty when inferring on unseen and unlabeled test graphs, due to mismatched training-test graph distributions. In this paper, we study a *new* problem, **GNN model evaluation**, that aims to assess the performance of a specific GNN model trained on labeled and observed graphs, by precisely estimating its performance (e.g., node classification accuracy) on unseen graphs without labels. |
Xin Zheng; Miao Zhang; Chunyang Chen; Soheila Molaei; Chuan Zhou; Shirui Pan; |
461 | Demystifying Structural Disparity in Graph Neural Networks: Can One Size Fit All? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In the present study, we provide evidence that Graph Neural Networks (GNNs) on node classification typically perform admirably on homophilic nodes within homophilic graphs and heterophilic nodes within heterophilic graphs, while struggling on the opposite node set, exhibiting a performance disparity. |
Haitao Mao; Zhikai Chen; Wei Jin; Haoyu Han; Yao Ma; Tong Zhao; Neil Shah; Jiliang Tang; |
462 | Bayesian Optimisation of Functions on Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional graph search algorithms can be applied in this case, but they may be sample-inefficient and do not make use of information about the function values; on the other hand, Bayesian optimisation is a class of promising black-box solvers with superior sample efficiency, but it has scarcely been applied to such novel setups. To fill this gap, we propose a novel Bayesian optimisation framework that optimises over functions defined on generic, large-scale and potentially unknown graphs. |
Xingchen Wan; Pierre Osselin; Henry Kenlay; Binxin Ru; Michael A Osborne; Xiaowen Dong; |
463 | WalkLM: A Uniform Language Model Fine-tuning Framework for Attributed Graph Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take a fundamentally different approach from GNNs, to simultaneously achieve deep joint modeling of complex attributes and flexible structures of real-world graphs and obtain unsupervised generic graph representations that are not limited to specific downstream predictions. |
Yanchao Tan; Zihao Zhou; Hang Lv; Weiming Liu; Carl Yang; |
464 | Mind2Web: Towards A Generalist Agent for The Web Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. |
Xiang Deng; Yu Gu; Boyuan Zheng; Shijie Chen; Sam Stevens; Boshi Wang; Huan Sun; Yu Su; |
465 | MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a hardware-algorithm co-design method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory, where we jointly design transformer architectures and construct the inference compiler to fit the memory resource constraint. |
Yinan Liang; Ziwei Wang; Xiuwei Xu; Yansong Tang; Jie Zhou; Jiwen Lu; |
466 | Context-guided Embedding Adaptation for Effective Topic Modeling in Low-Resource Regimes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issue, we propose an effective approach for topic modeling under the low-resource regime, the core of which is the adaptive generation of semantic matching word embeddings by integrating the contextual information of each task. |
Yishi Xu; Jianqiao Sun; Yudi Su; Xinyang Liu; Zhibin Duan; Bo Chen; Mingyuan Zhou; |
467 | Trajectory Alignment: Understanding The Edge of Stability Phenomenon Via Bifurcation Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we start by demonstrating through empirical studies that when the edge-of-stability (EoS) phenomenon occurs, different GD trajectories (after a proper reparameterization) align on a specific bifurcation diagram determined solely by the loss function, independent of the network architecture, training data, and step size. |
Minhak Song; Chulhee Yun; |
468 | 3D Open-vocabulary Segmentation with Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We tackle the challenges in 3D open-vocabulary segmentation by exploiting the open-vocabulary multimodal knowledge and object reasoning capability of pre-trained foundation models CLIP and DINO, without necessitating any fine-tuning. |
Kunhao Liu; Fangneng Zhan; Jiahui Zhang; Muyu Xu; Yingchen Yu; Abdulmotaleb El Saddik; Christian Theobalt; Eric Xing; Shijian Lu; |
469 | Big Little Transformer Decoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The inference latency is further exacerbated by autoregressive generative tasks, as models need to run iteratively to generate tokens sequentially without leveraging token-level parallelization. To address this, we propose Big Little Decoder (BiLD), a framework that can improve inference efficiency and latency for a wide range of text generation applications. |
Sehoon Kim; Karttikeya Mangalam; Suhong Moon; Jitendra Malik; Michael Mahoney; Amir Gholami; Kurt Keutzer; |
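A hedged sketch of the big-little pattern described above: a small model drafts tokens while it is confident and defers to the large model otherwise. The greedy decoding, the confidence threshold `tau`, and the toy models are our simplifications; BiLD's actual fallback and rollback policies are more refined.

```python
import numpy as np

def generate(big_step, small_step, prompt, max_len=20, tau=0.8):
    """big_step / small_step map a token sequence to a next-token distribution."""
    seq = list(prompt)
    for _ in range(max_len):
        p_small = small_step(seq)
        if p_small.max() >= tau:                  # small model is confident
            seq.append(int(p_small.argmax()))
        else:                                     # defer to the large model
            seq.append(int(big_step(seq).argmax()))
    return seq

# Toy stand-ins for the two models over a 10-token vocabulary.
rng = np.random.default_rng(0)
small = lambda seq: rng.dirichlet(np.ones(10))        # diffuse: often defers
big = lambda seq: rng.dirichlet(0.1 * np.ones(10))    # peaked distributions
print(generate(big, small, prompt=[0]))
```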
470 | Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We tackle the problems of latent variables identification and out-of-support image generation in representation learning. |
Sébastien Lachapelle; Divyat Mahajan; Ioannis Mitliagkas; Simon Lacoste-Julien; |
471 | Representational Strengths and Limitations of Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we establish both positive and negative results on the representation power of attention layers, with a focus on intrinsic complexity parameters such as width, depth, and embedding dimension. |
Clayton Sanford; Daniel Hsu; Matus Telgarsky; |
472 | Empowering Collaborative Filtering with Principled Adversarial Contrastive Loss Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To bridge the gap, we delve into the reasons underpinning the success of contrastive loss in CF, and propose a principled Adversarial InfoNCE loss (AdvInfoNCE), which is a variant of InfoNCE, specially tailored for CF methods. |
An Zhang; Leheng Sheng; Zhibo Cai; Xiang Wang; Tat-Seng Chua; |
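Because the highlight positions AdvInfoNCE as a variant of InfoNCE, the sketch below shows plain InfoNCE over cosine-normalized user and item embeddings as the baseline being modified; the adversarial hardness weighting that defines AdvInfoNCE is not reproduced here.

```python
import numpy as np

def info_nce(u, pos, negs, tau=0.1):
    """InfoNCE for one user: `u` and `pos` are d-vectors, `negs` is (n, d)."""
    s_pos = u @ pos / tau
    s_neg = negs @ u / tau
    return -s_pos + np.log(np.exp(s_pos) + np.exp(s_neg).sum())

rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v, axis=-1, keepdims=True)
u, pos = unit(rng.normal(size=8)), unit(rng.normal(size=8))
negs = unit(rng.normal(size=(64, 8)))
print(info_nce(u, pos, negs))
```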
473 | The Adversarial Consistency of Surrogate Risks for Binary Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the consistency of surrogate risks for robust binary classification. |
Natalie Frank; Jonathan Niles-Weed; |
474 | Adversarial Counterfactual Environment Model Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, we first show that, particularly in the sequential decision-making setting, this approach may catastrophically fail to predict counterfactual action effects due to the selection bias of behavior policies during data collection. To tackle this problem, we introduce a novel model-learning objective called adversarial weighted empirical risk minimization (AWRM). |
Xiong-Hui Chen; Yang Yu; Zhengmao Zhu; Zhihua Yu; Zhenjun Chen; Chenghe Wang; Yinan Wu; Rong-Jun Qin; Hongqiu Wu; Ruijin Ding; Fangsheng Huang; |
475 | Mutual Information Regularized Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel MISA framework to approach offline RL from the perspective of Mutual Information between States and Actions in the dataset by directly constraining the policy improvement direction. |
Xiao Ma; Bingyi Kang; Zhongwen Xu; Min Lin; Shuicheng Yan; |
476 | Simplifying Neural Network Training Under Class Imbalance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Notably, we demonstrate that simply tuning existing components of standard deep learning pipelines, such as the batch size, data augmentation, architecture size, pre-training, optimizer, and label smoothing, can achieve state-of-the-art performance without any specialized loss functions or samplers. |
Ravid Shwartz-Ziv; Micah Goldblum; Yucen Li; C. Bayan Bruss; Andrew Wilson; |
477 | Let The Flows Tell: Solving Graph Combinatorial Problems with GFlowNets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we design Markov decision processes (MDPs) for different combinatorial problems and propose to train conditional GFlowNets to sample from the solution space. |
Dinghuai Zhang; Hanjun Dai; Nikolay Malkin; Aaron Courville; Yoshua Bengio; Ling Pan; |
478 | Unified 3D Segmenter As Prototypical Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce ProtoSEG, a prototype-based model that unifies semantic, instance, and panoptic segmentation tasks. |
Zheyun Qin; Cheng Han; Lu Xiankai; Qifan Wang; Xiushan Nie; Yilong Yin; |
479 | Second-Order Degradation and Reconstruction for Test-Time Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, these methods largely concentrate on the estimation of one degradation type (e.g., blur degradation), overlooking other degradation types like noise and JPEG compression in the real-world test-time scenario, thus limiting their practicality. To tackle this, we present a fast test-time adaptation framework for SR, named SRTTA, which is able to super-resolve images with various degradation types while maintaining high efficiency. |
Zeshuai Deng; Zhuokun Chen; Shuaicheng Niu; Thomas Li; Bohan Zhuang; Mingkui Tan; |
480 | FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods try to solve this problem by learning a navigation policy, which captures semantic features of the goal image and observation image independently and lastly fuses them for predicting a sequence of navigation actions. However, these methods suffer from two major limitations. 1) They may miss detailed information in the goal image, and thus fail to reason about the goal location. 2) More critically, it is hard to focus on the goal-relevant regions in the observation image, because they attempt to understand observation without goal conditioning. In this paper, we aim to overcome these limitations by designing a Fine-grained Goal Prompting (FGPrompt) method for image-goal navigation. |
Xinyu Sun; Peihao Chen; Jugang Fan; Jian Chen; Thomas Li; Mingkui Tan; |
481 | Assumption Violations in Causal Discovery and The Robustness of Score Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Because causal discovery without further assumptions is an ill-posed problem, each algorithm comes with its own set of usually untestable assumptions, some of which are hard to meet in real datasets. Motivated by these considerations, this paper extensively benchmarks the empirical performance of recent causal discovery methods on observational _iid_ data generated under different background conditions, allowing for violations of the critical assumptions required by each selected approach. |
Francesco Montagna; Atalanti Mastakouri; Elias Eulig; Nicoletta Noceti; Lorenzo Rosasco; Dominik Janzing; Bryon Aragam; Francesco Locatello; |
482 | UniControl: A Unified Diffusion Model for Controllable Visual Generation In The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we introduce UniControl, a new generative foundation model that consolidates a wide array of controllable condition-to-image (C2I) tasks within a singular framework, while still allowing for arbitrary language prompts. |
Can Qin; Shu Zhang; Ning Yu; Yihao Feng; Xinyi Yang; Yingbo Zhou; Huan Wang; Juan Carlos Niebles; Caiming Xiong; Silvio Savarese; Stefano Ermon; Yun Fu; Ran Xu; |
483 | Top-Ambiguity Samples Matter: Understanding Why Deep Ensemble Works in Selective Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by an interesting empirical result that the improvement of the ensemble largely comes from top-ambiguity samples where its member models diverge, we prove that, under certain assumptions, the ensemble has a lower selective risk than the member model for any coverage within a range. |
Qiang Ding; Yixuan Cao; Ping Luo; |
484 | Benchmarking and Analyzing 3D-aware Image Synthesis with A Modularized Codebase Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Following the most popular and effective paradigm in this field, which incorporates a neural radiance field (NeRF) into the generator of a generative adversarial network (GAN), we build a well-structured codebase through modularizing the generation process. Such a design allows researchers to develop and replace each module independently, and hence offers an opportunity to fairly compare various approaches and recognize their contributions from the module perspective. |
Qiuyu Wang; Zifan Shi; Kecheng Zheng; Yinghao Xu; Sida Peng; Yujun Shen; |
485 | Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Masked graph modeling excels in the self-supervised representation learning of molecular graphs. Scrutinizing previous studies, we reveal a common scheme consisting of three key components: (1) a graph tokenizer, which breaks a molecular graph into smaller fragments (i.e., subgraphs) and converts them into tokens; (2) graph masking, which corrupts the graph with masks; (3) a graph autoencoder, which first applies an encoder on the masked graph to generate the representations, and then employs a decoder on the representations to recover the tokens of the original graph. |
Zhiyuan Liu; Yaorui Shi; An Zhang; Enzhi Zhang; Kenji Kawaguchi; Xiang Wang; Tat-Seng Chua; |
486 | Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods struggle to map points and pixels to a shared latent space robustly, since points and pixels have very different characteristics with patterns learned in different manners (MLP and CNN), and they also fail to construct supervision directly on the transformation since the Perspective-n-Point (PnP) step is non-differentiable, which leads to unstable registration results. To address these problems, we propose to learn a structured cross-modality latent space to represent pixel features and 3D features via a differentiable probabilistic PnP solver. |
Junsheng Zhou; Baorui Ma; Wenyuan Zhang; Yi Fang; Yu-Shen Liu; Zhizhong Han; |
487 | DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present DatasetDM, a generic dataset generation model that can produce diverse synthetic images and the corresponding high-quality perception annotations (e.g., segmentation masks and depth). |
Weijia Wu; Yuzhong Zhao; Hao Chen; Yuchao Gu; Rui Zhao; Yefei He; Hong Zhou; Mike Zheng Shou; Chunhua Shen; |
488 | Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an efficient LLM inference pipeline that harnesses LLMs themselves to perceive response lengths and schedule sequences accordingly. |
Zangwei Zheng; Xiaozhe Ren; Fuzhao Xue; Yang Luo; Xin Jiang; Yang You; |
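A toy sketch of the scheduling half of the pipeline, as suggested by the title: bucket prompts by predicted response length so that each micro-batch pads to a similar length. The predictor below is a fake stand-in (prompt length), not the paper's LLM-based length-perception module, and `batch_size` is arbitrary.

```python
def schedule(prompts, predict_len, batch_size=4):
    """Sort prompts by predicted response length and cut into batches,
    so sequences in a batch finish at roughly the same time."""
    order = sorted(prompts, key=predict_len)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

# Stand-in predictor: pretend longer prompts get longer responses.
batches = schedule(["hi", "why?", "explain transformers", "ok", "summarize this paper"],
                   predict_len=len, batch_size=2)
print(batches)
```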
489 | Rank-DETR for High Quality Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a simple highly performant DETR-based object detector by proposing a set of rank-oriented designs, collectively called Rank-DETR. |
Yifan Pu; Weicong Liang; Yiduo Hao; Yuhui Yuan; Yukang Yang; Chao Zhang; Han Hu; Gao Huang; |
490 | DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it is unclear whether the Transformer architecture performs equally well in 3D shape generation, as previous 3D diffusion methods mostly adopted the U-Net architecture. To bridge this gap, we propose a novel Diffusion Transformer for 3D shape generation, named DiT-3D, which can directly operate the denoising process on voxelized point clouds using plain Transformers. |
Shentong Mo; Enze Xie; Ruihang Chu; Lanqing Hong; Matthias Niessner; Zhenguo Li; |
491 | Diffusion Schrödinger Bridge Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Iterative Markovian Fitting (IMF), a new methodology for solving SB problems, and Diffusion Schrödinger Bridge Matching (DSBM), a novel numerical algorithm for computing IMF iterates. |
Yuyang Shi; Valentin De Bortoli; Andrew Campbell; Arnaud Doucet; |
492 | Parallel Sampling of Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In spite of the sequential nature of the denoising steps, we show that, surprisingly, it is possible to parallelize sampling via Picard iterations, by guessing the solution of future denoising steps and iteratively refining until convergence. With this insight, we present ParaDiGMS, a novel method to accelerate the sampling of pretrained diffusion models by denoising multiple steps in parallel. |
Andy Shih; Suneel Belkhale; Stefano Ermon; Dorsa Sadigh; Nima Anari; |
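A toy illustration of the Picard idea on a discretized ODE: guess the whole trajectory, then repeatedly refine every step at once, so each sweep's drift evaluations are independent and can run in parallel. The linear drift stands in for a pretrained denoiser; this is not the ParaDiGMS implementation.

```python
import numpy as np

def drift(x, t):
    return -x  # stand-in for a pretrained model's probability-flow drift

T, dt, x0 = 50, 0.02, 1.0
ts = np.arange(T) * dt
x = np.full(T + 1, x0)                # initial guess: whole trajectory at x0
for _ in range(20):                   # Picard sweeps
    f = drift(x[:-1], ts)             # all T evaluations are independent here
    x = np.concatenate([[x0], x0 + np.cumsum(f * dt)])

# The sweeps converge to the sequential Euler solution (1 - dt)**T.
print(x[-1], (1 - dt) ** T)
```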
493 | Towards Unbounded Machine Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper is the first, to our knowledge, to study unlearning for different applications (removing biases, RB; resolving confusion, RC; user privacy, UP), with the view that each has its own desiderata, definitions for ‘forgetting’ and associated metrics for forget quality. |
Meghdad Kurmanji; Peter Triantafillou; Eleni Triantafillou; |
494 | On Quantum Backpropagation, Information Reuse, and Cheating Measurement Collapse Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that achieving backpropagation scaling is impossible without access to multiple copies of a state. With this added ability, we introduce an algorithm with foundations in shadow tomography that matches backpropagation scaling in quantum resources while reducing classical auxiliary computational costs to open problems in shadow tomography. |
Amira Abbas; Robbie King; Hsin-Yuan Huang; William J. Huggins; Ramis Movassagh; Dar Gilboa; Jarrod McClean; |
495 | On The Generalization Error of Stochastic Mirror Descent for Quadratically-Bounded Losses: An Improved Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we revisit the generalization error of stochastic mirror descent for quadratically bounded losses studied in Telgarsky (2022). |
Ta Duy Nguyen; Alina Ene; Huy Nguyen; |
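For readers who have not seen stochastic mirror descent, below is its update with the negative-entropy mirror map (exponentiated gradient on the probability simplex), one standard instantiation; the mirror map, step size, and toy objective are our illustrative choices, not the setting analyzed in the paper.

```python
import numpy as np

def smd_entropy_step(w, g, lr=0.1):
    """Stochastic mirror descent with negative entropy as the mirror map:
    a multiplicative update followed by renormalization onto the simplex."""
    w = w * np.exp(-lr * g)
    return w / w.sum()

rng = np.random.default_rng(0)
w = np.ones(4) / 4                         # start from the uniform distribution
target = np.array([0.7, 0.1, 0.1, 0.1])
for _ in range(500):
    g = w - target + 0.01 * rng.normal(size=4)  # noisy gradient of 0.5*||w - target||^2
    w = smd_entropy_step(w, g)
print(w)   # approaches `target`, the minimizer on the simplex
```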
496 | ScenarioNet: Open-Source Platform for Large-Scale Traffic Scenario Simulation and Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present ScenarioNet, an open-source platform for large-scale traffic scenario modeling and simulation. |
Quanyi Li; Zhenghao Peng; Lan Feng; Zhizheng Liu; Chenda Duan; Wenjie Mo; Bolei Zhou; |
497 | Improved Convergence in High Probability of Clipped Gradient Methods with Heavy Tailed Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the convergence in high probability of clipped gradient methods when the noise distribution has heavy tails, i.e., with bounded $p$th moments for some $1 < p \leq 2$. |
Ta Duy Nguyen; Thien H Nguyen; Alina Ene; Huy Nguyen; |
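As a pointer to the method class analyzed here, a minimal clipped-SGD step: the stochastic gradient is rescaled whenever its norm exceeds a clipping level, which is what tames heavy-tailed noise. The quadratic toy objective and constants are illustrative, not the paper's setting.

```python
import numpy as np

def clipped_sgd_step(w, grad, lr=0.1, clip=1.0):
    """One clipped-SGD step: rescale the gradient to norm at most `clip`."""
    g = grad(w)
    g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
    return w - lr * g

w = np.array([5.0, -3.0])
for _ in range(200):
    w = clipped_sgd_step(w, grad=lambda v: v)   # gradient of f(w) = ||w||^2 / 2
print(w)   # approaches the minimizer at the origin
```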
498 | Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, with a simple predictive loss, how the representation emerges from the gradient *training dynamics* remains a mystery. In this paper, for a 1-layer transformer with one self-attention layer plus one decoder layer, we analyze its SGD training dynamics for the task of next token prediction in a mathematically rigorous manner. |
Yuandong Tian; Yiping Wang; Beidi Chen; Simon Du; |
499 | Offline Goal-Conditioned RL with Latent States As Actions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Importantly, it is easier to assess the effect of actions on getting to these closer states. Based on this idea, we propose a hierarchical algorithm for goal-conditioned RL from offline data. |
Seohong Park; Dibya Ghosh; Benjamin Eysenbach; Sergey Levine; |
500 | A Logic for Expressing Log-Precision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we analyze transformers whose forward pass is computed in $\log n$ precision on contexts of length $n$. |
William Merrill; Ashish Sabharwal; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~3,500 papers), please visit Paper Digest: NeurIPS-2023 (Full List).