Paper Digest: ICLR 2023 Highlights
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to use these machine-generated highlights to quickly grasp the main idea of each paper.
Based in New York, Paper Digest is dedicated to helping people generate content and reason over unstructured data. Unlike black-box approaches, we build deep models on semantics, which allows results to be produced with explanations. These models power this website and underlie our services, including search engine, summarization, question answering, and literature review.
If you do not want to miss interesting academic papers, you are welcome to sign up for our daily paper digest service to receive updates on new papers published in your area. You are also welcome to follow us on Twitter and LinkedIn for new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: ICLR 2023 Highlights
# | Paper | Highlight | Author(s)
---|---|---|---
1 | Understanding Embodied Reference with Touch-Line Transformer | We study embodied reference understanding: locating referents using embodied gestural cues and language references. | Yang Li; Xiaoxue Chen; Hao Zhao; Jiangtao Gong; Guyue Zhou; Federico Rossano; Yixin Zhu
2 | ISS: Image As Stepping Stone for Text-Guided 3D Shape Generation | This paper presents a new framework called Image as Stepping Stone (ISS) for the task by introducing a 2D image as a stepping stone to connect the two modalities and to eliminate the need for paired text-shape data. | Zhengzhe Liu; Peng Dai; Ruihui Li; Xiaojuan Qi; Chi-Wing Fu
3 | Structured World Representations Via Block-Slot Attention | In this paper, we propose a novel object-centric representation, called Block-Slot Representation, which, unlike the conventional slot representation, provides concept-level disentanglement within a slot. | Gautam Singh; Yeongbin Kim; Sungjin Ahn
4 | Learning to Estimate Single-View Volumetric Flow Motions Without 3D Supervision | We address the challenging problem of jointly inferring the 3D flow and volumetric densities moving in a fluid from a monocular input video with a deep neural network. | Erik Franz; Barbara Solenthaler; Nils Thuerey
5 | DBQ-SSD: Dynamic Ball Query for Efficient 3D Object Detection | In contrast to prior approaches, we propose a Dynamic Ball Query (DBQ) network to adaptively select a subset of input points according to the input features, and assign the feature transform with a suitable receptive field for each selected point. | Jinrong Yang; Lin Song; Songtao Liu; Weixin Mao; Zeming Li; Xiaoping Li; Hongbin Sun; Jian Sun; Nanning Zheng
6 | Edgeformers: Graph-Empowered Transformers for Representation Learning on Textual-Edge Networks | In this paper, we propose Edgeformers, a framework built upon graph-enhanced Transformers, to perform edge and node representation learning by modeling texts on edges in a contextualized way. | Bowen Jin; Yu Zhang; Yu Meng; Jiawei Han
7 | Protein Representation Learning By Geometric Structure Pretraining | In this paper, we propose to pretrain protein representations according to their 3D structures. | Zuobai Zhang; Minghao Xu; Arian Rokkum Jamasb; Vijil Chenthamarakshan; Aurelie Lozano; Payel Das; Jian Tang
8 | Any-scale Balanced Samplers for Discrete Space | Instead, we propose any-scale balanced samplers to repair the gap in non-local proposals. | Haoran Sun; Bo Dai; Charles Sutton; Dale Schuurmans; Hanjun Dai
9 | Truthful Self-Play | We present a general framework for evolutionary learning to emergent unbiased state representation without any supervision. | Shohei Ohsawa
10 | Rethinking Symbolic Regression: Morphology and Adaptability in The Context of Evolutionary Algorithms | In this paper, we rethink SR from two perspectives: morphology and adaptability. For researchers interested in equation-recovery problems, we also propose a set of conventions that can be used to promote fairness in comparison across SR methods and to reduce unintentional bias. | Kei Sen Fong; Shelvia Wongso; Mehul Motani
11 | Equivariant Shape-Conditioned Generation of 3D Molecules for Ligand-Based Drug Design | We introduce a new multimodal 3D generative model that enables shape-conditioned 3D molecular design by equivariantly encoding molecular shape and variationally encoding chemical identity. | Keir Adams; Connor W. Coley
12 | TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation | In this work, we propose TranSpeech, a speech-to-speech translation model with bilateral perturbation. | Rongjie Huang; Jinglin Liu; Huadai Liu; Yi Ren; Lichao Zhang; Jinzheng He; Zhou Zhao
13 | A Law of Adversarial Risk, Interpolation, and Label Noise | We show that interpolating label noise induces adversarial vulnerability, and prove the first theorem showing the relationship between label noise and adversarial risk for any data distribution. | Daniel Paleka; Amartya Sanyal
14 | Short-Term Memory Convolutions | Although they often have superior quality compared to standard DSP methods, this advantage is diminished by higher latency. In this work we propose a method for minimization of latency and memory consumption, called Short-Term Memory Convolution (STMC), and its transposed counterpart. | Grzegorz Stefański; Krzysztof Arendt; Paweł Daniluk; Bartłomiej Jasik; Artur Szumaczuk
15 | StyleMorph: Disentangling Shape, Pose and Appearance Through 3D Morphable Image and Geometry Generation | We introduce StyleMorph, a 3D generative model that relies on the 3D morphable model paradigm to disentangle shape, pose, object and scene texture for high quality image synthesis. | Eric-Tuan Le; Edward Bartrum; Iasonas Kokkinos
16 | SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models | In this paper, we successfully apply SlotFormer to perform video prediction on datasets with complex object interactions. | Ziyi Wu; Nikita Dvornik; Klaus Greff; Thomas Kipf; Animesh Garg
17 | Deconstructing Distributions: A Pointwise Framework of Learning | In this work, we propose a new approach: we measure the performance of a collection of models when evaluated at a *single input point*. | Gal Kaplun; Nikhil Ghosh; Saurabh Garg; Boaz Barak; Preetum Nakkiran
18 | Trading Information Between Latents in Hierarchical Variational Autoencoders | The proposal of β-VAEs breaks this interpretation and generalizes VAEs to application domains beyond generative modeling (e.g., representation learning, clustering, or lossy data compression) by introducing an objective function that allows practitioners to trade off between the information content ("bit rate") of the latent representation and the distortion of reconstructed data. In this paper, we reconsider this rate/distortion trade-off in the context of hierarchical VAEs, i.e., VAEs with more than one layer of latent variables. | Tim Z. Xiao; Robert Bamler
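The rate/distortion trade-off referenced in the highlight above is the standard β-VAE objective; as a reminder of its usual form (our notation, not necessarily the paper's):

```latex
% beta-VAE objective: distortion (expected reconstruction loss)
% plus a beta-weighted rate term (KL to the prior)
\mathcal{L}_{\beta}(\theta, \phi; x)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\left[-\log p_\theta(x \mid z)\right]}_{\text{distortion}}
  + \beta \, \underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)}_{\text{rate}}
```

Setting β = 1 recovers the ordinary VAE evidence lower bound; larger β trades reconstruction quality for a lower bit rate of the latent code.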
19 | FairGBM: Gradient Boosting with Fairness Constraints | We present FairGBM, a dual ascent learning framework for training GBDT under fairness constraints, with little to no impact on predictive performance when compared to unconstrained GBDT. | André Cruz; Catarina G Belém; João Bravo; Pedro Saleiro; Pedro Bizarro
20 | DySR: Adaptive Super-Resolution Via Algorithm and System Co-design | Nevertheless, no SR model or machine learning system supports adaptive SR, and enabling an adaptive SR model on mobile devices is challenging because adapting the model can cause significant framerate drops or even service interruption. To address this challenge, we take an algorithm and system co-design approach and propose DySR, which maintains QoS while maximizing model performance. | Syed Zawad; Cheng Li; Zhewei Yao; Elton Zheng; Yuxiong He; Feng Yan
21 | A CMDP-within-online Framework for Meta-Safe Reinforcement Learning | In this paper, we study the problem of meta-safe reinforcement learning (meta-SRL) through the CMDP-within-online framework. | Vanshaj Khattar; Yuhao Ding; Bilgehan Sel; Javad Lavaei; Ming Jin
22 | Mastering The Game of No-Press Diplomacy Via Human-Regularized Reinforcement Learning and Planning | We address this shortcoming by first introducing a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy. We prove that this is a no-regret learning algorithm under a modified utility function. | Anton Bakhtin; David J Wu; Adam Lerer; Jonathan Gray; Athul Paul Jacob; Gabriele Farina; Alexander H Miller; Noam Brown
23 | Pruning Deep Neural Networks from A Sparsity Perspective | However, existing approaches lack a quantifiable measure to estimate the compressibility of a sub-network during each pruning iteration and thus may under-prune or over-prune the model. In this work, we propose PQ Index (PQI) to measure the potential compressibility of deep neural networks and use this to develop a Sparsity-informed Adaptive Pruning (SAP) algorithm. | Enmao Diao; Ganghua Wang; Jiawei Zhang; Yuhong Yang; Jie Ding; Vahid Tarokh
24 | Everybody Needs Good Neighbours: An Unsupervised Locality-based Method for Bias Mitigation | To this end, we propose a new meta-algorithm for debiasing representation learning models, which combines the notions of data locality and accuracy of model fit, such that a supervised debiasing method can optimise fairness between neighbourhoods of poorly vs. well modelled instances as identified by our method. | Xudong Han; Timothy Baldwin; Trevor Cohn
25 | Spacetime Representation Learning | In this work, we introduce a general family of representations for directed graphs through connected time-oriented Lorentz manifolds, called spacetimes in general relativity. | Marc T. Law; James Lucas
26 | Quasi-optimal Learning with Continuous Treatments | Hence, it is important to induce a policy class whose support only contains near-optimal actions, and shrink the action-searching area for effectiveness and reliability. To achieve this, we develop a novel *quasi-optimal learning algorithm*, which can be easily optimized in off-policy settings with guaranteed convergence under general function approximations. | Yuhan Li; Wenzhuo Zhou; Ruoqing Zhu
27 | Learning to Extrapolate: A Transductive Approach | In this work, we tackle the problem of developing machine learning systems that retain the power of overparametrized function approximators, while enabling extrapolation to out-of-support testing points when possible. | Aviv Netanyahu; Abhishek Gupta; Max Simchowitz; Kaiqing Zhang; Pulkit Agrawal
28 | Label-free Concept Bottleneck Models | This poor performance creates a barrier to adoption in practical real-world applications. Motivated by these challenges, we propose *Label-free* CBM, a framework to transform any neural network into an interpretable CBM without labeled concept data, while retaining high accuracy. | Tuomas Oikarinen; Subhro Das; Lam M. Nguyen; Tsui-Wei Weng
29 | CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks | In this paper, we propose CLIP-Dissect, a new technique to automatically describe the function of individual hidden neurons inside vision networks. | Tuomas Oikarinen; Tsui-Wei Weng
30 | Predicting Cellular Responses with Variational Causal Inference and Refined Relational Information | Predicting the responses of a cell under perturbations may bring important benefits to drug discovery and personalized therapeutics. In this work, we propose a novel graph variational Bayesian causal inference framework to predict a cell's gene expressions under counterfactual perturbations (perturbations that this cell did not factually receive), leveraging information representing biological knowledge in the form of gene regulatory networks (GRNs) to aid individualized cellular response predictions. | Yulun Wu; Rob Barton; Zichen Wang; Vassilis N. Ioannidis; Carlo De Donno; Layne C Price; Luis F. Voloch; George Karypis
31 | Hard-Meta-Dataset++: Towards Understanding Few-Shot Performance on Difficult Tasks | This phenomenon has real-world consequences for deployed few-shot systems where safety and reliability are paramount, yet little has been done to understand these failure cases. In this paper, we study these difficult tasks to gain a more nuanced understanding of the limitations of current methods. | Samyadeep Basu; Megan Stanley; John F Bronskill; Soheil Feizi; Daniela Massiceti
32 | Data Continuity Matters: Improving Sequence Modeling with Lipschitz Regularizer | However, few attempts have been made to understand the inherent data property of sequence data, neglecting the critical factor that may significantly affect the performance of sequence modeling. In this paper, we theoretically and empirically analyze a generic property of sequence data, i.e., continuity, and connect this property with the performance of deep models. | Eric Qu; Xufang Luo; Dongsheng Li
33 | Symbolic Physics Learner: Discovering Governing Equations Via Monte Carlo Tree Search | Distilling analytical expressions that govern nonlinear dynamics from limited data remains vital but challenging. To tackle this fundamental issue, we propose a novel Symbolic Physics Learner (SPL) machine to discover the mathematical structure of nonlinear dynamics. | Fangzheng Sun; Yang Liu; Jian-Xun Wang; Hao Sun
34 | Neural Implicit Shape Editing Using Boundary Sensitivity | Compared to classic geometry representations, however, neural representations do not allow the user to exert intuitive control over the shape. Motivated by this, we leverage *boundary sensitivity* to express how perturbations in parameters move the shape boundary. | Arturs Berzins; Moritz Ibing; Leif Kobbelt
35 | Understanding The Role of Nonlinearity in Training Dynamics of Contrastive Learning | In this paper, we study the role of nonlinearity in the training dynamics of contrastive learning (CL) on one- and two-layer nonlinear networks with homogeneous activation h(x) = h'(x)x. | Yuandong Tian
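The homogeneity condition h(x) = h'(x)x quoted in the highlight above is satisfied by common piecewise-linear activations. A minimal numerical check (our illustration, not the paper's code):

```python
import numpy as np

# Verify h(x) = h'(x) * x for ReLU and leaky ReLU.
# ReLU: h(x) = max(0, x), h'(x) = 1 for x > 0 else 0, so h'(x)*x = h(x).

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)

def leaky_relu(x, a=0.1):
    return np.where(x > 0, x, a * x)

def leaky_relu_grad(x, a=0.1):
    return np.where(x > 0, 1.0, a)

x = np.linspace(-3.0, 3.0, 101)
assert np.allclose(relu(x), relu_grad(x) * x)
assert np.allclose(leaky_relu(x), leaky_relu_grad(x) * x)
```

Note that a non-homogeneous activation such as sigmoid fails this identity, which is what makes the class studied in the paper a genuine restriction.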
36 | Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning | In addition, we theoretically analyze the gradient flow dynamics to shed light on how data heterogeneity results in dimensional collapse for local models. To remedy this problem caused by the data heterogeneity, we propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning. | Yujun Shi; Jian Liang; Wenqing Zhang; Vincent Tan; Song Bai
37 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis | The prevalence of large language models advances the state-of-the-art for program synthesis, though limited training resources and data impede open access to such models. To democratize this, we train and release a family of large language models up to 16.1B parameters, called CODEGEN, on natural language and programming language data, and open source the training library JAXFORMER. | Erik Nijkamp; Bo Pang; Hiroaki Hayashi; Lifu Tu; Huan Wang; Yingbo Zhou; Silvio Savarese; Caiming Xiong
38 | On The Complexity of Nonsmooth Automatic Differentiation | Using the notion of conservative gradient, we provide a simple model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs. | Jerome Bolte; Ryan Boustany; Edouard Pauwels; Béatrice Pesquet-Popescu
39 | M-L2O: Towards Generalizable Learning-to-Optimize By Test-Time Fast Self-Adaptation | However, such learned optimizers often struggle when new test problems deviate substantially from the training task distribution. This paper investigates a potential solution to this open challenge: meta-training an L2O optimizer that can perform fast test-time self-adaptation to an out-of-distribution task in only a few steps. | Junjie Yang; Xuxi Chen; Tianlong Chen; Zhangyang Wang; Yingbin Liang
40 | Benchmarking Deformable Object Manipulation with Differentiable Physics | In this work, we present DaXBench, a differentiable DOM benchmark with a wide object and task coverage. | Siwei Chen; Cunjun Yu; Yiqing Xu; Linfeng Li; Xiao Ma; Zhongwen Xu; David Hsu
41 | DiffMimic: Efficient Motion Mimicking with Differentiable Physics | In this work, we leverage differentiable physics simulators (DPS) and propose an efficient motion mimicking method dubbed **DiffMimic**. | Jiawei Ren; Cunjun Yu; Siwei Chen; Xiao Ma; Liang Pan; Ziwei Liu
42 | Thalamus: A Brain-inspired Algorithm for Biologically-plausible Continual Learning and Disentangled Representations | Inspired by the brain thalamocortical circuit, we introduce a simple algorithm that uses optimization at inference time to generate internal representations of the current task dynamically. | Ali Hummos
43 | Adversarial Diversity in Hanabi | In this work, we propose a novel approach to diverse policy generation for turn-based Dec-POMDPs with public actions, which relies on off-belief learning to encourage reasonableness and skill, and on "repulsive" fictitious transitions to encourage diversity. | Brandon Cui; Andrei Lupu; Samuel Sokota; Hengyuan Hu; David J Wu; Jakob Nicolaus Foerster
44 | CogVideo: Large-scale Pretraining for Text-to-Video Generation Via Transformers | In this work, we present CogVideo, a 9B-parameter transformer for text-to-video generation. | Wenyi Hong; Ming Ding; Wendi Zheng; Xinghan Liu; Jie Tang
45 | Interpretability in The Wild: A Circuit for Indirect Object Identification in GPT-2 Small | However, most previous work either focuses on simple behaviors in small models, or describes complicated behaviors in larger models with broad strokes. In this work, we bridge this gap by presenting an explanation for how GPT-2 small performs a natural language task that requires logical reasoning: indirect object identification (IOI). | Kevin Ro Wang; Alexandre Variengien; Arthur Conmy; Buck Shlegeris; Jacob Steinhardt
46 | Causal Reasoning in The Presence of Latent Confounders Via Neural ADMG Learning | We first show that the presence of latent confounding is identifiable under the assumptions of bow-free ADMGs with nonlinear additive noise models. With this insight, we propose a novel neural causal model based on autoregressive flows. | Matthew Ashman; Chao Ma; Agrin Hilmkil; Joel Jennings; Cheng Zhang
47 | Offline RL for Natural Language Generation with Implicit Language Q Learning | This issue can be addressed by finetuning such models via supervised learning on curated datasets, or via reinforcement learning. In this work, we propose a novel offline RL method, implicit language Q-learning (ILQL), designed for use on language models, which combines the flexible utility-maximization framework of RL algorithms with the ability of supervised learning to leverage previously collected data, as well as its simplicity and stability. | Charlie Victor Snell; Ilya Kostrikov; Yi Su; Sherry Yang; Sergey Levine
48 | Mid-Vision Feedback for Convolutional Neural Networks | We introduce a novel mechanism which modulates perception in Convolutional Neural Networks (CNNs) based on high-level categorical expectations: Mid-Vision Feedback (MVF). | Michael Maynord; Eadom T Dessalene; Cornelia Fermuller; Yiannis Aloimonos
49 | HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer | In this paper, we delve deep into the comparison between ViT and Swin, revealing that (i) the performance gain of Swin is mainly brought by a deepened backbone and relative positional encoding, (ii) the hierarchical design of Swin can be simplified into hierarchical patch embedding (proposed in this work), and (iii) other designs such as shifted-window attentions can be removed. | Xiaosong Zhang; Yunjie Tian; Lingxi Xie; Wei Huang; Qi Dai; Qixiang Ye; Qi Tian
50 | Generalizing and Decoupling Neural Collapse Via Hyperspherical Uniformity Gap | Inspired by how NC characterizes the training target of neural networks, we decouple NC into two objectives: minimal intra-class variability and maximal inter-class separability. We then introduce the concept of hyperspherical uniformity (which characterizes the degree of uniformity on the unit hypersphere) as a unified framework to quantify these two objectives. | Weiyang Liu; Longhui Yu; Adrian Weller; Bernhard Schölkopf
51 | Score-based Generative 3D Mesh Modeling | Existing scalable methods for generating meshes typically rely on sub-optimal post-processing, and they tend to produce overly-smooth or noisy surfaces without fine-grained geometric details. To overcome these shortcomings, we take advantage of the regular graph structure of meshes and use a simple yet very effective generative modeling method to generate 3D meshes. | Zhen Liu; Yao Feng; Michael J. Black; Derek Nowrouzezahrai; Liam Paull; Weiyang Liu
52 | ISAAC Newton: Input-based Approximate Curvature for Newton's Method | We present ISAAC (Input-baSed ApproximAte Curvature), a novel method that conditions the gradient using selected second-order information and has an asymptotically vanishing computational overhead, assuming a batch size smaller than the number of neurons. | Felix Petersen; Tobias Sutter; Christian Borgelt; Dongsung Huh; Hilde Kuehne; Yuekai Sun; Oliver Deussen
53 | Language Models Can Teach Themselves to Program Better | We show that it is possible for an LM to synthesize programming problems and solutions, which are filtered for correctness by a Python interpreter. | Patrick Haluptzok; Matthew Bowers; Adam Tauman Kalai
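The interpreter-based filtering described in the highlight above can be sketched generically. This is our illustration, not the paper's code: `candidates` stands in for language-model samples, and the wrong/right solutions are hypothetical examples.

```python
# Keep a candidate solution only if executing it together with the
# problem's test raises no exception (i.e. the interpreter verifies it).

def passes(problem_test: str, solution: str) -> bool:
    env = {}
    try:
        exec(solution, env)       # define the candidate function
        exec(problem_test, env)   # run the test, e.g. an assert statement
        return True
    except Exception:
        return False

problem_test = "assert add(2, 3) == 5"
candidates = [
    "def add(a, b):\n    return a - b",   # wrong: filtered out
    "def add(a, b):\n    return a + b",   # correct: kept
]

verified = [c for c in candidates if passes(problem_test, c)]
print(len(verified))  # prints 1
```

In practice one would sandbox the `exec` calls and add timeouts, since model-generated code is untrusted; the snippet omits that for brevity.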
54 | Latent Bottlenecked Attentive Neural Processes | Conversely, existing sub-quadratic NP variants perform significantly worse than TNPs. Tackling this issue, we propose Latent Bottlenecked Attentive Neural Processes (LBANPs), a new computationally efficient sub-quadratic NP variant with a querying computational complexity independent of the number of context datapoints. | Leo Feng; Hossein Hajimirsadeghi; Yoshua Bengio; Mohamed Osama Ahmed
55 | Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency | To this end, we propose a reinforcement learning algorithm named Embed to Control (ETC), which learns the representation at two levels while optimizing the policy. | Lingxiao Wang; Qi Cai; Zhuoran Yang; Zhaoran Wang
56 | Learning Kernelized Contextual Bandits in A Distributed and Asynchronous Environment | In this paper, in order to improve the robustness against delays and unavailability of clients that are common in practice, we propose the first asynchronous solution based on approximated kernel regression for distributed kernel bandit learning. | Chuanhao Li; Huazheng Wang; Mengdi Wang; Hongning Wang
57 | GReTo: Remedying Dynamic Graph Topology-task Discordance Via Target Homophily | In this work, we revisit node-wise relationships and explore novel homophily measurements on dynamic graphs with both signs and distances, capturing multiple node-level spatial relations and temporal evolutions. | Zhengyang Zhou; Qihe Huang; Gengyu Lin; Kuo Yang; Lei Bai; Yang Wang
58 | DocPrompting: Generating Code By Retrieving The Docs | In contrast, when human programmers use functions and libraries for the first time, they frequently refer to textual resources such as code manuals and documentation, to explore and understand the available functionality. Inspired by this observation, we introduce DocPrompting: a natural-language-to-code generation approach that explicitly leverages documentation by (1) retrieving the relevant documentation pieces given an NL intent, and (2) generating code based on the NL intent and the retrieved documentation. | Shuyan Zhou; Uri Alon; Frank F. Xu; Zhengbao Jiang; Graham Neubig
59 | SWIFT: Rapid Decentralized Federated Learning Via Wait-Free Model Communication | In this work, we propose SWIFT: a novel wait-free decentralized FL algorithm that allows clients to conduct training at their own speed. | Marco Bornstein; Tahseen Rabbani; Evan Z Wang; Amrit Bedi; Furong Huang
60 | RoPAWS: Robust Semi-supervised Representation Learning from Uncurated Data | We propose RoPAWS, a robust extension of PAWS that can work with real-world unlabeled data. | Sangwoo Mo; Jong-Chyi Su; Chih-Yao Ma; Mido Assran; Ishan Misra; Licheng Yu; Sean Bell
61 | Interpretable Geometric Deep Learning Via Learnable Randomness Injection | This work proposes a general mechanism based on learnable randomness injection (LRI) that allows building inherently interpretable models with general GDL backbones. We also propose four scientific datasets in the domains of high-energy physics and biochemistry to evaluate LRI. | Siqi Miao; Yunan Luo; Mia Liu; Pan Li
62 | Machine Unlearning of Federated Clusters | This work proposes the first known unlearning mechanism for federated clustering with privacy criteria that support simple, provable, and efficient data removal at the client and server level. | Chao Pan; Jin Sima; Saurav Prakash; Vishal Rana; Olgica Milenkovic
63 | PerFedMask: Personalized Federated Learning with Optimized Masking Vectors | To this end, we propose a personalized FL algorithm with optimized masking vectors called PerFedMask. | Mehdi Setayesh; Xiaoxiao Li; Vincent W.S. Wong
64 | A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics | Inspired by humans' remarkable ability to master arithmetic and generalize to unseen problems, we present a new dataset, HINT, to study machines' capability of learning generalizable concepts at three levels: perception, syntax, and semantics. | Qing Li; Siyuan Huang; Yining Hong; Yixin Zhu; Ying Nian Wu; Song-Chun Zhu
65 | Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we derive an elegant and simple methodology called conservative Bayesian model-based value expansion for offline policy optimization (CBOP), that trades off model-free and model-based estimates during the policy evaluation step according to their epistemic uncertainties, and facilitates conservatism by taking a lower bound on the Bayesian posterior value estimate. |
Jihwan Jeong; Xiaoyu Wang; Michael Gimelfarb; Hyunwoo Kim; Baher abdulhai; Scott Sanner; |
66 | SAM As An Optimal Relaxation of Bayes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we establish SAM as a relaxation of the Bayes objective where the expected negative-loss is replaced by the optimal convex lower bound, obtained by using the so-called Fenchel biconjugate. |
Thomas Möllenhoff; Mohammad Emtiyaz Khan; |
67 | Masked Vision and Language Modeling for Multi-modal Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study how to use masked signal modeling in vision and language (V+L) representation learning. |
Gukyeong Kwon; Zhaowei Cai; Avinash Ravichandran; Erhan Bas; Rahul Bhotika; Stefano Soatto; |
68 | Extreme Q-Learning: MaxEnt RL Without Entropy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a new update rule for online and offline RL which directly models the maximal value using Extreme Value Theory (EVT) inspired by Economics. |
Divyansh Garg; Joey Hejna; Matthieu Geist; Stefano Ermon; |
69 | Direct Embedding of Temporal Network Edges Via Time-Decayed Line Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: First, time is assumed to be discretized, so if the time data is continuous, the user must determine the discretization and discard precise time information. Second, edge representations can only be calculated indirectly from the nodes, which may be suboptimal for tasks like edge classification. We present a simple method that avoids both shortcomings: construct the line graph of the network, which includes a node for each interaction, and weigh the edges of this graph based on the difference in time between interactions. |
Sudhanshu Chanpuriya; Ryan A. Rossi; Sungchul Kim; Tong Yu; Jane Hoffswell; Nedim Lipka; Shunan Guo; Cameron N Musco; |
70 | Scaling Forward Gradient With Local Losses Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to scale forward gradient by adding a large number of local greedy loss functions. |
Mengye Ren; Simon Kornblith; Renjie Liao; Geoffrey Hinton; |
71 | Latent Variable Representation for Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a representation view of the latent variable models for state-action value functions, which allows both tractable variational learning algorithm and effective implementation of the optimism/pessimism principle in the face of uncertainty for exploration. |
Tongzheng Ren; Chenjun Xiao; Tianjun Zhang; Na Li; Zhaoran Wang; sujay sanghavi; Dale Schuurmans; Bo Dai; |
72 | Learning in Temporally Structured Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper advances a multiscale learning model in which each weight in a neural network is decomposed into a sum of subweights learning independently with different learning and decay rates. |
Matt Jones; Tyler R. Scott; Mengye Ren; Gamaleldin Fathy Elsayed; Katherine Hermann; David Mayo; Michael Curtis Mozer; |
73 | Learning QUBO Forms in Quantum Annealing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, such explicit formulations impose tangible constraints on solution encodings. In stark contrast to prior work, this paper proposes to learn QUBO forms from data through gradient backpropagation instead of deriving them. |
Marcel Seelbach Benkner; Maximilian Krahn; Edith Tretschk; Zorah Lähner; Michael Moeller; Vladislav Golyanik; |
74 | The Generalized Eigenvalue Problem As A Nash Equilibrium Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a game-theoretic formulation of the top-$k$ SGEP whose Nash equilibrium is the set of generalized eigenvectors. |
Ian Gemp; Charlie Chen; Brian McWilliams; |
75 | $O(T^{-1})$ Convergence of Optimistic-Follow-the-Regularized-Leader in Two-Player Zero-Sum Markov Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove that the optimistic-follow-the-regularized-leader (OFTRL) algorithm, together with smooth value updates, finds an $O(T^{-1})$ approximate Nash equilibrium in $T$ iterations for two-player zero-sum Markov games with full information. |
Yuepeng Yang; Cong Ma; |
76 | Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds Using Deep Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the off-policy evaluation problem of reinforcement learning using deep convolutional neural networks. |
Xiang Ji; Minshuo Chen; Mengdi Wang; Tuo Zhao; |
77 | Critic Sequential Monte Carlo Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce CriticSMC, a new algorithm for planning as inference built from a composition of sequential Monte Carlo with learned Soft-Q function heuristic factors. |
Vasileios Lioutas; Jonathan Wilder Lavington; Justice Sefas; Matthew Niedoba; Yunpeng Liu; Berend Zwartsenberg; Setareh Dabiri; Frank Wood; Adam Scibior; |
78 | Basic Binary Convolution Unit for Binarized Image Restoration Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we reconsider components in binary convolution, such as residual connection, BatchNorm, activation function, and structure, for IR tasks. |
Bin Xia; Yulun Zhang; Yitong Wang; Yapeng Tian; Wenming Yang; Radu Timofte; Luc Van Gool; |
79 | Knowledge Distillation Based Degradation Estimation for Blind Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Knowledge Distillation based Blind-SR network (KDSR). |
Bin Xia; Yulun Zhang; Yitong Wang; Yapeng Tian; Wenming Yang; Radu Timofte; Luc Van Gool; |
80 | Spectral Decomposition Representation for Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current spectral methods suffer from limited applicability because they are constructed for state-only aggregation and are derived from a policy-dependent transition kernel, without considering the issue of exploration. To address these issues, we propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy, while also balancing the exploration-versus-exploitation trade-off during learning. |
Tongzheng Ren; Tianjun Zhang; Lisa Lee; Joseph E. Gonzalez; Dale Schuurmans; Bo Dai; |
81 | Fake It Until You Make It: Towards Accurate Near-Distribution Novelty Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We first demonstrate that existing methods can experience up to a 20% decrease in their AUCs in the near-distribution setting. Next, we propose to exploit a score-based generative model to produce synthetic near-distribution anomalous data. Our model is then fine-tuned to distinguish such data from the normal samples. We make quantitative as well as qualitative evaluations of this strategy, and compare the results with a variety of GAN-based models. |
Hossein Mirzaei; Mohammadreza Salehi; Sajjad Shahabi; Efstratios Gavves; Cees G. M. Snoek; Mohammad Sabokrou; Mohammad Hossein Rohban; |
82 | Contextual Image Masking Modeling Via Synergized Contrasting Without View Augmentation for Faster and Better Visual Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new contextual masking image modeling (MIM) approach called contrasting-aided contextual MIM (ccMIM), under the MIM paradigm for visual pretraining. |
Shaofeng Zhang; Feng Zhu; Rui Zhao; Junchi Yan; |
83 | Patch-Level Contrasting Without Patch Correspondence for Accurate and Dense Contrastive Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose ADCLR: Accurate and Dense Contrastive Representation Learning, a novel self-supervised learning framework for learning accurate and dense vision representation. |
Shaofeng Zhang; Feng Zhu; Rui Zhao; Junchi Yan; |
84 | A Learning Based Hypothesis Test for Harmful Covariate Shift Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we define harmful covariate shift (HCS) as a change in distribution that may weaken the generalization of a predictive model. |
Tom Ginsberg; Zhongyuan Liang; Rahul G Krishnan; |
85 | Backpropagation at The Infinitesimal Inference Limit of Energy-Based Models: Unifying Predictive Coding, Equilibrium Propagation, and Contrastive Hebbian Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we provide a comprehensive theory of the conditions under which EBMs can approximate BP, which lets us unify many of the BP approximation results in the literature (namely, predictive coding, equilibrium propagation, and contrastive Hebbian learning) and demonstrate that their approximation to BP arises from a simple and general mathematical property of EBMs at free-phase equilibrium. |
Beren Millidge; Yuhang Song; Tommaso Salvatori; Thomas Lukasiewicz; Rafal Bogacz; |
86 | Re-Imagen: Retrieval-Augmented Text-to-Image Generator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Though state-of-the-art models can generate high-quality images of common entities, they often have difficulty generating images of uncommon entities, such as ‘Chortai (dog)’ or ‘Picarones (food)’. To tackle this issue, we present the Retrieval-Augmented Text-to-Image Generator (Re-Imagen), a generative model that uses retrieved information to produce high-fidelity and faithful images, even for rare or unseen entities. |
Wenhu Chen; Hexiang Hu; Chitwan Saharia; William W. Cohen; |
87 | Task-customized Masked Autoencoder Via Mixture of Cluster-conditional Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, when the various downstream tasks have data distributions different from the pre-training data, the semantically irrelevant pre-training information might result in negative transfer, impeding MAE’s scalability. To address this issue, we propose a novel MAE based pre-training paradigm, named Mixture of Cluster-conditional Experts (MoCE), which can be trained once but provide customized pre-training models for diverse downstream tasks. |
Zhili LIU; Kai Chen; Jianhua Han; Lanqing HONG; Hang Xu; Zhenguo Li; James Kwok; |
88 | A Theoretical Framework for Inference and Learning in Predictive Coding Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide a comprehensive theoretical analysis of the properties of PCNs trained with prospective configuration. |
Beren Millidge; Yuhang Song; Tommaso Salvatori; Thomas Lukasiewicz; Rafal Bogacz; |
89 | Learning to Grow Pretrained Models for Efficient Transformer Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes an approach for accelerating transformer training by learning to grow pretrained transformers, where we learn to linearly map the parameters of the smaller model to initialize the larger model. |
Peihao Wang; Rameswar Panda; Lucas Torroba Hennigen; Philip Greengard; Leonid Karlinsky; Rogerio Feris; David Daniel Cox; Zhangyang Wang; Yoon Kim; |
90 | A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Curiously, while these approaches all claim to address the same eventual goal of detecting failures of a classifier upon real-life application, they currently constitute largely separated research fields with individual evaluation protocols, which either exclude a substantial part of relevant methods or ignore large parts of relevant failure sources. In this work, we systematically reveal current pitfalls caused by these inconsistencies and derive requirements for a holistic and realistic evaluation of failure detection. |
Paul F Jaeger; Carsten Tim Lüth; Lukas Klein; Till J. Bungert; |
91 | Generating Intuitive Fairness Specifications for Natural Language Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing research has started to address this gap, current methods are based on hardcoded word replacements, resulting in specifications with limited expressivity or ones that fail to fully align with human intuition (e.g., in cases of asymmetric counterfactuals). This work proposes novel methods for bridging this gap by discovering expressive and intuitive individual fairness specifications. |
Florian E. Dorner; Momchil Peychev; Nikola Konstantinov; Naman Goel; Elliott Ash; Martin Vechev; |
92 | PiFold: Toward Effective and Efficient Protein Inverse Folding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Structure-based protein design has attracted increasing attention in recent years; however, few methods can simultaneously improve the accuracy and efficiency due to the lack of expressive features and an autoregressive sequence decoder. To address these issues, we propose PiFold, which contains a novel residue featurizer and PiGNN layers to generate protein sequences in a one-shot way with improved recovery. |
Zhangyang Gao; Cheng Tan; Stan Z. Li; |
93 | Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give generalization bounds for downstream linear prediction using our kernel PCA representation, and show empirically on a set of synthetic tasks that applying kernel PCA to contrastive learning models can indeed approximately recover the Markov chain eigenfunctions, although the accuracy depends on the kernel parameterization as well as on the augmentation strength. |
Daniel D. Johnson; Ayoub El Hanchi; Chris J. Maddison; |
94 | Provably Auditing Ordinary Least Squares in Low Dimensions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop provable and efficient algorithms for estimating stability of OLS to dropping samples in the low-dimensional regime. |
Ankur Moitra; Dhruv Rohatgi; |
95 | Learning Sparse Group Models Through Boolean Relaxation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an efficient algorithmic framework for learning sparse group models formulated as the natural convex relaxation of a cardinality-constrained program with Boolean variables. |
Yijie Wang; Yuan Zhou; Xiaoqing Huang; Kun Huang; Jie Zhang; Jianzhu Ma; |
96 | QAID: Question Answering Inspired Few-shot Intent Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our method achieves SOTA results on few-shot intent detection by combining a Question-Answering architecture, Contrastive Learning techniques, and the use of the intent name as the answer. |
Asaf Yehudai; Matan Vetzler; Yosi Mass; Koren Lazar; Doron Cohen; Boaz Carmeli; |
97 | Out-of-distribution Representation Learning for Time Series Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to view time series classification from the distribution perspective. |
Wang Lu; Jindong Wang; Xinwei Sun; Yiqiang Chen; Xing Xie; |
98 | Neural DAG Scheduling Via One-Shot Priority Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of scheduling operations/nodes, the dependency among which is characterized by a Directed Acyclic Graph (DAG). |
Wonseok Jeon; Mukul Gagrani; Burak Bartan; Weiliang Will Zeng; Harris Teague; Piero Zappi; Christopher Lott; |
99 | Efficiently Computing Nash Equilibria in Adversarial Team Markov Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we depart from those prior results by investigating infinite-horizon adversarial team Markov games, a natural and well-motivated class of games in which a team of identically-interested players—in the absence of any explicit coordination or communication—is competing against an adversarial player. |
Fivos Kalogiannis; Ioannis Anagnostides; Ioannis Panageas; Emmanouil-Vasileios Vlatakis-Gkaragkounis; Vaggos Chatziafratis; Stelios Andrew Stavroulakis; |
100 | Graph Neural Network-Inspired Kernels for Gaussian Processes in Semi-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Their competitive performance is often attributed to a proper capturing of the graph inductive bias. In this work, we introduce this inductive bias into GPs to improve their predictive performance for graph-structured data. |
Zehao Niu; Mihai Anitescu; Jie Chen; |
101 | SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a new method squeeze-enhanced Axial Transformer (SeaFormer) for mobile semantic segmentation. |
Qiang Wan; Jiachen Lu; Zilong Huang; Gang YU; Li Zhang; |
102 | Differentiable Gaussianization Layers for Inverse Problems Regularized By Deep Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In such cases, deep generative models are ineffective in attaining high-fidelity solutions. To address this issue, we propose to reparameterize and Gaussianize the latent tensors using novel differentiable data-dependent layers wherein custom operators are defined by solving optimization problems. |
Dongzhuo Li; |
103 | Approximate Vanishing Ideal Computations at Scale Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we scale up the Oracle Approximate Vanishing Ideal algorithm (OAVI), the only generator-constructing algorithm with known learning guarantees. |
Elias Samuel Wirth; Hiroshi Kera; Sebastian Pokutta; |
104 | SoftMatch: Addressing The Quantity-Quality Tradeoff in Semi-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training, effectively exploiting the unlabeled data. |
Hao Chen; Ran Tao; Yue Fan; Yidong Wang; Marios Savvides; Jindong Wang; Bhiksha Raj; Xing Xie; Bernt Schiele; |
105 | Learning Uncertainty for Unknown Domains with Zero-Target-Assumption Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce our Maximum-Entropy Rewarded Reinforcement Learning (MERRL) framework that selects training data for more accurate Natural Language Processing (NLP). |
Yu Yu; Hassan Sajjad; Jia Xu; |
106 | Scalable and Equivariant Spherical CNNs By Discrete-Continuous (DISCO) Convolutions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a hybrid discrete-continuous (DISCO) group convolution that is simultaneously equivariant and computationally scalable to high-resolution. |
Jeremy Ocampo; Matthew Alexander Price; Jason McEwen; |
107 | FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on the analysis, we hence propose FreeMatch to define and adjust the confidence threshold in a self-adaptive manner according to the model’s learning status. |
Yidong Wang; Hao Chen; Qiang Heng; Wenxin Hou; Yue Fan; Zhen Wu; Jindong Wang; Marios Savvides; Takahiro Shinozaki; Bhiksha Raj; Bernt Schiele; Xing Xie; |
108 | Can Discrete Information Extraction Prompts Generalize Across Language Models? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a way to induce prompts by mixing language models at training time that results in prompts that generalize well across models. |
Nathanaël Carraz Rakotonirina; Roberto Dessi; Fabio Petroni; Sebastian Riedel; Marco Baroni; |
109 | Disentangling The Mechanisms Behind Implicit Regularization in SGD Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we conduct an extensive empirical evaluation, focusing on the ability of various theorized mechanisms to close the small-to-large batch generalization gap. |
Zachary Novack; Simran Kaur; Tanya Marwah; Saurabh Garg; Zachary Chase Lipton; |
110 | Transformer-based World Models Are Happy With 100k Interactions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To build a sample-efficient world model, we apply a transformer to real-world episodes in an autoregressive manner: not only the compact latent states and the taken actions but also the experienced or predicted rewards are fed into the transformer, so that it can attend flexibly to all three modalities at different time steps. |
Jan Robine; Marc Höftmann; Tobias Uelwer; Stefan Harmeling; |
111 | Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we take a different route—we explicitly enhance input-output connections by maximizing their mutual information. |
Ye Zhu; Yu Wu; Kyle Olszewski; Jian Ren; Sergey Tulyakov; Yan Yan; |
112 | Confidential-PROFITT: Confidential PROof of FaIr Training of Trees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a method to provide a confidential proof of fairness for training, in the context of widely used decision trees, which we term Confidential-PROFITT. |
Ali Shahin Shamsabadi; Sierra Calanda Wyllie; Nicholas Franzese; Natalie Dullerud; Sébastien Gambs; Nicolas Papernot; Xiao Wang; Adrian Weller; |
113 | DCI-ES: An Extended Disentanglement Framework with Connections to Identifiability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our main idea is that the functional capacity required to use a representation is an important but thus-far neglected aspect of representation quality, which we quantify using explicitness or ease-of-use (E). |
Cian Eastwood; Andrei Liviu Nicolicioiu; Julius Von Kügelgen; Armin Kekić; Frederik Träuble; Andrea Dittadi; Bernhard Schölkopf; |
114 | Bort: Towards Explainable Neural Networks with Bounded Orthogonal Constraint Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods rely on intuitive assumptions and lack mathematical guarantees. To bridge this gap, we introduce Bort, an optimizer for improving model explainability with boundedness and orthogonality constraints on model parameters, derived from the sufficient conditions of model comprehensibility and transparency. |
Borui Zhang; Wenzhao Zheng; Jie Zhou; Jiwen Lu; |
115 | Faster Federated Optimization Under Second-order Similarity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider finite-sum federated optimization under a second-order function similarity condition and strong convexity, and propose two new algorithms: SVRP and Catalyzed SVRP. |
Ahmed Khaled; Chi Jin; |
116 | The Augmented Image Prior: Distilling 1000 Classes By Extrapolating from A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While any image obviously cannot contain the multitudes of all existing objects, scenes and lighting conditions — within the space of all $256^{3\cdot224\cdot224}$ possible $224$-sized square images, it might still provide a strong prior for natural images. To analyze this “augmented image prior” hypothesis, we develop a simple framework for training neural networks from scratch using a single image and augmentations using knowledge distillation from a supervised pretrained teacher. |
Yuki M Asano; Aaqib Saeed; |
117 | Self-Supervised Category-Level Articulated Object Pose Estimation with Part-Level SE(3) Equivariance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our key idea is to factorize canonical shapes and articulated object poses from input articulated shapes through part-level equivariant shape analysis. |
Xueyi Liu; Ji Zhang; Ruizhen Hu; Haibin Huang; He Wang; Li Yi; |
118 | Schema Inference for Interpretable Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study a novel inference paradigm, termed as schema inference, that learns to deductively infer the explainable predictions by rebuilding the prior deep neural network (DNN) forwarding scheme, guided by the prevalent philosophical cognitive concept of schema. |
Haofei Zhang; Xiaokang Liu; Mengqi Xue; Kaixuan Chen; Jie Song; Mingli Song; |
119 | Autoencoders As Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we revisit masked modeling in a unified fashion of knowledge distillation, and we show that foundational Transformers pretrained with 2D images or natural languages can help self-supervised 3D representation learning through training Autoencoders as Cross-Modal Teachers (ACT). |
Runpei Dong; Zekun Qi; Linfeng Zhang; Junbo Zhang; Jianjian Sun; Zheng Ge; Li Yi; Kaisheng Ma; |
120 | Partially Observable RL with B-Stability: Unified Structural Condition and Sharp Sample-Efficient Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this line of research is still in its infancy, where (1) unified structural conditions enabling sample-efficient learning are lacking; (2) existing sample complexities for known tractable subclasses are far from sharp; and (3) fewer sample-efficient algorithms are available than in fully observable RL. This paper advances all three aspects above for Partially Observable RL in the general setting of Predictive State Representations (PSRs). |
Fan Chen; Yu Bai; Song Mei; |
121 | Towards Lightweight, Model-Agnostic and Diversity-Aware Active Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, most existing AAD approaches are specially tailored for a certain unsupervised detector, making it difficult to extend to other detection models. To tackle these problems, we propose a lightweight, model-agnostic and diversity-aware AAD method, named LMADA. |
Xu Zhang; Yuan Zhao; Ziang Cui; Liqun Li; Shilin He; Qingwei Lin; Yingnong Dang; Saravan Rajmohan; Dongmei Zhang; |
122 | Complexity-Based Prompting for Multi-step Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning. |
Yao Fu; Hao Peng; Ashish Sabharwal; Peter Clark; Tushar Khot; |
123 | Gromov-Wasserstein Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel representation learning method, Gromov-Wasserstein Autoencoders (GWAE), which directly matches the latent and data distributions using the variational autoencoding scheme. |
Nao Nakagawa; Ren Togo; Takahiro Ogawa; Miki Haseyama; |
124 | Moving Forward By Moving Backward: Embedding Action Impact Over Action Semantics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead of assuming that the impact of an action stably reflects its pre-defined semantic meaning, we propose to model the impact of actions on-the-fly using latent embeddings. |
Kuo-Hao Zeng; Luca Weihs; Roozbeh Mottaghi; Ali Farhadi; |
125 | Decomposed Prompting: A Modular Approach for Solving Complex Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn, especially when embedded in more complex tasks. To address this, we propose Decomposed Prompting, a new approach to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks that can be delegated to a library of prompting-based LLMs dedicated to these sub-tasks. |
Tushar Khot; Harsh Trivedi; Matthew Finlayson; Yao Fu; Kyle Richardson; Peter Clark; Ashish Sabharwal; |
126 | UNICORN: A Unified Backdoor Trigger Inversion Framework Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work formally defines and analyzes the trigger and the inversion problem. |
Zhenting Wang; Kai Mei; Juan Zhai; Shiqing Ma; |
127 | How Gradient Estimator Variance and Bias Impact Learning in Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we show that variance and bias can impair learning on the training data, but some degree of variance and bias in a gradient estimator can be beneficial for generalization. |
Arna Ghosh; Yuhan Helena Liu; Guillaume Lajoie; Konrad Kording; Blake Aaron Richards; |
128 | Sampling Is As Easy As Learning The Score: Theory for Diffusion Models with Minimal Data Assumptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide theoretical convergence guarantees for score-based generative models (SGMs) such as denoising diffusion probabilistic models (DDPMs), which constitute the backbone of large-scale real-world generative models such as DALL·E 2. |
Sitan Chen; Sinho Chewi; Jerry Li; Yuanzhi Li; Adil Salim; Anru Zhang; |
129 | Post-hoc Concept Bottleneck Models Highlight: Moreover, CBMs often do not match the accuracy of an unrestricted neural network, reducing the incentive to deploy them in practice. In this work, we address these limitations of CBMs by introducing Post-hoc Concept Bottleneck models (PCBMs). |
Mert Yuksekgonul; Maggie Wang; James Zou; |
130 | Is A Caption Worth A Thousand Images? A Study on Representation Learning Highlight: Motivated by our findings, we devise simple data and algorithmic interventions to improve the transfer performance of CLIP-style models. |
Shibani Santurkar; Yann Dubois; Rohan Taori; Percy Liang; Tatsunori Hashimoto; |
131 | Continual Post-Training of Language Models Highlight: Existing research has shown that post-training or adapting an LM using an unlabeled topical/domain corpus can improve the end-task performance in the domain. This paper proposes a novel method to continually post-train an LM with a sequence of unlabeled domain corpora to adapt the LM to these domains to improve their end-task performances. |
Zixuan Ke; Haowei Lin; Yijia Shao; Tatsuya Konishi; Gyuhak Kim; Bing Liu; |
132 | Learning to Generate Columns with Application to Vertex Coloring Highlight: We present a new column generation approach based on Machine Learning (ML) for solving combinatorial optimization problems. |
Yuan Sun; Andreas T Ernst; Xiaodong Li; Jake Weiner; |
133 | Constraining Representations Yields Models That Know What They Don’t Know Highlight: Such unsafe behaviour is particularly frequent when the use case slightly differs from the training context, and/or in the presence of an adversary. This work presents a novel direction to address these issues in a broad, general manner: imposing class-aware constraints on a model’s internal activation patterns. |
Joao Monteiro; Pau Rodriguez; Pierre-Andre Noel; Issam H. Laradji; David Vazquez; |
134 | Temporal Domain Generalization with Drift-Aware Dynamic Neural Networks Highlight: To address them, we propose a Temporal Domain Generalization with Drift-Aware Dynamic Neural Network (DRAIN) framework. |
Guangji Bai; Chen Ling; Liang Zhao; |
135 | Causal Estimation for Text Data with (Apparent) Overlap Violations Highlight: The purpose of this paper is to show how to handle causal identification and obtain robust causal estimation in the presence of apparent overlap violations. |
Lin Gui; Victor Veitch; |
136 | A Simple Approach for Visual Room Rearrangement: 3D Mapping and Semantic Search Highlight: Visual room rearrangement evaluates an agent’s ability to rearrange objects in a room to a desired goal based solely on visual input. We propose a simple yet effective method for this problem: (1) search for and map which objects need to be rearranged, and (2) rearrange each object until the task is complete. |
Brandon Trabucco; Gunnar A Sigurdsson; Robinson Piramuthu; Gaurav S. Sukhatme; Ruslan Salakhutdinov; |
137 | Improved Training of Physics-Informed Neural Networks Using Energy-Based Priors: A Study on Electrical Impedance Tomography Highlight: In this work, we propose a Bayesian approach through a data-driven energy-based model (EBM) as a prior, to improve the overall accuracy and quality of tomographic reconstruction. |
Akarsh Pokkunuru; Pedram Rooshenas; Thilo Strauss; Anuj Abhishek; Taufiquar Khan; |
138 | ESD: Expected Squared Difference As A Tuning-Free Trainable Calibration Measure Highlight: We propose a tuning-free, trainable calibration objective loss, Expected Squared Difference (ESD), where we view the calibration error from the perspective of the squared difference between two expectations. |
Hee Suk Yoon; Joshua Tian Jin Tee; Eunseop Yoon; Sunjae Yoon; Gwangsu Kim; Yingzhen Li; Chang D. Yoo; |
139 | An Extensible Multi-modal Multi-task Object Dataset with Materials Highlight: We present EMMa, an Extensible, Multimodal dataset of Amazon product listings that contains rich Material annotations. |
Trevor Scott Standley; Ruohan Gao; Dawn Chen; Jiajun Wu; Silvio Savarese; |
140 | Does Zero-Shot Reinforcement Learning Exist? Highlight: Strategies for approximate zero-shot RL have been suggested using successor features (SFs) (Borsa et al., 2018) or forward-backward (FB) representations (Touati & Ollivier, 2021), but testing has been limited. After clarifying the relationships between these schemes, we introduce improved losses and new SF models, and test the viability of zero-shot RL schemes systematically on tasks from the Unsupervised RL benchmark (Laskin et al., 2021). |
Ahmed Touati; Jérémy Rapin; Yann Ollivier; |
141 | Self-Stabilization: The Implicit Bias of Gradient Descent at The Edge of Stability Highlight: We demonstrate that, far from being chaotic, the dynamics of gradient descent at the edge of stability can be captured by a cubic Taylor expansion: as the iterates diverge in the direction of the top eigenvector of the Hessian due to instability, the cubic term in the local Taylor expansion of the loss function causes the curvature to decrease until stability is restored. |
Alex Damian; Eshaan Nichani; Jason D. Lee; |
142 | Interactive Portrait Harmonization Highlight: To enable flexible interaction between user and harmonization, we introduce interactive harmonization, a new setting where the harmonization is performed with respect to a selected region in the reference image instead of the entire background. Furthermore, we also introduce a new dataset carefully curated for validating portrait harmonization. |
Jeya Maria Jose Valanarasu; HE Zhang; Jianming Zhang; Yilin Wang; Zhe Lin; Jose Echevarria; Yinglan Ma; Zijun Wei; Kalyan Sunkavalli; Vishal Patel; |
143 | STREET: A Multi-Task Structured Reasoning and Explanation Benchmark Highlight: We introduce STREET, a unified multi-task and multi-domain natural language reasoning and explanation benchmark. |
Danilo Neves Ribeiro; Shen Wang; Xiaofei Ma; Henghui Zhu; Rui Dong; Deguang Kong; Juliette Burger; Anjelica Ramos; Zhiheng Huang; William Yang Wang; George Karypis; Bing Xiang; Dan Roth; |
144 | Hierarchical Sliced Wasserstein Distance Highlight: Therefore, for applications where the number of supports is relatively small compared with the dimension, e.g., several deep learning applications where mini-batch approaches are utilized, the complexities from the matrix multiplication of the Radon Transform become the main computational bottleneck. To address this issue, we propose to derive projections by linearly and randomly combining a smaller number of projections, which are named bottleneck projections. |
Khai Nguyen; Tongzheng Ren; Huy Nguyen; Litu Rout; Tan Minh Nguyen; Nhat Ho; |
145 | Restricted Strong Convexity of Deep Learning Models with Smooth Activations Highlight: We consider the problem of optimization of deep learning models with smooth activation functions. |
Arindam Banerjee; Pedro Cisneros; Libin Zhu; Misha Belkin; |
146 | Koopman Neural Operator Forecaster for Time-series with Temporal Distributional Shifts Highlight: In this paper, we propose a novel deep sequence model based on Koopman theory for time series forecasting: the Koopman Neural Forecaster (KNF), which leverages DNNs to learn the linear Koopman space and the coefficients of chosen measurement functions. |
Rui Wang; Yihe Dong; Sercan O Arik; Rose Yu; |
147 | Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective Highlight: In this work, we propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent. |
Raj Ghugare; Homanga Bharadhwaj; Benjamin Eysenbach; Sergey Levine; Russ Salakhutdinov; |
148 | Minimum Description Length Control Highlight: We propose a novel framework for multitask reinforcement learning based on the minimum description length (MDL) principle. |
Ted Moskovitz; Ta-Chu Kao; Maneesh Sahani; Matthew Botvinick; |
149 | Decoupled Training for Long-Tailed Classification With Stochastic Representations Highlight: We propose a novel classifier re-training algorithm for long-tailed classification. |
Giung Nam; Sunguk Jang; Juho Lee; |
150 | Where to Begin? Exploring The Impact of Pre-Training and Initialization in Federated Learning Highlight: We empirically study the impact of starting from a pre-trained model in federated learning using four standard federated learning benchmark datasets. Unsurprisingly, starting from a pre-trained model reduces the training time required to reach a target error rate and enables the training of more accurate models (up to 40%) than is possible when starting from random initialization. Surprisingly, we also find that starting federated learning from a pre-trained initialization reduces the effect of both data and system heterogeneity. |
John Nguyen; Jianyu Wang; Kshitiz Malik; Maziar Sanjabi; Michael Rabbat; |
151 | Martingale Posterior Neural Processes Highlight: In this work, we take a different approach based on the martingale posterior, a recently developed alternative to Bayesian inference. |
Hyungi Lee; Eunggu Yun; Giung Nam; Edwin Fong; Juho Lee; |
152 | BigVGAN: A Universal Neural Vocoder with Large-Scale Training Highlight: In this work, we present BigVGAN, a universal vocoder that generalizes well for various out-of-distribution (OOD) scenarios without fine-tuning. |
Sang-gil Lee; Wei Ping; Boris Ginsburg; Bryan Catanzaro; Sungroh Yoon; |
153 | Progressive Voronoi Diagram Subdivision Enables Accurate Data-free Class-Incremental Learning Highlight: In this paper, we present iVoro, a novel framework derived from computational geometry. |
Chunwei Ma; Zhanghexuan Ji; Ziyun Huang; Yan Shen; Mingchen Gao; Jinhui Xu; |
154 | Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study Highlight: This paper discusses how to leverage trending vision language models to transfer to the medical domain, showing exciting performance on zero-shot and few-shot learning tasks. |
Ziyuan Qin; Hua Hui Yi; Qicheng Lao; Kang Li; |
155 | Approximate Bayesian Inference with Stein Functional Variational Gradient Descent Highlight: We propose a general-purpose variational algorithm that forms a natural analogue of Stein variational gradient descent (SVGD) in function space. |
Tobias Pielok; Bernd Bischl; David Rügamer; |
156 | When and Why Vision-Language Models Behave Like Bags-of-Words, and What to Do About It? Highlight: We present the settings where state-of-the-art VLMs behave like bags-of-words, i.e., when they have poor relational understanding, can blunder when linking objects to their attributes, and demonstrate a severe lack of order sensitivity. Here, we create the Attribution, Relation, and Order (ARO) benchmark to systematically evaluate the ability of VLMs to understand different types of relationships, attributes, and order information. |
Mert Yuksekgonul; Federico Bianchi; Pratyusha Kalluri; Dan Jurafsky; James Zou; |
157 | Causal Imitation Learning Via Inverse Reinforcement Learning Highlight: This paper studies imitation learning through causal lenses and extends the analysis and tools developed for behavior cloning (Zhang, Kumor, Bareinboim, 2020) to inverse reinforcement learning. |
Kangrui Ruan; Junzhe Zhang; Xuan Di; Elias Bareinboim; |
158 | The Surprising Computational Power of Nondeterministic Stack RNNs Highlight: Nondeterminism is needed for recognizing all CFLs (not just deterministic CFLs), but in this paper, we show that nondeterminism and the neural controller interact to produce two more unexpected abilities. |
Brian DuSell; David Chiang; |
159 | Ollivier-Ricci Curvature for Hypergraphs: A Unified Framework Highlight: We develop ORCHID, a flexible framework generalizing Ollivier-Ricci curvature to hypergraphs, and prove that the resulting curvatures have favorable theoretical properties. |
Corinna Coupette; Sebastian Dalleiger; Bastian Rieck; |
160 | Hyperbolic Self-paced Learning for Self-supervised Skeleton-based Action Representations Highlight: We propose a novel HYperbolic Self-Paced model (HYSP) for learning skeleton-based action representations. |
Luca Franco; Paolo Mandica; Bharti Munjal; Fabio Galasso; |
161 | Offline Congestion Games: How Feedback Type Affects Data Coverage Requirement Highlight: Starting from the facility-level (a.k.a. semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and show a pessimism-type algorithm that can recover an approximate NE. |
Haozhe Jiang; Qiwen Cui; Zhihan Xiong; Maryam Fazel; Simon Shaolei Du; |
162 | Auto-Encoding Goodness of Fit Highlight: In this work, we develop the Goodness of Fit Autoencoder (GoFAE), which incorporates hypothesis tests at two levels. |
Aaron Palmer; Zhiyi Chi; Derek Aguiar; Jinbo Bi; |
163 | Sparse Tree-based Initialization for Neural Networks Highlight: In this work, we propose a new sparse initialization technique for (potentially deep) multilayer perceptrons (MLPs): we first train a tree-based procedure to detect feature interactions and use the resulting information to initialize the network, which is subsequently trained via standard stochastic gradient strategies. |
Patrick Lutz; Ludovic Arnould; Claire Boyer; Erwan Scornet; |
164 | Efficient Conditionally Invariant Representation Learning Highlight: We introduce the Conditional Independence Regression CovariancE (CIRCE), a measure of conditional independence for multivariate continuous-valued variables. |
Roman Pogodin; Namrata Deka; Yazhe Li; Danica J. Sutherland; Victor Veitch; Arthur Gretton; |
165 | Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study Highlight: We investigate the models on two typical kinds of NLP tasks, text classification and extractive question answering. |
Mingxu Tao; Yansong Feng; Dongyan Zhao; |
166 | Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Better Representations Highlight: In this paper, we start by making the empirical observation that a naive joint optimization of CL and MIM losses leads to conflicting gradient directions – more severe as the layers go deeper. |
Ziyu Jiang; Yinpeng Chen; Mengchen Liu; Dongdong Chen; Xiyang Dai; Lu Yuan; Zicheng Liu; Zhangyang Wang; |
167 | DreamFusion: Text-to-3D Using 2D Diffusion Highlight: Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D or multiview data and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. |
Ben Poole; Ajay Jain; Jonathan T. Barron; Ben Mildenhall; |
168 | Learning Input-agnostic Manipulation Directions in StyleGAN with Text Guidance Highlight: Indeed, we show that it fails to discover a large portion of the manipulation directions that can be found by existing methods, which manually manipulate the latent space without text. To alleviate this issue, we propose a novel method that learns a Dictionary, whose entries correspond to the representations of single channels, by taking into account the manipulation effect coming from the interaction with multiple other channels. |
Yoonjeon Kim; Hyunsu Kim; Junho Kim; Yunjey Choi; Eunho Yang; |
169 | Effective Passive Membership Inference Attacks in Federated Learning Against Overparameterized Models Highlight: This work considers the challenge of performing membership inference attacks in a federated learning setting, for image classification, where an adversary can only observe the communication between the central node and a single client (a passive white-box attack). |
Jiacheng Li; Ninghui Li; Bruno Ribeiro; |
170 | Joint Edge-Model Sparse Learning Is Provably Efficient for Graph Neural Networks Highlight: Encouraged by the empirical success of sparse learners in accelerating GNN training, this paper quantitatively characterizes the impact of graph sampling and neuron pruning on the sample complexity and convergence rate for a desirable test accuracy. |
Shuai Zhang; Meng Wang; Pin-Yu Chen; Sijia Liu; Songtao Lu; Miao Liu; |
171 | Tier Balancing: Towards Dynamic Fairness Over Underlying Causal Factors Highlight: In this paper, through causal modeling with a directed acyclic graph (DAG) on the decision-distribution interplay, we investigate the possibility of achieving long-term fairness from a dynamic perspective. |
Zeyu Tang; Yatong Chen; Yang Liu; Kun Zhang; |
172 | CoRTX: Contrastive Framework for Real-time Explanation Highlight: In this work, we propose a COntrastive Real-Time eXplanation (CoRTX) framework to learn the explanation-oriented representation and relieve the intensive dependence of explainer training on explanation labels. |
Yu-Neng Chuang; Guanchu Wang; Fan Yang; Quan Zhou; Pushkar Tripathi; Xuanting Cai; Xia Hu; |
173 | Anamnesic Neural Differential Equations with Orthogonal Polynomial Projections Highlight: We propose PolyODE, a Neural ODE that models the latent continuous-time process as a projection onto a basis of orthogonal polynomials. |
Edward De Brouwer; Rahul G Krishnan; |
174 | Large Language Models Are Human-Level Prompt Engineers Highlight: Inspired by classical program synthesis and the human approach to prompt engineering, we propose Automatic Prompt Engineer (APE) for automatic instruction generation and selection. |
Yongchao Zhou; Andrei Ioan Muresanu; Ziwen Han; Keiran Paster; Silviu Pitis; Harris Chan; Jimmy Ba; |
175 | AutoTransfer: AutoML with Knowledge Transfer – An Application to Graph Neural Networks Highlight: Here we propose AutoTransfer, an AutoML solution that improves search efficiency by transferring the prior architectural design knowledge to the novel task of interest. |
Kaidi Cao; Jiaxuan You; Jiaju Liu; Jure Leskovec; |
176 | Explaining RL Decisions with Trajectories Highlight: In the literature, the explanation is often provided by saliency attribution to the features of the RL agent’s state. In this work, we propose a complementary approach to these explanations, particularly for offline RL, where we attribute the policy decisions of a trained RL agent to the trajectories encountered by it during training. |
Shripad Vilasrao Deshmukh; Arpan Dasgupta; Chirag Agarwal; Nan Jiang; Balaji Krishnamurthy; Georgios Theocharous; Jayakumar Subramanian; |
177 | On Representing Linear Programs By Graph Neural Networks Highlight: While the literature has reported encouraging numerical results, this paper establishes the theoretical foundation of applying GNNs to solving LPs. |
Ziang Chen; Jialin Liu; Xinshang Wang; Wotao Yin; |
178 | On Representing Mixed-Integer Linear Programs By Graph Neural Networks Highlight: This work discovers a fundamental limitation: there exist feasible and infeasible MILPs that all GNNs treat equally, indicating GNNs’ limited power to express general MILPs. Then, we show that, by restricting the MILPs to unfoldable ones or by adding random features, there exist GNNs that can reliably predict MILP feasibility, optimal objective values, and optimal solutions up to prescribed precision. |
Ziang Chen; Jialin Liu; Xinshang Wang; Wotao Yin; |
179 | Efficient Discrete Multi Marginal Optimal Transport Regularization Highlight: In this paper, we leverage multi-marginal optimal transport (MMOT), where we take advantage of a procedure that computes a generalized earth mover’s distance as a sub-routine. |
Ronak Mehta; Jeffery Kline; Vishnu Suresh Lokhande; Glenn Fung; Vikas Singh; |
180 | Graph Signal Sampling for Inductive One-Bit Matrix Completion: A Closed-form Solution Highlight: We propose a unified graph signal sampling framework which enjoys the benefits of graph signal analysis and processing. |
Chao Chen; Haoyu Geng; Gang Zeng; Zhaobing Han; Hua Chai; Xiaokang Yang; Junchi Yan; |
181 | A New Hierarchy of Expressivity for Graph Neural Networks Highlight: In this work we show that, contrary to the widely accepted view, the k-WL hierarchy is not well-suited for measuring expressive GNNs. |
Qing Wang; Dillon Ze Chen; Asiri Wijesinghe; Shouheng Li; Muhammad Farhan; |
182 | On Achieving Optimal Adversarial Test Error Highlight: We first elucidate various fundamental properties of optimal adversarial predictors: the structure of optimal adversarial convex predictors in terms of optimal adversarial zero-one predictors, bounds relating the adversarial convex loss to the adversarial zero-one loss, and the fact that continuous predictors can get arbitrarily close to the optimal adversarial error for both convex and zero-one losses. Applying these results along with new Rademacher complexity bounds for adversarial training near initialization, we prove that for general data distributions and perturbation sets, adversarial training on shallow networks with early stopping and an idealized optimal adversary is able to achieve optimal adversarial test error. |
Justin D. Li; Matus Telgarsky; |
183 | Powderworld: A Platform for Understanding Generalization Via Rich Task Distributions Highlight: To take a step towards addressing this research bottleneck, this work presents Powderworld, a lightweight yet expressive simulation environment running directly on the GPU. |
Kevin Frans; Phillip Isola; |
184 | 3D Segmenter: 3D Transformer Based Semantic Segmentation Via 2D Panoramic Distillation Highlight: Therefore, in this work, we propose the first 2D-to-3D knowledge distillation strategy to enhance 3D semantic segmentation models with knowledge embedded in the latent space of powerful 2D models. To facilitate our research, we create a large-scale, fine-annotated 3D semantic segmentation benchmark, containing voxel-wise semantic labels and aligned panoramas of 5175 scenes. |
ZHENNAN WU; YANG LI; Yifei Huang; Lin Gu; Tatsuya Harada; Hiroyuki Sato; |
185 | Fairness and Accuracy Under Domain Generalization Highlight: In this paper, we study the transfer of both fairness and accuracy under domain generalization where the data at test time may be sampled from never-before-seen domains. |
Thai-Hoang Pham; Xueru Zhang; Ping Zhang; |
186 | Text Summarization with Oracle Expectation Highlight: In this work, we identify two flaws with the widely used greedy labeling approach: it delivers suboptimal and deterministic oracles. |
Yumo Xu; Mirella Lapata; |
187 | Efficient Attention Via Control Variates Highlight: We present a novel analysis of random feature attention based on control variates, which characterizes its gap to full softmax attention and induces a novel efficient variant that significantly improves the approximation while remaining efficient. |
Lin Zheng; Jianbo Yuan; Chong Wang; Lingpeng Kong; |
188 | Pitfalls of Gaussians As A Noise Distribution in NCE Highlight: In practice, a common choice for q is a Gaussian which matches the mean and covariance of the data. In this paper, we show that such a choice can result in an exponentially bad (in the ambient dimension) conditioning of the Hessian of the loss – even for very simple data distributions. |
Holden Lee; Chirag Pabbaraju; Anish Prasad Sevekari; Andrej Risteski; |
189 | HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention Highlight: As a result, such hierarchical aggregation significantly improves the cross-modal alignment. To demonstrate the advantages of HiCLIP, we conduct qualitative analysis on its unsupervised hierarchy induction during inference, as well as extensive quantitative experiments on both visual recognition and vision-language downstream tasks. |
Shijie Geng; Jianbo Yuan; Yu Tian; Yuxiao Chen; Yongfeng Zhang; |
190 | Sparse MoE with Random Routing As The New Dropout: Training Bigger and Self-Scalable Models Highlight: To this end, we propose a new plug-and-play training framework, SMoE-Dropout, to enable scaling transformers to better accuracy in the full-capacity setting without collapse. |
Tianlong Chen; Zhenyu Zhang; AJAY KUMAR JAISWAL; Shiwei Liu; Zhangyang Wang; |
191 | MIMT: Masked Image Modeling Transformer for Video Compression Highlight: We thus introduce an entropy model based on a masked image modeling transformer (MIMT) to learn the spatial-temporal dependencies. |
Jinxi Xiang; Kuan Tian; Jun Zhang; |
192 | Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together! Highlight: In pursuit of a more general evaluation and unveiling the true potential of sparse algorithms, we introduce the Sparsity May Cry Benchmark (SMC-Bench), a collection of four carefully curated diverse tasks with 12 datasets, that accounts for capturing a wide range of domain-specific knowledge. |
Shiwei Liu; Tianlong Chen; Zhenyu Zhang; Xuxi Chen; Tianjin Huang; AJAY KUMAR JAISWAL; Zhangyang Wang; |
193 | Interpretable Debiasing of Vectorized Language Representations with Iterative Orthogonalization Highlight: We propose a new mechanism to augment a word vector embedding representation that offers improved bias removal while retaining the key information—resulting in improved interpretability of the representation. |
Prince Osei Aboagye; Yan Zheng; Jack Shunn; Chin-Chia Michael Yeh; Junpeng Wang; Zhongfang Zhuang; Huiyuan Chen; Liang Wang; Wei Zhang; Jeff Phillips; |
194 | The Surprising Effectiveness of Equivariant Models in Domains with Latent Symmetry Highlight: This paper discovers that equivariant models are surprisingly effective in domains with latent or partial symmetries. |
Dian Wang; Jung Yeon Park; Neel Sortur; Lawson L.S. Wong; Robin Walters; Robert Platt; |
195 | Unsupervised Model Selection for Time Series Anomaly Detection Highlight: This paper answers the question: given an unlabeled dataset and a set of candidate time series anomaly detectors, how can we select the most accurate model? |
Mononito Goswami; Cristian Ignacio Challu; Laurent Callot; Lenon Minorics; Andrey Kan; |
196 | GoBigger: A Scalable Platform for Cooperative-Competitive Multi-Agent Interactive Simulation Highlight: Even though this has greatly promoted the development of multi-agent reinforcement learning (MARL), it is still not enough to support further exploration of swarm-intelligence behavior between multiple teams and cooperation between multiple agents, due to limited scalability. To alleviate this, we introduce GoBigger, a scalable platform for cooperative-competitive multi-agent interactive simulation. |
Ming Zhang; Shenghan Zhang; Zhenjie Yang; Lekai Chen; Jinliang Zheng; Chao Yang; Chuming Li; Hang Zhou; Yazhe Niu; Yu Liu; |
197 | Representation Learning for Low-rank General-sum Markov Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We leverage representation learning and present a model-based and a model-free approach to construct an effective representation from collected data. |
Chengzhuo Ni; Yuda Song; Xuezhou Zhang; Zihan Ding; Chi Jin; Mengdi Wang; |
198 | Exploring Low-Rank Property in Multiple Instance Learning for Whole Slide Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We leverage the properties of the apparent similarity in high-resolution WSIs, which essentially exhibit \textit{low-rank} structures in the data manifold, to develop a novel MIL with a boost in both feature embedding and feature aggregation. |
Jinxi Xiang; Jun Zhang; |
199 | Win: Weight-Decay-Integrated Nesterov Acceleration for Adaptive Gradient Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the problem of “\textit{how to accelerate the convergence of adaptive gradient algorithms in a general manner}”, and aim to provide practical insights to boost training efficiency. |
Pan Zhou; Xingyu Xie; Shuicheng YAN; |
200 | Efficient Edge Inference By Selective Query Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel end-to-end hybrid learning framework that allows the edge to selectively query only those hard examples that the cloud can classify correctly. |
Anil Kag; Igor Fedorov; Aditya Gangrade; Paul Whatmough; Venkatesh Saligrama; |
201 | Learning Topology-preserving Data Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a method for learning topology-preserving data representations (dimensionality reduction). |
Ilya Trofimov; Daniil Cherniavskii; Eduard Tulchinskii; Nikita Balabin; Serguei Barannikov; Evgeny Burnaev; |
202 | Towards Understanding Why Mask Reconstruction Pretraining Helps in Downstream Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To solve these problems, we first theoretically show that on an auto-encoder with a two/one-layered convolution encoder/decoder, MRP can capture all discriminative semantics of each potential semantic class in the pretraining dataset. Then, since the pretraining dataset is of huge size and high diversity and thus covers most semantics in downstream datasets, in the fine-tuning phase the pretrained encoder can capture as many semantics as possible in downstream datasets and would not lose these semantics, with theoretical guarantees. |
Jiachun Pan; Pan Zhou; Shuicheng YAN; |
203 | Leveraging Incompatibility to Defend Against Backdoor Poisoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify an incompatibility property of the interaction of clean and poisoned data with the training algorithm, specifically that including poisoned data in the training dataset does not improve model accuracy on clean data and vice-versa. Leveraging this property, we develop an algorithm that iteratively refines subsets of the poisoned dataset to obtain subsets that concentrate around either clean or poisoned data. |
Charles Jin; Melinda Sun; Martin Rinard; |
204 | Statistical Guarantees for Consensus Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We analyze the statistical performance of aggregation algorithms under a stochastic label perturbation model, and show that a $K$-means type algorithm followed by a local refinement step can achieve near optimal performance, with a rate that decays exponentially in $N$. |
Zhixin Zhou; Gautam Dudeja; Arash A Amini; |
205 | More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose multi-agent conditional policy factorization (MACPF), which takes more centralized training but still enables decentralized execution. |
Jiangxing Wang; Deheng Ye; Zongqing Lu; |
206 | Calibrating Transformers Via Sparse Gaussian Processes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Extending Transformers’ success to safety-critical domains requires calibrated uncertainty estimation, which remains under-explored. To address this, we propose Sparse Gaussian Process attention (SGPA), which performs Bayesian inference directly in the output space of multi-head attention blocks (MHAs) in Transformers to calibrate their uncertainty. |
Wenlong Chen; Yingzhen Li; |
207 | Red PANDA: Disambiguating Anomaly Detection By Removing Nuisance Factors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Breaking from previous research, we present a new anomaly detection method that allows operators to exclude an attribute when detecting anomalies. |
Niv Cohen; Jonathan Kahana; Yedid Hoshen; |
208 | STOCHASTIC NO-REGRET LEARNING FOR GENERAL GAMES WITH VARIANCE REDUCTION Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that a stochastic version of optimistic mirror descent (OMD), a variant of mirror descent with recency bias, converges fast in general games. |
Yichi Zhou; Fang Kong; Shuai Li; |
209 | The Dark Side of AutoML: Towards Architectural Backdoor Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper asks the intriguing question: is it possible to exploit neural architecture search (NAS) as a new attack vector to launch previously improbable attacks? Specifically, we present EVAS, a new attack that leverages NAS to find neural architectures with inherent backdoors and exploits such vulnerability using input-aware triggers. |
Ren Pang; Changjiang Li; Zhaohan Xi; Shouling Ji; Ting Wang; |
210 | Alternating Differentiation for Optimization Layers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we developed a new framework, named Alternating Differentiation (Alt-Diff), that differentiates optimization problems (here, specifically in the form of convex optimization problems with polyhedral constraints) in a fast and recursive way. |
Haixiang Sun; Ye Shi; Jingya Wang; Hoang Duong Tuan; H. Vincent Poor; Dacheng Tao; |
211 | On The Relative Error of Random Fourier Features for Preserving Kernel Distance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that for a significant range of kernels, including the well-known Laplacian kernels, RFF cannot approximate the kernel distance with small relative error using low dimensions. |
Kuan Cheng; Shaofeng H.-C. Jiang; Luojian Wei; Zhide Wei; |
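For context on this result, random Fourier features (RFF) approximate a shift-invariant kernel by an inner product of randomized cosine features. Below is a minimal sketch of the standard RFF construction for the Gaussian (RBF) kernel, which the relative-error analysis above concerns; the dimension `D`, bandwidth `gamma`, and seed are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 5, 4096          # input dimension, number of random features (illustrative)
gamma = 0.5             # RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)

# Standard RFF construction: z(x) = sqrt(2/D) * cos(Wx + b) with
# W ~ N(0, 2*gamma*I) and b ~ Uniform[0, 2*pi], so E[z(x)·z(y)] = k(x, y).
W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def z(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
exact = np.exp(-gamma * np.sum((x - y) ** 2))   # true kernel value
approx = float(z(x) @ z(y))                     # RFF estimate, error ~ O(1/sqrt(D))
```

The absolute error shrinks as `D` grows; the paper's point is that for kernels such as the Laplacian, the *relative* error on kernel distances cannot be made small with low dimensions.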
212 | Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In light of a recent hardness result (Liu et al., 2022), we focus on the setting where the opponent’s previous policies are revealed to the agent for decision making. With such an information structure, we propose a new algorithm, Decentralized Optimistic hypeRpolicy mIrror deScent (DORIS), which achieves $\sqrt{K}$-regret in the context of general function approximation, where $K$ is the number of episodes. |
Wenhao Zhan; Jason D. Lee; Zhuoran Yang; |
213 | PAC Reinforcement Learning for Predictive State Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we study online Reinforcement Learning (RL) in partially observable dynamical systems. |
Wenhao Zhan; Masatoshi Uehara; Wen Sun; Jason D. Lee; |
214 | Make-A-Video: Text-to-Video Generation Without Text-Video Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Make-A-Video — an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). |
Uriel Singer; Adam Polyak; Thomas Hayes; Xi Yin; Jie An; Songyang Zhang; Qiyuan Hu; Harry Yang; Oron Ashual; Oran Gafni; Devi Parikh; Sonal Gupta; Yaniv Taigman; |
215 | Static Prediction of Runtime Errors By Learning to Execute Programs with External Resource Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we introduce a competitive programming dataset and task for predicting runtime errors, which we show is difficult for generic models like Transformers. We approach this task by developing an interpreter-inspired architecture with an inductive bias towards mimicking program executions, which models exception handling and learns to execute descriptions of external resources. |
David Bieber; Rishab Goel; Dan Zheng; Hugo Larochelle; Daniel Tarlow; |
216 | MACTA: A Multi-agent Reinforcement Learning Approach for Cache Timing Attacks and Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the current detection of cache timing attacks relies heavily on heuristics and expert knowledge, which can lead to brittleness and inability to adapt to new attacks. To mitigate these problems, we develop a two-player environment for cache-timing attacks and detection, and leverage the idea of population-based multi-agent reinforcement learning (MARL) to train both attackers and detectors. |
Jiaxun Cui; Xiaomeng Yang; Geunbae Lee; Mulong Luo; Peter Stone; Hsien-Hsin S. Lee; Benjamin Lee; G. Edward Suh; Wenjie Xiong; Yuandong Tian; |
217 | Quantized Compressed Sensing with Score-Based Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the power of score-based generative models (SGM, also known as diffusion models) in capturing the rich structure of natural signals beyond simple sparsity, we propose an unsupervised data-driven approach called quantized compressed sensing with SGM (QCS-SGM), where the prior distribution is modeled by a pre-trained SGM. |
Xiangming Meng; Yoshiyuki Kabashima; |
218 | SGDA with Shuffling: Faster Convergence for Nonconvex-PŁ Minimax Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most practical implementations of SGDA randomly reshuffle components and sequentially use them (i.e., without-replacement sampling); however, there are few theoretical results on this approach for minimax algorithms, especially outside the easier-to-analyze (strongly-)monotone setups. To narrow this gap, we study the convergence bounds of SGDA with random reshuffling (SGDA-RR) for smooth nonconvex-nonconcave objectives with Polyak-Łojasiewicz (PŁ) geometry. |
Hanseul Cho; Chulhee Yun; |
219 | MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents MOAT, a family of neural networks that build on top of MObile convolution (i.e., inverted residual blocks) and ATtention. |
Chenglin Yang; Siyuan Qiao; Qihang Yu; Xiaoding Yuan; Yukun Zhu; Alan Yuille; Hartwig Adam; Liang-Chieh Chen; |
220 | View Synthesis with Sculpted Neural Points Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new approach that performs view synthesis using point clouds. |
Yiming Zuo; Jia Deng; |
221 | Extremely Simple Activation Shaping for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, in this work, we propose an extremely simple, post-hoc, on-the-fly activation shaping method, ASH, where a large portion (e.g. 90%) of a sample’s activation at a late layer is removed, and the rest (e.g. 10%) simplified or lightly adjusted. |
Andrija Djurisic; Nebojsa Bozanic; Arjun Ashok; Rosanne Liu; |
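The activation-shaping idea in the highlight can be sketched in a few lines. This is a rough illustration of the pruning flavor only (zeroing all but the top fraction of activations); the helper name `ash_prune` and the default keep fraction are illustrative assumptions, and the paper's variants that rescale or binarize the surviving activations are not shown.

```python
import numpy as np

def ash_prune(activations, keep_fraction=0.10):
    """Zero out all but the largest `keep_fraction` of activations.

    Sketch of the pruning variant of activation shaping described above;
    e.g. keep_fraction=0.10 keeps the top 10% and removes the rest.
    Ties at the threshold may keep a few extra entries.
    """
    flat = activations.ravel()
    k = max(1, int(round(flat.size * keep_fraction)))
    threshold = np.partition(flat, -k)[-k]   # k-th largest activation
    return np.where(activations >= threshold, activations, 0.0)
```

Applied to a late-layer feature map at inference time, the shaped activations are then passed on unchanged to the rest of the network.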
222 | Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled image representations exploiting the sequential nature of RL observations. |
Mhairi Dunion; Trevor McInroe; Kevin Sebastian Luck; Josiah P. Hanna; Stefano V Albrecht; |
223 | Dr.Spider: A Diagnostic Evaluation Benchmark Towards Text-to-SQL Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a comprehensive robustness benchmark based on Spider, a cross-domain text-to-SQL benchmark, to diagnose the model robustness. |
Shuaichen Chang; Jun Wang; Mingwen Dong; Lin Pan; Henghui Zhu; Alexander Hanbo Li; Wuwei Lan; Sheng Zhang; Jiarong Jiang; Joseph Lilien; Steve Ash; William Yang Wang; Zhiguo Wang; Vittorio Castelli; Patrick Ng; Bing Xiang; |
224 | HiT-MDP: Learning The SMDP Option Framework on MDPs with Hidden Temporal Variables Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel Markov Decision Process (MDP), the Hidden Temporal MDP (HiT-MDP), and prove that the option-induced HiT-MDP is homomorphic equivalent to the option-induced SMDP. |
Chang Li; Dongjin Song; Dacheng Tao; |
225 | Expressive Monotonic Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a weight-constrained architecture with a single residual connection to achieve exact monotonic dependence in any subset of the inputs. |
Niklas Nolte; Ouail Kitouni; Mike Williams; |
226 | Information-Theoretic Analysis of Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While our bounds for the first kind of error are in line with the traditional analysis and give similar insights, our bounds on the second kind of error are algorithm-dependent, which also provide insights into algorithm designs. Specifically, we present two simple techniques for improving generalization in UDA and validate them experimentally. |
Ziqiao Wang; Yongyi Mao; |
227 | Provably Efficient Lifelong Reinforcement Learning with Linear Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We frame the problem as a linearly parameterized contextual Markov decision process (MDP), where each task is specified by a context and the transition dynamics is context-independent, and we introduce a new completeness-style assumption on the representation which is sufficient to ensure the optimal multi-task policy is realizable under the linear representation. Under this assumption, we propose an algorithm, called UCB Lifelong Value Distillation (UCBlvd), that provably achieves sublinear regret for any sequence of tasks while using only sublinear planning calls. |
Sanae Amani; Lin Yang; Ching-An Cheng; |
228 | Valid P-Value for Deep Learning-driven Salient Region Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a method to quantify the reliability of a saliency region in the form of p-values. |
Miwa Daiki; Vo Nguyen Le Duy; Ichiro Takeuchi; |
229 | A Theoretical Understanding of Vision Transformers: Learning, Generalization, and Sample Complexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on a data model characterizing both label-relevant and label-irrelevant tokens, this paper provides the first theoretical analysis of training a three-layer ViT, i.e., one self-attention layer followed by a two-layer perceptron, for a classification task. |
Hongkang Li; Meng Wang; Sijia Liu; Pin-Yu Chen; |
230 | Disentanglement of Correlated Factors Via Hausdorff Factorized Support Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop a method that allows for disentangled representation learning not only under the assumption of independent factors of variation but instead fundamentally allows for much more realistic correlations during training. |
Karsten Roth; Mark Ibrahim; Zeynep Akata; Pascal Vincent; Diane Bouchacourt; |
231 | SCALE-UP: An Efficient Black-box Input-level Backdoor Detection Via Analyzing Scaled Prediction Consistency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, almost all of them cannot be adopted in MLaaS scenarios since they require getting access to or even modifying the suspicious models. In this paper, we propose a simple yet effective black-box input-level backdoor detection, called SCALE-UP, which requires only the predicted labels to alleviate this problem. |
Junfeng Guo; Yiming Li; Xun Chen; Hanqing Guo; Lichao Sun; Cong Liu; |
232 | Pink Noise Is All You Need: Colored Noise Exploration in Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we perform a comprehensive experimental evaluation on MPO and SAC to explore the effectiveness of other colors of noise as action noise. |
Onno Eberhard; Jakob Hollenstein; Cristina Pinneri; Georg Martius; |
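Colored action noise of the kind evaluated above can be generated by shaping white noise in the frequency domain. A minimal FFT-based sketch follows, where the exponent `beta` controls the color (0 = white, 1 = pink, 2 = red); the unit-variance normalization is an illustrative choice, and the paper's exact sampling procedure may differ.

```python
import numpy as np

def colored_noise(beta, n, rng):
    """Sample n points of noise with power spectral density ~ 1/f^beta.

    beta = 0 gives white noise, beta = 1 pink noise, beta = 2 red
    (Brownian) noise.
    """
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]                      # avoid division by zero at DC
    amplitude = freqs ** (-beta / 2.0)       # shape spectrum as 1/f^beta
    phases = rng.normal(size=freqs.size) + 1j * rng.normal(size=freqs.size)
    signal = np.fft.irfft(amplitude * phases, n=n)
    return signal / signal.std()             # normalize to unit variance

rng = np.random.default_rng(0)
pink = colored_noise(beta=1.0, n=1024, rng=rng)  # pink action noise
```

In an RL loop, a sequence like `pink` would be added to the deterministic action at each step in place of i.i.d. Gaussian (white) noise.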
233 | Revisiting The Assumption of Latent Separability for Backdoor Defenses Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This question is central to understanding whether the assumption of latent separability provides a reliable foundation for defending against backdoor poisoning attacks. In this paper, we design adaptive backdoor poisoning attacks to present counter-examples against this assumption. |
Xiangyu Qi; Tinghao Xie; Yiming Li; Saeed Mahloujifar; Prateek Mittal; |
234 | Optimal Transport for Offline Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Optimal Transport Relabeling (OTR), an imitation learning algorithm that can automatically relabel offline data of mixed and unknown quality with rewards from a few good demonstrations. |
Yicheng Luo; zhengyao jiang; Samuel Cohen; Edward Grefenstette; Marc Peter Deisenroth; |
235 | Mitigating Dataset Bias By Using Per-Sample Gradient Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a debiasing algorithm leveraging gradient called PGD (Per-sample Gradient-based Debiasing). |
Sumyeong Ahn; Seongyoon Kim; Se-Young Yun; |
236 | Efficient Model Updates for Approximate Unlearning of Graph-Structured Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces the first known approach for \emph{approximate graph unlearning} with provable theoretical guarantees. |
Eli Chien; Chao Pan; Olgica Milenkovic; |
237 | MAST: Masked Augmentation Subspace Training for Generalizable Self-Supervised Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to learn self-supervised features that generalize well across a variety of downstream tasks (e.g., object classification, detection and instance segmentation) without knowing any task information beforehand. |
Chen Huang; Hanlin Goh; Jiatao Gu; Joshua M. Susskind; |
238 | SmartFRZ: An Efficient Training Framework Using Attention-Based Layer Freezing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a generic and efficient training framework (SmartFRZ). |
Sheng Li; Geng Yuan; Yue Dai; Youtao Zhang; Yanzhi Wang; Xulong Tang; |
239 | Sparse Random Networks for Communication-Efficient Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an FL framework, where clients find a sparse random network using a stochastic strategy; and provide (1) lower communication cost, (2) higher accuracy, (3) faster convergence, and (4) at the end of the training, a compressed final model. |
Berivan Isik; Francesco Pase; Deniz Gunduz; Tsachy Weissman; Zorzi Michele; |
240 | PV3D: A 3D Generative Model for Portrait Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose PV3D, the first generative framework that can synthesize multi-view consistent portrait videos. |
Eric Zhongcong Xu; Jianfeng Zhang; Jun Hao Liew; Wenqing Zhang; Song Bai; Jiashi Feng; Mike Zheng Shou; |
241 | S-NeRF: Neural Radiance Fields for Street Views Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new street-view NeRF (S-NeRF) that considers novel view synthesis of both the large-scale background scenes and the foreground moving vehicles jointly. |
Ziyang Xie; Junge Zhang; Wenye Li; Feihu Zhang; Li Zhang; |
242 | The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To gain insight into the relationship between learned features, function approximation, and the learning rule, we analyze infinite-width deep networks trained with gradient descent (GD) and biologically-plausible alternatives including feedback alignment (FA), direct feedback alignment (DFA), and error modulated Hebbian learning (Hebb), as well as gated linear networks (GLN). |
Blake Bordelon; Cengiz Pehlevan; |
243 | On The Data-Efficiency with Contrastive Image Transformation in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the end, we propose a contrastive invariant transformation (CoIT), a simple yet promising learnable data augmentation combined with standard model-free algorithms to improve sample-efficiency. |
Sicong Liu; Xi Sheryl Zhang; Yushuo Li; Yifan Zhang; Jian Cheng; |
244 | Dataless Knowledge Fusion By Merging Weights of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the problem of merging individual models built on different training data sets to obtain a single model that performs well both across all data set domains and can generalize on out-of-domain data. |
Xisen Jin; Pengxiang Cheng; Daniel Preotiuc-Pietro; Xiang Ren; |
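A common baseline for this merging setting is simple parameter averaging across the individual models. The sketch below shows only that baseline on toy state dictionaries (the names `average_weights`, `m1`, `m2` are illustrative); the paper's dataless fusion method goes beyond plain averaging.

```python
import numpy as np

def average_weights(state_dicts):
    # Simple parameter averaging: merge models trained on different
    # data sets by taking the element-wise mean of each parameter.
    keys = state_dicts[0].keys()
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

# Toy "models" with matching parameter shapes (illustrative values).
m1 = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
m2 = {"w": np.array([3.0, 4.0]), "b": np.array([2.0])}
merged = average_weights([m1, m2])
```

This requires no training data at merge time, which is the constraint the paper's approach also operates under.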
245 | Long Range Language Modeling Via Gated State Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we focus on autoregressive sequence modeling over English books, Github source code and ArXiv mathematics articles. |
Harsh Mehta; Ankit Gupta; Ashok Cutkosky; Behnam Neyshabur; |
246 | Generalization Bounds for Federated Learning: Fast Rates, Unparticipating Clients and Unbounded Losses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper provides a theoretical analysis of generalization error of {federated learning}, which captures both heterogeneity and relatedness of the distributions. |
Xiaolin Hu; Shaojie Li; Yong Liu; |
247 | More ConvNets in The 2020s: Scaling Up Kernels Beyond 51×51 Using Sparsity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the possibility of training extreme convolutions larger than 31×31 and test whether the performance gap can be eliminated by strategically enlarging convolutions. |
Shiwei Liu; Tianlong Chen; Xiaohan Chen; Xuxi Chen; Qiao Xiao; Boqian Wu; Tommi Kärkkäinen; Mykola Pechenizkiy; Decebal Constantin Mocanu; Zhangyang Wang; |
248 | Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-Free RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It remains unclear how such a safe exploration requirement would affect the corresponding sample complexity needed to achieve the desired optimality of the obtained policy in planning. In this work, we make a first attempt to answer this question. |
Ruiquan Huang; Jing Yang; Yingbin Liang; |
249 | CUTS: Neural Causal Discovery from Unstructured Time-Series Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing methods assume structured input data and degenerate greatly when encountering data with randomly missing entries or non-uniform sampling frequencies, which hampers their applications in real scenarios. To address this issue, here we present CUTS, a neural Granger causal discovery algorithm to jointly impute unobserved data points and build causal graphs, via plugging in two mutually boosting modules in an iterative framework: (i) Latent data prediction stage: designs a Delayed Supervision Graph Neural Network (DSGNN) to hallucinate and register unstructured data which might be of high dimension and with complex distribution; (ii) Causal graph fitting stage: builds a causal adjacency matrix with imputed data under sparse penalty. |
Cheng Yuxiao; Runzhao Yang; Tingxiong Xiao; Zongren Li; Jinli Suo; Kunlun He; Qionghai Dai; |
250 | A Kernel Perspective of Skip Connections in Convolutional Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we study their properties through their Gaussian Process and Neural Tangent kernels. |
Daniel Barzilai; Amnon Geifman; Meirav Galun; Ronen Basri; |
251 | RLx2: Training A Sparse Deep Reinforcement Learning Model from Scratch Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel sparse DRL training framework, “the Rigged Reinforcement Learning Lottery” (RLx2), which builds upon gradient-based topology evolution and is capable of training a sparse DRL model based entirely on a sparse network. |
Yiqin Tan; Pihe Hu; Ling Pan; Jiatai Huang; Longbo Huang; |
252 | NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a unified framework of synthesizing and manipulating voice signals from analysis features, dubbed NANSY++. |
Hyeong-Seok Choi; Jinhyeok Yang; Juheon Lee; Hyeongju Kim; |
253 | Equivariant Descriptor Fields: SE(3)-Equivariant Energy-Based Models for End-to-End Visual Robotic Manipulation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present SE(3)-equivariant models for visual robotic manipulation from point clouds that can be trained fully end-to-end. |
Hyunwoo Ryu; Hong-in Lee; Jeong-Hoon Lee; Jongeun Choi; |
254 | Not All Tasks Are Born Equal: Understanding Zero-Shot Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent work has achieved remarkable zero-shot performance with multi-task prompted pretraining, but little has been understood. For the first time, we show that training on a small number of key tasks beats using all the training tasks, while removing these key tasks substantially hurts performance. |
Jing Zhou; Zongyu Lin; Yanan Zheng; Jian Li; Zhilin Yang; |
255 | Characterizing The Spectrum of The NTK Via A Power Series Expansion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Under mild conditions on the network initialization we derive a power series expansion for the Neural Tangent Kernel (NTK) of arbitrarily deep feedforward networks in the infinite width limit. |
Michael Murray; Hui Jin; Benjamin Bowman; Guido Montufar; |
256 | Corrupted Image Modeling for Self-Supervised Visual Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Corrupted Image Modeling (CIM) for self-supervised visual pre-training. |
Yuxin Fang; Li Dong; Hangbo Bao; Xinggang Wang; Furu Wei; |
257 | Compositional Task Representations for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel prompt-free approach, Compositional Task Representations (CTR), that employs multi-task training to learn a discrete, compositional codebook. |
NAN SHAO; Zefan Cai; Hanwei xu; Chonghua Liao; Yanan Zheng; Zhilin Yang; |
258 | Equivariant Hypergraph Diffusion Neural Operators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by hypergraph diffusion algorithms, this work proposes a new HNN architecture named ED-HNN, which provably approximates any continuous equivariant hypergraph diffusion operators that can model a wide range of higher-order relations. |
Peihao Wang; Shenghao Yang; Yunyu Liu; Zhangyang Wang; Pan Li; |
259 | TextShield: Beyond Successfully Detecting Adversarial Sentences in NLP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the core limitation of previous detection methods is that, unlike defense methods from other paradigms, they cannot give correct predictions on adversarial sentences. To solve this issue, this paper proposes TextShield: (1) we discover a link between text attacks and saliency information, and propose a saliency-based detector, which can effectively detect whether an input sentence is adversarial; (2) we design a saliency-based corrector, which converts detected adversarial sentences to benign ones. |
Lingfeng Shen; Ze Zhang; Haiyun Jiang; Ying Chen; |
260 | REVISITING PRUNING AT INITIALIZATION THROUGH THE LENS OF RAMANUJAN GRAPH Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better understand the underlying mechanism of PaI, we propose to interpret it through the lens of the Ramanujan Graph – a class of expander graphs that are sparse while being highly connected. |
Duc N.M Hoang; Shiwei Liu; Radu Marculescu; Zhangyang Wang; |
261 | Discovering Latent Knowledge in Language Models Without Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we introduce a method for accurately answering yes-no questions given only unlabeled model activations. |
Collin Burns; Haotian Ye; Dan Klein; Jacob Steinhardt; |
262 | How to Exploit Hyperspherical Embeddings for Out-of-Distribution Detection? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose CIDER, a novel representation learning framework that exploits hyperspherical embeddings for OOD detection. |
Yifei Ming; Yiyou Sun; Ousmane Dia; Yixuan Li; |
263 | Automatic Chain of Thought Prompting in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate the effect of such mistakes, we investigate various principles for automatically constructing demonstrations and find that diversity matters. Inspired by these findings, we propose an automatic CoT prompting method called Auto-CoT. |
Zhuosheng Zhang; Aston Zhang; Mu Li; Alex Smola; |
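The diversity-driven demonstration construction behind Auto-CoT can be sketched roughly as: embed the candidate questions, cluster them, and take one representative per cluster so the demonstrations cover distinct question types. The toy hashing embedding, the tiny k-means, and all function names below are illustrative stand-ins, not the paper's implementation:

```python
# Illustrative sketch only: cluster questions for diversity, then pick one
# representative per cluster as a chain-of-thought demonstration seed.
import math
import random


def embed(text, dim=32):
    """Toy bag-of-words hashing embedding (stand-in for a sentence encoder)."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[hash(tok) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means returning a cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def select_demonstrations(questions, k=2):
    """Return one representative question per cluster (up to k)."""
    points = [embed(q) for q in questions]
    assign = kmeans(points, k)
    demos = []
    for c in range(k):
        for q, a in zip(questions, assign):
            if a == c:
                demos.append(q)
                break
    return demos


questions = [
    "If I have 3 apples and eat 1, how many remain?",
    "A train travels 60 miles in 2 hours; what is its speed?",
    "If I buy 5 pens and lose 2, how many are left?",
    "A car covers 120 km in 3 hours; what is its average speed?",
]
demos = select_demonstrations(questions, k=2)
# Each selected question would then be completed with a zero-shot
# "Let's think step by step." rationale before being used as a demo.
prompt = "\n\n".join(f"Q: {q}\nA: Let's think step by step." for q in demos)
```

The selected demonstrations come from different clusters, which is the mechanism the highlight refers to when it says "diversity matters".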
264 | Parameter-Efficient Fine-Tuning Design Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we present a parameter-efficient fine-tuning design paradigm and discover design patterns that are applicable to different experimental settings. |
Jiaao Chen; Aston Zhang; Xingjian Shi; Mu Li; Alex Smola; Diyi Yang; |
265 | Learning Multimodal Data Augmentation in Feature Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce LeMDA, Learning Multimodal Data Augmentation, an easy-to-use method that automatically learns to jointly augment multimodal data in feature space, with no constraints on the identities of the modalities or the relationship between modalities. |
Zichang Liu; Zhiqiang Tang; Xingjian Shi; Aston Zhang; Mu Li; Anshumali Shrivastava; Andrew Gordon Wilson; |
266 | AIM: Adapting Image Models for Efficient Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method to Adapt pre-trained Image Models (AIM) for efficient video understanding. |
Taojiannan Yang; Yi Zhu; Yusheng Xie; Aston Zhang; Chen Chen; Mu Li; |
267 | Factorized Fourier Neural Operators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the Factorized Fourier Neural Operator (F-FNO), a learning-based approach for simulating partial differential equations (PDEs). |
Alasdair Tran; Alexander Mathews; Lexing Xie; Cheng Soon Ong; |
268 | FaiREE: Fair Classification with Finite-sample and Distribution-free Guarantee Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose FaiREE, a fair classification algorithm which can satisfy group fairness constraints with finite-sample and distribution-free theoretical guarantees. |
Puheng Li; James Zou; Linjun Zhang; |
269 | Exponential Generalization Bounds with Near-Optimal Rates for $L_q$-Stable Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, a natural question we would like to address in this paper is whether it is possible to derive near-optimal exponential generalization bounds for $L_q$-stable learning algorithms. As the core contribution of the present work, we give an affirmative answer to this question by developing strict analogues of the near-optimal generalization and risk bounds of uniformly stable algorithms for $L_q$-stable algorithms. |
Xiaotong Yuan; Ping Li; |
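As background (a sketch of the standard replace-one formulation, not necessarily the paper's exact parameterization): an algorithm $A$ is $L_q$-stable with parameter $\gamma$ if, for every sample index $i$,

$$\sup_{z}\ \big\| \ell(A(S); z) - \ell(A(S^{(i)}); z) \big\|_{L_q} \le \gamma,$$

where $S^{(i)}$ denotes the sample $S$ with its $i$-th example replaced by an independent copy and the $L_q$ norm is taken over the random draw of the data. Uniform stability is recovered in the limit $q \to \infty$, which is why extending the near-optimal exponential bounds from uniformly stable to $L_q$-stable algorithms is a strict generalization.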
270 | Equal Improvability: A New Fairness Notion Considering The Long-term Impact Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To find a classifier that satisfies the EI requirement, we propose and study three different approaches that solve EI regularized optimization problems. |
Ozgur Guldogan; Yuchen Zeng; Jy-yong Sohn; Ramtin Pedarsani; Kangwook Lee; |
271 | Riemannian Metric Learning Via Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an optimal transport-based model for learning a metric tensor from cross-sectional samples of evolving probability measures on a common Riemannian manifold. |
Christopher Scarvelis; Justin Solomon; |
272 | MaskViT: Masked Visual Pre-Training for Video Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work shows that we can create good video prediction models by pre-training transformers via masked visual modeling. |
Agrim Gupta; Stephen Tian; Yunzhi Zhang; Jiajun Wu; Roberto Martín-Martín; Li Fei-Fei; |
273 | Prompting GPT-3 To Be Reliable Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We establish simple and effective prompting methods to make GPT-3 reliable in terms of: robustness, fairness, calibration, factuality. |
Chenglei Si; Zhe Gan; Zhengyuan Yang; Shuohang Wang; Jianfeng Wang; Jordan Lee Boyd-Graber; Lijuan Wang; |
274 | Teacher Guided Training: An Efficient Framework for Knowledge Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the teacher-guided training (TGT) framework for training a high-quality compact model that leverages the knowledge acquired by pretrained generative models, while obviating the need to go through a large volume of data. |
Manzil Zaheer; Ankit Singh Rawat; Seungyeon Kim; Chong You; Himanshu Jain; Andreas Veit; Rob Fergus; Sanjiv Kumar; |
275 | Sparsity-Constrained Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose in this paper a new approach for OT with explicit cardinality constraints on the transportation plan. |
Tianlin Liu; Joan Puigcerver; Mathieu Blondel; |
276 | Turning The Curse of Heterogeneity in Federated Learning Into A Blessing for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the other hand, a notorious challenge in FL is data heterogeneity where each client collects non-identically and independently distributed (non-iid) data. We propose to take advantage of such heterogeneity and turn the curse into a blessing that facilitates OoD detection in FL. |
Shuyang Yu; Junyuan Hong; Haotao Wang; Zhangyang Wang; Jiayu Zhou; |
277 | Unbiased Stochastic Proximal Solver for Graph Neural Networks with Equilibrium States Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such weakness limits the scalability of the implicit graph models. To tackle such limitations, we propose two unbiased stochastic proximal solvers, called the USP and USP-VR solvers, inspired by the stochastic proximal gradient descent method and its variance-reduction variant. |
Mingjie Li; Yifei Wang; Yisen Wang; Zhouchen Lin; |
278 | Asynchronous Distributed Bilevel Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a remedy, we propose Asynchronous Distributed Bilevel Optimization (ADBO) algorithm. |
Yang Jiao; Kai Yang; Tiancheng Wu; Dongjin Song; Chengtao Jian; |
279 | Relative Behavioral Attributes: Filling The Gap Between Symbolic Goal Specification and Reward Learning from Human Preferences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose two different parametric methods that can potentially encode any kind of behavioral attributes from ordered behavior clips. |
Lin Guan; Karthik Valmeekam; Subbarao Kambhampati; |
280 | Neural Lagrangian Schrödinger Bridge: Diffusion Modeling for Population Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One of the main difficulties in analyzing population dynamics is that we can only obtain observation data with coarse time intervals from fixed-point observations due to experimental costs or measurement constraints. To satisfy these requirements of the sample trajectories, we formulate the Lagrangian Schrödinger bridge (LSB) problem and propose to solve it approximately by modeling the advection-diffusion process with a regularized neural SDE. |
Takeshi Koshizuka; Issei Sato; |
281 | How to Prepare Your Task Head for Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We analyze how the choice of task head controls feature adaptation and hence influences the downstream performance. |
Yi Ren; Shangmin Guo; Wonho Bae; Danica J. Sutherland; |
282 | Learning Fast and Slow for Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, inspired by the Complementary Learning Systems (CLS) theory, we propose Fast and Slow learning Network (FSNet) as a novel framework to address the challenges of online forecasting. |
Quang Pham; Chenghao Liu; Doyen Sahoo; Steven Hoi; |
283 | Gradient-based Optimization Is Not Necessary for Generalization in Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is commonly believed that the implicit regularization of optimizers is needed for neural networks to generalize in the overparameterized regime. In this paper, we observe experimentally that this implicit regularization behavior is {\em generic}, i.e. it does not depend strongly on the choice of optimizer. |
Ping-yeh Chiang; Renkun Ni; David Yu Miller; Arpit Bansal; Jonas Geiping; Micah Goldblum; Tom Goldstein; |
284 | Rhino: Deep Causal Temporal Relationship Learning with History-dependent Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a structural equation model, called Rhino, which combines vector auto-regression, deep learning and variational inference to model non-linear relationships with instantaneous effects while allowing the noise distribution to be modulated by history observations. |
Wenbo Gong; Joel Jennings; Cheng Zhang; Nick Pawlowski; |
285 | Mitigating Memorization of Noisy Labels Via Regularization Between Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it is non-trivial to design a neural network with the best capacity given an arbitrary task. To circumvent this dilemma, instead of changing the model architecture, we decouple DNNs into an encoder followed by a linear classifier and propose to restrict the function space of a DNN by a representation regularizer. |
Hao Cheng; Zhaowei Zhu; Xing Sun; Yang Liu; |
286 | Backpropagation Through Combinatorial Algorithms: Identity with Projection Works Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a principled approach to exploit the geometry of the discrete solution space to treat the solver as a negative identity on the backward pass and further provide a theoretical justification. |
Subham Sekhar Sahoo; Anselm Paulus; Marin Vlastelica; Vít Musil; Volodymyr Kuleshov; Georg Martius; |
287 | BSTT: A Bayesian Spatial-Temporal Transformer for Sleep Staging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Bayesian spatial-temporal relation inference neural network, named Bayesian spatial-temporal transformer (BSTT), for sleep staging. |
Yuchen Liu; Ziyu Jia; |
288 | Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel noise-robust re-weighting framework SunGen to automatically construct high-quality data for zero-shot classification problems. |
Jiahui Gao; Renjie Pi; LIN Yong; Hang Xu; Jiacheng Ye; Zhiyong Wu; WEIZHONG ZHANG; Xiaodan Liang; Zhenguo Li; Lingpeng Kong; |
289 | H2RBox: Horizontal Box Annotation Is All You Need for Oriented Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Oriented object detection emerges in many applications from aerial images to autonomous driving, while many existing detection benchmarks are annotated with horizontal bounding boxes only, which are also less costly than fine-grained rotated boxes, leading to a gap between the readily available training corpus and the rising demand for oriented object detection. This paper proposes a simple yet effective oriented object detection approach called H2RBox that merely uses horizontal box annotations for weakly-supervised training, which closes the above gap and shows competitive performance even against methods trained with rotated boxes. |
Xue Yang; Gefan Zhang; Wentong Li; Yue Zhou; Xuehui Wang; Junchi Yan; |
290 | IDEAL: Query-Efficient Data-Free Learning from Black-Box Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, these works require a large number of queries to the teacher model, which incurs significant monetary and computational costs. To address these problems, we propose a novel method called \emph{query-effIcient Data-free lEarning blAck-box modeLs} (IDEAL), which aims to query-efficiently learn from black-box model APIs to train a good student without any real data. |
Jie Zhang; Chen Chen; Lingjuan Lyu; |
291 | Scaling Laws in Mean-Field Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we attempt to bridge the two largely independently evolving fields of finite-agent and infinite-agent games, by studying the scaling laws in mean-field games. |
Pengdeng Li; Xinrun Wang; Shuxin Li; Hau Chan; Bo An; |
292 | Towards Addressing Label Skews in One-Shot Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to the limited number of classes in each party, the local models misclassify the data from unseen classes into seen classes, which leads to very ineffective global models from voting. To address the label skew issue in one-shot FL, we propose a novel approach named FedOV which generates diverse outliers and introduces them as an additional unknown class in local training to improve the voting performance. |
Yiqun Diao; Qinbin Li; Bingsheng He; |
293 | Sequential Attention for Feature Selection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks. |
Taisuke Yasuda; Mohammadhossein Bateni; Lin Chen; Matthew Fahrbach; Gang Fu; Vahab Mirrokni; |
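A heavily simplified sketch of greedy sequential feature selection in this spirit: the actual method learns attention-style importance weights jointly with a neural network, whereas below a residual-correlation score stands in for the learned importance so that only the sequential selection loop is visible. All names here are illustrative:

```python
# Illustrative stand-in for attention-guided sequential feature selection:
# repeatedly score each unselected feature against the current residual,
# take the best one, fit it out, and continue.

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for degenerate (constant) inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    denom = (vx * vy) ** 0.5
    return cov / denom if denom > 0 else 0.0


def sequential_select(X, y, k):
    """Greedily select k feature indices from rows X against target y."""
    n_feat = len(X[0])
    resid = list(y)
    selected = []
    for _ in range(k):
        remaining = [j for j in range(n_feat) if j not in selected]
        best = max(
            remaining,
            key=lambda j: abs(pearson([row[j] for row in X], resid)),
        )
        selected.append(best)
        # Remove the chosen feature's univariate fit from the residual,
        # so later scores reflect what is left to explain.
        col = [row[best] for row in X]
        mc = sum(col) / len(col)
        mr = sum(resid) / len(resid)
        var = sum((c - mc) ** 2 for c in col)
        beta = (
            sum((c - mc) * (r - mr) for c, r in zip(col, resid)) / var
            if var > 0 else 0.0
        )
        resid = [r - mr - beta * (c - mc) for c, r in zip(col, resid)]
    return selected


# Feature 1 reproduces y exactly, so it should be selected first.
X = [[0, 1, 5], [1, 2, 5], [2, 3, 5], [3, 5, 5]]
y = [1, 2, 3, 5]
assert sequential_select(X, y, k=2)[0] == 1
```

The residual update is what makes the procedure sequential rather than a one-shot top-k ranking: a feature redundant with an already selected one scores near zero on the residual.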
294 | Deep Transformers Without Shortcuts: Modifying Self-attention for Faithful Signal Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: And so the question remains: \emph{is it possible to train deep vanilla transformers?} We answer this question in the affirmative by designing several approaches that use combinations of parameter initialisations, bias matrices and location-dependent rescaling to achieve faithful signal propagation in vanilla transformers. |
Bobby He; James Martens; Guodong Zhang; Aleksandar Botev; Andrew Brock; Samuel L Smith; Yee Whye Teh; |
295 | Approximation and Non-parametric Estimation of Functions Over High-dimensional Spheres Via Deep ReLU Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a new approximation and estimation analysis of deep feed-forward neural networks (FNNs) with the Rectified Linear Unit (ReLU) activation. |
Namjoon Suh; Tian-Yi Zhou; Xiaoming Huo; |
296 | Specformer: Spectral Graph Neural Networks Meet Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, these filters are often constructed based on some fixed-order polynomials, which have limited expressiveness and flexibility. To tackle these issues, we introduce Specformer, which effectively encodes the set of all eigenvalues and performs self-attention in the spectral domain, leading to a learnable set-to-set spectral filter. |
Deyu Bo; Chuan Shi; Lele Wang; Renjie Liao; |
297 | MLPInit: Embarrassingly Simple GNN Training Acceleration with MLP Initialization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose an embarrassingly simple, yet hugely effective initialization method for GNN training acceleration, called MLPInit. |
Xiaotian Han; Tong Zhao; Yozen Liu; Xia Hu; Neil Shah; |
298 | Empowering Graph Representation Learning with Test-Time Graph Transformation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent efforts have been made on tackling these issues from a modeling perspective which requires additional cost of changing model architectures or re-training model parameters. In this work, we provide a data-centric view to tackle these issues and propose a graph transformation framework named GTrans which adapts and refines graph data at test time to achieve better performance. |
Wei Jin; Tong Zhao; Jiayuan Ding; Yozen Liu; Jiliang Tang; Neil Shah; |
299 | Improving The Calibration of Fine-tuned Language Models Via Denoising Variational Auto-Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle the problem of calibrating fine-tuned language models. |
Guande He; Jianfei Chen; Jun Zhu; |
300 | Softened Symbol Grounding for Neuro-symbolic Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel, softened symbol grounding process, enabling the interactions of the two worlds in a mutually beneficial manner. |
Zenan Li; Yuan Yao; Taolue Chen; Jingwei Xu; Chun Cao; Xiaoxing Ma; Jian Lü; |
301 | Multi-task Self-supervised Graph Neural Networks Enable Stronger Task Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When applied to various downstream tasks, these frameworks rarely perform equally well for every task, because one philosophy may not span the extensive knowledge required for all tasks. In light of this, we introduce ParetoGNN, a multi-task SSL framework for node representation learning over graphs. |
Mingxuan Ju; Tong Zhao; Qianlong Wen; Wenhao Yu; Neil Shah; Yanfang Ye; Chuxu Zhang; |
302 | Learning with Logical Constraints But Without Shortcut Satisfaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a new framework for learning with logical constraints. |
Zenan Li; Zehua Liu; Yuan Yao; Jingwei Xu; Taolue Chen; Xiaoxing Ma; Jian Lü; |
303 | Link Prediction with Non-Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we extensively evaluate the performance of existing non-contrastive methods for link prediction in both transductive and inductive settings. |
William Shiao; Zhichun Guo; Tong Zhao; Evangelos E. Papalexakis; Yozen Liu; Neil Shah; |
304 | A Neural Mean Embedding Approach for Back-door and Front-door Adjustment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the estimation of average and counterfactual treatment effects, under two settings: back-door adjustment and front-door adjustment. |
Liyuan Xu; Arthur Gretton; |
305 | Can We Find Nash Equilibria at A Linear Rate in Markov Games? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study decentralized learning in two-player zero-sum discounted Markov games where the goal is to design a policy optimization algorithm for either agent satisfying two properties. |
Zhuoqing Song; Jason D. Lee; Zhuoran Yang; |
306 | Weighted Ensemble Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framework that permits data-dependent weighted cross-entropy losses. |
Yangjun Ruan; Saurabh Singh; Warren Richard Morningstar; Alexander A Alemi; Sergey Ioffe; Ian Fischer; Joshua V. Dillon; |
307 | $k$NN Prompting: Learning Beyond The Context with Nearest Neighbor Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In-Context Learning, which formulates target tasks as prompt completion conditioned on in-context demonstrations, has become the prevailing and standard utilization of large language models. In this paper, we disclose an actual predicament for this typical usage: it cannot scale up with training data due to context length restrictions. |
Benfeng Xu; Quan Wang; Zhendong Mao; Yajuan Lyu; Qiaoqiao She; Yongdong Zhang; |
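The nearest-neighbor inference idea can be illustrated with a hypothetical sketch: instead of reading a label directly off the model's output distribution, compare that distribution to cached distributions of labeled training examples and vote among the closest ones. The distribution vectors and helper names below are made up for illustration and are not the paper's implementation:

```python
# Illustrative sketch: classify by k-nearest-neighbor vote in the space of
# (hypothetical) language-model output distributions.
import math
from collections import Counter


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with a small epsilon for numerical safety."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def knn_predict(query_dist, anchors, k=3):
    """anchors: list of (distribution, label); vote over the k nearest."""
    ranked = sorted(anchors, key=lambda a: kl_divergence(query_dist, a[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up cached distributions for labeled training examples.
anchors = [
    ([0.8, 0.1, 0.1], "positive"),
    ([0.7, 0.2, 0.1], "positive"),
    ([0.1, 0.8, 0.1], "negative"),
    ([0.2, 0.7, 0.1], "negative"),
]
print(knn_predict([0.75, 0.15, 0.10], anchors, k=3))  # → positive
```

Because the cache of anchors can grow without bound while the prompt stays fixed, this style of inference sidesteps the context-length restriction the highlight describes.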
308 | Logical Entity Representation in Knowledge-Graphs for Differentiable Rule Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This formulation overlooks additional contextual information from neighboring sub-graphs of entity variables x, y and z. Intuitively, there is a large gap here, as local sub-graphs have been found to provide important information for knowledge graph completion. Inspired by these observations, we propose Logical Entity RePresentation (LERP) to encode contextual information of entities in the knowledge graph. |
Chi Han; Qizheng He; Charles Yu; Xinya Du; Hanghang Tong; Heng Ji; |
309 | Divide to Adapt: Mitigating Confirmation Bias for Domain Adaptation of Black-Box Predictors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing DABP approaches mostly rely on knowledge distillation (KD) from the black-box predictor, i.e., training the model with its noisy target-domain predictions, which however inevitably introduces the confirmation bias accumulated from the prediction noises and leads to degrading performance. To mitigate such bias, we propose a new strategy, \textit{divide-to-adapt}, that purifies cross-domain knowledge distillation by proper domain division. |
Jianfei Yang; Xiangyu Peng; Kai Wang; Zheng Zhu; Jiashi Feng; Lihua Xie; Yang You; |
310 | A Simple Yet Powerful Deep Active Learning With Snapshots Ensembles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we highlight the effectiveness of snapshot ensembles for deep active learning. |
Seohyeon Jung; Sanghyun Kim; Juho Lee; |
311 | Domain-Indexing Variational Bayes for Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such domain indices are not always available. To address this challenge, we first provide a formal definition of domain index from the probabilistic perspective, and then propose an adversarial variational Bayesian framework that infers domain indices from multi-domain data, thereby providing additional insight on domain relations and improving domain adaptation performance. |
Zihao Xu; Hao He; Guang-Yuan Hao; Hao Wang; |
312 | Over-parameterized Model Optimization with Polyak-{\L}ojasiewicz Condition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes a new regularized risk minimization for over-parameterized models with a novel PL regularization and implements it via network pruning guided by PL-based condition number. |
Yixuan Chen; Yubin Shi; Mingzhi Dong; Xiaochen Yang; Dongsheng Li; Yujiang Wang; Robert Dick; Qin Lv; Yingying Zhao; Fan Yang; Ning Gu; Li Shang; |
313 | RGI: Robust GAN-inversion for Mask-free Image Inpainting and Unsupervised Pixel-wise Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Robust GAN-inversion (RGI) method with a provable robustness guarantee to achieve image restoration under unknown \textit{gross} corruptions, where a small fraction of pixels are completely corrupted. |
Shancong Mou; Xiaoyi Gu; Meng Cao; Haoping Bai; Ping Huang; Jiulong Shan; Jianjun Shi; |
314 | Encoding Recurrence Into Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: This paper breaks down an RNN layer, with negligible loss, into a sequence of simple RNNs, each of which can be further rewritten into a lightweight positional encoding … |
Feiqing Huang; Kexin Lu; Yuxi CAI; Zhen Qin; Yanwen Fang; Guangjian Tian; Guodong Li; |
315 | GAIN: On The Generalization of Instructional Action Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a benchmark, named GAIN, to analyze the GeneralizAbility of INstructional action understanding models. |
Junlong Li; Guangyi Chen; Yansong Tang; Jinan Bao; Kun Zhang; Jie Zhou; Jiwen Lu; |
316 | Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a method to measure uncertainty in large language models. |
Lorenz Kuhn; Yarin Gal; Sebastian Farquhar; |
317 | Seeing Differently, Acting Similarly: Heterogeneously Observable Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in reality, the observation coexistence will be limited due to the high cost of acquiring expert observations. In this work, we study this challenging problem with limited observation coexistence under heterogeneous observations: Heterogeneously Observable Imitation Learning (HOIL). |
Xin-Qiang Cai; Yao-Xiang Ding; Zixuan Chen; Yuan Jiang; Masashi Sugiyama; Zhi-Hua Zhou; |
318 | Pre-training Via Denoising for Molecular Property Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we describe a pre-training technique based on denoising that achieves a new state-of-the-art in molecular property prediction by utilizing large datasets of 3D molecular structures at equilibrium to learn meaningful representations for downstream tasks. |
Sheheryar Zaidi; Michael Schaarschmidt; James Martens; Hyunjik Kim; Yee Whye Teh; Alvaro Sanchez-Gonzalez; Peter Battaglia; Razvan Pascanu; Jonathan Godwin; |
319 | A New Characterization of The Edge of Stability Based on A Sharpness Measure Aware of Batch Gradient Distribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new sharpness measure (interaction-aware-sharpness) aware of the \emph{interaction} between the batch gradient distribution and the loss landscape geometry. |
Sungyoon Lee; Cheongjae Jang; |
320 | Equivariant Energy-Guided SDE for Inverse Molecular Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose equivariant energy-guided stochastic differential equations (EEGSDE), a flexible framework for controllable 3D molecule generation under the guidance of an energy function in diffusion models. |
Fan Bao; Min Zhao; Zhongkai Hao; Peiyao Li; Chongxuan Li; Jun Zhu; |
321 | Mutual Partial Label Learning with Competitive Label Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider a more realistic PLL scenario with competitive noise labels that are more difficult to distinguish from the true label than the random noise labels. |
Yan Yan; Yuhong Guo; |
322 | ImaginaryNet: Learning Object Detectors Without Real Images and Annotations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we define a novel paradigm as Imaginary-Supervised Object Detection (ISOD), where no real images and manual annotations are used for training object detectors. To resolve this challenge, we propose ImaginaryNet, a framework to learn object detectors by combining pretrained language model as well as text-to-image synthesis models. |
Minheng Ni; Zitong Huang; Kailai Feng; Wangmeng Zuo; |
323 | Delving Into Semantic Scale Imbalance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we define and quantify the semantic scale of classes, which is equivalent to the feature diversity of classes. |
Yanbiao Ma; Licheng Jiao; Fang Liu; Yuxin Li; Shuyuan Yang; Xu Liu; |
324 | Agnostic Learning of General ReLU Activation Using Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function under Gaussian distributions. |
Pranjal Awasthi; Alex Tang; Aravindan Vijayaraghavan; |
325 | Neural-based Classification Rule Learning for Sequential Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel differentiable fully interpretable method to discover both local and global patterns (i.e. catching a relative or absolute temporal dependency) for rule-based binary classification. |
Marine Collery; Philippe Bonnard; François Fages; Remy Kusters; |
326 | Max-Margin Works While Large Margin Fails: Generalization Without Uniform Convergence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nagarajan and Kolter (2019) show that in certain simple linear and neural-network settings, any uniform convergence bound will be vacuous, leaving open the question of how to prove generalization in settings where UC fails. Our main contribution is proving novel generalization bounds in two such settings, one linear, and one non-linear. |
Margalit Glasgow; Colin Wei; Mary Wootters; Tengyu Ma; |
327 | Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Complex reasoning problems contain states that vary in the computational cost required to determine a good action plan. Taking advantage of this property, we propose Adaptive Subgoal Search (AdaSubS), a search method that adaptively adjusts the planning horizon. |
Michał Zawalski; Michał Tyrolski; Konrad Czechowski; Damian Stachura; Piotr Piękos; Tomasz Odrzygóźdź; Yuhuai Wu; Łukasz Kuciński; Piotr Miłoś; |
328 | Exploring The Limits of Differentially Private Deep Learning with Group-wise Clipping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explore whether further improvements along the two axes are possible and provide affirmative answers leveraging two instantiations of \emph{group-wise clipping}. |
Jiyan He; Xuechen Li; Da Yu; Huishuai Zhang; Janardhan Kulkarni; Yin Tat Lee; Arturs Backurs; Nenghai Yu; Jiang Bian; |
329 | Towards Minimax Optimal Reward-free Reinforcement Learning in Linear MDPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel algorithm LSVI-RFE under the linear MDP setting, where the transition probability and reward functions are linear in a feature mapping. |
Pihe Hu; Yu Chen; Longbo Huang; |
330 | Localized Randomized Smoothing for Collective Robustness Certification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a more general collective robustness certificate for all types of models and further show that this approach is beneficial for the larger class of softly local models, where each output is dependent on the entire input but assigns different levels of importance to different input regions (e.g. based on their proximity in the image). |
Jan Schuchardt; Tom Wollschläger; Aleksandar Bojchevski; Stephan Günnemann; |
331 | Towards Open Temporal Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a general and principled learning approach for open temporal graphs, called OTGNet, with the goal of addressing the above two challenges. |
Kaituo Feng; Changsheng Li; Xiaolu Zhang; JUN ZHOU; |
332 | Efficiently Controlling Multiple Risks with Pareto Testing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on recent results in distribution-free, finite-sample risk control for general losses, we propose Pareto Testing: a two-stage process which combines multi-objective optimization with multiple hypothesis testing. |
Bracha Laufer-Goldshtein; Adam Fisch; Regina Barzilay; Tommi S. Jaakkola; |
333 | Bridge The Inference Gaps of Neural Processes Via Expectation Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The topic of inference suboptimality and an analysis of the NP from the optimization objective perspective has hardly been studied in earlier work. To fix this issue, we propose a surrogate objective of the target log-likelihood of the meta dataset within the expectation maximization framework. |
Qi Wang; Marco Federici; Herke van Hoof; |
334 | Discovering Policies with DOMiNO Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we propose a Reinforcement Learning (RL) agent that can discover complex behaviours in a rich environment with a simple reward function. |
Tom Zahavy; Yannick Schroecker; Feryal Behbahani; Kate Baumli; Sebastian Flennerhag; Shaobo Hou; Satinder Singh; |
335 | Neural Architecture Design and Robustness: A Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, the aim of this paper is to facilitate better streamlined research on architectural design choices with respect to their impact on robustness as well as, for example, the evaluation of surrogate measures for robustness. We evaluate all these networks on a range of common adversarial attacks and corruption types and introduce a database on neural architecture design and robustness evaluations. |
Steffen Jung; Jovita Lukasik; Margret Keuper; |
336 | A Unified Framework of Soft Threshold Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we reformulate soft threshold pruning as an implicit optimization problem solved using the *Iterative Shrinkage-Thresholding* Algorithm (ISTA), a classic method from the fields of sparse recovery and compressed sensing. |
Yanqi Chen; Zhaofei Yu; Wei Fang; Zhengyu Ma; Xiawu Zheng; Yonghong Tian; |
337 | Improving Out-of-distribution Generalization with Indirection Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a generic module named Indirection Layer (InLay), which leverages indirection and data internal relationships to effectively construct symbolic indirect representations to improve out-of-distribution generalization capabilities of various neural architectures. |
Kha Pham; Hung Le; Man Ngo; Truyen Tran; |
338 | Accelerating Guided Diffusion Sampling with Splitting Numerical Methods Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the contrary, we discover that the same techniques do not work for guided sampling, and little has been explored about its acceleration. This paper explores the culprit of this problem and provides a solution based on operator splitting methods, motivated by our key finding that high-order numerical methods are unsuitable for the conditional function. |
Suttisak Wizadwongsa; Supasorn Suwajanakorn; |
339 | Batch Multivalid Conformal Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop fast distribution-free conformal prediction algorithms for obtaining multivalid coverage on exchangeable data in the batch setting. |
Christopher Jung; Georgy Noarov; Ramya Ramalingam; Aaron Roth; |
340 | Long-Tailed Learning Requires Feature Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple data model inspired from natural data such as text or images, and use it to study the importance of learning features in order to achieve good generalization. |
Thomas Laurent; James von Brecht; Xavier Bresson; |
341 | The Onset of Variance-Limited Behavior for Networks in The Lazy and Rich Regimes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, at a critical sample size $P^*$, the finite-width network generalization begins to worsen compared to the infinite width performance. In this work, we empirically study the transition from the infinite width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$. |
Alexander Atanasov; Blake Bordelon; Sabarish Sainathan; Cengiz Pehlevan; |
342 | On Accelerated Perceptrons and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: There have been several recent works that managed to improve this rate by a quadratic factor, to $\Omega(\sqrt{\log n}/\gamma)$, with more sophisticated algorithms. In this paper, we unify these existing results under one framework by showing that they can all be described through the lens of solving min-max problems using modern acceleration techniques, mainly through optimistic online learning. |
Guanghui Wang; Rafael Hanashiro; Etash Kumar Guha; Jacob Abernethy; |
343 | Sequential Latent Variable Models for Few-Shot High-Dimensional Time-Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the very first step toward few-shot high-dimensional sequence forecasting by a Bayesian meta-learning model that learns the process of learning latent dynamics that changes with the small number of observations that are available. |
Xiajun Jiang; Ryan Missel; Zhiyuan Li; Linwei Wang; |
344 | Continual Unsupervised Disentangling of Self-Organizing Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that this is because existing approaches treat continually-arrived data independently, without considering how they are related based on the underlying semantic factors. We address this by a new generative model describing a topologically-connected mixture of spike-and-slab distributions in the latent space, learned end-to-end in a continual fashion via principled variational inference. |
Zhiyuan Li; Xiajun Jiang; Ryan Missel; Prashnna Kumar Gyawali; Nilesh Kumar; Linwei Wang; |
345 | Learning to Decompose Visual Features with Latent Textual Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address them, we propose Decomposed Feature Prompting (DeFo). |
Feng Wang; Manling Li; Xudong Lin; Hairong Lv; Alex Schwing; Heng Ji; |
346 | Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data. |
Spencer Frei; Gal Vardi; Peter Bartlett; Nathan Srebro; Wei Hu; |
347 | Is Attention All That NeRF Needs? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Generalizable NeRF Transformer (GNT), a transformer-based architecture that reconstructs Neural Radiance Fields (NeRFs) and learns to render novel views on the fly from source views. |
Mukund Varma T; Peihao Wang; Xuxi Chen; Tianlong Chen; Subhashini Venugopalan; Zhangyang Wang; |
348 | Squeeze Training for Adversarial Robustness Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we highlight that some collaborative examples, nearly perceptually indistinguishable from both adversarial and benign examples yet showing extremely low prediction loss, can be utilized to enhance adversarial training. |
Qizhang Li; Yiwen Guo; Wangmeng Zuo; Hao Chen; |
349 | Domain Generalization Via Heckman-type Selection Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formulate DG as a sample selection problem where each domain is sampled from a common underlying population through non-random sampling probabilities that correlate with both the features and the outcome. |
Hyungu Kahng; Hyungrok Do; Judy Zhong; |
350 | Context-enriched Molecule Representations Improve Few-shot Drug Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new method for few-shot drug discovery. |
Johannes Schimunek; Philipp Seidl; Lukas Friedrich; Daniel Kuhn; Friedrich Rippmann; Sepp Hochreiter; Günter Klambauer; |
351 | Do We Really Need Complicated Model Architectures For Temporal Networks? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a conceptually and technically simple architecture that consists of three components: (1) a link-encoder that is only based on multi-layer perceptrons (MLPs) to summarize the information from temporal links, (2) a node-encoder that is only based on neighbor mean-pooling to summarize node information, and (3) an MLP-based link classifier that performs link prediction based on the outputs of the encoders. |
Weilin Cong; Si Zhang; Jian Kang; Baichuan Yuan; Hao Wu; Xin Zhou; Hanghang Tong; Mehrdad Mahdavi; |
352 | Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we examine the importance of having unbiased quantization in quantized neural network training, where to maintain it, and how to combine it with logarithmic quantization. |
Brian Chmiel; Ron Banner; Elad Hoffer; Hilla Ben-Yaacov; Daniel Soudry; |
353 | Learning with Auxiliary Activation for Memory-Efficient Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new learning rule which significantly reduces memory requirements while closely matching the performance of backpropagation. |
Sunghyeon Woo; Dongsuk Jeon; |
354 | TRANSFORMER-PATCHER: ONE MISTAKE WORTH ONE NEURON Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus a preferable solution is to rectify mistakes as soon as they appear. Therefore, we extend the existing ME into Sequential Model Editing (SME) to help develop more practical editing methods. |
Zeyu Huang; Yikang Shen; Xiaofeng Zhang; Jie Zhou; Wenge Rong; Zhang Xiong; |
355 | An Additive Instance-Wise Approach to Multi-class Model Interpretation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work exploits the strengths of both methods and proposes a framework for learning local explanations simultaneously for multiple target classes. |
Vy Vo; Van Nguyen; Trung Le; Quan Hung Tran; Reza Haf; Seyit Camtepe; Dinh Phung; |
356 | Guiding Continuous Operator Learning Through Physics-based Boundary Constraints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Boundary enforcing Operator Network (BOON) that enables the BC satisfaction of neural operators by making structural changes to the operator kernel. |
Nadim Saad; Gaurav Gupta; Shima Alizadeh; Danielle C. Maddix; |
357 | Making Substitute Models More Bayesian Can Enhance Transferability of Adversarial Examples Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, by contrast, we opt for the diversity in substitute models and advocate to attack a Bayesian model for achieving desirable transferability. |
Qizhang Li; Yiwen Guo; Wangmeng Zuo; Hao Chen; |
358 | Sublinear Algorithms for Kernel Matrices Via Kernel Density Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give a framework for using recently developed tools for kernel density estimation to solve downstream kernel problems in sub-quadratic time. |
Ainesh Bakshi; Piotr Indyk; Praneeth Kacham; Sandeep Silwal; Samson Zhou; |
359 | Choreographer: Learning and Adapting Skills in Imagination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, it is unclear how to leverage the learned skill behaviors for adapting to downstream tasks in a data-efficient manner. We present Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination. |
Pietro Mazzaglia; Tim Verbelen; Bart Dhoedt; Alexandre Lacoste; Sai Rajeswar; |
360 | DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The bulk of memory is occupied by caching intermediate tensors for gradient computation in the backward pass. We propose a novel method to reduce this footprint – Dropping Intermediate Tensors (DropIT). |
Joya Chen; Kai Xu; Yuhui Wang; Yifei Cheng; Angela Yao; |
361 | Exploring Temporally Dynamic Data Augmentation for Video Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These variations should be generated as diversely as possible using fewer additional hyper-parameters during training. With this motivation, we propose a simple yet effective video data augmentation framework, DynaAugment. |
Taeoh Kim; Jinhyung Kim; Minho Shim; Sangdoo Yun; Myunggu Kang; Dongyoon Wee; Sangyoun Lee; |
362 | Computational Language Acquisition with Theory of Mind Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Drawing inspiration from the modern operationalized versions of ToM implemented in Rabinowitz et al. (2018) and Zhu et al. (2021), we build language-learning agents equipped with ToM, and measure its effects on the learning process. |
Andy Liu; Hao Zhu; Emmy Liu; Yonatan Bisk; Graham Neubig; |
363 | Mind’s Eye: Grounded Language Model Reasoning Through Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Mind’s Eye, a paradigm to ground language model reasoning in the physical world. |
Ruibo Liu; Jason Wei; Shixiang Shane Gu; Te-Yen Wu; Soroush Vosoughi; Claire Cui; Denny Zhou; Andrew M. Dai; |
364 | Language Models Are Realistic Tabular Data Generators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose GReaT (Generation of Realistic Tabular data), which exploits an auto-regressive generative LLM to sample synthetic and yet highly realistic tabular data. |
Vadim Borisov; Kathrin Sessler; Tobias Leemann; Martin Pawelczyk; Gjergji Kasneci; |
365 | Aligning Model and Macaque Inferior Temporal Cortex Representations Improves Model-to-Human Behavioral Alignment and Adversarial Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conducted chronic, large-scale multi-electrode recordings across the IT cortex in six non-human primates (rhesus macaques). |
Joel Dapello; Kohitij Kar; Martin Schrimpf; Robert Baldwin Geary; Michael Ferguson; David Daniel Cox; James J. DiCarlo; |
366 | SimpleKT: A Simple But Tough-to-Beat Baseline for Knowledge Tracing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, due to the lack of a standardized evaluation protocol (Liu et al., 2022), there are no widely agreed-upon KT baselines, and published experimental comparisons have become inconsistent and self-contradictory, e.g., the reported AUC scores of DKT on ASSISTments2009 range from 0.721 to 0.821 (Minn et al., 2018; Yeung et al., 2018). Therefore, in this paper, we provide a strong but simple baseline method for the KT task, named simpleKT. |
Zitao Liu; Qiongqiong Liu; Jiahao Chen; Shuyan Huang; Weiqi Luo; |
367 | Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Building on our investigation, we propose to generate a cost volume from a long history of image observations, compensating for the coarse but efficient matching resolution with a more optimal multi-view matching setup. |
Jinhyung Park; Chenfeng Xu; Shijia Yang; Kurt Keutzer; Kris M. Kitani; Masayoshi Tomizuka; Wei Zhan; |
368 | Massively Scaling Heteroscedastic Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose HET-XL, a heteroscedastic classifier whose parameter count when compared to a standard classifier scales independently of the number of classes. |
Mark Collier; Rodolphe Jenatton; Basil Mustafa; Neil Houlsby; Jesse Berent; Effrosyni Kokiopoulou; |
369 | Interpretable Single/Multi-label Text Classification with Unsupervised Constituent-label Alignments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, symbolic probabilistic models function with clear interpretability, but how to combine them with neural networks to enhance their performance remains to be explored. In this paper, we try to marry these two systems for text classification via structured language models. |
Xiang Hu; XinYu KONG; Kewei Tu; |
370 | Transformer Meets Boundary Value Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A Transformer-based deep direct sampling method is proposed for solving a class of boundary value inverse problems. |
Ruchi Guo; Shuhao Cao; Long Chen; |
371 | DAG Learning Via Sparse Relaxations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data. |
Valentina Zantedeschi; Luca Franceschi; Jean Kaddour; Matt Kusner; Vlad Niculae; |
372 | Soft Neighbors Are Positive Supporters in Contrastive Visual Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we rethink the instance discrimination framework and find the binary instance labeling insufficient to measure correlations between different samples. |
Chongjian GE; Jiangliu Wang; Zhan Tong; Shoufa Chen; Yibing Song; Ping Luo; |
373 | Finding The Global Semantic Representation in GAN Through Fréchet Mean Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In other words, in this disentangled space, there exists the global semantic basis as a vector space where each basis component describes one attribute of generated images. In this paper, we propose an unsupervised method for finding this global semantic basis in the intermediate latent space in GANs. |
Jaewoong Choi; Geonho Hwang; Hyunsoo Cho; Myungjoo Kang; |
374 | MARS: Meta-learning As Score Matching in The Function Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a result, existing approaches resort to meta-learning restrictive diagonal Gaussian priors, severely limiting their expressiveness and performance. To circumvent these issues, we approach meta-learning through the lens of functional Bayesian neural network inference which views the prior as a stochastic process and performs inference in the function space. |
Krunoslav Lehman Pavasovic; Jonas Rothfuss; Andreas Krause; |
375 | On The Effectiveness of Out-of-distribution Data in Self-supervised Long-tail Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an alternative but easy-to-use and effective solution, Contrastive with Out-of-distribution (OOD) data for Long-Tail learning (COLT), which can effectively exploit OOD data to dynamically re-balance the feature space. |
Jianhong Bai; Zuozhu Liu; Hualiang Wang; Jin Hao; YANG FENG; Huanpeng Chu; Haoji Hu; |
376 | Faster Gradient-Free Methods for Escaping Saddle Points Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the case when calculations of explicit gradients are expensive or even infeasible, and only function values are accessible. |
Hualin Zhang; Bin Gu; |
377 | A View From Somewhere: Human-Centric Face Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to implicitly learn a set of continuous face-varying dimensions, without ever asking an annotator to explicitly categorize a person. |
Jerone Theodore Alexander Andrews; Przemyslaw Joniak; Alice Xiang; |
378 | Dynamical Systems Embedding with A Physics-informed Convolutional Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose phase2vec, an embedding method that learns high-quality, physically-meaningful representations of dynamical systems without supervision. |
Matt Ricci; Noa Moriel; Zoe Piran; Mor Nitzan; |
379 | Mind The Pool: Convolutional Neural Networks Can Overfit Input Size Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This issue is inherent to pooling arithmetic, with standard downsampling layers playing a major role in favoring certain input sizes and skewing the weights accordingly. We present a solution to this problem by depriving these layers from the arithmetic cues they use to overfit the input size. |
Bilal Alsallakh; David Yan; Narine Kokhlikyan; Vivek Miglani; Orion Reblitz-Richardson; Pamela Bhattacharya; |
380 | Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we strive to improve the compositional skills of existing large-scale T2I models, specifically more accurate attribute binding and better image compositions. |
Weixi Feng; Xuehai He; Tsu-Jui Fu; Varun Jampani; Arjun Reddy Akula; Pradyumna Narayana; Sugato Basu; Xin Eric Wang; William Yang Wang; |
381 | A Theory of Dynamic Benchmarks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast to an extensive theoretical and empirical study of the static setting, the dynamic setting lags behind due to limited empirical studies and no apparent theoretical foundation to date. Responding to this deficit, we initiate a theoretical study of dynamic benchmarking. |
Ali Shirali; Rediet Abebe; Moritz Hardt; |
382 | LAVA: Data Valuation Without Pre-Specified Learning Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work leapfrogs over the current limits of data valuation methods by introducing a new framework that can value training data in a way that is oblivious to the downstream learning algorithm. |
Hoang Anh Just; Feiyang Kang; Tianhao Wang; Yi Zeng; Myeongseob Ko; Ming Jin; Ruoxi Jia; |
383 | Multi-level Protein Structure Pre-training Via Prompt Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering protein sequences can determine multi-level structures, in this paper, we aim to realize the comprehensive potential of protein sequences for function prediction. |
Zeyuan Wang; Qiang Zhang; Shuang-Wei HU; Haoran Yu; Xurui Jin; Zhichen Gong; Huajun Chen; |
384 | Towards Robustness Certification Against Universal Perturbations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the problem of certifying neural network robustness against universal perturbations (UPs), which have been widely used in universal adversarial attacks and backdoor attacks. |
Yi Zeng; Zhouxing Shi; Ming Jin; Feiyang Kang; Lingjuan Lyu; Cho-Jui Hsieh; Ruoxi Jia; |
385 | AutoGT: Automated Graph Transformer Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the problem of automated graph Transformer, for the first time. |
Zizhao Zhang; Xin Wang; Chaoyu Guan; Ziwei Zhang; Haoyang Li; Wenwu Zhu; |
386 | Blurring Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we show that blurring can equivalently be defined through a Gaussian diffusion process with non-isotropic noise. |
Emiel Hoogeboom; Tim Salimans; |
387 | Adversarial Imitation Learning with Preferences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose a novel method for policy learning that incorporates two different feedback types, namely demonstrations and preferences. |
Aleksandar Taranovic; Andras Gabor Kupcsik; Niklas Freymuth; Gerhard Neumann; |
388 | Almost Linear Constant-Factor Sketching for $\ell_1$ and Logistic Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give the first constant-factor approximate sketches for $\ell_1$ and logistic regression in a turnstile stream with almost linear sketching dimension that result in an efficient optimization problem in the sketch space. |
Alexander Munteanu; Simon Omlor; David Woodruff; |
389 | On The Soft-Subnetwork for Few-Shot Class Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the Regularized Lottery Ticket Hypothesis, which states that competitive smooth (non-binary) subnetworks exist within a dense network, we propose a few-shot class-incremental learning method referred to as Soft-SubNetworks (SoftNet). |
Haeyong Kang; Jaehong Yoon; Sultan Rizky Hikmawan Madjid; Sung Ju Hwang; Chang D. Yoo; |
390 | Efficient Offline Policy Optimization with A Learned Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a regularized one-step model-based method that outperforms MuZero Unplugged on Atari benchmark. |
Zichen Liu; Siyi Li; Wee Sun Lee; Shuicheng YAN; Zhongwen Xu; |
391 | Bispectral Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a neural network architecture, Bispectral Neural Networks (BNNs) for learning representations that are invariant to the actions of compact commutative groups on the space over which a signal is defined. |
Sophia Sanborn; Christian A Shewmake; Bruno Olshausen; Christopher J. Hillar; |
392 | Learning Group Importance Using The Differentiable Hypergeometric Distribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the differentiable hypergeometric distribution. |
Thomas M. Sutter; Laura Manduchi; Alain Ryser; Julia E Vogt; |
393 | TTN: A Domain-Shift Aware Batch Normalization in Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify that CBN and TBN are in a trade-off relationship and present a new test-time normalization (TTN) method that interpolates the statistics by adjusting the importance between CBN and TBN according to the domain-shift sensitivity of each BN layer. |
Hyesu Lim; Byeonggeun Kim; Jaegul Choo; Sungha Choi; |
394 | ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we present ManiSkill2, the next generation of the SAPIEN ManiSkill benchmark, to address critical pain points often encountered by researchers when using benchmarks for generalizable manipulation skills. |
Jiayuan Gu; Fanbo Xiang; Zhan Ling; Xinyue Wei; Xiqiang Liu; Xuanlin Li; Rui Chen; Stone Tao; Tongzhou Mu; Pengwei Xie; Yunchao Yao; Yihe Tang; Xiaodi Yuan; Zhiao Huang; Hao Su; |
395 | MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the aforementioned issues, we propose MaskMix and Progressive Attention Labeling (PAL) in image and label space, respectively. |
Qihao Zhao; Yangyu Huang; Wei Hu; Fan Zhang; Jun Liu; |
396 | Flow Matching for Generative Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. |
Yaron Lipman; Ricky T. Q. Chen; Heli Ben-Hamu; Maximilian Nickel; Matthew Le; |
397 | Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose compositional prompt tuning with motion cues: an extended prompt tuning paradigm for compositional predictions of video data. |
Kaifeng Gao; Long Chen; Hanwang Zhang; Jun Xiao; Qianru Sun; |
398 | Out-of-Distribution Detection and Selective Generation for Conditional Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, the space of potential low-quality outputs is larger as arbitrary text can be generated and it is important to know when to trust the generated output. We present a highly accurate and lightweight OOD detection method for CLMs, and demonstrate its effectiveness on abstractive summarization and translation. |
Jie Ren; Jiaming Luo; Yao Zhao; Kundan Krishna; Mohammad Saleh; Balaji Lakshminarayanan; Peter J Liu; |
399 | Budgeted Training for Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the problem by proposing a framework that enables the training process under any training budget, while achieving competitive model performances. |
zhuofan xia; Xuran Pan; Xuan Jin; Yuan He; Hui Xue’; Shiji Song; Gao Huang; |
400 | ODAM: Gradient-based Instance-Specific Visual Explanations for Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the Gradient-weighted Object Detector Activation Mapping (Grad-ODAM), a visualized explanation technique for interpreting the predictions of object detectors. |
Chenyang ZHAO; Antoni B. Chan; |
401 | Regression with Label Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove that the optimal mechanism takes the form of a randomized response on bins, and propose an efficient algorithm for finding the optimal bin values. |
Badih Ghazi; Pritish Kamath; Ravi Kumar; Ethan Leeman; Pasin Manurangsi; Avinash Varadarajan; Chiyuan Zhang; |
402 | Boosting Adversarial Transferability Using Dynamic Cues Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we induce dynamic cues within the image models without sacrificing their original performance on images. |
Muzammal Naseer; Ahmad Mahmood; Salman Khan; Fahad Khan; |
403 | Towards Inferential Reproducibility of Machine Learning Research Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to shift from the goal of duplicating a SOTA training result without any changes to a new type of reproducibility called inferential reproducibility that treats performance variation depending on data characteristics, meta-parameter settings, and their interactions as an inherent and interesting feature of non-deterministic deep learning, not as a bug that needs to be resolved. |
Michael Hagmann; Philipp Meier; Stefan Riezler; |
404 | Mole-BERT: Rethinking Pre-training Graph Neural Networks for Molecules Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explain the negative transfer in molecular graph pre-training and develop two novel pre-training strategies to alleviate this issue. |
Jun Xia; Chengshuai Zhao; Bozhen Hu; Zhangyang Gao; Cheng Tan; Yue Liu; Siyuan Li; Stan Z. Li; |
405 | The KFIoU Loss for Rotated Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an effective approximate SkewIoU loss based on Gaussian modeling and Kalman filter, which mainly consists of two items. |
Xue Yang; Yue Zhou; Gefan Zhang; Jirui Yang; Wentao Wang; Junchi Yan; XIAOPENG ZHANG; Qi Tian; |
406 | PowerQuant: Automorphism Search for Non-Uniform Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify the uniformity of the quantization operator as a limitation of existing approaches, and propose a data-free non-uniform method. |
Edouard YVINEC; Arnaud Dapogny; Matthieu Cord; Kevin Bailly; |
407 | Rethinking Skip Connection Model As A Learnable Markov Chain Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we take a deep dive into the behavior of models with skip connections, which can be formulated as a learnable Markov chain. |
Chen Dengsheng; Jie Hu; Wenwen Qiang; Xiaoming Wei; Enhua Wu; |
408 | Effects of Graph Convolutions in Multi-layer Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a rigorous theoretical understanding of the effects of graph convolutions in multi-layer networks. |
Aseem Baranwal; Kimon Fountoulakis; Aukosh Jagannath; |
409 | Addressing Parameter Choice Issues in Unsupervised Domain Adaptation By Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While several heuristics exist that follow this strategy, methods are still missing that rely on thorough theories for bounding the target error. In turn, we propose a method that extends weighted least squares to vector-valued functions, e.g., deep neural networks. |
Marius-Constantin Dinu; Markus Holzleitner; Maximilian Beck; Hoan Duc Nguyen; Andrea Huber; Hamid Eghbal-zadeh; Bernhard A. Moser; Sergei Pereverzyev; Sepp Hochreiter; Werner Zellinger; |
410 | Bayesian Semi-supervised Learning with A Principled Likelihood from A Generative Model of Data Curation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We are thus able to introduce Bayesian SSL, which gives considerable improvements over standard SSL in the setting of 40 labelled points on CIFAR-10, with performance of $92.2\pm 0.3\%$ vs $88.6\%$ in the original FixMatch paper. |
Stoil Krasimirov Ganev; Laurence Aitchison; |
411 | Spherical Sliced-Wasserstein Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a SW discrepancy on the sphere using only tools intrinsic to the manifold. |
Clément Bonet; Paul Berg; Nicolas Courty; François Septier; Lucas Drumetz; Minh Tan Pham; |
412 | Scenario-based Question Answering with Interacting Contextual Properties Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although understanding the relationship between conditions is crucial for solving this challenging QA task, limited work has been done so far in modeling this. In this paper, we propose the T-Reasoner model, which solves this problem with three jointly learned modules: an entailment module which checks whether a condition has been satisfied by the scenario, a decoding module which locates eligible answers from documents, and a reasoning module which infers the relationship between conditions and performs a reasoning step to determine the logically consistent answers and identify missing conditions. |
Haitian Sun; William W. Cohen; Ruslan Salakhutdinov; |
413 | CktGNN: Circuit Graph Neural Network for Electronic Design Automation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By recognizing the graph nature of circuits, this paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates the circuit topology generation and device sizing based on the encoder-dependent optimization subroutines. |
Zehao Dong; Weidong Cao; Muhan Zhang; Dacheng Tao; Yixin Chen; Xuan Zhang; |
414 | Prompt-to-Prompt Image Editing with Cross-Attention Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we pursue an intuitive prompt-to-prompt editing framework, where the edits are controlled by text only. |
Amir Hertz; Ron Mokady; Jay Tenenbaum; Kfir Aberman; Yael Pritch; Daniel Cohen-or; |
415 | Efficient Out-of-Distribution Detection Based on In-Distribution Data Patterns Memorization with Modern Hopfield Energy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike existing OOD methods refining the confidence estimation procedure from output logits with handpicked hyperparameters, we propose a new store-then-compare paradigm. |
Jinsong Zhang; Qiang Fu; Xu Chen; Lun Du; Zelin Li; Gang Wang; xiaoguang Liu; Shi Han; Dongmei Zhang; |
416 | CodeT: Code Generation with Generated Tests Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel method, CodeT, that leverages the same pre-trained language models to automatically generate test cases for the code samples, thus reducing the human effort and increasing the coverage of the test scenarios. |
Bei Chen; Fengji Zhang; Anh Nguyen; Daoguang Zan; Zeqi Lin; Jian-Guang Lou; Weizhu Chen; |
417 | Does Deep Learning Learn to Abstract? A Systematic Probing Framework Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a systematic probing framework to explore the abstraction capability of deep learning models from a transferability perspective. |
Shengnan An; Zeqi Lin; Bei Chen; Qiang Fu; Nanning Zheng; Jian-Guang Lou; |
418 | Exact Group Fairness Regularization Via Classwise Robust Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To that end, we propose a principled method that indeed can incorporate an $\textit{exact}$ form of a well-justified group fairness metric, Difference of Conditional Accuracy (DCA), as a regularizer using a $\textit{classwise}$ distributionally robust optimization (DRO) framework. |
Sangwon Jung; Taeeon Park; Sanghyuk Chun; Taesup Moon; |
419 | Weighted Clock Logic Point Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel framework for modeling temporal point processes called clock logic neural networks (CLNN) which learn weighted clock logic (wCL) formulas as interpretable temporal rules by which some events promote or inhibit other events. |
Ruixuan Yan; Yunshi Wen; Debarun Bhattacharjya; Ronny Luss; Tengfei Ma; Achille Fokoue; Anak Agung Julius; |
420 | DiffEdit: Diffusion-based Semantic Image Editing with Mask Guidance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this article, we propose DiffEdit, a method to take advantage of text-conditioned diffusion models for the task of semantic image editing, where the goal is to edit an image based on a text query. |
Guillaume Couairon; Jakob Verbeek; Holger Schwenk; Matthieu Cord; |
421 | Human Alignment of Neural Network Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the factors that affect alignment between the representations learned by neural networks and human concept representations. |
Lukas Muttenthaler; Jonas Dippel; Lorenz Linhardt; Robert A. Vandermeulen; Simon Kornblith; |
422 | SMART: Self-supervised Multi-task PretrAining with ContRol Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The challenge becomes combinatorially more complex if we want to pretrain representations amenable to a large variety of tasks. To tackle this problem, in this work, we formulate a general pretraining-finetuning pipeline for sequential decision making, under which we propose a generic pretraining framework \textit{Self-supervised Multi-task pretrAining with contRol Transformer (SMART)}. |
Yanchao Sun; Shuang Ma; Ratnesh Madaan; Rogerio Bonatti; Furong Huang; Ashish Kapoor; |
423 | Are More Layers Beneficial to Graph Transformers? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel graph transformer model named DeepGraph that explicitly employs substructure tokens in the encoded representation, and applies local attention on related nodes to obtain substructure based attention encoding. |
Haiteng Zhao; Shuming Ma; Dongdong Zhang; Zhi-Hong Deng; Furu Wei; |
424 | A Universal 3D Molecular Representation Learning Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a universal 3D MRL framework that significantly enlarges the representation ability and application scope of MRL schemes. |
Gengmo Zhou; Zhifeng Gao; Qiankun Ding; Hang Zheng; Hongteng Xu; Zhewei Wei; Linfeng Zhang; Guolin Ke; |
425 | Accurate Bayesian Meta-Learning By Accurate Task Posterior Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prior work studies a range of architectural modifications to boost performance, such as attentive computation paths or improved context aggregation schemes, while the influence of the VI scheme remains under-explored. We aim to bridge this gap by introducing GMM-NP, a novel BML model, which builds on recent work that enables highly accurate, full-covariance Gaussian mixture (GMM) TP approximations by combining VI with natural gradients and trust regions. |
Michael Volpp; Philipp Dahlinger; Philipp Becker; Christian Daniel; Gerhard Neumann; |
426 | Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that $Q$-Pensieve can be naturally integrated with soft policy iteration with convergence guarantee. To substantiate this concept, we propose the technique of $Q$ replay buffer, which stores the learned $Q$-networks from the past iterations, and arrive at a practical actor-critic implementation. |
Wei Hung; Bo Kai Huang; Ping-Chun Hsieh; Xi Liu; |
427 | Single-shot General Hyper-parameter Optimization for Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Federated Loss SuRface Aggregation (FLoRA), a general FL-HPO solution framework that can address use cases of tabular data and any Machine Learning (ML) model including gradient boosting training algorithms, SVMs, neural networks, among others and thereby further expands the scope of FL-HPO. |
Yi Zhou; Parikshit Ram; Theodoros Salonidis; Nathalie Baracaldo; Horst Samulowitz; Heiko Ludwig; |
428 | AE-FLOW: Autoencoders with Normalizing Flows for Medical Images Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More specifically, we propose a normalizing flow based autoencoder for an efficient and tractable representation of normal medical images. |
Yuzhong Zhao; Qiaoqiao Ding; Xiaoqun Zhang; |
429 | What Is Missing in IRM Training and Evaluation? Challenges and Solutions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, a series of advanced IRM algorithms have been developed that show practical improvement over IRMV1. In this work, we revisit these recent IRM advancements and identify and resolve three practical limitations in IRM training and evaluation. |
Yihua Zhang; Pranay Sharma; Parikshit Ram; Mingyi Hong; Kush R. Varshney; Sijia Liu; |
430 | Distributional Meta-Gradient Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: All the existing algorithms adhere to the same reward learning regime, where the adaptive return is simply formulated in the form of expected cumulative rewards, upon which the policy and critic update rules are specified under well adopted distance metrics. In this paper, we present a novel algorithm which builds on the success of meta-gradient RL algorithms and effectively improves such algorithms by following a simple recipe, i.e., going beyond the expected return to formulate and learn the return in a more expressive form, value distributions. |
Haiyan Yin; Shuicheng YAN; Zhongwen Xu; |
431 | Min-Max Multi-objective Bilevel Optimization with Applications in Robust Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider a generic min-max multi-objective bilevel optimization problem with applications in robust machine learning such as representation learning and hyperparameter optimization. |
Alex Gu; Songtao Lu; Parikshit Ram; Tsui-Wei Weng; |
432 | Linearly Mapping from Image to Text Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we show that the image representations from vision models can be transferred as continuous prompts to frozen LMs by training only a single linear projection. |
Jack Merullo; Louis Castricato; Carsten Eickhoff; Ellie Pavlick; |
433 | Evidential Uncertainty and Diversity Guided Active Learning for Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, directly porting current AL methods to the SGG task poses the following challenges: 1) unreliable uncertainty estimates, and 2) data bias problems. To deal with these challenges, we propose EDAL (\textbf{E}vidential Uncertainty and \textbf{D}iversity Guided Deep \textbf{A}ctive \textbf{L}earning), a novel AL framework tailored for the SGG task. |
Shuzhou Sun; Shuaifeng Zhi; Janne Heikkilä; Li Liu; |
434 | StableDR: Stabilized Doubly Robust Learning for Recommendation on Data Missing Not at Random Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the fact that DR relies more on extrapolation will lead to suboptimal performance. To address the above limitations while retaining double robustness, we propose a stabilized doubly robust (StableDR) learning approach with a weaker reliance on extrapolation. |
Haoxuan Li; Chunyuan Zheng; Peng Wu; |
435 | Variational Latent Branching Model for Off-Policy Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the variational latent branching model (VLBM) to learn the transition function of MDPs by formulating the environmental dynamics as a compact latent space, from which the next states and rewards are then sampled. |
Qitong Gao; Ge Gao; Min Chi; Miroslav Pajic; |
436 | TDR-CL: Targeted Doubly Robust Collaborative Learning for Debiased Recommendations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a principled approach that can effectively reduce the bias and variance simultaneously for existing DR estimators when the error-imputation model is misspecified. |
Haoxuan Li; Yan Lyu; Chunyuan Zheng; Peng Wu; |
437 | Improving Deep Policy Gradients with Value Function Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on improving value approximation and analyzing the effects on Deep Policy Gradient primitives such as value prediction, variance reduction, and correlation of gradient estimates with the true gradient. |
Enrico Marchesini; Christopher Amato; |
438 | LMSeg: Language-guided Multi-dataset Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the multi-dataset segmentation and propose a scalable Language-guided Multi-dataset Segmentation framework, dubbed LMSeg, which supports both semantic and panoptic segmentation. |
Qiang Zhou; Yuang Liu; Chaohui Yu; Jingliang Li; Zhibin Wang; Fan Wang; |
439 | Graph Neural Networks for Link Prediction with Subgraph Sketching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on our analysis, we propose a novel full-graph GNN called ELPH (Efficient Link Prediction with Hashing) that passes subgraph sketches as messages to approximate the key components of SGNNs without explicit subgraph construction. |
Benjamin Paul Chamberlain; Sergey Shirobokov; Emanuele Rossi; Fabrizio Frasca; Thomas Markovich; Nils Yannick Hammerla; Michael M. Bronstein; Max Hansmire; |
440 | Extracting Robust Models with Uncertain Examples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, how to extract a robust model with similar resilience against adversarial attacks is never investigated. This paper presents the first study toward this goal. |
Guanlin Li; Guowen Xu; Shangwei Guo; Han Qiu; Jiwei Li; Tianwei Zhang; |
441 | Combinatorial Pure Exploration of Causal Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide the first gap-dependent and fully adaptive pure exploration algorithms on two types of causal models — the binary generalized linear model (BGLM) and general graphs. |
Nuoya Xiong; Wei Chen; |
442 | Continuous-time Identification of Dynamic State-space Models By Deep Subspace Encoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, even with numerous recent developments, the CT nonlinear state-space (NL-SS) model identification problem remains to be solved in full, considering common experimental aspects such as the presence of external inputs, measurement noise, latent states, and general robustness. This paper presents a novel estimation method that addresses all these aspects and that can obtain state-of-the-art results on multiple benchmarks with compact fully connected neural networks capturing the CT dynamics. |
Gerben I. Beintema; Maarten Schoukens; Roland Tóth; |
443 | Better Generative Replay for Continual Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By analyzing the behaviors of clients during training, we find the unstable training process caused by distributed training on non-IID data leads to a notable performance degradation. To address this problem, we propose our FedCIL model with two simple but effective solutions: 1. |
Daiqing Qi; Handong Zhao; Sheng Li; |
444 | Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recently, several works have demonstrated high gains by taking a straightforward approach for incorporating intermediate supervision in compounded natural language problems: the sequence-to-sequence LM is fed with an augmented input, in which the decomposed tasks’ labels are simply concatenated to the original input. In this paper, we prove a positive learning result that motivates these recent efforts. |
Noam Wies; Yoav Levine; Amnon Shashua; |
445 | On The Word Boundaries of Emergent Languages Based on Harris’s Articulation Scheme Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is not obvious whether such a simulated language would have the same properties as natural language. In this paper, we test if they satisfy HAS. |
Ryo Ueda; Taiga Ishii; Yusuke Miyao; |
446 | Generative Modelling with Inverse Heat Dissipation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by diffusion models and the empirical success of coarse-to-fine modelling, we propose a new model that generates images through iteratively inverting the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. |
Severi Rissanen; Markus Heinonen; Arno Solin; |
447 | Self-supervision Through Random Segments with Autoregressive Coding (RandSAC) Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the success of self-supervised autoregressive representation learning in natural language (GPT and its variants), and advances in recent visual architecture design with Vision Transformers (ViTs), in this paper, we explore the effects various design choices have on the success of applying such training strategies for visual feature learning. |
Tianyu Hua; Yonglong Tian; Sucheng Ren; Michalis Raptis; Hang Zhao; Leonid Sigal; |
448 | Ask Me Anything: A Simple Strategy for Prompting Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a prompting strategy based on aggregating the predictions of multiple prompts, which enables a 6B parameter model to exceed the few-shot performance of GPT3-175B on 15/20 popular benchmarks. |
Simran Arora; Avanika Narayan; Mayee F Chen; Laurel Orr; Neel Guha; Kush Bhatia; Ines Chami; Christopher Re; |
449 | DAVA: Disentangling Adversarial Variational Autoencoder Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the issue, we introduce DAVA, a novel training procedure for variational auto-encoders that alleviates the issue of hyperparameter selection at the cost of a comparatively small overhead. |
Benjamin Estermann; Roger Wattenhofer; |
450 | Temperature Schedules for Self-supervised Contrastive Methods on Long-tail Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we analyse the behaviour of one of the most popular variants of SSL, i.e. contrastive methods, on imbalanced data. |
Anna Kukleva; Moritz Böhle; Bernt Schiele; Hilde Kuehne; Christian Rupprecht; |
451 | From $t$-SNE to UMAP with Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we uncover their conceptual connection via a new insight into contrastive learning methods. |
Sebastian Damrich; Niklas Böhm; Fred A Hamprecht; Dmitry Kobak; |
452 | Sharper Bounds for Uniformly Stable Algorithms with Stationary $\varphi$-mixing Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we use algorithmic stability to study the generalization performance of learning algorithms with $\varphi$-mixing data, where the dependency between observations weakens over time. |
Shi Fu; Yunwen Lei; Qiong Cao; Xinmei Tian; Dacheng Tao; |
453 | Pareto-Efficient Decision Agents for Offline Multi-Objective Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new data-driven setup for offline MORL, where we wish to learn a preference-agnostic policy agent using only a finite dataset of offline demonstrations of other agents and their preferences. |
Baiting Zhu; Meihua Dang; Aditya Grover; |
454 | Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse, which reveals that the last-layer features of the same class will collapse into a vertex, and the vertices of all classes are aligned with the classifier prototypes, which are formed as a simplex equiangular tight frame (ETF). |
Yibo Yang; Haobo Yuan; Xiangtai Li; Zhouchen Lin; Philip Torr; Dacheng Tao; |
455 | Efficient Recurrent Architectures Through Activity Sparsity and Sparse Back-propagation Through Time Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a solution inspired by biological neuron dynamics that makes the communication between RNN units sparse and discrete. |
Anand Subramoney; Khaleelulla Khan Nazeer; Mark Schöne; Christian Mayr; David Kappel; |
456 | Neural EPDOs: Spatially Adaptive Equivariant Partial Differential Operator Based Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel nonlinear PDOs scheme that is both spatially adaptive and translation equivariant. |
Lingshen He; Yuxuan Chen; Zhengyang Shen; Yibo Yang; Zhouchen Lin; |
457 | Learning to Segment from Noisy Annotations: A Spatial Correction Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel noise model for segmentation problems that encodes spatial correlation and bias, which are prominent in segmentation annotations. |
Jiachen Yao; Yikai Zhang; Songzhu Zheng; Mayank Goswami; Prateek Prasanna; Chao Chen; |
458 | Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel \underline{e}dge guided \underline{g}enerative \underline{a}dversarial \underline{n}etwork with \underline{c}ontrastive learning (ECGAN) for the challenging semantic image synthesis task. |
Hao Tang; XIAOJUAN QI; Guolei Sun; Dan Xu; Nicu Sebe; Radu Timofte; Luc Van Gool; |
459 | Wasserstein Auto-encoded MDPs: Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Wasserstein auto-encoded MDP (WAE-MDP), a latent space model that fixes those issues by minimizing a penalized form of the optimal transport between the behaviors of the agent executing the original policy and the distilled policy, for which the formal guarantees apply. |
Florent Delgrange; Ann Nowe; Guillermo Perez; |
460 | STaSy: Score-based Tabular Data Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a new model named $\textbf{S}$core-based $\textbf{Ta}$bular data $\textbf{Sy}$nthesis ($\texttt{STaSy}$) and its training strategy based on the paradigm of score-based generative modeling. |
Jayoung Kim; Chaejeong Lee; Noseong Park; |
461 | De Novo Molecular Generation Via Connection-aware Motif Mining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose MiCaM to generate molecules based on mined connection-aware motifs. |
Zijie Geng; Shufang Xie; Yingce Xia; Lijun Wu; Tao Qin; Jie Wang; Yongdong Zhang; Feng Wu; Tie-Yan Liu; |
462 | When Source-Free Domain Adaptation Meets Learning with Noisy Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study SFDA from the perspective of learning with label noise (LLN). |
Li Yi; Gezheng Xu; Pengcheng Xu; Jiaqi Li; Ruizhi Pu; Charles Ling; Ian McLeod; Boyu Wang; |
463 | Test-Time Adaptation Via Self-Training with Nearest Neighbor Information Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, under test-time domain shift, accuracy of the pseudo labels cannot be guaranteed, and thus the TTA methods often encounter performance degradation at the adapted classifier. To overcome this limitation, we propose a novel test-time adaptation method, called Test-time Adaptation via Self-Training with nearest neighbor information (TAST), which is composed of the following procedures: (1) adds trainable adaptation modules on top of the trained feature extractor; (2) newly defines a pseudo-label distribution for the test data by using the nearest neighbor information; (3) trains these modules only a few times during test time to match the nearest neighbor-based pseudo label distribution and a prototype-based class distribution for the test data; and (4) predicts the label of test data using the average predicted class distribution from these modules. |
Minguk Jang; Sae-Young Chung; Hye Won Chung; |
464 | Federated Neural Bandits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: So, this paper introduces the federated neural-upper confidence bound (FN-UCB) algorithm. |
Zhongxiang Dai; Yao Shu; Arun Verma; Flint Xiaofeng Fan; Bryan Kian Hsiang Low; Patrick Jaillet; |
465 | Measuring Axiomatic Identifiability of Counterfactual Image Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a general framework for evaluating image counterfactuals. |
Miguel Monteiro; Fabio De Sousa Ribeiro; Nick Pawlowski; Daniel C. Castro; Ben Glocker; |
466 | In-sample Actor Critic for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose In-sample Actor Critic (IAC) which utilizes sampling-importance resampling to execute in-sample policy evaluation. |
Hongchang Zhang; Yixiu Mao; Boyuan Wang; Shuncheng He; Yi Xu; Xiangyang Ji; |
467 | Planning Goals for Exploration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose planning exploratory goals (PEG), a method that sets goals for each training episode to directly optimize an intrinsic exploration reward. |
Edward S. Hu; Richard Chang; Oleh Rybkin; Dinesh Jayaraman; |
468 | Kernel Neural Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the Neural Optimal Transport (NOT) algorithm which uses the general optimal transport formulation and learns stochastic transport plans. |
Alexander Korotin; Daniil Selikhanovych; Evgeny Burnaev; |
469 | Targeted Hyperparameter Optimization with Lexicographic Preferences Over Multiple Objectives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to do targeted hyperparameter optimization with lexicographic preference over multiple objectives, motivated by various practical applications. |
Shaokun Zhang; Feiran Jia; Chi Wang; Qingyun Wu; |
470 | A Non-monotonic Self-terminating Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on the problem of non-terminating sequences resulting from an incomplete decoding algorithm. |
Cheolhyoung Lee; Eugene Choi; Kyunghyun Cho; |
471 | Noise-Robust De-Duplication at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study uses the unique timeliness of historical news wires to create a 27,210-document dataset, with 122,876 positive duplicate pairs, for studying noise-robust de-duplication. |
Emily Silcock; Luca D’Amico-Wong; Jinglin Yang; Melissa Dell; |
472 | Simplicial Embeddings in Self-Supervised Learning and Downstream Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We use softmax to embed representations in a collection of simplices in SSL models, which offers improved generalization properties for downstream classification. |
Samuel Lavoie; Christos Tsirigotis; Max Schwarzer; Ankit Vani; Michael Noukhovitch; Kenji Kawaguchi; Aaron Courville; |
473 | Policy Pre-training for Autonomous Driving Via Self-supervised Geometric Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pre-training in visuomotor driving. |
Penghao Wu; Li Chen; Hongyang Li; Xiaosong Jia; Junchi Yan; Yu Qiao; |
474 | Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a consequence, the fine-tuning performance is suboptimal. To bridge this gap, we propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score. |
Qingru Zhang; Minshuo Chen; Alexander Bukharin; Pengcheng He; Yu Cheng; Weizhu Chen; Tuo Zhao; |
475 | Treeformer: Dense Gradient Trees for Efficient Attention Computation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we view attention computation as that of nearest neighbor retrieval, and use decision tree based hierarchical navigation to reduce the retrieval cost per query token from linear in sequence length to nearly logarithmic. |
Lovish Madaan; Srinadh Bhojanapalli; Himanshu Jain; Prateek Jain; |
476 | GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While there is emerging work on relieving this pressure via model compression, the applicability and performance of existing compression techniques are limited by the scale and complexity of GPT models. In this paper, we address this challenge and propose GPTQ, a new one-shot weight quantization method based on approximate second-order information that is both highly accurate and highly efficient. |
Elias Frantar; Saleh Ashkboos; Torsten Hoefler; Dan Alistarh; |
477 | Neural Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel neural-networks-based algorithm to compute optimal transport maps and plans for strong and weak transport costs. |
Alexander Korotin; Daniil Selikhanovych; Evgeny Burnaev; |
478 | DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify differential extrinsic plasticity (DEP), a method from the domain of self-organization, as being able to induce state-space covering exploration within seconds of interaction. |
Pierre Schumacher; Daniel Haeufle; Dieter Büchler; Syn Schmitt; Georg Martius; |
479 | Optimal Activation Functions for The Random Features Regression Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find scenarios under which the optimal AFs are linear, saturated linear functions, or expressible in terms of Hermite polynomials. |
Jianxin Wang; José Bento; |
480 | Unsupervised Object-Centric Learning with Bi-level Optimized Query Slot Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These methods, however, have been exceedingly difficult to train without supervision and are ambiguous in the notion of object, especially for complex natural scenes. In this paper, we propose to address these issues by (1) initializing Slot-Attention modules with learnable queries and (2) optimizing the model with bi-level optimization. |
Baoxiong Jia; Yu Liu; Siyuan Huang; |
481 | Augmentation with Projection: Towards An Effective and Efficient Data Augmentation Paradigm for Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose AugPro (Augmentation with Projection), an effective and efficient data augmentation method for distillation. |
Ziqi Wang; Yuexin Wu; Frederick Liu; Daogao Liu; Le Hou; Hongkun Yu; Jing Li; Heng Ji; |
482 | Learning An Invertible Output Mapping Can Mitigate Simplicity Bias in Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This phenomenon, which we term the Feature Replication Hypothesis, coupled with the implicit bias of SGD to converge to maximum-margin solutions in the feature space, leads the models to rely mostly on the simple features for classification. To mitigate this bias, we propose a Feature Reconstruction Regularizer (FRR) to ensure that the learned features can be reconstructed back from the logits. |
Sravanti Addepalli; Anshul Nasery; Venkatesh Babu Radhakrishnan; Praneeth Netrapalli; Prateek Jain; |
483 | Imbalanced Semi-supervised Learning with Bias Adaptive Classifier Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, such an assumption is far from realistic scenarios and thus severely limits the performance of current pseudo-labeling methods under the context of class-imbalance. To alleviate this problem, we design a bias adaptive classifier that targets the imbalanced SSL setups. |
Renzhen Wang; Xixi Jia; Quanziang Wang; Yichen Wu; Deyu Meng; |
484 | On Compositional Uncertainty Quantification for Seq2seq Graph Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to quantify and evaluate compositional uncertainty for seq2seq graph parsing by proposing a simple probabilistic framework and rigorous evaluation metrics. |
Zi Lin; Jeremiah Zhe Liu; Du Phan; Panupong Pasupat; Jingbo Shang; |
485 | EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose an Efficient Unsupervised Reinforcement Learning Framework with Multi-choice Dynamics model (EUCLID), which introduces a novel model-fused paradigm to jointly pre-train the dynamics model and unsupervised exploration policy in the pre-training phase, thus better leveraging the environmental samples and improving the downstream task sampling efficiency. |
Yifu Yuan; Jianye HAO; Fei Ni; Yao Mu; YAN ZHENG; Yujing Hu; Jinyi Liu; Yingfeng Chen; Changjie Fan; |
486 | A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a general framework that unifies model-based and model-free RL, and an Admissible Bellman Characterization (ABC) class that subsumes nearly all Markov decision process (MDP) models in the literature for tractable RL. |
Zixiang Chen; Chris Junchi Li; Huizhuo Yuan; Quanquan Gu; Michael Jordan; |
487 | DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the success in domains using continuous signals such as vision and audio, adapting diffusion models to natural language is difficult due to the discrete nature of text. We tackle this challenge by proposing DiffuSeq: a diffusion model designed for sequence-to-sequence (Seq2Seq) text generation tasks. |
Shansan Gong; Mukai Li; Jiangtao Feng; Zhiyong Wu; Lingpeng Kong; |
488 | Measure The Predictive Heterogeneity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove that predictive heterogeneity can be reliably estimated from finite data with PAC bounds, even in high dimensions. |
Jiashuo Liu; Jiayun Wu; Renjie Pi; Renzhe Xu; Xingxuan Zhang; Bo Li; Peng Cui; |
489 | InPL: Pseudo-labeling The Inliers First for Imbalanced Semi-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a new perspective of pseudo-labeling for imbalanced SSL. |
Zhuoran Yu; Yin Li; Yong Jae Lee; |
490 | PandA: Unsupervised Learning of Parts and Appearances in The Feature Maps of GANs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods are often tailored to specific GAN architectures and are limited to either discovering global semantic directions that do not facilitate localized control, or require some form of supervision through manually provided regions or segmentation masks. In this light, we present an architecture-agnostic approach that jointly discovers factors representing spatial parts and their appearances in an entirely unsupervised fashion. |
James Oldfield; Christos Tzelepis; Yannis Panagakis; Mihalis Nicolaou; Ioannis Patras; |
491 | Unsupervised Visualization of Image Datasets Using Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we present a new method, called t-SimCNE, for unsupervised visualization of image data. |
Niklas Böhm; Philipp Berens; Dmitry Kobak; |
492 | Hebbian and Gradient-based Plasticity Enables Robust Memory and Rapid Learning in RNNs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While the underlying neural mechanisms are not fully understood, various lines of evidence suggest that synaptic plasticity plays a critical role in memory formation and fast learning. Inspired by these results, we equip Recurrent Neural Networks (RNNs) with plasticity rules to enable them to adapt their parameters according to ongoing experiences. In addition to the traditional local Hebbian plasticity, we propose a global, gradient-based plasticity rule, which allows the model to evolve towards its self-determined target. |
Yu Duan; Zhongfan Jia; Qian Li; Yi Zhong; Kaisheng Ma; |
493 | Learned Index with Dynamic $\epsilon$ Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a mathematically grounded learned index framework with dynamic $\epsilon$, which is efficient and pluggable into existing learned index methods. |
Daoyuan Chen; Wuchao Li; Yaliang Li; Bolin Ding; Kai Zeng; Defu Lian; Jingren Zhou; |
494 | ViT-Adapter: Exploring Plain Vision Transformer for Accurate Dense Predictions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike recently advanced variants that incorporate vision-specific inductive biases into their architectures, the plain ViT suffers inferior performance on dense predictions due to weak prior assumptions. To address this issue, we propose the ViT-Adapter, which allows plain ViT to achieve comparable performance to vision-specific transformers. |
Zhe Chen; Yuchen Duan; Wenhai Wang; Junjun He; Tong Lu; Jifeng Dai; Yu Qiao; |
495 | Pareto Invariant Risk Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, these compromises could easily lead to suboptimal performance of either the ERM or OOD objective. To address these issues, we introduce a multi-objective optimization (MOO) perspective to understand the OOD optimization process, and propose a new optimization scheme called PAreto Invariant Risk Minimization (PAIR). |
Yongqiang Chen; Kaiwen Zhou; Yatao Bian; Binghui Xie; Bingzhe Wu; Yonggang Zhang; MA KAILI; Han Yang; Peilin Zhao; Bo Han; James Cheng; |
496 | ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present ROSCOE, a suite of interpretable, unsupervised automatic scores that improve and extend previous text generation evaluation metrics. |
Olga Golovneva; Moya Peng Chen; Spencer Poff; Martin Corredor; Luke Zettlemoyer; Maryam Fazel-Zarandi; Asli Celikyilmaz; |
497 | Variational Information Pursuit for Interpretable Predictions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Variational Information Pursuit (V-IP), a variational characterization of IP which bypasses the need for learning generative models. |
Aditya Chattopadhyay; Kwan Ho Ryan Chan; Benjamin David Haeffele; Donald Geman; Rene Vidal; |
498 | Deep Learning on Implicit Neural Representations of Shapes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we put forward this research problem and propose inr2vec, a framework that can compute a compact latent representation for an input INR in a single inference pass. |
Luca De Luigi; Adriano Cardace; Riccardo Spezialetti; Pierluigi Zama Ramirez; Samuele Salti; Luigi di Stefano; |
499 | Image to Sphere: Learning Equivariant Features for Efficient Pose Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead, we propose a novel mapping of features from the image domain to the 3D rotation manifold. |
David Klee; Ondrej Biza; Robert Platt; Robin Walters; |
500 | Generalization and Estimation Error Bounds for Model-based Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks, and derive practical design rules that allow one to construct model-based networks with guaranteed high generalization. |
Avner Shultzman; Eyar Azar; Miguel R. D. Rodrigues; Yonina C. Eldar; |
501 | Consolidator: Mergable Adapter with Group Connections for Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, some of them incur heavy inference costs despite their storage benefits. To tackle these problems, we propose the consolidator to achieve efficient transfer learning for vision transformers. |
Tianxiang Hao; Hui Chen; Yuchen Guo; Guiguang Ding; |
502 | Multivariate Time-series Imputation with Disentangled Temporal Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Different from existing approaches, we propose TIDER, a novel matrix factorization-based method with disentangled temporal representations that account for multiple factors, namely trend, seasonality, and local bias, to model complex dynamics. |
SHUAI LIU; Xiucheng Li; Gao Cong; Yile Chen; YUE JIANG; |
503 | Characterizing The Influence of Graph Elements Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since the nodes/edges in a graph are interdependent in GCNs, it is challenging to derive influence functions for GCNs. To fill this gap, we started with the simple graph convolution (SGC) model that operates on an attributed graph, and formulated an influence function to approximate the changes of model parameters when a node or an edge is removed from an attributed graph. |
Zizhang Chen; Peizhao Li; Hongfu Liu; Pengyu Hong; |
504 | LipsFormer: Introducing Lipschitz Continuity to Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a Lipschitz continuous Transformer, called LipsFormer, to pursue training stability both theoretically and empirically for Transformer-based models. |
Xianbiao Qi; Jianan Wang; Yihao Chen; Yukai Shi; Lei Zhang; |
505 | Neuro-Symbolic Procedural Planning with Commonsense Prompting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, this paper proposes a neuro-symbolic procedural PLANner (PLAN) that elicits procedural planning knowledge from the LLMs with commonsense-infused prompting. |
Yujie Lu; Weixi Feng; Wanrong Zhu; Wenda Xu; Xin Eric Wang; Miguel Eckstein; William Yang Wang; |
506 | Robust Scheduling with GFlowNets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new approach to scheduling by sampling proportionally to the proxy metric using a novel GFlowNet method. |
David W Zhang; Corrado Rainone; Markus Peschl; Roberto Bondesan; |
507 | On The Performance of Temporal Difference Learning With Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we provide a convergence analysis of Neural TD Learning with a projection onto $B(\theta_0, \omega)$, a ball of fixed radius $\omega$ around the initial point $\theta_0$. |
HAOXING TIAN; Ioannis Paschalidis; Alex Olshevsky; |
508 | WikiWhy: Answering and Explaining Cause-and-Effect Questions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce WikiWhy, a QA dataset built around a novel auxiliary task: explaining why an answer is true in natural language. |
Matthew Ho; Aditya Sharma; Justin Chang; Michael Saxon; Sharon Levy; Yujie Lu; William Yang Wang; |
509 | Data Augmentation Alone Can Improve Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work proves that, contrary to previous findings, data augmentation alone can significantly boost accuracy and robustness in adversarial training. |
Lin Li; Michael W. Spratling; |
510 | Spikformer: When Spiking Neural Network Meets Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider leveraging both self-attention capability and biological properties of SNNs, and propose a novel Spiking Self Attention (SSA) as well as a powerful framework, named Spiking Transformer (Spikformer). |
Zhaokun Zhou; Yuesheng Zhu; Chao He; Yaowei Wang; Shuicheng YAN; Yonghong Tian; Li Yuan; |
511 | NERDS: A General Framework to Train Camera Denoisers from Single Noisy Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To utilize this property, we can adopt noisy/clean image synthesis at low resolution to train camera denoisers. On this basis, we propose a new solution pipeline, NERDS, which estimates camera noises and synthesizes noisy-clean image pairs from only noisy images. |
Heewon Kim; Kyoung Mu Lee; |
512 | Modeling The Data-Generating Process Is Necessary for Out-of-Distribution Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the relationship between spurious attributes and the classification label, we obtain realizations of the canonical causal graph that characterize common distribution shifts and show that each shift entails different independence constraints over observed variables. |
Jivat Neet Kaur; Emre Kiciman; Amit Sharma; |
513 | Strong Inductive Biases Provably Prevent Harmless Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper argues that the degree to which interpolation is harmless hinges upon the strength of an estimator’s inductive bias, i.e., how heavily the estimator favors solutions with a certain structure: while strong inductive biases prevent harmless interpolation, weak inductive biases can even require fitting noise to generalize well. |
Michael Aerni; Marco Milanta; Konstantin Donhauser; Fanny Yang; |
514 | Certified Training: Small Boxes Are All You Need Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the novel certified training method, SABR, which outperforms existing methods across perturbation magnitudes on MNIST, CIFAR-10, and TinyImageNet, in terms of both standard and certifiable accuracies. |
Mark Niklas Mueller; Franziska Eckert; Marc Fischer; Martin Vechev; |
515 | Efficient Certified Training and Robustness Verification of Neural ODEs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, despite significant progress in robustness verification for standard feed-forward architectures, the verification of high dimensional NODEs remains an open problem. In this work we address this challenge and propose GAINS, an analysis framework for NODEs combining three key ideas: (i) a novel class of ODE solvers, based on variable but discrete time steps, (ii) an efficient graph representation of solver trajectories, and (iii) a novel abstraction algorithm operating on this graph representation. |
Mustafa Zeqiri; Mark Niklas Mueller; Marc Fischer; Martin Vechev; |
516 | Confidence Estimation Using Unlabeled Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first confidence estimation method for a semi-supervised setting, when most training labels are unavailable. |
Chen Li; Xiaoling Hu; Chao Chen; |
517 | Neural Episodic Control with State Abstraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces Neural Episodic Control with State Abstraction (NECSA), a simple but effective state abstraction-based episodic control containing a more comprehensive episodic memory, a novel state evaluation, and a multi-step state analysis. |
Zhuo Li; Derui Zhu; Yujing Hu; Xiaofei Xie; Lei Ma; YAN ZHENG; Yan Song; Yingfeng Chen; Jianjun Zhao; |
518 | Leveraging Large Language Models for Multiple Choice Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that a model with high multiple choice symbol binding (MCSB) ability performs much better with the natural approach than with the traditional approach across 20 diverse tasks and largely closes the gap with the SOTA, suggesting that the MCQA ability of LLMs has been previously underestimated. |
Joshua Robinson; David Wingate; |
519 | Relative Representations Enable Zero-shot Latent Space Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to adopt pairwise similarities as an alternative data representation that can be used to enforce the desired invariance without any additional training. |
Luca Moschella; Valentino Maiorca; Marco Fumero; Antonio Norelli; Francesco Locatello; Emanuele Rodolà; |
520 | ILA-DA: Improving Transferability of Intermediate Level Attack with Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, it has been shown that simple image transformations can also enhance attack transferability. Based on these two observations, we propose ILA-DA, which employs three novel augmentation techniques to enhance ILA. |
Chiu Wai Yan; Tsz-Him Cheung; Dit-Yan Yeung; |
521 | Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We describe a series of experiments: we show that existing knowledge is conserved more strongly in parameter-efficient training and that parameter-efficient scaling scales with model and dataset size. |
Zaid Khan; Yun Fu; |
522 | Real-time Variational Method for Learning Neural Trajectory and Its Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, despite the potential of real-time alternatives to give immediate feedback to experimentalists, and enhance experimental design, they have received markedly less attention. In this work, we introduce the exponential family variational Kalman filter (eVKF), an online recursive Bayesian method aimed at inferring latent trajectories while simultaneously learning the dynamical system generating them. |
Matthew Dowling; Yuan Zhao; Il Memming Park; |
523 | Minimalistic Unsupervised Learning with The Sparse Manifold Transform Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a minimalistic and interpretable method for unsupervised learning, without resorting to data augmentation, hyperparameter tuning, or other engineering designs, that achieves performance close to that of SOTA SSL methods. |
Yubei Chen; Zeyu Yun; Yi Ma; Bruno Olshausen; Yann LeCun; |
524 | Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a self-ensemble protection (SEP) method to take advantage of intermediate checkpoints in a single training process for data protection. |
Sizhe Chen; Geng Yuan; Xinwen Cheng; Yifan Gong; Minghai Qin; Yanzhi Wang; Xiaolin Huang; |
525 | On The Duality Between Contrastive and Non-contrastive Self-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that contrastive and non-contrastive self-supervised methods are closely related, and then study how implementation details impact performance. We validate our findings empirically and significantly improve known behaviours. |
Quentin Garrido; Yubei Chen; Adrien Bardes; Laurent Najman; Yann LeCun; |
526 | AGRO: Adversarial Discovery of Error-prone Groups for Robust Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose AGRO—Adversarial Group discovery for Distributionally Robust Optimization—an end-to-end approach that jointly identifies error-prone groups and improves accuracy on them. |
Bhargavi Paranjape; Pradeep Dasigi; Vivek Srikumar; Luke Zettlemoyer; Hannaneh Hajishirzi; |
527 | Harnessing Mixed Offline Reinforcement Learning Datasets Via Trajectory Weighting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that in mixed datasets consisting of mostly low-return trajectories and minor high-return trajectories, state-of-the-art offline RL algorithms are overly restrained by low-return trajectories and fail to exploit high-performing trajectories to the fullest. To overcome this issue, we show that, in deterministic MDPs with stochastic initial states, the dataset sampling can be re-weighted to induce an artificial dataset whose behavior policy has a higher return. |
Zhang-Wei Hong; Remi Tachet des Combes; Pulkit Agrawal; Romain Laroche; |
528 | Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treated as an essentially infinite source of information about the environment. Based on this observation, we study the effectiveness of auxiliary tasks for learning rich representations, focusing on the setting where the number of tasks and the size of the agent’s network are simultaneously increased. |
Jesse Farebrother; Joshua Greaves; Rishabh Agarwal; Charline Le Lan; Ross Goroshin; Pablo Samuel Castro; Marc G Bellemare; |
529 | Self-Consistency Improves Chain of Thought Reasoning in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. |
Xuezhi Wang; Jason Wei; Dale Schuurmans; Quoc V Le; Ed H. Chi; Sharan Narang; Aakanksha Chowdhery; Denny Zhou; |
530 | Investigating Multi-task Pretraining and Generalization in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we propose to investigate the generalization capabilities of a popular actor-critic method, IMPALA. |
Adrien Ali Taiga; Rishabh Agarwal; Jesse Farebrother; Aaron Courville; Marc G Bellemare; |
531 | ChordMixer: A Scalable Neural Attention Model for Sequences with Different Length Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we propose a simple neural network building block called ChordMixer which can model the attention for long sequences with variable lengths. |
Ruslan Khalitov; Tong Yu; Lei Cheng; Zhirong Yang; |
532 | Personalized Federated Learning with Feature Alignment and Classifier Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we conduct explicit local-global feature alignment by leveraging global semantic knowledge for learning a better representation. |
Jian Xu; Xinyi Tong; Shao-Lun Huang; |
533 | EA-HAS-Bench: Energy-aware Hyperparameter and Architecture Search Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we present EA-HAS-Bench, the first large-scale energy-aware benchmark that allows studying AutoML methods to achieve better trade-offs between performance and search energy consumption. |
Shuguang Dou; XINYANG JIANG; Cai Rong Zhao; Dongsheng Li; |
534 | Distributionally Robust Recourse Action Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this assumption does not always hold in practice because of data distribution shifts, and in this case, the recourse action may become invalid. To redress this shortcoming, we propose the Distributionally Robust Recourse Action (DiRRAc) framework, which generates a recourse action that has high probability of being valid under a mixture of model shifts. |
Duy Nguyen; Ngoc Bui; Viet Anh Nguyen; |
535 | A Probabilistic Framework for Task-aligned Intra- and Inter-area Neural Manifold Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we propose a novel probabilistic framework that allows for interpretable partitioning of population variability within and across areas in the context of naturalistic behavior. |
Edoardo Balzani; Jean-Paul G Noel; Pedro Herrero-Vidal; Dora E Angelaki; Cristina Savin; |
536 | Block and Subword-Scaling Floating-Point (BSFP): An Efficient Non-Uniform Quantization For Low Precision Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Block and Subword-Scaling Floating-Point (BSFP), a non-uniform quantization scheme for the skewed and non-uniform distribution of weight vectors in neural networks. |
Yun-Chen Lo; Tse-Kuang Lee; Ren-Shuo Liu; |
537 | Simple Yet Effective Graph Contrastive Learning for Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple yet effective graph contrastive learning paradigm LightGCL that mitigates these issues that negatively impact the generality and robustness of CL-based recommenders. |
Xuheng Cai; Chao Huang; Lianghao Xia; Xubin Ren; |
538 | Unbiased Supervised Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we tackle the problem of learning representations that are robust to biases. |
Carlo Alberto Barbano; Benoit Dufumier; Enzo Tartaglione; Marco Grangetto; Pietro Gori; |
539 | SQA3D: Situated Question Answering in 3D Scenes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answering in 3D Scenes (SQA3D). |
Xiaojian Ma; Silong Yong; Zilong Zheng; Qing Li; Yitao Liang; Song-Chun Zhu; Siyuan Huang; |
540 | Data Valuation Without Training of A Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide a training-free data valuation score, called complexity-gap score, which is a data-centric score to quantify the influence of individual instances in generalization of two-layer overparameterized neural networks. |
Ki Nohyun; Hoyong Choi; Hye Won Chung; |
541 | HotProtein: A Novel Framework for Protein Thermostability Prediction and Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present HotProtein, a large-scale protein dataset with growth temperature annotations of thermostability, containing 182K amino acid sequences and 3K folded structures from 230 different species with a wide temperature range of −20°C to 120°C. |
Tianlong Chen; Chengyue Gong; Daniel Jesus Diaz; Xuxi Chen; Jordan Tyler Wells; qiang liu; Zhangyang Wang; Andrew Ellington; Alex Dimakis; Adam Klivans; |
542 | Switch-NeRF: Learning Scene Decomposition with Mixture of Experts for Large-scale Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle these issues, in this paper, we propose Switch-NeRF, a novel end-to-end large-scale NeRF with learning-based scene decomposition. |
Zhenxing MI; Dan Xu; |
543 | Measuring Forgetting of Memorized Training Examples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When models are trained on large datasets, we show that privacy attacks become less effective on examples seen early in training, and investigate why. |
Matthew Jagielski; Om Thakkar; Florian Tramer; Daphne Ippolito; Katherine Lee; Nicholas Carlini; Eric Wallace; Shuang Song; Abhradeep Guha Thakurta; Nicolas Papernot; Chiyuan Zhang; |
544 | Fundamental Limits in Formal Verification of Message-Passing Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that in the context of Message Passing Neural Networks (MPNN), a common Graph Neural Network (GNN) model, formal verification is impossible. |
Marco Sälzer; Martin Lange; |
545 | Part-Based Models Improve Adversarial Robustness Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks by introducing a part-based model for object classification. |
Chawin Sitawarin; Kornrapat Pongmala; Yizheng Chen; Nicholas Carlini; David Wagner; |
546 | GLM-130B: An Open Bilingual Pre-trained Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. |
Aohan Zeng; Xiao Liu; Zhengxiao Du; Zihan Wang; Hanyu Lai; Ming Ding; Zhuoyi Yang; Yifan Xu; Wendi Zheng; Xiao Xia; Weng Lam Tam; Zixuan Ma; Yufei Xue; Jidong Zhai; Wenguang Chen; Zhiyuan Liu; Peng Zhang; Yuxiao Dong; Jie Tang; |
547 | How Robust Is Unsupervised Representation Learning to Distribution Shift? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We posit that the input-driven objectives of unsupervised algorithms lead to representations that are more robust to distribution shift than the target-driven objective of SL. We verify this by extensively evaluating the performance of SSL and AE on both synthetic and realistic distribution shift datasets. |
Yuge Shi; Imant Daunhawer; Julia E Vogt; Philip Torr; Amartya Sanyal; |
548 | 3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current 3D target-aware models either rely on the voxelized atom densities or the autoregressive sampling process, which are not equivariant to rotation or easily violate geometric constraints resulting in unrealistic structures. In this work, we develop a 3D equivariant diffusion model to solve the above challenges. |
Jiaqi Guan; Wesley Wei Qian; Xingang Peng; Yufeng Su; Jian Peng; Jianzhu Ma; |
549 | Differentially Private Adaptive Optimization with Delayed Preconditioners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore techniques to estimate and efficiently adapt to gradient geometry in private adaptive optimization without auxiliary data. |
Tian Li; Manzil Zaheer; Ken Liu; Sashank J. Reddi; Hugh Brendan McMahan; Virginia Smith; |
550 | On The Perils of Cascading Robust Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new attack against cascading ensembles and show that: (1) there exists an adversarial input for up to 88% of the samples where the ensemble claims to be certifiably robust and accurate; and (2) the accuracy of a cascading ensemble under our attack is as low as 11% when it claims to be certifiably robust and accurate on 97% of the test set. |
Ravi Mangal; Zifan Wang; Chi Zhang; Klas Leino; Corina Pasareanu; Matt Fredrikson; |
551 | Graph Contrastive Learning for Skeleton-based Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a graph contrastive learning framework for skeleton-based action recognition (SkeletonGCL) to explore the global context across all sequences. |
Xiaohu Huang; Hao Zhou; Bin Feng; Xinggang Wang; Wenyu Liu; Jian Wang; Haocheng Feng; Junyu Han; Errui Ding; Jingdong Wang; |
552 | An Image Is Worth One Word: Personalizing Text-to-Image Generation Using Textual Inversion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In other words, we ask: how can we use language-guided models to turn *our* cat into a painting, or imagine a new product based on *our* favorite toy? Here we present a simple approach that allows such creative freedom. |
Rinon Gal; Yuval Alaluf; Yuval Atzmon; Or Patashnik; Amit Haim Bermano; Gal Chechik; Daniel Cohen-or; |
553 | Learning Cut Selection for Mixed-Integer Linear Programming Via Hierarchical Sequence Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, we observe from extensive empirical results that (P3) what order of selected cuts should be preferred has a significant impact on the efficiency of solving MILPs as well. To address this challenge, we propose a novel hierarchical sequence model (HEM) to learn cut selection policies via reinforcement learning. |
Zhihai Wang; Xijun Li; Jie Wang; Yufei Kuang; Mingxuan Yuan; Jia Zeng; Yongdong Zhang; Feng Wu; |
554 | Visual Classification Via Description from Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an alternative framework for classification with VLMs, which we call classification by description. |
Sachit Menon; Carl Vondrick; |
555 | CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to approach text-queried universal sound separation by using only unlabeled data. |
Hao-Wen Dong; Naoya Takahashi; Yuki Mitsufuji; Julian McAuley; Taylor Berg-Kirkpatrick; |
556 | Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, fine-tuning disrupts the pretrained visual representation and causes representational drift towards the fine-tuned task, thus leading to a loss of the versatility of the original model. We introduce a method for lossless adaptation to address this shortcoming of classical fine-tuning. |
Mohit Sharma; Claudio Fantacci; Yuxiang Zhou; Skanda Koppula; Nicolas Heess; Jon Scholz; Yusuf Aytar; |
557 | The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a thorough understanding of crossmodal KD. |
Zihui Xue; Zhengqi Gao; Sucheng Ren; Hang Zhao; |
558 | FedFA: Federated Feature Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The primary goal of this paper is to develop a robust federated learning algorithm to address feature shift in clients’ samples, which can be caused by various factors, e.g., acquisition differences in medical imaging. |
Tianfei Zhou; Ender Konukoglu; |
559 | Adversarial Training Descends Without Descent: Finding Actual Descent Directions Based on Danskin’s Theorem Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More precisely, we provide a counterexample to a corollary of Danskin’s Theorem presented in the seminal paper of Madry et al. (2018) which states that a solution of the inner maximization problem can yield a descent direction for the adversarially robust loss. |
Fabian Latorre; Igor Krawczuk; Leello Tadesse Dadi; Thomas Pethick; Volkan Cevher; |
560 | Visual Recognition with Deep Nearest Centroids Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition, by revisiting Nearest Centroids, one of the most classic and simple classifiers. |
Wenguan Wang; Cheng Han; Tianfei Zhou; Dongfang Liu; |
561 | Dual Diffusion Implicit Bridges for Image-to-Image Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Dual Diffusion Implicit Bridges (DDIBs), an image translation method based on diffusion models that circumvents training on domain pairs. |
Xuan Su; Jiaming Song; Chenlin Meng; Stefano Ermon; |
562 | CROM: Continuous Reduced-Order Modeling of PDEs Using Implicit Neural Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to accelerate PDE solvers using reduced-order modeling (ROM). |
Peter Yichen Chen; Jinxu Xiang; Dong Heon Cho; Yue Chang; G A Pershing; Henrique Teles Maia; Maurizio M Chiaramonte; Kevin Thomas Carlberg; Eitan Grinspun; |
563 | Universal Vision-Language Dense Retrieval: Learning A Unified Representation Space for Multi-Modal Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents Universal Vision-Language Dense Retrieval (UniVL-DR), which builds a unified model for multi-modal retrieval. |
Zhenghao Liu; Chenyan Xiong; Yuanhuiyi Lv; Zhiyuan Liu; Ge Yu; |
564 | DINO As A Von Mises-Fisher Mixture Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the fact that the learned representations are $\ell_2$-normalized, we show that DINO can be interpreted as a mixture model of von Mises-Fisher components. |
Hariprasath Govindarajan; Per Sidén; Jacob Roll; Fredrik Lindsten; |
565 | Average Sensitivity of Decision Tree Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we adopt the notion of average sensitivity as a stability measure, and design an algorithm with low average sensitivity that outputs a decision tree whose accuracy is nearly equal to the optimal decision tree. |
Satoshi Hara; Yuichi Yoshida; |
566 | Relational Attention: Generalizing Transformers for Graph-Structured Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As set processors, transformers are at a disadvantage in reasoning over more general graph-structured data where nodes represent entities and edges represent relations between entities. To address this shortcoming, we generalize transformer attention to consider and update edge vectors in each transformer layer. |
Cameron Diao; Ricky Loynd; |
567 | Holistic Adversarially Robust Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel method, HARP, that copes with aggressive pruning significantly better than prior work. |
Qi Zhao; Christian Wressnegger; |
568 | Classically Approximating Variational Quantum Machine Learning with Random Fourier Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our work, we propose a classical sampling method that can closely approximate most VQCs with Hamiltonian encoding, given only the description of their architecture. |
Jonas Landman; Slimane Thabet; Constantin Dalyac; Hela Mhiri; Elham Kashefi; |
569 | Distilling Model Failures As Directions in Latent Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This can make these methods labor-intensive and dataset-specific. To address these shortcomings, we present a scalable method for automatically distilling a model’s failure modes. |
Saachi Jain; Hannah Lawrence; Ankur Moitra; Aleksander Madry; |
570 | Theoretical Characterization of The Generalization Performance of Overfitted Meta-Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the theoretical understanding of when and why overparameterized models such as DNNs can generalize well in meta-learning is still limited. As an initial step towards addressing this challenge, this paper studies the generalization performance of overfitted meta-learning under a linear regression model with Gaussian features. |
Peizhong Ju; Yingbin Liang; Ness Shroff; |
571 | Stable Target Field for Reduced Variance Score Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to remedy the problem by incorporating a reference batch for minibatch updates where the reference batch is used to calculate weighted conditional scores as the more stable training targets. |
Yilun Xu; Shangyuan Tong; Tommi S. Jaakkola; |
572 | Unveiling The Sampling Density in Non-uniform Geometric Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce geometric graphs with hubs, an effective model for real-world graphs, and retrieve the sampling density by which those graphs are sampled from continuous latent spaces, to achieve various tasks. |
Raffaele Paolino; Aleksandar Bojchevski; Stephan Günnemann; Gitta Kutyniok; Ron Levie; |
573 | Interneurons Accelerate Learning Dynamics in Recurrent Neural Networks for Statistical Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the computational benefits of mediating recurrent communication via interneurons compared with direct recurrent connections. |
David Lipshutz; Cengiz Pehlevan; Dmitri Chklovskii; |
574 | Scaling Laws for A Multi-Agent Reinforcement Learning Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we present an extensive study of performance scaling for a cornerstone reinforcement learning algorithm, AlphaZero. |
Oren Neumann; Claudius Gros; |
575 | Combinatorial-Probabilistic Trade-Off: P-Values of Community Properties Test in The Stochastic Block Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an inferential framework testing the general community combinatorial properties of the stochastic block model. |
Shuting Shen; Junwei Lu; |
576 | Learning The Positions in CountSketch Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries. |
Yi Li; Honghao Lin; Simin Liu; Ali Vakilian; David Woodruff; |
577 | Language Modelling with Pixels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Tackling this bottleneck results in a trade-off between what can be represented in the embedding matrix and computational issues in the output layer. This paper introduces PIXEL, the Pixel-based Encoder of Language, which suffers from neither of these issues. |
Phillip Rust; Jonas F. Lotz; Emanuele Bugliarello; Elizabeth Salesky; Miryam de Lhoneux; Desmond Elliott; |
578 | Active Learning for Object Detection with Evidential Deep Learning and Hierarchical Uncertainty Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new active learning strategy for object detection that overcomes the shortcomings of prior works. |
Younghyun Park; Soyeong Kim; Wonjeong Choi; Dong-Jun Han; Jaekyun Moon; |
579 | Your Contrastive Learning Is Secretly Doing Stochastic Neighbor Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Contrastive learning, especially self-supervised contrastive learning (SSCL), has achieved great success in extracting powerful features from unlabeled data. In this work, we contribute to the theoretical understanding of SSCL and uncover its connection to the classic data visualization method, stochastic neighbor embedding (SNE), whose goal is preserving pairwise distances. |
Tianyang Hu; Zhili LIU; Fengwei Zhou; Wenjia Wang; Weiran Huang; |
580 | DexDeform: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to learn dexterous manipulation of deformable objects using multi-fingered hands. Concretely, we first collect a small set of human demonstrations using teleoperation. |
Sizhe Li; Zhiao Huang; Tao Chen; Tao Du; Hao Su; Joshua B. Tenenbaum; Chuang Gan; |
581 | Brain-like Representational Straightening of Natural Movies in Robust Feedforward Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we show robustness to noise can produce representational straightening in feedforward neural networks. |
Tahereh Toosi; Elias Issa; |
582 | Warping The Space: Weight Space Rotation for Class-Incremental Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce WaRP, the weight space rotation process, which transforms the original parameter space into a new space so that we can push most of the previous knowledge compactly into only a few important parameters. |
Do-Yeon Kim; Dong-Jun Han; Jun Seo; Jaekyun Moon; |
583 | Amortised Invariance Learning for Contrastive Self-Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the notion of amortized invariance learning for contrastive self supervision. |
Ruchika Chavhan; Jan Stuehmer; Calum Heggan; Mehrdad Yaghoobi; Timothy Hospedales; |
584 | Autoregressive Conditional Neural Processes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we instead propose to change how CNPs are deployed at test time, without any modifications to the model or training procedure. |
Wessel Bruinsma; Stratis Markou; James Requeima; Andrew Y. K. Foong; Anna Vaughan; Tom Andersson; Anthony Buonomo; Scott Hosking; Richard E Turner; |
585 | Control Graph As Unified IO for Morphology-Task Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore a method for learning a single policy that manipulates various forms of agents to solve various tasks by distilling a large amount of proficient behavioral data. |
Hiroki Furuta; Yusuke Iwasawa; Yutaka Matsuo; Shixiang Shane Gu; |
586 | Contrastive Corpus Attribution for Explaining Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although this enabled explanations of unsupervised models, the interpretation of this approach can still be opaque because similarity to the explicand’s representation may not be meaningful to humans. To address this, we propose contrastive corpus similarity, a novel and semantically meaningful scalar explanation output based on a reference corpus and a contrasting foil set of samples. |
Chris Lin; Hugh Chen; Chanwoo Kim; Su-In Lee; |
587 | ACMP: Allen-Cahn Message Passing with Attractive and Repulsive Forces for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Neural message passing is a basic feature extraction unit for graph-structured data considering neighboring node features in network propagation from one layer to the next. We model such process by an interacting particle system with attractive and repulsive forces and the Allen-Cahn force arising in the modeling of phase transition. |
Yuelin Wang; Kai Yi; Xinliang Liu; Yu Guang Wang; Shi Jin; |
588 | Discovering Generalizable Multi-agent Coordination Skills from Multi-task Offline Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Offline MARL algorithm to Discover coordInation Skills (ODIS) from multi-task data. |
Fuxiang Zhang; Chengxing Jia; Yi-Chen Li; Lei Yuan; Yang Yu; Zongzhang Zhang; |
589 | Unified Discrete Diffusion for Simultaneous Vision-Language Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The recently developed discrete diffusion model performs extraordinarily well in generation tasks, especially in the text-to-image task, showing great potential for modeling multimodal signals. In this paper, we leverage these properties and present a unified multimodal generation model, which can perform text-based, image-based, and even vision-language simultaneous generation using a single model. |
Minghui Hu; Chuanxia Zheng; Heliang Zheng; Tat-Jen Cham; Chaoyue Wang; Zuopeng Yang; Dacheng Tao; Ponnuthurai N. Suganthan; |
590 | CrAM: A Compression-Aware Minimizer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way, in order to produce models whose local loss behavior is stable under compression operations such as pruning. |
Alexandra Peste; Adrian Vladu; Eldar Kurtic; Christoph H Lampert; Dan Alistarh; |
591 | Winning Both The Accuracy of Floating Point Activation and The Simplicity of Integer Arithmetic Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, to simultaneously achieve the accuracy of FP activation and the simplicity of integer arithmetic, we present a method for replacing FP arithmetic with integer one without changing FP activations in the storage format while weights are quantized. |
Yulhwa Kim; Jaeyong Jang; Jehun Lee; Jihoon Park; Jeonghoon Kim; Byeongwook Kim; Baeseong park; Se Jung Kwon; Dongsoo Lee; jae-joon kim; |
592 | Variance-Aware Sparse Linear Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first variance-aware regret guarantee for sparse linear bandits: $\widetilde{\mathcal O}\left(\sqrt{d\sum_{t=1}^T \sigma_t^2} + 1\right)$, where $\sigma_t^2$ is the variance of the noise at the $t$-th round. |
Yan Dai; Ruosong Wang; Simon Shaolei Du; |
593 | DDM$^2$: Self-Supervised Diffusion MRI Denoising with Generative Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we propose Denoising Diffusion Models for Denoising Diffusion MRI (DDM$^2$), a self-supervised denoising method for MRI denoising using diffusion denoising generative models. |
Tiange Xiang; Mahmut Yurt; Ali B Syed; Kawin Setsompop; Akshay Chaudhari; |
594 | Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Voxurf, a voxel-based surface reconstruction approach that is both efficient and accurate. |
Tong Wu; Jiaqi Wang; Xingang Pan; Xudong XU; Christian Theobalt; Ziwei Liu; Dahua Lin; |
595 | Guiding Energy-based Models Via Contrastive Latent Variables Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel and effective framework for improving EBMs via contrastive representation learning (CRL). |
Hankook Lee; Jongheon Jeong; Sejun Park; Jinwoo Shin; |
596 | The Curious Case of Benign Memorization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate that deep models have the surprising ability to separate noise from signal by distributing the task of memorization and feature learning to different layers. |
Sotiris Anagnostidis; Gregor Bachmann; Lorenzo Noci; Thomas Hofmann; |
597 | Non-parametric Outlier Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel framework, non-parametric outlier synthesis (NPOS), which generates artificial OOD training data and facilitates learning a reliable decision boundary between ID and OOD data. |
Leitian Tao; Xuefeng Du; Jerry Zhu; Yixuan Li; |
598 | Deep Ranking Ensembles for Hyperparameter Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a result, we present a novel method that meta-learns neural network surrogates optimized for ranking the configurations’ performances while modeling their uncertainty via ensembling. |
Abdus Salam Khazi; Sebastian Pineda Arango; Josif Grabocka; |
599 | Unmasking The Lottery Ticket Hypothesis: What’s Encoded in A Winning Ticket’s Mask? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: And why is iterative pruning needed, i.e. why can’t we prune to very high sparsities in one shot? We develop answers to these questions in terms of the geometry of the error landscape. |
Mansheej Paul; Feng Chen; Brett W. Larsen; Jonathan Frankle; Surya Ganguli; Gintare Karolina Dziugaite; |
600 | Building A Subspace of Policies for Scalable Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to strike a better balance between scalability and performance by designing a method whose size grows adaptively depending on the task sequence. |
Jean-Baptiste Gaya; Thang Doan; Lucas Caccia; Laure Soulier; Ludovic Denoyer; Roberta Raileanu; |
601 | Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce the concept of the multi-view point cloud (Voint cloud), representing each 3D point as a set of features extracted from several view-points. |
Abdullah Hamdi; Silvio Giancola; Bernard Ghanem; |
602 | Constructive TT-representation of The Tensors Given As Index Interaction Functions with Applications Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a method to build explicit tensor-train (TT) representations. |
Gleb Ryzhakov; Ivan Oseledets; |
603 | Real-Time Image Demoiréing on Mobile Devices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we launch the first study on accelerating demoiréing networks and propose a dynamic demoiréing acceleration method (DDA) towards a real-time deployment on mobile devices. |
Yuxin Zhang; Mingbao Lin; Xunchao Li; Han Liu; Guozhi Wang; Fei Chao; Ren Shuai; Yafei Wen; Xiaoxin Chen; Rongrong Ji; |
604 | Simple and Scalable Nearest Neighbor Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple and scalable nearest neighbor machine translation framework to drastically promote the decoding and storage efficiency of $k$NN-based models while maintaining the translation performance. |
Yuhan Dai; Zhirui Zhang; Qiuzhi Liu; Qu Cui; Weihua Li; Yichao Du; Tong Xu; |
605 | Near-optimal Coresets for Robust Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider robust clustering problems in $\mathbb{R}^d$, specifically $k$-clustering problems (e.g., $k$-Median and $k$-Means) with $m$ outliers, where the cost for a given center set $C \subset \mathbb{R}^d$ aggregates the distances from $C$ to all but the furthest $m$ data points, instead of all points as in classical clustering. |
Lingxiao Huang; Shaofeng H.-C. Jiang; Jianing Lou; Xuan Wu; |
606 | Temporal Dependencies in Feature Importance for Time Series Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Time series data introduces two key challenges for explainability methods: firstly, observations of the same feature over subsequent time steps are not independent, and secondly, the same feature can have varying importance to model predictions over time. In this paper, we propose Windowed Feature Importance in Time (WinIT), a feature removal based explainability approach to address these issues. |
Kin Kwan Leung; Clayton Rooke; Jonathan Smith; Saba Zuberi; Maksims Volkovs; |
607 | Reliability of CKA As A Similarity Measure in Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we present analysis that formally characterizes CKA sensitivity to a large class of simple transformations, which can naturally occur in the context of modern machine learning. |
MohammadReza Davari; Stefan Horoi; Amine Natik; Guillaume Lajoie; Guy Wolf; Eugene Belilovsky; |
608 | Identifiability Results for Multimodal Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present new identifiability results for multimodal contrastive learning, showing that it is possible to recover shared factors in a more general setup than the multi-view setting studied previously. |
Imant Daunhawer; Alice Bizeul; Emanuele Palumbo; Alexander Marx; Julia E Vogt; |
609 | Prototypical Calibration for Few-shot Learning of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose prototypical calibration to adaptively learn a more robust decision boundary for zero- and few-shot classification, instead of greedy decoding. |
Zhixiong Han; Yaru Hao; Li Dong; Yutao Sun; Furu Wei; |
610 | Rotamer Density Estimators Are Unsupervised Learners of The Effect of Mutations on Protein-Protein Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate that mutational effects on binding can be predicted by the change in conformational flexibility of the protein-protein interface. |
Shitong Luo; Yufeng Su; Zuofan Wu; Chenpeng Su; Jian Peng; Jianzhu Ma; |
611 | Exploring Active 3D Object Detection from A Generalization Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our empirical study, however, suggests that mainstream uncertainty-based and diversity-based active learning policies are not effective when applied in the 3D detection task, as they fail to balance the trade-off between point cloud informativeness and box-level annotation costs. To overcome this limitation, we jointly investigate three novel criteria in our framework CRB for point cloud acquisition – label conciseness, feature representativeness and geometric balance, which hierarchically filters out the point clouds of redundant 3D bounding box labels, latent features and geometric characteristics (e.g., point cloud density) from the unlabeled sample pool and greedily selects informative ones with fewer objects to annotate. |
Yadan Luo; Zhuoxiao Chen; Zijian Wang; Xin Yu; Zi Huang; Mahsa Baktashmotlagh; |
612 | Copy Is All You Need Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formulate text generation as progressively copying text segments (e.g., words or phrases) from an existing text collection. |
Tian Lan; Deng Cai; Yan Wang; Heyan Huang; Xian-Ling Mao; |
613 | Latent State Marginalization As A Low-cost Approach to Improving Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the adoption of latent variable policies within the MaxEnt framework, which can provably approximate any policy distribution and, additionally, naturally emerge under the use of world models with a latent belief state. |
Dinghuai Zhang; Aaron Courville; Yoshua Bengio; Qinqing Zheng; Amy Zhang; Ricky T. Q. Chen; |
614 | Generative Augmented Flow Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Indeed, intermediate rewards play a critical role in learning; for example, intrinsic motivation provides intermediate feedback even in particularly challenging sparse-reward tasks. Inspired by this, we propose Generative Augmented Flow Networks (GAFlowNets), a novel learning framework to incorporate intermediate rewards into GFlowNets. |
Ling Pan; Dinghuai Zhang; Aaron Courville; Longbo Huang; Yoshua Bengio; |
615 | What Makes Convolutional Models Great on Long Sequence Modeling? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on the two principles, we propose a simple yet effective convolutional model called Structured Global Convolution (SGConv). |
Yuhong Li; Tianle Cai; Yi Zhang; Deming Chen; Debadeepta Dey; |
616 | Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This way, the finetuned models are biased towards the limited reference solutions, which limits their generalization to unseen examples. To mitigate this issue, we propose to let the model perform sampling during training and learn from both self-sampled fully-correct solutions, which yield the correct answer upon execution, and partially-correct solutions, whose intermediate state matches an intermediate state of a known correct solution. |
Ansong Ni; Jeevana Priya Inala; Chenglong Wang; Alex Polozov; Christopher Meek; Dragomir Radev; Jianfeng Gao; |
617 | GFlowNets and Variational Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper builds bridges between two families of probabilistic algorithms: (hierarchical) variational inference (VI), which is typically used to model distributions over continuous spaces, and generative flow networks (GFlowNets), which have been used for distributions over discrete structures such as graphs. |
Nikolay Malkin; Salem Lahlou; Tristan Deleu; Xu Ji; Edward J Hu; Katie E Everett; Dinghuai Zhang; Yoshua Bengio; |
618 | Unsupervised 3D Object Learning Through Neuron Activity Aware Plasticity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an unsupervised deep learning model for 3D object classification. |
Beomseok Kang; Biswadeep Chakraborty; Saibal Mukhopadhyay; |
619 | Heterogeneous Neuronal and Synaptic Dynamics for Spike-Efficient Unsupervised Learning: Theory and Design Principles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper shows that the heterogeneity in neuronal and synaptic dynamics reduces the spiking activity of a Recurrent Spiking Neural Network (RSNN) while improving prediction performance, enabling spike-efficient (unsupervised) learning. |
Biswadeep Chakraborty; Saibal Mukhopadhyay; |
620 | Associative Memory Augmented Asynchronous Spatiotemporal Representation Learning for Event-based Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose $\textit{EventFormer}$, a computationally efficient event-based representation learning framework for asynchronously processing event camera data. |
Uday Kamal; Saurabh Dash; Saibal Mukhopadhyay; |
621 | Backstepping Temporal Difference Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the divergent behavior, several off-policy TD learning algorithms have been developed until now. In this work, we provide a unified view of such algorithms from a purely control-theoretic perspective. |
Han-Dong Lim; Donghwan Lee; |
622 | TrojText: Test-time Invisible Textual Trojan Insertion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose TrojText to study whether the invisible textual Trojan attack can be efficiently performed without the presence of training data in a more realistic and cost-efficient manner. |
Yepeng Liu; Bo Feng; Qian Lou; |
623 | Cross-Layer Retrospective Retrieving Via Layer Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: More and more evidence has shown that strengthening layer interactions can enhance the representation power of a deep neural network, while self-attention excels at learning interdependencies by retrieving query-activated information. Motivated by this, we devise a cross-layer attention mechanism, called multi-head recurrent layer attention (MRLA), that sends a query representation of the current layer to all previous layers to retrieve query-related information from different levels of receptive fields. |
Yanwen Fang; Yuxi CAI; Jintai Chen; Jingyu Zhao; Guangjian Tian; Guodong Li; |
624 | Provable Defense Against Geometric Transformations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, no prior work has been able to incorporate the objective of deterministic certified robustness against geometric transformations into the training procedure, as existing verifiers are exceedingly slow. To address these challenges, we propose the first provable defense for deterministic certified geometric robustness. |
Rem Yang; Jacob Laurel; Sasa Misailovic; Gagandeep Singh; |
625 | Improving Deep Regression with Ordinal Entropy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on the analysis, we propose an ordinal entropy loss to encourage higher-entropy feature spaces while maintaining ordinal relationships to improve the performance of regression tasks. |
Shihao Zhang; Linlin Yang; Michael Bi Mi; Xiaoxu Zheng; Angela Yao; |
626 | Learning Object-Language Alignments for Open-Vocabulary Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel open-vocabulary object detection framework directly learning from image-text pair data. |
Chuang Lin; Peize Sun; Yi Jiang; Ping Luo; Lizhen Qu; Gholamreza Haffari; Zehuan Yuan; Jianfei Cai; |
627 | SP2: A Second Order Stochastic Polyak Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show SP2 is very competitive on matrix completion, non-convex test problems and logistic regression. |
Shuang Li; William Joseph Swartworth; Martin Takáč; Deanna Needell; Robert M. Gower; |
628 | ExpressivE: A Spatio-Functional Embedding For Knowledge Graph Completion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Embedding models have yielded promising results for KGC, yet no current KGC embedding model is capable of: (1) fully capturing vital inference patterns (e.g., composition), (2) capturing prominent logical rules jointly (e.g., hierarchy and composition), and (3) providing an intuitive interpretation of captured patterns. In this work, we propose ExpressivE, a fully expressive spatio-functional embedding model that solves all these challenges simultaneously. |
Aleksandar Pavlovic; Emanuel Sallinger; |
629 | Pareto-Optimal Diagnostic Policy Learning in Clinical Applications Via Semi-Model-Based Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we use reinforcement learning (RL) to find a dynamic policy that selects lab test panels sequentially based on previous observations, ensuring accurate testing at a low cost. |
Zheng Yu; Yikuan Li; Joseph Chahn Kim; Kaixuan Huang; Yuan Luo; Mengdi Wang; |
630 | Memory Gym: Partially Observable Challenges to Memory-Based Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Memory Gym is a novel benchmark for challenging Deep Reinforcement Learning agents to memorize events across long sequences, be robust to noise, and generalize. |
Marco Pleines; Matthias Pallasch; Frank Zimmer; Mike Preuss; |
631 | Multi-lingual Evaluation of Code Generation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present MBXP, an execution-based code completion benchmark in 10+ programming languages. |
Ben Athiwaratkun; Sanjay Krishna Gouda; Zijian Wang; Xiaopeng Li; Yuchen Tian; Ming Tan; Wasi Uddin Ahmad; Shiqi Wang; Qing Sun; Mingyue Shang; Sujan Kumar Gonugondla; Hantian Ding; Varun Kumar; Nathan Fulton; Arash Farahani; Siddhartha Jain; Robert Giaquinto; Haifeng Qian; Murali Krishna Ramanathan; Ramesh Nallapati; Baishakhi Ray; Parminder Bhatia; Sudipta Sengupta; Dan Roth; Bing Xiang; |
632 | Neural Design for Genetic Perturbation Experiments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work provides a theoretically sound framework for iteratively exploring the space of perturbations in pooled batches in order to maximize a target phenotype under an experimental budget. Inspired by this application domain, we study the problem of batch query bandit optimization and introduce the Optimistic Arm Elimination ($\mathrm{OAE}$) principle designed to find an almost optimal arm under different functional relationships between the queries (arms) and the outputs (rewards). |
Aldo Pacchiano; Drausin Wulsin; Robert A Barton; Luis Voloch; |
633 | MPCFormer: Fast, Performant and Private Transformer Inference with MPC Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we design the framework MPCFormer using secure multi-party computation (MPC) and Knowledge Distillation (KD). |
Dacheng Li; Hongyi Wang; Rulin Shao; Han Guo; Eric Xing; Hao Zhang; |
634 | Searching Lottery Tickets in Graph Neural Networks: A Dual Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the searching of graph lottery tickets from a complementary perspective — transforming a random ticket into a graph lottery ticket, which allows us to more comprehensively explore the relationships between the original network/graph and their sparse counterpart. |
Kun Wang; Yuxuan Liang; Pengkun Wang; Xu Wang; Pengfei Gu; Junfeng Fang; Yang Wang; |
635 | Learning Adversarial Linear Mixture Markov Decision Processes with Bandit Feedback and Unknown Transition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an efficient algorithm LSUOB-REPS which achieves $\widetilde{O}(dS^2\sqrt{K}+\sqrt{HSAK})$ regret guarantee with high probability, where $d$ is the ambient dimension of the feature mapping, $S$ is the size of the state space, $A$ is the size of the action space, $H$ is the episode length and $K$ is the number of episodes. |
Canzhe Zhao; Ruofeng Yang; Baoxiang Wang; Shuai Li; |
636 | Partial Label Unsupervised Domain Adaptation with Class-Prototype Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this assumption does not hold in many real-world scenarios where the training and test data come from different distributions. In this paper, we formalize this learning scenario as a new problem called partial label unsupervised domain adaptation (PLUDA). |
Yan Yan; Yuhong Guo; |
637 | ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ResAct, which seeks a policy that is close to, but better than, the online-serving policy. In this way, we can collect sufficient data near the learned policy so that state-action values can be properly estimated, and there is no need to perform online exploration. |
Wanqi Xue; Qingpeng Cai; Ruohan Zhan; Dong Zheng; Peng Jiang; Kun Gai; Bo An; |
638 | Tailoring Language Generation Models Under Total Variation Distance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, in the attempt to cover the low-probability regions in the data distribution, the model systematically overestimates the probability of corrupted text sequences, which we conjecture is one of the main reasons for text degeneration during autoregressive decoding. To remedy this problem, we leverage the total variation distance (TVD) with its robustness to outliers, and develop practical bounds to apply it to language generation. |
Haozhe Ji; Pei Ke; Zhipeng Hu; Rongsheng Zhang; Minlie Huang; |
639 | Toward Adversarial Training on Contextualized Language Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on the observations, we propose simple yet effective \textit{Contextualized representation-Adversarial Training} (CreAT), in which the attack is explicitly optimized to deviate the contextualized representation and obtains the global worst-case adversarial examples. |
Hongqiu Wu; Yongxiang Liu; Hanwen Shi; Hai Zhao; Min Zhang; |
640 | Composite Slice Transformer: An Efficient Transformer with Composition of Multi-Scale Multi-Range Attentions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Composite Slice Transformer (CST), a Transformer-based network equipped with a composition of multi-scale multi-range attentions, boosting both efficiency and modeling capability. |
Mingu Lee; Saurabh Pitre; Tianyu Jiang; Pierre-David Letourneau; Matthew J Morse; Kanghwan Jang; Joseph Soriaga; Parham Noorzad; Hsin-Pai Cheng; Christopher Lott; |
641 | User-Interactive Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: At the same time, offline RL algorithms are not able to tune their most important hyperparameter – the proximity of the learned policy to the original policy. We propose an algorithm that allows the user to tune this hyperparameter at runtime, thereby addressing both of the above-mentioned issues simultaneously. |
Phillip Swazinna; Steffen Udluft; Thomas Runkler; |
642 | Monocular Scene Reconstruction with 3D SDF Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an SDF transformer network, which replaces the role of 3D CNN for better 3D feature aggregation. |
Weihao Yuan; Xiaodong Gu; Heng Li; Zilong Dong; Siyu Zhu; |
643 | Dense RGB SLAM with Neural Implicit Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a dense RGB SLAM method with neural implicit map representation. |
Heng Li; Xiaodong Gu; Weihao Yuan; Luwei Yang; Zilong Dong; Ping Tan; |
644 | Federated Learning As Variational Inference: A Scalable Expectation Propagation Approach Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper extends the inference view and describes a variational inference formulation of federated learning where the goal is to find a global variational posterior that well-approximates the true posterior. |
Han Guo; Philip Greengard; Hongyi Wang; Andrew Gelman; Eric Xing; Yoon Kim; |
645 | Near-optimal Policy Identification in Active Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the AE-LSVI algorithm for best policy identification, a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration (AE). |
Xiang Li; Viraj Mehta; Johannes Kirschner; Ian Char; Willie Neiswanger; Jeff Schneider; Andreas Krause; Ilija Bogunovic; |
646 | Semi-supervised Community Detection Via Structural Similarity Metrics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a fast algorithm that computes a ‘structural similarity metric’ between the new node and each of the $K$ communities, aggregating information in labeled and unlabeled data. |
Yicong Jiang; Tracy Ke; |
647 | Diffusion Models for Causal Discovery Via Topological Ordering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing computational methods for obtaining the Hessian still do not scale as the number of variables and the number of samples are increased. Therefore, inspired by recent innovations in diffusion probabilistic models (DPMs), we propose \emph{DiffAN}, a topological ordering algorithm that leverages DPMs. |
Pedro Sanchez; Xiao Liu; Alison Q O’Neil; Sotirios A. Tsaftaris; |
648 | FoSR: First-order Spectral Rewiring for Addressing Oversquashing in GNNs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a computationally efficient algorithm that prevents oversquashing by systematically adding edges to the graph based on spectral expansion. |
Kedar Karhadkar; Pradeep Kr. Banerjee; Guido Montufar; |
649 | One-Pixel Shortcut: On The Learning Preference of Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we resolve this problem from a novel perspective by perturbing only one pixel in each image. Using OPS, we introduce an unlearnable dataset called CIFAR-10-S, which is indistinguishable from CIFAR-10 by humans but drives trained models to extremely low accuracy. |
Shutong Wu; Sizhe Chen; Cihang Xie; Xiaolin Huang; |
650 | Defending Against Adversarial Audio Via Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an adversarial purification-based defense pipeline, AudioPure, for acoustic systems via off-the-shelf diffusion models. |
Shutong Wu; Jiongxiao Wang; Wei Ping; Weili Nie; Chaowei Xiao; |
651 | Calibration Matters: Tackling Maximization Bias in Large-scale Advertising Recommendation Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It persists even if unbiased predictions are achieved on every datapoint and worsens when covariate shifts exist between the training and test sets. To mitigate this problem, we quantify maximization bias and propose a variance-adjusting debiasing (VAD) meta-algorithm in this paper. |
Yewen Fan; Nian Si; Kun Zhang; |
652 | Scalable Estimation of Nonparametric Markov Networks with Mixed-Type Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we generalize the characterization of the conditional independence structure to handle general distributions for all data types (i.e., continuous, discrete, and mixed-type) with general functional relations among variables, thus giving rise to a Markov network structure learning algorithm in one of the most general settings. |
Yujia Zheng; Ignavier Ng; Yewen Fan; Kun Zhang; |
653 | FIT: A Metric for Model Sensitivity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that FIT can estimate the final performance of a network without retraining. |
Ben Zandonati; Adrian Alan Pol; Maurizio Pierini; Olya Sirkin; Tal Kopetz; |
654 | Avoiding Spurious Correlations Via Logit Correction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explicitly account for the potential spurious correlations present in the majority of the training data. |
Sheng Liu; Xu Zhang; Nitesh Sekhar; Yue Wu; Prateek Singhal; Carlos Fernandez-Granda; |
655 | Outcome-directed Reinforcement Learning By Uncertainty & Temporal Distance-Aware Curriculum Goal Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate this, we propose an uncertainty & temporal distance-aware curriculum goal generation method for outcome-directed RL via solving a bipartite matching problem. |
Daesol Cho; Seungjae Lee; H. Jin Kim; |
656 | Packed Ensembles for Efficient Uncertainty Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Packed-Ensembles (PE), a strategy to design and train lightweight structured ensembles by carefully modulating the dimension of their encoding space. |
Olivier Laurent; Adrien Lafage; Enzo Tartaglione; Geoffrey Daniel; Jean-marc Martinez; Andrei Bursuc; Gianni Franchi; |
657 | Calibrating The Rigged Lottery: Making All Tickets Reliable Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new sparse training method to produce sparse models with improved confidence calibration. |
Bowen Lei; Ruqi Zhang; Dongkuan Xu; Bani Mallick; |
658 | Dynamic Update-to-Data Ratio: Minimizing World Model Overfitting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a solution, we propose a new general method that dynamically adjusts the update-to-data (UTD) ratio during training based on under- and overfitting detection on a small subset of the continuously collected experience not used for training. |
Nicolai Dorka; Tim Welschehold; Wolfram Burgard; |
659 | Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that language models tend to perform fairly well at single-step inference or entailment tasks, but struggle to chain together multiple reasoning steps to solve more complex problems. In light of this, we propose a Selection-Inference (SI) framework that exploits pre-trained LLMs as general processing modules, and alternates between selection and inference to generate a series of interpretable, causal reasoning steps leading to the final answer. |
Antonia Creswell; Murray Shanahan; Irina Higgins; |
660 | Iterative Patch Selection for High-Resolution Image Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a simple method, Iterative Patch Selection (IPS), which decouples the memory usage from the input size and thus enables the processing of arbitrarily large images under tight hardware constraints. |
Benjamin Bergner; Christoph Lippert; Aravindh Mahendran; |
661 | Neural Networks Efficiently Learn Low-Dimensional Representations with SGD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of training a two-layer neural network (NN) of arbitrary width using stochastic gradient descent (SGD) where the input $\boldsymbol{x}\in \mathbb{R}^d$ is Gaussian and the target $y \in \mathbb{R}$ follows a multiple-index model, i.e., $y=g(\langle\boldsymbol{u_1}, \boldsymbol{x}\rangle, \ldots, \langle\boldsymbol{u_k},\boldsymbol{x}\rangle)$ with a noisy link function $g$. |
Alireza Mousavi-Hosseini; Sejun Park; Manuela Girotti; Ioannis Mitliagkas; Murat A Erdogdu; |
662 | Contextual Bandits with Concave Rewards, and An Application to Fair Ranking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first algorithm with provably vanishing regret for CBCR without restrictions on the policy space, whereas prior works were restricted to finite policy spaces or tabular representations. |
Virginie Do; Elvis Dohmatob; Matteo Pirotta; Alessandro Lazaric; Nicolas Usunier; |
663 | MA-BERT: Towards Matrix Arithmetic-only BERT Inference By Eliminating Complex Non-linear Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose four correlated techniques that include approximating softmax with a two-layer neural network, replacing GELU with ReLU, fusing normalization layers with adjacent linear layers, and leveraging knowledge transfer from baseline models. |
Neo Wei Ming; Zhehui Wang; Cheng Liu; Rick Siow Mong Goh; Tao Luo; |
664 | Stochastic Multi-Person 3D Motion Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a dual-level generative modeling framework that separately models independent individual movements at the local level and social interactions at the global level. |
Sirui Xu; Yu-Xiong Wang; Liangyan Gui; |
665 | Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to automatically design efficient 3D CNN architectures via a novel training-free neural architecture search approach tailored for 3D CNNs considering the model complexity. |
Junyan Wang; Zhenhong Sun; Yichen Qian; Dong Gong; Xiuyu Sun; Ming Lin; Maurice Pagnucco; Yang Song; |
666 | Training Language Models for Deeper Understanding Improves Brain Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is still an open question whether these models are learning a deeper understanding of the text, or if the models are simply learning a heuristic to complete the task. This work investigates this further by turning to the one language processing system that truly understands complex language: the human brain. |
Khai Loong Aw; Mariya Toneva; |
667 | Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While previous studies to automate formalization focused on powerful search algorithms, no attempts were made to take advantage of available informal proofs. In this work, we introduce Draft, Sketch, and Prove (DSP), a method that maps informal proofs to formal proof sketches, and uses the sketches to guide an automated prover by directing its search to easier sub-problems. |
Albert Qiaochu Jiang; Sean Welleck; Jin Peng Zhou; Timothee Lacroix; Jiacheng Liu; Wenda Li; Mateja Jamnik; Guillaume Lample; Yuhuai Wu; |
668 | Mini-batch $k$-means Terminates Within $O(d/\epsilon)$ Iterations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We answer the question: Does \emph{local} progress (on batches) imply \emph{global} progress (on the entire dataset) for mini-batch $k$-means? |
Gregory Schwartzman; |
669 | Active Learning in Bayesian Neural Networks with Balanced Entropy Learning Principle Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we design and propose a new uncertainty measure, Balanced Entropy Acquisition (BalEntAcq), which captures the information balance between the uncertainty of underlying softmax probability and the label variable. |
Jae Oh Woo; |
670 | Explaining Temporal Graph Models Through An Explorer-Navigator Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge the gap, in this paper, we propose T-GNNExplainer for temporal graph model explanation. |
Wenwen Xia; Mincai Lai; Caihua Shan; Yao Zhang; Xinnan Dai; Xiang Li; Dongsheng Li; |
671 | FedExP: Speeding Up Federated Averaging Via Extrapolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present FedExP, a method to adaptively determine the server step size in FL based on dynamically varying pseudo-gradients throughout the FL process. |
Divyansh Jhunjhunwala; Shiqiang Wang; Gauri Joshi; |
672 | Planning with Large Language Models for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel Transformer decoding algorithm, Planning-Guided Transformer Decoding (PG-TD), that uses a planning algorithm to do lookahead search and guide the Transformer to generate better programs. |
Shun Zhang; Zhenfang Chen; Yikang Shen; Mingyu Ding; Joshua B. Tenenbaum; Chuang Gan; |
673 | Hyper-Decision Transformer for Efficient Online Policy Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Decision Transformers (DT) have demonstrated strong performances in offline reinforcement learning settings, but quickly adapting to unseen novel tasks remains challenging. To address this challenge, we propose a new framework, called Hyper-Decision Transformer (HDT), that can generalize to novel tasks from a handful of demonstrations in a data- and parameter-efficient manner. |
Mengdi Xu; Yuchen Lu; Yikang Shen; Shun Zhang; Ding Zhao; Chuang Gan; |
674 | Multimodal Analogical Reasoning Over Knowledge Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce the new task of multimodal analogical reasoning over a knowledge graph, which requires multimodal reasoning ability with the help of background knowledge. Specifically, we construct a Multimodal Analogical Reasoning dataSet (MARS) and a multimodal knowledge graph MarKG. |
Ningyu Zhang; Lei Li; Xiang Chen; Xiaozhuan Liang; Shumin Deng; Huajun Chen; |
675 | Provable Memorization Capacity of Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present the first study of the memorization capacity of the Transformer architecture. |
Junghwan Kim; Michelle Kim; Barzan Mozafari; |
676 | Continuous Pseudo-labeling from The Start Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We believe this has the potential for over-fitting to the labeled dataset in low resource settings and that ST from the start of training should reduce over-fitting. In this paper we show how we can do this by dynamically controlling the evolution of PLs during the training process in ASR. |
Dan Berrebbi; Ronan Collobert; Samy Bengio; Navdeep Jaitly; Tatiana Likhomanenko; |
677 | Recursive Time Series Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Training on available realizations, where data is limited, often induces severe over-fitting thereby preventing generalization. To address this issue, we introduce a general recursive framework for time series augmentation, which we call the Recursive Interpolation Method (RIM). |
Amine Mohamed Aboussalah; Minjae Kwon; Raj G Patel; Cheng Chi; Chi-Guhn Lee; |
678 | InCoder: A Generative Model for Code Infilling and Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via masking and infilling). |
Daniel Fried; Armen Aghajanyan; Jessy Lin; Sida Wang; Eric Wallace; Freda Shi; Ruiqi Zhong; Scott Yih; Luke Zettlemoyer; Mike Lewis; |
679 | Learning Where and When to Reason in Neuro-symbolic Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, all the existing methods usually either impose the constraints in a “weak” form at training time, with no guarantees at inference, or fail to provide a general framework that supports different tasks and constraint types. We tackle this open problem from a neuro-symbolic perspective. |
Cristina Cornelio; Jan Stuehmer; Shell Xu Hu; Timothy Hospedales; |
680 | A Time Series Is Worth 64 Words: Long-term Forecasting with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. |
Yuqi Nie; Nam H Nguyen; Phanwadee Sinthong; Jayant Kalagnanam; |
681 | Diversify and Disambiguate: Out-of-Distribution Robustness Via Disagreement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose DivDis, a simple two-stage framework for identifying and resolving ambiguity in data. |
Yoonho Lee; Huaxiu Yao; Chelsea Finn; |
682 | Surgical Fine-Tuning Improves Adaptation to Distribution Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A common approach to transfer learning under distribution shift is to fine-tune the last few layers of a pre-trained model, preserving learned features while also adapting to the new task. This paper shows that in such settings, selectively fine-tuning a subset of layers (which we term surgical fine-tuning) matches or outperforms commonly used fine-tuning approaches. |
Yoonho Lee; Annie S Chen; Fahim Tajwar; Ananya Kumar; Huaxiu Yao; Percy Liang; Chelsea Finn; |
683 | Predictive Inference with Feature Conformal Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose feature conformal prediction, which extends the scope of conformal prediction to semantic feature spaces by leveraging the inductive bias of deep representation learning. |
Jiaye Teng; Chuan Wen; Dinghuai Zhang; Yoshua Bengio; Yang Gao; Yang Yuan; |
684 | On The Edge of Benign Overfitting: Label Noise and Overparameterization Level Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we examine whether overfitting is truly benign in real-world classification tasks. |
Kaiyue Wen; Jiaye Teng; Jingzhao Zhang; |
685 | Sampling with Mollified Interaction Energy Descent Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a new optimization-based method for sampling called mollified interaction energy descent (MIED). |
Lingxiao Li; Qiang Liu; Anna Korba; Mikhail Yurochkin; Justin Solomon; |
686 | A Non-asymptotic Analysis of Oversmoothing in Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we distinguish between two different effects when applying graph convolutions—an undesirable mixing effect that homogenizes node representations in different classes, and a desirable denoising effect that homogenizes node representations in the same class. |
Xinyi Wu; Zhengdao Chen; William Wei Wang; Ali Jadbabaie; |
687 | Systematic Rectification of Language Models Via Dead-end Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we formally extend the dead-end theory from the recent reinforcement learning (RL) literature to also cover uncertain outcomes. |
Meng Cao; Mehdi Fatemi; Jackie CK Cheung; Samira Shabanian; |
688 | CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work aims to tackle a major challenge in offline Inverse Reinforcement Learning (IRL), namely the reward extrapolation error, where the learned reward function may fail to explain the task correctly and misguide the agent in unseen environments due to the intrinsic covariate shift. |
Sheng Yue; Guanbo Wang; Wei Shao; Zhaofeng Zhang; Sen Lin; Ju Ren; Junshan Zhang; |
689 | Agent-based Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel graph neural network we call AgentNet, which is designed specifically for graph-level tasks. |
Karolis Martinkus; Pál András Papp; Benedikt Schesch; Roger Wattenhofer; |
690 | Compositional Task Generalization with Discovered Successor Feature Modules Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a novel neural network architecture, “Modular Successor Feature Approximators” (MSFA), where modules both discover what is useful to predict, and learn their own predictive representations. |
Wilka Torrico Carvalho; Angelos Filos; Richard Lewis; Honglak Lee; Satinder Singh; |
691 | Spike Calibration: Bridging The Gap Between ANNs and SNNs in ANN-SNN Conversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the performance degrades severely under low time-steps, which hampers the practical applications of SNNs on neuromorphic chips. In this paper, instead of evaluating different conversion errors and then eliminating these errors, we define offset spike to measure the deviation degree of actual and desired firing rates of SNNs. |
Zecheng Hao; Jianhao Ding; Tong Bu; Tiejun Huang; Zhaofei Yu; |
692 | An Equal-Size Hard EM Algorithm for Diverse Dialogue Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an Equal-size Hard Expectation–Maximization (EqHard-EM) algorithm to train a multi-decoder model for diverse dialogue generation. |
Yuqiao Wen; Yongchang Hao; Yanshuai Cao; Lili Mou; |
693 | DepthFL : Depthwise Federated Learning for Heterogeneous Clients Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a new approach based on depth scaling called DepthFL. |
Minjae Kim; Sangyoon Yu; Suhyun Kim; Soo-Mook Moon; |
694 | Neural Agents Struggle to Take Turns in Bidirectional Emergent Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the conditions under which artificial agents may naturally develop turn-taking conventions in a simple language game. |
Valentin Taillandier; Dieuwke Hupkes; Benoît Sagot; Emmanuel Dupoux; Paul Michel; |
695 | MaskFusion: Feature Augmentation for Click-Through Rate Prediction Via Input-adaptive Mask Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these works either suffer from poor high-order feature interaction modeling using DNN or ignore the balance between generalization and memorization during the recommendation. To mitigate these problems, we propose an adaptive feature fusion framework called MaskFusion, to additionally capture the explicit interactions between the input feature and the existing deep part structure of deep CTR models dynamically, besides the common feature interactions proposed in existing works. |
Chao Liao; Jianchao Tan; Jiyuan Jia; Yi Guo; Chengru Song; |
696 | Correlative Information Maximization Based Biologically Plausible Neural Networks for Correlated Source Separation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Most prior attempts at this problem proposed neural networks that implement independent component analysis, which works under the limitation that latent elements are mutually independent. Here, we relax this limitation and propose a biologically plausible neural network that extracts correlated latent sources by exploiting information about their domains. |
Bariscan Bozkurt; Ateş İsfendiyaroğlu; Cengiz Pehlevan; Alper Tunga Erdogan; |
697 | Multifactor Sequential Disentanglement Via Structured Koopman Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Overall, we propose a simple and easy to code new deep model that is fully unsupervised and it supports multifactor disentanglement. |
Nimrod Berman; Ilan Naiman; Omri Azencot; |
698 | Learnable Behavior Control: Breaking Atari Human World Records Via Sample-Efficient Behavior Selection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a general framework called Learnable Behavioral Control (LBC) to address the limitation, which a) enables a significantly enlarged behavior selection space via formulating a hybrid behavior mapping from all policies; b) constructs a unified goal-directed learnable process for behavior selection. |
Jiajun Fan; Yuzheng Zhuang; Yuecheng Liu; Jianye Hao; Bin Wang; Jiangcheng Zhu; Hao Wang; Shu-Tao Xia; |
699 | Variance Reduction Is An Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression As A Cherry on The Top Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We derive theoretical convergence guarantees for Byz-VR-MARINA outperforming previous state-of-the-art for general non-convex and Polyak-Łojasiewicz loss functions. |
Eduard Gorbunov; Samuel Horváth; Peter Richtárik; Gauthier Gidel; |
700 | Weakly Supervised Explainable Phrasal Reasoning with Neural Fuzzy Logic Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we address the explainability for NLI by weakly supervised logical reasoning, and propose an Explainable Phrasal Reasoning (EPR) approach. |
Zijun Wu; Zi Xuan Zhang; Atharva Naik; Zhijian Mei; Mauajama Firdaus; Lili Mou; |
701 | Certified Defences Against Adversarial Patch Attacks on Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Adversarial patch attacks are an emerging security threat for real world deep learning applications. We present Demasked Smoothing, the first approach (up to our knowledge) to certify the robustness of semantic segmentation models against this threat model. |
Maksym Yatsura; Kaspar Sakmann; N. Grace Hua; Matthias Hein; Jan Hendrik Metzen; |
702 | Function-space Regularized Rényi Divergences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new family of regularized Rényi divergences parametrized not only by the order $\alpha$ but also by a variational function space. |
Jeremiah Birrell; Yannis Pantazis; Paul Dupuis; Luc Rey-Bellet; Markos Katsoulakis; |
703 | Novel View Synthesis with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present 3DiM (pronounced three-dim), a diffusion model for 3D novel view synthesis from as few as a single image. |
Daniel Watson; William Chan; Ricardo Martin Brualla; Jonathan Ho; Andrea Tagliasacchi; Mohammad Norouzi; |
704 | Guarded Policy Optimization with Imperfect Online Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we relax the assumption of a well-performing teacher and develop a new method that can incorporate arbitrary teacher policies with modest or inferior performance. |
Zhenghai Xue; Zhenghao Peng; Quanyi Li; Zhihan Liu; Bolei Zhou; |
705 | LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce LilNetX, an end-to-end trainable technique for neural networks that enables learning models with specified accuracy-rate-computation trade-off. |
Sharath Girish; Kamal Gupta; Saurabh Singh; Abhinav Shrivastava; |
706 | Clean-image Backdoor: Attacking Multi-label Models with Poisoned Labels Only Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, all existing backdoor attacks exclusively require to modify training inputs (e.g., images), which may be impractical in real-world applications. In this paper, we aim to break this wall and propose the first clean-image backdoor attack, which only poisons the training labels without touching the training samples. |
Kangjie Chen; Xiaoxuan Lou; Guowen Xu; Jiwei Li; Tianwei Zhang; |
707 | Markup-to-Image Diffusion Models with Scheduled Sampling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Building on recent advances in image generation, we present a fully data-driven approach to rendering markup into images. |
Yuntian Deng; Noriyuki Kojima; Alexander M Rush; |
708 | How Much Space Has Been Explored? Measuring The Chemical Space Covered By Databases and Machine-Generated Molecules Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel evaluation framework for measures of the chemical space based on two analyses: an axiomatic analysis with two intuitive axioms that a good measure should obey, and an empirical analysis on the correlation between a measure and a proxy gold standard. |
Yutong Xie; Ziqiao Xu; Jiaqi Ma; Qiaozhu Mei; |
709 | Understanding DDPM Latent Codes Through Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While having important practical applications, such as the estimation of the likelihood, the theoretical properties of this map are not yet fully understood. In the present work, we partially address this question for the popular case of the VP-SDE (DDPM) approach. |
Valentin Khrulkov; Gleb Ryzhakov; Andrei Chertkov; Ivan Oseledets; |
710 | Achieve The Minimum Width of Neural Networks for Universal Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider neural networks with an arbitrary set of activation functions. |
Yongqiang Cai; |
711 | UNIFIED-IO: A Unified Model for Vision, Language, and Multi-modal Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Unified-IO, a model that performs a large variety of AI tasks, spanning classical computer vision tasks (including pose estimation, object detection, depth estimation and image generation), vision-and-language tasks such as region captioning and referring expression, and natural language processing tasks such as question answering and paraphrasing. |
Jiasen Lu; Christopher Clark; Rowan Zellers; Roozbeh Mottaghi; Aniruddha Kembhavi; |
712 | Distributed Differential Privacy in Multi-Armed Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to obtain a pure-DP guarantee under distributed trust model while sacrificing no more regret than that under central trust model. |
Sayak Ray Chowdhury; Xingyu Zhou; |
713 | Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the extrapolation properties of GD when applied to overparameterized linear RNNs. |
Edo Cohen-Karlik; Itamar Menuhin-Gruman; Nadav Cohen; Raja Giryes; Amir Globerson; |
714 | Betty: An Automatic Differentiation Library for Multilevel Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, gradients in MLO, which are obtained by composing best-response Jacobians via the chain rule, are notoriously difficult to implement and memory/compute intensive. We take an initial step towards closing this gap by introducing Betty, a software library for large-scale MLO. |
Sang Keun Choe; Willie Neiswanger; Pengtao Xie; Eric Xing; |
715 | Near-Optimal Adversarial Reinforcement Learning with Switching Costs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, we propose two novel switching-reduced algorithms with regrets that match our lower bound when the transition function is known, and match our lower bound within a small factor of $\tilde{O}( H^{1/3} )$ when the transition function is unknown. Our regret analysis demonstrates their near-optimal performance. |
Ming Shi; Yingbin Liang; Ness Shroff; |
716 | NAGphormer: A Tokenized Graph Transformer for Node Classification in Large Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we observe that existing graph Transformers treat nodes as independent tokens and construct a single long sequence composed of all node tokens so as to train the Transformer model, making it hard to scale to large graphs due to the quadratic complexity of self-attention in the number of nodes. |
Jinsong Chen; Kaiyuan Gao; Gaichao Li; Kun He; |
717 | TiAda: A Time-scale Adaptive Algorithm For Nonconvex Minimax Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a single-loop adaptive GDA algorithm called TiAda for nonconvex minimax optimization that automatically adapts to the time-scale separation. |
Xiang Li; Junchi YANG; Niao He; |
718 | Collaborative Pure Exploration in Kernel Bandit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Collaborative Pure Exploration in Kernel Bandit model (CoPE-KB), where multiple agents collaborate to complete different but related tasks with limited communication. |
Yihan Du; Wei Chen; Yuko Kuroki; Longbo Huang; |
719 | Clifford Neural Layers for PDE Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Their algebraic properties, such as multiplication, addition and other arithmetic operations, can be described by Clifford algebras. To our knowledge, this paper presents the first usage of such multivector representations together with Clifford convolutions and Clifford Fourier transforms in the context of deep learning. |
Johannes Brandstetter; Rianne van den Berg; Max Welling; Jayesh K Gupta; |
720 | How Informative Is The Approximation Error from Tensor Decomposition for Neural Network Compression? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show empirically that the approximation error resulting from compressing a network layer with tensor decomposition is correlated with the classification error, enabling the choice of layer, decomposition and rank to be based on the approximation error. |
Jetze Schuurmans; Kim Batselier; Julian Kooij; |
721 | CUDA: Curriculum of Data Augmentation for Long-tailed Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we first investigate the correlation between the degree of augmentation and class-wise performance, and find that the proper degree of augmentation must be allocated for each class to mitigate class imbalance problems. Motivated by this finding, we propose a simple and efficient novel curriculum, which is designed to find the appropriate per-class strength of data augmentation, called CUDA: CUrriculum of Data Augmentation for long-tailed recognition. |
Sumyeong Ahn; Jongwoo Ko; Se-Young Yun; |
722 | Revisiting Graph Adversarial Attack and Defense From A Data Distribution Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we discover an interesting phenomenon: the adversarial edges are not uniformly distributed on the graph. |
Kuan Li; Yang Liu; Xiang Ao; Qing He; |
723 | Understanding New Tasks Through The Lens of Training Data Via Exponential Tilting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider the problem of reweighing the training samples to gain insights into the distribution of the target task. |
Subha Maity; Mikhail Yurochkin; Moulinath Banerjee; Yuekai Sun; |
724 | A VAE for Transformers with Nonparametric Variational Information Bottleneck Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Variational AutoEncoder (VAE) for Transformers by developing a Variational Information Bottleneck (VIB) regulariser for Transformer embeddings. |
James Henderson; Fabio James Fehr; |
725 | Behavior Prior Representation Learning for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a simple, yet effective approach for learning state representations. |
Hongyu Zang; Xin Li; Jie Yu; Chen Liu; Riashat Islam; Remi Tachet des Combes; Romain Laroche; |
726 | LPT: Long-tailed Prompt Tuning for Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Though promising, fine-tuning the whole pretrained model tends to suffer from high cost in computation and deployment of different models for different tasks, as well as weakened generalization capability for overfitting to certain features of long-tailed data. To alleviate these issues, we propose an effective Long-tailed Prompt Tuning (LPT) method for long-tailed classification tasks. |
Bowen Dong; Pan Zhou; Shuicheng Yan; Wangmeng Zuo; |
727 | Towards Real-Time Neural Image Compression With Mask Decay Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient single-model variable-bit-rate network, which is able to run at 30 FPS with 768×512 input images and still outperforms VVC for the RD performance. |
Wang Guo-Hua; Jiahao Li; Bin Li; Yan Lu; |
728 | Policy Expansion for Bridging Offline-to-Online Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: One natural approach is to initialize the policy for online learning with the one trained offline. In this work, we introduce a policy expansion scheme for this task. |
Haichao Zhang; Wei Xu; Haonan Yu; |
729 | Learning Differentiable Solvers for Systems with Hard Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a practical method to enforce linear partial differential equation (PDE) constraints for functions defined by neural networks (NNs), up to a desired tolerance. |
Geoffrey Négiar; Michael W. Mahoney; Aditi Krishnapriyan; |
730 | Confidence-Based Feature Imputation for Graphs with Partially Known Features Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, in cases of high rates of missing features, they were unable to avoid significant performance degradation. To overcome this limitation, we introduce a novel concept of channel-wise confidence in a node feature, which is assigned to each imputed channel feature of a node for reflecting the certainty of the imputation. |
Daeho Um; Jiwoong Park; Seulki Park; Jin young Choi; |
731 | LiftedCL: Lifting Contrastive Learning for Human-Centric Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Lifting Contrastive Learning (LiftedCL) to obtain 3D-aware human-centric representations which absorb 3D human structure information. |
Ziwei Chen; Qiang Li; Xiaofeng Wang; Wankou Yang; |
732 | Learning Controllable Adaptive Simulation for Multi-scale Physics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Learning controllable Adaptive simulation for Multi-scale Physics (LAMP) as the first full deep learning-based surrogate model that jointly learns the evolution model and optimizes appropriate spatial resolutions that devote more compute to the highly dynamic regions. |
Tailin Wu; Takashi Maruyama; Qingqing Zhao; Gordon Wetzstein; Jure Leskovec; |
733 | FedDAR: Federated Domain-Aware Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on a special non-iid FL problem, called Domain-mixed FL, where each client’s data distribution is assumed to be a mixture of several predefined domains. |
Aoxiao Zhong; Hao He; Zhaolin Ren; Na Li; Quanzheng Li; |
734 | Learning to Estimate Shapley Values with Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to make Shapley values practical for vision transformers (ViTs). |
Ian Connick Covert; Chanwoo Kim; Su-In Lee; |
735 | Fuzzy Alignments in Directed Acyclic Graph for Non-Autoregressive Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we hold the view that all paths in the graph are fuzzily aligned with the reference sentence. |
Zhengrui Ma; Chenze Shao; Shangtong Gui; Min Zhang; Yang Feng; |
736 | Prompt Learning with Optimal Transport for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike conventional methods of only learning one single prompt, we propose to learn multiple comprehensive prompts to describe diverse characteristics of categories such as intrinsic attributes or extrinsic contexts. |
Guangyi Chen; Weiran Yao; Xiangchen Song; Xinyue Li; Yongming Rao; Kun Zhang; |
737 | Masked Frequency Modeling for Self-Supervised Visual Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models. |
Jiahao Xie; Wei Li; Xiaohang Zhan; Ziwei Liu; Yew-Soon Ong; Chen Change Loy; |
738 | A Framework for Benchmarking Class-out-of-distribution Detection and Its Application to ImageNet Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we present a novel technique to benchmark image classifiers’ ability to detect class-out-of-distribution instances (i.e., instances whose true labels the model does not recognize) at various levels of detection difficulty. |
Ido Galil; Mohammed Dabbah; Ran El-Yaniv; |
739 | Twofer: Tackling Continual Domain Shift with Simultaneous Domain Generalization and Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing DG works are ineffective for continually changing domains due to severe catastrophic forgetting of learned knowledge. To overcome these limitations of DA or DG in tackling continual domain shifts, we propose Twofer, a framework that simultaneously achieves target domain generalization (TDG), target domain adaptation (TDA), and forgetting alleviation (FA). |
Chenxi Liu; Lixu Wang; Lingjuan Lyu; Chen Sun; Xiao Wang; Qi Zhu; |
740 | VA-DepthNet: A Variational Approach to Single Image Depth Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The paper’s main contribution is to reveal the benefit of classical and well-founded variational constraints in the neural network design for the SIDP task. |
Ce Liu; Suryansh Kumar; Shuhang Gu; Radu Timofte; Luc Van Gool; |
741 | Human Motion Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Motion Diffusion Model (MDM), a carefully adapted classifier-free diffusion-based generative model for human motion data. |
Guy Tevet; Sigal Raab; Brian Gordon; Yoni Shafir; Amit Haim Bermano; Daniel Cohen-or; |
742 | Human MotionFormer: Transferring Human Motions with Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Human MotionFormer, a hierarchical ViT framework for motion transfer between two human subjects. |
Hongyu Liu; Xintong Han; Chenbin Jin; Lihui Qian; Huawei Wei; Zhe Lin; Faqiang Wang; Haoye Dong; Yibing Song; Jia Xu; Qifeng Chen; |
743 | On The Convergence of AdaGrad on $\mathbb{R}^d$: Beyond Convexity, Non-Asymptotic Rate and Acceleration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, in the stochastic setting, only a modified version of AdaGrad, different from the one commonly used in practice, in which the latest gradient is not used to update the stepsize, has been analyzed. Our paper aims at bridging these gaps and developing a deeper understanding of AdaGrad and its variants in the standard setting of smooth convex functions as well as the more general setting of quasar convex functions. |
Zijian Liu; Ta Duy Nguyen; Alina Ene; Huy Nguyen; |
744 | Semi Parametric Inducing Point Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce semi-parametric inducing point networks (SPIN), a general-purpose architecture that can query the training set at inference time in a compute-efficient manner. |
Richa Rastogi; Yair Schiff; Alon Hacohen; Zhaozhi Li; Ian Lee; Yuntian Deng; Mert R. Sabuncu; Volodymyr Kuleshov; |
745 | Breaking The Curse of Dimensionality in Multiagent State Space: A Unified Agent Permutation Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such a curse of dimensionality results in poor scalability and low sample efficiency, inhibiting MARL for decades. To break this curse, we propose a unified agent permutation framework that exploits the permutation invariance (PI) and permutation equivariance (PE) inductive biases to reduce the multiagent state space. |
Jianye HAO; Xiaotian Hao; Hangyu Mao; Weixun Wang; Yaodong Yang; Dong Li; YAN ZHENG; Zhen Wang; |
746 | Using Language to Extend to Unseen Domains Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We instead consider how simply verbalizing the training domain (e.g.“photos of birds”) as well as domains we want to extend to but do not have data for (e.g.“paintings of birds”) can improve robustness. |
Lisa Dunlap; Clara Mohri; Devin Guillory; Han Zhang; Trevor Darrell; Joseph E. Gonzalez; Aditi Raghunathan; Anna Rohrbach; |
747 | SIMPLE: Specialized Model-Sample Matching for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an alternative direction, i.e., to efficiently leverage a pool of pretrained models without fine-tuning. |
Ziyue Li; Kan Ren; XINYANG JIANG; Yifei Shen; Haipeng Zhang; Dongsheng Li; |
748 | New Insights for The Stability-Plasticity Dilemma in Online Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To overcome the stability-plasticity dilemma in online continual learning, we propose an online continual learning framework named multi-scale feature adaptation network (MuFAN) that utilizes a richer context encoding extracted from different levels of a pre-trained network. |
Dahuin Jung; Dongjin Lee; Sunwon Hong; Hyemi Jang; Ho Bae; Sungroh Yoon; |
749 | DFlow: Learning to Synthesize Better Optical Flow Datasets Via A Differentiable Pipeline Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, manually identifying and verifying all such necessary properties are intractable mainly due to the requirement of large-scale trial-and-error experiments with iteratively generating whole synthetic datasets. To tackle this challenge, we propose a differentiable optical flow data generation pipeline and a loss function to drive the pipeline, called DFlow. |
Kwon Byung-Ki; Nam Hyeon-Woo; Ji-Yun Kim; Tae-Hyun Oh; |
750 | Ensuring DNN Solution Feasibility for Optimization Problems with Linear Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose preventive learning as the first framework to guarantee Deep Neural Network (DNN) solution feasibility for optimization problems with linear constraints without post-processing. |
Tianyu Zhao; Xiang Pan; Minghua Chen; Steven Low; |
751 | Toeplitz Neural Network for Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While showing good performance, the transformer models are inefficient to scale to long input sequences, mainly due to the quadratic space-time complexity of attention. To overcome this inefficiency, we propose to model sequences with a relative position encoded Toeplitz matrix and use a Toeplitz matrix-vector production trick to reduce the space-time complexity of the sequence modeling to log linear. |
Zhen Qin; Xiaodong Han; Weixuan Sun; Bowen He; Dong Li; Dongxu Li; Yuchao Dai; Lingpeng Kong; Yiran Zhong; |
752 | Model Ensemble Instead of Prompt Fusion: A Sample-specific Knowledge Transfer Method for Few-shot Prompt Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks with abundant training samples. |
XIANGYU PENG; Chen Xing; Prafulla Kumar Choubey; Chien-Sheng Wu; Caiming Xiong; |
753 | Neural Image-based Avatars: Generalizable Radiance Fields for Human Avatar Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method that enables synthesizing novel views and novel poses of arbitrary human performers from sparse multi-view images. |
YoungJoong Kwon; Dahun Kim; Duygu Ceylan; Henry Fuchs; |
754 | Retrieval-based Controllable Molecule Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new retrieval-based framework for controllable molecule generation. |
Zichao Wang; Weili Nie; Zhuoran Qiao; Chaowei Xiao; Richard Baraniuk; Anima Anandkumar; |
755 | Deep Generative Symbolic Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We make the observation that closed-form equations often have structural characteristics and invariances (e.g. the commutative law) that could be further exploited to build more effective symbolic regression solutions. Motivated by this observation, our key contribution is to leverage pre-trained deep generative models to capture the intrinsic regularities of equations, thereby providing a solid foundation for subsequent optimization steps. |
Samuel Holt; Zhaozhi Qian; Mihaela van der Schaar; |
756 | Order Matters: Agent-by-agent Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the \textbf{A}gent-by-\textbf{a}gent \textbf{P}olicy \textbf{O}ptimization (A2PO) algorithm to improve the sample efficiency and retain the guarantees of monotonic improvement for each agent during training. |
Xihuai Wang; Zheng Tian; Ziyu Wan; Ying Wen; Jun Wang; Weinan Zhang; |
757 | A GNN-Guided Predict-and-Search Framework for Mixed-Integer Linear Programming Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we combine ML with optimization and propose a novel predict-and-search framework for efficiently identifying high-quality feasible solutions. |
Qingyu Han; Linxin Yang; Qian Chen; Xiang Zhou; Dong Zhang; Akang Wang; Ruoyu Sun; Xiaodong Luo; |
758 | Improved Convergence of Differential Private SGD with Gradient Clipping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a by-product, we propose a new clipping technique, called value clipping, to mitigate the computational overhead caused by the classic gradient clipping. |
Huang Fang; Xiaoyun Li; Chenglin Fan; Ping Li; |
759 | Solving Constrained Variational Inequalities Via A First-order Interior Point-based Method Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop an interior-point approach to solve constrained variational inequality (cVI) problems. |
Tong Yang; Michael Jordan; Tatjana Chavdarova; |
760 | Generating Diverse Cooperative Agents By Learning Incompatible Policies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to learn diverse behaviors via policy compatibility. |
Rujikorn Charakorn; Poramate Manoonpong; Nat Dilokthanakul; |
761 | Logical Message Passing Networks with One-hop Inference on Atomic Formulas Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple framework for complex query answering that decomposes the KG embeddings from neural set operators. |
Zihao Wang; Yangqiu Song; Ginny Wong; Simon See; |
762 | Rethinking The Effect of Data Augmentation in Adversarial Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by this observation, we revisit existing self-AT and discover an inherent dilemma that affects self-AT robustness: either strong or weak data augmentations are harmful to self-AT, and a medium strength is insufficient to bridge the gap. To resolve this dilemma, we propose a simple remedy named DynACL (Dynamic Adversarial Contrastive Learning). |
Rundong Luo; Yifei Wang; Yisen Wang; |
763 | Data-Free One-Shot Federated Learning Under Very High Statistical Heterogeneity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, one-shot FL methods often degrade under high statistical heterogeneity, fail to promote pipeline security, or require an auxiliary public dataset. To address these limitations, we propose two novel data-free one-shot FL methods: FedCVAE-Ens and its extension FedCVAE-KD. |
Emilio Luz-Ricca; Clare Elizabeth Heinbaugh; Huajie Shao; |
764 | Code Translation with Compiler Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we leverage low-level compiler intermediate representations (IR) code translation. |
Marc Szafraniec; Baptiste Roziere; Hugh James Leather; Patrick Labatut; Francois Charton; Gabriel Synnaeve; |
765 | What Can We Learn From The Selective Prediction And Uncertainty Estimation Performance Of 523 Imagenet Classifiers? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel and comprehensive study of selective prediction and the uncertainty estimation performance of 523 existing pretrained deep ImageNet classifiers that are available in popular repositories. |
Ido Galil; Mohammed Dabbah; Ran El-Yaniv; |
766 | Predictor-corrector Algorithms for Stochastic Optimization Under Gradual Distribution Shift Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Often, the underlying process that drives the distribution shift is continuous in nature. We exploit this underlying continuity by developing predictor-corrector algorithms for time-varying stochastic optimization that anticipates changes in the underlying data generating process through a predictor-corrector term in the update rule. |
Subha Maity; Debarghya Mukherjee; Moulinath Banerjee; Yuekai Sun; |
767 | The In-Sample Softmax for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We highlight a simple fact: it is more straightforward to approximate an in-sample softmax using only actions in the dataset. |
Chenjun Xiao; Han Wang; Yangchen Pan; Adam White; Martha White; |
768 | Replay Memory As An Empirical MDP: Combining Conservative Estimation with Experience Replay Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we further exploit the information in the replay memory by treating it as an empirical \emph{Replay Memory MDP (RM-MDP)}. |
Hongming Zhang; Chenjun Xiao; Han Wang; Jun Jin; bo xu; Martin Müller; |
769 | A Unified Algebraic Perspective on Lipschitz Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel algebraic perspective unifying various types of 1-Lipschitz neural networks, and show that AOL and CPL can be re-derived and generalized using exactly the same semidefinite programming (SDP) condition. |
Alexandre Araujo; Aaron J Havens; Blaise Delattre; Alexandre Allauzen; Bin Hu; |
770 | Curriculum-based Co-design of Morphology and Control of Voxel-based Soft Robots Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a Curriculum-based Co-design (CuCo) method for learning to design and control VSRs through an easy-to-difficult process. |
Yuxing Wang; Shuang Wu; Haobo Fu; QIANG FU; Tiantian Zhang; Yongzhe Chang; Xueqian Wang; |
771 | When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In our study, one interesting observation is that deep \textit{Q} functions approximate well inside the convex hull of training data. Inspired by this, we propose a new method, \textit{DOGE (Distance-sensitive Offline RL with better GEneralization)}. |
Jianxiong Li; Xianyuan Zhan; Haoran Xu; Xiangyu Zhu; Jingjing Liu; Ya-Qin Zhang; |
772 | Impossibly Good Experts and How to Follow Them Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the sequential decision making problem of learning from an expert that has access to more information than the learner.We provide a set of necessary criteria on the expert that will allow a learner to recover the optimal policy in the reduced information space from the expert’s advice alone. |
Aaron Walsman; Muru Zhang; Sanjiban Choudhury; Dieter Fox; Ali Farhadi; |
773 | Memorization-Dilation: Modeling Neural Collapse Under Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we study a more realistic variant of the layer-peeled model, which takes the positivity of the features into account. |
Duc Anh Nguyen; Ron Levie; Julian Lienen; Eyke Hüllermeier; Gitta Kutyniok; |
774 | SLTUNET: A Simple Unified Model for Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose SLTUNET, a simple unified neural model designed to support multiple SLTrelated tasks jointly, such as sign-to-gloss, gloss-to-text and sign-to-text translation. |
Biao Zhang; Mathias Müller; Rico Sennrich; |
775 | Mind The Gap: Offline Policy Optimizaiton for Imperfect Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a unified offline policy optimization approach, \textit{RGM} (Reward Gap Minimization), which can smartly handle diverse types of imperfect rewards. |
Jianxiong Li; Xiao Hu; Haoran Xu; Jingjing Liu; Xianyuan Zhan; Qing-Shan Jia; Ya-Qin Zhang; |
776 | Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To make the whole process learnable, we introduce a multimodal meta-learning approach. |
Ivona Najdenkoska; Xiantong Zhen; Marcel Worring; |
777 | Masked Image Modeling with Denoising Contrast Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We first treat masked patch prediction as denoising contrastive learning in self-supervised image pre-training, achieving state-of-the-art results. |
Kun Yi; Yixiao Ge; Xiaotong Li; Shusheng Yang; Dian Li; Jianping Wu; Ying Shan; Xiaohu Qie; |
778 | Learning on Large-scale Text-attributed Graphs Via Variational Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an efficient and effective solution to learning on large text-attributed graphs by fusing graph structure and language learning with a variational Expectation-Maximization (EM) framework, called GLEM. |
Jianan Zhao; Meng Qu; Chaozhuo Li; Hao Yan; Qian Liu; Rui Li; Xing Xie; Jian Tang; |
779 | Distributionally Robust Post-hoc Classifiers Under Prior Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an extremely lightweight post-hoc approach that performs scaling adjustments to predictions from a pre-trained model, with the goal of minimizing a distributionally robust loss around a chosen target distribution. |
Jiaheng Wei; Harikrishna Narasimhan; Ehsan Amid; Wen-Sheng Chu; Yang Liu; Abhishek Kumar; |
780 | Learning Sparse and Low-Rank Priors for Image Recovery Via Iterative Reweighted Least Squares Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we introduce a novel optimization algorithm for image recovery under learned sparse and low-rank constraints, which are parameterized with weighted extensions of the $\ell_p^p$-vector and $\mathcal{S}_p^p$ Schatten-matrix quasi-norms for $0\! |
Stamatios Lefkimmiatis; Iaroslav Sergeevich Koshelev; |
781 | Adversarial Training of Self-supervised Monocular Depth Estimation Against Physical-World Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel adversarial training method for self-supervised MDE models based on view synthesis without using the depth ground truth. |
Zhiyuan Cheng; James Chenhao Liang; Guanhong Tao; Dongfang Liu; Xiangyu Zhang; |
782 | Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose sparse upcycling — a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint. |
Aran Komatsuzaki; Joan Puigcerver; James Lee-Thorp; Carlos Riquelme Ruiz; Basil Mustafa; Joshua Ainslie; Yi Tay; Mostafa Dehghani; Neil Houlsby; |
783 | Diffusion Probabilistic Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we introduce Diffusion Probabilistic Fields (DPF), a diffusion model that can learn distributions over continuous functions defined over metric spaces, commonly known as fields. |
Peiye Zhuang; Samira Abnar; Jiatao Gu; Alex Schwing; Joshua M. Susskind; Miguel Ángel Bautista; |
784 | SpeedyZero: Mastering Atari with Limited Data and Time Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop SpeedyZero, a distributed RL system built upon a state-of-the-art model-based RL method, EfficientZero, with a dedicated system design for fast distributed computation. |
Yixuan Mei; Jiaxuan Gao; Weirui Ye; Shaohuai Liu; Yang Gao; Yi Wu; |
785 | Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a more general framework, Hidden-Utility Self-Play (HSP), which explicitly models human biases as hidden reward functions in the self-play objective. |
Chao Yu; Jiaxuan Gao; Weilin Liu; Botian Xu; Hao Tang; Jiaqi Yang; Yu Wang; Yi Wu; |
786 | What Do Self-Supervised Vision Transformers Learn? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present comparative studies on how and why contrastive learning (CL) and masked image modeling (MIM) differ in their representations and performance on downstream tasks. |
Namuk Park; Wonjae Kim; Byeongho Heo; Taekyung Kim; Sangdoo Yun; |
787 | Symmetric Pruning in Quantum Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Empirical evidence showed that QNNs with handcraft symmetric ans\atze generally experience better trainability than those with asymmetric ans\atze, while theoretical explanations remain vague. To fill this knowledge gap, here we propose the effective quantum neural tangent kernel (EQNTK) and connect this concept with over-parameterization theory to quantify the convergence of QNNs towards the global optima. |
Xinbiao Wang; Junyu Liu; Tongliang Liu; Yong Luo; Yuxuan Du; Dacheng Tao; |
788 | Learning Continuous Normalizing Flows For Faster Convergence To Target Distribution Via Ascent Regularizations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new class of continuous NFs, ascent continuous normalizing flows (ACNFs), that makes a base distribution converge faster to a target distribution. |
Shuangshuang Chen; Sihao Ding; Yiannis Karayiannidis; Mårten Björkman; |
789 | Beyond Calibration: Estimating The Grouping Loss of Modern Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we propose an estimator to approximate the grouping loss. |
Alexandre Perez-Lebel; Marine Le Morvan; Gael Varoquaux; |
790 | Tuning Frequency Bias in Neural Network Training with Nonuniform Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using the Neural Tangent Kernel (NTK), one can provide a theoretically rigorous analysis for training where data are drawn from constant or piecewise-constant probability densities. Since most training data sets are not drawn from such distributions, we use the NTK model and a data-dependent quadrature rule to theoretically quantify the frequency biasing of NN training given fully nonuniform data. |
Annan Yu; Yunan Yang; Alex Townsend; |
791 | Quantifying Memorization Across Neural Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We describe three log-linear relationships that quantify the degree to which LMs emit memorized training data. |
Nicholas Carlini; Daphne Ippolito; Matthew Jagielski; Katherine Lee; Florian Tramer; Chiyuan Zhang; |
792 | Temporal Coherent Test Time Optimization for Robust Video Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a Temporal Coherent Test-time Optimization framework (TeCo) to utilize spatio-temporal information in test-time optimization for robust video classification. |
Chenyu Yi; SIYUAN YANG; Yufei Wang; Haoliang Li; Yap-peng Tan; Alex Kot; |
793 | MultiViz: Towards Visualizing and Understanding Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: How can we visualize the internal modeling of multimodal interactions in these models? Our paper aims to fill this gap by proposing MultiViz, a method for analyzing the behavior of multimodal models by scaffolding the problem of interpretability into 4 stages: (1) unimodal importance: how each modality contributes towards downstream modeling and prediction, (2) cross-modal interactions: how different modalities relate with each other, (3) multimodal representations: how unimodal and cross-modal interactions are represented in decision-level features, and (4) multimodal prediction: how decision-level features are composed to make a prediction. |
Paul Pu Liang; Yiwei Lyu; Gunjan Chhablani; Nihal Jain; Zihao Deng; Xingbo Wang; Louis-Philippe Morency; Ruslan Salakhutdinov; |
794 | Learning Locality and Isotropy in Dialogue Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we find that the generated representations are also not conversational, losing the conversation structure information during the context modeling stage. |
Han Wu; Haochen Tan; Mingjie Zhan; Gangming Zhao; Shaoqing Lu; Ding Liang; Linqi Song; |
795 | SimPer: Simple Self-Supervised Learning of Periodic Targets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present SimPer, a simple contrastive SSL regime for learning periodic information in data. |
Yuzhe Yang; Xin Liu; Jiang Wu; Silviu Borac; Dina Katabi; Ming-Zher Poh; Daniel McDuff; |
796 | Simplicial Hopfield Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by setwise connectivity in biology, we extend Hopfield networks by adding setwise connections and embedding these connections in a simplicial complex. |
Thomas F Burns; Tomoki Fukai; |
797 | A Critical Look at Evaluation of GNNs Under Heterophily: Are We Really Making Progress? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we challenge this evaluation setting.Then, we propose a set of heterophilous graphs of varying properties that we believe can serve as a better benchmark for testing the performance of GNNs under heterophily. |
Oleg Platonov; Denis Kuznedelev; Michael Diskin; Artem Babenko; Liudmila Prokhorenkova; |
798 | Recon: Reducing Conflicting Gradients From The Root For Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we take a different approach to reduce conflicting gradients from the root. |
Guangyuan SHI; Qimai Li; Wenlong Zhang; Jiaxin Chen; Xiao-Ming Wu; |
799 | A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: By analyzing the effect of different layers in the network, we find that shallow and deep layers have different characteristics in CIL. Motivated by this, we propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel. |
Da-Wei Zhou; Qi-Wei Wang; Han-Jia Ye; De-Chuan Zhan; |
800 | Reversible Column Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new neural network design paradigm Reversible Column Networks (RevCols). |
Yuxuan Cai; Yizhuang Zhou; Qi Han; Jianjian Sun; Xiangwen Kong; Jun Li; Xiangyu Zhang; |
801 | $\rm A^2Q$: Aggregation-Aware Quantization for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through an in-depth analysis of the topology of GNNs, we observe that the topology of the graph leads to significant differences between nodes, and most of the nodes in a graph appear to have a small aggregation value. Motivated by this, in this paper, we propose the Aggregation-Aware mixed-precision Quantization ($\rm A^2Q$) for GNNs, where an appropriate bitwidth is automatically learned and assigned to each node in the graph. |
Zeyu Zhu; Fanrong Li; Zitao Mo; Qinghao Hu; Gang Li; Zejian Liu; Xiaoyao Liang; Jian Cheng; |
802 | Wav2Tok: Deep Sequence Tokenizer for Audio Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes Wav2Tok, a deep sequence tokenizer for audio that converts continuous-valued audio sequences to sequences of discrete tokens that are easier to retrieve via sequence queries. |
Adhiraj Banerjee; Vipul Arora; |
803 | In-Situ Text-Only Adaptation of Speech Models with Low-Overhead Speech Imputations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new approach (TOLSTOI) that imputes speech representations internal to a baseline RNN-T, starting from text-only inputs, and performs in-situ adaptation that results in higher adaptation accuracy without any runtime overheads during decoding. |
Ashish Mittal; Sunita Sarawagi; Preethi Jyothi; |
804 | The Tilted Variational Autoencoder: Improving Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, a small volume in the high-density region of the prior is problematic because it restricts the separation of latent points. To ameliorate this, we propose a simple generalization of the Gaussian distribution, called the tilted Gaussian, which has a maximum probability density occurring on a sphere instead of a single point. |
Griffin Floto; Stefan Kremer; Mihai Nica; |
805 | E3Bind: An End-to-End Equivariant Network for Protein-Ligand Docking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work focuses on blind flexible self-docking, where we aim to predict the positions, orientations and conformations of docked molecules. |
Yangtian Zhang; Huiyu Cai; Chence Shi; Jian Tang; |
806 | Sampling-free Inference for Ab-Initio Potential Energy Surface Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we address the inference shortcomings by proposing the Potential learning from ab-initio Networks (PlaNet) framework, in which we simultaneously train a surrogate model in addition to the neural wave function. |
Nicholas Gao; Stephan Günnemann; |
807 | Cycle-consistent Masked AutoEncoder for Unsupervised Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a cycle cross-domain reconstruction task for unsupervised domain generalization in the absence of paired images. |
Haiyang Yang; SHIXIANG TANG; Xiaotong Li; Feng Zhu; Yizhou Wang; Meilin Chen; LEI BAI; Rui Zhao; Wanli Ouyang; |
808 | Towards Interpretable Deep Reinforcement Learning with Human-Friendly Prototypes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: An interpretable-by-design deep reinforcement learning agent is proposed which uses prototypes for decision making. |
Eoin M. Kenny; Mycal Tucker; Julie Shah; |
809 | Self-Distillation for Further Pre-training of Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, all of them solely focus on language models and we empirically find that a Vision Transformer is vulnerable to overfitting as we continue to pretrain the model on target unlabeled data. In order to tackle this limitation, we propose self-distillation as a regularization for a further pre-training stage. |
Seanie Lee; Minki Kang; Juho Lee; Sung Ju Hwang; Kenji Kawaguchi; |
810 | KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in Low-Resource NLP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, they have trivial task-specific knowledge and are limited to yielding low-quality synthetic data. To combat this issue, we propose Knowledge Mixture Data Augmentation Model (KnowDA), a Seq2Seq language model pretrained on a mixture of diverse NLP tasks under a novel framework of Knowledge Mixture Training (KoMT). |
Yufei Wang; Jiayi Zheng; Can Xu; Xiubo Geng; Tao Shen; Chongyang Tao; Daxin Jiang; |
811 | DELTA: DEBIASED FULLY TEST-TIME ADAPTATION Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a plug-in solution called DELTA for debiased fully test-time adaptation, which consists of two components: (i) Test-time batch renormalization (TBR), introduced to alleviate the bias in normalization statistics. |
Bowen Zhao; Chen Chen; Shu-Tao Xia; |
812 | Self-Supervised Set Representation Learning for Unsupervised Meta-Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One notable aspect of metric-based meta-learning, however, is that it is widely interpreted as a set-level problem, since the inference of discriminative class prototypes (or set representations) from few examples is crucial for the performance of downstream tasks. Motivated by this, we propose Set-SimCLR, a novel self-supervised set representation learning framework targeting the unsupervised meta-learning (UML) problem. |
Dong Bok Lee; Seanie Lee; Kenji Kawaguchi; Yunji Kim; Jihwan Bang; Jung-Woo Ha; Sung Ju Hwang; |
813 | Learning Human-Compatible Representations for Case-Based Decision Support Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we incorporate ideas from metric learning with supervised learning to examine the importance of alignment for effective decision support. |
Han Liu; Yizhou Tian; Chacha Chen; Shi Feng; Yuxin Chen; Chenhao Tan; |
814 | LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although it deeply exploits the lexicon-representing capability of pre-trained language models, a crucial gap remains between language modeling and lexicon-weighting retrieval — the former preferring certain or low-entropy words whereas the latter favoring pivot or high-entropy words — becoming the main barrier to lexicon-weighting performance for large-scale retrieval. To bridge this gap, we propose a brand-new pre-training framework, lexicon-bottlenecked masked autoencoder (LexMAE), to learn importance-aware lexicon representations. |
Tao Shen; Xiubo Geng; Chongyang Tao; Can Xu; Xiaolong Huang; Binxing Jiao; Linjun Yang; Daxin Jiang; |
815 | HypeR: Multitask Hyper-Prompted Training Enables Large-Scale Retrieval Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose HypeR, a hyper-prompted training mechanism to enable uniform retrieval across tasks of different domains. |
ZeFeng Cai; Chongyang Tao; Tao Shen; Can Xu; Xiubo Geng; Xin Alex Lin; Liang He; Daxin Jiang; |
816 | Spiking Convolutional Neural Networks for Text Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents a “conversion + fine-tuning” two-step method for training SNN for text classification and proposes a simple but effective way to encode pre-trained word embeddings as spike trains. |
Changze Lv; Jianhan Xu; Xiaoqing Zheng; |
817 | Federated Learning from Small Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Bad local models can arbitrarily deteriorate the aggregate model quality, causing federated learning to fail in these settings. We propose a novel approach that avoids this problem by interleaving model aggregation and permutation steps. |
Michael Kamp; Jonas Fischer; Jilles Vreeken; |
818 | A Study of Causal Confusion in Preference-Based Reward Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that the presence of non-causal distractor features, noise in the stated preferences, partial state observability, and larger model capacity can all exacerbate causal confusion. |
Jeremy Tien; Jerry Zhi-Yang He; Zackory Erickson; Anca Dragan; Daniel S. Brown; |
819 | Robust Graph Dictionary Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a robust graph dictionary learning method based on a novel robust variant of GWD. |
Weijie Liu; Jiahao Xie; Chao Zhang; Makoto Yamada; Nenggan Zheng; Hui Qian; |
820 | Ordered GNN: Ordering Message Passing to Deal with Heterophily and Over-smoothing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to order the messages passing into the node representation, with specific blocks of neurons targeted for message passing within specific hops. |
Yunchong Song; Chenghu Zhou; Xinbing Wang; Zhouhan Lin; |
821 | Omnigrok: Grokking Beyond Algorithmic Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We aim to understand grokking by analyzing the loss landscapes of neural networks, identifying the mismatch between training and test losses as the cause for grokking. |
Ziming Liu; Eric J Michaud; Max Tegmark; |
822 | ManyDG: Many-domain Generalization for Healthcare Applications Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, considering the diversity of patient covariates, we propose a new setting by treating each patient as a separate domain (leading to many domains). |
Chaoqi Yang; M Brandon Westover; Jimeng Sun; |
823 | D4AM: A General Denoising Framework for Downstream Acoustic Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a general denoising framework for various downstream acoustic models, called D4AM. |
Chi-Chang Lee; Yu Tsao; Hsin-Min Wang; Chu-Song Chen; |
824 | General Neural Gauge Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This begs a question: can we learn the gauge transformation along with the neural scene representations in an end-to-end manner? In this work, we extend this problem to a general paradigm with a taxonomy of discrete and continuous cases, and develop an end-to-end training framework to jointly optimize the gauge transformation and radiance fields. |
Fangneng Zhan; Lingjie Liu; Adam Kortylewski; Christian Theobalt; |
825 | Fooling SHAP with Stealthily Biased Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a complementary family of attacks that leave the model intact and manipulate SHAP explanations using stealthily biased sampling of the data points used to approximate expectations w.r.t. the background distribution. |
gabriel laberge; Ulrich Aïvodji; Satoshi Hara; Mario Marchand; Foutse Khomh; |
826 | Inequality Phenomenon in $l_{\infty}$-adversarial Training, and Its Unrealized Threats Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To validate our hypothesis, we propose two simple attacks that either perturb or replace important features with noise or occlusion. |
Ranjie Duan; YueFeng Chen; Yao Zhu; Xiaojun Jia; Rong Zhang; Hui Xue’; |
827 | Continual Transformers: Redundancy-Free Attention for Online Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose novel formulations of the Scaled Dot-Product Attention, which enable Transformers to perform efficient online token-by-token inference on a continual input stream. |
Lukas Hedegaard; Arian Bakhtiarnia; Alexandros Iosifidis; |
828 | Efficient Federated Domain Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While most existing FL algorithms focus on the conventional non-IID setting of class imbalance or missing classes across clients, in practice, the distribution differences could be more complex, e.g., changes in class conditional (domain) distributions. In this paper, we consider this complex case in FL wherein each client has access to only one domain distribution. |
Zeyu Zhou; Sheikh Shams Azam; Christopher Brinton; David I. Inouye; |
829 | Meta Knowledge Condensation for Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unlike existing paradigms, we introduce an alternative perspective to significantly decrease the federated learning communication cost without leaking original data. |
Ping Liu; Xin Yu; Joey Tianyi Zhou; |
830 | Unicom: Universal and Compact Representation Learning for Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we first cluster the large-scale LAION dataset into one million pseudo classes based on the joint textual and visual features extracted by the CLIP model. |
Xiang An; Jiankang Deng; Kaicheng Yang; Jaiwei Li; Ziyong Feng; Jia Guo; Jing Yang; Tongliang Liu; |
831 | Self-supervised Geometric Correspondence for Category-level 6D Object Pose Estimation in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current approaches are restricted by leveraging annotations from simulation or collected from humans. In this paper, we overcome this barrier by introducing a self-supervised learning approach trained directly on large-scale real-world object videos for category-level 6D pose estimation in the wild. |
Kaifeng Zhang; Yang Fu; Shubhankar Borse; Hong Cai; Fatih Porikli; Xiaolong Wang; |
832 | Task-Aware Information Routing from Common Representation Space in Lifelong Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, inspired by Global Workspace Theory of conscious information access in the brain, we propose TAMiL, a continual learning method that entails task-attention modules to capture task-specific information from the common representation space. |
Prashant Shivaram Bhat; Bahram Zonooz; Elahe Arani; |
833 | Error Sensitivity Modulation Based Experience Replay: Mitigating Abrupt Representation Drift in Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose ESMER which employs a principled mechanism for modulating the error sensitivity in a dual-memory rehearsal-based system. |
Fahad Sarfraz; Elahe Arani; Bahram Zonooz; |
834 | Towards Robust Object Detection Invariant to Real-World Domain Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing classification domain generalization (DG) methods cannot effectively solve the robust object detection problem, because they either rely on multiple source domains with large style variance or destroy the content structures of the original images. In this paper, we analyze and investigate effective solutions to overcome domain style overfitting for robust object detection without the above shortcomings. |
Qi Fan; Mattia Segu; Yu-Wing Tai; Fisher Yu; Chi-Keung Tang; Bernt Schiele; Dengxin Dai; |
835 | PaLI: A Jointly-Scaled Multilingual Language-Image Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present PaLI, a model that extends this approach to the joint modeling of language and vision. |
Xi Chen; Xiao Wang; Soravit Changpinyo; AJ Piergiovanni; Piotr Padlewski; Daniel Salz; Sebastian Goodman; Adam Grycner; Basil Mustafa; Lucas Beyer; Alexander Kolesnikov; Joan Puigcerver; Nan Ding; Keran Rong; Hassan Akbari; Gaurav Mishra; Linting Xue; Ashish V Thapliyal; James Bradbury; Weicheng Kuo; Mojtaba Seyedhosseini; Chao Jia; Burcu Karagol Ayan; Carlos Riquelme Ruiz; Andreas Peter Steiner; Anelia Angelova; Xiaohua Zhai; Neil Houlsby; Radu Soricut; |
836 | Learnable Topological Features For Phylogenetic Inference Via Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel structural representation method for phylogenetic inference based on learnable topological features. |
Cheng Zhang; |
837 | Flow Annealed Importance Sampling Bootstrap Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We train normalizing flows to fit multi-modal target distributions by generating samples where the flow is a poor approximation of the target using an annealed importance sampling bootstrap procedure. |
Laurence Illing Midgley; Vincent Stimper; Gregor N. C. Simm; Bernhard Schölkopf; José Miguel Hernández-Lobato; |
838 | NeRN: Learning Neural Representations for Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Similarly to the spatial smoothness of visual scenes, we show that incorporating a smoothness constraint over the original network’s weights aids NeRN towards a better reconstruction. |
Maor Ashkenazi; Zohar Rimon; Ron Vainshtein; Shir Levi; Elad Richardson; Pinchas Mintz; Eran Treister; |
839 | LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous works show that a squared norm regularization on the implicit reward function is effective, but do not provide a theoretical analysis of the resulting properties of the algorithms. In this work, we show that using this regularizer under a mixture distribution of the policy and the expert provides a particularly illuminating perspective: the original objective can be understood as squared Bellman error minimization, and the corresponding optimization problem minimizes the $\chi^2$-Divergence between the expert and the mixture distribution. |
Firas Al-Hafez; Davide Tateo; Oleg Arenz; Guoping Zhao; Jan Peters; |
840 | Fairness-aware Contrastive Learning with Partially Annotated Sensitive Attributes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate a novel and practical problem of Fair Unsupervised Representation Learning with Partially annotated Sensitive labels (FURL-PS). In this way, we construct a balanced and unbiased dataset. |
Fengda Zhang; Kun Kuang; Long Chen; Yuxuan Liu; Chao Wu; Jun Xiao; |
841 | A Laplace-inspired Distribution on SO(3) for Probabilistic Rotation Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we draw inspiration from multivariate Laplace distribution and propose a novel Rotation Laplace distribution on SO(3). |
Yingda Yin; Yang Wang; He Wang; Baoquan Chen; |
842 | In-context Reinforcement Learning with Algorithm Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. |
Michael Laskin; Luyu Wang; Junhyuk Oh; Emilio Parisotto; Stephen Spencer; Richie Steigerwald; DJ Strouse; Steven Stenberg Hansen; Angelos Filos; Ethan Brooks; maxime gazeau; Himanshu Sahni; Satinder Singh; Volodymyr Mnih; |
843 | Robust Attention for Contextual Biased Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, to learn causal object features robust for contextual bias, we propose a novel attention module named Interventional Dual Attention (IDA) for visual recognition. |
Ruyang Liu; Jingjia Huang; Ge Li; Thomas H. Li; |
844 | Minimax Optimal Kernel Operator Learning Via Multilevel Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the statistical limit of learning a Hilbert-Schmidt operator between two infinite-dimensional Sobolev reproducing kernel Hilbert spaces. |
Jikai Jin; Yiping Lu; Jose Blanchet; Lexing Ying; |
845 | EAGLE: Large-scale Learning of Turbulent Fluid Dynamics with Mesh Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new large-scale dataset for learning non-steady fluid mechanics and a method based on self-attention on graphs. |
Steeven JANNY; Aurélien Bénéteau; Madiha Nadri; Julie Digne; Nicolas THOME; Christian Wolf; |
846 | Matching Receptor to Odorant with Protein Language and Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we combine [CLS] token from protBERT with a molecular graph and propose a tailored GNN architecture incorporating inductive biases from the protein-molecule binding. |
Matej Hladiš; Maxence Lalis; Sebastien Fiorucci; Jérémie Topin; |
847 | Asynchronous Gradient Play in Zero-Sum Multi-agent Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we make progress by studying asynchronous gradient plays in zero-sum polymatrix games under delayed feedback. |
Ruicheng Ao; Shicong Cen; Yuejie Chi; |
848 | Concept-level Debugging of Part-Prototype Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, like other models, they are prone to picking up confounders and shortcuts from the data, thus suffering from compromised prediction accuracy and limited generalization. We propose ProtoPDebug, an effective concept-level debugger for ProtoPNets in which a human supervisor, guided by the model’s explanations, supplies feedback in the form of what part-prototypes must be forgotten or kept, and the model is fine-tuned to align with this supervision. |
Andrea Bontempelli; Stefano Teso; Katya Tentori; Fausto Giunchiglia; Andrea Passerini; |
849 | Dual Algorithmic Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose to learn algorithms by exploiting duality of the underlying algorithmic problem. |
Danilo Numeroso; Davide Bacciu; Petar Veličković; |
850 | Sampling-based Inference for Large Linear Models, with Application to Linearised Laplace Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Alas, the computational cost associated with Bayesian linear models constrains this method’s application to small networks, small output spaces and small datasets. We address this limitation by introducing a scalable sample-based Bayesian inference method for conjugate Gaussian multi-output linear models, together with a matching method for hyperparameter (regularisation) selection. |
Javier Antoran; Shreyas Padhy; Riccardo Barbano; Eric Nalisnick; David Janz; José Miguel Hernández-Lobato; |
851 | Video Scene Graph Generation from Single-Frame Weak Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first weakly-supervised VidSGG task with only single-frame weak supervision: SF-VidSGG. |
Siqi Chen; Long Chen; Jun Xiao; |
852 | Personalized Reward Learning with Interaction-Grounded Learning (IGL) Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose applying the recent Interaction Grounded Learning (IGL) paradigm to address the challenge of learning representations of diverse user communication modalities. |
Jessica Maghakian; Paul Mineiro; Kishan Panaganti; Mark Rucker; Akanksha Saran; Cheng Tan; |
853 | Optimizing Bi-Encoder for Named Entity Recognition Via Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a bi-encoder framework for named entity recognition (NER), which applies contrastive learning to map candidate text spans and entity types into the same vector representation space. |
Sheng Zhang; Hao Cheng; Jianfeng Gao; Hoifung Poon; |
854 | On The Saturation Effect of Kernel Ridge Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The saturation effect has been widely observed in practice, and a saturation lower bound of KRR has been conjectured for decades. In this paper, we provide a proof of this long-standing conjecture. |
Yicheng Li; Haobo Zhang; Qian Lin; |
855 | Visually-Augmented Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current large-scale pre-trained language models rely on the text-only self-supervised training with massive text data, which precludes them from utilizing relevant visual information when necessary. To address this, we propose a novel pre-training framework, named VaLM, to Visually-augment text tokens with retrieved relevant images for Language Modeling. |
Weizhi Wang; Li Dong; Hao Cheng; Haoyu Song; Xiaodong Liu; Xifeng Yan; Jianfeng Gao; Furu Wei; |
856 | DiGress: Discrete Denoising Diffusion for Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces DiGress, a discrete denoising diffusion model for generating graphs with categorical node and edge attributes. |
Clement Vignac; Igor Krawczuk; Antoine Siraudin; Bohan Wang; Volkan Cevher; Pascal Frossard; |
857 | Diffusion Probabilistic Modeling of Protein Backbones in 3D for The Motif-scaffolding Problem Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to learn a distribution over diverse and longer protein backbone structures via an E(3)-equivariant graph neural network. |
Brian L. Trippe; Jason Yim; Doug Tischer; David Baker; Tamara Broderick; Regina Barzilay; Tommi S. Jaakkola; |
858 | Advancing Radiograph Representation Learning with Masked Record Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Modern studies in radiograph representation learning (R$^2$L) rely on either self-supervision to encode invariant semantics or associated radiology reports to incorporate medical expertise, while the complementarity between them is barely noticed. To explore this, we formulate the self- and report-completion as two complementary objectives and present a unified framework based on masked record modeling (MRM). |
Hong-Yu Zhou; Chenyu Lian; Liansheng Wang; Yizhou Yu; |
859 | Time to Augment Visual Self-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This gives access to “augmentations” not commonly used in SSL, like watching the same object from multiple viewpoints or against different backgrounds. Here, we systematically investigate and compare the potential benefits of such time-based augmentations for learning object categories. |
Arthur Aubret; Markus R. Ernst; Céline Teulière; Jochen Triesch; |
860 | Parametrizing Product Shape Manifolds By Composite Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This, however, often comes with high computational costs, which raises the question if one can learn an efficient neural network approximation. We show that this is indeed possible for shape spaces with a special product structure, namely those smoothly approximable by a direct sum of low-dimensional manifolds. |
Josua Sassen; Klaus Hildebrandt; Benedikt Wirth; Martin Rumpf; |
861 | Moderate Coreset: A Universal Method of Data Selection for Real-world Data-efficient Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, to address the issue, a concept of the moderate coreset is discussed. |
Xiaobo Xia; Jiale Liu; Jun Yu; Xu Shen; Bo Han; Tongliang Liu; |
862 | Generative Modeling Helps Weak Supervision (and Vice Versa) Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While these techniques would seem to be usable in concert, improving one another, how to build an interface between them is not well-understood. In this work, we propose a model fusing programmatic weak supervision and generative adversarial networks and provide theoretical justification motivating this fusion. |
Benedikt Boecking; Nicholas Roberts; Willie Neiswanger; Stefano Ermon; Frederic Sala; Artur Dubrawski; |
863 | Modeling Multimodal Aleatoric Uncertainty in Segmentation with Mixture of Stochastic Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we focus on capturing the data-inherent uncertainty (aka aleatoric uncertainty) in segmentation, typically when ambiguities exist in input images. |
Zhitong Gao; Yucong Chen; Chuyu Zhang; Xuming He; |
864 | Weakly-supervised HOI Detection Via Prior-guided Bi-level Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, we develop a CLIP-guided HOI representation capable of incorporating the prior knowledge at both image level and HOI instance level, and adopt a self-taught mechanism to prune incorrect human-object associations. |
Bo Wan; Yongfei Liu; Desen Zhou; Tinne Tuytelaars; Xuming He; |
865 | Boosting Causal Discovery Via Adaptive Sample Reweighting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple yet effective model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore for short, where the learned weights are tailored quantitatively to the importance of each sample. |
An Zhang; Fangfu Liu; Wenchang Ma; Zhibo Cai; Xiang Wang; Tat-Seng Chua; |
866 | Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Under the linear MDP setting with feature dimension $d$ and planning horizon $H$, we propose a new algorithm that collects at most $\widetilde{O}(\frac{d^2H^5}{\epsilon^2})$ trajectories within $H$ deployments to identify $\epsilon$-optimal policy for any (possibly data-dependent) choice of reward functions. |
Dan Qiao; Yu-Xiang Wang; |
867 | An Adaptive Policy to Employ Sharpness-Aware Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we design an adaptive policy to employ SAM based on the loss landscape geometry. |
Weisen Jiang; Hansi Yang; Yu Zhang; James Kwok; |
868 | Capturing The Motion of Every Joint: 3D Human Pose and Shape Estimation with Independent Tokens Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we present a novel method to estimate 3D human pose and shape from monocular videos. |
Sen Yang; Wen Heng; Gang Liu; GUOZHONG LUO; Wankou Yang; Gang YU; |
869 | PatchDCT: Patch Refinement for High Quality Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, we propose a simple and novel method named PatchDCT, which separates the mask decoded from a DCT vector into several patches and refines each patch by the designed classifier and regressor. |
Qinrou Wen; Jirui Yang; Xue Yang; Kewei Liang; |
870 | Jointly Learning Visual and Auditory Speech Representations from Raw Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present RAVEn, a self-supervised multi-modal approach to jointly learn visual and auditory speech representations. |
Alexandros Haliassos; Pingchuan Ma; Rodrigo Mira; Stavros Petridis; Maja Pantic; |
871 | Reparameterization Through Spatial Gradient Scaling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional neural networks. |
Alexander Detkov; Mohammad Salameh; Muhammad Fetrat; Jialin Zhang; Robin Luwei; SHANGLING JUI; Di Niu; |
872 | LMC: Fast Training of GNNs Via Subgraph Sampling with Provable Convergence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This poses significant challenges to their convergence analysis and convergence speeds, which seriously limits their reliable real-world applications. To address this challenge, we propose a novel subgraph-wise sampling method with a convergence guarantee, namely Local Message Compensation (LMC). |
Zhihao Shi; Xize Liang; Jie Wang; |
873 | Gray-Box Gaussian Processes for Automated Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel gray-box Bayesian Optimization technique for HPO in RL, that enriches Gaussian Processes with reward curve estimations based on generalized logistic functions. |
Gresa Shala; André Biedenkapp; Frank Hutter; Josif Grabocka; |
874 | Continuized Acceleration for Quasar Convex Functions in Non-Convex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that a recently proposed continuized Nesterov acceleration can be applied to minimizing quasar convex functions and achieves the optimal bound with a high probability. |
Jun-Kun Wang; Andre Wibisono; |
875 | Towards Understanding GD with Hard and Conjugate Pseudo-labels for Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim at theoretically understanding GD with hard and conjugate labels for a binary classification problem. |
Jun-Kun Wang; Andre Wibisono; |
876 | Accelerating Hamiltonian Monte Carlo Via Chebyshev Integration Time Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we consider accelerating the process of sampling from a distribution $\pi(x) \propto \exp(-f(x))$ via HMC with time-varying integration time. |
Jun-Kun Wang; Andre Wibisono; |
877 | MEDFAIR: BENCHMARKING FAIRNESS FOR MEDICAL IMAGING Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce MEDFAIR, a framework to benchmark the fairness of machine learning models for medical imaging. |
Yongshuo Zong; Yongxin Yang; Timothy Hospedales; |
878 | Progress Measures for Grokking Via Mechanistic Interpretability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we argue that progress measures can be found via mechanistic interpretability—that is, by reverse engineering learned models into components and measuring the progress of each component over the course of training. |
Neel Nanda; Lawrence Chan; Tom Lieberum; Jess Smith; Jacob Steinhardt; |
879 | Decision Transformer Under Random Frame Dropping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To devise a robust and deployable algorithm, we propose Decision Transformer under Random Frame Dropping (DeFog), an offline RL algorithm that enables agents to act robustly in frame dropping scenarios without online interaction. |
Kaizhe Hu; Ray Chen Zheng; Yang Gao; Huazhe Xu; |
880 | Modeling Content Creator Incentives on Algorithm-curated Platforms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose tools for numerically finding equilibria in exposure games, and illustrate results of an audit on the MovieLens and LastFM datasets. |
Jiri Hron; Karl Krauth; Michael Jordan; Niki Kilbertus; Sarah Dean; |
881 | DM-NeRF: 3D Scene Geometry Decomposition and Manipulation from 2D Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the problem of 3D scene geometry decomposition and manipulation from 2D views. |
Bing WANG; Lu Chen; Bo Yang; |
882 | An Efficient Encoder-decoder Architecture with Top-down Attention for Speech Separation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide a bio-inspired efficient encoder-decoder architecture by mimicking the brain’s top-down attention, called TDANet, with decreased model complexity without sacrificing performance. |
Kai Li; Runxuan Yang; Xiaolin Hu; |
883 | Bayesian Oracle for Bounding Information Gain in Neural Encoding Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we generalize the jack-knife oracle estimator for the mean—commonly used for correlation metrics—to a flexible Bayesian oracle estimator for IG based on posterior predictive distributions. |
Konstantin-Klemens Lurz; Mohammad Bashiri; Edgar Y. Walker; Fabian H. Sinz; |
884 | Improving The Imputation of Missing Data with Markov Blanket Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Markov Blanket discovery approach to determine the optimal feature set for a given variable by considering both observed variables and missingness of partially observed variables to account for systematic missingness. |
Yang Liu; Anthony Constantinou; |
885 | Simple Emergent Action Representations from Multi-Task Policy Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While sensory representations have been widely studied, the representations of actions that form motor skills are yet under exploration. In this work, we find that when a multi-task policy network takes as input states and task embeddings, a space based on the task embeddings emerges to contain meaningful action representations with moderate constraints. Within this space, interpolated or composed embeddings can serve as a high-level interface to instruct the agent to perform meaningful action sequences. |
Pu Hua; Yubei Chen; Huazhe Xu; |
886 | Malign Overfitting: Interpolation and Invariance Are Fundamentally at Odds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This suggests that the phenomenon of “benign overfitting”, in which models generalize well despite interpolating, might not favorably extend to settings in which robustness or fairness are desirable. In this work we provide a theoretical justification for these observations. |
Yoav Wald; Gal Yona; Uri Shalit; Yair Carmon; |
887 | Empirical Study of Pre-training A Backbone for 3D Human Pose and Shape Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, its effects on 3DHPSE are open to question, whose target is fixed to a single class, the human. In this regard, we inspect the effectiveness of SSL on 3DHPSE and investigate two other pre-training approaches that have received relatively less attention. |
Hongsuk Choi; Hyeongjin Nam; Taeryung Lee; Gyeongsik Moon; Kyoung Mu Lee; |
888 | Selective Frequency Network for Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we exploit a multi-branch and content-aware module to decompose the feature into separate frequency subbands dynamically and locally, and then accentuate the useful ones via the channel-wise attention mechanism. |
Yuning Cui; Yi Tao; Zhenshan Bing; Wenqi Ren; Xinwei Gao; Xiaochun Cao; Kai Huang; Alois Knoll; |
889 | Proposal-Contrastive Pretraining for Object Detection from Fewer Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this problem, we are interested in transformer-based object detectors that have recently gained traction in the community with good performance and with the particularity of generating many diverse object proposals. In this work, we present Proposal Selection Contrast (ProSeCo), a novel unsupervised overall pretraining approach that leverages this property. |
Quentin Bouniot; Romaric Audigier; Angelique Loesch; Amaury Habrard; |
890 | Mass-Editing Memory in A Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop MEMIT, a method for directly updating a language model with many memories, demonstrating experimentally that it can scale up to thousands of associations for GPT-J (6B) and GPT-NeoX (20B), exceeding prior work by an order of magnitude. |
Kevin Meng; Arnab Sen Sharma; Alex J Andonian; Yonatan Belinkov; David Bau; |
891 | Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that recent advances in self-supervised representation learning enable unsupervised object discovery and semantic segmentation with a performance that matches the state of the field on supervised semantic segmentation 10 years ago. |
Andrii Zadaianchuk; Matthaeus Kleindessner; Yi Zhu; Francesco Locatello; Thomas Brox; |
892 | CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to achieve a tighter measurement of the model exposure by considering a realistic threat model. |
Samuel Maddock; Alexandre Sablayrolles; Pierre Stock; |
893 | Effective Self-supervised Pre-training on Low-compute Networks Without Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most prior works attribute this poor performance to the capacity bottleneck of the low-compute networks and opt to bypass the problem through the use of knowledge distillation (KD). In this work, we revisit SSL for efficient neural networks, taking a closer look at the detrimental factors causing the practical limitations, and whether they are intrinsic to the self-supervised low-compute setting. |
Fuwen Tan; Fatemeh Sadat Saleh; Brais Martinez; |
894 | DualAfford: Learning Collaborative Visual Affordance for Dual-gripper Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel learning framework, DualAfford, to learn collaborative affordance for dual-gripper manipulation tasks. We will release code and data upon acceptance. |
Yan Zhao; Ruihai Wu; Zhehuan Chen; Yourong Zhang; Qingnan Fan; Kaichun Mo; Hao Dong; |
895 | Instance-wise Batch Label Restoration Via Gradients in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: An analytic method is proposed to perform instance-wise batch label restoration from only the gradient of the final layer. |
Kailang Ma; Yu Sun; Jian Cui; Dawei Li; Zhenyu Guan; Jianwei Liu; |
896 | Perfectly Secure Steganography Using Minimum Entropy Coupling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that a steganography procedure is perfectly secure under Cachin (1998)’s information theoretic-model of steganography if and only if it is induced by a coupling. |
Christian Schroeder de Witt; Samuel Sokota; J Zico Kolter; Jakob Nicolaus Foerster; Martin Strohmeier; |
897 | Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the Denoising Diffusion Null-Space Model (DDNM), a novel zero-shot framework for arbitrary linear IR problems, including but not limited to image super-resolution, colorization, inpainting, compressed sensing, and deblurring. |
Yinhuai Wang; Jiwen Yu; Jian Zhang; |
898 | UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning. |
Jinhao Jiang; Kun Zhou; Xin Zhao; Ji-Rong Wen; |
899 | ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing benchmarks surface examples challenging for models, they do not explain why such mistakes arise. To address this need, we introduce ImageNet-X—a set of sixteen human annotations of factors such as pose, background, or lighting for the entire ImageNet-1k validation set as well as a random subset of 12k training images. |
Badr Youbi Idrissi; Diane Bouchacourt; Randall Balestriero; Ivan Evtimov; Caner Hazirbas; Nicolas Ballas; Pascal Vincent; Michal Drozdzal; David Lopez-Paz; Mark Ibrahim; |
900 | AnyDA: Anytime Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a simple yet effective framework for anytime domain adaptation that is executable with dynamic resource constraints to achieve accuracy-efficiency trade-offs under domain-shifts. |
Omprakash Chakraborty; Aadarsh Sahoo; Rameswar Panda; Abir Das; |
901 | Towards Effective and Interpretable Human-Agent Collaboration in MOBA Games: A Communication Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to enable humans and agents to collaborate through explicit communication by designing an efficient and interpretable Meta-Command Communication-based framework, dubbed MCC, for accomplishing effective human-agent collaboration in MOBA games. |
Yiming Gao; Feiyu Liu; Liang Wang; Zhenjie Lian; Weixuan Wang; Siqin Li; Xianliang Wang; Xianhan Zeng; Rundong Wang; jiawei wang; QIANG FU; Yang Wei; Lanxiao Huang; Wei Liu; |
902 | A Higher Precision Algorithm for Computing The $1$-Wasserstein Distance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an algorithm that runs in $O(T(n,\varepsilon/d) \log n)$ time and boosts the accuracy of estimating $\mathcal{W}(\mu,\nu)$ to an additive factor of $\min\{\varepsilon, (d\log_{\sqrt{d}/\varepsilon} n)\mathcal{W}(\mu,\nu)\}$. |
Pankaj K Agarwal; Sharath Raghvendra; Pouyan Shirzadian; Rachita Sowle; |
903 | Interpretations of Domain Adaptations Via Layer Variational Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study establishes both formal derivations and heuristic analysis to formulate the theory of transfer learning in deep learning. |
Huan-Hsin Tseng; Hsin-Yi Lin; Kuo-Hsuan Hung; Yu Tsao; |
904 | A Self-Attention Ansatz for Ab-initio Quantum Chemistry Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel neural network architecture using self-attention, the Wavefunction Transformer (PsiFormer), which can be used as an approximation (or Ansatz) for solving the many-electron Schrödinger equation, the fundamental equation for quantum chemistry and material science. |
Ingrid von Glehn; James S Spencer; David Pfau; |
905 | Learning What and Where – Unsupervised Disentangling Location and Identity Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we introduce a self-supervised LOCation and Identity tracking system (Loci), which excels on the CATER tracking challenge. |
Manuel Traub; Sebastian Otte; Tobias Menge; Matthias Karlbauer; Jannik Thuemmel; Martin V. Butz; |
906 | Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate how multimodal prompt engineering can use language as the intermediate representation to combine complementary knowledge from different pretrained (potentially multimodal) language models for a variety of tasks. |
Andy Zeng; Maria Attarian; brian ichter; Krzysztof Marcin Choromanski; Adrian Wong; Stefan Welker; Federico Tombari; Aveek Purohit; Michael S Ryoo; Vikas Sindhwani; Johnny Lee; Vincent Vanhoucke; Pete Florence; |
907 | Multi-skill Mobile Manipulation for Object Rearrangement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose that manipulation skills should include mobility, giving the agent flexibility to interact with the target object from multiple locations, while the navigation skill should allow multiple end points that lead to successful manipulation. |
Jiayuan Gu; Devendra Singh Chaplot; Hao Su; Jitendra Malik; |
908 | On The Inadequacy of Optimizing Alignment and Uniformity in Contrastive Learning of Sentence Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we consider the suitability of the decoupled form of contrastive loss, i.e., alignment and uniformity, in SRL. |
Zhijie Nie; Richong Zhang; Yongyi Mao; |
909 | BALTO: Efficient Tensor Program Optimization with Diversity-based Active Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose BALTO, a fast TPO approach with biased-diversity-based active learning, aiming to substantially reduce training costs while achieving similar optimization accuracy. The key insight is that the random sampling used by existing approaches suffers from a heavy redundancy of low-performance programs, which incurs tremendous duplicated time-consuming measurements. |
Jun Bi; Xiaqing Li; Qi Guo; Rui Zhang; Yuanbo Wen; Xing Hu; Zidong Du; Xinkai Song; Yifan Hao; Yunji Chen; |
910 | Decision S4: Efficient Sequence-Based RL Via State Spaces Layers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present two main algorithms: (i) an off-policy training procedure that works with trajectories, while still maintaining the training efficiency of the S4 model. |
Shmuel Bar David; Itamar Zimerman; Eliya Nachmani; Lior Wolf; |
911 | DecAF: Joint Decoding of Answers and Logical Forms for Question Answering Over Knowledge Bases Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel KBQA framework that jointly generates both direct answers and logical forms, and then combines them to obtain the final answers. |
Donghan Yu; Sheng Zhang; Patrick Ng; Henghui Zhu; Alexander Hanbo Li; Jun Wang; Yiqun Hu; William Yang Wang; Zhiguo Wang; Bing Xiang; |
912 | ZiCo: Zero-shot NAS Via Inverse Coefficient of Variation on Gradients Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve this state of affairs, as the main theoretical contribution, we first reveal how some specific gradient properties across different samples impact the convergence rate of neural networks. Based on this theoretical analysis, we propose a new zero-shot proxy, ZiCo, the first proxy that works consistently better than #Params. |
Guihong Li; Yuedong Yang; Kartikeya Bhardwaj; Radu Marculescu; |
913 | Hierarchical Relational Learning for Few-Shot Knowledge Graph Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a hierarchical relational learning method (HiRe) for few-shot KG completion. |
Han Wu; Jie Yin; Bala Rajaratnam; Jianyuan Guo; |
914 | Integrating Symmetry Into Differentiable Planning with Steerable Convolutions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by equivariant convolution networks, we treat the path planning problem as \textit{signals} over grids. |
Linfeng Zhao; Xupeng Zhu; Lingzhi Kong; Robin Walters; Lawson L.S. Wong; |
915 | Spectral Augmentation for Self-Supervised Learning on Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to find a principled way for topology augmentations by exploring the invariance of graphs from the spectral perspective. |
Lu Lin; Jinghui Chen; Hongning Wang; |
916 | Can We Faithfully Represent Absence States to Compute Shapley Values on A DNN? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there are no studies investigating how to represent the absence of input variables and verify the faithfulness of baseline values. Therefore, we revisit the feature representation of a deep model in terms of causality, and propose to use causal patterns to examine whether the masking method faithfully removes information encoded in the input variable. |
Jie Ren; Zhanpeng Zhou; Qirui Chen; Quanshi Zhang; |
917 | OTOv2: Automatic, Generic, User-Friendly Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the second generation of Only-Train-Once (OTOv2), which trains and compresses an arbitrary DNN only once from scratch to produce a more compact model with competitive performance without fine-tuning. |
Tianyi Chen; Luming Liang; Tianyu DING; Zhihui Zhu; Ilya Zharkov; |
918 | RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we model the problem with Markov Games and propose a simple yet effective method, ranked policy memory (RPM), to collect diverse multi-agent trajectories for training MARL policies with good generalizability. |
Wei Qiu; Xiao Ma; Bo An; Svetlana Obraztsova; Shuicheng YAN; Zhongwen Xu; |
919 | Is The Performance of My Deep Network Too Good to Be True? A Direct Approach to Estimating The Bayes Error in Binary Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a simple and direct Bayes error estimator, where we just take the mean of the labels that show \emph{uncertainty} of the classes. |
Takashi Ishida; Ikko Yamane; Nontawat Charoenphakdee; Gang Niu; Masashi Sugiyama; |
920 | Compositional Law Parsing with Latent Random Functions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a deep latent variable model for Compositional LAw Parsing (CLAP). |
Fan Shi; Bin Li; Xiangyang Xue; |
921 | Weakly Supervised Knowledge Transfer with Probabilistic Logical Reasoning for Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose ProbKT, a framework based on probabilistic logical reasoning to train object detection models with arbitrary types of weak supervision. |
Martijn Oldenhof; Adam Arany; Yves Moreau; Edward De Brouwer; |
922 | 3EF: Class-Incremental Learning Via Efficient Energy-Based Expansion and Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a unifying energy-based theory and framework called Efficient Energy-Based Expansion and Fusion (3EF) to analyze and achieve the goal of CIL. |
Fu-Yun Wang; Da-Wei Zhou; Liu Liu; Yatao Bian; Han-Jia Ye; De-Chuan Zhan; Peilin Zhao; |
923 | Neural Bregman Divergences for Distance Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new approach to learning arbitrary Bregman divergences in a differentiable manner via input convex neural networks and show that it overcomes significant limitations of previous works. |
Fred Lu; Edward Raff; Francis Ferraro; |
924 | ChiroDiff: Modelling Chirographic Data with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, temporal data has been modelled as discrete token sequences of fixed sampling rate instead of capturing the true underlying concept. In this paper, we introduce a powerful model class, namely Denoising Diffusion Probabilistic Models (DDPMs), for chirographic data that specifically addresses these flaws. |
Ayan Das; Yongxin Yang; Timothy Hospedales; Tao Xiang; Yi-Zhe Song; |
925 | Learning Harmonic Molecular Representations on Riemannian Manifold Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a Harmonic Molecular Representation learning (HMR) framework, which represents a molecule using the Laplace-Beltrami eigenfunctions of the molecular surface. |
Yiqun Wang; Yuning Shen; Shi Chen; Lihao Wang; Fei YE; Hao Zhou; |
926 | Scaleformer: Iterative Multi-scale Refining Transformers for Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a general multi-scale framework that can be applied to state-of-the-art transformer-based time series forecasting models (FEDformer, Autoformer, etc.). |
Mohammad Amin Shabani; Amir H. Abdi; Lili Meng; Tristan Sylvain; |
927 | Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we develop a stochastic multi-objective gradient correction (MoCo) method for multi-objective optimization. |
Heshan Devaka Fernando; Han Shen; Miao Liu; Subhajit Chaudhury; Keerthiram Murugesan; Tianyi Chen; |
928 | Understanding Neural Coding on Latent Manifolds By Sharing Features and Dividing Ensembles Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose $\textit{feature sharing}$ across neural tuning curves, which significantly improves performance and leads to better-behaved optimization. |
Martin Bjerke; Lukas Schott; Kristopher T Jensen; Claudia Battistin; David A. Klindt; Benjamin Adric Dunn; |
929 | Meta-Learning in Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In practice, however, strategic interactions—ranging from routing problems to online advertising auctions—evolve dynamically, thereby producing many similar games to be solved. To address this gap, we introduce meta-learning for equilibrium finding and learning to play games. |
Keegan Harris; Ioannis Anagnostides; Gabriele Farina; Mikhail Khodak; Steven Wu; Tuomas Sandholm; |
930 | On Emergence of Activation Sparsity in Trained Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper reveals a curious observation that modern large-scale machine learning models with Transformer architectures have sparse activation maps. |
Zonglin Li; Chong You; Srinadh Bhojanapalli; Daliang Li; Ankit Singh Rawat; Sashank J. Reddi; Ke Ye; Felix Chern; Felix Yu; Ruiqi Guo; Sanjiv Kumar; |
931 | Adaptive Robust Evidential Optimization For Open Set Detection from Imbalanced Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Adaptive Robust Evidential Optimization (AREO) that offers a principled way to quantify sample uncertainty through evidential learning while optimally balancing the model training over all classes in the close set through adaptive distributively robust optimization (DRO). |
Hitesh Sapkota; Qi Yu; |
932 | A View of Mini-batch SGD Via Generating Functions: Conditions of Convergence, Phase Transitions, Benefit from Negative Momenta Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we develop a new analytic framework to analyze noise-averaged properties of mini-batch SGD for linear models at constant learning rates, momenta and sizes of batches. |
Maksim Velikanov; Denis Kuznedelev; Dmitry Yarotsky; |
933 | On The Usefulness of Embeddings, Clusters and Strings for Text Generation Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a theoretical and empirical analysis of why a recently-proposed automatic evaluation metric for language generators correlates well with human judgments. We identify its use of embeddings from pretrained language models as the main reason. |
Tiago Pimentel; Clara Isabel Meister; Ryan Cotterell; |
934 | Is Adversarial Training Really A Silver Bullet for Mitigating Data Poisoning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an indiscriminative feature-based poisoning approach to substantially degrade adversarial training, which was previously considered to be impossible. |
Rui Wen; Zhengyu Zhao; Zhuoran Liu; Michael Backes; Tianhao Wang; Yang Zhang; |
935 | The Devil Is in The Wrongly-classified Samples: Towards Unified Open-set Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recently, Unified Open-set Recognition (UOSR) has been proposed to reject not only unknown samples but also known but wrongly classified samples, which tends to be more practical in real-world applications. In this paper, we deeply analyze the UOSR task under different training and evaluation settings to shed light on this promising research direction. |
Jun CEN; Di Luan; Shiwei Zhang; Yixuan Pei; Yingya Zhang; Deli Zhao; Shaojie Shen; Qifeng Chen; |
936 | DFPC: Data Flow Driven Pruning of Coupled Channels Without Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel data-free algorithm to accelerate neural networks via pruning coupled channels. |
Tanay Narshana; Chaitanya Murti; Chiranjib Bhattacharyya; |
937 | TVSPrune – Pruning Non-discriminative Filters Via Total Variation Separability of Intermediate Representations Without Fine Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the challenge of pruning filters with only access to random samples drawn from the original distribution and without access to the original training set or loss function. |
Chaitanya Murti; Tanay Narshana; Chiranjib Bhattacharyya; |
938 | Deep Variational Implicit Processes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We describe a scalable variational inference algorithm for training DVIP and show that it outperforms previous IP-based methods and also deep GPs. |
Luis A. Ortega; Simon Rodriguez Santana; Daniel Hernández-Lobato; |
939 | Contrastive Learning for Unsupervised Domain Adaptation of Time Series Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop a novel framework for UDA of time series data, called CLUDA. |
Yilmazcan Ozyurt; Stefan Feuerriegel; Ce Zhang; |
940 | Cross-Level Distillation and Feature Denoising for Cross-Domain Few-Shot Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in practice, the base and the target datasets of few-shot classification are usually from different domains, which is the problem of cross-domain few-shot classification. We tackle this problem by making a small proportion of unlabeled images in the target domain accessible in the training stage. |
Hao ZHENG; Runqi Wang; Jianzhuang Liu; Asako Kanezaki; |
941 | Achieving Sub-linear Regret in Infinite Horizon Average Reward Constrained MDP with Linear Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the infinite horizon average reward constrained Markov Decision Process (CMDP). |
Arnob Ghosh; Xingyu Zhou; Ness Shroff; |
942 | Exploring and Exploiting Decision Boundary Dynamics for Adversarial Robustness Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it is unclear whether existing robust training methods effectively increase the margin for each vulnerable point during training. To understand this, we propose a continuous-time framework for quantifying the relative speed of the decision boundary with respect to each individual point. |
Yuancheng Xu; Yanchao Sun; Micah Goldblum; Tom Goldstein; Furong Huang; |
943 | Benchmarking Constraint Inference in Inverse Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we construct an ICRL benchmark in the context of two major application domains: robot control and autonomous driving. |
Guiliang Liu; Yudong Luo; Ashish Gaurav; Kasra Rezaee; Pascal Poupart; |
944 | SCoMoE: Efficient Mixtures of Experts with Structured Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To reduce the communication cost, we propose SCoMoE, an MoE architecture with structured all-to-all communication, inspired by the hierarchical architecture of the communication topology. |
zhiyuan zeng; Deyi Xiong; |
945 | Learning Soft Constraints From Constrained Expert Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in many settings, the agent may optimize a reward function subject to some constraints, where the constraints induce behaviors that may be otherwise difficult to express with just a reward function. We consider the setting where the reward function is given, and the constraints are unknown, and propose a method that is able to recover these constraints satisfactorily from the expert data. |
Ashish Gaurav; Kasra Rezaee; Guiliang Liu; Pascal Poupart; |
946 | Priors, Hierarchy, and Information Asymmetry for Skill Transfer in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we theoretically and empirically show the crucial expressivity-transferability trade-off of skills across sequential tasks, controlled by information asymmetry. |
Sasha Salter; Kristian Hartikainen; Walter Goodwin; Ingmar Posner; |
947 | Programmatically Grounded, Compositionally Generalizable Robotic Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a modular approach to better leverage pretrained VL models by exploiting the syntactic and semantic structures of an input language instruction. |
Renhao Wang; Jiayuan Mao; Joy Hsu; Hang Zhao; Jiajun Wu; Yang Gao; |
948 | Behavior Proximal Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, starting from the analysis of offline monotonic policy improvement, we get a surprising finding that some online on-policy algorithms are naturally able to solve offline RL. |
Zifeng Zhuang; Kun LEI; Jinxin Liu; Donglin Wang; Yilang Guo; |
949 | TabCaps: A Capsule Neural Network for Tabular Data Classification with BoW Routing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to encapsulate all tabular features of an instance into vectorial features and process them collectively rather than dealing with individual ones, which directly captures representations at the instance level and yields robust performance. |
Jintai Chen; KuanLun Liao; Yanwen Fang; Danny Chen; Jian Wu; |
950 | An Exact Poly-Time Membership-Queries Algorithm for Extracting A Three-Layer ReLU Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a polynomial-time algorithm that can learn a depth-two ReLU network from queries under mild general position assumptions. |
Amit Daniely; Elad Granot; |
951 | Protein Representation Learning Via Knowledge Enhanced Primary Structure Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it fails to consider the semantic gap between protein sequences and natural language, and the resulting feature misalignment may adversely affect representation learning. To mitigate this, we propose Knowledge-exploited Auto-encoder for Proteins (KeAP), which performs implicit knowledge encoding by learning to exploit knowledge for protein primary structure reasoning. |
Hong-Yu Zhou; Yunxiang Fu; Zhicheng Zhang; Bian Cheng; Yizhou Yu; |
952 | GOOD: Exploring Geometric Cues for Detecting Objects in An Open World Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is because RGB-based models primarily rely on appearance similarity to detect novel objects and are also prone to overfitting short-cut cues such as textures and discriminative parts. To address these shortcomings of RGB-based object detectors, we propose incorporating geometric cues such as depth and normals, predicted by general-purpose monocular estimators. |
Haiwen Huang; Andreas Geiger; Dan Zhang; |
953 | Cycle to Clique (Cy2C) Graph Neural Network: A Sight to See Beyond Neighborhood Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper mathematically identifies the caliber of graph neural networks in classifying isomorphism classes of graphs with continuous node attributes up to their local topological properties. |
Yun Young Choi; Sun Woo Park; Youngho Woo; U Jin Choi; |
954 | Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose multitask prompt tuning (MPT), which first learns a single transferable prompt by decomposing and distilling knowledge from multiple task-specific source prompts. |
Zhen Wang; Rameswar Panda; Leonid Karlinsky; Rogerio Feris; Huan Sun; Yoon Kim; |
955 | Contrastive Audio-Visual Masked Autoencoder Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Subsequently, we propose the Contrastive Audio-Visual Masked Auto-Encoder (CAV-MAE) by combining contrastive learning and masked data modeling, two major self-supervised learning frameworks, to learn a joint and coordinated audio-visual representation. |
Yuan Gong; Andrew Rouditchenko; Alexander H. Liu; David Harwath; Leonid Karlinsky; Hilde Kuehne; James R. Glass; |
956 | Eva: Practical Second-order Optimization with Kronecker-vectorized Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a memory- and time-efficient second-order algorithm named Eva with two novel techniques: 1) we construct the second-order information with the Kronecker factorization of small stochastic vectors over a mini-batch of training data to reduce memory consumption, and 2) we derive an efficient update formula without explicitly computing the inverse of matrices using the Sherman-Morrison formula. |
Lin Zhang; Shaohuai Shi; Bo Li; |
957 | Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Particularly, we propose the dynamic prior knowledge (DPK), which integrates part of teacher’s features as the prior knowledge before the feature distillation. |
Martin Zong; Zengyu Qiu; Xinzhu Ma; Kunlin Yang; Chunya Liu; Jun Hou; Shuai Yi; Wanli Ouyang; |
958 | SketchKnitter: Vectorized Sketch Generation with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show vectorized sketch generation can be identified as a reversal of the stroke deformation process. |
Qiang Wang; Haoge Deng; Yonggang Qi; Da Li; Yi-Zhe Song; |
959 | Towards Better Selective Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we challenge the aforementioned methods and confirm that the superior performance of state-of-the-art methods is owed to training a more generalizable classifier rather than their proposed selection mechanisms. |
Leo Feng; Mohamed Osama Ahmed; Hossein Hajimirsadeghi; Amir H. Abdi; |
960 | Achieve Near-Optimal Individual Regret & Low Communications in Multi-Agent Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an algorithm that is near-optimal with respect to both individual and group regrets; in addition, we propose a novel communication module that requires only O(log(log T)) communication rounds, where T is the number of decision rounds. |
Xuchuang Wang; Lin Yang; Yu-Zhen Janice Chen; Xutong Liu; Mohammad Hajiesmaili; Don Towsley; John C.S. Lui; |
961 | Learning Hyper Label Model for Programmatic Weak Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a hyper label model that (once learned) infers the ground-truth labels for each dataset in a single forward pass without dataset-specific parameter learning. |
Renzhi Wu; Shen-En Chen; Jieyu Zhang; Xu Chu; |
962 | A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that a single algorithm—a simple extension to mirror descent with proximal regularization that we call magnetic mirror descent (MMD)—can produce strong results in both settings, despite their fundamental differences. |
Samuel Sokota; Ryan D’Orazio; J Zico Kolter; Nicolas Loizou; Marc Lanctot; Ioannis Mitliagkas; Noam Brown; Christian Kroer; |
963 | Anisotropic Message Passing: Graph Neural Networks with Directional and Long-Range Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, an anisotropic state based on Cartesian multipoles is proposed as an addition to the existing hidden features. |
Moritz Thürlemann; Sereina Riniker; |
964 | SYNC: Safety-Aware Neural Control for Stabilizing Stochastic Delay-Differential Equations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Stabilizing systems described by stochastic delay-differential equations is a challenging task in the control community. Here, to achieve this task, we leverage neural networks to learn control policies using information from the controlled systems in some prescribed regions. |
Jingdong Zhang; Qunxi Zhu; Wei Yang; Wei Lin; |
965 | That Label’s Got Style: Handling Label Style Bias for Uncertain Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This results in datasets that contain both data variability and differing label styles. In this paper, we demonstrate that applying state-of-the-art segmentation uncertainty models on such datasets can lead to model bias caused by the different label styles. |
Kilian Zepf; Eike Petersen; Jes Frellsen; Aasa Feragen; |
966 | Tensor-Based Sketching Method for The Low-Rank Approximation of Data Streams Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, from a subspace perspective, we propose a tensor-based sketching method for low-rank approximation of data streams. |
Cuiyu Liu; Xiao Chuanfu; Mingshuo Ding; Chao Yang; |
967 | Differentiable Mathematical Programming for Object-Centric Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose topology-aware feature partitioning into $k$ disjoint partitions for given scene features as a method for object-centric representation learning. |
Adeel Pervez; Phillip Lippe; Efstratios Gavves; |
968 | Semi-Implicit Variational Inference Via Score Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose SIVI-SM, a new method for SIVI based on an alternative training objective via score matching. |
Longlin Yu; Cheng Zhang; |
969 | Scalable Subset Sampling with Neural Conditional Poisson Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a simple alternative method for sampling subsets based on \emph{conditional Poisson sampling}. |
Adeel Pervez; Phillip Lippe; Efstratios Gavves; |
970 | POPGym: Benchmarking Partially Observable Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Partially Observable Process Gym (POPGym), a two-part library containing (1) a diverse collection of 14 partially observable environments, each with multiple difficulties and (2) implementations of 13 memory model baselines — the most in a single RL library. |
Steven Morad; Ryan Kortvelesy; Matteo Bettini; Stephan Liwicki; Amanda Prorok; |
971 | Write and Paint: Generative Vision-Language Models Are Unified Modal Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we disclose the potential of symmetrical generative vision-language pre-training in learning to write and paint concurrently, and propose a new unified modal model, named DaVinci, trained with prefix language modeling and prefix image modeling, a simple generative self-supervised objective on image-text pairs. |
Shizhe Diao; Wangchunshu Zhou; Xinsong Zhang; Jiawei Wang; |
972 | Unsupervised Meta-learning Via Few-shot Pseudo-supervised Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To overcome the limitations, we propose a simple yet effective unsupervised meta-learning framework, coined Pseudo-supervised Contrast (PsCo), for few-shot classification. |
Huiwon Jang; Hankook Lee; Jinwoo Shin; |
973 | Policy-Based Self-Competition for Planning Problems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We leverage the idea of self-competition and directly incorporate a historical policy into the planning process instead of its scalar performance. Based on the recently introduced Gumbel AlphaZero (GAZ), we propose our algorithm GAZ ‘Play-to-Plan’ (GAZ PTP), in which the agent learns to find strong trajectories by planning against possible strategies of its past self. |
Jonathan Pirnay; Quirin Göttl; Jakob Burger; Dominik Gerhard Grimm; |
974 | Active Image Indexing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper improves the robustness of image copy detection with active indexing, which optimizes the interplay of these two components. |
Pierre Fernandez; Matthijs Douze; Herve Jegou; Teddy Furon; |
975 | LDMIC: Learning-based Distributed Multi-view Image Coding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We design a multi-view image compression framework based on symmetric distributed source coding paradigm, which achieves higher compression performance than previous multi-view image compression methods. |
Xinjie Zhang; Jiawei Shao; Jun Zhang; |
976 | Few-shot Cross-domain Image Generation Via Inference-time Latent-code Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, our objective is to adapt a deep generative model trained on a large-scale source dataset to multiple target domains with scarce data. |
Arnab Kumar Mondal; Piyush Tiwary; Parag Singla; Prathosh AP; |
977 | Global Explainability of GNNs Via Logic Combination of Learned Concepts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose GLGExplainer (Global Logic-based GNN Explainer), the first Global Explainer capable of generating explanations as arbitrary Boolean combinations of learned graphical concepts. |
Steve Azzolin; Antonio Longa; Pietro Barbiero; Pietro Lio; Andrea Passerini; |
978 | Differentially Private $L_2$-Heavy Hitters in The Sliding Window Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider the problem of privately releasing the $L_2$-heavy hitters in the sliding window model, which include $L_p$-heavy hitters for $p\le 2$ and in some sense are the strongest possible guarantees that can be achieved using polylogarithmic space, but cannot be handled by existing techniques due to the sub-additivity of the $L_2$ norm. |
Jeremiah Blocki; Seunghoon Lee; Tamalika Mukherjee; Samson Zhou; |
979 | Gradient Boosting Performs Gaussian Process Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper shows that gradient boosting based on symmetric decision trees can be equivalently reformulated as a kernel method that converges to the solution of a certain Kernel Ridge Regression problem. |
Aleksei Ustimenko; Artem Beliakov; Liudmila Prokhorenkova; |
980 | Calibrating Sequence Likelihood Improves Conditional Language Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce sequence likelihood calibration (SLiC), where the likelihoods of model-generated sequences are calibrated to better align with reference sequences in the model’s latent space. |
Yao Zhao; Mikhail Khalman; Rishabh Joshi; Shashi Narayan; Mohammad Saleh; Peter J Liu; |
981 | The Continuous CNN: from Task-Specific to Unified CNN Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Performant Convolutional Neural Network (CNN) architectures must be tailored to specific tasks in order to incorporate considerations such as input length, resolution, and dimensionality of the data. To overcome the need for such problem-specific CNN architectures, and the fragmentation they represent to the field, we introduce the \textit{Continuous Convolutional Neural Network} (CCNN): a single CNN architecture that can be used for tasks on data of arbitrary resolution, dimensionality and length without structural changes. |
David M Knigge; David W. Romero; Albert Gu; Efstratios Gavves; Erik J Bekkers; Jakub Mikolaj Tomczak; Mark Hoogendoorn; Jan-jakob Sonke; |
982 | SMART: Sentences As Basic Units for Text Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Widely used evaluation metrics for text generation either do not work well with longer texts or fail to evaluate all aspects of text quality. In this paper, we introduce a new metric called SMART to mitigate such limitations. |
Reinald Kim Amplayo; Peter J Liu; Yao Zhao; Shashi Narayan; |
983 | Planckian Jitter: Countering The Color-crippling Effects of Color Jitter on Self-supervised Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we analyze how the color jitter traditionally used in data augmentation negatively impacts the quality of the color features in learned feature representations. |
Simone Zini; Alex Gomez-Villa; Marco Buzzelli; Bartłomiej Twardowski; Andrew D. Bagdanov; Joost van de weijer; |
984 | GAMR: A Guided Attention Model for (visual) Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we present a novel transformer-based module for visual reasoning, the Guided Attention Model for (visual) Reasoning ($\textit{GAMR}$), which instantiates an active vision theory — positing that the brain solves complex visual reasoning problems dynamically — via sequences of attention shifts to select and route task-relevant visual information into memory. |
Mohit Vaishnav; Thomas Serre; |
985 | Disentangling Learning Representations with Density Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Gaussian Channel Autoencoder (GCAE), a method which achieves reliable disentanglement via scalable non-parametric density estimation of the latent space. |
Eric Yeats; Frank Y Liu; Hai Li; |
986 | Reward Design with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper explores how to simplify reward design by using a large language model (LLM) such as GPT-3 as a proxy reward function, where the user provides a textual prompt containing a few examples (few-shot) or a description (zero-shot) of desired behavior. Our approach leverages this proxy reward function in an RL framework. |
Minae Kwon; Sang Michael Xie; Kalesha Bullard; Dorsa Sadigh; |
987 | Bayes-MIL: A New Probabilistic Perspective on Attention-based Multiple Instance Learning for Whole Slide Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Bayes-MIL to address the problem from a probabilistic perspective. |
Yufei CUI; Ziquan Liu; Xiangyu Liu; Xue Liu; Cong Wang; Tei-Wei Kuo; Chun Jason Xue; Antoni B. Chan; |
988 | When to Make and Break Commitments? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the time pressure created by the continual cost of keeping a commitment, we aim to answer: When should a decision-maker break a commitment that is likely to fail—either to make an alternative commitment or to make no further commitments at all? |
Alihan Hüyük; Zhaozhi Qian; Mihaela van der Schaar; |
989 | CASR: Generating Complex Sequences with Autoregressive Self-Boost Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, for complex sequences, the heuristic rules used to break them down may hurt performance and increase exposure bias. To tackle these challenges, we propose a PLM-friendly autoregressive self-boost refinement framework, CASR. |
Hongwei Han; Mengyu Zhou; Shi Han; Xiu Li; Dongmei Zhang; |
990 | Coverage-centric Coreset Selection for High Pruning Rates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These methods perform well at low pruning rates, but at high pruning rates they have been found to suffer a catastrophic accuracy drop, performing worse than even random coreset selection. In this paper, we explore the reasons for this accuracy drop both theoretically and empirically. |
Haizhong Zheng; Rui Liu; Fan Lai; Atul Prakash; |
991 | Long-Tailed Partial Label Learning Via Dynamic Rebalancing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that even with the auxiliary of an oracle class prior, the state-of-the-art methods underperform due to an adverse fact that the constant rebalancing in LT is harsh to the label disambiguation in PLL. To overcome this challenge, we thus propose a dynamic rebalancing method, termed as RECORDS, without assuming any prior knowledge about the class distribution. |
Feng Hong; Jiangchao Yao; Zhihan Zhou; Yanfeng Wang; Ya Zhang; |
992 | Learning with Stochastic Orders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Learning high-dimensional distributions is often done with explicit likelihood modeling or implicit modeling via minimizing integral probability metrics (IPMs). In this paper, we expand this learning paradigm to stochastic orders, namely, the \emph{convex} or \emph{Choquet order} between probability measures. |
Carles Domingo-Enrich; Yair Schiff; Youssef Mroueh; |
993 | Certifiably Robust Policy Learning Against Adversarial Multi-Agent Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we consider an environment with $N$ agents, where the attacker may arbitrarily change the communication from any $C<\frac{N-1}{2}$ agents to a victim agent. |
Yanchao Sun; Ruijie Zheng; Parisa Hassanzadeh; Yongyuan Liang; Soheil Feizi; Sumitra Ganesh; Furong Huang; |
994 | Using Both Demonstrations and Language Instructions to Efficiently Learn Robotic Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In such cases, a combination of both a demonstration and an instruction more concisely and effectively conveys the task to the robot than either modality alone. To instantiate this problem setting, we train a single multi-task policy on a few hundred challenging robotic pick-and-place tasks and propose DeL-TaCo (Joint Demo-Language Task Conditioning), a method for conditioning a robotic policy on task embeddings comprised of two components: a visual demonstration and a language instruction. |
Albert Yu; Ray Mooney; |
995 | Strategic Classification on Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As we show through analysis and simulation, this can work either against the system—or for it. Based on this, we propose a differentiable framework for strategically-robust learning of graph-based classifiers. |
Itay Eilat; Ben Finkelshtein; Chaim Baskin; Nir Rosenfeld; |
996 | Over-Training with Mixup May Hurt Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we report a previously unobserved phenomenon in Mixup training: on a number of standard datasets, the performance of Mixup-trained models starts to decay after training for a large number of epochs, giving rise to a U-shaped generalization curve. |
Zixuan Liu; Ziqiao Wang; Hongyu Guo; Yongyi Mao; |
997 | Formal Mathematics Statement Curriculum Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We explore the use of expert iteration in the context of language modeling applied to formal mathematics. |
Stanislas Polu; Jesse Michael Han; Kunhao Zheng; Mantas Baksys; Igor Babuschkin; Ilya Sutskever; |
998 | D4FT: A Deep Learning Approach to Kohn-Sham Density Functional Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a deep learning approach to KS-DFT. |
Tianbo Li; Min Lin; Zheyuan Hu; Kunhao Zheng; Giovanni Vignale; Kenji Kawaguchi; A.H. Castro Neto; Kostya S. Novoselov; Shuicheng YAN; |
999 | The Trade-off Between Universality and Label Efficiency of Representations from Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: There are two key desiderata for the representation: label efficiency (the ability to learn an accurate classifier on top of the representation with a small amount of labeled data) and universality (usefulness across a wide range of downstream tasks). In this paper, we focus on one of the most popular instantiations of this paradigm: contrastive learning with linear probing, i.e., learning a linear predictor on the representation pre-trained by contrastive learning. |
Zhenmei Shi; Jiefeng Chen; Kunyang Li; Jayaram Raghuram; Xi Wu; Yingyu Liang; Somesh Jha; |
1000 | Learning Heterogeneous Interaction Strengths By Trajectory Prediction with Graph Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the relational attentive inference network (RAIN) to infer continuously weighted interaction graphs without any ground-truth interaction strengths. |
Seungwoong Ha; Hawoong Jeong; |
1001 | Conditional Antibody Design As 3D Equivariant Graph Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Multi-channel Equivariant Attention Network (MEAN), an end-to-end model that is able to co-design 1D sequences and 3D structures of CDRs. |
Xiangzhe Kong; Wenbing Huang; Yang Liu; |
1002 | Solving Continuous Control Via Q-learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most actor-critic methods come at the cost of added complexity: heuristics for stabilisation, higher compute requirements, and wider hyperparameter search spaces. We show that these issues can be largely alleviated via Q-learning by combining action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL). |
Tim Seyde; Peter Werner; Wilko Schwarting; Igor Gilitschenski; Martin Riedmiller; Daniela Rus; Markus Wulfmeier; |
1003 | Masked Visual-Textual Prediction for Document Image Representation Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Masked Visual-Textual Prediction for document image representation pretraining, called MaskDoc. |
Yuechen Yu; Yulin Li; Chengquan Zhang; Xiaoqiang Zhang; Zengyuan Guo; Xiameng Qin; Kun Yao; Junyu Han; Errui Ding; Jingdong Wang; |
1004 | Bit-Pruning: A Sparse Multiplication-Less Dot-Product Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we realize energy-efficient neural networks by exploiting a $\texttt{mult}$-less, sparse dot-product. |
Yusuke Sekikawa; Shingo Yashima; |
1005 | Robust Active Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a parameter-free approach with provable guarantees to query the soft-labels of points that are simultaneously informative and correctly labeled by the teacher. |
Cenk Baykal; Khoa Trinh; Fotis Iliopoulos; Gaurav Menghani; Erik Vee; |
1006 | Approximate Nearest Neighbor Search Through Modern Error-Correcting Codes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a better way of using LSH functions for ANNS. |
Noam Touitou; Nissim Halabi; |
1007 | Self-supervised Learning with Rotation-invariant Kernels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a regularization loss based on kernel mean embeddings with rotation-invariant kernels on the hypersphere (also known as dot-product kernels) for self-supervised learning of image representations. |
Léon Zheng; Gilles Puy; Elisa Riccietti; Patrick Perez; Rémi Gribonval; |
1008 | MocoSFL: Enabling Cross-client Collaborative Self-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing collaborative self-supervised learning (SSL) schemes are not suitable for cross-client applications because of their expensive computation and large local data requirements. To address these issues, we propose MocoSFL, a collaborative SSL framework based on Split Federated Learning (SFL) and Momentum Contrast (MoCo). |
Jingtao Li; Lingjuan Lyu; Daisuke Iso; Chaitali Chakrabarti; Michael Spranger; |
1009 | Architectural Optimization Over Subgroups of Equivariant Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the equivariance relaxation morphism, which preserves functionality while reparameterizing a group equivariant layer to operate with equivariance constraints on a subgroup, as well as the $[G]$-mixed equivariant layer, which mixes layers constrained to different groups to enable within-layer equivariance optimization. |
Kaitlin Maile; Dennis George Wilson; Patrick Forré; |
1010 | DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We instead frame molecular docking as a generative modeling problem and develop DiffDock, a diffusion generative model over the non-Euclidean manifold of ligand poses. |
Gabriele Corso; Hannes Stärk; Bowen Jing; Regina Barzilay; Tommi S. Jaakkola; |
1011 | ViewCo: Discovering Text-Supervised Segmentation Masks Via Multi-View Semantic Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing works focus on pixel grouping and cross-modal semantic alignment, while ignoring the correspondence among multiple augmented views of the same image. To overcome this limitation, we propose multi-View Consistent learning (ViewCo) for text-supervised semantic segmentation. |
Pengzhen Ren; Changlin Li; Hang Xu; Yi Zhu; Guangrun Wang; Jianzhuang Liu; Xiaojun Chang; Xiaodan Liang; |
1012 | Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal Attention, and Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new approach based, for the first time, on an interplay between thoughtfully designed continuous and discrete dynamics. |
Lingkai Kong; Yuqing Wang; Molei Tao; |
1013 | Re-calibrating Feature Attributions for Model Interpretation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a re-calibration technique to calibrate existing integral-based attribution methods with valid references for a consistent explanation. |
Peiyu Yang; NAVEED AKHTAR; Zeyi Wen; Mubarak Shah; Ajmal Saeed Mian; |
1014 | Meta-Learning Black-Box Optimization Via Black-Box Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we propose to discover effective update rules for evolution strategies via meta-learning. |
Robert Tjarko Lange; Tom Schaul; Yutian Chen; Tom Zahavy; Valentin Dalibard; Chris Lu; Satinder Singh; Sebastian Flennerhag; |
1015 | Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by an observation that the sentences in parallel documents are approximately in the same order, which is universal across languages, we propose to model this sequential sentence relation to facilitate cross-lingual representation learning. |
Shunyu Zhang; Yaobo Liang; MING GONG; Daxin Jiang; Nan Duan; |
1016 | Emergent World Representations: Exploring A Sequence Model Trained on A Synthetic Task Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Do these networks just memorize a collection of surface statistics, or do they rely on internal representations of the process that generates the sequences they see? We investigate this question by applying a variant of the GPT model to the task of predicting legal moves in a simple board game, Othello. |
Kenneth Li; Aspen K Hopkins; David Bau; Fernanda Viégas; Hanspeter Pfister; Martin Wattenberg; |
1017 | PASHA: Efficient HPO and NAS with Progressive Resource Allocation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an approach to tackle the challenge of tuning machine learning models trained on large datasets with limited computational resources. |
Ondrej Bohdal; Lukas Balles; Martin Wistuba; Beyza Ermis; Cedric Archambeau; Giovanni Zappella; |
1018 | Broken Neural Scaling Laws Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a smoothly broken power law functional form that accurately models the scaling behaviors (of artificial neural networks) (i.e. how the evaluation metric of interest varies as the amount of compute used for training, number of model parameters, or training dataset size varies) for each task from a very large and diverse set of upstream and downstream (i.e. zero-shot, prompted, and fine-tuned) tasks. |
Ethan Caballero; Kshitij Gupta; Irina Rish; David Krueger; |
1019 | Metadata Archaeology: Unearthing Data Subsets By Leveraging Training Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work provides a unified and efficient framework for Metadata Archaeology — uncovering and inferring metadata of examples in a dataset. |
Shoaib Ahmed Siddiqui; Nitarshan Rajkumar; Tegan Maharaj; David Krueger; Sara Hooker; |
1020 | Learnable Graph Convolutional Attention Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim at exploiting the strengths of both approaches to their full extent. |
Adrián Javaloy; Pablo Sanchez Martin; Amit Levi; Isabel Valera; |
1021 | Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose, where it unifies the contextual learning between human-level (global) and keypoint-level (local) information. |
Jie Yang; Ailing Zeng; Shilong Liu; Feng Li; Ruimao Zhang; Lei Zhang; |
1022 | Sample-Efficient Reinforcement Learning By Breaking The Replay Ratio Barrier Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that fully or partially resetting the parameters of deep reinforcement learning agents causes better replay ratio scaling capabilities to emerge. |
Pierluca D’Oro; Max Schwarzer; Evgenii Nikishin; Pierre-Luc Bacon; Marc G Bellemare; Aaron Courville; |
1023 | Simplified State Space Layers for Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We build on the design of the S4 layer and introduce a new state space layer, the S5 layer. |
Jimmy T.H. Smith; Andrew Warrington; Scott Linderman; |
1024 | Distributed Extra-gradient with Optimal Complexity and Communication Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a quantized generalized extra-gradient (Q-GenX), which is an unbiased and adaptive compression method tailored to solve VIs. |
Ali Ramezani-Kebrya; Kimon Antonakopoulos; Igor Krawczuk; Justin Deschenaux; Volkan Cevher; |
1025 | DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present DINO (DETR with Improved deNoising anchOr boxes), a strong end-to-end object detector. |
Hao Zhang; Feng Li; Shilong Liu; Lei Zhang; Hang Su; Jun Zhu; Lionel Ni; Harry Shum; |
1026 | Contrastive Meta-Learning for Partially Observable Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider the problem of learning a unified representation from partial observations, where useful features may be present in only some of the views. |
Adam Jelley; Amos Storkey; Antreas Antoniou; Sam Devlin; |
1027 | Optimistic Exploration with Learned Features Provably Solves Markov Decision Processes with Neural Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel algorithm that designs exploration incentives via learnable representations of the dynamics model by embedding the neural dynamics into a kernel space induced by the system noise. |
Sirui Zheng; Lingxiao Wang; Shuang Qiu; Zuyue Fu; Zhuoran Yang; Csaba Szepesvari; Zhaoran Wang; |
1028 | CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a subtokenization that reduces average length by 17–40% without a downstream performance drop, and show that a carefully chosen subtokenization may significantly improve quality by 0.5–2%, possibly with some length increase. |
Nadezhda Chirkova; Sergey Troshin; |
1029 | TILP: Differentiable Learning of Temporal Logical Rules on Knowledge Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose TILP, a differentiable framework for temporal logical rules learning. |
Siheng Xiong; Yuan Yang; Faramarz Fekri; James Clayton Kerce; |
1030 | Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a more fundamental solution, we propose new prior and posterior distributions invariant to scaling transformations by *decomposing* the scale and connectivity of parameters, thereby allowing the resulting generalization bound to describe the generalizability of a broad class of networks with the more practical class of transformations such as weight decay with batch normalization. |
SungYub Kim; Sihwan Park; Kyung-Su Kim; Eunho Yang; |
1031 | LogicDP: Creating Labels for Graph Data Via Inductive Logic Programming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose LogicDP, a data programming framework for graph data. |
Yuan Yang; Faramarz Fekri; James Clayton Kerce; Ali Payani; |
1032 | Hierarchical Protein Representations Via Complete 3D Graph Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to develop a novel hierarchical graph network, known as ProNet, to capture the relations. |
Limei Wang; Haoran Liu; Yi Liu; Jerry Kurtin; Shuiwang Ji; |
1033 | A Statistical Framework for Personalized Federated Learning and Estimation: Theory, Algorithms, and Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we begin with a generative framework that could potentially unify several different algorithms as well as suggest new algorithms. |
Kaan Ozkara; Antonious M. Girgis; Deepesh Data; Suhas Diggavi; |
1034 | WiNeRT: Towards Neural Ray Tracing for Wireless Channel Modelling and Differentiable Simulations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we work towards a neural surrogate to model wireless electro-magnetic propagation effects in indoor environments. |
Tribhuvanesh Orekondy; Pratik Kumar; Shreya Kadambi; Hao Ye; Joseph Soriaga; Arash Behboodi; |
1035 | Grounding Graph Network Simulators Using Physical Sensor Observations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we integrate sensory information to ground Graph Network Simulators on real world observations. |
Jonas Linkerhägner; Niklas Freymuth; Paul Maria Scheikl; Franziska Mathis-Ullrich; Gerhard Neumann; |
1036 | Implicit Regularization for Group Sparsity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the implicit regularization of gradient descent towards structured sparsity via a novel neural reparameterization, which we call a diagonally grouped linear neural network. |
Jiangyuan Li; Thanh V Nguyen; Chinmay Hegde; Raymond K. W. Wong; |
1037 | Noise Injection Node Regularization for Robust Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Noise Injection Node Regularization (NINR), a method of injecting structured noise into Deep Neural Networks (DNN) during the training stage, resulting in an emergent regularizing effect. |
Noam Itzhak Levi; Itay Mimouni Bloch; Marat Freytsis; Tomer Volansky; |
1038 | Compositional Semantic Parsing with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous research shows that appropriate prompting techniques enable large language models (LLMs) to solve artificial compositional generalization tasks such as SCAN. In this work, we identify additional challenges in more realistic semantic parsing tasks with larger vocabulary and refine these prompting techniques to address them. |
Andrew Drozdov; Nathanael Schärli; Ekin Akyürek; Nathan Scales; Xinying Song; Xinyun Chen; Olivier Bousquet; Denny Zhou; |
1039 | Guess The Instruction! Making Language Models Stronger Zero-Shot Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Flipped Learning, an alternative method of meta-training which trains the LM to generate the task instruction given the input instance and label. |
Seonghyeon Ye; Doyoung Kim; Joel Jang; Joongbo Shin; Minjoon Seo; |
1040 | Binding Language Models in Symbolic Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Binder, a training-free neural-symbolic framework that maps the task input to a program, which (1) allows binding a unified API of language model (LM) functionalities to a programming language (e.g., SQL, Python) to extend its grammar coverage and thus tackle more diverse questions, (2) adopts an LM as both the program parser and the underlying model called by the API during execution, and (3) requires only a few in-context exemplar annotations. |
Zhoujun Cheng; Tianbao Xie; Peng Shi; Chengzu Li; Rahul Nadkarni; Yushi Hu; Caiming Xiong; Dragomir Radev; Mari Ostendorf; Luke Zettlemoyer; Noah A. Smith; Tao Yu; |
1041 | Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose the Summarization Program (SP), an interpretable modular framework consisting of an (ordered) list of binary trees, each encoding the step-by-step generative process of an abstractive summary sentence from the source document. |
Swarnadeep Saha; Shiyue Zhang; Peter Hase; Mohit Bansal; |
1042 | Lower Bounds on The Depth of Integral ReLU Neural Networks Via Lattice Polytopes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove that the set of functions representable by ReLU neural networks with integer weights strictly increases with the network depth while allowing arbitrary width. |
Christian Alexander Haase; Christoph Hertrich; Georg Loho; |
1043 | Deep Declarative Dynamic Time Warping for End-to-End Learning of Alignment Paths Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We instead propose a DTW layer based around deep declarative networks. |
Ming Xu; Sourav Garg; Michael Milford; Stephen Gould; |
1044 | Diffusion Adversarial Representation Learning for Self-supervised Vessel Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although learning-based segmentation approaches have been extensively studied, supervised methods require a large amount of ground-truth labels, and confusing background structures make it hard for neural networks to segment vessels in an unsupervised manner. To address this, here we introduce a novel diffusion adversarial representation learning (DARL) model that leverages a denoising diffusion probabilistic model with adversarial learning, and apply it to vessel segmentation. |
Boah Kim; Yujin Oh; Jong Chul Ye; |
1045 | The Union of Manifolds Hypothesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This success would be impossible if there was no hidden low-dimensional structure in data of interest; this existence is posited by the manifold hypothesis, which states that the data lies on an unknown manifold of low intrinsic dimension. In this paper, we argue that this hypothesis does not properly capture the low-dimensional structure typically present in data. |
Bradley CA Brown; Anthony L. Caterini; Brendan Leigh Ross; Jesse C Cresswell; Gabriel Loaiza-Ganem; |
1046 | Disparate Impact in Differential Privacy from Gradient Misalignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we study the fine-grained causes of unfairness in DPSGD and identify gradient misalignment due to inequitable gradient clipping as the most significant source. |
Maria S. Esipova; Atiyeh Ashari Ghomi; Yaqiao Luo; Jesse C Cresswell; |
1047 | Multi-Rate VAE: Train Once, Get The Full Rate-Distortion Curve Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Multi-Rate VAE (MR-VAE), a computationally efficient framework for learning optimal parameters corresponding to various $\beta$ in a single training run. |
Juhan Bae; Michael R. Zhang; Michael Ruan; Eric Wang; So Hasegawa; Jimmy Ba; Roger Baker Grosse; |
1048 | Closing The Gap: Exact Maximum Likelihood Training of Generative Autoencoders Using Invertible Layers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide an exact likelihood alternative to the variational training of generative autoencoders. |
Gianluigi Silvestri; Daan Roos; Luca Ambrogioni; |
1049 | Benchmarking Offline Reinforcement Learning on Real-Robot Hardware Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The combination of offline reinforcement learning with large diverse datasets, however, has the potential to lead to a breakthrough in this challenging domain analogously to the rapid progress made in supervised learning in recent years. To coordinate the efforts of the research community toward tackling this problem, we propose a benchmark including: i) a large collection of data for offline learning from a dexterous manipulation platform on two tasks, obtained with capable RL agents trained in simulation; ii) the option to execute learned policies on a real-world robotic system and a simulation for efficient debugging. |
Nico Gürtler; Sebastian Blaes; Pavel Kolev; Felix Widmaier; Manuel Wuthrich; Stefan Bauer; Bernhard Schölkopf; Georg Martius; |
1050 | PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to identify parameters characterizing a physical system from a set of multi-view videos without any assumption on object geometry or topology. |
Xuan Li; Yi-Ling Qiao; Peter Yichen Chen; Krishna Murthy Jatavallabhula; Ming Lin; Chenfanfu Jiang; Chuang Gan; |
1051 | Sign and Basis Invariant Networks for Spectral Graph Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce SignNet and BasisNet—new neural architectures that are invariant to two key symmetries displayed by eigenvectors: (i) sign flips, since if v is an eigenvector then so is -v; and (ii) more general basis symmetries, which occur in higher dimensional eigenspaces with infinitely many choices of basis eigenvectors. |
Derek Lim; Joshua David Robinson; Lingxiao Zhao; Tess Smidt; Suvrit Sra; Haggai Maron; Stefanie Jegelka; |
1052 | KNN-Diffusion: Image Generation Via Large-Scale Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose using large-scale retrieval methods, in particular, efficient k-Nearest-Neighbors (kNN), which offers novel capabilities: (1) training a substantially small and efficient text-to-image diffusion model without any text, (2) generating out-of-distribution images by simply swapping the retrieval database at inference time, and (3) performing text-driven local semantic manipulations while preserving object identity. |
Shelly Sheynin; Oron Ashual; Adam Polyak; Uriel Singer; Oran Gafni; Eliya Nachmani; Yaniv Taigman; |
1053 | AudioGen: Textually Guided Audio Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we tackle the problem of generating audio samples conditioned on descriptive text captions. |
Felix Kreuk; Gabriel Synnaeve; Adam Polyak; Uriel Singer; Alexandre Défossez; Jade Copet; Devi Parikh; Yaniv Taigman; Yossi Adi; |
1054 | On The Importance and Applicability of Pre-Training for Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Across multiple visual recognition benchmarks, we found that pre-training can not only improve FL, but also close its accuracy gap to the counterpart centralized learning, especially in the challenging cases of non-IID clients’ data. |
Hong-You Chen; Cheng-Hao Tu; Ziwei Li; Han Wei Shen; Wei-Lun Chao; |
1055 | Fisher-Legendre (FishLeg) Optimization of Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new approach to estimate the natural gradient via Legendre-Fenchel duality, provide a convergence proof, and show competitive performance on a number of benchmarks. |
Jezabel R Garcia; Federica Freddi; Stathi Fotiadis; Maolin Li; Sattar Vakili; Alberto Bernacchia; Guillaume Hennequin; |
1056 | Multi-Objective Reinforcement Learning: Convexity, Stationarity and Pareto Optimality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we perform a rigorous analysis of policy induced value functions and use the insights to distinguish three views of Pareto optimality. |
Haoye Lu; Daniel Herman; Yaoliang Yu; |
1057 | MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a unified permutation-equivalent modeling approach, i.e., modeling each map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process. |
Bencheng Liao; Shaoyu Chen; Xinggang Wang; Tianheng Cheng; Qian Zhang; Wenyu Liu; Chang Huang; |
1058 | Discrete Predictor-Corrector Diffusion Models for Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Discrete Predictor-Corrector diffusion models (DPC), extending predictor-corrector samplers in Gaussian diffusion models to the discrete case. |
Jose Lezama; Tim Salimans; Lu Jiang; Huiwen Chang; Jonathan Ho; Irfan Essa; |
1059 | On The Certification of Classifiers for Outperforming Human Annotators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first raise the challenge of evaluating the performance of both humans and models with respect to an oracle which is *unobserved*. |
Qiongkai Xu; Christian Walder; Chenchen Xu; |
1060 | Towards Stable Test-time Adaptation in Dynamic Wild World Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the reasons for this instability and find that the batch norm layer is a crucial factor hindering TTA stability. |
Shuaicheng Niu; Jiaxiang Wu; Yifan Zhang; Zhiquan Wen; Yaofo Chen; Peilin Zhao; Mingkui Tan; |
1061 | Does Decentralized Learning with Non-IID Unlabeled Data Benefit from Self Supervision? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we carefully study decentralized learning with unlabeled data through the lens of self-supervised learning (SSL), specifically contrastive visual representation learning. |
Lirui Wang; Kaiqing Zhang; Yunzhu Li; Yonglong Tian; Russ Tedrake; |
1062 | PEER: A Collaborative Language Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a consequence, they lack several abilities crucial for collaborative writing: They are unable to update existing texts, difficult to control and incapable of verbally planning or explaining their actions. To address these shortcomings, we introduce PEER, a collaborative language model that is trained to imitate the entire writing process itself. |
Timo Schick; Jane A. Yu; Zhengbao Jiang; Fabio Petroni; Patrick Lewis; Gautier Izacard; Qingfei You; Christoforos Nalmpantis; Edouard Grave; Sebastian Riedel; |
1063 | Embedding Fourier for Ultra-High-Definition Low-Light Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike existing methods that address the problem in the spatial domain, we propose a new solution, UHDFour, that embeds Fourier transform into a cascaded network. |
Chongyi Li; Chun-Le Guo; Man Zhou; Zhexin Liang; Shangchen Zhou; Ruicheng Feng; Chen Change Loy; |
1064 | Implicit Regularization in Heavy-ball Momentum Accelerated Stochastic Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explore the implicit regularization in (SGD+M) and (GD+M) through a series of experiments validating our theory. |
Avrajit Ghosh; He Lyu; Xitong Zhang; Rongrong Wang; |
1065 | Selective Annotation Makes Language Models Better Few-Shot Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on this framework, we propose an unsupervised, graph-based selective annotation method, vote-k, to select diverse, representative examples to annotate. |
Hongjin SU; Jungo Kasai; Chen Henry Wu; Weijia Shi; Tianlu Wang; Jiayi Xin; Rui Zhang; Mari Ostendorf; Luke Zettlemoyer; Noah A. Smith; Tao Yu; |
1066 | Imitating Human Behaviour with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce several innovations to make diffusion models suitable for sequential environments; designing suitable architectures, investigating the role of guidance, and developing reliable sampling strategies. |
Tim Pearce; Tabish Rashid; Anssi Kanervisto; Dave Bignell; Mingfei Sun; Raluca Georgescu; Sergio Valcarcel Macua; Shan Zheng Tan; Ida Momennejad; Katja Hofmann; Sam Devlin; |
1067 | Free Lunch for Domain Adversarial Training: Environment Label Smoothing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite its success, we observe training instability in DAT, mostly due to an over-confident domain discriminator and environment label noise. To address this issue, we propose Environment Label Smoothing (ELS), which encourages the discriminator to output soft probabilities, thereby reducing its confidence and alleviating the impact of noisy environment labels. |
YiFan Zhang; Xue Wang; Jian Liang; Zhang Zhang; Liang Wang; Rong Jin; Tieniu Tan; |
1068 | Improved Sample Complexity for Reward-free Reinforcement Learning Under Low-rank MDPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we focus on reward-free RL under low-rank MDP models, which capture the representation learning in RL. |
Yuan Cheng; Ruiquan Huang; Yingbin Liang; Jing Yang; |
1069 | CircNet: Meshing 3D Point Clouds with Circumcenter Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Taking advantage of learning-based techniques in triangulation, existing methods enumerate the complete combinations of candidate triangles, which is both complex and inefficient. In this paper, we leverage the duality between a triangle and its circumcenter, and introduce a deep neural network that detects the circumcenters to achieve point cloud triangulation. |
Huan Lei; Ruitao Leng; Liang Zheng; Hongdong Li; |
1070 | Towards The Generalization of Contrastive Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we define a kind of $(\sigma,\delta)$-measure to mathematically quantify the data augmentation, and then provide an upper bound of the downstream classification error rate based on the measure. |
Weiran Huang; Mingyang Yi; Xuyang Zhao; Zihao Jiang; |
1071 | Towards The Out-of-Distribution Generalization of Contrastive Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, by focusing on the data augmentation used in SSL, we establish a theoretical framework for the OOD performance of contrastive-based self-supervised learning. |
Xuyang Zhao; Tianqi Du; Yisen Wang; Jun Yao; Weiran Huang; |
1072 | Representational Dissimilarity Metric Spaces for Stochastic Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these measures of _deterministic_ representational similarity ignore the scale and geometric structure of noise, both of which play important roles in neural computation. To rectify this, we generalize previously proposed shape metrics (Williams et al. 2021) to quantify differences in _stochastic_ representations. |
Lyndon Duong; Josue Nassar; Jingyang Zhou; Jules Berman; Jeroen Olieslagers; Alex H Williams; |
1073 | On The Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster. |
Yifan Xu; Nicklas Hansen; Zirui Wang; Yung-Chieh Chan; Hao Su; Zhuowen Tu; |
1074 | Multimodal Federated Learning Via Contrastive Representation Ensemble Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose *Contrastive Representation Ensemble and Aggregation for Multimodal FL (CreamFL)*, a multimodal federated learning framework that enables training larger server models from clients with heterogeneous model architectures and data modalities, while only communicating knowledge on a public dataset. |
Qiying Yu; Yimu Wang; Ke Xu; Yang Liu; Jingjing Liu; |
1075 | Improved Learning-augmented Algorithms for K-means and K-medians Clustering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider the problem of clustering in the learning-augmented setting. |
Thy Dinh Nguyen; Anamay Chaturvedi; Huy Nguyen; |
1076 | Open-Vocabulary Object Detection Upon Frozen Vision and Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present F-VLM, a simple open-vocabulary object detection method built upon Frozen Vision and Language Models. |
Weicheng Kuo; Yin Cui; Xiuye Gu; AJ Piergiovanni; Anelia Angelova; |
1077 | E-CRF: Embedded Conditional Random Field for Boundary-caused Class Weights Confusion in Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We call this issue Boundary-caused Class Weights Confusion (BCWC). We try to focus on this problem and propose a novel method named Embedded Conditional Random Field (E-CRF) to alleviate it. |
Jie Zhu; Huabin Huang; Banghuai Li; Leye Wang; |
1078 | Bag of Tricks for Unsupervised Text-to-Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a bag of tricks to enable effective unsupervised TTS. |
Yi Ren; Chen Zhang; Shuicheng YAN; |
1079 | MCAL: Minimum Cost Human-Machine Active Labeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider the problem of hybrid human-machine labeling, which trains a classifier to accurately auto-label part of the data set. |
Hang Qiu; Krishna Chintalapudi; Ramesh Govindan; |
1080 | Relaxed Combinatorial Optimization Networks with Self-Supervision: Theoretical and Empirical Notes on The Cardinality-Constrained Case Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to develop a new paradigm to solve the CO problem by incorporating the constraints into the network architecture and computational operators, which is a more natural learning pipeline and decouples the constraint violation penalty from the raw objective optimization. |
Runzhong Wang; Li Shen; Yiting Chen; Xiaokang Yang; Dacheng Tao; Junchi Yan; |
1081 | ROCO: A General Framework for Evaluating Robustness of Combinatorial Optimization Solvers on Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the robustness of a combinatorial solver as a blackbox, regardless of whether it is classic or learning-based, though the latter can often be more interesting to the ML community. |
Han Lu; Zenan Li; Runzhong Wang; Qibing Ren; Xijun Li; Mingxuan Yuan; Jia Zeng; Xiaokang Yang; Junchi Yan; |
1082 | DASHA: Distributed Nonconvex Optimization with Communication Compression and Optimal Oracle Complexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems. |
Alexander Tyurin; Peter Richtárik; |
1083 | Revocable Deep Reinforcement Learning with Affinity Regularization for Outlier-Robust Graph Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on learning the back-end solver under the most general form of GM: Lawler’s QAP, whose input is the affinity matrix. |
Chang Liu; Zetian Jiang; Runzhong Wang; Lingxiao Huang; Pinyan Lu; Junchi Yan; |
1084 | Liquid Structural State-Space Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We use the recently proposed parametrization and memorization techniques for training state-space models in a linearized version of liquid neural networks, and achieve SOTA on sequence modeling tasks. |
Ramin Hasani; Mathias Lechner; Tsun-Hsuan Wang; Makram Chahine; Alexander Amini; Daniela Rus; |
1085 | 3D UX-Net: A Large Kernel Volumetric ConvNet Modernizing Hierarchical Transformer for Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a lightweight volumetric ConvNet, termed 3D UX-Net, which adapts the hierarchical transformer using ConvNet modules for robust volumetric segmentation. |
Ho Hin Lee; Shunxing Bao; Yuankai Huo; Bennett A. Landman; |
1086 | SoftZoo: A Soft Robot Co-design Benchmark For Locomotion In Diverse Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce SoftZoo, a soft robot co-design platform for locomotion in diverse environments. |
Tsun-Hsuan Wang; Pingchuan Ma; Andrew Everett Spielberg; Zhou Xian; Hao Zhang; Joshua B. Tenenbaum; Daniela Rus; Chuang Gan; |
1087 | Words Are All You Need? Language As An Approximation for Representational Similarity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conducted an evaluation of 611 pre-trained models across three domains — images, audio, video — and found that there is a large gap in performance between human similarity judgments and pre-trained DNNs. To address this gap, we propose a new class of similarity approximation methods based on language. |
Raja Marjieh; Pol Van Rijn; Ilia Sucholutsky; Theodore Sumers; Harin Lee; Thomas L. Griffiths; Nori Jacoby; |
1088 | Hungry Hungry Hippos: Towards Language Modeling with State Space Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we make progress on understanding the expressivity gap between SSMs and attention in language modeling, and on reducing the hardware barrier between SSMs and attention. |
Tri Dao; Daniel Y Fu; Khaled Kamal Saab; Armin W Thomas; Atri Rudra; Christopher Re; |
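As background for the several state-space-model entries in this digest (Liquid SSMs, H3, HiPPO), the shared core of these models is a discrete linear recurrence. The following is a minimal generic sketch, not any paper's optimized implementation; the function name `ssm_scan` and the scalar-input convention are our own for illustration:

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Sequentially apply x_{k+1} = A x_k + B u_k, y_k = C x_{k+1}
    for a scalar input sequence u (state starts at zero)."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B * u_k    # linear state update
        ys.append(C @ x)       # linear readout
    return np.array(ys)
```

The papers above differ mainly in how A, B, C are parameterized and trained (e.g., HiPPO initializations) and in how this scan is computed efficiently (e.g., as a convolution), not in the recurrence itself.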
1089 | Anti-Symmetric DGN: A Stable Architecture for Deep Graph Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Anti-Symmetric Deep Graph Networks (A-DGNs), a framework for stable and non-dissipative DGN design, conceived through the lens of ordinary differential equations. |
Alessio Gravina; Davide Bacciu; Claudio Gallicchio; |
1090 | Information-Theoretic Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new mathematical foundation for diffusion models inspired by classic results in information theory that connect mutual information with minimum mean square error (MMSE) estimators, the so-called I-MMSE relations. |
Xianghao Kong; Rob Brekelmans; Greg Ver Steeg; |
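For readers unfamiliar with the I-MMSE relations the highlight refers to, the classic identity of Guo, Shamai, and Verdú can be written (in a standard form, not necessarily this paper's notation) as:

```latex
\frac{d}{d\gamma}\, I\big(X;\,\sqrt{\gamma}\,X + N\big) \;=\; \frac{1}{2}\,\mathrm{mmse}(\gamma),
\qquad
\mathrm{mmse}(\gamma) \;=\; \mathbb{E}\,\big\|X - \mathbb{E}\big[X \mid \sqrt{\gamma}\,X + N\big]\big\|^{2},
```

where $N$ is standard Gaussian noise and $\gamma$ is the signal-to-noise ratio of the Gaussian channel; the mutual information's derivative in SNR equals half the minimum mean square error of estimating $X$ from the noisy observation.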
1091 | FunkNN: Neural Interpolation for Functional Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The question is then how to extrapolate the spectrum in a data-driven way while meeting the above design criteria. Our answer is FunkNN—a novel convolutional network which learns how to reconstruct continuous images at arbitrary coordinates and can be applied to any image dataset. |
AmirEhsan Khorashadizadeh; Anadi Chaman; Valentin Debarnot; Ivan Dokmanić; |
1092 | How to Train Your HIPPO: State Space Models with Generalized Orthogonal Basis Projections Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Consequently, the theoretical mechanism by which S4 models long-range dependencies actually remains unexplained. We derive a more general and intuitive formulation of the HiPPO framework, which provides a simple mathematical interpretation of S4 as a decomposition onto exponentially-warped Legendre polynomials, explaining its ability to capture long dependencies. |
Albert Gu; Isys Johnson; Aman Timalsina; Atri Rudra; Christopher Re; |
1093 | GNNSafe: Out-of-Distribution Detection for Graph Neural Networks with Energy Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify a provably effective OOD discriminator based on an energy function directly extracted from a graph neural network trained with standard supervised classification loss. |
Qitian Wu; Yiting Chen; Chenxiao Yang; Junchi Yan; |
1094 | DIFFormer: Scalable (Graph) Transformers Induced By Energy Constrained Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states that progressively incorporate other instances’ information by their interactions. |
Qitian Wu; Chenxiao Yang; Wentao Zhao; Yixuan He; David Wipf; Junchi Yan; |
1095 | Graph Neural Networks Are Inherently Good Generalizers: Insights By Bridging GNNs and MLPs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper pinpoints the major source of GNNs’ performance gain to their intrinsic generalization capabilities, by introducing an intermediate model class dubbed as P(ropagational)MLP, which is identical to a standard MLP in training and then adopts the GNN’s architecture in testing. |
Chenxiao Yang; Qitian Wu; Jiahua Wang; Junchi Yan; |
1096 | $\Lambda$-DARTS: Mitigating Performance Collapse By Harmonizing Operation Selection Among Cells Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the weight-sharing framework used for cell-search in DARTS and the convergence of architecture parameters have not been analyzed yet. In this paper, we provide a thorough and novel theoretical and empirical analysis on DARTS and its point of convergence. |
Sajad Movahedi; Melika Adabinejad; Ayyoob Imani; Arezou Keshavarz; Mostafa Dehghani; Azadeh Shakery; Babak N Araabi; |
1097 | Promptagator: Few-shot Dense Retrieval From 8 Examples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we suggest to work on Few-shot Dense Retrieval, a setting where each task comes with a short description and a few examples. |
Zhuyun Dai; Vincent Y Zhao; Ji Ma; Yi Luan; Jianmo Ni; Jing Lu; Anton Bakalov; Kelvin Guu; Keith Hall; Ming-Wei Chang; |
1098 | VoGE: A Differentiable Volume Renderer Using Gaussian Ellipsoids for Analysis-by-Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose VoGE, which uses ray tracing to capture nearest components with their volume density distributions on the rays and aggregates via integral of the volume densities based on Gaussian ellipsoids, which brings more efficient and stable gradients. |
Angtian Wang; Peng Wang; Jian Sun; Adam Kortylewski; Alan Yuille; |
1099 | CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, adapting image-text pre-trained models to video-text pre-training (i.e., post-pretraining) has not demonstrated a significant advantage yet. In this paper, we tackle this challenge by raising and addressing two questions: 1) what are the factors hindering post-pretraining CLIP from improving performance on video-text tasks, and 2) how to mitigate the impact of these factors. |
Hongwei Xue; Yuchong Sun; Bei Liu; Jianlong Fu; Ruihua Song; Houqiang Li; Jiebo Luo; |
1100 | Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, this paper addresses the value expansion class of model-based approaches. |
Daniel Palenicek; Michael Lutter; Joao Carvalho; Jan Peters; |
1101 | Boosting The Cycle Counting Power of Graph Neural Networks with I$^2$-GNNs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we prove that Subgraph MPNNs fail to count more-than-4-cycles at node level, implying that node representations cannot correctly encode the surrounding substructures like ring systems with more than four atoms. To overcome this limitation, we propose I$^2$-GNNs to extend Subgraph MPNNs by assigning different identifiers for the root node and its neighbors in each subgraph. |
Yinan Huang; Xingang Peng; Jianzhu Ma; Muhan Zhang; |
1102 | Graph-based Deterministic Policy Gradient for Repetitive Combinatorial Optimization Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an actor-critic framework for graph-based machine learning pipelines with non-differentiable blocks, and apply it to repetitive combinatorial optimization problems (COPs) under hard constraints. |
Zhongyuan Zhao; Ananthram Swami; Santiago Segarra; |
1103 | Estimating Individual Treatment Effects Under Unobserved Confounding Using Binary Instruments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel, multiply robust machine learning framework, called MRIV, for estimating ITEs using binary IVs, thus yielding an unbiased ITE estimator. |
Dennis Frauen; Stefan Feuerriegel; |
1104 | Safe Reinforcement Learning From Pixels Using A Stochastic Latent Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address the problem of safe reinforcement learning from pixel observations. |
Yannick Hogewind; Thiago D. Simão; Tal Kachman; Nils Jansen; |
1105 | This Looks Like It Rather Than That: ProtoKNN For Similarity-Based Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to this difficulty, the effectiveness of similarity-based classifiers (e.g., k-nearest neighbor (KNN)) on the ‘this looks like that’ framework has not been sufficiently examined. To alleviate this problem, we propose ProtoKNN, an extension of ProtoPNet that adopts KNN classifiers. |
Yuki Ukai; Tsubasa Hirakawa; Takayoshi Yamashita; Hironobu Fujiyoshi; |
1106 | The Role of ImageNet Classes in Fréchet Inception Distance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While remarkably successful, the metric is known to sometimes disagree with human judgement. We investigate a root cause of these discrepancies, and visualize what FID looks at in generated images. |
Tuomas Kynkäänniemi; Tero Karras; Miika Aittala; Timo Aila; Jaakko Lehtinen; |
1107 | Bidirectional Language Models Are Also Few-shot Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models. |
Ajay Patel; Bryan Li; Mohammad Sadegh Rasooli; Noah Constant; Colin Raffel; Chris Callison-Burch; |
1108 | MICN: Multi-scale Local and Global Context Modeling for Long-term Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To solve the above problems, we propose to combine local features and global correlations to capture the overall view of time series (e.g., fluctuations, trends). |
Huiqiang Wang; Jian Peng; Feihu Huang; Jince Wang; Junhui Chen; Yifei Xiao; |
1109 | Revisit Finetuning Strategy for Few-Shot Learning to Strengthen The Equivariance of Emdeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To extract the undistorted features, we designed Linear-Probing-Finetuning with Firth-Bias (LP-FT-FB) to yield an accurate bias on the limited samples for better finetuning the pre-trained feature extractor, imposing equivariance on the whole model. |
Heng Wang; Tan Yue; Xiang Ye; Zihang He; Bohan Li; Yong Li; |
1110 | Synthetic Data Generation of Many-to-Many Datasets Via Random Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We synthesise datasets with many-to-many relationships by first generating the relationships via random graph generation and then generating the data attributes. |
Kai Xu; Georgi Ganev; Emile Joubert; Rees Davison; Olivier Van Acker; Luke Robinson; |
1111 | Memorization Capacity of Neural Networks with Conditional Computation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the fundamental limits of neural conditional computation from the perspective of memorization capacity. |
Erdem Koyuncu; |
1112 | Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: From this analysis, we present a simple SSL method, The Reconstruction-Consistent Masked Auto-Encoder (RC-MAE) by adding an EMA teacher to MAE. |
Youngwan Lee; Jeffrey Ryan Willette; Jonghee Kim; Juho Lee; Sung Ju Hwang; |
1113 | The Best of Both Worlds: Accurate Global and Personalized Models Through Federated Learning with Data-Free Hyper-Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose FedHKD (Federated Hyper-Knowledge Distillation), a novel FL algorithm in which clients rely on knowledge distillation (KD) to train local models. |
Huancheng Chen; Chaining Wang; Haris Vikalo; |
1114 | Learning to Solve Constraint Satisfaction Problems with Recurrent Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that the Transformer model extended with recurrence is a viable approach to learning to solve CSPs in an end-to-end manner, having clear advantages over the state-of-the-art methods such as Graph Neural Networks, SATNet, and some neuro-symbolic models. |
Zhun Yang; Adam Ishay; Joohyung Lee; |
1115 | Learning Simultaneous Navigation and Construction in Grid Worlds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to study a new learning task, mobile construction, to enable an agent to build designed structures in 1/2/3D grid worlds while navigating in the same evolving environments. |
Wenyu Han; Haoran Wu; Eisuke Hirota; Alexander Gao; Lerrel Pinto; Ludovic Righetti; Chen Feng; |
1116 | Leveraging Future Relationship Reasoning for Vehicle Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new method to formulate a stochastic future relationship among agents using lane structure. |
Daehee Park; Hobin Ryu; Yunseo Yang; Jegyeong Cho; Jiwon Kim; Kuk-Jin Yoon; |
1117 | Neural Networks and The Chomsky Hierarchy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we conduct an extensive empirical study (10250 models, 15 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice. |
Gregoire Deletang; Anian Ruoss; Jordi Grau-Moya; Tim Genewein; Li Kevin Wenliang; Elliot Catt; Chris Cundy; Marcus Hutter; Shane Legg; Joel Veness; Pedro A Ortega; |
1118 | Federated Nearest Neighbor Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel federated nearest neighbor (FedNN) machine translation framework that, instead of multi-round model-based interactions, leverages one-round memorization-based interaction to share knowledge across different clients to build low-overhead privacy-preserving systems. |
Yichao Du; Zhirui Zhang; Bingzhe Wu; Lemao Liu; Tong Xu; Enhong Chen; |
1119 | Unsupervised Manifold Alignment with Joint Multidimensional Scaling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Joint Multidimensional Scaling, a novel approach for unsupervised manifold alignment, which maps datasets from two different domains, without any known correspondences between data instances across the datasets, to a common low-dimensional Euclidean space. |
Dexiong Chen; Bowen Fan; Carlos Oliver; Karsten Borgwardt; |
1120 | Label Propagation with Weak Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel analysis of the classical label propagation algorithm (LPA) (Zhu & Ghahramani, 2002) that moreover takes advantage of useful prior information, specifically probabilistic hypothesized labels on the unlabeled data. |
Rattana Pukdee; Dylan Sam; Pradeep Kumar Ravikumar; Nina Balcan; |
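The classical label propagation algorithm of Zhu & Ghahramani (2002) that this entry analyzes can be sketched in a few lines. This is a minimal generic version for intuition only; it does not include the paper's weak-supervision extension, and the function name and argument conventions are our own:

```python
import numpy as np

def label_propagation(W, y_labeled, labeled_idx, n_classes, n_iters=100):
    """Iterative label propagation in the spirit of Zhu & Ghahramani (2002).

    W: (n, n) symmetric affinity matrix; y_labeled: class labels of the
    labeled nodes; labeled_idx: their indices in the graph.
    """
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)   # row-normalized transition matrix
    F = np.zeros((n, n_classes))
    F[labeled_idx, y_labeled] = 1.0        # one-hot scores for labeled nodes
    for _ in range(n_iters):
        F = P @ F                          # propagate scores along edges
        F[labeled_idx] = 0.0               # re-clamp labeled nodes each step
        F[labeled_idx, y_labeled] = 1.0
    return F.argmax(axis=1)
```

On a path graph with the two endpoints labeled with different classes, the unlabeled interior nodes inherit the label of the nearer endpoint.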
1121 | Out-of-distribution Detection with Implicit Outlier Transformation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, the performance of OE can be weakened when facing unseen OOD data. To address this issue, we propose a novel OE-based approach that makes the model perform well even in unseen OOD situations. |
Qizhou Wang; Junjie Ye; Feng Liu; Quanyu Dai; Marcus Kalander; Tongliang Liu; Jianye HAO; Bo Han; |
1122 | Pushing The Limits of Fewshot Anomaly Detection in Industry Vision: Graphcore Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To push the limits, we reveal that rotation-invariant feature property has a significant impact in industrial-based FSAD. |
Guoyang Xie; Jinbao Wang; Jiaqi Liu; Yaochu Jin; Feng Zheng; |
1123 | Reproducible Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the notion of reproducible policies in the context of stochastic bandits, one of the canonical problems in interactive learning. |
Alkis Kalavasis; Grigoris Velegkas; Hossein Esfandiari; Vahab Mirrokni; Andreas Krause; Amin Karbasi; |
1124 | Which Layer Is Learning Faster? A Systematic Exploration of Layer-wise Convergence Rate for Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work demonstrates that the shallower layers of DNNs tend to converge faster than the deeper layers. |
Yixiong Chen; Alan Yuille; Zongwei Zhou; |
1125 | Improving Differentiable Neural Architecture Search By Encouraging Transferability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods for solving this problem have a variety of limitations, such as being unable to prevent architecture degeneration or being excessively restrictive in setting the number of skip connections. To address these limitations, we propose a new approach for improving the generalizability and stability of differentiable NAS, by developing a transferability-encouraging tri-level optimization framework which improves the architecture of a main model by encouraging good transferability to an auxiliary model. |
Parth Sheth; Pengtao Xie; |
1126 | Enhancing The Inductive Biases of Graph Neural ODE for Modeling Dynamical Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a graph-based neural ODE, Gnode, to learn the time evolution of dynamical systems. |
Suresh Bishnoi; Ravinder Bhattoo; Jayadeva Jayadeva; Sayan Ranu; N M Anoop Krishnan; |
1127 | A Differential Geometric View and Explainability of GNN on Evolving Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a smooth parameterization of the GNN predicted distributions using axiomatic attribution, where the distributions are on a low-dimensional manifold within a high-dimensional embedding space. |
Yazheng Liu; Xi Zhang; Sihong Xie; |
1128 | FastFill: Efficient Compatible Model Update Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce FastFill: a compatible model update process using feature alignment and policy based partial backfilling to promptly elevate retrieval performance. |
Florian Jaeckle; Fartash Faghri; Ali Farhadi; Oncel Tuzel; Hadi Pouransari; |
1129 | Versatile Neural Processes for Learning Implicit Neural Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an efficient NP framework dubbed Versatile Neural Processes (VNP), which largely increases the capability of approximating functions. |
Zongyu Guo; Cuiling Lan; Zhizheng Zhang; Yan Lu; Zhibo Chen; |
1130 | BAYES RISK CTC: CONTROLLABLE CTC ALIGNMENT IN SEQUENCE-TO-SEQUENCE TASKS Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, the motivation of this work is to make the CTC alignment prediction controllable and thus equip CTC with extra functionalities. |
Jinchuan Tian; Brian Yan; Jianwei Yu; CHAO WENG; Dong Yu; Shinji Watanabe; |
1131 | Dilated Convolution with Learnable Spacings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a new method to increase the RF size without increasing the number of parameters. |
Ismail Khalfaoui Hassani; Thomas Pellegrini; Timothée Masquelier; |
1132 | Iterative Circuit Repair Against Formal Specifications Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a deep learning approach for repairing sequential circuits against formal specifications given in linear-time temporal logic (LTL). |
Matthias Cosler; Frederik Schmitt; Christopher Hahn; Bernd Finkbeiner; |
1133 | On Pre-training Language Model for Antibody Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, to investigate the problem, we aim to answer the following key questions: (1) How do pre-trained language models perform in antibody tasks with different specificity? (2) How many benefits will the model gain if we introduce the specific biological mechanism to the pretraining process? (3) Do the learned antibody pre-trained representations make sense in real-world antibody problems, like drug discovery and immune process understanding? Previously, the lack of an available benchmark largely hindered the study of these questions. |
Danqing Wang; Fei YE; Hao Zhou; |
1134 | TempCLR: Temporal Alignment Representation with Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a contrastive learning framework TempCLR to compare the full video and the paragraph explicitly. |
Yuncong Yang; Jiawei Ma; Shiyuan Huang; Long Chen; Xudong Lin; Guangxing Han; Shih-Fu Chang; |
1135 | Behind The Scenes of Gradient Descent: A Trajectory Analysis Via Basis Function Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work analyzes the solution trajectory of gradient-based algorithms via a novel basis function decomposition. |
Jianhao Ma; Lingjun Guo; Salar Fattahi; |
1136 | Gradient Gating for Deep Multi-Rate Learning on Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Gradient Gating (G$^2$), a novel framework for improving the performance of Graph Neural Networks (GNNs). |
T. Konstantin Rusch; Benjamin Paul Chamberlain; Michael W. Mahoney; Michael M. Bronstein; Siddhartha Mishra; |
1137 | Light Sampling Field and BRDF Representation for Physically-based Neural Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, this paper proposes a novel lighting representation that models direct and indirect light locally through light sampling strategy in a learned light sampling field. |
Jing Yang; Hanyuan Xiao; Wenbin Teng; Yunxuan Cai; Yajie Zhao; |
1138 | MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we find that leveraging just a handful of demonstrations can dramatically improve the sample-efficiency of model-based RL. |
Nicklas Hansen; Yixin Lin; Hao Su; Xiaolong Wang; Vikash Kumar; Aravind Rajeswaran; |
1139 | Rethinking Graph Lottery Tickets: Graph Sparsity Matters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we find that the performance of a sparsified GNN degrades significantly when the graph sparsity goes beyond a certain extent. Therefore, we propose two techniques to improve GNN performance when the graph sparsity is high. |
Bo Hui; Da Yan; Xiaolong Ma; Wei-Shinn Ku; |
1140 | Hyperbolic Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, encoding features that capture the hierarchical relationships between states into the model’s latent representations is often conducive to recovering effective policies. In this work, we study a new class of deep RL algorithms that promote encoding such relationships by using hyperbolic space to model latent representations. |
Edoardo Cetin; Benjamin Paul Chamberlain; Michael M. Bronstein; Jonathan J Hunt; |
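The hyperbolic latent representations mentioned in the highlight above are commonly realized in the Poincaré ball model, whose geodesic distance has a simple closed form. The sketch below is a generic textbook formula for intuition, not this paper's particular parameterization; the function name and the `eps` guard are our own:

```python
import numpy as np

def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincaré ball.

    Distances blow up as points approach the boundary, which is what lets
    the space embed tree-like (hierarchical) structure with low distortion.
    """
    sq = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return np.arccosh(1.0 + 2.0 * sq / max(denom, eps))
```

For a point at radius $r$ from the origin this reduces to $2\,\mathrm{artanh}(r)$, so equal steps in Euclidean radius cost exponentially more hyperbolic distance near the boundary.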
1141 | Diffusion Policies As An Expressive Policy Class for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose representing the policy as a diffusion model, a recent class of highly-expressive deep generative models. |
Zhendong Wang; Jonathan J Hunt; Mingyuan Zhou; |
1142 | Trainable Weight Averaging: Efficient Training By Optimizing Historical Solutions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we realize that the averaging coefficients could be determined in a trainable manner and propose Trainable Weight Averaging (TWA), a novel optimization method in the reduced subspace spanned by historical solutions. |
Tao Li; Zhehao Huang; Qinghua Tao; Yingwen Wu; Xiaolin Huang; |
1143 | Augmented Lagrangian Is Enough for Optimal Offline RL with General Function Approximation and Partial Coverage Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we leverage the marginalized importance sampling (MIS) formulation of RL and present the first set of offline RL algorithms that are statistically optimal and practical under general function approximation and single-policy concentrability, bypassing the need for uncertainty quantification. |
Paria Rashidinejad; Hanlin Zhu; Kunhe Yang; Stuart Russell; Jiantao Jiao; |
1144 | Learning to CROSS Exchange to Solve Min-max Vehicle Routing Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by CE, we propose Neuro CE (NCE), a fundamental operator of a learned meta-heuristic, to solve various min-max VRPs while overcoming the limitations of CE, i.e., the expensive $\mathcal{O}(n^4)$ search cost. |
Minjun Kim; Junyoung Park; Jinkyoo Park; |
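The CROSS exchange (CE) operator that NCE learns to approximate swaps a customer segment of one route with a segment of another. The brute-force version below illustrates where the $\mathcal{O}(n^4)$ cost comes from; it is a generic sketch of classic CE under a min-max objective, not the paper's NCE method, and all names are our own:

```python
def route_len(route, dist):
    """Total length of a route under a distance matrix."""
    return sum(dist[a][b] for a, b in zip(route, route[1:]))

def best_cross_exchange(r1, r2, dist):
    """Exhaustively swap one segment of r1 with one segment of r2 (both
    routes start and end at the depot) and keep the swap with the lowest
    min-max cost. The four nested loops over segment endpoints are the
    O(n^4) search that makes classic CE expensive."""
    best = (max(route_len(r1, dist), route_len(r2, dist)), r1, r2)
    for i in range(1, len(r1) - 1):
        for j in range(i, len(r1) - 1):
            for k in range(1, len(r2) - 1):
                for l in range(k, len(r2) - 1):
                    a = r1[:i] + r2[k:l + 1] + r1[j + 1:]
                    b = r2[:k] + r1[i:j + 1] + r2[l + 1:]
                    cost = max(route_len(a, dist), route_len(b, dist))
                    if cost < best[0]:
                        best = (cost, a, b)
    return best
```

NCE's point is to replace this exhaustive endpoint search with a learned predictor of the cost decrement, so a good exchange can be found without enumerating all four endpoints.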
1145 | Transformers Are Sample-Efficient World Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the success of Transformers in sequence modeling tasks, we introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer. |
Vincent Micheli; Eloi Alonso; François Fleuret; |
1146 | PGrad: Learning Principal Gradients For Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop a novel DG training strategy, we call PGrad, to learn a robust gradient direction, improving models’ generalization ability on unseen domains. |
Zhe Wang; Jake Grigsby; Yanjun Qi; |
1147 | On The Robustness of Safe Reinforcement Learning Under Observational Perturbations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that baseline adversarial attack techniques for standard RL tasks are not always effective for safe RL, and propose two new approaches: one maximizes the cost and the other maximizes the reward. |
Zuxin Liu; Zijian Guo; Zhepeng Cen; Huan Zhang; Jie Tan; Bo Li; Ding Zhao; |
1148 | Generate Rather Than Retrieve: Large Language Models Are Strong Context Generators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators. |
Wenhao Yu; Dan Iter; Shuohang Wang; Yichong Xu; Mingxuan Ju; Soumya Sanyal; Chenguang Zhu; Michael Zeng; Meng Jiang; |
1149 | Protein Sequence and Structure Co-Design with Equivariant Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new approach capable of protein sequence and structure co-design, which iteratively translates both protein sequence and structure into the desired state from random initialization, based on context features given a priori. |
Chence Shi; Chuanrui Wang; Jiarui Lu; Bozitao Zhong; Jian Tang; |
1150 | Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction. |
Yuda Song; Yifei Zhou; Ayush Sekhari; Drew Bagnell; Akshay Krishnamurthy; Wen Sun; |
1151 | IS SYNTHETIC DATA FROM GENERATIVE MODELS READY FOR IMAGE RECOGNITION? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we extensively study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks, and focus on two perspectives: synthetic data for improving classification models in the data-scarce settings (i.e. zero-shot and few-shot), and synthetic data for large-scale model pre-training for transfer learning. |
Ruifei He; Shuyang Sun; Xin Yu; Chuhui Xue; Wenqing Zhang; Philip Torr; Song Bai; XIAOJUAN QI; |
1152 | Concept Gradient: Concept-based Interpretation Without Linear Assumption Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we propose Concept Gradient (CG), which extends concept-based gradient interpretation methods to non-linear concept functions. |
Andrew Bai; Chih-Kuan Yeh; Pradeep Kumar Ravikumar; Neil Y.C. Lin; Cho-Jui Hsieh; |
1153 | Generalize Learned Heuristics to Solve Large-scale Vehicle Routing Problems in Real-time Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We contribute in the three directions: We propose a Two-stage Divide Method (TAM) to generate sub-route sequence rather than node sequence for generalizing the heuristics learned on small-scale-VRPs to solve large-scale VRPs in real-time. |
Qingchun Hou; Jingwei Yang; Yiqiang Su; Xiaoqing Wang; Yuming Deng; |
1154 | Re-parameterizing Your Optimizers Rather Than Architectures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to incorporate model-specific prior knowledge into optimizers by modifying the gradients according to a set of model-specific hyper-parameters. |
Xiaohan Ding; Honghao Chen; Xiangyu Zhang; Kaiqi Huang; Jungong Han; Guiguang Ding; |
1155 | Provable Unsupervised Data Sharing for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we examine the theoretical benefit of unlabeled data in the context of linear MDPs and propose a novel and Provable Data Sharing algorithm, which we refer to as PDS, to utilize such unlabeled data for offline RL. |
Hao Hu; Yiqin Yang; Qianchuan Zhao; Chongjie Zhang; |
1156 | Diffusion Posterior Sampling for General Noisy Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we extend diffusion solvers to efficiently handle general noisy (non)linear inverse problems via the Laplace approximation of the posterior sampling. |
Hyungjin Chung; Jeongsol Kim; Michael Thompson Mccann; Marc Louis Klasky; Jong Chul Ye; |
1157 | Recitation-Augmented Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new paradigm to help Large Language Models (LLMs) generate more accurate factual knowledge without retrieving from an external corpus, called RECITation-augmented gEneration (RECITE). |
Zhiqing Sun; Xuezhi Wang; Yi Tay; Yiming Yang; Denny Zhou; |
1158 | Masked Unsupervised Self-training for Label-free Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to leverage the abundant unlabeled data from a target domain to improve the performance of a pre-trained zero-shot classifier, by unsupervised finetuning of the pre-trained model. |
Junnan Li; Silvio Savarese; Steven Hoi; |
1159 | Statistical Efficiency of Score Matching: The View from Isoperimetry Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Though this estimator is known to be consistent, it's unclear whether (and when) its statistical efficiency is comparable to that of maximum likelihood — which is known to be (asymptotically) optimal. We initiate this line of inquiry in this paper, and show a tight connection between the statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated — i.e. the Poincaré, log-Sobolev and isoperimetric constants — quantities which govern the mixing time of Markov processes like Langevin dynamics. |
Frederic Koehler; Alexander Heckett; Andrej Risteski; |
1160 | Understanding Weight-magnitude Hyperparameters in Training Binary Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The magnitude is interpretable for real-valued weights, but loses its meaning for binary weights. In this paper we offer a new interpretation of these magnitude-based hyperparameters based on higher-order gradient filtering during network optimization. |
Joris Quist; Yunqiang Li; Jan van Gemert; |
1161 | Efficient Planning in A Compact Latent Action Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To advance efficient planning for high-dimensional continuous control, we propose Trajectory Autoencoding Planner (TAP), which learns low-dimensional latent action codes with a state-conditional VQ-VAE. |
zhengyao jiang; Tianjun Zhang; Michael Janner; Yueying Li; Tim Rocktäschel; Edward Grefenstette; Yuandong Tian; |
1162 | Leveraging Importance Weights in Subset Selection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a subset selection algorithm designed to work with arbitrary model families in a practical batch setting. |
Gui Citovsky; Giulia DeSalvo; Sanjiv Kumar; Srikumar Ramalingam; Afshin Rostamizadeh; Yunjuan Wang; |
1163 | Why Adversarial Training Can Hurt Robust Accuracy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we demonstrate that, surprisingly, the opposite can be true for a natural class of perceptible perturbations — even though adversarial training helps when enough data is available, it may in fact hurt robust generalization in the small sample size regime. |
Jacob Clarysse; Julia Hörrmann; Fanny Yang; |
1164 | Sequential Learning of Neural Networks for Prequential MDL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we evaluate approaches for computing prequential description lengths for image classification datasets with neural networks. |
Jorg Bornschein; Yazhe Li; Marcus Hutter; |
1165 | Learning Probabilistic Topological Representations Using Discrete Morse Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first deep learning based method to learn topological/structural representations. |
Xiaoling Hu; Dimitris Samaras; Chao Chen; |
1166 | Multi-objective Optimization Via Equivariant Deep Hypervolume Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the computational complexity of calculating the hypervolume scales unfavorably with an increasing number of objectives and data points, which restricts its use in common multi-objective optimization frameworks. To overcome these restrictions we propose to approximate the hypervolume function with a deep neural network, which we call DeepHV. |
Jim Boelrijk; Bernd Ensing; Patrick Forré; |
1167 | DiffusER: Diffusion Via Edit-based Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a generally applicable text generative model which takes inspiration from diffusion models and parameterises generation steps as text editing steps without compromising performance and adding flexibility. |
Machel Reid; Vincent Josua Hellendoorn; Graham Neubig; |
1168 | Provable Sim-to-real Transfer in Continuous Domain with Partial Observations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the sim-to-real transfer in continuous domain with partial observations, where the simulated environments and real-world environments are modeled by linear quadratic Gaussian (LQG) systems. |
Jiachen Hu; Han Zhong; Chi Jin; Liwei Wang; |
1169 | DynaMS: Dynamic Margin Selection for Efficient Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose dynamic margin selection (DynaMS). |
Jiaxing Wang; Yong Li; Jingwei Zhuo; Xupeng Shi; WEIZHONG ZHANG; Lixing Gong; Tong Tao; Pengzhang Liu; Yongjun Bao; Weipeng Yan; |
1170 | TANGOS: Regularizing Tabular Neural Networks Through Gradient Orthogonalization and Specialization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce Tabular Neural Gradient Orthogonalization and Specialization (TANGOS), a novel framework for regularization in the tabular setting built on latent unit attributions. |
Alan Jeffares; Tennison Liu; Jonathan Crabbé; Fergus Imrie; Mihaela van der Schaar; |
1171 | Model-based Causal Bayesian Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the {\em model-based causal Bayesian optimization algorithm (MCBO)} that learns a full system model instead of only modeling intervention-reward pairs. |
Scott Sussex; Anastasia Makarova; Andreas Krause; |
1172 | On Explaining Neural Network Robustness with Activation Path Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work investigates the robustness of neural networks from the activation pattern perspective. |
Ziping Jiang; |
1173 | Structure By Architecture: Structured Representations Without Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike most methods which rely on matching an arbitrary, relatively unstructured, prior distribution for sampling, we propose a sampling technique that relies solely on the independence of latent variables, thereby avoiding the trade-off between reconstruction quality and generative performance typically observed in VAEs. |
Felix Leeb; Giulia Lanzillotta; Yashas Annadani; Michel Besserve; Stefan Bauer; Bernhard Schölkopf; |
1174 | Compressing Multidimensional Weather and Climate Data Into Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Weather and climate simulations produce petabytes of high-resolution data that are later analyzed by researchers in order to understand climate change or severe weather. We propose a new method of compressing this multidimensional weather and climate data: a coordinate-based neural network is trained to overfit the data, and the resulting parameters are taken as a compact representation of the original grid-based data. |
Langwen Huang; Torsten Hoefler; |
1175 | Timing Is Everything: Learning to Act Selectively with Costly Actions and Budgetary Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a reinforcement learning (RL) framework named Learnable Impulse Control Reinforcement Algorithm (LICRA) for learning to optimally select both when to act and which actions to take when actions incur costs. |
David Henry Mguni; Aivar Sootla; Juliusz Krzysztof Ziomek; Oliver Slumbers; Zipeng Dai; Kun Shao; Jun Wang; |
1176 | DECAP: Decoding CLIP Latents for Zero-shot Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a simple framework, named DeCap, for zero-shot captioning. |
Wei Li; Linchao Zhu; Longyin Wen; Yi Yang; |
1177 | Robust Explanation Constraints for Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Relying on constraint relaxation techniques from non-convex optimization, we develop a method that upper-bounds the largest change an adversary can make to a gradient-based explanation via bounded manipulation of either the input features or model parameters. |
Matthew Robert Wicker; Juyeon Heo; Luca Costabello; Adrian Weller; |
1178 | Offline Reinforcement Learning Via High-Fidelity Generative Behavior Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that due to the limited distributional expressivity of policy models, previous methods might still select unseen actions during training, which deviates from their initial motivation. |
Huayu Chen; Cheng Lu; Chengyang Ying; Hang Su; Jun Zhu; |
1179 | Optimizing Spca-based Continual Learning: A Theoretical Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a theoretical analysis of a simple but efficient continual learning algorithm. |
Chunchun Yang; Malik Tiomoko; Zengfu Wang; |
1180 | Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Focusing on the offline RL setting, we aim to build a simple and discrete world model that abstracts the original environment. |
Deyao Zhu; Li Erran Li; Mohamed Elhoseiny; |
1181 | Domain Generalisation Via Domain Adaptation: An Adversarial Fourier Amplitude Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We tackle the domain generalisation (DG) problem by posing it as a domain adaptation (DA) task where we adversarially synthesise the worst-case ‘target’ domain and adapt a model to that worst-case domain, thereby improving the model’s robustness. |
Minyoung Kim; Da Li; Timothy Hospedales; |
1182 | Statistical Theory of Differentially Private Marginal-based Data Synthesis Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study DP data synthesis algorithms based on Bayesian networks (BN) from a statistical perspective. |
Ximing Li; Chendi Wang; Guang Cheng; |
1183 | Online Low Rank Matrix Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose OCTAL (Online Collaborative filTering using iterAtive user cLustering) that guarantees nearly optimal regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{1/2})$. |
Soumyabrata Pal; Prateek Jain; |
1184 | A Primal-Dual Framework for Transformers and Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that the self-attention corresponds to the support vector expansion derived from a support vector regression problem and provide a principled framework for constructing new attention mechanisms from popular neural network layers. |
Tan Minh Nguyen; Tam Minh Nguyen; Nhat Ho; Andrea L. Bertozzi; Richard Baraniuk; Stanley Osher; |
1185 | Scaffolding A Student to Instill Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel knowledge distillation (KD) method to selectively instill teacher knowledge into a student model motivated by situations where the student’s capacity is significantly smaller than that of the teachers. |
Anil Kag; Durmus Alp Emre Acar; Aditya Gangrade; Venkatesh Saligrama; |
1186 | Understanding The Generalization of Adam in Learning Neural Networks with Proper Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it has been observed that in many deep learning applications such as image classification, Adam can converge to a different solution with a worse test error compared to (stochastic) gradient descent, even with fine-tuned regularization. In this paper, we provide a theoretical explanation for this phenomenon: we show that in the nonconvex setting of learning over-parameterized two-layer convolutional neural networks starting from the same random initialization, for a class of data distributions (inspired from image data), Adam and gradient descent (GD) can converge to different global solutions of the training objective with provably different generalization errors, even with weight decay regularization. |
Difan Zou; Yuan Cao; Yuanzhi Li; Quanquan Gu; |
1187 | Learning ReLU Networks to High Uniform Accuracy Is Intractable Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we precisely quantify the number of training samples needed for any conceivable training algorithm to guarantee a given uniform accuracy on any learning problem formulated over target classes containing (or consisting of) ReLU neural networks of a prescribed architecture. |
Julius Berner; Philipp Grohs; Felix Voigtlaender; |
1188 | How Sharpness-Aware Minimization Minimizes Sharpness? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove the implicit bias of Sharpness-Aware Minimization (SAM) is minimizing the top eigenvalue of Hessian in the full-batch setting or minimizing the trace of Hessian when batch size is 1. |
Kaiyue Wen; Tengyu Ma; Zhiyuan Li; |
1189 | The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the type of solutions to which stochastic gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss. |
Mor Shpigel Nacson; Rotem Mulayoff; Greg Ongie; Tomer Michaeli; Daniel Soudry; |
1190 | MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is thus crucial to consider the dependency between the environment and co-player when shaping a curriculum in multi-agent domains. In this work, we use this insight and extend Unsupervised Environment Design (UED) to multi-agent environments. |
Mikayel Samvelyan; Akbir Khan; Michael D Dennis; Minqi Jiang; Jack Parker-Holder; Jakob Nicolaus Foerster; Roberta Raileanu; Tim Rocktäschel; |
1191 | Leveraging Unlabeled Data to Track Memorization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It is therefore important to track and evaluate the robustness of models against noisy label memorization. We propose a metric, called susceptibility, to gauge such memorization for neural networks. |
Mahsa Forouzesh; Hanie Sedghi; Patrick Thiran; |
1192 | Learning Vortex Dynamics for Fluid Inference and Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel machine learning method based on differentiable vortex particles to infer and predict fluid dynamics from a single video. |
Yitong Deng; Hong-Xing Yu; Jiajun Wu; Bo Zhu; |
1193 | Quality-Similar Diversity Via Population Based Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present the quality-similar diversity problem that features diversity among policies of similar qualities. To derive the gradient of the user-specified diversity with respect to a policy, which is not trivially available, we introduce a set of BD estimators and connect them with the classical policy gradient theorem. |
Shuang Wu; Jian Yao; Haobo Fu; Ye Tian; Chao Qian; Yaodong Yang; QIANG FU; Yang Wei; |
1194 | Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present rectified flow, a simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions $\pi_0$ and $\pi_1$, hence providing a unified solution to generative modeling and domain transfer, among various other tasks involving distribution transport. |
Xingchao Liu; Chengyue Gong; qiang liu; |
1195 | Learning Diffusion Bridges on Constrained Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, because diffusion processes are most naturally applied on the unconstrained Euclidean space $\mathbb{R}^d$, key challenges arise for developing diffusion based models for learning data on constrained and structured domains. We present a simple and unified framework to achieve this that can be easily adopted to various types of domains, including product spaces of any type (be it bounded/unbounded, continuous/discrete, categorical/ordinal, or their mix). |
Xingchao Liu; Lemeng Wu; Mao Ye; qiang liu; |
1196 | Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper is the first to consider indiscriminate poisoning attacks of contrastive learning. We propose contrastive poisoning, the first effective such attack on CL. |
Hao He; Kaiwen Zha; Dina Katabi; |
1197 | Decompositional Generation Process for Instance-Dependent Partial Label Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider instance-dependent PLL and assume that the generation process of the candidate labels can be decomposed into two sequential parts, where the correct label emerges first in the mind of the annotator, and then incorrect labels related to the feature are also selected, together with the correct label, as candidate labels due to the uncertainty of labeling. |
Congyu Qiao; Ning Xu; Xin Geng; |
1198 | FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing work on robust aggregation and certified FL robustness does not study how hardening benign clients can affect the global model (and the malicious clients). In this work, we theoretically analyze the connection among cross-entropy loss, attack success rate, and clean accuracy in this setting. |
Kaiyuan Zhang; Guanhong Tao; Qiuling Xu; Siyuan Cheng; Shengwei An; Yingqi Liu; Shiwei Feng; Guangyu Shen; Pin-Yu Chen; Shiqing Ma; Xiangyu Zhang; |
1199 | Minimum Variance Unbiased N:M Sparsity for The Neural Gradients Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we first establish a tensor-level optimality criterion. Previous works aimed to minimize the mean-square-error (MSE) of each pruned block. We show that while minimization of the MSE works fine for pruning the weights and activations, it catastrophically fails for the neural gradients. Instead, we show that accurate pruning of the neural gradients requires an unbiased minimum-variance pruning mask. |
Brian Chmiel; Itay Hubara; Ron Banner; Daniel Soudry; |
1200 | Incremental Learning of Structured Memory Via Closed-Loop Transcription Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work proposes a minimal computational model for learning structured memories of multiple object classes in an incremental setting. |
Shengbang Tong; Xili Dai; Ziyang Wu; Mingyang Li; Brent Yi; Yi Ma; |
1201 | Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC), which empowers a parametric text-to-text language model with a knowledge-rich external memory. |
Xiaoman Pan; Wenlin Yao; Hongming Zhang; Dian Yu; Dong Yu; Jianshu Chen; |
1202 | NewModel: Improving DeBERTa Using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new pre-trained language model, NewModel, which improves the original DeBERTa model by replacing mask language modeling (MLM) with replaced token detection (RTD), a more sample-efficient pre-training task. |
Pengcheng He; Jianfeng Gao; Weizhu Chen; |
1203 | Pushing The Accuracy-Fairness Tradeoff Frontier with Introspective Self-play Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Introspective Self-play (ISP), a simple approach to improve the uncertainty estimation of a deep neural network under dataset bias, by adding an auxiliary introspection task requiring a model to predict the bias for each data point in addition to the label. |
Jeremiah Zhe Liu; Krishnamurthy Dj Dvijotham; Jihyeon Lee; Quan Yuan; Balaji Lakshminarayanan; Deepak Ramachandran; |
1204 | Asymptotic Instance-Optimal Algorithms for Interactive Decision Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we design the first asymptotic instance-optimal algorithm for general interactive decision making problems with finite number of decisions under mild conditions. |
Kefan Dong; Tengyu Ma; |
1205 | The Hidden Uniform Cluster Prior in Self-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By moving away from conventional uniformity priors and instead preferring power-law distributed feature clusters, we show that one can improve the quality of the learned representations on real-world class-imbalanced datasets. |
Mido Assran; Randall Balestriero; Quentin Duval; Florian Bordes; Ishan Misra; Piotr Bojanowski; Pascal Vincent; Michael Rabbat; Nicolas Ballas; |
1206 | Globally Optimal Training of Neural Networks with Threshold Activation Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we study weight decay regularized training problems of deep neural networks with threshold activations. |
Tolga Ergen; Halil Ibrahim Gulluk; Jonathan Lacotte; Mert Pilanci; |
1207 | Mosaic Representation Learning for Self-supervised Visual Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it overlooks the diverse contextual backgrounds, which reduces the variance of the input views and degenerates the performance. To address this problem, we propose a mosaic representation learning framework (MosRep), consisting of a new data augmentation strategy that enriches the backgrounds of each small crop and improves the quality of visual representations. |
Zhaoqing Wang; Ziyu Chen; Yaqian Li; Yandong Guo; Jun Yu; Mingming Gong; Tongliang Liu; |
1208 | FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce FluidLab, a simulation environment with a diverse set of manipulation tasks involving complex fluid dynamics. |
Zhou Xian; Bo Zhu; Zhenjia Xu; Hsiao-Yu Tung; Antonio Torralba; Katerina Fragkiadaki; Chuang Gan; |
1209 | Task Ambiguity in Humans and Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We motivate the direction of studying task ambiguity in humans and language models, evaluate them on a new benchmark of ambiguously specified tasks, and develop methods for improving performance. |
Alex Tamkin; Kunal Handa; Avash Shrestha; Noah Goodman; |
1210 | Preference Transformer: Modeling Human Preferences Using Transformers for RL Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Preference Transformer, a neural architecture that models human preferences using transformers. |
Changyeon Kim; Jongjin Park; Jinwoo Shin; Honglak Lee; Pieter Abbeel; Kimin Lee; |
1211 | Molecule Generation For Target Protein Binding with Structural Motifs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although deep generative models and geometric deep learning have made great progress in drug design, existing works either sample in the 2D graph space or fail to generate valid molecules with realistic substructures. To tackle these problems, we propose a Fragment-based LigAnd Generation framework (FLAG), to generate 3D molecules with valid and realistic substructures fragment-by-fragment. |
ZAIXI ZHANG; Qi Liu; Shuxin Zheng; Yaosen Min; |
1212 | Deep Generative Modeling on Limited Data with Regularization By Nontransferable Pre-trained Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the classical perspective of the bias-variance tradeoff, we propose regularized deep generative model (Reg-DGM), which leverages a nontransferable pre-trained model to reduce the variance of generative modeling with limited data. |
Yong Zhong; Hong Tao Liu; Xiaodong Liu; Fan Bao; Weiran Shen; Chongxuan Li; |
1213 | Can CNNs Be More Robust Than Transformers? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we question that belief by closely examining the design of Transformers. |
Zeyu Wang; Yutong Bai; Yuyin Zhou; Cihang Xie; |
1214 | Risk-Aware Reinforcement Learning with Coherent Risk Measures and Non-linear Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the risk-aware reinforcement learning (RL) problem in the episodic finite-horizon Markov decision process with unknown transition and reward functions. |
Thanh Lam; Arun Verma; Bryan Kian Hsiang Low; Patrick Jaillet; |
1215 | Bi-level Physics-Informed Neural Networks for PDE Constrained Optimization Using Broyden’s Hypergradients Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel bi-level optimization framework to resolve the challenge by decoupling the optimization of the targets and constraints. |
Zhongkai Hao; Chengyang Ying; Hang Su; Jun Zhu; Jian Song; Ze Cheng; |
1216 | Analog Bits: Generating Discrete Data Using Diffusion Models with Self-Conditioning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Bit Diffusion: a simple and generic approach for generating discrete data with continuous diffusion models. |
Ting Chen; Ruixiang ZHANG; Geoffrey Hinton; |
1217 | Understanding Edge-of-Stability Training Dynamics with A Minimalist Example Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the EoS phenomenon by constructing a simple function that has the same behavior. |
Xingyu Zhu; Zixuan Wang; Xiang Wang; Mo Zhou; Rong Ge; |
1218 | Learning Proximal Operators to Discover Multiple Optima Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an end-to-end method to learn the proximal operator of a family of training problems so that multiple local minima can be quickly obtained from initial guesses by iterating the learned operator, emulating the proximal-point algorithm that has fast convergence. We further present an exhaustive benchmark for multi-solution optimization to demonstrate the effectiveness of our method. |
Lingxiao Li; Noam Aigerman; Vladimir Kim; Jiajin Li; Kristjan Greenewald; Mikhail Yurochkin; Justin Solomon; |
1219 | Neural Radiance Field Codebooks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Learning such representations for complex scenes and tasks remains an open challenge. Towards this goal, we introduce Neural Radiance Field Codebooks (NRC), a scalable method for learning object-centric representations through novel view reconstruction. |
Matthew Wallingford; Aditya Kusupati; Alex Fang; Vivek Ramanujan; Aniruddha Kembhavi; Roozbeh Mottaghi; Ali Farhadi; |
1220 | FiT: Parameter Efficient Few-shot Transfer Learning for Personalized and Federated Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Modern deep learning systems are increasingly deployed in situations such as personalization and federated learning where it is necessary to support i) learning on small amounts of data, and ii) communication efficient distributed training protocols. In this work, we develop FiLM Transfer (FiT) which fulfills these requirements in the image classification setting by combining ideas from transfer learning (fixed pretrained backbones and fine-tuned FiLM adapter layers) and meta-learning (automatically configured Naive Bayes classifiers and episodic training) to yield parameter efficient models with superior classification accuracy at low-shot. |
Aliaksandra Shysheya; John F Bronskill; Massimiliano Patacchiola; Sebastian Nowozin; Richard E Turner; |
1221 | AANG : Automating Auxiliary Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an approach for automatically generating a suite of auxiliary objectives. |
Lucio M. Dery; Paul Michel; Mikhail Khodak; Graham Neubig; Ameet Talwalkar; |
1222 | NeRF-SOS: Any-View Self-supervised Object Segmentation on Complex Scenes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel collaborative contrastive loss for NeRF to segment objects in complex real-world scenes, without any annotation. |
Zhiwen Fan; Peihao Wang; Yifan Jiang; Xinyu Gong; Dejia Xu; Zhangyang Wang; |
1223 | Private Federated Learning Without A Trusted Server: Optimal Algorithms for Convex Losses Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we provide tight (up to logarithms) upper and lower bounds for ISRL-DP FL with convex/strongly convex loss functions and homogeneous (i.i.d.) silo data. |
Andrew Lowy; Meisam Razaviyayn; |
1224 | Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current methods require these channels to be constantly accessible and known to the agents a priori. In this work, we lift these requirements such that the agents must discover the cheap talk channels and learn how to use them. |
Yat Long Lo; Christian Schroeder de Witt; Samuel Sokota; Jakob Nicolaus Foerster; Shimon Whiteson; |
1225 | Analyzing Tree Architectures in Ensembles Via Neural Tangent Kernel Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formulate and analyze the Neural Tangent Kernel (NTK) induced by soft tree ensembles for arbitrary tree architectures. |
Ryuichi Kanoh; Mahito Sugiyama; |
1226 | Evaluating Long-Term Memory in 3D Mazes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce the Memory Maze, a 3D domain of randomized mazes specifically designed for evaluating long-term memory in agents. With Memory Maze, we propose an online reinforcement learning benchmark, a diverse offline dataset, and an offline probing evaluation. |
Jurgis Pašukonis; Timothy P Lillicrap; Danijar Hafner; |
1227 | Proactive Multi-Camera Collaboration for 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces a multi-agent reinforcement learning (MARL) scheme for proactive Multi-Camera Collaboration for 3D Human Pose Estimation (MCC-HPE) in dynamic human crowds. |
Hai Ci; Mickel Liu; Xuehai Pan; fangwei zhong; Yizhou Wang; |
1228 | Become A Proficient Player with Limited Data Through Watching Pure Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a two-phase training pipeline as follows: in the pre-training phase, we implicitly extract the hidden action embedding from videos and pre-train the visual representation and the environment dynamics network through a novel cycle consistency objective based on vector quantization; for downstream tasks, we fine-tune with a small amount of task data based on the learned models. |
Weirui Ye; Yunsheng Zhang; Pieter Abbeel; Yang Gao; |
1229 | Hidden Markov Transformer for Simultaneous Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Hidden Markov Transformer (HMT), which treats the moments of starting translating as hidden events and the target sequence as the corresponding observed events, thereby organizing them as a hidden Markov model. |
Shaolei Zhang; Yang Feng; |
1230 | Rank Preserving Framework for Asymmetric Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering that the primary concern of the users is the rank of the returned images, we propose a generic rank preserving framework, which achieves feature compatibility and the order consistency between query and gallery models simultaneously. |
Hui Wu; Min Wang; Wengang Zhou; Houqiang Li; |
1231 | Mega: Moving Average Equipped Gated Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Mega, a simple, theoretically grounded, single-head gated attention mechanism equipped with (exponential) moving average to incorporate inductive bias of position-aware local dependencies into the position-agnostic attention mechanism. |
Xuezhe Ma; Chunting Zhou; Xiang Kong; Junxian He; Liangke Gui; Graham Neubig; Jonathan May; Luke Zettlemoyer; |
1232 | Parallel Deep Neural Networks Have Zero Duality Gap Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we prove in particular that the duality gap for deeper linear networks with vector outputs is non-zero. |
Yifei Wang; Tolga Ergen; Mert Pilanci; |
1233 | Multi-domain Image Generation and Translation with Identifiability Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the recent advances in nonlinear Independent Component Analysis (ICA) theory, we propose a new method to learn the joint distribution from the marginals by enforcing a specific type of minimal change across domains. |
Shaoan Xie; Lingjing Kong; Mingming Gong; Kun Zhang; |
1234 | Pessimism in The Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we aim to learn an optimal policy from a dataset collected by a behavior policy which possibly depends on the latent state. |
Miao Lu; Yifei Min; Zhaoran Wang; Zhuoran Yang; |
1235 | Understanding Zero-shot Adversarial Robustness for Large-Scale Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we identify and explore the problem of adapting large-scale models for zero-shot adversarial robustness. |
Chengzhi Mao; Scott Geng; Junfeng Yang; Xin Wang; Carl Vondrick; |
1236 | Continual Evaluation for Lifelong Learning: Identifying The Stability Gap Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead, we establish a framework for continual evaluation that uses per-iteration evaluation and define a new set of metrics that enables quantifying the stability gap. |
Matthias De Lange; Gido M van de Ven; Tinne Tuytelaars; |
1237 | Transformer-based Model for Symbolic Regression Via Joint Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent transformer-based methods for SR focus on large-scale training data and ignore an ill-posed problem: the lack of sufficient supervision, i.e., completely different expressions may receive the same supervision because they share the same skeleton, which makes it challenging to deal with data that come from the same expression skeleton but with different coefficients. Therefore, we present a transformer-based model for SR that alleviates this problem. |
Wenqiang Li; Weijun Li; Linjun Sun; Min Wu; Lina Yu; Jingyi Liu; Yanjie Li; Songsong Tian; |
1238 | Robust and Controllable Object-Centric Learning Through Energy-based Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present EGO, a conceptually simple and general approach to learning object-centric representations through an energy-based model. |
Ruixiang ZHANG; Tong Che; Boris Ivanovic; Renhao Wang; Marco Pavone; Yoshua Bengio; Liam Paull; |
1239 | Robust Fair Clustering: A Novel Fairness Attack and Defense Framework Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a highly effective and novel fairness attack against state-of-the-art fair clustering models, and for self-completeness, we propose a defense framework based on consensus clustering and graph representation learning that is robust to our attack. |
Anshuman Chhabra; Peizhao Li; Prasant Mohapatra; Hongfu Liu; |
1240 | Learning to Jointly Share and Prune Weights for Grounding Based Vision and Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Leveraging this feature of transformers, we propose weight sharing across two transformer backbones and within the same transformer backbone and pruning across two backbones in a unified framework. |
Shangqian Gao; Burak Uzkent; Yilin Shen; Heng Huang; Hongxia Jin; |
1241 | Graph Domain Adaptation Via Theory-Grounded Spectral Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper targets at designing theory-grounded algorithms for graph domain adaptation (GDA). |
Yuning You; Tianlong Chen; Zhangyang Wang; Yang Shen; |
1242 | Understanding Why Generalized Reweighting Does Not Improve Over ERM Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: But a line of recent work has empirically shown that these approaches do not significantly improve over ERM in real applications with distribution shift. The goal of this work is to obtain a comprehensive theoretical understanding of this intriguing phenomenon. |
Runtian Zhai; Chen Dan; J Zico Kolter; Pradeep Kumar Ravikumar; |
1243 | Particle-based Variational Inference with Preconditioned Functional Gradient Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper remedies the problem by proposing a general framework to obtain tractable functional gradient flow estimates. |
Hanze Dong; Xi Wang; LIN Yong; Tong Zhang; |
1244 | Combating Exacerbated Heterogeneity for Robust Decentralized Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We discover that the cause of this phenomenon is that the generated adversarial data can exacerbate the data heterogeneity among local clients, making the wrapped federated learning perform poorly. To deal with this problem, we propose a novel framework termed Slack Federated Adversarial Training (SFAT), which assigns client-wise slack during aggregation to combat the intensified heterogeneity. |
Jianing Zhu; Jiangchao Yao; Tongliang Liu; quanming yao; Jianliang Xu; Bo Han; |
1245 | Bidirectional Propagation for Cross-Modal 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, in contrast to existing pixel-to-point feature propagation, we investigate an opposite point-to-pixel direction, allowing point-wise features to flow inversely into the 2D image branch. |
Yifan Zhang; Qijian Zhang; Junhui Hou; Yixuan Yuan; Guoliang Xing; |
1246 | TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Technically, we propose TimesNet, built on TimesBlock, as a task-general backbone for time series analysis. |
Haixu Wu; Tengge Hu; Yong Liu; Hang Zhou; Jianmin Wang; Mingsheng Long; |
1247 | Learning Without Prejudices: Continual Unbiased Learning Via Benign and Malignant Forgetting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We term such type of forgetting benign forgetting, and categorize detrimental forgetting as malignant forgetting. Based on this finding, our objective in this study is twofold: (a) to discourage malignant forgetting by generating previous representations, and (b) to encourage benign forgetting by employing contrastive learning in conjunction with feature-level augmentation. |
Myeongho Jeon; Hyoje Lee; Yedarm Seong; Myungjoo Kang; |
1248 | FINDE: Neural Differential Equations for Finding and Preserving Invariant Quantities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose the first-integral-preserving neural differential equation (FINDE). |
Takashi Matsubara; Takaharu Yaguchi; |
1249 | A Holistic View of Noise Transition Matrix in Deep Learning and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore learning statistically consistent classifiers under label noise by estimating the noise transition matrix T. |
LIN Yong; Renjie Pi; WEIZHONG ZHANG; Xiaobo Xia; Jiahui Gao; Xiao Zhou; Tongliang Liu; Bo Han; |
1250 | NORM: Knowledge Distillation Via N-to-One Representation Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present N-to-One Representation Matching (NORM), a new two-stage knowledge distillation method, which relies on a linear Feature Transform (FT) module. |
Xiaolong Liu; Lujun Li; Chao Li; Anbang Yao; |
1251 | GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the Group Propagation Vision Transformer (GPViT): a novel non-hierarchical (i.e. non-pyramidal) transformer model designed for general visual recognition with high-resolution features. |
Chenhongyi Yang; Jiarui Xu; Shalini De Mello; Elliot J. Crowley; Xiaolong Wang; |
1252 | Deep Learning Meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the theory of neural network (NN) from the lens of classical nonparametric regression problems with a focus on NN’s ability to adaptively estimate functions with heterogeneous smoothness — a property of functions in Besov or Bounded Variation (BV) classes. |
Kaiqi Zhang; Yu-Xiang Wang; |
1253 | Sparse Token Transformer with Attention Back Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, previous token pruning approaches often remove tokens during the feed-forward stage without consideration of their impact on later layers’ attentions, which has a potential risk of dropping out important tokens for the given task. To tackle this issue, we propose an attention back-tracking method that tracks the importance of each attention in a Transformer architecture from the outputs to the inputs, to preserve the tokens that have a large impact on the final predictions. |
Heejun Lee; Minki Kang; Youngwan Lee; Sung Ju Hwang; |
1254 | Maximizing Communication Efficiency for Large-scale Training Via 0/1 Adam Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we demonstrate that the non-linearity in Adam causes slow convergence even when 1-bit compression or local steps are individually applied. |
Yucheng Lu; Conglong Li; Minjia Zhang; Christopher De Sa; Yuxiong He; |
1255 | PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel MORL algorithm that trains a single universal network to cover the entire preference space scalable to continuous robotic tasks. |
Toygun Basaklar; Suat Gumussoy; Umit Ogras; |
1256 | Learning Symbolic Models for Graph-structured Physical Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new approach that generalizes symbolic regression to graph-structured physical mechanisms. |
Hongzhi Shi; Jingtao Ding; Yufan Cao; quanming yao; Li Liu; Yong Li; |
1257 | Linear Connectivity Reveals Generalization Strategies Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast to existing results from image classification, we find that among text classifiers (trained on MNLI, QQP, and CoLA), some pairs of finetuned models have large barriers of increasing loss on the linear paths between them. |
Jeevesh Juneja; Rachit Bansal; Kyunghyun Cho; João Sedoc; Naomi Saphra; |
1258 | Dirichlet-based Uncertainty Calibration for Active Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although active DA methods address this by further proposing targetness to measure the representativeness of target domain characteristics, their predictive uncertainty is usually based on the prediction of deterministic models, which can easily be miscalibrated on data with distribution shift. Considering this, we propose a Dirichlet-based Uncertainty Calibration (DUC) approach for active DA, which simultaneously achieves the mitigation of miscalibration and the selection of informative target samples. |
Mixue Xie; Shuang Li; Rui Zhang; Chi Harold Liu; |
1259 | Accurate Image Restoration with Attention Retractable Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Obviously, this strategy could result in restricted receptive fields. To address this issue, we propose the Attention Retractable Transformer (ART) for image restoration, which presents both dense and sparse attention modules in the network. |
Jiale Zhang; Yulun Zhang; Jinjin Gu; Yongbing Zhang; Linghe Kong; Xin Yuan; |
1260 | Causal Representation Learning for Instantaneous and Temporal Effects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This effectively creates “instantaneous” effects and invalidates previous identifiability results. To address this issue, we propose iCITRIS, a causal representation learning method that allows for instantaneous effects in temporal sequences with known intervention targets. |
Phillip Lippe; Sara Magliacane; Sindy Löwe; Yuki M Asano; Taco Cohen; Efstratios Gavves; |
1261 | Visual Imitation Learning with Patch Rewards Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to measure the expertise of various local regions of image samples, called patches, and to recover multi-dimensional patch rewards accordingly. |
Minghuan Liu; Tairan He; Weinan Zhang; Shuicheng YAN; Zhongwen Xu; |
1262 | Diffusion Models Already Have A Semantic Latent Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the problem, we propose asymmetric reverse process (Asyrp) which discovers the semantic latent space in frozen pretrained diffusion models. |
Mingi Kwon; Jaeseok Jeong; Youngjung Uh; |
1263 | Gradient-Guided Importance Sampling for Learning Binary Energy-Based Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although ratio matching is a sound method to learn discrete EBMs, it suffers from expensive computation and excessive memory requirements, thereby resulting in difficulties for learning EBMs on high-dimensional data. Motivated by these limitations, in this study, we propose ratio matching with gradient-guided importance sampling (RMwGGIS). |
Meng Liu; Haoran Liu; Shuiwang Ji; |
1264 | Dataset Pruning: Reducing Training Data By Examining Generalization Influence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To answer these, we propose dataset pruning, an optimization-based sample selection method that can (1) examine the influence of removing a particular set of training samples on the model's generalization ability with a theoretical guarantee, and (2) construct the smallest subset of training data that yields a strictly constrained generalization gap. |
Shuo Yang; Zeke Xie; Hanyu Peng; Min Xu; Mingming Sun; Ping Li; |
1265 | Plateau in Monotonic Linear Interpolation — A Biased View of Loss Landscape for Deep Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that the MLI property is not necessarily related to the hardness of optimization problems, and empirical observations on MLI for deep neural networks depend heavily on the biases. |
Xiang Wang; Annie N. Wang; Mo Zhou; Rong Ge; |
1266 | Crossformer: Transformer Utilizing Cross-Dimension Dependency for Multivariate Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fill the gap, we propose Crossformer, a Transformer-based model utilizing cross-dimension dependency for MTS forecasting. |
Yunhao Zhang; Junchi Yan; |
1267 | BrainBERT: Self-supervised Representation Learning for Intracranial Electrodes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We create a reusable Transformer, BrainBERT, for intracranial recordings, bringing modern representation learning approaches to neuroscience. |
Christopher Wang; Vighnesh Subramaniam; Adam Uri Yaari; Gabriel Kreiman; Boris Katz; Ignacio Cases; Andrei Barbu; |
1268 | Nonlinear Reconstruction for Operator Learning of PDEs with Discontinuities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates, both theoretically and empirically, the operator learning of PDEs with discontinuous solutions. |
Samuel Lanthaler; Roberto Molinaro; Patrik Hadorn; Siddhartha Mishra; |
1269 | Discovering Informative and Robust Positives for Video Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a non-contrastive learning framework without relying on negative samples for unsupervised video domain adaptation. |
Chang Liu; Kunpeng Li; Michael Stopa; Jun Amano; Yun Fu; |
1270 | Composing Ensembles of Pre-trained Models Via Iterative Consensus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a unified framework for composing ensembles of different pre-trained models — combining the strengths of each individual model to solve various multimodal problems in a zero-shot manner. |
Shuang Li; Yilun Du; Joshua B. Tenenbaum; Antonio Torralba; Igor Mordatch; |
1271 | Automated Data Augmentations for Graph Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose GraphAug, a novel automated data augmentation method aiming at computing label-invariant augmentations for graph classification. |
Youzhi Luo; Michael Curtis McThrow; Wing Yee Au; Tao Komikado; Kanji Uchino; Koji Maruhashi; Shuiwang Ji; |
1272 | Learning Label Encodings for Deep Regression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Automated approaches to finding a good label encoding for a given application have heretofore been lacking. This paper introduces Regularized Label Encoding Learning (RLEL) for end-to-end training of an entire network and its label encodings. |
Deval Shah; Tor M. Aamodt; |
1273 | Fair Attribute Completion on Graph with Missing Attributes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose FairAC, a fair attribute completion method, to complete missing information and learn fair node embeddings for graphs with missing attributes. |
Dongliang Guo; Zhixuan Chu; Sheng Li; |
1274 | Robustness to Corruption in Pre-trained Bayesian Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop ShiftMatch, a new training-data-dependent likelihood for robustness to corruption in Bayesian neural networks (BNNs). |
Xi Wang; Laurence Aitchison; |
1275 | NTFields: Neural Time Fields for Physics-Informed Robot Motion Planning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent developments have also led to physics-informed deep neural models capable of representing complex dynamical Partial Differential Equations (PDEs). Inspired by these developments, we propose Neural Time Fields (NTFields) for robot motion planning in cluttered scenarios. |
Ruiqi Ni; Ahmed H Qureshi; |
1276 | Meta-learning Adaptive Deep Kernel Gaussian Processes for Molecular Property Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Adaptive Deep Kernel Fitting with Implicit Function Theorem (ADKF-IFT), a novel framework for learning deep kernel Gaussian processes (GPs) by interpolating between meta-learning and conventional deep kernel learning. |
Wenlin Chen; Austin Tripp; José Miguel Hernández-Lobato; |
1277 | ERL-Re$^2$: Efficient Evolutionary Reinforcement Learning with Shared State Representation and Individual Policy Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Evolutionary Reinforcement Learning with Two-scale State Representation and Policy Representation (ERL-Re$^2$), a novel solution to the aforementioned two drawbacks. |
Jianye HAO; Pengyi Li; Hongyao Tang; YAN ZHENG; Xian Fu; Zhaopeng Meng; |
1278 | Deep Ensembles for Graphs with Higher-order Dependencies Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In the presence of higher-order sequential dependencies, we show that the tendency of traditional graph representations to underfit each node’s neighborhood causes existing GNNs to generalize poorly. To address this, we propose a novel Deep Graph Ensemble (DGE), which captures neighborhood variance by training an ensemble of GNNs on different neighborhood subspaces of the same node within a higher-order network structure. |
Steven Krieg; William Burgis; Patrick Soga; Nitesh Chawla; |
1279 | Denoising Masked Autoencoders Are Certifiable Robust Vision Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new self-supervised method, which is called denoising masked autoencoders (DMAE), for learning certified robust classifiers of images. |
QuanLin Wu; Hang Ye; Yuntian Gu; Huishuai Zhang; Liwei Wang; Di He; |
1280 | Sound Randomized Smoothing in Floating-Point Arithmetic Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We discuss the implicit assumptions of randomized smoothing and show that they do not apply to generic image classification models whose smoothed versions are commonly certified. In order to overcome this problem, we propose a sound approach to randomized smoothing when using floating-point precision with essentially equal speed for quantized input. |
Vaclav Voracek; Matthias Hein; |
1281 | Provably Efficient Risk-Sensitive Reinforcement Learning: Iterated CVaR and Worst Path Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study a novel episodic risk-sensitive Reinforcement Learning (RL) problem, named Iterated CVaR RL, which aims to maximize the tail of the reward-to-go at each step, and focuses on tightly controlling the risk of getting into catastrophic situations at each stage. |
Yihan Du; Siwei Wang; Longbo Huang; |
1282 | Test-Time Robust Personalization for Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we identify the pitfalls of existing works under test-time distribution shifts and propose Federated Test-time Head Ensemble plus tuning (FedTHE+), which personalizes FL models with robustness to various test-time distribution shifts. Along with this, we build a benchmark for assessing performance and robustness of personalized FL methods during deployment. |
Liangze Jiang; Tao Lin; |
1283 | Meta-prediction Model for Distillation-Aware NAS on Unseen Datasets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To eliminate excessive computational cost of DaNAS methods and the sub-optimality of rapid NAS methods, we propose a distillation-aware meta accuracy prediction model which can predict a given architecture’s final performances on a dataset when performing KD with a given teacher, without having to actually train it on the target task. |
Hayeon Lee; Sohyun An; Minseon Kim; Sung Ju Hwang; |
1284 | Denoising Diffusion Error Correction Codes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to employ denoising diffusion models for the soft decoding of linear codes at arbitrary block lengths. |
Yoni Choukroun; Lior Wolf; |
1285 | Dynamic Prompt Learning Via Policy Gradient for Semi-structured Mathematical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fill the gap, we present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 open-domain grade-level problems that require mathematical reasoning on both textual and tabular data. |
Pan Lu; Liang Qiu; Kai-Wei Chang; Ying Nian Wu; Song-Chun Zhu; Tanmay Rajpurohit; Peter Clark; Ashwin Kalyan; |
1286 | Phase Transition for Detecting A Small Community in A Large Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using Sinkhorn's theorem, we show that the signal captured by the χ²-test may be a modeling artifact, and it may disappear once we replace the Erdős-Rényi model by a broader network model. |
Jiashun Jin; Tracy Ke; Paxton Turner; Anru Zhang; |
1287 | The Power of Regularization in Solving Extensive-Form Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the power of regularization, a common technique in reinforcement learning and optimization, in solving extensive-form games (EFGs). |
Mingyang Liu; Asuman E. Ozdaglar; Tiancheng Yu; Kaiqing Zhang; |
1288 | Progressive Compressed Auto-Encoder for Self-supervised Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This redundancy is neglected by existing methods and causes non-negligible overheads in computation and storage that do not necessarily benefit self-supervised learning. In this paper, we present a novel approach named Progressive Compressed AutoEncoder (PCAE) to address this problem by progressively compacting tokens and retaining the least necessary information for representation. |
Jin Li; Yaoming Wang; XIAOPENG ZHANG; Yabo Chen; Dongsheng Jiang; Wenrui Dai; Chenglin Li; Hongkai Xiong; Qi Tian; |
1289 | CFlowNets: Continuous Control with Generative Flow Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose generative continuous flow networks (CFlowNets) that can be applied to continuous control tasks. |
Yinchuan Li; Shuang Luo; Haozhi Wang; Jianye HAO; |
1290 | Causal Balancing for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a balanced mini-batch sampling strategy to transform a biased data distribution into a spurious-free balanced distribution, based on the invariance of the underlying causal mechanisms for the data generation process. |
Xinyi Wang; Michael Saxon; Jiachen Li; Hongyang Zhang; Kun Zhang; William Yang Wang; |
1291 | Breaking Correlation Shift Via Conditional Invariant Regularizer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an algorithm that enables a model to generalize on data with spurious correlations; the method can be implemented without information about the spurious features. |
Mingyang Yi; Ruoyu Wang; Jiacheng Sun; Zhenguo Li; Zhi-Ming Ma; |
1292 | GRACE-C: Generalized Rate Agnostic Causal Estimation Via Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an algorithm that combines constraint programming with both theoretical insights into the problem structure and prior information about admissible causal interactions to achieve speed up of multiple orders of magnitude. |
Mohammadsajad Abavisani; David Danks; Sergey Plis; |
1293 | Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Equiformer, a graph neural network leveraging the strength of Transformer architectures and incorporating SE(3)/E(3)-equivariant features based on irreducible representations (irreps). |
Yi-Lun Liao; Tess Smidt; |
1294 | Automating Nearest Neighbor Search Configuration with Constrained Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is becoming an increasingly unrealistic demand as ANN search grows in popularity. To tackle this obstacle to ANN adoption, this work proposes a constrained optimization-based approach to tuning quantization-based ANN algorithms. |
Philip Sun; Ruiqi Guo; Sanjiv Kumar; |
1295 | Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this approach is slow and costly because it needs many forward and reverse steps. We propose a faster and cheaper approach that adds noise not until the data become pure random noise, but until they reach a hidden noisy-data distribution that we can confidently learn. |
Huangjie Zheng; Pengcheng He; Weizhu Chen; Mingyuan Zhou; |
1296 | NTK-SAP: Improving Neural Network Pruning By Aligning Training Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent advances in neural tangent kernel (NTK) theory suggest that the training dynamics of large enough neural networks is closely related to the spectrum of the NTK. Motivated by this finding, we propose to prune the connections that have the least influence on the spectrum of the NTK. |
Yite Wang; Dawei Li; Ruoyu Sun; |
1297 | TabPFN: A Transformer That Solves Small Tabular Classification Problems in A Second Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present TabPFN, a trained Transformer that can do supervised classification for small tabular datasets in less than a second, needs no hyperparameter tuning and is competitive with state-of-the-art classification methods. |
Noah Hollmann; Samuel Müller; Katharina Eggensperger; Frank Hutter; |
1298 | Filter-Recovery Network for Multi-Speaker Audio-Visual Speech Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these issues, we propose an effective method, BFRNet, which comprises a basic audio-visual speech separator and a Filter-Recovery Network (FRNet). |
Haoyue Cheng; Zhaoyang Liu; Wayne Wu; Limin Wang; |
1299 | Sparse Q-Learning: Offline Reinforcement Learning with Implicit Value Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we make a key finding that the in-sample learning paradigm arises under the Implicit Value Regularization (IVR) framework. |
Haoran Xu; Li Jiang; Jianxiong Li; Zhuoran Yang; Zhaoran Wang; Victor Wai Kin Chan; Xianyuan Zhan; |
1300 | Topology-aware Robust Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose topology-aware robust optimization (TRO) that seamlessly integrates distributional topology in a principled optimization framework. |
Fengchun Qiao; Xi Peng; |
1301 | Limitless Stability for Graph Convolutional Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work establishes rigorous, novel and widely applicable stability guarantees and transferability bounds for general graph convolutional networks — without reference to any underlying limit object or statistical distribution. |
Christian Koke; |
1302 | Token Merging: Your ViT But Faster Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Token Merging (ToMe), a simple method to increase the throughput of existing ViT models without needing to train. |
Daniel Bolya; Cheng-Yang Fu; Xiaoliang Dai; Peizhao Zhang; Christoph Feichtenhofer; Judy Hoffman; |
1303 | Spatial Attention Kinetic Networks with E(n)-Equivariance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple alternative functional form that uses neurally parametrized linear combinations of edge vectors to achieve equivariance while still universally approximating node environments. |
Yuanqing Wang; John Chodera; |
1304 | Revisiting The Entropy Semiring for Neural Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we revisit the entropy semiring for neural speech recognition models, and show how alignment entropy can be used to supervise models through regularization or distillation. |
Oscar Chang; Dongseong Hwang; Olivier Siohan; |
1305 | Neural Groundplans: Persistent Neural Scene Representations from A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method to map 2D image observations of a scene to a persistent 3D scene representation, enabling novel view synthesis and disentangled representation of the movable and immovable components of the scene. |
Prafull Sharma; Ayush Tewari; Yilun Du; Sergey Zakharov; Rares Andrei Ambrus; Adrien Gaidon; William T. Freeman; Fredo Durand; Joshua B. Tenenbaum; Vincent Sitzmann; |
1306 | Stochastic Differentially Private and Fair Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide the first stochastic differentially private algorithm for fair learning that is guaranteed to converge. |
Andrew Lowy; Devansh Gupta; Meisam Razaviyayn; |
1307 | Volumetric Optimal Transportation By Fast Fourier Transform Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel and powerful method, the FFT-OT (fast Fourier transform-optimal transport), to solve 3-dimensional OT problems. |
Na Lei; DONGSHENG An; Min Zhang; Xiaoyin Xu; David Gu; |
1308 | Function-Consistent Feature Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Considering this, we argue that the similarity between teacher and student features should not be measured merely based on their appearance (i.e. L2 distance), but should, more importantly, be measured by their difference in function, namely how the lateral parts of the network will read, decode, and process them. Therefore, we propose Function-Consistent Feature Distillation (FCFD), which explicitly optimizes the functional similarity between teacher and student features. |
Dongyang Liu; Meina Kan; Shiguang Shan; Xilin CHEN; |
1309 | Learning to Linearize Deep Neural Networks for Secure and Efficient Private Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an automated linearization method to train a DNN with a limited ReLU budget for private inference, yielding models that perform significantly better than the existing private inference SOTA in terms of both latency and accuracy. |
Souvik Kundu; Shunlin Lu; Yuke Zhang; Jacqueline Tiffany Liu; Peter Anthony Beerel; |
1310 | Decompose to Generalize: Species-Generalized Animal Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper challenges the cross-species generalization problem for animal pose estimation, aiming to learn a pose estimator that can be well generalized to novel species. |
Guangrui Li; Yifan Sun; Zongxin Yang; Yi Yang; |
1311 | Image As Set of Points Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a straightforward and promising paradigm for visual representation, which is called Context Clusters. |
Xu Ma; Yuqian Zhou; Huan Wang; Can Qin; Bin Sun; Chang Liu; Yun Fu; |
1312 | Trainability Preserving Neural Pruning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present trainability preserving pruning (TPP), a scalable method to preserve network trainability against pruning, aiming for improved pruning performance. |
Huan Wang; Yun Fu; |
1313 | DrML: Diagnosing and Rectifying Vision Models Using Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through a combination of theoretical explanation and empirical verification, we present conditions under which classifiers trained on embeddings from one modality can be equivalently applied to embeddings from another modality. |
Yuhui Zhang; Jeff Z. HaoChen; Shih-Cheng Huang; Kuan-Chieh Wang; James Zou; Serena Yeung; |
1314 | Adversarial Attacks on Adversarial Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study a security threat to adversarial multi-armed bandit, in which an attacker perturbs the loss or reward signal to control the behavior of the victim bandit player. |
Yuzhe Ma; Zhijin Zhou; |
1315 | Harnessing Out-Of-Distribution Examples Via Augmenting Content and Style Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To harness OOD data, this paper proposes the HOOD method, which can leverage the content and style of each image instance to identify benign and malign OOD data. |
Zhuo Huang; Xiaobo Xia; Li Shen; Bo Han; Mingming Gong; Chen Gong; Tongliang Liu; |
1316 | TaskPrompter: Spatial-Channel Multi-Task Prompting for Dense Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It is barely explored in the literature to model these three perspectives in each network layer in an end-to-end manner, which can not only minimize the effort of carefully designing empirical structures for the three multi-task representation learning objectives, but also greatly improve the representation learning capability of the multi-task network since all the model capacity will be used to optimize the three objectives together. In this paper, we propose TaskPrompter, a novel spatial-channel multi-task prompting transformer framework to achieve this target. |
Hanrong Ye; Dan Xu; |
1317 | Learning Domain-Agnostic Representation for Disease Diagnosis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To disentangle disease-related features, we first leverage structural causal modeling to explicitly model disease-related features and center effects, which are provably disentangled from each other. Guided by this, we propose a novel Domain Agnostic Representation Model (DarMo) based on a variational auto-encoder. |
Churan Wang; Jing Li; Xinwei Sun; Fandong Zhang; Yizhou Yu; Yizhou Wang; |
1318 | BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the incorporation of LiDAR-based detectors for multi-view 3D object detection. |
Zehui Chen; Zhenyu Li; Shiquan Zhang; Liangji Fang; Qinhong Jiang; Feng Zhao; |
1319 | Suppressing The Heterogeneity: A Strong Feature Extractor for Few-shot Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by these observations, we propose a feature extractor with Multi-level Heterogeneity Suppressing (MuHS). |
Zhengdong Hu; Yifan Sun; Yi Yang; |
1320 | Sparse and Hierarchical Masked Modeling for Convolutional Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a simple yet powerful framework to pre-train a convolutional network (convnet) with Sparse masKed modeling. |
Keyu Tian; Yi Jiang; qishuai diao; Chen Lin; Liwei Wang; Zehuan Yuan; |
1321 | On Amortizing Convex Conjugates for Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper focuses on computing the convex conjugate operation that arises when solving Euclidean Wasserstein-2 optimal transport problems. |
Brandon Amos; |
1322 | Molecular Geometry Pretraining with SE(3)-Invariant Denoising Distance Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the dynamic nature of 3D molecules, where the continuous motion of a molecule in the 3D Euclidean space forms a smooth potential energy surface, we propose a 3D coordinate denoising pretraining framework to model such an energy landscape. |
Shengchao Liu; Hongyu Guo; Jian Tang; |
1323 | DAG Matters! GFlowNets Enhanced Explainer for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the exponential size of candidate subgraphs limits the applicability of state-of-the-art methods to large-scale GNNs. We improve on this through a different approach: by proposing a generative structure, the GFlowNets-based GNN Explainer (GFlowExplainer), we turn the optimization problem into a step-by-step generative problem. |
Wenqian Li; Yinchuan Li; Zhigang Li; Jianye HAO; Yan Pang; |
1324 | Continuous-Discrete Convolution for (3+1)D Geometry-Sequence Modeling in Proteins Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a Continuous-Discrete Convolution (CDConv) that uses irregular and regular approaches to model the geometry and sequence structures, respectively. |
Hehe Fan; Zhangyang Wang; Yi Yang; Mohan Kankanhalli; |
1325 | Solving Stochastic Weak Minty Variational Inequalities Without Increasing Batch Size Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a family of stochastic extragradient-type algorithms for a class of nonconvex-nonconcave problems characterized by the weak Minty variational inequality (MVI). |
Thomas Pethick; Olivier Fercoq; Puya Latafat; Panagiotis Patrinos; Volkan Cevher; |
1326 | A General Framework For Proving The Equivariant Strong Lottery Ticket Hypothesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we generalize the SLTH to functions that preserve the action of the group $G$—i.e. $G$-equivariant network—and prove, with high probability, that one can approximate any $G$-equivariant network of fixed width and depth by pruning a randomly initialized overparametrized $G$-equivariant network to a $G$-equivariant subnetwork. |
Damien Ferbach; Christos Tsirigotis; Gauthier Gidel; Joey Bose; |
1327 | CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose CO3, namely Cooperative Contrastive Learning and Contextual Shape Prediction, to learn 3D representations for outdoor-scene point clouds in an unsupervised manner. |
Runjian Chen; Yao Mu; Runsen Xu; Wenqi Shao; Chenhan Jiang; Hang Xu; Yu Qiao; Zhenguo Li; Ping Luo; |
1328 | FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Its performance suffers from the non-vanishing biases introduced by locally inconsistent optima and the severe client drift caused by local over-fitting. In this paper, we propose a novel and practical method, FedSpeed, to alleviate the negative impacts posed by these problems. |
Yan Sun; Li Shen; Tiansheng Huang; Liang Ding; Dacheng Tao; |
1329 | Share Your Representation Only: Guaranteed Improvement of The Privacy-Utility Tradeoff in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider a representation federated learning objective that encourages various parties to collaboratively refine the consensus part of the model, with differential privacy guarantees, while separately allowing sufficient freedom for local personalization (without releasing it). |
Zebang Shen; Jiayuan Ye; Anmin Kang; Hamed Hassani; Reza Shokri; |
1330 | EquiMod: An Equivariance Module to Improve Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a generic equivariance module that structures the learned latent space, in the sense that our module learns to predict the displacement in the embedding space caused by the augmentations. |
Alexandre DEVILLERS; Mathieu Lefort; |
1331 | KwikBucks: Correlation Clustering with Cheap-Weak and Expensive-Strong Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by text clustering, we study correlation clustering where similarities must be queried via an expensive model (e.g. a large language model) with additional help from a cheap but noisy model (e.g. an embedding based model). |
Sandeep Silwal; Sara Ahmadian; Andrew Nystrom; Andrew McCallum; Deepak Ramachandran; Seyed Mehran Kazemi; |
1332 | Few-Shot Domain Adaptation For End-to-End Communication Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We focus on a generative channel model based on the Gaussian mixture density network (MDN), and propose a regularized, parameter-efficient adaptation of the MDN using a set of affine transformations. |
Jayaram Raghuram; Yijing Zeng; Dolores Garcia; Rafael Ruiz; Somesh Jha; Joerg Widmer; Suman Banerjee; |
1333 | Online Bias Correction for Task-Free Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show both theoretically and empirically how experience replay biases the outputs of the model towards recent stream observations. |
Aristotelis Chrysakis; Marie-Francine Moens; |
1334 | Don’t Fear The Unlabelled: Safe Semi-supervised Learning Via Debiasing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a slight modification of most common semi-supervised learning methods to make them safe by debiasing their risk estimate. In particular, we apply it successfully to Fixmatch. |
Hugo Schmutz; Olivier HUMBERT; Pierre-Alexandre Mattei; |
1335 | Learning A Data-Driven Policy Network for Pre-Training Automated Feature Engineering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel AutoFE framework Feature Set Data-Driven Search (FETCH), a pipeline mainly for feature generation and selection. |
Liyao Li; Haobo Wang; Liangyu Zha; Qingyi Huang; Sai Wu; Gang Chen; Junbo Zhao; |
1336 | Actionable Neural Representations: Grid Cells from Minimal Constraints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study a novel definition of an optimal representation of structured spaces, and show that it can be used to derive the brain’s grid cells and their perturbations normatively. |
Will Dorrell; Peter E. Latham; Timothy E. J. Behrens; James C. R. Whittington; |
1337 | A Message Passing Perspective on Learning Dynamics of Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that if we cast a contrastive objective equivalently into the function space, then its learning dynamics admits an interpretable form. |
Yifei Wang; Qi Zhang; Tianqi Du; Jiansheng Yang; Zhouchen Lin; Yisen Wang; |
1338 | Zeroth-Order Optimization with Trajectory-Informed Derivative Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a trajectory-informed derivative estimation method which only uses the optimization trajectory (i.e., the history of function queries during optimization) and hence eliminates the need for additional function queries to estimate a derivative. |
Yao Shu; Zhongxiang Dai; Weicong Sng; Arun Verma; Patrick Jaillet; Bryan Kian Hsiang Low; |
1339 | Neuroevolution Is A Competitive Alternative to Reinforcement Learning for Skill Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate that less widely-used neuroevolution methods, specifically Quality Diversity (QD), are a competitive alternative to information-theory-augmented RL for skill discovery. |
Felix Chalumeau; Raphael Boige; Bryan Lim; Valentin Macé; Maxime Allard; Arthur Flajolet; Antoine Cully; Thomas PIERROT; |
1340 | Uniform-in-time Propagation of Chaos for The Mean Field Gradient Langevin Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we establish a quantitative weak propagation of chaos result for the system, with a finite-particle discretization error of $\mathcal{O}(1/N)$ uniformly over time, where $N$ is the width of the neural network. |
Taiji Suzuki; Atsushi Nitanda; Denny Wu; |
1341 | Individual Privacy Accounting with Gaussian Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This kind of analysis has been carried out for the Rényi differential privacy by Feldman and Zrnic (2021), however not yet for the so-called optimal privacy accountants. We make first steps in this direction by providing a careful analysis using the Gaussian differential privacy which gives optimal bounds for the Gaussian mechanism, one of the most versatile DP mechanisms. |
Antti Koskela; Marlon Tobaben; Antti Honkela; |
1342 | Evolving Populations of Diverse RL Agents with MAP-Elites Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, existing approaches mixing ME with RL tend to be tied to a specific RL algorithm, which effectively prevents their use on problems where the corresponding RL algorithm fails. To address these shortcomings, we introduce a flexible framework that allows the use of any RL algorithm within a population update and alleviates the aforementioned limitations by evolving populations of agents (whose definition include hyperparameters and all learnable parameters) instead of just policies. |
Thomas PIERROT; Arthur Flajolet; |
1343 | RandProx: Primal-Dual Optimization Algorithms with Randomized Proximal Updates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new primal–dual algorithm, in which the dual update is randomized; equivalently, the proximity operator of one of the functions in the problem is replaced by a stochastic oracle. |
Laurent Condat; Peter Richtárik; |
1344 | Fast Nonlinear Vector Quantile Regression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We extend Vector Quantile Regression to support non-linear specification, while ensuring monotonicity and scaling to millions of samples. |
Aviv A. Rosenberg; Sanketh Vedula; Yaniv Romano; Alexander Bronstein; |
1345 | Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Worse, these entities are often unknown and must be inferred from sensory percepts. We present a hierarchical abstraction approach to uncover these underlying entities and achieve combinatorial generalization from unstructured inputs. |
Michael Chang; Alyssa Li Dayan; Franziska Meier; Thomas L. Griffiths; Sergey Levine; Amy Zhang; |
1346 | Transfer NAS with Meta-learned Bayesian Surrogates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is in contrast to the manual design process by researchers and engineers that leverages previous deep learning experience by, e.g., transferring architectures from previously solved, related problems. We propose to adopt this human design strategy and introduce a novel surrogate for NAS that is meta-learned across prior architecture evaluations on different datasets. |
Gresa Shala; Thomas Elsken; Frank Hutter; Josif Grabocka; |
1347 | Scaling Up Probabilistic Circuits By Latent Variable Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This phenomenon suggests that the existing optimizers fail to exploit the full expressive power of large PCs. We propose to overcome this bottleneck by latent variable distillation: we leverage the less tractable but more expressive deep generative models to provide extra supervision over the latent variables of PCs. |
Anji Liu; Honghua Zhang; Guy Van den Broeck; |
1348 | UL2: Unifying Language Learning Paradigms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a unified framework for pre-training models that are universally effective across datasets and setups. |
Yi Tay; Mostafa Dehghani; Vinh Q. Tran; Xavier Garcia; Jason Wei; Xuezhi Wang; Hyung Won Chung; Dara Bahri; Tal Schuster; Steven Zheng; Denny Zhou; Neil Houlsby; Donald Metzler; |
1349 | Bitrate-Constrained DRO: Beyond Worst Case Robustness To Unknown Group Shifts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we address limitations in prior approaches by assuming a more nuanced form of group shift: conditioned on the label, we assume that the true group function is simple. |
Amrith Setlur; Don Dennis; Benjamin Eysenbach; Aditi Raghunathan; Chelsea Finn; Virginia Smith; Sergey Levine; |
1350 | Feature Selection and Low Test Error in Shallow Low-rotation ReLU Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work establishes low test error of gradient flow (GF) and stochastic gradient descent (SGD) on two-layer ReLU networks with standard initialization scale, in three regimes where key sets of weights rotate little (either naturally due to GF and SGD, or due to an artificial constraint), and making use of margins as the core analysis technique. |
Matus Telgarsky; |
1351 | The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the maximum-margin bias of quasi-homogeneous neural networks trained with gradient flow on an exponential loss and past a point of separability. |
Daniel Kunin; Atsushi Yamamura; Chao Ma; Surya Ganguli; |
1352 | Coupled Multiwavelet Operator Learning for Coupled Differential Equations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Towards this end, we propose a coupled multiwavelet neural operator (CMWNO) learning scheme by decoupling the coupled integral kernels during the multiwavelet decomposition and reconstruction procedures in the wavelet space. |
Xiongye Xiao; Defu Cao; Ruochen Yang; Gaurav Gupta; Chenzhong Yin; Gengshuo Liu; Radu Balan; Paul Bogdan; |
1353 | Multi-Objective Online Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a systematic study of multi-objective online learning. |
Jiyan Jiang; Wenpeng Zhang; Shiji Zhou; Lihong Gu; Xiaodong Zeng; Wenwu Zhu; |
1354 | Sparse Distributed Memory Is A Continual Learner Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Building on work using Sparse Distributed Memory (SDM) to connect a core neural circuit with the powerful Transformer model, we create a modified Multi-Layered Perceptron (MLP) that is a strong continual learner. |
Trenton Bricken; Xander Davies; Deepak Singh; Dmitry Krotov; Gabriel Kreiman; |
1355 | UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new sampling method, UniMax, that delivers more uniform coverage of head languages while mitigating overfitting on tail languages by explicitly capping the number of repeats over each language's corpus. |
Hyung Won Chung; Xavier Garcia; Adam Roberts; Yi Tay; Orhan Firat; Sharan Narang; Noah Constant; |
1356 | GNNInterpreter: A Probabilistic Generative Model-Level Explanation for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a model-agnostic model-level explanation method for different GNNs that follow the message passing scheme, GNNInterpreter, to explain the high-level decision-making process of the GNN model. |
Xiaoqi Wang; Han Wei Shen; |
1357 | Learning to Reason Over Visual Objects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the goal of designing AI systems with this capacity, recent work has focused on evaluating whether neural networks can learn to solve RPM-like problems. This work has generally found that strong performance on these problems requires the incorporation of inductive biases that are specific to the RPM problem format, raising the question of whether such models might be more broadly useful. |
Shanka Subhra Mondal; Taylor Whittington Webb; Jonathan Cohen; |
1358 | Contextual Convolutional Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new Convolutional Neural Network, named Contextual Convolutional Network, that capably serves as a general-purpose backbone for visual recognition. |
Shuxian Liang; Xu Shen; Tongliang Liu; Xian-Sheng Hua; |
1359 | Imitating Graph-Based Planning with Goal-Conditioned Policies Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the sample-efficiency of such RL schemes still remains a challenge, particularly for long-horizon tasks. To address this issue, we present a simple yet effective self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy. |
Junsu Kim; Younggyo Seo; Sungsoo Ahn; Kyunghwan Son; Jinwoo Shin; |
1360 | Statistical Inference for Fisher Market Equilibrium Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we focus on the specific case of linear Fisher markets. |
Luofeng Liao; Yuan Gao; Christian Kroer; |
1361 | A Theoretical Study of Inductive Biases in Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide the first theoretical analysis of self-supervised learning that incorporates the effect of inductive biases originating from the model class. |
Jeff Z. HaoChen; Tengyu Ma; |
1362 | Easy Differentially Private Linear Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study an algorithm which uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models. |
Kareem Amin; Matthew Joseph; Mónica Ribero; Sergei Vassilvitskii; |
1363 | From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Conditional Behavior Transformers (C-BeT), a method that combines the multi-modal generation ability of Behavior Transformer with future-conditioned goal specification. |
Zichen Jeff Cui; Yibin Wang; Nur Muhammad Mahi Shafiullah; Lerrel Pinto; |
1364 | A Closer Look at Model Adaptation Using Feature Distortion and Simplicity Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the strong effectiveness of LP+FT, we propose incorporating hardness-promoting perturbations during LP to obtain initializations for FT that further decrease SB. |
Puja Trivedi; Danai Koutra; Jayaraman J. Thiagarajan; |
1365 | Digging Into Backbone Design on Face Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering the intrinsic design property and the vitally important role of the face detection backbone, we thus ask a critical question: How to employ NAS to search for an FD-friendly backbone architecture? To cope with this question, we propose a distribution-dependent stage-aware ranking score (DDSAR-Score) to explicitly characterize the stage-level expressivity and identify the individual importance of each stage, thus satisfying the aforementioned design criterion of the FD backbone. |
Yang Liu; Fei Wang; Lei Shang; Jiankang Deng; Baigui Sun; Xuansong Xie; |
1366 | Understanding and Adopting Rational Behavior By Bellman Score Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we make a key observation that knowing how changes in the underlying rewards affect the optimal behavior allows one to solve a variety of aforementioned problems. |
Kuno Kim; Stefano Ermon; |
1367 | Towards Smooth Video Composition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work investigates how to model the temporal relations for composing a video with arbitrary number of frames, from a few to even infinite, using generative adversarial networks (GANs). |
Qihang Zhang; Ceyuan Yang; Yujun Shen; Yinghao Xu; Bolei Zhou; |
1368 | Can Agents Run Relay Race with Strangers? Generalization of RL to Out-of-Distribution Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we evaluate and improve the generalization performance for reinforcement learning (RL) agents on the set of “controllable” states, where good policies exist in these states to achieve high rewards. |
Li-Cheng Lan; Huan Zhang; Cho-Jui Hsieh; |
1369 | Continuous PDE Dynamics Forecasting with Implicit Neural Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address this problem by introducing a new data-driven approach, DINo, that models a PDE’s flow with continuous-time dynamics of spatially continuous functions. |
Yuan Yin; Matthieu Kirchmeyer; Jean-Yves Franceschi; Alain Rakotomamonjy; patrick gallinari; |
1370 | STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple yet effective framework for few-shot tabular learning, coined Self-generated Tasks from UNlabeled Tables (STUNT). |
Jaehyun Nam; Jihoon Tack; Kyungmin Lee; Hankook Lee; Jinwoo Shin; |
1371 | Language Models Can (kind Of) Reason: A Systematic Formal Analysis of Chain-of-Thought Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enable systematic exploration of the reasoning ability of LLMs, we present a new synthetic question-answering dataset called PrOntoQA, where each example is generated from a synthetic world model represented in first-order logic. |
Abulhair Saparov; He He; |
1372 | Understanding The Covariance Structure of Convolutional Filters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we first observe that such learned filters have highly-structured covariance matrices, and moreover, we find that covariances calculated from small networks may be used to effectively initialize a variety of larger networks of different depths, widths, patch sizes, and kernel sizes, indicating a degree of model-independence to the covariance structure. |
Asher Trockman; Devin Willmott; J Zico Kolter; |
1373 | Masked Distillation with Receptive Tokens Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a learnable embedding dubbed receptive token to locate the pixels of interests (PoIs) in the feature map, with a distillation mask generated via pixel-wise attention. |
Tao Huang; Yuan Zhang; Shan You; Fei Wang; Chen Qian; Jian Cao; Chang Xu; |
1374 | Robust Multivariate Time-Series Forecasting: Adversarial Attacks and Defense Mechanisms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work studies the threats of adversarial attack on multivariate probabilistic forecasting models and viable defense mechanisms. |
Linbo Liu; Youngsuk Park; Trong Nghia Hoang; Hilaf Hasson; Luke Huan; |
1375 | Efficient Deep Reinforcement Learning Requires Regulating Statistical Overfitting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we attempt to understand the primary bottleneck in sample-efficient deep RL by examining several potential hypotheses such as non-stationarity, excessive action distribution shift, and overfitting. |
Qiyang Li; Aviral Kumar; Ilya Kostrikov; Sergey Levine; |
1376 | Offline Reinforcement Learning with Differentiable Function Approximation Is Provably Efficient Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Towards bridging the gap, we take a step by considering offline reinforcement learning with \emph{differentiable function class approximation} (DFA). |
Ming Yin; Mengdi Wang; Yu-Xiang Wang; |
1377 | MECTA: Memory-Economic Continual Test-Time Model Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide a novel solution, dubbed MECTA, to drastically improve the memory efficiency of gradient-based CTA. |
Junyuan Hong; Lingjuan Lyu; Jiayu Zhou; Michael Spranger; |
1378 | Robust Algorithms on Adaptive Inputs from Bounded Adversaries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study dynamic algorithms robust to adaptive inputs generated from sources with bounded capabilities, such as sparsity or limited interaction. |
Yeshwanth Cherapanamjeri; Sandeep Silwal; David Woodruff; Fred Zhang; Qiuyi Zhang; Samson Zhou; |
1379 | Interpretability with Full Complexity By Constraining Feature Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The optimal compression of each feature—at every stage of approximation—allows fine-grained inspection of how feature values are similar or distinct with regards to the prediction. We develop a framework for extracting insight from the spectrum of approximate models and demonstrate its utility on a range of tabular datasets. |
Kieran A Murphy; Danielle Bassett; |
1380 | Chasing All-Round Graph Representation Robustness: Model, Training, and Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify a fundamental issue in graph adversarial learning and then propose a novel method to enlarge the model capacity and enrich the representation diversity of adversarial samples. |
Chunhui Zhang; Yijun Tian; Mingxuan Ju; Zheyuan Liu; Yanfang Ye; Nitesh Chawla; Chuxu Zhang; |
1381 | What Shapes The Loss Landscape of Self Supervised Learning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We derive an analytically tractable theory of SSL landscape and show that it accurately captures an array of collapse phenomena and identifies their causes. |
Liu Ziyin; Ekdeep Singh Lubana; Masahito Ueda; Hidenori Tanaka; |
1382 | No Reason for No Supervision: Improved Generalization in Supervised Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a supervised learning setup that leverages the best of both worlds. |
Mert Bülent Sarıyıldız; Yannis Kalantidis; Karteek Alahari; Diane Larlus; |
1383 | Simple Initialization and Parametrization of Sinusoidal Networks Via Their Kernel Bandwidth Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We perform a theoretical analysis of a simplified sinusoidal network and use this to propose an informed initialization scheme. |
Filipe de Avila Belbute-Peres; J Zico Kolter; |
1384 | Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class. |
Rui Yuan; Simon Shaolei Du; Robert M. Gower; Alessandro Lazaric; Lin Xiao; |
1385 | EVA3D: Compositional 3D Human Generation from 2D Image Collections Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose, EVA3D, an unconditional 3D human generative model learned from 2D image collections only. |
Fangzhou Hong; Zhaoxi Chen; Yushi LAN; Liang Pan; Ziwei Liu; |
1386 | Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on offline RL with linear function approximation and propose two new algorithms, LinPEVI-ADV+ and LinPMVI-ADV+, for single-agent MDPs and two-player zero-sum Markov games (MGs), respectively. |
Wei Xiong; Han Zhong; Chengshuai Shi; Cong Shen; Liwei Wang; Tong Zhang; |
1387 | Quantile Risk Control: A Flexible Framework for Bounding The Probability of High-Loss Predictions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a flexible framework to produce a variety of bounds on quantiles of the loss distribution incurred by a predictor. |
Jake Snell; Thomas P Zollo; Zhun Deng; Toniann Pitassi; Richard Zemel; |
1388 | Git Re-Basin: Merging Models Modulo Permutation Symmetries Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We argue that neural network loss landscapes contain (nearly) a single basin after accounting for all possible permutation symmetries of hidden units a la Entezari et al. 2021. We introduce three algorithms to permute the units of one model to bring them into alignment with a reference model in order to merge the two models in weight space. |
Samuel Ainsworth; Jonathan Hayase; Siddhartha Srinivasa; |
1389 | The Role of Coverage in Online Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Concretely, we show that coverability—that is, existence of a data distribution that satisfies a ubiquitous coverage condition called concentrability—can be viewed as a structural property of the underlying MDP, and can be exploited by standard algorithms for sample-efficient exploration, even when the agent does not know said distribution. |
Tengyang Xie; Dylan J Foster; Yu Bai; Nan Jiang; Sham M. Kakade; |
1390 | Stateful Active Facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We formalize the notions of coordination level and heterogeneity level of an environment and present HECOGrid, a suite of multi-agent RL environments that facilitates empirical evaluation of different MARL approaches across different levels of coordination and environmental heterogeneity by providing a quantitative control over coordination and heterogeneity levels of the environment. |
Dianbo Liu; Vedant Shah; Oussama Boussif; Cristian Meo; Anirudh Goyal; Tianmin Shu; Michael Curtis Mozer; Nicolas Heess; Yoshua Bengio; |
1391 | Learning Achievement Structure for Structured Exploration in Domains with Sparse Reward Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Structured Exploration with Achievements (SEA), a multi-stage reinforcement learning algorithm that learns the environment structure with offline data and uses the learned structure to learn different skills and improve overall exploration with online environment interactions in a particular type of environment that has an internal achievement system. |
Zihan Zhou; Animesh Garg; |
1392 | PINTO: Faithful Language Reasoning Using Prompted-Generated Rationales Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose PINTO, an LM pipeline that rationalizes via prompt-based learning, and learns to faithfully reason over rationales via counterfactual regularization. |
PeiFeng Wang; Aaron Chan; Filip Ilievski; Muhao Chen; Xiang Ren; |
1393 | Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and Its Superiority to Kernel Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While deep learning has outperformed other methods for various tasks, theoretical frameworks that explain its reason have not been fully established. We investigate the excess risk of two-layer ReLU neural networks in a teacher-student regression model, in which a student network learns an unknown teacher network through its outputs. |
Shunta Akiyama; Taiji Suzuki; |
1394 | GEASS: Neural Causal Feature Selection for High-dimensional Biological Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we present GEASS (Granger fEAture Selection of Spatiotemporal data), which identifies sparse Granger causality mechanisms of high dimensional spatiotemporal data by a single neural network. |
Mingze Dong; Yuval Kluger; |
1395 | Conditional Positional Encodings for Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a conditional positional encoding (CPE) scheme for vision Transformers. |
Xiangxiang Chu; Zhi Tian; Bo Zhang; Xinlong Wang; Chunhua Shen; |
1396 | Characterizing Intrinsic Compositionality in Transformers with Tree Projections Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a method to functionally project a transformer into the space of tree structured models and use it to uncover intrinsic compositionality of transformers trained on language data. |
Shikhar Murty; Pratyusha Sharma; Jacob Andreas; Christopher D Manning; |
1397 | Augmentation Component Analysis: Modeling Similarity Via The Augmentation Overlaps Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: So the augmentation feature, composed of the distribution of augmentations, can act as the ideal embedding, and similarity over them reveals how much the augmentations of two samples overlap. Without computational burdens to explicitly estimate its value, we propose Augmentation Component Analysis (ACA) with a contrastive-like loss to learn principal components and an on-the-fly projection loss to embed data. |
Lu Han; Han-Jia Ye; De-Chuan Zhan; |
1398 | ModelAngelo: Automated Model Building for Cryo-EM Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent advances in machine learning applications to protein structure prediction show potential for automating this process. Taking inspiration from these techniques, we have built ModelAngelo for automated model building of proteins in cryo-EM maps. |
Kiarash Jamali; Dari Kimanius; Sjors HW Scheres; |
1399 | Distilling Cognitive Backdoor Within An Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a simple method to distill and detect backdoor patterns within an image: \emph{Cognitive Distillation} (CD). |
Hanxun Huang; Xingjun Ma; Sarah Monazam Erfani; James Bailey; |
1400 | 3D Generation on ImageNet Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, for the first time, we propose a 3D generator which works on non-aligned datasets. |
Ivan Skorokhodov; Aliaksandr Siarohin; Yinghao Xu; Jian Ren; Hsin-Ying Lee; Peter Wonka; Sergey Tulyakov; |
1401 | Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games, and study equilibrium finding algorithms in both the infinite-horizon discounted setting and the finite-horizon episodic setting. |
Shicong Cen; Yuejie Chi; Simon Shaolei Du; Lin Xiao; |
1402 | Rethinking The Expressive Power of GNNs Via Graph Biconnectivity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we take a fundamentally different perspective to study the expressive power of GNNs beyond the WL test. |
Bohang Zhang; Shengjie Luo; Liwei Wang; Di He; |
1403 | Interaction-Based Disentanglement of Entities for Object-Centric World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes STEDIE, a new model that disentangles object representations based on interactions, into interaction-relevant relational features and interaction-irrelevant global features without supervision. |
Akihiro Nakano; Masahiro Suzuki; Yutaka Matsuo; |
1404 | One Transformer Can Understand Both 2D & 3D Molecular Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We believe a general-purpose neural network model for chemistry should be able to handle molecular tasks across data modalities. To achieve this goal, in this work, we develop a novel Transformer-based Molecular model called Transformer-M, which can take molecular data of 2D or 3D formats as input and generate meaningful semantic representations. |
Shengjie Luo; Tianlang Chen; Yixian Xu; Shuxin Zheng; Tie-Yan Liu; Liwei Wang; Di He; |
1405 | Linear Mode Connectivity of Deep Neural Networks Via Permutation Invariance and Renormalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we empirically investigate the conjecture from Entezari et al. (2021) which states that if permutation invariance is taken into account, then there should be no loss barrier to the linear interpolation between SGD solutions. |
Keller Jordan; Hanie Sedghi; Olga Saukh; Rahim Entezari; Behnam Neyshabur; |
1406 | Learning to Compose Soft Prompts for Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce compositional soft prompting (CSP), a parameter-efficient learning technique to improve the zero-shot compositionality of large-scale pretrained vision-language models (VLMs) like CLIP. |
Nihal V. Nayak; Peilin Yu; Stephen Bach; |
1407 | Diffusion-GAN: Training GANs with Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Diffusion-GAN, a novel GAN framework that leverages a forward diffusion chain to generate Gaussian-mixture distributed instance noise. |
Zhendong Wang; Huangjie Zheng; Pengcheng He; Weizhu Chen; Mingyuan Zhou; |
1408 | Unsupervised Learning for Combinatorial Optimization Needs Meta Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Albeit with some advantages over traditional solvers, the current framework optimizes an averaged performance over the distribution of historical problem instances, which misaligns with the actual goal of CO that looks for a good solution to every future encountered instance. With this observation, we propose a new objective of unsupervised learning for CO where the goal of learning is to search good initialization for future problem instances rather than give direct solutions. |
Haoyu Peter Wang; Pan Li; |
1409 | Decepticons: Corrupted Transformers Breach Privacy in Federated Learning for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel attack that reveals private user text by deploying malicious parameter vectors, and which succeeds even with mini-batches, multiple users, and long sequences. |
Liam H Fowl; Jonas Geiping; Steven Reich; Yuxin Wen; Wojciech Czaja; Micah Goldblum; Tom Goldstein; |
1410 | Adaptive Optimization in The $\infty$-Width Limit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We derive the infinite width limits of neural networks trained with adaptive optimizers. |
Etai Littwin; Greg Yang; |
1411 | Sparse Mixture-of-Experts Are Domain Generalizable Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to explore an orthogonal direction, i.e., the design of the backbone architecture. |
Bo Li; Yifei Shen; Jingkang Yang; Yezhen Wang; Jiawei Ren; Tong Che; Jun Zhang; Ziwei Liu; |
1412 | Offline Q-learning on Diverse Multi-Task Data Both Scales And Generalizes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Drawing on the learnings from these works, we re-examine previous design choices and find that with appropriate choices: ResNets, cross-entropy based distributional backups, and feature normalization, offline Q-learning algorithms exhibit strong performance that scales with model capacity. |
Aviral Kumar; Rishabh Agarwal; Xinyang Geng; George Tucker; Sergey Levine; |
1413 | Diffusion-based Image Translation Using Disentangled Style and Content Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, due to the stochastic nature of diffusion models, it is often difficult to maintain the original content of the image during the reverse diffusion. To address this, here we present a novel diffusion-based image translation method using disentangled style and content representation. |
Gihyun Kwon; Jong Chul Ye; |
1414 | Empowering Networks With Scale and Rotation Equivariance Using A Similarity Convolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we devise a method that provides networks with equivariance with respect to translation, rotation, and scaling simultaneously. |
Zikai Sun; Thierry Blu; |
1415 | What Learning Algorithm Is In-context Learning? Investigations with Linear Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove that transformers can implement learning algorithms for linear models based on, e.g., gradient descent, then observe that they closely match the predictors of known algorithms, transitioning between different predictors as transformer depth varies. |
Ekin Akyürek; Jacob Andreas; Dale Schuurmans; Tengyu Ma; Denny Zhou; |
1416 | Enhancing Meta Learning Via Multi-Objective Soft Improvement Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing MOO solvers need to access all the objectives’ gradients in each iteration, and cannot scale to the huge number of tasks in typical meta-learning settings. To alleviate this problem, we propose a scalable gradient-based solver with the use of mini-batches. |
Runsheng Yu; Weiyu Chen; Xinrun Wang; James Kwok; |
1417 | $\mathrm{SE}(3)$-Equivariant Attention Networks for Shape Reconstruction in Function Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method for 3D shape reconstruction from unoriented point clouds. |
Evangelos Chatzipantazis; Stefanos Pertigkiozoglou; Edgar Dobriban; Kostas Daniilidis; |
1418 | Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that ensemble/knowledge distillation in \emph{deep learning} works very differently from traditional learning theory (such as boosting or NTKs). We develop a theory showing that when data has a structure we refer to as “multi-view”, then ensemble of independently trained neural networks can provably improve test accuracy, and such superior test accuracy can also be provably distilled into a single model. |
Zeyuan Allen-Zhu; Yuanzhi Li; |
1419 | How Can GANs Learn Hierarchical Generative Models for Real-World Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we formally study how GANs can efficiently learn certain hierarchically generated distributions that are close to the distribution of real-life images. |
Zeyuan Allen-Zhu; Yuanzhi Li; |
1420 | Spotlight: Mobile UI Understanding Using Vision-Language Models with A Focus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Spotlight, a vision-only approach for mobile UI understanding. |
Gang Li; Yang Li; |
1421 | A Control-Centric Benchmark for Video Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We find empirically that for planning robotic manipulation, existing metrics can be unreliable at predicting execution success. To address this, we propose a benchmark for action-conditioned video prediction in the form of a control benchmark that evaluates a given model for simulated robotic manipulation through sampling-based planning. |
Stephen Tian; Chelsea Finn; Jiajun Wu; |
1422 | Heavy-tailed Noise Does Not Explain The Gap Between SGD and Adam, But Sign Descent Might Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find evidence that stochasticity and heavy-tailed noise are not major factors in the performance gap between SGD and Adam. |
Frederik Kunstner; Jacques Chen; Jonathan Wilder Lavington; Mark Schmidt; |
1423 | Building Normalizing Flows with Stochastic Interpolants Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A simple generative model based on a continuous-time normalizing flow between any pair of base and target distributions is proposed. |
Michael Samuel Albergo; Eric Vanden-Eijnden; |
1424 | Dual Student Networks for Data-Free Model Stealing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Dual Student method where two students are symmetrically trained in order to provide the generator a criterion to generate samples that the two students disagree on. |
James Beetham; Navid Kardan; Ajmal Saeed Mian; Mubarak Shah; |
1425 | Competitive Physics Informed Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This strategy is called physics-informed neural networks (PINNs), but it currently cannot produce high-accuracy solutions, typically attaining about $0.1\%$ relative error. We present an adversarial approach that overcomes this limitation, which we call competitive PINNs (CPINNs). |
Qi Zeng; Yash Kothari; Spencer H Bryngelson; Florian Tobias Schaefer; |
1426 | Energy-Inspired Self-Supervised Pretraining for Vision Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the fact that forward and backward passes of a deep network naturally form symmetric mappings between input and output representations, we introduce a simple yet effective self-supervised vision model pretraining framework inspired by energy-based models (EBMs). |
Ze Wang; Jiang Wang; Zicheng Liu; Qiang Qiu; |
1427 | Effectively Modeling Time Series with Simple Discrete State Spaces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For efficient training and inference, we introduce an algorithm that reduces the memory and compute of a forward pass with the companion matrix. |
Michael Zhang; Khaled Kamal Saab; Michael Poli; Tri Dao; Karan Goel; Christopher Re; |
1428 | Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we introduce two generalized objectives for reward-function learning, inspired from the classical learning-to-rank literature. |
Yihao Feng; Shentao Yang; Shujian Zhang; Jianguo Zhang; Caiming Xiong; Mingyuan Zhou; Huan Wang; |
1429 | Supervision Complexity and Its Role in Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to study the generalization behavior of a distilled student, we propose a new theoretical framework that leverages supervision complexity: a measure of alignment between teacher-provided supervision and the student’s neural tangent kernel. |
Hrayr Harutyunyan; Ankit Singh Rawat; Aditya Krishna Menon; Seungyeon Kim; Sanjiv Kumar; |
1430 | Transferable Unlearnable Examples Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, their unlearnable effects significantly decrease when used in other training settings and datasets. To tackle this issue, we propose a novel unlearnable strategy based on Clustering Separability Discriminant (CSD), which aims to better transfer the unlearnable effects to other training settings and datasets by enhancing the linear separability. |
Jie Ren; Han Xu; Yuxuan Wan; Xingjun Ma; Lichao Sun; Jiliang Tang; |
1431 | Random Laplacian Features for Learning with Hyperbolic Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simpler approach: learn a hyperbolic embedding of the input, then map once from it to Euclidean space using a mapping that encodes geometric priors by respecting the isometries of hyperbolic space, and finish with a standard Euclidean network. |
Tao Yu; Christopher De Sa; |
1432 | Neural Causal Models for Counterfactual Identification and Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the evaluation of counterfactual statements through neural models. |
Kevin Muyuan Xia; Yushu Pan; Elias Bareinboim; |
1433 | SIMPLE: A Gradient Estimator for K-Subset Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we fall back to discrete $k$-subset sampling on the forward pass. |
kareem ahmed; Zhe Zeng; Mathias Niepert; Guy Van den Broeck; |
1434 | Learning Iterative Neural Optimizers for Image Steganography Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we argue that image steganography is inherently performed on the (elusive) manifold of natural images, and propose to train an iterative neural network to perform the optimization steps. |
Varsha Kishore; Xiangyu Chen; Kilian Q Weinberger; |
1435 | Confidence-Conditioned Value Functions for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To do so, in this work, we propose learning value functions that additionally condition on the degree of conservatism, which we dub confidence-conditioned value functions. |
Joey Hong; Aviral Kumar; Sergey Levine; |
1436 | On The Sensitivity of Reward Inference to Misspecified Human Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the impact of assuming wrong human models on reward learning. |
Joey Hong; Kush Bhatia; Anca Dragan; |
1437 | How Much Data Are Augmentations Worth? An Investigation Into Scaling Laws, Invariance, and Implicit Regularization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we disentangle several key mechanisms through which data augmentations operate. |
Jonas Geiping; Micah Goldblum; Gowthami Somepalli; Ravid Shwartz-Ziv; Tom Goldstein; Andrew Gordon Wilson; |
1438 | Fundamental Limits on The Robustness of Image Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove that image classifiers are fundamentally sensitive to small perturbations in their inputs. |
Zheng Dai; David Gifford; |
1439 | Evolve Smoothly, Fit Consistently: Learning Smooth Latent Dynamics For Advection-Dominated Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a data-driven, space-time continuous framework to learn surrogate models for complex physical systems described by advection-dominated partial differential equations. |
Zhong Yi Wan; Leonardo Zepeda-Nunez; Anudhyan Boral; Fei Sha; |
1440 | Understanding Influence Functions and Datamodels Via Harmonic Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper establishes connections between datamodels, influence functions and Fourier coefficients using theoretical tools from harmonic analysis of Boolean functions. |
Nikunj Saunshi; Arushi Gupta; Mark Braverman; Sanjeev Arora; |
1441 | BC-IRL: Learning Generalizable Reward Functions from Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce BC-IRL, a new inverse reinforcement learning method that learns reward functions that generalize better than maximum-entropy IRL approaches. |
Andrew Szot; Amy Zhang; Dhruv Batra; Zsolt Kira; Franziska Meier; |
1442 | TextGrad: Advancing Robustness Evaluation in NLP By Gradient-Driven Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To bridge the gap, we propose TextGrad, a new attack generator using gradient-driven optimization, supporting high-accuracy and high-quality assessment of adversarial robustness in NLP. |
Bairu Hou; Jinghan Jia; Yihua Zhang; Guanhua Zhang; Yang Zhang; Sijia Liu; Shiyu Chang; |
1443 | Information Plane Analysis for Dropout Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we show how the stochasticity induced by dropout layers can be utilized to estimate MI in a theoretically sound manner. |
Linara Adilova; Bernhard C Geiger; Asja Fischer; |
1444 | Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore an alternative update for the actor, based on an extension of the cross entropy method (CEM) to condition on inputs (states). |
Samuel Neumann; Sungsu Lim; Ajin George Joseph; Yangchen Pan; Adam White; Martha White; |
1445 | Characteristic Neural Ordinary Differential Equation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Characteristic-Neural Ordinary Differential Equations (C-NODEs), a framework for extending Neural Ordinary Differential Equations (NODEs) beyond ODEs. |
Xingzi Xu; Ali Hasan; Khalil Elkhalil; Jie Ding; Vahid Tarokh; |
1446 | Fast Sampling of Diffusion Models with Exponential Integrator Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our goal is to develop a fast sampling method for DMs that uses far fewer steps while retaining high sample quality. |
Qinsheng Zhang; Yongxin Chen; |
1447 | GDDIM: Generalized Denoising Diffusion Implicit Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our goal is to extend the denoising diffusion implicit model (DDIM) to general diffusion models~(DMs) besides isotropic diffusions. |
Qinsheng Zhang; Molei Tao; Yongxin Chen; |
1448 | Panning for Gold in Federated Learning: Targeted Text Extraction Under Arbitrarily Large-Scale Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the first attack on FL that achieves targeted extraction of sequences that contain privacy-critical phrases, whereby we employ maliciously modified parameters to allow the transformer itself to filter relevant sequences from aggregated user data and encode them in the gradient update. |
Hong-Min Chu; Jonas Geiping; Liam H Fowl; Micah Goldblum; Tom Goldstein; |
1449 | Artificial Neuronal Ensembles with Learned Context Dependent Gating Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Learned Context Dependent Gating (LXDG), a method to flexibly allocate and recall 'artificial neuronal ensembles', using a particular network structure and a new set of regularization terms. |
Matthew James Tilley; Michelle Miller; David Freedman; |
1450 | Learning Language Representations with Logical Inductive Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we seek to go further and explore a new logical inductive bias for better language representation learning. |
Jianshu Chen; |
1451 | How Does Self-supervised Learning Work? A Representation Learning Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to theoretically understand a special kind of SSL approaches based on pre-training and fine-tuning. |
Yiwen Kou; Zixiang Chen; Yuan Cao; Quanquan Gu; |
1452 | Provable Robustness Against Wasserstein Distribution Shifts Via Input Randomization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution. |
Aounon Kumar; Alexander Levine; Tom Goldstein; Soheil Feizi; |
1453 | Denoising Diffusion Samplers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explore here a similar idea to sample approximately from unnormalized probability density functions and estimate their normalizing constants. |
Francisco Vargas; Will Sussman Grathwohl; Arnaud Doucet; |
1454 | How I Learned to Stop Worrying and Love Retraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Going a step further, we propose similarly imposing a budget on the initial dense training phase and show that the resulting simple and efficient method is capable of outperforming significantly more complex or heavily parameterized state-of-the-art approaches that attempt to sparsify the network during training. |
Max Zimmer; Christoph Spiegel; Sebastian Pokutta; |
1455 | GOGGLE: Generative Modelling for Tabular Data By Learning Relational Structure Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we learn and exploit relational structure underlying tabular data to better model variable dependence, and as a natural means to introduce regularization on relationships and include prior knowledge. |
Tennison Liu; Zhaozhi Qian; Jeroen Berrevoets; Mihaela van der Schaar; |
1456 | Progressive Prompts: Continual Learning for Language Models Without Forgetting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Progressive Prompts – a simple and efficient approach for continual learning in language models. |
Anastasia Razdaibiedina; Yuning Mao; Rui Hou; Madian Khabsa; Mike Lewis; Amjad Almahairi; |
1457 | Deep Learning From Crowdsourced Labels: Coupled Cross-Entropy Minimization, Identifiability, and Regularization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our analysis reveals for the first time that the CCEM can indeed correctly identify the annotators’ confusion characteristics and the desired “ground-truth” neural classifier under realistic conditions, e.g., when only incomplete annotator labeling and finite samples are available. |
Shahana Ibrahim; Tri Nguyen; Xiao Fu; |
1458 | Projective Proximal Gradient Descent for Nonconvex Nonsmooth Optimization: Fast Convergence Without Kurdyka-Lojasiewicz (KL) Property Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Projected Proximal Gradient Descent (PPGD), which solves a class of nonconvex and nonsmooth optimization problems, where the nonconvexity and nonsmoothness come from a nonsmooth regularization term which is nonconvex but piecewise convex. |
Yingzhen Yang; Ping Li; |
1459 | First Steps Toward Understanding The Extrapolation of Nonlinear Models to Unseen Domains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove that the family of nonlinear models of the form $f(x)=\sum f_i(x_i)$, where $f_i$ is an \emph{arbitrary} function on the subset of features $x_i$, can extrapolate to unseen distributions, if the covariance of the features is well-conditioned. |
Kefan Dong; Tengyu Ma; |
1460 | Variable Compositionality Reliably Emerges in Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a variation-based framework for interpreting the mappings produced by neural networks in emergent communication games and find that they reliably exhibit straightforward compositional structure, with a degree of natural language-like variation that obscures their compositionality under measures used in previous work. |
Henry Conklin; Kenny Smith; |
1461 | Multiple Sequence Alignment As A Sequence-to-sequence Learning Problem Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we introduce BetaAlign, a novel methodology for aligning sequences using a natural language processing (NLP) approach. |
Edo Dotan; Yonatan Belinkov; Oren Avram; Elya Wygoda; Noa Ecker; Michael Alburquerque; Omri Keren; Gil Loewenthal; Tal Pupko; |
1462 | A Mixture-of-Expert Approach to RL-based Dialogue Management Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We use reinforcement learning (RL) to develop a dialogue agent that avoids being short-sighted (outputting generic utterances) and maximizes overall user satisfaction. |
Yinlam Chow; Azamat Tulepbergenov; Ofir Nachum; Dhawal Gupta; Moonkyung Ryu; Mohammad Ghavamzadeh; Craig Boutilier; |
1463 | F-DM: A Multi-stage Diffusion Model Via Progressive Signal Transformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose f-DM, a generalized family of DMs which allows progressive signal transformation. |
Jiatao Gu; Shuangfei Zhai; Yizhe Zhang; Miguel Ángel Bautista; Joshua M. Susskind; |
1464 | Progressive Mix-Up for Few-Shot Supervised Multi-Source Domain Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The multi-source setting further hinders the transfer task, as excessive domain gaps are introduced by the source domains. To tackle this problem, we propose a progressive mix-up (P-Mixup) mechanism that introduces an intermediate mix-up domain, pushing both the source domains and the few-shot target domain to align with this mix-up domain. |
Ronghang Zhu; Ronghang Zhu; Xiang Yu; Sheng Li; |
1465 | Efficient Approximation of Neural Population Structure and Correlations with Probabilistic Circuits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a computationally efficient framework to model a wide range of population structures with high order correlations and a large number of neurons. |
Koosha Khalvati; Samantha Johnson; Stefan Mihalas; Michael A Buice; |
1466 | Exploring Perceptual Straightness in Learned Visual Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the relationship between network architecture, robustness, biologically-inspired filtering mechanisms, and representational straightness in response to time-varying input; we identify curvature as a useful way of evaluating neural network representations. |
Anne Harrington; Vasha DuTell; Ayush Tewari; Mark Hamilton; Simon Stent; Ruth Rosenholtz; William T. Freeman; |
1467 | Is Forgetting Less A Good Inductive Bias for Forward Transfer? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that the measure of forward transfer to a task should not be affected by the restricted updates on the task by the continual learner to preserve previous tasks. |
Jiefeng Chen; Arslan Chaudhry; Timothy Nguyen; Dilan Gorur; |
1468 | Learning Structured Representations By Embedding Class Hierarchy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to learn structured representations that preserve the hierarchy between label classes by using CPCC as a regularizer. |
Siqi Zeng; Remi Tachet des Combes; Han Zhao; |
1469 | TypeT5: Seq2seq Type Inference Using Static Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a new type inference method that treats type prediction as a code completion task by leveraging CodeT5, a state-of-the-art seq2seq pre-trained language model for code. |
Jiayi Wei; Greg Durrett; Isil Dillig; |
1470 | Dichotomy of Control: Separating What You Can Control from What You Cannot Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the dichotomy of control (DoC), a future-conditioned supervised learning framework that separates mechanisms within a policy’s control (actions) from those outside of a policy’s control (environment stochasticity). |
Sherry Yang; Dale Schuurmans; Pieter Abbeel; Ofir Nachum; |
1471 | Revisiting Curiosity for Exploration in Procedurally Generated Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider lifelong and episodic curiosities used in prior works, and compare the performance of all lifelong-episodic combinations on the commonly used MiniGrid benchmark. |
Kaixin Wang; Kuangqi Zhou; Bingyi Kang; Jiashi Feng; Shuicheng YAN; |
1472 | Can Neural Networks Learn Implicit Logic from Physical Reasoning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We test the hypothesis that neural networks without inherent inductive biases for logical reasoning can acquire an implicit representation of negation and disjunction. |
Aaron Traylor; Roman Feiman; Ellie Pavlick; |
1473 | ESCHER: Eschewing Importance Sampling in Games By Computing A History Value Function to Estimate Regret Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose an unbiased model-free method that does not require any importance sampling. |
Stephen Marcus McAleer; Gabriele Farina; Marc Lanctot; Tuomas Sandholm; |
1474 | Serving Graph Compression for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study graph compression to reduce the storage requirement for GNN in serving. |
Si Si; Felix Yu; Ankit Singh Rawat; Cho-Jui Hsieh; Sanjiv Kumar; |
1475 | On The Specialization of Neural Modules Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end we introduce a minimal space of datasets motivated by practical systematic generalization benchmarks. |
Devon Jarvis; Richard Klein; Benjamin Rosman; Andrew M Saxe; |
1476 | HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such a large prediction discrepancy often diminishes the benefits of knowledge distillation. To address this challenge, we propose Homotopic Distillation (HomoDistil), a novel task-agnostic distillation approach equipped with iterative pruning. |
Chen Liang; Haoming Jiang; Zheng Li; Xianfeng Tang; Bing Yin; Tuo Zhao; |
1477 | FIGARO: Controllable Music Generation Using Learned and Expert Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we release FIGARO, a Transformer-based conditional model trained to generate symbolic music based on a sequence of high-level control codes. |
Dimitri von Rütte; Luca Biggio; Yannic Kilcher; Thomas Hofmann; |
1478 | Language Models Are Multilingual Chain-of-thought Reasoners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Multilingual Grade School Math (MGSM) benchmark, by manually translating 250 grade-school math problems from the GSM8K dataset (Cobbe et al., 2021) into ten typologically diverse languages. |
Freda Shi; Mirac Suzgun; Markus Freitag; Xuezhi Wang; Suraj Srivats; Soroush Vosoughi; Hyung Won Chung; Yi Tay; Sebastian Ruder; Denny Zhou; Dipanjan Das; Jason Wei; |
1479 | Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We tackle the problem of aligning pre-trained large language models (LMs) with human preferences. |
Rajkumar Ramamurthy; Prithviraj Ammanabrolu; Kianté Brantley; Jack Hessel; Rafet Sifa; Christian Bauckhage; Hannaneh Hajishirzi; Yejin Choi; |
1480 | Learning Multi-scale Local Conditional Probability Models of Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: But these models are implicit, and the means by which these networks capture complex global statistical structure, apparently without suffering from the curse of dimensionality, remain a mystery. To study this, we generalize a multi-scale model class motivated by the renormalization group of theoretical physics. |
Zahra Kadkhodaie; Florentin Guth; Stéphane Mallat; Eero P Simoncelli; |
1481 | Subsampling in Large Graphs Using Ricci Curvature Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on asymptotic results about the Ollivier-Ricci (OR) curvature of within-community and between-community edges, we propose the Ollivier-Ricci curvature Gradient-based subsampling (ORG-sub) algorithm. |
Shushan Wu; Huimin Cheng; Jiazhang Cai; Ping Ma; Wenxuan Zhong; |
1482 | Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Visual Token Matching (VTM), a universal few-shot learner for arbitrary dense prediction tasks. |
Donggyun Kim; Jinwoo Kim; Seongwoong Cho; Chong Luo; Seunghoon Hong; |
1483 | Scaling Up and Stabilizing Differentiable Planning with Implicit Differentiation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, an issue prevents these methods from scaling up to larger problems: they must differentiate through forward iteration layers to compute gradients, which couples forward computation with backpropagation and forces a trade-off between forward planner performance and the computational cost of the backward pass. To alleviate this issue, we propose to differentiate through the Bellman fixed-point equation to decouple the forward and backward passes for the Value Iteration Network and its variants, which enables constant backward cost (in the planning horizon) and a flexible forward budget, and helps scale up to large tasks. |
Linfeng Zhao; Huazhe Xu; Lawson L.S. Wong; |
1484 | Score-based Continuous-time Discrete Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend diffusion models to discrete variables by introducing a stochastic jump process where the reverse process denoises via a continuous-time Markov chain. |
Haoran Sun; Lijun Yu; Bo Dai; Dale Schuurmans; Hanjun Dai; |
1485 | Is Model Ensemble Necessary? Model-based RL Via A Single Model with Lipschitz Regularized Value Function Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide both practical and theoretical insights on the empirical success of the probabilistic dynamics model ensemble through the lens of Lipschitz continuity. |
Ruijie Zheng; Xiyao Wang; Huazhe Xu; Furong Huang; |
1486 | Disentangling with Biological Constraints: A Theory of Functional Cell Types Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, such disentangled representations are highly sought after in machine learning. Here we mathematically prove that simple biological constraints on neurons, namely nonnegativity and energy efficiency in both activity and weights, promote such sought after disentangled representations by enforcing neurons to become selective for single factors of task variation. |
James C. R. Whittington; Will Dorrell; Surya Ganguli; Timothy Behrens; |
1487 | Learning Rigid Dynamics with Face Interaction Graph Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we introduce the “Face Interaction Graph Network” (FIGNet) which extends beyond GNN-based methods, and computes interactions between mesh faces, rather than nodes. |
Kelsey R Allen; Yulia Rubanova; Tatiana Lopez-Guevara; William F Whitney; Alvaro Sanchez-Gonzalez; Peter Battaglia; Tobias Pfaff; |
1488 | Images As Weight Matrices: Sequential Image Generation Through Synaptic Learning Rules Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We train neural nets to execute sequences of synaptic learning rules to sequentially generate natural images (instead of weight matrices). |
Kazuki Irie; Jürgen Schmidhuber; |
1489 | Why (and When) Does Local SGD Generalize Better Than SGD? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The main contributions of this paper include (i) the derivation of an SDE that captures the long-term behavior of Local SGD with a small learning rate, after approaching the manifold of minima, (ii) a comparison between the SDEs of Local SGD and SGD, showing that Local SGD induces a stronger drift term that can result in a stronger effect of regularization, e.g., a faster reduction of sharpness, and (iii) empirical evidence validating that having small learning rate and long enough training time enables the generalization improvement over SGD but removing either of the two conditions leads to no improvement. |
Xinran Gu; Kaifeng Lyu; Longbo Huang; Sanjeev Arora; |
1490 | Depth Separation with Multilayer Mean-Field Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous results often focus on representation power, for example, Safran et al. (2019) constructed a function that is easy to approximate using a 3-layer network but not approximable by any 2-layer network. In this paper, we show that this separation is in fact algorithmic: one can learn the function constructed by Safran et al. (2019) using an overparametrized network with polynomially many neurons efficiently. |
Yunwei Ren; Mo Zhou; Rong Ge; |
1491 | Analogical Networks for Memory-Modulated 3D Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Analogical Networks, a model that casts fine-grained visual parsing into analogical inference: instead of mapping input scenes to part labels, which is hard to adapt in a few-shot manner to novel inputs, our model retrieves related scenes from memory and their corresponding part structures, and predicts analogous part structures in the input scene, via an end-to-end learnable modulation mechanism. |
Nikolaos Gkanatsios; Mayank Singh; Zhaoyuan Fang; Shubham Tulsiani; Katerina Fragkiadaki; |
1492 | Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide sharp path-dependent generalization and excess risk guarantees for the full-batch Gradient Descent (GD) algorithm on smooth losses (possibly non-Lipschitz, possibly nonconvex). |
Konstantinos Nikolakakis; Farzin Haddadpour; Amin Karbasi; Dionysios Kalogerias; |
1493 | GNNDelete: A General Unlearning Strategy for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of graph unlearning, wherein a graph neural network (GNN) model is trained to a specified accuracy and then deployed while a sequence of requests arrives to delete graph elements (nodes, edges) from the model. |
Jiali Cheng; George Dasoulas; Huan He; Chirag Agarwal; Marinka Zitnik; |
1494 | ReAct: Synergizing Reasoning and Acting in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. |
Shunyu Yao; Jeffrey Zhao; Dian Yu; Nan Du; Izhak Shafran; Karthik R Narasimhan; Yuan Cao; |
1495 | Towards Convergence to Nash Equilibria in Two-team Zero-sum Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On a brighter note, we propose a first-order method that leverages control theory techniques and under some conditions enjoys last-iterate local convergence to a Nash equilibrium. |
Fivos Kalogiannis; Ioannis Panageas; Emmanouil-Vasileios Vlatakis-Gkaragkounis; |
1496 | DensePure: Understanding Diffusion Models Towards Adversarial Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the theoretical understanding of why diffusion models are able to improve the certified robustness is still lacking, preventing from further improvement. In this study, we close this gap by analyzing the fundamental properties of diffusion models and establishing the conditions under which they can enhance certified robustness. |
Zhongzhu Chen; Kun Jin; Chaowei Xiao; Jiongxiao Wang; Weili Nie; Mingyan Liu; Anima Anandkumar; Bo Li; Dawn Song; |
1497 | Where to Diffuse, How to Diffuse and How to Get Back: Learning in Multivariate Diffusions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study linear Multivariate Diffusion Models (MDMs). |
Raghav Singhal; Mark Goldstein; Rajesh Ranganath; |
1498 | Spatio-temporal Point Processes with Deep Non-stationary Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper develops a deep non-stationary influence kernel for spatio-temporal point processes with a novel parameterization that enables us to well approximate complicated kernels in a low-rank form. |
Zheng Dong; Xiuyuan Cheng; Yao Xie; |
1499 | Scalable Batch-Mode Deep Bayesian Active Learning Via Equivalence Class Annealing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Batch-BALanCe, a scalable batch-mode active learning algorithm, which combines insights from decision-theoretic active learning, combinatorial information measure, and diversity sampling. |
Renyu Zhang; Aly A Khan; Robert L. Grossman; Yuxin Chen; |
1500 | Explicitly Minimizing The Blur Error of Variational Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we propose a new formulation of the reconstruction term for the VAE that specifically penalizes the generation of blurry images while at the same time still maximizing the ELBO under the modeled distribution. |
Gustav Bredell; Kyriakos Flouris; Krishna Chaitanya; Ertunc Erdil; Ender Konukoglu; |
1501 | Is Conditional Generative Modeling All You Need for Decision Making? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent improvements in conditional generative modeling have made it possible to generate high-quality images from language descriptions alone. We investigate whether these methods can directly address the problem of sequential decision-making. |
Anurag Ajay; Yilun Du; Abhi Gupta; Joshua B. Tenenbaum; Tommi S. Jaakkola; Pulkit Agrawal; |
1502 | TEMPERA: Test-Time Prompt Editing Via Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Test-time Prompt Editing using Reinforcement learning (TEMPERA). |
Tianjun Zhang; Xuezhi Wang; Denny Zhou; Dale Schuurmans; Joseph E. Gonzalez; |
1503 | Evaluating Representations with Readout Model Switching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we treat the evaluation of representations as a model selection problem and propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric. |
Yazhe Li; Jorg Bornschein; Marcus Hutter; |
1504 | Pseudoinverse-Guided Diffusion Models for Inverse Problems Highlight: Models trained for specific inverse problems work well but are limited to their particular use cases, whereas methods that use problem-agnostic models are general but often perform worse empirically. To address this dilemma, we introduce Pseudoinverse-guided Diffusion Models ($\Pi$GDM), an approach that uses problem-agnostic models to close the gap in performance. |
Jiaming Song; Arash Vahdat; Morteza Mardani; Jan Kautz; |
1505 | Planning with Language Models Through Iterative Energy Minimization Highlight: The typical autoregressive generation procedures of language models preclude sequential refinement of earlier steps, which limits the effectiveness of a predicted plan. In this paper, we suggest an approach towards integrating planning with language models based on the idea of iterative energy minimization, and illustrate how such a procedure leads to improved RL performance across different tasks. |
Hongyi Chen; Yilun Du; Yiye Chen; Patricio A. Vela; Joshua B. Tenenbaum; |
1506 | Last Layer Re-Training Is Sufficient for Robustness to Spurious Correlations Highlight: We propose a simple method based on retraining the last layer of a neural network which achieves strong results on spurious correlation benchmarks. |
Polina Kirichenko; Pavel Izmailov; Andrew Gordon Wilson; |
1507 | Don’t Forget The Nullspace! Nullspace Occupancy As A Mechanism for Out of Distribution Failure Highlight: In this work, we identify a particular failure mode of OoD generalization for discriminative classifiers that is based on test data (from a new domain) lying in the nullspace of features learnt from source data. |
Daksh Idnani; Vivek Madan; Naman Goyal; David J. Schwab; Shanmukha Ramakrishna Vedantam; |
1508 | ContraNorm: A Contrastive Learning Perspective on Oversmoothing and Beyond Highlight: Accordingly, inspired by the power of contrastive learning in preventing dimensional collapse, we propose a novel normalization layer ContraNorm. |
Xiaojun Guo; Yifei Wang; Tianqi Du; Yisen Wang; |
1509 | Accelerated Single-Call Methods for Constrained Min-Max Optimization Highlight: We study first-order methods for constrained min-max optimization. |
Yang Cai; Weiqiang Zheng; |
1510 | Performance Bounds for Model and Policy Transfer in Hidden-parameter MDPs Highlight: We first show that the value function of HiP-MDPs is Lipschitz continuous under certain conditions. We then derive regret bounds for both settings through the lens of Lipschitz continuity. |
Haotian Fu; Jiayu Yao; Omer Gottesman; Finale Doshi-Velez; George Konidaris; |
1511 | The Lie Derivative for Measuring Learned Equivariance Highlight: In order to better understand the role of equivariance in recent vision models, we introduce the Lie derivative, a method for measuring equivariance with strong mathematical foundations and minimal hyperparameters. |
Nate Gruver; Marc Anton Finzi; Micah Goldblum; Andrew Gordon Wilson; |
1512 | Agree to Disagree: Diversity Through Disagreement for Better Transferability Highlight: Instead, we advocate for learning an ensemble of models which capture a diverse set of predictive features. Towards this, we propose a new algorithm D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data, but disagreement on the OOD data. |
Matteo Pagliardini; Martin Jaggi; François Fleuret; Sai Praneeth Karimireddy; |
1513 | Taking A Step Back with KCal: Multi-Class Kernel-Based Calibration for Deep Neural Networks Highlight: This paper proposes a new Kernel-based calibration method called KCal. |
Zhen Lin; Shubhendu Trivedi; Jimeng Sun; |
1514 | SemPPL: Predicting Pseudo-Labels for Better Contrastive Representations Highlight: We propose a new semi-supervised learning method, Semantic Positives via Pseudo-Labels (SEMPPL), that combines labelled and unlabelled data to learn informative representations. |
Matko Bošnjak; Pierre Harvey Richemond; Nenad Tomasev; Florian Strub; Jacob C Walker; Felix Hill; Lars Holger Buesing; Razvan Pascanu; Charles Blundell; Jovana Mitrovic; |
1515 | Transfer Learning with Deep Tabular Models Highlight: In this work, we explore the benefits that representation learning provides for knowledge transfer in the tabular domain. |
Roman Levin; Valeriia Cherepanova; Avi Schwarzschild; Arpit Bansal; C. Bayan Bruss; Tom Goldstein; Andrew Gordon Wilson; Micah Goldblum; |
1516 | Understanding Train-Validation Split in Meta-Learning with Neural Networks Highlight: In this paper, we study the benefit of train-validation split for classification problems with neural network models trained by gradient descent. |
Xinzhe Zuo; Zixiang Chen; Huaxiu Yao; Yuan Cao; Quanquan Gu; |
1517 | Revisiting Robustness in Graph Machine Learning Highlight: However, because manual inspection of a graph is difficult, it is unclear if the studied perturbations always preserve a core assumption of adversarial examples: that of unchanged semantic content. To address this problem, we introduce a more principled notion of an adversarial graph, which is aware of semantic content change. |
Lukas Gosch; Daniel Sturm; Simon Geisler; Stephan Günnemann; |
1518 | Few-shot Backdoor Attacks Via Neural Tangent Kernels Highlight: Central to these attacks is the trade-off between the success rate of the attack and the number of corrupted training examples injected. We pose this attack as a novel bilevel optimization problem: construct strong poison examples that maximize the attack success rate of the trained model. |
Jonathan Hayase; Sewoong Oh; |
1519 | Hyperparameter Optimization Through Neural Network Partitioning Highlight: In this work, we propose a simple and efficient way for optimizing hyperparameters inspired by the marginal likelihood, an optimization objective that requires no validation data. |
Bruno Kacper Mlodozeniec; Matthias Reisser; Christos Louizos; |
1520 | Symmetries, Flat Minima and The Conserved Quantities of Gradient Flow Highlight: We present a general framework for finding continuous symmetries in the parameter space, which give rise to low-loss valleys. Importantly, we introduce a novel set of nonlinear, data-dependent symmetries for neural networks. |
Bo Zhao; Iordan Ganev; Robin Walters; Rose Yu; Nima Dehmamy; |
1521 | Hebbian Deep Learning Without Feedback Highlight: Here, grounded on recent theory for Hebbian learning in soft winner-take-all networks, we present multilayer SoftHebb, i.e. an algorithm that trains deep neural networks, without any feedback, target, or error signals. |
Adrien Journé; Hector Garcia Rodriguez; Qinghai Guo; Timoleon Moraitis; |
1522 | Making Better Decision By Directly Planning in Continuous Control Highlight: We propose a novel POMP algorithm with a D3P planner module to achieve efficient planning in continuous action-space control problems. |
Jinhua Zhu; Yue Wang; Lijun Wu; Tao Qin; Wengang Zhou; Tie-Yan Liu; Houqiang Li; |
1523 | (Certified!!) Adversarial Robustness for Free! Highlight: In this paper we show how to achieve state-of-the-art certified adversarial robustness to 2-norm bounded perturbations by relying exclusively on off-the-shelf pretrained models. |
Nicholas Carlini; J Zico Kolter; Florian Tramer; Krishnamurthy Dj Dvijotham; Leslie Rice; Mingjie Sun; |
1524 | MMVAE+: Enhancing The Generative Quality of Multimodal VAEs Without Compromises Highlight: In particular, mixture-based models achieve good coherence only at the expense of sample diversity and a resulting lack of generative quality. We present a novel variant of the mixture-of-experts multimodal variational autoencoder that improves its generative quality, while maintaining high semantic coherence. |
Emanuele Palumbo; Imant Daunhawer; Julia E Vogt; |
1525 | Scaling Laws For Deep Learning Based Image Reconstruction Highlight: In this work, we study whether major performance gains are expected from scaling up the training set size. |
Tobit Klug; Reinhard Heckel; |
1526 | Canary in A Coalmine: Better Membership Inference with Ensembled Adversarial Queries Highlight: In this work, we use adversarial tools to directly optimize for queries that are discriminative and diverse. |
Yuxin Wen; Arpit Bansal; Hamid Kazemi; Eitan Borgnia; Micah Goldblum; Jonas Geiping; Tom Goldstein; |
1527 | Revisiting Populations in Multi-agent Communication Highlight: In this paper, we advocate for an alternate population-level training paradigm for referential games based on the idea of partitioning the agents into sender-receiver pairs and limiting co-adaptation across pairs. |
Paul Michel; Mathieu Rita; Kory Wallace Mathewson; Olivier Tieleman; Angeliki Lazaridou; |
1528 | Sequential Gradient Coding For Straggler Mitigation Highlight: In this paper, we consider the distributed computation of a sequence of gradients $\{g(1),g(2),\ldots,g(J)\}$, where processing of each gradient $g(t)$ starts in round-$t$ and finishes by round-$(t+T)$. |
Nikhil Krishnan Muralee Krishnan; MohammadReza Ebrahimi; Ashish J Khisti; |
1529 | Learning MLPs on Graphs: A Unified View of Effectiveness, Robustness, and Efficiency Highlight: In this paper, we ascribe the lack of effectiveness and robustness to three significant challenges: 1) the misalignment between content feature and label spaces, 2) the strict hard matching to teacher’s output, and 3) the sensitivity to node feature noises. |
Yijun Tian; Chuxu Zhang; Zhichun Guo; Xiangliang Zhang; Nitesh Chawla; |
1530 | Generating Sequences By Learning to Self-Correct Highlight: We present Self-Correction, an approach that decouples an imperfect base generator (an off-the-shelf language model or supervised sequence-to-sequence model) from a separate corrector that learns to iteratively correct imperfect generations. |
Sean Welleck; Ximing Lu; Peter West; Faeze Brahman; Tianxiao Shen; Daniel Khashabi; Yejin Choi; |
1531 | Learning About Progress From Experts Highlight: In this work, we explore the use of expert demonstrations in long-horizon tasks to learn a monotonically increasing function that summarizes progress. |
Jake Bruce; Ankit Anand; Bogdan Mazoure; Rob Fergus; |
1532 | Learning Fair Graph Representations Via Automated Data Augmentations Highlight: In this work, we propose a method, known as Graphair, to learn fair representations based on automated graph data augmentations. |
Hongyi Ling; Zhimeng Jiang; Youzhi Luo; Shuiwang Ji; Na Zou; |
1533 | Emergence of Maps in The Memories of Blind Navigation Agents Highlight: Overall, this paper presents no new techniques for the AI audience, but a surprising finding, an insight, and an explanation. |
Erik Wijmans; Manolis Savva; Irfan Essa; Stefan Lee; Ari S. Morcos; Dhruv Batra; |
1534 | Latent Neural ODEs with Sparse Bayesian Multiple Shooting Highlight: We propose a principled multiple shooting technique for neural ODEs that splits the trajectories into manageable short segments, which are optimized in parallel, while ensuring probabilistic control on continuity over consecutive segments. |
Valerii Iakovlev; Cagatay Yildiz; Markus Heinonen; Harri Lähdesmäki; |
1535 | $\mathcal{O}$-GNN: Incorporating Ring Priors Into Molecular Modeling Highlight: In this work, we design a new variant of GNN, ring-enhanced GNN ($\mathcal{O}$-GNN), that explicitly models rings in addition to atoms and bonds in compounds. |
Jinhua Zhu; Kehan Wu; Bohan Wang; Yingce Xia; Shufang Xie; Qi Meng; Lijun Wu; Tao Qin; Wengang Zhou; Houqiang Li; Tie-Yan Liu; |
1536 | Transformers Learn Shortcuts to Automata Highlight: This raises the question: what solutions are these shallow and non-recurrent models finding? We investigate this question in the setting of learning automata, discrete dynamical systems naturally suited to recurrent modeling and expressing algorithmic tasks. |
Bingbin Liu; Jordan T. Ash; Surbhi Goel; Akshay Krishnamurthy; Cyril Zhang; |
1537 | Obtaining More Generalizable Fair Classifiers on Imbalanced Datasets Highlight: In this paper, we propose a theoretically principled, yet flexible approach that encourages both classification and fairness generalization and can be flexibly combined with many existing fair learning methods with logits-based losses. |
Zhun Deng; Jiayao Zhang; Linjun Zhang; Ting Ye; Yates Coley; Weijie J Su; James Zou; |
1538 | Understanding The Robustness of Self-supervised Learning Through Topic Modeling Highlight: In this paper, we focus on the context of topic modeling and highlight a key advantage of self-supervised learning – when applied to data generated by topic models, self-supervised learning can be oblivious to the specific model, and hence is less susceptible to model misspecification. |
Zeping Luo; Shiyou Wu; Cindy Weng; Mo Zhou; Rong Ge; |
1539 | Provably Efficient Neural Offline Reinforcement Learning Via Perturbed Rewards Highlight: We propose a novel offline reinforcement learning (RL) algorithm, namely PEturbed-Reward Value Iteration (PERVI) which amalgamates the randomized value function idea with the pessimism principle. |
Thanh Nguyen-Tang; Raman Arora; |
1540 | Neuromechanical Autoencoders: Learning to Couple Elastic and Neural Network Nonlinearity Highlight: Just as deep neural networks provide flexible and massively-parametric function approximators for perceptual and control tasks, cellular solid metamaterials are promising as a rich and learnable space for approximating a variety of actuation tasks. In this work we take advantage of these complementary computational concepts to co-design materials and neural network controls to achieve nonintuitive mechanical behavior. |
Deniz Oktay; Mehran Mirramezani; Eder Medina; Ryan P Adams; |
1541 | Towards Universal Visual Reward and Representation Via Value-Implicit Pre-Training Highlight: We introduce $\textbf{V}$alue-$\textbf{I}$mplicit $\textbf{P}$re-training (VIP), a self-supervised pre-trained visual representation capable of generating dense and smooth reward functions for unseen robotic tasks. |
Yecheng Jason Ma; Shagun Sodhani; Dinesh Jayaraman; Osbert Bastani; Vikash Kumar; Amy Zhang; |
1542 | Bridging The Gap to Real-World Object-Centric Learning Highlight: However, current methods are restricted to simulated data or require additional information in the form of motion or depth in order to successfully discover objects. In this work, we overcome this limitation by showing that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way. |
Maximilian Seitzer; Max Horn; Andrii Zadaianchuk; Dominik Zietlow; Tianjun Xiao; Carl-Johann Simon-Gabriel; Tong He; Zheng Zhang; Bernhard Schölkopf; Thomas Brox; Francesco Locatello; |
1543 | Towards A Unified Theoretical Understanding of Non-contrastive Learning Via Rank Differential Mechanism Highlight: In this work, we propose a new understanding of non-contrastive learning, named the Rank Differential Mechanism (RDM). |
Zhijian Zhuo; Yifei Wang; Jinwen Ma; Yisen Wang; |
1544 | Stay Moral and Explore: Learn to Behave Morally in Text-based Games Highlight: In this paper, we propose a general framework named Moral Awareness Adaptive Learning (MorAL) that enhances the morality capacity of an agent using a plugin moral-aware learning model. |
Zijing Shi; Meng Fang; Yunqiu Xu; Ling Chen; Yali Du; |
1545 | Learning to Induce Causal Structure Highlight: In our work, we instead treat the inference process as a black box and design a neural network architecture that learns the mapping from both observational and interventional data to graph structures via supervised training on synthetic graphs. |
Nan Rosemary Ke; Silvia Chiappa; Jane X Wang; Jorg Bornschein; Anirudh Goyal; Melanie Rey; Theophane Weber; Matthew Botvinick; Michael Curtis Mozer; Danilo Jimenez Rezende; |
1546 | Geometrically Regularized Autoencoders for Non-Euclidean Data Highlight: Given the recent surge of interest in machine learning problems involving non-Euclidean data, in this paper we address the regularization of autoencoders on curved spaces. |
Cheongjae Jang; Yonghyeon Lee; Yung-Kyun Noh; Frank C. Park; |
1547 | Online Boundary-Free Continual Learning By Scheduled Data Prior Highlight: To this end, we propose a scheduled transfer of previously learned knowledge. |
Hyunseo Koh; Minhyuk Seo; Jihwan Bang; Hwanjun Song; Deokki Hong; Seulki Park; Jung-Woo Ha; Jonghyun Choi; |
1548 | Efficient Learning of Rationalizable Equilibria in General-Sum Games Highlight: We develop provably efficient algorithms for finding approximate CE and CCE that are also rationalizable. |
Yuanhao Wang; Dingwen Kong; Yu Bai; Chi Jin; |
1549 | Energy-Based Test Sample Adaptation for Domain Generalization Highlight: In this paper, we propose energy-based sample adaptation at test time for domain generalization. |
Zehao Xiao; Xiantong Zhen; Shengcai Liao; Cees G. M. Snoek; |
1550 | Revisiting Adapters with Adversarial Training Highlight: We establish that using the classification token of a Vision Transformer (ViT) as an adapter is enough to match the classification performance of dual normalization layers, while using significantly less additional parameters. |
Sylvestre-Alvise Rebuffi; Francesco Croce; Sven Gowal; |
1551 | EPISODE: Episodic Gradient Clipping with Periodic Resampled Corrections for Federated Learning with Heterogeneous Data Highlight: In this paper, we design EPISODE, the very first algorithm to solve FL problems with heterogeneous data in the nonconvex and relaxed smoothness setting. |
Michael Crawshaw; Yajie Bao; Mingrui Liu; |
1552 | On The Trade-Off Between Actionable Explanations and The Right to Be Forgotten Highlight: To date it is unknown whether these two principles can be operationalized simultaneously. Therefore, we introduce and study the problem of recourse invalidation in the context of data deletion requests. |
Martin Pawelczyk; Tobias Leemann; Asia Biega; Gjergji Kasneci; |
1553 | Computing All Optimal Partial Transports Highlight: In this paper, we consider the question of computing the OT-profile. |
Abhijeet Phatak; Sharath Raghvendra; CHITTARANJAN TRIPATHY; Kaiyi Zhang; |
1554 | Probabilistically Robust Recourse: Navigating The Trade-offs Between Costs and Robustness in Algorithmic Recourse Highlight: Furthermore, prior approaches do not provide end users with any agency over navigating the aforementioned trade-offs. In this work, we address the above challenges by proposing the first algorithmic framework which enables users to effectively manage the recourse cost vs. robustness trade-offs. |
Martin Pawelczyk; Teresa Datta; Johan Van den Heuvel; Gjergji Kasneci; Himabindu Lakkaraju; |
1555 | Pseudo-label Training and Model Inertia in Neural Machine Translation Highlight: We study inertia effects under different training settings and we identify distribution simplification as a mechanism behind the observed results. |
Benjamin Hsu; Anna Currey; Xing Niu; Maria Nadejde; Georgiana Dinu; |
1556 | HyperDeepONet: Learning Operator with Complex Target Function Space Using The Limited Resources Via Hypernetwork Highlight: In this study, we propose HyperDeepONet, which uses the expressive power of the hypernetwork to enable learning of a complex operator with a smaller set of parameters. |
Jae Yong Lee; SungWoong CHO; Hyung Ju Hwang; |
1557 | Latent Graph Inference Using Product Manifolds Highlight: In this work, we generalize the discrete Differentiable Graph Module (dDGM) for latent graph learning. |
Haitz Sáez de Ocáriz Borde; Anees Kazi; Federico Barbero; Pietro Lio; |
1558 | Meta Temporal Point Processes Highlight: In this work, we propose to train TPPs in a meta learning framework, where each sequence is treated as a different task, via a novel framing of TPPs as neural processes (NPs). We introduce context sets to model TPPs as an instantiation of NPs. |
Wonho Bae; Mohamed Osama Ahmed; Frederick Tung; Gabriel L. Oliveira; |
1559 | Implicit Bias of Large Depth Networks: A Notion of Rank for Nonlinear Functions Highlight: We show that the representation cost of fully connected neural networks with homogeneous nonlinearities – which describes the implicit bias in function space of networks with L2-regularization or with losses such as the cross-entropy – converges as the depth of the network goes to infinity to a notion of rank over nonlinear functions. |
Arthur Jacot; |
1560 | Neural Compositional Rule Learning for Knowledge Graph Reasoning Highlight: In this paper, we propose an end-to-end neural model for learning compositional logic rules called NCRL. |
Kewei Cheng; Nesreen Ahmed; Yizhou Sun; |
1561 | MetaGL: Evaluation-Free Selection of Graph Learning Models Via Meta-Learning Highlight: In this work, we develop the first meta-learning approach for evaluation-free graph learning model selection, called MetaGL, which utilizes the prior performances of existing methods on various benchmark graph datasets to automatically select an effective model for the new graph, without any model training or evaluations. |
Namyong Park; Ryan A. Rossi; Nesreen Ahmed; Christos Faloutsos; |
1562 | A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks Highlight: We develop Neural-IVP, a new method for solving initial value PDEs with Neural Networks that is both stable and scalable. |
Marc Anton Finzi; Andres Potapczynski; Matthew Choptuik; Andrew Gordon Wilson; |
1563 | Quantifying and Mitigating The Impact of Label Errors on Model Disparity Metrics Highlight: Towards mitigating the impact of training-time label error, we present an approach to estimate how changing a single training input’s label affects a model’s group disparity metric on a test set. |
Julius Adebayo; Melissa Hall; Bowen Yu; Bobbie Chern; |