Paper Digest: ICML 2023 Highlights
The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. In 2023, it was held in Hawaii, US.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly grasp the main idea of each paper.
Based in New York, Paper Digest is dedicated to helping people generate content and reason over unstructured data. Unlike black-box approaches, we build deep models on semantics, which allows results to be produced with explanations. These models power this website and are behind our services, including “search engine”, “summarization”, “question answering”, and “literature review”.
If you do not want to miss interesting academic papers, you are welcome to sign up for our daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to receive new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: ICML 2023 Highlights
| Paper | Author(s) |
|---|---|
1 | Bayesian Estimation of Differential Privacy. Highlight: We propose a novel, more efficient Bayesian approach that brings privacy estimates within the reach of practitioners. |
Santiago Zanella-Beguelin; Lukas Wutschitz; Shruti Tople; Ahmed Salem; Victor Rühle; Andrew Paverd; Mohammad Naseri; Boris Köpf; Daniel Jones; |
2 | Adaptive Estimation of Graphical Models Under Total Positivity. Highlight: We propose an adaptive multiple-stage estimation method, which refines the estimate by solving a weighted $\ell_1$-regularized problem in each stage. |
Jiaxi Ying; José Vinícius De Miranda Cardoso; Daniel P. Palomar; |
3 | GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models. Highlight: This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers). |
Hanjing Wang; Man-Kit Sit; Congjie He; Ying Wen; Weinan Zhang; Jun Wang; Yaodong Yang; Luo Mai; |
4 | Disentangled Multi-Fidelity Deep Bayesian Active Learning. Highlight: We propose a novel framework called Disentangled Multi-fidelity Deep Bayesian Active Learning (D-MFDAL), which learns the surrogate models conditioned on the distribution of functions at multiple fidelities. |
Dongxia Wu; Ruijia Niu; Matteo Chinazzi; Yian Ma; Rose Yu; |
5 | Understand and Modularize Generator Optimization in ELECTRA-style Pretraining. Highlight: Despite the effectiveness of ELECTRA-style pre-training, its performance depends on the careful selection of the model size for the auxiliary generator, leading to high trial-and-error costs. In this paper, we present the first systematic study of this problem. |
Chengyu Dong; Liyuan Liu; Hao Cheng; Jingbo Shang; Jianfeng Gao; Xiaodong Liu; |
6 | NeRFool: Uncovering The Vulnerability of Generalizable Neural Radiance Fields Against Adversarial Perturbations. Highlight: To this end, we present NeRFool, which to the best of our knowledge is the first work that sets out to understand the adversarial robustness of GNeRF. |
Yonggan Fu; Ye Yuan; Souvik Kundu; Shang Wu; Shunyao Zhang; Celine Lin; |
7 | Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP. Highlight: In this paper, we study representation learning in partially observable Markov decision processes (POMDPs), where the agent learns a decoder function that maps a series of high-dimensional raw observations to a compact representation and uses it for more efficient exploration and planning. |
Jiacheng Guo; Zihao Li; Huazheng Wang; Mengdi Wang; Zhuoran Yang; Xuezhou Zhang; |
8 | Block Subsampled Randomized Hadamard Transform for Nyström Approximation on Distributed Architectures. Highlight: This article introduces a novel structured random matrix composed blockwise from subsampled randomized Hadamard transforms (SRHTs). |
Oleg Balabanov; Matthias Beaupère; Laura Grigori; Victor Lederer; |
9 | Unconstrained Online Learning with Unbounded Losses. Highlight: In this paper, we develop a new setting for online learning with unbounded domains and non-Lipschitz losses. |
Andrew Jacobsen; Ashok Cutkosky; |
10 | Optimistic Planning By Regularized Dynamic Programming. Highlight: We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. |
Antoine Moulin; Gergely Neu; |
11 | Autoregressive Diffusion Model for Graph Generation. Highlight: We propose an *autoregressive diffusion* model for graph generation. |
Lingkai Kong; Jiaming Cui; Haotian Sun; Yuchen Zhuang; B. Aditya Prakash; Chao Zhang; |
12 | Differentiable Tree Operations Promote Compositional Generalization. Highlight: In the context of structure-to-structure transformation tasks, learning sequences of discrete symbolic operations poses significant challenges due to their non-differentiability. To facilitate the learning of these symbolic sequences, we introduce a differentiable tree interpreter that compiles high-level symbolic tree operations into subsymbolic matrix operations on tensors. |
Paul Soulos; Edward J Hu; Kate McCurdy; Yunmo Chen; Roland Fernandez; Paul Smolensky; Jianfeng Gao; |
13 | Can Neural Network Memorization Be Localized? Highlight: In this work, we show that rather than being confined to individual layers, memorization is a phenomenon confined to a small set of neurons in various layers of the model. |
Pratyush Maini; Michael Curtis Mozer; Hanie Sedghi; Zachary Chase Lipton; J Zico Kolter; Chiyuan Zhang; |
14 | Domain Adaptation for Time Series Under Feature and Label Shifts. Highlight: We present RAINCOAT, the first model for both closed-set and universal domain adaptation on complex time series. |
Huan He; Owen Queen; Teddy Koker; Consuelo Cuevas; Theodoros Tsiligkaridis; Marinka Zitnik; |
15 | Towards Sustainable Learning: Coresets for Data-efficient Deep Learning. Highlight: To improve the efficiency and sustainability of learning deep models, we propose CREST, the first scalable framework with rigorous theoretical guarantees to identify the most valuable examples for training non-convex models, particularly deep networks. |
Yu Yang; Kang Hao; Baharan Mirzasoleiman; |
16 | On Enhancing Expressive Power Via Compositions of Single Fixed-Size ReLU Network. Highlight: This paper explores the expressive power of deep neural networks through the framework of function compositions. |
Shijun Zhang; Jianfeng Lu; Hongkai Zhao; |
17 | Contextual Reliability: When Different Features Matter in Different Contexts. Highlight: We formalize a new setting called contextual reliability which accounts for the fact that the right features to use may vary depending on the context. Our work theoretically and empirically demonstrates the advantages of ENP over existing methods and provides new benchmarks for contextual reliability. |
Gaurav Rohit Ghosal; Amrith Setlur; Daniel S. Brown; Anca Dragan; Aditi Raghunathan; |
18 | On Data Manifolds Entailed By Structural Causal Models. Highlight: In this work, we characterize the data manifolds entailed by structural causal models. |
Ricardo Dominguez-Olmedo; Amir-Hossein Karimi; Georgios Arvanitidis; Bernhard Schölkopf; |
19 | Are Neurons Actually Collapsed? On The Fine-Grained Structure in Neural Representations. Highlight: Recent work has observed an intriguing “Neural Collapse” phenomenon in well-trained neural networks, where the last-layer representations of training samples with the same label collapse into each other. This appears to suggest that the last-layer representations are completely determined by the labels, and do not depend on the intrinsic structure of input distribution. We provide evidence that this is not a complete description, and that the apparent collapse hides important fine-grained structure in the representations. |
Yongyi Yang; Jacob Steinhardt; Wei Hu; |
20 | Fast Sampling of Diffusion Models Via Operator Learning. Highlight: In this work, we use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models. |
Hongkai Zheng; Weili Nie; Arash Vahdat; Kamyar Azizzadenesheli; Anima Anandkumar; |
21 | Unsupervised Out-of-Distribution Detection with Diffusion Inpainting. Highlight: Unsupervised out-of-distribution detection (OOD) seeks to identify out-of-domain data by learning only from unlabeled in-domain data. We present a novel approach for this task — Lift, Map, Detect (LMD) — that leverages recent advances in diffusion models. |
Zhenzhen Liu; Jin Peng Zhou; Yufan Wang; Kilian Q Weinberger; |
22 | Sequence Modeling with Multiresolution Convolutional Memory. Highlight: We instead take inspiration from wavelet-based multiresolution analysis to define a new building block for sequence modeling, which we call a MultiresLayer. |
Jiaxin Shi; Ke Alexander Wang; Emily Fox; |
23 | The Hessian Perspective Into The Nature of Convolutional Neural Networks. Highlight: We prove tight upper bounds (with linear activations), which closely follow the empirical trend of the Hessian rank and in practice also hold for more general settings. |
Sidak Pal Singh; Thomas Hofmann; Bernhard Schölkopf; |
24 | Towards Learning Geometric Eigen-Lengths Crucial for Fitting Tasks. Highlight: However, it remains obscure and underexplored whether learning systems can be equipped with similar capabilities of automatically discovering such key geometric quantities from doing tasks. In this work, we therefore for the first time formulate and propose a novel learning problem on this question and set up a benchmark suite including tasks, data, and evaluation metrics for studying the problem. |
Yijia Weng; Kaichun Mo; Ruoxi Shi; Yanchao Yang; Leonidas Guibas; |
25 | Improved Learning-Augmented Algorithms for The Multi-Option Ski Rental Problem Via Best-Possible Competitive Analysis. Highlight: In this paper, we present improved learning-augmented algorithms for the multi-option ski rental problem. |
Yongho Shin; Changyeol Lee; Gukryeol Lee; Hyung-Chan An; |
26 | On Regularization and Inference with Label Constraints. Highlight: In this work, we compare two common strategies for encoding label constraints in a machine learning pipeline, *regularization with constraints* and *constrained inference*, by quantifying their impact on model performance. |
Kaifu Wang; Hangfeng He; Tin D. Nguyen; Piyush Kumar; Dan Roth; |
27 | Simple Disentanglement of Style and Content in Visual Representations. Highlight: In this work, we propose a simple post-processing framework to disentangle content and style in learned representations from pre-trained vision models. |
Lilian Ngweta; Subha Maity; Alex Gittens; Yuekai Sun; Mikhail Yurochkin; |
28 | Beyond The Edge of Stability Via Two-step Gradient Updates. Highlight: The incipient theoretical analysis of this phenomenon has mainly focused on the overparametrised regime, where the effect of choosing a large learning rate may be associated with a ‘Sharpness-Minimisation’ implicit regularisation within the manifold of minimisers, under appropriate asymptotic limits. In contrast, in this work we directly examine the conditions for such unstable convergence, focusing on simple, yet representative, learning problems, via analysis of two-step gradient updates. |
Lei Chen; Joan Bruna; |
29 | On The Role of Attention in Prompt-tuning. Highlight: In this work, we explore prompt-tuning for one-layer attention architectures and study contextual mixture-models where each input token belongs to a context-relevant or -irrelevant set. |
Samet Oymak; Ankit Singh Rawat; Mahdi Soltanolkotabi; Christos Thrampoulidis; |
30 | Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning. Highlight: To this end, we propose an ASR framework, dubbed Master-ASR, that, for the first time, simultaneously achieves strong multilingual scalability and low-resource adaptation ability thanks to its modularize-then-assemble strategy. |
Zhongzhi Yu; Yang Zhang; Kaizhi Qian; Cheng Wan; Yonggan Fu; Yongan Zhang; Celine Lin; |
31 | Hyperbolic Representation Learning: Revisiting and Advancing. Highlight: In this work, we first introduce a position-tracking mechanism to scrutinize existing prevalent hyperbolic models, revealing that the learned representations are sub-optimal and unsatisfactory. To address this, we propose a simple yet effective method, hyperbolic informed embedding (HIE), by incorporating cost-free hierarchical information deduced from the hyperbolic distance of the node to the origin (i.e., induced hyperbolic norm) to advance existing hyperbolic models. |
Menglin Yang; Min Zhou; Zhitao Ying; Yankai Chen; Irwin King; |
32 | Learning Belief Representations for Partially Observable Deep RL. Highlight: We propose a novel deep, partially observable RL algorithm based on modelling belief states — a technique typically used when solving tabular POMDPs, but that has traditionally been difficult to apply to more complex environments. |
Andrew Wang; Andrew C Li; Toryn Q. Klassen; Rodrigo Toro Icarte; Sheila A. McIlraith; |
33 | MABe22: A Multi-Species Multi-Task Benchmark for Learned Representations of Behavior. Highlight: We introduce MABe22, a large-scale, multi-agent video and trajectory benchmark to assess the quality of learned behavior representations. |
Jennifer J. Sun; Markus Marks; Andrew Wesley Ulmer; Dipam Chakraborty; Brian Geuther; Edward Hayes; Heng Jia; Vivek Kumar; Sebastian Oleszko; Zachary Partridge; Milan Peelman; Alice Robie; Catherine E Schretter; Keith Sheppard; Chao Sun; Param Uttarwar; Julian Morgan Wagner; Erik Werner; Joseph Parker; Pietro Perona; Yisong Yue; Kristin Branson; Ann Kennedy; |
34 | Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory. Highlight: Can we scale this SOTA method to ImageNet-1K and does its effectiveness on CIFAR transfer to ImageNet-1K? To answer these questions, we first propose a procedure to exactly compute the unrolled gradient with constant memory complexity, which allows us to scale MTT to ImageNet-1K seamlessly with $\sim 6$x reduction in memory footprint. We further discover that it is challenging for MTT to handle datasets with a large number of classes, and propose a novel soft label assignment that drastically improves its convergence. |
Justin Cui; Ruochen Wang; Si Si; Cho-Jui Hsieh; |
35 | MultiAdam: Parameter-wise Scale-invariant Optimizer for Multiscale Training of Physics-informed Neural Networks. Highlight: In this paper, we present an analysis of second-order non-homogeneous PDEs, which are classified into three categories and applicable to various common problems. |
Jiachen Yao; Chang Su; Zhongkai Hao; Songming Liu; Hang Su; Jun Zhu; |
36 | Polynomial Preconditioning for Gradient Methods. Highlight: We propose a new family of preconditioners generated by the symmetric polynomials. |
Nikita Doikov; Anton Rodomanov; |
37 | Internally Rewarded Reinforcement Learning. Highlight: In this paper, we formally formulate IRRL and present a class of problems that belong to IRRL. |
Mengdi Li; Xufeng Zhao; Jae Hee Lee; Cornelius Weber; Stefan Wermter; |
38 | Lottery Tickets in Evolutionary Optimization: On Sparse Backpropagation-Free Trainability. Highlight: In this paper we establish the existence of highly sparse trainable initializations for evolution strategies (ES) and characterize qualitative differences compared to gradient descent (GD)-based sparse training. |
Robert Tjarko Lange; Henning Sprekeler; |
39 | Slot-VAE: Object-Centric Scene Generation with Slot Attention. Highlight: In this paper, we propose the Slot-VAE, a generative model that integrates slot attention with the hierarchical VAE framework for object-centric structured scene generation. |
Yanbo Wang; Letao Liu; Justin Dauwels; |
40 | Model-based Reinforcement Learning with Scalable Composite Policy Gradient Estimators. Highlight: We propose a scalable method, Total Propagation X (TPX) that improves over TP by changing the node used for IVW, and employing coordinate wise weighting. |
Paavo Parmas; Takuma Seno; Yuma Aoki; |
41 | Predicting Rare Events By Shrinking Towards Proportional Odds. Highlight: We present PRESTO, a relaxation of the proportional odds model for ordinal regression. |
Gregory Faletto; Jacob Bien; |
42 | The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation. Highlight: Yet, the nature of such *approximation factors*—especially their optimal form in a given learning problem—is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain. |
Philip Amortila; Nan Jiang; Csaba Szepesvari; |
43 | Internet Explorer: Targeted Representation Learning on The Open Web. Highlight: We suggest an alternate approach: rather than hoping our static datasets transfer to our desired tasks after large-scale pre-training, we propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on a target dataset. |
Alexander Cong Li; Ellis Langham Brown; Alexei A Efros; Deepak Pathak; |
44 | Robust and Private Stochastic Linear Bandits. Highlight: In this paper, we study the stochastic linear bandit problem under the additional requirements of *differential privacy*, *robustness* and *batched observations*. |
Vasileios Charisopoulos; Hossein Esfandiari; Vahab Mirrokni; |
45 | High-dimensional Location Estimation Via Norm Concentration for Subgamma Vectors. Highlight: We build on the theory using *smoothed* estimators to bound the error for finite $n$ in terms of $\mathcal I_r$, the Fisher information of the $r$-smoothed distribution. |
Shivam Gupta; Jasper C.H. Lee; Eric Price; |
46 | Action Matching: Learning Stochastic Dynamics from Samples. Highlight: In order to better understand the systems under observation, we would like to learn a model of the underlying process that allows us to propagate samples in time and thereby simulate entire individual trajectories. In this work, we propose Action Matching, a method for learning a rich family of dynamics using only independent samples from its time evolution. |
Kirill Neklyudov; Rob Brekelmans; Daniel Severo; Alireza Makhzani; |
47 | Short-lived High-volume Bandits. Highlight: We propose an $\ell$-Layered Sieve Policy that recursively refines the action space for $\ell\leq w$ times. |
Su Jia; Nishant Oli; Ian Anderson; Paul Duff; Andrew A Li; Ramamoorthi Ravi; |
48 | Learning Temporally Abstract World Models Without Online Experimentation. Highlight: In this paper, we present an approach for simultaneously learning sets of skills and temporally abstract, skill-conditioned world models purely from offline data, enabling agents to perform zero-shot online planning of skill sequences for new tasks. |
Benjamin Freed; Siddarth Venkatraman; Guillaume Adrien Sartoretti; Jeff Schneider; Howie Choset; |
49 | Active Policy Improvement from Multiple Black-box Oracles. Highlight: We introduce MAPS and MAPS-SE, a class of policy improvement algorithms that perform imitation learning from multiple suboptimal oracles. |
Xuefeng Liu; Takuma Yoneda; Chaoqi Wang; Matthew Walter; Yuxin Chen; |
50 | Stochastic Gradient Descent-Induced Drift of Representation in A Two-Layer Neural Network. Highlight: Despite being observed in the brain and in artificial networks, the mechanisms of drift and its implications are not fully understood. Motivated by recent experimental findings of stimulus-dependent drift in the piriform cortex, we use theory and simulations to study this phenomenon in a two-layer linear feedforward network. |
Farhad Pashakhanloo; Alexei Koulakov; |
51 | IRNeXt: Rethinking Convolutional Network Design for Image Restoration. Highlight: In this paper, we excavate the potential of the convolutional neural network (CNN) and show that our CNN-based model can achieve comparable or better performance than Transformer models with low computation overhead on several image restoration tasks. |
Yuning Cui; Wenqi Ren; Sining Yang; Xiaochun Cao; Alois Knoll; |
52 | TabLeak: Tabular Data Leakage in Federated Learning. Highlight: A successful attack for tabular data must address two key challenges unique to the domain: (i) obtaining a solution to a high-variance mixed discrete-continuous optimization problem, and (ii) enabling human assessment of the reconstruction as unlike for image and text data, direct human inspection is not possible. In this work we address these challenges and propose TabLeak, the first comprehensive reconstruction attack on tabular data. |
Mark Vero; Mislav Balunovic; Dimitar Iliev Dimitrov; Martin Vechev; |
53 | Can Large Language Models Reason About Program Invariants? Highlight: We study the application of large language models to invariant prediction, finding that models trained on source code and fine-tuned for invariant generation can perform invariant prediction as static rather than dynamic analysis. |
Kexin Pei; David Bieber; Kensen Shi; Charles Sutton; Pengcheng Yin; |
54 | Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models. Highlight: We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversion. |
Alexander Lin; Bahareh Tolooshams; Yves Atchade; Demba E. Ba; |
55 | Stable and Consistent Prediction of 3D Characteristic Orientation Via Invariant Residual Learning. Highlight: In this work, we introduce a novel method to decouple the shape geometry and semantics of the input point cloud to achieve both stability and consistency. |
Seungwook Kim; Chunghyun Park; Yoonwoo Jeong; Jaesik Park; Minsu Cho; |
56 | Deep Generative Symbolic Regression with Monte-Carlo-Tree-Search. Highlight: In this paper, we propose a novel method which provides the best of both worlds, based on a Monte-Carlo Tree Search procedure using a context-aware neural mutation model, which is initially pre-trained to learn promising mutations, and further refined from successful experiences in an online fashion. |
Pierre-Alexandre Kamienny; Guillaume Lample; Sylvain Lamprier; Marco Virgolin; |
57 | Mixing Predictions for Online Metric Algorithms. Highlight: Since the performance of each predictor may vary over time, it is desirable to use not the single best predictor as a benchmark, but rather a dynamic combination which follows different predictors at different times. We design algorithms that combine predictions and are competitive against such dynamic combinations for a wide class of online problems, namely, metrical task systems. |
Antonios Antoniadis; Christian Coester; Marek Elias; Adam Polak; Bertrand Simon; |
58 | Approximate Causal Effect Identification Under Weak Confounding. Highlight: In this paper, we analyze the effect of ‘weak confounding’ on causal estimands. |
Ziwei Jiang; Lai Wei; Murat Kocaoglu; |
59 | Bootstrap in High Dimension with Low Computation. Highlight: We study the use of bootstraps in high-dimensional environments with a small number of resamples. |
Henry Lam; Zhenyuan Liu; |
60 | Image Shortcut Squeezing: Countering Perturbative Availability Poisons with Compression. Highlight: Current research adopts the belief that practical and effective approaches to countering such poisons do not exist. In this paper, we argue that it is time to abandon this belief. |
Zhuoran Liu; Zhengyu Zhao; Martha Larson; |
61 | Proper Losses for Discrete Generative Models. Highlight: We initiate the study of proper losses for evaluating generative models in the discrete setting. |
Dhamma Kimpara; Rafael Frongillo; Bo Waggoner; |
62 | Private Federated Learning with Autotuned Compression. Highlight: We propose new techniques for reducing communication in private federated learning without the need for setting or tuning compression rates. |
Enayat Ullah; Christopher A. Choquette-Choo; Peter Kairouz; Sewoong Oh; |
63 | SpotEM: Efficient Video Search for Episodic Memory. Highlight: We propose SpotEM, an approach to achieve efficiency for a given EM method while maintaining good accuracy. |
Santhosh Kumar Ramakrishnan; Ziad Al-Halah; Kristen Grauman; |
64 | DRCFS: Doubly Robust Causal Feature Selection. Highlight: We propose DRCFS, a doubly robust feature selection method for identifying the causal features even in nonlinear and high dimensional settings. |
Francesco Quinzan; Ashkan Soleymani; Patrick Jaillet; Cristian R. Rojas; Stefan Bauer; |
65 | Open-Vocabulary Universal Image Segmentation with MaskCLIP. Highlight: In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, that aims to perform semantic/instance/panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary categories of text-based descriptions at inference time. |
Zheng Ding; Jieke Wang; Zhuowen Tu; |
66 | Universal Physics-Informed Neural Networks: Symbolic Differential Operator Discovery with Sparse Data. Highlight: In this work we perform symbolic discovery of differential operators in a situation where there is sparse experimental data. |
Lena Podina; Brydon Eastman; Mohammad Kohandel; |
67 | Partial Optimality in Cubic Correlation Clustering. Highlight: Here, we focus on establishing partial optimality conditions for the special case of complete graphs and cubic objective functions. |
David Stein; Silvia Di Gregorio; Bjoern Andres; |
68 | InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models. Highlight: Here, we propose InfoDiffusion, an algorithm that augments diffusion models with low-dimensional latent variables that capture high-level factors of variation in the data. |
Yingheng Wang; Yair Schiff; Aaron Gokaslan; Weishen Pan; Fei Wang; Christopher De Sa; Volodymyr Kuleshov; |
69 | A Large-Scale Study of Probabilistic Calibration in Neural Network Regression. Highlight: While neural network miscalibration has been studied primarily in classification, we investigate this in the less-explored domain of regression. |
Victor Dheur; Souhaib Ben Taieb; |
70 | Global Optimality of Elman-type RNNs in The Mean-field Regime Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We analyze Elman-type recurrent neural networks (RNNs) and their training in the mean-field regime. |
Andrea Agazzi; Jianfeng Lu; Sayan Mukherjee; |
71 | Semi-Autoregressive Energy Flows: Exploring Likelihood-Free Training of Normalizing Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies the likelihood-free training of flows and proposes the energy objective, an alternative sample-based loss based on proper scoring rules. |
Phillip Si; Zeyi Chen; Subham Sekhar Sahoo; Yair Schiff; Volodymyr Kuleshov; |
72 | When Do Minimax-fair Learning and Empirical Risk Minimization Coincide? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove that if the hypothesis class is sufficiently expressive and the group information is recoverable from the features, ERM and minimax-fairness learning formulations indeed have the same performance on the worst-off group. |
Harvineet Singh; Matthäus Kleindessner; Volkan Cevher; Rumi Chunara; Chris Russell; |
73 | Theoretical Behavior of XAI Methods in The Presence of Suppressor Variables Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that the majority of the studied approaches will attribute non-zero importance to a non-class-related suppressor feature in the presence of correlated noise. |
Rick Wilming; Leo Kieslich; Benedict Clark; Stefan Haufe; |
74 | Probabilistic Imputation for Time-series Classification with Missing Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel probabilistic framework for the classification of multivariate time series data with missing values. |
SeungHyun Kim; Hyunsu Kim; Eunggu Yun; Hwangrae Lee; Jaehun Lee; Juho Lee; |
75 | MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a generic approach to offline PLfO, called Modality-agnostic Adversarial Hypothesis Adaptation for Learning from Observations (MAHALO). |
Anqi Li; Byron Boots; Ching-An Cheng; |
76 | Robust Counterfactual Explanations for Neural Networks With Probabilistic Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, models can often change significantly in the parameter space with little to no change in their predictions or accuracy on the given dataset. In this work, we introduce a mathematical abstraction termed *naturally-occurring* model change, which allows for arbitrary changes in the parameter space such that the change in predictions on points that lie on the data manifold is limited. |
Faisal Hamman; Erfaun Noorani; Saumitra Mishra; Daniele Magazzeni; Sanghamitra Dutta; |
77 | Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. |
Matthias Gerstgrasser; David C. Parkes; |
78 | Multi-Agent Learning from Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, some recent works move away from the optimality assumption to study the Learning from a Learner (LfL) problem, where the challenge is inferring the reward function of a learning agent from a sequence of demonstrations produced by progressively improving policies. In this work, we take one of the initial steps in addressing the multi-agent version of this problem and propose a new algorithm, MA-LfL (Multiagent Learning from a Learner). |
Mine Melodi Caliskan; Francesco Chini; Setareh Maghsudi; |
79 | Cut Your Losses with Squentropy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose the squentropy loss, which is the sum of two terms: the cross-entropy loss and the average square loss over the incorrect classes. We provide an extensive set of experiments on multi-class classification problems showing that the squentropy loss outperforms both the pure cross-entropy and rescaled square losses in terms of classification accuracy. |
Like Hui; Mikhail Belkin; Stephen Wright; |
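The squentropy highlight fully specifies the loss, so a short sketch is possible. The snippet below assumes raw class scores (`logits`) and reads "average square loss over the incorrect classes" as the mean squared logit over the c − 1 non-target classes; this is our reading of the summary, not code from the paper, and all names are illustrative:

```python
import numpy as np

def squentropy_loss(logits, labels):
    """Squentropy: softmax cross-entropy plus the average squared logit
    over the c - 1 incorrect classes (one plausible reading of the highlight).

    logits: (n, c) raw class scores; labels: (n,) integer class indices.
    """
    n, c = logits.shape
    # Numerically stable softmax cross-entropy.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(n), labels].mean()
    # Average square loss over the incorrect classes only.
    sq = (logits ** 2).sum(axis=1) - logits[np.arange(n), labels] ** 2
    return ce + (sq / (c - 1)).mean()
```

With all-zero logits and two classes, the square term vanishes and the loss reduces to the usual cross-entropy value of log 2.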
80 | The Statistical Scope of Multicalibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We make a connection between multicalibration and property elicitation and show that (under mild technical conditions) it is possible to produce a multicalibrated predictor for a continuous scalar property $\Gamma$ if and only if $\Gamma$ is *elicitable*. |
Georgy Noarov; Aaron Roth; |
81 | Sparse Learning of Dynamical Systems in RKHS: An Operator-Theoretic Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a method for sparse learning of transfer operators from $\beta$-mixing stochastic processes, in both discrete and continuous time, and provide sample complexity analysis extending existing theoretical guarantees for learning from non-sparse, i.i.d. data. |
Boya Hou; Sina Sanjari; Nathan Dahlin; Subhonmesh Bose; Umesh Vaidya; |
82 | Conformal Prediction with Missing Values Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study conformal prediction with missing values in the covariates — a setting that brings new challenges to uncertainty quantification. |
Margaux Zaffran; Aymeric Dieuleveut; Julie Josse; Yaniv Romano; |
83 | Weakly Supervised Regression with Interval Targets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a thorough study on RIT. |
Xin Cheng; Yuzhou Cao; Ximing Li; Bo An; Lei Feng; |
84 | Controllable Neural Symbolic Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods, in their current form, lack the capability to incorporate user-defined prior knowledge, which is often required in natural sciences and engineering fields. To overcome this limitation, we propose a novel neural symbolic regression method, named Neural Symbolic Regression with Hypothesis (NSRwH), which enables the explicit incorporation of assumptions about the expected structure of the ground-truth expression into the prediction process. |
Tommaso Bendinelli; Luca Biggio; Pierre-Alexandre Kamienny; |
85 | DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The state-of-the-art approach uses the dynamic one-peer exponential-2 topology, achieving faster training times and better scalability than the ring, grid, torus, and hypercube topologies. However, this approach requires a power-of-2 number of agents, which is impractical at scale. In this paper, we remove this restriction and propose Decentralized SGD with Communication-optimal Exact Consensus Algorithm (DSGD-CECA), which works for any number of agents while still achieving state-of-the-art properties. |
Lisang Ding; Kexin Jin; Bicheng Ying; Kun Yuan; Wotao Yin; |
86 | How Do Transformers Learn Topic Structure: Towards A Mechanistic Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our paper, we provide fine-grained mechanistic understanding of how transformers learn “semantic structure”, understood as capturing co-occurrence structure of words. |
Yuchen Li; Yuanzhi Li; Andrej Risteski; |
87 | Subset-Based Instance Optimality in Private Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new definition of instance optimality for differentially private estimation algorithms. |
Travis Dick; Alex Kulesza; Ziteng Sun; Ananda Theertha Suresh; |
88 | A Statistical Perspective on Retrieval-Based Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a formal treatment of retrieval-based models to characterize their performance via a novel statistical perspective. |
Soumya Basu; Ankit Singh Rawat; Manzil Zaheer; |
89 | Adaptive IMLE for Few-shot Pretraining-free Generative Modelling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a more generalized formulation of IMLE which includes the original formulation as a special case, and we prove that the theoretical guarantees hold under weaker conditions. |
Mehran Aghabozorgi; Shichong Peng; Ke Li; |
90 | Polyhedral Complex Extraction from ReLU Networks Using Edge Subdivision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead of regions, we propose to subdivide edges, leading to a novel method for polyhedral complex extraction. |
Arturs Berzins; |
91 | Task-Specific Skill Localization in Fine-tuned Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus fine-tuning allows the model to quickly pick up task-specific skills, but there has been limited study of *where* these newly-learnt skills reside inside the massive model. This paper introduces the term *skill localization* for this problem and proposes a solution. |
Abhishek Panigrahi; Nikunj Saunshi; Haoyu Zhao; Sanjeev Arora; |
92 | Learning Unforeseen Robustness from Out-of-distribution Data Using Equivariant Domain Translator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we learn unforeseen robustness by harnessing the variations in the abundant out-of-distribution data. |
Sicheng Zhu; Bang An; Furong Huang; Sanghyun Hong; |
93 | Fast Federated Machine Unlearning with Nonlinear Functional Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a fast FMU algorithm, FFMU, for improving the FMU efficiency while maintaining the unlearning quality. |
Tianshi Che; Yang Zhou; Zijie Zhang; Lingjuan Lyu; Ji Liu; Da Yan; Dejing Dou; Jun Huan; |
94 | FARE: Provably Fair Representation Learning with Practical Certificates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we address that challenge and introduce FARE (Fairness with Restricted Encoders), the first FRL method with practical fairness certificates. |
Nikola Jovanović; Mislav Balunovic; Dimitar Iliev Dimitrov; Martin Vechev; |
95 | Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion — effective Minkowski dimension. |
Zixuan Zhang; Minshuo Chen; Mengdi Wang; Wenjing Liao; Tuo Zhao; |
96 | BiRT: Bio-inspired Replay in Vision Transformers for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, representation rehearsal in vision transformers lacks diversity, resulting in overfitting and consequently, performance drops significantly compared to raw image rehearsal. Therefore, we propose BiRT, a novel representation rehearsal-based continual learning approach using vision transformers. |
Kishaan Jeeveswaran; Prashant Shivaram Bhat; Bahram Zonooz; Elahe Arani; |
97 | On The Importance of Feature Decorrelation for Unsupervised Representation Learning in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, an important challenge of this approach is the representational collapse, where the subspace of the latent representations collapses into a low-dimensional manifold. To address this issue, we propose a novel URL framework that causally predicts future states while increasing the dimension of the latent manifold by decorrelating the features in the latent space. |
Hojoon Lee; Koanho Lee; Dongyoon Hwang; Hyunho Lee; Byungkun Lee; Jaegul Choo; |
98 | Trainability, Expressivity and Interpretability in Gated Neural ODEs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel measure of expressivity which probes the capacity of a neural network to generate complex trajectories. |
Timothy Doyeon Kim; Tankut Can; Kamesh Krishnamurthy; |
99 | Feature Learning in Deep Classifiers Through Intermediate Neural Collapse Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct an empirical study of the feature learning process in deep classifiers. |
Akshay Rangamani; Marius Lindegaard; Tomer Galanti; Tomaso A. Poggio; |
100 | Q-Flow: Generative Modeling for Differential Equations of Open Quantum Dynamics with Normalizing Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we model the Q function seamlessly with *off-the-shelf* deep generative models such as normalizing flows. |
Owen M Dugan; Peter Y. Lu; Rumen Dangovski; Di Luo; Marin Soljacic; |
101 | Unearthing InSights Into Mars: Unsupervised Source Separation with Limited Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While data-driven methods have shown great promise in source separation, they often require large amounts of data, which rarely exist in planetary space missions. To address this challenge, we propose an unsupervised source separation scheme for domains with limited data access that involves solving an optimization problem in the wavelet scattering covariance representation space—an interpretable, low-dimensional representation of stationary processes. |
Ali Siahkoohi; Rudy Morel; Maarten V. de Hoop; Erwan Allys; Grégory Sainton; Taichi Kawamura; |
102 | Doubly Optimal No-Regret Learning in Monotone Games Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the *accelerated optimistic gradient* (AOG) algorithm, the first doubly optimal no-regret learning algorithm for smooth monotone games. |
Yang Cai; Weiqiang Zheng; |
103 | The Test of Tests: A Framework for Differentially Private Hypothesis Testing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a generic framework for creating differentially private versions of any hypothesis test in a black-box way. |
Zeki Kazan; Kaiyan Shi; Adam Groce; Andrew Bray; |
104 | Multi-Symmetry Ensembles: Improving Diversity and Generalization Via Opposing Symmetries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Multi-Symmetry Ensembles (MSE), a framework for constructing diverse ensembles by capturing the multiplicity of hypotheses along symmetry axes, which explore the hypothesis space beyond stochastic perturbations of model weights and hyperparameters. |
Charlotte Loh; Seungwook Han; Shivchander Sudalairaj; Rumen Dangovski; Kai Xu; Florian Wenzel; Marin Soljacic; Akash Srivastava; |
105 | When Is Realizability Sufficient for Off-Policy Reinforcement Learning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, Bellman completeness is a requirement that is much stronger than realizability and that is deemed to be too strong to hold in practice. In this work, we relax this structural assumption and analyze the statistical complexity of off-policy reinforcement learning when only realizability holds for the prescribed function class. |
Andrea Zanette; |
106 | Hidden Symmetries of ReLU Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we prove that, for any network architecture where no layer is narrower than the input, there exist parameter settings with no hidden symmetries. |
Elisenda Grigsby; Kathryn Lindsey; David Rolnick; |
107 | Bootstrapped Representations in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, it is unclear which features these algorithms capture and how they relate to those from other auxiliary-task-based approaches. In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988). |
Charline Le Lan; Stephen Tu; Mark Rowland; Anna Harutyunyan; Rishabh Agarwal; Marc G Bellemare; Will Dabney; |
108 | New Metrics and Search Algorithms for Weighted Causal DAGs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, under some standard assumptions, we study causal graph discovery via _adaptive interventions with node-dependent interventional costs_. |
Davin Choo; Kirankumar Shiragur; |
109 | Exact Inference in High-order Structured Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the problem of inference in high-order structured prediction tasks. |
Chuyang Ke; Jean Honorio; |
110 | Path Neural Networks: Expressive and Accurate Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Path Neural Networks (PathNNs), a model that updates node representations by aggregating paths emanating from nodes. |
Gaspard Michel; Giannis Nikolentzos; Johannes F. Lutzeyer; Michalis Vazirgiannis; |
111 | The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of temporal-difference-based policy evaluation in reinforcement learning. |
Mark Rowland; Yunhao Tang; Clare Lyle; Remi Munos; Marc G Bellemare; Will Dabney; |
112 | Restoration-Degradation Beyond Linear Diffusions: A Non-Asymptotic Analysis For DDIM-type Samplers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a framework for non-asymptotic analysis of deterministic samplers used for diffusion generative modeling. |
Sitan Chen; Giannis Daras; Alex Dimakis; |
113 | On Bridging The Gap Between Mean Field and Finite Width Deep Random Multilayer Perceptron with Batch Normalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the role of depth in the concentration of mean-field predictions for Gram matrices of hidden representations in deep multilayer perceptrons (MLPs) with batch normalization (BN) at initialization. |
Amir Joudaki; Hadi Daneshmand; Francis Bach; |
114 | Hyperparameters in Reinforcement Learning and How To Tune Them Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that hyperparameter choices in RL can significantly affect the agent’s final performance and sample efficiency, and that the hyperparameter landscape can strongly depend on the tuning seed which may lead to overfitting. |
Theresa Eimer; Marius Lindauer; Roberta Raileanu; |
115 | Towards Constituting Mathematical Structures for Learning to Optimize Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we derive the basic mathematical conditions that successful update rules commonly satisfy. |
Jialin Liu; Xiaohan Chen; Zhangyang Wang; Wotao Yin; HanQin Cai; |
116 | Identifiability and Generalizability in Constrained Inverse Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Two main challenges in Reinforcement Learning (RL) are designing appropriate reward functions and ensuring the safety of the learned policy. To address these challenges, we present a theoretical framework for Inverse Reinforcement Learning (IRL) in constrained Markov decision processes. |
Andreas Schlaginhaufen; Maryam Kamgarpour; |
117 | Adaptive Whitening in Neural Populations with Gain-modulating Interneurons Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing neural circuit models of adaptive whitening operate by modifying synaptic interactions; however, such modifications would seem both too slow and insufficiently reversible. Motivated by the extensive neuroscience literature on gain modulation, we propose an alternative model that adaptively whitens its responses by modulating the gains of individual neurons. |
Lyndon Duong; David Lipshutz; David Heeger; Dmitri Chklovskii; Eero P Simoncelli; |
118 | Optimizing The Collaboration Structure in Cross-Silo Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose FedCollab, a novel FL framework that alleviates negative transfer by clustering clients into non-overlapping coalitions based on their distribution distances and data quantities. |
Wenxuan Bao; Haohan Wang; Jun Wu; Jingrui He; |
119 | Smart Initial Basis Selection for Linear Programs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a learning-based approach for initial basis selection. |
Zhenan Fan; Xinglu Wang; Oleksandr Yakovenko; Abdullah Ali Sivas; Owen Ren; Yong Zhang; Zirui Zhou; |
120 | From Relational Pooling to Subgraph GNNs: A Universal Framework for More Expressive Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Starting from RP, we propose to explicitly assign labels to nodes as additional features to improve graph isomorphism distinguishing power of message passing neural networks. |
Cai Zhou; Xiyuan Wang; Muhan Zhang; |
121 | The Unreasonable Effectiveness of Few-shot Learning for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. |
Xavier Garcia; Yamini Bansal; Colin Cherry; George Foster; Maxim Krikun; Melvin Johnson; Orhan Firat; |
122 | Auxiliary Learning As An Asymmetric Bargaining Game Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel approach, named AuxiNash, for balancing tasks in auxiliary learning by formalizing the problem as a generalized bargaining game with asymmetric task bargaining power. |
Aviv Shamsian; Aviv Navon; Neta Glazer; Kenji Kawaguchi; Gal Chechik; Ethan Fetaya; |
123 | Additive Causal Bandits with Unknown Graph Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To remedy this, we adopt an additional additive assumption on the outcome which allows us to solve the problem by casting it as an additive combinatorial linear bandit problem with full-bandit feedback. We propose a novel action-elimination algorithm for this setting, show how to apply this algorithm to the causal bandit problem, provide sample complexity bounds, and empirically validate our findings on a suite of randomly generated causal models, effectively showing that one does not need to explicitly learn the parents of the outcome to identify the best intervention. |
Alan Malek; Virginia Aglietti; Silvia Chiappa; |
124 | Mitigating Spurious Correlations in Multi-modal Models During Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel approach to address spurious correlations during fine-tuning for a given domain of interest. |
Yu Yang; Besmira Nushi; Hamid Palangi; Baharan Mirzasoleiman; |
125 | Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this increase in complexity, together with the growing dimensionality of the observations, came at the cost of volatility that can be exploited via adversarial attacks (i.e. moving along worst-case directions in the observation space). To solve this policy instability problem, we propose a novel method to detect the presence of these non-robust directions via local quadratic approximation of the deep neural policy loss. |
Ezgi Korkmaz; Jonah Brown-Cohen; |
126 | Efficient Exploration Via Epistemic-Risk-Seeking Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose a new, differentiable optimistic objective that when optimized yields a policy that provably explores efficiently, with guarantees even under function approximation. |
Brendan O’Donoghue; |
127 | Variational Sparse Inverse Cholesky Approximation for Latent Gaussian Processes Via Double Kullback-Leibler Minimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To achieve scalable and accurate inference for latent Gaussian processes, we propose a variational approximation based on a family of Gaussian distributions whose covariance matrices have sparse inverse Cholesky (SIC) factors. |
Jian Cao; Myeongjong Kang; Felix Jimenez; Huiyan Sang; Florian Tobias Schaefer; Matthias Katzfuss; |
128 | Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key contribution is to present a new estimator leveraging our proposed novel concept, that involves retrospective reshuffling of participants across experimental arms at the end of an RCT. |
Aditya Mate; Bryan Wilder; Aparna Taneja; Milind Tambe; |
129 | Efficient Quantum Algorithms for Quantum Optimal Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present efficient quantum algorithms that are exponentially faster than classical algorithms for solving the quantum optimal control problem. |
Xiantao Li; Chunhao Wang; |
130 | SLAMB: Accelerated Large Batch Training with Sparse Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we combine sparsification-based gradient compression with the layer-wise adaptive moments optimizer for large batch training (LAMB). |
Hang Xu; Wenxuan Zhang; Jiawei Fei; Yuzhe Wu; TingWen Xie; Jun Huang; Yuchen Xie; Mohamed Elhoseiny; Panos Kalnis; |
131 | Cooperation in The Latent Space: The Benefits of Adding Mixture Components in Variational Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show how the mixture components cooperate when they jointly adapt to maximize the ELBO. |
Oskar Kviman; Ricky Molén; Alexandra Hotti; Semih Kurt; Víctor Elvira; Jens Lagergren; |
132 | Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf. |
Siyu Chen; Jibang Wu; Yifan Wu; Zhuoran Yang; |
133 | High Probability Convergence of Stochastic Gradient Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we describe a generic approach to show convergence with high probability for both stochastic convex and non-convex optimization with sub-Gaussian noise. |
Zijian Liu; Ta Duy Nguyen; Thien Hang Nguyen; Alina Ene; Huy Nguyen; |
134 | Towards Understanding and Reducing Graph Structural Noise for GNNs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on understanding and alleviating the effect of graph structural noise on GNN performance. |
Mingze Dong; Yuval Kluger; |
135 | On The Convergence of Gradient Flow on Multi-layer Linear Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the convergence of gradient flow on a multi-layer linear model with a loss function of the form $f(W_1W_2\cdots W_L)$. |
Hancheng Min; Rene Vidal; Enrique Mallada; |
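The multi-layer linear setting above can be illustrated with a discretized gradient flow: plain gradient descent with a small step on a toy end-to-end loss $f(W_1W_2W_3)=\tfrac{1}{2}\|W_1W_2W_3-A\|_F^2$. The target, initialization, and step size here are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy end-to-end loss f(W1 W2 W3) = 0.5 * ||W1 W2 W3 - A||_F^2
# on a well-conditioned diagonal target (illustrative choices).
d, depth = 3, 3
A = np.diag([1.5, 1.0, 0.5])
Ws = [np.eye(d) + 0.01 * rng.normal(size=(d, d)) for _ in range(depth)]

def product(factors):
    P = np.eye(d)
    for W in factors:
        P = P @ W
    return P

def loss(Ws):
    return 0.5 * np.linalg.norm(product(Ws) - A) ** 2

# Forward-Euler discretization of gradient flow: dW_i/dt = -dL/dW_i,
# where dL/dW_i = (W_1...W_{i-1})^T (P - A) (W_{i+1}...W_L)^T.
eta = 1e-2
for _ in range(5000):
    G = product(Ws) - A  # gradient of f at the end-to-end matrix
    grads = [product(Ws[:i]).T @ G @ product(Ws[i + 1:]).T
             for i in range(depth)]
    Ws = [W - eta * g for W, g in zip(Ws, grads)]
```

The paper analyzes the continuous-time dynamics; this small-step Euler iteration is a standard numerical stand-in, and with a near-identity initialization the end-to-end product converges to the target.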
136 | Single Point-Based Distributed Zeroth-Order Optimization with A Non-Convex Stochastic Objective Function Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a zero-order distributed optimization method based on a one-point estimate of the gradient tracking technique. |
Elissa Mhanna; Mohamad Assaad; |
137 | Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning Via Multi-Level Monte Carlo Actor-Critic Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an RL methodology attuned to the mixing time by employing a multi-level Monte Carlo estimator for the critic, the actor, and the average reward embedded within an actor-critic (AC) algorithm. |
Wesley Suttle; Amrit Bedi; Bhrij Patel; Brian M. Sadler; Alec Koppel; Dinesh Manocha; |
138 | Neural Network Accelerated Implicit Filtering: Integrating Neural Network Surrogates With Provably Convergent Derivative Free Optimization Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce neural network accelerated implicit filtering (NNAIF), a novel family of methods for solving noisy derivative free (i.e. black box, zeroth order) optimization problems. |
Brian Irwin; Eldad Haber; Raviv Gal; Avi Ziv; |
139 | Stochastic Marginal Likelihood Gradients Using Neural Tangent Kernels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, estimating a single hyperparameter gradient requires a pass through the entire dataset, limiting the scalability of such algorithms. In this work, we overcome this issue by introducing lower bounds to the linearized Laplace approximation of the marginal likelihood. |
Alexander Immer; Tycho F.A. van der Ouderaa; Mark van der Wilk; Gunnar Ratsch; Bernhard Schölkopf; |
140 | Constrained Optimization Via Exact Augmented Lagrangian and Randomized Iterative Sketching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This class of problems appears widely in a variety of applications in machine learning and engineering, ranging from constrained deep neural networks, to optimal control, to PDE-constrained optimization. We develop an adaptive inexact Newton method for this problem class. |
Ilgee Hong; Sen Na; Michael W. Mahoney; Mladen Kolar; |
141 | FAENet: Frame Averaging Equivariant GNN for Materials Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce (1) a flexible, model-agnostic framework based on stochastic frame averaging that enforces E(3) equivariance or invariance, without any architectural constraints; (2) FAENet: a simple, fast and expressive GNN that leverages stochastic frame averaging to process geometric information without constraints. |
Alexandre AGM Duval; Victor Schmidt; Alex Hernández-García; Santiago Miret; Fragkiskos D. Malliaros; Yoshua Bengio; David Rolnick; |
142 | Maximal Initial Learning Rates in Deep ReLU Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the maximal initial learning rate $\eta^{\ast}$ – the largest learning rate at which a randomly initialized neural network can successfully begin training and achieve (at least) a given threshold accuracy. |
Gaurav Iyer; Boris Hanin; David Rolnick; |
143 | Accelerated Cyclic Coordinate Dual Averaging with Extrapolation for Composite Convex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by a recent success in analyzing an extrapolated cyclic scheme for generalized variational inequalities, we propose an *Accelerated Cyclic Coordinate Dual Averaging with Extrapolation* (A-CODER) method for composite convex optimization, where the objective function can be expressed as the sum of a smooth convex function accessible via a gradient oracle and a convex, possibly nonsmooth, function accessible via a proximal oracle. |
Cheuk Yin Lin; Chaobing Song; Jelena Diakonikolas; |
144 | A Robust Test for The Stationarity Assumption in Sequential Decision Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a model-based doubly robust procedure for testing the stationarity assumption and detecting change points in offline RL settings with a certain degree of homogeneity. |
Jitao Wang; Chengchun Shi; Zhenke Wu; |
145 | Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace. |
Minshuo Chen; Kaixuan Huang; Tuo Zhao; Mengdi Wang; |
146 | Why Target Networks Stabilise Temporal Difference Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet a complete theoretical explanation for the effectiveness of target networks remains elusive. In this work, we provide an analysis of this popular class of algorithms, to finally answer the question: “why do target networks stabilise TD learning”? |
Mattie Fellows; Matthew J. A. Smith; Shimon Whiteson; |
147 | Simplified Temporal Consistency Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. |
Yi Zhao; Wenshuai Zhao; Rinu Boney; Juho Kannala; Joni Pajarinen; |
148 | Generalized Implicit Follow-The-Regularized-Leader Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new class of online learning algorithms, generalized implicit Follow-The-Regularized-Leader (FTRL), that expands the scope of the FTRL framework. |
Keyi Chen; Francesco Orabona; |
149 | On Pitfalls of Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the sheer number of existing methods, the inconsistent experimental conditions and lack of standardization in prior literature make it difficult to measure their actual efficacies and progress. To address this issue, we present a large-scale open-sourced Test-Time Adaptation Benchmark, dubbed TTAB, which includes nine state-of-the-art algorithms, a diverse array of distribution shifts, and two comprehensive evaluation protocols. |
Hao Zhao; Yuejiang Liu; Alexandre Alahi; Tao Lin; |
150 | Algorithms for Bounding Contribution for Histogram Estimation Under User-level Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose algorithms to choose the best user contribution bound for histogram estimation under both bounded and unbounded domain settings. |
Yuhan Liu; Ananda Theertha Suresh; Wennan Zhu; Peter Kairouz; Marco Gruteser; |
151 | Are Gaussian Data All You Need? The Extents and Limits of Universality in High-Dimensional Generalized Linear Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this manuscript we consider the problem of generalized linear estimation on Gaussian mixture data with labels given by a single-index model. |
Luca Pesce; Florent Krzakala; Bruno Loureiro; Ludovic Stephan; |
152 | Model-based Offline Reinforcement Learning with Count-based Conservatism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a model-based offline reinforcement learning method that integrates count-based conservatism, named $\texttt{Count-MORL}$. |
Byeongchan Kim; Min-hwan Oh; |
153 | One-shot Imitation in A Non-Stationary Environment Via Multi-Modal Skill Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the problem, we explore the compositionality of complex tasks, and present a novel skill-based imitation learning framework enabling one-shot imitation and zero-shot adaptation; from a single demonstration for a complex unseen task, a semantic skill sequence is inferred and then each skill in the sequence is converted into an action sequence optimized for environmental hidden dynamics that can vary over time. |
Sangwoo Shin; Daehee Lee; Minjong Yoo; Woo Kyung Kim; Honguk Woo; |
154 | Structural Re-weighting Improves Graph Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work examines different impacts of distribution shifts caused by either graph structure or node attributes and identifies a new type of shift, named conditional structure shift (CSS), which current GDA approaches are provably sub-optimal to deal with. A novel approach, called structural reweighting (StruRW), is proposed to address this issue and is tested on synthetic graphs, four benchmark datasets, and a new application in HEP. |
Shikun Liu; Tianchun Li; Yongbin Feng; Nhan Tran; Han Zhao; Qiang Qiu; Pan Li; |
155 | Hypervolume Knowledge Gradient: A Lookahead Approach for Multi-Objective Bayesian Optimization with Partial Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a general one-step lookahead acquisition function based on the Knowledge Gradient that addresses the complex question of what to evaluate when and at which design points in a principled Bayesian decision-theoretic fashion. |
Sam Daulton; Maximilian Balandat; Eytan Bakshy; |
156 | Machine Learning Force Fields with Data Cost Aware Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Even for the most data-efficient MLFFs, reaching chemical accuracy can require hundreds of frames of force and energy labels generated by expensive quantum mechanical algorithms, which may scale as $O(n^3)$ to $O(n^7)$, with $n$ proportional to the number of basis functions. To address this issue, we propose a multi-stage computational framework — ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data. |
Alexander Bukharin; Tianyi Liu; Shengjie Wang; Simiao Zuo; Weihao Gao; Wen Yan; Tuo Zhao; |
157 | Continuous Spatiotemporal Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a limitation of transformers in modeling continuous dynamical systems is that they are fundamentally discrete time and space models and thus have no guarantees regarding continuous sampling. To address this challenge, we present the Continuous Spatiotemporal Transformer (CST), a new transformer architecture designed for modeling continuous systems. |
Antonio Henrique de Oliveira Fonseca; Emanuele Zappala; Josue Ortega Caro; David van Dijk; |
158 | ED-Batch: Efficient Automatic Batching of Dynamic Neural Networks Via Learned Finite State Machines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide an approach for batching dynamic DNNs based on finite state machines, which enables the automatic discovery of batching policies specialized for each DNN via reinforcement learning. |
Siyuan Chen; Pratik Pramod Fegade; Tianqi Chen; Phillip Gibbons; Todd Mowry; |
159 | Emergent Asymmetry of Precision and Recall for Measuring Fidelity and Diversity of Generative Models in High Dimensions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given their central role in comparing and improving generative models, understanding their limitations is crucially important. To that end, in this work, we identify a critical flaw in the common approximation of these metrics using k-nearest-neighbors, namely, that the very interpretations of fidelity and diversity that are assigned to Precision and Recall can fail in high dimensions, resulting in very misleading conclusions. |
Mahyar Khayatkhoei; Wael AbdAlmageed; |
160 | An SDE for Modeling SAM: Theory and Insights Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the SAM (Sharpness-Aware Minimization) optimizer which has recently attracted a lot of interest due to its increased performance over more classical variants of stochastic gradient descent. Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and two of its variants, both for the full-batch and mini-batch settings. |
Enea Monzio Compagnoni; Luca Biggio; Antonio Orvieto; Frank Norbert Proske; Hans Kersting; Aurelien Lucchi; |
161 | Simple and Fast Group Robustness By Automatic Feature Reweighting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Automatic Feature Reweighting (AFR), an extremely simple and fast method for updating the model to reduce the reliance on spurious features. |
Shikai Qiu; Andres Potapczynski; Pavel Izmailov; Andrew Gordon Wilson; |
162 | MG-GNN: Multigrid Graph Neural Networks for Learning Multilevel Domain Decomposition Methods Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose multigrid graph neural networks (MG-GNN), a novel GNN architecture for learning optimized parameters in two-level DDMs. |
Ali Taghibakhshi; Nicolas Nytko; Tareq Uz Zaman; Scott MacLachlan; Luke Olson; Matthew West; |
163 | Linearly Constrained Bilevel Optimization: A Smoothed Implicit Gradient Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, in this work, we develop an implicit gradient-based approach, which is easy to implement, and is suitable for machine learning applications. |
Prashant Khanduri; Ioannis Tsaknakis; Yihua Zhang; Jia Liu; Sijia Liu; Jiawei Zhang; Mingyi Hong; |
164 | Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To circumvent the known hardness results and the use of computationally intractable oracles, we propose to leverage the potential *information-sharing* among agents, a standard practice in empirical MARL and a common model for multi-agent control systems with communications. |
Xiangyu Liu; Kaiqing Zhang; |
165 | Revisiting Structured Variational Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we revisit SVAEs using modern machine learning tools and demonstrate their advantages over more general alternatives in terms of both accuracy and efficiency. |
Yixiu Zhao; Scott Linderman; |
166 | Tied-Augment: Controlling Representation Similarity Improves Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a general framework called Tied-Augment, which improves the efficacy of data augmentation in a wide range of applications by adding a simple term to the loss that can control the similarity of representations under distortions. |
Emirhan Kurtuluş; Zichao Li; Yann Dauphin; Ekin Dogus Cubuk; |
167 | Equivariance with Learned Canonicalization Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Symmetry-based neural networks often constrain the architecture in order to achieve invariance or equivariance to a group of transformations. In this paper, we propose an alternative that avoids this architectural constraint by learning to produce canonical representations of the data. |
Sékou-Oumar Kaba; Arnab Kumar Mondal; Yan Zhang; Yoshua Bengio; Siamak Ravanbakhsh; |
168 | Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a human-in-the-loop framework to leverage feedback from the end user to quickly identify and augment task-irrelevant visual state concepts. |
Andi Peng; Aviv Netanyahu; Mark K Ho; Tianmin Shu; Andreea Bobu; Julie Shah; Pulkit Agrawal; |
169 | Applied Online Algorithms with Heterogeneous Predictors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate how to more effectively utilize historical datasets and application domain knowledge by intentionally using predictors of *different* quantities. |
Jessica Maghakian; Russell Lee; Mohammad Hajiesmaili; Jian Li; Ramesh Sitaraman; Zhenhua Liu; |
170 | Ewald-based Long-Range Message Passing for Molecular Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While this focus on locality is a useful inductive bias, it also impedes the learning of long-range interactions such as electrostatics and van der Waals forces. To address this drawback, we propose Ewald message passing: a nonlocal Fourier space scheme which limits interactions via a cutoff on frequency instead of distance, and is theoretically well-founded in the Ewald summation method. |
Arthur Kosmala; Johannes Gasteiger; Nicholas Gao; Stephan Günnemann; |
171 | Future-conditioned Unsupervised Pretraining for Decision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to utilize generalized future conditioning to enable efficient unsupervised pretraining from reward-free and sub-optimal offline data. |
Zhihui Xie; Zichuan Lin; Deheng Ye; Qiang Fu; Yang Wei; Shuai Li; |
172 | A Model-Based Method for Minimizing CVaR and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a variant of the stochastic prox-linear method for minimizing the Conditional Value-at-Risk (CVaR) objective. |
Si Yi Meng; Robert M. Gower; |
173 | Extrapolative Controlled Sequence Generation Via Iterative Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead, in this work, we propose Iterative Controlled Extrapolation (ICE) which iteratively makes local edits to a sequence to enable extrapolation. |
Vishakh Padmakumar; Richard Yuanzhe Pang; He He; Ankur P Parikh; |
174 | General Sequential Episodic Memory Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a class of General Sequential Episodic Memory Models (GSEMM) that, in the adiabatic limit, exhibit a dynamic energy surface, leading to a series of meta-stable states capable of encoding memory sequences. |
Arjun Karuvally; Terrence Sejnowski; Hava T Siegelmann; |
175 | Low Complexity Homeomorphic Projection to Ensure Neural-Network Solution Feasibility for Optimization Over (Non-)Convex Set Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose homeomorphic projection as a low-complexity scheme to guarantee NN solution feasibility for optimization over a general set homeomorphic to a unit ball, covering all compact convex sets and certain classes of nonconvex sets. |
Enming Liang; Minghua Chen; Steven Low; |
176 | LEVER: Learning to Verify Language-to-Code Generation with Execution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results. |
Ansong Ni; Srini Iyer; Dragomir Radev; Veselin Stoyanov; Wen-tau Yih; Sida Wang; Xi Victoria Lin; |
177 | End-to-end Training of Deep Boltzmann Machines By Unbiased Contrastive Divergence with Local Mode Initialization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose to use a coupling based on the Metropolis-Hastings (MH) algorithm and to initialize the state around a local mode of the target distribution. |
Shohei Taniguchi; Masahiro Suzuki; Yusuke Iwasawa; Yutaka Matsuo; |
178 | MPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present mPLUG-2, a new unified paradigm with modularized design for multi-modal pretraining, which can benefit from modality collaboration while addressing the problem of modality entanglement. |
Haiyang Xu; Qinghao Ye; Ming Yan; Yaya Shi; Jiabo Ye; Yuanhong Xu; Chenliang Li; Bin Bi; Qi Qian; Wei Wang; Guohai Xu; Ji Zhang; Songfang Huang; Fei Huang; Jingren Zhou; |
179 | Kernel QuantTree Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Kernel QuantTree (KQT), a non-parametric change detection algorithm that monitors multivariate data through a histogram. |
Diego Stucchi; Paolo Rizzo; Nicolò Folloni; Giacomo Boracchi; |
180 | Rethinking Backdoor Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a different approach to the backdoor attack problem. |
Alaa Khaddaj; Guillaume Leclerc; Aleksandar Makelov; Kristian Georgiev; Hadi Salman; Andrew Ilyas; Aleksander Madry; |
181 | Adversarial Robustness of Amortized Bayesian Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a computationally efficient regularization scheme based on penalizing the Fisher information of the conditional density estimator, and show how it improves the adversarial robustness of amortized Bayesian inference. |
Manuel Gloeckler; Michael Deistler; Jakob H. Macke; |
182 | Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, sequential decision making problems are often carried out in a batched manner, either due to the inherent nature of the problem or to serve the purpose of reducing communication and computation costs. In this work, we jointly study these problems in two popular settings, namely, stochastic multi-armed bandits (MABs) and infinite-horizon reinforcement learning (RL), where TS is used to learn the unknown reward distributions and transition dynamics, respectively. |
Amin Karbasi; Nikki Lijing Kuang; Yian Ma; Siddharth Mitra; |
183 | Adversarially Robust PAC Learnability of Real-Valued Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Along the way, we introduce a novel agnostic sample compression scheme for real-valued functions, which may be of independent interest. |
Idan Attias; Steve Hanneke; |
184 | Reprogramming Pretrained Language Models for Antibody Sequence Infilling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we introduce ReprogBert, in which a pretrained English language model is repurposed for protein sequence infilling, thus considering cross-language adaptation using less data. |
Igor Melnyk; Vijil Chenthamarakshan; Pin-Yu Chen; Payel Das; Amit Dhurandhar; Inkit Padhi; Devleena Das; |
185 | Text-To-Concept (and Back) Via Cross-Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe that the mapping between an image’s representation in one model to its representation in another can be learned surprisingly well with just a linear layer, even across diverse models. Building on this observation, we propose *text-to-concept*, where features from a fixed pretrained model are aligned linearly to the CLIP space, so that text embeddings from CLIP’s text encoder become directly comparable to the aligned features. |
Mazda Moayeri; Keivan Rezaei; Maziar Sanjabi; Soheil Feizi; |
186 | Multi-channel Autobidding with Budget and ROI Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Further, we show that the advertiser can achieve the global optimal conversion when she only optimizes over per-channel budgets. In light of this finding, under a bandit feedback setting that mimics real-world scenarios where advertisers have limited information on ad auctions in each channel and how channels procure ads, we present an efficient learning algorithm that produces per-channel budgets whose resulting conversion approximates that of the global optimal problem. |
Yuan Deng; Negin Golrezaei; Patrick Jaillet; Jason Cheuk Nam Liang; Vahab Mirrokni; |
187 | Statistical Indistinguishability of Learning Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the similarity of outcomes of learning rules through the lens of the Total Variation (TV) distance of distributions. |
Alkis Kalavasis; Amin Karbasi; Shay Moran; Grigoris Velegkas; |
188 | Online Learning with Feedback Graphs: The True Shape of Regret Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we define a new quantity $R^*$, called the *problem complexity*, and prove that the minimax regret is proportional to $R^*$ for any graph and time horizon $T$. |
Tomáš Kocák; Alexandra Carpentier; |
189 | $H$-Consistency Bounds for Pairwise Misranking Loss Surrogates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a detailed study of *$H$-consistency bounds* for score-based ranking. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
190 | From Perception to Programs: Regularize, Overparameterize, and Amortize Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We develop techniques for synthesizing neurosymbolic programs. Such programs mix discrete symbolic processing with continuous neural computation. We relax this mixed … |
Hao Tang; Kevin Ellis; |
191 | Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose learning to jump as a general recipe for generative modeling of various types of data. |
Tianqi Chen; Mingyuan Zhou; |
192 | Identifying Interpretable Subspaces in Image Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Automatic Feature Explanation using Contrasting Concepts (FALCON), an interpretability framework to explain features of image representations. |
Neha Kalibhat; Shweta Bhardwaj; C. Bayan Bruss; Hamed Firooz; Maziar Sanjabi; Soheil Feizi; |
193 | Neural Markov Jump Processes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we introduce an alternative, variational inference algorithm for Markov jump processes which relies on neural ordinary differential equations, and is trainable via back-propagation. |
Patrick Seifner; Ramses J Sanchez; |
194 | PFGM++: Unlocking The Potential of Physics-Inspired Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new family of physics-inspired generative models termed PFGM++ that unifies diffusion models and Poisson Flow Generative Models (PFGM). |
Yilun Xu; Ziming Liu; Yonglong Tian; Shangyuan Tong; Max Tegmark; Tommi S. Jaakkola; |
195 | Parallel Neurosymbolic Integration with Concordia Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Concordia, a framework overcoming the limitations of prior art. |
Jonathan Feldstein; Modestas Jurčius; Efthymia Tsamoura; |
196 | Conditional Tree Matching for Inference-Time Adaptation of Tree Prediction Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present CTreeOT, a convergent, differentiable algorithm for matching two trees when each tree is conditioned on some input. |
Harshit Varma; Abhijeet Awasthi; Sunita Sarawagi; |
197 | Uncertainty Estimation for Molecules: Desiderata and Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By our analysis, we conclude that none of the previous works satisfies all criteria. To fill this gap, we propose the Localized Neural Kernel (LNK), a Gaussian Process (GP)-based extension to existing GNNs that satisfies the desiderata. |
Tom Wollschläger; Nicholas Gao; Bertrand Charpentier; Mohamed Amine Ketata; Stephan Günnemann; |
198 | Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents two efficient hierarchical clustering (HC) algorithms with respect to Dasgupta’s cost function. |
Steinar Laenen; Bogdan Adrian Manghiuc; He Sun; |
199 | Posterior Sampling for Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Posterior Sampling for Reinforcement Learning is such a model-based algorithm that has attracted significant interest due to its performance in the tabular setting. This paper introduces Posterior Sampling for Deep Reinforcement Learning (PSDRL), the first truly scalable approximation of Posterior Sampling for Reinforcement Learning that retains its model-based essence. |
Remo Sasso; Michelangelo Conserva; Paulo Rauber; |
200 | Beam Tree Recursive Cells Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Beam Tree Recursive Cell (BT-Cell) – a backpropagation-friendly framework to extend Recursive Neural Networks (RvNNs) with beam search for latent structure induction. |
Jishnu Ray Chowdhury; Cornelia Caragea; |
201 | Revisiting Sampling for Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we revisit the idea of using sampling for combinatorial optimization, motivated by the significant recent advances of gradient-based discrete MCMC and new techniques for parallel neighborhood exploration on accelerators. |
Haoran Sun; Katayoon Goshvadi; Azade Nova; Dale Schuurmans; Hanjun Dai; |
202 | Memory-Based Meta-Learning on Non-Stationary Distributions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The goal of this work is to investigate how far this interpretation can be realized by current sequence prediction models and training regimes. |
Tim Genewein; Gregoire Deletang; Anian Ruoss; Li Kevin Wenliang; Elliot Catt; Vincent Dutordoir; Jordi Grau-Moya; Laurent Orseau; Marcus Hutter; Joel Veness; |
203 | Improved Online Learning Algorithms for CTR Prediction in Ad Auctions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the online learning problem of revenue maximization in ad auctions, where the seller needs to learn the click-through rates (CTRs) of each ad candidate and charge the price of the winner through a pay-per-click manner. |
Zhe Feng; Christopher Liaw; Zixin Zhou; |
204 | User-level Private Stochastic Convex Optimization with Optimal Rates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of differentially private (DP) stochastic convex optimization (SCO) under the notion of user-level differential privacy. |
Raef Bassily; Ziteng Sun; |
205 | The Value of Out-of-Distribution Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show a counter-intuitive phenomenon: the generalization error of a task can be a non-monotonic function of the amount of OOD data. |
Ashwin De Silva; Rahul Ramesh; Carey Priebe; Pratik Chaudhari; Joshua T Vogelstein; |
206 | SMURF-THP: Score Matching-based UnceRtainty QuantiFication for Transformer Hawkes Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the existing methods fail to provide uncertainty quantification for model predictions, e.g., a confidence interval for the predicted event’s arrival time. To address these issues, we propose SMURF-THP, a score-based method for learning the Transformer Hawkes process and quantifying prediction uncertainty. |
Zichong Li; Yanbo Xu; Simiao Zuo; Haoming Jiang; Chao Zhang; Tuo Zhao; Hongyuan Zha; |
207 | Linear Causal Disentanglement Via Interventions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study observed variables that are a linear transformation of a linear latent causal model. |
Chandler Squires; Anna Seigal; Salil S Bhate; Caroline Uhler; |
208 | Infusing Lattice Symmetry Priors in Attention Mechanisms for Sample-Efficient Abstract Geometric Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, our study motivates a modification to the standard attention mechanism, where attention weights are scaled using soft masks generated by a convolutional network. |
Mattia Atzeni; Mrinmaya Sachan; Andreas Loukas; |
209 | Counterfactual Identifiability of Bijective Causal Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study counterfactual identifiability in causal models with bijective generation mechanisms (BGM), a class that generalizes several widely-used causal models in the literature. |
Arash Nasr-Esfahany; Mohammad Alizadeh; Devavrat Shah; |
210 | Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions. |
Ainesh Bakshi; Allen Liu; Ankur Moitra; Morris Yau; |
211 | Exploring The Benefits of Training Expert Language Models Over Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we report surprising findings that show an expert LM trained on just a single task can outperform an MT LM trained with 300+ different tasks on 11 different unseen datasets and on 13 datasets of the BIG-bench benchmark by an average of 3.20% and 1.29%, respectively. |
Joel Jang; Seungone Kim; Seonghyeon Ye; Doyoung Kim; Lajanugen Logeswaran; Moontae Lee; Kyungjae Lee; Minjoon Seo; |
212 | E$(n)$ Equivariant Message Passing Simplicial Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents $\mathrm{E}(n)$ Equivariant Message Passing Simplicial Networks (EMPSNs), a novel approach to learning on geometric graphs and point clouds that is equivariant to rotations, translations, and reflections. |
Floor Eijkelboom; Rob Hesselink; Erik J Bekkers; |
213 | Generating Private Synthetic Data with Genetic Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, even when possible, these approaches impose a fundamental limitation in which modifications to the minimization problem become additional sources of error. Therefore, we propose Private-GSD, a private genetic algorithm based on *zeroth*-order optimization heuristics that do not require modifying the original objective; thus, it avoids the aforementioned limitations of first-order optimization. |
Terrance Liu; Jingwu Tang; Giuseppe Vietri; Steven Wu; |
214 | Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an evolution strategies-based algorithm for estimating gradients in unrolled computation graphs, called ES-Single. |
Paul Vicol; |
215 | Modeling Temporal Data As Continuous Functions with Stochastic Process Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we define suitable noise sources and introduce novel denoising and score-matching models. |
Marin Biloš; Kashif Rasul; Anderson Schneider; Yuriy Nevmyvaka; Stephan Günnemann; |
216 | Actor-Critic Alignment for Offline-to-Online Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they require online estimation of distribution divergence or density ratio. To avoid such complications, we propose deviating from existing actor-critic approaches that directly transfer the state-action value functions. |
Zishun Yu; Xinhua Zhang; |
217 | Flexible Phase Dynamics for Bio-Plausible Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These Contrastive Learning (CL) algorithms are traditionally implemented with rigid, temporally non-local, and periodic learning dynamics, which could limit the range of physical systems capable of harnessing CL. In this study, we build on recent work exploring how CL might be implemented by biological or neuromorphic systems and show that this form of learning can be made temporally local, and can still function even if many of the dynamical requirements of standard training procedures are relaxed. |
Ezekiel Williams; Colin Bredenberg; Guillaume Lajoie; |
218 | Optimal No-Regret Learning for One-Sided Lipschitz Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by applications in pricing and contract design, we study the maximization of one-sided Lipschitz functions, which only provide the (weaker) guarantee that they do not grow too quickly in one direction. |
Paul Duetting; Guru Guruganesh; Jon Schneider; Joshua Ruizhi Wang; |
219 | Understanding The Distillation Process from Deep Generative Models to Tractable Probabilistic Circuits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is still unclear what factors make this distillation work well. In this paper, we theoretically and empirically discover that the performance of a PC can exceed that of its teacher model. |
Xuejie Liu; Anji Liu; Guy Van den Broeck; Yitao Liang; |
220 | Neural FIM for Learning Fisher Information Metrics from Point Cloud Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data – allowing for a continuous manifold model for the data. |
Oluwadamilola Fasina; Guillaume Huguet; Alexander Tong; Yanlei Zhang; Guy Wolf; Maximilian Nickel; Ian Adelstein; Smita Krishnaswamy; |
221 | In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a solution, we introduce with NINCO a novel test OOD dataset, each sample checked to be ID free, which with its fine-grained range of OOD classes allows for a detailed analysis of an OOD detector’s strengths and failure modes, particularly when paired with a number of synthetic “OOD unit-tests”. We provide code and data at https://github.com/j-cb/NINCO. |
Julian Bitterwolf; Maximilian Müller; Matthias Hein; |
222 | Variational Mixture of HyperGenerators for Learning Distributions Over Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel deep generative model, named VaMoH. |
Batuhan Koyuncu; Pablo Sanchez Martin; Ignacio Peis; Pablo M. Olmos; Isabel Valera; |
223 | Robust Subtask Learning for Compositional Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Typically, a high-level task is decomposed into a sequence of subtasks and a separate policy is trained to perform each subtask. In this paper, we focus on the problem of training subtask policies in a way that they can be used to perform any task; here, a task is given by a sequence of subtasks. |
Kishor Jothimurugan; Steve Hsu; Osbert Bastani; Rajeev Alur; |
224 | Strategic Classification with Unknown User Manipulations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel batch-learning setting in which we use unlabeled data from previous rounds to estimate the manipulation structure. |
Tosca Lechner; Ruth Urner; Shai Ben-David; |
225 | LSDS++ : Dual Sampling for Accelerated K-means++ Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new variant named LSDS++, which improves the sampling efficiency of LocalSearch++ via a strategy called dual sampling. |
Chenglin Fan; Ping Li; Xiaoyun Li; |
226 | Data-OOB: Out-of-bag Estimate As A Simple and Efficient Data Value Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a result, it has been recognized as infeasible to apply to large datasets. To address this issue, we propose Data-OOB, a new data valuation method for a bagging model that utilizes the out-of-bag estimate. |
Yongchan Kwon; James Zou; |
227 | From Adaptive Query Release to Machine Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formalize the problem of machine unlearning as design of efficient unlearning algorithms corresponding to learning algorithms which perform a selection of adaptive queries from structured query classes. |
Enayat Ullah; Raman Arora; |
228 | Label Differential Privacy and Private Training Data Release Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our goal is to enable learning of an accurate predictive model while protecting the privacy of each user’s label. |
Robert Istvan Busa-Fekete; Andres Munoz Medina; Umar Syed; Sergei Vassilvitskii; |
229 | Learning Dense Correspondences Between Photos and Sketches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: What are the computational ingredients needed to support this ability? Towards answering this question, we make two contributions: first, we introduce a new sketch-photo correspondence benchmark, PSC6k, containing 150K annotations of 6250 sketch-photo pairs across 125 object categories, augmenting the existing Sketchy dataset with fine-grained correspondence metadata. Second, we propose a self-supervised method for learning dense correspondences between sketch-photo pairs, building upon recent advances in correspondence learning for pairs of photos. |
Xuanchen Lu; Xiaolong Wang; Judith E Fan; |
230 | Conformalization of Sparse Generalized Linear Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on a sparse linear model with only a subset of variables for prediction and use numerical continuation techniques to approximate the solution path efficiently. |
Etash Kumar Guha; Eugene Ndiaye; Xiaoming Huo; |
231 | Iterative Approximate Cross-Validation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new paradigm to efficiently approximate CV when the ERM problem is solved via an iterative first-order algorithm, without running until convergence. |
Yuetian Luo; Zhimei Ren; Rina Barber; |
232 | DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, inspired by the convention in pharmaceutical practice, we decompose the ligand molecule into two parts, namely arms and scaffold, and propose a new diffusion model, DecompDiff, with decomposed priors over arms and scaffold. |
Jiaqi Guan; Xiangxin Zhou; Yuwei Yang; Yu Bao; Jian Peng; Jianzhu Ma; Qiang Liu; Liang Wang; Quanquan Gu; |
233 | Correcting Discount-factor Mismatch in On-policy Policy Gradient Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel distribution correction to account for the discounted stationary distribution that can be plugged into many existing gradient estimators. |
Fengdi Che; Gautham Vasan; A. Rupam Mahmood; |
234 | Locally Regularized Neural Differential Equations: Some Black Boxes Were Meant to Remain Closed! Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this manuscript, *we use internal cost heuristics of adaptive differential equation solvers at stochastic time-points to guide the training towards learning a dynamical system that is easier to integrate*. |
Avik Pal; Alan Edelman; Christopher Vincent Rackauckas; |
235 | SinDDM: A Single Image Denoising Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we introduce a framework for training a DDM on a single image. |
Vladimir Kulikov; Shahar Yadin; Matan Kleiner; Tomer Michaeli; |
236 | DRew: Dynamically Rewired Message Passing with Delay Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose a framework, applicable to any MPNN architecture, that performs a layer-dependent rewiring to ensure gradual densification of the graph. |
Benjamin Gutteridge; Xiaowen Dong; Michael M. Bronstein; Francesco Di Giovanni; |
237 | Masked Trajectory Models for Prediction, Representation, and Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Masked Trajectory Models (MTM) as a generic abstraction for sequential decision making. |
Philipp Wu; Arjun Majumdar; Kevin Stone; Yixin Lin; Igor Mordatch; Pieter Abbeel; Aravind Rajeswaran; |
238 | Predictive Flows for Faster Ford-Fulkerson Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent work has shown that leveraging learned predictions can improve the running time of algorithms for bipartite matching and similar combinatorial problems. In this work, we build on this idea to improve the performance of the widely used Ford-Fulkerson algorithm for computing maximum flows by seeding Ford-Fulkerson with predicted flows. |
Sami Davies; Benjamin Moseley; Sergei Vassilvitskii; Yuyan Wang; |
239 | GLOBE-CE: A Translation Based Approach for Global Counterfactual Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, practitioners are requesting more efficient and interactive explainability tools. We take this opportunity to propose Global & Efficient Counterfactual Explanations (GLOBE-CE), a flexible framework that tackles the reliability and scalability issues associated with current state-of-the-art, particularly on higher dimensional datasets and in the presence of continuous features. |
Dan Ley; Saumitra Mishra; Daniele Magazzeni; |
240 | In Search of Insights, Not Magic Bullets: Towards Demystification of The Model Selection Dilemma in Heterogeneous Treatment Effect Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While some solutions have recently been investigated, systematic understanding of the strengths and weaknesses of different model selection criteria is still lacking. In this paper, instead of attempting to declare a global “winner”, we therefore empirically investigate success- and failure modes of different selection criteria. |
Alicia Curth; Mihaela van der Schaar; |
241 | Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel dIffusion language modEl pre-training framework for text generation, which we call GENIE. |
Zhenghao Lin; Yeyun Gong; yelong shen; Tong Wu; Zhihao Fan; Chen Lin; Nan Duan; Weizhu Chen; |
242 | Multisample Flow Matching: Straightening Flows with Minibatch Couplings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Multisample Flow Matching, a more general framework that uses non-trivial couplings between data and noise samples while satisfying the correct marginal constraints. |
Aram-Alexandre Pooladian; Heli Ben-Hamu; Carles Domingo-Enrich; Brandon Amos; Yaron Lipman; Ricky T. Q. Chen; |
243 | Active Causal Structure Learning with Advice Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the problem of active causal structure learning with advice. |
Davin Choo; Themistoklis Gouleakis; Arnab Bhattacharyya; |
244 | ReDi: Efficient Learning-Free Diffusion Inference Via Trajectory Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To accelerate the inference, we propose ReDi, a simple yet learning-free Retrieval-based Diffusion sampling framework. |
Kexun Zhang; Xianjun Yang; William Yang Wang; Lei Li; |
245 | Looped Transformers As Programmable Computers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. |
Angeliki Giannou; Shashank Rajput; Jy-yong Sohn; Kangwook Lee; Jason D. Lee; Dimitris Papailiopoulos; |
246 | Alternately Optimized Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new optimization framework for semi-supervised learning on graphs from a multi-view learning perspective. |
Haoyu Han; Xiaorui Liu; Haitao Mao; MohamadAli Torkamani; Feng Shi; Victor Lee; Jiliang Tang; |
247 | Graph Inductive Biases in Transformers Without Message Passing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the other hand, Graph Transformers without message-passing often perform poorly on smaller datasets, where inductive biases are more crucial. To bridge this gap, we propose the Graph Inductive bias Transformer (GRIT) — a new Graph Transformer that incorporates graph inductive biases without using message passing. |
Liheng Ma; Chen Lin; Derek Lim; Adriana Romero-Soriano; Puneet K. Dokania; Mark Coates; Philip Torr; Ser-Nam Lim; |
248 | Complexity of Block Coordinate Descent with Proximal Regularization and Applications to Wasserstein CP-dictionary Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the block coordinate descent methods of Gauss-Seidel type with proximal regularization (BCD-PR), which is a classical method of minimizing general nonconvex objectives under constraints that has a wide range of practical applications. |
Dohyun Kwon; Hanbaek Lyu; |
249 | Neural Wave Machines: Learning Spatiotemporally Structured Representations with Locally Coupled Oscillatory Recurrent Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: An intriguing hypothesis is that traveling waves serve to structure neural representations both in space and time, thereby acting as an inductive bias towards natural data. In this work, we investigate this hypothesis by introducing the Neural Wave Machine (NWM) — a locally coupled oscillatory recurrent neural network capable of exhibiting traveling waves in its hidden state. |
T. Anderson Keller; Max Welling; |
250 | Principled Acceleration of Iterative Numerical Methods Using Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a systematic study of these approaches and how they differ from meta-learning is lacking. In this paper, we propose a framework to analyze such learning-based acceleration approaches, where one can immediately identify a departure from classical meta-learning. |
Sohei Arisaka; Qianxiao Li; |
251 | Efficient and Equivariant Graph Networks for Predicting Quantum Hamiltonian Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a SE(3)-equivariant network, named QHNet, that achieves efficiency and equivariance. |
Haiyang Yu; Zhao Xu; Xiaofeng Qian; Xiaoning Qian; Shuiwang Ji; |
252 | Measuring The Impact of Programming Language Distribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, we present the BabelCode framework for execution-based evaluation of any benchmark in any language. |
Gabriel Orlanski; Kefan Xiao; Xavier Garcia; Jeffrey Hui; Joshua Howland; Jonathan Malmaud; Jacob Austin; Rishabh Singh; Michele Catasta; |
253 | LoSparse: Structured Compression of Large Language Models Based on Low-Rank and Sparse Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Transformer models have achieved remarkable results in various natural language tasks, but they are often prohibitively large, requiring massive memories and computational resources. To reduce the size and complexity of these models, we propose LoSparse (Low-Rank and Sparse approximation), a novel model compression technique that approximates a weight matrix by the sum of a low-rank matrix and a sparse matrix. |
Yixiao Li; Yifan Yu; Qingru Zhang; Chen Liang; Pengcheng He; Weizhu Chen; Tuo Zhao; |
254 | Node Embedding from Neural Hamiltonian Orbits in Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we model the embedding update of a node feature as a Hamiltonian orbit over time. |
Qiyu Kang; Kai Zhao; Yang Song; Sijie Wang; Wee Peng Tay; |
255 | Multi-Task Differential Privacy Under Distribution Skew Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give a systematic analysis of the problem, by studying how to optimally allocate a user’s privacy budget among tasks. We propose a generic algorithm, based on an adaptive reweighting of the empirical loss, and show that in the presence of distribution skew, this gives a quantifiable improvement of excess empirical risk. |
Walid Krichene; Prateek Jain; Shuang Song; Mukund Sundararajan; Abhradeep Guha Thakurta; Li Zhang; |
256 | A/B Testing in Network Data with Covariate-Adaptive Randomization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new adaptive procedure to balance both the network and the covariates. |
Jialu Wang; Ping Li; Feifang Hu; |
257 | Towards Bridging The Gaps Between The Right to Explanation and The Right to Be Forgotten Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Intuitively, enforcing the right to be forgotten may trigger model updates which in turn invalidate previously provided explanations, thus violating the right to explanation. In this work, we investigate the technical implications arising due to the interference between the two aforementioned regulatory principles, and propose the first algorithmic framework to resolve the tension between them. |
Satyapriya Krishna; Jiaqi Ma; Himabindu Lakkaraju; |
258 | General Covariance Data Augmentation for Neural PDE Solvers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To alleviate the problem, we propose a computationally cheap augmentation strategy based on general covariance and simple random coordinate transformations. |
Fanaskov Vladimir; Tianchi Yu; Alexander Rudikov; Ivan Oseledets; |
259 | Deep Anomaly Detection Under Labeling Budget Constraints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we determine a set of theoretical conditions under which anomaly scores generalize from labeled queries to unlabeled data. |
Aodong Li; Chen Qiu; Marius Kloft; Padhraic Smyth; Stephan Mandt; Maja Rudolph; |
260 | On Kinetic Optimal Probability Paths for Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we investigate the space of Gaussian probability paths, which includes diffusion paths as an instance, and look for an optimal member in some useful sense. |
Neta Shaul; Ricky T. Q. Chen; Maximilian Nickel; Matthew Le; Yaron Lipman; |
261 | Off-Policy Average Reward Actor-Critic with Deterministic Policy Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion. |
Naman Saxena; Subhojyoti Khastagir; Shishir N Y; Shalabh Bhatnagar; |
262 | Coarse-to-Fine: A Hierarchical Diffusion Model for Molecule Generation in 3D Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Fragment-based molecule generation is a promising strategy, however, it is nontrivial to be adapted for 3D non-autoregressive generations because of the combinational optimization problems. In this paper, we utilize a coarse-to-fine strategy to tackle this problem, in which a Hierarchical Diffusion-based model (i.e. HierDiff) is proposed to preserve the validity of local segments without relying on autoregressive modeling. |
Bo Qiang; Yuxuan Song; Minkai Xu; Jingjing Gong; Bowen Gao; Hao Zhou; Wei-Ying Ma; Yanyan Lan; |
263 | PASTA: Pessimistic Assortment Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, based on the principle of pessimism, we propose a novel algorithm called Pessimistic ASsortment opTimizAtion (PASTA for short), which can correctly identify the optimal assortment by only requiring the offline data to cover the optimal assortment under general settings. |
Juncheng Dong; Weibin Mo; Zhengling Qi; Cong Shi; Ethan X Fang; Vahid Tarokh; |
264 | Near-optimal Conservative Exploration in Reinforcement Learning Under Episode-wise Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates conservative exploration in reinforcement learning where the performance of the learning agent is guaranteed to be above a certain threshold throughout the learning process. |
Donghao Li; Ruiquan Huang; Cong Shen; Jing Yang; |
265 | Double-Weighting for Covariate Shift Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the performance of such approaches can be poor under support mismatch or when the above ratios take large values. We propose a minimax risk classification (MRC) approach for covariate shift adaptation that avoids such limitations by weighting both training and testing samples. |
Jose Ignacio Segovia; Santiago Mazuelas; Anqi Liu; |
266 | Near-Optimal Algorithms for Private Online Optimization in The Realizable Regime Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider online learning problems in the realizable setting, where there is a zero-loss solution, and propose new Differentially Private (DP) algorithms that obtain near-optimal regret bounds. |
Hilal Asi; Vitaly Feldman; Tomer Koren; Kunal Talwar; |
267 | Leveraging Proxy of Training Data for Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose two lightweight yet informative proxies of the training data and a TTA method fully exploiting them. |
Juwon Kang; Nayeong Kim; Donghyeon Kwon; Jungseul Ok; Suha Kwak; |
268 | The Monge Gap: A Regularizer to Learn All Transport Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More generally, we question the relevance of using Brenier’s result, which only applies to densities, to constrain the architecture of candidate maps fitted on samples. Motivated by these limitations, we propose a radically different approach to estimating OT maps: Given a cost $c$ and a reference measure $\rho$, we introduce a regularizer, the Monge gap $\mathcal{M}^c_{\rho}(T)$ of a map $T$. |
Théo Uscidda; Marco Cuturi; |
269 | Dirichlet Diffusion Score Model for Biological Sequence Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To develop generative SDE models for discrete data such as biological sequences, here we introduce a diffusion process defined in the probability simplex space with stationary distribution being the Dirichlet distribution. |
Pavel Avdeyev; Chenlai Shi; Yuhao Tan; Kseniia Dudnyk; Jian Zhou; |
270 | Compositional Score Modeling for Simulation-Based Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new method based on conditional score modeling that enjoys the benefits of both approaches. |
Tomas Geffner; George Papamakarios; Andriy Mnih; |
271 | Local Vertex Colouring Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we investigate the expressivity of GNNs from the perspective of graph search. |
Shouheng Li; Dongwoo Kim; Qing Wang; |
272 | The Flan Collection: Designing Data and Methods for Effective Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the design decision of publicly available instruction tuning methods, by reproducing and breaking down the development of Flan 2022 (Chung et al., 2022). |
Shayne Longpre; Le Hou; Tu Vu; Albert Webson; Hyung Won Chung; Yi Tay; Denny Zhou; Quoc V Le; Barret Zoph; Jason Wei; Adam Roberts; |
273 | Diffusion Models for Black-Box Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Denoising Diffusion Optimization Models (DDOM), a new inverse approach for offline black-box optimization based on diffusion models. |
Siddarth Krishnamoorthy; Satvik Mehul Mashkaria; Aditya Grover; |
274 | NUNO: A General Framework for Learning Parametric PDEs with Non-Uniform Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, when faced with real-world physical data, which are often highly non-uniformly distributed, it is challenging to use mesh-based techniques such as the FFT. To address this, we introduce the Non-Uniform Neural Operator (NUNO), a comprehensive framework designed for efficient operator learning with non-uniform data. |
Songming Liu; Zhongkai Hao; Chengyang Ying; Hang Su; Ze Cheng; Jun Zhu; |
275 | Unit Scaling: Out-of-the-Box Low-Precision Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present unit scaling, a paradigm for designing deep learning models that simplifies the use of low-precision number formats. |
Charlie Blake; Douglas Orr; Carlo Luschi; |
276 | Polarity Is All You Need to Learn and Transfer Faster Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we investigate the role of weight polarity: development processes initialize NIs with advantageous polarity configurations; as NIs grow and learn, synapse magnitudes update, yet polarities are largely kept unchanged. |
Qingyang Wang; Michael Alan Powell; Eric W Bridgeford; Ali Geisa; Joshua T Vogelstein; |
277 | Training-Free Neural Active Learning with Initialization-Robustness Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce our expected variance with Gaussian processes (EV-GP) criterion for neural active learning, which is theoretically guaranteed to select data points which lead to trained NNs with both (a) good predictive performances and (b) initialization robustness. |
Apivich Hemachandra; Zhongxiang Dai; Jasraj Singh; See-Kiong Ng; Bryan Kian Hsiang Low; |
278 | Generative Decoding of Visual Stimuli Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by that fact, we introduce a novel neural network architecture for the problem of neural decoding. |
Eleni Miliotou; Panagiotis Kyriakis; Jason D Hinman; Andrei Irimia; Paul Bogdan; |
279 | Forget Unlearning: Towards True Data-Deletion in Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these, we propose a sound deletion guarantee and show that ensuring the privacy of existing records is necessary for the privacy of deleted records. |
Rishav Chourasia; Neil Shah; |
280 | Learning Rate Schedules in The Presence of Distribution Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift, and give upper and lower bounds for the regret that only differ by constants. |
Matthew Fahrbach; Adel Javanmard; Vahab Mirrokni; Pratik Worah; |
281 | Extending Conformal Prediction to Hidden Markov Models with Exact Validity Via De Finetti’s Theorem for Markov Chains Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We generalize conformal prediction to the Hidden Markov Model (HMM) framework where the assumption of exchangeability is not valid. |
Buddhika Nettasinghe; Samrat Chatterjee; Ramakrishna Tipireddy; Mahantesh M Halappanavar; |
282 | Gradient Descent Converges Linearly for Logistic Regression on Separable Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that running gradient descent with variable learning rate guarantees loss $f(x) \leq 1.1 \cdot f(x^*) + \epsilon$ for the logistic regression objective, where the error $\epsilon$ decays exponentially with the number of iterations and polynomially with the magnitude of the entries of an arbitrary fixed solution $x^*$. |
Kyriakos Axiotis; Maxim Sviridenko; |
283 | Approximate Stein Classes for Truncated Density Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose approximate Stein classes, which in turn lead to a relaxed Stein identity for truncated density estimation. |
Daniel James Williams; Song Liu; |
284 | Pareto Manifold Learning: Tackling Multiple Tasks Via Ensembles of Single-task Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we conjecture that the Pareto Front admits a linear parameterization in parameter space, which leads us to propose *Pareto Manifold Learning*, an ensembling method in weight space. |
Nikolaos Dimitriadis; Pascal Frossard; François Fleuret; |
285 | Mitigating Propagation Failures in Physics-informed Neural Networks Using Retain-Resample-Release (R3) Sampling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide a novel perspective of failure modes of PINNs by hypothesizing that training PINNs relies on successful propagation of solution from initial and/or boundary condition points to interior points. |
Arka Daw; Jie Bu; Sifan Wang; Paris Perdikaris; Anuj Karpatne; |
286 | DualHSIC: HSIC-Bottleneck and Alignment for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By appropriately leveraging inter-task relationships, we propose a novel CL method, named DualHSIC, to boost the performance of existing rehearsal-based methods in a simple yet effective way. |
Zifeng Wang; Zheng Zhan; Yifan Gong; Yucai Shao; Stratis Ioannidis; Yanzhi Wang; Jennifer Dy; |
287 | Certifying Ensembles: A General Certification Theory with S-Lipschitzness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we generalise Lipschitz continuity by introducing S-Lipschitz classifiers, which we use to analyse the theoretical robustness of ensembles. |
Aleksandar Petrov; Francisco Eiras; Amartya Sanyal; Philip Torr; Adel Bibi; |
288 | Regions of Reliability in The Evaluation of Multivariate Probabilistic Forecasts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide the first systematic finite-sample study of proper scoring rules for time series forecasting evaluation. |
Étienne Marcotte; Valentina Zantedeschi; Alexandre Drouin; Nicolas Chapados; |
289 | AbODE: Ab Initio Antibody Design Using Conjoined ODEs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this co-design of the amino acid sequence and the 3D structure subsumes and accentuates some central challenges from multiple tasks including protein folding (sequence to structure), inverse folding (structure to sequence), and docking (binding). We strive to surmount these challenges with a new generative model AbODE that extends graph PDEs to accommodate both contextual information and external interactions. |
Yogesh Verma; Markus Heinonen; Vikas Garg; |
290 | TGRL: An Algorithm for Teacher Guided Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Teacher Guided Reinforcement Learning (TGRL), a principled approach to dynamically balance following the teacher’s guidance and leveraging RL. |
Idan Shenfeld; Zhang-Wei Hong; Aviv Tamar; Pulkit Agrawal; |
291 | Learning-augmented Private Algorithms for Multiple Quantile Release Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When applying differential privacy to sensitive data, we can often improve performance using external information such as other sensitive data, public data, or human priors. We propose to use the learning-augmented algorithms (or algorithms with predictions) framework—previously applied largely to improve time complexity or competitive ratios—as a powerful way of designing and analyzing privacy-preserving methods that can take advantage of such external information to improve utility. |
Mikhail Khodak; Kareem Amin; Travis Dick; Sergei Vassilvitskii; |
292 | Adaptively Weighted Data Augmentation Consistency Regularization for Robust Optimization Under Concept Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To exploit data features of label-sparse samples more efficiently, we propose an adaptively weighted online optimization algorithm — AdaWAC — to incorporate data augmentation consistency regularization in sample reweighting. |
Yijun Dong; Yuege Xie; Rachel Ward; |
293 | Spatial Implicit Neural Representations for Global-Scale Species Mapping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical range of 47k species simultaneously. |
Elijah Cole; Grant Van Horn; Christian Lange; Alexander Shepard; Patrick Leary; Pietro Perona; Scott Loarie; Oisin Mac Aodha; |
294 | Adaptive Coordination in Social Embodied Rearrangement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prior ZSC approaches struggle to generalize in our complex and visually rich setting, and on further analysis, we find that they fail to generate diverse coordination behaviors at training time. To counter this, we propose Behavior Diversity Play (BDP), a novel ZSC approach that encourages diversity through a discriminability objective. |
Andrew Szot; Unnat Jain; Dhruv Batra; Zsolt Kira; Ruta Desai; Akshara Rai; |
295 | OMS-DPM: Optimizing The Model Schedule for Diffusion Probabilistic Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we reveal an overlooked dimension—model schedule—for optimizing the trade-off between generation quality and speed. |
Enshu Liu; Xuefei Ning; Zinan Lin; Huazhong Yang; Yu Wang; |
296 | Prefer to Classify: Improving Text Classifiers Via Auxiliary Preference Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences. |
Jaehyung Kim; Jinwoo Shin; Dongyeop Kang; |
297 | On Preemption and Learning in Stochastic Scheduling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study single-machine scheduling of jobs, each belonging to a job type that determines its duration distribution. |
Nadav Merlis; Hugo Richard; Flore Sentenac; Corentin Odic; Mathieu Molina; Vianney Perchet; |
298 | A Modern Look at The Relationship Between Sharpness and Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: But does it really capture generalization in modern practical settings? We comprehensively explore this question in a detailed study of various definitions of adaptive sharpness in settings ranging from training from scratch on ImageNet and CIFAR-10 to fine-tuning CLIP on ImageNet and BERT on MNLI. |
Maksym Andriushchenko; Francesco Croce; Maximilian Müller; Matthias Hein; Nicolas Flammarion; |
299 | Linear CNNs Discover The Statistical Structure of The Dataset Using Only The Most Dominant Frequencies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We here present a stepping stone towards a deeper understanding of convolutional neural networks (CNNs) in the form of a theory of learning in linear CNNs. |
Hannah Pinson; Joeri Lenaerts; Vincent Ginis; |
300 | Properties of The Mallows Model Depending on The Number of Alternatives: A Warning for An Experimentalist Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We empirically and theoretically analyze how the properties of rankings sampled from the Mallows model change when increasing the number of alternatives. |
Niclas Boehmer; Piotr Faliszewski; Sonja Kraiczy; |
301 | Improved Analysis of Score-based Generative Modeling: User-Friendly Bounds Under Minimal Smoothness Assumptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give an improved theoretical analysis of score-based generative modeling. |
Hongrui Chen; Holden Lee; Jianfeng Lu; |
302 | Extending Kernel PCA Through Dualization: Sparsity, Robustness and Fast Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The goal of this paper is to revisit Kernel Principal Component Analysis (KPCA) through dualization of a difference of convex functions. |
Francesco Tonin; Alex Lambert; Panagiotis Patrinos; Johan Suykens; |
303 | Horizon-free Learning for Markov Decision Processes and Games: Stochastically Bounded Rewards and Improved Bounds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We establish a novel generic algorithm that achieves *no-horizon dependence* in terms of sample complexity for both Markov Decision Processes (MDP) and Games, via reduction to a well-conditioned *auxiliary Markovian environment*, in which only “important” state-action pairs are preserved. |
Shengshi Li; Lin Yang; |
304 | Accelerated Primal-Dual Methods for Convex-Strongly-Concave Saddle Point Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate a primal-dual (PD) method for the saddle point problem (SPP) that uses a linear approximation of the primal function instead of the standard proximal step, resulting in a linearized PD (LPD) method. |
Mohammad Khalafi; Digvijay Boob; |
305 | Are Random Decompositions All We Need in High Dimensional Bayesian Optimisation? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We find that data-driven learners of decompositions can be easily misled towards local decompositions that do not hold globally across the search space. Then, we formally show that a random tree-based decomposition sampler exhibits favourable theoretical guarantees that effectively trade off maximal information gain and functional mismatch between the actual black-box and its surrogate as provided by the decomposition. |
Juliusz Krzysztof Ziomek; Haitham Bou Ammar; |
306 | Individually Fair Learning with One-Sided Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider an online learning problem with one-sided feedback, in which the learner is able to observe the true label only for positively predicted instances. |
Yahav Bechavod; Aaron Roth; |
307 | User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a probabilistic approximation scheme for the conditional score function which provably converges to the true distribution as the noise level decreases. |
Marc Anton Finzi; Anudhyan Boral; Andrew Gordon Wilson; Fei Sha; Leonardo Zepeda-Nunez; |
308 | Exphormer: Sparse Transformers for Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Exphormer, a framework for building powerful and scalable graph transformers. |
Hamed Shirzad; Ameya Velingker; Balaji Venkatachalam; Danica J. Sutherland; Ali Kemal Sinop; |
309 | Supervised Metric Learning to Rank for Retrieval Via Contextual Similarity Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Many metric learning loss functions focus on learning a correct ranking of training samples, but strongly overfit semantically inconsistent labels and require a large amount of data. To address these shortcomings, we propose a new metric learning method, called contextual loss, which optimizes contextual similarity in addition to cosine similarity. |
Christopher Liao; Theodoros Tsiligkaridis; Brian Kulis; |
310 | Optimal Stochastic Non-smooth Non-convex Optimization Through Online-to-Non-convex Conversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present new algorithms for optimizing non-smooth, non-convex stochastic objectives based on a novel analysis technique. |
Ashok Cutkosky; Harsh Mehta; Francesco Orabona; |
311 | Federated Heavy Hitter Recovery Under Linear Sketching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose efficient algorithms based on local subsampling and invertible bloom look-up tables (IBLTs). |
Adria Gascon; Peter Kairouz; Ziteng Sun; Ananda Theertha Suresh; |
312 | Coupled Variational Autoencoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the Coupled Variational Auto-Encoder (C-VAE), which formulates the VAE problem as one of Optimal Transport (OT) between the prior and data distributions. |
Xiaoran Hao; Patrick Shafto; |
313 | Conditionally Strongly Log-Concave Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: There is a growing gap between the impressive results of deep image generative models and classical algorithms that offer theoretical guarantees. The former suffer from mode collapse or memorization issues, limiting their application to scientific data. The latter require restrictive assumptions such as log-concavity to escape the curse of dimensionality. We partially bridge this gap by introducing conditionally strongly log-concave (CSLC) models, which factorize the data distribution into a product of conditional probability distributions that are strongly log-concave. |
Florentin Guth; Etienne Lempereur; Joan Bruna; Stéphane Mallat; |
314 | BNN-DP: Robustness Certification of Bayesian Neural Networks Via Dynamic Programming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce BNN-DP, an efficient algorithmic framework for analysis of adversarial robustness of Bayesian Neural Networks (BNNs). |
Steven Adams; Andrea Patane; Morteza Lahijanian; Luca Laurenti; |
315 | Nonlinear Causal Discovery with Latent Confounders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a nonlinear causal model involving hidden confounders. |
David Kaltenpoth; Jilles Vreeken; |
316 | Toward Efficient Gradient-Based Value Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To resolve the adverse effect of poor conditioning of MSBE on gradient based methods, we propose a low complexity batch-free proximal method that approximately follows the Gauss-Newton direction and is asymptotically robust to parameterization. |
Arsalan Sharifnassab; Richard S. Sutton; |
317 | RACE: Improve Multi-Agent Reinforcement Learning with Representation Asymmetry and Collaborative Evolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a hybrid framework, Representation Asymmetry and Collaboration Evolution (RACE), which combines EA and MARL for efficient collaboration. |
Pengyi Li; Jianye Hao; Hongyao Tang; Yan Zheng; Xian Fu; |
318 | Revisiting Data-Free Knowledge Distillation with Poisoned Teachers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we make the first effort to uncover the security risk of data-free KD w.r.t. untrusted pre-trained models. |
Junyuan Hong; Yi Zeng; Shuyang Yu; Lingjuan Lyu; Ruoxi Jia; Jiayu Zhou; |
319 | Fast As CHITA: Neural Network Pruning with Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel optimization-based pruning framework that considers the combined effect of pruning (and updating) multiple weights subject to a sparsity constraint. |
Riade Benbaki; Wenyu Chen; Xiang Meng; Hussein Hazimeh; Natalia Ponomareva; Zhe Zhao; Rahul Mazumder; |
320 | Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In recent years, Mixup has become a standard primitive used in the training of state-of-the-art image classification models due to its demonstrated benefits over empirical risk minimization with regards to generalization and robustness. In this work, we try to explain some of this success from a feature learning perspective. |
Muthu Chidambaram; Xiang Wang; Chenwei Wu; Rong Ge; |
321 | Dual Propagation: Accelerating Contrastive Hebbian Learning with Dyadic Neurons Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by lifted neural networks and compartmental neuron models we propose a simple energy based compartmental neuron model, termed dual propagation, in which each neuron is a dyad with two intrinsic states. |
Rasmus Høier; D. Staudt; Christopher Zach; |
322 | Identifying Useful Learnwares for Heterogeneous Label Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we make an attempt to improve the effectiveness of RKME specification for heterogeneous label spaces, where the learnware market does not contain a model that has the same label space as the user’s task, by considering a class-specific model specification explicitly, along with a class-wise learnware identification method. |
Lan-Zhe Guo; Zhi Zhou; Yu-Feng Li; Zhi-Hua Zhou; |
323 | Linear Time GPs for Inferring Latent Trajectories from Neural Spike Trains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose cvHM, a general inference framework for latent GP models leveraging Hida-Matérn kernels and conjugate computation variational inference (CVI). |
Matthew Dowling; Yuan Zhao; Il Memming Park; |
324 | Fair Neighbor Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a framework of fair neighbor embedding, the Fair Neighbor Retrieval Visualizer, which formulates fair nonlinear dimensionality reduction as an information retrieval task whose performance and fairness are quantified by information retrieval criteria. |
Jaakko Peltonen; Wen Xu; Timo Nummenmaa; Jyrki Nummenmaa; |
325 | Blackout Diffusion: Generative Diffusion Models in Discrete-State Spaces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we develop a theoretical formulation for arbitrary discrete-state Markov processes in the forward diffusion process using exact (as opposed to variational) analysis. |
Javier E. Santos; Zachary R Fox; Nicholas Lubbers; Yen Ting Lin; |
326 | Few-Sample Feature Selection Via Feature Manifold Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a new method for few-sample supervised feature selection (FS). |
David Cohen; Tal Shnitzer; Yuval Kluger; Ronen Talmon; |
327 | Generalized Reductions: Making Any Hierarchical Clustering Fair and Balanced with Low Cost Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work vastly improves the previous $O(n^{5/6}\,\mathrm{poly}\log(n))$ fair approximation for cost to a near polylogarithmic $O(n^\delta \,\mathrm{poly}\log(n))$ fair approximation for any constant $\delta\in(0,1)$. |
Marina Knittel; Max Springer; John P Dickerson; MohammadTaghi Hajiaghayi; |
328 | Distribution Free Prediction Sets for Node Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show through experiments on standard benchmark datasets using popular GNN models that our approach provides tighter and better calibrated prediction sets than a naive application of conformal prediction. |
Jase Clarkson; |
329 | Bandit Online Linear Optimization with Hints and Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study variants of the online linear optimization (OLO) problem with bandit feedback, where the algorithm has access to external information about the unknown cost vector. |
Aditya Bhaskara; Ashok Cutkosky; Ravi Kumar; Manish Purohit; |
330 | Federated Online and Bandit Convex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We aim to minimize the average regret on $M$ machines working in parallel over $T$ rounds with $R$ intermittent communications. |
Kumar Kshitij Patel; Lingxiao Wang; Aadirupa Saha; Nathan Srebro; |
331 | Test-time Adaptation with Slot-Centric Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent slot-centric generative models attempt to decompose scenes into entities in a self-supervised manner by reconstructing pixels. Drawing upon these two lines of work, we propose Slot-TTA, a semi-supervised slot-centric scene decomposition model that at test time is adapted per scene through gradient descent on reconstruction or cross-view synthesis objectives. |
Mihir Prabhudesai; Anirudh Goyal; Sujoy Paul; Sjoerd van Steenkiste; Mehdi S. M. Sajjadi; Gaurav Aggarwal; Thomas Kipf; Deepak Pathak; Katerina Fragkiadaki; |
332 | Representation-Driven Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a representation-driven framework for reinforcement learning. |
Ofir Nabati; Guy Tennenholtz; Shie Mannor; |
333 | Improving L1-Certified Robustness Via Randomized Smoothing By Leveraging Box Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current techniques are not able to utilize the fact that any adversarial example has to lie in the image space, that is $[0,1]^d$; otherwise, one can trivially detect it. To address this suboptimality, we derive new certification formulae which lead to significant improvements in the certified $\ell_1$-robustness without the need of adapting the classifiers or change of smoothing distributions. |
Vaclav Voracek; Matthias Hein; |
334 | LIV: Language-Image Representations and Rewards for Robotic Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Language-Image Value learning (LIV), a unified objective for vision-language representation and reward learning from action-free videos with text annotations. |
Yecheng Jason Ma; Vikash Kumar; Amy Zhang; Osbert Bastani; Dinesh Jayaraman; |
335 | MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present MultiDiffusion, a unified framework that enables versatile and controllable image generation, using a pre-trained text-to-image diffusion model, without any further training or finetuning. |
Omer Bar-Tal; Lior Yariv; Yaron Lipman; Tali Dekel; |
336 | On The Relationship Between Explanation and Prediction: A Causal View Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More specifically, we study the relationship between E and Y by measuring the treatment effect when intervening on their causal ancestors, i.e., on hyperparameters and inputs used to generate saliency-based Es or Ys. |
Amir-Hossein Karimi; Krikamol Muandet; Simon Kornblith; Bernhard Schölkopf; Been Kim; |
337 | RGE: A Repulsive Graph Rectification for Node Classification Via Influence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the same vein, we observe that edge groups connecting to the same train node exhibit significant differences in their influences, hence no matter how negative each is, removing them at once may have a rather negative effect as a group. Based on this motivation, we propose a new edge-removing strategy, Repulsive edge Group Elimination (RGE), that preferentially removes edges with no interference in groups. |
Jaeyun Song; SungYub Kim; Eunho Yang; |
338 | Provable Multi-instance Deep AUC Maximization with Stochastic Pooling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address a neglected yet non-negligible computational challenge of MIL in the context of DAM, i.e., bag size is too large to be loaded into GPU memory for backpropagation, which is required by the standard pooling methods of MIL. To tackle this challenge, we propose variance-reduced stochastic pooling methods in the spirit of stochastic optimization by formulating the loss function over the pooled prediction as a multi-level compositional function. |
Dixian Zhu; Bokun Wang; Zhi Chen; Yaxing Wang; Milan Sonka; Xiaodong Wu; Tianbao Yang; |
339 | Generative Pretraining for Black-Box Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose BONET, a generative framework for pretraining a novel model-based optimizer using offline datasets. |
Satvik Mehul Mashkaria; Siddarth Krishnamoorthy; Aditya Grover; |
340 | Representer Point Selection for Explaining Regularized High-dimensional Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel class of sample-based explanations we term *high-dimensional representers*, that can be used to explain the predictions of a regularized high-dimensional model in terms of importance weights for each of the training samples. |
Che-Ping Tsai; Jiong Zhang; Hsiang-Fu Yu; Eli Chien; Cho-Jui Hsieh; Pradeep Kumar Ravikumar; |
341 | Efficient Displacement Convex Optimization with Particle Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Particle gradient descent, which uses particles to represent a probability measure and performs gradient descent on particles in parallel, is widely used to optimize functions of probability measures. This paper considers particle gradient descent with a finite number of particles and establishes its theoretical guarantees to optimize functions that are *displacement convex* in measures. |
Hadi Daneshmand; Jason D. Lee; Chi Jin; |
342 | Generalized-Smooth Nonconvex Optimization Is As Efficient As Smooth Nonconvex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a notion of $\alpha$-symmetric generalized-smoothness that substantially extends the existing notions and covers many important functions such as high-order polynomials and exponential functions. |
Ziyi Chen; Yi Zhou; Yingbin Liang; Zhaosong Lu; |
343 | Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose several improved techniques for maximum likelihood estimation for diffusion ODEs, including both training and evaluation perspectives. |
Kaiwen Zheng; Cheng Lu; Jianfei Chen; Jun Zhu; |
344 | Compressing Tabular Data Via Latent Variable Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Data used for analytics and machine learning often take the form of tables with categorical entries. We introduce a family of lossless compression algorithms for such data that proceed in four steps: (i) Estimate latent variables associated to rows and columns; (ii) Partition the table in blocks according to the row/column latents; (iii) Apply a sequential (e.g. Lempel-Ziv) coder to each of the blocks; (iv) Append a compressed encoding of the latents. |
Andrea Montanari; Eric Weiner; |
345 | A Conditional Normalizing Flow for Accelerated Multi-Coil MR Imaging Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We instead focus on sampling from the posterior distribution, which provides more comprehensive information for downstream inference tasks. |
Jeffrey Wen; Rizwan Ahmad; Philip Schniter; |
346 | Learnability and Algorithm for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies the challenging continual learning (CL) setting of Class Incremental Learning (CIL). |
Gyuhak Kim; Changnan Xiao; Tatsuya Konishi; Bing Liu; |
347 | Online Local Differential Private Quantile Inference Via Self-normalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on binary inquiries, we developed an algorithm to estimate population quantiles under Local Differential Privacy (LDP). |
Yi Liu; Qirui Hu; Lei Ding; Linglong Kong; |
348 | Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide a concise primal-dual analysis of EFP in the setting where the learning problem exhibits a finite-sum structure. |
Atsushi Nitanda; Kazusato Oko; Denny Wu; Nobuhito Takenouchi; Taiji Suzuki; |
349 | Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a simple yet highly effective method for incorporating spatial symmetries via slot-centric reference frames. |
Ondrej Biza; Sjoerd van Steenkiste; Mehdi S. M. Sajjadi; Gamaleldin Fathy Elsayed; Aravindh Mahendran; Thomas Kipf; |
350 | On Provable Copyright Protection for Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: There is a growing concern that learned conditional generative models may output samples that are substantially similar to some copyrighted data $C$ that was in their training set. We give a formal definition of near access-freeness (NAF) and prove bounds on the probability that a model satisfying this definition outputs a sample similar to $C$, even if $C$ is included in its training set. |
Nikhil Vyas; Sham M. Kakade; Boaz Barak; |
351 | Differential Privacy, Linguistic Fairness, and Training Data Influence: Impossibility and Possibility Theorems for Multilingual Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that multilingual compression and linguistic fairness are compatible with differential privacy, but that differential privacy is at odds with training data influence sparsity, an objective for transparency. |
Phillip Rust; Anders Søgaard; |
352 | Sequential Strategic Screening Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In other words, we introduce the combination of strategic classification with screening processes. |
Lee Cohen; Saeed Sharifi-Malvajerdi; Kevin Stangl; Ali Vakilian; Juba Ziani; |
353 | Investigating The Role of Model-Based Learning in Exploration and Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate transfer learning in the context of model-based agents. |
Jacob C Walker; Eszter Vértes; Yazhe Li; Gabriel Dulac-Arnold; Ankesh Anand; Theophane Weber; Jessica B Hamrick; |
354 | Unveiling The Latent Space Geometry of Push-Forward Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on recent developments in geometric measure theory, we prove a sufficient condition for optimality in the case where the dimension of the latent space is larger than the number of modes. |
Thibaut Issenhuth; Ugo Tanielian; Jeremie Mary; David Picard; |
355 | Identification of The Adversary from A Single Adversarial Example Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, instead of enhancing the robustness, we take the investigator’s perspective and propose a new framework to trace the first compromised model copy in a forensic investigation manner. |
Minhao Cheng; Rui Min; Haochen Sun; Pin-Yu Chen; |
356 | K-SHAP: Policy Clustering Algorithm for Anonymous Multi-Agent State-Action Pairs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Policy Clustering algorithm, called K-SHAP, that learns to group anonymous state-action pairs according to the agent policies. |
Andrea Coletta; Svitlana Vyetrenko; Tucker Balch; |
357 | The Acquisition of Physical Knowledge in Generative Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We outline an approach that allows us to examine two distinct hypotheses of human development — stochastic optimization and complexity increase. |
Luca M. Schulze Buschoff; Eric Schulz; Marcel Binz; |
358 | Towards Understanding Generalization of Macro-AUC in Multi-label Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To establish it, technically, we propose a new (and more general) McDiarmid-type concentration inequality, which may be of independent interest. |
Guoqiang Wu; Chongxuan Li; Yilong Yin; |
359 | The Persistent Laplacian for Data Science: Evaluating Higher-Order Persistent Spectral Representations of Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we provide the first investigation into the efficacy of the persistent Laplacian as an embedding of data for downstream classification and regression tasks. |
Thomas Davies; Zhengchao Wan; Ruben Sanchez-Garcia; |
360 | DUET: 2D Structured and Approximately Equivariant Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose 2D strUctured and EquivarianT representations (coined DUET), which are 2D representations organized in a matrix structure, and equivariant with respect to transformations acting on the input data. |
Xavier Suau; Federico Danieli; T. Anderson Keller; Arno Blaas; Chen Huang; Jason Ramapuram; Dan Busbridge; Luca Zappella; |
361 | Forward-Backward Gaussian Variational Inference Via JKO in The Bures-Wasserstein Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop the (Stochastic) Forward-Backward Gaussian Variational Inference (FB-GVI) algorithm to solve Gaussian VI. |
Michael Ziyang Diao; Krishna Balasubramanian; Sinho Chewi; Adil Salim; |
362 | Optimal Convergence Rates for Agnostic Nyström Kernel Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a unified analysis for Nyström approximation is lacking, and the asymptotic minimax optimality of Nyström methods usually requires a strict condition, namely that the target regression function lies exactly in the hypothesis space. In this paper, to tackle these problems, we provide a refined generalization analysis for Nyström approximation in the agnostic setting, where the target regression function may lie outside the hypothesis space. |
Jian Li; Yong Liu; Weiping Wang; |
363 | SeMAIL: Eliminating Distractors in Visual Imitation Via Separated Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Following the convention of MBIL research, existing algorithms are easily misled by task-irrelevant information, especially moving distractors in videos. To tackle this problem, we propose a new algorithm, named Separated Model-based Adversarial Imitation Learning (SeMAIL), which decouples the environment dynamics into two parts according to task-relevant dependency (determined by agent actions) and trains them separately. |
Shenghua Wan; Yucen Wang; Minghao Shao; Ruying Chen; De-Chuan Zhan; |
364 | Combinatorial Neural Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The score of an arm is an unknown function of the arm’s feature. Approximating this unknown score function with deep neural networks, we propose algorithms: Combinatorial Neural UCB ($\texttt{CN-UCB}$) and Combinatorial Neural Thompson Sampling ($\texttt{CN-TS}$). |
Taehyun Hwang; Kyuwook Chai; Min-hwan Oh; |
365 | Variational Autoencoding Neural Operators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we present Variational Autoencoding Neural Operators (VANO), a general strategy for making a large class of operator learning architectures act as variational autoencoders. |
Jacob H Seidman; Georgios Kissas; George J. Pappas; Paris Perdikaris; |
366 | Who Needs to Know? Minimal Knowledge for Optimal Coordination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that there is a well-defined dichotomy between strategically relevant and irrelevant information. |
Niklas Lauffer; Ameesh Shah; Micah Carroll; Michael D Dennis; Stuart Russell; |
367 | Efficient Parametric Approximations of Neural Network Function Space Distance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. |
Nikita Dhawan; Sicong Huang; Juhan Bae; Roger Baker Grosse; |
368 | Predicting Ordinary Differential Equations with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a transformer-based sequence-to-sequence model that recovers scalar ordinary differential equations (ODEs) in symbolic form from irregularly sampled and noisy observations of a single solution trajectory. |
Sören Becker; Michal Klein; Alexander Neitz; Giambattista Parascandolo; Niki Kilbertus; |
369 | A Unifying Framework to The Analysis of Interaction Methods Using Synergy Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents a unifying framework for game-theory-inspired attribution and $k^\text{th}$-order interaction methods. |
Daniel Lundstrom; Meisam Razaviyayn; |
370 | Stein Variational Goal Generation for Adaptive Exploration in Multi-Goal Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this context, a curriculum over goals helps agents learn by adapting training tasks to their current capabilities. In this work, we propose Stein Variational Goal Generation (SVGG), which samples goals of intermediate difficulty for the agent, by leveraging a learned predictive model of its goal reaching capabilities. |
Nicolas Castanet; Olivier Sigaud; Sylvain Lamprier; |
371 | Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance. |
Pierre Bréchet; Katerina Papagiannouli; Jing An; Guido Montufar; |
372 | Shortest Edit Path Crossover: A Theory-driven Solution to The Permutation Problem in Evolutionary Neural Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents the first theoretical analysis of the behaviors of mutation, crossover and RL in black-box NAS, and proposes a new crossover operator based on the shortest edit path (SEP) in graph space. |
Xin Qiu; Risto Miikkulainen; |
373 | Fisher Information Embedding for Node and Graph Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel attention-based node embedding framework for graphs. |
Dexiong Chen; Paolo Pellizzoni; Karsten Borgwardt; |
374 | Efficient Latency-Aware CNN Depth Compression Via Two-Stage Dynamic Programming Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel depth compression algorithm which targets general convolution operations. We propose a subset selection problem that replaces inefficient activation layers with identity functions and optimally merges consecutive convolution operations into shallow equivalent convolution operations for efficient end-to-end inference latency. |
Jinuk Kim; Yeonwoo Jeong; Deokjae Lee; Hyun Oh Song; |
375 | Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Git-Theta’s design and features and include an example use-case of Git-Theta where a pre-trained model is continually adapted and modified. |
Nikhil Kandpal; Brian Lester; Mohammed Muqeeth; Anisha Mascarenhas; Monty Evans; Vishal Baskaran; Tenghao Huang; Haokun Liu; Colin Raffel; |
376 | Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper considers the problem of learning single ReLU neuron with squared loss (a.k.a., ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called GLM-tron [Kakade et al. 2011], and provide its dimension-free risk upper bounds for high-dimensional ReLU regression in both well-specified and misspecified settings. |
Jingfeng Wu; Difan Zou; Zixiang Chen; Vladimir Braverman; Quanquan Gu; Sham M. Kakade; |
377 | Explaining Reinforcement Learning with Shapley Values Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a theoretical analysis of explaining reinforcement learning using Shapley values, following a principled approach from game theory for identifying the contribution of individual players to the outcome of a cooperative game. |
Daniel Beechey; Thomas M. S. Smith; Özgür Şimşek; |
378 | Naive Imputation Implicitly Regularizes High-dimensional Linear Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the impact of imputation in a high-dimensional linear model with MCAR missing data. |
Alexis Ayme; Claire Boyer; Aymeric Dieuleveut; Erwan Scornet; |
379 | Coin Sampling: Gradient-Based Bayesian Inference Without Learning Rates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a suite of new particle-based methods for scalable Bayesian inference based on coin betting, which are entirely learning-rate free. |
Louis Sharrock; Christopher Nemeth; |
380 | Distribution Free Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Distribution Free Domain Generalization (DFDG) procedure for classification by conducting standardization to avoid the dominance of a few domains in the training process. |
Peifeng Tong; Wu Su; He Li; Jialin Ding; Haoxiang Zhan; Song Xi Chen; |
381 | Simple Hardware-Efficient Long Convolutions for Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study whether a simple alternative can match SSMs in performance and efficiency: directly learning long convolutions over the sequence. |
Daniel Y Fu; Elliot L Epstein; Eric Nguyen; Armin W Thomas; Michael Zhang; Tri Dao; Atri Rudra; Christopher Re; |
382 | Neural Signature Kernels As Infinite-width-depth-limits of Controlled ResNets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the paradigm of reservoir computing, we consider randomly initialized controlled ResNets defined as Euler-discretizations of neural controlled differential equations (Neural CDEs), a unified architecture which encompasses both RNNs and ResNets. |
Nicola Muca Cirone; Maud Lemercier; Cristopher Salvi; |
383 | Feature Programming for Multivariate Time Series Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the concept of programmable feature engineering for time series modeling and propose a feature programming framework. |
Alex Daniel Reneau; Jerry Yao-Chieh Hu; Ammar Gilani; Han Liu; |
384 | Theory on Forgetting and Generalization of Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, there is a lack of understanding on what factors are important and how they affect catastrophic forgetting and generalization performance. To fill this gap, our theoretical analysis, under overparameterized linear models, provides the first-known explicit form of the expected forgetting and generalization error for a general CL setup with an arbitrary number of tasks. |
Sen Lin; Peizhong Ju; Yingbin Liang; Ness Shroff; |
385 | PAL: Program-aided Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Program-Aided Language models (PAL): a novel approach that uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a runtime such as a Python interpreter. |
Luyu Gao; Aman Madaan; Shuyan Zhou; Uri Alon; Pengfei Liu; Yiming Yang; Jamie Callan; Graham Neubig; |
386 | Data-Efficient Contrastive Self-supervised Learning: Most Beneficial Examples for Supervised Learning Contribute The Least Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, quantifying the value of examples for SSL has remained an open question. In this work, we address this problem for the first time, by proving that examples that contribute the most to contrastive SSL are those that have the most similar augmentations to other examples, in expectation. |
Siddharth Joshi; Baharan Mirzasoleiman; |
387 | A Critical View of Vision-Based Long-Term Dynamics Prediction Under Environment Misalignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, despite its success, the model’s capability can be compromised under conditions of environment misalignment. In this paper, we investigate two challenging conditions for environment misalignment: Cross-Domain and Cross-Context by proposing four datasets that are designed for these challenges: SimB-Border, SimB-Split, BlenB-Border, and BlenB-Split. |
Hanchen Xie; Jiageng Zhu; Mahyar Khayatkhoei; Jiazhi Li; Mohamed E. Hussein; Wael AbdAlmageed; |
388 | Learning Lightweight Object Detectors Via Multi-Teacher Progressive Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple yet surprisingly effective sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student. |
Shengcao Cao; Mengtian Li; James Hays; Deva Ramanan; Yu-Xiong Wang; Liangyan Gui; |
389 | COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper explores an efficient method for compressing vision transformers to enrich the toolset for obtaining compact attention-based vision models. |
Jinqi Xiao; Miao Yin; Yu Gong; Xiao Zang; Jian Ren; Bo Yuan; |
390 | Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, an agent has access to two sets of trajectories: labelled trajectories containing state, action and reward triplets at every timestep, along with unlabelled trajectories that contain only state and reward information. For this setting, we develop and study a simple meta-algorithmic pipeline that learns an inverse dynamics model on the labelled data to obtain proxy-labels for the unlabelled data, followed by the use of any offline RL algorithm on the true and proxy-labelled trajectories. |
Qinqing Zheng; Mikael Henaff; Brandon Amos; Aditya Grover; |
391 | Emergence of Adaptive Circadian Rhythms in Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the emergence of circadian-like rhythms in deep reinforcement learning agents. |
Aqeel Labash; Florian Stelzer; Daniel Majoral; Raul Vicente; |
392 | Robust Situational Reinforcement Learning in Face of Context Disturbances Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods on robust RL aim at learning robust policies against the deviations of the entire system dynamics. To tackle this problem, this paper proposes the framework of robust situational Markov decision process (RS-MDP) which captures the possible deviations of context transitions explicitly. |
Jinpeng Zhang; Yufeng Zheng; Chuheng Zhang; Li Zhao; Lei Song; Yuan Zhou; Jiang Bian; |
393 | QAS-Bench: Rethinking Quantum Architecture Search and A Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, beyond a particular domain, we formulate the QAS problem into two basic (and relatively even ideal) tasks: i) arbitrary quantum circuit (QC) regeneration given a target QC; ii) approximating an arbitrary unitary (oracle). Based on these two tasks, we generate a public QAS benchmark including 900 random QCs and 400 random unitary matrices, which is still missing in the literature. |
Xudong Lu; Kaisen Pan; Ge Yan; Jiaming Shan; Wenjie Wu; Junchi Yan; |
394 | Half-Hop: A Graph Upsampling Approach for Slowing Down Message Passing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a simple yet general framework for improving learning in message passing neural networks. |
Mehdi Azabou; Venkataramana Ganesh; Shantanu Thakoor; Chi-Heng Lin; Lakshmi Sathidevi; Ran Liu; Michal Valko; Petar Veličković; Eva L Dyer; |
395 | Neuro-Symbolic Continual Learning: Knowledge, Reasoning Shortcuts and Concept Rehearsal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Neuro-Symbolic Continual Learning, where a model has to solve a sequence of neuro-symbolic tasks, that is, it has to map sub-symbolic inputs to high-level concepts and compute predictions by reasoning consistently with prior knowledge. |
Emanuele Marconato; Gianpaolo Bontempo; Elisa Ficarra; Simone Calderara; Andrea Passerini; Stefano Teso; |
396 | Group Equivariant Fourier Neural Operators for Partial Differential Equations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we extend group convolutions to the frequency domain and design Fourier layers that are equivariant to rotations, translations, and reflections by leveraging the equivariance property of the Fourier transform. |
Jacob Helwig; Xuan Zhang; Cong Fu; Jerry Kurtin; Stephan Wojtowytsch; Shuiwang Ji; |
397 | Evaluating Unsupervised Denoising Requires Unsupervised Metrics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose two novel metrics: the unsupervised mean squared error (MSE) and the unsupervised peak signal-to-noise ratio (PSNR), which are computed using only noisy data. |
Adria Marcos Morales; Matan Leibovich; Sreyas Mohan; Joshua Lawrence Vincent; Piyush Haluai; Mai Tan; Peter Crozier; Carlos Fernandez-Granda; |
398 | Rethinking Weak Supervision in Helping Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the empirical evidence showing that semi-supervised labels improve the representations of contrastive learning, it remains unknown if noisy supervised information can be directly used in training instead of after manual denoising. Therefore, to explore the mechanical differences between semi-supervised and noisy-labeled information in helping contrastive learning, we establish a unified theoretical framework of contrastive learning under weak supervision. |
Jingyi Cui; Weiran Huang; Yifei Wang; Yisen Wang; |
399 | Near-Optimal Quantum Coreset Construction Algorithms for Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give quantum algorithms that find coresets for $k$-clustering in $\mathbb{R}^d$ with $\tilde{O}(\sqrt{nk}d^{3/2})$ query complexity. |
Yecheng Xue; Xiaoyu Chen; Tongyang Li; Shaofeng H.-C. Jiang; |
400 | On Heterogeneous Treatment Effects in Heterogeneous Causal Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We establish the theoretical forms of HCEs and derive their properties at the individual level in both linear and nonlinear models. |
Richard A Watson; Hengrui Cai; Xinming An; Samuel McLean; Rui Song; |
401 | Bilevel Optimization with Coupled Decision-Dependent Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the inclusion of decision-dependent distributions in bilevel optimization. |
Songtao Lu; |
402 | Differentially Private Distributed Bayesian Linear Regression with MCMC Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel Bayesian inference framework for distributed differentially private linear regression. |
Barış Alparslan; Sinan Yıldırım; Ilker Birbil; |
403 | Nearly Optimal Competitive Ratio for Online Allocation Problems with Two-sided Resource Constraints and Finite Requests Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the online allocation problem of maximizing the overall revenue subject to both lower and upper bound constraints. |
Qixin Zhang; Wenbing Ye; Zaiyi Chen; Haoyuan Hu; Enhong Chen; Yu Yang; |
404 | InGram: Inductive Knowledge Graph Embedding Via Relation Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an INductive knowledge GRAph eMbedding method, InGram, that can generate embeddings of new relations as well as new entities at inference time. |
Jaejun Lee; Chanyoung Chung; Joyce Jiyoung Whang; |
405 | Topological Point Cloud Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Topological Point Cloud Clustering (TPCC), a new method to cluster points in an arbitrary point cloud based on their contribution to global topological features. |
Vincent Peter Grande; Michael T Schaub; |
406 | Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In practice, however, data often come from several heterogeneous but related sources. Motivated by this gap, this work aims at rigorously understanding offline RL with multiple datasets that are collected from randomly perturbed versions of the target task instead of from itself. |
Chengshuai Shi; Wei Xiong; Cong Shen; Jing Yang; |
407 | Trapdoor Normalization with Irreversible Ownership Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a deep model watermark with an irreversible ownership verification scheme: Trapdoor Normalization (TdN), inspired by the trapdoor function in traditional cryptography. |
Hanwen Liu; Zhenyu Weng; Yuesheng Zhu; Yadong Mu; |
408 | Margin-based Sampling in High Dimensions: When Being Active Is Less Efficient Than Staying Passive Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing works offer different explanations in the low-dimensional regime, this paper shows that the underlying mechanism is entirely different in high dimensions: we prove for logistic regression that PL outperforms margin-based AL even for noiseless data and when using the Bayes optimal decision boundary for sampling. |
Alexandru Tifrea; Jacob Clarysse; Fanny Yang; |
409 | Random Matrix Analysis to Balance Between Supervised and Unsupervised Learning Under The Low Density Separation Assumption Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a theoretical framework to analyze semi-supervised classification under the low density separation assumption in a high-dimensional regime. |
Vasilii Feofanov; Malik Tiomoko; Aladin Virmaux; |
410 | On The Initialization of Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the variance of forward and backward propagation across GNN layers and show that the variance instability of GNN initializations comes from the combined effect of the activation function, hidden dimension, graph structure and message passing. |
Jiahang Li; Yakun Song; Xiang Song; David Wipf; |
411 | When Does Privileged Information Explain Away Label Noise? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we investigate the role played by different properties of the PI in explaining away label noise. |
Guillermo Ortiz-Jimenez; Mark Collier; Anant Nawalgaria; Alexander D’Amour; Jesse Berent; Rodolphe Jenatton; Effrosyni Kokiopoulou; |
412 | A Category-theoretical Meta-analysis of Definitions of Disentanglement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose that the concepts of the cartesian and monoidal products should serve as the core of disentanglement. |
Yivan Zhang; Masashi Sugiyama; |
413 | Quantifying The Variability Collapse of Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel metric, named Variability Collapse Index (VCI), to quantify the variability collapse phenomenon in the NC paradigm. |
Jing Xu; Haoxiong Liu; |
414 | Global Optimality for Euclidean CCCP Under Riemannian Convexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study geodesically convex (g-convex) problems that can be written as a difference of Euclidean convex functions. |
Melanie Weber; Suvrit Sra; |
415 | Reinforcement Learning Can Be More Efficient with Multiple Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, in this work, we study whether directly incorporating multiple alternate reward formulations of the same task in a single agent can lead to faster learning. |
Christoph Dann; Yishay Mansour; Mehryar Mohri; |
416 | Adversarial Learning of Distributional Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose an adversarial learning framework for distributional reinforcement learning, which adopts the concept of influence measure from the statistics community. |
Yang Sui; Yukun Huang; Hongtu Zhu; Fan Zhou; |
417 | Robust Explanation for Free or At The Cost of Faithfulness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, explanation methods are shown as vulnerable to adversarial perturbations, implying security concerns in high-stakes domains. In this paper, we investigate when robust explanations are necessary and what they cost. |
Zeren Tan; Yang Tian; |
418 | Jump-Start Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks: a guide-policy, and an exploration-policy. |
Ikechukwu Uchendu; Ted Xiao; Yao Lu; Banghua Zhu; Mengyuan Yan; Joséphine Simon; Matthew Bennice; Chuyuan Fu; Cong Ma; Jiantao Jiao; Sergey Levine; Karol Hausman; |
419 | Recovery Bounds on Class-Based Optimal Transport: A Sum-of-Norms Regularization Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For this purpose, we propose a convex OT program with a sum-of-norms regularization term, which provably recovers the underlying class structure under geometric assumptions. |
Arman Rahbar; Ashkan Panahi; Morteza Haghir Chehreghani; Devdatt Dubhashi; Hamid Krim; |
420 | Multi-Task Off-Policy Learning from Bandit Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To solve the problem, we propose a hierarchical off-policy optimization algorithm HierOPO. |
Joey Hong; Branislav Kveton; Manzil Zaheer; Sumeet Katariya; Mohammad Ghavamzadeh; |
421 | Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks. |
Dominik Schnaus; Jongseok Lee; Daniel Cremers; Rudolph Triebel; |
422 | Towards Stable and Efficient Adversarial Training Against $l_1$ Bounded Adversarial Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address the problem of stably and efficiently training a deep neural network robust to adversarial perturbations bounded by an $l_1$ norm. |
Yulun Jiang; Chen Liu; Zhichao Huang; Mathieu Salzmann; Sabine Süsstrunk; |
423 | Universal Morphology Control Via Contextual Modulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods utilize graph neural networks or transformers to handle heterogeneous state and action spaces across different morphologies, but pay little attention to the dependency of a robot’s control policy on its morphology context. In this paper, we propose a hierarchical architecture to better model this dependency via contextual modulation, which includes two key submodules: (1) Instead of enforcing hard parameter sharing across robots, we use hypernetworks to generate morphology-dependent control parameters; (2) We propose a fixed attention mechanism that solely depends on the morphology to modulate the interactions between different limbs in a robot. |
Zheng Xiong; Jacob Beck; Shimon Whiteson; |
424 | CocktailSGD: Fine-tuning Foundation Models Over 500Mbps Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose CocktailSGD, a novel communication-efficient training framework that combines three distinct compression techniques — random sparsification, top-K sparsification, and quantization — to achieve much greater compression than each individual technique alone. |
Jue WANG; Yucheng Lu; Binhang Yuan; Beidi Chen; Percy Liang; Christopher De Sa; Christopher Re; Ce Zhang; |
425 | Is Overfitting Necessary for Implicit Video Representation? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose a new paradigm in efficient INR for videos based on the idea of the strong lottery ticket (SLT) hypothesis (Zhou et al., 2019), which demonstrates the possibility of finding an accurate subnetwork mask, called a supermask, for a randomly initialized classification network without weight training. |
Hee Min Choi; Hyoa Kang; Dokwan Oh; |
426 | Learning The Right Layers: A Data-Driven Layer-Aggregation Strategy for Semi-Supervised Learning on Multilayer Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we assume a semi-supervised learning setting, where the class of a small percentage of nodes is initially provided, and we propose a parameter-free Laplacian-regularized model that learns an optimal nonlinear combination of the different layers from the available input labels. |
Sara Venturini; Andrea Cristofari; Francesco Rinaldi; Francesco Tudisco; |
427 | Trading-Off Payments and Accuracy in Online Classification with Paid Stochastic Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an online learning algorithm whose total cost after $T$ rounds exceeds that of a predictor which knows the productivity of all experts in advance by at most $\mathcal{O}\big(K^2(\ln T)\sqrt{T}\big)$ where $K$ is the number of experts. |
Dirk van der Hoeven; Ciara Pike-Burke; Hao Qiu; Nicolò Cesa-Bianchi; |
428 | Provable Benefit of Mixup for Finding Optimal Decision Boundaries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate how pair-wise data augmentation techniques like Mixup affect the sample complexity of finding optimal decision boundaries in a binary linear classification problem. |
Junsoo Oh; Chulhee Yun; |
429 | Covariate Balancing Using The Integral Probability Metric for Causal Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider using the integral probability metric (IPM), which is a metric between two probability measures, for covariate balancing. |
Insung Kong; Yuha Park; Joonhyuk Jung; Kwonsang Lee; Yongdai Kim; |
430 | Fair and Accurate Decision Making Through Group-Aware Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In some cases, these AI systems can be unfair by exhibiting bias or discrimination against certain social groups, which can have severe consequences in real life. Inspired by one of the most well-known human learning skills called grouping, we address this issue by proposing a novel machine learning (ML) framework where the ML model learns to group a diverse set of problems into distinct subgroups to solve each subgroup using its specific sub-model. |
Ramtin Hosseini; Li Zhang; Bhanu Garg; Pengtao Xie; |
431 | Robust and Scalable Bayesian Online Changepoint Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes an online, provably robust, and scalable Bayesian approach for changepoint detection. |
Matias Altamirano; Francois-Xavier Briol; Jeremias Knoblauch; |
432 | Uncertainty Estimation By Fisher Information-based Evidential Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, for samples with high data uncertainty that are annotated with one-hot labels, the evidence-learning process for the mislabeled classes is over-penalized and remains hindered. To address this problem, we propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL). |
Danruo DENG; Guangyong Chen; Yang YU; Furui Liu; Pheng-Ann Heng; |
433 | Network Effects in Performative Prediction Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies the multi-agent performative prediction (Multi-PP) games over multiplex networks. |
Xiaolu Wang; Chung-Yiu Yau; Hoi To Wai; |
434 | MyoDex: A Generalizable Prior for Dexterous Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take inspiration from how human dexterity builds on a diversity of prior experiences, instead of being acquired through a single task. |
Vittorio Caggiano; Sudeep Dasari; Vikash Kumar; |
435 | The Case for 4-bit Precision: K-bit Inference Scaling Laws Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, a 30B 8-bit model and a 60B 4-bit model have the same number of bits but may have very different zero-shot accuracies. In this work, we study this trade-off by developing inference scaling laws of zero-shot performance in Large Language Models (LLMs) to determine the bit-precision and model size that maximizes zero-shot performance. |
Tim Dettmers; Luke Zettlemoyer; |
436 | Explore and Exploit The Diverse Knowledge in Model Zoo for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper argues that the knowledge contained in weaker models is valuable and presents a method for leveraging the diversity within the model zoo to improve out-of-distribution generalization capabilities. |
Yimeng Chen; Tianyang Hu; Fengwei Zhou; Zhenguo Li; Zhi-Ming Ma; |
437 | Dividing and Conquering A BlackBox to A Mixture of Interpretable Models: Route, Interpret, Repeat Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to blur the distinction between a post hoc explanation of a Blackbox and constructing interpretable models. |
Shantanu Ghosh; Ke Yu; Forough Arabshahi; kayhan Batmanghelich; |
438 | Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, optimization becomes challenging due to chaotic and non-smooth loss landscapes. To tackle this issue, we propose a novel approach called Adaptive Barrier Smoothing (ABS), which introduces a class of softened complementarity systems that correspond to barrier-smoothed objectives. |
Shenao Zhang; Wanxin Jin; Zhaoran Wang; |
439 | Online Restless Bandits with Unobserved States Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose TSEETC, a learning algorithm based on Thompson Sampling with Episodic Explore-Then-Commit. |
Bowen Jiang; Bo Jiang; Jian Li; TAO LIN; Xinbing Wang; Chenghu Zhou; |
440 | Analyzing Diffusion As Serial Reproduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By identifying a correspondence between diffusion models and a well-known paradigm in cognitive science known as serial reproduction, whereby human agents iteratively observe and reproduce stimuli from memory, we show how the aforementioned properties of diffusion models can be explained as a natural consequence of this correspondence. |
Raja Marjieh; Ilia Sucholutsky; Thomas A Langlois; Nori Jacoby; Thomas L. Griffiths; |
441 | Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by concerns about making online decisions that incur undue amount of risk at each time step, in this paper, we formulate the probably anytime-safe stochastic combinatorial semi-bandits problem. |
Yunlong Hou; Vincent Tan; Zixin Zhong; |
442 | Trompt: Towards A Better Deep Neural Network for Tabular Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Trompt–which stands for Tabular Prompt–a novel architecture inspired by prompt learning of language models. |
Kuan-Yu Chen; Ping-Han Chiang; Hsin-Rung Chou; Ting-Wei Chen; Tien-Hao Chang; |
443 | A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prompt engineering typically requires hand-crafting a set of prompts for individual downstream tasks. In this work, we aim to automate this prompt engineering and improve zero-shot accuracy through prompt ensembling. |
James Urquhart Allingham; Jie Ren; Michael W Dusenberry; Xiuye Gu; Yin Cui; Dustin Tran; Jeremiah Zhe Liu; Balaji Lakshminarayanan; |
444 | Multi-Objective GFlowNets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse Pareto optimal solutions, based on GFlowNets. |
Moksh Jain; Sharath Chandra Raparthy; Alex Hernández-García; Jarrid Rector-Brooks; Yoshua Bengio; Santiago Miret; Emmanuel Bengio; |
445 | Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We empirically find that random convex combinations of the learned queries are still good for the corresponding models. We then propose to learn a convex combination with dynamic coefficients based on the high-level semantics of the image. |
Yiming Cui; Linjie Yang; Haichao Yu; |
446 | Cramming: Training A Language Model on A Single GPU in One Day Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. |
Jonas Geiping; Tom Goldstein; |
447 | On Computing Optimal Tree Ensembles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent algorithmic advances make it possible to compute decision trees that are optimal for various measures such as their size or depth. We are not aware of such research for tree ensembles and aim to contribute to this area. |
Christian Komusiewicz; Pascal Kunz; Frank Sommer; Manuel Sorge; |
448 | Kernel Logistic Regression Approximation of An Understandable ReLU Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an understandable neural network whose score function is modeled as an additive sum of univariate spline functions. |
Marie Guyomard; Susana Barbosa; Lionel Fillatre; |
449 | On Coresets for Clustering in Small Dimensional Euclidean Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of constructing small coresets for $k$-Median in Euclidean spaces. |
Lingxiao Huang; Ruiyuan Huang; Zengfeng Huang; Xuan Wu; |
450 | A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new GT-based Risk-Averse Equilibrium (RAE) that always produces a solution that minimises the potential variance in reward, accounting for the strategies of other agents. |
Oliver Slumbers; David Henry Mguni; Stefano B Blumberg; Stephen Marcus McAleer; Yaodong Yang; Jun Wang; |
451 | Learning Compiler Pass Orders Using Coreset and Normalized Value Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, instead of predicting passes sequentially, we directly learn a policy on the pass sequence space, which outperforms the default -Oz flag by an average of 4.5% over a large collection (4683) of unseen code repositories from diverse domains across 14 datasets. |
Youwei Liang; Kevin Stone; Ali Shameli; Chris Cummins; Mostafa Elhoushi; Jiadong Guo; Benoit Steiner; Xiaomeng Yang; Pengtao Xie; Hugh James Leather; Yuandong Tian; |
452 | SGD with Large Step Sizes Learns Sparse Features Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We showcase important features of the dynamics of the Stochastic Gradient Descent (SGD) in the training of neural networks. We present empirical observations that commonly used large step sizes (i) may lead the iterates to jump from one side of a valley to the other causing *loss stabilization*, and (ii) this stabilization induces a hidden stochastic dynamics that *biases it implicitly* toward simple predictors. |
Maksym Andriushchenko; Aditya Vardhan Varre; Loucas Pillaud-Vivien; Nicolas Flammarion; |
453 | Optimally-weighted Estimators of The Maximum Mean Discrepancy for Likelihood-Free Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel estimator for the MMD with significantly improved sample complexity. |
Ayush Bharti; Masha Naslidnyk; Oscar Key; Samuel Kaski; Francois-Xavier Briol; |
454 | Input Uncertainty Propagation Through Trained Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the problem of input uncertainty propagation through trained neural networks. |
Paul Monchot; Loic Coquelin; Sébastien Julien Petit; Sébastien Marmin; Erwan Le Pennec; Nicolas Fischer; |
455 | GeCoNeRF: Few-shot Neural Radiance Fields Via Geometric Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel framework to regularize Neural Radiance Field (NeRF) in a few-shot setting with a geometry-aware consistency regularization. |
Min-Seop Kwak; Jiuhn Song; Seungryong Kim; |
456 | Learning Prescriptive ReLU Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a piecewise linear neural network model that can balance strong prescriptive performance and interpretability, which we refer to as the prescriptive ReLU network, or P-ReLU. |
Wei Sun; Asterios Tsiourvas; |
457 | Achieving Linear Speedup in Non-IID Federated Bilevel Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, several important properties in federated learning such as the partial client participation and the linear speedup for convergence (i.e., the convergence rate and complexity are improved linearly with respect to the number of sampled clients) in the presence of non-i.i.d. datasets, still remain open. In this paper, we fill these gaps by proposing a new federated bilevel algorithm named FedMBO with a novel client sampling scheme in the federated hypergradient estimation. |
Minhui Huang; Dewei Zhang; Kaiyi Ji; |
458 | Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, the alignment between LLMs’ knowledge and the environment can be wrong and limit functional competence due to lack of grounding. In this paper, we study an approach (named GLAM) to achieve this alignment through functional grounding: we consider an agent using an LLM as a policy that is progressively updated as the agent interacts with the environment, leveraging online Reinforcement Learning to improve its performance to solve goals. |
Thomas Carta; Clément ROMAC; Thomas Wolf; sylvain lamprier; Olivier Sigaud; Pierre-Yves Oudeyer; |
459 | In Search for A Generalizable Method for Source Free Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we apply existing SFDA techniques to a challenging set of naturally-occurring distribution shifts in bioacoustics, which are very different from the ones commonly studied in computer vision. |
Malik Boudiaf; tom denton; Bart van Merrienboer; Vincent Dumoulin; Eleni Triantafillou; |
460 | Few-bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a systematic approach to compute optimal quantization of the retained gradients of the pointwise nonlinear functions with only a few bits per each element. |
Georgii Sergeevich Novikov; Daniel Bershatsky; Julia Gusak; Alex Shonenkov; Denis Valerievich Dimitrov; Ivan Oseledets; |
461 | Learning Controllable Degradation for Real-World Super-Resolution Via Constrained Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To remedy the issue, we propose to generate realistic SR datasets for unseen degradation levels by exploring the latent space of real LR images and thereby producing more diverse yet realistic LR images with complex real-world artifacts. |
Seobin Park; Dongjin Kim; Sungyong Baik; Tae Hyun Kim; |
462 | Synthetic Data, Real Errors: How (Not) to Publish and Use Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we explore how the generative process affects the downstream ML task. |
Boris van Breugel; Zhaozhi Qian; Mihaela van der Schaar; |
463 | Hypothesis Transfer Learning with Surrogate Classification Losses: Generalization Bounds Through Algorithmic Stability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The analysis of such a method from a theoretical perspective faces multiple challenges, particularly in classification tasks. This paper deals with this problem by studying the learning theory of HTL through algorithmic stability, an attractive theoretical framework for the analysis of machine learning algorithms. |
Anass Aghbalou; Guillaume Staerman; |
464 | Deep Perturbation Learning: Enhancing The Network Performance Via Image Perturbations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike existing works, in this paper we introduce a novel framework, Deep Perturbation Learning (DPL), offering new insights into understanding image perturbations, to enhance the performance of networks rather than decrease it. |
Zifan Song; Xiao Gong; Guosheng Hu; Cai Rong Zhao; |
465 | A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a general framework for adapting discrete offline approximation algorithms into sublinear $\alpha$-regret methods that only require bandit feedback, achieving $\mathcal{O}\left(T^\frac{2}{3}\log(T)^\frac{1}{3}\right)$ expected cumulative $\alpha$-regret dependence on the horizon $T$. |
Guanyu Nie; Yididiya Y Nadew; Yanhui Zhu; Vaneet Aggarwal; Christopher John Quinn; |
466 | Do Machine Learning Models Learn Statistical Rules Inferred from Data? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We thereby seek to infer statistical rules from the data and quantify the extent to which a model has learned them. We propose a framework SQRL that integrates logic-based methods with statistical inference to derive these rules from a model’s training data without supervision. |
Aaditya Naik; Yinjun Wu; Mayur Naik; Eric Wong; |
467 | Thompson Sampling with Less Exploration Is Fast and Optimal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose $\epsilon$-Exploring Thompson Sampling ($\epsilon$-TS), a modified version of the Thompson Sampling (TS) algorithm for multi-armed bandits. |
Tianyuan Jin; XIANGLIN YANG; Xiaokui Xiao; Pan Xu; |
468 | Generalized Disparate Impact for Configurable Fairness Solutions in ML Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We make two contributions in the field of AI fairness over continuous protected attributes. |
Luca Giuliani; Eleonora Misino; Michele Lombardi; |
469 | On The Effectiveness of Offline RL for Dialogue Response Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This motivates use of sequence-level objectives for dialogue response generation. In this paper, we study the efficacy of various offline reinforcement learning (RL) methods to maximize such objectives. |
Paloma Sodhi; Felix Wu; Ethan R. Elenberg; Kilian Q Weinberger; Ryan McDonald; |
470 | Computational Asymmetries in Robust Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: First, we prove that while attacking ReLU classifiers is $\mathit{NP}$-hard, ensuring their robustness at training time is $\Sigma_2^P$-hard (even on a single example). This asymmetry provides a rationale for the fact that robust classification approaches are frequently fooled in the literature. Second, we show that inference-time robustness certificates are not affected by this asymmetry, by introducing a proof-of-concept approach named Counter-Attack (CA). Indeed, CA displays a reversed asymmetry: running the defense is $\mathit{NP}$-hard, while attacking it is $\Sigma_2^P$-hard. |
Samuele Marro; Michele Lombardi; |
471 | Nearly-tight Bounds for Deep Kernel Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we prove novel and nearly-tight generalization bounds based on the uniform covering number and the Rademacher chaos complexity for deep (multiple) kernel machines. |
Yifan Zhang; Min-Ling Zhang; |
472 | DoCoFL: Downlink Compression for Cross-Device Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Accordingly, we propose DoCoFL — a new framework for downlink compression in the cross-device setting. |
Ron Dorfman; Shay Vargaftik; Yaniv Ben-Itzhak; Kfir Yehuda Levy; |
473 | KDEformer: Accelerating Transformers Via Kernel Density Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our proposed KDEformer can approximate the attention in sub-quadratic time with provable spectral norm bounds, while all prior results merely provide entry-wise error bounds. |
Amir Zandieh; Insu Han; Majid Daliri; Amin Karbasi; |
474 | Probabilistic Attention-to-Influence Neural Models for Event Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While neural sequence models are able to capture complex and potentially long-range historical dependencies, they often lack the interpretability of simpler models for event sequence dynamics. We provide a novel neural framework in such a setting – a probabilistic attention-to-influence neural model – which not only captures complex instance-wise interactions between events but also learns influencers for each event type of interest. |
Xiao Shou; Debarun Bhattacharjya; Tian Gao; Dharmashankar Subramanian; Oktie Hassanzadeh; Kristin Bennett; |
475 | Conformal Prediction Sets for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a conformal procedure to equip GNNs with prediction sets that come with distribution-free guarantees — the output set contains the true label with arbitrarily high probability. By leveraging the network homophily we construct sets with comparable or better efficiency (average size) and significantly improved singleton hit ratio (correct sets of size one). |
Soroush H. Zargarbashi; Simone Antonelli; Aleksandar Bojchevski; |
476 | Flash: Concept Drift Adaptation in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel adaptive optimizer called Flash that simultaneously addresses both statistical heterogeneity and the concept drift issues. |
Kunjal Panchal; Sunav Choudhary; Subrata Mitra; Koyel Mukherjee; Somdeb Sarkhel; Saayan Mitra; Hui Guan; |
477 | Input Perturbation Reduces Exposure Bias in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we observe that a long sampling chain also leads to an error accumulation phenomenon, which is similar to the exposure bias problem in autoregressive text generation. |
Mang Ning; Enver Sangineto; Angelo Porrello; Simone Calderara; Rita Cucchiara; |
478 | Efficient Training of Language Models Using Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study an efficient approach to train language models using few-shot learners. |
Sashank J. Reddi; Sobhan Miryoosefi; Stefani Karp; Shankar Krishnan; Satyen Kale; Seungyeon Kim; Sanjiv Kumar; |
479 | NeuralStagger: Accelerating Physics-constrained Neural PDE Solver with Spatial-temporal Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a general acceleration methodology called NeuralStagger by spatially and temporally decomposing the original learning tasks into several coarser-resolution subtasks. |
Xinquan Huang; Wenlei Shi; Qi Meng; Yue Wang; Xiaotian Gao; Jia Zhang; Tie-Yan Liu; |
480 | Out-of-Distribution Generalization of Federated Learning Via Implicit Invariant Relationships Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, learning invariant relationships is often done in an explicit manner from data, representation, and distribution, which violates the federated principles of privacy preservation and limited communication. In this paper, we propose FedIIR, which implicitly learns invariant relationships from parameters for out-of-distribution generalization, adhering to the above principles. |
Yaming Guo; Kai Guo; Xiaofeng Cao; Tieru Wu; Yi Chang; |
481 | Differentially Private Sharpness-Aware Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate sharpness, a key factor in achieving better generalization, in private learning. |
Jinseong Park; Hoki Kim; Yujin Choi; Jaewook Lee; |
482 | Monotonic Location Attention for Length Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce novel variants of location attention building on top of Dubois et al. (2020) to address the new diagnostic tasks. |
Jishnu Ray Chowdhury; Cornelia Caragea; |
483 | Effective Structured Prompting By Meta-Learning and Representative Verbalizer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Combining meta-learning the prompt pool and RepVerb, we propose MetaPrompter for effective structured prompting. |
Weisen Jiang; Yu Zhang; James Kwok; |
484 | End-to-end Differentiable Clustering with Associative Memories Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We uncover a novel connection between the AM dynamics and the inherent discrete assignment necessary in clustering to propose a novel unconstrained continuous relaxation of the discrete clustering problem, enabling end-to-end differentiable clustering with AM, dubbed ClAM. |
Bishwajit Saha; Dmitry Krotov; Mohammed J Zaki; Parikshit Ram; |
485 | Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression Under Gaussian Marginals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the task of agnostically learning halfspaces under the Gaussian distribution. |
Ilias Diakonikolas; Daniel Kane; Lisheng Ren; |
486 | PaLM-E: An Embodied Multimodal Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. |
Danny Driess; Fei Xia; Mehdi S. M. Sajjadi; Corey Lynch; Aakanksha Chowdhery; brian ichter; Ayzaan Wahid; Jonathan Tompson; quan vuong; Tianhe Yu; Wenlong Huang; Yevgen Chebotar; Pierre Sermanet; Daniel Duckworth; Sergey Levine; Vincent Vanhoucke; Karol Hausman; Marc Toussaint; Klaus Greff; Andy Zeng; Igor Mordatch; Pete Florence; |
487 | Fighting Fire with Fire: Contrastive Debiasing Without Bias-free Data Via Generative Bias-transformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Contrastive Debiasing via Generative Bias-transformation (CDvG) which is capable of operating without explicitly exploiting bias labels and bias-free samples. |
Yeonsung Jung; Hajin Shim; June Yong Yang; Eunho Yang; |
488 | Image Generation with Shortest Path Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we hypothesize that the optimal procedure minimizes the length of the path taken when corrupting an image towards a given final state. |
Ayan Das; Stathi Fotiadis; Anil Batra; Farhang Nabiei; FengTing Liao; Sattar Vakili; Da-shan Shiu; Alberto Bernacchia; |
489 | Deterministic Equivalent and Error Universality of Deep Random Features Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This manuscript considers the problem of learning a random Gaussian network function using a fully connected network with frozen intermediate layers and trainable readout layer. |
Dominik Schröder; Hugo Cui; Daniil Dmitriev; Bruno Loureiro; |
490 | DP-Fast MH: Private, Fast, and Accurate Metropolis-Hastings for Large-Scale Bayesian Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study Metropolis-Hastings (MH), one of the most fundamental MCMC methods, for large-scale Bayesian inference under differential privacy. |
Wanrong Zhang; Ruqi Zhang; |
491 | A Fast, Well-Founded Approximation to The Empirical Neural Tangent Kernel Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing applications have therefore used one of a handful of approximations yielding $N \times N$ kernel matrices, saving orders of magnitude of computation, but with limited to no justification. We prove that one such approximation, which we call sum of logits, converges to the true eNTK at initialization. |
Mohamad Amin Mohamadi; Wonho Bae; Danica J. Sutherland; |
492 | DiscoBAX – Discovery of Optimal Intervention Sets in Genomic Experiment Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose DiscoBAX – a sample-efficient method for maximizing the rate of significant discoveries per experiment while simultaneously probing for a wide range of diverse mechanisms during a genomic experiment campaign. |
Clare Lyle; Arash Mehrjou; Pascal Notin; Andrew Jesson; Stefan Bauer; Yarin Gal; Patrick Schwab; |
493 | On The Within-Group Fairness of Screening Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that screening policies that use calibrated classifiers may suffer from an understudied type of within-group unfairness—they may unfairly treat qualified members within demographic groups of interest. |
Nastaran Okati; Stratis Tsirtsis; Manuel Gomez Rodriguez; |
494 | UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes the Unified and Progressive Pruning (UPop) as a universal vision-language Transformer compression framework, which incorporates 1) unifiedly searching multimodal subnets in a continuous optimization space from the original model, which enables automatic assignment of pruning ratios among compressible modalities and structures; 2) progressively searching and retraining the subnet, which maintains convergence between the search and retrain to attain higher compression ratios. |
Dachuan Shi; Chaofan Tao; Ying Jin; Zhendong Yang; Chun Yuan; Jiaqi Wang; |
495 | Learning Representations Without Compositional Assumptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this assumption is not always valid for real-world tabular datasets with complex dependencies between feature sets, resulting in localized information that is harder to learn. To overcome this limitation, we propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges. |
Tennison Liu; Jeroen Berrevoets; Zhaozhi Qian; Mihaela van der Schaar; |
496 | Optimization for Amortized Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient amortized optimization scheme for inverse problems with a deep generative prior. |
Tianci Liu; Tong Yang; Quan Zhang; Qi Lei; |
497 | Topologically Faithful Image Segmentation Via Induced Matching of Persistence Barcodes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the concept of induced matchings from persistent homology to achieve a spatially correct matching between persistence barcodes in a segmentation setting. |
Nico Daniel Stucki; Johannes C. Paetzold; Suprosanna Shit; bjoern menze; Ulrich Bauer; |
498 | Fast Algorithms for Distributed K-Clustering with Outliers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the $k$-clustering problems with outliers in the distributed setting. |
Junyu Huang; Qilong Feng; Ziyun Huang; Jinhui Xu; Jianxin Wang; |
499 | GNOT: A General Neural Operator Transformer for Operator Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, there are several challenges for learning operators in practical applications like the irregular mesh, multiple input functions, and complexity of the PDEs’ solution. To address these challenges, we propose a general neural operator transformer (GNOT), a scalable and effective transformer-based framework for learning operators. |
Zhongkai Hao; Zhengyi Wang; Hang Su; Chengyang Ying; Yinpeng Dong; Songming Liu; Ze Cheng; Jian Song; Jun Zhu; |
500 | A Kernelized Stein Discrepancy for Biological Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose the “KSD-B”, a novel divergence measure for distributions over biological sequences that is based on the kernelized Stein discrepancy (KSD). |
Alan Nawzad Amin; Eli N Weinstein; Debora Susan Marks; |
501 | Minimax Estimation of Discontinuous Optimal Transport Maps: The Semi-discrete Case Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of estimating the optimal transport map between two probability distributions, $P$ and $Q$ in $\mathbb{R}^d$, on the basis of i.i.d. samples. |
Aram-Alexandre Pooladian; Vincent Divol; Jonathan Niles-Weed; |
502 | Best Arm Identification in Multi-Agent Multi-Armed Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the problem of best arm identification in Multi-Agent Multi-Armed Bandits (MAMABs) where the rewards are defined through a factor graph. |
Filippo Vannella; Alexandre Proutiere; Jaeseong Jeong; |
503 | Streaming Submodular Maximization with Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the problem of privately maximizing a submodular function in the streaming setting. |
Anamay Chaturvedi; Huy Nguyen; Thy Dinh Nguyen; |
504 | What Do CNNs Learn in The First Layer and Why? A Linear Systems Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It has previously been reported that the representation that is learned in the first layer of deep Convolutional Neural Networks (CNNs) is highly consistent across initializations and architectures. In this work, we quantify this consistency by considering the first layer as a filter bank and measuring its energy distribution. |
Rhea Chowers; Yair Weiss; |
505 | Retrosynthetic Planning with Dual Value Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a novel online training algorithm, called Planning with Dual Value Networks (PDVN), which alternates between the planning phase and updating phase. |
Guoqing Liu; Di Xue; Shufang Xie; Yingce Xia; Austin Tripp; Krzysztof Maziarz; Marwin Segler; Tao Qin; Zongzhang Zhang; Tie-Yan Liu; |
506 | Paging with Succinct Predictions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study learning-augmented paging from the new perspective of requiring the least possible amount of predicted information. |
Antonios Antoniadis; Joan Boyar; Marek Elias; Lene M. Favrholdt; Ruben Hoeksma; Kim S. Larsen; Adam Polak; Bertrand Simon; |
507 | Superhuman Fairness Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We instead re-cast fair machine learning as an imitation learning task by introducing superhuman fairness, which seeks to simultaneously outperform human decisions on multiple predictive performance and fairness measures. |
Omid Memarrast; Linh Vu; Brian D Ziebart; |
508 | Constrained Phi-Equilibria Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce and computationally characterize constrained Phi-equilibria—a more general notion than constrained CEs—in normal-form games. |
Martino Bernasconi; Matteo Castiglioni; Alberto Marchesi; Francesco Trovò; Nicola Gatti; |
509 | Expectation-Complete Graph Representations with Homomorphisms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate novel random graph embeddings that can be computed in expected polynomial time and that are able to distinguish all non-isomorphic graphs in expectation. |
Pascal Welke; Maximilian Thiessen; Fabian Jogl; Thomas Gärtner; |
510 | Masked Bayesian Neural Networks: Theoretical Guarantee and Its Posterior Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new node-sparse BNN model which has good theoretical properties and is computationally feasible. |
Insung Kong; Dongyoon Yang; Jongjin Lee; Ilsang Ohn; GYUSEUNG BAEK; Yongdai Kim; |
511 | Discover-Then-Rank Unlabeled Support Vectors in The Dual Space for Multi-Class Active Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to approach active learning (AL) from a novel perspective of discovering and then ranking potential support vectors by leveraging the key properties of the dual space of a sparse kernel max-margin predictor. |
Dayou Yu; Weishi Shi; Qi Yu; |
512 | Communication-Constrained Bandits Under Additive Gaussian Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study a distributed stochastic multi-armed bandit where a client supplies the learner with communication-constrained feedback based on the rewards for the corresponding arm pulls. |
Prathamesh Mayekar; Jonathan Scarlett; Vincent Tan; |
513 | Collaborative Causal Inference with Fair Incentives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a reward scheme designed using the unique statistical properties that are required by causal inference to guarantee certain desirable incentive criteria (e.g., fairness, benefit) for the parties based on their contributions. To achieve this, we propose a data valuation function to value parties’ data for CCI based on the distributional closeness of its resulting treatment effect estimate to that utilizing the aggregated data from all parties. |
Rui Qiao; Xinyi Xu; Bryan Kian Hsiang Low; |
514 | Performative Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the framework of performative reinforcement learning where the policy chosen by the learner affects the underlying reward and transition dynamics of the environment. |
Debmalya Mandal; Stelios Triantafyllou; Goran Radanovic; |
515 | Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the approximation and estimation ability of Transformers as sequence-to-sequence functions with infinite dimensional inputs. |
Shokichi Takakura; Taiji Suzuki; |
516 | GuardHFL: Privacy Guardian for Heterogeneous Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We put forth GuardHFL, the first-of-its-kind efficient and privacy-preserving HFL framework. |
Hanxiao Chen; Meng Hao; Hongwei Li; Kangjie Chen; Guowen Xu; Tianwei Zhang; Xilin Zhang; |
517 | Overcoming Simplicity Bias in Deep Networks Using A Feature Sieve Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a direct, interventional method for addressing simplicity bias in DNNs, which we call the *feature sieve*. |
Rishabh Tiwari; Pradeep Shenoy; |
518 | Exploring The Limits of Model-Targeted Indiscriminate Data Poisoning Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the notion of model poisoning reachability as a technical tool to explore the intrinsic limits of data poisoning attacks towards target parameters (i.e., model-targeted attacks). |
Yiwei Lu; Gautam Kamath; Yaoliang Yu; |
519 | The Regret of Exploration and The Control of Bad Episodes in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The first contribution of this paper is the introduction of a new performance measure of a RL algorithm that is more discriminating than the regret, that we call the *regret of exploration* that measures the asymptotic cost of exploration. The second contribution is a new *performance test* (PT) to end episodes in RL optimistic algorithms. |
Victor Boone; Bruno Gaujal; |
520 | The Wisdom of Hindsight Makes Language Models Better Instruction Followers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner. |
Tianjun Zhang; Fangchen Liu; Justin Wong; Pieter Abbeel; Joseph E. Gonzalez; |
521 | STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conjecture that learning N:M masks with Adam should take the critical regime of variance estimation into account. In light of this, we propose STEP, an Adam-aware recipe that learns N:M masks with two phases: first, STEP calculates a reliable variance estimate (*precondition phase*) and subsequently, the variance remains fixed and is used as a precondition to learn N:M masks (*mask-learning phase*). |
Yucheng Lu; Shivani Agrawal; Suvinay Subramanian; Oleg Rybakov; Christopher De Sa; Amir Yazdanbakhsh; |
522 | Speeding Up Bellman Ford Via Minimum Violation Permutations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Its running time is governed by the order the algorithm examines vertices for iterative updates on the value of their shortest path. In this work we study this problem through the lens of ‘Algorithms with predictions,’ and show how to leverage auxiliary information from similar instances to improve the running time. |
Silvio Lattanzi; Ola Svensson; Sergei Vassilvitskii; |
523 | Reflected Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To incorporate data constraints in a principled manner, we present Reflected Diffusion Models, which instead reverse a reflected stochastic differential equation evolving on the support of the data. |
Aaron Lou; Stefano Ermon; |
524 | Compositional Exemplars for In-context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we systematically formulate in-context example selection as a subset selection problem, and optimize it in an end-to-end fashion. |
Jiacheng Ye; Zhiyong Wu; Jiangtao Feng; Tao Yu; Lingpeng Kong; |
525 | How Much Does Initialization Affect Generalization? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show on the contrary that, independently of architecture, SGD can itself be the cause of poor generalization if one does not ensure good initialization. |
Sameera Ramasinghe; Lachlan Ewen MacDonald; Moshiur Farazi; Hemanth Saratchandran; Simon Lucey; |
526 | Context Consistency Regularization for Label Sparsity in Time Series Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, consistency regularization techniques have been used to generate artificial labels from unlabeled augmented instances. To fully exploit the sequential characteristic of time series in consistency regularization, we propose a novel method of data augmentation called *context-attached augmentation*, which adds preceding and succeeding instances to a target instance to form its augmented instance. |
Yooju Shin; Susik Yoon; Hwanjun Song; Dongmin Park; Byunghyun Kim; Jae-Gil Lee; Byung Suk Lee; |
527 | Constrained Monotonic Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, this construction does not work with popular non-saturated activation functions as it can only approximate convex functions. We show this shortcoming can be fixed by constructing two additional activation functions from a typical unsaturated monotonic activation function and employing each of them on part of the neurons. |
Davor Runje; Sharath M Shankaranarayana; |
528 | Attributing Image Generative Models Using Latent Fingerprints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper investigates the use of latent semantic dimensions as fingerprints, from where we can analyze the effects of design variables, including the choice of fingerprinting dimensions, strength, and capacity, on the accuracy-quality tradeoff. |
Guangyu Nie; Changhoon Kim; Yezhou Yang; Yi Ren; |
529 | Principled Offline RL in The Presence of Rich Exogenous Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For example, a robot navigating in busy streets needs to ignore irrelevant information, such as other people walking in the background, textures of objects, or birds in the sky. In this paper, we focus on the setting with visually detailed exogenous information and introduce new offline RL benchmarks that offer the ability to study this problem. |
Riashat Islam; Manan Tomar; Alex Lamb; Yonathan Efroni; Hongyu Zang; Aniket Rajiv Didolkar; Dipendra Misra; Xin Li; Harm van Seijen; Remi Tachet des Combes; John Langford; |
530 | Robust Non-Linear Feedback Coding Via Power-Constrained Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a new family of non-linear feedback codes that greatly enhance robustness to channel noise. |
Junghoon Kim; Taejoon Kim; David Love; Christopher Brinton; |
531 | Discrete Continuous Optimization Framework for Simultaneous Clustering and Training in Mixture Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study a new framework of learning mixture models via automatic clustering called PRESTO, wherein we optimize a joint objective function on the model parameters and the partitioning, with each model tailored to perform well on its specific cluster. |
Parth Vipul Sangani; Arjun Shashank Kashettiwar; Pritish Chakraborty; Bhuvan Reddy Gangula; Durga S; Ganesh Ramakrishnan; Rishabh K Iyer; Abir De; |
532 | Generating Language Corrections for Teaching Physical Control Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We design and build CORGI, a model trained to generate language corrections for physical control tasks, such as learning to ride a bike. |
Megha Srivastava; Noah Goodman; Dorsa Sadigh; |
533 | Revisiting The Linear-Programming Framework for Offline RL with General Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we revisit the LP framework for offline RL, and provide a new reformulation that advances the existing results in several aspects, relaxing certain assumptions and achieving optimal statistical rates in terms of sample size. |
Asuman E. Ozdaglar; Sarath Pattathil; Jiawei Zhang; Kaiqing Zhang; |
534 | Sampling-Based Accuracy Testing of Posterior Estimators for General Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Tests of Accuracy with Random Points (TARP) coverage testing as a method to estimate coverage probabilities of generative posterior estimators. |
Pablo Lemos; Adam Coogan; Yashar Hezaveh; Laurence Perreault-Levasseur; |
535 | Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we prove that by tuning hyperparameters to maximize marginal likelihood (the empirical Bayes procedure), performance, as measured by the marginal likelihood, *improves monotonically* with the input dimension. On the other hand, cross-validation metrics exhibit qualitatively different behavior that is characteristic of double descent. |
Liam Hodgkinson; Chris van der Heide; Fred Roosta; Michael W. Mahoney; |
536 | Statistical Foundations of Prior-Data Fitted Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article establishes a theoretical foundation for PFNs and illuminates the statistical mechanisms governing their behavior. |
Thomas Nagler; |
537 | QASA: Advanced Question Answering on Scientific Articles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on our intensive think-aloud study that revealed the three types of questions: surface, testing, and deep questions, we first propose the QASA benchmark that consists of 1798 novel question answering pairs that require full-stack reasoning on scientific articles in AI and ML fields. Then we propose the QASA approach that tackles the full-stack reasoning with large language models via associative selection, evidential rationale-generation, and systematic composition. |
Yoonjoo Lee; Kyungjae Lee; Sunghyun Park; Dasol Hwang; Jaehyeon Kim; Hong-in Lee; Moontae Lee; |
538 | Anti-Exploration By Random Network Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the success of Random Network Distillation (RND) in various domains, it was shown as not discriminative enough to be used as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning. In this paper, we revisit these results and show that, with a naive choice of conditioning for the RND prior, it becomes infeasible for the actor to effectively minimize the anti-exploration bonus and discriminativity is not an issue. |
Alexander Nikulin; Vladislav Kurenkov; Denis Tarasov; Sergey Kolesnikov; |
539 | Truncating Trajectories in Monte Carlo Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Indeed, the rewards taken in early simulation steps weigh exponentially more than future rewards. Taking a cue from this intuition, in this paper, we design an a-priori budget allocation strategy that leads to the collection of trajectories of different lengths, i.e., *truncated*. |
Riccardo Poiani; Alberto Maria Metelli; Marcello Restelli; |
540 | Fast, Differentiable and Sparse Top-k: A Convex Analysis Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose new differentiable and sparse top-$k$ operators. |
Michael Eli Sander; Joan Puigcerver; Josip Djolonga; Gabriel Peyré; Mathieu Blondel; |
541 | Certified Robust Neural Networks: Generalization and Corruption Resistance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Surprisingly, overfitting is a major concern in adversarial training despite being mostly absent in standard training. We provide here theoretical evidence for this peculiar “robust overfitting” phenomenon. |
Mohammed Amine Bennouna; Ryan Lucas; Bart Van Parys; |
542 | NNSplitter: An Active Defense Solution for DNN Model Via Automated Weight Obfuscation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an active model IP protection scheme, namely NNSplitter, which actively protects the model by splitting it into two parts: the obfuscated model that performs poorly due to weight obfuscation, and the model secrets consisting of the indexes and original values of the obfuscated weights, which can only be accessed by authorized users with the support of the trusted execution environment. |
Tong Zhou; Yukui Luo; Shaolei Ren; Xiaolin Xu; |
543 | FP-Diffusion: Improving Score-based Diffusion Models By Enforcing The Underlying Score Fokker-Planck Equation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we derive a corresponding equation called the score FPE that characterizes the noise-conditional scores of the perturbed data densities (i.e., their gradients). |
Chieh-Hsin Lai; Yuhta Takida; Naoki Murata; Toshimitsu Uesaka; Yuki Mitsufuji; Stefano Ermon; |
544 | Data Poisoning Attacks Against Multimodal Encoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast to previous work, which only poisons the visual modality, in this work we take the first step to studying poisoning attacks against multimodal models in both the visual and linguistic modalities. |
Ziqing Yang; Xinlei He; Zheng Li; Michael Backes; Mathias Humbert; Pascal Berrang; Yang Zhang; |
545 | Kernel Sufficient Dimension Reduction and Variable Selection for Compositional Data Via Amalgamation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes a compositional SDR that can handle zeros naturally while incorporating the nonlinear nature and spurious negative correlations among components rigorously. |
Junyoung Park; Jeongyoun Ahn; Cheolwoo Park; |
546 | Team Belief DAG: Generalizing The Sequence Form to Team Games for Fast Computation of Correlated Team Max-Min Equilibria Via Regret Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide new complexity results on the computation of optimal strategies for teams, and propose a new representation, coined *team belief DAG (TB-DAG)*, that describes team strategies as a convex set. |
Brian Hu Zhang; Gabriele Farina; Tuomas Sandholm; |
547 | A Theory of Representation Learning Gives A Deep Generalisation of Kernel Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, standard theoretical approaches (formally NNGPs) involving infinite width limits eliminate representation learning. We therefore develop a new infinite width limit, the Bayesian representation learning limit, that exhibits representation learning mirroring that in finite-width models, yet at the same time, retains some of the simplicity of standard infinite-width limits. |
Adam X. Yang; Maxime Robeyns; Edward Milsom; Ben Anson; Nandi Schoots; Laurence Aitchison; |
548 | MANSA: Learning Fast and Slow in Multi-Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel plug & play IL framework named Multi-Agent Network Selection Algorithm (MANSA) which selectively employs CL only at states that require coordination. |
David Henry Mguni; Haojun Chen; Taher Jafferjee; Jianhong Wang; Longfei Yue; Xidong Feng; Stephen Marcus McAleer; Feifei Tong; Jun Wang; Yaodong Yang; |
549 | Causal Discovery with Latent Confounders Based on Higher-Order Cumulants Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In light of the power of the closed-form solution to OICA corresponding to the One-Latent-Component structure, we formulate a way to estimate the mixing matrix using the higher-order cumulants, and further propose the testable One-Latent-Component condition to identify the latent variables and determine causal orders. |
Ruichu Cai; Zhiyi Huang; Wei Chen; Zhifeng Hao; Kun Zhang; |
550 | Unsupervised Skill Discovery for Learning Shared Structures Across Changing Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new unsupervised skill discovery algorithm that discovers a set of skills that can represent shared structures across changing environments. |
Sang-Hyun Lee; Seung-Woo Seo; |
551 | Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Synthetic prompting, a method that leverages a few handcrafted examples to prompt the model to generate more examples by itself, and selects effective demonstrations to elicit better reasoning. |
Zhihong Shao; Yeyun Gong; yelong shen; Minlie Huang; Nan Duan; Weizhu Chen; |
552 | Contextual Conservative Interleaving Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the contextual conservative interleaving bandit problem, which has a performance constraint that requires the chosen actions to be not much worse than given baseline actions in each round. |
Kei Takemura; |
553 | Multi-Objective Population Based Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: PBT is a single-objective algorithm, but many real-world hyperparameter optimization problems involve two or more conflicting objectives. In this work, we therefore introduce a multi-objective version of PBT, MO-PBT. |
Arkadiy Dushatskiy; Alexander Chebykin; Tanja Alderliesten; Peter Bosman; |
554 | Active Learning Based Structural Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework, Active Learning based Structural Inference (ALaSI), to infer the existence of directed connections from observed agents’ states over a time period in a dynamical system. |
Aoran Wang; Jun Pang; |
555 | Multi-Fidelity Covariance Estimation in The Log-Euclidean Geometry Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a multi-fidelity estimator of covariance matrices that employs the log-Euclidean geometry of the symmetric positive-definite manifold. |
Aimee Maurais; Terrence Alsup; Benjamin Peherstorfer; Youssef Marzouk; |
556 | MolDiff: Addressing The Atom-Bond Inconsistency Problem in 3D Molecule Diffusion Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We define this problem as the atom-bond inconsistency problem and claim it is the main reason for current approaches to generating unrealistic 3D molecules. To overcome this problem, we propose a new diffusion model called MolDiff which can generate atoms and bonds simultaneously while still maintaining their consistency by explicitly modeling the dependence between their relationships. |
Xingang Peng; Jiaqi Guan; qiang liu; Jianzhu Ma; |
557 | Deep Temporal Sets with Evidential Reinforced Attentions for Unique Behavioral Pattern Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Real-life applications, such as digital behavioral biomarker identification, often require the discovery of complex spatiotemporal patterns in multimodal data, which is largely under-explored. To fill this gap, we propose a novel model that integrates uniquely designed Deep Temporal Sets (DTS) with Evidential Reinforced Attentions (ERA). |
Dingrong Wang; Deep Shankar Pandey; Krishna Prasad Neupane; Zhiwei Yu; Ervine Zheng; Zhi Zheng; Qi Yu; |
558 | Vector Quantized Wasserstein Auto-Encoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study learning deep discrete representations from the generative viewpoint. |
Long Tung Vuong; Trung Le; He Zhao; Chuanxia Zheng; Mehrtash Harandi; Jianfei Cai; Dinh Phung; |
559 | Simple Embodied Language Learning As A Byproduct of Meta-Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we design an office navigation environment, where the agent's goal is to find a particular office, and office locations differ in different buildings (i.e., tasks). |
Evan Zheran Liu; Sahaana Suri; Tong Mu; Allan Zhou; Chelsea Finn; |
560 | Spred: Solving L1 Penalty with SGD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to minimize a generic differentiable objective with $L_1$ constraint using a simple reparametrization and straightforward stochastic gradient descent. |
Liu Ziyin; Zihao Wang; |
561 | Text-To-4D Dynamic Scene Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. |
Uriel Singer; Shelly Sheynin; Adam Polyak; Oron Ashual; Iurii Makarov; Filippos Kokkinos; Naman Goyal; Andrea Vedaldi; Devi Parikh; Justin Johnson; Yaniv Taigman; |
562 | Effective and Efficient Structural Inference with Reservoir Computing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an effective and efficient structural inference approach by integrating a Reservoir Computing (RC) network into a Variational Auto-encoder-based (VAE-based) structural inference framework. |
Aoran Wang; Tsz Pan Tong; Jun Pang; |
563 | Fractional Denoising for 3D Molecular Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The underlying reason is that molecular distributions assumed by existing denoising methods fail to capture the anisotropic characteristic of molecules. To tackle these challenges, we propose a novel hybrid noise strategy, including noise on both dihedral angles and coordinates. |
Shikun Feng; Yuyan Ni; Yanyan Lan; Zhi-Ming Ma; Weiying Ma; |
564 | Beyond Lipschitz Smoothness: A Tighter Analysis for Nonconvex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to use two separate assumptions for positive and negative curvatures, so that we can study the different implications of the two. |
Zhengmian Hu; Xidong Wu; Heng Huang; |
565 | A Reinforcement Learning Framework for Dynamic Mediation Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Proposing a reinforcement learning (RL) framework, we are the first to evaluate dynamic mediation effects in settings with infinite horizons. |
Lin Ge; Jitao Wang; Chengchun Shi; Zhenke Wu; Rui Song; |
566 | On The Functional Similarity of Robust and Non-Robust Neural Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we investigate the functional similarity of robust and non-robust representations for image classification with the help of model stitching. |
András Balogh; Márk Jelasity; |
567 | Compressed Decentralized Proximal Stochastic Gradient Method for Nonconvex Composite Problems with Heterogeneous Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We first propose a decentralized proximal stochastic gradient tracking method (DProxSGT) for nonconvex stochastic composite problems, with data heterogeneously distributed on multiple workers in a decentralized connected network. To save communication cost, we then extend DProxSGT to a compressed method by compressing the communicated information. |
Yonggui Yan; Jie Chen; Pin-Yu Chen; Xiaodong Cui; Songtao Lu; Yangyang Xu; |
568 | Multi-Agent Best Arm Identification with Private Communications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For each privacy definition, we propose an algorithm based on a two-level successive elimination scheme. |
Alexandre Rio; Merwan Barlier; Igor Colin; Marta Soare; |
569 | Evidential Interactive Learning for Medical Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an evidential interactive learning framework that leverages evidence-based uncertainty estimation and interactive machine learning to improve image captioning with limited labeled data. |
Ervine Zheng; Qi Yu; |
570 | Generalization Bounds Using Data-Dependent Fractal Dimensions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While these works have successfully introduced new mathematical tools to apprehend generalization, they heavily rely on a Lipschitz continuity assumption, which in general does not hold for neural networks and might make the bounds vacuous. In this work, we address this issue and prove fractal geometry-based generalization bounds without requiring any Lipschitz assumption. |
Benjamin Dupuis; George Deligiannidis; Umut Simsekli; |
571 | Quantitative Universal Approximation Bounds for Deep Belief Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that deep belief networks with binary hidden units can approximate any multivariate probability density under very mild integrability requirements on the parental density of the visible nodes. |
Julian Sieber; Johann Gehringer; |
572 | Learn to Accumulate Evidence from All Training Samples: Theory and Practice Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This constraint often leads to inferior predictive performance compared to standard softmax models, making it challenging to extend them to many large-scale datasets. To unveil the real cause of this undesired behavior, we theoretically investigate evidential models and identify a fundamental limitation that explains the inferior performance: existing evidential activation functions create *zero evidence regions*, which prevent the model from learning from training samples falling into such regions. |
Deep Shankar Pandey; Qi Yu; |
573 | Multi-agent Online Scheduling: MMS Allocations for Indivisible Items Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of fairly allocating a sequence of indivisible items that arrive online in an arbitrary order to a group of $n$ agents with additive normalized valuation functions. We consider the allocation of goods and chores separately and propose algorithms for approximating maximin share (MMS) allocations for both settings. |
Shengwei Zhou; Rufan Bai; Xiaowei Wu; |
574 | High Fidelity Image Counterfactuals with Probabilistic Causal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a general causal generative modelling framework for accurate estimation of high fidelity image counterfactuals with deep structural causal models. |
Fabio De Sousa Ribeiro; Tian Xia; Miguel Monteiro; Nick Pawlowski; Ben Glocker; |
575 | Two Losses Are Better Than One: Faster Optimization Using A Cheaper Proxy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy. |
Blake Woodworth; Konstantin Mishchenko; Francis Bach; |
576 | Prototype-Sample Relation Distillation: Towards Replay-Free Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a holistic approach to jointly learn the representation and class prototypes while maintaining the relevance of old class prototypes and their embedded similarities. |
Nader Asadi; MohammadReza Davari; Sudhir Mudur; Rahaf Aljundi; Eugene Belilovsky; |
577 | Sequential Changepoint Detection Via Backward Confidence Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a simple reduction from sequential estimation to sequential changepoint detection (SCD). |
Shubhanshu Shekhar; Aaditya Ramdas; |
578 | On Over-Squashing in Message Passing Neural Networks: The Impact of Width, Depth, and Topology Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite recent methods introduced to mitigate this issue, an understanding of the causes of over-squashing and of possible solutions is lacking. In this theoretical work, we prove that: (i) Neural network width can mitigate over-squashing, but at the cost of making the whole network more sensitive; (ii) Conversely, depth cannot help mitigate over-squashing: increasing the number of layers leads to over-squashing being dominated by vanishing gradients; (iii) The graph topology plays the greatest role, since over-squashing occurs between nodes at high commute time. |
Francesco Di Giovanni; Lorenzo Giusti; Federico Barbero; Giulia Luise; Pietro Lio; Michael M. Bronstein; |
579 | Causal Bounds in Quasi-Markovian Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of computing bounds for causal queries on quasi-Markovian graphs with unobserved confounders and discrete valued observed variables, where identifiability does not hold. |
Madhumitha Shridharan; Garud Iyengar; |
580 | Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent papers have theoretically shown that NC emerges in the global minimizers of training problems with the simplified “unconstrained feature model”. In this context, we take a step further and prove the NC occurrences in deep linear networks for the popular mean squared error (MSE) and cross entropy (CE) losses, showing that global solutions exhibit NC properties across the linear layers. |
Hien Dang; Tho Tran Huu; Stanley Osher; Hung Tran-The; Nhat Ho; Tan Minh Nguyen; |
581 | Continual Task Allocation in Meta-Policy Network Via Sparse Prompting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: How to train a generalizable meta-policy by continually learning a sequence of tasks? It is a natural human skill yet challenging to achieve by current reinforcement learning: the agent is expected to quickly adapt to new tasks (plasticity) meanwhile retaining the common knowledge from previous tasks (stability). We address it by Continual Task Allocation via Sparse Prompting (CoTASP), which learns over-complete dictionaries to produce sparse masks as prompts extracting a sub-network for each task from a meta-policy network. |
Yijun Yang; Tianyi Zhou; Jing Jiang; Guodong Long; Yuhui Shi; |
582 | Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While these bounds do not require additional topological assumptions given that SGD can be modeled using a heavy-tailed stochastic differential equation (SDE), they can only apply to simple quadratic problems. In this paper, we build on this line of research and develop generalization bounds for a more general class of objective functions, which includes non-convex functions as well. |
Anant Raj; Lingjiong Zhu; Mert Gurbuzbalaban; Umut Simsekli; |
583 | Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we first reveal that a specific finite-difference computation, composed of both gradient ascent and descent steps, reduces the computational cost of GR. Next, we show that the finite-difference computation also works better in the sense of generalization performance. |
Ryo Karakida; Tomoumi Takase; Tomohiro Hayase; Kazuki Osawa; |
584 | TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose TR0N, a highly general framework to turn pre-trained unconditional generative models, such as GANs and VAEs, into conditional models. |
Zhaoyan Liu; Noël Vouitsis; Satya Krishna Gorti; Jimmy Ba; Gabriel Loaiza-Ganem; |
585 | SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a CLIP-based model named SegCLIP for the topic of open-vocabulary segmentation in an annotation-free manner. |
Huaishao Luo; Junwei Bao; Youzheng Wu; Xiaodong He; Tianrui Li; |
586 | Causal Modeling of Policy Interventions From Treatment–Outcome Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, the current methods are not applicable if the treatment policy is unknown or a counterfactual analysis is needed. To handle these limitations, we model the treatments and outcomes jointly in continuous time, by combining Gaussian processes and point processes. |
Çağlar Hızlı; S. T. John; Anne Tuulikki Juuti; Tuure Tapani Saarinen; Kirsi Hannele Pietiläinen; Pekka Marttinen; |
587 | Cross-Entropy Loss Functions: Theoretical Analysis and Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a theoretical analysis of a broad family of loss functions, *comp-sum losses*, that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error and other cross-entropy-like loss functions. |
Anqi Mao; Mehryar Mohri; Yutao Zhong; |
588 | Learning Deductive Reasoning from Synthetic Corpus Based on Formal Logic Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study a synthetic corpus based approach for language models (LMs) to acquire logical deductive reasoning ability. We release the code, data, and models. |
Terufumi Morishita; Gaku Morio; Atsuki Yamaguchi; Yasuhiro Sogawa; |
589 | Robust Consensus in Ranking Data Analysis: Definitions, Properties and Computational Issues Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce notions of robustness, together with dedicated statistical methods, for $\textit{Consensus Ranking}$, the flagship problem in ranking data analysis, aiming at summarizing a probability distribution on $\mathfrak{S}_n$ by a $\textit{median}$ ranking. |
Morgane Goibert; Clément Calauzènes; Ekhine Irurozki; Stephan CLEMENCON; |
590 | Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Q-learning Decision Transformer (QDT) to address the shortcomings of DT by leveraging the benefits of Dynamic Programming (Q-learning). |
Taku Yamagata; Ahmed Khalil; Raul Santos-Rodriguez; |
591 | Extrapolated Random Tree for Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel tree-based algorithm named *Extrapolated Random Tree for Regression* (ERTR) that adapts to arbitrary smoothness of the regression function while maintaining the interpretability of the tree. |
Yuchao Cai; Yuheng Ma; Yiwei Dong; Hanfang Yang; |
592 | Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and is hard to estimate. To address this challenge, we propose an exact formulation of the intermediate guidance as well as a novel training objective named contrastive energy prediction (CEP) to learn the exact guidance. |
Cheng Lu; Huayu Chen; Jianfei Chen; Hang Su; Chongxuan Li; Jun Zhu; |
593 | ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In practice, it is often observed that the policy alternates between satisfying the constraints and maximizing the reward, rarely accomplishing both objectives simultaneously. Here, we address this problem by introducing Reinforcement Learning with Optimistic Ascent-Descent (ReLOAD), a principled CRL method with guaranteed last-iterate convergence. |
Ted Moskovitz; Brendan O’Donoghue; Vivek Veeriah; Sebastian Flennerhag; Satinder Singh; Tom Zahavy; |
594 | Improving Adversarial Robustness By Putting More Regularizations on Less Robust Samples Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new adversarial training algorithm that is theoretically well motivated and empirically superior to other existing algorithms. |
Dongyoon Yang; Insung Kong; Yongdai Kim; |
595 | Federated Linear Contextual Bandits with User-level Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies federated linear contextual bandits under the notion of user-level differential privacy (DP). |
Ruiquan Huang; Huanyu Zhang; Luca Melis; Milan Shen; Meisam Hejazinia; Jing Yang; |
596 | Simple Diffusion: End-to-end Diffusion for High Resolution Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to improve denoising diffusion for high resolution images while keeping the model as simple as possible. |
Emiel Hoogeboom; Jonathan Heek; Tim Salimans; |
597 | Geometric Clifford Algebra Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Geometric Clifford Algebra Networks (GCANs) for modeling dynamical systems. |
David Ruhe; Jayesh K Gupta; Steven De Keninck; Max Welling; Johannes Brandstetter; |
598 | Nonparametric Generative Modeling with Conditional Sliced-Wasserstein Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Sliced-Wasserstein Flow (SWF) is a promising approach to nonparametric generative modeling but has not been widely adopted due to its suboptimal generative quality and lack of conditional modeling capabilities. In this work, we make two major contributions to bridging this gap. |
Chao Du; Tianbo Li; Tianyu Pang; Shuicheng YAN; Min Lin; |
599 | Accelerated Stochastic Optimization Methods Under Quasar-convexity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We demonstrate that our algorithms have fast convergence and outperform existing algorithms on several examples, including the classical problem of learning linear dynamical systems. |
Qiang Fu; Dongchu Xu; Ashia Camage Wilson; |
600 | Conformal Prediction for Federated Uncertainty Quantification Under Label Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a new federated conformal prediction method based on quantile regression and take into account privacy constraints. |
Vincent Plassier; Mehdi Makni; Aleksandr Rubashevskii; Eric Moulines; Maxim Panov; |
601 | The Edge of Orthogonality: A Simple View of What Makes BYOL Tick Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim at exploring the simplest possible mathematical arguments towards explaining the underlying mechanisms behind self-predictive unsupervised learning. |
Pierre Harvey Richemond; Allison Tam; Yunhao Tang; Florian Strub; Bilal Piot; Felix Hill; |
602 | Nested Elimination: A Simple Algorithm for Best-Item Identification From Choice-Based Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an elimination-based algorithm, namely Nested Elimination (NE), which is inspired by the nested structure implied by the information-theoretic lower bound. |
Junwen Yang; Yifan Feng; |
603 | Harmonic Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate effective means of representing harmonic functions in neural networks and extend such results also to quantum neural networks to demonstrate the generality of our approach. We benchmark our approaches against (quantum) physics-informed neural networks, where we show favourable performance. |
Atiyo Ghosh; Antonio Andrea Gentile; Mario Dagrada; Chul Lee; Seong-Hyok Sean Kim; Hyukgeun Cha; Yunjun Choi; Dongho Kim; JEONG-IL KYE; Vincent Emanuel Elfving; |
604 | N$\text{A}^\text{2}$Q: Neural Attention Additive Model for Interpretable Multi-Agent Q-Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study an interpretable value decomposition framework via the family of generalized additive models. |
Zichuan Liu; Yuanyang Zhu; Chunlin Chen; |
605 | An Investigation Into Pre-Training Object-Centric Representations for Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the effectiveness of OCR pre-training for image-based reinforcement learning via empirical experiments. For systematic evaluation, we introduce a simple object-centric visual RL benchmark and conduct experiments to answer questions such as "Does OCR pre-training improve performance on object-centric tasks?" |
Jaesik Yoon; Yi-Fu Wu; Heechul Bae; Sungjin Ahn; |
606 | GNN&GBDT-Guided Fast Optimizing Framework for Large-scale Integer Programming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the framework cannot effectively use the embedding spatial information in GNN and still relies heavily on large-scale solvers in LNS, so the scale of IP is limited by the ability of the current solver, creating performance bottlenecks. To handle these issues, this paper presents a GNN&GBDT-guided fast optimizing framework for large-scale IPs that only uses a small-scale optimizer to solve large-scale IPs efficiently. |
Huigen Ye; Hua Xu; Hongyan Wang; Chengming Wang; Yu Jiang; |
607 | State and Parameter Learning with PARIS Particle Gibbs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our first contribution is to design a novel additive smoothing algorithm, the Parisian particle Gibbs (PPG) sampler, which can be viewed as a PaRIS (Olsson, Westerborn 2017) algorithm driven by conditional SMC moves, resulting in bias-reduced estimates of the targeted quantities. |
Gabriel Cardoso; Yazid Janati El Idrissi; Sylvain Le Corff; Eric Moulines; Jimmy Olsson; |
608 | Tuning Computer Vision Models With Task Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In natural language processing, this is often addressed using reinforcement learning techniques that align models with a task reward. We adopt this approach and show its surprising effectiveness to improve generic models pretrained to imitate example outputs across multiple computer vision tasks, such as object detection, panoptic segmentation, colorization and image captioning. |
André Susano Pinto; Alexander Kolesnikov; Yuge Shi; Lucas Beyer; Xiaohua Zhai; |
609 | Leveraging Label Non-Uniformity for Node Classification in Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the key notion of label non-uniformity, which is derived from the Wasserstein distance between the softmax distribution of the logits and the uniform distribution. |
Feng Ji; See Hian Lee; Hanyang Meng; Kai Zhao; Jielong Yang; Wee Peng Tay; |
610 | Meta-learning Parameterized Skills Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel parameterized skill-learning algorithm that aims to learn transferable parameterized skills and synthesize them into a new action space that supports efficient learning in long-horizon tasks. |
Haotian Fu; Shangqun Yu; Saket Tiwari; Michael Littman; George Konidaris; |
611 | A Scalable Frank-Wolfe-Based Algorithm for The Max-Cut SDP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of solving large-scale instances of the Max-Cut semidefinite program (SDP), i.e., optimizing a linear function over $n\times n$ positive semidefinite (PSD) matrices with unit diagonal. |
Chi Bach Pham; Wynita Griggs; James Saunderson; |
612 | Last Switch Dependent Bandits with Monotone Payoff Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take a step towards understanding the approximability of planning LSD bandits, namely, the (NP-hard) problem of computing an optimal arm-pulling strategy under complete knowledge of the model. |
Ayoub Foussoul; Vineet Goyal; Orestis Papadigenopoulos; assaf zeevi; |
613 | Multi-class Graph Clustering Via Approximated Effective $p$-Resistance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper develops an approximation to the (effective) $p$-resistance and applies it to multi-class clustering. |
Shota Saito; Mark Herbster; |
614 | Under-Counted Tensor Completion with Neural Incorporation of Attributes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, a low-rank Poisson tensor model with an expressive unknown nonlinear side information extractor is proposed for under-counted multi-aspect data. |
Shahana Ibrahim; Xiao Fu; Rebecca Hutchinson; Eugene Seo; |
615 | Weighted Sampling Without Replacement for Deep Top-$k$ Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose using the Weighted Sampling Without Replacement (WSWR) method as a learning objective for top-$k$ loss. |
Dieqiao Feng; Yuanqi Du; Carla P Gomes; Bart Selman; |
616 | Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our main contribution is to develop a nearly linear time algorithm for robust PCA with near-optimal error guarantees. |
Ilias Diakonikolas; Daniel Kane; Ankit Pensia; Thanasis Pittas; |
617 | Online Mechanism Design for Information Acquisition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For the full feedback problem, we propose an algorithm that guarantees $\tilde{O}(\sqrt{T})$ regret and violation, while for the bandit feedback setting we present an algorithm that attains $\tilde{O}(T^{\alpha})$ regret and $\tilde{O}(T^{1-\alpha/2})$ violation for any $\alpha \in [1/2, 1]$. |
Federico Cacciamani; Matteo Castiglioni; Nicola Gatti; |
618 | Directed Chain Generative Adversarial Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel time series generator, named directed chain GANs (DC-GANs), which inserts a time series dataset (called a neighborhood process of the directed chain or input) into the drift and diffusion coefficients of the directed chain SDEs with distributional constraints. |
Ming Min; Ruimeng Hu; Tomoyuki Ichiba; |
619 | Propensity Matters: Measuring and Enhancing Balancing for Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on BMSE, we propose IPS-V2 and DR-V2 as the estimators of unbiased loss, and theoretically show that IPS-V2 and DR-V2 have greater propensity balancing and smaller variance without sacrificing additional bias. |
Haoxuan Li; Yanghao Xiao; Chunyuan Zheng; Peng Wu; Peng Cui; |
620 | Layered State Discovery for Incremental Autonomous Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel layered decomposition of the set of incrementally $L$-controllable states that is based on the iterative application of a state-expansion operator. |
Liyu Chen; Andrea Tirinzoni; Alessandro Lazaric; Matteo Pirotta; |
621 | MonoFlow: Rethinking Divergence GANs Via The Perspective of Wasserstein Gradient Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we leverage Wasserstein gradient flows which characterize the evolution of particles in the sample space, to gain theoretical insights and algorithmic inspiration of GANs. |
Mingxuan Yi; Zhanxing Zhu; Song Liu; |
622 | CataBEEM: Integrating Latent Interaction Categories in Node-wise Community Detection Models for Network Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a category-and-block edge exchangeable model (CataBEEM) to study interaction networks with joint latent interaction-level category and node-level community structures. |
Yuhua Zhang; Walter H. Dempsey; |
623 | Chemically Transferable Generative Backmapping of Coarse-Grained Proteins Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work addresses both issues to build a fast, transferable, and reliable generative backmapping tool for CG protein representations. |
Soojung Yang; Rafael Gomez-Bombarelli; |
624 | Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose CoPaint, which can coherently inpaint the whole image without introducing mismatches. |
Guanhua Zhang; Jiabao Ji; Yang Zhang; Mo Yu; Tommi S. Jaakkola; Shiyu Chang; |
625 | MAGANet: Achieving Combinatorial Generalization By Modeling A Group Action Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent studies discovered that the disentangled representation is insufficient for combinatorial generalization and is not even correlated. In this regard, we propose a novel framework for data generation that can robustly generalize under these distribution shift situations. |
Geonho Hwang; Jaewoong Choi; Hyunsoo Cho; Myungjoo Kang; |
626 | Go Beyond Imagination: Maximizing Episodic Reachability with World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new intrinsic reward design called GoBI – Go Beyond Imagination, which combines the traditional lifelong novelty motivation with an episodic intrinsic reward that is designed to maximize the stepwise reachability expansion. |
Yao Fu; Run Peng; Honglak Lee; |
627 | QuantumDARTS: Differentiable Quantum Architecture Search for Variational Quantum Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of quantum architecture search (QAS) for VQA to automatically design parameterized quantum circuits (PQC). |
Wenjie Wu; Ge Yan; Xudong Lu; Kaisen Pan; Junchi Yan; |
628 | Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: But using predictive error as intrinsic motivation is fragile in *stochastic environments*, as the agent may become trapped by high-entropy areas of the state-action space, such as a noisy TV. In this work, we study a natural solution derived from structural causal models of the world: Our key idea is to learn representations of the future that capture precisely the *unpredictable* aspects of each outcome—which we use as additional input for predictions, such that intrinsic rewards only reflect the *predictable* aspects of world dynamics. |
Daniel Jarrett; Corentin Tallec; Florent Altché; Thomas Mesnard; Remi Munos; Michal Valko; |
629 | Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These existing methods cannot be readily extended to model time series that exhibit multimodality, with structured features and high-dimensional data being recorded at each timestep in the sequence. In this work, we address this gap and propose a new SSL method — Sequential Multi-Dimensional SSL — where an SSL loss is applied both at the level of the entire sequence and at the level of the individual high-dimensional data points in the sequence in order to better capture information at both scales. |
Aniruddh Raghu; Payal Chandak; Ridwan Alam; John Guttag; Collin Stultz; |
630 | Diverse and Faithful Knowledge-Grounded Dialogue Generation Via Sequential Posterior Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present an end-to-end learning framework, termed Sequential Posterior Inference (SPI), capable of selecting knowledge and generating dialogues by approximately sampling from the posterior distribution. |
Yan Xu; Deqian Kong; Dehong Xu; Ziwei Ji; Bo Pang; Pascale Fung; Ying Nian Wu; |
631 | X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation Using CLIP and StableDiffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we revisit Copy-Paste at scale with the power of newly emerged zero-shot recognition models (e.g., CLIP) and text2image models (e.g., StableDiffusion). |
Hanqing Zhao; Dianmo Sheng; Jianmin Bao; Dongdong Chen; Dong Chen; Fang Wen; Lu Yuan; Ce Liu; Wenbo Zhou; Qi Chu; Weiming Zhang; Nenghai Yu; |
632 | One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a unified diffusion framework (dubbed UniDiffuser) to fit all distributions relevant to a set of multi-modal data in one model. |
Fan Bao; Shen Nie; Kaiwen Xue; Chongxuan Li; Shi Pu; Yaole Wang; Gang Yue; Yue Cao; Hang Su; Jun Zhu; |
633 | ClimaX: A Foundation Model for Weather and Climate Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop and demonstrate ClimaX, a flexible and generalizable deep learning model for weather and climate science that can be trained using heterogeneous datasets spanning different variables, spatio-temporal coverage, and physical groundings. |
Tung Nguyen; Johannes Brandstetter; Ashish Kapoor; Jayesh K Gupta; Aditya Grover; |
634 | Efficient Algorithms for Exact Graph Matching on Correlated Stochastic Block Models with Constant Correlation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an efficient algorithm for matching graphs with community structure, based on the comparison between partition trees rooted from each vertex, by extending the idea of Mao et al. (2021) to graphs with communities. |
Joonhyuk Yang; Dongpil Shin; Hye Won Chung; |
635 | Prototype-oriented Unsupervised Anomaly Detection for Multivariate Time Series Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing UAD methods try to learn a fixed set of mappings for each MTS, entailing expensive computation and limited model adaptation. To address this pivotal issue, we propose a prototype-oriented UAD (PUAD) method under a probabilistic framework. |
Yuxin Li; Wenchao Chen; Bo Chen; Dongsheng Wang; Long Tian; Mingyuan Zhou; |
636 | Delay-agnostic Asynchronous Coordinate Update Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a delay-agnostic asynchronous coordinate update algorithm (DEGAS) for computing operator fixed points, with applications to asynchronous optimization. |
Xuyang Wu; Changxin Liu; Sindri Magnússon; Mikael Johansson; |
637 | FedCR: Personalized Federated Learning Based on Across-Client Common Representation with Conditional Mutual Information Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In personalized federated learning (PFL), multiple clients train customized models to fulfill their personal objectives, which, however, are prone to overfitting to local data due to the heterogeneity and scarcity of local data. To address this, we propose FedCR, a personalized federated learning framework, grounded in an information-theoretic perspective, that builds on the common representation learned across clients. |
Hao Zhang; Chenglin Li; Wenrui Dai; Junni Zou; Hongkai Xiong; |
638 | Lifelong Language Pretraining with Distribution-Specialized Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we propose Lifelong-MoE, an extensible MoE (Mixture-of-Experts) architecture that dynamically adds model capacity by adding experts with regularized pretraining. |
Wuyang Chen; Yanqi Zhou; Nan Du; Yanping Huang; James Laudon; Zhifeng Chen; Claire Cui; |
639 | Revisiting Over-smoothing and Over-squashing Using Ollivier-Ricci Curvature Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on our theory, we propose the Batch Ollivier-Ricci Flow, a novel rewiring algorithm capable of simultaneously addressing both over-smoothing and over-squashing. |
Khang Nguyen; Nong Minh Hieu; Vinh Duc NGUYEN; Nhat Ho; Stanley Osher; Tan Minh Nguyen; |
640 | AudioLDM: Text-to-Audio Generation with Latent Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose AudioLDM, a TTA system that is built on a latent space to learn continuous audio representations from contrastive language-audio pretraining (CLAP) embeddings. |
Haohe Liu; Zehua Chen; Yi Yuan; Xinhao Mei; Xubo Liu; Danilo Mandic; Wenwu Wang; Mark D Plumbley; |
641 | Accounting For Informative Sampling When Learning to Forecast Treatment Outcomes Over Time Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formalize informative sampling as a covariate shift problem and show that it can prohibit accurate estimation of treatment outcomes if not properly accounted for. To overcome this challenge, we present a general framework for learning treatment outcomes in the presence of informative sampling using inverse intensity-weighting, and propose a novel method, TESAR-CDE, that instantiates this framework using Neural CDEs. |
Toon Vanderschueren; Alicia Curth; Wouter Verbeke; Mihaela van der Schaar; |
642 | A Picture of The Space of Typical Learnable Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop information geometric techniques to understand the representations learned by deep networks when they are trained on different tasks using supervised, meta-, semi-supervised and contrastive learning. |
Rahul Ramesh; Jialin Mao; Itay Griniasty; Rubing Yang; Han Kheng Teoh; Mark Transtrum; James Sethna; Pratik Chaudhari; |
643 | Towards Quantum Machine Learning for Constrained Combinatorial Optimization: A Quantum QAP Solver Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel quantum neural network (QNN) for learning CO problems in a supervised manner to achieve better and faster results. |
Xinyu Ye; Ge Yan; Junchi Yan; |
644 | Transformers As Algorithms: Generalization and Stability in In-context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formalize in-context learning as an algorithm learning problem where a transformer model implicitly constructs a hypothesis function at inference-time. |
Yingcong Li; Muhammed Emrullah Ildiz; Dimitris Papailiopoulos; Samet Oymak; |
645 | Estimation Beyond Data Reweighting: Kernel Method of Moments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the use of $\varphi$-divergences effectively limits the candidate distributions to reweightings of the data samples. We lift this long-standing limitation and provide a method of moments that goes beyond data reweighting. |
Heiner Kremer; Yassine Nemmour; Bernhard Schölkopf; Jia-Jie Zhu; |
646 | Importance Weighted Expectation-Maximization for Protein Sequence Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose IsEM-Pro, an approach to generate protein sequences towards a given fitness criterion. |
Zhenqiao Song; Lei Li; |
647 | Dissecting The Effects of SGD Noise in Distinct Regimes of Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Understanding when the noise in stochastic gradient descent (SGD) affects generalization of deep neural networks remains a challenge, complicated by the fact that networks can operate in distinct training regimes. Here we study how the magnitude of this noise $T$ affects performance as the size of the training set $P$ and the scale of initialization $\alpha$ are varied. |
Antonio Sclocchi; Mario Geiger; Matthieu Wyart; |
648 | Neural Status Registers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Neither architecture can extrapolate to much larger numbers than those seen in the training set. We propose a novel differentiable architecture, the Neural Status Register (NSR) to solve this problem. |
Lukas Faber; Roger Wattenhofer; |
649 | Convex Geometry of ReLU-layers, Injectivity on The Ball and Local Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The paper uses a frame-theoretic setting to study the injectivity of a ReLU-layer on the closed ball of $\mathbb{R}^n$ and its non-negative part. |
Daniel Haider; Martin Ehler; Peter Balazs; |
650 | Improving Adversarial Robustness Through The Contrastive-Guided Diffusion Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we propose the Contrastive-Guided Diffusion Process (Contrastive-DP), which incorporates the contrastive loss to guide the diffusion model in data generation. |
Yidong Ouyang; Liyan Xie; Guang Cheng; |
651 | High-Probability Bounds for Stochastic Optimization and Variational Inequalities: The Case of Unbounded Variance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. |
Abdurakhmon Sadiev; Marina Danilova; Eduard Gorbunov; Samuel Horváth; Gauthier Gidel; Pavel Dvurechensky; Alexander Gasnikov; Peter Richtárik; |
652 | Linkless Link Prediction Via Relational Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, to combine the advantages of GNNs and MLPs, we start with exploring direct knowledge distillation (KD) methods for link prediction, i.e., predicted logit-based matching and node representation-based matching. |
Zhichun Guo; William Shiao; Shichang Zhang; Yozen Liu; Nitesh Chawla; Neil Shah; Tong Zhao; |
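Of the two distillation strategies this highlight mentions, predicted-logit matching is the simpler one: an MLP student regresses the GNN teacher's link logits. A minimal sketch of that idea using a mean-squared-error match (the arrays and values here are illustrative, not the paper's actual setup):

```python
import numpy as np

def kd_logit_matching_loss(student_logits, teacher_logits):
    """Predicted-logit matching for distillation: the student is trained to
    regress the teacher's link-prediction logits via mean squared error."""
    return np.mean((student_logits - teacher_logits) ** 2)

teacher = np.array([2.0, -1.0, 0.5])  # hypothetical GNN teacher logits
student = np.array([1.5, -0.5, 0.5])  # hypothetical MLP student logits
loss = kd_logit_matching_loss(student, teacher)
```

Node-representation-based matching would instead compare intermediate embeddings rather than output logits; both reduce to a regression objective between student and teacher tensors.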
653 | FedAvg Converges to Zero Training Loss Linearly for Overparameterized Multi-Layer Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that data heterogeneity can be dealt with from a different perspective. |
Bingqing Song; Prashant Khanduri; Xinwei Zhang; Jinfeng Yi; Mingyi Hong; |
654 | Subset Selection Based On Multiple Rankings in The Presence of Bias: Effectiveness of Fairness Constraints for Multiwinner Voting Score Functions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider the problem of subset selection where one is given multiple rankings of items and the goal is to select the highest quality subset. |
Niclas Boehmer; L. Elisa Celis; Lingxiao Huang; Anay Mehrotra; Nisheeth K Vishnoi; |
655 | LeadFL: Client Self-Defense Against Model Poisoning in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a client-self defense, LeadFL, that is combined with existing server-side defenses to thwart backdoor and targeted attacks. |
Chaoyi Zhu; Stefanie Roos; Lydia Y. Chen; |
656 | Phase-aware Adversarial Defense for Improving Adversarial Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, inspired by cognitive science, we investigate the interference of adversarial noise from the perspective of image phase, and find that ordinarily trained models lack sufficient robustness against phase-level perturbations. |
Dawei Zhou; Nannan Wang; Heng Yang; Xinbo Gao; Tongliang Liu; |
657 | Expertise Trees Resolve Knowledge Limitations in Collective Decision-Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In practice, this may lead to sub-optimal or discriminatory decisions against minority cases. In this work, we model such changes in depth and breadth of knowledge as a partitioning of the problem space into regions of differing expertise. |
Axel Abels; Tom Lenaerts; Vito Trianni; Ann Nowe; |
658 | Efficient Learning of Mesh-Based Physical Simulation with Bi-Stride Multi-Scale Graph Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the bipartite graph determination, we propose a novel pooling strategy, *bi-stride* to tackle the aforementioned limitations. |
Yadi Cao; Menglei Chai; Minchen Li; Chenfanfu Jiang; |
659 | Quantum 3D Graph Learning with Applications to Molecule Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, for the first time to the best of our knowledge, we propose a quantum 3D embedding ansatz that learns the latent representation of 3D structures from the Hilbert space composed of the Bloch sphere of each qubit. |
Ge Yan; Huaijin Wu; Junchi Yan; |
660 | Bag of Tricks for Training Data Extraction from Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate and benchmark tricks for improving training data extraction using a publicly available dataset. |
Weichen Yu; Tianyu Pang; Qian Liu; Chao Du; Bingyi Kang; Yan Huang; Min Lin; Shuicheng YAN; |
661 | Comparison of Meta-learners for Estimating Multi-valued Treatment Heterogeneous Effects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce and discuss meta-learners that perform well as the number of treatments increases. |
Naoufal Acharki; Ramiro Lugo; Antoine Bertoncello; Josselin Garnier; |
662 | Revisiting Gradient Clipping: Stochastic Bias and Tight Convergence Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we give convergence guarantees that show precise dependence on arbitrary clipping thresholds $c$ and show that our guarantees are tight with both deterministic and stochastic gradients. |
Anastasia Koloskova; Hadrien Hendrikx; Sebastian U Stich; |
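The clipping operator that such analyses study rescales a stochastic gradient so its norm never exceeds the threshold $c$. A minimal NumPy sketch of this standard operator (illustrative only; it does not reproduce the paper's convergence analysis):

```python
import numpy as np

def clip_gradient(g, c):
    """Rescale gradient g so its Euclidean norm never exceeds the threshold c."""
    norm = np.linalg.norm(g)
    return g if norm <= c else g * (c / norm)

g_large = np.array([3.0, 4.0])  # norm 5: gets scaled down to norm c
g_small = np.array([0.3, 0.4])  # norm 0.5: passes through unchanged
```

The bias the highlight refers to arises because clipping a *stochastic* gradient changes its expectation, and the size of that effect depends on the threshold $c$ relative to the gradient noise.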
663 | Convergence of Proximal Point and Extragradient-Based Methods Beyond Monotonicity: The Case of Negative Comonotonicity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by non-monotone machine learning applications, we follow the line of works (Diakonikolas et al., 2021; Lee & Kim, 2021; Pethick et al., 2022; Böhm, 2022) aiming at going beyond monotonicity by considering the weaker *negative comonotonicity* assumption. In this work, we provide tight complexity analyses for the Proximal Point (PP), Extragradient (EG), and Optimistic Gradient (OG) methods in this setup, closing several questions on their working guarantees beyond monotonicity. |
Eduard Gorbunov; Adrien Taylor; Samuel Horváth; Gauthier Gidel; |
664 | One-Step Estimator for Permuted Sparse Recovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper considers the unlabeled sparse recovery under multiple measurements, i.e., ${\mathbf{Y}} = {\mathbf{\Pi}}^{\natural} {\mathbf{X}} {\mathbf{B}}^{\natural} + {\mathbf{W}}$, where ${\mathbf{Y}} \in \mathbb{R}^{n\times m}, {\mathbf{\Pi}}^{\natural}\in \mathbb{R}^{n\times n}, {\mathbf{X}} \in \mathbb{R}^{n\times p}, {\mathbf{B}} ^{\natural}\in \mathbb{R}^{p\times m}, {\mathbf{W}}\in \mathbb{R}^{n\times m}$ represent the observations, missing (or incomplete) correspondence information, sensing matrix, sparse signals, and additive sensing noise, respectively. |
Hang Zhang; Ping Li; |
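The observation model in the highlight is easy to simulate directly. The sketch below generates data from ${\mathbf{Y}} = {\mathbf{\Pi}} {\mathbf{X}} {\mathbf{B}} + {\mathbf{W}}$ with a row-sparse signal matrix and a random permutation standing in for the lost correspondences (all dimensions, the sparsity level, and the noise scale are illustrative assumptions, not the paper's experimental setting):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m, k = 50, 20, 3, 4                  # illustrative sizes; k = row-sparsity of B

X = rng.standard_normal((n, p))            # sensing matrix
B = np.zeros((p, m))                       # row-sparse signal matrix
support = rng.choice(p, size=k, replace=False)
B[support] = rng.standard_normal((k, m))
Pi = np.eye(n)[rng.permutation(n)]         # unknown permutation (lost correspondences)
W = 0.01 * rng.standard_normal((n, m))     # additive sensing noise

Y = Pi @ X @ B + W                         # observations with shuffled rows
```

Recovering both the permutation and the sparse signals from such data is the problem the one-step estimator targets.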
665 | Fast Online Node Labeling for Very Large Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an improvement based on the *online relaxation* technique introduced by a series of works (Rakhlin et al., 2012; Rakhlin & Sridharan, 2015; 2017). |
Baojian Zhou; Yifan Sun; Reza Babanezhad Harikandeh; |
666 | What Can Be Learnt With Wide Convolutional Neural Networks? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study infinitely-wide deep CNNs in the kernel regime. |
Francesco Cagnetta; Alessandro Favero; Matthieu Wyart; |
667 | Language Instructed Reinforcement Learning for Human-AI Coordination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This problem is challenging, especially in domains that lack high quality human behavioral data, because multi-agent reinforcement learning (RL) often converges to different equilibria from the ones that humans prefer. We propose a novel framework, instructRL, that enables humans to specify what kind of strategies they expect from their AI partners through natural language instructions. |
Hengyuan Hu; Dorsa Sadigh; |
668 | Structured Cooperative Learning with Graphical Model Priors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Structured Cooperative Learning (SCooL), in which a cooperation graph across devices is generated by a graphical model prior in order to automatically coordinate mutual learning between devices. |
Shuangtong Li; Tianyi Zhou; Xinmei Tian; Dacheng Tao; |
669 | A Mathematical Model for Curriculum Learning for Parities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a CL model for learning the class of k-parities on d bits of a binary string with a neural network trained by stochastic gradient descent (SGD). |
Elisabetta Cornacchia; Elchanan Mossel; |
670 | A Model-free Closeness-of-influence Test for Features in Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Ideally, it is desired to understand how a set of collected features combine together and influence the response value, but this problem is notoriously difficult due to the high dimensionality of the data and the limited number of labeled data points, among other factors. In this work, we take a new perspective on this problem, and we study the question of assessing the difference of influence that the two given features have on the response value. |
Mohammad Mehrabi; Ryan A. Rossi; |
671 | Conditional Graph Information Bottleneck for Molecular Relational Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel relational learning framework, called CGIB, that predicts the interaction behavior between a pair of graphs by detecting core subgraphs therein. |
Namkyeong Lee; Dongmin Hyun; Gyoung S. Na; Sungwon Kim; Junseok Lee; Chanyoung Park; |
672 | Guiding Pretraining in Reinforcement Learning with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a method that uses background knowledge from text corpora to shape exploration. |
Yuqing Du; Olivia Watkins; Zihan Wang; Cédric Colas; Trevor Darrell; Pieter Abbeel; Abhishek Gupta; Jacob Andreas; |
673 | A Neural PDE Solver with Temporal Stencil Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this study shows that significant information is often lost in the low-resolution down-sampled features. To address such issues, we propose a new approach, namely Temporal Stencil Modeling (TSM), which combines the strengths of advanced time-series sequence modeling (with the HiPPO features) and state-of-the-art neural PDE solvers (with learnable stencil modeling). |
Zhiqing Sun; Yiming Yang; Shinjae Yoo; |
674 | Learning Perturbations to Explain Time Series Predictions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to explain predictions by learning not only masks, but also associated perturbations. |
Joseph Enguehard; |
675 | Learning to Suggest Breaks: Sustainable Optimization of Long-Term User Engagement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the role of breaks in recommendation, and propose a framework for learning optimal breaking policies that promote and sustain long-term engagement. |
Eden Saig; Nir Rosenfeld; |
676 | Training Normalizing Flows from Dependent Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a likelihood objective of normalizing flows incorporating dependencies between the data points, for which we derive a flexible and efficient learning algorithm suitable for different dependency structures. |
Matthias Kirchler; Christoph Lippert; Marius Kloft; |
677 | The Power of Uniform Sampling for K-Median Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the power of uniform sampling for $k$-Median in various metric spaces. |
Lingxiao Huang; Shaofeng H.-C. Jiang; Jianing Lou; |
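The basic idea behind uniform sampling for $k$-Median can be illustrated concretely: a uniform sample of $s$ points, with its cost rescaled by $n/s$, gives an unbiased estimate of the full clustering cost, and the paper studies when such estimates are provably accurate. A minimal sketch in Euclidean space (the data, centers, and sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.standard_normal((1000, 2))    # illustrative point set
centers = np.array([[0.0, 0.0], [3.0, 3.0]])

def kmedian_cost(P, C):
    """k-Median cost: sum over points of the distance to the nearest center."""
    d = np.linalg.norm(P[:, None, :] - C[None, :, :], axis=2)
    return d.min(axis=1).sum()

# A uniform sample of size s, rescaled by n/s, estimates the full cost.
s = 100
sample = points[rng.choice(len(points), size=s, replace=False)]
estimate = (len(points) / s) * kmedian_cost(sample, centers)
```

The subtlety the theory addresses is how large $s$ must be, and in which metric spaces, for this estimate to be uniformly accurate over all candidate center sets rather than just one.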
678 | Towards Explaining Distribution Shifts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We derive our interpretable mappings from a relaxation of the optimal transport problem, where the candidate mappings are restricted to a set of interpretable mappings. |
Sean Kulinski; David I. Inouye; |
679 | On Second-Order Scoring Rules for Epistemic Uncertainty Quantification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent work has revealed serious theoretical shortcomings for second-order predictors based on loss minimisation. In this paper, we generalise these findings and prove a more fundamental result: There seems to be no loss function that provides an incentive for a second-order learner to faithfully represent its epistemic uncertainty in the same manner as proper scoring rules do for standard (first-order) learners. |
Viktor Bengs; Eyke Hüllermeier; Willem Waegeman; |
680 | Generated Graph Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the first framework to systematically investigate a set of sophisticated models and their performance in four classification scenarios. |
Yihan Ma; Zhikun Zhang; Ning Yu; Xinlei He; Michael Backes; Yun Shen; Yang Zhang; |
681 | The Catalog Problem: Clustering and Ordering Variable-Sized Sets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite progress in both neural clustering and set-to-sequence methods, no joint, fully differentiable model exists to-date. We develop such a modular architecture, referred to further as Neural Ordered Clusters (NOC), enhance it with a specific mechanism for learning cluster-level cardinality constraints, and provide a robust comparison of its performance in relation to alternative models. |
Mateusz Maria Jurewicz; Graham W. Taylor; Leon Derczynski; |
682 | Efficient Transformed Gaussian Processes for Non-Stationary Dependent Multi-class Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces the Efficient Transformed Gaussian Process (ETGP), a new way of creating $C$ stochastic processes characterized by: 1) the $C$ processes are non-stationary, 2) the $C$ processes are dependent by construction without needing a mixing matrix, 3) training and making predictions are very efficient since the number of Gaussian process (GP) operations (e.g., inverting the inducing points’ covariance matrix) does not depend on the number of processes. This makes the ETGP particularly suited for multi-class problems with a very large number of classes, which are the problems studied in this work. |
Juan Maroñas; Daniel Hernández-Lobato; |
683 | MetaModulation: Learning Variational Feature Hierarchies for Few-Shot Learning with Fewer Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Meta-learning algorithms are able to learn a new task using previously learned knowledge, but they often require a large number of meta-training tasks which may not be readily available. To address this issue, we propose a method for few-shot learning with fewer tasks, which we call MetaModulation. |
Wenfang Sun; Yingjun Du; Xiantong Zhen; Fan Wang; Ling Wang; Cees G. M. Snoek; |
684 | Adaptive Smoothing Gradient Learning for Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose a methodology such that training a prototype neural network will evolve into training an SNN gradually by fusing the learnable relaxation degree into the network with random spike noise. |
Ziming Wang; Runhao Jiang; Shuang Lian; Rui Yan; Huajin Tang; |
685 | An Instrumental Variable Approach to Confounded Off-Policy Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making. |
Yang Xu; Jin Zhu; Chengchun Shi; Shikai Luo; Rui Song; |
686 | Scalable Multi-Agent Reinforcement Learning Through Intelligent Information Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose InforMARL, a novel architecture for multi-agent reinforcement learning (MARL) which uses local information intelligently to compute paths for all the agents in a decentralized manner. |
Siddharth Nayak; Kenneth Choi; Wenqi Ding; Sydney Dolan; Karthik Gopalakrishnan; Hamsa Balakrishnan; |
687 | Out-of-Domain Robustness Via Targeted Augmentations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study principles for designing data augmentations for out-of-domain (OOD) generalization. |
Irena Gao; Shiori Sagawa; Pang Wei Koh; Tatsunori Hashimoto; Percy Liang; |
688 | Constrained Causal Bayesian Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose different surrogate models that make it possible to integrate observational and interventional data while capturing correlations among effects, with increasing levels of sophistication. |
Virginia Aglietti; Alan Malek; Ira Ktena; Silvia Chiappa; |
689 | Dynamic Constrained Submodular Optimization with Polylogarithmic Update Time Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a simpler algorithm for the problem that maintains a $(\frac{1}{2}-\epsilon)$-approximate solution for submodular maximization under cardinality constraint $k$ using a polylogarithmic amortized update time. |
Kiarash Banihashem; Leyla Biabani; Samira Goudarzi; MohammadTaghi Hajiaghayi; Peyman Jabbarzade; Morteza Monemizadeh; |
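For context, the classic *static* greedy algorithm for monotone submodular maximization under a cardinality constraint $k$ (whose solution quality dynamic algorithms such as the one above aim to maintain under insertions and deletions) can be sketched on a coverage objective; the instance below is purely illustrative:

```python
def greedy_max_cover(universe_sets, k):
    """Static greedy for monotone submodular maximization under a cardinality
    constraint, shown on coverage: pick k sets maximizing the size of their
    union. Achieves a (1 - 1/e) approximation for this class of objectives."""
    chosen, covered = [], set()
    for _ in range(k):
        # Pick the set with the largest marginal gain (new elements covered).
        best = max(range(len(universe_sets)),
                   key=lambda i: len(universe_sets[i] - covered))
        chosen.append(best)
        covered |= universe_sets[best]
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
```

The dynamic setting is harder because a deletion can invalidate an early greedy choice, which is why maintaining a $(\frac{1}{2}-\epsilon)$-approximation in polylogarithmic amortized update time is the benchmark the paper targets.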
690 | Deep Clustering with Incomplete Noisy Pairwise Annotations: A Geometric Regularization Approach Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work first takes a deep look into a recently emerged logistic loss function of DCC, and characterizes its theoretical properties. |
Tri Nguyen; Shahana Ibrahim; Xiao Fu; |
691 | Self-supervised Learning of Split Invariant Equivariant Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We aim at bridging the gap between the two in order to learn more diverse representations that are suitable for a wide range of tasks. |
Quentin Garrido; Laurent Najman; Yann LeCun; |
692 | Distance Weighted Supervised Learning for Offline Interaction Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge the gap between IL and RL, we introduce Distance Weighted Supervised Learning or DWSL, a supervised method for learning goal-conditioned policies from offline data. |
Joey Hejna; Jensen Gao; Dorsa Sadigh; |
693 | A Theoretical Analysis of The Learning Dynamics Under Class Imbalance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our main contribution is the analysis of the convergence of full-batch (GD) and stochastic gradient descent (SGD), and of variants that renormalize the contribution of each per-class gradient. |
Emanuele Francazi; Marco Baity-Jesi; Aurelien Lucchi; |
694 | PCA-based Multi-Task Learning: A Random Matrix Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The article proposes and theoretically analyses a *computationally efficient* multi-task learning (MTL) extension of popular principal component analysis (PCA)-based supervised learning schemes. |
Malik Tiomoko; Romain Couillet; Frederic Pascal; |
695 | Low-Switching Policy Gradient with Exploration Via Online Sensitivity Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, recent advances for this problem have only been successful in tabular and linear settings, whose benign structures cannot be generalized to non-linearly parameterized policies. In this paper, we address this problem by leveraging recent advances in value-based algorithms, including bounded eluder-dimension and online sensitivity sampling, to design a low-switching sample-efficient policy optimization algorithm, *LPO*, with general non-linear function approximation. |
Yunfan Li; Yiran Wang; Yu Cheng; Lin Yang; |
696 | Improving Adversarial Robustness of Deep Equilibrium Models with Explicit Regulations Along The Neural Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Informed by the correlation between the entropy of dynamical systems and their stability properties, we propose reducing prediction entropy by progressively updating inputs along the neural dynamics. |
Zonghan Yang; Peng Li; Tianyu Pang; Yang Liu; |
697 | Protecting Language Generation Models Via Invisible Watermarking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these methods can be nullified by obvious countermeasures such as “synonym randomization”. To address this issue, we propose GINSW, a novel method to protect text generation models from being stolen through distillation. |
Xuandong Zhao; Yu-Xiang Wang; Lei Li; |
698 | Towards Practical Preferential Bayesian Optimization with Skew Gaussian Processes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study preferential Bayesian optimization (BO) where reliable feedback is limited to pairwise comparisons called duels. |
Shion Takeno; Masahiro Nomura; Masayuki Karasuyama; |
699 | End-to-End Learning for Stochastic Optimization: A Bayesian Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop a principled approach to end-to-end learning in stochastic optimization. |
Yves Rychener; Daniel Kuhn; Tobias Sutter; |
700 | Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our contributions include: (1) we study both consistency and robustness by establishing top-$k$ ($\forall k\geq 1$) consistency of LDR losses for multi-class classification, and a negative result that a top-$1$ consistent and symmetric robust loss cannot achieve top-$k$ consistency simultaneously for all $k\geq 2$; (2) we propose a new adaptive LDR loss that automatically adapts the individualized temperature parameter to the noise degree of class label of each instance; (3) we demonstrate stable and competitive performance for the proposed adaptive LDR loss on 7 benchmark datasets under 6 noisy label and 1 clean settings against 13 loss functions, and on one real-world noisy dataset. |
Dixian Zhu; Yiming Ying; Tianbao Yang; |
701 | Revisiting Domain Randomization Via Relaxed State-Adversarial Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, the existing methods often assume that the distribution of domain parameters belongs to a specific family of probability functions, such as normal distributions, which may not be correct. To overcome these limitations, we propose a new approach to DR by rethinking it from the perspective of adversarial state perturbation, without the need for reconfiguring the simulator or relying on prior knowledge about the environment. |
Yun-Hsuan Lien; Ping-Chun Hsieh; Yu-Shuen Wang; |
702 | GFlowOut: Dropout with Generative Flow Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These methods face two important challenges: (a) the posterior distribution over masks can be highly multi-modal which can be difficult to approximate with standard variational inference and (b) it is not trivial to fully utilize sample-dependent information and correlation among dropout masks to improve posterior estimation. In this work, we propose GFlowOut to address these issues. |
Dianbo Liu; Moksh Jain; Bonaventure F. P. Dossou; Qianli Shen; Salem Lahlou; Anirudh Goyal; Nikolay Malkin; Chris Chinenye Emezue; Dinghuai Zhang; Nadhir Hassen; Xu Ji; Kenji Kawaguchi; Yoshua Bengio; |
703 | Emergence of Sparse Representations from Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Surprisingly, we discover that noisy training introduces three implicit loss terms that result in sparsely firing neurons specializing to high variance features of the dataset. |
Trenton Bricken; Rylan Schaeffer; Bruno Olshausen; Gabriel Kreiman; |
704 | Non-stationary Reinforcement Learning Under General Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make the first such attempt. |
Songtao Feng; Ming Yin; Ruiquan Huang; Yu-Xiang Wang; Jing Yang; Yingbin Liang; |
705 | MetaDiffuser: Diffusion Model As Conditional Planner for Offline Meta-RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these works mostly lack the generalization ability across tasks with reward or dynamics change. To tackle this challenge, in this paper we propose a task-oriented conditioned diffusion planner for offline meta-RL (MetaDiffuser), which considers the generalization problem as a conditional trajectory generation task with contextual representation. |
Fei Ni; Jianye HAO; Yao Mu; Yifu Yuan; YAN ZHENG; Bin Wang; Zhixuan Liang; |
706 | SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop SemSup-XC, a model that achieves state-of-the-art zero-shot and few-shot performance on three XC datasets derived from legal, e-commerce, and Wikipedia data. |
Pranjal Aggarwal; Ameet Deshpande; Karthik R Narasimhan; |
707 | Improved Active Multi-Task Representation Learning Via Lasso Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we show the strict dominance of the L1-regularized-relevance-based ($\nu^1$-based) strategy by giving a lower bound for the $\nu^2$-based strategy. |
Yiping Wang; Yifang Chen; Kevin Jamieson; Simon Shaolei Du; |
708 | Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We focus on analyzing the classical stochastic projected gradient methods under a general dependent data sampling scheme for constrained smooth nonconvex optimization. |
Ahmet Alacaoglu; Hanbaek Lyu; |
709 | SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We analyze the performance of existing model-parallel algorithms in these conditions and find configurations where training larger models becomes less communication-intensive. Based on these findings, we propose SWARM Parallelism (Stochastically Wired Adaptively Rebalanced Model Parallelism), a model-parallel training algorithm designed for poorly connected, heterogeneous and unreliable devices. |
Max Ryabinin; Tim Dettmers; Michael Diskin; Alexander Borzunov; |
710 | Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we study the problem of (finite horizon tabular) Markov decision processes (MDPs) with heavy-tailed rewards under the constraint of differential privacy (DP). |
Yulian Wu; Xingyu Zhou; Sayak Ray Chowdhury; Di Wang; |
711 | Rotation and Translation Invariant Representation Learning with Implicit Neural Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Invariant Representation Learning with Implicit Neural Representation (IRL-INR), which uses an implicit neural representation (INR) with a hypernetwork to obtain semantic representations disentangled from the orientation of the image. |
Sehyun Kwon; Joo Young Choi; Ernest K. Ryu; |
712 | Optimizing Hyperparameters with Conformal Quantile Regression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to leverage conformalized quantile regression which makes minimal assumptions about the observation noise and, as a result, models the target function in a more realistic and robust fashion which translates to quicker HPO convergence on empirical benchmarks. |
David Salinas; Jacek Golebiowski; Aaron Klein; Matthias Seeger; Cedric Archambeau; |
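For context on the highlight above: conformalized quantile regression wraps any pair of lower/upper quantile predictors with a calibration step that restores finite-sample coverage. A toy sketch of the generic split-conformal procedure (the quantile "predictors" here are trivial constants for illustration, not a fitted regressor, and not the paper's HPO method):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1

y_train = rng.normal(size=500)
y_calib = rng.normal(size=500)

# Step 1: "fit" lower/upper quantile predictors on the training split.
q_lo, q_hi = np.quantile(y_train, [alpha / 2, 1 - alpha / 2])

# Step 2: conformity scores on the calibration split measure how far each
# point falls outside the predicted interval [q_lo, q_hi].
scores = np.maximum(q_lo - y_calib, y_calib - q_hi)

# Step 3: a finite-sample-corrected quantile of the scores widens (or
# shrinks) the interval so it covers fresh points w.p. >= 1 - alpha.
n = len(y_calib)
q_hat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

interval = (q_lo - q_hat, q_hi + q_hat)
```

The appeal in HPO is exactly the property the highlight names: the coverage guarantee makes minimal assumptions about the observation noise.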
713 | Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The most confusing answer provides useful information about the task by revealing the most plausible answer other than the ground truth and how plausible it is. To theoretically analyze such scenarios, we propose a model in which there are the top two plausible answers for each task, distinguished from the rest of the choices. |
Hyeonsu Jeong; Hye Won Chung; |
714 | Hyperbolic Diffusion Embedding and Distance for Hierarchical Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a new method for hierarchical data embedding and distance. |
Ya-Wei Eileen Lin; Ronald R. Coifman; Gal Mishne; Ronen Talmon; |
715 | Graph Neural Networks with Learnable and Optimal Polynomial Bases Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose two spectral GNN models that provide positive answers to the questions posed above. |
Yuhe Guo; Zhewei Wei; |
716 | Emergent Agentic Transformer from Chain of Hindsight Experience Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In reinforcement learning (RL), despite many efforts on transformer-based policies, a key limitation is that current transformer-based policies cannot learn by directly combining information from multiple sub-optimal trials. In this work, we address this issue using the recently proposed chain of hindsight to relabel experience, where we train a transformer on a sequence of trajectories sorted in ascending order of their total rewards. |
Hao Liu; Pieter Abbeel; |
717 | FedHPO-Bench: A Benchmark Suite for Federated Hyperparameter Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To facilitate the research of FedHPO, we propose and implement a benchmark suite FedHPO-Bench that incorporates comprehensive FedHPO problems, enables flexible customization of the function evaluations, and eases continuing extensions. |
Zhen WANG; Weirui Kuang; Ce Zhang; Bolin Ding; Yaliang Li; |
718 | Bandits with Knapsacks: Advice on Time-Varying Demands Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, with online predictions on $Q$, we propose an online algorithm that judiciously incorporates the predictions, and achieves regret bounds that depend on the accuracy of the predictions. |
Lixing Lyu; Wang Chi Cheung; |
719 | Adaptive Compositional Continual Meta-Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an Adaptive Compositional Continual Meta-Learning (ACML) algorithm, which employs a compositional premise to associate a task with a subset of mixture components, allowing meta-knowledge sharing among heterogeneous tasks. |
Bin Wu; Jinyuan Fang; xiangxiang Zeng; Shangsong Liang; Qiang Zhang; |
720 | Integrating Prior Knowledge in Contrastive Learning with Kernel Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we open the door to new perspectives for CL by integrating prior knowledge, given either by generative models – viewed as prior representations – or weak attributes in the positive and negative sampling. |
Benoit Dufumier; Carlo Alberto Barbano; Robin Louiset; Edouard Duchesnay; Pietro Gori; |
721 | The Saddle-Point Method in Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We characterize the differential privacy guarantees of privacy mechanisms in the large-composition regime, i.e., when a privacy mechanism is sequentially applied a large number of times to sensitive data. |
Wael Alghamdi; Juan Felipe Gomez; Shahab Asoodeh; Flavio Calmon; Oliver Kosut; Lalitha Sankar; |
722 | From Temporal to Contemporaneous Iterative Causal Discovery in The Presence of Latent Confounders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a constraint-based algorithm for learning causal structures from observational time-series data, in the presence of latent confounders. |
Raanan Yehezkel Rohekar; Shami Nisimov; Yaniv Gurwicz; Gal Novik; |
723 | Offline Meta Reinforcement Learning with In-Distribution Online Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our theory finds that out-of-distribution adaptation episodes may lead to unreliable policy evaluation and that online adaptation with in-distribution episodes can ensure adaptation performance guarantee. Based on these theoretical insights, we propose a novel adaptation framework, called In-Distribution online Adaptation with uncertainty Quantification (IDAQ), which generates in-distribution context using a given uncertainty quantification and performs effective task belief inference to address new tasks. |
Jianhao Wang; Jin Zhang; Haozhe Jiang; Junyu Zhang; Liwei Wang; Chongjie Zhang; |
724 | Accelerated Infeasibility Detection of Constrained Optimization and Fixed-Point Iterations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we characterize the optimal accelerated rate of infeasibility detection. |
Jisun Park; Ernest K. Ryu; |
725 | Restoration Based Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we establish the interpretation of DDMs in terms of image restoration (IR). |
Jaemoo Choi; Yesom Park; Myungjoo Kang; |
726 | Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study PO in adversarial MDPs with a challenge that arises in almost every real-world application — *delayed bandit feedback*. |
Tal Lancewicki; Aviv Rosenberg; Dmitry Sotnikov; |
727 | One-Shot Compression of Large Edge-Exchangeable Graphs Using Bits-Back Coding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a one-shot method for compressing large labeled graphs called Random Edge Coding. |
Daniel Severo; James Townsend; Ashish J Khisti; Alireza Makhzani; |
728 | GOAT: A Global Transformer on Large-scale Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, existing GNN architectures are limited in their ability to perform equally well on both homophilious and heterophilious graphs as their inductive biases are generally tailored to only one setting. To address these issues, we propose GOAT, a scalable global graph transformer. |
Kezhi Kong; Jiuhai Chen; John Kirchenbauer; Renkun Ni; C. Bayan Bruss; Tom Goldstein; |
729 | Effectively Using Public Data in Privacy Preserving Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we further explore the potential of using public data in DP models, showing that utility gains can in fact be significantly higher than what is shown in prior works. Specifically, we introduce DOPE-SGD, a modified DP-SGD algorithm that leverages public data during its training. |
Milad Nasr; Saeed Mahloujifar; Xinyu Tang; Prateek Mittal; Amir Houmansadr; |
730 | Learning to Initiate and Reason in Event-Driven Cascading Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new supervised learning setup called Cascade. |
Yuval Atzmon; Eli Meirom; Shie Mannor; Gal Chechik; |
731 | Anchor Sampling for Federated Learning with Partial Client Participation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Training with large batches on individual clients is proposed to address data heterogeneity in general, but its effectiveness under partial client participation is not clear. Motivated by these challenges, we propose to develop a novel federated learning framework, referred to as FedAMD, for partial client participation. |
Feijie Wu; Song Guo; Zhihao Qu; Shiqi He; Ziming Liu; Jing Gao; |
732 | Scaling Laws for Generative Mixed-Modal Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We report new mixed-modal scaling laws that unify the contributions of individual modalities and the interactions between them. |
Armen Aghajanyan; LILI YU; Alexis Conneau; Wei-Ning Hsu; Karen Hambardzumyan; Susan Zhang; Stephen Roller; Naman Goyal; Omer Levy; Luke Zettlemoyer; |
733 | Learning for Edge-Weighted Online Bipartite Matching with Robustness Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel RL-based approach to edge-weighted online bipartite matching with robustness guarantees (LOMAR), achieving both good average-case and worst-case performance. |
Pengfei Li; Jianyi Yang; Shaolei Ren; |
734 | Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a bandit policy that is a closed-form function of said estimated parameters. |
Wonyoung Kim; Garud Iyengar; assaf zeevi; |
735 | Temporally Consistent Transformers for Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we curate 3 challenging video datasets with long-range dependencies by rendering walks through 3D scenes of procedural mazes, Minecraft worlds, and indoor scans. |
Wilson Yan; Danijar Hafner; Stephen James; Pieter Abbeel; |
736 | Towards Better Graph Representation Learning with Parameterized Decomposition & Filtering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop a novel and general framework which unifies many existing GNN models from the view of parameterized decomposition and filtering, and show how it helps to enhance the flexibility of GNNs while alleviating the smoothness and amplification issues of existing models. |
Mingqi Yang; Wenjie Feng; Yanming Shen; Bryan Hooi; |
737 | Large Language Models Can Be Easily Distracted By Irrelevant Context Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate the *distractibility* of large language models, i.e., how the model prediction can be distracted by irrelevant context. |
Freda Shi; Xinyun Chen; Kanishka Misra; Nathan Scales; David Dohan; Ed H. Chi; Nathanael Schärli; Denny Zhou; |
738 | GraphCleaner: Detecting Mislabelled Samples in Popular Graph Learning Benchmarks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite increasing efforts towards improving the quality of generic data types, such as images and texts, the problem of mislabel detection in graph data remains underexplored. To bridge the gap, we explore mislabelling issues in popular real-world graph datasets and propose GraphCleaner, a post-hoc method to detect and correct these mislabelled nodes in graph datasets. |
Yuwen Li; Miao Xiong; Bryan Hooi; |
739 | Data Structures for Density Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify $v_i$ that is close to $p$. |
Anders Aamand; Alexandr Andoni; Justin Y Chen; Piotr Indyk; Shyam Narayanan; Sandeep Silwal; |
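A naive baseline for the problem the highlight above describes (not the paper's data structures, which target better statistical/computational tradeoffs): given candidate distributions $v_1, \ldots, v_k$ over a discrete domain and samples from $p$, select the candidate with the highest log-likelihood on the samples. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6  # domain size

# Candidate distributions v_1..v_3 over {0, ..., n-1}.
candidates = [
    np.ones(n) / n,                                   # uniform
    np.array([0.5, 0.2, 0.1, 0.1, 0.05, 0.05]),       # skewed left
    np.array([0.05, 0.05, 0.1, 0.1, 0.2, 0.5]),       # skewed right
]

p = candidates[2]  # in this toy example, p coincides with v_3
samples = rng.choice(n, size=2000, p=p)

# Score each candidate by its log-likelihood on the samples from p.
log_liks = [np.log(v[samples]).sum() for v in candidates]
best = int(np.argmax(log_liks))
```

With well-separated candidates this selects the closest hypothesis, but it pays a per-query cost linear in $k$; the tradeoffs between sample access, preprocessing, and query time are what the paper studies.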
740 | Long-Tailed Recognition By Mutual Information Maximization Between Latent Features and Ground-Truth Labels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The goal of this paper is to provide the background and further improve the performance. |
Min-Kook Suh; Seung-Woo Seo; |
741 | Identifiability of Label Noise Transition Matrix Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this paper is to characterize the identifiability of the label noise transition matrix. |
Yang Liu; Hao Cheng; Kun Zhang; |
742 | Proper Scoring Rules for Survival Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although there are fundamental theories on strictly proper scoring rules for uncertainty quantification, little is known about those for survival analysis. In this paper, we investigate extensions of four major strictly proper scoring rules for survival analysis and we prove that these extensions are proper under certain conditions, which arise from the discretization of the estimation of probability distributions. |
Hiroki Yanagisawa; |
743 | LazyGNN: Large-Scale Graph Neural Networks Via Lazy Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to capture long-distance dependency in graphs by shallower models instead of deeper models, which leads to a much more efficient model, LazyGNN, for graph representation learning. |
Rui Xue; Haoyu Han; MohamadAli Torkamani; Jian Pei; Xiaorui Liu; |
744 | Efficient List-Decodable Regression Using Batches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate the use of batches in studying list-decodable linear regression, in which only $\alpha\in (0,1]$ fraction of batches contain genuine samples from a common distribution and the rest can contain arbitrary or even adversarial samples. |
Abhimanyu Das; Ayush Jain; Weihao Kong; Rajat Sen; |
745 | Discovering Object-Centric Generalized Value Functions From Pixels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Automatically learning such representations in an object-centric manner geared towards control and fast adaptation remains an open research problem. In this paper, we introduce a method that tries to discover meaningful features from objects, translating them to temporally coherent 'question' functions and leveraging the subsequent learned general value functions for control. |
Somjit Nath; Gopeshh Raaj Subbaraj; Khimya Khetarpal; Samira Ebrahimi Kahou; |
746 | Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning Under Massively Parallel Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel Parallel Q-Learning (PQL) scheme that outperforms PPO in terms of wall-clock time and maintains superior sample efficiency. |
Zechu Li; Tao Chen; Zhang-Wei Hong; Anurag Ajay; Pulkit Agrawal; |
747 | Matrix Estimation for Individual Fairness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In recent years, multiple notions of algorithmic fairness have arisen. One such notion is individual fairness (IF), which requires that individuals who are similar receive similar treatment. In parallel, matrix estimation (ME) has emerged as a natural paradigm for handling noisy data with missing values. In this work, we connect the two concepts. |
Cindy Zhang; Sarah Huiyi Cen; Devavrat Shah; |
748 | Instrumental Variable Estimation of Average Partial Causal Effects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of estimating the average partial causal effect (APCE) of a continuous treatment in an IV setting. |
Yuta Kawakami; manabu kuroki; Jin Tian; |
749 | One-sided Matrix Completion from Two Observations Per Row Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a natural algorithm that involves imputing the missing values of the matrix $X^TX$ and show that even with only two observations per row in $X$, we can provably recover $X^TX$ as long as we have at least $\Omega(r^2 d \log d)$ rows, where $r$ is the rank and $d$ is the number of columns. |
Steven Cao; Percy Liang; Gregory Valiant; |
750 | Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new research direction, that leverages a specific translation invariant cost $c(x, y):=h(x-y)$ inspired by the elastic net. |
marco cuturi; Michal Klein; Pierre Ablin; |
751 | Hierarchical Neural Coding for Controllable CAD Model Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel generative model for Computer Aided Design (CAD) that 1) represents high-level design concepts of a CAD model as a three-level hierarchical tree of neural codes, from global part arrangement down to local curve geometry; and 2) controls the generation or completion of CAD models by specifying the target design using a code tree. |
Xiang Xu; Pradeep Kumar Jayaraman; Joseph George Lambourne; Karl D.D. Willis; Yasutaka Furukawa; |
752 | Change Is Hard: A Closer Look at Subpopulation Shift Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we provide a fine-grained analysis of subpopulation shift. |
Yuzhe Yang; Haoran Zhang; Dina Katabi; Marzyeh Ghassemi; |
753 | Ske2Grid: Skeleton-to-Grid Representation Learning for Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents Ske2Grid, a new representation learning framework for improved skeleton-based action recognition. |
Dongqi Cai; Yangyuxuan Kang; Anbang Yao; Yurong Chen; |
754 | Stabilizing GANs’ Training with Brownian Motion Controller Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we examine the stability of GANs from the perspective of control theory and propose a universal higher-order noise-based controller called Brownian Motion Controller (BMC). |
Tianjiao Luo; Ziyu Zhu; Jianfei Chen; Jun Zhu; |
755 | Featured Graph Coarsening with Similarity Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel optimization-based framework for graph coarsening that takes both the graph matrix and the node features as the input and jointly learns the coarsened graph matrix and the coarsened feature matrix while ensuring desired properties. |
Manoj Kumar; Anurag Sharma; Shashwat Saxena; Sandeep Kumar; |
756 | Biases in Evaluation of Molecular Optimization Methods and Bias Reduction Strategies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We discuss bias reduction methods for each of the biases, and empirically investigate their effectiveness. |
Hiroshi Kajino; Kohei Miyaguchi; Takayuki Osogami; |
757 | Policy Regularization with Dataset Constraint for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we find that regularizing the policy towards the nearest state-action pair can be more effective and thus propose Policy Regularization with Dataset Constraint (PRDC). |
Yuhang Ran; Yi-Chen Li; Fuxiang Zhang; Zongzhang Zhang; Yang Yu; |
758 | LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning Via An Option Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a unified framework for exploration in reinforcement learning (RL) is proposed based on an option-critic architecture. |
Woojun Kim; Jeonghye Kim; Youngchul Sung; |
759 | R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose *Randomized Utility-driven Synthesis of Uncertain REgions (R-U-SURE)*, an approach for building uncertainty-aware suggestions based on a decision-theoretic model of goal-conditioned utility, using random samples from a generative model as a proxy for the unobserved possible intents of the end user. |
Daniel D. Johnson; Daniel Tarlow; Christian Walder; |
760 | SeedGNN: Graph Neural Network for Supervised Seeded Graph Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, this paper proposes a new supervised approach that can learn from a training set how to match unseen graphs with only a few seeds. |
Liren Yu; Jiaming Xu; Xiaojun Lin; |
761 | Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest (ROI) as a superlevel-set of a nonparametric probabilistic model such as a Gaussian process (GP). |
Fengxue Zhang; Jialin Song; James C Bowden; Alexander Ladd; Yisong Yue; Thomas Desautels; Yuxin Chen; |
762 | Lower Bounds for Learning in Revealing POMDPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies the fundamental limits of reinforcement learning (RL) in the challenging *partially observable* setting. |
Fan Chen; Huan Wang; Caiming Xiong; Song Mei; Yu Bai; |
763 | Width and Depth Limits Commute in Residual Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that taking the width and depth to infinity in a deep neural network with skip connections, when branches are scaled by $1/\sqrt{depth}$, results in the same covariance structure no matter how that limit is taken. |
Soufiane Hayou; Greg Yang; |
764 | Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making Using Language Guided World Modelling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose using few-shot large language models (LLMs) to hypothesize an AWM, which will be verified through world experience, to improve the sample efficiency of RL agents. |
Kolby Nottingham; Prithviraj Ammanabrolu; Alane Suhr; Yejin Choi; Hannaneh Hajishirzi; Sameer Singh; Roy Fox; |
765 | Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing works often assume a strictly positive lower bound of the cost function or an upper bound of the expected length for the optimal policy. In this paper, we propose a new algorithm to eliminate these restrictive assumptions. |
Qiwei Di; Jiafan He; Dongruo Zhou; Quanquan Gu; |
766 | Hiding Data Helps: On The Benefits of Masking for Sparse Coding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show in this work that, in the presence of noise, minimizing the standard dictionary learning objective can fail to recover the elements of the ground-truth dictionary in the over-realized regime, regardless of the magnitude of the signal in the data-generating process. |
Muthu Chidambaram; Chenwei Wu; Yu Cheng; Rong Ge; |
767 | Learning to Design Analog Circuits to Meet Threshold Specifications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a method for generating, from simulation data, a dataset on which a system can be trained via supervised learning to design circuits that meet threshold specifications. |
Dmitrii Krylov; Pooya Khajeh; Junhan Ouyang; Thomas Reeves; Tongkai Liu; Hiba Ajmal; Hamidreza Aghasi; Roy Fox; |
768 | Explainability As Statistical Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take a new route and cast interpretability as a statistical inference problem. We propose new datasets with ground-truth selection, which allow for the evaluation of feature importance maps, and show experimentally that multiple imputation provides more reasonable interpretations. |
Hugo Henri Joseph Senetaire; Damien Garreau; Jes Frellsen; Pierre-Alexandre Mattei; |
769 | Streaming Active Learning with Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, prior active learning approaches with deep neural networks assume offline access to the entire dataset ahead of time. This paper proposes VeSSAL, a new algorithm for batch active learning with deep neural networks in streaming settings, which samples groups of points to query for labels at the moment they are encountered. |
Akanksha Saran; Safoora Yousefi; Akshay Krishnamurthy; John Langford; Jordan T. Ash; |
770 | Multiply Robust Off-policy Evaluation and Learning Under Truncation By Death Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formulate OPE and OPL using principal stratification under truncation by death. |
Jianing Chu; Shu Yang; Wenbin Lu; |
771 | Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such models create a context mismatch between training and inference environments, hindering potential translation accuracy. We solve this issue by proposing Shiftable Context, a simple yet effective scheme to ensure that consistent segment and context sizes are maintained throughout training and inference, even in the presence of partially filled segments due to the streaming nature of simultaneous translation. |
Matthew Raffel; Drew Penney; Lizhong Chen; |
772 | Online Platt Scaling with Calibeating Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an online post-hoc calibration method, called Online Platt Scaling (OPS), which combines the Platt scaling technique with online logistic regression. |
Chirag Gupta; Aaditya Ramdas; |
773 | SpENCNN: Orchestrating Encoding and Sparsity for Fast Homomorphically Encrypted Neural Network Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an HE-based fast neural network (NN) inference framework, SpENCNN, built upon the co-design of HE operation-aware model sparsity and single-instruction-multiple-data (SIMD)-friendly data packing, to improve NN inference latency. |
Ran Ran; Xinwei Luo; Wei Wang; Tao Liu; Gang Quan; Xiaolin Xu; Caiwen Ding; Wujie Wen; |
774 | The Benefits of Mixup for Feature Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to seek a fundamental understanding of the benefits of Mixup. |
Difan Zou; Yuan Cao; Yuanzhi Li; Quanquan Gu; |
775 | Secure Federated Correlation Test and Entropy Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the first federated correlation test framework compatible with secure aggregation, namely FED-$\chi^2$. |
Qi Pang; Lun Wang; Shuai Wang; Wenting Zheng; Dawn Song; |
776 | Meta-Learning The Inductive Bias of Simple Neural Circuits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is generally very difficult to map circuit structure to inductive bias. Here, we present a neural network tool to bridge this gap. |
Will Dorrell; Maria Yuffa; Peter E. Latham; |
777 | Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose cyclic block coordinate methods for nonconvex optimization problems with non-asymptotic gradient norm guarantees. |
Xufeng Cai; Chaobing Song; Stephen Wright; Jelena Diakonikolas; |
778 | Neural Latent Aligner: Cross-trial Alignment for Learning Representations of Complex, Naturalistic Neural Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose a novel unsupervised learning framework, Neural Latent Aligner (NLA), to find well-constrained, behaviorally relevant neural representations of complex behaviors. |
Cheol Jun Cho; Edward Chang; Gopala Anumanchipalli; |
779 | Provable Reset-free Reinforcement Learning By No-Regret Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To make learning more practical, we propose a generic no-regret reduction to systematically design reset-free RL algorithms. |
Hoai-An Nguyen; Ching-An Cheng; |
780 | A Universal Unbiased Method for Classification from Aggregate Observations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: CFAO is a generalized learning framework that contains various learning problems, such as multiple-instance learning and learning from label proportions. The goal of this paper is to present a novel universal method of CFAO, which holds an unbiased estimator of the classification risk for arbitrary losses, a goal that previous research failed to achieve. |
Zixi Wei; Lei Feng; Bo Han; Tongliang Liu; Gang Niu; Xiaofeng Zhu; Heng Tao Shen; |
781 | Controlling Type Confounding in Ad Hoc Teamwork with Instance-wise Teammate Feedback Rectification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work addresses the issue from the lens of causal inference. |
Dong Xing; Pengjie Gu; Qian Zheng; Xinrun Wang; Shanqi Liu; Longtao Zheng; Bo An; Gang Pan; |
782 | Decentralized Stochastic Bilevel Optimization with Improved Per-Iteration Complexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it remains unknown how to design the distributed algorithm with sample complexity and convergence rate comparable to SGD for stochastic optimization, and at the same time without directly computing the exact Hessian or Jacobian matrices. In this paper we propose such an algorithm. |
Xuxing Chen; Minhui Huang; Shiqian Ma; Krishna Balasubramanian; |
783 | On The Impact of Algorithmic Recourse on Social Segregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the aforementioned gaps by making one of the first attempts at analyzing the delayed societal impact of algorithmic recourse. |
Ruijiang Gao; Himabindu Lakkaraju; |
784 | Understanding and Generalizing Contrastive Learning from The Inverse Optimal Transport Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to understand and generalize CL from a point set matching perspective, instead of the comparison between two points. |
Liangliang Shi; Gu Zhang; Haoyu Zhen; Jintao Fan; Junchi Yan; |
785 | Beyond Uniform Lipschitz Condition in Differentially Private Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We generalize uniform Lipschitzness by assuming that the per-sample gradients have sample-dependent upper bounds, i.e., per-sample Lipschitz constants, which themselves may be unbounded. |
Rudrajit Das; Satyen Kale; Zheng Xu; Tong Zhang; Sujay Sanghavi; |
786 | Personalized Federated Learning Under Mixture of Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, these techniques often lack the ability to adapt to unseen data, further limiting their effectiveness in real-world scenarios. To address these limitations, we propose a novel approach, FedGMM, which utilizes Gaussian mixture models (GMM) to effectively fit the input data distributions across diverse clients. |
Yue Wu; Shuaicheng Zhang; Wenchao Yu; Yanchi Liu; Quanquan Gu; Dawei Zhou; Haifeng Chen; Wei Cheng; |
787 | SurCo: Learning Linear SURrogates for COmbinatorial Nonlinear Optimization Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Optimization problems with nonlinear cost functions and combinatorial constraints appear in many real-world applications but remain challenging to solve efficiently compared to their linear counterparts. To bridge this gap, we propose SurCo, which learns linear Surrogate costs that can be used in existing Combinatorial solvers to output good solutions to the original nonlinear combinatorial optimization problem. |
Aaron M Ferber; Taoan Huang; Daochen Zha; Martin Schubert; Benoit Steiner; Bistra Dilkina; Yuandong Tian; |
788 | Towards A Persistence Diagram That Is Robust to Noise and Varied Densities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent works have identified that existing methods, which construct persistence diagrams in Topological Data Analysis (TDA), are not robust to noise and varied densities in a point cloud. We analyze the necessary properties of an approach that can address these two issues, and propose a new filter function for TDA based on a new data-dependent kernel which possesses these properties. |
Hang Zhang; Kaifeng Zhang; Kai Ming Ting; Ye Zhu; |
789 | Robustness in Multimodal Learning Under Train-Test Modality Mismatch Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we are concerned with understanding how models behave as the types of modalities differ between training and deployment, a situation that naturally arises in many applications of multimodal learning to hardware platforms. |
Brandon McKinzie; Vaishaal Shankar; Joseph Yitan Cheng; Yinfei Yang; Jonathon Shlens; Alexander T Toshev; |
790 | Stochastic Gradient Succeeds for Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that the stochastic gradient bandit algorithm converges to a globally optimal policy at an $O(1/t)$ rate, even with a constant step size. |
Jincheng Mei; Zixin Zhong; Bo Dai; Alekh Agarwal; Csaba Szepesvari; Dale Schuurmans; |
791 | Optimal Sets and Solution Paths of ReLU Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop an analytical framework to characterize the set of optimal ReLU neural networks by reformulating the non-convex training problem as a convex program. |
Aaron Mishkin; Mert Pilanci; |
792 | Transformed Distribution Matching for Missing Value Imputation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The key to missing value imputation is to capture the data distribution with incomplete samples and impute the missing values accordingly. In this paper, by leveraging the fact that any two batches of data with missing values come from the same data distribution, we propose to impute the missing values of two batches of samples by transforming them into a latent space through deep invertible functions and matching them distributionally. |
He Zhao; Ke Sun; Amir Dezfouli; Edwin V. Bonilla; |
793 | Demonstration-free Autonomous Reinforcement Learning Via Implicit and Bidirectional Curriculum Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, we propose a demonstration-free ARL algorithm via Implicit and Bi-directional Curriculum (IBC). |
Jigang Kim; Daesol Cho; H. Jin Kim; |
794 | Effective Neural Topic Modeling with Embedding Clustering Regularization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new neural topic model, Embedding Clustering Regularization Topic Model (ECRTM). |
Xiaobao Wu; Xinshuai Dong; Thong Thanh Nguyen; Anh Tuan Luu; |
795 | Traversing Between Modes in Function Space for Fast Ensembling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While this provides a way to efficiently train ensembles, for inference, multiple forward passes must still be executed using all the ensemble parameters, which often becomes a serious bottleneck for real-world deployment. In this work, we propose a novel framework to reduce such costs. |
Eunggu Yun; Hyungi Lee; Giung Nam; Juho Lee; |
796 | What Is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study out-of-distribution (OOD) generalization of offline GCRL both theoretically and empirically to identify factors that are important. |
Rui Yang; LIN Yong; Xiaoteng Ma; Hao Hu; Chongjie Zhang; Tong Zhang; |
797 | Understanding Oversquashing in GNNs Through The Lens of Effective Resistance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Understanding and mitigating oversquashing has recently received significant attention from the research community. In this paper, we continue this line of work by analyzing oversquashing through the lens of the *effective resistance* between nodes in the input graph. |
Mitchell Black; Zhengchao Wan; Amir Nayyeri; Yusu Wang; |
798 | Fast Rates in Time-Varying Strongly Monotone Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a new decentralized online algorithm for time-varying strongly monotone games, which greatly improves existing results and obtains fast rates, matching the best time-invariant guarantee without knowing the environmental non-stationarity. |
Yu-Hu Yan; Peng Zhao; Zhi-Hua Zhou; |
799 | Why Did The Model Fail?: Attributing Model Performance Changes to Distribution Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the problem of attributing performance differences between environments to distribution shifts in the underlying data generating mechanisms. |
Haoran Zhang; Harvineet Singh; Marzyeh Ghassemi; Shalmali Joshi; |
800 | Blockwise Stochastic Variance-Reduced Methods with Parallel Speedup for Multi-Block Bilevel Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider non-convex multi-block bilevel optimization (MBBO) problems, which involve $m\gg 1$ lower level problems and have important applications in machine learning. |
Quanqi Hu; Zi-Hao Qiu; Zhishuai Guo; Lijun Zhang; Tianbao Yang; |
801 | Tighter Analysis for ProxSkip Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a tighter analysis for ProxSkip, an algorithm that allows fewer proximal operator computations to solve composite optimization problems. |
Zhengmian Hu; Heng Huang; |
802 | Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This interpretation has motivated classifier-based and classifier-free guidance as methods for post-hoc control of diffusion models. In this work, we build upon these ideas using the score-based interpretation of diffusion models, and explore alternative ways to condition, modify, and reuse diffusion models for tasks involving compositional generation and guidance. |
Yilun Du; Conor Durkan; Robin Strudel; Joshua B. Tenenbaum; Sander Dieleman; Rob Fergus; Jascha Sohl-Dickstein; Arnaud Doucet; Will Sussman Grathwohl; |
803 | Global Optimization with Parametric Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new algorithm GO-UCB that leverages a parametric family of functions (e.g., neural networks) instead. |
Chong Liu; Yu-Xiang Wang; |
804 | Online Learning in Stackelberg Games with An Omniscient Follower Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of online learning in a two-player decentralized cooperative Stackelberg game. |
Geng Zhao; Banghua Zhu; Jiantao Jiao; Michael Jordan; |
805 | On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The only existing attempt shows the same query time of $O(\log n)$, but with a constant factor improvement in space complexity over non-learned methods, under some assumptions on data distribution. In this paper, we significantly strengthen this result, showing that under mild assumptions on data distribution, and the same space complexity as non-learned methods, learned indexes can answer queries in $O(\log\log n)$ expected query time. |
Sepanta Zeighami; Cyrus Shahabi; |
806 | Atari-5: Distilling The Arcade Learning Environment Down to Five Games Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the computational cost of generating results on the entire 57-game dataset limits ALE’s use and makes the reproducibility of many results infeasible. We propose a novel solution to this problem in the form of a principled methodology for selecting small but representative subsets of environments within a benchmark suite. |
Matthew Aitchison; Penny Sweetser; Marcus Hutter; |
807 | Towards Robust and Safe Reinforcement Learning with Benign Off-policy Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the robuSt vAriational ofF-policy lEaRning (SAFER) approach, which only requires benign training data without attacking the agent. |
Zuxin Liu; Zijian Guo; Zhepeng Cen; Huan Zhang; Yihang Yao; Hanjiang Hu; Ding Zhao; |
808 | Auto-Differentiation of Relational Computations for Very Large Scale Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of how to differentiate computations expressed relationally. |
Yuxin Tang; Zhimin Ding; Dimitrije Jankov; Binhang Yuan; Daniel Bourgeois; Chris Jermaine; |
809 | Supported Trust Region Optimization for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Supported Trust Region optimization (STR) which performs trust region policy optimization with the policy constrained within the support of the behavior policy, enjoying the less restrictive support constraint. |
Yixiu Mao; Hongchang Zhang; Chen Chen; Yi Xu; Xiangyang Ji; |
810 | Minimizing Trajectory Curvature of ODE-based Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on the relationship between the forward process and the curvature, here we present an efficient method of training the forward process to minimize the curvature of generative trajectories without any ODE/SDE simulation. |
Sangyun Lee; Beomsu Kim; Jong Chul Ye; |
811 | Communication-Efficient Federated Hypergradient Computation Via Aggregated Iterative Differentiation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel communication-efficient federated hypergradient estimator via aggregated iterative differentiation (AggITD). |
Peiyao Xiao; Kaiyi Ji; |
812 | Motion Question Answering Via Modular Motion Programs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to build artificial intelligence systems that can perceive and reason with human behavior in the real world, we must first design models that conduct complex spatio-temporal reasoning over motion sequences. Moving towards this goal, we propose the HumanMotionQA task to evaluate complex, multi-step reasoning abilities of models on long-form human motion sequences. |
Mark Endo; Joy Hsu; Jiaman Li; Jiajun Wu; |
813 | Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of online generalized linear regression in the stochastic setting, where the label is generated from a generalized linear model with possibly unbounded additive noise. |
Heyang Zhao; Dongruo Zhou; Jiafan He; Quanquan Gu; |
814 | Graph Reinforcement Learning for Network Control Via Bi-Level Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we argue that data-driven strategies can automate this process and learn efficient algorithms without compromising optimality. To do so, we present network control problems through the lens of reinforcement learning and propose a graph network-based framework to handle a broad class of problems. |
Daniele Gammelli; James Harrison; Kaidi Yang; Marco Pavone; Filipe Rodrigues; Francisco C. Pereira; |
815 | System Identification of Neural Systems: If We Got It Right, Would We Know? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate the most commonly used comparison techniques, such as a linear encoding model and centered kernel alignment, to correctly identify a model by replacing brain recordings with known ground truth models. |
Yena Han; Tomaso Poggio; Brian Cheung; |
816 | From Robustness to Privacy and Back Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the relationship between two desiderata of algorithms in statistical inference and machine learning—differential privacy and robustness to adversarial data corruptions. |
Hilal Asi; Jonathan Ullman; Lydia Zakynthinou; |
817 | Gradient-based Wang–Landau Algorithm: A Novel Sampler for Output Distribution of Neural Networks Over The Input Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Exhaustive enumeration or traditional Monte Carlo methods for the entire input space can exhibit impractical sampling time, especially for high-dimensional inputs. To make such difficult sampling computationally feasible, in this paper, we propose a novel Gradient-based Wang-Landau (GWL) sampler. |
Weitang Liu; Yi-Zhuang You; Ying Wai Li; Jingbo Shang; |
818 | Spatial-Temporal Graph Learning with Adversarial Contrastive Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The ubiquitous spatial-temporal data noise and incompleteness in real-life scenarios bring difficulties to generate high-quality region representations. In this paper, we propose a Spatial-Temporal Adversarial Graph contrastive learning model (STAG) to tackle this challenge for adaptive self-supervised graph augmentation. |
Qianru Zhang; Chao Huang; Lianghao Xia; Zheng Wang; Siu Ming Yiu; Ruihua Han; |
819 | Estimating Joint Treatment Effects By Combining Multiple Experiments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop machinery for estimating joint treatment effects by combining data from multiple experimental datasets. |
Yonghan Jung; Jin Tian; Elias Bareinboim; |
820 | Brainformers: Trading Simplicity for Efficiency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Transformers have a mostly uniform backbone where layers alternate between feed-forward and self-attention in order to build a deep network. Here we investigate this design choice and find that more complex blocks that have different permutations of layer primitives can be more efficient. |
Yanqi Zhou; Nan Du; Yanping Huang; Daiyi Peng; Chang Lan; Da Huang; Siamak Shakeri; David So; Andrew M. Dai; Yifeng Lu; Zhifeng Chen; Quoc V Le; Claire Cui; James Laudon; Jeff Dean; |
821 | On The Convergence of Federated Averaging with Cyclic Client Participation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide (to our knowledge) the first theoretical framework to analyze the convergence of FedAvg with cyclic client participation with several different client optimizers such as GD, SGD, and shuffled SGD. |
Yae Jee Cho; Pranay Sharma; Gauri Joshi; Zheng Xu; Satyen Kale; Tong Zhang; |
822 | Learning to Bid in Repeated First-Price Auctions with Budgets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the problem of learning in repeated first-price auctions with budgets. |
Qian Wang; Zongjun Yang; Xiaotie Deng; Yuqing Kong; |
823 | Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop improved global convergence guarantees for a general class of Fisher-non-degenerate parameterized policies, which allows us to address the case of continuous state-action spaces. |
Ilyas Fatkhullin; Anas Barakat; Anastasia Kireeva; Niao He; |
824 | Lookahead When It Matters: Adaptive Non-causal Transformers for Streaming Neural Transducers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This dichotomy amounts to a trade-off for real-time Automatic Speech Recognition (ASR) system design: profit from the low-latency benefit of strictly-causal architectures while accepting predictive performance limitations, or realize the modeling benefits of future-context models accompanied by their higher latency penalty. In this work, we relax the constraints of this choice and present the Adaptive Non-Causal Attention Transducer (ANCAT). |
Grant Strimel; Yi Xie; Brian John King; martin radfar; Ariya Rastrow; Athanasios Mouchtaris; |
825 | Infinite Action Contextual Bandits with Reusable Data Exhaust Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we describe an online algorithm with an equivalent smoothed regret guarantee, but which generates well-defined importance weights: in exchange, the online computational cost increases, but only to order smoothness (i.e., still independent of the action set). |
Mark Rucker; Yinglun Zhu; Paul Mineiro; |
826 | CodeIPPrompt: Intellectual Property Infringement Assessment of Code Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to bridge the gap by presenting CodeIPPrompt, a platform for automatic evaluation of the extent to which code language models may reproduce licensed programs. |
Zhiyuan Yu; Yuhao Wu; Ning Zhang; Chenguang Wang; Yevgeniy Vorobeychik; Chaowei Xiao; |
827 | EF21-P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we focus our attention on distributed optimization problems in the context where the communication time between the server and the workers is non-negligible. |
Kaja Gruntkowska; Alexander Tyurin; Peter Richtárik; |
828 | Dropout Reduces Underfitting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we demonstrate that dropout can also mitigate underfitting when used at the start of training. |
Zhuang Liu; Zhiqiu Xu; Joseph Jin; Zhiqiang Shen; Trevor Darrell; |
829 | Abstract-to-Executable Trajectory Translation for One-Shot Task Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to achieve one-shot task generalization by decoupling plan generation and plan execution. |
Stone Tao; Xiaochen Li; Tongzhou Mu; Zhiao Huang; Yuzhe Qin; Hao Su; |
830 | D2Match: Leveraging Deep Learning and Degeneracy for Subgraph Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop $D^2$Match by leveraging the efficiency of Deep learning and Degeneracy for subgraph matching. |
Xuanzhou Liu; Lin Zhang; Jiaqi Sun; Yujiu Yang; Haiqin Yang; |
831 | Improving Medical Predictions By Irregular Multimodal Electronic Health Records Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our method first addresses irregularity in each single modality by (1) modeling irregular time series by dynamically incorporating hand-crafted imputation embeddings into learned interpolation embeddings via a gating mechanism, and (2) casting a series of clinical note representations as multivariate irregular time series and tackling irregularity via a time attention mechanism. |
Xinlu Zhang; Shiyang Li; Zhiyu Chen; Xifeng Yan; Linda Ruth Petzold; |
832 | Target-based Surrogates for Stochastic Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider minimizing functions for which it is expensive to compute the (possibly stochastic) gradient. |
Jonathan Wilder Lavington; Sharan Vaswani; Reza Babanezhad Harikandeh; Mark Schmidt; Nicolas Le Roux; |
833 | Reinforcement Learning with History Dependent Dynamic Contexts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce *Dynamic Contextual Markov Decision Processes (DCMDPs)*, a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. |
Guy Tennenholtz; Nadav Merlis; Lior Shani; Martin Mladenov; Craig Boutilier; |
834 | CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to process continuous and discrete variables separately (but being conditioned on each other) by two diffusion models. |
Chaejeong Lee; Jayoung Kim; Noseong Park; |
835 | PLay: Parametrically Conditioned Layout Generation Using Latent Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we build a conditional latent diffusion model, PLay, that generates parametrically conditioned layouts in vector graphic space from user-specified guidelines, which are commonly used by designers for representing their design intents in current practices. |
Chin-Yi Cheng; Forrest Huang; Gang Li; Yang Li; |
836 | On Pre-Training for Visuo-Motor Control: Revisiting A Learning-from-Scratch Baseline Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks. |
Nicklas Hansen; Zhecheng Yuan; Yanjie Ze; Tongzhou Mu; Aravind Rajeswaran; Hao Su; Huazhe Xu; Xiaolong Wang; |
837 | Sketched Ridgeless Linear Regression: The Role of Downsampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To facilitate practical implementation, we propose an empirical procedure to determine the optimal sketching size. |
Xin Chen; Yicheng Zeng; Siyue Yang; Qiang Sun; |
838 | Does Sparsity Help in Learning Misspecified Linear Bandits? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet it is unknown whether a structural assumption on the ground-truth parameter, such as sparsity, could break the $\varepsilon\sqrt{d}$ barrier. In this paper, we address this question by showing that algorithms can obtain $O(\varepsilon)$-optimal actions by querying $\tilde{O}(\exp(m\varepsilon))$ actions, where $m$ is the sparsity parameter, removing the $\exp(d)$-dependence. |
Jialin Dong; Lin Yang; |
839 | Gradient Descent in Neural Networks As Sequential Learning in Reproducing Kernel Banach Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our goal is to extend beyond the limits of NTK toward a more general theory. |
Alistair Shilton; Sunil Gupta; Santu Rana; Svetha Venkatesh; |
840 | Statistical Inference on Multi-armed Bandits with Delayed Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an adaptively weighted estimator that on one hand incorporates the arm-dependent delaying mechanism to achieve consistency, and on the other hand mitigates the variance inflation across stages due to vanishing sampling probability. |
Lei Shi; Jingshen Wang; Tianhao Wu; |
841 | Provably Convergent Schrödinger Bridge with Applications to Probabilistic Time Series Imputation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in practice, only approximated projections are accessible and their convergence is not well understood. To fill this gap, we present the first convergence analysis of the Schrödinger bridge algorithm based on approximated projections. |
Yu Chen; Wei Deng; Shikai Fang; Fengpei Li; Tianjiao Nicole Yang; Yikai Zhang; Kashif Rasul; Shandian Zhe; Anderson Schneider; Yuriy Nevmyvaka; |
842 | Towards Understanding Ensemble Distillation in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we build a theoretical foundation of the ensemble distillation framework in federated learning from the perspective of kernel ridge regression (KRR). |
Sejun Park; Kihun Hong; Ganguk Hwang; |
843 | Randomized Schur Complement Views for Graph Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a randomized topological augmentor based on Schur complements for Graph Contrastive Learning (GCL). |
Vignesh Kothapalli; |
844 | Improved Algorithms for White-Box Adversarial Streams Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose efficient algorithms for sparse recovery of vectors, low rank recovery of matrices and tensors, as well as low rank plus sparse recovery of matrices, i.e., robust PCA. |
Ying Feng; David Woodruff; |
845 | Towards Trustworthy Explanation: On Causal Rationalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, existing association-based approaches to rationalization cannot identify true rationales when two or more snippets are highly inter-correlated and thus contribute similarly to prediction accuracy, a phenomenon known as spuriousness. To address this limitation, we leverage two causal desiderata, non-spuriousness and efficiency, in rationalization from a causal inference perspective. |
Wenbo Zhang; Tong Wu; Yunlong Wang; Yong Cai; Hengrui Cai; |
846 | GREAD: Graph Neural Reaction-Diffusion Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we present a reaction-diffusion equation-based GNN method that considers all popular types of reaction equations in addition to one special reaction equation designed by us. |
Jeongwhan Choi; Seoyoung Hong; Noseong Park; Sung-Bae Cho; |
847 | FlexRound: Learnable Rounding Based on Element-wise Division for Post-Training Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As PTQ schemes based on reconstructing each layer or block output turn out to be effective to enhance quantized model performance, recent works have developed algorithms to devise and learn a new weight-rounding scheme so as to better reconstruct each layer or block output. In this work, we propose a simple yet effective new weight-rounding mechanism for PTQ, coined FlexRound, based on element-wise division instead of typical element-wise addition such that FlexRound enables jointly learning a common quantization grid size as well as a different scale for each pre-trained weight. |
Jung Hyun Lee; Jeonghoon Kim; Se Jung Kwon; Dongsoo Lee; |
848 | Distributional Offline Policy Evaluation with Predictive Error Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an algorithm called Fitted Likelihood Estimation (FLE), which conducts a sequence of Maximum Likelihood Estimation (MLE) and has the flexibility of integrating any state-of-the-art probabilistic generative model as long as it can be trained via MLE. |
Runzhe Wu; Masatoshi Uehara; Wen Sun; |
849 | MODeL: Memory Optimizations for Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present MODeL, an algorithm that optimizes the lifetime and memory location of the tensors used to train neural networks. |
Benoit Steiner; Mostafa Elhoushi; Jacob Kahn; James Hegarty; |
850 | Test-Time Style Shifting: Handling Arbitrary Styles in Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In domain generalization (DG), the target domain is unknown when the model is being trained, and the trained model should successfully work on an arbitrary (and possibly unseen) target domain during inference. This is a difficult problem, and despite active studies in recent years, it remains a great challenge. In this paper, we take a simple yet effective approach to tackle this issue. |
Jungwuk Park; Dong-Jun Han; Soyeong Kim; Jaekyun Moon; |
851 | Difference-in-Differences Meets Tree-based Methods: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study considers the estimation of conditional causal effects in the presence of unmeasured confounding for a balanced panel with treatment imposed at the last time point. To address this, we combine Difference-in-differences (DiD) and tree-based methods and propose a new identification assumption that allows for the violation of the (conditional) parallel trends assumption adopted by most existing DiD methods. |
Caizhi Tang; Huiyuan Wang; Xinyu Li; Qing Cui; Longfei Li; Jun Zhou; |
852 | Loss Balancing for Fair Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Imposing EL on the learning process leads to a non-convex optimization problem even if the loss function is convex, and the existing fair learning algorithms cannot properly be adapted to find the fair predictor under the EL constraint. This paper introduces an algorithm that can leverage off-the-shelf convex programming tools (e.g., CVXPY (Diamond and Boyd, 2016; Agrawal et al., 2018)) to efficiently find the global optimum of this non-convex optimization. |
Mohammad Mahdi Khalili; Xueru Zhang; Mahed Abroshan; |
853 | Constrained Decision Transformer for Offline Safe Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We aim to tackle a more challenging problem: learning a safe policy from an offline dataset. |
Zuxin Liu; Zijian Guo; Yihang Yao; Zhepeng Cen; Wenhao Yu; Tingnan Zhang; Ding Zhao; |
854 | Chameleon: Adapting to Peer Images for Planting Durable Backdoors in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the connection between the durability of FL backdoors and the relationships between benign images and poisoned images (i.e., the images whose labels are flipped to the target label during local training). |
Yanbo Dai; Songze Li; |
855 | Generative Adversarial Symmetry Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a framework, LieGAN, to *automatically discover equivariances* from a dataset using a paradigm akin to generative adversarial training. |
Jianke Yang; Robin Walters; Nima Dehmamy; Rose Yu; |
856 | Bayesian Neural Networks Avoid Encoding Complex and Perturbation-Sensitive Concepts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on mean-field variational Bayesian Neural Networks (BNNs) and explore the representation capacity of such BNNs by investigating which types of concepts are less likely to be encoded by the BNN. |
Qihan Ren; Huiqi Deng; Yunuo Chen; Siyu Lou; Quanshi Zhang; |
857 | RLSbench: Domain Adaptation Under Relaxed Label Shift Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce RLSbench, a large-scale benchmark for *relaxed label shift*, consisting of $>$500 distribution shift pairs spanning vision, tabular, and language modalities, with varying label proportions. |
Saurabh Garg; Nick Erickson; James Sharpnack; Alex Smola; Sivaraman Balakrishnan; Zachary Chase Lipton; |
858 | Mixture Proportion Estimation Beyond Irreducibility Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a more general sufficient condition that accommodates several settings of interest where irreducibility does not hold. |
Yilun Zhu; Aaron Fjeldsted; Darren Holland; George Landon; Azaree Lintereur; Clayton Scott; |
859 | Pre-computed Memory or On-the-fly Encoding? A Hybrid Approach to Retrieval Augmentation Makes The Most of Your Compute Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, pre-encoding memory incurs a severe quality penalty as the memory representations are not conditioned on the current input. We propose LUMEN, a hybrid between these two extremes, pre-computing the majority of the retrieval representation and completing the encoding on the fly using a live encoder that is conditioned on the question and fine-tuned for the task. |
Michiel de Jong; Yury Zemlyanskiy; Nicholas FitzGerald; Joshua Ainslie; Sumit Sanghai; Fei Sha; William W. Cohen; |
860 | Polynomial Time and Private Learning of Unbounded Gaussian Mixture Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of privately estimating the parameters of $d$-dimensional Gaussian Mixture Models (GMMs) with $k$ components. |
Jamil Arbas; Hassan Ashtiani; Christopher Liaw; |
861 | Behavior Contrastive Learning for Unsupervised Skill Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel unsupervised skill discovery method through contrastive learning among behaviors, which makes the agent produce similar behaviors for the same skill and diverse behaviors for different skills. |
Rushuai Yang; Chenjia Bai; Hongyi Guo; Siyuan Li; Bin Zhao; Zhen Wang; Peng Liu; Xuelong Li; |
862 | ELSA: Efficient Label Shift Adaptation Through The Lens of Semiparametric Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these issues, we first propose a moment-matching framework for adapting to label shift based on the geometry of the influence function. Under this framework, we propose a novel method named $\underline{\mathrm{E}}$fficient $\underline{\mathrm{L}}$abel $\underline{\mathrm{S}}$hift $\underline{\mathrm{A}}$daptation (ELSA), in which the adaptation weights can be estimated by solving linear systems. |
Qinglong Tian; Xin Zhang; Jiwei Zhao; |
863 | FedVS: Straggler-Resilient and Privacy-Preserving Vertical Federated Learning for Split Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to address two major challenges in split VFL: 1) performance degradation due to straggling clients during training; and 2) data and model privacy leakage from clients’ uploaded data embeddings. |
Songze Li; Duanyi Yao; Jin Liu; |
864 | On Excess Mass Behavior in Gaussian Mixture Models with Orlicz-Wasserstein Distances Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we (re)introduce and investigate a metric, named Orlicz-Wasserstein distance, in the study of the Bayesian contraction behavior for the parameters. |
Aritra Guha; Nhat Ho; XuanLong Nguyen; |
865 | Toward Large Kernel Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide a way forward for constructing large-scale general kernel models, which are a generalization of kernel machines that decouples the model and data, allowing training on large datasets. |
Amirhesam Abedsoltan; Mikhail Belkin; Parthe Pandit; |
866 | Poisoning Generative Replay in Continual Learning to Promote Forgetting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, their vulnerability under poisoning attacks has been largely understudied. In this work, we investigate this issue in the context of continual learning, where generative replayers are utilized to tackle catastrophic forgetting. |
Siteng Kang; Zhan Shi; Xinhua Zhang; |
867 | Mirror Sinkhorn: Fast Online Optimization on Transport Polytopes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Optimal transport is an important tool in machine learning, making it possible to capture geometric properties of the data through a linear program on transport polytopes. We present a single-loop optimization algorithm for minimizing general convex objectives on these domains, utilizing the principles of Sinkhorn matrix scaling and mirror descent. |
Marin Ballu; Quentin Berthet; |
868 | Learning Globally Smooth Functions on Manifolds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work combines techniques from semi-infinite constrained learning and manifold regularization to learn representations that are globally smooth on a manifold. |
Juan Cervino; Luiz F. O. Chamon; Benjamin David Haeffele; Rene Vidal; Alejandro Ribeiro; |
869 | Offline Learning in Markov Games with General Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide the first framework for sample-efficient offline learning in Markov games under general function approximation, handling all 3 equilibria in a unified manner. |
Yuheng Zhang; Yu Bai; Nan Jiang; |
870 | Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the observation, we present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO. |
Yulai Zhao; Zhuoran Yang; Zhaoran Wang; Jason D. Lee; |
871 | When Sparsity Meets Contrastive Models: Less Graph Data Can Bring Better Class-Balanced Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, we find that pruned sparse contrastive models may miss valuable information, leading to a large loss value on the informative subset. Motivated by the above findings, we develop a unified data model dynamic sparsity framework called Data Decantation (DataDec) to address the above challenges. |
Chunhui Zhang; Chao Huang; Yijun Tian; Qianlong Wen; Zhongyu Ouyang; Youhuan Li; Yanfang Ye; Chuxu Zhang; |
872 | Efficient Online Reinforcement Learning with Offline Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms is required to achieve reliable performance. |
Philip J. Ball; Laura Smith; Ilya Kostrikov; Sergey Levine; |
873 | Magneto: A Foundation Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a Transformer variant, named Magneto, to fulfill the goal. |
Hongyu Wang; Shuming Ma; Shaohan Huang; Li Dong; Wenhui Wang; Zhiliang Peng; Yu Wu; Payal Bajaj; Saksham Singhal; Alon Benhaim; Barun Patra; Zhun Liu; Vishrav Chaudhary; Xia Song; Furu Wei; |
874 | A Three-regime Model of Network Pruning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Perhaps surprisingly, a systematic approach to predict precisely how adjusting a specific hyperparameter will affect prunability remains elusive. To address this gap, we introduce a phenomenological model grounded in the statistical mechanics of learning. |
Yefan Zhou; Yaoqing Yang; Arin Chang; Michael W. Mahoney; |
875 | Inverse Reinforcement Learning Without Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate for the first time a more informed imitation learning reduction where we utilize the state distribution of the expert to alleviate the global exploration component of the RL subroutine, providing an *exponential* speedup in theory. |
Gokul Swamy; David Wu; Sanjiban Choudhury; Drew Bagnell; Steven Wu; |
876 | An Information-Theoretic Analysis of Nonstationary Bandit Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We view the optimal action sequence as a stochastic process, and take an information-theoretic approach to analyze attainable performance. |
Seungki Min; Daniel Russo; |
877 | Moccasin: Efficient Tensor Rematerialization for Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we consider the problem of execution time minimization of compute graphs subject to a memory budget. |
Burak Bartan; Haoming Li; Harris Teague; Christopher Lott; Bistra Dilkina; |
878 | On The Global Convergence of Risk-Averse Policy Gradient Methods with Expected Conditional Risk Measures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider a class of dynamic time-consistent risk measures, called Expected Conditional Risk Measures (ECRMs), and derive policy gradient updates for ECRM-based objective functions. |
Xian Yu; Lei Ying; |
879 | Learning Mixtures of Gaussians with Censored Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an algorithm that takes only $\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the means $\mu_i$ within $\varepsilon$ error. |
Wai Ming Tai; Bryon Aragam; |
880 | Theoretical Bounds on The Network Community Profile from Low-rank Semi-definite Programming Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study a new connection between a technical measure called $\mu$-conductance that arises in the study of Markov chains for sampling convex bodies and the network community profile that characterizes size-resolved properties of clusters and communities in social and information networks. |
Yufan Huang; C. Seshadhri; David F. Gleich; |
881 | Not All Semantics Are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to optimize a contrastive loss with individualized temperatures in a principled manner. |
Zi-Hao Qiu; Quanqi Hu; Zhuoning Yuan; Denny Zhou; Lijun Zhang; Tianbao Yang; |
882 | Addressing Budget Allocation and Revenue Allocation in Data Market Environments Using An Adaptive Sampling Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a new algorithm to solve budget allocation and revenue allocation problems simultaneously in linear time. |
Boxin Zhao; Boxiang Lyu; Raul Castro Fernandez; Mladen Kolar; |
883 | Policy Gradient in Robust MDPs with Global Convergence Guarantee Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a new Double-Loop Robust Policy Gradient (DRPG), the first generic policy gradient method for RMDPs. |
Qiuhao Wang; Chin Pang Ho; Marek Petrik; |
884 | Wrapped Cauchy Distributed Angular Softmax for Long-Tailed Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the Wrapped Cauchy Distributed Angular Softmax (WCDAS), a novel softmax function that incorporates data-wise Gaussian-based kernels into the angular correlation between feature representations and classifier weights, effectively mitigating noise and sparse sampling concerns. |
Boran Han; |
885 | Achieving Hierarchy-Free Approximation for Bilevel Programs with Equilibrium Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop an approximation scheme for solving bilevel programs with equilibrium constraints, which are generally difficult to solve. |
Jiayang Li; Jing Yu; Boyi Liu; Yu Nie; Zhaoran Wang; |
886 | On Penalty-based Bilevel Gradient Descent Method Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we tackle the bilevel problem through the lens of the penalty method. |
Han Shen; Tianyi Chen; |
887 | Eventual Discounting Temporal Logic Counterfactual Experience Replay Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper makes two contributions. First, we develop a new value-function based proxy, using a technique we call eventual discounting, under which one can find policies that satisfy the LTL specification with highest achievable probability. Second, we develop a new experience replay method for generating off-policy data from on-policy rollouts via counterfactual reasoning on different ways of satisfying the LTL specification. |
Cameron Voloshin; Abhinav Verma; Yisong Yue; |
888 | Scaling Laws for Multilingual Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide a large-scale empirical study of the scaling properties of multilingual neural machine translation models. |
Patrick Fernandes; Behrooz Ghorbani; Xavier Garcia; Markus Freitag; Orhan Firat; |
889 | Fairness in Matching Under Uncertainty Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As our key contribution, we carefully axiomatize a notion of individual fairness in the two-sided marketplace setting which respects the uncertainty in the merits; indeed, it simultaneously recognizes uncertainty as the primary potential cause of unfairness and an approach to address it. |
Siddartha Devic; David Kempe; Vatsal Sharan; Aleksandra Korolova; |
890 | Competitive Gradient Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the problem of convergence to a stationary point in zero-sum games. We propose competitive gradient optimization (CGO), a gradient-based method that incorporates the interactions between two players in zero-sum games for its iterative updates. |
Abhijeet Vyas; Brian Bullins; Kamyar Azizzadenesheli; |
891 | Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, given a simple general instruction, e.g., beating all enemies, agents are required to decompose it into multiple subgoals and figure out the right one to focus on. Inspired by previous work, we try to address these issues at the entity level and propose a novel framework for language grounding in multi-agent reinforcement learning, entity divider (EnDi). |
Ziluo Ding; Wanpeng Zhang; Junpeng Yue; Xiangjun Wang; Tiejun Huang; Zongqing Lu; |
892 | Better Diffusion Models Further Improve Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: After two years of rapid development in diffusion models, a question naturally arises: can better diffusion models further improve adversarial training? This paper gives an affirmative answer by employing the most recent diffusion model which has higher efficiency ($\sim 20$ sampling steps) and image quality (lower FID score) compared with DDPM. |
Zekai Wang; Tianyu Pang; Chao Du; Min Lin; Weiwei Liu; Shuicheng Yan; |
893 | Learning Deep Time-index Models for Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Indeed, while naive deep time-index models are far more expressive than the manually predefined function representations of classical time-index models, they are inadequate for forecasting, being unable to generalize to unseen time steps due to the lack of inductive bias. In this paper, we propose DeepTime, a meta-optimization framework to learn deep time-index models which overcome these limitations, yielding an efficient and accurate forecasting model. |
Gerald Woo; Chenghao Liu; Doyen Sahoo; Akshat Kumar; Steven Hoi; |
894 | Surrogate Module Learning: Reduce The Gradient Error Accumulation in Training Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach to reducing gradient error from a new perspective called surrogate module learning (SML). |
Shikuang Deng; Hao Lin; Yuhang Li; Shi Gu; |
895 | How to Address Monotonicity for Model Risk Management? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the problem of establishing the accountability and fairness of transparent machine learning models through monotonicity. |
Dangxing Chen; Weicheng Ye; |
896 | B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the CATE function under limits on the level of hidden confounding. |
Miruna Oprescu; Jacob Dorn; Marah Ghoummaid; Andrew Jesson; Nathan Kallus; Uri Shalit; |
897 | DeSRA: Detect and Delete The Artifacts of GAN-based Real-World Super-Resolution Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the cause and characteristics of the GAN artifacts produced in unseen test data without ground-truths. |
Liangbin Xie; Xintao Wang; Xiangyu Chen; Gen Li; Ying Shan; Jiantao Zhou; Chao Dong; |
898 | Tensor Gaussian Process with Contraction for Multi-Channel Imaging Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend the Tensor-GP model by introducing an integrative dimensionality reduction technique, called tensor contraction, with a Tensor-GP for a scalar-on-tensor regression task with multi-channel imaging data. |
Hu Sun; Ward Manchester; Meng Jin; Yang Liu; Yang Chen; |
899 | Automatically Auditing Large Language Models Via Discrete Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we cast auditing as an optimization problem, where we automatically search for input-output pairs that match a desired target behavior. |
Erik Jones; Anca Dragan; Aditi Raghunathan; Jacob Steinhardt; |
900 | On The Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an algorithm based on a novel data selection scheme, which only selects the contextual vectors with large uncertainty for online regression. |
Weitong Zhang; Jiafan He; Zhiyuan Fan; Quanquan Gu; |
901 | AutoCoreset: An Automatic Practical Coreset Construction Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we suggest an automatic practical framework for constructing coresets, which requires (only) the input data and the desired cost function from the user, without the need for any other task-related computation to be done by the user. To do so, we reduce the problem of approximating a loss function to an instance of vector summation approximation, where the vectors we aim to sum are loss vectors of a specific subset of the queries, such that we aim to approximate the image of the function on this subset. |
Alaa Maalouf; Murad Tukan; Vladimir Braverman; Daniela Rus; |
902 | On The Expressive Power of Geometric Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a geometric version of the WL test (GWL) for discriminating geometric graphs while respecting the underlying physical symmetries: permutations, rotations, reflections, and translations. |
Chaitanya K. Joshi; Cristian Bodnar; Simon V Mathis; Taco Cohen; Pietro Lio; |
903 | Deep Regression Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore unlearning for the regression problem, particularly in deep learning models. |
Ayush Kumar Tarun; Vikram Singh Chundawat; Murari Mandal; Mohan Kankanhalli; |
904 | Variance Control for Distributional Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fully understand how the approximation errors of the Q-function affect the whole training process, we do some error analysis and theoretically show how to reduce both the bias and the variance of the error terms. With this new understanding, we construct a new estimator Quantiled Expansion Mean (QEM) and introduce a new DRL algorithm (QEMRL) from the statistical perspective. |
Qi Kuang; Zhoufan Zhu; Liwen Zhang; Fan Zhou; |
905 | Data-Copying in Generative Models: A Formal Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A formal framework for memorization in generative models, called “data-copying”, was proposed by Meehan et al. (2020). We build upon their work to show that their framework may fail to detect certain kinds of blatant memorization. |
Robi Bhattacharjee; Sanjoy Dasgupta; Kamalika Chaudhuri; |
906 | The Impact of Exploration on Convergence and Performance of Multi-Agent Q-Learning Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Outside of these classes, learning dynamics rarely converge and little is known about the effect of exploration in the face of non-convergence. To make progress on this front, we study smooth Q-learning dynamics. |
Aamal Hussain; Francesco Belardinelli; Dario Paccagnan; |
907 | A Kernel-Based View of Language Model Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate whether the Neural Tangent Kernel (NTK)—which originated as a model to study the gradient descent dynamics of infinitely wide networks with suitable random initialization—describes fine-tuning of pre-trained LMs. |
Sadhika Malladi; Alexander Wettig; Dingli Yu; Danqi Chen; Sanjeev Arora; |
908 | Graph Contrastive Backdoor Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we systematically study the vulnerability of GCL in the presence of malicious backdoor adversaries. |
Hangfan Zhang; Jinghui Chen; Lu Lin; Jinyuan Jia; Dinghao Wu; |
909 | Provable Data Subset Selection For Efficient Neural Networks Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the first algorithm to construct coresets for *RBFNNs*, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network and thus approximate any function defined by an *RBFNN* on the larger input data. |
Murad Tukan; Samson Zhou; Alaa Maalouf; Daniela Rus; Vladimir Braverman; Dan Feldman; |
910 | On The Stepwise Nature of Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a simple picture of the training process of self-supervised learning methods with dual deep networks. |
James B Simon; Maksis Knutins; Liu Ziyin; Daniel Geisz; Abraham J Fetterman; Joshua Albrecht; |
911 | How to Trust Your Diffusion Model: A Convex Optimization Approach to Conformal Risk Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we focus on image-to-image regression tasks and we present a generalization of the Risk-Controlling Prediction Sets (RCPS) procedure, that we term $K$-RCPS, which allows us to $(i)$ provide entrywise calibrated intervals for future samples of any diffusion model, and $(ii)$ control a certain notion of risk with respect to a ground truth image with minimal mean interval length. |
Jacopo Teneggi; Matthew Tivnan; Web Stayman; Jeremias Sulam; |
912 | Estimating Causal Effects Using A Multi-task Deep Ensemble Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A number of methods have been proposed for causal effect estimation, yet few have demonstrated efficacy in handling data with complex structures, such as images. To fill this gap, we propose Causal Multi-task Deep Ensemble (CMDE), a novel framework that learns both shared and group-specific information from the study population. |
Ziyang Jiang; Zhuoran Hou; Yiling Liu; Yiman Ren; Keyu Li; David Carlson; |
913 | Controlling Posterior Collapse By An Inverse Lipschitz Constraint on The Decoder Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce an inverse Lipschitz neural network into the decoder and, based on this architecture, provide a new method that can control in a simple and clear manner the degree of posterior collapse for a wide range of VAE models equipped with a concrete theoretical guarantee. |
Yuri Kinoshita; Kenta Oono; Kenji Fukumizu; Yuichi Yoshida; Shin-ichi Maeda; |
914 | Coordinate Descent Methods for Fractional Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider a class of structured fractional minimization problems, in which the numerator part of the objective is the sum of a differentiable convex function and a convex non-smooth function, while the denominator part is a convex or concave function. |
Ganzhao Yuan; |
915 | Interpretable Neural-Symbolic Concept Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, state-of-the-art concept-based models rely on high-dimensional concept embedding representations which lack a clear semantic meaning, thus questioning the interpretability of their decision process. To overcome this limitation, we propose the Deep Concept Reasoner (DCR), the first interpretable concept-based model that builds upon concept embeddings. |
Pietro Barbiero; Gabriele Ciravegna; Francesco Giannini; Mateo Espinosa Zarlenga; Lucie Charlotte Magister; Alberto Tonda; Pietro Lio; Frederic Precioso; Mateja Jamnik; Giuseppe Marra; |
916 | Incentivizing Exploration with Linear Contexts and Combinatorial Actions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We advance the study of incentivized bandit exploration, in which arm choices are viewed as recommendations and are required to be Bayesian incentive compatible. |
Mark Sellke; |
917 | Towards Robust Graph Incremental Learning on Evolving Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the inductive NGIL problem, which accounts for the evolution of graph structure (structural shift) induced by emerging tasks. |
Junwei Su; Difan Zou; Zijun Zhang; Chuan Wu; |
918 | Proximal Causal Learning of Conditional Average Treatment Effects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the P-learner, motivated by the R- and DR-learner, a tailored two-stage loss function for learning heterogeneous treatment effects in settings where exchangeability given observed covariates is an implausible assumption, and we wish to rely on proxy variables for causal inference. |
Erik Sverdrup; Yifan Cui; |
919 | SAM Operates Far from Home: Eigenvalue Regularization As A Dynamical Phenomenon Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The original motivation for SAM was a modified loss function which penalized sharp minima; subsequent analyses have also focused on the behavior near minima. However, our work reveals that SAM provides a strong regularization of the eigenvalues throughout the learning trajectory. |
Atish Agarwala; Yann Dauphin; |
920 | Dimensionality Reduction for General KDE Mode Finding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we significantly generalize a result of Lee, Li, and Musco (2021) on mode approximation for Gaussian mixture models. |
Xinyu Luo; Christopher P Musco; Cas Widdershoven; |
921 | Likelihood Adjusted Semidefinite Programs for Clustering Heterogeneous Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend the SDP approach to a general setting by integrating cluster labels as model parameters and propose an iterative likelihood adjusted SDP (iLA-SDP) method that directly maximizes the exact observed likelihood in the presence of data heterogeneity. |
Yubo Zhuang; Xiaohui Chen; Yun Yang; |
922 | Markovian Gaussian Process Variational Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage the equivalent discrete state space representation of Markovian GPs to enable linear time GPVAE training via Kalman filtering and smoothing. |
Harrison Zhu; Carles Balsells-Rodas; Yingzhen Li; |
923 | Efficient Bound of Lipschitz Constant for Convolutional Layers By Gram Iteration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory and a new alternative to the Power iteration. |
Blaise Delattre; Quentin Barthélemy; Alexandre Araujo; Alexandre Allauzen; |
924 | Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the previous best regret upper bound is still $O(K^{1/3} T^{2/3}\log^{1/3}(T))$, which is achieved by the simple uniform exploration algorithm. In this paper, we close this gap and complete the picture of regret minimization in single-pass streaming MABs. |
Chen Wang; |
925 | Approximation Algorithms for Fair Range Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide an efficient constant factor approximation algorithm for the fair range $\ell_p$-clustering for all values of $p\in [1,\infty)$. |
Sedjro Salomon Hotegni; Sepideh Mahabadi; Ali Vakilian; |
926 | Does Continual Learning Equally Forget All Parameters? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study which modules in neural networks are more prone to forgetting by investigating their training dynamics during CL. |
Haiyan Zhao; Tianyi Zhou; Guodong Long; Jing Jiang; Chengqi Zhang; |
927 | Towards Understanding and Improving GFlowNet Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce an efficient evaluation strategy to compare the learned sampling distribution to the target reward distribution. |
Max W Shen; Emmanuel Bengio; Ehsan Hajiramezanali; Andreas Loukas; Kyunghyun Cho; Tommaso Biancalani; |
928 | Causal Isotonic Calibration for Heterogeneous Treatment Effects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose causal isotonic calibration, a novel nonparametric method for calibrating predictors of heterogeneous treatment effects. |
Lars van der Laan; Ernesto Ulloa-Perez; Marco Carone; Alex Luedtke; |
929 | Gradient-Free Structured Pruning with Unlabeled Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a gradient-free structured pruning framework that uses only unlabeled data. |
Azade Nova; Hanjun Dai; Dale Schuurmans; |
930 | How Many Perturbations Break This Model? Evaluating Robustness Beyond Adversarial Accuracy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce an alternative approach, adversarial sparsity, which quantifies how difficult it is to find a successful perturbation given both an input point and a constraint on the direction of the perturbation. |
Raphael Olivier; Bhiksha Raj; |
931 | Coder Reviewer Reranking for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by collaborative programming, we propose Coder-Reviewer reranking. |
Tianyi Zhang; Tao Yu; Tatsunori Hashimoto; Mike Lewis; Wen-tau Yih; Daniel Fried; Sida Wang; |
932 | On The Training Instability of Shuffling SGD with Batch Normalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present explicit constructions to show how SS leads to distorted optima in regression and divergence for classification, whereas RR avoids both distortion and divergence. |
David Xing Wu; Chulhee Yun; Suvrit Sra; |
933 | Improving Fair Training Under Correlation Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, when the bias between labels and sensitive groups changes, the fairness of the trained model is directly influenced and can worsen. We make two contributions for solving this problem. First, we analytically show that existing in-processing fair algorithms have fundamental limits in accuracy and group fairness. We utilize the notion of correlation shifts between labels and groups, which can explicitly capture the change of the above bias. Second, we propose a novel pre-processing step that samples the input data to reduce correlation shifts and thus enables the in-processing approaches to overcome their limitations. |
Yuji Roh; Kangwook Lee; Steven Euijong Whang; Changho Suh; |
934 | A Hybrid Quantum-Classical Approach Based on The Hadamard Transform for The Convolutional Layer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Hadamard Transform (HT)-based neural network layer for hybrid quantum-classical computing. |
Hongyi Pan; Xin Zhu; Salih Furkan Atici; Ahmet Cetin; |
935 | Benign Overfitting in Deep Neural Networks Under Lazy Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper focuses on over-parameterized deep neural networks (DNNs) with ReLU activation functions and proves that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification while obtaining (nearly) zero-training error under the lazy training regime. |
Zhenyu Zhu; Fanghui Liu; Grigorios Chrysos; Francesco Locatello; Volkan Cevher; |
936 | Run-off Election: Improved Provable Defense Against Data Poisoning Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that merely considering the majority vote in ensemble defenses is wasteful as it does not effectively utilize available information in the logits layers of the base models. |
Keivan Rezaei; Kiarash Banihashem; Atoosa Chegini; Soheil Feizi; |
937 | Understanding Self-Distillation in The Presence of Label Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we theoretically characterize the effect of SD in two supervised learning problems with *noisy labels*. |
Rudrajit Das; Sujay Sanghavi; |
938 | Dataset Distillation with Convexified Implicit Gradients Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new dataset distillation algorithm using reparameterization and convexification of implicit gradients (RCIG), that substantially improves the state-of-the-art. |
Noel Loo; Ramin Hasani; Mathias Lechner; Daniela Rus; |
939 | Omnipredictors for Constrained Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce omnipredictors for constrained optimization and study their complexity and implications. |
Lunjia Hu; Inbal Rachel Livni Navon; Omer Reingold; Chutong Yang; |
940 | ACAT: Adversarial Counterfactual Attention for Classification and Detection in Medical Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these are expensive to collect and may vary significantly across annotators. To overcome these issues, we propose a framework that employs saliency maps to obtain soft spatial attention masks that modulate the image features at different scales. |
Alessandro Fontanella; Antreas Antoniou; Wenwen Li; Joanna Wardlaw; Grant Mair; Emanuele Trucco; Amos Storkey; |
941 | Recasting Self-Attention with Holographic Reduced Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by problems in malware detection, where sequence lengths of $T \geq 100,000$ are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). |
Mohammad Mahmudul Alam; Edward Raff; Stella Biderman; Tim Oates; James Holt; |
942 | Generative Graph Dictionary Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Fused Gromov-Wasserstein (FGW) Mixture Model named FraMe to address the GDL problem from the generative view. |
Zhichen Zeng; Ruike Zhu; Yinglong Xia; Hanqing Zeng; Hanghang Tong; |
943 | Synergies Between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we provide evidence that disentangled representations coupled with sparse task-specific predictors improve generalization. |
Sebastien Lachapelle; Tristan Deleu; Divyat Mahajan; Ioannis Mitliagkas; Yoshua Bengio; Simon Lacoste-Julien; Quentin Bertrand; |
944 | Gaussian Processes at The Helm(holtz): A More Fluid Model for Ocean Currents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To better reflect known physical properties of currents, we propose to instead put a standard stationary kernel on the divergence and curl-free components of a vector field obtained through a Helmholtz decomposition. We show that, because this decomposition relates to the original vector field just via mixed partial derivatives, we can still perform inference given the original data with only a small constant multiple of additional computational expense. |
Renato Berlinghieri; Brian L. Trippe; David R. Burt; Ryan James Giordano; Kaushik Srinivasan; Tamay Özgökmen; Junfei Xia; Tamara Broderick; |
945 | End-to-End Multi-Object Detection with A Regularized Mixture Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework to train an end-to-end multi-object detector consisting of only two terms: negative log-likelihood (NLL) and a regularization term. |
Jaeyoung Yoo; Hojun Lee; Seunghyeon Seo; Inseop Chung; Nojun Kwak; |
946 | Pairwise Ranking Losses of Click-Through Rates Prediction for Welfare Maximization in Ad Auctions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we bring back the welfare objectives of ad auctions into CTR predictions and propose a novel weighted ranking loss to train the CTR model. |
Boxiang Lyu; Zhe Feng; Zachary Robertson; Oluwasanmi O Koyejo; |
947 | Data-Driven Subgroup Identification for Linear Regression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose DDGroup (data-driven group discovery), a data-driven method to effectively identify subgroups in the data with a uniform linear relationship between the features and the label. |
Zachary Izzo; Ruishan Liu; James Zou; |
948 | Hindsight Learning for MDPs with Exogenous Inputs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes are exogenous variables outside the control of the decision-maker. We model these problems as Exo-MDPs (Markov Decision Processes with Exogenous Inputs) and design a class of data-efficient algorithms for them termed Hindsight Learning (HL). |
Sean R. Sinclair; Felipe Vieira Frujeri; Ching-An Cheng; Luke Marshall; Hugo De Oliveira Barbalho; Jingling Li; Jennifer Neville; Ishai Menache; Adith Swaminathan; |
949 | Model Transferability with Responsive Decision Subjects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We provide further instantiated analysis for two popular domain adaptation settings, including covariate shift and target shift. |
Yatong Chen; Zeyu Tang; Kun Zhang; Yang Liu; |
950 | LegendreTron: Uprising Proper Multiclass Loss Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present LegendreTron as a novel and practical method that jointly learns *proper canonical losses* and probabilities for multiclass problems. |
Kevin H Lam; Christian Walder; Spiridon Penev; Richard Nock; |
951 | Semi-Offline Reinforcement Learning for Optimized Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose semi-offline RL, a novel paradigm that smoothly transitions from the offline setting to the online setting, balances exploration capability and training cost, and provides a theoretical foundation for comparing different RL settings. |
Changyu Chen; Xiting Wang; Yiqiao Jin; Victor Ye Dong; Li Dong; Jie Cao; Yi Liu; Rui Yan; |
952 | Smooth Non-stationary Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study a non-stationary two-arm bandit problem where we assume an arm’s mean reward is a $\beta$-Hölder function over (normalized) time, meaning it is $(\beta-1)$-times Lipschitz-continuously differentiable. |
Su Jia; Qian Xie; Nathan Kallus; Peter I. Frazier; |
953 | Prometheus: Taming Sample and Communication Complexities in Constrained Decentralized Stochastic Bilevel Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This problem often arises from multi-agent learning problems with safety constraints. As shown in this paper, constrained decentralized bilevel optimization is far more challenging than its unconstrained counterpart due to the complex coupling structure, which necessitates new algorithm design and analysis techniques. |
Zhuqing Liu; Xin Zhang; Prashant Khanduri; Songtao Lu; Jia Liu; |
954 | GC-Flow: A Graph-Based Flow Network for Effective Clustering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we design normalizing flows that replace GCN layers, leading to a *generative model* that models both the class conditional likelihood $p(\mathbf{x}|y)$ and the class prior $p(y)$. |
Tianchun Wang; Farzaneh Mirzazadeh; Xiang Zhang; Jie Chen; |
955 | Weighted Tallying Bandits: Overcoming Intractability Via Repeated Exposure Optimality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by this, a significant line of work has formalized settings where an action’s loss is a function of the number of times it was played in the prior $m$ timesteps, where $m$ corresponds to a bound on human memory capacity. To more faithfully capture decay of human memory with time, we introduce the Weighted Tallying Bandit (WTB), which generalizes this setting by requiring that an action’s loss is a function of a *weighted* summation of the number of times it was played in the last $m$ timesteps. |
Dhruv Malik; Conor Igoe; Yuanzhi Li; Aarti Singh; |
956 | Nonparametric Density Estimation Under Distribution Drift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study nonparametric density estimation in non-stationary drift settings. |
Alessio Mazzetto; Eli Upfal; |
957 | Global Selection of Contrastive Batches Via Optimization on Sample Permutations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide an alternative to hard negative mining, Global Contrastive Batch Sampling (GCBS), an efficient approximation to the batch assignment problem that upper bounds the gap between the global and training losses, $\mathcal{L}^{Global} - \mathcal{L}^{Train}$, in contrastive learning settings. |
Vin Sachidananda; Ziyi Yang; Chenguang Zhu; |
958 | PromptBoosting: Black-Box Text Classification with Ten Forward Passes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We describe PromptBoosting, a query-efficient procedure for building a text classifier from a neural language model (LM) without access to the LM’s parameters, gradients, or hidden representations. |
Bairu Hou; Joe O’Connor; Jacob Andreas; Shiyu Chang; Yang Zhang; |
959 | LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose LESS-VFL, a communication-efficient feature selection method for distributed systems with vertically partitioned data. |
Timothy Castiglia; Yi Zhou; Shiqiang Wang; Swanand Kadhe; Nathalie Baracaldo; Stacy Patterson; |
960 | Confidence and Dispersity Speak: Characterizing Prediction Matrix for Unsupervised Accuracy Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work aims to assess how well a model performs under distribution shifts without using labels. |
Weijian Deng; Yumin Suh; Stephen Gould; Liang Zheng; |
961 | Learning Intuitive Policies Using Action Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through fine-grained evaluation and scenario analysis, we show that the resulting policies are human-interpretable. |
Mingwei Ma; Jizhou Liu; Samuel Sokota; Max Kleiman-Weiner; Jakob Nicolaus Foerster; |
962 | On Balancing Bias and Variance in Unsupervised Multi-Source-Free Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we aim to analyze the fundamental limits of MSFDA. |
Maohao Shen; Yuheng Bu; Gregory Wornell; |
963 | Abstracting Imperfect Information Away from Two-Player Zero-Sum Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a consequence, existing sound decision-time planning algorithms require complicated additional mechanisms that have unappealing properties. The main contribution of this work is showing that certain regularized equilibria do not possess the aforementioned non-correspondence problem—thus, computing them can be treated as perfect-information problems. |
Samuel Sokota; Ryan D’Orazio; Chun Kai Ling; David J Wu; J Zico Kolter; Noam Brown; |
964 | Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given feature mapping, we propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde O(d\sqrt{H^3K})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $K$ is the number of episodes. |
Jiafan He; Heyang Zhao; Dongruo Zhou; Quanquan Gu; |
965 | On User-Level Private Convex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new mechanism for stochastic convex optimization (SCO) with user-level differential privacy guarantees. |
Badih Ghazi; Pritish Kamath; Ravi Kumar; Pasin Manurangsi; Raghu Meka; Chiyuan Zhang; |
966 | Disentangled Generative Models for Robust Prediction of System Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The use of deep neural networks for modelling system dynamics is increasingly popular, but long-term prediction accuracy and out-of-distribution generalization still present challenges. In this study, we address these challenges by considering the parameters of dynamical systems as factors of variation of the data and leverage their ground-truth values to disentangle the representations learned by generative models. |
Stathi Fotiadis; Mario Lino Valencia; Shunlong Hu; Stef Garasto; Chris D Cantwell; Anil Anthony Bharath; |
967 | HOPE: High-order Graph ODE For Modeling Interacting Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods have severe deficiencies in capacity and efficiency due to the failure to model high-order correlations in long-term temporal trends. To tackle this, we propose a novel model named High-order graph ODE (HOPE) for learning from dynamic interaction data, which can be naturally represented as a graph. |
Xiao Luo; Jingyang Yuan; Zijie Huang; Huiyu Jiang; Yifang Qin; Wei Ju; Ming Zhang; Yizhou Sun; |
968 | Formalizing Preferences Over Runtime Distributions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When trying to solve a computational problem, we are often faced with a choice between algorithms that are guaranteed to return the right answer but differ in their runtime distributions (e.g., SAT solvers, sorting algorithms). This paper aims to lay theoretical foundations for such choices by formalizing preferences over runtime distributions. |
Devon R. Graham; Kevin Leyton-Brown; Tim Roughgarden; |
969 | Curious Replay for Model-based Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present Curious Replay—a form of prioritized experience replay tailored to model-based agents through use of a curiosity-based priority signal. |
Isaac Kauvar; Chris Doyle; Linqi Zhou; Nick Haber; |
970 | Deep Latent State Space Models for Time-Series Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE to increase modeling capacity. |
Linqi Zhou; Michael Poli; Winnie Xu; Stefano Massaroli; Stefano Ermon; |
971 | Pricing Experimental Design: Causal Effect, Expected Revenue and Tail Risk Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When designing pricing experiments, there are three fundamental objectives: estimating the causal effect of price (i.e., price elasticity), maximizing the expected revenue through the experiment, and controlling the tail risk of incurring a very large loss. In this paper, we reveal the relationship among these three objectives. |
David Simchi-Levi; Chonghuan Wang; |
972 | Differentiable and Transportable Structure Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce D-Struct which recovers transportability in the discovered structures through a novel architecture and loss function while remaining fully differentiable. |
Jeroen Berrevoets; Nabeel Seedat; Fergus Imrie; Mihaela van der Schaar; |
973 | Repository-Level Prompt Generation for Large Language Models of Code Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a framework called Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals. |
Disha Shrivastava; Hugo Larochelle; Daniel Tarlow; |
974 | Improving Bi-level Optimization Based Methods with Inspiration from Humans’ Classroom Study Techniques Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We are interested in investigating whether these techniques can inspire the development of ML training strategies to improve bi-level optimization (BLO) based methods. Towards this goal, we develop a general framework, Skillearn, which consists of basic elements such as learners, interaction functions, learning stages, etc. |
Pengtao Xie; |
975 | Graph Generative Model for Benchmarking Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This greatly reduces the amount of benchmark graphs available to researchers, causing the field to rely only on a handful of publicly-available datasets. To address this problem, we introduce a novel graph generative model, Computation Graph Transformer (CGT) that learns and reproduces the distribution of real-world graphs in a privacy-controlled way. |
Minji Yoon; Yue Wu; John Palowitch; Bryan Perozzi; Russ Salakhutdinov; |
976 | Linear Optimal Partial Transport Embedding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the Linear optimal partial transport (LOPT) embedding, which extends the (local) linearization technique on OT and HK to the OPT problem. |
Yikun Bai; Ivan Vladimir Medri; Rocio Diaz Martin; Rana Shahroz; Soheil Kolouri; |
977 | Quantum Speedups for Zero-Sum Games Via Improved Dynamic Gibbs Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give a quantum algorithm for computing an $\epsilon$-approximate Nash equilibrium of a zero-sum game in an $m \times n$ payoff matrix with bounded entries. |
Adam Bouland; Yosheb M Getachew; Yujia Jin; Aaron Sidford; Kevin Tian; |
978 | A Near-Optimal Algorithm for Safe Reinforcement Learning Under Instantaneous Hard Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, in this paper, we develop the first near-optimal safe RL algorithm for episodic Markov Decision Processes with unsafe states and actions under instantaneous hard constraints and the linear mixture model. |
Ming Shi; Yingbin Liang; Ness Shroff; |
979 | Deep Graph Representation Learning and Optimization for Influence Maximization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the development of learning-based IM methods is still limited by fundamental obstacles, including 1) the difficulty of effectively solving the objective function; 2) the difficulty of characterizing the diversified and underlying diffusion patterns; and 3) the difficulty of adapting the solution under various node-centrality-constrained IM variants. To cope with the above challenges, we design a novel framework DeepIM to generatively characterize the latent representation of seed sets, and we propose to learn the diversified information diffusion pattern in a data-driven and end-to-end manner. |
Chen Ling; Junji Jiang; Junxiang Wang; My Thai; Lukas Xue; James Song; Meikang Qiu; Liang Zhao; |
980 | Probabilistic Categorical Adversarial Attack and Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these achievements remain difficult to generalize to categorical data. To bridge this gap, we propose a novel framework, Probabilistic Categorical Adversarial Attack (or PCAA). |
Han Xu; Pengfei He; Jie Ren; Yuxuan Wan; Zitao Liu; Hui Liu; Jiliang Tang; |
981 | Second-order Regression Models Exhibit Progressive Sharpening to The Edge of Stability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As such, we consider the next simplest class of predictive models, namely those that are quadratic in the parameters, which we call second-order regression models. For quadratic objectives in two dimensions, we prove that this second-order regression model exhibits progressive sharpening of the NTK eigenvalue towards a value that differs slightly from the edge of stability, which we explicitly compute. |
Atish Agarwala; Fabian Pedregosa; Jeffrey Pennington; |
982 | Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, equivariant convolutions, which are a fundamental operation for equivariant networks, increase significantly in computational complexity as higher-order tensors are used. In this paper, we address this issue by reducing the $SO(3)$ convolutions or tensor products to mathematically equivalent convolutions in $SO(2)$. |
Saro Passaro; C. Lawrence Zitnick; |
983 | Delayed Bandits: When Do Intermediate Observations Help? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider a model, where intermediate observations take the form of a finite state, which is observed immediately after taking an action, whereas the loss is observed after an adversarially chosen delay. |
Emmanuel Esposito; Saeed Masoudian; Hao Qiu; Dirk van der Hoeven; Nicolò Cesa-Bianchi; Yevgeny Seldin; |
984 | Von Mises Mixture Distributions for Molecular Conformation Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present VonMisesNet, a new graph neural network that captures conformational variability via a variational approximation of rotatable bond torsion angles as a mixture of von Mises distributions. |
Kirk Swanson; Jake Williams; Eric M Jonas; |
985 | Neural Algorithmic Reasoning with Causal Regularisation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we make an important observation: there are many different inputs for which an algorithm will perform certain intermediate computations identically. |
Beatrice Bevilacqua; Kyriacos Nikiforou; Borja Ibarz; Ioana Bica; Michela Paganini; Charles Blundell; Jovana Mitrovic; Petar Veličković; |
986 | Robust Speech Recognition Via Large-Scale Weak Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. |
Alec Radford; Jong Wook Kim; Tao Xu; Greg Brockman; Christine McLeavey; Ilya Sutskever; |
987 | Sequential Monte Carlo Learning for Time Series Structure Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new approach to automatically discovering accurate models of complex time series data. |
Feras Saad; Brian Patton; Matthew Douglas Hoffman; Rif A. Saurous; Vikash Mansinghka; |
988 | Scaling Spherical CNNs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show how spherical CNNs can be scaled for much larger problems. |
Carlos Esteves; Jean-Jacques Slotine; Ameesh Makadia; |
989 | Sample Complexity of Probability Divergences Under Group Symmetry Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We rigorously quantify the improvement in the sample complexity of variational divergence estimations for group-invariant distributions. In the cases of the Wasserstein-1 metric … |
Ziyu Chen; Markos Katsoulakis; Luc Rey-Bellet; Wei Zhu; |
990 | Faster Rates of Convergence to Stationary Points in Differentially Private Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of approximating stationary points of Lipschitz and smooth functions under $(\varepsilon,\delta)$-differential privacy (DP) in both the finite-sum and stochastic settings. |
Raman Arora; Raef Bassily; Tomás González; Cristóbal A Guzmán; Michael Menart; Enayat Ullah; |
991 | Free-Form Variational Inference for Gaussian Process State-Space Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new method for inference in Bayesian GPSSMs, which overcomes the drawbacks of previous approaches, namely over-simplified assumptions, and high computational requirements. |
Xuhui Fan; Edwin V. Bonilla; Terry O’Kane; Scott A Sisson; |
992 | Maximum Optimality Margin: A Unified Approach for Contextual Linear Programming and Inverse Linear Programming Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the predict-then-optimize problem where the output of a machine learning prediction task is used as the input of some downstream optimization problem, say, the objective coefficient vector of a linear program. |
Chunlin Sun; Shang Liu; Xiaocheng Li; |
993 | Sequential Counterfactual Risk Minimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the case where it is possible to deploy learned policies multiple times and acquire new data. |
Houssam Zenati; Eustache Diemert; Matthieu Martin; Julien Mairal; Pierre Gaillard; |
994 | Automatic Data Augmentation Via Invariance-Constrained Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In fact, there is both empirical and theoretical evidence that the indiscriminate use of data augmentation can introduce biases that outweigh its benefits. This work tackles these issues by automatically adapting the data augmentation while solving the learning task. |
Ignacio Hounie; Luiz F. O. Chamon; Alejandro Ribeiro; |
995 | Divide and Conquer Dynamic Programming: An Almost Linear Time Change Point Detection Methodology in High Dimensions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop a novel, general and computationally efficient framework, called Divide and Conquer Dynamic Programming (DCDP), for localizing change points in time series data with high-dimensional features. |
Wanshan Li; Daren Wang; Alessandro Rinaldo; |
996 | An Effective Meaningful Way to Evaluate Survival Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore various metrics to estimate MAE for survival datasets that include (many) censored individuals. |
Shi-ang Qi; Neeraj Kumar; Mahtab Farrokh; Weijie Sun; Li-Hao Kuan; Rajesh Ranganath; Ricardo Henao; Russell Greiner; |
997 | Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop decentralized algorithms which facilitate collaboration between the agents under two scenarios. |
Ronshee Chawla; Daniel Vial; Sanjay Shakkottai; R. Srikant; |
998 | On The Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study Fitted Q-Iteration with a two-layer ReLU neural network parameterization, and establish sample complexity guarantees for the algorithm. |
Mudit Gaur; Vaneet Aggarwal; Mridul Agarwal; |
999 | Decoding Layer Saliency in Language Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a strategy for identifying textual saliency in large-scale language models applied to classification tasks. |
Elizabeth Mary Hou; Gregory David Castanon; |
1000 | Data Efficient Neural Scaling Law Via Model Reusing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the neural scaling law under the previously overlooked data scarcity regime, focusing on the more challenging situation where we need to train a gigantic model with a disproportionately limited supply of available training data. |
Peihao Wang; Rameswar Panda; Zhangyang Wang; |
1001 | Can Forward Gradient Match Backpropagation? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While current solutions rely on weighted averages over isotropic guess vector distributions, we propose to strongly bias our gradient guesses in directions that are much more promising, such as feedback obtained from small, local auxiliary networks. For a standard computer vision neural network, we conduct a rigorous study systematically covering a variety of combinations of gradient targets and gradient guesses, including those previously presented in the literature. |
Louis Fournier; Stephane Rivaud; Eugene Belilovsky; Michael Eickenberg; Edouard Oyallon; |
1002 | CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To directly leverage the abundant geospatial information associated with images in pre-training, fine-tuning, and inference stages, we present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images. |
Gengchen Mai; Ni Lao; Yutong He; Jiaming Song; Stefano Ermon; |
1003 | Fully Dynamic Submodular Maximization Over Matroids Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Maximizing monotone submodular functions under a matroid constraint is a classic algorithmic problem with multiple applications in data mining and machine learning. We study this classic problem in the fully dynamic setting, where elements can be both inserted and deleted in real-time. |
Paul Duetting; Federico Fusco; Silvio Lattanzi; Ashkan Norouzi-Fard; Morteza Zadimoghaddam; |
1004 | Counterfactual Analysis in Dynamic Latent State Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide an optimization-based framework to perform counterfactual analysis in a dynamic model with hidden states. |
Martin B Haugh; Raghav Singal; |
1005 | Beyond In-Domain Scenarios: Robust Density-Aware Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing post-hoc calibration methods achieve impressive results on in-domain test datasets, they are limited by their inability to yield reliable uncertainty estimates in domain-shift and out-of-domain (OOD) scenarios. We aim to bridge this gap by proposing DAC, an accuracy-preserving as well as Density-Aware Calibration method based on k-nearest-neighbors (KNN). |
Christian Tomani; Futa Kai Waseda; Yuesong Shen; Daniel Cremers; |
1006 | Demystifying Uneven Vulnerability of Link Stealing Attacks Against Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first present theoretical evidence of the uneven vulnerability of GNNs to link stealing attacks, which lays the foundation for demystifying such uneven risks among different groups of edges. We further demonstrate a group-based attack paradigm to expose the practical privacy harm to GNN users derived from the uneven vulnerability of edges. |
He Zhang; Bang Wu; Shuo Wang; Xiangwen Yang; Minhui Xue; Shirui Pan; Xingliang Yuan; |
1007 | PAC Generalization Via Invariant Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the following question: If a representation is approximately invariant with respect to a given number of training interventions, will it continue to be approximately invariant on a larger collection of unseen intervened SEMs? Inspired by PAC learning, we obtain finite-sample out-of-distribution generalization guarantees for approximate invariance that holds *probabilistically* over a family of linear SEMs without faithfulness assumptions. |
Advait U Parulekar; Karthikeyan Shanmugam; Sanjay Shakkottai; |
1008 | A New Near-linear Time Algorithm for K-nearest Neighbor Search Using A Compressed Cover Tree Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given a reference set R of n points and a query set Q of m points in a metric space, this paper studies the important problem of finding the k-nearest neighbors of every point q of Q in the set R in near-linear time. In a paper at ICML 2006, Beygelzimer, Kakade, and Langford introduced a cover tree and attempted to prove that this tree can be built in O(n log n) time, while the nearest neighbor search can be done in O(n log m) time with a hidden dimensionality factor. |
Yury Elkin; Vitaliy Kurlin; |
1009 | Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs), where the transition probability can be parameterized as a linear combination of known feature mappings. |
Junkai Zhang; Weitong Zhang; Quanquan Gu; |
1010 | TAN Without A Burn: Scaling Laws of DP-SGD Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements. |
Tom Sander; Pierre Stock; Alexandre Sablayrolles; |
1011 | The Benefits of Model-Based Generalization in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we provide theoretical and empirical insight into when, and how, we can expect data generated by a learned model to be useful. |
Kenny John Young; Aditya Ramesh; Louis Kirsch; Jürgen Schmidhuber; |
1012 | Provably and Practically Efficient Neural Contextual Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast to the existing work which primarily focuses on ReLU neural nets, we consider a general set of smooth activation functions. Under this more general setting, (i) we derive non-asymptotic error bounds on the difference between an overparameterized neural net and its corresponding neural tangent kernel, and (ii) we propose an algorithm with a provable sublinear regret bound that is also efficient in the finite regime as demonstrated by empirical studies. |
Sudeep Salgia; |
1013 | Training Deep Surrogate Models with Large Scale Online Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper argues that relying on a traditional static dataset to train these models does not exploit the solver's full potential as a data generator. It proposes an open source online training framework for deep surrogate models. |
Lucas Thibaut Meyer; Marc Schouler; Robert Alexander Caulk; Alejandro Ribes; Bruno Raffin; |
1014 | Auxiliary Modality Learning with Generalized Curriculum Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we formally define “Auxiliary Modality Learning” (AML), systematically classify types of auxiliary modality (in visual computing) and architectures for AML, and analyze their performance. |
Yu Shen; Xijun Wang; Peng Gao; Ming Lin; |
1015 | Sample and Predict Your Latent: Modality-free Sequential Disentanglement Via Contrastive Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to avoid that by generating, sampling, and comparing empirical distributions from the underlying variational model. |
Ilan Naiman; Nimrod Berman; Omri Azencot; |
1016 | STEERING : Stein Information Directed Exploration for Model-Based Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we posit an alternative exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal, which, under suitable conditions, can be computed in closed form with the kernelized Stein discrepancy (KSD). |
Souradip Chakraborty; Amrit Bedi; Alec Koppel; Mengdi Wang; Furong Huang; Dinesh Manocha; |
1017 | Distributed Linear Bandits Under Communication Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider distributed linear bandits where $M$ agents learn collaboratively to minimize the overall cumulative regret incurred by all agents. |
Sudeep Salgia; Qing Zhao; |
1018 | Privacy-Aware Compression for Federated Learning Through Numerical Mechanism Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper builds upon it by introducing a new interpolation procedure in the numerical design process that allows for a far more efficient privacy analysis. |
Chuan Guo; Kamalika Chaudhuri; Pierre Stock; Michael Rabbat; |
1019 | Efficient Preconditioned Stochastic Gradient Descent for Estimation in Latent Variable Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose as an alternative for parameter estimation an efficient preconditioned stochastic gradient algorithm. |
Charlotte Baey; Maud Delattre; Estelle Kuhn; Jean-Benoist Leger; Sarah Lemler; |
1020 | Automated Search for Conjectures on Mathematical Constants Using Analysis of Integer Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we propose a fundamentally different method to search for conjectures on mathematical constants: through analysis of integer sequences. |
Ofir Razon; Yoav Harris; Shahar Gottlieb; Dan Carmon; Ofir David; Ido Kaminer; |
1021 | TIDE: Time Derivative Diffusion for Deep Learning on Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The challenge of approaches that use this paradigm is to ensure efficient and accurate long-distance communication between nodes, as deep convolutional networks are prone to over-smoothing. In this paper, we present a novel method based on time derivative graph diffusion (TIDE) to overcome these structural limitations of the message-passing framework. |
Maysam Behmanesh; Maximilian Krahn; Maks Ovsjanikov; |
1022 | Geometric Latent Diffusion Models for 3D Molecule Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the recent huge success of Stable (latent) Diffusion models, we propose a novel and principled method for 3D molecule generation named Geometric Latent Diffusion Models (GeoLDM). |
Minkai Xu; Alexander S Powers; Ron O. Dror; Stefano Ermon; Jure Leskovec; |
1023 | Attention-Based Recurrence for Multi-Agent Reinforcement Learning Under Stochastic Partial Observability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Attention-based Embeddings of Recurrence In multi-Agent Learning (AERIAL) to approximate value functions under stochastic partial observability. |
Thomy Phan; Fabian Ritz; Philipp Altmann; Maximilian Zorn; Jonas Nüßlein; Michael Kölle; Thomas Gabor; Claudia Linnhoff-Popien; |
1024 | Improving Statistical Fidelity for Neural Image Compression with Implicit Local Likelihood Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a non-binary discriminator that is conditioned on quantized local image representations obtained via VQ-VAE autoencoders. |
Matthew J. Muckley; Alaaeldin El-Nouby; Karen Ullrich; Herve Jegou; Jakob Verbeek; |
1025 | On The Forward Invariance of Neural ODEs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new method to ensure neural ordinary differential equations (ODEs) satisfy output specifications by using invariance set propagation. |
Wei Xiao; Tsun-Hsuan Wang; Ramin Hasani; Mathias Lechner; Yutong Ban; Chuang Gan; Daniela Rus; |
1026 | Learning Neural PDE Solvers with Parameter-Guided Channel Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a Channel Attention guided by PDE Parameter Embeddings (CAPE) component for neural surrogate models and a simple yet effective curriculum learning strategy. |
Makoto Takamoto; Francesco Alesiani; Mathias Niepert; |
1027 | Private Statistical Estimation of Many Quantiles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work studies the estimation of many statistical quantiles under differential privacy. More precisely, given a distribution and access to i.i.d. samples from it, we study the estimation of the inverse of its cumulative distribution function (the quantile function) at specific points. |
Clément Lalanne; Aurélien Garivier; Rémi Gribonval; |
1028 | Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We aim at the ambitious goal of democratizing pretraining. |
Boris Knyazev; Doha Hwang; Simon Lacoste-Julien; |
1029 | Entropy-driven Unsupervised Keypoint Representation Learning in Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel approach for unsupervised learning of meaningful representations from videos, leveraging the concept of image spatial entropy (ISE) that quantifies the per-pixel information in an image. |
Ali Younes; Simone Schaub-Meyer; Georgia Chalvatzaki; |
1030 | Loss-Guided Diffusion Models for Plug-and-Play Controllable Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we show that such approaches have significant errors over the scale of the approximations. To address this issue, we propose a Monte Carlo method that uses multiple samples from a suitable distribution to reduce bias. |
Jiaming Song; Qinsheng Zhang; Hongxu Yin; Morteza Mardani; Ming-Yu Liu; Jan Kautz; Yongxin Chen; Arash Vahdat; |
1031 | UMD: Unsupervised Model Detection for X2X Backdoor Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose UMD, the first Unsupervised Model Detection method that effectively detects X2X backdoor attacks via a joint inference of the adversarial (source, target) class pairs. |
Zhen Xiang; Zidi Xiong; Bo Li; |
1032 | Improving Graph Generation By Restricting Graph Bandwidth Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, one of the main limitations of existing methods is their large output space, which restricts generation scalability and hinders accurate modeling of the underlying distribution. To overcome this limitation, we propose a novel approach that significantly reduces the output space of existing graph generative models. |
Nathaniel Lee Diamant; Alex Tseng; Kangway V Chuang; Tommaso Biancalani; Gabriele Scalia; |
1033 | On The Estimation of Gaussian Mixture Copula Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper revisits Gaussian Mixture Copula Model (GMCM), a more expressive alternative to the widely used Gaussian Mixture Model (GMM), with the goal to make its parameter estimation tractable. |
Ashutosh Tewari; |
1034 | Solving Linear Programs with Fast Online Learning Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents fast first-order methods for solving linear programs (LPs) approximately. |
Wenzhi Gao; Dongdong Ge; Chunlin Sun; Yinyu Ye; |
1035 | Variational Open-Domain Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Variational Open-Domain (VOD) framework for end-to-end training and evaluation of retrieval-augmented models, focusing on open-domain question answering and language modelling. |
Valentin Liévin; Andreas Geert Motzfeldt; Ida Riis Jensen; Ole Winther; |
1036 | Structure Learning of Latent Factors Via Clique Search on Correlation Thresholded Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the widespread application of latent factor analysis, existing methods suffer from the following weaknesses: requiring the number of factors to be known, lack of theoretical guarantees for learning the model structure, and nonidentifiability of the parameters due to rotation invariance properties of the likelihood. We address these concerns by proposing a fast correlation thresholding (CT) algorithm that simultaneously learns the number of latent factors and a rotationally identifiable model structure. |
Dale Kim; Qing Zhou; |
1037 | Phase Transitions in The Detection of Correlated Databases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of detecting the correlation between two Gaussian databases $\mathsf{X}\in\mathbb{R}^{n\times d}$ and $\mathsf{Y}\in\mathbb{R}^{n\times d}$, each composed of $n$ users with $d$ features. |
Dor Elimelech; Wasim Huleihel; |
1038 | Hardware-Aware Compression with Random Operation Access Specific Tile (ROAST) Hashing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a model-agnostic, cache-friendly, and hardware-aware model compression approach: Random Operation Access Specific Tile (ROAST) hashing. |
Aditya Desai; Keren Zhou; Anshumali Shrivastava; |
1039 | Algorithmic Collective Action in Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple theoretical model of a collective interacting with a firm’s learning algorithm. |
Moritz Hardt; Eric Mazumdar; Celestine Mendler-Dünner; Tijana Zrnic; |
1040 | Adversarial Cheap Talk Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, this paper proposes a novel adversarial setting called a Cheap Talk MDP in which an Adversary can merely append deterministic messages to the Victim’s observation, resulting in a minimal range of influence. |
Chris Lu; Timon Willi; Alistair Letcher; Jakob Nicolaus Foerster; |
1041 | RLang: A Declarative Language for Describing Partial World Knowledge to Reinforcement Learning Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce RLang, a domain-specific language (DSL) for communicating domain knowledge to an RL agent. |
Rafael Rodriguez-Sanchez; Benjamin Adin Spiegel; Jennifer Wang; Roma Patel; Stefanie Tellex; George Konidaris; |
1042 | Scalable Safe Policy Improvement Via Monte Carlo Tree Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an algorithm, called MCTS-SPIBB, that computes safe policy improvement online using a Monte Carlo Tree Search based strategy. |
Alberto Castellini; Federico Bianchi; Edoardo Zorzi; Thiago D. Simão; Alessandro Farinelli; Matthijs T. J. Spaan; |
1043 | Robust Collaborative Learning with Linear Gradient Overhead Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight. |
Sadegh Farhadkhani; Rachid Guerraoui; Nirupam Gupta; Lê-Nguyên Hoang; Rafael Pinot; John Stephan; |
1044 | Regularization-free Diffeomorphic Temporal Alignment Nets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, existing DTAN formulations crucially depend on a regularization term whose optimal hyperparameters are dataset-specific and usually searched via a large number of experiments. Here we propose a regularization-free DTAN that obviates the need to perform such an expensive, and often impractical, search. |
Ron Shapira Weber; Oren Freifeld; |
1045 | CLUTR: Curriculum Learning Via Unsupervised Task Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce CLUTR: a novel unsupervised curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization. |
Abdus Salam Azad; Izzeddin Gur; Jasper Emhoff; Nathaniel Alexis; Aleksandra Faust; Pieter Abbeel; Ion Stoica; |
1046 | POUF: Prompt-Oriented Unsupervised Fine-tuning for Large Pre-trained Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Though these big models have zero-shot capabilities, in general, labeled data are still required to adapt them to downstream tasks. To overcome this critical limitation, we propose an unsupervised fine-tuning framework to directly fine-tune the model or prompt on the unlabeled target data. |
Korawat Tanwisuth; Shujian Zhang; Huangjie Zheng; Pengcheng He; Mingyuan Zhou; |
1047 | Feature Expansion for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the feature space that dominates representation learning has not been systematically studied in graph neural networks. In this paper, we propose to fill this gap by analyzing the feature space of both spatial and spectral models. |
Jiaqi Sun; Lin Zhang; Guangyi Chen; Peng Xu; Kun Zhang; Yujiu Yang; |
1048 | Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we will show that current RCRL approaches are fundamentally limited and fail to address two critical challenges of RCRL — improving generalization on high reward-to-go (RTG) inputs, and avoiding out-of-distribution (OOD) RTG queries during testing time. To address these challenges when training vanilla RCRL architectures, we propose Bayesian Reparameterized RCRL (BR-RCRL), a novel set of inductive biases for RCRL inspired by Bayes’ theorem. |
Wenhao Ding; Tong Che; Ding Zhao; Marco Pavone; |
1049 | PINA: Leveraging Side Information in eXtreme Multi-label Classification Via Predicted Instance Neighborhood Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Predicted Instance Neighborhood Aggregation (PINA), a data augmentation method for the general XMC problem that leverages beneficial side information. |
Eli Chien; Jiong Zhang; Cho-Jui Hsieh; Jyun-Yu Jiang; Wei-Cheng Chang; Olgica Milenkovic; Hsiang-Fu Yu; |
1050 | A Two-Stage Active Learning Algorithm for K-Nearest Neighbors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide consistency guarantees for a modified $k$-nearest neighbors classifier trained on samples acquired via our scheme, and show that when the conditional probability function $\mathbb{P}(Y=y|X=x)$ is sufficiently smooth and the Tsybakov noise condition holds, our actively trained classifiers converge to the Bayes optimal classifier at a faster asymptotic rate than passively trained $k$-nearest neighbor classifiers. |
Nicholas Rittler; Kamalika Chaudhuri; |
1051 | Consistency Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. |
Yang Song; Prafulla Dhariwal; Mark Chen; Ilya Sutskever; |
1052 | Continuously Parameterized Mixture Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that by continuously parameterizing a mixture of factor analyzers using a learned ordinary differential equation, we can improve the fit of mixture models over direct methods. |
Christopher M Bender; Yifeng Shi; Marc Niethammer; Junier Oliva; |
1053 | Cooperative Multi-Agent Reinforcement Learning: Asynchronous Communication and Linear Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a provably efficient algorithm based on value iteration that can simultaneously allow asynchronous communication and guarantee the benefit of cooperation with low communication complexity. |
Yifei Min; Jiafan He; Tianhao Wang; Quanquan Gu; |
1054 | A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study the universality hypothesis by examining how small networks learn to implement group compositions. |
Bilal Chughtai; Lawrence Chan; Neel Nanda; |
1055 | On The Convergence of SARSA with Linear Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, little is known about how fast SARSA converges to that region and how large the region is. In this paper, we make progress towards this open problem by showing the convergence rate of projected SARSA to a bounded region. |
Shangtong Zhang; Remi Tachet des Combes; Romain Laroche; |
1056 | PAC-Bayesian Generalization Bounds for Adversarial Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We extend PAC-Bayesian theory to generative models and develop generalization bounds for models based on the Wasserstein distance and the total variation distance. |
Sokhna Diarra Mbacke; Florence Clerc; Pascal Germain; |
1057 | Cluster Explanation Via Polyhedral Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formulate the cluster description problem as an integer program and present a column generation approach to search over an exponential number of candidate half-spaces that can be used to build the polyhedra. |
Connor Lawless; Oktay Gunluk; |
1058 | Diffusion Based Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements on state-of-the-art models on semi-supervised image classification. |
Sarthak Mittal; Korbinian Abstreiter; Stefan Bauer; Bernhard Schölkopf; Arash Mehrjou; |
1059 | Fully Bayesian Autoencoders with Latent Sparse Gaussian Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a fully Bayesian autoencoder model that treats both local latent variables and global decoder parameters in a Bayesian fashion. |
Ba-Hien Tran; Babak Shahbaba; Stephan Mandt; Maurizio Filippone; |
1060 | Benign Overfitting in Two-layer ReLU Convolutional Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: How and when benign overfitting can occur in ReLU neural networks remains an open problem. In this work, we seek to answer this question by establishing algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise. |
Yiwen Kou; Zixiang Chen; Yuanzhou Chen; Quanquan Gu; |
1061 | Fairness in Streaming Submodular Maximization Over A Matroid Constraint Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the natural generalization of this problem to a matroid constraint. |
Marwa El Halabi; Federico Fusco; Ashkan Norouzi-Fard; Jakab Tardos; Jakub Tarnawski; |
1062 | Multi-Layer Neural Networks As Trainable Ladders of Hilbert Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To characterize the function spaces explored by multi-layer neural networks (NNs), we introduce Neural Hilbert Ladders (NHLs), a collection of reproducing kernel Hilbert spaces (RKHSes) that are defined iteratively and adaptive to training. |
Zhengdao Chen; |
1063 | Optimal Randomized Multilevel Monte Carlo for Repeatedly Nested Expectations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Monte Carlo estimator called $\mathsf{READ}$, which stands for “Recursive Estimator for Arbitrary Depth.” |
Yasa Syed; Guanyang Wang; |
1064 | PAC Prediction Sets for Large Language Models of Code Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In the context of code generation, we propose a solution that considers a restricted set of prediction sets that can compactly be represented as partial programs, which are programs with portions replaced with holes. |
Adam Khakhar; Stephen Mell; Osbert Bastani; |
1065 | Scalable Adaptive Computation for Iterative Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Natural data is redundant, yet predominant architectures tile computation uniformly across their input and output space. We propose the Recurrent Interface Network (RIN), an attention-based architecture that decouples its core computation from the dimensionality of the data, enabling adaptive computation for more scalable generation of high-dimensional data. |
Allan Jabri; David J. Fleet; Ting Chen; |
1066 | Function-Space Regularization in Neural Networks: A Probabilistic Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we approach regularization in neural networks from a probabilistic perspective and show that by viewing parameter-space regularization as specifying an empirical prior distribution over the model parameters, we can derive a probabilistically well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training. |
Tim G. J. Rudner; Sanyam Kapoor; Shikai Qiu; Andrew Gordon Wilson; |
1067 | PAC-Bayesian Offline Contextual Bandits With Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a new principled approach for off-policy learning in contextual bandits. |
Otmane Sakhi; Pierre Alquier; Nicolas Chopin; |
1068 | Distribution-dependent McDiarmid-type Inequalities for Functions of Unbounded Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper gives unbounded analogues of the McDiarmid-type exponential inequalities for three popular classes of distributions, namely sub-Gaussian, sub-exponential and heavy-tailed distributions. |
Shaojie Li; Yong Liu; |
1069 | Special Properties of Gradient Descent with Large Learning Rates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formally prove that GD with large step size —on certain non-convex function classes — follows a different trajectory than GD with a small step size, which can lead to convergence to a global minimum instead of a local one. |
Amirkeivan Mohtashami; Martin Jaggi; Sebastian U Stich; |
1070 | The Power of Learned Locally Linear Models for Nonlinear Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A common pipeline in learning-based control is to iteratively estimate a model of system dynamics, and apply a trajectory optimization algorithm – e.g. $\mathtt{iLQR}$ – on the learned model to minimize a target cost. This paper conducts a rigorous analysis of a simplified variant of this strategy for general nonlinear systems. |
Daniel Pfrommer; Max Simchowitz; Tyler Westenbroek; Nikolai Matni; Stephen Tu; |
1071 | On The Privacy-Robustness-Utility Trilemma in Distributed Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove our matching upper bound by presenting a new distributed ML algorithm using a high-dimensional robust aggregation rule. |
Youssef Allouah; Rachid Guerraoui; Nirupam Gupta; Rafael Pinot; John Stephan; |
1072 | Statistical Learning Under Heterogeneous Distribution Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies the prediction of a target $\mathbf{z}$ from a pair of random variables $(\mathbf{x},\mathbf{y})$, where the ground-truth predictor is additive $\mathbb{E}[\mathbf{z} \mid \mathbf{x},\mathbf{y}] = f_\star(\mathbf{x}) +g_{\star}(\mathbf{y})$. |
Max Simchowitz; Anurag Ajay; Pulkit Agrawal; Akshay Krishnamurthy; |
1073 | Why Is Public Pretraining Necessary for Private Model Training? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the stark contrast in the gain of pretraining between non-private and private machine learning suggests that the gain in the latter is rooted in a fundamentally different cause. To explain this phenomenon, we hypothesize that the non-convex loss landscape of model training necessitates the optimization algorithm to go through two phases. |
Arun Ganesh; Mahdi Haghifam; Milad Nasr; Sewoong Oh; Thomas Steinke; Om Thakkar; Abhradeep Guha Thakurta; Lun Wang; |
1074 | Reinforcement Learning in Low-rank MDPs with Density Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we instead investigate sample-efficient learning with density features, i.e., the right matrix, which induce powerful models for state-occupancy distributions. |
Audrey Huang; Jinglin Chen; Nan Jiang; |
1075 | Intrinsic Sliced Wasserstein Distances for Comparing Collections of Probability Distributions on Manifolds and Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In practice these distributions can be defined over diverse domain types including finite intervals, circles, cylinders, spheres, other manifolds, and graphs. This paper introduces an approach for detecting differences between two collections of distributions over such general domains. |
Raif M. Rustamov; Subhabrata Majumdar; |
1076 | Random Teachers Are Good Teachers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation. |
Felix Sarnthein; Gregor Bachmann; Sotiris Anagnostidis; Thomas Hofmann; |
1077 | Concurrent Shuffle Differential Privacy Under Continual Observation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the concurrent shuffle model of differential privacy. |
Jay Tenenbaum; Haim Kaplan; Yishay Mansour; Uri Stemmer; |
1078 | Discrete Key-Value Bottleneck Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Given a new task, however, updating the weights of these encoders is challenging as a large number of weights needs to be fine-tuned, and as a result, they forget information about the previous tasks. In the present work, we propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes. |
Frederik Träuble; Anirudh Goyal; Nasim Rahaman; Michael Curtis Mozer; Kenji Kawaguchi; Yoshua Bengio; Bernhard Schölkopf; |
1079 | Fast Online Value-Maximizing Prediction Sets with Conformal Cost Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For instance, a hospital might expect a smart diagnosis system to capture as many severe, often co-morbid, diseases as possible (the value), while maintaining strict control over incorrect predictions (the cost). We present a general pipeline, dubbed FavMac, to maximize the value while controlling the cost in such scenarios. |
Zhen Lin; Shubhendu Trivedi; Cao Xiao; Jimeng Sun; |
1080 | Understanding The Complexity Gains of Single-Task RL with A Curriculum Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide a theoretical framework that reformulates a single-task RL problem as a multi-task RL problem defined by a curriculum. |
Qiyang Li; Yuexiang Zhai; Yi Ma; Sergey Levine; |
1081 | PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we can achieve the same level of performance with low value sample reuse and frequent feature distillation, as long as the policy regularization strength and data diversity are preserved. |
Kaixin Wang; Daquan Zhou; Jiashi Feng; Shie Mannor; |
1082 | Is Consensus Acceleration Possible in Decentralized Optimization Over Slowly Time-Varying Networks? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider decentralized optimization problems where one aims to minimize a sum of convex smooth objective functions distributed between nodes in the network. |
Dmitry Metelev; Alexander Rogozin; Dmitry Kovalev; Alexander Gasnikov; |
1083 | Sequential Predictive Conformal Inference for Time Series Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new distribution-free conformal prediction algorithm for sequential data (e.g., time series), called the *sequential predictive conformal inference* (SPCI). |
Chen Xu; Yao Xie; |
1084 | Neural Network Approximations of PDEs Beyond Linearity: A Representational Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take a step towards studying the representational power of neural networks for approximating solutions to nonlinear PDEs. |
Tanya Marwah; Zachary Chase Lipton; Jianfeng Lu; Andrej Risteski; |
1085 | Distortion and Uncertainty Aware Loss for Panoramic Depth Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the assumption is inapplicable to panoramic data due to its latitude-wise distortion and high uncertainty near textures and edges. To handle these challenges, we propose distortion and uncertainty aware loss (DUL) that consists of a distortion-aware loss and an uncertainty-aware loss. |
Zhiqiang Yan; Xiang Li; Kun Wang; Shuo Chen; Jun Li; Jian Yang; |
1086 | SinFusion: Training Diffusion Models on A Single Image or Video Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they are usually trained on very large datasets and are not naturally adapted to manipulate a given input image or video. In this paper we show how this can be resolved by training a diffusion model on a single input image or video. |
Yaniv Nikankin; Niv Haim; Michal Irani; |
1087 | Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: From this perspective, we propose a novel approach to unsupervised skill discovery based on information theory, called Value Uncertainty Variational Curriculum (VUVC). |
Seongun Kim; Kyowoon Lee; Jaesik Choi; |
1088 | VIMA: Robot Manipulation with Multimodal Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts, interleaving textual and visual tokens. Accordingly, we develop a new simulation benchmark that consists of thousands of procedurally-generated tabletop tasks with multimodal prompts, 600K+ expert trajectories for imitation learning, and a four-level evaluation protocol for systematic generalization. |
Yunfan Jiang; Agrim Gupta; Zichen Zhang; Guanzhi Wang; Yongqiang Dou; Yanjun Chen; Li Fei-Fei; Anima Anandkumar; Yuke Zhu; Linxi Fan; |
1089 | Generating Novel, Designable, and Diverse Protein Structures By Equivariantly Diffusing Oriented Residue Clouds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we leverage recent advances in denoising diffusion probabilistic models and equivariant neural networks to develop Genie, a generative model of protein structures that performs discrete-time diffusion using a cloud of oriented reference frames in 3D space. |
Yeqing Lin; Mohammed AlQuraishi; |
1090 | Revisiting Pseudo-Label for Single-Positive Multi-Label Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, the conditions of the effectiveness of learning from pseudo-label for SPMLL are shown and the learnability of pseudo-label-based methods is proven. |
Biao Liu; Ning Xu; Jiaqi Lv; Xin Geng; |
1091 | What Makes Entities Similar? A Similarity Flooding Perspective for Multi-sourced Knowledge Graph Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose two simple but effective methods inspired by the fixpoint computation in similarity flooding, and demonstrate their effectiveness on benchmark datasets. |
Zequn Sun; Jiacheng Huang; Xiaozhou Xu; Qijin Chen; Weijun Ren; Wei Hu; |
1092 | Differentially Private Optimization on Large Model at Small Cost Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop a novel Book-Keeping (BK) technique that implements existing DP optimizers (thus achieving the same accuracy), with a substantial improvement on the computational cost. |
Zhiqi Bu; Yu-Xiang Wang; Sheng Zha; George Karypis; |
1093 | Fully-Adaptive Composition in Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We construct filters that match the rates of advanced composition, including constants, despite allowing for adaptively chosen privacy parameters. |
Justin Whitehouse; Aaditya Ramdas; Ryan Rogers; Steven Wu; |
1094 | Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning Using Independent Component Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Cocktail Party Attack (CPA) that, contrary to prior belief, is able to recover the private inputs from gradients/weight updates aggregated over as many as 1024 samples. |
Sanjay Kariyappa; Chuan Guo; Kiwan Maeng; Wenjie Xiong; G. Edward Suh; Moinuddin K Qureshi; Hsien-Hsin S. Lee; |
1095 | The Numerical Stability of Hyperbolic Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we analyze the limitations of two popular models for the hyperbolic space, namely, the Poincaré ball and the Lorentz model. |
Gal Mishne; Zhengchao Wan; Yusu Wang; Sheng Yang; |
1096 | Revisiting Bellman Errors for Offline Model Selection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we elucidate why previous work has seen pessimistic results with Bellman errors and identify conditions under which OMS algorithms based on Bellman errors will perform well. |
Joshua P Zitovsky; Daniel de Marchi; Rishabh Agarwal; Michael Rene Kosorok; |
1097 | $\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a universal parameter-efficient transfer learning method, termed Predict-Interpolate Tuning ($\pi$-Tuning), for vision, language, and vision-language tasks. |
Chengyue Wu; Teng Wang; Yixiao Ge; Zeyu Lu; Ruisong Zhou; Ying Shan; Ping Luo; |
1098 | Multi-task Representation Learning for Pure Exploration in Linear Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study multi-task representation learning for best arm identification in linear bandit (RepBAI-LB) and best policy identification in contextual linear bandit (RepBPI-CLB), two popular pure exploration settings with wide applications, e.g., clinical trials and web content optimization. |
Yihan Du; Longbo Huang; Wen Sun; |
1099 | Data Representations’ Study of Latent Image Manifolds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Deep neural networks have been demonstrated to achieve phenomenal success in many domains, and yet their inner mechanisms are not well understood. In this paper, we investigate the curvature of image manifolds, i.e., the manifold deviation from being flat in its principal directions. |
Ilya Kaufman; Omri Azencot; |
1100 | Implicit Graph Neural Networks: A Monotone Operator Viewpoint Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a new well-posedness characterization for IGNNs leveraging monotone operator theory, resulting in a much more expressive parameterization than the existing one. |
Justin Baker; Qingsong Wang; Cory D Hauck; Bao Wang; |
1101 | Reasons for The Superiority of Stochastic Estimators Over Deterministic Ones: Robustness, Consistency and Perceptual Quality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we reveal additional fundamental advantages of stochastic methods over deterministic ones, which further motivate their use. |
Guy Ohayon; Theo Joseph Adrai; Michael Elad; Tomer Michaeli; |
1102 | Regularizing Towards Soft Equivariance Under Mixed Symmetries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead of proposing a new architectural restriction as in most of the previous approaches, we present a regularizer-based method for building a model for a dataset with mixed approximate symmetries. |
Hyunsu Kim; Hyungi Lee; Hongseok Yang; Juho Lee; |
1103 | Spurious Valleys and Clustering Behavior of Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove two main results concerning the geometry of the loss landscape of a neural network. |
Samuele Pollaci; |
1104 | RSC: Accelerate Graph Neural Networks Training Via Randomized Sparse Computations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issues, our key idea is to control the accuracy-efficiency trade-off by optimizing computation resource allocation layer-wisely and epoch-wisely. |
Zirui Liu; Shengyuan Chen; Kaixiong Zhou; Daochen Zha; Xiao Huang; Xia Hu; |
1105 | Image Restoration with Mean-Reverting Stochastic Differential Equations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a stochastic differential equation (SDE) approach for general-purpose image restoration. |
Ziwei Luo; Fredrik K. Gustafsson; Zheng Zhao; Jens Sjölund; Thomas B. Schön; |
1106 | Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present the OMG-CMDP! algorithm for regret minimization in adversarial Contextual MDPs. The algorithm operates under the minimal assumptions of realizable function class and … |
Orin Levy; Alon Cohen; Asaf Cassel; Yishay Mansour; |
1107 | Sequential Kernelized Independence Testing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Classical batch tests are not tailored for streaming data: valid inference after data peeking requires correcting for multiple testing which results in low power. Following the principle of testing by betting, we design sequential kernelized independence tests that overcome such shortcomings. |
Aleksandr Podkopaev; Patrick Blöbaum; Shiva Kasiviswanathan; Aaditya Ramdas; |
1108 | MultiRobustBench: Benchmarking Robustness Against Multiple Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first unified framework for considering multiple attacks against ML models. |
Sihui Dai; Saeed Mahloujifar; Chong Xiang; Vikash Sehwag; Pin-Yu Chen; Prateek Mittal; |
1109 | ILLUME: Rationalizing Vision-Language Models Through Human Interactions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, outputs of these models rarely align with users’ rationales for specific answers. In order to improve this alignment and reinforce commonsense reasons, we propose a tuning paradigm based on human interactions with machine-generated data. |
Manuel Brack; Patrick Schramowski; Björn Deiseroth; Kristian Kersting; |
1110 | Parameter-Level Soft-Masking for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this paper is threefold: (1) overcoming CF, (2) encouraging KT, and (3) tackling the capacity problem. |
Tatsuya Konishi; Mori Kurokawa; Chihiro Ono; Zixuan Ke; Gyuhak Kim; Bing Liu; |
1111 | Are Diffusion Models Vulnerable to Membership Inference Attacks? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the vulnerability of diffusion models to Membership Inference Attacks (MIAs), a common privacy concern. |
Jinhao Duan; Fei Kong; Shiqi Wang; Xiaoshuang Shi; Kaidi Xu; |
1112 | Parallel Online Clustering of Bandits Via Hedonic Game Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose CLUB-HG, a novel algorithm that integrates a game-theoretic approach into clustering inference. |
Xiaotong Cheng; Cheng Pan; Setareh Maghsudi; |
1113 | The Computational Complexity of Concise Hypersphere Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we perform the first complexity-theoretic study of the hypersphere classification problem for binary data. |
Eduard Eiben; Robert Ganian; Iyad A. Kanj; Sebastian Ordyniak; Stefan Szeider; |
1114 | Demystifying Disagreement-on-the-Line in High Dimensions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop a theoretical foundation for analyzing disagreement in high-dimensional random features regression; and study under what conditions the disagreement-on-the-line phenomenon occurs in our setting. |
Donghwan Lee; Behrad Moniri; Xinmeng Huang; Edgar Dobriban; Hamed Hassani; |
1115 | Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a margin-based learning framework that exploits freely available unlabeled data in the wild that captures the environmental test-time OOD distributions under both covariate and semantic shifts. |
Haoyue Bai; Gregory Canal; Xuefeng Du; Jeongyeol Kwon; Robert D Nowak; Yixuan Li; |
1116 | Robust Weak Supervision with Variational Auto-Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, LFs need to be carefully designed, often requiring expert domain knowledge and extensive validation for existing WS methods to be effective. To tackle this, we propose the Weak Supervision Variational Auto-Encoder (WS-VAE), a novel framework that combines unsupervised representation learning and weak labelling to reduce the dependence of WS on expert and manual engineering of LFs. |
Francesco Tonolini; Nikolaos Aletras; Yunlong Jiao; Gabriella Kazai; |
1117 | Thompson Sampling with Diffusion Generative Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems. |
Yu-Guan Hsieh; Shiva Kasiviswanathan; Branislav Kveton; Patrick Blöbaum; |
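For reference, the classical conjugate-prior setup that learned diffusion priors aim to replace can be sketched as a Beta-Bernoulli Thompson sampler. This is a textbook baseline under assumed arm means, not the paper's algorithm:

```python
import numpy as np

def thompson_bernoulli(true_means, horizon, seed=0):
    """Classical Beta-Bernoulli Thompson sampling.

    The paper swaps simple conjugate priors like this Beta(1, 1) prior
    for a learned diffusion prior; this is only the textbook baseline.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    wins = np.ones(k)     # Beta(1, 1) prior pseudo-counts
    losses = np.ones(k)
    rewards = 0
    for _ in range(horizon):
        theta = rng.beta(wins, losses)       # sample one mean per arm from the posterior
        arm = int(np.argmax(theta))          # play the arm with the highest sample
        r = rng.random() < true_means[arm]   # Bernoulli reward
        wins[arm] += r
        losses[arm] += 1 - r
        rewards += r
    return rewards

# With a clearly better second arm, the sampler concentrates on it quickly.
total = thompson_bernoulli([0.2, 0.8], horizon=2000)
```

The posterior update is just incrementing pseudo-counts, which is exactly the conjugacy that breaks down for richer priors and motivates learned alternatives.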
1118 | Distilling Internet-Scale Vision-Language Models Into Embodied Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Learning to ground language is challenging, typically requiring domain-specific engineering or large quantities of human interaction data. To address this challenge, we propose using pretrained vision-language models (VLMs) to supervise embodied agents. |
Theodore Sumers; Kenneth Marino; Arun Ahuja; Rob Fergus; Ishita Dasgupta; |
1119 | Explainable Data-Driven Optimization: From Context to Decision and Back Again Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce two classes of explanations and develop methods to find nearest explanations of random forest and nearest-neighbor predictors. |
Alexandre Forel; Axel Parmentier; Thibaut Vidal; |
1120 | Why Random Pruning Is All We Need to Start Sparse Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We demonstrate the feasibility of this approach in experiments for different pruning methods and propose particularly effective choices of initial layer-wise sparsity ratios of the random source network. |
Advait Harshal Gadhikar; Sohom Mukherjee; Rebekka Burkholz; |
1121 | ContraBAR: Contrastive Bayes-Adaptive Deep RL Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We begin by proving that representations learned by CPC are indeed sufficient for Bayes optimality. Based on this observation, we propose a simple meta RL algorithm that uses CPC in lieu of variational belief inference. |
Era Choshen; Aviv Tamar; |
1122 | FREDIS: A Fusion Framework of Refinement and Disambiguation for Unreliable Partial Label Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a fusion framework of refinement and disambiguation named FREDIS to handle the UPLL problem. |
Congyu Qiao; Ning Xu; Jiaqi Lv; Yi Ren; Xin Geng; |
1123 | Learning Useful Representations for Shifting Tasks and Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Does the dominant approach to learn representations (as a side effect of optimizing an expected cost for a single training distribution) remain a good approach when we are dealing with multiple distributions? Our thesis is that *such scenarios are better served by representations that are richer than those obtained with a single optimization episode*. |
Jianyu Zhang; Leon Bottou; |
1124 | Computational Doob H-transforms for Online Filtering of Discretely Observed Diffusions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our approach is based on the fully adapted auxiliary particle filter, which involves Doob’s $h$-transforms that are typically intractable. We propose a computational framework to approximate these $h$-transforms by solving the underlying backward Kolmogorov equations using nonlinear Feynman-Kac formulas and neural networks. |
Nicolas Chopin; Andras Fulop; Jeremy Heng; Alexandre H. Thiery; |
1125 | XTab: Cross-table Pretraining for Tabular Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce XTab, a framework for cross-table pretraining of tabular transformers on datasets from various domains. |
Bingzhao Zhu; Xingjian Shi; Nick Erickson; Mu Li; George Karypis; Mahsa Shoaran; |
1126 | On The Robustness of Text Vectorizers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work formally proves that popular embedding schemes, such as concatenation, TF-IDF, and Paragraph Vector (a.k.a. doc2vec), exhibit robustness in the Hölder or Lipschitz sense with respect to the Hamming distance. |
Rémi Catellier; Samuel Vaiter; Damien Garreau; |
1127 | Beyond Reward: Offline Preference-guided Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this requires the separate learning of a scalar reward function, which is assumed to be an information bottleneck of the learning process. To address this issue, we propose the offline preference-guided policy optimization (OPPO) paradigm, which models offline trajectories and preferences in a one-step process, eliminating the need for separately learning a reward function. |
Yachen Kang; Diyuan Shi; Jinxin Liu; Li He; Donglin Wang; |
1128 | DIFF2: Differential Private Optimization Via Gradient Differences for Nonconvex Distributed Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve the best known utility bound, we propose a new differential private optimization framework called DIFF2 (DIFFerential private optimization via gradient DIFFerences) that constructs a differential private global gradient estimator with possibly quite small variance based on communicated gradient differences rather than gradients themselves. |
Tomoya Murata; Taiji Suzuki; |
1129 | On Many-Actions Policy Gradient Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Model-Based Many-Actions (MBMA), an approach leveraging dynamics models for many-actions sampling in the context of SPG. |
Michal Nauman; Marek Cygan; |
1130 | Online Nonstochastic Control with Adversarial and Static Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose online nonstochastic control algorithms that achieve both sublinear regret and sublinear adversarial constraint violation while keeping static constraint violation minimal against the optimal constrained linear control policy in hindsight. |
Xin Liu; Zixian Yang; Lei Ying; |
1131 | Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. |
Kaiwen Wang; Nathan Kallus; Wen Sun; |
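The CVaR objective with risk tolerance $\tau$ that this paper studies has a simple empirical counterpart: the average of the worst $\tau$-fraction of outcomes. The estimator below is the standard textbook definition, not the paper's algorithm:

```python
import numpy as np

def empirical_cvar(returns, tau):
    """Empirical CVaR_tau: the average of the worst tau-fraction of outcomes.

    Convention: lower values of `returns` are worse, so we average the
    ceil(tau * n) smallest samples. Textbook estimator, not the paper's method.
    """
    x = np.sort(np.asarray(returns, dtype=float))  # ascending: worst outcomes first
    k = max(1, int(np.ceil(tau * len(x))))         # size of the tau-tail
    return x[:k].mean()
```

As $\tau \to 1$ this recovers the ordinary mean, while small $\tau$ focuses the objective entirely on the lower tail of the return distribution.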
1132 | The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on local data alone. |
Jiin Woo; Gauri Joshi; Yuejie Chi; |
1133 | Multi-task Hierarchical Adversarial Inverse Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop Multi-task Hierarchical Adversarial Inverse Reinforcement Learning (MH-AIRL) to learn hierarchically-structured multi-task policies, which is better suited to compositional tasks with long horizons and achieves higher expert-data efficiency by identifying and transferring reusable basic skills across tasks. |
Jiayu Chen; Dipesh Tamboli; Tian Lan; Vaneet Aggarwal; |
1134 | On The Generalization of Multi-modal Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through this unified perspective, we characterize the advantage of MMCL by showing that text pairs induce more semantically consistent and diverse positive pairs, which, according to our analysis, provably benefit downstream generalization. Inspired by this finding, we propose several methods to significantly improve the downstream performance of SSCL on ImageNet by leveraging multi-modal information. |
Qi Zhang; Yifei Wang; Yisen Wang; |
1135 | Alternating Local Enumeration (TnALE): Solving Tensor Network Structure Search with Fewer Evaluations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose TnALE, a surprisingly simple algorithm that updates each structure-related variable alternately by local enumeration, greatly reducing the number of evaluations compared to TNLS. |
Chao Li; Junhua Zeng; Chunmei Li; Cesar F Caiafa; Qibin Zhao; |
1136 | Shapley Based Residual Decomposition for Instance Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the idea of decomposing the residuals of regression with respect to the data instances instead of features. |
Tommy Liu; Amanda Susan Barnard; |
1137 | Less Is More: Task-aware Layer-wise Distillation for Language Model Compression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, the hidden representations of the teacher contain redundant information that the student does not necessarily need for the target task’s learning. To address these challenges, we propose a novel Task-aware layEr-wise Distillation (TED). |
Chen Liang; Simiao Zuo; Qingru Zhang; Pengcheng He; Weizhu Chen; Tuo Zhao; |
1138 | Metagenomic Binning Using Connectivity-constrained Variational Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the use of a Variational AutoEncoder (VAE) tailored to leverage auxiliary structural information about contig relations when learning contig representations for subsequent metagenomic binning. |
Andre Lamurias; Alessandro Tibo; Katja Hose; Mads Albertsen; Thomas Dyhre Nielsen; |
1139 | Relevant Walk Search for Explaining Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose polynomial-time algorithms for finding top-$K$ relevant walks, which drastically reduces the computation and thus increases the applicability of GNN-LRP to large-scale problems. |
Ping Xiong; Thomas Schnake; Michael Gastegger; Grégoire Montavon; Klaus-Robert Müller; Shinichi Nakajima; |
1140 | Modality-Agnostic Variational Compression of Implicit Neural Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR). |
Jonathan Richard Schwarz; Jihoon Tack; Yee Whye Teh; Jaeho Lee; Jinwoo Shin; |
1141 | VectorMapNet: End-to-end Vectorized HD Map Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these predictions do not include instance information of individual map elements and require heuristic post-processing to obtain vectorized maps. To tackle these challenges, we introduce an end-to-end vectorized HD map learning pipeline, termed VectorMapNet. |
Yicheng Liu; Tianyuan Yuan; Yue Wang; Yilun Wang; Hang Zhao; |
1142 | Hierarchical Imitation Learning with Vector Quantized Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, learning the models for both low and high-level planning from demonstrations has proven challenging, especially with higher-dimensional inputs. To address this issue, we propose to use reinforcement learning to identify subgoals in expert trajectories by associating the magnitude of the rewards with the predictability of low-level actions given the state and the chosen subgoal. |
Kalle Kujanpää; Joni Pajarinen; Alexander Ilin; |
1143 | Bandit Multi-linear DR-Submodular Maximization and Its Applications on Adversarial Submodular Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the online bandit learning of the monotone multi-linear DR-submodular functions, designing the algorithm $\mathtt{BanditMLSM}$ that attains a $(1-1/e)$-regret of $O(T^{2/3}\log T)$. |
Zongqi Wan; Jialin Zhang; Wei Chen; Xiaoming Sun; Zhijie Zhang; |
1144 | Temporal Label Smoothing for Early Event Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Following an analysis of objectives from both fields, we propose Temporal Label Smoothing (TLS), a simpler, yet best-performing method that preserves prediction monotonicity over time. |
Hugo Yèche; Alizée Pace; Gunnar Rätsch; Rita Kuznetsova; |
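As background, classical (static) label smoothing, which TLS generalizes along the time axis, replaces a one-hot target by a convex combination with the uniform distribution. A minimal sketch of the standard version, not the paper's TLS:

```python
import numpy as np

def smooth_labels(onehot, eps=0.1):
    """Classical label smoothing: (1 - eps) * y + eps / K.

    TLS varies the amount of smoothing with time-to-event; this function
    shows only the standard static version for reference.
    """
    onehot = np.asarray(onehot, dtype=float)
    k = onehot.shape[-1]                     # number of classes K
    return (1.0 - eps) * onehot + eps / k
```

For a binary one-hot target `[1, 0]` with `eps=0.1`, the smoothed target becomes `[0.95, 0.05]`; the smoothed targets still sum to one, so the cross-entropy loss applies unchanged.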
1145 | Reconstructive Neuron Pruning for Backdoor Defense Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel defense called *Reconstructive Neuron Pruning* (RNP) to expose and prune backdoor neurons via an unlearning and then recovering process. |
Yige Li; Xixiang Lyu; Xingjun Ma; Nodens Koren; Lingjuan Lyu; Bo Li; Yu-Gang Jiang; |
1146 | Minimalistic Predictions to Schedule Jobs with Online Precedence Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present lower bounds and algorithmic upper bounds for different precedence topologies, and thereby give a structured overview on which and how additional (possibly erroneous) information helps for designing better algorithms. |
Alexandra Lassota; Alexander Lindermayr; Nicole Megow; Jens Schlöter; |
1147 | Estimating The Contamination Factor’s Distribution in Unsupervised Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, there are no good methods for estimating the contamination factor itself. We address this need from a Bayesian perspective, introducing a method for estimating the posterior distribution of the contamination factor for a given unlabeled dataset. |
Lorenzo Perini; Paul-Christian Bürkner; Arto Klami; |
1148 | Implicit Neural Spatial Representations for Time-dependent PDEs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work explores solving time-dependent PDEs with INSR. |
Honglin Chen; Rundi Wu; Eitan Grinspun; Changxi Zheng; Peter Yichen Chen; |
1149 | Cell-Free Latent Go-Explore Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Latent Go-Explore (LGE), a simple and general approach based on the Go-Explore paradigm for exploration in reinforcement learning (RL). |
Quentin Gallouédec; Emmanuel Dellandrea; |
1150 | Unlocking Slot Attention By Changing Optimal Transport Costs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To overcome this limitation, we first establish a connection between slot attention and optimal transport. Based on this new perspective we propose **MESH** (Minimize Entropy of Sinkhorn): a cross-attention module that combines the tiebreaking properties of unregularized optimal transport with the speed of regularized optimal transport. |
Yan Zhang; David W Zhang; Simon Lacoste-Julien; Gertjan J. Burghouts; Cees G. M. Snoek; |
1151 | Generalized Polyak Step Size for First Order Optimization with Momentum Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a general framework to set the learning rate adaptively for first-order optimization methods with momentum, motivated by the derivation of Polyak step size. |
Xiaoyu Wang; Mikael Johansson; Tong Zhang; |
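As background, the classical Polyak step size that motivates this framework sets $\eta_t = (f(x_t) - f^\star)/\|g_t\|^2$, which requires knowing the optimal value $f^\star$. A minimal sketch of that classical rule, not the paper's momentum framework:

```python
import numpy as np

def polyak_gd(f, grad, x0, f_star, steps=100):
    """Gradient descent with the classical Polyak step size.

    eta_t = (f(x_t) - f*) / ||g_t||^2, assuming the optimal value f* is known.
    Textbook rule only; the paper generalizes it to momentum methods.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        gnorm2 = float(np.dot(g, g))
        if gnorm2 == 0.0:                  # stationary point reached
            break
        eta = (f(x) - f_star) / gnorm2     # Polyak step size
        x = x - eta * g
    return x

# Minimize f(x) = ||x||^2, whose optimal value f* = 0 is known.
f = lambda x: float(np.dot(x, x))
grad = lambda x: 2.0 * x
x = polyak_gd(f, grad, x0=[3.0, -4.0], f_star=0.0)
```

On this quadratic the rule gives a constant step $\eta_t = 1/4$, so each iterate is exactly halved and the method converges geometrically.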
1152 | Towards Understanding Generalization of Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Even though GNNs have achieved remarkable success in real-world applications, the theoretical understanding of their working mechanism is still at an early stage. In this paper, we move towards this goal from the perspective of generalization. |
Huayi Tang; Yong Liu; |
1153 | Projected Tensor Power Method for Hypergraph Community Recovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the non-convex and discrete nature of the maximum likelihood estimation problem, we develop a simple yet efficient iterative method, called the *projected tensor power method*, to tackle it. |
Jinxin Wang; Yuen-Man Pun; Xiaolu Wang; Peng Wang; Anthony Man-Cho So; |
1154 | Optimal Rates and Efficient Algorithms for Online Bayesian Persuasion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We focus on the online Bayesian persuasion framework, in which the sender repeatedly faces one or more receivers with unknown and adversarially selected types. |
Martino Bernasconi; Matteo Castiglioni; Andrea Celli; Alberto Marchesi; Francesco Trovò; Nicola Gatti; |
1155 | Trustworthy Policy Learning Under The Counterfactual No-Harm Criterion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first formalize the counterfactual no-harm criterion for policy learning from a principal stratification perspective. Next, we propose a novel upper bound for the fraction negatively affected by the policy and show the consistency and asymptotic normality of the estimator. |
Haoxuan Li; Chunyuan Zheng; Yixiao Cao; Zhi Geng; Yue Liu; Peng Wu; |
1156 | TabDDPM: Modelling Tabular Data with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate if the framework of diffusion models can be advantageous for general tabular problems, where data points are typically represented by vectors of heterogeneous features. |
Akim Kotelnikov; Dmitry Baranchuk; Ivan Rubachev; Artem Babenko; |
1157 | Graph Neural Tangent Kernel: Convergence on Large Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the training dynamics of large-graph GNNs using graph neural tangent kernels (GNTKs) and graphons. |
Sanjukta Krishnagopal; Luana Ruiz; |
1158 | Fascinating Supervisory Signals and Where to Find Them: Deep Anomaly Detection with Scale Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Different from current reconstruction-guided generative models and transformation-based contrastive models, we devise novel data-driven supervision for tabular data by introducing a characteristic — scale — as data labels. |
Hongzuo Xu; Yijie Wang; Juhui Wei; Songlei Jian; Yizhou Li; Ning Liu; |
1159 | Aligning Language Models with Preferences Through $f$-divergence Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new approach, $f$-DPG, which allows the use of any $f$-divergence to approximate any target distribution that can be evaluated. |
Dongyoung Go; Tomasz Korbak; Germán Kruszewski; Jos Rozen; Nahyeon Ryu; Marc Dymetman; |
1160 | Speed-Oblivious Online Scheduling: Knowing (Precise) Speeds Is Not Necessary Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider online scheduling on unrelated (heterogeneous) machines in a speed-oblivious setting, where an algorithm is unaware of the exact job-dependent processing speeds. We show strong impossibility results for clairvoyant and non-clairvoyant algorithms and overcome them in models inspired by practical settings: (i) we provide competitive learning-augmented algorithms, assuming that (possibly erroneous) predictions on the speeds are given, and (ii) we provide competitive algorithms for the speed-ordered model, where a single global order of machines according to their unknown job-dependent speeds is known. |
Alexander Lindermayr; Nicole Megow; Martin Rapp; |
1161 | Differentiable Simulations for Enhanced Sampling of Rare Events Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose using differentiable simulations (DiffSim) for the discovery and enhanced sampling of chemical transformations without a need to resort to preselected CVs, using only a distance metric. |
Martin Sipka; Johannes Carl Bertold Dietschreit; Lukáš Grajciar; Rafael Gomez-Bombarelli; |
1162 | Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we innovatively combine the backward-looking and forward-looking aspects of the optimizer algorithm and propose a novel Admeta (**A** **D**ouble exponential **M**oving averag**E** **T**o **A**daptive and non-adaptive momentum) optimizer framework. |
Yineng Chen; Zuchao Li; Lefei Zhang; Bo Du; Hai Zhao; |
1163 | Vector-Valued Control Variates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose vector-valued control variates, an extension of control variates which can be used to reduce the variance of multiple Monte Carlo estimators jointly. |
Zhuo Sun; Alessandro Barp; Francois-Xavier Briol; |
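As background, the classical scalar control variate that this work extends to the vector-valued case subtracts a correlated quantity with known mean from a Monte Carlo estimator. A textbook sketch with the variance-optimal coefficient, not the paper's method:

```python
import numpy as np

def cv_estimate(f, h, h_mean, samples):
    """Classical scalar control variate estimator of E[f(X)].

    Returns mean(f) - beta * (mean(h) - E[h]) with the variance-optimal
    beta = Cov(f, h) / Var(h). The paper generalizes this to vector-valued
    controls shared jointly across multiple estimators.
    """
    fx = np.asarray([f(x) for x in samples], dtype=float)
    hx = np.asarray([h(x) for x in samples], dtype=float)
    beta = np.cov(fx, hx)[0, 1] / np.var(hx, ddof=1)   # optimal coefficient
    return fx.mean() - beta * (hx.mean() - h_mean)
```

For example, with $X \sim \mathcal{N}(0,1)$, estimating $\mathbb{E}[X^2 + X] = 1$ with the control $h(X) = X$ (known mean $0$) removes the variance contributed by the linear term.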
1164 | Causal Structure Learning for Latent Intervened Non-stationary Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, for non-stationary time series data, domain indexes are often unavailable, making it difficult to distinguish observational samples from interventional samples. To address these issues, we propose a novel Latent Intervened Non-stationary learning (LIN) method to make the domain indexes recovery process and the causal structure learning process mutually promote each other. |
Chenxi Liu; Kun Kuang; |
1165 | Neural Inverse Operators for Solving PDE Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing operator learning frameworks map functions to functions and need to be modified to learn inverse maps from data. We propose a novel architecture termed Neural Inverse Operators (NIOs) to solve these PDE inverse problems. |
Roberto Molinaro; Yunan Yang; Björn Engquist; Siddhartha Mishra; |
1166 | A Distribution Optimization Framework for Confidence Bounds of Risk Measures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a distribution optimization framework that significantly improves confidence bounds for various risk measures compared to previous methods. |
Hao Liang; Zhi-Quan Luo; |
1167 | On The Complexity of Bayesian Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Established computational modes (*i.e.*, rule-based or similarity-based) are primarily studied in isolation, focusing on confined and abstract problem spaces. In this work, we study these two modes when the *problem space* scales up and when the *complexity* of concepts becomes diverse. |
Yu-Zhe Shi; Manjie Xu; John E. Hopcroft; Kun He; Joshua B. Tenenbaum; Song-Chun Zhu; Ying Nian Wu; Wenjuan Han; Yixin Zhu; |
1168 | Vertical Federated Graph Neural Network for Recommender System Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our study proposes the first vertical federated GNN-based recommender system, called VerFedGNN. |
Peihua Mai; Yan Pang; |
1169 | Feature Directions Matter: Long-Tailed Learning Via Rotated Balanced Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Holding a different view, in this paper, we show that features with fixed directions may be harmful to the generalization of models, even if they are completely symmetric. To avoid this issue, we propose the Representation-Balanced Learning Framework (RBL), which introduces orthogonal matrices to learn directions while maintaining the geometric structure of ETF. |
Gao Peifeng; Qianqian Xu; Peisong Wen; Zhiyong Yang; Huiyang Shao; Qingming Huang; |
1170 | SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite being well studied, existing analyses of this method suffer from various shortcomings: they either assume some knowledge of the problem parameters, impose strong global Lipschitz conditions, or fail to give bounds that hold with high probability. We provide a comprehensive analysis of this basic method without any of these limitations, in both the convex and non-convex (smooth) cases, that additionally supports a general “affine variance” noise model and provides sharp rates of convergence in both the low-noise and high-noise regimes. |
Amit Attia; Tomer Koren; |
1171 | Learning Hidden Markov Models When The Locations of Missing Observations Are Unknown Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we consider the general problem of learning an HMM from data with unknown missing observation locations. |
Binyamin Perets; Mark Kozdoba; Shie Mannor; |
1172 | Unscented Autoencoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The Variational Autoencoder (VAE) is a seminal approach in deep generative modeling with latent variables. Interpreting its reconstruction process as a nonlinear transformation of samples from the latent posterior distribution, we apply the Unscented Transform (UT) — a well-known distribution approximation used in the Unscented Kalman Filter (UKF) from the field of filtering. |
Faris Janjos; Lars Rosenbaum; Maxim Dolgov; J. Marius Zoellner; |
1173 | Finding Generalization Measures By Contrasting Signal and Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new generalization measure REF Complexity (RElative Fitting degree between signal and noise), motivated by the intuition that a given model-algorithm pair may generalize well if it fits signal (e.g., true labels) fast while fitting noise (e.g., random labels) slowly. |
Jiaye Teng; Bohang Zhang; Ruichen Li; Haowei He; Yequan Wang; Yan Tian; Yang Yuan; |
1174 | Regression with Sensor Data Containing Incomplete Observations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, because an incomplete observation does not provide any tags indicating incompleteness, we cannot eliminate or impute them. To address this issue, we propose a learning algorithm that explicitly models incomplete observations corrupted with an asymmetric noise that always has a negative value. |
Takayuki Katsuki; Takayuki Osogami; |
1175 | Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a data-efficient property predictor by utilizing a learnable hierarchical molecular grammar that can generate molecules from grammar production rules. |
Minghao Guo; Veronika Thost; Samuel W Song; Adithya Balachandran; Payel Das; Jie Chen; Wojciech Matusik; |
1176 | Detecting Out-of-distribution Data Through In-distribution Class Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, some representative methods share an unproven assumption that the probability that OOD data belong to every ID class should be the same, i.e., these OOD-to-ID probabilities actually form a uniform distribution. In this paper, we show that this assumption makes the above methods ineffective when the ID model is trained with class-imbalanced data. Fortunately, by analyzing the causal relations between ID/OOD classes and features, we identify several common scenarios where the OOD-to-ID probabilities should be the ID-class-prior distribution and propose two strategies to modify existing inference-time detection methods: 1) replace the uniform distribution with the ID-class-prior distribution if they explicitly use the uniform distribution; 2) otherwise, reweight their scores according to the similarity between the ID-class-prior distribution and the softmax outputs of the pre-trained model. |
Xue Jiang; Feng Liu; Zhen Fang; Hong Chen; Tongliang Liu; Feng Zheng; Bo Han; |
1177 | Learning Unnormalized Statistical Models Via Compositional Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models from the perspective of compositional optimization. |
Wei Jiang; Jiayu Qin; Lingyu Wu; Changyou Chen; Tianbao Yang; Lijun Zhang; |
1178 | DoG Is SGD’s Best Friend: A Parameter-Free Dynamic Step Size Schedule Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a tuning-free dynamic SGD step size formula, which we call Distance over Gradients (DoG). |
Maor Ivgi; Oliver Hinder; Yair Carmon; |
1179 | CrossSplit: Mitigating Label Noise Memorization Through Data Splitting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We approach the problem of improving robustness of deep learning algorithms in the presence of label noise. |
Jihye Kim; Aristide Baratin; Yan Zhang; Simon Lacoste-Julien; |
1180 | An Adaptive Entropy-Regularization Framework for Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an adaptive entropy-regularization framework (ADER) for multi-agent reinforcement learning (RL) to learn the adequate amount of exploration of each agent for entropy-based exploration. |
Woojun Kim; Youngchul Sung; |
1181 | Generalizing Neural Wave Functions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these networks can only solve different spatial arrangements of the same set of atoms. To overcome this limitation, we present Graph-learned orbital embeddings (Globe), a neural network-based reparametrization method that can adapt neural wave functions to different molecules. |
Nicholas Gao; Stephan Günnemann; |
1182 | Deep Laplacian-based Options for Temporally-Extended Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These assumptions are fundamentally not scalable. In this paper we address these limitations and show how recent results for directly approximating the eigenfunctions of the Laplacian can be leveraged to truly scale up options-based exploration. |
Martin Klissarov; Marlos C. Machado; |
1183 | One-vs-the-Rest Loss to Focus on Important Samples in Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a new loss function for adversarial training. |
Sekitoshi Kanai; Shin’ya Yamaguchi; Masanori Yamada; Hiroshi Takahashi; Kentaro Ohno; Yasutoshi Ida; |
1184 | Multi-Modal Classifiers for Open-Vocabulary Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this paper is open-vocabulary object detection (OVOD) — building a model that can detect objects beyond the set of categories seen at training, thus enabling the user to specify categories of interest at inference without the need for model retraining. |
Prannay Kaul; Weidi Xie; Andrew Zisserman; |
1185 | Multi-Task Structural Learning Using Local Task Similarity Induced Neuron Creation and Removal Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the contrary, learning in the brain occurs through structural changes that are in tandem with changes in synaptic strength. Thus, we propose *Multi-Task Structural Learning (MTSL)* that simultaneously learns the multi-task architecture and its parameters. |
Naresh Kumar Gurulingan; Bahram Zonooz; Elahe Arani; |
1186 | A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Meanwhile, existing molecule multi-modal pretraining approaches approximate MI based on the representation space encoded from the topology and geometry, thus resulting in the loss of critical structural information of molecules. To address this issue, we propose MoleculeSDE. |
Shengchao Liu; Weitao Du; Zhi-Ming Ma; Hongyu Guo; Jian Tang; |
1187 | Reachability-Aware Laplacian Representation in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such a mismatch would impede the learning process in reward shaping. To fix this issue, we introduce a Reachability-Aware Laplacian Representation (RA-LapRep), by properly scaling each dimension of LapRep. |
Kaixin Wang; Kuangqi Zhou; Jiashi Feng; Bryan Hooi; Xinchao Wang; |
1188 | Adversarial Collaborative Learning on Non-IID Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Different from typical FL approaches, the paper proposes a new learning concept called ADCOL (Adversarial Collaborative Learning) for non-IID features. |
Qinbin Li; Bingsheng He; Dawn Song; |
1189 | Provably Invariant Learning Without Domain Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present TIVA for environment-independent invariance learning, which requires no environment-specific information in training data. |
Xiaoyu Tan; LIN Yong; Shengyu Zhu; Chao Qu; Xihe Qiu; Xu Yinghui; Peng Cui; Yuan Qi; |
1190 | GRAFENNE: Learning on Graphs with Heterogeneous and Dynamic Feature Sets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these techniques (i) assume uniformity of feature set across nodes, (ii) are transductive by nature, and (iii) fail to work when features are added or removed over time. In this work, we address these limitations through a novel GNN framework called GRAFENNE. |
Shubham Gupta; Sahil Manchanda; Sayan Ranu; Srikanta J. Bedathur; |
1191 | DevFormer: A Symmetric Transformer for Context-Aware Device Placement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present DevFormer, a novel transformer-based architecture for addressing the complex and computationally demanding problem of hardware design optimization. |
Haeyeon Kim; Minsu Kim; Federico Berto; Joungho Kim; Jinkyoo Park; |
1192 | Continual Learning in Linear Classification on Separable Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We analyze continual learning on a sequence of separable linear classification tasks with binary labels. |
Itay Evron; Edward Moroshko; Gon Buzaglo; Maroun Khriesh; Badea Marjieh; Nathan Srebro; Daniel Soudry; |
1193 | Total Variation Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a GNN model that computes cluster assignments by optimizing a tighter relaxation of the minimum cut based on graph total variation (GTV). |
Jonas Berg Hansen; Filippo Maria Bianchi; |
1194 | Learning Control By Iterative Inversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose *iterative inversion* – an algorithm for learning an inverse function without input-output pairs, but only with samples from the desired output distribution and access to the forward function. |
Gal Leibovich; Guy Jacob; Or Avner; Gal Novik; Aviv Tamar; |
1195 | Expected Gradients of Maxout Networks and Consequences to Parameter Initialization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the gradients of a maxout network with respect to inputs and parameters and obtain bounds for the moments depending on the architecture and the parameter distribution. |
Hanna Tseran; Guido Montufar; |
1196 | Topological Singularity Detection at Multiple Scales Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the ‘manifoldness’ of a point along multiple scales. |
Julius Von Rohrscheidt; Bastian Rieck; |
1197 | Finding The Missing-half: Graph Complementary Learning for Homophily-prone and Heterophily-prone Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In our paper, we introduce Graph cOmplementAry Learning, namely GOAL, which consists of two components: graph complementation and complemented graph convolution. |
Yizhen Zheng; He Zhang; Vincent Lee; Yu Zheng; Xiao Wang; Shirui Pan; |
1198 | Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The orthogonal weight modification to overcome catastrophic forgetting does not consider the similarity of genuine audio across different datasets. To overcome this limitation, we propose a continual learning algorithm for fake audio detection to overcome catastrophic forgetting, called Regularized Adaptive Weight Modification (RAWM). |
XiaoHui Zhang; Jiangyan Yi; Jianhua Tao; Chenglong Wang; Chu Yuan Zhang; |
1199 | Homomorphism AutoEncoder — Learning Group Structured Representations from Observed Transitions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose methods enabling an agent acting upon the world to learn internal representations of sensory information that are consistent with actions that modify it. |
Hamza Keurti; Hsiao-Ru Pan; Michel Besserve; Benjamin F Grewe; Bernhard Schölkopf; |
1200 | Semi-Dual Unbalanced Quadratic Optimal Transport: Fast Statistical Rates and Convergent Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we derive a semi-dual formulation for the problem of unbalanced quadratic optimal transport and we study its stability properties, namely we give upper and lower bounds for the Bregman divergence of the new objective that hold globally. |
Adrien Vacher; François-Xavier Vialard; |
1201 | Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present AIRS: **A**utomatic **I**ntrinsic **R**eward **S**haping that intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL). |
Mingqi Yuan; Bo Li; Xin Jin; Wenjun Zeng; |
1202 | Symmetry-Aware Robot Design with Structured Subgroups Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these works searched the robots directly from the vast design space and ignored common structures, resulting in abnormal robots and poor performance. To tackle this problem, we propose a Symmetry-Aware Robot Design (SARD) framework that exploits the structure of the design space by incorporating symmetry searching into the robot design process. |
Heng Dong; Junyu Zhang; Tonghan Wang; Chongjie Zhang; |
1203 | Gradient Descent Monotonically Decreases The Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this model, we prove that the GFS sharpness decreases monotonically. |
Itai Kreisler; Mor Shpigel Nacson; Daniel Soudry; Yair Carmon; |
1204 | Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by how HuBERT uses clustering to discover hidden acoustic units, we formulate a factor analysis (FA) model that uses the discovered hidden acoustic units to align the SSL features. |
Weiwei Lin; Chenhang HE; Man-Wai Mak; Youzhi Tu; |
1205 | Refined Regret for Adversarial MDPs with Linear Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper provides two algorithms that improve the regret to $\tilde{\mathcal O}(\sqrt K)$ in the same setting. |
Yan Dai; Haipeng Luo; Chen-Yu Wei; Julian Zimmert; |
1206 | EM-Network: Oracle Guided Self-distillation for Sequence Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce EM-Network, a novel self-distillation approach that effectively leverages target information for supervised sequence-to-sequence (seq2seq) learning. |
Ji Won Yoon; SungHwan Ahn; Hyeonseung Lee; Minchan Kim; Seok Min Kim; Nam Soo Kim; |
1207 | A Robust Optimisation Perspective on Counterexample-Guided Repair of Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, whether counterexample-guided repair is guaranteed to terminate remains an open question. We approach this question by showing that counterexample-guided repair can be viewed as a robust optimisation algorithm. |
David Boetius; Stefan Leue; Tobias Sutter; |
1208 | One-Shot Federated Conformal Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a Conformal Prediction method that computes prediction sets in a one-shot Federated Learning (FL) setting. |
Pierre Humbert; Batiste Le bars; Aurélien Bellet; Sylvain Arlot; |
1209 | Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider the optimality of TS for the Pareto model that has a heavy tail and is parameterized by two unknown parameters. |
Jongyeong Lee; Junya Honda; Chao-Kai Chiang; Masashi Sugiyama; |
1210 | Dimension-independent Certified Neural Network Watermarks Via Mollifier Smoothing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By leveraging mollifier theory, this paper proposes a mollifier smoothing method with dimension-independent certified radius of our proposed smooth classifier, for conducting the certified watermark problem against the $l_p$-norm watermark removal attacks ($1 \leq p \leq \infty$) for high parameter dimension $d$. |
Jiaxiang Ren; Yang Zhou; Jiayin Jin; Lingjuan Lyu; Da Yan; |
1211 | Regret Bounds for Markov Decision Processes with Recursive Optimized Certainty Equivalents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new episodic risk-sensitive reinforcement learning formulation based on tabular Markov decision processes with recursive OCEs. |
Wenhao XU; Xuefeng Gao; Xuedong He; |
1212 | A Generalization of ViT/MLP-Mixer to Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paradigm suffers from two major limitations, over-squashing and poor long-range dependencies, that can be solved using global attention but significantly increases the computational cost to quadratic complexity. In this work, we propose an alternative approach to overcome these structural limitations by leveraging the ViT/MLP-Mixer architectures introduced in computer vision. |
Xiaoxin He; Bryan Hooi; Thomas Laurent; Adam Perold; Yann LeCun; Xavier Bresson; |
1213 | Global Context Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization for computer vision. |
Ali Hatamizadeh; Hongxu Yin; Greg Heinrich; Jan Kautz; Pavlo Molchanov; |
1214 | Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study variance-dependent regret bounds for Markov decision processes (MDPs). |
Runlong Zhou; Zhang Zihan; Simon Shaolei Du; |
1215 | Controllability-Aware Unsupervised Skill Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a novel unsupervised skill discovery method, Controllability-aware Skill Discovery (CSD), which actively seeks complex, hard-to-control skills without supervision. |
Seohong Park; Kimin Lee; Youngwoon Lee; Pieter Abbeel; |
1216 | Improving Graph Neural Networks with Learnable Propagation Operators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, Convolutional Neural Networks (CNNs) can learn diverse propagation filters, and phenomena like over-smoothing are typically not apparent in CNNs. In this paper, we bridge these gaps by incorporating trainable channel-wise weighting factors $\omega$ to learn and mix multiple smoothing and sharpening propagation operators at each layer. |
Moshe Eliasof; Lars Ruthotto; Eran Treister; |
1217 | Multiplier Bootstrap-based Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Multiplier Bootstrap-based Exploration (MBE), a novel exploration strategy that is applicable to any reward model amenable to weighted loss minimization. |
Runzhe Wan; Haoyu Wei; Branislav Kveton; Rui Song; |
1218 | Predictable MDP Abstraction for Unsupervised Model-Based RL Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Errors in this predictive model can degrade the performance of model-based controllers, and complex Markov decision processes (MDPs) can present exceptionally difficult prediction problems. To mitigate this issue, we propose predictable MDP abstraction (PMA): instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space that only permits predictable, easy-to-model actions, while covering the original state-action space as much as possible. |
Seohong Park; Sergey Levine; |
1219 | CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Classification with Hierarchical Label Sets (or CHiLS), an alternative strategy for zero-shot classification specifically designed for datasets with implicit semantic hierarchies. |
Zachary Novack; Julian McAuley; Zachary Chase Lipton; Saurabh Garg; |
1220 | Towards Credible Visual Model Interpretation with Path Attribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Not only that, for deep visual models, the methods may also not conform to the original game-theoretic intuitions that are the basis of their axiomatic nature. To address these issues, we perform a systematic investigation of the path attribution framework. |
Naveed Akhtar; Mohammad A. A. K. Jalwana; |
1221 | Optimistic Online Mirror Descent for Bridging Stochastic and Adversarial Online Convex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For convex and smooth functions, we obtain the same $\mathcal{O}(\sqrt{\sigma_{1:T}^2}+\sqrt{\Sigma_{1:T}^2})$ regret bound, without the convexity requirement of individual functions. For strongly convex and smooth functions, we establish an $\mathcal{O}(\min\{\log (\sigma_{1:T}^2+\Sigma_{1:T}^2), (\sigma_{\max}^2 + \Sigma_{\max}^2) \log T\})$ bound, better than their $\mathcal{O}((\sigma_{\max}^2 + \Sigma_{\max}^2) \log T)$ result. For exp-concave and smooth functions, we achieve a new $\mathcal{O}(d\log(\sigma_{1:T}^2+\Sigma_{1:T}^2))$ bound. |
Sijia Chen; Wei-Wei Tu; Peng Zhao; Lijun Zhang; |
1222 | Interactive Object Placement with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these random vectors are not interpretable, which prevents users from interacting with the object placement process. To address this problem, we propose an Interactive Object Placement method with Reinforcement Learning, dubbed IOPRE, to make sequential decisions for producing a reasonable placement given an initial location and size of the foreground. |
Shengping Zhang; Quanling Meng; Qinglin Liu; Liqiang Nie; Bineng Zhong; Xiaopeng Fan; Rongrong Ji; |
1223 | Mechanistic Mode Connectivity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks retrieved via training on a dataset are connected via simple paths of low loss. |
Ekdeep Singh Lubana; Eric J Bigelow; Robert P. Dick; David Krueger; Hidenori Tanaka; |
1224 | Contrastive Learning Meets Homophily: Two Birds with One Stone Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new parameterized neighbor sampling component to replace the conventional sub-optimal sampling strategies. |
Dongxiao He; JiTao Zhao; Rui Guo; Zhiyong Feng; Di Jin; Yuxiao Huang; Zhen Wang; Weixiong Zhang; |
1225 | Adversarial Parameter Attack on Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such attacks can be detected by the user, because the accuracy of the attacked network will decrease and the network will no longer work normally. To make the attack stealthier, this paper proposes the adversarial parameter attack, in which small perturbations are made to the parameters of the network such that the accuracy of the attacked network does not decrease much, but its robustness against adversarial example attacks becomes much lower. |
Lijia Yu; Yihan Wang; Xiao-Shan Gao; |
1226 | AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we adopt **N**on-**P**arametric **C**lassifier to perform the test-time **Ada**ptation (**AdaNPC**). |
YiFan Zhang; Xue Wang; Kexin Jin; Kun Yuan; Zhang Zhang; Liang Wang; Rong Jin; Tieniu Tan; |
1227 | Tuning Language Models As Training Data Generators for Augmentation-Enhanced Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study few-shot learning with PLMs from a different perspective: We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples which augment the original training set. To encourage the generator to produce label-discriminative samples, we train it via weighted maximum likelihood where the weight of each token is automatically adjusted based on a discriminative meta-learning objective. |
Yu Meng; Martin Michalski; Jiaxin Huang; Yu Zhang; Tarek Abdelzaher; Jiawei Han; |
1228 | NeuralSlice: Neural 3D Triangle Mesh Reconstruction Via Slicing 4D Tetrahedral Meshes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel 3D shape representation named NeuralSlice, which represents a 3D shape as the intersection of a 4D tetrahedral mesh and a 4D hyperplane. |
Chenbo Jiang; Jie Yang; Shwai He; Yu-Kun Lai; Lin Gao; |
1229 | Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we give a different parametrization of the model which leads to a new implicit regularization effect that combines the benefit of $\ell_1$ and $\ell_2$ interpolators. |
Mo Zhou; Rong Ge; |
1230 | Leveraging Offline Data in Online Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Practical scenarios often motivate an intermediate setting: if we have some set of offline data and may also interact with the environment, how can we best use the offline data to minimize the number of online interactions necessary to learn an $\epsilon$-optimal policy? In this work, we consider this setting, which we call the FineTuneRL setting, for MDPs with linear structure. |
Andrew Wagenmaker; Aldo Pacchiano; |
1231 | Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem, where the inputs are a matrix $\mathbf{A}\in\{0,1\}^{n\times d}$, a rank parameter $k>0$, as well as an accuracy parameter $\varepsilon>0$, and the goal is to approximate $\mathbf{A}$ as a product of low-rank factors $\mathbf{U}\in\{0,1\}^{n\times k}$ and $\mathbf{V}\in\{0,1\}^{k\times d}$. |
Ameya Velingker; Maximilian Vötsch; David Woodruff; Samson Zhou; |
1232 | Pareto Regret Analyses in Multi-objective Multi-armed Bandit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study Pareto optimality in multi-objective multi-armed bandit by providing a formulation of adversarial multi-objective multi-armed bandit and defining its Pareto regrets that can be applied to both stochastic and adversarial settings. |
Mengfan Xu; Diego Klabjan; |
1233 | Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the reinforcement learning (RL) problem with general utilities which consists in maximizing a function of the state-action occupancy measure. |
Anas Barakat; Ilyas Fatkhullin; Niao He; |
1234 | Meta Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the use of amortized optimization to predict optimal transport (OT) maps from the input measures, which we call Meta OT. |
Brandon Amos; Giulia Luise; Samuel Cohen; Ievgen Redko; |
1235 | Hyperbolic Image-text Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose MERU, a contrastive model that yields hyperbolic representations of images and text. |
Karan Desai; Maximilian Nickel; Tanmay Rajpurohit; Justin Johnson; Shanmukha Ramakrishna Vedantam; |
1236 | LongCoder: A Long-Range Pre-trained Language Model for Code Completion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a new task for code completion that focuses on handling long code input and propose a sparse Transformer model, called LongCoder, to address this task. |
Daya Guo; Canwen Xu; Nan Duan; Jian Yin; Julian McAuley; |
1237 | Which Is Better for Learning with Noisy Labels: The Semi-supervised Method or Modeling Label Noise? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we answer the question from the perspective of *causal data generative process*. |
Yu Yao; Mingming Gong; Yuxuan Du; Jun Yu; Bo Han; Kun Zhang; Tongliang Liu; |
1238 | Surface Snapping Optimization Layer for Single Image Object Shape Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage recent advances in monocular scene understanding to incorporate an additional geometric cue of surface normals. |
Yuan-Ting Hu; Alex Schwing; Raymond A. Yeh; |
1239 | A Closer Look at Self-Supervised Lightweight Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop and benchmark several self-supervised pre-training methods on image classification tasks and some downstream dense prediction tasks. |
Shaoru Wang; Jin Gao; Zeming Li; Xiaoqin Zhang; Weiming Hu; |
1240 | NP-SemiSeg: When Neural Processes Meet Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we move one step forward by adapting NPs to semi-supervised semantic segmentation, resulting in a new model called NP-SemiSeg. |
Jianfeng Wang; Daniela Massiceti; Xiaolin Hu; Vladimir Pavlovic; Thomas Lukasiewicz; |
1241 | A Critical Revisit of Adversarial Robustness in 3D Point Cloud Recognition with Diffusion-Driven Purification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we pinpoint a major limitation of the leading empirical defense, adversarial training, when applied to 3D point cloud models: gradient obfuscation, which significantly hampers robustness against potent attacks. |
Jiachen Sun; Jiongxiao Wang; Weili Nie; Zhiding Yu; Zhuoqing Mao; Chaowei Xiao; |
1242 | Regret-Minimizing Double Oracle for Extensive-Form Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, we demonstrate that the sample complexity of XDO can be exponential in the number of information sets $|S|$, owing to the exponentially decaying stopping threshold of restricted games. To solve this problem, we propose the Periodic Double Oracle (PDO) method, which has the lowest sample complexity among regret minimization-based double oracle methods, being only polynomial in $|S|$. |
Xiaohang Tang; Le Cong Dinh; Stephen Marcus McAleer; Yaodong Yang; |
1243 | Representations and Exploration for Deep Reinforcement Learning Using Singular Value Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide a singular value decomposition based method that can be used to obtain representations that preserve the underlying transition structure in the domain. |
Yash Chandak; Shantanu Thakoor; Zhaohan Daniel Guo; Yunhao Tang; Remi Munos; Will Dabney; Diana L Borsa; |
1244 | Graph Ladling: Shockingly Simple Parallel GNN Training Without Intermediate Communication Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite their popularity, scaling GNNs either by deepening or widening suffers from prevalent issues of $\textit{unhealthy gradients, over-smoothening, information squashing}$, which often lead to sub-standard performance. In this work, we are interested in exploring a principled way to scale GNNs capacity without deepening or widening, which can improve its performance across multiple small and large graphs. |
AJAY KUMAR JAISWAL; Shiwei Liu; Tianlong Chen; Ying Ding; Zhangyang Wang; |
1245 | Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage the notion of barrier function to explicitly encode the hard safety chance constraints, and given that the environment is unknown, relax them to our design of *generative-model-based soft barrier functions*. Based on such soft barriers, we propose a novel safe RL approach with bi-level optimization that can jointly learn the unknown environment and optimize the control policy, while effectively avoiding the unsafe region with safety probability optimization. |
Yixuan Wang; Simon Sinong Zhan; Ruochen Jiao; Zhilu Wang; Wanxin Jin; Zhuoran Yang; Zhaoran Wang; Chao Huang; Qi Zhu; |
1246 | Adaptive Identification of Populations with Treatment Benefit in Clinical Trials: Machine Learning Challenges and Solutions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that the unique characteristics of the subpopulation selection problem — most importantly that (i) one is usually interested in finding subpopulations with any treatment benefit (and not necessarily the single subgroup with largest effect) given a limited budget and that (ii) effectiveness only has to be demonstrated across the subpopulation on average — give rise to interesting challenges and new desiderata when designing algorithmic solutions. Building on these findings, we propose AdaGGI and AdaGCPI, two meta-algorithms for subpopulation construction. |
Alicia Curth; Alihan Hüyük; Mihaela van der Schaar; |
1247 | Efficient Sequence Transduction By Jointly Predicting Tokens and Durations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a novel Token-and-Duration Transducer (TDT) architecture for sequence-to-sequence tasks. |
Hainan Xu; Fei Jia; Somshubra Majumdar; He Huang; Shinji Watanabe; Boris Ginsburg; |
1248 | Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the stochastic linear contextual bandit problem with high-dimensional features. |
Sunrit Chakraborty; Saptarshi Roy; Ambuj Tewari; |
1249 | Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study distributed contextual linear bandits with stochastic contexts, where $N$ agents/learners act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features over the course of $T$ rounds. For this problem, we derive the first ever information-theoretic lower bound $\Omega(dN)$ on the communication cost of any algorithm that performs optimally in a regret minimization setup. |
Sanae Amani; Tor Lattimore; András György; Lin Yang; |
1250 | Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, attacks that are unaware of this invariance inevitably waste a large number of queries to re-discover or overcome it. We, therefore, develop techniques to (i) reverse-engineer the preprocessor and then (ii) use this extracted information to attack the end-to-end system. |
Chawin Sitawarin; Florian Tramèr; Nicholas Carlini; |
1251 | MixFlows: Principled Variational Inference Via Mixed Flows Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work presents mixed variational flows (MixFlows), a new variational family that consists of a mixture of repeated applications of a map to an initial reference distribution. |
Zuheng Xu; Naitong Chen; Trevor Campbell; |
1252 | Quantum Ridgelet Transform: Winning Lottery Ticket of Neural Networks with Quantum Computation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The ridgelet transform has been a fundamental mathematical tool in theoretical studies of neural networks, but its practical applicability to learning tasks has been limited, since its numerical implementation by conventional classical computation requires an exponential runtime $\exp(O(D))$ as the data dimension $D$ increases. To address this problem, we develop a quantum ridgelet transform (QRT), which implements the ridgelet transform of a quantum state within a linear runtime $O(D)$ of quantum computation. |
Hayata Yamasaki; Sathyawageeswar Subramanian; Satoshi Hayakawa; Sho Sonoda; |
1253 | Federated Conformal Predictors for Distributed Uncertainty Quantification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we extend conformal prediction to the federated learning setting. |
Charles Lu; Yaodong Yu; Sai Praneeth Karimireddy; Michael Jordan; Ramesh Raskar; |
1254 | Quantifying Human Priors Over Social and Navigation Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Can I walk from here to there? In this work, we leverage the combinatorial structure of graphs to quantify human priors over such relational data. |
Gecia Bravo-Hermsdorff; |
1255 | Learning in POMDPs Is Sample-Efficient with Hindsight Observability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce new algorithms for the tabular and function approximation settings that are provably sample-efficient with hindsight observability, even in POMDPs that would otherwise be statistically intractable. |
Jonathan Lee; Alekh Agarwal; Christoph Dann; Tong Zhang; |
1256 | Efficient Graph Field Integrators Meet Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present two new classes of algorithms for efficient field integration on graphs encoding point cloud data. |
Krzysztof Marcin Choromanski; Arijit Sehanobish; Han Lin; Yunfan Zhao; Eli Berger; Tetiana Parshakova; Alvin Pan; David Watkins; Tianyi Zhang; Valerii Likhosherstov; Somnath Basu Roy Chowdhury; Kumar Avinava Dubey; Deepali Jain; Tamas Sarlos; Snigdha Chaturvedi; Adrian Weller; |
1257 | Are Large Kernels Better Teachers Than Transformers for ConvNets? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper reveals a new appeal of the recently emerged large-kernel Convolutional Neural Networks (ConvNets): as the teacher in Knowledge Distillation (KD) for small-kernel ConvNets. |
Tianjin Huang; Lu Yin; Zhenyu Zhang; Li Shen; Meng Fang; Mykola Pechenizkiy; Zhangyang Wang; Shiwei Liu; |
1258 | MonoNeRF: Learning Generalizable NeRFs from Monocular Videos Without Camera Poses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose MonoNeRF, a generalizable neural radiance field that can be trained on large-scale monocular videos of camera movement in static scenes, without any ground-truth annotations of depth or camera poses. |
Yang Fu; Ishan Misra; Xiaolong Wang; |
1259 | Fair Densities Via Boosting The Sufficient Statistics of Exponential Families Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a boosting algorithm to pre-process data for fairness. |
Alexander Soen; Hisham Husain; Richard Nock; |
1260 | Fast Rates for Maximum Entropy Exploration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study the maximum entropy exploration problem of two different types. |
Daniil Tiapkin; Denis Belomestny; Daniele Calandriello; Eric Moulines; Remi Munos; Alexey Naumov; Pierre Perrault; Yunhao Tang; Michal Valko; Pierre Ménard; |
1261 | On The Robustness of Randomized Ensembles to Adversarial Perturbations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we first demystify RECs by deriving fundamental results regarding their theoretical limits, necessary and sufficient conditions for them to be useful, and more. |
Hassan Dbouk; Naresh Shanbhag; |
1262 | Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver. |
Runlong Zhou; Ruosong Wang; Simon Shaolei Du; |
1263 | Multi-Environment Pretraining Enables Transfer to Action Limited Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In reinforcement learning, however, a key challenge is that available data of sequential decision making is often not annotated with actions – for example, videos of game-play are much more available than sequences of frames paired with their logged game controls. We propose to circumvent this challenge by combining large but sparsely-annotated datasets from a *target* environment of interest with fully-annotated datasets from various other *source* environments. |
David Venuto; Sherry Yang; Pieter Abbeel; Doina Precup; Igor Mordatch; Ofir Nachum; |
1264 | Analyzing Convergence in Quantum Neural Networks: Deviations from Neural Tangent Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the dynamics of QNNs and show that contrary to popular belief it is qualitatively different from that of any kernel regression: due to the unitarity of quantum operations, there is a non-negligible deviation from the tangent kernel regression derived at the random initialization. |
Xuchen You; Shouvanik Chakrabarti; Boyang Chen; Xiaodi Wu; |
1265 | Latent Traversals in Generative Models As Potential Flows Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we instead propose to model latent structures with a learned dynamic potential landscape, thereby performing latent traversals as the flow of samples down the landscape’s gradient. |
Yue Song; T. Anderson Keller; Nicu Sebe; Max Welling; |
1266 | Taxonomy-Structured Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we tackle a generalization with taxonomy-structured domains, which formalizes domains with nested, hierarchical similarity structures such as animal species and product catalogs. |
Tianyi Liu; Zihao Xu; Hao He; Guang-Yuan Hao; Guang-He Lee; Hao Wang; |
1267 | Hybrid Energy Based Model in The Feature Space for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces the HEAT model, a new post-hoc OOD detection method estimating the density of in-distribution (ID) samples using hybrid energy-based models (EBM) in the feature space of a pre-trained backbone. |
Marc Lafon; Elias Ramzi; Clément Rambour; Nicolas THOME; |
1268 | A New PHO-rmula for Improved Performance of Semi-Structured Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that techniques to properly identify the contributions of the different model components in SSNs, however, lead to suboptimal network estimation, slower convergence, and degenerated or erroneous predictions. In order to solve these problems while preserving favorable model properties, we propose a non-invasive post-hoc orthogonalization (PHO) that guarantees identifiability of model components and provides better estimation and prediction quality. |
David Rügamer; |
1269 | Offline Reinforcement Learning with Closed-Form Policy Improvement Operators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose our closed-form policy improvement operators. |
Jiachen Li; Edwin Zhang; Ming Yin; Qinxun Bai; Yu-Xiang Wang; William Yang Wang; |
1270 | Uncovering Adversarial Risks of Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we uncover a novel security vulnerability of TTA based on the insight that predictions on benign samples can be impacted by malicious samples in the same batch. To exploit this vulnerability, we propose Distribution Invading Attack (DIA), which injects a small fraction of malicious data into the test batch. |
Tong Wu; Feiran Jia; Xiangyu Qi; Jiachen T. Wang; Vikash Sehwag; Saeed Mahloujifar; Prateek Mittal; |
1271 | DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Fundamentally, this might be because multi-step policy improvements require operations that cannot be approximated by stochastic samples, hence hindering the widespread adoption of such methods in practice. To address such limitations, we introduce doubly multi-step off-policy VI (DoMo-VI), a novel oracle algorithm that combines multi-step policy improvements and policy evaluations. |
Yunhao Tang; Tadashi Kozuno; Mark Rowland; Anna Harutyunyan; Remi Munos; Bernardo Avila Pires; Michal Valko; |
1272 | Reward-Mixing MDPs with Few Latent Contexts Are Learnable Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we resolve several open questions for the general RMMDP setting. |
Jeongyeol Kwon; Yonathan Efroni; Constantine Caramanis; Shie Mannor; |
1273 | Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Under the function approximation setup where the optimal latent state-action $Q$-function is linear in the state feature, and the optimal $Q$-function has a gap in actions, we provide a computationally and statistically efficient algorithm for finding the exact optimal policy. |
Masatoshi Uehara; Ayush Sekhari; Jason D. Lee; Nathan Kallus; Wen Sun; |
1274 | A Flexible Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a general framework for parameterizing diffusion models, particularly the spatial part of forward SDEs, by leveraging the symplectic and Riemannian geometry of the data manifold. |
weitao Du; He Zhang; Tao Yang; Yuanqi Du; |
1275 | On Sampling with Approximate Transport Maps Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In both cases, the quality of the learned transport map conditions performance. The present work clarifies for the first time the relative strengths and weaknesses of these two approaches. |
Louis Grenioux; Alain Durmus; Eric Moulines; Marylou Gabrié; |
1276 | COLA: Orchestrating Error Coding and Learning for Robust Neural Network Inference Against Hardware Defects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose to reduce inner layer feature error correlation by 1) adopting a separated architecture, where the last portions of the paths to all output nodes are separated, and 2) orthogonalizing weights in common DNN layers so that the intermediate features are orthogonal with each other. |
Anlan Yu; Ning Lyu; Jieming Yin; Zhiyuan Yan; Wujie Wen; |
1277 | On The Occupancy Measure of Non-Markovian Policies in Continuous MDPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While expected, for technical reasons the translation of this result to continuous state spaces has remained open until now. Our main contribution is to fill this gap and provide a general measure-theoretic treatment of the problem, permitting, in particular, its extension to continuous MDPs. |
Romain Laroche; Remi Tachet des Combes; |
1278 | How Jellyfish Characterise Alternating Group Equivariant Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a full characterisation of all of the possible alternating group ($A_n$) equivariant neural networks whose layers are some tensor power of $\mathbb{R}^{n}$. |
Edward Pearce-Crump; |
1279 | Learning The Dynamics of Sparsely Observed Interacting Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address the problem of learning the dynamics of an unknown non-parametric system linking a target and a feature time series. |
Linus Bleistein; Adeline Fermanian; Anne-Sophie Jannot; Agathe Guilloux; |
1280 | Differentiable Multi-Target Causal Bayesian Experimental Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a gradient-based approach for the problem of Bayesian optimal experimental design to learn causal models in a batch setting — a critical component for causal discovery from finite data where interventions can be costly or risky. |
Panagiotis Tigas; Yashas Annadani; Desi R. Ivanova; Andrew Jesson; Yarin Gal; Adam Foster; Stefan Bauer; |
1281 | Fair Yet Asymptotically Equal Collaborative Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper explores an incentive design that guarantees fairness so that nodes receive rewards commensurate to their contributions. |
Xiaoqiang Lin; Xinyi Xu; See-Kiong Ng; Chuan-Sheng Foo; Bryan Kian Hsiang Low; |
1282 | From Noisy Fixed-Point Iterations to Private ADMM for Centralized and Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study differentially private (DP) machine learning algorithms as instances of noisy fixed-point iterations, in order to derive privacy and utility results from this well-studied framework. |
Edwige Cyffers; Aurélien Bellet; Debabrota Basu; |
1283 | Safe Offline Reinforcement Learning with Real-Time Budget Constraints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Besides, in many real-world applications, the learned policy is required to respond to dynamically determined safety budgets (i.e., constraint thresholds) in real time. In this paper, we target the above real-time budget constraint problem under the offline setting, and propose Trajectory-based REal-time Budget Inference (TREBI) as a novel solution that approaches this problem from the perspective of trajectory distribution. |
Qian Lin; Bo Tang; Zifan Wu; Chao Yu; Shangqin Mao; Qianlong Xie; Xingxing Wang; Dong Wang; |
1284 | Differential Privacy Has Bounded Impact on Fairness in Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We theoretically study the impact of differential privacy on fairness in classification. |
Paul Mangold; Michaël Perrot; Aurélien Bellet; Marc Tommasi; |
1285 | Rethinking Visual Reconstruction: Experience-Based Content Completion Guided By Visual Cues Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing approaches have ignored the brain's completion mechanism. In this work, we propose to reconstruct seen images with both the visual perception and the brain completion process, and design a simple, yet effective visual decoding framework to achieve this goal. |
Jiaxuan Chen; Yu Qi; Gang Pan; |
1286 | The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the relation between other MVSSL methods and MI remains unclear. We consider a different lower bound on the MI consisting of an entropy and a reconstruction term (ER), and analyze the main MVSSL families through its lens. |
Borja Rodríguez Gálvez; Arno Blaas; Pau Rodriguez; Adam Golinski; Xavier Suau; Jason Ramapuram; Dan Busbridge; Luca Zappella; |
1287 | SlotGAT: Slot-based Message Passing for Heterogeneous Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issue, we propose SlotGAT with separate message passing processes in slots, one for each node type, to maintain the representations in their own node-type feature spaces. |
Ziang Zhou; Jieming Shi; Renchi Yang; Yuanhang Zou; Qing Li; |
1288 | WL Meet VC Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study GNNs’ generalization ability through the lens of Vapnik-Chervonenkis (VC) dimension theory in two settings, focusing on graph-level predictions. |
Christopher Morris; Floris Geerts; Jan Tönshoff; Martin Grohe; |
1289 | Hierarchical Diffusion for Offline Decision Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first formulate the problem of offline long-horizon decision-making from the perspective of conditional generative modeling by incorporating goals into control-as-inference graphical models. We then propose a Hierarchical trajectory-level Diffusion probabilistic model (HDMI) with classifier-free guidance. HDMI employs a cascade framework that utilizes the reward-conditional goal diffuser for subgoal discovery and the goal-conditional trajectory diffuser for generating the corresponding action sequences of the subgoals. |
Wenhao Li; Xiangfeng Wang; Bo Jin; Hongyuan Zha; |
1290 | A Connection Between One-Step RL and Critic Regularization in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These methods typically require more compute but have appealing lower-bound guarantees. In this paper, we draw a connection between these methods: applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL. |
Benjamin Eysenbach; Matthieu Geist; Sergey Levine; Ruslan Salakhutdinov; |
1291 | SOM-CPC: Unsupervised Contrastive Learning with Self-Organizing Maps for Structured Representations of High-Rate Time Series Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we propose SOM-CPC, a model that visualizes data in an organized 2D manifold, while preserving higher-dimensional information. |
Iris A.M. Huijben; Arthur Andreas Nijdam; Sebastiaan Overeem; Merel M Van Gilst; Ruud Van Sloun; |
1292 | Quantized Distributed Training of Large Models with Convergence Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present QSDP, a variant of FSDP which supports both gradient and weight quantization with theoretical guarantees, is simple to implement and has essentially no overheads. |
Ilia Markov; Adrian Vladu; Qi Guo; Dan Alistarh; |
1293 | LipsNet: A Smooth and Robust Neural Network with Adaptive Lipschitz Constant for High Accuracy Optimal Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The action fluctuation is caused by the high Lipschitz constant of actor networks. To address this problem, we propose a neural network named LipsNet. |
Xujie Song; Jingliang Duan; Wenxuan Wang; Shengbo Eben Li; Chen Chen; Bo Cheng; Bo Zhang; Junqing Wei; Xiaoming Simon Wang; |
1294 | Fast Excess Risk Rates Via Offset Rademacher Complexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the offset Rademacher complexity, this work outlines a systematic framework for deriving sharp excess risk bounds in statistical learning without the Bernstein condition. |
Chenguang Duan; Yuling Jiao; Lican Kang; Xiliang Lu; Jerry Zhijian Yang; |
1295 | Doubly Adversarial Federated Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For the bandit feedback setting, we propose a near-optimal federated bandit algorithm called FEDEXP3. |
Jialin Yi; Milan Vojnovic; |
1296 | Architecture-Agnostic Masked Image Modeling — From ViT Back to CNN Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we observe that MIM essentially teaches the model to learn better middle-order interactions among patches for more generalized feature extraction. |
Siyuan Li; Di Wu; Fang Wu; Zelin Zang; Stan Z. Li; |
1297 | On Strengthening and Defending Graph Reconstruction Attack with Markov Chain Approximation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: By treating GNNs as a Markov chain and attacking them via a flexible chain approximation, we systematically explore the underlying principles of graph reconstruction attacks, and propose two information-theory-guided mechanisms: (1) a chain-based attack method with adaptive designs for extracting more private information; (2) a chain-based defense method that sharply reduces the attack fidelity with moderate accuracy loss. |
Zhanke Zhou; Chenyu Zhou; Xuan Li; Jiangchao Yao; Quanming Yao; Bo Han; |
1298 | Cooperative Open-ended Learning Framework for Zero-Shot Coordination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches can result in a loss of learning and an inability to cooperate with certain strategies within the population, known as cooperative incompatibility. To address this issue, we propose the Cooperative Open-ended LEarning (COLE) framework, which constructs open-ended objectives in cooperative games with two players from the perspective of graph theory to assess and identify the cooperative ability of each strategy. |
Yang Li; Shao Zhang; Jichen Sun; Yali Du; Ying Wen; Xinbing Wang; Wei Pan; |
1299 | Stochastic Gradient Descent Under Markovian Sampling Schemes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study a variation of vanilla stochastic gradient descent where the optimizer only has access to a Markovian sampling scheme. |
Mathieu Even; |
1300 | Learning Instance-Specific Augmentations By Capturing Local Invariances Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce InstaAug, a method for automatically learning input-specific augmentations from data. |
Ning Miao; Tom Rainforth; Emile Mathieu; Yann Dubois; Yee Whye Teh; Adam Foster; Hyunjik Kim; |
1301 | Sampling-based Nyström Approximation and Kernel Quadrature Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We analyze the Nyström approximation of a positive definite kernel associated with a probability measure. |
Satoshi Hayakawa; Harald Oberhauser; Terry Lyons; |
1302 | CO-BED: Information-Theoretic Contextual Optimization Via Bayesian Experimental Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formalize the problem of contextual optimization through the lens of Bayesian experimental design and propose CO-BED—a general, model-agnostic framework for designing contextual experiments using information-theoretic principles. |
Desi R. Ivanova; Joel Jennings; Tom Rainforth; Cheng Zhang; Adam Foster; |
1303 | Nonlinear Advantage: Trained Networks Might Not Be As Complex As You Think Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To characterize the depth of the resulting partially linearized network, we introduce a measure called average path length, representing the average number of active nonlinearities encountered along a path in the network graph. |
Christian H.X. Ali Mehmeti-Göpel; Jan Disselhoff; |
1304 | Bidirectional Learning for Offline Model-based Biological Sequence Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on biological sequence design to maximize some sequence score. |
Can Chen; Yingxue Zhang; Xue Liu; Mark Coates; |
1305 | Achieving High Accuracy with PINNs Via Energy Natural Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose energy natural gradient descent, a natural gradient method with respect to a Hessian-induced Riemannian metric as an optimization algorithm for physics-informed neural networks (PINNs) and the deep Ritz method. |
Johannes Müller; Marius Zeinhofer; |
1306 | Graph Switching Dynamical Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel graph-based approach for switching dynamical systems, GRAph Switching dynamical Systems (GRASS), in which we use a dynamic graph to characterize interactions between objects and learn both intra-object and inter-object mode-switching behaviour. For benchmarking, we create two new datasets: a synthesized ODE-driven particles dataset and a real-world Salsa-couple dancing dataset. |
Yongtuo Liu; Sara Magliacane; Miltiadis Kofinas; Efstratios Gavves; |
1307 | Improving Expert Predictions with Conformal Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop an automated decision support system that, by design, does not require experts to understand when to trust the system to improve performance. |
Eleni Straitouri; Lequn Wang; Nastaran Okati; Manuel Gomez Rodriguez; |
1308 | Neural Wasserstein Gradient Flows for Discrepancies with Riesz Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals with non-smooth Riesz kernels show a rich structure as singular measures can become absolutely continuous ones and conversely. In this paper we contribute to the understanding of such flows. |
Fabian Altekrüger; Johannes Hertrich; Gabriele Steidl; |
1309 | Boosting Graph Contrastive Learning Via Graph Contrastive Saliency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Random augmentations may inevitably lead to semantic information corruption during training, forcing the network to mistakenly focus on semantically irrelevant background structures. To address these limitations and improve generalization, we propose a novel self-supervised learning framework for GCL, which can adaptively screen semantically related substructures in graphs by capitalizing on the proposed gradient-based Graph Contrastive Saliency (GCS). |
Chunyu Wei; Yu Wang; Bing Bai; Kai Ni; David J. Brady; Lu Fang; |
1310 | Bigger, Better, Faster: Human-level Atari with Human-level Efficiency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. |
Max Schwarzer; Johan Samir Obando Ceron; Aaron Courville; Marc G Bellemare; Rishabh Agarwal; Pablo Samuel Castro; |
1311 | Nonparametric Iterative Machine Teaching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider the problem of Iterative Machine Teaching (IMT), where the teacher provides examples to the learner iteratively such that the learner can achieve fast convergence to a target model. |
Chen Zhang; Xiaofeng Cao; Weiyang Liu; Ivor Tsang; James Kwok; |
1312 | On The Convergence Rate of Gaussianization with Random Rotations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We explore potential speed-ups and formulate challenges for further research. |
Felix Draxler; Lars Kühmichel; Armand Rousselot; Jens Müller; Christoph Schnoerr; Ullrich Koethe; |
1313 | Invariance in Policy Optimisation and Partial Identifiability in Reward Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we formally characterise the partial identifiability of the reward function given several popular reward learning data sources, including expert demonstrations and trajectory comparisons. |
Joar Max Viktor Skalse; Matthew Farrugia-Roberts; Stuart Russell; Alessandro Abate; Adam Gleave; |
1314 | Multi-User Reinforcement Learning with Low Rank Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our main contribution is an algorithm which explores rewards collaboratively with $N$ user-specific MDPs and can learn rewards efficiently in two key settings: tabular MDPs and linear MDPs. |
Dheeraj Mysore Nagaraj; Suhas S Kowshik; Naman Agarwal; Praneeth Netrapalli; Prateek Jain; |
1315 | Gibbsian Polar Slice Sampling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: By updating the directional and radial components of chain iterates separately, we obtain a family of samplers that mimic polar slice sampling, and yet can be implemented efficiently. |
Philip Schär; Michael Habeck; Daniel Rudolf; |
1316 | HarsanyiNet: Computing Accurate Shapley Values in A Single Forward Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, when people use Shapley values to explain the attribution of input variables of a deep neural network (DNN), it usually requires a very high computational cost to approximate relatively accurate Shapley values in real-world applications. Therefore, we propose a novel network architecture, the HarsanyiNet, which makes inferences on the input sample and simultaneously computes the exact Shapley values of the input variables in a single forward propagation. |
Lu Chen; Siyu Lou; Keyan Zhang; Jin Huang; Quanshi Zhang; |
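To see why a single-forward-pass computation matters, the classical exact Shapley value requires enumerating all subsets of input variables, which is exponential in their number. A minimal sketch of that baseline (hypothetical helper names, not the paper's method):

```python
from itertools import combinations
from math import factorial

def shapley_values(n, v):
    """Exact Shapley values by subset enumeration: O(2^n) evaluations of
    the value function v, the cost HarsanyiNet aims to avoid."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                # Weight |S|! (n-|S|-1)! / n! times the marginal contribution of i.
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (v(set(S) | {i}) - v(set(S)))
    return phi

# Additive game: v(S) = sum of member weights, so each Shapley value equals the weight.
weights = [1.0, 2.0, 3.0]
print(shapley_values(3, lambda S: sum(weights[j] for j in S)))  # → [1.0, 2.0, 3.0]
```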
1317 | A Fast Optimistic Method for Monotone Variational Inequalities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study monotone variational inequalities that can arise as optimality conditions for constrained convex optimization or convex-concave minimax problems and propose a novel algorithm that uses only one gradient/operator evaluation and one projection onto the constraint set per iteration. |
Michael Sedlmayer; Dang-Khoa Nguyen; Radu Ioan Bot; |
1318 | Improving Hyperparameter Learning Under Approximate Inference in Gaussian Process Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We improve hyperparameter learning in GP models and focus on the interplay between variational inference (VI) and the learning target. |
Rui Li; S. T. John; Arno Solin; |
1319 | Understanding Self-Predictive Learning for Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on the theoretical insights, we propose bidirectional self-predictive learning, a novel self-predictive algorithm that learns two representations simultaneously. |
Yunhao Tang; Zhaohan Daniel Guo; Pierre Harvey Richemond; Bernardo Avila Pires; Yash Chandak; Remi Munos; Mark Rowland; Mohammad Gheshlaghi Azar; Charline Le Lan; Clare Lyle; András György; Shantanu Thakoor; Will Dabney; Bilal Piot; Daniele Calandriello; Michal Valko; |
1320 | Towards A Better Understanding of Representation Dynamics Under TD-learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we consider the question: how does end-to-end TD-learning impact the representation over time? Complementary to prior work, we provide a set of analyses that shed further light on the representation dynamics under TD-learning. |
Yunhao Tang; Remi Munos; |
1321 | VA-learning As A More Efficient Alternative to Q-learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce VA-learning, which directly learns advantage function and value function using bootstrapping, without explicit reference to Q-functions. |
Yunhao Tang; Remi Munos; Mark Rowland; Michal Valko; |
1322 | CRISP: Curriculum Based Sequential Neural Decoders for Polar Code Family Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by recent successes of data-driven channel decoders, we introduce a novel $\textbf{C}$ur$\textbf{RI}$culum based $\textbf{S}$equential neural decoder for $\textbf{P}$olar codes (CRISP). |
S Ashwin Hebbar; Viraj Vivek Nadkarni; Ashok Vardhan Makkuva; Suma Bhat; Sewoong Oh; Pramod Viswanath; |
1323 | Sliced-Wasserstein on Symmetric Positive Definite Matrices for M/EEG Signals Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Learning with these matrices requires the use of Riemannian geometry to account for their structure. In this paper, we propose a new method to deal with distributions of covariance matrices, and demonstrate its computational efficiency on M/EEG multivariate time series. |
Clément Bonet; Benoît Malézieux; Alain Rakotomamonjy; Lucas Drumetz; Thomas Moreau; Matthieu Kowalski; Nicolas Courty; |
1324 | On Uni-Modal Feature Learning in Supervised Multi-Modal Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose to choose a targeted late-fusion learning method for the given supervised multi-modal task from Uni-Modal Ensemble (UME) and the proposed Uni-Modal Teacher (UMT), according to the distribution of uni-modal and paired features. |
Chenzhuang Du; Jiaye Teng; Tingle Li; Yichen Liu; Tianyuan Yuan; Yue Wang; Yang Yuan; Hang Zhao; |
1325 | Revisiting Weighted Aggregation in Federated Learning with Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we revisit the weighted aggregation process and gain new insights into the training dynamics of FL. |
Zexi Li; Tao Lin; Xinyi Shang; Chao Wu; |
1326 | Why Do Nearest Neighbor Language Models Work? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we set out to understand why retrieval-augmented language models, and specifically k-nearest neighbor language models (kNN-LMs), perform better than standard parametric LMs, even when the k-nearest neighbor component retrieves examples from the same training set that the LM was originally trained on. |
Frank F. Xu; Uri Alon; Graham Neubig; |
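For context, the core mechanism a kNN-LM adds to a parametric LM is a simple interpolation of two next-token distributions. A minimal sketch (not the authors' code; `lam` is an assumed mixing weight):

```python
import numpy as np

def knn_lm_interpolate(p_lm, p_knn, lam=0.25):
    """Mix a parametric LM distribution with a retrieval-based kNN
    distribution, as in kNN-LMs: p = lam * p_knn + (1 - lam) * p_lm."""
    p_lm = np.asarray(p_lm, dtype=float)
    p_knn = np.asarray(p_knn, dtype=float)
    p = lam * p_knn + (1.0 - lam) * p_lm
    return p / p.sum()  # renormalize against numerical drift

p_lm = [0.7, 0.2, 0.1]   # next-token probabilities from the parametric LM
p_knn = [0.1, 0.8, 0.1]  # probabilities from the retrieved nearest neighbors
print(knn_lm_interpolate(p_lm, p_knn, lam=0.5))  # → [0.4 0.5 0.1]
```

The paper asks why this helps even when the retrieved examples come from the LM's own training set.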
1327 | Implicit Jacobian Regularization Weighted with Impurity of Probability Output Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate that SGD has an implicit regularization effect on the logit-weight Jacobian norm of neural networks. |
Sungyoon Lee; Jinseong Park; Jaewook Lee; |
1328 | Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of decentralized multi-agent reinforcement learning in Markov games. |
Dylan J Foster; Noah Golowich; Sham M. Kakade; |
1329 | SRATTA: Sample Re-ATTribution Attack of Secure Aggregation in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SRATTA, an attack relying only on aggregated models which, under realistic assumptions, (i) recovers data samples from the different clients, and (ii) groups data samples coming from the same client together. |
Tanguy Marchand; Regis Loeb; Ulysse Marteau-Ferey; Jean Ogier du Terrail; Arthur Pignet; |
1330 | Shape-Guided Dual-Memory Learning for 3D Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a shape-guided expert-learning framework to tackle the problem of unsupervised 3D anomaly detection. |
Yu-Min Chu; Liu Chieh; Ting-I Hsieh; Hwann-Tzong Chen; Tyng-Luh Liu; |
1331 | StriderNet: A Graph Reinforcement Learning Approach to Optimize Atomic Structures on Rough Energy Landscapes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present a graph reinforcement learning approach, StriderNet, that learns a policy to displace the atoms towards low energy configurations. |
Vaibhav Bihani; Sahil Manchanda; Srikanth Sastry; Sayan Ranu; N M Anoop Krishnan; |
1332 | A Deep Conjugate Direction Method for Iteratively Solving Linear Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel deep learning approach to approximate the solution of large, sparse, symmetric, positive-definite linear systems of equations. |
Ayano Kaneda; Osman Akar; Jingyu Chen; Victoria Alicia Trevino Kala; David Hyde; Joseph Teran; |
1333 | Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Banker Online Mirror Descent (Banker-OMD), a novel framework generalizing the classical Online Mirror Descent (OMD) technique in the online learning literature. |
Jiatai Huang; Yan Dai; Longbo Huang; |
1334 | HyperTuning: Toward Adapting Large Language Models Without Back-propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose HyperTuning, a novel approach to model adaptation that uses a hypermodel to generate task-specific parameters for a fixed downstream model. |
Jason Phang; Yi Mao; Pengcheng He; Weizhu Chen; |
1335 | Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While quite a few conditions have been proposed, there is little guidance on how to select conditions for a specific problem. To address this gap, we propose META Learning of Interface Conditions (METALIC), a simple, efficient yet powerful approach to dynamically determine appropriate interface conditions for solving a family of parametric PDEs. |
Shibo Li; Michael Penwarden; Yiming Xu; Conor Tillinghast; Akil Narayan; Robert Kirby; Shandian Zhe; |
1336 | Contextual Combinatorial Bandits with Probabilistically Triggered Arms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of contexts, $p_{\min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round. |
Xutong Liu; Jinhang Zuo; Siwei Wang; John C.S. Lui; Mohammad Hajiesmaili; Adam Wierman; Wei Chen; |
1337 | SAAL: Sharpness-Aware Active Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome overfitting, this paper introduces the first active learning method to incorporate the sharpness of loss space into the acquisition function. |
Yoon-Yeong Kim; Youngjae Cho; JoonHo Jang; Byeonghu Na; Yeongmin Kim; Kyungwoo Song; Wanmo Kang; Il-chul Moon; |
1338 | Towards Deep Attention in Graph Neural Networks: Problems and Remedies Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate some problematic phenomena related to deep graph attention, including vulnerability to over-smoothed features and smooth cumulative attention. |
Soo Yong Lee; Fanchen Bu; Jaemin Yoo; Kijung Shin; |
1339 | Understanding The Role of Feedback in Online Learning with Switching Costs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the role of feedback in online learning with switching costs. |
Duo Cheng; Xingyu Zhou; Bo Ji; |
1340 | Existence and Estimation of Critical Batch Size for Training Generative Adversarial Networks with Two Time-Scale Update Rule Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper studies the relationship between batch size and the number of steps needed for training GANs with TTURs based on constant learning rates. |
Naoki Sato; Hideaki Iiduka; |
1341 | Neural Prediction Errors Enable Analogical Visual Reasoning in Human Standard Intelligence Tests Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose a neural network model to solve Raven’s Progressive Matrices (RPM) – one of the standard intelligence tests in human psychology. |
Lingxiao Yang; Hongzhi You; Zonglei Zhen; Dahui Wang; Xiaohong Wan; Xiaohua Xie; Ru-Yuan Zhang; |
1342 | Live in The Moment: Learning Dynamics Model Adapted to Evolving Policy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous works learn a dynamics model that fits under the empirical state-action visitation distribution for all historical policies, i.e., the sample replay buffer. However, in this paper, we observe that fitting the dynamics model under the distribution for *all historical policies* does not necessarily benefit model prediction for the *current policy* since the policy in use is constantly evolving over time. |
Xiyao Wang; Wichayaporn Wongkamjan; Ruonan Jia; Furong Huang; |
1343 | Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we prove a novel result, showing that choosing the best stacked generalization from a (finite or finite-dimensional) family of stacked generalizations based on cross-validated performance does not perform much worse than the oracle best. |
Hilaf Hasson; Danielle C. Maddix; Bernie Wang; Gaurav Gupta; Youngsuk Park; |
1344 | Straightening Out The Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify the factors that contribute to this issue, including the codebook gradient sparsity and the asymmetric nature of the commitment loss, which leads to misaligned code-vector assignments. We propose to address this issue via affine re-parameterization of the code vectors. |
Minyoung Huh; Brian Cheung; Pulkit Agrawal; Phillip Isola; |
1345 | Mitigating Memorization of Noisy Labels By Clipping The Model Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, our key idea is to induce a loss bound at the logit level, thus universally enhancing the noise robustness of existing losses. |
Hongxin Wei; Huiping Zhuang; Renchunzi Xie; Lei Feng; Gang Niu; Bo An; Yixuan Li; 
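Our reading of the logit-level loss bound can be sketched as follows (a hedged illustration, not necessarily the authors' exact method): clamping the norm of the logit vector before the softmax cross-entropy bounds the per-sample loss by roughly $2\tau + \log K$, so a single noisy label cannot dominate training:

```python
import numpy as np

def clip_logits(z, tau=1.0):
    """Clamp the Euclidean norm of the logit vector to tau, which bounds
    the subsequent cross-entropy loss on any (possibly noisy) label."""
    z = np.asarray(z, dtype=float)
    norm = np.linalg.norm(z)
    return z if norm <= tau else tau * z / norm

def cross_entropy(z, y):
    z = z - z.max()                       # numerically stable log-softmax
    logp = z - np.log(np.exp(z).sum())
    return -logp[y]

z = np.array([10.0, -10.0, 0.0])          # overconfident logits, wrong label y=1
print(cross_entropy(clip_logits(z, tau=1.0), 1))  # bounded by 2*tau + log(3)
```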
1346 | Approximately Optimal Core Shapes for Tensor Decompositions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we introduce a novel Tucker packing problem, which we prove is NP-hard, and give a polynomial-time approximation scheme based on a reduction to the 2-dimensional knapsack problem with a matroid constraint. |
Mehrdad Ghadiri; Matthew Fahrbach; Gang Fu; Vahab Mirrokni; |
1347 | A Gromov–Wasserstein Geometric View of Spectrum-Preserving Graph Coarsening Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we consider a graph as an element on a metric space equipped with the Gromov–Wasserstein (GW) distance, and bound the difference between the distance of two graphs and their coarsened versions. |
Yifan Chen; Rentian Yao; Yun Yang; Jie Chen; |
1348 | Understanding The Impact of Adversarial Robustness on Accuracy Disparity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While it has long been empirically observed that adversarial robustness may be at odds with standard accuracy and may have further disparate impacts on different classes, it remains an open question to what extent such observations hold and how the class imbalance plays a role within. In this paper, we attempt to understand this question of accuracy disparity by taking a closer look at linear classifiers under a Gaussian mixture model. |
Yuzheng Hu; Fan Wu; Hongyang Zhang; Han Zhao; |
1349 | Revisiting Simple Regret: Fast Rates for Returning A Good Arm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we make significant progress on minimizing simple regret in both the data-rich ($T\ge n$) and data-poor ($T \le n$) regimes, where $n$ is the number of arms and $T$ is the number of samples. |
Yao Zhao; Connor Stephens; Csaba Szepesvari; Kwang-Sung Jun; |
1350 | Improved Online Conformal Prediction Via Strongly Adaptive Online Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the problem of uncertainty quantification via prediction sets, in an online setting where the data distribution may vary arbitrarily over time. |
Aadyot Bhatnagar; Huan Wang; Caiming Xiong; Yu Bai; |
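For background, the static split-conformal threshold that online conformal methods adapt over time can be computed in a few lines. A minimal sketch (not the paper's strongly adaptive algorithm):

```python
import numpy as np

def conformal_threshold(scores, alpha=0.1):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))/n empirical
    quantile of calibration nonconformity scores. A new point is covered
    iff its score is <= this threshold, giving >= 1-alpha marginal coverage
    under exchangeability."""
    n = len(scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, min(q, 1.0), method="higher")

cal = np.abs(np.random.default_rng(0).standard_normal(100))  # calibration scores
t = conformal_threshold(cal, alpha=0.1)
```

The paper's contribution is to retain such coverage-style guarantees when the data distribution shifts arbitrarily over time.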
1351 | Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose sample complexity bounds for learning a simplex from noisy samples. |
Seyed Amir Hossein Saberi; Amir Najafi; Abolfazl Motahari; Babak Khalaj; 
1352 | Analyzing Privacy Leakage in Machine Learning Via Multiple Hypothesis Testing: A Lesson From Fano Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study data reconstruction attacks for discrete data and analyze them under the framework of multiple hypothesis testing. |
Chuan Guo; Alexandre Sablayrolles; Maziar Sanjabi; |
1353 | Fast Combinatorial Algorithms for Min Max Correlation Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce fast algorithms for correlation clustering with respect to the Min Max objective that provide constant factor approximations on complete graphs. |
Sami Davies; Benjamin Moseley; Heather Newman; |
1354 | Grounding Language Models to Images for Multimodal Inputs and Outputs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an efficient method to ground pretrained text-only language models to the visual domain, enabling them to process arbitrarily interleaved image-and-text data, and generate text interleaved with retrieved images. |
Jing Yu Koh; Ruslan Salakhutdinov; Daniel Fried; |
1355 | Causal Proxy Models for Concept-based Model Explanations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that robust causal explainability methods can be created using approximate counterfactuals, which can be written by humans to approximate a specific counterfactual or simply sampled using metadata-guided heuristics. |
Zhengxuan Wu; Karel D’Oosterlinck; Atticus Geiger; Amir Zur; Christopher Potts; |
1356 | Scaling Laws for Reward Model Overoptimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we use a synthetic setup in which a fixed “gold-standard” reward model plays the role of humans, providing labels used to train a proxy reward model. |
Leo Gao; John Schulman; Jacob Hilton; |
1357 | Escaping Saddle Points in Zeroth-order Optimization: The Power of Two-point Estimators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that by adding an appropriate isotropic perturbation at each iteration, a zeroth-order algorithm based on $2m$ (for any $1 \leq m \leq d$) function evaluations per iteration can not only find $\epsilon$-second order stationary points polynomially fast, but do so using only $\tilde{O}(\frac{d}{m\epsilon^{2}\bar{\psi}})$ function evaluations, where $\bar{\psi} \geq \tilde{\Omega}(\sqrt{\epsilon})$ is a parameter capturing the extent to which the function of interest exhibits the strict saddle property. |
Zhaolin Ren; Yujie Tang; Na Li; |
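The two-point estimator the paper builds on is easy to sketch: sample a random unit direction, take a symmetric finite difference along it, and rescale by the dimension. A hedged illustration (not the paper's perturbed algorithm):

```python
import numpy as np

def two_point_grad(f, x, delta=1e-4, seed=None):
    """Zeroth-order gradient estimate from two function evaluations:
    g ≈ d * (f(x + delta*u) - f(x - delta*u)) / (2*delta) * u for a
    random unit direction u; the factor d makes it approximately unbiased."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    fd = (f(x + delta * u) - f(x - delta * u)) / (2 * delta)
    return x.size * fd * u

f = lambda x: float(x @ x)               # true gradient is 2x
x = np.array([1.0, -2.0, 0.5])
est = np.mean([two_point_grad(f, x, seed=s) for s in range(4000)], axis=0)
# est approaches the true gradient 2x as more directions are averaged
```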
1358 | SE(3) Diffusion Model with Application to Protein Backbone Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, there exists no principled methodological framework for diffusion on SE(3), the space of orientation-preserving rigid motions in $\mathbb{R}^3$, that operates on frames and confers the group invariance. We address these shortcomings by developing theoretical foundations of SE(3) invariant diffusion models on multiple frames followed by a novel framework, FrameDiff, for estimating the SE(3) equivariant score over multiple frames. |
Jason Yim; Brian L. Trippe; Valentin De Bortoli; Emile Mathieu; Arnaud Doucet; Regina Barzilay; Tommi S. Jaakkola; |
1359 | FeDXL: Provable Federated Learning for Deep X-Risk Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we tackle a novel federated learning (FL) problem for optimizing a family of X-risks, to which no existing FL algorithms are applicable. |
Zhishuai Guo; Rong Jin; Jiebo Luo; Tianbao Yang; |
1360 | On The Correctness of Automatic Differentiation for Neural Networks with Machine-Representable Parameters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the correctness of AD when the parameter space of a neural network consists solely of machine-representable numbers. |
Wonyeol Lee; Sejun Park; Alex Aiken; |
1361 | What Can Online Reinforcement Learning with Function Approximation Benefit from General Coverage Conditions? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In online reinforcement learning (RL), instead of employing standard structural assumptions on Markov decision processes (MDPs), using a certain coverage condition (originally from offline RL) is enough to ensure sample-efficient guarantees (Xie et al. 2023). In this work, we focus on this new direction by exploring more general coverage conditions, and study their potential and utility in efficient online RL. |
Fanghui Liu; Luca Viano; Volkan Cevher; |
1362 | Graph Mixup with Soft Alignments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose S-Mixup, a simple yet effective mixup method for graph classification by soft alignments. |
Hongyi Ling; Zhimeng Jiang; Meng Liu; Shuiwang Ji; Na Zou; |
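S-Mixup adapts mixup to graph classification; the vector-space version it generalizes is just a convex combination of two examples and their labels. A minimal sketch (assumed Beta prior on the mixing coefficient, not the paper's graph-alignment procedure):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, seed=None):
    """Classic input mixup: draw lam ~ Beta(alpha, alpha) and convexly
    combine inputs and one-hot labels. S-Mixup lifts this idea to graphs
    by softly aligning the nodes of the two graphs first."""
    lam = np.random.default_rng(seed).beta(alpha, alpha)
    x = lam * np.asarray(x1, dtype=float) + (1 - lam) * np.asarray(x2, dtype=float)
    y = lam * np.asarray(y1, dtype=float) + (1 - lam) * np.asarray(y2, dtype=float)
    return x, y, lam
```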
1363 | Transformers Meet Directed Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose two direction- and structure-aware positional encodings for directed graphs: (1) the eigenvectors of the Magnetic Laplacian — a direction-aware generalization of the combinatorial Laplacian; (2) directional random walk encodings. |
Simon Geisler; Yujia Li; Daniel J Mankowitz; Ali Taylan Cemgil; Stephan Günnemann; Cosmin Paduraru; |
1364 | Unveiling The Mask of Position-Information Pattern Through The Mist of Image Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing metrics for quantifying the strength of positional information remain unreliable and frequently lead to erroneous results. To address this issue, we propose novel metrics for measuring and visualizing the encoded positional information. |
Chieh Hubert Lin; Hung-Yu Tseng; Hsin-Ying Lee; Maneesh Kumar Singh; Ming-Hsuan Yang; |
1365 | The Fast Johnson-Lindenstrauss Transform Is Even Faster Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we give a surprising new analysis of the Fast JL transform, showing that the $k \ln^2 n$ term in the embedding time can be improved to $(k \ln^2 n)/\alpha$ for an $\alpha = \Omega(\min\{\varepsilon^{-1}\ln(1/\varepsilon), \ln n\})$. |
Ora Nova Fandina; Mikael Møller Høgsgaard; Kasper Green Larsen; |
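The guarantee being sped up here is the classical Johnson-Lindenstrauss embedding: projecting to $k$ dimensions approximately preserves pairwise distances. The sketch below uses a dense Gaussian matrix to show the guarantee; the Fast JL transform analyzed in the paper achieves it with a faster Hadamard-based construction:

```python
import numpy as np

def jl_embed(X, k, seed=0):
    """Embed the rows of X into k dimensions with a dense Gaussian matrix
    scaled by 1/sqrt(k), so Euclidean distances are preserved up to a
    small relative error with high probability."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ G

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 1000))
Y = jl_embed(X, k=256)
# Pairwise distances among the rows of Y closely track those of X.
```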
1366 | Perturbation Analysis of Neural Collapse Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, with practical networks and datasets, the features typically do not reach exact collapse, e.g., because deep layers cannot arbitrarily modify intermediate features that are far from being collapsed. In this paper, we propose a richer model that can capture this phenomenon by forcing the features to stay in the vicinity of a predefined features matrix (e.g., intermediate features). |
Tom Tirer; Haoxiang Huang; Jonathan Niles-Weed; |
1367 | Stabilizing Transformer Training By Preventing Attention Entropy Collapse Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the training dynamics of Transformers by examining the evolution of the attention layers. |
Shuangfei Zhai; Tatiana Likhomanenko; Etai Littwin; Dan Busbridge; Jason Ramapuram; Yizhe Zhang; Jiatao Gu; Joshua M. Susskind; |
1368 | Normalizing Flows for Interventional Density Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we estimate the density of potential outcomes after interventions from observational data. |
Valentyn Melnychuk; Dennis Frauen; Stefan Feuerriegel; |
1369 | IncDSI: Incrementally Updatable Document Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose IncDSI, a method to add documents in real time (about 20-50ms per document), without retraining the model on the entire dataset (or even parts thereof). |
Varsha Kishore; Chao Wan; Justin Lovelace; Yoav Artzi; Kilian Q Weinberger; |
1370 | Efficient Approximations of Complete Interatomic Potentials for Crystal Property Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current methods construct graphs by establishing edges only between nearby nodes, thereby failing to faithfully capture infinite repeating patterns and distant interatomic interactions. In this work, we propose several innovations to overcome these limitations. |
Yuchao Lin; Keqiang Yan; Youzhi Luo; Yi Liu; Xiaoning Qian; Shuiwang Ji; |
1371 | The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose $\textsf{ScaledGD($\lambda$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. |
Xingyu Xu; Yandi Shen; Yuejie Chi; Cong Ma; |
1372 | Tighter Bounds on The Expressivity of Transformer Encoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Bhattamishra and others have shown that transformer encoders are at least as expressive as a certain kind of counter machine, while Merrill and Sabharwal have shown that fixed-precision transformer encoders recognize only languages in uniform $TC^0$. We connect and strengthen these results by identifying a variant of first-order logic with counting quantifiers that is simultaneously an upper bound for fixed-precision transformer encoders and a lower bound for transformer encoders. |
David Chiang; Peter Cholak; Anand Pillay; |
1373 | Learning Distributions Over Quantum Measurement Outcomes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider a general problem of learning properties from quantum states: given an unknown $d$-dimensional quantum state $\rho$ and $M$ unknown quantum measurements $\mathcal{M}_1,…,\mathcal{M}_M$ with $K\geq 2$ outcomes, estimate the probability distribution of the outcomes of applying $\mathcal{M}_i$ to $\rho$ to within total variation distance $\epsilon$. |
Weiyuan Gong; Scott Aaronson; |
1374 | DDGR: Continual Learning with Deep Diffusion-based Generative Replay Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, most generative replay methods typically reuse the generated samples to update the generator, which causes the samples regenerated by the generator to deviate from the distribution of previous tasks. To overcome these two issues, we propose a novel approach, called deep diffusion-based generative replay (DDGR), which adopts a diffusion model as the generator and calculates an instruction-operator through the classifier to instruct the generation of samples. |
Rui Gao; Weiwei Liu; |
1375 | Efficiently Predicting High Resolution Mass Spectra with Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current approaches to spectrum prediction model the output space in ways that force a tradeoff between capturing high resolution mass information and tractable learning. We resolve this tradeoff by casting spectrum prediction as a mapping from an input molecular graph to a probability distribution over chemical formulas. |
Michael Murphy; Stefanie Jegelka; Ernest Fraenkel; Tobias Kind; David Healey; Thomas Butler; |
1376 | Model-Aware Contrastive Learning: Towards Escaping The Dilemmas Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We argue that the fixity of temperature is to blame for UTD. To tackle this challenge, we enrich the CL loss family by presenting a Model-Aware Contrastive Learning (MACL) strategy, whose temperature is adaptive to the magnitude of alignment that reflects the basic confidence of the instance discrimination task, then enables CL loss to adjust the penalty strength for hard negatives adaptively. |
Zizheng Huang; Haoxing Chen; Ziqi Wen; Chao Zhang; Huaxiong Li; Bo Wang; Chunlin Chen; |
1377 | ConCerNet: A Contrastive Learning Based Framework for Automated Conservation Law Discovery and Trustworthy Dynamical System Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a new learning framework named $\textbf{ConCerNet}$ to improve the trustworthiness of DNN-based dynamics modeling by endowing it with invariant properties. |
Wang Zhang; Tsui-Wei Weng; Subhro Das; Alexandre Megretski; Luca Daniel; Lam M. Nguyen; |
1378 | Coordinated Dynamic Bidding in Repeated Second-Price Auctions with Budgets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose algorithms that guarantee every client a higher utility than the best she can get under independent bidding. |
Yurong Chen; Qian Wang; Zhijian Duan; Haoran Sun; Zhaohua Chen; Xiang Yan; Xiaotie Deng; |
1379 | Wasserstein Barycenter Matching for Graph Size Generalization of Message Passing Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the uncontrollable convergence rate caused by correlations across nodes in the underlying dimensional signal-generating space, we propose to use Wasserstein barycenters as graph-level consensus to combat node-level correlations. |
Xu Chu; Yujie Jin; Xin Wang; Shanghang Zhang; Yasha Wang; Wenwu Zhu; Hong Mei; |
1380 | OpenFE: Automated Feature Generation with Expert-level Performance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present OpenFE, an automated feature generation tool that provides competitive results against machine learning experts. |
Tianping Zhang; Zheyu Zhang; Zhiyuan Fan; Haoyan Luo; Fengyuan Liu; Qian Liu; Wei Cao; Jian Li; |
1381 | Random Grid Neural Processes for Parametric Partial Differential Equations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new class of spatially stochastic physics and data informed deep latent models for parametric partial differential equations (PDEs) which operate through scalable variational neural processes. |
Arnaud Vadeboncoeur; Ieva Kazlauskaite; Yanni Papandreou; Fehmi Cirak; Mark Girolami; Omer Deniz Akyildiz; |
1382 | PreNAS: Preferred One-Shot Learning Towards Efficient Neural Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present PreNAS, a search-free NAS approach that accentuates target models in one-shot training. |
Haibin Wang; Ce Ge; Hesen Chen; Xiuyu Sun; |
1383 | Enhancing Activity Prediction Models in Drug Discovery with The Ability to Understand Human Language Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we envision a novel type of activity prediction model that is able to adapt to new prediction tasks at inference time, via understanding textual information describing the task. |
Philipp Seidl; Andreu Vall; Sepp Hochreiter; Günter Klambauer; |
1384 | Semi-Parametric Contextual Pricing Algorithm Using Cox Proportional Hazards Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A challenge is that customer valuation is almost never observable in practice and is instead *type-I interval censored* by the offered price. To address this challenge, we propose a novel semi-parametric contextual pricing algorithm for stochastic contexts, called the epoch-based Cox proportional hazards Contextual Pricing (CoxCP) algorithm. |
Young-Geun Choi; Gi-Soo Kim; Yunseo Choi; Wooseong Cho; Myunghee Cho Paik; Min-hwan Oh; |
1385 | Dual Focal Loss for Calibration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In our work, we propose a new loss function by focusing on dual logits. |
Linwei Tao; Minjing Dong; Chang Xu; |
1386 | On The Identifiability and Estimation of Causal Location-Scale Noise Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the generality of the model class, we show the causal direction is identifiable up to some pathological cases. To empirically validate these theoretical findings, we propose two estimators for LSNMs: an estimator based on (non-linear) feature maps, and one based on neural networks. |
Alexander Immer; Christoph Schultheiss; Julia E Vogt; Bernhard Schölkopf; Peter Bühlmann; Alexander Marx; |
1387 | PWSHAP: A Path-Wise Explanation Model for Targeted Variables Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Path-Wise Shapley effects (PWSHAP), a framework for assessing the targeted effect of a binary (e.g. treatment) variable from a complex outcome model. |
Lucile Ter-Minassian; Oscar Clivio; Karla DiazOrdaz; Robin J. Evans; Christopher C. Holmes; |
1388 | ModelDiff: A Framework for Comparing Learning Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the problem of (learning) algorithm comparison, where the goal is to find differences between models trained with two different learning algorithms. |
Harshay Shah; Sung Min Park; Andrew Ilyas; Aleksander Madry; |
1389 | Shedding A PAC-Bayesian Light on Adaptive Sliced-Wasserstein Distances Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, the literature on its statistical properties — or, more accurately, its generalization properties — with respect to the distribution of slices, beyond the uniform measure, is scarce. To bring new contributions to this line of research, we leverage the PAC-Bayesian theory and a central observation that SW may be interpreted as an average risk, the quantity PAC-Bayesian bounds have been designed to characterize. |
Ruben Ohana; Kimia Nadjahi; Alain Rakotomamonjy; Liva Ralaivola; |
1390 | GAT: Guided Adversarial Training with Pareto-optimal Auxiliary Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate the costs, we propose *Guided Adversarial Training* (GAT), a novel adversarial training technique that exploits auxiliary tasks under a limited set of training data. |
Salah Ghamizi; Jingfeng Zhang; Maxime Cordy; Mike Papadakis; Masashi Sugiyama; Yves Le Traon;
1391 | Cold Analysis of Rao-Blackwellized Straight-Through Gumbel-Softmax Gradient Estimator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The state of the art in this family, the Gumbel-Rao estimator, uses an extra internal sampling step to reduce the variance, which may be costly. We analyze this estimator and show that it possesses a zero-temperature limit with a surprisingly simple closed form. |
Alexander Shekhovtsov; |
1392 | End-to-End Full-Atom Antibody Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: There are two major defects in current learning-based methods: 1) tackling only a certain subtask of the whole antibody design pipeline, making them suboptimal or resource-intensive; and 2) omitting either the framework regions or side chains, thus being incapable of capturing the full-atom geometry. To address these pitfalls, we propose the dynamic Multi-channel Equivariant grAph Network (dyMEAN), an end-to-end full-atom model for E(3)-equivariant antibody design given the epitope and the incomplete sequence of the antibody. |
Xiangzhe Kong; Wenbing Huang; Yang Liu; |
1393 | Efficient Personalized Federated Learning Via Sparse Model-Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The capacity and efficiency of personalized models are restricted by the lowest-resource clients, leading to sub-optimal performance and limited practicality of personalized FL. To overcome these challenges, we propose a novel approach named pFedGate for efficient personalized FL by adaptively and efficiently learning sparse local models. |
Daoyuan Chen; Liuyi Yao; Dawei Gao; Bolin Ding; Yaliang Li; |
1394 | Disentangled Multiplex Graph Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that it is essential for conducting effective and robust UMGRL to extract complete and clean common information, as well as more-complementary and less-noisy private information. |
Yujie Mo; Yajie Lei; Jialie Shen; Xiaoshuang Shi; Heng Tao Shen; Xiaofeng Zhu; |
1395 | Byzantine-Robust Learning on Heterogeneous Data Via Gradient Splitting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we first reveal the root causes of performance degradation of current robust AGRs in non-IID settings: the curse of dimensionality and gradient heterogeneity. In order to address this issue, we propose GAS, a GrAdient Splitting approach that can successfully adapt existing robust AGRs to non-IID settings. |
Yuchen Liu; Chen Chen; Lingjuan Lyu; Fangzhao Wu; Sai Wu; Gang Chen; |
1396 | Personalized Federated Learning with Inferred Collaboration Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this issue, our core idea is to learn a collaboration graph, which models the benefits from each pairwise collaboration and allocates appropriate collaboration strengths. Based on this, we propose a novel personalized FL algorithm, pFedGraph, which consists of two key modules: (1) inferring the collaboration graph based on pairwise model similarity and dataset size at server to promote fine-grained collaboration and (2) optimizing local model with the assistance of aggregated model at client to promote personalization. |
Rui Ye; Zhenyang Ni; Fangzhao Wu; Siheng Chen; Yanfeng Wang; |
1397 | Progressive Purification for Instance-Dependent Partial Label Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a theoretically grounded and practically effective approach named POP, i.e. PrOgressive Purification for instance-dependent partial label learning, is proposed. |
Ning Xu; Biao Liu; Jiaqi Lv; Congyu Qiao; Xin Geng; |
1398 | SNeRL: Semantic-aware Neural Radiance Fields for Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Semantic-aware Neural Radiance Fields for Reinforcement Learning (SNeRL), which jointly optimizes semantic-aware neural radiance fields (NeRF) with a convolutional encoder to learn 3D-aware neural implicit representation from multi-view images. |
Dongseok Shim; Seungjae Lee; H. Jin Kim; |
1399 | Constrained Efficient Global Optimization of Expensive Black-box Functions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose CONFIG (CONstrained efFIcient Global Optimization), a simple and effective algorithm to solve it. |
Wenjie Xu; Yuning Jiang; Bratislav Svetozarevic; Colin Jones; |
1400 | How Powerful Are Shallow Neural Networks with Bandlimited Random Weights? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the well-known fact that a neural network is a universal approximator, in this study, we mathematically show that when hidden parameters are distributed in a bounded domain, the network may not achieve zero approximation error. |
Ming Li; Sho Sonoda; Feilong Cao; Yu Guang Wang; Jiye Liang; |
1401 | FusionRetro: Molecule Representation Fusion Via In-Context Learning for Retrosynthetic Planning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel framework that utilizes context information for improved retrosynthetic planning. |
Songtao Liu; Zhengkai Tu; Minkai Xu; Zuobai Zhang; Lu Lin; Zhitao Ying; Jian Tang; Peilin Zhao; Dinghao Wu; |
1402 | FedBR: Improving Federated Learning on Heterogeneous Data Via Local Learning Bias Reduction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, such a scheme is currently constrained by slow and unstable convergence due to the variety of data on different clients’ devices. In this work, we identify three under-explored phenomena of biased local learning that may explain these challenges caused by local updates in supervised FL. |
Yongxin Guo; Xiaoying Tang; Tao Lin; |
1403 | The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the implicit regularization of stochastic gradient descent (SGD) through the lens of *dynamical stability* (Wu et al., 2018). |
Lei Wu; Weijie J Su; |
1404 | Conformal Inference Is (almost) Free for Neural Networks Trained with Early Stopping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Models trained with early stopping often provide relatively accurate predictions, but they generally still lack precise statistical guarantees unless they are further calibrated using independent hold-out data. This paper addresses the above limitation with conformalized early stopping: a novel method that combines early stopping with conformal calibration while efficiently recycling the same hold-out data. |
Ziyi Liang; Yanfei Zhou; Matteo Sesia; |
1405 | Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning Via Class-Imbalance Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Compared with engaging all the available clients, such a random-selection mechanism could lead to significant performance degradation on non-IID (not independent and identically distributed) data. In this paper, we present our key observation that the essential reason for such performance degradation is the class imbalance of the grouped data from randomly selected clients. |
Jianyi Zhang; Ang Li; Minxue Tang; Jingwei Sun; Xiang Chen; Fan Zhang; Changyou Chen; Yiran Chen; Hai Li; |
1406 | Orthogonality-Enforced Latent Space in Autoencoders: An Approach to Learning Disentangled Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Noting the importance of factorizing (or disentangling) the latent space, we propose a novel, non-probabilistic disentangling framework for autoencoders, based on the principles of symmetry transformations that are independent of one another. |
Jaehoon Cha; Jeyan Thiyagalingam; |
1407 | Building Neural Networks on Matrix Manifolds: A Gyrovector Space Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to the lack of many concepts in gyrovector spaces for the considered manifolds, e.g., the inner product and gyroangles, techniques and mathematical tools provided by these works are still limited compared to those developed for studying hyperbolic geometry. In this paper, we generalize some notions in gyrovector spaces for SPD and Grassmann manifolds, and propose new models and layers for building neural networks on these manifolds. |
Xuan Son Nguyen; Shuo Yang; |
1408 | When and How Does Known Class Help Discover Unknown Ones? Provable Understanding Through Spectral Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Tailored to the NCD problem, we introduce a graph-theoretic representation that can be learned by a novel NCD Spectral Contrastive Loss (NSCL). |
Yiyou Sun; Zhenmei Shi; Yingyu Liang; Yixuan Li; |
1409 | Federated Adversarial Learning: A Framework with Convergence Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we formulate a general form of federated adversarial learning (FAL) that is adapted from adversarial learning in the centralized setting. |
Xiaoxiao Li; Zhao Song; Jiaming Yang; |
1410 | Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback, featuring a combination of mirror-descent and least squares policy evaluation in an auxiliary MDP used to compute exploration bonuses. |
Uri Sherman; Tomer Koren; Yishay Mansour; |
1411 | Bayesian Online Change Point Detection with Hilbert Space Approximate Student-t Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a variant of Bayesian online change point detection with a reduced-rank Student-t process (TP) and dependent Student-t noise, as a nonparametric time series model. |
Jeremy Sellier; Petros Dellaportas; |
1412 | Graph Positional Encoding Via Random Feature Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Surprisingly, however, there is still no clear understanding of the relation between these two augmentation schemes. Here we propose a novel family of positional encoding schemes which draws a link between the above two approaches and improves over both. |
Moshe Eliasof; Fabrizio Frasca; Beatrice Bevilacqua; Eran Treister; Gal Chechik; Haggai Maron; |
1413 | Averaged Method of Multipliers for Bi-Level Optimization Without Lower-Level Strong Convexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, by averaging the upper and lower level objectives, we propose a single loop Bi-level Averaged Method of Multipliers (sl-BAMM) for BLO that is simple yet efficient for large-scale BLO and gets rid of the limited LLSC restriction. |
Risheng Liu; Yaohua Liu; Wei Yao; Shangzhi Zeng; Jin Zhang; |
1414 | Adaptive Computation with Elastic Input Sequence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a new approach called AdaTape, which allows for dynamic computation in neural networks through adaptive tape tokens. |
Fuzhao Xue; Valerii Likhosherstov; Anurag Arnab; Neil Houlsby; Mostafa Dehghani; Yang You; |
1415 | Policy Contrastive Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since the AIL discriminator is trained via binary classification that does not necessarily discriminate the policy from the expert in a meaningful way, the resulting reward might not be meaningful either. We propose a new method called Policy Contrastive Imitation Learning (PCIL) to resolve this issue. |
Jialei Huang; Zhao-Heng Yin; Yingdong Hu; Yang Gao; |
1416 | CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Comprehensive Attention Benchmark (CAB) under a fine-grained attention taxonomy with four distinguishable attention patterns, namely, noncausal self, causal self, noncausal cross, and causal cross attentions. |
Jun Zhang; Shuyang Jiang; Jiangtao Feng; Lin Zheng; Lingpeng Kong; |
1417 | Robust One-Class Classification with Signed Distance Function Using 1-Lipschitz Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new method, dubbed One Class Signed Distance Function (OCSDF), to perform One Class Classification (OCC) by provably learning the Signed Distance Function (SDF) to the boundary of the support of any distribution. |
Louis Béthune; Paul Novello; Guillaume Coiffier; Thibaut Boissin; Mathieu Serrurier; Quentin Vincenot; Andres Troya-Galvis;
1418 | Set-membership Belief State-based Reinforcement Learning for POMDPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel algorithm called Set-membership Belief state-based Reinforcement Learning (SBRL), which consists of two parts: a Set-membership Belief state learning Model (SBM) for learning bounded belief state sets and an RL controller for making decisions based on SBM. |
Wei Wei; Lijun Zhang; Lin Li; Huizhong Song; Jiye Liang; |
1419 | Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider contextual bandits with general function approximation and propose a computationally efficient algorithm to achieve a regret of $\tilde{\mathcal O}(\sqrt{T}+\zeta)$. |
Chenlu Ye; Wei Xiong; Quanquan Gu; Tong Zhang; |
1420 | Accuracy on The Curve: On The Nonlinear Correlation of ML Performance Between Data Subpopulations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through rigorous experimentation and analysis across a variety of datasets, models, and training epochs, we demonstrate that OOD performance often has a nonlinear correlation with ID performance in subpopulation shifts. |
Weixin Liang; Yining Mao; Yongchan Kwon; Xinyu Yang; James Zou; |
1421 | Robust Satisficing MDPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes robust satisficing MDPs (RSMDPs), where the expected returns of feasible policies are softly-constrained to achieve a user-specified target under ambiguity. |
Haolin Ruan; Siyu Zhou; Zhi Chen; Chin Pang Ho; |
1422 | Off-Policy Evaluation for Large Action Spaces Via Conjunct Effect Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action spaces where conventional importance-weighting approaches suffer from excessive variance. To circumvent this variance issue, we propose a new estimator, called *OffCEM*, that is based on the *conjunct effect model* (CEM), a novel decomposition of the causal effect into a cluster effect and a residual effect. |
Yuta Saito; Qingyang Ren; Thorsten Joachims; |
1423 | Beyond Homophily: Reconstructing Structure for Graph-agnostic Clustering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, clustering on real-world graphs with various levels of homophily poses a new challenge to the graph research community. To fill this gap, we propose a novel graph clustering method, which contains three key components: graph reconstruction, a mixed filter, and a dual graph clustering network. |
Erlin Pan; Zhao Kang;
1424 | Poisoning Language Models During Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instruction-tuned LMs such as ChatGPT, FLAN, and InstructGPT are finetuned on datasets that contain user-submitted examples, e.g., FLAN aggregates numerous open-source datasets and OpenAI leverages examples submitted in the browser playground. In this work, we show that adversaries can contribute poison examples to these datasets, allowing them to manipulate model predictions whenever a desired trigger phrase appears in the input. |
Alexander Wan; Eric Wallace; Sheng Shen; Dan Klein; |
1425 | Learning to Optimize Differentiable Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing algorithms for solving games suffer from empirical instability, hence demanding heavy ad-hoc tuning in practice. To tackle these challenges, we resort to the emerging scheme of Learning to Optimize (L2O), which discovers problem-specific efficient optimization algorithms through data-driven training. |
Xuxi Chen; Nelson Vadori; Tianlong Chen; Zhangyang Wang; |
1426 | Decentralized SGD and Average-direction SAM Are Asymptotically Equivalent Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we challenge the conventional belief and present a completely new perspective for understanding decentralized learning. |
Tongtian Zhu; Fengxiang He; Kaixuan Chen; Mingli Song; Dacheng Tao; |
1427 | A Closer Look at The Intervention Procedure of Concept Bottleneck Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While such intervenability provides a powerful avenue of control, many aspects of the intervention procedure remain rather unexplored. In this work, we develop various ways of selecting intervening concepts to improve the intervention effectiveness and conduct an array of in-depth analyses as to how they evolve under different circumstances. |
Sungbin Shin; Yohan Jo; Sungsoo Ahn; Namhoon Lee; |
1428 | Regression with Label Permutation in Generalized Linear Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a relatively complete analysis of label permutation problem for the generalized linear model with multivariate responses. |
Guanhua Fang; Ping Li; |
1429 | A Coupled Flow Approach to Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate applications of a normalizing flow based model for the aforementioned distributions. |
Gideon Joseph Freund; Elad Sarafian; Sarit Kraus; |
1430 | Surrogate Model Extension (SME): A Fast and Accurate Weight Update Attack on Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a principled way to extend gradient inversion attacks to weight updates in FL, thereby better exposing weaknesses in the presumed privacy protection inherent in FL. |
Junyi Zhu; Ruicong Yao; Matthew B. Blaschko; |
1431 | Reliable Measures of Spread in High Dimensional Latent Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate that the commonly used measures of data spread, average cosine similarity and a partition function min/max ratio I(V), do not provide reliable metrics to compare the use of latent space across data distributions. We propose and examine six alternative measures of data spread, all of which improve over these current metrics when applied to seven synthetic data distributions. |
Anna Marbut; Katy McKinney-Bock; Travis J Wheeler; |
1432 | Adaptive Annealed Importance Sampling with Constant Rate Progress Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we prove that the geometric annealing corresponds to the distribution path that minimizes the KL divergence between the current particle distribution and the desired target when the feasible change in the particle distribution is constrained. |
Shirin Goshtasbpour; Victor Cohen; Fernando Perez-Cruz; |
1433 | Large Language Models Struggle to Learn Long-Tail Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the relationship between the knowledge memorized by large language models and the information in pre-training datasets scraped from the web. |
Nikhil Kandpal; Haikang Deng; Adam Roberts; Eric Wallace; Colin Raffel; |
1434 | Neural Diffusion Processes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Neural network approaches for meta-learning distributions over functions have desirable properties such as increased flexibility and a reduced complexity of inference. Building on the successes of denoising diffusion models for generative modelling, we propose Neural Diffusion Processes (NDPs), a novel approach that learns to sample from a rich distribution over functions through its finite marginals. |
Vincent Dutordoir; Alan Saul; Zoubin Ghahramani; Fergus Simpson; |
1435 | Diversity-enhancing Generative Network for Few-shot Hypothesis Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a diversity-enhancing generative network (DEG-Net) for the FHA problem, which can generate diverse unlabeled data with the help of a kernel independence measure: the Hilbert-Schmidt independence criterion (HSIC). |
Ruijiang Dong; Feng Liu; Haoang Chi; Tongliang Liu; Mingming Gong; Gang Niu; Masashi Sugiyama; Bo Han; |
1436 | Target-Aware Generative Augmentations for Single-Shot Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the problem of adapting models from a source domain to a target domain, a task that has become increasingly important due to the brittle generalization of deep neural networks. |
Kowshik Thopalli; Rakshith Subramanyam; Pavan K. Turaga; Jayaraman J. Thiagarajan; |
1437 | Constant Matters: Fine-grained Error Bound on Differentially Private Continual Observation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our main insight is that the matrix mechanism when using lower-triangular matrices can be used in the continual observation model. |
Hendrik Fichtenberger; Monika Henzinger; Jalaj Upadhyay; |
1438 | PixelAsParam: A Gradient View on Diffusion Sampling with Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the current guidance on denoising processes suffers from the trade-off between diversity, image quality, and conditional information. In this work, we propose to view this guidance sampling process from a gradient view, where image pixels are treated as parameters being optimized, and each mathematical term in the sampling process represents one update direction. |
Anh-Dung Dinh; Daochang Liu; Chang Xu; |
1439 | Rigid Body Flows for Sampling Molecular Crystal Structures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a new type of normalizing flow that is tailored for modeling positions and orientations of multiple objects in three-dimensional space, such as molecules in a crystal. |
Jonas Köhler; Michele Invernizzi; Pim De Haan; Frank Noe; |
1440 | On The Power of Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This question cannot be answered by the existing theory of representation, optimization or generalization, because the issues they mainly investigate are assumed to be nonexistent here. In this paper, we show that category theory provides powerful machinery to answer this question. |
Yang Yuan; |
1441 | Detecting Adversarial Data By Probing Multiple Perturbations Using Expected Perturbation Score Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new statistic called expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations. |
Shuhai Zhang; Feng Liu; Jiahao Yang; Yifan Yang; Changsheng Li; Bo Han; Mingkui Tan; |
1442 | CircuitNet: A Generic Neural Network to Realize Universal Circuit Motif Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new type of neural network inspired by the architectures of neuronal circuits, namely Circuit Neural Network (CircuitNet). |
Yansen Wang; Xinyang Jiang; Kan Ren; Caihua Shan; Xufang Luo; Dongqi Han; Kaitao Song; Yifei Shen; Dongsheng Li;
1443 | Controlled Differential Equations on Long Sequences Via Non-standard Wavelets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For tasks where it is sensible to assume that the (long) sequences in the training data are of a fixed length of temporal measurements — this assumption holds in most experiments tackled in the literature — we describe an efficient simplification. |
Sourav Pal; Zhanpeng Zeng; Sathya N. Ravi; Vikas Singh; |
1444 | InfoOT: Information Maximizing Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, it ignores coherence structure in the data such as clusters, does not handle outliers well, and cannot integrate new data points. To address these drawbacks, we propose InfoOT, an information-theoretic extension of optimal transport that maximizes the mutual information between domains while minimizing geometric distances. |
Ching-Yao Chuang; Stefanie Jegelka; David Alvarez-Melis; |
1445 | On The Connection Between MPNN and Graph Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the inverse connection and show that MPNN with virtual node (VN), a commonly used heuristic with little theoretical understanding, is powerful enough to arbitrarily approximate the self-attention layer of GT. |
Chen Cai; Truong Son Hy; Rose Yu; Yusu Wang; |
1446 | Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem, whose goal is to recover a low-rank ground-truth matrix from near-isotropic linear measurements. |
Jikai Jin; Zhiyuan Li; Kaifeng Lyu; Simon Shaolei Du; Jason D. Lee; |
1447 | Interval Bound Interpolation for Few-shot Learning with Few Tasks Highlight: To this end, we introduce the notion of interval bounds from the provably robust training literature to few-shot learning. |
Shounak Datta; Sankha Subhra Mullick; Anish Chakrabarty; Swagatam Das; |
1448 | NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning Highlight: In this paper, we investigate the neural tangent kernel (NTK), which reveals the gradient descent dynamics of neural networks, of the multilayer perceptron (MLP) modules in a PLM, and propose to construct a lightweight PLM through NTK-approximating MLP fusion. |
Tianxin Wei; Zeming Guo; Yifan Chen; Jingrui He; |
1449 | Are Equivariant Equilibrium Approximators Beneficial? Highlight: In this paper, we theoretically characterize the benefits and limitations of equivariant equilibrium approximators. |
Zhijian Duan; Yunxuan Ma; Xiaotie Deng; |
1450 | Everyone’s Preference Changes Differently: A Weighted Multi-Interest Model For Retrieval Highlight: In this paper, we propose the Multi-Interest Preference (MIP) model, an approach that not only produces multi-interest for users by using the user’s sequential engagement more effectively but also automatically learns a set of weights to represent the preference over each embedding so that the candidates can be retrieved from each interest proportionally. |
Hui Shi; Yupeng Gu; Yitong Zhou; Bo Zhao; Sicun Gao; Jishen Zhao; |
1451 | ChiPFormer: Transferable Chip Placement Via Offline Decision Transformer Highlight: However, such an RL-based approach suffers from long training time and low transfer ability in unseen chip circuits. To resolve these challenges, we cast the chip placement as an offline RL formulation and present ChiPFormer that enables learning a transferable placement policy from fixed offline data. |
Yao Lai; Jinxin Liu; Zhentao Tang; Bin Wang; Jianye HAO; Ping Luo; |
1452 | MetricGAN-OKD: Multi-Metric Optimization of MetricGAN Via Online Knowledge Distillation for Speech Enhancement Highlight: In this paper, we propose an effective multi-metric optimization method in MetricGAN via online knowledge distillation—MetricGAN-OKD. |
Wooseok Shin; Byung Hoon Lee; Jin Sob Kim; Hyun Joon Park; Sung Won Han; |
1453 | Learning to Maximize Mutual Information for Dynamic Feature Selection Highlight: Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. |
Ian Connick Covert; Wei Qiu; MingYu Lu; Na Yoon Kim; Nathan J White; Su-In Lee; |
1454 | Complementary Attention for Multi-Agent Reinforcement Learning Highlight: In this paper, we propose Complementary Attention for Multi-Agent reinforcement learning (CAMA), which applies a divide-and-conquer strategy on input entities accompanied with the complementary attention of enhancement and replenishment. |
Jianzhun Shao; Hongchang Zhang; Yun Qu; Chang Liu; Shuncheng He; Yuhang Jiang; Xiangyang Ji; |
1455 | 2D-Shapley: A Framework for Fragmented Data Valuation Highlight: We start by presenting a method to calculate the counterfactual of removing a fragment from the aggregated data matrix. Based on the counterfactual calculation, we further propose 2D-Shapley, a theoretical framework for fragmented data valuation that uniquely satisfies some appealing axioms in the fragmented data context. 2D-Shapley empowers a range of new use cases, such as selecting useful data fragments, providing interpretation for sample-wise data values, and fine-grained data issue diagnosis. |
Liu Zhihong; Hoang Anh Just; Xiangyu Chang; Xi Chen; Ruoxi Jia; |
1456 | Explaining The Effects of Non-convergent MCMC in The Training of Energy-Based Models Highlight: In this paper, we quantify the impact of using non-convergent Markov chains to train Energy-Based models (EBMs). |
Elisabeth Agoritsas; Giovanni Catania; Aurélien Decelle; Beatriz Seoane; |
1457 | Tight Certification of Adversarially Trained Neural Networks Via Nonconvex Low-Rank Semidefinite Relaxations Highlight: In this paper, we propose a nonconvex certification technique, based on a low-rank restriction of a semidefinite programming (SDP) relaxation. |
Hong-Ming Chiu; Richard Y. Zhang; |
1458 | Interpolation for Robust Learning: Data Augmentation on Wasserstein Geodesics Highlight: We propose to study and promote the robustness of a model as per its performance on a continuous geodesic interpolation of subpopulations, e.g., a class of samples in a classification problem. |
Jiacheng Zhu; Jielin Qiu; Aritra Guha; Zhuolin Yang; XuanLong Nguyen; Bo Li; Ding Zhao; |
1459 | Generative Causal Representation Learning for Out-of-Distribution Motion Forecasting Highlight: We propose Generative Causal Representation Learning (GCRL) which leverages causality to facilitate knowledge transfer under distribution shifts. |
Shayan Shirahmad Gale Bagi; Zahra Gharaee; Oliver Schulte; Mark Crowley; |
1460 | Drug Discovery Under Covariate Shift with Domain-Informed Prior Distributions Over Functions Highlight: However, real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift—a setting that poses a challenge to standard deep learning methods. In this paper, we present Q-SAVI, a probabilistic model able to address these challenges by encoding explicit prior knowledge of the data-generating process into a prior distribution over functions, presenting researchers with a transparent and probabilistically principled way to encode data-driven modeling preferences. |
Leo Klarner; Tim G. J. Rudner; Michael Reutlinger; Torsten Schindler; Garrett M Morris; Charlotte Deane; Yee Whye Teh; |
1461 | Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning Highlight: In this work, we introduce a Bayesian network to inaugurate correlations between agents’ action selections in their joint policy. |
Dingyang Chen; Qi Zhang; |
1462 | Sketching for First Order Method: Efficient Algorithm for Low-Bandwidth Channel and Vulnerability Highlight: It enables runtime and memory saving via randomly compressing the original large problem into lower dimensions. In this paper, we propose a novel sketching scheme for the first order method in a large-scale distributed learning setting, such that the communication costs between distributed agents are saved while the convergence of the algorithms is still guaranteed. |
Zhao Song; Yitan Wang; Zheng Yu; Lichen Zhang; |
1463 | DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation Highlight: We introduce DS-1000, a code generation benchmark with a thousand data science problems spanning seven Python libraries, such as Numpy and Pandas. We release our benchmark at https://ds1000-code-gen.github.io. |
Yuhang Lai; Chengxi Li; Yiming Wang; Tianyi Zhang; Ruiqi Zhong; Luke Zettlemoyer; Wen-tau Yih; Daniel Fried; Sida Wang; Tao Yu; |
1464 | Optimal Goal-Reaching Reinforcement Learning Via Quasimetric Learning Highlight: This paper introduces Quasimetric Reinforcement Learning (QRL), a new RL method that utilizes quasimetric models to learn optimal value functions. |
Tongzhou Wang; Antonio Torralba; Phillip Isola; Amy Zhang; |
1465 | MEWL: Few-shot Multimodal Word Learning with Referential Uncertainty Highlight: Despite recent advancements in multimodal learning, a systematic and rigorous evaluation is still missing for human-like word learning in machines. To fill in this gap, we introduce the MachinE Word Learning (MEWL) benchmark to assess how machines learn word meaning in grounded visual scenes. |
Guangyuan Jiang; Manjie Xu; Shiji Xin; Wei Liang; Yujia Peng; Chi Zhang; Yixin Zhu; |
1466 | A Kernel Stein Test of Goodness of Fit for Sequential Models Highlight: We propose a goodness-of-fit measure for probability densities modeling observations with varying dimensionality, such as text documents of differing lengths or variable-length sequences. |
Jerome Baum; Heishiro Kanagawa; Arthur Gretton; |
1467 | Learning Functional Distributions with Private Labels Highlight: We study the problem of learning functional distributions in the presence of noise. |
Changlong Wu; Yifan Wang; Ananth Grama; Wojciech Szpankowski; |
1468 | I$^2$SB: Image-to-Image Schrödinger Bridge Highlight: We propose Image-to-Image Schrödinger Bridge (I$^2$SB), a new class of conditional diffusion models that directly learn the nonlinear diffusion processes between two given distributions. |
Guan-Horng Liu; Arash Vahdat; De-An Huang; Evangelos Theodorou; Weili Nie; Anima Anandkumar; |
1469 | LookupFFN: Making Transformers Compute-lite for CPU Inference Highlight: Specifically, we propose an alternative formulation (we call it LookupFFN) to GEMM based FFNs inspired by the recent studies of using Locality Sensitive Hashing (LSH) to approximate FFNs. |
Zhanpeng Zeng; Michael Davies; Pranav Pulijala; Karthikeyan Sankaralingam; Vikas Singh; |
1470 | Learning to Learn from APIs: Black-Box Data-Free Meta-Learning Highlight: Existing DFML work can only meta-learn from (i) white-box and (ii) small-scale pre-trained models (iii) with the same architecture, neglecting the more practical setting where the users only have inference access to the APIs with arbitrary model architectures and model scale inside. To solve this issue, we propose a Bi-level Data-free Meta Knowledge Distillation (BiDf-MKD) framework to transfer more general meta knowledge from a collection of black-box APIs to one single meta model. |
Zixuan Hu; Li Shen; Zhenyi Wang; Baoyuan Wu; Chun Yuan; Dacheng Tao; |
1471 | Continual Vision-Language Representation Learning with Off-Diagonal Information Highlight: To alleviate SD, we propose a new continual vision-language representation learning framework **Mod-X**: **M**aintain **o**ff-**d**iagonal information-matri**X**. |
Zixuan Ni; Longhui Wei; Siliang Tang; Yueting Zhuang; Qi Tian; |
1472 | On The Convergence of The MLE As An Estimator of The Learning Rate in The Exp3 Algorithm Highlight: Our objective in this work is to show that the estimation of the learning rate cannot be efficient if the learning rate is constant in the classical Exp3 (Exponential weights for Exploration and Exploitation) algorithm. |
Julien Aubert; Luc Lehéricy; Patricia Reynaud-Bouret; |
1473 | DugMatting: Decomposed-Uncertainty-Guided Matting Highlight: In this work, we propose a decomposed-uncertainty-guided matting (dugMatting) algorithm, which explores the explicitly decomposed uncertainties to efficiently and effectively improve the results. |
Jiawei Wu; Changqing Zhang; Zuoyong Li; Huazhu Fu; Xi Peng; Joey Tianyi Zhou; |
1474 | Differentially Private Stochastic Convex Optimization Under A Quantile Loss Function Highlight: We study $(\varepsilon,\delta)$-differentially private (DP) stochastic convex optimization under an $r$-th quantile loss function taking the form $c(u) = ru^+ + (1-r)(-u)^+$. |
Du Chen; Geoffrey A. Chua; |
1475 | Revisiting Discriminative Vs. Generative Classifiers: Theory and Implications Highlight: To establish it, we present a multiclass $\mathcal{H}$-consistency bound framework and an explicit bound for logistic loss, which are of independent interest. |
Chenyu Zheng; Guoqiang Wu; Fan Bao; Yue Cao; Chongxuan Li; Jun Zhu; |
1476 | Quantum Policy Gradient Algorithm with Optimized Action Decoding Highlight: Focusing on applications in quantum reinforcement learning, we propose an action decoding procedure for a quantum policy gradient approach. |
Nico Meyer; Daniel D. Scherer; Axel Plinge; Christopher Mutschler; Michael J. Hartmann; |
1477 | Self-Attention Amortized Distributional Projection Optimization for Sliced Wasserstein Point-Cloud Reconstruction Highlight: Despite being efficient, Max-SW and its amortized version cannot guarantee metricity property due to the sub-optimality of the projected gradient ascent and the amortization gap. Therefore, we propose to replace Max-SW with distributional sliced Wasserstein distance with von Mises-Fisher (vMF) projecting distribution (v-DSW). |
Khai Nguyen; Dang Nguyen; Nhat Ho; |
1478 | Learning Affinity with Hyperbolic Representation for Spatial Propagation Highlight: In this work, we demonstrate that the properties of hyperbolic geometry serve as a valuable alternative to learning hierarchical affinity for spatial propagation tasks. |
Jin-Hwi Park; Jaesung Choe; Inhwan Bae; Hae-Gon Jeon; |
1479 | Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models Highlight: Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the complexity of modeling long continuous audio data. In this work, we propose Make-An-Audio with a prompt-enhanced diffusion model that addresses these gaps by 1) introducing pseudo prompt enhancement with a distill-then-reprogram approach, which alleviates data scarcity with orders-of-magnitude more concept compositions by using language-free audio; and 2) leveraging a spectrogram autoencoder to predict the self-supervised audio representation instead of waveforms. |
Rongjie Huang; Jiawei Huang; Dongchao Yang; Yi Ren; Luping Liu; Mingze Li; Zhenhui Ye; Jinglin Liu; Xiang Yin; Zhou Zhao; |
1480 | Synthetic Data for Model Selection Highlight: In contrast to using synthetic data for training, in this work we explore whether synthetic data can be beneficial for model selection. |
Alon Shoshan; Nadav Bhonker; Igor Kviatkovsky; Matan Fintz; Gerard Medioni; |
1481 | Learning Subpocket Prototypes for Generalizable Structure-based Drug Design Highlight: In this paper, we propose a novel method DrugGPS for generalizable structure-based drug design. |
ZAIXI ZHANG; Qi Liu; |
1482 | Quantum Lower Bounds for Finding Stationary Points of Nonconvex Functions Highlight: In this paper, we conduct a systematic study of quantum lower bounds on finding $\epsilon$-approximate stationary points of nonconvex functions, and we consider the following two important settings: 1) having access to $p$-th order derivatives; or 2) having access to stochastic gradients. |
Chenyi Zhang; Tongyang Li; |
1483 | Graph Neural Networks Can Recover The Hidden Features Solely from The Graph Structure Highlight: In this paper, we investigate whether GNNs can exploit the graph structure from the perspective of the expressive power of GNNs. |
Ryoma Sato; |
1484 | For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods Are Created Equal Highlight: To bridge this gap in understanding, we conduct a comprehensive study on 14 pre-trained vision models using 3 distinct classes of policy learning methods, including reinforcement learning (RL), imitation learning through behavior cloning (BC), and imitation learning with a visual reward function (VRF). To facilitate more universal evaluations of pre-trained models and their policy learning methods in the future, we also release a benchmark of 21 tasks across 3 different environments alongside our work. |
Yingdong Hu; Renhao Wang; Li Erran Li; Yang Gao; |
1485 | SurProGenes: Survival Risk-Ordered Representation of Cancer Patients and Genes for The Identification of Prognostic Genes Highlight: Furthermore, most cancer genomics studies lack appropriate low-risk groups against which to compare. To address these issues, we present a framework that identifies candidate prognostic genes by integrating representation learning and statistical analysis approaches. |
Junetae Kim; Kyoungsuk Park; Hanseok Jeong; Youngwook KIM; Jeongseon Kim; Sun-Young Kim; |
1486 | Better Training of GFlowNets with Local Credit and Incomplete Trajectories Highlight: In this paper, we consider the case where the energy function can be applied not just to terminal states but also to intermediate states. |
Ling Pan; Nikolay Malkin; Dinghuai Zhang; Yoshua Bengio; |
1487 | Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases Highlight: In this study, we explore the feasibility of employing INT4 weight and activation (W4A4) quantization for language models. |
Xiaoxia Wu; Cheng Li; Reza Yazdani Aminabadi; Zhewei Yao; Yuxiong He; |
1488 | Defects of Convolutional Decoder Networks in Frequency Representation Highlight: In this paper, we prove the representation defects of a cascaded convolutional decoder network, considering the capacity of representing different frequency components of an input sample. |
Ling Tang; Wen Shen; Zhanpeng Zhou; YueFeng Chen; Quanshi Zhang; |
1489 | Exploring Chemical Space with Score-based Out-of-distribution Generation Highlight: To this end, we propose Molecular Out-Of-distribution Diffusion (MOOD), a score-based diffusion scheme that incorporates out-of-distribution (OOD) control in the generative stochastic differential equation (SDE) with simple control of a hyperparameter, thus requiring no additional cost. |
Seul Lee; Jaehyeong Jo; Sung Ju Hwang; |
1490 | Concept-based Explanations for Out-of-Distribution Detectors Highlight: We first propose two new metrics for assessing the effectiveness of a particular set of concepts for explaining OOD detectors: 1) detection completeness, which quantifies the sufficiency of concepts for explaining an OOD-detector’s decisions, and 2) concept separability, which captures the distributional separation between in-distribution and OOD data in the concept space. Based on these metrics, we propose an unsupervised framework for learning a set of concepts that satisfy the desired properties of high detection completeness and concept separability, and demonstrate its effectiveness in providing concept-based explanations for diverse off-the-shelf OOD detectors. |
Jihye Choi; Jayaram Raghuram; Ryan Feng; Jiefeng Chen; Somesh Jha; Atul Prakash; |
1491 | Optimal Shrinkage for Distributed Second-Order Optimization Highlight: In this work, we address the problem of Hessian inversion bias in distributed second-order optimization algorithms. |
Fangzhao Zhang; Mert Pilanci; |
1492 | Not All Strongly Rayleigh Distributions Have Small Probabilistic Generating Circuits Highlight: They raised the question whether every strongly Rayleigh distribution can be efficiently represented by such circuits. We prove that this question has a negative answer: there are strongly Rayleigh distributions that cannot be represented by polynomial-sized probabilistic generating circuits, assuming a widely accepted complexity theoretic conjecture. |
Markus Bläser; |
1493 | Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning Highlight: Here, we simplify such difficulties for a class of structured symmetric positive-definite matrices with the affine-invariant metric. |
Wu Lin; Valentin Duruisseaux; Melvin Leok; Frank Nielsen; Mohammad Emtiyaz Khan; Mark Schmidt; |
1494 | Nugget: Neural Agglomerative Embeddings of Text Highlight: We propose a solution called Nugget, which encodes language into a representation based on a dynamically selected subset of input tokens. |
Guanghui Qin; Benjamin Van Durme; |
1495 | Automatically Marginalized MCMC in Probabilistic Programming Highlight: We propose to use automatic marginalization as part of the sampling process using HMC in a graphical model extracted from a PPL, which substantially improves sampling from real-world hierarchical models. |
Jinlin Lai; Javier Burroni; Hui Guan; Daniel Sheldon; |
1496 | Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice Highlight: We demonstrate that least-squares regression weighted by the variance of an estimated optimal value function of the next state is crucial to achieving minimax optimality. Based on this observation, we present Variance-Weighted Least-Squares MDVI (VWLS-MDVI), the first theoretical algorithm that achieves nearly minimax optimal sample complexity for infinite-horizon linear MDPs. |
Toshinori Kitamura; Tadashi Kozuno; Yunhao Tang; Nino Vieillard; Michal Valko; Wenhao Yang; Jincheng Mei; Pierre MENARD; Mohammad Gheshlaghi Azar; Remi Munos; Olivier Pietquin; Matthieu Geist; Csaba Szepesvari; Wataru Kumagai; Yutaka Matsuo; |
1497 | Neural Networks Trained with SGD Learn Distributions of Increasing Complexity Highlight: Here, we show that neural networks trained using stochastic gradient descent initially classify their inputs using lower-order input statistics, like mean and covariance, and exploit higher-order statistics only later during training. |
Maria Refinetti; Alessandro Ingrosso; Sebastian Goldt; |
1498 | Regret Minimization and Convergence to Equilibria in General-sum Markov Games Highlight: In this work, we present the first (to our knowledge) algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents. |
Liad Erez; Tal Lancewicki; Uri Sherman; Tomer Koren; Yishay Mansour; |
1499 | Learning to Decouple Complex Systems Highlight: This setting is fairly common in the real world but has been less considered. In this paper, we propose a sequential learning approach under this setting by decoupling a complex system for handling irregularly sampled and cluttered sequential observations. |
Zihan Zhou; Tianshu Yu; |
1500 | Prompting Large Language Model for Machine Translation: A Case Study Highlight: Research on prompting has shown excellent performance with little or even no supervised training across many tasks. However, prompting for machine translation is still under-explored in the literature. We fill this gap by offering a systematic study on prompting strategies for translation, examining various factors for prompt template and demonstration example selection. |
Biao Zhang; Barry Haddow; Alexandra Birch; |
1501 | Non-autoregressive Conditional Diffusion Models for Time Series Prediction Highlight: In this paper, we propose TimeDiff, a non-autoregressive diffusion model that achieves high-quality time series prediction with the introduction of two novel conditioning mechanisms: future mixup and autoregressive initialization. |
Lifeng Shen; James Kwok; |
1502 | FedDisco: Federated Learning with Discrepancy-Aware Collaboration Highlight: We thus propose a novel aggregation method, Federated Learning with Discrepancy-Aware Collaboration (FedDisco), whose aggregation weights not only involve both the dataset size and the discrepancy value, but also contribute to a tighter theoretical upper bound of the optimization error. |
Rui Ye; Mingkai Xu; Jianyu Wang; Chenxin Xu; Siheng Chen; Yanfeng Wang; |
1503 | Enabling First-Order Gradient-Based Learning for Equilibrium Computation in Markets Highlight: We propose a novel smoothing technique that creates a surrogate market game, in which first-order methods can be applied. |
Nils Kohring; Fabian Raoul Pieroth; Martin Bichler; |
1504 | SpeedDETR: Speed-aware Transformers for End-to-end Object Detection Highlight: The main issue is that the current literature solely concentrates on building algorithms with minimal computation, overlooking that the practical latency can also be affected by the memory access cost and the degree of parallelism. Therefore, we propose SpeedDETR, a novel speed-aware transformer for end-to-end object detectors, achieving high-speed inference on multiple devices. |
Peiyan Dong; Zhenglun Kong; Xin Meng; PENG ZHANG; Hao Tang; Yanzhi Wang; Chih-Hsien Chou; |
1505 | CLIPood: Generalizing CLIP to Out-of-Distributions Highlight: We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on the unseen test data. |
Yang Shu; Xingzhuo Guo; Jialong Wu; Ximei Wang; Jianmin Wang; Mingsheng Long; |
1506 | Randomized Gaussian Process Upper Confidence Bound with Tighter Bayesian Regret Bounds Highlight: This study first generalizes the regret analysis of RGP-UCB to a wider class of distributions, including the Gamma distribution. Furthermore, we propose improved RGP-UCB (IRGP-UCB) based on a two-parameter exponential distribution, which achieves tighter Bayesian regret bounds. |
Shion Takeno; Yu Inatsu; Masayuki Karasuyama; |
1507 | Scaling of Class-wise Training Losses for Post-hoc Calibration Highlight: To resolve the issue, we propose a new calibration method to synchronize the class-wise training losses. |
Seungjin Jung; Seungmo Seo; Yonghyun Jeong; Jongwon Choi; |
1508 | Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation Highlight: Inspired by the recent success of chain-of-thought prompting, we propose ChainCoder, a program synthesis language model that generates Python code progressively, i.e. from coarse to fine in multiple passes. |
Wenqing Zheng; S P Sharan; AJAY KUMAR JAISWAL; Kevin Wang; Yihan Xi; Dejia Xu; Zhangyang Wang; |
1509 | Two-Scale Gradient Descent Ascent Dynamics Finds Mixed Nash Equilibria of Continuous Games: A Mean-Field Perspective Highlight: More precisely, we show that for each finite temperature (or regularization parameter), the two-scale Mean-Field GDA with a suitable finite scale ratio converges exponentially to the unique MNE without assuming the convexity or concavity of the interaction potential. |
Yulong Lu; |
1510 | Learning Antidote Data to Individual Unfairness Highlight: However, such adversarial perturbations along a direction covering sensitive information used in DRO do not consider the inherent feature correlations or innate data constraints, therefore could mislead the model to optimize at off-manifold and unrealistic samples. In light of this drawback, in this paper, we propose to learn and generate antidote data that approximately follows the data distribution to remedy individual unfairness. |
Peizhao Li; Ethan Xia; Hongfu Liu; |
1511 | A Law of Robustness Beyond Isoperimetry Highlight: We study the robust interpolation problem of arbitrary data distributions supported on a bounded space and propose a two-fold law of robustness. |
Yihan Wu; Heng Huang; Hongyang Zhang; |
1512 | Tight and Fast Generalization Error Bound of Graph Embedding in Metric Space Highlight: This paper provides a novel upper bound of graph embedding’s generalization error by evaluating the local Rademacher complexity of the model as a function set of the distances of representation couples. |
Atsushi Suzuki; Atsushi Nitanda; Taiji Suzuki; Jing Wang; Feng Tian; Kenji Yamanishi; |
1513 | ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC), that transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience or any other training on the visual environments. |
Kaiwen Zhou; Kaizhi Zheng; Connor Pryor; Yilin Shen; Hongxia Jin; Lise Getoor; Xin Eric Wang; |
1514 | Stratified Adversarial Robustness with Rejection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We theoretically analyze the stratified rejection setting and propose a novel defense method — Adversarial Training with Consistent Prediction-based Rejection (CPR) — for building a robust selective classifier. |
Jiefeng Chen; Jayaram Raghuram; Jihye Choi; Xi Wu; Yingyu Liang; Somesh Jha; |
1515 | No One Idles: Efficient Heterogeneous Federated Learning with Parallel Edge and Server Computation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By swapping the computational order of aggregation and broadcasting, we propose a novel and efficient parallel federated learning (PFL) framework that unlocks the edge nodes during global computation and the central server during local computation. |
Feilong Zhang; Xianming Liu; Shiyi Lin; Gang Wu; Xiong Zhou; Junjun Jiang; Xiangyang Ji; |
1516 | Hierarchical Programmatic Reinforcement Learning Via Learning to Compose Programs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the encouraging results, the program policies that LEAPS can produce are limited by the distribution of the program dataset. Furthermore, during searching, LEAPS evaluates each candidate program solely based on its return, failing to precisely reward correct parts of programs and penalize incorrect parts. To address these issues, we propose to learn a meta-policy that composes a series of programs sampled from the learned program embedding space. |
Guan-Ting Liu; En-Pei Hu; Pu-Jen Cheng; Hung-yi Lee; Shao-Hua Sun; |
1517 | Generalization Analysis for Contrastive Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we establish novel generalization bounds for contrastive learning which do not depend on $k$, up to logarithmic terms. |
Yunwen Lei; Tianbao Yang; Yiming Ying; Ding-Xuan Zhou; |
1518 | Fundamental Tradeoffs in Learning with Prior Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a general reduction-based approach for extending classical minimax lower-bound techniques in order to lower bound the prioritized risk for statistical estimation problems. |
Anirudha Majumdar; |
1519 | Understanding Backdoor Attacks Through The Adaptability Hypothesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the adaptability hypothesis to understand when and why a backdoor attack works for general learning models, including deep neural networks, based on the theoretical investigation of classical kernel-based learning models. |
Xun Xian; Ganghua Wang; Jayanth Srinivasa; Ashish Kundu; Xuan Bi; Mingyi Hong; Jie Ding; |
1520 | From Hypergraph Energy Functions to Hypergraph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Somewhat differently, in this paper we begin by presenting an expressive family of parameterized, hypergraph-regularized energy functions. |
Yuxin Wang; Quan Gan; Xipeng Qiu; Xuanjing Huang; David Wipf; |
1521 | Continual Learners Are Incremental Model Generalizers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the efficiency and rapid convergence of pre-trained models for solving downstream tasks, this paper extensively studies the impact of Continual Learning (CL) models as pre-trainers. |
Jaehong Yoon; Sung Ju Hwang; Yue Cao; |
1522 | Does A Neural Network Really Encode Symbolic Concepts? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, strictly speaking, there is still no solid guarantee that such interactions indeed represent meaningful concepts. Therefore, in this paper, we examine the trustworthiness of interaction concepts from four perspectives. |
Mingjie Li; Quanshi Zhang; |
1523 | Towards Unbiased Training in Federated Open-world Semi-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel **Fed**erated **o**pen-world **S**emi-**S**upervised **L**earning (**FedoSSL**) framework, which can solve the key challenge in distributed and open-world settings, i.e., the biased training process for heterogeneously distributed unseen classes. |
Jie ZHANG; Xiaosong Ma; Song Guo; Wenchao Xu; |
1524 | LinSATNet: The Positive Linear Satisfiability Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions. |
Runzhong Wang; Yunhao Zhang; Ziao Guo; Tianyi Chen; Xiaokang Yang; Junchi Yan; |
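The highlight builds on the classic Sinkhorn algorithm, whose core loop is alternating row/column rescaling. Below is a minimal sketch of that classic iteration with uniform marginals (not LinSATNet's multi-set extension); the cost matrix and temperature are hypothetical.

```python
import numpy as np

def sinkhorn(cost, tau=0.05, n_iters=200):
    """Classic Sinkhorn iteration: starting from exp(-cost / tau),
    alternately normalize rows then columns until the matrix is
    (approximately) doubly stochastic."""
    K = np.exp(-cost / tau)
    for _ in range(n_iters):
        K /= K.sum(axis=1, keepdims=True)  # make each row sum to 1
        K /= K.sum(axis=0, keepdims=True)  # make each column sum to 1
    return K

# For this cost, the result concentrates mass on the cheap diagonal.
P = sinkhorn(np.array([[0.0, 1.0], [1.0, 0.0]]))
```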
1525 | Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We characterize the variance lower bound for regular estimators in the linear MDP setting and propose an efficient estimator whose variance achieves that lower bound. |
Chuhan Xie; Wenhao Yang; Zhihua Zhang; |
1526 | Optimal Arms Identification with Knapsacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel OAK algorithm and prove the upper bound of our algorithm by exploring the relationship between selecting optimal actions and the structure of the feasible region. |
Shaoang Li; Lan Zhang; Yingqi Yu; Xiangyang Li; |
1527 | DIVISION: Memory Efficient Training Via Dual Activation Precision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a simple and effective method to compress DNN training. |
Guanchu Wang; Zirui Liu; Zhimeng Jiang; Ninghao Liu; Na Zou; Xia Hu; |
1528 | Open-VCLIP: Transforming CLIP to An Open-vocabulary Video Model Via Interpolated Weight Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Open-VCLIP, a simple yet effective approach that transforms CLIP into a strong zero-shot video classifier that can recognize unseen actions and events at test time. |
Zejia Weng; Xitong Yang; Ang Li; Zuxuan Wu; Yu-Gang Jiang; |
1529 | Learning to Boost Training By Periodic Nowcasting Near Future Weights Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on our observations on 1) high correlation between past weights and future weights, 2) conditions for beneficial weight prediction, and 3) feasibility of weight prediction, we propose a more general framework that intermittently skips a handful of epochs by periodically forecasting near future weights, i.e., a Weight Nowcaster Network (WNN). |
Jinhyeok Jang; Woo-han Yun; Won Hwa Kim; Youngwoo Yoon; Jaehong Kim; Jaeyeon Lee; ByungOk Han; |
1530 | Efficient and Degree-Guided Graph Generation Via Discrete Diffusion Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose EDGE, a new diffusion-based generative graph model that addresses generative tasks with large graphs. |
Xiaohui Chen; Jiaxing He; Xu Han; Liping Liu; |
1531 | Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, prior work has not addressed how to deal with large sets during training when the full set gradient is required. To address these issues, we propose a Universally MBC (UMBC) class of set functions which can be used in conjunction with arbitrary non-MBC components while still satisfying MBC, enabling a wider range of function classes to be used in MBC settings. |
Jeffrey Willette; Seanie Lee; Bruno Andreis; Kenji Kawaguchi; Juho Lee; Sung Ju Hwang; |
1532 | RLEG: Vision-Language Representation Learning with Diffusion-based Embedding Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Leveraging this generative capability, in this paper we propose a novel vision-language Representation Learning method with diffusion-based Embedding Generation (RLEG), which exploits diffusion models to generate feature embeddings online for learning effective vision-language representations. |
Liming Zhao; Kecheng Zheng; Yun Zheng; Deli Zhao; Jingren Zhou; |
1533 | The Ideal Continual Learner: An Agent That Never Forgets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a rigorous theoretical understanding of these methods remains elusive. This paper aims to bridge this gap between theory and practice by proposing a new continual learning framework called Ideal Continual Learner (ICL), which is guaranteed to avoid catastrophic forgetting by construction. |
Liangzu Peng; Paris Giampouras; Rene Vidal; |
1534 | TIPS: Topologically Important Path Sampling for Anytime Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the limitations of existing hand-crafted approaches, we first model the training process of AnytimeNNs as a discrete-time Markov chain (DTMC) and use it to identify the paths that contribute the most to the training of AnytimeNNs. Based on this new DTMC-based analysis, we further propose TIPS, a framework to automatically design AnytimeNNs under various hardware constraints. |
Guihong Li; Kartikeya Bhardwaj; Yuedong Yang; Radu Marculescu; |
1535 | Unleashing Mask: Explore The Intrinsic Out-of-Distribution Detection Capability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we discover that a model trained on in-distribution (ID) data generally has an intermediate stage with higher OOD detection performance than its final stage across different settings, and we further identify learning with atypical samples as one critical data-level attribution. Based on such insights, we propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data. |
Jianing Zhu; Hengzhuang Li; Jiangchao Yao; Tongliang Liu; Jianliang Xu; Bo Han; |
1536 | Exploring Model Dynamics for Accumulative Poisoning Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we dive into the perspective of model dynamics and propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information. |
Jianing Zhu; Xiawei Guo; Jiangchao Yao; Chao Du; Li He; Shuo Yuan; Tongliang Liu; Liang Wang; Bo Han; |
1537 | NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, under severe occlusion, this projection fails to resolve uncertainty, resulting in blurry renderings that lack details. In this work, we propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test-time. |
Jiatao Gu; Alex Trevithick; Kai-En Lin; Joshua M. Susskind; Christian Theobalt; Lingjie Liu; Ravi Ramamoorthi; |
1538 | Men Also Do Laundry: Multi-Attribute Bias Amplification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate models can learn to exploit correlations with respect to multiple attributes, which are not accounted for by current metrics. Moreover, we show that current metrics can give the erroneous impression that little to no bias amplification has occurred as they aggregate positive and negative bias scores. Further, these metrics lack an ideal value, making them difficult to interpret. To address these shortcomings, we propose a new metric: $\textit{Multi-Attribute Bias Amplification}$. |
Dora Zhao; Jerone Andrews; Alice Xiang; |
1539 | Muse: Text-To-Image Generation Via Masked Generative Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse learns to predict randomly masked image tokens. |
Huiwen Chang; Han Zhang; Jarred Barber; Aaron Maschinot; Jose Lezama; Lu Jiang; Ming-Hsuan Yang; Kevin Patrick Murphy; William T. Freeman; Michael Rubinstein; Yuanzhen Li; Dilip Krishnan; |
1540 | Learning Noisy OR Bayesian Networks with Max-Product Belief Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose parallel max-product as an alternative algorithm for learning noisy-OR BNs with complex latent structures and we derive a fast stochastic training scheme that scales to large datasets. |
Antoine Dedieu; Guangyao Zhou; Dileep George; Miguel Lazaro-Gredilla; |
1541 | Learning Preconditioners for Conjugate Gradient PDE Solvers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new method that leverages a learning-based approach to obtain an approximate matrix factorization of the system matrix, to be used as a preconditioner in the context of PCG solvers. |
Yichen Li; Peter Yichen Chen; Tao Du; Wojciech Matusik; |
1542 | The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we reveal an alternate view of discount regularization that exposes unintended consequences. |
Sarah Rathnam; Sonali Parbhoo; Weiwei Pan; Susan Murphy; Finale Doshi-Velez; |
1543 | FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work aims to offer an efficient solution to TPP inference using general parametric kernels with finite support. |
Guillaume Staerman; Cédric Allain; Alexandre Gramfort; Thomas Moreau; |
1544 | Is Learning Summary Statistics Necessary for Likelihood-free Inference? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A longstanding question in LFI has been how to design or learn good summary statistics of data, but this might now seem unnecessary due to the advent of recent end-to-end (i.e. neural network-based) LFI methods. In this work, we rethink this question with a new method for learning summary statistics. |
Yanzhi Chen; Michael U. Gutmann; Adrian Weller; |
1545 | FLEX: An Adaptive Exploration Algorithm for Nonlinear Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce FLEX, an exploration algorithm for nonlinear dynamics based on optimal experimental design. |
Matthieu Blanke; Marc Lelarge; |
1546 | Bayesian Progressive Deep Topic Model with Knowledge Informed Textual Data Coarsening Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, in this paper, we propose a novel progressive deep topic model that consists of a knowledge-informed textual data coarsening process and a corresponding progressive generative model. |
Zhibin Duan; Xinyang Liu; Yudi Su; Yishi Xu; Bo Chen; Mingyuan Zhou; |
1547 | ClusterFuG: Clustering Fully Connected Graphs By Multicut Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a graph clustering formulation based on multicut (a.k.a. weighted correlation clustering) on the complete graph. |
Ahmed Abbas; Paul Swoboda; |
1548 | A Unified Optimization Framework of ANN-SNN Conversion: Towards Optimal Mapping from Activation Values to Firing Rates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a unified optimization framework for ANN-SNN conversion that considers both performance loss and conversion error. |
Haiyan Jiang; Srinivas Anumasa; Giulia De Masi; Huan Xiong; Bin Gu; |
1549 | PFNs4BO: In-Context Learning for Bayesian Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we use Prior-data Fitted Networks (PFNs) as a flexible surrogate for Bayesian Optimization (BO). |
Samuel Müller; Matthias Feurer; Noah Hollmann; Frank Hutter; |
1550 | Meta-SAGE: Scale Meta-Learning Scheduled Adaptation with Guided Exploration for Mitigating Scale Shift on Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes Meta-SAGE, a novel approach for improving the scalability of deep reinforcement learning models for combinatorial optimization (CO) tasks. |
Jiwoo Son; Minsu Kim; Hyeonah Kim; Jinkyoo Park; |
1551 | Improving The Model Consistency of Decentralized Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing DFL suffers from high inconsistency among local clients, which results in severe distribution shift and inferior performance compared with centralized FL (CFL), especially on heterogeneous data or sparse communication topologies. To alleviate this issue, we propose two DFL algorithms named DFedSAM and DFedSAM-MGS to improve the performance of DFL. |
Yifan Shi; Li Shen; Kang Wei; Yan Sun; Bo Yuan; Xueqian Wang; Dacheng Tao; |
1552 | Dink-Net: Neural Clustering on Large Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods fail to scale to large graphs with millions of nodes. To solve this problem, a scalable deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink. |
Yue Liu; Ke Liang; Jun Xia; Sihang Zhou; Xihong Yang; Xinwang Liu; Stan Z. Li; |
1553 | Weak Proxies Are Sufficient and Preferable for Fairness with Missing Sensitive Attributes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our theoretical analyses show that directly using proxy models can give a false sense of (un)fairness. We then develop an algorithm that is able to measure fairness (provably) accurately with only three properly identified proxies. |
Zhaowei Zhu; Yuanshun Yao; Jiankai Sun; Hang Li; Yang Liu; |
1554 | Improving Visual Prompt Tuning for Self-supervised Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, identifying the optimal blocks for prompts within each self-supervised ViT for diverse future scenarios is a costly process. To mitigate this problem, we propose a simple yet effective method that learns a gate for each ViT block to adjust its intervention into the prompt tokens. |
Seungryong Yoo; Eunji Kim; Dahuin Jung; Jungbeom Lee; Sungroh Yoon; |
1555 | Nearly Optimal Algorithms with Sublinear Computational Complexity for Online Kernel Regression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose two new algorithms, AOGD-ALD and NONS-ALD, which can keep nearly optimal regret bounds at a sublinear computational complexity, and give sufficient conditions under which our algorithms work. |
Junfan Li; Shizhong Liao; |
1556 | Online Prototype Alignment for Few-shot Policy Transfer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Besides, they often rely on visual clues to learn the mapping function and may fail when the source domain looks quite different from the target domain. To address these problems, in this paper we propose a novel framework, Online Prototype Alignment (OPA), which learns the mapping function based on the functional similarity of elements and achieves few-shot policy transfer within only several episodes. |
Qi Yi; Rui Zhang; Shaohui Peng; Jiaming Guo; Yunkai Gao; Kaizhao Yuan; Ruizhi Chen; Siming Lan; Xing Hu; Zidong Du; Xishan Zhang; Qi Guo; Yunji Chen; |
1557 | Submodular Order Functions and Assortment Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give fast algorithms with strong approximation guarantees for maximizing submodular order functions under a variety of constraints. |
Rajan Udwani; |
1558 | On The Impact of Knowledge Distillation for Model Interpretability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we have attempted to show that KD enhances the interpretability as well as the accuracy of models. |
Hyeongrok Han; Siwon Kim; Hyun-Soo Choi; Sungroh Yoon; |
1559 | Probabilistic Concept Bottleneck Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we address the ambiguity issue that can harm reliability. |
Eunji Kim; Dahuin Jung; Sangha Park; Siwon Kim; Sungroh Yoon; |
1560 | A General Representation Learning Framework with Generalization Performance Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we prove that generalization error of representation learning function can be estimated effectively by solving two convex optimization problems. |
Junbiao Cui; Jianqing Liang; Qin Yue; Jiye Liang; |
1561 | Moderately Distributional Exploration for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is because a large uncertainty set could introduce domains containing semantically different factors from training domains. To address this issue, we propose to perform a $\textit{mo}$derately $\textit{d}$istributional $\textit{e}$xploration (MODE) for domain generalization. |
Rui Dai; Yonggang Zhang; Zhen Fang; Bo Han; Xinmei Tian; |
1562 | Multiple Thinking Achieving Meta-Ability Decoupling for Object Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a meta-ability decoupling (MAD) paradigm, which brings together various object navigation methods in an architecture system, allowing them to mutually enhance each other and evolve together. |
Ronghao Dang; Lu Chen; Liuyi Wang; Zongtao He; Chengju Liu; Qijun Chen; |
1563 | Learning to Acquire Novel Cognitive Tasks with Evolution, Plasticity and Meta-meta-learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we evolve neural networks, endowed with plastic connections and neuromodulation, over a sizable set of simple cognitive tasks adapted from a computational neuroscience framework. |
Thomas Miconi; |
1564 | Model-Free Robust Average-Reward Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the robust average-reward MDPs under the model-free setting. We provide several widely used uncertainty sets as examples, including those defined by the contamination model, total variation, Chi-squared divergence, Kullback-Leibler (KL) divergence, and Wasserstein distance. |
Yue Wang; Alvaro Velasquez; George K. Atia; Ashley Prater-Bennette; Shaofeng Zou; |
1565 | Discover and Cure: Concept-aware Mitigation of Spurious Correlation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an interpretable framework, Discover and Cure (DISC), to tackle the issue. |
Shirley Wu; Mert Yuksekgonul; Linjun Zhang; James Zou; |
1566 | Geometric Autoencoders – What You See Is What You Decode Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such expressive networks can achieve low reconstruction error even when the latent representation is distorted. To avoid such misleading visualizations, we propose first a differential geometric perspective on the decoder, leading to insightful diagnostics for an embedding’s distortion, and second a new regularizer mitigating such distortion. |
Philipp Nazari; Sebastian Damrich; Fred A Hamprecht; |
1567 | Robust Camera Pose Refinement for Multi-Resolution Hash Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a joint optimization algorithm to calibrate the camera pose and learn a geometric representation using efficient multi-resolution hash encoding. |
Hwan Heo; Taekyung Kim; Jiyoung Lee; Jaewon Lee; Soohyun Kim; Hyunwoo J. Kim; Jin-Hwa Kim; |
1568 | Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a more efficient algorithm using stochastic recursive gradient estimators, which improves the complexity to $\mathcal{O}(L^3 d^{3/2} \epsilon^{-3}+ \Delta L^2 d^{3/2} \delta^{-1} \epsilon^{-3})$. |
Lesi Chen; Jing Xu; Luo Luo; |
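As background on the gradient-free setting this highlight works in (not the paper's stochastic recursive estimator), the standard two-point zeroth-order estimator probes the objective along a random direction and uses only function values; the test function and smoothing radius below are hypothetical.

```python
import numpy as np

def two_point_grad(f, x, rng, delta=1e-4):
    """Two-point zeroth-order estimator: for a random unit direction u,
    g = d * (f(x + delta*u) - f(x - delta*u)) / (2 * delta) * u
    estimates the gradient of (a smoothed version of) f from two
    function evaluations, with no access to derivatives."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # uniform direction on the unit sphere
    return d * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

# Averaging many estimates for f(x) = ||x||^2 approaches the true gradient 2x.
rng = np.random.default_rng(0)
x0 = np.array([1.0, 0.0])
g = np.mean([two_point_grad(lambda z: z @ z, x0, rng) for _ in range(10000)],
            axis=0)
```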
1569 | DADAO: Decoupled Accelerated Decentralized Asynchronous Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces DADAO: the first decentralized, accelerated, asynchronous, primal, first-order algorithm to minimize a sum of $L$-smooth and $\mu$-strongly convex functions distributed over a given network of size $n$. |
Adel Nabli; Edouard Oyallon; |
1570 | Competing for Shareable Arms in Multi-Player Multi-Armed Bandits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Subsequently, we propose a novel Selfish MPMAB with Averaging Allocation (SMAA) approach based on the equilibrium. |
Renzhe Xu; Haotian Wang; Xingxuan Zhang; Bo Li; Peng Cui; |
1571 | Contrast with Reconstruct: Contrastive 3D Representation Learning Guided By Generative Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This motivates us to learn 3D representations by sharing the merits of both paradigms, which is non-trivial due to the pattern difference between the two paradigms. In this paper, we propose contrast with reconstruct (ReCon) that unifies these two paradigms. |
Zekun Qi; Runpei Dong; Guofan Fan; Zheng Ge; Xiangyu Zhang; Kaisheng Ma; Li Yi; |
1572 | Estimating Possible Causal Effects with Latent Variables Via Adjustment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since many causal graphs can correspond to one PAG, they are possibly associated with different causal effects. The aim of this paper is to estimate these possible causal effects via covariate adjustment given a PAG. |
Tian-Zuo Wang; Tian Qin; Zhi-Hua Zhou; |
1573 | Bit Allocation Using Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider the problem of bit allocation in Neural Video Compression (NVC). |
Tongda Xu; Han Gao; Chenjian Gao; Yuanyuan Wang; Dailan He; Jinyong Pi; Jixiang Luo; Ziyu Zhu; Mao Ye; Hongwei Qin; Yan Wang; Jingjing Liu; Ya-Qin Zhang; |
1574 | Causal Strategic Classification: A Tale of Two Shifts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Focusing on accuracy as our primary objective, we show how strategic behavior and causal effects underlie two complementing forms of distribution shift. We characterize these shifts, and propose a learning algorithm that balances these two forces over time and permits end-to-end training. |
Guy Horowitz; Nir Rosenfeld; |
1575 | Modeling Dynamic Environments with Scene Graph Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Embodied AI agents that search for objects in large environments such as households often need to make efficient decisions by predicting object locations based on partial information. We pose this as a new type of link prediction problem: link prediction on partially observable dynamic graphs. Our graph is a representation of a scene in which rooms and objects are nodes, and their relationships are encoded in the edges; only parts of the changing graph are known to the agent at each timestep. |
Andrey Kurenkov; Michael Lingelbach; Tanmay Agarwal; Emily Jin; Chengshu Li; Ruohan Zhang; Li Fei-Fei; Jiajun Wu; Silvio Savarese; Roberto Martín-Martín; |
1576 | Solving High-Dimensional PDEs with Latent Spectral Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Latent Spectral Models (LSM) toward an efficient and precise solver for high-dimensional PDEs. |
Haixu Wu; Tengge Hu; Huakun Luo; Jianmin Wang; Mingsheng Long; |
1577 | Oscillation-free Quantization for Low-bit Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose three techniques correspondingly: statistical weight quantization ($\rm StatsQ$) to improve quantization robustness compared to the prevalent learnable-scale-based method; confidence-guided annealing ($\rm CGA$) that freezes the weights with $\textit{high confidence}$ and calms the oscillating weights; and $\textit{query}$-$\textit{key}$ reparameterization ($\rm QKR$) to resolve the query-key intertwined oscillation and mitigate the resulting gradient misestimation. |
Shih-yang Liu; Zechun Liu; Kwang-Ting Cheng; |
1578 | A Study on Transformer Configuration and Training Objective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For example, we usually set the base model's hidden size (i.e., model width) to 768 and the number of transformer layers (i.e., model depth) to 12. In this paper, we revisit these conventional configurations by studying the relationship between transformer configuration and training objective. |
Fuzhao Xue; Jianghai Chen; Aixin Sun; Xiaozhe Ren; Zangwei Zheng; Xiaoxin He; Yongming Chen; Xin Jiang; Yang You; |
1579 | Quantifying The Knowledge in GNNs for Reliable Distillation Into MLPs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To achieve reliable distillation, we propose an effective approach, namely Knowledge-inspired Reliable Distillation (KRD), that models the probability of each node being an informative and reliable knowledge point, based on which we sample a set of additional reliable knowledge points as supervision for training student MLPs. |
Lirong Wu; Haitao Lin; Yufei Huang; Stan Z. Li; |
1580 | Composer: Creative and Controllable Image Synthesis with Composable Conditions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work offers a new generation paradigm that allows flexible control of the output image, such as spatial layout and palette, while maintaining the synthesis quality and model creativity. |
Lianghua Huang; Di Chen; Yu Liu; Yujun Shen; Deli Zhao; Jingren Zhou; |
1581 | Sketching Meets Differential Privacy: Fast Algorithm for Dynamic Kronecker Projection Maintenance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given a constraint matrix ${\sf A}$ and a positive semi-definite matrix $W\in \mathbb{R}^{n\times n}$ with a sparse eigenbasis, we consider the task of maintaining the projection in the form of ${\sf B}^\top({\sf B}{\sf B}^\top)^{-1}{\sf B}$, where ${\sf B}={\sf A}(W\otimes I)$ or ${\sf B}={\sf A}(W^{1/2}\otimes W^{1/2})$. |
Zhao Song; Xin Yang; Yuanyuan Yang; Lichen Zhang; |
1582 | Robust Perception Through Equivariance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a framework that uses the dense intrinsic constraints in natural images to robustify inference. |
Chengzhi Mao; Lingyu Zhang; Abhishek Vaibhav Joshi; Junfeng Yang; Hao Wang; Carl Vondrick; |
1583 | Understanding and Defending Patched-based Adversarial Attacks for Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on understanding adversarial patch attacks, we propose a simple but efficient defense that correctly detects more than 95% of adversarial patches. |
Liang Liu; Yanan Guo; Youtao Zhang; Jun Yang; |
1584 | Robust Weight Signatures: Gaining Robustness As Easy As Patching Weights? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We start by drawing several key observations: (i) assuming that we train the same model architecture on both a clean dataset and its corrupted version, a comparison between the two resultant models shows their weights to mostly differ in shallow layers; (ii) the weight difference after projection, which we call Robust Weight Signature (RWS), appears to be discriminative and indicative of different corruption types; (iii) perhaps most strikingly, for the same corruption type, the RWSs obtained by one model architecture are highly consistent and transferable across different datasets. Based on those RWS observations, we propose a minimalistic model robustness patching framework that carries a model trained on clean data together with its pre-extracted RWSs. |
Ruisi Cai; Zhenyu Zhang; Zhangyang Wang; |
1585 | Estimating Heterogeneous Treatment Effects: Mutual Information Bounds and Learning Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing works on modeling selection bias and corresponding algorithms do not naturally generalize to non-binary treatment spaces. To address this limitation, we propose to use mutual information to describe selection bias in estimating HTE and derive a novel error bound using the mutual information between the covariates and the treatments, which is the first error bound to cover general treatment schemes including multinoulli or continuous spaces. |
Xingzhuo Guo; Yuchen Zhang; Jianmin Wang; Mingsheng Long; |
1586 | On The Optimality of Misspecified Kernel Ridge Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that KRR is minimax optimal for any $s\in (0,1)$ when the $\mathcal{H}$ is a Sobolev RKHS. |
Haobo Zhang; Yicheng Li; Weihao Lu; Qian Lin; |
1587 | Multi-View Masked World Models for Visual Robotic Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: How else can we utilize the richness of multi-view data? In this paper, we investigate how to learn good representations with multi-view data and utilize them for visual robotic manipulation. |
Younggyo Seo; Junsu Kim; Stephen James; Kimin Lee; Jinwoo Shin; Pieter Abbeel; |
1588 | Performative Recommendation: Diversifying Content Via Strategic Incentives Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To remedy this, conventional approaches such as re-ranking improve diversity by *presenting* more diverse items. Here we argue that to promote inherent and prolonged diversity, the system must encourage its *creation*. |
Itay Eilat; Nir Rosenfeld; |
1589 | How Does Information Bottleneck Help Deep Learning? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide the first rigorous learning theory for justifying the benefit of information bottleneck in deep learning by mathematically relating information bottleneck to generalization errors. |
Kenji Kawaguchi; Zhun Deng; Xu Ji; Jiaoyang Huang; |
1590 | Momentum Ensures Convergence of SIGNSGD Under Weaker Assumptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper revisits the convergence of signSGD and proves that momentum can remedy signSGD under weaker assumptions than previous techniques; in particular, our convergence theory does not require the assumption of bounded stochastic gradient or increased batch size. |
Tao Sun; Qingsong Wang; Dongsheng Li; Bao Wang; |
1591 | Lowering The Pre-training Tax for Gradient-based Subset Training: A Lightweight Distributed Pre-Training Toolkit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, the pre-training phase itself incurs significant time/resource overhead, and prior work has not gone beyond hyperparameter search to reduce pre-training time. Our work explicitly aims to reduce this $\textbf{pre-training tax}$ in gradient-based subset training. |
Yeonju Ro; Zhangyang Wang; Vijay Chidambaram; Aditya Akella; |
1592 | Eliminating Adversarial Noise Via Information Discard and Robust Representation Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we start from the latent inherent properties of adversarial samples to break the limitations. |
Dawei Zhou; Yukun Chen; Nannan Wang; Decheng Liu; Xinbo Gao; Tongliang Liu; |
1593 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models. |
Junnan Li; Dongxu Li; Silvio Savarese; Steven Hoi; |
1594 | Gradient Descent Finds The Global Optima of Two-Layer Physics-Informed Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main aim of this paper is to conduct the convergence analysis of the gradient descent for two-layer physics-informed neural networks (PINNs). |
Yihang Gao; Yiqi Gu; Michael Ng; |
1595 | A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One popular approach for solving the $\ell_2$ regression problem is via sketching: pick a structured random matrix $S\in \mathbb{R}^{m\times n}$ with $m\ll n$ such that $SA$ can be computed quickly, then solve the “sketched” regression problem $x'=\mathrm{argmin}_x \|SAx-Sb\|_2$. In this paper, we show that in order to obtain such $\ell_\infty$ guarantee for $\ell_2$ regression, one has to use sketching matrices that are *dense*. |
Zhao Song; Mingquan Ye; Junze Yin; Lichen Zhang; |
1596 | Transcendental Idealism of Planner: Evaluating Perception from Planning Perspective for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a principled framework that provides a coherent and systematic understanding of the impact an error in the perception module imposes on an autonomous agent’s planning that actually controls the vehicle. |
Weixin Li; Xiaodong Yang; |
1597 | Attribute-Efficient PAC Learning of Low-Degree Polynomial Threshold Functions with Nasty Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The concept class of low-degree polynomial threshold functions (PTFs) plays a fundamental role in machine learning. In this paper, we study PAC learning of $K$-sparse degree-$d$ PTFs on $\mathbb{R}^n$, where any such concept depends only on $K$ out of $n$ attributes of the input. |
Shiwei Zeng; Jie Shen; |
1598 | Consistency of Multiple Kernel Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in multiple kernel clustering (MKC), the consistency of kernel weights has not been sufficiently investigated. In this work, we fill this gap with a non-asymptotic analysis on the consistency of kernel weights of a novel method termed SimpleMKKM. |
Weixuan Liang; Xinwang Liu; Yong Liu; Chuan Ma; Yunping Zhao; Zhe Liu; En Zhu; |
1599 | Towards Controlled Data Augmentations for Active Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we thoroughly study the coupling of data augmentation and active learning, thereby proposing Controllable Augmentation ManiPulator for Active Learning. |
Jianan Yang; Haobo Wang; Sai Wu; Gang Chen; Junbo Zhao; |
1600 | Bi-directional Masks for Efficient N:M Sparse Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We focus on addressing the dense backward propagation issue for training efficiency of N:M fine-grained sparsity that preserves at most N out of M consecutive weights and achieves practical speedups supported by the N:M sparse tensor core. Therefore, we present a novel method of Bi-directional Masks (Bi-Mask) with its two central innovations in: 1) Separate sparse masks in the two directions of forward and backward propagation to obtain training acceleration. |
Yuxin Zhang; Yiting Luo; Mingbao Lin; Yunshan Zhong; JingJing Xie; Fei Chao; Rongrong Ji; |
1601 | The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel approach to addressing two fundamental challenges in Model-based Reinforcement Learning (MBRL): the computational expense of repeatedly finding a good policy in the learned model, and the objective mismatch between model fitting and policy computation. |
Anirudh Vemula; Yuda Song; Aarti Singh; Drew Bagnell; Sanjiban Choudhury; |
1602 | Fair and Robust Estimation of Heterogeneous Treatment Effects for Policy Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple and general framework for nonparametric estimation of heterogeneous treatment effects under fairness constraints. |
Kwangho Kim; Jose R Zubizarreta; |
1603 | Controlled Text Generation with Natural Language Instructions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, it is notoriously difficult to control their generation in such a way that it satisfies user-specified constraints. In this paper, we present InstructCTG, a simple controlled text generation framework that incorporates different constraints by verbalizing them as natural language instructions. |
Wangchunshu Zhou; Yuchen Eleanor Jiang; Ethan Wilcox; Ryan Cotterell; Mrinmaya Sachan; |
1604 | Margin-based Neural Network Watermarking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel margin-based DNN watermarking approach that is robust to the functionality stealing attacks based on model extraction and distillation. |
Byungjoo Kim; Suyoung Lee; Seanie Lee; Sooel Son; Sung Ju Hwang; |
1605 | Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a multistep operator that unifies per-decision and trajectory-aware methods. |
Brett Daley; Martha White; Christopher Amato; Marlos C. Machado; |
1606 | Great Models Think Alike: Improving Model Reliability Via Inter-Model Latent Agreement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A fundamental challenge is that models are often unreliable due to overconfidence. In this paper, we estimate a model’s reliability by measuring the agreement between its latent space, and the latent space of a foundation model. |
Ailin Deng; Miao Xiong; Bryan Hooi; |
1607 | Opponent-Limited Online Search for Imperfect Information Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To make Safe-1-KLSS applicable to even larger games, we propose Opponent-Limited Subgame Solving (OLSS) to limit how the opponent reaches a subgame and how it acts in the subgame. |
Weiming Liu; Haobo Fu; Qiang Fu; Yang Wei; |
1608 | Pruning Via Sparsity-indexed ODE: A Continuous Sparsity Viewpoint Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel pruning framework, coined Sparsity-indexed ODE (SpODE), that provides explicit guidance on how to best preserve model performance while ensuring an infinitesimal increase in model sparsity. |
Zhanfeng Mo; Haosen Shi; Sinno Pan; |
1609 | Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new first-order optimization algorithm, AcceleratedGradient-OptimisticGradient (AG-OG) Descent Ascent, for separable convex-concave minimax optimization. |
Chris Junchi Li; Angela Yuan; Gauthier Gidel; Quanquan Gu; Michael Jordan; |
1610 | Lazy Agents: A New Perspective on Solving Sparse Reward Problem in Multi-agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Sparse reward remains a challenging problem in multi-agent reinforcement learning (MARL). This paper addresses this issue from a new perspective, i.e., lazy agents. |
Boyin Liu; Zhiqiang Pu; Yi Pan; Jianqiang Yi; Yanyan Liang; Du Zhang; |
1611 | Long Horizon Temperature Scaling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, autoregressive models rely on myopic temperature scaling that greedily optimizes the next token. To address this, we propose *Long Horizon Temperature Scaling* (LHTS), a novel approach for sampling from temperature-scaled *joint* distributions. |
Andy Shih; Dorsa Sadigh; Stefano Ermon; |
1612 | Using Perturbation to Improve Goodness-of-Fit Tests Based on Kernelized Stein Discrepancy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to perturb the observed sample via Markov transition kernels, with respect to which the target distribution is invariant. |
Xing Liu; Andrew Duncan; Axel Gandy; |
1613 | Near-Optimal $\Phi$-Regret Learning in Extensive-Form Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play. |
Ioannis Anagnostides; Gabriele Farina; Tuomas Sandholm; |
1614 | Neural Stochastic Differential Games for Time-series Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite such progress, most existing methods still face challenges in providing a general framework for analyzing time series. To tackle this, we adopt stochastic differential games to suggest a new philosophy of utilizing interacting collective intelligence in time series analysis. |
Sungwoo Park; Byoungwoo Park; Moontae Lee; Changhee Lee; |
1615 | Which Tricks Are Important for Learning to Rank? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, several other GBDT-based ranking algorithms were proposed. In this paper, we thoroughly analyze these methods in a unified setup. |
Ivan Lyzhin; Aleksei Ustimenko; Andrey Gulin; Liudmila Prokhorenkova; |
1616 | A Complete Expressiveness Hierarchy for Subgraph GNNs Via Subgraph Weisfeiler-Lehman Tests Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While numerous architectures have been proposed, so far there is still a limited understanding of how various design paradigms differ in terms of expressive power, nor is it clear what design principle achieves maximal expressiveness with minimal architectural complexity. To address these fundamental questions, this paper conducts a systematic study of general node-based subgraph GNNs through the lens of Subgraph Weisfeiler-Lehman Tests (SWL). |
Bohang Zhang; Guhao Feng; Yiheng Du; Di He; Liwei Wang; |
1617 | The SSL Interplay: Augmentations, Inductive Bias, and Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training. Such challenges motivate the need for a theory to shed light on the complex interplay between the choice of data augmentation, network architecture, and training algorithm. We study such an interplay with a precise analysis of generalization performance on both pretraining and downstream tasks in kernel regimes, and highlight several insights for SSL practitioners that arise from our theory. |
Vivien Cabannes; Bobak Kiani; Randall Balestriero; Yann LeCun; Alberto Bietti; |
1618 | Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ samples from a single sample trajectory without a population generative model, up to a standard $\mathcal{O}(\frac{1}{\sqrt{N}})$ error due to the mean field. |
Batuhan Yardim; Semih Cayci; Matthieu Geist; Niao He; |
1619 | High-dimensional Clustering Onto Hamiltonian Cycle Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most of the clustering methods merely generate pseudo labels and thus are unable to simultaneously present the similarities between different clusters and outliers. This paper proposes a new framework called High-dimensional Clustering onto Hamiltonian Cycle (HCHC) to solve the above problems. |
Tianyi Huang; Shenghui Cheng; Stan Z. Li; Zhengjun Zhang; |
1620 | Personalized Subgraph Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce a new subgraph FL problem, personalized subgraph FL, which focuses on the joint improvement of the interrelated local GNNs rather than learning a single global model, and propose a novel framework, FEDerated Personalized sUBgraph learning (FED-PUB), to tackle it. |
Jinheon Baek; Wonyong Jeong; Jiongdao Jin; Jaehong Yoon; Sung Ju Hwang; |
1621 | Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a theoretical framework for Reinforcement Learning with Human Feedback (RLHF). |
Banghua Zhu; Michael Jordan; Jiantao Jiao; |
1622 | On Investigating The Conservative Property of Score-Based Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we demonstrate that the architectural constraints of CSBMs may limit their modeling ability. |
Chen-Hao Chao; Wei-Fang Sun; Bo-Wun Cheng; Chun-Yi Lee; |
1623 | Curriculum Co-disentangled Representation Learning Across Multiple Environments for Social Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in the literature there has been no work on social recommendation capable of disentangling user representations across consuming and social environments. To solve this problem, we study co-disentangled representation learning across different environments by proposing the curriculum co-disentangled representation learning (CurCoDis) model to disentangle the hidden factors for users across both consuming and social environments. |
Xin Wang; Zirui Pan; Yuwei Zhou; Hong Chen; Chendi Ge; Wenwu Zhu; |
1624 | Rethinking Explaining Graph Neural Networks Via Non-parametric Subgraph Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, based on the observation that graphs typically share some common motif patterns, we propose a novel non-parametric subgraph matching framework, dubbed MatchExplainer, to explore explanatory subgraphs. |
Fang Wu; Siyuan Li; Xurui Jin; Yinghui Jiang; Dragomir Radev; Zhangming Niu; Stan Z. Li; |
1625 | UPSCALE: Unconstrained Channel Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, our insight is to reorder channels at export time, (1) reducing latency by reducing memory copies and (2) improving accuracy by removing constraints. |
Alvin Wan; Hanxiang Hao; Kaushik Patnaik; Yueyang Xu; Omer Hadad; David Güera; Zhile Ren; Qi Shan; |
1626 | A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, given the interconnected nature of source localization, separation, and recognition, independent models are likely to yield suboptimal performance as they fail to capture the interdependence between these tasks. To address this problem, we propose a unified audio-visual learning framework (dubbed OneAVM) that integrates audio and visual cues for joint localization, separation, and recognition. |
Shentong Mo; Pedro Morgado; |
1627 | Fair and Optimal Classification Via Post-Processing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate the bias exhibited by machine learning models, fairness criteria can be integrated into the training process to ensure fair treatment across all demographics, but it often comes at the expense of model performance. Understanding such tradeoffs, therefore, underlies the design of fair algorithms. To this end, this paper provides a complete characterization of the inherent tradeoff of demographic parity on classification problems, under the most general multi-group, multi-class, and noisy setting. |
Ruicheng Xian; Lang Yin; Han Zhao; |
1628 | A Theory of Continuous Generative Flow Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a theory for generalized GFlowNets, which encompasses both existing discrete GFlowNets and ones with continuous or hybrid state spaces, and perform experiments with two goals in mind. |
Salem Lahlou; Tristan Deleu; Pablo Lemos; Dinghuai Zhang; Alexandra Volokhova; Alex Hernández-García; Lena Nehale Ezzine; Yoshua Bengio; Nikolay Malkin; |
1629 | Do Not Train It: A Linear Neural Architecture Search of Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With the randomly-initialized weights, we can then seek the optimal architecture parameters via the sparse coding objective and derive a novel NAS-GNNs method, namely neural architecture coding (NAC). |
Peng Xu; Lin Zhang; Xuanzhou Liu; Jiaqi Sun; Yue Zhao; Haiqin Yang; Bei Yu; |
1630 | All in A Row: Compressed Convolution Networks for Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge the gap between Euclidean space and graph space, we propose a differentiable method for regularization on graphs that applies permutations to the input graphs. |
Junshu Sun; Shuhui Wang; Xinzhe Han; Zhe Xue; Qingming Huang; |
1631 | Blossom: An Anytime Algorithm for Computing Optimal Decision Trees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple algorithm to learn optimal decision trees of bounded depth. |
Emir Demirović; Emmanuel Hebrard; Louis Jean; |
1632 | Functional Neural Networks: Shift Invariant Models for Functional Data with Applications to EEG Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new class of neural networks that are shift invariant and preserve smoothness of the data: functional neural networks (FNNs). |
Florian Heinrichs; Mavin Heim; Corinna Weber; |
1633 | Optimizing Mode Connectivity for Class Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To dodge the ridge, we propose parameter-saving OPtimizing Connectivity (OPC) based on Fourier series and gradient projection for finding the low-loss path between minima. |
Haitao Wen; Haoyang Cheng; Heqian Qiu; Lanxiao Wang; Lili Pan; Hongliang Li; |
1634 | Never Mind The Metrics—what About The Uncertainty? Visualising Binary Confusion Matrix Metric Distributions to Put Performance in Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop novel interactive visualisations of performance metric contours within (and beyond) ROC space, showing the discrete probability mass functions of true and false positive rates and how these relate to performance metric distributions. We aim to raise awareness of the substantial uncertainty in performance metric estimates that can arise when classifiers are evaluated on empirical datasets and benchmarks, and that performance claims should be tempered by this understanding. |
David Lovell; Dimity Miller; Jaiden Capra; Andrew P. Bradley; |
1635 | Rethink DARTS Search Space and Renovate A New Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we first propose and orchestrate a suite of improvements to frame a larger and harder DSS, termed LHD, while retaining high efficiency in search. |
Jiuling Zhang; Zhiming Ding; |
1636 | SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs. |
Guangxuan Xiao; Ji Lin; Mickael Seznec; Hao Wu; Julien Demouth; Song Han; |
1637 | Provable Dynamic Fusion for Low-Quality Multimodal Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Can we design a provably robust multimodal fusion method? This paper provides theoretical understandings to answer this question under a most popular multimodal fusion framework from the generalization perspective. |
Qingyang Zhang; Haitao Wu; Changqing Zhang; Qinghua Hu; Huazhu Fu; Joey Tianyi Zhou; Xi Peng; |
1638 | Which Invariance Should We Transfer? A Causal Minimax Learning Approach Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, a key question remains: which subset of this whole stable information should the model transfer, in order to achieve optimal generalization ability? To answer this question, we present a comprehensive minimax analysis from a causal perspective. |
Mingzhou Liu; Xiangyu Zheng; Xinwei Sun; Fang Fang; Yizhou Wang; |
1639 | CLUSTSEG: Clustering for Universal Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present CLUSTSEG, a general, transformer-based framework that tackles different image segmentation tasks ($i.e.,$ superpixel, semantic, instance, and panoptic) through a unified, neural clustering scheme. |
James Chenhao Liang; Tianfei Zhou; Dongfang Liu; Wenguan Wang; |
1640 | Optimal LP Rounding and Linear-Time Approximation Algorithms for Clustering Edge-Colored Hypergraphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the approximability of an existing framework for clustering edge-colored hypergraphs, which is closely related to chromatic correlation clustering and is motivated by machine learning and data mining applications where the goal is to cluster a set of objects based on multiway interactions of different categories or types. We present improved approximation guarantees based on linear programming, and show they are tight by proving a matching integrality gap. |
Nate Veldt; |
1641 | GFlowNet-EM for Learning Compositional Latent Variable Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For algorithms based on expectation-maximization (EM), the E-step is often intractable without restrictive approximations to the posterior. We propose the use of GFlowNets, algorithms for sampling from an unnormalized density by learning a stochastic policy for sequential construction of samples, for this intractable E-step. |
Edward J Hu; Nikolay Malkin; Moksh Jain; Katie E Everett; Alexandros Graikos; Yoshua Bengio; |
1642 | Statistical Inference and A/B Testing for First-Price Pacing Equilibria Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a statistical framework for the FPPE model, in which a limit FPPE with a continuum of items models the long-run steady-state behavior of the auction platform, and an observable FPPE consisting of a finite number of items provides the data to estimate primitives of the limit FPPE, such as revenue, Nash social welfare (a fair metric of efficiency), and other parameters of interest. |
Luofeng Liao; Christian Kroer; |
1643 | SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents a new score-decomposed diffusion model (SDDM) on manifolds to explicitly optimize the tangled distributions during image generation. |
Shikun Sun; Longhui Wei; Junliang Xing; Jia Jia; Qi Tian; |
1644 | Stable Estimation of Heterogeneous Treatment Effects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve the estimation accuracy of HTE for underrepresented populations, we propose a novel Stable CounterFactual Regression (StableCFR) to smooth the population distribution and upsample the underrepresented subpopulations, while balancing confounders between treatment and control groups. |
Anpeng Wu; Kun Kuang; Ruoxuan Xiong; Bo Li; Fei Wu; |
1645 | CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Coupled Contrastive Graph Representation Learning (CoCo), which extracts the topological information from coupled learning branches and reduces the domain discrepancy with coupled contrastive learning. |
Nan Yin; Li Shen; Mengzhu Wang; Long Lan; Zeyu Ma; Chong Chen; Xian-Sheng Hua; Xiao Luo; |
1646 | Answering Complex Logical Queries on Knowledge Graphs Via Query Computation Tree Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose QTO (Query Computation Tree Optimization) that can efficiently find the exact optimal solution. |
Yushi Bai; Xin Lv; Juanzi Li; Lei Hou; |
1647 | FAIRER: Fairness As Decision Rationale Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate fairness from the perspective of decision rationale and define the parameter parity score to characterize the fair decision process of networks by analyzing neuron influence in various subgroups. |
Tianlin Li; Qing Guo; Aishan Liu; Mengnan Du; Zhiming Li; Yang Liu; |
1648 | Social Learning Spontaneously Emerges By Searching Optimal Heuristics with Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we address both problems by employing a deep reinforcement learning model to optimize the social learning strategies (SLSs) of agents in a cooperative game in a multi-dimensional landscape. |
Seungwoong Ha; Hawoong Jeong; |
1649 | Crafting Training Degradation Distribution for The Accuracy-Generalization Trade-off in Real-World Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel approach to craft training degradation distributions using a small set of reference images. |
Ruofan Zhang; Jinjin Gu; Haoyu Chen; Chao Dong; Yulun Zhang; Wenming Yang; |
1650 | SEGA: Structural Entropy Guided Anchor View for Graph Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: An anchor view that maintains the essential information of input graphs for contrastive learning has hardly been investigated. In this paper, based on the theory of the graph information bottleneck, we deduce the definition of this anchor view; put differently, the anchor view with the essential information of the input graph is supposed to have the minimal structural uncertainty. |
Junran Wu; Xueyuan Chen; Bowen Shi; Shangzhe Li; Ke Xu; |
1651 | Learning Neural Constitutive Laws from Motion Observations for Generalizable PDE Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a hybrid neural network (NN) and PDE approach for learning generalizable PDE dynamics from motion observations. |
Pingchuan Ma; Peter Yichen Chen; Bolei Deng; Joshua B. Tenenbaum; Tao Du; Chuang Gan; Wojciech Matusik; |
1652 | Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose MOdel-Bellman Inconsistency penalized offLinE policy optimization (MOBILE), a novel uncertainty-driven offline RL algorithm. |
Shiming Chen-placeholder
1653 | Evolving Semantic Prototype Improves Generative Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: So the synthesized visual sample features do not faithfully represent the real sample features, limiting the classifier training and existing ZSL performance. In this paper, we formulate this mismatch phenomenon as the visual-semantic domain shift problem. |
Shiming Chen; Wenjin Hou; Ziming Hong; Xiaohan Ding; Yibing Song; Xinge You; Tongliang Liu; Kun Zhang; |
1654 | Random Shuffle Transformer for Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that the local-window Transformer can also model non-local interactions. |
Jie Xiao; Xueyang Fu; Man Zhou; Hongjian Liu; Zheng-Jun Zha; |
1655 | Retrieval-Augmented Multimodal Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To integrate knowledge in a more scalable and modular way, we propose a retrieval-augmented multimodal model, which enables a base multimodal model (generator) to refer to relevant text and images fetched by a retriever from external memory (e.g., documents on the web). |
Michihiro Yasunaga; Armen Aghajanyan; Weijia Shi; Richard James; Jure Leskovec; Percy Liang; Mike Lewis; Luke Zettlemoyer; Wen-tau Yih; |
1656 | A Closer Look at Few-shot Classification Again Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we empirically demonstrate that the training algorithm and the adaptation algorithm can be completely disentangled, which allows algorithm analysis and design to be done individually for each phase. |
Xu Luo; Hao Wu; Ji Zhang; Lianli Gao; Jing Xu; Jingkuan Song; |
1657 | Boosting Offline Reinforcement Learning with Action Preference Query Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce an interaction-free training scheme dubbed Offline-with-Action-Preferences (OAP). |
Qisen Yang; Shenzhi Wang; Matthieu Gaetan Lin; Shiji Song; Gao Huang; |
1658 | Patch-level Contrastive Learning Via Positional Query for Visual Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a plug-in method PQCL (Positional Query for patch-level Contrastive Learning), which allows performing patch-level contrasts between two views with exact patch correspondence. |
Shaofeng Zhang; Qiang Zhou; Zhibin Wang; Fan Wang; Junchi Yan; |
1659 | Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper examines a uniform UAP for the function class $C(\mathcal{K},\mathbb{R}^{d_y})$ and gives the exact minimum width of the leaky-ReLU NN as $w_{\min}=\max(d_x+1,d_y)+1_{d_y=d_x+1}$, which involves the effects of the output dimensions. To obtain this result, we propose a novel lift-flow-discretization approach that shows that the uniform UAP has a deep connection with topological theory. |
Li’ang Li; Yifei Duan; Guanghua Ji; Yongqiang Cai; |
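The closed-form minimum width quoted in the highlight above is simple enough to evaluate directly. As a small illustrative sketch (the function name is ours; the formula is taken verbatim from the highlight, $w_{\min}=\max(d_x+1,d_y)+1_{d_y=d_x+1}$):

```python
def min_width(d_x: int, d_y: int) -> int:
    """Minimum leaky-ReLU network width for uniform universal approximation
    of C(K, R^{d_y}), per the closed form quoted in the highlight:
    w_min = max(d_x + 1, d_y) + 1[d_y == d_x + 1]."""
    return max(d_x + 1, d_y) + (1 if d_y == d_x + 1 else 0)

print(min_width(3, 1))  # 4
print(min_width(3, 4))  # 5  (indicator term active, since d_y = d_x + 1)
print(min_width(2, 5))  # 5
```

Note how the indicator term only matters in the boundary case $d_y = d_x + 1$, which is the extra "+1" correction the paper identifies.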
1660 | Optimizing DDPM Sampling with Shortcut Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose Shortcut Fine-Tuning (SFT), a new approach for addressing the challenge of fast sampling of pretrained Denoising Diffusion Probabilistic Models (DDPMs). |
Ying Fan; Kangwook Lee; |
1661 | Learning Physical Models That Can Respect Conservation Laws Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Delivering on the promise of SciML requires seamlessly incorporating both types of problems into the learning process. To address this issue, we propose ProbConserv, a framework for incorporating constraints into a generic SciML architecture. |
Derek Hansen; Danielle C. Maddix; Shima Alizadeh; Gaurav Gupta; Michael W. Mahoney; |
1662 | Searching Large Neighborhoods for Integer Linear Programs with Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach, CL-LNS, that delivers state-of-the-art anytime performance on several ILP benchmarks measured by metrics including the primal gap, the primal integral, survival rates and the best performing rate. |
Taoan Huang; Aaron M Ferber; Yuandong Tian; Bistra Dilkina; Benoit Steiner; |
1663 | BiBench: Benchmarking and Analyzing Network Binarization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Common challenges of binarization, such as accuracy degradation and efficiency limitation, suggest that its attributes are not fully understood. To close this gap, we present BiBench, a rigorously designed benchmark with in-depth analysis for network binarization. |
Haotong Qin; Mingyuan Zhang; Yifu Ding; Aoyu Li; Zhongang Cai; Ziwei Liu; Fisher Yu; Xianglong Liu; |
1664 | Leveraging Demonstrations to Improve Online Learning: Quality Matters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an informed TS algorithm that utilizes the demonstration data in a coherent way through Bayes’ rule and derive a prior-dependent Bayesian regret bound. |
Botao Hao; Rahul Jain; Tor Lattimore; Benjamin Van Roy; Zheng Wen; |
1665 | Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we thus propose model ratatouille, a new strategy to recycle the multiple fine-tunings of the same foundation model on diverse auxiliary tasks. |
Alexandre Rame; Kartik Ahuja; Jianyu Zhang; Matthieu Cord; Leon Bottou; David Lopez-Paz; |
1666 | Optimizing NOTEARS Objectives Via Topological Swaps Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These problems involve minimizing a given loss or score function, subject to a non-convex continuous constraint that penalizes the presence of cycles in a graph. In this work, we delve into the optimality challenges associated with this class of non-convex programs. |
Chang Deng; Kevin Bello; Bryon Aragam; Pradeep Kumar Ravikumar; |
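The "non-convex continuous constraint that penalizes the presence of cycles" referenced in the highlight above is, in the NOTEARS line of work, the standard acyclicity function $h(W)=\mathrm{tr}(e^{W\circ W})-d$, which is zero iff $W$ encodes a DAG. This is background knowledge rather than a detail stated in the highlight; a minimal sketch using a truncated power series for the matrix exponential:

```python
import numpy as np

def notears_acyclicity(W: np.ndarray, terms: int = 30) -> float:
    """Standard NOTEARS acyclicity penalty h(W) = tr(exp(W o W)) - d,
    where o is the Hadamard (elementwise) product. h(W) = 0 iff the
    weighted adjacency matrix W encodes an acyclic graph. The matrix
    exponential is computed via a truncated power series."""
    d = W.shape[0]
    A = W * W                  # Hadamard square keeps entries non-negative
    E = np.eye(d)
    term = np.eye(d)
    for k in range(1, terms):
        term = term @ A / k    # running A^k / k!
        E = E + term
    return float(np.trace(E) - d)

# A single DAG edge 0 -> 1 gives h = 0; adding the back-edge 1 -> 0 makes h > 0.
dag = np.array([[0.0, 1.0], [0.0, 0.0]])
cyc = np.array([[0.0, 1.0], [1.0, 0.0]])
print(notears_acyclicity(dag))  # ≈ 0.0
print(notears_acyclicity(cyc))  # ≈ 1.086 (= 2*cosh(1) - 2)
```

Minimizing a score subject to h(W) = 0 is the non-convex program whose optimality challenges the paper studies.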
1667 | Dynamical Linear Bandits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel setting, the Dynamical Linear Bandits (DLB), an extension of the linear bandits characterized by a hidden state. |
Marco Mussi; Alberto Maria Metelli; Marcello Restelli; |
1668 | Unifying Molecular and Textual Representations Via Multi-task Language Modelling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we propose the first multi-domain, multi-task language model that can solve a wide range of tasks in both the chemical and natural language domains. |
Dimitrios Christofidellis; Giorgio Giannone; Jannis Born; Ole Winther; Teodoro Laino; Matteo Manica; |
1669 | Probabilistic Contrastive Learning Recovers The Correct Aleatoric Uncertainty of Ambiguous Inputs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This makes the true posterior for the latent vector probabilistic with heteroscedastic uncertainty. In this setup, we extend the common InfoNCE objective and encoders to predict latent distributions instead of points. |
Michael Kirchhof; Enkelejda Kasneci; Seong Joon Oh; |
1670 | Towards Omni-generalizable Neural Methods for Vehicle Routing Problems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a generic meta-learning framework, which enables effective training of an initialized model with the capability of fast adaptation to new tasks during inference. |
Jianan Zhou; Yaoxin Wu; Wen Song; Zhiguang Cao; Jie Zhang; |
1671 | Model-agnostic Measure of Generalization Difficulty Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose what is to our knowledge the first model-agnostic measure of the inherent generalization difficulty of tasks. |
Akhilan Boopathy; Kevin Liu; Jaedong Hwang; Shu Ge; Asaad Mohammedsaleh; Ila R Fiete; |
1672 | Long-Term Rhythmic Video Soundtracker Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we present Long-Term Rhythmic Video Soundtracker (LORIS), a novel framework to synthesize long-term conditional waveforms. |
Jiashuo Yu; Yaohui Wang; Xinyuan Chen; Xiao Sun; Yu Qiao; |
1673 | Rethinking Warm-Starts with Predictions: Learning Predictions Close to Sets of Optimal Solutions for Faster $\text{L}$-/$\text{L}^\natural$-Convex Function Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Indeed, the dual problem of bipartite matching and, more generally, *$\text{L}$-/$\text{L}^\natural$-convex function minimization* have *arbitrarily many* optimal solutions, making such prediction-dependent bounds arbitrarily large. To resolve this theoretically critical issue, we present a new warm-start-with-prediction framework for $\text{L}$-/$\text{L}^\natural$-convex function minimization. |
Shinsaku Sakaue; Taihei Oki; |