Paper Digest: ICML 2022 Highlights
The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. In 2022, it was held in Baltimore, USA.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
Based in New York, Paper Digest is dedicated to helping people generate content and reason over unstructured data. Unlike black-box approaches, we build deep models on semantics, which allows results to be produced with explanations. These models power this website and are behind our services, including “search engine”, “summarization”, “question answering”, and “literature review”.
If you do not want to miss interesting academic papers, you are welcome to sign up for our daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn for new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: ICML 2022 Highlights
Paper | Author(s) |
---|---|
1 | PAC-Bayesian Bounds on Rate-Efficient Classifiers. Highlight: We derive analytic bounds on the noise invariance of majority vote classifiers operating on compressed inputs. |
Alhabib Abbas; Yiannis Andreopoulos; |
2 | Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning. Highlight: Specifically, the loss landscape of MAML is much more complex, with possibly many more saddle points and local minima than its empirical risk minimization counterpart. To address this challenge, we leverage the recently invented sharpness-aware minimization and develop a sharpness-aware MAML approach that we term Sharp-MAML. |
Momin Abbas; Quan Xiao; Lisha Chen; Pin-Yu Chen; Tianyi Chen; |
3 | An Initial Alignment Between Neural Network and Target Is Needed for Gradient Descent to Learn. Highlight: This paper introduces the notion of “Initial Alignment” (INAL) between a neural network at initialization and a target function. |
Emmanuel Abbe; Elisabetta Cornacchia; Jan Hazla; Christopher Marquis; |
4 | Active Sampling for Min-Max Fairness. Highlight: We propose simple active sampling and reweighting strategies for optimizing min-max fairness that can be applied to any classification or regression model learned via loss minimization. |
Jacob D Abernethy; Pranjal Awasthi; Matthäus Kleindessner; Jamie Morgenstern; Chris Russell; Jie Zhang; |
5 | Meaningfully Debugging Model Mistakes Using Conceptual Counterfactual Explanations. Highlight: In this paper, we propose a systematic approach, conceptual counterfactual explanations (CCE), that explains why a classifier makes a mistake on a particular test sample(s) in terms of human-understandable concepts (e.g. this zebra is misclassified as a dog because of faint stripes). |
Abubakar Abid; Mert Yuksekgonul; James Zou; |
6 | Batched Dueling Bandits. Highlight: We study the batched K-armed dueling bandit problem under two standard settings: (i) existence of a Condorcet winner, and (ii) strong stochastic transitivity and stochastic triangle inequality. |
Arpit Agarwal; Rohan Ghuge; Viswanath Nagarajan; |
7 | Hierarchical Shrinkage: Improving The Accuracy and Interpretability of Tree-based Models. Highlight: We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm which regularizes the tree not by altering its structure, but by shrinking the prediction over each leaf toward the sample means over each of its ancestors, with weights depending on a single regularization parameter and the number of samples in each ancestor. |
Abhineet Agarwal; Yan Shuo Tan; Omer Ronen; Chandan Singh; Bin Yu; |
8 | Deep Equilibrium Networks Are Sensitive to Initialization Statistics. Highlight: We show that DEQs are sensitive to the higher order statistics of the matrix families from which they are initialized. |
Atish Agarwala; Samuel S Schoenholz; |
9 | Learning of Cluster-based Feature Importance for Electronic Health Record Time-series. Highlight: We propose a supervised deep learning model to cluster EHR data based on the identification of clinically understandable phenotypes with regard to both outcome prediction and patient trajectory. |
Henrique Aguiar; Mauro Santos; Peter Watkinson; Tingting Zhu; |
10 | On The Convergence of The Shapley Value in Parametric Bayesian Learning Games. Highlight: In this paper, we establish the convergence property of the Shapley value in parametric Bayesian learning games where players perform a Bayesian inference using their combined data, and the posterior-prior KL divergence is used as the characteristic function. |
Lucas Agussurja; Xinyi Xu; Bryan Kian Hsiang Low; |
11 | Individual Preference Stability for Clustering. Highlight: In this paper, we propose a natural notion of individual preference (IP) stability for clustering, which asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster. |
Saba Ahmadi; Pranjal Awasthi; Samir Khuller; Matthäus Kleindessner; Jamie Morgenstern; Pattara Sukprasert; Ali Vakilian; |
12 | Understanding The Unstable Convergence of Gradient Descent. Highlight: However, many works have observed that in machine learning applications step sizes often do not fulfill this condition, yet (stochastic) gradient descent still converges, albeit in an unstable manner. We investigate this unstable convergence phenomenon from first principles, and discuss key causes behind it. |
Kwangjun Ahn; Jingzhao Zhang; Suvrit Sra; |
13 | Minimum Cost Intervention Design for Causal Effect Identification. Highlight: In this work, we consider the problem of designing the collection of interventions with the minimum cost to identify the desired effect. |
Sina Akbari; Jalal Etesami; Negar Kiyavash; |
14 | How Faithful Is Your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models. Highlight: In this paper, we introduce a 3-dimensional evaluation metric, ($\alpha$-Precision, $\beta$-Recall, Authenticity), that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion. |
Ahmed Alaa; Boris Van Breugel; Evgeny S. Saveliev; Mihaela van der Schaar; |
15 | A Natural Actor-Critic Framework for Zero-Sum Markov Games. Highlight: We introduce algorithms based on natural actor-critic and analyze their sample complexity for solving two player zero-sum Markov games in the tabular case. |
Ahmet Alacaoglu; Luca Viano; Niao He; Volkan Cevher; |
16 | Deploying Convolutional Networks on Untrusted Platforms Using 2D Holographic Reduced Representations. Highlight: By leveraging Holographic Reduced Representations (HRRs), we create a neural network with a pseudo-encryption style defense that empirically shows robustness to attack, even under threat models that unrealistically favor the adversary. |
Mohammad Mahmudul Alam; Edward Raff; Tim Oates; James Holt; |
17 | Optimistic Linear Support and Successor Features As A Basis for Optimal Policy Transfer. Highlight: However, the identified solutions are not guaranteed to be optimal. We introduce a novel algorithm that addresses this limitation. |
Lucas Nunes Alegre; Ana Bazzan; Bruno C. Da Silva; |
18 | Structured Stochastic Gradient MCMC. Highlight: Unfortunately, VI makes strong assumptions on both the factorization and functional form of the posterior. To relax these assumptions, this work proposes a new non-parametric variational inference scheme that combines ideas from both SGMCMC and coordinate-ascent VI. |
Antonios Alexos; Alex J Boyd; Stephan Mandt; |
19 | XAI for Transformers: Better Explanations Through Conservative Propagation. Highlight: We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction. We identify Attention Heads and LayerNorm as main reasons for such unreliable explanations and propose a more stable way for propagation through these layers. |
Ameen Ali; Thomas Schnake; Oliver Eberle; Grégoire Montavon; Klaus-Robert Müller; Lior Wolf; |
20 | RUMs from Head-to-Head Contests. Highlight: In this paper, we focus on slates of size two representing head-to-head contests. |
Matteo Almanza; Flavio Chierichetti; Ravi Kumar; Alessandro Panconesi; Andrew Tomkins; |
21 | Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval. Highlight: In this paper, we present RetoMaton (retrieval automaton), which approximates the datastore search, based on (1) saving pointers between consecutive datastore entries, and (2) clustering of entries into "states". |
Uri Alon; Frank Xu; Junxian He; Sudipta Sengupta; Dan Roth; Graham Neubig; |
22 | Minimax Classification Under Concept Drift with Multidimensional Adaptation and Performance Guarantees. Highlight: This paper presents adaptive minimax risk classifiers (AMRCs) that account for multidimensional time changes by means of a multivariate and high-order tracking of the time-varying underlying distribution. |
Verónica Álvarez; Santiago Mazuelas; Jose A Lozano; |
23 | Scalable First-Order Bayesian Optimization Via Structured Automatic Differentiation. Highlight: Here, we observe that a wide range of kernels gives rise to structured matrices, enabling an exact $O(n^2d)$ matrix-vector multiply for gradient observations and $O(n^2d^2)$ for Hessian observations. Beyond canonical kernel classes, we derive a programmatic approach to leveraging this type of structure for transformations and combinations of the discussed kernel classes, which constitutes a structure-aware automatic differentiation algorithm. |
Sebastian E Ament; Carla P Gomes; |
24 | Public Data-Assisted Mirror Descent for Private Model Training. Highlight: In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training. |
Ehsan Amid; Arun Ganesh; Rajiv Mathews; Swaroop Ramaswamy; Shuang Song; Thomas Steinke; Vinith M Suriyakumar; Om Thakkar; Abhradeep Thakurta; |
25 | On Last-Iterate Convergence Beyond Zero-Sum Games. Highlight: In this paper we provide new results and techniques that apply to broader families of games and learning dynamics. |
Ioannis Anagnostides; Ioannis Panageas; Gabriele Farina; Tuomas Sandholm; |
26 | Online Algorithms with Multiple Predictions. Highlight: We give a generic algorithmic framework for online covering problems with multiple predictions that obtains an online solution that is competitive against the performance of the best solution obtained from the predictions. |
Keerti Anand; Rong Ge; Amit Kumar; Debmalya Panigrahi; |
27 | Learning to Hash Robustly, Guaranteed. Highlight: In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms, while optimizing the hashing to the structure of the dataset (think instance-optimal algorithms) for performance on the minimum-performing query. |
Alexandr Andoni; Daniel Beaglehole; |
28 | Set Based Stochastic Subsampling. Highlight: Deep models are designed to operate on huge volumes of high dimensional data such as images. In order to reduce the volume of data these models must process, we propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network (e.g. classifier). |
Bruno Andreis; Seanie Lee; A. Tuan Nguyen; Juho Lee; Eunho Yang; Sung Ju Hwang; |
29 | Towards Understanding Sharpness-Aware Minimization. Highlight: We argue that the existing justifications for the success of SAM which are based on a PAC-Bayes generalization bound and the idea of convergence to flat minima are incomplete. |
Maksym Andriushchenko; Nicolas Flammarion; |
30 | Fair and Fast K-Center Clustering for Data Summarization. Highlight: We consider two key issues faced by many clustering methods when used for data summarization, namely (a) an unfair representation of “demographic groups” and (b) distorted summarizations, where data points in the summary represent subsets of the original data of vastly different sizes. |
Haris Angelidakis; Adam Kurpisz; Leon Sering; Rico Zenklusen; |
31 | Interactive Correlation Clustering with Existential Cluster Constraints. Highlight: In this paper, we introduce existential cluster constraints: a new form of feedback where users indicate the features of desired clusters. |
Rico Angell; Nicholas Monath; Nishant Yadav; Andrew Mccallum; |
32 | Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging. Highlight: Current algorithms, however, do not generally offer statistical guarantees that protect against a model’s mistakes and hallucinations. To address this, we develop uncertainty quantification techniques with rigorous statistical guarantees for image-to-image regression problems. |
Anastasios N Angelopoulos; Amit Pal Kohli; Stephen Bates; Michael Jordan; Jitendra Malik; Thayer Alshaabi; Srigokul Upadhyayula; Yaniv Romano; |
33 | AdaGrad Avoids Saddle Points. Highlight: In this paper, we focus on the AdaGrad family of algorithms – from scalar to full-matrix preconditioning – and we examine the question of whether the method’s trajectories avoid saddle points. |
Kimon Antonakopoulos; Panayotis Mertikopoulos; Georgios Piliouras; Xiao Wang; |
34 | UnderGrad: A Universal Black-Box Optimization Method with Almost Dimension-Free Convergence Rate Guarantees. Highlight: Our paper aims to bridge this gap by providing a scalable universal method – dubbed UnDERGrad – which enjoys an almost dimension-free oracle complexity in problems with a favorable geometry (like the simplex, $\ell_1$-ball or trace-constraints), while retaining the order-optimal dependence on T described above. |
Kimon Antonakopoulos; Dong Quan Vu; Volkan Cevher; Kfir Levy; Panayotis Mertikopoulos; |
35 | Adapting The Linearised Laplace Model Evidence for Modern Deep Learning. Highlight: In this work, we examine the assumptions behind this method, particularly in conjunction with model selection. |
Javier Antoran; David Janz; James U Allingham; Erik Daxberger; Riccardo Rb Barbano; Eric Nalisnick; Jose Miguel Hernandez-Lobato; |
36 | EAT-C: Environment-Adversarial Sub-Task Curriculum for Efficient Reinforcement Learning. Highlight: Reinforcement learning (RL) is inefficient on long-horizon tasks due to sparse rewards and its policy can be fragile to slightly perturbed environments. We address these challenges via a curriculum of tasks with coupled environments, generated by two policies trained jointly with RL: (1) a co-operative planning policy recursively decomposing a hard task into a coarse-to-fine sub-task tree; and (2) an adversarial policy modifying the environment in each sub-task. |
Shuang Ao; Tianyi Zhou; Jing Jiang; Guodong Long; Xuan Song; Chengqi Zhang; |
37 | Online Balanced Experimental Design. Highlight: In this work, we present algorithms that build on recent advances in online discrepancy minimization which accommodate both arbitrary treatment probabilities and multiple treatments. |
David Arbour; Drew Dimmery; Tung Mai; Anup Rao; |
38 | VariGrow: Variational Architecture Growing for Task-Agnostic Continual Learning Based on Bayesian Novelty. Highlight: This paper proposes a variational architecture growing framework dubbed VariGrow. |
Randy Ardywibowo; Zepeng Huo; Zhangyang Wang; Bobak J Mortazavi; Shuai Huang; Xiaoning Qian; |
39 | Thresholded Lasso Bandit. Highlight: In this paper, we revisit the regret minimization problem in sparse stochastic contextual linear bandits, where feature vectors may be of large dimension $d$, but where the reward function depends on a few, say $s_0\ll d$, of these features only. |
Kaito Ariu; Kenshi Abe; Alexandre Proutiere; |
40 | Gradient Based Clustering. Highlight: We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality with respect to cluster assignments and cluster center positions. |
Aleksandar Armacki; Dragana Bajovic; Dusan Jakovetic; Soummya Kar; |
41 | Understanding Gradient Descent on The Edge of Stability in Deep Learning. Highlight: The current paper mathematically analyzes a new mechanism of implicit regularization in the EoS phase, whereby GD updates due to non-smooth loss landscape turn out to evolve along some deterministic flow on the manifold of minimum loss. |
Sanjeev Arora; Zhiyuan Li; Abhishek Panigrahi; |
42 | Private Optimization in The Interpolation Regime: Faster Rates and Hardness Results. Highlight: In this paper, we investigate differentially private stochastic optimization in the interpolation regime. |
Hilal Asi; Karan Chadha; Gary Cheng; John Duchi; |
43 | Optimal Algorithms for Mean Estimation Under Local Differential Privacy. Highlight: In this work, we investigate the question of designing the randomizer with the smallest variance. |
Hilal Asi; Vitaly Feldman; Kunal Talwar; |
44 | Asymptotically-Optimal Gaussian Bandits with Side Observations. Highlight: The LP optimizes the cost (regret) required to reliably estimate the suboptimality gap of each arm. This LP lower bound motivates our main contribution: the first known asymptotically optimal algorithm for this general setting. |
Alexia Atsidakou; Orestis Papadigenopoulos; Constantine Caramanis; Sujay Sanghavi; Sanjay Shakkottai; |
45 | Congested Bandits: Optimal Routing Via Short-term Resets. Highlight: We introduce the problem of Congested Bandits, where each arm’s reward is allowed to depend on the number of times it was played in the past $\Delta$ timesteps. For the multi-armed setup, we propose a UCB style algorithm and show that its policy regret scales as $\tilde{O}(\sqrt{K \Delta T})$. |
Pranjal Awasthi; Kush Bhatia; Sreenivas Gollapudi; Kostas Kollias; |
46 | Do More Negative Samples Necessarily Hurt In Contrastive Learning? Highlight: We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class (introduced by Saunshi et al. (ICML 2019)), that the downstream performance of the representation optimizing the (population) contrastive loss in fact does not degrade with the number of negative samples. |
Pranjal Awasthi; Nishanth Dikkala; Pritish Kamath; |
47 | H-Consistency Bounds for Surrogate Loss Minimizers. Highlight: We present a detailed study of estimation errors in terms of surrogate loss estimation errors. |
Pranjal Awasthi; Anqi Mao; Mehryar Mohri; Yutao Zhong; |
48 | Iterative Hard Thresholding with Adaptive Regularization: Sparser Solutions Without Sacrificing Runtime. Highlight: We propose a simple modification to the iterative hard thresholding (IHT) algorithm, which recovers asymptotically sparser solutions as a function of the condition number. |
Kyriakos Axiotis; Maxim Sviridenko; |
49 | Proving Theorems Using Incremental Learning and Hindsight Experience Replay. Highlight: In this paper, we adapt the idea of hindsight experience replay from reinforcement learning to the automated theorem proving domain, so as to use the intermediate data generated during unsuccessful proof attempts. |
Eser Aygün; Ankit Anand; Laurent Orseau; Xavier Glorot; Stephen M Mcaleer; Vlad Firoiu; Lei M Zhang; Doina Precup; Shibl Mourad; |
50 | Near-optimal Rate of Consistency for Linear Models with Missing Values. Highlight: In this paper, we focus on the extensively-studied linear models, but in the presence of missing values, which turns out to be quite a challenging task. |
Alexis Ayme; Claire Boyer; Aymeric Dieuleveut; Erwan Scornet; |
51 | How Tempering Fixes Data Augmentation in Bayesian Neural Networks. Highlight: In this work we identify two interlaced factors concurrently influencing the strength of the cold posterior effect, namely the correlated nature of augmentations and the degree of invariance of the employed model to such transformations. |
Gregor Bachmann; Lorenzo Noci; Thomas Hofmann; |
52 | ASAP.SGD: Instance-based Adaptiveness to Staleness in Asynchronous SGD. Highlight: We introduce (i) $\mathtt{ASAP.SGD}$, an analytical framework capturing necessary and desired properties of staleness-adaptive step size functions and (ii) \textsc{tail}-$\tau$, a method for utilizing key properties of the execution instance, generating a tailored strategy that not only dampens the impact of stale updates, but also leverages fresh ones. |
Karl Bäckström; Marina Papatriantafilou; Philippas Tsigas; |
53 | From Noisy Prediction to True Label: Noisy Prediction Calibration Via Generative Model. Highlight: We suggest a new branch of method, Noisy Prediction Calibration (NPC), in learning with noisy labels. |
Heesun Bae; Seungjae Shin; Byeonghu Na; Joonho Jang; Kyungwoo Song; Il-Chul Moon; |
54 | Data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language. Highlight: To get us closer to general self-supervised learning, we present data2vec, a framework that uses the same learning method for either speech, NLP or computer vision. |
Alexei Baevski; Wei-Ning Hsu; Qiantong Xu; Arun Babu; Jiatao Gu; Michael Auli; |
55 | End-to-End Balancing for Causal Continuous Treatment-Effect Estimation. Highlight: We propose a new theory for consistency of entropy balancing for continuous treatments. |
Taha Bahadori; Eric Tchetgen Tchetgen; David Heckerman; |
56 | A Hierarchical Transitive-Aligned Graph Kernel for Un-attributed Graphs. Highlight: In this paper, we develop a new graph kernel, namely the Hierarchical Transitive-Aligned Kernel, by transitively aligning the vertices between graphs through a family of hierarchical prototype graphs. |
Lu Bai; Lixin Cui; Edwin Hancock; |
57 | Near-Optimal Learning of Extensive-Form Games with Imperfect Information. Highlight: We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games, where $X,Y$ are the number of information sets and $A,B$ are the number of actions for the two players. |
Yu Bai; Chi Jin; Song Mei; Tiancheng Yu; |
58 | Gaussian Mixture Variational Autoencoder with Contrastive Learning for Multi-Label Classification. Highlight: We propose a novel contrastive learning boosted multi-label prediction model based on a Gaussian mixture variational autoencoder (C-GMVAE), which learns a multimodal prior space and employs a contrastive loss. |
Junwen Bai; Shufeng Kong; Carla P Gomes; |
59 | A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing. Highlight: However, all the above tasks are in the direction of speech understanding, but for the inverse direction, speech synthesis, the potential of representation learning is yet to be realized, due to the challenging nature of generating high-quality speech. To address this problem, we propose our framework, Alignment-Aware Acoustic-Text Pretraining (A$^3$T), which reconstructs masked acoustic signals with text input and acoustic-text alignment during training. |
He Bai; Renjie Zheng; Junkun Chen; Mingbo Ma; Xintong Li; Liang Huang; |
60 | Stability Based Generalization Bounds for Exponential Family Langevin Dynamics. Highlight: In this paper, we unify and substantially generalize stability based generalization bounds and make three technical contributions. |
Arindam Banerjee; Tiancong Chen; Xinyan Li; Yingxue Zhou; |
61 | Certified Neural Network Watermarks with Randomized Smoothing. Highlight: In this paper, we propose the first certifiable watermarking method. |
Arpit Bansal; Ping-Yeh Chiang; Michael J Curry; Rajiv Jain; Curtis Wigington; Varun Manjunatha; John P Dickerson; Tom Goldstein; |
62 | Data Scaling Laws in NMT: The Effect of Noise and Architecture. Highlight: In this work, we study the effect of varying the architecture and training data quality on the data scaling properties of Neural Machine Translation (NMT). |
Yamini Bansal; Behrooz Ghorbani; Ankush Garg; Biao Zhang; Colin Cherry; Behnam Neyshabur; Orhan Firat; |
63 | Learning Stable Classifiers By Transferring Unstable Features. Highlight: In this work, we explicitly inform the target classifier about unstable features in the source tasks. |
Yujia Bao; Shiyu Chang; Regina Barzilay; |
64 | Fast Composite Optimization and Statistical Recovery in Federated Learning. Highlight: On the optimization front, we propose a new algorithm named Fast Federated Dual Averaging for strongly convex and smooth loss and establish state-of-the-art iteration and communication complexity in the composite setting. |
Yajie Bao; Michael Crawshaw; Shan Luo; Mingrui Liu; |
65 | Generative Modeling for Multi-task Visual Learning. Highlight: In this paper, motivated by multi-task learning of shareable feature representations, we consider a novel problem of learning a shared generative model that is useful across various visual perception tasks. |
Zhipeng Bao; Martial Hebert; Yu-Xiong Wang; |
66 | Estimating The Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models. Highlight: In this work, we consider diagonal and full covariances to improve the expressive power of DPMs. |
Fan Bao; Chongxuan Li; Jiacheng Sun; Jun Zhu; Bo Zhang; |
67 | On The Surrogate Gap Between Contrastive and Supervised Losses. Highlight: Following the simplified setting where positive pairs are drawn from the true distribution (not generated by data augmentation; as supposed in previous studies), this study establishes surrogate upper and lower bounds for the downstream classification loss for all negative sample sizes that best explain the empirical observations on the negative sample size in the earlier studies. |
Han Bao; Yoshihiro Nagano; Kento Nozawa; |
68 | Representation Topology Divergence: A Method for Comparing Neural Network Representations. Highlight: We propose a method for comparing two data representations. |
Serguei Barannikov; Ilya Trofimov; Nikita Balabin; Evgeny Burnaev; |
69 | Sparse Mixed Linear Regression with Guarantees: Taming An Intractable Problem with Invex Relaxation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study the problem of sparse mixed linear regression on an unlabeled dataset that is generated from linear measurements from two different regression parameter vectors. |
Adarsh Barik; Jean Honorio; |
70 | Neural Fisher Discriminant Analysis: Optimal Neural Network Embeddings in Polynomial Time Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a natural extension of FLDA that employs neural networks, called Neural Fisher Discriminant Analysis (NFDA). |
Burak Bartan; Mert Pilanci; |
71 | Fictitious Play and Best-Response Dynamics in Identical Interest and Zero-Sum Stochastic Games Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes an extension of a popular decentralized discrete-time learning procedure when repeating a static game called fictitious play (FP) (Brown, 1951; Robinson, 1951) to a dynamic model called discounted stochastic game (Shapley, 1953). |
Lucas Baudin; Rida Laraki; |
72 | Information Discrepancy in Strategic Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We initiate the study of the effects of non-transparency in decision rules on individuals’ ability to improve in strategic learning settings. |
Yahav Bechavod; Chara Podimata; Steven Wu; Juba Ziani; |
73 | On The Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To mitigate this hidden bias, heavy-tailed policy parameterizations may be used, which exhibit a bounded score function, but doing so can cause instability in algorithmic updates. To address these issues, in this work, we study the convergence of policy gradient algorithms under heavy-tailed parameterizations, which we propose to stabilize with a combination of mirror ascent-type updates and gradient tracking. |
Amrit Singh Bedi; Souradip Chakraborty; Anjaly Parayil; Brian M Sadler; Pratap Tokekar; Alec Koppel; |
74 | Imitation Learning By Estimating Expertise of Demonstrators Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms. |
Mark Beliaev; Andy Shih; Stefano Ermon; Dorsa Sadigh; Ramtin Pedarsani; |
75 | Matching Normalizing Flows and Probability Paths on Manifolds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose to train CNFs on manifolds by minimizing probability path divergence (PPD), a novel family of divergences between the probability density path generated by the CNF and a target probability density path. |
Heli Ben-Hamu; Samuel Cohen; Joey Bose; Brandon Amos; Maximillian Nickel; Aditya Grover; Ricky T. Q. Chen; Yaron Lipman; |
76 | Stochastic Contextual Dueling Bandits Under Linear Stochastic Transitivity Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a computationally efficient algorithm, CoLSTIM, which makes its choice based on imitating the feedback process using perturbed context-dependent utility estimates of the underlying CoLST model. |
Viktor Bengs; Aadirupa Saha; Eyke Hüllermeier; |
77 | Neural Inverse Kinematic Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a neural IK method that employs the hierarchical structure of the problem to sequentially sample valid joint angles conditioned on the desired position and on the preceding joints along the chain. |
Raphael Bensadoun; Shir Gur; Nitsan Blau; Lior Wolf; |
78 | Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address this fundamental limitation, we show how to re-cast a class of stochastic volatility models as a hierarchical Gaussian process (GP) model with specialized covariance functions. |
Gregory Benton; Wesley Maddox; Andrew Gordon Wilson; |
79 | Gradient Descent on Neurons and Its Link to Approximate Second-order Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This challenges widely held beliefs and immediately raises the question why KFAC performs so well. Towards answering this question we present evidence strongly suggesting that KFAC approximates a first-order algorithm, which performs gradient descent on neurons rather than weights. |
Frederik Benzing; |
80 | Safe Learning in Tree-Form Sequential Decision Making: Handling Hard and Soft Constraints Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the hard-threshold problem of achieving sublinear regret while guaranteeing that the threshold constraint is satisfied at every iteration with high probability. |
Martino Bernasconi; Federico Cacciamani; Matteo Castiglioni; Alberto Marchesi; Nicola Gatti; Francesco Trovò; |
81 | Skin Deep Unlearning: Artefact and Instrument Debiasing in The Context of Melanoma Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we robustly remove bias and spurious variation from an automated melanoma classification pipeline using two leading bias unlearning techniques. |
Peter Bevan; Amir Atapour-Abarghouei; |
82 | Approximate Bayesian Computation with Domain Expert in The Loop Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce an active learning method for ABC statistics selection which reduces the domain expert’s work considerably. |
Ayush Bharti; Louis Filstroff; Samuel Kaski; |
83 | Minimax M-estimation Under Adversarial Contamination Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To illustrate the usefulness of the derived robust M-estimator in an online setting, we present a bandit algorithm for the partially identifiable best arm identification problem that improves upon the sample complexity of the state of the art algorithms. |
Sujay Bhatt; Guanhua Fang; Ping Li; Gennady Samorodnitsky; |
84 | Nearly Optimal Catoni’s M-estimator for Infinite Variance Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we extend the remarkable M-estimator of Catoni (2012) to situations where the variance is infinite. |
Sujay Bhatt; Guanhua Fang; Ping Li; Gennady Samorodnitsky; |
85 | Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study stochastic optimization algorithms for a personalized federated learning setting involving local and global models subject to user-level (joint) differential privacy. |
Alberto Bietti; Chen-Yu Wei; Miroslav Dudik; John Langford; Steven Wu; |
86 | Non-Vacuous Generalisation Bounds for Shallow Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We focus on a specific class of shallow neural networks with a single hidden layer, namely those with $L_2$-normalised data and either a sigmoid-shaped Gaussian error function (“erf”) activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds they apply to neural networks with deterministic rather than randomised parameters. |
Felix Biggs; Benjamin Guedj; |
87 | Structure-preserving GANs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce structure-preserving GANs as a data-efficient framework for learning distributions with additional structure such as group symmetry, by developing new variational representations for divergences. |
Jeremiah Birrell; Markos Katsoulakis; Luc Rey-Bellet; Wei Zhu; |
88 | Scalable Spike-and-Slab Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this article, we propose Scalable Spike-and-Slab (S^3), a scalable Gibbs sampling implementation for high-dimensional Bayesian regression with the continuous spike-and-slab prior of George & McCulloch (1993). |
Niloy Biswas; Lester Mackey; Xiao-Li Meng; |
89 | Breaking Down Out-of-Distribution Detection: Many Methods Based on OOD Training Data Estimate A Combination of The Same Core Quantities Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The goal of this paper is to recognize common objectives as well as to identify the implicit scoring functions of different OOD detection methods. |
Julian Bitterwolf; Alexander Meinke; Maximilian Augustin; Matthias Hein; |
90 | A Query-optimal Algorithm for Finding Counterfactuals Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We design an algorithm for finding counterfactuals with strong theoretical guarantees on its performance. |
Guy Blanc; Caleb Koch; Jane Lange; Li-Yang Tan; |
91 | Popular Decision Tree Algorithms Are Provably Noise Tolerant Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Using the framework of boosting, we prove that all impurity-based decision tree learning algorithms, including the classic ID3, C4.5, and CART, are highly noise tolerant. |
Guy Blanc; Jane Lange; Ali Malik; Li-Yang Tan; |
92 | Optimizing Sequential Experimental Design with Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, these methods may not sufficiently explore the design space, require access to a differentiable probabilistic model and can only optimize over continuous design spaces. Here, we address these limitations by showing that the problem of optimizing policies can be reduced to solving a Markov decision process (MDP). |
Tom Blau; Edwin V. Bonilla; Iadine Chades; Amir Dezfouli; |
93 | Lagrangian Method for Q-Function Learning (with Applications to Machine Translation) Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper discusses a new approach to the fundamental problem of learning optimal Q-functions. |
Huang Bojun; |
94 | Generalized Results for The Existence and Consistency of The MLE in The Bradley-Terry-Luce Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study the performance of the Bradley-Terry-Luce model for ranking from pairwise comparison data under more realistic settings than those considered in the literature so far. |
Heejong Bong; Alessandro Rinaldo; |
95 | How to Train Your Wide Neural Network Without Backprop: An Input-Weight Alignment Perspective Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Leveraging NTK theory, we show theoretically that gradient descent drives layerwise weight updates that are aligned with their input activity correlations weighted by error, and demonstrate empirically that the result also holds in finite-width wide networks. |
Akhilan Boopathy; Ila Fiete; |
96 | Improving Language Models By Retrieving from Trillions of Tokens Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. |
Sebastian Borgeaud; Arthur Mensch; Jordan Hoffmann; Trevor Cai; Eliza Rutherford; Katie Millican; George Bm Van Den Driessche; Jean-Baptiste Lespiau; Bogdan Damoc; Aidan Clark; Diego De Las Casas; Aurelia Guy; Jacob Menick; Roman Ring; Tom Hennigan; Saffron Huang; Loren Maggiore; Chris Jones; Albin Cassirer; Andy Brock; Michela Paganini; Geoffrey Irving; Oriol Vinyals; Simon Osindero; Karen Simonyan; Jack Rae; Erich Elsen; Laurent Sifre; |
97 | Lie Point Symmetry Data Augmentation for Neural PDE Solvers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Thus, we are presented with a proverbial chicken-and-egg problem. In this paper, we present a method, which can partially alleviate this problem, by improving neural PDE solver sample complexity—Lie point symmetry data augmentation (LPSDA). |
Johannes Brandstetter; Max Welling; Daniel E Worrall; |
98 | An Iterative Clustering Algorithm for The Contextual Stochastic Block Model with Optimality Guarantees Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new iterative algorithm to cluster networks with side information for nodes (in the form of covariates) and show that our algorithm is optimal under the Contextual Symmetric Stochastic Block Model. |
Guillaume Braun; Hemant Tyagi; Christophe Biernacki; |
99 | Tractable Dendritic RNNs for Reconstructing Nonlinear Dynamical Systems Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by the emerging principles of dendritic computation, we augment a dynamically interpretable and mathematically tractable piecewise-linear (PL) recurrent neural network (RNN) by a linear spline basis expansion. |
Manuel Brenner; Florian Hess; Jonas M Mikhaeil; Leonard F Bereska; Zahra Monfared; Po-Chen Kuo; Daniel Durstewitz; |
100 | Learning to Predict Graphs with Fused Gromov-Wasserstein Barycenters Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper introduces a novel and generic framework to solve the flagship task of supervised labeled graph prediction by leveraging Optimal Transport tools. |
Luc Brogat-Motte; Rémi Flamary; Celine Brouard; Juho Rousu; Florence d'Alché-Buc; |
101 | Efficient Learning of CNNs Using Patch Based Features Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Recent work has demonstrated the effectiveness of using patch based representations when learning from image data. Here we provide theoretical support for this observation, by showing that a simple semi-supervised algorithm that uses patch statistics can efficiently learn labels produced by a one-hidden-layer Convolutional Neural Network (CNN). |
Alon Brutzkus; Amir Globerson; Eran Malach; Alon Regev Netser; Shai Shalev-Schwartz; |
102 | Causal Structure-based Root Cause Analysis of Outliers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a formal method to identify "root causes" of outliers, amongst variables. |
Kailash Budhathoki; Lenon Minorics; Patrick Bloebaum; Dominik Janzing; |
103 | IGLUE: A Benchmark for Transfer Learning Across Modalities, Tasks, and Languages Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Due to the lack of a multilingual benchmark, however, vision-and-language research has mostly focused on English language tasks. To fill this gap, we introduce the Image-Grounded Language Understanding Evaluation benchmark. |
Emanuele Bugliarello; Fangyu Liu; Jonas Pfeiffer; Siva Reddy; Desmond Elliott; Edoardo Maria Ponti; Ivan Vulic; |
104 | Interactive Inverse Reinforcement Learning for Cooperative Games Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the problem of designing autonomous agents that can learn to cooperate effectively with a potentially suboptimal partner while having no access to the joint reward function. |
Thomas Kleine Büning; Anne-Marie George; Christos Dimitrakakis; |
105 | Convolutional and Residual Networks Provably Contain Lottery Tickets Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We prove that also modern architectures consisting of convolutional and residual layers that can be equipped with almost arbitrary activation functions can contain lottery tickets with high probability. |
Rebekka Burkholz; |
106 | Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a new algorithm with stronger sample complexity bounds than existing ones. |
Haoyuan Cai; Tengyu Ma; Simon Du; |
107 | Convergence of Invariant Graph Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we investigate the convergence of one powerful GNN, Invariant Graph Network (IGN) over graphs sampled from graphons. |
Chen Cai; Yusu Wang; |
108 | Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In detail, we propose a reinforcement learning algorithm (Optimistic Exploration via Adversarial Integral Equation or OP-TENET) that attains an $\epsilon$-optimal policy within $O(1/\epsilon^2)$ episodes. |
Qi Cai; Zhuoran Yang; Zhaoran Wang; |
109 | Scaling Gaussian Process Optimization By Evaluating A Few Unique Candidates Multiple Times Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We show that sequential black-box optimization based on GPs (GP-Opt) can be made efficient by sticking to a candidate solution for multiple evaluation steps and switch only when necessary. |
Daniele Calandriello; Luigi Carratino; Alessandro Lazaric; Michal Valko; Lorenzo Rosasco; |
110 | Adaptive Gaussian Process Change Point Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Detecting change points in time series, i.e., points in time at which some observed process suddenly changes, is a fundamental task that arises in many real-world applications, with consequences for safety and reliability. In this work, we propose ADAGA, a novel Gaussian process-based solution to this problem, which leverages a powerful heuristic we developed based on statistical hypothesis testing. |
Edoardo Caldarelli; Philippe Wenk; Stefan Bauer; Andreas Krause; |
111 | Measuring Dissimilarity with Diffeomorphism Invariance Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce DID, a pairwise dissimilarity measure applicable to a wide range of data spaces, which leverages the data’s internal structure to be invariant to diffeomorphisms. |
Théophile Cantelobre; Carlo Ciliberto; Benjamin Guedj; Alessandro Rudi; |
112 | A Model-Agnostic Randomized Learning Framework Based on Random Hypothesis Subspace Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a model-agnostic randomized learning framework based on Random Hypothesis Subspace Sampling (RHSS). |
Yiting Cao; Chao Lan; |
113 | Gaussian Process Uniform Error Bounds with Unknown Hyperparameters for Safety-Critical Applications Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, state-of-the-art techniques for safety-critical settings hinge on the assumption that the kernel hyperparameters are known, which does not apply in general. To mitigate this, we introduce robust Gaussian process uniform error bounds in settings with unknown hyperparameters. |
Alexandre Capone; Armin Lederer; Sandra Hirche; |
114 | Burst-Dependent Plasticity and Dendritic Amplification Support Target-Based Learning and Hierarchical Imitation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a multi-compartment model of pyramidal neuron, in which bursts and dendritic input segregation make it possible to plausibly support biological target-based learning. |
Cristiano Capone; Cosimo Lupo; Paolo Muratore; Pier Stanislao Paolucci; |
115 | A Marriage Between Adversarial Team Games and 2-player Games: Enabling Abstractions, No-regret Learning, and Subgame Solving Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In particular, we propose a new, suitable game representation that we call team-public-information, in which a team is represented as a single coordinator who only knows information common to the whole team and prescribes to each member an action for any possible private state. |
Luca Carminati; Federico Cacciamani; Marco Ciccone; Nicola Gatti; |
116 | RECAPP: Crafting A More Efficient Catalyst for Convex Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel Relaxed Error Criterion for Accelerated Proximal Point (RECAPP) that eliminates the need for high accuracy subproblem solutions. |
Yair Carmon; Arun Jambulapati; Yujia Jin; Aaron Sidford; |
117 | Estimating and Penalizing Induced Preference Shifts in Recommender Systems Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We focus on induced preference shifts in users. |
Micah D Carroll; Anca Dragan; Stuart Russell; Dylan Hadfield-Menell; |
118 | YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. |
Edresson Casanova; Julian Weber; Christopher D Shulby; Arnaldo Candido Junior; Eren Gölge; Moacir A Ponti; |
119 | The Infinite Contextual Graph Markov Model Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: As with most Deep Graph Networks, an inherent limitation is the need to perform an extensive model selection to choose the proper size of each layer’s latent representation. In this paper, we address this problem by introducing the Infinite Contextual Graph Markov Model (iCGMM), the first deep Bayesian nonparametric model for graph learning. |
Daniele Castellana; Federico Errica; Davide Bacciu; Alessio Micheli; |
120 | Compressed-VFL: Communication-Efficient Learning with Vertically Partitioned Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose Compressed Vertical Federated Learning (C-VFL) for communication-efficient training on vertically partitioned data. |
Timothy J Castiglia; Anirban Das; Shiqiang Wang; Stacy Patterson; |
121 | Online Learning with Knapsacks: The Best of Both Worlds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$ resource constraints. |
Matteo Castiglioni; Andrea Celli; Christian Kroer; |
122 | Stabilizing Off-Policy Deep Reinforcement Learning from Pixels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As a result, many successful algorithms must combine different domain-specific practices and auxiliary losses to learn meaningful behaviors in complex environments. In this work, we provide novel analysis demonstrating that these instabilities arise from performing temporal-difference learning with a convolutional encoder and low-magnitude rewards. |
Edoardo Cetin; Philip J Ball; Stephen Roberts; Oya Celiktutan; |
123 | Accelerated, Optimal and Parallel: Some Results on Model-based Stochastic Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an acceleration scheme for the APROX family and provide non-asymptotic convergence guarantees, which are order-optimal in all problem-dependent constants and provide even larger minibatching speedups. |
Karan Chadha; Gary Cheng; John Duchi; |
124 | Robust Imitation Learning Against Variations in Environment Dynamics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a robust imitation learning (IL) framework that improves the robustness of IL when environment dynamics are perturbed. |
Jongseong Chae; Seungyul Han; Whiyoung Jung; Myungsik Cho; Sungho Choi; Youngchul Sung; |
125 | Fairness with Adaptive Weights Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel adaptive reweighing method to address representation bias. |
Junyi Chai; Xiaoqian Wang; |
126 | UNIREX: A Unified Learning Framework for Language Model Rationale Extraction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although attribution algorithms and select-predict pipelines are commonly used in rationale extraction, they both rely on certain heuristics that hinder them from satisfying all three desiderata. In light of this, we propose UNIREX, a flexible learning framework which generalizes rationale extractor optimization as follows: (1) specify architecture for a learned rationale extractor; (2) select explainability objectives (i.e., faithfulness and plausibility criteria); and (3) jointly train the task model and rationale extractor on the task using selected objectives. |
Aaron Chan; Maziar Sanjabi; Lambert Mathias; Liang Tan; Shaoliang Nie; Xiaochang Peng; Xiang Ren; Hamed Firooz; |
127 | Revisiting Label Smoothing and Knowledge Distillation Compatibility: What Was Missing? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The main contributions of our work are the discovery, analysis and validation of systematic diffusion as the missing concept which is instrumental in understanding and resolving these contradictory findings. |
Keshigeyan Chandrasegaran; Ngoc-Trung Tran; Yunqing Zhao; Ngai-Man Cheung; |
128 | Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we tackle the training-inference mismatch encountered during unsupervised learning of controllable generative sequence models. |
Jen-Hao Rick Chang; Ashish Shrivastava; Hema Koppula; Xiaoshuai Zhang; Oncel Tuzel; |
129 | Learning Bellman Complete Representations for Offline Policy Evaluation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose BCRL, which directly learns from data an approximately linear Bellman complete representation with good coverage. |
Jonathan Chang; Kaiwen Wang; Nathan Kallus; Wen Sun; |
130 | Sample Efficient Learning of Predictors That Complement Humans Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we provide the first theoretical analysis of the benefit of learning complementary predictors in expert deferral. |
Mohammad-Amin Charusaie; Hussein Mozannar; David Sontag; Samira Samadi; |
131 | Nystrom Kernel Mean Embeddings Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an efficient approximation procedure based on the Nystr{ö}m method, which exploits a small random subset of the dataset. |
Antoine Chatalic; Nicolas Schreuder; Lorenzo Rosasco; Alessandro Rudi; |
132 | Coarsening The Granularity: Towards Structurally Sparse Lottery Tickets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we demonstrate the first positive result that a structurally sparse winning ticket can be effectively found in general. |
Tianlong Chen; Xuxi Chen; Xiaolong Ma; Yanzhi Wang; Zhangyang Wang; |
133 | Learning Domain Adaptive Object Detection with Probabilistic Teacher Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a simple yet effective framework, termed as Probabilistic Teacher (PT), which aims to capture the uncertainty of unlabeled target data from a gradually evolving teacher and guides the learning of a student in a mutually beneficial manner. |
Meilin Chen; Weijie Chen; Shicai Yang; Jie Song; Xinchao Wang; Lei Zhang; Yunfeng Yan; Donglian Qi; Yueting Zhuang; Di Xie; Shiliang Pu; |
134 | The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We consider the problem of training a $d$ dimensional model with distributed differential privacy (DP) where secure aggregation (SecAgg) is used to ensure that the server only sees the noisy sum of $n$ model updates in every training round. |
Wei-Ning Chen; Christopher A Choquette Choo; Peter Kairouz; Ananda Theertha Suresh; |
135 | Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We argue that creating spread alone is insufficient for better representations, since spread is invariant to permutations within classes. |
Mayee Chen; Daniel Y Fu; Avanika Narayan; Michael Zhang; Zhao Song; Kayvon Fatahalian; Christopher Re; |
136 | Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We investigate a natural but surprisingly unstudied approach to the multi-armed bandit problem under safety risk constraints. |
Tianrui Chen; Aditya Gangrade; Venkatesh Saligrama; |
137 | On The Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we extend the uniform-PAC sample complexity from episodic setting to the infinite-horizon discounted setting, and propose a novel algorithm dubbed UPAC-UCLK that achieves an $\tilde{O}\big(d^2/((1-\gamma)^4\epsilon^2)+1/((1-\gamma)^6\epsilon^2)\big)$ uniform-PAC sample complexity, where $d$ is the dimension of the feature mapping, $\gamma \in(0,1)$ is the discount factor of the MDP and $\epsilon$ is the accuracy parameter. |
Yuanzhou Chen; Jiafan He; Quanquan Gu; |
138 | Streaming Algorithms for Support-Aware Histograms Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: As a result, even relatively simple distributions cannot be approximated by succinct histograms without incurring large error. In this paper, we address this issue by adapting the definition of approximation so that only the errors of the items that belong to the support of the distribution are considered. |
Justin Chen; Piotr Indyk; Tal Wagner; |
139 | Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem with a linear MDP that significantly improve over the only existing results of (Vial et al., 2021). |
Liyu Chen; Rahul Jain; Haipeng Luo; |
140 | Learning Infinite-horizon Average-reward Markov Decision Process with Constraints Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study regret minimization for infinite-horizon average-reward Markov Decision Processes (MDPs) under cost constraints. |
Liyu Chen; Rahul Jain; Haipeng Luo; |
141 | Active Multi-Task Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance. |
Yifang Chen; Kevin Jamieson; Simon Du; |
142 | On Collective Robustness of Bagging Against Data Poisoning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on this analysis, we propose hash bagging to improve the robustness of vanilla bagging almost for free. |
Ruoxin Chen; Zenan Li; Jie Li; Junchi Yan; Chentao Wu; |
143 | Online Active Regression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The goal is to efficiently maintain the regression of received data points with a small budget of label queries. We propose novel algorithms for this problem under $\ell_p$ loss where $p\in[1,2]$. |
Cheng Chen; Yi Li; Yiming Sun; |
144 | Selling Data To A Machine Learner: Pricing Via Costly Signaling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We consider a new problem of selling data to a machine learner who looks to purchase data to train his machine learning model. |
Junjie Chen; Minming Li; Haifeng Xu; |
145 | ME-GAN: Learning Panoptic Electrocardio Representations for Multi-view ECG Synthesis Conditioned on Heart Diseases Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel disease-aware generative adversarial network for multi-view ECG synthesis called ME-GAN, which attains panoptic electrocardio representations conditioned on heart diseases and projects the representations onto multiple standard views to yield ECG signals. |
Jintai Chen; Kuanlun Liao; Kun Wei; Haochao Ying; Danny Z Chen; Jian Wu; |
146 | Weisfeiler-Lehman Meets Gromov-Wasserstein Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the Weisfeiler-Lehman (WL) distance, a notion of distance between labeled measure Markov chains (LMMCs), of which labeled graphs are special cases. |
Samantha Chen; Sunhyuk Lim; Facundo Memoli; Zhengchao Wan; Yusu Wang; |
147 | On Non-local Convergence Analysis of Deep Linear Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study the non-local convergence properties of deep linear networks. |
Kun Chen; Dachao Lin; Zhihua Zhang; |
148 | Flow-based Recurrent Belief State Learning for POMDPs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce the FlOw-based Recurrent BElief State model (FORBES), which incorporates normalizing flows into the variational inference to learn general continuous belief states for POMDPs. |
Xiaoyu Chen; Yao Mark Mu; Ping Luo; Shengbo Li; Jianyu Chen; |
149 | Structure-Aware Transformer for Graph Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose several methods for automatically generating the subgraph representation and show theoretically that the resulting representations are at least as expressive as the subgraph representations. |
Dexiong Chen; Leslie O'Bray; Karsten Borgwardt; |
150 | The Poisson Binomial Mechanism for Unbiased Federated Learning with Secure Aggregation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce the Poisson Binomial mechanism (PBM), a discrete differential privacy mechanism for distributed mean estimation (DME) with applications to federated learning and analytics. |
Wei-Ning Chen; Ayfer Ozgur; Peter Kairouz; |
151 | Learning Mixtures of Linear Dynamical Systems Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the problem of learning a mixture of multiple linear dynamical systems (LDSs) from unlabeled short sample trajectories, each generated by one of the LDS models. |
Yanxi Chen; H. Vincent Poor; |
152 | On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions. |
Xiaohong Chen; Zhengling Qi; |
153 | Faster Fundamental Graph Algorithms Via Learned Predictions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We consider the question of speeding up classic graph algorithms with machine-learned predictions. |
Justin Chen; Sandeep Silwal; Ali Vakilian; Fred Zhang; |
154 | Improve Single-Point Zeroth-Order Optimization Using High-Pass and Low-Pass Filters Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we borrow the idea of high-pass and low-pass filters from extremum seeking control (continuous-time version of SZO) and develop a novel SZO method called HLF-SZO by integrating these filters. |
Xin Chen; Yujie Tang; Na Li; |
155 | Deep Variational Graph Convolutional Recurrent Network for Multivariate Time Series Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we model sensor dependency and stochasticity within MTS by developing an embedding-guided probabilistic generative network. |
Wenchao Chen; Long Tian; Bo Chen; Liang Dai; Zhibin Duan; Mingyuan Zhou; |
156 | Auxiliary Learning with Joint Task and Data Scheduling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose to learn a joint task and data schedule for auxiliary learning, which captures the importance of different data samples in each auxiliary task to the target task. |
Hong Chen; Xin Wang; Chaoyu Guan; Yue Liu; Wenwu Zhu; |
157 | Optimization-Induced Graph Implicit Nonlinear Diffusion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Due to the over-smoothing issue, most existing graph neural networks can only capture limited dependencies with their inherently finite aggregation layers. To overcome this limitation, we propose a new kind of graph convolution, called Graph Implicit Nonlinear Diffusion (GIND), which implicitly has access to infinite hops of neighbors while adaptively aggregating features with nonlinear diffusion to prevent over-smoothing. |
Qi Chen; Yifei Wang; Yisen Wang; Jiansheng Yang; Zhouchen Lin; |
158 | Robust Meta-learning with Sampling Noise and Label Noise Via Eigen-Reptile Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Besides, when handling the data with noisy labels, the meta-learner could be extremely sensitive to label noise on a corrupted dataset. To address these two challenges, we present Eigen-Reptile (ER) that updates the meta-parameters with the main direction of historical task-specific parameters. |
Dong Chen; Lingfei Wu; Siliang Tang; Xiao Yun; Bo Long; Yueting Zhuang; |
159 | Adaptive Model Design for Markov Decision Process Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Hence, appropriate regulations are often required, if we hope to take the external costs/benefits of its actions into consideration. In this paper, we study how to regulate such an agent by redesigning model parameters that can affect the rewards and/or the transition kernels. |
Siyu Chen; Donglin Yang; Jiayang Li; Senmiao Wang; Zhuoran Yang; Zhaoran Wang; |
160 | State Transition of Dendritic Spines Improves Learning of Sparse Spiking Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by the state transition of dendritic spines in the filopodial model of spinogenesis, we model different states of SNN weights, facilitating weight optimization for pruning. |
Yanqi Chen; Zhaofei Yu; Wei Fang; Zhengyu Ma; Tiejun Huang; Yonghong Tian; |
161 | Efficient Online ML API Selection for Multi-Label Classification Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose FrugalMCT, a principled framework that adaptively selects the APIs to use for different data in an online fashion while respecting the user’s budget. |
Lingjiao Chen; Matei Zaharia; James Zou; |
162 | Data-Efficient Double-Win Lottery Tickets from Robust Pre-training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we formulate a more rigorous concept, Double-Win Lottery Tickets, in which a located subnetwork from a pre-trained model can be independently transferred on diverse downstream tasks, to reach BOTH the same standard and robust generalization, under BOTH standard and adversarial training regimes, as the full pre-trained model can do. |
Tianlong Chen; Zhenyu Zhang; Sijia Liu; Yang Zhang; Shiyu Chang; Zhangyang Wang; |
163 | Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To trade off the DNN expressiveness (which calls for more non-linearity) and robustness certification scalability (which prefers more linearity), we propose a novel solution to strategically manipulate neurons, by "grafting" appropriate levels of linearity. |
Tianlong Chen; Huan Zhang; Zhenyu Zhang; Shiyu Chang; Sijia Liu; Pin-Yu Chen; Zhangyang Wang; |
164 | Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the first optimistic model-based algorithm for PbRL with general function approximation, which estimates the model using value-targeted regression and calculates the exploratory policies by solving an optimistic planning problem. |
Xiaoyu Chen; Han Zhong; Zhuoran Yang; Zhaoran Wang; Liwei Wang; |
165 | Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we develop decentralized AC and natural AC (NAC) algorithms that avoid sharing agents’ local information and are sample and communication-efficient. |
Ziyi Chen; Yi Zhou; Rong-Rong Chen; Shaofeng Zou; |
166 | Task-aware Privacy Preservation for Multi-dimensional Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we address how to significantly improve the ultimate task performance with multi-dimensional user data by considering a task-aware privacy preservation problem. |
Jiangnan Cheng; Ao Tang; Sandeep Chinchali; |
167 | Adversarially Trained Actor Critic for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism. |
Ching-An Cheng; Tengyang Xie; Nan Jiang; Alekh Agarwal; |
168 | Quantum-Inspired Algorithms from Randomized Numerical Linear Algebra Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We create classical (non-quantum) dynamic data structures supporting queries for recommender systems and least-squares regression that are comparable to their quantum analogues. |
Nadiia Chepurko; Kenneth Clarkson; Lior Horesh; Honghao Lin; David Woodruff; |
169 | RieszNet and ForestRiesz: Automatic Debiased Machine Learning with Neural Nets and Random Forests Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a multitasking Neural Net debiasing method with stochastic gradient descent minimization of a combined Riesz representer and regression loss, while sharing representation layers for the two functions. |
Victor Chernozhukov; Whitney Newey; Víctor M Quintas-Martínez; Vasilis Syrgkanis; |
170 | Self-supervised Learning with Random-projection Quantizer for Speech Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a simple and effective self-supervised learning approach for speech recognition. |
Chung-Cheng Chiu; James Qin; Yu Zhang; Jiahui Yu; Yonghui Wu; |
171 | Discrete Probabilistic Inverse Optimal Transport Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We formalize and systematically analyze the properties of IOT using tools from the study of entropy-regularized OT. |
Wei-Ting Chiu; Pei Wang; Patrick Shafto; |
172 | Selective Network Linearization for Efficient Private Inference Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To reduce PI latency we propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy. |
Minsu Cho; Ameya Joshi; Brandon Reagen; Siddharth Garg; Chinmay Hegde; |
173 | From Block-Toeplitz Matrices to Differential Equations on Graphs: Towards A General Theory for Scalable Masked Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformers architectures in a scalable way. |
Krzysztof Choromanski; Han Lin; Haoxian Chen; Tianyi Zhang; Arijit Sehanobish; Valerii Likhosherstov; Jack Parker-Holder; Tamas Sarlos; Adrian Weller; Thomas Weingarten; |
174 | Shuffle Private Linear Contextual Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a general algorithmic framework for linear contextual bandits under the shuffle trust model, where there exists a trusted shuffler – in between users and the central server – that randomly permutes a batch of users data before sending those to the server. |
Sayak Ray Chowdhury; Xingyu Zhou; |
175 | DNA: Domain Generalization with Diversified Neural Averaging Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Methodologically, we propose a diversified neural averaging (DNA) method for DG, which optimizes the proposed PAC-Bayes bound approximately. |
Xu Chu; Yujie Jin; Wenwu Zhu; Yasha Wang; Xin Wang; Shanghang Zhang; Hong Mei; |
176 | TPC: Transformation-Specific Smoothing for Point Cloud Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a transformation-specific smoothing framework TPC, which provides tight and scalable robustness guarantees for point cloud models against semantic transformation attacks. |
Wenda Chu; Linyi Li; Bo Li; |
177 | Unified Scaling Laws for Routed Language Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: For these models, parameter count and computational requirement form two independent axes along which an increase leads to better performance. In this work we derive and justify scaling laws defined on these two variables which generalize those known for standard language models and describe the performance of a wide range of routing architectures trained via three different techniques. |
Aidan Clark; Diego De Las Casas; Aurelia Guy; Arthur Mensch; Michela Paganini; Jordan Hoffmann; Bogdan Damoc; Blake Hechtman; Trevor Cai; Sebastian Borgeaud; George Bm Van Den Driessche; Eliza Rutherford; Tom Hennigan; Matthew J Johnson; Albin Cassirer; Chris Jones; Elena Buchatskaya; David Budden; Laurent Sifre; Simon Osindero; Oriol Vinyals; Marc'Aurelio Ranzato; Jack Rae; Erich Elsen; Koray Kavukcuoglu; Karen Simonyan; |
178 | Context-Aware Drift Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Instead we may wish to test for differences in the distributions conditional on context that is permitted to change. To facilitate this we borrow machinery from the causal inference domain to develop a more general drift detection framework built upon a foundation of two-sample tests for conditional distributional treatment effects. |
Oliver Cobb; Arnaud Van Looveren; |
179 | On The Robustness of CountSketch to Adaptive Inputs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a robust estimator (for a slightly modified sketch) that allows for quadratic number of queries in the sketch size, which is an improvement factor of $\sqrt{k}$ (for $k$ heavy hitters) over prior "blackbox" approaches. |
Edith Cohen; Xin Lyu; Jelani Nelson; Tamas Sarlos; Moshe Shechner; Uri Stemmer; |
180 | Diffusion Bridges Vector Quantized Variational Autoencoders Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a new model to train the prior and the encoder/decoder networks simultaneously. |
Max Cohen; Guillaume Quispe; Sylvain Le Corff; Charles Ollion; Eric Moulines; |
181 | Online and Consistent Correlation Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we study the problem in the classic online setting with recourse: the vertices of the graphs arrive in an online manner and the goal is to maintain an approximate clustering while minimizing the number of times each vertex changes cluster. |
Vincent Cohen-Addad; Silvio Lattanzi; Andreas Maggiori; Nikos Parotsidis; |
182 | Massively Parallel $k$-Means Clustering for Perturbation Resilient Instances Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We consider $k$-means clustering of $n$ data points in Euclidean space in the Massively Parallel Computation (MPC) model, a computational model which is an abstraction of modern massively parallel computing systems such as MapReduce. |
Vincent Cohen-Addad; Vahab Mirrokni; Peilin Zhong; |
183 | One-Pass Diversified Sampling with Application to Terabyte-Scale Genomic Sequence Streams Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an efficient sampling routine that uses an online representation of the data distribution as a prefilter to retain elements from rare groups. |
Benjamin Coleman; Benito Geordie; Li Chou; R. A. Leo Elworth; Todd Treangen; Anshumali Shrivastava; |
184 | Transfer and Marginalize: Explaining Away Label Noise with Privileged Information Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We develop a simple and efficient method for supervised learning with neural networks: it transfers via weight sharing the knowledge learned with privileged information and approximately marginalizes over privileged information at test time. |
Mark Collier; Rodolphe Jenatton; Effrosyni Kokiopoulou; Jesse Berent; |
185 | MAML and ANIL Provably Learn Representations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we prove that two well-known GBML methods, MAML and ANIL, as well as their first-order approximations, are capable of learning common representation among a set of given tasks. |
Liam Collins; Aryan Mokhtari; Sewoong Oh; Sanjay Shakkottai; |
186 | Entropic Causal Inference: Graph Identifiability Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In our work, we first extend the causal graph identifiability result in the two-variable setting under relaxed assumptions. We then show the first identifiability result using the entropic approach for learning causal graphs with more than two nodes. |
Spencer Compton; Kristjan Greenewald; Dmitriy A Katz; Murat Kocaoglu; |
187 | Mitigating Gender Bias in Face Recognition Using The Von Mises-Fisher Mixture Model Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we investigate the gender bias of deep Face Recognition networks. |
Jean-Rémy Conti; Nathan Noiry; Stephan Clemencon; Vincent Despiegel; Stéphane Gentric; |
188 | Counterfactual Transportability: A Formal Approach Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we investigate the transportability of counterfactuals from an arbitrary combination of observational and experimental distributions coming from disparate domains. |
Juan D Correa; Sanghack Lee; Elias Bareinboim; |
189 | Label-Free Explainability for Unsupervised Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Hence, choosing which component(s) to interpret in a label-free unsupervised/self-supervised setting is an important, yet unsolved problem. To bridge this gap in the literature, we introduce two crucial extensions of post-hoc explanation techniques: (1) label-free feature importance and (2) label-free example importance that respectively highlight influential features and training examples for a black-box to construct representations at inference time. |
Jonathan Crabbé; Mihaela van der Schaar; |
190 | Evaluating The Adversarial Robustness of Adaptive Test-time Defenses Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While these results are disappointing, we still believe that adaptive test-time defenses are a promising avenue of research and, as such, we provide recommendations for their thorough evaluation. |
Francesco Croce; Sven Gowal; Thomas Brunner; Evan Shelhamer; Matthias Hein; Taylan Cemgil; |
191 | Adversarial Robustness Against Multiple and Single $l_p$-Threat Models Via Quick Fine-Tuning of Robust Classifiers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we propose Extreme norm Adversarial Training (E-AT) for multiple-norm robustness which is based on geometric properties of $l_p$-balls. |
Francesco Croce; Matthias Hein; |
192 | Self-conditioning Pre-Trained Language Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we aim to investigate the mechanisms that guide text generation with pre-trained Transformer-based Language Models (TLMs). |
Xavier Suau Cuadros; Luca Zappella; Nicholas Apostoloff; |
193 | Only Tails Matter: Average-Case Universality and Robustness in The Convex Regime Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work shows that the concentration of eigenvalues near the edges of the ESD determines a problem’s asymptotic average complexity. |
Leonardo Cunha; Gauthier Gidel; Fabian Pedregosa; Damien Scieur; Courtney Paquette; |
194 | Principal Component Flows Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we characterize the geometric structure of flows using principal manifolds and understand the relationship between latent variables and samples using contours. |
Edmond Cunningham; Adam D Cobb; Susmit Jha; |
195 | Deep Symbolic Regression for Recurrence Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we train Transformers to infer the function or recurrence relation underlying sequences of integers or floats, a typical task in human IQ tests which has hardly been tackled in the machine learning literature. |
Stéphane d'Ascoli; Pierre-Alexandre Kamienny; Guillaume Lample; Francois Charton; |
196 | Continuous Control with Action Quantization from Demonstrations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel Reinforcement Learning (RL) framework for problems with continuous action spaces: Action Quantization from Demonstrations (AQuaDem). |
Robert Dadashi; Léonard Hussenot; Damien Vincent; Sertan Girgin; Anton Raichuk; Matthieu Geist; Olivier Pietquin; |
197 | Dialog Inpainting: Turning Documents Into Dialogs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, conversational question answering (ConvQA) systems have long been stymied by scarce training data that is expensive to collect. To address this problem, we propose a new technique for synthetically generating diverse and high-quality dialog data: dialog inpainting. |
Zhuyun Dai; Arun Tejasvi Chaganty; Vincent Y Zhao; Aida Amini; Qazi Mamunur Rashid; Mike Green; Kelvin Guu; |
198 | DisPFL: Towards Communication-Efficient Personalized Federated Learning Via Decentralized Sparse Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel personalized federated learning framework in a decentralized (peer-to-peer) communication protocol named DisPFL, which employs personalized sparse masks to customize sparse local models on the edge. |
Rong Dai; Li Shen; Fengxiang He; Xinmei Tian; Dacheng Tao; |
199 | Marginal Distribution Adaptation for Discrete Sets Via Module-Oriented Divergence Minimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We develop a general framework to adapt a generative model subject to a (possibly counterfactual) target data distribution with both sampling and computation efficiency. |
Hanjun Dai; Mengjiao Yang; Yuan Xue; Dale Schuurmans; Bo Dai; |
200 | Balancing Sample Efficiency and Suboptimality in Inverse Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel formulation for the Inverse Reinforcement Learning (IRL) problem, which jointly accounts for the compatibility with the expert behavior of the identified reward and its effectiveness for the subsequent forward learning phase. |
Angelo Damiani; Giorgio Manganini; Alberto Maria Metelli; Marcello Restelli; |
201 | Understanding Robust Generalization in Learning Regular Languages Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We hypothesize that standard end-to-end modeling strategies cannot generalize well to systematic distribution shifts and propose a compositional strategy to address this. |
Soham Dan; Osbert Bastani; Dan Roth; |
202 | Unsupervised Image Representation Learning with Deep Latent Particles Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new representation of visual data that disentangles object position from appearance. |
Tal Daniel; Aviv Tamar; |
203 | Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: These crucial questions have been scarcely investigated, despite the prominent practical importance of these policies. This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration. |
Chris Dann; Yishay Mansour; Mehryar Mohri; Ayush Sekhari; Karthik Sridharan; |
204 | Monarch: Expressive Structured Matrices for Efficient and Accurate Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: These methods have not seen widespread adoption (1) in end-to-end training due to unfavorable efficiency–quality tradeoffs, and (2) in dense-to-sparse fine-tuning due to lack of tractable algorithms to approximate a given dense weight matrix. To address these issues, we propose a class of matrices (Monarch) that is hardware-efficient (they are parameterized as products of two block-diagonal matrices for better hardware utilization) and expressive (they can represent many commonly used transforms). |
Tri Dao; Beidi Chen; Nimit S Sohoni; Arjun Desai; Michael Poli; Jessica Grogan; Alexander Liu; Aniruddh Rao; Atri Rudra; Christopher Re; |
205 | Score-Guided Intermediate Level Optimization: Fast Langevin Mixing for Inverse Problems Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In practice, to allow for increased expressivity, we propose to do posterior sampling in the latent space of a pre-trained generative model. |
Giannis Daras; Yuval Dagan; Alex Dimakis; Constantinos Daskalakis; |
206 | Test-Time Training Can Close The Natural Distribution Shift Performance Gap in Deep Learning Based Compressed Sensing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a domain adaptation method for deep learning based compressive sensing that relies on self-supervision during training paired with test-time training at inference. |
Mohammad Zalbagi Darestani; Jiayu Liu; Reinhard Heckel; |
207 | Knowledge Base Question Answering By Case-based Reasoning Over Subgraphs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Leveraging this structural similarity between local neighborhoods of different subgraphs, we introduce a semiparametric model (CBR-SUBG) with (i) a nonparametric component that for each query, dynamically retrieves other similar $k$-nearest neighbor (KNN) training queries along with query-specific subgraphs and (ii) a parametric component that is trained to identify the (latent) reasoning patterns from the subgraphs of KNN queries and then apply them to the subgraph of the target query. |
Rajarshi Das; Ameya Godbole; Ankita Naik; Elliot Tower; Manzil Zaheer; Hannaneh Hajishirzi; Robin Jia; Andrew Mccallum; |
208 | Framework for Evaluating Faithfulness of Local Explanations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the faithfulness of an explanation system to the underlying prediction model. |
Sanjoy Dasgupta; Nave Frost; Michal Moshkovitz; |
209 | Distinguishing Rule and Exemplar-based Generalization in Learning Systems Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The trade-off between exemplar- and rule-based generalization has been studied extensively in cognitive psychology; in this work, we present a protocol inspired by these experimental approaches to probe the inductive biases that control this trade-off in category-learning systems such as artificial neural networks. |
Ishita Dasgupta; Erin Grant; Tom Griffiths; |
210 | Robust Multi-Objective Bayesian Optimization Under Input Noise Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Since directly optimizing MVaR is computationally infeasible in many settings, we propose a scalable, theoretically-grounded approach for optimizing MVaR using random scalarizations. |
Samuel Daulton; Sait Cakmak; Maximilian Balandat; Michael A. Osborne; Enlu Zhou; Eytan Bakshy; |
211 | Attentional Meta-learners for Few-shot Polythetic Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, we find that in the presence of task-irrelevant features, inherent to meta-learning problems, attentional models are susceptible to misclassification. To address this challenge, we propose a self-attention feature-selection mechanism that adaptively dilutes non-discriminative features. |
Ben J Day; Ramon Viñas Torné; Nikola Simidjievski; Pietro Liò; |
212 | Adversarial Vulnerability of Randomized Ensembles Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, this impressive performance raises the question: Are these robustness gains provided by randomized ensembles real? In this work we address this question both theoretically and empirically. |
Hassan Dbouk; Naresh Shanbhag; |
213 | Born-Infeld (BI) for AI: Energy-Conserving Descent (ECD) for Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a novel framework for optimization based on energy-conserving Hamiltonian dynamics in a strongly mixing (chaotic) regime and establish its key properties analytically and numerically. |
Giuseppe Bruno De Luca; Eva Silverstein; |
214 | Error-driven Input Modulation: Solving The Credit Assignment Problem Without A Backward Pass Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here, we propose to replace the backward pass with a second forward pass in which the input signal is modulated based on the error of the network. |
Giorgia Dellaferrera; Gabriel Kreiman; |
215 | DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a reconstruction-free MBRL agent, called DreamerPro, that can enhance robustness to distractions. |
Fei Deng; Ingook Jang; Sungjin Ahn; |
216 | NeuralEF: Deconstructing Kernels By Deep Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the existing method relies on an expensive orthogonalization step and is difficult to implement. We show that these problems can be fixed by using a new series of objective functions that generalizes the EigenGame to function space. |
Zhijie Deng; Jiaxin Shi; Jun Zhu; |
217 | Deep Causal Metric Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, this can lead the model to recklessly learn all the correlated distances found in training data including the spurious distance (e.g., background differences) that is not the distance of interest and can harm the generalization of the learned metric. To address this issue, we study metric learning from a causality perspective and accordingly propose deep causal metric learning (DCML) that pursues the true causality of the distance between samples. |
Xiang Deng; Zhongfei Zhang; |
218 | On The Convergence of Inexact Predictor-Corrector Methods for Linear Programming Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To remedy this, we theoretically and empirically analyze (slightly modified) predictor-corrector IPMs when using approximate linear solvers: our approach guarantees that, when certain conditions are satisfied, the number of IPM iterations does not increase and that the final solution remains feasible. |
Gregory Dexter; Agniva Chowdhury; Haim Avron; Petros Drineas; |
219 | Analysis of Stochastic Processes Through Replay Buffers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we analyze a system where a stochastic process X is pushed into a replay buffer and then randomly sampled to generate a stochastic process Y from the replay buffer. |
Shirli Di-Castro; Shie Mannor; Dotan Di Castro; |
220 | Streaming Algorithms for High-Dimensional Robust Statistics Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements (up to logarithmic factors). |
Ilias Diakonikolas; Daniel M. Kane; Ankit Pensia; Thanasis Pittas; |
221 | Learning General Halfspaces with Adversarial Label Noise Via Online Gradient Descent Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we show that the problem can be solved directly via online gradient descent applied to a sequence of natural non-convex surrogates. |
Ilias Diakonikolas; Vasilis Kontonis; Christos Tzamos; Nikos Zarifis; |
222 | Variational Feature Pyramid Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we opt to learn a dataset-specific architecture for Feature Pyramid Networks. |
Panagiotis Dimitrakopoulos; Giorgos Sfikas; Christophoros Nikou; |
223 | Understanding Doubly Stochastic Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the analysis of why this projection improves clustering has been limited. In this paper we present theoretical conditions on the given affinity matrix under which its doubly stochastic projection is an ideal affinity matrix (i.e., it has no false connections between clusters, and is well-connected within each cluster). |
Tianjiao Ding; Derek Lim; Rene Vidal; Benjamin D Haeffele; |
224 | Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To learn a Nash equilibrium of an MPG in which the size of state space and/or the number of players can be very large, we propose new independent policy gradient algorithms that are run by all players in tandem. |
Dongsheng Ding; Chen-Yu Wei; Kaiqing Zhang; Mihailo Jovanovic; |
225 | Generalization and Robustness Implications in Object-Centric Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we train state-of-the-art unsupervised models on five common multi-object datasets and evaluate segmentation metrics and downstream object property prediction. |
Andrea Dittadi; Samuele S Papa; Michele De Vita; Bernhard Schölkopf; Ole Winther; Francesco Locatello; |
226 | Fair Generalized Linear Models with A Convex Penalty Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we introduce two fairness criteria for GLMs based on equalizing expected outcomes or log-likelihoods. |
Hyungrok Do; Preston Putzel; Axel S Martin; Padhraic Smyth; Judy Zhong; |
227 | Bayesian Learning with Information Gain Provably Bounds Risk for A Robust Adversarial Defense Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a new algorithm to learn a deep neural network model robust against adversarial attacks. |
Bao Gia Doan; Ehsan M Abbasnejad; Javen Qinfeng Shi; Damith Ranashinghe; |
228 | On The Adversarial Robustness of Causal Algorithmic Recourse Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we formulate the adversarially robust recourse problem and show that recourse methods that offer minimally costly recourse fail to be robust. |
Ricardo Dominguez-Olmedo; Amir H Karimi; Bernhard Schölkopf; |
229 | Finding The Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present an adaptive-mapping quantization method to learn an optimal latent sub-distribution that is inherent within models and smoothly approximated with a concrete Gaussian Mixture (GM). |
Runpei Dong; Zhanhong Tan; Mengdi Wu; Linfeng Zhang; Kaisheng Ma; |
230 | PACE: A Parallelizable Computation Encoder for Directed Acyclic Graphs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a Parallelizable Attention-based Computation structure Encoder (PACE) that processes nodes simultaneously and encodes DAGs in parallel. |
Zehao Dong; Muhan Zhang; Fuhai Li; Yixin Chen; |
231 | Privacy for Free: How Does Dataset Condensation Help Privacy? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we for the first time identify that dataset condensation (DC) which is originally designed for improving training efficiency is also a better solution to replace the traditional data generators for private data generation, thus providing privacy for free. |
Tian Dong; Bo Zhao; Lingjuan Lyu; |
232 | Fast Rates for Noisy Interpolation Require Rethinking The Effect of Inductive Bias Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Even though this intuition is valid for regularized models, in this paper we caution against a strong inductive bias for interpolation in the presence of noise: While a stronger inductive bias encourages a simpler structure that is more aligned with the ground truth, it also increases the detrimental effect of noise. |
Konstantin Donhauser; Nicolò Ruggeri; Stefan Stojanovic; Fanny Yang; |
233 | Adapting to Mixing Time in Stochastic Optimization with Markovian Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose the first optimization method that does not require the knowledge of the mixing time, yet obtains the optimal asymptotic convergence rate when applied to convex problems. |
Ron Dorfman; Kfir Yehuda Levy; |
234 | TACTiS: Transformer-Attentional Copulas for Time Series Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we address the problem of estimating the joint predictive distribution of high-dimensional multivariate time series. |
Alexandre Drouin; Étienne Marcotte; Nicolas Chapados; |
235 | Branching Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel Branching Reinforcement Learning (Branching RL) model, and investigate both Regret Minimization (RM) and Reward-Free Exploration (RFE) metrics for this model. |
Yihan Du; Wei Chen; |
236 | Bayesian Imitation Learning for End-to-End Mobile Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work we investigate and demonstrate benefits of a Bayesian approach to imitation learning from multiple sensor inputs, as applied to the task of opening office doors with a mobile manipulator. |
Yuqing Du; Daniel Ho; Alex Alemi; Eric Jang; Mohi Khansari; |
237 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose and develop a family of language models named GLaM (Generalist Language Model), which uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants. |
Nan Du; Yanping Huang; Andrew M Dai; Simon Tong; Dmitry Lepikhin; Yuanzhong Xu; Maxim Krikun; Yanqi Zhou; Adams Wei Yu; Orhan Firat; Barret Zoph; Liam Fedus; Maarten P Bosma; Zongwei Zhou; Tao Wang; Emma Wang; Kellie Webster; Marie Pellat; Kevin Robinson; Kathleen Meier-Hellstern; Toju Duke; Lucas Dixon; Kun Zhang; Quoc Le; Yonghui Wu; Zhifeng Chen; Claire Cui; |
238 | Learning Iterative Reasoning Through Energy Minimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a new framework for iterative reasoning with neural networks. |
Yilun Du; Shuang Li; Joshua Tenenbaum; Igor Mordatch; |
239 | SE(3) Equivariant Graph Neural Networks with Complete Local Frames Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a framework to construct SE(3) equivariant graph neural networks that can approximate the geometric quantities efficiently. |
Weitao Du; He Zhang; Yuanqi Du; Qi Meng; Wei Chen; Nanning Zheng; Bin Shao; Tie-Yan Liu; |
240 | A Context-Integrated Transformer-Based Neural Network for Auction Design Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, these works either focus on a fixed set of bidders and items, or restrict the auction to be symmetric. In this work, we overcome such limitations by factoring public contextual information of bidders and items into the auction learning framework. |
Zhijian Duan; Jingwu Tang; Yutong Yin; Zhe Feng; Xiang Yan; Manzil Zaheer; Xiaotie Deng; |
241 | Augment with Care: Contrastive Learning for Combinatorial Problems Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We find that label-preserving augmentations are critical for the success of contrastive pre-training. |
Haonan Duan; Pashootan Vaezipoor; Max B Paulus; Yangjun Ruan; Chris Maddison; |
242 | Parametric Visual Program Induction with Function Modularization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the concept of parametric visual program induction. |
Xuguang Duan; Xin Wang; Ziwei Zhang; Wenwu Zhu; |
243 | Bayesian Deep Embedding Topic Meta-Learner Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel framework that efficiently solves the problem of topic modeling under the small data regime. |
Zhibin Duan; Yishi Xu; Jianqiao Sun; Bo Chen; Wenchao Chen; Chaojie Wang; Mingyuan Zhou; |
244 | Deletion Robust Submodular Maximization Over Matroids Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we study the deletion robust version of the problem under the classic matroids constraint. |
Paul Duetting; Federico Fusco; Silvio Lattanzi; Ashkan Norouzi-Fard; Morteza Zadimoghaddam; |
245 | From Data to Functa: Your Data Point Is A Function and You Can Treat It Like One Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A powerful continuous alternative is then to represent these measurements using an implicit neural representation, a neural function trained to output the appropriate measurement value for any input spatial location. In this paper, we take this idea to its next level: what would it take to perform deep learning on these functions instead, treating them as data? |
Emilien Dupont; Hyunjik Kim; S. M. Ali Eslami; Danilo Jimenez Rezende; Dan Rosenbaum; |
246 | Efficient Low Rank Convex Bounds for Pairwise Discrete Graphical Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we extend a Burer-Monteiro style method to compute low rank Semi-Definite Programming (SDP) bounds for the MAP problem on discrete graphical models with an arbitrary number of states and arbitrary pairwise potentials. |
Valentin Durante; George Katsirelos; Thomas Schiex; |
247 | Robust Counterfactual Explanations for Tree-Based Ensembles Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel strategy – that we call RobX – to generate robust counterfactuals for tree-based ensembles, e.g., XGBoost. |
Sanghamitra Dutta; Jason Long; Saumitra Mishra; Cecilia Tilli; Daniele Magazzeni; |
248 | On The Difficulty of Defending Self-Supervised Learning Against Model Extraction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We thus explore model stealing attacks against SSL. |
Adam Dziedzic; Nikita Dhawan; Muhammad Ahmad Kaleem; Jonas Guan; Nicolas Papernot; |
249 | LIMO: Latent Inceptionism for Targeted Molecule Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Latent Inceptionism on Molecules (LIMO), which significantly accelerates molecule generation with an inceptionism-like technique. |
Peter Eckmann; Kunyang Sun; Bo Zhao; Mudong Feng; Michael Gilson; Rose Yu; |
250 | Inductive Biases and Variable Creation in Self-Attention Mechanisms Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To support our analysis, we present synthetic experiments to probe the sample complexity of learning sparse Boolean functions with Transformers. |
Benjamin L Edelman; Surbhi Goel; Sham Kakade; Cyril Zhang; |
251 | Provable Reinforcement Learning with A Short-Term Memory Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Motivated by the problem structure in several physical applications, as well as a commonly used technique known as "frame stacking", this paper proposes to study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length m. |
Yonathan Efroni; Chi Jin; Akshay Krishnamurthy; Sobhan Miryoosefi; |
252 | Sparsity in Partially Controllable Linear Systems Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In particular, our structural results characterize those state variables which are irrelevant for optimal control, an analysis which departs from classical control techniques. |
Yonathan Efroni; Sham Kakade; Akshay Krishnamurthy; Cyril Zhang; |
253 | FedNew: A Communication-Efficient and Privacy-Preserving Newton-Type Method for Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduced a novel framework called FedNew in which there is no need to transmit Hessian information from clients to PS, hence resolving the bottleneck to improve communication efficiency. |
Anis Elgabli; Chaouki Ben Issaid; Amrit Singh Bedi; Ketan Rajawat; Mehdi Bennis; Vaneet Aggarwal; |
254 | PathGCN: Learning General Graph Spatial Operators from Paths Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we propose pathGCN, a novel approach to learn the spatial operator from random paths on the graph. |
Moshe Eliasof; Eldad Haber; Eran Treister; |
255 | Discrete Tree Flows Via Tree-Structured Permutations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our approach seeks to reduce computational burden and remove the need for pseudo-gradients by developing a discrete flow based on decision trees—building upon the success of efficient tree-based methods for classification and regression for discrete data. |
Mai Elkady; Hyung Zin Lim; David I Inouye; |
256 | For Learning in Symmetric Teams, Local Optima Are Global Nash Equilibria Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium. |
Scott Emmons; Caspar Oesterheld; Andrew Critch; Vincent Conitzer; Stuart Russell; |
257 | Streaming Algorithm for Monotone K-Submodular Maximization with Cardinality Constraints Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we develop a new streaming algorithm for maximizing a monotone k-submodular function subject to a per-coordinate cardinality constraint attaining an approximation guarantee close to the state of the art guarantee in the offline setting. |
Alina Ene; Huy Nguyen; |
258 | Towards Scaling Difference Target Propagation By Learning Backprop Targets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored without sacrificing any theoretical guarantees. |
Maxence M Ernoult; Fabrice Normandin; Abhinav Moudgil; Sean Spinney; Eugene Belilovsky; Irina Rish; Blake Richards; Yoshua Bengio; |
259 | Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address these questions, we frame dataset difficulty, with respect to a model $\mathcal{V}$, as the lack of $\mathcal{V}$-usable information (Xu et al., 2019), where a lower value indicates a more difficult dataset for $\mathcal{V}$. |
Kawin Ethayarajh; Yejin Choi; Swabha Swayamdipta; |
260 | Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of the source model to train a classification head for the target-domain. |
Utku Evci; Vincent Dumoulin; Hugo Larochelle; Michael C Mozer; |
261 | Variational Sparse Coding with Learned Thresholding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a new approach to variational sparse coding that allows us to learn sparse distributions by thresholding samples, avoiding the use of problematic relaxations. |
Kion Fallah; Christopher J Rozell; |
262 | Training Discrete Deep Generative Models Via Gapped Straight-Through Estimator Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead. |
Ting-Han Fan; Ta-Chung Chi; Alexander I. Rudnicky; Peter J Ramadge; |
263 | DRIBO: Robust Deep Reinforcement Learning Via Multi-View Information Bottleneck Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, we introduce a novel contrastive version of the Multi-View Information Bottleneck (MIB) objective for temporal data. |
Jiameng Fan; Wenchao Li; |
264 | Generalized Data Distribution Iteration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To obtain higher sample efficiency and superior final performance simultaneously has been one of the major challenges for deep reinforcement learning (DRL). Previous work could handle one of these challenges but typically failed to address them concurrently. In this paper, we try to tackle these two challenges simultaneously. |
Jiajun Fan; Changnan Xiao; |
265 | Variational Wasserstein Gradient Flow Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper builds on the recent works with a slight but crucial difference: we propose to utilize a variational formulation of the objective function formulated as maximization over a parametric class of functions. |
Jiaojiao Fan; Qinsheng Zhang; Amirhossein Taghvaei; Yongxin Chen; |
266 | Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP) Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Since these language-image models differ from previous training approaches in several ways, an important question is what causes the large robustness gains. We answer this question via a systematic experimental investigation. |
Alex Fang; Gabriel Ilharco; Mitchell Wortsman; Yuhao Wan; Vaishaal Shankar; Achal Dave; Ludwig Schmidt; |
267 | Bayesian Continuous-Time Tucker Decomposition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: They either drop the timestamps or bin them into crude steps and hence ignore the temporal dynamics within each step or use simple parametric time coefficients. To overcome these limitations, we propose Bayesian Continuous-Time Tucker Decomposition. |
Shikai Fang; Akil Narayan; Robert Kirby; Shandian Zhe; |
268 | Byzantine Machine Learning Made Easy By Resilient Averaging of Momentums Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present RESAM (RESilient Averaging of Momentums), a unified framework that makes it simple to establish optimal Byzantine resilience, relying only on standard machine learning assumptions. |
Sadegh Farhadkhani; Rachid Guerraoui; Nirupam Gupta; Rafael Pinot; John Stephan; |
269 | An Equivalence Between Data Poisoning and Byzantine Gradient Attacks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we show a surprising equivalence between this model and data poisoning, a threat considered much more realistic. |
Sadegh Farhadkhani; Rachid Guerraoui; Lê Nguyên Hoang; Oscar Villemaud; |
270 | Investigating Generalization By Controlling Normalized Margin Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The paper finds that yes: in a standard training setup, test performance closely tracks normalized margin. The paper suggests a Gaussian process model as a promising explanation for this behavior. |
Alexander R Farhang; Jeremy D Bernstein; Kushal Tirumala; Yang Liu; Yisong Yue; |
271 | Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging The Gap Between Learning in Extensive-Form and Normal-Form Games Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we show that the Optimistic Multiplicative Weights Update (OMWU) algorithm—the premier learning algorithm for NFGs—can be simulated on the normal-form equivalent of an EFG in linear time per iteration in the game tree size using a kernel trick. |
Gabriele Farina; Chung-Wei Lee; Haipeng Luo; Christian Kroer; |
272 | Local Linear Convergence of Douglas-Rachford for Linear Programming: A Probabilistic Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we analyze the local linear convergence rate $r$ of the DRS method for random linear programs, and give explicit and tight bounds on $r$. |
Oisin Faust; Hamza Fawzi; |
273 | Matching Structure for Dual Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose to further enhance dual learning with structure matching that explicitly builds structural connections in between. |
Hao Fei; Shengqiong Wu; Yafeng Ren; Meishan Zhang; |
274 | Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. |
Yingjie Fei; Ruitu Xu; |
275 | Private Frequency Estimation Via Projective Geometry Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a new algorithm ProjectiveGeometryResponse (PGR) for locally differentially private (LDP) frequency estimation. |
Vitaly Feldman; Jelani Nelson; Huy Nguyen; Kunal Talwar; |
276 | An Intriguing Property of Geophysics Inversion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To alleviate those issues, recent studies leverage deep neural networks to learn the inversion mappings from measurements to the property directly. In this paper, we show that such a mapping can be well modeled by a very shallow (but not wide) network with only five layers. |
Yinan Feng; Yinpeng Chen; Shihang Feng; Peng Jin; Zicheng Liu; Youzuo Lin; |
277 | Principled Knowledge Extrapolation with GANs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose to study counterfactual synthesis from a new perspective of knowledge extrapolation, where a given knowledge dimension of the data distribution is extrapolated, but the remaining knowledge is kept indistinguishable from the original distribution. |
Ruili Feng; Jie Xiao; Kecheng Zheng; Deli Zhao; Jingren Zhou; Qibin Sun; Zheng-Jun Zha; |
278 | A Resilient Distributed Boosting Algorithm Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a distributed boosting algorithm which is resilient to a limited amount of noise. |
Yuval Filmus; Idan Mehalel; Shay Moran; |
279 | Model-Value Inconsistency As A Signal for Epistemic Uncertainty Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Using a model of the environment and a value function, an agent can construct many estimates of a state’s value, by unrolling the model for different lengths and bootstrapping with its value function. Our key insight is that one can treat this set of value estimates as a type of ensemble, which we call an implicit value ensemble (IVE). |
Angelos Filos; Eszter Vértes; Zita Marinho; Gregory Farquhar; Diana Borsa; Abram Friesen; Feryal Behbahani; Tom Schaul; Andre Barreto; Simon Osindero; |
280 | Coordinated Double Machine Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While this methodology is flexible and can accommodate arbitrary predictive models, typically trained independently of one another, this paper argues that a carefully coordinated learning algorithm for deep neural networks may reduce the estimation bias. |
Nitai Fingerhut; Matteo Sesia; Yaniv Romano; |
281 | Conformal Prediction Sets with Limited False Positives Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We develop a new approach to multi-label conformal prediction in which we aim to output a precise set of promising prediction candidates with a bounded number of incorrect answers. |
Adam Fisch; Tal Schuster; Tommi Jaakkola; Regina Barzilay; |
282 | Fast Population-Based Reinforcement Learning on A Single Machine Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we compare implementations and revisit previous studies to show that the judicious use of compilation and vectorization allows population-based training to be performed on a single machine with one accelerator with minimal overhead compared to training a single agent. |
Arthur Flajolet; Claire Bizon Monroc; Karim Beguir; Thomas Pierrot; |
283 | Fast Relative Entropy Coding with A* Coding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce AS* and AD* coding, two REC algorithms based on A* sampling. |
Gergely Flamich; Stratis Markou; Jose Miguel Hernandez-Lobato; |
284 | Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Adopting a Conditional VAE framework, we show that marginal independence between the representation and a condition variable plays a key role in both of these challenges. We propose the Contrastive Mixture of Posteriors (CoMP) method that uses a novel misalignment penalty defined in terms of mixtures of the variational posteriors to enforce this independence in latent space. |
Adam Foster; Arpi Vezer; Craig A. Glastonbury; Paidi Creed; Samer Abujudeh; Aaron Sim; |
285 | Label Ranking Through Nonparametric Regression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a generative model for Label Ranking, in noiseless and noisy nonparametric regression settings, and provide sample complexity bounds for learning algorithms in both cases. |
Dimitris Fotakis; Alkis Kalavasis; Eleni Psaroudaki; |
286 | A Neural Tangent Kernel Perspective of GANs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel theoretical framework of analysis for Generative Adversarial Networks (GANs). |
Jean-Yves Franceschi; Emmanuel De Bézenac; Ibrahim Ayed; Mickael Chen; Sylvain Lamprier; Patrick Gallinari; |
287 | Extracting Latent State Representations with Linear Dynamics from Rich Observations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We consider a setting where there is a hidden linear subspace of the high-dimensional feature space in which the dynamics are linear. |
Abraham Frandsen; Rong Ge; Holden Lee; |
288 | SPDY: Accurate Pruning with Speedup Guarantees Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Yet, most existing pruning methods minimize just the number of remaining weights, i.e. the size of the model, rather than optimizing for inference time. We address this gap by introducing SPDY, a new compression method which automatically determines layer-wise sparsity targets achieving a desired inference speedup on a given system, while minimizing accuracy loss. |
Elias Frantar; Dan Alistarh; |
289 | Revisiting The Effects of Stochasticity for Hamiltonian Samplers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We revisit the theoretical properties of Hamiltonian stochastic differential equations (SDES) for Bayesian posterior sampling, and we study the two types of errors that arise from numerical SDE simulation: the discretization error and the error due to noisy gradient estimates in the context of data subsampling. |
Giulio Franzese; Dimitrios Milios; Maurizio Filippone; Pietro Michiardi; |
290 | Bregman Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a framework based on bilevel optimization for learning multilayer, deep data representations. |
Jordan Frecon; Gilles Gasso; Massimiliano Pontil; Saverio Salzo; |
291 | (Non-)Convergence Results for Predictive Coding Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: One major open problem around PCNs is their convergence behavior. In this paper, we use dynamical systems theory to formally investigate the convergence of PCNs as they are used in machine learning. |
Simon Frieder; Thomas Lukasiewicz; |
292 | Scaling Structured Inference with Randomization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here, we propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states. |
Yao Fu; John Cunningham; Mirella Lapata; |
293 | Greedy When Sure and Conservative When Uncertain About The Opponents Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We develop a new approach, named Greedy when Sure and Conservative when Uncertain (GSCU), to competing online against unknown and nonstationary opponents. |
Haobo Fu; Ye Tian; Hongxiang Yu; Weiming Liu; Shuang Wu; Jiechao Xiong; Ying Wen; Kai Li; Junliang Xing; Qiang Fu; Wei Yang; |
294 | DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we open up a new compression paradigm for developing real-hardware efficient DNNs, leading to boosted hardware efficiency while maintaining model accuracy. |
Yonggan Fu; Haichuan Yang; Jiayi Yuan; Meng Li; Cheng Wan; Raghuraman Krishnamoorthi; Vikas Chandra; Yingyan Lin; |
295 | Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by our theoretical analysis, we present practical suggestions on implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors and empirically validate our findings on a variety of domains, ranging from the simplified matrix and grid-world games to complex benchmarks such as StarCraft Multi-Agent Challenge and Google Research Football. |
Wei Fu; Chao Yu; Zelai Xu; Jiaqi Yang; Yi Wu; |
296 | $p$-Laplacian Based Graph Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Moreover, when the topology is non-informative for label prediction, ordinary GNNs may work significantly worse than simply applying multi-layer perceptrons (MLPs) on each node. To tackle the above problem, we propose a new $p$-Laplacian based GNN model, termed as $^p$GNN, whose message passing mechanism is derived from a discrete regularization framework and could be theoretically explained as an approximation of a polynomial graph filter defined on the spectral domain of $p$-Laplacians. |
Guoji Fu; Peilin Zhao; Yatao Bian; |
297 | Why Should I Trust You, Bellman? The Bellman Error Is A Poor Replacement for Value Error Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. |
Scott Fujimoto; David Meger; Doina Precup; Ofir Nachum; Shixiang Shane Gu; |
298 | Robin Hood and Matthew Effects: Differential Privacy Has Disparate Impact on Synthetic Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We analyze the impact of DP on these models vis-a-vis underrepresented classes/subgroups of data, specifically, studying: 1) the size of classes/subgroups in the synthetic data, and 2) the accuracy of classification tasks run on them. |
Georgi Ganev; Bristena Oprisanu; Emiliano De Cristofaro; |
299 | The Complexity of K-Means Clustering When Little Is Known Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here, we study the complexity of k-means clustering in settings where most of the data is not known or simply irrelevant. |
Robert Ganian; Thekla Hamm; Viktoriia Korchemna; Karolina Okrasa; Kirill Simonov; |
300 | IDYNO: Learning Nonparametric DAGs from Interventional Dynamic Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new algorithm, IDYNO, to learn the DAG structure from potentially nonlinear times series data by using a continuous optimization framework that includes a recent formulation for continuous acyclicity constraint. |
Tian Gao; Debarun Bhattacharjya; Elliot Nelson; Miao Liu; Yue Yu; |
301 | Loss Function Learning for Domain Generalization By Implicit Gradient Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In particular, we introduce a novel meta-learning approach to loss function search based on implicit gradient. |
Boyan Gao; Henry Gouk; Yongxin Yang; Timothy Hospedales; |
302 | On The Convergence of Local Stochastic Compositional Gradient Descent with Momentum Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we developed a novel local stochastic compositional gradient descent with momentum method, which facilitates Federated Learning for the stochastic compositional problem. |
Hongchang Gao; Junyi Li; Heng Huang; |
303 | Deep Reference Priors: What Is The Best Way to Pretrain A Model? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents the first demonstration of reference priors for medium-scale deep networks and image-based data. |
Yansong Gao; Rahul Ramesh; Pratik Chaudhari; |
304 | On The Equivalence Between Temporal and Static Equivariant Graph Representations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work formalizes the associational task of predicting node attribute evolution in temporal graphs from the perspective of learning equivariant representations. |
Jianfei Gao; Bruno Ribeiro; |
305 | Generalizing Gaussian Smoothing for Random Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on an analysis of DFO for non-convex functions, we propose to choose a distribution for perturbations that minimizes the mean squared error (MSE) of the gradient estimate. |
Katelyn Gao; Ozan Sener; |
306 | Rethinking Image-Scaling Attacks: The Interplay Between Vulnerabilities in Machine Learning Systems Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we investigate the interplay between vulnerabilities of the image scaling procedure and machine learning models in the decision-based black-box setting. |
Yue Gao; Ilia Shumailov; Kassem Fawaz; |
307 | Lazy Estimation of Variable Importance for Large Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a fast and flexible method for approximating the reduced model with important inferential guarantees. |
Yue Gao; Abby Stevens; Garvesh Raskutti; Rebecca Willett; |
308 | Fast and Reliable Evaluation of Adversarial Robustness with Minimum-Margin Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel method, minimum-margin (MM) attack, to fast and reliably evaluate adversarial robustness. |
Ruize Gao; Jiongxiao Wang; Kaiwen Zhou; Feng Liu; Binghui Xie; Gang Niu; Bo Han; James Cheng; |
309 | Value Function Based Difference-of-Convex Algorithm for Bilevel Hyperparameter Selection Problems Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we develop a sequentially convergent Value Function based Difference-of-Convex Algorithm with inexactness (VF-iDCA). |
Lucy L Gao; Jane Ye; Haian Yin; Shangzhi Zeng; Jin Zhang; |
310 | Learning to Incorporate Texture Saliency Adaptive Attention to Image Cartoonization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, a novel cartoon-texture-saliency-sampler (CTSS) module is proposed to adaptively sample cartoon-texture-salient patches from training data. |
Xiang Gao; Yuqi Zhang; Yingjie Tian; |
311 | Stochastic Smoothing of The Top-K Calibrated Hinge Loss for Deep Imbalanced Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we introduce a stochastic top-K hinge loss inspired by recent developments on top-K calibrated losses. |
Camille Garcin; Maximilien Servajean; Alexis Joly; Joseph Salmon; |
312 | PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: After a compact survey on some of the main variance-reduced REINFORCE-type methods, we propose ProbAbilistic Gradient Estimation for Policy Gradient (PAGE-PG), a novel loopless variance-reduced policy gradient method based on a probabilistic switch between two types of update. |
Matilde Gargiani; Andrea Zanelli; Andrea Martinelli; Tyler Summers; John Lygeros; |
313 | The Power of First-order Smooth Optimization for Black-box Non-smooth Problems Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, besides the oracle complexity, we focus also on iteration complexity, and propose a generic approach that, based on optimal first-order methods, allows to obtain in a black-box fashion new zeroth-order algorithms for non-smooth convex optimization problems. |
Alexander Gasnikov; Anton Novitskii; Vasilii Novitskii; Farshed Abdukhakimov; Dmitry Kamzolov; Aleksandr Beznosikov; Martin Takac; Pavel Dvurechensky; Bin Gu; |
314 | A Functional Information Perspective on Model Interpretation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work suggests a theoretical framework for model interpretability by measuring the contribution of relevant features to the functional entropy of the network with respect to the input. |
Itai Gat; Nitay Calderon; Roi Reichart; Tamir Hazan; |
315 | UniRank: Unimodal Bandit Algorithms for Online Ranking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a generic algorithm, UniRank, that tackles state-of-the-art click models. |
Camille-Sovanneary Gauthier; Romaric Gaudel; Elisa Fromont; |
316 | Variational Inference with Locally Enhanced Bounds for Hierarchical Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new family of variational bounds for hierarchical models, based on the application of tightening methods (e.g. importance weighting) separately for each group of local random variables. |
Tomas Geffner; Justin Domke; |
317 | Inducing Causal Structure for Interpretable Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In many areas, we have well-founded insights about causal structure that would be useful to bring into our trained models while still allowing them to learn in a data-driven fashion. To achieve this, we present the new method of interchange intervention training (IIT). |
Atticus Geiger; Zhengxuan Wu; Hanson Lu; Josh Rozner; Elisa Kreiss; Thomas Icard; Noah Goodman; Christopher Potts; |
318 | Achieving Minimax Rates in Pool-Based Batch Active Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we propose a solution which requires a careful trade off between the informativeness of the queried points and their diversity. |
Claudio Gentile; Zhilei Wang; Tong Zhang; |
319 | Near-Exact Recovery for Tomographic Inverse Problems Via Deep Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work is concerned with the following fundamental question in scientific machine learning: Can deep-learning-based methods solve noise-free inverse problems to near-perfect accuracy? |
Martin Genzel; Ingo Gühring; Jan Macdonald; Maximilian März; |
320 | Online Learning for Min Sum Set Cover and Pandora’s Box Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a computationally efficient algorithm that is constant-competitive against the cost of the optimal search order. |
Evangelia Gergatsouli; Christos Tzamos; |
321 | Equivariance Versus Augmentation for Spherical Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We analyze the role of rotational equivariance in convolutional neural networks (CNNs) applied to spherical images. |
Jan Gerken; Oscar Carlsson; Hampus Linander; Fredrik Ohlsson; Christoffer Petersson; Daniel Persson; |
322 | A Regret Minimization Approach to Multi-Agent Control Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the problem of multi-agent control of a dynamical system with known dynamics and adversarial disturbances. |
Udaya Ghai; Udari Madhushani; Naomi Leonard; Elad Hazan; |
323 | Blocks Assemble! Learning to Assemble with Large-Scale Structured Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a naturalistic physics-based environment with a set of connectable magnet blocks inspired by children’s toy kits. |
Seyed Kamyar Seyed Ghasemipour; Satoshi Kataoka; Byron David; Daniel Freeman; Shixiang Shane Gu; Igor Mordatch; |
324 | Faster Privacy Accounting Via Evolving Discretization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a new algorithm for numerical composition of privacy random variables, useful for computing the accurate differential privacy parameters for compositions of mechanisms. |
Badih Ghazi; Pritish Kamath; Ravi Kumar; Pasin Manurangsi; |
325 | Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyper-parameter tuning. |
Amin Ghiasi; Hamid Kazemi; Steven Reich; Chen Zhu; Micah Goldblum; Tom Goldstein; |
326 | Offline RL Policies Should Be Trained to Be Adaptive Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose that offline RL methods should instead be adaptive in the presence of uncertainty. |
Dibya Ghosh; Anurag Ajay; Pulkit Agrawal; Sergey Levine; |
327 | Breaking The $\sqrt{T}$ Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we show that stochastic contexts indeed help to reduce the regret from $\sqrt{T}$ to $\mathrm{polylog}(T)$. |
Avishek Ghosh; Abishek Sankararaman; |
328 | SCHA-VAE: Hierarchical Context Aggregation for Few-Shot Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We extend current latent variable models for sets to a fully hierarchical approach with an attention-based point to set-level aggregation and call our method SCHA-VAE for Set-Context-Hierarchical-Aggregation Variational Autoencoder. |
Giorgio Giannone; Ole Winther; |
329 | A Joint Exponential Mechanism For Differentially Private Top-$k$ Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a differentially private algorithm for releasing the sequence of $k$ elements with the highest counts from a data domain of $d$ elements. |
Jennifer Gillenwater; Matthew Joseph; Andres Munoz; Monica Ribero Diaz; |
330 | Neuro-Symbolic Hierarchical Rule Induction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose Neuro-Symbolic Hierarchical Rule Induction, an efficient interpretable neuro-symbolic model, to solve Inductive Logic Programming (ILP) problems. |
Claire Glanois; Zhaohui Jiang; Xuening Feng; Paul Weng; Matthieu Zimmer; Dong Li; Wulong Liu; Jianye Hao; |
331 | It’s Raw! Audio Generation with State-Space Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose SaShiMi, a new multi-scale architecture for waveform modeling built around the recently introduced S4 model for long sequence modeling. |
Karan Goel; Albert Gu; Chris Donahue; Christopher Re; |
332 | RankSim: Ranking Similarity Regularization for Deep Imbalanced Regression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents the RankSim (ranking similarity) regularizer for deep imbalanced regression, which encodes an inductive bias that samples that are closer in label space should also be closer in feature space. |
Yu Gong; Greg Mori; Fred Tung; |
333 | How to Fill The Optimum Set? Population Gradient Descent with Harmless Diversity Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Therefore, it is useful to consider the problem of finding a set of diverse points in the optimum set of an objective function. In this work, we frame this problem as a bi-level optimization problem of maximizing a diversity score inside the optimum set of the main loss function, and solve it with a simple population gradient descent framework that iteratively updates the points to maximize the diversity score in a fashion that does not hurt the optimization of the main loss. |
Chengyue Gong; Lemeng Wu; Qiang Liu; |
334 | Partial Label Learning Via Label Influence Function Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, inspired by influence function, we develop a novel PLL framework called Partial Label Learning via Label Influence Function (PLL-IF). |
Xiuwen Gong; Dong Yuan; Wei Bao; |
335 | Secure Distributed Training at Scale Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel protocol for secure (Byzantine-tolerant) decentralized training that emphasizes communication efficiency. |
Eduard Gorbunov; Alexander Borzunov; Michael Diskin; Max Ryabinin; |
336 | Retrieval-Augmented Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we explore an alternative paradigm in which we train a network to map a dataset of past experiences to optimal behavior. |
Anirudh Goyal; Abram Friesen; Andrea Banino; Theophane Weber; Nan Rosemary Ke; Adrià Puigdomènech Badia; Arthur Guez; Mehdi Mirza; Peter C Humphreys; Ksenia Konyushova; Michal Valko; Simon Osindero; Timothy Lillicrap; Nicolas Heess; Charles Blundell; |
337 | The State of Sparse Training in Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work we perform a systematic investigation into applying a number of existing sparse training techniques on a variety of DRL agents and environments. |
Laura Graesser; Utku Evci; Erich Elsen; Pablo Samuel Castro; |
338 | Causal Inference Through The Structural Causal Marginal Problem Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce an approach to counterfactual inference based on merging information from multiple datasets. |
Luigi Gresele; Julius von Kügelgen; Jonas Kübler; Elke Kirschbaum; Bernhard Schölkopf; Dominik Janzing; |
339 | Mirror Learning: A Unifying Framework of Policy Optimisation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast, in this paper, we introduce a novel theoretical framework, named Mirror Learning, which provides theoretical guarantees to a large class of algorithms, including TRPO and PPO. |
Jakub Grudzien; Christian A Schroeder De Witt; Jakob Foerster; |
340 | Adapting K-means Algorithms for Outliers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we build on their ideas and show how to adapt several sequential and distributed k-means algorithms to the setting with outliers, but with substantially stronger theoretical guarantees: our algorithms output (1 + $\epsilon$)z outliers while achieving an O(1/$\epsilon$)-approximation to the objective function. |
Christoph Grunau; Václav Rozhoň; |
341 | Variational Mixtures of ODEs for Inferring Cellular Gene Expression Dynamics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Additionally, a single progenitor cell type often bifurcates into multiple child cell types, further complicating the problem of modeling the dynamics. To address this problem, we developed an approach called variational mixtures of ordinary differential equations. |
Yichen Gu; David T Blaauw; Joshua Welch; |
342 | Learning Pseudometric-based Action Representations for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes an action representation learning framework for offline RL based on a pseudometric, which measures both the behavioral relation and the data-distributional relation between actions. |
Pengjie Gu; Mengchen Zhao; Chen Chen; Dong Li; Jianye Hao; Bo An; |
343 | NeuroFluid: Fluid Dynamics Grounding with Particle-Driven Neural Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we consider a partially observable scenario known as fluid dynamics grounding, that is, inferring the state transitions and interactions within the fluid particle systems from sequential visual observations of the fluid surface. |
Shanyan Guan; Huayu Deng; Yunbo Wang; Xiaokang Yang; |
344 | Fast-Rate PAC-Bayesian Generalization Bounds for Meta-Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a general PAC-Bayesian framework to cope with single-task learning and meta-learning uniformly. |
Jiechao Guan; Zhiwu Lu; |
345 | Leveraging Approximate Symbolic Models for Reinforcement Learning Via Skill Diversity Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Symbolic models of real world tasks are however often incomplete. To this end, we introduce Approximate Symbolic-Model Guided Reinforcement Learning, wherein we will formalize the relationship between the symbolic model and the underlying MDP that will allow us to characterize the incompleteness of the symbolic model. |
Lin Guan; Sarath Sreedharan; Subbarao Kambhampati; |
346 | Large-Scale Graph Neural Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, existing approaches fail to handle large-scale graphs because current performance estimation strategies in GNAS are computationally expensive for large-scale graphs and suffer from consistency collapse issues. To tackle these problems, we propose the Graph ArchitectUre Search at Scale (GAUSS) method that can handle large-scale graphs by designing an efficient light-weight supernet and the joint architecture-graph sampling. |
Chaoyu Guan; Xin Wang; Hong Chen; Ziwei Zhang; Wenwu Zhu; |
347 | Identifiability Conditions for Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Unfortunately, it is unclear under what conditions this identifiability assumption holds, even when restricting ourselves to the case where a correct bijective map between domains exists. We study this bijective domain mapping problem and provide several new sufficient conditions for the identifiability of linear domain maps. |
Ishaan Gulrajani; Tatsunori Hashimoto; |
348 | A Parametric Class of Approximate Gradient Updates for Policy Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To better capture the commonalities and identify key differences between policy optimization methods, we develop a unified perspective that re-expresses the underlying updates in terms of a limited choice of gradient form and scaling function. |
Ramki Gummadi; Saurabh Kumar; Junfeng Wen; Dale Schuurmans; |
349 | Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In the offline setting, estimating these operators directly is challenging due to (i) the large observation space and (ii) insufficient coverage of the offline dataset. To tackle these challenges, we propose a novel algorithm that constructs confidence regions for these Bellman operators via offline estimation of their RKHS embeddings, and returns the final policy via pessimistic planning within the confidence regions. |
Hongyi Guo; Qi Cai; Yufeng Zhang; Zhuoran Yang; Zhaoran Wang; |
350 | No-Regret Learning in Partially-Informed Auctions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Auctions with partially-revealed information about items are broadly employed in real-world applications, but the underlying mechanisms have limited theoretical support. In this work, we study a machine learning formulation of these types of mechanisms, presenting algorithms that are no-regret from the buyer’s perspective. |
Wenshuo Guo; Michael Jordan; Ellen Vitercik; |
351 | Bounding Training Data Reconstruction in Private (Deep) Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we derive the first semantic guarantees for DP mechanisms against training data reconstruction attacks under a formal threat model. |
Chuan Guo; Brian Karrer; Kamalika Chaudhuri; Laurens van der Maaten; |
352 | Adversarially Trained Neural Representations Are Already As Robust As Biological Neural Representations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we develop a method for performing adversarial visual attacks directly on primate brain activity. |
Chong Guo; Michael Lee; Guillaume Leclerc; Joel Dapello; Yug Rao; Aleksander Madry; James Dicarlo; |
353 | Class-Imbalanced Semi-Supervised Learning with Adaptive Thresholding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we develop a simple yet effective framework, which only involves adaptive thresholding for different classes in SSL algorithms, and achieves remarkable performance improvement on more than twenty imbalance ratios. |
Lan-Zhe Guo; Yu-Feng Li; |
354 | Deep Squared Euclidean Approximation to The Levenshtein Distance for DNA Storage Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel deep squared Euclidean embedding for DNA sequences using a Siamese neural network, squared Euclidean embedding, and chi-squared regression. |
Alan J.X. Guo; Cong Liang; Qing-Hu Hou; |
355 | Online Continual Learning Through Mutual Information Maximization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a new online continual learning approach called OCMM based on mutual information (MI) maximization. |
Yiduo Guo; Bing Liu; Dongyan Zhao; |
356 | Fast Provably Robust Decision Trees and Boosting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work proposes the Fast Provably Robust Decision Tree (FPRDT) with the smallest computational complexity O(n log n), a tradeoff between global and local optimizations over the adversarial 0/1 loss. |
Jun-Qi Guo; Ming-Zhuo Teng; Wei Gao; Zhi-Hua Zhou; |
357 | Understanding and Improving Knowledge Graph Embedding for Entity Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To fill the research gap, we define a typical paradigm abstracted from existing EEA methods and analyze how the embedding discrepancy between two potentially aligned entities is implicitly bounded by a predefined margin in the score function. |
Lingbing Guo; Qiang Zhang; Zequn Sun; Mingyang Chen; Wei Hu; Huajun Chen; |
358 | NISPA: Neuro-Inspired Stability-Plasticity Adaptation for Continual Learning in Sparse Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The main desiderata associated with CL are to maintain performance on older tasks, leverage the latter to improve learning of future tasks, and to introduce minimal overhead in the training process (for instance, to not require a growing model or retraining). We propose the Neuro-Inspired Stability-Plasticity Adaptation (NISPA) architecture that addresses these desiderata through a sparse neural network with fixed density. |
Mustafa B Gurbuz; Constantine Dovrolis; |
359 | Active Learning on A Budget: Opposite Strategies Suit High and Low Budgets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Accordingly, we propose TypiClust – a deep active learning strategy suited for low budgets. |
Guy Hacohen; Avihu Dekel; Daphna Weinshall; |
360 | You Only Cut Once: Boosting Data Augmentation with A Single Cut Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present You Only Cut Once (YOCO) for performing data augmentations. |
Junlin Han; Pengfei Fang; Weihao Li; Jie Hong; Mohammad Ali Armin; Ian Reid; Lars Petersson; Hongdong Li; |
361 | Scalable MCMC Sampling for Nonsymmetric Determinantal Point Processes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we develop a scalable MCMC sampling algorithm for $k$-NDPPs with low-rank kernels, thus enabling runtime that is sublinear in $n$. |
Insu Han; Mike Gartrell; Elvis Dohmatob; Amin Karbasi; |
362 | G-Mixup: Graph Data Augmentation for Graph Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, it is challenging to directly adopt Mixup to augment graph data because different graphs typically: 1) have different numbers of nodes; 2) are not readily aligned; and 3) have unique topologies in non-Euclidean space. To this end, we propose G-Mixup to augment graphs for graph classification by interpolating the generator (i.e., graphon) of different classes of graphs. |
Xiaotian Han; Zhimeng Jiang; Ninghao Liu; Xia Hu; |
363 | Private Streaming SCO in $\ell_p$ Geometry with Applications in High Dimensional Online Decision Making Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a private variant of the Frank-Wolfe algorithm with recursive gradients for variance reduction to update and reveal the parameters upon each data point. |
Yuxuan Han; Zhicong Liang; Zhipeng Liang; Yang Wang; Yuan Yao; Jiheng Zhang; |
364 | Off-Policy Reinforcement Learning with Delayed Rewards Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study deep reinforcement learning (RL) algorithms with delayed rewards. |
Beining Han; Zhizhou Ren; Zuofan Wu; Yuan Zhou; Jian Peng; |
365 | Adversarial Attacks on Gaussian Process Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our goal is to understand adversarial attacks on GP bandits from theoretical and practical perspectives. |
Eric Han; Jonathan Scarlett; |
366 | Random Gegenbauer Features for Scalable Kernel Methods Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose efficient random features for approximating a new and rich class of kernel functions that we refer to as Generalized Zonal Kernels (GZK). |
Insu Han; Amir Zandieh; Haim Avron; |
367 | Stochastic Reweighted Gradient Descent Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose stochastic reweighted gradient descent (SRG), a stochastic gradient method based solely on importance sampling that can reduce the variance of the gradient estimator and improve on the asymptotic error of stochastic gradient descent (SGD) in the strongly convex and smooth case. |
Ayoub El Hanchi; David Stephens; Chris Maddison; |
368 | Dual Perspective of Label-Specific Feature Learning for Multi-Label Classification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a dual perspective for label-specific feature learning, where label-specific discriminative properties are considered by identifying each label’s own non-informative features and making the discrimination process immutable to variations of these features. |
Jun-Yi Hang; Min-Ling Zhang; |
369 | Temporal Difference Learning for Model Predictive Control Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we combine the strengths of model-free and model-based methods. |
Nicklas A Hansen; Hao Su; Xiaolong Wang; |
370 | Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new form of state abstraction called goal-conditioned bisimulation that captures functional equivariance, allowing for the reuse of skills to achieve new goals. |
Philippe Hansen-Estruch; Amy Zhang; Ashvin Nair; Patrick Yin; Sergey Levine; |
371 | TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We derive a near-linear-time and essentially sample-optimal estimator that establishes $c_{t,d}=2$ for all $(t,d)\neq(1,0)$. |
Yi Hao; Ayush Jain; Alon Orlitsky; Vaishakh Ravindrakumar; |
372 | Contextual Information-Directed Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We investigate the IDS design through two contextual bandit problems: contextual bandits with graph feedback and sparse linear contextual bandits. |
Botao Hao; Tor Lattimore; Chao Qin; |
373 | GSmooth: Certified Robustness Against Semantic Transformations Via Generalized Randomized Smoothing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, existing methods are insufficient or unable to provably defend against semantic transformations, especially those without closed-form expressions (such as defocus blur and pixelate), which are more common in practice and often unrestricted. To fill up this gap, we propose generalized randomized smoothing (GSmooth), a unified theoretical framework for certifying robustness against general semantic transformations via a novel dimension augmentation strategy. |
Zhongkai Hao; Chengyang Ying; Yinpeng Dong; Hang Su; Jian Song; Jun Zhu; |
374 | Implicit Regularization with Polynomial Growth in Deep Tensor Factorization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the implicit regularization effects of deep learning in tensor factorization. |
Kais Hariz; Hachem Kadri; Stephane Ayache; Maher Moakher; Thierry Artieres; |
375 | Strategic Instrumental Variable Regression: Recovering Causal Relationships From Strategic Responses Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, our work establishes a novel connection between strategic responses to ML models and instrumental variable (IV) regression by observing that the sequence of deployed models can be viewed as an instrument that affects agents’ observable features but does not directly influence their outcomes. |
Keegan Harris; Dung Daniel T Ngo; Logan Stapleton; Hoda Heidari; Steven Wu; |
376 | C*-algebra Net: A New Approach Generalizing Neural Network Parameters to C*-algebra Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new framework that generalizes the parameters of neural network models to $C^*$-algebra-valued ones. |
Yuka Hashimoto; Zhao Wang; Tomoko Matsui; |
377 | General-purpose, Long-context Autoregressive Modeling with Perceiver AR Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We develop Perceiver AR, an autoregressive, modality-agnostic architecture which uses cross-attention to map long-range inputs to a small number of latents while also maintaining end-to-end causal masking. |
Curtis Hawthorne; Andrew Jaegle; Catalina Cangea; Sebastian Borgeaud; Charlie Nash; Mateusz Malinowski; Sander Dieleman; Oriol Vinyals; Matthew Botvinick; Ian Simon; Hannah Sheahan; Neil Zeghidour; Jean-Baptiste Alayrac; Joao Carreira; Jesse Engel; |
378 | On Distribution Shift in Learning-based Bug Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we argue that this massive performance difference is caused by a distribution shift, i.e., a fundamental mismatch between the real bug distribution and the synthetic bug distribution used to train and evaluate the detectors. |
Jingxuan He; Luca Beurer-Kellner; Martin Vechev; |
379 | GNNRank: Learning Global Rankings from Pairwise Comparisons Via Directed Graph Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce neural networks into the ranking recovery problem by proposing the so-called GNNRank, a trainable GNN-based framework with digraph embedding. |
Yixuan He; Quan Gan; David Wipf; Gesine D Reinert; Junchi Yan; Mihai Cucuringu; |
380 | Exploring The Gap Between Collapsed & Whitened Features in Self-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We identify power law behaviour in eigenvalue decay, parameterised by exponent $\beta \geq 0$, as a spectrum that bridges between the collapsed & whitened feature extremes. |
Bobby He; Mete Ozay; |
381 | Sparse Double Descent: Where Network Pruning Aggravates Overfitting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we have three main contributions. First, we report the novel sparse double descent phenomenon through extensive experiments. |
Zheng He; Zeke Xie; Quanzhi Zhu; Zengchang Qin; |
382 | A Reduction from Linear Contextual Bandit Lower Bounds to Estimation Lower Bounds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we complete the reverse direction by establishing the necessity. |
Jiahao He; Jiheng Zhang; Rachel Zhang; |
383 | HyperPrompt: Prompt-based Task-Conditioning of Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here, we explore the use of HyperNetworks to generate hyper-prompts: we propose HyperPrompt, a novel architecture for prompt-based task-conditioning of self-attention in Transformers. |
Yun He; Steven Zheng; Yi Tay; Jai Gupta; Yu Du; Vamsi Aribandi; Zhe Zhao; Yaguang Li; Zhao Chen; Donald Metzler; Heng-Tze Cheng; Ed H. Chi; |
384 | Label-Descriptive Patterns and Their Application to Characterizing Classification Errors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose to discover those feature-value combinations (i.e., patterns) that strongly correlate with correct resp. erroneous predictions. |
Michael A. Hedderich; Jonas Fischer; Dietrich Klakow; Jilles Vreeken; |
385 | NOMU: Neural Optimization-based Model Uncertainty Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, we find that established benchmarks often fail to reliably capture some of these desiderata, even those that are required by Bayesian theory. To address this, we introduce a new approach for capturing model uncertainty for NNs, which we call Neural Optimization-based Model Uncertainty (NOMU). |
Jakob M Heiss; Jakob Weissteiner; Hanna S Wutte; Sven Seuken; Josef Teichmann; |
386 | Scaling Out-of-Distribution Detection for Real-World Settings Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To set the stage for more realistic out-of-distribution detection, we depart from small-scale settings and explore large-scale multiclass and multi-label settings with high-resolution images and thousands of classes. |
Dan Hendrycks; Steven Basart; Mantas Mazeika; Andy Zou; Joseph Kwon; Mohammadreza Mostajabi; Jacob Steinhardt; Dawn Song; |
387 | Generalization Bounds Using Lower Tail Exponents in Stochastic Optimizers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While recent work has revealed connections between generalization and heavy-tailed behavior in stochastic optimization, they mainly relied on continuous-time approximations; and a rigorous treatment for the original discrete-time iterations is yet to be performed. To bridge this gap, we present novel bounds linking generalization to the lower tail exponent of the transition kernel associated with the optimizer around a local minimum, in both discrete- and continuous-time settings. |
Liam Hodgkinson; Umut Simsekli; Rajiv Khanna; Michael Mahoney; |
388 | Unsupervised Detection of Contextualized Embedding Bias with Application to Ideology Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a fully unsupervised method to detect bias in contextualized embeddings. |
Valentin Hofmann; Janet Pierrehumbert; Hinrich Schütze; |
389 | Neural Laplace: Learning Diverse Classes of Differential Equations in The Laplace Domain Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose Neural Laplace, a unifying framework for learning diverse classes of DEs including all the aforementioned ones. |
Samuel I Holt; Zhaozhi Qian; Mihaela van der Schaar; |
390 | Deep Hierarchy in Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Since the hierarchy can have multiple layers, we call it deep. We propose a hierarchical Thompson sampling algorithm (HierTS) for this problem and show how to implement it efficiently for Gaussian hierarchies. |
Joey Hong; Branislav Kveton; Sumeet Katariya; Manzil Zaheer; Mohammad Ghavamzadeh; |
391 | DAdaQuant: Doubly-adaptive Quantization for Communication-efficient Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce DAdaQuant as a doubly-adaptive quantization algorithm that dynamically changes the quantization level across time and different clients. |
Robert Hönig; Yiren Zhao; Robert Mullins; |
392 | Equivariant Diffusion for Molecule Generation in 3D Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work introduces a diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations. |
Emiel Hoogeboom; Víctor Garcia Satorras; Clément Vignac; Max Welling; |
393 | Conditional GANs with Auxiliary Discriminative Classifier Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The fundamental reason pointed out in this paper is that the classifier of AC-GAN is generator-agnostic, which therefore cannot provide informative guidance for the generator to approach the joint distribution, resulting in a minimization of the conditional entropy that decreases the intra-class diversity. |
Liang Hou; Qi Cao; Huawei Shen; Siyuan Pan; Xiaoshuang Li; Xueqi Cheng; |
394 | AdAUC: End-to-end Adversarial AUC Optimization Against Long-tail Problems Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Under such a scenario, AUC is a much more reasonable metric than accuracy since it is insensitive to the class distribution. Motivated by this, we present an early trial to explore adversarial training methods to optimize AUC. |
Wenzheng Hou; Qianqian Xu; Zhiyong Yang; Shilong Bao; Yuan He; Qingming Huang; |
395 | Wide Bayesian Neural Networks Have A Simple Weight Posterior: Theory and Accelerated Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce repriorisation, a data-dependent reparameterisation which transforms a Bayesian neural network (BNN) posterior to a distribution whose KL divergence to the BNN prior vanishes as layer widths grow. |
Jiri Hron; Roman Novak; Jeffrey Pennington; Jascha Sohl-Dickstein; |
396 | Learning Inverse Folding from Millions of Predicted Structures Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We consider the problem of predicting a protein sequence from its backbone atom coordinates. |
Chloe Hsu; Robert Verkuil; Jason Liu; Zeming Lin; Brian Hie; Tom Sercu; Adam Lerer; Alexander Rives; |
397 | Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, we consider the episodic inhomogeneous linear Markov Decision Process (MDP), and propose a novel computation-efficient algorithm, LSVI-UCB$^+$, which achieves an $\widetilde{O}(Hd\sqrt{T})$ regret bound where $H$ is the episode length, $d$ is the feature dimension, and $T$ is the number of steps. |
Pihe Hu; Yu Chen; Longbo Huang; |
398 | Neuron Dependency Graphs: A Causal Abstraction of Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We discover that neural networks exhibit approximate logical dependencies among neurons, and we introduce Neuron Dependency Graphs (NDG) that extract and present them as directed graphs. |
Yaojie Hu; Jin Tian; |
399 | Policy Diagnosis Via Measuring Role Diversity in Cooperative Multi-agent RL Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this study, we quantify the agent’s behavior difference and build its relationship with the policy performance via {\bf Role Diversity}, a metric to measure the characteristics of MARL tasks. |
Siyi Hu; Chuanlong Xie; Xiaodan Liang; Xiaojun Chang; |
400 | On The Role of Discount Factor in Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper examines two distinct effects of $\gamma$ in offline RL with theoretical analysis, namely the regularization effect and the pessimism effect. |
Hao Hu; Yiqin Yang; Qianchuan Zhao; Chongjie Zhang; |
401 | Transformer Quality in Linear Time Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We revisit the design choices in Transformers, and propose methods to address their weaknesses in handling long sequences. |
Weizhe Hua; Zihang Dai; Hanxiao Liu; Quoc Le; |
402 | Language Models As Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we investigate the possibility of grounding high-level tasks, expressed in natural language (e.g. “make breakfast”), to a chosen set of actionable steps (e.g. “open fridge”). |
Wenlong Huang; Pieter Abbeel; Deepak Pathak; Igor Mordatch; |
403 | Forward Operator Estimation in Generative Models with Kernel Transfer Operators Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a substantially cheaper (and simpler) forward operator estimation strategy based on adapting known results on kernel transfer operators. |
Zhichun Huang; Rudrasis Chakraborty; Vikas Singh; |
404 | Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have $\alpha$-th ($1<\alpha\le 2$) moments bounded by $\sigma^\alpha$, while the variances may not exist. |
Jiatai Huang; Yan Dai; Longbo Huang; |
405 | Frustratingly Easy Transferability Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing estimation algorithms either require intensive training on target tasks or have difficulties in evaluating the transferability between layers. To this end, we propose a simple, efficient, and effective transferability measure named TransRate. |
Long-Kai Huang; Junzhou Huang; Yu Rong; Qiang Yang; Ying Wei; |
406 | Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably) Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Recently, it has been observed that the best uni-modal network outperforms the jointly trained multi-modal network across different combinations of modalities on various tasks, which is counter-intuitive since multiple signals would bring more information (Wang et al., 2020). This work provides a theoretical explanation for the emergence of such performance gap in neural networks for the prevalent joint training framework. |
Yu Huang; Junyang Lin; Chang Zhou; Hongxia Yang; Longbo Huang; |
407 | Action-Sufficient State Representation Learning for Control with Structural Constraints Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we focus on partially observable environments and propose to learn a minimal set of state representations that capture sufficient information for decision-making, termed Action-Sufficient state Representations (ASRs). |
Biwei Huang; Chaochao Lu; Liu Leqi; Jose Miguel Hernandez-Lobato; Clark Glymour; Bernhard Schölkopf; Kun Zhang; |
408 | 3DLinker: An E(3) Equivariant Variational Autoencoder for Molecular Linker Design Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we focus on a new type of drug design problem — generating a small “linker” to physically attach two independent molecules with their distinct functions. |
Yinan Huang; Xingang Peng; Jianzhu Ma; Muhan Zhang; |
409 | SDQ: Stochastic Differentiable Quantization with Mixed Precision Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we present a novel Stochastic Differentiable Quantization (SDQ) method that can automatically learn the MPQ strategy in a more flexible and globally-optimized space with a smoother gradient approximation. |
Xijie Huang; Zhiqiang Shen; Shichao Li; Zechun Liu; Hu Xianghong; Jeffry Wicaksana; Eric Xing; Kwang-Ting Cheng; |
410 | Tackling Data Heterogeneity: A New Unified Framework for Decentralized SGD with Sample-induced Topology Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We develop a general framework unifying several gradient-based stochastic optimization methods for empirical risk minimization problems both in centralized and distributed scenarios. |
Yan Huang; Ying Sun; Zehan Zhu; Changzhi Yan; Jinming Xu; |
411 | Efficient Representation Learning Via Adaptive Context Pooling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In doing so, they assume a fixed attention granularity defined by the individual tokens (e.g., text characters or image pixels), which may not be optimal for modeling complex dependencies at higher levels. In this paper, we propose ContextPool to address this problem by adapting the attention granularity for each token. |
Chen Huang; Walter Talbott; Navdeep Jaitly; Joshua M Susskind; |
412 | On The Learning of Non-Autoregressive Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present theoretical and empirical analyses to reveal the challenges of NAT learning and propose a unified perspective to understand existing successes. |
Fei Huang; Tianhua Tao; Hao Zhou; Lei Li; Minlie Huang; |
413 | Going Deeper Into Permutation-Sensitive Graph Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we devise an efficient permutation-sensitive aggregation mechanism via permutation groups, capturing pairwise correlations between neighboring nodes. |
Zhongyu Huang; Yingheng Wang; Chaozhuo Li; Huiguang He; |
414 | Directed Acyclic Transformer for Non-Autoregressive Machine Translation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Directed Acyclic Transformer (DA-Transformer), which represents the hidden states in a Directed Acyclic Graph (DAG), where each path of the DAG corresponds to a specific translation. |
Fei Huang; Hao Zhou; Yang Liu; Hang Li; Minlie Huang; |
415 | Unsupervised Ground Metric Learning Using Wasserstein Singular Vectors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose for the first time a canonical answer by simultaneously computing an OT distance between samples and between features of a dataset. |
Geert-Jan Huizing; Laura Cantini; Gabriel Peyré; |
416 | Robust Kernel Density Estimation with Median-of-Means Principle Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a robust non-parametric density estimator combining the popular Kernel Density Estimation method and the Median-of-Means principle (MoM-KDE). |
Pierre Humbert; Batiste Le Bars; Ludovic Minvielle; |
417 | A Data-driven Approach for Learning to Control Computers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here we investigate the setting of computer control using keyboard and mouse, with goals specified via natural language. |
Peter C Humphreys; David Raposo; Tobias Pohlen; Gregory Thornton; Rachita Chhaparia; Alistair Muldal; Josh Abramson; Petko Georgiev; Adam Santoro; Timothy Lillicrap; |
418 | Proximal Denoiser for Convergent Plug-and-Play Optimization with Nonconvex Regularization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Using such a denoiser guarantees the convergence of the PnP version of the Half-Quadratic-Splitting (PnP-HQS) iterative algorithm. In this paper, we show that this gradient denoiser can actually correspond to the proximal operator of another scalar function. |
Samuel Hurault; Arthur Leclaire; Nicolas Papadakis; |
419 | Inverse Contextual Bandits: Learning How Behavior Evolves Over Time Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To give an answer, we desire a policy learning method that provides interpretable representations of decision-making, in particular capturing an agent’s non-stationary knowledge of the world, as well as operating in an offline manner. |
Alihan Hüyük; Daniel Jarrett; Mihaela van der Schaar; |
420 | Datamodels: Understanding Predictions with Data and Data with Predictions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a conceptual framework, datamodeling, for analyzing the behavior of a model class in terms of the training data. |
Andrew Ilyas; Sung Min Park; Logan Engstrom; Guillaume Leclerc; Aleksander Madry; |
421 | Parsimonious Learning-Augmented Caching Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we introduce and study the setting in which the learning-augmented algorithm can utilize the predictions parsimoniously. |
Sungjin Im; Ravi Kumar; Aditya Petety; Manish Purohit; |
422 | Bayesian Optimization for Distributionally Robust Chance-constrained Problem Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this study, we consider distributionally robust CC (DRCC) problem and propose a novel DRCC Bayesian optimization method for the case where the distribution of the environmental variables cannot be precisely specified. |
Yu Inatsu; Shion Takeno; Masayuki Karasuyama; Ichiro Takeuchi; |
423 | LeNSE: Learning To Navigate Subgraph Embeddings for Large-Scale Combinatorial Optimisation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a low-complexity approach for identifying a (possibly much smaller) subgraph of the original graph where the heuristics can be run in reasonable time and with a high likelihood of finding a global near-optimal solution. |
David Ireland; Giovanni Montana; |
424 | The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns Via Spotlights of Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, this dual formulation offers a possibility of directly visualising how an NN makes use of training patterns at test time, by examining the corresponding attention weights. We conduct experiments on small scale supervised image classification tasks in single-task, multi-task, and continual learning settings, as well as language modelling, and discuss potentials and limits of this view for better understanding and interpreting how NNs exploit training patterns. |
Kazuki Irie; Róbert Csordás; Jürgen Schmidhuber; |
425 | A Modern Self-Referential Weight Matrix That Learns to Modify Itself Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a scalable self-referential WM (SRWM) that learns to use outer products and the delta update rule to modify itself. |
Kazuki Irie; Imanol Schlag; Róbert Csordás; Jürgen Schmidhuber; |
426 | Revisiting Online Submodular Minimization: Gap-Dependent Regret Bounds, Best of Both Worlds and Adversarial Robustness Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we consider online decision problems with submodular loss functions. |
Shinji Ito; |
427 | Modeling Strong and Human-Like Gameplay with KL-Regularized Search Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We consider the task of accurately modeling strong human policies in multi-agent decision-making problems, given examples of human behavior. |
Athul Paul Jacob; David J Wu; Gabriele Farina; Adam Lerer; Hengyuan Hu; Anton Bakhtin; Jacob Andreas; Noam Brown; |
428 | A Deep Convolutional Neural Network That Is Invariant to Time Rescaling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a deep CNN (SITHCon) that uses a logarithmically compressed temporal representation at each level. |
Brandon G Jacques; Zoran Tiganj; Aakash Sarkar; Marc Howard; Per Sederberg; |
429 | Input Dependent Sparse Gaussian Processes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: A limitation is, however, that in some tasks a large number of inducing points may be required to obtain good results. To alleviate this, we propose here to amortize the computation of the inducing points locations, as well as the parameters of $q$. |
Bahram Jafrasteh; Carlos Villacampa-Calvo; Daniel Hernandez-Lobato; |
430 | Regret Minimization with Performative Feedback Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our main contribution is regret bounds that scale only with the complexity of the distribution shifts and not that of the reward function. |
Meena Jagadeesan; Tijana Zrnic; Celestine Mendler-Dünner; |
431 | Biological Sequence Design with GFlowNets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose an active learning algorithm leveraging epistemic uncertainty estimation and the recently proposed GFlowNets as a generator of diverse candidate solutions, with the objective to obtain a diverse batch of useful (as defined by some utility function, for example, the predicted anti-microbial activity of a peptide) and informative candidates after each round. |
Moksh Jain; Emmanuel Bengio; Alex Hernandez-Garcia; Jarrid Rector-Brooks; Bonaventure F. P. Dossou; Chanakya Ajit Ekbote; Jie Fu; Tianyu Zhang; Michael Kilgour; Dinghuai Zhang; Lena Simine; Payel Das; Yoshua Bengio; |
432 | Combining Diverse Feature Priors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To improve model generalization, model designers often restrict the features that their models use, either implicitly or explicitly. In this work, we explore the design space of leveraging such feature priors by viewing them as distinct perspectives on the data. |
Saachi Jain; Dimitris Tsipras; Aleksander Madry; |
433 | Training Your Sparse Neural Network Better with Any Mask Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Apart from the popular belief that only the quality of sparse masks matters for sparse training, in this paper we demonstrate an alternative opportunity: one can carefully customize the sparse training techniques to deviate from the default dense network training protocols, consisting of introducing “ghost” neurons and skip connections at the early stage of training, and strategically modifying the initialization as well as labels. |
Ajay Kumar Jaiswal; Haoyu Ma; Tianlong Chen; Ying Ding; Zhangyang Wang; |
434 | Sequential Covariate Shift Detection Using Classifier Two-Sample Tests Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We consider the problem of detecting covariate shift, where the covariate distribution shifts but the conditional distribution of labels given covariates remains the same. |
Sooyong Jang; Sangdon Park; Insup Lee; Osbert Bastani; |
435 | Surrogate Likelihoods for Variational Annealed Importance Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, supporting data subsampling in these hybrid methods can be a challenge, a shortcoming that we address by introducing a surrogate likelihood that can be learned jointly with other variational parameters. |
Martin Jankowiak; Du Phan; |
436 | Planning with Diffusion for Flexible Behavior Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem, such that sampling from the model and planning with it become nearly identical. |
Michael Janner; Yilun Du; Joshua Tenenbaum; Sergey Levine; |
437 | HyperImpute: Generalized Iterative Imputation with Automatic Model Selection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we study an approach that marries the advantages of both: We propose *HyperImpute*, a generalized iterative imputation framework for adaptively and automatically configuring column-wise models and their hyperparameters. |
Daniel Jarrett; Bogdan C Cebere; Tennison Liu; Alicia Curth; Mihaela van der Schaar; |
438 | Mitigating Modality Collapse in Multimodal VAEs Via Impartial Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We refer to this limitation as modality collapse. In this work, we argue that this effect is a consequence of conflicting gradients during multimodal VAE training. |
Adrian Javaloy; Maryam Meghdadi; Isabel Valera; |
439 | Towards Understanding How Momentum Improves Generalization in Deep Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we adopt another perspective and first empirically show that gradient descent with momentum (GD+M) significantly improves generalization compared to gradient descent (GD) in some deep learning problems. From this observation, we formally study how momentum improves generalization. |
Samy Jelassi; Yuanzhi Li; |
440 | MASER: Multi-Agent Reinforcement Learning with Subgoals Generated from Experience Replay Buffer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we consider cooperative multi-agent reinforcement learning (MARL) with sparse reward. |
Jeewon Jeon; Woojun Kim; Whiyoung Jung; Youngchul Sung; |
441 | An Exact Symbolic Reduction of Linear Smart Predict+Optimize to Mixed Integer Linear Programming Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we cast the SPO problem as a bi-level program and apply Symbolic Variable Elimination (SVE) to analytically solve the lower optimization. |
Jihwan Jeong; Parth Jaggi; Andrew Butler; Scott Sanner; |
442 | Agnostic Learnability of Halfspaces Via Logistic Loss Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Previously, for a certain broad class of “well-behaved” distributions on the examples, Diakonikolas et al. (2020) proved an $\tilde{\Omega}(OPT)$ lower bound, while Frei et al. (2021) proved an $\tilde{O}(\sqrt{OPT})$ upper bound, where $OPT$ denotes the best zero-one/misclassification risk of a homogeneous halfspace. In this paper, we close this gap by constructing a well-behaved distribution such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{OPT})$ misclassification risk, matching the upper bound in (Frei et al., 2021). |
Ziwei Ji; Kwangjun Ahn; Pranjal Awasthi; Satyen Kale; Stefani Karp; |
443 | Improving Policy Optimization with Generalist-Specialist Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To have the best of both worlds, we propose a novel generalist-specialist training framework. |
Zhiwei Jia; Xuanlin Li; Zhan Ling; Shuang Liu; Yiran Wu; Hao Su; |
444 | Translatotron 2: High-quality Direct Speech-to-speech Translation with Voice Preservation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present Translatotron 2, a neural direct speech-to-speech translation model that can be trained end-to-end. |
Ye Jia; Michelle Tadmor Ramanovich; Tal Remez; Roi Pomerantz; |
445 | Online Learning and Pricing with Reusable Resources: Linear Bandits with Sub-Exponential Rewards Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a rate-optimal online learning and pricing algorithm, termed Batch Linear Confidence Bound (BLinUCB), and prove that the cumulative regret is $\tilde{O}( d_f \sqrt{T } )$. |
Huiwen Jia; Cong Shi; Siqian Shen; |
446 | The Role of Deconfounding in Meta-learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we offer a novel causal perspective of meta-learning. |
Yinjie Jiang; Zhengyu Chen; Kun Kuang; Luotian Yuan; Xinhai Ye; Zhihua Wang; Fei Wu; Ying Wei; |
447 | Subspace Learning for Effective Meta-Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an algorithm to learn the meta-parameters (i.e., subspace bases). |
Weisen Jiang; James Kwok; Yu Zhang; |
448 | Optimal Algorithms for Stochastic Multi-Level Compositional Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we investigate the problem of stochastic multi-level compositional optimization, where the objective function is a composition of multiple smooth but possibly non-convex functions. |
Wei Jiang; Bokun Wang; Yibo Wang; Lijun Zhang; Tianbao Yang; |
449 | Antibody-Antigen Docking and Design Via Hierarchical Structure Refinement Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new model called Hierarchical Structure Refinement Network (HSRN) for paratope docking and design. |
Wengong Jin; Regina Barzilay; Tommi Jaakkola; |
450 | Sharpened Quasi-Newton Methods: Faster Superlinear Rate and Larger Local Convergence Neighborhood Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This is due to the fact that in Greedy-BFGS the Hessian is directly approximated and the Newton direction approximation may not be as accurate as the one for BFGS. In this paper, we close this gap and present a novel BFGS method that has the best of both worlds. |
Qiujiang Jin; Alec Koppel; Ketan Rajawat; Aryan Mokhtari; |
451 | The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper considers two-player zero-sum Markov Games (MGs). We propose a new algorithm that can provably find the Nash equilibrium policy using a polynomial number of samples, for any MG with low multi-agent Bellman-Eluder dimension—a new complexity measure adapted from its single-agent version (Jin et al., 2021). |
Chi Jin; Qinghua Liu; Tiancheng Yu; |
452 | Domain Adaptation for Time Series Forecasting Via Attention Sharing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This poses a challenge for typical forecasting problems in practice, where there is a limited number of time series or observations per time series, or both. To cope with this data scarcity issue, we propose a novel domain adaptation framework, Domain Adaptation Forecaster (DAF). |
Xiaoyong Jin; Youngsuk Park; Danielle Maddix; Hao Wang; Yuyang Wang; |
453 | Accelerated Federated Learning with Decoupled Adaptive Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work aims to develop novel adaptive optimization methods for FL from the perspective of dynamics of ordinary differential equations (ODEs). |
Jiayin Jin; Jiaxiang Ren; Yang Zhou; Lingjuan Lyu; Ji Liu; Dejing Dou; |
454 | Supervised Off-Policy Ranking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a method to solve SOPR, which learns a policy scoring model by minimizing a ranking loss of the training policies rather than estimating the precise policy performance. |
Yue Jin; Yue Zhang; Tao Qin; Xudong Zhang; Jian Yuan; Houqiang Li; Tie-Yan Liu; |
455 | Input-agnostic Certified Group Fairness Via Gaussian Parameter Smoothing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes an input-agnostic certified group fairness algorithm, FairSmooth, for improving the fairness of classification models while maintaining the remarkable prediction accuracy. |
Jiayin Jin; Zeru Zhang; Yang Zhou; Lingfei Wu; |
456 | Score-based Generative Modeling of Graphs Via The System of Stochastic Differential Equations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Yet, this is a challenging problem, and the previous graph generative methods either fail to capture the permutation-invariance property of graphs or cannot sufficiently model the complex dependency between nodes and edges, which is crucial for generating real-world graphs such as molecules. To overcome such limitations, we propose a novel score-based generative model for graphs with a continuous-time framework. |
Jaehyeong Jo; Seul Lee; Sung Ju Hwang; |
457 | Choosing Answers in Epsilon-Best-Answer Identification for Linear Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We demonstrate that picking the answer with highest mean does not allow an algorithm to reach asymptotic optimality in terms of expected sample complexity. Instead, a furthest answer should be identified. |
Marc Jourdan; Rémy Degenne; |
458 | Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We design an algorithm that incorporates consistent losses and distance-based regularization for fine-tuning. |
Haotian Ju; Dongyue Li; Hongyang R Zhang; |
459 | Robust Alignment of Cross-session Recordings of Neural Population Activity By Behaviour Via Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: On the other hand, evidence suggests that the latent dynamics underlying behaviour may be stable even over months and years. Based on this idea, we introduce a model capable of inferring behaviourally relevant latent dynamics from previously unseen data recorded from the same animal, without any need for decoder recalibration. |
Justin Jude; Matthew Perich; Lee Miller; Matthias Hennig; |
460 | On Measuring Causal Contributions Via Do-interventions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we develop a principled method for quantifying causal contributions. First, we provide desiderata of properties (axioms) that causal contribution measures should satisfy and propose the do-Shapley values (inspired by do-interventions [Pearl, 2000]) as a unique method satisfying these properties. |
Yonghan Jung; Shiva Kasiviswanathan; Jin Tian; Dominik Janzing; Patrick Bloebaum; Elias Bareinboim; |
461 | Efficient Approximate Inference for Stationary Kernel on Frequency Domain Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, despite its expressive power, training this kernel is typically difficult because scalability and overfitting issues often arise due to a large number of training parameters. To resolve these issues, we propose an approximate inference method for estimating the Spectral mixture kernel hyperparameters. |
Yohan Jung; Kyungwoo Song; Jinkyoo Park; |
462 | Sketching Algorithms and Lower Bounds for Ridge Regression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We give a sketching-based iterative algorithm that computes a $1+\varepsilon$ approximate solution for the ridge regression problem $\min_x \|Ax-b\|_2^2 +\lambda\|x\|_2^2$ where $A \in R^{n \times d}$ with $d \ge n$. |
Praneeth Kacham; David Woodruff; |
463 | Flashlight: Enabling Innovation in Tools for Machine Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Flashlight, an open-source library built to spur innovation in machine learning tools and systems by prioritizing open, modular, customizable internals and state-of-the-art, research-ready models and training setups across a variety of domains. |
Jacob D Kahn; Vineel Pratap; Tatiana Likhomanenko; Qiantong Xu; Awni Hannun; Jeff Cai; Paden Tomasello; Ann Lee; Edouard Grave; Gilad Avidov; Benoit Steiner; Vitaliy Liptchinsky; Gabriel Synnaeve; Ronan Collobert; |
464 | Learning-based Optimisation of Particle Accelerators Under Partial Observability Without Real-World Training Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this contribution, we demonstrate how to successfully apply RL to the optimisation of a highly complex real-world machine – specifically a linear particle accelerator – in an only partially observable setting and without requiring training on the real machine. |
Jan Kaiser; Oliver Stein; Annika Eichler; |
465 | Stochastic Deep Networks with Linear Competing Units for Model-Agnostic Meta-Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Abstract: This work addresses meta-learning (ML) by considering deep networks with stochastic local winner-takes-all (LWTA) activations. This type of network units results in sparse … |
Konstantinos Kalais; Sotirios Chatzis; |
466 | Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the first DR algorithms for DROPE/L with KL-divergence uncertainty sets. |
Nathan Kallus; Xiaojie Mao; Kaiwen Wang; Zhengyuan Zhou; |
467 | Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study stochastic convex optimization with heavy-tailed data under the constraint of differential privacy (DP). |
Gautam Kamath; Xingtu Liu; Huanyu Zhang; |
468 | Comprehensive Analysis of Negative Sampling in Knowledge Graph Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Currently, empirical hyperparameter tuning addresses this problem at the cost of computational time. To solve this problem, we theoretically analyzed NS loss to assist hyperparameter tuning and understand the better use of the NS loss in KGE learning. |
Hidetaka Kamigaito; Katsuhiko Hayashi; |
469 | Matching Learned Causal Effects of Neural Networks with Domain Priors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Therefore, we propose a regularization method that aligns the learned causal effects of a neural network with domain priors, including both direct and total causal effects. |
Sai Srinivas Kancheti; Abbavaram Gowtham Reddy; Vineeth N Balasubramanian; Amit Sharma; |
470 | Deduplicating Training Data Mitigates Privacy Risks in Language Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Past work has shown that large language models are susceptible to privacy attacks, where adversaries generate sequences from a trained model and detect which sequences are memorized from the training set. In this work, we show that the success of these attacks is largely due to duplication in commonly used web-scraped training sets. |
Nikhil Kandpal; Eric Wallace; Colin Raffel; |
471 | Lyapunov Density Models: Constraining Distribution Shift in Learning-Based Control Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Can we combine these two concepts, producing learning-based control algorithms that constrain the system to in-distribution states using only in-distribution actions? In this paper, we propose to do this by combining concepts from Lyapunov stability and density estimation, introducing Lyapunov density models: a generalization of control Lyapunov functions and density models that provides guarantees about an agent’s ability to stay in-distribution over its entire trajectory. |
Katie Kang; Paula Gradu; Jason J Choi; Michael Janner; Claire Tomlin; Sergey Levine; |
472 | Forget-free Continual Learning with Winning Subnetworks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by Lottery Ticket Hypothesis that competitive subnetworks exist within a dense network, we propose a continual learning method referred to as Winning SubNetworks (WSN), which sequentially learns and selects an optimal subnetwork for each task. |
Haeyong Kang; Rusty John Lloyd Mina; Sultan Rizky Hikmawan Madjid; Jaehong Yoon; Mark Hasegawa-Johnson; Sung Ju Hwang; Chang D. Yoo; |
473 | Differentially Private Approximate Quantiles Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work we study the problem of differentially private (DP) quantiles, in which given dataset $X$ and quantiles $q_1, \ldots, q_m \in [0,1]$, we want to output $m$ quantile estimations which are as close as possible to the true quantiles and preserve DP. |
Haim Kaplan; Shachar Schnapp; Uri Stemmer; |
474 | Simultaneous Graph Signal Clustering and Graph Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we address the problem of learning multiple graphs from heterogeneous data by formulating an optimization problem for joint graph signal clustering and graph topology inference. |
Abdullah Karaaslanli; Selin Aviyente; |
475 | Composing Partial Differential Equations with Physics-Aware Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a compositional physics-aware FInite volume Neural Network (FINN) for learning spatiotemporal advection-diffusion processes. |
Matthias Karlbauer; Timothy Praditia; Sebastian Otte; Sergey Oladyshkin; Wolfgang Nowak; Martin V. Butz; |
476 | Meta-Learning Hypothesis Spaces for Sequential Decision-making Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose to meta-learn a kernel from offline data (Meta-KeL). |
Parnian Kassraie; Jonas Rothfuss; Andreas Krause; |
477 | FOCUS: Familiar Objects in Common and Uncommon Settings Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce FOCUS (Familiar Objects in Common and Uncommon Settings), a dataset for stress-testing the generalization power of deep image classifiers. |
Priyatham Kattakinda; Soheil Feizi; |
478 | Training OOD Detectors in Their Natural Habitats Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel framework that leverages wild mixture data—that naturally consists of both ID and OOD samples. |
Julian Katz-Samuels; Julia B Nakhleh; Robert Nowak; Yixuan Li; |
479 | Robustness Implies Generalization Via Data-Dependent Generalization Bounds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present several examples, including ones for lasso and deep learning, in which our bounds are provably preferable. |
Kenji Kawaguchi; Zhun Deng; Kyle Luh; Jiaoyang Huang; |
480 | Generating Distributional Adversarial Examples to Evade Statistical Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Due to the difficulties in designing adaptive attacks, however, recent work suggests that most detectors have incomplete evaluation. We aim to fill this gap by designing a generic adaptive attack against detectors: the ’statistical indistinguishability attack’ (SIA). |
Yigitcan Kaya; Muhammad Bilal Zafar; Sergul Aydore; Nathalie Rauschmayr; Krishnaram Kenthapadi; |
481 | Secure Quantized Training for Deep Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We implement training of neural networks in secure multi-party computation (MPC) using quantization commonly used in said setting. |
Marcel Keller; Ke Sun; |
482 | A Convergent and Dimension-Independent Min-Max Optimization Algorithm Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We study a variant of a recently introduced min-max optimization framework where the max-player is constrained to update its parameters in a greedy manner until it reaches a first-order stationary point. |
Vijay Keswani; Oren Mangoubi; Sushant Sachdeva; Nisheeth K. Vishnoi; |
483 | Neural Network Poisson Models for Behavioural and Neural Spike Train Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Predominant modeling methods apply rather disjoint techniques to these scales; by contrast, we suggest an end-to-end model which exploits recent developments of flexible, but tractable, neural network point-process models to characterize dependencies between stimuli, actions, and neural data. |
Moein Khajehnejad; Forough Habibollahi; Richard Nock; Ehsan Arabzadeh; Peter Dayan; Amir Dezfouli; |
484 | Federated Reinforcement Learning: Linear Speedup Under Markovian Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we consider a federated reinforcement learning framework where multiple agents collaboratively learn a global model, without sharing their individual data and policies. |
Sajad Khodadadian; Pranay Sharma; Gauri Joshi; Siva Theja Maguluri; |
485 | Multi-Level Branched Regularization for Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To alleviate the limitations, we propose a novel architectural regularization technique that constructs multiple auxiliary branches in each local model by grafting local and global subnetworks at several different levels and that learns the representations of the main pathway in the local model congruent to the auxiliary hybrid pathways via online knowledge distillation. |
Jinkyu Kim; Geeho Kim; Bohyung Han; |
486 | Learning Fair Representation with A Parametric Integral Probability Metric Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new adversarial training scheme for LFR, where the integral probability metric (IPM) with a specific parametric family of discriminators is used. |
Dongha Kim; Kunwoong Kim; Insung Kong; Ilsang Ohn; Yongdai Kim; |
487 | Dataset Condensation Via Efficient Synthetic-Data Parameterization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a novel condensation framework that generates multiple synthetic data with a limited storage budget via efficient parameterization considering data regularity. |
Jang-Hyun Kim; Jinuk Kim; Seong Joon Oh; Sangdoo Yun; Hwanjun Song; Joonhyun Jeong; Jung-Woo Ha; Hyun Oh Song; |
488 | Guided-TTS: A Diffusion Model for Text-to-Speech Via Classifier Guidance Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose Guided-TTS, a high-quality text-to-speech (TTS) model that does not require any transcript of the target speaker, using classifier guidance. |
Heeseung Kim; Sungwon Kim; Sungroh Yoon; |
489 | Variational On-the-Fly Personalization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel personalization method, Variational On-the-Fly Personalization. |
Jangho Kim; Jun-Tae Lee; Simyung Chang; Nojun Kwak; |
490 | Fisher SAM: Information Geometry and Sharpness Aware Minimisation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we consider the information geometry of the model parameter space when defining the neighborhood, namely replacing SAM’s Euclidean balls with ellipsoids induced by the Fisher information. |
Minyoung Kim; Da Li; Shell X Hu; Timothy Hospedales; |
491 | ViT-NeT: Interpretable Vision Transformers with Neural Tree Decoder Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we propose a new ViT neural tree decoder (ViT-NeT). |
Sangwon Kim; Jaeyeal Nam; Byoung Chul Ko; |
492 | Sanity Simulations for Saliency Methods Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we design a synthetic benchmarking framework, SMERF, that allows us to perform ground-truth-based evaluation while controlling the complexity of the model’s reasoning. |
Joon Sik Kim; Gregory Plumb; Ameet Talwalkar; |
493 | Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: For successful training, therefore, we introduce Soft Truncation, a universally applicable training technique for diffusion models, that softens the fixed and static truncation hyperparameter into a random variable. |
Dongjun Kim; Seungjae Shin; Kyungwoo Song; Wanmo Kang; Il-Chul Moon; |
494 | Rotting Infinitely Many-Armed Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate $\varrho = o(1)$. |
Jung-Hun Kim; Milan Vojnovic; Se-Young Yun; |
495 | Accelerated Gradient Methods for Geodesically Convex Optimization: Tractable Algorithms and Convergence Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose computationally tractable accelerated first-order methods for Riemannian optimization, extending the Nesterov accelerated gradient (NAG) method. |
Jungbin Kim; Insoon Yang; |
496 | Generalizing to New Physical Systems Via Context-Informed Dynamics Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Data-driven approaches to modeling physical systems fail to generalize to unseen systems that share the same general dynamics with the learning domain, but correspond to different physical contexts. We propose a new framework for this key problem, context-informed dynamics adaptation (CoDA), which takes into account the distributional shift across systems for fast and efficient adaptation to new dynamics. |
Matthieu Kirchmeyer; Yuan Yin; Jeremie Dona; Nicolas Baskiotis; Alain Rakotomamonjy; Patrick Gallinari; |
497 | SoQal: Selective Oracle Questioning for Consistency Based Active Learning of Cardiac Signals Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: One way to mitigate this burden is via active learning (AL) which involves the (a) acquisition and (b) annotation of informative unlabelled instances. Whereas previous work addresses either one of these elements independently, we propose an AL framework that addresses both. |
Dani Kiyasseh; Tingting Zhu; David A Clifton; |
498 | Curriculum Reinforcement Learning Via Constrained Optimal Transport Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we focus on the idea of framing curricula as interpolations between task distributions, which has previously been shown to be a viable approach to CRL. |
Pascal Klink; Haoyi Yang; Carlo D’Eramo; Jan Peters; Joni Pajarinen; |
499 | Exploiting Redundancy: Separable Group Convolutional Networks on Lie Groups Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we investigate the properties of representations learned by regular G-CNNs, and show considerable parameter redundancy in group convolution kernels. |
David M. Knigge; David W Romero; Erik J Bekkers; |
500 | Revisiting Contrastive Learning Through The Lens of Neighborhood Component Analysis: An Integrated Framework Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: By investigating the connection between contrastive learning and neighborhood component analysis (NCA), we provide a novel stochastic nearest neighbor viewpoint of contrastive learning and subsequently propose a series of contrastive losses that outperform the existing ones. |
Ching-Yun Ko; Jeet Mohapatra; Sijia Liu; Pin-Yu Chen; Luca Daniel; Lily Weng; |
501 | Transfer Learning In Differential Privacy’s Hybrid-Model Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here we study the problem of machine learning in the hybrid-model where the $n$ individuals in the curator’s dataset are drawn from a different distribution than the one of the general population (the local-agents). |
Refael Kohen; Or Sheffet; |
502 | Markov Chain Monte Carlo for Continuous-Time Switching Dynamical Systems Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel inference algorithm utilizing a Markov Chain Monte Carlo approach. |
Lukas Köhs; Bastian Alt; Heinz Koeppl; |
503 | Partial Disentanglement for Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Given the theoretical insights, we propose a practical domain adaptation framework, called iMSDA. |
Lingjing Kong; Shaoan Xie; Weiran Yao; Yujia Zheng; Guangyi Chen; Petar Stojanov; Victor Akinwande; Kun Zhang; |
504 | Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: With a general feedback graph, the observation of an arm may not be available when this arm is pulled, which makes the exploration more expensive and the algorithms more challenging to perform optimally in both environments. In this work, we overcome this difficulty by a new trade-off mechanism with a carefully-designed proportion for exploration and exploitation. |
Fang Kong; Yichi Zhou; Shuai Li; |
505 | Adaptive Data Analysis with Correlated Observations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We embark on a systematic study of the possibilities of adaptive data analysis with correlated observations. |
Aryeh Kontorovich; Menachem Sadigurschi; Uri Stemmer; |
506 | Controlling Conditional Language Models Without Catastrophic Forgetting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we extend DPG to conditional tasks by proposing Conditional DPG (CDPG). |
Tomasz Korbak; Hady Elsahar; German Kruszewski; Marc Dymetman; |
507 | Batch Greenkhorn Algorithm for Entropic-Regularized Multimarginal Optimal Transport: Linear Rate of Convergence and Iteration Complexity Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work we propose a batch multimarginal version of the Greenkhorn algorithm for the entropic-regularized optimal transport problem. |
Vladimir R. Kostic; Saverio Salzo; Massimiliano Pontil; |
508 | Certified Adversarial Robustness Under The Bounded Support Set Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we generalize the $f$-divergence-based framework to a Wasserstein-distance-based and total-variation-distance-based framework that is first able to analyze robustness properties of bounded support set smoothing measures both theoretically and experimentally. |
Yiwen Kou; Qinyuan Zheng; Yisen Wang; |
509 | Exact Learning of Preference Structure: Single-peaked Preferences and Beyond Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We consider the setting where the members of a society (voters) have preferences over candidates, and the candidates can be ordered on an axis so that the voters’ preferences are single-peaked on this axis. |
Sonja Kraiczy; Edith Elkind; |
510 | Reconstructing Nonlinear Dynamical Systems from Multi-Modal Time Series Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here we propose a general framework for multi-modal data integration for the purpose of nonlinear DS reconstruction and the analysis of cross-modal relations. |
Daniel Kramer; Philine L Bommer; Daniel Durstewitz; Carlo Tombolini; Georgia Koppe; |
511 | Probabilistic ODE Solutions in Millions of Dimensions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we explain the mathematical assumptions and detailed implementation schemes behind solving high-dimensional ODEs with a probabilistic numerical algorithm. |
Nicholas Krämer; Nathanael Bosch; Jonathan Schmidt; Philipp Hennig; |
512 | Active Nearest Neighbor Regression Through Delaunay Refinement Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce an algorithm for active function approximation based on nearest neighbor regression. |
Alexander Kravberg; Giovanni Luca Marchetti; Vladislav Polianskii; Anastasiia Varava; Florian T. Pokorny; Danica Kragic; |
513 | Functional Generalized Empirical Likelihood Estimation for Conditional Moment Restrictions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To benefit from recent developments in machine learning, we provide a functional reformulation of GEL in which arbitrary models can be leveraged. Motivated by a dual formulation of the resulting infinite dimensional optimization problem, we devise a practical method and explore its asymptotic properties. |
Heiner Kremer; Jia-Jie Zhu; Krikamol Muandet; Bernhard Schölkopf; |
514 | Calibrated and Sharp Uncertainties in Deep Learning Via Density Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a simple training procedure based on recalibration that yields calibrated models without sacrificing overall performance; unlike previous approaches, ours ensures the most general property of distribution calibration and applies to any model, including neural networks. |
Volodymyr Kuleshov; Shachi Deshpande; |
515 | ActiveHedge: Hedge Meets Active Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We consider the classical problem of multi-class prediction with expert advice, but with an active learning twist. |
Bhuvesh Kumar; Jacob D Abernethy; Venkatesh Saligrama; |
516 | Balancing Discriminability and Transferability for Source-Free Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Upon analyzing the hurdles from both theoretical and empirical standpoints, we derive novel insights to show that a mixup between original and corresponding translated generic samples enhances the discriminability-transferability trade-off while duly respecting the privacy-oriented source-free setting. |
Jogendra Nath Kundu; Akshay R Kulkarni; Suvaansh Bhambri; Deepesh Mehta; Shreyas Anand Kulkarni; Varun Jampani; Venkatesh Babu Radhakrishnan; |
517 | Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we argue for the importance of an online evaluation budget for a reliable comparison of deep offline RL algorithms. |
Vladislav Kurenkov; Sergey Kolesnikov; |
518 | Equivariant Priors for Compressed Sensing with Unknown Orientation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Additionally, in many scenarios, the signal has an unknown orientation prior to measurements. To address such recovery problems, we propose using equivariant generative models as a prior, which encapsulate orientation information in their latent space. |
Anna Kuzina; Kumar Pratik; Fabio Valerio Massoli; Arash Behboodi; |
519 | Coordinated Attacks Against Contextual Bandits: Fundamental Limits and Defense Mechanisms Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Motivated by online recommendation systems, we propose the problem of finding the optimal policy in multitask contextual bandits when a small fraction $\alpha < 1/2$ of tasks (users) are arbitrary and adversarial. |
Jeongyeol Kwon; Yonathan Efroni; Constantine Caramanis; Shie Mannor; |
520 | Large Batch Experience Replay Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we cast the replay buffer sampling problem as an importance sampling one for estimating the gradient. |
Thibault Lahire; Matthieu Geist; Emmanuel Rachelson; |
521 | FedScale: Benchmarking Model and System Performance of Federated Learning at Scale Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present FedScale, a federated learning (FL) benchmarking suite with realistic datasets and a scalable runtime to enable reproducible FL research. |
Fan Lai; Yinwei Dai; Sanjay Singapuram; Jiachen Liu; Xiangfeng Zhu; Harsha Madhyastha; Mosharaf Chowdhury; |
522 | Smoothed Adaptive Weighting for Imbalanced Semi-Supervised Learning: Improve Reliability Against Unknown Distribution Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Based on this study, we propose a self-adaptive algorithm, named Smoothed Adaptive Weighting (SAW). |
Zhengfeng Lai; Chao Wang; Henrry Gunawan; Sen-Ching S Cheung; Chen-Nee Chuah; |
523 | Functional Output Regression with Infimal Convolution: Exploring The Huber and $\epsilon$-insensitive Losses Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We derive computationally tractable algorithms relying on duality to tackle the resulting tasks in the context of vector-valued reproducing kernel Hilbert spaces. |
Alex Lambert; Dimitri Bouche; Zoltan Szabo; Florence d'Alché-Buc; |
524 | Tell Me Why! Explanations Support Learning Relational and Causal Structure Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here, we show that language can play a similar role for deep RL agents in complex environments. |
Andrew K Lampinen; Nicholas Roy; Ishita Dasgupta; Stephanie Cy Chan; Allison Tam; James Mcclelland; Chen Yan; Adam Santoro; Neil C Rabinowitz; Jane Wang; Felix Hill; |
525 | Generative Cooperative Networks for Natural Language Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce Generative Cooperative Networks, in which the discriminator architecture is cooperatively used along with the generation policy to output samples of realistic texts for the task at hand. |
Sylvain Lamprier; Thomas Scialom; Antoine Chaffin; Vincent Claveau; Ewa Kijak; Jacopo Staiano; Benjamin Piwowarski; |
526 | DSTAGNN: Dynamic Spatial-Temporal Aware Graph Neural Network for Traffic Flow Forecasting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a novel Dynamic Spatial-Temporal Aware Graph Neural Network (DSTAGNN) to model the complex spatial-temporal interactions in road networks. |
Shiyong Lan; Yitong Ma; Weikang Huang; Wenwu Wang; Hongyu Yang; Pyang Li; |
527 | Cooperative Online Learning in Stochastic and Adversarial MDPs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study cooperative online learning in stochastic and adversarial Markov decision processes (MDPs). |
Tal Lancewicki; Aviv Rosenberg; Yishay Mansour; |
528 | PINs: Progressive Implicit Networks for Multi-Scale Neural Representations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, scenes with a wide frequency spectrum remain a challenge: choosing high frequencies for positional encoding introduces noise in low-structure areas, while low frequencies result in poor fitting of detailed regions. To address this, we propose a progressive positional encoding, exposing a hierarchical MLP structure to incremental sets of frequency encodings. |
Zoe Landgraf; Alexander Sorkine Hornung; Ricardo S Cabral; |
529 | Co-training Improves Prompt-based Learning for Large Language Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We demonstrate that co-training (Blum & Mitchell, 1998) can improve the performance of prompt-based learning by using unlabeled data. |
Hunter Lang; Monica N Agrawal; Yoon Kim; David Sontag; |
530 | Goal Misgeneralization in Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We study goal misgeneralization, a type of out-of-distribution robustness failure in reinforcement learning (RL). |
Lauro Langosco Di Langosco; Jack Koch; Lee D Sharkey; Jacob Pfau; David Krueger; |
531 | Marginal Tail-Adaptive Normalizing Flows Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we focus on improving the ability of normalizing flows to correctly capture the tail behavior and, thus, form more accurate models. |
Mike Laszkiewicz; Johannes Lederer; Asja Fischer; |
532 | Bregman Proximal Langevin Monte Carlo Via Bregman-Moreau Envelopes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose efficient Langevin Monte Carlo algorithms for sampling distributions with nonsmooth convex composite potentials, which is the sum of a continuously differentiable function and a possibly nonsmooth function. |
Tim Tsz-Kit Lau; Han Liu; |
533 | Scalable Deep Reinforcement Learning Algorithms for Mean Field Games Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This is far from being trivial in the case of non-linear function approximators that enjoy good generalization properties, e.g. neural networks. We propose two methods to address this shortcoming. |
Mathieu Lauriere; Sarah Perrin; Sertan Girgin; Paul Muller; Ayush Jain; Theophile Cabannes; Georgios Piliouras; Julien Perolat; Romuald Elie; Olivier Pietquin; Matthieu Geist; |
534 | Implicit Bias of Linear Equivariant Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this context, we show that L-layer full-width linear G-CNNs trained via gradient descent for binary classification converge to solutions with low-rank Fourier matrix coefficients, regularized by the 2/L-Schatten matrix norm. |
Hannah Lawrence; Bobak Kiani; Kristian G Georgiev; Andrew K Dienes; |
535 | Differentially Private Maximal Information Coefficients Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As a solution, we present algorithms to approximate MIC in a way that provides differential privacy. |
John Lazarsfeld; Aaron Johnson; Emmanuel Adeniran; |
536 | Entropic Gromov-Wasserstein Between Gaussian Distributions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the entropic Gromov-Wasserstein and its unbalanced version between (unbalanced) Gaussian distributions with different dimensions. |
Khang Le; Dung Q Le; Huy Nguyen; Dat Do; Tung Pham; Nhat Ho; |
537 | Neurocoder: General-Purpose Computation Using Stored Neural Programs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here we design Neurocoder, a new class of general-purpose neural networks in which the neural network “codes” itself in a data-responsive way by composing relevant programs from a set of shareable, modular programs stored in external memory. |
Hung Le; Svetha Venkatesh; |
538 | Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in The Mean-Field Regime Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the global convergence of policy gradient for infinite-horizon, continuous state and action space, and entropy-regularized Markov decision processes (MDPs). |
James-Michael Leahy; Bekzhan Kerimkulov; David Siska; Lukasz Szpruch; |
539 | A Random Matrix Analysis of Data Stream Clustering: Coping With Limited Memory Resources Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This article introduces a random matrix framework for the analysis of clustering on high-dimensional data streams, a particularly relevant setting for a more sober processing of large amounts of data with limited memory and energy resources. |
Hugo Lebeau; Romain Couillet; Florent Chatelain; |
540 | Neural Tangent Kernel Analysis of Deep Narrow Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present the first trainability guarantee of infinitely deep but narrow neural networks. |
Jongmin Lee; Joo Young Choi; Ernest K Ryu; Albert No; |
541 | Dataset Condensation with Contrastive Signals Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We attribute this to the lack of participation of the contrastive signals between the classes resulting from the class-wise gradient matching strategy. To address this problem, we propose Dataset Condensation with Contrastive signals (DCC) by modifying the loss function to enable the DC methods to effectively capture the differences between classes. |
Saehyung Lee; Sanghyuk Chun; Sangwon Jung; Sangdoo Yun; Sungroh Yoon; |
542 | Confidence Score for Source-Free Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To differentiate between sample importance, in this study, we propose a novel sample-wise confidence score, the Joint Model-Data Structure (JMDS) score for SFUDA. |
Jonghyun Lee; Dahuin Jung; Junho Yim; Sungroh Yoon; |
543 | A Statistical Manifold Framework for Point Cloud Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: A growing number of applications require a means of measuring not only distances between point clouds, but also angles, volumes, derivatives, and other more advanced concepts. To formulate and quantify these concepts in a coordinate-invariant way, we develop a Riemannian geometric framework for point cloud data. |
Yonghyeon Lee; Seungyeon Kim; Jinwon Choi; Frank Park; |
544 | Low-Complexity Deep Convolutional Neural Networks on Fully Homomorphic Encryption Using Multiplexed Parallel Convolutions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To improve the performance, we first minimize total bootstrapping runtime using multiplexed parallel convolution that collects sparse output data for multiple channels compactly. We also propose the imaginary-removing bootstrapping to prevent the deep neural networks from catastrophic divergence during approximate ReLU operations. |
Eunsang Lee; Joon-Woo Lee; Junghyun Lee; Young-Sik Kim; Yongjune Kim; Jong-Seon No; Woosuk Choi; |
545 | Statistical Inference with Implicit SGD: Proximal Robbins-Monro Vs. Polyak-Ruppert Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we conduct an in-depth analysis of the two modes of ISGD for smooth convex functions, namely proximal Robbins-Monro (proxRM) and proximal Poylak-Ruppert (proxPR) procedures, for their use in statistical inference on model parameters. |
Yoonhyung Lee; Sungdong Lee; Joong-Ho Won; |
546 | Maslow’s Hammer in Catastrophic Forgetting: Node Re-Use Vs. Node Activation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Surprisingly, the amount of forgetting does not increase with the dissimilarity between the learned tasks, but appears to be worst in an intermediate similarity regime. In this paper we theoretically analyse both a synthetic teacher-student framework and a real data setup to provide an explanation of this phenomenon that we name Maslow’s Hammer hypothesis. |
Sebastian Lee; Stefano Sarao Mannelli; Claudia Clopath; Sebastian Goldt; Andrew Saxe; |
547 | Query-Efficient and Scalable Black-Box Adversarial Attacks on Discrete Sequential Data Via Bayesian Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce block decomposition and history subsampling techniques to improve the scalability of Bayesian optimization when an input sequence becomes long. |
Deokjae Lee; Seungyong Moon; Junhyeok Lee; Hyun Oh Song; |
548 | Least Squares Estimation Using Sketched Data with Heteroskedastic Errors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper considers the case when the regression errors do not have constant variance and heteroskedasticity robust standard errors would normally be needed for test statistics to provide accurate inference. |
Sokbae Lee; Serena Ng; |
549 | Why The Rich Get Richer? On The Balancedness of Random Partition Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a principled way to compare the balancedness of random partition models, which gives a better understanding of what model works better and what doesn’t for different applications. |
Changwoo J Lee; Huiyan Sang; |
550 | Model Selection in Batch Policy Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the problem of model selection in batch policy optimization: given a fixed, partial-feedback dataset and M model classes, learn a policy with performance that is competitive with the policy derived from the best model class. |
Jonathan Lee; George Tucker; Ofir Nachum; Bo Dai; |
551 | Supervised Learning with General Risk Functionals Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We establish the first uniform convergence results for estimating the CDF of the loss distribution, which yield uniform convergence guarantees that hold simultaneously both over a class of Hölder risk functionals and over a hypothesis class. |
Liu Leqi; Audrey Huang; Zachary Lipton; Kamyar Azizzadenesheli; |
552 | Generalized Strategic Classification and The Case of Aligned Incentives Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work we argue for a broader perspective on what accounts for strategic user behavior, and propose and study a flexible model of generalized strategic classification. |
Sagi Levanon; Nir Rosenfeld; |
553 | A Simple Unified Framework for High Dimensional Bandit Problems Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Stochastic high dimensional bandit problems with low dimensional structures are useful in different applications such as online advertising and drug discovery. In this work, we propose a simple unified algorithm for such problems and present a general analysis framework for the regret upper bound of our algorithm. |
Wenjie Li; Adarsh Barik; Jean Honorio; |
554 | Robust Training of Neural Networks Using Scale Invariant Architectures Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the use of adaptivity not only comes at the cost of extra memory but also raises the fundamental question: can non-adaptive methods like SGD enjoy similar benefits? In this paper, we provide an affirmative answer to this question by proposing to achieve both robust and memory-efficient training via the following general recipe: (1) modify the architecture and make it scale invariant, (2) train with SGD and weight decay, and optionally (3) clip the global gradient norm proportional to weight norm multiplied by $\sqrt{\frac{2\lambda}{\eta}}$, where $\eta$ is learning rate and $\lambda$ is weight decay. |
Zhiyuan Li; Srinadh Bhojanapalli; Manzil Zaheer; Sashank Reddi; Sanjiv Kumar; |
555 | Spatial-Channel Token Distillation for Vision MLPs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work solves the problem from a novel knowledge distillation perspective. We propose a novel Spatial-channel Token Distillation (STD) method, which improves the information mixing in the two dimensions by introducing distillation tokens to each of them. |
Yanxi Li; Xinghao Chen; Minjing Dong; Yehui Tang; Yunhe Wang; Chang Xu; |
556 | An Analytical Update Rule for General Policy Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present an analytical policy update rule that is independent of parametric function approximators. |
Hepeng Li; Nicholas Clavette; Haibo He; |
557 | On Convergence of Gradient Descent Ascent: A Tight Local Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While this stepsize ratio suggests a slow training of the min player, practical GAN algorithms typically adopt similar stepsizes for both variables, indicating a wide gap between theoretical and empirical results. In this paper, we aim to bridge this gap by analyzing the local convergence of general nonconvex-nonconcave minimax problems. |
Haochuan Li; Farzan Farnia; Subhro Das; Ali Jadbabaie; |
558 | On The Finite-Time Performance of The Knowledge Gradient Algorithm Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this research, we present new theoretical results about the finite-time performance of the KG algorithm. |
Yanwen Li; Siyang Gao; |
559 | Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel phasic solution by alternating online RL and offline SL for tackling sparse-reward goal-conditioned problems. In the online phase, we perform RL training and collect rollout data, while in the offline phase, we perform SL on those successful trajectories from the dataset. |
Yunfei Li; Tian Gao; Jiaqi Yang; Huazhe Xu; Yi Wu; |
560 | G$^2$CN: Graph Gaussian Convolution Networks with Concentrated Graph Filters Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Nevertheless, we notice that existing spectral analysis fails to explain why existing graph propagations with the same global tendency, such as low-pass or high-pass, still yield very different results. Motivated by this situation, we develop a new framework for spectral analysis in this paper called concentration analysis. |
Mingjie Li; Xiaojun Guo; Yifei Wang; Yisen Wang; Zhouchen Lin; |
561 | Decomposing Temporal High-Order Interactions Via Latent ODEs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: As a result, these methods might not be capable enough of capturing complex, fine-grained temporal dynamics or making accurate predictions for long-term interaction results. To overcome these limitations, we propose a novel Temporal High-order Interaction decompoSition model based on Ordinary Differential Equations (THIS-ODE). |
Shibo Li; Robert Kirby; Shandian Zhe; |
562 | Neural Inverse Transform Sampler Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we show that when modeling one-dimensional conditional densities with a neural network, $Z$ can be exactly and efficiently computed by letting the network represent the cumulative distribution function of a target density, and applying a generalized fundamental theorem of calculus. |
Henry Li; Yuval Kluger; |
563 | PLATINUM: Semi-Supervised Model Agnostic Meta-Learning Using Submodular Mutual Information Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose PLATINUM (semi-suPervised modeL Agnostic meTa learnIng usiNg sUbmodular Mutual information), a novel semi-supervised model agnostic meta learning framework that uses the submodular mutual information (SMI) functions to boost the performance of FSC. |
Changbin Li; Suraj Kothawade; Feng Chen; Rishabh Iyer; |
564 | Deconfounded Value Decomposition for Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we investigate VD from a novel perspective of causal inference. |
Jiahui Li; Kun Kuang; Baoxiang Wang; Furui Liu; Long Chen; Changjie Fan; Fei Wu; Jun Xiao; |
565 | C-MinHash: Improving Minwise Hashing with Circulant Permutation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose Circulant MinHash (C-MinHash) and provide the surprising theoretical results that using only two independent random permutations in a circulant manner leads to uniformly smaller Jaccard estimation variance than that of the classical MinHash with K independent permutations. |
Xiaoyun Li; Ping Li; |
566 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. |
Junnan Li; Dongxu Li; Caiming Xiong; Steven Hoi; |
567 | Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in The $O(\epsilon^{-7/4})$ Complexity Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper studies the accelerated gradient descent for general nonconvex problems under the gradient Lipschitz and Hessian Lipschitz assumptions. |
Huan Li; Zhouchen Lin; |
568 | Achieving Fairness at No Utility Cost Via Data Reweighing with Influence Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we focus on the pre-processing aspect for achieving fairness, and propose a data reweighing approach that only adjusts the weight for samples in the training phase. |
Peizhao Li; Hongfu Liu; |
569 | High Probability Guarantees for Nonconvex Stochastic Gradient Descent with Heavy Tails Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we develop high probability bounds for nonconvex SGD with a joint perspective of optimization and generalization performance. |
Shaojie Li; Yong Liu; |
570 | MetAug: Contrastive Learning Via Meta Feature Augmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In response, we propose to directly augment the features in latent space, thereby learning discriminative representations without a large amount of input data. |
Jiangmeng Li; Wenwen Qiang; Changwen Zheng; Bing Su; Hui Xiong; |
571 | PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, we reveal sub-optimal collaborative behaviors also emerge with strong correlations, and simply maximizing the MI can, surprisingly, hinder the learning towards better collaboration. To address this issue, we propose a novel MARL framework, called Progressive Mutual Information Collaboration (PMIC), for more effective MI-driven collaboration. |
Pengyi Li; Hongyao Tang; Tianpei Yang; Xiaotian Hao; Tong Sang; Yan Zheng; Jianye Hao; Matthew E. Taylor; Wenyuan Tao; Zhen Wang; |
572 | CerDEQ: Certifiable Deep Equilibrium Model Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we aim to tackle the problem of DEQ’s certified training. |
Mingjie Li; Yisen Wang; Zhouchen Lin; |
573 | Generalization Guarantee of Training Graph Convolutional Networks with Graph Topology Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To the best of our knowledge, this paper provides the first theoretical justification of graph topology sampling in training (up to) three-layer GCNs for semi-supervised node classification. |
Hongkang Li; Meng Wang; Sijia Liu; Pin-Yu Chen; Jinjun Xiong; |
574 | Let Invariant Rationale Discovery Inspire Graph Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Taking an invariance look at GCL, we argue that a high-performing augmentation should preserve the salient semantics of anchor graphs regarding instance-discrimination. To this end, we relate GCL with invariant rationale discovery, and propose a new framework, Rationale-aware Graph Contrastive Learning (RGCL). |
Sihang Li; Xiang Wang; An Zhang; Yingxin Wu; Xiangnan He; Tat-Seng Chua; |
575 | Difference Advantage Estimation for Multi-Agent Policy Gradients Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we investigate multi-agent credit assignment induced by reward shaping and provide a theoretical understanding in terms of its credit assignment and policy bias. |
Yueheng Li; Guangming Xie; Zongqing Lu; |
576 | Private Adaptive Optimization with Side Information Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose AdaDPS, a general framework that uses non-sensitive side information to precondition the gradients, allowing the effective use of adaptive methods in private settings. |
Tian Li; Manzil Zaheer; Sashank Reddi; Virginia Smith; |
577 | Permutation Search of Tensor Network Structures Via Local Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we consider a practical variant of TN-SS, dubbed TN permutation search (TN-PS), in which we search for good mappings from tensor modes onto TN vertices (core tensors) for compact TN representations. |
Chao Li; Junhua Zeng; Zerui Tao; Qibin Zhao; |
578 | Hessian-Free High-Resolution Nesterov Acceleration For Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Nesterov’s Accelerated Gradient (NAG) for optimization has better performance than its continuous time limit (noiseless kinetic Langevin) when a finite step-size is employed (Shi et al., 2021). This work explores the sampling counterpart of this phenomenon and proposes a diffusion process, whose discretizations can yield accelerated gradient-based MCMC methods. |
Ruilin Li; Hongyuan Zha; Molei Tao; |
579 | Double Sampling Randomized Smoothing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We instantiate DSRS for a generalized family of Gaussian smoothing and propose an efficient and sound computing method based on customized dual optimization considering sampling error. |
Linyi Li; Jiawei Zhang; Tao Xie; Bo Li; |
580 | HousE: Knowledge Graph Embedding with Householder Parameterization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a more powerful KGE framework named HousE, which involves a novel parameterization based on two kinds of Householder transformations: (1) Householder rotations to achieve superior capacity of modeling relation patterns; (2) Householder projections to handle sophisticated relation mapping properties. |
Rui Li; Jianan Zhao; Chaozhuo Li; Di He; Yiqi Wang; Yuming Liu; Hao Sun; Senzhang Wang; Weiwei Deng; Yanming Shen; Xing Xie; Qi Zhang; |
581 | Learning Multiscale Transformer Models for Sequence Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We built a multiscale Transformer model by establishing relationships among scales based on word-boundary information and phrase-level prior knowledge. |
Bei Li; Tong Zheng; Yi Jing; Chengbo Jiao; Tong Xiao; Jingbo Zhu; |
582 | Finding Global Homophily in Graph Neural Networks When Meeting Heterophily Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Further, homophilous nodes outside the neighborhood are ignored during information aggregation. To address these problems, we propose two models, GloGNN and GloGNN++, which generate a node’s embedding by aggregating information from global nodes in the graph. |
Xiang Li; Renyu Zhu; Yao Cheng; Caihua Shan; Siqiang Luo; Dongsheng Li; Weining Qian; |
583 | Fat-Tailed Variational Inference with Anisotropic Tail Adaptive Flows Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In doing so, we unveil a fundamental problem which plagues many existing flow-based methods: they can only model tail-isotropic distributions (i.e., distributions having the same tail parameter in every direction). To mitigate this and enable modeling of tail-anisotropic targets, we propose anisotropic tail-adaptive flows (ATAF). |
Feynman Liang; Michael Mahoney; Liam Hodgkinson; |
584 | Exploring and Exploiting Hubness Priors for High-Quality GAN Latent Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel GAN latent sampling method by exploring and exploiting the hubness priors of GAN latent distributions. |
Yuanbang Liang; Jing Wu; Yu-Kun Lai; Yipeng Qin; |
585 | Reducing Variance in Temporal-Difference Value Estimation Via Ensemble of Deep Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose MeanQ, a simple ensemble method that estimates target values as ensemble means. |
Litian Liang; Yaosheng Xu; Stephen Mcaleer; Dailin Hu; Alexander Ihler; Pieter Abbeel; Roy Fox; |
586 | TSPipe: Learn from Teacher Faster with Pipelines Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents TSPipe, a pipelined approach to accelerate the training process of any TS frameworks including KD and SSL. |
Hwijoon Lim; Yechan Kim; Sukmin Yun; Jinwoo Shin; Dongsu Han; |
587 | Order Constraints in Optimal Transport Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce novel order constraints into the optimal transport formulation to allow for the incorporation of structure. |
Yu Chin Fabian Lim; Laura Wynter; Shiau Hong Lim; |
588 | Flow-Guided Sparse Transformer for Video Deblurring Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel framework, Flow-Guided Sparse Transformer (FGST), for video deblurring. |
Jing Lin; Yuanhao Cai; Xiaowan Hu; Haoqian Wang; Youliang Yan; Xueyi Zou; Henghui Ding; Yulun Zhang; Radu Timofte; Luc Van Gool; |
589 | Federated Learning with Positive and Unlabeled Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Therefore, existing PU learning methods can hardly be applied in this situation. To address this problem, we propose a novel framework, namely Federated learning with Positive and Unlabeled data (FedPU), to minimize the expected risk of multiple negative classes by leveraging the labeled data in other clients. |
Xinyang Lin; Hanting Chen; Yixing Xu; Chao Xu; Xiaolin Gui; Yiping Deng; Yunhe Wang; |
590 | Decentralized Online Convex Optimization in Networked Systems Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our work proposes a novel online algorithm, Localized Predictive Control (LPC), which generalizes predictive control to multi-agent systems. |
Yiheng Lin; Judy Gan; Guannan Qu; Yash Kanoria; Adam Wierman; |
591 | Unsupervised Flow-Aligned Sequence-to-Sequence Learning for Video Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: How to properly model the inter-frame relation within the video sequence is an important but unsolved challenge for video restoration (VR). In this work, we propose an unsupervised flow-aligned sequence-to-sequence model (S2SVR) to address this problem. |
Jing Lin; Xiaowan Hu; Yuanhao Cai; Haoqian Wang; Youliang Yan; Xueyi Zou; Yulun Zhang; Luc Van Gool; |
592 | Constrained Gradient Descent: A Powerful and Principled Evasion Attack Against Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose new, more efficient targeted white-box attacks against deep neural networks. |
Weiran Lin; Keane Lucas; Lujo Bauer; Michael K. Reiter; Mahmood Sharif; |
593 | Learning Augmented Binary Search Trees Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Given recent advances in algorithms with predictions, we propose pairing treaps with machine advice to form a learning-augmented treap. |
Honghao Lin; Tian Luo; David Woodruff; |
594 | Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Motivated by applications to online learning in sparse estimation and Bayesian optimization, we consider the problem of online unconstrained nonsubmodular minimization with delayed costs in both full information and bandit feedback settings. |
Tianyi Lin; Aldo Pacchiano; Yaodong Yu; Michael Jordan; |
595 | Measuring The Effect of Training Data on Deep Learning Predictions Via Randomized Experiments Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We develop a new, principled algorithm for estimating the contribution of training data points to the behavior of a deep learning model, such as a specific prediction it makes. |
Jinkun Lin; Anqi Zhang; Mathias Lécuyer; Jinyang Li; Aurojit Panda; Siddhartha Sen;
596 | Interactively Learning Preference Constraints in Linear Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We formalize the challenge of interactively learning about these constraints as a novel linear bandit problem which we call constrained linear best-arm identification. To solve this problem, we propose the Adaptive Constraint Learning (ACOL) algorithm. |
David Lindner; Sebastian Tschiatschek; Katja Hofmann; Andreas Krause; |
597 | Delayed Reinforcement Learning By Imitation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: When the agent’s observations or interactions are delayed, classic reinforcement learning tools usually fail. In this paper, we propose a simple, new, and efficient solution to this problem. |
Pierre Liotet; Davide Maran; Lorenzo Bisi; Marcello Restelli; |
598 | CITRIS: Causal Identifiability from Temporal Intervened Sequences Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose CITRIS, a variational autoencoder framework that learns causal representations from temporal sequences of images in which underlying causal factors have possibly been intervened upon. |
Phillip Lippe; Sara Magliacane; Sindy Löwe; Yuki M Asano; Taco Cohen; Stratis Gavves;
599 | StreamingQA: A Benchmark for Adaptation to New Knowledge Over Time in Question Answering Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To study how semi-parametric QA models and their underlying parametric language models (LMs) adapt to evolving knowledge, we construct a new large-scale dataset, StreamingQA, with human written and generated questions asked on a given date, to be answered from 14 years of time-stamped news articles. |
Adam Liska; Tomas Kocisky; Elena Gribovskaya; Tayfun Terzi; Eren Sezener; Devang Agrawal; Cyprien de Masson d'Autume; Tim Scholtes; Manzil Zaheer; Susannah Young; Ellen Gilsenan-Mcmahon; Sophia Austin; Phil Blunsom; Angeliki Lazaridou;
600 | Distributionally Robust $Q$-Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel distributionally robust $Q$-learning algorithm that learns the best policy in the worst distributional perturbation of the environment. |
Zijian Liu; Qinxun Bai; Jose Blanchet; Perry Dong; Wei Xu; Zhengqing Zhou; Zhengyuan Zhou; |
601 | Constrained Variational Policy Optimization for Safe Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a novel Expectation-Maximization approach to naturally incorporate constraints during the policy learning: 1) a provable optimal non-parametric variational distribution could be computed in closed form after a convex optimization (E-step); 2) the policy parameter is improved within the trust region based on the optimal variational distribution (M-step). |
Zuxin Liu; Zhepeng Cen; Vladislav Isenbaev; Wei Liu; Steven Wu; Bo Li; Ding Zhao; |
602 | Benefits of Overparameterized Convolutional Residual Networks: Function Approximation Under Smoothness Constraint Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Neural networks themselves, however, can be highly nonsmooth. To bridge this gap, we take convolutional residual networks (ConvResNets) as an example, and prove that large ConvResNets can not only approximate a target function in terms of function value, but also exhibit sufficient first-order smoothness. |
Hao Liu; Minshuo Chen; Siawpeng Er; Wenjing Liao; Tong Zhang; Tuo Zhao; |
603 | Boosting Graph Structure Learning with Dummy Nodes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we use a particular dummy node connecting to all existing vertices without affecting original vertex and edge properties. |
Xin Liu; Jiayang Cheng; Yangqiu Song; Xin Jiang; |
604 | Equivalence Analysis Between Counterfactual Regret Minimization and Online Mirror Descent Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we show that CFRs with Regret Matching and Regret Matching+ are equivalent to special cases of FTRL and OMD, respectively. |
Weiming Liu; Huacong Jiang; Bin Li; Houqiang Li; |
605 | Deep Probability Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address this, we build a synthetic dataset to study and compare different computable metrics. We evaluate existing methods on this synthetic data as well as on three real-world probability estimation tasks, all of which involve inherent uncertainty: precipitation forecasting from radar images, predicting cancer patient survival from histopathology images, and predicting car crashes from dashcam videos. |
Sheng Liu; Aakash Kaku; Weicheng Zhu; Matan Leibovich; Sreyas Mohan; Boyang Yu; Haoxiang Huang; Laure Zanna; Narges Razavian; Jonathan Niles-Weed; Carlos Fernandez-Granda; |
606 | Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose Gating Dropout, which allows tokens to ignore the gating network and stay at their local machines, thus reducing the cross-machine communication. |
Rui Liu; Young Jin Kim; Alexandre Muzio; Hany Hassan; |
607 | Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose simplex-NeuPL that satisfies two desiderata simultaneously: i) learning a population of strategically diverse basis policies, represented by a single conditional network; ii) using the same network, learn best-responses to any mixture over the simplex of basis policies. |
Siqi Liu; Marc Lanctot; Luke Marris; Nicolas Heess; |
608 | Rethinking Attention-Model Explainability Through Faithfulness Violation Test Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, in this paper, we find one critical limitation in attention explanations: weakness in identifying the polarity of feature impact. |
Yibing Liu; Haoliang Li; Yangyang Guo; Chenqi Kong; Jing Li; Shiqi Wang; |
609 | Optimization-Derived Learning with Essential Convergence Analysis of Training and Hyper-training Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we design a Generalized Krasnoselskii-Mann (GKM) scheme based on fixed-point iterations as our fundamental ODL module, which unifies existing ODL methods as special cases. |
Risheng Liu; Xuan Liu; Shangzhi Zeng; Jin Zhang; Yixuan Zhang; |
610 | Deep Neural Network Fusion Via Graph Matching with Applications to Model Ensemble and Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: For the rising problem scale and multi-model consistency issues, we propose an efficient graduated assignment-based model fusion method, dubbed GAMF, which iteratively updates the matchings in a consistency-maintaining manner. |
Chang Liu; Chenfei Lou; Runzhong Wang; Alan Yuhan Xi; Li Shen; Junchi Yan; |
611 | Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Working in a setting in which the utility function and the system dynamics are both unknown, we propose to find the socially optimal policy and the CE from data via both online and offline variants of MARL. |
Zhihan Liu; Miao Lu; Zhaoran Wang; Michael Jordan; Zhuoran Yang; |
612 | Generating 3D Molecules for Target Protein Binding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A fundamental problem in drug discovery is to design molecules that bind to specific proteins. To tackle this problem using machine learning methods, here we propose a novel and effective framework, known as GraphBP, to generate 3D molecules that bind to given proteins by placing atoms of specific types and locations to the given binding site one by one. |
Meng Liu; Youzhi Luo; Kanji Uchino; Koji Maruhashi; Shuiwang Ji; |
613 | Communication-efficient Distributed Learning for Large Batch Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose new gradient compression methods for large batch optimization, JointSpar and its variant JointSpar-LARS with layerwise adaptive learning rates, that jointly reduce both the computation and the communication cost. |
Rui Liu; Barzan Mozafari; |
614 | Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study the finite-sum convex optimization problem focusing on the general convex case. |
Zijian Liu; Ta Duy Nguyen; Alina Ene; Huy Nguyen; |
615 | REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we consider the problem of transferring a policy across two different robots with significantly different parameters such as kinematics and morphology. |
Xingyu Liu; Deepak Pathak; Kris Kitani; |
616 | Kill A Bird with Two Stones: Closing The Convergence Gaps in Non-Strongly Convex Optimization By Directly Accelerated SVRG with Double Compensation and Snapshots Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, there are still some gaps between the oracle complexities and their lower bounds. To fill in these gaps, this paper proposes a novel Directly Accelerated stochastic Variance reductIon (DAVIS) algorithm with two Snapshots for non-strongly convex (non-SC) unconstrained problems. |
Yuanyuan Liu; Fanhua Shang; Weixin An; Hongying Liu; Zhouchen Lin; |
617 | Learning Markov Games with Adversarial Opponents: Efficient Algorithms and Fundamental Limits Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address this problem, this work studies no-regret learning in Markov games with adversarial opponents when competing against the best fixed policy in hindsight. |
Qinghua Liu; Yuanhao Wang; Chi Jin; |
618 | Local Augmentation for Graph Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, it remains an open question whether the neighborhood information is adequately aggregated for learning representations of nodes with few neighbors. To address this, we propose a simple and efficient data augmentation strategy, local augmentation, to learn the distribution of the node representations of the neighbors conditioned on the central node’s representation and enhance GNN’s expressive power with generated features. |
Songtao Liu; Rex Ying; Hanze Dong; Lanqing Li; Tingyang Xu; Yu Rong; Peilin Zhao; Junzhou Huang; Dinghao Wu; |
619 | Asking for Knowledge (AFK): Training RL Agents to Query External Knowledge Using Language Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In order to study how agents can be taught to query external knowledge via language, we first introduce two new environments: the grid-world-based Q-BabyAI and the text-based Q-TextWorld. In addition to physical interactions, an agent can query an external knowledge source specialized for these environments to gather information. Second, we propose the ‘Asking for Knowledge’ (AFK) agent, which learns to generate language commands to query for meaningful knowledge that helps solve the tasks. |
Iou-Jen Liu; Xingdi Yuan; Marc-Alexandre Côté; Pierre-Yves Oudeyer; Alexander Schwing;
620 | Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study GAIL in both online and offline settings with linear function approximation, where both the transition and reward function are linear in the feature maps. |
Zhihan Liu; Yufeng Zhang; Zuyue Fu; Zhuoran Yang; Zhaoran Wang; |
621 | GACT: Activation Compressed Training for Generic Network Architectures Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents GACT, an ACT framework to support a broad range of machine learning tasks for generic NN architectures with limited domain knowledge. |
Xiaoxuan Liu; Lianmin Zheng; Dequan Wang; Yukuo Cen; Weize Chen; Xu Han; Jianfei Chen; Zhiyuan Liu; Jie Tang; Joey Gonzalez; Michael Mahoney; Alvin Cheung; |
622 | Robust Training Under Label Noise By Over-parameterization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted. |
Sheng Liu; Zhihui Zhu; Qing Qu; Chong You; |
623 | Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning Via Decoupled Policy Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce Decoupled Policy Optimization (DePO), which explicitly decouples the policy as a high-level state planner and an inverse dynamics model. |
Minghuan Liu; Zhengbang Zhu; Yuzheng Zhuang; Weinan Zhang; Jianye Hao; Yong Yu; Jun Wang; |
624 | On The Impossibility of Learning to Cooperate with Adaptive Partner Strategies in Repeated Games Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The goal of this work is to understand whether we can reliably learn to cooperate with other agents without such restrictive assumptions, which are unlikely to hold in real-world applications. |
Robert Loftin; Frans A Oliehoek; |
625 | AutoIP: A United Framework to Integrate Physics Into Gaussian Processes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a simple, yet powerful and general framework, AutoIP (Automatically Incorporating Physics), that can integrate all kinds of differential equations into Gaussian Processes (GPs) to enhance prediction accuracy and uncertainty quantification. |
Da Long; Zheng Wang; Aditi Krishnapriyan; Robert Kirby; Shandian Zhe; Michael Mahoney; |
626 | Bayesian Model Selection, The Marginal Likelihood, and Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We first revisit the appealing properties of the marginal likelihood for learning constraints and hypothesis testing. We then highlight the conceptual and practical issues in using the marginal likelihood as a proxy for generalization. |
Sanae Lotfi; Pavel Izmailov; Gregory Benton; Micah Goldblum; Andrew Gordon Wilson; |
627 | Feature Learning and Signal Propagation in Deep Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The curve of the alignment as a function of layer index (generally) exhibits an ascent-descent pattern where the maximum is reached for some hidden layer. In this work, we provide the first explanation for this phenomenon. |
Yizhang Lou; Chris E Mingard; Soufiane Hayou; |
628 | Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this manuscript we develop a quantitative and rigorous theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features in high-dimensions. |
Bruno Loureiro; Cedric Gerbelot; Maria Refinetti; Gabriele Sicuro; Florent Krzakala; |
629 | A Single-Loop Gradient Descent and Perturbed Ascent Algorithm for Nonconvex Functional Constrained Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel gradient descent and perturbed ascent (GDPA) algorithm to solve a class of smooth nonconvex inequality constrained problems. |
Songtao Lu; |
630 | Additive Gaussian Processes Revisited Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose the orthogonal additive kernel (OAK), which imposes an orthogonality constraint on the additive functions, enabling an identifiable, low-dimensional representation of the functional relationship. |
Xiaoyu Lu; Alexis Boukouvalas; James Hensman; |
631 | ModLaNets: Learning Generalisable Dynamics Via Modularity and Physical Inductive Bias Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Deep learning models are able to approximate one specific dynamical system but struggle at learning generalisable dynamics, where dynamical systems obey the same laws of physics but contain different numbers of elements (e.g., double- and triple-pendulum systems). To relieve this issue, we propose the Modular Lagrangian Network (ModLaNet), a structural neural network framework with modularity and physical inductive bias. |
Yupu Lu; Shijie Lin; Guanqi Chen; Jia Pan; |
632 | Model-Free Opponent Shaping Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, these methods are myopic since only a small number of steps can be anticipated, are asymmetric since they treat other agents as naive learners, and require the use of higher-order derivatives, which are calculated through white-box access to an opponent’s differentiable learning algorithm. To address these issues, we propose Model-Free Opponent Shaping (M-FOS). |
Christopher Lu; Timon Willi; Christian A Schroeder De Witt; Jakob Foerster; |
633 | Multi-slots Online Matching with High Entropy Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Particularly, the gradient computation and resource allocation are both challenging under this setting due to the absence of a closed-form solution. To overcome these obstacles, we develop a novel algorithm named Online subGradient descent for Multi-slots Allocation (OG-MA). |
Xingyu Lu; Qintong Wu; Wenliang Zhong; |
634 | Maximum Likelihood Training for Score-based Diffusion ODEs By High Order Denoising Score Matching Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we prove that matching the first-order score is not sufficient to maximize the likelihood of the ODE, by showing a gap between the maximum likelihood and score matching objectives. |
Cheng Lu; Kaiwen Zheng; Fan Bao; Jianfei Chen; Chongxuan Li; Jun Zhu; |
635 | Orchestra: Unsupervised Federated Learning Via Globally Consistent Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Prior work on this topic has focused on directly extending centralized self-supervised learning techniques, which are not designed to have the properties listed above. To address this situation, we propose Orchestra, a novel unsupervised federated learning technique that exploits the federation’s hierarchy to orchestrate a distributed clustering task and enforce a globally consistent partitioning of clients’ data into discriminable clusters. |
Ekdeep Lubana; Chi Ian Tang; Fahim Kawsar; Robert Dick; Akhil Mathur; |
636 | A Rigorous Study of Integrated Gradients Method and Extensions to Internal Neuron Attributions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper comments on fundamental aspects of IG and its applications/extensions: 1) We identify key differences between IG function spaces and the supporting literature’s function spaces which problematize previous claims of IG uniqueness. |
Daniel D Lundstrom; Tianjian Huang; Meisam Razaviyayn; |
637 | BAMDT: Bayesian Additive Semi-Multivariate Decision Trees for Nonparametric Regression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we develop a new class of Bayesian additive multivariate decision tree models that combine univariate split rules for handling possibly high dimensional features without known multivariate structures and novel multivariate split rules for features with multivariate structures in each weak learner. |
Zhao Tang Luo; Huiyan Sang; Bani Mallick; |
638 | Disentangled Federated Learning for Tackling Attributes Skew Via Invariant Aggregation and Diversity Transferring Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To cope with these, we propose disentangled federated learning (DFL) to disentangle the domain-specific and cross-invariant attributes into two complementary branches, which are trained by the proposed alternating local-global optimization independently. |
Zhengquan Luo; Yunlong Wang; Zilei Wang; Zhenan Sun; Tieniu Tan; |
639 | Channel Importance Matters in Few-Shot Image Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Understanding the difficulties posed by this task distribution shift is central to FSL. In this paper, we show that a simple channel-wise feature transformation may be the key to unraveling this secret from a channel perspective. |
Xu Luo; Jing Xu; Zenglin Xu; |
640 | Learning Dynamics and Generalization in Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially discontinuous value function, and generalizing well to new observations. In this paper, we analyze the learning dynamics of temporal difference algorithms to gain novel insight into the tension between these two objectives. |
Clare Lyle; Mark Rowland; Will Dabney; Marta Kwiatkowska; Yarin Gal; |
641 | On Finite-Sample Identifiability of Contrastive Learning-Based Nonlinear Independent Component Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work puts forth a finite-sample identifiability analysis of GCL-based nICA. |
Qi Lyu; Xiao Fu; |
642 | Pessimism Meets VCG: Learning Dynamic Mechanism Design Via Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To the best of our knowledge, our work provides the first offline RL algorithm for dynamic mechanism design without assuming uniform coverage. |
Boxiang Lyu; Zhaoran Wang; Mladen Kolar; Zhuoran Yang; |
643 | Versatile Offline Imitation from Observations and Examples Via Regularized State-Occupancy Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose State Matching Offline DIstribution Correction Estimation (SMODICE), a novel and versatile regression-based offline imitation learning algorithm derived via state-occupancy matching. |
Yecheng Ma; Andrew Shen; Dinesh Jayaraman; Osbert Bastani; |
644 | Quantification and Analysis of Layer-wise and Pixel-wise Information Discarding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a method to explain how the information of each input variable is gradually discarded during the forward propagation in a deep neural network (DNN), which provides new perspectives to explain DNNs. |
Haotian Ma; Hao Zhang; Fan Zhou; Yinqing Zhang; Quanshi Zhang; |
645 | Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We study the effects of constrained optimization formulations and Frank-Wolfe algorithms for obtaining interpretable neural network predictions. |
Jan Macdonald; Mathieu E. Besançon; Sebastian Pokutta; |
646 | A Tighter Analysis of Spectral Clustering, and Beyond Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Abstract: This work studies the classical spectral clustering algorithm which embeds the vertices of some graph G=(V_G, E_G) into R^k using k eigenvectors of some matrix of G, and applies … |
Peter Macgregor; He Sun; |
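The classical algorithm this entry refers to embeds the vertices of a graph into R^k using k eigenvectors of a matrix of the graph and then clusters the embedded points. A minimal NumPy sketch of the two-cluster case (not the paper's refined analysis), splitting by the sign of the Fiedler vector as a simplification of the usual k-means step:

```python
import numpy as np

def spectral_bipartition(A):
    """Partition a graph into two clusters from its adjacency matrix A.

    Embeds vertices using the eigenvector of the normalized Laplacian
    associated with the second-smallest eigenvalue (the Fiedler vector),
    then splits by sign -- a simplification of the usual k-means step.
    """
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt  # normalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)              # eigenvalues ascending
    fiedler = eigvecs[:, 1]
    return (fiedler > 0).astype(int)

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by a single edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
labels = spectral_bipartition(A)
print(labels)  # vertices 0-2 land in one cluster, 3-5 in the other
```

For k > 2 clusters one would take the k eigenvectors of the k smallest eigenvalues as vertex coordinates and run k-means on the rows, as in the setting the paper analyses.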
647 | Zero-Shot Reward Specification Via Grounded Natural Language Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We use recent developments in building large-scale visuolanguage models like CLIP to devise a framework that generates the task reward signal just from goal text description and raw pixel observations which is then used to learn the task policy. |
Parsa Mahmoudieh; Deepak Pathak; Trevor Darrell; |
648 | Feature Selection Using E-values Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In the context of supervised learning, we introduce the concept of e-value. |
Subhabrata Majumdar; Snigdhansu Chatterjee; |
649 | SSL Enables Learning from Sparse Rewards in Image-Goal Navigation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We demonstrate that combining sparse rewards with self-supervised learning (SSL) not only makes them work, but also outperforms dense rewards, which is the first result of this kind. |
Arjun Majumdar; Gunnar A Sigurdsson; Robinson Piramuthu; Jesse Thomason; Dhruv Batra; Gaurav S Sukhatme; |
650 | Knowledge-Grounded Self-Rationalization Via Extractive and Natural Language Explanations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, current models that generate the best extractive rationales or NLEs often fall behind the state-of-the-art (SOTA) in terms of task performance. In this work, we bridge this gap by introducing RExC, a self-rationalizing framework that grounds its predictions and two complementary types of explanations (NLEs and extractive rationales) in background knowledge. |
Bodhisattwa Prasad Majumder; Oana Camburu; Thomas Lukasiewicz; Julian Mcauley; |
651 | Nonparametric Involutive Markov Chain Monte Carlo Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present the nonparametric involutive Markov chain Monte Carlo (NP-iMCMC) algorithm as a method for constructing MCMC inference algorithms for nonparametric models expressible in universal PPLs. |
Carol Mak; Fabian Zaiser; Luke Ong; |
652 | Architecture Agnostic Federated Learning for Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work introduces a novel framework, Federated Heterogeneous Neural Networks (FedHeNN), that allows each client to build a personalised model without enforcing a common architecture across clients. |
Disha Makhija; Xing Han; Nhat Ho; Joydeep Ghosh; |
653 | Robustness in Multi-Objective Submodular Optimization: A Quantile Approach Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose to design and analyse novel algorithms for the robust allocation of submodular systems through the lens of quantile maximization. |
Cedric Malherbe; Kevin Scaman; |
654 | More Efficient Sampling for Tensor Decomposition With Worst-Case Guarantees Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose sampling-based ALS methods for the CP and tensor ring decompositions whose cost does not have this exponential dependence, thereby significantly improving on the previous state-of-the-art. |
Osman Asif Malik; |
655 | Unaligned Supervision for Automatic Music Transcription in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Note$_{EM}$, a method for simultaneously training a transcriber and aligning the scores to their corresponding performances, in a fully-automated process. |
Ben Maman; Amit H Bermano; |
656 | Decision-Focused Learning: Through The Lens of Learning to Rank Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We develop pointwise, pairwise and listwise ranking loss functions, which can be differentiated in closed form given a subset of solutions. |
Jayanta Mandi; Víctor Bucarey; Maxime Mulamba Ke Tchomba; Tias Guns; |
657 | Differentially Private Coordinate Descent for Composite Empirical Risk Minimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose Differentially Private proximal Coordinate Descent (DP-CD), a new method to solve composite DP-ERM problems. |
Paul Mangold; Aurélien Bellet; Joseph Salmon; Marc Tommasi; |
658 | Refined Convergence Rates for Maximum Likelihood Estimation Under Finite Mixture Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We revisit the classical problem of deriving convergence rates for the maximum likelihood estimator (MLE) in finite mixture models. |
Tudor A Manole; Nhat Ho; |
659 | On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: For learning Nash equilibria in Markov potential games, we propose an independent policy gradient algorithm with a decentralized momentum-based variance reduction technique. |
Weichao Mao; Lin Yang; Kaiqing Zhang; Tamer Basar; |
660 | On The Effects of Artificial Data Modification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, we show current shape bias identification methods and occlusion robustness measures are biased and propose a fairer alternative for the latter. |
Antonia Marcu; Adam Prugel-Bennett; |
661 | Personalized Federated Learning Through Local Memorization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we exploit the ability of deep neural networks to extract high quality vectorial representations (embeddings) from non-tabular data, e.g., images and text, to propose a personalization mechanism based on local memorization. |
Othmane Marfoq; Giovanni Neglia; Richard Vidal; Laetitia Kameni; |
662 | Nested Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this setting, optimal algorithms based on the exponential weights blueprint (like Hedge, EXP3, and their variants) may incur significant regret because they tend to spend excessive amounts of time exploring irrelevant alternatives with similar, suboptimal costs. To account for this, we propose a nested exponential weights (NEW) algorithm that performs a layered exploration of the learner’s set of alternatives based on a nested, step-by-step selection method. |
Matthieu Martin; Panayotis Mertikopoulos; Thibaud Rahier; Houssam Zenati; |
663 | Closed-Form Diffeomorphic Transformations for Time Series Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a closed-form expression for the ODE solution and its gradient under continuous piecewise-affine (CPA) velocity functions. |
Iñigo Martinez; Elisabeth Viles; Igor G. Olaizola; |
664 | SPECTRE: Spectral Conditioning Helps to Overcome The Expressivity Limits of One-shot Graph Generators Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We approach the graph generation problem from a spectral perspective by first generating the dominant parts of the graph Laplacian spectrum and then building a graph matching these eigenvalues and eigenvectors. |
Karolis Martinkus; Andreas Loukas; Nathanaël Perraudin; Roger Wattenhofer; |
665 | Modular Conformal Calibration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a versatile class of algorithms for recalibration in regression that we call modular conformal calibration (MCC). |
Charles Marx; Shengjia Zhao; Willie Neiswanger; Stefano Ermon; |
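Split conformal prediction is the standard recalibration building block that algorithms of this kind generalize. A minimal sketch of plain split conformal regression (a hedged illustration of the general idea, not the paper's MCC algorithm), which wraps any fitted point predictor in intervals with finite-sample coverage:

```python
import numpy as np

def conformal_interval(predict, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal regression: wrap a fitted point predictor in
    prediction intervals with ~(1 - alpha) finite-sample coverage.

    `predict` is any fitted regression function; the calibration set
    (X_cal, y_cal) must be disjoint from the data used to fit it.
    """
    scores = np.abs(y_cal - predict(X_cal))      # absolute residual scores
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))      # conformal quantile rank
    q = np.sort(scores)[min(k, n) - 1]
    mu = predict(X_test)
    return mu - q, mu + q

rng = np.random.default_rng(0)
X_cal = rng.uniform(-1, 1, 500)
y_cal = 2 * X_cal + rng.normal(0, 0.1, 500)
predict = lambda x: 2 * x                        # a pre-fitted point model
lo, hi = conformal_interval(predict, X_cal, y_cal, np.array([0.5]))
print(lo, hi)  # an interval around the point prediction 1.0
```

The coverage guarantee holds for any predictor and any data distribution, which is why conformal methods are a natural basis for modular recalibration schemes.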
666 | Continual Repeated Annealed Flow Transport Monte Carlo Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Continual Repeated Annealed Flow Transport Monte Carlo (CRAFT), a method that combines a sequential Monte Carlo (SMC) sampler (itself a generalization of Annealed Importance Sampling) with variational inference using normalizing flows. |
Alex Matthews; Michael Arbel; Danilo Jimenez Rezende; Arnaud Doucet; |
667 | How to Stay Curious While Avoiding Noisy TVs Using Aleatoric Uncertainty Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In an attempt to make exploring agents robust to Noisy TVs, we present a simple solution: aleatoric mapping agents (AMAs). |
Augustine Mavor-Parker; Kimberly Young; Caswell Barry; Lewis Griffin; |
668 | How to Steer Your Adversary: Targeted and Efficient Model Stealing Defenses with Gradient Redirection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, existing defenses have poor performance in practice, either requiring enormous computational overheads or severe utility trade-offs. To meet these challenges, we present a new approach to model stealing defenses called gradient redirection. |
Mantas Mazeika; Bo Li; David Forsyth; |
669 | Quant-BnB: A Scalable Branch-and-Bound Method for Optimal Decision Trees with Continuous Features Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we consider the problem of learning optimal decision trees, a combinatorial optimization problem that is challenging to solve at scale. |
Rahul Mazumder; Xiang Meng; Haoyue Wang; |
670 | Optimizing Tensor Network Contraction Using Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a Reinforcement Learning (RL) approach combined with Graph Neural Networks (GNN) to address the contraction ordering problem. |
Eli Meirom; Haggai Maron; Shie Mannor; Gal Chechik; |
671 | Causal Transformer for Estimating Counterfactual Outcomes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we develop a novel Causal Transformer for estimating counterfactual outcomes over time. |
Valentyn Melnychuk; Dennis Frauen; Stefan Feuerriegel; |
672 | Steerable 3D Spherical Neurons Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In our work, we propose a steerable feed-forward learning-based approach that consists of neurons with spherical decision surfaces and operates on point clouds. |
Pavlo Melnyk; Michael Felsberg; Mårten Wadenbäck; |
673 | Transformers Are Meta-Reinforcement Learners Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present TrMRL (Transformers for Meta-Reinforcement Learning), a meta-RL agent that mimics the memory reinstatement mechanism using the transformer architecture. |
Luckeciano C Melo; |
674 | ButterflyFlow: Building Invertible Layers with Butterfly Matrices Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new family of invertible linear layers based on butterfly layers, which are known to theoretically capture complex linear structures including permutations and periodicity, yet can be inverted efficiently. |
Chenlin Meng; Linqi Zhou; Kristy Choi; Tri Dao; Stefano Ermon; |
675 | In Defense of Dual-encoders for Neural Ranking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, a more fundamental question remains less explored: does this performance gap reflect an inherent limitation in the capacity of DE models, or a limitation in the training of such models? And does such an understanding suggest a principled means of improving DE models? In this paper, we study these questions, with three contributions. |
Aditya Menon; Sadeep Jayasumana; Ankit Singh Rawat; Seungyeon Kim; Sashank Reddi; Sanjiv Kumar; |
676 | Equivariant Quantum Graph Circuits Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We investigate quantum circuits for graph representation learning, and propose equivariant quantum graph circuits (EQGCs), as a class of parameterized quantum circuits with strong relational inductive bias for learning over graph-structured data. |
Peter Mernyei; Konstantinos Meichanetzidis; Ismail Ilkan Ceylan; |
677 | Stochastic Rising Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We design an algorithm for the rested case (R-ed-UCB) and one for the restless case (R-less-UCB), providing a regret bound depending on the properties of the instance and, under certain circumstances, of $\widetilde{\mathcal{O}}(T^{\frac{2}{3}})$. |
Alberto Maria Metelli; Francesco Trovò; Matteo Pirola; Marcello Restelli; |
678 | Minimizing Control for Credit Assignment with Strong Feedback Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Overall, our work presents a fundamentally novel view of learning as control minimization, while sidestepping biologically unrealistic assumptions. |
Alexander Meulemans; Matilde Tristany Farinha; Maria R. Cervera; João Sacramento; Benjamin F. Grewe; |
679 | A Dynamical System Perspective for Lipschitz Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we tackle the problem of building $1$-Lipschitz Neural Networks. |
Laurent Meunier; Blaise J Delattre; Alexandre Araujo; Alexandre Allauzen; |
680 | Distribution Regression with Sliced Wasserstein Kernels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose an OT-based estimator for distribution regression. |
Dimitri Meunier; Massimiliano Pontil; Carlo Ciliberto; |
681 | Interpretable and Generalizable Graph Learning Via Stochastic Attention Mechanism Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, those post-hoc methods often fail to provide stable interpretation and may extract features that are spuriously correlated with the task. In this work, we address these issues by proposing Graph Stochastic Attention (GSAT). |
Siqi Miao; Mia Liu; Pan Li; |
682 | Modeling Structure with Undirected Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we combine the representational strengths of factor graphs and of neural networks, proposing undirected neural networks (UNNs): a flexible framework for specifying computations that can be performed in any order. |
Tsvetomila Mihaylova; Vlad Niculae; Andre Martins; |
683 | Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: These include the classical Hopfield networks (HNs), sparse distributed memories (SDMs), and more recently the modern continuous Hopfield networks (MCHNs), which possess close links with self-attention in machine learning. In this paper, we propose a general framework for understanding the operation of such memory networks as a sequence of three operations: similarity, separation, and projection. |
Beren Millidge; Tommaso Salvatori; Yuhang Song; Thomas Lukasiewicz; Rafal Bogacz; |
684 | Learning Stochastic Shortest Path with Linear Function Approximation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel algorithm with Hoeffding-type confidence sets for learning the linear mixture SSP, which can attain an $\tilde{\mathcal{O}}(d B_{\star}^{1.5}\sqrt{K/c_{\min}})$ regret. |
Yifei Min; Jiafan He; Tianhao Wang; Quanquan Gu; |
685 | Prioritized Training on Points That Are Learnable, Worth Learning, and Not Yet Learnt Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To accelerate training, we introduce Reducible Holdout Loss Selection (RHO-LOSS), a simple but principled technique which selects approximately those points for training that most reduce the model’s generalization loss. |
Sören Mindermann; Jan M Brauner; Muhammed T Razzak; Mrinank Sharma; Andreas Kirsch; Winnie Xu; Benedikt Höltgen; Aidan N Gomez; Adrien Morisot; Sebastian Farquhar; Yarin Gal; |
686 | POEM: Out-of-Distribution Detection with Posterior Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel posterior sampling based outlier mining framework, POEM, which facilitates efficient use of outlier data and promotes learning a compact decision boundary between ID and OOD data for improved detection. |
Yifei Ming; Ying Fan; Yixuan Li; |
687 | A Simple Reward-free Approach to Constrained Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper bridges reward-free RL and constrained RL. Particularly, we propose a simple meta-algorithm such that given any reward-free RL oracle, the approachability and constrained RL problems can be directly solved with negligible overheads in sample complexity. |
Sobhan Miryoosefi; Chi Jin; |
688 | Wide Neural Networks Forget Less Catastrophically Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While the recent progress in continual learning literature is encouraging, our understanding of what properties of neural networks contribute to catastrophic forgetting is still limited. To address this, instead of focusing on continual learning algorithms, in this work, we focus on the model itself and study the impact of "width" of the neural network architecture on catastrophic forgetting, and show that width has a surprisingly significant effect on forgetting. |
Seyed Iman Mirzadeh; Arslan Chaudhry; Dong Yin; Huiyi Hu; Razvan Pascanu; Dilan Gorur; Mehrdad Farajtabar; |
689 | Proximal and Federated Random Reshuffling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose two new algorithms: Proximal and Federated Random Reshuffling (ProxRR and FedRR). |
Konstantin Mishchenko; Ahmed Khaled; Peter Richtarik; |
690 | ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work we are specifically interested in the regime in which the evaluation of prox is costly relative to the evaluation of the gradient, which is the case in many applications. |
Konstantin Mishchenko; Grigory Malinovsky; Sebastian Stich; Peter Richtarik; |
691 | Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We develop fast algorithms and robust software for convex optimization of two-layer neural networks with ReLU activation functions. |
Aaron Mishkin; Arda Sahiner; Mert Pilanci; |
692 | Memory-Based Model Editing at Scale Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: As a higher-capacity alternative, we propose Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC), which stores edits in an explicit memory and learns to reason over them to modulate the base model’s predictions as needed. |
Eric Mitchell; Charles Lin; Antoine Bosselut; Christopher D Manning; Chelsea Finn; |
693 | Invariant Ancestry Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce the concept of minimal invariance and propose invariant ancestry search (IAS). |
Phillip B Mogensen; Nikolaj Thams; Jonas Peters; |
694 | Differentially Private Community Detection for Stochastic Block Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study the community detection problem while preserving the privacy of the individual connections between the vertices. |
Mohamed S Mohamed; Dung Nguyen; Anil Vullikanti; Ravi Tandon; |
695 | A Multi-objective / Multi-task Learning Framework Induced By Pareto Stationarity Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we develop a novel and generic framework to discover a PO solution with multiple forms of preferences. |
Michinari Momma; Chaosheng Dong; Jia Liu; |
696 | EqR: Equivariant Representations for Data-Efficient Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In particular, we propose new mechanisms for learning representations that are equivariant to both the agent’s action, as well as symmetry transformations of the state-action pairs. |
Arnab Kumar Mondal; Vineet Jain; Kaleem Siddiqi; Siamak Ravanbakhsh; |
697 | Feature and Parameter Selection in Stochastic Linear Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study two model selection settings in stochastic linear bandits (LB). |
Ahmadreza Moradipari; Berkay Turan; Yasin Abbasi-Yadkori; Mahnoosh Alizadeh; Mohammad Ghavamzadeh; |
698 | Power-Law Escape Rate of SGD Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Stochastic gradient descent (SGD) undergoes complicated multiplicative noise for the mean-square loss. We use this property of SGD noise to derive a stochastic differential equation (SDE) with simpler additive noise by performing a random time change. |
Takashi Mori; Liu Ziyin; Kangqiao Liu; Masahito Ueda; |
699 | Rethinking Fano’s Inequality in Ensemble Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a fundamental theory on ensemble learning that evaluates a given ensemble system by a well-grounded set of metrics. |
Terufumi Morishita; Gaku Morio; Shota Horiguchi; Hiroaki Ozaki; Nobuo Nukaga; |
700 | SpeqNets: Sparsity-aware Permutation-equivariant Graph Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: By introducing new heuristics for the graph isomorphism problem, we devise a class of universal, permutation-equivariant graph networks, which, unlike previous architectures, offer a fine-grained control between expressivity and scalability and adapt to the sparsity of the graph. |
Christopher Morris; Gaurav Rattan; Sandra Kiefer; Siamak Ravanbakhsh; |
701 | CtrlFormer: Learning Transferable State Representation for Visual Control Via Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, porting Transformer to sample-efficient visual control remains a challenging and unsolved problem. To this end, we propose a novel Control Transformer (CtrlFormer), possessing many appealing benefits that prior arts do not have. |
Yao Mark Mu; Shoufa Chen; Mingyu Ding; Jianyu Chen; Runjian Chen; Ping Luo; |
702 | Generalized Beliefs for Cooperative AI Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address this, recent approaches rely on encoding symmetry and convention-awareness into policy training, but these require strong environmental assumptions and can complicate policy training. To overcome this, we propose moving the learning of conventions to the belief space. |
Darius Muglich; Luisa M Zintgraf; Christian A Schroeder De Witt; Shimon Whiteson; Jakob Foerster; |
703 | Bounding The Width of Neural Networks Via Coupled Initialization A Worst Case Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We observe that by instead initializing the weights into independent pairs, where each pair consists of two identical Gaussian vectors, we can significantly improve the convergence analysis. |
Alexander Munteanu; Simon Omlor; Zhao Song; David Woodruff; |
704 | Constants Matter: The Performance Gains of Active Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we show through upper and lower bounds, that for a simple benign setting of well-specified logistic regression on a uniform distribution over a sphere, the expected excess error of both active learning and random sampling have the same inverse proportional dependence on the number of samples. |
Stephen O Mussmann; Sanjoy Dasgupta; |
705 | On The Generalization Analysis of Adversarial Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study the generalization properties of adversarial learning. |
Waleed Mustafa; Yunwen Lei; Marius Kloft; |
706 | Universal and Data-adaptive Algorithms for Model Selection in Linear Contextual Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce new algorithms that a) explore in a data-adaptive manner, and b) provide model selection guarantees of the form O(d^{\alpha} T^{1-\alpha}) with no feature diversity conditions whatsoever, where d denotes the dimension of the linear model and T denotes the total number of rounds. |
Vidya K Muthukumar; Akshay Krishnamurthy; |
707 | The Importance of Non-Markovianity in Maximum State Entropy Exploration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we argue that non-Markovianity is instead paramount for maximum state entropy exploration in a finite-sample regime. |
Mirco Mutti; Riccardo De Santi; Marcello Restelli; |
708 | PAC-Net: A Model Pruning Approach to Inductive Transfer Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose PAC-Net, a simple yet effective approach for transfer learning based on pruning. |
Sanghoon Myung; In Huh; Wonik Jang; Jae Myung Choe; Jisu Ryu; Daesin Kim; Kee-Eung Kim; Changwook Jeong; |
709 | AutoSNN: Towards Energy-Efficient Spiking Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To further improve the accuracy and reduce the spikes generated by SNNs, we propose a spike-aware neural architecture search framework called AutoSNN. |
Byunggook Na; Jisoo Mok; Seongsik Park; Dongjin Lee; Hyeokjun Choe; Sungroh Yoon; |
710 | Implicit Bias of The Step Size in Linear Diagonal Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Focusing on diagonal linear networks as a model for understanding the implicit bias in underdetermined models, we show how the gradient descent step size can have a large qualitative effect on the implicit bias, and thus on generalization ability. |
Mor Shpigel Nacson; Kavya Ravichandran; Nathan Srebro; Daniel Soudry; |
711 | DNNR: Differential Nearest Neighbors Regression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel method called Differential Nearest Neighbors Regression (DNNR) that addresses both issues simultaneously: during training, DNNR estimates local gradients to scale the features; during inference, it performs an n-th order Taylor approximation using estimated gradients. |
Youssef Nader; Leon Sixt; Tim Landgraf; |
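The idea described in the highlight can be sketched generically (a hedged illustration, not the authors' implementation): fit a local gradient at each neighbor by least squares over that neighbor's own neighborhood, then average first-order Taylor extrapolations instead of raw labels:

```python
import numpy as np

def dnnr_predict(X, y, x_query, k=5, m=20):
    """Nearest-neighbor regression with a first-order Taylor correction.

    For each of the k nearest neighbors x_i of the query, a local gradient
    g_i is fit by least squares over x_i's own m neighbors; the prediction
    averages y_i + g_i . (x_query - x_i) rather than the raw labels y_i.
    """
    d = np.linalg.norm(X - x_query, axis=1)
    preds = []
    for i in np.argsort(d)[:k]:
        # m nearest neighbors of X[i], excluding X[i] itself
        nbrs = np.argsort(np.linalg.norm(X - X[i], axis=1))[1:m + 1]
        A = X[nbrs] - X[i]                         # local coordinates
        b = y[nbrs] - y[i]
        g, *_ = np.linalg.lstsq(A, b, rcond=None)  # estimated local gradient
        preds.append(y[i] + g @ (x_query - X[i]))
    return np.mean(preds)

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(400, 2))
y = 3 * X[:, 0] - 2 * X[:, 1]                      # linear target, for sanity
print(dnnr_predict(X, y, np.array([0.2, 0.1])))    # close to 3*0.2 - 2*0.1 = 0.4
```

On an exactly linear target the least-squares gradients recover the true slope, so the Taylor-corrected prediction is exact; plain k-NN averaging would be biased whenever the query sits off-center among its neighbors.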
712 | Overcoming Oscillations in Quantization-Aware Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we delve deeper into the phenomenon of weight oscillations and show that it can lead to a significant accuracy degradation due to wrongly estimated batch-normalization statistics during inference and increased noise during training. |
Markus Nagel; Marios Fournarakis; Yelysei Bondarenko; Tijmen Blankevoort; |
713 | Strategic Representation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: How can a user make good choices based on strategic representations? We formalize this as a learning problem, and pursue algorithms for decision-making that are robust to manipulation. |
Vineet Nair; Ganesh Ghalme; Inbal Talgam-Cohen; Nir Rosenfeld; |
714 | Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a weight averaging technique where a student with multiple subnetworks is trained to absorb the functional diversity of ensemble teachers, but then those subnetworks are properly averaged for inference, giving a single student network with no additional inference cost. |
Giung Nam; Hyungi Lee; Byeongho Heo; Juho Lee; |
715 | Measuring Representational Robustness of Neural Networks Through Shared Invariances Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a measure called STIR, which faithfully captures the extent to which two NNs share invariances. |
Vedant Nanda; Till Speicher; Camila Kolling; John P Dickerson; Krishna Gummadi; Adrian Weller; |
716 | Tight and Robust Private Mean Estimation with Few Users Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we study high-dimensional mean estimation under user-level differential privacy, and design an $(\varepsilon,\delta)$-differentially private mechanism using as few users as possible. |
Shyam Narayanan; Vahab Mirrokni; Hossein Esfandiari; |
717 | Fast Aquatic Swimmer Optimization with Differentiable Projective Dynamics and Neural Network Hydrodynamic Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Optimizing robotic swimmer design within such a system generally involves cumbersome, gradient-free procedures on top of the already costly simulation. To address this challenge we present a novel, fully differentiable hybrid approach to FSI that combines a 2D direct numerical simulation for the deformable solid structure of the swimmer and a physics-constrained neural network surrogate to capture hydrodynamic effects of the fluid. |
Elvis Nava; John Z Zhang; Mike Yan Michelis; Tao Du; Pingchuan Ma; Benjamin F. Grewe; Wojciech Matusik; Robert Kevin Katzschmann; |
718 | Multi-Task Learning As A Bargaining Game Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose viewing the gradients combination step as a bargaining game, where tasks negotiate to reach an agreement on a joint direction of parameter update. |
Aviv Navon; Aviv Shamsian; Idan Achituve; Haggai Maron; Kenji Kawaguchi; Gal Chechik; Ethan Fetaya; |
719 | Variational Inference for Infinitely Deep Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce the unbounded depth neural network (UDN), an infinitely deep probabilistic model that adapts its complexity to the training data. |
Achille Nazaret; David Blei; |
720 | Stable Conformal Prediction Sets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we combine CP techniques with classical algorithmic stability bounds to derive a prediction set computable with a single model fit. |
Eugene Ndiaye; |
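For context on the conformal prediction (CP) setting the entry above builds on, here is the textbook split-conformal baseline, which requires holding out a calibration set; the paper's contribution is a stability-based set computable from a single model fit, which this sketch does not implement (`split_conformal_interval` is a hypothetical name):

```python
import numpy as np

def split_conformal_interval(residuals_cal, y_hat, alpha=0.1):
    """Split conformal prediction interval around a point prediction y_hat.
    residuals_cal: absolute residuals |y - model(x)| on a held-out
    calibration set. Returns an interval with ~(1 - alpha) coverage."""
    n = len(residuals_cal)
    # conformal quantile with the finite-sample (n + 1) correction
    q = np.quantile(residuals_cal,
                    np.ceil((n + 1) * (1 - alpha)) / n,
                    method="higher")
    return y_hat - q, y_hat + q
```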
721 | Discovering Generalizable Spatial Goal Representations Via Graph-based Active Reward Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we consider one-shot imitation learning for object rearrangement tasks, where an AI agent needs to watch a single expert demonstration and learn to perform the same task in different environments. |
Aviv Netanyahu; Tianmin Shu; Joshua Tenenbaum; Pulkit Agrawal; |
722 | Sublinear-Time Clustering Oracle for Signed Graphs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We provide a local clustering oracle for signed graphs with such a clear community structure, that can answer membership queries, i.e., Given a vertex v, which community does v belong to?, in sublinear time by reading only a small portion of the graph. |
Stefan Neumann; Pan Peng; |
723 | Improved Regret for Differentially Private Exploration in Linear MDP Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We provide a private algorithm with an improved regret rate with an optimal dependence of $O(\sqrt{K})$ on the number of episodes. |
Dung Daniel T Ngo; Giuseppe Vietri; Steven Wu; |
724 | A Framework for Learning to Request Rich and Contextually Useful Information from Humans Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a general interactive framework that enables an agent to request and interpret rich, contextually useful information from an assistant that has knowledge about the task and the environment. |
Khanh X Nguyen; Yonatan Bisk; Hal Daumé III;
725 | Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Transformer Neural Processes (TNPs), a new member of the NP family that casts uncertainty-aware meta learning as a sequence modeling problem. |
Tung Nguyen; Aditya Grover; |
726 | Improving Transformers with Probabilistic Attention Keys Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: It has been observed that for many applications, those attention heads learn redundant embeddings, and most of them can be removed without degrading the performance of the model. Inspired by this observation, we propose Transformer with a Mixture of Gaussian Keys (Transformer-MGK), a novel transformer architecture that replaces redundant heads in transformers with a mixture of keys at each head. |
Tam Minh Nguyen; Tan Minh Nguyen; Dung D. D. Le; Duy Khuong Nguyen; Viet-Anh Tran; Richard Baraniuk; Nhat Ho; Stanley Osher; |
727 | On Transportation of Mini-batches: A Hierarchical Approach Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Moreover, the m-OT does not approximate a proper metric between probability measures since the identity property is not satisfied. To address these problems, we propose a novel mini-batch scheme for optimal transport, named Batch of Mini-batches Optimal Transport (BoMb-OT), that finds the optimal coupling between mini-batches and can be seen as an approximation to a well-defined distance on the space of probability measures. |
Khai Nguyen; Dang Nguyen; Quoc Dinh Nguyen; Tung Pham; Hung Bui; Dinh Phung; Trung Le; Nhat Ho; |
728 | Improving Mini-batch Optimal Transport Via Partial Transportation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by the misspecified mappings issue, we propose a novel mini-batch method by using partial optimal transport (POT) between mini-batch empirical measures, which we refer to as mini-batch partial optimal transport (m-POT). |
Khai Nguyen; Dang Nguyen; The-Anh Vu-Le; Tung Pham; Nhat Ho; |
729 | Recurrent Model-Free RL Can Be A Strong Baseline for Many POMDPs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We find that careful architecture and hyperparameter decisions can often yield a recurrent model-free implementation that performs on par with (and occasionally substantially better than) more sophisticated recent techniques. |
Tianwei Ni; Benjamin Eysenbach; Ruslan Salakhutdinov; |
730 | Optimal Estimation of Policy Gradient Via Double Fitted Iteration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the double Fitted PG estimation (FPG) algorithm. |
Chengzhuo Ni; Ruiqi Zhang; Xiang Ji; Xuezhou Zhang; Mengdi Wang; |
731 | GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. |
Alexander Quinn Nichol; Prafulla Dhariwal; Aditya Ramesh; Pranav Shyam; Pamela Mishkin; Bob Mcgrew; Ilya Sutskever; Mark Chen; |
732 | Diffusion Models for Adversarial Purification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose DiffPure that uses diffusion models for adversarial purification: Given an adversarial example, we first diffuse it with a small amount of noise following a forward diffusion process, and then recover the clean image through a reverse generative process. |
Weili Nie; Brandon Guo; Yujia Huang; Chaowei Xiao; Arash Vahdat; Animashree Anandkumar; |
733 | The Primacy Bias in Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. |
Evgenii Nikishin; Max Schwarzer; Pierluca D'Oro; Pierre-Luc Bacon; Aaron Courville;
734 | Causal Conceptions of Fairness and Their Consequences Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here, we first assemble and categorize popular causal definitions of algorithmic fairness into two broad families: (1) those that constrain the effects of decisions on counterfactual disparities; and (2) those that constrain the effects of legally protected characteristics, like race and gender, on decisions. We then show, analytically and empirically, that both families of definitions almost always—in a measure theoretic sense—result in strongly Pareto dominated decision policies, meaning there is an alternative, unconstrained policy favored by every stakeholder with preferences drawn from a large, natural class. |
Hamed Nilforoshan; Johann D Gaebler; Ravi Shroff; Sharad Goel; |
735 | Efficient Test-Time Model Adaptation Without Forgetting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address these issues, we propose an efficient anti-forgetting test-time adaptation (EATA) method. |
Shuaicheng Niu; Jiaxiang Wu; Yifan Zhang; Yaofo Chen; Shijian Zheng; Peilin Zhao; Mingkui Tan; |
736 | Generative Trees: Adversarial and Copycat Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a new path forward for the generation of tabular data, exploiting decades-old understanding of the supervised task’s best components for DT induction, from losses (properness), models (tree-based) to algorithms (boosting). |
Richard Nock; Mathieu Guillame-Bert; |
737 | Path-Aware and Structure-Preserving Generation of Synthetically Accessible Molecules Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To design synthetically accessible molecules that preserve main structural motifs of target molecules, we propose a reaction-embedded and structure-conditioned variational autoencoder. |
Juhwan Noh; Dae-Woong Jeong; Kiyoung Kim; Sehui Han; Moontae Lee; Honglak Lee; Yousung Jung; |
738 | Utilizing Expert Features for Contrastive Learning of Time-Series Representations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present an approach that incorporates expert knowledge for time-series representation learning. |
Manuel T Nonnenmacher; Lukas Oldenburg; Ingo Steinwart; David Reeb; |
739 | Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-time Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Tranception, a novel transformer architecture leveraging autoregressive predictions and retrieval of homologous sequences at inference to achieve state-of-the-art fitness prediction performance. |
Pascal Notin; Mafalda Dias; Jonathan Frazer; Javier Marchena Hurtado; Aidan N Gomez; Debora Marks; Yarin Gal; |
740 | Fast Finite Width Neural Tangent Kernel Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We perform the first in-depth analysis of the compute and memory requirements for NTK computation in finite width networks. |
Roman Novak; Jascha Sohl-Dickstein; Samuel S Schoenholz; |
741 | Multicoated Supermasks Enhance Hidden Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We show that the supermask stops improving even though gradients are not zero, thus underutilizing backpropagated information. To address this, we propose a method that extends Hidden Networks by training an overlay of multiple hierarchical supermasks, a multicoated supermask. |
Yasuyuki Okoshi; Ángel López García-Arias; Kazutoshi Hirose; Kota Ando; Kazushi Kawamura; Thiem Van Chu; Masato Motomura; Jaehoon Yu;
742 | Generalized Leverage Scores: Geometric Interpretation and Applications Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we extend the definition of leverage scores to relate the columns of a matrix to arbitrary subsets of singular vectors. |
Bruno Ordozgoiti; Antonis Matakos; Aristides Gionis; |
743 | Practical Almost-Linear-Time Approximation Algorithms for Hybrid and Overlapping Graph Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we introduce a framework based on two novel clustering objectives, which naturally extend the well-studied notion of conductance to clusters with hybrid vertex- and edge-boundary structure. |
Lorenzo Orecchia; Konstantinos Ameranis; Charalampos Tsourakakis; Kunal Talwar; |
744 | Anticorrelated Noise Injection for Improved Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we zoom in on the problem of correlating the perturbations of consecutive PGD steps. |
Antonio Orvieto; Hans Kersting; Frank Proske; Francis Bach; Aurelien Lucchi; |
745 | Scalable Deep Gaussian Markov Random Fields for General Graphs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a flexible GMRF model for general graphs built on the multi-layer structure of Deep GMRFs, originally proposed for lattice graphs only. |
Joel Oskarsson; Per Sidén; Fredrik Lindsten;
746 | Zero-shot AutoML with Pretrained Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Given a new dataset D and a low compute budget, how should we choose a pre-trained model to fine-tune to D, and set the fine-tuning hyperparameters without risking overfitting, particularly if D is small? Here, we extend automated machine learning (AutoML) to best make these choices. |
Ekrem Öztürk; Fabio Ferreira; Hadi Jomaa; Lars Schmidt-Thieme; Josif Grabocka; Frank Hutter;
747 | History Compression Via Language Models in Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose to utilize a frozen Pretrained Language Transformer (PLT) for history representation and compression to improve sample efficiency. |
Fabian Paischer; Thomas Adler; Vihang Patil; Angela Bitto-Nemling; Markus Holzleitner; Sebastian Lehner; Hamid Eghbal-Zadeh; Sepp Hochreiter; |
748 | A Study on The Ramanujan Graph Property of Winning Lottery Tickets Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We analyze the Ramanujan graph property of such bipartite layers in terms of their spectral characteristics using Cheeger's inequality for irregular graphs. |
Bithika Pal; Arindam Biswas; Sudeshna Kolay; Pabitra Mitra; Biswajit Basu; |
749 | On Learning Mixture of Linear Regressions in The Non-Realizable Setting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we show that a version of the popular expectation-maximization (EM) algorithm finds the best-fit lines in a dataset even when a realizable model is not assumed, under some regularity conditions on the dataset and the initial points, and thereby provides a solution for the ERM. |
Soumyabrata Pal; Arya Mazumdar; Rajat Sen; Avishek Ghosh; |
750 | Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Multiple agents exacerbate the problem severely, since the suboptimal policy by any agent can lead to uncoordinated global failure. Following this intuition, we propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), which combines the first-order policy gradients and zeroth-order optimization methods to better optimize the conservative value functions over the actor parameters. |
Ling Pan; Longbo Huang; Tengyu Ma; Huazhe Xu; |
751 | A Unified Weight Initialization Paradigm for Tensorial Convolutional Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Meanwhile, although there are ad-hoc approaches for specific architectures (e.g., Tensor Ring Nets), they are not applicable to TCNNs with other tensor decomposition methods (e.g., CP or Tucker decomposition). To address this problem, we propose a universal weight initialization paradigm, which generalizes Xavier and Kaiming methods and can be widely applicable to arbitrary TCNNs. |
Yu Pan; Zeyong Su; Ao Liu; Wang Jingquan; Nannan Li; Zenglin Xu; |
752 | Robustness and Accuracy Could Be Reconcilable By (Proper) Definition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Thus, we dig for the origin of this trade-off in adversarial training and find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance — an overcorrection towards smoothness. |
Tianyu Pang; Min Lin; Xiao Yang; Jun Zhu; Shuicheng Yan; |
753 | Towards Coherent and Consistent Use of Entities in Narrative Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we focus on the end task of narrative generation and systematically analyse the long-range entity coherence and consistency in generated stories. First, we propose a set of automatic metrics for measuring model performance in terms of entity usage. |
Pinelopi Papalampidi; Kris Cao; Tomas Kocisky; |
754 | Constrained Discrete Black-Box Optimization Using Mixed-Integer Programming Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In response, we propose NN+MILP, a general discrete MBO framework using piecewise-linear neural networks as surrogate models and mixed-integer linear programming (MILP) to optimize the acquisition function. |
Theodore P Papalexopoulos; Christian Tjandraatmadja; Ross Anderson; Juan Pablo Vielma; David Belanger; |
755 | A Theoretical Comparison of Graph Neural Network Extensions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study and compare different Graph Neural Network extensions that increase the expressive power of GNNs beyond the Weisfeiler-Leman test. |
Pál András Papp; Roger Wattenhofer;
756 | Validating Causal Inference Methods Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our work introduces a deep generative model-based framework, Credence, to validate causal inference methods. |
Harsh Parikh; Carlos Varjao; Louise Xu; Eric Tchetgen Tchetgen; |
757 | The Unsurprising Effectiveness of Pre-Trained Vision Models for Control Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this context, we revisit and study the role of pre-trained visual representations for control, and in particular representations trained on large-scale computer vision datasets. |
Simone Parisi; Aravind Rajeswaran; Senthil Purushwalkam; Abhinav Gupta; |
758 | Learning Symmetric Embeddings for Equivariant World Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, characterizing how transformations act on input data is often difficult, limiting the applicability of equivariant models. We propose learning symmetric embedding networks (SENs) that encode an input space (e.g. images), where we do not know the effect of transformations (e.g. rotations), to a feature space that transforms in a known manner under these operations. |
Jung Yeon Park; Ondrej Biza; Linfeng Zhao; Jan-Willem Van De Meent; Robin Walters; |
759 | Blurs Behave Like Ensembles: Spatial Smoothings to Improve Accuracy, Uncertainty, and Robustness Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: BNNs require a large number of predictions to produce reliable results, leading to a significant increase in computational cost. To alleviate this issue, we propose spatial smoothing, a method that ensembles neighboring feature map points of convolutional neural networks. |
Namuk Park; Songkuk Kim; |
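The spatial-smoothing entry above ensembles neighboring feature-map points. A minimal NumPy sketch of that idea on a single 2D feature map (illustrative only; `spatial_smooth` and the `size` parameter are hypothetical names, and the paper applies this inside convolutional networks rather than to raw arrays):

```python
import numpy as np

def spatial_smooth(feature_map, size=2):
    """Box-blur a 2D feature map: each output point is the mean of a
    size-by-size neighborhood, acting like a cheap local ensemble.
    Edges are handled by replicating border values ('edge' padding)."""
    h, w = feature_map.shape
    padded = np.pad(feature_map, size // 2, mode="edge")
    out = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + size, j:j + size].mean()
    return out
```

Averaging nearby activations reduces the variance of noisy feature maps, which is the mechanism the paper connects to improved accuracy, uncertainty, and robustness.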
760 | Exact Optimal Accelerated Complexity for Fixed-Point Iterations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work presents an acceleration mechanism for fixed-point iterations with nonexpansive operators, contractive operators, and nonexpansive operators satisfying a Hölder-type growth condition. |
Jisun Park; Ernest K Ryu; |
761 | Kernel Methods for Radial Transformed Compositional Data with Many Zeros Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a radial transformation that does not require zero substitutions and more importantly results in essential equivalence between domains before and after the transformation. |
Junyoung Park; Changwon Yoon; Cheolwoo Park; Jeongyoun Ahn; |
762 | Evolving Curricula with Regret-Based Environment Design Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work proposes harnessing the power of evolution in a principled, regret-based curriculum. |
Jack Parker-Holder; Minqi Jiang; Michael Dennis; Mikayel Samvelyan; Jakob Foerster; Edward Grefenstette; Tim Rocktäschel;
763 | Neural Language Models Are Not Born Equal to Fit Brain Data, But Training Helps Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here, we make first steps in this direction and examine the impact of test loss, training corpus and model architecture (comparing GloVe, LSTM, GPT-2 and BERT), on the prediction of functional Magnetic Resonance Imaging time-courses of participants listening to an audiobook. |
Alexandre Pasquiou; Yair Lakretz; John T Hale; Bertrand Thirion; Christophe Pallier; |
764 | A New Similarity Measure for Covariate Shift with Applications to Nonparametric Regression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a new measure of distribution mismatch between the source and target distributions using the integrated ratio of probabilities of balls at a given radius. |
Reese Pathak; Cong Ma; Martin Wainwright; |
765 | Align-RUDDER: Learning From Few Demonstrations By Reward Redistribution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce Align-RUDDER, which utilizes a profile model for reward redistribution that is obtained from multiple sequence alignment of demonstrations. |
Vihang Patil; Markus Hofmarcher; Marius-Constantin Dinu; Matthias Dorfer; Patrick M Blies; Johannes Brandstetter; José Arjona-Medina; Sepp Hochreiter;
766 | POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present POET, an algorithm to enable training large neural networks on memory-scarce battery-operated edge devices. |
Shishir G. Patil; Paras Jain; Prabal Dutta; Ion Stoica; Joseph Gonzalez; |
767 | Learning to Cut By Looking Ahead: Cutting Plane Selection Via Imitation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In response, we propose a new neural architecture (NeuralCut) for imitation learning on the lookahead expert. |
Max B Paulus; Giulia Zarpellon; Andreas Krause; Laurent Charlin; Chris Maddison; |
768 | Neural Network Pruning Denoises The Features and Makes Local Connectivity Emerge in Visual Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here, we characterize the inductive bias that pruning imprints in such winning lottery tickets: focusing on visual tasks, we analyze the architecture resulting from iterative magnitude pruning of a simple fully connected network. |
Franco Pellegrini; Giulio Biroli; |
769 | Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: It combines the benefits of extracting local dependencies using convolutions and global dependencies using self-attention. Inspired by this, we propose a more flexible, interpretable and customizable encoder alternative, Branchformer, with parallel branches for modeling various ranged dependencies in end-to-end speech processing. |
Yifan Peng; Siddharth Dalmia; Ian Lane; Shinji Watanabe; |
770 | Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the challenge, we develop an E(3)-equivariant generative network composed of two modules: 1) a new graph neural network capturing both spatial and bonding relationships between atoms of the binding pockets and 2) a new efficient algorithm which samples new drug candidates conditioned on the pocket representations from a tractable distribution without relying on MCMC. |
Xingang Peng; Shitong Luo; Jiaqi Guan; Qi Xie; Jian Peng; Jianzhu Ma; |
771 | Differentiable Top-k Classification Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Leveraging recent advances in differentiable sorting and ranking, we propose a family of differentiable top-k cross-entropy classification losses. |
Felix Petersen; Hilde Kuehne; Christian Borgelt; Oliver Deussen; |
772 | Multi-scale Feature Learning Dynamics: Insights for Double Descent Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we investigate the origins of the less studied epoch-wise double descent in which the test error undergoes two non-monotonous transitions, or descents as the training time increases. |
Mohammad Pezeshki; Amartya Mitra; Yoshua Bengio; Guillaume Lajoie; |
773 | A Differential Entropy Estimator for Training Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address shortcomings in previously proposed estimators for DE, here we introduce KNIFE, a fully parameterized, differentiable kernel-based estimator of DE. |
Georg Pichler; Pierre Jean A. Colombo; Malik Boudiaf; Günther Koliander; Pablo Piantanida;
774 | Federated Learning with Partial Model Personalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We consider two federated learning algorithms for training partially personalized models, where the shared and personal parameters are updated either simultaneously or alternately on the devices. |
Krishna Pillutla; Kshitiz Malik; Abdel-Rahman Mohamed; Mike Rabbat; Maziar Sanjabi; Lin Xiao; |
775 | Deep Networks on Toroids: Removing Symmetries Reveals The Structure of Flat Regions in The Landscape Geometry Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Grouping classifiers into equivalence classes, we develop a standardized parameterization in which all symmetries are removed, resulting in a toroidal topology. On this space, we explore the error landscape rather than the loss. |
Fabrizio Pittorino; Antonio Ferraro; Gabriele Perugini; Christoph Feinauer; Carlo Baldassi; Riccardo Zecchina; |
776 | Geometric Multimodal Contrastive Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method consisting of two main components: i) a two-level architecture consisting of modality-specific base encoders, allowing to process an arbitrary number of modalities to an intermediate representation of fixed dimensionality, and a shared projection head, mapping the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. |
Petra Poklukar; Miguel Vasco; Hang Yin; Francisco S. Melo; Ana Paiva; Danica Kragic; |
777 | Constrained Offline Policy Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work we introduce Constrained Offline Policy Optimization (COPO), an offline policy optimization algorithm for learning in MDPs with cost constraints. |
Nicholas Polosky; Bruno C. Da Silva; Madalina Fiterau; Jithin Jagannath; |
778 | Offline Meta-Reinforcement Learning with Online Self-Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy, and then collects additional unsupervised online data, without any reward labels to bridge this distribution shift. |
Vitchyr H Pong; Ashvin V Nair; Laura M Smith; Catherine Huang; Sergey Levine; |
779 | Debiaser Beware: Pitfalls of Centering Regularized Transport Maps Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, and perhaps surprisingly, we present a few cases in which debiasing is provably detrimental in a statistical sense, notably when the regularization strength is large or the number of samples is small. |
Aram-Alexandre Pooladian; Marco Cuturi; Jonathan Niles-Weed; |
780 | Adaptive Second Order Coresets for Data-efficient Machine Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose AdaCore, a method that leverages the geometry of the data to extract subsets of the training examples for efficient machine learning. |
Omead Pooladzandi; David Davini; Baharan Mirzasoleiman; |
781 | On The Practicality of Deterministic Epistemic Uncertainty Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we first provide a taxonomy of DUMs, and evaluate their calibration under continuous distributional shifts. Then, we extend them to semantic segmentation. |
Janis Postels; Mattia Segù; Tao Sun; Luca Daniel Sieber; Luc Van Gool; Fisher Yu; Federico Tombari;
782 | A Simple Guard for Learned Optimizers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new class of Safeguarded L2O, called Loss-Guarded L2O (LGL2O), which is both conceptually simpler and computationally less expensive. |
Isabeau Prémont-Schwarz; Jaroslav Vítků; Jan Feyereisl;
783 | Hardness and Algorithms for Robust and Sparse Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We explore algorithms and limitations for sparse optimization problems such as sparse linear regression and robust linear regression. |
Eric Price; Sandeep Silwal; Samson Zhou; |
784 | Nonlinear Feature Diffusion on Hypergraphs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here, we develop a nonlinear diffusion process on hypergraphs that spreads both features and labels following the hypergraph structure. |
Konstantin Prokopchik; Austin R Benson; Francesco Tudisco; |
785 | Universal Joint Approximation of Manifolds and Densities By Simple Injective Flows Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study approximation of probability measures supported on n-dimensional manifolds embedded in R^m by injective flows—neural networks composed of invertible flows and injective layers. |
Michael Puthawala; Matti Lassas; Ivan Dokmanic; Maarten De Hoop; |
786 | The Teaching Dimension of Regularized Kernel Learners Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by the fact that regularization can reduce the learning complexity in machine learning, a natural question is whether a similar effect arises in machine teaching. To answer this essential question, this paper proposes a unified theoretical framework termed STARKE to analyze the TD of regularized kernel learners. |
Hong Qian; Xu-Hui Liu; Chen-Xi Su; Aimin Zhou; Yang Yu; |
787 | ContentVec: An Improved Self-Supervised Speech Representation By Disentangling Speakers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new SSL method that can achieve speaker disentanglement without severe loss of content. |
Kaizhi Qian; Yang Zhang; Heting Gao; Junrui Ni; Cheng-I Lai; David Cox; Mark Hasegawa-Johnson; Shiyu Chang; |
788 | Interventional Contrastive Learning with Meta Semantic Regularizer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a backdoor adjustment-based regularization method, namely Interventional Contrastive Learning with Meta Semantic Regularizer (ICL-MSR), to perform causal intervention towards the proposed SCM. |
Wenwen Qiang; Jiangmeng Li; Changwen Zheng; Bing Su; Hui Xiong; |
789 | Sample-Efficient Reinforcement Learning with Loglog(T) Switching Cost Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new algorithm based on stage-wise exploration and adaptive policy elimination that achieves a regret of $\widetilde{O}(\sqrt{H^4S^2AT})$ while requiring a switching cost of $O(HSA \log\log T)$. |
Dan Qiao; Ming Yin; Ming Min; Yu-Xiang Wang; |
790 | Generalizing to Evolving Domains with Latent Structure-Aware Sequential Autoencoder Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we formulate the aforementioned setting as the problem of evolving domain generalization. |
Tiexin Qin; Shiqi Wang; Haoliang Li; |
791 | Graph Neural Architecture Search Under Distribution Shifts Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, when there is a distribution shift between training and testing graphs, the existing approaches fail to deal with the problem of adapting to unknown test graph structures since they only search for a fixed architecture for all graphs. To solve this problem, we propose a novel GRACES model which is able to generalize under distribution shifts through tailoring a customized GNN architecture suitable for each graph instance with unknown distribution. |
Yijian Qin; Xin Wang; Ziwei Zhang; Pengtao Xie; Wenwu Zhu; |
792 | Spectral Representation of Robustness Measures for Optimization Under Input Uncertainty Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a Spectral Representation of Robustness Measures based on the GP’s spectral representation, i.e., an analytical approach to approximately infer both robustness measures for normal and uniform input uncertainty distributions. |
Jixiang Qing; Tom Dhaene; Ivo Couckuyt; |
793 | Large-scale Stochastic Optimization of NDCG Surrogates for Deep Learning with Provable Convergence Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a principled approach to optimize NDCG and its top-$K$ variant. |
Zi-Hao Qiu; Quanqi Hu; Yongjian Zhong; Lijun Zhang; Tianbao Yang; |
794 | Latent Outlier Exposure for Anomaly Detection with Contaminated Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a strategy for training an anomaly detector in the presence of unlabeled anomalies that is compatible with a broad class of models. |
Chen Qiu; Aodong Li; Marius Kloft; Maja Rudolph; Stephan Mandt; |
795 | Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To narrow such a gap, we study contrastive-learning empowered RL for a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions. |
Shuang Qiu; Lingxiao Wang; Chenjia Bai; Zhuoran Yang; Zhaoran Wang; |
796 | Fast and Provable Nonconvex Tensor RPCA Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study nonconvex tensor robust principal component analysis (RPCA) based on the $t$-SVD. |
Haiquan Qiu; Yao Wang; Shaojie Tang; Deyu Meng; Quanming Yao; |
797 | Generalized Federated Learning Via Sharpness Aware Minimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Therefore, in this paper, we revisit the solutions to the distribution shift problem in FL with a focus on local learning generality. |
Zhe Qu; Xingyu Li; Rui Duan; Yao Liu; Bo Tang; Zhuo Lu; |
798 | Particle Transformer for Jet Tagging Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present JetClass, a new comprehensive dataset for jet tagging. |
Huilin Qu; Congqiao Li; Sitian Qian; |
799 | Winning The Lottery Ahead of Time: Efficient Early Network Pruning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Early Compression via Gradient Flow Preservation (EarlyCroP), which efficiently extracts state-of-the-art sparse models before or early in training addressing challenge (1), and can be applied in a structured manner addressing challenge (2). |
John Rachwan; Daniel Zügner; Bertrand Charpentier; Simon Geisler; Morgane Ayle; Stephan Günnemann; |
800 | Convergence of Uncertainty Sampling for Active Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose an efficient uncertainty estimator for binary classification which we also extend to multiple classes, and provide a non-asymptotic rate of convergence for our uncertainty sampling based active learning algorithm in both cases under no-noise conditions (i.e., linearly separable data). |
Anant Raj; Francis Bach; |
801 | DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, due to the much larger model size and unique architecture, how to provide fast MoE model inference remains challenging and unsolved, limiting their practical usage. To tackle this, we present DeepSpeed-MoE, an end-to-end MoE training and inference solution, including novel MoE architecture designs and model compression techniques that reduce MoE model size by up to 3.7x, and a highly optimized inference system that provides 7.3x better latency and cost compared to existing MoE inference solutions. |
Samyam Rajbhandari; Conglong Li; Zhewei Yao; Minjia Zhang; Reza Yazdani Aminabadi; Ammar Ahmad Awan; Jeff Rasley; Yuxiong He; |
802 | Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a new regularization – named Fishr – that enforces domain invariance in the space of the gradients of the loss: specifically, the domain-level variances of gradients are matched across training domains. |
Alexandre Rame; Corentin Dancette; Matthieu Cord; |
803 | A Closer Look at Smoothness in Domain Adversarial Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we analyze the effect of smoothness enhancing formulations on domain adversarial training, the objective of which is a combination of task loss (e.g. |
Harsh Rangwani; Sumukh K Aithal; Mayank Mishra; Arihant Jain; Venkatesh Babu Radhakrishnan; |
804 | Linear Adversarial Concept Erasure Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we formulate the problem of identifying a linear subspace that corresponds to a given concept, and removing it from the representation. |
Shauli Ravfogel; Michael Twiton; Yoav Goldberg; Ryan D Cotterell; |
805 | Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Through a dynamical systems lens, we overcome challenges associated with hierarchy, and establish implicit regularization towards low hierarchical tensor rank. |
Noam Razin; Asaf Maman; Nadav Cohen; |
806 | One-Pass Algorithms for MAP Inference of Nonsymmetric Determinantal Point Processes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we initiate the study of one-pass algorithms for solving the maximum-a-posteriori (MAP) inference problem for Non-symmetric Determinantal Point Processes (NDPPs). |
Aravind Reddy; Ryan A. Rossi; Zhao Song; Anup Rao; Tung Mai; Nedim Lipka; Gang Wu; Eunyee Koh; Nesreen Ahmed; |
807 | Universality of Winning Tickets: A Renormalization Group Perspective Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We find that iterative magnitude pruning, the principal algorithm used for discovering winning tickets, is a renormalization group scheme, and can be viewed as inducing a flow in parameter space. |
William T Redman; Tianlong Chen; Zhangyang Wang; Akshunna S. Dogra; |
808 | The Dynamics of Representation Learning in Shallow, Non-linear Autoencoders Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here, we study the dynamics of feature learning in non-linear, shallow autoencoders. |
Maria Refinetti; Sebastian Goldt; |
809 | Proximal Exploration for Model-guided Protein Sequence Design Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study the exploration mechanism of model-guided sequence design. |
Zhizhou Ren; Jiahan Li; Fan Ding; Yuan Zhou; Jianzhu Ma; Jian Peng; |
810 | Towards Theoretical Analysis of Transformation Complexity of ReLU DNNs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper aims to theoretically analyze the complexity of feature transformations encoded in piecewise linear DNNs with ReLU layers. |
Jie Ren; Mingjie Li; Meng Zhou; Shih-Han Chan; Quanshi Zhang; |
811 | Benchmarking and Analyzing Point Cloud Classification Under Corruptions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we aim to rigorously benchmark and analyze point cloud classification under corruptions. |
Jiawei Ren; Liang Pan; Ziwei Liu; |
812 | A Unified View on PAC-Bayes Bounds for Meta-Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, by upper bounding arbitrary convex functions, which link the expected and empirical losses at the environment and also per-task levels, we obtain new PAC-Bayes bounds. |
Arezou Rezazadeh; |
813 | 3PC: Three Point Compressors for Communication-Efficient Distributed Training and A Better Theory for Lazy Aggregation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose and study a new class of gradient compressors for communication-efficient training—three point compressors (3PC)—as well as efficient distributed nonconvex optimization algorithms that can take advantage of them. |
Peter Richtarik; Igor Sokolov; Elnur Gasanov; Ilyas Fatkhullin; Zhize Li; Eduard Gorbunov; |
814 | Robust SDE-Based Variational Formulations for Solving Linear PDEs Via Deep Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this article, we rigorously investigate corresponding numerical aspects that appear in the context of linear Kolmogorov PDEs. |
Lorenz Richter; Julius Berner; |
815 | Probabilistically Robust Learning: Balancing Average and Worst-case Performance Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, in this paper we propose a framework called probabilistic robustness that bridges the gap between the accurate, yet brittle average case and the robust, yet conservative worst case by enforcing robustness to most rather than to all perturbations. |
Alexander Robey; Luiz Chamon; George J. Pappas; Hamed Hassani; |
816 | LyaNet: A Lyapunov Framework for Training Neural ODEs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a method for training ordinary differential equations by using a control-theoretic Lyapunov condition for stability. |
Ivan Dario Jimenez Rodriguez; Aaron Ames; Yisong Yue; |
817 | Short-Term Plasticity Neurons Learning to Learn and Forget Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here we present a new type of recurrent neural unit, the STP Neuron (STPN), which turns out to be strikingly powerful. |
Hector Garcia Rodriguez; Qinghai Guo; Timoleon Moraitis; |
818 | Function-space Inference with Sparse Implicit Processes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing methods that can tune the prior IP result in a Gaussian predictive distribution, which fails to capture important data patterns. By contrast, methods producing flexible predictive distributions by using another IP to approximate the posterior process cannot tune the prior IP to the observed data. We propose here the first method that can accomplish both goals. |
Simon Rodríguez-Santana; Bryan Zaldivar; Daniel Hernandez-Lobato; |
819 | Score Matching Enables Causal Discovery of Nonlinear Additive Noise Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper demonstrates how to recover causal graphs from the score of the data distribution in non-linear additive (Gaussian) noise models. |
Paul Rolland; Volkan Cevher; Matthäus Kleindessner; Chris Russell; Dominik Janzing; Bernhard Schölkopf; Francesco Locatello; |
820 | Dual Decomposition of Convex Optimization Layers for Consistent Attention in Medical Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a multi-layer attention mechanism that enforces consistent interpretations between attended convolutional layers using convex optimization. |
Tom Ron; Tamir Hazan; |
821 | A Consistent and Efficient Evaluation Strategy for Attribution Methods Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present an information-theoretic analysis of evaluation strategies based on pixel perturbations. |
Yao Rong; Tobias Leemann; Vadim Borisov; Gjergji Kasneci; Enkelejda Kasneci; |
822 | Efficiently Learning The Topology and Behavior of A Networked Dynamical System Via Active Queries Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present algorithms to learn the topology and the behavior under both batch and adaptive query models for several classes of dynamical systems. |
Daniel J Rosenkrantz; Abhijin Adiga; Madhav Marathe; Zirou Qiu; S S Ravi; Richard Stearns; Anil Vullikanti; |
823 | Learning to Infer Structures of Network Games Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We adopt a transformer-like architecture which correctly accounts for the symmetries of the problem and learns a mapping from the equilibrium actions to the network structure of the game without explicit knowledge of the utility function. |
Emanuele Rossi; Federico Monti; Yan Leng; Michael Bronstein; Xiaowen Dong; |
824 | Direct Behavior Specification Via Constrained Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we argue that constrained RL, which has almost exclusively been used for safe RL, also has the potential to significantly reduce the amount of work spent for reward specification in applied RL projects. |
Julien Roy; Roger Girgis; Joshua Romoff; Pierre-Luc Bacon; Chris J Pal; |
825 | Constraint-based Graph Network Simulator Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here we present a framework for constraint-based learned simulation, where a scalar constraint function is implemented as a graph neural network, and future predictions are computed by solving the optimization problem defined by the learned constraint. |
Yulia Rubanova; Alvaro Sanchez-Gonzalez; Tobias Pfaff; Peter Battaglia; |
826 | Continual Learning Via Sequential Function-Space Variational Inference Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Addressing the drawbacks of existing techniques, we propose an optimization objective derived by formulating continual learning as sequential function-space variational inference. |
Tim G. J. Rudner; Freddie Bickford Smith; Qixuan Feng; Yee Whye Teh; Yarin Gal; |
827 | Graph-Coupled Oscillator Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Graph-Coupled Oscillator Networks (GraphCON), a novel framework for deep learning on graphs. |
T. Konstantin Rusch; Ben Chamberlain; James Rowbottom; Siddhartha Mishra; Michael Bronstein; |
828 | Hindering Adversarial Attacks with Implicit Neural Representations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce the Lossy Implicit Network Activation Coding (LINAC) defence, an input transformation which successfully hinders several common adversarial attacks on CIFAR-10 classifiers for perturbations up to 8/255 in Linf norm and 0.5 in L2 norm. |
Andrei A Rusu; Dan Andrei Calian; Sven Gowal; Raia Hadsell; |
829 | Exploiting Independent Instruments: Identification and Distribution Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We connect to the existing literature in econometrics and provide a practical method called HSIC-X for exploiting independence that can be combined with any gradient-based learning procedure. |
Sorawit Saengkyongam; Leonard Henckel; Niklas Pfister; Jonas Peters; |
830 | FedNL: Making Newton-Type Methods Applicable to Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by recent work of Islamov et al (2021), we propose a family of Federated Newton Learn (FedNL) methods, which we believe is a marked step in the direction of making second-order methods applicable to FL. |
Mher Safaryan; Rustem Islamov; Xun Qian; Peter Richtarik; |
831 | Versatile Dueling Bandits: Best-of-both World Analyses for Learning from Relative Preferences Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the problem of K-armed dueling bandit for both stochastic and adversarial environments, where the goal of the learner is to aggregate information through relative preferences of pair of decision points queried in an online sequential manner. |
Aadirupa Saha; Pierre Gaillard; |
832 | Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the problem of dynamic regret minimization in $K$-armed Dueling Bandits under non-stationary or time-varying preferences. |
Aadirupa Saha; Shubham Gupta; |
833 | Unraveling Attention Via Convex Duality: Analysis and Interpretations of Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the underpinning inductive bias of attention is not well understood. To address this issue, this paper analyzes attention through the lens of convex duality. |
Arda Sahiner; Tolga Ergen; Batu Ozturkler; John Pauly; Morteza Mardani; Mert Pilanci; |
834 | Off-Policy Evaluation for Large Action Spaces Via Embeddings Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This foils the use of OPE in many applications from recommender systems to language models. To overcome this issue, we propose a new OPE estimator that leverages marginalized importance weights when action embeddings provide structure in the action space. |
Yuta Saito; Thorsten Joachims; |
835 | Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose Optimally Clipped Tensors And Vectors (OCTAV), a recursive algorithm to determine MSE-optimal clipping scalars. |
Charbel Sakr; Steve Dai; Rangha Venkatesan; Brian Zimmer; William Dally; Brucek Khailany; |
836 | A Convergence Theory for SVGD in The Population Limit Under Talagrand’s Inequality T1 Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the convergence of SVGD in the population limit, (i.e., with an infinite number of particles) to sample from a non-logconcave target distribution satisfying Talagrand’s inequality T1. |
Adil Salim; Lukang Sun; Peter Richtarik; |
837 | FITNESS: (Fine Tune on New and Similar Samples) to Detect Anomalies in Streams with Drift and Outliers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose FITNESS (Fine Tune on New and Similar Samples), a flexible framework for detecting anomalies on data streams. |
Abishek Sankararaman; Balakrishnan Narayanaswamy; Vikramank Y Singh; Zhao Song; |
838 | The Algebraic Path Problem for Graph Metrics Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We here clarify, for the first time, the relation between the potential distance and the log-semiring. |
Enrique Fita Sanmartín; Sebastian Damrich; Fred Hamprecht; |
839 | LSB: Local Self-Balancing MCMC in Discrete Spaces Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present the Local Self-Balancing sampler (LSB), a local Markov Chain Monte Carlo (MCMC) method for sampling in purely discrete domains, which is able to autonomously adapt to the target distribution and to reduce the number of target evaluations required to converge. |
Emanuele Sansone; |
840 | PoF: Post-Training of Feature Extractor for Improving Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We developed a training algorithm called PoF: Post-Training of Feature Extractor that updates the feature extractor part of an already-trained deep model to search a flatter minimum. |
Ikuro Sato; Yamada Ryota; Masayuki Tanaka; Nakamasa Inoue; Rei Kawakami; |
841 | Re-evaluating Word Mover’s Distance Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we point out that the evaluation in the original study could be misleading. |
Ryoma Sato; Makoto Yamada; Hisashi Kashima; |
842 | Understanding Contrastive Learning Requires Incorporating Inductive Biases Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Theoretical analysis is presented for the class of linear representations, where incorporating inductive biases of the function class allows contrastive learning to work with less stringent conditions compared to prior analyses. |
Nikunj Saunshi; Jordan Ash; Surbhi Goel; Dipendra Misra; Cyril Zhang; Sanjeev Arora; Sham Kakade; Akshay Krishnamurthy; |
843 | The Neural Race Reduction: Dynamics of Abstraction in Gated Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we begin to address this gap by introducing the Gated Deep Linear Network framework that schematizes how pathways of information flow impact learning dynamics within an architecture. |
Andrew Saxe; Shagun Sodhani; Sam Jay Lewallen; |
844 | Convergence Rates of Non-Convex Stochastic Gradient Descent Under A Generic Lojasiewicz Condition and Local Smoothness Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose to extend these results by analyzing stochastic gradient descent under more generic Lojasiewicz conditions that are applicable to any convex loss function, thus extending the current theory to a larger panel of losses commonly used in practice such as cross-entropy. |
Kevin Scaman; Cedric Malherbe; Ludovic Dos Santos; |
845 | An Asymptotic Test for Conditional Independence Using Analytic Kernel Embeddings Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new conditional dependence measure and a statistical test for conditional independence. |
Meyer Scetbon; Laurent Meunier; Yaniv Romano; |
846 | Linear-Time Gromov Wasserstein Distances Using Low Rank Couplings and Costs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We show in this work how a recent variant of the OT problem that restricts the set of admissible couplings to those having a low-rank factorization is remarkably well suited to the resolution of GW: when applied to GW, we show that this approach is not only able to compute a stationary point of the GW problem in time $O(n^2)$, but also uniquely positioned to benefit from the knowledge that the initial cost matrices are low-rank, to yield a linear time $O(n)$ GW approximation. |
Meyer Scetbon; Gabriel Peyré; Marco Cuturi; |
847 | Streaming Inference for Infinite Feature Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we make feature models significantly more applicable to streaming data by imbuing them with the ability to create new features, online, in a probabilistic and principled manner. |
Rylan Schaeffer; Yilun Du; Gabrielle K Liu; Ila Fiete; |
848 | Modeling Irregular Time Series with Continuous Recurrent Units Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, in many datasets (e.g. medical records) observation times are irregular and can carry important information. To address this challenge, we propose continuous recurrent units (CRUs) – a neural architecture that can naturally handle irregular intervals between observations. |
Mona Schirmer; Mazin Eltayeb; Stefan Lessmann; Maja Rudolph; |
849 | Structure Preserving Neural Networks: A Case Study in The Entropy Closure of The Boltzmann Equation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we explore applications of deep learning in statistical physics. |
Steffen Schotthöfer; Tianbai Xiao; Martin Frank; Cory Hauck; |
850 | Improving Robustness Against Real-World and Worst-Case Distribution Shifts Through Decision Region Quantification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose the Decision Region Quantification (DRQ) algorithm to improve the robustness of any differentiable pre-trained model against both real-world and worst-case distribution shifts in the data. |
Leo Schwinn; Leon Bungert; An Nguyen; René Raab; Falk Pulsmeyer; Doina Precup; Bjoern Eskofier; Dario Zanca; |
851 | Symmetric Machine Theory of Mind Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In contrast, we propose to model machine theory of mind in a more general symmetric scenario. |
Melanie Sclar; Graham Neubig; Yonatan Bisk; |
852 | Data-SUITE: Data-centric Identification of In-distribution Incongruous Examples Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a paradigm shift with Data-SUITE: a data-centric AI framework to identify these regions, independent of a task-specific model. |
Nabeel Seedat; Jonathan Crabbé; Mihaela van der Schaar; |
853 | Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To handle arbitrary observation patterns, we interpret the data as samples from an underlying continuous-time process and propose to model its latent trajectory explicitly using the mathematics of controlled differential equations. |
Nabeel Seedat; Fergus Imrie; Alexis Bellot; Zhaozhi Qian; Mihaela van der Schaar; |
854 | Neural Tangent Kernel Beyond The Infinite-Width Limit: Effects of Depth and Initialization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study the NTK of fully-connected ReLU networks with depth comparable to width. |
Mariia Seleznova; Gitta Kutyniok; |
855 | Reinforcement Learning with Action-Free Pre-Training from Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos. |
Younggyo Seo; Kimin Lee; Stephen L James; Pieter Abbeel; |
856 | Efficient Model-based Multi-agent Reinforcement Learning Via Optimistic Equilibrium Computation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose H-MARL (Hallucinated Multi-Agent Reinforcement Learning), a novel sample-efficient algorithm that can efficiently balance exploration, i.e., learning about the environment, and exploitation, i.e., achieve good equilibrium performance in the underlying general-sum Markov game. |
Pier Giuseppe Sessa; Maryam Kamgarpour; Andreas Krause; |
857 | Selective Regression Under Fairness Criteria Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, as we show, in some cases, the performance of a minority subgroup can decrease while we reduce the coverage, and thus selective regression can magnify disparities between different sensitive subgroups. Motivated by these disparities, we propose new fairness criteria for selective regression requiring the performance of every subgroup to improve with a decrease in coverage. |
Abhin Shah; Yuheng Bu; Joshua K Lee; Subhro Das; Rameswar Panda; Prasanna Sattigeri; Gregory W Wornell; |
858 | Utility Theory for Sequential Decision Making Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The von Neumann-Morgenstern (VNM) utility theorem shows that under certain axioms of rationality, decision-making is reduced to maximizing the expectation of some utility function. We extend these axioms to increasingly structured sequential decision making settings and identify the structure of the corresponding utility functions. |
Mehran Shakerinava; Siamak Ravanbakhsh; |
859 | Translating Robot Skills: Learning Unsupervised Skill Correspondences Across Robots Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we explore how we can endow robots with the ability to learn correspondences between their own skills, and those of morphologically different robots in different domains, in an entirely unsupervised manner. |
Tanmay Shankar; Yixin Lin; Aravind Rajeswaran; Vikash Kumar; Stuart Anderson; Jean Oh; |
860 | A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Assuming access to a few demonstrations, we propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations. |
Archit Sharma; Rehaan Ahmad; Chelsea Finn; |
861 | Content Addressable Memory Without Catastrophic Forgetting By Heteroassociation with A Fixed Scaffold Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel CAM architecture, Memory Scaffold with Heteroassociation (MESH), that factorizes the problems of internal attractor dynamics and association with external content to generate a CAM continuum without a memory cliff: Small numbers of patterns are stored with complete information recovery matching standard CAMs, while inserting more patterns still results in partial recall of every pattern, with a graceful trade-off between pattern number and pattern richness. |
Sugandha Sharma; Sarthak Chandra; Ila Fiete; |
862 | Federated Minimax Optimization: Improved Convergence Analyses and Algorithms Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we consider nonconvex minimax optimization, which is gaining prominence in many modern machine learning applications, such as GANs. |
Pranay Sharma; Rohan Panda; Gauri Joshi; Pramod Varshney; |
863 | DNS: Determinantal Point Process Based Neural Network Sampler for Ensemble Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose DNS: a Determinantal Point Process based Neural Network Sampler that specifically uses k-DPP to sample a subset of neural networks for backpropagation at every training step thus significantly reducing the training time and computation cost. |
Hassam Sheikh; Kizza Frisbee; Mariano Phielipp; |
864 | Instance Dependent Regret Analysis of Kernelized Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the problem of designing an adaptive strategy for querying a noisy zeroth-order-oracle to efficiently learn about the optimizer of an unknown function $f$. |
Shubhanshu Shekhar; Tara Javidi; |
865 | Data Augmentation As Feature Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work we consider another angle, and we study the effect of data augmentation on the dynamic of the learning process. |
Ruoqi Shen; Sebastien Bubeck; Suriya Gunasekar; |
866 | Metric-Fair Active Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we henceforth study metric-fair active learning of homogeneous halfspaces, and show that under the distribution-dependent PAC learning model, fairness and label efficiency can be achieved simultaneously. |
Jie Shen; Nan Cui; Jing Wang; |
867 | PDO-s3DCNNs: Partial Differential Operator Based Steerable 3D CNNs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we employ partial differential operators (PDOs) to model 3D filters, and derive general steerable 3D CNNs, which are called PDO-s3DCNNs. |
Zhengyang Shen; Tao Hong; Qi She; Jinwen Ma; Zhouchen Lin; |
868 | Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we show that contrastive pre-training, which learns features on unlabeled source and target data and then fine-tunes on labeled source data, is competitive with strong UDA methods. |
Kendrick Shen; Robbie M Jones; Ananya Kumar; Sang Michael Xie; Jeff Z. Haochen; Tengyu Ma; Percy Liang; |
869 | Constrained Optimization with Dynamic Bound-scaling for Effective NLP Backdoor Defense Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we develop a novel optimization method for NLP backdoor inversion. |
Guangyu Shen; Yingqi Liu; Guanhong Tao; Qiuling Xu; Zhuo Zhang; Shengwei An; Shiqing Ma; Xiangyu Zhang; |
870 | Staged Training for Transformer Language Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As an alternative, we consider a staged training setup that begins with a small model and incrementally increases the amount of compute used for training by applying a growth operator to increase the model depth and width. |
Sheng Shen; Pete Walsh; Kurt Keutzer; Jesse Dodge; Matthew Peters; Iz Beltagy; |
871 | Deep Network Approximation in Terms of Intrinsic Parameters Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: From an approximation perspective, this paper shows that the number of parameters that need to be learned can be significantly smaller than people typically expect. |
Zuowei Shen; Haizhao Yang; Shijun Zhang; |
872 | Gradient-Free Method for Heavily Constrained Nonconvex Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, to solve the nonconvex problem with a large number of white/black-box constraints, we propose a doubly stochastic zeroth-order gradient method (DSZOG) with momentum and adaptive step size. |
Wanli Shi; Hongchang Gao; Bin Gu; |
873 | Global Optimization of K-Center Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work provides a practical global optimization algorithm for this task based on a reduced-space spatial branch and bound scheme. |
Mingfei Shi; Kaixun Hua; Jiayang Ren; Yankai Cao; |
874 | Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Altogether, this work highlights the efficiency of model-free algorithms in offline RL when used in conjunction with pessimism and variance reduction. |
Laixi Shi; Gen Li; Yuting Wei; Yuxin Chen; Yuejie Chi; |
875 | Adversarial Masking for Self-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose ADIOS, a masked image model (MIM) framework for self-supervised learning, which simultaneously learns a masking function and an image encoder using an adversarial objective. |
Yuge Shi; N Siddharth; Philip Torr; Adam R Kosiorek; |
876 | Visual Attention Emerges from Recurrent Sparse Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present VARS, Visual Attention from Recurrent Sparse reconstruction, a new attention formulation built on two prominent features of the human visual attention mechanism: recurrency and sparsity. |
Baifeng Shi; Yale Song; Neel Joshi; Trevor Darrell; Xin Wang; |
877 | A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes (POMDPs), where the evaluation policy depends only on observable variables and the behavior policy depends on unobservable latent variables. |
Chengchun Shi; Masatoshi Uehara; Jiawei Huang; Nan Jiang; |
878 | Robust Group Synchronization Via Quadratic Programming Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel quadratic programming formulation for estimating the corruption levels in group synchronization, and use these estimates to solve this problem. |
Yunpeng Shi; Cole M Wyeth; Gilad Lerman; |
879 | Log-Euclidean Signatures for Intrinsic Distances Between Unaligned Datasets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Instead, we exploit the Riemannian geometry of SPD matrices to compare these operators and define a new theoretically-motivated distance based on a lower bound of the log-Euclidean metric. |
Tal Shnitzer; Mikhail Yurochkin; Kristjan Greenewald; Justin M Solomon; |
880 | Scalable Computation of Causal Bounds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We consider the problem of computing bounds for causal inference problems with unobserved confounders, where identifiability does not hold. |
Madhumitha Shridharan; Garud Iyengar; |
881 | Bit Prioritization in Variational Autoencoders Via Progressive Coding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we treat image synthesis itself as a hierarchical representation learning problem and regularize an HVAE toward representations that improve the model’s image synthesis performance. |
Rui Shu; Stefano Ermon; |
882 | Fair Representation Learning Through Implicit Path Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Besides, to avoid the high computational and memory cost of differentiating in the inner-loop of bi-level objective, we propose an implicit path alignment algorithm, which only relies on the solution of inner optimization and the implicit differentiation rather than the exact optimization path. |
Changjian Shui; Qi Chen; Jiaqi Li; Boyu Wang; Christian Gagné; |
883 | Faster Algorithms for Learning Convex Functions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we develop and analyze an approach for solving a broad range of convex function learning problems that is faster than state-of-the-art approaches. |
Ali Siahkamari; Durmus Alp Emre Acar; Christopher Liao; Kelly L Geyer; Venkatesh Saligrama; Brian Kulis; |
884 | Coin Flipping Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We show that neural networks with access to randomness can outperform deterministic networks by using amplification. |
Yuval Sieradzki; Nitzan Hodos; Gal Yehuda; Assaf Schuster; |
885 | Reverse Engineering The Neural Tangent Kernel Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As a paradigm for principled neural architecture design, we propose the translation of high-performing kernels, which are better-understood and amenable to first-principles design, into equivalent network architectures, which have superior efficiency, flexibility, and feature learning. |
James Benjamin Simon; Sajant Anand; Mike Deweese; |
886 | Demystifying The Adversarial Robustness of Random Transformation Defenses Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: First, we show that the BPDA attack (Athalye et al., 2018a) used in BaRT’s evaluation is ineffective and likely overestimates its robustness. We then attempt to construct the strongest possible RT defense through the informed selection of transformations and Bayesian optimization for tuning their parameters. |
Chawin Sitawarin; Zachary J Golan-Strieb; David Wagner; |
887 | Smoothed Adversarial Linear Contextual Bandits with Knapsacks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present algorithms and characterize regret for LinCBwK in the smoothed setting where base context vectors are assumed to be perturbed by Gaussian noise. |
Vidyashankar Sivakumar; Shiliang Zuo; Arindam Banerjee; |
888 | GenLabel: Mixup Relabeling Using Generative Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we first identify the main causes of this phenomenon by theoretically and empirically analyzing the mixup algorithm. To resolve this, we propose GenLabel, a simple yet effective relabeling algorithm designed for mixup. |
Jy-Yong Sohn; Liang Shang; Hongxu Chen; Jaekyun Moon; Dimitris Papailiopoulos; Kangwook Lee; |
889 | Communicating Via Markov Decision Processes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We consider the problem of communicating exogenous information by means of Markov decision process trajectories. |
Samuel Sokota; Christian A Schroeder De Witt; Maximilian Igl; Luisa M Zintgraf; Philip Torr; Martin Strohmeier; Zico Kolter; Shimon Whiteson; Jakob Foerster; |
890 | The Multivariate Community Hawkes Model for Dependent Relational Events in Continuous-time Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose the multivariate community Hawkes (MULCH) model, an extremely flexible community-based model for continuous-time networks that introduces dependence between node pairs using structured multivariate Hawkes processes. |
Hadeel Soliman; Lingfei Zhao; Zhipeng Huang; Subhadeep Paul; Kevin S Xu; |
891 | Disentangling Sources of Risk for Distributional Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose Disentangled RIsk-sensitive Multi-Agent reinforcement learning (DRIMA) to separately access the risk sources. |
Kyunghwan Son; Junsu Kim; Sungsoo Ahn; Roben D Delos Reyes; Yung Yi; Jinwoo Shin; |
892 | TAM: Topology-Aware Margin Loss for Class-Imbalanced Node Classification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In addition, in order to handle this issue, we propose Topology-Aware Margin (TAM) to reflect local topology on the learning objective. |
Jaeyun Song; Joonhyung Park; Eunho Yang; |
893 | A General Recipe for Likelihood-free Bayesian Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To extend BO to a broader class of models and utilities, we propose likelihood-free BO (LFBO), an approach based on likelihood-free inference. |
Jiaming Song; Lantao Yu; Willie Neiswanger; Stefano Ermon; |
894 | Fully-Connected Network on Noncompact Symmetric Space and Ridgelet Transform Based on Helgason-Fourier Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Based on the well-established framework of the Helgason-Fourier transform on the noncompact symmetric space, we present a fully-connected network and its associated ridgelet transform on the noncompact symmetric space, covering the hyperbolic neural network (HNN) and the SPDNet as special cases. |
Sho Sonoda; Isao Ishikawa; Masahiro Ikeda; |
895 | Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: For example, plane landing and take-off should ideally occur with probability one. We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs), where the safety constraints are eliminated by augmenting them into the state-space and reshaping the objective. |
Aivar Sootla; Alexander I Cowen-Rivers; Taher Jafferjee; Ziyan Wang; David H Mguni; Jun Wang; Haitham Ammar; |
896 | Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a novel algorithm that encodes the partial derivatives themselves and furthermore optimizes the codes by performing lossy compression on the derivative codewords by maximizing the information contained in the codewords while minimizing the information between the codewords. |
Pedro J Soto; Ilia Ilmer; Haibin Guan; Jun Li; |
897 | Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, allowing gradient-based optimization of multi-objective acquisition functions in the latent space of the autoencoder. |
Samuel Stanton; Wesley Maddox; Nate Gruver; Phillip Maffettone; Emily Delaney; Peyton Greenside; Andrew Gordon Wilson; |
898 | 3D Infomax Improves GNNs for Molecular Property Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although the 3D molecular graph structure is necessary for models to achieve strong performance on many tasks, it is infeasible to obtain 3D structures at the scale required by many real-world applications. To tackle this issue, we propose to use existing 3D molecular datasets to pre-train a model to reason about the geometry of molecules given only their 2D molecular graphs. |
Hannes Stärk; Dominique Beaini; Gabriele Corso; Prudencio Tossou; Christian Dallago; Stephan Günnemann; Pietro Liò; |
899 | EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We challenge this paradigm with EquiBind, an SE(3)-equivariant geometric deep learning model performing direct-shot prediction of both i) the receptor binding location (blind docking) and ii) the ligand’s bound pose and orientation. |
Hannes Stärk; Octavian Ganea; Lagnajit Pattanaik; Regina Barzilay; Tommi Jaakkola; |
900 | Plug & Play Attacks: Towards Robust and Flexible Model Inversion Attacks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To overcome these drawbacks, we present Plug & Play Attacks, which relax the dependency between the target model and image prior, and enable the use of a single GAN to attack a wide range of targets, requiring only minor adjustments to the attack. |
Lukas Struppek; Dominik Hintersdorf; Antonio De Almeida Correira; Antonia Adler; Kristian Kersting; |
901 | Scaling-up Diverse Orthogonal Convolutional Networks By A Paraunitary Framework Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a theoretical framework that establishes the equivalence between diverse orthogonal convolutional layers in the spatial domain and the paraunitary systems in the spectral domain. |
Jiahao Su; Wonmin Byeon; Furong Huang; |
902 | Divergence-Regularized Multi-Agent Actor-Critic Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we investigate divergence regularization in cooperative MARL and propose a novel off-policy cooperative MARL framework, divergence-regularized multi-agent actor-critic (DMAC). |
Kefan Su; Zongqing Lu; |
903 | Influence-Augmented Local Simulators: A Scalable Solution for Fast Deep RL in Large Networked Systems Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study how to build lightweight simulators of complicated systems that can run sufficiently fast for deep RL to be applicable. |
Miguel Suau; Jinke He; Matthijs T. J. Spaan; Frans Oliehoek; |
904 | Improved StyleGAN-v2 Based Inversion for Out-of-Distribution Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose SPHInX (StyleGAN with Projection Heads for Inverting X), an approach for accurately embedding OOD images onto the StyleGAN latent space. |
Rakshith Subramanyam; Vivek Narayanaswamy; Mark Naufel; Andreas Spanias; Jayaraman J. Thiagarajan; |
905 | Continuous-Time Analysis of Accelerated Gradient Methods Via Conservation Laws in Dilated Coordinate Systems Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We analyze continuous-time models of accelerated gradient methods through deriving conservation laws in dilated coordinate systems. |
Jaewook J Suh; Gyumin Roh; Ernest K Ryu; |
906 | Do Differentiable Simulators Give Better Policy Gradients? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and analyze this phenomenon through the lens of bias and variance. |
Hyung Ju Suh; Max Simchowitz; Kaiqing Zhang; Russ Tedrake; |
907 | Intriguing Properties of Input-Dependent Randomized Smoothing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present one concrete design of the smoothing variance function and test it on CIFAR10 and MNIST. |
Peter Súkeník; Aleksei Kuvshinov; Stephan Günnemann; |
908 | Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work presents reward surfaces and related visualizations of 27 of the most widely used reinforcement learning environments in Gym for the first time. |
Ryan Sullivan; Jordan K Terry; Benjamin Black; John P Dickerson; |
909 | AGNAS: Attention-Guided Micro and Macro-Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address the two issues, we propose a new search paradigm, AGNAS, which leverages the attention mechanism to guide the micro- and macro-architecture search. |
Zihao Sun; Yu Hu; Shun Lu; Longxing Yang; Jilin Mei; Yinhe Han; Xiaowei Li; |
910 | Adaptive Random Walk Gradient Descent for Decentralized Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study the adaptive step size random walk gradient descent with momentum for decentralized optimization, in which the training samples are drawn dependently with each other. |
Tao Sun; Dongsheng Li; Bao Wang; |
911 | MAE-DET: Revisiting Maximum Entropy Principle in Zero-Shot NAS for Efficient Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, existing NAS methods for object detection require hundreds to thousands of GPU hours of searching, making them impractical in fast-paced research and development. In this work, we propose a novel zero-shot NAS method to address this issue. |
Zhenhong Sun; Ming Lin; Xiuyu Sun; Zhiyu Tan; Hao Li; Rong Jin; |
912 | Out-of-Distribution Detection with Deep Nearest Neighbors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we explore the efficacy of non-parametric nearest-neighbor distance for OOD detection, which has been largely overlooked in the literature. |
Yiyou Sun; Yifei Ming; Xiaojin Zhu; Yixuan Li; |
913 | Black-Box Tuning for Language-Model-as-a-Service Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes the black-box tuning framework to optimize the continuous prompt prepended to the input text via derivative-free optimization. |
Tianxiang Sun; Yunfan Shao; Hong Qian; Xuanjing Huang; Xipeng Qiu; |
914 | Correlated Quantization for Distributed Mean Estimation and Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a correlated quantization protocol whose error guarantee depends on the deviation of data points instead of their absolute range. |
Ananda Theertha Suresh; Ziteng Sun; Jae Ro; Felix Yu; |
915 | Causal Imitation Learning Under Temporally Correlated Noise Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In particular, we present two techniques, one of a generative-modeling flavor (DoubIL) that can utilize access to a simulator, and one of a game-theoretic flavor (ResiduIL) that can be run entirely offline. |
Gokul Swamy; Sanjiban Choudhury; Drew Bagnell; Steven Wu; |
916 | Being Properly Improper Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our chief theoretical contribution is a generalization of the properness framework with a notion called twist-properness, which delineates loss functions with the ability to "untwist" the twisted posterior into the clean posterior. |
Tyler Sypherd; Richard Nock; Lalitha Sankar; |
917 | Distributionally-Aware Kernelized Bandit Problems for Risk Aversion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address the issues, in this paper, we model environments using a family of the output distributions (or more precisely, probability kernel) and Kernel Mean Embeddings (KME), and provide novel UCB-type algorithms for CVaR and MV. |
Sho Takemori; |
918 | Sequential and Parallel Constrained Max-value Entropy Search Via Information Lower Bound Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel variant of MES for constrained problems, called Constrained MES via Information lower BOund (CMES-IBO), that is based on a Monte Carlo (MC) estimator of a lower bound of a mutual information (MI). |
Shion Takeno; Tomoyuki Tamura; Kazuki Shitara; Masayuki Karasuyama; |
919 | SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called stochastically quantized variational autoencoder (SQ-VAE). |
Yuhta Takida; Takashi Shibuya; Weihsiang Liao; Chieh-Hsin Lai; Junki Ohmura; Toshimitsu Uesaka; Naoki Murata; Shusuke Takahashi; Toshiyuki Kumakura; Yuki Mitsufuji; |
920 | A Tree-based Model Averaging Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a tree-based model averaging approach to improve the estimation accuracy of conditional average treatment effects (CATE) at a target site by leveraging models derived from other potentially heterogeneous sites, without them sharing subject-level data. |
Xiaoqing Tan; Chung-Chou H. Chang; Ling Zhou; Lu Tang; |
921 | N-Penetrate: Active Learning of Neural Collision Handler for Complex 3D Mesh Deformations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a robust learning algorithm to detect and handle collisions in 3D deforming meshes. |
Qingyang Tan; Zherong Pan; Breannan Smith; Takaaki Shiratori; Dinesh Manocha; |
922 | Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Critically, biased gradient estimates are almost always implemented in practice, whereas prior theory on meta-RL only establishes convergence under unbiased gradient estimates. In this work, we investigate such a discrepancy. |
Yunhao Tang; |
923 | Rethinking Graph Neural Networks for Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our crucial observation is the existence of anomalies will lead to the ‘right-shift’ phenomenon, that is, the spectral energy distribution concentrates less on low frequencies and more on high frequencies. |
Jianheng Tang; Jiajin Li; Ziqi Gao; Jia Li; |
924 | Deep Safe Incomplete Multi-view Clustering: Theorem and Algorithm Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Although jointly imputing incomplete samples and conducting clustering has been shown to achieve promising performance, learning from both complete and incomplete data may be worse than learning only from complete data, particularly when imputed views are semantically inconsistent with missing views. To address this issue, we propose a novel framework to reduce the clustering performance degradation risk from semantically inconsistent imputed views. |
Huayi Tang; Yong Liu; |
925 | Virtual Homogeneity Learning: Defending Against Data Heterogeneity in Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a different approach named virtual homogeneity learning (VHL) to directly “rectify” the data heterogeneity. |
Zhenheng Tang; Yonggang Zhang; Shaohuai Shi; Xin He; Bo Han; Xiaowen Chu; |
926 | Cross-Space Active Learning on Graph Convolutional Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our study covers both budget algorithms which terminate after a designated number of label requests, and verifiable algorithms which terminate only after having found an accurate hypothesis. |
Yufei Tao; Hao Wu; Shiyuan Deng; |
927 | FedNest: Federated Bilevel, Minimax, and Compositional Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose FedNest: A federated alternating stochastic gradient method to address general nested problems. |
Davoud Ataee Tarzanagh; Mingchen Li; Christos Thrampoulidis; Samet Oymak; |
928 | Efficient Distributionally Robust Bayesian Optimization with Worst-case Sensitivity Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We provide a regret bound for our novel DRBO algorithm with the fast approximation, and empirically show it is competitive with that using the exact worst-case expected value while incurring significantly less computation time. |
Sebastian Shenghong Tay; Chuan Sheng Foo; Urano Daisuke; Richalynn Leong; Bryan Kian Hsiang Low; |
929 | LIDL: Local Intrinsic Dimension Estimation Using Approximate Likelihood Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We attempt to address that challenge by proposing a novel approach to the problem: Local Intrinsic Dimension estimation using approximate Likelihood (LIDL). |
Piotr Tempczyk; Rafal Michaluk; Lukasz Garncarek; Przemyslaw Spurek; Jacek Tabor; Adam Golinski; |
930 | LCANets: Lateral Competition Improves Robustness Against Corruption and Attack Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Motivated by evidence that neural activity in V1 is sparse, we develop a class of hybrid CNNs, called LCANets, which feature a frontend that performs sparse coding via local lateral competition. |
Michael Teti; Garrett Kenyon; Ben Migliori; Juston Moore; |
931 | Reverse Engineering $\ell_p$ Attacks: A Block-sparse Optimization Approach with Recovery Guarantees Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, given an attacked signal, we study conditions under which one can determine the type of attack ($\ell_1$, $\ell_2$ or $\ell_\infty$) and recover the clean signal. |
Darshan Thaker; Paris Giampouras; Rene Vidal; |
932 | Generalised Policy Improvement with Geometric Policy Composition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the full planning approach typical of model-based RL. |
Shantanu Thakoor; Mark Rowland; Diana Borsa; Will Dabney; Remi Munos; Andre Barreto; |
933 | Algorithms for The Communication of Samples Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here we propose two new coding schemes with practical advantages over existing approaches. |
Lucas Theis; Noureldin Y Ahmed; |
934 | Consistent Polyhedral Surrogates for Top-k Classification and Variants Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We leverage this analysis to derive constraints on the conditional label distributions under which these proposed surrogates become consistent for top-$k$. |
Anish Thilagar; Rafael Frongillo; Jessica J Finocchiaro; Emma Goodwill; |
935 | On The Finite-Time Complexity and Practical Computation of Approximate Stationarity Concepts of Lipschitz Functions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Complementing these developments, in this paper, we isolate a new class of functions that could be Clarke irregular (and thus not weakly convex anymore) and show that our new algorithmic scheme can compute NAS points for functions in that class within finite time. |
Lai Tian; Kaiwen Zhou; Anthony Man-Cho So; |
936 | From Dirichlet to Rubin: Optimistic Exploration in RL Without Bonuses Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision process: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. 2012 for multi-armed bandits. |
Daniil Tiapkin; Denis Belomestny; Eric Moulines; Alexey Naumov; Sergey Samsonov; Yunhao Tang; Michal Valko; Pierre Menard; |
937 | Nonparametric Sparse Tensor Factorization with Hierarchical Gamma Processes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a nonparametric factorization approach for sparsely observed tensors. |
Conor Tillinghast; Zheng Wang; Shandian Zhe; |
938 | Deciphering Lasso-based Classification Through A Large Dimensional Analysis of The Iterative Soft-Thresholding Algorithm Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a theoretical analysis of a Lasso-based classification algorithm. |
Malik Tiomoko; Ekkehard Schnoor; Mohamed El Amine Seddik; Igor Colin; Aladin Virmaux; |
939 | Extended Unconstrained Features Model for Exploring Deep Neural Collapse Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we further analyze and extend the UFM. |
Tom Tirer; Joan Bruna; |
940 | Object Permanence Emerges in A Random Walk Along Memory Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a self-supervised objective for learning representations that localize objects under occlusion – a property known as object permanence. |
Pavel Tokmakov; Allan Jabri; Jie Li; Adrien Gaidon; |
941 | Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and More Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work we provide: (i) A lower bound which proves that there are sets with no coresets smaller than $n=|P|$ for general monotonic loss functions. |
Elad Tolochinksy; Ibrahim Jubran; Dan Feldman; |
942 | Failure and Success of The Spectral Bias Prediction for Laplace Kernel Ridge Regression: The Case of Low-dimensional Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To clarify when the spectral bias prediction holds, we first focus on a one-dimensional model where rigorous results are obtained and then use scaling arguments to generalize and test our findings in higher dimensions. |
Umberto M Tomasini; Antonio Sclocchi; Matthieu Wyart; |
943 | Quantifying and Learning Linear Symmetry-Based Disentanglement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose D_LSBD, a mathematically sound metric to quantify LSBD, and provide a practical implementation for SO(2) groups. |
Loek Tonnaer; Luis Armando Perez Rey; Vlado Menkovski; Mike Holenderski; Jim Portegies; |
944 | A Temporal-Difference Approach to Policy Gradient Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new approach of reconstructing the policy gradient from the start state without requiring a particular sampling strategy. |
Samuele Tosatto; Andrew Patterson; Martha White; Rupam Mahmood; |
945 | Simple and Near-optimal Algorithms for Hidden Stratification and Multi-group Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper studies the structure of solutions to the multi-group learning problem, and provides simple and near-optimal algorithms for the learning problem. |
Christopher J Tosh; Daniel Hsu; |
946 | Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the lack of standardized benchmarks in this emerging field is making progress difficult to track. To address this, we present Design-Bench, a benchmark for offline MBO with a unified evaluation protocol and reference implementations of recent methods. |
Brandon Trabucco; Xinyang Geng; Aviral Kumar; Sergey Levine; |
947 | AnyMorph: Learning Transferable Policies By Inferring Agent Morphology Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This is a challenging problem that required previous approaches to use hand-designed descriptions of the new agent’s morphology. Instead of hand-designing this description, we propose a data-driven method that learns a representation of morphology directly from the reinforcement learning objective. |
Brandon Trabucco; Mariano Phielipp; Glen Berseth; |
948 | Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We prove a hardness reduction between detection and classification of adversarial examples: given a robust detector for attacks at distance $\epsilon$ (in some metric), we show how to build a similarly robust (but inefficient) classifier for attacks at distance $\epsilon/2$. |
Florian Tramer; |
949 | Nesterov Accelerated Shuffling Gradient Method for Convex Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Nesterov Accelerated Shuffling Gradient (NASG), a new algorithm for the convex finite-sum minimization problems. |
Trang H Tran; Katya Scheinberg; Lam M Nguyen; |
950 | A Completely Tuning-Free and Robust Approach to Sparse Precision Matrix Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a completely tuning-free approach for estimating sparse Gaussian graphical models. |
Chau Tran; Guo Yu; |
951 | Tackling Covariate Shift with Node-based Bayesian Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Node-based BNNs have recently been introduced as scalable alternatives, which induce epistemic uncertainty by multiplying each hidden node with latent random variables, while learning a point-estimate of the weights. In this paper, we interpret these latent noise variables as implicit representations of simple and domain-agnostic data perturbations during training, producing BNNs that perform well under covariate shift due to input corruptions. |
Trung Q Trinh; Markus Heinonen; Luigi Acerbi; Samuel Kaski; |
952 | Fenrir: Physics-Enhanced Regression for Initial Value Problems Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We show how probabilistic numerics can be used to convert an initial value problem into a Gauss–Markov process parametrised by the dynamics of the initial value problem. |
Filip Tronarp; Nathanael Bosch; Philipp Hennig; |
953 | Interpretable Off-Policy Learning Via Hyperbox Search Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an algorithm for interpretable off-policy learning via hyperbox search. |
Daniel Tschernutter; Tobias Hatt; Stefan Feuerriegel; |
954 | FriendlyCore: Practical Differentially Private Aggregation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a simple and practical tool $\mathsf{FriendlyCore}$ that takes a set of points ${\cal D}$ from an unrestricted (pseudo) metric space as input. |
Eliad Tsfadia; Edith Cohen; Haim Kaplan; Yishay Mansour; Uri Stemmer; |
955 | Pairwise Conditional Gradients Without Swap Steps and Sparser Kernel Herding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new variant of PCG, the so-called Blended Pairwise Conditional Gradients (BPCG). |
Kazuma K Tsuji; Ken'ichiro Tanaka; Sebastian Pokutta; |
956 | Prototype Based Classification from Hierarchy to Fairness Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our contribution in this work is a new neural network architecture, the concept subspace network (CSN), which generalizes existing specialized classifiers to produce a unified model capable of learning a spectrum of multi-concept relationships. |
Mycal Tucker; Julie A. Shah; |
957 | Consensus Multiplicative Weights Update: Learning to Learn Using Projector-based Game Signatures Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In doing so, we introduce CMWU, a new algorithm that extends consensus optimization to the constrained case and has local convergence guarantees for zero-sum bimatrix games, and we show that it enjoys competitive performance on both zero-sum games with constant coefficients and across a spectrum of games when its coefficients are learnt. |
Nelson Vadori; Rahul Savani; Thomas Spooner; Sumitra Ganesh; |
958 | Self-Supervised Models of Audio Effectively Explain Human Cortical Responses to Speech Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we capitalize on the progress of self-supervised speech representation learning (SSL) to create new state-of-the-art models of the human auditory system. |
Aditya R Vaidya; Shailee Jain; Alexander Huth; |
959 | Path-Gradient Estimators for Continuous Normalizing Flows Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In many applications, this regime can however not be reached by a simple Gaussian variational distribution. In this work, we overcome this crucial limitation by proposing a path-gradient estimator for the considerably more expressive variational family of continuous normalizing flows. |
Lorenz Vaitl; Kim Andrea Nicoli; Shinichi Nakajima; Pan Kessel; |
960 | Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we provide novel confidence intervals for the Nyström method and the sparse variational Gaussian process approximation method, which we establish using novel interpretations of the approximate (surrogate) posterior variance of the models. |
Sattar Vakili; Jonathan Scarlett; Da-Shan Shiu; Alberto Bernacchia; |
961 | EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a robust DME technique named EDEN that naturally handles heterogeneous communication budgets and packet losses. |
Shay Vargaftik; Ran Ben Basat; Amit Portnoy; Gal Mendelson; Yaniv Ben Itzhak; Michael Mitzenmacher; |
962 | Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $\sigma^2$ in the stochastic gradients and (ii) problem-dependent constants. |
Sharan Vaswani; Benjamin Dubois-Taine; Reza Babanezhad; |
963 | Correlation Clustering Via Strong Triadic Closure Labeling: Fast Approximation Algorithms and Practical Lower Bounds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Correlation clustering is a widely studied framework for clustering based on pairwise similarity and dissimilarity scores, but its best approximation algorithms rely on impractical linear programming relaxations. We present faster approximation algorithms that avoid these relaxations, for two well-studied special cases: cluster editing and cluster deletion. |
Nate Veldt; |
964 | The CLRS Algorithmic Reasoning Benchmark Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To consolidate progress and work towards unified evaluation, we propose the CLRS Algorithmic Reasoning Benchmark, covering classical algorithms from the Introduction to Algorithms textbook. |
Petar Velickovic; Adrià Puigdomènech Badia; David Budden; Razvan Pascanu; Andrea Banino; Misha Dashevskiy; Raia Hadsell; Charles Blundell; |
965 | Bregman Power K-Means for Clustering Exponential Family Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we bridge these new algorithmic advances to classical work on hard clustering under Bregman divergences, which enjoy a bijection to exponential family distributions and are thus well-suited for clustering objects arising from a breadth of data generating mechanisms. |
Adithya Vellal; Saptarshi Chakraborty; Jason Q Xu; |
966 | Estimation in Rotationally Invariant Generalized Linear Models Via Approximate Message Passing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel family of approximate message passing (AMP) algorithms for signal estimation, and rigorously characterize their performance in the high-dimensional limit via a state evolution recursion. |
Ramji Venkataramanan; Kevin Kögler; Marco Mondelli; |
967 | Bayesian Optimization Under Stochastic Delayed Feedback Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we consider the problem of BO under stochastic delayed feedback. |
Arun Verma; Zhongxiang Dai; Bryan Kian Hsiang Low; |
968 | VarScene: A Deep Generative Model for Realistic Scene Graph Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In contrast, scene graphs have much larger object and relation vocabularies, and their semantics are latent. To address this challenge, we propose a variational autoencoder for scene graphs, which is optimized for the maximum mean discrepancy (MMD) between the ground truth scene graph distribution and distribution of the generated scene graphs. |
Tathagat Verma; Abir De; Yateesh Agrawal; Vishwa Vinay; Soumen Chakrabarti; |
969 | Calibrated Learning to Defer with One-vs-All Classifiers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose an L2D system based on one-vs-all classifiers that is able to produce calibrated probabilities of expert correctness. |
Rajeev Verma; Eric Nalisnick; |
970 | Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an algorithm that uses linear function approximation (LFA) for stochastic shortest path (SSP). |
Daniel Vial; Advait Parulekar; Sanjay Shakkottai; R Srikant; |
971 | On Implicit Bias in Overparameterized Bilevel Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by recent studies of the implicit bias induced by optimization algorithms in single-level optimization, we investigate the implicit bias of different gradient-based algorithms for jointly optimizing the inner and outer parameters. |
Paul Vicol; Jonathan P Lorraine; Fabian Pedregosa; David Duvenaud; Roger B Grosse; |
972 | Multiclass Learning with Margin: Exponential Rates with No Bias-variance Trade-off Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the behavior of error bounds for multiclass classification under suitable margin conditions. |
Stefano Vigogna; Giacomo Meanti; Ernesto De Vito; Lorenzo Rosasco; |
973 | Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Thus, in adversarial or stochastic environments, these methods lead to overly optimistic behavior that can be dangerous in safety-critical systems like autonomous driving. In this work, we propose a method that addresses this optimism bias by explicitly disentangling the policy and world models, which allows us at test time to search for policies that are robust to multiple possible futures in the environment. |
Adam R Villaflor; Zhe Huang; Swapnil Pande; John M Dolan; Jeff Schneider; |
974 | Bayesian Nonparametrics for Offline Skill Discovery Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We first propose a method for offline learning of options (a particular skill framework) exploiting advances in variational inference and continuous relaxations. We then highlight an unexplored connection between Bayesian nonparametrics and offline skill discovery, and show how to obtain a nonparametric version of our model. |
Valentin Villecroze; Harry Braviner; Panteha Naderian; Chris Maddison; Gabriel Loaiza-Ganem; |
975 | Hermite Polynomial Features for Private Data Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To improve the sensitivity, we propose to replace random features with Hermite polynomial features. |
Margarita Vinaroz; Mohammad-Amin Charusaie; Frederik Harder; Kamil Adamczewski; Mi Jung Park; |
976 | What Can Linear Interpolation of Neural Network Loss Landscapes Tell Us? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we put inferences of this kind to the test, systematically evaluating how linear interpolation and final performance vary when altering the data, choice of initialization, and other optimizer and architecture design choices. |
Tiffany J Vlaar; Jonathan Frankle; |
977 | Multirate Training of Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose multirate training of neural networks: partitioning neural network parameters into "fast" and "slow" parts which are trained on different time scales, where slow parts are updated less frequently. |
Tiffany J Vlaar; Benedict Leimkuhler; |
978 | Provably Adversarially Robust Nearest Prototype Classifiers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we provide a complete discussion on the complexity when using $\ell_p$-distances for decision and $\ell_q$-threat models for certification for $p,q \in \{1,2,\infty\}$. |
Václav Voráček; Matthias Hein; |
979 | First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While such bounds exist in many settings, they have proven elusive in reinforcement learning with large state spaces. In this work we address this gap, and show that it is possible to obtain regret scaling as $\widetilde{\mathcal{O}}(\sqrt{d^3 H^3 \cdot V_1^\star \cdot K} + d^{3.5}H^3\log K )$ in reinforcement learning with large state spaces, namely the linear MDP setting. |
Andrew J Wagenmaker; Yifang Chen; Max Simchowitz; Simon Du; Kevin Jamieson; |
980 | Reward-Free RL Is No Harder Than Reward-Aware RL in Linear Markov Decision Processes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To our knowledge, our approach is the first computationally efficient algorithm to achieve optimal d dependence in linear MDPs, even in the single-reward PAC setting. |
Andrew J Wagenmaker; Yifang Chen; Max Simchowitz; Simon Du; Kevin Jamieson; |
981 | Training Characteristic Functions with Reinforcement Learning: XAI-methods Play Connect Four Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a setup to directly train characteristic functions in the form of neural networks to play simple two-player games. |
Stephan Wäldchen; Sebastian Pokutta; Felix Huber; |
982 | Retroformer: Pushing The Limits of End-to-end Retrosynthesis Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose Retroformer, a novel Transformer-based architecture for retrosynthesis prediction without relying on any cheminformatics tools for molecule editing. |
Yue Wan; Chang-Yu Hsieh; Ben Liao; Shengyu Zhang; |
983 | Safe Exploration for Efficient Policy Evaluation and Comparison Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper initiates the study of efficient and safe data collection for bandit policy evaluation. We formulate the problem and investigate its several representative variants. |
Runzhe Wan; Branislav Kveton; Rui Song; |
984 | Greedy Based Value Representation for Optimal Coordination in Multi-agent Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we derive the expression of the joint Q value function of LVD and MVD. |
Lipeng Wan; Zeyang Liu; Xingyu Chen; Xuguang Lan; Nanning Zheng; |
985 | Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Combined with prior work that made a similar observation about the other popular model-based method, MuZero, a trend appears to emerge, suggesting that current deep model-based methods have serious limitations. We dive deeper into the causes of this poor performance, by identifying elements that hurt adaptive behavior and linking these to underlying techniques frequently used in deep model-based RL. |
Yi Wan; Ali Rahimi-Kalahroudi; Janarthanan Rajendran; Ida Momennejad; Sarath Chandar; Harm H Van Seijen; |
986 | Fast Lossless Neural Compression with Integer-Only Discrete Flows Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose Integer-only Discrete Flows (IODF), an efficient neural compressor with integer-only arithmetic. |
Siyu Wang; Jianfei Chen; Chongxuan Li; Jun Zhu; Bo Zhang; |
987 | Accelerating Shapley Explanation Via Contributive Cooperator Selection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Even though Shapley value provides an effective explanation for a DNN model prediction, the computation relies on the enumeration of all possible input feature coalitions, which leads to the exponentially growing complexity. To address this problem, we propose a novel method SHEAR to significantly accelerate the Shapley explanation for DNN models, where only a few coalitions of input features are involved in the computation. |
Guanchu Wang; Yu-Neng Chuang; Mengnan Du; Fan Yang; Quan Zhou; Pushkar Tripathi; Xuanting Cai; Xia Hu; |
988 | Denoised MDPs: Learning World Models Better Than The World Itself Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we categorize information out in the wild into four types based on controllability and relation with reward, and formulate useful information as that which is both controllable and reward-relevant. |
Tongzhou Wang; Simon Du; Antonio Torralba; Phillip Isola; Amy Zhang; Yuandong Tian; |
989 | Neural Implicit Dictionary Learning Via Mixture-of-Expert Training Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a generic INR framework that achieves both data and training efficiency by learning a Neural Implicit Dictionary (NID) from a data collection and representing INR as a functional combination of wavelets sampled from the dictionary. |
Peihao Wang; Zhiwen Fan; Tianlong Chen; Zhangyang Wang; |
990 | Robust Models Are More Interpretable Because Attributions Look Normal Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We show that smooth decision boundaries play an important role in this enhanced interpretability, as the model’s input gradients around data points will more closely align with boundaries’ normal vectors when they are smooth. |
Zifan Wang; Matt Fredrikson; Anupam Datta; |
991 | Disentangling Disease-related Representation from Obscure for Disease Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, to learn the representations for identifying obscured lesions, we propose a disentanglement learning strategy under the guidance of alpha blending generation in an encoder-decoder framework (DAB-Net). |
Chu-Ran Wang; Fei Gao; Fandong Zhang; Fangwei Zhong; Yizhou Yu; Yizhou Wang; |
992 | Solving Stackelberg Prediction Game with Least Squares Loss Via Spherically Constrained Least Squares Reformulation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we explore an alternative reformulation of the SPG-LS. |
Jiali Wang; Wen Huang; Rujun Jiang; Xudong Li; Alex L Wang; |
993 | VLMixer: Unpaired Vision-Language Pre-training Via Cross-Modal CutMix Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a data augmentation method, namely cross-modal CutMix (CMC), for implicit cross-modal alignment learning in unpaired VLP. |
Teng Wang; Wenhao Jiang; Zhichao Lu; Feng Zheng; Ran Cheng; Chengguo Yin; Ping Luo; |
994 | DynaMixer: A Vision MLP Architecture with Dynamic Mixing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, this paper presents an efficient MLP-like network architecture, dubbed DynaMixer, resorting to dynamic information fusion. |
Ziyu Wang; Wenhao Jiang; Yiming M Zhu; Li Yuan; Yibing Song; Wei Liu; |
995 | Improving Screening Processes Via Calibrated Subset Selection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we investigate what guarantees a screening classifier can provide, independently of whether it is constructed manually or trained. |
Lequn Wang; Thorsten Joachims; Manuel Gomez Rodriguez; |
996 | The Geometry of Robust Value Functions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study the geometry of the robust value space for the more general Robust MDPs (RMDPs) setting, where transition uncertainties are considered. |
Kaixin Wang; Navdeep Kumar; Kuangqi Zhou; Bryan Hooi; Jiashi Feng; Shie Mannor; |
997 | What Dense Graph Do You Need for Self-Attention? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Normalized Information Payload (NIP), a graph scoring function measuring information transfer on graph, which provides an analysis tool for trade-offs between performance and complexity. |
Yuxin Wang; Chu-Tak Lee; Qipeng Guo; Zhangyue Yin; Yunhua Zhou; Xuanjing Huang; Xipeng Qiu; |
998 | Improved Certified Defenses Against Data Poisoning with (Deterministic) Finite Aggregation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose an improved certified defense against general poisoning attacks, namely Finite Aggregation. |
Wenxiao Wang; Alexander J Levine; Soheil Feizi; |
999 | Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we analyze gradual self-training under more general and relaxed assumptions, and prove a significantly improved generalization bound as $\widetilde{O}\left(\varepsilon_0 + T\Delta + T/\sqrt{n} + 1/\sqrt{nT}\right)$, where $\Delta$ is the average distributional distance between consecutive domains. |
Haoxiang Wang; Bo Li; Han Zhao; |
1000 | Communication-Efficient Adaptive Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel communication-efficient adaptive federated learning method (FedCAMS) with theoretical convergence guarantees. |
Yujia Wang; Lu Lin; Jinghui Chen; |
1001 | Provable Acceleration of Heavy Ball Beyond Quadratics for A Class of Polyak-Lojasiewicz Functions When The Non-Convexity Is Averaged-Out Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we develop some new techniques that help show acceleration beyond quadratics, which is achieved by analyzing how the change of the Hessian at two consecutive time points affects the convergence speed. |
Jun-Kun Wang; Chi-Heng Lin; Andre Wibisono; Bin Hu; |
1002 | Robustness Verification for Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the robustness metric used in these methods is linked to attack algorithms, image labels and downstream tasks, all of which may affect the consistency and reliability of robustness metric for CL. To address these problems, this paper proposes a novel Robustness Verification framework for Contrastive Learning (RVCL). |
Zekai Wang; Weiwei Liu; |
1003 | Convergence and Recovery Guarantees of The K-Subspaces Method for Subspace Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present local convergence analysis and a recovery guarantee for KSS, assuming data are generated by the semi-random union of subspaces model, where $N$ points are randomly sampled from $K \ge 2$ overlapping subspaces. |
Peng Wang; Huikang Liu; Anthony Man-Cho So; Laura Balzano; |
1004 | NP-Match: When Neural Processes Meet Semi-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we adjust neural processes (NPs) to the semi-supervised image classification task, resulting in a new method named NP-Match. |
Jianfeng Wang; Thomas Lukasiewicz; Daniela Massiceti; Xiaolin Hu; Vladimir Pavlovic; Alexandros Neophytou; |
1005 | Iterative Double Sketching for Faster Least-Squares Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We generalize the iterative Hessian sketching (IHS) algorithm and propose a new sketching framework named iterative double sketching (IDS) which uses approximations for both the gradient and the Hessian in each iteration. |
Rui Wang; Yanyan Ouyang; Wangli Xu; |
1006 | What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we present a large-scale evaluation of modeling choices and their impact on zero-shot generalization. |
Thomas Wang; Adam Roberts; Daniel Hesslow; Teven Le Scao; Hyung Won Chung; Iz Beltagy; Julien Launay; Colin Raffel; |
1007 | Improving Task-free Continual Learning By Distributionally Robust Memory Evolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address these problems, for the first time, we propose a principled memory evolution framework to dynamically evolve the memory data distribution by making the memory buffer gradually harder to be memorized with distributionally robust optimization (DRO). |
Zhenyi Wang; Li Shen; Le Fang; Qiuling Suo; Tiehang Duan; Mingchen Gao; |
1008 | Risk-Averse No-Regret Learning in Online Convex Games Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address this challenge, we propose a new online risk-averse learning algorithm that relies on one-point zeroth-order estimation of the CVaR gradients computed using CVaR values that are estimated by appropriately sampling the cost functions. |
Zifan Wang; Yi Shen; Michael Zavlanos; |
1009 | Provable Domain Generalization Via Invariant-Feature Subspace Recovery Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to achieve domain generalization with Invariant-feature Subspace Recovery (ISR). |
Haoxiang Wang; Haozhe Si; Bo Li; Han Zhao; |
1010 | ProgFed: Effective, Communication, and Computation Efficient Federated Learning By Progressive Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast, we propose ProgFed, the first progressive training framework for efficient and effective federated learning. |
Hui-Po Wang; Sebastian Stich; Yang He; Mario Fritz; |
1011 | Model-based Meta Reinforcement Learning Using Graph Structured Surrogate Models and Amortized Policy Search Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we take a closer look at this framework and propose a new posterior sampling based approach that consists of a new model to identify task dynamics together with an amortized policy optimization step. |
Qi Wang; Herke Van Hoof; |
1012 | Approximately Equivariant Networks for Imperfectly Symmetric Dynamics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We explore approximately equivariant networks which are biased towards preserving symmetry but are not strictly constrained to do so. |
Rui Wang; Robin Walters; Rose Yu; |
1013 | Three-stage Evolution and Fast Equilibrium for SGD with Non-degenerate Critical Points Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We justify the fast equilibrium conjecture on stochastic gradient descent from (Li et al. 2020) under the assumptions that critical points are non-degenerate and the stochastic noise is a standard Gaussian. |
Yi Wang; Zhiren Wang; |
1014 | Understanding Instance-Level Impact of Fairness Constraints Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Building on the concept of influence function, a measure that characterizes the impact of a training example on the target model and its predictive performance, this work studies the influence of training examples when fairness constraints are imposed. |
Jialu Wang; Xin Eric Wang; Yang Liu; |
1015 | Tractable Uncertainty for Structure Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we present Tractable Uncertainty for STructure learning (TRUST), a framework for approximate posterior inference that relies on probabilistic circuits as a representation of our posterior belief. |
Benjie Wang; Matthew R Wicker; Marta Kwiatkowska; |
1016 | Causal Dynamics Learning for Task-Independent State Abstraction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL), which first learns a theoretically proved causal dynamics model that removes unnecessary dependencies between state variables and the action, thus generalizing well to unseen states. |
Zizhao Wang; Xuesu Xiao; Zifan Xu; Yuke Zhu; Peter Stone; |
1017 | Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We generalize the multiple-play multi-armed bandits (MP-MAB) problem with a shareable arms setting, in which several plays can share the same arm. |
Xuchuang Wang; Hong Xie; John C. S. Lui; |
1018 | Generative Coarse-Graining of Molecular Conformations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by the recent progress in generative models and equivariant networks, we propose a novel model that rigorously embeds the vital probabilistic nature and geometrical consistency requirements of the backmapping transformation. |
Wujie Wang; Minkai Xu; Chen Cai; Benjamin K Miller; Tess Smidt; Yusu Wang; Jian Tang; Rafael Gomez-Bombarelli; |
1019 | Nonparametric Embeddings of Sparse High-Order Interaction Events Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose Nonparametric Embeddings of Sparse High-order interaction events (NESH). |
Zheng Wang; Yiming Xu; Conor Tillinghast; Shibo Li; Akil Narayan; Shandian Zhe; |
1020 | When Are Linear Stochastic Bandits Attackable? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study adversarial attacks on linear stochastic bandits: by manipulating the rewards, an adversary aims to control the behaviour of the bandit algorithm. |
Huazheng Wang; Haifeng Xu; Hongning Wang; |
1021 | DRAGONN: Distributed Randomized Approximate Gradients of Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose DRAGONN, a randomized hashing algorithm for gradient sparsification (GS) in distributed deep learning training (DDT). |
Zhuang Wang; Zhaozhuo Xu; Xinyu Wu; Anshumali Shrivastava; T. S. Eugene Ng; |
1022 | Finite-Sum Coupled Compositional Stochastic Optimization: Theory and Applications Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The contribution of this paper is to provide a comprehensive convergence analysis of a simple stochastic algorithm for both non-convex and convex objectives. |
Bokun Wang; Tianbao Yang; |
1023 | OFA: Unifying Architectures, Tasks, and Modalities Through A Simple Sequence-to-Sequence Learning Framework Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we pursue a unified paradigm for multimodal pretraining to break the shackles of complex task/modality-specific customization. |
Peng Wang; An Yang; Rui Men; Junyang Lin; Shuai Bai; Zhikang Li; Jianxin Ma; Chang Zhou; Jingren Zhou; Hongxia Yang; |
1024 | How Powerful Are Spectral Graph Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by the analysis, we propose JacobiConv, which uses Jacobi basis due to its orthogonality and flexibility to adapt to a wide range of weight functions. |
Xiyuan Wang; Muhan Zhang; |
1025 | Thompson Sampling for Robust Transfer in Multi-Task Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a TS-type algorithm for a more general online multi-task learning protocol, which extends the concurrent setting. |
Zhi Wang; Chicheng Zhang; Kamalika Chaudhuri; |
1026 | Individual Reward Assisted Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose Individual Reward Assisted Team Policy Learning (IRAT), which learns two policies for each agent from the dense individual reward and the sparse team reward with discrepancy constraints for updating the two policies mutually. |
Li Wang; Yupeng Zhang; Yujing Hu; Weixun Wang; Chongjie Zhang; Yang Gao; Jianye Hao; Tangjie Lv; Changjie Fan; |
1027 | Removing Batch Normalization Boosts Adversarial Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although the dominant approach is to extend BN to capture this mixture of distribution, we propose to completely eliminate this bottleneck by removing all BN layers in AT. |
Haotao Wang; Aston Zhang; Shuai Zheng; Xingjian Shi; Mu Li; Zhangyang Wang; |
1028 | Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To solve this problem, we propose Partial and Asymmetric Supervised Contrastive Learning (PASCL), which explicitly encourages the model to distinguish between tail-class in-distribution samples and OOD samples. |
Haotao Wang; Aston Zhang; Yi Zhu; Shuai Zheng; Mu Li; Alex J Smola; Zhangyang Wang; |
1029 | Nonparametric Factor Trajectory Learning for Dynamic Tensor Decomposition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, current methods always assume the factor representations of the entities in each tensor mode are static, and never consider their temporal evolution. To fill this gap, we propose NONparametric FActor Trajectory learning for dynamic tensor decomposition (NONFAT). |
Zheng Wang; Shandian Zhe; |
1030 | Thompson Sampling for (Combinatorial) Pure Exploration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To deal with this challenge, we explore the idea of Thompson Sampling (TS) that uses independent random samples instead of the upper confidence bounds, and design the first TS-based algorithm TS-Explore for (combinatorial) pure exploration. |
Siwei Wang; Jun Zhu; |
1031 | Policy Gradient Method For Robust Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper develops the first policy gradient method with global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch. |
Yue Wang; Shaofeng Zou; |
1032 | Certifying Out-of-Domain Generalization for Blackbox Functions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we focus on the problem of certifying distributional robustness for blackbox models and bounded loss functions, and propose a novel certification framework based on the Hellinger distance. |
Maurice G Weber; Linyi Li; Boxin Wang; Zhikuan Zhao; Bo Li; Ce Zhang; |
1033 | More Than A Toy: Random Matrix Models Predict How Real-World Neural Representations Generalize Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our findings suggest that random matrix theory, rather than just being a toy model, may be central to understanding the properties of neural representations in practice. |
Alexander Wei; Wei Hu; Jacob Steinhardt; |
1034 | To Smooth or Not? When Label Smoothing Meets Noisy Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We provide understandings for the properties of LS and NLS when learning with noisy labels. |
Jiaheng Wei; Hangyu Liu; Tongliang Liu; Gang Niu; Masashi Sugiyama; Yang Liu; |
1035 | Open-Sampling: Exploring Out-of-Distribution Data for Re-balancing Long-tailed Datasets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we theoretically show that out-of-distribution data can still be leveraged to augment the minority classes from a Bayesian perspective. |
Hongxin Wei; Lue Tao; Renchunzi Xie; Lei Feng; Bo An; |
1036 | Mitigating Neural Network Overconfidence with Logit Normalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, neural networks are known to suffer from the overconfidence issue, where they produce abnormally high confidence for both in- and out-of-distribution inputs. In this work, we show that this issue can be mitigated through Logit Normalization (LogitNorm)—a simple fix to the cross-entropy loss—by enforcing a constant vector norm on the logits in training. |
Hongxin Wei; Renchunzi Xie; Hao Cheng; Lei Feng; Bo An; Yixuan Li; |
1037 | Koopman Q-learning: Offline Reinforcement Learning Via Symmetries of Dynamics Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Current algorithms over-fit to the training dataset and as a consequence perform poorly when deployed to out-of-distribution generalizations of the environment. We aim to address these limitations by learning a Koopman latent representation which allows us to infer symmetries of the system’s underlying dynamic. |
Matthias Weissenbacher; Samarth Sinha; Animesh Garg; Yoshinobu Kawahara; |
1038 | Fishing for User Data in Large-Batch Federated Learning Via Gradient Magnification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a new strategy that dramatically elevates existing attacks to operate on batches of arbitrarily large size, and without architectural modifications. |
Yuxin Wen; Jonas A. Geiping; Liam Fowl; Micah Goldblum; Tom Goldstein; |
1039 | BabelTower: Learning to Auto-parallelized Program Translation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a learning-based framework, i.e., BabelTower, to address this problem. We first create a large-scale dataset consisting of compute-intensive function-level monolingual corpora. |
Yuanbo Wen; Qi Guo; Qiang Fu; Xiaqing Li; Jianxing Xu; Yanlin Tang; Yongwei Zhao; Xing Hu; Zidong Du; Ling Li; Chao Wang; Xuehai Zhou; Yunji Chen; |
1040 | Random Forest Density Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a density estimation algorithm called random forest density estimation (RFDE) based on random trees where the split of cell is along the midpoint of the randomly chosen dimension. |
Hongwei Wen; Hanyuan Hang; |
1041 | Fighting Fire with Fire: Avoiding DNN Shortcuts Through Priming Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we show empirically that DNNs can be coaxed to avoid poor shortcuts by providing an additional “priming” feature computed from key input features, usually a coarse output estimate. |
Chuan Wen; Jianing Qian; Jierui Lin; Jiaye Teng; Dinesh Jayaraman; Yang Gao; |
1042 | Preconditioning for Scalable Gaussian Process Hyperparameter Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Iterative numerical techniques are becoming popular to scale to larger datasets, relying on the conjugate gradient method (CG) for the linear solves and stochastic trace estimation for the log-determinant. This work introduces new algorithmic and theoretical insights for preconditioning these computations. |
Jonathan Wenger; Geoff Pleiss; Philipp Hennig; John Cunningham; Jacob Gardner; |
1043 | Measure Estimation in The Barycentric Coding Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper considers the problem of measure estimation under the barycentric coding model (BCM), in which an unknown measure is assumed to belong to the set of Wasserstein-2 barycenters of a finite set of known measures. |
Matthew E Werenski; Ruijie Jiang; Abiy Tasissa; Shuchin Aeron; James M Murphy; |
1044 | COLA: Consistent Learning with Opponent-Learning Awareness Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: First, we formalize consistency and show that higher-order LOLA (HOLA) solves LOLA’s inconsistency problem if it converges. Second, we correct a claim made in the literature by Schäfer and Anandkumar (2019), proving that Competitive Gradient Descent (CGD) does not recover HOLA as a series expansion (and fails to solve the consistency problem). Third, we propose a new method called Consistent LOLA (COLA), which learns update functions that are consistent under mutual opponent shaping. |
Timon Willi; Alistair Hp Letcher; Johannes Treutlein; Jakob Foerster; |
1045 | Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our derivation highlights additional terms due to statistical diffusivity which arise from the proper handling of distributions in the continuous-time setting. Based on this, we propose a tractable algorithm for approximately solving the distributional HJB based on a JKO scheme, which can be implemented in an online, control algorithm. |
Harley E Wiltzer; David Meger; Marc G. Bellemare; |
1046 | Easy Variational Inference for Categorical Models Via An Independent Binary Approximation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We pursue tractable Bayesian analysis of generalized linear models (GLMs) for categorical data. |
Michael T Wojnowicz; Shuchin Aeron; Eric L Miller; Michael Hughes; |
1047 | Continual Learning with Guarantees Via Weight Interval Constraints Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we show how to put bounds on forgetting by reformulating continual learning of a model as a continual contraction of its parameter space. |
Maciej Wolczyk; Karol Piczak; Bartosz Wójcik; Lukasz Pustelnik; Pawel Morawiecki; Jacek Tabor; Tomasz Trzcinski; Przemyslaw Spurek; |
1048 | A Deep Learning Approach for The Segmentation of Electroencephalography Data in Eye Tracking Applications Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here, we introduce DETRtime, a novel framework for time-series segmentation that creates ocular event detectors that do not require an additionally recorded eye-tracking modality and rely solely on EEG data. |
Lukas Wolf; Ard Kastrati; Martyna B Plomecka; Jie-Ming Li; Dustin Klebe; Alexander Veicht; Roger Wattenhofer; Nicolas Langer; |
1049 | Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an input sparsity time sampling algorithm that can spectrally approximate the Gram matrix corresponding to the q-fold column-wise tensor product of q matrices using a nearly optimal number of samples, improving upon all previously known methods by poly(q) factors. |
David Woodruff; Amir Zandieh; |
1050 | Model Soups: Averaging Weights of Multiple Fine-tuned Models Improves Accuracy Without Increasing Inference Time Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin. |
Mitchell Wortsman; Gabriel Ilharco; Samir Ya Gadre; Rebecca Roelofs; Raphael Gontijo-Lopes; Ari S Morcos; Hongseok Namkoong; Ali Farhadi; Yair Carmon; Simon Kornblith; Ludwig Schmidt; |
1051 | Metric-Fair Classifier Derandomization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we initiate a systematic study of classifier derandomization with metric fairness guarantees. |
Jimmy Wu; Yatong Chen; Yang Liu; |
1052 | Structural Entropy Guided Graph Hierarchical Pooling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, inspired by structural entropy, we propose a hierarchical pooling approach, SEP, to tackle the two issues. |
Junran Wu; Xueyuan Chen; Ke Xu; Shangzhe Li; |
1053 | Self-supervised Models Are Good Teaching Assistants for Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, we propose a head-level knowledge distillation method that selects the most important head of the supervised teacher and self-supervised teaching assistant, and let the student mimic the attention distribution of these two heads, so as to make the student focus on the relationship between tokens deemed by the teacher and the teacher assistant. |
Haiyan Wu; Yuting Gao; Yinqi Zhang; Shaohui Lin; Yuan Xie; Xing Sun; Ke Li; |
1054 | Characterizing and Overcoming The Greedy Nature of Learning in Multi-modal Deep Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. |
Nan Wu; Stanislaw Jastrzebski; Kyunghyun Cho; Krzysztof J Geras; |
1055 | Instrumental Variable Regression with Confounder Balancing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a Confounder Balanced IV Regression (CB-IV) algorithm to jointly remove the bias from the unmeasured confounders and balance the observed confounders. |
Anpeng Wu; Kun Kuang; Bo Li; Fei Wu; |
1056 | MemSR: Training Memory-efficient Lightweight Model for Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper aims at calculating a winning initialization from a complex teacher network for a plain student network, which can provide performance comparable to complex models. |
Kailu Wu; Chung-Kuei Lee; Kaisheng Ma; |
1057 | Delay-Adaptive Step-sizes for Asynchronous Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we show that it is possible to use learning rates that depend on the actual time-varying delays in the system. |
Xuyang Wu; Sindri Magnusson; Hamid Reza Feyzmahdavian; Mikael Johansson; |
1058 | Variational Nearest Neighbor Gaussian Process Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we instead exploit a sparse approximation of the precision matrix. |
Luhuan Wu; Geoff Pleiss; John P Cunningham; |
1059 | Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In particular, we derive PG in a unified framework, precisely clarify the relation between PG implementation and theory, and echo the findings of Nota and Thomas (2020). |
Shuang Wu; Ling Shi; Jun Wang; Guangjian Tian; |
1060 | DAVINZ: Data Valuation Using Deep Neural Networks at Initialization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we theoretically derive a domain-aware generalization bound to estimate the generalization performance of DNNs without model training. |
Zhaoxuan Wu; Yao Shu; Bryan Kian Hsiang Low; |
1061 | Robust Deep Reinforcement Learning Through Bootstrapped Opportunistic Curriculum Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Bootstrapped Opportunistic Adversarial Curriculum Learning (BCL), a novel flexible adversarial curriculum learning framework for robust reinforcement learning. |
Junlin Wu; Yevgeniy Vorobeychik; |
1062 | Revisiting Consistency Regularization for Deep Partial Label Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we revisit a simple idea, namely consistency regularization, which has been shown effective in traditional PLL literature, to guide the training of deep models. |
Dong-Dong Wu; Deng-Bao Wang; Min-Ling Zhang; |
1063 | Flowformer: Linearizing Transformers with Conservation Flows Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we linearize Transformers free from specific inductive biases based on the flow network theory. |
Haixu Wu; Jialong Wu; Jiehui Xu; Jianmin Wang; Mingsheng Long; |
1064 | Nearly Optimal Policy Optimization with Stable at Any Time Guarantee Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To bridge such a gap, we propose a novel algorithm Reference-based Policy Optimization with Stable at Any Time guarantee (RPO-SAT), which features the property “Stable at Any Time”. |
Tianhao Wu; Yunchang Yang; Han Zhong; Liwei Wang; Simon Du; Jiantao Jiao; |
1065 | RetrievalGuard: Provably Robust 1-Nearest Neighbor Image Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we aim to design a provably robust image retrieval model which keeps the most important evaluation metric Recall@1 invariant to adversarial perturbation. |
Yihan Wu; Hongyang Zhang; Heng Huang; |
1066 | Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we provide a problem-dependent analysis on the last iterate risk bounds of SGD with decaying stepsize, for (overparameterized) linear regression problems. |
Jingfeng Wu; Difan Zou; Vladimir Braverman; Quanquan Gu; Sham Kakade; |
1067 | Optimal Clustering with Noisy Queries Via Multi-Armed Bandit Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we obtain the first matching upper and lower bounds for a wide range of parameters. |
Jinghui Xia; Zengfeng Huang; |
1068 | ProGCL: Rethinking Hard Negative Mining in Graph Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To remedy this deficiency, we propose an effective method, dubbed ProGCL, to estimate the probability of a negative being a true one, which constitutes a more suitable measure for negatives’ hardness together with similarity. |
Jun Xia; Lirong Wu; Ge Wang; Jintao Chen; Stan Z. Li; |
1069 | Synergy and Symmetry in Deep Learning: Interactions Between The Data, Model, and Inference Algorithm Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While previous efforts have investigated this question by studying the data (D), model (M), and inference algorithm (I) as independent modules, in this paper we analyze the triplet (D,M,I) as an integrated system and identify important synergies that help mitigate the curse of dimensionality. |
Lechao Xiao; Jeffrey Pennington; |
1070 | Identification of Linear Non-Gaussian Latent Hierarchical Structure Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Accordingly, this paper investigates the problem of discovering the hidden causal variables and estimating the causal structure, including both the causal relations among latent variables and those between latent and measured variables. |
Feng Xie; Biwei Huang; Zhengming Chen; Yangbo He; Zhi Geng; Kun Zhang; |
1071 | COAT: Measuring Object Compositionality in Emergent Representations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose to directly measure compositionality in the representation space as a form of objectness, making such evaluations tractable for a wider class of models. |
Sirui Xie; Ari S Morcos; Song-Chun Zhu; Ramakrishna Vedantam; |
1072 | Robust Policy Learning Over Multiple Uncertainty Sets Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Towards a more general solution, we formulate the multi-set robustness problem to learn a policy robust to different perturbation sets. |
Annie Xie; Shagun Sodhani; Chelsea Finn; Joelle Pineau; Amy Zhang; |
1073 | Adaptive Inertia: Disentangling The Effects of Adaptive Learning Rate and Momentum Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, it is empirically known that Adam often generalizes worse than Stochastic Gradient Descent (SGD). The purpose of this paper is to unveil the mystery of this behavior in the diffusion theoretical framework. |
Zeke Xie; Xinrui Wang; Huishuai Zhang; Issei Sato; Masashi Sugiyama; |
1074 | Self-Supervised Representation Learning Via Latent Graph Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose the LaGraph, a theoretically grounded predictive SSL framework based on latent graph prediction. |
Yaochen Xie; Zhao Xu; Shuiwang Ji; |
1075 | Efficient Computation of Higher-Order Subgraph Attribution Via Message Passing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: GNN-LRP gives a relevance attribution of walks between nodes at each layer, and the subgraph attribution is expressed as a sum over exponentially many such walks. In this work, we demonstrate that such exponential complexity can be avoided. |
Ping Xiong; Thomas Schnake; Grégoire Montavon; Klaus-Robert Müller; Shinichi Nakajima; |
1076 | A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work focuses on a distinct approach of posterior sampling, which is celebrated in many bandits and reinforcement learning settings but remains under-explored for MGs. |
Wei Xiong; Han Zhong; Chengshuai Shi; Cong Shen; Tong Zhang; |
1077 | Importance Weighted Kernel Bayes’ Rule Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study a nonparametric approach to Bayesian computation via feature means, where the expectation of prior features is updated to yield expected posterior features, based on regression from kernel or neural net features of the observations. |
Liyuan Xu; Yutian Chen; Arnaud Doucet; Arthur Gretton; |
1078 | Learning to Separate Voices By Spatial Regions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a two-stage self-supervised framework in which overheard voices from earphones are pre-processed to extract relatively clean personalized signals, which are then used to train a region-wise separation model. |
Alan Xu; Romit Roy Choudhury; |
1079 | Detached Error Feedback for Distributed SGD with Random Sparsification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we study distributed SGD with random block-wise sparsification as the gradient compressor, which is ring-allreduce compatible and highly computation-efficient but leads to inferior performance. |
An Xu; Heng Huang; |
1080 | Accurate Quantization of Measures Via Interacting Particle-based Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In particular, we prove general upper bounds on the quantization error of MMD and KSD at rates which significantly outperform quantization by i.i.d. samples. |
Lantian Xu; Anna Korba; Dejan Slepcev; |
1081 | Unified Fourier-based Kernel and Nonlinearity Design for Equivariant Networks on Homogeneous Spaces Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a unified framework for group equivariant networks on homogeneous spaces derived from a Fourier perspective. |
Yinshuang Xu; Jiahui Lei; Edgar Dobriban; Kostas Daniilidis; |
1082 | Inferring Cause and Effect in The Presence of Heteroscedastic Noise Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose to partition the domain of the cause into multiple segments where the noise indeed is dependent. |
Sascha Xu; Osman A Mian; Alexander Marx; Jilles Vreeken; |
1083 | Prompting Decision Transformer for Few-Shot Policy Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequential modeling ability of the Transformer architecture and the prompt framework to achieve few-shot adaptation in offline RL. |
Mengdi Xu; Yikang Shen; Shun Zhang; Yuchen Lu; Ding Zhao; Joshua Tenenbaum; Chuang Gan; |
1084 | Analyzing and Mitigating Interference in Neural Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we investigate the interference issue by sampling different child models and calculating the gradient similarity of shared operators, and observe that: 1) the interference on a shared operator between two child models is positively correlated with the number of different operators between them; 2) the interference is smaller when the inputs and outputs of the shared operator are more similar. |
Jin Xu; Xu Tan; Kaitao Song; Renqian Luo; Yichong Leng; Tao Qin; Tie-Yan Liu; Jian Li; |
1085 | On The Statistical Benefits of Curriculum Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study the benefits of CL in the multitask linear regression problem under both structured and unstructured settings. |
Ziping Xu; Ambuj Tewari; |
1086 | A Difference Standardization Method for Mutual Transfer Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, previous studies about mutual transfer learning either suffer from high computational complexity or rely on oversimplified hypotheses. To overcome these challenges, in this paper, we propose the Difference Standardization method (DiffS) for mutual transfer learning. |
Haoqing Xu; Meng Wang; Beilun Wang; |
1087 | SkexGen: Autoregressive Generation of CAD Construction Sequences with Disentangled Codebooks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present SkexGen, a novel autoregressive generative model for computer-aided design (CAD) construction sequences containing sketch-and-extrude modeling operations. |
Xiang Xu; Karl D.D. Willis; Joseph G Lambourne; Chin-Yi Cheng; Pradeep Kumar Jayaraman; Yasutaka Furukawa; |
1088 | Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we aim to address the problem without additional steps of reward learning and offline RL training for the case when demonstrations contain a large proportion of suboptimal data. |
Haoran Xu; Xianyuan Zhan; Honglei Yin; Huiling Qin; |
1089 | Adversarial Attack and Defense for Non-Parametric Two-Sample Tests Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To enable TST-agnostic attacks, we propose an ensemble attack (EA) framework that jointly minimizes the different types of test criteria. |
Xilie Xu; Jingfeng Zhang; Feng Liu; Masashi Sugiyama; Mohan Kankanhalli; |
1090 | Adversarially Robust Models May Not Transfer Better: Sufficient Conditions for Domain Transferability from The View of Regularization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we explore the relationship between regularization and domain transferability considering different factors such as norm regularization and data augmentations (DA). |
Xiaojun Xu; Jacky Y Zhang; Evelyn Ma; Hyun Ho Son; Sanmi Koyejo; Bo Li; |
1091 | A Theoretical Analysis on Independence-driven Importance Weighting for Covariate-shift Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Recently, independence-driven importance weighting algorithms in stable learning literature have shown empirical effectiveness to deal with covariate-shift generalization on several learning models, including regression algorithms and deep neural networks, while their theoretical analyses are missing. In this paper, we theoretically prove the effectiveness of such algorithms by explaining them as feature selection processes. |
Renzhe Xu; Xingxuan Zhang; Zheyan Shen; Tong Zhang; Peng Cui; |
1092 | Langevin Monte Carlo for Contextual Bandits Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose an efficient posterior sampling algorithm, viz., Langevin Monte Carlo Thompson Sampling (LMC-TS), that uses Markov Chain Monte Carlo (MCMC) methods to directly sample from the posterior distribution in contextual bandits. |
Pan Xu; Hongkai Zheng; Eric V Mazumdar; Kamyar Azizzadenesheli; Animashree Anandkumar; |
1093 | Investigating Why Contrastive Learning Benefits Robustness Against Label Noise Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we rigorously prove that the representation matrix learned by contrastive learning boosts robustness, by having: (i) one prominent singular value corresponding to each sub-class in the data, and significantly smaller remaining singular values; and (ii) a large alignment between the prominent singular vectors and the clean labels of each sub-class. |
Yihao Xue; Kyle Whitecross; Baharan Mirzasoleiman; |
1094 | Diversified Adversarial Attacks Based on Conjugate Gradient Method Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although existing methods based on the steepest descent have achieved high attack success rates, ill-conditioned problems occasionally reduce their performance. To address this limitation, we utilize the conjugate gradient (CG) method, which is effective for this type of problem, and propose a novel attack algorithm inspired by the CG method, named the Auto Conjugate Gradient (ACG) attack. |
Keiichiro Yamamura; Haruki Sato; Nariaki Tateiwa; Nozomi Hata; Toru Mitsutake; Issa Oe; Hiroki Ishikura; Katsuki Fujisawa; |
1095 | Cycle Representation Learning for Inductive Relation Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, based on the mathematics of algebraic topology, we propose a novel solution for inductive relation prediction, an important learning task for knowledge graph completion. |
Zuoyu Yan; Tengfei Ma; Liangcai Gao; Zhi Tang; Chao Chen; |
1096 | Optimally Controllable Perceptual Lossy Compression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a nontrivial finding that only two decoders are sufficient for optimally achieving arbitrary (an infinite number of different) D-P tradeoffs. |
Zeyu Yan; Fei Wen; Peilin Liu; |
1097 | Active Fairness Auditing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we initiate the study of query-based auditing algorithms that can estimate the demographic parity of ML models in a query-efficient manner. |
Tom Yan; Chicheng Zhang; |
1098 | Self-Organized Polynomial-Time Coordination Graphs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To bypass this systematic hardness, this paper proposes a novel method, named Self-Organized Polynomial-time Coordination Graphs (SOP-CG), which uses structured graph classes to guarantee the accuracy and the computational efficiency of collaborated action selection. |
Qianlan Yang; Weijun Dong; Zhizhou Ren; Jianhao Wang; Tonghan Wang; Chongjie Zhang; |
1099 | Regularizing A Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To avoid the detrimental impact of distribution mismatch, we regularize the undiscounted stationary distribution of the current policy towards the offline data during the policy optimization process. |
Shentao Yang; Yihao Feng; Shujian Zhang; Mingyuan Zhou; |
1100 | A Psychological Theory of Explainability Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a psychological theory of how humans draw conclusions from saliency maps, the most common form of XAI explanation, which for the first time allows for precise prediction of explainee inference conditioned on explanation. |
Scott Cheng-Hsin Yang; Nils Erik Tomas Folke; Patrick Shafto; |
1101 | Omni-Granular Ego-Semantic Propagation for Self-Supervised Graph Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose Omni-Granular Ego-Semantic Propagation for Self-Supervised Graph Representation Learning (OEPG). |
Ling Yang; Shenda Hong; |
1102 | Unsupervised Time-Series Representation Learning with Iterative Bilinear Temporal-Spectral Fusion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a unified framework, namely Bilinear Temporal-Spectral Fusion (BTSF). |
Ling Yang; Shenda Hong; |
1103 | Searching for BurgerFormer with Micro-Meso-Macro Space Design Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: By revisiting typical search spaces, we design micro-meso-macro space to search for Transformer-like architectures, namely BurgerFormer. |
Longxing Yang; Yu Hu; Shun Lu; Zihao Sun; Jilin Mei; Yinhe Han; Xiaowei Li; |
1104 | Efficient Variance Reduction for Meta-learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel approach that integrates variance reduction with first-order meta-learning algorithms such as Reptile. |
Hansi Yang; James Kwok; |
1105 | Injecting Logical Constraints Into Neural Networks Via Straight-Through Estimators Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: More specifically, we design a systematic way to represent discrete logical constraints as a loss function; minimizing this loss using gradient descent via a straight-through-estimator updates the neural network’s weights in the direction that the binarized outputs satisfy the logical constraints. |
Zhun Yang; Joohyung Lee; Chiyoun Park; |
1106 | Locally Sparse Neural Networks for Tabular Biomedical Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Practitioners in this domain prefer linear or tree-based models over neural networks since the latter are harder to interpret and tend to overfit when applied to tabular datasets. To address these neural networks’ shortcomings, we propose an intrinsically interpretable network for heterogeneous biomedical data. |
Junchen Yang; Ofir Lindenbaum; Yuval Kluger; |
1107 | Not All Poisons Are Created Equal: Robust Training Against Data Poisoning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose an efficient defense mechanism that significantly reduces the success rate of various data poisoning attacks, and provides theoretical guarantees for the performance of the model. |
Yu Yang; Tian Yu Liu; Baharan Mirzasoleiman; |
1108 | Does The Data Induce Capacity Control in Deep Learning? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We show that the input correlation matrix of typical classification datasets has an eigenspectrum where, after a sharp initial drop, a large number of small eigenvalues are distributed uniformly over an exponentially large range. |
Rubing Yang; Jialin Mao; Pratik Chaudhari; |
1109 | Informed Learning By Wide Neural Networks: Convergence, Generalization and Sampling Complexity Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we consider an informed deep neural network (DNN) with over-parameterization and domain knowledge integrated into its training objective function, and study how and why domain knowledge benefits the performance. |
Jianyi Yang; Shaolei Ren; |
1110 | Linear Bandit Algorithms with Sublinear Time Complexity Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose two linear bandits algorithms with per-step complexity sublinear in the number of arms $K$. |
Shuo Yang; Tongzheng Ren; Sanjay Shakkottai; Eric Price; Inderjit S. Dhillon; Sujay Sanghavi; |
1111 | A New Perspective on The Effects of Spectrum in Graph Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by this, we propose the correlation-free architecture which naturally removes the correlation issue among different channels, making it possible to utilize more sophisticated filters within each channel. |
Mingqi Yang; Yanming Shen; Rui Li; Heng Qi; Qiang Zhang; Baocai Yin; |
1112 | Fourier Learning with Cyclical Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we have designed a novel approach to overcome the aforementioned shortcomings. |
Yingxiang Yang; Zhihan Xiong; Tianyi Liu; Taiqing Wang; Chong Wang; |
1113 | Estimating Instance-dependent Bayes-label Transition Matrix Using A Deep Neural Network Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Motivated by that classifiers mostly output Bayes optimal labels for prediction, in this paper, we study to directly model the transition from Bayes optimal labels to noisy labels (i.e., Bayes-label transition matrix (BLTM)) and learn a classifier to predict Bayes optimal labels. |
Shuo Yang; Erkun Yang; Bo Han; Yang Liu; Min Xu; Gang Niu; Tongliang Liu; |
1114 | A Study of Face Obfuscation in ImageNet Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we explore the effects of face obfuscation on the popular ImageNet challenge visual recognition benchmark. |
Kaiyu Yang; Jacqueline H. Yau; Li Fei-Fei; Jia Deng; Olga Russakovsky; |
1115 | Anarchic Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Toward this end, we propose two Anarchic Federated Averaging (AFA) algorithms with two-sided learning rates for both cross-device and cross-silo settings, which are named AFA-CD and AFA-CS, respectively. |
Haibo Yang; Xin Zhang; Prashant Khanduri; Jia Liu; |
1116 | Identity-Disentangled Adversarial Augmentation for Self-supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study a simple adversarial augmentation method that can modify training data to be hard positives/negatives without distorting the key information about their original identities. |
Kaiwen Yang; Tianyi Zhou; Xinmei Tian; Dacheng Tao; |
1117 | Learning from A Learning User for Optimal Recommendations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we formalize a model to capture such “learning users” and design an efficient system-side learning solution, coined Noise-Robust Active Ellipsoid Search (RAES), to confront the challenges brought by the non-stationary feedback from such a learning user. |
Fan Yao; Chuanhao Li; Denis Nekipelov; Hongning Wang; Haifeng Xu; |
1118 | Improving Out-of-Distribution Robustness Via Selective Augmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we specifically consider the problems of subpopulation shifts (e.g., imbalanced data) and domain shifts. |
Huaxiu Yao; Yu Wang; Sai Li; Linjun Zhang; Weixin Liang; James Zou; Chelsea Finn; |
1119 | NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a simple and efficient learning framework, TLM, that does not rely on large-scale pretraining. |
Xingcheng Yao; Yanan Zheng; Xiaocong Yang; Zhilin Yang; |
1120 | Feature Space Particle Inference for Neural Network Ensembles Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we propose to optimize particles in the feature space where activations of a specific intermediate layer lie to alleviate the abovementioned difficulties. |
Shingo Yashima; Teppei Suzuki; Kohta Ishikawa; Ikuro Sato; Rei Kawakami; |
1121 | Centroid Approximation for Bootstrap: Improving Particle Quality at Inference Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose an efficient method to explicitly optimize a small set of high quality “centroid” points to better approximate the ideal bootstrap distribution. |
Mao Ye; Qiang Liu; |
1122 | Be Like Water: Adaptive Floating Point for Machine Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel numerical representation, Adaptive Floating Point (AFP), that dynamically adjusts to the characteristics of deep learning data. |
Thomas Yeh; Max Sterner; Zerlina Lai; Brandon Chuang; Alexander Ihler; |
1123 | QSFL: A Two-Level Uplink Communication Optimization Framework for Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: As a solution, we propose a novel FL framework named QSFL, towards optimizing FL uplink (client-to-server) communication at both client and model levels. |
Liping Yi; Wang Gang; Liu Xiaoguang; |
1124 | De Novo Mass Spectrometry Peptide Sequencing with A Transformer Model Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a simple yet powerful method for de novo peptide sequencing, Casanovo, that uses a transformer framework to map directly from a sequence of observed peaks (a mass spectrum) to a sequence of amino acids (a peptide). |
Melih Yilmaz; William Fondrie; Wout Bittremieux; Sewoong Oh; William S Noble; |
1125 | Bayesian Nonparametric Learning for Point Processes with Spatial Homogeneity: A Spatial Analysis of NBA Shot Locations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel nonparametric Bayesian method for learning the underlying intensity surface built upon a combination of Dirichlet process and Markov random field. |
Fan Yin; Jieying Jiao; Jun Yan; Guanyu Hu; |
1126 | Bitwidth Heterogeneous Federated Learning with Progressive Weight Dequantization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: BHFL brings in a new challenge, that the aggregation of model parameters with different bitwidths could result in severe performance degeneration, especially for high-bitwidth models. To tackle this problem, we propose ProWD framework, which has a trainable weight dequantizer at the central server that progressively reconstructs the low-bitwidth weights into higher bitwidth weights, and finally into full-precision weights. |
Jaehong Yoon; Geon Park; Wonyong Jeong; Sung Ju Hwang; |
1127 | ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, this work advocates hybrid NNs that consist of both powerful yet costly multiplications and efficient yet less powerful operators for marrying the best of both worlds, and proposes ShiftAddNAS, which can automatically search for more accurate and more efficient NNs. |
Haoran You; Baopu Li; Shi Huihong; Yonggan Fu; Yingyan Lin; |
1128 | Molecular Representation Learning Via Heterogeneous Motif Graph Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most existing methods deal with molecular graphs individually while neglecting their connections, such as motif-level relationships. We propose a novel molecular graph representation learning method by constructing a heterogeneous motif graph to address this issue. |
Zhaoning Yu; Hongyang Gao; |
1129 | Understanding Robust Overfitting of Adversarial Training and Beyond Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Given these observations, we further designed data ablation adversarial training and identify that some small-loss data which are not worthy of the adversary strength cause robust overfitting in the strong adversary mode. To relieve this issue, we propose minimum loss constrained adversarial training (MLCAT): in a minibatch, we learn large-loss data as usual, and adopt additional measures to increase the loss of the small-loss data. |
Chaojian Yu; Bo Han; Li Shen; Jun Yu; Chen Gong; Mingming Gong; Tongliang Liu; |
1130 | How to Leverage Unlabeled Data in Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we find that, perhaps surprisingly, a much simpler method that simply applies zero rewards to unlabeled data leads to effective data sharing both in theory and in practice, without learning any reward model at all. |
Tianhe Yu; Aviral Kumar; Yevgen Chebotar; Karol Hausman; Chelsea Finn; Sergey Levine; |
1131 | Reachability Constrained Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To deal with this problem, this paper proposes the reachability CRL (RCRL) method by using reachability analysis to establish the novel self-consistency condition and characterize the feasible sets. |
Dongjie Yu; Haitong Ma; Shengbo Li; Jianyu Chen; |
1132 | Topology-Aware Network Pruning Using Multi-stage Graph Embedding and Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel multi-stage graph embedding technique based on graph neural networks (GNNs) to identify DNN topologies and use reinforcement learning (RL) to find a suitable compression policy. |
Sixing Yu; Arya Mazaheri; Ali Jannesari; |
1133 | The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a tractable heuristic for solving the combinatorial extension of OBS, in which we select weights for simultaneous removal, and we combine it with a single-pass systematic update of unpruned weights. |
Xin Yu; Thiago Serra; Srikumar Ramalingam; Shandian Zhe; |
1134 | GraphFM: Improving Large-Scale GNN Training Via Feature Momentum Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here, we propose a new technique, named feature momentum (FM), that uses a momentum step to incorporate historical embeddings when updating feature representations. |
Haiyang Yu; Limei Wang; Bokun Wang; Meng Liu; Tianbao Yang; Shuiwang Ji; |
1135 | Latent Diffusion Energy-Based Model for Interpretable Text Modelling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by the recent efforts that leverage diffusion recovery likelihood learning as a cure for the sampling issue, we introduce a novel symbiosis between the diffusion models and latent space EBMs in a variational learning framework, coined as the latent diffusion energy-based model. |
Peiyu Yu; Sirui Xie; Xiaojian Ma; Baoxiong Jia; Bo Pang; Ruiqi Gao; Yixin Zhu; Song-Chun Zhu; Ying Nian Wu; |
1136 | Predicting Out-of-Distribution Error with The Projection Norm Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a metric— Projection Norm—to predict a model’s performance on out-of-distribution (OOD) data without access to ground truth labels. |
Yaodong Yu; Zitong Yang; Alexander Wei; Yi Ma; Jacob Steinhardt; |
1137 | Robust Task Representations for Offline Meta-Reinforcement Learning Via Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing offline meta-reinforcement learning algorithms cannot distinguish these factors, making task representations unstable to the change of behavior policies. To address this problem, we propose a contrastive learning framework for task representations that are robust to the distribution mismatch of behavior policies in training and test. |
Haoqi Yuan; Zongqing Lu; |
1138 | Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study contrastive learning from an optimization perspective, aiming to analyze and address a fundamental issue of existing contrastive learning methods that either rely on a large batch size or a large dictionary of feature vectors. |
Zhuoning Yuan; Yuexin Wu; Zi-Hao Qiu; Xianzhi Du; Lijun Zhang; Denny Zhou; Tianbao Yang; |
1139 | Neural Tangent Kernel Empowered Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel FL paradigm empowered by the NTK framework. |
Kai Yue; Richeng Jin; Ryan Pilgrim; Chau-Wai Wong; Dror Baron; Huaiyu Dai; |
1140 | Time Is MattEr: Temporal Self-supervision for Video Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on the observations, we design simple yet effective self-supervised tasks for video models to learn temporal dynamics better. |
Sukmin Yun; Jaehyung Kim; Dongyoon Han; Hwanjun Song; Jung-Woo Ha; Jinwoo Shin; |
1141 | Pure Noise to The Rescue of Insufficient Data: Improving Imbalanced Classification By Training on Random Noise Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Despite remarkable progress on visual recognition tasks, deep neural-nets still struggle to generalize well when training data is scarce or highly imbalanced, rendering them extremely vulnerable to real-world examples. In this paper, we present a surprisingly simple yet highly effective method to mitigate this limitation: using pure noise images as additional training data. |
Shiran Zada; Itay Benou; Michal Irani; |
1142 | Adaptive Conformal Predictions for Time Series Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a parameter-free method, AgACI, that adaptively builds upon ACI based on online expert aggregation. |
Margaux Zaffran; Olivier Feron; Yannig Goude; Julie Josse; Aymeric Dieuleveut; |
1143 | Actor-Critic Based Improper Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Towards this, we propose two algorithms: (1) a Policy Gradient-based approach; and (2) an algorithm that can switch between a simple Actor-Critic (AC) based scheme and a Natural Actor-Critic (NAC) scheme depending on the available information. |
Mohammadi Zaki; Avi Mohan; Aditya Gopalan; Shie Mannor; |
1144 | Stabilizing Q-learning with Linear Architectures for Provable Efficient Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work proposes an exploration variant of the basic Q-learning protocol with linear function approximation. |
Andrea Zanette; Martin Wainwright; |
1145 | Multi Resolution Analysis (MRA) for Approximate Self-Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we revisit classical Multiresolution Analysis (MRA) concepts such as Wavelets, whose potential value in this setting remains underexplored thus far. |
Zhanpeng Zeng; Sourav Pal; Jeffery Kline; Glenn M Fung; Vikas Singh; |
1146 | Efficient PAC Learning from The Crowd with Pairwise Comparisons Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we show that by leveraging the more easily acquired pairwise comparison queries, it is possible to exponentially reduce the label complexity while retaining the overall query complexity and runtime. |
Shiwei Zeng; Jie Shen; |
1147 | Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a new method called X-VLM to perform ‘multi-grained vision language pre-training.’ |
Yan Zeng; Xinsong Zhang; Hang Li; |
1148 | Position Prediction As An Effective Pretraining Strategy Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel, but surprisingly simple alternative to content reconstruction – that of predicting locations from content, without providing positional information for it. |
Shuangfei Zhai; Navdeep Jaitly; Jason Ramapuram; Dan Busbridge; Tatiana Likhomanenko; Joseph Y Cheng; Walter Talbott; Chen Huang; Hanlin Goh; Joshua M Susskind; |
1149 | Anytime Information Cascade Popularity Prediction Via Self-Exciting Processes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, for general, marked Hawkes point processes, we present closed-form expressions for the mean and variance of future event counts, conditioned on observed events. |
Xi Zhang; Akshay Aravamudan; Georgios C Anagnostopoulos; |
1150 | Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we first empirically demonstrate that the clipped FedAvg can perform surprisingly well even with substantial data heterogeneity when training neural networks, which is partly because the clients’ updates become similar for several popular deep architectures. Based on this key observation, we provide the convergence analysis of a differentially private (DP) FedAvg algorithm and highlight the relationship between clipping bias and the distribution of the clients’ updates. |
Xinwei Zhang; Xiangyi Chen; Mingyi Hong; Steven Wu; Jinfeng Yi; |
1151 | Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a Collaboration of Experts (CoE) framework to assemble the expertise of multiple networks towards a common goal. |
Yikang Zhang; Zhuo Chen; Zhao Zhong; |
1152 | PDE-Based Optimal Strategy for Unconstrained Online Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To streamline this workflow, we present a framework that generates new potential functions by solving a Partial Differential Equation (PDE). |
Zhiyu Zhang; Ashok Cutkosky; Ioannis Paschalidis; |
1153 | Stochastic Continuous Submodular Maximization: Boosting Via Non-oblivious Function Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we revisit Stochastic Continuous Submodular Maximization in both offline and online settings, which can benefit a wide range of applications in machine learning and operations research. |
Qixin Zhang; Zengde Deng; Zaiyi Chen; Haoyuan Hu; Yu Yang; |
1154 | When and How Mixup Improves Calibration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we theoretically prove that Mixup improves calibration in high-dimensional settings by investigating natural statistical models. |
Linjun Zhang; Zhun Deng; Kenji Kawaguchi; James Zou; |
1155 | UAST: Uncertainty-Aware Siamese Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We argue that these approaches lack a clear probabilistic explanation, so it is desirable to model the uncertainty and ambiguity representation of target estimation. To address this issue, this paper presents an Uncertainty-Aware Siamese Tracker (UAST) by developing a novel distribution-based regression formulation with localization uncertainty. |
Dawei Zhang; Yanwei Fu; Zhonglong Zheng; |
1156 | Examining Scaling and Transfer of Language Model Architectures for Machine Translation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we thoroughly examine the role of several architectural design choices on the performance of LMs on bilingual, (massively) multilingual and zero-shot translation tasks, under systematic variations of data conditions and model sizes. |
Biao Zhang; Behrooz Ghorbani; Ankur Bapna; Yong Cheng; Xavier Garcia; Jonathan Shen; Orhan Firat; |
1157 | Revisiting End-to-End Speech-to-Text Translation From Scratch Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, transcripts are not always available, and how significant such pretraining is for E2E ST has rarely been studied in the literature. In this paper, we revisit this question and explore the extent to which the quality of E2E ST trained on speech-translation pairs alone can be improved. |
Biao Zhang; Barry Haddow; Rico Sennrich; |
1158 | A Stochastic Multi-Rate Control Framework For Modeling Distributed Optimization Algorithms Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work offers a fresh perspective to model, analyze, and design distributed optimization algorithms through the lens of stochastic multi-rate feedback control. |
Xinwei Zhang; Mingyi Hong; Sairaj Dhople; Nicola Elia; |
1159 | GALAXY: Graph-based Active Learning at The Extreme Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new strategy for active learning called GALAXY (Graph-based Active Learning At the eXtrEme), which blends ideas from graph-based active learning and deep learning. |
Jifan Zhang; Julian Katz-Samuels; Robert Nowak; |
1160 | Fairness Interventions As (Dis)Incentives for Strategic Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing works have largely examined these as two separate issues, e.g., by focusing on building ML algorithms robust to strategic manipulation, or on training a fair ML algorithm. In this study, we set out to understand the impact they each have on the other, and examine how to characterize fair policies in the presence of strategic behavior. |
Xueru Zhang; Mohammad Mahdi Khalili; Kun Jin; Parinaz Naghizadeh; Mingyan Liu; |
1161 | Role-based Multiplex Network Embedding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, existing multiplex network embedding methods neglect structural role information, which can be used to determine the structural similarity between nodes. To overcome this shortcoming, this work proposes a simple, effective, role-based embedding method for multiplex networks, called RMNE. |
Hegui Zhang; Gang Kou; |
1162 | Dynamic Topic Models for Temporal Document Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While existing topic models focus on the dynamics of individual documents, we propose two neural topic models aimed at learning unified topic distributions that incorporate both document dynamics and network structure. |
Delvin Ce Zhang; Hady Lauw; |
1163 | Personalized Federated Learning Via Variational Bayesian Inference Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Federated learning faces huge challenges from model overfitting due to the lack of data and statistical diversity among clients. To address these challenges, this paper proposes a novel personalized federated learning method via Bayesian variational inference named pFedBayes. |
Xu Zhang; Yinchuan Li; Wenpeng Li; Kaiyang Guo; Yunfeng Shao; |
1164 | Federated Learning with Label Distribution Skew Via Logits Calibration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we investigate the label distribution skew in FL, where the distribution of labels varies across clients. |
Jie Zhang; Zhiqi Li; Bo Li; Jianghe Xu; Shuang Wu; Shouhong Ding; Chao Wu; |
1165 | Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Remarkably, however, we observe that even though the weights do not converge to stationary points, the progress in minimizing the loss function halts and training loss stabilizes. Inspired by this observation, we propose a new perspective based on ergodic theory of dynamical systems to explain it. |
Jingzhao Zhang; Haochuan Li; Suvrit Sra; Ali Jadbabaie; |
1166 | Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the oracle complexity of gradient-based methods for stochastic approximation problems. |
Jingzhao Zhang; Hongzhou Lin; Subhro Das; Suvrit Sra; Ali Jadbabaie; |
1167 | Deep and Flexible Graph Neural Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes DFG-NAS, a novel method that searches for deep and flexible GNN architectures. |
Wentao Zhang; Zheyu Lin; Yu Shen; Yang Li; Zhi Yang; Bin Cui; |
1168 | A Langevin-like Sampler for Discrete Distributions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose discrete Langevin proposal (DLP), a simple and scalable gradient-based proposal for sampling complex high-dimensional discrete distributions. |
Ruqi Zhang; Xingchao Liu; Qiang Liu; |
1169 | Rich Feature Construction for The Optimization-Generalization Dilemma Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose to initialize the networks with a rich representation containing a palette of potentially useful features, ready to be used by even simple models. |
Jianyu Zhang; David Lopez-Paz; Leon Bottou; |
1170 | Generative Flow Networks for Discrete Probabilistic Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present energy-based generative flow networks (EB-GFN), a novel probabilistic modeling algorithm for high-dimensional discrete data. |
Dinghuai Zhang; Nikolay Malkin; Zhen Liu; Alexandra Volokhova; Aaron Courville; Yoshua Bengio; |
1171 | Neurotoxin: Durable Backdoors in Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Neurotoxin, a simple one-line backdoor attack that functions by attacking parameters that are changed less in magnitude during training. |
Zhengming Zhang; Ashwinee Panda; Linyue Song; Yaoqing Yang; Michael Mahoney; Prateek Mittal; Kannan Ramchandran; Joseph Gonzalez; |
1172 | Making Linear MDPs Practical Via Contrastive Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Instead, we consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning via contrastive estimation. |
Tianjun Zhang; Tongzheng Ren; Mengjiao Yang; Joseph Gonzalez; Dale Schuurmans; Bo Dai; |
1173 | NAFS: A Simple Yet Tough-to-beat Baseline for Graph Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present node-adaptive feature smoothing (NAFS), a simple non-parametric method that constructs node representations without parameter learning. |
Wentao Zhang; Zeang Sheng; Mingyu Yang; Yang Li; Yu Shen; Zhi Yang; Bin Cui; |
1174 | Correct-N-Contrast: A Contrastive Approach for Improving Robustness to Spurious Correlations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To improve worst-group performance on spuriously correlated data without training attribute labels, we propose Correct-N-Contrast (CNC), a contrastive approach to directly learn representations robust to spurious correlations. |
Michael Zhang; Nimit S Sohoni; Hongyang R Zhang; Chelsea Finn; Christopher Re; |
1175 | Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present BRIEE, an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states. |
Xuezhou Zhang; Yuda Song; Masatoshi Uehara; Mengdi Wang; Alekh Agarwal; Wen Sun; |
1176 | Partial Counterfactual Identification from Observational and Experimental Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper investigates the problem of bounding counterfactual queries from an arbitrary collection of observational and experimental distributions and qualitative knowledge about the underlying data-generating model represented in the form of a causal diagram. |
Junzhe Zhang; Jin Tian; Elias Bareinboim; |
1177 | Set Norm and Equivariant Skip Connections: Putting The Deep in Deep Sets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we mathematically and empirically analyze normalization layers and residual connections in the context of deep permutation invariant neural networks. Based on our analysis, we propose Deep Sets++ and Set Transformer++, deep models that reach comparable or better performance than their original counterparts on a diverse suite of tasks. |
Lily Zhang; Veronica Tozzo; John Higgins; Rajesh Ranganath; |
1178 | Learning to Estimate and Refine Fluid Motion with Physical Dynamics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here we propose an unsupervised learning based prediction-correction scheme for fluid flow estimation. |
Mingrui Zhang; Jianhong Wang; James B Tlhomole; Matthew Piggott; |
1179 | A Branch and Bound Framework for Stronger Adversarial Attacks of ReLU Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we systematically search adversarial examples in the activation space of ReLU networks to tackle hard instances where none of the existing adversarial attacks succeed. |
Huan Zhang; Shiqi Wang; Kaidi Xu; Yihan Wang; Suman Jana; Cho-Jui Hsieh; Zico Kolter; |
1180 | A Simple Yet Universal Strategy for Online Convex Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, they need to design and optimize one surrogate loss for each type of function, making it difficult to exploit the structure of the problem and utilize existing algorithms. In this paper, we propose a simple strategy for universal online convex optimization, which avoids these limitations. |
Lijun Zhang; Guanghui Wang; Jinfeng Yi; Tianbao Yang; |
1181 | Low-Precision Stochastic Gradient Langevin Dynamics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we provide the first study of low-precision Stochastic Gradient Langevin Dynamics (SGLD), showing that its costs can be significantly reduced without sacrificing performance, due to its intrinsic ability to handle system noise. |
Ruqi Zhang; Andrew Gordon Wilson; Christopher De Sa; |
1182 | Expression Might Be Enough: Representing Pressure and Demand for Reinforcement Learning Based Traffic Signal Control Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we (1) present a novel, flexible and efficient method, namely advanced max pressure (Advanced-MP), taking both running and queuing vehicles into consideration to decide whether to change the current signal phase; (2) design the traffic movement representation with the efficient pressure and effective running vehicles from Advanced-MP, namely advanced traffic state (ATS); and (3) develop a reinforcement learning (RL) based algorithm template, called Advanced-XLight, by combining ATS with the latest RL approaches, and generate two RL algorithms, namely "Advanced-MPLight" and "Advanced-CoLight", from Advanced-XLight. |
Liang Zhang; Qiang Wu; Jun Shen; Linyuan Lü; Bo Du; Jianqing Wu; |
1183 | Uncertainty Modeling in Generative Compressed Sensing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the reconstruction capability of the generative model is fundamentally limited by the range of its generator, typically a small subset of the signal space of interest. To break this bottleneck and thus reconstruct those out-of-range signals, this paper presents a novel method called CS-BGM that can effectively expand the range of the generator. |
Yilang Zhang; Mengchu Xu; Xiaojun Mao; Jian Wang; |
1184 | Building Robust Ensembles Via Margin Boosting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we take a principled approach towards building robust ensembles. |
Dinghuai Zhang; Hongyang Zhang; Aaron Courville; Yoshua Bengio; Pradeep Ravikumar; Arun Sai Suggala; |
1185 | Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we advance Fast-AT from the fresh perspective of bi-level optimization (BLO). |
Yihua Zhang; Guanhua Zhang; Prashant Khanduri; Mingyi Hong; Shiyu Chang; Sijia Liu; |
1186 | Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We focus on FQE with general differentiable function approximators, making our theory applicable to neural function approximations. |
Ruiqi Zhang; Xuezhou Zhang; Chengzhuo Ni; Mengdi Wang; |
1187 | ROCK: Causal Inference Principles for Reasoning About Commonsense Causality Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel framework, ROCK, to Reason O(A)bout Commonsense K(C)ausality, which utilizes temporal signals as incidental supervision, and balances confounding effects using temporal propensities that are analogous to propensity scores. |
Jiayao Zhang; Hongming Zhang; Weijie Su; Dan Roth; |
1188 | No-Regret Learning in Time-Varying Zero-Sum Games Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Learning from repeated play in a fixed two-player zero-sum game is a classic problem in game theory and online learning. We consider a variant of this problem where the game payoff matrix changes over time, possibly in an adversarial manner. |
Mengxiao Zhang; Peng Zhao; Haipeng Luo; Zhi-Hua Zhou; |
1189 | PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To resolve this issue, we propose PLATON, which captures the uncertainty of importance scores by upper confidence bound of importance estimation. |
Qingru Zhang; Simiao Zuo; Chen Liang; Alexander Bukharin; Pengcheng He; Weizhu Chen; Tuo Zhao; |
1190 | NysADMM: Faster Composite Convex Optimization Via Low-rank Approximation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper develops a scalable new algorithm, called NysADMM, to minimize a smooth convex loss function with a convex regularizer. |
Shipu Zhao; Zachary Frangella; Madeleine Udell; |
1191 | Toward Compositional Generalization in Object-Oriented World Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a conceptual environment, Object Library, and two instances, and deploy a principled pipeline to measure the generalization ability. |
Linfeng Zhao; Lingzhi Kong; Robin Walters; Lawson L.S. Wong; |
1192 | Dynamic Regret of Online Markov Decision Processes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We investigate online Markov Decision Processes (MDPs) with adversarially changing loss functions and known transitions. |
Peng Zhao; Long-Fei Li; Zhi-Hua Zhou; |
1193 | Learning to Solve PDE-constrained Inverse Problems with Graph Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In many application domains across science and engineering, however, we are not only interested in a forward simulation but also in solving inverse problems with constraints defined by a partial differential equation (PDE). Here we explore GNNs to solve such PDE-constrained inverse problems. |
Qingqing Zhao; David B Lindell; Gordon Wetzstein; |
1194 | Learning from Counterfactual Links for Link Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the causal relationship between the two variables was largely ignored for learning to predict links on a graph. In this work, we visit this factor by asking a counterfactual question: "would the link still exist if the graph structure became different from observation?" |
Tong Zhao; Gang Liu; Daheng Wang; Wenhao Yu; Meng Jiang; |
1195 | Global Optimization Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose to fit a new type of function called a global optimization network (GON), defined as any composition of an invertible function and a unimodal function, whose unique global maximizer can be inferred in $\mathcal{O}(D)$ time, and used as the estimate. |
Sen Zhao; Erez Louidor; Maya Gupta; |
1196 | Certified Robustness Against Natural Language Attacks By Causal Intervention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper follows a causal perspective to look into the adversarial vulnerability and proposes Causal Intervention by Semantic Smoothing (CISS), a novel framework towards robustness against natural language attacks. |
Haiteng Zhao; Chang Ma; Xinshuai Dong; Anh Tuan Luu; Zhi-Hong Deng; Hanwang Zhang; |
1197 | Efficient Learning for AlphaZero Via Path Consistency Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper aims at building powerful models under a limited amount of self-play that can be utilized by a human throughout a lifetime. |
Dengwei Zhao; Shikui Tu; Lei Xu; |
1198 | Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an effective method to improve the model generalization by additionally penalizing the gradient norm of loss function during optimization. |
Yang Zhao; Hao Zhang; Xiuyuan Hu; |
1199 | Ripple Attention for Visual Perception with Sub-quadratic Complexity Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To bridge the gap, we propose ripple attention, a sub-quadratic attention mechanism for vision transformers. |
Lin Zheng; Huijie Pan; Lingpeng Kong; |
1200 | Linear Complexity Randomized Self-attention Mechanism Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: By combining the expressiveness in RA and the efficiency in RFA, we develop a novel linear complexity self-attention mechanism called linear randomized attention (LARA). |
Lin Zheng; Chong Wang; Lingpeng Kong; |
1201 | Online Decision Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Online Decision Transformers (ODT), an RL algorithm based on sequence modeling that blends offline pretraining with online finetuning in a unified framework. |
Qinqing Zheng; Amy Zhang; Aditya Grover; |
1202 | Learning Efficient and Robust Ordinary Differential Equations Via Invertible Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose to learn an ODE of interest from data by viewing its dynamics as a vector field related to another base vector field via a diffeomorphism (i.e., a differentiable bijection), represented by an invertible neural network (INN). |
Weiming Zhi; Tin Lai; Lionel Ott; Edwin V. Bonilla; Fabio Ramos; |
1203 | HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work we propose a HyperTransformer, a Transformer-based model for supervised and semi-supervised few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples. |
Andrey Zhmoginov; Mark Sandler; Maksym Vladymyrov; |
1204 | Describing Differences Between Text Distributions with Natural Language Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose to automatically summarize the differences by “learning a natural language hypothesis": given two distributions $D_{0}$ and $D_{1}$, we search for a description that is more often true for $D_{1}$, e.g., “ is military-related.” |
Ruiqi Zhong; Charlie Snell; Dan Klein; Jacob Steinhardt; |
1205 | Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a pessimism-based algorithm, dubbed as pessimistic minimax value iteration (PMVI), which overcomes the distributional shift by constructing pessimistic estimates of the value functions for both players and outputs a policy pair by solving a coarse correlated equilibrium based on the two value functions. |
Han Zhong; Wei Xiong; Jiyuan Tan; Liwei Wang; Tong Zhang; Zhaoran Wang; Zhuoran Yang; |
1206 | Dimension-free Complexity Bounds for High-order Nonconvex Finite-sum Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we show that the polylogarithmic dimension dependence gap is not essential and can be closed. |
Dongruo Zhou; Quanquan Gu; |
1207 | A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a hierarchical Bayesian approach for inferring the most likely assignments such that the concretized reward machine can discriminate expert demonstrated trajectories from other trajectories with high accuracy. |
Weichao Zhou; Wenchao Li; |
1208 | On The Optimization Landscape of Neural Collapse Under MSE Loss: Global Optimality with Unconstrained Features Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we justify NC under the mean squared error (MSE) loss, where recent empirical evidence shows that it performs comparably or even better than the de-facto cross-entropy loss. |
Jinxin Zhou; Xiao Li; Tianyu Ding; Chong You; Qing Qu; Zhihui Zhu; |
1209 | Model Agnostic Sample Reweighting for Out-of-Distribution Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work proposes a principled method, Model Agnostic samPLe rEweighting (MAPLE), to effectively address the OOD problem, especially in overparameterized scenarios. |
Xiao Zhou; Yong Lin; Renjie Pi; Weizhong Zhang; Renzhe Xu; Peng Cui; Tong Zhang; |
1210 | Sparse Invariant Risk Minimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a simple yet effective paradigm named Sparse Invariant Risk Minimization (SparseIRM) to address this contradiction. |
Xiao Zhou; Yong Lin; Weizhong Zhang; Tong Zhang; |
1211 | Prototype-Anchored Learning for Learning with Imperfect Annotations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we thoroughly investigate the popular softmax loss and margin-based loss, and offer a feasible approach to tighten the generalization error bound by maximizing the minimal sample margin. |
Xiong Zhou; Xianming Liu; Deming Zhai; Junjun Jiang; Xin Gao; Xiangyang Ji; |
1212 | FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address these problems, we propose to combine Transformer with the seasonal-trend decomposition method, in which the decomposition method captures the global profile of time series while Transformers capture more detailed structures. |
Tian Zhou; Ziqing Ma; Qingsong Wen; Xue Wang; Liang Sun; Rong Jin; |
1213 | Probabilistic Bilevel Coreset Selection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, for the first time we propose a continuous probabilistic bilevel formulation of coreset selection by learning a probabilistic weight for each training sample. |
Xiao Zhou; Renjie Pi; Weizhong Zhang; Yong Lin; Zonghao Chen; Tong Zhang; |
1214 | Approximate Frank-Wolfe Algorithms Over Graph-structured Support Sets Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we consider approximate Frank-Wolfe (FW) algorithms to solve convex optimization problems over graph-structured support sets where the linear minimization oracle (LMO) cannot be efficiently obtained in general. |
Baojian Zhou; Yifan Sun; |
1215 | Improving Adversarial Robustness Via Mutual Information Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: They are typically misled by adversarial samples to make wrong predictions. To alleviate this negative effect, in this paper, we investigate the dependence between outputs of the target model and input adversarial samples from the perspective of information theory, and propose an adversarial defense method. |
Dawei Zhou; Nannan Wang; Xinbo Gao; Bo Han; Xiaoyu Wang; Yibing Zhan; Tongliang Liu; |
1216 | Modeling Adversarial Noise for Adversarial Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by the fact that adversarial noise contains well-generalizing features and that the relationship between adversarial data and natural data can help infer natural data and make reliable predictions, in this paper, we study to model adversarial noise by learning the transition relationship between adversarial labels (i.e. the flipped labels used to generate adversarial data) and natural labels (i.e. the ground truth labels of the natural data). |
Dawei Zhou; Nannan Wang; Bo Han; Tongliang Liu; |
1217 | Contrastive Learning with Boosted Memorization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Different from previous works, we explore this direction from an alternative perspective, i.e., the data perspective, and propose a novel Boosted Contrastive Learning (BCL) method. |
Zhihan Zhou; Jiangchao Yao; Yan-Feng Wang; Bo Han; Ya Zhang; |
1218 | Understanding The Robustness in Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we examine the role of self-attention in learning robust representations. |
Daquan Zhou; Zhiding Yu; Enze Xie; Chaowei Xiao; Animashree Anandkumar; Jiashi Feng; Jose M. Alvarez; |
1219 | VLUE: A Multi-Task Multi-Dimension Benchmark for Evaluating Vision-Language Pre-training Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we introduce the Vision-Language Understanding Evaluation (VLUE) benchmark, a multi-task multi-dimension benchmark for evaluating the generalization capabilities and the efficiency-performance trade-off (“Pareto SOTA”) of VLP models. |
Wangchunshu Zhou; Yan Zeng; Shizhe Diao; Xinsong Zhang; |
1220 | Detecting Corrupted Labels Without Training A Model to Predict Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, from a more data-centric perspective, we propose a training-free solution to detect corrupted labels. |
Zhaowei Zhu; Zihao Dong; Yang Liu; |
1221 | Contextual Bandits with Large Action Spaces: Made Practical Highlight: We present the first efficient, general-purpose algorithm for contextual bandits with continuous, linearly structured action spaces. |
Yinglun Zhu; Dylan J Foster; John Langford; Paul Mineiro; |
1222 | Neural-Symbolic Models for Logical Queries on Knowledge Graphs Highlight: In this paper, we propose Graph Neural Network Query Executor (GNN-QE), a neural-symbolic model that enjoys the advantages of both worlds. |
Zhaocheng Zhu; Mikhail Galkin; Zuobai Zhang; Jian Tang; |
1223 | Topology-aware Generalization of Decentralized SGD Highlight: This paper studies the algorithmic stability and generalizability of decentralized stochastic gradient descent (D-SGD). |
Tongtian Zhu; Fengxiang He; Lan Zhang; Zhengyang Niu; Mingli Song; Dacheng Tao; |
1224 | Resilient and Communication Efficient Learning for Heterogeneous Federated Systems Highlight: In this work, we propose an FL scheme to address both challenges simultaneously. |
Zhuangdi Zhu; Junyuan Hong; Steve Drew; Jiayu Zhou; |
1225 | On Numerical Integration in Neural Ordinary Differential Equations Highlight: In this paper, we propose the inverse modified differential equations (IMDE) to clarify the influence of numerical integration on training Neural ODE models. |
Aiqing Zhu; Pengzhan Jin; Beibei Zhu; Yifa Tang; |
1226 | When AUC Meets DRO: Optimizing Partial AUC for Deep Learning with Non-Convex Convergence Guarantee Highlight: In this paper, we propose systematic and efficient gradient-based methods for both one-way and two-way partial AUC (pAUC) maximization that are applicable to deep learning. |
Dixian Zhu; Gang Li; Bokun Wang; Xiaodong Wu; Tianbao Yang; |
1227 | Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces Highlight: We propose a smooth regret notion for contextual bandits, which dominates previously proposed alternatives. |
Yinglun Zhu; Paul Mineiro; |
1228 | Residual-Based Sampling for Online Outlier-Robust PCA Highlight: In this paper, we study online ORPCA, an important variant addressing the practical setting in which data points arrive sequentially and the goal is to recover the underlying subspace of the clean data in a single pass. |
Tianhao Zhu; Jie Shen; |
1229 | Region-Based Semantic Factorization in GANs Highlight: In this work, we present a highly efficient algorithm to factorize the latent semantics learned by GANs concerning an arbitrary image region. |
Jiapeng Zhu; Yujun Shen; Yinghao Xu; Deli Zhao; Qifeng Chen; |
1230 | Beyond Images: Label Noise Transition Matrix Estimation for Tasks with Lower-Quality Features Highlight: We observe that tasks with lower-quality features fail to meet the anchor-point or clusterability condition, due to the coexistence of both uninformative and informative representations. To handle this issue, we propose a generic and practical information-theoretic approach to down-weight the less informative parts of the lower-quality features. |
Zhaowei Zhu; Jialu Wang; Yang Liu; |
1231 | Towards Uniformly Superhuman Autonomy Via Subdominance Minimization Highlight: We instead assume demonstrations are of varying quality and seek to induce behavior that is unambiguously better (i.e., Pareto dominant or minimally subdominant) than all human demonstrations. |
Brian Ziebart; Sanjiban Choudhury; Xinyan Yan; Paul Vernaza; |
1232 | Inductive Matrix Completion: No Bad Local Minima and A Fast Algorithm Highlight: In this work, we make three contributions to the IMC problem: (i) we prove that under suitable conditions, the IMC optimization landscape has no bad local minima; (ii) we derive a simple scheme with theoretical guarantees to estimate the rank of the unknown matrix; and (iii) we propose GNIMC, a simple Gauss-Newton based method to solve the IMC problem, analyze its runtime and derive for it strong recovery guarantees. |
Pini Zilber; Boaz Nadler; |
1233 | Counterfactual Prediction for Outcome-Oriented Treatments Highlight: To overcome it, we establish a new objective of optimizing counterfactual prediction on outcome-oriented treatments, propose a novel Outcome-Oriented Sample Re-weighting (OOSR) method to make the predictive model concentrate more on outcome-oriented treatments, and theoretically show that our method can improve treatment selection towards the optimal one. |
Hao Zou; Bo Li; Jiangang Han; Shuiping Chen; Xuetao Ding; Peng Cui; |
1234 | SpaceMAP: Visualizing High-Dimensional Data By Space Expansion Highlight: However, there exist intriguing, non-intuitive discrepancies between the geometry of high- and low-dimensional space. We look into such discrepancies and propose a novel visualization method called Space-based Manifold Approximation and Projection (SpaceMAP). |
Xinrui Zu; Qian Tao; |