Paper Digest: AAAI 2024 Papers & Highlights
Note: AAAI-2024 accepted more than 2,500 papers; this page includes only 500 of them, selected by our daily paper digest ranking algorithm. To browse all accepted papers or learn more about the AAAI-2024 statistics, readers can read All AAAI-2024 accepted papers on a separate page, which takes quite some time to load. On this page, readers are also able to filter papers by keywords. For example, using ‘related code’ as the filter keyword will produce a list of all papers with code available to download.
To search or review papers within AAAI-2024 related to a specific topic, please use the search by venue (AAAI-2024), review by venue (AAAI-2024) and question answering by venue (AAAI-2024) services. To browse papers by author, here is a list of all ~9,700 authors (AAAI-2024). You may also like to explore our “Best Paper” Digest (AAAI), which lists the most influential AAAI papers since 1982.
This list is created by the Paper Digest Team.
TABLE 1: Paper Digest: AAAI 2024 Papers & Highlights
No. | Paper | Highlight | Author(s) | Related Code |
---|---|---|---|---|
1 | T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models | However, relying solely on text prompts cannot fully take advantage of the knowledge learned by the model, especially when flexible and accurate controlling (e.g., structure and color) is needed. In this paper, we aim to “dig out” the capabilities that T2I models have implicitly learned, and then explicitly use them to control the generation more granularly. | Chong Mou; Xintao Wang; Liangbin Xie; Yanze Wu; Jian Zhang; Zhongang Qi; Ying Shan | Yes |
2 | MemoryBank: Enhancing Large Language Models with Long-Term Memory | Recognizing the necessity for long-term memory, we propose MemoryBank, a novel memory mechanism tailored for LLMs. | Wanjun Zhong; Lianghong Guo; Qiqi Gao; He Ye; Yanlin Wang | Yes |
3 | Learning Temporal Resolution in Spectrogram for Audio Classification | This paper proposes a novel method, DiffRes, that enables differentiable temporal resolution modeling for audio classification. | Haohe Liu; Xubo Liu; Qiuqiang Kong; Wenwu Wang; Mark D. Plumbley | Yes |
4 | Machine-Created Universal Language for Cross-Lingual Transfer | As a result, we propose a new Machine-created Universal Language (MUL) as an alternative intermediate language. | Yaobo Liang; Quanzhi Zhu; Junhe Zhao; Nan Duan | Yes |
5 | I Prefer Not to Say: Protecting User Consent in Models with Optional Personal Data | In this work, we show that the decision not to share data can be considered as information in itself that should be protected to respect users’ privacy. | Tobias Leemann; Martin Pawelczyk; Christian Thomas Eberle; Gjergji Kasneci | Yes |
6 | Investigating The Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following | In this paper, we present our finding that prepending a Task-Agnostic Prefix Prompt (TAPP) to the input improves the instruction-following ability of various Large Language Models (LLMs) during inference. | Seonghyeon Ye; Hyeonbin Hwang; Sohee Yang; Hyeongu Yun; Yireun Kim; Minjoon Seo | Yes |
7 | ImageCaptioner2: Image Captioner for Image Captioning Bias Amplification Assessment | In this paper, we introduce a new bias assessment metric, dubbed ImageCaptioner2, for image captioning. | Eslam Abdelrahman; Pengzhan Sun; Li Erran Li; Mohamed Elhoseiny | - |
8 | Preference Ranking Optimization for Human Alignment | In this paper, we propose Preference Ranking Optimization (PRO) as an efficient SFT algorithm to directly fine-tune LLMs for human alignment. | Feifan Song; Bowen Yu; Minghao Li; Haiyang Yu; Fei Huang; Yongbin Li; Houfeng Wang | Yes |
9 | Visual Adversarial Examples Jailbreak Aligned Large Language Models | As an illustration, we present a case study in which we exploit visual adversarial examples to circumvent the safety guardrail of aligned LLMs with integrated vision. | Xiangyu Qi; Kaixuan Huang; Ashwinee Panda; Peter Henderson; Mengdi Wang; Prateek Mittal | Yes |
10 | Leveraging Diffusion Perturbations for Measuring Fairness in Computer Vision | However, there remains a lack of datasets balanced along demographic traits that can be used to evaluate the downstream fairness of these models. In this work, we demonstrate that diffusion models can be leveraged to create such a dataset. | Nicholas Lui; Bryan Chia; William Berrios; Candace Ross; Douwe Kiela | - |
11 | Fine-Grained Distillation for Long Document Retrieval | In this work, we propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers. | Yucheng Zhou; Tao Shen; Xiubo Geng; Chongyang Tao; Jianbing Shen; Guodong Long; Can Xu; Daxin Jiang | - |
12 | TurboSVM-FL: Boosting Federated Learning Through SVM Aggregation for Lazy Clients | In this paper, we propose a novel federated aggregation strategy, TurboSVM-FL, that poses no additional computation burden on the client side and can significantly accelerate convergence for federated classification task, especially when clients are "lazy" and train their models solely for few epochs for next global aggregation. | Mengdi Wang; Anna Bodonhelyi; Efe Bozkir; Enkelejda Kasneci | Yes |
13 | Graph of Thoughts: Solving Elaborate Problems with Large Language Models | We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT). | Maciej Besta; Nils Blach; Ales Kubicek; Robert Gerstenberger; Michal Podstawski; Lukas Gianinazzi; Joanna Gajda; Tomasz Lehmann; Hubert Niewiadomski; Piotr Nyczyk; Torsten Hoefler | Yes |
14 | NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving Scenario | We introduce a novel visual question answering (VQA) task in the context of autonomous driving, aiming to answer natural language questions based on street-view clues. | Tianwen Qian; Jingjing Chen; Linhai Zhuo; Yang Jiao; Yu-Gang Jiang | Yes |
15 | ORES: Open-Vocabulary Responsible Visual Synthesis | In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avoid forbidden visual concepts while allowing users to input any desired content. | Minheng Ni; Chenfei Wu; Xiaodong Wang; Shengming Yin; Lijuan Wang; Zicheng Liu; Nan Duan | Yes |
16 | Generalized Planning in PDDL Domains with Pretrained Large Language Models | We evaluate this approach in seven PDDL domains and compare it to four ablations and four baselines. | Tom Silver; Soham Dan; Kavitha Srinivas; Joshua B. Tenenbaum; Leslie Kaelbling; Michael Katz | Yes |
17 | I-CEE: Tailoring Explanations of Image Classification Models to User Expertise | Yet, there is relatively little emphasis on the user (the explainee) in this growing body of work and most XAI techniques generate “one-size-fits-all” explanations. To bridge this gap and achieve a step closer towards human-centered XAI, we present I-CEE, a framework that provides Image Classification Explanations tailored to User Expertise. | Yao Rong; Peizhu Qian; Vaibhav Unhelkar; Enkelejda Kasneci | Yes |
18 | Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties | We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations. | Taylor Sorensen; Liwei Jiang; Jena D. Hwang; Sydney Levine; Valentina Pyatkin; Peter West; Nouha Dziri; Ximing Lu; Kavel Rao; Chandra Bhagavatula; Maarten Sap; John Tasioulas; Yejin Choi | Yes |
19 | When to Show A Suggestion? Integrating Human Feedback in AI-Assisted Programming | We introduce a utility-theoretic framework to drive decisions about suggestions to display versus withhold. | Hussein Mozannar; Gagan Bansal; Adam Fourney; Eric Horvitz | Yes |
20 | Detecting and Preventing Hallucinations in Large Vision Language Models | We find that even the current state-of-the-art LVLMs (InstructBLIP) still contain a staggering 30 percent of the hallucinatory text in the form of non-existent objects, unfaithful descriptions, and inaccurate relationships. To address this, we introduce M-HalDetect, a Multimodal Hallucination Detection Dataset that can be used to train and benchmark models for hallucination detection and prevention. | Anisha Gunjal; Jihan Yin; Erhan Bas | Yes |
21 | Beyond Attention: Breaking The Limits of Transformer Context Length with Recurrent Memory | In this study, we investigate the recurrent memory augmentation of pre-trained transformer models to extend input context length while linearly scaling compute. | Aydar Bulatov; Yuri Kuratov; Yermek Kapushev; Mikhail Burtsev | - |
22 | VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View | In this work, we propose VELMA, an embodied LLM agent that uses a verbalization of the trajectory and of visual environment observations as contextual prompt for the next action. | Raphael Schumann; Wanrong Zhu; Weixi Feng; Tsu-Jui Fu; Stefan Riezler; William Yang Wang | Yes |
23 | Ced-NeRF: A Compact and Efficient Method for Dynamic Neural Radiance Fields | However, employing a hybrid representation for dynamic scenes results in overfitting due to fast convergence, which can result in artifacts (e.g., floaters, noisy geometric) on novel views. To address this, we propose a compact and efficient method for dynamic neural radiance fields, namely Ced-NeRF which only require a small number of additional parameters to construct a hybrid representation of dynamic NeRF. | Youtian Lin | - |
24 | NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models | In this work, we introduce the NavGPT, a purely LLM-based instruction-following navigation agent, to reveal the reasoning capability of GPT models in complex embodied scenes by performing zero-shot sequential action prediction for vision-and-language navigation (VLN). | Gengze Zhou; Yicong Hong; Qi Wu | Yes |
25 | Task Contamination: Language Models May Not Be Few-Shot Anymore | Large language models (LLMs) offer impressive performance in various zero-shot and few-shot tasks. | Changmao Li; Jeffrey Flanigan | - |
26 | DocFormerv2: Local Features for Document Understanding | We propose DocFormerv2, a multi-modal transformer for Visual Document Understanding (VDU). | Srikar Appalaraju; Peng Tang; Qi Dong; Nishant Sankaran; Yichu Zhou; R. Manmatha | Yes |
27 | Parallel Vertex Diffusion for Unified Visual Grounding | In this paper, we develop Parallel Vertex Diffusion (PVD) based on the parallelizability of diffusion models to accurately and efficiently generate vertexes in a parallel and scalable manner. | Zesen Cheng; Kehan Li; Peng Jin; Siheng Li; Xiangyang Ji; Li Yuan; Chang Liu; Jie Chen | - |
28 | MedSegDiff-V2: Diffusion-Based Medical Image Segmentation with Transformer | However, we discovered that simply combining these two models resulted in subpar performance. To effectively integrate these two cutting-edge techniques for the Medical image segmentation, we propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2. | Junde Wu; Wei Ji; Huazhu Fu; Min Xu; Yueming Jin; Yanwu Xu | - |
29 | MobileInst: Video Instance Segmentation on The Mobile | It mainly suffers from (1) heavy computation and memory costs for frame-by-frame pixel-level instance perception and (2) complicated heuristics for tracking objects. To address these issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices. | Renhong Zhang; Tianheng Cheng; Shusheng Yang; Haoyi Jiang; Shuai Zhang; Jiancheng Lyu; Xin Li; Xiaowen Ying; Dashan Gao; Wenyu Liu; Xinggang Wang | - |
30 | Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations | We present a novel framework that harnesses the rich contextual information and semantic representations provided by large language models to analyze behavior graphs and uncover underlying patterns and relationships. | Likang Wu; Zhaopeng Qiu; Zhi Zheng; Hengshu Zhu; Enhong Chen | Yes |
31 | CLIM: Contrastive Language-Image Mosaic for Region Representation | In this paper, we propose a novel approach called Contrastive Language-Image Mosaic (CLIM), which leverages large-scale image-text pairs effectively for aligning region and text representations. | Size Wu; Wenwei Zhang; Lumin Xu; Sheng Jin; Wentao Liu; Chen Change Loy | Yes |
32 | Improving The Robustness of Knowledge-Grounded Dialogue Via Contrastive Learning | In this paper, we propose an entity-based contrastive learning framework for improving the robustness of KGD. | Jiaan Wang; Jianfeng Qu; Kexin Wang; Zhixu Li; Wen Hua; Ximing Li; An Liu | Yes |
33 | AMSP-UOD: When Vortex Convolution and Stochastic Perturbation Meet Underwater Object Detection | In this paper, we present a novel Amplitude-Modulated Stochastic Perturbation and Vortex Convolutional Network, AMSP-UOD, designed for underwater object detection. | Jingchun Zhou; Zongxin He; Kin-Man Lam; Yudong Wang; Weishi Zhang; Chunle Guo; Chongyi Li | Yes |
34 | Mitigating Large Language Model Hallucinations Via Autonomous Knowledge Graph-Based Retrofitting | Existing methods usually only use the user’s input to query the knowledge graph, thus failing to address the factual hallucination generated by LLMs during its reasoning process. To address this problem, this paper proposes Knowledge Graph-based Retrofitting (KGR), a new framework that incorporates LLMs with KGs to mitigate factual hallucination during the reasoning process by retrofitting the initial draft responses of LLMs based on the factual knowledge stored in KGs. | Xinyan Guan; Yanjiang Liu; Hongyu Lin; Yaojie Lu; Ben He; Xianpei Han; Le Sun | - |
35 | Visual Chain-of-Thought Prompting for Knowledge-Based Visual Reasoning | To this end, we propose Visual Chain-of-thought Prompting (VCTP) for knowledge-based reasoning, which involves the interaction between visual content and natural language in an iterative step-by-step reasoning manner. | Zhenfang Chen; Qinhong Zhou; Yikang Shen; Yining Hong; Zhiqing Sun; Dan Gutfreund; Chuang Gan | - |
36 | Towards Reliable Learning in The Wild: Generalization and Adaptation | In this talk, I will present our research on building machine learning models that are highly generalizable and easily adaptable to different shifts. | Huaxiu Yao | - |
37 | Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving | However, existing frameworks based on Lift-Splat-Shoot (LSS) in the multi-camera setting cannot produce suitable dense 3D features due to the projection nature and uncontrollable densification process. To resolve this problem, we propose to regulate intermediate dense 3D features with the help of volume rendering. | Junkai Xu; Liang Peng; Haoran Cheng; Linxuan Xia; Qi Zhou; Dan Deng; Wei Qian; Wenxiao Wang; Deng Cai | Yes |
38 | OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examples | In this paper, we propose OUTFOX, a framework that improves the robustness of LLM-generated-text detectors by allowing both the detector and the attacker to consider each other’s output. | Ryuto Koike; Masahiro Kaneko; Naoaki Okazaki | Yes |
39 | AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models | In this paper, we explore the utilization of LVLM to address the IAD problem and propose AnomalyGPT, a novel IAD approach based on LVLM. | Zhaopeng Gu; Bingke Zhu; Guibo Zhu; Yingying Chen; Ming Tang; Jinqiao Wang | Yes |
40 | TDeLTA: A Light-Weight and Robust Table Detection Method Based on Learning Text Arrangement | To tackle this problem, we start from the essence of the table, which is a set of text arranged in rows and columns. Based on this, we propose a novel, light-weighted and robust Table Detection method based on Learning Text Arrangement, namely TDeLTA. | Yang Fan; Xiangping Wu; Qingcai Chen; Heng Li; Yan Huang; Zhixiang Cai; Qitian Wu | - |
41 | ProAgent: Building Proactive Cooperative Agents with Large Language Models | Such reliance, however, constrains the agents’ capacity for strategic adaptation when cooperating with unfamiliar teammates, which becomes a significant challenge in zero-shot coordination scenarios. To address this challenge, we propose ProAgent, a novel framework that harnesses large language models (LLMs) to create proactive agents capable of dynamically adapting their behavior to enhance cooperation with teammates. | Ceyao Zhang; Kaijie Yang; Siyi Hu; Zihao Wang; Guanghe Li; Yihang Sun; Cheng Zhang; Zhaowei Zhang; Anji Liu; Song-Chun Zhu; Xiaojun Chang; Junge Zhang; Feng Yin; Yitao Liang; Yaodong Yang | - |
42 | Make RepVGG Greater Again: A Quantization-Aware Approach | Nonetheless, its quantization performance is usually too poor to deploy (e.g. more than 20% top-1 accuracy drop on ImageNet) when INT8 inference is desired. In this paper, we dive into the underlying mechanism of this failure, where the original design inevitably enlarges quantization error. | Xiangxiang Chu; Liang Li; Bo Zhang | Yes |
43 | Norm Tweaking: High-Performance Low-Bit Quantization of Large Language Models | In this paper, we introduce a technique called norm tweaking, which can be used as a plugin in current PTQ methods to achieve high precision while being cost-efficient. | Liang Li; Qingyuan Li; Bo Zhang; Xiangxiang Chu | - |
44 | Bad Actor, Good Advisor: Exploring The Role of Large Language Models in Fake News Detection | In this paper, we investigate the potential of LLMs in fake news detection. | Beizhe Hu; Qiang Sheng; Juan Cao; Yuhui Shi; Yang Li; Danding Wang; Peng Qi | Yes |
45 | Music Style Transfer with Time-Varying Inversion of Diffusion Models | This paper presents a music style transfer approach that effectively captures musical attributes using minimal data. | Sifei Li; Yuxin Zhang; Fan Tang; Chongyang Ma; Weiming Dong; Changsheng Xu | Yes |
46 | TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training | In this study, we observe that the multi-label classification results heavily rely on discriminative local features but are overlooked by CLIP. | Yuqi Lin; Minghao Chen; Kaipeng Zhang; Hengjia Li; Mingming Li; Zheng Yang; Dongqin Lv; Binbin Lin; Haifeng Liu; Deng Cai | Yes |
47 | ResDiff: Combining CNN and Diffusion Model for Image Super-resolution | Adapting the Diffusion Probabilistic Model (DPM) for direct image super-resolution is wasteful, given that a simple Convolutional Neural Network (CNN) can recover the main low-frequency content. Therefore, we present ResDiff, a novel Diffusion Probabilistic Model based on Residual structure for Single Image Super-Resolution (SISR). | Shuyao Shang; Zhengyang Shan; Guangxing Liu; LunQian Wang; XingHua Wang; Zekai Zhang; Jinglin Zhang | - |
48 | SC-NeuS: Consistent Neural Surface Reconstruction from Sparse and Noisy Views | Unlike previous approaches, the key difference of this paper is to exploit the multi-view constraints directly from the explicit geometry of the neural surface, which can be used as effective regularization to jointly learn the neural surface and refine the camera poses. To build effective multi-view constraints, we introduce a fast differentiable on-surface intersection to generate on-surface points, and propose view-consistent losses on such differentiable points to regularize the neural surface learning. | Shi-Sheng Huang; Zixin Zou; Yichi Zhang; Yan-Pei Cao; Ying Shan | - |
49 | High-Fidelity 3D Head Avatars Reconstruction Through Spatially-Varying Expression Conditioned Neural Radiance Field | Although recent NeRF-based photo-realistic 3D head avatar methods achieve high-quality avatar rendering, they still encounter challenges retaining intricate facial expression details because they overlook the potential of specific expression variations at different spatial positions when conditioning the radiance field. Motivated by this observation, we introduce a novel Spatially-Varying Expression (SVE) conditioning. | Minghan Qin; Yifan Liu; Yuelang Xu; Xiaochen Zhao; Yebin Liu; Haoqian Wang | - |
50 | Synergistic Multiscale Detail Refinement Via Intrinsic Supervision for Underwater Image Enhancement | In multi-degradation encoder-decoder framework of SMDR-IS, we introduce the Bifocal Intrinsic-Context Attention Module (BICA). | Dehuan Zhang; Jingchun Zhou; Chunle Guo; Weishi Zhang; Chongyi Li | Yes |
51 | SparseGNV: Generating Novel Views of Indoor Scenes with Sparse RGB-D Images | We present SparseGNV: a learning framework that incorporates 3D structures and image generative models to generate novel views with three modules. | Weihao Cheng; Yan-Pei Cao; Ying Shan | - |
52 | Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation | However, a systematic examination of various quantization schemes, model families, and quantization bit precision has been absent from the literature. In this paper, we conduct a comprehensive analysis of these factors by investigating the effects of PTQ on weight-only, activation-only, and weight-and-activation quantization using diverse methods such as round-to-nearest (RTN), GPTQ, ZeroQuant, and their variants. | Zhewei Yao; Xiaoxia Wu; Cheng Li; Stephen Youn; Yuxiong He | - |
53 | Towards Model Extraction Attacks in GAN-Based Image Translation Via Domain Shift Mitigation | Abstract: Model extraction attacks (MEAs) enable an attacker to replicate the functionality of a victim deep neural network (DNN) model by only querying its API service remotely, posing a … | Di Mi; Yanjun Zhang; Leo Yu Zhang; Shengshan Hu; Qi Zhong; Haizhuan Yuan; Shirui Pan | - |
54 | Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction | However, since the image-level prompts mask out continuous spatial details in the prompt-allocated region, it will suffer from inaccurate contextual information and limited domain knowledge extraction, particularly when dealing with dense prediction TTA problems. To overcome these challenges, we propose a novel Sparse Visual Domain Prompts (SVDP) approach, which applies minimal trainable parameters (e.g., 0.1%) to pixels across the entire image and reserves more spatial information of the input. | Senqiao Yang; Jiarui Wu; Jiaming Liu; Xiaoqi Li; Qizhe Zhang; Mingjie Pan; Yulu Gan; Zehui Chen; Shanghang Zhang | Yes |
55 | DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning | In this work, we propose a novel denoising objective that inherits from another perspective, i.e., the intra-sentence perspective. | Xinghao Wang; Junliang He; Pengyu Wang; Yunhua Zhou; Tianxiang Sun; Xipeng Qiu | Yes |
56 | Toward Open-Set Human Object Interaction Detection | The challenge lies in identifying completely new, out-of-domain relationships, as opposed to in-domain ones which have seen improvements in zero-shot HOI detection. To address this challenge, we introduce a simple Disentangled HOI Detection (DHD) model for detecting novel relationships by integrating an open-set object detector with a Visual Language Model (VLM). | Mingrui Wu; Yuqi Liu; Jiayi Ji; Xiaoshuai Sun; Rongrong Ji | Yes |
57 | Prot2Text: Multimodal Protein’s Function Generation with GNNs and Transformers | In this work, we propose a novel approach, Prot2Text, which predicts a protein’s function in a free text style, moving beyond the conventional binary or categorical classifications. | Hadi Abdine; Michail Chatzianastasis; Costas Bouyioukos; Michalis Vazirgiannis | - |
58 | Chain of Generation: Multi-Modal Gesture Synthesis Via Cascaded Conditional Control | This study aims to improve the generation of 3D gestures by utilizing multimodal information from human speech. | Zunnan Xu; Yachao Zhang; Sicheng Yang; Ronghui Li; Xiu Li | - |
59 | Towards Multi-Intent Spoken Language Understanding Via Hierarchical Attention and Optimal Transport | In this paper, we propose a novel Multi-Intent SLU framework termed HAOT, which utilizes hierarchical attention to divide the scopes of each intent and applies optimal transport to achieve the mutual guidance between slot and intent. | Xuxin Cheng; Zhihong Zhu; Hongxiang Li; Yaowei Li; Xianwei Zhuang; Yuexian Zou | - |
60 | Spatial Transform Decoupling for Oriented Object Detection | In this study, we present a novel approach, termed Spatial Transform Decoupling (STD), providing a simple-yet-effective solution for oriented object detection with ViTs. | Hongtian Yu; Yunjie Tian; Qixiang Ye; Yunfan Liu | Yes |
61 | ExpeL: LLM Agents Are Experiential Learners | This scenario emphasizes the growing need for new methodologies that allow learning from agent experiences without requiring parametric updates. To address these problems, we introduce the Experiential Learning (ExpeL) agent. | Andrew Zhao; Daniel Huang; Quentin Xu; Matthieu Lin; Yong-Jin Liu; Gao Huang | Yes |
62 | Frequency-Aware Deepfake Detection: Improving Generalizability Through Frequency Space Domain Learning | Consequently, these detectors have exhibited a lack of proficiency in learning the frequency domain and tend to overfit to the artifacts present in the training data, leading to suboptimal performance on unseen sources. To address this issue, we introduce a novel frequency-aware approach called FreqNet, centered around frequency domain learning, specifically designed to enhance the generalizability of deepfake detectors. | Chuangchuang Tan; Yao Zhao; Shikui Wei; Guanghua Gu; Ping Liu; Yunchao Wei | - |
63 | Mutual-Modality Adversarial Attack with Semantic Perturbation | Thus, enhancing the transferability of the adversarial samples has become a crucial area of research, which heavily relies on selecting appropriate surrogate models. To address this challenge, we propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme. | Jingwen Ye; Ruonan Yu; Songhua Liu; Xinchao Wang | - |
64 | Dual-Window Multiscale Transformer for Hyperspectral Snapshot Compressive Imaging | In this paper, we propose a Transformer-based HSI reconstruction method called dual-window multiscale Transformer (DWMT), which is a coarse-to-fine process, reconstructing the global properties of HSI with the long-range dependencies. | Fulin Luo; Xi Chen; Xiuwen Gong; Weiwen Wu; Tan Guo | - |
65 | UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding | In this study, we propose a unified context-aware TTS framework called UniCATS, which is capable of both speech continuation and editing. | Chenpeng Du; Yiwei Guo; Feiyu Shen; Zhijun Liu; Zheng Liang; Xie Chen; Shuai Wang; Hui Zhang; Kai Yu | Yes |
66 | SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation | In this paper, we introduce SQLdepth, a novel approach that can effectively learn fine-grained scene structure priors from ego-motion. | Youhong Wang; Yunji Liang; Hao Xu; Shaohui Jiao; Hongkai Yu | Yes |
67 | Chasing Fairness in Graphs: A GNN Architecture Perspective | To this end, we aim to achieve fairness via a new GNN architecture. | Zhimeng Jiang; Xiaotian Han; Chao Fan; Zirui Liu; Na Zou; Ali Mostafavi; Xia Hu | Yes |
68 | LDMVFI: Video Frame Interpolation with Latent Diffusion Models | Towards developing perceptually-oriented VFI methods, in this work we propose latent diffusion model-based VFI, LDMVFI. | Duolikun Danier; Fan Zhang; David Bull | Yes |
69 | Large Language Models Are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a "reasoning-aware" diagnosis framework that rationalizes the diagnostic process via prompt-based learning in a time- and labor-efficient manner, and learns to reason over the prompt-generated rationales. |
Taeyoon Kwon; Kai Tzu-iunn Ong; Dongjin Kang; Seungjun Moon; Jeong Ryong Lee; Dosik Hwang; Beomseok Sohn; Yongsik Sim; Dongha Lee; Jinyoung Yeo; |
70 | Stable Unlearnable Example: Enhancing The Robustness of Unlearnable Examples Via Stable Error-Minimizing Noise Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, we found a negative correlation exists between the robustness of defensive noise and the protection performance, indicating defensive noise’s instability issue. Motivated by this, to further boost the robust unlearnable example, we introduce Stable Error-Minimizing noise (SEM), which trains the defensive noise against random perturbation instead of the time-consuming adversarial perturbation to improve the stability of defensive noise. |
Yixin Liu; Kaidi Xu; Xun Chen; Lichao Sun; |
71 | Evaluate Geometry of Radiance Fields with Low-Frequency Color Prior Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our key insight is that the better the geometry, the lower-frequency the computed color field. |
Qihang Fang; Yafei Song; Keqiang Li; Li Shen; Huaiyu Wu; Gang Xiong; Liefeng Bo; |
72 | Improving Factual Error Correction By Learning to Inject Factual Errors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the absence of paired data to train the masker makes accurately pinpointing factual errors within claims challenging. To mitigate this, we propose to improve FEC by Learning to Inject Factual Errors (LIFE), a three-step distantly supervised method: ‘mask-corrupt-correct’. |
Xingwei He; Qianru Zhang; A-Long Jin; Jun Ma; Yuan Yuan; Siu Ming Yiu; |
73 | SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This design suffers from a data leakage problem and lacks evaluation of subjective Q/A ability. In this paper, we propose SciEval, a comprehensive and multi-disciplinary evaluation benchmark to address these issues. |
Liangtai Sun; Yang Han; Zihan Zhao; Da Ma; Zhennan Shen; Baocai Chen; Lu Chen; Kai Yu; |
74 | Dual-Perspective Knowledge Enrichment for Semi-supervised 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, the loose consistency regularization in SESS and restricted pseudo-label selection strategy in 3DIoUMatch lead to either low-quality supervision or a limited amount of pseudo labels. To address these issues, we present a novel Dual-Perspective Knowledge Enrichment approach named DPKE for semi-supervised 3D object detection. |
Yucheng Han; Na Zhao; Weiling Chen; Keng Teck Ma; Hanwang Zhang; |
75 | Learning from Ambiguous Demonstrations with Self-Explanation Guided Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our work aims at efficiently leveraging ambiguous demonstrations for the training of a reinforcement learning (RL) agent. |
Yantian Zha; Lin Guan; Subbarao Kambhampati; |
76 | ‘Why Didn’t You Allocate This Task to Them?’ Negotiation-Aware Task Allocation and Contrastive Explanation Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we design an Artificially Intelligent Task Allocator (AITA) that proposes a task allocation for a team of humans. |
Zahra Zahedi; Sailik Sengupta; Subbarao Kambhampati; |
77 | Diverse and Aligned Audio-to-Video Generation Via Text-to-Video Model Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider the task of generating diverse and realistic videos guided by natural audio samples from a wide variety of semantic classes. |
Guy Yariv; Itai Gat; Sagie Benaim; Lior Wolf; Idan Schwartz; Yossi Adi; |
78 | OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Large language models (LLMs) with hundreds of billions of parameters require powerful server-grade GPUs for inference, limiting their practical deployment. To address this challenge, we introduce the outlier-aware weight quantization (OWQ) method, which aims to minimize LLM’s footprint through low-precision representation. |
Changhun Lee; Jungyu Jin; Taesu Kim; Hyungjun Kim; Eunhyeok Park; |
79 | Zhongjing: Enhancing The Chinese Medical Capabilities of Large Language Model Through Expert Feedback and Real-World Multi-Turn Dialogue Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce Zhongjing, the first Chinese medical LLaMA-based LLM that implements an entire training pipeline from continuous pre-training, SFT, to Reinforcement Learning from Human Feedback (RLHF). |
Songhua Yang; Hanjie Zhao; Senbin Zhu; Guangyu Zhou; Hongfei Xu; Yuxiang Jia; Hongying Zan; |
80 | LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, previous deep models merely focused on extracting the semantics of log sequences in the same domain, leading to poor generalization on multi-domain logs. To alleviate this issue, we propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains, where we establish a two-stage process including the pre-training and adapter-based tuning stage. |
Hongcheng Guo; Jian Yang; Jiaheng Liu; Jiaqi Bai; Boyang Wang; Zhoujun Li; Tieqiao Zheng; Bo Zhang; Junran Peng; Qi Tian; |
81 | Fluctuation-Based Adaptive Structured Pruning for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel retraining-free structured pruning framework for LLMs, named FLAP (FLuctuation-based Adaptive Structured Pruning). |
Yongqi An; Xu Zhao; Tao Yu; Ming Tang; Jinqiao Wang; |
82 | Improving Audio-Visual Segmentation with Bidirectional Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, the interconnections between different modalities tend to be overlooked in audio-visual modeling. In this paper, inspired by the human ability to mentally simulate the sound of an object and its visual appearance, we introduce a bidirectional generation framework. |
Dawei Hao; Yuxin Mao; Bowen He; Xiaodong Han; Yuchao Dai; Yiran Zhong; |
83 | DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency Via Efficient Data Sampling and Routing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we present DeepSpeed Data Efficiency, a framework that makes better use of data, increases training efficiency, and improves model quality. |
Conglong Li; Zhewei Yao; Xiaoxia Wu; Minjia Zhang; Connor Holmes; Cheng Li; Yuxiong He; |
84 | Class-Attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Without care, these heterogeneities impede the learning process, most notably, when optimizing fairness objectives. Confirming this, under a gaussian mixture setting, we show that the optimal SVM classifier for balanced accuracy needs to be adaptive to the class attributes. |
Xuechen Zhang; Mingchen Li; Jiasi Chen; Christos Thrampoulidis; Samet Oymak; |
85 | Generating Images of Rare Concepts Using Pre-trained Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Text-to-image diffusion models can synthesize high quality images, but they have various limitations. Here we highlight a common failure mode of these models, namely, generating uncommon concepts and structured concepts like hand palms. |
Dvir Samuel; Rami Ben-Ari; Simon Raviv; Nir Darshan; Gal Chechik; |
86 | KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Extending LLMs with multimodal capabilities is of recent interest, but it incurs computational cost and requires substantial hardware resources. To address these challenges, we propose KAM-CoT, a framework that integrates CoT reasoning, Knowledge Graphs (KGs), and multiple modalities for a comprehensive understanding of multimodal tasks. |
Debjyoti Mondal; Suraj Modi; Subhadarshi Panda; Rituraj Singh; Godawari Sudhakar Rao; |
87 | Towards Real-World Test-Time Adaptation: Tri-net Self-Training with Balanced Normalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we first complement the existing real-world TTA protocol with a globally class imbalanced testing set. |
Yongyi Su; Xun Xu; Kui Jia; |
88 | PreRoutGNN for Timing Prediction with Order Preserving Partition: Global Circuit Pre-training, Local Delay Learning and Attentional Cell Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it often suffers from signal decay and error accumulation due to the long timing paths in large-scale industrial circuits. To address these challenges, we propose a two-stage approach. |
Ruizhe Zhong; Junjie Ye; Zhentao Tang; Shixiong Kai; Mingxuan Yuan; Jianye Hao; Junchi Yan; |
89 | DreamStyler: Paint By Style Inversion with Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce DreamStyler, a novel framework designed for artistic image synthesis, proficient in both text-to-image synthesis and style transfer. |
Namhyuk Ahn; Junsoo Lee; Chunggi Lee; Kunhee Kim; Daesik Kim; Seung-Hun Nam; Kibeom Hong; |
90 | Polyper: Boundary Sensitive Polyp Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new boundary sensitive framework for polyp segmentation, termed Polyper. Our method is motivated by a clinical approach in which seasoned medical practitioners often leverage the inherent features of interior polyp regions to tackle blurred boundaries. Inspired by this, we propose to explicitly leverage boundary regions to bolster the model’s boundary discrimination capability while minimizing computational resource wastage. |
Hao Shao; Yang Zhang; Qibin Hou; |
91 | Comprehensive View Embedding Learning for Single-Cell Multimodal Integration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose CoVEL, a deep learning method for unsupervised integration of single-cell multimodal data. |
Zhenchao Tang; Jiehui Huang; Guanxing Chen; Calvin Yu-Chian Chen; |
92 | Frozen CLIP Transformer Is An Efficient Point Cloud Encoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Efficient Point Cloud Learning (EPCL), an effective and efficient point cloud learner for directly training high-quality point cloud models with a frozen CLIP transformer. |
Xiaoshui Huang; Zhou Huang; Sheng Li; Wentao Qu; Tong He; Yuenan Hou; Yifan Zuo; Wanli Ouyang; |
93 | ODTrack: Online Dense Temporal Token Learning for Visual Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Consequently, they can only interact independently within each image-pair and establish limited temporal correlations. To alleviate the above problem, we propose a simple, flexible and effective video-level tracking pipeline, named ODTrack, which densely associates the contextual relationships of video frames in an online token propagation manner. |
Yaozong Zheng; Bineng Zhong; Qihua Liang; Zhiyi Mo; Shengping Zhang; Xianxian Li; |
94 | Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Causal Bisimulation Modeling (CBM), a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction. |
Zizhao Wang; Caroline Wang; Xuesu Xiao; Yuke Zhu; Peter Stone; |
95 | Completing Priceable Committees: Utilitarian and Representation Guarantees for Proportional Multiwinner Voting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider the impact of imposing much more demanding proportionality axioms. |
Markus Brill; Jannik Peters; |
96 | Efficient Spiking Neural Networks with Sparse Selective Activation for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the selective sparse activation principle of context gating in biological systems, we present a novel SNN model with selective activation to achieve continual learning. |
Jiangrong Shen; Wenyao Ni; Qi Xu; Huajin Tang; |
97 | How to Protect Copyright Data in Optimization of Large Language Models? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we observe that large language model training and optimization can be seen as a softmax regression problem. |
Timothy Chu; Zhao Song; Chiwun Yang; |
98 | PointAttN: You Only Need Attention for Point Cloud Completion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To solve the problems, we leverage the cross-attention and self-attention mechanisms to design a novel neural network for point cloud completion with implicit local region partition. |
Jun Wang; Ying Cui; Dongyan Guo; Junxia Li; Qingshan Liu; Chunhua Shen; |
99 | TEILP: Time Prediction Over Knowledge Graphs Via Logical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they often fall short in capturing essential temporal relationships such as order and distance. In this paper, we propose TEILP, a logical reasoning framework that naturally integrates such temporal elements into knowledge graph predictions. |
Siheng Xiong; Yuan Yang; Ali Payani; James C Kerce; Faramarz Fekri; |
100 | Omni-Kernel Network for Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop an efficient convolutional network for image restoration by enhancing multi-scale representation learning. |
Yuning Cui; Wenqi Ren; Alois Knoll; |
101 | Retrieval-Augmented Primitive Representations for Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to explicitly retrieve knowledge of seen primitives for compositional zero-shot learning. |
Chenchen Jing; Yukun Li; Hao Chen; Chunhua Shen; |
102 | MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building upon MmAP, we develop an innovative multi-task prompt learning framework. |
Yi Xin; Junlong Du; Qiang Wang; Ke Yan; Shouhong Ding; |
103 | MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a Motion General-Purpose generaTor (MotionGPT) that can use multimodal control signals, e.g., text and single-frame poses, for generating consecutive human motions by treating multimodal signals as special input tokens in large language models (LLMs). |
Yaqi Zhang; Di Huang; Bin Liu; Shixiang Tang; Yan Lu; Lu Chen; Lei Bai; Qi Chu; Nenghai Yu; Wanli Ouyang; |
104 | ScanERU: Interactive 3D Visual Grounding Based on Embodied Reference Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite their effectiveness, existing methods suffer from the difficulty of low recognition accuracy in cases of multiple adjacent objects with similar appearance. To address this issue, this work intuitively introduces the human-robot interaction as a cue to facilitate the development of 3D visual grounding. |
Ziyang Lu; Yunqiang Pei; Guoqing Wang; Peiwei Li; Yang Yang; Yinjie Lei; Heng Tao Shen; |
105 | When Do Program-of-Thoughts Work for Reasoning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although there are effective methods like program-of-thought prompting for LLMs, which uses programming language to tackle complex reasoning tasks, the specific impact of code data on the improvement of reasoning capabilities remains under-explored. To address this gap, we propose the complexity-impacted reasoning score (CIRS), which combines structural and logical attributes, to measure the correlation between code and reasoning abilities. |
Zhen Bi; Ningyu Zhang; Yinuo Jiang; Shumin Deng; Guozhou Zheng; Huajun Chen; |
106 | Improving Transferability for Cross-Domain Trajectory Prediction Via Neural Stochastic Differential Equation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Consequently, the proficient performance of models trained on large-scale datasets has limited transferability on other small-size datasets, bounding the utilization of existing large-scale datasets. To address this limitation, we propose a method based on continuous and stochastic representations of Neural Stochastic Differential Equations (NSDE) for alleviating discrepancies due to data acquisition strategy. |
Daehee Park; Jaewoo Jeong; Kuk-Jin Yoon; |
107 | Semi-supervised 3D Object Detection with PatchTeacher and PillarMix Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose PatchTeacher, which focuses on partial scene 3D object detection to provide high-quality pseudo labels for the student. |
Xiaopei Wu; Liang Peng; Liang Xie; Yuenan Hou; Binbin Lin; Xiaoshui Huang; Haifeng Liu; Deng Cai; Wanli Ouyang; |
108 | Conformal Prediction Regions for Time Series Using Linear Complementarity Programming Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In fact, to obtain prediction regions over T time steps with confidence 1–delta, previous works require that each individual prediction region is valid with confidence 1–delta/T. We propose an optimization-based method for reducing this conservatism to enable long horizon planning and verification when using learning-enabled time series predictors. |
Matthew Cleaveland; Insup Lee; George J. Pappas; Lars Lindemann; |
109 | Geometric-Facilitated Denoising Diffusion Model for 3D Molecule Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In our research, we view the iterative way of updating molecule conformations in the diffusion process as consistent with molecular dynamics and introduce a novel molecule generation method named Geometric-Facilitated Molecular Diffusion (GFMDiff). |
Can Xu; Haosen Wang; Weigang Wang; Pengfei Zheng; Hongyang Chen; |
110 | Fine-Grained Knowledge Selection and Restoration for Non-exemplar Class Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This strict restriction enlarges the difficulty of alleviating catastrophic forgetting since all techniques can only be applied to current task data. Considering this challenge, we propose a novel framework of fine-grained knowledge selection and restoration. |
Jiang-Tian Zhai; Xialei Liu; Lu Yu; Ming-Ming Cheng; |
111 | Generalizable Sleep Staging Via Multi-Level Domain Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce domain generalization into automatic sleep staging and propose the task of generalizable sleep staging which aims to improve the model generalization ability to unseen datasets. |
Jiquan Wang; Sha Zhao; Haiteng Jiang; Shijian Li; Tao Li; Gang Pan; |
112 | A Non-parametric Graph Clustering Framework for Multi-View Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, in this article we aim to eliminate hyper-parameters, and devise a non-parametric graph clustering (NpGC) framework to partition multi-view data more practically. |
Shengju Yu; Siwei Wang; Zhibin Dong; Wenxuan Tu; Suyuan Liu; Zhao Lv; Pan Li; Miao Wang; En Zhu; |
113 | VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The performance of existing VLN methods is limited by insufficient diversity in navigation environments and limited training data. To address these issues, we propose VLN-Video, which utilizes the diverse outdoor environments present in driving videos in multiple cities in the U.S. augmented with automatically generated navigation instructions and actions to improve outdoor VLN performance. |
Jialu Li; Aishwarya Padmakumar; Gaurav Sukhatme; Mohit Bansal; |
114 | ConditionVideo: Training-Free Condition-Guided Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce ConditionVideo, a training-free approach to text-to-video generation based on the provided condition, video, and input text, by leveraging the power of off-the-shelf text-to-image generation methods (e.g., Stable Diffusion). |
Bo Peng; Xinyuan Chen; Yaohui Wang; Chaochao Lu; Yu Qiao; |
115 | Critic-Guided Decision Transformer for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Fortunately, value-based methods offer a solution by leveraging a value function to approximate the expected returns, thereby addressing the inconsistency effectively. Building upon these insights, we propose a novel approach, termed the Critic-Guided Decision Transformer (CGDT), which combines the predictability of long-term returns from value-based methods with the trajectory modeling capability of the Decision Transformer. |
Yuanfu Wang; Chao Yang; Ying Wen; Yu Liu; Yu Qiao; |
116 | GOODAT: Towards Test-Time Graph Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, this paper introduces a method to detect Graph Out-of-Distribution At Test-time (namely GOODAT), a data-centric, unsupervised, and plug-and-play solution that operates independently of training data and modifications of GNN architecture. |
Luzhi Wang; Dongxiao He; He Zhang; Yixin Liu; Wenjie Wang; Shirui Pan; Di Jin; Tat-Seng Chua; |
117 | Making Natural Language Reasoning Explainable and Faithful Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this talk, we will focus on (1) our design of leveraging structured information (that is grounded to the context), for the explainable complex question answering and reasoning; (2) our multi-module interpretable framework for inductive reasoning, which conducts step-wise faithful reasoning with iterative feedback. |
Xinya Du; |
118 | ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To quantify the ability of T2I models in learning and synthesizing novel visual concepts (a.k.a. personalized T2I), we introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts, and 33K composite text prompts. |
Maitreya Patel; Tejas Gokhale; Chitta Baral; Yezhou Yang; |
119 | A Dynamic Learning Method Towards Realistic Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Then the entire dataset is split into a training set and a test set, with the latter containing images of unseen concepts, unseen compositions, unseen domains as well as their combinations. Following this, we show that the visual-semantic relationship changes on unseen images, leading us to construct two dynamic modulators to adapt the visual features and composition prototypes in accordance with the input image. |
Xiaoming Hu; Zilei Wang; |
120 | Deciphering Compatibility Relationships with Textual Descriptions Via Extraction and Explanation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Present models, while making strides in this area, still occasionally fall short, offering explanations that can be elementary and repetitive. This work aims to address these shortcomings by introducing the Pair Fashion Explanation (PFE) dataset, a unique resource that has been curated to illuminate these compatibility relationships. |
Yu Wang; Zexue He; Zhankui He; Hao Xu; Julian McAuley; |
121 | UMIE: Unified Multimodal Information Extraction with Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current MIE methods often resort to using task-specific model structures, which results in limited generalizability across tasks and underutilizes shared knowledge across MIE tasks. To address these issues, we propose UMIE, a unified multimodal information extractor to unify three MIE tasks as a generation problem using instruction tuning, being able to effectively extract both textual and visual mentions. |
Lin Sun; Kai Zhang; Qingyuan Li; Renze Lou; |
122 | Variance-Insensitive and Target-Preserving Mask Refinement for Interactive Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs. |
Chaowei Fang; Ziyin Zhou; Junye Chen; Hanjing Su; Qingyao Wu; Guanbin Li; |
123 | Continual Relation Extraction Via Sequential Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Continual Relation Extraction via Sequential Multi-task Learning (CREST), a novel CRE approach built upon a tailored Multi-task Learning framework for continual learning. |
Thanh-Thien Le; Manh Nguyen; Tung Thanh Nguyen; Linh Ngo Van; Thien Huu Nguyen; |
124 | Mastering Context-to-Label Representation Transformation for Event Causality Identification with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nonetheless, a notable drawback of existing ECI models stems from their reliance on simple feed-forward networks to handle the complex context-to-label representation transformation process, which might require drastic changes in the representations to hinder the learning process. To overcome this issue, our work introduces a novel method for ECI where, instead of abrupt transformations, event context representations are gradually updated to achieve effective label representations. |
Hieu Man; Franck Dernoncourt; Thien Huu Nguyen; |
125 | Iterative Token Evaluation and Refinement for Real-World Super-resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an Iterative Token Evaluation and Refinement (ITER) framework for RWSR, which utilizes a discrete diffusion model operating in the discrete token representation space, i.e., indexes of features extracted from a VQGAN codebook pre-trained with high-quality (HQ) images. |
Chaofeng Chen; Shangchen Zhou; Liang Liao; Haoning Wu; Wenxiu Sun; Qiong Yan; Weisi Lin; |
126 | HGPrompt: Bridging Homogeneous and Heterogeneous Graphs for Few-Shot Prompt Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose HGPrompt, a novel pre-training and prompting framework to unify not only pre-training and downstream tasks but also homogeneous and heterogeneous graphs via a dual-template design. |
Xingtong Yu; Yuan Fang; Zemin Liu; Xinming Zhang; |
127 | Federated Graph Learning Under Domain Shift with Generalizable Prototypes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to improve the performance of the global model from different perspectives, we propose a novel framework called Federated Graph Learning with Generalizable Prototypes (FGGP). |
Guancheng Wan; Wenke Huang; Mang Ye; |
128 | LLMEval: A Preliminary Study on How to Evaluate Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze evaluation methods by comparing various criteria with both manual and automatic evaluation, utilizing onsite, crowd-sourcing, public annotators and GPT-4, with different scoring methods and ranking systems. |
Yue Zhang; Ming Zhang; Haipeng Yuan; Shichun Liu; Yongyao Shi; Tao Gui; Qi Zhang; Xuanjing Huang; |
129 | A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, from a novel perspective, we systematically study the challenges that remain in O2O RL and identify that the reason behind the slow improvement of the performance and the instability of online finetuning lies in the inaccurate Q-value estimation inherited from offline pretraining. |
Yinmin Zhang; Jie Liu; Chuming Li; Yazhe Niu; Yaodong Yang; Yu Liu; Wanli Ouyang; |
130 | Modeling Continuous Motion for 3D Point Cloud Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches have primarily relied on appearance matching or motion modeling within only two successive frames, thereby overlooking the long-range continuous motion property of objects in 3D space. To address this issue, this paper presents a novel approach that views each tracklet as a continuous stream: at each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank, enabling efficient exploitation of sequential information. |
Zhipeng Luo; Gongjie Zhang; Changqing Zhou; Zhonghua Wu; Qingyi Tao; Lewei Lu; Shijian Lu; |
131 | Brush Your Text: Synthesize Any Scene Text on Images Via Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recently, diffusion-based image generation methods are credited for their remarkable text-to-image generation capabilities, while still facing challenges in accurately generating multilingual scene text images. To tackle this problem, we propose Diff-Text, which is a training-free scene text generation framework for any language. |
Lingjun Zhang; Xinyuan Chen; Yaohui Wang; Yue Lu; Yu Qiao; |
132 | UCMCTrack: Multi-Object Tracking with Uniform Camera Motion Compensation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While these strategies are effective, they also introduce a considerable computational burden, posing challenges for real-time MOT. In response to this, we introduce UCMCTrack, a novel motion model-based tracker robust to camera movements. |
Kefu Yi; Kai Luo; Xiaolei Luo; Jiangui Huang; Hao Wu; Rongdong Hu; Wei Hao; |
133 | Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Robustly cooperating with unseen agents and human partners presents significant challenges due to the diverse cooperative conventions these partners may adopt. Existing Ad Hoc Teamwork (AHT) methods address this challenge by training an agent with a population of diverse teammate policies obtained through maximizing specific diversity metrics. |
Muhammad Rahman; Jiaxun Cui; Peter Stone; |
134 | Reward (Mis)design for Autonomous Driving (Abstract Reprint) Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article considers the problem of diagnosing certain common errors in reward design. |
W. Bradley Knox; Alessandro Allievi; Holger Banzhaf; Felix Schmitt; Peter Stone; |
135 | Temporally and Distributionally Robust Optimization for Cold-Start Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nonetheless, existing DRO methods face an inconsistency issue: the worst-case warm-start items emphasized during DRO training might not align well with the cold-start item distribution. To capture the temporal feature shifts and combat this inconsistency issue, we propose a novel temporal DRO with new optimization objectives, namely, 1) to integrate a worst-case factor to improve the worst-case performance, and 2) to devise a shifting factor to capture the shifting trend of item features and enhance the optimization of the potentially popular groups in cold-start items. |
Xinyu Lin; Wenjie Wang; Jujia Zhao; Yongqi Li; Fuli Feng; Tat-Seng Chua; |
136 | FMRNet: Image Deraining Via Frequency Mutual Revision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to investigate the potential relationships among rain-free and residue components at the frequency domain, forming a frequency mutual revision network (FMRNet) for image deraining. |
Kui Jiang; Junjun Jiang; Xianming Liu; Xin Xu; Xianzheng Ma; |
137 | FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose FM-OV3D, a method of Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection, which improves the open-vocabulary localization and recognition abilities of 3D model by blending knowledge from multiple pre-trained foundation models, achieving true open-vocabulary without facing constraints from original 3D datasets. |
Dongmei Zhang; Chang Li; Renrui Zhang; Shenghao Xie; Wei Xue; Xiaodong Xie; Shanghang Zhang; |
138 | MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study introduces an efficient and effective method, MeDM, that utilizes pre-trained image Diffusion Models for video-to-video translation with consistent temporal flow. |
Ernie Chu; Tzuhsuan Huang; Shuo-Yen Lin; Jun-Cheng Chen; |
139 | Boosting Residual Networks with Group Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this manuscript, we mitigate the significant knowledge distillation gap caused by using the same kind of supervision and advocate leveraging the subnets to provide diverse knowledge. |
Shengji Tang; Peng Ye; Baopu Li; Weihao Lin; Tao Chen; Tong He; Chong Yu; Wanli Ouyang; |
140 | Online Boosting Adaptive Learning Under Concept Drift for Multistream Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the growing research outcomes in this area, there has been a notable oversight regarding the temporal dynamic relationships between these streams, leading to the issue of negative transfer arising from irrelevant data. In this paper, we propose a novel Online Boosting Adaptive Learning (OBAL) method that effectively addresses this limitation by adaptively learning the dynamic correlation among different streams. |
En Yu; Jie Lu; Bin Zhang; Guangquan Zhang; |
141 | A Diffusion Model with State Estimation for Degradation-Blind Inverse Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a learnable state-estimator-based diffusion model to incorporate the measurements into the reconstruction process. |
Liya Ji; Zhefan Rao; Sinno Jialin Pan; Chenyang Lei; Qifeng Chen; |
142 | A Dynamic GCN with Cross-Representation Distillation for Event-Based Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the second problem, we introduce a novel learning framework called cross-representation distillation (CRD), which leverages the dense representation of events as a cross-representation auxiliary to provide additional supervision and prior knowledge for the event graph. |
Yongjian Deng; Hao Chen; Youfu Li; |
143 | Everything2Motion: Synchronizing Diverse Inputs Via A Unified Framework for Human Motion Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional methodologies, typically reliant on single modality inputs like text or audio, employ modality-specific model frameworks, posing challenges for unified model deployment and application. To address this, we propose Everything2Motion, a unified model framework. |
Zhaoxin Fan; Longbin Ji; Pengxin Xu; Fan Shen; Kai Chen; |
144 | PoetryDiffusion: Towards Joint Semantic and Metrical Manipulation in Poetry Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While prior works succeeded in controlling either semantic or metrical aspects of poetry generation, simultaneously addressing both remains a challenge. In this paper, we pioneer the use of the Diffusion model for generating sonnets and Chinese SongCi poetry to tackle such challenges. |
Zhiyuan Hu; Chumin Liu; Yue Feng; Anh Tuan Luu; Bryan Hooi; |
145 | T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning Via Large Language Model Signals for Science Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, the annotated rationales are hardly accurate due to the external essential information missed. To address these issues, we propose a novel method termed T-SciQ that aims at teaching science question answering with LLM signals. |
Lei Wang; Yi Hu; Jiabang He; Xing Xu; Ning Liu; Hui Liu; Heng Tao Shen; |
146 | Sample-and-Bound for Non-convex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose new sampling-based methods for non-convex optimization that adapt Monte Carlo Tree Search (MCTS) to improve efficiency. |
Yaoguang Zhai; Zhizhen Qin; Sicun Gao; |
147 | Optical Flow for Spike Camera with Hierarchical Spatial-Temporal Spike Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Hierarchical Spatial-Temporal (HiST) fusion module for spike representation to pursue reliable feature matching and develop a robust optical flow network, dubbed as HiST-SFlow. |
Rui Zhao; Ruiqin Xiong; Jian Zhang; Xinfeng Zhang; Zhaofei Yu; Tiejun Huang; |
148 | PNeRFLoc: Visual Localization with Point-Based Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel visual localization framework, i.e., PNeRFLoc, based on a unified point-based representation. |
Boming Zhao; Luwei Yang; Mao Mao; Hujun Bao; Zhaopeng Cui; |
149 | Negative Pre-aware for Noisy Cross-Modal Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel Negative Pre-aware Cross-modal (NPC) matching solution for large visual-language model fine-tuning on noisy downstream tasks. |
Xu Zhang; Hao Li; Mang Ye; |
150 | TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve data efficiency in VLP, we propose Text-aware Image Mixing (TiMix), which integrates mix-based data augmentation techniques into SMCL, yielding significant performance improvements without significantly increasing computational overhead. |
Chaoya Jiang; Wei Ye; Haiyang Xu; Qinghao Ye; Ming Yan; Ji Zhang; Shikun Zhang; |
151 | MusER: Musical Element-Based Regularization for Generating Symbolic Music with Emotion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, prior research on deep learning-based emotional music generation has rarely explored the contribution of different musical elements to emotions, let alone the deliberate manipulation of these elements to alter the emotion of music, which is not conducive to fine-grained element-level control over emotions. To address this gap, we present a novel approach employing musical element-based regularization in the latent space to disentangle distinct elements, investigate their roles in distinguishing emotions, and further manipulate elements to alter musical emotions. |
Shulei Ji; Xinyu Yang; |
152 | Summarizing Stream Data for Memory-Constrained Online Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose to Summarize the knowledge from the Stream Data (SSD) into more informative samples by distilling the training characteristics of real images. |
Jianyang Gu; Kai Wang; Wei Jiang; Yang You; |
153 | On The Affinity, Rationality, and Diversity of Hierarchical Topic Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing work struggles with producing topic hierarchies of low affinity, rationality, and diversity, which hampers document understanding. To overcome these challenges, we in this paper propose Transport Plan and Context-aware Hierarchical Topic Model (TraCo). |
Xiaobao Wu; Fengjun Pan; Thong Nguyen; Yichao Feng; Chaoqun Liu; Cong-Duy Nguyen; Anh Tuan Luu; |
154 | TF-CLIP: Learning Text-Free CLIP for Video-Based Person Re-identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In addition, there is a lack of decent text descriptions in current ReID benchmarks. To address these issues, in this work, we propose a novel one-stage text-free CLIP-based learning framework named TF-CLIP for video-based person ReID. |
Chenyang Yu; Xuehu Liu; Yingquan Wang; Pingping Zhang; Huchuan Lu; |
155 | Learning Fair Policies for Multi-Stage Selection Problems from Observational Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a multi-stage framework that can be augmented with various fairness constraints, such as demographic parity or equal opportunity. |
Zhuangzhuang Jia; Grani A. Hanasusanto; Phebe Vayanos; Weijun Xie; |
156 | ProCC: Progressive Cross-Primitive Compatibility for Open-World Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we revisit the primitive prediction approach and propose a novel method, termed Progressive Cross-primitive Compatibility (ProCC), to mimic the human learning process for OW-CZSL tasks. |
Fushuo Huo; Wenchao Xu; Song Guo; Jingcai Guo; Haozhao Wang; Ziming Liu; Xiaocheng Lu; |
157 | Label-Efficient Few-Shot Semantic Segmentation with Unsupervised Meta-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this paper is to alleviate the training cost for few-shot semantic segmentation (FSS) models. |
Jianwu Li; Kaiyue Shi; Guo-Sen Xie; Xiaofeng Liu; Jian Zhang; Tianfei Zhou; |
158 | Learning Subject-Aware Cropping By Outpainting Professional Photos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a weakly-supervised approach (GenCrop) to learn what makes a high-quality, subject-aware crop from professional stock images. |
James Hong; Lu Yuan; Michaël Gharbi; Matthew Fisher; Kayvon Fatahalian; |
159 | FAVOR: Full-Body AR-Driven Virtual Object Rearrangement Guided By Instruction Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Bridging a gap in the field, our study introduces FAVOR: a novel dataset for Full-body AR-driven Virtual Object Rearrangement that uniquely employs motion capture systems and AR eyeglasses. |
Kailin Li; Lixin Yang; Zenan Lin; Jian Xu; Xinyu Zhan; Yifei Zhao; Pengxiang Zhu; Wenxiong Kang; Kejian Wu; Cewu Lu; |
160 | Delving Into Multimodal Prompting for Fine-Grained Visual Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to fully exploit the capabilities of cross-modal description to tackle FGVC tasks and propose a novel multimodal prompting solution, denoted as MP-FGVC, based on the contrastive language-image pretraining (CLIP) model. |
Xin Jiang; Hao Tang; Junyao Gao; Xiaoyu Du; Shengfeng He; Zechao Li; |
161 | Adaptive Uncertainty-Based Learning for Text-Based Person Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel framework termed Adaptive Uncertainty-based Learning (AUL) for text-based person retrieval from the uncertainty perspective. |
Shenshen Li; Chen He; Xing Xu; Fumin Shen; Yang Yang; Heng Tao Shen; |
162 | LION: Implicit Vision Prompt Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a light-weight prompt framework named impLicit vIsion prOmpt tuNing (LION), which is motivated by deep implicit models with stable low memory costs for various complex tasks. |
Haixin Wang; Jianlong Chang; Yihang Zhai; Xiao Luo; Jinan Sun; Zhouchen Lin; Qi Tian; |
163 | Decoding AI’s Nudge: A Unified Framework to Predict Human Behavior in AI-Assisted Decision Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a computational framework that can provide an interpretable characterization of the influence of different forms of AI assistance on decision makers in AI-assisted decision making. |
Zhuoyan Li; Zhuoran Lu; Ming Yin; |
164 | Towards Compact 3D Representations Via Point Feature Enhancement Masked Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To learn compact 3D representations, we propose a simple yet effective Point Feature Enhancement Masked Autoencoders (Point-FEMAE), which mainly consists of a global branch and a local branch to capture latent semantic features. |
Yaohua Zha; Huizhen Ji; Jinmin Li; Rongsheng Li; Tao Dai; Bin Chen; Zhi Wang; Shu-Tao Xia; |
165 | Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models Through Intervention Without Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Truth Forest, a method that enhances truthfulness in LLMs by uncovering hidden truth representations using multi-dimensional orthogonal probes. |
Zhongzhi Chen; Xingwu Sun; Xianfeng Jiao; Fengzong Lian; Zhanhui Kang; Di Wang; Chengzhong Xu; |
166 | PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction (PSC-CPI), which captures the dependencies between protein sequences and structures through both intra-modality and cross-modality contrasting. |
Lirong Wu; Yufei Huang; Cheng Tan; Zhangyang Gao; Bozhen Hu; Haitao Lin; Zicheng Liu; Stan Z. Li; |
167 | HDMixer: Hierarchical Dependency with Extendable Patch for Multivariate Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, existing methods mainly focus on modeling long-term dependencies across patches, while paying little attention to other dimensions (e.g., short-term dependencies within patches and complex interactions among cross-variable patches). To address these challenges, we propose a pure MLP-based HDMixer, aiming to acquire patches with richer semantic information and efficiently model hierarchical interactions. |
Qihe Huang; Lei Shen; Ruixin Zhang; Jiahuan Cheng; Shouhong Ding; Zhengyang Zhou; Yang Wang; |
168 | DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose DI-V2X, that aims to learn Domain-Invariant representations through a new distillation framework to mitigate the domain discrepancy in the context of V2X 3D object detection. |
Xiang Li; Junbo Yin; Wei Li; Chengzhong Xu; Ruigang Yang; Jianbing Shen; |
169 | Embracing Language Inclusivity and Diversity in CLIP Through Continual Language Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, there is an increasing interest in developing multilingual VL models via a joint-learning setup, which, however, could be unrealistic due to expensive costs and data availability. In this work, we propose to extend VL-PTMs’ language capacity by continual language learning (CLL), where a model needs to update its linguistic knowledge incrementally without suffering from catastrophic forgetting (CF). |
Bang Yang; Yong Dai; Xuxin Cheng; Yaowei Li; Asif Raza; Yuexian Zou; |
170 | UniAP: Towards Universal Animal Perception in Vision Via Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce UniAP, a novel Universal Animal Perception model that leverages few-shot learning to enable cross-species perception among various visual tasks. |
Meiqi Sun; Zhonghan Zhao; Wenhao Chai; Hanjun Luo; Shidong Cao; Yanting Zhang; Jenq-Neng Hwang; Gaoang Wang; |
171 | SimDistill: Simulated Multi-Modal Distillation for BEV 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although distilling precise 3D geometry knowledge from LiDAR data could help tackle this challenge, the benefits of LiDAR information could be greatly hindered by the significant modality gap between different sensory modalities. To address this issue, we propose a Simulated multi-modal Distillation (SimDistill) method by carefully crafting the model architecture and distillation strategy. |
Haimei Zhao; Qiming Zhang; Shanshan Zhao; Zhe Chen; Jing Zhang; Dacheng Tao; |
172 | Unifying Multi-Modal Uncertainty Modeling and Semantic Alignment for Text-to-Image Person Re-identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods parameterize the image/text instances as deterministic embeddings and do not explicitly consider the inherent uncertainty in pedestrian images and their textual descriptions, leading to limited image-text relationship expression and semantic alignment. To address the above problem, in this paper, we propose a novel method that unifies multi-modal uncertainty modeling and semantic alignment for TI-ReID. |
Zhiwei Zhao; Bin Liu; Yan Lu; Qi Chu; Nenghai Yu; |
173 | Progressive Feature Self-Reinforcement for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A typical manifestation is the diminished precision on object boundaries, leading to deteriorated accuracy of WSSS. To alleviate this issue, we propose to adaptively partition the image content into certain regions (e.g., confident foreground and background) and uncertain regions (e.g., object boundaries and misclassified categories) for separate processing. |
Jingxuan He; Lechao Cheng; Chaowei Fang; Zunlei Feng; Tingting Mu; Mingli Song; |
174 | One-Step Forward and Backtrack: Overcoming Zig-Zagging in Loss-Aware Quantization Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, we discover that the gradient error will lead to an unexpected zig-zagging-like issue in the gradient descent learning procedures, where the gradient directions rapidly oscillate or zig-zag, and such issue seriously slows down the model convergence. Accordingly, this paper proposes a one-step forward and backtrack way for loss-aware quantization to get more accurate and stable gradient direction to defy this issue. |
Lianbo Ma; Yuee Zhou; Jianlun Ma; Guo Yu; Qing Li; |
175 | Adaptive Graph Learning for Multimodal Conversational Emotion Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, challenges arise in addressing cross-modal interactions that involve content with conflicting emotions across different modalities. To address this issue, we introduce an adaptive interactive graph network (IGN) called AdaIGN that employs the Gumbel Softmax trick to adaptively select nodes and edges, enhancing intra- and cross-modal interactions. |
Geng Tu; Tian Xie; Bin Liang; Hongpeng Wang; Ruifeng Xu; |
176 | Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we design a novel two-stage training scheme that can utilize easily obtained datasets (i.e., image pose pair and pose-free video) and the pre-trained text-to-image (T2I) model to obtain the pose-controllable character videos. |
Yue Ma; Yingqing He; Xiaodong Cun; Xintao Wang; Siran Chen; Xiu Li; Qifeng Chen; |
177 | Graph Neural Prompting with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Graph Neural Prompting (GNP), a novel plug-and-play method to assist pre-trained LLMs in learning beneficial knowledge from KGs. |
Yijun Tian; Huan Song; Zichen Wang; Haozhu Wang; Ziqing Hu; Fang Wang; Nitesh V. Chawla; Panpan Xu; |
178 | Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we proposed a dynamic semantic-based graph convolution network (DS-GCN) for skeleton-based human action recognition, where the joints and edge types were encoded in the skeleton topology in an implicit way. |
Jianyang Xie; Yanda Meng; Yitian Zhao; Anh Nguyen; Xiaoyun Yang; Yalin Zheng; |
179 | Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The existing adaptation methods do not consider the missing knowledge, which may lead to crucial task-related knowledge for the downstream tasks being ignored. To address this issue, we propose a new adaptation framework called Data Adaptive Traceback (DAT). |
Wenshuo Peng; Kaipeng Zhang; Yue Yang; Hao Zhang; Yu Qiao; |
180 | Improving Expressive Power of Spectral Graph Neural Networks with Eigenvalue Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, we theoretically establish that the number of distinguishable eigenvalues plays a pivotal role in determining the expressive power of spectral graph neural networks. In light of this observation, we propose an eigenvalue correction strategy that can free polynomial filters from the constraints of repeated eigenvalue inputs. |
Kangkang Lu; Yanhua Yu; Hao Fei; Xuan Li; Zixuan Yang; Zirui Guo; Meiyu Liang; Mengran Yin; Tat-Seng Chua; |
181 | DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose DeepAccident, a large-scale dataset generated via a realistic simulator containing diverse accident scenarios that frequently occur in real-world driving. |
Tianqi Wang; Sukmin Kim; Ji Wenxuan; Enze Xie; Chongjian Ge; Junsong Chen; Zhenguo Li; Ping Luo; |
182 | SimCalib: Graph Neural Network Calibration Based on Similarity Between Nodes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we shed light on the relationship between GNN calibration and nodewise similarity via theoretical analysis. |
Boshi Tang; Zhiyong Wu; Xixin Wu; Qiaochu Huang; Jun Chen; Shun Lei; Helen Meng; |
183 | Attribute-Missing Graph Clustering Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, they are not tailored for clustering tasks, leading to inferior clustering results. To solve these issues, we propose a novel Attribute-Missing Graph Clustering (AMGC) method to alternately promote clustering and imputation in a unified framework, where we iteratively produce the clustering-enhanced nearest neighbor information to conduct the data imputation process and utilize the imputed information to implicitly refine the clustering distribution through model optimization. |
Wenxuan Tu; Renxiang Guan; Sihang Zhou; Chuan Ma; Xin Peng; Zhiping Cai; Zhe Liu; Jieren Cheng; Xinwang Liu; |
184 | AQ-DETR: Low-Bit Quantized Detection Transformer with Auxiliary Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new quantization strategy based on Auxiliary Queries for DETR (AQ-DETR), aiming to enhance the capacity of quantized queries. |
Runqi Wang; Huixin Sun; Linlin Yang; Shaohui Lin; Chuanjian Liu; Yan Gao; Yao Hu; Baochang Zhang; |
185 | HISR: Hybrid Implicit Surface Representation for Photorealistic 3D Human Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing approaches, however, either represent objects as implicit surface functions or neural volumes and still struggle to recover shapes with heterogeneous materials, in particular human skin, hair or clothes. To this aim, we present a new hybrid implicit surface representation to model human shapes. |
Angtian Wang; Yuanlu Xu; Nikolaos Sarafianos; Robert Maier; Edmond Boyer; Alan Yuille; Tony Tung; |
186 | Cached Transformers: Improving Transformers with Differentiable Memory Cache Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces a new Transformer model called Cached Transformer, which uses Gated Recurrent Cached (GRC) attention to extend the self-attention mechanism with a differentiable memory cache of tokens. |
Zhaoyang Zhang; Wenqi Shao; Yixiao Ge; Xiaogang Wang; Jinwei Gu; Ping Luo; |
187 | PVALane: Prior-Guided 3D Lane Detection with View-Agnostic Feature Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel prior-guided perspective on lane detection and propose an end-to-end framework named PVALane, which utilizes 2D prior knowledge to achieve precise and efficient 3D lane detection. |
Zewen Zheng; Xuemin Zhang; Yongqiang Mou; Xiang Gao; Chengxin Li; Guoheng Huang; Chi-Man Pun; Xiaochen Yuan; |
188 | Boosting Few-Shot Learning Via Attentive Feature Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this mixing operation weakens the feature representation due to the linear interpolation and the overlooking of the importance of specific channels. To solve these issues, this paper proposes attentive feature regularization (AFR) which aims to improve the feature representativeness and discriminability. |
Xingyu Zhu; Shuo Wang; Jinda Lu; Yanbin Hao; Haifeng Liu; Xiangnan He; |
189 | Exploring Channel-Aware Typical Features for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, exploring the channel-aware typical features is crucial to better separating ID and OOD data. Driven by this insight, we propose expLoring channel-Aware tyPical featureS (LAPS). |
Rundong He; Yue Yuan; Zhongyi Han; Fan Wang; Wan Su; Yilong Yin; Tongliang Liu; Yongshun Gong; |
190 | Electron Microscopy Images As Set of Fragments for Mitochondrial Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we design a coherent fragment vision transformer (FragViT) combined with affinity learning to manipulate features on 3D fragments yet explore mutual relationships to model fragment-wise context, enjoying locality prior without sacrificing global reception. |
Naisong Luo; Rui Sun; Yuwen Pan; Tianzhu Zhang; Feng Wu; |
191 | Fact-Driven Logical Reasoning for Machine Reading Comprehension Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose a general formalism of knowledge units by extracting backbone constituents of the sentence, such as the subject-verb-object formed “facts”. |
Siru Ouyang; Zhuosheng Zhang; Hai Zhao; |
192 | Hierarchical Multi-Marginal Optimal Transport for Network Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite great success in aligning networks in pairs, the literature on multi-network alignment is sparse due to the exponentially growing solution space and lack of high-order discrepancy measures. To fill this gap, we propose a hierarchical multi-marginal optimal transport framework named HOT for multi-network alignment. |
Zhichen Zeng; Boxin Du; Si Zhang; Yinglong Xia; Zhining Liu; Hanghang Tong; |
192 | TD²-Net: Toward Denoising and Debiasing for Video Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, the distribution of relationships exhibits a long-tailed pattern. To address the above problems, in this paper, we introduce a network named TD²-Net that aims at denoising and debiasing for dynamic SGG. |
Xin Lin; Chong Shi; Yibing Zhan; Zuopeng Yang; Yaqi Wu; Dacheng Tao; |
194 | SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to inferior generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline. To address these problems, we introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to effectively integrate surgical-specific information with SAM’s pre-trained knowledge for improved generalisation. |
Wenxi Yue; Jing Zhang; Kun Hu; Yong Xia; Jiebo Luo; Zhiyong Wang; |
195 | DeS3: Adaptive Attention-Driven Self and Soft Shadow Removal Using ViT Similarity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present DeS3, a method that removes hard, soft and self shadows based on adaptive attention and ViT similarity. |
Yeying Jin; Wei Ye; Wenhan Yang; Yuan Yuan; Robby T. Tan; |
196 | Neural Embeddings for KNN Search in Biological Sequence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Bio-kNN: a kNN search framework for biological sequences. |
Zhihao Chang; Linzhu Yu; Yanchao Xu; Wentao Hu; |
197 | Efficient Lightweight Image Denoising with Triple Attention Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Lightweight Image Denoising Transformer method (LIDFormer) based on Triple Multi-Dconv Head Transposed Attention (TMDTA) to boost computational efficiency. |
Yubo Zhou; Jin Lin; Fangchen Ye; Yanyun Qu; Yuan Xie; |
198 | On The Actionability of Outcome Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through a simple model encompassing actions, latent states, and measurements, we demonstrate that pure outcome prediction rarely results in the most effective policy for taking actions, even when combined with other measurements. |
Lydia T. Liu; Solon Barocas; Jon Kleinberg; Karen Levy; |
199 | SyFormer: Structure-Guided Synergism Transformer for Large-Portion Image Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenges, we propose a novel large-portion image inpainting approach, namely the Structure-Guided Synergism Transformer (SyFormer), to rectify the discrepancies in feature representation and enrich the structural cues from limited reference. |
Jie Wu; Yuchao Feng; Honghui Xu; Chuanmeng Zhu; Jianwei Zheng; |
200 | M2SD: Multiple Mixing Self-Distillation for Few-Shot Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This presents challenges in updating the model with new classes using limited training data, particularly in balancing acquiring new knowledge while retaining the old. We propose a novel method named Multiple Mixing Self-Distillation (M2SD) during the training phase to address these issues. |
Jinhao Lin; Ziheng Wu; Weifeng Lin; Jun Huang; RongHua Luo; |
201 | ChromaFusionNet (CFNet): Natural Fusion of Fine-Grained Color Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods, including color style transfer and image harmonization, exhibit inconsistencies, especially at boundary regions. Addressing this, we present ChromaFusionNet (CFNet), a novel approach that views the color fusion problem through the lens of image color inpainting. |
Yi Dong; Yuxi Wang; Ruoxi Fan; Wenqi Ouyang; Zhiqi Shen; Peiran Ren; Xuansong Xie; |
202 | Auto-Prox: Training-Free Vision Transformer Architecture Search Via Automatic Proxy Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Auto-Prox, an automatic proxy discovery framework, to address the problem. |
Zimian Wei; Peijie Dong; Zheng Hui; Anggeng Li; Lujun Li; Menglong Lu; Hengyue Pan; Dongsheng Li; |
203 | TexFit: Text-Driven Fashion Image Editing with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose TexFit, a Text-driven Fashion image Editing method using diffusion models, which performs the local image editing only with the easily accessible text. |
Tongxin Wang; Mang Ye; |
204 | Editing Language Model-Based Knowledge Graph Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, language model-based KG embeddings are usually deployed as static artifacts, making them difficult to modify post-deployment without re-training. To address this issue, we propose a new task of editing language model-based KG embeddings in this paper. |
Siyuan Cheng; Ningyu Zhang; Bozhong Tian; Xi Chen; Qingbin Liu; Huajun Chen; |
205 | Talk Funny! A Large-Scale Humor Response Dataset with Chain-of-Humor Interpretation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we construct the largest Chinese Explainable Humor Response Dataset to date with chain-of-humor and humor mind map annotations, which can be used to comprehensively evaluate as well as improve the humorous response ability of PLMs. |
Yuyan Chen; Yichen Yuan; Panjun Liu; Dayiheng Liu; Qinghao Guan; Mengfei Guo; Haiming Peng; Bang Liu; Zhixu Li; Yanghua Xiao; |
206 | Batch Normalization Is Blind to The First and Second Derivatives of The Loss Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove that when we do the Taylor series expansion of the loss function, the BN operation will block the influence of the first-order term and most influence of the second-order term of the loss. |
Zhanpeng Zhou; Wen Shen; Huixin Chen; Ling Tang; Yuefeng Chen; Quanshi Zhang; |
207 | Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees (Abstract Reprint) Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Sim-to-Lab-to-Real to bridge the reality gap with a probabilistically guaranteed safety-aware policy distribution. |
Kai-Chieh Hsu; Allen Z. Ren; Duy P. Nguyen; Anirudha Majumdar; Jaime F. Fisac; |
208 | Null Space Matters: Range-Null Decomposition for Consistent Multi-Contrast MRI Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a range-null decomposition-assisted DUN architecture to ensure consistency while still providing desirable interpretability. |
Jiacheng Chen; Jiawei Jiang; Fei Wu; Jianwei Zheng; |
209 | WaveNet: Tackling Non-stationary Graph Signals Via Graph Spectral Wavelets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, we also find that even increasing the polynomial order does not change this situation, which means polynomial-based models have a natural deficiency when facing high-frequency signals. To tackle these problems, we propose WaveNet, which aims to effectively capture the high-frequency part of the graph spectral signal from the perspective of wavelet bases through reconstructing the message propagation matrix. |
Zhirui Yang; Yulan Hu; Sheng Ouyang; Jingyu Liu; Shuqiang Wang; Xibo Ma; Wenhan Wang; Hanjing Su; Yong Liu; |
210 | Adaptive Discovering and Merging for Incremental Novel Class Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce a new paradigm called Adaptive Discovering and Merging (ADM) to discover novel categories adaptively in the incremental stage and integrate novel knowledge into the model without affecting the original knowledge. |
Guangyao Chen; Peixi Peng; Yangru Huang; Mengyue Geng; Yonghong Tian; |
211 | FedDiv: Collaborative Noise Filtering for Federated Learning with Noisy Labels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present FedDiv to tackle the challenges of F-LNL. |
Jichang Li; Guanbin Li; Hui Cheng; Zicheng Liao; Yizhou Yu; |
212 | AVSegFormer: Audio-Visual Segmentation with Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose AVSegFormer, a novel framework for AVS that leverages the transformer architecture. |
Shengyi Gao; Zhe Chen; Guo Chen; Wenhai Wang; Tong Lu; |
213 | Generative Model for Decision Trees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, optimal decision tree algorithms attempt to create an entire decision tree at once to achieve global optimality. We place our proposal between these approaches by designing a generative model for decision trees. |
Riccardo Guidotti; Anna Monreale; Mattia Setzu; Giulia Volpi; |
214 | Multi-Step Denoising Scheduled Sampling: Towards Alleviating Exposure Bias for Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To our knowledge, few works have tried to tackle this issue by modifying the training process for DDPMs, but they still perform unsatisfactorily due to 1) partially modeling the discrepancy and 2) ignoring the prediction error accumulation. To address the above issues, in this paper, we propose a multi-step denoising scheduled sampling (MDSS) strategy to alleviate the exposure bias for DDPMs. |
Zhiyao Ren; Yibing Zhan; Liang Ding; Gaoang Wang; Chaoyue Wang; Zhongyi Fan; Dacheng Tao; |
215 | GLOP: Learning Global Partition and Local Construction for Solving Large-Scale Routing Problems in Real-Time Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes GLOP (Global and Local Optimization Policies), a unified hierarchical framework that efficiently scales toward large-scale routing problems. |
Haoran Ye; Jiarui Wang; Helan Liang; Zhiguang Cao; Yong Li; Fanzhang Li; |
216 | Semantic-Aware Autoregressive Image Modeling for Visual Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is mainly caused by the challenge that images are not sequential signals and lack a natural order when applying autoregressive modeling. In this study, inspired by human beings’ way of grasping an image, i.e., focusing on the main object first, we present a semantic-aware autoregressive image modeling (SemAIM) method to tackle this challenge. |
Kaiyou Song; Shan Zhang; Tong Wang; |
217 | Transient Glimpses: Unveiling Occluded Backgrounds Through The Spike Camera Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a novel approach that utilizes a single spike camera for continuous multi-view imaging to address occlusion removal. |
Jiyuan Zhang; Shiyan Chen; Yajing Zheng; Zhaofei Yu; Tiejun Huang; |
218 | Frequency-Adaptive Pan-Sharpening with Mixture of Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel Frequency Adaptive Mixture of Experts (FAME) learning framework for pan-sharpening, which consists of three key components: the Adaptive Frequency Separation Prediction Module, the Sub-Frequency Learning Expert Module, and the Expert Mixture Module. |
Xuanhua He; Keyu Yan; Rui Li; Chengjun Xie; Jie Zhang; Man Zhou; |
219 | Enhancing RAW-to-sRGB with Decoupled Style Structure in Fourier Domain Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by Image Signal Processing (ISP) pipeline, which distinguishes image restoration and enhancement, we present a novel Neural ISP framework, named FourierISP. |
Xuanhua He; Tao Hu; Guoli Wang; Zejin Wang; Run Wang; Qian Zhang; Keyu Yan; Ziyi Chen; Rui Li; Chengjun Xie; Jie Zhang; Man Zhou; |
220 | Decoupled Contrastive Learning for Long-Tailed Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve the performance on long-tailed recognition, this paper addresses those two issues of SCL by decoupling the training objective. |
Shiyu Xuan; Shiliang Zhang; |
221 | STEM: Unleashing The Power of Embeddings for Multi-Task Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel Shared and Task-specific EMbeddings (STEM) paradigm that aims to incorporate both shared and task-specific embeddings to effectively capture task-specific user preferences. |
Liangcai Su; Junwei Pan; Ximei Wang; Xi Xiao; Shijie Quan; Xihua Chen; Jie Jiang; |
222 | Recognizing Ultra-High-Speed Moving Objects with Bio-Inspired Spike Camera Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the physical limit of CMOS technology in spike cameras still hinders their capability of recognizing ultra-high-speed moving objects, e.g., extremely fast motions cause blur during the imaging process of spike cameras. This paper presents the first theoretical analysis for the causes of spiking motion blur and proposes a robust representation that addresses this issue through temporal-spatial context learning. |
Junwei Zhao; Shiliang Zhang; Zhaofei Yu; Tiejun Huang; |
223 | From Past to Future: Rethinking Eligibility Traces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation. |
Dhawal Gupta; Scott M. Jordan; Shreyas Chaudhari; Bo Liu; Philip S. Thomas; Bruno Castro da Silva; |
224 | UniCell: Universal Cell Nucleus Classification Via Prompt Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a universal cell nucleus classification framework (UniCell), which employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains. |
Junjia Huang; Haofeng Li; Xiang Wan; Guanbin Li; |
225 | Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Xiezhi, the most comprehensive evaluation suite designed to assess holistic domain knowledge. Xiezhi comprises multiple-choice questions across 516 diverse disciplines ranging from 13 different subjects with 249,587 questions and accompanied by Xiezhi-Specialty with 14,041 questions and Xiezhi-Interdiscipline with 10,746 questions. |
Zhouhong Gu; Xiaoxuan Zhu; Haoning Ye; Lin Zhang; Jianchen Wang; Yixin Zhu; Sihang Jiang; Zhuozhi Xiong; Zihan Li; Weijie Wu; Qianyu He; Rui Xu; Wenhao Huang; Jingping Liu; Zili Wang; Shusen Wang; Weiguo Zheng; Hongwei Feng; Yanghua Xiao; |
226 | Graph Invariant Learning with Subgraph Co-mixup for Out-of-Distribution Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel graph invariant learning method based on invariant and variant patterns co-mixup strategy, which is capable of jointly generating mixed multiple environments and capturing invariant patterns from the mixed graph data. |
Tianrui Jia; Haoyang Li; Cheng Yang; Tao Tao; Chuan Shi; |
227 | P-Laplacian Adaptation for Generative Pre-trained Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel modeling framework that recasts adapter tuning after attention as a graph message passing process on attention graphs, where the projected query and value features and attention matrix constitute the node features and the graph adjacency matrix, respectively. |
Haoyuan Wu; Xinyun Zhang; Peng Xu; Peiyu Liao; Xufeng Yao; Bei Yu; |
228 | Dynamic Reactive Spiking Graph Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the dynamic cognition in the brain, we propose a Dynamic Reactive Spiking Graph Neural Network that can enhance the model’s expressive ability with higher biological fidelity. |
Han Zhao; Xu Yang; Cheng Deng; Junchi Yan; |
229 | Weakly-Supervised Mirror Detection Via Scribble Annotations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, it is difficult to obtain basic mirror structure using scribble annotation, and the distinction between foreground (mirror) and background (non-mirror) features is not emphasized because of mirror reflections. Therefore, we propose a foreground-aware mask attention (FAMA), integrating mirror edges and semantic features to complete mirror regions and suppressing the influence of backgrounds. |
Mingfeng Zha; Yunqiang Pei; Guoqing Wang; Tianyu Li; Yang Yang; Wenbin Qian; Heng Tao Shen; |
230 | Frequency Spectrum Is More Effective for Multimodal Representation and Fusion: A Multimodal Spectrum Rumor Detector Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work makes the first attempt at multimodal rumor detection in the frequency domain, which efficiently transforms spatial features into the frequency spectrum and obtains highly discriminative spectrum features for multimodal representation and fusion. |
An Lao; Qi Zhang; Chongyang Shi; Longbing Cao; Kun Yi; Liang Hu; Duoqian Miao; |
231 | Multi-Granularity Causal Structure Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop a novel method called MgCSL (Multi-granularity Causal Structure Learning), which first leverages sparse auto-encoder to explore coarse-graining strategies and causal abstractions from micro-variables to macro-ones. |
Jiaxuan Liang; Jun Wang; Guoxian Yu; Shuyin Xia; Guoyin Wang; |
232 | VLM2Scene: Self-Supervised Image-Text-LiDAR Learning with Foundation Models for Autonomous Driving Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose VLM2Scene, which exploits the potential of VLMs to enhance 3D self-supervised representation learning through our proposed image-text-LiDAR contrastive learning strategy. |
Guibiao Liao; Jiankun Li; Xiaoqing Ye; |
233 | M3SOT: Multi-Frame, Multi-Field, Multi-Space 3D Single Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this research, we unveil M3SOT, a novel 3D SOT framework, which synergizes multiple input frames (template sets), multiple receptive fields (continuous contexts), and multiple solution spaces (distinct tasks) in ONE model. |
Jiaming Liu; Yue Wu; Maoguo Gong; Qiguang Miao; Wenping Ma; Cai Xu; Can Qin; |
234 | Separate The Wheat from The Chaff: Model Deficiency Unlearning Via Parameter-Efficient Module Operation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a PEMs operation approach, namely Extraction-before-Subtraction (Ext-Sub), to enhance the truthfulness and detoxification of LLMs through the integration of “expert” PEM and “anti-expert” PEM. |
Xinshuo Hu; Dongfang Li; Baotian Hu; Zihao Zheng; Zhenyu Liu; Min Zhang; |
235 | Non-exemplar Online Class-Incremental Continual Learning Via Dual-Prototype Self-Augment and Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Dual-prototype Self-augment and Refinement method (DSR) for NO-CL problem, which consists of two strategies: 1) Dual class prototypes: vanilla and high-dimensional prototypes are exploited to utilize the pre-trained information and obtain robust quasi-orthogonal representations rather than example buffers for both privacy preservation and memory reduction. |
Fushuo Huo; Wenchao Xu; Jingcai Guo; Haozhao Wang; Yunfeng Fan; |
236 | StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are discernible during the training phase. To overcome these challenges, we propose StyleSinger, the first singing voice synthesis model for zero-shot style transfer of out-of-domain reference singing voice samples. |
Yu Zhang; Rongjie Huang; Ruiqi Li; JinZheng He; Yan Xia; Feiyang Chen; Xinyu Duan; Baoxing Huai; Zhou Zhao; |
237 | SasWOT: Real-Time Semantic Segmentation Architecture Search WithOut Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present SasWOT, the first training-free Semantic segmentation Architecture Search (SAS) framework via an auto-discovery proxy. |
Chendi Zhu; Lujun Li; Yuli Wu; Zhengxing Sun; |
238 | Improving The Adversarial Transferability of Vision Transformers with Virtual Dense Connection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To explore the shared deficiency of models with different structures, researchers begin to analyze the cross-structure adversarial transferability, which is still under-explored. Therefore, in this work, we focus on the ViT attacks to improve the cross-structure transferability between the transformer-based and convolution-based models. |
Jianping Zhang; Yizhan Huang; Zhuoer Xu; Weibin Wu; Michael R. Lyu; |
239 | Curvature-Invariant Adversarial Attacks for 3D Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to enhance the imperceptibility of adversarial attacks on 3D point cloud recognition by better preserving the local curvature of the original 3D point clouds. |
Jianping Zhang; Wenwei Gu; Yizhan Huang; Zhihan Jiang; Weibin Wu; Michael R. Lyu; |
240 | Three Heads Are Better Than One: Improving Cross-Domain NER with Progressive Decomposed Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, to transfer multiple source domains’ knowledge, we decouple the NER task into the pipeline tasks of mention detection and entity typing, where the mention detection unifies the training object across domains, thus providing the entity typing with higher-quality entity mentions. |
Xuming Hu; Zhaochen Hong; Yong Jiang; Zhichao Lin; Xiaobin Wang; Pengjun Xie; Philip S. Yu; |
241 | Conformal Crystal Graph Transformer with Robust Encoding of Periodic Invariance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, effectively capturing angular information within 3D crystal structures continues to pose a significant challenge for graph-based approaches. This study introduces novel solutions to these challenges. |
Yingheng Wang; Shufeng Kong; John M. Gregoire; Carla P. Gomes; |
242 | Layer Collaboration in The Forward-Forward Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study layer collaboration in the forward-forward algorithm. |
Guy Lorberbom; Itai Gat; Yossi Adi; Alexander Schwing; Tamir Hazan; |
243 | CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recognizing the untapped potential of combining textual queries with visual representations of medical conditions, we introduce the Multimodal Medical Question Summarization (MMQS) Dataset. This dataset, a major contribution of our work, pairs medical queries with visual aids, facilitating a richer and more nuanced understanding of patient needs. |
Akash Ghosh; Arkadeep Acharya; Raghav Jain; Sriparna Saha; Aman Chadha; Setu Sinha; |
244 | VIGC: Visual Instruction Generation and Correction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it’s worth noting that the currently accessible MLLMs are not as powerful as their LLM counterparts, as they tend to produce inadequate responses and generate false information. As a solution for addressing the current issue, this paper proposes the Visual Instruction Generation and Correction (VIGC) framework that enables multimodal large language models to generate instruction-tuning data and progressively enhance its quality on-the-fly. |
Bin Wang; Fan Wu; Xiao Han; Jiahui Peng; Huaping Zhong; Pan Zhang; Xiaoyi Dong; Weijia Li; Wei Li; Jiaqi Wang; Conghui He; |
245 | Equity-Transformer: Solving NP-Hard Min-Max Routing Problems As Sequential Generation with Equity Context Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes Equity-Transformer to solve large-scale min-max routing problems. |
Jiwoo Son; Minsu Kim; Sanghyeok Choi; Hyeonah Kim; Jinkyoo Park; |
246 | Self-Supervised Bird’s Eye View Motion Prediction with Cross-Modality Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current self-supervised methods mainly rely on point correspondences between point clouds, which may introduce the problems of fake flow and inconsistency, hindering the model’s ability to learn accurate and realistic motion. In this paper, we introduce a novel cross-modality self-supervised training framework that effectively addresses these issues by leveraging multi-modality data to obtain supervision signals. |
Shaoheng Fang; Zuhong Liu; Mingyu Wang; Chenxin Xu; Yiqi Zhong; Siheng Chen; |
247 | Wikiformer: Pre-training with Structured Information of Wikipedia for Ad-Hoc Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we devise four pre-training objectives tailored for IR tasks based on the structured knowledge of Wikipedia. |
Weihang Su; Qingyao Ai; Xiangsheng Li; Jia Chen; Yiqun Liu; Xiaolong Wu; Shengluan Hou; |
248 | Complete Neural Networks for Complete Euclidean Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Neural networks for point clouds, which respect their natural invariance to permutation and rigid motion, have enjoyed recent success in modeling geometric phenomena, from … |
Snir Hordan; Tal Amir; Steven J. Gortler; Nadav Dym; |
249 | Debiased Novel Category Discovering and Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the challenging problem of Novel Class Discovery and Localization (NCDL), aiming to train detectors that can detect the categories present in the training data, while also actively discover, localize, and cluster new categories. |
Juexiao Feng; Yuhong Yang; Yanchun Xie; Yaqian Li; Yandong Guo; Yuchen Guo; Yuwei He; Liuyu Xiang; Guiguang Ding; |
250 | Multitarget Device-Free Localization Via Cross-Domain Wi-Fi RSS Training Data and Attentional Prior Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we pioneeringly propose a transformer-based learning method with Wi-Fi RSS as input, and an attentional prior fusion module, to simultaneously locate an unknown number of people at random positions. |
Na Fan; Zeyue Tian; Amartansh Dubey; Samruddhi Deshmukh; Ross Murch; Qifeng Chen; |
251 | Beyond The Label Itself: Latent Labels Enhance Semi-supervised Point Cloud Panoptic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we discover two types of latent labels behind the displayed label embedded in LiDAR and image data. |
Yujun Chen; Xin Tan; Zhizhong Zhang; Yanyun Qu; Yuan Xie; |
252 | Teaching Large Language Models to Translate with Comparison Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, it can be more challenging to tune smaller LLMs with lower-quality training data. To address this issue, we propose a novel framework using examples in comparison to teach LLMs to learn translation. |
Jiali Zeng; Fandong Meng; Yongjing Yin; Jie Zhou; |
253 | Exploiting Auxiliary Caption for Video Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we contend that exploiting easily available captions which describe general actions, i.e., auxiliary captions defined in our paper, will significantly boost the performance. |
Hongxiang Li; Meng Cao; Xuxin Cheng; Yaowei Li; Zhihong Zhu; Yuexian Zou; |
254 | Entropic Open-Set Active Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an Entropic Open-set AL (EOAL) framework which leverages both known and unknown distributions effectively to select informative samples during AL rounds. |
Bardia Safaei; Vibashan VS; Celso M. de Melo; Vishal M. Patel; |
255 | Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, deep learning in resource-limited domains still faces multiple challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning. This paper provides an overview of model reprogramming to bridge this gap. |
Pin-Yu Chen; |
256 | Discovering Agents (Abstract Reprint) Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes the first formal causal definition of agents – roughly that agents are systems that would adapt their policy if their actions influenced the world in a different way. |
Zachary Kenton; Ramana Kumar; Sebastian Farquhar; Jonathan Richens; Matt MacDermott; Tom Everitt; |
257 | Aligner²: Enhancing Joint Multiple Intent Detection and Slot Filling Via Adjustive and Forced Cross-Task Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More seriously, they lack alignment between the predictions of the two sub-tasks due to task-independent decoding, resulting in a limitation on the overall performance. To address these challenges, we propose a novel framework termed Aligner² for multi-intent SLU, which contains an Adjustive Cross-task Aligner (ACA) and a Forced Cross-task Aligner (FCA). |
Zhihong Zhu; Xuxin Cheng; Yaowei Li; Hongxiang Li; Yuexian Zou; |
258 | An Attentive Inductive Bias for Sequential Recommendation Beyond The Self-Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel method called Beyond Self-Attention for Sequential Recommendation (BSARec), which leverages the Fourier transform to i) inject an inductive bias by considering fine-grained sequential patterns and ii) integrate low and high-frequency information to mitigate oversmoothing. |
Yehjin Shin; Jeongwhan Choi; Hyowon Wi; Noseong Park; |
259 | Unsupervised Layer-Wise Score Aggregation for Textual OOD Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we observe that OOD detection performance varies greatly depending on the task and layer output. |
Maxime Darrin; Guillaume Staerman; Eduardo Dadalto Camara Gomes; Jackie C. K. Cheung; Pablo Piantanida; Pierre Colombo; |
260 | History Matters: Temporal Knowledge Editing in Large Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce the task of Temporal Knowledge Editing (TKE) and establish a benchmark AToKe (Assessment of TempOral Knowledge Editing) to evaluate current model editing methods. |
Xunjian Yin; Jin Jiang; Liming Yang; Xiaojun Wan; |
261 | Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prior studies on audio-visual speech recognition typically assume the visibility of speaking lips, ignoring the fact that visual occlusion occurs in real-world videos, thus adversely affecting recognition performance. To address this issue, we propose a framework that restores occluded lips in a video by utilizing both the video itself and the corresponding noisy audio. |
Jiadong Wang; Zexu Pan; Malu Zhang; Robby T. Tan; Haizhou Li; |
262 | SeqGPT: An Out-of-the-Box Large Language Model for Open Domain Sequence Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we present SeqGPT, a bilingual (i.e., English and Chinese) open-source autoregressive model specially enhanced for open-domain natural language understanding. |
Tianyu Yu; Chengyue Jiang; Chao Lou; Shen Huang; Xiaobin Wang; Wei Liu; Jiong Cai; Yangning Li; Yinghui Li; Kewei Tu; Hai-Tao Zheng; Ningyu Zhang; Pengjun Xie; Fei Huang; Yong Jiang; |
263 | Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, direct training on images of unlimited sizes is unfeasible, as it would require an immense number of text-image pairs and entail substantial computational expenses. To overcome these challenges, we propose a two-stage pipeline named Any-Size-Diffusion (ASD), designed to efficiently generate well-composed HD images of any size, while minimizing the need for high-memory GPU resources. |
Qingping Zheng; Yuanfan Guo; Jiankang Deng; Jianhua Han; Ying Li; Songcen Xu; Hang Xu; |
264 | Geometry-Guided Domain Generalization for Monocular 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, at the feature level, conventional domain invariant learning methods generally cause the negative transfer issue, due to the ignorance of dependency between geometry tasks and domains. To tackle these issues, in this paper, we propose MonoGDG, a geometry-guided domain generalization framework for M3OD, which effectively addresses the domain gap at both camera and feature levels. |
Fan Yang; Hui Chen; Yuwei He; Sicheng Zhao; Chenghao Zhang; Kai Ni; Guiguang Ding; |
265 | SocialCVAE: Predicting Pedestrian Trajectory Via Interaction Conditioned Latents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the deterministic nature of the learned steering behaviors from the empirical models limits the models’ practical performance. To address this issue, this work proposes the social conditional variational autoencoder (SocialCVAE) for predicting pedestrian trajectories, which employs a CVAE to explore behavioral uncertainty in human motion decisions. |
Wei Xiang; Haoteng YIN; He Wang; Xiaogang Jin; |
266 | Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Selective Synaptic Dampening (SSD), a novel two-step, post hoc, retrain-free approach to machine unlearning which is fast, performant, and does not require long-term storage of the training data. |
Jack Foster; Stefan Schoepf; Alexandra Brintrup; |
267 | Low-Distortion Clustering with Ordinal and Limited Cardinal Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by recent work in computational social choice, we extend the metric distortion framework to clustering problems. Given a set of n agents located in an underlying metric space, our goal is to partition them into k clusters, optimizing some social cost objective. |
Jakob Burkhardt; Ioannis Caragiannis; Karl Fehrs; Matteo Russo; Chris Schwiegelshohn; Sudarshan Shyam; |
268 | SayCanPay: Heuristic Planning with Large Language Models Using Learnable Domain Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This contrasts with heuristic planning methods that employ domain knowledge (formalized in action models such as PDDL) and heuristic search to generate feasible, optimal plans. Inspired by this, we propose to combine the power of LLMs and heuristic planning by leveraging the world knowledge of LLMs and the principles of heuristic search. |
Rishi Hazra; Pedro Zuidberg Dos Martires; Luc De Raedt; |
269 | CUTS+: High-Dimensional Causal Discovery from Irregular Time-Series Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, the missing entries in the observations further hamper the causal structural learning. To overcome these limitations, we propose CUTS+, which is built on the Granger-causality-based causal discovery method CUTS and raises the scalability by introducing a technique called Coarse-to-fine-discovery (C2FD) and leveraging a message-passing-based graph neural network (MPGNN). |
Yuxiao Cheng; Lianglong Li; Tingxiong Xiao; Zongren Li; Jinli Suo; Kunlun He; Qionghai Dai; |
270 | Text-to-Image Generation for Abstract Concepts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the three-layer artwork theory that identifies critical factors, intent, object and form during artistic creation, we propose a framework of Text-to-Image generation for Abstract Concepts (TIAC). |
Jiayi Liao; Xu Chen; Qiang Fu; Lun Du; Xiangnan He; Xiang Wang; Shi Han; Dongmei Zhang; |
271 | Bi-ViT: Pushing The Limit of Vision Transformer Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through extensive empirical analyses, we identify the severe drop in ViT binarization is caused by attention distortion in self-attention, which technically stems from the gradient vanishing and ranking disorder. |
Yanjing Li; Sheng Xu; Mingbao Lin; Xianbin Cao; Chuanjian Liu; Xiao Sun; Baochang Zhang; |
272 | SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel bio-inspired spiking language model (LM) which aims to reduce the computational cost of conventional LMs by drawing motivation from the synaptic information flow in the brain. |
Malyaban Bal; Abhronil Sengupta; |
273 | Exposing The Deception: Uncovering More Forgery Clues for Deepfake Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These deficiencies culminate in unsatisfactory accuracy and limited generalizability in real-life scenarios. In this paper, we try to tackle these challenges through three designs: (1) We present a novel framework to capture broader forgery clues by extracting multiple non-overlapping local representations and fusing them into a global semantic-rich feature. |
Zhongjie Ba; Qingyu Liu; Zhenguang Liu; Shuang Wu; Feng Lin; Li Lu; Kui Ren; |
274 | Feature Unlearning for Pre-trained GANs and VAEs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike a common unlearning task where an unlearning target is a subset of the training set, we aim to unlearn a specific feature, such as hairstyle from facial images, from the pre-trained generative models. |
Saemi Moon; Seunghyuk Cho; Dongwoo Kim; |
275 | Structured Probabilistic Coding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a new supervised representation learning framework, namely structured probabilistic coding (SPC), to learn compact and informative representations from input related to the target task. |
Dou Hu; Lingwei Wei; Yaxin Liu; Wei Zhou; Songlin Hu; |
276 | Review-Enhanced Hierarchical Contrastive Learning for Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing studies capture simple review relations, failing to (1) completely explore hidden connections between users (or items), (2) filter out redundant information derived from reviews, and (3) model the behavioral association between rating and review interactions. To address these challenges, we propose a review-enhanced hierarchical contrastive learning, namely ReHCL. |
Ke Wang; Yanmin Zhu; Tianzi Zang; Chunyang Wang; Mengyuan Jing; |
277 | Decoupled Optimisation for Long-Tailed Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To balance the overall parameter contribution across all classes, we investigate the importance of each model parameter to the learning of different class groups, and propose a multistage parameter Decouple and Optimisation (DO) framework that decouples parameters into different groups with each group learning a specific portion of classes. |
Cong Cong; Shiyu Xuan; Sidong Liu; Shiliang Zhang; Maurice Pagnucco; Yang Song; |
278 | All But One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These issues severely degrade the original utility of generative models. In this work, we present a new approach that solves all of these challenges. |
SeungHoo Hong; Juhun Lee; Simon S. Woo; |
279 | Task-Driven Causal Feature Distillation: Towards Trustworthy Risk Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most models lack causal reasoning and struggle with class imbalance, leading to poor precision and recall. To address this, we propose a Task-Driven Causal Feature Distillation model (TDCFD) to transform original feature values into causal feature attributions for the specific risk prediction task. |
Zhixuan Chu; Mengxuan Hu; Qing Cui; Longfei Li; Sheng Li; |
280 | Learning to Optimize Permutation Flow Shop Scheduling Via Graph-Based Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To that end, we propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately. |
Longkang Li; Siyuan Liang; Zihao Zhu; Chris Ding; Hongyuan Zha; Baoyuan Wu; |
281 | Controllable 3D Face Generation with Conditional Style Code Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, an ideal controllable 3D face generation model should consider both facial attributes and expressions. Thus we propose a novel approach called TEx-Face (TExt & Expression-to-Face) that addresses these challenges by dividing the task into three components, i.e., 3D GAN Inversion, Conditional Style Code Diffusion, and 3D Face Decoding. |
Xiaolong Shen; Jianxin Ma; Chang Zhou; Zongxin Yang; |
282 | Three Heads Are Better Than One: Complementary Experts for Long-Tailed Semi-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such a phenomenon is even amplified as more unlabeled data will be mislabeled as head classes when the class distribution of labeled and unlabeled datasets are mismatched. To solve this problem, we propose a novel method named ComPlementary Experts (CPE). |
Chengcheng Ma; Ismail Elezi; Jiankang Deng; Weiming Dong; Changsheng Xu; |
283 | IT3D: Improved Text-to-3D Generation with Explicit View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nonetheless, existing Text-to-3D approaches often grapple with challenges such as over-saturation, inadequate detailing, and unrealistic outputs. This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues. |
Yiwen Chen; Chi Zhang; Xiaofeng Yang; Zhongang Cai; Gang Yu; Lei Yang; Guosheng Lin; |
284 | GLH-Water: A Large-Scale Dataset for Global Surface Water Detection in Large-Size Very-High-Resolution Satellite Imagery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although achievements have been made in detecting surface water in small-size satellite images corresponding to local geographic scales, datasets and methods suitable for mapping and analyzing global surface water have yet to be explored. To encourage the development of this task and facilitate the implementation of relevant applications, we propose the GLH-water dataset that consists of 250 satellite images and 40.96 billion pixels labeled surface water annotations that are distributed globally and contain water bodies exhibiting a wide variety of types (e.g., rivers, lakes, and ponds in forests, irrigated fields, bare areas, and urban areas). |
Yansheng Li; Bo Dang; Wanchun Li; Yongjun Zhang; |
285 | Levenshtein Distance Embedding with Poisson Regression for DNA Storage Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a novel neural network-based sequence embedding technique using Poisson regression is proposed. |
Xiang Wei; Alan J.X. Guo; Sihan Sun; Mengyi Wei; Wei Yu; |
286 | Toward More Generalized Malicious URL Detection Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The findings presented in this work not only expose a latent issue in the field but also provide an actionable remedy, marking a significant step forward in the pursuit of more reliable and robust malicious URL detection. |
Yun-Da Tsai; Cayon Liow; Yin Sheng Siang; Shou-De Lin; |
287 | LLM Vs Small Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, they treat personality traits as one-hot classification labels, overlooking the semantic information within them. In this paper, we propose a large language model (LLM) based text augmentation enhanced personality detection model, which distills the LLM’s knowledge to enhance the small model for personality detection, even when the LLM fails in this task. |
Linmei Hu; Hongyu He; Duokang Wang; Ziwang Zhao; Yingxia Shao; Liqiang Nie; |
288 | Wavelet-Driven Spatiotemporal Predictive Learning: Bridging Frequency and Time Variations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an innovative Wavelet-based SpatioTemporal (WaST) framework, which extracts and adaptively controls both low and high-frequency components at image and feature levels via 3D discrete wavelet transform for faster processing while maintaining high-quality predictions. |
Xuesong Nie; Yunfeng Yan; Siyuan Li; Cheng Tan; Xi Chen; Haoyuan Jin; Zhihang Zhu; Stan Z. Li; Donglian Qi; |
289 | Fine Structure-Aware Sampling: A New Sampling Training Scheme for Pixel-Aligned Implicit Models in Single-View Human Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing sampling training schemes either fail to capture thin surfaces (e.g. ears, fingers) or cause noisy artefacts in reconstructed meshes. To address these problems, we introduce Fine Structured-Aware Sampling (FSS), a new sampling training scheme to train pixel-aligned implicit models for single-view human reconstruction. |
Kennard Yanting Chan; Fayao Liu; Guosheng Lin; Chuan Sheng Foo; Weisi Lin; |
290 | DreamIdentity: Enhanced Editability for Efficient Face-Identity Preserved Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite existing encoder-based methods achieving high efficiency and decent face similarity, the generated image often fails to follow the textual prompts. To ease this editability issue, we present DreamIdentity, to learn edit-friendly and accurate face-identity representations in the word embedding space. |
Zhuowei Chen; Shancheng Fang; Wei Liu; Qian He; Mengqi Huang; Zhendong Mao; |
291 | Multi-Architecture Multi-Expert Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the performance degradation of efficient diffusion models by introducing Multi-architecturE Multi-Expert diffusion models (MEME). |
Yunsung Lee; JinYoung Kim; Hyojun Go; Myeongho Jeong; Shinhyeok Oh; Seungtaek Choi; |
292 | Joint Demosaicing and Denoising for Spike Camera Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an iterative joint demosaicing and denoising network (SJDD-Net) for spike cameras based on the observation model. |
Yanchen Dong; Ruiqin Xiong; Jing Zhao; Jian Zhang; Xiaopeng Fan; Shuyuan Zhu; Tiejun Huang; |
293 | Uncertainty-Aware Yield Prediction with Multimodal Molecular Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing models often utilize single-modal feature representations, such as molecular fingerprints, SMILES sequences, or molecular graphs, which is not sufficient to capture the complex interactions and dynamic behavior of molecules in reactions. In this paper, we present an advanced Uncertainty-Aware Multimodal model (UAM) to tackle these challenges. |
Jiayuan Chen; Kehan Guo; Zhen Liu; Olexandr Isayev; Xiangliang Zhang; |
294 | Learning Continuous Implicit Field with Local Distance Indicator for Arbitrary-Scale Point Cloud Upsampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we train a local distance indicator (LDI) that predicts the unsigned distance from a query point to a local implicit surface. |
Shujuan Li; Junsheng Zhou; Baorui Ma; Yu-Shen Liu; Zhizhong Han; |
295 | Autoregressive Omni-Aware Outpainting for Open-Vocabulary 360-Degree Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, autoregressive omni-aware generative network (AOG-Net) is proposed for 360-degree image generation by outpainting an incomplete 360-degree image progressively with NFoV and text guidances jointly or individually. |
Zhuqiang Lu; Kun Hu; Chaoyue Wang; Lei Bai; Zhiyong Wang; |
296 | Incomplete Contrastive Multi-View Clustering with High-Confidence Guiding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we proposed a novel Incomplete Contrastive Multi-View Clustering method with high-confidence guiding (ICMVC). |
Guoqing Chao; Yi Jiang; Dianhui Chu; |
297 | Structural Entropy Based Graph Structure Learning for Node Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, they might be insufficient when dealing with the graphs containing noises from real-world complex systems. To address this issue, we propose a novel and effective GSL framework for node classification based on the structural information theory. |
Liang Duan; Xiang Chen; Wenjie Liu; Daliang Liu; Kun Yue; Angsheng Li; |
298 | Devignet: High-Resolution Vignetting Removal Via A Dual Aggregated Fusion Transformer with Adaptive Channel Expansion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, the substantial lack of real-world vignetting datasets hinders the objective and comprehensive evaluation of vignetting removal. To address these challenges, we present VigSet, a pioneering dataset for vignetting removal. |
Shenghong Luo; Xuhang Chen; Weiwen Chen; Zinuo Li; Shuqiang Wang; Chi-Man Pun; |
299 | Adversarial Robust Safeguard for Evading Deep Facial Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, traditional optimization-based methods face limitations in scalability as they struggle to accommodate the substantial expansion of data volume, a consequence of the time-intensive iterative pipeline. To solve these challenges, we propose a learning-based model, Adversarial Robust Safeguard (ARS), to generate desirable protection noise in a single forward process, concurrently exhibiting a heightened resistance against prevalent perturbations. |
Jiazhi Guan; Yi Zhao; Zhuoer Xu; Changhua Meng; Ke Xu; Youjian Zhao; |
300 | Learning Multi-Modal Cross-Scale Deformable Transformer Network for Unregistered Hyperspectral Image Super-resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a multi-modal cross-scale deformable transformer network (M2DTN) to achieve unregistered HSI-SR. |
Wenqian Dong; Yang Xu; Jiahui Qu; Shaoxiong Hou; |
301 | Toward Robustness in Multi-Label Classification: A Data Augmentation Strategy Against Imbalance and Noise Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Multi-label classification poses challenges due to imbalanced and noisy labels in training data. In this paper, we propose a unified data augmentation method, named BalanceMix, to address these challenges. |
Hwanjun Song; Minseok Kim; Jae-Gil Lee; |
302 | Visual Instruction Tuning with Polite Flamingo Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Polite Flamingo, a multi-modal response rewriter that transforms raw annotations into a more appealing, "polite" format. |
Delong Chen; Jianfeng Liu; Wenliang Dai; Baoyuan Wang; |
303 | BAT: Behavior-Aware Human-Like Trajectory Prediction for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The ability to accurately predict the trajectory of surrounding vehicles is a critical hurdle to overcome on the journey to fully autonomous vehicles. To address this challenge, we pioneer a novel behavior-aware trajectory prediction model (BAT) that incorporates insights and findings from traffic psychology, human behavior, and decision-making. |
Haicheng Liao; Zhenning Li; Huanming Shen; Wenxuan Zeng; Dongping Liao; Guofa Li; Chengzhong Xu; |
304 | Personalized LoRA for Human-Centered Text Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a personalized LoRA (PLoRA) with a plug-and-play (PnP) framework for the HCTU task. |
You Zhang; Jin Wang; Liang-Chih Yu; Dan Xu; Xuejie Zhang; |
305 | VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Large-scale pre-trained models have achieved remarkable success in various computer vision tasks. |
Yi Xin; Junlong Du; Qiang Wang; Zhiwen Lin; Ke Yan; |
306 | Small Language Model Can Self-Correct Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the Intrinsic Self-Correction (ISC) in generative language models, aiming to correct the initial output of LMs in a self-triggered manner, even for those small LMs with 6 billion parameters. |
Haixia Han; Jiaqing Liang; Jie Shi; Qianyu He; Yanghua Xiao; |
307 | Deep Structural Knowledge Exploitation and Synergy for Estimating Node Importance Value on Heterogeneous Information Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel learning framework namely SKES. |
Yankai Chen; Yixiang Fang; Qiongyan Wang; Xin Cao; Irwin King; |
308 | Towards Explainable Joint Models Via Information Theory for Multiple Intent Detection and Slot Filling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we mathematically define the cross-task information gain (CIG) to measure the quality of joint processes from an information-theoretic perspective and discover an implicit optimization of CIG in previous models. Based on this, we propose a novel multi-stage iterative framework with theoretical effectiveness, explainability, and convergence, which can explicitly optimize information for cross-task interactions. |
Xianwei Zhuang; Xuxin Cheng; Yuexian Zou; |
309 | MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current methods for generative script learning rely heavily on well-structured preceding steps described in text and/or images or are limited to a certain domain, resulting in a disparity with real-world user scenarios. To address these limitations, we present a new benchmark challenge – MULTISCRIPT, with two new tasks on task-oriented multimodal script learning: (1) multimodal script generation, and (2) subsequent step prediction. |
Jingyuan Qi; Minqian Liu; Ying Shen; Zhiyang Xu; Lifu Huang; |
310 | Any-Stereo: Arbitrary Scale Disparity Estimation for Iterative Stereo Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce AnyStereo, an accurate and efficient disparity upsampling module with implicit neural representation for the iterative stereo pipeline. |
Zhaohuai Liang; Changhe Li; |
311 | Deep Unfolded Network with Intrinsic Supervision for Pan-Sharpening Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further enhance the transparency of network design, we develop an iterative solution algorithm following the half-quadratic splitting to unfold the deep model. |
Hebaixu Wang; Meiqi Gong; Xiaoguang Mei; Hao Zhang; Jiayi Ma; |
312 | Neural Oscillators for Generalization of Physics-Informed Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper aims to enhance the generalization capabilities of PIML, facilitating practical, real-world applications where accurate predictions in unexplored regions are crucial. |
Taniya Kapoor; Abhishek Chandra; Daniel M. Tartakovsky; Hongrui Wang; Alfredo Nunez; Rolf Dollevoet; |
313 | Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In light of the above, we introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM). |
Zhuohang Dang; Minnan Luo; Chengyou Jia; Guang Dai; Xiaojun Chang; Jingdong Wang; |
314 | Learning Task-Aware Language-Image Representation for Class-Incremental Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose task-aware language-image representation to mitigate catastrophic forgetting, introducing a new paradigm for language-image-based CIOD. |
Hongquan Zhang; Bin-Bin Gao; Yi Zeng; Xudong Tian; Xin Tan; Zhizhong Zhang; Yanyun Qu; Jun Liu; Yuan Xie; |
315 | Elijah: Eliminating Backdoors Injected in Diffusion Models Via Distribution Shift Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, effective defense strategies to mitigate backdoors from DMs are underexplored. To bridge this gap, we propose the first backdoor detection and removal framework for DMs. |
Shengwei An; Sheng-Yen Chou; Kaiyuan Zhang; Qiuling Xu; Guanhong Tao; Guangyu Shen; Siyuan Cheng; Shiqing Ma; Pin-Yu Chen; Tsung-Yi Ho; Xiangyu Zhang; |
316 | Protein 3D Graph Structure Learning for Robust Structure-Based Protein Property Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first investigate the reason behind the performance decrease when utilizing predicted structures, attributing it to the structure embedding bias from the perspective of structure representation learning. To study this problem, we identify a Protein 3D Graph Structure Learning Problem for Robust Protein Property Prediction (PGSL-RP3), collect benchmark datasets, and present a protein Structure embedding Alignment Optimization framework (SAO) to mitigate the problem of structure embedding bias between the predicted and experimental protein structures. |
Yufei Huang; Siyuan Li; Lirong Wu; Jin Su; Haitao Lin; Odin Zhang; Zihan Liu; Zhangyang Gao; Jiangbin Zheng; Stan Z. Li; |
317 | The Complexity of Computing Robust Mediated Equilibria in Ordinal Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we observe that we can in fact make good use of mixed strategies in ordinal games if we consider settings that allow for folk theorems. |
Vincent Conitzer; |
318 | Out of Thin Air: Exploring Data-Free Adversarial Robustness Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In fact, these data are almost always private, specific, and distinctive for scenes that require high robustness. To tackle these issues, we propose a challenging but significant task called Data-Free Adversarial Robustness Distillation (DFARD), which aims to train small, easily deployable, robust models without relying on data. |
Yuzheng Wang; Zhaoyu Chen; Dingkang Yang; Pinxue Guo; Kaixun Jiang; Wenqiang Zhang; Lizhe Qi; |
319 | SkeletonGait: Gait Recognition Using Skeleton Maps Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel skeletal gait representation named skeleton map, together with SkeletonGait, a skeleton-based method to exploit structural information from human skeleton maps. |
Chao Fan; Jingzhe Ma; Dongyang Jin; Chuanfu Shen; Shiqi Yu; |
320 | BLiRF: Bandlimited Radiance Fields for Dynamic Scene Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take a step back and investigate how current implementations may entail deleterious effects including limited expressiveness, entanglement of light and density fields, and sub-optimal motion localization. |
Sameera Ramasinghe; Violetta Shevchenko; Gil Avraham; Anton van den Hengel; |
321 | LatestEval: Addressing Data Contamination in Language Model Evaluation Through Dynamic and Time-Sensitive Test Construction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose LatestEval, an automatic method that leverages the most recent texts to create uncontaminated reading comprehension evaluations. |
Yucheng Li; Frank Guerin; Chenghua Lin; |
322 | DC-NAS: Divide-and-Conquer Neural Architecture Search for Multi-Modal Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient evolutionary-based NAS-MMC method called divide-and-conquer neural architecture search (DC-NAS). |
Xinyan Liang; Pinhan Fu; Qian Guo; Keyin Zheng; Yuhua Qian; |
323 | Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Improving reward performance through policy adjustments may adversely affect safety performance. In this study, we aim to address this conflicting relation by leveraging the theory of gradient manipulation. |
Shangding Gu; Bilgehan Sel; Yuhao Ding; Lu Wang; Qingwei Lin; Ming Jin; Alois Knoll; |
324 | A Diffusion-Based Pre-training Framework for Crystal Property Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Many significant problems involving crystal property prediction from 3D structures have limited labeled data due to expensive and time-consuming physical simulations or lab experiments. To overcome this challenge, we propose a pretrain-finetune framework for the crystal property prediction task named CrysDiff based on diffusion models. |
Zixing Song; Ziqiao Meng; Irwin King; |
325 | Continuous Piecewise-Affine Based Motion Model for Image Animation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, limited by the expressive power of the transformations used, these methods always produce poor results when the gap between the motion in the driving frame and the source image is large. To address this issue, we propose to model motion from the source image to the driving frame in highly-expressive diffeomorphism spaces. |
Hexiang Wang; Fengqi Liu; Qianyu Zhou; Ran Yi; Xin Tan; Lizhuang Ma; |
326 | Plug-In Diffusion Model for Sequential Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these approaches typically use the highest-score item in the corpus for user interest prediction, which neglects the user’s generalized preference contained within other items and thereby remains constrained by the data sparsity issue. To address this issue, this paper presents a novel Plug-in Diffusion Model for Recommendation (PDRec) framework, which employs the diffusion model as a flexible plugin to jointly take full advantage of the diffusion-generating user preferences on all items. |
Haokai Ma; Ruobing Xie; Lei Meng; Xin Chen; Xu Zhang; Leyu Lin; Zhanhui Kang; |
327 | Neural Physical Simulation with Multi-Resolution Hash Grid Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We contribute a neural simulation framework based on multi-resolution hash grid representation to introduce hierarchical consideration of global and local information, simultaneously. |
Haoxiang Wang; Tao Yu; Tianwei Yang; Hui Qiao; Qionghai Dai; |
328 | Dual-Channel Learning Framework for Drug-Drug Interaction Prediction Via Relation-Aware Heterogeneous Graph Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, effectively handling heterogeneous information present in both biomedical knowledge graphs and drug molecular graphs remains a challenge for improved performance of DDI prediction. To address these limitations, we propose a Transformer-based relatIon-aware Graph rEpresentation leaRning framework (TIGER) for DDI prediction. |
Xiaorui Su; Pengwei Hu; Zhu-Hong You; Philip S. Yu; Lun Hu; |
329 | Explaining Generalization Power of A DNN Using Interactive Concepts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this way, to some extent, we can consider such interactions as interactive concepts encoded by the DNN. Therefore, in this paper, we derive an analytic explanation of inconsistency of concepts of different complexities. |
Huilin Zhou; Hao Zhang; Huiqi Deng; Dongrui Liu; Wen Shen; Shih-Han Chan; Quanshi Zhang; |
330 | Learning Ultrametric Trees for Optimal Transport Regression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to find an optimal tree structure for a given discrete metric space so that the tree-Wasserstein distance approximates the optimal transport distance in the original space. |
Samantha Chen; Puoya Tabaghi; Yusu Wang; |
331 | Instance-Aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector. |
Yang Jiao; Zequn Jie; Shaoxiang Chen; Lechao Cheng; Jingjing Chen; Lin Ma; Yu-Gang Jiang; |
332 | Pay Attention to Target: Relation-Aware Temporal Consistency for Domain Adaptive Video Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To take full advantage of the information contained in the pseudo-labels and empower more effective supervision signals, we propose a coherent PAT network including a target domain focalizer and relation-aware temporal consistency. |
Huayu Mai; Rui Sun; Yuan Wang; Tianzhu Zhang; Feng Wu; |
333 | AltDiffusion: A Multilingual Text-to-Image Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing works only support limited language input, e.g., English, Chinese, and Japanese, leaving users beyond these languages underserved and blocking the global expansion of T2I models. Therefore, this paper presents AltDiffusion, a novel multilingual T2I diffusion model that supports eighteen different languages. |
Fulong Ye; Guang Liu; Xinya Wu; Ledell Wu; |
334 | On Optimal Tradeoffs Between EFX and Nash Welfare Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this work is to characterize the tradeoffs between two well-studied measures of fairness and efficiency — envy freeness up to any item (EFX) for fairness, and Nash welfare for efficiency — by saying, for given constants α and β, whether there exists an α-EFX allocation that guarantees a β-fraction of the maximum Nash welfare (β-MNW). |
Michal Feldman; Simon Mauras; Tomasz Ponitka; |
335 | HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: More importantly, we propose a novel spatial-temporal feature learning and fusion framework, termed ESTF, for event stream based human activity recognition. |
Xiao Wang; Zongzhen Wu; Bo Jiang; Zhimin Bao; Lin Zhu; Guoqi Li; Yaowei Wang; Yonghong Tian; |
336 | Labels Need Prompts Too: Mask Matching for Natural Language Understanding Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we incorporate the prompting methodology, which is widely used to enrich model input, into the label side for the first time. |
Bo Li; Wei Ye; Quansen Wang; Wen Zhao; Shikun Zhang; |
337 | Double-Bounded Optimal Transport for Advanced Clustering and Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Doubly Bounded Optimal Transport (DB-OT), which assumes that the target distribution is restricted within two boundaries instead of a fixed one, thus giving more freedom for the transport to find solutions. |
Liangliang Shi; Zhaoqi Shen; Junchi Yan; |
338 | DVSAI: Diverse View-Shared Anchors Based Incomplete Multi-View Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For generating view-shared anchors with multi-dimension and multi-size for IMVC, we design a novel framework called Diverse View-Shared Anchors based Incomplete multi-view clustering (DVSAI). |
Shengju Yu; Siwei Wang; Pei Zhang; Miao Wang; Ziming Wang; Zhe Liu; Liming Fang; En Zhu; Xinwang Liu; |
339 | Compound Text-Guided Prompt Tuning Via Image-Adaptive Cues Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, previous works require category names to be included within prompts, exhibiting subpar performance when dealing with ambiguous category names. To address these shortcomings, we propose Compound Text-Guided Prompt Tuning (TGP-T) that significantly reduces resource demand while achieving superior performance. |
Hao Tan; Jun Li; Yizhuang Zhou; Jun Wan; Zhen Lei; Xiangyu Zhang; |
340 | Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Tabular data analysis is crucial in various fields, and large language models show promise in this area. |
Xinyi He; Mengyu Zhou; Xinrun Xu; Xiaojun Ma; Rui Ding; Lun Du; Yan Gao; Ran Jia; Xu Chen; Shi Han; Zejian Yuan; Dongmei Zhang; |
341 | Gated Attention Coding for Training High-Performance and Efficient Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most existing deep SNNs rely on direct coding that generates powerless spike representation and lacks the temporal dynamics inherent in human vision. Hence, we introduce Gated Attention Coding (GAC), a plug-and-play module that leverages the multi-dimensional gated attention unit to efficiently encode inputs into powerful representations before feeding them into the SNN architecture. |
Xuerui Qiu; Rui-Jie Zhu; Yuhong Chou; Zhaorui Wang; Liang-Jian Deng; Guoqi Li; |
342 | How to Overcome Curse-of-Dimensionality for Out-of-Distribution Detection? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the promise, distance-based methods can suffer from the curse-of-dimensionality problem, which limits the efficacy in high dimensional feature space. To combat this problem, we propose a novel framework, Subspace Nearest Neighbor (SNN), for OOD detection. |
Soumya Suvra Ghosal; Yiyou Sun; Yixuan Li; |
343 | SCTNet: Single-Branch CNN with Transformer Semantic Information for Real-Time Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the additional branch incurs undesirable computational overhead and slows inference speed. To eliminate this dilemma, we propose SCTNet, a single branch CNN with transformer semantic information for real-time segmentation. |
Zhengze Xu; Dongyue Wu; Changqian Yu; Xiangxiang Chu; Nong Sang; Changxin Gao; |
344 | Sketch and Refine: Towards Fast and Accurate Lane Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a “Sketch-and-Refine” paradigm that utilizes the merits of both keypoint-based and proposal-based methods. |
Chao Chen; Jie Liu; Chang Zhou; Jie Tang; Gangshan Wu; |
345 | Block Image Compressive Sensing with Local and Global Information Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To directly confront the communication problem among blocks and effectively resolve it, we propose a novel approach called Block Reconstruction with Blocks’ Communication Network (BRBCN). |
Xiaoyu Kong; Yongyong Chen; Feng Zheng; Zhenyu He; |
346 | Detect Any Keypoints: An Efficient Light-Weight Few-Shot Keypoint Detector Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such a separation of modulation-detection makes the model heavy and slow when the number of keypoints increases. To overcome this issue, we design a novel light-weight detector which combines modulation and detection into one step, with the goal of reducing the computational cost without a drop in performance. |
Changsheng Lu; Piotr Koniusz; |
347 | Impartial Adversarial Distillation: Addressing Biased Data-Free Knowledge Distillation Via Adaptive Constrained Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigated a pragmatic yet under-explored problem: how to perform DFKD from a teacher model pretrained from imbalanced data. |
Dongping Liao; Xitong Gao; Chengzhong Xu; |
348 | Graph Reasoning Transformers for Knowledge-Aware Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the modality gap between natural language text and KGs has become a challenging obstacle when aligning and fusing cross-modal information. To address these challenges, we propose a novel knowledge-augmented question answering (QA) model, namely, Graph Reasoning Transformers (GRT). |
Ruilin Zhao; Feng Zhao; Liang Hu; Guandong Xu; |
349 | Roll with The Punches: Expansion and Shrinkage of Soft Label Selection for Semi-supervised Fine-Grained Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The increased recognition difficulty on fine-grained unlabeled data spells disaster for pseudo-labeling accuracy, resulting in poor performance of the SSL model. To tackle this challenge, we propose Soft Label Selection with Confidence-Aware Clustering based on Class Transition Tracking (SoC) by reconstructing the pseudo-label selection process by jointly optimizing Expansion Objective and Shrinkage Objective, which is based on a soft label manner. |
Yue Duan; Zhen Zhao; Lei Qi; Luping Zhou; Lei Wang; Yinghuan Shi; |
350 | Diverse and Stable 2D Diffusion Guided Text to 3D Generation with Noise Recalibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Secondly, during training, SDS loss may cause the generated content to overfit and collapse, limiting the model’s ability to learn intricate texture details. To overcome these challenges, we propose a novel approach called Noise Recalibration algorithm. |
Xiaofeng Yang; Fayao Liu; Yi Xu; Hanjing Su; Qingyao Wu; Guosheng Lin; |
351 | Latent Space Editing in Transformer-Based Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an editing space, which we call u-space, that can be manipulated in a controllable, accumulative, and composable manner. |
Vincent Tao Hu; Wei Zhang; Meng Tang; Pascal Mettes; Deli Zhao; Cees Snoek; |
352 | SAM-PARSER: Fine-Tuning SAM Efficiently By Parameter Space Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing fine-tuning methods attempt to bridge the gaps among different scenarios by introducing a set of new parameters to modify SAM’s original parameter space. Unlike these works, in this paper, we propose fine-tuning SAM efficiently by parameter space reconstruction (SAM-PARSER), which introduces nearly zero trainable parameters during fine-tuning. |
Zelin Peng; Zhengqin Xu; Zhilin Zeng; Xiaokang Yang; Wei Shen; |
353 | TOP-ReID: Multi-Spectral Object Re-identification with Token Permutation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In addition, most current Transformer-based ReID methods only utilize the global feature of class tokens to achieve the holistic retrieval, ignoring the local discriminative ones. To address the above issues, we step further to utilize all the tokens of Transformers and propose a cyclic token permutation framework for multi-spectral object ReID, dubbed TOP-ReID. |
Yuhao Wang; Xuehu Liu; Pingping Zhang; Hu Lu; Zhengzheng Tu; Huchuan Lu; |
354 | SOGDet: Semantic-Occupancy Guided Multi-View 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel approach called SOGDet (Semantic-Occupancy Guided Multi-view 3D Object Detection), that leverages a 3D semantic-occupancy branch to improve the accuracy of 3D object detection. |
Qiu Zhou; Jinming Cao; Hanchao Leng; Yifang Yin; Yu Kun; Roger Zimmermann; |
355 | Removing Interference and Recovering Content Imaginatively for Visible Watermark Removal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods often implement watermark component removal and background restoration tasks within a singular branch, leading to residual watermarks in the predictions and ignoring cases where watermarks heavily obscure the background. To address these limitations, this study introduces the Removing Interference and Recovering Content Imaginatively (RIRCI) framework. |
Yicheng Leng; Chaowei Fang; Gen Li; Yixiang Fang; Guanbin Li; |
356 | Enhancing Cognitive Diagnosis Using Un-interacted Exercises: A Collaboration-Aware Mixed Sampling Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This oversight results in diminished performance when these models are applied to comprehensive datasets. In response to this gap, we present the Collaborative-aware Mixed Exercise Sampling (CMES) framework, which can effectively exploit the information present in un-interacted exercises linked to un-interacted knowledge concepts. |
Haiping Ma; Changqian Wang; Hengshu Zhu; Shangshang Yang; Xiaoming Zhang; Xingyi Zhang; |
357 | MCL-NER: Cross-Lingual Named Entity Recognition Via Multi-View Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Multi-view Contrastive Learning for Cross-lingual Named Entity Recognition (MCL-NER). |
Ying Mo; Jian Yang; Jiahao Liu; Qifan Wang; Ruoyu Chen; Jingang Wang; Zhoujun Li; |
358 | Transitivity-Preserving Graph Representation Learning for Bridging Local Connectivity and Role-Based Similarity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Unified Graph Transformer Networks (UGT) that effectively integrate local and global structural information into fixed-length vector representations. |
Van Thuy Hoang; O-Joun Lee; |
359 | Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study how to combine the efficiency and scalability of MIM with the ability of ID to perform downstream classification in the absence of large amounts of labeled data. |
Johannes Lehner; Benedikt Alkin; Andreas Fürst; Elisabeth Rumetshofer; Lukas Miklautz; Sepp Hochreiter; |
360 | Eliciting Honest Information from Authors Using Sequential Review Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a sequential review mechanism that can truthfully elicit the ranking information from authors while only assuming the agent’s utility is increasing with respect to the true quality of her accepted papers. |
Yichi Zhang; Grant Schoenebeck; Weijie Su; |
361 | PoseGen: Learning to Generate 3D Human Pose Dataset with NeRF Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes an end-to-end framework for generating 3D human pose datasets using Neural Radiance Fields (NeRF). |
Mohsen Gholami; Rabab Ward; Z. Jane Wang; |
362 | Clarifying The Behavior and The Difficulty of Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper provides conceptual and analytic insights into the difficulty of adversarial training via a simple theoretical study, where we derive an approximate dynamics of a recursive multi-step attack in a simple setting. |
Xu Cheng; Hao Zhang; Yue Xin; Wen Shen; Quanshi Zhang; |
363 | LMD: Faster Image Reconstruction with Latent Masking Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, this paper presents LMD, a faster image reconstruction framework with Latent Masking Diffusion. |
Zhiyuan Ma; Zhihuan Yu; Jianjun Li; Bowen Zhou; |
364 | AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, previous works mainly focus on discreteness-sensitive instructions such as adding, removing or replacing specific objects, background elements or global styles (i.e., “hard editing”), while generally ignoring subject-binding but semantically fine-changing continuity-sensitive instructions such as actions, poses or adjectives, and so on (i.e., “soft editing”), which hampers generative AI from generating user-customized visual contents. To mitigate this predicament, we propose a spatio-temporal guided adaptive editing algorithm AdapEdit, which realizes adaptive image editing by introducing a soft-attention strategy to dynamically vary the guiding degree from the editing conditions to visual pixels from both temporal and spatial perspectives. |
Zhiyuan Ma; Guoli Jia; Bowen Zhou; |
365 | EDA: Evolving and Distinct Anchors for Multimodal Motion Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel paradigm, named Evolving and Distinct Anchors (EDA), to define the positive and negative components for multimodal motion prediction based on mixture models. |
Longzhong Lin; Xuewu Lin; Tianwei Lin; Lichao Huang; Rong Xiong; Yue Wang; |
366 | DINGO: Towards Diverse and Fine-Grained Instruction-Following Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing evaluation methods focus on general skills, they suffer from two main shortcomings, i.e., lack of fine-grained task-level evaluation and reliance on singular instruction expression. To address these problems, this paper introduces DINGO, a fine-grained and diverse instruction-following evaluation dataset that has two main advantages: (1) DINGO is based on a manually annotated, fine-grained and multi-level category tree with 130 nodes derived from real-world user requests; (2) DINGO includes diverse instructions, generated by both GPT-4 and human experts. |
Zihui Gu; Xingwu Sun; Fengzong Lian; Zhanhui Kang; Chengzhong Xu; Ju Fan; |
367 | Stitching Segments and Sentences Towards Generalization in Video-Text Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This limits the model’s ability to perform fine-grained matching and generalization, especially for tasks that involve selecting segments in long videos based on query texts. To address this issue, we propose a novel stitching and matching pre-text task for video-language pre-training that encourages fine-grained interactions between modalities. |
Fan Ma; Xiaojie Jin; Heng Wang; Jingjia Huang; Linchao Zhu; Yi Yang; |
368 | MolTailor: Tailoring Chemical Molecular Representation to Specific Tasks Via Text Prompts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Ignoring this would potentially compromise the training efficiency and predictive accuracy. To address this issue, we propose a novel approach, which treats language models as an agent and molecular pretraining models as a knowledge base. |
Haoqiang Guo; Sendong Zhao; Haochun Wang; Yanrui Du; Bing Qin; |
369 | VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To deal with ZSOC, preceding studies proposed a two-stage pipeline: discovering exemplars and counting. |
Seunggu Kang; WonJun Moon; Euiyeon Kim; Jae-Pil Heo; |
370 | Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an efficient MoE architecture with weight sharing across the experts. |
Rongyu Zhang; Yulin Luo; Jiaming Liu; Huanrui Yang; Zhen Dong; Denis Gudovskiy; Tomoyuki Okuno; Yohei Nakata; Kurt Keutzer; Yuan Du; Shanghang Zhang; |
371 | BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve upon them, the present study introduces BLIVA: an augmented version of InstructBLIP with Visual Assistant. |
Wenbo Hu; Yifan Xu; Yi Li; Weiyue Li; Zeyuan Chen; Zhuowen Tu; |
372 | Towards Detailed Text-to-Motion Synthesis Via Basic-to-Advanced Hierarchical Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods conduct diffusion processes either on the raw data distribution or the low-dimensional latent space, which typically suffer from the problem of modality inconsistency or detail scarcity. To tackle this problem, we propose a novel Basic-to-Advanced Hierarchical Diffusion Model, named B2A-HDM, to collaboratively exploit low-dimensional and high-dimensional diffusion models for high quality detailed motion synthesis. |
Zhenyu Xie; Yang Wu; Xuehao Gao; Zhongqian Sun; Wei Yang; Xiaodan Liang; |
373 | Unveiling Implicit Deceptive Patterns in Multi-Modal Fake News Via Neuro-Symbolic Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: In the current Internet landscape, the rampant spread of fake news, particularly in the form of multi-modal content, poses a great social threat. While automatic multi-modal fake … |
Yiqi Dong; Dongxiao He; Xiaobao Wang; Youzhu Jin; Meng Ge; Carl Yang; Di Jin; |
374 | SFC: Shared Feature Calibration in Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This degrades pseudo-label quality and further influences final semantic segmentation performance. To address this issue, we propose a Shared Feature Calibration (SFC) method for CAM generation. |
Xinqiao Zhao; Feilong Tang; Xiaoyang Wang; Jimin Xiao; |
375 | SDGAN: Disentangling Semantic Manipulation for Facial Attribute Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This challenge primarily arises from the strong correlations between different attributes and the interplay between attributes and identity. In this paper, we propose Semantic Disentangled GAN (SDGAN), a novel method addressing this challenge. |
Wenmin Huang; Weiqi Luo; Jiwu Huang; Xiaochun Cao; |
376 | Distribution Matching for Multi-Task Learning of Classification Tasks: A Large-Scale Study on Faces & Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, collecting such annotations is prohibitive in many real applications, and cannot benefit from datasets available for individual tasks. In this work, we challenge this setup and show that MTL can be successful with classification tasks with little or non-overlapping annotations, or when there is a big discrepancy in the size of labeled data per task. |
Dimitrios Kollias; Viktoriia Sharmanska; Stefanos Zafeiriou; |
377 | Deep Incomplete Multi-View Learning Network with Insufficient Label Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Learning for such simultaneous lack of feature and label is crucial but rarely studied. To tackle these problems, we propose a novel Deep Incomplete Multi-view Learning Network (DIMvLN) by incorporating graph networks and semi-supervised learning in this paper. |
Zhangqi Jiang; Tingjin Luo; Xinyan Liang; |
378 | TAPE: Leveraging Agent Topology for Cooperative Multi-Agent Policy Gradient Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While using individual critics for policy updates can avoid this issue, they severely limit cooperation among agents. To address this issue, we propose an agent topology framework, which decides whether other agents should be considered in policy gradient and achieves compromise between facilitating cooperation and alleviating the CDM issue. |
Xingzhou Lou; Junge Zhang; Timothy J. Norman; Kaiqi Huang; Yali Du; |
379 | OctOcc: High-Resolution 3D Occupancy Prediction with Octree Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing 3D occupancy prediction methods contend with the constraint of low-resolution 3D voxel features arising from the limitation of computational memory. To address this limitation and achieve a more fine-grained representation of 3D scenes, we propose OctOcc, a novel octree-based approach for 3D semantic occupancy prediction. |
Wenzhe Ouyang; Xiaolin Song; Bailan Feng; Zenglin Xu; |
380 | Hierarchical and Incremental Structural Entropy Minimization for Unsupervised Social Event Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we address social event detection via graph structural entropy (SE) minimization. |
Yuwei Cao; Hao Peng; Zhengtao Yu; Philip S. Yu; |
381 | Mono3DVG: 3D Visual Grounding in Monocular Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a novel task of 3D visual grounding in monocular RGB images using language descriptions with both appearance and geometry information. |
Yang Zhan; Yuan Yuan; Zhitong Xiong; |
382 | Translate Meanings, Not Just Words: IdiomKB’s Role in Optimizing Idiomatic Translation with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a multilingual idiom KB (IdiomKB) developed using large LMs to address this. |
Shuang Li; Jiangjie Chen; Siyu Yuan; Xinyi Wu; Hao Yang; Shimin Tao; Yanghua Xiao; |
383 | Self-Supervised Likelihood Estimation with Energy Guidance for Anomaly Segmentation in Urban Scenes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the contrary, in this paper, we exploit the strong context-dependent nature of the segmentation task and design an energy-guided self-supervised framework for anomaly segmentation, which optimizes an anomaly head by maximizing the likelihood of self-generated anomaly pixels. |
Yuanpeng Tu; Yuxi Li; Boshen Zhang; Liang Liu; Jiangning Zhang; Yabiao Wang; Cairong Zhao; |
384 | Improving Panoptic Narrative Grounding By Harnessing Semantic Relationships and Visual Confirmation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they often neglect the modeling of semantic and visual relationships between phrase-level instances, limiting their ability for complex multi-modal reasoning in PNG. To tackle this issue, we propose XPNG, a “differentiation-refinement-localization” reasoning paradigm for accurately locating instances or regions. |
Tianyu Guo; Haowei Wang; Yiwei Ma; Jiayi Ji; Xiaoshuai Sun; |
385 | Beyond Grounding: Extracting Fine-Grained Event Hierarchies Across Modalities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the task of extracting event hierarchies from multimodal (video and text) data to capture how the same event manifests itself in different modalities at different semantic levels. |
Hammad Ayyubi; Christopher Thomas; Lovish Chum; Rahul Lokesh; Long Chen; Yulei Niu; Xudong Lin; Xuande Feng; Jaywon Koo; Sounak Ray; Shih-Fu Chang; |
386 | Can Large Language Models Serve As Rational Players in Game Theory? A Systematic Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this research, we endeavor to systematically analyze LLMs in the context of game theory. |
Caoyun Fan; Jindou Chen; Yaohui Jin; Hao He; |
387 | Monitoring of Perception Systems: Deterministic, Probabilistic, and Learning-Based Fault Detection and Identification (Abstract Reprint) Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formalize the problem of runtime fault detection and identification in perception systems and present a framework to model diagnostic information using a diagnostic graph. |
Pasquale Antonante; Heath Nilsen; Luca Carlone; |
388 | Exploiting Label Skews in Federated Learning with Model Concatenation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Among different non-IID types, label skews have been challenging and common in image classification and other tasks. Instead of averaging the local models in most previous studies, we propose FedConcat, a simple and effective approach that concatenates these local models as the base of the global model to effectively aggregate the local knowledge. |
Yiqun Diao; Qinbin Li; Bingsheng He; |
389 | DRF: Improving Certified Robustness Via Distributional Robustness Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we provide a novel framework called DRF, which connects AT-based RS methods with distributional robustness (DR), and show that these methods are special cases of their counterparts in our framework. |
Zekai Wang; Zhengyu Zhou; Weiwei Liu; |
390 | Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Protein engineering is typically conducted through an iterative process of adding mutations to the wild-type or lead sequences, recombination of mutations, and running new rounds of screening. To enhance the efficiency of such a process, we propose a tree search-based bandit learning method, which expands a tree starting from the initial sequence with the guidance of a bandit machine learning model. |
Jiahao Qiu; Hui Yuan; Jinghong Zhang; Wentao Chen; Huazheng Wang; Mengdi Wang; |
391 | Deep Quantum Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we efficiently train novel end-to-end deep quantum error decoders. |
Yoni Choukroun; Lior Wolf; |
392 | Optimistic Model Rollouts for Pessimistic Offline Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We initially observed the potential benefits of optimism brought by encouraging more OOD rollouts. Motivated by this observation, we present ORPO, a simple yet effective model-based offline RL framework. |
Yuanzhao Zhai; Yiying Li; Zijian Gao; Xudong Gong; Kele Xu; Dawei Feng; Ding Bo; Huaimin Wang; |
393 | Benchmarking Large Language Models in Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we systematically investigate the impact of Retrieval-Augmented Generation on large language models. |
Jiawei Chen; Hongyu Lin; Xianpei Han; Le Sun; |
394 | Adversarial Socialbots Modeling Based on Structural Information Principles Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the rapid advancement of reactive detectors, the exploration of adversarial socialbot modeling remains incomplete, significantly hindering the development of proactive detectors. To address this issue, we propose a mathematical Structural Information principles-based Adversarial Socialbots Modeling framework, namely SIASM, to enable more accurate and effective modeling of adversarial behaviors. |
Xianghua Zeng; Hao Peng; Angsheng Li; |
395 | Sync-NeRF: Generalizing Dynamic NeRFs to Unsynchronized Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It happens because they employ a single latent embedding for a frame while the multi-view images at the same frame were actually captured at different moments. To address this limitation, we introduce time offsets for individual unsynchronized videos and jointly optimize the offsets with NeRF. |
Seoha Kim; Jeongmin Bae; Youngsik Yun; Hahyun Lee; Gun Bang; Youngjung Uh; |
396 | Ghost Noise for Regularizing Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by our analysis, we propose a new regularization technique called Ghost Noise Injection (GNI) that imitates the noise in GBN without incurring the detrimental train-test discrepancy effects of small batch training. |
Atli Kosson; Dongyang Fan; Martin Jaggi; |
397 | DIUSum: Dynamic Image Utilization for Multimodal Summarization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, whether intuitively or empirically, not all images can improve summary quality. Therefore, we propose a novel Dynamic Image Utilization framework for multimodal Summarization (DIUSum) to select and utilize valuable images for summarization. |
Min Xiao; Junnan Zhu; Feifei Zhai; Yu Zhou; Chengqing Zong; |
398 | RGMComm: Return Gap Minimization Via Discrete Communications in Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To minimize the return gap, we propose the Return-Gap-Minimization Communication (RGMComm) algorithm, which is a surprisingly simple design of discrete message generation functions and is integrated with reinforcement learning through the utilization of a novel Regularized Information Maximization loss function, which incorporates cosine-distance as the clustering metric. |
Jingdi Chen; Tian Lan; Carlee Joe-Wong; |
399 | MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes MuLTI, a highly accurate and efficient video-and-language understanding model that achieves efficient and effective feature fusion and rapid adaptation to downstream tasks. |
Jiaqi Xu; Bo Liu; Yunkuo Chen; Mengli Cheng; Xing Shi; |
400 | A Fixed-Point Approach to Unified Prompt-Based Counting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to establish a comprehensive prompt-based counting framework capable of generating density maps for concerned objects indicated by various prompt types, such as box, point, and text. |
Wei Lin; Antoni B. Chan; |
401 | Learning to Unlearn: Instance-Wise Unlearning for Pre-trained Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we consider instance-wise unlearning, of which the goal is to delete information on a set of instances from a pre-trained model, by either misclassifying each instance away from its original prediction or relabeling the instance to a different label. |
Sungmin Cha; Sungjun Cho; Dasol Hwang; Honglak Lee; Taesup Moon; Moontae Lee; |
402 | SocialStigmaQA: A Benchmark to Uncover Stigma Amplification in Generative Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a comprehensive benchmark that is meant to capture the amplification of social bias, via stigmas, in generative language models. |
Manish Nagireddy; Lamogha Chiazor; Moninder Singh; Ioana Baldini; |
403 | TaskLAMA: Probing The Complex Task Understanding of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We probe how accurately SCTD can be done with the knowledge extracted from pre-trained Large Language Models (LLMs). We introduce a new high-quality human-annotated dataset for this problem and novel metrics to fairly assess performance of LLMs against several baselines. |
Quan Yuan; Mehran Kazemi; Xin Xu; Isaac Noble; Vaiva Imbrasaite; Deepak Ramachandran; |
404 | Cross-Gate MLP with Protein Complex Invariant Embedding Is A One-Shot Antibody Designer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple yet effective model that can co-design 1D sequences and 3D structures of CDRs in a one-shot manner. |
Cheng Tan; Zhangyang Gao; Lirong Wu; Jun Xia; Jiangbin Zheng; Xihong Yang; Yue Liu; Bozhen Hu; Stan Z. Li; |
405 | Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the task of seed-guided fine-grained entity typing in science and engineering domains, which takes the name and a few seed entities for each entity type as the only supervision and aims to classify new entity mentions into both seen and unseen types (i.e., those without seed entities). |
Yu Zhang; Yunyi Zhang; Yanzhen Shen; Yu Deng; Lucian Popa; Larisa Shwartz; ChengXiang Zhai; Jiawei Han; |
406 | CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose CORECODE, a dataset that contains abundant commonsense knowledge manually annotated on dyadic dialogues, to evaluate the commonsense reasoning and commonsense conflict detection capabilities of Chinese LLMs. |
Dan Shi; Chaobin You; Jiantao Huang; Taihao Li; Deyi Xiong; |
407 | Enhancing Multi-Label Classification Via Dynamic Label-Order Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More critically, these methods can cause the model to rigidly memorize training order, resulting in missing labels during inference. In light of these limitations, this paper proposes a dynamic label-order learning approach that adaptively learns a label order for each sample. |
Jiangnan Li; Yice Zhang; Shiwei Chen; Ruifeng Xu; |
408 | Bootstrapping Large Language Models for Radiology Report Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing LLMs are pre-trained on general data, and suffer from the same problem as conventional approaches, caused by the knowledge gap between the general and medical domains, if they are applied to RRG. Therefore, in this paper, we propose an approach to bootstrapping LLMs for RRG with an in-domain instance induction and a coarse-to-fine decoding process. |
Chang Liu; Yuanhe Tian; Weidong Chen; Yan Song; Yongdong Zhang; |
409 | Tackling Vision Language Tasks Through Learning Inner Monologues Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The second approach presents decent performance, but feature alignment usually requires large amounts of training data and lacks interpretability. To tackle this dilemma, we propose a novel approach, Inner Monologue Multi-Modal Optimization (IMMO), to solve complex vision language problems by simulating Inner Monologue, a cognitive process in which an individual engages in silent verbal communication with themselves. |
Diji Yang; Kezhen Chen; Jinmeng Rao; Xiaoyuan Guo; Yawen Zhang; Jie Yang; Yi Zhang; |
410 | Towards Robust Image Stitching: An Adaptive Resistance Learning Against Compatible Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we introduce a stitching-oriented attack (SoA), tailored to amplify the alignment loss within overlapping regions, thereby targeting the feature matching procedure. |
Zhiying Jiang; Xingyuan Li; Jinyuan Liu; Xin Fan; Risheng Liu; |
411 | A Robust Mutual-Reinforcing Framework for 3D Multi-Modal Medical Image Fusion Based on Visual-Semantic Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes a robust 3D medical image fusion framework to establish a mutual-reinforcing mechanism between visual fusion and lesion segmentation, achieving their double improvement. |
Hao Zhang; Xuhui Zuo; Huabing Zhou; Tao Lu; Jiayi Ma; |
412 | Text Image Inpainting Via Global Structure-Guided Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Notably, current inpainting techniques often fail to adequately address this problem and have difficulties restoring accurate text images along with reasonable and consistent styles. Formulating this as an open problem of text image inpainting, this paper aims to build a benchmark to facilitate its study. |
Shipeng Zhu; Pengfei Fang; Chenjie Zhu; Zuoyan Zhao; Qiang Xu; Hui Xue; |
413 | LDS2AE: Local Diffusion Shared-Specific Autoencoder for Multimodal Remote Sensing Image Classification with Arbitrary Missing Modalities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a local diffusion shared-specific autoencoder (LDS2AE), which solves the classification of arbitrary missing modalities with a single model. |
Jiahui Qu; Yuanbo Yang; Wenqian Dong; Yufei Yang; |
414 | TriSampler: A Better Negative Sampling Principle for Dense Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This exploration culminates in the unveiling of the quasi-triangular principle, a novel framework that elucidates the triangular-like interplay between query, positive document, and negative document. Fueled by this guiding principle, we introduce TriSampler, a straightforward yet highly effective negative sampling method. |
Zhen Yang; Zhou Shao; Yuxiao Dong; Jie Tang; |
415 | AvatarVerse: High-Quality & Stable 3D Avatar Creation from Text and Pose Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we introduce a 2D diffusion model conditioned on DensePose signal to establish 3D pose control of avatars through 2D images, which enhances view consistency from partially observed scenarios. |
Huichao Zhang; Bowen Chen; Hao Yang; Liao Qu; Xu Wang; Li Chen; Chao Long; Feida Zhu; Daniel Du; Min Zheng; |
416 | Ada-Retrieval: An Adaptive Multi-Round Retrieval Paradigm for Sequential Recommendations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Ada-Retrieval, an adaptive multi-round retrieval paradigm for recommender systems that iteratively refines user representations to better capture potential candidates in the full item space. |
Lei Li; Jianxun Lian; Xiao Zhou; Xing Xie; |
417 | MKG-FENN: A Multimodal Knowledge Graph Fused End-to-End Neural Network for Accurate Drug–Drug Interaction Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Different from them, this paper proposes a Multimodal Knowledge Graph Fused End-to-end Neural Network (MKG-FENN) that consists of two main parts: multimodal knowledge graph (MKG) and fused end-to-end neural network (FENN). |
Di Wu; Wu Sun; Yi He; Zhong Chen; Xin Luo; |
418 | COMMA: Co-articulated Multi-Modal Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is because the essential generic knowledge learned in the pretraining stage is partly forgotten in the fine-tuning process. In this paper, we propose Co-Articulated Multi-Modal Learning (COMMA) to handle the above limitations. |
Lianyu Hu; Liqing Gao; Zekang Liu; Chi-Man Pun; Wei Feng; |
419 | Task-Disruptive Background Suppression for Few-Shot Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they overlook the characteristics of the background that generally contains various types of objects. In this paper, we highlight this characteristic of background which can bring problematic cases as follows: (1) when the query and support backgrounds are dissimilar and (2) when objects in the support background are similar to the target object in the query. |
Suho Park; SuBeen Lee; Sangeek Hyun; Hyun Seok Seong; Jae-Pil Heo; |
420 | Fully Data-Driven Pseudo Label Estimation for Pointly-Supervised Panoptic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the limitation of hand-crafted rules, we estimate pseudo labels with a fully data-driven pseudo label branch, which is optimized by point labels end-to-end and predicts more accurate pseudo labels than previous methods. |
Jing Li; Junsong Fan; Yuran Yang; Shuqi Mei; Jun Xiao; Zhaoxiang Zhang; |
421 | Motif-Aware Riemannian Graph Neural Network with Generative-Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we present a novel Motif-aware Riemannian model with Generative-Contrastive learning (MotifRGC), which conducts a minmax game in Riemannian manifold in a self-supervised manner. |
Li Sun; Zhenhao Huang; Zixi Wang; Feiyang Wang; Hao Peng; Philip S. Yu; |
422 | Fine-Grained Prototypes Distillation for Few-Shot Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose to distill the most representative support features into fine-grained prototypes. |
Zichen Wang; Bo Yang; Haonan Yue; Zhenghao Ma; |
423 | Generative Model Perception Rectification Algorithm for Trade-Off Between Diversity and Quality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Abnormal perception in generative models is typically caused by two factors: inadequate model structure and imbalanced data distribution. In response to this issue, we propose the dynamic model perception rectification algorithm (DMPRA) for generalized generative models. |
Guipeng Lan; Shuai Xiao; Jiachen Yang; Jiabao Wen; |
424 | Union Subgraph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we empower GNNs by injecting neighbor-connectivity information extracted from a new type of substructure. |
Jiaxing Xu; Aihu Zhang; Qingtian Bian; Vijay Prakash Dwivedi; Yiping Ke; |
425 | Harnessing The Power of SVD: An SVA Module for Enhanced Signal Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a Singular Value decomposition-based Attention (SVA) is proposed to explore the structure of signal data for adaptively enhancing intrinsic features. |
Lei Zhai; Shuyuan Yang; Yitong Li; Zhixi Feng; Zhihao Chang; Quanwei Gao; |
426 | SIG: Speaker Identification in Literature Via Prompt-Based Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a simple and effective approach SIG, a generation-based method that verbalizes the task and quotation input based on designed prompt templates, which also enables easy integration of other auxiliary tasks that further bolster the speaker identification performance. |
Zhenlin Su; Liyan Xu; Jin Xu; Jiangnan Li; Mingdu Huangfu; |
427 | Accelerating Adversarially Robust Model Selection for Deep Neural Networks Via Racing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In light of this, it becomes challenging to select from a given set of neural network models the one that is best in terms of robust accuracy, i.e., the fraction of instances for which the model is known to be robust against adversarial perturbations, especially when given limited computing resources. To tackle this problem, we propose a racing method specifically adapted to the domain of robustness verification. |
Matthias König; Holger H. Hoos; Jan N. van Rijn; |
428 | MDFL: Multi-Domain Diffusion-Driven Feature Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the masking texture effect observed in the human visual system, we present a multi-domain diffusion-driven feature learning network (MDFL), a scheme to redefine the effective information domain that the model really focuses on. |
Daixun Li; Weiying Xie; Jiaqing Zhang; Yunsong Li; |
429 | Controllable Mind Visual Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel approach, termed as Controllable Mind Visual Diffusion Model (CMVDM). |
Bohan Zeng; Shanglin Li; Xuhui Liu; Sicheng Gao; Xiaolong Jiang; Xu Tang; Yao Hu; Jianzhuang Liu; Baochang Zhang; |
430 | Learning Encodings for Constructive Neural Combinatorial Optimization Needs to Regret Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article proposes a novel regret-based mechanism for an advanced solution construction process. |
Rui Sun; Zhi Zheng; Zhenkun Wang; |
431 | E2HQV: High-Quality Video Generation from Event Camera Via Theory-Inspired Model-Aided Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose E2HQV, a novel E2V paradigm designed to produce high-quality video frames from events. |
Qiang Qu; Yiran Shen; Xiaoming Chen; Yuk Ying Chung; Tongliang Liu; |
432 | Towards Human-like Learning from Relational Structured Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this talk, we present our recent attempts towards human-like learning from relational structured data. |
Quanming Yao; |
433 | Can Large Language Models Understand Real-World Complex Instructions? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing benchmarks are insufficient to assess LLMs’ ability to understand complex instructions, as they are close-ended and simple. To bridge this gap, we propose CELLO, a benchmark for evaluating LLMs’ ability to follow complex instructions systematically. |
Qianyu He; Jie Zeng; Wenhao Huang; Lina Chen; Jin Xiao; Qianxi He; Xunzhe Zhou; Jiaqing Liang; Yanghua Xiao; |
434 | Concealing Sensitive Samples Against Gradient Leakage in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This creates a vulnerability that an adversary can exploit to reconstruct the sensitive data. Building upon this insight, we present a simple, yet effective defense strategy that obfuscates the gradients of the sensitive data with concealed samples. |
Jing Wu; Munawar Hayat; Mingyi Zhou; Mehrtash Harandi; |
435 | Multi-Prompts Learning with Cross-Modal Alignment for Attribute-Based Person Re-identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present a new framework called Multi-Prompts ReID (MP-ReID), based on prompt learning and language models, to fully dip fine attributes to assist ReID task. |
Yajing Zhai; Yawen Zeng; Zhiyong Huang; Zheng Qin; Xin Jin; Da Cao; |
436 | A Local-Ascending-Global Learning Strategy for Brain-Computer Interface Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel approach called the Local-Ascending-Global Learning Strategy (LAG) to uncover higher-level latent topological patterns among functional brain regions. |
Dongrui Gao; Haokai Zhang; Pengrui Li; Tian Tang; Shihong Liu; Zhihong Zhou; Shaofei Ying; Ye Zhu; Yongqing Zhang; |
437 | Noise-Aware Image Captioning with Progressively Exploring Mismatched Words Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike traditional noisy label learning, the key challenge in processing noisy image-text pairs is to finely identify the mismatched words to make the most use of trustworthy information in the text, rather than coarsely weighing the entire examples. To tackle this challenge, we propose a Noise-aware Image Captioning method (NIC) to adaptively mitigate the erroneous guidance from noise by progressively exploring mismatched words. |
Zhongtian Fu; Kefei Song; Luping Zhou; Yang Yang; |
438 | Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel context-dependent mapping network, named Context-I2W, for adaptively converting description-relevant Image information into a pseudo-word token composed of the description for accurate ZS-CIR. |
Yuanmin Tang; Jing Yu; Keke Gai; Jiamin Zhuang; Gang Xiong; Yue Hu; Qi Wu; |
439 | Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to explore Self- and Cross-Triplet Correlations (SCTC) for HOI detection. |
Weibo Jiang; Weihong Ren; Jiandong Tian; Liangqiong Qu; Zhiyong Wang; Honghai Liu; |
440 | Semi-supervised Class-Agnostic Motion Prediction with Pseudo Label Regeneration and BEVMix Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Class-agnostic motion prediction methods aim to comprehend motion within open-world scenarios, holding significance for autonomous driving systems. |
Kewei Wang; Yizheng Wu; Zhiyu Pan; Xingyi Li; Ke Xian; Zhe Wang; Zhiguo Cao; Guosheng Lin; |
441 | Hand-Centric Motion Refinement for 3D Hand-Object Interaction Via Hierarchical Spatial-Temporal Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although grasp tracking or object manipulation synthesis can produce coarse hand motion, this kind of motion is inevitably noisy and full of jitter. To address this problem, we propose a data-driven method for coarse motion refinement. |
Yuze Hao; Jianrong Zhang; Tao Zhuo; Fuan Wen; Hehe Fan; |
442 | High-Fidelity Diffusion-Based Image Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Extensive experiments demonstrate that our proposed framework and training strategy achieve high-fidelity reconstruction and editing results across various levels of denoising steps, while exhibiting exceptional performance in terms of both quantitative metrics and qualitative assessments. |
Chen Hou; Guoqiang Wei; Zhibo Chen; |
443 | Latent Diffusion Transformer for Probabilistic Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This research proposes to condense high-dimensional multivariate time series forecasting into a problem of latent space time series generation, to improve the expressiveness of each timestamp and make forecasting more manageable. To solve the problem that the existing work is hard to extend to high-dimensional multivariate time series, we present a latent multivariate time series diffusion framework called Latent Diffusion Transformer (LDT), which consists of a symmetric statistics-aware autoencoder and a diffusion-based conditional generator, to implement this idea. |
Shibo Feng; Chunyan Miao; Zhong Zhang; Peilin Zhao; |
444 | Delivering Inflated Explanations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It is possible that some features may change values and still lead to the same decision. In this paper we formally define inflated explanations, each of which is a set of features and, for each feature, a set of values (always including the value of the instance being explained), such that the decision will remain unchanged for any of the values allowed for any of the features in the (inflated) abductive explanation. |
Yacine Izza; Alexey Ignatiev; Peter J. Stuckey; Joao Marques-Silva; |
445 | Far3D: Expanding The Horizon for Surround-View 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Expanding existing methods directly to cover long distances poses challenges such as heavy computation costs and unstable convergence. To address these limitations, this paper proposes a novel sparse query-based framework, dubbed Far3D. |
Xiaohui Jiang; Shuailin Li; Yingfei Liu; Shihao Wang; Fan Jia; Tiancai Wang; Lijin Han; Xiangyu Zhang; |
446 | DiffBEV: Conditional Diffusion Model for Bird’s Eye View Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an end-to-end framework, named DiffBEV, to exploit the potential of diffusion model to generate a more comprehensive BEV representation. |
Jiayu Zou; Kun Tian; Zheng Zhu; Yun Ye; Xingang Wang; |
447 | CLIP-Guided Federated Learning on Heterogeneity and Long-Tailed Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the CLIP-guided FL (CLIP2FL) method on heterogeneous and long-tailed data. |
Jiangming Shi; Shanshan Zheng; Xiangbo Yin; Yang Lu; Yuan Xie; Yanyun Qu; |
448 | Understanding The Role of The Projector in Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we revisit the efficacy of knowledge distillation as a function matching and metric learning problem. |
Roy Miles; Krystian Mikolajczyk; |
449 | Task-Adaptive Prompted Transformer for Cross-Domain Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the large domain gap between training and novel classes makes previous FSL methods perform poorly. To address this issue, we propose MetaPrompt, a Task-adaptive Prompted Transformer model for CD-FSL, by jointly exploiting prompt learning and the parameter generation framework. |
Jiamin Wu; Xin Liu; Xiaotian Yin; Tianzhu Zhang; Yongdong Zhang; |
450 | Reliable Conflictive Multi-View Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Then, we can construct view-specific opinions consisting of decision results and reliability. In the multi-view fusion stage, we propose a conflictive opinion aggregation strategy and theoretically prove this strategy can exactly model the relation of multi-view common and view-specific reliabilities. |
Cai Xu; Jiajun Si; Ziyu Guan; Wei Zhao; Yue Wu; Xiyue Gao; |
451 | Multi-Modal Prompting for Open-Vocabulary Video Visual Relationship Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, directly applying CLIP-like models to video visual relationship detection encounters significant challenges due to the substantial gap between images and video object relationships. To address this challenge, we propose a multi-modal prompting method that adapts CLIP well to open-vocabulary video visual relationship detection by prompt-tuning on both visual representation and language input. |
Shuo Yang; Yongqi Wang; Xiaofeng Ji; Xinxiao Wu; |
452 | Attention-Induced Embedding Imputation for Incomplete Multi-View Partial Multi-Label Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the widespread incompleteness problem on multi-view features and labels greatly hinders the practical application of multi-view multi-label classification. Therefore, in this paper, we propose an attention-induced missing instances imputation technique to enhance the generalization ability of the model. |
Chengliang Liu; Jinlong Jia; Jie Wen; Yabo Liu; Xiaoling Luo; Chao Huang; Yong Xu; |
453 | Relational Distant Supervision for Image Captioning Without Image-Text Pairs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Considering that the relationship between objects carries more information, we use the object relationship as a more accurate connection between images and texts. In this paper, we adapt the idea of distant supervision that extracts the knowledge about object relationships from an external corpus and imparts them to images to facilitate inferring visual object relationships, without introducing any extra pre-trained relationship detectors. |
Yayun Qi; Wentian Zhao; Xinxiao Wu; |
454 | End-to-End Real-Time Vanishing Point Detection with Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel transformer-based end-to-end real-time vanishing point detection method, which is named Vanishing Point TRansformer (VPTR). |
Xin Tong; Shi Peng; Yufei Guo; Xuhui Huang; |
455 | Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they fail to semantically align the generated images with the prompts due to their limited compositional capabilities, leading to attribute leakage, entity leakage, and missing entities. In this paper, we propose a novel attention mask control strategy based on predicted object boxes to address these issues. |
Ruichen Wang; Zekang Chen; Chen Chen; Jian Ma; Haonan Lu; Xiaodong Lin; |
456 | Learning Image Demoiréing from Unpaired Real Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel moiré generation framework to synthesize moiré images with diverse moiré features, resembling real moiré patches, and details akin to real moiré-free images. |
Yunshan Zhong; Yuyao Zhou; Yuxin Zhang; Fei Chao; Rongrong Ji; |
457 | DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With the recent success of the diffusion probabilistic model (DPM), we found it is especially suitable for accurate and crisp edge detection since the denoising process is directly applied to the original image size. Therefore, we propose the first diffusion model for the task of general edge detection, which we call DiffusionEdge. |
Yunfan Ye; Kai Xu; Yuhang Huang; Renjiao Yi; Zhiping Cai; |
458 | NestE: Modeling Nested Relational Structures for Knowledge Graph Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, we introduce NestE, a novel KG embedding approach that captures the semantics of both atomic and nested factual knowledge. |
Bo Xiong; Mojtaba Nayyeri; Linhao Luo; Zihao Wang; Shirui Pan; Steffen Staab; |
459 | GCNext: Towards The Unity of Graph Convolutions for Human Motion Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Leveraging UniGC on network-level, we propose GCNext, a novel GCN-building paradigm that dynamically determines the best-fitting graph convolutions both sample-wise and layer-wise. |
Xinshun Wang; Qiongjie Cui; Chen Chen; Mengyuan Liu; |
460 | MGNet: Learning Correspondences Via Multiple Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: But they ignore the complementary relationship between different types of graphs, which can effectively capture potential relationships among sparse correspondences. To address this problem, we propose MGNet to effectively combine multiple complementary graphs. |
Dai Luanyuan; Xiaoyu Du; Hanwang Zhang; Jinhui Tang; |
461 | Urban Region Embedding Via Multi-View Contrastive Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we form a new pipeline to learn consistent representations across varying views, and propose the multi-view Contrastive Prediction model for urban Region embedding (ReCP), which leverages the multiple information views from point-of-interest (POI) and human mobility data. |
Zechen Li; Weiming Huang; Kai Zhao; Min Yang; Yongshun Gong; Meng Chen; |
462 | Propagation Tree Is Not Deep: Adaptive Graph Contrastive Learning Approach for Rumor Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To focus learning on intensive substructures, we propose Rumor Adaptive Graph Contrastive Learning (RAGCL) method with adaptive view augmentation guided by node centralities. |
Chaoqun Cui; Caiyan Jia; |
463 | Panoptic Scene Graph Generation with Semantics-Prototype Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the intrinsic bias above, we propose a novel framework named ADTrans to adaptively transfer biased predicate annotations to informative and unified ones. |
Li Li; Wei Ji; Yiming Wu; Mengze Li; You Qin; Lina Wei; Roger Zimmermann; |
464 | Cross-Constrained Progressive Inference for 3D Hand Pose Estimation with Dynamic Observer-Decision-Adjuster Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current pose estimation is one-time feed-forward and lacks the capability to gather feedback and adapt the inference outcome. To address this problem, we propose to explore the concept of progressive inference where the network learns an observer to continuously detect the prediction error based on constraints matching, as well as an adjuster to refine its inference outcome based on these constraints errors. |
Zhehan Kan; Xueting Hu; Zihan Liao; Ke Yu; Zhihai He; |
465 | Multi-Level Cross-Modal Alignment for Image Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To solve the aforementioned issue, we propose a novel Multi-level Cross-modal Alignment method to improve the alignments in a cross-modal pretraining model for downstream tasks, by building a smaller but better semantic space and aligning the images and texts in three levels, i.e., instance-level, prototype-level, and semantic-level. |
Liping Qiu; Qin Zhang; Xiaojun Chen; Shaotian Cai; |
466 | ViTEraser: Harnessing The Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple-yet-effective ViT-based text eraser, dubbed ViTEraser. |
Dezhi Peng; Chongyu Liu; Yuliang Liu; Lianwen Jin; |
467 | S2CycleDiff: Spatial-Spectral-Bilateral Cycle-Diffusion Framework for Hyperspectral Image Super-resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a spatial-spectral-bilateral cycle-diffusion framework (S2CycleDiff) for HISR, which can step-wise generate the HrHSI with high spatial-spectral fidelity by learning the conditional distribution of spatial and spectral super-resolution processes bilaterally. |
Jiahui Qu; Jie He; Wenqian Dong; Jingyu Zhao; |
468 | Enhancing Neural Radiance Fields with Adaptive Multi-Exposure Fusion: A Bilevel Optimization Approach for Novel View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to the complexity of luminance information, existing NeRF methods often struggle to produce satisfactory renderings when dealing with high and low exposure images. To address this issue, we propose an innovative approach capable of effectively modeling and rendering images under multiple exposure conditions. |
Yang Zou; Xingyuan Li; Zhiying Jiang; Jinyuan Liu; |
469 | Towards Squeezing-Averse Virtual Try-On Via Sequential Deformation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such contrary objectives feed misaligned gradients back to a cascaded appearance flow estimation, resulting in undesirable squeezing artifacts. To reduce this, we propose a Sequential Deformation (SD-VITON) that disentangles the appearance flow prediction layers into TV objective-dominant (TVOB) layers and a task-coexistence (TACO) layer. |
Sang-Heon Shim; Jiwoo Chung; Jae-Pil Heo; |
470 | Leveraging Opposite Gender Interaction Ratio As A Path Towards Fairness in Online Dating Recommendations Based on User Sexual Orientation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, sexual orientation, which plays a significant role in finding a satisfying relationship, is under-investigated. To fill this crucial gap, we propose a novel metric, Opposite Gender Interaction Ratio (OGIR), as a way to investigate potential unfairness for users with varying preferences towards the opposite gender. |
Yuying Zhao; Yu Wang; Yi Zhang; Pamela Wisniewski; Charu Aggarwal; Tyler Derr; |
471 | Knowledge Graph Prompting for Multi-Document Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, few works explore this paradigm in multi-document question answering (MD-QA), a task demanding a thorough understanding of the logical associations among the contents and structures of documents. To fill this crucial gap, we propose a Knowledge Graph Prompting (KGP) method to formulate the right context in prompting LLMs for MD-QA, which consists of a graph construction module and a graph traversal module. |
Yu Wang; Nedim Lipka; Ryan A. Rossi; Alexa Siu; Ruiyi Zhang; Tyler Derr; |
472 | SAVSR: Arbitrary-Scale Video Super-Resolution Via A Learned Scale-Adaptive Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Scale-adaptive Arbitrary-scale Video Super-Resolution network (SAVSR), which is the first work focusing on spatial VSR at arbitrary scales including both non-integer and asymmetric scales. |
Zekun Li; Hongying Liu; Fanhua Shang; Yuanyuan Liu; Liang Wan; Wei Feng; |
473 | TA&AT: Enhancing Task-Oriented Dialog with Turn-Level Auxiliary Tasks and Action-Tree Based Scheduled Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Second, an overreliance on generated policy often leads to error accumulation, resulting in suboptimal responses when adhering to incorrect actions. To combat these challenges, we propose turn-level multi-task objectives for the encoder. |
Longxiang Liu; Xiuxing Li; Yang Feng; |
474 | Boosting Adversarial Transferability Across Model Genus By Deformation-Constrained Warping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel and generic attacking strategy, called Deformation-Constrained Warping Attack (DeCoWA), that can be effectively applied to cross model genus attack. |
Qinliang Lin; Cheng Luo; Zenghao Niu; Xilin He; Weicheng Xie; Yuanbo Hou; Linlin Shen; Siyang Song; |
475 | Principal-Agent Reward Shaping in MDPs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players. |
Omer Ben-Porat; Yishay Mansour; Michal Moshkovitz; Boaz Taitler; |
476 | An Effective Augmented Lagrangian Method for Fine-Grained Multi-View Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an effective Augmented Lagrangian MethOd for fiNe-graineD (ALMOND) multi-view optimization. |
Yuze Tan; Hecheng Cai; Shudong Huang; Shuping Wei; Fan Yang; Jiancheng Lv; |
477 | Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The standard Neural Radiance Fields (NeRF) paradigm employs a viewer-centered methodology, entangling the aspects of illumination and material reflectance into emission solely from 3D points. This simplified rendering approach presents challenges in accurately modeling images captured under adverse lighting conditions, such as low light or over-exposure. |
Ziteng Cui; Lin Gu; Xiao Sun; Xianzheng Ma; Yu Qiao; Tatsuya Harada; |
478 | Offline and Online Optical Flow Enhancement for Deep Video Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address the twofold limitations by enhancing the optical flows in two stages: offline and online. |
Chuanbo Tang; Xihua Sheng; Zhuoyuan Li; Haotian Zhang; Li Li; Dong Liu; |
479 | M2Doc: A Multi-Modal Fusion Approach for Document Layout Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, current multi-modal methods perform worse than unimodal detectors on complex layout analysis datasets. To address these limitations, we propose an effective and pluggable multi-modal fusion approach named M2Doc, which fuses visual and textual features for better layout detection. |
Ning Zhang; Hiuyi Cheng; Jiayu Chen; Zongyuan Jiang; Jun Huang; Yang Xue; Lianwen Jin; |
480 | A Diffusion-Based Framework for Multi-Class Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nonetheless, these methods might face challenges related to the preservation of image categories and pixel-wise structural integrity in the more practical multi-class setting. To solve the above problems, we propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection, which consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion’s denoising network, and a feature-space pre-trained feature extractor. |
Haoyang He; Jiangning Zhang; Hongxu Chen; Xuhai Chen; Zhishan Li; Xu Chen; Yabiao Wang; Chengjie Wang; Lei Xie; |
481 | MatchDet: A Collaborative Framework for Image Matching and Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a collaborative framework called MatchDet (i.e. task-collaborative) is proposed for image matching and object detection to obtain mutual improvements. |
Jinxiang Lai; Wenlong Wu; Bin-Bin Gao; Jun Liu; Jiawei Zhan; Congchong Nie; Yi Zeng; Chengjie Wang; |
482 | PREFER: Prompt Ensemble Learning Via Feedback-Reflect-Refine Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple, universal, and automatic method named PREFER (Prompt Ensemble learning via Feedback-Reflect-Refine) to address the stated limitations. |
Chenrui Zhang; Lin Liu; Chuyuan Wang; Xiao Sun; Hongyu Wang; Jinpeng Wang; Mingchen Cai; |
483 | BEV-MAE: Bird’s Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving. |
Zhiwei Lin; Yongtao Wang; Shengxiang Qi; Nan Dong; Ming-Hsuan Yang; |
484 | A Comprehensive Analysis of The Effectiveness of Large Language Models As Automatic Dialogue Evaluators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we conduct a comprehensive study on the application of LLMs for automatic dialogue evaluation. |
Chen Zhang; Luis Fernando D’Haro; Yiming Chen; Malu Zhang; Haizhou Li; |
485 | Mining Gaze for Contrastive Learning Toward Computer-Assisted Diagnosis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose eye-tracking as an alternative to text reports, as it allows for the passive collection of gaze signals without ethical issues. |
Zihao Zhao; Sheng Wang; Qian Wang; Dinggang Shen; |
486 | CoPL: Contextual Prompt Learning for Vision-Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Second, existing work weights all prompts equally whereas intuitively, prompts should be reweighed according to the semantics of the image. We address these as part of our proposed Contextual Prompt Learning (CoPL) framework, capable of aligning the prompts to the localized features of the image. |
Koustava Goswami; Srikrishna Karanam; Prateksha Udhayanan; K J Joseph; Balaji Vasan Srinivasan; |
487 | ITrendRNN: An Interpretable Trend-Aware RNN for Meteorological Spatiotemporal Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a straightforward and interpretable differential framework, where the key lies in explicitly estimating the evolutionary trends. |
Xu Huang; Chuyao Luo; Bowen Zhang; Huiwei Lin; Xutao Li; Yunming Ye; |
488 | SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-Form Layout-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods transform layout information into tokens or RGB images for conditional control in the generative process, leading to insufficient spatial and semantic controllability of individual instances. To address these limitations, we propose a novel Spatial-Semantic Map Guided (SSMG) diffusion model that adopts the feature map, derived from the layout, as guidance. |
Chengyou Jia; Minnan Luo; Zhuohang Dang; Guang Dai; Xiaojun Chang; Mengmeng Wang; Jingdong Wang; |
489 | Compositional Generalization for Multi-Label Text Classification: A Data-Augmentation Approach Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our results show that these models often fail to generalize to compositional concepts encountered infrequently during training, leading to inferior performance on tests with these new combinations. To address this, we introduce a data augmentation method that leverages two innovative text generation models designed to enhance the classification models’ capacity for compositional generalization. |
Yuyang Chai; Zhuang Li; Jiahui Liu; Lei Chen; Fei Li; Donghong Ji; Chong Teng; |
490 | Double-Layer Hybrid-Label Identification Feature Selection for Multi-View Multi-Label Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To deal with the double problems in multi-view multi-label feature selection, we propose a unified loss function which is a totally splitting structure for observed labels as hybrid labels that is, common labels, view-to-all specific labels and noisy labels, and the view-to-all specific labels further splits into several specific labels of each view. |
Pingting Hao; Kunpeng Liu; Wanfu Gao; |
491 | 3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this conventional paradigm encounters significant challenges, most notably in terms of the generation of lackluster initial proposals and a pronounced deceleration in inference speed. Recognizing these limitations, we introduce an innovative end-to-end Superpoint-Text Matching Network (3D-STMN) that is enriched by dependency-driven insights. |
Changli Wu; Yiwei Ma; Qi Chen; Haowei Wang; Gen Luo; Jiayi Ji; Xiaoshuai Sun; |
492 | Robustness-Guided Image Synthesis for Data-Free Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose Robustness-Guided Image Synthesis (RIS), a simple but effective method to enrich the semantics of synthetic images and improve image diversity, further boosting the performance of data-free compression tasks. |
Jianhong Bai; Yuchen Yang; Huanpeng Chu; Hualiang Wang; Zuozhu Liu; Ruizhe Chen; Xiaoxuan He; Lianrui Mu; Chengfei Cai; Haoji Hu; |
493 | Video Frame Prediction from A Single Image and Events Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to predict video frames from a single image and the following events, which can not only handle complex dynamic scenes but also predict future frames with flexible prediction time intervals. |
Juanjuan Zhu; Zhexiong Wan; Yuchao Dai; |
494 | Scale Optimization Using Evolutionary Reinforcement Learning for Object Detection on Drone Imagery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Object detection in aerial imagery presents a significant challenge due to large scale variations among objects. This paper proposes an evolutionary reinforcement learning agent, integrated within a coarse-to-fine object detection framework, to optimize the scale for more effective detection of objects in such images. |
Jialu Zhang; Xiaoying Yang; Wentao He; Jianfeng Ren; Qian Zhang; Yitian Zhao; Ruibin Bai; Xiangjian He; Jiang Liu; |
495 | S3A: Towards Realistic Zero-Shot Classification Via Self Structural Semantic Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the new problem, we propose the Self Structural Semantic Alignment (S3A) framework, which extracts the structural semantic information from unlabeled data while simultaneously self-learning. |
Sheng Zhang; Muzammal Naseer; Guangyi Chen; Zhiqiang Shen; Salman Khan; Kun Zhang; Fahad Shahbaz Khan; |
496 | Multimodal Graph Neural Architecture Search Under Distribution Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing MGNAS fails to handle distribution shifts that naturally exist in multimodal graph data, since the searched architectures inevitably capture spurious statistical correlations under distribution shifts. To solve this problem, we propose a novel Out-of-distribution Generalized Multimodal Graph Neural Architecture Search (OMG-NAS) method which optimizes the MGNN architecture with respect to its performance on decorrelated OOD data. |
Jie Cai; Xin Wang; Haoyang Li; Ziwei Zhang; Wenwu Zhu; |
497 | Federated Contextual Cascading Bandits with Asynchronous Communication and Heterogeneous Users Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We overcome these limitations by considering (1) federated agents operating in an asynchronous communication paradigm, where no mandatory synchronization is required and all agents communicate independently with the server, (2) heterogeneous user behaviors, where users can be stratified into latent user clusters, each exhibiting distinct preferences. For this setting, we propose a UCB-type algorithm with delicate communication protocols. |
Hantao Yang; Xutong Liu; Zhiyong Wang; Hong Xie; John C. S. Lui; Defu Lian; Enhong Chen; |
498 | Learning Multi-Scale Video-Text Correspondence for Weakly Supervised Temporal Article Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a Multi-Scale Video-Text Correspondence Learning (MVTCL) framework, which enhances the grounding performance in complex scenes by modeling multi-scale semantic correspondence both within and between modalities. |
Wenjia Geng; Yong Liu; Lei Chen; Sujia Wang; Jie Zhou; Yansong Tang; |
499 | Neural Time-Reversed Generalized Riccati Equation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel neural-based approach to optimal control. |
Alessandro Betti; Michele Casoni; Marco Gori; Simone Marullo; Stefano Melacci; Matteo Tiezzi; |
500 | Multi-Domain Incremental Learning for Face Presentation Attack Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous face Presentation Attack Detection (PAD) methods aim to improve the effectiveness of cross-domain tasks. |
Keyao Wang; Guosheng Zhang; Haixiao Yue; Ajian Liu; Gang Zhang; Haocheng Feng; Junyu Han; Errui Ding; Jingdong Wang; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~2,500 papers), please visit Paper Digest: AAAI-2024 (Full List).