Paper Digest: ICCV 2023 Highlights
To search or review papers within ICCV-2023 related to a specific topic, please use the search by venue (ICCV-2023) and review by venue (ICCV-2023) services. To browse papers by author, here is a list of all authors (ICCV-2023). You may also like to explore our “Best Paper” Digest (ICCV), which lists the most influential ICCV papers since 1988.
Based in New York, Paper Digest is dedicated to producing high-quality text analysis results that people can acturally use on a daily basis. Since 2018, we have been serving users across the world with a number of exclusive services to track, search, review and rewrite scientific literature.
You are welcome to follow us on Twitter and Linkedin to get updated with new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: ICCV 2023 Highlights
Paper | Author(s) | |
---|---|---|
1 | Towards Attack-tolerant Federated Learning Via Critical Parameter Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a new defense strategy, FedCPA (Federated learning with Critical Parameter Analysis). |
Sungwon Han; Sungwon Park; Fangzhao Wu; Sundong Kim; Bin Zhu; Xing Xie; Meeyoung Cha; |
2 | Stochastic Segmentation with Conditional Categorical Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this context, stochastic semantic segmentation methods must learn to predict conditional distributions of labels given the image, but this is challenging due to the typically multimodal distributions, high-dimensional output spaces, and limited annotation data. To address these challenges, we propose a conditional categorical diffusion model (CCDM) for semantic segmentation based on Denoising Diffusion Probabilistic Models. |
Lukas Zbinden; Lars Doorenbos; Theodoros Pissas; Adrian Thomas Huber; Raphael Sznitman; Pablo Márquez-Neila; |
3 | Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we rethink the low-light image enhancement task and propose a physically explainable and generative diffusion model for low-light image enhancement, termed as Diff-Retinex. |
Xunpeng Yi; Han Xu; Hao Zhang; Linfeng Tang; Jiayi Ma; |
4 | Bird’s-Eye-View Scene Graph for Vision-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current agents are built upon panoramic observations, which hinders their ability to perceive 3D scene geometry and easily leads to ambiguous selection of panoramic view. To address these limitations, we present a BEV Scene Graph (BSG), which leverages multi-step BEV representations to encode scene layouts and geometric cues of indoor environment under the supervision of 3D detection. |
Rui Liu; Xiaohan Wang; Wenguan Wang; Yi Yang; |
5 | PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a simple framework for end-to-end latency-aware tracking, i.e., end-to-end predictive visual tracking (PVT++). |
Bowen Li; Ziyuan Huang; Junjie Ye; Yiming Li; Sebastian Scherer; Hang Zhao; Changhong Fu; |
6 | A Dynamic Dual-Processing Object Detection Framework Inspired By The Brain’s Recognition Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Research in neuroscience has shown that the recognition decision in the brain is based on two processes, namely familiarity and recollection. Based on this biological support, we propose an efficient and effective dual-processing object detection framework. |
Minying Zhang; Tianpeng Bu; Lulu Hu; |
7 | Hard No-Box Adversarial Attack on Skeleton-Based Human Action Recognition with Skeleton-Motion-Informed Gradient Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that the vulnerability indeed exists. |
Zhengzhi Lu; He Wang; Ziyi Chang; Guoan Yang; Hubert P. H. Shum; |
8 | GameFormer: Game-theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Autonomous vehicles operating in complex real-world environments require accurate predictions of interactive behaviors between traffic participants. This paper tackles the interaction prediction problem by formulating it with hierarchical game theory and proposing the GameFormer model for its implementation. |
Zhiyu Huang; Haochen Liu; Chen Lv; |
9 | Towards Better Robustness Against Common Corruptions for Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Towards improving RaCC for UDA methods in an unsupervised manner, we propose a novel Distributionally and Discretely Adversarial Regularization (DDAR) framework in this paper. |
Zhiqiang Gao; Kaizhu Huang; Rui Zhang; Dawei Liu; Jieming Ma; |
10 | Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Not surprisingly, we find that most LT-MLC and PL-MLC approaches fail to solve the PLT-MLC, resulting in significant performance degradation on the two proposed PLT-MLC benchmarks. Therefore, we propose an end-to-end learning framework: COrrection -> ModificatIon -> balanCe, abbreviated as COMC. |
Wenqiao Zhang; Changshuo Liu; Lingze Zeng; Bengchin Ooi; Siliang Tang; Yueting Zhuang; |
11 | Flexible Visual Recognition By Evidential Modeling of Confusion and Ignorance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Two challenges emerge along with this novel task. First, prediction uncertainty should be separately quantified as confusion depicting inter-class uncertainties and ignorance identifying out-of-distribution samples. Second, both confusion and ignorance should be comparable between samples to enable effective decision-making. In this paper, we propose to model these two sources of uncertainty explicitly with the theory of Subjective Logic. |
Lei Fan; Bo Liu; Haoxiang Li; Ying Wu; Gang Hua; |
12 | Texture Generation on 3D Meshes with Point-UV Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on synthesizing high-quality textures on 3D meshes. |
Xin Yu; Peng Dai; Wenbo Li; Lan Ma; Zhengzhe Liu; Xiaojuan Qi; |
13 | Supervised Homography Learning with Realistic Dataset Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an iterative framework, which consists of two phases: a generation phase and a training phase, to generate realistic training data and yield a supervised homography network. |
Hai Jiang; Haipeng Li; Songchen Han; Haoqiang Fan; Bing Zeng; Shuaicheng Liu; |
14 | E2E-LOAD: End-to-End Long-form Online Action Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these methods are constrained by their fixed backbone design, which fails to leverage the potential benefits of a trainable backbone. This paper introduces an end-to-end learning network that revises these approaches, incorporating a backbone network design that improves effectiveness and efficiency. |
Shuqiang Cao; Weixin Luo; Bairui Wang; Wei Zhang; Lin Ma; |
15 | TALL: Thumbnail Layout for Deepfake Video Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a pre-defined layout to realize the preservation of spatial and temporal dependencies. |
Yuting Xu; Jian Liang; Gengyun Jia; Ziming Yang; Yanhao Zhang; Ran He; |
16 | Enhanced Soft Label for Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, modern self-training based SSL algorithms use a pre-defined constant threshold to select unlabeled pixel samples that contribute to the training, thus failing to be compatible with different learning difficulties of variant categories and different learning status of the model. To address these issues, we propose Enhanced Soft Label (ESL), a curriculum learning approach to fully leverage the high-value supervisory signals implicit in the untrustworthy pseudo label. |
Jie Ma; Chuan Wang; Yang Liu; Liang Lin; Guanbin Li; |
17 | Self-supervised Monocular Depth Estimation: Let’s Talk About The Weather Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While it is tempting to use such data augmentations for self-supervised depth, in the past this was shown to degrade performance instead of improving it. In this paper, we put forward a method that uses augmentations to remedy this problem. |
Kieran Saunders; George Vogiatzis; Luis J. Manso; |
18 | Bidirectional Alignment for Domain Adaptive Detection with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Bidirectional Alignment for domain adaptive Detection with Transformers (BiADT) to improve cross domain object detection performance. |
Liqiang He; Wei Wang; Albert Chen; Min Sun; Cheng-Hao Kuo; Sinisa Todorovic; |
19 | Fast Neural Scene Flow Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we demonstrate that scene flow is different—with the dominant computational bottleneck stemming from the loss function itself (i.e., Chamfer distance). |
Xueqian Li; Jianqiao Zheng; Francesco Ferroni; Jhony Kaesemodel Pontes; Simon Lucey; |
20 | CAME: Contrastive Automated Model Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Contrastive Automatic Model Evaluation (CAME), a novel AutoEval framework that is rid of involving training set in the loop. |
Ru Peng; Qiuyang Duan; Haobo Wang; Jiachen Ma; Yanbo Jiang; Yongjun Tu; Xiu Jiang; Junbo Zhao; |
21 | ExposureDiffusion: Learning to Expose for Low-light Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work addresses the issue by seamlessly integrating a diffusion model with a physics-based exposure model. Different from a vanilla diffusion model that has to perform Gaussian denoising, with the injected physics-based exposure model, our restoration process can directly start from a noisy image instead of pure noise. |
Yufei Wang; Yi Yu; Wenhan Yang; Lanqing Guo; Lap-Pui Chau; Alex C. Kot; Bihan Wen; |
22 | HM-ViT: Hetero-Modal Vehicle-to-Vehicle Cooperative Perception with Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the multi-agent hetero-modal cooperative perception problem where agents may have distinct sensor modalities. |
Hao Xiang; Runsheng Xu; Jiaqi Ma; |
23 | HyperReenact: One-Shot Reenactment Via Jointly Learning to Refine and Retarget Faces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present our method for neural face reenactment, called HyperReenact, that aims to generate realistic talking head images of a source identity, driven by a target facial pose. |
Stella Bounareli; Christos Tzelepis; Vasileios Argyriou; Ioannis Patras; Georgios Tzimiropoulos; |
24 | Order-preserving Consistency Regularization for Domain Adaptation and Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the Order-preserving Consistency Regularization (OCR) for cross-domain tasks. |
Mengmeng Jing; Xiantong Zhen; Jingjing Li; Cees G. M. Snoek; |
25 | RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on egocentric videos of Ego4D, we constructed a broad coverage of the video-based referring expression comprehension dataset: RefEgo. |
Shuhei Kurita; Naoki Katsura; Eri Onami; |
26 | Exploring Temporal Frequency Spectrum in Deep Video Deblurring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we revisit the blurred sequence in the Fourier space and figure out some intrinsic frequency-temporal priors that imply the temporal blur degradation can be accessibly decoupled in the potential frequency domain. |
Qi Zhu; Man Zhou; Naishan Zheng; Chongyi Li; Jie Huang; Feng Zhao; |
27 | Unified Visual Relationship Detection with Vision and Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The issue is exacerbated in visual relationship detection when second-order visual semantics are introduced between pairs of objects. To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs). |
Long Zhao; Liangzhe Yuan; Boqing Gong; Yin Cui; Florian Schroff; Ming-Hsuan Yang; Hartwig Adam; Ting Liu; |
28 | Occ^2Net: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Occ^2Net, a novel image matching method that models occlusion relations using 3D occupancy and infers matching points in occluded regions. |
Miao Fan; Mingrui Chen; Chen Hu; Shuchang Zhou; |
29 | Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Make-An-Animation, a text-conditioned human motion generation model which learns more diverse poses and prompts from large-scale image-text datasets, enabling significant improvement in performance over prior works. |
Samaneh Azadi; Akbar Shah; Thomas Hayes; Devi Parikh; Sonal Gupta; |
30 | Rickrolling The Artist: Injecting Backdoors Into Text Encoders for Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce backdoor attacks against text-guided generative models and demonstrate that their text encoders pose a major tampering risk. |
Lukas Struppek; Dominik Hintersdorf; Kristian Kersting; |
31 | LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is because they have to synthesize intricate details about all objects in an image based on a text description. Therefore, we present a technique for segmenting real and AI-generated images using latent diffusion models (LDMs) trained on internet-scale datasets. |
Koutilya PNVR; Bharat Singh; Pallabi Ghosh; Behjat Siddiquie; David Jacobs; |
32 | Workie-Talkie: Accelerating Federated Learning By Overlapping Computing and Communications Via Contrastive Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address aforementioned challenges, in this paper, we propose a novel "workie-talkie" FL scheme, which can accelerate FL’s training by overlapping local computing and wireless communications via contrastive regularization (FedCR). |
Rui Chen; Qiyu Wan; Pavana Prakash; Lan Zhang; Xu Yuan; Yanmin Gong; Xin Fu; Miao Pan; |
33 | Downstream-agnostic Adversarial Examples Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose AdvEncoder, the first framework for generating downstream-agnostic universal adversarial examples based on the pre-trained encoder. |
Ziqi Zhou; Shengshan Hu; Ruizhi Zhao; Qian Wang; Leo Yu Zhang; Junhui Hou; Hai Jin; |
34 | Late Stopping: Avoiding Confidently Learning from Mislabeled Examples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process. |
Suqin Yuan; Lei Feng; Tongliang Liu; |
35 | AerialVLN: Vision-and-Language Navigation for UAVs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Navigating in the sky is more complicated than on the ground because agents need to consider the flying height and more complex spatial relationship reasoning. To fill this gap and facilitate research in this field, we propose a new task named AerialVLN, which is UAV-based and towards outdoor environments. |
Shubo Liu; Hongsheng Zhang; Yuankai Qi; Peng Wang; Yanning Zhang; Qi Wu; |
36 | On The Robustness of Open-World Test-Time Training: Self-Training with Dynamic Prototype Expansion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve the robustness of OWTTT we first develop an adaptive strong OOD pruning which improves the efficacy of the self-training TTT method. We further propose a way to dynamically expand the prototypes to represent strong OOD samples for an improved weak/strong OOD data separation. |
Yushu Li; Xun Xu; Yongyi Su; Kui Jia; |
37 | Studying How to Efficiently and Effectively Guide Models with Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better understand the effectiveness of the various design choices that have been explored in the context of model guidance, in this work we conduct an in-depth evaluation across various loss functions, attribution methods, models, and ‘guidance depths’ on the PASCAL VOC 2007 and MS COCO 2014 datasets. |
Sukrut Rao; Moritz Böhle; Amin Parchami-Araghi; Bernt Schiele; |
38 | Most Important Person-Guided Dual-Branch Cross-Patch Attention for Group Affect Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a solution by incorporating the psychological concept of the Most Important Person (MIP), which represents the most noteworthy face in a crowd and has affective semantic meaning. |
Hongxia Xie; Ming-Xian Lee; Tzu-Jui Chen; Hung-Jen Chen; Hou-I Liu; Hong-Han Shuai; Wen-Huang Cheng; |
39 | SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL). |
Hong Yan; Yang Liu; Yushen Wei; Zhen Li; Guanbin Li; Liang Lin; |
40 | Achievement-Based Training Progress Balancing for Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To balance the training progress, we propose an achievement-based multi-task loss to modulate training speed based on the "achievement," defined as the ratio of current accuracy to single-task accuracy. |
Hayoung Yun; Hanjoo Cho; |
41 | Pose-Free Neural Radiance Fields Via Implicit Pose Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We design IR-NeRF, an innovative pose-free NeRF that introduces implicit pose regularization to refine pose estimator with unposed real images and improve the robustness of the pose estimation for real images. |
Jiahui Zhang; Fangneng Zhan; Yingchen Yu; Kunhao Liu; Rongliang Wu; Xiaoqin Zhang; Ling Shao; Shijian Lu; |
42 | Self-supervised Learning to Bring Dual Reversed Rolling Shutter Images Alive Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Self-supervised learning framework for Dual reversed RS distortions Correction (SelfDRSC), where a DRSC network can be learned to generate a high framerate GS video only based on dual RS images with reversed distortions. |
Wei Shang; Dongwei Ren; Chaoyu Feng; Xiaotao Wang; Lei Lei; Wangmeng Zuo; |
43 | Logic-induced Diagnostic Reasoning for Semi-supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key insight is that conflicts within pseudo labels, identified through symbolic knowledge, can serve as strong yet commonly ignored learning signals. |
Chen Liang; Wenguan Wang; Jiaxu Miao; Yi Yang; |
44 | Self-Supervised Monocular Depth Estimation By Direction-aware Cumulative Convolution Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we find that self-supervised monocular depth estimation shows a direction sensitivity and environmental dependency in the feature representation. |
Wencheng Han; Junbo Yin; Jianbing Shen; |
45 | Encyclopedic VQA: Visual Questions About Detailed Properties of Fine-Grained Categories Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Encyclopedic-VQA, a large scale visual question answering (VQA) dataset featuring visual questions about detailed properties of fine-grained categories and instances. |
Thomas Mensink; Jasper Uijlings; Lluis Castrejon; Arushi Goel; Felipe Cadar; Howard Zhou; Fei Sha; André Araujo; Vittorio Ferrari; |
46 | Towards Understanding The Generalization of Deepfake Detectors from A Game-Theoretical View Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to explain the generalization of deepfake detectors from the novel perspective of multi-order interactions among visual concepts. |
Kelu Yao; Jin Wang; Boyu Diao; Chao Li; |
47 | Few-Shot Common Action Localization Via Cross-Attentional Fusion of Context and Temporal Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this paper is to localize action instances in a long untrimmed query video using just meager trimmed support videos representing a common action whose class information is not given. |
Juntae Lee; Mihir Jain; Sungrack Yun; |
48 | Physically-Plausible Illumination Distribution Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As part of this effort, we extend the Cube++ illumination estimation dataset to provide ground truth illumination distributions per image. Using this new ground truth data, we describe how to train a lightweight neural network method to predict the scene’s illumination distribution. |
Egor Ershov; Vasily Tesalin; Ivan Ermakov; Michael S. Brown; |
49 | 3DPPE: 3D Point Positional Encoding for Transformer-based Multi-Camera 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We hypothesize that 3D point locations can provide more information than rays. Therefore, we introduce 3D point positional encoding, 3DPPE, to the 3D detection Transformer decoder. |
Changyong Shu; Jiajun Deng; Fisher Yu; Yifan Liu; |
50 | Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-based Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a novel clustering-based F&B separation algorithm. |
Qinying Liu; Zilei Wang; Shenghai Rong; Junjie Li; Yixin Zhang; |
51 | VertexSerum: Poisoning Graph Neural Networks for Link Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The graph links, e.g., social relationships and transaction history, are sensitive and valuable information, which raises privacy concerns when using GNNs. To exploit these vulnerabilities, we propose VertexSerum, a novel graph poisoning attack that increases the effectiveness of graph link stealing by amplifying the link connectivity leakage. |
Ruyi Ding; Shijin Duan; Xiaolin Xu; Yunsi Fei; |
52 | NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present NeRF-Det, a novel method for indoor 3D detection with posed RGB images as input. |
Chenfeng Xu; Bichen Wu; Ji Hou; Sam Tsai; Ruilong Li; Jialiang Wang; Wei Zhan; Zijian He; Peter Vajda; Kurt Keutzer; Masayoshi Tomizuka; |
53 | Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose SCOPE, a novel collaborative perception framework that aggregates the spatio-temporal awareness characteristics across on-road agents in an end-to-end manner. |
Kun Yang; Dingkang Yang; Jingyu Zhang; Mingcheng Li; Yang Liu; Jing Liu; Hanqi Wang; Peng Sun; Liang Song; |
54 | LPFF: A Portrait Dataset for Face Generators Across Large Poses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present LPFF, a large-pose Flickr face dataset comprised of 19,590 high-quality real large-pose portrait images. |
Yiqian Wu; Jing Zhang; Hongbo Fu; Xiaogang Jin; |
55 | Pseudo-label Alignment for Semi-supervised Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, in existing pipelines, pseudo-labels that contain valuable information may be directly filtered out due to mismatches in class and mask quality. To address this issue, we propose a novel framework, called pseudo-label aligning instance segmentation (PAIS), in this paper. |
Jie Hu; Chen Chen; Liujuan Cao; Shengchuan Zhang; Annan Shu; Guannan Jiang; Rongrong Ji; |
56 | Deep Geometrized Cartoon Line Inbetweening Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To preserve the precision and detail of the line drawings, we propose a new approach, called AnimeInbet, which geometrizes raster line drawings into graphs of endpoints and reframes the inbetweening task as a graph fusion problem with vertex repositioning. |
Li Siyao; Tianpei Gu; Weiye Xiao; Henghui Ding; Ziwei Liu; Chen Change Loy; |
57 | MixBag: Bag-Level Data Augmentation for Learning from Label Proportions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a bag-level data augmentation method for LLP called MixBag, which is based on the key observation from our preliminary experiments; that the instance-level classification accuracy improves as the number of labeled bags increases even though the total number of instances is fixed. |
Takanori Asanomi; Shinnosuke Matsuo; Daiki Suehiro; Ryoma Bise; |
58 | Effective Real Image Editing with Accelerated Iterative Diffusion Inversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we propose an Accelerated Iterative Diffusion Inversion method, dubbed AIDI, that significantly improves reconstruction accuracy with minimal additional overhead in space and time complexity. |
Zhihong Pan; Riccardo Gherardi; Xiufeng Xie; Stephen Huang; |
59 | 3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, we propose a generative model of deep features based on a volumetric human representation with Gaussian ellipsoidal kernels emitting 3D pose-dependent feature vectors. |
Yi Zhang; Pengliang Ji; Angtian Wang; Jieru Mei; Adam Kortylewski; Alan Yuille; |
60 | Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, inspired by the way humans recognize Chinese texts, we propose a two-stage framework for CTR. |
Haiyang Yu; Xiaocong Wang; Bin Li; Xiangyang Xue; |
61 | MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Leveraging the Unreal Engine 5 City Sample project, we developed a pipeline to easily collect aerial and street city views, accompanied by ground-truth camera poses and a range of additional data modalities. |
Yixuan Li; Lihan Jiang; Linning Xu; Yuanbo Xiangli; Zhenzhi Wang; Dahua Lin; Bo Dai; |
62 | LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents an easy-to-use regularizer for GAN training, which helps explicitly link some axes of the latent space to a set of pixels in the synthesized image. |
Jiapeng Zhu; Ceyuan Yang; Yujun Shen; Zifan Shi; Bo Dai; Deli Zhao; Qifeng Chen; |
63 | Exploiting Proximity-Aware Tasks for Embodied Social Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an end-to-end architecture that exploits Proximity-Aware Tasks (referred as to Risk and Proximity Compass) to inject into a reinforcement learning navigation policy the ability to infer common-sense social behaviours. |
Enrico Cancelli; Tommaso Campari; Luciano Serafini; Angel X. Chang; Lamberto Ballan; |
64 | SVDiff: Compact Parameter Space for Diffusion Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach to address the limitations in existing text-to-image diffusion models for personalization and customization. |
Ligong Han; Yinxiao Li; Han Zhang; Peyman Milanfar; Dimitris Metaxas; Feng Yang; |
65 | UniFace: Unified Cross-Entropy Loss for Deep Face Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, no unified threshold is available to separate positive sample-to-class pairs from negative sample-to-class pairs. To bridge this gap, we design a UCE (Unified Cross-Entropy) loss for face recognition model training, which is built on the vital constraint that all the positive sample-to-class similarities shall be larger than the negative ones. |
Jiancan Zhou; Xi Jia; Qiufu Li; Linlin Shen; Jinming Duan; |
66 | Jumping Through Local Minima: Quantization in The Loss Landscape of Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our work, dubbed Evol-Q, we use evolutionary search to effectively traverse the non-smooth landscape. Additionally, we propose using an infoNCE loss, which not only helps combat overfitting on the small (1,000 images) calibration dataset but also makes traversing such a highly non-smooth surface easier. |
Natalia Frumkin; Dibakar Gope; Diana Marculescu; |
67 | Hierarchical Contrastive Learning for Pattern-Generalizable Image Corruption Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a novel method for automatic corruption detection, which allows for blind corruption restoration without known corruption masks. |
Xin Feng; Yifeng Xu; Guangming Lu; Wenjie Pei; |
68 | Learning Optical Flow from Event Camera with Rendered Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to render a physically correct event-flow dataset using computer graphics models. |
Xinglong Luo; Kunming Luo; Ao Luo; Zhengning Wang; Ping Tan; Shuaicheng Liu; |
69 | EPiC: Ensemble of Partial Point Clouds for Robust Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we propose a general ensemble framework, based on partial point cloud sampling. |
Meir Yossef Levi; Guy Gilboa; |
70 | Distilling Large Vision-Language Model with Out-of-Distribution Generalizability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose two principles from vision and language modality perspectives to enhance student’s OOD generalization: (1) by better imitating teacher’s visual representation space, and carefully promoting better coherence in vision-language alignment with the teacher; (2) by enriching the teacher’s language representations with informative and finegrained semantic attributes to effectively distinguish between different labels. |
Xuanlin Li; Yunhao Fang; Minghua Liu; Zhan Ling; Zhuowen Tu; Hao Su; |
71 | Cross-Modal Learning with 3D Deformable Attention for Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a new 3D deformable transformer for action recognition with adaptive spatiotemporal receptive fields and a cross-modal learning scheme. |
Sangwon Kim; Dasom Ahn; Byoung Chul Ko; |
72 | What Do Neural Networks Learn in Image Classification? A Frequency Shortcut Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a metric to measure class-wise frequency characteristics and a method to identify frequency shortcuts. |
Shunxin Wang; Raymond Veldhuis; Christoph Brune; Nicola Strisciuglio; |
73 | Tracking By 3D Model Estimation of Unknown Objects in Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most model-free visual object tracking methods formulate the tracking task as object location estimation given by a 2D segmentation or a bounding box in each video frame. We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation, namely the textured 3D shape and 6DoF pose in each video frame. |
Denys Rozumnyi; Jiří Matas; Marc Pollefeys; Vittorio Ferrari; Martin R. Oswald; |
74 | ScatterNeRF: Seeing Through Fog with Physically-Based Inverse Neural Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce ScatterNeRF, a neural rendering method which adequately renders foggy scenes and decomposes the fog-free background from the participating media — exploiting the multiple views from a short automotive sequence without the need for a large training data corpus. |
Andrea Ramazzina; Mario Bijelic; Stefanie Walz; Alessandro Sanvito; Dominik Scheuble; Felix Heide; |
75 | Sigmoid Loss for Language Image Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple pairwise sigmoid loss for image-text pre-training. |
Xiaohua Zhai; Basil Mustafa; Alexander Kolesnikov; Lucas Beyer; |
76 | PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Generic image captions often miss visual details essential for the LM to answer visual questions correctly. To address this challenge, we propose PromptCap (Prompt-guided image Captioning), a captioning model designed to serve as a better connector between images and black-box LMs. |
Yushi Hu; Hang Hua; Zhengyuan Yang; Weijia Shi; Noah A. Smith; Jiebo Luo; |
77 | Neural Video Depth Stabilizer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: An alternative approach is to learn how to enforce temporal consistency from data, but this requires well-designed models and sufficient video depth data. To address these challenges, we propose a plug-and-play framework called Neural Video Depth Stabilizer (NVDS) that stabilizes inconsistent depth estimations and can be applied to different single-image depth models without extra effort. |
Yiran Wang; Min Shi; Jiaqi Li; Zihao Huang; Zhiguo Cao; Jianming Zhang; Ke Xian; Guosheng Lin; |
78 | Learning Symmetry-Aware Geometry Correspondences for 6D Object Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a geometry correspondence-based framework, termed GCPose, to estimate 6D pose of arbitrary unseen objects without any re-training. |
Heng Zhao; Shenxing Wei; Dahu Shi; Wenming Tan; Zheyang Li; Ye Ren; Xing Wei; Yi Yang; Shiliang Pu; |
79 | TrackFlow: Multi-Object Tracking with Normalizing Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The field of multi-object tracking has recently seen a renewed interest in the good old schema of tracking-by-detection, as its simplicity and strong priors spare it from the complex design and painful babysitting of tracking-by-attention approaches. In view of this, we aim at extending tracking-by-detection to multi-modal settings, where a comprehensive cost has to be computed from heterogeneous information e.g., 2D motion cues, visual appearance, and pose estimates. |
Gianluca Mancusi; Aniello Panariello; Angelo Porrello; Matteo Fabbri; Simone Calderara; Rita Cucchiara; |
80 | Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the success of recent learning-based approaches for image manipulation detection, they typically require expensive pixel-level annotations to train, while exhibiting degraded performance when testing on images that are differently manipulated compared with training images. To address these limitations, we propose weakly-supervised image manipulation detection, such that only binary image-level labels (authentic or tampered with) are required for training purpose. |
Yuanhao Zhai; Tianyu Luan; David Doermann; Junsong Yuan; |
81 | PARF: Primitive-Aware Radiance Fusion for Indoor Scene Novel View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a method for fast scene radiance field reconstruction with strong novel view synthesis performance and convenient scene editing functionality. |
Haiyang Ying; Baowei Jiang; Jinzhi Zhang; Di Xu; Tao Yu; Qionghai Dai; Lu Fang; |
82 | DeePoint: Visual Pointing Recognition and Direction Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we realize automatic visual recognition and direction estimation of pointing. |
Shu Nakamura; Yasutomo Kawanishi; Shohei Nobuhara; Ko Nishino; |
83 | Periodically Exchange Teacher-Student for Source-Free Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such paradigm can easily fall into a training instability problem that when the teacher model collapses uncontrollably due to the domain shift, the student model also suffers drastic performance degradation. To address this issue, we propose the Periodically Exchange Teacher-Student (PETS) method, a simple yet novel approach that introduces a multiple-teacher framework consisting of a static teacher, a dynamic teacher, and a student model. |
Qipeng Liu; Luojun Lin; Zhifeng Shen; Zhifeng Yang; |
84 | Generating Instance-level Prompts for Rehearsal-free Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Domain-Adaptive Prompt (DAP), a novel method for continual learning using Vision Transformers (ViT). |
Dahuin Jung; Dongyoon Han; Jihwan Bang; Hwanjun Song; |
85 | Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To adaptively leverage the visual clue before and after the occlusion or blurring for robust hand pose estimation, we propose the Deformer: a framework that implicitly reasons about the relationship between hand parts within the same image (spatial dimension) and different timesteps (temporal dimension). |
Qichen Fu; Xingyu Liu; Ran Xu; Juan Carlos Niebles; Kris M. Kitani; |
86 | HSE: Hybrid Species Embedding for Deep Metric Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we introduce Hybrid Species Embedding (HSE), which employs mixed sample data augmentations to generate hybrid species and provide additional training signals. |
Bailin Yang; Haoqiang Sun; Frederick W. B. Li; Zheng Chen; Jianlu Cai; Chao Song; |
87 | Online Continual Learning on Hierarchical Label Expansion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our configuration allows a network to first learn coarse-grained classes, with data labels continually expanding to more fine-grained classes in various hierarchy depths. To tackle this new setup, we propose a rehearsal-based method that utilizes hierarchy-aware pseudo-labeling to incorporate hierarchical class information. |
Byung Hyun Lee; Okchul Jung; Jonghyun Choi; Se Young Chun; |
88 | IDAG: Invariant DAG Searching for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we first characterize that this failure of conventional ML models in DG is attributed to an inadequate identification of causal structures. We further propose a novel and theoretically grounded invariant Directed Acyclic Graph (dubbed iDAG) searching framework that attains an invariant graphical relation as the proxy to the causality structure from the intrinsic data-generating process. |
Zenan Huang; Haobo Wang; Junbo Zhao; Nenggan Zheng; |
89 | Spacetime Surface Regularization for Neural Dynamic Scene Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an algorithm, 4DRegSDF, for the spacetime surface regularization to improve the fidelity of neural rendering and reconstruction in dynamic scenes. |
Jaesung Choe; Christopher Choy; Jaesik Park; In So Kweon; Anima Anandkumar; |
90 | GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we found that limited by the scale ambiguity across different scenes in the training dataset, a naive introduction of geometric coarse poses cannot play a positive role in performance improvement, which is counter-intuitive. To address this problem, we propose to refine those poses during training through rotation and translation/scale optimization. |
Chaoqiang Zhao; Matteo Poggi; Fabio Tosi; Lei Zhou; Qiyu Sun; Yang Tang; Stefano Mattoccia; |
91 | 3D Motion Magnification: Visualizing Subtle Motions from Time-Varying Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a 3D motion magnification method that can magnify subtle motions from scenes captured by a moving camera, while supporting novel view rendering. |
Brandon Y. Feng; Hadi Alzayer; Michael Rubinstein; William T. Freeman; Jia-bin Huang; |
92 | Learning to Transform for Generalizable Instance-wise Invariance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Ideally, the appropriate invariance would be learned from data and inferred at test-time. We treat invariance as a prediction problem. Given any image, we predict a distribution over transformations. We use variational inference to learn this distribution end-to-end. |
Utkarsh Singhal; Carlos Esteves; Ameesh Makadia; Stella X. Yu; |
93 | Audio-Visual Deception Detection: DOLOS Dataset and Parameter-Efficient Crossmodal Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite this, deception detection research is hindered by the lack of high-quality deception datasets, as well as the difficulties of learning multimodal features effectively. To address this issue, we introduce DOLOS, the largest gameshow deception detection dataset with rich deceptive conversations. |
Xiaobao Guo; Nithish Muthuchamy Selvaraj; Zitong Yu; Adams Wai-Kin Kong; Bingquan Shen; Alex Kot; |
94 | Multiple Instance Learning Framework with Masked Hard Instance Mining for Whole Slide Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Some literature has revealed that hard examples are beneficial for modeling a discriminative boundary accurately. By applying such an idea at the instance level, we elaborate a novel MIL framework with masked hard instance mining (MHIM-MIL), which uses a Siamese structure (Teacher-Student) with a consistency constraint to explore the potential hard instances. |
Wenhao Tang; Sheng Huang; Xiaoxian Zhang; Fengtao Zhou; Yi Zhang; Bo Liu; |
95 | Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider the inverse problem – given a collection of different images, can we discover the generative concepts that represent each image? |
Nan Liu; Yilun Du; Shuang Li; Joshua B. Tenenbaum; Antonio Torralba; |
96 | Partition-And-Debias: Agnostic Biases Mitigation Via A Mixture of Biases-Specific Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a more challenging scenario, agnostic biases mitigation, aiming at bias removal regardless of whether the type of bias or the number of types is unknown in the datasets. To address this difficult task, we present the Partition-and-Debias (PnD) method that uses a mixture of biases-specific experts to implicitly divide the bias space into multiple subspaces and a gating module to find a consensus among experts to achieve debiased classification. |
Jiaxuan Li; Duc Minh Vo; Hideki Nakayama; |
97 | Spatial Self-Distillation for Object Detection with Inaccurate Bounding Boxes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we heuristically propose a Spatial Self-Distillation based Object Detector (SSD-Det) to mine spatial information to refine the inaccurate box in a self-distillation fashion. |
Di Wu; Pengfei Chen; Xuehui Yu; Guorong Li; Zhenjun Han; Jianbin Jiao; |
98 | CC3D: Layout-Conditioned Generation of Compositional 3D Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts, trained using single-view images. |
Sherwin Bahmani; Jeong Joon Park; Despoina Paschalidou; Xingguang Yan; Gordon Wetzstein; Leonidas Guibas; Andrea Tagliasacchi; |
99 | Alleviating Catastrophic Forgetting of Incremental Object Detection Via Within-Class and Between-Class Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we discuss the cause of catastrophic forgetting in IOD task as destruction of semantic feature space. |
Mengxue Kang; Jinpeng Zhang; Jinming Zhang; Xiashuang Wang; Yang Chen; Zhe Ma; Xuhui Huang; |
100 | TextPSG: Panoptic Scene Graph Generation from Textual Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The problem is very challenging for three constraints: 1) no location priors; 2) no explicit links between visual regions and textual entities; and 3) no pre-defined concept sets. To tackle this problem, we propose a new framework TextPSG consisting of four modules, i.e., a region grouper, an entity grounder, a segment merger, and a label generator, with several novel techniques. |
Chengyang Zhao; Yikang Shen; Zhenfang Chen; Mingyu Ding; Chuang Gan; |
101 | Revisiting The Parameter Efficiency of Adapters from The Perspective of Precision Redundancy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate how to make adapters even more efficient, reaching a new minimum size required to store a task-specific fine-tuned network. |
Shibo Jie; Haoqing Wang; Zhi-Hong Deng; |
102 | EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We proposed a diversity-prompting selection strategy and compatibility screening protocol to avoid premature convergence and improve search efficiency. |
Peijie Dong; Lujun Li; Zimian Wei; Xin Niu; Zhiliang Tian; Hengyue Pan; |
103 | Face Clustering Via Graph Convolutional Networks with Confidence Edges Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we define a new concept called confidence edge and guide the construction of graphs. |
Yang Wu; Zhiwei Ge; Yuhao Luo; Lin Liu; Sulong Xu; |
104 | Learning Spatial-context-aware Global Visual Feature Representation for Instance Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel feature learning framework for instance image retrieval, which embeds local spatial context information into the learned global feature representations. |
Zhongyan Zhang; Lei Wang; Luping Zhou; Piotr Koniusz; |
105 | Cross-modal Latent Space Alignment for Image to Avatar Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel method for automatic vectorized avatar generation from a single portrait image. |
Manuel Ladron de Guevara; Jose Echevarria; Yijun Li; Yannick Hold-Geoffroy; Cameron Smith; Daichi Ito; |
106 | Inspecting The Geographical Representativeness of Images from Text-to-Image Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we measure the geographical representativeness of common nouns (e.g., a house) generated through DALL.E 2 and Stable Diffusion models using a crowdsourced study comprising 540 participants across 27 countries. |
Abhipsa Basu; R. Venkatesh Babu; Danish Pruthi; |
107 | Space-time Prompting for Video Class-incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, prompt-based learning has made impressive progress on image class-incremental learning, but it still lacks sufficient exploration in the video domain. In this paper, we will fill this gap by learning multiple prompts based on a powerful image-language pre-trained model, i.e., CLIP, making it fit for video class-incremental learning (VCIL). |
Yixuan Pei; Zhiwu Qing; Shiwei Zhang; Xiang Wang; Yingya Zhang; Deli Zhao; Xueming Qian; |
108 | Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Differently from previous works that mainly focused on the virtual try-on of garments, we propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images by following multimodal prompts, such as text, human body poses, and garment sketches. |
Alberto Baldrati; Davide Morelli; Giuseppe Cartella; Marcella Cornia; Marco Bertini; Rita Cucchiara; |
109 | Time-to-Contact Map By Joint Estimation of Up-to-Scale Inverse Depth and Global Motion Using A Single Event Camera Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the problem of time-to-contact (TTC) estimation using a single event camera. |
Urbano Miguel Nunes; Laurent Udo Perrinet; Sio-Hoi Ieng; |
110 | Sparse Sampling Transformer with Uncertainty-Driven Ranking for Unified Removal of Raindrops and Rain Streaks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to present an efficient and flexible mechanism to learn and model degradation relationships in a global view, thereby achieving a unified removal of intricate rain scenes. |
Sixiang Chen; Tian Ye; Jinbin Bai; Erkang Chen; Jun Shi; Lei Zhu; |
111 | A Benchmark for Chinese-English Scene Text Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a real-world Chinese-English benchmark dataset, namely Real-CE, for the task of STISR with the emphasis on restoring structurally complex Chinese characters. |
Jianqi Ma; Zhetong Liang; Wangmeng Xiang; Xi Yang; Lei Zhang; |
112 | HSR-Diff: Hyperspectral Image Super-Resolution Via Conditional Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by recent advancements in deep generative models, we propose an HSI Super-resolution (SR) approach with Conditional Diffusion Models (HSR-Diff) that merges a high-resolution (HR) multispectral image (MSI) with the corresponding LR-HSI. |
Chanyue Wu; Dong Wang; Yunpeng Bai; Hanyu Mao; Ying Li; Qiang Shen; |
113 | Replay: Multi-modal Multi-view Acted Videos for Casual Holography Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Replay, a collection of multi-view, multi-modal videos of humans interacting socially. |
Roman Shapovalov; Yanir Kleiman; Ignacio Rocco; David Novotny; Andrea Vedaldi; Changan Chen; Filippos Kokkinos; Ben Graham; Natalia Neverova; |
114 | Advancing Example Exploitation Can Alleviate Critical Challenges in Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, we investigate the role of examples in AT and find that examples which contribute primarily to accuracy or robustness are distinct. Based on this finding, we propose a novel example-exploitation idea that can further improve the performance of advanced AT methods. |
Yao Ge; Yun Li; Keji Han; Junyi Zhu; Xianzhong Long; |
115 | Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions and is trained collaboratively through two sub-networks, a global and a local network. |
Junjia Huang; Haofeng Li; Xiang Wan; Guanbin Li; |
116 | Removing Anomalies As Noises for Industrial Defect Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents a denoising model to detect and localize the anomalies with a generative diffusion model. |
Fanbin Lu; Xufeng Yao; Chi-Wing Fu; Jiaya Jia; |
117 | GPGait: Generalized Pose-based Gait Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve the generalization ability of pose-based methods across datasets, we propose a Generalized Pose-based Gait recognition (GPGait) framework. |
Yang Fu; Shibei Meng; Saihui Hou; Xuecai Hu; Yongzhen Huang; |
118 | Stable and Causal Inference for Discriminative Self-supervised Deep Visual Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although many studies have demonstrated the empirical success of various learning methods, the resulting learned representations can exhibit instability and hinder downstream performance. In this study, we analyze discriminative self-supervised methods from a causal perspective to explain these unstable behaviors and propose solutions to overcome them. |
Yuewei Yang; Hai Li; Yiran Chen; |
119 | ShiftNAS: Improving One-shot NAS Via Probability Shift Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the performance gap and attribute it to the use of uniform sampling, which is a common approach in supernet training. |
Mingyang Zhang; Xinyi Yu; Haodong Zhao; Linlin Ou; |
120 | Semantic Attention Flow Fields for Monocular Dynamic Scene Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: From video, we reconstruct a neural volume that captures time-varying color, density, scene flow, semantics, and attention information. |
Yiqing Liang; Eliot Laidlaw; Alexander Meyerowitz; Srinath Sridhar; James Tompkin; |
121 | LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A critical gap emerges from representing continuous image data in a sparse vocabulary space. To bridge this gap, we introduce a novel pre-training framework, Lexicon-Bottlenecked Language-Image Pre-Training (LexLIP), that learns importance-aware lexicon representations. |
Ziyang Luo; Pu Zhao; Can Xu; Xiubo Geng; Tao Shen; Chongyang Tao; Jing Ma; Qingwei Lin; Daxin Jiang; |
122 | A Fast Unified System for 3D Object Detection and Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present FUS3D, a fast and lightweight system for real-time 3D object detection and tracking on edge devices. |
Thomas Heitzinger; Martin Kampel; |
123 | Adaptive Testing of Computer Vision Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce AdaVision, an interactive process for testing vision models which helps users identify and fix coherent failure modes. |
Irena Gao; Gabriel Ilharco; Scott Lundberg; Marco Tulio Ribeiro; |
124 | LFS-GAN: Lifelong Few-Shot Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the other hand, the existing few-shot GANs suffer from severe catastrophic forgetting when learning multiple tasks. To alleviate these issues, we propose a framework called Lifelong Few-Shot GAN (LFS-GAN) that can generate high-quality and diverse images in the lifelong few-shot image generation task. |
Juwon Seo; Ji-Su Kang; Gyeong-Moon Park; |
125 | AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present an AssIstive Driving pErception dataset (AIDE) that considers context information both inside and outside the vehicle in naturalistic scenarios. |
Dingkang Yang; Shuai Huang; Zhi Xu; Zhenpeng Li; Shunli Wang; Mingcheng Li; Yuzheng Wang; Yang Liu; Kun Yang; Zhaoyu Chen; Yan Wang; Jing Liu; Peixuan Zhang; Peng Zhai; Lihua Zhang; |
126 | Feature Proliferation — The "Cancer" in StyleGAN and Its Treatments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although effective, it has long been noted that the truncation trick tends to reduce the diversity of synthesized images and unnecessarily sacrifices many distinct image features. To address this issue, in this paper, we first delve into the StyleGAN image synthesis mechanism and discover an important phenomenon, namely Feature Proliferation, which demonstrates how specific features reproduce with forward propagation. |
Shuang Song; Yuanbang Liang; Jing Wu; Yu-Kun Lai; Yipeng Qin; |
127 | Self-Supervised Character-to-Character Distillation for Text Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing self-supervised text recognition methods conduct sequence-to-sequence representation learning by roughly splitting the visual features along the horizontal axis, which limits the flexibility of the augmentations, as large geometric-based augmentations may lead to sequence-to-sequence feature inconsistency. Motivated by this, we propose a novel self-supervised Character-to-Character Distillation method, CCD, which enables versatile augmentations to facilitate general text representation learning. |
Tongkun Guan; Wei Shen; Xue Yang; Qi Feng; Zekun Jiang; Xiaokang Yang; |
128 | MixCycle: Mixup Assisted Semi-Supervised 3D Single Object Tracking with Cycle Consistency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the great success of cycle tracking in unsupervised 2D SOT, we introduce the first semi-supervised approach to 3D SOT. |
Qiao Wu; Jiaqi Yang; Kun Sun; Chu’ai Zhang; Yanning Zhang; Mathieu Salzmann; |
129 | Multi-Label Self-Supervised Learning with Scene Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Self-supervised learning (SSL) methods targeting scene images have seen a rapid growth recently, and they mostly rely on either a dedicated dense matching mechanism or a costly unsupervised object discovery module. This paper shows that instead of hinging on these strenuous operations, quality image representations can be learned by treating scene/multi-label image SSL simply as a multi-label classification problem, which greatly simplifies the learning framework. |
Ke Zhu; Minghao Fu; Jianxin Wu; |
130 | Domain Adaptive Few-Shot Open-Set Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing techniques fall short when it comes to identifying target outliers under domain shifts by learning to reject pseudo-outliers from the source domain, resulting in an incomplete solution to both problems. To address these challenges comprehensively, we propose a novel approach called Domain Adaptive Few-Shot Open Set Recognition (DA-FSOS) and introduce a meta-learning-based architecture named DAFOS-Net. |
Debabrata Pal; Deeptej More; Sai Bhargav; Dipesh Tamboli; Vaneet Aggarwal; Biplab Banerjee; |
131 | DiffFacto: Controllable Part-Based 3D Point Cloud Generation with Cross Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce DiffFacto, a novel probabilistic generative model that learns the distribution of shapes with part-level control. |
George Kiyohiro Nakayama; Mikaela Angelina Uy; Jiahui Huang; Shi-Min Hu; Ke Li; Leonidas Guibas; |
132 | Interactive Class-Agnostic Object Counting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel framework for interactive class-agnostic object counting, where a human user can interactively provide feedback to improve the accuracy of a counter. |
Yifeng Huang; Viresh Ranjan; Minh Hoai; |
133 | Spatio-temporal Prompting Network for Robust Video Feature Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a neat and unified framework, called Spatio-Temporal Prompting Network (STPN). |
Guanxiong Sun; Chi Wang; Zhaoyu Zhang; Jiankang Deng; Stefanos Zafeiriou; Yang Hua; |
134 | Enhancing Fine-Tuning Based Backdoor Defense with Sharpness-Aware Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enhance the fine-tuning-based defense, inspired by the observation that backdoor-related neurons often have larger weight norms, we propose FT-SAM, a novel backdoor defense paradigm that aims to shrink the norms of backdoor-related neurons by incorporating sharpness-aware minimization with fine-tuning. |
Mingli Zhu; Shaokui Wei; Li Shen; Yanbo Fan; Baoyuan Wu; |
135 | Deep Geometry-Aware Camera Self-Calibration from Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose a camera self-calibration approach that infers camera intrinsics during application, from monocular videos in the wild. |
Annika Hagemann; Moritz Knorr; Christoph Stiller; |
136 | A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study weakly semi-supervised 3D object detection (WSS3D) with point annotations, where the dataset comprises a small amount of fully labeled data and a large amount of weakly labeled data, with a single point annotated for each 3D object. |
Dingyuan Zhang; Dingkang Liang; Zhikang Zou; Jingyu Li; Xiaoqing Ye; Zhe Liu; Xiao Tan; Xiang Bai; |
137 | Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fully take gradient stability into consideration, we present a new perspective on BNN training, regarding it as an equilibrium between the estimation error and gradient stability. |
Xiao-Ming Wu; Dian Zheng; Zuhao Liu; Wei-Shi Zheng; |
138 | Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a long-sequence modeling framework, named StreamPETR, for multi-view 3D object detection. |
Shihao Wang; Yingfei Liu; Tiancai Wang; Ying Li; Xiangyu Zhang; |
139 | Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing image classification benchmarks often evaluate recognition on a specific domain (e.g., outdoor images) or a specific task (e.g., classifying plant species), which falls short of evaluating whether pre-trained foundational models are universal visual recognizers. To address this, we formally present the task of Open-domain Visual Entity recognitioN (OVEN), where a model needs to link an image to a Wikipedia entity with respect to a text query. |
Hexiang Hu; Yi Luan; Yang Chen; Urvashi Khandelwal; Mandar Joshi; Kenton Lee; Kristina Toutanova; Ming-Wei Chang; |
140 | MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice. |
Chaoyi Wu; Xiaoman Zhang; Ya Zhang; Yanfeng Wang; Weidi Xie; |
141 | Automated Knowledge Distillation Via Monte Carlo Tree Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Auto-KD, the first automated search framework for optimal knowledge distillation design. |
Lujun Li; Peijie Dong; Zimian Wei; Ya Yang; |
142 | EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods often neglect emotional facial expressions or fail to disentangle them from speech content. To address this issue, this paper proposes an end-to-end neural network to disentangle different emotions in speech so as to generate rich 3D facial expressions. |
Ziqiao Peng; Haoyu Wu; Zhenbo Song; Hao Xu; Xiangyu Zhu; Jun He; Hongyan Liu; Zhaoxin Fan; |
143 | A Soft Nearest-Neighbor Framework for Continual Semi-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data. In this paper, we tackle this challenge and propose an approach for continual semi-supervised learning, a setting where not all the data samples are labeled. |
Zhiqi Kang; Enrico Fini; Moin Nabi; Elisa Ricci; Karteek Alahari; |
144 | Text-Conditioned Sampling Framework for Text-to-Image Generation with Masked Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent token-based approaches achieve competitive performance to diffusion-based models, their generation performance is still suboptimal as they sample multiple tokens simultaneously without considering the dependence among them. We empirically investigate this problem and propose a learnable sampling model, Text-Conditioned Token Selection (TCTS), to select optimal tokens via localized supervision with text information. |
Jaewoong Lee; Sangwon Jang; Jaehyeong Jo; Jaehong Yoon; Yunji Kim; Jin-Hwa Kim; Jung-Woo Ha; Sung Ju Hwang; |
145 | ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ScanNet++, a large-scale dataset that couples together capture of high-quality and commodity-level geometry and color of indoor scenes. |
Chandan Yeshwanth; Yueh-Cheng Liu; Matthias Nießner; Angela Dai; |
146 | Minimal Solutions to Uncalibrated Two-view Geometry with Known Epipoles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes minimal solutions to uncalibrated two-view geometry with known epipoles. |
Gaku Nakano; |
147 | Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the problem, we propose a novel method to find semantic variations of the target text in the CLIP space. |
Seogkyu Jeon; Bei Liu; Pilhyeon Lee; Kibeom Hong; Jianlong Fu; Hyeran Byun; |
148 | Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issue, we propose CPEM (Context-aware Planner and Environment-aware Memory), an embodied agent that incorporates the contextual information of previous actions for planning and maintains the spatial arrangement of objects together with their states (e.g., whether an object has already been moved) in the environment for the perception model, improving both visual navigation and object interactions. |
Byeonghwi Kim; Jinyeon Kim; Yuyeong Kim; Cheolhong Min; Jonghyun Choi; |
149 | Vox-E: Text-Guided Voxel Editing of 3D Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a technique that harnesses the power of latent diffusion models for editing existing 3D objects. |
Etai Sella; Gal Fiebelman; Peter Hedman; Hadar Averbuch-Elor; |
150 | Inverse Problem Regularization with Hierarchical Variational Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to regularize ill-posed inverse problems using a deep hierarchical Variational AutoEncoder (HVAE) as an image prior. |
Jean Prost; Antoine Houdard; Andrés Almansa; Nicolas Papadakis; |
151 | Unpaired Multi-domain Attribute Translation of 3D Facial Shapes with A Square and Symmetric Geometric Map Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is primarily limited by the lack of 3D generative models and ineffective usage of 3D facial data. We propose a learning framework for 3D facial attribute translation to relieve these limitations. |
Zhenfeng Fan; Zhiheng Zhang; Shuang Yang; Chongyang Zhong; Min Cao; Shihong Xia; |
152 | Passive Ultra-Wideband Single-Photon Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of imaging a dynamic scene over an extreme range of timescales simultaneously, from seconds to picoseconds, and doing so passively, without much light, and without any timing signals from the light source(s) emitting it. |
Mian Wei; Sotiris Nousias; Rahul Gulve; David B. Lindell; Kiriakos N. Kutulakos; |
153 | Template Inversion Attack Against Face Recognition Systems Using 3D Face Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on template inversion attacks against face recognition systems and introduce a novel method (dubbed GaFaR) to reconstruct 3D faces from facial templates. |
Hatef Otroshi Shahreza; Sébastien Marcel; |
154 | ETran: Energy-Based Transferability Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose ETran, an energy-based transferability assessment metric, which includes three scores: 1) energy score, 2) classification score, and 3) regression score. |
Mohsen Gholami; Mohammad Akbari; Xinglu Wang; Behnam Kamranian; Yong Zhang; |
155 | Predict to Detect: Prediction-guided 3D Object Detection Using Sequential Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These approaches do not fully exploit the potential of sequential images and show limited performance improvements. To address this limitation, we propose a novel 3D object detection model, P2D (Predict to Detect), that integrates a prediction scheme into a detection framework to explicitly extract and leverage motion features. |
Sanmin Kim; Youngseok Kim; In-Jae Lee; Dongsuk Kum; |
156 | Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation for Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation (UniCon-HA), taking into account both the requirements above. |
Guodong Wang; Yunhong Wang; Jie Qin; Dongming Zhang; Xiuguo Bao; Di Huang; |
157 | Learning Image-Adaptive Codebooks for Class-Agnostic Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose AdaCode for learning image-adaptive codebooks for class-agnostic image restoration. |
Kechun Liu; Yitong Jiang; Inchang Choi; Jinwei Gu; |
158 | 3D Segmentation of Humans in Point Clouds with Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Few works have attempted to directly segment humans in cluttered 3D scenes, which is largely due to the lack of annotated training data of humans interacting with 3D scenes. We address this challenge and propose a framework for generating training data of synthetic humans interacting with real 3D scenes. |
Ayça Takmaz; Jonas Schult; Irem Kaftan; Mertcan Akçay; Bastian Leibe; Robert Sumner; Francis Engelmann; Siyu Tang; |
159 | Mastering Spatial Graph Prediction of Road Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Accurately predicting road networks from satellite images requires a global understanding of the network topology. We propose to capture such high-level information by introducing a graph-based framework that, given a partially generated graph, sequentially adds new edges. |
Anagnostidis Sotiris; Aurelien Lucchi; Thomas Hofmann; |
160 | IDiff-Face: Synthetic-based Face Recognition Through Fizzy Identity-Conditioned Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent synthetic datasets that are used to train face recognition models suffer either from limitations in intra-class diversity or cross-class (identity) discrimination, leading to suboptimal accuracies that remain far below those achieved by models trained on authentic data. This paper targets this issue by proposing IDiff-Face, a novel approach based on conditional latent diffusion models for synthetic identity generation with realistic identity variations for face recognition training. |
Fadi Boutros; Jonas Henry Grebe; Arjan Kuijper; Naser Damer; |
161 | Deep Video Demoireing Via Compact Invertible Dyadic Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By interpreting video demoireing as a multi-frame decomposition problem, we propose a compact invertible dyadic network called CIDNet that progressively decouples latent frames and the moire patterns from an input video sequence. |
Yuhui Quan; Haoran Huang; Shengfeng He; Ruotao Xu; |
162 | Rethinking Multi-Contrast MRI Super-Resolution: Rectangle-Window Cross-Attention Transformer and Arbitrary-Scale Upsampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, we propose reference-aware implicit attention as an upsampling module, achieving arbitrary-scale super-resolution via implicit neural representation while further fusing supplementary information from the reference image. |
Guangyuan Li; Lei Zhao; Jiakai Sun; Zehua Lan; Zhanjie Zhang; Jiafu Chen; Zhijie Lin; Huaizhong Lin; Wei Xing; |
163 | Domain Generalization Via Rationale Invariance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For a well-generalized model, we suggest the rationale matrices for samples belonging to the same category should be similar, indicating the model relies on domain-invariant clues to make decisions, thereby ensuring robust results. To implement this idea, we introduce a rationale invariance loss as a simple regularization technique, requiring only a few lines of code. |
Liang Chen; Yong Zhang; Yibing Song; Anton van den Hengel; Lingqiao Liu; |
164 | ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained VLMs via inter/intra-modal alignment in a post-hoc manner without needing large-scale datasets or computing. |
Uddeshya Upadhyay; Shyamgopal Karthik; Massimiliano Mancini; Zeynep Akata; |
165 | Towards Open-Set Test-Time Adaptation Utilizing The Wisdom of Crowds in Entropy Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Long-term stable adaptation is hampered by such noisy signals, so training models without such error accumulation is crucial for practical TTA. To address these issues, including open-set TTA, we propose a simple yet effective sample selection method inspired by the following crucial empirical finding. |
Jungsoo Lee; Debasmit Das; Jaegul Choo; Sungha Choi; |
166 | Scene Graph Contrastive Learning for Embodied Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the Scene Graph Contrastive (SGC) loss, which uses scene graphs as training-only supervisory signals. |
Kunal Pratap Singh; Jordi Salvador; Luca Weihs; Aniruddha Kembhavi; |
167 | Long-Range Grouping Transformer for Multi-View 3D Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To alleviate this problem, recent methods compress the number of tokens representing each view or discard the attention operations between tokens from different views. Obviously, these strategies have a negative impact on performance. Therefore, we propose long-range grouping attention (LGA) based on the divide-and-conquer principle. |
Liying Yang; Zhenwei Zhu; Xuxin Lin; Jian Nong; Yanyan Liang; |
168 | Latent-OFER: Detect, Mask, and Reconstruct with Latent Vectors for Occluded Facial Expression Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Latent-OFER, the proposed method, can detect occlusions, restore occluded parts of the face as if they were unoccluded, and recognize them, improving FER accuracy. |
Isack Lee; Eungi Lee; Seok Bong Yoo; |
169 | DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the DenseShift network, which significantly improves the accuracy of Shift networks, achieving performance competitive with full-precision networks for vision and speech applications. |
Xinlin Li; Bang Liu; Rui Heng Yang; Vanessa Courville; Chao Xing; Vahid Partovi Nia; |
170 | Preparing The Future for Continual Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we focus on Continual Semantic Segmentation (CSS) and present a novel approach that addresses the difficulty existing methods have in learning new classes. |
Zihan Lin; Zilei Wang; Yixin Zhang; |
171 | Efficient Computation Sharing for Multi-Task Visual Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel computation- and parameter-sharing framework that balances efficiency and accuracy to perform multiple visual tasks utilizing individually-trained single-task transformers. |
Sara Shoouri; Mingyu Yang; Zichen Fan; Hun-Seok Kim; |
172 | Self-supervised Cross-view Representation Reconstruction for Change Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Its key challenge is how to learn a stable difference representation under pseudo changes caused by viewpoint change. In this paper, we address this by proposing a self-supervised cross-view representation reconstruction (SCORER) network. |
Yunbin Tu; Liang Li; Li Su; Zheng-Jun Zha; Chenggang Yan; Qingming Huang; |
173 | Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose an Unify, Align and then Refine (UAR) approach to learn multi-level cross-modal alignments and introduce three novel modules: Latent Space Unifier (LSU), Cross-modal Representation Aligner (CRA) and Text-to-Image Refiner (TIR). |
Yaowei Li; Bang Yang; Xuxin Cheng; Zhihong Zhu; Hongxiang Li; Yuexian Zou; |
174 | Synthesizing Diverse Human Motions in 3D Indoor Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel method for populating 3D indoor scenes with virtual humans that can navigate in the environment and interact with objects in a realistic manner. |
Kaifeng Zhao; Yan Zhang; Shaofei Wang; Thabo Beeler; Siyu Tang; |
175 | Deep Optics for Video Snapshot Compressive Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, there are two clouds in the sunshine of SCI: i) low dynamic range as a victim of high temporal multiplexing, and ii) the degradation of existing deep learning algorithms on real systems. To address these challenges, this paper presents a deep optics framework to jointly optimize masks and a reconstruction network. |
Ping Wang; Lishun Wang; Xin Yuan; |
176 | DDIT: Semantic Scene Completion Via Deformable Deep Implicit Templates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the completed shapes may be rough and imprecise since the respective methods rely on 3D convolution and/or lack effective shape constraints. To overcome these limitations, we propose a semantic scene completion method based on deformable deep implicit templates (DDIT). |
Haoang Li; Jinhu Dong; Binghui Wen; Ming Gao; Tianyu Huang; Yun-Hui Liu; Daniel Cremers; |
177 | Joint Demosaicing and Deghosting of Time-Varying Exposures for Single-Shot HDR Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, time-varying exposures are not ideal for dynamic scenes and require an additional deghosting method. To tackle this issue, we propose a single-shot HDR demosaicing method that takes time-varying multiple exposures as input and jointly solves both the demosaicing and deghosting problems. |
Jungwoo Kim; Min H. Kim; |
178 | Scene-Aware Feature Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This results in significant performance degradation when handling challenging scenes such as scenes with large viewpoint and illumination changes. To tackle this problem, we propose a novel model named SAM, which applies attentional grouping to guide Scene-Aware feature Matching. |
Xiaoyong Lu; Yaping Yan; Tong Wei; Songlin Du; |
179 | FDViT: Improve The Hierarchical Architecture of Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose FDViT to improve the hierarchical architecture of the vision transformer by using a flexible downsampling layer that is not limited to integer stride to smoothly reduce the sizes of the middle feature maps. |
Yixing Xu; Chao Li; Dong Li; Xiao Sheng; Fan Jiang; Lu Tian; Ashish Sirasao; |
180 | Tuning Pre-trained Model Via Moment Probing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Moment Probing (MP) method to further explore the potential of LP. |
Mingze Gao; Qilong Wang; Zhenyi Lin; Pengfei Zhu; Qinghua Hu; Jingbo Zhou; |
181 | Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel end-to-end document understanding model called SeRum (SElective Region Understanding Model) for extracting meaningful information from document images, including document analysis, retrieval, and office automation. |
Haoyu Cao; Changcun Bao; Chaohu Liu; Huang Chen; Kun Yin; Hao Liu; Yinsong Liu; Deqiang Jiang; Xing Sun; |
182 | Task Agnostic Restoration of Natural Video Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: State-Of-The-Art (SOTA) techniques that address these inconsistencies rely on the availability of unprocessed videos to implicitly siphon and utilize consistent video dynamics to restore the temporal consistency of frame-wise processed videos, which often jeopardizes the translation effect. We propose a general framework for this task that learns to infer and utilize consistent motion dynamics from inconsistent videos to mitigate the temporal flicker while preserving the perceptual quality for both the temporally neighboring and relatively distant frames, without requiring the raw videos at test time. |
Muhammad Kashif Ali; Dongjin Kim; Tae Hyun Kim; |
183 | TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present TMR, a simple yet effective approach for text-to-3D human motion retrieval. |
Mathis Petrovich; Michael J. Black; Gül Varol; |
184 | 3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce probabilistic modeling to the inverse graphics framework to quantify uncertainty and achieve robustness in 6D pose estimation tasks. |
Guangyao Zhou; Nishad Gothoskar; Lirui Wang; Joshua B. Tenenbaum; Dan Gutfreund; Miguel Lázaro-Gredilla; Dileep George; Vikash K. Mansinghka; |
185 | Towards Robust Model Watermark Via Reducing Parametric Vulnerability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To further explore this vulnerability, we investigate the parametric space and find there exist many watermark-removed models in the vicinity of the watermarked one, which may be easily used by removal attacks. Inspired by this finding, we propose a minimax formulation to find these watermark-removed models and recover their watermark behavior. |
Guanhao Gan; Yiming Li; Dongxian Wu; Shu-Tao Xia; |
186 | SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel training strategy called SupFusion, which provides an auxiliary feature level supervision for effective LiDAR-Camera fusion and significantly boosts detection performance. |
Yiran Qin; Chaoqun Wang; Zijian Kang; Ningning Ma; Zhen Li; Ruimao Zhang; |
187 | EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle these issues, this paper proposes the Emotional Motion Memory Net (EMMN), which synthesizes the overall expression on the talking face via emotion embeddings and lip motion rather than audio alone. |
Shuai Tan; Bin Ji; Ye Pan; |
188 | Rethinking Vision Transformers for MobileNet Size and Speed Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate a central question: can transformer models run as fast as MobileNet while maintaining a similar size? |
Yanyu Li; Ju Hu; Yang Wen; Georgios Evangelidis; Kamyar Salahi; Yanzhi Wang; Sergey Tulyakov; Jian Ren; |
189 | Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, dramatic and complex motions in the driving video cause ambiguous generation, because the still source image cannot provide sufficient appearance information for occluded regions or delicate expression variations, which produces severe artifacts and significantly degrades the generation quality. To tackle this problem, we propose to learn a global facial representation space, and design a novel implicit identity representation conditioned memory compensation network, coined as MCNet, for high-fidelity talking head generation. |
Fa-Ting Hong; Dan Xu; |
190 | SINC: Self-Supervised In-Context Learning for Vision-Language Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To answer it, we propose a succinct and general framework, Self-supervised IN-Context learning (SINC), that introduces a meta-model to learn on self-supervised prompts consisting of tailored demonstrations. |
Yi-Syuan Chen; Yun-Zhu Song; Cheng Yu Yeo; Bei Liu; Jianlong Fu; Hong-Han Shuai; |
191 | LEA2: A Lightweight Ensemble Adversarial Attack Via Non-overlapping Vulnerable Frequency Regions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we find three types of models with non-overlapping vulnerable frequency regions, which can cover a large enough vulnerable subspace. |
Yaguan Qian; Shuke He; Chenyu Zhao; Jiaqiang Sha; Wei Wang; Bin Wang; |
192 | Chupa: Carving 3D Clothed Humans from Skinned Shape Priors Using 2D Diffusion Probabilistic Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a 3D generation pipeline that uses diffusion models to generate realistic human digital avatars. |
Byungjun Kim; Patrick Kwon; Kwangho Lee; Myunggi Lee; Sookwan Han; Daesik Kim; Hanbyul Joo; |
193 | Unsupervised Domain Adaptive Detection with Network Stability Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, drawing inspiration from the concept of stability from the control theory that a robust system requires to remain consistent both externally and internally regardless of disturbances, we propose a novel framework that achieves unsupervised domain adaptive detection through stability analysis. |
Wenzhang Zhou; Heng Fan; Tiejian Luo; Libo Zhang; |
194 | Learning A Room with The Occ-SDF Hybrid: Signed Distance Function Mingled with Occupancy Aids Scene Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our findings show that the color rendering loss creates an optimization bias against low-intensity areas, resulting in vanishing gradients and leaving these areas unoptimized. To address this issue, we propose a feature-based color rendering loss that utilizes non-zero feature values to bring back optimization signals. |
Xiaoyang Lyu; Peng Dai; Zizhang Li; Dongyu Yan; Yi Lin; Yifan Peng; Xiaojuan Qi; |
195 | Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we define and study a new Cloth2Body problem, whose goal is to generate 3D human body meshes from a 2D clothing image. |
Lu Dai; Liqian Ma; Shenhan Qian; Hao Liu; Ziwei Liu; Hui Xiong; |
196 | Spatially and Spectrally Consistent Deep Functional Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Cycle consistency has long been exploited as a powerful prior for jointly optimizing maps within a collection of shapes. In this paper, we investigate its utility in the approaches of Deep Functional Maps, which are considered state-of-the-art in non-rigid shape matching. |
Mingze Sun; Shiwei Mao; Puhua Jiang; Maks Ovsjanikov; Ruqi Huang; |
197 | Sparse Point Guided 3D Lane Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a sparse point-guided 3D lane detection method, focusing on points related to 3D lanes. |
Chengtang Yao; Lidong Yu; Yuwei Wu; Yunde Jia; |
198 | Event-based Temporally Dense Optical Flow Estimation with Sequential Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that temporally dense flow estimation at 100 Hz can be achieved by treating flow estimation as a sequential problem using two different variants of recurrent networks: Long Short-Term Memory (LSTM) and Spiking Neural Networks (SNNs). |
Wachirawit Ponghiran; Chamika Mihiranga Liyanagedera; Kaushik Roy; |
199 | Going Beyond Nouns With Vision & Language Models Using Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For example, their difficulty in understanding Visual Language Concepts (VLC) that go ‘beyond nouns’, such as the meaning of non-object words (e.g., attributes, actions, relations, states), or their difficulty in performing compositional reasoning, such as understanding the significance of word order in a sentence. In this work, we investigate to what extent purely synthetic data can be leveraged to teach these models to overcome such shortcomings without compromising their zero-shot capabilities. |
Paola Cascante-Bonilla; Khaled Shehada; James Seale Smith; Sivan Doveh; Donghyun Kim; Rameswar Panda; Gul Varol; Aude Oliva; Vicente Ordonez; Rogerio Feris; Leonid Karlinsky; |
200 | Continual Zero-Shot Learning Through Semantically Guided Generative Random Walks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the challenge of continual zero-shot learning where unseen information is not provided during training, by leveraging generative modeling. |
Wenxuan Zhang; Paul Janson; Kai Yi; Ivan Skorokhodov; Mohamed Elhoseiny; |
201 | Foreground-Background Distribution Modeling Transformer for Visual Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the feature learning of these Transformer-based trackers is easily disturbed by complex backgrounds. To address the above limitations, we propose a novel foreground-background distribution modeling transformer for visual object tracking (F-BDMTrack), including a fore-background agent learning (FBAL) module and a distribution-aware attention (DA2) module in a unified transformer architecture. |
Dawei Yang; Jianfeng He; Yinchao Ma; Qianjin Yu; Tianzhu Zhang; |
202 | MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To investigate the feasibility of using motion expressions to ground and segment objects in videos, we propose a large-scale dataset called MeViS, which contains numerous motion expressions to indicate target objects in complex environments. |
Henghui Ding; Chang Liu; Shuting He; Xudong Jiang; Chen Change Loy; |
203 | OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Omni-suPErvised Representation leArning with hierarchical supervisions (OPERA) as a solution. |
Chengkun Wang; Wenzhao Zheng; Zheng Zhu; Jie Zhou; Jiwen Lu; |
204 | GPFL: Simultaneously Learning Global and Personalized Feature Information for Personalized Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, from the perspective of feature extraction, most existing pFL methods only focus on extracting global or personalized feature information during local training, which fails to meet the collaborative learning and personalization goals of pFL. To address this, we propose a new pFL method, named GPFL, to simultaneously learn global and personalized feature information on each client. |
Jianqing Zhang; Yang Hua; Hao Wang; Tao Song; Zhengui Xue; Ruhui Ma; Jian Cao; Haibing Guan; |
205 | Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing methods require computationally expensive fine-tuning of diffusion models or additional neural networks. To address this, we propose a zero-shot contrastive loss for diffusion models that does not require additional fine-tuning or auxiliary networks. |
Serin Yang; Hyunmin Hwang; Jong Chul Ye; |
206 | Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents ER-NeRF, a novel conditional Neural Radiance Fields (NeRF) based architecture for talking portrait synthesis that can concurrently achieve fast convergence, real-time rendering, and state-of-the-art performance with small model size. |
Jiahe Li; Jiawei Zhang; Xiao Bai; Jun Zhou; Lin Gu; |
207 | End2End Multi-View Feature Matching with Differentiable Pose Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a graph attention network to predict image correspondences along with confidence weights. |
Barbara Roessle; Matthias Nießner; |
208 | Low-Light Image Enhancement with Illumination-Aware Gamma Correction and Complete Image Modelling Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel network structure with illumination-aware gamma correction and complete image modelling to solve the low-light image enhancement problem. |
Yinglong Wang; Zhen Liu; Jianzhuang Liu; Songcen Xu; Shuaicheng Liu; |
209 | Both Diverse and Realism Matter: Physical Attribute and Style Alignment for Rainy Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing RIG methods mainly focus on diversity but miss realism, or focus on realism but neglect the diversity of the generation. To solve this dilemma, we propose a physical alignment and controllable generation network (PCGNet) for diverse and realistic rain generation. |
Changfeng Yu; Shiming Chen; Yi Chang; Yibing Song; Luxin Yan; |
210 | Exploring The Benefits of Visual Prompting in Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore the benefits of VP in constructing compelling neural network classifiers with differential privacy (DP). |
Yizhe Li; Yu-Lin Tsai; Chia-Mu Yu; Pin-Yu Chen; Xuebin Ren; |
211 | Single Image Reflection Separation Via Component Synergy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, based on the investigation of the weaknesses of existing models, we propose a more general form of the superposition model by introducing a learnable residue term, which can effectively capture residual information during decomposition, guiding the separated layers to be complete. |
Qiming Hu; Xiaojie Guo; |
212 | Mining Bias-target Alignment from Voronoi Cells Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a bias-agnostic approach to mitigate the impact of biases in deep neural networks. |
Rémi Nahon; Van-Tam Nguyen; Enzo Tartaglione; |
213 | The Victim and The Beneficiary: Exploiting A Poisoned Model to Train A Clean Model on Poisoned Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we find that the poisoned samples and benign samples can be distinguished with prediction entropy. |
Zixuan Zhu; Rui Wang; Cong Zou; Lihua Jing; |
214 | DIFFGUARD: Semantic Mismatch-Guided Out-of-Distribution Detection Using Pre-Trained Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As diffusion models are much easier to train and amenable to various conditions compared to cGANs, in this work, we propose to directly use pre-trained diffusion models for semantic mismatch-guided OOD detection, named DiffGuard. |
Ruiyuan Gao; Chenchen Zhao; Lanqing Hong; Qiang Xu; |
215 | Identity-Seeking Self-Supervised Representation Learning for Generalizable Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose an Identity-seeking Self-supervised Representation learning (ISR) method. |
Zhaopeng Dou; Zhongdao Wang; Yali Li; Shengjin Wang; |
216 | 3D-Aware Generative Model for Improved Side-View Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose SideGAN, a novel 3D GAN training method to generate photo-realistic images irrespective of the camera pose, especially for faces of side-view angles. |
Kyungmin Jo; Wonjoon Jin; Jaegul Choo; Hyunjoon Lee; Sunghyun Cho; |
217 | Tracking Anything with Decoupled Video Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To ‘track anything’ without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. |
Ho Kei Cheng; Seoung Wug Oh; Brian Price; Alexander Schwing; Joon-Young Lee; |
218 | Generative Gradient Inversion Via Over-Parameterized Networks in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, our study shows that local participants in a federated learning system are vulnerable to potential data leakage issues. |
Chi Zhang; Zhang Xiaoman; Ekanut Sotthiwat; Yanyu Xu; Ping Liu; Liangli Zhen; Yong Liu; |
219 | EQ-Net: Elastic Quantization Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore a one-shot network quantization regime, named Elastic Quantization Neural Networks (EQ-Net), which aims to train a robust weight-sharing quantization supernet. |
Ke Xu; Lei Han; Ye Tian; Shangshang Yang; Xingyi Zhang; |
220 | OxfordTVG-HIC: Can Machine Make Humorous Captions from Images? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents OxfordTVG-HIC (Humorous Image Captions), a large-scale dataset for humour generation and understanding. |
Runjia Li; Shuyang Sun; Mohamed Elhoseiny; Philip Torr; |
221 | Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches often rely on expensive human annotations as supervision for model training, limiting their scalability to large, unlabeled datasets. To address this challenge, we present ZeroSeg, a novel method that leverages the existing pretrained vision-language (VL) model (e.g. CLIP vision encoder) to train open-vocabulary zero-shot semantic segmentation models. |
Jun Chen; Deyao Zhu; Guocheng Qian; Bernard Ghanem; Zhicheng Yan; Chenchen Zhu; Fanyi Xiao; Sean Chang Culatana; Mohamed Elhoseiny; |
222 | EDAPS: Enhanced Domain-Adaptive Panoptic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite being a crucial output of the perception stack, panoptic segmentation has been largely overlooked by the domain adaptation community. Therefore, we revisit well-performing domain adaptation strategies from other fields, adapt them to panoptic segmentation, and show that they can effectively enhance panoptic domain adaptation. |
Suman Saha; Lukas Hoyer; Anton Obukhov; Dengxin Dai; Luc Van Gool; |
223 | Parallax-Tolerant Unsupervised Deep Image Stitching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, deep stitching schemes overcome adverse conditions by adaptively learning robust semantic features, but they cannot handle large-parallax cases. To solve these issues, we propose a parallax-tolerant unsupervised deep image stitching technique. |
Lang Nie; Chunyu Lin; Kang Liao; Shuaicheng Liu; Yao Zhao; |
224 | Scratch Each Other’s Back: Incomplete Multi-Modal Brain Tumor Segmentation Via Category Aware Group Self-Support Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, considering the sensitivity of different modalities to diverse tumor regions, we propose a Category Aware Group Self-Support Learning framework, called GSS, to make up for the information deficit among the modalities in the individual modal feature extraction phase. |
Yansheng Qiu; Delin Chen; Hongdou Yao; Yongchao Xu; Zheng Wang; |
225 | SFHarmony: Source Free Domain Adaptation for Distributed Neuroimaging Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, neuroimaging data is inherently personal in nature, leading to data privacy concerns when sharing the data. To overcome these barriers, we propose an Unsupervised Source-Free Domain Adaptation (SFDA) method, SFHarmony. |
Nicola K Dinsdale; Mark Jenkinson; Ana IL Namburete; |
226 | M2T: Masking Transformers Twice for Faster Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show how bidirectional transformers trained for masked token prediction can be applied to neural image compression to achieve state-of-the-art results. |
Fabian Mentzer; Eirikur Agustson; Michael Tschannen; |
227 | CoIn: Contrastive Instance Feature Mining for Outdoor 3D Object Detection with Very Limited Annotations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current detectors usually perform poorly under very limited annotations. To address this problem, we propose a novel Contrastive Instance feature mining method, named CoIn. |
Qiming Xia; Jinhao Deng; Chenglu Wen; Hai Wu; Shaoshuai Shi; Xin Li; Cheng Wang; |
228 | 3D Human Mesh Recovery with Sequentially Global Rotation Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes to directly estimate the global rotation of each joint to avoid error accumulation and pursue better accuracy. |
Dongkai Wang; Shiliang Zhang; |
229 | DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Driven by the belief that the ability to anticipate the consequences of future actions is crucial for the emergence of intelligent and interpretable planning behavior, we propose Dreamwalker, a world-model-based VLN-CE agent. |
Hanqing Wang; Wei Liang; Luc Van Gool; Wenguan Wang; |
230 | Computation and Data Efficient Backdoor Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is also very time-consuming as it needs to go through almost all the training stages for data selection. To address such limitations, we propose a novel confidence-based scoring methodology, which can efficiently measure the contribution of each poisoning sample based on the distance posteriors. |
Yutong Wu; Xingshuo Han; Han Qiu; Tianwei Zhang; |
231 | Agglomerative Transformer for Human-Object Interaction Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an agglomerative Transformer (AGER) that enables Transformer-based human-object interaction (HOI) detectors to flexibly exploit extra instance-level cues in a single-stage and end-to-end manner for the first time. |
Danyang Tu; Wei Sun; Guangtao Zhai; Wei Shen; |
232 | Decouple Before Interact: Multi-Modal Prompt Learning for Continual Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, neglecting the interactions between modalities will lead to poor performance. To tackle these challenging issues, we propose a comprehensive formulation for CL-VQA from the perspective of multi-modal vision-language fusion. |
Zi Qian; Xin Wang; Xuguang Duan; Pengda Qin; Yuhong Li; Wenwu Zhu; |
233 | Rethinking Fast Fourier Convolution in Image Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the fundamental flaws of using FFC in image inpainting, which are 1) spectrum shifting, 2) unexpected spatial activation, and 3) limited frequency receptive field. |
Tianyi Chu; Jiafu Chen; Jiakai Sun; Shuobin Lian; Zhizhong Wang; Zhiwen Zuo; Lei Zhao; Wei Xing; Dongming Lu; |
234 | Learning Robust Representations with Information Bottleneck and Memory Network for RGB-D-based Gesture Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a convenient and analytical framework to learn a robust feature representation that is impervious to gesture-irrelevant factors. |
Yunan Li; Huizhou Chen; Guanwen Feng; Qiguang Miao; |
235 | P1AC: Revisiting Absolute Pose From A Single Affine Correspondence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the first general solution to the problem of estimating the pose of a calibrated camera given a single observation of an oriented point and an affine correspondence. |
Jonathan Ventura; Zuzana Kukelova; Torsten Sattler; Dániel Baráth; |
236 | LAN-HDR: Luminance-based Alignment Network for High Dynamic Range Video Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an end-to-end HDR video composition framework, which aligns LDR frames in the feature space and then merges aligned features into an HDR frame, without relying on pixel-domain optical flow. |
Haesoo Chung; Nam Ik Cho; |
237 | Dancing in The Dark: A Benchmark Towards General Low-light Video Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current research in this area is limited by the lack of high-quality benchmark datasets. To address this issue, we design a camera system and collect a high-quality low-light video dataset with multiple exposures and cameras. |
Huiyuan Fu; Wenkai Zheng; Xicong Wang; Jiaxuan Wang; Heng Zhang; Huadong Ma; |
238 | RED-PSM: Regularization By Denoising of Partially Separable Models for Dynamic Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a partially separable objective with RED and an optimization scheme with variable splitting and ADMM. |
Berk Iskender; Marc L. Klasky; Yoram Bresler; |
239 | Unsupervised Manifold Linearizing and Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to optimize the Maximal Coding Rate Reduction metric with respect to both the data representation and a novel doubly stochastic cluster membership, inspired by state-of-the-art subspace clustering results. |
Tianjiao Ding; Shengbang Tong; Kwan Ho Ryan Chan; Xili Dai; Yi Ma; Benjamin D. Haeffele; |
240 | Lossy and Lossless (L2) Post-training Model Size Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes a post-training model size compression method that combines lossy and lossless compression in a unified way. |
Yumeng Shi; Shihao Bai; Xiuying Wei; Ruihao Gong; Jianlei Yang; |
241 | C2ST: Cross-Modal Contextualized Sequence Transduction for Continuous Sign Language Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Cross-modal Contextualized Sequence Transduction (C2ST) for CSLR, which effectively incorporates the knowledge of gloss sequence into the process of video representation learning and sequence transduction. |
Huaiwen Zhang; Zihang Guo; Yang Yang; Xin Liu; De Hu; |
242 | ObjectFusion: Multi-modal 3D Object Detection with Object-Centric Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Object-centric Fusion (ObjectFusion) paradigm, which completely gets rid of camera-to-BEV transformation during fusion to align object-centric features across different modalities for 3D object detection. |
Qi Cai; Yingwei Pan; Ting Yao; Chong-Wah Ngo; Tao Mei; |
243 | D-IF: Uncertainty-aware Human Digitization Via Implicit Distribution Field Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose replacing the implicit value with an adaptive uncertainty distribution, to differentiate between points based on their distance to the surface. |
Xueting Yang; Yihao Luo; Yuliang Xiu; Wei Wang; Hao Xu; Zhaoxin Fan; |
244 | MMVP: Motion-Matrix-Based Video Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A central challenge of video prediction is that the system has to reason about an object’s future motion from image frames while simultaneously maintaining the consistency of its appearance across frames. This work introduces an end-to-end trainable two-stream video prediction framework, Motion-Matrix-based Video Prediction (MMVP), to tackle this challenge. |
Yiqi Zhong; Luming Liang; Ilya Zharkov; Ulrich Neumann; |
245 | Human Preference Score: Better Aligning Text-to-Image Models with Human Preference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Using HPS, we propose a simple yet effective method to adapt Stable Diffusion to better align with human preferences. |
Xiaoshi Wu; Keqiang Sun; Feng Zhu; Rui Zhao; Hongsheng Li; |
246 | Guided Motion Diffusion for Controllable Human Motion Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, integrating spatial constraints, such as pre-defined motion trajectories and obstacles, remains a challenge despite being essential for bridging the gap between isolated human motion and its surrounding environment. To address this issue, we propose Guided Motion Diffusion (GMD), a method that incorporates spatial constraints into the motion generation process. |
Korrawe Karunratanakul; Konpat Preechakul; Supasorn Suwajanakorn; Siyu Tang; |
247 | AffordPose: A Large-Scale Dataset of Hand-Object Interactions with Affordance-Driven Hand Pose Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present AffordPose, a large-scale dataset of hand-object interactions with affordance-driven hand pose. |
Juntao Jian; Xiuping Liu; Manyi Li; Ruizhen Hu; Jian Liu; |
248 | Locomotion-Action-Manipulation: Synthesizing Human-Scene Interactions in Complex 3D Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present LAMA, Locomotion-Action-MAnipulation, to synthesize natural and plausible long term human movements in complex indoor environments. |
Jiye Lee; Hanbyul Joo; |
249 | NDDepth: Normal-Distance Assisted Monocular Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel physics (geometry)-driven deep learning framework for monocular depth estimation by assuming that 3D scenes are constituted by piece-wise planes. |
Shuwei Shao; Zhongcai Pei; Weihai Chen; Xingming Wu; Zhengguo Li; |
250 | Sequential Texts Driven Cohesive Motions Synthesis with Natural Transitions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a cohesive human motion sequence synthesis framework based on free-form sequential texts while ensuring semantic connection and natural transitions between adjacent motions. |
Shuai Li; Sisi Zhuang; Wenfeng Song; Xinyu Zhang; Hejia Chen; Aimin Hao; |
251 | Efficient Converted Spiking Neural Network for 3D and 2D Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient unified ANN-SNN conversion method for point cloud classification and image classification that significantly reduces the number of time steps required for fast and lossless ANN-SNN conversion. |
Yuxiang Lan; Yachao Zhang; Xu Ma; Yanyun Qu; Yun Fu; |
252 | Eulerian Single-Photon Vision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we demonstrate computationally light-weight phase-based algorithms for the tasks of edge detection and motion estimation. |
Shantanu Gupta; Mohit Gupta; |
253 | Adaptive Calibrator Ensemble: Navigating Test Set Difficulty in Out-of-Distribution Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: If a test set has a drastically different difficulty level from the calibration set, a phenomenon that out-of-distribution (OOD) data often exhibit, the optimal calibration parameters of the two datasets will differ, so a calibrator that is optimal on the calibration set becomes suboptimal on the OOD test set and calibration performance degrades. With this knowledge, we propose a simple and effective method named adaptive calibrator ensemble (ACE) to calibrate OOD datasets whose difficulty is usually higher than that of the calibration set. |
Yuli Zou; Weijian Deng; Liang Zheng; |
254 | Contrastive Learning Relies More on Spatial Inductive Bias Than Supervised Learning: An Empirical Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Different from most previous work that understands CL from learning objectives, we focus on an unexplored yet natural aspect: the spatial inductive bias which seems to be implicitly exploited via data augmentations in CL. |
Yuanyi Zhong; Haoran Tang; Jun-Kun Chen; Yu-Xiong Wang; |
255 | DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that it is possible to automatically obtain accurate semantic masks of synthetic images generated by the pre-trained Stable Diffusion, which uses only text-image pairs during training. |
Weijia Wu; Yuzhong Zhao; Mike Zheng Shou; Hong Zhou; Chunhua Shen; |
256 | NSF: Neural Surface Fields for Human Modeling from Monocular Depth Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, predicting per-vertex deformations on a pre-designed human template with a discrete surface lacks flexibility in resolution and topology. To overcome these limitations, we propose a novel method ‘NSF: Neural Surface Fields’ for modeling 3D clothed humans from monocular depth. |
Yuxuan Xue; Bharat Lal Bhatnagar; Riccardo Marin; Nikolaos Sarafianos; Yuanlu Xu; Gerard Pons-Moll; Tony Tung; |
257 | Unaligned 2D to 3D Translation with Conditional Vector-Quantized Code Diffusion Using Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By contrast, we propose a simple and novel 2D to 3D synthesis approach based on conditional diffusion with vector-quantized codes. |
Abril Corona-Figueroa; Sam Bond-Taylor; Neelanjan Bhowmik; Yona Falinie A. Gaus; Toby P. Breckon; Hubert P. H. Shum; Chris G. Willcocks; |
258 | DMNet: Delaunay Meshing Network for 3D Shape Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel learning-based method with Delaunay triangulation to achieve high-precision reconstruction. |
Chen Zhang; Ganzhangqin Yuan; Wenbing Tao; |
259 | StyleDomain: Efficient and Lightweight Parameterizations of StyleGAN for One-shot and Few-shot Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide a systematic and in-depth analysis of the domain adaptation problem of GANs, focusing on the StyleGAN model. |
Aibek Alanov; Vadim Titov; Maksim Nakhodnov; Dmitry Vetrov; |
260 | RankMixup: Ranking-Based Mixup Training for Network Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present RankMixup, a novel mixup-based framework alleviating the problem of the mixture of labels for network calibration. |
Jongyoun Noh; Hyekang Park; Junghyup Lee; Bumsub Ham; |
261 | Body Knowledge and Uncertainty Modeling for Monocular 3D Human Body Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose KNOWN, a framework that effectively utilizes body KNOWledge and uNcertainty modeling to compensate for insufficient 3D supervisions. |
Yufei Zhang; Hanjing Wang; Jeffrey O. Kephart; Qiang Ji; |
262 | Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the orthogonal channel dimension for generic data augmentation by exploiting precision redundancy. |
Huimin Wu; Chenyang Lei; Xiao Sun; Peng-Shuai Wang; Qifeng Chen; Kwang-Ting Cheng; Stephen Lin; Zhirong Wu; |
263 | Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel approach for enhancing text-image correspondence by leveraging available semantic layouts. |
Minho Park; Jooyeol Yun; Seunghwan Choi; Jaegul Choo; |
264 | Neural Radiance Field with LiDAR Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing NeRF methods usually require specially collected hypersampled source views and do not perform well with the open source camera-LiDAR datasets – significantly limiting the approach’s practical utility. In this paper, we demonstrate an approach that allows for these datasets to be utilized for high quality neural renderings. |
MingFang Chang; Akash Sharma; Michael Kaess; Simon Lucey; |
265 | AREA: Adaptive Reweighting Via Effective Area for Long-Tailed Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we reconsider reweighting from a totally new perspective of analyzing the spanned space of each class. |
Xiaohua Chen; Yucan Zhou; Dayan Wu; Chule Yang; Bo Li; Qinghua Hu; Weiping Wang; |
266 | Erasing Concepts from Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a fine-tuning method that can erase a visual concept from a pre-trained diffusion model, given only the name of the style and using negative guidance as a teacher. |
Rohit Gandikota; Joanna Materzynska; Jaden Fiotto-Kaufman; David Bau; |
267 | Fully Attentional Networks with Self-emerging Token Labeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we revisit the FAN models and improve their pre-training with a self-emerging token labeling (STL) framework. |
Bingyin Zhao; Zhiding Yu; Shiyi Lan; Yutao Cheng; Anima Anandkumar; Yingjie Lao; Jose M. Alvarez; |
268 | ACTIVE: Towards Highly Transferable 3D Physical Camouflage for Universal and Robust Vehicle Evasion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, universality and robustness in existing methods often fall short as the transferability aspect is often overlooked, thus restricting their application only to a specific target with limited performance. To address these challenges, we present Adversarial Camouflage for Transferable and Intensive Vehicle Evasion (ACTIVE), a state-of-the-art physical camouflage attack framework designed to generate universal and robust adversarial camouflage capable of concealing any 3D vehicle from detectors. |
Naufal Suryanto; Yongsu Kim; Harashta Tatimma Larasati; Hyoeun Kang; Thi-Thu-Huong Le; Yoonyoung Hong; Hunmin Yang; Se-Yoon Oh; Howon Kim; |
269 | Learning Adaptive Neighborhoods for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These methods typically fix the choice of node degree for the entire graph, which is suboptimal. Instead, we propose a novel end-to-end differentiable graph generator which builds graph topologies where each node selects both its neighborhood and its size. |
Avishkar Saha; Oscar Mendez; Chris Russell; Richard Bowden; |
270 | Equivariant Similarity for Vision-Language Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose EqSim, a regularization loss that can be efficiently calculated from any two matched training pairs and easily pluggable into existing image-text retrieval fine-tuning. |
Tan Wang; Kevin Lin; Linjie Li; Chung-Ching Lin; Zhengyuan Yang; Hanwang Zhang; Zicheng Liu; Lijuan Wang; |
271 | ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel reconfigurable graph model that first associates all detected objects across cameras spatially before reconfiguring it into a temporal graph for Temporal Association. |
Cheng-Che Cheng; Min-Xuan Qiu; Chen-Kuo Chiang; Shang-Hong Lai; |
272 | Too Large; Data Reduction for Vision-Language Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper examines the problems of severe image-text misalignment and high redundancy in the widely-used large-scale Vision-Language Pre-Training (VLP) datasets. To address these issues, we propose an efficient and straightforward Vision-Language learning algorithm called TL;DR which aims to compress the existing large VLP data into a small, high-quality set. |
Alex Jinpeng Wang; Kevin Qinghong Lin; David Junhao Zhang; Stan Weixian Lei; Mike Zheng Shou; |
273 | Make-It-3D: High-fidelity 3D Creation from A Single Image with Diffusion Prior Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate the problem of creating high-fidelity 3D content from only a single image. |
Junshu Tang; Tengfei Wang; Bo Zhang; Ting Zhang; Ran Yi; Lizhuang Ma; Dong Chen; |
274 | Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a deeply unified framework for depth-aware panoptic segmentation, which performs joint segmentation and depth estimation both in a per-segment manner with identical object queries. |
Junwen He; Yifan Wang; Lijun Wang; Huchuan Lu; Bin Luo; Jun-Yan He; Jin-Peng Lan; Yifeng Geng; Xuansong Xie; |
275 | Taxonomy Adaptive Cross-Domain Adaptation in Medical Imaging Via Optimization Trajectory Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose optimization trajectory distillation, a unified approach to address the two technical challenges from a new perspective. |
Jianan Fan; Dongnan Liu; Hang Chang; Heng Huang; Mei Chen; Weidong Cai; |
276 | DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new formulation of temporal action detection (TAD) with denoising diffusion, DiffTAD in short. |
Sauradip Nag; Xiatian Zhu; Jiankang Deng; Yi-Zhe Song; Tao Xiang; |
277 | Ray Conditioning: Trading Photo-consistency for Photo-realism in Multi-view Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such explicit bias for photo-consistency sacrifices photo-realism, causing geometry artifacts and loss of fine-scale details when these methods are applied to edit real images. To address this issue, we propose ray conditioning, a geometry-free alternative that relaxes the photo-consistency constraint. |
Eric Ming Chen; Sidhanth Holalkere; Ruyu Yan; Kai Zhang; Abe Davis; |
278 | SCOB: Universal Text Understanding Via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the great success of language model (LM)-based pre-training, recent studies in visual document understanding have explored LM-based pre-training methods for modeling text within document images. |
Daehee Kim; Yoonsik Kim; DongHyun Kim; Yumin Lim; Geewook Kim; Taeho Kil; |
279 | Point-Query Quadtree for Crowd Counting, Localization, and More Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Too few imply underestimation; too many increase computational overhead. To address this dilemma, we introduce a decomposable structure, i.e., the point-query quadtree, and propose a new counting model, termed Point quEry Transformer (PET). |
Chengxin Liu; Hao Lu; Zhiguo Cao; Tongliang Liu; |
280 | Heterogeneous Diversity Driven Active Learning for Multi-Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To reduce the cost of human annotations, we propose Heterogeneous Diversity driven Active Multi-Object Tracking (HD-AMOT), to infer the most informative frames for any MOT tracker by observing the heterogeneous cues of samples. |
Rui Li; Baopeng Zhang; Jun Liu; Wei Liu; Jian Zhao; Zhu Teng; |
281 | Domain Generalization of 3D Semantic Segmentation in Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite its importance, domain generalization is relatively unexplored in the case of 3D autonomous driving semantic segmentation. To fill this gap, this paper presents the first benchmark for this application by testing state-of-the-art methods and discussing the difficulty of tackling Laser Imaging Detection and Ranging (LiDAR) domain shifts. |
Jules Sanchez; Jean-Emmanuel Deschaud; François Goulette; |
282 | HaMuCo: Hand Pose Estimation Via Multiview Collaborative Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate the label-hungry limitation, we propose a self-supervised learning framework, HaMuCo, that learns a single view hand pose estimator from multi-view pseudo 2D labels. |
Xiaozheng Zheng; Chao Wen; Zhou Xue; Pengfei Ren; Jingyu Wang; |
283 | Efficient Model Personalization in Federated Learning Via Client-Specific Prompt Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To leverage robust representations from large-scale models while enabling efficient model personalization for heterogeneous clients, we propose a novel personalized FL framework of client-specific Prompt Generation (pFedPG), which learns to deploy a personalized prompt generator at the server for producing client-specific visual prompts that efficiently adapt frozen backbones to local data distributions. |
Fu-En Yang; Chien-Yi Wang; Yu-Chiang Frank Wang; |
284 | Dual Aggregation Transformer for Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This inspires us to combine the two dimensions in Transformer for a more powerful representation capability. Based on the above idea, we propose a novel Transformer model, Dual Aggregation Transformer (DAT), for image SR. |
Zheng Chen; Yulun Zhang; Jinjin Gu; Linghe Kong; Xiaokang Yang; Fisher Yu; |
285 | Zero-Shot Spatial Layout Conditioning for Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we consider image generation from text associated with segments on the image canvas, which combines an intuitive natural language interface with precise spatial control over the generated content. |
Guillaume Couairon; Marlène Careil; Matthieu Cord; Stéphane Lathuilière; Jakob Verbeek; |
286 | SegGPT: Towards Segmenting Everything in Context Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SegGPT, a generalist model for segmenting everything in context. |
Xinlong Wang; Xiaosong Zhang; Yue Cao; Wen Wang; Chunhua Shen; Tiejun Huang; |
287 | Semantify: Simplifying The Control of 3D Morphable Models Using CLIP Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Semantify: a self-supervised method that utilizes the semantic power of CLIP language-vision foundation model to simplify the control of 3D morphable models. |
Omer Gralnik; Guy Gafni; Ariel Shamir; |
288 | From Sky to The Ground: A Large-scale Benchmark and Simple Baseline Towards Real Rain Removal Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Learning-based image deraining methods have made great progress. However, the lack of large-scale high-quality paired training samples is the main bottleneck hampering real image deraining (RID). To address this dilemma and advance RID, we construct a Large-scale High-quality Paired real rain benchmark (LHP-Rain), including 3000 video sequences with 1 million high-resolution (1920*1080) frame pairs. |
Yun Guo; Xueyao Xiao; Yi Chang; Shumin Deng; Luxin Yan; |
289 | Knowledge Restore and Transfer for Multi-Label Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although there have been many anti-forgetting methods to solve the problem of catastrophic forgetting in single-label class-incremental learning, these methods have difficulty in solving the MLCIL problem due to label absence and information dilution. To solve these problems, we propose a Knowledge Restore and Transfer (KRT) framework that includes a dynamic pseudo-label (DPL) module, which addresses label absence by restoring the knowledge of old classes to the new data, and an incremental cross-attention (ICA) module, which stores knowledge in session-specific knowledge retention tokens and transfers it via a unified knowledge transfer token to address information dilution. |
Songlin Dong; Haoyu Luo; Yuhang He; Xing Wei; Jie Cheng; Yihong Gong; |
290 | DDColor: Towards Photo-Realistic Image Colorization Via Dual Decoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While transformer-based methods can deliver better results, they often rely on manually designed priors, suffer from poor generalization ability, and introduce color bleeding effects. To address these issues, we propose DDColor, an end-to-end method with dual decoders for image colorization. |
Xiaoyang Kang; Tao Yang; Wenqi Ouyang; Peiran Ren; Lingzhi Li; Xuansong Xie; |
291 | Visual Explanations Via Iterated Integrated Attributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Iterated Integrated Attributions (IIA) – a generic method for explaining the predictions of vision models. |
Oren Barkan; Yehonatan Elisha; Yuval Asher; Amit Eshel; Noam Koenigstein; |
292 | PanFlowNet: A Flow-Based Deep Network for Pan-Sharpening Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing deep learning-based methods recover only one HRMS image from the LRMS image and PAN image using a deterministic mapping, thus ignoring the diversity of the HRMS image. In this paper, to alleviate this ill-posed issue, we propose a flow-based pan-sharpening network (PanFlowNet) to directly learn the conditional distribution of HRMS image given LRMS image and PAN image instead of learning a deterministic mapping. |
Gang Yang; Xiangyong Cao; Wenzhe Xiao; Man Zhou; Aiping Liu; Xun Chen; Deyu Meng; |
293 | Domain Generalization Via Balancing Training Difficulty and Model Capability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We design MoDify, a Momentum Difficulty framework that tackles the misalignment by balancing the seesaw between the model’s capability and the samples’ difficulties along the training process. |
Xueying Jiang; Jiaxing Huang; Sheng Jin; Shijian Lu; |
294 | Pairwise Similarity Learning Is SimPLE Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on a general yet important learning problem, pairwise similarity learning (PSL). |
Yandong Wen; Weiyang Liu; Yao Feng; Bhiksha Raj; Rita Singh; Adrian Weller; Michael J. Black; Bernhard Schölkopf; |
295 | GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Purposely, we present GO-SLAM, a deep-learning-based dense visual SLAM framework globally optimizing poses and 3D reconstruction in real-time. |
Youmin Zhang; Fabio Tosi; Stefano Mattoccia; Matteo Poggi; |
296 | JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we focus on the problem of 3D human mesh recovery from a single image under obscured conditions. |
Jiahao Li; Zongxin Yang; Xiaohan Wang; Jianxin Ma; Chang Zhou; Yi Yang; |
297 | CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, due to the small size and partial labeling of each dataset, as well as a limited investigation of diverse types of tumors, the resulting models are often limited to segmenting specific organs/tumors, ignore the semantics of anatomical structures, and cannot be extended to novel domains. To address these issues, we propose the CLIP-Driven Universal Model, which incorporates text embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models. |
Jie Liu; Yixiao Zhang; Jie-Neng Chen; Junfei Xiao; Yongyi Lu; Bennett A Landman; Yixuan Yuan; Alan Yuille; Yucheng Tang; Zongwei Zhou; |
298 | NIR-assisted Video Enhancement Via Unpaired 24-hour Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we defend the feasibility and superiority of NIR-assisted low-light video enhancement results by using unpaired 24-hour data for the first time, which significantly eases data collection and improves generalization performance on in-the-wild data. |
Muyao Niu; Zhihang Zhong; Yinqiang Zheng; |
299 | FACTS: First Amplify Correlations and Then Slice to Discover Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Models trained on such datasets learn "shortcuts" and underperform on bias-conflicting slices of data where the correlation does not hold. In this work, we study the problem of identifying such slices to inform downstream bias mitigation strategies. |
Sriram Yenamandra; Pratik Ramesh; Viraj Prabhu; Judy Hoffman; |
300 | Anchor Structure Regularization Induced Multi-view Subspace Clustering Via Enhanced Tensor Rank Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Being aware of these, we propose Anchor Structure Regularization Induced Multi-view Subspace Clustering via Enhanced Tensor Rank Minimization (ASR-ETR). |
Jintian Ji; Songhe Feng; |
301 | VeRi3D: Generative Vertex-based Radiance Fields for 3D Controllable Human Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Some recent work demonstrates promising results of learning human generative models using neural articulated radiance fields, yet their generalization ability and controllability lag behind parametric human models, i.e., they do not perform well when generalizing to novel pose/shape and are not part controllable. To solve these problems, we propose VeRi3D, a generative human vertex-based radiance field parameterized by vertices of the parametric human template, SMPL. |
Xinya Chen; Jiaxin Huang; Yanrui Bin; Lu Yu; Yiyi Liao; |
302 | MOSE: A New Dataset for Video Object Segmentation in Complex Scenes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To revisit VOS and make it more applicable in the real world, we collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study the tracking and segmentation of objects in complex environments. |
Henghui Ding; Chang Liu; Shuting He; Xudong Jiang; Philip H.S. Torr; Song Bai; |
303 | BoMD: Bag of Multi-label Descriptors for Noisy Chest X-ray Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new method designed for noisy multi-label CXR learning, which detects and smoothly re-labels noisy samples from the dataset to be used in the training of common multi-label classifiers. |
Yuanhong Chen; Fengbei Liu; Hu Wang; Chong Wang; Yuyuan Liu; Yu Tian; Gustavo Carneiro; |
304 | Mask-Attention-Free Transformer for 3D Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through center regression, we effectively overcome the low-recall issue and perform cross-attention by imposing a positional prior. |
Xin Lai; Yuhui Yuan; Ruihang Chu; Yukang Chen; Han Hu; Jiaya Jia; |
305 | SHIFT3D: Synthesizing Hard Inputs For Tricking 3D Detectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SHIFT3D, a differentiable pipeline for generating 3D shapes that are structurally plausible yet challenging to 3D object detectors. |
Hongge Chen; Zhao Chen; Gregory P. Meyer; Dennis Park; Carl Vondrick; Ashish Shrivastava; Yuning Chai; |
306 | EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formalize a pipeline (we dub EgoLoc) that better entangles 3D multiview geometry with 2D object retrieval from egocentric videos. |
Jinjie Mai; Abdullah Hamdi; Silvio Giancola; Chen Zhao; Bernard Ghanem; |
307 | Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, their performance is severely limited by the lack of inter-person interactions in the spatial-temporal mesh recovery, as well as by detection and tracking defects. To address these challenges, we propose the Coordinate transFormer (CoordFormer) that directly models multi-person spatial-temporal relations and simultaneously performs multi-mesh recovery in an end-to-end manner. |
Haoyuan Li; Haoye Dong; Hanchao Jia; Dong Huang; Michael C. Kampffmeyer; Liang Lin; Xiaodan Liang; |
308 | FLatten Transformer: Vision Transformer Using Focused Linear Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness. |
Dongchen Han; Xuran Pan; Yizeng Han; Shiji Song; Gao Huang; |
309 | Q-Diffusion: Quantizing Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture of the diffusion models, which compresses the noise estimation network to accelerate the generation process. |
Xiuyu Li; Yijiang Liu; Long Lian; Huanrui Yang; Zhen Dong; Daniel Kang; Shanghang Zhang; Kurt Keutzer; |
310 | Robustifying Token Attention for Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: More critically, these tokens are not robust to corruptions, often leading to highly diverging attention patterns. In this paper, we intend to alleviate this overfocusing issue and make attention more stable through two general techniques: First, our Token-aware Average Pooling (TAP) module encourages the local neighborhood of each token to take part in the attention mechanism. |
Yong Guo; David Stutz; Bernt Schiele; |
311 | Boosting Positive Segments for Weakly-Supervised Audio-Visual Video Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the problem of weakly supervised Audio-Visual Video Parsing (AVVP), where the goal is to temporally localize events that are audible or visible and simultaneously classify them into known event categories. |
Kranthi Kumar Rachavarapu; Rajagopalan A. N.; |
312 | ADNet: Lane Shape Prediction Via Anchor Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we revisit the limitations of anchor-based lane detection methods, which have predominantly focused on fixed anchors that stem from the edges of the image, disregarding their versatility and quality. |
Lingyu Xiao; Xiang Li; Sen Yang; Wankou Yang; |
313 | UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and The OpenPCSeg Codebase Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a unified multi-modal LiDAR segmentation network, termed UniSeg, which leverages the information of RGB images and three views of the point cloud, and accomplishes semantic segmentation and panoptic segmentation simultaneously. |
Youquan Liu; Runnan Chen; Xin Li; Lingdong Kong; Yuchen Yang; Zhaoyang Xia; Yeqi Bai; Xinge Zhu; Yuexin Ma; Yikang Li; Yu Qiao; Yuenan Hou; |
314 | Sign Language Translation with Iterative Prototype Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents IP-SLT, a simple yet effective framework for sign language translation (SLT). |
Huijie Yao; Wengang Zhou; Hao Feng; Hezhen Hu; Hao Zhou; Houqiang Li; |
315 | Pixel-Wise Contrastive Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a simple but effective pixel-level self-supervised distillation framework friendly to dense prediction tasks. |
Junqiang Huang; Zichao Guo; |
316 | Efficient Deep Space Filling Curve Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, MST generation is non-differentiable, which makes it infeasible to optimize via gradient descent. To remedy these issues, we propose a GNN-based SFC-search framework with a tailored algorithm that largely reduces the computational cost of the GNN. |
Wanli Chen; Xufeng Yao; Xinyun Zhang; Bei Yu; |
317 | GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The approach introduces a new training objective that leverages parallel corpora to align the representation spaces of different encoders. |
Can Qin; Ning Yu; Chen Xing; Shu Zhang; Zeyuan Chen; Stefano Ermon; Yun Fu; Caiming Xiong; Ran Xu; |
318 | Humans in 4D: Reconstructing and Tracking Humans with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an approach to reconstruct humans and track them over time. |
Shubham Goel; Georgios Pavlakos; Jathushan Rajasegaran; Angjoo Kanazawa; Jitendra Malik; |
319 | Ponder: Point Cloud Pre-training Via Neural Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel approach to self-supervised learning of point cloud representations by differentiable neural rendering. |
Di Huang; Sida Peng; Tong He; Honghui Yang; Xiaowei Zhou; Wanli Ouyang; |
320 | Perpetual Humanoid Control for Real-time Simulated Avatars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a physics-based humanoid controller that achieves high-fidelity motion imitation and fault-tolerant behavior in the presence of noisy input (e.g. pose estimates from video or generated from language) and unexpected falls. |
Zhengyi Luo; Jinkun Cao; Alexander Winkler; Kris Kitani; Weipeng Xu; |
321 | HollowNeRF: Pruning Hashgrid-Based NeRFs with Trainable Collision Mitigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To cull away unnecessary regions of the feature grid, existing solutions rely on prior knowledge of object shape or periodically estimate object shape during training by repeated model evaluations, which are costly and wasteful. To address this issue, we propose HollowNeRF, a novel compression solution for hashgrid-based NeRF which automatically sparsifies the feature grid during the training phase. |
Xiufeng Xie; Riccardo Gherardi; Zhihong Pan; Stephen Huang; |
322 | A Complete Recipe for Diffusion Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Utilizing insights from the development of scalable Bayesian posterior samplers, we present a complete recipe for formulating forward processes in SGMs, ensuring convergence to the desired target distribution. |
Kushagra Pandey; Stephan Mandt; |
323 | The Devil Is in The Crack Orientation: A New Perspective for Crack Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a first-of-its-kind oriented sub-crack detector, dubbed as CrackDet, which is derived from a novel piecewise angle definition, to ease the boundary discontinuity problem. |
Zhuangzhuang Chen; Jin Zhang; Zhuonan Lai; Guanming Zhu; Zun Liu; Jie Chen; Jianqiang Li; |
324 | FedPD: Federated Open Set Recognition with Parameter Disentanglement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a parameter disentanglement guided federated open-set recognition (FedPD) algorithm to address two core challenges of FedOSR: cross-client inter-set interference between learning closed-set and open-set knowledge, and cross-client intra-set inconsistency caused by data heterogeneity. |
Chen Yang; Meilu Zhu; Yifan Liu; Yixuan Yuan; |
325 | WaterMask: Instance Segmentation for Underwater Imagery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first underwater image instance segmentation dataset (UIIS), which provides 4628 images for 7 categories with pixel-level annotations. |
Shijie Lian; Hua Li; Runmin Cong; Suqi Li; Wei Zhang; Sam Kwong; |
326 | Score Priors Guided Deep Variational Inference for Unsupervised Real-World Single Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, applying existing score-based methods to real-world denoising requires not only explicit training of score priors on the target domain but also careful design of sampling procedures for posterior inference, which is complicated and impractical. To address these limitations, we propose a score priors-guided deep variational inference, namely ScoreDVI, for practical real-world denoising. |
Jun Cheng; Tao Liu; Shan Tan; |
327 | L-DAWA: Layer-wise Divergence Aware Weight Aggregation in Federated Self-Supervised Visual Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new aggregation strategy termed Layer-wise Divergence Aware Weight Aggregation (L-DAWA) to mitigate the influence of client bias and divergence during FL aggregation. |
Yasar Abbas Ur Rehman; Yan Gao; Pedro Porto Buarque de Gusmao; Mina Alibeigi; Jiajun Shen; Nicholas D. Lane; |
328 | Improving Transformer-based Image Matching By Cascaded Capturing Spatially Informative Keypoints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: But correlations produced by transformer-based methods are spatially limited to the center of source views’ coarse patches, because of the costly attention learning. In this work, we rethink this issue and find that such matching formulation degrades pose estimation, especially for low-resolution images. |
Chenjie Cao; Yanwei Fu; |
329 | Controllable Guide-Space for Generalizable Face Forgery Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a controllable guide-space (GS) method to enhance the discrimination of different forgery domains, so as to increase the forgery relevance of features and thereby improve the generalization. |
Ying Guo; Cheng Zhen; Pengfei Yan; |
330 | Calibrating Uncertainty for Semi-Supervised Crowd Counting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method to calibrate model uncertainty for crowd counting. |
Chen LI; Xiaoling Hu; Shahira Abousamra; Chao Chen; |
331 | MosaiQ: Quantum Generative Adversarial Networks for Image Generation on NISQ Computers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, quantum image generation has been explored with many potential advantages over non-quantum techniques; however, previous techniques have suffered from poor quality and robustness. To address these problems, we introduce MosaiQ, a high-quality quantum image generation GAN framework that can be executed on today’s Near-term Intermediate Scale Quantum (NISQ) computers. |
Daniel Silver; Tirthak Patel; William Cutler; Aditya Ranjan; Harshitta Gandhi; Devesh Tiwari; |
332 | DVIS: Decoupled Video Instance Segmentation Framework Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Firstly, offline methods are limited by the tightly-coupled modeling paradigm, which treats all frames equally and disregards the interdependencies between adjacent frames. Consequently, this leads to the introduction of excessive noise during long-term temporal alignment. Secondly, online methods suffer from inadequate utilization of temporal information. To tackle these challenges, we propose a decoupling strategy for VIS by dividing it into three independent sub-tasks: segmentation, tracking, and refinement. |
Tao Zhang; Xingye Tian; Yu Wu; Shunping Ji; Xuebo Wang; Yuan Zhang; Pengfei Wan; |
333 | Segmentation of Tubular Structures Using Iterative Training with Tailored Samples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a minimal path method to simultaneously compute segmentation masks and extract centerlines of tubular structures with line-topology. |
Wei Liao; |
334 | Boundary-Aware Divide and Conquer: A Diffusion-Based Solution for Unsupervised Shadow Removal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel diffusion-based solution for unsupervised shadow removal, which separately models the shadow, non-shadow, and their boundary regions. |
Lanqing Guo; Chong Wang; Wenhan Yang; Yufei Wang; Bihan Wen; |
335 | Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper addresses the problem of rolling shutter correction in complex nonlinear and dynamic scenes with extreme occlusion. |
Delin Qu; Yizhen Lao; Zhigang Wang; Dong Wang; Bin Zhao; Xuelong Li; |
336 | Surface Extraction from Neural Unsigned Distance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method, named DualMesh-UDF, to extract a surface from unsigned distance functions (UDFs), encoded by neural networks, or neural UDFs. |
Congyi Zhang; Guying Lin; Lei Yang; Xin Li; Taku Komura; Scott Schaefer; John Keyser; Wenping Wang; |
337 | CBA: Improving Online Continual Learning Via Continual Bias Adaptor Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Due to the time-varying training setting, the model learned from a changing distribution easily forgets the previously learned knowledge and biases towards the newly received task. To address this problem, we propose a Continual Bias Adaptor (CBA) module to augment the classifier network to adapt to catastrophic distribution change during training, such that the classifier network is able to learn a stable consolidation of previously learned tasks. |
Quanziang Wang; Renzhen Wang; Yichen Wu; Xixi Jia; Deyu Meng; |
338 | GraphEcho: Graph-Driven Unsupervised Domain Adaptation for Echocardiogram Video Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a newly collected CardiacUDA dataset and a novel GraphEcho method for cardiac structure segmentation. |
Jiewen Yang; Xinpeng Ding; Ziyang Zheng; Xiaowei Xu; Xiaomeng Li; |
339 | Multi-view Spectral Polarization Propagation for Video Glass Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first polarization-guided video glass segmentation propagation solution (PGVS-Net) that can robustly and coherently propagate glass segmentation in RGB-P video sequences. |
Yu Qiao; Bo Dong; Ao Jin; Yu Fu; Seung-Hwan Baek; Felix Heide; Pieter Peers; Xiaopeng Wei; Xin Yang; |
340 | Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We thus propose an Efficient object-centric Representation amodal Segmentation (EoRaS). |
Ke Fan; Jingshi Lei; Xuelin Qian; Miaopeng Yu; Tianjun Xiao; Tong He; Zheng Zhang; Yanwei Fu; |
341 | Augmented Box Replay: Overcoming Foreground Shift for Incremental Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify the overlooked problem of foreground shift as the main reason for this. |
Yuyang Liu; Yang Cong; Dipam Goswami; Xialei Liu; Joost van de Weijer; |
342 | Distilled Reverse Attention Network for Open-world Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Distilled Reverse Attention Network to address the challenges. |
Yun Li; Zhe Liu; Saurav Jha; Lina Yao; |
343 | DandelionNet: Domain Composition with Instance Adaptive Classification for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to preserve more complementary information from multiple domains while reducing their domain gap, we propose that the multiple domains should not be tightly aligned but rather composed together, where all domains are pulled closer yet still preserve their individuality. |
Lanqing Hu; Meina Kan; Shiguang Shan; Xilin Chen; |
344 | TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present TexFusion(Texture Diffusion), a new method to synthesize textures for given 3D geometries, using only large-scale text-guided image diffusion models. |
Tianshi Cao; Karsten Kreis; Sanja Fidler; Nicholas Sharp; Kangxue Yin; |
345 | Shift from Texture-bias to Shape-bias: Edge Deformation-based Augmentation for Robust Object Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to augment the training dataset by generating semantically meaningful shapes and samples, via a shape deformation-based online augmentation, namely as SDbOA. |
Xilin He; Qinliang Lin; Cheng Luo; Weicheng Xie; Siyang Song; Feng Liu; Linlin Shen; |
346 | Lighting Every Darkness in Two Pairs: A Calibration-Free Pipeline for RAW Denoising Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these methods suffer from several main deficiencies: 1) the calibration procedure is laborious and time-consuming, 2) denoisers for different cameras are difficult to transfer, and 3) the discrepancy between synthetic noise and real noise is enlarged by high digital gain. To overcome the above shortcomings, we propose a calibration-free pipeline for Lighting Every Darkness (LED), regardless of the digital gain or camera sensor. |
Xin Jin; Jia-Wen Xiao; Ling-Hao Han; Chunle Guo; Ruixun Zhang; Xialei Liu; Chongyi Li; |
347 | Data-free Knowledge Distillation for Fine-grained Visual Categorization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although the existing methods exploiting DFKD have achieved inspiring results in coarse-grained classification, in practical applications involving fine-grained classification tasks that require more detailed distinctions between similar categories, sub-optimal results are obtained. To address this issue, we propose an approach called DFKD-FGVC that extends DFKD to fine-grained visual categorization (FGVC) tasks. |
Renrong Shao; Wei Zhang; Jianhua Yin; Jun Wang; |
348 | MotionBERT: A Unified Perspective on Learning Human Motion Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources. |
Wentao Zhu; Xiaoxuan Ma; Zhaoyang Liu; Libin Liu; Wayne Wu; Yizhou Wang; |
349 | PASTA: Proportional Amplitude Spectrum Training Augmentation for Syn-to-Real Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Proportional Amplitude Spectrum Training Augmentation (PASTA), a simple and effective augmentation strategy to improve out-of-the-box synthetic-to-real (syn-to-real) generalization performance. |
Prithvijit Chattopadhyay; Kartik Sarangmath; Vivek Vijaykumar; Judy Hoffman; |
350 | EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper rethinks and proposes a new framework as an infrastructure to advance Ego-HOI recognition by Probing, Curation and Adaption (EgoPCA). |
Yue Xu; Yong-Lu Li; Zhemin Huang; Michael Xu Liu; Cewu Lu; Yu-Wing Tai; Chi-Keung Tang; |
351 | Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that the key to a zero-shot single-view metric depth model lies in the combination of large-scale data training and resolving the metric ambiguity from various camera models. |
Wei Yin; Chi Zhang; Hao Chen; Zhipeng Cai; Gang Yu; Kaixuan Wang; Xiaozhi Chen; Chunhua Shen; |
352 | I Can’t Believe There’s No Images! Learning Visual Tasks Using Only Language Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Many high-level skills that are required for computer vision tasks, such as parsing questions, comparing and contrasting semantics, and writing descriptions, are also required in other domains such as natural language processing. In this paper, we ask whether it is possible to learn those skills from text data and then transfer them to vision tasks without ever training on visual training data. |
Sophia Gu; Christopher Clark; Aniruddha Kembhavi; |
353 | Lightweight Image Super-Resolution with Superpixel Token Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, this conventional regular patch division is too coarse and lacks interpretability, resulting in artifacts and non-similar structure interference during attention operations. To address these challenges, we propose a novel super token interaction network (SPIN). |
Aiping Zhang; Wenqi Ren; Yi Liu; Xiaochun Cao; |
354 | Feature Prediction Diffusion Model for Video Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the impressive generative and anti-noise capacity of diffusion model (DM), in this work, we introduce a novel DM-based method to predict the features of video frames for anomaly detection. |
Cheng Yan; Shiyu Zhang; Yang Liu; Guansong Pang; Wenjun Wang; |
355 | RANA: Relightable Articulated Neural Avatars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose RANA, a relightable and articulated neural avatar for the photorealistic synthesis of humans under arbitrary viewpoints, body poses, and lighting. |
Umar Iqbal; Akin Caliskan; Koki Nagano; Sameh Khamis; Pavlo Molchanov; Jan Kautz; |
356 | Iterative Denoiser and Noise Estimator for Self-Supervised Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Denoise-Corrupt-Denoise pipeline (DCD-Net) for self-supervised image denoising. |
Yunhao Zou; Chenggang Yan; Ying Fu; |
357 | MasQCLIP for Open-Vocabulary Universal Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new method for open-vocabulary universal image segmentation, which is capable of performing instance, semantic, and panoptic segmentation under a unified framework. |
Xin Xu; Tianyi Xiong; Zheng Ding; Zhuowen Tu; |
358 | Memory-and-Anticipation Transformer for Online Action Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we rethink the temporal dependence of event evolution and propose a novel memory-anticipation-based paradigm to model an entire temporal structure, including the past, present, and future. Based on this idea, we present Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach, to address the online action detection and anticipation tasks. |
Jiahao Wang; Guo Chen; Yifei Huang; Limin Wang; Tong Lu; |
359 | Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address it by proposing a novel one-step framework, named Self-similarity driven Scale-invariant Learning (SSL). |
Benzhi Wang; Yang Yang; Jinlin Wu; Guo-jun Qi; Zhen Lei; |
360 | MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a unified system for multi-person, diverse, and high-fidelity talking portrait generation. |
Yunfei Liu; Lijian Lin; Fei Yu; Changyin Zhou; Yu Li; |
361 | Realistic Full-Body Tracking from Sparse Observations Via Joint-Level Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a two-stage framework that can obtain accurate and smooth full-body motions with the three tracking signals of head and hands only. |
Xiaozheng Zheng; Zhuo Su; Chao Wen; Zhou Xue; Xiaojie Jin; |
362 | MetaF2N: Blind Image Super-Resolution By Learning Efficient Model Adaptation from Faces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To facilitate efficient model adaptation towards image-specific degradations, we propose a method dubbed MetaF2N, which leverages the faces contained in the image to fine-tune model parameters for adapting to the whole natural image in a meta-learning framework. |
Zhicun Yin; Ming Liu; Xiaoming Li; Hui Yang; Longan Xiao; Wangmeng Zuo; |
363 | Lighting Up NeRF Via Unsupervised Decomposition and Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel approach, called Low-Light NeRF (or LLNeRF), to enhance the scene representation and synthesize normal-light novel views directly from sRGB low-light images in an unsupervised manner. |
Haoyuan Wang; Xiaogang Xu; Ke Xu; Rynson W.H. Lau; |
364 | ViM: Vision Middleware for Unified Downstream Transferring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents Vision Middleware (ViM), a new learning paradigm that targets unified transferring from a single foundation model to a variety of downstream tasks. |
Yutong Feng; Biao Gong; Jianwen Jiang; Yiliang Lv; Yujun Shen; Deli Zhao; Jingren Zhou; |
365 | DIRE for Diffusion-Generated Image Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we seek to build a detector for telling apart real images from diffusion-generated images. |
Zhendong Wang; Jianmin Bao; Wengang Zhou; Weilun Wang; Hezhen Hu; Hong Chen; Houqiang Li; |
366 | Ord2Seq: Regarding Ordinal Regression As Label Sequence Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple sequence prediction framework for ordinal regression called Ord2Seq, which, for the first time, transforms each ordinal category label into a special label sequence and thus regards an ordinal regression task as a sequence prediction process. |
Jinhong Wang; Yi Cheng; Jintai Chen; TingTing Chen; Danny Chen; Jian Wu; |
367 | Bring Clipart to Life Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new interaction method by guiding the editing with abstract clipart, composed of a set of simple semantic parts, allowing users to control across face photos with simple clicks. |
Nanxuan Zhao; Shengqi Dang; Hexun Lin; Yang Shi; Nan Cao; |
368 | Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing video-based methods generally recover human mesh by estimating the complex pose and shape parameters from coupled image features, whose high complexity and low representation ability often result in inconsistent pose motion and limited shape patterns. To alleviate this issue, we introduce 3D pose as the intermediary and propose a Pose and Mesh Co-Evolution network (PMCE) that decouples this task into two parts: 1) video-based 3D human pose estimation and 2) mesh vertices regression from the estimated 3D pose and temporal image feature. |
Yingxuan You; Hong Liu; Ti Wang; Wenhao Li; Runwei Ding; Xia Li; |
369 | Noise2Info: Noisy Image to Information of Noise for Self-Supervised Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, it is unrealistic to assume that σ_n is known for pursuing high model performance. To alleviate this issue, we propose Noise2Info to extract the critical information, the standard deviation σ_n of injected noise, only based on the noisy images. |
Jiachuan Wang; Shimin Di; Lei Chen; Charles Wang Wai Ng; |
370 | Controllable Visual-Tactile Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we leverage deep generative models to create a multi-sensory experience where users can touch and see the synthesized object when sliding their fingers on a haptic surface. |
Ruihan Gao; Wenzhen Yuan; Jun-Yan Zhu; |
371 | Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By discussing the properties of each group of methods, we derive SimPool, a simple attention-based pooling mechanism as a replacement of the default one for both convolutional and transformer encoders. |
Bill Psomas; Ioannis Kakogeorgiou; Konstantinos Karantzalos; Yannis Avrithis; |
372 | SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To advance the diversity and annotation quality of human models, we introduce a new synthetic dataset, SynBody, with three appealing features: 1) a clothed parametric human model that can generate a diverse range of subjects; 2) the layered human representation that naturally offers high-quality 3D annotations to support multiple tasks; 3) a scalable system for producing realistic data to facilitate real-world tasks. |
Zhitao Yang; Zhongang Cai; Haiyi Mei; Shuai Liu; Zhaoxi Chen; Weiye Xiao; Yukun Wei; Zhongfei Qing; Chen Wei; Bo Dai; Wayne Wu; Chen Qian; Dahua Lin; Ziwei Liu; Lei Yang; |
373 | Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Viewset Diffusion, a diffusion-based generator that outputs 3D objects while only using multi-view 2D data for supervision. |
Stanislaw Szymanowicz; Christian Rupprecht; Andrea Vedaldi; |
374 | LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores our key insight: synthetic text images are good visual prompts for vision-language models! |
Cheng Shi; Sibei Yang; |
375 | EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing approaches that jointly learn 2D-3D feature matching suffer from low inliers due to representational differences between the two modalities, and the methods that bypass this problem into classification have an issue of poor refinement. In this work, we propose EP2P-Loc, a novel large-scale visual localization method that mitigates such appearance discrepancy and enables end-to-end training for pose estimation. |
Minjung Kim; Junseo Koo; Gunhee Kim; |
376 | SIRA-PCR: Sim-to-Real Adaptation for 3D Point Cloud Registration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We design SIRA-PCR, a new approach to 3D point cloud registration. |
Suyi Chen; Hao Xu; Ru Li; Guanghui Liu; Chi-Wing Fu; Shuaicheng Liu; |
377 | FeatEnHancer: Enhancing Hierarchical Features for Object Detection and Beyond Under Low-Light Vision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that optimizing the enhanced image representation with respect to the downstream task loss can result in more expressive representations. Therefore, in this work, we propose a novel module, FeatEnHancer, that hierarchically combines multiscale features using multiheaded attention guided by a task-related loss function to create suitable representations. |
Khurram Azeem Hashmi; Goutham Kallempudi; Didier Stricker; Muhammad Zeshan Afzal; |
378 | SOAR: Scene-debiasing Open-set Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This problem severely degrades the open-set action recognition performance when the testing samples exhibit scene distributions different from the training samples. To mitigate this scene bias, we propose a Scene-debiasing Open-set Action Recognition method (SOAR), which features an adversarial reconstruction module and an adaptive adversarial scene classification module. |
Yuanhao Zhai; Ziyi Liu; Zhenyu Wu; Yi Wu; Chunluan Zhou; David Doermann; Junsong Yuan; Gang Hua; |
379 | Physics-Augmented Autoencoder for 3D Skeleton-Based Gait Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce physics-augmented autoencoder (PAA), a framework for 3D skeleton-based human gait recognition. |
Hongji Guo; Qiang Ji; |
380 | Regularized Primitive Graph Learning for Unified Vector Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose GraphMapper, a unified framework for end-to-end vector map extraction from satellite images. |
Lei Wang; Min Dai; Jianan He; Jingwei Huang; |
381 | Saliency Regularization for Self-Training with Partial Annotations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose saliency regularization (SR) for a novel self-training framework. |
Shouwen Wang; Qian Wan; Xiang Xiang; Zhigang Zeng; |
382 | Stabilizing Visual Reinforcement Learning Via Asymmetric Interactive Cooperation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we find that the training instability arises from the oscillating self-overfitting of the heavily optimizable encoder. |
Yunpeng Zhai; Peixi Peng; Yifan Zhao; Yangru Huang; Yonghong Tian; |
383 | FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose FlipNeRF, a novel regularization method for few-shot novel view synthesis by utilizing our proposed flipped reflection rays. |
Seunghyeon Seo; Yeonjin Chang; Nojun Kwak; |
384 | Discovering Spatio-Temporal Rationales for Video Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the challenge, we highlight the importance of identifying question-critical temporal moments and spatial objects from the vast amount of video content. Towards this, we propose a Spatio-Temporal Rationalizer (STR), a differentiable selection module that adaptively collects question-critical moments and objects using cross-modal interaction. |
Yicong Li; Junbin Xiao; Chun Feng; Xiang Wang; Tat-Seng Chua; |
385 | Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly. |
Jiamian Wang; Huan Wang; Yulun Zhang; Yun Fu; Zhiqiang Tao; |
386 | Learning Hierarchical Features with Joint Latent Space Energy-Based Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a joint latent space EBM prior model with multi-layer latent variables for effective hierarchical representation learning. |
Jiali Cui; Ying Nian Wu; Tian Han; |
387 | UniFormerV2: Unlocking The Potential of Image ViTs for Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the emergence of powerful open-source image ViTs, we propose unlocking their potential for video understanding with efficient UniFormer designs. |
Kunchang Li; Yali Wang; Yinan He; Yizhuo Li; Yi Wang; Limin Wang; Yu Qiao; |
388 | G2L: Semantically Aligned and Uniform Video Grounding Via Geodesic and Game Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Geodesic and Game Localization (G2L), a semantically aligned and uniform video grounding framework via geodesic and game theory. |
Hongxiang Li; Meng Cao; Xuxin Cheng; Yaowei Li; Zhihong Zhu; Yuexian Zou; |
389 | TARGET: Federated Class-Continual Learning Via Exemplar-Free Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper focuses on an under-explored yet important problem: Federated Class-Continual Learning (FCCL), where new classes are dynamically added in federated learning. |
Jie Zhang; Chen Chen; Weiming Zhuang; Lingjuan Lyu; |
390 | FashionNTM: Multi-turn Fashion Image Retrieval Via Cascaded Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Multi-turn textual feedback-based fashion image retrieval focuses on a real-world setting, where users can iteratively provide information to refine retrieval results until they find an item that fits all their requirements. In this work, we present a novel memory-based method, called FashionNTM, for such a multi-turn system. |
Anwesan Pal; Sahil Wadhwa; Ayush Jaiswal; Xu Zhang; Yue Wu; Rakesh Chada; Pradeep Natarajan; Henrik I. Christensen; |
391 | MolGrapher: Graph-based Visual Recognition of Chemical Structures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce MolGrapher to recognize chemical structures visually. |
Lucas Morin; Martin Danelljan; Maria Isabel Agea; Ahmed Nassar; Valery Weber; Ingmar Meijer; Peter Staar; Fisher Yu; |
392 | SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SAMPLING, a Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image based on improved multiplane images (MPI). |
Xiaoyu Zhou; Zhiwei Lin; Xiaojun Shan; Yongtao Wang; Deqing Sun; Ming-Hsuan Yang; |
393 | DiffV2S: Diffusion-Based Video-to-Speech Synthesis with Vision-Guided Speaker Embedding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel vision-guided speaker embedding extractor using a self-supervised pre-trained model and prompt tuning technique. |
Jeongsoo Choi; Joanna Hong; Yong Man Ro; |
394 | PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce PointOdyssey, a large-scale synthetic dataset and data generation framework for the training and evaluation of long-term fine-grained tracking algorithms. |
Yang Zheng; Adam W. Harley; Bokui Shen; Gordon Wetzstein; Leonidas J. Guibas; |
395 | The Effectiveness of MAE Pre-Pretraining for Billion-Scale Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an additional pre-pretraining stage that is simple and uses the self-supervised MAE technique to initialize the model. |
Mannat Singh; Quentin Duval; Kalyan Vasudev Alwala; Haoqi Fan; Vaibhav Aggarwal; Aaron Adcock; Armand Joulin; Piotr Dollar; Christoph Feichtenhofer; Ross Girshick; Rohit Girdhar; Ishan Misra; |
396 | Towards Zero Domain Gap: A Comprehensive Study of Realistic LiDAR Simulation for Autonomy Testing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel "paired-scenario" approach to evaluating the domain gap of a LiDAR simulator by reconstructing digital twins of real world scenarios. |
Sivabalan Manivasagam; Ioan Andrei Bârsan; Jingkang Wang; Ze Yang; Raquel Urtasun; |
397 | GPA-3D: Geometry-aware Prototype Alignment for Unsupervised Domain Adaptive 3D Object Detection from Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel unsupervised domain adaptive 3D detection framework, namely Geometry-aware Prototype Alignment (GPA-3D), which explicitly leverages the intrinsic geometric relationship from point cloud objects to reduce the feature discrepancy, thus facilitating cross-domain transferring. |
Ziyu Li; Jingming Guo; Tongtong Cao; Liu Bingbing; Wankou Yang; |
398 | TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the task of generalizable neural human rendering which trains conditional Neural Radiance Fields (NeRF) from multi-view videos of different characters. |
Xiao Pan; Zongxin Yang; Jianxin Ma; Chang Zhou; Yi Yang; |
399 | LNPL-MIL: Learning from Noisy Pseudo Labels for Promoting Multiple Instance Learning in Whole Slide Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In MIL, we propose a Transformer aware of instance Order and Distribution (TOD-MIL) that strengthens instance correlations and reduces semantic misalignment within the bag. |
Zhuchen Shao; Yifeng Wang; Yang Chen; Hao Bian; Shaohui Liu; Haoqian Wang; Yongbing Zhang; |
400 | Few-Shot Dataset Distillation Via Translative Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on few-shot dataset distillation, where a distilled dataset is synthesized with only a few or even a single network. |
Songhua Liu; Xinchao Wang; |
401 | Random Sub-Samples Generation for Self-Supervised Real Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, as a typical method for self-supervised denoising, the original blind spot network (BSN) assumes that the noise is pixel-wise independent, which is much different from the real cases. To solve this problem, we propose a novel self-supervised real image denoising framework named Sampling Difference As Perturbation (SDAP) based on Random Sub-samples Generation (RSG) with a cyclic sample difference loss. |
Yizhong Pan; Xiao Liu; Xiangyu Liao; Yuanzhouhan Cao; Chao Ren; |
402 | Waffling Around for Performance: Visual Classification with Random Words and Broad Concepts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, averaging over LLM-generated class descriptors, e.g. "waffle, which has a round shape", can notably improve generalization performance. In this work, we critically study this behavior and propose WaffleCLIP, a framework for zero-shot visual classification which simply replaces LLM-generated descriptors with random character and word descriptors. |
Karsten Roth; Jae Myung Kim; A. Sophia Koepke; Oriol Vinyals; Cordelia Schmid; Zeynep Akata; |
403 | Unsupervised Surface Anomaly Detection with Diffusion Probabilistic Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there are three major challenges to the practical application of this approach: 1) the reconstruction quality needs to be further improved since it has a great impact on the final result, especially for images with structural changes; 2) it is observed that for many neural networks, the anomalies can also be well reconstructed, which severely violates the underlying assumption; 3) since reconstruction is an ill-conditioned problem, a test instance may correspond to multiple normal patterns, but most current reconstruction-based methods have ignored this critical fact. In this paper, we propose DiffAD, a method for unsupervised anomaly detection based on the latent diffusion model, inspired by its ability to generate high-quality and diverse images. |
Xinyi Zhang; Naiqi Li; Jiawei Li; Tao Dai; Yong Jiang; Shu-Tao Xia; |
404 | AutoAD II: The Sequel – Who, When, and What in Movie Audio Description Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we develop a new model for automatically generating movie AD, given CLIP visual features of the frames, the cast list, and the temporal locations of the speech, addressing all three of the 'who', 'when', and 'what' questions: (i) who: we introduce a character bank consisting of the character’s name, the actor that played the part, and a CLIP feature of their face, for the principal cast of each movie, and demonstrate how this can be used to improve naming in the generated AD; (ii) when: we investigate several models for determining whether an AD should be generated for a time interval or not, based on the visual content of the interval and its neighbours; and (iii) what: we implement a new vision-language model for this task, that can ingest the proposals from the character bank, whilst conditioning on the visual features using cross-attention, and demonstrate how this improves over previous architectures for AD text generation in an apples-to-apples comparison. |
Tengda Han; Max Bain; Arsha Nagrani; Gul Varol; Weidi Xie; Andrew Zisserman; |
405 | TinyCLIP: CLIP Distillation Via Affinity Mimicking and Weight Inheritance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel cross-modal distillation method, called TinyCLIP, for large-scale language-image pre-trained models. |
Kan Wu; Houwen Peng; Zhenghong Zhou; Bin Xiao; Mengchen Liu; Lu Yuan; Hong Xuan; Michael Valenzuela; Xi (Stephen) Chen; Xinggang Wang; Hongyang Chao; Han Hu; |
406 | Hyperbolic Chamfer Distance for Point Cloud Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is well known that CD is vulnerable to outliers, leading to the drift towards suboptimal models. In contrast to the literature where most works address such issues in Euclidean space, we propose an extremely simple yet powerful metric for point cloud completion, namely Hyperbolic Chamfer Distance (HyperCD), that computes CD in hyperbolic space. |
Fangzhou Lin; Yun Yue; Songlin Hou; Xuechu Yu; Yajun Xu; Kazunori D Yamada; Ziming Zhang; |
407 | Democratising 2D Sketch to 3D Shape Retrieval Through Pivoting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies the problem of 2D sketch to 3D shape retrieval, but with a focus on democratising the process. |
Pinaki Nath Chowdhury; Ayan Kumar Bhunia; Aneeshan Sain; Subhadeep Koley; Tao Xiang; Yi-Zhe Song; |
408 | Simoun: Synergizing Interactive Motion-appearance Understanding for Vision-based Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the problem, we present Synergizing Interactive Motion-appearance Understanding (Simoun), a unified framework for vision-based RL. |
Yangru Huang; Peixi Peng; Yifan Zhao; Yunpeng Zhai; Haoran Xu; Yonghong Tian; |
409 | AG3D: Learning to Generate 3D Avatars from 2D Image Collections Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new adversarial generative model of realistic 3D people from 2D images. |
Zijian Dong; Xu Chen; Jinlong Yang; Michael J. Black; Otmar Hilliges; Andreas Geiger; |
410 | KECOR: Kernel Coding Rate Maximization for Active 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we resort to a novel kernel coding rate maximization (KECOR) strategy which aims to identify the most informative point clouds to acquire labels through the lens of information theory. |
Yadan Luo; Zhuoxiao Chen; Zhen Fang; Zheng Zhang; Mahsa Baktashmotlagh; Zi Huang; |
411 | Learned Image Reasoning Prior Penetrates Deep Unfolding Network for Panchromatic and Multi-spectral Image Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The success of deep neural networks for pan-sharpening commonly comes in the form of a black box, lacking transparency and interpretability. To alleviate this issue, we propose a novel model-driven deep unfolding framework with an image reasoning prior tailored for the pan-sharpening task. |
Man Zhou; Jie Huang; Naishan Zheng; Chongyi Li; |
412 | Representation Disparity-aware Distillation for 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on developing knowledge distillation (KD) for compact 3D detectors. |
Yanjing Li; Sheng Xu; Mingbao Lin; Jihao Yin; Baochang Zhang; Xianbin Cao; |
413 | NCHO: Unsupervised Learning for Neural 3D Composition of Humans and Objects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a novel framework for learning a compositional generative model of humans and objects (backpacks, coats, scarves, and more) from real-world 3D scans. |
Taeksoo Kim; Shunsuke Saito; Hanbyul Joo; |
414 | Breaking The Limits of Text-conditioned 3D Motion Synthesis with Elaborative Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Differing from the majority of previous works, which regard actions as single entities and can only generate short sequences for simple motions, we propose EMS, an elaborative motion synthesis model conditioned on detailed natural language descriptions. |
Yijun Qian; Jack Urbanek; Alexander G. Hauptmann; Jungdam Won; |
415 | VL-PET: Vision-and-Language Parameter-Efficient Tuning Via Granularity Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Vision-and-Language Parameter-Efficient Tuning (VL-PET) framework to impose effective control over modular modifications via a novel granularity-controlled mechanism. |
Zi-Yuan Hu; Yanyang Li; Michael R. Lyu; Liwei Wang; |
416 | ROME: Robustifying Memory-Efficient NAS Via Topology Disentanglement and Gradient Accumulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new algorithm called RObustifying Memory-Efficient NAS (ROME) to give a cure. |
Xiaoxing Wang; Xiangxiang Chu; Yuda Fan; Zhexi Zhang; Bo Zhang; Xiaokang Yang; Junchi Yan; |
417 | Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fill the gap, this paper makes progresses from two distinct perspectives: (1) It presents a Hierarchical Concept Graph (HCG) that discriminates and associates multi-granularity concepts with a multi-layered hierarchical structure, aligning visual observations with knowledge across different levels to alleviate data biases. |
Yifeng Zhang; Shi Chen; Qi Zhao; |
418 | 3D-aware Image Generation Using 2D Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel 3D-aware image generation method that leverages 2D diffusion models. |
Jianfeng Xiang; Jiaolong Yang; Binbin Huang; Xin Tong; |
419 | Locating Noise Is Halfway Denoising for Semi-Supervised Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work shows that locating the patch-wise noisy region is a better way to deal with noise. |
Yan Fang; Feng Zhu; Bowen Cheng; Luoqi Liu; Yao Zhao; Yunchao Wei; |
420 | Learning Non-Local Spatial-Angular Correlation for Light Field Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple yet effective method to learn the non-local spatial-angular correlation for LF image SR. |
Zhengyu Liang; Yingqian Wang; Longguang Wang; Jungang Yang; Shilin Zhou; Yulan Guo; |
421 | ICE-NeRF: Interactive Color Editing of NeRFs Via Decomposition-Aware Weight Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ICE-NeRF, an Interactive Color Editing framework that performs color editing by taking a pre-trained NeRF and a rough user mask as input. |
Jae-Hyeok Lee; Dae-Shik Kim; |
422 | SPANet: Frequency-balancing Token Mixer Using Spectral Pooling Aggregation Modulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Contrary to this idea, we investigate existing convolution-based models with spectral analysis and observe that improving the low-pass filtering in convolution operations also leads to performance improvement. To account for this observation, we hypothesize that utilizing optimal token mixers that capture balanced representations of both high- and low-frequency components can enhance the performance of models. |
Guhnoo Yun; Juhan Yoo; Kijung Kim; Jeongho Lee; Dong Hwan Kim; |
423 | ASAG: Building Strong One-Decoder-Layer Sparse Detectors Via Adaptive Sparse Anchor Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although they gain remarkable acceleration, their performance still lags behind their six-decoder-layer counterparts by a large margin. In this work, we aim to bridge this performance gap while retaining fast speed. |
Shenghao Fu; Junkai Yan; Yipeng Gao; Xiaohua Xie; Wei-Shi Zheng; |
424 | MGMAE: Motion Guided Masking for Video Masked Autoencoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to further improve the performance of video masked autoencoding by introducing a motion guided masking strategy. |
Bingkun Huang; Zhiyu Zhao; Guozhen Zhang; Yu Qiao; Limin Wang; |
425 | The Perils of Learning From Unlabeled Data: Backdoor Attacks on Semi-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Semi-supervised learning (SSL) is gaining popularity as it reduces the cost of machine learning (ML) by training high-performance models using unlabeled data. In this paper, we reveal that the key feature of SSL, i.e., learning from (non-inspected) unlabeled data, exposes SSL to strong poisoning attacks that can significantly damage its security. |
Virat Shejwalkar; Lingjuan Lyu; Amir Houmansadr; |
426 | SSB: Simple But Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the challenging and realistic open-set SSL setting, where the goal is to both correctly classify inliers and to detect outliers. |
Yue Fan; Anna Kukleva; Dengxin Dai; Bernt Schiele; |
427 | StyleDiffusion: Controllable Disentangled Style Transfer Via Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new C-S disentangled framework for style transfer without using previous assumptions. |
Zhizhong Wang; Lei Zhao; Wei Xing; |
428 | AdvDiffuser: Natural Adversarial Example Synthesis with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this leads to loss of high-level information, resulting in low-quality and unnatural UAEs. In light of this, we propose AdvDiffuser, a new method for synthesizing natural UAEs using diffusion models. |
Xinquan Chen; Xitong Gao; Juanjuan Zhao; Kejiang Ye; Cheng-Zhong Xu; |
429 | ViewRefer: Grasp The Multi-view Knowledge for 3D Visual Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ViewRefer, a multi-view framework for 3D visual grounding exploring how to grasp the view knowledge from both text and 3D modalities. |
Zoey Guo; Yiwen Tang; Ray Zhang; Dong Wang; Zhigang Wang; Bin Zhao; Xuelong Li; |
430 | CaPhy: Capturing Physical Properties for Animatable Human Avatars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present CaPhy, a novel method for reconstructing animatable human avatars with realistic dynamic properties for clothing. |
Zhaoqi Su; Liangxiao Hu; Siyou Lin; Hongwen Zhang; Shengping Zhang; Justus Thies; Yebin Liu; |
431 | DarSwin: Distortion Aware Radial Swin Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel transformer-based model that automatically adapts to the distortion produced by wide-angle lenses. |
Akshaya Athwale; Arman Afrasiyabi; Justin Lagüe; Ichrak Shili; Ola Ahmad; Jean-François Lalonde; |
432 | Fine-grained Unsupervised Domain Adaptation for Gait Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel perspective that adjacent-view sequences exhibit overlapping views, which can be leveraged by the network to gradually attain cross-view and cross-dressing capabilities without pre-training on the labeled source domain. |
Kang Ma; Ying Fu; Dezhi Zheng; Yunjie Peng; Chunshui Cao; Yongzhen Huang; |
433 | Cross-Modal Orthogonal High-Rank Augmentation for RGB-Event Transformer-Trackers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose a mask modeling strategy that randomly masks a specific modality of some tokens to enforce proactive interaction between tokens from different modalities. |
Zhiyu Zhu; Junhui Hou; Dapeng Oliver Wu; |
434 | Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. |
Ziyi Wang; Xumin Yu; Yongming Rao; Jie Zhou; Jiwen Lu; |
435 | Open-vocabulary Panoptic Segmentation with Embedding Modulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose OPSNet, an omnipotent and data-efficient framework for Open-vocabulary Panoptic Segmentation. |
Xi Chen; Shuang Li; Ser-Nam Lim; Antonio Torralba; Hengshuang Zhao; |
436 | Beyond Single Path Integrated Gradients for Reliable Input Attribution Via Randomized Path Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, previous work has shown that such method often produces noisy and unreliable attributions during the integration of the gradients over the path defined in the input space. In this paper, we tackle this issue by estimating the distribution of the possible attributions according to the integrating path selection. |
Giyoung Jeon; Haedong Jeong; Jaesik Choi; |
437 | Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We find that while instance-agnostic static prompting, e.g. VPT, shows some efficacy in downstream transfer, it is vulnerable to the distribution diversity caused by various types of noise in real-world point cloud data. To overcome this limitation, we propose a novel Instance-aware Dynamic Prompt Tuning (IDPT) strategy for pre-trained point cloud models. |
Yaohua Zha; Jinpeng Wang; Tao Dai; Bin Chen; Zhi Wang; Shu-Tao Xia; |
438 | How to Boost Face Recognition with StyleGAN? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show that a simple approach based on fine-tuning an encoder for StyleGAN improves upon state-of-the-art facial recognition and performs better than training on synthetic face identities. |
Artem Sevastopolskiy; Yury Malkov; Nikita Durasov; Luisa Verdoliva; Matthias Nießner; |
439 | Text2Tex: Text-driven Texture Synthesis Via Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Text2Tex, a novel method for generating high-quality textures for 3D meshes from the given text prompts. |
Dave Zhenyu Chen; Yawar Siddiqui; Hsin-Ying Lee; Sergey Tulyakov; Matthias Nießner; |
440 | MUVA: A New Large-Scale Benchmark for Multi-View Amodal Instance Segmentation in The Shopping Scenario Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: At present, this approach has not yet been explored by existing methods and datasets. To bridge this gap, we propose a new task called Multi-view Amodal Instance Segmentation (MAIS) and introduce the MUVA dataset, the first MUlti-View AIS dataset that takes the shopping scenario as instantiation. |
Zhixuan Li; Weining Ye; Juan Terven; Zachary Bennett; Ying Zheng; Tingting Jiang; Tiejun Huang; |
441 | Foreground-Background Separation Through Concept Distillation from Generative Image Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel method that enables the generation of general foreground-background segmentation models from simple textual descriptions, without requiring segmentation labels. |
Mischa Dombrowski; Hadrien Reynaud; Matthew Baugh; Bernhard Kainz; |
442 | ENVIDR: Implicit Differentiable Renderer with Neural Environment Lighting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce ENVIDR, a rendering and modeling framework for high-quality rendering and reconstruction of surfaces with challenging specular reflections. |
Ruofan Liang; Huiting Chen; Chunlin Li; Fan Chen; Selvakumar Panneer; Nandita Vijaykumar; |
443 | Not All Steps Are Created Equal: Selective Diffusion Distillation for Image Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel framework, Selective Diffusion Distillation (SDD), that ensures both the fidelity and editability of images. |
Luozhou Wang; Shuai Yang; Shu Liu; Ying-cong Chen; |
444 | SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a storage-efficient training strategy for vision classifiers for large-scale datasets (e.g., ImageNet) that only uses 1024 tokens per instance without using the raw level pixels; our token storage only needs <1% of the original JPEG-compressed raw pixels. |
Song Park; Sanghyuk Chun; Byeongho Heo; Wonjae Kim; Sangdoo Yun; |
445 | ALIP: Adaptive Language-Image Pre-Training with Synthetic Caption Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the presence of intrinsic noise and unmatched image-text pairs in web data can potentially affect the performance of representation learning. To address this issue, we first utilize the OFA model to generate synthetic captions that focus on the image content. |
Kaicheng Yang; Jiankang Deng; Xiang An; Jiawei Li; Ziyong Feng; Jia Guo; Jing Yang; Tongliang Liu; |
446 | GeoUDF: Surface Reconstruction from 3D Point Clouds Via Geometry-guided Distance Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a learning-based method, namely GeoUDF, to tackle the long-standing and challenging problem of reconstructing a discrete surface from a sparse point cloud. |
Siyu Ren; Junhui Hou; Xiaodong Chen; Ying He; Wenping Wang; |
447 | LaPE: Layer-adaptive Position Embedding for Vision Transformers with Independent Layer Normalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This results in restricted and monotonic PE across layers, as the shared LN affine parameters are not dedicated to PE, and the PE cannot be adjusted on a per-layer basis. To overcome these limitations, we propose using two independent LNs for token embeddings and PE in each layer, and progressively delivering PE across layers. |
Runyi Yu; Zhennan Wang; Yinhuai Wang; Kehan Li; Chang Liu; Haoyi Duan; Xiangyang Ji; Jie Chen; |
448 | CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the domain gap between 3D and images remains unsolved, so V-L pre-trained models are restricted in 3D downstream tasks. To address this issue, we propose CLIP2Point, an image-depth pre-training method based on contrastive learning to transfer CLIP to the 3D domain, and adapt it to point cloud classification. |
Tianyu Huang; Bowen Dong; Yunhan Yang; Xiaoshui Huang; Rynson W.H. Lau; Wanli Ouyang; Wangmeng Zuo; |
449 | Parametric Classification for Generalized Category Discovery: A Baseline Study Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We demonstrate that two prediction biases exist: the classifier tends to predict seen classes more often, and produces an imbalanced distribution across seen and novel categories. Based on these findings, we propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers. |
Xin Wen; Bingchen Zhao; Xiaojuan Qi; |
450 | MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose MeMOTR, a long-term memory-augmented Transformer for multi-object tracking. |
Ruopeng Gao; Limin Wang; |
451 | RawHDR: High Dynamic Range Image Reconstruction from A Single Raw Image Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a model customized for Raw images, considering the unique features of Raw data to learn the Raw-to-HDR mapping. |
Yunhao Zou; Chenggang Yan; Ying Fu; |
452 | Denoising Diffusion Autoencoders Are Unified Self-supervised Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by recent advances in diffusion models, which are reminiscent of denoising autoencoders, we investigate whether they can acquire discriminative representations for classification via generative pre-training. |
Weilai Xiang; Hongyu Yang; Di Huang; Yunhong Wang; |
453 | Robust Object Modeling for Visual Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To enjoy the merits of both methods, we propose a robust object modeling framework for visual tracking (ROMTrack), which simultaneously models the inherent template and the hybrid template features. |
Yidong Cai; Jie Liu; Jie Tang; Gangshan Wu; |
454 | FSI: Frequency and Spatial Interactive Learning for Image Restoration in Under-Display Cameras Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new perspective to handle various diffraction in UDC images by jointly exploring the feature restoration in the frequency and spatial domains, and present a Frequency and Spatial Interactive Learning Network (FSI). |
Chengxu Liu; Xuan Wang; Shuai Li; Yuzhi Wang; Xueming Qian; |
455 | Cross-view Topology Based Consistent and Complementary Information for Deep Multi-view Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, finding correlations between multiple views in an unsupervised setting is a significant challenge. To tackle these issues, we present a novel Cross-view Topology based Consistent and Complementary information extraction framework, termed CTCC. |
Zhibin Dong; Siwei Wang; Jiaqi Jin; Xinwang Liu; En Zhu; |
456 | Distribution-Consistent Modal Recovering for Incomplete Multimodal Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the existing methods often directly estimate missed modalities from the observed ones by deep neural networks, lacking consideration of the distribution gap between modalities, resulting in the inconsistency of distributions between the recovered data and true data. To mitigate this issue, in this work, we propose a novel recovery paradigm, Distribution-Consistent Modal Recovering (DiCMoR), to transfer the distributions from available modalities to missed modalities, which thus maintains the distribution consistency of recovered data. |
Yuanzhi Wang; Zhen Cui; Yong Li; |
457 | ContactGen: Generative Contact Modeling for Grasp Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given an input object, we propose a conditional generative model to predict ContactGen and adopt model-based optimization to predict diverse and geometrically feasible grasps. |
Shaowei Liu; Yang Zhou; Jimei Yang; Saurabh Gupta; Shenlong Wang; |
458 | Temporal Collection and Distribution for Referring Video Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It requires aligning the natural language expression with the objects’ motions and their dynamic associations at the global video level but segmenting objects at the frame level. To achieve this goal, we propose to simultaneously maintain a global referent token and a sequence of object queries, where the former is responsible for capturing video-level referent according to the language expression, while the latter serves to better locate and segment objects with each frame. |
Jiajin Tang; Ge Zheng; Sibei Yang; |
459 | SA-BEV: Generating Semantic-Aware Bird’s-Eye-View Feature for Multi-view 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Semantic-Aware BEV Pooling (SA-BEVPool), which can filter out background information according to the semantic segmentation of image features and transform image features into semantic-aware BEV features. |
Jinqing Zhang; Yanan Zhang; Qingjie Liu; Yunhong Wang; |
460 | Variational Degeneration to Structural Refinement: A Unified Framework for Superimposed Image Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a unified framework for superimposed image decomposition that can cope with intricate degradation patterns adaptively. |
Wenyu Li; Yan Xu; Yang Yang; Haoran Ji; Yue Lang; |
461 | Global Knowledge Calibration for Fast Open-Vocabulary Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nonetheless, this approach incurs heavy computational overheads as the CLIP vision encoder must be repeatedly forward-passed for each mask, rendering it impractical for real-world applications. To address this challenge, our objective is to develop a fast OVS model that can perform comparably or better without the extra computational burden of the CLIP image encoder during inference. |
Kunyang Han; Yong Liu; Jun Hao Liew; Henghui Ding; Jiajun Liu; Yitong Wang; Yansong Tang; Yujiu Yang; Jiashi Feng; Yao Zhao; Yunchao Wei; |
462 | Ego-Humans: An Ego-Centric 3D Multi-Human Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We rigorously evaluate existing state-of-the-art methods and highlight their limitations in the egocentric scenario, specifically on multi-human tracking. To address such limitations, we propose EgoFormer, a novel approach with a multi-stream transformer architecture and explicit 3D spatial reasoning to estimate and track the human pose. |
Rawal Khirodkar; Aayush Bansal; Lingni Ma; Richard Newcombe; Minh Vo; Kris Kitani; |
463 | Focal Network for Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The aim of this study is to develop an efficient and effective framework for image restoration. |
Yuning Cui; Wenqi Ren; Xiaochun Cao; Alois Knoll; |
464 | Indoor Depth Recovery Based on Deep Unfolding with Non-Local Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Utilizing the property that there is a large amount of non-local common characteristics in depth images, we propose a novel model-guided depth recovery method, namely the DC-NLAR model. |
Yuhui Dai; Junkang Zhang; Faming Fang; Guixu Zhang; |
465 | Compatibility of Fundamental Matrices for Complete Viewing Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that the eigenvalue condition is redundant in the generic and collinear cases. |
Martin Bråtelund; Felix Rydell; |
466 | GAFlow: Incorporating Gaussian Attention Into Optical Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we push Gaussian Attention (GA) into the optical flow models to accentuate local properties during representation learning and enforce the motion affinity during matching. |
Ao Luo; Fan Yang; Xin Li; Lang Nie; Chunyu Lin; Haoqiang Fan; Shuaicheng Liu; |
467 | MAtch, EXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, VL models tend to over-represent objects while paying much less attention to verbs, and require additional tuning on video data for best zero-shot action recognition performance. While previous work relied on large-scale, fully-annotated data, in this work we propose an unsupervised approach. |
Wei Lin; Leonid Karlinsky; Nina Shvetsova; Horst Possegger; Mateusz Kozinski; Rameswar Panda; Rogerio Feris; Hilde Kuehne; Horst Bischof; |
468 | Space Engage: Collaborative Space Supervision for Contrastive-Based Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, previous contrastive-based S4 methods merely rely on the supervision from the model’s output (logits) in logit space during unlabeled training. In contrast, we utilize the outputs in both logit space and representation space to obtain supervision in a collaborative way. |
Changqi Wang; Haoyu Xie; Yuhui Yuan; Chong Fu; Xiangyu Yue; |
469 | Delving Into Motion-Aware Matching for Monocular 3D Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose MoMA-M3T, a framework that mainly consists of three motion-aware components. |
Kuan-Chih Huang; Ming-Hsuan Yang; Yi-Hsuan Tsai; |
470 | SoDaCam: Software-defined Cameras Via Single-Photon Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present "SoDaCam" that provides reinterpretable cameras at the granularity of photons, from photon-cubes acquired by single-photon devices. |
Varun Sundar; Andrei Ardelean; Tristan Swedish; Claudio Bruschini; Edoardo Charbon; Mohit Gupta; |
471 | Reference-guided Controllable Inpainting of Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For non-reference disoccluded regions, which cannot be supervised by the single reference view, we devise a method based on image inpainters to guide both the geometry and appearance. |
Ashkan Mirzaei; Tristan Aumentado-Armstrong; Marcus A. Brubaker; Jonathan Kelly; Alex Levinshtein; Konstantinos G. Derpanis; Igor Gilitschenski; |
472 | Diffusion-Guided Reconstruction of Everyday Hand-Object Interaction Clips Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we learn a diffusion network to model the conditional distribution of (geometric) renderings of objects conditioned on hand configuration and category label, and leverage it as a prior to guide the novel-view renderings of the reconstructed scene. |
Yufei Ye; Poorvi Hebbar; Abhinav Gupta; Shubham Tulsiani; |
473 | Decoupled Iterative Refinement Framework for Interacting Hands Reconstruction from A Single RGB Image Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a decoupled iterative refinement framework to achieve pixel-aligned hand reconstruction while efficiently modeling the spatial relationship between hands. |
Pengfei Ren; Chao Wen; Xiaozheng Zheng; Zhou Xue; Haifeng Sun; Qi Qi; Jingyu Wang; Jianxin Liao; |
474 | Fast Adversarial Training with Smooth Convergence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To obtain a smooth loss convergence process, we propose a novel oscillatory constraint (dubbed ConvergeSmooth) to limit the loss difference between adjacent epochs. |
Mengnan Zhao; Lihe Zhang; Yuqiu Kong; Baocai Yin; |
475 | Who Are You Referring To? Coreference Resolution In Image Narrations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Coreference resolution aims to identify words and phrases which refer to the same entity in a text, a core task in natural language processing. In this paper, we extend this task to resolving coreferences in long-form narrations of visual scenes. |
Arushi Goel; Basura Fernando; Frank Keller; Hakan Bilen; |
476 | DVGaze: Dual-View Gaze Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a dual-view gaze estimation network (DV-Gaze). |
Yihua Cheng; Feng Lu; |
477 | Dynamic Hyperbolic Attention Network for Fine Hand-object Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the first precise hand-object reconstruction method in hyperbolic space, namely Dynamic Hyperbolic Attention Network (DHANet), which leverages intrinsic properties of hyperbolic space to learn representative features. |
Zhiying Leng; Shun-Cheng Wu; Mahdi Saleh; Antonio Montanaro; Hao Yu; Yin Wang; Nassir Navab; Xiaohui Liang; Federico Tombari; |
478 | A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By analyzing the cross-attention representations of these models, we notice two key issues. First, for text prompts that contain multiple concepts, there is a significant amount of pixel-space overlap (i.e., same spatial regions) among pairs of different concepts. This eventually leads to the model being unable to distinguish between the two concepts and one of them being ignored in the final generation. Next, while these models attempt to capture all such concepts during the beginning of denoising (e.g., first few steps) as evidenced by cross-attention maps, this knowledge is not retained by the end of denoising (e.g., last few steps). Such loss of knowledge eventually leads to inaccurate generation outputs. |
Aishwarya Agarwal; Srikrishna Karanam; K J Joseph; Apoorv Saxena; Koustava Goswami; Balaji Vasan Srinivasan; |
479 | LivePose: Online 3D Reconstruction from Monocular Video with Dynamic Camera Poses Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To support further research, we introduce a dataset called LivePose containing the dynamic poses from a SLAM system running on ScanNet. |
Noah Stier; Baptiste Angles; Liang Yang; Yajie Yan; Alex Colburn; Ming Chuang; |
480 | Efficient Joint Optimization of Layer-Adaptive Weight Pruning in Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel layer-adaptive weight-pruning approach for Deep Neural Networks (DNNs) that addresses the challenge of optimizing the output distortion minimization while adhering to a target pruning ratio constraint. |
Kaixin Xu; Zhe Wang; Xue Geng; Min Wu; Xiaoli Li; Weisi Lin; |
481 | Feature Modulation Transformer: Cross-Refinement of Global Representation Via High-Frequency Prior for Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our proposed solution, the cross-refinement adaptive feature modulation transformer (CRAFT), integrates the strengths of both convolutional and transformer structures. |
Ao Li; Le Zhang; Yun Liu; Ce Zhu; |
482 | Exploring The Sim2Real Gap Using Digital Twins Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus using synthetic data still requires a large amount of time, money, and skill as one needs to author the data carefully. In this paper, we seek to understand which aspects of this authoring process are most critical. |
Sruthi Sudhakar; Jon Hanzelka; Josh Bobillot; Tanmay Randhavane; Neel Joshi; Vibhav Vineet; |
483 | MPI-Flow: Learning Realistic Optical Flow with Multiplane Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the domain gap of these data with real-world scenes constrains the generalization of the trained model to real-world applications. To address this issue, we investigate generating realistic optical flow datasets from real-world images. |
Yingping Liang; Jiaming Liu; Debing Zhang; Ying Fu; |
484 | Re:PolyWorld – A Graph Neural Network for Polygonal Scene Parsing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The objective of this work was to overcome weaknesses and shortcomings of the original model, as well as introducing an improved polygonal representation to obtain a general-purpose method for polygon extraction in images. |
Stefano Zorzi; Friedrich Fraundorfer; |
485 | FaceCLIPNeRF: Text-driven 3D Face Manipulation Using Deformable Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, our approach is designed to require a single text to manipulate a face reconstructed with NeRF. |
Sungwon Hwang; Junha Hyung; Daejin Kim; Min-Jung Kim; Jaegul Choo; |
486 | Video State-Changing Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper makes a pioneering effort to introduce a weakly-supervised benchmark on Video State-Changing Object Segmentation (VSCOS). |
Jiangwei Yu; Xiang Li; Xinran Zhao; Hongming Zhang; Yu-Xiong Wang; |
487 | Learning Shape Primitives Via Implicit Convexity Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is challenging since implicit shapes have a high degree of freedom, which violates the simplicity property of shape primitives. In this work, a novel regularization term named Implicit Convexity Regularization (ICR) imposed on implicit primitive learning is proposed to tackle this problem. |
Xiaoyang Huang; Yi Zhang; Kai Chen; Teng Li; Wenjun Zhang; Bingbing Ni; |
488 | MonoNeRF: Learning A Generalizable Dynamic Radiance Field from Monocular Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we target at the problem of learning a generalizable dynamic radiance field from monocular videos. |
Fengrui Tian; Shaoyi Du; Yueqi Duan; |
489 | PG-RCNN: Semantic Surface Point Generation for 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This leads to the indiscriminate point generation for incorrect proposals as well. Motivated by this, we propose Point Generation R-CNN (PG-RCNN), a novel end-to-end detector that generates semantic surface points of foreground objects for accurate detection. |
Inyong Koo; Inyoung Lee; Se-Ho Kim; Hee-Seon Kim; Woo-jin Jeon; Changick Kim; |
490 | ITI-GEN: Inclusive Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, this paper proposes a drastically different approach that adheres to the maxim that "a picture is worth a thousand words". |
Cheng Zhang; Xuanbai Chen; Siqi Chai; Chen Henry Wu; Dmitry Lagun; Thabo Beeler; Fernando De la Torre; |
491 | Learning Depth Estimation for Transparent and Mirror Surfaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inferring the depth of transparent or mirror (ToM) surfaces represents a hard challenge for sensors, algorithms, and deep networks alike. We propose a simple pipeline for learning to estimate depth properly for such surfaces with neural networks, without requiring any ground-truth annotation. |
Alex Costanzino; Pierluigi Zama Ramirez; Matteo Poggi; Fabio Tosi; Stefano Mattoccia; Luigi Di Stefano; |
492 | Learning Neural Eigenfunctions for Unsupervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite recent progress in enhancing spectral clustering with powerful pre-trained models, current approaches still suffer from inefficiencies in spectral decomposition and inflexibility in applying them to the test data. This work addresses these issues by casting spectral clustering as a parametric approach that employs neural network-based eigenfunctions to produce spectral embeddings. |
Zhijie Deng; Yucen Luo; |
493 | Shape Analysis of Euclidean Curves Under Frenet-Serret Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that for any smooth curve in R^d, d>1, the generalized curvatures associated with the Frenet-Serret equation can be used to define a Riemannian geometry that takes into account all the geometric features of the shape. |
Perrine Chassat; Juhyun Park; Nicolas Brunel; |
494 | Representation Uncertainty in Self-Supervised Learning As Variational Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, a novel self-supervised learning (SSL) method is proposed, which considers SSL in terms of variational inference to learn not only representation but also representation uncertainties. |
Hiroki Nakamura; Masashi Okada; Tadahiro Taniguchi; |
495 | Efficient Diffusion Training Via Min-SNR Weighting Strategy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we discovered that the slow convergence is partly due to conflicting optimization directions between timesteps. |
Tiankai Hang; Shuyang Gu; Chen Li; Jianmin Bao; Dong Chen; Han Hu; Xin Geng; Baining Guo; |
496 | Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although substantial progress has been made, most existing studies mainly focus on either single-modal tasks or simple classification tasks, with few works paying attention to the dense prediction tasks and the interaction between different modalities. Therefore, in this paper, we do an in-depth investigation of the efficient tuning problem on referring image segmentation. |
Zunnan Xu; Zhihong Chen; Yong Zhang; Yibing Song; Xiang Wan; Guanbin Li; |
497 | Towards Zero-Shot Scale-Aware Monocular Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we introduce ZeroDepth, a novel monocular depth estimation framework capable of predicting metric scale for arbitrary test images from different domains and camera parameters. |
Vitor Guizilini; Igor Vasiljevic; Dian Chen; Rareș Ambruș; Adrien Gaidon; |
498 | ATT3D: Amortized Text-to-3D Object Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: DreamFusion recently achieved high-quality results but requires a lengthy, per-prompt optimization to create 3D objects. To address this, we amortize optimization over text prompts by training on many prompts simultaneously with a unified model instead of separately. |
Jonathan Lorraine; Kevin Xie; Xiaohui Zeng; Chen-Hsuan Lin; Towaki Takikawa; Nicholas Sharp; Tsung-Yi Lin; Ming-Yu Liu; Sanja Fidler; James Lucas; |
499 | Virtual Try-On with Pose-Garment Keypoints Guided Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a pose-garment keypoints guided inpainting method for the image-based virtual try-on task, which produces high-fidelity try-on images and well preserves the shapes and patterns of the garments. |
Zhi Li; Pengfei Wei; Xiang Yin; Zejun Ma; Alex C. Kot; |
500 | Learning By Sorting: Self-supervised Learning with Group Ordering Constraints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a new variation of the contrastive learning objective, Group Ordering Constraints (GroCo), that leverages the idea of sorting the distances of positive and negative pairs and computing the respective loss based on how many positive pairs have a larger distance than the negative pairs, and thus are not ordered correctly. |
Nina Shvetsova; Felix Petersen; Anna Kukleva; Bernt Schiele; Hilde Kuehne; |
501 | Cross Modal Transformer: Towards Fast and Robust 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. |
Junjie Yan; Yingfei Liu; Jianjian Sun; Fan Jia; Shuailin Li; Tiancai Wang; Xiangyu Zhang; |
502 | Perceptual Grouping in Contrastive Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we examine how well vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery. |
Kanchana Ranasinghe; Brandon McKinzie; Sachin Ravi; Yinfei Yang; Alexander Toshev; Jonathon Shlens; |
503 | Dynamic Perceiver for Efficient Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task with a novel dual-branch architecture. |
Yizeng Han; Dongchen Han; Zeyu Liu; Yulin Wang; Xuran Pan; Yifan Pu; Chao Deng; Junlan Feng; Shiji Song; Gao Huang; |
504 | MoTIF: Learning Motion Trajectories with Local Implicit Neural Functions for Continuous Space-Time Video Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce a space-time local implicit neural function. |
Yi-Hsin Chen; Si-Cun Chen; Yi-Hsin Chen; Yen-Yu Lin; Wen-Hsiao Peng; |
505 | CRN: Camera Radar Net for Accurate, Robust, Efficient 3D Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Camera Radar Net (CRN), a novel camera-radar fusion framework that generates a semantically rich and spatially accurate bird’s-eye-view (BEV) feature map for various tasks. |
Youngseok Kim; Juyeb Shin; Sanmin Kim; In-Jae Lee; Jun Won Choi; Dongsuk Kum; |
506 | PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Also, a recent study has demonstrated the cross-modal transferability phenomenon of this joint space. From these observations, we propose PromptStyler which simulates various distribution shifts in the joint space by synthesizing diverse styles via prompts without using any images to deal with source-free domain generalization. |
Junhyeong Cho; Gilhyun Nam; Sungyeon Kim; Hunmin Yang; Suha Kwak; |
507 | Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, in scenarios where data is extremely limited (fewer than 10 samples), the generative network tends to overfit and suffers from content degradation. To address these problems, we propose a novel phasic content fusing few-shot diffusion model with directional distribution consistency loss, which targets different learning objectives at distinct training stages of the diffusion model. |
Teng Hu; Jiangning Zhang; Liang Liu; Ran Yi; Siqi Kou; Haokun Zhu; Xu Chen; Yabiao Wang; Chengjie Wang; Lizhuang Ma; |
508 | SVQNet: Sparse Voxel-Adjacent Query Network for 4D Spatio-Temporal LiDAR Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on our observation that stacking all the historical points would damage performance due to a large amount of redundant and misleading information, we propose the Sparse Voxel-Adjacent Query Network (SVQNet) for 4D LiDAR semantic segmentation. |
Xuechao Chen; Shuangjie Xu; Xiaoyi Zou; Tongyi Cao; Dit-Yan Yeung; Lu Fang; |
509 | HAL3D: Hierarchical Active Learning for Fine-Grained 3D Part Labeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first active learning tool for fine-grained 3D part labeling, a problem which challenges even the most advanced deep learning (DL) methods due to the significant structural variations among the intricate parts. |
Fenggen Yu; Yiming Qian; Francisca Gil-Ureta; Brian Jackson; Eric Bennett; Hao Zhang; |
510 | MEFLUT: Unsupervised 1D Lookup Tables for Multi-exposure Image Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new approach for high-quality multi-exposure image fusion (MEF). |
Ting Jiang; Chuan Wang; Xinpeng Li; Ru Li; Haoqiang Fan; Shuaicheng Liu; |
511 | FedPerfix: Towards Partial Model Personalization of Vision Transformers in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate where and how to partially personalize a ViT model. |
Guangyu Sun; Matias Mendieta; Jun Luo; Shandong Wu; Chen Chen; |
512 | Conditional 360-degree Image Synthesis for Immersive Indoor Scene Decoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the problem of conditional scene decoration for 360° images. |
Ka Chun Shum; Hong-Wing Pang; Binh-Son Hua; Duc Thanh Nguyen; Sai-Kit Yeung; |
513 | The Unreasonable Effectiveness of Large Language-Vision Models for Source-Free Video Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we take an orthogonal approach by exploiting "web-supervision" from Large Language-Vision Models (LLVMs), driven by the rationale that LLVMs contain a rich world prior surprisingly robust to domain-shift. |
Giacomo Zara; Alessandro Conti; Subhankar Roy; Stéphane Lathuilière; Paolo Rota; Elisa Ricci; |
514 | SIDGAN: High-Resolution Dubbed Video Generation Via Shift-Invariant Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the accurate lip generation of previous approaches that adopt a pretrained audio-video synchronization metric as an objective function, called Sync-Loss, extending it to high-resolution videos was challenging due to shift biases in the loss landscape that inhibit tandem optimization of Sync-Loss and visual quality, leading to a loss of detail. To address this issue, we introduce shift-invariant learning, which generates photo-realistic high-resolution videos with accurate Lip-Sync. |
Urwa Muaz; Wondong Jang; Rohun Tripathi; Santhosh Mani; Wenbin Ouyang; Ravi Teja Gadde; Baris Gecer; Sergio Elizondo; Reza Madad; Naveen Nair; |
515 | Meta-ZSDETR: Zero-shot DETR with Meta-learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first method that combines DETR and meta-learning to perform zero-shot object detection, named Meta-ZSDETR, where model training is formalized as an individual episode based meta-learning task. |
Lu Zhang; Chenbo Zhang; Jiajia Zhao; Jihong Guan; Shuigeng Zhou; |
516 | GaPro: Box-Supervised 3D Point Cloud Instance Segmentation Using Gaussian Processes As Pseudo Labelers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose GaPro, a new instance segmentation for 3D point clouds using axis-aligned 3D bounding box supervision. |
Tuan Duc Ngo; Binh-Son Hua; Khoi Nguyen; |
517 | STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For the first time, we introduce vision Transformers into PPAR by treating a video as a tubelet sequence, and accordingly design two complementary mechanisms, i.e., sparsification and anonymization, to remove privacy from a spatio-temporal perspective. |
Ming Li; Xiangyu Xu; Hehe Fan; Pan Zhou; Jun Liu; Jia-Wei Liu; Jiahe Li; Jussi Keppo; Mike Zheng Shou; Shuicheng Yan; |
518 | Get The Best of Both Worlds: Improving Accuracy and Transferability By Grassmann Class Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In GCR, each class is a subspace, and the logit is defined as the norm of the projection of a feature onto the class subspace. We integrate Riemannian SGD into deep learning frameworks such that class subspaces in a Grassmannian are jointly optimized with the rest of the model parameters. |
Haoqi Wang; Zhizhong Li; Wayne Zhang; |
519 | Computationally-Efficient Neural Image Compression with Shallow Decoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they suffer orders of magnitude higher computational complexity compared to traditional codecs, which hinders their real-world deployment. This paper takes a step forward in closing this gap in decoding complexity by adopting shallow or even linear decoding transforms. |
Yibo Yang; Stephan Mandt; |
520 | ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new framework called ObjectSDF++ to overcome the limitations of ObjectSDF. |
Qianyi Wu; Kaisiyuan Wang; Kejie Li; Jianmin Zheng; Jianfei Cai; |
521 | Tracing The Origin of Adversarial Attack for Forensic Investigation and Deterrence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take the role of investigators who want to trace the attack and identify the source, that is, the particular model which the adversarial examples are generated from. |
Han Fang; Jiyi Zhang; Yupeng Qiu; Jiayang Liu; Ke Xu; Chengfang Fang; Ee-Chien Chang; |
522 | Sketch and Text Guided Diffusion Model for Colored Point Cloud Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, text-based descriptions of 3D shapes are inherently ambiguous and lack details. In this paper, we propose a sketch and text guided probabilistic diffusion model for colored point cloud generation that conditions the denoising process jointly with a hand-drawn sketch of the object and its textual description. |
Zijie Wu; Yaonan Wang; Mingtao Feng; He Xie; Ajmal Mian; |
523 | Scenimefy: Learning to Craft Anime Scene Via Semi-Supervised Image-to-Image Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite promising attempts, previous efforts are still incompetent in achieving satisfactory results with consistent semantic preservation, evident stylization, and fine details. In this study, we propose Scenimefy, a novel semi-supervised image-to-image translation framework that addresses these challenges. |
Yuxin Jiang; Liming Jiang; Shuai Yang; Chen Change Loy; |
524 | Towards Unsupervised Domain Generalization for Face Anti-Spoofing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first Unsupervised Domain Generalization framework for Face Anti-Spoofing, namely UDG-FAS, which could exploit large amounts of easily accessible unlabeled data to learn generalizable features for enhancing the low-data regime of FAS. |
Yuchen Liu; Yabo Chen; Mengran Gou; Chun-Ting Huang; Yaoming Wang; Wenrui Dai; Hongkai Xiong; |
525 | DR-Tune: Improving Fine-tuning of Pretrained Visual Models By Distribution Regularization with Semantic Calibration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The former fails to retain the knowledge in the successive fine-tuning phase and is thereby prone to over-fitting, while the latter imposes strong constraints on the weights or feature maps of the downstream model without considering semantic drift, often incurring insufficient optimization. To deal with these issues, we propose a novel fine-tuning framework, namely distribution regularization with semantic calibration (DR-Tune). |
Nan Zhou; Jiaxin Chen; Di Huang; |
526 | MotionDeltaCNN: Sparse CNN Inference of Frame Differences in Moving Camera Videos with Spherical Buffers and Padded Convolutions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose MotionDeltaCNN, a sparse CNN inference framework that supports moving cameras. |
Mathias Parger; Chengcheng Tang; Thomas Neff; Christopher D. Twigg; Cem Keskin; Robert Wang; Markus Steinberger; |
527 | General Image-to-Image Translation with One-Shot Image Guidance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current methods are inadequate in meeting this demand as they lack the ability to preserve content or translate visual concepts effectively. Inspired by this, we propose a novel framework named visual concept translator (VCT) with the ability to preserve content in the source image and translate the visual concepts guided by a single reference image. |
Bin Cheng; Zuhao Liu; Yunbo Peng; Yue Lin; |
528 | Dense 2D-3D Indoor Prediction with Sound Via Aligned Cross-Modal Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a Spatial Alignment via Matching (SAM) distillation framework that elicits local correspondence between the two modalities in vision-to-audio knowledge transfer. |
Heeseung Yun; Joonil Na; Gunhee Kim; |
529 | Leveraging SE(3) Equivariance for Learning 3D Geometric Shape Assembly Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As both the geometric and pose spaces of fractured parts are exceptionally large, shape-pose disentanglement of part representations is beneficial to geometric shape assembly. In our paper, we propose to leverage SE(3) equivariance for such shape-pose disentanglement. |
Ruihai Wu; Chenrui Tie; Yushi Du; Yan Zhao; Hao Dong; |
530 | Adversarial Bayesian Augmentation for Single-Source Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Adversarial Bayesian Augmentation (ABA), a novel algorithm that learns to generate image augmentations in the challenging single-source domain generalization setting. |
Sheng Cheng; Tejas Gokhale; Yezhou Yang; |
531 | Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we address the challenge of 3D scene structure recovery from monocular depth estimation. |
Chi Zhang; Wei Yin; Gang Yu; Zhibin Wang; Tao Chen; Bin Fu; Joey Tianyi Zhou; Chunhua Shen; |
532 | Self-regulating Prompts: Foundational Model Adaptation Without Forgetting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This leads to the loss of the model’s original generalization capability. To address this issue, our work introduces a self-regularization framework for prompting called PromptSRC (Prompting with Self-regulating Constraints). |
Muhammad Uzair Khattak; Syed Talal Wasim; Muzammal Naseer; Salman Khan; Ming-Hsuan Yang; Fahad Shahbaz Khan; |
533 | ASM: Adaptive Skinning Model for High-Quality 3D Face Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Adaptive Skinning Model (ASM), which redefines the skinning model with more compact and fully tunable parameters. |
Kai Yang; Hong Shang; Tianyang Shi; Xinghan Chen; Jingkai Zhou; Zhongqian Sun; Wei Yang; |
534 | EverLight: Indoor-Outdoor Editable HDR Lighting Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We leverage recent advances in GAN-based LDR panorama extrapolation from a regular image, which we extend to HDR using parametric spherical gaussians. To achieve this, we introduce a novel lighting co-modulation method that injects lighting-related features throughout the generator, tightly coupling the original or edited scene illumination within the panorama generation process. |
Mohammad Reza Karimi Dastjerdi; Jonathan Eisenmann; Yannick Hold-Geoffroy; Jean-François Lalonde; |
535 | MARS: Model-agnostic Biased Object Removal Without Additional Supervision for Weakly-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Following the first observation that biased features can be separated and eliminated by matching biased objects with backgrounds in the same dataset, we propose a fully-automatic/model-agnostic biased removal framework called MARS (Model-Agnostic biased object Removal without additional Supervision), which utilizes semantically consistent features of an unsupervised technique to eliminate biased objects in pseudo labels. |
Sanghyun Jo; In-Jae Yu; Kyungsu Kim; |
536 | CAFA: Class-Aware Feature Alignment for Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: That is, a model does not have a chance to learn test data in a class-discriminative manner, which was feasible in other adaptation tasks (e.g., unsupervised domain adaptation) via supervised losses on the source data. Based on this observation, we propose a simple yet effective feature alignment loss, termed as Class-Aware Feature Alignment (CAFA), which simultaneously 1) encourages a model to learn target representations in a class-discriminative manner and 2) effectively mitigates the distribution shifts at test time. |
Sanghun Jung; Jungsoo Lee; Nanhee Kim; Amirreza Shaban; Byron Boots; Jaegul Choo; |
537 | Learning Clothing and Pose Invariant 3D Shape Representation for Long-Term Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to extend LT-ReID beyond pedestrian recognition to include a wider range of real-world human activities while still accounting for cloth-changing scenarios over large time gaps. |
Feng Liu; Minchul Kim; ZiAng Gu; Anil Jain; Xiaoming Liu; |
538 | Agile Modeling: From Concept to Classifier in Minutes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In reaction, we introduce the problem of Agile Modeling: the process of turning any subjective visual concept into a computer vision model through real-time user-in-the-loop interactions. |
Otilia Stretcu; Edward Vendrow; Kenji Hata; Krishnamurthy Viswanathan; Vittorio Ferrari; Sasan Tavakkol; Wenlei Zhou; Aditya Avinash; Emming Luo; Neil Gordon Alldrin; MohammadHossein Bateni; Gabriel Berger; Andrew Bunner; Chun-Ta Lu; Javier Rey; Giulia DeSalvo; Ranjay Krishna; Ariel Fuxman; |
539 | Improving Lens Flare Removal with General-Purpose Pipeline and Multiple Light Sources Recovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a solution to improve the performance of lens flare removal by revisiting the ISP, remodeling the principle of automatic exposure in the synthesis pipeline, and designing a more reliable light source recovery strategy. |
Yuyan Zhou; Dong Liang; Songcan Chen; Sheng-Jun Huang; Shuo Yang; Chongyi Li; |
540 | FACET: Fairness in Computer Vision Evaluation Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new benchmark named FACET (FAirness in Computer Vision EvaluaTion), a large, publicly available evaluation set of 32k images for some of the most common vision tasks – image classification, object detection and segmentation. |
Laura Gustafson; Chloe Rolland; Nikhila Ravi; Quentin Duval; Aaron Adcock; Cheng-Yang Fu; Melissa Hall; Candace Ross; |
541 | Few-Shot Physically-Aware Articulated Mesh Generation Via Hierarchical Deformation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous mesh generative models either have difficulties in depicting a diverse data space from only a few examples or fail to ensure physical validity of their samples. Regarding the above challenges, we propose two key innovations, including 1) a hierarchical mesh deformation-based generative model based upon the divide-and-conquer philosophy to alleviate the few-shot challenge by borrowing transferrable deformation patterns from large scale rigid meshes and 2) a physics-aware deformation correction scheme to encourage physically plausible generations. |
Xueyi Liu; Bin Wang; He Wang; Li Yi; |
542 | Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects. |
Hansheng Chen; Jiatao Gu; Anpei Chen; Wei Tian; Zhuowen Tu; Lingjie Liu; Hao Su; |
543 | DCPB: Deformable Convolution Based on The Poincare Ball for Top-view Fisheye Cameras Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we demonstrate the analogy between the fisheye model and the Poincaré ball and that learning the shape of convolution kernels in the Poincaré ball can alleviate the spatial distortion problem. |
Xuan Wei; Zhidan Ran; Xiaobo Lu; |
544 | Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Joint tracking and segmentation have been attempted in some studies but they often lack full compatibility of both box and mask in initialization and prediction, and mainly focus on single-object scenarios. To address these limitations, this paper proposes a Multi-object Mask-box Integrated framework for unified Tracking and Segmentation, dubbed MITS. |
Yuanyou Xu; Zongxin Yang; Yi Yang; |
545 | One-Shot Generative Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work aims to transfer a Generative Adversarial Network (GAN) pre-trained on one image domain to another domain, referring to as few as just one reference image. |
Ceyuan Yang; Yujun Shen; Zhiyi Zhang; Yinghao Xu; Jiapeng Zhu; Zhirong Wu; Bolei Zhou; |
546 | Prototypes-oriented Transductive Few-shot Learning with Conditional Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Transductive Few-Shot Learning (TFSL) has recently attracted increasing attention since it typically outperforms its inductive peer by leveraging statistics of query samples. However, previous TFSL methods usually encode a uniform prior that all the classes within query samples are equally likely, which is biased in imbalanced TFSL and causes severe performance degradation. Given this pivotal issue, in this work, we propose a novel Conditional Transport (CT) based imbalanced TFSL model called Prototypes-oriented Unbiased Transfer Model (PUTM) to fully exploit unbiased statistics of imbalanced query samples, which employs forward and backward navigators as transport matrices to balance the prior of query samples per class between uniform and adaptive data-driven distributions. |
Long Tian; Jingyi Feng; Xiaoqiang Chai; Wenchao Chen; Liming Wang; Xiyang Liu; Bo Chen; |
547 | SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose SparseFusion, a novel multi-sensor 3D detection method that exclusively uses sparse candidates and sparse representations. |
Yichen Xie; Chenfeng Xu; Marie-Julie Rakotosaona; Patrick Rim; Federico Tombari; Kurt Keutzer; Masayoshi Tomizuka; Wei Zhan; |
548 | DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing Using Determiners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, we have developed and released the DetermiNet dataset, which comprises 250,000 synthetically generated images and captions based on 25 determiners. |
Clarence Lee; M Ganesh Kumar; Cheston Tan; |
549 | 3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose 3DMOTFormer, a learned geometry-based 3D MOT framework building upon the transformer architecture. |
Shuxiao Ding; Eike Rehder; Lukas Schneider; Marius Cordts; Juergen Gall; |
550 | ReGen: A Good Generative Zero-Shot Video Classifier Should Be Rewarded Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate base class overfitting, in this work, we propose to use reinforcement learning to enforce the output of the video captioning model to be more class-level discriminative. |
Adrian Bulat; Enrique Sanchez; Brais Martinez; Georgios Tzimiropoulos; |
551 | Complementary Domain Adaptation and Generalization for Unsupervised Continual Domain Shift Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Complementary Domain Adaptation and Generalization (CoDAG), a simple yet effective learning framework that combines domain adaptation and generalization in a complementary manner to achieve three major goals of unsupervised continual domain shift learning: adapting to a current domain, generalizing to unseen domains, and preventing forgetting of previously seen domains. |
Wonguk Cho; Jinha Park; Taesup Kim; |
552 | RICO: Regularizing The Unobservable for Indoor Compositional Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Though achieving plausible disentanglement, the performance drops significantly when processing the indoor scenes where objects are usually partially observed. We propose RICO to address this by regularizing the unobservable regions for indoor compositional reconstruction. |
Zizhang Li; Xiaoyang Lyu; Yuanyuan Ding; Mengmeng Wang; Yiyi Liao; Yong Liu; |
553 | Ordered Atomic Activity for Fine-grained Interactive Traffic Scenario Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel representation called Ordered Atomic Activity for interactive scenario understanding. |
Nakul Agarwal; Yi-Ting Chen; |
554 | CO-PILOT: Dynamic Top-Down Point Cloud with Conditional Neighborhood Aggregation for Multi-Gigapixel Histopathology Image Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we developed a novel dynamic and hierarchical point-cloud-based method (CO-PILOT) for the processing of cellular graphs extracted from routine histopathology images. |
Ramin Nakhli; Allen Zhang; Ali Mirabadi; Katherine Rich; Maryam Asadi; Blake Gilks; Hossein Farahani; Ali Bashashati; |
555 | Troubleshooting Ethnic Quality Bias with Curriculum Domain Adaptation for Face Image Quality Assessment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents the first attempt in the field of FIQA to address these challenges with a novel Ethnic-Quality-Bias Mitigating (EQBM) framework. |
Fu-Zhao Ou; Baoliang Chen; Chongyi Li; Shiqi Wang; Sam Kwong; |
556 | HybridAugment++: Unified Frequency Spectra Perturbations for Model Robustness Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: First, inspired by these observations, we propose a simple yet effective data augmentation method HybridAugment that reduces the reliance of CNNs on high-frequency components, and thus improves their robustness while keeping their clean accuracy high. Second, we propose HybridAugment++, which is a hierarchical augmentation method that attempts to unify various frequency-spectrum augmentations. |
Mehmet Kerim Yucel; Ramazan Gokberk Cinbis; Pinar Duygulu; |
557 | CLR: Channel-wise Lightweight Reprogramming for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a Channel-wise Lightweight Reprogramming (CLR) approach that helps convolutional neural networks (CNNs) overcome catastrophic forgetting during continual learning. |
Yunhao Ge; Yuecheng Li; Shuo Ni; Jiaping Zhao; Ming-Hsuan Yang; Laurent Itti; |
558 | IOMatch: Simplifying Open-Set Semi-Supervised Learning with Joint Inliers and Outliers Utilization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, we observe a surprising fact that such an approach could result in more severe performance degradation when labels are extremely scarce, as the unreliable outlier detector may wrongly exclude a considerable portion of valuable inliers. To tackle this issue, we introduce a novel open-set SSL framework, IOMatch, which can jointly utilize inliers and outliers, even when it is difficult to distinguish exactly between them. |
Zekun Li; Lei Qi; Yinghuan Shi; Yang Gao; |
559 | Hierarchical Point-based Active Learning for Semi-supervised Point Cloud Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The most recent methods of this kind measure the uncertainty of each pre-divided region for manual labelling but they suffer from redundant information and require additional efforts for region division. This paper aims at addressing this issue by developing a hierarchical point-based active learning strategy. |
Zongyi Xu; Bo Yuan; Shanshan Zhao; Qianni Zhang; Xinbo Gao; |
560 | Doppelgangers: Learning to Disambiguate Images of Similar Structures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a learning-based approach to visual disambiguation, formulating it as a binary classification task on image pairs. |
Ruojin Cai; Joseph Tung; Qianqian Wang; Hadar Averbuch-Elor; Bharath Hariharan; Noah Snavely; |
561 | BEV-DG: Cross-Modal Learning Under Bird’s-Eye View for Domain Generalization of 3D Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, UDA methods rely on access to the target domain during training, meaning the trained model only works in a specific target domain. In light of this, we propose cross-modal learning under bird’s-eye view for Domain Generalization (DG) of 3D semantic segmentation, called BEV-DG. |
Miaoyu Li; Yachao Zhang; Xu Ma; Yanyun Qu; Yun Fu; |
562 | Grounded Entity-Landmark Adaptive Pre-Training for Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, another critical problem of achieving fine-grained alignment at the entity level is seldom considered. To address this problem, we propose a novel Grounded Entity-Landmark Adaptive (GELA) pre-training paradigm for VLN tasks. |
Yibo Cui; Liang Xie; Yakun Zhang; Meishan Zhang; Ye Yan; Erwei Yin; |
563 | Lip Reading for Low-resource Languages By Learning and Combining General Speech Knowledge and Language-specific Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel lip reading framework, especially for low-resource languages, which has not been well addressed in the previous literature. |
Minsu Kim; Jeong Hun Yeo; Jeongsoo Choi; Yong Man Ro; |
564 | Quality-Agnostic Deepfake Detection with Intra-model Collaborative Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a universal intra-model collaborative learning framework to enable the effective and simultaneous detection of deepfakes of different qualities. |
Binh M. Le; Simon S. Woo; |
565 | Object-Centric Multiple Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a video object-centric model for MOT. |
Zixu Zhao; Jiaze Wang; Max Horn; Yizhuo Ding; Tong He; Zechen Bai; Dominik Zietlow; Carl-Johann Simon-Gabriel; Bing Shuai; Zhuowen Tu; Thomas Brox; Bernt Schiele; Yanwei Fu; Francesco Locatello; Zheng Zhang; Tianjun Xiao; |
566 | Point-TTA: Test-Time Adaptation for Point Cloud Registration Using Multitask Meta-Auxiliary Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Point-TTA, a novel test-time adaptation framework for point cloud registration (PCR) that improves the generalization and the performance of registration models. |
Ahmed Hatem; Yiming Qian; Yang Wang; |
567 | HopFIR: Hop-wise GraphFormer with Intragroup Joint Refinement for 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the Hop-wise GraphFormer with Intragroup Joint Refinement (HopFIR) architecture to tackle the 3D HPE problem. |
Kai Zhai; Qiang Nie; Bo Ouyang; Xiang Li; Shanlin Yang; |
568 | Improving Generalization of Adversarial Training Via Robust Critical Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes Robustness Critical Fine-Tuning (RiFT), a novel approach to enhance generalization without compromising adversarial robustness. |
Kaijie Zhu; Xixu Hu; Jindong Wang; Xing Xie; Ge Yang; |
569 | Minimal Solutions to Generalized Three-View Relative Pose Problem Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, minimal problems of three views with four points and three views with six lines have not yet been explored and solved, despite the efforts from the computer vision community. This paper develops the formulations of these two minimal problems and shows how state-of-the-art GPU implementations of the Homotopy Continuation solver can be used effectively. |
Yaqing Ding; Chiang-Heng Chien; Viktor Larsson; Karl Åström; Benjamin Kimia; |
570 | Trajectory Unified Transformer for Pedestrian Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Trajectory Unified TRansformer, called TUTR, which unifies the trajectory prediction components, social interaction and multimodal trajectory prediction, into a transformer encoder-decoder architecture to effectively remove the need for post-processing. |
Liushuai Shi; Le Wang; Sanping Zhou; Gang Hua; |
571 | Understanding The Feature Norm for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite this intriguing phenomenon being utilized in many applications, the underlying cause has not been thoroughly investigated. In this study, we demystify this very phenomenon by scrutinizing the discriminative structures concealed in the intermediate layers of a neural network. |
Jaewoo Park; Jacky Chen Long Chai; Jaeho Yoon; Andrew Beng Jin Teoh; |
572 | MHEntropy: Entropy Meets Multiple Hypotheses for Pose and Shape Recovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents a multi-hypothesis probabilistic framework by optimizing the Kullback-Leibler divergence (KLD) between the data and model distribution. |
Rongyu Chen; Linlin Yang; Angela Yao; |
573 | μSplit: Image Decomposition for Fluorescence Microscopy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present μSplit, a dedicated approach for trained image decomposition in the context of fluorescence microscopy images. |
Ashesh Ashesh; Alexander Krull; Moises Di Sante; Francesco Pasqualini; Florian Jug; |
574 | Modeling The Relative Visual Tempo for Self-supervised Skeleton-based Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we observe that relative visual tempo is more in line with human intuition, and thus providing more effective supervision signals. |
Yisheng Zhu; Hu Han; Zhengtao Yu; Guangcan Liu; |
575 | LightGlue: Local Feature Matching at Light Speed Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce LightGlue, a deep neural network that learns to match local features across images. |
Philipp Lindenberger; Paul-Edouard Sarlin; Marc Pollefeys; |
576 | Masked Autoencoders Are Efficient Class Incremental Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to use Masked Autoencoders (MAEs) as efficient learners for CIL. |
Jiang-Tian Zhai; Xialei Liu; Andrew D. Bagdanov; Ke Li; Ming-Ming Cheng; |
577 | Knowledge Proxy Intervention for Deconfounded Video Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the challenge that the confounder in VideoQA is unobserved and non-enumerable in general, we propose a model-agnostic framework called Knowledge Proxy Intervention (KPI), which introduces an extra knowledge proxy variable in the causal graph to cut the backdoor path and remove the confounder. |
Jiangtong Li; Li Niu; Liqing Zhang; |
578 | Towards Semi-supervised Learning with Non-random Missing Labels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, class transition tracking based Pseudo-Rectifying Guidance (PRG) is devised for MNAR. |
Yue Duan; Zhen Zhao; Lei Qi; Luping Zhou; Lei Wang; Yinghuan Shi; |
579 | DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We have found that the full potential of offboard 3D detectors is not explored mainly due to two reasons: (1) the onboard multi-object tracker cannot generate sufficient complete object trajectories, and (2) the motion state of objects poses an inevitable challenge for the object-centric refining stage in leveraging the long-term temporal context representation. To tackle these problems, we propose a novel paradigm of offboard 3D object detection, named DetZero. |
Tao Ma; Xuemeng Yang; Hongbin Zhou; Xin Li; Botian Shi; Junjie Liu; Yuchen Yang; Zhizheng Liu; Liang He; Yu Qiao; Yikang Li; Hongsheng Li; |
580 | ImbSAM: A Closer Look at Sharpness-Aware Minimization in Class-Imbalanced Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through investigating this specific type of task, we identify that its generalization bottleneck primarily lies in the severe overfitting for tail classes with limited training data. To overcome this bottleneck, we leverage class priors to restrict the generalization scope of the class-agnostic SAM and propose a class-aware smoothness optimization algorithm named Imbalanced-SAM (ImbSAM). |
Yixuan Zhou; Yi Qu; Xing Xu; Hengtao Shen; |
581 | Learning from Noisy Data for Semi-Supervised 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take PL from a noisy learning perspective: instead of directly applying vanilla pseudo-labels, we design a noise-resistant instance supervision module for better generalization. |
Zehui Chen; Zhenyu Li; Shuo Wang; Dengpan Fu; Feng Zhao; |
582 | NeRFrac: Neural Radiance Fields Through Refractive Surface Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce NeRFrac to realize neural novel view synthesis of scenes captured through refractive surfaces, typically water surfaces. |
Yifan Zhan; Shohei Nobuhara; Ko Nishino; Yinqiang Zheng; |
583 | MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce the first DETR framework for Monocular DEtection with a depth-guided TRansformer, named MonoDETR. |
Renrui Zhang; Han Qiu; Tai Wang; Ziyu Guo; Ziteng Cui; Yu Qiao; Hongsheng Li; Peng Gao; |
584 | Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve authentic restoration, we propose IDM, an Iteratively learned face restoration system based on denoising Diffusion Models (DDMs). |
Yang Zhao; Tingbo Hou; Yu-Chuan Su; Xuhui Jia; Yandong Li; Matthias Grundmann; |
585 | LivelySpeaker: Towards Semantic-Aware Co-Speech Gesture Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we introduce LivelySpeaker, a framework that realizes semantics-aware co-speech gesture generation and offers several control handles. |
Yihao Zhi; Xiaodong Cun; Xuelin Chen; Xi Shen; Wen Guo; Shaoli Huang; Shenghua Gao; |
586 | Contrastive Feature Masking Open-Vocabulary Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Contrastive Feature Masking Vision Transformer (CFM-ViT) – an image-text pretraining methodology that achieves simultaneous learning of image- and region level representation for open-vocabulary object detection (OVD). |
Dahun Kim; Anelia Angelova; Weicheng Kuo; |
587 | Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Group DETR, a simple yet efficient DETR training approach that introduces a group-wise way for one-to-many assignment. |
Qiang Chen; Xiaokang Chen; Jian Wang; Shan Zhang; Kun Yao; Haocheng Feng; Junyu Han; Errui Ding; Gang Zeng; Jingdong Wang; |
588 | Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In addition, replaying data of previously learned downstream tasks can enhance their performance but comes at the cost of sacrificing zero-shot performance. To address this challenge, we propose a novel method ZSCL to prevent zero-shot transfer degradation in the continual learning of vision-language models in both feature and parameter space. |
Zangwei Zheng; Mingyuan Ma; Kai Wang; Ziheng Qin; Xiangyu Yue; Yang You; |
589 | Personalized Image Generation for Color Vision Deficiency Population Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose a personalized CVD-friendly image generation algorithm with two key characteristics: (i) generating CVD-oriented images end-to-end; (ii) generating continuous personalized images for people with various CVD types and degrees through disentangling the color representation based on a triple-latent structure. |
Shuyi Jiang; Daochang Liu; Dingquan Li; Chang Xu; |
590 | EGC: Image Generation and Classification Via A Diffusion Energy-Based Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces an energy-based classifier and generator, namely EGC, which can achieve superior performance in both tasks using a single neural network. |
Qiushan Guo; Chuofan Ma; Yi Jiang; Zehuan Yuan; Yizhou Yu; Ping Luo; |
591 | OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents OccFormer, a dual-path transformer network to effectively process the 3D volume for semantic occupancy prediction. |
Yunpeng Zhang; Zheng Zhu; Dalong Du; |
592 | Probabilistic Triangulation for Uncalibrated Multi-View 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel Probabilistic Triangulation module that can be embedded in a calibrated 3D human pose estimation method, generalizing it to uncalibrated scenes. |
Boyuan Jiang; Lei Hu; Shihong Xia; |
593 | Joint Metrics Matter: A Better Standard for Trajectory Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response to the limitations of marginal metrics, we present the first comprehensive evaluation of state-of-the-art (SOTA) trajectory forecasting methods with respect to multi-agent metrics (joint metrics): JADE, JFDE, and collision rate. |
Erica Weng; Hana Hoshino; Deva Ramanan; Kris Kitani; |
594 | TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a set of simple yet effective TOken REduction (TORE) strategies for Transformer-based Human Mesh Recovery from monocular images. |
Zhiyang Dou; Qingxuan Wu; Cheng Lin; Zeyu Cao; Qiangqiang Wu; Weilin Wan; Taku Komura; Wenping Wang; |
595 | Test Time Adaptation for Blind Image Quality Assessment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce two novel quality-relevant auxiliary tasks at the batch and sample levels to enable TTA for blind IQA. |
Subhadeep Roy; Shankhanil Mitra; Soma Biswas; Rajiv Soundararajan; |
596 | GeT: Generative Target Structure Debiasing for Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose GeT that learns a non-bias target embedding distribution with high quality pseudo labels. |
Can Zhang; Gim Hee Lee; |
597 | D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we aim to reduce the annotation cost while keeping competitive performance on the TSG task compared to fully supervised methods. |
Hanjun Li; Xiujun Shu; Sunan He; Ruizhi Qiao; Wei Wen; Taian Guo; Bei Gan; Xing Sun; |
598 | GEDepth: Ground Embedding for Monocular Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although the leading algorithms in this field have reported significant improvement, they are essentially geared to the particular compound of pictorial observations and camera parameters (i.e., intrinsics and extrinsics), strongly limiting their generalizability in real-world scenarios. In order to cope with this difficulty, this paper proposes a novel ground embedding module to decouple camera parameters from pictorial cues, thus promoting the generalization capability. |
Xiaodong Yang; Zhuang Ma; Zhiyu Ji; Zhe Ren; |
599 | DETRs with Collaborative Hybrid Assignments Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide the observation that too few queries assigned as positive samples in DETR with one-to-one set matching leads to sparse supervision on the encoder’s output, which considerably hurts the discriminative feature learning of the encoder, and vice versa for attention learning in the decoder. |
Zhuofan Zong; Guanglu Song; Yu Liu; |
600 | Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Animal3D, the first comprehensive dataset for mammal animal 3D pose and shape estimation. |
Jiacong Xu; Yi Zhang; Jiawei Peng; Wufei Ma; Artur Jesslen; Pengliang Ji; Qixin Hu; Jiehua Zhang; Qihao Liu; Jiahao Wang; Wei Ji; Chen Wang; Xiaoding Yuan; Prakhar Kaushik; Guofeng Zhang; Jie Liu; Yushan Xie; Yawen Cui; Alan Yuille; Adam Kortylewski; |
601 | Rethinking Video Frame Interpolation from Shutter Mode Induced Degradation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first real-world dataset for learning and benchmarking degraded video frame interpolation, named RD-VFI, and further explore the performance differences of three types of degradations, including GS blur, RS distortion, and an in-between effect caused by the rolling shutter with global reset (RSGR), thanks to our novel quad-axis imaging system. |
Xiang Ji; Zhixiang Wang; Zhihang Zhong; Yinqiang Zheng; |
602 | Multi-Modal Neural Radiance Field for Monocular Dense SLAM with A Light-Weight ToF Sensor Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present the first dense SLAM system with a monocular camera and a light-weight ToF sensor. |
Xinyang Liu; Yijin Li; Yanbin Teng; Hujun Bao; Guofeng Zhang; Yinda Zhang; Zhaopeng Cui; |
603 | MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This explicit methodology induces sparsity in 3D representations due to the increased dimensionality from 2D to 3D, and leads to substantial information loss, especially for distant and occluded objects. To alleviate this issue, we propose MonoNeRD, a novel detection framework that can infer dense 3D geometry and occupancy. |
Junkai Xu; Liang Peng; Haoran Cheng; Hao Li; Wei Qian; Ke Li; Wenxiao Wang; Deng Cai; |
604 | Monocular 3D Object Detection with Bounding Box Denoising in 3D By Perceiver Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main challenge of monocular 3D object detection is the accurate localization of the 3D center. Motivated by a new and strong observation that this challenge can be remedied by a 3D-space local-grid search scheme in an ideal case, we propose a stage-wise approach, which combines the information flow from 2D-to-3D (3D bounding box proposal generation with a single 2D image) and 3D-to-2D (proposal verification by denoising with 3D-to-2D contexts) in a top-down manner. |
Xianpeng Liu; Ce Zheng; Kelvin B Cheng; Nan Xue; Guo-Jun Qi; Tianfu Wu; |
605 | Point-SLAM: Dense Neural Point Cloud-based SLAM Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a dense neural simultaneous localization and mapping (SLAM) approach for monocular RGBD input which anchors the features of a neural scene representation in a point cloud that is iteratively generated in an input-dependent data-driven manner. |
Erik Sandström; Yue Li; Luc Van Gool; Martin R. Oswald; |
606 | TrajectoryFormer: 3D Object Tracking Transformer with Predictive Trajectory Hypotheses Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present TrajectoryFormer, a novel point-cloud-based 3D MOT framework. |
Xuesong Chen; Shaoshuai Shi; Chao Zhang; Benjin Zhu; Qiang Wang; Ka Chun Cheung; Simon See; Hongsheng Li; |
607 | Semantic-Aware Dynamic Parameter for Video Inpainting Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they still cannot fully utilize semantic information within the video frames and predict improper scene layout, failing to restore clear object boundaries for mixed scenes. To mitigate this problem, we introduce a new transformer-based video inpainting technique that can exploit semantic information within the input and considerably improve reconstruction quality. |
Eunhye Lee; Jinsu Yoo; Yunjeong Yang; Sungyong Baik; Tae Hyun Kim; |
608 | See More and Know More: Zero-shot Point Cloud Segmentation Via Multi-modal Visual Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In fact, the rich appearance information of images is a natural complement to the textureless point cloud, which is not well explored in previous literature. Motivated by this, we propose a novel multi-modal zero-shot learning method to better utilize the complementary information of point clouds and images for more accurate visual-semantic alignment. |
Yuhang Lu; Qi Jiang; Runnan Chen; Yuenan Hou; Xinge Zhu; Yuexin Ma; |
609 | SKED: Sketch-guided Text-based 3D Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present SKED, a technique for editing 3D shapes represented by NeRFs. |
Aryan Mikaeili; Or Perel; Mehdi Safaee; Daniel Cohen-Or; Ali Mahdavi-Amiri; |
610 | WaveIPT: Joint Attention and Flow Alignment in The Wavelet Domain for Pose Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To leverage the advantages of both attention and flow simultaneously, we propose Wavelet-aware Image-based Pose Transfer (WaveIPT) to fuse the attention and flow in the wavelet domain. |
Liyuan Ma; Tingwei Gao; Haitian Jiang; Haibin Shen; Kejie Huang; |
611 | Editable Image Geometric Abstraction Via Neural Primitive Assembly Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work explores a novel image geometric abstraction paradigm based on assembly out of a pool of pre-defined simple parametric primitives (i.e., triangle, rectangle, circle and semicircle), facilitating controllable shape editing in images. |
Ye Chen; Bingbing Ni; Xuanhong Chen; Zhangli Hu; |
612 | Homeomorphism Alignment for Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel HomeomorphisM Alignment (HMA) approach characterized by aligning the source and target data in two separate spaces. |
Lihua Zhou; Mao Ye; Xiatian Zhu; Siying Xiao; Xu-Qian Fan; Ferrante Neri; |
613 | MBPTrack: Improving 3D Point Cloud Tracking with Memory Networks and Box Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite its wide-ranging use, this task remains challenging due to the significant appearance variation caused by occlusion and size differences among tracked targets. To address these issues, we present MBPTrack, which adopts a Memory mechanism to utilize past information and formulates localization in a coarse-to-fine scheme using Box Priors given in the first frame. |
Tian-Xing Xu; Yuan-Chen Guo; Yu-Kun Lai; Song-Hai Zhang; |
614 | Novel-View Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We share the inspiration from recent scene understanding work that shows a scene specific model built beforehand can significantly improve and unblock vision tasks especially when inputs are sparse, and extend it to the dynamic hand-object interaction scenario and propose to solve the problem in two stages. |
Wentian Qu; Zhaopeng Cui; Yinda Zhang; Chenyu Meng; Cuixia Ma; Xiaoming Deng; Hongan Wang; |
615 | EmoSet: A Large-scale Visual Emotion Dataset with Rich Attributes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce EmoSet, the first large-scale visual emotion dataset annotated with rich attributes, which is superior to existing datasets in four aspects: scale, annotation richness, diversity, and data balance. |
Jingyuan Yang; Qirui Huang; Tingting Ding; Dani Lischinski; Danny Cohen-Or; Hui Huang; |
616 | Distilling from Similar Tasks for Transfer Learning on A Budget Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Though effective, DistillNearest assumes a single source model matches the target task, which is not always the case. To alleviate this, we propose a weighted multi-source distillation method to distill multiple source models trained on different domains weighted by their relevance for the target task into a single efficient model (named DistillWeighted). |
Kenneth Borup; Cheng Perng Phoo; Bharath Hariharan; |
617 | Self-Supervised Burst Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Compared to weakly-paired training strategies, which require noisy smartphone burst photos of static scenes, paired with a clean reference obtained from a tripod-mounted DSLR camera, our approach is more scalable, and avoids the color mismatch between the smartphone and DSLR. To achieve this, we propose a new self-supervised objective that uses a forward imaging model to recover a high-resolution image from aliased high frequencies in the burst. |
Goutam Bhat; Michaël Gharbi; Jiawen Chen; Luc Van Gool; Zhihao Xia; |
618 | Class-relation Knowledge Distillation for Novel Class Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Empirically, we find that such class relation becomes less informative during typical discovery training. To prevent such information loss, we propose a novel knowledge distillation framework, which utilizes our class-relation representation to regularize the learning of novel classes. |
Peiyan Gu; Chuyu Zhang; Ruijie Xu; Xuming He; |
619 | PARTNER: Level Up The Polar Representation for LiDAR 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, state-of-the-art polar-based detection methods inevitably suffer from the feature distortion problem because of the non-uniform division of polar representation, resulting in a non-negligible performance gap compared to Cartesian-based approaches. To tackle this issue, we present PARTNER, a novel 3D object detector in polar coordinates. |
Ming Nie; Yujing Xue; Chunwei Wang; Chaoqiang Ye; Hang Xu; Xinge Zhu; Qingqiu Huang; Michael Bi Mi; Xinchao Wang; Li Zhang; |
620 | Data-Free Class-Incremental Hand Gesture Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose BOAT-MI — a simple and effective boundary-aware prototypical sampling mechanism for model inversion for DFCIL. |
Shubhra Aich; Jesus Ruiz-Santaquiteria; Zhenyu Lu; Prachi Garg; K J Joseph; Alvaro Fernandez Garcia; Vineeth N Balasubramanian; Kenrick Kin; Chengde Wan; Necati Cihan Camgoz; Shugao Ma; Fernando De la Torre; |
621 | Corrupting Neuron Explanations of Deep Visual Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there are growing concerns that these explainability methods are not robust and trustworthy. In this work, we perform the first robustness analysis of Neuron Explanation Methods under a unified pipeline and show that these explanations can be significantly corrupted by random noises and well-designed perturbations added to their probing data. |
Divyansh Srivastava; Tuomas Oikarinen; Tsui-Wei Weng; |
622 | PNI: Industrial Anomaly Detection Using Position and Neighborhood Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods neglect the impact of position and neighborhood information on the distribution of normal features. To overcome this, we propose a new algorithm, PNI, which estimates the normal distribution using conditional probability given neighborhood features, modeled with a multi-layer perceptron network. |
Jaehyeok Bae; Jae-Han Lee; Seyun Kim; |
623 | PC-Adapter: Topology-Aware Adapter for Efficient Domain Adaption on Point Clouds with Rectified Pseudo-label Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by our observations, we propose an adapter-guided domain adaptation method, PC-Adapter, that preserves the global shape information of the source domain using an attention-based adapter, while learning the local characteristics of the target domain via another adapter equipped with graph convolution. |
Joonhyung Park; Hyunjin Seo; Eunho Yang; |
624 | Cyclic Test-Time Adaptation on Monocular Video for 3D Human Mesh Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Second, 2D evidence is noisy or partially non-existent during test time, and such imperfect 2D evidence leads to erroneous adaptation. To overcome the above issues, we introduce CycleAdapt, which cyclically adapts two networks: a human mesh reconstruction network (HMRNet) and a human motion denoising network (MDNet), given a test video. |
Hyeongjin Nam; Daniel Sungho Jung; Yeonguk Oh; Kyoung Mu Lee; |
625 | 2D3D-MATR: 2D-3D Matching Transformer for Detection-Free Registration Between Images and Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose 2D3D-MATR, a detection-free method for accurate and robust registration between images and point clouds. |
Minhao Li; Zheng Qin; Zhirui Gao; Renjiao Yi; Chenyang Zhu; Yulan Guo; Kai Xu; |
626 | Mixed Neural Voxels for Fast Multi-view Video Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel method named MixVoxels to efficiently represent dynamic scenes which leads to fast training and rendering speed. |
Feng Wang; Sinan Tan; Xinghang Li; Zeyue Tian; Yafei Song; Huaping Liu; |
627 | Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Considering the difficulties in transferring highly structural patterns on the garments and discontinuous poses, existing methods often generate unsatisfactory results such as distorted textures and flickering artifacts. To address these issues, we propose a novel Deformable Motion Modulation (DMM) that utilizes geometric kernel offset with adaptive weight modulation to simultaneously perform feature alignment and style transfer. |
Wing-Yin Yu; Lai-Man Po; Ray C.C. Cheung; Yuzhi Zhao; Yu Xue; Kun Li; |
628 | Harvard Glaucoma Detection and Progression: A Multimodal Multitask Dataset and Generalization-Reinforced Semi-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the data scarcity issue, this paper proposes two solutions. |
Yan Luo; Min Shi; Yu Tian; Tobias Elze; Mengyu Wang; |
629 | Tracking Everything Everywhere All at Once Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new test-time optimization method for estimating dense and long-range motion from a video sequence. |
Qianqian Wang; Yen-Yu Chang; Ruojin Cai; Zhengqi Li; Bharath Hariharan; Aleksander Holynski; Noah Snavely; |
630 | Group Pose: A Simple Baseline for End-to-End Multi-Person Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the problem of end-to-end multi-person pose estimation. |
Huan Liu; Qiang Chen; Zichang Tan; Jiang-Jiang Liu; Jian Wang; Xiangbo Su; Xiaolong Li; Kun Yao; Junyu Han; Errui Ding; Yao Zhao; Jingdong Wang; |
631 | Objects Do Not Disappear: Video Object Detection By Single-Frame Object Location Anticipation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We exploit continuous smooth motion in three ways. 1) Improved accuracy by using object motion as an additional source of supervision, which we obtain by anticipating object locations from a static keyframe. 2) Improved efficiency by only doing the expensive feature computations on a small subset of all frames. Because neighboring video frames are often redundant, we only compute features for a single static keyframe and predict object locations in subsequent frames. 3) Reduced annotation cost, where we only annotate the keyframe and use smooth pseudo-motion between keyframes. |
Xin Liu; Fatemeh Karimi Nejadasl; Jan C. van Gemert; Olaf Booij; Silvia L. Pintea; |
632 | CauSSL: Causality-inspired Semi-supervised Learning for Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, despite its empirical benefits, there are still concerns in the literature about the theoretical foundation and explanation of semi-supervised segmentation. To explore this problem, this study first proposes a novel causal diagram to provide a theoretical foundation for the mainstream semi-supervised segmentation methods. Our causal diagram takes two additional intermediate variables into account, which are neglected in previous work. Drawing from this proposed causal diagram, we then introduce a causality-inspired SSL approach on top of co-training frameworks called CauSSL, to improve SSL for medical image segmentation. |
Juzheng Miao; Cheng Chen; Furui Liu; Hao Wei; Pheng-Ann Heng; |
633 | ChartReader: A Unified Framework for Chart Derendering and Comprehension Without Heuristic Rules Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing chart comprehension methods suffer from either heuristic rules or an over-reliance on OCR systems, resulting in suboptimal performance. To address these issues, we present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks. |
Zhi-Qi Cheng; Qi Dai; Alexander G. Hauptmann; |
634 | Learning from Semantic Alignment Between Unpaired Multiviews for Egocentric Video Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We are concerned with a challenging scenario in unpaired multiview video learning. In this case, the model aims to learn comprehensive multiview representations while the cross-view semantic information exhibits variations. |
Qitong Wang; Long Zhao; Liangzhe Yuan; Ting Liu; Xi Peng; |
635 | Neural LiDAR Fields for Novel View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Neural Fields for LiDAR (NFL), a method to optimise a neural field scene representation from LiDAR measurements, with the goal of synthesizing realistic LiDAR scans from novel viewpoints. |
Shengyu Huang; Zan Gojcic; Zian Wang; Francis Williams; Yoni Kasten; Sanja Fidler; Konrad Schindler; Or Litany; |
636 | Source-free Depth for Object Pop-out Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Fortunately, though, modern learning-based methods offer promising depth maps by inference in the wild. In this work, we adapt such depth inference models for object segmentation using the objects’ "pop-out" prior in 3D. |
Zongwei Wu; Danda Pani Paudel; Deng-Ping Fan; Jingjing Wang; Shuo Wang; Cédric Demonceaux; Radu Timofte; Luc Van Gool; |
637 | Token-Label Alignment for Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The training target computed by the original data mixing strategy can thus be inaccurate, resulting in less effective training. To address this, we propose a token-label alignment (TL-Align) method to trace the correspondence between transformed tokens and the original tokens to maintain a label for each token. |
Han Xiao; Wenzhao Zheng; Zheng Zhu; Jie Zhou; Jiwen Lu; |
638 | Understanding 3D Object Interaction from A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we would like to endow machines with a similar ability, so that intelligent agents can better explore the 3D scene or manipulate objects. |
Shengyi Qian; David F. Fouhey; |
639 | SkeleTR: Towards Skeleton-based Action Recognition in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SkeleTR, a new framework for skeleton-based action recognition. |
Haodong Duan; Mingze Xu; Bing Shuai; Davide Modolo; Zhuowen Tu; Joseph Tighe; Alessandro Bergamo; |
640 | Learning Gabor Texture Features for Fine-Grained Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenge, we propose a novel texture branch as complementary to the CNN branch for feature extraction. |
Lanyun Zhu; Tianrun Chen; Jianxiong Yin; Simon See; Jun Liu; |
641 | Weakly-Supervised Action Localization By Hierarchically-Structured Latent Attention Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Generally, they locate temporal regions by the video-level classification but overlook the temporal variations of feature semantics. To address this problem, we propose a novel attention-based hierarchically-structured latent model to learn the temporal variations of feature semantics. |
Guiqin Wang; Peng Zhao; Cong Zhao; Shusen Yang; Jie Cheng; Luziwei Leng; Jianxing Liao; Qinghai Guo; |
642 | Get3DHuman: Lifting StyleGAN-Human Into A 3D Generative Model Using Pixel-Aligned Reconstruction Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Get3DHuman, a novel 3D human framework that can significantly boost the realism and diversity of the generated outcomes by only using a limited budget of 3D ground-truth data. |
Zhangyang Xiong; Di Kang; Derong Jin; Weikai Chen; Linchao Bao; Shuguang Cui; Xiaoguang Han; |
643 | Query6DoF: Learning Sparse Queries As Implicit Shape Prior for Category-Level 6DoF Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel 6D pose estimation network, named Query6DoF, based on a series of category-specific sparse queries that represent the prior shape. |
Ruiqi Wang; Xinggang Wang; Te Li; Rong Yang; Minhong Wan; Wenyu Liu; |
644 | Towards High-Quality Specular Highlight Removal By Leveraging Large-Scale Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to remove specular highlights from a single object-level image. |
Gang Fu; Qing Zhang; Lei Zhu; Chunxia Xiao; Ping Li; |
645 | An Embarrassingly Simple Backdoor Attack on Self-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the extent to which this robustness superiority generalizes to other types of attacks remains an open question. We explore this question in the context of backdoor attacks. |
Changjiang Li; Ren Pang; Zhaohan Xi; Tianyu Du; Shouling Ji; Yuan Yao; Ting Wang; |
646 | Cross-Modal Translation and Alignment for Survival Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The latter would discard pathological information irrelevant to gene expression. To address these issues, we present a Cross-Modal Translation and Alignment (CMTA) framework to explore the intrinsic cross-modal correlations and transfer potential complementary information. |
Fengtao Zhou; Hao Chen; |
647 | Chaotic World: A Large and Challenging Benchmark for Human Behavior Understanding in Chaotic Events Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we create the first large and challenging multi-modal dataset, Chaotic World, that simultaneously provides different levels of fine-grained and dense spatio-temporal annotations of sounds, individual actions and group interaction graphs, and even text descriptions for each scene in each video, thereby enabling a thorough analysis of complicated behaviors in crowds and chaos. |
Kian Eng Ong; Xun Long Ng; Yanchao Li; Wenjie Ai; Kuangyi Zhao; Si Yong Yeo; Jun Liu; |
648 | Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a multi-modal auto labeling pipeline capable of generating amodal 3D bounding boxes and tracklets for training models on open-set categories without 3D human labels. |
Mahyar Najibi; Jingwei Ji; Yin Zhou; Charles R. Qi; Xinchen Yan; Scott Ettinger; Dragomir Anguelov; |
649 | Towards Grand Unified Representation Learning for Unsupervised Visible-Infrared Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they ignore the fact that USL-VI-ReID is a cross-modality retrieval task with the hierarchical discrepancy, i.e., camera variation and modality discrepancy, resulting in clustering inconsistencies and ambiguous cross-modality label association. To address these issues, we propose a hierarchical framework to learn grand unified representation (GUR) for USL-VI-ReID. |
Bin Yang; Jun Chen; Mang Ye; |
650 | Active Stereo Without Pattern Projector Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel framework integrating the principles of active stereo in standard passive camera systems without a physical pattern projector. |
Luca Bartolomei; Matteo Poggi; Fabio Tosi; Andrea Conti; Stefano Mattoccia; |
651 | Partition Speeds Up Learning Implicit Neural Representations Based on Exponential-Increase Hypothesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we empirically find that if a neural network is forced to fit a discontinuous piecewise function to reach a fixed small error, the time cost increases exponentially with the number of boundaries in the spatial domain of the target signal. |
Ke Liu; Feng Liu; Haishuai Wang; Ning Ma; Jiajun Bu; Bo Han; |
652 | Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, current methods with a fixed trained model do not work uniformly well across various datasets, greatly limiting their real-world applicability. To tackle this issue, this paper proposes a new perspective to dynamically calculate correlation for robust stereo matching. |
Junpeng Jing; Jiankun Li; Pengfei Xiong; Jiangyu Liu; Shuaicheng Liu; Yichen Guo; Xin Deng; Mai Xu; Lai Jiang; Leonid Sigal; |
653 | ReFit: Recurrent Fitting Network for 3D Human Recovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Recurrent Fitting (ReFit), a neural network architecture for single-image, parametric 3D human reconstruction. |
Yufu Wang; Kostas Daniilidis; |
654 | Towards Instance-adaptive Inference for Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel FL algorithm, i.e., FedIns, to handle intra-client data heterogeneity by enabling instance-adaptive inference in the FL framework. |
Chun-Mei Feng; Kai Yu; Nian Liu; Xinxing Xu; Salman Khan; Wangmeng Zuo; |
655 | CGBA: Curvature-aware Geometric Black-box Attack Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel query-efficient curvature-aware geometric decision-based black-box attack (CGBA) that conducts boundary search along a semicircular path on a restricted 2D plane to ensure finding a boundary point successfully irrespective of the boundary curvature. |
Md Farhamdur Reza; Ali Rahmati; Tianfu Wu; Huaiyu Dai; |
656 | Unsupervised Facial Performance Editing Via Vector-Quantized StyleGAN Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a novel face editing framework that combines a 3D face model with StyleGAN vector-quantization to learn multi-level semantic facial control. |
Berkay Kicanaoglu; Pablo Garrido; Gaurav Bharaj; |
657 | Online Clustered Codebook Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a simple alternative method for online codebook learning, Clustering VQ-VAE (CVQ-VAE). |
Chuanxia Zheng; Andrea Vedaldi; |
658 | A Multidimensional Analysis of Social Biases in Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The embedding spaces of image models have been shown to encode a range of social biases such as racism and sexism. Here, we investigate specific factors that contribute to the emergence of these biases in Vision Transformers (ViT). |
Jannik Brinkmann; Paul Swoboda; Christian Bartelt; |
659 | PGFed: Personalize Each Client’s Global Objective for Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We observed that this implicit knowledge transfer fails to maximize the potential of each client’s empirical risk toward other clients. Based on our observation, in this work, we propose Personalized Global Federated Learning (PGFed), a novel personalized FL framework that enables each client to personalize its own global objective by explicitly and adaptively aggregating the empirical risks of itself and other clients. |
Jun Luo; Matias Mendieta; Chen Chen; Shandong Wu; |
660 | Verbs in Action: Improving Verb Understanding in Video-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we improve verb understanding for CLIP-based video-language models by proposing a new Verb-Focused Contrastive (VFC) framework. |
Liliane Momeni; Mathilde Caron; Arsha Nagrani; Andrew Zisserman; Cordelia Schmid; |
661 | Zero-Shot Point Cloud Segmentation By Semantic-Visual Aware Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a feature synthesis approach for zero-shot semantic segmentation of 3D point clouds, enabling generalization to previously unseen categories. |
Yuwei Yang; Munawar Hayat; Zhao Jin; Hongyuan Zhu; Yinjie Lei; |
662 | Exploring Predicate Visual Context in Detecting of Human-Object Interactions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This naturally hinders the recognition of complex or ambiguous interactions. In this work, we study these issues through visualisations and carefully designed experiments. |
Frederic Z Zhang; Yuhui Yuan; Dylan Campbell; Zhuoyao Zhong; Stephen Gould; |
663 | Robo3D: Towards Robust and Reliable 3D Perception Against Corruptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To pursue better robustness, we propose a density-insensitive training framework along with a simple flexible voxelization strategy to enhance the model resiliency. |
Lingdong Kong; Youquan Liu; Xin Li; Runnan Chen; Wenwei Zhang; Jiawei Ren; Liang Pan; Kai Chen; Ziwei Liu; |
664 | Towards Saner Deep Image Registration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We find that most existing registration methods suffer from low inverse consistency and nondiscrimination of identical pairs due to overly optimized image similarities. To rectify these behaviors, we propose a novel regularization-based sanity-enforcer method that imposes two sanity checks on the deep model to reduce its inverse consistency errors and increase its discriminative power simultaneously. |
Bin Duan; Ming Zhong; Yan Yan; |
665 | Instance and Category Supervision Are Alternate Learners for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we reformulate SSL from the information-theoretic perspective by disentangling the goal of instance-level discrimination, and tackle the trade-off to promote compact representations with maximally preserved invariance to distortion. |
Xudong Tian; Zhizhong Zhang; Xin Tan; Jun Liu; Chengjie Wang; Yanyun Qu; Guannan Jiang; Yuan Xie; |
666 | Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on a particular setting of learning adaptive prompts on the fly for each test sample from an unseen new domain, which is known as test-time prompt tuning (TPT). |
Chun-Mei Feng; Kai Yu; Yong Liu; Salman Khan; Wangmeng Zuo; |
667 | Interaction-aware Joint Attention Estimation Using People Attributes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes joint attention estimation in a single image. |
Chihiro Nakatani; Hiroaki Kawashima; Norimichi Ukita; |
668 | GePSAn: Generative Procedure Step Anticipation in Cooking Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of future step anticipation in procedural videos. |
Mohamed A. Abdelsalam; Samrudhdhi B. Rangrej; Isma Hadji; Nikita Dvornik; Konstantinos G. Derpanis; Afsaneh Fazly; |
669 | Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge this research gap, we comprehensively study the class imbalance problem for SSOD under more challenging scenarios, thus forming the first experimental setting for class imbalanced SSOD (CI-SSOD). |
Jiaming Li; Xiangru Lin; Wei Zhang; Xiao Tan; Yingying Li; Junyu Han; Errui Ding; Jingdong Wang; Guanbin Li; |
670 | SLCA: Slow Learner with Classifier Alignment for Continual Learning on A Pre-trained Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present an extensive analysis for continual learning on a pre-trained model (CLPM), and attribute the key challenge to a progressive overfitting problem. |
Gengwei Zhang; Liyuan Wang; Guoliang Kang; Ling Chen; Yunchao Wei; |
671 | Implicit Temporal Modeling with Learnable Alignment for Video Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, in this paper, we propose a novel Implicit Learnable Alignment (ILA) method, which minimizes the temporal modeling effort while achieving high performance. |
Shuyuan Tu; Qi Dai; Zuxuan Wu; Zhi-Qi Cheng; Han Hu; Yu-Gang Jiang; |
672 | Non-Coaxial Event-Guided Motion Deblurring with Spatial Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For the application of the event camera, we propose the first Non-coaxial Event-guided Image Deblurring (NEID) approach that utilizes the camera setup composed of a standard frame-based camera with a non-coaxial single event camera. |
Hoonhee Cho; Yuhwan Jeong; Taewoo Kim; Kuk-Jin Yoon; |
673 | Fingerprinting Deep Image Restoration Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a fingerprinting framework for DNN models of image restoration. |
Yuhui Quan; Huan Teng; Ruotao Xu; Jun Huang; Hui Ji; |
674 | AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that such a uniform assumption is not optimal in practice, since different models can have different optimal time steps. Therefore, we propose to search for the optimal time-step sequence and compressed model architecture in a unified framework to achieve effective image generation for diffusion models without any further training. |
Lijiang Li; Huixia Li; Xiawu Zheng; Jie Wu; Xuefeng Xiao; Rui Wang; Min Zheng; Xin Pan; Fei Chao; Rongrong Ji; |
675 | SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a new large-scale multi-object tracking dataset in multiple sports scenes, coined as SportsMOT, where all players on the court are supposed to be tracked. |
Yutao Cui; Chenkai Zeng; Xiaoyu Zhao; Yichun Yang; Gangshan Wu; Limin Wang; |
676 | Localizing Moments in Long Video Via Multimodal Guidance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a method for improving the performance of natural language grounding in long videos by identifying and pruning out non-describable windows. |
Wayner Barrios; Mattia Soldan; Alberto Mario Ceballos-Arroyo; Fabian Caba Heilbron; Bernard Ghanem; |
677 | Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present PARQ – a multi-view 3D object detector with transformer and pixel-aligned recurrent queries. |
Yiming Xie; Huaizu Jiang; Georgia Gkioxari; Julian Straub; |
678 | Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the problem of universal image embedding, where a single universal model is trained and used in multiple domains. |
Nikolaos-Antonios Ypsilantis; Kaifeng Chen; Bingyi Cao; Mário Lipovský; Pelin Dogan-Schönberger; Grzegorz Makosa; Boris Bluntschli; Mojtaba Seyedhosseini; Ondřej Chum; André Araujo; |
679 | SemARFlow: Injecting Semantics Into Unsupervised Optical Flow Estimation for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce SemARFlow, an unsupervised optical flow network designed for autonomous driving data that takes estimated semantic segmentation masks as additional inputs. |
Shuai Yuan; Shuzhi Yu; Hannah Kim; Carlo Tomasi; |
680 | TiDAL: Learning Training Dynamics for Active Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first provide theoretical and empirical evidence to argue the usefulness of utilizing the ever-changing model behavior rather than the fully trained model snapshot. We then propose a novel AL method, Training Dynamics for Active Learning (TiDAL), which efficiently predicts the training dynamics of unlabeled data to estimate their uncertainty. |
Seong Min Kye; Kwanghee Choi; Hyeongmin Byun; Buru Chang; |
681 | Uncertainty-aware Unsupervised Multi-Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a tracklet-guided augmentation strategy to simulate the tracklet’s motion, which adopts a hierarchical uncertainty-based sampling mechanism for hard sample mining. |
Kai Liu; Sheng Jin; Zhihang Fu; Ze Chen; Rongxin Jiang; Jieping Ye; |
682 | DPS-Net: Deep Polarimetric Stereo Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel neural network, i.e., DPS-Net, to exploit both the prior geometric knowledge and polarimetric information for depth estimation with two polarimetric stereo images. |
Chaoran Tian; Weihong Pan; Zimo Wang; Mao Mao; Guofeng Zhang; Hujun Bao; Ping Tan; Zhaopeng Cui; |
683 | Designing Phase Masks for Under-Display Cameras Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we incorporate phase masks on display panels to tackle both challenges. |
Anqi Yang; Eunhee Kang; Hyong-Euk Lee; Aswin C. Sankaranarayanan; |
684 | Can Language Models Learn to Listen? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker’s words. |
Evonne Ng; Sanjay Subramanian; Dan Klein; Angjoo Kanazawa; Trevor Darrell; Shiry Ginosar; |
685 | SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we identify that the poor INT8 latency is due to the quantization-unfriendly issue: the operator and configuration (e.g., channel width) choices in prior art search spaces lead to diverse quantization efficiency and can slow down the INT8 inference speed. To address this challenge, we propose SpaceEvo, an automatic method for designing a dedicated, quantization-friendly search space for each target hardware. |
Xudong Wang; Li Lyna Zhang; Jiahang Xu; Quanlu Zhang; Yujing Wang; Yuqing Yang; Ningxin Zheng; Ting Cao; Mao Yang; |
686 | How Far Pre-trained Models Are from Neural Collapse on The Target Dataset Informs Their Transferability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the neural collapse (NC) that reveals the feature geometry at the terminal stage of training, our method considers the model transferability as how far the target activations obtained by pre-trained models are from their hypothetical state in the terminal phase of the fine-tuned model. We propose a metric that computes this proximity based on three phenomena of NC: the collapse of within-class variability, the formation of a simplex encoded-label interpolation geometric structure, and the optimality of the nearest-center classifier on training data. |
Zijian Wang; Yadan Luo; Liang Zheng; Zi Huang; Mahsa Baktashmotlagh; |
687 | SurfsUP: Learning Fluid Simulation for Novel Surfaces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce SurfsUP, a framework that represents objects implicitly using signed distance functions (SDFs), rather than an explicit representation of meshes or particles. |
Arjun Mani; Ishaan Preetam Chandratreya; Elliot Creager; Carl Vondrick; Richard Zemel; |
688 | Convolutional Networks with Oriented 1D Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we ask an intriguing question: can we make a ConvNet work without 2D convolutions? |
Alexandre Kirchmeyer; Jia Deng; |
689 | Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-Trained Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we design a new type of tuning method, termed as regularized mask tuning, which masks the network parameters through a learnable selection. |
Kecheng Zheng; Wei Wu; Ruili Feng; Kai Zhu; Jiawei Liu; Deli Zhao; Zheng-Jun Zha; Wei Chen; Yujun Shen; |
690 | Skill Transformer: A Monolithic Policy for Mobile Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Skill Transformer, an approach for solving long-horizon robotic tasks by combining conditional sequence modeling and skill modularity. |
Xiaoyu Huang; Dhruv Batra; Akshara Rai; Andrew Szot; |
691 | Adaptive and Background-Aware Vision Transformer for Real-Time UAV Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient ViT-based tracking framework, Aba-ViTrack, for UAV tracking. |
Shuiwang Li; Yangxiang Yang; Dan Zeng; Xucheng Wang; |
692 | Improving Pixel-based MIM By Reducing Wasted Modeling Capability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The former offers a simpler pipeline and lower computational cost, but it is known to be biased toward high-frequency details. In this paper, we provide a set of empirical studies to confirm this limitation of pixel-based MIM and propose a new method that explicitly utilizes low-level features from shallow layers to aid pixel reconstruction. |
Yuan Liu; Songyang Zhang; Jiacheng Chen; Zhaohui Yu; Kai Chen; Dahua Lin; |
693 | Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency compared with BPTT. |
Qingyan Meng; Mingqing Xiao; Shen Yan; Yisen Wang; Zhouchen Lin; Zhi-Quan Luo; |
694 | Persistent-Transient Duality: A Multi-Mechanism Approach for Modeling Human-Object Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge that gap, this work proposes to model two concurrent mechanisms that jointly control human motion: the Persistent process that runs continually on the global scale, and the Transient sub-processes that operate intermittently on the local context of the human while interacting with objects. |
Hung Tran; Vuong Le; Svetha Venkatesh; Truyen Tran; |
695 | When to Learn What: Model-Adaptive Data Augmentation Curriculum Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose "Model-Adaptive Data Augmentation (MADAug)" that jointly trains an augmentation policy network to teach the model "when to learn what". |
Chengkai Hou; Jieyu Zhang; Tianyi Zhou; |
696 | DiffPose: Multi-hypothesis Human Pose Estimation Using Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose DiffPose, a conditional diffusion model that predicts multiple hypotheses for a given input image. |
Karl Holmquist; Bastian Wandt; |
697 | AesPA-Net: Aesthetic Pattern-Aware Style Transfer Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel metric, namely pattern repeatability, that quantifies the repetition of patterns in the style image. |
Kibeom Hong; Seogkyu Jeon; Junsoo Lee; Namhyuk Ahn; Kunhee Kim; Pilhyeon Lee; Daesik Kim; Youngjung Uh; Hyeran Byun; |
698 | COPILOT: Human-Environment Collision Prediction and Localization from Egocentric Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the challenging problem of predicting collisions in diverse environments from multi-view egocentric videos captured from body-mounted cameras. |
Boxiao Pan; Bokui Shen; Davis Rempe; Despoina Paschalidou; Kaichun Mo; Yanchao Yang; Leonidas J. Guibas; |
699 | EGformer: Equirectangular Geometry-biased Transformer for 360 Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an equirectangular geometry-biased transformer termed EGformer. |
Ilwi Yun; Chanyong Shin; Hyunku Lee; Hyuk-Jae Lee; Chae Eun Rhee; |
700 | Size Does Matter: Size-aware Virtual Try-on Via Clothing-oriented Transformation Try-on Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, there is a critical unaddressed challenge of adjusting clothing sizes for try-on. To tackle these issues, we propose a Clothing-Oriented Transformation Try-On Network (COTTON). |
Chieh-Yun Chen; Yi-Chung Chen; Hong-Han Shuai; Wen-Huang Cheng; |
701 | Generating Realistic Images from In-the-wild Sounds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach to generate images from in-the-wild sounds. |
Taegyeong Lee; Jeonghun Kang; Hyeonyu Kim; Taehwan Kim; |
702 | DDS2M: Self-Supervised Denoising Diffusion Spatio-Spectral Model for Hyperspectral Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is especially inappropriate for data-starved hyperspectral image (HSI) restoration. To tackle this problem, this work puts forth a self-supervised diffusion model for HSI restoration, namely Denoising Diffusion Spatio-Spectral Model (DDS2M), which works by inferring the parameters of the proposed Variational Spatio-Spectral Module (VS2M) during the reverse diffusion process, solely using the degraded HSI without any extra training data. |
Yuchun Miao; Lefei Zhang; Liangpei Zhang; Dacheng Tao; |
703 | Candidate-aware Selective Disambiguation Based On Normalized Entropy for Instance-dependent Partial-label Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we start with an empirical study of the dynamics of label disambiguation in both II-PLL and ID-PLL. |
Shuo He; Guowu Yang; Lei Feng; |
704 | Open-vocabulary Video Question Answering: A New Benchmark for Evaluating The Generalizability of Video Question Answering Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We hence propose a new benchmark, Open-vocabulary Video Question Answering (OVQA), to measure the generalizability of VideoQA models by considering rare and unseen answers. |
Dohwan Ko; Ji Soo Lee; Miso Choi; Jaewon Chu; Jihwan Park; Hyunwoo J. Kim; |
705 | Using A Waffle Iron for Automotive Point Cloud Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an alternative method that reaches the level of state-of-the-art methods without requiring sparse convolutions. |
Gilles Puy; Alexandre Boulch; Renaud Marlet; |
706 | AutoReP: Automatic ReLU Replacement for Fast Private Network Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Many attempts to reduce ReLU operations exist, but they may need heuristic threshold selection or cause substantial accuracy loss. This work introduces AutoReP, a gradient-based approach to lessen non-linear operators and alleviate these issues. |
Hongwu Peng; Shaoyi Huang; Tong Zhou; Yukui Luo; Chenghong Wang; Zigeng Wang; Jiahui Zhao; Xi Xie; Ang Li; Tony Geng; Kaleel Mahmood; Wujie Wen; Xiaolin Xu; Caiwen Ding; |
707 | MotionLM: Multi-Agent Motion Forecasting As Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we leverage a single standard language modeling objective, maximizing the average log probability over sequence tokens. |
Ari Seff; Brian Cera; Dian Chen; Mason Ng; Aurick Zhou; Nigamaa Nayakanti; Khaled S. Refaat; Rami Al-Rfou; Benjamin Sapp; |
708 | Black Box Few-Shot Adaptation for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these shortcomings, in this work, we describe a black-box method for V-L few-shot adaptation that (a) operates on pre-computed image and text features and hence works without access to the model’s weights, (b) is orders of magnitude faster at training time, (c) is amenable to both supervised and unsupervised training, and (d) can even be used to align image and text features computed from uni-modal models. To achieve this, we propose Linear Feature Alignment (LFA), a simple linear approach for V-L re-alignment in the target domain. |
Yassine Ouali; Adrian Bulat; Brais Martinez; Georgios Tzimiropoulos; |
709 | Center-Based Decoupled Point-cloud Registration for 6D Object Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel center-based decoupled point cloud registration framework for robust 6D object pose estimation in real-world scenarios. |
Haobo Jiang; Zheng Dang; Shuo Gu; Jin Xie; Mathieu Salzmann; Jian Yang; |
710 | Self-Ordering Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the task of finding representative subsets of points in a 3D point cloud by means of a point-wise ordering. |
Pengwan Yang; Cees G. M. Snoek; Yuki M. Asano; |
711 | Continual Segment: Towards A Single, Unified and Non-forgetting Continual Segmentation Model of 143 Whole-body Organs in CT Scans Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new architectural CSS learning framework to learn a single deep segmentation model for segmenting a total of 143 whole-body organs. |
Zhanghexuan Ji; Dazhou Guo; Puyang Wang; Ke Yan; Le Lu; Minfeng Xu; Qifeng Wang; Jia Ge; Mingchen Gao; Xianghua Ye; Dakai Jin; |
712 | Enhancing Modality-Agnostic Representations Via Meta-Learning for Brain Tumor Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel approach to learn enhanced modality-agnostic representations by employing a meta-learning strategy in training, even when only limited full modality samples are available. |
Aishik Konwer; Xiaoling Hu; Joseph Bae; Xuan Xu; Chao Chen; Prateek Prasanna; |
713 | Zero-1-to-3: Zero-shot One Image to 3D Object Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image. |
Ruoshi Liu; Rundi Wu; Basile Van Hoorick; Pavel Tokmakov; Sergey Zakharov; Carl Vondrick; |
714 | 3D Distillation: Improving Self-Supervised Monocular Depth Estimation on Reflective Surfaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, reflective surfaces can be accurately reconstructed by aggregating the predicted depth of these views. Motivated by this observation, we propose 3D distillation: a novel training framework that utilizes the projected depth of reconstructed reflective surfaces to generate reasonably accurate depth pseudo-labels. |
Xuepeng Shi; Georgi Dikov; Gerhard Reitmayr; Tae-Kyun Kim; Mohsen Ghafoorian; |
715 | GAIT: Generating Aesthetic Indoor Tours with Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce GAIT, a framework for training a Deep Reinforcement Learning (DRL) agent that learns to automatically control a camera to generate a sequence of aesthetically meaningful views for synthetic 3D indoor scenes. |
Desai Xie; Ping Hu; Xin Sun; Soren Pirk; Jianming Zhang; Radomir Mech; Arie E. Kaufman; |
716 | Low-Light Image Enhancement with Multi-Stage Residue Quantization and Brightness-Aware Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a brightness-aware network with normal-light priors based on brightness-aware attention and residual quantized codebook. |
Yunlong Liu; Tao Huang; Weisheng Dong; Fangfang Wu; Xin Li; Guangming Shi; |
717 | Hierarchically Decomposed Graph Convolutional Networks for Skeleton-Based Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Generating adjacency matrices with semantically meaningful edges is particularly important for this task, but extracting such edges is a challenging problem. To solve this, we propose a hierarchically decomposed graph convolutional network (HD-GCN) architecture with a novel hierarchically decomposed graph (HD-Graph). |
Jungho Lee; Minhyeok Lee; Dogyoon Lee; Sangyoun Lee; |
718 | LIST: Learning Implicitly from Spatial Transformers for Single-View 3D Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing explicit/implicit solutions to this problem struggle to recover self-occluded geometry and/or faithfully reconstruct topological shape structures. To resolve this dilemma, we introduce LIST, a novel neural architecture that leverages local and global image features to accurately reconstruct the geometric and topological structure of a 3D object from a single image. |
Mohammad Samiul Arshad; William J. Beksi; |
719 | Rethinking Mobile Block for Efficient Attention-based Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Following a simple but effective design criterion, we deduce a modern Inverted Residual Mobile Block (iRMB) and build a ResNet-like Efficient MOdel (EMO) with only iRMBs for downstream tasks. |
Jiangning Zhang; Xiangtai Li; Jian Li; Liang Liu; Zhucun Xue; Boshen Zhang; Zhengkai Jiang; Tianxin Huang; Yabiao Wang; Chengjie Wang; |
720 | REAP: A Large-Scale Realistic Adversarial Patch Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the REAP (REalistic Adversarial Patch) benchmark, a digital benchmark that enables evaluation on real images under real-world conditions. |
Nabeel Hingun; Chawin Sitawarin; Jerry Li; David Wagner; |
721 | LRRU: Long-short Range Recurrent Updating Networks for Depth Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To accomplish depth completion more efficiently, we propose a novel lightweight deep network framework, the Long-short Range Recurrent Updating (LRRU) network. |
Yufei Wang; Bo Li; Ge Zhang; Qi Liu; Tao Gao; Yuchao Dai; |
722 | MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a robust framework, called MetaBEV, to address extreme real-world environments, involving overall six sensor corruptions and two extreme sensor-missing situations. |
Chongjian Ge; Junsong Chen; Enze Xie; Zhongdao Wang; Lanqing Hong; Huchuan Lu; Zhenguo Li; Ping Luo; |
723 | DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-Centric Rendering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present DNA-Rendering, a large-scale, high-fidelity repository of human performance data for neural actor rendering. |
Wei Cheng; Ruixiang Chen; Siming Fan; Wanqi Yin; Keyu Chen; Zhongang Cai; Jingbo Wang; Yang Gao; Zhengming Yu; Zhengyu Lin; Daxuan Ren; Lei Yang; Ziwei Liu; Chen Change Loy; Chen Qian; Wayne Wu; Dahua Lin; Bo Dai; Kwan-Yee Lin; |
724 | Exploring Temporal Concurrency for Video-Language Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to learn video-language representations by modeling video-language pairs as Temporal Concurrent Processes (TCP) via a process-wise distance metric learning framework. |
Heng Zhang; Daqing Liu; Zezhong Lv; Bing Su; Dacheng Tao; |
725 | StegaNeRF: Embedding Invisible Information Within Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce StegaNeRF, an innovative approach for steganographic information embedding within NeRF renderings. |
Chenxin Li; Brandon Y. Feng; Zhiwen Fan; Panwang Pan; Zhangyang Wang; |
726 | DynamicISP: Dynamically Controlled Image Signal Processor for Image Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The latter has expressive power, but the computational cost is too heavy on edge devices. To solve these problems, we propose "DynamicISP," which consists of multiple classical ISP functions and dynamically controls the parameters of each frame according to the recognition result of the previous frame. |
Masakazu Yoshimura; Junji Otsuka; Atsushi Irie; Takeshi Ohashi; |
727 | R-Pred: Two-Stage Motion Prediction Via Tube-Query Attention-Based Trajectory Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a two-stage motion prediction method, called R-Pred, designed to effectively utilize both scene and interaction context using a cascade of the initial trajectory proposal and trajectory refinement networks. |
Sehwan Choi; Jungho Kim; Junyong Yun; Jun Won Choi; |
728 | A Step Towards Understanding Why Classification Helps Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A number of computer vision deep regression approaches report improved results when adding a classification loss to the regression loss. Here, we explore why this is useful in practice and when it is beneficial. |
Silvia L. Pintea; Yancong Lin; Jouke Dijkstra; Jan C. van Gemert; |
729 | Robust Evaluation of Diffusion-Based Adversarial Purification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We analyze the current practices and provide a new guideline for measuring the robustness of purification methods against adversarial attacks. |
Minjong Lee; Dongwoo Kim; |
730 | Hyperbolic Audio-visual Zero-shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The proposed approach employs a novel loss function that incorporates cross-modality alignment between video and audio features in the hyperbolic space. |
Jie Hong; Zeeshan Hayder; Junlin Han; Pengfei Fang; Mehrtash Harandi; Lars Petersson; |
731 | CTP: Towards Vision-Language Continual Pretraining Via Compatible Momentum Contrast and Topology Preservation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We comprehensively study the characteristics and challenges of VLCP, and propose a new algorithm: Compatible momentum contrast with Topology Preservation, dubbed CTP. |
Hongguang Zhu; Yunchao Wei; Xiaodan Liang; Chunjie Zhang; Yao Zhao; |
732 | Aggregating Feature Point Cloud for Depth Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to the limited receptive field of conventional convolution, the generalizability with respect to different sparsity levels of input depth maps is impeded. To tackle these problems, we propose a feature point cloud aggregation framework to directly propagate 3D depth information between the given points and the missing ones. |
Zhu Yu; Zehua Sheng; Zili Zhou; Lun Luo; Si-Yuan Cao; Hong Gu; Huaqi Zhang; Hui-Liang Shen; |
733 | FLIP: Cross-domain Face Anti-spoofing with Language Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we show that aligning the image representation with an ensemble of class descriptions (based on natural language semantics) improves FAS generalizability in low-data regimes. |
Koushik Srivatsan; Muzammal Naseer; Karthik Nandakumar; |
734 | Distribution Shift Matters for Knowledge Distillation with Webly Collected Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most of them have ignored the common distribution shift between the instances from original training data and webly collected data, affecting the reliability of the trained student network. To solve this problem, we propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD^3), which consists of three components. |
Jialiang Tang; Shuo Chen; Gang Niu; Masashi Sugiyama; Chen Gong; |
735 | Reconstructed Convolution Module Based Look-Up Tables for Efficient Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To enlarge the receptive field (RF) while keeping LUT sizes contained, we propose a novel Reconstructed Convolution (RC) module, which decouples channel-wise and spatial calculation. |
Guandu Liu; Yukang Ding; Mading Li; Ming Sun; Xing Wen; Bin Wang; |
736 | Action Sensitivity Learning for Temporal Action Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing approaches directly predict action classes and regress offsets to boundaries, while overlooking the discrepant importance of each frame. In this paper, we propose an Action Sensitivity Learning framework (ASL) to tackle this task, which aims to assess the value of each frame and then leverage the generated action sensitivity to recalibrate the training procedure. |
Jiayi Shao; Xiaohan Wang; Ruijie Quan; Junjun Zheng; Jiang Yang; Yi Yang; |
737 | Gram-based Attentive Neural Ordinary Differential Equations Network for Video Nystagmography Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent automatic VNG classification methods approach this problem from the perspective of video analysis without considering medical prior knowledge, resulting in unsatisfactory accuracy and limited diagnostic capability for nystagmographic types, thereby preventing their clinical application. In this paper, we propose an end-to-end data-driven novel BPPV diagnosis framework (TC-BPPV) by considering this problem as an eye trajectory classification problem due to the disease’s symptoms and experts’ prior knowledge. |
Xihe Qiu; Shaojie Shi; Xiaoyu Tan; Chao Qu; Zhijun Fang; Hailing Wang; Yongbin Gao; Peixia Wu; Huawei Li; |
738 | PEANUT: Predicting and Navigating to Unseen Targets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Efficient ObjectGoal navigation (ObjectNav) in novel environments requires an understanding of the spatial and semantic regularities in environment layouts. In this work, we present a straightforward method for learning these regularities by predicting the locations of unobserved objects from incomplete semantic maps. |
Albert J. Zhai; Shenlong Wang; |
739 | Pluralistic Aging Diffusion Autoencoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel CLIP-driven Pluralistic Aging Diffusion Autoencoder (PADA) to enhance the diversity of aging patterns. |
Peipei Li; Rui Wang; Huaibo Huang; Ran He; Zhaofeng He; |
740 | ModelGiF: Gradient Fields for Model Functional Distance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the concept of field in physics, in this work we introduce Model Gradient Field (abbr. ModelGiF) to extract homogeneous representations from the heterogeneous pre-trained models. |
Jie Song; Zhengqi Xu; Sai Wu; Gang Chen; Mingli Song; |
741 | PoseDiffusion: Solving Pose Estimation Via Diffusion-aided Bundle Adjustment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to formulate the Structure from Motion (SfM) problem inside a probabilistic diffusion framework, modelling the conditional distribution of camera poses given input images. |
Jianyuan Wang; Christian Rupprecht; David Novotny; |
742 | TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on this approach, we introduce TIFA v1.0, a benchmark consisting of 4K diverse text inputs and 25K questions across 12 categories (object, counting, etc.). |
Yushi Hu; Benlin Liu; Jungo Kasai; Yizhong Wang; Mari Ostendorf; Ranjay Krishna; Noah A. Smith; |
743 | SIGMA: Scale-Invariant Global Sparse Shape Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel mixed-integer programming (MIP) formulation for generating precise sparse correspondences for highly non-rigid shapes. |
Maolin Gao; Paul Roetzer; Marvin Eisenberger; Zorah Lähner; Michael Moeller; Daniel Cremers; Florian Bernard; |
744 | CORE: Cooperative Reconstruction for Multi-Agent Perception Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents CORE, a conceptually simple, effective and communication-efficient model for multi-agent cooperative perception. |
Binglu Wang; Lei Zhang; Zhaozhong Wang; Yongqiang Zhao; Tianfei Zhou; |
745 | VidStyleODE: Disentangled Video Editing Via StyleGAN and NeuralODEs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose VidStyleODE, a spatiotemporally continuous disentangled video representation based upon StyleGAN and Neural-ODEs. |
Moayed Haji Ali; Andrew Bond; Tolga Birdal; Duygu Ceylan; Levent Karacan; Erkut Erdem; Aykut Erdem; |
746 | SEFD: Learning to Distill Complex Pose and Occlusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although many improvements have been made in 3D human mesh estimation using the two-dimensional (2D) pose with occlusion between humans, occlusion from complex poses and other objects remains a consistent problem. Therefore, we propose the novel Skinned Multi-Person Linear (SMPL) Edge Feature Distillation (SEFD) that demonstrates robustness to complex poses and occlusions, without increasing the number of parameters compared to the baseline model. |
ChangHee Yang; Kyeongbo Kong; SungJun Min; Dongyoon Wee; Ho-Deok Jang; Geonho Cha; SukJu Kang; |
747 | CiT: Curation in Training for Effective Vision-Language Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper trades generality for efficiency and presents Curation in Training (CiT), a simple and efficient vision-text learning algorithm that couples a data objective into training. |
Hu Xu; Saining Xie; Po-Yao Huang; Licheng Yu; Russell Howes; Gargi Ghosh; Luke Zettlemoyer; Christoph Feichtenhofer; |
748 | SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents a new Sparse-view NeRF (SparseNeRF) framework that exploits depth priors from real-world inaccurate observations. |
Guangcong Wang; Zhaoxi Chen; Chen Change Loy; Ziwei Liu; |
749 | Towards Models That Can See and Read Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite their obvious resemblance, the two are treated independently and, as we show, yield task-specific methods that can either see or read, but not both. In this work, we conduct an in-depth analysis of this phenomenon and propose UniTNT, a Unified Text-Non-Text approach, which grants existing multimodal architectures scene-text understanding capabilities. |
Roy Ganz; Oren Nuriel; Aviad Aberdam; Yair Kittenplon; Shai Mazor; Ron Litman; |
750 | ProPainter: Improving Propagation and Transformer for Video Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, memory or computational constraints limit the temporal range of feature propagation and video Transformer, preventing exploration of correspondence information from distant frames. To address these issues, we propose an improved framework, called ProPainter, which involves enhanced ProPagation and an efficient Transformer. |
Shangchen Zhou; Chongyi Li; Kelvin C.K. Chan; Chen Change Loy; |
751 | Query Refinement Transformer for 3D Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, noisy background queries interfere with proper scene perception and accurate instance segmentation. To address the above issues, we propose a Query Refinement Transformer termed QueryFormer. |
Jiahao Lu; Jiacheng Deng; Chuxin Wang; Jianfeng He; Tianzhu Zhang; |
752 | Root Pose Decomposition Towards Generic Non-rigid 3D Reconstruction with Monocular Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Concretely, we aim at building high-fidelity models for generic object categories and casually captured scenes. |
Yikai Wang; Yinpeng Dong; Fuchun Sun; Xiao Yang; |
753 | 3DHumanGAN: 3D-Aware Human Image Generation with 3D Pose Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present 3DHumanGAN, a 3D-aware generative adversarial network that synthesizes photorealistic images of full-body humans with consistent appearances under different view-angles and body-poses. |
Zhuoqian Yang; Shikai Li; Wayne Wu; Bo Dai; |
754 | LeaF: Learning Frames for 4D Point Cloud Sequence Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We focus on learning descriptive geometry and motion features from 4D point cloud sequences in this work. |
Yunze Liu; Junyu Chen; Zekai Zhang; Jingwei Huang; Li Yi; |
755 | GLA-GCN: Global-local Adaptive Graph Convolutional Network for 3D Human Pose Estimation from Monocular Video Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As such, we concentrate on improving 3D human pose lifting via ground truth data, paving the way for future improvements with higher-quality estimated pose data. Towards this goal, a simple yet effective model called Global-local Adaptive Graph Convolutional Network (GLA-GCN) is proposed in this work. |
Bruce X.B. Yu; Zhi Zhang; Yongxu Liu; Sheng-hua Zhong; Yan Liu; Chang Wen Chen; |
756 | Snow Removal in Video: A New Dataset and A Novel Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we target a more complex task — video snow removal, which aims to restore the clear video from the snowy video. |
Haoyu Chen; Jingjing Ren; Jinjin Gu; Hongtao Wu; Xuequan Lu; Haoming Cai; Lei Zhu; |
757 | Degradation-Resistant Unfolding Network for Heterogeneous Image Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Degradation-Resistant Unfolding Network (DeRUN) for the HIF task to generate high-quality fused images even in degradation scenarios. |
Chunming He; Kai Li; Guoxia Xu; Yulun Zhang; Runze Hu; Zhenhua Guo; Xiu Li; |
758 | Priority-Centric Human Motion Generation in Discrete Latent Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM), which utilizes a Transformer-based VQ-VAE to derive a concise, discrete motion representation, incorporating a global self-attention mechanism and a regularization term to counteract code collapse. |
Hanyang Kong; Kehong Gong; Dongze Lian; Michael Bi Mi; Xinchao Wang; |
759 | Domain-Specificity Inducing Transformers for Source-Free Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we propose to build a framework that supports disentanglement and learning of domain-specific factors and task-specific factors in a unified model. |
Sunandini Sanyal; Ashish Ramayee Asokan; Suvaansh Bhambri; Akshay Kulkarni; Jogendra Nath Kundu; R Venkatesh Babu; |
760 | Towards Improved Input Masking for Convolutional Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new masking method for CNNs we call layer masking in which the missingness bias caused by masking is reduced to a large extent. |
Sriram Balasubramanian; Soheil Feizi; |
761 | 3DHacker: Spectrum-based Decision Boundary Generation for Hard-label 3D Point Cloud Attack Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, we explore a more challenging yet practical 3D attack setting, i.e., attacking point clouds with black-box hard labels, in which the attacker can only have access to the prediction label of the input. To tackle this setting, we propose a novel 3D attack method, termed 3D Hard-label attacker (3DHacker), based on the developed decision boundary algorithm to generate adversarial samples solely with the knowledge of class labels. |
Yunbo Tao; Daizong Liu; Pan Zhou; Yulai Xie; Wei Du; Wei Hu; |
762 | Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing trackers are hampered by low speed, limiting their applicability on devices with limited computational power. To alleviate this problem, we propose HiT, a new family of efficient tracking models that can run at high speed on different devices while retaining high performance. |
Ben Kang; Xin Chen; Dong Wang; Houwen Peng; Huchuan Lu; |
763 | Improving Zero-Shot Generalization for CLIP with Synthesized Prompts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, some classes may lack labeled data entirely, such as emerging concepts. To address this problem, we propose a plug-and-play generative approach called SyntHesIzed Prompts (SHIP) to improve existing fine-tuning methods. |
Zhengbo Wang; Jian Liang; Ran He; Nan Xu; Zilei Wang; Tieniu Tan; |
764 | MiniROAD: Minimal RNN Framework for Online Action Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the underlying reasons for the inferior performance of RNNs compared to transformer-based algorithms. |
Joungbin An; Hyolim Kang; Su Ho Han; Ming-Hsuan Yang; Seon Joo Kim; |
765 | Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the Emotional Adaptation for Audio-driven Talking-head (EAT) method, which transforms emotion-agnostic talking-head models into emotion-controllable ones in a cost-effective and efficient manner through parameter-efficient adaptations. |
Yuan Gan; Zongxin Yang; Xihang Yue; Lingyun Sun; Yi Yang; |
766 | Object-aware Gaze Target Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a Transformer-based architecture that automatically detects objects (including heads) in the scene to build associations between every head and the gazed-head/object, resulting in a comprehensive, explainable gaze analysis composed of: gaze target area, gaze pixel point, the class and the image location of the gazed-object. |
Francesco Tonini; Nicola Dall’Asen; Cigdem Beyan; Elisa Ricci; |
767 | Gramian Attention Heads Are Strong Yet Efficient Vision Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel architecture design that enhances expressiveness by incorporating multiple head classifiers (i.e., classification heads) instead of relying on channel expansion or additional building blocks. |
Jongbin Ryu; Dongyoon Han; Jongwoo Lim; |
768 | VADER: Video Alignment Differencing and Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose VADER, a spatio-temporal matching, alignment, and change summarization method to help fight misinformation spread via manipulated videos. |
Alexander Black; Simon Jenni; Tu Bui; Md. Mehrab Tanjim; Stefano Petrangeli; Ritwik Sinha; Viswanathan Swaminathan; John Collomosse; |
769 | MI-GAN: A Simple Baseline for Image Inpainting on Mobile Devices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present a simple image inpainting baseline, Mobile Inpainting GAN (MI-GAN), which is approximately one order of magnitude computationally cheaper and smaller than existing state-of-the-art inpainting models, and can be efficiently deployed on mobile devices. |
Andranik Sargsyan; Shant Navasardyan; Xingqian Xu; Humphrey Shi; |
770 | HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To the best of our knowledge, we are the first to propose an explicitly unbiased PSG method. |
Zijian Zhou; Miaojing Shi; Holger Caesar; |
771 | Chop & Learn: Recognizing and Generating Object-State Compositions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the task of cutting objects in different styles and the resulting object state changes. |
Nirat Saini; Hanyu Wang; Archana Swaminathan; Vinoj Jayasundara; Bo He; Kamal Gupta; Abhinav Shrivastava; |
772 | Automatic Animation of Hair Blowing in Still Portrait Photos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel approach to animate human hair in a still portrait photo. |
Wenpeng Xiao; Wentao Liu; Yitong Wang; Bernard Ghanem; Bing Li; |
773 | A Large-Scale Outdoor Multi-Modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a large-scale outdoor multi-modal dataset, OMMO dataset, containing complex objects and scenes with calibrated images, point clouds and prompt annotations. |
Chongshan Lu; Fukun Yin; Xin Chen; Wen Liu; Tao Chen; Gang Yu; Jiayuan Fan; |
774 | 4D Panoptic Segmentation As Invariant and Equivariant Field Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop rotation-equivariant neural networks for 4D panoptic segmentation. |
Minghan Zhu; Shizhong Han; Hong Cai; Shubhankar Borse; Maani Ghaffari; Fatih Porikli; |
775 | Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an approach to efficiently and effectively adapt a masked image modeling (MIM) pre-trained vanilla Vision Transformer (ViT) for object detection, which is based on our two novel observations: (i) A MIM pre-trained vanilla ViT encoder can work surprisingly well in the challenging object-level recognition scenario even with randomly sampled partial observations, e.g., only 25%-50% of the input embeddings. |
Yuxin Fang; Shusheng Yang; Shijie Wang; Yixiao Ge; Ying Shan; Xinggang Wang; |
776 | NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify several critical issues in current state-of-the-art methods, including the Feature Ambiguity of projected 2D features in the ray to the 3D space, the Pose Ambiguity of the 3D convolution, and the Computation Imbalance in the 3D convolution across different depth levels. |
Jiawei Yao; Chuming Li; Keqiang Sun; Yingjie Cai; Hao Li; Wanli Ouyang; Hongsheng Li; |
777 | Spatio-Temporal Crop Aggregation for Video Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To train the model, we propose a self-supervised objective consisting of masked clip feature predictions. |
Sepehr Sameni; Simon Jenni; Paolo Favaro; |
778 | Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show how ideas from rendering and signal processing can be used to construct a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP to yield error rates that are 8%-77% lower than either prior technique, and that trains 24x faster than mip-NeRF 360. |
Jonathan T. Barron; Ben Mildenhall; Dor Verbin; Pratul P. Srinivasan; Peter Hedman; |
779 | Neural-PBIR Reconstruction of Shape, Material, and Illumination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce an accurate and highly efficient object reconstruction pipeline combining neural based object reconstruction and physics-based inverse rendering (PBIR). |
Cheng Sun; Guangyan Cai; Zhengqin Li; Kai Yan; Cheng Zhang; Carl Marshall; Jia-Bin Huang; Shuang Zhao; Zhao Dong; |
780 | Fg-T2M: Fine-Grained Text-Driven Human Motion Generation Via Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a fine-grained method for generating high-quality, conditional human motion sequences supporting precise text description. |
Yin Wang; Zhiying Leng; Frederick W. B. Li; Shun-Cheng Wu; Xiaohui Liang; |
781 | BlindHarmony: "Blind" Harmonization for MR Images Via Flow Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they often require datasets from multiple domains for deep learning training and may still be unsuccessful when applied to images from unseen domains. To address this limitation, we propose a novel concept called ‘Blind Harmonization’, which utilizes only target domain data for training but still has the capability to harmonize images from unseen domains. |
Hwihun Jeong; Heejoon Byun; Dong Un Kang; Jongho Lee; |
782 | Zero-guidance Segmentation Using Zero Segment Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our main contribution is a novel attention-masking technique that balances the two contexts by analyzing the attention layers inside CLIP. |
Pitchaporn Rewatbowornwong; Nattanat Chatthee; Ekapol Chuangsuwanich; Supasorn Suwajanakorn; |
783 | Efficient LiDAR Point Cloud Oversegmentation Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple yet efficient end-to-end LiDAR oversegmentation network, which segments superpoints from the LiDAR point cloud by grouping points based on low-level point embeddings. |
Le Hui; Linghua Tang; Yuchao Dai; Jin Xie; Jian Yang; |
784 | Communication-efficient Federated Learning with Single-Step Synthetic Features Compressor for Faster Convergence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method named Single-Step Synthetic Features Compressor (3SFC) to achieve communication-efficient FL by directly constructing a tiny synthetic dataset containing synthetic features based on raw gradients. |
Yuhao Zhou; Mingjia Shi; Yuanxi Li; Yanan Sun; Qing Ye; Jiancheng Lv; |
785 | SVDFormer: Complementing Point Cloud Via Self-view Augmentation and Self-structure Dual-generator Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel network, SVDFormer, to tackle two specific challenges in point cloud completion: understanding faithful global shapes from incomplete point clouds and generating high-accuracy local structures. |
Zhe Zhu; Honghua Chen; Xing He; Weiming Wang; Jing Qin; Mingqiang Wei; |
786 | Few-Shot Video Classification Via Representation Fusion and Promotion Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they ignore two important issues: a) It is difficult to capture rich intrinsic action semantics from a limited number of support instances within each task. b) Redundant or irrelevant frames in videos easily weaken the positive influence of discriminative frames. To address these two issues, this paper proposes a novel Representation Fusion and Promotion Learning (RFPL) mechanism with two sub-modules: meta-action learning (MAL) and reinforced image representation (RIR). |
Haifeng Xia; Kai Li; Martin Renqiang Min; Zhengming Ding; |
787 | E3Sym: Leveraging E(3) Invariance for Unsupervised 3D Planar Reflective Symmetry Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our proposed method, E3Sym, aims to detect planar reflective symmetry in an unsupervised and end-to-end manner by leveraging E(3) invariance. |
Ren-Wu Li; Ling-Xiao Zhang; Chunpeng Li; Yu-Kun Lai; Lin Gao; |
788 | CTVIS: Consistent Training for Online Video Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a simple yet effective training strategy, called Consistent Training for Online VIS (CTVIS), which is devoted to aligning the training and inference pipelines in terms of building CIs. |
Kaining Ying; Qing Zhong; Weian Mao; Zhenhua Wang; Hao Chen; Lin Yuanbo Wu; Yifan Liu; Chengxiang Fan; Yunzhi Zhuge; Chunhua Shen; |
789 | Unsupervised Video Object Segmentation with Online Adversarial Self-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to perform online fine-tuning on the pre-trained segmentation model to adapt to any ad-hoc videos at test time. |
Tiankang Su; Huihui Song; Dong Liu; Bo Liu; Qingshan Liu; |
790 | Hallucination Improves The Performance of Unsupervised Visual Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: If the conditions are not met, these frameworks will lack semantic contrast and be prone to overfitting. To address these two issues, we propose Hallucinator, which can efficiently generate additional positive samples for further contrast. |
Jing Wu; Jennifer Hobbs; Naira Hovakimyan; |
791 | S3IM: Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for Neural Fields Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To the best of our knowledge, we are the first to design a nonlocal multiplex training paradigm for NeRF and relevant neural field methods via a novel Stochastic Structural SIMilarity (S3IM) loss that processes multiple data points as a whole set instead of processing multiple inputs independently. |
Zeke Xie; Xindi Yang; Yujie Yang; Qi Sun; Yixiang Jiang; Haoran Wang; Yunfeng Cai; Mingming Sun; |
792 | GlobalMapper: Arbitrary-Shaped Urban Layout Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we propose a fully automatic approach to building layout generation using graph attention networks. |
Liu He; Daniel Aliaga; |
793 | Membrane Potential Batch Normalization for Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The regulated data flow after the BN layer will be disturbed again by the membrane potential updating operation before the firing function, i.e., the nonlinear activation. Therefore, we advocate adding another BN layer before the firing function to normalize the membrane potential again, called MPBN. |
Yufei Guo; Yuhan Zhang; Yuanpei Chen; Weihang Peng; Xiaode Liu; Liwen Zhang; Xuhui Huang; Zhe Ma; |
794 | Enhancing Sample Utilization Through Sample Adaptive Augmentation in Semi-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, existing SSL models overlook the characteristics of naive samples, and they just apply the same learning strategy to all samples. To further optimize the SSL model, we emphasize the importance of giving attention to naive samples and augmenting them in a more diverse manner. |
Guan Gui; Zhen Zhao; Lei Qi; Luping Zhou; Lei Wang; Yinghuan Shi; |
795 | Imitator: Personalized Speech-driven 3D Facial Animation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: State-of-the-art methods deform the face topology of the target actor to sync the input audio without considering the identity-specific speaking style and facial idiosyncrasies, thus, resulting in unrealistic and inaccurate lip movements. To address this, we present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video and produces novel facial expressions matching the identity-specific speaking style and facial idiosyncrasies of the target actor. |
Balamurugan Thambiraja; Ikhsanul Habibie; Sadegh Aliakbarian; Darren Cosker; Christian Theobalt; Justus Thies; |
796 | Unified Coarse-to-Fine Alignment for Video-Text Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a Unified Coarse-to-fine Alignment model, dubbed UCOFIA. |
Ziyang Wang; Yi-Lin Sung; Feng Cheng; Gedas Bertasius; Mohit Bansal; |
797 | Seeing Beyond The Patch: Scale-Adaptive Semantic Segmentation of High-resolution Remote Sensing Imagery Based on Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This shortcoming poses a significant challenge in processing complex and variable geo-objects, which results in semantic inconsistency in segmentation results. To address this challenge, we propose a dynamic scale perception framework, named GeoAgent, which adaptively captures appropriate scale context information outside the image patch based on the different geo-objects. |
Yinhe Liu; Sunan Shi; Junjue Wang; Yanfei Zhong; |
798 | Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, prompt tuning could undermine the generalizability of the pre-training models, because the learnable prompt tokens are easy to overfit to the limited training samples. To address these issues, we introduce a novel Gradient-RegulAted Meta-prompt learning (GRAM) framework that jointly meta-learns an efficient soft prompt initialization for better adaptation and a lightweight gradient regulating function for strong cross-domain generalizability in a meta-learning paradigm using only the unlabeled image-text pre-training data. |
Juncheng Li; Minghe Gao; Longhui Wei; Siliang Tang; Wenqiao Zhang; Mengze Li; Wei Ji; Qi Tian; Tat-Seng Chua; Yueting Zhuang; |
799 | Zero-Shot Composed Image Retrieval with Textual Inversion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new task, Zero-Shot CIR (ZS-CIR), that aims to address CIR without requiring a labeled training dataset. |
Alberto Baldrati; Lorenzo Agnolucci; Marco Bertini; Alberto Del Bimbo; |
800 | MUter: Machine Unlearning on Adversarially Trained Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a new approach called MUter for unlearning from ATMs. |
Junxu Liu; Mingsheng Xue; Jian Lou; Xiaoyu Zhang; Li Xiong; Zhan Qin; |
801 | WALDO: Future Video Synthesis Using Object Layer Decomposition and Parametric Flow Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents WALDO (WArping Layer-Decomposed Objects), a novel approach to the prediction of future video frames from past ones. |
Guillaume Le Moing; Jean Ponce; Cordelia Schmid; |
802 | ParCNetV2: Oversized Kernel with Enhanced Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose a new convolutional neural network, ParCNetV2, that extends the research line of ParCNetV1 by bridging the gap between CNN and ViT. |
Ruihan Xu; Haokui Zhang; Wenze Hu; Shiliang Zhang; Xiaoyu Wang; |
803 | BiFF: Bi-level Future Fusion with Polyline-based Coordinate for Interactive Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Bi-level Future Fusion (BiFF) to explicitly capture future interactions between interactive agents. |
Yiyao Zhu; Di Luan; Shaojie Shen; |
804 | RealGraph: A Multiview Dataset for 4D Real-world Context Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a brand new scene understanding paradigm called "Context Graph Generation (CGG)", aiming at abstracting holistic semantic information in the complicated 4D world. |
Haozhe Lin; Zequn Chen; Jinzhi Zhang; Bing Bai; Yu Wang; Ruqi Huang; Lu Fang; |
805 | COOL-CHIC: Coordinate-based Low Complexity Hierarchical Image Codec Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce COOL-CHIC, a Coordinate-based Low Complexity Hierarchical Image Codec. |
Théo Ladune; Pierrick Philippe; Félix Henry; Gordon Clare; Thomas Leguay; |
806 | Normalizing Flows for Human Pose Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our model works directly on human pose graph sequences and is exceptionally lightweight (~1K parameters), capable of running on any machine able to run the pose estimation with negligible additional resources. We leverage the highly compact pose representation in a normalizing flows framework, which we extend to tackle the unique characteristics of spatio-temporal pose data and show its advantages in this use case. |
Or Hirschorn; Shai Avidan; |
807 | Reconstructing Groups of People with Hypergraph Relational Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to the mutual occlusion, severe scale variation, and complex spatial distribution, the current multi-person mesh recovery methods cannot produce accurate absolute body poses and shapes in large-scale crowded scenes. To address the obstacles, we fully exploit crowd features for reconstructing groups of people from a monocular image. |
Buzhen Huang; Jingyi Ju; Zhihao Li; Yangang Wang; |
808 | PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Towards precise map element learning, we propose a simple yet effective architecture named PivotNet, which adopts unified pivot-based map representations and is formulated as a direct set prediction paradigm. |
Wenjie Ding; Limeng Qiao; Xi Qiu; Chi Zhang; |
809 | Universal Domain Adaptation Via Compressive Attention Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The mainstream methods make judgments based on the sample features, which overemphasizes global information while ignoring the most crucial local objects in the image, resulting in limited accuracy. To address this issue, we propose a Universal Attention Matching (UniAM) framework by exploiting the self-attention mechanism in vision transformer to capture the crucial object information. |
Didi Zhu; Yinchuan Li; Junkun Yuan; Zexi Li; Kun Kuang; Chao Wu; |
810 | Contactless Pulse Estimation Leveraging Pseudo Labels and Self-Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods often collapse to learning irrelevant periodicities when dealing with interferences such as head motions, facial dynamics, and video compression. To address this limitation, firstly, we enhance the current self-supervised learning by introducing more reliable and explicit contrastive constraints. |
Zhihua Li; Lijun Yin; |
811 | Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method for editing NeRF scenes with text-instructions. |
Ayaan Haque; Matthew Tancik; Alexei A. Efros; Aleksander Holynski; Angjoo Kanazawa; |
812 | Point2Mask: Point-supervised Panoptic Segmentation Via Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present an effective method, namely Point2Mask, to achieve high-quality panoptic prediction using only a single random point annotation per target for training. |
Wentong Li; Yuqian Yuan; Song Wang; Jianke Zhu; Jianshu Li; Jian Liu; Lei Zhang; |
813 | Multi-Task Learning with Knowledge Distillation for Dense Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new knowledge distillation procedure with an alternative match for MTL of dense prediction based on two simple design principles. |
Yangyang Xu; Yibo Yang; Lefei Zhang; |
814 | What Does A Platypus Look Like? Generating Customized Prompts for Zero-Shot Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces a simple method to generate higher accuracy prompts, without relying on any explicit knowledge of the task domain and with far fewer hand-constructed sentences. |
Sarah Pratt; Ian Covert; Rosanne Liu; Ali Farhadi; |
815 | Scene As Occupancy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose OccNet, a multi-view vision centric pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy. |
Wenwen Tong; Chonghao Sima; Tai Wang; Li Chen; Silei Wu; Hanming Deng; Yi Gu; Lewei Lu; Ping Luo; Dahua Lin; Hongyang Li; |
816 | U-RED: Unsupervised 3D Shape Retrieval and Deformation for Partial Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose U-RED, an Unsupervised shape REtrieval and Deformation pipeline that takes an arbitrary object observation as input, typically captured by RGB images or scans, and jointly retrieves and deforms the geometrically similar CAD models from a pre-established database to tightly match the target. |
Yan Di; Chenyangguang Zhang; Ruida Zhang; Fabian Manhardt; Yongzhi Su; Jason Rambach; Didier Stricker; Xiangyang Ji; Federico Tombari; |
817 | RFLA: A Stealthy Reflected Light Adversarial Attack in The Physical World Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Reflected Light Attack (RFLA), which is both effective and stealthy in the digital and physical worlds, implemented by placing a transparent colored plastic sheet and a paper cut of a specific shape in front of a mirror to create different colored geometries on the target object. |
Donghua Wang; Wen Yao; Tingsong Jiang; Chao Li; Xiaoqian Chen; |
818 | Nearest Neighbor Guidance for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these scores often suffer from overconfidence issues, misclassifying OOD samples distant from the in-distribution region. To address this challenge, we propose a method called Nearest Neighbor Guidance (NNGuide) that guides the classifier-based score to respect the boundary geometry of the data manifold. |
Jaewoo Park; Yoon Gyo Jung; Andrew Beng Jin Teoh; |
819 | PatchCT: Aligning Patch Set and Label Set with Conditional Transport for Multi-Label Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper considers the semantic consistency of the latent space between the visual patch and linguistic label domains and introduces the conditional transport (CT) theory to bridge the acknowledged gap. |
Miaoge Li; Dongsheng Wang; Xinyang Liu; Zequn Zeng; Ruiying Lu; Bo Chen; Mingyuan Zhou; |
820 | VI-Net: Boosting Category-level 6D Object Pose Estimation Via Learning Decoupled Rotations on The Spherical Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel rotation estimation network, termed as VI-Net, to make the task easier by decoupling the rotation as the combination of a viewpoint rotation and an in-plane rotation. |
Jiehong Lin; Zewei Wei; Yabin Zhang; Kui Jia; |
821 | ICD-Face: Intra-class Compactness Distillation for Face Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, after using FCD, we observe that the intra-class similarities of the student model are much lower than those of the teacher model. Therefore, we propose an effective FR distillation method called ICD-Face by introducing intra-class compactness distillation into the existing distillation framework. |
Zhipeng Yu; Jiaheng Liu; Haoyu Qin; Yichao Wu; Kun Hu; Jiayi Tian; Ding Liang; |
822 | Diffusion-SDF: Conditional Generative Modeling of Signed Distance Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes Diffusion-SDF, a generative model for shape completion, single-view reconstruction, and reconstruction of real-scanned point clouds. |
Gene Chou; Yuval Bahat; Felix Heide; |
823 | Open-Vocabulary Object Detection With An Open Corpus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, an open corpus, composed of a set of external object concepts and clustered to several centroids, is introduced to improve the generalization ability in the detector. |
Jiong Wang; Huiming Zhang; Haiwen Hong; Xuan Jin; Yuan He; Hui Xue; Zhou Zhao; |
824 | Long-range Multimodal Pretraining for Movie Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Long-range Multimodal Pretraining, a strategy, and a model that leverages movie data to train transferable multimodal and cross-modal encoders. |
Dawit Mureja Argaw; Joon-Young Lee; Markus Woodson; In So Kweon; Fabian Caba Heilbron; |
825 | MRM: Masked Relation Modeling for Medical Image Pre-Training with Genetics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to leverage genetics to boost image pre-training and present a masked relation modeling (MRM) framework. |
Qiushi Yang; Wuyang Li; Baopu Li; Yixuan Yuan; |
826 | Adverse Weather Removal with Codebook Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by recent advancements in codebook and vector quantization (VQ) techniques, we present a novel Adverse Weather Removal network with Codebook Priors (AWRCP) to address the problem of unified adverse weather removal. |
Tian Ye; Sixiang Chen; Jinbin Bai; Jun Shi; Chenghao Xue; Jingxia Jiang; Junjie Yin; Erkang Chen; Yun Liu; |
827 | Spectrum-guided Multi-granularity Referring Video Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the drift problem, we propose a Spectrum-guided Multi-granularity (SgMg) approach, which performs direct segmentation on the encoded features and employs visual details to further optimize the masks. |
Bo Miao; Mohammed Bennamoun; Yongsheng Gao; Ajmal Mian; |
828 | Sound Source Localization Is All About Cross-Modal Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Cross-modal semantic understanding is important in understanding semantically mismatched audio-visual events, e.g., silent objects, or off-screen sounds. To account for this, we propose a cross-modal alignment task as a joint task with sound source localization to better learn the interaction between audio and visual modalities. |
Arda Senocak; Hyeonggon Ryu; Junsik Kim; Tae-Hyun Oh; Hanspeter Pfister; Joon Son Chung; |
829 | MAP: Towards Balanced Generalization of IID and OOD Through Model-Agnostic Adapters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, in this paper, we investigate an intriguing problem of balancing IID and OOD generalizations and propose a novel Model Agnostic adaPters (MAP) method, which is more reliable and effective for distribution-shift-agnostic real-world data. |
Min Zhang; Junkun Yuan; Yue He; Wenbin Li; Zhengyu Chen; Kun Kuang; |
830 | Exploring Group Video Captioning with Efficient Relational Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a new task, group video captioning, which aims to infer the desired content among a group of target videos and describe it with another group of related reference videos. |
Wang Lin; Tao Jin; Ye Wang; Wenwen Pan; Linjun Li; Xize Cheng; Zhou Zhao; |
831 | ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods for trajectory prediction are either inefficient or sacrifice accuracy. To address this challenge, we propose ADAPT, a novel approach for jointly predicting the trajectories of all agents in the scene with dynamic weight learning. |
Görkay Aydemir; Adil Kaan Akan; Fatma Güney; |
832 | TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, as the input feature is fully shared and each task decoder also shares decoding parameters for different input samples, it leads to a static feature decoding process, producing less discriminative task-specific representations. To tackle this limitation, we propose TaskExpert, a novel multi-task mixture-of-experts model that enables learning multiple representative task-generic feature spaces and decoding task-specific features in a dynamic manner. |
Hanrong Ye; Dan Xu; |
833 | Meta OOD Learning For Continuously Adaptive OOD Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel and more realistic setting called continuously adaptive out-of-distribution (CAOOD) detection, which targets developing an OOD detection model that enables dynamic and quick adaptation to a newly arriving distribution with insufficient ID samples during deployment time. |
Xinheng Wu; Jie Lu; Zhen Fang; Guangquan Zhang; |
834 | MAPConNet: Self-supervised 3D Pose Transfer with Mesh and Point Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel self-supervised framework for 3D pose transfer which can be trained in unsupervised, semi-supervised, or fully supervised settings without any correspondence labels. |
Jiaze Sun; Zhixiang Chen; Tae-Kyun Kim; |
835 | BlendFace: Re-designing Identity Encoders for Face-Swapping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although many studies seem to have proposed almost satisfactory solutions, we notice that previous methods still suffer from an identity-attribute entanglement that causes undesired attribute swapping, because widely used identity encoders, e.g., ArcFace, have some crucial attribute biases owing to their pretraining on face recognition tasks. To address this issue, we design BlendFace, a novel identity encoder for face-swapping. |
Kaede Shiohara; Xingchao Yang; Takafumi Taketomi; |
836 | Test-time Personalizable Forecasting of 3D Human Poses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this case, the source pre-trained model has a low ability to adapt to these out-of-source characteristics, resulting in unreliable predictions. To tackle this issue, we propose a novel helper-predictor test-time personalization approach (H/P-TTP), which allows for a generalizable representation of out-of-source subjects to obtain more realistic predictions. |
Qiongjie Cui; Huaijiang Sun; Jianfeng Lu; Weiqing Li; Bin Li; Hongwei Yi; Haofan Wang; |
837 | Few-shot Continual Infomax Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Few-shot Continual Infomax Learning (FCIL) framework that enables a deep model to continually/incrementally learn new concepts from few labeled samples, relieving the catastrophic forgetting of past knowledge. |
Ziqi Gu; Chunyan Xu; Jian Yang; Zhen Cui; |
838 | A Parse-Then-Place Approach for Generating Graphic Layouts from Textual Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to use text as the guidance to create graphic layouts, i.e., Text-to-Layout, aiming to lower the design barriers. |
Jiawei Lin; Jiaqi Guo; Shizhao Sun; Weijiang Xu; Ting Liu; Jian-Guang Lou; Dongmei Zhang; |
839 | DreamBooth3D: Subject-Driven Text-to-3D Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present DreamBooth3D, an approach to personalize text-to-3D generative models from as few as 3-6 casually captured images of a subject. |
Amit Raj; Srinivas Kaza; Ben Poole; Michael Niemeyer; Nataniel Ruiz; Ben Mildenhall; Shiran Zada; Kfir Aberman; Michael Rubinstein; Jonathan Barron; Yuanzhen Li; Varun Jampani; |
840 | DARTH: Holistic Test-time Adaptation for Multiple Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the effect of domain shift on appearance-based trackers, and introduce DARTH, a holistic test-time adaptation framework for MOT. |
Mattia Segu; Bernt Schiele; Fisher Yu; |
841 | Multi-interactive Feature Learning and A Full-time Multi-modality Benchmark for Image Fusion and Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Early efforts focus on boosting the performance for only one task, e.g., fusion or segmentation, making it hard to reach the "Best of Both Worlds". To overcome this issue, in this paper, we propose a Multi-interactive Feature learning architecture for image fusion and segmentation, namely SegMiF, and exploit dual-task correlation to promote the performance of both tasks. |
Jinyuan Liu; Zhu Liu; Guanyao Wu; Long Ma; Risheng Liu; Wei Zhong; Zhongxuan Luo; Xin Fan; |
842 | BaRe-ESA: A Riemannian Framework for Unregistered Human Body Shapes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Basis Restricted Elastic Shape Analysis (BaRe-ESA), a novel Riemannian framework for human body scan representation, interpolation and extrapolation. |
Emmanuel Hartman; Emery Pierson; Martin Bauer; Nicolas Charon; Mohamed Daoudi; |
843 | Skip-Plan: Procedure Planning in Instructional Videos Via Condensed Action Space Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Skip-Plan, a condensed action space learning method for procedure planning in instructional videos. |
Zhiheng Li; Wenjia Geng; Muheng Li; Lei Chen; Yansong Tang; Jiwen Lu; Jie Zhou; |
844 | A Retrospect to Multi-prompt Learning Across Vision and Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to provide a principled retrospect for vision-language multi-prompt learning. |
Ziliang Chen; Xin Huang; Quanlong Guan; Liang Lin; Weiqi Luo; |
845 | Sparse Instance Conditioned Multimodal Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple yet effective Sparse Instance Conditioned Network (SICNet), which gives a balanced solution between goal-conditioned and instance-conditioned methods. |
Yonghao Dong; Le Wang; Sanping Zhou; Gang Hua; |
846 | Label Shift Adapter for Test-Time Adaptation Under Covariate and Label Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we discover that the majority of existing TTA methods fail to address the coexistence of covariate and label shifts. To tackle this challenge, we propose a novel label shift adapter that can be incorporated into existing TTA approaches to deal with label shifts during the TTA process effectively. |
Sunghyun Park; Seunghan Yang; Jaegul Choo; Sungrack Yun; |
847 | NAPA-VQ: Neighborhood-Aware Prototype Augmentation with Vector Quantization for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to the lack of old data, NECIL methods struggle to discriminate between old and new classes causing their feature representations to overlap. We propose NAPA-VQ: Neighborhood Aware Prototype Augmentation with Vector Quantization, a framework that reduces this class overlap in NECIL. |
Tamasha Malepathirana; Damith Senanayake; Saman Halgamuge; |
848 | Dynamic Snake Convolution Based on Topological Geometric Constraints for Tubular Structure Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, many factors complicate the task, including thin local structures and variable global morphologies. In this work, we note the specificity of tubular structures and use this knowledge to guide our DSCNet to simultaneously enhance perception in three stages: feature extraction, feature fusion, and loss constraint. |
Yaolei Qi; Yuting He; Xiaoming Qi; Yuan Zhang; Guanyu Yang; |
849 | Unsupervised Open-Vocabulary Object Localization in Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization. |
Ke Fan; Zechen Bai; Tianjun Xiao; Dominik Zietlow; Max Horn; Zixu Zhao; Carl-Johann Simon-Gabriel; Mike Zheng Shou; Francesco Locatello; Bernt Schiele; Thomas Brox; Zheng Zhang; Yanwei Fu; Tong He; |
850 | Dataset Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, as the gradient calculation is coupled with the specific network architecture, the synthesized dataset is biased and performs poorly when used for training unseen architectures. To address these limitations, we present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets which can be used for training any neural network architectures. |
Daquan Zhou; Kai Wang; Jianyang Gu; Xiangyu Peng; Dongze Lian; Yifan Zhang; Yang You; Jiashi Feng; |
851 | Unsupervised Video Deraining with An Event Camera Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach by integrating a bio-inspired event camera into the unsupervised video deraining pipeline, which enables us to capture high temporal resolution information and model complex rain characteristics. |
Jin Wang; Wenming Weng; Yueyi Zhang; Zhiwei Xiong; |
852 | Overcoming Forgetting Catastrophe in Quantization-Aware Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing quantization processes learned only from the current data tend to suffer from forgetting catastrophe on streaming data, i.e., significant performance decrement on old task data after being trained on new tasks. Therefore, we propose a lifelong quantization process, LifeQuant, to address the problem. |
Ting-An Chen; De-Nian Yang; Ming-Syan Chen; |
853 | DIME-FM: DIstilling Multimodal and Efficient Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new distillation mechanism (DIME-FM) that allows us to transfer the knowledge contained in large VLFMs to smaller, customized foundation models using a relatively small amount of inexpensive, unpaired images and sentences. |
Ximeng Sun; Pengchuan Zhang; Peizhao Zhang; Hardik Shah; Kate Saenko; Xide Xia; |
854 | Boosting Single Image Super-Resolution Via Partial Channel Shifting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a straightforward and generic approach for feature enhancement that can effectively promote the performance of SR models, dubbed partial channel shifting (PCS). |
Xiaoming Zhang; Tianrui Li; Xiaole Zhao; |
855 | Learning to Upsample By Learning to Sample Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present DySample, an ultra-lightweight and effective dynamic upsampler. |
Wenze Liu; Hao Lu; Hongtao Fu; Zhiguo Cao; |
856 | LayoutDiffusion: Improving Graphic Layout Generation By Discrete Diffusion Probabilistic Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a novel generative model named LayoutDiffusion for automatic layout generation. |
Junyi Zhang; Jiaqi Guo; Shizhao Sun; Jian-Guang Lou; Dongmei Zhang; |
857 | Efficiently Robustify Pre-Trained Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we first benchmark the performance of these models under different perturbations and datasets that represent real-world shifts, and highlight their degrading performance under these shifts. We then discuss how existing robustification schemes based on complete model fine-tuning may not be a scalable option for very large-scale networks and can also cause them to forget some of the desired characteristics. |
Nishant Jain; Harkirat Behl; Yogesh Singh Rawat; Vibhav Vineet; |
858 | Efficient Video Prediction Via Sparsely Conditioned Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel generative model for video prediction based on latent flow matching, an efficient alternative to diffusion-based models. |
Aram Davtyan; Sepehr Sameni; Paolo Favaro; |
859 | Surface Normal Clustering for Implicit Representation of Manhattan Scenes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to leverage the geometric prior of Manhattan scenes to improve the implicit neural radiance field representations. |
Nikola Popovic; Danda Pani Paudel; Luc Van Gool; |
860 | Distracting Downpour: Adversarial Weather Attacks for Motion Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, in this work, we present a novel attack on motion estimation that exploits adversarially optimized particles to mimic weather effects like snowflakes, rain streaks or fog clouds. |
Jenny Schmalfuss; Lukas Mehl; Andrés Bruhn; |
861 | Adaptive Similarity Bootstrapping for Self-Distillation Based Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to adaptively bootstrap neighbors based on the estimated quality of the latent space. |
Tim Lebailly; Thomas Stegmüller; Behzad Bozorgtabar; Jean-Philippe Thiran; Tinne Tuytelaars; |
862 | Generalized Differentiable RANSAC Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose ∇-RANSAC, a generalized differentiable RANSAC that allows learning the entire randomized robust estimation pipeline. |
Tong Wei; Yash Patel; Alexander Shekhovtsov; Jiri Matas; Daniel Barath; |
863 | Unfolding Framework with Prior of Convolution-Transformer Mixture and Uncertainty Estimation for Video Snapshot Compressive Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we build our model under the deep unfolding network (DUN) framework and propose a 3D Convolution-Transformer Mixture (CTM) module with an efficient and scalable 3D attention model plugged in, which helps the Transformer fully learn the correlation between the temporal and spatial dimensions. |
Siming Zheng; Xin Yuan; |
864 | Non-Semantics Suppressed Mask Learning for Unsupervised Video Semantic Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on a novel unsupervised video semantic compression problem, where video semantics is compressed in a downstream task-agnostic manner. |
Yuan Tian; Guo Lu; Guangtao Zhai; Zhiyong Gao; |
865 | ResQ: Residual Quantization for Video Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe that residuals, as the difference in network activations between two neighboring frames, exhibit properties that make them highly quantizable. Based on this observation, we propose a novel quantization scheme for video networks coined as Residual Quantization. |
Davide Abati; Haitam Ben Yahia; Markus Nagel; Amirhossein Habibian; |
866 | Inverse Compositional Learning for Weakly-supervised Relation Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we introduce a novel approach called inverse compositional learning (ICL) for weakly-supervised video relation grounding. |
Huan Li; Ping Wei; Zeyu Ma; Nanning Zheng; |
867 | XMem++: Production-level Video Segmentation From Few Annotated Frames Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a novel semi-supervised video object segmentation (SSVOS) model, XMem++, that improves existing memory-based models, with a permanent memory module. |
Maksym Bekuzarov; Ariana Bermudez; Joon-Young Lee; Hao Li; |
868 | MHCN: A Hyperbolic Neural Network Model for Multi-view Hierarchical Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods often overlook this interplay due to the simple heuristic agglomerative strategies or the decoupling of multi-view representation learning and hierarchical modeling, thus leading to insufficient representation learning. To address these issues, this paper proposes a novel Multi-view Hierarchical Clustering Network (MHCN) model by performing simultaneous multi-view learning and hierarchy modeling. |
Fangfei Lin; Bing Bai; Yiwen Guo; Hao Chen; Yazhou Ren; Zenglin Xu; |
869 | End-to-End Diffusion Latent Optimization Improves Classifier Guidance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, currently classifier guidance requires either training new noise-aware models to obtain accurate gradients or using a one-step denoising approximation of the final generation, which leads to misaligned gradients and sub-optimal control. We highlight this approximation’s shortcomings and propose a novel guidance method: Direct Optimization of Diffusion Latents (DOODL), which enables plug-and-play guidance by optimizing diffusion latents w.r.t. the gradients of a pre-trained classifier on the true generated pixels, using an invertible diffusion process to achieve memory-efficient backpropagation. |
Bram Wallace; Akash Gokul; Stefano Ermon; Nikhil Naik; |
870 | FineRecon: Depth-aware Feed-forward Network for Detailed 3D Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the reconstructed geometry, typically represented as a 3D truncated signed distance function (TSDF), is often coarse without fine geometric details. To address this problem, we propose three effective solutions for improving the fidelity of inference-based 3D reconstructions. |
Noah Stier; Anurag Ranjan; Alex Colburn; Yajie Yan; Liang Yang; Fangchang Ma; Baptiste Angles; |
871 | Navigating to Objects Specified By Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Solving this task requires semantic visual reasoning and exploration of unknown environments. We present a system that can perform this task in both simulation and the real world. |
Jacob Krantz; Theophile Gervet; Karmesh Yadav; Austin Wang; Chris Paxton; Roozbeh Mottaghi; Dhruv Batra; Jitendra Malik; Stefan Lee; Devendra Singh Chaplot; |
872 | TRM-UAP: Enhancing The Transferability of Data-Free Universal Adversarial Perturbation Via Truncated Ratio Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel data-free universal attack without depending on any real data samples through truncated ratio maximization, which we term as TRM-UAP. |
Yiran Liu; Xin Feng; Yunlong Wang; Wu Yang; Di Ming; |
873 | LATR: 3D Lane Detection from Monocular Images with Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the depth ambiguity in monocular images inevitably causes misalignment between the constructed surrogate feature map and the original image, posing a great challenge for accurate lane detection. To address the above issue, we present a novel LATR model, an end-to-end 3D lane detector that uses 3D-aware front-view features without transformed view representation. |
Yueru Luo; Chaoda Zheng; Xu Yan; Tang Kun; Chao Zheng; Shuguang Cui; Zhen Li; |
874 | Scratching Visual Transformer’s Back with Uniform Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the role of MSA along a different axis: density. |
Nam Hyeon-Woo; Kim Yu-Ji; Byeongho Heo; Dongyoon Han; Seong Joon Oh; Tae-Hyun Oh; |
875 | Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new T2V generation setting, One-Shot Video Tuning, where only one text-video pair is presented. |
Jay Zhangjie Wu; Yixiao Ge; Xintao Wang; Stan Weixian Lei; Yuchao Gu; Yufei Shi; Wynne Hsu; Ying Shan; Xiaohu Qie; Mike Zheng Shou; |
876 | Anchor-Intermediate Detector: Decoupling and Coupling Bounding Boxes for Accurate Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this paper proposes the Box Decouple-Couple (BDC) strategy at inference, which no longer discards the overlapping boxes but decouples the corner points of these boxes. |
Yilong Lv; Min Li; Yujie He; Shaopeng Li; Zhuzhen He; Aitao Yang; |
877 | Environment-Invariant Curriculum Relation Learning for Fine-Grained Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the two imbalance cases, we propose a novel Environment Invariant Curriculum Relation learning (EICR) method, which can be applied in a plug-and-play fashion to existing SGG methods. |
Yukuan Min; Aming Wu; Cheng Deng; |
878 | Extensible and Efficient Proxy for Neural Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, two significant drawbacks hinder the wide adoption of these efficient proxies: (1) they are not adaptive to various NAS search spaces, and (2) they are not extensible to multi-modality downstream tasks. To address these two issues, we first propose an Extensible proxy (Eproxy) that utilizes self-supervised, few-shot training to achieve near-zero costs. |
Yuhong Li; Jiajie Li; Cong Hao; Pan Li; Jinjun Xiong; Deming Chen; |
879 | Zenseact Open Dataset: A Large-Scale and Diverse Multimodal Dataset for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing datasets for autonomous driving (AD) often lack diversity and long-range capabilities, focusing instead on 360° perception and temporal reasoning. To address this gap, we introduce ZOD, a large-scale and diverse multimodal dataset collected over two years in various European countries, covering an area 9x that of existing datasets. |
Mina Alibeigi; William Ljungbergh; Adam Tonderski; Georg Hess; Adam Lilja; Carl Lindström; Daria Motorniuk; Junsheng Fu; Jenny Widahl; Christoffer Petersson; |
880 | MAAL: Multimodality-Aware Autoencoder-Based Affordance Learning for 3D Articulated Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel Multimodality-Aware Autoencoder-based affordance Learning (MAAL) for the 3D object affordance problem. |
Yuanzhi Liang; Xiaohan Wang; Linchao Zhu; Yi Yang; |
881 | Generalizable Decision Boundaries: Dualistic Meta-Learning for Open Set Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel meta-learning-based framework called dualistic MEta-learning with joint DomaIn-Class matching (MEDIC), which considers gradient matching towards inter-domain and inter-class splits simultaneously to find a generalizable boundary balanced for all tasks. |
Xiran Wang; Jian Zhang; Lei Qi; Yinghuan Shi; |
882 | Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we first establish a comprehensive and rigorous point cloud adversarial robustness benchmark, which can provide a detailed understanding of the effects of the defense and attack methods. We then collect existing defense tricks in point cloud adversarial defenses and perform extensive and systematic experiments to identify an effective combination of these tricks. |
Qiufan Ji; Lin Wang; Cong Shi; Shengshan Hu; Yingying Chen; Lichao Sun; |
883 | Weakly Supervised Referring Image Segmentation with Intra-Chunk and Inter-Chunk Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a weakly supervised learning method for RIS that only uses readily available image-text pairs. |
Jungbeom Lee; Sungjin Lee; Jinseok Nam; Seunghak Yu; Jaeyoung Do; Tara Taghavi; |
884 | Poincare ResNet Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate how to learn hyperbolic representations of visual data directly from the pixel level. |
Max van Spengler; Erwin Berkhout; Pascal Mettes; |
885 | Parameterized Cost Volume for Stereo Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a parameterized cost volume to encode the entire disparity space using multi-Gaussian distribution. |
Jiaxi Zeng; Chengtang Yao; Lidong Yu; Yuwei Wu; Yunde Jia; |
886 | SAFE: Sensitivity-Aware Features for Out-of-Distribution Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address the problem of out-of-distribution (OOD) detection for the task of object detection. |
Samuel Wilson; Tobias Fischer; Feras Dayoub; Dimity Miller; Niko Sünderhauf; |
887 | SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These distortion patterns are independent of the visual content and provide informative cues for rectification. To make the best of such rectification cues, we introduce SimFIR, a simple framework for fisheye image rectification based on self-supervised representation learning. |
Hao Feng; Wendi Wang; Jiajun Deng; Wengang Zhou; Li Li; Houqiang Li; |
888 | Subclass-balancing Contrastive Learning for Long-tailed Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods such as data reweighing, resampling, and supervised contrastive learning enforce class balance at the price of introducing imbalance between instances of the head and tail classes, which may ignore the underlying rich semantic substructures of the former and exaggerate the biases in the latter. We overcome these drawbacks with a novel "subclass-balancing contrastive learning (SBCL)" approach that clusters each head class into multiple subclasses of similar sizes as the tail classes and enforces representations to capture the two-layer class hierarchy between the original classes and their subclasses. |
Chengkai Hou; Jieyu Zhang; Haonan Wang; Tianyi Zhou; |
889 | Generalized Lightness Adaptation with Channel Selective Normalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing methods typically work well on their trained lightness conditions but perform poorly in unknown ones due to their limited generalization ability. To address this limitation, we propose a novel generalized lightness adaptation algorithm that extends conventional normalization techniques through a channel filtering design, dubbed Channel Selective Normalization (CSNorm). |
Mingde Yao; Jie Huang; Xin Jin; Ruikang Xu; Shenglong Zhou; Man Zhou; Zhiwei Xiong; |
890 | Omnidirectional Information Gathering for Knowledge Transfer-Based Audio-Visual Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article, we present ORAN, an omnidirectional audio-visual navigator based on cross-task navigation skill transfer. |
Jinyu Chen; Wenguan Wang; Si Liu; Hongsheng Li; Yi Yang; |
891 | Multi-Scale Bidirectional Recurrent Network with Hybrid Correlation for Point Cloud Based Scene Flow Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel multi-scale bidirectional recurrent architecture that iteratively optimizes the coarse-to-fine scene flow estimation. |
Wencan Cheng; Jong Hwan Ko; |
892 | Dynamic Mesh-Aware Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper designs a two-way coupling between mesh and NeRF during rendering and simulation. |
Yi-Ling Qiao; Alexander Gao; Yiran Xu; Yue Feng; Jia-Bin Huang; Ming C. Lin; |
893 | Learning Support and Trivial Prototypes for Interpretable Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to improve the classification of ProtoPNet with a new method to learn support prototypes that lie near the classification boundary in the feature space, as suggested by the SVM theory. |
Chong Wang; Yuyuan Liu; Yuanhong Chen; Fengbei Liu; Yu Tian; Davis McCarthy; Helen Frazer; Gustavo Carneiro; |
894 | Decoupled DETR: Spatially Disentangling Localization and Classification for Improved End-to-End Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, such spatial misalignment between these two tasks greatly hinders DETR’s training. Therefore, in this work, we focus on decoupling localization and classification tasks in DETR. |
Manyuan Zhang; Guanglu Song; Yu Liu; Hongsheng Li; |
895 | GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, performing gradient inversion attacks in the latent space of the GAN model limits their expression ability and generalizability. To tackle these challenges, we propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers. |
Hao Fang; Bin Chen; Xuan Wang; Zhi Wang; Shu-Tao Xia; |
896 | VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, simply utilizing existing PETL methods for the more challenging VLN tasks may bring non-trivial degeneration to the performance. Therefore, we present the first study to explore PETL methods for VLN tasks and propose a VLN-specific PETL method named VLN-PETL. |
Yanyuan Qiao; Zheng Yu; Qi Wu; |
897 | Generalized Sum Pooling for Metric Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Following this perspective, we generalize GAP and propose a learnable generalized sum pooling method (GSP). |
Yeti Z. Gürbüz; Ozan Sener; A. Aydin Alatan; |
898 | AlignDet: Aligning Pre-training and Fine-tuning in Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we reveal discrepancies in data, model, and task between the pre-training and fine-tuning procedure in existing practices, which implicitly limit the detector’s performance, generalization ability, and convergence speed. |
Ming Li; Jie Wu; Xionghui Wang; Chen Chen; Jie Qin; Xuefeng Xiao; Rui Wang; Min Zheng; Xin Pan; |
899 | Learning Continuous Exposure Value Representations for Single-Image HDR Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current methods generate the LDR stack with predetermined exposure values (EVs), which may limit the quality of HDR reconstruction. To address this, we propose the continuous exposure value representation (CEVR) model, which uses an implicit function to generate LDR images with arbitrary EVs, including those unseen during training. |
Su-Kai Chen; Hung-Lin Yen; Yu-Lun Liu; Min-Hung Chen; Hou-Ning Hu; Wen-Hsiao Peng; Yen-Yu Lin; |
900 | DREAM: Efficient Dataset Distillation By Representative Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Accordingly, we propose a novel matching strategy named Dataset distillation by REpresentAtive Matching (DREAM), where only representative original images are selected for matching. |
Yanqing Liu; Jianyang Gu; Kai Wang; Zheng Zhu; Wei Jiang; Yang You; |
901 | MixSynthFormer: A Transformer Encoder-like Structure with Mixed Synthetic Self-attention for Efficient Human Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods only consider the temporal relation while neglecting spatial attention, and the complexity of dot-product self-attention calculations in transformers is quadratically proportional to the embedding size. To address these limitations, we propose MixSynthFormer, a transformer encoder-like model with MLP-based mixed synthetic attention. |
Yuran Sun; Alan William Dougherty; Zhuoying Zhang; Yi King Choi; Chuan Wu; |
902 | Focus on Your Target: A Dual Teacher-Student Framework for Domain-Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate the issue, we propose a novel dual teacher-student (DTS) framework and equip it with a bidirectional learning strategy. |
Xinyue Huo; Lingxi Xie; Wengang Zhou; Houqiang Li; Qi Tian; |
903 | Enhanced Meta Label Correction for Coping with Label Corruption Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose EMLC, an enhanced meta-label correction approach for the LNL problem. |
Mitchell Keren Taraday; Chaim Baskin; |
904 | Dense Text-to-Image Generation with Attention Modulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing text-to-image diffusion models struggle to synthesize realistic images given dense captions, where each text prompt provides a detailed description for a specific image region. To address this, we propose DenseDiffusion, a training-free method that adapts a pre-trained text-to-image model to handle such dense captions while offering control over the scene layout. |
Yunji Kim; Jiyoung Lee; Jin-Hwa Kim; Jung-Woo Ha; Jun-Yan Zhu; |
905 | HumanMAC: Masked Motion Completion for Human Motion Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, in practice, they are still unsatisfactory due to several issues, including complicated loss constraints, cumbersome training processes, and limited switching between different categories of motions in prediction. In this paper, to address the above issues, we jump out of the foregoing style and propose a novel framework from a new perspective. |
Ling-Hao Chen; JiaWei Zhang; Yewen Li; Yiren Pang; Xiaobo Xia; Tongliang Liu; |
906 | Will Large-scale Generative Models Corrupt Future Datasets? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These trends lead us to a research question: "will such generated images impact the quality of future datasets and the performance of computer vision models positively or negatively?" This paper empirically answers this question by simulating contamination. |
Ryuichiro Hataya; Han Bao; Hiromi Arai; |
907 | SHACIRA: Scalable HAsh-grid Compression for Implicit Neural Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these feature grids come at the expense of large memory consumption which can be a bottleneck for storage and streaming applications. In this work, we propose SHACIRA, a simple yet effective task-agnostic framework for compressing such feature grids with no additional post-hoc pruning/quantization stages. |
Sharath Girish; Abhinav Shrivastava; Kamal Gupta; |
908 | Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle this, state-of-the-art methods adopt complex cross-modal modeling techniques to fuse the text information into video frame representations, which, however, incurs severe efficiency issues in large-scale retrieval systems as the video representations must be recomputed online for every text query. In this paper, we discard this problematic cross-modal fusion process and aim to learn semantically-enhanced representations purely from the video, so that the video representations can be computed offline and reused for different texts. |
Chaorui Deng; Qi Chen; Pengda Qin; Da Chen; Qi Wu; |
909 | Video Action Recognition with Attentive Semantic Units Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further enhance the alignments between visual contents and the SUs, we introduce a multi-region module (MRA) to the visual branch of the VLM. |
Yifei Chen; Dapeng Chen; Ruijin Liu; Hao Li; Wei Peng; |
910 | Sentence Attention Blocks for Answer Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While a wide variety of attention methods have been introduced for this task, they suffer from the following three problems: (1) designs that do not allow the usage of pre-trained networks and do not benefit from large data pre-training; (2) custom designs that are not based on well-grounded previous designs, therefore limiting the learning power of the network; or (3) complicated designs that make it challenging to re-implement or improve them. In this paper, we propose a novel architectural block, which we term Sentence Attention Block, to solve these problems. |
Seyedalireza Khoshsirat; Chandra Kambhamettu; |
911 | Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an end-to-end framework for fast temporal grounding, which is able to model an hours-long video with one-time network execution. |
Yulin Pan; Xiangteng He; Biao Gong; Yiliang Lv; Yujun Shen; Yuxin Peng; Deli Zhao; |
912 | A Low-Shot Object Counting Network With Iterative Prototype Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Low-shot Object Counting network with iterative prototype Adaptation (LOCA). |
Nikola Đukić; Alan Lukežič; Vitjan Zavrtanik; Matej Kristan; |
913 | Towards Fairness-aware Adversarial Network Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an end-to-end learnable framework for fairness-aware network pruning, which optimizes both pruning and debias tasks jointly by adversarial training against those final evaluation metrics like accuracy for pruning, and disparate impact (DI) and equalized odds (DEO) for fairness. |
Lei Zhang; Zhibo Wang; Xiaowei Dong; Yunhe Feng; Xiaoyi Pang; Zhifei Zhang; Kui Ren; |
914 | VoroMesh: Learning Watertight Surface Meshes with Voronoi Diagrams Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present VoroMesh, a novel and differentiable representation of watertight 3D shape surfaces. |
Nissim Maruani; Roman Klokov; Maks Ovsjanikov; Pierre Alliez; Mathieu Desbrun; |
915 | Breaking Temporal Consistency: Generating Video Universal Adversarial Perturbations Using Image Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, image models are limited in their ability to analyze the temporal aspects of videos, which is crucial for a successful video attack. To address this challenge, we introduce the Breaking Temporal Consistency (BTC) method, which is the first attempt to incorporate temporal information into video attacks using image models. |
Hee-Seon Kim; Minji Son; Minbeom Kim; Myung-Joon Kwon; Changick Kim; |
916 | Smoothness Similarity Regularization for Few-Shot GAN Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing methods perform well when the dataset for pre-training is structurally similar to the target dataset, the approaches suffer from training instabilities or memorization issues when the objects in the two domains have a very different structure. To mitigate this limitation, we propose a new smoothness similarity regularization that transfers the inherently learned smoothness of the pre-trained GAN to the few-shot target domain even if the two domains are very different. |
Vadim Sushko; Ruyu Wang; Juergen Gall; |
917 | Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the problem that fine-grained annotated data is difficult to obtain, we propose to leverage weakly supervised annotations to learn the 3D visual grounding model, i.e., only coarse scene-sentence correspondences are used to learn object-sentence links. |
Zehan Wang; Haifeng Huang; Yang Zhao; Linjun Li; Xize Cheng; Yichen Zhu; Aoxiong Yin; Zhou Zhao; |
918 | What Does CLIP Know About A Red Circle? Visual Prompt Engineering for VLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we explore the idea of visual prompt engineering for solving computer vision tasks beyond classification by editing in image space instead of text. |
Aleksandar Shtedritski; Christian Rupprecht; Andrea Vedaldi; |
919 | MEGA: Multimodal Alignment Aggregation and Distillation For Cinematic Video Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Multimodal alignmEnt aGgregation and distillAtion (MEGA) for cinematic long-video segmentation. |
Najmeh Sadoughi; Xinyu Li; Avijit Vajpayee; David Fan; Bing Shuai; Hector Santos-Villalobos; Vimal Bhat; Rohith MV; |
920 | DiffRate : Differentiable Compression Rate for Efficient Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although recent advanced approaches achieved great success, they need to carefully handcraft a compression rate (i.e. number of tokens to remove), which is tedious and leads to sub-optimal performance. To tackle this problem, we propose Differentiable Compression Rate (DiffRate), a novel token compression method that has several appealing properties prior arts do not have. |
Mengzhao Chen; Wenqi Shao; Peng Xu; Mingbao Lin; Kaipeng Zhang; Fei Chao; Rongrong Ji; Yu Qiao; Ping Luo; |
921 | ZPROBE: Zero Peek Robustness Checks for Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We establish the first private robustness check that uses high break point rank-based statistics on aggregated model updates. |
Zahra Ghodsi; Mojan Javaheripi; Nojan Sheybani; Xinqiao Zhang; Ke Huang; Farinaz Koushanfar; |
922 | LoLep: Single-View View Synthesis with Locally-Learned Planes and Self-Attention Occlusion Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel method, LoLep, which regresses Locally-Learned planes from a single RGB image to represent scenes accurately, thus generating better novel views. |
Cong Wang; Yu-Ping Wang; Dinesh Manocha; |
923 | Multi-Modal Continual Test-Time Adaptation for 3D Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore Multi-Modal Continual Test-Time Adaptation (MM-CTTA) as a new extension of CTTA for 3D semantic segmentation. |
Haozhi Cao; Yuecong Xu; Jianfei Yang; Pengyu Yin; Shenghai Yuan; Lihua Xie; |
924 | Exploring Positional Characteristics of Dual-Pixel Data for Camera Autofocus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, dual-pixel data is prone to multiple error sources in its image capturing process, including lens shading or distortions due to the inherent optical characteristics of the lens. We observe that, while these degradations are hard to model using prior knowledge, they are correlated with the spatial position of the pixels within the image sensor area, and we propose a learning-based autofocus model with positional encodings (PE) to capture these patterns. |
Myungsub Choi; Hana Lee; Hyong-euk Lee; |
925 | Heterogeneous Forgetting Compensation for Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most existing CIL methods unreasonably assume that all old categories have the same forgetting pace, and neglect negative influence of forgetting heterogeneity among different old classes on forgetting compensation. To surmount the above challenges, we develop a novel Heterogeneous Forgetting Compensation (HFC) model, which can resolve heterogeneous forgetting of easy-to-forget and hard-to-forget old categories from both representation and gradient aspects. |
Jiahua Dong; Wenqi Liang; Yang Cong; Gan Sun; |
926 | FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper aims to serve as a baseline by designing detectors to reach tradeoffs between energy and performance from two perspectives: 1) We extensively analyze various CNNs to identify low-energy architectures, including selecting activation functions, convolution operators, and feature fusion structures on necks. |
Peng Tu; Xu Xie; Guo Ai; Yuexiang Li; Yawen Huang; Yefeng Zheng; |
927 | Generative Prompt Model for Weakly Supervised Object Localization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a generative prompt model (GenPromp), defining the first generative pipeline to localize less discriminative object parts by formulating WSOL as a conditional image denoising procedure. |
Yuzhong Zhao; Qixiang Ye; Weijia Wu; Chunhua Shen; Fang Wan; |
928 | ActFormer: A GAN-based Transformer Towards General Action-Conditioned 3D Human Motion Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a GAN-based Transformer for general action-conditioned 3D human motion generation, including not only single-person actions but also multi-person interactive actions. |
Liang Xu; Ziyang Song; Dongliang Wang; Jing Su; Zhicheng Fang; Chenjing Ding; Weihao Gan; Yichao Yan; Xin Jin; Xiaokang Yang; Wenjun Zeng; Wei Wu; |
929 | Hiding Visual Information Via Obfuscating Adversarial Perturbations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, inspired by the Type-I adversarial attack, we propose an Adversarial Visual Information Hiding (AVIH) method to protect the visual privacy of data. |
Zhigang Su; Dawei Zhou; Nannan Wang; Decheng Liu; Zhen Wang; Xinbo Gao; |
930 | Category-aware Allocation Transformer for Weakly Supervised Object Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current transformer-based methods predict bounding boxes using category-agnostic attention maps, which may lead to confused and noisy object localization. To address this issue, we propose a novel Category-aware Allocation TRansformer (CATR) that learns category-aware representations for specific objects and produces corresponding category-aware attention maps for object localization. |
Zhiwei Chen; Jinren Ding; Liujuan Cao; Yunhang Shen; Shengchuan Zhang; Guannan Jiang; Rongrong Ji; |
931 | Domain Specified Optimization for Deployment Authorization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Drawing inspiration from distributional robust statistics, we present a lightweight method called Domain-Specified Optimization (DSO) for SDPA that degrades the model’s generalization over a divergence ball. |
Haotian Wang; Haoang Chi; Wenjing Yang; Zhipeng Lin; Mingyang Geng; Long Lan; Jing Zhang; Dacheng Tao; |
932 | Iterative Prompt Learning for Unsupervised Backlit Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel unsupervised backlit image enhancement method, abbreviated as CLIP-LIT, by exploring the potential of Contrastive Language-Image Pre-Training (CLIP) for pixel-level image enhancement. |
Zhexin Liang; Chongyi Li; Shangchen Zhou; Ruicheng Feng; Chen Change Loy; |
933 | UMIFormer: Mining The Correlations Between Similar Tokens for Multi-View 3D Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: There is no usable prior relationship similar to the temporal-coherence property of a video. To solve this problem, we propose a novel transformer network for Unstructured Multiple Images (UMIFormer). |
Zhenwei Zhu; Liying Yang; Ning Li; Chaohao Jiang; Yanyan Liang; |
934 | Improved Knowledge Transfer for Semi-Supervised Domain Adaptation Via Trico Training Strategy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To solve these, we propose the Trico-training method that utilizes a multilayer perceptron (MLP) classifier and two graph convolutional network (GCN) classifiers called inter-view GCN and intra-view GCN classifiers. |
Ba Hung Ngo; Yeon Jeong Chae; Jung Eun Kwon; Jae Hyeon Park; Sung In Cho; |
935 | Locally Stylized Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a stylization framework for NeRF based on local style transfer. |
Hong-Wing Pang; Binh-Son Hua; Sai-Kit Yeung; |
936 | InterFormer: Real-time Interactive Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: First, an annotator's later click is based on the model's feedback to the former click; this serial interaction cannot utilize the model's parallelism capabilities. Second, in each interaction step, the model handles the invariant image along with the sparse variable clicks, resulting in a highly repetitive and redundant process. For efficient computations, we propose a method named InterFormer that follows a new pipeline to address these issues. |
You Huang; Hao Yang; Ke Sun; Shengchuan Zhang; Liujuan Cao; Guannan Jiang; Rongrong Ji; |
937 | Confidence-aware Pseudo-label Learning for Weakly Supervised Visual Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to the cross-modal heterogeneous gap, these methods often suffer from high-confidence spurious associations and are prone to error propagation. In this paper, we propose Confidence-aware Pseudo-label Learning (CPL) to overcome the above limitations. |
Yang Liu; Jiahua Zhang; Qingchao Chen; Yuxin Peng; |
938 | Luminance-aware Color Transform for Multiple Exposure Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, some works tackling multiple exposures rely on the encoder-decoder architecture, resulting in loss of detail in input images during the down-sampling and up-sampling processes. In this regard, a novel correction algorithm for multiple exposures, called luminance-aware color transform (LACT), is proposed in this study. |
Jong-Hyeon Baek; DaeHyun Kim; Su-Min Choi; Hyo-jun Lee; Hanul Kim; Yeong Jun Koh; |
939 | A Simple Framework for Open-Vocabulary Segmentation and Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that learns from different segmentation and detection datasets. |
Hao Zhang; Feng Li; Xueyan Zou; Shilong Liu; Chunyuan Li; Jianwei Yang; Lei Zhang; |
940 | Alignment Before Aggregation: Trajectory Memory Retrieval Network for Video Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we reconcile the inherent tension of spatial and temporal information to retrieve memory frame information along the object trajectory, and propose a novel and coherent Trajectory Memory Retrieval Network (TMRN) to equip with the trajectory information, including a spatial alignment module and a temporal aggregation module. |
Rui Sun; Yuan Wang; Huayu Mai; Tianzhu Zhang; Feng Wu; |
941 | UATVR: Uncertainty-Adaptive Text-Video Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an Uncertainty-Adaptive Text-Video Retrieval approach, termed UATVR, which models each look-up as a distribution matching procedure. |
Bo Fang; Wenhao Wu; Chang Liu; Yu Zhou; Yuxin Song; Weiping Wang; Xiangbo Shu; Xiangyang Ji; Jingdong Wang; |
942 | Deep Directly-Trained Spiking Neural Networks for Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, how to design a directly-trained SNN for the regression task of object detection still remains a challenging problem. To address this problem, we propose EMS-YOLO, a novel directly-trained SNN framework for object detection, which is the first trial to train a deep SNN with surrogate gradients for object detection rather than ANN-SNN conversion strategies. |
Qiaoyi Su; Yuhong Chou; Yifan Hu; Jianing Li; Shijie Mei; Ziyang Zhang; Guoqi Li; |
943 | Online Prototype Learning for Online Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We identify shortcut learning as the key limiting factor for online CL, where the learned features may be biased, not generalizable to new tasks, and may have an adverse impact on knowledge distillation. To tackle this issue, we present the online prototype learning (OnPro) framework for online CL. |
Yujie Wei; Jiaxin Ye; Zhizhong Huang; Junping Zhang; Hongming Shan; |
944 | Robust E-NeRF: NeRF from Sparse & Noisy Events Under Non-Uniform Motion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Robust e-NeRF, a novel method to directly and robustly reconstruct NeRFs from moving event cameras under various real-world conditions, especially from sparse and noisy events generated under non-uniform motion. |
Weng Fei Low; Gim Hee Lee; |
945 | ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel animatable NeRF called ActorsNeRF. |
Jiteng Mu; Shen Sang; Nuno Vasconcelos; Xiaolong Wang; |
946 | SALAD: Part-Level Latent Diffusion for 3D Shape Generation and Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a cascaded diffusion model based on a part-level implicit 3D representation. |
Juil Koo; Seungwoo Yoo; Minh Hieu Nguyen; Minhyuk Sung; |
947 | COMPASS: High-Efficiency Deep Image Compression with Arbitrary-scale Spatial Scalability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel NN-based spatially scalable image compression method, called COMPASS, which supports arbitrary-scale spatial scalability. |
Jongmin Park; Jooyoung Lee; Munchurl Kim; |
948 | Masked Autoencoders Are Stronger Knowledge Distillers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Knowledge distillation (KD) has shown great success in improving student’s performance by mimicking the intermediate output of the high-capacity teacher in fine-grained visual tasks, e.g. object detection. This paper proposes a technique called Masked Knowledge Distillation (MKD) that enhances this process using a masked autoencoding scheme. |
Shanshan Lao; Guanglu Song; Boxiao Liu; Yu Liu; Yujiu Yang; |
949 | Score-Based Diffusion Models As Principled Priors for Inverse Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we empirically validate the theoretically-proven probability function of a score-based diffusion model. |
Berthy T. Feng; Jamie Smith; Michael Rubinstein; Huiwen Chang; Katherine L. Bouman; William T. Freeman; |
950 | Multiscale Structure Guided Diffusion for Image Deblurring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce a simple yet effective multiscale structure guidance as an implicit bias that informs the icDPM about the coarse structure of the sharp image at the intermediate layers. |
Mengwei Ren; Mauricio Delbracio; Hossein Talebi; Guido Gerig; Peyman Milanfar; |
951 | Multiple Planar Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The greater degree-of-freedom of planar objects compared with common objects makes MPOT far more challenging than well-studied object tracking, especially when occlusion occurs. To address this challenging task, we are inspired by amodal perception that humans jointly track visible and invisible parts of the target, and propose a tracking framework that unifies appearance perception and occlusion reasoning. |
Zhicheng Zhang; Shengzhe Liu; Jufeng Yang; |
952 | CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects. |
Ruyi Lian; Haibin Ling; |
953 | ASIC: Aligning Sparse In-the-wild Image Collections Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method for joint alignment of sparse in-the-wild image collections of an object category. |
Kamal Gupta; Varun Jampani; Carlos Esteves; Abhinav Shrivastava; Ameesh Makadia; Noah Snavely; Abhishek Kar; |
954 | Residual Pattern Learning for Pixel-Wise Out-of-Distribution Detection in Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although successful, these re-trained methods have two issues: 1) their in-distribution segmentation accuracy may drop during re-training, and 2) their OoD detection accuracy does not generalise well to new contexts (e.g., country surroundings) outside the training set (e.g., city surroundings). In this paper, we mitigate these issues with: (i) a new residual pattern learning (RPL) module that assists the segmentation model to detect OoD pixels with minimal deterioration to the inlier segmentation performance; and (ii) a novel context-robust contrastive learning (CoroCL) that enforces RPL to robustly detect OoD pixels in various contexts. |
Yuyuan Liu; Choubo Ding; Yu Tian; Guansong Pang; Vasileios Belagiannis; Ian Reid; Gustavo Carneiro; |
955 | Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous works for CZSL often struggle to grasp the contextuality between attribute and object, as well as the discriminability of visual features, and suffer from the long-tailed distribution of real-world compositional data. We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues. |
Hanjae Kim; Jiyoung Lee; Seongheon Park; Kwanghoon Sohn; |
956 | Event Camera Data Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a pre-trained neural network for handling event camera data. |
Yan Yang; Liyuan Pan; Liu Liu; |
957 | Segment Every Reference Object in Spatial and Temporal Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we end the current fragmented situation and propose UniRef to unify the three reference-based object segmentation tasks with a single architecture. |
Jiannan Wu; Yi Jiang; Bin Yan; Huchuan Lu; Zehuan Yuan; Ping Luo; |
958 | Unified Out-Of-Distribution Detection: A Model-Specific Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel, unifying framework to study OOD detection in a broader scope. |
Reza Averly; Wei-Lun Chao; |
959 | One-shot Implicit Animatable Avatars with Model-based Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enable the data-efficient creation of realistic animatable 3D humans, we propose ELICIT, a novel method for learning human-specific neural radiance fields from a single image. |
Yangyi Huang; Hongwei Yi; Weiyang Liu; Haofan Wang; Boxi Wu; Wenxiao Wang; Binbin Lin; Debing Zhang; Deng Cai; |
960 | Unsupervised Feature Representation Learning for Domain-generalized Cross-domain Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve domain generalizability of the model, we thus propose a new two-stage domain augmentation technique for diversified training data generation. |
Conghui Hu; Can Zhang; Gim Hee Lee; |
961 | RankMatch: Fostering Confidence and Consistency in Learning with Noisy Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose RankMatch, a novel LNL framework that investigates additional dimensions of confidence and consistency in order to combat noisy labels. |
Ziyi Zhang; Weikai Chen; Chaowei Fang; Zhen Li; Lechao Chen; Liang Lin; Guanbin Li; |
962 | Dec-Adapter: Exploring Efficient Decoder-Side Adapter for Bridging Screen Content and Natural Image Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, we observe that parameter-efficient transfer learning (PETL) methods have shown great adaptation ability in high-level vision tasks. Therefore, we propose a Dec-Adapter, a pioneering entropy-efficient transfer learning module for the decoder to bridge natural image and screen content compression. |
Sheng Shen; Huanjing Yue; Jingyu Yang; |
963 | MixReorg: Cross-Modal Mixed Patch Reorganization Is A Good Mask Learner for Open-World Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these models still face difficulties in learning fine-grained semantic alignment at the pixel level and predicting accurate object masks. To address this issue, we propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation that enhances a model’s ability to reorganize patches mixed across images, exploring both local visual relevance and global semantic coherence. |
Kaixin Cai; Pengzhen Ren; Yi Zhu; Hang Xu; Jianzhuang Liu; Changlin Li; Guangrun Wang; Xiaodan Liang; |
964 | Preface: A Data-driven Volumetric Prior for Few-shot Ultra High-resolution Face Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel volumetric human face prior that enables the synthesis of ultra high-resolution novel views of subjects that are not part of the prior’s training distribution. |
Marcel C. Bühler; Kripasindhu Sarkar; Tanmay Shah; Gengyan Li; Daoye Wang; Leonhard Helminger; Sergio Orts-Escolano; Dmitry Lagun; Otmar Hilliges; Thabo Beeler; Abhimitra Meka; |
965 | Label-Guided Knowledge Distillation for Continual Semantic Segmentation on 2D Images and 3D Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing KD-based CSS methods continue to suffer from confusion between the background and novel classes since they fail to establish a reliable class correspondence for distillation. To address this issue, we propose a new label-guided knowledge distillation (LGKD) loss, where the old model output is expanded and transplanted (with the guidance of the ground truth label) to form a semantically appropriate class correspondence with the new model output. |
Ze Yang; Ruibo Li; Evan Ling; Chi Zhang; Yiming Wang; Dezhao Huang; Keng Teck Ma; Minhoe Hur; Guosheng Lin; |
966 | Under-Display Camera Image Restoration with Scattering Effect Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we address the UDC image restoration problem with the specific consideration of the scattering effect caused by the display. |
Binbin Song; Xiangyu Chen; Shuning Xu; Jiantao Zhou; |
967 | PRANC: Pseudo RAndom Networks for Compacting Deep Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we employ PRANC to condense image classification models and compress images by compacting their associated implicit neural networks. |
Parsa Nooralinejad; Ali Abbasi; Soroush Abbasi Koohpayegani; Kossar Pourahmadi Meibodi; Rana Muhammad Shahroz Khan; Soheil Kolouri; Hamed Pirsiavash; |
968 | ICICLE: Interpretable Class Incremental Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, continual learning poses new challenges for interpretability, as the rationale behind model predictions may change over time, leading to interpretability concept drift. We address this problem by proposing Interpretable Class-InCremental LEarning (ICICLE), an exemplar-free approach that adopts a prototypical part-based approach. |
Dawid Rymarczyk; Joost van de Weijer; Bartosz Zieliński; Bartlomiej Twardowski; |
969 | Clutter Detection and Removal in 3D Scenes with View-Consistent Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an automatic system that removes clutter from 3D scenes and inpaints with coherent geometry and texture. |
Fangyin Wei; Thomas Funkhouser; Szymon Rusinkiewicz; |
970 | PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we first bring CLIP and GPT together as a unified 3D open-world learner, named PointCLIP V2, which fully unleashes their potential for zero-shot 3D classification, segmentation, and detection. |
Xiangyang Zhu; Renrui Zhang; Bowei He; Ziyu Guo; Ziyao Zeng; Zipeng Qin; Shanghang Zhang; Peng Gao; |
971 | VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce VideoFlow, a novel optical flow estimation framework for videos. |
Xiaoyu Shi; Zhaoyang Huang; Weikang Bian; Dasong Li; Manyuan Zhang; Ka Chun Cheung; Simon See; Hongwei Qin; Jifeng Dai; Hongsheng Li; |
972 | 3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present 3DMiner — a pipeline for mining 3D shapes from challenging large-scale unannotated image datasets. |
Ta-Ying Cheng; Matheus Gadelha; Sören Pirk; Thibault Groueix; Radomír Měch; Andrew Markham; Niki Trigoni; |
973 | Identification of Systematic Errors of Image Classifiers on Rare Subgroups Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We leverage recent advances in text-to-image models and search in the space of textual descriptions of subgroups (prompts) for subgroups where the target model has low performance on the prompt-conditioned synthesized data. |
Jan Hendrik Metzen; Robin Hutmacher; N. Grace Hua; Valentyn Boreiko; Dan Zhang; |
974 | Hierarchical Spatio-Temporal Representation Learning for Gait Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a hierarchical spatio-temporal representation learning (HSTL) framework for extracting gait features from coarse to fine. |
Lei Wang; Bo Liu; Fangfang Liang; Bincheng Wang; |
975 | Order-Prompted Tag Sequence Generation for Video Tagging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is difficult for the existing multi-label classification and generation methods to adapt directly to this task. This paper proposes a novel generative model, Order-Prompted Tag Sequence Generation (OP-TSG), according to the above characteristics. |
Zongyang Ma; Ziqi Zhang; Yuxin Chen; Zhongang Qi; Yingmin Luo; Zekun Li; Chunfeng Yuan; Bing Li; Xiaohu Qie; Ying Shan; Weiming Hu; |
976 | XVO: Generalized Visual Odometry Via Cross-Modal Self-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose XVO, a semi-supervised learning method for training generalized monocular Visual Odometry (VO) models with robust off-the-shelf operation across diverse datasets and settings. |
Lei Lai; Zhongkai Shangguan; Jimuyang Zhang; Eshed Ohn-Bar; |
977 | Weakly Supervised Learning of Semantic Correspondence Through Cascaded Online Correspondence Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a weakly supervised learning algorithm to learn robust semantic correspondences from large-scale datasets with only image-level labels. |
Yiwen Huang; Yixuan Sun; Chenghang Lai; Qing Xu; Xiaomei Wang; Xuli Shen; Weifeng Ge; |
978 | Clusterformer: Cluster-based Transformer for 3D Object Detection in Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, in this paper, we propose a novel query-based 3D detector called Clusterformer, which regards each object as a cluster in 3D space, consisting mainly of the non-empty voxels belonging to the same object, and leverages the cluster to guide the transformer decoder to generate proposals directly from the sparse voxel features. |
Yu Pei; Xian Zhao; Hao Li; Jingyuan Ma; Jingwei Zhang; Shiliang Pu; |
979 | HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first unified approach, HMD-NeMo, that addresses plausible and accurate full body motion generation even when the hands may be only partially visible. |
Sadegh Aliakbarian; Fatemeh Saleh; David Collier; Pashmina Cameron; Darren Cosker; |
980 | NaviNeRF: NeRF-based 3D Representation Disentanglement By Latent Semantic Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This task is currently under-explored and poses great challenges: (i) 3D representations are complex and in general contain much more information than a 2D image; (ii) many 3D representations are not well suited for gradient-based optimization, let alone disentanglement. To address these challenges, we use NeRF as a differentiable 3D representation, and introduce a self-supervised Navigation to identify interpretable semantic directions in the latent space. |
Baao Xie; Bohan Li; Zequn Zhang; Junting Dong; Xin Jin; Jingyu Yang; Wenjun Zeng; |
981 | Adaptive Illumination Mapping for Shadow Detection in Raw Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method to detect shadows from raw images. |
Jiayu Sun; Ke Xu; Youwei Pang; Lihe Zhang; Huchuan Lu; Gerhard Hancke; Rynson Lau; |
982 | CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a CLIP-based unsupervised learning method for annotation-free multi-label image classification, including three stages: initialization, training, and inference. |
Rabab Abdelfattah; Qing Guo; Xiaoguang Li; Xiaofeng Wang; Song Wang; |
983 | Your Diffusion Model Is Secretly A Zero-Shot Classifier Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that the density estimates from large-scale text-to-image diffusion models like Stable Diffusion can be leveraged to perform zero-shot classification without any additional training. |
Alexander C. Li; Mihir Prabhudesai; Shivam Duggal; Ellis Brown; Deepak Pathak; |
984 | Backpropagation Path Search On Adversarial Transferability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose backPropagation pAth Search (PAS), solving the aforementioned two problems. |
Zhuoer Xu; Zhangxuan Gu; Jianping Zhang; Shiwen Cui; Changhua Meng; Weiqiang Wang; |
985 | Boosting Adversarial Transferability Via Gradient Relevance Attack Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To make adversarial examples more transferable, in this paper, we explore the fluctuation phenomenon on the plus-minus sign of the adversarial perturbations’ pixels during the generation of adversarial examples, and propose an ingenious Gradient Relevance Attack (GRA). |
Hegui Zhu; Yuchen Ren; Xiaoyan Sui; Lianping Yang; Wuming Jiang; |
986 | Image-Free Classifier Injection for Zero-Shot Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to equip pre-trained models with zero-shot classification capabilities without the use of image data. |
Anders Christensen; Massimiliano Mancini; A. Sophia Koepke; Ole Winther; Zeynep Akata; |
987 | CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel method, namely CLIP saying no (CLIPN), which empowers "no" logic within CLIP. |
Hualiang Wang; Yi Li; Huifeng Yao; Xiaomeng Li; |
988 | CO-Net: Learning Multiple Point Cloud Tasks at Once with A Cohesive Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present CO-Net, a cohesive framework that optimizes multiple point cloud tasks collectively across heterogeneous dataset domains. |
Tao Xie; Ke Wang; Siyi Lu; Yukun Zhang; Kun Dai; Xiaoyu Li; Jie Xu; Li Wang; Lijun Zhao; Xinyu Zhang; Ruifeng Li; |
989 | Quality Diversity for Visual Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we develop a feature that better supports diverse downstream tasks by providing a diverse set of sensitivities and invariances. |
Ruchika Chavhan; Henry Gouk; Da Li; Timothy Hospedales; |
990 | UniDexGrasp++: Improving Dexterous Grasping Policy Learning Via Geometry-Aware Curriculum and Iterative Generalist-Specialist Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel, object-agnostic method for learning a universal policy for dexterous object grasping from realistic point cloud observations and proprioceptive information under a table-top setting, namely UniDexGrasp++. |
Weikang Wan; Haoran Geng; Yun Liu; Zikang Shan; Yaodong Yang; Li Yi; He Wang; |
991 | Multi-Scale Residual Low-Pass Filter Network for Image Deblurring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a simple and effective Multi-scale Residual Low-Pass Filter Network (MRLPFNet) that jointly explores the image details and main structures for image deblurring. |
Jiangxin Dong; Jinshan Pan; Zhongbao Yang; Jinhui Tang; |
992 | FerKD: Surgical Label Adaptation for Efficient Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present FerKD, a novel efficient knowledge distillation framework that incorporates partial soft-hard label adaptation coupled with a region-calibration mechanism. |
Zhiqiang Shen; |
993 | Neural Fields for Structured Lighting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an image formation model and optimization procedure that combines the advantages of neural radiance fields and structured light imaging. |
Aarrushi Shandilya; Benjamin Attal; Christian Richardt; James Tompkin; Matthew O’Toole; |
994 | ClothPose: A Real-world Benchmark for Visual Analysis of Garment Pose Via An Indirect Recording Solution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a recording system, GarmentTwin, which can track garment poses in dynamic settings such as manipulation. |
Wenqiang Xu; Wenxin Du; Han Xue; Yutong Li; Ruolin Ye; Yan-Feng Wang; Cewu Lu; |
995 | Semantically Structured Image Compression Via Irregular Group-Based Decoupling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, they divide the input image into multiple rectangular regions according to semantics and fail to prevent information interaction among them, causing wasted bitrate and distorted reconstruction of region boundaries. In this paper, we propose to decouple an image into multiple groups with irregular shapes based on a customized group mask and compress them independently. |
Ruoyu Feng; Yixin Gao; Xin Jin; Runsen Feng; Zhibo Chen; |
996 | PhaseMP: Robust 3D Pose Estimation Via Phase-conditioned Human Motion Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel motion prior, called PhaseMP, modeling a probability distribution on pose transitions conditioned by a frequency domain feature extracted from a periodic autoencoder. |
Mingyi Shi; Sebastian Starke; Yuting Ye; Taku Komura; Jungdam Won; |
997 | NLOS-NeuS: Non-line-of-sight Neural Implicit Surface Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose NLOS neural implicit surface (NLOS-NeuS), which extends the NeTF to neural implicit surfaces with a signed distance function (SDF) for reconstructing three-dimensional surfaces in NLOS scenes. |
Yuki Fujimura; Takahiro Kushida; Takuya Funatomi; Yasuhiro Mukaigawa; |
998 | Unsupervised Object Localization with Representer Point Selection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel unsupervised object localization method that allows us to explain the predictions of the model by utilizing self-supervised pre-trained models without additional finetuning. |
Yeonghwan Song; Seokwoo Jang; Dina Katabi; Jeany Son; |
999 | SEMPART: Self-supervised Multi-resolution Partitioning of Image Semantics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose SEMPART, which jointly infers coarse and fine bi-partitions over an image’s DINO-based semantic graph. |
Sriram Ravindran; Debraj Basu; |
1000 | Flatness-Aware Minimization for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the perspective of loss landscape flatness, we propose a novel approach, Flatness-Aware Minimization for Domain Generalization (FAD), which can efficiently optimize both zeroth-order and first-order flatness simultaneously for DG. |
Xingxuan Zhang; Renzhe Xu; Han Yu; Yancheng Dong; Pengfei Tian; Peng Cui; |
1001 | ProtoFL: Unsupervised Federated Learning Via Prototypical Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ‘ProtoFL’, Prototypical Representation Distillation based unsupervised Federated Learning to enhance the representation power of a global model and reduce round communication costs. |
Hansol Kim; Youngjun Kwak; Minyoung Jung; Jinho Shin; Youngsung Kim; Changick Kim; |
1002 | Augmenting and Aligning Snippets for Few-Shot Video Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SSA2lign, a novel approach that addresses FSVDA at the snippet level: the target domain is expanded through a simple snippet-level augmentation, followed by attentive alignment of snippets both semantically and statistically, where semantic alignment is conducted from multiple perspectives. |
Yuecong Xu; Jianfei Yang; Yunjiao Zhou; Zhenghua Chen; Min Wu; Xiaoli Li; |
1003 | Self-Organizing Pathway Expansion for Non-Exemplar Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The conflict between old and new class optimization is exacerbated since the shared neural pathways can only be differentiated by the incremental samples. To address this problem, we propose a novel self-organizing pathway expansion scheme. |
Kai Zhu; Kecheng Zheng; Ruili Feng; Deli Zhao; Yang Cao; Zheng-Jun Zha; |
1004 | Preserving Tumor Volumes for Unsupervised Medical Image Registration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current traditional and learning-based methods rely on similarity measures to generate a deforming field, which often results in disproportionate volume changes in dissimilar regions, especially in tumor regions. These changes can significantly alter the tumor size and underlying anatomy, which limits the practical use of image registration in clinical diagnosis. To address this issue, we formulate image registration with tumors as a constrained problem that preserves tumor volumes while maximizing image similarity in other normal regions. |
Qihua Dong; Hao Du; Ying Song; Yan Xu; Jing Liao; |
1005 | Multi-label Affordance Mapping from Egocentric Vision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new approach to affordance perception which enables accurate multi-label segmentation. |
Lorenzo Mur-Labadia; Jose J. Guerrero; Ruben Martinez-Cantin; |
1006 | Towards Real-World Burst Image Super-Resolution: Benchmark and Method Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we establish a large-scale real-world burst super-resolution dataset, i.e., RealBSR, to explore the faithful reconstruction of image details from multiple frames. |
Pengxu Wei; Yujing Sun; Xingbei Guo; Chang Liu; Guanbin Li; Jie Chen; Xiangyang Ji; Liang Lin; |
1007 | Unified Adversarial Patch for Cross-Modal Attacks in The Physical World Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To ensure security, many scenarios deploy visible sensors and infrared sensors simultaneously, leading to the failure of single-modal physical attacks. To show the potential risks under such scenes, we propose a unified adversarial patch to perform cross-modal physical attacks, i.e., fooling visible and infrared object detectors at the same time via a single patch. |
Xingxing Wei; Yao Huang; Yitong Sun; Jie Yu; |
1008 | Unsupervised Accuracy Estimation of Deep Visual Models Using Domain-Adaptive Adversarial Perturbation Without Source Samples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work proposes a new framework to estimate model accuracy on unlabeled target data without access to source data. |
JoonHo Lee; Jae Oh Woo; Hankyu Moon; Kwonho Lee; |
1009 | Misalign, Contrast Then Distill: Rethinking Misalignments in Language-Image Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prior methods either disregarded this discrepancy or introduced external models to mitigate the impact of misalignments during training. In contrast, we propose a novel metric learning approach that capitalizes on these misalignments as an additional training source, which we term "Misalign, Contrast then Distill (MCD)". |
Bumsoo Kim; Yeonsik Jo; Jinhyung Kim; Seunghwan Kim; |
1010 | SYENet: A Simple Yet Effective Network for Multiple Low-Level Vision Tasks with Real-Time Performance on Mobile Device Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Firstly, most low-level vision algorithms are task-specific and independent of each other, which makes them difficult to integrate into a single neural network architecture and accelerate simultaneously without task-level time-multiplexing. Secondly, most of these networks feature large numbers of parameters and huge computational costs in terms of multiplication-and-accumulation operations, making it difficult to achieve real-time performance, especially on mobile devices with limited computing power. To tackle these problems, we propose a novel network, SYENet, with only 6K parameters. |
Weiran Gou; Ziyao Yi; Yan Xiang; Shaoqing Li; Zibin Liu; Dehui Kong; Ke Xu; |
1011 | MATE: Masked Autoencoders Are Online 3D Test-Time Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our MATE is the first Test-Time-Training (TTT) method designed for 3D data, which makes deep networks trained for point cloud classification robust to distribution shifts occurring in test data. |
M. Jehanzeb Mirza; Inkyu Shin; Wei Lin; Andreas Schriebl; Kunyang Sun; Jaesung Choe; Mateusz Kozinski; Horst Possegger; In So Kweon; Kuk-Jin Yoon; Horst Bischof; |
1012 | EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first identify that the loss of critical fine-grained local image semantics hinders existing methods from attaining strong base-to-novel generalization. Then, we propose Early Dense Alignment (EDA) to bridge the gap between generalizable local semantics and object-level prediction. |
Cheng Shi; Sibei Yang; |
1013 | MixPath: A Unified Approach for One-shot Neural Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we are motivated to train a one-shot multi-path supernet to accurately evaluate the candidate architectures. |
Xiangxiang Chu; Shun Lu; Xudong Li; Bo Zhang; |
1014 | Enhancing NeRF Akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While those feedforward neuralized architectures still do not fit diverse scenes well out of the box, we propose to bridge them with the powerful Mixture-of-Experts (MoE) idea from large language models (LLMs), which has demonstrated superior generalization ability by balancing between larger overall model capacity and flexible per-instance specialization. |
Wenyan Cong; Hanxue Liang; Peihao Wang; Zhiwen Fan; Tianlong Chen; Mukund Varma; Yi Wang; Zhangyang Wang; |
1015 | Task-aware Adaptive Learning for Cross-domain Few-shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Abundant task-specific parameters may over-fit, and insufficient task-specific parameters may result in under-adaptation — but the optimal task-specific configuration varies for different test tasks. Based on these findings, we propose the Task-aware Adaptive Network (TA2-Net), which is trained by reinforcement learning to adaptively estimate the optimal task-specific parameter configuration for each test task. |
Yurong Guo; Ruoyi Du; Yuan Dong; Timothy Hospedales; Yi-Zhe Song; Zhanyu Ma; |
1016 | Two Birds, One Stone: A Unified Framework for Joint Learning of Image and Video Style Transfers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to achieve satisfying image and video style transfers, two different models are inevitably required with separate training processes on image and video domains, respectively. In this paper, we show that this can be precluded by introducing UniST, a Unified Style Transfer framework for both images and videos. |
Bohai Gu; Heng Fan; Libo Zhang; |
1017 | Revisiting Domain-Adaptive 3D Object Detection By Reliable, Diverse and Class-balanced Pseudo-Labeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While effective, existing DA methods suffer from a substantial drop in performance when applied to a multi-class training setting, due to the co-existence of low-quality pseudo labels and class imbalance issues. In this paper, we address this challenge by proposing a novel ReDB framework tailored for learning to detect all classes at once. |
Zhuoxiao Chen; Yadan Luo; Zheng Wang; Mahsa Baktashmotlagh; Zi Huang; |
1018 | Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it directly fuses identical image semantics to prompts of different labels and significantly weakens the discrimination among different classes as shown in our experiments. Motivated by this observation, we first propose a class-aware text prompt (CTP) to enrich generated prompts with label-related image information. |
Sifan Long; Zhen Zhao; Junkun Yuan; Zichang Tan; Jiangjiang Liu; Luping Zhou; Shengsheng Wang; Jingdong Wang; |
1019 | Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the powerful generalization ability of the large Vision-Language Models (VLM) on classification and retrieval tasks, we propose an efficient Adaptive HOI Detector with Concept-guided Memory (ADA-CM). |
Ting Lei; Fabian Caba; Qingchao Chen; Hailin Jin; Yuxin Peng; Yang Liu; |
1020 | NeMF: Inverse Volume Rendering with Neural Microflake Field Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recovering the physical attributes of an object’s appearance from its images captured under an unknown illumination is challenging yet essential for photo-realistic rendering. Recent approaches adopt the emerging implicit scene representations and have shown impressive results. However, they unanimously adopt a surface-based representation, and hence cannot handle scenes with very complex geometry, translucent objects, and the like. In this paper, we propose to conduct inverse volume rendering, in contrast to surface-based approaches, by representing a scene using a microflake volume, which assumes the space is filled with infinitely small flakes and light reflects or scatters at each spatial location according to microflake distributions. |
Youjia Zhang; Teng Xu; Junqing Yu; Yuteng Ye; Yanqing Jing; Junle Wang; Jingyi Yu; Wei Yang; |
1021 | Attentive Mask CLIP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We hypothesize that removing a large portion of image tokens may inadvertently destroy the semantic information associated with a given text description, resulting in misaligned paired data in CLIP training. To address this issue, we propose an attentive token removal approach, which retains a small number of tokens that have a strong semantic correlation to the corresponding text description. |
Yifan Yang; Weiquan Huang; Yixuan Wei; Houwen Peng; Xinyang Jiang; Huiqiang Jiang; Fangyun Wei; Yin Wang; Han Hu; Lili Qiu; Yuqing Yang; |
1022 | DOLCE: A Model-Based Probabilistic Diffusion Framework for Limited-Angle CT Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present DOLCE as the first framework for integrating conditionally-trained diffusion models and explicit physical measurement models for solving imaging inverse problems. |
Jiaming Liu; Rushil Anirudh; Jayaraman J. Thiagarajan; Stewart He; K Aditya Mohan; Ulugbek S. Kamilov; Hyojin Kim; |
1023 | Beyond Image Borders: Learning Feature Extrapolation for Unbounded Image Composition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nonetheless, the synthesized extrapolated regions may be included in the cropped image, making the image composition result unrealistic and potentially degrading image quality. In this paper, we circumvent this issue by presenting a joint framework for both unbounded recommendation of camera view and image composition (i.e., UNIC). |
Xiaoyu Liu; Ming Liu; Junyi Li; Shuai Liu; Xiaotao Wang; Lei Lei; Wangmeng Zuo; |
1024 | MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop MasaCtrl, a tuning-free method to achieve consistent image generation and complex non-rigid image editing simultaneously. |
Mingdeng Cao; Xintao Wang; Zhongang Qi; Ying Shan; Xiaohu Qie; Yinqiang Zheng; |
1025 | Understanding Hessian Alignment for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite this success, our understanding of the role of Hessian and gradient alignment in domain generalization is still limited. To address this shortcoming, we analyze the role of the classifier’s head Hessian matrix and gradient in domain generalization using recent OoD theory of transferability. |
Sobhan Hemati; Guojun Zhang; Amir Estiri; Xi Chen; |
1026 | DeepChange: A Long-Term Person Re-Identification Benchmark with Clothes Change Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we contribute a large, realistic long-term person re-identification benchmark, termed DeepChange. |
Peng Xu; Xiatian Zhu; |
1027 | Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore finetuning a pretrained image diffusion model with video data as a practical solution for the video synthesis task. |
Songwei Ge; Seungjun Nah; Guilin Liu; Tyler Poon; Andrew Tao; Bryan Catanzaro; David Jacobs; Jia-Bin Huang; Ming-Yu Liu; Yogesh Balaji; |
1028 | Discrepant and Multi-Instance Proxies for Unsupervised Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To completely and accurately represent the information contained in a cluster and learn discriminative features, we propose to maintain discrepant cluster proxies and multi-instance proxies for a cluster. |
Chang Zou; Zeqi Chen; Zhichao Cui; Yuehu Liu; Chi Zhang; |
1029 | Joint-Relation Transformer for Multi-Person Motion Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the Joint-Relation Transformer, which utilizes relation information to enhance interaction modeling and improve future motion prediction. |
Qingyao Xu; Weibo Mao; Jingze Gong; Chenxin Xu; Siheng Chen; Weidi Xie; Ya Zhang; Yanfeng Wang; |
1030 | Revisiting Vision Transformer from The View of Path Ensemble Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Vision Transformers (ViTs) are normally regarded as a stack of transformer layers. In this work, we propose a novel view of ViTs showing that they can be seen as ensemble networks containing multiple parallel paths with different lengths. |
Shuning Chang; Pichao Wang; Hao Luo; Fan Wang; Mike Zheng Shou; |
1031 | Tetra-NeRF: Representing Neural Radiance Fields Using Tetrahedra Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on the observation that a (sparse) point cloud of the scene is often available, this paper proposes to use an adaptive representation based on tetrahedra obtained by Delaunay triangulation instead of uniform subdivision or point-based representations. |
Jonas Kulhanek; Torsten Sattler; |
1032 | TMA: Temporal Motion Aggregation for Event-based Optical Flow Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, most existing learning-based approaches for event optical flow estimation directly remould the paradigm of conventional images by representing the consecutive event stream as static frames, ignoring the inherent temporal continuity of event data. In this paper, we argue that temporal continuity is a vital element of event-based optical flow and propose a novel Temporal Motion Aggregation (TMA) approach to unlock its potential. |
Haotian Liu; Guang Chen; Sanqing Qu; Yanping Zhang; Zhijun Li; Alois Knoll; Changjun Jiang; |
1033 | Ablating Concepts in Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: How can we remove such copyrighted concepts or images without retraining the model from scratch? To achieve this goal, we propose an efficient method of ablating concepts in the pretrained model, i.e., preventing the generation of a target concept. |
Nupur Kumari; Bingliang Zhang; Sheng-Yu Wang; Eli Shechtman; Richard Zhang; Jun-Yan Zhu; |
1034 | Motion-Guided Masking for Spatiotemporal Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a motion-guided masking algorithm (MGM) which leverages motion vectors to guide the position of each mask over time. |
David Fan; Jue Wang; Shuai Liao; Yi Zhu; Vimal Bhat; Hector Santos-Villalobos; Rohith MV; Xinyu Li; |
1035 | MapFormer: Boosting Change Detection By Using Pre-change Information Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods for identifying semantically changed areas overlook the availability of semantic information in the form of existing maps describing features of the earth’s surface. In this paper, we leverage this information for change detection in bi-temporal images. |
Maximilian Bernhard; Niklas Strauß; Matthias Schubert; |
1036 | Masked Diffusion Transformer Is A Strong Image Synthesizer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite its success in image synthesis, we observe that diffusion probabilistic models (DPMs) often lack contextual reasoning ability to learn the relations among object parts in an image, leading to a slow learning process. To solve this issue, we propose a Masked Diffusion Transformer (MDT) that introduces a mask latent modeling scheme to explicitly enhance the DPMs’ ability to learn contextual relations among object semantic parts in an image. |
Shanghua Gao; Pan Zhou; Ming-Ming Cheng; Shuicheng Yan; |
1037 | LightDepth: Single-View Depth Self-Supervision from Illumination Decline Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose a single-view self-supervised method that achieves a performance similar to the supervised case. |
Javier Rodríguez-Puigvert; Víctor M. Batlle; J.M.M. Montiel; Ruben Martinez-Cantin; Pascal Fua; Juan D. Tardós; Javier Civera; |
1038 | Urban Radiance Field Representation with Deformable Neural Mesh Primitives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most current methods still require intensive resources due to ray marching-based rendering. To construct urban-level radiance fields efficiently, we design Deformable Neural Mesh Primitive (DNMP), and propose to parameterize the entire scene with such primitives. |
Fan Lu; Yan Xu; Guang Chen; Hongsheng Li; Kwan-Yee Lin; Changjun Jiang; |
1039 | Adaptive Frequency Filters As Efficient Global Token Mixers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, their efficient deployments, especially on mobile devices, still suffer from noteworthy challenges due to the heavy computational costs of self-attention mechanisms, large kernels, or fully connected layers. In this work, we apply the conventional convolution theorem to deep learning to address this, and reveal that adaptive frequency filters can serve as efficient global token mixers. |
Zhipeng Huang; Zhizheng Zhang; Cuiling Lan; Zheng-Jun Zha; Yan Lu; Baining Guo; |
1040 | Referring Image Segmentation Using Text Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we observe that the referring texts used in RIS already provide sufficient information to localize the target object. |
Fang Liu; Yuhao Liu; Yuqiu Kong; Ke Xu; Lihe Zhang; Baocai Yin; Gerhard Hancke; Rynson Lau; |
1041 | Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new camera model and a novel 2D representation, termed distortion image, which describes the 2D dense distortion scale of the human body. |
Wenjia Wang; Yongtao Ge; Haiyi Mei; Zhongang Cai; Qingping Sun; Yanjun Wang; Chunhua Shen; Lei Yang; Taku Komura; |
1042 | Once Detected, Never Lost: Surpassing Human Performance in Offline LiDAR Based 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper aims for high-performance offline LiDAR-based 3D object detection. |
Lue Fan; Yuxue Yang; Yiming Mao; Feng Wang; Yuntao Chen; Naiyan Wang; Zhaoxiang Zhang; |
1043 | Building A Winning Team: Selecting Source Model Ensembles Using A Submodular Transferability Estimation Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, they overlook the most important factor while selecting the source models, viz., the cohesiveness factor between them, which can impact the performance and confidence in the prediction of the ensemble. To address these gaps, we propose a novel Optimal tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the transferability of an ensemble of models to a downstream task. |
Vimal K B; Saketh Bachu; Tanmay Garg; Niveditha Lakshmi Narasimhan; Raghavan Konuru; Vineeth N Balasubramanian; |
1044 | Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we exploit temporal redundancy between subsequent inputs to reduce the cost of Transformers for video processing. |
Matthew Dutson; Yin Li; Mohit Gupta; |
1045 | Plausible Uncertainties for Human Pose Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose a straightforward human pose regression framework to examine the behavior of two established methods for simultaneous aleatoric and epistemic uncertainty estimation: maximum a-posteriori (MAP) estimation with Monte-Carlo variational inference and deep evidential regression (DER). |
Lennart Bramlage; Michelle Karg; Cristóbal Curio; |
1046 | Beyond One-to-One: Rethinking The Referring Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a result, such methods fail when the expressions refer to either no objects or multiple objects. In this paper, we address this issue from two perspectives. |
Yutao Hu; Qixiong Wang; Wenqi Shao; Enze Xie; Zhenguo Li; Jungong Han; Ping Luo; |
1047 | Robust Referring Video Object Segmentation with Cyclic Structural Consensus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we highlight the need for a robust R-VOS model that can handle semantic mismatches. |
Xiang Li; Jinglu Wang; Xiaohao Xu; Xiao Li; Bhiksha Raj; Yan Lu; |
1048 | DiffIR: Efficient Diffusion Model for Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, for IR, it is inefficient for traditional DMs to run massive iterations on a large model to estimate whole images or feature maps. To address this issue, we propose an efficient DM for IR (DiffIR), which consists of a compact IR prior extraction network (CPEN), a dynamic IR transformer (DIRformer), and a denoising network. |
Bin Xia; Yulun Zhang; Shiyin Wang; Yitong Wang; Xinglong Wu; Yapeng Tian; Wenming Yang; Luc Van Gool; |
1049 | MoreauGrad: Sparse and Robust Interpretation of Neural Networks Via Moreau Envelope Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose MoreauGrad as an interpretation scheme based on the classifier neural net’s Moreau envelope. |
Jingwei Zhang; Farzan Farnia; |
1050 | Building Bridge Across The Time: Disruption and Restoration of Murals In The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the mural-restoration task, which aims to detect damaged regions in the mural and repaint them automatically. |
Huiyang Shao; Qianqian Xu; Peisong Wen; Peifeng Gao; Zhiyong Yang; Qingming Huang; |
1051 | Class-Incremental Grouping Network for Continual Audio-Visual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While previous methods have focused on using either regularization or rehearsal-based frameworks to alleviate catastrophic forgetting in image classification, they are limited to a single modality and cannot learn compact class-aware cross-modal representations for continual audio-visual learning. To address this gap, we propose a novel class-incremental grouping network (CIGN) that can learn category-wise semantic features to achieve continual audio-visual learning. |
Shentong Mo; Weiguo Pian; Yapeng Tian; |
1052 | Neural Haircut: Prior-Guided Strand-Based Hair Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes an approach capable of accurate hair geometry reconstruction at a strand level from a monocular video or multi-view images captured in uncontrolled lighting conditions. |
Vanessa Sklyarova; Jenya Chelishev; Andreea Dogaru; Igor Medvedev; Victor Lempitsky; Egor Zakharov; |
1053 | Improving Sample Quality of Diffusion Models Using Self-Attention Guidance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a more comprehensive perspective that goes beyond the traditional guidance methods. |
Susung Hong; Gyuseong Lee; Wooseok Jang; Seungryong Kim; |
1054 | Evaluating Data Attribution for Text-to-Image Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As an initial step toward this problem, we evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style. Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction. |
Sheng-Yu Wang; Alexei A. Efros; Jun-Yan Zhu; Richard Zhang; |
1055 | Delta Denoising Score Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Delta Denoising Score (DDS), a novel diffusion-based scoring technique that optimizes a parametric model for the task of image editing. |
Amir Hertz; Kfir Aberman; Daniel Cohen-Or; |
1056 | Hierarchical Prior Mining for Non-local Multi-View Stereo Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a Hierarchical Prior Mining for Non-local Multi-View Stereo (HPM-MVS). |
Chunlin Ren; Qingshan Xu; Shikun Zhang; Jiaqi Yang; |
1057 | Generative Multiplane Neural Radiance for 3D-Aware Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a method to efficiently generate 3D-aware high-resolution images that are view-consistent across multiple target views. |
Amandeep Kumar; Ankan Kumar Bhunia; Sanath Narayan; Hisham Cholakkal; Rao Muhammad Anwer; Salman Khan; Ming-Hsuan Yang; Fahad Shahbaz Khan; |
1058 | DG-Recon: Depth-Guided Neural 3D Scene Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A key challenge in neural 3D scene reconstruction from monocular images is to fuse features back projected from various views without any depth or occlusion information. We address this by leveraging monocular depth priors, which effectively guide the fusion to improve surface prediction and skip over irrelevant, ambiguous, or occluded features. |
Jihong Ju; Ching Wei Tseng; Oleksandr Bailo; Georgi Dikov; Mohsen Ghafoorian; |
1059 | Simple Baselines for Interactive Video Retrieval with Questions and Answers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recently, there has been renewed interest in interactive systems to enhance retrieval, but existing approaches are complex and deliver limited gains in performance. In this work, we revisit this topic and propose several simple yet effective baselines for interactive video retrieval via question-answering. |
Kaiqu Liang; Samuel Albanie; |
1060 | MSRA-SR: Image Super-resolution Transformer with Multi-scale Shared Representation Acquisition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an image super-resolution Transformer with Multi-scale Shared Representation Acquisition (MSRA-SR). |
Xiaoqiang Zhou; Huaibo Huang; Ran He; Zilei Wang; Jie Hu; Tieniu Tan; |
1061 | The Stable Signature: Rooting Watermarks in Latent Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces an active strategy combining image watermarking and Latent Diffusion Models. |
Pierre Fernandez; Guillaume Couairon; Hervé Jégou; Matthijs Douze; Teddy Furon; |
1062 | Boosting Semantic Segmentation from The Perspective of Explicit Class Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the mechanism of class embeddings and gain the insight that more explicit and meaningful class embeddings can be purposely generated from class masks. |
Yuhe Liu; Chuanjian Liu; Kai Han; Quan Tang; Zengchang Qin; |
1063 | Going Denser with Open-Vocabulary Part Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a detector with the ability to predict both open-vocabulary objects and their part segmentation. |
Peize Sun; Shoufa Chen; Chenchen Zhu; Fanyi Xiao; Ping Luo; Saining Xie; Zhicheng Yan; |
1064 | Learning to Identify Critical States for Reinforcement Learning from Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Without relying on ground-truth annotations, our new method, called Deep State Identifier, learns to predict returns from episodes encoded as videos. It then uses a form of mask-based sensitivity analysis to identify the important critical states. |
Haozhe Liu; Mingchen Zhuge; Bing Li; Yuhui Wang; Francesco Faccio; Bernard Ghanem; Jürgen Schmidhuber; |
1065 | Editing Implicit Assumptions in Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to edit a given implicit assumption in a pre-trained diffusion model. |
Hadas Orgad; Bahjat Kawar; Yonatan Belinkov; |
1066 | OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D Via RF-Vision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to exploit Radio-Frequency-Vision (RF-vision), which can bypass obstacles, to achieve occluded HPE, and we introduce OCHID-Fi as the first RF-HPE method with 3D pose estimation capability. |
Shujie Zhang; Tianyue Zheng; Zhe Chen; Jingzhi Hu; Abdelwahed Khamis; Jiajun Liu; Jun Luo; |
1067 | Conceptual and Hierarchical Latent Space Decomposition for Face Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an encoder-decoder model that decomposes the entangled GAN space into a conceptual and hierarchical latent space in a self-supervised manner. |
Savas Ozkan; Mete Ozay; Tom Robinson; |
1068 | VL-Match: Enhancing Vision-Language Pretraining with Token-Level and Instance-Level Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose VL-Match, a Vision-Language framework with Enhanced Token-level and Instance-level Matching. |
Junyu Bi; Daixuan Cheng; Ping Yao; Bochen Pang; Yuefeng Zhan; Chuanguang Yang; Yujing Wang; Hao Sun; Weiwei Deng; Qi Zhang; |
1069 | Reconstructing Interacting Hands with Interaction Prior from Monocular Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our key idea is to first construct a two-hand interaction prior and recast the interaction reconstruction task as the conditional sampling from the prior. |
Binghui Zuo; Zimeng Zhao; Wenqian Sun; Wei Xie; Zhou Xue; Yangang Wang; |
1070 | Towards Realistic Evaluation of Industrial Continual Learning Scenarios with An Emphasis on Energy Consumption and Computational Footprint Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce InVar-100 (Industrial Objects in Varied Contexts), a novel dataset designed to simulate the visual environments of industrial setups, on which we perform various IL experiments. |
Vivek Chavan; Paul Koch; Marian Schlüter; Clemens Briese; |
1071 | Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, our work establishes a unified representation of both data domains by projecting both Euclidean and non-Euclidean data into an integer series called RoadNet Sequence. |
Jiachen Lu; Renyuan Peng; Xinyue Cai; Hang Xu; Hongyang Li; Feng Wen; Wei Zhang; Li Zhang; |
1072 | How Much Temporal Long-Term Context Is Needed for Action Segmentation? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we try to answer how much long-term temporal context is required for temporal action segmentation by introducing a transformer-based model that leverages sparse attention to capture the full context of a video. |
Emad Bahrami; Gianpiero Francesca; Juergen Gall; |
1073 | 3D VR Sketch Guided 3D Shape Prototyping and Exploration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To facilitate 3D shape modeling, we propose a 3D shape generation network that takes a 3D VR sketch as a condition. |
Ling Luo; Pinaki Nath Chowdhury; Tao Xiang; Yi-Zhe Song; Yulia Gryaditskaya; |
1074 | Seal-3D: Interactive Pixel-Level Editing for Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To bridge the gap, we propose a new interactive editing method and system for implicit representations, called Seal-3D, which allows users to edit NeRF models freely at the pixel level with a wide range of NeRF-like backbones and to preview the editing effects instantly. |
Xiangyu Wang; Jingsen Zhu; Qi Ye; Yuchi Huo; Yunlong Ran; Zhihua Zhong; Jiming Chen; |
1075 | Generative Novel View Synthesis with 3D-Aware Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image. |
Eric R. Chan; Koki Nagano; Matthew A. Chan; Alexander W. Bergman; Jeong Joon Park; Axel Levy; Miika Aittala; Shalini De Mello; Tero Karras; Gordon Wetzstein; |
1076 | MDCS: More Diverse Experts with Consistency Self-distillation for Long-tailed Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose More Diverse experts with Consistency Self-distillation (MDCS) to bridge the gap left by earlier methods. |
Qihao Zhao; Chen Jiang; Wei Hu; Fan Zhang; Jun Liu; |
1077 | Similarity Min-Max: Zero-Shot Day-Night Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper tackles a more complicated scenario with broader applicability, i.e., zero-shot day-night domain adaptation, which eliminates reliance on any nighttime data. Unlike prior zero-shot adaptation approaches emphasizing either image-level translation or model-level adaptation, we propose a similarity min-max paradigm that considers them under a unified framework. |
Rundong Luo; Wenjing Wang; Wenhan Yang; Jiaying Liu; |
1078 | Dark Side Augmentation: Generating Diverse Night Examples for Metric Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further increase the variability in the training examples and to maximize the generalization of the trained model, we propose a novel method of diverse anchor mining. |
Albert Mohwald; Tomas Jenicek; Ondřej Chum; |
1079 | NeRF-MS: Neural Radiance Fields with Multi-Sequence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Multi-sequence data poses two main challenges: appearance variation due to different lighting conditions and non-static objects such as pedestrians. To address these issues, we propose NeRF-MS, a novel approach to training NeRF with multi-sequence data. |
Peihao Li; Shaohui Wang; Chen Yang; Bingbing Liu; Weichao Qiu; Haoqian Wang; |
1080 | LVOS: A Benchmark for Long-term Video Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we therefore present a new benchmark dataset and evaluation methodology named LVOS, which consists of 220 videos with a total duration of 421 minutes. |
Lingyi Hong; Wenchao Chen; Zhongying Liu; Wei Zhang; Pinxue Guo; Zhaoyu Chen; Wenqiang Zhang; |
1081 | Diffusion Model As Representation Learner Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct an in-depth investigation of the representation power of DPMs, and propose a novel knowledge transfer method that leverages the knowledge acquired by generative DPMs for analytical tasks. |
Xingyi Yang; Xinchao Wang; |
1082 | Nerfbusters: Removing Ghostly Artifacts from Casually Captured NeRFs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, we propose a 3D diffusion-based method that leverages local 3D priors and a novel density-based score distillation sampling loss to discourage artifacts during NeRF optimization. |
Frederik Warburg; Ethan Weber; Matthew Tancik; Aleksander Holynski; Angjoo Kanazawa; |
1083 | Document Understanding Dataset and Evaluation (DUDE) Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new dataset with novelties related to types of questions, answers, and document layouts based on multi-industry, multi-domain, and multi-page VRDs of various origins and dates. |
Jordy Van Landeghem; Rubèn Tito; Łukasz Borchmann; Michał Pietruszka; Pawel Joziak; Rafal Powalski; Dawid Jurkiewicz; Mickael Coustaty; Bertrand Anckaert; Ernest Valveny; Matthew Blaschko; Sien Moens; Tomasz Stanislawek; |
1084 | ALWOD: Active Learning for Weakly-Supervised Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Object detection (OD), a crucial vision task, remains challenged by the lack of large training datasets with precise object localization labels. In this work, we propose ALWOD, a new framework that addresses this problem by fusing active learning (AL) with weakly and semi-supervised object detection paradigms. |
Yuting Wang; Velibor Ilic; Jiatong Li; Branislav Kisačanin; Vladimir Pavlovic; |
1085 | Prototypical Kernel Learning and Open-set Foreground Perception for Generalized Few-shot Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address the aforementioned problems by jointly leveraging prototypical kernel learning and open-set foreground perception. |
Kai Huang; Feigege Wang; Ye Xi; Yutao Gao; |
1086 | Simple and Effective Out-of-Distribution Detection Via Cosine-based Softmax Loss Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple and effective OOD detection method by combining the feature norm and the Mahalanobis distance obtained from classification models trained with the cosine-based softmax loss. |
SoonCheol Noh; DongEon Jeong; Jee-Hyong Lee; |
1087 | CFCG: Semi-Supervised Semantic Segmentation Via Cross-Fusion and Contour Guidance Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although the performance is desirable, many issues remain: (1) supervision from a single learner tends to be noisy, which causes unreliable consistency regularization; (2) existing pixel-wise confidence-score-based reliability measurements cause potential error accumulation as training proceeds. In this paper, we propose a novel SSSS framework, called CFCG, which combines cross-fusion and contour guidance supervision to tackle these issues. |
Shuo Li; Yue He; Weiming Zhang; Wei Zhang; Xiao Tan; Junyu Han; Errui Ding; Jingdong Wang; |
1088 | CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce CHAMPAGNE, a generative model of conversations that can account for visual contexts. |
Seungju Han; Jack Hessel; Nouha Dziri; Yejin Choi; Youngjae Yu; |
1089 | SLAN: Self-Locator Aided Network for Vision-Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing works are either limited by the text-agnostic and redundant regions obtained with frozen detectors, or fail to scale further due to their heavy reliance on scarce grounding (gold) data to pre-train detectors. To solve these problems, we propose the Self-Locator Aided Network (SLAN) for vision-language understanding tasks without any extra gold data. |
Jiang-Tian Zhai; Qi Zhang; Tong Wu; Xing-Yu Chen; Jiang-Jiang Liu; Ming-Ming Cheng; |
1090 | S-VolSDF: Sparse Multi-View Stereo Regularization of Neural Implicit Surfaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We thus propose to regularize neural rendering optimization with an MVS solution. |
Haoyu Wu; Alexandros Graikos; Dimitris Samaras; |
1091 | Anomaly Detection Using Score-based Perturbation Resilience Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel unsupervised anomaly detection method leveraging the score-based model. |
Woosang Shin; Jonghyeon Lee; Taehan Lee; Sangmoon Lee; Jong Pil Yun; |
1092 | Generating Visual Scenes from Touch Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We draw on recent advances in latent diffusion to create a model for synthesizing images from tactile signals (and vice versa) and apply it to a number of visuo-tactile synthesis tasks. |
Fengyu Yang; Jiacheng Zhang; Andrew Owens; |
1093 | DeformToon3D: Deformable Neural Radiance Fields for 3D Toonification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the challenging problem of 3D toonification, which involves transferring the style of an artistic domain onto a target 3D face with stylized geometry and texture. |
Junzhe Zhang; Yushi Lan; Shuai Yang; Fangzhou Hong; Quan Wang; Chai Kiat Yeo; Ziwei Liu; Chen Change Loy; |
1094 | SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present SatlasPretrain, a remote sensing dataset that is large in both breadth and scale, combining Sentinel-2 and NAIP images with 302M labels under 137 categories and seven label types. |
Favyen Bastani; Piper Wolters; Ritwik Gupta; Joe Ferdinando; Aniruddha Kembhavi; |
1095 | Empowering Low-Light Image Enhancer Through Customized Learnable Priors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a paradigm for low-light image enhancement that explores the potential of customized learnable priors to improve the transparency of the deep unfolding paradigm. |
Naishan Zheng; Man Zhou; Yanmeng Dong; Xiangyu Rui; Jie Huang; Chongyi Li; Feng Zhao; |
1096 | TextManiA: Enriching Visual Feature By Text-driven Manifold Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose TextManiA, a text-driven manifold augmentation method that semantically enriches visual feature spaces, regardless of class distribution. |
Moon Ye-Bin; Jisoo Kim; Hongyeob Kim; Kilho Son; Tae-Hyun Oh; |
1097 | Guiding Image Captioning Models Toward More Specific Captions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that it is possible to generate more specific captions with minimal changes to the training process. |
Simon Kornblith; Lala Li; Zirui Wang; Thao Nguyen; |
1098 | Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce WHOOPS!, a new dataset and benchmark for visual commonsense. The dataset is composed of purposefully commonsense-defying images created by designers using publicly available image generation tools such as Midjourney. |
Nitzan Bitton-Guetta; Yonatan Bitton; Jack Hessel; Ludwig Schmidt; Yuval Elovici; Gabriel Stanovsky; Roy Schwartz; |
1099 | Consistent Depth Prediction for Transparent Object Reconstruction from RGB-D Camera Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a real-time reconstruction method using a novel stereo-based depth prediction network to keep the consistency of depth prediction in a sequence of images. |
Yuxiang Cai; Yifan Zhu; Haiwei Zhang; Bo Ren; |
1100 | DReg-NeRF: Deep Registration for Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unlike the existing work NeRF2NeRF, which is based on traditional optimization methods and requires human-annotated keypoints, we propose DReg-NeRF to solve the NeRF registration problem on object-centric scenes without human intervention. |
Yu Chen; Gim Hee Lee; |
1101 | DETR Does Not Need Multi-Scale or Locality Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an improved DETR detector that maintains a "plain" nature: using a single-scale feature map and global cross-attention calculations without specific locality constraints, in contrast to previous leading DETR-based detectors that reintroduce architectural inductive biases of multi-scale and locality into the decoder. |
Yutong Lin; Yuhui Yuan; Zheng Zhang; Chen Li; Nanning Zheng; Han Hu; |
1102 | Towards Effective Instance Discrimination Contrastive Loss for Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose class-relationship-enhanced features, which use probability-weighted class prototypes as the input features of the IDCo loss and can implicitly transfer the domain-invariant class relationships. |
Yixin Zhang; Zilei Wang; Junjie Li; Jiafan Zhuang; Zihan Lin; |
1103 | ClusT3: Information Invariant Test-Time Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel unsupervised TTT technique based on the maximization of Mutual Information between multi-scale feature maps and a discrete latent representation, which can be integrated into standard training as an auxiliary clustering task. |
Gustavo A. Vargas Hakim; David Osowiechi; Mehrdad Noori; Milad Cheraghalikhani; Ali Bahri; Ismail Ben Ayed; Christian Desrosiers; |
1104 | FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, robust monocular depth estimation models trained with large-scale datasets have been proven to possess a weak 3D geometry prior, but they are insufficient for reconstruction due to the unknown camera parameters, the affine-invariant property, and inter-frame inconsistency. To address these issues, we propose a novel test-time optimization approach that can transfer the robustness of affine-invariant depth models such as LeReS to challenging diverse scenes while ensuring inter-frame consistency, with only dozens of parameters to optimize per video frame. |
Guangkai Xu; Wei Yin; Hao Chen; Chunhua Shen; Kai Cheng; Feng Zhao; |
1105 | Affective Image Filter: Reflecting Emotions from Text to Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Understanding the emotions in text and presenting them visually is a very challenging problem that requires a deep understanding of natural language and high-quality image synthesis simultaneously. In this work, we propose the Affective Image Filter (AIF), a novel model that understands the visually abstract emotions in text and reflects them in visually concrete images with appropriate colors and textures. |
Shuchen Weng; Peixuan Zhang; Zheng Chang; Xinlong Wang; Si Li; Boxin Shi; |
1106 | Content-Aware Local GAN for Photo-Realistic Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, natural images have a complex distribution in the real world, and a single classifier in the discriminator may not have enough capacity to classify real and fake samples, causing the preceding SR network to generate unpleasant noise and artifacts. To solve this problem, we propose a novel content-aware local GAN framework, CAL-GAN, which processes the large and complicated distribution of real-world images by dividing them into smaller subsets based on similar contents. |
JoonKyu Park; Sanghyun Son; Kyoung Mu Lee; |
1107 | Structure-Aware Surface Reconstruction Via Primitive Assembly Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel and efficient method for reconstructing manifold surfaces from point clouds. |
Jingen Jiang; Mingyang Zhao; Shiqing Xin; Yanchao Yang; Hanxiao Wang; Xiaohong Jia; Dong-Ming Yan; |
1108 | FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Generating full-body and multi-genre dance sequences from given music is a challenging task, due to the limitations of existing datasets and the inherent complexity of the fine-grained hand motion and dance genres. To address these problems, we propose FineDance, which contains 14.6 hours of music-dance paired data, with fine-grained hand motions, fine-grained genres (22 dance genres), and accurate posture. |
Ronghui Li; Junfan Zhao; Yachao Zhang; Mingyang Su; Zeping Ren; Han Zhang; Yansong Tang; Xiu Li; |
1109 | AssetField: Assets Mining and Reconfiguration in Ground Feature Plane Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Traditional modeling pipelines keep an asset library storing unique object templates, which is both versatile and memory efficient in practice. Inspired by this observation, we propose AssetField, a novel neural scene representation that learns a set of object-aware ground feature planes to represent the scene, where an asset library storing template feature patches can be constructed in an unsupervised manner. |
Yuanbo Xiangli; Linning Xu; Xingang Pan; Nanxuan Zhao; Bo Dai; Dahua Lin; |
1110 | Improving Online Lane Graph Extraction By Object-Lane Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an architecture and loss formulation to improve the accuracy of local lane graph estimates by using 3D object detection outputs. |
Yigit Baran Can; Alexander Liniger; Danda Pani Paudel; Luc Van Gool; |
1111 | SAGA: Spectral Adversarial Geometric Attack on 3D Meshes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel framework for a geometric adversarial attack on a 3D mesh autoencoder. |
Tomer Stolik; Itai Lang; Shai Avidan; |
1112 | All in Tokens: Unifying Output Space of Visual Tasks Via Soft Token Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce AiT, a unified output representation for various vision tasks, which is a crucial step towards general-purpose vision task solvers. |
Jia Ning; Chen Li; Zheng Zhang; Chunyu Wang; Zigang Geng; Qi Dai; Kun He; Han Hu; |
1113 | Learning Navigational Visual Representations with Semantic Map Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the way humans naturally build semantically and spatially meaningful cognitive maps in their brains during navigation, in this paper we propose a novel navigation-specific visual representation learning method that contrasts the agent’s egocentric views with semantic maps (Ego^2-Map). |
Yicong Hong; Yang Zhou; Ruiyi Zhang; Franck Dernoncourt; Trung Bui; Stephen Gould; Hao Tan; |
1114 | LDL: Line Distance Functions for Panoramic Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce LDL, a fast and robust algorithm that localizes a panorama to a 3D map using line segments. |
Junho Kim; Changwoon Choi; Hojun Jang; Young Min Kim; |
1115 | TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a transferable Transformer-based image compression framework, termed TransTIC. |
Yi-Hsin Chen; Ying-Chieh Weng; Chia-Hao Kao; Cheng Chien; Wei-Chen Chiu; Wen-Hsiao Peng; |
1116 | CHORUS : Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method for teaching machines to understand and model the underlying spatial common sense of diverse human-object interactions in 3D in a self-supervised way. |
Sookwan Han; Hanbyul Joo; |
1117 | Shortcut-V2V: Compression Framework for Video-to-Video Translation Based on Temporal Redundancy Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, we present Shortcut-V2V, a general-purpose compression framework for video-to-video translation. |
Chaeyeon Chung; Yeojeong Park; Seunghwan Choi; Munkhsoyol Ganbat; Jaegul Choo; |
1118 | ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The first key contribution of this work is to demonstrate through systematic evaluations that as the pairwise complexity of the training dataset increases, standard VLMs struggle to learn region-attribute relationships, exhibiting performance degradations of up to 37% on retrieval tasks. |
Maya Varma; Jean-Benoit Delbrouck; Sarah Hooper; Akshay Chaudhari; Curtis Langlotz; |
1119 | SG-Former: Self-guided Transformer with Evolving Token Reallocation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel model, termed as Self-guided Transformer (SG-Former), towards effective global self-attention with adaptive fine granularity. |
Sucheng Ren; Xingyi Yang; Songhua Liu; Xinchao Wang; |
1120 | Towards Unifying Medical Vision-and-Language Pre-Training Via Soft Prompts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The former is superior at multi-modal tasks owing to the sufficient interaction between modalities; the latter is good at uni-modal and cross-modal tasks due to the single-modality encoding ability. To take advantage of these two types, we propose an effective yet straightforward scheme named PTUnifier to unify the two types. |
Zhihong Chen; Shizhe Diao; Benyou Wang; Guanbin Li; Xiang Wan; |
1121 | A Large-scale Study of Spatiotemporal Representation Learning with A New Benchmark on Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To comprehensively probe the effectiveness of spatiotemporal representation learning, we introduce BEAR, a new BEnchmark on video Action Recognition. |
Andong Deng; Taojiannan Yang; Chen Chen; |
1122 | Video Background Music Generation: Dataset, Method and Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is a challenging task since it requires music-video datasets, efficient architectures for video-to-music generation, and reasonable metrics, none of which currently exist. To close this gap, we introduce a complete recipe including dataset, benchmark model, and evaluation metric for video background music generation. |
Le Zhuo; Zhaokai Wang; Baisen Wang; Yue Liao; Chenxi Bao; Stanley Peng; Songhao Han; Aixi Zhang; Fei Fang; Si Liu; |
1123 | HoloFusion: Towards Photo-realistic 3D Generative Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Diffusion-based image generators can now produce high-quality and diverse samples, but their success has yet to fully translate to 3D generation: existing diffusion methods can either generate low-resolution but 3D consistent outputs, or detailed 2D views of 3D objects with potential structural defects and lacking either view consistency or realism. We present HoloFusion, a method that combines the best of these approaches to produce high-fidelity, plausible, and diverse 3D samples while learning from a collection of multi-view 2D images only. |
Animesh Karnewar; Niloy J. Mitra; Andrea Vedaldi; David Novotny; |
1124 | ProtoTransfer: Cross-Modal Prototype Transfer for Point Cloud Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach, named ProtoTransfer, which not only fully exploits image representations but also transfers the learned multi-modal knowledge to all point cloud features. |
Pin Tang; Hai-Ming Xu; Chao Ma; |
1125 | Improving Continuous Sign Language Recognition with Cross-Lingual Signs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the feasibility of utilizing multilingual sign language corpora to facilitate monolingual CSLR. |
Fangyun Wei; Yutong Chen; |
1126 | Markov Game Video Augmentation for Action Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses data augmentation for action segmentation. Our key novelty is that we augment the original training videos in the deep feature space, not in the visual spatiotemporal domain as done by previous work. |
Nicolas Aziere; Sinisa Todorovic; |
1127 | Deep Image Harmonization with Globally Guided Feature Transformation and Relation Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that using global information to guide foreground feature transformation could achieve significant improvement. |
Li Niu; Linfeng Tan; Xinhao Tao; Junyan Cao; Fengjun Guo; Teng Long; Liqing Zhang; |
1128 | TransIFF: An Instance-Level Feature Fusion Framework for Vehicle-Infrastructure Cooperative 3D Detection with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose TransIFF, an instance-level feature fusion framework with transformers that can effectively reduce bandwidth usage. |
Ziming Chen; Yifeng Shi; Jinrang Jia; |
1129 | RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, a projection-aware hierarchical transformer is proposed to capture long-range dependencies and filter outliers by extracting point features globally. |
Jiuming Liu; Guangming Wang; Zhe Liu; Chaokang Jiang; Marc Pollefeys; Hesheng Wang; |
1130 | Masked Retraining Teacher-Student Framework for Domain Adaptive Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite their great success, they suffer from the limited number of pseudo boxes and from incorrect predictions caused by the domain shift, which mislead the student model into sub-optimal results. To mitigate this problem, we propose the Masked Retraining Teacher-student framework (MRT), which leverages a masked autoencoder and a selective retraining mechanism on the detection transformer. |
Zijing Zhao; Sitong Wei; Qingchao Chen; Dehui Li; Yifan Yang; Yuxin Peng; Yang Liu; |
1131 | Prune Spatio-temporal Tokens By Semantic-aware Temporal Accumulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To optimize the speed-accuracy trade-off, we propose Semantic-aware Temporal Accumulation score (STA) to prune spatio-temporal tokens integrally. |
Shuangrui Ding; Peisen Zhao; Xiaopeng Zhang; Rui Qian; Hongkai Xiong; Qi Tian; |
1132 | VQ3D: Learning A 3D-Aware Generative Model on ImageNet Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To model diverse and unconstrained image collections such as ImageNet, we present VQ3D, which introduces a NeRF-based decoder into a two-stage vector-quantized autoencoder. |
Kyle Sargent; Jing Yu Koh; Han Zhang; Huiwen Chang; Charles Herrmann; Pratul Srinivasan; Jiajun Wu; Deqing Sun; |
1133 | Growing A Brain with Sparsity-Inducing Generation for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, using the fixed old knowledge might act as an obstacle to capturing novel representations. To overcome this limitation, we propose a framework that evolves the previously allocated parameters by absorbing the knowledge of the new task. |
Hyundong Jin; Gyeong-hyeon Kim; Chanho Ahn; Eunwoo Kim; |
1134 | Cross-Ray Neural Radiance Fields for Novel-View Synthesis from Unconstrained Image Collections Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mimic the perception process of humans, in this paper, we propose Cross-Ray NeRF (CR-NeRF) that leverages interactive information across multiple rays to synthesize occlusion-free novel views with the same appearances as the images. |
Yifan Yang; Shuhai Zhang; Zixiong Huang; Yubing Zhang; Mingkui Tan; |
1135 | Graphics2RAW: Mapping Computer Graphics Images to Sensor RAW Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce a framework to process graphics images to mimic RAW sensor images accurately. |
Donghwan Seo; Abhijith Punnappurath; Luxi Zhao; Abdelrahman Abdelhamed; Sai Kiran Tedla; Sanguk Park; Jihwan Choe; Michael S. Brown; |
1136 | SPACE: Speech-driven Portrait Animation with Controllable Expression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present SPACE, which uses speech and a single image to generate high-resolution, expressive videos with realistic head pose, without requiring a driving video. |
Siddharth Gururani; Arun Mallya; Ting-Chun Wang; Rafael Valle; Ming-Yu Liu; |
1137 | 2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a transformer model with two encoders and one decoder for weakly supervised point cloud segmentation using only scene-level class tags. |
Cheng-Kun Yang; Min-Hung Chen; Yung-Yu Chuang; Yen-Yu Lin; |
1138 | Collecting The Puzzle Pieces: Disentangled Self-Driven Human Pose Transfer By Permuting Textures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Pose Transfer by Permuting Textures, a self-driven human pose transfer approach that disentangles pose from texture at the patch-level. |
Nannan Li; Kevin J Shih; Bryan A. Plummer; |
1139 | VAD: Vectorized Scene Representation for Efficient Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose VAD, an end-to-end vectorized paradigm for autonomous driving, which models the driving scene as a fully vectorized representation. |
Bo Jiang; Shaoyu Chen; Qing Xu; Bencheng Liao; Jiajie Chen; Helong Zhou; Qian Zhang; Wenyu Liu; Chang Huang; Xinggang Wang; |
1140 | End-to-end 3D Tracking with Decoupled Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present an end-to-end framework for camera-based 3D multi-object tracking, called DQTrack. |
Yanwei Li; Zhiding Yu; Jonah Philion; Anima Anandkumar; Sanja Fidler; Jiaya Jia; Jose Alvarez; |
1141 | Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The images and sounds that we perceive undergo subtle but geometrically consistent changes as we rotate our heads. In this paper, we use these cues to solve a problem we call Sound Localization from Motion (SLfM): jointly estimating camera rotation and localizing sound sources. |
Ziyang Chen; Shengyi Qian; Andrew Owens; |
1142 | Batch-based Model Registration for Fast 3D Sherd Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to develop a portable, high-throughput, and accurate reconstruction system for efficient digitization of fragments excavated in archaeological sites. |
Jiepeng Wang; Congyi Zhang; Peng Wang; Xin Li; Peter J. Cobb; Christian Theobalt; Wenping Wang; |
1143 | HiFace: High-Fidelity 3D Face Reconstruction By Learning Static and Dynamic Details Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim at high-fidelity 3D face reconstruction and propose HiFace to explicitly model the static and dynamic details. |
Zenghao Chai; Tianke Zhang; Tianyu He; Xu Tan; Tadas Baltrusaitis; HsiangTao Wu; Runnan Li; Sheng Zhao; Chun Yuan; Jiang Bian; |
1144 | Fast and Accurate Transferability Measurement By Evaluating Intra-class Feature Variance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose TMI (Transferability Measurement with Intra-class feature variance), a fast and accurate algorithm to measure transferability. |
Huiwen Xu; U Kang; |
1145 | Deformable Model-Driven Neural Rendering for High-Fidelity 3D Reconstruction of Human Heads Under Low-View Settings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Reconstructing 3D human heads in low-view settings presents technical challenges, mainly due to the pronounced risk of overfitting with limited views and high-frequency signals. To address this, we propose geometry decomposition and adopt a two-stage, coarse-to-fine training strategy, allowing for progressively capturing high-frequency geometric details. |
Baixin Xu; Jiarui Zhang; Kwan-Yee Lin; Chen Qian; Ying He; |
1146 | Algebraically Rigorous Quaternion Framework for The Neural Network Pose Estimation Problem Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a comprehensive theoretical context, using the quaternion adjugate, to confirm and establish the necessity of replacing single-valued quaternion functions by quaternions treated in the extended domain of multiple-charted manifolds. |
Chen Lin; Andrew J. Hanson; Sonya M. Hanson; |
1147 | Prompt Tuning Inversion for Text-driven Image Editing Using Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we design an accurate and quick inversion technique, Prompt Tuning Inversion, for text-driven image editing. |
Wenkai Dong; Song Xue; Xiaoyue Duan; Shumin Han; |
1148 | CVSformer: Cross-View Synthesis Transformer for Semantic Scene Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Cross-View Synthesis Transformer (CVSformer), which consists of Multi-View Feature Synthesis and Cross-View Transformer for learning cross-view object relationships. |
Haotian Dong; Enhui Ma; Lubo Wang; Miaohui Wang; Wuyuan Xie; Qing Guo; Ping Li; Lingyu Liang; Kairui Yang; Di Lin; |
1149 | UrbanGIRAFFE: Representing Urban Scenes As Compositional Generative Neural Feature Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although rapid progress has been made in 3D-aware generative models, most existing methods focus on object-centric images and are not applicable to generating urban scenes with free camera viewpoint control and scene editing. To address this challenging task, we propose UrbanGIRAFFE, which uses a coarse 3D panoptic prior, including the layout distribution of uncountable stuff and countable objects, to provide semantic and geometric priors. |
Yuanbo Yang; Yifei Yang; Hanlei Guo; Rong Xiong; Yue Wang; Yiyi Liao; |
1150 | UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, multi-source data inherently a) contains different parts that do not spatially align into a coherent human, and b) comes with different scales. To tackle these challenges, we propose an end-to-end framework, UnitedHuman, that empowers continuous GAN with the ability to effectively utilize multi-source data for high-resolution human generation. |
Jianglin Fu; Shikai Li; Yuming Jiang; Kwan-Yee Lin; Wayne Wu; Ziwei Liu; |
1151 | Active Neural Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we examine the weight space of the continually-learned neural field, and show empirically that the neural variability, the prediction robustness against random weight perturbation, can be directly utilized to measure the instant uncertainty of the neural map. |
Zike Yan; Haoxiang Yang; Hongbin Zha; |
1152 | Density-invariant Features for Distant Point Cloud Registration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Group-wise Contrastive Learning (GCL) scheme to extract density-invariant geometric features for registering distant outdoor LiDAR point clouds. |
Quan Liu; Hongzi Zhu; Yunsong Zhou; Hongyang Li; Shan Chang; Minyi Guo; |
1153 | UniverSeg: Universal Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present UniverSeg, a method for solving unseen medical segmentation tasks without additional training. |
Victor Ion Butoi; Jose Javier Gonzalez Ortiz; Tianyu Ma; Mert R. Sabuncu; John Guttag; Adrian V. Dalca; |
1154 | RecRecNet: Rectangling Rectified Wide-Angle Images By Thin-Plate Spline Model and DoF-based Curriculum Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore constructing a win-win representation on both content and boundary by contributing a new learning model, i.e., Rectangling Rectification Network (RecRecNet). |
Kang Liao; Lang Nie; Chunyu Lin; Zishuo Zheng; Yao Zhao; |
1155 | Neural Microfacet Fields for Inverse Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Neural Microfacet Fields, a method for recovering materials, geometry (volumetric density), and environmental illumination from a collection of images of a scene. |
Alexander Mai; Dor Verbin; Falko Kuester; Sara Fridovich-Keil; |
1156 | Understanding Self-attention Mechanism Via Dynamical System Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Similar to the adaptive step-size method, which is effective in solving stiff ODEs, we show that the SAM is also a stiffness-aware step-size adaptor: it enhances the model’s representational ability to measure intrinsic SP by refining the estimation of stiffness information and generating adaptive attention values, providing a new understanding of why and how the SAM can benefit model performance. |
Zhongzhan Huang; Mingfu Liang; Jinghui Qin; Shanshan Zhong; Liang Lin; |
1157 | Learning Versatile 3D Shape Generation with Improved Auto-regressive Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While this approach has been extended to the 3D domain for powerful shape generation, it still has two limitations: expensive computations on volumetric grids and ambiguous auto-regressive order along grid dimensions. To overcome these limitations, we propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids. |
Simian Luo; Xuelin Qian; Yanwei Fu; Yinda Zhang; Ying Tai; Zhenyu Zhang; Chengjie Wang; Xiangyang Xue; |
1158 | DETA: Denoised Task Adaptation for Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In fact, with only a handful of samples available, the adverse effect of either the image noise (a.k.a. X-noise) or the label noise (a.k.a. Y-noise) from support samples can be severely amplified. To address this challenge, in this work we propose DEnoised Task Adaptation (DETA), a first, unified image- and label-denoising framework orthogonal to existing task adaptation approaches. |
Ji Zhang; Lianli Gao; Xu Luo; Hengtao Shen; Jingkuan Song; |
1159 | DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, all of them overlook the fact that ambiguous snippets deliver contradictory information, which reduces the discriminability of linked snippets. Considering this phenomenon, we propose the Discriminability-Driven Graph Network (DDG-Net), which explicitly models ambiguous snippets and discriminative snippets with well-designed connections, preventing the transmission of ambiguous information and enhancing the discriminability of snippet-level representations. |
Xiaojun Tang; Junsong Fan; Chuanchen Luo; Zhaoxiang Zhang; Man Zhang; Zongyuan Yang; |
1160 | Diffusion Models As Masked Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: There has been a longstanding belief that generation can facilitate a true understanding of visual data. In line with this, we revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models. |
Chen Wei; Karttikeya Mangalam; Po-Yao Huang; Yanghao Li; Haoqi Fan; Hu Xu; Huiyu Wang; Cihang Xie; Alan Yuille; Christoph Feichtenhofer; |
1161 | Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an approach to estimating camera rotation in crowded, real-world scenes from handheld monocular video. |
Fabien Delattre; David Dirnfeld; Phat Nguyen; Stephen K Scarano; Michael J Jones; Pedro Miraldo; Erik Learned-Miller; |
1162 | Bayesian Prompt Learning for Image-Language Model Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By leveraging the regularization ability of Bayesian methods, we frame prompt learning from the Bayesian perspective and formulate it as a variational inference problem. |
Mohammad Mahdi Derakhshani; Enrique Sanchez; Adrian Bulat; Victor G. Turrisi da Costa; Cees G.M. Snoek; Georgios Tzimiropoulos; Brais Martinez; |
1163 | One-Shot Recognition of Any Material Anywhere Using Contrastive Learning with Physics-Based Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current image recognition methods are limited to specific classes and properties and can’t handle the vast number of material states in the world. To address this, we present MatSim: the first dataset and benchmark for computer vision-based recognition of similarities and transitions between materials and textures, focusing on identifying any material under any conditions using one or a few examples. |
Manuel S. Drehwald; Sagi Eppel; Jolina Li; Han Hao; Alan Aspuru-Guzik; |
1164 | DiLiGenT-Pi: Photometric Stereo for Planar Surfaces with Rich Details – Benchmark Dataset and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As shape detail recovery is the key strength of photometric stereo over other 3D reconstruction techniques, and the near-planar surfaces widely exist in cultural relics and manufacturing workpieces, we present a new real-world dataset DiLiGenT-Pi containing 30 near-planar scenes with rich surface details. |
Feishi Wang; Jieji Ren; Heng Guo; Mingjun Ren; Boxin Shi; |
1165 | Rethinking Data Distillation: Do Not Overlook Calibration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that distilled data lead to networks that are not calibratable due to (i) a more concentrated distribution of the maximum logits and (ii) the loss of information that is semantically meaningful but unrelated to classification tasks. |
Dongyao Zhu; Bowen Lei; Jie Zhang; Yanbo Fang; Yiqun Xie; Ruqi Zhang; Dongkuan Xu; |
1166 | Accurate and Fast Compressed Video Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple yet effective end-to-end transformer for video captioning that operates in the compressed domain and learns directly from the compressed video. |
Yaojie Shen; Xin Gu; Kai Xu; Heng Fan; Longyin Wen; Libo Zhang; |
1167 | Building Vision Transformers with Hierarchy Aware Feature Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This leads to the problem that the semantic information of image grid regions becomes confused after feature aggregation, making it difficult for attention to accurately model global relationships. To address this, we propose the Hierarchy Aware Feature Aggregation framework (HAFA). |
Yongjie Chen; Hongmin Liu; Haoran Yin; Bin Fan; |
1168 | Visible-Infrared Person Re-Identification Via Semantic Alignment and Affinity Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Semantic Alignment and Affinity Inference framework (SAAI), which aims to align latent semantic part features with the learnable prototypes and improve inference with affinity information. |
Xingye Fang; Yang Yang; Ying Fu; |
1169 | SAL-ViT: Towards Latency Efficient Private Inference on ViT Using Selective Attention Search with A Learnable Softmax Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present SAL-ViT with two novel techniques to boost PI efficiency on ViTs. |
Yuke Zhang; Dake Chen; Souvik Kundu; Chenghao Li; Peter A. Beerel; |
1170 | TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a Multimodal Backdoor defense technique TIJO (Trigger Inversion using Joint Optimization). |
Indranil Sur; Karan Sikka; Matthew Walmer; Kaushik Koneripalli; Anirban Roy; Xiao Lin; Ajay Divakaran; Susmit Jha; |
1171 | DG3D: Generating High Quality 3D Textured Shapes By Learning to Discriminate Multi-Modal Diffusion-Renderings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a Diffusion-augmented Generative model to generate high-fidelity 3D textured meshes that can be directly used in modern graphics engines. |
Qi Zuo; Yafei Song; Jianfang Li; Lin Liu; Liefeng Bo; |
1172 | Improving Adversarial Robustness of Masked Autoencoders Via Test-time Frequency-domain Prompting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the adversarial robustness of vision transformers that are equipped with BERT pretraining (e.g., BEiT, MAE). |
Qidong Huang; Xiaoyi Dong; Dongdong Chen; Yinpeng Chen; Lu Yuan; Gang Hua; Weiming Zhang; Nenghai Yu; |
1173 | HairCLIPv2: Unifying Hair Editing Via Proxy Feature Blending Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose HairCLIPv2, aiming to support all the aforementioned interactions with one unified framework. |
Tianyi Wei; Dongdong Chen; Wenbo Zhou; Jing Liao; Weiming Zhang; Gang Hua; Nenghai Yu; |
1174 | VLSlice: Interactive Vision-and-Language Slice Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents VLSlice, an interactive system enabling user-guided discovery of coherent representation-level subgroups with consistent visiolinguistic behavior, denoted as vision-and-language slices, from unlabeled image sets. |
Eric Slyman; Minsuk Kahng; Stefan Lee; |
1175 | Learning to Ground Instructional Articles in Videos Through Narrations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present an approach for localizing steps of procedural activities in narrated how-to videos. |
Effrosyni Mavroudi; Triantafyllos Afouras; Lorenzo Torresani; |
1176 | DocTr: Document Transformer for Structured Information Extraction in Documents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new formulation for structured information extraction (SIE) from visually rich documents. |
Haofu Liao; Aruni RoyChowdhury; Weijian Li; Ankan Bansal; Yuting Zhang; Zhuowen Tu; Ravi Kumar Satzoda; R. Manmatha; Vijay Mahadevan; |
1177 | The Making and Breaking of Camouflage Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Not all camouflages are equally effective, as even a partially visible contour or a slight color difference can make the animal stand out and break its camouflage. In this paper, we address the question of what makes a camouflage successful, by proposing three scores for automatically assessing its effectiveness. |
Hala Lamdouar; Weidi Xie; Andrew Zisserman; |
1178 | Role-Aware Interaction Generation from Textual Description Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a model that learns to generate motions of the designated role, which together form a mutually consistent interaction. |
Mikihiro Tanaka; Kent Fujiwara; |
1179 | MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose MAMo, a novel memory and attention framework for monocular video depth estimation. |
Rajeev Yasarla; Hong Cai; Jisoo Jeong; Yunxiao Shi; Risheek Garrepalli; Fatih Porikli; |
1180 | Continual Learning for Personalized Co-speech Gesture Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Also, prior generative continual learning works are not multimodal, making this space less studied. In this paper, we explore this new paradigm and propose C-DiffGAN: an approach that continually learns new speaker gesture styles with only a few minutes of per-speaker data, while retaining previously learnt styles. |
Chaitanya Ahuja; Pratik Joshi; Ryo Ishii; Louis-Philippe Morency; |
1181 | Object As Query: Lifting Any 2D Object Detector to 3D Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we design Multi-View 2D Objects guided 3D Object Detector (MV2D), which can lift any 2D object detector to multi-view 3D object detection. |
Zitian Wang; Zehao Huang; Jiahui Fu; Naiyan Wang; Si Liu; |
1182 | HDG-ODE: A Hierarchical Continuous-Time Model for Human Pose Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new framework, Hierarchical Dynamic Graph Ordinary Differential Equation (HDG-ODE), to tackle the 3D pose forecasting task from 2D skeleton representations in videos. |
Yucheng Xing; Xin Wang; |
1183 | Versatile Diffusion: Text, Images and Variations All in One Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we expand the existing single-flow diffusion pipeline into a multi-task multimodal network, dubbed Versatile Diffusion (VD), that handles multiple flows of text-to-image, image-to-text, and variations in one unified model. |
Xingqian Xu; Zhangyang Wang; Gong Zhang; Kai Wang; Humphrey Shi; |
1184 | DreamTeacher: Pretraining Image Backbones with Deep Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones. |
Daiqing Li; Huan Ling; Amlan Kar; David Acuna; Seung Wook Kim; Karsten Kreis; Antonio Torralba; Sanja Fidler; |
1185 | Decomposition-Based Variational Network for Multi-Contrast MRI Super-Resolution and Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing deep learning-based methods usually rely on manually designed fusion rules to aggregate the multi-contrast images, fail to model their correlations accurately, and lack interpretability. To address these issues, we propose a multi-contrast variational network (MC-VarNet) to explicitly model the relationship among multi-contrast images. |
Pengcheng Lei; Faming Fang; Guixu Zhang; Tieyong Zeng; |
1186 | Self-supervised Monocular Underwater Depth Recovery, Image Restoration, and A Real-sea Video Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To obtain improved estimates of depth from a single UW image, we propose a deep learning (DL) method that utilizes both haze and geometry during training. |
Nisha Varghese; Ashish Kumar; A. N. Rajagopalan; |
1187 | Geometrized Transformer for Self-Supervised Homography Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For homography estimation, we propose Geometrized Transformer (GeoFormer), a new detector-free feature matching method. |
Jiazhen Liu; Xirong Li; |
1188 | Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We draw inspiration from the density field representation used in volumetric neural rendering and propose a new approach, called Sat2Density. |
Ming Qian; Jincheng Xiong; Gui-Song Xia; Nan Xue; |
1189 | TiDy-PSFs: Computational Imaging with Time-Averaged Dynamic Point-Spread-Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by recent advances in spatial light modulator (SLM) technology, this paper answers a natural question: Can one encode additional information and achieve superior performance by changing a phase mask dynamically over time? |
Sachin Shah; Sakshum Kulshrestha; Christopher A. Metzler; |
1190 | Expressive Text-to-Image Generation with Rich Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations. |
Songwei Ge; Taesung Park; Jun-Yan Zhu; Jia-Bin Huang; |
1191 | Learning Fine-Grained Features for Pixel-Wise Video Correspondences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by optical flows as well as the self-supervised feature learning, we propose to use not only labeled synthetic videos but also unlabeled real-world videos for learning fine-grained representations in a holistic framework. |
Rui Li; Shenglong Zhou; Dong Liu; |
1192 | FS-DETR: Few-Shot DEtection TRansformer with Prompting and Without Re-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Towards satisfying (a)-(c), in this work, we make the following contributions: We introduce, for the first time, a simple, yet powerful, few-shot detection transformer (FS-DETR) based on visual prompting that can address both desiderata (a) and (b). |
Adrian Bulat; Ricardo Guerrero; Brais Martinez; Georgios Tzimiropoulos; |
1193 | Semi-supervised Speech-driven 3D Facial Animation Via Cross-modal Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paradigm faces two major challenges: the high cost of supervision acquisition, and the ambiguity in mapping between speech and lip movements. To address these challenges, this study proposes a novel cross-modal semi-supervised framework, comprising a Speech-to-Image Transcoder and a Face-to-Geometry Regressor. |
Peiji Yang; Huawei Wei; Yicheng Zhong; Zhisheng Wang; |
1194 | Learning to Learn: How to Continuously Teach Humans and Machines Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As an initial step towards automated curriculum design for online class-incremental learning, we propose a novel algorithm, dubbed Curriculum Designer (CD), that designs and ranks curricula based on inter-class feature similarities. |
Parantak Singh; You Li; Ankur Sikarwar; Stan Weixian Lei; Difei Gao; Morgan B. Talbot; Ying Sun; Mike Zheng Shou; Gabriel Kreiman; Mengmi Zhang; |
1195 | Text-Driven Generative Domain Adaptation with Spectral Consistency Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current methods still suffer from overfitting and the mode collapse problem. In this paper, we analyze mode collapse from a geometric point of view and reveal its relationship to the Hessian matrix of the generator. |
Zhenhuan Liu; Liang Li; Jiayu Xiao; Zheng-Jun Zha; Qingming Huang; |
1196 | A 5-Point Minimal Solver for Event Camera Relative Motion Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We solve this problem by deriving the correct non-linear parametrization of such manifolds, which we term eventails, and demonstrate its application to event-based linear motion estimation, with known rotation from an Inertial Measurement Unit. Using this parametrization, we introduce a novel minimal 5-point solver that jointly estimates line parameters and linear camera velocity projections, which can be fused into a single, averaged linear velocity when considering multiple lines. |
Ling Gao; Hang Su; Daniel Gehrig; Marco Cannici; Davide Scaramuzza; Laurent Kneip; |
1197 | TM2D: Bimodality Driven 3D Dance Generation Via Music-Text Integration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities. |
Kehong Gong; Dongze Lian; Heng Chang; Chuan Guo; Zihang Jiang; Xinxin Zuo; Michael Bi Mi; Xinchao Wang; |
1198 | Bootstrap Motion Forecasting With Self-Consistent Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel framework to bootstrap Motion forecasting with Self-consistent Constraints (MISC). |
Maosheng Ye; Jiamiao Xu; Xunnong Xu; Tengfei Wang; Tongyi Cao; Qifeng Chen; |
1199 | CDAC: Cross-domain Attention Consistency in Transformer for Domain Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Cross-Domain Attention Consistency (CDAC), to perform adaptation on attention maps using cross-domain attention layers that share features between source and target domains. |
Kaihong Wang; Donghyun Kim; Rogerio Feris; Margrit Betke; |
1200 | WaveNeRF: Wavelet-based Generalizable Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it usually suffers from poor scalability, as it requires densely sampled images for each new scene. Several studies have attempted to mitigate this problem by integrating the Multi-View Stereo (MVS) technique into NeRF, but they still entail a cumbersome fine-tuning process for new scenes. |
Muyu Xu; Fangneng Zhan; Jiahui Zhang; Yingchen Yu; Xiaoqin Zhang; Christian Theobalt; Ling Shao; Shijian Lu; |
1201 | LoCUS: Learning Multiscale 3D-consistent Features from Posed Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We start from the idea that the training objective can be framed as a patch retrieval problem: given an image patch in one view of a scene, we would like to retrieve (with high precision and recall) all patches in other views that map to the same real-world location. |
Dominik A. Kloepfer; Dylan Campbell; João F. Henriques; |
1202 | Neural Reconstruction of Relightable Human Model from Monocular Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel self-supervised framework that takes a monocular video of a moving human as input and generates a 3D neural representation capable of being rendered with novel poses under arbitrary lighting conditions. |
Wenzhang Sun; Yunlong Che; Han Huang; Yandong Guo; |
1203 | FB-BEV: BEV Representation from Forward-Backward View Transformations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Backward projection, with BEVFormer being an example, tends to generate false-positive BEV features from incorrect projections due to the lack of utilization on depth. To address the above limitations, we propose a novel forward-backward view transformation module. |
Zhiqi Li; Zhiding Yu; Wenhai Wang; Anima Anandkumar; Tong Lu; Jose M. Alvarez; |
1204 | BoxSnake: Polygonal Instance Segmentation with Box Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new end-to-end training technique, termed BoxSnake, to achieve effective polygonal instance segmentation using only box annotations for the first time. |
Rui Yang; Lin Song; Yixiao Ge; Xiu Li; |
1205 | Confidence-based Visual Dispersal for Few-shot Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle both deficiencies, in this paper, we propose a novel Confidence-based Visual Dispersal Transfer learning method (C-VisDiT) for FUDA. |
Yizhe Xiong; Hui Chen; Zijia Lin; Sicheng Zhao; Guiguang Ding; |
1206 | Event-Guided Procedure Planning from Instructional Videos with Text Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on the task of procedure planning from instructional videos with text supervision, where a model aims to predict an action sequence to transform the initial visual state into the goal visual state. |
An-Lan Wang; Kun-Yu Lin; Jia-Run Du; Jingke Meng; Wei-Shi Zheng; |
1207 | Foreground Object Search By Distilling Composite Image Feature Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel FOS method via distilling composite feature (DiscoFOS). |
Bo Zhang; Jiacheng Sui; Li Niu; |
1208 | Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel generative model for video anomaly detection (VAD), which assumes that both normality and abnormality are multimodal. |
Alessandro Flaborea; Luca Collorone; Guido Maria D’Amely di Melendugno; Stefano D’Arrigo; Bardh Prenkaj; Fabio Galasso; |
1209 | ClimateNeRF: Extreme Weather Synthesis in Neural Radiance Field Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a novel NeRF-editing procedure that can fuse physical simulations with NeRF models of scenes, producing realistic movies of physical phenomena in those scenes. |
Yuan Li; Zhi-Hao Lin; David Forsyth; Jia-Bin Huang; Shenlong Wang; |
1210 | CDFSL-V: Cross-Domain Few-Shot Learning for Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a progressive curriculum to encourage the emergence of rich features in the target domain based on class discriminative supervised features in the source domain. |
Sarinda Samarasinghe; Mamshad Nayeem Rizve; Navid Kardan; Mubarak Shah; |
1211 | Generalized Few-Shot Point Cloud Segmentation Via Geometric Words Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This largely motivates us to present the first attempt at a more practical paradigm of generalized few-shot point cloud segmentation, which requires the model to generalize to new categories with only a few support point clouds and simultaneously retain the capability to segment base classes. We propose the geometric words to represent geometric components shared between the base and novel classes, and incorporate them into a novel geometric-aware semantic representation to facilitate better generalization to the new classes without forgetting the old ones. |
Yating Xu; Conghui Hu; Na Zhao; Gim Hee Lee; |
1212 | Monte Carlo Linear Clustering with Single-Point Supervision Is Enough for Infrared Small Target Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The core idea of this work is to recover the per-pixel mask of each target from the given single point label by using clustering approaches, which looks simple but is indeed challenging since targets are always non-salient and accompanied by background clutter. |
Boyang Li; Yingqian Wang; Longguang Wang; Fei Zhang; Ting Liu; Zaiping Lin; Wei An; Yulan Guo; |
1213 | Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a simple baseline strategy by thresholding the cosine similarity between text and image features of a target point, and propose further enhancing the baseline by aggregating cosine similarity across transformations of the target. |
Myeongseob Ko; Ming Jin; Chenguang Wang; Ruoxi Jia; |
1214 | TCOVIS: Temporally Consistent Online Video Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel online method for video instance segmentation, called TCOVIS, which fully exploits the temporal information in a video clip. |
Junlong Li; Bingyao Yu; Yongming Rao; Jie Zhou; Jiwen Lu; |
1215 | Towards Viewpoint Robustness in Bird’s Eye View Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, only a small number of rig variations exist across most fleets. In this paper, we study how AV perception models are affected by changes in camera viewpoint and propose a way to scale them across vehicle types without repeated data collection and labeling. |
Tzofi Klinghoffer; Jonah Philion; Wenzheng Chen; Or Litany; Zan Gojcic; Jungseock Joo; Ramesh Raskar; Sanja Fidler; Jose M. Alvarez; |
1216 | Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel generative model capable of producing a sequence of photorealistic images consistent with a specified camera trajectory, and a single starting image. |
Jason J. Yu; Fereshteh Forghani; Konstantinos G. Derpanis; Marcus A. Brubaker; |
1217 | What Can A Cook in Italy Teach A Mechanic in India? Action Recognition Generalisation Over Scenarios and Locations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We thus propose CIR, a method to represent each video as a Cross-Instance Reconstruction of videos from other domains. |
Chiara Plizzari; Toby Perrett; Barbara Caputo; Dima Damen; |
1218 | EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present EMDB, the Electromagnetic Database of Global 3D Human Pose and Shape in the Wild. |
Manuel Kaufmann; Jie Song; Chen Guo; Kaiyue Shen; Tianjian Jiang; Chengcheng Tang; Juan José Zárate; Otmar Hilliges; |
1219 | STEERER: Resolving Scale Variations for Counting and Localization Via Selective Inheritance Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel method termed STEERER (SelecTivE inhERitance lEaRning) that addresses the issue of scale variations in object counting. |
Tao Han; Lei Bai; Lingbo Liu; Wanli Ouyang; |
1220 | Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an experimental method for measuring bias in face recognition systems. |
Hao Liang; Pietro Perona; Guha Balakrishnan; |
1221 | Spatial-Aware Token for Weakly Supervised Object Localization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing transformer-based methods synthesize the classification feature maps as the localization map, which leads to optimization conflicts between classification and localization tasks. To address this problem, we propose to learn a task-specific spatial-aware token (SAT) to condition localization in a weakly supervised manner. |
Pingyu Wu; Wei Zhai; Yang Cao; Jiebo Luo; Zheng-Jun Zha; |
1222 | Harnessing The Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new text-to-image algorithm that adds explicit control over spatial-temporal cross-attention in diffusion models. |
Qiucheng Wu; Yujian Liu; Handong Zhao; Trung Bui; Zhe Lin; Yang Zhang; Shiyu Chang; |
1223 | GraphAlign: Enhancing Accurate Feature Alignment By Graph Matching for Multi-Modal 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present GraphAlign, a more accurate feature alignment strategy for 3D object detection by graph matching. |
Ziying Song; Haiyue Wei; Lin Bai; Lei Yang; Caiyan Jia; |
1224 | Weakly-Supervised Action Segmentation and Unseen Error Detection in Anomalous Instructional Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel method for weakly-supervised action segmentation and unseen error detection in anomalous instructional videos. |
Reza Ghoddoosian; Isht Dwivedi; Nakul Agarwal; Behzad Dariush; |
1225 | NEMTO: Neural Environment Matting for Novel View and Relighting Synthesis of Transparent Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose NEMTO, the first end-to-end neural rendering pipeline to model 3D transparent objects with complex geometry and unknown indices of refraction. |
Dongqing Wang; Tong Zhang; Sabine Süsstrunk; |
1226 | Geometric Viewpoint Learning with Hyper-Rays and Harmonics Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes the first deep-learning framework for the viewpoint modality. |
Zhixiang Min; Juan Carlos Dibene; Enrique Dunn; |
1227 | C2F2NeUS: Cascade Cost Frustum Fusion for High Fidelity and Generalizable Neural Surface Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel integration scheme that combines the multi-view stereo with neural signed distance function representations, which potentially overcomes the limitations of both methods. |
Luoyuan Xu; Tao Guan; Yuesong Wang; Wenkai Liu; Zhaojie Zeng; Junle Wang; Wei Yang; |
1228 | Mesh2Tex: Generating Mesh Textures from Image Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Mesh2Tex, which learns a realistic object texture manifold from uncorrelated collections of 3D object geometry and photorealistic RGB images, by leveraging a hybrid mesh-neural-field texture representation. |
Alexey Bokhovkin; Shubham Tulsiani; Angela Dai; |
1229 | USAGE: A Unified Seed Area Generation Paradigm for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Unified optimization paradigm for Seed Area GEneration (USAGE) for both types of networks, in which the objective function to be optimized consists of two terms: One is a generation loss, which controls the shape of seed areas by a temperature parameter following a deterministic principle for different types of networks; The other is a regularization loss, which ensures the consistency between the seed areas that are generated by self-adaptive network adjustment from different views, to overturn false activation in seed areas. |
Zelin Peng; Guanchun Wang; Lingxi Xie; Dongsheng Jiang; Wei Shen; Qi Tian; |
1230 | NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-view Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a fast neural surface reconstruction approach, called NeuS2, which achieves two orders of magnitude improvement in terms of acceleration without compromising reconstruction quality. |
Yiming Wang; Qin Han; Marc Habermann; Kostas Daniilidis; Christian Theobalt; Lingjie Liu; |
1231 | Deep Feature Deblurring Diffusion for Detecting Out-of-Distribution Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new approach of Deep Feature Deblurring Diffusion (DFDD), consisting of forward blurring and reverse deblurring processes. |
Aming Wu; Da Chen; Cheng Deng; |
1232 | Fast Full-frame Video Stabilization with Iterative Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the analogy between wobbly frames and jigsaw puzzles, we propose an iterative optimization-based learning approach using synthetic datasets for video stabilization, which consists of two interacting submodules: motion trajectory smoothing and full-frame outpainting. |
Weiyue Zhao; Xin Li; Zhan Peng; Xianrui Luo; Xinyi Ye; Hao Lu; Zhiguo Cao; |
1233 | Gender Artifacts in Visual Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To understand the feasibility and practicality of these approaches, we investigate what gender artifacts exist in large-scale visual datasets. |
Nicole Meister; Dora Zhao; Angelina Wang; Vikram V. Ramaswamy; Ruth Fong; Olga Russakovsky; |
1234 | Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the problem of generalized category discovery (GCD): given a set of images where some are labelled and the rest are not, the task is to automatically cluster the images in the unlabelled data by leveraging the information from the labelled data, where the unlabelled data contain images from both the labelled classes and new ones. |
Bingchen Zhao; Xin Wen; Kai Han; |
1235 | SuS-X: Training-Free Name-Only Transfer of Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we pursue a different approach and explore the regime of training-free "name-only transfer" in which the only knowledge we possess about downstream tasks comprises the names of downstream target categories. |
Vishaal Udandarao; Ankush Gupta; Samuel Albanie; |
1236 | Rethinking Point Cloud Registration As Masking and Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a critical observation is made that the invisible parts of each point cloud can be directly utilized as inherent masks, and the aligned point cloud pair can be regarded as the reconstruction target. |
Guangyan Chen; Meiling Wang; Li Yuan; Yi Yang; Yufeng Yue; |
1237 | Beating Backdoor Attack at Its Own Game Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the stealthiness and effectiveness of backdoor attack, we propose a simple but highly effective defense framework which injects non-adversarial backdoors targeting poisoned samples. |
Min Liu; Alberto Sangiovanni-Vincentelli; Xiangyu Yue; |
1238 | Introducing Language Guidance in Prompt-based Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Language Guidance for Prompt-based Continual Learning (LGCL) as a plug-in for prompt-based methods. |
Muhammad Gul Zain Ali Khan; Muhammad Ferjad Naeem; Luc Van Gool; Didier Stricker; Federico Tombari; Muhammad Zeshan Afzal; |
1239 | Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We tackle the data scarcity challenge in few-shot point cloud recognition of 3D objects by using a joint prediction from a conventional 3D model and a well-pretrained 2D model. |
Xuanyu Yi; Jiajun Deng; Qianru Sun; Xian-Sheng Hua; Joo-Hwee Lim; Hanwang Zhang; |
1240 | EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A major challenge in this task is recognizing places seen from different viewpoints. To overcome this limitation, we propose a new method, called EigenPlaces, to train our neural network on images from different points of view, which embeds viewpoint robustness into the learned global descriptors. |
Gabriele Berton; Gabriele Trivigno; Barbara Caputo; Carlo Masone; |
1241 | Do DALL-E and Flamingo Understand Each Other? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: An interesting question worth exploring in this domain is whether Flamingo and DALL-E understand each other. To study this question, we propose a reconstruction task where Flamingo generates a description for a given image and DALL-E uses this description as input to synthesize a new image. |
Hang Li; Jindong Gu; Rajat Koner; Sahand Sharifzadeh; Volker Tresp; |
1242 | CIRI: Curricular Inactivation for Residue-aware One-shot Video Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we resolve the one-shot video inpainting problem in which only one annotated first frame is provided. |
Weiying Zheng; Cheng Xu; Xuemiao Xu; Wenxi Liu; Shengfeng He; |
1243 | Prototype-based Dataset Comparison Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enable dataset comparison, we present a module that learns concept-level prototypes across datasets. |
Nanne van Noord; |
1244 | FreeCOS: Self-Supervised Learning from Fractals and Unlabeled Images for Curvilinear Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a self-supervised curvilinear object segmentation method (FreeCOS) that learns robust and distinctive features from fractals and unlabeled images. |
Tianyi Shi; Xiaohuan Ding; Liang Zhang; Xin Yang; |
1245 | Generating Dynamic Kernels Via Transformers for Lane Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, the kernels resulting from the key locations are sensitive to occlusion and lane intersections. To overcome these limitations, we propose a transformer-based dynamic kernel generation architecture for lane detection. |
Ziye Chen; Yu Liu; Mingming Gong; Bo Du; Guoqi Qian; Kate Smith-Miles; |
1246 | RSFNet: A White-Box Image Retouching Approach Using Region-Specific Color Filters Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, colorists typically employ a divide-and-conquer approach, performing a series of region-specific fine-grained enhancements when using traditional tools like DaVinci Resolve. We draw on this insight to develop a white-box framework for photo retouching using parallel region-specific filters, called RSFNet. |
Wenqi Ouyang; Yi Dong; Xiaoyang Kang; Peiran Ren; Xin Xu; Xuansong Xie; |
1247 | Tem-Adapter: Adapting Image-Text Pretraining for Video Question Answer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This motivates us to leverage the knowledge from image-based pretraining, despite the obvious gaps between image and video domains. To bridge these gaps, in this paper, we propose Tem-Adapter, which enables the learning of temporal dynamics and complex semantics by a visual Temporal Aligner and a textual Semantic Aligner. |
Guangyi Chen; Xiao Liu; Guangrun Wang; Kun Zhang; Philip H.S. Torr; Xiao-Ping Zhang; Yansong Tang; |
1248 | Boosting Long-tailed Object Detection Via Step-wise Learning on Smooth-tail Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a frustratingly simple but effective step-wise learning framework to gradually enhance the capability of the model in detecting all categories of long-tailed datasets. |
Na Dong; Yongqiang Zhang; Mingli Ding; Gim Hee Lee; |
1249 | Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel framework for one-shot audio-driven talking head generation. |
Zhentao Yu; Zixin Yin; Deyu Zhou; Duomin Wang; Finn Wong; Baoyuan Wang; |
1250 | Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, in more realistic scenarios, only minimal annotations are available for a new scene, which poses significant challenges to existing RVOS methods. With this in mind, we propose a simple yet effective model with a newly designed cross-modal affinity (CMA) module based on a Transformer architecture. |
Guanghui Li; Mingqi Gao; Heng Liu; Xiantong Zhen; Feng Zheng; |
1251 | Human Part-wise 3D Motion Context Learning for Sign Language Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose P3D, the human part-wise motion context learning framework for sign language recognition. |
Taeryung Lee; Yeonguk Oh; Kyoung Mu Lee; |
1252 | Remembering Normality: Memory-guided Knowledge Distillation for Unsupervised Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Trained on anomaly-free data, the student still reconstructs anomalous representations well and is sensitive to fine patterns in normal data, which also appear in training. To mitigate this issue, we introduce a novel Memory-guided Knowledge-Distillation (MemKD) framework that adaptively modulates the normality of student features in detecting anomalies. |
Zhihao Gu; Liang Liu; Xu Chen; Ran Yi; Jiangning Zhang; Yabiao Wang; Chengjie Wang; Annan Shu; Guannan Jiang; Lizhuang Ma; |
1253 | Coordinate Quantized Neural Implicit Representations for Multi-view Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, high frequency positional encodings make the optimization unstable, which results in noisy reconstructions and artifacts in empty space. To resolve this issue in a general sense, we propose learning neural implicit representations with quantized coordinates, which reduces the uncertainty and ambiguity in the field during optimization. |
Sijia Jiang; Jing Hua; Zhizhong Han; |
1254 | Unleashing The Potential of Spiking Neural Networks with Dynamic Confidence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new methodology to alleviate the fundamental trade-off between accuracy and latency in spiking neural networks (SNNs). |
Chen Li; Edward G Jones; Steve Furber; |
1255 | TeD-SPAD: Temporal Distinctiveness for Self-Supervised Privacy-Preservation for Video Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose TeD-SPAD, a privacy-aware video anomaly detection framework that destroys visual private information in a self-supervised manner. |
Joseph Fioresi; Ishan Rajendrakumar Dave; Mubarak Shah; |
1256 | MAS: Towards Resource-Efficient Federated Multiple-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the first FL system to effectively coordinate and train multiple simultaneous FL tasks. |
Weiming Zhuang; Yonggang Wen; Lingjuan Lyu; Shuai Zhang; |
1257 | Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the issue of protocol differences between distillation and classification, we propose a novel distillation method with cross-task consistent protocols, tailored for dense object detection. |
Longrong Yang; Xianpan Zhou; Xuewei Li; Liang Qiao; Zheyang Li; Ziwei Yang; Gaoang Wang; Xi Li; |
1258 | Divide and Conquer: A Two-Step Method for High Quality Face De-identification with Model Explainability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this pattern often leads to a trade-off between privacy and utility, and the latent space remains difficult to explain. To address these issues, we propose IDeudemon, which employs a "divide and conquer" strategy to protect identity and preserve utility step by step while maintaining good explainability. |
Yunqian Wen; Bo Liu; Jingyi Cao; Rong Xie; Li Song; |
1259 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Hierarchical Temporal-Aware video-language pre-training framework, HiTeA, with two novel pre-training tasks for yielding temporal-aware multi-modal representation with cross-modal fine-grained temporal moment information and temporal contextual relations between video-text multi-modal pairs. |
Qinghao Ye; Guohai Xu; Ming Yan; Haiyang Xu; Qi Qian; Ji Zhang; Fei Huang; |
1260 | VAPCNet: Viewpoint-Aware 3D Point Cloud Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we thus propose an unsupervised viewpoint representation learning scheme for 3D point cloud completion without explicit viewpoint estimation. |
Zhiheng Fu; Longguang Wang; Lian Xu; Zhiyong Wang; Hamid Laga; Yulan Guo; Farid Boussaid; Mohammed Bennamoun; |
1261 | Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the first study to investigate the adversarial transferability of recent VLP models. |
Dong Lu; Zhiqiang Wang; Teng Wang; Weili Guan; Hongchang Gao; Feng Zheng; |
1262 | AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new approach called AutoSynth, which automatically generates 3D training data for point cloud registration. |
Zheng Dang; Mathieu Salzmann; |
1263 | Multimodal Distillation for Egocentric Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The added complexity of the modality-specific modules, on the other hand, makes these models impractical for deployment. The goal of this work is to retain the performance of such a multimodal approach, while using only the RGB frames as input at inference time. |
Gorjan Radevski; Dusan Grujicic; Matthew Blaschko; Marie-Francine Moens; Tinne Tuytelaars; |
1264 | Self-supervised Learning of Implicit Shape Representation with Dense Correspondence for Deformable Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel self-supervised approach to learn neural implicit shape representation for deformable objects, which can represent shapes with a template shape and dense correspondence in 3D. |
Baowen Zhang; Jiahe Li; Xiaoming Deng; Yinda Zhang; Cuixia Ma; Hongan Wang; |
1265 | Perceptual Artifacts Localization for Image Synthesis Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present a comprehensive empirical examination of Perceptual Artifacts Localization (PAL) spanning diverse image synthesis endeavors. |
Lingzhi Zhang; Zhengjie Xu; Connelly Barnes; Yuqian Zhou; Qing Liu; He Zhang; Sohrab Amirghodsi; Zhe Lin; Eli Shechtman; Jianbo Shi; |
1266 | Narrator: Towards Natural Control of Human-Scene Interaction Generation Via Relationship Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To that end, we propose Narrator, a novel relationship reasoning-based generative approach using a conditional variational autoencoder for naturally controllable generation given a 3D scene and a textual description. |
Haibiao Xuan; Xiongzheng Li; Jinsong Zhang; Hongwen Zhang; Yebin Liu; Kun Li; |
1267 | Vision Relation Transformer for Unbiased Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, current SGG methods suffer from an information loss regarding the entities’ local-level cues during the relation encoding process. To mitigate this, we introduce the Vision rElation TransfOrmer (VETO), consisting of a novel local-level entity relation encoder. |
Gopika Sudhakaran; Devendra Singh Dhami; Kristian Kersting; Stefan Roth; |
1268 | Scaling Data Generation in Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle the common data scarcity issue in existing vision-and-language navigation datasets, we propose an effective paradigm for generating large-scale data for learning, which applies 1200+ photo-realistic environments from HM3D and Gibson datasets and synthesizes 4.9 million instruction trajectory pairs using fully-accessible resources on the web. |
Zun Wang; Jialu Li; Yicong Hong; Yi Wang; Qi Wu; Mohit Bansal; Stephen Gould; Hao Tan; Yu Qiao; |
1269 | Better May Not Be Fairer: A Study on Subgroup Discrepancy in Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide 20,000 non-trivial human annotations on popular datasets as a first step towards bridging the gap in studying how natural semantic spurious features affect image classification, since prior works often study datasets mixing low-level features due to limitations in accessing realistic datasets. |
Ming-Chang Chiu; Pin-Yu Chen; Xuezhe Ma; |
1270 | 3D Implicit Transporter for Temporally Consistent Keypoint Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the existing 2D and 3D methods for detecting keypoints mainly rely on geometric consistency to achieve spatial alignment, neglecting temporal consistency. To address this issue, the Transporter method was introduced for 2D data, which reconstructs the target frame from the source frame to incorporate both spatial and temporal information. |
Chengliang Zhong; Yuhang Zheng; Yupeng Zheng; Hao Zhao; Li Yi; Xiaodong Mu; Ling Wang; Pengfei Li; Guyue Zhou; Chao Yang; Xinliang Zhang; Jian Zhao; |
1271 | Adaptive Rotated Convolution for Rotated Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Adaptive Rotated Convolution (ARC) module to handle the aforementioned challenges. |
Yifan Pu; Yiru Wang; Zhuofan Xia; Yizeng Han; Yulin Wang; Weihao Gan; Zidong Wang; Shiji Song; Gao Huang; |
1272 | Revisit PCA-based Technique for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, after a detailed analysis of the issue in OOD detection with the conventional principal component analysis (PCA), we propose fusing a simple regularized PCA-based reconstruction error with other scoring functions to further improve OOD detection performance. |
Xiaoyuan Guan; Zhouwu Liu; Wei-Shi Zheng; Yuren Zhou; Ruixuan Wang; |
1273 | Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in An Open World Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Cross-modal prediCate boosting (CaCao) framework, where a visually-prompted language model is learned to generate diverse fine-grained predicates in a low-resource way. |
Qifan Yu; Juncheng Li; Yu Wu; Siliang Tang; Wei Ji; Yueting Zhuang; |
1274 | FishNet: A Large-scale Dataset and Benchmark for Fish Recognition, Detection, and Functional Trait Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the need for generalized systems that can recognize, locate, and predict a wide array of species and their functional traits, we present FishNet, a large-scale diverse dataset containing 94,532 meticulously organized images from 17,357 aquatic species, organized according to aquatic biological taxonomy (order, family, genus, and species). |
Faizan Farooq Khan; Xiang Li; Andrew J. Temple; Mohamed Elhoseiny; |
1275 | Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the more practical but challenging Partially Relevant Video Retrieval (PRVR) task, which aims to retrieve partially relevant untrimmed videos with the query input. |
Jianfeng Dong; Minsong Zhang; Zheng Zhang; Xianke Chen; Daizong Liu; Xiaoye Qu; Xun Wang; Baolong Liu; |
1276 | UniVTG: Towards Unified Video-Language Temporal Grounding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to Unify the diverse VTG labels and tasks, dubbed UniVTG, along three directions: Firstly, we revisit a wide range of VTG labels and tasks and define a unified formulation. Based on this, we develop data annotation schemes to create scalable pseudo supervision. Secondly, we develop an effective and flexible grounding model capable of addressing each task and making full use of each label. |
Kevin Qinghong Lin; Pengchuan Zhang; Joya Chen; Shraman Pramanick; Difei Gao; Alex Jinpeng Wang; Rui Yan; Mike Zheng Shou; |
1277 | Disposable Transfer Learning for Selective Source Task Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve knowledge disposal, we propose a novel loss named Gradient Collision loss (GC loss). |
Seunghee Koh; Hyounguk Shon; Janghyeon Lee; Hyeong Gwon Hong; Junmo Kim; |
1278 | Grounding 3D Object Affordance from 2D Interactions in Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Normally, humans possess the ability to perceive object affordances in the physical world through demonstration images or videos. Motivated by this, we introduce a novel task setting: grounding 3D object affordance from 2D interactions in images, which faces the challenge of anticipating affordance through interactions of different sources. |
Yuhang Yang; Wei Zhai; Hongchen Luo; Yang Cao; Jiebo Luo; Zheng-Jun Zha; |
1279 | Fast Globally Optimal Surface Normal Estimation from An Affine Correspondence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new solver for estimating a surface normal from a single affine correspondence in two calibrated views. |
Levente Hajder; Lajos Lóczi; Daniel Barath; |
1280 | Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Masked Spatio-Temporal Structure Prediction (MaST-Pre) method to capture the structure of point cloud videos without human annotations. |
Zhiqiang Shen; Xiaoxiao Sheng; Hehe Fan; Longguang Wang; Yulan Guo; Qiong Liu; Hao Wen; Xi Zhou; |
1281 | Frequency-aware GAN for Adversarial Manipulation Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we design an Adversarial Manipulation Generation (AMG) task to explore the vulnerability of image manipulation detectors. |
Peifei Zhu; Genki Osada; Hirokatsu Kataoka; Tsubasa Takahashi; |
1282 | DreamPose: Fashion Video Synthesis with Stable Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. |
Johanna Karras; Aleksander Holynski; Ting-Chun Wang; Ira Kemelmacher-Shlizerman; |
1283 | Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Tube-Link, a versatile framework that addresses multiple core tasks of video segmentation with a unified architecture. |
Xiangtai Li; Haobo Yuan; Wenwei Zhang; Guangliang Cheng; Jiangmiao Pang; Chen Change Loy; |
1284 | Hybrid Spectral Denoising Transformer with Guided Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a Hybrid Spectral Denoising Transformer (HSDT) for hyperspectral image denoising. |
Zeqiang Lai; Chenggang Yan; Ying Fu; |
1285 | HiVLP: Hierarchical Interactive Video-Language Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we train a VLP model with a hybrid of image-text and video-text pairs, which significantly outperforms pre-training with only the video-text pairs. |
Bin Shao; Jianzhuang Liu; Renjing Pei; Songcen Xu; Peng Dai; Juwei Lu; Weimian Li; Youliang Yan; |
1286 | Learning Concordant Attention Via Target-aware Alignment for Visible-Infrared Person Re-identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Concordant Attention Learning (CAL), a novel framework that learns semantic-aligned representations for VI Re-ID. |
Jianbing Wu; Hong Liu; Yuxin Su; Wei Shi; Hao Tang; |
1287 | PhysDiff: Physics-Guided Human Motion Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This seriously impacts the quality of generated motions and limits their real-world application. To address this issue, we present a novel physics-guided motion diffusion model (PhysDiff), which incorporates physical constraints into the diffusion process. |
Ye Yuan; Jiaming Song; Umar Iqbal; Arash Vahdat; Jan Kautz; |
1288 | Masked Motion Predictors Are Strong 3D Action Representation Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that instead of following the prevalent pretext task to perform masked self-component reconstruction in human joints, explicit contextual motion modeling is key to the success of learning effective feature representation for 3D action recognition. |
Yunyao Mao; Jiajun Deng; Wengang Zhou; Yao Fang; Wanli Ouyang; Houqiang Li; |
1289 | Template-guided Hierarchical Feature Restoration for Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Targeting for detecting anomalies of various sizes for complicated normal patterns, we propose a Template-guided Hierarchical Feature Restoration method, which introduces two key techniques, bottleneck compression and template-guided compensation, for anomaly-free feature restoration. |
Hewei Guo; Liping Ren; Jingjing Fu; Yuwang Wang; Zhizheng Zhang; Cuiling Lan; Haoqian Wang; Xinwen Hou; |
1290 | SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a novel efficient additive attention mechanism that effectively replaces the quadratic matrix multiplication operations with linear element-wise multiplications. |
Abdelrahman Shaker; Muhammad Maaz; Hanoona Rasheed; Salman Khan; Ming-Hsuan Yang; Fahad Shahbaz Khan; |
1291 | UpCycling: Semi-supervised 3D Object Detection Without Sharing Raw-level Unlabeled Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose UpCycling, a novel SSL framework for 3D object detection with zero additional raw-level point cloud: learning from unlabeled de-identified intermediate features (i.e., "smashed" data) to preserve privacy. |
Sunwook Hwang; Youngseok Kim; Seongwon Kim; Saewoong Bahk; Hyung-Sin Kim; |
1292 | RIGID: Recurrent GAN Inversion and Editing of Real Face Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID), to explicitly and simultaneously enforce temporally coherent GAN inversion and facial editing of real videos. |
Yangyang Xu; Shengfeng He; Kwan-Yee K. Wong; Ping Luo; |
1293 | PourIt!: Weakly-Supervised Liquid Perception from A Single Image for Visual Closed-Loop Robotic Pouring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this paper proposes a simple yet effective framework PourIt! |
Haitao Lin; Yanwei Fu; Xiangyang Xue; |
1294 | CSDA: Learning Category-Scale Joint Feature for Domain Adaptive Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, when objects are non-uniformly distributed at different scales, such category-level alignment causes imbalanced object feature learning, referred to as the inconsistency of category alignment at different scales. For better category-level feature alignment, we propose a novel DAOD framework that jointly exploits category and scale information, dubbed CSDA; this design enables effective object learning across different scales. |
Changlong Gao; Chengxu Liu; Yujie Dun; Xueming Qian; |
1295 | A Latent Space of Stochastic Diffusion Models for Zero-Shot Image Editing and Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores the possibility of defining a latent space even when the denoising process remains stochastic. |
Chen Henry Wu; Fernando De la Torre; |
1296 | Single Image Defocus Deblurring Via Implicit Neural Inverse Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the limitations, this paper proposes an interpretable approach that explicitly predicts inverse kernels with structural regularization. |
Yuhui Quan; Xin Yao; Hui Ji; |
1297 | Open Set Video HOI Detection from Action-Centric Chain-of-Look Prompting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose ACoLP, a model of Action-centric Chain-of-Look Prompting for open set video HOI detection. |
Nan Xi; Jingjing Meng; Junsong Yuan; |
1298 | Robust Mixture-of-Expert Training for Convolutional Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, we propose a new router-expert alternating Adversarial training framework for MoE, termed AdvMoE. |
Yihua Zhang; Ruisi Cai; Tianlong Chen; Guanhua Zhang; Huan Zhang; Pin-Yu Chen; Shiyu Chang; Zhangyang Wang; Sijia Liu; |
1299 | AvatarCraft: Transforming Text Into Neural Human Avatars with Parameterized Shape and Pose Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We carefully design the optimization framework of neural implicit fields, including a coarse-to-fine multi-bounding box training strategy, shape regularization, and diffusion-based constraints, to produce high-quality geometry and texture. |
Ruixiang Jiang; Can Wang; Jingbo Zhang; Menglei Chai; Mingming He; Dongdong Chen; Jing Liao; |
1300 | σ-Adaptive Decoupled Prototype for Few-Shot Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we look closely into those critical issues and propose the σ-Adaptive Decoupled Prototype (σ-ADP) as a solution. |
Jinhao Du; Shan Zhang; Qiang Chen; Haifeng Le; Yanpeng Sun; Yao Ni; Jian Wang; Bin He; Jingdong Wang; |
1301 | Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A vision-language model can be adapted to a new classification task through few-shot prompt tuning. We find that such a prompt tuning process is highly robust to label noise. This motivates us to study the key reasons contributing to the robustness of the prompt tuning paradigm. |
Cheng-En Wu; Yu Tian; Haichao Yu; Heng Wang; Pedro Morgado; Yu Hen Hu; Linjie Yang; |
1302 | Unified Pre-Training with Pseudo Texts for Text-To-Image Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The pre-training processes for images and texts are independent, despite cross-modality learning being critical to T2I-ReID. To address the above issues, we present a new unified pre-training pipeline (UniPT) designed specifically for the T2I-ReID task. |
Zhiyin Shao; Xinyu Zhang; Changxing Ding; Jian Wang; Jingdong Wang; |
1303 | Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our preliminary experiments indicate that query slot attention can extract different semantic components from the RGB feature map, while random sampling based slot attention can exploit temporal correspondence cues between frames to assist instance identification. Motivated by this, we propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps. |
Rui Qian; Shuangrui Ding; Xian Liu; Dahua Lin; |
1304 | UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present an efficient multi-modal backbone for outdoor 3D perception named UniTR, which processes a variety of modalities with unified modeling and shared parameters. |
Haiyang Wang; Hao Tang; Shaoshuai Shi; Aoxue Li; Zhenguo Li; Bernt Schiele; Liwei Wang; |
1305 | Traj-MAE: Masked Autoencoders for Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the catastrophic forgetting problem that arises when pre-training the network with multiple masking strategies, we introduce a continual pre-training framework, which can help Traj-MAE learn valuable and diverse information from various strategies efficiently. |
Hao Chen; Jiaze Wang; Kun Shao; Furui Liu; Jianye Hao; Chenyong Guan; Guangyong Chen; Pheng-Ann Heng; |
1306 | First Session Adaptation: A Strong Replay-Free Baseline for Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a baseline method, First Session Adaptation (FSA), that sheds light on the efficacy of existing CIL approaches, and allows us to assess the relative performance contributions from head and body adaptation. |
Aristeidis Panos; Yuriko Kobe; Daniel Olmeda Reino; Rahaf Aljundi; Richard E. Turner; |
1307 | Ada3D : Exploiting The Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One reason for this high resource consumption is the presence of a large number of redundant background points in Lidar point clouds, resulting in spatial redundancy in both 3D voxel and dense BEV map representations. To address this issue, we propose an adaptive inference framework called Ada3D, which focuses on exploiting the input-level spatial redundancy. |
Tianchen Zhao; Xuefei Ning; Ke Hong; Zhongyuan Qiu; Pu Lu; Yali Zhao; Linfeng Zhang; Lipu Zhou; Guohao Dai; Huazhong Yang; Yu Wang; |
1308 | R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose R3D3, a multi-camera system for dense 3D reconstruction and ego-motion estimation. |
Aron Schmied; Tobias Fischer; Martin Danelljan; Marc Pollefeys; Fisher Yu; |
1309 | UniFusion: Unified Multi-View Fusion Transformer for Spatial-Temporal Representation in Bird’s-Eye-View Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Further, temporal fusion has also been introduced into BEV representation with great success. In this work, we propose a new method that unifies both spatial and temporal fusion and merges them into a single mathematical formulation. |
Zequn Qin; Jingyu Chen; Chao Chen; Xiaozhi Chen; Xi Li; |
1310 | Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a unified point cloud video self-supervised learning framework for object-centric and scene-centric data. |
Xiaoxiao Sheng; Zhiqiang Shen; Gang Xiao; Longguang Wang; Yulan Guo; Hehe Fan; |
1311 | Preserving Modality Structure Improves Multi-Modal Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods often struggle to generalize well on out-of-domain data as they ignore the semantic structure present in modality-specific embeddings. In this context, we propose a novel Semantic-Structure-Preserving Consistency approach to improve generalizability by preserving the modality-specific relationships in the joint embedding space. |
Sirnam Swetha; Mamshad Nayeem Rizve; Nina Shvetsova; Hilde Kuehne; Mubarak Shah; |
1312 | Focus The Discrepancy: Intra- and Inter-Correlation Learning for Image Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel AD framework: FOcus-the-Discrepancy (FOD), which can simultaneously spot the patch-wise, intra- and inter-discrepancies of anomalies. |
Xincheng Yao; Ruoqi Li; Zefeng Qian; Yan Luo; Chongyang Zhang; |
1313 | Pre-training Vision Transformers with Very Limited Synthesized Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In the present work, we hypothesize that the process of generating different instances for the same category in FDSL can be viewed as a form of data augmentation. |
Ryo Nakamura; Hirokatsu Kataoka; Sora Takashima; Edgar Josafat Martinez Noriega; Rio Yokota; Nakamasa Inoue; |
1314 | Sample-adaptive Augmentation for Point Cloud Recognition Against Real-world Corruptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an alternative to make sample-adaptive transformations based on the structure of the sample to cope with potential corruption via an auto-augmentation framework, named as AdaptPoint. |
Jie Wang; Lihe Ding; Tingfa Xu; Shaocong Dong; Xinli Xu; Long Bai; Jianan Li; |
1315 | Make Encoder Great Again in 3D GAN Inversion Through Geometry and Occlusion-Aware Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel encoder-based inversion framework based on EG3D, one of the most widely-used 3D GAN models. |
Ziyang Yuan; Yiming Zhu; Yu Li; Hongyu Liu; Chun Yuan; |
1316 | Modality Unifying Network for Visible-Infrared Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, the learned feature emphasizes the common patterns across modalities while suppressing modality-specific and identity-aware information that is valuable for Re-ID. To address these issues, we propose a novel Modality Unifying Network (MUN) to explore a robust auxiliary modality for VI-ReID. |
Hao Yu; Xu Cheng; Wei Peng; Weihao Liu; Guoying Zhao; |
1317 | DLT: Conditioned Layout Generation with Joint Discrete-Continuous Diffusion Layout Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the conditioning layout generation problem, we introduce DLT, a joint discrete-continuous diffusion model. |
Elad Levi; Eli Brosh; Mykola Mykhailych; Meir Perez; |
1318 | PADDLES: Phase-Amplitude Spectrum Disentangled Early Stopping for Learning with Noisy Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We thus propose early stopping at different times for the amplitude spectrum (AS) and phase spectrum (PS) by disentangling the features of some layer(s) into AS and PS using the Discrete Fourier Transform (DFT) during training. |
Huaxi Huang; Hui Kang; Sheng Liu; Olivier Salvado; Thierry Rakotoarivelo; Dadong Wang; Tongliang Liu; |
1319 | Taming Contrast Maximization for Learning Sequential, Low-latency, Event-based Optical Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the current state-of-the-art is still highly influenced by the frame-based literature, and usually fails to deliver on these promises. In this work, we take this into consideration and propose a novel self-supervised learning pipeline for the sequential estimation of event-based optical flow that allows for the scaling of the models to high inference frequencies. |
Federico Paredes-Vallés; Kirk Y. W. Scheper; Christophe De Wagter; Guido C. H. E. de Croon; |
1320 | CLIP-Cluster: CLIP-Guided Attribute Hallucination for Face Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Images of the same identity but with different face attributes usually tend to be clustered into different sub-clusters. For the first time, we propose an attribute hallucination framework named CLIP-Cluster to address this issue, which first hallucinates multiple representations for different attributes with the powerful CLIP model and then pools them by learning neighbor-adaptive attention. |
Shuai Shen; Wanhua Li; Xiaobing Wang; Dafeng Zhang; Zhezhu Jin; Jie Zhou; Jiwen Lu; |
1321 | CASSPR: Cross Attention Single Scan Place Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, existing methods struggle with fine-grained matching of subtle geometric features in sparse single-shot LiDAR scans. To overcome these limitations, we propose CASSPR as a method to fuse point-based and voxel-based approaches using cross attention transformers. |
Yan Xia; Mariia Gladkova; Rui Wang; Qianyun Li; Uwe Stilla; João F Henriques; Daniel Cremers; |
1322 | DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To leverage strong generative priors and address challenges such as unstable training and lack of interpretability for GAN-based generative methods, we propose a novel fusion algorithm based on the denoising diffusion probabilistic model (DDPM). |
Zixiang Zhao; Haowen Bai; Yuanzhi Zhu; Jiangshe Zhang; Shuang Xu; Yulun Zhang; Kai Zhang; Deyu Meng; Radu Timofte; Luc Van Gool; |
1323 | A Unified Continual Learning Framework with General Parameter-Efficient Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we position prompting as one instantiation of PET, and propose a unified CL framework with general PET, dubbed as Learning-Accumulation-Ensemble (LAE). |
Qiankun Gao; Chen Zhao; Yifan Sun; Teng Xi; Gang Zhang; Bernard Ghanem; Jian Zhang; |
1324 | Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel approach to generating the 3D motion of a human interacting with a target object, with a focus on solving the challenge of synthesizing long-range and diverse motions, which could not be fulfilled by existing auto-regressive models or path planning-based methods. |
Huaijin Pi; Sida Peng; Minghui Yang; Xiaowei Zhou; Hujun Bao; |
1325 | Learning Data-Driven Vector-Quantized Degradation Model for Animation Video Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we thoroughly explore the characteristics of animation videos and leverage the rich priors in real-world animation data for a more practical animation VSR model. |
Zixi Tuo; Huan Yang; Jianlong Fu; Yujie Dun; Xueming Qian; |
1326 | Compositional Feature Augmentation for Unbiased Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that all existing re-balancing strategies fail to increase the diversity of the relation triplet features of each predicate, which is critical for robust SGG. |
Lin Li; Guikun Chen; Jun Xiao; Yi Yang; Chunping Wang; Long Chen; |
1327 | Foreground and Text-lines Aware Document Image Rectification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses the distorted document image rectification problem, whose objective is to eliminate the geometric distortion in document images and realize document intelligence. |
Heng Li; Xiangping Wu; Qingcai Chen; Qianjin Xiang; |
1328 | Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the problem, we propose a network that only needs a single pass through the visual-language model for each input image. |
Cong Han; Yujie Zhong; Dengjie Li; Kai Han; Lin Ma; |
1329 | INSTA-BNN: Binary Neural Network with INSTAnce-aware Threshold Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To compensate for the accuracy drop, we propose a novel BNN design called Binary Neural Network with INSTAnce-aware threshold (INSTA-BNN), which controls the quantization threshold dynamically in an input-dependent or instance-aware manner. |
Changhun Lee; Hyungjun Kim; Eunhyeok Park; Jae-Joon Kim; |
1330 | Human-Inspired Facial Sketch Synthesis with Dynamic Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, different artists may use diverse drawing techniques and create multiple styles of sketches, but the style is globally consistent within a single sketch. Inspired by these observations, in this paper, we propose a novel Human-Inspired Dynamic Adaptation (HIDA) method. |
Fei Gao; Yifan Zhu; Chang Jiang; Nannan Wang; |
1331 | When Epipolar Constraint Meets Non-Local Operators in Multi-View Stereo Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, we propose to constrain non-local feature augmentation within a pair of lines: each point only attends to its corresponding pair of epipolar lines. |
Tianqi Liu; Xinyi Ye; Weiyue Zhao; Zhiyu Pan; Min Shi; Zhiguo Cao; |
1332 | LU-NeRF: Scene and Pose Estimation By Synchronizing Local Unposed NeRFs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel approach, LU-NeRF, that jointly estimates camera poses and neural radiance fields with relaxed assumptions on pose configuration. |
Zezhou Cheng; Carlos Esteves; Varun Jampani; Abhishek Kar; Subhransu Maji; Ameesh Makadia; |
1333 | Calibrating Panoramic Depth Estimation for Practical Localization and Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose that accurate depth estimated from panoramic images can serve as a powerful and light-weight input for a wide range of downstream tasks requiring 3D information. |
Junho Kim; Eun Sun Lee; Young Min Kim; |
1334 | DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the possibility of jointly modeling generation and discrimination. |
Runhui Huang; Jianhua Han; Guansong Lu; Xiaodan Liang; Yihan Zeng; Wei Zhang; Hang Xu; |
1335 | DS-Fusion: Artistic Typography Via Discriminated and Stylized Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a novel method to automatically generate an artistic typography by stylizing one or more letter fonts to visually convey the semantics of an input word, while ensuring that the output remains readable. |
Maham Tanveer; Yizhi Wang; Ali Mahdavi-Amiri; Hao Zhang; |
1336 | Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple yet effective framework to Distill the Knowledge from the VLM to a DETR-like detector, termed DK-DETR. |
Liangqi Li; Jiaxu Miao; Dahu Shi; Wenming Tan; Ye Ren; Yi Yang; Shiliang Pu; |
1337 | Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Scale-MAE, a pretraining method that explicitly learns relationships between data at different, known scales throughout the pretraining process. |
Colorado J Reed; Ritwik Gupta; Shufan Li; Sarah Brockman; Christopher Funk; Brian Clipp; Kurt Keutzer; Salvatore Candido; Matt Uyttendaele; Trevor Darrell; |
1338 | View Consistent Purification for Accurate Cross-View Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a fine-grained self-localization method for outdoor robotics that utilizes a flexible number of onboard cameras and readily accessible satellite images. |
Shan Wang; Yanhao Zhang; Akhil Perincherry; Ankit Vora; Hongdong Li; |
1339 | A Unified Framework for Robustness on Diverse Sampling Errors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work starts from the motivation to allow adaptive inference once the target is known, since it is accessible only at test time. |
Myeongho Jeon; Myungjoo Kang; Joonseok Lee; |
1340 | Efficient Video Action Detection with Token Dropout and Context Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an end-to-end framework for efficient video action detection (EVAD) based on vanilla ViTs. |
Lei Chen; Zhan Tong; Yibing Song; Gangshan Wu; Limin Wang; |
1341 | Explicit Motion Disentangling for Efficient Optical Flow Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework for optical flow estimation that achieves a good balance between performance and efficiency. |
Changxing Deng; Ao Luo; Haibin Huang; Shaodan Ma; Jiangyu Liu; Shuaicheng Liu; |
1342 | LiDAR-Camera Panoptic Segmentation Via Geometry-Consistent and Semantic-Aware Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose LCPS, the first LiDAR-Camera Panoptic Segmentation network. |
Zhiwei Zhang; Zhizhong Zhang; Qian Yu; Ran Yi; Yuan Xie; Lizhuang Ma; |
1343 | GrowCLIP: Data-Aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, it is important to utilize the knowledge in the current model to obtain efficient training and better performance. To address the above issues, in this paper, we propose GrowCLIP, a data-driven automatic model growing algorithm for contrastive language-image pre-training with continuous image-text pairs as input. |
Xinchi Deng; Han Shi; Runhui Huang; Changlin Li; Hang Xu; Jianhua Han; James Kwok; Shen Zhao; Wei Zhang; Xiaodan Liang; |
1344 | From Chaos Comes Order: Ordering Event Representations for Object Recognition and Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus finding representations with high task scores is equivalent to finding representations with a low GWD. We use this insight to, for the first time, perform a hyperparameter search on a large family of event representations, revealing new and powerful representations that exceed the state-of-the-art. |
Nikola Zubić; Daniel Gehrig; Mathias Gehrig; Davide Scaramuzza; |
1345 | LA-Net: Landmark-Aware Learning for Reliable Facial Expression Recognition Under Label Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The derived noisy labels significantly harm the performance in real-world scenarios. To address this issue, we present a new FER model named Landmark-Aware Net (LA-Net), which leverages facial landmarks to mitigate the impact of label noise from two perspectives. |
Zhiyu Wu; Jinshi Cui; |
1346 | Identity-Consistent Aggregation for Video Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, in this paper, we aim to enable the model to focus on the identity-consistent temporal contexts of each object to obtain more comprehensive object representations and handle the rapid object appearance variations such as occlusion, motion blur, etc. |
Chaorui Deng; Da Chen; Qi Wu; |
1347 | Scene-Aware Label Graph Learning for Multi-Label Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a novel scene-aware label graph learning framework is proposed, which is capable of learning visual representations for labels while fully perceiving their co-occurrence relationships under variable scenes. |
Xuelin Zhu; Jian Liu; Weijia Liu; Jiawei Ge; Bo Liu; Jiuxin Cao; |
1348 | Relightify: Relightable 3D Faces from A Single Image Via Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Following the remarkable success of diffusion models on image generation, recent works have also demonstrated their impressive ability to address a number of inverse problems in an unsupervised way, by properly constraining the sampling process based on a conditioning input. Motivated by this, in this paper, we present the first approach to use diffusion models as a prior for highly accurate 3D facial BRDF reconstruction from a single image. |
Foivos Paraperas Papantoniou; Alexandros Lattas; Stylianos Moschoglou; Stefanos Zafeiriou; |
1349 | Fcaformer: Forward Cross Attention in Hybrid Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, we propose a different approach that aims to improve the performance of transformer-based architectures by densifying the attention pattern. |
Haokui Zhang; Wenze Hu; Xiaoyu Wang; |
1350 | Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a text-video learning framework with progressive spatio-temporal prototype matching. |
Pandeng Li; Chen-Wei Xie; Liming Zhao; Hongtao Xie; Jiannan Ge; Yun Zheng; Deli Zhao; Yongdong Zhang; |
1351 | Leveraging Spatio-Temporal Dependency for Skeleton-Based Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the Spatio-Temporal Curve Network (STC-Net) to effectively leverage the spatio-temporal dependency of the human skeleton. |
Jungho Lee; Minhyeok Lee; Suhwan Cho; Sungmin Woo; Sungjun Jang; Sangyoun Lee; |
1352 | Data Augmented Flatness-aware Gradient Projection for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on our findings, we propose a Data Augmented Flatness-aware Gradient Projection (DFGP) method to solve the problem, which consists of three modules: data and weight perturbation, flatness-aware optimization, and gradient projection. |
Enneng Yang; Li Shen; Zhenyi Wang; Shiwei Liu; Guibing Guo; Xingwei Wang; |
1353 | Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel unsupervised domain adaptation method for person re-identification (reID) that generalizes a model trained on a labeled source domain to an unlabeled target domain. |
Geon Lee; Sanghoon Lee; Dohyung Kim; Younghoon Shin; Yongsang Yoon; Bumsub Ham; |
1354 | Sample-wise Label Confidence Incorporation for Learning with Noisy Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although recent studies on designing objective functions that are robust to label noise, known as robust loss methods, have shown promising results for learning with noisy labels, they suffer from underfitting not only noisy samples but also clean ones, leading to suboptimal model performance. To address this issue, we propose a novel learning framework that selectively suppresses noisy samples while avoiding underfitting clean data. |
Chanho Ahn; Kikyung Kim; Ji-won Baek; Jongin Lim; Seungju Han; |
1355 | CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these are not directly applicable to MMT since they do not provide aligned multimodal multilingual features for generative tasks. To alleviate this issue, instead of designing complex modules for MMT, we propose CLIPTrans, which simply adapts the independently pre-trained multimodal M-CLIP and the multilingual mBART. |
Devaansh Gupta; Siddhant Kharbanda; Jiawei Zhou; Wanhua Li; Hanspeter Pfister; Donglai Wei; |
1356 | SGAligner: 3D Scene Alignment with Scene Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios (i.e., unknown overlap – if any – and changes in the environment). |
Sayan Deb Sarkar; Ondrej Miksik; Marc Pollefeys; Daniel Barath; Iro Armeni; |
1357 | Name Your Colour For The Task: Artificially Discover Colour Naming Via Colour Quantisation Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we propose a novel colour quantisation transformer, CQFormer, that quantises colour space while maintaining the accuracy of machine recognition on the quantised images. |
Shenghan Su; Lin Gu; Yue Yang; Zenghui Zhang; Tatsuya Harada; |
1358 | FSAR: Federated Skeleton-based Action Recognition with Adaptive Topology Structure and Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate and discover that the heterogeneous human topology graph structure is the crucial factor hindering training stability. |
Jingwen Guo; Hong Liu; Shitong Sun; Tianyu Guo; Min Zhang; Chenyang Si; |
1359 | Video Adverse-Weather-Component Suppression Network Via Weather Messenger and Adversarial Backpropagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the first framework for restoring videos from all adverse weather conditions by developing a video adverse-weather-component suppression network (ViWS-Net). |
Yijun Yang; Angelica I. Aviles-Rivero; Huazhu Fu; Ye Liu; Weiming Wang; Lei Zhu; |
1360 | Efficient Discovery and Effective Evaluation of Visual Perceptual Similarity: A Benchmark and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce the first large-scale fashion visual similarity benchmark dataset, consisting of more than 110K expert-annotated image pairs. |
Oren Barkan; Tal Reiss; Jonathan Weill; Ori Katz; Roy Hirsch; Itzik Malkiel; Noam Koenigstein; |
1361 | Ego-Only: Egocentric Action Detection Without Exocentric Transferring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Ego-Only, the first approach that enables state-of-the-art action detection on egocentric (first-person) videos without any form of exocentric (third-person) transferring. |
Huiyu Wang; Mitesh Kumar Singh; Lorenzo Torresani; |
1362 | CoinSeg: Contrast Inter- and Intra- Class Representations for Incremental Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, releasing parameter training for plasticity could lead to the best performance for all categories, but this requires discriminative feature representation. Therefore, we prioritize the model’s plasticity and propose the Contrast inter- and intra-class representations for Incremental Segmentation (CoinSeg), which pursues discriminative representations for flexible parameter tuning. |
Zekang Zhang; Guangyu Gao; Jianbo Jiao; Chi Harold Liu; Yunchao Wei; |
1363 | Multi-View Active Fine-Grained Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, towards FGVC in the real physical world, we put forward the problem of multi-view active fine-grained visual recognition (MAFR) and complete this study in three steps: (i) a multi-view, fine-grained vehicle dataset is collected as the testbed, (ii) a pilot experiment is designed to validate the need and research value of MAFR, (iii) a policy-gradient-based framework along with a dynamic exiting strategy is proposed to achieve efficient recognition with active view selection. |
Ruoyi Du; Wenqing Yu; Heqing Wang; Ting-En Lin; Dongliang Chang; Zhanyu Ma; |
1364 | Part-Aware Transformer for Generalizable Person Re-identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We observe that while the global images of different IDs should have different features, their similar local parts (e.g., black backpack) are not bounded by this constraint. Motivated by this, we propose a pure Transformer model (termed Part-aware Transformer) for DG-ReID by designing a proxy task, named Cross-ID Similarity Learning (CSL), to mine local visual information shared by different IDs. |
Hao Ni; Yuke Li; Lianli Gao; Heng Tao Shen; Jingkuan Song; |
1365 | Variational Causal Inference Network for Explanatory Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, they neglect the complex relationships among question words, visual regions, and explanation tokens. To address these issues, we propose a Variational Causal Inference Network (VCIN) that establishes the causal correlation between predicted answers and explanations, and captures cross-modal relationships to generate rational explanations. |
Dizhan Xue; Shengsheng Qian; Changsheng Xu; |
1366 | Improving Representation Learning for Histopathologic Images with Cluster Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These SSL strategies are quickly bridging the performance disparity with their supervised counterparts. In this context, we introduce an SSL framework. |
Weiyi Wu; Chongyang Gao; Joseph DiPalma; Soroush Vosoughi; Saeed Hassanpour; |
1367 | Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Text-driven localized editing of 3D objects is particularly difficult as locally mixing the original 3D object with the intended new object and style effects without distorting the object’s form is not a straightforward process. To address this issue, we propose a novel NeRF-based model, Blending-NeRF, which consists of two NeRF networks: pretrained NeRF and editable NeRF. |
Hyeonseop Song; Seokhun Choi; Hoseok Do; Chul Lee; Taehyeong Kim; |
1368 | Panoramas from Photons Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we present a method capable of estimating extreme scene motion under challenging conditions, such as low light or high dynamic range, from a sequence of high-speed image frames such as those captured by a single-photon camera. |
Sacha Jungerman; Atul Ingle; Mohit Gupta; |
1369 | Global Adaptation Meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we combine global adaptation and local generalization in PoseDA, a simple yet effective framework of unsupervised domain adaptation for 3D human pose estimation. |
Wenhao Chai; Zhongyu Jiang; Jenq-Neng Hwang; Gaoang Wang; |
1370 | Learning Neural Implicit Surfaces with Object-Aware Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Object-aware Radiance Fields (ORF) to automatically learn an object-aware geometry reconstruction. |
Yiheng Zhang; Zhaofan Qiu; Yingwei Pan; Ting Yao; Tao Mei; |
1371 | PADCLIP: Pseudo-labeling with Adaptive Debiasing in CLIP for Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Catastrophic Forgetting Measurement (CFM) to adjust the learning rate to avoid excessive training (thus mitigating the catastrophic forgetting issue). |
Zhengfeng Lai; Noranart Vesdapunt; Ning Zhou; Jun Wu; Cong Phuoc Huynh; Xuelu Li; Kah Kuen Fu; Chen-Nee Chuah; |
1372 | Causal-DFQ: Causality Guided Data-Free Network Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the causal understanding, we propose the Causality-guided Data-free Network Quantization method, Causal-DFQ, to eliminate the reliance on data via approaching an equilibrium of causality-driven intervened distributions. |
Yuzhang Shang; Bingxin Xu; Gaowen Liu; Ramana Rao Kompella; Yan Yan; |
1373 | Enhancing Generalization of Universal Adversarial Perturbation Through Gradient Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we examine the serious dilemma of UAP generation methods from a generalization perspective — the gradient vanishing problem using small-batch stochastic gradient optimization and the local optima problem using large-batch optimization. |
Xuannan Liu; Yaoyao Zhong; Yuhang Zhang; Lixiong Qin; Weihong Deng; |
1374 | CancerUniT: Towards A Single Unified Model for Effective Detection, Segmentation, and Diagnosis of Eight Major Cancers Using A Large Collection of CT Scans Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we construct a Unified Tumor Transformer (CancerUniT) model to jointly detect tumor existence & location and diagnose tumor characteristics for eight major cancers in CT scans. |
Jieneng Chen; Yingda Xia; Jiawen Yao; Ke Yan; Jianpeng Zhang; Le Lu; Fakai Wang; Bo Zhou; Mingyan Qiu; Qihang Yu; Mingze Yuan; Wei Fang; Yuxing Tang; Minfeng Xu; Jian Zhou; Yuqian Zhao; Qifeng Wang; Xianghua Ye; Xiaoli Yin; Yu Shi; Xin Chen; Jingren Zhou; Alan Yuille; Zaiyi Liu; Ling Zhang; |
1375 | Dual Meta-Learning with Longitudinally Consistent Regularization for One-Shot Brain Tissue Segmentation Across The Human Lifespan Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a dual meta-learning paradigm to learn longitudinally consistent representations and persist when fine-tuning. |
Yongheng Sun; Fan Wang; Jun Shu; Haifeng Wang; Li Wang; Deyu Meng; Chunfeng Lian; |
1376 | DeFormer: Integrating Transformers with Deformable Models for 3D Shape Abstraction from A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel bi-channel Transformer architecture, integrated with parameterized deformable models, termed DeFormer, to simultaneously estimate global and local deformations. |
Di Liu; Xiang Yu; Meng Ye; Qilong Zhangli; Zhuowei Li; Zhixing Zhang; Dimitris N. Metaxas; |
1377 | Parallel Attention Interaction Network for Few-Shot Skeleton-Based Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing works aim to learn action-specific embeddings by exploiting either intra-skeleton or inter-skeleton spatial associations, which may lead to less discriminative representations. To address these issues, we propose a novel Parallel Attention Interaction Network (PAINet) that incorporates two complementary branches to strengthen the match by inter-skeleton and intra-skeleton correlation. |
Xingyu Liu; Sanping Zhou; Le Wang; Gang Hua; |
1378 | Cross-view Semantic Alignment for Livestreaming Product Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we contribute LPR4M, a large-scale multimodal dataset that covers 34 categories, comprises 3 modalities (image, video, and text), and is 50 times larger than the largest publicly available dataset. |
Wenjie Yang; Yiyi Chen; Yan Li; Yanhua Cheng; Xudong Liu; Quan Chen; Han Li; |
1379 | Continuously Masked Transformer for Image Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel continuous-mask-aware transformer for image inpainting, called CMT, which uses a continuous mask to represent the amount of error in each token. |
Keunsoo Ko; Chang-Su Kim; |
1380 | Vanishing Point Estimation in Uncalibrated Images with Prior Gravity Direction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We tackle the problem of estimating a Manhattan frame, i.e. three orthogonal vanishing points, and the unknown focal length of the camera, leveraging a prior vertical direction. |
Rémi Pautrat; Shaohui Liu; Petr Hruby; Marc Pollefeys; Daniel Barath; |
1381 | Learn TAROT with MENTOR: A Meta-Learned Self-Supervised Approach for Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we shed new light on learning common driving patterns by introducing meTA ROad paTh (TAROT) to formulate combinations of various relations between lanes on the road topology. |
Mozhgan Pourkeshavarz; Changhe Chen; Amir Rasouli; |
1382 | MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes an efficient multi-camera to Bird’s-Eye-View (BEV) view transformation method for 3D perception, dubbed MatrixVT. |
Hongyu Zhou; Zheng Ge; Zeming Li; Xiangyu Zhang; |
1383 | Local and Global Logit Adjustments for Long-Tailed Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the former is insufficient for tail classes due to the high imbalance factor of the entire dataset, while the latter may bring ambiguity in predicting unseen classes. To address these issues, we propose a novel Local and Global Logit Adjustments (LGLA) method that learns experts with full data covering all classes and enlarges the discrepancy among them by elaborated logit adjustments. |
Yingfan Tao; Jingna Sun; Hao Yang; Li Chen; Xu Wang; Wenming Yang; Daniel Du; Min Zheng; |
1384 | Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Requiring such knowledge is the main limitation of SSL and is often tackled by ad-hoc strategies e.g. applying known data-augmentations to the same input. In this work, we generalize and formalize this principle through Positive Active Learning (PAL) where an oracle queries semantic relationships between samples. |
Vivien Cabannes; Leon Bottou; Yann Lecun; Randall Balestriero; |
1385 | Wasserstein Expansible Variational Autoencoder for Discriminative and Generative Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes the Wasserstein Expansible Variational Autoencoder (WEVAE), which evaluates the statistical similarity between the probabilistic representation of new data and that represented by each mixture component and then uses it for deciding when to expand the model. |
Fei Ye; Adrian G. Bors; |
1386 | Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we study where to introduce and how to allocate trainable parameters by proposing a novel Sensitivity-aware visual Parameter-efficient fine-Tuning (SPT) scheme, which adaptively allocates trainable parameters to task-specific important positions given a desired tunable parameter budget. |
Haoyu He; Jianfei Cai; Jing Zhang; Dacheng Tao; Bohan Zhuang; |
1387 | Label-Free Event-based Object Recognition Via Joint Learning with Image Reconstruction from Events Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study label-free event-based object recognition where category labels and paired images are not available. |
Hoonhee Cho; Hyeonseong Kim; Yujeong Chae; Kuk-Jin Yoon; |
1388 | Gloss-Free Sign Language Translation: Improving from Visual-Language Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the scarcity of gloss-annotated sign language data, combined with the information bottleneck in the mid-level gloss representation, has hindered the further development of the SLT task. To address this challenge, we propose a novel Gloss-Free SLT based on Visual-Language Pretraining (GFSLT-VLP), which improves SLT by inheriting language-oriented prior knowledge from pre-trained models, without any gloss annotation assistance. |
Benjia Zhou; Zhigang Chen; Albert Clapés; Jun Wan; Yanyan Liang; Sergio Escalera; Zhen Lei; Du Zhang; |
1389 | Weakly-supervised 3D Pose Transfer with Keypoints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The main challenges of 3D pose transfer are: 1) Lack of paired training data with different characters performing the same pose; 2) Disentangling pose and shape information from the target mesh; 3) Difficulty in applying to meshes with different topologies. We thus propose a novel weakly-supervised keypoint-based framework to overcome these difficulties. |
Jinnan Chen; Chen Li; Gim Hee Lee; |
1390 | Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose APE, an Adaptive Prior rEfinement method for CLIP’s pre-trained knowledge, which achieves superior accuracy with high computational efficiency. |
Xiangyang Zhu; Renrui Zhang; Bowei He; Aojun Zhou; Dong Wang; Bin Zhao; Peng Gao; |
1391 | EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in The Backbone Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the second generation of egocentric video-language pre-training (EgoVLPv2), a significant improvement from the previous generation, by incorporating cross-modal fusion directly into the video and language backbones. |
Shraman Pramanick; Yale Song; Sayan Nag; Kevin Qinghong Lin; Hardik Shah; Mike Zheng Shou; Rama Chellappa; Pengchuan Zhang; |
1392 | On The Effectiveness of Spectral Discriminators for Perceptual Quality Improvement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the effectiveness of the spectral discriminators is not well interpreted yet. We tackle this issue by examining the spectral discriminators in the context of perceptual image super-resolution (i.e., GAN-based SR), as SR image quality is susceptible to spectral changes. |
Xin Luo; Yunan Zhu; Shunxin Xu; Dong Liu; |
1393 | Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This practice ensures high-quality pseudo labels, but incurs a relatively low utilization of the whole unlabeled set. In this work, our key insight is that these uncertain samples can be turned into certain ones, as long as the confusion classes for the top-1 class are detected and removed. |
Lihe Yang; Zhen Zhao; Lei Qi; Yu Qiao; Yinghuan Shi; Hengshuang Zhao; |
1394 | Deep Equilibrium Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a new query-based object detector (DEQDet) by designing a deep equilibrium decoder. |
Shuai Wang; Yao Teng; Limin Wang; |
1395 | Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, a novel Diffusion-based 3D Pose estimation (D3DP) method with Joint-wise reProjection-based Multi-hypothesis Aggregation (JPMA) is proposed for probabilistic 3D human pose estimation. |
Wenkang Shan; Zhenhua Liu; Xinfeng Zhang; Zhao Wang; Kai Han; Shanshe Wang; Siwei Ma; Wen Gao; |
1396 | RPEFlow: Multimodal Fusion of RGB-PointCloud-Event for Joint Optical Flow and Scene Flow Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we incorporate RGB images, Point clouds and Events for joint optical flow and scene flow estimation with our proposed multi-stage multimodal fusion model, RPEFlow. |
Zhexiong Wan; Yuxin Mao; Jing Zhang; Yuchao Dai; |
1397 | SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop SMAUG, an efficient pre-training framework for video-language models. |
Yuanze Lin; Chen Wei; Huiyu Wang; Alan Yuille; Cihang Xie; |
1398 | EP-ALM: Efficient Perceptual Augmentation of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose instead to direct effort toward efficient adaptations of existing models, and to augment Language Models with perception. |
Mustafa Shukor; Corentin Dancette; Matthieu Cord; |
1399 | Multimodal Optimal Transport-based Co-Attention Transformer with Global Structure Consistency for Survival Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although current approaches attempt to model these interactions via co-attention between histology and genomic data, they focus on only dense local similarity across modalities, which fails to capture global consistency between potential structures, i.e. TME-related interactions of histology and co-expression of genomic data. To address these challenges, we propose a Multimodal Optimal Transport-based Co-Attention Transformer framework with global structure consistency, in which optimal transport (OT) is applied to match patches of a WSI and genes embeddings for selecting informative patches to represent the gigapixel WSI. |
Yingxue Xu; Hao Chen; |
1400 | GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel Geometry-aware Facial Expression Translation (GaFET) framework, which is based on parametric 3D facial representations and can stably decouple expression. |
Tianxiang Ma; Bingchuan Li; Qian He; Jing Dong; Tieniu Tan; |
1401 | Communication-Efficient Vertical Federated Learning with Limited Overlapping Samples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a practical vertical federated learning (VFL) framework called one-shot VFL that can solve the communication bottleneck and the problem of limited overlapping samples simultaneously based on semi-supervised learning. |
Jingwei Sun; Ziyue Xu; Dong Yang; Vishwesh Nath; Wenqi Li; Can Zhao; Daguang Xu; Yiran Chen; Holger R. Roth; |
1402 | On The Audio-visual Synchronization for Lip-to-Speech Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate that commonly used audiovisual datasets such as GRID, TCD-TIMIT, and Lip2Wav can, however, have the data asynchrony issue, which will lead to inaccurate evaluation with conventional time alignment-sensitive metrics such as STOI, ESTOI, and MCD. |
Zhe Niu; Brian Mak; |
1403 | Robust One-Shot Face Video Re-enactment Using Hybrid Latent Spaces of StyleGAN2 Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, existing methods are sensitive (not robust) to the source frame’s facial expressions and head pose, even though ideally only the identity of the source frame should have an effect. Addressing these limitations, we propose a novel framework exploiting the implicit 3D prior and inherent latent properties of StyleGAN2 to facilitate one-shot face re-enactment at 1024×1024 (1) with zero dependencies on explicit structural priors, (2) accommodating attribute edits, and (3) robust to diverse facial expressions and head poses of the source frame. |
Trevine Oorloff; Yaser Yacoob; |
1404 | BallGAN: 3D-aware Image Synthesis with A Spherical Background Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to represent the background as a spherical surface for multiple reasons inspired by computer graphics. |
Minjung Shin; Yunji Seo; Jeongmin Bae; Young Sun Choi; Hyunsu Kim; Hyeran Byun; Youngjung Uh; |
1405 | RPG-Palm: Realistic Pseudo-data Generation for Palmprint Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel realistic pseudo-palmprint generation (RPG) model to synthesize palmprints with massive identities. |
Lei Shen; Jianlong Jin; Ruixin Zhang; Huaen Li; Kai Zhao; Yingyi Zhang; Jingyun Zhang; Shouhong Ding; Yang Zhao; Wei Jia; |
1406 | Lecture Presentations Multimodal Dataset: Towards Understanding Multimodality in Educational Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a step toward developing vision-language models to aid in student learning as intelligent teacher assistants, we introduce the Lecture Presentations Multimodal (LPM) Dataset as a large-scale benchmark testing the capabilities of vision-and-language models in multimodal understanding of educational videos. |
Dong Won Lee; Chaitanya Ahuja; Paul Pu Liang; Sanika Natu; Louis-Philippe Morency; |
1407 | Window-Based Early-Exit Cascades for Uncertainty Estimation: When Deep Ensembles Are More Efficient Than Single Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is achieved by cascading ensemble members via an early-exit approach. In this work, we investigate extending these efficiency gains to tasks related to uncertainty estimation. |
Guoxuan Xia; Christos-Savvas Bouganis; |
1408 | AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to the complex spatio-temporal nature of human motion and the difficulty in learning the cross-modal relationship between text and motion, text-driven motion generation is still a challenging problem. To address these issues, we propose AttT2M, a two-stage method with multi-perspective attention mechanism: body-part attention and global-local motion-text attention. |
Chongyang Zhong; Lei Hu; Zihao Zhang; Shihong Xia; |
1409 | A Theory of Topological Derivatives for Inverse Rendering of Geometry Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a theoretical framework for differentiable surface evolution that allows discrete topology changes through the use of topological derivatives for variational optimization of image functionals. |
Ishit Mehta; Manmohan Chandraker; Ravi Ramamoorthi; |
1410 | Canonical Factors for Hybrid Neural Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Factored feature volumes offer a simple way to build more compact, efficient, and interpretable neural fields, but also introduce biases that are not necessarily beneficial for real-world data. In this work, we (1) characterize the undesirable biases that these architectures have for axis-aligned signals (they can lead to radiance field reconstruction differences of as high as 2 PSNR) and (2) explore how learning a set of canonicalizing transformations can improve representations by removing these biases. |
Brent Yi; Weijia Zeng; Sam Buchanan; Yi Ma; |
1411 | XNet: Wavelet-Based Low and High Frequency Fusion Networks for Fully- and Semi-Supervised Semantic Segmentation of Biomedical Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a wavelet-based LF and HF fusion model XNet, which supports both fully- and semi-supervised semantic segmentation and outperforms state-of-the-art models in both fields. |
Yanfeng Zhou; Jiaxing Huang; Chenlong Wang; Le Song; Ge Yang; |
1412 | Betrayed By Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we focus on open vocabulary instance segmentation to expand a segmentation model to classify and segment instance-level novel categories. |
Jianzong Wu; Xiangtai Li; Henghui Ding; Xia Li; Guangliang Cheng; Yunhai Tong; Chen Change Loy; |
1413 | StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, StyleGAN is inherently limited to cropped aligned faces at the fixed image resolution it is pre-trained on. In this paper, we propose a simple and effective solution to this limitation by using dilated convolutions to rescale the receptive fields of shallow layers in StyleGAN, without altering any model parameters. |
Shuai Yang; Liming Jiang; Ziwei Liu; Chen Change Loy; |
1414 | HandR2N2: Iterative 3D Hand Pose Estimation Using A Residual Recurrent Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the existing models follow a non-recurrent scheme and thus require complex architectures or redundant parameters in order to achieve acceptable model capacity. To tackle this limitation, this paper proposes HandR2N2, a compact neural network that iteratively regresses the hand pose using a novel residual recurrent unit. |
Wencan Cheng; Jong Hwan Ko; |
1415 | GET: Group Event Transformer for Event-Based Vision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing event-based backbones mainly rely on image-based designs to extract spatial information within the image transformed from events, overlooking important event properties like time and polarity. To address this issue, we propose a novel Group-based vision Transformer backbone for Event-based vision, called Group Event Transformer (GET), which decouples temporal-polarity information from spatial information throughout the feature extraction process. |
Yansong Peng; Yueyi Zhang; Zhiwei Xiong; Xiaoyan Sun; Feng Wu; |
1416 | Unsupervised Learning of Object-Centric Embeddings for Cell Instance Segmentation in Microscopy Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Segmentation of objects in microscopy images is required for many biomedical applications. We introduce object-centric embeddings (OCEs), which embed image patches such that the spatial offsets between patches cropped from the same object are preserved. |
Steffen Wolf; Manan Lalit; Katie McDole; Jan Funke; |
1417 | DyGait: Exploiting Dynamic Representations for High-performance Gait Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since dynamic parts of the human body are more informative than other parts (e.g. bags) during walking, in this paper, we propose a novel and high-performance framework named DyGait. |
Ming Wang; Xianda Guo; Beibei Lin; Tian Yang; Zheng Zhu; Lincheng Li; Shunli Zhang; Xin Yu; |
1418 | When Do Curricula Work in Federated Learning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate a different and rarely studied dimension of FL: ordered learning. |
Saeed Vahidian; Sreevatsank Kadaveru; Woonjoon Baek; Weijia Wang; Vyacheslav Kungurtsev; Chen Chen; Mubarak Shah; Bill Lin; |
1419 | XiNet: Efficient Neural Networks for TinyML Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article, we propose XiNet, a novel convolutional neural architecture that targets edge devices. |
Alberto Ancilotto; Francesco Paissan; Elisabetta Farella; |
1420 | GridPull: Towards Scalability in Learning Implicit Representations from 3D Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To resolve the scalability issue in surface reconstruction, we propose GridPull to improve the efficiency of learning implicit representations from large scale point clouds. |
Chao Chen; Yu-Shen Liu; Zhizhong Han; |
1421 | Audio-Visual Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce audio-visual class-incremental learning, a class-incremental learning scenario for audio-visual video recognition. |
Weiguo Pian; Shentong Mo; Yunhui Guo; Yapeng Tian; |
1422 | GeoMIM: Towards Better 3D Knowledge Transfer Via Masked Image Modeling for Multi-view 3D Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Geometry Enhanced Masked Image Modeling (GeoMIM) to transfer the knowledge of the LiDAR model in a pretrain-finetune paradigm for improving the multi-view camera-based 3D detection. |
Jihao Liu; Tai Wang; Boxiao Liu; Qihang Zhang; Yu Liu; Hongsheng Li; |
1423 | Towards Viewpoint-Invariant Visual Recognition Via Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the success of adversarial training in promoting model robustness, we propose Viewpoint-Invariant Adversarial Training (VIAT) to improve viewpoint robustness of common image classifiers. |
Shouwei Ruan; Yinpeng Dong; Hang Su; Jianteng Peng; Ning Chen; Xingxing Wei; |
1424 | Helping Hands: An Object-Aware Ego-Centric Video Recognition Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce an object-aware decoder for improving the performance of spatio-temporal representations on ego-centric videos. |
Chuhan Zhang; Ankush Gupta; Andrew Zisserman; |
1425 | RenderIH: A Large-Scale Synthetic Dataset for 3D Interacting Hand Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To generate natural and diverse interacting poses, we propose a new pose optimization algorithm. |
Lijun Li; Linrui Tian; Xindi Zhang; Qi Wang; Bang Zhang; Liefeng Bo; Mengyuan Liu; Chen Chen; |
1426 | Multi-Metrics Adaptively Identifies Backdoors in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we revisit the distance-based defense methods and discover that i) Euclidean distance becomes meaningless in high dimensions and ii) malicious gradients with diverse characteristics cannot be identified by a single metric. |
Siquan Huang; Yijiang Li; Chong Chen; Leyu Shi; Ying Gao; |
1427 | SpinCam: High-Speed Imaging Via A Rotating Point-Spread Function Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our work, we propose a novel approach for compressive high-speed imaging based on temporally coding the camera’s point-spread function (PSF). |
Dorian Chan; Mark Sheinin; Matthew O’Toole; |
1428 | FPR: False Positive Rectification for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a False Positive Rectification (FPR) approach to tackle the co-occurrence problem by leveraging the false positives of CAM. |
Liyi Chen; Chenyang Lei; Ruihuang Li; Shuai Li; Zhaoxiang Zhang; Lei Zhang; |
1429 | Cross-modal Scalable Hierarchical Clustering in Hyperbolic Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, existing approaches are hampered by their inability to scale to large datasets and the discrete encoding of the hierarchy. We introduce scalable Hyperbolic Hierarchical Clustering (sHHC) which overcomes these limitations by learning continuous hierarchies in hyperbolic space. |
Teng Long; Nanne van Noord; |
1430 | DETRDistill: A Universal Knowledge Distillation Framework for DETR-families Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose DETRDistill, a novel knowledge distillation method dedicated to DETR-families. |
Jiahao Chang; Shuo Wang; Hai-Ming Xu; Zehui Chen; Chenhongyi Yang; Feng Zhao; |
1431 | F&F Attack: Adversarial Attack Against Multiple Object Trackers By Inducing False Negatives and False Positives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel False negative and False positive attack (F&F attack) mechanism: it perturbs the input image to erase original detections and to inject deceptive false alarms around original ones while integrating the association attack implicitly. |
Tao Zhou; Qi Ye; Wenhan Luo; Kaihao Zhang; Zhiguo Shi; Jiming Chen; |
1432 | Transferable Decoding with Visual Entities for Zero-Shot Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose ViECap, a transferable decoding model that leverages entity-aware decoding to generate descriptions in both seen and unseen scenarios. |
Junjie Fei; Teng Wang; Jinrui Zhang; Zhenyu He; Chengjie Wang; Feng Zheng; |
1433 | ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose ReMoDiffuse, a diffusion-model-based motion generation framework that integrates a retrieval mechanism to refine the denoising process. |
Mingyuan Zhang; Xinying Guo; Liang Pan; Zhongang Cai; Fangzhou Hong; Huirong Li; Lei Yang; Ziwei Liu; |
1434 | GlueStick: Robust Image Matching By Sticking Points and Lines Together Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a new matching paradigm, where points, lines, and their descriptors are unified into a single wireframe structure. |
Rémi Pautrat; Iago Suárez; Yifan Yu; Marc Pollefeys; Viktor Larsson; |
1435 | Computational 3D Imaging with Position Sensors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a structured light system based on position sensing diodes (PSDs), an unconventional sensing modality that directly measures the centroid of the spatial distribution of incident light, thus enabling high-resolution 3D laser scanning with a minimal amount of sensor data. |
Jeremy Klotz; Mohit Gupta; Aswin C. Sankaranarayanan; |
1436 | PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised RGB-D Point Cloud Registration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To leverage the complementary information more effectively, we propose a network implementing multi-scale bidirectional fusion between RGB images and point clouds generated from depth images. |
Mingzhi Yuan; Kexue Fu; Zhihao Li; Yucong Meng; Manning Wang; |
1437 | Towards Multi-Layered 3D Garments Animation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel data-driven method, called LayersNet, to model garment-level animations as particle-wise interactions in a micro physics system. |
Yidi Shao; Chen Change Loy; Bo Dai; |
1438 | LiveHand: Real-time and Photorealistic Neural Hand Rendering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the first neural-implicit approach to photo-realistically render hands in real-time. |
Akshay Mundra; Mallikarjun B R; Jiayi Wang; Marc Habermann; Christian Theobalt; Mohamed Elgharib; |
1439 | Advancing Referring Expression Segmentation Beyond Single Image Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a more realistic setting, named Group-wise Referring Expression Segmentation (GRES), which expands RES to a group of related images, allowing the described objects to exist in a subset of the input image group. |
Yixuan Wu; Zhao Zhang; Chi Xie; Feng Zhu; Rui Zhao; |
1440 | Learning Image Harmonization in The Linear Color Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel neural approach to harmonize the image colors in a camera-independent color space, in which color values are proportional to the scene radiance. |
Ke Xu; Gerhard Petrus Hancke; Rynson W.H. Lau; |
1441 | Chasing Clouds: Differentiable Volumetric Rasterisation of Point Clouds As A Highly Efficient and Accurate Loss for Large-Scale Deformable 3D Registration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel Differentiable Volumetric Rasterisation of point Clouds (DiVRoC) that overcomes those limitations and offers a highly efficient and accurate loss for large-scale deformable 3D registration. |
Mattias P. Heinrich; Alexander Bigalke; Christoph Großbröhmer; Lasse Hansen; |
1442 | TripLe: Revisiting Pretrained Model Reuse and Progressive Learning for Efficient Vision Transformer Scaling and Searching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, maintaining the pretrained optimizer states for weights is critical for model scaling, whereas the new weights added during expansion lack these states in pretrained models. To address these issues, we propose TripLe, which partially scales a model before training, while growing the rest of the new parameters during training by copying both the warmed-up weights and their optimizer states from existing weights. |
Cheng Fu; Hanxian Huang; Zixuan Jiang; Yun Ni; Lifeng Nai; Gang Wu; Liqun Cheng; Yanqi Zhou; Sheng Li; Andrew Li; Jishen Zhao; |
1443 | LogicSeg: Parsing Visual Semantics with Neural Logic Learning and Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is in stark contrast to human cognition which abstracts visual perceptions at multiple levels and conducts symbolic reasoning with such structured abstraction. To fill these fundamental gaps, we devise LogicSeg, a holistic visual semantic parser that integrates neural inductive learning and logic reasoning with both rich data and symbolic knowledge. |
Liulei Li; Wenguan Wang; Yi Yang; |
1444 | The Devil Is in The Upsampling: Architectural Decisions Made Simpler for Denoising with Deep Image Prior Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we demonstrate from a frequency perspective that unlearnt upsampling is the main driving force behind the denoising phenomenon with DIP. |
Yilin Liu; Jiang Li; Yunkui Pang; Dong Nie; Pew-Thian Yap; |
1445 | Video Object Segmentation-aware Video Frame Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a video object segmentation (VOS)-aware training framework called VOS-VFI that allows VFI models to interpolate frames with more precise object boundaries. |
Jun-Sang Yoo; Hongjae Lee; Seung-Won Jung; |
1446 | Coherent Event Guided Low-Light Video Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For low-light videos, event data are not only suitable to help capture temporal correspondences but also provide alternative observations in the form of intensity ratios between consecutive frames and exposure-invariant information. Motivated by this, we propose a low-light video enhancement method with hybrid inputs of events and frames. |
Jinxiu Liang; Yixin Yang; Boyu Li; Peiqi Duan; Yong Xu; Boxin Shi; |
1447 | Texture Learning Domain Randomization for Domain Generalized Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose a novel framework, coined Texture Learning Domain Randomization (TLDR). |
Sunghwan Kim; Dae-hwan Kim; Hoseong Kim; |
1448 | FCCNs: Fully Complex-valued Convolutional Networks Using Complex-valued Color Model and Loss Function Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, they lack an end-to-end flow of complex-valued information, making them inconsistent w.r.t. the claimed operating domain, i.e., complex numbers. Considering these inconsistencies, we propose a complex-valued color model and loss function and turn fully-connected layers into convolutional layers. |
Saurabh Yadav; Koteswar Rao Jerripothula; |
1449 | Learning Concise and Descriptive Attributes for Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We hypothesize that there exist subsets of attributes that can maintain the classification performance with much smaller sizes, and propose a novel learning-to-search method to discover those concise sets of attributes. |
An Yan; Yu Wang; Yiwu Zhong; Chengyu Dong; Zexue He; Yujie Lu; William Yang Wang; Jingbo Shang; Julian McAuley; |
1450 | Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we target a more challenging scenario, i.e., joint scene novel view synthesis and editing based on implicit neural scene representations. |
Yuxin Wang; Wayne Wu; Dan Xu; |
1451 | Label-Noise Learning with Intrinsically Long-Tailed Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a learning framework for label-noise learning with intrinsically long-tailed data. |
Yang Lu; Yiliang Zhang; Bo Han; Yiu-ming Cheung; Hanzi Wang; |
1452 | SeeABLE: Soft Discrepancies and Bounded Contrastive Learning for Exposing Deepfakes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel deepfake detector, called SeeABLE, that formalizes the detection problem as a (one-class) out-of-distribution detection task and generalizes better to unseen deepfakes. |
Nicolas Larue; Ngoc-Son Vu; Vitomir Struc; Peter Peer; Vassilis Christophides; |
1453 | Semi-Supervised Learning Via Weight-Aware Distillation Under Class Distribution Mismatch Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, by strict mathematical reasoning, we reveal that the SSL error under class distribution mismatch is composed of pseudo-labeling error and invasion error, both of which jointly bound the SSL population risk. |
Pan Du; Suyun Zhao; Zisen Sheng; Cuiping Li; Hong Chen; |
1454 | ELFNet: Evidential Local-global Fusion for Stereo Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce the Evidential Local-global Fusion (ELF) framework for stereo matching, which endows both uncertainty estimation and confidence-aware fusion with trustworthy heads. |
Jieming Lou; Weide Liu; Zhuo Chen; Fayao Liu; Jun Cheng; |
1455 | SimpleClick: Interactive Image Segmentation with Simple Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although this design is simple and has been proven effective, it has not yet been explored for interactive segmentation. To fill this gap, we propose SimpleClick, the first plain-backbone method for interactive segmentation. |
Qin Liu; Zhenlin Xu; Gedas Bertasius; Marc Niethammer; |
1456 | Towards Content-based Pixel Retrieval in Revisited Oxford and Paris Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces the first two landmark pixel retrieval benchmarks. |
Guoyuan An; Woo Jae Kim; Saelyne Yang; Rong Li; Yuchi Huo; Sun-Eui Yoon; |
1457 | S-TREK: Sequential Translation and Rotation Equivariant Keypoints for Local Feature Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we introduce S-TREK, a novel local feature extractor that combines a deep keypoint detector, which is both translation and rotation equivariant by design, with a lightweight deep descriptor extractor. |
Emanuele Santellani; Christian Sormann; Mattia Rossi; Andreas Kuhn; Friedrich Fraundorfer; |
1458 | Retro-FPN: Retrospective Feature Pyramid Network for Point Cloud Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most previous methods suffer from ambiguous region features or fail to refine per-point features effectively, which leads to information loss and ambiguous semantic identification. To resolve this, we propose Retro-FPN to model the per-point feature prediction as an explicit and retrospective refining process, which goes through all the pyramid layers to extract semantic features explicitly for each point. |
Peng Xiang; Xin Wen; Yu-Shen Liu; Hui Zhang; Yi Fang; Zhizhong Han; |
1459 | Rethinking Range View Representation for LiDAR Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we unveil several key factors in building powerful range view models. |
Lingdong Kong; Youquan Liu; Runnan Chen; Yuexin Ma; Xinge Zhu; Yikang Li; Yuenan Hou; Yu Qiao; Ziwei Liu; |
1460 | Divide and Conquer: 3D Point Cloud Instance Segmentation With Point-Wise Binarization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To suppress potential over-segmentation, we propose to construct local scenes with the weight mask for each instance. |
Weiguang Zhao; Yuyao Yan; Chaolong Yang; Jianan Ye; Xi Yang; Kaizhu Huang; |
1461 | BANSAC: A Dynamic BAyesian Network for Adaptive SAmple Consensus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we derive a dynamic Bayesian network that updates individual data points’ inlier scores while iterating RANSAC. |
Valter Piedade; Pedro Miraldo; |
1462 | ShapeScaffolder: Structure-Aware 3D Shape Generation from Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ShapeScaffolder, a structure-based neural network for generating colored 3D shapes based on text input. |
Xi Tian; Yong-Liang Yang; Qi Wu; |
1463 | Read-only Prompt Optimization for Vision-Language Few-shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, learnable prompts can affect the internal representation within the self-attention module, which may negatively impact performance variance and generalization, especially in data-deficient settings. To address these issues, we propose a novel approach, Read-only Prompt Optimization (RPO). |
Dongjun Lee; Seokwon Song; Jihee Suh; Joonmyeong Choi; Sanghyeok Lee; Hyunwoo J. Kim; |
1464 | COCO-O: A Benchmark for Object Detectors Under Natural Distribution Shifts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To give a more comprehensive robustness assessment, we introduce COCO-O(ut-of-distribution), a test dataset based on COCO with 6 types of natural distribution shifts. |
Xiaofeng Mao; Yuefeng Chen; Yao Zhu; Da Chen; Hang Su; Rong Zhang; Hui Xue; |
1465 | E2NeRF: Event Enhanced Neural Radiance Fields from Blurry Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is difficult to reconstruct a sharp NeRF from blurry input, as often occurs in the wild. To solve this problem, we propose a novel Event-Enhanced NeRF (E2NeRF) by utilizing the combination data of a bio-inspired event camera and a standard RGB camera. |
Yunshan Qi; Lin Zhu; Yu Zhang; Jia Li; |
1466 | EgoTV: Egocentric Task Verification from Natural Language Task Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the needs of EgoTV, we propose a novel Neuro-Symbolic Grounding (NSG) approach that leverages symbolic representations to capture the compositional and temporal structure of tasks. |
Rishi Hazra; Brian Chen; Akshara Rai; Nitin Kamra; Ruta Desai; |
1467 | Benchmarking Low-Shot Robustness to Natural Distribution Shifts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, such fine-tuning assumes access to large amounts of labelled data, and the extent to which the observations hold when the amount of training data is not as high remains unknown. We address this gap by performing the first in-depth study of robustness to various natural distribution shifts in different low-shot regimes: spanning datasets, architectures, pre-trained initializations, and state-of-the-art robustness interventions. |
Aaditya Singh; Kartik Sarangmath; Prithvijit Chattopadhyay; Judy Hoffman; |
1468 | AdaptGuard: Defending Against Universal Attacks for Model Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explore both universal adversarial perturbations and backdoor attacks as loopholes on the source side and discover that they still survive in the target models after adaptation. To address this issue, we propose a model preprocessing framework, named AdaptGuard, to improve the security of model adaptation algorithms. |
Lijun Sheng; Jian Liang; Ran He; Zilei Wang; Tieniu Tan; |
1469 | StageInteractor: Query-based Object Detector with Cross-stage Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new query-based object detector with cross-stage interaction, coined as StageInteractor. |
Yao Teng; Haisong Liu; Sheng Guo; Limin Wang; |
1470 | DeLiRa: Self-Supervised Depth, Light, and Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to use the multi-view photometric objective from the self-supervised depth estimation literature as a geometric regularizer for volumetric rendering, significantly improving novel view synthesis without requiring additional information. |
Vitor Guizilini; Igor Vasiljevic; Jiading Fang; Rares Ambrus; Sergey Zakharov; Vincent Sitzmann; Adrien Gaidon; |
1471 | Moment Detection in Long Tutorial Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on the task of moment detection, in which the goal is to localise the temporal window where a given event occurs within a given tutorial video. |
Ioana Croitoru; Simion-Vlad Bogolin; Samuel Albanie; Yang Liu; Zhaowen Wang; Seunghyun Yoon; Franck Dernoncourt; Hailin Jin; Trung Bui; |
1472 | Stable Cluster Discrimination for Deep Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we first show that the prevalent discrimination task in supervised learning is unstable for one-stage clustering due to the lack of ground-truth labels and positive instances for certain clusters in each mini-batch. To mitigate the issue, a novel stable cluster discrimination (SeCu) task is proposed and a new hardness-aware clustering criterion can be obtained accordingly. |
Qi Qian; |
1473 | Pix2Video: Video Editing Using Image Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: They support inverting real images and conditional (e.g., text) generation, making them attractive for high-quality image editing applications. We investigate how to use such pre-trained image models for text-guided video editing. |
Duygu Ceylan; Chun-Hao P. Huang; Niloy J. Mitra; |
1474 | DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new operator, called 3D DeFormable Attention (DFA3D), for 2D-to-3D feature lifting, which transforms multi-view 2D image features into a unified 3D space for 3D object detection. |
Hongyang Li; Hao Zhang; Zhaoyang Zeng; Shilong Liu; Feng Li; Tianhe Ren; Lei Zhang; |
1475 | Holistic Geometric Feature Learning for Structured Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Accordingly, we propose a frequency-domain feature learning strategy (F-Learn) to fuse scattered geometric fragments holistically for topology-intact structure reasoning. |
Ziqiong Lu; Linxi Huan; Qiyuan Ma; Xianwei Zheng; |
1476 | FateZero: Fusing Attentions for Zero-shot Text-based Video Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose FateZero, a zero-shot text-based editing method on real-world videos without per-prompt training or use-specific mask. |
Chenyang QI; Xiaodong Cun; Yong Zhang; Chenyang Lei; Xintao Wang; Ying Shan; Qifeng Chen; |
1477 | Uncertainty-guided Learning for Improving Image Manipulation Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address both problems by introducing an uncertainty-guided learning framework, which measures data and model uncertainty by a novel Uncertainty Estimation Network (UEN). |
Kaixiang Ji; Feng Chen; Xin Guo; Yadong Xu; Jian Wang; Jingdong Chen; |
1478 | LMR: A Large-Scale Multi-Reference Dataset for Reference-Based Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we construct a large-scale, multi-reference super-resolution dataset, named LMR. |
Lin Zhang; Xin Li; Dongliang He; Fu Li; Errui Ding; Zhaoxiang Zhang; |
1479 | Neural Implicit Surface Evolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work investigates the use of smooth neural networks for modeling dynamic variations of implicit surfaces under the level set equation (LSE). |
Tiago Novello; Vinicius da Silva; Guilherme Schardong; Luiz Schirmer; Helio Lopes; Luiz Velho; |
1480 | Distribution-Aligned Diffusion for Human Mesh Recovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by their capability, we explore a diffusion-based approach for human mesh recovery, and propose a Human Mesh Diffusion (HMDiff) framework which frames mesh recovery as a reverse diffusion process. |
Lin Geng Foo; Jia Gong; Hossein Rahmani; Jun Liu; |
1481 | Rosetta Neurons: Mining The Common Units in A Model Zoo Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an algorithm for mining a dictionary of Rosetta Neurons across several popular vision models: Class Supervised-ResNet50, DINO-ResNet50, DINO-ViT, MAE, CLIP-ResNet50, BigGAN, StyleGAN-2, StyleGAN-XL. |
Amil Dravid; Yossi Gandelsman; Alexei A. Efros; Assaf Shocher; |
1482 | Semi-Supervised Semantic Segmentation Under Label Noise Via Diverse Learning Groups Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an approach that is robust to label noise in the annotated data. |
Peixia Li; Pulak Purkait; Thalaiyasingam Ajanthan; Majid Abdolshah; Ravi Garg; Hisham Husain; Chenchen Xu; Stephen Gould; Wanli Ouyang; Anton van den Hengel; |
1483 | AdaMV-MoE: Adaptive Multi-Task Vision Mixture-of-Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an adaptive MoE framework for multi-task vision recognition, dubbed AdaMV-MoE. |
Tianlong Chen; Xuxi Chen; Xianzhi Du; Abdullah Rashwan; Fan Yang; Huizhong Chen; Zhangyang Wang; Yeqing Li; |
1484 | Hierarchical Visual Categories Modeling: A Joint Representation Learning and Density Estimation Framework for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel hierarchical visual category modeling scheme to separate out-of-distribution data from in-distribution data through joint representation learning and statistical modeling. |
Jinglun Li; Xinyu Zhou; Pinxue Guo; Yixuan Sun; Yiwen Huang; Weifeng Ge; Wenqiang Zhang; |
1485 | Diffuse3D: Wide-Angle 3D Photography Via Bilateral Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to resolve the challenging problem of wide-angle novel view synthesis from a single image, a.k.a. wide-angle 3D photography. |
Yutao Jiang; Yang Zhou; Yuan Liang; Wenxi Liu; Jianbo Jiao; Yuhui Quan; Shengfeng He; |
1486 | ReNeRF: Relightable Neural Radiance Fields with Nearfield Lighting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we target the application scenario of capturing high-fidelity assets for neural relighting in controlled studio conditions, but without requiring a dense light stage. |
Yingyan Xu; Gaspard Zoss; Prashanth Chandran; Markus Gross; Derek Bradley; Paulo Gotardo; |
1487 | Segment Anything Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. |
Alexander Kirillov; Eric Mintun; Nikhila Ravi; Hanzi Mao; Chloe Rolland; Laura Gustafson; Tete Xiao; Spencer Whitehead; Alexander C. Berg; Wan-Yen Lo; Piotr Dollar; Ross Girshick; |
1488 | Unsupervised Prompt Tuning for Text-Driven Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a rarely studied problem is to optimize text prompts without using any annotations. In this paper, we delve into this problem and propose an Unsupervised Prompt Tuning framework for text-driven object detection, which is composed of two novel mean teaching mechanisms. |
Weizhen He; Weijie Chen; Binbin Chen; Shicai Yang; Di Xie; Luojun Lin; Donglian Qi; Yueting Zhuang; |
1489 | Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a self-supervised method for learning motion-focused video representations. |
Fida Mohammad Thoker; Hazel Doughty; Cees G. M. Snoek; |
1490 | Re-ReND: Real-Time Rendering of NeRFs Across Devices Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a novel approach for rendering a pre-trained Neural Radiance Field (NeRF) in real-time on resource-constrained devices. |
Sara Rojas; Jesus Zarzar; Juan C. Pérez; Artsiom Sanakoyeu; Ali Thabet; Albert Pumarola; Bernard Ghanem; |
1491 | 360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore 360° images for visual object tracking and perceive new challenges caused by large distortion, stitching artifacts, and other unique attributes of 360° images. |
Huajian Huang; Yinzhe Xu; Yingshu Chen; Sai-Kit Yeung; |
1492 | Is Imitation All You Need? Generalized Decision-Making with Dual-Phase Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce DualMind, a generalist agent designed to tackle various decision-making tasks that addresses challenges posed by current methods, such as overfitting behaviors and dependence on task-specific fine-tuning. |
Yao Wei; Yanchao Sun; Ruijie Zheng; Sai Vemprala; Rogerio Bonatti; Shuhang Chen; Ratnesh Madaan; Zhongjie Ba; Ashish Kapoor; Shuang Ma; |
1493 | Generalizing Event-Based Motion Deblurring in Real-World Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current approaches are limited in their practical usage, as they assume the same spatial resolution of inputs and specific blurriness distributions. This work addresses these limitations and aims to generalize the performance of event-based deblurring in real-world scenarios. |
Xiang Zhang; Lei Yu; Wen Yang; Jianzhuang Liu; Gui-Song Xia; |
1494 | Handwritten and Printed Text Segmentation: A Signature Case Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, in this research, we develop novel approaches to address the challenges of handwritten and printed text segmentation. |
Sina Gholamian; Ali Vahdat; |
1495 | LERF: Language Embedded Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Humans describe the physical world using natural language to refer to specific 3D locations based on a vast range of properties: visual appearance, semantics, abstract associations, or actionable affordances. In this work we propose Language Embedded Radiance Fields (LERFs), a method for grounding language embeddings from off-the-shelf models like CLIP into NeRF, which enable these types of open-ended language queries in 3D. |
Justin Kerr; Chung Min Kim; Ken Goldberg; Angjoo Kanazawa; Matthew Tancik; |
1496 | DomainAdaptor: A Novel Approach to Test-time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate a more challenging task that aims to adapt a trained CNN model to unseen domains during the test. |
Jian Zhang; Lei Qi; Yinghuan Shi; Yang Gao; |
1497 | RCA-NOC: Relative Contrastive Alignment for Novel Object Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel approach to novel object captioning which employs relative contrastive learning to learn visual and semantic alignment. |
Jiashuo Fan; Yaoyuan Liang; Leyao Liu; Shaolun Huang; Lei Zhang; |
1498 | Mitigating and Evaluating Static Bias of Action Representations in The Background and The Foreground Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we empirically verify the existence of foreground static bias by creating test videos with conflicting signals from the static and moving portions of the video. To tackle this issue, we propose a simple yet effective technique, StillMix, to learn robust action representations. |
Haoxin Li; Yuan Liu; Hanwang Zhang; Boyang Li; |
1499 | RbA: Segmenting Unknown Regions Rejected By All Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore another paradigm with region-level classification to better segment unknown objects. |
Nazir Nayal; Misra Yavuz; João F. Henriques; Fatma Güney; |
1500 | CuNeRF: Cube-Based Neural Radiance Field for Zero-Shot Medical Image Arbitrary-Scale Super Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing MIASSR methods face two major limitations: (i) reliance on high-resolution (HR) volumes and (ii) limited generalization ability, which restricts their applications in various scenarios. To overcome these limitations, we propose Cube-based Neural Radiance Field (CuNeRF), a zero-shot MIASSR framework that is able to yield medical images at arbitrary scales and free viewpoints in a continuous domain. |
Zixuan Chen; Lingxiao Yang; Jian-Huang Lai; Xiaohua Xie; |
1501 | Beyond Object Recognition: A New Benchmark Towards Object Concept Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose a challenging Object Concept Learning (OCL) task to push the envelope of object understanding. |
Yong-Lu Li; Yue Xu; Xinyu Xu; Xiaohan Mao; Yuan Yao; Siqi Liu; Cewu Lu; |
1502 | Towards Open-Vocabulary Video Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos. To address this limitation, we make the following three contributions. First, we introduce the novel task of Open-Vocabulary Video Instance Segmentation, which aims to simultaneously segment, track, and classify objects in videos from open-set categories, including novel categories unseen during training. |
Haochen Wang; Cilin Yan; Shuai Wang; Xiaolong Jiang; Xu Tang; Yao Hu; Weidi Xie; Efstratios Gavves; |
1503 | Unleashing The Power of Gradient Signal-to-Noise Ratio for Zero-Shot NAS Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we not only explicitly give the probability that larger GSNR at network initialization can ensure better generalization, but also theoretically prove that GSNR can ensure better convergence. Then we design the Xi-based gradient signal-to-noise ratio (Xi-GSNR) as a Zero-Shot NAS proxy to predict the network accuracy at initialization. |
Zihao Sun; Yu Sun; Longxing Yang; Shun Lu; Jilin Mei; Wenxiao Zhao; Yu Hu; |
1504 | EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce EgoObjects, a large-scale egocentric dataset for fine-grained object understanding. |
Chenchen Zhu; Fanyi Xiao; Andres Alvarado; Yasmine Babaei; Jiabo Hu; Hichem El-Mohri; Sean Culatana; Roshan Sumbaly; Zhicheng Yan; |
1505 | What Can Simple Arithmetic Operations Do for Temporal Modeling? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore the potential of four simple arithmetic operations for temporal modeling. |
Wenhao Wu; Yuxin Song; Zhun Sun; Jingdong Wang; Chang Xu; Wanli Ouyang; |
1506 | Pixel Adaptive Deep Unfolding Transformer for Hyperspectral Image Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, existing methods still face the problem of insufficient matching with HSI data. The issues lie in three aspects: 1) a fixed gradient descent step in the data module, while the degradation of HSI is agnostic at the pixel level; 2) an inadequate prior module for the 3D HSI cube; and 3) stage interaction that ignores the differences in features at different stages. To address these issues, in this work, we propose a Pixel Adaptive Deep Unfolding Transformer (PADUT) for HSI reconstruction. |
Miaoyu Li; Ying Fu; Ji Liu; Yulun Zhang; |
1507 | BiViT: Extremely Compressed Binary Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose to solve two fundamental challenges to push the horizon of Binary Vision Transformers (BiViT). |
Yefei He; Zhenyu Lou; Luoming Zhang; Jing Liu; Weijia Wu; Hong Zhou; Bohan Zhuang; |
1508 | Dynamic PlenOctree for Adaptive Sampling Refinement in Explicit NeRF Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, POT’s fixed structure for direct optimization is sub-optimal as the scene complexity evolves continuously with updates to cached color and density, necessitating refining the sampling distribution to capture signal complexity accordingly. To address this issue, we propose the dynamic PlenOctree (DOT), which adaptively refines the sample distribution to adjust to changing scene complexity. |
Haotian Bai; Yiqi Lin; Yize Chen; Lin Wang; |
1509 | Scene Matters: Model-based Deep Video Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel model-based video compression (MVC) framework that regards scenes as the fundamental units for video sequences. |
Lv Tang; Xinfeng Zhang; Gai Zhang; Xiaoqi Ma; |
1510 | Tree-Structured Shading Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A main challenge in inferring the shade tree is that the inference problem involves both the discrete tree structure and the continuous parameters of the tree nodes. We propose a hybrid approach to address this issue. |
Chen Geng; Hong-Xing Yu; Sharon Zhang; Maneesh Agrawala; Jiajun Wu; |
1511 | EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers). |
Yulin Wang; Yang Yue; Rui Lu; Tianjiao Liu; Zhao Zhong; Shiji Song; Gao Huang; |
1512 | Simulating Fluids in Real-World Still Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we tackle the problem of real-world fluid animation from a still image. |
Siming Fan; Jingtan Piao; Chen Qian; Hongsheng Li; Kwan-Yee Lin; |
1513 | SC3K: Self-supervised and Coherent 3D Keypoints Estimation from Rotated, Noisy, and Decimated Point Cloud Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a new method to infer keypoints from arbitrary object categories in practical scenarios where point cloud data (PCD) are noisy, down-sampled and arbitrarily rotated. |
Mohammad Zohaib; Alessio Del Bue; |
1514 | IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Since intrinsic decomposition is a fundamentally under-constrained inverse problem, we propose a novel distance-aware point sampling and adaptive reflectance iterative clustering optimization method, which enables IntrinsicNeRF with traditional intrinsic decomposition constraints to be trained in an unsupervised manner, resulting in multi-view consistent intrinsic decomposition results. |
Weicai Ye; Shuo Chen; Chong Bao; Hujun Bao; Marc Pollefeys; Zhaopeng Cui; Guofeng Zhang; |
1515 | Segmenting Known Objects and Unseen Unknowns Without Prior Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the necessary step to extend segmentation with a new setting which we term holistic segmentation. |
Stefano Gasperini; Alvaro Marcos-Ramiro; Michael Schmidt; Nassir Navab; Benjamin Busam; Federico Tombari; |
1516 | A Good Student Is Cooperative and Reliable: CNN-Transformer Collaborative Learning for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we strive to answer the question ‘how to collaboratively learn convolutional neural network (CNN)-based and vision transformer (ViT)-based models by selecting and exchanging the reliable knowledge between them for semantic segmentation?’ |
Jinjing Zhu; Yunhao Luo; Xu Zheng; Hao Wang; Lin Wang; |
1517 | CMDA: Cross-Modality Domain Adaptation for Nighttime Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel unsupervised Cross-Modality Domain Adaptation (CMDA) framework to leverage multi-modality (Images and Events) information for nighttime semantic segmentation, with only labels on daytime images. |
Ruihao Xia; Chaoqiang Zhao; Meng Zheng; Ziyan Wu; Qiyu Sun; Yang Tang; |
1518 | Learning with Diversity: Self-Expanded Equalization for Better Generalized Deep Metric Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new self-expanded equalization (SEE) method to effectively generalize the DML model to both unseen categories and domains. |
Jiexi Yan; Zhihui Yin; Erkun Yang; Yanhua Yang; Heng Huang; |
1519 | Fan-Beam Binarization Difference Projection (FB-BDP): A Novel Local Object Descriptor for Fine-Grained Leaf Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Fine-grained leaf image retrieval (FGLIR) aims to search for similar leaf images at the subspecies level, which involves very high inter-class visual similarity and accordingly poses great challenges for leaf image description. In this study, we introduce a new concept, named fan-beam binarization difference projection (FB-BDP), to address this challenging issue. |
Xin Chen; Bin Wang; Yongsheng Gao; |
1520 | Dynamic Residual Classifier for Class Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing CIL methods exploit long-tailed (LT) recognition techniques, e.g., adjusted losses and data re-sampling methods, to handle the data imbalance issue within each incremental task. In this work, the dynamic nature of data imbalance in CIL is shown and a novel Dynamic Residual Classifier (DRC) is proposed to handle this challenging scenario. |
Xiuwei Chen; Xiaobin Chang; |
1521 | Optimizing The Placement of Roadside LiDARs for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an approach to optimize the placement of roadside LiDARs by selecting optimized positions within the scene for better perception performance. |
Wentao Jiang; Hao Xiang; Xinyu Cai; Runsheng Xu; Jiaqi Ma; Yikang Li; Gim Hee Lee; Si Liu; |
1522 | Diverse Inpainting and Editing with GAN Inversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle an even more difficult task: inverting erased images into a GAN's latent space for realistic inpainting and editing. |
Ahmet Burak Yildirim; Hamza Pehlivan; Bahri Batuhan Bilecen; Aysegul Dundar; |
1523 | InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose InterDiff, a framework comprising two key steps: (i) interaction diffusion, where we leverage a diffusion model to encode the distribution of future human-object interactions; (ii) interaction correction, where we introduce a physics-informed predictor to correct denoised HOIs in a diffusion step. |
Sirui Xu; Zhengyuan Li; Yu-Xiong Wang; Liang-Yan Gui; |
1524 | DiFaReli: Diffusion Face Relighting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel approach to single-view face relighting in the wild. |
Puntawat Ponglertnapakorn; Nontawat Tritrong; Supasorn Suwajanakorn; |
1525 | IST-Net: Prior-Free Category-Level Pose Estimation with Implicit Space Transformation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The key point is actually the explicit deformation process, which aligns camera and world coordinates under the supervision of world-space 3D models (also called the canonical space). Inspired by these observations, we introduce a simple prior-free implicit space transformation network, namely IST-Net, to transform camera-space features to world-space counterparts and build correspondences between them in an implicit manner, without relying on 3D priors. |
Jianhui Liu; Yukang Chen; Xiaoqing Ye; Xiaojuan Qi; |
1526 | Building3D: A Urban-Scale Dataset and Benchmarks for Learning Roof Structures from Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an urban-scale dataset consisting of more than 160 thousand buildings, along with corresponding point clouds, mesh, and wireframe models, covering 16 cities in Estonia over about 998 km². |
Ruisheng Wang; Shangfeng Huang; Hongxin Yang; |
1527 | Multi-Object Discovery By Low-Dimensional Object Motion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to model pixel-wise geometry and object motion to remove ambiguity in reconstructing flow from a single image. |
Sadra Safadoust; Fatma Güney; |
1528 | Localizing Object-Level Shape Variations with Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a technique to generate a collection of images that depicts variations in the shape of a specific object, enabling an object-level shape exploration process. |
Or Patashnik; Daniel Garibi; Idan Azuri; Hadar Averbuch-Elor; Daniel Cohen-Or; |
1529 | CoSign: Exploring Co-occurrence Signals in Skeleton-based Continuous Sign Language Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple yet effective GCN-based approach, named CoSign, to incorporate Co-occurrence Signals and explore the potential of skeleton data in CSLR. |
Peiqi Jiao; Yuecong Min; Yanan Li; Xiaotao Wang; Lei Lei; Xilin Chen; |
1530 | GACE: Geometry Aware Confidence Enhancement for Black-Box 3D Object Detectors on LiDAR-Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In 3D, however, considering the object properties and its surroundings in a holistic way is important to distinguish between true and false positive detections, e.g. occluded pedestrians in a group. To address this, we present GACE, an intuitive and highly efficient method to improve the confidence estimation of a given black-box 3D object detector. |
David Schinagl; Georg Krispel; Christian Fruhwirth-Reisinger; Horst Possegger; Horst Bischof; |
1531 | Curvature-Aware Training for Coordinate Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes a solution that leverages second-order optimization methods to significantly reduce training times for coordinate networks while maintaining their compressibility. |
Hemanth Saratchandran; Shin-Fang Chng; Sameera Ramasinghe; Lachlan MacDonald; Simon Lucey; |
1532 | Disentangle Then Parse: Night-time Semantic Segmentation with Illumination Disentanglement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most prior semantic segmentation methods have been developed for day-time scenes, while typically underperforming in night-time scenes due to insufficient and complicated lighting conditions. In this work, we tackle this challenge by proposing a novel night-time semantic segmentation paradigm, i.e., disentangle then parse (DTP). |
Zhixiang Wei; Lin Chen; Tao Tu; Pengyang Ling; Huaian Chen; Yi Jin; |
1533 | Large-Scale Land Cover Mapping with Fine-Grained Classes Via Class-Aware Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, in class-imbalanced scenarios, existing pseudo-labeling methods mostly only pick confident samples, failing to exploit the hard samples during training. To tackle these issues, we propose a unified Class-Aware Semi-Supervised Semantic Segmentation framework. |
Runmin Dong; Lichao Mou; Mengxuan Chen; Weijia Li; Xin-Yi Tong; Shuai Yuan; Lixian Zhang; Juepeng Zheng; Xiaoxiang Zhu; Haohuan Fu; |
1534 | ToonTalker: Cross-Domain Face Reenactment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method for cross-domain reenactment without paired data. |
Yuan Gong; Yong Zhang; Xiaodong Cun; Fei Yin; Yanbo Fan; Xuan Wang; Baoyuan Wu; Yujiu Yang; |
1535 | LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is a crucial issue, since the lengths of the text to be recognized are usually not given in advance in real-world applications, but it has not been adequately investigated in previous works. Therefore, we propose in this paper a method called Length-Insensitive Scene TExt Recognizer (LISTER), which remedies the limitation regarding the robustness to various text lengths. |
Changxu Cheng; Peng Wang; Cheng Da; Qi Zheng; Cong Yao; |
1536 | Proxy Anchor-based Unsupervised Learning for Continuous Generalized Category Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the limitations and more accurately reflect real-world scenarios, in this paper, we propose a novel unsupervised class incremental learning approach for discovering novel categories on unlabeled sets without prior knowledge. |
Hyungmin Kim; Sungho Suh; Daehwan Kim; Daun Jeong; Hansang Cho; Junmo Kim; |
1537 | Distribution-Aware Prompt Tuning for Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We observed that the alignment becomes more effective when embeddings of each modality are ‘well-arranged’ in the latent space. Inspired by this observation, we propose distribution-aware prompt tuning (DAPT) for vision-language models, which is simple yet effective. |
Eulrang Cho; Jooyeon Kim; Hyunwoo J Kim; |
1538 | Learning Rain Location Prior for Nighttime Deraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we highlight the importance of rain streak location information in nighttime deraining. |
Fan Zhang; Shaodi You; Yu Li; Ying Fu; |
1539 | FBLNet: FeedBack Loop Network for Driver Attention Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a FeedBack Loop Network (FBLNet), which attempts to model the driving experience accumulation procedure. |
Yilong Chen; Zhixiong Nan; Tao Xiang; |
1540 | Source-free Domain Adaptive Human Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a new task, named source-free domain adaptive HPE, which aims to address the challenges of cross-domain learning of HPE without access to source data during the adaptation process. |
Qucheng Peng; Ce Zheng; Chen Chen; |
1541 | Video Anomaly Detection Via Sequentially Learning Multiple Pretext Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to sequentially learn multiple pretext tasks according to their difficulties in an ascending manner to improve the performance of anomaly detection. |
Chenrui Shi; Che Sun; Yuwei Wu; Yunde Jia; |
1542 | SlaBins: Fisheye Depth Estimation Using Slanted Bins on Road Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, since fisheye cameras are oriented toward the road, they mostly observe road areas, which results in severe distortion on object areas such as vehicles or pedestrians. To alleviate these issues, we propose a new fisheye depth estimation network, SlaBins, that infers an accurate and dense depth map based on a geometric property of road environments: most objects stand (i.e., are orthogonal) on the road. |
Jongsung Lee; Gyeongsu Cho; Jeongin Park; Kyongjun Kim; Seongoh Lee; Jung-Hee Kim; Seong-Gyun Jeong; Kyungdon Joo; |
1543 | DOT: A Distillation-Oriented Trainer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we observe a trade-off between task and distillation losses, i.e., introducing distillation loss limits the convergence of task loss. |
Borui Zhao; Quan Cui; Renjie Song; Jiajun Liang; |
1544 | Neural Collage Transfer: Artistic Reconstruction Via Material Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a method for learning to make collages via reinforcement learning without the need for demonstrations or collage artwork data. |
Ganghun Lee; Minji Kim; Yunsu Lee; Minsu Lee; Byoung-Tak Zhang; |
1545 | Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new method of Fantasia3D for high-quality text-to-3D content creation. |
Rui Chen; Yongwei Chen; Ningxin Jiao; Kui Jia; |
1546 | MagicFusion: Boosting Text-to-Image Generation Performance By Fusing Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a simple yet effective method called Saliency-aware Noise Blending (SNB) that can empower the fused text-guided diffusion models to achieve more controllable generation. |
Jing Zhao; Heliang Zheng; Chaoyue Wang; Long Lan; Wenjing Yang; |
1547 | UCF: Uncovering Common Features for Generalizable Deepfake Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel approach to address the two types of overfitting issues by uncovering common forgery features. |
Zhiyuan Yan; Yong Zhang; Yanbo Fan; Baoyuan Wu; |
1548 | March in Chat: Interactive Prompting for Remote Embodied Referring Expression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a March-in-Chat (MiC) model that can talk to the LLM on the fly and plan dynamically based on a newly proposed Room-and-Object Aware Scene Perceiver (ROASP). |
Yanyuan Qiao; Yuankai Qi; Zheng Yu; Jing Liu; Qi Wu; |
1549 | Sample4Geo: Hard Negative Sampling For Cross-View Geo-Localisation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this article, we present a simplified but effective architecture based on contrastive learning with symmetric InfoNCE loss that outperforms current state-of-the-art results. |
Fabian Deuser; Konrad Habel; Norbert Oswald; |
1550 | Novel Scenes & Classes: Towards Adaptive Open-set Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Directly combining off-the-shelf cross-domain and open-set approaches is sub-optimal, since their low-order dependence, e.g., on the confidence score, is insufficient for AOOD with its two dimensions of novel information. To address this, we propose a novel Structured mOtif MAtching (SOMA) framework for AOOD, which models high-order relations with motifs, i.e., statistically significant subgraphs, and formulates the AOOD solution as motif matching to learn with high-order patterns. |
Wuyang Li; Xiaoqing Guo; Yixuan Yuan; |
1551 | LIMITR: Leveraging Local Information for Medical Image-Text Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper focuses on chest X-ray images and their corresponding radiological reports. It presents a new model that learns a joint X-ray image & report representation. |
Gefen Dawidowicz; Elad Hirsch; Ayellet Tal; |
1552 | Multi-task View Synthesis with Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the MTVS problem, we propose MuvieNeRF, a framework that incorporates both multi-task and cross-view knowledge to simultaneously synthesize multiple scene properties. |
Shuhong Zheng; Zhipeng Bao; Martial Hebert; Yu-Xiong Wang; |
1553 | Informative Data Mining for One-Shot Cross-Domain Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite this, the referring style transfer module still faces issues with computational cost and over-fitting. To address these problems, we propose a novel framework called Informative Data Mining (IDM) that enables efficient one-shot domain adaptation for semantic segmentation. |
Yuxi Wang; Jian Liang; Jun Xiao; Shuqi Mei; Yuran Yang; Zhaoxiang Zhang; |
1554 | Efficient Unified Demosaicing for Bayer and Non-Bayer Patterned Image Sensors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an efficient unified demosaicing method that can be applied to both conventional Bayer RAW and various non-Bayer CFAs’ RAW data in different operation modes. |
Haechang Lee; Dongwon Park; Wongi Jeong; Kijeong Kim; Hyunwoo Je; Dongil Ryu; Se Young Chun; |
1555 | Visual Traffic Knowledge Graph Generation from Scene Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although previous works on traffic scene understanding have achieved great success, most of them stop at a low-level perception stage, such as road segmentation and lane detection, and few concern high-level understanding. In this paper, we present Visual Traffic Knowledge Graph Generation (VTKGG), a new task for in-depth traffic scene understanding that tries to extract multiple kinds of information and integrate them into a knowledge graph. |
Yunfei Guo; Fei Yin; Xiao-hui Li; Xudong Yan; Tao Xue; Shuqi Mei; Cheng-Lin Liu; |
1556 | Householder Projector for Unsupervised Latent Semantics Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The non-orthogonality would entangle semantic attributes in the top few eigenvectors, and the large dimensionality might result in meaningless variations among the directions even if the matrix is orthogonal. To avoid these issues, we propose Householder Projector, a flexible and general low-rank orthogonal matrix representation based on Householder transformations, to parameterize the projection matrix. |
Yue Song; Jichao Zhang; Nicu Sebe; Wei Wang; |
1557 | Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although deep learning-based solutions have achieved impressive reconstruction performance in image super-resolution (SR), these models are generally large, with complex architectures, making them incompatible with low-power devices with many computational and memory constraints. To overcome these challenges, we propose a spatially-adaptive feature modulation (SAFM) mechanism for efficient SR design. |
Long Sun; Jiangxin Dong; Jinhui Tang; Jinshan Pan; |
1558 | Unsupervised Image Denoising in Real-World Scenarios Via Self-Collaboration Parallel Generative Adversarial Branches Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although unsupervised approaches based on generative adversarial networks (GANs) offer a promising solution for denoising without paired datasets, they struggle to surpass the performance limits of conventional GAN-based unsupervised frameworks without significantly modifying existing structures or increasing the computational complexity of denoisers. To address this problem, we propose a self-collaboration (SC) strategy for multiple denoisers. |
Xin Lin; Chao Ren; Xiao Liu; Jie Huang; Yinjie Lei; |
1559 | Bayesian Optimization Meets Self-Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fully leverage the various knowledge gained from all training trials, we propose the BOSS framework, which combines BO and SD. |
HyunJae Lee; Heon Song; Hyeonsoo Lee; Gi-hyeon Lee; Suyeong Park; Donggeun Yoo; |
1560 | No Fear of Classifier Biases: Neural Collapse Inspired Federated Learning with Synthetic and Fixed Classifier Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent advances in neural collapse have shown that the classifiers and feature prototypes under perfect training scenarios collapse into an optimal structure called simplex equiangular tight frame (ETF). Building on this neural collapse insight, we propose a solution to the FL’s classifier bias problem by utilizing a synthetic and fixed ETF classifier during training. |
Zexi Li; Xinyi Shang; Rui He; Tao Lin; Chao Wu; |
1561 | MemorySeg: Online LiDAR Semantic Segmentation with A Latent Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle the challenge of exploiting the information from the past frames to improve the predictions of the current frame in an online fashion. |
Enxu Li; Sergio Casas; Raquel Urtasun; |
1562 | Hashing Neural Video Decomposition with Multiplicative Residuals in Space-Time Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a video decomposition method that facilitates layer-based editing of videos with spatiotemporally varying lighting and motion effects. |
Cheng-Hung Chan; Cheng-Yang Yuan; Cheng Sun; Hwann-Tzong Chen; |
1563 | Multimodal Variational Auto-encoder Based Audio-Visual Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an Explicit Conditional Multimodal Variational Auto-Encoder (ECMVAE) for audio-visual segmentation (AVS), aiming to segment sound sources in the video sequence. |
Yuxin Mao; Jing Zhang; Mochu Xiang; Yiran Zhong; Yuchao Dai; |
1564 | DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries to a Transformer decoder with a potential to segment multiple object instances in a single iteration. |
Amit Kumar Rana; Sabarinath Mahadevan; Alexander Hermans; Bastian Leibe; |
1565 | FRAug: Tackling Federated Learning with Non-IID Features Via Representation Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This issue can significantly impact the global model performance in the FL framework. In this work, we propose Federated Representation Augmentation (FRAug) to resolve this practical and challenging problem. |
Haokun Chen; Ahmed Frikha; Denis Krompass; Jindong Gu; Volker Tresp; |
1566 | Homography Guided Temporal Fusion for Road Line and Marking Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work is motivated by the observations that road lines and markings are (1) frequently occluded by moving vehicles, shadows, and glare and (2) highly structured, with low intra-class shape variance and overall high appearance consistency. To address these issues, we propose a Homography Guided Fusion (HomoFusion) module that exploits temporally-adjacent video frames for complementary cues, facilitating the correct classification of partially occluded road lines or markings. |
Shan Wang; Chuong Nguyen; Jiawei Liu; Kaihao Zhang; Wenhan Luo; Yanhao Zhang; Sundaram Muthu; Fahira Afzal Maken; Hongdong Li; |
1567 | NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel type of neural fields that uses general radial bases for signal representation. |
Zhang Chen; Zhong Li; Liangchen Song; Lele Chen; Jingyi Yu; Junsong Yuan; Yi Xu; |
1568 | OmnimatteRF: Robust Omnimatte with 3D Background Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel video matting method, OmnimatteRF, that combines dynamic 2D foreground layers and a 3D background model. |
Geng Lin; Chen Gao; Jia-Bin Huang; Changil Kim; Yipeng Wang; Matthias Zwicker; Ayush Saraf; |
1569 | Self-supervised Image Denoising with Downsampled Invariance Loss and Conditional Blind-Spot Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, to realize a more practical denoiser, we propose a novel self-supervised training framework that can remove real noise. |
Yeong Il Jang; Keuntek Lee; Gu Yong Park; Seyun Kim; Nam Ik Cho; |
1570 | Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we reveal that informative interactions can be made by simulation with semantically consistent yet diverse region exploration in an unsupervised paradigm. |
Kehan Li; Yian Zhao; Zhennan Wang; Zesen Cheng; Peng Jin; Xiangyang Ji; Li Yuan; Chang Liu; Jie Chen; |
1571 | RecursiveDet: End-to-End Region-Based Recursive Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we find that the general setting of decoding stages is actually redundant. |
Jing Zhao; Li Sun; Qingli Li; |
1572 | Bold But Cautious: Unlocking The Potential of Personalized Federated Learning Through Cautiously Aggressive Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel guideline for client collaboration in PFL. |
Xinghao Wu; Xuefeng Liu; Jianwei Niu; Guogang Zhu; Shaojie Tang; |
1573 | ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This results in inadequate utilization of spectral information and artifacts after upsampling. To address this issue, we propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure. |
Mingjin Zhang; Chi Zhang; Qiming Zhang; Jie Guo; Xinbo Gao; Jing Zhang; |
1574 | Generative Action Description Prompts for Skeleton-based Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a Generative Action-description Prompts (GAP) approach for skeleton-based action recognition. |
Wangmeng Xiang; Chao Li; Yuxuan Zhou; Biao Wang; Lei Zhang; |
1575 | Structure Invariant Transformation for Better Adversarial Transferability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we find that the existing input transformation based attacks transform the input image globally, resulting in limited diversity of the transformed images. |
Xiaosen Wang; Zeliang Zhang; Jianping Zhang; |
1576 | Thinking Image Color Aesthetics Assessment: Models, Datasets and Benchmarks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a comprehensive study on a new task named image color aesthetics assessment (ICAA), which aims to assess color aesthetics based on human perception. |
Shuai He; Anlong Ming; Yaqi Li; Jinyuan Sun; ShunTian Zheng; Huadong Ma; |
1577 | Multi-body Depth and Camera Pose Estimation from Multiple Views Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a depth and camera pose estimation framework to resolve the scale ambiguity in multi-body scenes. |
Andrea Porfiri Dal Cin; Giacomo Boracchi; Luca Magri; |
1578 | DISeR: Designing Imaging Systems with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, cameras and perception models are often designed independently, leading to sub-optimal task performance. In this paper, we formulate these four building blocks of imaging systems as a context-free grammar (CFG), which can be automatically searched over with a learned camera designer to jointly optimize the imaging system with task-specific perception models. |
Tzofi Klinghoffer; Kushagra Tiwary; Nikhil Behari; Bhavya Agrawalla; Ramesh Raskar; |
1579 | The Euclidean Space Is Evil: Hyperbolic Attribute Editing for Few-shot Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing methods suffer from the trade-off between the quality and diversity of generated images. To tackle this problem, we propose Hyperbolic Attribute Editing (HAE), a simple yet effective method. |
Lingxiao Li; Yi Zhang; Shuhui Wang; |
1580 | FULLER: Unified Multi-modality Multi-task 3D Perception Via Multi-level Gradient Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate the issue, we propose a novel yet simple multi-level gradient calibration learning framework across tasks and modalities during optimization. |
Zhijian Huang; Sihao Lin; Guiyu Liu; Mukun Luo; Chaoqiang Ye; Hang Xu; Xiaojun Chang; Xiaodan Liang; |
1581 | Transparent Shape from A Single View Polarization Image Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a learning-based method for transparent surface estimation from a single view polarization image. |
Mingqi Shao; Chongkun Xia; Zhendong Yang; Junnan Huang; Xueqian Wang; |
1582 | Invariant Feature Regularization for Fair Face Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such annotations can be prohibitively expensive due to the diversity of the demographic attributes. To tackle this, we propose to generate diverse data partitions iteratively in an unsupervised fashion. |
Jiali Ma; Zhongqi Yue; Kagaya Tomoyuki; Suzuki Tomoki; Karlekar Jayashree; Sugiri Pranata; Hanwang Zhang; |
1583 | Cross-Domain Product Representation Learning for Rich-Content E-Commerce Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fill the gap in the rich-content e-commerce area, in this paper we introduce a large-scale cross-domain product recognition dataset, called ROPE. |
Xuehan Bai; Yan Li; Yanhua Cheng; Wenjie Yang; Quan Chen; Han Li; |
1584 | DriveAdapter: Breaking The Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to explore the possibility of directly adopting the strong teacher model to conduct planning while letting the student model focus more on the perception part. |
Xiaosong Jia; Yulu Gao; Li Chen; Junchi Yan; Patrick Langechuan Liu; Hongyang Li; |
1585 | General Planar Motion from A Pair of 3D Correspondences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel 2-point method for estimating the relative pose of a camera undergoing planar motion from 3D data (e.g. from a calibrated stereo setup or an RGB-D sensor). |
Juan Carlos Dibene; Zhixiang Min; Enrique Dunn; |
1586 | Single Depth-image 3D Reflection Symmetry and Shape Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Iterative Symmetry Completion Network (ISCNet), a single depth-image shape completion method that exploits reflective symmetry cues to obtain more detailed shapes. |
Zhaoxuan Zhang; Bo Dong; Tong Li; Felix Heide; Pieter Peers; Baocai Yin; Xin Yang; |
1587 | Local Context-Aware Active Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this has not been fully explored by existing ADA works. In this paper, we propose a Local context-aware ADA framework, named LADA, to address this issue. |
Tao Sun; Cheng Lu; Haibin Ling; |
1588 | Deep Incubation: Training Large Models By Divide-and-Conquering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, training these models is challenging due to high computational costs, painfully slow convergence, and overfitting issues. In this paper, we present Deep Incubation, a novel approach that enables the efficient and effective training of large models by dividing them into smaller sub-modules which can be trained separately and assembled seamlessly. |
Zanlin Ni; Yulin Wang; Jiangwei Yu; Haojun Jiang; Yue Cao; Gao Huang; |
1589 | Downscaled Representation Matters: Improving Image Rescaling with Collaborative Downscaled Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, we propose a Hierarchical Collaborative Downscaling (HCD) method that performs gradient descent w.r.t. the reconstruction loss in both HR and LR domains to improve the downscaled representations, so as to boost IR performance. |
Bingna Xu; Yong Guo; Luoqian Jiang; Mianjie Yu; Jian Chen; |
1590 | Detection Transformer with Stable Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Under the principle, we propose two simple yet effective modifications by integrating positional metrics to DETR’s classification loss and matching cost, named position-supervised loss and position-modulated cost. |
Shilong Liu; Tianhe Ren; Jiayu Chen; Zhaoyang Zeng; Hao Zhang; Feng Li; Hongyang Li; Jun Huang; Hang Su; Jun Zhu; Lei Zhang; |
1591 | Be Everywhere – Hear Everything (BEE): Audio Scene Reconstruction By Sparse Audio-Visual Samples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to circumvent the need for extensive sensors by leveraging audio and visual samples from only a handful of A/V receivers placed in the scene. |
Mingfei Chen; Kun Su; Eli Shlizerman; |
1592 | IVS-Net: Learning Human View Synthesis from Internet Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods usually rely on limited multi-view images typically collected in a studio, or on commercial high-quality 3D scans, for training, which severely limits their generalization to in-the-wild images. To solve this problem, we propose a new approach to learn a generalizable human model from a new source of data, i.e., Internet videos. |
Junting Dong; Qi Fang; Tianshuo Yang; Qing Shuai; Chengyu Qiao; Sida Peng; |
1593 | Story Visualization By Online Text Augmentation with Context Memory Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel memory architecture for the Bi-directional Transformer framework with an online text augmentation that generates multiple pseudo-descriptions as supplementary supervision during training for better generalization to the language variation at inference. |
Daechul Ahn; Daneul Kim; Gwangmo Song; Seung Hwan Kim; Honglak Lee; Dongyeop Kang; Jonghyun Choi; |
1594 | Attention Discriminant Sampling for Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes an attention-driven approach to 3-D point cloud sampling. |
Cheng-Yao Hong; Yu-Ying Chou; Tyng-Luh Liu; |
1595 | Global Balanced Experts for Federated Long-Tailed Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further strengthen the privacy-preserving ability, we present a GBME-p algorithm with a theoretical guarantee to prevent privacy leakage from the proxy. |
Yaopei Zeng; Lei Liu; Li Liu; Li Shen; Shaoguo Liu; Baoyuan Wu; |
1596 | All4One: Symbiotic Neighbour Contrastive Learning Via Self-Attention and Redundancy Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel contrastive SSL approach, which we call All4One, that reduces the distance between neighbour representations using "centroids" created through a self-attention mechanism. |
Imanol G. Estepa; Ignacio Sarasua; Bhalaji Nagarajan; Petia Radeva; |
1597 | Contrastive Pseudo Learning for Open-World DeepFake Attribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To push the related frontier research, we introduce a new benchmark called Open-World DeepFake Attribution (OW-DFA), which aims to evaluate attribution performance against various types of fake faces under open-world scenarios. |
Zhimin Sun; Shen Chen; Taiping Yao; Bangjie Yin; Ran Yi; Shouhong Ding; Lizhuang Ma; |
1598 | ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a simple but effective in-context learning framework called ICL-D3IE, which enables LLMs to perform DIE with different types of demonstration examples. |
Jiabang He; Lei Wang; Yi Hu; Ning Liu; Hui Liu; Xing Xu; Heng Tao Shen; |
1599 | IHNet: Iterative Hierarchical Network Guided By High-Resolution Estimated Information for Scene Flow Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, inaccuracies in the initial coarse layer’s scene flow estimates may accumulate, leading to incorrect final estimates. To alleviate this, we introduce a novel Iterative Hierarchical Network, IHNet. |
Yun Wang; Cheng Chi; Min Lin; Xin Yang; |
1600 | SimNP: Learning Self-Similarity Priors Between Neural Points Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SimNP, a method to learn category-level self-similarities, which combines the advantages of both worlds by connecting neural point radiance fields with a category-level self-similarity representation. |
Christopher Wewer; Eddy Ilg; Bernt Schiele; Jan Eric Lenssen; |
1601 | Beyond The Limitation of Monocular 3D Detector Via Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the potential of depth information, we propose a novel distillation framework that validly improves the performance of the student model without extra depth labels. |
Yiran Yang; Dongshuo Yin; Xuee Rong; Xian Sun; Wenhui Diao; Xinming Li; |
1602 | Cascade-DETR: Delving Into High-Quality Universal Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Cascade-DETR for high-quality universal object detection. |
Mingqiao Ye; Lei Ke; Siyuan Li; Yu-Wing Tai; Chi-Keung Tang; Martin Danelljan; Fisher Yu; |
1603 | ACLS: Adaptive and Conditional Label Smoothing for Network Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although these approaches have shown effectiveness in calibrating networks, there is still a lack of understanding of the underlying principles of regularization in terms of network calibration. In this paper, we present an in-depth analysis of existing regularization-based methods, providing a better understanding of how they affect network calibration. |
Hyekang Park; Jongyoun Noh; Youngmin Oh; Donghyeon Baek; Bumsub Ham; |
1604 | EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a superior model named EMR-MSF by borrowing the advantages of network architecture design under the scope of supervised learning. |
Zijie Jiang; Masatoshi Okutomi; |
1605 | Evaluation and Improvement of Interpretability for Self-Explainable Part-Prototype Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, recent works find that the interpretability provided by prototypes is fragile, due to the semantic gap between similarities in the feature space and those in the input space. In this work, we strive to address this challenge by making the first attempt to quantitatively and objectively evaluate the interpretability of part-prototype networks. |
Qihan Huang; Mengqi Xue; Wenqi Huang; Haofei Zhang; Jie Song; Yongcheng Jing; Mingli Song; |
1606 | Temporal-Coded Spiking Neural Networks with Dynamic Firing Threshold: Learning with Event-Driven Backpropagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we investigate issues related to over-sparsity of spikes and the complexity of finding the ‘causal set’. |
Wenjie Wei; Malu Zhang; Hong Qu; Ammar Belatreche; Jian Zhang; Hong Chen; |
1607 | Mitigating Adversarial Vulnerability Through Causal Parameter Estimation By Adversarial Double Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Intriguingly, such a peculiar phenomenon cannot be relieved even with deeper architectures and advanced defense methods. To address this issue, in this paper we introduce a causal approach called Adversarial Double Machine Learning (ADML), which allows us to quantify the degree of adversarial vulnerability of network predictions and capture the effect of treatments on outcomes of interest. |
Byung-Kwan Lee; Junho Kim; Yong Man Ro; |
1608 | Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, this work introduces a Dynamic Token Pruning (DToP) method based on the early exit of tokens for semantic segmentation. |
Quan Tang; Bowen Zhang; Jiajun Liu; Fagui Liu; Yifan Liu; |
1609 | Shape Anchor Guided Holistic Indoor Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a shape anchor guided learning strategy (AncLearn) for robust holistic indoor scene understanding. |
Mingyue Dong; Linxi Huan; Hanjiang Xiong; Shuhan Shen; Xianwei Zheng; |
1610 | Knowledge-Aware Federated Active Learning with Non-IID Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a federated active learning paradigm to efficiently learn a global model with a limited annotation budget while protecting data privacy in a decentralized learning manner. |
Yu-Tong Cao; Ye Shi; Baosheng Yu; Jingya Wang; Dacheng Tao; |
1611 | PlankAssembly: Robust 3D Reconstruction from Three Orthographic Views with Learnt Shape Programs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop a new method to automatically convert 2D line drawings from three orthographic views into 3D CAD models. |
Wentao Hu; Jia Zheng; Zixin Zhang; Xiaojun Yuan; Jian Yin; Zihan Zhou; |
1612 | PODIA-3D: Domain Adaptation of 3D Generative Model Across Large Domain Gap Using Pose-Preserved Text-to-Image Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, adapting 3D generators to domains with significant gaps from the source domain remains challenging due to the following issues in current text-to-image diffusion models: 1) the shape-pose trade-off in diffusion-based translation, 2) pose bias, and 3) instance bias in the target domain, which result in inferior 3D shapes, low text-image correspondence, and low intra-domain diversity in the generated samples. To address these issues, we propose a novel pipeline called PODIA-3D, which uses pose-preserved text-to-image diffusion-based domain adaptation for 3D generative models. |
Gwanghyun Kim; Ji Ha Jang; Se Young Chun; |
1613 | Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, and capitalizing on the powerful fine-grained generative control offered by the recent diffusion-based generative models, we introduce Steered Diffusion, a generalized framework for photorealistic zero-shot conditional image generation using a diffusion model trained for unconditional generation. |
Nithin Gopalakrishnan Nair; Anoop Cherian; Suhas Lohit; Ye Wang; Toshiaki Koike-Akino; Vishal M. Patel; Tim K. Marks; |
1614 | DiffFit: Unlocking Transferability of Large Diffusion Models Via Simple Parameter-efficient Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models that enable fast adaptation to new domains. |
Enze Xie; Lewei Yao; Han Shi; Zhili Liu; Daquan Zhou; Zhaoqiang Liu; Jiawei Li; Zhenguo Li; |
1615 | NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods require expensive per-scene optimization from many views hence limiting their application to real-world unbounded urban settings where the objects of interest or backgrounds are observed from very few views. To mitigate this challenge, we introduce a new approach called NeO 360, Neural fields for sparse view synthesis of outdoor scenes. |
Muhammad Zubair Irshad; Sergey Zakharov; Katherine Liu; Vitor Guizilini; Thomas Kollar; Adrien Gaidon; Zsolt Kira; Rares Ambrus; |
1616 | UnLoc: A Unified Framework for Video Localization Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While large-scale image-text pretrained models such as CLIP have been used for multiple video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos is still a relatively unexplored task. We design a new approach for this called UnLoc, which uses pretrained image and text towers, and feeds tokens to a video-text fusion model. |
Shen Yan; Xuehan Xiong; Arsha Nagrani; Anurag Arnab; Zhonghao Wang; Weina Ge; David Ross; Cordelia Schmid; |
1617 | QD-BEV : Quantization-aware View-guided Distillation for Multi-view 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the wide application of quantization to lighten models, we show in our paper that directly applying quantization in BEV tasks will 1) make the training unstable, and 2) lead to intolerable performance degradation. |
Yifan Zhang; Zhen Dong; Huanrui Yang; Ming Lu; Cheng-Ching Tseng; Yuan Du; Kurt Keutzer; Li Du; Shanghang Zhang; |
1618 | Fast Inference and Update of Probabilistic Density Estimation on Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address both requirements, this paper presents a new normalizing flow-based trajectory prediction model named FlowChain. |
Takahiro Maeda; Norimichi Ukita; |
1619 | CLIPascene: Scene Sketching with Different Types and Levels of Abstraction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a method for converting a given scene image into a sketch using different types and multiple levels of abstraction. |
Yael Vinker; Yuval Alaluf; Daniel Cohen-Or; Ariel Shamir; |
1620 | Vision Grid Transformer for Document Layout Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fully leverage multi-modal information and exploit pre-training techniques to learn better representation for DLA, in this paper, we present VGT, a two-stream Vision Grid Transformer, in which Grid Transformer (GiT) is proposed and pre-trained for 2D token-level and segment-level semantic understanding. |
Cheng Da; Chuwei Luo; Qi Zheng; Cong Yao; |
1621 | Multi-Directional Subspace Editing in Style-Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper describes a new technique for finding disentangled semantic directions in the latent space of StyleGAN. |
Chen Naveh; |
1622 | Adaptive Superpixel for Active Learning in Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To reduce the annotation cost, we propose a superpixel-based active learning (AL) framework, which collects a dominant label per superpixel instead. |
Hoyoung Kim; Minhyeon Oh; Sehyun Hwang; Suha Kwak; Jungseul Ok; |
1623 | Adaptive Spiral Layers for Efficient 3D Representation Learning on Meshes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel intrinsic operator suitable for representation learning on 3D meshes. |
Francesca Babiloni; Matteo Maggioni; Thomas Tanay; Jiankang Deng; Ales Leonardis; Stefanos Zafeiriou; |
1624 | Parametric Information Maximization for Generalized Category Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a Parametric Information Maximization (PIM) model for the Generalized Category Discovery (GCD) problem. |
Florent Chiaroni; Jose Dolz; Ziko Imtiaz Masud; Amar Mitiche; Ismail Ben Ayed; |
1625 | Convex Decomposition of Indoor Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a method to parse a complex, cluttered indoor scene into primitives which offer a parsimonious abstraction of scene structure. |
Vaibhav Vavilala; David Forsyth; |
1626 | Toward Unsupervised Realistic Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The problem of realistic VQA (RVQA), where a model has to reject unanswerable questions (UQs) and answer answerable ones (AQs), is studied. |
Yuwei Zhang; Chih-Hui Ho; Nuno Vasconcelos; |
1627 | A Generalist Framework for Panoptic Segmentation of Images and Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A diffusion model is proposed to model panoptic masks, with a simple architecture and generic loss function. |
Ting Chen; Lala Li; Saurabh Saxena; Geoffrey Hinton; David J. Fleet; |
1628 | DALL-Eval: Probing The Reasoning Skills and Social Biases of Text-to-Image Generation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate the visual reasoning capabilities and social biases of different text-to-image models, covering both multimodal transformer language models and diffusion models. |
Jaemin Cho; Abhay Zala; Mohit Bansal; |
1629 | Video OWL-ViT: Temporally-consistent Open-world Localization in Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an architecture and a training recipe that adapts pretrained open-world image models to localization in videos. |
Georg Heigold; Matthias Minderer; Alexey Gritsenko; Alex Bewley; Daniel Keysers; Mario Lučić; Fisher Yu; Thomas Kipf; |
1630 | Few Shot Font Generation Via Transferring Similarity Guided Global Style and Quantization Local Style Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel font generation approach by aggregating styles from character similarity-guided global features and stylized component-level representations. |
Wei Pan; Anna Zhu; Xinyu Zhou; Brian Kenji Iwana; Shilin Li; |
1631 | Differentiable Transportation Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose a novel accurate pruning technique that allows precise control over the output network size. |
Yunqiang Li; Jan C. van Gemert; Torsten Hoefler; Bert Moons; Evangelos Eleftheriou; Bram-Ernst Verhoef; |
1632 | Physics-Driven Turbulence Image Restoration with Stochastic Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes the Physics-integrated Restoration Network (PiRN) to bring the physics-based simulator directly into the training process to help the network to disentangle the stochasticity from the degradation and the underlying image. |
Ajay Jaiswal; Xingguang Zhang; Stanley H. Chan; Zhangyang Wang; |
1633 | Enhancing Non-line-of-sight Imaging Via Learnable Inverse Kernel and Attention Mechanisms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, while learning-based methods can avoid these assumptions, they may struggle to reconstruct details without specific designs due to the spectral bias of neural networks. To overcome these issues, we propose a novel approach that enhances physics-based NLOS imaging methods by introducing a learnable inverse kernel in the Fourier domain and using attention mechanisms to help the network learn high-frequency information. |
Yanhua Yu; Siyuan Shen; Zi Wang; Binbin Huang; Yuehan Wang; Xingyue Peng; Suan Xia; Ping Liu; Ruiqian Li; Shiying Li; |
1634 | DECO: Dense Estimation of 3D Human-Scene Contact In The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, we focus on inferring dense, 3D contact between the full body surface and objects in arbitrary images. |
Shashank Tripathi; Agniv Chatterjee; Jean-Claude Passy; Hongwei Yi; Dimitrios Tzionas; Michael J. Black; |
1635 | Scale-Aware Modulation Meet Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a new vision Transformer, Scale Aware Modulation Transformer (SMT), that can handle various downstream tasks efficiently by combining the convolutional network and vision Transformer. |
Weifeng Lin; Ziheng Wu; Jiayu Chen; Jun Huang; Lianwen Jin; |
1636 | Large Selective Kernel Network for Remote Sensing Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such prior knowledge can be useful because tiny remote sensing objects may be mistakenly detected without referencing a sufficiently long-range context, which can vary for different objects. This paper considers these priors and proposes the lightweight Large Selective Kernel Network (LSKNet). |
Yuxuan Li; Qibin Hou; Zhaohui Zheng; Ming-Ming Cheng; Jian Yang; Xiang Li; |
1637 | PlaneRecTR: Unified Query Learning for 3D Plane Recovery from A Single View Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, none of them has tried to integrate the above related subtasks into a unified framework; instead, they treat the subtasks separately and sequentially, which we suspect is a main source of the performance limitations of existing approaches. Motivated by this finding and by the success of query-based learning in enriching reasoning among semantic entities, in this paper we propose PlaneRecTR, a Transformer-based architecture which, for the first time, unifies all subtasks related to single-view plane recovery within a single compact model. |
Jingjia Shi; Shuaifeng Zhi; Kai Xu; |
1638 | EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present EigenTrajectory (ET), a trajectory prediction approach that uses a novel trajectory descriptor to form a compact space, known here as ET space, in place of Euclidean space, for representing pedestrian movements. |
Inhwan Bae; Jean Oh; Hae-Gon Jeon; |
1639 | I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose I-ViT, an integer-only quantization scheme for ViTs, to enable ViTs to perform the entire computational graph of inference with integer arithmetic and bit-shifting, and without any floating-point arithmetic. |
Zhikai Li; Qingyi Gu; |
1640 | SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Assuming the existence of paired multi-modal data for training also entails significant data collection costs and fails to take advantage of widely available freely distributed pre-trained uni-modal models. In this work, we relax both of these assumptions by addressing the problem of adapting a set of models trained independently on uni-modal data to a target domain consisting of unlabeled multi-modal data, without having access to the original source dataset. |
Cody Simons; Dripta S. Raychaudhuri; Sk Miraj Ahmed; Suya You; Konstantinos Karydis; Amit K. Roy-Chowdhury; |
1641 | Learning A More Continuous Zero Level Set in Unsigned Distance Fields Through Level Set Projection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the differential networks struggle to learn the zero level set, where the UDF is not differentiable, which leads to large errors in unsigned distances and gradients around the zero level set, resulting in highly fragmented and discontinuous surfaces. To resolve this problem, we propose to learn a more continuous zero level set in UDFs with level set projections. |
Junsheng Zhou; Baorui Ma; Shujuan Li; Yu-Shen Liu; Zhizhong Han; |
1642 | Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Towards achieving the best of both designs, this work proposes Video-FocalNet, an effective and efficient architecture for video recognition that models both local and global contexts. |
Syed Talal Wasim; Muhammad Uzair Khattak; Muzammal Naseer; Salman Khan; Mubarak Shah; Fahad Shahbaz Khan; |
1643 | To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose HAMLET, a Hardware-Aware Modular Least Expensive Training framework for real-time domain adaptation. |
Marc Botet Colomer; Pier Luigi Dovesi; Theodoros Panagiotakopoulos; Joao Frederico Carvalho; Linus Härenstam-Nielsen; Hossein Azizpour; Hedvig Kjellström; Daniel Cremers; Matteo Poggi; |
1644 | Hidden Biases of End-to-End Driving Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: By incorporating our insights, we develop TF++, a simple end-to-end method that ranks first on the Longest6 and LAV benchmarks, gaining 11 driving score over the best prior work on Longest6. |
Bernhard Jaeger; Kashyap Chitta; Andreas Geiger; |
1645 | HairNeRF: Geometry-Aware Image Synthesis for Hairstyle Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel hairstyle transferred image synthesis method considering the underlying head geometry of two input images. |
Seunggyu Chang; Gihoon Kim; Hayeon Kim; |
1646 | Strivec: Sparse Tri-Vector Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Strivec, a novel neural representation that models a 3D scene as a radiance field with sparsely distributed and compactly factorized local tensor feature grids. |
Quankai Gao; Qiangeng Xu; Hao Su; Ulrich Neumann; Zexiang Xu; |
1647 | Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, this approach is only suitable for offline rendering since it relies on integrated positional encoding (IPE) to query a multilayer perceptron (MLP). To overcome this limitation, we propose mip voxel grids (Mip-VoG), an explicit multiscale representation with a deferred architecture for real-time anti-aliasing rendering. |
Dongting Hu; Zhenkai Zhang; Tingbo Hou; Tongliang Liu; Huan Fu; Mingming Gong; |
1648 | Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, the prevalent cross-entropy supervision is limited by its focus on category discrimination while disregarding the semantic connection between concepts, which ultimately results in the suboptimal exploitation of scarce labeled data. To address this issue, this paper presents a novel approach that seeks to leverage linguistic knowledge for data-efficient visual learning. |
Wenxuan Ma; Shuang Li; JinMing Zhang; Chi Harold Liu; Jingxuan Kang; Yulin Wang; Gao Huang; |
1649 | GETAvatar: Generative Textured Meshes for Animatable Human Avatars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Generally, two challenges remain in this field: i) existing methods struggle to generate geometries with rich realistic details such as the wrinkles of garments; ii) they typically utilize volumetric radiance fields and neural renderers in the synthesis process, making high-resolution rendering non-trivial. To overcome these problems, we propose GETAvatar, a Generative model that directly generates Explicit Textured 3D meshes for animatable human Avatar, with photo-realistic appearance and fine geometric details. |
Xuanmeng Zhang; Jianfeng Zhang; Rohan Chacko; Hongyi Xu; Guoxian Song; Yi Yang; Jiashi Feng; |
1650 | Tracking Without Label: Unsupervised Multiple Object Tracking Via Contrastive Similarity Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the latent consistency of sample features across video frames and propose an Unsupervised Contrastive Similarity Learning method, named UCSL, including three contrast modules: self-contrast, cross-contrast, and ambiguity contrast. |
Sha Meng; Dian Shao; Jiacheng Guo; Shan Gao; |
1651 | PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to the modality difference between videos and images, how to effectively adapt CLIP to the video domain is still underexplored. In this paper, we investigate this problem from two aspects. |
Peiyan Guan; Renjing Pei; Bin Shao; Jianzhuang Liu; Weimian Li; Jiaxi Gu; Hang Xu; Songcen Xu; Youliang Yan; Edmund Y. Lam; |
1652 | Re-mine, Learn and Reason: Exploring The Cross-modal Semantic Correlations for Language-guided HOI Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a systematic and unified framework (RmLR) that enhances HOI detection by incorporating structured text knowledge. |
Yichao Cao; Qingfei Tang; Feng Yang; Xiu Su; Shan You; Xiaobo Lu; Chang Xu; |
1653 | Strata-NeRF : Neural Radiance Fields for Stratified Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we propose Strata-NeRF, a single radiance field that can implicitly learn the 3D representation of outer, inner, and subsequent levels. |
Ankit Dhiman; R Srinath; Harsh Rangwani; Rishubh Parihar; Lokesh R Boregowda; Srinath Sridhar; R Venkatesh Babu; |
1654 | StylerDALLE: Language-Guided Style Transfer Using A Vector-Quantized Tokenizer of A Large-Scale Generative Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these abstract semantics can be captured by models like DALL-E or CLIP, which have been trained using huge datasets of images and textual documents. In this paper, we propose StylerDALLE, a style transfer method that exploits both of these models and uses natural language to describe abstract art styles. |
Zipeng Xu; Enver Sangineto; Nicu Sebe; |
1655 | 3D-aware Blending with Generative NeRFs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It remains challenging for existing 2D-based methods, especially when input images are misaligned due to differences in 3D camera poses and object shapes. To tackle these issues, we propose a 3D-aware blending method using generative Neural Radiance Fields (NeRF), including two key components: 3D-aware alignment and 3D-aware blending. |
Hyunsu Kim; Gayoung Lee; Yunjey Choi; Jin-Hwa Kim; Jun-Yan Zhu; |
1656 | Multi-Modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most existing methods directly combine the texture details and object contrast of different modalities, ignoring dynamic changes in reality, which diminishes the visible texture under good lighting conditions and the infrared contrast under low lighting conditions. To fill this gap, we propose a dynamic image fusion framework with a multi-modal gated mixture of local-to-global experts, termed MoE-Fusion, to dynamically extract effective and comprehensive information from the respective modalities. |
Bing Cao; Yiming Sun; Pengfei Zhu; Qinghua Hu; |
1657 | Deep Image Harmonization with Learnable Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore learnable augmentation to enrich the illumination diversity of small-scale datasets for better harmonization performance. |
Li Niu; Junyan Cao; Wenyan Cong; Liqing Zhang; |
1658 | DELFlow: Dense Efficient Learning of Scene Flow for Large-Scale Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous methods rarely predict scene flow from the entire point clouds of the scene with one-time inference due to the memory inefficiency and heavy overhead from distance calculation and sorting involved in commonly used farthest point sampling, KNN, and ball query algorithms for local feature aggregation. To mitigate these issues in scene flow learning, we regularize raw points to a dense format by storing 3D coordinates in 2D grids. |
Chensheng Peng; Guangming Wang; Xian Wan Lo; Xinrui Wu; Chenfeng Xu; Masayoshi Tomizuka; Wei Zhan; Hesheng Wang; |
1659 | RFD-ECNet: Extreme Underwater Image Compression with Reference to Feature Dictionary Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To remove redundancy among UWIs, we first construct an exhaustive underwater multi-scale feature dictionary to provide coarse-to-fine reference features for UWI compression. We then propose an extreme UWI compression network with reference to the feature dictionary (RFD-ECNet), which utilizes feature matching and reference feature variants to significantly remove redundancy among UWIs. |
Mengyao Li; Liquan Shen; Peng Ye; Guorui Feng; Zheyin Wang; |
1660 | E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although these methods show promising results, there is still a significant performance gap compared to full fine-tuning. To address this challenge, we propose an Effective and Efficient Visual Prompt Tuning (E^2VPT) approach for large-scale transformer-based model adaptation. |
Cheng Han; Qifan Wang; Yiming Cui; Zhiwen Cao; Wenguan Wang; Siyuan Qi; Dongfang Liu; |
1661 | High-Resolution Document Shadow Removal Via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous works ignore this problem and remove the shadows via approximate attention and small datasets, which might not work in real-world situations. We handle high-resolution document shadow removal directly via a larger-scale real-world dataset and a carefully-designed frequency-aware network. |
Zinuo Li; Xuhang Chen; Chi-Man Pun; Xiaodong Cun; |
1662 | Scalable Diffusion Models with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We train latent diffusion models of images, replacing the commonly-used U-Net backbone with a transformer that operates on latent patches. |
William Peebles; Saining Xie; |
1663 | MMST-ViT: Climate Change-aware Crop Yield Prediction Via Multi-Modal Spatial-Temporal Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a deep learning-based solution, namely Multi-Modal Spatial-Temporal Vision Transformer (MMST-ViT), for predicting crop yields at the county level across the United States, by considering the effects of short-term meteorological variations during the growing season and the long-term climate change on crops. |
Fudong Lin; Summer Crawford; Kaleb Guillot; Yihe Zhang; Yan Chen; Xu Yuan; Li Chen; Shelby Williams; Robert Minvielle; Xiangming Xiao; Drew Gholson; Nicolas Ashwell; Tri Setiyono; Brenda Tubana; Lu Peng; Magdy Bayoumi; Nian-Feng Tzeng; |
1664 | From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work unifies the formulations of the two tasks by decomposing and reorganizing the generic KD loss into a Normalized KD (NKD) loss and customized soft labels for both target class (image’s category) and non-target classes named Universal Self-Knowledge Distillation (USKD). We decompose the KD loss and find the non-target loss from it forces the student’s non-target logits to match the teacher’s, but the sum of the two non-target logits is different, preventing them from being identical. |
Zhendong Yang; Ailing Zeng; Zhe Li; Tianke Zhang; Chun Yuan; Yu Li; |
1665 | SILT: Shadow-Aware Iterative Label Tuning for Learning to Detect Shadows from Noisy Labels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing shadow detection datasets often contain missing or mislabeled shadows, which can hinder the performance of deep learning models trained directly on such data. To address this issue, we propose SILT, the Shadow-aware Iterative Label Tuning framework, which explicitly considers noise in shadow labels and trains the deep model in a self-training manner. |
Han Yang; Tianyu Wang; Xiaowei Hu; Chi-Wing Fu; |
1666 | Implicit Autoencoder for Point-Cloud Self-Supervised Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces the Implicit AutoEncoder (IAE), a simple yet effective method that addresses the sampling variation issue by replacing the commonly-used point-cloud decoder with an implicit decoder. |
Siming Yan; Zhenpei Yang; Haoxiang Li; Chen Song; Li Guan; Hao Kang; Gang Hua; Qixing Huang; |
1667 | Grounded Image Text Matching with Mismatched Relation Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Grounded Image Text Matching with Mismatched Relation (GITM-MR), a novel visual-linguistic joint task that evaluates the relation understanding capabilities of transformer-based pre-trained models. |
Yu Wu; Yana Wei; Haozhe Wang; Yongfei Liu; Sibei Yang; Xuming He; |
1668 | UniKD: Universal Knowledge Distillation for Mimicking Homogeneous or Heterogeneous Object Detectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Bridging this semantic gap now requires case-by-case algorithm design which is time-consuming and heavily relies on experienced adjustment. To alleviate this problem, we propose Universal Knowledge Distillation (UniKD), introducing additional decoder heads with deformable cross-attention called Adaptive Knowledge Extractor (AKE). |
Shanshan Lao; Guanglu Song; Boxiao Liu; Yu Liu; Yujiu Yang; |
1669 | Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel framework named Speech4Mesh to consecutively generate 4D talking head data and train the audio2mesh network with the reconstructed meshes. |
Shan He; Haonan He; Shuo Yang; Xiaoyan Wu; Pengcheng Xia; Bing Yin; Cong Liu; Lirong Dai; Chang Xu; |
1670 | BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate the aforementioned problem, we propose a training-free method to control objects and contexts in the synthesized images adhering to the given spatial conditions. |
Jinheng Xie; Yuexiang Li; Yawen Huang; Haozhe Liu; Wentian Zhang; Yefeng Zheng; Mike Zheng Shou; |
1671 | Generalizing Neural Human Fitting to Unseen Poses With Articulated SE(3) Equivariance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For rigid point clouds, remarkable generalization has been achieved by leveraging SE(3)-equivariant networks, but these methods do not work on articulated objects. In this work we extend this idea to human bodies and propose ArtEq, a novel part-based SE(3)-equivariant neural architecture for SMPL model estimation from point clouds. |
Haiwen Feng; Peter Kulits; Shichen Liu; Michael J. Black; Victoria Fernandez Abrevaya; |
1672 | Rapid Network Adaptation: Learning to Adapt Neural Networks Using Test-Time Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method for adapting neural networks to distribution shifts at test-time. |
Teresa Yeo; Oğuzhan Fatih Kar; Zahra Sodagar; Amir Zamir; |
1673 | Theoretical and Numerical Analysis of 3D Reconstruction Using Point and Line Incidences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the joint image of lines incident to points, meaning the set of image tuples obtained from fixed cameras observing a varying 3D point-line incidence. |
Felix Rydell; Elima Shehu; Angélica Torres; |
1674 | Explaining Adversarial Robustness of Neural Networks from Clustering Effect Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We afterward observe that the intermediate-layer attack disobeys the clustering effect of the AT-trained model. Inspired by these significant observations, we propose a regularization method to extend the perturbation searching space during training, named sufficient adversarial training (SAT). |
Yulin Jin; Xiaoyu Zhang; Jian Lou; Xu Ma; Zilong Wang; Xiaofeng Chen; |
1675 | Leaping Into Memories: Space-Time Deep Feature Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose LEArned Preconscious Synthesis (LEAPS), an architecture-independent method for synthesizing videos from the internal spatiotemporal representations of models. |
Alexandros Stergiou; Nikos Deligiannis; |
1676 | Improving Generalization in Visual Reinforcement Learning Via Conflict-aware Gradient Agreement Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first conduct a qualitative analysis and illuminate the main causes: (i) high-variance gradient magnitudes and (ii) gradient conflicts that exist in various augmentation methods. To alleviate these issues, we propose a general policy gradient optimization framework, named Conflict-aware Gradient Agreement Augmentation (CG2A), and better integrate augmentation combination into visual RL algorithms to address the generalization bias. |
Siao Liu; Zhaoyu Chen; Yang Liu; Yuzheng Wang; Dingkang Yang; Zhile Zhao; Ziqing Zhou; Xie Yi; Wei Li; Wenqiang Zhang; Zhongxue Gan; |
1677 | Graph Matching with Bi-level Noisy Correspondence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study a novel and widely existing problem in graph matching (GM), namely, Bi-level Noisy Correspondence (BNC), which refers to node-level noisy correspondence (NNC) and edge-level noisy correspondence (ENC). |
Yijie Lin; Mouxing Yang; Jun Yu; Peng Hu; Changqing Zhang; Xi Peng; |
1678 | Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Albeit the recent advancement, one of the major challenges still remains: noisy pseudo labels hinder efficient learning on abundant unlabeled videos, embodied as location biases and category errors. In this paper, we dive deep into such an important but understudied dilemma. |
Kun Xia; Le Wang; Sanping Zhou; Gang Hua; Wei Tang; |
1679 | InfiniCity: Infinite-Scale City Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Toward infinite-scale 3D city synthesis, we propose a novel framework, InfiniCity, which constructs and renders an unconstrainedly large and 3D-grounded environment from random noises. |
Chieh Hubert Lin; Hsin-Ying Lee; Willi Menapace; Menglei Chai; Aliaksandr Siarohin; Ming-Hsuan Yang; Sergey Tulyakov; |
1680 | OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Towards a comprehensive benchmarking of surrounding perception algorithms, we propose OpenOccupancy, which is the first surrounding semantic occupancy perception benchmark. |
Xiaofeng Wang; Zheng Zhu; Wenbo Xu; Yunpeng Zhang; Yi Wei; Xu Chi; Yun Ye; Dalong Du; Jiwen Lu; Xingang Wang; |
1681 | Weakly-Supervised Text-Driven Contrastive Learning for Facial Behavior Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More specifically, we introduce a two-stage Contrastive Learning with Text-Embedded framework for Facial behavior understanding (CLEF). |
Xiang Zhang; Taoyue Wang; Xiaotian Li; Huiyuan Yang; Lijun Yin; |
1682 | Box-based Refinement for Weakly Supervised and Unsupervised Localization Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, we extend this understanding by demonstrating that these detectors can be utilized to improve the original network, paving the way for further advancements. To accomplish this, we train the detectors on top of the network output instead of the image data and apply suitable loss backpropagation. |
Eyal Gomel; Tal Shaharbany; Lior Wolf; |
1683 | Activate and Reject: Towards Safe Domain Generalization Under Category Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study a practical problem of Domain Generalization under Category Shift (DGCS), which aims to simultaneously detect unknown-class samples and classify known-class samples in the target domains. |
Chaoqi Chen; Luyao Tang; Leitian Tao; Hong-Yu Zhou; Yue Huang; Xiaoguang Han; Yizhou Yu; |
1684 | PRIOR: Prototype Representation Joint Learning from Medical Images and Reports Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a prototype representation learning framework incorporating both global and local alignment between medical images and reports. |
Pujin Cheng; Li Lin; Junyan Lyu; Yijin Huang; Wenhan Luo; Xiaoying Tang; |
1685 | Dynamic Mesh Recovery from Partial Point Cloud Sequence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, optimizing for 3D configurations from image observation requires a significant amount of computation, whereas real-world 3D measurements often suffer from noisy observation or complex occlusion. We resolve the challenge by learning a latent distribution representing strong temporal priors. |
Hojun Jang; Minkwan Kim; Jinseok Bae; Young Min Kim; |
1686 | WDiscOOD: Out-of-Distribution Detection Via Whitened Linear Discriminant Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel feature-space OOD detection score based on class-specific and class-agnostic information. |
Yiye Chen; Yunzhi Lin; Ruinian Xu; Patricio A. Vela; |
1687 | Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose GgHM, a new framework with Graph-guided Hybrid Matching. |
Jiazheng Xing; Mengmeng Wang; Yudi Ruan; Bofan Chen; Yaowei Guo; Boyu Mu; Guang Dai; Jingdong Wang; Yong Liu; |
1688 | Neural Deformable Models for 3D Bi-Ventricular Heart Shape Reconstruction and Modeling from 2D Sparse Cardiac Magnetic Resonance Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel neural deformable model (NDM) targeting at the reconstruction and modeling of 3D bi-ventricular shape of the heart from 2D sparse cardiac magnetic resonance (CMR) imaging data. |
Meng Ye; Dong Yang; Mikael Kanski; Leon Axel; Dimitris Metaxas; |
1689 | Vision HGNN: An Image Is More Than A Graph of Nodes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we enhance ViG by transcending conventional "pairwise" linkages and harnessing the power of the hypergraph to encapsulate image information. |
Yan Han; Peihao Wang; Souvik Kundu; Ying Ding; Zhangyang Wang; |
1690 | Nonrigid Object Contact Estimation With Regional Unwrapping Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our experiments demonstrate that the proposed framework can robustly estimate the deformed degrees and deformed transformations, which make it suitable for both nonrigid and rigid contact. |
Wei Xie; Zimeng Zhao; Shiying Li; Binghui Zuo; Yangang Wang; |
1691 | Diffusion in Style Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Diffusion in Style, a simple method to adapt Stable Diffusion to any desired style, using only a small set of target images. |
Martin Nicolas Everaert; Marco Bocchio; Sami Arpa; Sabine Süsstrunk; Radhakrishna Achanta; |
1692 | FunnyBirds: A Synthetic Vision Dataset for A Part-Based Analysis of Explainable AI Methods Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While being crucial for safety-critical domains, XAI inherently lacks ground-truth explanations, making its automatic evaluation an unsolved problem. We address this challenge by proposing a novel synthetic vision dataset, named FunnyBirds, and accompanying automatic evaluation protocols. |
Robin Hesse; Simone Schaub-Meyer; Stefan Roth; |
1693 | Deformable Neural Radiance Fields Using RGB and Event Cameras Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a novel method to model the deformable neural radiance fields using RGB and Event cameras. |
Qi Ma; Danda Pani Paudel; Ajad Chhatkuli; Luc Van Gool; |
1694 | BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such methods also neglect scenarios where anticipating diverse short-range behaviors with subtle joint displacements is important. To address these issues, we present BeLFusion, a model that, for the first time, leverages latent diffusion models in HMP to sample from a behavioral latent space where behavior is disentangled from pose and motion. |
German Barquero; Sergio Escalera; Cristina Palmero; |
1695 | Semi-supervised Semantics-guided Adversarial Training for Robust Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel adversarial training method for trajectory prediction. |
Ruochen Jiao; Xiangguo Liu; Takami Sato; Qi Alfred Chen; Qi Zhu; |
1696 | Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we argue that this conflicts with the averaging nature of the PnP problem, leading to gradients that may encourage the network to degrade the accuracy of individual correspondences. To address this, we derive a loss function that exploits the ground truth pose before solving the PnP problem. |
Fulin Liu; Yinlin Hu; Mathieu Salzmann; |
1697 | RLSAC: Reinforcement Learning Enhanced Sample Consensus for End-to-End Robust Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose RLSAC, a novel Reinforcement Learning enhanced SAmple Consensus framework for end-to-end robust estimation. |
Chang Nie; Guangming Wang; Zhe Liu; Luca Cavalli; Marc Pollefeys; Hesheng Wang; |
1698 | CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Injecting even a small number of poisoned examples, such as 75 examples in 3 million pretraining data, can significantly manipulate the model’s behavior, making it difficult to detect or unlearn such correlations. To address this issue, we propose CleanCLIP, a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks by independently re-aligning the representations for individual modalities. |
Hritik Bansal; Nishad Singhi; Yu Yang; Fan Yin; Aditya Grover; Kai-Wei Chang; |
1699 | Multi-Frequency Representation Enhancement with Privilege Information for Video Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, a novel model training method named privilege training is proposed to encode the privilege information from high-resolution videos to facilitate model training. With these two methods, we introduce a new VSR model named MFPI, which outperforms state-of-the-art methods by a large margin while maintaining good efficiency on various datasets, including REDS4, Vimeo, Vid4, and UDM10. |
Fei Li; Linfeng Zhang; Zikun Liu; Juan Lei; Zhenbo Li; |
1700 | Self-supervised Pre-training for Mirror Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe that mirror reflection is crucial to how people perceive the presence of mirrors, and such mid-level features can be better transferred from self-supervised pre-trained models. Inspired by this observation, in this paper we aim to improve mirror detection methods by proposing a new self-supervised learning (SSL) pre-training framework for modeling the representation of mirror reflection progressively in the pre-training process. |
Jiaying Lin; Rynson W.H. Lau; |
1701 | GlowGAN: Unsupervised Learning of HDR Images from LDR Images in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite limited dynamic range, these LDR images are often captured with different exposures, implicitly containing information about the underlying HDR image distribution. Inspired by this intuition, in this work we present, to the best of our knowledge, the first method for learning a generative model of HDR images from in-the-wild LDR image collections in a fully unsupervised manner. |
Chao Wang; Ana Serrano; Xingang Pan; Bin Chen; Karol Myszkowski; Hans-Peter Seidel; Christian Theobalt; Thomas Leimkühler; |
1702 | Cumulative Spatial Knowledge Distillation for Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present Cumulative Spatial Knowledge Distillation (CSKD). |
Borui Zhao; Renjie Song; Jiajun Liang; |
1703 | Dual Pseudo-Labels Interactive Self-Training for Semi-Supervised Visible-Infrared Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, it is difficult to generate reliable pseudo-labels and learn modality-invariant features from noisy pseudo-labels. In this paper, we propose a dual pseudo-label interactive self-training (DPIS) framework to address these challenges in semi-supervised VI-ReID. |
Jiangming Shi; Yachao Zhang; Xiangbo Yin; Yuan Xie; Zhizhong Zhang; Jianping Fan; Zhongchao Shi; Yanyun Qu; |
1704 | Less Is More: Focus Attention for Efficient DETR Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Focus-DETR, which focuses attention on more informative tokens for a better trade-off between computation efficiency and model accuracy. |
Dehua Zheng; Wenhui Dong; Hailin Hu; Xinghao Chen; Yunhe Wang; |
1705 | Efficient Controllable Multi-Task Architectures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a multi-task model consisting of a shared encoder and task-specific decoders where both encoder and decoder channel widths are slimmable. |
Abhishek Aich; Samuel Schulter; Amit K. Roy-Chowdhury; Manmohan Chandraker; Yumin Suh; |
1706 | HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a native skeleton-guided diffusion model for controllable HIG called HumanSD. |
Xuan Ju; Ailing Zeng; Chenchen Zhao; Jianan Wang; Lei Zhang; Qiang Xu; |
1707 | Lens Parameter Estimation for Realistic Depth of Field Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method to estimate the depth of field effect from a single image. |
Dominique Piché-Meunier; Yannick Hold-Geoffroy; Jianming Zhang; Jean-François Lalonde; |
1708 | Learned Compressive Representations for Single-Photon 3D Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As the spatio-temporal resolution of the histogram tensor increases, the in-pixel memory requirements and output data rates can quickly become impractical. To overcome this limitation, we propose a family of linear compressive representations of histogram tensors that can be computed efficiently, in an online fashion, as a matrix operation. |
Felipe Gutierrez-Barragan; Fangzhou Mu; Andrei Ardelean; Atul Ingle; Claudio Bruschini; Edoardo Charbon; Yin Li; Mohit Gupta; Andreas Velten; |
1709 | Alignment-free HDR Deghosting with Semantics Consistent Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there is no research on jointly leveraging the dynamic and static context. To delve into this problem, we propose a novel alignment-free network with a Semantics Consistent Transformer (SCTNet), which includes both spatial and channel attention modules. |
Steven Tel; Zongwei Wu; Yulun Zhang; Barthélémy Heyrman; Cédric Demonceaux; Radu Timofte; Dominique Ginhac; |
1710 | Semantic-Aware Implicit Template Learning Via Part Deformation Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we highlight the importance of part deformation consistency and propose a semantic-aware implicit template learning framework to enable semantically plausible deformation. |
Sihyeon Kim; Minseok Joo; Jaewon Lee; Juyeon Ko; Juhan Cha; Hyunwoo J. Kim; |
1711 | HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Accordingly, we introduce our holistic, reliable, and scalable benchmark, termed HRS-Bench, for T2I models. |
Eslam Mohamed Bakr; Pengzhan Sun; Xiaoqian Shen; Faizan Farooq Khan; Li Erran Li; Mohamed Elhoseiny; |
1712 | Multi3DRefer: Grounding Text Description to Multiple 3D Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the task of localizing a flexible number of objects in real-world 3D scenes using natural language descriptions. |
Yiming Zhang; ZeMing Gong; Angel X. Chang; |
1713 | Examining Autoexposure for Challenging Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A significant hurdle in developing new AE algorithms for challenging environments, especially those with time-varying lighting, is the lack of suitable image datasets. To address this issue, we have captured a new 4D exposure dataset that provides a large solution space (i.e., shutter speed range from 1/500 to 15 seconds) over a temporal sequence with moving objects, bright lights, and varying lighting. |
SaiKiran Tedla; Beixuan Yang; Michael S. Brown; |
1714 | DiffCloth: Diffusion Based Garment Synthesis and Manipulation Via Structural Cross-modal Semantic Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we instead introduce DiffCloth, a diffusion-based pipeline for cross-modal garment synthesis and manipulation, which empowers diffusion models with flexible compositionality in the fashion domain by structurally aligning the cross-modal semantics. |
Xujie Zhang; Binbin Yang; Michael C. Kampffmeyer; Wenqing Zhang; Shiyue Zhang; Guansong Lu; Liang Lin; Hang Xu; Xiaodan Liang; |
1715 | Improved Visual Fine-tuning with Natural Language Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While the problem of catastrophic forgetting in the pre-trained backbone has been extensively studied for fine-tuning, its potential bias from the corresponding pre-training task and data attracts less attention. In this work, we investigate this problem by demonstrating that the obtained classifier after fine-tuning will be close to that induced by the pre-trained model. |
Junyang Wang; Yuanhong Xu; Juhua Hu; Ming Yan; Jitao Sang; Qi Qian; |
1716 | Person Re-Identification Without Identification Via Event Anonymization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, recent deep learning architectures have been able to reconstruct images from event cameras with high fidelity, reintroducing a potential threat to privacy for event-based vision applications. In this paper, we aim to anonymize event-streams to protect the identity of human subjects against such image reconstruction attacks. |
Shafiq Ahmad; Pietro Morerio; Alessio Del Bue; |
1717 | GRAM-HD: 3D-Consistent Image Generation at High Resolution with Generative Radiance Manifolds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel 3D-aware GAN that can generate high resolution images (up to 1024×1024) while keeping strict 3D consistency as in volume rendering. |
Jianfeng Xiang; Jiaolong Yang; Yu Deng; Xin Tong; |
1718 | Small Object Detection Via Coarse-to-fine Proposal Generation and Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To alleviate the aforementioned issues, we propose CFINet, a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning. |
Xiang Yuan; Gong Cheng; Kebing Yan; Qinghua Zeng; Junwei Han; |
1719 | Anomaly Detection Under Distribution Shift Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider the problem of anomaly detection under distribution shift and establish performance benchmarks on four widely-used AD and out-of-distribution (OOD) generalization datasets. |
Tri Cao; Jiawen Zhu; Guansong Pang; |
1720 | Class Prior-Free Positive-Unlabeled Learning with Taylor Variational Loss for Hyperspectral Remote Sensing Imagery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, a Taylor variational loss is proposed for HSI PU learning, which reduces the weight of the gradient of the unlabeled data by Taylor series expansion to enable the network to find a balance between overfitting and underfitting. |
Hengwei Zhao; Xinyu Wang; Jingtao Li; Yanfei Zhong; |
1721 | HoloAssist: An Egocentric Human Interaction Dataset for Interactive AI Assistants in The Real World Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work is part of a broader research effort to develop intelligent agents that can interactively guide humans through performing tasks in the physical world. As a first step in this direction, we introduce HoloAssist, a large-scale egocentric human interaction dataset, where two people collaboratively complete physical manipulation tasks. |
Xin Wang; Taein Kwon; Mahdi Rad; Bowen Pan; Ishani Chakraborty; Sean Andrist; Dan Bohus; Ashley Feniello; Bugra Tekin; Felipe Vieira Frujeri; Neel Joshi; Marc Pollefeys; |
1722 | Self-Feedback DETR for Temporal Action Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To solve the problem, we propose a novel framework, Self-DETR, which utilizes cross-attention maps of the decoder to reactivate self-attention modules. |
Jihwan Kim; Miso Lee; Jae-Pil Heo; |
1723 | StableVideo: Text-driven Consistency-aware Diffusion Video Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This prevents diffusion models from being applied to natural video editing. In this paper, we tackle this problem by introducing temporal dependency to existing text-driven diffusion models, which allows them to generate consistent appearance for the new objects. |
Wenhao Chai; Xun Guo; Gaoang Wang; Yan Lu; |
1724 | PIRNet: Privacy-Preserving Image Restoration Network Via Wavelet Lifting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method namely PIRNet, which operates privacy-preserving image restoration in the steganographic domain. |
Xin Deng; Chao Gao; Mai Xu; |
1725 | LAW-Diffusion: Complex Scene Generation By Diffusion with Layouts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we achieve accurate complex scene generation by proposing a semantically controllable Layout-AWare diffusion model, termed LAW-Diffusion. |
Binbin Yang; Yi Luo; Ziliang Chen; Guangrun Wang; Xiaodan Liang; Liang Lin; |
1726 | Multi-Label Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel multi-label knowledge distillation method. |
Penghui Yang; Ming-Kun Xie; Chen-Chen Zong; Lei Feng; Gang Niu; Masashi Sugiyama; Sheng-Jun Huang; |
1727 | Towards Geospatial Foundation Models Via Continual Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Researchers have explored two prominent approaches for introducing such models in geospatial applications, but both have drawbacks in terms of limited performance benefit or prohibitive training cost. Therefore, in this work, we propose a novel paradigm for building highly effective geospatial foundation models with minimal resource cost and carbon impact. |
Matías Mendieta; Boran Han; Xingjian Shi; Yi Zhu; Chen Chen; |
1728 | ConSlide: Asynchronous Hierarchical Interaction Transformer with Breakup-Reorganize Rehearsal for Continual Whole Slide Image Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the FIRST continual learning framework for WSI analysis, named ConSlide, to tackle the challenges of enormous image size, utilization of hierarchical structure, and catastrophic forgetting by progressive model updating on multiple sequential datasets. |
Yanyan Huang; Weiqin Zhao; Shujun Wang; Yu Fu; Yuming Jiang; Lequan Yu; |
1729 | RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recently, several PTQ schemes for vision transformers (ViTs) have been presented; unfortunately, they typically suffer from non-trivial accuracy degradation, especially in low-bit cases. In this paper, we propose RepQ-ViT, a novel PTQ framework for ViTs based on quantization scale reparameterization, to address the above issues. |
Zhikai Li; Junrui Xiao; Lianwei Yang; Qingyi Gu; |
1730 | Enhancing Privacy Preservation in Federated Learning Via Learning Rate Perturbation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By perturbing the learning rate of each client with random noise, we propose a learning rate perturbation (LRP) defense against gradient inversion attacks. Moreover, we theoretically derive a convergence guarantee for FedAvg with uniformly perturbed local learning rates. |
Guangnian Wan; Haitao Du; Xuejing Yuan; Jun Yang; Meiling Chen; Jie Xu; |
1731 | UMC: A Unified Bandwidth-efficient and Multi-resolution Based Collaborative Perception Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, we aim to propose a Unified Collaborative perception framework named UMC, optimizing the communication, collaboration, and reconstruction processes with the Multi-resolution technique. |
Tianhang Wang; Guang Chen; Kai Chen; Zhengfa Liu; Bo Zhang; Alois Knoll; Changjun Jiang; |
1732 | Viewing Graph Solvability in Practice Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an advance in understanding the projective Structure-from-Motion, focusing in particular on the viewing graph: such a graph has cameras as nodes and fundamental matrices as edges. |
Federica Arrigoni; Tomas Pajdla; Andrea Fusiello; |
1733 | SATR: Zero-Shot Semantic Segmentation of 3D Shapes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explore the task of zero-shot semantic segmentation of 3D shapes by using large-scale off-the-shelf 2D image recognition models. |
Ahmed Abdelreheem; Ivan Skorokhodov; Maks Ovsjanikov; Peter Wonka; |
1734 | ReactioNet: Learning High-Order Facial Behavior from Universal Stimulus-Reaction By Dyadic Relation Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we collected a large-scale spontaneous facial behavior database ReactioNet, which contains 1.1 million coupled stimulus-reaction tuples (visual/audio/caption from both stimuli and subjects). |
Xiaotian Li; Taoyue Wang; Geran Zhao; Xiang Zhang; Xi Kang; Lijun Yin; |
1735 | Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a 6D object pose estimation method that can be trained with pure RGB images without any auxiliary information. |
Yang Hai; Rui Song; Jiaojiao Li; David Ferstl; Yinlin Hu; |
1736 | Emotional Listener Portrait: Neural Listener Head Generation with Emotion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A significant challenge when generating such responses is the non-deterministic nature of fine-grained facial expressions during a conversation, which varies depending on the emotions and attitudes of both the speaker and the listener. To tackle this problem, we propose the Emotional Listener Portrait (ELP), which treats each fine-grained facial motion as a composition of several discrete motion-codewords and explicitly models the probability distribution of the motions under different emotional contexts in conversation. |
Luchuan Song; Guojun Yin; Zhenchao Jin; Xiaoyi Dong; Chenliang Xu; |
1737 | Unsupervised Domain Adaptation for Training Event-Based Networks Using Contrastive Learning and Uncorrelated Conditioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop an unsupervised domain adaptation algorithm for training a deep network for event-based data image classification using contrastive learning and uncorrelated conditioning of data. |
Dayuan Jian; Mohammad Rostami; |
1738 | Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: One of the biggest challenges of this task is severe body truncation due to close social distances in egocentric scenarios, which brings large pose ambiguities for unseen body parts. To tackle this challenge, we propose a novel scene-conditioned diffusion method to model the body pose distribution. |
Siwei Zhang; Qianli Ma; Yan Zhang; Sadegh Aliakbarian; Darren Cosker; Siyu Tang; |
1739 | ImGeoNet: Image-induced Geometry-aware Voxel Representation for Multi-view 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose ImGeoNet, a multi-view image-based 3D object detection framework that models a 3D space by an image-induced geometry-aware voxel representation. |
Tao Tu; Shun-Po Chuang; Yu-Lun Liu; Cheng Sun; Ke Zhang; Donna Roy; Cheng-Hao Kuo; Min Sun; |
1740 | DRAW: Defending Camera-shooted RAW Against Image Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the fact that innate immunity is the first line of body defense, we propose DRAW, a novel scheme of defending images against manipulation by protecting their sources, i.e., camera-shooted RAWs. |
Xiaoxiao Hu; Qichao Ying; Zhenxing Qian; Sheng Li; Xinpeng Zhang; |
1741 | Controllable Person Image Synthesis with Pose-Constrained Latent Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address both issues, a novel Pose-Constrained Latent Diffusion model (PoCoLD) is introduced. |
Xiao Han; Xiatian Zhu; Jiankang Deng; Yi-Zhe Song; Tao Xiang; |
1742 | Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods still struggle to preserve semantically-consistent local details between the original and translated images. In this work, we present an innovative approach that addresses this challenge by using source-domain labels as explicit guidance during image translation. |
Duo Peng; Ping Hu; Qiuhong Ke; Jun Liu; |
1743 | TopoSeg: Topology-Aware Nuclear Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To efficiently focus on regions with high topological errors, we propose an adaptive topology-aware selection (ATS) strategy to enhance the topology-aware optimization procedure further. |
Hongliang He; Jun Wang; Pengxu Wei; Fan Xu; Xiangyang Ji; Chang Liu; Jie Chen; |
1744 | SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To relax the dependence on depth, we propose SceneRF, a self-supervised monocular scene reconstruction method using only posed image sequences for training. |
Anh-Quan Cao; Raoul de Charette; |
1745 | Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the observations, we propose two Transformer variants: i) Context-Sharing Transformer (CST) that learns the global-shared contextual information within image frames with a lightweight computation. |
Yichen Yuan; Yifan Wang; Lijun Wang; Xiaoqi Zhao; Huchuan Lu; Yu Wang; Weibo Su; Lei Zhang; |
1746 | CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple yet effective Contextual Point Cloud Modeling (CPCM) method that consists of two parts: a region-wise masking (RegionMask) strategy and a contextual masked training (CMT) method. |
Lizhao Liu; Zhuangwei Zhuang; Shangxin Huang; Xunlong Xiao; Tianhang Xiang; Cen Chen; Jingdong Wang; Mingkui Tan; |
1747 | PATMAT: Person Aware Tuning of Mask-Aware Transformer for Face Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current generative models for face inpainting often fail to preserve fine facial details and the identity of the person, despite creating aesthetically convincing image structures and textures. In this work, we propose Person Aware Tuning (PAT) of Mask-Aware Transformer (MAT) for face inpainting, which addresses this issue. |
Saman Motamed; Jianjin Xu; Chen Henry Wu; Christian Häne; Jean-Charles Bazin; Fernando De la Torre; |
1748 | Adaptive Nonlinear Latent Transformation for Conditional Face Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a novel adaptive nonlinear latent transformation for disentangled and conditional face editing, termed AdaTrans. |
Zhizhong Huang; Siteng Ma; Junping Zhang; Hongming Shan; |
1749 | Tiny Updater: Towards Efficient Neural Network-Driven Software Updating Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, during the update of neural network-based software, users are required to download all the parameters of the neural network anew, which harms the user experience. Motivated by previous progress in model compression, we propose a novel training methodology named Tiny Updater to address this issue. |
Linfeng Zhang; Kaisheng Ma; |
1750 | INT2: Interactive Trajectory Prediction at Intersections Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present a large-scale interactive trajectory prediction dataset named INT2 for INTeractive trajectory prediction at INTersections. |
Zhijie Yan; Pengfei Li; Zheng Fu; Shaocong Xu; Yongliang Shi; Xiaoxue Chen; Yuhang Zheng; Yang Li; Tianyu Liu; Chuxuan Li; Nairui Luo; Xu Gao; Yilun Chen; Zuoxu Wang; Yifeng Shi; Pengfei Huang; Zhengxiao Han; Jirui Yuan; Jiangtao Gong; Guyue Zhou; Hang Zhao; Hao Zhao; |
1751 | MapPrior: Bird’s-Eye View Map Layout Estimation with Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce MapPrior, a novel BEV perception framework that combines a traditional discriminative BEV perception model with a learned generative model for semantic map layouts. |
Xiyue Zhu; Vlas Zyrianov; Zhijian Liu; Shenlong Wang; |
1752 | CAD-Estate: Large-scale CAD Model Annotation in RGB Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a method for annotating videos of complex multi-object scenes with a globally-consistent 3D representation of the objects. |
Kevis-Kokitsi Maninis; Stefan Popov; Matthias Nießner; Vittorio Ferrari; |
1753 | Conditional Cross Attention Network for Multi-Space Embedding Without Entanglement in Only A SINGLE Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conventional approaches to embedding multiple specific attributes into a single network often result in entanglement, where fine-grained features of each attribute cannot be identified separately. To address this problem, we propose a Conditional Cross-Attention Network that induces disentangled multi-space embeddings for various specific attributes with only a single backbone. |
Chull Hwan Song; Taebaek Hwang; Jooyoung Yoon; Shunghyun Choi; Yeong Hyeon Gu; |
1754 | MB-TaylorFormer: Multi-Branch Efficient Transformer Expanded By Taylor Formula for Image Dehazing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the quadratic computational complexity of softmax-attention limits the wide application in image dehazing task, especially for high-resolution images. To address this issue, we propose a new Transformer variant, which applies the Taylor expansion to approximate the softmax-attention and achieves linear computational complexity. |
Yuwei Qiu; Kaihao Zhang; Chenxi Wang; Wenhan Luo; Hongdong Li; Zhi Jin; |
1755 | X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization Via Dynamic Textual Guidance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, such text-independent architecture lacks textual guidance during predicting attributes, thus leading to unsatisfactory stylization and slow convergence. To address these limitations, we present X-Mesh, an innovative text-driven 3D stylization framework that incorporates a novel Text-guided Dynamic Attention Module (TDAM). |
Yiwei Ma; Xiaoqing Zhang; Xiaoshuai Sun; Jiayi Ji; Haowei Wang; Guannan Jiang; Weilin Zhuang; Rongrong Ji; |
1756 | Muscles in Action Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new dataset, Muscles in Action (MIA), to learn to incorporate muscle activity into human motion representations. |
Mia Chiquier; Carl Vondrick; |
1757 | Large-Scale Person Detection and Localization Using Overhead Fisheye Cameras Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike existing efforts devoted to localizing tourist photos captured by perspective cameras, in this article we focus on developing person positioning solutions using overhead fisheye cameras. |
Lu Yang; Liulei Li; Xueshi Xin; Yifan Sun; Qing Song; Wenguan Wang; |
1758 | ViLTA: Enhancing Vision-Language Pre-training Through Textual Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method ViLTA, comprising of two components to further facilitate the model to learn fine-grained representations among image-text pairs. |
Weihan Wang; Zhen Yang; Bin Xu; Juanzi Li; Yankui Sun; |
1759 | All-to-Key Attention for Arbitrary Style Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel all-to-key attention mechanism—each position of content features is matched to stable key positions of style features—that is more in line with the characteristics of style transfer. |
Mingrui Zhu; Xiao He; Nannan Wang; Xiaoyu Wang; Xinbo Gao; |
1760 | Learning to Distill Global Representation for Sparse-View CT Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we stick to image post-processing methods due to their great flexibility and propose a global representation (GloRe) distillation framework for sparse-view CT, termed GloReDi. |
Zilong Li; Chenglong Ma; Jie Chen; Junping Zhang; Hongming Shan; |
1761 | FocalFormer3D: Focusing on Hard Instance for 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner and guides the models to focus on excavating difficult instances. |
Yilun Chen; Zhiding Yu; Yukang Chen; Shiyi Lan; Anima Anandkumar; Jiaya Jia; Jose M. Alvarez; |
1762 | Not Every Side Is Equal: Localization Uncertainty Estimation for Semi-Supervised 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, existing methods filter out a large number of low-quality pseudo-labels, which also contain some correct regression values that can help with model training. To address the above issues, we propose a side-aware framework for semi-supervised 3D object detection consisting of three key designs: a 3D bounding box parameterization method, an uncertainty estimation module, and a pseudo-label selection strategy. |
Chuxin Wang; Wenfei Yang; Tianzhu Zhang; |
1763 | Teaching CLIP to Count to Ten Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a simple yet effective method to improve the quantitative understanding of vision-language models, while maintaining their overall performance on common benchmarks. |
Roni Paiss; Ariel Ephrat; Omer Tov; Shiran Zada; Inbar Mosseri; Michal Irani; Tali Dekel; |
1764 | TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present TEMPO, an efficient multi-view pose estimation model that learns a robust spatiotemporal representation, improving pose accuracy while also tracking and forecasting human pose. |
Rohan Choudhury; Kris M. Kitani; László A. Jeni; |
1765 | SparseMAE: Sparse Training Meets Masked Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to reduce the model complexity of large vision transformers pretrained by MAE with the assistance of sparse training. |
Aojun Zhou; Yang Li; Zipeng Qin; Jianbo Liu; Junting Pan; Renrui Zhang; Rui Zhao; Peng Gao; Hongsheng Li; |
1766 | DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present DiffPose, a novel diffusion architecture that formulates video-based human pose estimation as a conditional heatmap generation problem. |
Runyang Feng; Yixing Gao; Tze Ho Elden Tse; Xueqing Ma; Hyung Jin Chang; |
1767 | ELITE: Encoding Visual Concepts Into Textual Embeddings for Customized Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we instead propose a learning-based encoder, which consists of global and local mapping networks for fast and accurate customized text-to-image generation. |
Yuxiang Wei; Yabo Zhang; Zhilong Ji; Jinfeng Bai; Lei Zhang; Wangmeng Zuo; |
1768 | Text2Performer: Text-Driven Human Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Text2Performer to generate vivid human videos with articulated motions from texts. |
Yuming Jiang; Shuai Yang; Tong Liang Koh; Wayne Wu; Chen Change Loy; Ziwei Liu; |
1769 | A Simple Recipe to Meta-Learn Forward and Backward Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new, general, and simple meta-learning algorithm for continual learning (SiM4C) that explicitly optimizes to minimize forgetting and facilitate forward transfer. |
Edoardo Cetin; Antonio Carta; Oya Celiktutan; |
1770 | 4D Myocardium Reconstruction with Decoupled Motion and Shape Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, cine magnetic resonance (CMR) imaging is dominated by 2D slices, whose large slice spacing challenges inter-slice shape reconstruction and motion acquisition. To address this problem, we propose a 4D reconstruction method that decouples motion and shape, which can predict the inter-/intra- shape and motion estimation from a given sparse point cloud sequence obtained from limited slices. |
Xiaohan Yuan; Cong Liu; Yangang Wang; |
1771 | IntentQA: Context-aware Video Intent Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel task IntentQA, a special VideoQA task focusing on video intent reasoning, which has become increasingly important for AI with its advantages in equipping AI agents with the capability of reasoning beyond mere recognition in daily tasks. |
Jiapeng Li; Ping Wei; Wenjuan Han; Lifeng Fan; |
1772 | LiDAR-UDA: Self-ensembling Through Time for Unsupervised LiDAR Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce LiDAR-UDA, a novel two-stage self-training-based Unsupervised Domain Adaptation (UDA) method for LiDAR segmentation. |
Amirreza Shaban; JoonHo Lee; Sanghun Jung; Xiangyun Meng; Byron Boots; |
1773 | Robust Monocular Depth Estimation Under Challenging Conditions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While state-of-the-art monocular depth estimation approaches achieve impressive results in ideal settings, they are highly unreliable under challenging illumination and weather conditions, such as at nighttime or in the presence of rain. In this paper, we uncover these safety-critical issues and tackle them with md4all: a simple and effective solution that works reliably under both adverse and ideal conditions, as well as for different types of learning supervision. |
Stefano Gasperini; Nils Morbitzer; HyunJun Jung; Nassir Navab; Federico Tombari; |
1774 | Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird’s-Eye View Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, we propose to use parametric depth distribution modeling for feature transformation. |
Jiayu Yang; Enze Xie; Miaomiao Liu; Jose M. Alvarez; |
1775 | MSI: Maximize Support-Set Information for Few-Shot Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present a novel method (MSI), which maximizes the support-set information by exploiting two complementary sources of features to generate super correlation maps. |
Seonghyeon Moon; Samuel S. Sohn; Honglu Zhou; Sejong Yoon; Vladimir Pavlovic; Muhammad Haris Khan; Mubbasir Kapadia; |
1776 | Global Features Are All You Need for Image Retrieval and Reranking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present SuperGlobal, a novel approach that exclusively employs global features for both stages, improving efficiency without sacrificing accuracy. |
Shihao Shao; Kaifeng Chen; Arjun Karpur; Qinghua Cui; André Araujo; Bingyi Cao; |
1777 | DPF-Net: Combining Explicit Shape Priors in Deformable Primitive Field for Unsupervised Structural Reconstruction of 3D Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unsupervised methods for reconstructing structures face significant challenges in capturing the geometric details with consistent structures among diverse shapes of the same category. To address this issue, we present a novel unsupervised structural reconstruction method, named DPF-Net, based on a new Deformable Primitive Field (DPF) representation, which allows for high-quality shape reconstruction using parameterized geometric primitives. |
Qingyao Shuai; Chi Zhang; Kaizhi Yang; Xuejin Chen; |
1778 | CORE: Co-planarity Regularized Monocular Geometry Estimation with Weak Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we devise CO-planarity REgularized (CORE) loss functions and Structure-Aware Normal Estimator (SANE). |
Yuguang Li; Kai Wang; Hui Li; Seon-Min Rhee; Seungju Han; Jihye Kim; Min Yang; Ran Yang; Feng Zhu; |
1779 | A Sentence Speaks A Thousand Images: Domain Generalization Through Distilling CLIP with Language Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach for domain generalization that leverages recent advances in large vision-language models, specifically a CLIP teacher model, to train a smaller model that generalizes to unseen domains. |
Zeyi Huang; Andy Zhou; Zijian Ling; Mu Cai; Haohan Wang; Yong Jae Lee; |
1780 | H3WB: Human3.6M 3D WholeBody Dataset and Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a benchmark for 3D human whole-body pose estimation, which involves identifying accurate 3D keypoints on the entire human body, including face, hands, body, and feet. |
Yue Zhu; Nermin Samet; David Picard; |
1781 | Yes, We CANN: Constrained Approximate Nearest Neighbors for Local Feature-Based Visual Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It seems to have become common belief that global embeddings are critical for said image retrieval in visual localization, despite the significant downside of having to compute two feature types for each query image. In this paper, we take a step back from this assumption and propose Constrained Approximate Nearest Neighbors (CANN), a joint solution of k-nearest-neighbors across both the geometry and appearance space using only local features. |
Dror Aiger; Andre Araujo; Simon Lynen; |
1782 | Multi-Object Navigation with Dynamically Learned Neural Implicit Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to structure neural networks with two neural implicit representations, which are learned dynamically during each episode and map the content of the scene: (i) the Semantic Finder predicts the position of a previously seen queried object; (ii) the Occupancy and Exploration Implicit Representation encapsulates information about explored area and obstacles, and is queried with a novel global read mechanism which directly maps from function space to a usable embedding space. |
Pierre Marza; Laetitia Matignon; Olivier Simonin; Christian Wolf; |
1783 | NPC: Neural Point Characters from Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a hybrid point-based representation for animatable humans that does not require an explicit surface model, while being generalizable to novel poses. |
Shih-Yang Su; Timur Bagautdinov; Helge Rhodin; |
1784 | LDP-Feat: Image Features with Local Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this issue, researchers recently proposed privatizing image features by embedding them within an affine subspace containing the original feature as well as adversarial feature samples. In this paper, we propose two novel inversion attacks to show that it is possible to (approximately) recover the original image features from these embeddings, allowing us to recover privacy-critical image content. |
Francesco Pittaluga; Bingbing Zhuang; |
1785 | Pre-Training-Free Image Manipulation Localization Through Non-Mutually Exclusive Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Simply abnegating these contour patches results in a drastic performance loss since contour patches are decisive to the learning outcomes. Hence, we propose the Non-mutually exclusive Contrastive Learning (NCL) framework to rescue conventional contrastive learning from the above dilemma. |
Jizhe Zhou; Xiaochen Ma; Xia Du; Ahmed Y. Alhammadi; Wentao Feng; |
1786 | MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the Incremental MLTR (IMLTR) task in the context of incremental learning (IL), where different languages are introduced in batches. |
Tianlun Zheng; Zhineng Chen; Bingchen Huang; Wei Zhang; Yu-Gang Jiang; |
1787 | Domain Generalization Guided By Gradient Signal to Noise Ratio of Parameters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we move away from the classical approach of Bernoulli sampled dropout mask construction and propose to base the selection on the gradient-signal-to-noise ratio (GSNR) of the network’s parameters. |
Mateusz Michalkiewicz; Masoud Faraki; Xiang Yu; Manmohan Chandraker; Mahsa Baktashmotlagh; |
1788 | Counterfactual-based Saliency Map: Towards Visual Contrastive Explanations for Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More specifically, we propose a content-aware counterfactual perturbing algorithm to stimulate contrastive examples, from which a pair of positive and negative saliency maps could be derived to contrastively explain why P (positive class) rather than Q (negative class). |
Xue Wang; Zhibo Wang; Haiqin Weng; Hengchang Guo; Zhifei Zhang; Lu Jin; Tao Wei; Kui Ren; |
1789 | MST-compression: Compressing and Accelerating Binary Neural Networks with Minimum Spanning Tree Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, as neural networks become wider/deeper to improve accuracy and meet practical requirements, the computational burden remains a significant challenge even on the binary version. To address these issues, this paper proposes a novel method called Minimum Spanning Tree (MST) compression that learns to compress and accelerate BNNs. |
Quang Hieu Vo; Linh-Tam Tran; Sung-Ho Bae; Lok-Won Kim; Choong Seon Hong; |
1790 | MOST: Multiple Object Localization with Self-Supervised Transformers for Object Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Multiple Object localization with Self-supervised Transformers (MOST) that uses features of transformers trained using self-supervised learning to localize multiple objects in real world images. |
Sai Saketh Rambhatla; Ishan Misra; Rama Chellappa; Abhinav Shrivastava; |
1791 | IIEU: Rethinking Neural Feature Activation from Decision-Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By treating activation models as selective feature re-calibrators that suppress/emphasize features according to their importance scores measured by feature-filter similarities, we propose a set of specific properties of effective Act models with new intuitions. |
Sudong Cai; |
1792 | Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose to integrally migrate pre-trained transformer encoder-decoders (imTED) to a detector, constructing a feature extraction path which is "fully pre-trained" so that detectors’ generalization capacity is maximized. |
Feng Liu; Xiaosong Zhang; Zhiliang Peng; Zonghao Guo; Fang Wan; Xiangyang Ji; Qixiang Ye; |
1793 | V-FUSE: Volumetric Depth Map Fusion with Long-Range Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a learning-based depth map fusion framework that accepts a set of depth and confidence maps generated by a Multi-View Stereo (MVS) algorithm as input and improves them. |
Nathaniel Burgdorfer; Philippos Mordohai; |
1794 | CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present CrossLoc3D, a novel 3D place recognition method that solves a large-scale point matching problem in a cross-source setting. |
Tianrui Guan; Aswath Muthuselvam; Montana Hoover; Xijun Wang; Jing Liang; Adarsh Jagan Sathyamoorthy; Damon Conover; Dinesh Manocha; |
1795 | Recursive Video Lane Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A novel algorithm to detect road lanes in videos, called recursive video lane detector (RVLD), is proposed in this paper, which propagates the state of a current frame recursively to the next frame. |
Dongkwon Jin; Dahyun Kim; Chang-Su Kim; |
1796 | GECCO: Geometrically-Conditioned Point Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we tackle the related problem of generating point clouds, both unconditionally, and conditionally with images. For the latter, we introduce a novel geometrically-motivated conditioning scheme based on projecting sparse image features into the point cloud and attaching them to each individual point, at every step in the denoising process. |
Michał J Tyszkiewicz; Pascal Fua; Eduard Trulls; |
1797 | Unsupervised Self-Driving Attention Prediction Via Uncertainty Mining and Knowledge Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, the huge domain gap between natural scenes and traffic scenes in current datasets also limits the potential for model training. To address these challenges, we are the first to introduce an unsupervised way to predict self-driving attention by uncertainty modeling and driving knowledge integration. |
Pengfei Zhu; Mengshi Qi; Xia Li; Weijian Li; Huadong Ma; |
1798 | PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose PETRv2, a unified framework for 3D perception from multi-view images. |
Yingfei Liu; Junjie Yan; Fan Jia; Shuailin Li; Aqi Gao; Tiancai Wang; Xiangyu Zhang; |
1799 | Out-of-Domain GAN Inversion Via Invertibility Decomposition for Photo-Realistic Human Face Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel framework that enhances the fidelity of human face inversion by designing a new module to decompose the input images into ID and OOD partitions with invertibility masks. |
Xin Yang; Xiaogang Xu; Yingcong Chen; |
1800 | SAFE: Machine Unlearning With Shard Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Synergy Aware Forgetting Ensemble (SAFE), a method to adapt large models on a diverse collection of data while minimizing the expected cost to remove the influence of training samples from the trained model. |
Yonatan Dukler; Benjamin Bowman; Alessandro Achille; Aditya Golatkar; Ashwin Swaminathan; Stefano Soatto; |
1801 | Learning Trajectory-Word Alignments for Video-Language Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, modern Video-Language BERTs (VDL-BERTs) neglect this trajectory characteristic: they usually follow image-language BERTs (IL-BERTs) in deploying patch-to-word (P2W) attention, which may over-exploit trivial spatial contexts and neglect significant temporal contexts. To amend this, we propose a novel TW-BERT to learn Trajectory-Word alignment by a newly designed trajectory-to-word (T2W) attention for solving video-language tasks. |
Xu Yang; Zhangzikang Li; Haiyang Xu; Hanwang Zhang; Qinghao Ye; Chenliang Li; Ming Yan; Yu Zhang; Fei Huang; Songfang Huang; |
1802 | OrthoPlanes: A Novel Representation for Better 3D-Awareness of GANs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new method for generating realistic and view-consistent images with fine geometry from 2D image collections. |
Honglin He; Zhuoqian Yang; Shikai Li; Bo Dai; Wayne Wu; |
1803 | Geometry-guided Feature Learning and Fusion for Indoor Scene Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, this paper proposes a novel geometry integration mechanism for 3D scene reconstruction. |
Ruihong Yin; Sezer Karaoglu; Theo Gevers; |
1804 | Atmospheric Transmission and Thermal Inertia Induced Blind Road Segmentation with A Large-Scale Dataset TBRSD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most walking assistant systems rely on visible light images, which is dangerous in weak illumination environments such as darkness or fog. To address this issue and enhance the safety of vision-based walking assistant systems, we developed a thermal infrared blind road segmentation neural network (TINN). |
Junzhang Chen; Xiangzhi Bai; |
1805 | NeTO: Neural Reconstruction of Transparent Objects with Self-Occlusion Aware Refraction-Tracing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel method, called NeTO, for capturing the 3D geometry of solid transparent objects from 2D images via volume rendering. |
Zongcheng Li; Xiaoxiao Long; Yusen Wang; Tuo Cao; Wenping Wang; Fei Luo; Chunxia Xiao; |
1806 | Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy Via Geometry-Guided Cross-View Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Image retrieval-based cross-view localization methods often lead to very coarse camera pose estimation, due to the limited sampling density of the database satellite images. In this paper, we propose a method to increase the accuracy of a ground camera’s location and orientation by estimating the relative rotation and translation between the ground-level image and its matched/retrieved satellite image. |
Yujiao Shi; Fei Wu; Akhil Perincherry; Ankit Vora; Hongdong Li; |
1807 | Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we seek to explore a more efficient two-stage framework for high-resolution image generation with improvements in the following three aspects. |
Shiyue Cao; Yueqin Yin; Lianghua Huang; Yu Liu; Xin Zhao; Deli Zhao; Kaiqi Huang; |
1808 | DLGSANet: Lightweight Dynamic Local and Global Self-Attention Networks for Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an effective lightweight dynamic local and global self-attention network (DLGSANet) to solve image super-resolution. |
Xiang Li; Jiangxin Dong; Jinhui Tang; Jinshan Pan; |
1809 | Adaptive Reordering Sampler with Neurally Guided MAGSAC Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new sampler for robust estimators that always selects the sample with the highest probability of consisting only of inliers. |
Tong Wei; Jiri Matas; Daniel Barath; |
1810 | Learning Cross-Representation Affinity Consistency for Sparsely Supervised Biomedical Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a sparsely supervised biomedical instance segmentation framework via cross-representation affinity consistency regularization. |
Xiaoyu Liu; Wei Huang; Zhiwei Xiong; Shenglong Zhou; Yueyi Zhang; Xuejin Chen; Zheng-Jun Zha; Feng Wu; |
1811 | Black-Box Unsupervised Domain Adaptation with Bi-Directional Atkinson-Shiffrin Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose BiMem, a bi-directional memorization mechanism that learns to remember useful and representative information to correct noisy pseudo labels on the fly, leading to robust black-box UDA that can generalize across different visual recognition tasks. |
Jingyi Zhang; Jiaxing Huang; Xueying Jiang; Shijian Lu; |
1812 | Towards Fair and Comprehensive Comparisons for Image-Based 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we build a modular-designed codebase, formulate strong training recipes, design an error diagnosis toolbox, and discuss current methods for image-based 3D object detection. |
Xinzhu Ma; Yongtao Wang; Yinmin Zhang; Zhiyi Xia; Yuan Meng; Zhihui Wang; Haojie Li; Wanli Ouyang; |
1813 | Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present DiST, which disentangles the learning of spatial and temporal aspects of videos. |
Zhiwu Qing; Shiwei Zhang; Ziyuan Huang; Yingya Zhang; Changxin Gao; Deli Zhao; Nong Sang; |
1814 | A Skeletonization Algorithm for Gradient-Based Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces the first three-dimensional skeletonization algorithm that is both compatible with gradient-based optimization and preserves an object’s topology. |
Martin J. Menten; Johannes C. Paetzold; Veronika A. Zimmer; Suprosanna Shit; Ivan Ezhov; Robbie Holland; Monika Probst; Julia A. Schnabel; Daniel Rueckert; |
1815 | V3Det: Vast Vocabulary Visual Detection Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By offering a vast exploration space, V3Det enables extensive benchmarks on both vast and open vocabulary object detection, leading to new observations, practices, and insights for future research. |
Jiaqi Wang; Pan Zhang; Tao Chu; Yuhang Cao; Yujie Zhou; Tong Wu; Bin Wang; Conghui He; Dahua Lin; |
1816 | Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we design a Coarse-to-Fine framework to learn Compact Discriminative representation (CFCD) for end-to-end single-stage image retrieval, requiring only image-level labels. |
Yunquan Zhu; Xinkai Gao; Bo Ke; Ruizhi Qiao; Xing Sun; |
1817 | Multi-weather Image Restoration Via Domain Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These methods may face challenges when dealing with real-world situations where the images may have multiple, more intricate weather conditions. To address this issue, we propose a domain translation-based unified method for multi-weather image restoration. |
Prashant W. Patil; Sunil Gupta; Santu Rana; Svetha Venkatesh; Subrahmanyam Murala; |
1818 | Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: One critical challenge in 6D object pose estimation from a single RGBD image is efficient integration of two different modalities, i.e., color and depth. In this work, we tackle this problem by a novel Deep Fusion Transformer (DFTr) block that can aggregate cross-modality features for improving pose estimation. |
Jun Zhou; Kai Chen; Linlin Xu; Qi Dou; Jing Qin; |
1819 | BT^2: Backward-compatible Training with Basis Transformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show in this paper how a BT can be utilized to add only the necessary amount of additional dimensions. |
Yifei Zhou; Zilu Li; Abhinav Shrivastava; Hengshuang Zhao; Antonio Torralba; Taipeng Tian; Ser-Nam Lim; |
1820 | ViperGPT: Visual Inference Via Python Execution for Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce ViperGPT, a framework that leverages code-generation models to compose vision-and-language models into subroutines to produce a result for any query. |
Dídac Surís; Sachit Menon; Carl Vondrick; |
1821 | Improving Unsupervised Visual Program Inference with Code Rewriting Families Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We explore how code rewriting can be used to improve systems for inferring programs from visual data. |
Aditya Ganeshan; R. Kenny Jones; Daniel Ritchie; |
1822 | Essential Matrix Estimation Using Convex Relaxations in Orthogonal Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel method to estimate the essential matrix for two-view Structure from Motion (SfM). |
Arman Karimian; Roberto Tron; |
1823 | Concept-wise Fine-tuning Matters in Preventing Negative Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Rooted in structural causal models of predictions after fine-tuning, we propose a Concept-wise fine-tuning (Concept-Tuning) approach which refines feature representations in the level of patches with each patch encoding a concept. |
Yunqiao Yang; Long-Kai Huang; Ying Wei; |
1824 | Learning Human Dynamics in Autonomous Driving Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a holistic framework for learning physically plausible human dynamics from real driving scenarios, narrowing the gap between real and simulated human behavior in safety-critical applications. |
Jingbo Wang; Ye Yuan; Zhengyi Luo; Kevin Xie; Dahua Lin; Umar Iqbal; Sanja Fidler; Sameh Khamis; |
1825 | Fine-grained Visible Watermark Removal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous works have designed dynamic networks to handle various types of watermarks adaptively, but they ignore that even the watermarked region in a single image can be divided into multiple local parts with distinct visual appearances. In this work, we advance the image-specific dynamic network towards a part-specific dynamic network, which discovers multiple local parts within the watermarked region and handles them adaptively. |
Li Niu; Xing Zhao; Bo Zhang; Liqing Zhang; |
1826 | DDP: Diffusion Model for Dense Visual Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. |
Yuanfeng Ji; Zhe Chen; Enze Xie; Lanqing Hong; Xihui Liu; Zhaoqiang Liu; Tong Lu; Zhenguo Li; Ping Luo; |
1827 | Semantics-Consistent Feature Search for Self-Supervised Visual Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It would damage the semantic consistency of representation to pull these augmentations closer in the feature space indiscriminately. In this study, we introduce feature-level augmentation and propose a novel semantics-consistent feature search (SCFS) method to mitigate this negative effect. |
Kaiyou Song; Shan Zhang; Zimeng Luo; Tong Wang; Jin Xie; |
1828 | GridMM: Grid Memory Map for Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To represent the previously visited environment, most approaches for VLN implement memory using recurrent states, topological maps, or top-down semantic maps. In contrast to these approaches, we build the top-down egocentric and dynamically growing Grid Memory Map (i.e., GridMM) to structure the visited environment. |
Zihan Wang; Xiangyang Li; Jiahao Yang; Yeqi Liu; Shuqiang Jiang; |
1829 | Probabilistic Modeling of Inter- and Intra-observer Variability in Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel model, called Probabilistic Inter-Observer and iNtra-Observer variation NetwOrk (Pionono). |
Arne Schmidt; Pablo Morales-Álvarez; Rafael Molina; |
1830 | LAC – Latent Action Composition for Skeleton-based Action Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, their performances remain limited as the visual features cannot sufficiently express composable actions. In this context, we propose Latent Action Composition (LAC), a novel self-supervised framework aiming at learning from synthesized composable motions for skeleton-based action segmentation. |
Di Yang; Yaohui Wang; Antitza Dantcheva; Quan Kong; Lorenzo Garattoni; Gianpiero Francesca; Francois Bremond; |
1831 | Learning Vision-and-Language Navigation from YouTube Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these videos have not been explored for VLN before. In this paper, we propose to learn an agent from these videos by creating a large-scale dataset which comprises reasonable path-instruction pairs from house tour videos and pre-training the agent on it. |
Kunyang Lin; Peihao Chen; Diwei Huang; Thomas H. Li; Mingkui Tan; Chuang Gan; |
1832 | Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Longer videos are more likely to capture the scene from diverse viewpoints (which helps reconstruction) but are also more likely to contain larger motions (which complicates reconstruction). To address these challenges, we present Total-Recon, the first method to photorealistically reconstruct deformable scenes from long monocular RGBD videos. |
Chonghyuk Song; Gengshan Yang; Kangle Deng; Jun-Yan Zhu; Deva Ramanan; |
1833 | AdaNIC: Towards Practical Neural Image Compression Via Dynamic Transform Routing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a framework with three techniques to enable efficient CAE-based image coding: 1) Spatially-adaptive convolution and normalization operators enable block-wise nonlinear transform to spend FLOPs unevenly across the image to be compressed, according to a transform capacity map. 2) Just-unpenalized model capacity (JUMC) optimizes the transform capacity of each CAE block via rate-distortion-complexity optimization, finding the optimal capacity for the source image content. 3) A lightweight routing agent model predicts the transform capacity map for the CAEs by approximating JUMC targets. |
Lvfang Tao; Wei Gao; Ge Li; Chenhao Zhang; |
1834 | Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we set up an egocentric 3D hand trajectory forecasting task that aims to predict hand trajectories in a 3D space from early observed RGB videos in a first-person view. |
Wentao Bao; Lele Chen; Libing Zeng; Zhong Li; Yi Xu; Junsong Yuan; Yu Kong; |
1835 | Pretrained Language Models As Visual Planners for Human Assistance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In our pursuit of advancing multi-modal AI assistants capable of guiding users to achieve complex multi-step goals, we propose the task of ‘Visual Planning for Assistance (VPA)’. |
Dhruvesh Patel; Hamid Eghbalzadeh; Nitin Kamra; Michael Louis Iuzzolino; Unnat Jain; Ruta Desai; |
1836 | Dynamic Point Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a dynamic point field model that combines the representational benefits of explicit point-based graphics with implicit deformation networks to allow efficient modeling of non-rigid 3D surfaces. |
Sergey Prokudin; Qianli Ma; Maxime Raafat; Julien Valentin; Siyu Tang; |
1837 | Lip2Vec: Efficient and Robust Visual Speech Recognition Via Latent-to-Latent Visual to Audio Representation Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike previous works that involve auxiliary losses or complex training procedures and architectures, we propose a simple approach, named Lip2Vec, that is based on learning a prior model. |
Yasser Abdelaziz Dahou Djilali; Sanath Narayan; Haithem Boussaid; Ebtessam Almazrouei; Merouane Debbah; |
1838 | Privacy Preserving Localization Via Coordinate Permutations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This leads to a significant loss of accuracy for the privacy-preserving methods. In this paper, we overcome this limitation by devising a coordinate permutation scheme that allows for recovering the original point positions during pose estimation. |
Linfei Pan; Johannes L. Schönberger; Viktor Larsson; Marc Pollefeys; |
1839 | Random Boxes Are Open-world Object Detectors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose RandBox, a Fast R-CNN based architecture trained on random proposals at each training iteration, surpassing existing Faster R-CNN and Transformer based OWOD. |
Yanghao Wang; Zhongqi Yue; Xian-Sheng Hua; Hanwang Zhang; |
1840 | DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce DiffDreamer, an unsupervised framework capable of synthesizing novel views depicting a long camera trajectory while training solely on internet-collected images of nature scenes. |
Shengqu Cai; Eric Ryan Chan; Songyou Peng; Mohamad Shahbazi; Anton Obukhov; Luc Van Gool; Gordon Wetzstein; |
1841 | Spectral Graphormer: Spectral Graph-Based Transformer for Egocentric Two-Hand Reconstruction Using Multi-View Color Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel transformer-based framework that reconstructs two high fidelity hands from multi-view RGB images. |
Tze Ho Elden Tse; Franziska Mueller; Zhengyang Shen; Danhang Tang; Thabo Beeler; Mingsong Dou; Yinda Zhang; Sasa Petrovic; Hyung Jin Chang; Jonathan Taylor; Bardia Doosti; |
1842 | SMMix: Self-Motivated Image Mixing for Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose an efficient and effective Self-Motivated image Mixing method (SMMix), which motivates both image and label enhancement by the model under training itself. |
Mengzhao Chen; Mingbao Lin; Zhihang Lin; Yuxin Zhang; Fei Chao; Rongrong Ji; |
1843 | Enhancing Adversarial Robustness in Low-Label Regime Via Adaptively Weighted Regularization and Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate semi-supervised adversarial training where labeled data is scarce. |
Dongyoon Yang; Insung Kong; Yongdai Kim; |
1844 | Recovering A Molecule’s 3D Dynamics from Liquid-phase Electron Microscopy Movies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose TEMPOR, a Temporal Electron MicroscoPy Object Reconstruction algorithm for liquid-phase EM that leverages an implicit neural representation (INR) and a dynamical variational auto-encoder (DVAE) to recover time series of molecular structures. |
Enze Ye; Yuhang Wang; Hong Zhang; Yiqin Gao; Huan Wang; He Sun; |
1845 | Reconciling Object-Level and Global-Level Objectives for Long-Tail Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework that Reconciles Object-level and Global-level (ROG) objectives to address both problems. |
Shaoyu Zhang; Chen Chen; Silong Peng; |
1846 | In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose an approach, In-Style, that learns the style of the text queries and transfers it to uncurated web videos. |
Nina Shvetsova; Anna Kukleva; Bernt Schiele; Hilde Kuehne; |
1847 | MIMO-NeRF: Fast Neural Rendering with Multi-input Multi-output Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a multi-input multi-output NeRF (MIMO-NeRF) that reduces the number of MLPs running by replacing the SISO MLP with a MIMO MLP and conducting mappings in a group-wise manner. |
Takuhiro Kaneko; |
1848 | Instance Neural Radiance Field Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents one of the first learning-based NeRF 3D instance segmentation pipelines, dubbed as Instance Neural Radiance Field, or Instance-NeRF. |
Yichen Liu; Benran Hu; Junkai Huang; Yu-Wing Tai; Chi-Keung Tang; |
1849 | One-bit Flip Is All You Need: When Bit-flip Attack Meets Model Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we seek to further reduce the number of bit flips. |
Jianshuo Dong; Han Qiu; Yiming Li; Tianwei Zhang; Yuanjie Li; Zeqi Lai; Chao Zhang; Shu-Tao Xia; |
1850 | CLIPTER: Looking at The Bigger Picture in Scene Text Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we harness the representative capabilities of modern vision-language models, such as CLIP, to provide scene-level information to the crop-based recognizer. |
Aviad Aberdam; David Bensaid; Alona Golts; Roy Ganz; Oren Nuriel; Royee Tichauer; Shai Mazor; Ron Litman; |
1851 | Revisiting Scene Text Recognition: A Data Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper aims to re-assess scene text recognition (STR) from a data-oriented perspective. |
Qing Jiang; Jiapeng Wang; Dezhi Peng; Chongyu Liu; Lianwen Jin; |
1852 | Improving CLIP Fine-tuning Performance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate the differences, we introduce a classical feature map distillation framework, which can simultaneously inherit the semantic capability of CLIP models while constructing a task that incorporates key ingredients of MIM. |
Yixuan Wei; Han Hu; Zhenda Xie; Ze Liu; Zheng Zhang; Yue Cao; Jianmin Bao; Dong Chen; Baining Guo; |
1853 | The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we propose The Power of Sound (TPoS) model to incorporate audio input that includes both changeable temporal semantics and magnitude. |
Yujin Jeong; Wonjeong Ryoo; Seunghyun Lee; Dabin Seo; Wonmin Byeon; Sangpil Kim; Jinkyu Kim; |
1854 | SOCS: Semantically-Aware Object Coordinate Space for Category-Level 6D Object Pose Estimation Under Large Shape Variations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Semantically-aware Object Coordinate Space (SOCS) built by warping-and-aligning the objects guided by a sparse set of keypoints with semantically meaningful correspondence. |
Boyan Wan; Yifei Shi; Kai Xu; |
1855 | NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although the recently developed neural radiance fields (NeRF) have shown promising advances in implicit reconstruction for indoor environments, the problem of simultaneous odometry and mapping for large-scale scenarios using incremental LiDAR data remains unexplored. To bridge this gap, in this paper, we propose a novel NeRF-based LiDAR odometry and mapping approach, NeRF-LOAM, consisting of three modules: neural odometry, neural mapping, and mesh reconstruction. |
Junyuan Deng; Qi Wu; Xieyuanli Chen; Songpengcheng Xia; Zhen Sun; Guoqing Liu; Wenxian Yu; Ling Pei; |
1856 | DINAR: Diffusion Inpainting of Neural Textures for One-Shot Human Avatars Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present DINAR, an approach for creating realistic rigged fullbody avatars from single RGB images. |
David Svitov; Dmitrii Gudkov; Renat Bashirov; Victor Lempitsky; |
1857 | DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it cannot be optimal in both aspects and often suffers from mode mixture in short steps. To tackle this problem, we innovatively regard inverse diffusion as an optimal transport (OT) problem between latents at different stages and propose DPM-OT, a unified learning framework for fast DPMs with the direct expressway represented by the OT map, which can generate high-quality samples within around 10 function evaluations. |
Zezeng Li; Shenghao Li; Zhanpeng Wang; Na Lei; Zhongxuan Luo; David Xianfeng Gu; |
1858 | ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current supernet training methods that rely on uniform sampling suffer from the gradient conflict issue: the sampled subnets can have vastly different model sizes (e.g., 50M vs. 2G FLOPs), leading to different optimization directions and inferior performance. To address this challenge, we propose two novel sampling techniques: complexity-aware sampling and performance-aware sampling. |
Chen Tang; Li Lyna Zhang; Huiqiang Jiang; Jiahang Xu; Ting Cao; Quanlu Zhang; Yuqing Yang; Zhi Wang; Mao Yang; |
1859 | OmniLabel: A Challenging Benchmark for Language-Based Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With OmniLabel, we propose a novel task definition, dataset, and evaluation metric. |
Samuel Schulter; Vijay Kumar B G; Yumin Suh; Konstantinos M. Dafnis; Zhixing Zhang; Shiyu Zhao; Dimitris Metaxas; |
1860 | Noise-Aware Learning from Web-Crawled Image-Text Data for Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To take the best of both worlds, we propose a Noise-aware Captioning (NoC) framework, which learns rich knowledge from the whole web-crawled data while being less affected by the noises. |
Wooyoung Kang; Jonghwan Mun; Sungjun Lee; Byungseok Roh; |
1861 | Divide&Classify: Fine-Grained Classification for City-Wide Visual Geo-Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using approximate nearest neighbour search for retrieval helps to mitigate this issue, at the cost of a performance drop. In this paper we investigate whether we can effectively approach this task as a classification problem, thus bypassing the need for a similarity search. |
Gabriele Trivigno; Gabriele Berton; Juan Aragon; Barbara Caputo; Carlo Masone; |
1862 | 3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel semantic generative model named 3D Semantic Subspace Traverser that utilizes semantic attributes for category-specific 3D shape generation and editing. |
Ruowei Wang; Yu Liu; Pei Su; Jianwei Zhang; Qijun Zhao; |
1863 | Inherent Redundancy in Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we pose and focus on three key questions regarding the inherent redundancy in SNNs. |
Man Yao; Jiakui Hu; Guangshe Zhao; Yaoyuan Wang; Ziyang Zhang; Bo Xu; Guoqi Li; |
1864 | Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input. |
Lukas Höllein; Ang Cao; Andrew Owens; Justin Johnson; Matthias Nießner; |
1865 | On The Robustness of Normalizing Flows for Inverse Problems in Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, unintended severe artifacts are occasionally observed in their outputs. In this work, we address this critical issue by investigating the origins of these artifacts and proposing the conditions to avoid them. |
Seongmin Hong; Inbum Park; Se Young Chun; |
1866 | FastRecon: Few-shot Industrial Anomaly Detection Via Fast Feature Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a few-shot anomaly detection strategy that works in a low-data regime and can generalize across products at no cost. |
Zheng Fang; Xiaoyang Wang; Haocheng Li; Jiejie Liu; Qiugui Hu; Jimin Xiao; |
1867 | Local or Global: Selective Knowledge Assimilation for Federated Learning with Limited Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our work, we propose FedLabel where clients selectively choose the local or global model to pseudo-label their unlabeled data depending on which is more of an expert of the data. |
Yae Jee Cho; Gauri Joshi; Dimitrios Dimitriadis; |
1868 | DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to boost the representation learning of a multi-camera BEV based student detector by training it to imitate the features of a well-trained LiDAR based teacher detector. |
Zeyu Wang; Dingwen Li; Chenxu Luo; Cihang Xie; Xiaodong Yang; |
1869 | PoseFix: Correcting 3D Human Poses with Natural Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle the problem of correcting 3D human poses with natural language. |
Ginger Delmas; Philippe Weinzaepfel; Francesc Moreno-Noguer; Grégory Rogez; |
1870 | TAPIR: Tracking Any Point with Per-Frame Initialization and Temporal Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. |
Carl Doersch; Yi Yang; Mel Vecerik; Dilara Gokay; Ankush Gupta; Yusuf Aytar; Joao Carreira; Andrew Zisserman; |
1871 | SwinLSTM: Improving Spatiotemporal Prediction Accuracy Using Swin Transformer and LSTM Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new recurrent cell, SwinLSTM, which integrates Swin Transformer blocks and the simplified LSTM, an extension that replaces the convolutional structure in ConvLSTM with the self-attention mechanism. |
Song Tang; Chuang Li; Pu Zhang; RongNian Tang; |
1872 | Detecting Objects with Context-Likelihood Graphs and Graph Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this paper is to detect objects by exploiting their interrelationships. |
Aritra Bhowmik; Yu Wang; Nora Baka; Martin R. Oswald; Cees G. M. Snoek; |
1873 | Coarse-to-Fine Amodal Segmentation with Shape Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Amodal object segmentation is a challenging task that involves segmenting both visible and occluded parts of an object. In this paper, we propose a novel approach, called Coarse-to-Fine Segmentation (C2F-Seg), that addresses this problem by progressively modeling the amodal segmentation. |
Jianxiong Gao; Xuelin Qian; Yikai Wang; Tianjun Xiao; Tong He; Zheng Zhang; Yanwei Fu; |
1874 | DEDRIFT: Robust Similarity Search Under Content Drift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce and analyze real-world image and video datasets for which temporal information is available over a long time period. |
Dmitry Baranchuk; Matthijs Douze; Yash Upadhyay; I. Zeki Yalniz; |
1875 | Learning Pseudo-Relations for Cross-domain Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a pseudo-relation learning framework, Relation Teacher (RTea), which can exploit pixel relations to efficiently use unreliable pixels and learn generalized representations. |
Dong Zhao; Shuang Wang; Qi Zang; Dou Quan; Xiutiao Ye; Rui Yang; Licheng Jiao; |
1876 | AdVerb: Visually Guided Audio Dereverberation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present AdVerb, a novel audio-visual dereverberation framework that uses visual cues in addition to the reverberant sound to estimate clean audio. |
Sanjoy Chowdhury; Sreyan Ghosh; Subhrajyoti Dasgupta; Anton Ratnarajah; Utkarsh Tyagi; Dinesh Manocha; |
1877 | Audio-Enhanced Text-to-Video Retrieval Using Text-Conditioned Feature Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nonetheless, the objective of the text-to-video retrieval task is to capture the complementary audio and video information that is pertinent to the text query rather than simply achieving better audio and video alignment. To address this issue, we introduce TEFAL, a TExt-conditioned Feature ALignment method that produces both audio and video representations conditioned on the text query. |
Sarah Ibrahimi; Xiaohang Sun; Pichao Wang; Amanmeet Garg; Ashutosh Sanan; Mohamed Omar; |
1878 | Open-vocabulary Object Segmentation with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this paper is to extract the visual-language correspondence from a pre-trained text-to-image diffusion model, in the form of segmentation map, i.e., simultaneously generating images and segmentation masks for the corresponding visual entities described in the text prompt. |
Ziyi Li; Qinye Zhou; Xiaoyun Zhang; Ya Zhang; Yanfeng Wang; Weidi Xie; |
1879 | Human-centric Scene Understanding for 3D Large-scale Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a large-scale multi-modal dataset for human-centric scene understanding, dubbed HuCenLife, which is collected in diverse daily-life scenarios with rich and fine-grained annotations. |
Yiteng Xu; Peishan Cong; Yichen Yao; Runnan Chen; Yuenan Hou; Xinge Zhu; Xuming He; Jingyi Yu; Yuexin Ma; |
1880 | With A Little Help from Your Own Past: Prototypical Memory Networks for Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we devise a network which can perform attention over activations obtained while processing other training samples, through a prototypical memory model. |
Manuele Barraco; Sara Sarto; Marcella Cornia; Lorenzo Baraldi; Rita Cucchiara; |
1881 | SimMatchV2: Semi-Supervised Learning with Graph Consistency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a new semi-supervised learning algorithm – SimMatchV2, which formulates various consistency regularizations between labeled and unlabeled data from the graph perspective. |
Mingkai Zheng; Shan You; Lang Huang; Chen Luo; Fei Wang; Chen Qian; Chang Xu; |
1882 | Reinforced Disentanglement for Face Swapping Without Skip Connection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fix them, we introduce a new face swap framework called "WSC-swap" that gets rid of skip connections and uses two target encoders to respectively capture the pixel-level non-facial region attributes and the semantic non-identity attributes in the face region. |
Xiaohang Ren; Xingyu Chen; Pengfei Yao; Heung-Yeung Shum; Baoyuan Wang; |
1883 | PDiscoNet: Semantically Consistent Part Discovery for Fine-grained Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose PDiscoNet to discover object parts by using only image-level class labels along with priors encouraging the parts to be: discriminative, compact, distinct from each other, equivariant to rigid transforms, and active in at least some of the images. |
Robert van der Klis; Stephan Alaniz; Massimiliano Mancini; Cassio F. Dantas; Dino Ienco; Zeynep Akata; Diego Marcos; |
1884 | Privacy-Preserving Face Recognition Using Random Frequency Components Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on recent theoretical insights and our observation on model attention, we propose a solution to the dilemma, by advocating for the training and inference of recognition models on randomly selected frequency components. |
Yuxi Mi; Yuge Huang; Jiazhen Ji; Minyi Zhao; Jiaxiang Wu; Xingkun Xu; Shouhong Ding; Shuigeng Zhou; |
1885 | Vision Transformer Adapters for Generalizable Multitask Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the first multitasking vision transformer adapters that learn generalizable task affinities which can be applied to novel tasks and domains. |
Deblina Bhattacharjee; Sabine Süsstrunk; Mathieu Salzmann; |
1886 | How to Choose Your Best Allies for A Transferable Attack? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a new methodology for evaluating transferability by putting distortion in a central position. |
Thibault Maho; Seyed-Mohsen Moosavi-Dezfooli; Teddy Furon; |
1887 | CVRecon: Rethinking 3D Geometric Feature Learning For Neural Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Drawing inspiration from traditional multi-view stereo methods, we propose an end-to-end 3D neural reconstruction framework CVRecon, designed to exploit the rich geometric embedding in the cost volumes to facilitate 3D geometric feature learning. |
Ziyue Feng; Liang Yang; Pengsheng Guo; Bing Li; |
1888 | Self-Supervised Object Detection from Egocentric Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With the egocentric domain in mind, we address the problem of self-supervised, class-agnostic object detection, which aims to locate all objects in a given view, regardless of category, without any annotations or pre-training weights. |
Peri Akiva; Jing Huang; Kevin J Liang; Rama Kovvuri; Xingyu Chen; Matt Feiszli; Kristin Dana; Tal Hassner; |
1889 | Prior-guided Source-free Domain Adaptation for Human Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Prior-guided Self-training (POST), a pseudo-labeling approach that builds on the popular Mean Teacher framework to compensate for the distribution shift. |
Dripta S. Raychaudhuri; Calvin-Khang Ta; Arindam Dutta; Rohit Lal; Amit K. Roy-Chowdhury; |
1890 | ClothesNet: An Information-Rich 3D Garment Model Repository with Simulated Clothes Environment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ClothesNet: a large-scale dataset of 3D clothes objects with information-rich annotations. |
Bingyang Zhou; Haoyu Zhou; Tianhai Liang; Qiaojun Yu; Siheng Zhao; Yuwei Zeng; Jun Lv; Siyuan Luo; Qiancai Wang; Xinyuan Yu; Haonan Chen; Cewu Lu; Lin Shao; |
1891 | Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper considers a new direction by introducing a model learning framework with auxiliary tasks. |
Chenxin Xu; Robby T. Tan; Yuhong Tan; Siheng Chen; Xinchao Wang; Yanfeng Wang; |
1892 | Measuring Asymmetric Gradient Discrepancy in Parallel Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, in this paper, we formulate PCL as a minimum distance optimization problem among gradients and propose an explicit Asymmetric Gradient Distance (AGD) to evaluate the gradient discrepancy in PCL. |
Fan Lyu; Qing Sun; Fanhua Shang; Liang Wan; Wei Feng; |
1893 | StyleLipSync: Style-based Personalized Lip-sync Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present StyleLipSync, a style-based personalized lip-sync video generative model that can generate identity-agnostic lip-synchronizing video from arbitrary audio. |
Taekyung Ki; Dongchan Min; |
1894 | Cross Contrasting Feature Perturbation for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Cross Contrasting Feature Perturbation (CCFP) framework to simulate domain shift by generating perturbed features in the latent space while regularizing the model prediction against domain shift. |
Chenming Li; Daoan Zhang; Wenjian Huang; Jianguo Zhang; |
1895 | DiffusionRet: Generative Text-Video Retrieval with Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While straightforward, this de facto paradigm overlooks the underlying data distribution p(query), which makes it challenging to identify out-of-distribution data. To address this limitation, we creatively tackle this task from a generative viewpoint and model the correlation between the text and the video as their joint probability p(candidates,query). |
Peng Jin; Hao Li; Zesen Cheng; Kehan Li; Xiangyang Ji; Chang Liu; Li Yuan; Jie Chen; |
1896 | Efficient 3D Semantic Segmentation with Superpoint Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a novel superpoint-based transformer architecture for efficient semantic segmentation of large-scale 3D scenes. |
Damien Robert; Hugo Raguet; Loic Landrieu; |
1897 | Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although adversarial training (AT) improves robustness, it degrades the standard accuracy, thus yielding the tradeoff. To mitigate this tradeoff, we propose a novel AT method called ARREST, which comprises three components: (i) adversarial finetuning (AFT), (ii) representation-guided knowledge distillation (RGKD), and (iii) noisy replay (NR). |
Satoshi Suzuki; Shin’ya Yamaguchi; Shoichiro Takeda; Sekitoshi Kanai; Naoki Makishima; Atsushi Ando; Ryo Masumura; |
1898 | HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose HyperDiffusion, a novel approach for unconditional generative modeling of implicit neural fields. |
Ziya Erkoç; Fangchang Ma; Qi Shan; Matthias Nießner; Angela Dai; |
1899 | Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formulate a simple yet principled One-stage Retinex-based Framework (ORF). |
Yuanhao Cai; Hao Bian; Jing Lin; Haoqian Wang; Radu Timofte; Yulun Zhang; |
1900 | Minimum Latency Deep Online Video Stabilization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel camera path optimization framework for the task of online video stabilization. |
Zhuofan Zhang; Zhen Liu; Ping Tan; Bing Zeng; Shuaicheng Liu; |
1901 | Speech2Lip: High-fidelity Speech to Lip Generation By Learning from A Short Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We thus propose a decomposition-synthesis-composition framework named Speech to Lip (Speech2Lip) that disentangles speech-sensitive and speech-insensitive motion/appearance to facilitate effective learning from limited training data, resulting in the generation of natural-looking videos. |
Xiuzhe Wu; Pengfei Hu; Yang Wu; Xiaoyang Lyu; Yan-Pei Cao; Ying Shan; Wenming Yang; Zhongqian Sun; Xiaojuan Qi; |
1902 | UHDNeRF: Ultra-High-Definition Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose UHDNeRF, a new framework for novel view synthesis on the challenging ultra-high-resolution (e.g., 4K) real-world scenes. |
Quewei Li; Feichao Li; Jie Guo; Yanwen Guo; |
1903 | Linear Spaces of Meanings: Compositional Structures in Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs). |
Matthew Trager; Pramuditha Perera; Luca Zancato; Alessandro Achille; Parminder Bhatia; Stefano Soatto; |
1904 | MULLER: Multilayer Laplacian Resizer for Vision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an extremely lightweight multilayer Laplacian resizer with only a handful of trainable parameters, dubbed MULLER resizer. |
Zhengzhong Tu; Peyman Milanfar; Hossein Talebi; |
1905 | X-VoE: Measuring EXplanatory Violation of Expectation in Physical Events Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study introduces X-VoE, a comprehensive benchmark dataset, to assess AI agents’ grasp of intuitive physics. |
Bo Dai; Linge Wang; Baoxiong Jia; Zeyu Zhang; Song-Chun Zhu; Chi Zhang; Yixin Zhu; |
1906 | Tracking By Natural Language Specification with Long Short-term Context Decoupling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the linguistic information contained in the textual query and the visual representation stored in the search area may sometimes be inconsistent, in which case the direct fusion of the two may lead to conflicts. To address this problem, we propose DecoupleTNL, introducing a video clip containing short-term context information into the framework of TNL and exploring a proper way to reduce the impact when visual representation is inconsistent with linguistic information. |
Ding Ma; Xiangqian Wu; |
1907 | COOP: Decoupling and Coupling of Whole-Body Grasping Pose Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the generated grasping poses of the human body in the key-frames are limited, failing to capture the full range of grasping poses that humans are capable of. To address this issue, we propose a novel framework called COOP (DeCOupling and COupling of Whole-Body GrasPing Pose Generation) to synthesize life-like whole-body poses that cover the widest range of human grasping capabilities. |
Yanzhao Zheng; Yunzhou Shi; Yuhao Cui; Zhongzhou Zhao; Zhiling Luo; Wei Zhou; |
1908 | Pyramid Dual Domain Injection Network for Pan-sharpening Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this way, the model can capture multi-scale dual-domain information, enabling the generation of high-quality pan-sharpening results. |
Xuanhua He; Keyu Yan; Rui Li; Chengjun Xie; Jie Zhang; Man Zhou; |
1909 | Why Do Networks Have Inhibitory/negative Connections? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an answer from the perspective of representation capacity. |
Qingyang Wang; Mike A. Powell; Ali Geisa; Eric Bridgeford; Carey E. Priebe; Joshua T. Vogelstein; |
1910 | Ordinal Label Distribution Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The value of a particular label contains information about previous labels, and we adopt cumulative distribution to construct this relationship. Based on these characteristics of ordinal labels, we propose the learning objectives and evaluation metrics for OLDL, namely CAD, QFD, and CJS. |
Changsong Wen; Xin Zhang; Xingxu Yao; Jufeng Yang; |
1911 | Model Calibration in Dense Classification with Adaptive Label Perturbation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve model calibration, we propose Adaptive Stochastic Label Perturbation (ASLP) which learns a unique label perturbation level for each training image. |
Jiawei Liu; Changkun Ye; Shan Wang; Ruikai Cui; Jing Zhang; Kaihao Zhang; Nick Barnes; |
1912 | Boosting Multi-modal Model Performance with Adaptive Gradient Modulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we first propose an adaptive gradient modulation method that can boost the performance of multi-modal models with various fusion strategies. |
Hong Li; Xingyu Li; Pengbo Hu; Yinuo Lei; Chunxiao Li; Yi Zhou; |
1913 | Semantic Information in Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work investigates the functionality of Semantic information in Contrastive Learning (SemCL). |
Shengjiang Quan; Masahiro Hirano; Yuji Yamakawa; |
1914 | Structure and Content-Guided Video Synthesis with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a structure and content-guided video diffusion model that edits videos based on descriptions of the desired output. |
Patrick Esser; Johnathan Chiu; Parmida Atighehchian; Jonathan Granskog; Anastasis Germanidis; |
1915 | NeSS-ST: Detecting Good and Stable Keypoints with A Neural Stability Score and The Shi-Tomasi Detector Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Learning a feature point detector presents a challenge both due to the ambiguity of the definition of a keypoint and, correspondingly, the need for specially prepared ground truth labels for such points. In our work, we address both of these issues by utilizing a combination of a hand-crafted Shi-Tomasi detector, a specially designed metric that assesses the quality of keypoints, the stability score (SS), and a neural network. |
Konstantin Pakulev; Alexander Vakhitov; Gonzalo Ferrer; |
1916 | Beyond Skin Tone: A Multidimensional Measure of Apparent Skin Color Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Towards a more comprehensive measure of skin color, we introduce the hue angle ranging from red to yellow. |
William Thong; Przemyslaw Joniak; Alice Xiang; |
1917 | PODA: Prompt-driven Zero-shot Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the task of ‘Prompt-driven Zero-shot Domain Adaptation’, where we adapt a model trained on a source domain using only a general description in natural language of the target domain, i.e., a prompt. |
Mohammad Fahes; Tuan-Hung Vu; Andrei Bursuc; Patrick Pérez; Raoul de Charette; |
1918 | Video Action Segmentation Via Contextually Refined Temporal Keypoints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In specific, we develop a graph matching module that aggregates structural information between different temporal keypoints by learning the corresponding relationship of the temporal source graphs and the annotated target graphs. |
Borui Jiang; Yang Jin; Zhentao Tan; Yadong Mu; |
1919 | Shatter and Gather: Learning Referring Image Segmentation with Text Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, manual labeling of training data for this task is prohibitively costly, leading to lack of labeled data for training. We address this issue by a weakly supervised learning approach using text descriptions of training images as the only source of supervision. |
Dongwon Kim; Namyup Kim; Cuiling Lan; Suha Kwak; |
1920 | Two-in-One Depth: Bridging The Gap Between Monocular and Binocular Self-Supervised Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Two-in-One self-supervised depth estimation network, called TiO-Depth, which could not only compatibly handle the two tasks, but also improve the prediction accuracy. |
Zhengming Zhou; Qiulei Dong; |
1921 | SAFL-Net: Semantic-Agnostic Feature Learning Network with Auxiliary Plugins for Image Manipulation Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose SAFL-Net, which constrains a feature extractor to learn semantic-agnostic features by designing specific modules with corresponding auxiliary tasks. |
Zhihao Sun; Haoran Jiang; Danding Wang; Xirong Li; Juan Cao; |
1922 | DataDAM: Efficient Dataset Distillation with Attention Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite promising results, there still exists a significant performance gap between models trained on condensed synthetic sets and those trained on the whole dataset. In this paper, we address these challenges using efficient Dataset Distillation with Attention Matching (DataDAM), achieving state-of-the-art performance while reducing training costs. |
Ahmad Sajedi; Samir Khaki; Ehsan Amjadian; Lucy Z. Liu; Yuri A. Lawryshyn; Konstantinos N. Plataniotis; |
1923 | Rethinking Pose Estimation in Crowds: Overcoming The Detection Information Bottleneck and Ambiguity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, when individuals closely interact, top-down methods are ill-defined due to overlapping individuals, and bottom-up methods often falsely infer connections to distant body parts. Thus, we propose a novel pipeline called bottom-up conditioned top-down pose estimation (BUCTD) that combines the strengths of bottom-up and top-down methods. |
Mu Zhou; Lucas Stoffl; Mackenzie Weygandt Mathis; Alexander Mathis; |
1924 | Social Diffusion: Long-term Multiple Human Motion Anticipation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Social Diffusion, a novel method for short-term and long-term forecasting of the motion of multiple persons as well as their social interactions. |
Julian Tanke; Linguang Zhang; Amy Zhao; Chengcheng Tang; Yujun Cai; Lezi Wang; Po-Chen Wu; Juergen Gall; Cem Keskin; |
1925 | Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we forsake the conventional Siamese paradigm and propose a novel single-branch framework, SyncTrack, synchronizing feature extraction and matching to avoid forwarding the encoder twice for the template and search region, as well as introducing the extra parameters of a matching network. |
Teli Ma; Mengmeng Wang; Jimin Xiao; Huifeng Wu; Yong Liu; |
1926 | Leveraging Intrinsic Properties for Non-Rigid Garment Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve wrinkle-level as well as texture-level alignment, we present a novel coarse-to-fine two-stage method that leverages intrinsic manifold properties with two neural deformation fields, in the 3D space and the intrinsic space, respectively. |
Siyou Lin; Boyao Zhou; Zerong Zheng; Hongwen Zhang; Yebin Liu; |
1927 | NeILF++: Inter-Reflectable Light Fields for Geometry and Material Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel differentiable rendering framework for joint geometry, material, and lighting estimation from multi-view images. |
Jingyang Zhang; Yao Yao; Shiwei Li; Jingbo Liu; Tian Fang; David McKinnon; Yanghai Tsin; Long Quan; |
1928 | MAGI: Multi-Annotated Explanation-Guided Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, how to use multiple annotations to improve accuracy is particularly challenging due to the following: 1) The noisiness of annotations from different annotators; 2) The lack of pre-given information about the corresponding relationship between annotations and annotators; 3) Missing annotations since some images are not labeled by all annotators. To solve these challenges, we propose a Multi-annotated explanation-guided learning (MAGI) framework to do explanation supervision with comprehensive and high-quality generated annotations. |
Yifei Zhang; Siyi Gu; Yuyang Gao; Bo Pan; Xiaofeng Yang; Liang Zhao; |
1929 | Adaptive Positional Encoding for Bundle-Adjusting Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present adaptive positional encoding (APE) for bundle-adjusting neural radiance fields to reconstruct the neural radiance fields from unknown camera poses (or even intrinsics). |
Zelin Gao; Weichen Dai; Yu Zhang; |
1930 | Inducing Neural Collapse to A Fixed Hierarchy-Aware Frame for Reducing Mistake Severity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to fix the linear classifier of a deep neural network to a Hierarchy-Aware Frame (HAFrame), instead of an ETF, and use a cosine similarity-based auxiliary loss to learn hierarchy-aware penultimate features that collapse to the HAFrame. |
Tong Liang; Jim Davis; |
1931 | PlanarTrack: A Large-scale Challenging Benchmark for Planar Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite rapid progress, its further development, especially in the deep learning era, is largely hindered due to the lack of large-scale challenging benchmarks. Addressing this, we introduce PlanarTrack, a large-scale challenging planar tracking benchmark. |
Xinran Liu; Xiaoqiong Liu; Ziruo Yi; Xin Zhou; Thanh Le; Libo Zhang; Yan Huang; Qing Yang; Heng Fan; |
1932 | Factorized Inverse Path Tracing for Efficient and Accurate Material-Lighting Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our algorithm enables accurate material and lighting optimization faster than previous work, and is more effective at resolving ambiguities. |
Liwen Wu; Rui Zhu; Mustafa B. Yaldiz; Yinhao Zhu; Hong Cai; Janarbek Matai; Fatih Porikli; Tzu-Mao Li; Manmohan Chandraker; Ravi Ramamoorthi; |
1933 | P2C: Self-Supervised Point Cloud Completion from Single Partial Clouds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast to previous approaches, we present Partial2Complete (P2C), the first self-supervised framework that completes point cloud objects using training samples consisting of only a single incomplete point cloud per object. |
Ruikai Cui; Shi Qiu; Saeed Anwar; Jiawei Liu; Chaoyue Xing; Jing Zhang; Nick Barnes; |
1934 | Overwriting Pretrained Bias with Finetuning Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate bias when conceptualized as both spurious correlations between the target task and a sensitive attribute and underrepresentation of a particular group in the dataset. |
Angelina Wang; Olga Russakovsky; |
1935 | Anti-DreamBooth: Protecting Users from Personalized Text-to-image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, when misused, such a powerful and convenient tool can produce fake news or disturbing content targeting any individual victim, posing a severe negative social impact. In this paper, we explore a defense system called Anti-DreamBooth against such malicious use of DreamBooth. |
Thanh Van Le; Hao Phung; Thuan Hoang Nguyen; Quan Dao; Ngoc N. Tran; Anh Tran; |
1936 | Contrastive Continuity on Augmentation Stability Rehearsal for Continual Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to address catastrophic forgetting without overfitting on the rehearsal samples, we propose Augmentation Stability Rehearsal (ASR) in this paper, which selects the most representative and discriminative samples by estimating the augmentation stability for rehearsal. |
Haoyang Cheng; Haitao Wen; Xiaoliang Zhang; Heqian Qiu; Lanxiao Wang; Hongliang Li; |
1937 | Treating Pseudo-labels Generation As Image Matting for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a Mat-Label pipeline that provides a fresh way to treat WSSS pseudo-labels generation as an image matting task. |
Changwei Wang; Rongtao Xu; Shibiao Xu; Weiliang Meng; Xiaopeng Zhang; |
1938 | Structural Alignment for Network Pruning Through Partial Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel channel pruning method to reduce the computational and storage costs of Convolutional Neural Networks (CNNs). |
Shangqian Gao; Zeyu Zhang; Yanfu Zhang; Feihu Huang; Heng Huang; |
1939 | Learning Long-Range Information with Dual-Scale Transformers for Indoor Scene Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate the problem, we propose a novel Dual-Scale Transformer Network (DST-Net) that efficiently utilizes both long-range and short-range spatial context information to improve the quality of 3D scene completion. |
Ziqi Wang; Fei Luo; Xiaoxiao Long; Wenxiao Zhang; Chunxia Xiao; |
1940 | A Game of Bundle Adjustment – Learning Efficient Convergence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This can take many iterations, making the process computationally expensive, which can be harmful to real-time applications. We propose to replace this heuristic by viewing the problem in a holistic manner, as a game, and formulating it as a reinforcement-learning task. |
Amir Belder; Refael Vivanti; Ayellet Tal; |
1941 | Learning Correction Filter Via Degradation-Adaptive Regression for Blind Single Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the problem, we propose an innovative unsupervised method of Learning Correction Filter via Degradation-Adaptive Regression for Blind Single Image Super-Resolution. |
Hongyang Zhou; Xiaobin Zhu; Jianqing Zhu; Zheng Han; Shi-Xue Zhang; Jingyan Qin; Xu-Cheng Yin; |
1942 | UMFuse: Unified Multi View Fusion for Human Editing Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the utilization of multiple views to minimize the issue of missing information and generate an accurate representation of the underlying human model. |
Rishabh Jain; Mayur Hemani; Duygu Ceylan; Krishna Kumar Singh; Jingwan Lu; Mausoom Sarkar; Balaji Krishnamurthy; |
1943 | CROSSFIRE: Camera Relocalization On Self-Supervised Features from An Implicit Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Beyond novel view synthesis, Neural Radiance Fields are useful for applications that interact with the real world. In this paper, we use them as an implicit map of a given scene and propose a camera relocalization algorithm tailored for this representation. |
Arthur Moreau; Nathan Piasco; Moussab Bennehar; Dzmitry Tsishkou; Bogdan Stanciulescu; Arnaud de La Fortelle; |
1944 | Discriminative Class Tokens for Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text while achieving high accuracy through discriminative signals from a pretrained classifier. |
Idan Schwartz; Vésteinn Snæbjarnarson; Hila Chefer; Serge Belongie; Lior Wolf; Sagie Benaim; |
1945 | SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our goal is to synthesize 3D human motions given textual inputs describing simultaneous actions, for example 'waving hand' while 'walking' at the same time. |
Nikos Athanasiou; Mathis Petrovich; Michael J. Black; Gül Varol; |
1946 | ORC: Network Group-based Knowledge Distillation Using Online Role Change Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, sometimes their improvements are not as good as expected because some immature teachers may transfer false knowledge to the student. In this paper, to overcome this limitation and exploit the efficacy of multiple networks, we divide the networks into teacher and student groups. |
Junyong Choi; Hyeon Cho; Seokhwa Cheung; Wonjun Hwang; |
1947 | Audiovisual Masked Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Can we leverage the audiovisual information already present in video to improve self-supervised representation learning? To answer this question, we study various pretraining architectures and objectives within the masked autoencoding framework, motivated by the success of similar methods in natural language and image understanding. |
Mariana-Iuliana Georgescu; Eduardo Fonseca; Radu Tudor Ionescu; Mario Lucic; Cordelia Schmid; Anurag Arnab; |
1948 | MV-DeepSDF: Implicit Modeling with Multi-Sweep Point Clouds for 3D Vehicle Reconstruction in Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel framework, dubbed MV-DeepSDF, which estimates the optimal Signed Distance Function (SDF) shape representation from multi-sweep point clouds to reconstruct vehicles in the wild. |
Yibo Liu; Kelly Zhu; Guile Wu; Yuan Ren; Bingbing Liu; Yang Liu; Jinjun Shan; |
1949 | CHORD: Category-level Hand-held Object Reconstruction Via Shape Deformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This can be attributed to the fact that humans have mastered the shape prior of the ‘mug’ category, and can quickly establish the corresponding relations between different mug instances and the prior, such as where the rim and handle are located. In light of this, we propose a new method, CHORD, for Category-level Hand-held Object Reconstruction via shape Deformation. |
Kailin Li; Lixin Yang; Haoyu Zhen; Zenan Lin; Xinyu Zhan; Licheng Zhong; Jian Xu; Kejian Wu; Cewu Lu; |
1950 | Unmasking Anomalies in Road-Scene Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a paradigm change by shifting from a per-pixel classification to a mask classification. |
Shyam Nandan Rai; Fabio Cermelli; Dario Fontanel; Carlo Masone; Barbara Caputo; |
1951 | DomainDrop: Suppressing Domain-Sensitive Channels for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel approach for domain generalization from a novel perspective of enhancing the robustness of channels in feature maps to domain shifts. |
Jintao Guo; Lei Qi; Yinghuan Shi; |
1952 | Towards Universal LiDAR-Based 3D Object Detection By Multi-Domain Knowledge Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To explore informative knowledge across domains towards a universal 3D object detector, we propose a multi-domain knowledge transfer framework with universal feature transformation. |
Guile Wu; Tongtong Cao; Bingbing Liu; Xingxin Chen; Yuan Ren; |
1953 | StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel motion generator design that uses a learning-based inversion network for GAN. |
Yuhan Wang; Liming Jiang; Chen Change Loy; |
1954 | Self-Calibrated Cross Attention Network for Few-Shot Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Besides, as both query FG and BG are combined with support FG, they get entangled, thereby leading to ineffective segmentation. To cope with these issues, we design a self-calibrated cross attention (SCCA) block. |
Qianxiong Xu; Wenting Zhao; Guosheng Lin; Cheng Long; |
1955 | Anatomical Invariance Modeling and Semantic Alignment for Self-supervised Learning in 3D Medical Image Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new self-supervised learning framework, namely Alice, that explicitly fulfills Anatomical invariance modeling and semantic alignment via elaborately combining discriminative and generative objectives. |
Yankai Jiang; Mingze Sun; Heng Guo; Xiaoyu Bai; Ke Yan; Le Lu; Minfeng Xu; |
1956 | Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using Only Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a text-guided 3D face generation method, referred to as TG-3DFace, for generating realistic 3D faces using text guidance. |
Cuican Yu; Guansong Lu; Yihan Zeng; Jian Sun; Xiaodan Liang; Huibin Li; Zongben Xu; Songcen Xu; Wei Zhang; Hang Xu; |
1957 | SSDA: Secure Source-Free Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our investigation of the current SFDA setting reveals that because of the unique challenges present in SFDA (e.g., no source data or target labels), defending against backdoor attacks using existing defenses becomes practically ineffective in protecting the target model. To address this, we propose a novel target domain protection scheme called secure source-free domain adaptation (SSDA). |
Sabbir Ahmed; Abdullah Al Arafat; Mamshad Nayeem Rizve; Rahim Hossain; Zhishan Guo; Adnan Siraj Rakin; |
1958 | ENTL: Embodied Navigation Trajectory Learner Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Embodied Navigation Trajectory Learner (ENTL), a method for extracting long sequence representations for embodied navigation. |
Klemen Kotar; Aaron Walsman; Roozbeh Mottaghi; |
1959 | AGG-Net: Attention Guided Gated-Convolutional Network for Depth Image Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new model for depth image completion based on the Attention Guided Gated-convolutional Network (AGG-Net), through which more accurate and reliable depth images can be obtained based on the raw depth maps and the corresponding RGB images. |
Dongyue Chen; Tingxuan Huang; Zhimin Song; Shizhuo Deng; Tong Jia; |
1960 | Learning Global-aware Kernel for Image Harmonization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, they still show a limited performance across varied foreground objects and scenes. To address this issue, we propose a novel Global-aware Kernel Network (GKNet) to harmonize local regions with comprehensive consideration of long-distance background references. |
Xintian Shen; Jiangning Zhang; Jun Chen; Shipeng Bai; Yue Han; Yabiao Wang; Chengjie Wang; Yong Liu; |
1961 | Real-Time Neural Rasterization for Large Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new method for realistic real-time novel-view synthesis (NVS) of large scenes. |
Jeffrey Yunfan Liu; Yun Chen; Ze Yang; Jingkang Wang; Sivabalan Manivasagam; Raquel Urtasun; |
1962 | ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we argue that the explicit synergy considering distinct characteristics of text detection and recognition can significantly improve the performance of text spotting. |
Mingxin Huang; Jiaxin Zhang; Dezhi Peng; Hao Lu; Can Huang; Yuliang Liu; Xiang Bai; Lianwen Jin; |
1963 | UGC: Unified GAN Compression for Efficient Image-to-Image Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To combine the best of both worlds, we propose a new learning paradigm, Unified GAN Compression (UGC), with a unified optimization objective to seamlessly prompt the synergy of model-efficient and label-efficient learning. |
Yuxi Ren; Jie Wu; Peng Zhang; Manlin Zhang; Xuefeng Xiao; Qian He; Rui Wang; Min Zheng; Xin Pan; |
1964 | Efficient View Synthesis with Neural Radiance Distribution Field Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new representation called Neural Radiance Distribution Field (NeRDF) that targets efficient view synthesis in real-time. |
Yushuang Wu; Xiao Li; Jinglu Wang; Xiaoguang Han; Shuguang Cui; Yan Lu; |
1965 | MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nonetheless, visual speech is not as distinguishable as audio speech, making it difficult to develop a mapping from source speech phonemes to the target language text. To address this issue, we propose MixSpeech, a cross-modality self-learning framework that utilizes audio speech to regularize the training of visual speech tasks. |
Xize Cheng; Tao Jin; Rongjie Huang; Linjun Li; Wang Lin; Zehan Wang; Ye Wang; Huadai Liu; Aoxiong Yin; Zhou Zhao; |
1966 | Chordal Averaging on Flag Manifolds and Its Applications Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a new, provably-convergent algorithm for computing the flag-mean and flag-median of a set of points on a flag manifold under the chordal metric. |
Nathan Mankovich; Tolga Birdal; |
1967 | Towards Building More Robust Models with Frequency Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a plug-and-play module called the Frequency Preference Control Module that adaptively reconfigures the low- and high-frequency components of intermediate feature representations, providing better utilization of frequency in robust learning. |
Qingwen Bu; Dong Huang; Heming Cui; |
1968 | SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the other hand, sparse detectors follow a query-based paradigm without explicit dense BEV feature construction, but achieve worse performance than their dense counterparts. In this paper, we find that the key to mitigating this performance gap is the adaptability of the detector in both BEV and image space. |
Haisong Liu; Yao Teng; Tao Lu; Haiguang Wang; Limin Wang; |
1969 | Boosting Whole Slide Image Classification from The Perspectives of Distribution, Correlation and Magnification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there are still three important issues that have not been fully addressed: (1) positive bags with a low positive instance ratio are prone to the influence of a large number of negative instances; (2) the correlation between local and global features of pathology images has not been fully modeled; and (3) there is a lack of effective information interaction between different magnifications. In this paper, we propose MILBooster, a powerful dual-scale multi-stage MIL framework to address these issues from the perspectives of distribution, correlation, and magnification. |
Linhao Qu; Zhiwei Yang; Minghong Duan; Yingfan Ma; Shuo Wang; Manning Wang; Zhijian Song; |
1970 | PolicyCleanse: Backdoor Detection and Mitigation for Competitive Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To ensure the security of RL agents against malicious backdoors, in this work, we propose the problem of Backdoor Detection in multi-agent RL systems, with the objective of detecting Trojan agents as well as the corresponding potential trigger actions, and further trying to mitigate their bad impact. |
Junfeng Guo; Ang Li; Lixu Wang; Cong Liu; |
1971 | Ref-NeuS: Ambiguity-Reduced Neural Implicit Surface Learning for Multi-View Reconstruction with Reflection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current methods often fail to accurately reconstruct reflective surfaces, leading to severe ambiguity. To overcome this issue, we propose Ref-NeuS, which aims to reduce ambiguity by attenuating the effect of reflective surfaces. |
Wenhang Ge; Tao Hu; Haoyu Zhao; Shu Liu; Ying-Cong Chen; |
1972 | Innovating Real Fisheye Image Correction with Dual Diffusion Architecture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Fisheye image rectification is hindered by synthetic models producing poor results for real-world correction. To address this, we propose a Dual Diffusion Architecture (DDA) for fisheye rectification that offers better practicality. |
Shangrong Yang; Chunyu Lin; Kang Liao; Yao Zhao; |
1973 | Global Perception Based Autoregressive Neural Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an autoregressive framework for NPs that exploits their autoregressive properties. |
Jinyang Tai; |
1974 | Class-incremental Continual Learning for Instance Segmentation with Image-level Weak Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a continual-learning method to segment object instances from image-level labels. |
Yu-Hsing Hsieh; Guan-Sheng Chen; Shun-Xian Cai; Ting-Yun Wei; Huei-Fang Yang; Chu-Song Chen; |
1975 | When Prompt-based Incremental Learning Does Not Meet Strong Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop a learnable Adaptive Prompt Generator (APG). |
Yu-Ming Tang; Yi-Xing Peng; Wei-Shi Zheng; |
1976 | Multimodal High-order Relation Transformer for Scene Boundary Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite significant advancements in this area, this task remains a challenging problem as it requires a comprehensive understanding of multimodal cues and high-level semantics. To tackle this issue, we propose a multimodal high-order relation transformer, which integrates a high-order encoder and an adaptive decoder in a unified framework. |
Xi Wei; Zhangxiang Shi; Tianzhu Zhang; Xiaoyuan Yu; Lei Xiao; |
1977 | Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To cope with the novel Tri-Mip representation, we propose a cone-casting rendering technique to efficiently sample anti-aliased 3D features with the Tri-Mip encoding considering both pixel imaging and observing distance. |
Wenbo Hu; Yuling Wang; Lin Ma; Bangbang Yang; Lin Gao; Xiao Liu; Yuewen Ma; |
1978 | LaRS: A Diverse Panoptic Maritime Obstacle Detection Dataset and Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our major contribution is the new dataset, which boasts the largest diversity in recording locations, scene types, obstacle classes, and acquisition conditions among the related datasets. |
Lojze Žust; Janez Perš; Matej Kristan; |
1979 | Exploring Transformers for Open-world Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we utilize the Transformer for open-world instance segmentation and present SWORD. |
Jiannan Wu; Yi Jiang; Bin Yan; Huchuan Lu; Zehuan Yuan; Ping Luo; |
1980 | VQA Therapy: Exploring Answer Differences By Visually Grounding Answers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Given that different people can provide different answers to a visual question, we aim to better understand why with answer groundings. |
Chongyan Chen; Samreen Anjum; Danna Gurari; |
1981 | Energy-based Self-Training and Normalization for Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an Unsupervised Domain Adaptation (UDA) method by making use of Energy-Based Learning (EBL) and demonstrate 1. |
Samitha Herath; Basura Fernando; Ehsan Abbasnejad; Munawar Hayat; Shahram Khadivi; Mehrtash Harandi; Hamid Rezatofighi; Gholamreza Haffari; |
1982 | Self-Evolved Dynamic Expansion Model for Task-Free Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel and effective framework for TFCL, which dynamically expands the architecture of a DEM model through a self-assessment mechanism evaluating the diversity of knowledge among existing experts as expansion signals. |
Fei Ye; Adrian G. Bors; |
1983 | Adaptive Template Transformer for Mitochondria Segmentation in Electron Microscopy Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing methods struggle to adapt to different scales and appearances of the input due to the inherent limitations of the traditional CNN architecture. To mitigate these limitations, we propose a novel adaptive template transformer (ATFormer) for mitochondria segmentation. |
Yuwen Pan; Naisong Luo; Rui Sun; Meng Meng; Tianzhu Zhang; Zhiwei Xiong; Yongdong Zhang; |
1984 | BEVBert: Multimodal Map Pre-training for Language-guided Navigation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, we propose a new map-based pre-training paradigm that is spatial-aware for use in VLN. |
Dong An; Yuankai Qi; Yangguang Li; Yan Huang; Liang Wang; Tieniu Tan; Jing Shao; |
1985 | Collaborative Tracking Learning for Frame-Rate-Insensitive Multi-Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose to explore collaborative tracking learning (CoTracker) for frame-rate-insensitive MOT in a query-based end-to-end manner. |
Yiheng Liu; Junta Wu; Yi Fu; |
1986 | Tangent Model Composition for Ensembling and Continual Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Tangent Model Composition (TMC) is a method to combine component models independently fine-tuned around a pre-trained point. |
Tian Yu Liu; Stefano Soatto; |
1987 | Knowledge-Spreader: Learning Semi-Supervised Facial Action Dynamics By Consistifying Knowledge Granularity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, the reliance on offline design and excessive parameters hinders the efficiency of the learning process. To remedy these issues, we propose a lightweight, online semi-supervised framework, called Knowledge-Spreader (KS), to learn AU dynamics with sparse annotations. |
Xiaotian Li; Xiang Zhang; Taoyue Wang; Lijun Yin; |
1988 | SSF: Accelerating Training of Spiking Neural Networks with Stabilized Spiking Flow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formally analyze the backward process of classic SG and find that the membrane accumulation through time leads to exponential growth of training time. |
Jingtao Wang; Zengjie Song; Yuxi Wang; Jun Xiao; Yuran Yang; Shuqi Mei; Zhaoxiang Zhang; |
1989 | Manipulate By Seeing: Creating Manipulation Controllers from Pre-Trained Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a scalable alternative where the visual representations can help directly infer robot actions. |
Jianren Wang; Sudeep Dasari; Mohan Kumar Srirama; Shubham Tulsiani; Abhinav Gupta; |
1990 | Learning Human-Human Interactions in Images from Weak Textual Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new paradigm of learning human-human interactions as free text from a single still image, allowing for flexibility in modeling the unlimited space of situations and relationships between people. |
Morris Alper; Hadar Averbuch-Elor; |
1991 | Prompt-aligned Gradient for Prompt Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Prompt-aligned Gradient, dubbed ProGrad, to prevent prompt tuning from forgetting the general knowledge learned from VLMs. |
Beier Zhu; Yulei Niu; Yucheng Han; Yue Wu; Hanwang Zhang; |
1992 | Aperture Diffraction for Compact Snapshot Spectral Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate a compact, cost-effective snapshot spectral imaging system named Aperture Diffraction Imaging Spectrometer (ADIS), which consists only of an imaging lens with an ultra-thin orthogonal aperture mask and a mosaic filter sensor, requiring no additional physical footprint compared to common RGB cameras. |
Tao Lv; Hao Ye; Quan Yuan; Zhan Shi; Yibo Wang; Shuming Wang; Xun Cao; |
1993 | Diffusion Action Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous works on this task commonly adopt an iterative refinement paradigm by using multi-stage models. We propose a novel framework via denoising diffusion models, which nonetheless shares the same inherent spirit of such iterative refinement. |
Daochang Liu; Qiyue Li; Anh-Dung Dinh; Tingting Jiang; Mubarak Shah; Chang Xu; |
1994 | Prototype Reminiscence and Augmented Asymmetric Knowledge Aggregation for Non-Exemplar Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, since the model continuously learns new knowledge, the stored prototypical representations cannot correctly model the properties of old classes in the existence of knowledge updates. To address this problem, we propose a novel prototype reminiscence mechanism that incorporates the previous class prototypes with arriving new class features to dynamically reshape old class feature distributions thus preserving the decision boundaries of previous tasks. |
Wuxuan Shi; Mang Ye; |
1995 | Exemplar-Free Continual Transformer with Convolutions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a new exemplar-free approach for class/task incremental learning called ConTraCon, which does not require task-id to be explicitly present during inference and avoids the need for storing previous training instances. |
Anurag Roy; Vinay K. Verma; Sravan Voonna; Kripabandhu Ghosh; Saptarshi Ghosh; Abir Das; |
1996 | Scalable Video Object Segmentation with Simplified Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the above hand-crafted designs empirically cause insufficient target interaction, thus limiting the dynamic target-aware feature learning in VOS. To tackle these limitations, this paper presents a scalable Simplified VOS (SimVOS) framework to perform joint feature extraction and matching by leveraging a single transformer backbone. |
Qiangqiang Wu; Tianyu Yang; Wei Wu; Antoni B. Chan; |
1997 | Rehearsal-Free Domain Continual Face Anti-Spoofing: Generalize More and Forget Less Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first rehearsal-free method for Domain Continual Learning (DCL) of FAS, which deals with catastrophic forgetting and unseen domain generalization problems simultaneously. |
Rizhao Cai; Yawen Cui; Zhi Li; Zitong Yu; Haoliang Li; Yongjian Hu; Alex Kot; |
1998 | Efficient Decision-based Black-box Patch Attacks on Video Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve a query-efficient attack, we propose a spatial-temporal differential evolution (STDE) framework. |
Kaixun Jiang; Zhaoyu Chen; Hao Huang; Jiafeng Wang; Dingkang Yang; Bo Li; Yan Wang; Wenqiang Zhang; |
1999 | Kick Back & Relax: Learning to Reconstruct The World By Watching SlowTV Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, existing approaches limit themselves to the automotive domain, resulting in models incapable of generalizing to complex environments such as natural or indoor settings. To address this, we propose a large-scale SlowTV dataset curated from YouTube, containing an order of magnitude more data than existing automotive datasets. |
Jaime Spencer; Chris Russell; Simon Hadfield; Richard Bowden; |
2000 | MetaGCD: Learning to Continually Learn in Generalized Category Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider a real-world scenario where a model that is trained on pre-defined classes continually encounters unlabeled data that contains both known and novel classes. |
Yanan Wu; Zhixiang Chi; Yang Wang; Songhe Feng; |
2001 | Strip-MLP: Efficient Token Interaction for Vision MLP Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the power of token interaction on the spatial dimension is highly dependent on the spatial resolution of the feature maps, which limits the model’s expressive ability, especially in deep layers where the features are down-sampled to a small spatial size. To address this issue, we present a novel method called Strip-MLP to enrich the token interaction power in three ways. |
Guiping Cao; Shengda Luo; Wenjian Huang; Xiangyuan Lan; Dongmei Jiang; Yaowei Wang; Jianguo Zhang; |
2002 | SAFARI: Versatile and Efficient Evaluations for Robustness of Interpretability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we identify several challenges that the state-of-the-art is unable to cope with collectively: i) existing metrics are not comprehensive; ii) XAI techniques are highly heterogeneous; iii) misinterpretations are normally rare events. |
Wei Huang; Xingyu Zhao; Gaojie Jin; Xiaowei Huang; |
2003 | ChildPlay: A New Benchmark for Understanding Children’s Gaze Behaviour Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the first study for predicting the gaze target of children and interacting adults. |
Samy Tafasca; Anshul Gupta; Jean-Marc Odobez; |
2004 | Towards General Low-Light Raw Noise Synthesis and Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although most recent works have adopted physics-based models to synthesize noise, the signal-independent noise in low-light conditions is far more complicated and varies dramatically across camera sensors, which is beyond the description of these models. To address this issue, we introduce a new perspective to synthesize the signal-independent noise by a generative model. |
Feng Zhang; Bin Xu; Zhiqiang Li; Xinran Liu; Qingbo Lu; Changxin Gao; Nong Sang; |
2005 | Combating Noisy Labels with Sample Selection By Mining High-Discrepancy Examples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, to address the issue, we propose a simple yet effective method called CoDis. |
Xiaobo Xia; Bo Han; Yibing Zhan; Jun Yu; Mingming Gong; Chen Gong; Tongliang Liu; |
2006 | Beyond The Pixel: A Photometrically Calibrated HDR Dataset for Luminance and Color Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most computer vision tasks treat pixels without considering their relationship to physical luminance. To address this shortcoming, we introduce the Laval Photometric Indoor HDR Dataset, the first large-scale photometrically calibrated dataset of high dynamic range 360° panoramas. |
Christophe Bolduc; Justine Giroux; Marc Hébert; Claude Demers; Jean-François Lalonde; |
2007 | What Can Discriminator Do? Towards Box-free Ownership Verification of Generative Adversarial Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, in this paper, we propose a novel IP protection scheme for GANs where ownership verification can be done by checking outputs only, without choosing the inputs (i.e., box-free setting). |
Ziheng Huang; Boheng Li; Yan Cai; Run Wang; Shangwei Guo; Liming Fang; Jing Chen; Lina Wang; |
2008 | When Noisy Labels Meet Long Tail Dilemmas: A Representation Calibration Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, to handle the problem and address the limitations of prior works, we propose a representation calibration method RCAL. |
Manyi Zhang; Xuyang Zhao; Jun Yao; Chun Yuan; Weiran Huang; |
2009 | Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Dataset Reinforcement, a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users. |
Fartash Faghri; Hadi Pouransari; Sachin Mehta; Mehrdad Farajtabar; Ali Farhadi; Mohammad Rastegari; Oncel Tuzel; |
2010 | An Adaptive Model Ensemble Adversarial Attack for Boosting Adversarial Transferability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an adaptive ensemble attack, dubbed AdaEA, to adaptively control the fusion of the outputs from each model, via monitoring the discrepancy ratio of their contributions towards the adversarial objective. |
Bin Chen; Jiali Yin; Shukai Chen; Bohao Chen; Ximeng Liu; |
2011 | Incremental Generalized Category Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new model for IGCD which combines non-parametric categorization with efficient image sampling to mitigate catastrophic forgetting. |
Bingchen Zhao; Oisin Mac Aodha; |
2012 | Prototypical Mixing and Retrieval-Based Refinement for Label Noise-Resistant Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach called Prototypical Mixing and Retrieval-based Refinement (TITAN) for label noise-resistant image retrieval, which corrects label noise and mitigates the effects of the memorization simultaneously. |
Xinlong Yang; Haixin Wang; Jinan Sun; Shikun Zhang; Chong Chen; Xian-Sheng Hua; Xiao Luo; |
2013 | AccFlow: Backward Accumulation for Long-Range Optical Flow Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a novel recurrent framework called AccFlow, which recursively accumulates local flows backward using a deformable module called AccPlus. |
Guangyang Wu; Xiaohong Liu; Kunming Luo; Xi Liu; Qingqing Zheng; Shuaicheng Liu; Xinyang Jiang; Guangtao Zhai; Wenyi Wang; |
2014 | Guiding Local Feature Matching with Surface Curvature Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new method, named curvature similarity extractor (CSE), for improving local feature matching across images. |
Shuzhe Wang; Juho Kannala; Marc Pollefeys; Daniel Barath; |
2015 | 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose 3D-VisTA, a pre-trained Transformer for 3D Vision and Text Alignment that can be easily adapted to various downstream tasks. |
Ziyu Zhu; Xiaojian Ma; Yixin Chen; Zhidong Deng; Siyuan Huang; Qing Li; |
2016 | Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth Approach with Saddle-shaped Depth Cells Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we demonstrate that different depth geometries have significant performance gaps, even using the same depth prediction error. |
Xinyi Ye; Weiyue Zhao; Tianqi Liu; Zihao Huang; Zhiguo Cao; Xin Li; |
2017 | SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe that existing methods suffer at higher levels of sparsity in the data due to noisy pseudo-labels. To prevent this, we propose an end-to-end system that learns to separate the proposals into labeled and unlabeled regions using Pseudo-positive mining. |
Saksham Suri; Saketh Rambhatla; Rama Chellappa; Abhinav Shrivastava; |
2018 | Among Us: Adversarially Robust Collaborative Perception By Consensus Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Differently, we propose ROBOSAC, a novel sampling-based defense strategy generalizable to unseen attackers. |
Yiming Li; Qi Fang; Jiamu Bai; Siheng Chen; Felix Juefei-Xu; Chen Feng; |
2019 | BUS: Efficient and Effective Vision-Language Pre-Training with Bottom-Up Patch Summarization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Bottom-Up Patch Summarization approach named BUS which is inspired by the Document Summarization Task in NLP to learn a concise visual summary of lengthy visual token sequences, guided by textual semantics. |
Chaoya Jiang; Haiyang Xu; Wei Ye; Qinghao Ye; Chenliang Li; Ming Yan; Bin Bi; Shikun Zhang; Fei Huang; Songfang Huang; |
2020 | DiffusionDet: Diffusion Model for Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose DiffusionDet, a new framework that formulates object detection as a denoising diffusion process from noisy boxes to object boxes. |
Shoufa Chen; Peize Sun; Yibing Song; Ping Luo; |
2021 | Forward Flow for Novel View Synthesis of Dynamic Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a neural radiance field (NeRF) approach for novel view synthesis of dynamic scenes using forward warping. |
Xiang Guo; Jiadai Sun; Yuchao Dai; Guanying Chen; Xiaoqing Ye; Xiao Tan; Errui Ding; Yumeng Zhang; Jingdong Wang; |
2022 | CopyRNeRF: Protecting The CopyRight of Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, by analyzing the pros and cons of possible copyright protection solutions, we propose to protect the copyright of NeRF models by replacing the original color representation in NeRF with a watermarked color representation. |
Ziyuan Luo; Qing Guo; Ka Chun Cheung; Simon See; Renjie Wan; |
2023 | Contrastive Model Adaptation for Cross-Condition Robustness in Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate normal-to-adverse condition model adaptation for semantic segmentation, whereby image-level correspondences are available in the target domain. |
David Brüggemann; Christos Sakaridis; Tim Broedermann; Luc Van Gool; |
2024 | SegRCDB: Semantic Segmentation Via Formula-Driven Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Segmentation Radial Contour DataBase (SegRCDB), which, for the first time, applies formula-driven supervised learning to semantic segmentation. |
Risa Shinoda; Ryo Hayamizu; Kodai Nakashima; Nakamasa Inoue; Rio Yokota; Hirokatsu Kataoka; |
2025 | Creative Birds: Self-Supervised Single-View 3D Style Transfer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel method for single-view 3D style transfer that generates a unique 3D object with both shape and texture transfer. |
Renke Wang; Guimin Que; Shuo Chen; Xiang Li; Jun Li; Jian Yang; |
2026 | LoTE-Animal: A Long Time-span Dataset for Endangered Animal Behavior Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To break the deadlock, we present LoTE-Animal, a large-scale endangered animal dataset collected over 12 years, to foster the application of deep learning in rare species conservation. |
Dan Liu; Jin Hou; Shaoli Huang; Jing Liu; Yuxin He; Bochuan Zheng; Jifeng Ning; Jingdong Zhang; |
2027 | DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the problem of semi-supervised 3D object detection, which is of great importance considering the high annotation cost for cluttered 3D indoor scenes. |
Huan-ang Gao; Beiwen Tian; Pengfei Li; Hao Zhao; Guyue Zhou; |
2028 | Towards Inadequately Pre-trained Models in Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the contradictory phenomenon between feature extraction (FE) and fine-tuning (FT), namely that a better feature extractor does not necessarily fine-tune better, we conduct comprehensive analyses of the features before the softmax layer to provide insightful explanations. |
Andong Deng; Xingjian Li; Di Hu; Tianyang Wang; Haoyi Xiong; Cheng-Zhong Xu; |
2029 | Boosting Novel Category Discovery Over Domains with Soft Contrastive Learning and All in One Classifier Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we found that domain variance can lead to more significant view-noise in unsupervised data augmentation, which affects the effectiveness of contrastive learning (CL) and causes the model to be overconfident in novel category discovery. To address these issues, a framework named Soft-contrastive All-in-one Network (SAN) is proposed for ODA and UNDA tasks. |
Zelin Zang; Lei Shang; Senqiao Yang; Fei Wang; Baigui Sun; Xuansong Xie; Stan Z. Li; |
2030 | Class-Aware Patch Embedding Adaptation for Few-Shot Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This could significantly reduce the efficiency of a large family of few-shot learning algorithms, which have limited data and rely heavily on the comparison of image patches. To address this issue, we propose a Class-aware Patch Embedding Adaptation (CPEA) method to learn "class-aware embeddings" of the image patches. |
Fusheng Hao; Fengxiang He; Liu Liu; Fuxiang Wu; Dacheng Tao; Jun Cheng; |
2031 | SegPrompt: Boosting Open-World Segmentation Via Category-Level Prompt Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel training mechanism called SegPrompt that utilizes category information to improve the model’s class-agnostic segmentation ability for both known and unknown categories. |
Muzhi Zhu; Hengtao Li; Hao Chen; Chengxiang Fan; Weian Mao; Chenchen Jing; Yifan Liu; Chunhua Shen; |
2032 | Search for or Navigate To? Dual Adaptive Thinking for Object Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent object navigation methods consider using object association mostly to enhance the "search for" phase while neglecting the importance of the "navigate to" phase. Therefore, this paper proposes a dual adaptive thinking (DAT) method that flexibly adjusts thinking strategies in different navigation stages. |
Ronghao Dang; Liuyi Wang; Zongtao He; Shuai Su; Jiagui Tang; Chengju Liu; Qijun Chen; |
2033 | CL-MVSNet: Unsupervised Multi-View Stereo with Dual-Level Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, previous methods primarily depend on the photometric consistency assumption, which may suffer from two limitations: indistinguishable regions and view-dependent effects, e.g., low-textured areas and reflections. To address these issues, in this paper, we propose a new dual-level contrastive learning approach, named CL-MVSNet. |
Kaiqiang Xiong; Rui Peng; Zhe Zhang; Tianxing Feng; Jianbo Jiao; Feng Gao; Ronggang Wang; |
2034 | Federated Learning Over Images: Vertical Decompositions and Pre-Trained Backbones Are Difficult to Beat Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Overall, across a wide variety of settings, we find that vertically decomposing a neural network seems to give the best results, outperforming more standard reconciliation-based methods. |
Erdong Hu; Yuxin Tang; Anastasios Kyrillidis; Chris Jermaine; |
2035 | HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from A Single Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce HOSNeRF, a novel 360° free-viewpoint rendering method that reconstructs neural radiance fields for dynamic human-object-scene from a single monocular in-the-wild video. |
Jia-Wei Liu; Yan-Pei Cao; Tianyuan Yang; Zhongcong Xu; Jussi Keppo; Ying Shan; Xiaohu Qie; Mike Zheng Shou; |
2036 | OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel deep learning-based approach, called OmniZoomer, to incorporate the Möbius transformation into the network for movement and zoom on ODIs. |
Zidong Cao; Hao Ai; Yan-Pei Cao; Ying Shan; Xiaohu Qie; Lin Wang; |
2037 | Knowing Where to Focus: Event-aware Transformer for Video Grounding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we formulate an event-aware dynamic moment query to enable the model to take the input-specific content and positional information of the video into account. |
Jinhyun Jang; Jungin Park; Jin Kim; Hyeongjun Kwon; Kwanghoon Sohn; |
2038 | TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose TF-ICON, a novel Training-Free Image COmpositioN framework that harnesses the power of text-driven diffusion models for cross-domain image-guided composition. |
Shilin Lu; Yanzhu Liu; Adams Wai-Kin Kong; |
2039 | Landscape Learning for Neural Network Inversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a method that learns a loss landscape where gradient descent is efficient, bringing massive improvement and acceleration to the inversion process. |
Ruoshi Liu; Chengzhi Mao; Purva Tendulkar; Hao Wang; Carl Vondrick; |
2040 | Movement Enhancement Toward Multi-Scale Video Feature Representation for Temporal Action Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we first design a Movement Enhance Module (MEM) to highlight movement features for better action localization, and then we propose a Scale Feature Pyramid Network (SFPN) to detect multi-scale actions in videos. |
Zixuan Zhao; Dongqi Wang; Xu Zhao; |
2041 | Collaborative Propagation on Multiple Instance Graphs for 3D Instance Segmentation with Single-point Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this issue, we propose a novel weakly supervised method RWSeg that only requires labeling one object with one point. With these sparse weak labels, we introduce a unified framework with two branches to propagate semantic and instance information respectively to unknown regions using self-attention and a cross-graph random walk method. |
Shichao Dong; Ruibo Li; Jiacheng Wei; Fayao Liu; Guosheng Lin; |
2042 | PPR: Physically Plausible Reconstruction from Monocular Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given monocular videos, we build 3D models of articulated objects and environments whose 3D configurations satisfy dynamics and contact constraints. |
Gengshan Yang; Shuo Yang; John Z. Zhang; Zachary Manchester; Deva Ramanan; |
2043 | Single Image Deblurring with Row-dependent Blur Magnitude Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore a novel in-between exposure mode, called global reset release (GRR) shutter, which produces GS-like blur but with row-dependent blur magnitude. |
Xiang Ji; Zhixiang Wang; Shin’ichi Satoh; Yinqiang Zheng; |
2044 | Robust Heterogeneous Federated Learning Under Data Corruption Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We design a novel method named Augmented Heterogeneous Federated Learning (AugHFL), which consists of two stages: 1) In the local update stage, a corruption-robust data augmentation strategy is adopted to minimize the adverse effects of local corruption while enabling the models to learn rich local knowledge. |
Xiuwen Fang; Mang Ye; Xiyuan Yang; |
2045 | RMP-Loss: Regularizing Membrane Potential Distribution for Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the quantization error problem, we propose a regularizing membrane potential loss (RMP-Loss) to adjust the distribution, which is directly related to the quantization error, to a range close to the spikes. |
Yufei Guo; Xiaode Liu; Yuanpei Chen; Liwen Zhang; Weihang Peng; Yuhan Zhang; Xuhui Huang; Zhe Ma; |
2046 | Cyclic-Bootstrap Labeling for Weakly Supervised Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore how to ameliorate the quality of pseudo-labeling in MIDN. |
Yufei Yin; Jiajun Deng; Wengang Zhou; Li Li; Houqiang Li; |
2047 | Deep Active Contours for Real-time 6-DoF Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a learning-based active contour model to make the best use of both worlds. |
Long Wang; Shen Yan; Jianan Zhen; Yu Liu; Maojun Zhang; Guofeng Zhang; Xiaowei Zhou; |
2048 | Tangent Sampson Error: Fast Approximate Two-view Reprojection Error for Central Camera Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we introduce the Tangent Sampson error, which is a generalization of the classical Sampson error in two-view geometry that allows for arbitrary central camera models. |
Mikhail Terekhov; Viktor Larsson; |
2049 | Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, based on IPMT, a state-of-the-art few-shot image segmentation method that combines external support guidance information with adaptive query guidance cues, we propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data. |
Nian Liu; Kepan Nan; Wangbo Zhao; Yuanwei Liu; Xiwen Yao; Salman Khan; Hisham Cholakkal; Rao Muhammad Anwer; Junwei Han; Fahad Shahbaz Khan; |
2050 | Improving 3D Imaging with Pre-Trained Perpendicular 2D Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most diffusion-based inverse problem-solving methods only deal with 2D images, and even recently published 3D methods do not fully exploit the 3D distribution prior. To address this, we propose a novel approach using two perpendicular pre-trained 2D diffusion models to solve the 3D inverse problem. |
Suhyeon Lee; Hyungjin Chung; Minyoung Park; Jonghyuk Park; Wi-Sun Ryu; Jong Chul Ye; |
2051 | Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the abundance of temporal data in the form of videos, this information-rich source has been largely overlooked. Our paper aims to address this gap by proposing a novel approach that incorporates temporal consistency in dense self-supervised learning. |
Mohammadreza Salehi; Efstratios Gavves; Cees G.M. Snoek; Yuki M. Asano; |
2052 | CroCo V2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we build on the recent cross-view completion framework, a variation of masked image modeling that leverages a second view from the same scene, which makes it well suited for binocular downstream tasks. |
Philippe Weinzaepfel; Thomas Lucas; Vincent Leroy; Yohann Cabon; Vaibhav Arora; Romain Brégier; Gabriela Csurka; Leonid Antsfeld; Boris Chidlovskii; Jerome Revaud; |
2053 | ExBluRF: Efficient Radiance Fields for Extreme Motion Blurred Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ExBluRF, a novel view synthesis method for extreme motion blurred images based on efficient radiance fields optimization. |
Dongwoo Lee; Jeongtaek Oh; Jaesung Rim; Sunghyun Cho; Kyoung Mu Lee; |
2054 | MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, in this paper, we propose an MPC-friendly ViT, dubbed MPCViT, to enable accurate yet efficient ViT inference in MPC. |
Wenxuan Zeng; Meng Li; Wenjie Xiong; Tong Tong; Wen-jie Lu; Jin Tan; Runsheng Wang; Ru Huang; |
2055 | Online Class Incremental Learning on Stochastic Blurry Task Boundary Via Mask and Visual Prompt Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new Stochastic incremental Blurry task boundary scenario, called Si-Blurry, which reflects the stochastic properties of the real world. |
Jun-Yeong Moon; Keon-Hee Park; Jung Uk Kim; Gyeong-Moon Park; |
2056 | Text2Video-Zero: Text-to-Image Diffusion Models Are Zero-Shot Video Generators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent text-to-video generation approaches rely on computationally heavy training and require large-scale video datasets. In this paper, we introduce a new task, zero-shot text-to-video generation, and propose a low-cost approach (without any training or optimization) by leveraging the power of existing text-to-image synthesis methods (e.g., Stable Diffusion), making them suitable for the video domain. |
Levon Khachatryan; Andranik Movsisyan; Vahram Tadevosyan; Roberto Henschel; Zhangyang Wang; Shant Navasardyan; Humphrey Shi; |
2057 | Masked Spiking Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing works on this topic typically rely on direct training, which can lead to suboptimal performance. To address this issue, we propose to leverage the benefits of the ANN-to-SNN conversion method to combine SNNs and Transformers, resulting in significantly improved performance over existing state-of-the-art SNN models. |
Ziqing Wang; Yuetong Fang; Jiahang Cao; Qiang Zhang; Zhongrui Wang; Renjing Xu; |
2058 | Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The collected Disentangled Video Quality Database (DIVIDE-3k) confirms that human quality opinions on UGC videos are universally and inevitably affected by both aesthetic and technical perspectives. In light of this, we propose the Disentangled Objective Video Quality Evaluator (DOVER) to learn the quality of UGC videos based on the two perspectives. |
Haoning Wu; Erli Zhang; Liang Liao; Chaofeng Chen; Jingwen Hou; Annan Wang; Wenxiu Sun; Qiong Yan; Weisi Lin; |
2059 | Distributed Bundle Adjustment with Block-Based Sparse Matrix Compression for Super Large Scale Datasets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a distributed bundle adjustment (DBA) method using the exact Levenberg-Marquardt (LM) algorithm for super large-scale datasets. |
Maoteng Zheng; Nengcheng Chen; Junfeng Zhu; Xiaoru Zeng; Huanbin Qiu; Yuyao Jiang; Xingyue Lu; Hao Qu; |
2060 | SCANet: Scene Complexity Aware Network for Weakly-Supervised Video Moment Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present a novel concept of a retrieval system referred to as Scene Complexity Aware Network (SCANet), which measures the "scene complexity" of multiple scenes in each video and generates adaptive proposals responding to their variable complexities. |
Sunjae Yoon; Gwanhyeong Koo; Dahyun Kim; Chang D. Yoo; |
2061 | Neural Interactive Keypoint Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work proposes an end-to-end neural interactive keypoint detection framework named Click-Pose, which can reduce the labeling cost of 2D keypoint annotation by more than 10 times compared with manual-only annotation. |
Jie Yang; Ailing Zeng; Feng Li; Shilong Liu; Ruimao Zhang; Lei Zhang; |
2062 | Joint Implicit Neural Representation for High-fidelity and Compact Vector Fonts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to learn vector fonts from pixelated font images utilizing a joint neural representation that consists of a signed distance field (SDF) and a probabilistic corner field (CF) to capture shape corner details. |
Chia-Hao Chen; Ying-Tian Liu; Zhifei Zhang; Yuan-Chen Guo; Song-Hai Zhang; |
2063 | Spurious Features Everywhere – Large-Scale Detection of Harmful Spurious Features in ImageNet Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a framework that allows us to systematically identify spurious features in large datasets like ImageNet. |
Yannic Neuhaus; Maximilian Augustin; Valentyn Boreiko; Matthias Hein; |
2064 | Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models. |
Baoshuo Kan; Teng Wang; Wenpeng Lu; Xiantong Zhen; Weili Guan; Feng Zheng; |
2065 | Delicate Textured Mesh Recovery from NeRF Via Adaptive Surface Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, their implicit volumetric representations differ significantly from the widely-adopted polygonal meshes and lack support from common 3D software and hardware, making their rendering and manipulation inefficient. To overcome this limitation, we present a novel framework that generates textured surface meshes from images. |
Jiaxiang Tang; Hang Zhou; Xiaokang Chen; Tianshu Hu; Errui Ding; Jingdong Wang; Gang Zeng; |
2066 | Leveraging Inpainting for Single-Image Shadow Removal Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we find that pretraining shadow removal networks on the image inpainting dataset can reduce the shadow remnants significantly: a naive encoder-decoder network achieves restoration quality competitive with state-of-the-art methods using only 10% of the shadow & shadow-free image pairs. |
Xiaoguang Li; Qing Guo; Rabab Abdelfattah; Di Lin; Wei Feng; Ivor Tsang; Song Wang; |
2067 | Neural Characteristic Function Learning for Conditional Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the success, cGANs have been consistently put under scrutiny due to their ill-posed discrepancy measure between distributions, leading to mode collapse and instability problems in training. To address this issue, we propose a novel conditional characteristic function generative adversarial network (CCF-GAN) to reduce the discrepancy by the characteristic functions (CFs), which is able to learn accurate distance measure of joint distributions under theoretical soundness. |
Shengxi Li; Jialu Zhang; Yifei Li; Mai Xu; Xin Deng; Li Li; |
2068 | Accurate 3D Face Reconstruction with Facial Component Tokens Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose TokenFace, a transformer-based monocular 3D face reconstruction model. |
Tianke Zhang; Xuangeng Chu; Yunfei Liu; Lijian Lin; Zhendong Yang; Zhengzhuo Xu; Chengkun Cao; Fei Yu; Changyin Zhou; Chun Yuan; Yu Li; |
2069 | Holistic Label Correction for Noisy Multi-Label Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we bring label dependence to tackle the problem of multi-label classification with noisy labels. |
Xiaobo Xia; Jiankang Deng; Wei Bao; Yuxuan Du; Bo Han; Shiguang Shan; Tongliang Liu; |
2070 | Probabilistic Precision and Recall Towards Reliable Evaluation of Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, we propose two novel metrics, P-precision and P-recall (PP&PR), based on a probabilistic approach, to address these problems. |
Dogyun Park; Suhyun Kim; |
2071 | Deep Multitask Learning with Progressive Parameter Sharing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel progressive parameter-sharing strategy (MPPS) in this paper for effectively training multitask learning models on diverse computer vision tasks simultaneously. |
Haosen Shi; Shen Ren; Tianwei Zhang; Sinno Jialin Pan; |
2072 | Personalized Semantics Excitation for Federated Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing PFL methods typically customize the local model by fine-tuning with limited local supervision and the global model regularizer, which secures local specificity but risks ruining the global discriminative knowledge. In this paper, we propose a novel Personalized Semantics Excitation (PSE) mechanism to break through this limitation by exciting and fusing personalized semantics from the global model during local model customization. |
Haifeng Xia; Kai Li; Zhengming Ding; |
2073 | Unified Data-Free Compression: Pruning and Quantization Without Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework named Unified Data-Free Compression (UDFC), which performs pruning and quantization simultaneously without any data or fine-tuning process. |
Shipeng Bai; Jun Chen; Xintian Shen; Yixuan Qian; Yong Liu; |
2074 | SurroundOcc: Multi-camera 3D Occupancy Prediction for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Towards a more comprehensive perception of a 3D scene, in this paper, we propose a SurroundOcc method to predict the 3D occupancy with multi-camera images. |
Yi Wei; Linqing Zhao; Wenzhao Zheng; Zheng Zhu; Jie Zhou; Jiwen Lu; |
2075 | Temporal Enhanced Training of Multi-view 3D Object Detector Via Historical Object Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new paradigm, named Historical Object Prediction (HoP) for multi-view 3D detection to leverage temporal information more effectively. |
Zhuofan Zong; Dongzhi Jiang; Guanglu Song; Zeyue Xue; Jingyong Su; Hongsheng Li; Yu Liu; |
2076 | PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given two sets of multi-view images of an object in two static articulation states, we decouple the movable part from the static part and reconstruct shape and appearance while predicting the motion parameters. To tackle this problem, we present PARIS: a self-supervised, end-to-end architecture that learns part-level implicit shape and appearance models and optimizes motion parameters jointly without any 3D supervision, motion, or semantic annotation. |
Jiayi Liu; Ali Mahdavi-Amiri; Manolis Savva; |
2077 | OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we break up the previous offline belief and propose a simple yet effective online model using explicit query propagation, named OnlineRefer. |
Dongming Wu; Tiancai Wang; Yuang Zhang; Xiangyu Zhang; Jianbing Shen; |
2078 | Implicit Neural Representation for Cooperative Low-light Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The following three factors restrict the application of existing low-light image enhancement methods: unpredictable brightness degradation and noise, inherent gap between metric-favorable and visual-friendly versions, and the limited paired training data. To address these limitations, we propose an implicit Neural Representation method for Cooperative low-light image enhancement, dubbed NeRCo. |
Shuzhou Yang; Moxuan Ding; Yanmin Wu; Zihan Li; Jian Zhang; |
2079 | Environment Agnostic Representation for Visual Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Environment Agnostic Reinforcement learning (EAR), which is a compact framework for domain generalization of the visual deep RL. |
Hyesong Choi; Hunsang Lee; Seongwon Jeong; Dongbo Min; |
2080 | Deep Multiview Clustering By Contrasting Cluster Assignments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a cross-view contrastive learning (CVCL) method that learns view-invariant representations and produces clustering results by contrasting the cluster assignments among multiple views. |
Jie Chen; Hua Mao; Wai Lok Woo; Xi Peng; |
2081 | Mimic3D: Thriving 3D-Aware GANs Via 3D-to-2D Imitation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Improving the photorealism via CNN-based 2D super-resolution can break the strict 3D consistency, while keeping the 3D consistency by learning high-resolution 3D representations for direct rendering often compromises image quality. In this paper, we propose a novel learning strategy, namely 3D-to-2D imitation, which enables a 3D-aware GAN to generate high-quality images while maintaining their strict 3D consistency, by letting the images synthesized by the generator’s 3D rendering branch mimic those generated by its 2D super-resolution branch. |
Xingyu Chen; Yu Deng; Baoyuan Wang; |
2082 | Look at The Neighbor: Distortion-aware Unsupervised Domain Adaptation for Panoramic Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we find that the pixels’ neighborhood regions of the ERP indeed introduce less distortion. |
Xu Zheng; Tianbo Pan; Yunhao Luo; Lin Wang; |
2083 | Rethinking Safe Semi-supervised Learning: Transferring The Open-set Problem to A Close-set One Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While experimenting with mainstream safe SSL methods, we made a surprising finding: all OOD data show a clear tendency to gather in the feature space. This inspires us to solve the safe SSL problem from a fresh perspective. |
Qiankun Ma; Jiyao Gao; Bo Zhan; Yunpeng Guo; Jiliu Zhou; Yan Wang; |
2084 | Does Physical Adversarial Example Really Matter to Autonomous Driving? Towards System-Level Effect of Adversarial Object Evasion Attack Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we conduct the first measurement study on whether and how effectively the existing designs can lead to system-level effects, especially for the STOP sign-evasion attacks due to their popularity and severity. |
Ningfei Wang; Yunpeng Luo; Takami Sato; Kaidi Xu; Qi Alfred Chen; |
2085 | ReLeaPS : Reinforcement Learning-based Illumination Planning for Generalized Photometric Stereo Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It depends on factors such as the unknown shape and general reflectance of the target object, global illumination, and the choice of photometric stereo backbones, which are too complex to be handled by existing methods based on handcrafted illumination planning rules. This paper proposes a learning-based illumination planning method that jointly considers these factors via integrating a neural network and a generalized image formation model. |
Jun Hoong Chan; Bohan Yu; Heng Guo; Jieji Ren; Zongqing Lu; Boxin Shi; |
2086 | Learning Foresightful Dense Visual Affordance for Deformable Object Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study deformable object manipulation using dense visual affordance, with generalization towards diverse states, and propose a novel kind of foresightful dense affordance, which avoids local optima by estimating states’ values for long-term manipulation. |
Ruihai Wu; Chuanruo Ning; Hao Dong; |
2087 | Generalizable Neural Fields As Partially Observed Neural Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing generalization methods view this as a meta-learning problem and employ gradient-based meta-learning to learn an initialization which is then fine-tuned with test-time optimization, or learn hypernetworks to produce the weights of a neural field. We instead propose a new paradigm that views the large-scale training of neural representations as a part of a partially-observed neural process framework, and leverage neural process algorithms to solve this task. |
Jeffrey Gu; Kuan-Chieh Wang; Serena Yeung; |
2088 | CiteTracker: Correlating Image and Text for Visual Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the CiteTracking algorithm to enhance target modeling and inference in visual tracking by connecting images and text. |
Xin Li; Yuqing Huang; Zhenyu He; Yaowei Wang; Huchuan Lu; Ming-Hsuan Yang; |
2089 | Adding Conditional Control to Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. |
Lvmin Zhang; Anyi Rao; Maneesh Agrawala; |
2090 | 3D Instance Segmentation Via Enhanced Spatial and Semantic Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce a novel approach for 3D point clouds instance segmentation that addresses the challenge of generating distinct instance masks for objects that share similar appearances but are spatially separated. |
Salwa Al Khatib; Mohamed El Amine Boudjoghra; Jean Lahoud; Fahad Shahbaz Khan; |
2091 | Unleashing Text-to-Image Diffusion Models for Visual Perception Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose VPD (Visual Perception with pre-trained Diffusion models), a new framework that exploits the semantic information of a pre-trained text-to-image diffusion model in visual perception tasks. |
Wenliang Zhao; Yongming Rao; Zuyan Liu; Benlin Liu; Jie Zhou; Jiwen Lu; |
2092 | Iterative Superquadric Recomposition of 3D Objects from Multiple Views Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a framework, ISCO, to recompose an object using 3D superquadrics as semantic parts directly from 2D views without training a model that uses 3D supervision. |
Stephan Alaniz; Massimiliano Mancini; Zeynep Akata; |
2093 | PHRIT: Parametric Hand Representation with Implicit Template Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose PHRIT, a novel approach for parametric hand mesh modeling with an implicit template that combines the advantages of both parametric meshes and implicit representations. |
Zhisheng Huang; Yujin Chen; Di Kang; Jinlu Zhang; Zhigang Tu; |
2094 | BEVPlace: Learning LiDAR-based Place Recognition Using Bird’s Eye View Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore the potential of a different representation in place recognition, i.e. bird’s eye view (BEV) images. |
Lun Luo; Shuhang Zheng; Yixuan Li; Yongzhi Fan; Beinan Yu; Si-Yuan Cao; Junwei Li; Hui-Liang Shen; |
2095 | Transferable Adversarial Attack for Both Vision Transformers and Convolutional Networks Via Momentum Integrated Gradients Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel attack method named Momentum Integrated Gradients (MIG), which not only attacks ViTs with high success rate, but also exhibits impressive transferability across ViTs and CNNs. |
Wenshuo Ma; Yidong Li; Xiaofeng Jia; Wei Xu; |
2096 | TrajPAC: Towards Robustness Verification of Pedestrian Trajectory Prediction Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although previous works have studied adversarial robustness in the context of trajectory forecasting, some significant issues remain unaddressed. In this work, we try to tackle these crucial problems. |
Liang Zhang; Nathaniel Xu; Pengfei Yang; Gaojie Jin; Cheng-Chao Huang; Lijun Zhang; |
2097 | Adaptive Image Anonymization in The Context of Image Classification with Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, applying anonymization causes the classifier to provide different class decisions before and after it is applied, and therefore reduces the classifier’s reliability and usability. To achieve a robust solution to this problem, we propose a novel anonymization procedure that allows existing classifiers to remain class-decision invariant on anonymized images without requiring any modification to the classification models. |
Nadiya Shvai; Arcadi Llanza Carmona; Amir Nakib; |
2098 | SiLK: Simple Learned Keypoints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we revisit the design of existing keypoint detectors by deconstructing their methodologies and identifying the key components. |
Pierre Gleize; Weiyao Wang; Matt Feiszli; |
2099 | EfficientViT: Lightweight Multi-Scale Attention for High-Resolution Dense Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents EfficientViT, a new family of high-resolution vision models with novel lightweight multi-scale attention. |
Han Cai; Junyan Li; Muyan Hu; Chuang Gan; Song Han; |
2100 | Efficient Neural Supersampling on A Novel Gaming Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work introduces a novel neural algorithm for supersampling rendered content that is 4x more efficient than existing methods while maintaining the same level of accuracy. |
Antoine Mercier; Ruan Erasmus; Yashesh Savani; Manik Dhingra; Fatih Porikli; Guillaume Berger; |
2101 | Rapid Adaptation in Online Continual Learning: Are We Evaluating It Right? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our study reveals that existing OCL algorithms can also achieve high online accuracy, but perform poorly in retaining useful information, suggesting that they unintentionally learn spurious label correlations. To address this issue, we propose a novel metric for measuring adaptation based on the accuracy on the near-future samples, where spurious correlations are removed. |
Hasan Abed Al Kader Hammoud; Ameya Prabhu; Ser-Nam Lim; Philip H.S. Torr; Adel Bibi; Bernard Ghanem; |
2102 | Label-Efficient Online Continual Object Detection in Streaming Video Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a plug-and-play module, Efficient-CLS, that can be easily inserted into and improve existing continual learners for object detection in video streams with reduced data annotation costs and model retraining time. |
Jay Zhangjie Wu; David Junhao Zhang; Wynne Hsu; Mengmi Zhang; Mike Zheng Shou; |
2103 | Learning Point Cloud Completion Without Complete Point Clouds: A Pose-Aware Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a remedy, in this paper, we propose a novel point cloud completion framework without using any complete point cloud at all. |
Jihun Kim; Hyeokjun Kweon; Yunseo Yang; Kuk-Jin Yoon; |
2104 | Frequency Guidance Matters in Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the effect of different frequency components on the few-shot learning tasks. |
Hao Cheng; Siyuan Yang; Joey Tianyi Zhou; Lanqing Guo; Bihan Wen; |
2105 | Walking Your LiDOG: A Journey Through Multiple Domains for LiDAR Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a community, we have made tremendous progress in within-domain LiDAR semantic segmentation. However, do these methods generalize across domains? To answer this question, we design the first experimental setup for studying domain generalization (DG) for LiDAR semantic segmentation (DG-LSS). |
Cristiano Saltori; Aljosa Osep; Elisa Ricci; Laura Leal-Taixé; |
2106 | Diverse Cotraining Makes Strong Semi-Supervised Segmentor Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we revisit the core assumption that supports co-training: multiple compatible and conditionally independent views. |
Yijiang Li; Xinjiang Wang; Lihe Yang; Litong Feng; Wayne Zhang; Ying Gao; |
2107 | Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, three detailed issues, namely blurry edges, noisy surfaces, and over-transferred RGB texture, need to be addressed. In this paper, we propose the Spherical Space feature Decomposition Network (SSDNet) to solve the above issues. |
Zixiang Zhao; Jiangshe Zhang; Xiang Gu; Chengli Tan; Shuang Xu; Yulun Zhang; Radu Timofte; Luc Van Gool; |
2108 | Tiled Multiplane Images for Practical 3D Photography Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method for generating a tiled multiplane image (TMPI) with adaptive depth planes for single-view 3D photography in the wild. |
Numair Khan; Lei Xiao; Douglas Lanman; |
2109 | VQA-GNN: Reasoning with Multimodal Knowledge Via Graph Neural Networks for Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To perform more expressive reasoning, we propose VQA-GNN, a new VQA model that performs bidirectional fusion between unstructured and structured multimodal knowledge to obtain unified knowledge representations. |
Yanan Wang; Michihiro Yasunaga; Hongyu Ren; Shinya Wada; Jure Leskovec; |
2110 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a training-efficient method for temporal-sensitive VFMs that integrates the benefits of existing methods. |
Kunchang Li; Yali Wang; Yizhuo Li; Yi Wang; Yinan He; Limin Wang; Yu Qiao; |
2111 | Explore and Tell: Embodied Visual Captioning in 3D Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To support this task, we build the ET-Cap dataset with the Kubric simulator, consisting of 10K 3D scenes with cluttered objects and three annotated paragraphs per scene. We propose a Cascade Embodied Captioning model (CaBOT), which comprises a navigator and a captioner, to tackle this task. |
Anwen Hu; Shizhe Chen; Liang Zhang; Qin Jin; |
2112 | FastViT: A Fast Hybrid Vision Transformer Using Structural Reparameterization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce FastViT, a hybrid vision transformer architecture that obtains the state-of-the-art latency-accuracy trade-off. |
Pavan Kumar Anasosalu Vasu; James Gabriel; Jeff Zhu; Oncel Tuzel; Anurag Ranjan; |
2113 | OFVL-MS: Once for Visual Localization Across Multiple Indoor Scenes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we seek to predict camera poses across scenes in a multi-task learning manner, where we view the localization of each scene as a new task. |
Tao Xie; Kun Dai; Siyi Lu; Ke Wang; Zhiqiang Jiang; Jinghan Gao; Dedong Liu; Jie Xu; Lijun Zhao; Ruifeng Li; |
2114 | HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This leads to the key difficulty in RVOS, i.e., various descriptions of different objects correspond to different temporal scales in the video, which is ignored by most existing approaches with a single stride of frame sampling. To tackle this problem, we propose a concise Hybrid Temporal-scale Multimodal Learning (HTML) framework, which can effectively align lingual and visual features to discover core object semantics in the video, by learning multimodal interaction hierarchically from different temporal scales. |
Mingfei Han; Yali Wang; Zhihui Li; Lina Yao; Xiaojun Chang; Yu Qiao; |
2115 | SQAD: Automatic Smartphone Camera Quality Assessment and Benchmarking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the Smartphone Camera Quality Assessment Dataset (SQAD), which includes natural images captured by 29 devices. |
Zilin Fang; Andrey Ignatov; Eduard Zamfir; Radu Timofte; |
2116 | PointDC: Unsupervised Semantic Segmentation of 3D Point Clouds Via Cross-Modal Distillation and Super-Voxel Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous unsupervised pipelines for 2D images fail on point clouds due to: 1) Clustering Ambiguity caused by the limited magnitude of data and imbalanced class distribution; 2) Irregularity Ambiguity caused by the irregular sparsity of point clouds. Therefore, we propose a novel framework, PointDC, comprised of two steps that handle the aforementioned problems respectively: Cross-Modal Distillation (CVD) and Super-Voxel Clustering (SVC). |
Zisheng Chen; Hongbin Xu; Weitao Chen; Zhipeng Zhou; Haihong Xiao; Baigui Sun; Xuansong Xie; Wenxiong kang; |
2117 | MV-Map: Offboard HD-Map Generation with Multi-view Consistency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel offboard pipeline called MV-Map that capitalizes on multi-view consistency and can handle an arbitrary number of frames with the key design of a "region-centric" framework. |
Ziyang Xie; Ziqi Pang; Yu-Xiong Wang; |
2118 | Multi-view Self-supervised Disentanglement for General Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we instead propose to learn to disentangle the noisy image, under the intuitive assumption that different corrupted versions of the same clean image share a common latent space. |
Hao Chen; Chenyuan Qu; Yu Zhang; Chen Chen; Jianbo Jiao; |
2119 | Inter-Realization Channels: Unsupervised Anomaly Detection Beyond One-Class Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Assuming training images to be realizations of the underlying image distribution, it follows that nominal patches from these realizations will be well associated between and represented across realizations. From this, we propose Inter-Realization Channels (InReaCh), a fully unsupervised method of detecting and localizing anomalies. |
Declan McIntosh; Alexandra Branzan Albu; |
2120 | Multi-Event Video-Text Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we introduce the Multi-event Video-Text Retrieval (MeVTR) task, addressing scenarios in which each video contains multiple different events, as a niche scenario of the conventional Video-Text Retrieval Task. |
Gengyuan Zhang; Jisen Ren; Jindong Gu; Volker Tresp; |
2121 | SHERF: Generalizable Human NeRF from A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose SHERF, the first generalizable Human NeRF model for recovering animatable 3D humans from a single input image. |
Shoukang Hu; Fangzhou Hong; Liang Pan; Haiyi Mei; Lei Yang; Ziwei Liu; |
2122 | MVPSNet: Fast Generalizable Multi-view Photometric Stereo Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a fast and generalizable solution to Multiview Photometric Stereo (MVPS), called MVPSNet. |
Dongxu Zhao; Daniel Lichy; Pierre-Nicolas Perrin; Jan-Michael Frahm; Soumyadip Sengupta; |
2123 | High Quality Entity Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given the high-quality and high-resolution nature of the dataset, we propose CropFormer, which is designed to tackle the intractability of instance-level segmentation on high-resolution images. |
Lu Qi; Jason Kuen; Tiancheng Shen; Jiuxiang Gu; Wenbo Li; Weidong Guo; Jiaya Jia; Zhe Lin; Ming-Hsuan Yang; |
2124 | CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to explore fundamental affordances rather than object categories, i.e., common attributes that enable different objects to accomplish the same task. |
Jiajin Tang; Ge Zheng; Jingyi Yu; Sibei Yang; |
2125 | You Never Get A Second Chance To Make A Good First Impression: Seeding Active Learning for 3D Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose SeedAL, a method to seed active learning for efficient annotation of 3D point clouds for semantic segmentation. |
Nermin Samet; Oriane Siméoni; Gilles Puy; Georgy Ponimatkin; Renaud Marlet; Vincent Lepetit; |
2126 | Scalable Multi-Temporal Remote Sensing Change Data Generation Via Simulating Stochastic Change Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial since it is expensive and knowledge-intensive. In this paper, we present a scalable multi-temporal remote sensing change data generator via generative modeling, which is cheap and automatic, alleviating these problems. |
Zhuo Zheng; Shiqi Tian; Ailong Ma; Liangpei Zhang; Yanfei Zhong; |
2127 | Human from Blur: Human Pose Tracking from Blurry Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method to estimate 3D human poses from substantially blurred images. |
Yiming Zhao; Denys Rozumnyi; Jie Song; Otmar Hilliges; Marc Pollefeys; Martin R. Oswald; |
2128 | NerfAcc: Efficient Sampling Accelerates NeRFs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate and compare multiple sampling approaches and demonstrate that improved sampling is generally applicable across NeRF variants under a unified concept of transmittance estimator. |
Ruilong Li; Hang Gao; Matthew Tancik; Angjoo Kanazawa; |
2129 | A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present accumulator-aware quantization (A2Q), a novel weight quantization method designed to train quantized neural networks (QNNs) to avoid overflow when using low-precision accumulators during inference. |
Ian Colbert; Alessandro Pappalardo; Jakoba Petri-Koenig; |
2130 | Uni-3D: A Universal Model for Panoptic 3D Scene Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by recent advances in 2D vision that unify image segmentation and detection by Transformer-based models, we present Uni-3D, a holistic 3D scene parsing/reconstruction system for a single RGB image. |
Xiang Zhang; Zeyuan Chen; Fangyin Wei; Zhuowen Tu; |
2131 | ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic 3D Scenes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, the robot’s ability to follow human instructions based on grounding the actions and states is limited. To tackle these challenges, we present ARNOLD, a benchmark that evaluates language-grounded task learning with continuous states in realistic 3D scenes. |
Ran Gong; Jiangyong Huang; Yizhou Zhao; Haoran Geng; Xiaofeng Gao; Qingyang Wu; Wensi Ai; Ziheng Zhou; Demetri Terzopoulos; Song-Chun Zhu; Baoxiong Jia; Siyuan Huang; |
2132 | Full-Body Articulated Human-Object Interaction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the challenging problem of Full-Body Articulated Human-Object Interaction (f-AHOI), wherein the whole human bodies interact with articulated objects, whose parts are connected by movable joints. |
Nan Jiang; Tengyu Liu; Zhexuan Cao; Jieming Cui; Zhiyuan Zhang; Yixin Chen; He Wang; Yixin Zhu; Siyuan Huang; |
2133 | FeatureNeRF: Learning Generalizable NeRFs By Distilling Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework named FeatureNeRF to learn generalizable NeRFs by distilling pre-trained vision foundation models (e.g., DINO, Latent Diffusion). |
Jianglong Ye; Naiyan Wang; Xiaolong Wang; |
2134 | SRFormer: Permuted Self-Attention for Single Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present SRFormer, a simple but novel method that can enjoy the benefit of large window self-attention but introduces even less computational burden. |
Yupeng Zhou; Zhen Li; Chun-Le Guo; Song Bai; Ming-Ming Cheng; Qibin Hou; |
2135 | Deep Homography Mixture for Single Image Rolling Shutter Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a deep homography mixture motion model for single image rolling shutter correction. |
Weilong Yan; Robby T. Tan; Bing Zeng; Shuaicheng Liu; |
2136 | Audio-Visual Glance Network for Efficient Video Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Deep learning has made significant strides in video understanding tasks, but the computation required to classify lengthy and massive videos using clip-level video classifiers remains impractical and prohibitively expensive. To address this issue, we propose Audio-Visual Glance Network (AVGN), which leverages the commonly available audio and visual modalities to efficiently process the spatio-temporally important parts of a video. |
Muhammad Adi Nugroho; Sangmin Woo; Sumin Lee; Changick Kim; |
2137 | CLNeRF: Continual Learning Meets NeRF Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To study other practical scene changes, we propose a new dataset, World Across Time (WAT), consisting of scenes that change in appearance and geometry over time. |
Zhipeng Cai; Matthias Müller; |
2138 | Rendering Humans from Object-Occluded Monocular Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Firstly, the standard rendering strategy relies on point-point mapping, which could lead to dramatic disparities between the visible and occluded areas of the body. Secondly, the naive direct regression approach does not consider any feasibility criteria (i.e., prior information) for rendering under occlusions. To tackle the above drawbacks, we present OccNeRF, a neural rendering method that achieves better rendering of humans in severely occluded scenes. |
Tiange Xiang; Adam Sun; Jiajun Wu; Ehsan Adeli; Li Fei-Fei; |
2139 | CrossMatch: Source-Free Domain Adaptive Semantic Segmentation Via Cross-Modal Consistency Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel asymmetric two-stream architecture that learns more robustly from noisy pseudo labels. |
Yifang Yin; Wenmiao Hu; Zhenguang Liu; Guanfeng Wang; Shili Xiang; Roger Zimmermann; |
2140 | Out-of-Distribution Detection for Monocular Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by anomaly detection, we propose to detect OOD images from an encoder-decoder depth estimation model based on the reconstruction error. |
Julia Hornauer; Adrian Holzbock; Vasileios Belagiannis; |
2141 | STEPs: Self-Supervised Key Step Extraction and Localization from Unlabeled Procedural Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a training objective, Bootstrapped Multi-Cue Contrastive (BMC2) loss to learn discriminative representations for various steps without any labels. |
Anshul Shah; Benjamin Lundell; Harpreet Sawhney; Rama Chellappa; |
2142 | Improving Equivariance in State-of-the-Art Supervised Depth and Normal Predictors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The problem exists even when crop-and-resize data augmentation is employed during training. To remedy this, we propose an equivariant regularization technique, consisting of an averaging procedure and a self-consistency loss, to explicitly promote cropping-and-resizing equivariance in depth and normal networks. |
Yuanyi Zhong; Anand Bhattad; Yu-Xiong Wang; David Forsyth; |
2143 | Towards Robust and Smooth 3D Multi-Person Pose Estimation from Monocular Videos in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify three unresolved issues with existing methods: lack of robustness to views unseen during training, vulnerability to occlusion, and severe jittering in the output. As a remedy, we propose POTR-3D, the first realization of a sequence-to-sequence 2D-to-3D lifting model for 3DMPPE, powered by a novel geometry-aware data augmentation strategy capable of generating unbounded data with a variety of views while accounting for the ground plane and occlusions. |
Sungchan Park; Eunyi You; Inhoe Lee; Joonseok Lee; |
2144 | Reducing Training Time in Cross-Silo Federated Learning Using Multigraph Topology Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a new multigraph topology for cross-silo federated learning. |
Tuong Do; Binh X. Nguyen; Vuong Pham; Toan Tran; Erman Tjiputra; Quang D. Tran; Anh Nguyen; |
2145 | Counting Crowds in Bad Weather Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a method for robust crowd counting in adverse weather scenarios. |
Zhi-Kai Huang; Wei-Ting Chen; Yuan-Chun Chiang; Sy-Yen Kuo; Ming-Hsuan Yang; |
2146 | FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a training-Free conditional Diffusion Model (FreeDoM) used for various conditions. |
Jiwen Yu; Yinhuai Wang; Chen Zhao; Bernard Ghanem; Jian Zhang; |
2147 | UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose UniT3D, a simple yet effective fully unified transformer-based architecture for jointly solving 3D visual grounding and dense captioning. |
Zhenyu Chen; Ronghang Hu; Xinlei Chen; Matthias Nießner; Angel X. Chang; |
2148 | SKiT: A Fast Key Information Video Transformer for Online Surgical Phase Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an effective and efficient model for surgical phase recognition that leverages key global information. |
Yang Liu; Jiayu Huo; Jingjing Peng; Rachel Sparks; Prokar Dasgupta; Alejandro Granados; Sebastien Ourselin; |
2149 | Clustering Based Point Cloud Representation Learning for 3D Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, we propose a clustering-based supervised learning scheme for point cloud analysis. |
Tuo Feng; Wenguan Wang; Xiaohan Wang; Yi Yang; Qinghua Zheng; |
2150 | Automatic Network Pruning Via Hilbert-Schmidt Independence Criterion Lasso Under Information Bottleneck Principle Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This creates heavy, unintended dependencies on heuristics and expert experience for both the objective and the parameters of the pruning approach. In this paper, we address this problem by introducing a principled and unified framework based on Information Bottleneck (IB) theory, which further guides us to an automatic pruning approach. |
Song Guo; Lei Zhang; Xiawu Zheng; Yan Wang; Yuchao Li; Fei Chao; Chenglin Wu; Shengchuan Zhang; Rongrong Ji; |
2151 | Forecast-MAE: Self-supervised Pre-training for Motion Forecasting with Masked Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This study explores the application of self-supervised learning (SSL) to the task of motion forecasting, an area that has not yet been extensively investigated despite the widespread success of SSL in computer vision and natural language processing. To address this gap, we introduce Forecast-MAE, an extension of the masked autoencoder framework that is specifically designed for self-supervised learning of the motion forecasting task. |
Jie Cheng; Xiaodong Mei; Ming Liu; |
2152 | Efficient Transformer-based 3D Object Detection with Dynamic Token Halting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an effective approach for accelerating transformer-based 3D object detectors by dynamically halting tokens at different layers depending on their contribution to the detection task. |
Mao Ye; Gregory P. Meyer; Yuning Chai; Qiang Liu; |
2153 | Neglected Free Lunch – Learning Image Classifiers Using Annotation Byproducts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that this simple and widely used representation of human knowledge neglects rich auxiliary information from the annotation procedure, such as the time-series of mouse traces and clicks left after image selection. Our insight is that such annotation byproducts Z provide approximate human attention that weakly guides the model to focus on the foreground cues, reducing spurious correlations and discouraging shortcut learning. |
Dongyoon Han; Junsuk Choe; Seonghyeok Chun; John Joon Young Chung; Minsuk Chang; Sangdoo Yun; Jean Y. Song; Seong Joon Oh; |
2154 | Rethinking The Role of Pre-Trained Networks in Source-Free Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to integrate the pre-trained network into the target adaptation process because it offers diversified features important for generalization and provides an alternative view of features and classification decisions that differs from the source model. |
Wenyu Zhang; Li Shen; Chuan-Sheng Foo; |
2155 | RLIPv2: Fast Scaling of Relational Language-Image Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose RLIPv2, a fast converging model that enables the scaling of relational pre-training to large-scale pseudo-labelled scene graph data. |
Hangjie Yuan; Shiwei Zhang; Xiang Wang; Samuel Albanie; Yining Pan; Tao Feng; Jianwen Jiang; Dong Ni; Yingya Zhang; Deli Zhao; |
2156 | TransFace: Calibrating Transformer Training for Face Recognition from A Data-Centric Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate the reasons for this phenomenon and discover that the existing data augmentation approach and hard sample mining strategy are incompatible with ViT-based FR backbones due to the lack of tailored consideration for preserving facial structural information and leveraging local token information. To remedy these problems, this paper proposes a superior FR model called TransFace, which employs a patch-level data augmentation strategy named DPAP and a hard sample mining strategy named EHSM. |
Jun Dan; Yang Liu; Haoyu Xie; Jiankang Deng; Haoran Xie; Xuansong Xie; Baigui Sun; |
2157 | LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method, LLM-Planner, that harnesses the power of large language models to do few-shot planning for embodied agents. |
Chan Hee Song; Jiaman Wu; Clayton Washington; Brian M Sadler; Wei-Lun Chao; Yu Su; |
2158 | Exploring Model Transferability Through The Lens of Potential Energy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing methods for measuring the transferability of pre-trained models rely on statistical correlations between encoded static features and task labels, but they overlook the impact of underlying representation dynamics during fine-tuning, leading to unreliable results, especially for self-supervised models. In this paper, we present an insightful physics-inspired approach named PED to address these challenges. |
Xiaotong Li; Zixuan Hu; Yixiao Ge; Ying Shan; Ling-Yu Duan; |
2159 | Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We instead explore the construction of a unified model for major image and video recognition tasks in autonomous driving with diverse input and output structures. To enable such an investigation, we design a new challenge, Video Task Decathlon (VTD), which includes ten representative image and video tasks spanning classification, segmentation, localization, and association of objects and pixels. |
Thomas E. Huang; Yifan Liu; Luc Van Gool; Fisher Yu; |
2160 | Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Aria Digital Twin (ADT) – an egocentric dataset captured using Aria glasses with extensive object-, environment-, and human-level ground truth. |
Xiaqing Pan; Nicholas Charron; Yongqian Yang; Scott Peters; Thomas Whelan; Chen Kong; Omkar Parkhi; Richard Newcombe; Yuheng (Carl) Ren; |
2161 | PreSTU: Pre-Training for Scene-Text Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose PreSTU, a novel pre-training recipe dedicated to scene-text understanding (STU). |
Jihyung Kil; Soravit Changpinyo; Xi Chen; Hexiang Hu; Sebastian Goodman; Wei-Lun Chao; Radu Soricut; |