Paper Digest: CVPR 2023 Highlights
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) is one of the top computer vision conferences in the world. In 2023, it was held in Vancouver, Canada.
To help the community quickly catch up on the work presented at this conference, the Paper Digest team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly get the main idea of each paper.
Based in New York, Paper Digest is dedicated to helping people generate content and reason over unstructured data. Unlike black-box approaches, we build deep models on semantics, which allows results to be produced with explanations. These models power this website and are behind our services, including “search engine”, “summarization”, “question answering”, and “literature review”.
If you do not want to miss interesting academic papers, you are welcome to sign up for our daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated with new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: CVPR 2023 Highlights
# | Paper | Author(s)
---|---|---
1 | GFPose: Learning 3D Human Pose Prior With Gradient Fields. Highlight: Here, we present GFPose, a versatile framework to model plausible 3D human poses for various applications. | Hai Ci; Mingdong Wu; Wentao Zhu; Xiaoxuan Ma; Hao Dong; Fangwei Zhong; Yizhou Wang
2 | CXTrack: Improving 3D Point Cloud Tracking With Contextual Information. Highlight: To achieve accurate localization for objects of all sizes, we propose a transformer-based localization head with a novel center embedding module to distinguish the target from distractors. | Tian-Xing Xu; Yuan-Chen Guo; Yu-Kun Lai; Song-Hai Zhang
3 | Deep Frequency Filtering for Domain Generalization. Highlight: In this paper, we propose Deep Frequency Filtering (DFF) for learning domain-generalizable features, which is the first endeavour to explicitly modulate the frequency components of different transfer difficulties across domains in the latent space during training. | Shiqi Lin; Zhizheng Zhang; Zhipeng Huang; Yan Lu; Cuiling Lan; Peng Chu; Quanzeng You; Jiang Wang; Zicheng Liu; Amey Parulkar; Viraj Navkal; Zhibo Chen
4 | Frame Flexible Network. Highlight: If we evaluate the model using other frames which are not used in training, we observe the performance will drop significantly (see Fig. 1), which we summarize as the Temporal Frequency Deviation phenomenon. To fix this issue, we propose a general framework, named Frame Flexible Network (FFN), which not only enables the model to be evaluated at different frames to adjust its computation, but also significantly reduces the memory cost of storing multiple models. | Yitian Zhang; Yue Bai; Chang Liu; Huan Wang; Sheng Li; Yun Fu
5 | Unsupervised Cumulative Domain Adaptation for Foggy Scene Optical Flow. Highlight: To handle practical optical flow under real foggy scenes, in this work, we propose a novel unsupervised cumulative domain adaptation optical flow (UCDA-Flow) framework: depth-association motion adaptation and correlation-alignment motion adaptation. | Hanyu Zhou; Yi Chang; Wending Yan; Luxin Yan
6 | NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs. Highlight: With NoisyTwins, we first introduce an effective and inexpensive augmentation strategy for class embeddings, which then decorrelates the latents based on self-supervision in the W space. | Harsh Rangwani; Lavish Bansal; Kartik Sharma; Tejan Karmali; Varun Jampani; R. Venkatesh Babu
7 | DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis. Highlight: This work presents DisCoScene: a 3D-aware generative model for high-quality and controllable scene synthesis. | Yinghao Xu; Menglei Chai; Zifan Shi; Sida Peng; Ivan Skorokhodov; Aliaksandr Siarohin; Ceyuan Yang; Yujun Shen; Hsin-Ying Lee; Bolei Zhou; Sergey Tulyakov
8 | Revisiting Self-Similarity: Structural Embedding for Image Retrieval. Highlight: In this work, we revisit the conventional self-similarity descriptor from a convolutional perspective, to encode both the visual and structural cues of the image into a global image representation. | Seongwon Lee; Suhyeon Lee; Hongje Seong; Euntai Kim
9 | Minimizing The Accumulated Trajectory Error To Improve Dataset Distillation. Highlight: However, these gradient-matching methods suffer from the accumulated trajectory error caused by the discrepancy between the distillation and subsequent evaluation. To alleviate the adverse impact of this accumulated trajectory error, we propose a novel approach that encourages the optimization algorithm to seek a flat trajectory. | Jiawei Du; Yidi Jiang; Vincent Y. F. Tan; Joey Tianyi Zhou; Haizhou Li
10 | Decoupling-and-Aggregating for Image Exposure Correction. Highlight: This will limit the statistical and structural modeling capacity for exposure correction. To address this issue, this paper proposes to decouple contrast enhancement and detail restoration within each convolution process. | Yang Wang; Long Peng; Liang Li; Yang Cao; Zheng-Jun Zha
11 | Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving. Highlight: Furthermore, both approaches expend substantial computational resources predicting areas or objects that might never be queried by the motion planner. This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network. | Ben Agro; Quinlan Sykora; Sergio Casas; Raquel Urtasun
12 | CCuantuMM: Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes. Highlight: This paper addresses the open challenges and introduces the first quantum-hybrid approach for 3D shape multi-matching; in addition, it is also cycle-consistent. | Harshil Bhatia; Edith Tretschk; Zorah Lähner; Marcel Seelbach Benkner; Michael Moeller; Christian Theobalt; Vladislav Golyanik
13 | TrojViT: Trojan Insertion in Vision Transformers. Highlight: In this paper, we propose a stealthy and practical ViT-specific backdoor attack, TrojViT. | Mengxin Zheng; Qian Lou; Lei Jiang
14 | MarS3D: A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan 3D Point Clouds. Highlight: We propose MarS3D, a plug-and-play motion-aware model for semantic segmentation on multi-scan 3D point clouds. | Jiahui Liu; Chirui Chang; Jianhui Liu; Xiaoyang Wu; Lan Ma; Xiaojuan Qi
15 | An Image Quality Assessment Dataset for Portraits. Highlight: This paper introduces PIQ23, a portrait-specific IQA dataset of 5116 images of 50 predefined scenarios acquired by 100 smartphones, covering a high variety of brands, models, and use cases. | Nicolas Chahine; Stefania Calarasanu; Davide Garcia-Civiero; Théo Cayla; Sira Ferradans; Jean Ponce
16 | MSeg3D: Multi-Modal 3D Semantic Segmentation for Autonomous Driving. Highlight: We propose a multi-modal 3D semantic segmentation model (MSeg3D) with joint intra-modal feature extraction and inter-modal feature fusion to mitigate modality heterogeneity. | Jiale Li; Hang Dai; Hao Han; Yong Ding
17 | Robust Outlier Rejection for 3D Registration With Variational Bayes. Highlight: In this paper, we develop a novel variational non-local network-based outlier rejection framework for robust alignment. | Haobo Jiang; Zheng Dang; Zhen Wei; Jin Xie; Jian Yang; Mathieu Salzmann
18 | Dynamically Instance-Guided Adaptation: A Backward-Free Approach for Test-Time Domain Adaptive Semantic Segmentation. Highlight: In this paper, we study the application of test-time domain adaptation to semantic segmentation (TTDA-Seg), where both efficiency and effectiveness are crucial. | Wei Wang; Zhun Zhong; Weijie Wang; Xi Chen; Charles Ling; Boyu Wang; Nicu Sebe
19 | Painting 3D Nature in 2D: View Synthesis of Natural Scenes From A Single Semantic Mask. Highlight: We introduce a novel approach that takes a single semantic mask as input to synthesize multi-view consistent color images of natural scenes, trained with a collection of single images from the Internet. | Shangzhan Zhang; Sida Peng; Tianrun Chen; Linzhan Mou; Haotong Lin; Kaicheng Yu; Yiyi Liao; Xiaowei Zhou
20 | LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data. Highlight: To overcome these, we present a LANguage-driven Image-to-image Translation model, dubbed LANIT. | Jihye Park; Sunwoo Kim; Soohyun Kim; Seokju Cho; Jaejun Yoo; Youngjung Uh; Seungryong Kim
21 | MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition. Highlight: However, they generally suffer from two limitations: (i) the matching procedure between local frames tends to be inaccurate due to the lack of guidance to force long-range temporal perception; (ii) explicit motion learning is usually ignored, leading to partial information loss. To address these issues, we develop a Motion-augmented Long-short Contrastive Learning (MoLo) method that contains two crucial components: a long-short contrastive objective and a motion autodecoder. | Xiang Wang; Shiwei Zhang; Zhiwu Qing; Changxin Gao; Yingya Zhang; Deli Zhao; Nong Sang
22 | Fast Point Cloud Generation With Straight Flows. Highlight: While beneficial, the complexity of the learning steps has limited its application to many real-world 3D tasks. To address this limitation, we propose Point Straight Flow (PSF), a model that exhibits impressive performance using a single step. | Lemeng Wu; Dilin Wang; Chengyue Gong; Xingchao Liu; Yunyang Xiong; Rakesh Ranjan; Raghuraman Krishnamoorthi; Vikas Chandra; Qiang Liu
23 | Text-Guided Unsupervised Latent Transformation for Multi-Attribute Image Manipulation. Highlight: To associate with preset attributes, most existing approaches focus on supervised learning of semantically meaningful latent space traversal directions, and each manipulation step is typically determined for an individual attribute. To address this limitation, we propose a Text-guided Unsupervised StyleGAN Latent Transformation (TUSLT) model, which adaptively infers a single transformation step in the latent space of StyleGAN to simultaneously manipulate multiple attributes on a given input image. | Xiwen Wei; Zhen Xu; Cheng Liu; Si Wu; Zhiwen Yu; Hau San Wong
24 | Achieving A Better Stability-Plasticity Trade-Off Via Auxiliary Networks in Continual Learning. Highlight: In this work, we propose Auxiliary Network Continual Learning (ANCL), a novel method that applies an additional auxiliary network promoting plasticity to the continually learned model, which mainly focuses on stability. | Sanghwan Kim; Lorenzo Noci; Antonio Orvieto; Thomas Hofmann
25 | Power Bundle Adjustment for Large-Scale 3D Reconstruction. Highlight: We introduce Power Bundle Adjustment as an expansion-type algorithm for solving large-scale bundle adjustment problems. | Simon Weber; Nikolaus Demmel; Tin Chon Chan; Daniel Cremers
26 | Picture That Sketch: Photorealistic Image Generation From Abstract Sketches. Highlight: Our contribution at the outset is a decoupled encoder-decoder training paradigm, where the decoder is a StyleGAN trained on photos only. | Subhadeep Koley; Ayan Kumar Bhunia; Aneeshan Sain; Pinaki Nath Chowdhury; Tao Xiang; Yi-Zhe Song
27 | Contrastive Semi-Supervised Learning for Underwater Image Restoration Via Reliable Bank. Highlight: In this work, we propose a mean-teacher-based Semi-supervised Underwater Image Restoration (Semi-UIR) framework to incorporate unlabeled data into network training. | Shirui Huang; Keyan Wang; Huan Liu; Jun Chen; Yunsong Li
28 | Video Event Restoration Based on Keyframes for Video Anomaly Detection. Highlight: To this end, we propose a novel U-shaped Swin Transformer Network with Dual Skip Connections (USTN-DSC) for video event restoration, where a cross-attention and a temporal upsampling residual skip connection are introduced to further assist in restoring complex static and dynamic motion object features in the video. | Zhiwei Yang; Jing Liu; Zhaoyang Wu; Peng Wu; Xiaotao Liu
29 | EcoTTA: Memory-Efficient Continual Test-Time Adaptation Via Self-Distilled Regularization. Highlight: This paper presents a simple yet effective approach that improves continual test-time adaptation (TTA) in a memory-efficient manner. | Junha Song; Jungsoo Lee; In So Kweon; Sungha Choi
30 | 3D-Aware Object Goal Navigation Via Simultaneous Exploration and Identification. Highlight: In this work, we propose a framework for the challenging 3D-aware ObjectNav based on two straightforward sub-policies. | Jiazhao Zhang; Liu Dai; Fanpeng Meng; Qingnan Fan; Xuelin Chen; Kai Xu; He Wang
31 | Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction. Highlight: Despite its better efficiency than voxel representation, it has difficulty describing the fine-grained 3D structure of a scene with a single plane. To address this, we propose a tri-perspective view (TPV) representation which accompanies BEV with two additional perpendicular planes. | Yuanhui Huang; Wenzhao Zheng; Yunpeng Zhang; Jie Zhou; Jiwen Lu
32 | Castling-ViT: Compressing Self-Attention Via Switching Towards Linear-Angular Attention at Vision Transformer Inference. Highlight: In this work, we ask an important research question: can ViTs learn both global and local context while being more efficient during inference? | Haoran You; Yunyang Xiong; Xiaoliang Dai; Bichen Wu; Peizhao Zhang; Haoqi Fan; Peter Vajda; Yingyan (Celine) Lin
33 | Shape, Pose, and Appearance From A Single Image Via Bootstrapped Radiance Field Inversion. Highlight: We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available. | Dario Pavllo; David Joseph Tan; Marie-Julie Rakotosaona; Federico Tombari
34 | Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples. Highlight: In this work, we propose and promote a more practical label-agnostic setting, where the hackers may exploit the protected data quite differently from the protectors. | Jiaming Zhang; Xingjun Ma; Qi Yi; Jitao Sang; Yu-Gang Jiang; Yaowei Wang; Changsheng Xu
35 | Rethinking Federated Learning With Domain Shift: A Prototype View. Highlight: In this paper, we propose Federated Prototypes Learning (FPL) for federated learning under domain shift. | Wenke Huang; Mang Ye; Zekun Shi; He Li; Bo Du
36 | NoPe-NeRF: Optimising Neural Radiance Field With No Pose Prior. Highlight: However, these methods still face difficulties during dramatic camera movement. We tackle this challenging problem by incorporating undistorted monocular depth priors. | Wenjing Bian; Zirui Wang; Kejie Li; Jia-Wang Bian; Victor Adrian Prisacariu
37 | HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation. Highlight: In this work, we propose a novel hierarchical grouping transformer (HGFormer) to explicitly group pixels to form part-level masks and then whole-level masks. | Jian Ding; Nan Xue; Gui-Song Xia; Bernt Schiele; Dengxin Dai
38 | Distilling Vision-Language Pre-Training To Collaborate With Weakly-Supervised Temporal Action Localization. Highlight: To tackle this issue without additional annotations, this paper considers distilling free action knowledge from Vision-Language Pre-training (VLP), as we surprisingly observe that the localization results of vanilla VLP have an over-complete issue, which is exactly complementary to the CBP results. To fuse such complementarity, we propose a novel distillation-collaboration framework with two branches acting as CBP and VLP respectively. | Chen Ju; Kunhao Zheng; Jinxiang Liu; Peisen Zhao; Ya Zhang; Jianlong Chang; Qi Tian; Yanfeng Wang
39 | Augmentation Matters: A Simple-Yet-Effective Approach to Semi-Supervised Semantic Segmentation. Highlight: Differently, in this work, we follow a standard teacher-student framework and propose AugSeg, a simple and clean approach that focuses mainly on data perturbations to boost SSS performance. | Zhen Zhao; Lihe Yang; Sifan Long; Jimin Pi; Luping Zhou; Jingdong Wang
40 | SIEDOB: Semantic Image Editing By Disentangling Object and Background. Highlight: Consequently, they remain limited in processing content-rich images and suffer from generating unrealistic objects and texture-inconsistent backgrounds. To address this issue, we propose a novel paradigm, Semantic Image Editing by Disentangling Object and Background (SIEDOB), the core idea of which is to explicitly leverage several heterogeneous subnetworks for objects and backgrounds. | Wuyang Luo; Su Yang; Xinjian Zhang; Weishan Zhang
41 | Multiclass Confidence and Localization Calibration for Object Detection. Highlight: In this paper, we propose a new train-time technique for calibrating modern object detection methods. | Bimsara Pathiraja; Malitha Gunawardhana; Muhammad Haris Khan
42 | Query-Dependent Video Representation for Moment Retrieval and Highlight Detection. Highlight: For example, the relevance between the text query and video contents is sometimes neglected when predicting the moment and its saliency. To tackle this issue, we introduce Query-Dependent DETR (QD-DETR), a detection transformer tailored for MR/HD. | WonJun Moon; Sangeek Hyun; SangUk Park; Dongchan Park; Jae-Pil Heo
43 | Robust 3D Shape Classification Via Non-Local Graph Attention Network. Highlight: We introduce a non-local graph attention network (NLGAT), which generates a novel global descriptor through two sub-networks for robust 3D shape classification. | Shengwei Qin; Zhong Li; Ligang Liu
44 | Boosting Verified Training for Robust Image Classifications Via Abstraction. Highlight: This paper proposes a novel, abstraction-based, certified training method for robust image classifiers. | Zhaodi Zhang; Zhiyi Xue; Yang Chen; Si Liu; Yueling Zhang; Jing Liu; Min Zhang
45 | Exploring Structured Semantic Prior for Multi Label Recognition With Incomplete Labels. Highlight: In this paper, we advocate remedying the deficiency of label supervision for MLR with incomplete labels by deriving a structured semantic prior about the label-to-label correspondence via a semantic prior prompter. | Zixuan Ding; Ao Wang; Hui Chen; Qiang Zhang; Pengzhang Liu; Yongjun Bao; Weipeng Yan; Jungong Han
46 | Instance-Specific and Model-Adaptive Supervision for Semi-Supervised Semantic Segmentation. Highlight: In this paper, we emphasize the importance of instance differences and propose instance-specific and model-adaptive supervision for semi-supervised semantic segmentation, named iMAS. | Zhen Zhao; Sifan Long; Jimin Pi; Jingdong Wang; Luping Zhou
47 | 3D Shape Reconstruction of Semi-Transparent Worms. Highlight: This approach is not viable when the subject is semi-transparent and moving in and out of focus. Here we overcome these challenges by rendering a candidate shape with adaptive blurring and transparency for comparison with the images. | Thomas P. Ilett; Omer Yuval; Thomas Ranner; Netta Cohen; David C. Hogg
48 | Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection With Single Point Supervision. Highlight: Interestingly, during the training phase supervised by point labels, we discover that CNNs first learn to segment a cluster of pixels near the targets, and then gradually converge to predict the ground-truth point labels. Motivated by this "mapping degeneration" phenomenon, we propose a label evolution framework named Label Evolution with Single Point Supervision (LESPS) to progressively expand the point label by leveraging the intermediate predictions of CNNs. | Xinyi Ying; Li Liu; Yingqian Wang; Ruojing Li; Nuo Chen; Zaiping Lin; Weidong Sheng; Shilin Zhou
49 | Swept-Angle Synthetic Wavelength Interferometry. Highlight: We present a new imaging technique, swept-angle synthetic wavelength interferometry, for full-field micron-scale 3D sensing. | Alankar Kotwal; Anat Levin; Ioannis Gkioulekas
50 | Delving Into Shape-Aware Zero-Shot Semantic Segmentation. Highlight: However, translating this success to semantic segmentation is not trivial, because this dense prediction task requires not only accurate semantic understanding but also fine shape delineation, and existing vision-language models are trained with image-level language descriptions. To bridge this gap, we pursue shape-aware zero-shot semantic segmentation in this study. | Xinyu Liu; Beiwen Tian; Zhen Wang; Rui Wang; Kehua Sheng; Bo Zhang; Hao Zhao; Guyue Zhou
51 | Post-Training Quantization on Diffusion Models. Highlight: In this work, we accelerate generation from the perspective of compressing the noise estimation network. | Yuzhang Shang; Zhihang Yuan; Bin Xie; Bingzhe Wu; Yan Yan
52 | Adaptive Global Decay Process for Event Cameras. Highlight: We instead propose a novel decay process for event cameras that adapts to the global scene dynamics and whose latency is on the order of nanoseconds. | Urbano Miguel Nunes; Ryad Benosman; Sio-Hoi Ieng
53 | Multi-Space Neural Radiance Fields. Highlight: Instead of calculating a single radiance field, we propose a multi-space neural radiance field (MS-NeRF) that represents the scene using a group of feature fields in parallel sub-spaces, which leads to a better understanding by the neural network of reflective and refractive objects. | Ze-Xin Yin; Jiaxiong Qiu; Ming-Ming Cheng; Bo Ren
54 | Leveraging Inter-Rater Agreement for Classification in The Presence of Noisy Labels. Highlight: In this work, we: (i) show how to leverage inter-annotator statistics to estimate the noise distribution to which labels are subject; (ii) introduce methods that use the estimated noise distribution to learn from the noisy dataset; and (iii) establish generalization bounds in the empirical risk minimization framework that depend on the estimated quantities. | Maria Sofia Bucarelli; Lucas Cassano; Federico Siciliano; Amin Mantrach; Fabrizio Silvestri
55 | Bitstream-Corrupted JPEG Images Are Restorable: Two-Stage Compensation and Alignment Framework for Image Restoration. Highlight: In this paper, we study a real-world JPEG image restoration problem with bit errors on the encrypted bitstream. | Wenyang Liu; Yi Wang; Kim-Hui Yap; Lap-Pui Chau
56 | Analyzing Physical Impacts Using Transient Surface Wave Imaging. Highlight: In this paper, we extract information from the transient surface vibrations simultaneously measured at a sparse set of object points using the dual-shutter camera described by Sheinin [31]. | Tianyuan Zhang; Mark Sheinin; Dorian Chan; Mark Rau; Matthew O’Toole; Srinivasa G. Narasimhan
57 | X-Pruner: EXplainable Pruning for Vision Transformers. Highlight: Recent studies have proposed pruning transformers in an unexplainable manner, overlooking the relationship between internal units of the model and the target class, thereby leading to inferior performance. To alleviate this problem, we propose a novel explainable pruning framework dubbed X-Pruner, which is designed by considering the explainability of the pruning criterion. | Lu Yu; Wei Xiang
58 | Hard Sample Matters A Lot in Zero-Shot Quantization. Highlight: Accordingly, quantized models obtained by these methods suffer from significant performance degradation on hard samples. To address this issue, we propose HArd sample Synthesizing and Training (HAST). | Huantong Li; Xiangmiao Wu; Fanbing Lv; Daihai Liao; Thomas H. Li; Yonggang Zhang; Bo Han; Mingkui Tan
59 | Meta Compositional Referring Expression Segmentation. Highlight: In this work, through the lens of meta learning, we propose a Meta Compositional Referring Expression Segmentation (MCRES) framework to enhance model compositional generalization performance. | Li Xu; Mark He Huang; Xindi Shang; Zehuan Yuan; Ying Sun; Jun Liu
60 | Histopathology Whole Slide Image Analysis With Heterogeneous Graph Representation Learning. Highlight: We propose a novel heterogeneous graph-based framework to leverage the inter-relationships among different types of nuclei for WSI analysis. | Tsai Hor Chan; Fernando Julio Cendra; Lan Ma; Guosheng Yin; Lequan Yu
61 | ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images. Highlight: In this paper, we present a scanpath prediction method for 360° images by designing a novel Deep Markov Model (DMM) architecture, namely ScanDMM. | Xiangjie Sui; Yuming Fang; Hanwei Zhu; Shiqi Wang; Zhou Wang
62 | Towards All-in-One Pre-Training Via Maximizing Multi-Modal Mutual Information. Highlight: In this paper, we first propose a general multi-modal mutual information formula as a unified optimization target and demonstrate that all mainstream approaches are special cases of our framework. Under this unified perspective, we propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training). | Weijie Su; Xizhou Zhu; Chenxin Tao; Lewei Lu; Bin Li; Gao Huang; Yu Qiao; Xiaogang Wang; Jie Zhou; Jifeng Dai
63 | Aligning Bag of Regions for Open-Vocabulary Object Detection. Highlight: In this work, we propose to align the embedding of a bag of regions beyond individual regions. | Size Wu; Wenwei Zhang; Sheng Jin; Wentao Liu; Chen Change Loy
64 | Two-View Geometry Scoring Without Correspondences. Highlight: As a remedy, we propose the Fundamental Scoring Network (FSNet), which infers a score for a pair of overlapping images and any proposed fundamental matrix. | Axel Barroso-Laguna; Eric Brachmann; Victor Adrian Prisacariu; Gabriel J. Brostow; Daniyar Turmukhambetov
65 | Annealing-Based Label-Transfer Learning for Open World Object Detection. Highlight: In this paper, we claim that the learning of object detection can be seen as an object-level feature-entanglement process, where unknown traits are propagated to the known proposals through convolutional operations and can be distilled to benefit unknown recognition without manual selection. | Yuqing Ma; Hainan Li; Zhange Zhang; Jinyang Guo; Shanghang Zhang; Ruihao Gong; Xianglong Liu
66 | Continual Semantic Segmentation With Automatic Memory Sample Selection. Highlight: In this work, we propose a novel memory sample selection mechanism that selects informative samples for effective replay in a fully automatic way by considering comprehensive factors, including sample diversity and class performance. | Lanyun Zhu; Tianrun Chen; Jianxiong Yin; Simon See; Jun Liu
67 | Meta-Tuning Loss Functions and Data Augmentation for Few-Shot Object Detection. Highlight: Despite their simplicity, fine-tuning based approaches typically yield competitive detection results. Based on this observation, we focus on the role of loss functions and augmentations as the force driving the fine-tuning process, and propose to tune their dynamics through meta-learning principles. | Berkan Demirel; Orhun Buğra Baran; Ramazan Gokberk Cinbis
68 | A Light Weight Model for Active Speaker Detection. Highlight: Although these methods have achieved excellent performance, their high memory and computational power consumption render their application to resource-limited scenarios difficult. Therefore, in this study, a lightweight active speaker detection architecture is constructed by reducing the number of input candidates, splitting 2D and 3D convolutions for audio-visual feature extraction, and applying gated recurrent units with low computational complexity for cross-modal modeling. | Junhua Liao; Haihan Duan; Kanghui Feng; Wanbing Zhao; Yanbing Yang; Liangyin Chen
69 | Self-Supervised Video Forensics By Audio-Visual Anomaly Detection. Highlight: Manipulated videos often contain subtle inconsistencies between their visual and audio signals. We propose a video forensics method, based on anomaly detection, that can identify these inconsistencies, and that can be trained solely using real, unlabeled data. |
Chao Feng; Ziyang Chen; Andrew Owens; |
70 | CLIP2Scene: Towards Label-Efficient 3D Scene Understanding By CLIP Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose CLIP2Scene, a simple yet effective framework that transfers CLIP knowledge from 2D image-text pre-trained models to a 3D point cloud network. |
Runnan Chen; Youquan Liu; Lingdong Kong; Xinge Zhu; Yuexin Ma; Yikang Li; Yuenan Hou; Yu Qiao; Wenping Wang; |
71 | GCFAgg: Global and Cross-View Feature Aggregation for Multi-View Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing deep clustering methods learn consensus representation or view-specific representations from multiple views via view-wise aggregation way, where they ignore structure relationship of all samples. In this paper, we propose a novel multi-view clustering network to address these problems, called Global and Cross-view Feature Aggregation for Multi-View Clustering (GCFAggMVC). |
Weiqing Yan; Yuanyang Zhang; Chenlei Lv; Chang Tang; Guanghui Yue; Liang Liao; Weisi Lin; |
72 | Class Balanced Adaptive Pseudo Labeling for Federated Semi-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, problems in FSSL are still yet to be solved. To seek for a fundamental solution to this problem, we present Class Balanced Adaptive Pseudo Labeling (CBAFed), to study FSSL from the perspective of pseudo labeling. |
Ming Li; Qingli Li; Yan Wang; |
73 | Rethinking Out-of-Distribution (OOD) Detection: Masked Image Modeling Is All You Need Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we find surprisingly that simply using reconstruction-based methods could boost the performance of OOD detection significantly. |
Jingyao Li; Pengguang Chen; Zexin He; Shaozuo Yu; Shu Liu; Jiaya Jia; |
74 | DeGPR: Deep Guided Posterior Regularization for Multi-Class Cell Detection and Counting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, we propose guided posterior regularization DeGPR, which assists an object detector by guiding it to exploit discriminative features among cells. |
Aayush Kumar Tyagi; Chirag Mohapatra; Prasenjit Das; Govind Makharia; Lalita Mehra; Prathosh AP; Mausam; |
75 | Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the trend of large-scale unsupervised learning in 3D has yet to emerge due to two stumbling blocks: the inefficiency of matching RGB-D frames as contrastive views and the annoying mode collapse phenomenon mentioned in previous works. Turning the two stumbling blocks into empirical stepping stones, we first propose an efficient and effective contrastive learning framework, which generates contrastive views directly on scene-level point clouds by a well-curated data augmentation pipeline and a practical view mixing strategy. Second, we introduce reconstructive learning on the contrastive learning framework with an exquisite design of contrastive cross masks, which targets the reconstruction of point color and surfel normal. |
Xiaoyang Wu; Xin Wen; Xihui Liu; Hengshuang Zhao; |
76 | Multi Domain Learning for Motion Magnification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The deep-learning-based approach has higher magnification but is prone to severe artifacts in some scenarios. We propose a new phase-based deep network for video motion magnification that operates in both the frequency and spatial domains to address this issue. |
Jasdeep Singh; Subrahmanyam Murala; G. Sankara Raju Kosuru; |
77 | LOGO: A Long-Form Video Dataset for Group Action Quality Assessment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most existing methods and datasets focus on single-person short-sequence scenes, hindering the application of AQA in more complex situations. To address this issue, we construct a new multi-person long-form video dataset for action quality assessment named LOGO. |
Shiyi Zhang; Wenxun Dai; Sujia Wang; Xiangwei Shen; Jiwen Lu; Jie Zhou; Yansong Tang; |
78 | A Simple Baseline for Video Restoration With Grouped Spatial-Temporal Shift Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose a simple yet effective framework for video restoration. |
Dasong Li; Xiaoyu Shi; Yi Zhang; Ka Chun Cheung; Simon See; Xiaogang Wang; Hongwei Qin; Hongsheng Li; |
79 | UniSim: A Neural Closed-Loop Sensor Simulator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present UniSim, a neural sensor simulator that takes a single recorded log captured by a sensor-equipped vehicle and converts it into a realistic closed-loop multi-sensor simulation. |
Ze Yang; Yun Chen; Jingkang Wang; Sivabalan Manivasagam; Wei-Chiu Ma; Anqi Joyce Yang; Raquel Urtasun; |
80 | ItKD: Interchange Transfer-Based Knowledge Distillation for 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To learn the map-view feature of a teacher network, the features from teacher and student networks are independently passed through the shared autoencoder; here, we use a compressed representation loss that binds the channel-wised compression knowledge from both student and teacher networks as a kind of regularization. |
Hyeon Cho; Junyong Choi; Geonwoo Baek; Wonjun Hwang; |
81 | SliceMatch: Geometry-Guided Aggregation for Cross-View Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose SliceMatch, which consists of ground and aerial feature extractors, feature aggregators, and a pose predictor. |
Ted Lentsch; Zimin Xia; Holger Caesar; Julian F. P. Kooij; |
82 | 2PCNet: Two-Phase Consistency Training for Day-to-Night Unsupervised Domain Adaptive Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address errors that arise from low-light regions and other night-related attributes in images, we propose a night-specific augmentation pipeline called NightAug. |
Mikhail Kennerley; Jian-Gang Wang; Bharadwaj Veeravalli; Robby T. Tan; |
83 | Prefix Conditioning Unifies Language and Label Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study a pretraining strategy that uses both classification and caption datasets to unite their complementary benefits. |
Kuniaki Saito; Kihyuk Sohn; Xiang Zhang; Chun-Liang Li; Chen-Yu Lee; Kate Saenko; Tomas Pfister; |
84 | Panoptic Lifting for 3D Scene Understanding With Neural Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Panoptic Lifting, a novel approach for learning panoptic 3D volumetric representations from images of in-the-wild scenes. |
Yawar Siddiqui; Lorenzo Porzi; Samuel Rota Bulò; Norman Müller; Matthias Nießner; Angela Dai; Peter Kontschieder; |
85 | WeatherStream: Light Transport Automation of Single Image Deweathering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce WeatherStream, an automatic pipeline capturing all real-world weather effects (rain, snow, and rain fog degradations), along with their clean image pairs. |
Howard Zhang; Yunhao Ba; Ethan Yang; Varan Mehra; Blake Gella; Akira Suzuki; Arnold Pfahnl; Chethan Chinder Chandrappa; Alex Wong; Achuta Kadambi; |
86 | Learning To Detect Mirrors From Videos Via Dual Correspondences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our observation is that there are often correspondences between the contents inside (reflected) and outside (real) of a mirror, but such correspondences may not always appear in every frame, e.g., due to the change of camera pose. This inspires us to propose a video mirror detection method, named VMD-Net, that can tolerate spatially missing correspondences by considering the mirror correspondences at both the intra-frame level as well as inter-frame level via a dual correspondence module that looks over multiple frames spatially and temporally for correlating correspondences. |
Jiaying Lin; Xin Tan; Rynson W.H. Lau; |
87 | Single View Scene Scale Estimation Using Scale Field Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a single image scale estimation method based on a novel scale field representation. |
Byeong-Uk Lee; Jianming Zhang; Yannick Hold-Geoffroy; In So Kweon; |
88 | Learning Semantic-Aware Disentangled Representation for Flexible 3D Human Body Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a human body representation with fine-grained semantics and high reconstruction-accuracy in an unsupervised setting. |
Xiaokun Sun; Qiao Feng; Xiongzheng Li; Jinsong Zhang; Yu-Kun Lai; Jingyu Yang; Kun Li; |
89 | Generating Features With Increased Crop-Related Diversity for Few-Shot Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Training a robust classifier against this crop-related variability requires abundant training data, which is not available in few-shot settings. To mitigate this issue, we propose a novel variational autoencoder (VAE) based data generation model, which is capable of generating data with increased crop-related diversity. |
Jingyi Xu; Hieu Le; Dimitris Samaras; |
90 | Towards Scalable Neural Representation for Diverse Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We first show that instead of dividing videos into small subsets and encoding them with separate models, encoding long and diverse videos jointly with a unified model achieves better compression results. Based on this observation, we propose D-NeRV, a novel neural representation framework designed to encode diverse videos by (i) decoupling clip-specific visual content from motion information, (ii) introducing temporal reasoning into the implicit neural network, and (iii) employing the task-oriented flow as intermediate output to reduce spatial redundancies. |
Bo He; Xitong Yang; Hanyu Wang; Zuxuan Wu; Hao Chen; Shuaiyi Huang; Yixuan Ren; Ser-Nam Lim; Abhinav Shrivastava; |
91 | The Devil Is in The Points: Weakly Semi-Supervised Instance Segmentation Via Point-Guided Mask Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel learning scheme named weakly semi-supervised instance segmentation (WSSIS) with point labels for budget-efficient and high-performance instance segmentation. |
Beomyoung Kim; Joonhyun Jeong; Dongyoon Han; Sung Ju Hwang; |
92 | Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we first propose a novel method for generating composite adversarial examples. Our method can find the optimal attack composition by utilizing component-wise projected gradient descent and automatic attack-order scheduling. We then propose generalized adversarial training (GAT) to extend model robustness from Lp-ball to composite semantic perturbations, such as the combination of Hue, Saturation, Brightness, Contrast, and Rotation. |
Lei Hsiung; Yun-Yun Tsai; Pin-Yu Chen; Tsung-Yi Ho; |
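The highlight above mentions finding attack compositions via component-wise projected gradient descent. As a generic illustration only (not the authors' GAT implementation), the sketch below runs projected gradient ascent on a toy differentiable loss with two semantic perturbation components, each projected back into its own bound; the loss, component names, and bounds are assumptions for illustration.

```python
# Generic component-wise projected gradient ascent on a toy loss.
# NOT the paper's method: loss, components, and bounds are illustrative.

def clip(v, lo, hi):
    """Project a scalar back into its allowed interval."""
    return max(lo, min(hi, v))

def toy_loss(brightness, contrast):
    # Stand-in for a model's loss on a semantically perturbed input;
    # maximized at brightness=0.3, contrast=0.8.
    return -(brightness - 0.3) ** 2 - (contrast - 0.8) ** 2

def grad(f, b, c, eps=1e-5):
    # Finite-difference gradient of the toy loss.
    db = (f(b + eps, c) - f(b - eps, c)) / (2 * eps)
    dc = (f(b, c + eps) - f(b, c - eps)) / (2 * eps)
    return db, dc

def componentwise_pga(steps=100, lr=0.1, bounds=((-1.0, 1.0), (0.5, 1.5))):
    b, c = 0.0, 1.0  # start from the unperturbed point
    for _ in range(steps):
        db, dc = grad(toy_loss, b, c)
        # Update and project each perturbation component independently.
        b = clip(b + lr * db, *bounds[0])
        c = clip(c + lr * dc, *bounds[1])
    return b, c
```

In a real attack the loss would be a network's classification loss and each component a differentiable image transform (hue, rotation, etc.), with the attack order itself scheduled as the paper describes.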
93 | Language-Guided Audio-Visual Source Separation Via Trimodal Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data. |
Reuben Tan; Arijit Ray; Andrea Burns; Bryan A. Plummer; Justin Salamon; Oriol Nieto; Bryan Russell; Kate Saenko; |
94 | CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition With Variational Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel contrastive visual-textual transformation for SLR, CVT-SLR, to fully explore the pretrained knowledge of both the visual and language modalities. |
Jiangbin Zheng; Yile Wang; Cheng Tan; Siyuan Li; Ge Wang; Jun Xia; Yidong Chen; Stan Z. Li; |
95 | DynaMask: Dynamic Mask Selection for Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to dynamically select suitable masks for different object proposals. |
Ruihuang Li; Chenhang He; Shuai Li; Yabin Zhang; Lei Zhang; |
96 | Paint By Example: Exemplar-Based Image Editing With Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate exemplar-guided image editing for more precise control. |
Binxin Yang; Shuyang Gu; Bo Zhang; Ting Zhang; Xuejin Chen; Xiaoyan Sun; Dong Chen; Fang Wen; |
97 | Ego-Body Pose Estimation Via Ego-Head Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To eliminate the need for paired egocentric video and human motions, we propose a new method, Ego-Body Pose Estimation via Ego-Head Pose Estimation (EgoEgo), which decomposes the problem into two stages, connected by the head motion as an intermediate representation. |
Jiaman Li; Karen Liu; Jiajun Wu; |
98 | SAP-DETR: Bridging The Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To bridge the gap between the reference points of salient queries and Transformer detectors, we propose SAlient Point-based DETR (SAP-DETR) by treating object detection as a transformation from salient points to instance objects. |
Yang Liu; Yao Zhang; Yixin Wang; Yang Zhang; Jiang Tian; Zhongchao Shi; Jianping Fan; Zhiqiang He; |
99 | GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast to previous 3D MAE frameworks, which either design a complex decoder to infer masked information from maintained regions or adopt sophisticated masking strategies, we instead propose a much simpler paradigm. |
Honghui Yang; Tong He; Jiaheng Liu; Hua Chen; Boxi Wu; Binbin Lin; Xiaofei He; Wanli Ouyang; |
100 | Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel framework to capture more fine-grained clues in complex scenarios for tampered text detection, termed as Document Tampering Detector (DTD), which consists of a Frequency Perception Head (FPH) to compensate the deficiencies caused by the inconspicuous visual features, and a Multi-view Iterative Decoder (MID) for fully utilizing the information of features in different scales. |
Chenfan Qu; Chongyu Liu; Yuliang Liu; Xinhong Chen; Dezhi Peng; Fengjun Guo; Lianwen Jin; |
101 | Learning Rotation-Equivariant Features for Visual Correspondence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a self-supervised learning framework to extract discriminative rotation-invariant descriptors using group-equivariant CNNs. |
Jongmin Lee; Byungjin Kim; Seungwook Kim; Minsu Cho; |
102 | DexArt: Benchmarking Generalizable Dexterous Manipulation With Articulated Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a new benchmark called DexArt, which involves Dexterous manipulation with Articulated objects in a physical simulator. |
Chen Bao; Helin Xu; Yuzhe Qin; Xiaolong Wang; |
103 | DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose an improved model called DeSTSeg, which integrates a pre-trained teacher network, a denoising student encoder-decoder, and a segmentation network into one framework. |
Xuan Zhang; Shiyu Li; Xi Li; Ping Huang; Jiulong Shan; Ting Chen; |
104 | Neural Rate Estimator and Unsupervised Learning for Efficient Distributed Image Analytics in Split-DNN Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most works on compression for image analytics use heuristic approaches to estimate the rate, leading to suboptimal performance. We propose a high-quality ‘neural rate-estimator’ to address this gap. |
Nilesh Ahuja; Parual Datta; Bhavya Kanzariya; V. Srinivasa Somayazulu; Omesh Tickoo; |
105 | Object Pop-Up: Can We Infer 3D Objects and Their Poses From Human Interactions Alone? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the inverse perspective is significantly less explored: Can we infer 3D objects and their poses from human interactions alone? Our investigation follows this direction, showing that a generic 3D human point cloud is enough to pop up an unobserved object, even when the user is just imitating a functionality (e.g., looking through a binocular) without involving a tangible counterpart. |
Ilya A. Petrov; Riccardo Marin; Julian Chibane; Gerard Pons-Moll; |
106 | VoP: Text-Video Co-Operative Prompt Tuning for Cross-Modal Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the VoP: Text-Video Co-operative Prompt Tuning for efficient tuning on the text-video retrieval task. |
Siteng Huang; Biao Gong; Yulin Pan; Jianwen Jiang; Yiliang Lv; Yuyuan Li; Donglin Wang; |
107 | Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For the former, we propose a simple modification to the standard triplet loss, that explicitly enforces separation amongst photos/sketch instances. |
Aneeshan Sain; Ayan Kumar Bhunia; Subhadeep Koley; Pinaki Nath Chowdhury; Soumitri Chattopadhyay; Tao Xiang; Yi-Zhe Song; |
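The highlight above describes modifying the standard triplet loss to enforce separation among instances. The sketch below shows the standard triplet loss plus a hypothetical pairwise separation term; the `separation_loss` formulation and margins are assumptions for illustration, not the paper's exact modification.

```python
# Standard triplet loss plus a hypothetical instance-separation term.
# The separation term is an illustrative assumption, not the paper's loss.

def l2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the positive closer than the negative."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)

def separation_loss(instances, margin=0.5):
    """Hypothetical add-on: push same-modality instance embeddings apart."""
    loss, n = 0.0, 0
    for i in range(len(instances)):
        for j in range(i + 1, len(instances)):
            loss += max(0.0, margin - l2(instances[i], instances[j]))
            n += 1
    return loss / max(n, 1)
```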
108 | You Do Not Need Additional Priors or Regularizers in Retinex-Based Low-Light Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a contrastive learning method and a self-knowledge distillation method that allow training our Retinex-based model for Retinex decomposition without elaborate hand-crafted regularization functions. |
Huiyuan Fu; Wenkai Zheng; Xiangyu Meng; Xin Wang; Chuanming Wang; Huadong Ma; |
109 | PIP-Net: Patch-Based Intuitive Prototypes for Interpretable Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Driven by the principle of explainability-by-design, we introduce PIP-Net (Patch-based Intuitive Prototypes Network): an interpretable image classification model that learns prototypical parts in a self-supervised fashion which correlate better with human vision. |
Meike Nauta; Jörg Schlötterer; Maurice van Keulen; Christin Seifert; |
110 | SCADE: NeRFs from Space Carving With Ambiguity-Aware Depth Estimates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a well-known drawback of NeRFs is the less-than-ideal performance under a small number of views, due to insufficient constraints enforced by volumetric rendering. To address this issue, we introduce SCADE, a novel technique that improves NeRF reconstruction quality on sparse, unconstrained input views for in-the-wild indoor scenes. |
Mikaela Angelina Uy; Ricardo Martin-Brualla; Leonidas Guibas; Ke Li; |
111 | Re-Thinking Model Inversion Attacks Against Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we revisit MI, study two fundamental issues pertaining to all state-of-the-art (SOTA) MI algorithms, and propose solutions to these issues which lead to a significant boost in attack performance for all SOTA MI. |
Ngoc-Bao Nguyen; Keshigeyan Chandrasegaran; Milad Abdollahzadeh; Ngai-Man Cheung; |
112 | 1% VS 100%: Parameter-Efficient Low Rank Adapter for Dense Predictions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose LoRand, a method for fine-tuning large-scale vision models with a better trade-off between task performance and the number of trainable parameters. |
Dongshuo Yin; Yiran Yang; Zhechao Wang; Hongfeng Yu; Kaiwen Wei; Xian Sun; |
113 | ResFormer: Scaling ViTs With Multi-Resolution Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce, ResFormer, a framework that is built upon the seminal idea of multi-resolution training for improved performance on a wide spectrum of, mostly unseen, testing resolutions. |
Rui Tian; Zuxuan Wu; Qi Dai; Han Hu; Yu Qiao; Yu-Gang Jiang; |
114 | You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Ignoring early exiting in the encoder component is suboptimal in terms of saving computation. To handle this challenge, we propose MuE, a novel early exiting strategy for unified visual language models that dynamically skips layers in both the encoder and decoder, based on input layer-wise similarities, with multiple early exits. |
Shengkun Tang; Yaqing Wang; Zhenglun Kong; Tianchi Zhang; Yao Li; Caiwen Ding; Yanzhi Wang; Yi Liang; Dongkuan Xu; |
115 | CloSET: Modeling Clothed Humans on Continuous Surface With Explicit Template Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing learning-based methods typically add pose-dependent deformations upon a minimally-clothed mesh template or a learned implicit template, which have limitations in capturing details or hinder end-to-end learning. In this paper, we revisit point-based solutions and propose to decompose explicit garment-related templates and then add pose-dependent wrinkles to them. |
Hongwen Zhang; Siyou Lin; Ruizhi Shao; Yuxiang Zhang; Zerong Zheng; Han Huang; Yandong Guo; Yebin Liu; |
116 | BUOL: A Bottom-Up Framework With Occupancy-Aware Lifting for Panoptic 3D Scene Reconstruction From A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose BUOL, a Bottom-Up framework with Occupancy-aware Lifting to address the two issues for panoptic 3D scene reconstruction from a single image. |
Tao Chu; Pan Zhang; Qiong Liu; Jiaqi Wang; |
117 | Hierarchical Video-Moment Retrieval and Step-Captioning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such an end-to-end setup would allow for many interesting applications, e.g., a text-based search that finds a relevant video from a video corpus, extracts the most relevant moment from that video, and segments the moment into important steps with captions. To address this, we present the HiREST (HIerarchical REtrieval and STep-captioning) dataset and propose a new benchmark that covers hierarchical information retrieval and visual/textual stepwise summarization from an instructional video corpus. |
Abhay Zala; Jaemin Cho; Satwik Kottur; Xilun Chen; Barlas Oguz; Yashar Mehdad; Mohit Bansal; |
118 | PROB: Probabilistic Objectness for Open World Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Herein, we introduce a novel probabilistic framework for objectness estimation, where we alternate between probability distribution estimation and objectness likelihood maximization of known objects in the embedded feature space – ultimately allowing us to estimate the objectness probability of different proposals. |
Orr Zohar; Kuan-Chieh Wang; Serena Yeung; |
119 | PD-Quant: Post-Training Quantization Based on Prediction Difference Metric Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing methods attempt to determine these parameters by minimizing the distance between features before and after quantization, but such an approach considers only local information and may not yield optimal quantization parameters. We analyze this issue and propose PD-Quant, a method that addresses this limitation by considering global information. |
Jiawei Liu; Lin Niu; Zhihang Yuan; Dawei Yang; Xinggang Wang; Wenyu Liu; |
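The highlight above describes the "local" baseline that PD-Quant improves on: choosing quantization parameters by minimizing the distance between features before and after quantization. The sketch below illustrates that baseline with a grid search over step sizes under an MSE criterion; the search strategy and criterion are assumptions for illustration, not the paper's algorithm.

```python
# Minimal post-training quantization sketch: pick a step size (scale) by
# minimizing local feature reconstruction error. Illustrative only.

def quantize(xs, scale, n_bits=8):
    qmax = 2 ** (n_bits - 1) - 1
    out = []
    for x in xs:
        q = round(x / scale)
        q = max(-qmax - 1, min(qmax, q))  # clamp to the signed integer range
        out.append(q * scale)             # de-quantize back to float
    return out

def mse(xs, ys):
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def search_scale(features, candidates):
    """Return the candidate scale with the smallest reconstruction error."""
    return min(candidates, key=lambda s: mse(features, quantize(features, s)))
```

PD-Quant's point is that this purely local criterion ignores the effect of quantization on the network's final prediction, which its prediction-difference metric accounts for.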
120 | AUNet: Learning Relations Between Action Units for Face Forgery Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Observing that face manipulation may alter the relation between different facial action units (AU), we propose the Action Units Relation Learning framework to improve the generality of forgery detection. |
Weiming Bai; Yufan Liu; Zhipeng Zhang; Bing Li; Weiming Hu; |
121 | SparseFusion: Distilling View-Conditioned Diffusion for 3D Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SparseFusion, a sparse view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation. |
Zhizhuo Zhou; Shubham Tulsiani; |
122 | PolyFormer: Referring Image Segmentation As Sequential Polygon Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks. |
Jiang Liu; Hui Ding; Zhaowei Cai; Yuting Zhang; Ravi Kumar Satzoda; Vijay Mahadevan; R. Manmatha; |
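The highlight above notes that PolyFormer's predicted polygons are later converted into segmentation masks. The sketch below shows one generic way to do that conversion (per-pixel even-odd ray casting); it is a standard rasterization routine, not PolyFormer's code.

```python
# Generic polygon-to-mask rasterization via even-odd ray casting.
# Illustrates the mask-conversion step; not the paper's implementation.

def point_in_polygon(x, y, poly):
    """Even-odd rule: count crossings of a horizontal ray from (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the ray at height y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def polygon_to_mask(poly, width, height):
    """Binary mask: 1 where the pixel center falls inside the polygon."""
    return [[1 if point_in_polygon(c + 0.5, r + 0.5, poly) else 0
             for c in range(width)] for r in range(height)]
```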
123 | Seeing What You Miss: Vision-Language Pre-Training With Semantic Completion Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most of them pay little attention to the global semantic features generated for the masked data, resulting in a limited cross-modal alignment ability of global representations. Therefore, in this paper, we propose a novel Semantic Completion Learning (SCL) task, complementary to existing masked modeling tasks, to facilitate global-to-local alignment. |
Yatai Ji; Rongcheng Tu; Jie Jiang; Weijie Kong; Chengfei Cai; Wenzhe Zhao; Hongfa Wang; Yujiu Yang; Wei Liu; |
124 | Interactive Segmentation As Gaussian Process Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Albeit achieving promising performance, these methods do not fully and explicitly utilize and propagate the click information, inevitably leading to unsatisfactory segmentation results, even at clicked points. Against this issue, in this paper, we propose to formulate the IS task as a Gaussian process (GP)-based pixel-wise binary classification model on each image. |
Minghao Zhou; Hong Wang; Qian Zhao; Yuexiang Li; Yawen Huang; Deyu Meng; Yefeng Zheng; |
125 | Differentiable Shadow Mapping for Efficient Inverse Graphics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate at several inverse graphics problems that differentiable shadow maps are orders of magnitude faster than differentiable light transport simulation with similar accuracy — while differentiable rasterization without shadows often fails to converge. |
Markus Worchel; Marc Alexa; |
126 | Dynamic Focus-Aware Positional Queries for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned on the cross-attention scores from the preceding decoder block and the positional encodings for the corresponding image features, simultaneously. |
Haoyu He; Jianfei Cai; Zizheng Pan; Jing Liu; Jing Zhang; Dacheng Tao; Bohan Zhuang; |
127 | A Practical Stereo Depth System for Smart Glasses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the design of a productionized end-to-end stereo depth sensing system that does pre-processing, online stereo rectification, and stereo depth estimation with a fallback to monocular depth estimation when rectification is unreliable. |
Jialiang Wang; Daniel Scharstein; Akash Bapat; Kevin Blackburn-Matzen; Matthew Yu; Jonathan Lehman; Suhib Alsisan; Yanghan Wang; Sam Tsai; Jan-Michael Frahm; Zijian He; Peter Vajda; Michael F. Cohen; Matt Uyttendaele; |
128 | Understanding and Constructing Latent Modality Structures in Multi-Modal Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose three general approaches to construct latent modality structures. |
Qian Jiang; Changyou Chen; Han Zhao; Liqun Chen; Qing Ping; Son Dinh Tran; Yi Xu; Belinda Zeng; Trishul Chilimbi; |
129 | PointConvFormer: Revenge of The Point-Based Convolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce PointConvFormer, a novel building block for point cloud based deep network architectures. |
Wenxuan Wu; Li Fuxin; Qi Shan; |
130 | Instant Volumetric Head Avatars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Instant Volumetric Head Avatars (INSTA), a novel approach for reconstructing photo-realistic digital avatars instantaneously. |
Wojciech Zielonka; Timo Bolkart; Justus Thies; |
131 | HARP: Personalized Hand Reconstruction From A Monocular RGB Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present HARP (HAnd Reconstruction and Personalization), a personalized hand avatar creation approach that takes a short monocular RGB video of a human hand as input and reconstructs a faithful hand avatar exhibiting a high-fidelity appearance and geometry. |
Korrawe Karunratanakul; Sergey Prokudin; Otmar Hilliges; Siyu Tang; |
132 | Variational Distribution Learning for Unsupervised Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a text-to-image generation algorithm based on deep neural networks when text captions for images are unavailable during training. |
Minsoo Kang; Doyup Lee; Jiseob Kim; Saehoon Kim; Bohyung Han; |
133 | MetaMix: Towards Corruption-Robust Continual Learning With Temporally Self-Adaptive Data Transformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our empirical evaluation results show that existing state-of-the-art (SOTA) CL models are particularly vulnerable to various data corruptions during testing. To make them trustworthy and robust to corruptions when deployed in safety-critical scenarios, we propose a meta-learning framework of self-adaptive data augmentation to tackle the corruption robustness in CL. |
Zhenyi Wang; Li Shen; Donglin Zhan; Qiuling Suo; Yanjun Zhu; Tiehang Duan; Mingchen Gao; |
134 | Ultra-High Resolution Segmentation With Ultra-Rich Context: A Novel Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With the increasing interest and rapid development of methods for Ultra-High Resolution (UHR) segmentation, a large-scale benchmark covering a wide range of scenes with full fine-grained dense annotations is urgently needed to facilitate the field. To this end, the URUR dataset is introduced, short for Ultra-High Resolution dataset with Ultra-Rich Context. |
Deyi Ji; Feng Zhao; Hongtao Lu; Mingyuan Tao; Jieping Ye; |
135 | DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we first establish a surprisingly simple but strong benchmark for generalization which utilizes diverse augmentations within a training minibatch, and show that this can learn a more balanced distribution of features. Further, we propose Diversify-Aggregate-Repeat Training (DART) strategy that first trains diverse models using different augmentations (or domains) to explore the loss basin, and further Aggregates their weights to combine their expertise and obtain improved generalization. |
Samyak Jain; Sravanti Addepalli; Pawan Kumar Sahu; Priyam Dey; R. Venkatesh Babu; |
136 | Cross-Domain Image Captioning With Discriminative Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that fine-tuning an out-of-the-box neural captioner with a self-supervised discriminative communication objective helps to recover a plain, visually descriptive language that is more informative about image contents. |
Roberto Dessì; Michele Bevilacqua; Eleonora Gualdoni; Nathanaël Carraz Rakotonirina; Francesca Franzon; Marco Baroni; |
137 | Accelerating Vision-Language Pretraining With Free Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To accelerate the convergence of VLP, we propose a new pretraining task, namely, free language modeling (FLM), which enables a 100% prediction rate with arbitrary corruption rates. |
Teng Wang; Yixiao Ge; Feng Zheng; Ran Cheng; Ying Shan; Xiaohu Qie; Ping Luo; |
138 | Efficient Mask Correction for Click-Based Interactive Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose an efficient method to correct the mask with a lightweight mask correction network. |
Fei Du; Jianlong Yuan; Zhibin Wang; Fan Wang; |
139 | DBARF: Deep Bundle-Adjusting Generalizable Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we first analyze the difficulties of jointly optimizing camera poses with GeNeRFs, and then further propose our DBARF to tackle these issues. |
Yu Chen; Gim Hee Lee; |
140 | EvShutter: Transforming Events for Unconstrained Rolling Shutter Correction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a new method, called Eventful Shutter (EvShutter), that corrects RS artifacts using a single RGB image and event information with high temporal resolution. |
Julius Erbach; Stepan Tulyakov; Patricia Vitoria; Alfredo Bochicchio; Yuanyou Li; |
141 | Graphics Capsule: Learning Hierarchical 3D Face Representations From 2D Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an Inverse Graphics Capsule Network (IGC-Net) to learn the hierarchical 3D face representations from large-scale unlabeled images. |
Chang Yu; Xiangyu Zhu; Xiaomei Zhang; Zhaoxiang Zhang; Zhen Lei; |
142 | Connecting The Dots: Floorplan Reconstruction Using Two-Level Queries Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address 2D floorplan reconstruction from 3D scans. |
Yuanwen Yue; Theodora Kontogianni; Konrad Schindler; Francis Engelmann; |
143 | Analyzing and Diagnosing Pose Estimation With Attributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Pose Integrated Gradient (PoseIG), the first interpretability technique designed for pose estimation. |
Qiyuan He; Linlin Yang; Kerui Gu; Qiuxia Lin; Angela Yao; |
144 | Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We experimentally find that the root lies in two kinds of ambiguities: (1) Selection ambiguity that selected pseudo labels are less accurate, since classification scores cannot properly represent the localization quality. (2) Assignment ambiguity that samples are matched with improper labels in pseudo-label assignment, as the strategy is misguided by missed objects and inaccurate pseudo boxes. To tackle these problems, we propose an Ambiguity-Resistant Semi-supervised Learning (ARSL) for one-stage detectors. |
Chang Liu; Weiming Zhang; Xiangru Lin; Wei Zhang; Xiao Tan; Junyu Han; Xiaomao Li; Errui Ding; Jingdong Wang; |
145 | Scalable, Detailed and Mask-Free Universal Photometric Stereo Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce SDM-UniPS, a groundbreaking Scalable, Detailed, Mask-free, and Universal Photometric Stereo network. |
Satoshi Ikehata; |
146 | Towards High-Quality and Efficient Video Super-Resolution Via Spatial-Temporal Data Overfitting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To reconcile these goals, we propose a novel method for high-quality and efficient video resolution upscaling tasks, which leverages the spatial-temporal information to accurately divide video into chunks, thus keeping the number of chunks as well as the model size to a minimum. |
Gen Li; Jie Ji; Minghai Qin; Wei Niu; Bin Ren; Fatemeh Afghah; Linke Guo; Xiaolong Ma; |
147 | Make-a-Story: Visual Memory Conditioned Consistent Story Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we address the aforementioned challenges and propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context across the generated frames. |
Tanzila Rahman; Hsin-Ying Lee; Jian Ren; Sergey Tulyakov; Shweta Mahajan; Leonid Sigal; |
148 | BiFormer: Vision Transformer With Bi-Level Routing Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A series of works attempt to alleviate this problem by introducing handcrafted and content-agnostic sparsity into attention, such as restricting the attention operation to be inside local windows, axial stripes, or dilated windows. In contrast to these approaches, we propose a novel dynamic sparse attention via bi-level routing to enable a more flexible allocation of computations with content awareness. |
Lei Zhu; Xinjiang Wang; Zhanghan Ke; Wayne Zhang; Rynson W.H. Lau; |
149 | Masked Autoencoders Enable Efficient Knowledge Distillers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper studies the potential of distilling knowledge from pre-trained models, especially Masked Autoencoders. |
Yutong Bai; Zeyu Wang; Junfei Xiao; Chen Wei; Huiyu Wang; Alan L. Yuille; Yuyin Zhou; Cihang Xie; |
150 | TinyMIM: An Empirical Study of Distilling MIM Pre-Trained Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. |
Sucheng Ren; Fangyun Wei; Zheng Zhang; Han Hu; |
151 | Persistent Nature: A Generative Model of Unbounded 3D Worlds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the task of unconditionally synthesizing unbounded nature scenes, enabling arbitrarily large camera motion while maintaining a persistent 3D world model. |
Lucy Chai; Richard Tucker; Zhengqi Li; Phillip Isola; Noah Snavely; |
152 | OneFormer: One Transformer To Rule Universal Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To that end, we propose OneFormer, a universal image segmentation framework that unifies segmentation with a multi-task train-once design. |
Jitesh Jain; Jiachen Li; Mang Tik Chiu; Ali Hassani; Nikita Orlov; Humphrey Shi; |
153 | Hierarchical Neural Memory Network for Low Latency Event Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a low latency neural network architecture for event-based dense prediction tasks. |
Ryuhei Hamaguchi; Yasutaka Furukawa; Masaki Onishi; Ken Sakurada; |
154 | Finding Geometric Models By Clustering in The Consensus Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new algorithm for finding an unknown number of geometric models, e.g., homographies. |
Daniel Barath; Denys Rozumnyi; Ivan Eichhardt; Levente Hajder; Jiri Matas; |
155 | Leapfrog Diffusion Model for Stochastic Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To resolve the dilemma, we present LEapfrog Diffusion model (LED), a novel diffusion-based trajectory prediction model, which provides real-time, precise, and diverse predictions. |
Weibo Mao; Chenxin Xu; Qi Zhu; Siheng Chen; Yanfeng Wang; |
156 | DaFKD: Domain-Aware Federated Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new perspective that treats the local data in each client as a specific domain and design a novel domain knowledge aware federated distillation method, dubbed DaFKD, that can discern the importance of each model to the distillation sample, and thus is able to optimize the ensemble of soft predictions from diverse models. |
Haozhao Wang; Yichen Li; Wenchao Xu; Ruixuan Li; Yufeng Zhan; Zhigang Zeng; |
157 | GeoLayoutLM: Geometric Pre-Training for Visual Information Extraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, we reveal another factor that limits the performance of RE lies in the objective gap between the pre-training phase and the fine-tuning phase for RE. To tackle these issues, we propose in this paper a multi-modal framework, named GeoLayoutLM, for VIE. |
Chuwei Luo; Changxu Cheng; Qi Zheng; Cong Yao; |
158 | Class-Incremental Exemplar Compression for Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an adaptive mask generation model called class-incremental masking (CIM) to explicitly resolve two difficulties of using CAM: 1) transforming the heatmaps of CAM to 0-1 masks with an arbitrary threshold leads to a trade-off between the coverage on discriminative pixels and the quantity of exemplars, as the total memory is fixed; and 2) optimal thresholds vary for different object classes, which is particularly obvious in the dynamic environment of CIL. |
Zilin Luo; Yaoyao Liu; Bernt Schiele; Qianru Sun; |
159 | Boost Vision Transformer With GPU-Friendly Sparsity and Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper thoroughly designs a compression scheme to maximally utilize the GPU-friendly 2:4 fine-grained structured sparsity and quantization. |
Chong Yu; Tao Chen; Zhongxue Gan; Jiayuan Fan; |
160 | Spectral Bayesian Uncertainty for Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to quantify spectral Bayesian uncertainty in image SR. |
Tao Liu; Jun Cheng; Shan Tan; |
161 | Behind The Scenes: Density Fields for Single View Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As an alternative, we propose to predict an implicit density field from a single image. |
Felix Wimbauer; Nan Yang; Christian Rupprecht; Daniel Cremers; |
162 | StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there remains a challenge in controlling the hallucinations to accurately transfer hairstyle and preserve the face shape and identity of the input. To overcome this, we propose a multi-view optimization framework that uses "two different views" of reference composites to semantically guide occluded or ambiguous regions. |
Sasikarn Khwanmuang; Pakkapon Phongthawee; Patsorn Sangkloy; Supasorn Suwajanakorn; |
163 | Resource-Efficient RGBD Aerial Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, in this paper, we explore RGBD aerial tracking in an overhead space, which can greatly advance the development of drone-based visual perception. |
Jinyu Yang; Shang Gao; Zhe Li; Feng Zheng; Aleš Leonardis; |
164 | Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel multi-frame human pose estimation framework, which employs temporal differences across frames to model dynamic contexts and engages mutual information objectively to facilitate useful motion information disentanglement. |
Runyang Feng; Yixing Gao; Xueqing Ma; Tze Ho Elden Tse; Hyung Jin Chang; |
165 | Bilateral Memory Consolidation for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the way our brains constantly rewrite and consolidate past recollections, we propose a novel Bilateral Memory Consolidation (BiMeCo) framework that focuses on enhancing memory interaction capabilities. |
Xing Nie; Shixiong Xu; Xiyan Liu; Gaofeng Meng; Chunlei Huo; Shiming Xiang; |
166 | SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, for the first time, we study the potential of leveraging synthetic visual data for VSR. |
Xubo Liu; Egor Lakomkin; Konstantinos Vougioukas; Pingchuan Ma; Honglie Chen; Ruiming Xie; Morrie Doulaty; Niko Moritz; Jachym Kolar; Stavros Petridis; Maja Pantic; Christian Fuegen; |
167 | BiasBed – Rigorous Texture Bias Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate difficulties and limitations when training networks with reduced texture bias. |
Nikolai Kalischek; Rodrigo Caye Daudt; Torben Peters; Reinhard Furrer; Jan D. Wegner; Konrad Schindler; |
168 | Open-Category Human-Object Interaction Pre-Training Via Language Modeling Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current methods trained from closed-set data predict HOIs as fixed-dimension logits, which restricts their scalability to open-set categories. To address this issue, we introduce OpenCat, a language modeling framework that reformulates HOI prediction as sequence generation. |
Sipeng Zheng; Boshen Xu; Qin Jin; |
169 | SFD2: Semantic-Guided Feature Detection and Description Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead, we propose to extract globally reliable features by implicitly embedding high-level semantics into both the detection and description processes. |
Fei Xue; Ignas Budvytis; Roberto Cipolla; |
170 | Search-Map-Search: A Frame Selection Paradigm for Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the limitations of existing methods, we propose a Search-Map-Search learning paradigm which combines the advantages of heuristic search and supervised learning to select the best combination of frames from a video as one entity. |
Mingjun Zhao; Yakun Yu; Xiaoli Wang; Lei Yang; Di Niu; |
171 | Uncovering The Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This limitation inevitably hinders the accuracy of trajectory prediction. To address this issue, our paper presents a unified framework, the Graph-based Conditional Variational Recurrent Neural Network (GC-VRNN), which can perform trajectory imputation and prediction simultaneously. |
Yi Xu; Armin Bazarjani; Hyung-gun Chi; Chiho Choi; Yun Fu; |
172 | CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we leverage CLIP for zero-shot sketch-based image retrieval (ZS-SBIR). |
Aneeshan Sain; Ayan Kumar Bhunia; Pinaki Nath Chowdhury; Subhadeep Koley; Tao Xiang; Yi-Zhe Song; |
173 | FlexiViT: One Model for All Patch Sizes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. |
Lucas Beyer; Pavel Izmailov; Alexander Kolesnikov; Mathilde Caron; Simon Kornblith; Xiaohua Zhai; Matthias Minderer; Michael Tschannen; Ibrahim Alabdulmohsin; Filip Pavetic; |
174 | RIAV-MVS: Recurrent-Indexing An Asymmetric Volume for Multi-View Stereo Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a learning-based method for multi-view depth estimation from posed images. |
Changjiang Cai; Pan Ji; Qingan Yan; Yi Xu; |
175 | Structured Kernel Estimation for Photon-Limited Deconvolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new blur estimation technique customized for photon-limited conditions. |
Yash Sanghvi; Zhiyuan Mao; Stanley H. Chan; |
176 | Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we tackle supervised anomaly detection, i.e., we learn AD models using a few available anomalies with the objective to detect both the seen and unseen anomalies. |
Xincheng Yao; Ruoqi Li; Jing Zhang; Jun Sun; Chongyang Zhang; |
177 | 3D Video Loops From Asynchronous Input Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take a step forward and propose a practical solution that enables an immersive experience on dynamic 3D looping scenes. |
Li Ma; Xiaoyu Li; Jing Liao; Pedro V. Sander; |
178 | Style Projected Clustering for Domain Generalized Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast to existing methods, we instead utilize the difference between images to build a better representation space, where the distinct style features are extracted and stored as the bases of representation. Then, the generalization to unseen image styles is achieved by projecting features to this known space. |
Wei Huang; Chang Chen; Yong Li; Jiacheng Li; Cheng Li; Fenglong Song; Youliang Yan; Zhiwei Xiong; |
179 | DIP: Dual Incongruity Perceiving Network for Sarcasm Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Different from other multi-modal tasks, for the sarcastic data, there exists intrinsic incongruity between a pair of image and text as demonstrated in psychological theories. To tackle this issue, we propose a Dual Incongruity Perceiving (DIP) network consisting of two branches to mine the sarcastic information from factual and affective levels. |
Changsong Wen; Guoli Jia; Jufeng Yang; |
180 | Frame Interpolation Transformer and Uncertainty Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we are bridging the gap towards video production with a novel transformer-based interpolation network architecture capable of estimating the expected error together with the interpolated frame. |
Markus Plack; Karlis Martins Briedis; Abdelaziz Djelouah; Matthias B. Hullin; Markus Gross; Christopher Schroers; |
181 | Learning To Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Trained Visual-Semantic Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, two knotty obstacles limit the practicability of current SGG methods in real-world scenarios: 1) training SGG models requires time-consuming ground-truth annotations, and 2) the closed-set object categories make the SGG models limited in their ability to recognize novel objects outside of training corpora. To address these issues, we novelly exploit a powerful pre-trained visual-semantic space (VSS) to trigger language-supervised and open-vocabulary SGG in a simple yet effective manner. |
Yong Zhang; Yingwei Pan; Ting Yao; Rui Huang; Tao Mei; Chang-Wen Chen; |
182 | VectorFloorSeg: Two-Stream Graph Attention Network for Vectorized Roughcast Floorplan Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address semantic segmentation of a typical VG, i.e., roughcast floorplans with bare wall structures, whose output can be directly used for further applications like interior furnishing and room space modeling. |
Bingchen Yang; Haiyong Jiang; Hao Pan; Jun Xiao; |
183 | Neural Preset for Color Style Transfer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a Neural Preset technique to address the limitations of existing color style transfer methods, including visual artifacts, vast memory requirement, and slow style switching speed. |
Zhanghan Ke; Yuhao Liu; Lei Zhu; Nanxuan Zhao; Rynson W.H. Lau; |
184 | DeCo: Decomposition and Reconstruction for Compositional Temporal Grounding Via Coarse-To-Fine Contrastive Ranking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method to learn a coarse-to-fine compositional representation by decomposing the original query sentence into different granular levels, and then learning the correct correspondences between the video and recombined queries through a contrastive ranking constraint. |
Lijin Yang; Quan Kong; Hsuan-Kung Yang; Wadim Kehl; Yoichi Sato; Norimasa Kobori; |
185 | Dynamic Aggregated Network for Gait Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new perspective that actual gait features include global motion patterns in multiple key regions, and each global motion pattern is composed of a series of local motion patterns. |
Kang Ma; Ying Fu; Dezhi Zheng; Chunshui Cao; Xuecai Hu; Yongzhen Huang; |
186 | Wavelet Diffusion Models Are Fast and Scalable Image Generators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper aims to reduce the speed gap by proposing a novel wavelet-based diffusion scheme. |
Hao Phung; Quan Dao; Anh Tran; |
187 | PA&DA: Jointly Sampling Path and Data for Consistent NAS Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We further find that large gradient variance occurs during supernet training, which degrades the supernet ranking consistency. To mitigate this issue, we propose to explicitly minimize the gradient variance of the supernet training by jointly optimizing the sampling distributions of PAth and DAta (PA&DA). |
Shun Lu; Yu Hu; Longxing Yang; Zihao Sun; Jilin Mei; Jianchao Tan; Chengru Song; |
188 | Sphere-Guided Training of Neural Implicit Surfaces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These methods, however, apply the ray marching procedure for the entire scene volume, leading to reduced sampling efficiency and, as a result, lower reconstruction quality in the areas of high-frequency details. In this work, we address this problem via joint training of the implicit function and our new coarse sphere-based surface reconstruction. |
Andreea Dogaru; Andrei-Timotei Ardelean; Savva Ignatyev; Egor Zakharov; Evgeny Burnaev; |
189 | 3D Spatial Multimodal Knowledge Accumulation for Scene Graph Prediction in Point Cloud Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we find that the inherently hierarchical structures of physical space in 3D scenes aid in the automatic association of semantic and spatial arrangements, specifying clear patterns and leading to less ambiguous predictions. |
Mingtao Feng; Haoran Hou; Liang Zhang; Zijie Wu; Yulan Guo; Ajmal Mian; |
190 | Extracting Motion and Appearance Via Inter-Frame Attention for Efficient Video Frame Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new module to explicitly extract motion and appearance information via a unified operation. |
Guozhen Zhang; Yuhan Zhu; Haonan Wang; Youxin Chen; Gangshan Wu; Limin Wang; |
191 | Bias Mimicking: A Simple Sampling Approach for Bias Mitigation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, Undersampling drops a significant part of the input distribution per epoch while Oversampling repeats samples, causing overfitting. To address these shortcomings, we introduce a new class-conditioned sampling method: Bias Mimicking. |
Maan Qraitem; Kate Saenko; Bryan A. Plummer; |
192 | ViTs for SITS: Vision Transformers for Satellite Image Time Series Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we introduce the Temporo-Spatial Vision Transformer (TSViT), a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT). |
Michail Tarasiou; Erik Chavez; Stefanos Zafeiriou; |
193 | NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead of tuning the quantizer to better fit the complicated activation distribution, this paper proposes NoisyQuant, a quantizer-agnostic enhancement for the post-training activation quantization performance of vision transformers. |
Yijiang Liu; Huanrui Yang; Zhen Dong; Kurt Keutzer; Li Du; Shanghang Zhang; |
194 | Semi-Supervised Stereo-Based 3D Object Detection Via Cross-View Consensus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to achieve semi-supervised learning for stereo-based 3D object detection through pseudo annotation generation from a temporal-aggregated teacher model, which temporally accumulates knowledge from a student model. |
Wenhao Wu; Hau San Wong; Si Wu; |
195 | Minimizing Maximum Model Discrepancy for Transferable Black-Box Targeted Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the black-box targeted attack problem from the model discrepancy perspective. |
Anqi Zhao; Tong Chu; Yahao Liu; Wen Li; Jingjing Li; Lixin Duan; |
196 | Efficient Loss Function By Minimizing The Detrimental Effect of Floating-Point Errors on Gradient-Based Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Correspondingly, we propose an efficient loss function by minimizing the detrimental impact of the floating-point errors on the attacks. |
Yunrui Yu; Cheng-Zhong Xu; |
197 | BAD-NeRF: Bundle Adjusted Deblur Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a novel bundle adjusted deblur Neural Radiance Fields (BAD-NeRF), which can be robust to severe motion blurred images and inaccurate camera poses. |
Peng Wang; Lingzhe Zhao; Ruijie Ma; Peidong Liu; |
198 | Video Compression With Entropy-Constrained Neural Representations Highlight: We propose a novel convolutional architecture for video representation that better represents spatio-temporal information and a training strategy capable of jointly optimizing rate and distortion. |
Carlos Gomes; Roberto Azevedo; Christopher Schroers; |
199 | Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners Highlight: In this paper, we propose CaFo, a Cascade of Foundation models that incorporates diverse prior knowledge of various pre-training paradigms for better few-shot learning. |
Renrui Zhang; Xiangfei Hu; Bohao Li; Siyuan Huang; Hanqiu Deng; Yu Qiao; Peng Gao; Hongsheng Li; |
200 | Deep Random Projector: Accelerated Deep Image Prior Highlight: In this paper, we focus on IR, and propose two crucial modifications to DIP that help achieve substantial speedup: 1) optimizing the DIP seed while freezing randomly-initialized network weights, and 2) reducing the network depth. |
Taihui Li; Hengkang Wang; Zhong Zhuang; Ju Sun; |
201 | SCPNet: Semantic Scene Completion on Point Cloud Highlight: To address the above-mentioned problems, we propose the following three solutions: 1) Redesigning the completion network. |
Zhaoyang Xia; Youquan Liu; Xin Li; Xinge Zhu; Yuexin Ma; Yikang Li; Yuenan Hou; Yu Qiao; |
202 | Revisiting Prototypical Network for Cross Domain Few-Shot Learning Highlight: However, its performance drops dramatically when generalizing to the FSC tasks in new domains. In this study, we revisit this problem and argue that the devil lies in the simplicity bias pitfall in neural networks. |
Fei Zhou; Peng Wang; Lei Zhang; Wei Wei; Yanning Zhang; |
203 | QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation Highlight: In addition, there is an inherent asynchronous relationship between human speech and gestures. To tackle these challenges, we introduce a novel quantization-based and phase-guided motion matching framework. |
Sicheng Yang; Zhiyong Wu; Minglei Li; Zhensong Zhang; Lei Hao; Weihong Bao; Haolin Zhuang; |
204 | Multiscale Tensor Decomposition and Rendering Equation Encoding for View Synthesis Highlight: This paper aims to further advance the quality of view rendering by proposing a novel approach dubbed the neural radiance feature field (NRFF). |
Kang Han; Wei Xiang; |
205 | NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations Highlight: Hence, essential desiderata for models are to be data-efficient, generalize to different data distributions and tasks with unseen semantic forms, as well as ground complex language semantics (e.g., view-point anchoring and multi-object reference). To address these challenges, we propose NS3D, a neuro-symbolic framework for 3D grounding. |
Joy Hsu; Jiayuan Mao; Jiajun Wu; |
206 | Learning Accurate 3D Shape Based on Stereo Polarimetric Imaging Highlight: The accuracy of existing SfP methods is affected by two main problems. First, the ambiguity of polarization cues partially results in false normal estimation. Second, the widely-used assumption about orthographic projection is too ideal. To solve these problems, we propose the first approach that combines deep learning and stereo polarization information to recover not only normal but also disparity. |
Tianyu Huang; Haoang Li; Kejing He; Congying Sui; Bin Li; Yun-Hui Liu; |
207 | VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking Highlight: Specifically, we present a dual masking strategy for efficient pre-training, with an encoder operating on a subset of video tokens and a decoder processing another subset of video tokens. |
Limin Wang; Bingkun Huang; Zhiyu Zhao; Zhan Tong; Yinan He; Yi Wang; Yali Wang; Yu Qiao; |
208 | GANmouflage: 3D Object Nondetection With Texture Fields Highlight: We propose a method that learns to camouflage 3D objects within scenes. |
Rui Guo; Jasmine Collins; Oscar de Lima; Andrew Owens; |
209 | Perception and Semantic Aware Regularization for Sequential Confidence Calibration Highlight: In this work, we find tokens/sequences with high perception and semantic correlations with the target ones contain more correlated and effective information and thus facilitate more effective regularization. |
Zhenghua Peng; Yu Luo; Tianshui Chen; Keke Xu; Shuangping Huang; |
210 | Revisiting Residual Networks for Adversarial Robustness Highlight: In contrast, little attention was devoted to analyzing the role of architectural elements (e.g., topology, depth, and width) on adversarial robustness. This paper seeks to bridge this gap and present a holistic study on the impact of architectural design on adversarial robustness. |
Shihua Huang; Zhichao Lu; Kalyanmoy Deb; Vishnu Naresh Boddeti; |
211 | Vision Transformer With Super Token Sampling Highlight: A challenge then arises: can we access efficient and effective global context modeling at the early stages of a neural network? To address this issue, we draw inspiration from the design of superpixels, which reduces the number of image primitives in subsequent processing, and introduce super tokens into vision transformer. |
Huaibo Huang; Xiaoqiang Zhou; Jie Cao; Ran He; Tieniu Tan; |
212 | RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-Training Highlight: In this paper, we propose a novel and efficient framework: Retrieval Augmented Contrastive Language-Image Pre-training (RA-CLIP) to augment embeddings by online retrieval. |
Chen-Wei Xie; Siyang Sun; Xiong Xiong; Yun Zheng; Deli Zhao; Jingren Zhou; |
213 | PosterLayout: A New Benchmark and Approach for Content-Aware Visual-Textual Presentation Layout Highlight: Since content-aware visual-textual presentation layout is a novel task, we first construct a new dataset named PKU PosterLayout, which consists of 9,974 poster-layout pairs and 905 images, i.e., non-empty canvases. It is more challenging and useful, offering greater layout variety, domain diversity, and content diversity. Then, we propose design sequence formation (DSF), which reorganizes elements in layouts to imitate the design processes of human designers, and a novel CNN-LSTM-based conditional generative adversarial network (GAN) is presented to generate proper layouts. |
Hsiao Yuan Hsu; Xiangteng He; Yuxin Peng; Hao Kong; Qing Zhang; |
214 | A Practical Upper Bound for The Worst-Case Attribution Deviations Highlight: In this work, for the first time, a constrained optimization problem is formulated to derive an upper bound that measures the largest dissimilarity of attributions after the samples are perturbed by any noises within a certain region while the classification results remain the same. |
Fan Wang; Adams Wai-Kin Kong; |
215 | A General Regret Bound of Preconditioned Gradient Method for DNN Training Highlight: In this paper, we present a general regret bound with a constrained full-matrix preconditioned gradient and show that the updating formula of the preconditioner can be derived by solving a cone-constrained optimization problem. |
Hongwei Yong; Ying Sun; Lei Zhang; |
216 | Teacher-Generated Spatial-Attention Labels Boost Robustness and Accuracy of Contrastive Models Highlight: Yet, collecting such large-scale data is very expensive. To address this challenge, we construct an auxiliary teacher model to predict human attention, trained on a relatively small labeled dataset. |
Yushi Yao; Chang Ye; Junfeng He; Gamaleldin F. Elsayed; |
217 | Exploring and Exploiting Uncertainty for Incomplete Multi-View Classification Highlight: To explore and exploit the uncertainty, we propose an Uncertainty-induced Incomplete Multi-View Data Classification (UIMC) model to classify the incomplete multi-view data under a stable and reliable framework. |
Mengyao Xie; Zongbo Han; Changqing Zhang; Yichen Bai; Qinghua Hu; |
218 | Vid2Seq: Large-Scale Pretraining of A Visual Language Model for Dense Video Captioning Highlight: In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning model pretrained on narrated videos which are readily-available at scale. |
Antoine Yang; Arsha Nagrani; Paul Hongsuck Seo; Antoine Miech; Jordi Pont-Tuset; Ivan Laptev; Josef Sivic; Cordelia Schmid; |
219 | Optimal Proposal Learning for Deployable End-to-End Pedestrian Detection Highlight: Though a few methods have been explored, most of them still suffer from longer training time and more complex deployment, and so cannot be used in actual industrial applications. In this paper, we intend to bridge this gap and propose an Optimal Proposal Learning (OPL) framework for deployable end-to-end pedestrian detection. |
Xiaolin Song; Binghui Chen; Pengyu Li; Jun-Yan He; Biao Wang; Yifeng Geng; Xuansong Xie; Honggang Zhang; |
220 | Discovering The Real Association: Multimodal Causal Reasoning in Video Question Answering Highlight: In our work, we investigate relational structure from a causal representation perspective on multimodal data and propose a novel inference framework. |
Chuanqi Zang; Hanqing Wang; Mingtao Pei; Wei Liang; |
221 | Temporal Interpolation Is All You Need for Dynamic Neural Radiance Fields Highlight: In this paper, we propose a novel method to train spatiotemporal neural radiance fields of dynamic scenes based on temporal interpolation of feature vectors. |
Sungheon Park; Minjung Son; Seokhwan Jang; Young Chun Ahn; Ji-Yeon Kim; Nahyup Kang; |
222 | Graph Transformer GANs for Graph-Constrained House Generation Highlight: We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations in an end-to-end fashion for the challenging graph-constrained house generation task. |
Hao Tang; Zhenyu Zhang; Humphrey Shi; Bo Li; Ling Shao; Nicu Sebe; Radu Timofte; Luc Van Gool; |
223 | On The Benefits of 3D Pose and Tracking for Human Action Recognition Highlight: In this work we study the benefits of using tracking and 3D poses for action recognition. |
Jathushan Rajasegaran; Georgios Pavlakos; Angjoo Kanazawa; Christoph Feichtenhofer; Jitendra Malik; |
224 | How to Backdoor Diffusion Models? Highlight: Specifically, we propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation. |
Sheng-Yen Chou; Pin-Yu Chen; Tsung-Yi Ho; |
225 | ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model With Knowledge-Enhanced Mixture-of-Denoising-Experts Highlight: In this paper, we propose ERNIE-ViLG 2.0, a large-scale Chinese text-to-image diffusion model, to progressively upgrade the quality of generated images by: (1) incorporating fine-grained textual and visual knowledge of key elements in the scene, and (2) utilizing different denoising experts at different denoising stages. |
Zhida Feng; Zhenyu Zhang; Xintong Yu; Yewei Fang; Lanxin Li; Xuyi Chen; Yuxiang Lu; Jiaxiang Liu; Weichong Yin; Shikun Feng; Yu Sun; Li Chen; Hao Tian; Hua Wu; Haifeng Wang; |
226 | PACO: Parts and Attributes of Common Objects Highlight: Hence, we introduce PACO: Parts and Attributes of Common Objects. |
Vignesh Ramanathan; Anmol Kalia; Vladan Petrovic; Yi Wen; Baixue Zheng; Baishan Guo; Rui Wang; Aaron Marquez; Rama Kovvuri; Abhishek Kadian; Amir Mousavi; Yiwen Song; Abhimanyu Dubey; Dhruv Mahajan; |
227 | Learning Transformations To Reduce The Geometric Shift in Object Detection Highlight: Here, by contrast, we tackle geometric shifts emerging from variations in the image capture process, or due to the constraints of the environment causing differences in the apparent geometry of the content itself. We introduce a self-training approach that learns a set of geometric transformations to minimize these shifts without leveraging any labeled data in the new domain, nor any information about the cameras. |
Vidit Vidit; Martin Engilberge; Mathieu Salzmann; |
228 | OReX: Object Reconstruction From Planar Cross-Sections Using Neural Fields Highlight: In this paper, we present OReX, a method for 3D shape reconstruction from slices alone, featuring a Neural Field as the interpolation prior. |
Haim Sawdayee; Amir Vaxman; Amit H. Bermano; |
229 | SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting With Neural Radiance Fields Highlight: In 3D, solutions must be both consistent across multiple views and geometrically valid. In this paper, we propose a novel 3D inpainting method that addresses these challenges. |
Ashkan Mirzaei; Tristan Aumentado-Armstrong; Konstantinos G. Derpanis; Jonathan Kelly; Marcus A. Brubaker; Igor Gilitschenski; Alex Levinshtein; |
230 | Revisiting The Stack-Based Inverse Tone Mapping Highlight: In this paper, we revisit the stack-based ITM approaches and propose a novel method to reconstruct HDR radiance from a single image, which only needs to estimate two exposure images. |
Ning Zhang; Yuyao Ye; Yang Zhao; Ronggang Wang; |
231 | Revisiting Rotation Averaging: Uncertainties and Robust Losses Highlight: In this paper, we revisit the rotation averaging problem applied in global Structure-from-Motion pipelines. |
Ganlin Zhang; Viktor Larsson; Daniel Barath; |
232 | Continuous Sign Language Recognition With Correlation Network Highlight: However, current methods in continuous sign language recognition (CSLR) usually process frames independently to capture frame-wise features, thus failing to capture cross-frame trajectories to effectively identify a sign. To handle this limitation, we propose correlation network (CorrNet) to explicitly leverage body trajectories across frames to identify signs. |
Lianyu Hu; Liqing Gao; Zekang Liu; Wei Feng; |
233 | A Simple Framework for Text-Supervised Semantic Segmentation Highlight: This paper shows that a vanilla contrastive language-image pre-training (CLIP) model is an effective text-supervised semantic segmentor by itself. |
Muyang Yi; Quan Cui; Hao Wu; Cheng Yang; Osamu Yoshie; Hongtao Lu; |
234 | Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection Highlight: As the pseudo labels play a crucial role, we propose an enhancement framework by exploiting completeness and uncertainty properties for effective self-training. |
Chen Zhang; Guorong Li; Yuankai Qi; Shuhui Wang; Laiyun Qing; Qingming Huang; Ming-Hsuan Yang; |
235 | PlenVDB: Memory Efficient VDB-Based Radiance Fields for Fast Training and Rendering Highlight: In this paper, we present a new representation for neural radiance fields that accelerates both the training and the inference processes with VDB, a hierarchical data structure for sparse volumes. |
Han Yan; Celong Liu; Chao Ma; Xing Mei; |
236 | Patch-Based 3D Natural Scene Generation From A Single Example Highlight: We target a 3D generative model for general natural scenes that are typically unique and intricate. |
Weiyu Li; Xuelin Chen; Jue Wang; Baoquan Chen; |
237 | Full or Weak Annotations? An Adaptive Strategy for Budget-Constrained Annotation Campaigns Highlight: To this end, we propose a novel approach to determine annotation strategies for segmentation datasets by estimating what proportion of segmentation and classification annotations should be collected given a fixed budget. |
Javier Gamazo Tejero; Martin S. Zinkernagel; Sebastian Wolf; Raphael Sznitman; Pablo Márquez-Neila; |
238 | Leveraging Hidden Positives for Unsupervised Semantic Segmentation Highlight: Although the recent work employing the vision transformer (ViT) backbone shows exceptional performance, there is still a lack of consideration for task-specific training guidance and local semantic consistency. To tackle these issues, we leverage contrastive learning by excavating hidden positives to learn rich semantic relationships and ensure semantic consistency in local regions. |
Hyun Seok Seong; WonJun Moon; SuBeen Lee; Jae-Pil Heo; |
239 | Backdoor Defense Via Deconfounded Representation Learning Highlight: Inspired by the causal understanding, we propose the Causality-inspired Backdoor Defense (CBD), to learn deconfounded representations by employing the front-door adjustment. |
Zaixi Zhang; Qi Liu; Zhicai Wang; Zepu Lu; Qingyong Hu; |
240 | LG-BPN: Local and Global Blind-Patch Network for Self-Supervised Real-World Denoising Highlight: In this paper, we present a novel method called LG-BPN for self-supervised real-world denoising, which takes the spatial correlation statistic into our network design for local detail restoration, and also brings the long-range dependencies modeling ability to previously CNN-based BSN methods. |
Zichun Wang; Ying Fu; Ji Liu; Yulun Zhang; |
241 | Efficient View Synthesis and 3D-Based Multi-Frame Denoising With Multiplane Feature Representations Highlight: While current multi-frame restoration methods combine information from multiple input images using 2D alignment techniques, recent advances in novel view synthesis are paving the way for a new paradigm relying on volumetric scene representations. In this work, we introduce the first 3D-based multi-frame denoising method that significantly outperforms its 2D-based counterparts with lower computational requirements. |
Thomas Tanay; Aleš Leonardis; Matteo Maggioni; |
242 | An Actor-Centric Causality Graph for Asynchronous Temporal Inference in Group Activity Highlight: In this paper, we propose an Actor-Centric Causality Graph Model, which learns the asynchronous temporal causality relation with three modules, i.e., an asynchronous temporal causality relation detection module, a causality feature fusion module, and a causality relation graph inference module. |
Zhao Xie; Tian Gao; Kewei Wu; Jiao Chang; |
243 | Color Backdoor: A Robust Poisoning Attack in Color Space Highlight: This paper presents a novel color backdoor attack, which can exhibit robustness and stealthiness at the same time. |
Wenbo Jiang; Hongwei Li; Guowen Xu; Tianwei Zhang; |
244 | HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling Highlight: In this work, we tackle the challenging problem of learning-based single-view 3D hair modeling. |
Yujian Zheng; Zirong Jin; Moran Li; Haibin Huang; Chongyang Ma; Shuguang Cui; Xiaoguang Han; |
245 | MoDAR: Using Motion Forecasting for 3D Object Detection in Point Cloud Sequences Highlight: In this work, we propose MoDAR, using motion forecasting outputs as a type of virtual modality, to augment LiDAR point clouds. |
Yingwei Li; Charles R. Qi; Yin Zhou; Chenxi Liu; Dragomir Anguelov; |
246 | How You Feelin’? Learning Emotions and Mental States in Movie Scenes Highlight: We propose EmoTx, a multimodal Transformer-based architecture that ingests videos, multiple characters, and dialog utterances to make joint predictions. |
Dhruv Srivastava; Aditya Kumar Singh; Makarand Tapaswi; |
247 | Dynamic Inference With Grounding Based Vision and Language Models Highlight: On the other hand, there exists a large amount of computational redundancy in these large models, which hinders their run-time efficiency. To address this problem, we propose dynamic inference for grounding based vision and language models conditioned on the input image-text pair. |
Burak Uzkent; Amanmeet Garg; Wentao Zhu; Keval Doshi; Jingru Yi; Xiaolong Wang; Mohamed Omar; |
248 | ALSO: Automotive Lidar Self-Supervision By Occupancy Estimation Highlight: We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds. |
Alexandre Boulch; Corentin Sautier; Björn Michele; Gilles Puy; Renaud Marlet; |
249 | Connecting Vision and Language With Video Localized Narratives Highlight: We propose Video Localized Narratives, a new form of multimodal video annotations connecting vision and language. |
Paul Voigtlaender; Soravit Changpinyo; Jordi Pont-Tuset; Radu Soricut; Vittorio Ferrari; |
250 | Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-Identification Highlight: However, the training samples are usually limited, while the modality gaps are too large, so the existing methods cannot effectively mine diverse cross-modality clues. To handle this limitation, we propose a novel augmentation network in the embedding space, called diverse embedding expansion network (DEEN). |
Yukang Zhang; Hanzi Wang; |
251 | Model Barrier: A Compact Un-Transferable Isolation Domain for Model Intellectual Property Protection Highlight: To this end, we propose a novel compact un-transferable isolation domain (CUTI-domain), which acts as a model barrier to block illegal transferring from the authorized domain to the unauthorized domain. |
Lianyu Wang; Meng Wang; Daoqiang Zhang; Huazhu Fu; |
252 | Object Detection With Self-Supervised Scene Adaptation Highlight: This paper proposes a novel method to improve the performance of a trained object detector on scenes with fixed camera perspectives based on self-supervised adaptation. |
Zekun Zhang; Minh Hoai; |
253 | Visual-Language Prompt Tuning With Knowledge-Guided Context Optimization Highlight: However, the specific textual knowledge generalizes worse to unseen classes because it forgets the essential general textual knowledge, which has a strong generalization ability. To tackle this issue, we introduce a novel Knowledge-guided Context Optimization (KgCoOp) to enhance the generalization ability of the learnable prompt for unseen classes. |
Hantao Yao; Rui Zhang; Changsheng Xu; |
254 | Weakly Supervised Video Representation Learning With Unaligned Text for Sequential Videos Highlight: Specifically, we use a transformer to aggregate frame-level features for video representation and use a pre-trained text encoder to encode the texts corresponding to each action and the whole video, respectively. To model the correspondence between text and video, we propose a multiple granularity loss, where the video-paragraph contrastive loss enforces matching between the whole video and the complete script, and a fine-grained frame-sentence contrastive loss enforces the matching between each action and its description. |
Sixun Dong; Huazhang Hu; Dongze Lian; Weixin Luo; Yicheng Qian; Shenghua Gao; |
255 | Self-Positioning Point-Based Transformer for Point Cloud Understanding Highlight: In this paper, we present a Self-Positioning point-based Transformer (SPoTr), which is designed to capture both local and global shape contexts with reduced complexity. |
Jinyoung Park; Sanghyeok Lee; Sihyeon Kim; Yunyang Xiong; Hyunwoo J. Kim; |
256 | Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery Highlight: Existing works hold an impractical assumption that the novel class distribution prior is uniform, yet neglect the imbalanced nature of real-world data. In this paper, we relax this assumption by proposing a new challenging task: distribution-agnostic NCD, which allows data drawn from arbitrary unknown class distributions and thus renders existing methods useless or even harmful. |
Muli Yang; Liancheng Wang; Cheng Deng; Hanwang Zhang; |
257 | Learning To Generate Image Embeddings With User-Level Differential Privacy Highlight: To achieve user-level DP for large image-to-embedding feature extractors, we propose DP-FedEmb, a variant of federated learning algorithms with per-user sensitivity control and noise addition, to train from user-partitioned data centralized in the datacenter. |
Zheng Xu; Maxwell Collins; Yuxiao Wang; Liviu Panait; Sewoong Oh; Sean Augenstein; Ting Liu; Florian Schroff; H. Brendan McMahan; |
258 | Open-Vocabulary Panoptic Segmentation With Text-to-Image Diffusion Models Highlight: We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. |
Jiarui Xu; Sifei Liu; Arash Vahdat; Wonmin Byeon; Xiaolong Wang; Shalini De Mello; |
259 | Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision Highlight: In this paper, we consider the problem of open-vocabulary semantic segmentation (OVS), which aims to segment objects of arbitrary classes instead of pre-defined, closed-set categories. |
Jilan Xu; Junlin Hou; Yuejie Zhang; Rui Feng; Yi Wang; Yu Qiao; Weidi Xie; |
260 | Learning Dynamic Style Kernels for Artistic Style Transfer Highlight: To further enhance the flexibility of our style transfer method, we propose a Style Alignment Encoding (SAE) module complemented with a Content-based Gating Modulation (CGM) module for learning the dynamic style kernels in focusing regions. |
Wenju Xu; Chengjiang Long; Yongwei Nie; |
261 | DeepLSD: Line Segment Detection and Refinement With Deep Image Gradients Highlight: We propose to combine traditional and learned approaches to get the best of both worlds: an accurate and robust line detector that can be trained in the wild without ground truth lines. |
Rémi Pautrat; Daniel Barath; Viktor Larsson; Martin R. Oswald; Marc Pollefeys; |
262 | OcTr: Octree-Based Transformer for 3D Object Detection Highlight: Despite recent efforts made by Transformers with the long sequence modeling capability, they fail to properly balance the accuracy and efficiency, suffering from inadequate receptive fields or coarse-grained holistic correlations. In this paper, we propose an Octree-based Transformer, named OcTr, to address this issue. |
Chao Zhou; Yanan Zhang; Jiaxin Chen; Di Huang; |
263 | Chat2Map: Efficient Scene Mapping From Multi-Ego Conversations Highlight: To that end, we present an audio-visual deep reinforcement learning approach that works with our shared scene mapper to selectively turn on the camera to efficiently chart out the space. |
Sagnik Majumder; Hao Jiang; Pierre Moulon; Ethan Henderson; Paul Calamia; Kristen Grauman; Vamsi Krishna Ithapu; |
264 | Learning Distortion Invariant Representation for Image Restoration From A Causality Perspective Highlight: In this paper, we are the first to propose a novel training strategy for image restoration from the causality perspective, to improve the generalization ability of DNNs for unknown degradations. |
Xin Li; Bingchen Li; Xin Jin; Cuiling Lan; Zhibo Chen; |
265 | MOT: Masked Optimal Transport for Partial Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on the rigorous OT modeling for conditional distribution matching and label shift correction. |
You-Wei Luo; Chuan-Xian Ren; |
266 | Executing Your Commands Via Motion Diffusion in Latent Space Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors. |
Xin Chen; Biao Jiang; Wen Liu; Zilong Huang; Bin Fu; Tao Chen; Gang Yu; |
267 | GeoMAE: Masked Geometric Target Prediction for Self-Supervised Point Cloud Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper tries to address a fundamental question in point cloud self-supervised learning: what is a good signal we should leverage to learn features from point clouds without annotations? To answer that, we introduce a point cloud representation learning framework, based on geometric feature reconstruction. |
Xiaoyu Tian; Haoxi Ran; Yue Wang; Hang Zhao; |
268 | Learning Conditional Attributes for Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a solution, we provide analysis and argue that attributes are conditioned on the recognized object and input image, and we explore learning conditional attribute embeddings via a proposed attribute learning framework containing an attribute hyper learner and an attribute base learner. |
Qingsheng Wang; Lingqiao Liu; Chenchen Jing; Hao Chen; Guoqiang Liang; Peng Wang; Chunhua Shen; |
269 | Complete 3D Human Reconstruction From A Single Incomplete Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a method to reconstruct a complete human geometry and texture from an image of a person with only partial body observed, e.g., a torso. |
Junying Wang; Jae Shin Yoon; Tuanfeng Y. Wang; Krishna Kumar Singh; Ulrich Neumann; |
270 | PVT-SSD: Single-Stage 3D Object Detector With Point-Voxel Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the former requires time-consuming sampling while the latter introduces quantization errors. In this paper, we present a novel Point-Voxel Transformer for single-stage 3D detection (PVT-SSD) that takes advantage of these two representations. |
Honghui Yang; Wenxiao Wang; Minghao Chen; Binbin Lin; Tong He; Hua Chen; Xiaofei He; Wanli Ouyang; |
271 | Adaptive Human Matting for Dynamic Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the latest trimap-free methods showing promising results, their performance often degrades when dealing with highly diverse and unstructured videos. We address this limitation by introducing Adaptive Matting for Dynamic Videos, termed AdaM, a framework designed to simultaneously differentiate foregrounds from backgrounds and capture alpha matte details of human subjects in the foreground. |
Chung-Ching Lin; Jiang Wang; Kun Luo; Kevin Lin; Linjie Li; Lijuan Wang; Zicheng Liu; |
272 | Learning Common Rationale To Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, both our preliminary investigation and recent studies suggest that they may be less effective in learning representations for fine-grained visual recognition (FGVR) since many features helpful for optimizing SSL objectives are not suitable for characterizing the subtle differences in FGVR. To overcome this issue, we propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes, dubbed as common rationales in this paper. |
Yangyang Shu; Anton van den Hengel; Lingqiao Liu; |
273 | Reconstructing Animatable Categories From Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present RAC, a method to build category-level 3D models from monocular videos, disentangling variations over instances and motion over time. |
Gengshan Yang; Chaoyang Wang; N. Dinesh Reddy; Deva Ramanan; |
274 | UDE: A Unified Driving Engine for Human Motion Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose "UDE", the first unified driving engine that enables generating human motion sequences from natural language or audio sequences (see Fig. 1). |
Zixiang Zhou; Baoyuan Wang; |
275 | High-Fidelity 3D Human Digitization From Single 2K Resolution Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: High-quality 3D human body reconstruction requires high-fidelity and large-scale training data and appropriate network design that effectively exploits the high-resolution input images. To tackle these problems, we propose a simple yet effective 3D human digitization method called 2K2K, which constructs a large-scale 2K human dataset and infers 3D human models from 2K resolution images. |
Sang-Hun Han; Min-Gyu Park; Ju Hong Yoon; Ju-Mi Kang; Young-Jae Park; Hae-Gon Jeon; |
276 | Co-Salient Object Detection With Uncertainty-Aware Group Exchange-Masking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This causes a model robustness defect when irrelevant images appear in the testing image group, which hinders the use of CoSOD models in real-world applications. To address this issue, this paper presents a group exchange-masking (GEM) strategy for robust CoSOD model learning. |
Yang Wu; Huihui Song; Bo Liu; Kaihua Zhang; Dong Liu; |
277 | Tangentially Elongated Gaussian Belief Propagation for Event-Based Incremental Optical Flow Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose tangentially elongated Gaussian (TEG) belief propagation (BP) that realizes incremental full-flow estimation. |
Jun Nagata; Yusuke Sekikawa; |
278 | Extracting Class Activation Maps From Non-Discriminative Features As Well Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The crux is that the classifier weights (used to compute CAM) capture only the discriminative features of objects. We tackle this by introducing a new computation method for CAM that explicitly captures non-discriminative features as well, thereby expanding CAM to cover whole objects. |
Zhaozheng Chen; Qianru Sun; |
279 | BlendFields: Few-Shot Example-Driven Facial Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods are either data-driven, requiring an extensive corpus of data not publicly accessible to the research community, or fail to capture fine details because they rely on geometric face models that cannot represent fine-grained details in texture with a mesh discretization and linear deformation designed to model only a coarse face geometry. We introduce a method that bridges this gap by drawing inspiration from traditional computer graphics techniques. |
Kacper Kania; Stephan J. Garbin; Andrea Tagliasacchi; Virginia Estellers; Kwang Moo Yi; Julien Valentin; Tomasz Trzciński; Marek Kowalski; |
280 | Adaptive Sparse Pairwise Loss for Object Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This dense sampling mechanism inevitably introduces positive pairs that share few visual similarities, which can be harmful to the training. To address this problem, we propose a novel loss paradigm termed Sparse Pairwise (SP) loss that only leverages few appropriate pairs for each class in a mini-batch, and empirically demonstrate that it is sufficient for the ReID tasks. |
Xiao Zhou; Yujie Zhong; Zhen Cheng; Fan Liang; Lin Ma; |
281 | NeFII: Inverse Rendering for Reflectance Decomposition With Near-Field Indirect Illumination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an end-to-end inverse rendering pipeline that decomposes materials and illumination from multi-view images, while considering near-field indirect illumination. |
Haoqian Wu; Zhipeng Hu; Lincheng Li; Yongqiang Zhang; Changjie Fan; Xin Yu; |
282 | Towards Professional Level Crowd Annotation of Expert Domain Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A new approach, based on semi-supervised learning (SSL) and denoted SSL with human filtering (SSL-HF), is proposed. |
Pei Wang; Nuno Vasconcelos; |
283 | Fully Self-Supervised Depth Estimation From Defocus Clue Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such limitation discourages the applications of DFD methods. To tackle this issue, we propose a completely self-supervised framework that estimates depth purely from a sparse focal stack. |
Haozhe Si; Bin Zhao; Dong Wang; Yunpeng Gao; Mulin Chen; Zhigang Wang; Xuelong Li; |
284 | Semi-Weakly Supervised Object Kinematic Motion Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we tackle the problem of object kinematic motion prediction in a semi-weakly supervised manner. |
Gengxin Liu; Qian Sun; Haibin Huang; Chongyang Ma; Yulan Guo; Li Yi; Hui Huang; Ruizhen Hu; |
285 | Learning A Simple Low-Light Image Enhancer From Paired Low-Light Instances Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose PairLIE, an unsupervised approach that learns adaptive priors from low-light image pairs. |
Zhenqi Fu; Yan Yang; Xiaotong Tu; Yue Huang; Xinghao Ding; Kai-Kuang Ma; |
286 | Deep Stereo Video Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel deep stereo video inpainting network named SVINet, the first attempt at the stereo video inpainting task using deep convolutional neural networks. |
Zhiliang Wu; Changchang Sun; Hanyu Xuan; Yan Yan; |
287 | Prompting Large Language Models With Answer Heuristics for Knowledge-Based Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Prophet—a conceptually simple framework designed to prompt GPT-3 with answer heuristics for knowledge-based VQA. |
Zhenwei Shao; Zhou Yu; Meng Wang; Jun Yu; |
288 | IFSeg: Image-Free Semantic Segmentation Via Vision-Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel image-free segmentation task where the goal is to perform semantic segmentation given only a set of the target semantic categories, but without any task-specific images and annotations. |
Sukmin Yun; Seong Hyeon Park; Paul Hongsuck Seo; Jinwoo Shin; |
289 | Improving Robustness of Semantic Segmentation to Motion-Blur Using Class-Centric Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most research has focused on improving segmentation performance for sharp, clean images, and the few works that deal with degradations consider motion-blur as one of many generic degradations. In this work, we focus exclusively on motion-blur and attempt to achieve robustness for semantic segmentation in its presence. |
Aakanksha; A. N. Rajagopalan; |
290 | Progressive Open Space Expansion for Open-Set Model Attribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we focus on a challenging task, namely Open-Set Model Attribution (OSMA), to simultaneously attribute images to known models and identify those from unknown ones. |
Tianyun Yang; Danding Wang; Fan Tang; Xinying Zhao; Juan Cao; Sheng Tang; |
291 | Backdoor Cleansing With Unlabeled Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, such requirement may be unrealistic as the training data are often unavailable to end users. In this paper, we investigate the possibility of circumventing such barrier. |
Lu Pang; Tao Sun; Haibin Ling; Chao Chen; |
292 | Is BERT Blind? Exploring The Effect of Vision-and-Language Pretraining on Visual Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate whether vision-and-language pretraining can improve performance on text-only tasks that involve implicit visual reasoning, focusing primarily on zero-shot probing methods. |
Morris Alper; Michael Fiman; Hadar Averbuch-Elor; |
293 | PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to this perspective, the model lacks any explicit understanding of action boundaries and tends to focus only on the most discriminative parts of the video resulting in incomplete action localization. To address this, we present PivoTAL, Prior-driven Supervision for Weakly-supervised Temporal Action Localization, to approach WTAL from a localization-by-localization perspective by learning to localize the action snippets directly. |
Mamshad Nayeem Rizve; Gaurav Mittal; Ye Yu; Matthew Hall; Sandra Sajeev; Mubarak Shah; Mei Chen; |
294 | Harmonious Feature Learning for Interactive Hand-Object Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Harmonious Feature Learning Network (HFL-Net). |
Zhifeng Lin; Changxing Ding; Huan Yao; Zengsheng Kuang; Shaoli Huang; |
295 | 3D GAN Inversion With Facial Symmetry Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method to promote 3D GAN inversion by introducing a facial symmetry prior. |
Fei Yin; Yong Zhang; Xuan Wang; Tengfei Wang; Xiaoyu Li; Yuan Gong; Yanbo Fan; Xiaodong Cun; Ying Shan; Cengiz Oztireli; Yujiu Yang; |
296 | CLOTH4D: A Dataset for Clothed Human Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce CLOTH4D, a clothed human dataset containing 1,000 subjects with varied appearances, 1,000 3D outfits, and over 100,000 clothed meshes with paired unclothed humans, to fill the gap in large-scale and high-quality 4D clothing data. |
Xingxing Zou; Xintong Han; Waikeung Wong; |
297 | SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a novel framework built to simplify 3D asset generation for amateur users. |
Yen-Chi Cheng; Hsin-Ying Lee; Sergey Tulyakov; Alexander G. Schwing; Liang-Yan Gui; |
298 | SMAE: Few-Shot Learning for HDR Deghosting With Saturation-Aware Masked Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel semi-supervised approach to realize few-shot HDR imaging via two stages of training, called SSHDR. |
Qingsen Yan; Song Zhang; Weiye Chen; Hao Tang; Yu Zhu; Jinqiu Sun; Luc Van Gool; Yanning Zhang; |
299 | Improving Generalization With Domain Convex Game Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our explorations empirically reveal that the correlation between model generalization and the diversity of domains may not be strictly positive, which limits the effectiveness of domain augmentation. This work therefore aims to guarantee and further enhance the validity of this strand. |
Fangrui Lv; Jian Liang; Shuang Li; Jinming Zhang; Di Liu; |
300 | Learning To Render Novel Views From Wide-Baseline Stereo Pairs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair. |
Yilun Du; Cameron Smith; Ayush Tewari; Vincent Sitzmann; |
301 | TryOnDiffusion: A Tale of Two UNets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body change in a single network. |
Luyang Zhu; Dawei Yang; Tyler Zhu; Fitsum Reda; William Chan; Chitwan Saharia; Mohammad Norouzi; Ira Kemelmacher-Shlizerman; |
302 | Fair Scratch Tickets: Finding Fair Sparse Networks Without Weight Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel fairness-aware learning paradigm for in-processing methods through the lens of the lottery ticket hypothesis (LTH) in the context of computer vision fairness. |
Pengwei Tang; Wei Yao; Zhicong Li; Yong Liu; |
303 | Generative Bias for Robust Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, in order to better learn the bias a target VQA model suffers from, we propose a generative method to train the bias model directly from the target model, called GenB. |
Jae Won Cho; Dong-Jin Kim; Hyeonggon Ryu; In So Kweon; |
304 | Data-Free Sketch-Based Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a methodology for DF-SBIR, which can leverage knowledge from models independently trained to perform classification on photos and sketches. |
Abhra Chaudhuri; Ayan Kumar Bhunia; Yi-Zhe Song; Anjan Dutta; |
305 | Multi-Object Manipulation Via Object-Centric Neural Scattering Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose using object-centric neural scattering functions (OSFs) as object representations in a model-predictive control framework. |
Stephen Tian; Yancheng Cai; Hong-Xing Yu; Sergey Zakharov; Katherine Liu; Adrien Gaidon; Yunzhu Li; Jiajun Wu; |
306 | The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a bottleneck-based attention model that captures the evolution of the action, through progressive sampling over fine-to-coarse scales. |
Alexandros Stergiou; Dima Damen; |
307 | Invertible Neural Skinning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing reposing methods suffer from the limited expressiveness of Linear Blend Skinning (LBS), require costly mesh extraction to generate each new pose, and typically do not preserve surface correspondences across different poses. In this work, we introduce Invertible Neural Skinning (INS) to address these shortcomings. |
Yash Kant; Aliaksandr Siarohin; Riza Alp Guler; Menglei Chai; Jian Ren; Sergey Tulyakov; Igor Gilitschenski; |
308 | Weakly Supervised Semantic Segmentation Via Adversarial Learning of Classifier and Reconstructor Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issues, we propose a novel WSSS framework via adversarial learning of a classifier and an image reconstructor. |
Hyeokjun Kweon; Sung-Hoon Yoon; Kuk-Jin Yoon; |
309 | Intrinsic Physical Concepts Discovery With Object-Centric Predictive Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take a step forward and try to discover and represent intrinsic physical concepts such as mass and charge. |
Qu Tang; Xiangyu Zhu; Zhen Lei; Zhaoxiang Zhang; |
310 | Distilling Cross-Temporal Contexts for Continuous Sign Language Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a shallow temporal aggregation module cannot well capture both local and global temporal context information in sign language. To address this dilemma, we propose a cross-temporal context aggregation (CTCA) model. |
Leming Guo; Wanli Xue; Qing Guo; Bo Liu; Kaihua Zhang; Tiantian Yuan; Shengyong Chen; |
311 | Automatic High Resolution Wire Segmentation and Removal Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We thus propose a two-stage method that leverages both global and local context to accurately segment wires in high-resolution images efficiently, and a tile-based inpainting strategy to remove the wires given our predicted segmentation masks. |
Mang Tik Chiu; Xuaner Zhang; Zijun Wei; Yuqian Zhou; Eli Shechtman; Connelly Barnes; Zhe Lin; Florian Kainz; Sohrab Amirghodsi; Humphrey Shi; |
312 | The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that the use of sparsity can decrease the model size overhead by over 327x and the computation time by 3.34x compared to SOTA, while maintaining an equivalent total leakage rate of 77%, even with 1000 clients in aggregation. |
Joshua C. Zhao; Ahmed Roushdy Elkordy; Atul Sharma; Yahya H. Ezzeldin; Salman Avestimehr; Saurabh Bagchi; |
313 | Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Deep point cloud registration methods struggle with partial overlaps and rely on labeled data. To address these issues, we propose UDPReg, an unsupervised deep probabilistic registration framework for point clouds with partial overlaps. |
Guofeng Mei; Hao Tang; Xiaoshui Huang; Weijie Wang; Juan Liu; Jian Zhang; Luc Van Gool; Qiang Wu; |
314 | Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore multi-modal correlations derived from large-scale image-text data to facilitate generalisable VMR. |
Dezhao Luo; Jiabo Huang; Shaogang Gong; Hailin Jin; Yang Liu; |
315 | Learning Adaptive Dense Event Stereo From The Image Domain Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, traditional UDA still needs the input event data with ground-truth in the source domain, which is more challenging and costly to obtain than image data. To tackle this issue, we propose a novel unsupervised domain Adaptive Dense Event Stereo (ADES), which resolves gaps between the different domains and input modalities. |
Hoonhee Cho; Jegyeong Cho; Kuk-Jin Yoon; |
316 | Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel and data-efficient framework for WILSS, named FMWISS. |
Chaohui Yu; Qiang Zhou; Jingliang Li; Jianlong Yuan; Zhibin Wang; Fan Wang; |
317 | Seeing A Rose in Five Thousand Ways Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With knowledge of these intrinsic properties, we may render roses of different sizes and shapes, in different poses, and under different lighting conditions. In this work, we build a generative model that learns to capture such object intrinsics from a single image, such as a photo of a bouquet. |
Yunzhi Zhang; Shangzhe Wu; Noah Snavely; Jiajun Wu; |
318 | Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel technique, Residual Radiance Field or ReRF, as a highly compact neural representation to achieve real-time FVV rendering on long-duration dynamic scenes. |
Liao Wang; Qiang Hu; Qihan He; Ziyu Wang; Jingyi Yu; Tinne Tuytelaars; Lan Xu; Minye Wu; |
319 | ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the pixel-level semantic aggregation in self-supervised ViT pre-trained models as image segmentation and propose the Adaptive Conceptualization approach for USS, termed ACSeg. |
Kehan Li; Zhennan Wang; Zesen Cheng; Runyi Yu; Yian Zhao; Guoli Song; Chang Liu; Li Yuan; Jie Chen; |
320 | NeRFVS: Neural Radiance Fields for Free View Synthesis Via Geometry Scaffolds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present NeRFVS, a novel neural radiance fields (NeRF) based method to enable free navigation in a room. |
Chen Yang; Peihao Li; Zanwei Zhou; Shanxin Yuan; Bingbing Liu; Xiaokang Yang; Weichao Qiu; Wei Shen; |
321 | Reproducible Scaling Laws for Contrastive Language-Image Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, previous work on scaling laws has primarily used private data & models or focused on uni-modal language or vision learning. To address these limitations, we investigate scaling laws for contrastive language-image pre-training (CLIP) with the public LAION dataset and the open-source OpenCLIP repository. |
Mehdi Cherti; Romain Beaumont; Ross Wightman; Mitchell Wortsman; Gabriel Ilharco; Cade Gordon; Christoph Schuhmann; Ludwig Schmidt; Jenia Jitsev; |
322 | Similarity Metric Learning for RGB-Infrared Group Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a metric learning method, Closest Permutation Matching (CPM), for RGB-IR G-ReID. |
Jianghao Xiong; Jianhuang Lai; |
323 | Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-Time Mobile Telepresence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a framework called Auto-CARD, which for the first time enables real-time and robust driving of Codec Avatars using only on-device computing resources. |
Yonggan Fu; Yuecheng Li; Chenghui Li; Jason Saragih; Peizhao Zhang; Xiaoliang Dai; Yingyan (Celine) Lin; |
324 | Conjugate Product Graphs for Globally Optimal 2D-3D Shape Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While such problems can be solved to global optimality by finding a shortest path in the product graph between both shapes, existing solutions heavily rely on unrealistic prior assumptions to avoid degenerate solutions (e.g. knowledge to which region of the 3D shape each point of the 2D contour is matched). To address this, we propose a novel 2D-3D shape matching formalism based on the conjugate product graph between the 2D contour and the 3D shape. |
Paul Roetzer; Zorah Lähner; Florian Bernard; |
325 | PromptCAL: Contrastive Affinity Learning Via Auxiliary Prompts for Generalized Novel Category Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we target a pragmatic but under-explored Generalized Novel Category Discovery (GNCD) setting. |
Sheng Zhang; Salman Khan; Zhiqiang Shen; Muzammal Naseer; Guangyi Chen; Fahad Shahbaz Khan; |
326 | Train/Test-Time Adaptation With Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Train/Test-Time Adaptation with Retrieval (T3AR), a method to adapt models both at train and test time by means of a retrieval module and a searchable pool of external samples. |
Luca Zancato; Alessandro Achille; Tian Yu Liu; Matthew Trager; Pramuditha Perera; Stefano Soatto; |
327 | ProxyFormer: Proxy Alignment Assisted Point Cloud Completion With Missing Part Sensitive Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Therefore, recovering the complete point clouds from the partial ones plays a vital role in many practical tasks, and one of the keys lies in the prediction of the missing part. In this paper, we propose a novel point cloud completion approach, named ProxyFormer, that divides point clouds into existing (input) and missing (to be predicted) parts, with each part communicating information through its proxies. |
Shanshan Li; Pan Gao; Xiaoyang Tan; Mingqiang Wei; |
328 | Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a ‘Squad’). |
Zitian Chen; Yikang Shen; Mingyu Ding; Zhenfang Chen; Hengshuang Zhao; Erik G. Learned-Miller; Chuang Gan; |
329 | Learning Customized Visual Models With Retrieval-Augmented Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Alternatively, we propose REACT, REtrieval-Augmented CusTomization, a framework to acquire the relevant web knowledge to build customized visual models for target domains. |
Haotian Liu; Kilho Son; Jianwei Yang; Ce Liu; Jianfeng Gao; Yong Jae Lee; Chunyuan Li; |
330 | Multi-Realism Image Compression With A Conditional Generator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a misleading reconstruction far from the input image is generated. In this work, we alleviate these concerns by training a decoder that can bridge the two regimes and navigate the distortion-realism trade-off. |
Eirikur Agustsson; David Minnen; George Toderici; Fabian Mentzer; |
331 | Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We hence propose a novel partial convolution (PConv) that extracts spatial features more efficiently, by cutting down redundant computation and memory access simultaneously. |
Jierun Chen; Shiu-hong Kao; Hao He; Weipeng Zhuo; Song Wen; Chul-Ho Lee; S.-H. Gary Chan; |
332 | A Unified Spatial-Angular Structured Light for Single-View Acquisition of Shape and Reflectance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a unified structured light, consisting of an LED array and an LCD mask, for high-quality acquisition of both shape and reflectance from a single view. |
Xianmin Xu; Yuxin Lin; Haoyang Zhou; Chong Zeng; Yaxin Yu; Kun Zhou; Hongzhi Wu; |
333 | Best of Both Worlds: Multimodal Contrastive Learning With Tabular and Imaging Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Combined with increasing medical dataset sizes and expensive annotation costs, the necessity for unsupervised methods that can pretrain multimodally and predict unimodally has risen. To address these needs, we propose the first self-supervised contrastive learning framework that takes advantage of images and tabular data to train unimodal encoders. |
Paul Hager; Martin J. Menten; Daniel Rueckert; |
334 | On The Difficulty of Unpaired Infrared-to-Visible Video Translation: Fine-Grained Content-Rich Patches Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we propose a novel CPTrans framework to tackle the challenge by balancing the gradients of different patches, achieving fine-grained transfer of content-rich patches. |
Zhenjie Yu; Shuang Li; Yirui Shen; Chi Harold Liu; Shuigen Wang; |
335 | Masked Images Are Counterfactual Samples for Robust Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, based on causal analysis of the aforementioned problems, we propose a novel fine-tuning method, which uses masked images as counterfactual samples that help improve the robustness of the fine-tuning model. |
Yao Xiao; Ziyi Tang; Pengxu Wei; Cong Liu; Liang Lin; |
336 | StepFormer: Self-Supervised Step Discovery and Localization in Instructional Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we tackle the problem with no human supervision and introduce StepFormer, a self-supervised model that discovers and localizes instruction steps in a video. |
Nikita Dvornik; Isma Hadji; Ran Zhang; Konstantinos G. Derpanis; Richard P. Wildes; Allan D. Jepson; |
337 | Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to learn video representation that encodes both action steps and their temporal ordering, based on a large-scale dataset of web instructional videos and their narrations, without using human annotations. |
Licheng Yu; Yang Bai; Shangwen Li; Xueting Yan; Yin Li; Yiwu Zhong; |
338 | Open Vocabulary Semantic Segmentation With Patch Aligned Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility function for CLIP’s contrastive loss, intended to train an alignment between the patch tokens of the vision encoder and the CLS token of the text encoder. |
Jishnu Mukhoti; Tsung-Yu Lin; Omid Poursaeed; Rui Wang; Ashish Shah; Philip H.S. Torr; Ser-Nam Lim; |
339 | CLIP The Gap: A Single Domain Generalization Approach for Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenges of simultaneously learning robust object localization and representation, we propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts. |
Vidit Vidit; Martin Engilberge; Mathieu Salzmann; |
340 | Co-Training 2L Submodels for Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces submodel co-training, a regularization method related to co-training, self-distillation and stochastic depth. |
Hugo Touvron; Matthieu Cord; Maxime Oquab; Piotr Bojanowski; Jakob Verbeek; Hervé Jégou; |
341 | On The Importance of Accurate Geometry Data for Dense 3D Vision Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper investigates the effect of sensor errors for the dense 3D vision tasks of depth estimation and reconstruction. |
HyunJun Jung; Patrick Ruhkamp; Guangyao Zhai; Nikolas Brasch; Yitong Li; Yannick Verdie; Jifei Song; Yiren Zhou; Anil Armagan; Slobodan Ilic; Aleš Leonardis; Nassir Navab; Benjamin Busam; |
342 | Camouflaged Instance Segmentation Via Explicit De-Camouflaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous instance segmentation methods perform poorly on this task as they are easily disturbed by the deceptive camouflage. To address these challenges, we propose a novel De-camouflaging Network (DCNet) including a pixel-level camouflage decoupling module and an instance-level camouflage suppression module. |
Naisong Luo; Yuwen Pan; Rui Sun; Tianzhu Zhang; Zhiwei Xiong; Feng Wu; |
343 | Understanding Masked Autoencoders Via Hierarchical Latent Variable Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formally characterize and justify existing empirical insights and provide theoretical guarantees of MAE. |
Lingjing Kong; Martin Q. Ma; Guangyi Chen; Eric P. Xing; Yuejie Chi; Louis-Philippe Morency; Kun Zhang; |
344 | K-Planes: Explicit Radiance Fields in Space, Time, and Appearance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions. |
Sara Fridovich-Keil; Giacomo Meanti; Frederik Rahbæk Warburg; Benjamin Recht; Angjoo Kanazawa; |
345 | Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a Multi-mode Online Knowledge Distillation method (MOKD) to boost self-supervised visual representation learning. |
Kaiyou Song; Jin Xie; Shan Zhang; Zimeng Luo; |
346 | Unbalanced Optimal Transport: A Unified Framework for Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Each of these strategies comes with its own properties, underlying losses, and heuristics. We show how Unbalanced Optimal Transport unifies these different approaches and opens a whole continuum of methods in between. |
Henri De Plaen; Pierre-François De Plaen; Johan A. K. Suykens; Marc Proesmans; Tinne Tuytelaars; Luc Van Gool; |
347 | Viewpoint Equivariance for Multi-View 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we gain intuition from the integral role of multi-view consistency in 3D scene understanding and geometric learning. |
Dian Chen; Jie Li; Vitor Guizilini; Rares Andrei Ambrus; Adrien Gaidon; |
348 | Photo Pre-Training, But for Sketch Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This lack of sketch data has imposed on the community a few "peculiar" design choices — the most representative of them all is perhaps the coerced utilisation of photo-based pre-training (i.e., no sketch), for many core tasks that otherwise dictate specific sketch understanding. In this paper, we ask just the one question — can we make such photo-based pre-training actually benefit sketch? |
Ke Li; Kaiyue Pang; Yi-Zhe Song; |
349 | NeuralPCI: Spatio-Temporal Neural Field for 3D Point Cloud Multi-Frame Non-Linear Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Meanwhile, the existence of numerous nonlinear large motions in real-world scenarios makes the point cloud interpolation task more challenging. In light of these issues, we present NeuralPCI: an end-to-end 4D spatio-temporal Neural field for 3D Point Cloud Interpolation, which implicitly integrates multi-frame information to handle nonlinear large motions for both indoor and outdoor scenarios. |
Zehan Zheng; Danni Wu; Ruisi Lu; Fan Lu; Guang Chen; Changjun Jiang; |
350 | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition. |
Wenhao Wu; Xiaohan Wang; Haipeng Luo; Jingdong Wang; Yi Yang; Wanli Ouyang; |
351 | Adaptive Plasticity Improvement for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new method, called adaptive plasticity improvement (API), for continual learning. |
Yan-Shuo Liang; Wu-Jun Li; |
352 | Pic2Word: Mapping Pictures to Words for Zero-Shot Composed Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to study an important task, Zero-Shot Composed Image Retrieval (ZS-CIR), whose goal is to build a CIR model without requiring labeled triplets for training. |
Kuniaki Saito; Kihyuk Sohn; Xiang Zhang; Chun-Liang Li; Chen-Yu Lee; Kate Saenko; Tomas Pfister; |
353 | MMANet: Margin-Aware Distillation and Modality-Aware Regularization for Incomplete Multimodal Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a general framework called MMANet to assist incomplete multimodal learning. |
Shicai Wei; Chunbo Luo; Yang Luo; |
354 | Putting People in Their Place: Affordance-Aware Human Insertion Into Scenes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes. |
Sumith Kulal; Tim Brooks; Alex Aiken; Jiajun Wu; Jimei Yang; Jingwan Lu; Alexei A. Efros; Krishna Kumar Singh; |
355 | 3D Neural Field Generation Using Triplane Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present an efficient diffusion-based model for 3D-aware generation of neural fields. |
J. Ryan Shue; Eric Ryan Chan; Ryan Po; Zachary Ankner; Jiajun Wu; Gordon Wetzstein; |
356 | Regularized Vector Quantization for Tokenized Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, deterministic quantization suffers from severe codebook collapse and a misaligned inference stage, while stochastic quantization suffers from low codebook utilization and a perturbed reconstruction objective. This paper presents a regularized vector quantization framework that mitigates the above issues effectively by applying regularization from two perspectives. |
Jiahui Zhang; Fangneng Zhan; Christian Theobalt; Shijian Lu; |
357 | Semantic Scene Completion With Cleaner Self Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we use the ground-truth 3D voxels to generate a perfect visible surface, called TSDF-CAD, and then train a cleaner SSC model. |
Fengyun Wang; Dong Zhang; Hanwang Zhang; Jinhui Tang; Qianru Sun; |
358 | Improving Image Recognition By Retrieving From Web-Scale Image-Text Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce an attention-based memory module, which learns the importance of each retrieved example from the memory. |
Ahmet Iscen; Alireza Fathi; Cordelia Schmid; |
359 | Deep Factorized Metric Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Differently, we propose a deep factorized metric learning method (DFML) to factorize the training signal and employ different samples to train various components of the backbone network. |
Chengkun Wang; Wenzhao Zheng; Junlong Li; Jie Zhou; Jiwen Lu; |
360 | High-Fidelity 3D Face Generation From Natural Language Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We argue the major obstacle lies in 1) the lack of high-quality 3D face data with descriptive text annotation, and 2) the complex mapping relationship between descriptive language space and shape/appearance space. |
Menghua Wu; Hao Zhu; Linjia Huang; Yiyu Zhuang; Yuanxun Lu; Xun Cao; |
361 | A Generalized Framework for Video Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference. To effectively bridge this gap, we propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks without designing complicated architectures or requiring extra post-processing. |
Miran Heo; Sukjun Hwang; Jeongseok Hyun; Hanjung Kim; Seoung Wug Oh; Joon-Young Lee; Seon Joo Kim; |
362 | Multi-Level Logit Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Concretely, we propose a simple yet effective approach to logit distillation via multi-level prediction alignment. |
Ying Jin; Jiaqi Wang; Dahua Lin; |
363 | On Distillation of Guided Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time since they require evaluating two diffusion models, a class-conditional model and an unconditional model, tens to hundreds of times. To deal with this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from: Given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then we progressively distill that model to a diffusion model that requires far fewer sampling steps. |
Chenlin Meng; Robin Rombach; Ruiqi Gao; Diederik Kingma; Stefano Ermon; Jonathan Ho; Tim Salimans; |
364 | Dual-Path Adaptation From Image to Video Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we efficiently transfer the surpassing representation power of the vision foundation models, such as ViT and Swin, for video understanding with only a few trainable parameters. |
Jungin Park; Jiyoung Lee; Kwanghoon Sohn; |
365 | Towards Better Decision Forests: Forest Alternating Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, unlike for most other models, such as neural networks, optimizing forests or trees is not easy, because they define a non-differentiable function. We show, for the first time, that it is possible to learn a forest by optimizing a desirable loss and regularization jointly over all its trees and parameters. |
Miguel Á. Carreira-Perpiñán; Magzhan Gabidolla; Arman Zharmagambetov; |
366 | DA Wand: Distortion-Aware Selection Using Neural Mesh Parameterization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a neural technique for learning to select a local sub-region around a point which can be used for mesh parameterization. |
Richard Liu; Noam Aigerman; Vladimir G. Kim; Rana Hanocka; |
367 | Disentangled Representation Learning for Unsupervised Neural Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we first point out that, unlike conventional shallow quantizers, existing deep learning-based quantizers hardly benefit from the residual vector space. To cope with this problem, we introduce a novel disentangled representation learning for unsupervised neural quantization. |
Haechan Noh; Sangeek Hyun; Woojin Jeong; Hanshin Lim; Jae-Pil Heo; |
368 | Hierarchical Semantic Correspondence Networks for Video Paragraph Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel Hierarchical Semantic Correspondence Network (HSCNet), which explores multi-level visual-textual correspondence by learning hierarchical semantic alignment and utilizes dense supervision by grounding diverse levels of queries. |
Chaolei Tan; Zihang Lin; Jian-Fang Hu; Wei-Shi Zheng; Jianhuang Lai; |
369 | Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate existing methods and present a general framework of spatiotemporal predictive learning, in which the spatial encoder and decoder capture intra-frame features and the middle temporal module catches inter-frame correlations. |
Cheng Tan; Zhangyang Gao; Lirong Wu; Yongjie Xu; Jun Xia; Siyuan Li; Stan Z. Li; |
370 | Zero-Shot Pose Transfer for Unrigged Stylized 3D Characters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a zero-shot approach that requires only the widely available deformed non-stylized avatars in training, and deforms stylized characters of significantly different shapes at inference. |
Jiashun Wang; Xueting Li; Sifei Liu; Shalini De Mello; Orazio Gallo; Xiaolong Wang; Jan Kautz; |
371 | Listening Human Behavior: 3D Human Pose Estimation With Acoustic Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Accordingly, we introduce a framework that encodes multichannel audio features into 3D human poses. |
Yuto Shibata; Yutaka Kawashima; Mariko Isogawa; Go Irie; Akisato Kimura; Yoshimitsu Aoki; |
372 | Meta-Learning With A Geometry-Adaptive Preconditioner Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, they do not satisfy the Riemannian metric condition, which would enable steepest-descent learning with a preconditioned gradient. In this study, we propose Geometry-Adaptive Preconditioned gradient descent (GAP) that can overcome the limitations in MAML; GAP can efficiently meta-learn a preconditioner that is dependent on task-specific parameters, and its preconditioner can be shown to be a Riemannian metric. |
Suhyun Kang; Duhun Hwang; Moonjung Eo; Taesup Kim; Wonjong Rhee; |
373 | Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the limitation, we propose a knowledge graph with Dynamic structure and nodes to facilitate chest X-ray report generation with Contrastive Learning, named DCL. |
Mingjie Li; Bingqian Lin; Zicong Chen; Haokun Lin; Xiaodan Liang; Xiaojun Chang; |
374 | BiCro: Noisy Correspondence Rectification for Multi-Modality Data Via Bi-Directional Cross-Modal Similarity Consistency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, the cheaply collected dataset unavoidably contains many mismatched data pairs, which have been proven to be harmful to the model’s performance. To address this, we propose a general framework called BiCro (Bidirectional Cross-modal similarity consistency), which can be easily integrated into existing cross-modal matching models and improve their robustness against noisy data. |
Shuo Yang; Zhaopan Xu; Kai Wang; Yang You; Hongxun Yao; Tongliang Liu; Min Xu; |
375 | Transfer Knowledge From Head to Tail: Uncertainty Calibration Under Long-Tailed Distribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the problem of calibrating the model trained from a long-tailed distribution. |
Jiahao Chen; Bing Su; |
376 | FrustumFormer: Adaptive Instance-Aware Resampling for Multi-View 3D Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel framework named FrustumFormer, which pays more attention to the features in instance regions via adaptive instance-aware resampling. |
Yuqi Wang; Yuntao Chen; Zhaoxiang Zhang; |
377 | Global Vision Transformer Pruning With Hessian-Aware Saliency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work challenges the common design philosophy of the Vision Transformer (ViT) model with uniform dimension across all the stacked blocks in a model stage, where we redistribute the parameters both across transformer blocks and between different structures within the block via the first systematic attempt at global structural pruning. |
Huanrui Yang; Hongxu Yin; Maying Shen; Pavlo Molchanov; Hai Li; Jan Kautz; |
378 | Class-Conditional Sharpness-Aware Minimization for Deep Long-Tailed Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead, we propose an effective two-stage sharpness-aware optimization approach based on the decoupling paradigm in DLTR. |
Zhipeng Zhou; Lanqing Li; Peilin Zhao; Pheng-Ann Heng; Wei Gong; |
379 | ScarceNet: Animal Pose Estimation With Scarce Annotations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose ScarceNet, a pseudo-label-based approach to generate artificial labels for the unlabeled images. |
Chen Li; Gim Hee Lee; |
380 | OmniCity: Omnipotent City Understanding With Multi-Level and Multi-View Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents OmniCity, a new dataset for omnipotent city understanding from multi-level and multi-view images. |
Weijia Li; Yawen Lai; Linning Xu; Yuanbo Xiangli; Jinhua Yu; Conghui He; Gui-Song Xia; Dahua Lin; |
381 | Efficient On-Device Training Via Gradient Filtering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Consequently, in this paper, we propose a new gradient filtering approach which enables on-device CNN model training. |
Yuedong Yang; Guihong Li; Radu Marculescu; |
382 | SViTT: Temporal Learning of Sparse Video-Text Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we identify several key challenges in temporal learning of video-text transformers: the spatiotemporal trade-off from limited network size; the curse of dimensionality for multi-frame modeling; and the diminishing returns of semantic information by extending clip length. |
Yi Li; Kyle Min; Subarna Tripathi; Nuno Vasconcelos; |
383 | NeuralDome: A Neural Modeling Pipeline on Multi-View Human-Object Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To process the HODome dataset, we develop NeuralDome, a layer-wise neural processing pipeline tailored for multi-view video inputs to conduct accurate tracking, geometry reconstruction and free-view rendering, for both human subjects and objects. |
Juze Zhang; Haimin Luo; Hongdi Yang; Xinru Xu; Qianyang Wu; Ye Shi; Jingyi Yu; Lan Xu; Jingya Wang; |
384 | 3D Human Mesh Estimation From Virtual Markers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present an intermediate representation, named virtual markers, which learns 64 landmark keypoints on the body surface based on the large-scale mocap data in a generative style, mimicking the effects of physical markers. |
Xiaoxuan Ma; Jiajun Su; Chunyu Wang; Wentao Zhu; Yizhou Wang; |
385 | CUDA: Convolution-Based Unlearnable Datasets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel, model-free, Convolution-based Unlearnable DAtaset (CUDA) generation technique. |
Vinu Sankar Sadasivan; Mahdi Soltanolkotabi; Soheil Feizi; |
386 | No One Left Behind: Improving The Worst Categories in Long-Tailed Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a simple plug-in method that is applicable to a wide range of methods. |
Yingxiao Du; Jianxin Wu; |
387 | Deep Fair Clustering Via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although a number of works have been conducted and achieved huge success recently, most of them are heuristic, and a unified theory for algorithm design is lacking. In this work, we fill this gap by developing a mutual information theory for deep fair clustering and accordingly designing a novel algorithm, dubbed FCMI. |
Pengxin Zeng; Yunfan Li; Peng Hu; Dezhong Peng; Jiancheng Lv; Xi Peng; |
388 | MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the problem, we propose a multi-information aggregation network (MIANet) that effectively leverages the general knowledge, i.e., semantic word embeddings, and instance information for accurate segmentation. |
Yong Yang; Qiong Chen; Yuan Feng; Tianlin Huang; |
389 | High Fidelity 3D Hand Shape Reconstruction Via Scalable Graph Frequency Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To capture high-frequency personalized details, we transform the 3D mesh into the frequency domain, and propose a novel frequency decomposition loss to supervise each frequency component. |
Tianyu Luan; Yuanhao Zhai; Jingjing Meng; Zhong Li; Zhang Chen; Yi Xu; Junsong Yuan; |
390 | COT: Unsupervised Domain Adaptation With Clustering and Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To cope with the two aforementioned issues, we propose a Clustering-based Optimal Transport (COT) algorithm, which formulates the alignment procedure as an Optimal Transport problem and constructs a mapping between clustering centers in the source and target domain in an end-to-end manner. |
Yang Liu; Zhipeng Zhou; Baigui Sun; |
391 | Target-Referenced Reactive Grasping for Dynamic Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to solve reactive grasping in a target-referenced setting by tracking through generated grasp spaces. |
Jirong Liu; Ruo Zhang; Hao-Shu Fang; Minghao Gou; Hongjie Fang; Chenxi Wang; Sheng Xu; Hengxu Yan; Cewu Lu; |
392 | Learning To Exploit The Sequence-Specific Prior Knowledge for Image Processing Pipelines Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a sequential ISP hyperparameter prediction framework that utilizes the sequential relationship within ISP modules and the similarity among parameters to guide the model sequence process. |
Haina Qin; Longfei Han; Weihua Xiong; Juan Wang; Wentao Ma; Bing Li; Weiming Hu; |
393 | Complexity-Guided Slimmable Decoder for Efficient Deep Video Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the complexity-guided slimmable decoder (cgSlimDecoder) in combination with skip-adaptive entropy coding (SaEC) for efficient deep video compression. |
Zhihao Hu; Dong Xu; |
394 | Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, the efficient combination of CNNs and Transformers is investigated, and a hybrid architecture called Lite-Mono is presented. |
Ning Zhang; Francesco Nex; George Vosselman; Norman Kerle; |
395 | MarginMatch: Improving Semi-Supervised Learning with Pseudo-Margins Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce MarginMatch, a new SSL approach combining consistency regularization and pseudo-labeling, with its main novelty arising from the use of unlabeled data training dynamics to measure pseudo-label quality. |
Tiberiu Sosea; Cornelia Caragea; |
396 | Neural Scene Chronology Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to reconstruct a time-varying 3D model, capable of producing photo-realistic renderings with independent control of viewpoint, illumination, and time, from Internet photos of large-scale landmarks. |
Haotong Lin; Qianqian Wang; Ruojin Cai; Sida Peng; Hadar Averbuch-Elor; Xiaowei Zhou; Noah Snavely; |
397 | Starting From Non-Parametric Networks for 3D Point Cloud Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a Non-parametric Network for 3D point cloud analysis, Point-NN, which consists of purely non-learnable components: farthest point sampling (FPS), k-nearest neighbors (k-NN), and pooling operations, with trigonometric functions. |
Renrui Zhang; Liuhui Wang; Yali Wang; Peng Gao; Hongsheng Li; Jianbo Shi; |
398 | Light Source Separation and Intrinsic Image Decomposition Under AC Illumination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that the flicker due to AC illumination is useful for intrinsic image decomposition (IID). |
Yusaku Yoshida; Ryo Kawahara; Takahiro Okabe; |
399 | TIPI: Test Time Adaptation With Transformation Invariance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When deploying a machine learning model to a new environment, we often encounter the distribution shift problem — meaning the target data distribution is different from the model’s training distribution. In this paper, we assume that labels are not provided for this new domain, and that we do not store the source data (e.g., for privacy reasons). |
A. Tuan Nguyen; Thanh Nguyen-Tang; Ser-Nam Lim; Philip H.S. Torr; |
400 | OTAvatar: One-Shot Talking Face Avatar With Controllable Tri-Plane Rendering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose One-shot Talking face Avatar (OTAvatar), which constructs face avatars by a generalized controllable tri-plane rendering solution so that each personalized avatar can be constructed from only one portrait as the reference. |
Zhiyuan Ma; Xiangyu Zhu; Guo-Jun Qi; Zhen Lei; Lei Zhang; |
401 | Beyond Appearance: A Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to learn a general human representation from massive unlabeled human images which can benefit downstream human-centric tasks to the maximum extent. |
Weihua Chen; Xianzhe Xu; Jian Jia; Hao Luo; Yaohua Wang; Fan Wang; Rong Jin; Xiuyu Sun; |
402 | Large-Capacity and Flexible Video Steganography Via Invertible Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For large capacity, we present a reversible pipeline to hide and recover multiple videos through a single invertible neural network (INN). |
Chong Mou; Youmin Xu; Jiechong Song; Chen Zhao; Bernard Ghanem; Jian Zhang; |
403 | CFA: Class-Wise Calibrated Fair Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we are the first to theoretically and empirically investigate the preference of different classes for adversarial configurations, including perturbation margin, regularization, and weight averaging. |
Zeming Wei; Yifei Wang; Yiwen Guo; Yisen Wang; |
404 | EVAL: Explainable Video Anomaly Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a novel framework for single-scene video anomaly localization that allows for human-understandable reasons for the decisions the system makes. |
Ashish Singh; Michael J. Jones; Erik G. Learned-Miller; |
405 | Position-Guided Text Prompt for Vision-Language Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel Position-guided Text Prompt (PTP) paradigm to enhance the visual grounding ability of cross-modal models trained with VLP. |
Jinpeng Wang; Pan Zhou; Mike Zheng Shou; Shuicheng Yan; |
406 | HOLODIFFUSION: Training A 3D Diffusion Model Using 2D Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address the first challenge by introducing a new diffusion setup that can be trained, end-to-end, with only posed 2D images for supervision; and the second challenge by proposing an image formation model that decouples model memory from spatial memory. |
Animesh Karnewar; Andrea Vedaldi; David Novotny; Niloy J. Mitra; |
407 | Stimulus Verification Is A Universal and Effective Sampler in Multi-Modal Human Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose stimulus verification, serving as a universal and effective sampling process to improve the multi-modal prediction capability, where stimulus refers to the factor in the observation that may affect the future movements such as social interaction and scene context. |
Jianhua Sun; Yuxuan Li; Liang Chai; Cewu Lu; |
408 | 3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we facilitate the issue by decomposing correlation learning into space and time, and present a novel Spatio-Temporal Criss-cross attention (STC) block. |
Zhenhua Tang; Zhaofan Qiu; Yanbin Hao; Richang Hong; Ting Yao; |
409 | Plateau-Reduced Differentiable Path Tracing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, inverse rendering might not converge due to inherent plateaus, i.e., regions of zero gradient, in the objective function. We propose to alleviate this by convolving the high-dimensional rendering function that maps scene parameters to images with an additional kernel that blurs the parameter space. |
Michael Fischer; Tobias Ritschel; |
410 | LoGoNet: Towards Accurate 3D Object Detection With Local-to-Global Cross-Modal Fusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the novel Local-to-Global fusion network (LoGoNet), which performs LiDAR-camera fusion at both local and global levels. |
Xin Li; Tao Ma; Yuenan Hou; Botian Shi; Yuchen Yang; Youquan Liu; Xingjiao Wu; Qin Chen; Yikang Li; Yu Qiao; Liang He; |
411 | ScaleKD: Distilling Scale-Aware Knowledge in Small Object Detector Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike existing works that struggle to balance the trade-off between inference speed and SOD performance, in this paper, we propose a novel Scale-aware Knowledge Distillation (ScaleKD), which transfers knowledge of a complex teacher model to a compact student model. |
Yichen Zhu; Qiqi Zhou; Ning Liu; Zhiyuan Xu; Zhicai Ou; Xiaofeng Mou; Jian Tang; |
412 | An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we systematically examine the potential of MVM in the context of VidL learning. |
Tsu-Jui Fu; Linjie Li; Zhe Gan; Kevin Lin; William Yang Wang; Lijuan Wang; Zicheng Liu; |
413 | Glocal Energy-Based Learning for Few-Shot Open-Set Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we approach the FSOR task by proposing a novel energy-based hybrid model. |
Haoyu Wang; Guansong Pang; Peng Wang; Lei Zhang; Wei Wei; Yanning Zhang; |
414 | Revisiting Temporal Modeling for CLIP-Based Image-to-Video Knowledge Transferring Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, based on the CLIP model, we revisit temporal modeling in the context of image-to-video knowledge transferring, which is the key point for extending image-text pretrained models to the video domain. |
Ruyang Liu; Jingjia Huang; Ge Li; Jiashi Feng; Xinglong Wu; Thomas H. Li; |
415 | MethaneMapper: Spectral Absorption Aware Hyperspectral Transformer for Methane Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing methods for analyzing this data are sensitive to local terrain conditions, often require manual inspection from domain experts, are prone to significant error, and hence are not scalable. To address these challenges, we propose a novel end-to-end spectral absorption wavelength aware transformer network, MethaneMapper, to detect and quantify the emissions. |
Satish Kumar; Ivan Arevalo; ASM Iftekhar; B S Manjunath; |
416 | Autonomous Manipulation Learning for Similar Deformable Objects Via Only One Demonstration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Generally, most existing methods for deformable object manipulation suffer from two issues: 1) massive demonstration: repeating thousands of robot-object demonstrations to train a model for one specific instance; 2) poor generalization: inevitable re-training to transfer the learned skill to a similar/new instance from the same category. Therefore, we propose a category-level deformable 3D object manipulation framework, which could manipulate deformable 3D objects with only one demonstration and generalize the learned skills to new similar instances without re-training. |
Yu Ren; Ronghan Chen; Yang Cong; |
417 | Representation Learning for Visual Object Tracking By Masked Appearance Transfer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose masked appearance transfer, a simple but effective representation learning method for tracking, based on an encoder-decoder architecture. |
Haojie Zhao; Dong Wang; Huchuan Lu; |
418 | EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Equivariant Neural Field Expectation Maximization (EFEM), a simple, effective, and robust geometric algorithm that can segment objects in 3D scenes without annotations or training on scenes. |
Jiahui Lei; Congyue Deng; Karl Schmeckpeper; Leonidas Guibas; Kostas Daniilidis; |
419 | Learning To Name Classes for Vision and Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Two distinct challenges that remain however, are high sensitivity to the choice of handcrafted class names that define queries, and the difficulty of adaptation to new, smaller datasets. Towards addressing these problems, we propose to leverage available data to learn, for each class, an optimal word embedding as a function of the visual content. |
Sarah Parisot; Yongxin Yang; Steven McDonagh; |
420 | ECON: Explicit Clothed Humans Optimized Via Normal Integration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we make two key observations: (1) current networks are better at inferring detailed 2D maps than full-3D surfaces, and (2) a parametric model can be seen as a "canvas" for stitching together detailed surface patches. |
Yuliang Xiu; Jinlong Yang; Xu Cao; Dimitrios Tzionas; Michael J. Black; |
421 | Neural Fourier Filter Bank Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel method to provide efficient and highly detailed reconstructions. |
Zhijie Wu; Yuhe Jin; Kwang Moo Yi; |
422 | F2-NeRF: Fast Neural Radiance Field Training With Free Camera Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we delve deep into the mechanism of space warping to handle unbounded scenes. |
Peng Wang; Yuan Liu; Zhaoxi Chen; Lingjie Liu; Ziwei Liu; Taku Komura; Christian Theobalt; Wenping Wang; |
423 | NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-Shot Real Image Animation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite successful synthesis of fake identity images randomly sampled from latent space, adopting these models for generating face images of real subjects is still a challenging task due to its so-called inversion issue. In this paper, we propose a universal method to surgically fine-tune these NeRF-GAN models in order to achieve high-fidelity animation of real subjects only by a single image. |
Yu Yin; Kamran Ghasedi; HsiangTao Wu; Jiaolong Yang; Xin Tong; Yun Fu; |
424 | Learning To Detect and Segment for Open Vocabulary Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose CondHead, a principled dynamic network design to better generalize the box regression and mask segmentation for open vocabulary setting. |
Tao Wang; |
425 | Disentangling Writer and Character Styles for Handwriting Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, while a person’s handwriting typically exhibits general uniformity (e.g., glyph slant and aspect ratios), there are still small style variations in finer details (e.g., stroke length and curvature) of characters. In light of this, we propose to disentangle the style representations at both writer and character levels from individual handwritings to synthesize realistic stylized online handwritten characters. |
Gang Dai; Yifan Zhang; Qingfeng Wang; Qing Du; Zhuliang Yu; Zhuoman Liu; Shuangping Huang; |
426 | Nighttime Smartphone Reflective Flare Removal Using Optical Center Symmetry Prior Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Many existing methods rely on manually designed features to detect these bright spots, but they often fail to identify reflective flares created by various types of light and may even mistakenly remove the light sources in scenarios with multiple light sources. To address these challenges, we propose an optical center symmetry prior, which suggests that the reflective flare and light source are always symmetrical around the lens’s optical center. |
Yuekun Dai; Yihang Luo; Shangchen Zhou; Chongyi Li; Chen Change Loy; |
427 | StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-Based Generator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose StyleSync, an effective framework that enables high-fidelity lip synchronization. |
Jiazhi Guan; Zhanwang Zhang; Hang Zhou; Tianshu Hu; Kaisiyuan Wang; Dongliang He; Haocheng Feng; Jingtuo Liu; Errui Ding; Ziwei Liu; Jingdong Wang; |
428 | Balanced Spherical Grid for Egocentric View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present EgoNeRF, a practical solution to reconstruct large-scale real-world environments for VR assets. |
Changwoon Choi; Sang Min Kim; Young Min Kim; |
429 | Box-Level Active Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, existing methods still perform image-level annotation, but equally scoring all targets within the same image wastes budget and yields redundant labels. Having revealed the above problems and limitations, we introduce a box-level active detection framework that controls a box-based budget per cycle, prioritizes informative targets and avoids redundancy for fair comparison and efficient application. |
Mengyao Lyu; Jundong Zhou; Hui Chen; Yijie Huang; Dongdong Yu; Yaqian Li; Yandong Guo; Yuchen Guo; Liuyu Xiang; Guiguang Ding; |
430 | Coreset Sampling From Open-Set for Fine-Grained Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, we propose SimCore algorithm to sample a coreset, the subset of an open-set that has a minimum distance to the target dataset in the latent space. |
Sungnyun Kim; Sangmin Bae; Se-Young Yun; |
431 | Trace and Pace: Controllable Pedestrian Animation Via Guided Trajectory Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a method for generating realistic pedestrian trajectories and full-body animations that can be controlled to meet user-defined goals. |
Davis Rempe; Zhengyi Luo; Xue Bin Peng; Ye Yuan; Kris Kitani; Karsten Kreis; Sanja Fidler; Or Litany; |
432 | Overlooked Factors in Concept-Based Explanations: Dataset Choice, Concept Learnability, and Human Capability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we identify and analyze three commonly overlooked factors in concept-based explanations. |
Vikram V. Ramaswamy; Sunnie S. Y. Kim; Ruth Fong; Olga Russakovsky; |
433 | Unsupervised 3D Shape Reconstruction By Part Retrieval and Assembly Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We instead propose to decompose shapes using a library of 3D parts provided by the user, giving full control over the choice of parts. |
Xianghao Xu; Paul Guerrero; Matthew Fisher; Siddhartha Chaudhuri; Daniel Ritchie; |
434 | SeqTrack: Sequence to Sequence Learning for Visual Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a new sequence-to-sequence learning framework for visual tracking, dubbed SeqTrack. |
Xin Chen; Houwen Peng; Dong Wang; Huchuan Lu; Han Hu; |
435 | Self-Supervised Non-Uniform Kernel Estimation With Flow-Based Motion Prior for Blind Image Deblurring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Some methods have attempted to explicitly estimate non-uniform blur kernels by CNNs, but accurate estimation is still challenging due to the lack of ground truth about spatially varying blur kernels in real-world images. To address these issues, we propose to represent the field of motion blur kernels in a latent space by normalizing flows, and design CNNs to predict the latent codes instead of motion kernels. |
Zhenxuan Fang; Fangfang Wu; Weisheng Dong; Xin Li; Jinjian Wu; Guangming Shi; |
436 | AutoLabel: CLIP-Based Framework for Open-Set Video Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we deviate from the prior work of training a specialized open-set classifier or weighted adversarial learning by proposing to use pre-trained Language and Vision Models (CLIP). |
Giacomo Zara; Subhankar Roy; Paolo Rota; Elisa Ricci; |
437 | Generative Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Generative Semantic Segmentation (GSS), a generative learning approach for semantic segmentation. |
Jiaqi Chen; Jiachen Lu; Xiatian Zhu; Li Zhang; |
438 | Instant-NVR: Instant Neural Volumetric Rendering for Human-Object Interactions From Monocular RGBD Stream Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Instant-NVR, a neural approach for instant volumetric human-object tracking and rendering using a single RGBD camera. |
Yuheng Jiang; Kaixin Yao; Zhuo Su; Zhehao Shen; Haimin Luo; Lan Xu; |
439 | Aligning Step-by-Step Instructional Diagrams to Video Demonstrations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Multimodal alignment facilitates the retrieval of instances from one modality when queried using another. In this paper, we consider a novel setting where such an alignment is between (i) instruction steps that are depicted as assembly diagrams (commonly seen in Ikea assembly manuals) and (ii) video segments from in-the-wild videos, where these videos comprise an enactment of the assembly actions in the real world. |
Jiahao Zhang; Anoop Cherian; Yanbin Liu; Yizhak Ben-Shabat; Cristian Rodriguez; Stephen Gould; |
440 | Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose to collect Cross-Modal Presence-Absence Evidence (CMPAE) in a unified framework. |
Junyu Gao; Mengyuan Chen; Changsheng Xu; |
441 | High-Fidelity and Freely Controllable Talking Head Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel model that produces high-fidelity talking head videos with free control over head pose and expression. |
Yue Gao; Yuan Zhou; Jinglu Wang; Xiao Li; Xiang Ming; Yan Lu; |
442 | Q-DETR: An Efficient Low-Bit Quantized Detection Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: At the upper level, we introduce a new foreground-aware query matching scheme to effectively transfer the teacher information to distillation-desired features to minimize the conditional information entropy. |
Sheng Xu; Yanjing Li; Mingbao Lin; Peng Gao; Guodong Guo; Jinhu Lü; Baochang Zhang; |
443 | DINER: Depth-Aware Image-Based NEural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose novel techniques to incorporate depth information into feature fusion and efficient scene sampling. |
Malte Prinzler; Otmar Hilliges; Justus Thies; |
444 | Burstormer: Burst Image Restoration and Enhancement Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The challenge is to properly align the successive image shots and merge their complementary information to achieve high-quality outputs. Towards this direction, we propose Burstormer: a novel transformer-based architecture for burst image restoration and enhancement. |
Akshay Dudhane; Syed Waqas Zamir; Salman Khan; Fahad Shahbaz Khan; Ming-Hsuan Yang; |
445 | Progressive Transformation Learning for Leveraging Virtual Images in Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a viable alternative to laborious and costly data curation, we introduce Progressive Transformation Learning (PTL), which gradually augments a training dataset by adding transformed virtual images with enhanced realism. |
Yi-Ting Shen; Hyungtae Lee; Heesung Kwon; Shuvra S. Bhattacharyya; |
446 | Co-Speech Gesture Synthesis By Reinforcement Learning With Contrastive Pre-Trained Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel reinforcement learning (RL) framework called RACER to generate sequences of gestures that maximize the overall satisfaction. |
Mingyang Sun; Mengchen Zhao; Yaqing Hou; Minglei Li; Huang Xu; Songcen Xu; Jianye Hao; |
447 | Reconstructing Signing Avatars From Video Using Linguistic Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our method, SGNify, captures fine-grained hand pose, facial expression, and body movement fully automatically from in-the-wild monocular SL videos. |
Maria-Paola Forte; Peter Kulits; Chun-Hao P. Huang; Vasileios Choutas; Dimitrios Tzionas; Katherine J. Kuchenbecker; Michael J. Black; |
448 | DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This is due to the lack of loop closures and exact cross-frame point correspondences, and the slow convergence of its global localization network. We propose DeepMapping2 by adding two novel techniques to address these issues: (1) organization of training batch based on map topology from loop closing, and (2) self-supervised local-to-global point consistency loss leveraging pairwise registration. |
Chao Chen; Xinhao Liu; Yiming Li; Li Ding; Chen Feng; |
449 | SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose SDC-UDA, a simple yet effective volumetric UDA framework for Slice-Direction Continuous cross-modality medical image segmentation which combines intra- and inter-slice self-attentive image translation, uncertainty-constrained pseudo-label refinement, and volumetric self-training. |
Hyungseob Shin; Hyeongyu Kim; Sewon Kim; Yohan Jun; Taejoon Eo; Dosik Hwang; |
450 | DoNet: Deep De-Overlapping Network for Cytology Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a De-overlapping Network (DoNet) following a decompose-and-recombine strategy. |
Hao Jiang; Rushan Zhang; Yanning Zhou; Yumeng Wang; Hao Chen; |
451 | AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a multimodal solution to the problem of 4D face reconstruction from monocular videos. |
Aggelina Chatziagapi; Dimitris Samaras; |
452 | Divide and Conquer: Answering Questions With Object Factorization and Compositional Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by humans’ reasoning of the visual world, we tackle the aforementioned challenges from a compositional perspective, and propose an integral framework consisting of a principled object factorization method and a novel neural module network. |
Shi Chen; Qi Zhao; |
453 | Instant Domain Augmentation for LiDAR Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a fast and flexible LiDAR augmentation method for the semantic segmentation task, called ‘LiDomAug’. |
Kwonyoung Ryu; Soonmin Hwang; Jaesik Park; |
454 | A Characteristic Function-Based Method for Bottom-Up Human Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To cope with this problem, from a novel perspective, we propose a new bottom-up human pose estimation method that optimizes the heatmap prediction via minimizing the distance between two characteristic functions respectively constructed from the predicted heatmap and the groundtruth heatmap. Our analysis presented in this paper indicates that the distance between these two characteristic functions is essentially the upper bound of the L2 losses w.r.t. sub-regions of the predicted heatmap. |
Haoxuan Qu; Yujun Cai; Lin Geng Foo; Ajay Kumar; Jun Liu; |
455 | SceneTrilogy: On Human Scene-Sketch and Its Complementarity With Photo and Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend scene understanding to include that of human sketch. |
Pinaki Nath Chowdhury; Ayan Kumar Bhunia; Aneeshan Sain; Subhadeep Koley; Tao Xiang; Yi-Zhe Song; |
456 | ERM-KTP: Knowledge-Level Machine Unlearning Via Knowledge Transfer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, the approximate approaches have serious security flaws because even unlearning completely different data points can produce the same contribution estimation as unlearning the target data points. To address the above problems, we try to define machine unlearning from the knowledge perspective, and we propose a knowledge-level machine unlearning method, namely ERM-KTP. |
Shen Lin; Xiaoyu Zhang; Chenyang Chen; Xiaofeng Chen; Willy Susilo; |
457 | RefSR-NeRF: Towards High Fidelity and Super Resolution View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Reference-guided Super-Resolution Neural Radiance Field (RefSR-NeRF) that extends NeRF to super resolution and photorealistic novel view synthesis. |
Xudong Huang; Wei Li; Jie Hu; Hanting Chen; Yunhe Wang; |
458 | DATE: Domain Adaptive Product Seeker for E-Commerce Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Domain Adaptive producT sEeker (DATE) framework, regarding PR and PG as product seeking problems at different levels, to assist the query to date the product. |
Haoyuan Li; Hao Jiang; Tao Jin; Mengyan Li; Yan Chen; Zhijie Lin; Yang Zhao; Zhou Zhao; |
459 | Polarimetric IToF: Measuring High-Fidelity Depth Through Scattering Media Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a polarimetric iToF imaging method that can capture depth information robustly through scattering media. |
Daniel S. Jeon; Andréas Meuleman; Seung-Hwan Baek; Min H. Kim; |
460 | Jedi: Entropy-Based Localization and Removal of Adversarial Patches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Jedi, a new defense against adversarial patches that is resilient to realistic patch attacks, and also improves detection and recovery compared to the state of the art. |
Bilel Tarchoun; Anouar Ben Khalifa; Mohamed Ali Mahjoub; Nael Abu-Ghazaleh; Ihsen Alouani; |
461 | Localized Semantic Feature Mixers for Efficient Pedestrian Detection in Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Localized Semantic Feature Mixers (LSFM), a novel, anchor-free pedestrian detection architecture. |
Abdul Hannan Khan; Mohammed Shariq Nawaz; Andreas Dengel; |
462 | Self-Supervised Super-Plane for Neural 3D Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a self-supervised super-plane constraint by exploring the free geometry cues from the predicted surface, which can further regularize the reconstruction of plane regions without any other ground truth annotations. |
Botao Ye; Sifei Liu; Xueting Li; Ming-Hsuan Yang; |
463 | DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose DisCo-CLIP, a distributed memory-efficient CLIP training approach, to reduce the memory consumption of contrastive loss when training contrastive learning models. |
Yihao Chen; Xianbiao Qi; Jianan Wang; Lei Zhang; |
464 | GM-NeRF: Learning Generalizable Model-Based Neural Radiance Fields From Multi-View Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on synthesizing high-fidelity novel view images for arbitrary human performers, given a set of sparse multi-view images. |
Jianchuan Chen; Wentao Yi; Liqian Ma; Xu Jia; Huchuan Lu; |
465 | VDN-NeRF: Resolving Shape-Radiance Ambiguity Via View-Dependence Normalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose VDN-NeRF, a method to train neural radiance fields (NeRFs) for better geometry under non-Lambertian surface and dynamic lighting conditions that cause significant variation in the radiance of a point when viewed from different angles. |
Bingfan Zhu; Yanchao Yang; Xulong Wang; Youyi Zheng; Leonidas Guibas; |
466 | Mobile User Interface Element Detection Via Adaptively Prompt Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop a new MUI element detection dataset named MUI-zh and propose an Adaptively Prompt Tuning (APT) module to take advantage of discriminating OCR information. |
Zhangxuan Gu; Zhuoer Xu; Haoxing Chen; Jun Lan; Changhua Meng; Weiqiang Wang; |
467 | Perspective Fields for Single Image Camera Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose perspective fields as a representation that models the local perspective properties of an image. |
Linyi Jin; Jianming Zhang; Yannick Hold-Geoffroy; Oliver Wang; Kevin Blackburn-Matzen; Matthew Sticha; David F. Fouhey; |
468 | Sparse Multi-Modal Graph Transformer With Shared-Context Processing for Representation Learning of Giga-Pixel Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, by defining the novel concept of shared-context processing, we designed a multi-modal Graph Transformer that uses the cellular graph within the tissue to provide a single representation for a patient while taking advantage of the hierarchical structure of the tissue, enabling a dynamic focus between cell-level and tissue-level information. |
Ramin Nakhli; Puria Azadi Moghadam; Haoyang Mi; Hossein Farahani; Alexander Baras; Blake Gilks; Ali Bashashati; |
469 | Generating Human Motion From Textual Descriptions With Discrete Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate a simple and well-established conditional generative framework based on Vector Quantised-Variational AutoEncoder (VQ-VAE) and Generative Pre-trained Transformer (GPT) for human motion generation from textual descriptions. |
Jianrong Zhang; Yangsong Zhang; Xiaodong Cun; Yong Zhang; Hongwei Zhao; Hongtao Lu; Xi Shen; Ying Shan; |
470 | Spatial-Temporal Concept Based Explanation of 3D ConvNets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a STCE (Spatial-temporal Concept-based Explanation) framework for interpreting 3D ConvNets. |
Ying Ji; Yu Wang; Jien Kato; |
471 | Robust Test-Time Adaptation in Dynamic Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these attempts may fail in dynamic scenarios of real-world applications like autonomous driving, where the environments gradually change and the test data is sampled correlatively over time. In this work, we explore such practical test data streams to deploy the model on the fly, namely practical test-time adaptation (PTTA). |
Longhui Yuan; Binhui Xie; Shuang Li; |
472 | Global and Local Mixture Consistency Cumulative Learning for Long-Tailed Visual Recognitions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, our goal is to design a simple learning paradigm for long-tail visual recognition, which not only improves the robustness of the feature extractor but also alleviates the bias of the classifier towards head classes while reducing the training skills and overhead. |
Fei Du; Peng Yang; Qi Jia; Fengtao Nan; Xiaoting Chen; Yun Yang; |
473 | NIRVANA: Neural Implicit Representations of Videos With Adaptive Networks and Autoregressive Patch-Wise Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, these methods have fixed architectures which do not scale to longer videos or higher resolutions. To address these issues, we propose NIRVANA, which treats videos as groups of frames and fits separate networks to each group performing patch-wise prediction. |
Shishira R. Maiya; Sharath Girish; Max Ehrlich; Hanyu Wang; Kwot Sin Lee; Patrick Poirson; Pengxiang Wu; Chen Wang; Abhinav Shrivastava; |
474 | Towards Accurate Image Coding: Improved Autoregressive Image Generation With Dynamic Vector Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the problem, we propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE) which encodes image regions into variable-length codes based on their information densities for an accurate & compact code representation. |
Mengqi Huang; Zhendong Mao; Zhuowei Chen; Yongdong Zhang; |
475 | Coaching A Teachable Student Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel knowledge distillation framework for effectively teaching a sensorimotor student agent to drive from the supervision of a privileged teacher agent. |
Jimuyang Zhang; Zanming Huang; Eshed Ohn-Bar; |
476 | Collaboration Helps Camera Overtake LiDAR in 3D Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work proposes an orthogonal direction, improving the camera-only 3D detection by introducing multi-agent collaborations. |
Yue Hu; Yifan Lu; Runsheng Xu; Weidi Xie; Siheng Chen; Yanfeng Wang; |
477 | RealImpact: A Dataset of Impact Sound Fields for Real Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present RealImpact, a large-scale dataset of real object impact sounds recorded under controlled conditions. |
Samuel Clarke; Ruohan Gao; Mason Wang; Mark Rau; Julia Xu; Jui-Hsien Wang; Doug L. James; Jiajun Wu; |
478 | ReCo: Region-Controlled Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, large-scale text-to-image (T2I) models have shown impressive performance in generating high-fidelity images, but with limited controllability, e.g., precisely specifying the content in a specific region with a free-form text description. In this paper, we propose an effective technique for such regional control in T2I generation. |
Zhengyuan Yang; Jianfeng Wang; Zhe Gan; Linjie Li; Kevin Lin; Chenfei Wu; Nan Duan; Zicheng Liu; Ce Liu; Michael Zeng; Lijuan Wang; |
479 | WINNER: Weakly-Supervised HIerarchical DecompositioN and AligNment for Spatio-tEmporal Video GRounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify that intra-sample spurious correlations among video-language components can be alleviated if the model captures the decomposed structures of video and language data. In this light, we propose a novel framework, namely WINNER, for hierarchical video-text understanding. |
Mengze Li; Han Wang; Wenqiao Zhang; Jiaxu Miao; Zhou Zhao; Shengyu Zhang; Wei Ji; Fei Wu; |
480 | Preserving Linear Separability in Continual Learning By Backward Feature Projection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most feature distillation methods directly constrain the new features to match the old ones, overlooking the need for plasticity. To achieve a better stability-plasticity trade-off, we propose Backward Feature Projection (BFP), a method for continual learning that allows the new features to change up to a learnable linear transformation of the old features. |
Qiao Gu; Dongsub Shim; Florian Shkurti; |
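The BFP idea above, constraining new features only up to a learnable linear map of the old ones rather than forcing an exact match, can be illustrated with a simple distillation loss. A minimal numpy sketch; the matrix shapes and the function name `bfp_loss` are illustrative, not the authors' implementation:

```python
import numpy as np

def bfp_loss(f_new, f_old, A):
    """Backward-feature-projection-style loss: penalize the distance between a
    linear projection of the new features and the old ones, so new features may
    drift freely as long as the old information stays linearly recoverable."""
    proj = f_new @ A                     # (batch, d_old): project new features backward
    return np.mean((proj - f_old) ** 2)

rng = np.random.default_rng(0)
f_old = rng.normal(size=(8, 16))
# If the new features are an invertible linear transform of the old ones,
# the right projection matrix drives the loss to zero -- linear separability
# of the old task is preserved even though the features themselves changed.
M = rng.normal(size=(16, 16))
f_new = f_old @ np.linalg.inv(M)
loss = bfp_loss(f_new, f_old, M)        # projecting back with A = M recovers f_old
print(round(loss, 6))  # → 0.0
```

A plain feature-distillation loss corresponds to fixing `A` to the identity, which is exactly the rigidity the highlight says BFP relaxes.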
481 | MHPL: Minimum Happy Points Learning for Active Source Free Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose minimum happy points learning (MHPL) to explore and exploit MH points actively. |
Fan Wang; Zhongyi Han; Zhiyan Zhang; Rundong He; Yilong Yin; |
482 | Fix The Noise: Disentangling Source Feature for Controllable Domain Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new approach for high-quality domain translation with better controllability. |
Dongyeun Lee; Jae Young Lee; Doyeon Kim; Jaehyun Choi; Jaejun Yoo; Junmo Kim; |
483 | Metadata-Based RAW Reconstruction Via Implicit Neural Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there still exist limitations in making full use of the metadata. In this paper, instead of following the perspective of sRGB-to-RAW mapping, we reformulate the problem as mapping the 2D coordinates of the metadata to its RAW values conditioned on the corresponding sRGB values. |
Leyi Li; Huijie Qiao; Qi Ye; Qinmin Yang; |
484 | Uni-Perceiver V2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Uni-Perceiver v2, which is the first generalist model capable of handling major large-scale vision and vision-language tasks with competitive performance. |
Hao Li; Jinguo Zhu; Xiaohu Jiang; Xizhou Zhu; Hongsheng Li; Chun Yuan; Xiaohua Wang; Yu Qiao; Xiaogang Wang; Wenhai Wang; Jifeng Dai; |
485 | Sparsely Annotated Semantic Segmentation With Adaptive Gaussian Mixtures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we observe that pixels that are close to each other in the feature space are more likely to share the same class. |
Linshan Wu; Zhun Zhong; Leyuan Fang; Xingxin He; Qiang Liu; Jiayi Ma; Hao Chen; |
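The observation above, that feature-space neighbors tend to share a class, is what allows sparse annotations to propagate to unlabeled pixels. A minimal sketch of this idea using one mean prototype per class; the single-prototype scheme and the function name `propagate_labels` are simplifications for illustration, not the paper's adaptive Gaussian-mixture formulation:

```python
import numpy as np

def propagate_labels(feats, labeled_idx, labels, num_classes):
    """Assign each pixel the class of the nearest class prototype, where a
    prototype is the mean feature of that class's sparsely annotated pixels."""
    protos = np.stack([feats[labeled_idx[labels == c]].mean(axis=0)
                       for c in range(num_classes)])            # (C, d)
    dists = np.linalg.norm(feats[:, None, :] - protos[None, :, :], axis=-1)
    return dists.argmin(axis=1)                                 # pseudo-label per pixel

# Two well-separated feature clusters, one sparse annotation in each:
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 4.9]])
pseudo = propagate_labels(feats, np.array([0, 2]), np.array([0, 1]), 2)
print(pseudo.tolist())  # → [0, 0, 1, 1]
```

Modeling each class with a full Gaussian mixture, as the paper does, replaces the hard nearest-prototype assignment with soft, density-aware responsibilities.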
486 | Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning With Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, humans use cross-modal information to learn new concepts efficiently. In this work, we demonstrate that one can indeed build a better visual dog classifier by reading about dogs and listening to them bark. |
Zhiqiu Lin; Samuel Yu; Zhiyi Kuang; Deepak Pathak; Deva Ramanan; |
487 | Decompose More and Aggregate Better: Two Closer Looks at Frequency Representation Learning for Human Motion Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Encouraged by the effectiveness of encoding temporal dynamics within the frequency domain, recent human motion prediction systems prefer to first convert the motion representation from the original pose space into the frequency space. In this paper, we introduce two closer looks at effective frequency representation learning for robust motion prediction and summarize them as: decompose more and aggregate better. |
Xuehao Gao; Shaoyi Du; Yang Wu; Yang Yang; |
488 | Diversity-Aware Meta Visual Prompting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Diversity-Aware Meta Visual Prompting (DAM-VP), an efficient and effective prompting method for transferring pre-trained models to downstream tasks with frozen backbone. |
Qidong Huang; Xiaoyi Dong; Dongdong Chen; Weiming Zhang; Feifei Wang; Gang Hua; Nenghai Yu; |
489 | Affection: Learning Affective Explanations for Real-World Visual Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the space of emotional reactions induced by real-world images. |
Panos Achlioptas; Maks Ovsjanikov; Leonidas Guibas; Sergey Tulyakov; |
490 | 3D Highlighter: Localizing Regions on 3D Shapes Via Text Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present 3D Highlighter, a technique for localizing semantic regions on a mesh using text as input. |
Dale Decatur; Itai Lang; Rana Hanocka; |
491 | Iterative Geometry Encoding Volume for Stereo Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Iterative Geometry Encoding Volume (IGEV-Stereo), a new deep network architecture for stereo matching. |
Gangwei Xu; Xianqi Wang; Xiaohuan Ding; Xin Yang; |
492 | PLA: Language-Driven Open-Vocabulary 3D Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose to distill knowledge encoded in pre-trained vision-language (VL) foundation models through captioning multi-view images from 3D, which allows explicitly associating 3D and semantic-rich captions. |
Runyu Ding; Jihan Yang; Chuhui Xue; Wenqing Zhang; Song Bai; Xiaojuan Qi; |
493 | FaceLit: Neural 3D Relightable Faces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a generative framework, FaceLit, capable of generating a 3D face that can be rendered at various user-defined lighting conditions and views, learned purely from 2D images in-the-wild without any manual annotation. |
Anurag Ranjan; Kwang Moo Yi; Jen-Hao Rick Chang; Oncel Tuzel; |
494 | Visual Programming: Compositional Visual Reasoning Without Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present VISPROG, a neuro-symbolic approach to solving complex and compositional visual tasks given natural language instructions. |
Tanmay Gupta; Aniruddha Kembhavi; |
495 | InstMove: Instance Motion for Object-Centric Video Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study the instance-level motion and present InstMove, which stands for Instance Motion for Object-centric Video Segmentation. |
Qihao Liu; Junfeng Wu; Yi Jiang; Xiang Bai; Alan L. Yuille; Song Bai; |
496 | Real-Time Evaluation in Online Continual Learning: A New Hope Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current evaluations of Continual Learning (CL) methods typically assume that there is no constraint on training time and computation. This is an unrealistic assumption for any real-world setting, which motivates us to propose: a practical real-time evaluation of continual learning, in which the stream does not wait for the model to complete training before revealing the next data for predictions. |
Yasir Ghunaim; Adel Bibi; Kumail Alhamoud; Motasem Alfarra; Hasan Abed Al Kader Hammoud; Ameya Prabhu; Philip H.S. Torr; Bernard Ghanem; |
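The protocol described above can be illustrated with a toy stream loop: every incoming sample must be predicted with the current (possibly stale) model, and a training update only lands once per `train_cost` stream steps. All names and the fixed-budget scheme are hypothetical simplifications, not the paper's benchmark:

```python
def realtime_stream_eval(stream, predict, update, train_cost=2):
    """Toy real-time continual-learning evaluation: the stream does not wait
    for training, so slow learners predict with outdated parameters."""
    correct = 0
    pending = 0
    for x, y in stream:
        correct += int(predict(x) == y)  # prediction cannot wait for training
        pending += 1
        if pending >= train_cost:        # the deferred update finally completes
            update(x, y)
            pending = 0
    return correct / len(stream)

# Tiny demo: a "model" that memorizes the last label it was trained on.
state = {"label": None}
acc = realtime_stream_eval(
    stream=[(i, "a") for i in range(4)],
    predict=lambda x: state["label"],
    update=lambda x, y: state.update(label=y),
    train_cost=2,
)
print(acc)  # → 0.5
```

With `train_cost=1` (updates as fast as the stream) the same demo scores 0.75, which captures the paper's point: under real-time evaluation, computational cost directly affects measured accuracy.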
497 | GRES: Generalized Referring Expression Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a new benchmark called Generalized Referring Expression Segmentation (GRES), which extends the classic RES to allow expressions to refer to an arbitrary number of target objects. |
Chang Liu; Henghui Ding; Xudong Jiang; |
498 | Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The goal of this work is to develop a more reliable technique that can carry out an end-to-end evaluation of adversarial robustness for commercial systems. |
Xiao Yang; Chang Liu; Longlong Xu; Yikai Wang; Yinpeng Dong; Ning Chen; Hang Su; Jun Zhu; |
499 | BAAM: Monocular 3D Pose and Shape Reconstruction With Bi-Contextual Attention Module and Attention-Guided Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A novel monocular 3D pose and shape reconstruction algorithm, based on bi-contextual attention and attention-guided modeling (BAAM), is proposed in this work. |
Hyo-Jun Lee; Hanul Kim; Su-Min Choi; Seong-Gyun Jeong; Yeong Jun Koh; |
500 | Freestyle Layout-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore the freestyle capability of the model, i.e., how far can it generate unseen semantics (e.g., classes, attributes, and styles) onto a given layout, and call the task Freestyle LIS (FLIS). |
Han Xue; Zhiwu Huang; Qianru Sun; Li Song; Wenjun Zhang; |
501 | Effective Ambiguity Attack Against Passport-Based DNN Intellectual Property Protection Schemes Through Fully Connected Layer Substitution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we tackle the issue of evaluating the security of passport-based IP protection methods. |
Yiming Chen; Jinyu Tian; Xiangyu Chen; Jiantao Zhou; |
502 | Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When looking at an image, we can decompose the scene into entities and their parts as well as obtain the dependencies between them. To mimic such capability, we propose Visual Dependency Transformers (DependencyViT) that can induce visual dependencies without any labels. |
Mingyu Ding; Yikang Shen; Lijie Fan; Zhenfang Chen; Zitian Chen; Ping Luo; Joshua B. Tenenbaum; Chuang Gan; |
503 | Differentiable Architecture Search With Random Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we make efforts to alleviate the performance collapse problem for DARTS from two aspects. |
Xuanyang Zhang; Yonggang Li; Xiangyu Zhang; Yongtao Wang; Jian Sun; |
504 | Open-Set Fine-Grained Retrieval Via Prompting Vision-Language Evaluator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel Prompting vision-Language Evaluator (PLEor) framework based on the recently introduced contrastive language-image pretraining (CLIP) model, for open-set fine-grained retrieval. |
Shijie Wang; Jianlong Chang; Haojie Li; Zhihui Wang; Wanli Ouyang; Qi Tian; |
505 | Sibling-Attack: Rethinking Transferable Adversarial Attacks Against Face Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent research took an important step towards attacking black-box FR models by leveraging transferability, their performance is still limited, especially against online commercial FR systems, where results can be pessimistic (e.g., less than a 50% attack success rate (ASR) on average). Motivated by this, we present Sibling-Attack, a new FR attack technique that, for the first time, explores a multi-task perspective (i.e., leveraging extra information from multi-correlated tasks to boost attacking transferability). |
Zexin Li; Bangjie Yin; Taiping Yao; Junfeng Guo; Shouhong Ding; Simin Chen; Cong Liu; |
506 | Enhanced Stable View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an approach to enhance the novel view synthesis from images taken from a freely moving camera. |
Nishant Jain; Suryansh Kumar; Luc Van Gool; |
507 | Breaching FedMD: Image Recovery Via Paired-Logits Inversion Attack Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we found that even though sharing output logits of public datasets is safer than directly sharing gradients, there still exists a substantial risk of data exposure caused by carefully designed malicious attacks. |
Hideaki Takahashi; Jingjing Liu; Yang Liu; |
508 | TempSAL – Uncovering Temporal Information for Deep Saliency Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel saliency prediction model that learns to output saliency maps in sequential time intervals by exploiting human temporal attention patterns. |
Bahar Aydemir; Ludo Hoffstetter; Tong Zhang; Mathieu Salzmann; Sabine Süsstrunk; |
509 | Biomechanics-Guided Facial Action Unit Detection Through Force Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a biomechanics-guided AU detection approach, where facial muscle activation forces are modelled, and are employed to predict AU activation. |
Zijun Cui; Chenyi Kuang; Tian Gao; Kartik Talamadupula; Qiang Ji; |
510 | Equiangular Basis Vectors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Equiangular Basis Vectors (EBVs) for classification tasks. |
Yang Shen; Xuhao Sun; Xiu-Shen Wei; |
511 | PIRLNav: Pretraining With Imitation and RL Finetuning for ObjectNav Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present PIRLNav, a two-stage learning scheme for BC pretraining on human demonstrations followed by RL-finetuning. |
Ram Ramrakhya; Dhruv Batra; Erik Wijmans; Abhishek Das; |
512 | Megahertz Light Steering Without Moving Parts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a light steering technology that operates at megahertz frequencies, has no moving parts, and costs less than a hundred dollars. |
Adithya Pediredla; Srinivasa G. Narasimhan; Maysamreza Chamanzar; Ioannis Gkioulekas; |
513 | Iterative Proposal Refinement for Weakly-Supervised Video Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel IteRative prOposal refiNement network (dubbed IRON) to gradually distill the prior knowledge into each proposal and encourage proposals with more complete coverage. |
Meng Cao; Fangyun Wei; Can Xu; Xiubo Geng; Long Chen; Can Zhang; Yuexian Zou; Tao Shen; Daxin Jiang; |
514 | SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we make an attempt to exploit spatial and channel redundancy among features for CNN compression and propose an efficient convolution module, called SCConv (Spatial and Channel reconstruction Convolution), to decrease redundant computing and facilitate representative feature learning. |
Jiafeng Li; Ying Wen; Lianghua He; |
515 | StyleGene: Crossover and Mutation of Region-Level Facial Genes for Kinship Face Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it is challenging to synthesize high-quality descendant faces with genetic relations due to the lack of large-scale, high-quality annotated kinship data. This paper proposes RFG (Region-level Facial Gene) extraction framework to address this issue. |
Hao Li; Xianxu Hou; Zepeng Huang; Linlin Shen; |
516 | Clothed Human Performance Capture With A Double-Layer Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is inconvenient to extract cloth semantics and capture clothing motion with a one-piece template, while single-frame-based methods may suffer from unstable tracking across videos. To address these problems, we propose a novel method for human performance capture that tracks clothing and human body motion separately with double-layer neural radiance fields (NeRFs). |
Kangkan Wang; Guofeng Zhang; Suxu Cong; Jian Yang; |
517 | NeuFace: Realistic 3D Neural Face Rendering From Multi-View Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel 3D face rendering model, namely NeuFace, to learn accurate and physically-meaningful underlying 3D representations by neural rendering techniques. |
Mingwu Zheng; Haiyu Zhang; Hongyu Yang; Di Huang; |
518 | Cross-Guided Optimization of Radiance Fields With Multi-View Image Super-Resolution for High-Resolution Novel View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, there is a limit to improving the performance of high-resolution novel view synthesis (HRNVS). To solve this problem, we propose a novel framework using cross-guided optimization of the single-image super-resolution (SISR) and radiance fields. |
Youngho Yoon; Kuk-Jin Yoon; |
519 | Unified Pose Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Unified Pose Sequence Modeling approach to unify heterogeneous human behavior understanding tasks based on pose data, e.g., action recognition, 3D pose estimation and 3D early action prediction. |
Lin Geng Foo; Tianjiao Li; Hossein Rahmani; Qiuhong Ke; Jun Liu; |
520 | Probability-Based Global Cross-Modal Upsampling for Pansharpening Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although deep learning (DL) approaches performed well on this task, current upsampling methods used in these approaches only utilize the local information of each pixel in the low-resolution multispectral (LRMS) image while neglecting to exploit its global information as well as the cross-modal information of the guiding panchromatic (PAN) image, which limits their performance improvement. To address this issue, this paper develops a novel probability-based global cross-modal upsampling (PGCU) method for pan-sharpening. |
Zeyu Zhu; Xiangyong Cao; Man Zhou; Junhao Huang; Deyu Meng; |
521 | Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new recipe for a contrastive-based evaluation metric for image captioning, namely Positive-Augmented Contrastive learning Score (PAC-S), that in a novel way unifies the learning of a contrastive visual-semantic space with the addition of generated images and text on curated data. |
Sara Sarto; Manuele Barraco; Marcella Cornia; Lorenzo Baraldi; Rita Cucchiara; |
522 | Rethinking Domain Generalization for Face Anti-Spoofing: Separability and Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, instead of constructing a domain-invariant feature space, we encourage domain separability while aligning the live-to-spoof transition (i.e., the trajectory from live to spoof) to be the same for all domains. |
Yiyou Sun; Yaojie Liu; Xiaoming Liu; Yixuan Li; Wen-Sheng Chu; |
523 | SMOC-Net: Leveraging Camera Pose for Self-Supervised Monocular Object Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the two problems, we propose a novel Network for Self-supervised Monocular Object pose estimation by utilizing the predicted Camera poses from un-annotated real images, called SMOC-Net. |
Tao Tan; Qiulei Dong; |
524 | FAC: 3D Representation Learning Via Foreground Aware Feature Contrast Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most existing work randomly selects point features as anchors while building contrast, leading to a clear bias toward background points that often dominate in 3D scenes. Also, object awareness and foreground-to-background discrimination are neglected, making contrastive learning less effective. To tackle these issues, we propose a general foreground-aware feature contrast (FAC) framework to learn more effective point cloud representations in pre-training. |
Kangcheng Liu; Aoran Xiao; Xiaoqin Zhang; Shijian Lu; Ling Shao; |
525 | Improving Visual Representation Learning Through Perceptual Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an extension to masked autoencoders (MAE) which improves on the representations learnt by the model by explicitly encouraging the learning of higher scene-level features. |
Samyakh Tukra; Frederick Hoffman; Ken Chatfield; |
526 | 3D Cinemagraphy From A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present 3D Cinemagraphy, a new technique that marries 2D image animation with 3D photography. |
Xingyi Li; Zhiguo Cao; Huiqiang Sun; Jianming Zhang; Ke Xian; Guosheng Lin; |
527 | Learning Bottleneck Concepts in Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes Bottleneck Concept Learner (BotCL), which represents an image solely by the presence/absence of concepts learned through training over the target task without explicit supervision over the concepts. |
Bowen Wang; Liangzhi Li; Yuta Nakashima; Hajime Nagahara; |
528 | Inversion-Based Style Transfer With Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we perceive style as a learnable textual description of a painting. We propose an inversion-based style transfer method (InST), which can efficiently and accurately learn the key information of an image, thus capturing and transferring the artistic style of a painting. |
Yuxin Zhang; Nisha Huang; Fan Tang; Haibin Huang; Chongyang Ma; Weiming Dong; Changsheng Xu; |
529 | Learning Human Mesh Recovery in 3D Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel method for recovering the absolute pose and shape of a human in a pre-scanned scene given a single image. |
Zehong Shen; Zhi Cen; Sida Peng; Qing Shuai; Hujun Bao; Xiaowei Zhou; |
530 | Learning Locally Editable Virtual Humans Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel hybrid representation and end-to-end trainable network architecture to model fully editable and customizable neural avatars. |
Hsuan-I Ho; Lixin Xue; Jie Song; Otmar Hilliges; |
531 | Learning Imbalanced Data With Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we systematically investigate the ViTs’ performance in LTR and propose LiVT to train ViTs from scratch only with LT data. |
Zhengzhuo Xu; Ruikang Liu; Shuo Yang; Zenghao Chai; Chun Yuan; |
532 | AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a non-incremental learner, named AttriCLIP, to incrementally extract knowledge of new classes or tasks. |
Runqi Wang; Xiaoyue Duan; Guoliang Kang; Jianzhuang Liu; Shaohui Lin; Songcen Xu; Jinhu Lü; Baochang Zhang; |
533 | PHA: Patch-Wise High-Frequency Augmentation for Transformer-Based Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: From a frequency perspective, we reveal that ViTs perform worse than CNNs in preserving key high-frequency components (e.g., clothing texture details) since high-frequency components are inevitably diluted by low-frequency ones due to the intrinsic Self-Attention within ViTs. To remedy this inadequacy of the ViT, we propose a Patch-wise High-frequency Augmentation (PHA) method with two core designs. |
Guiwei Zhang; Yongfei Zhang; Tianyu Zhang; Bo Li; Shiliang Pu; |
534 | StyleRes: Transforming The Residuals for Real Image Editing With StyleGAN Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel image inversion framework and a training pipeline to achieve high-fidelity image inversion with high-quality attribute editing. |
Hamza Pehlivan; Yusuf Dalva; Aysegul Dundar; |
535 | Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing Via Disentangled Video Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel face video editing framework based on diffusion autoencoders that can successfully extract the decomposed features – for the first time as a face video editing model – of identity and motion from a given video. |
Gyeongman Kim; Hajin Shim; Hyunsu Kim; Yunjey Choi; Junho Kim; Eunho Yang; |
536 | Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-Commerce Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to establish a generic multi-modal foundation model that has the scalable capability to massive downstream applications in E-commerce. |
Yang Jin; Yongzhi Li; Zehuan Yuan; Yadong Mu; |
537 | Conditional Text Image Generation With Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the problem of text image generation, by taking advantage of the powerful abilities of Diffusion Models in generating photo-realistic and diverse image samples with given conditions, and propose a method called Conditional Text Image Generation with Diffusion Models (CTIG-DM for short). |
Yuanzhi Zhu; Zhaohai Li; Tianwei Wang; Mengchao He; Cong Yao; |
538 | AnchorFormer: Point Cloud Completion From Discriminative Nodes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a new shape completion architecture, namely AnchorFormer, that innovatively leverages pattern-aware discriminative nodes, i.e., anchors, to dynamically capture regional information of objects. |
Zhikai Chen; Fuchen Long; Zhaofan Qiu; Ting Yao; Wengang Zhou; Jiebo Luo; Tao Mei; |
539 | Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Co-SLAM, a neural RGB-D SLAM system based on a hybrid representation, that performs robust camera tracking and high-fidelity surface reconstruction in real time. |
Hengyi Wang; Jingwen Wang; Lourdes Agapito; |
540 | SIM: Semantic-Aware Instance Mask Generation for Box-Supervised Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new box-supervised instance segmentation approach by developing a Semantic-aware Instance Mask (SIM) generation paradigm. |
Ruihuang Li; Chenhang He; Yabin Zhang; Shuai Li; Liyi Chen; Lei Zhang; |
541 | Compression-Aware Video Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although a few pioneering works have recently been proposed to super-resolve compressed videos, they are not specifically designed to handle videos at various levels of compression. In this paper, we propose a novel and practical compression-aware video super-resolution model, which can adapt its video enhancement process to the estimated compression level. |
Yingwei Wang; Xu Jia; Xin Tao; Takashi Isobe; Huchuan Lu; Yu-Wing Tai; |
542 | PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we revisit the local point aggregators from the perspective of allocating computational resources. |
Jinyu Li; Chenxu Luo; Xiaodong Yang; |
543 | Regularization of Polynomial Networks for Image Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: At the same time, Polynomial Networks (PNs) have emerged as an alternative method with a promising performance and improved interpretability but have yet to reach the performance of the powerful DNN baselines. In this work, we aim to close this performance gap. |
Grigorios G. Chrysos; Bohan Wang; Jiankang Deng; Volkan Cevher; |
544 | Incremental 3D Semantic Scene Graph Prediction From RGB Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a real-time framework that incrementally builds a consistent 3D semantic scene graph of a scene given an RGB image sequence. |
Shun-Cheng Wu; Keisuke Tateno; Nassir Navab; Federico Tombari; |
545 | EfficientViT: Memory Efficient Vision Transformer With Cascaded Group Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a family of high-speed vision transformers named EfficientViT. |
Xinyu Liu; Houwen Peng; Ningxin Zheng; Yuqing Yang; Han Hu; Yixuan Yuan; |
546 | VLPD: Context-Aware Pedestrian Detection Via Vision-Language Semantic Self-Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Meanwhile, previous context-aware pedestrian detectors either only learn latent contexts with visual clues, or need laborious annotations to obtain explicit and semantic contexts. Therefore, we propose in this paper a novel approach via Vision-Language semantic self-supervision for context-aware Pedestrian Detection (VLPD) to model explicitly semantic contexts without any extra annotations. |
Mengyin Liu; Jie Jiang; Chao Zhu; Xu-Cheng Yin; |
547 | TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce neural texture learning for 6D object pose estimation from synthetic data and a few unlabelled real images. |
Hanzhi Chen; Fabian Manhardt; Nassir Navab; Benjamin Busam; |
548 | LINe: Out-of-Distribution Detection By Leveraging Important Neurons Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, from the perspective of neurons in the deep layer of the model representing high-level features, we introduce a new aspect for analyzing the difference in model outputs between in-distribution data and OOD data. |
Yong Hyun Ahn; Gyeong-Moon Park; Seong Tae Kim; |
549 | DynIBaR: Neural Dynamic Image-Based Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, for long videos with complex object motions and uncontrolled camera trajectories, these methods can produce blurry or inaccurate renderings, hampering their use in real-world applications. Instead of encoding the entire dynamic scene within the weights of MLPs, we present a new approach that addresses these limitations by adopting a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views in a scene motion-aware manner. |
Zhengqi Li; Qianqian Wang; Forrester Cole; Richard Tucker; Noah Snavely; |
550 | Unsupervised Object Localization: Observing The Background To Discover Objects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we take a different approach and propose to look for the background instead. |
Oriane Siméoni; Chloé Sekkat; Gilles Puy; Antonín Vobecký; Éloi Zablocki; Patrick Pérez; |
551 | Transforming Radiance Field With Lipschitz Network for Photorealistic 3D Scene Stylization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Simply coupling NeRF with photorealistic style transfer (PST) will result in cross-view inconsistency and degradation of stylized view syntheses. Through a thorough analysis, we demonstrate that this non-trivial task can be simplified in a new light: When transforming the appearance representation of a pre-trained NeRF with Lipschitz mapping, the consistency and photorealism across source views will be seamlessly encoded into the syntheses. |
Zicheng Zhang; Yinglu Liu; Congying Han; Yingwei Pan; Tiande Guo; Ting Yao; |
552 | BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera Via Key-Points Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Faced with these issues, our work proposes an efficient and robust monocular 3D lane detector called BEV-LaneDet with three main contributions. First, we introduce the Virtual Camera that unifies the intrinsic/extrinsic parameters of cameras mounted on different vehicles to guarantee the consistency of the spatial relationship among cameras. |
Ruihao Wang; Jian Qin; Kaiying Li; Yaochen Li; Dong Cao; Jintao Xu; |
553 | Self-Supervised 3D Scene Flow Estimation Guided By Superpoints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose an iterative end-to-end superpoint based scene flow estimation framework, where the superpoints can be dynamically updated to guide the point-level flow prediction. |
Yaqi Shen; Le Hui; Jin Xie; Jian Yang; |
554 | DiffCollage: Parallel Generation of Large Content With Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present DiffCollage, a compositional diffusion model that can generate large content by leveraging diffusion models trained on generating pieces of the large content. |
Qinsheng Zhang; Jiaming Song; Xun Huang; Yongxin Chen; Ming-Yu Liu; |
555 | Efficient Second-Order Plane Adjustment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper adopts Newton's method to efficiently solve the PA problem. |
Lipu Zhou; |
556 | Guided Depth Super-Resolution By Deep Anisotropic Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we propose a novel approach which combines guided anisotropic diffusion with a deep convolutional network and advances the state of the art for guided depth super-resolution. |
Nando Metzger; Rodrigo Caye Daudt; Konrad Schindler; |
557 | Fresnel Microfacet BRDF: Unification of Polari-Radiometric Surface-Body Reflection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we derive a novel analytical reflectance model, which we refer to as Fresnel Microfacet BRDF model, that is physically accurate and generalizes to various real-world surfaces. |
Tomoki Ichikawa; Yoshiki Fukao; Shohei Nobuhara; Ko Nishino; |
558 | A Unified Pyramid Recurrent Network for Video Frame Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present UPR-Net, a novel Unified Pyramid Recurrent Network for frame interpolation. |
Xin Jin; Longhai Wu; Jie Chen; Youxin Chen; Jayoon Koo; Cheul-hee Hahm; |
559 | Mofusion: A Framework for Denoising-Diffusion-Based Motion Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Conventional methods for human motion synthesis have either been deterministic or have had to struggle with the trade-off between motion diversity and motion quality. In response to these limitations, we introduce MoFusion, i.e., a new denoising-diffusion-based framework for high-quality conditional human motion synthesis that can synthesise long, temporally plausible, and semantically accurate motions based on a range of conditioning contexts (such as music and text). |
Rishabh Dabral; Muhammad Hamza Mughal; Vladislav Golyanik; Christian Theobalt; |
560 | PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose PoseFormerV2, which exploits a compact representation of lengthy skeleton sequences in the frequency domain to efficiently scale up the receptive field and boost robustness to noisy 2D joint detection. |
Qitao Zhao; Ce Zheng; Mengyuan Liu; Pichao Wang; Chen Chen; |
561 | Mask3D: Pre-Training 2D Vision Transformers By Learning Masked 3D Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, to more effectively understand 3D structural priors in 2D backbones, we propose Mask3D to leverage existing large-scale RGB-D data in a self-supervised pre-training to embed these 3D priors into 2D learned feature representations. |
Ji Hou; Xiaoliang Dai; Zijian He; Angela Dai; Matthias Nießner; |
562 | Physically Adversarial Infrared Patches With Learnable Shapes and Locations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current few physical infrared attacks are complicated to implement in practical application because of their complex transformation from digital world to physical world. To address this issue, in this paper, we propose a physically feasible infrared attack method called "adversarial infrared patches". |
Xingxing Wei; Jie Yu; Yao Huang; |
563 | DiffusioNeRF: Regularizing Neural Radiance Fields With Denoising Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the scene geometry and color fields are severely under-constrained, which can lead to artifacts, especially when trained with few input views. To alleviate this problem we learn a prior over scene geometry and color, using a denoising diffusion model (DDM). |
Jamie Wynn; Daniyar Turmukhambetov; |
564 | Exemplar-FreeSOLO: Enhancing Unsupervised Instance Segmentation With Exemplars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel unsupervised instance segmentation approach, Exemplar-FreeSOLO, to enhance unsupervised instance segmentation by exploiting a limited number of unannotated and unsegmented exemplars. |
Taoseef Ishtiak; Qing En; Yuhong Guo; |
565 | Multimodal Prompting With Missing Modalities for Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) when missing-modality occurs either during training or testing in real-world situations; and 2) when the computation resources are not available to finetune on heavy transformer models. |
Yi-Lun Lee; Yi-Hsuan Tsai; Wei-Chen Chiu; Chen-Yu Lee; |
566 | Edge-Aware Regional Message Passing Controller for Image Forgery Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although deep learning-based methods achieve remarkable progress, most of them usually suffer from severe feature coupling between the forged and authentic regions. In this work, we propose a two-step Edge-aware Regional Message Passing Controlling strategy to address the above issue. |
Dong Li; Jiaying Zhu; Menglu Wang; Jiawei Liu; Xueyang Fu; Zheng-Jun Zha; |
567 | Neural Koopman Pooling: Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the problem, we propose a plug-and-play module called Koopman pooling, which is a parameterized high-order pooling technique based on Koopman theory. |
Xinghan Wang; Xin Xu; Yadong Mu; |
568 | Simulated Annealing in Early Layers Leads to Better Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our principal innovation in this work is to use Simulated annealing in EArly Layers (SEAL) of the network in place of re-initialization of later layers. |
Amir M. Sarfi; Zahra Karimpour; Muawiz Chaudhary; Nasir M. Khalid; Mirco Ravanelli; Sudhir Mudur; Eugene Belilovsky; |
569 | Spatiotemporal Self-Supervised Learning for Point Clouds in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce an SSL strategy that leverages positive pairs in both the spatial and temporal domains. |
Yanhao Wu; Tong Zhang; Wei Ke; Sabine Süsstrunk; Mathieu Salzmann; |
570 | Semi-Supervised Learning Made Simple With Self-Supervised Clustering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a conceptually simple yet empirically powerful approach to turn clustering-based self-supervised methods such as SwAV or DINO into semi-supervised learners. |
Enrico Fini; Pietro Astolfi; Karteek Alahari; Xavier Alameda-Pineda; Julien Mairal; Moin Nabi; Elisa Ricci; |
571 | Blind Image Quality Assessment Via Vision-Language Correspondence: A Multitask Learning Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We aim at advancing blind image quality assessment (BIQA), which predicts the human perception of image quality without any reference information. |
Weixia Zhang; Guangtao Zhai; Ying Wei; Xiaokang Yang; Kede Ma; |
572 | Exploring Data Geometry for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study continual learning from a novel perspective by exploring data geometry for the non-stationary stream of data. |
Zhi Gao; Chen Xu; Feng Li; Yunde Jia; Mehrtash Harandi; Yuwei Wu; |
573 | Frequency-Modulated Point Cloud Rendering With Easy Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop an effective point cloud rendering pipeline for novel view synthesis, which enables high fidelity local detail reconstruction, real-time rendering and user-friendly editing. |
Yi Zhang; Xiaoyang Huang; Bingbing Ni; Teng Li; Wenjun Zhang; |
574 | Integral Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new family of deep neural networks. |
Kirill Solodskikh; Azim Kurbanov; Ruslan Aydarkhanov; Irina Zhelavskaya; Yury Parfenov; Dehua Song; Stamatios Lefkimmiatis; |
575 | Learning Neural Parametric Head Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel 3D morphable model for complete human heads based on hybrid neural fields. |
Simon Giebenhain; Tobias Kirschstein; Markos Georgopoulos; Martin Rünz; Lourdes Agapito; Matthias Nießner; |
576 | Removing Objects From Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a framework to remove objects from a NeRF representation created from an RGB-D sequence. |
Silvan Weder; Guillermo Garcia-Hernando; Áron Monszpart; Marc Pollefeys; Gabriel J. Brostow; Michael Firman; Sara Vicente; |
577 | Few-Shot Referring Relationships in Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given a query visual relationship as <subject, predicate, object> and a test video, our objective is to localize the subject and object that are connected via the predicate. |
Yogesh Kumar; Anand Mishra; |
578 | Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Structural MPI (S-MPI), where the plane structure approximates 3D scenes concisely. |
Mingfang Zhang; Jinglu Wang; Xiao Li; Yifei Huang; Yoichi Sato; Yan Lu; |
579 | Harmonious Teacher for Cross-Domain Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we reveal that the consistency of classification and localization predictions are crucial to measure the quality of pseudo labels, and propose a new Harmonious Teacher approach to improve the self-training for cross-domain object detection. |
Jinhong Deng; Dongli Xu; Wen Li; Lixin Duan; |
580 | 3D Human Pose Estimation Via Intuitive Physics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, we exploit novel intuitive-physics (IP) terms that can be inferred from a 3D SMPL body interacting with the scene. |
Shashank Tripathi; Lea Müller; Chun-Hao P. Huang; Omid Taheri; Michael J. Black; Dimitrios Tzionas; |
581 | SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current Deep Network (DN) visualization and interpretability methods rely heavily on data space visualizations such as scoring which dimensions of the data are responsible for their associated prediction or generating new data features or samples that best match a given DN unit or representation. In this paper, we go one step further by developing the first provably exact method for computing the geometry of a DN’s mapping — including its decision boundary — over a specified region of the data space. |
Ahmed Imtiaz Humayun; Randall Balestriero; Guha Balakrishnan; Richard G. Baraniuk; |
582 | Learning To Predict Scene-Level Implicit 3D From Posed RGBD Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a method that can learn to predict scene-level implicit functions for 3D reconstruction from posed RGBD data. |
Nilesh Kulkarni; Linyi Jin; Justin Johnson; David F. Fouhey; |
583 | EXCALIBUR: Encouraging and Evaluating Embodied Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To encourage the development of exploratory interactive agents, we present the EXCALIBUR benchmark. |
Hao Zhu; Raghav Kapoor; So Yeon Min; Winson Han; Jiatai Li; Kaiwen Geng; Graham Neubig; Yonatan Bisk; Aniruddha Kembhavi; Luca Weihs; |
584 | Visual DNA: Representing and Comparing Images Using Distributions of Neuron Activations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, no general-purpose tools exist to evaluate the extent to which two datasets differ. For this, we propose representing images — and by extension datasets — using Distributions of Neuron Activations (DNAs). |
Benjamin Ramtoula; Matthew Gadd; Paul Newman; Daniele De Martini; |
585 | Recognizability Embedding Enhancement for Very Low-Resolution Face Recognition and Quality Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study principled approaches to elevate the recognizability of a face in the embedding space instead of the visual quality. |
Jacky Chen Long Chai; Tiong-Sik Ng; Cheng-Yaw Low; Jaewoo Park; Andrew Beng Jin Teoh; |
586 | Physical-World Optical Adversarial Attacks on 3D Face Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the real-world challenges, we propose a novel structured-light attack against structured-light-based 3D face recognition. |
Yanjie Li; Yiquan Li; Xuelong Dai; Songtao Guo; Bin Xiao; |
587 | Accelerating Dataset Distillation Via Model Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we assume that training the synthetic data with diverse models leads to better generalization performance. |
Lei Zhang; Jie Zhang; Bowen Lei; Subhabrata Mukherjee; Xiang Pan; Bo Zhao; Caiwen Ding; Yao Li; Dongkuan Xu; |
588 | SE-ORNet: Self-Ensembling Orientation-Aware Network for Unsupervised Point Cloud Shape Correspondence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Besides, point cloud noise disrupts consistent representations for point clouds and thus degrades the shape correspondence accuracy. To address the above issues, we propose a Self-Ensembling ORientation-aware Network termed SE-ORNet. |
Jiacheng Deng; Chuxin Wang; Jiahao Lu; Jianfeng He; Tianzhu Zhang; Jiyang Yu; Zhe Zhang; |
589 | Raw Image Reconstruction With Learned Compact Metadata Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel framework to learn a compact representation in the latent space serving as the metadata in an end-to-end manner. |
Yufei Wang; Yi Yu; Wenhan Yang; Lanqing Guo; Lap-Pui Chau; Alex C. Kot; Bihan Wen; |
590 | Semi-Supervised Video Inpainting With Cycle Consistency Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, in this work, we propose an end-to-end trainable framework consisting of completion network and mask prediction network, which are designed to generate corrupted contents of the current frame using the known mask and decide the regions to be filled of the next frame, respectively. |
Zhiliang Wu; Hanyu Xuan; Changchang Sun; Weili Guan; Kang Zhang; Yan Yan; |
591 | Frame-Event Alignment and Fusion Network for High Frame Rate Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an end-to-end network consisting of multi-modality alignment and fusion modules to effectively combine meaningful information from both modalities at different measurement rates. |
Jiqing Zhang; Yuanchen Wang; Wenxi Liu; Meng Li; Jinpeng Bai; Baocai Yin; Xin Yang; |
592 | A Bag-of-Prototypes Representation for Dataset-Level Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a bag-of-prototypes (BoP) dataset representation that extends the image level bag consisting of patch descriptors to dataset-level bag consisting of semantic prototypes. |
Weijie Tu; Weijian Deng; Tom Gedeon; Liang Zheng; |
593 | Level-S$^2$fM: Structure From Motion on Neural Level Set of Implicit Surfaces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a neural incremental Structure-from-Motion (SfM) approach, Level-S2fM, which estimates the camera poses and scene geometry from a set of uncalibrated images by learning coordinate MLPs for the implicit surfaces and the radiance fields from the established keypoint correspondences. |
Yuxi Xiao; Nan Xue; Tianfu Wu; Gui-Song Xia; |
594 | Neuron Structure Modeling for Generalizable Remote Physiological Measurement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we systematically address the domain shift problem in the rPPG measurement task. |
Hao Lu; Zitong Yu; Xuesong Niu; Ying-Cong Chen; |
595 | Shape-Aware Text-Driven Layered Video Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These methods, however, can only edit object appearance rather than object shape changes due to the limitation of using a fixed UV mapping field for texture atlas. We present a shape-aware, text-driven video editing method to tackle this challenge. |
Yao-Chih Lee; Ji-Ze Genevieve Jang; Yi-Ting Chen; Elizabeth Qiu; Jia-Bin Huang; |
596 | Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a group ranking-based Out-of-Candidate Rectification (OCR) mechanism in a plug-and-play fashion. |
Zesen Cheng; Pengchong Qiao; Kehan Li; Siheng Li; Pengxu Wei; Xiangyang Ji; Li Yuan; Chang Liu; Jie Chen; |
597 | Solving Relaxations of MAP-MRF Problems: Combinatorial In-Face Frank-Wolfe Directions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a key computational subroutine, it uses a variant of the Frank-Wolfe (FW) method to minimize a smooth convex function over a combinatorial polytope. We propose an efficient implementation of this subroutine based on in-face Frank-Wolfe directions, introduced in (Freund et al. 2017) in a different context. |
Vladimir Kolmogorov; |
598 | MEGANE: Morphable Eyeglass and Avatar Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a 3D compositional morphable model of eyeglasses that accurately incorporates high-fidelity geometric and photometric interaction effects. |
Junxuan Li; Shunsuke Saito; Tomas Simon; Stephen Lombardi; Hongdong Li; Jason Saragih; |
599 | Leverage Interactive Affinity for Affordance Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to leverage interactive affinity for affordance learning, i.e., extracting interactive affinity from human-object interaction and transferring it to non-interactive objects. |
Hongchen Luo; Wei Zhai; Jing Zhang; Yang Cao; Dacheng Tao; |
600 | Enhancing Multiple Reliability Measures Via Nuisance-Extended Information Bottleneck Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training concerning both convolutional- and Transformer-based architectures. |
Jongheon Jeong; Sihyun Yu; Hankook Lee; Jinwoo Shin; |
601 | Rethinking The Approximation Error in 3D Surface Fitting for Point Cloud Normal Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we first carry out in-depth analysis of the approximation error in the surface fitting problem. |
Hang Du; Xuejun Yan; Jingjing Wang; Di Xie; Shiliang Pu; |
602 | Objaverse: A Universe of Annotated 3D Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite considerable interest and potential applications in 3D vision, datasets of high-fidelity 3D models continue to be mid-sized with limited diversity of object categories. Addressing this gap, we present Objaverse 1.0, a large dataset of objects with 800K+ (and growing) 3D models with descriptive captions, tags, and animations. |
Matt Deitke; Dustin Schwenk; Jordi Salvador; Luca Weihs; Oscar Michel; Eli VanderBilt; Ludwig Schmidt; Kiana Ehsani; Aniruddha Kembhavi; Ali Farhadi; |
603 | MonoATT: Online Monocular 3D Object Detection With Adaptive Token Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an online Mono3D framework, called MonoATT, which leverages a novel vision transformer with heterogeneous tokens of varying shapes and sizes to facilitate mobile Mono3D. |
Yunsong Zhou; Hongzi Zhu; Quan Liu; Shan Chang; Minyi Guo; |
604 | Image Quality-Aware Diagnosis Via Meta-Knowledge Co-Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we raise the problem of image quality-aware diagnosis, which aims to take advantage of low-quality images and image quality labels to achieve a more accurate and robust diagnosis. |
Haoxuan Che; Siyu Chen; Hao Chen; |
605 | A-Cap: Anticipation Captioning With Commonsense Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to emulate this ability, we introduce a novel task called Anticipation Captioning, which generates a caption for an unseen oracle image using a sparsely temporally-ordered set of images. To tackle this new task, we propose a model called A-CAP, which incorporates commonsense knowledge into a pre-trained vision-language model, allowing it to anticipate the caption. |
Duc Minh Vo; Quoc-An Luong; Akihiro Sugimoto; Hideki Nakayama; |
606 | Learning 3D Representations From 2D Pre-Trained Models Via Image-to-Point Masked Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an alternative to obtain superior 3D representations from 2D pre-trained models via Image-to-Point Masked Autoencoders, named as I2P-MAE. |
Renrui Zhang; Liuhui Wang; Yu Qiao; Peng Gao; Hongsheng Li; |
607 | BEVFormer V2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition Via Perspective Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel bird’s-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones. |
Chenyu Yang; Yuntao Chen; Hao Tian; Chenxin Tao; Xizhou Zhu; Zhaoxiang Zhang; Gao Huang; Hongyang Li; Yu Qiao; Lewei Lu; Jie Zhou; Jifeng Dai; |
608 | Object Discovery From Motion-Guided Tokens Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we augment the auto-encoder representation learning framework with two key components: motion-guidance and mid-level feature tokenization. |
Zhipeng Bao; Pavel Tokmakov; Yu-Xiong Wang; Adrien Gaidon; Martial Hebert; |
609 | Domain Generalized Stereo Matching Via Hierarchical Visual Transformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a Hierarchical Visual Transformation (HVT) network to 1) first transform the training sample hierarchically into new domains with diverse distributions from three levels: Global, Local, and Pixel, 2) then maximize the visual discrepancy between the source domain and new domains, and minimize the cross-domain feature inconsistency to capture domain-invariant features. |
Tianyu Chang; Xun Yang; Tianzhu Zhang; Meng Wang; |
610 | Deep Semi-Supervised Metric Learning With Mixed Label Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Metric learning requires the identification of far-apart similar pairs and close dissimilar pairs during training, and this is difficult to achieve with unlabeled data because pairs are typically assumed to be similar if they are close. We present a novel metric learning method which circumvents this issue by identifying hard negative pairs as those which obtain dissimilar labels via label propagation (LP), when the edge linking the pair of data is removed in the affinity matrix. |
Furen Zhuang; Pierre Moulin; |
611 | Adapting Shortcut With Normalizing Flow: An Efficient Tuning Framework for Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel and effective PEFT paradigm, named SNF (Shortcut adaptation via Normalizing Flow), which utilizes normalizing flows to adjust the shortcut layers. |
Yaoming Wang; Bowen Shi; Xiaopeng Zhang; Jin Li; Yuchen Liu; Wenrui Dai; Chenglin Li; Hongkai Xiong; Qi Tian; |
612 | Unpaired Image-to-Image Translation With Shortest Path Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we start from a different perspective and consider the paths connecting the two domains. |
Shaoan Xie; Yanwu Xu; Mingming Gong; Kun Zhang; |
613 | MotionDiffuser: Controllable Multi-Agent Motion Prediction Using Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Subsequently, we propose a general constrained sampling framework that enables controlled trajectory sampling based on differentiable cost functions. |
Chiyu “Max” Jiang; Andre Cornman; Cheolho Park; Benjamin Sapp; Yin Zhou; Dragomir Anguelov; |
614 | OVTrack: Open-Vocabulary Multiple Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This leaves contemporary MOT methods limited to a small set of pre-defined object categories. In this paper, we address this limitation by tackling a novel task, open-vocabulary MOT, that aims to evaluate tracking beyond pre-defined training categories. |
Siyuan Li; Tobias Fischer; Lei Ke; Henghui Ding; Martin Danelljan; Fisher Yu; |
615 | ConvNeXt V2: Co-Designing and Scaling ConvNets With Masked Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop an efficient and fully-convolutional masked autoencoder framework. |
Sanghyun Woo; Shoubhik Debnath; Ronghang Hu; Xinlei Chen; Zhuang Liu; In So Kweon; Saining Xie; |
616 | Hyperspherical Embedding for Point Cloud Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the learned embeddings have sparse distribution in the feature space, which leads to worse generalization results during testing. To address these problems, this paper proposes a hyperspherical module, which transforms and normalizes embeddings from the encoder to be on a unit hypersphere. |
Junming Zhang; Haomeng Zhang; Ram Vasudevan; Matthew Johnson-Roberson; |
617 | Event-Based Video Frame Interpolation With Cross-Modal Asymmetric Bidirectional Motion Fields Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel event-based VFI framework with cross-modal asymmetric bidirectional motion field estimation. |
Taewoo Kim; Yujeong Chae; Hyun-Kurl Jang; Kuk-Jin Yoon; |
618 | Unsupervised Deep Asymmetric Stereo Matching With Spatially-Adaptive Self-Similarity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel spatially-adaptive self-similarity (SASS) for unsupervised asymmetric stereo matching. |
Taeyong Song; Sunok Kim; Kwanghoon Sohn; |
619 | QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we devise a new style transfer framework called QuantArt for high visual-fidelity stylization. |
Siyu Huang; Jie An; Donglai Wei; Jiebo Luo; Hanspeter Pfister; |
620 | TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the model-based and data-based approaches for this goal and find that the two common approaches cannot achieve the objective of improving both generalization and adversarial robustness. Thus, we propose a novel statistics-based approach, the Two-WIng NormaliSation (TWINS) fine-tuning framework, which consists of two neural networks, one of which keeps the population means and variances of pre-training data in the batch normalization layers. |
Ziquan Liu; Yi Xu; Xiangyang Ji; Antoni B. Chan; |
621 | VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce VolRecon, a novel generalizable implicit reconstruction method with Signed Ray Distance Function (SRDF). |
Yufan Ren; Tong Zhang; Marc Pollefeys; Sabine Süsstrunk; Fangjinhua Wang; |
622 | Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, due to the non-adaptive proposal cropping and single-level feature mimicking processes, they suffer from information destruction during knowledge extraction and inefficient knowledge transfer. To remedy these limitations, we propose an Object-Aware Distillation Pyramid (OADP) framework, including an Object-Aware Knowledge Extraction (OAKE) module and a Distillation Pyramid (DP) mechanism. |
Luting Wang; Yi Liu; Penghui Du; Zihan Ding; Yue Liao; Qiaosong Qi; Biaolong Chen; Si Liu; |
623 | Evolved Part Masking for Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an evolved part-based masking to pursue more general visual cues modeling in self-supervised learning. |
Zhanzhou Feng; Shiliang Zhang; |
624 | MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset. |
Runsen Xu; Tai Wang; Wenwei Zhang; Runjian Chen; Jinkun Cao; Jiangmiao Pang; Dahua Lin; |
625 | SlowLiDAR: Increasing The Latency of LiDAR-Based Detection Using Adversarial Examples Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the first systematic investigation of the availability of LiDAR detection pipelines, and SlowLiDAR, an adversarial perturbation attack that maximizes LiDAR detection runtime. |
Han Liu; Yuhao Wu; Zhiyuan Yu; Yevgeniy Vorobeychik; Ning Zhang; |
626 | Learning A Sparse Transformer Network for Effective Image Deraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we find that most existing Transformers use all the similarities of the tokens from the query-key pairs for feature aggregation. |
Xiang Chen; Hao Li; Mingqiang Li; Jinshan Pan; |
627 | Open-Set Semantic Segmentation for Point Clouds Via Adversarial Prototype Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing works in the literature assume that the training and testing point clouds share the same object classes, but this assumption is generally invalid in many real-world scenarios, where 3D objects of classes not seen in the training set must be identified. To address this problem, we propose an Adversarial Prototype Framework (APF) for handling the open-set 3D semantic segmentation task, which aims to identify 3D unseen-class points while maintaining the segmentation performance on seen-class points. |
Jianan Li; Qiulei Dong; |
628 | CutMIB: Boosting Light Field Super-Resolution Via Multi-View Image Blending Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For the first time in light field SR, we propose a potent data augmentation (DA) strategy called CutMIB to improve the performance of existing light field SR networks while keeping their structures unchanged. |
Zeyu Xiao; Yutong Liu; Ruisheng Gao; Zhiwei Xiong; |
629 | Learning Attention As Disentangler for Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose to exploit cross-attentions as compositional disentanglers to learn disentangled concept embeddings. |
Shaozhe Hao; Kai Han; Kwan-Yee K. Wong; |
630 | DA-DETR: Domain Adaptive Detection Transformer With Information Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the unique DETR attention mechanisms, we design DA-DETR, a domain adaptive object detection transformer that introduces information fusion for effective transfer from a labeled source domain to an unlabeled target domain. |
Jingyi Zhang; Jiaxing Huang; Zhipeng Luo; Gongjie Zhang; Xiaoqin Zhang; Shijian Lu; |
631 | Energy-Efficient Adaptive 3D Sensing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an adaptive active depth sensor that jointly optimizes range, power consumption, and eye-safety. |
Brevin Tilmon; Zhanghao Sun; Sanjeev J. Koppal; Yicheng Wu; Georgios Evangelidis; Ramzi Zahreddine; Gurunandan Krishnan; Sizhuo Ma; Jian Wang; |
632 | CR-FIQA: Face Image Quality Assessment By Learning Sample Relative Classifiability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Face image quality assessment (FIQA) estimates the utility of the captured image in achieving reliable and accurate recognition performance. This work proposes a novel FIQA method, CR-FIQA, that estimates the face image quality of a sample by learning to predict its relative classifiability. |
Fadi Boutros; Meiling Fang; Marcel Klemt; Biying Fu; Naser Damer; |
633 | Endpoints Weight Fusion for Class Incremental Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple yet effective method to obtain a model with strong memory of old knowledge, named Endpoints Weight Fusion (EWF). |
Jia-Wen Xiao; Chang-Bin Zhang; Jiekang Feng; Xialei Liu; Joost van de Weijer; Ming-Ming Cheng; |
634 | GeneCIS: A Benchmark for General Conditional Image Similarity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the GeneCIS (‘genesis’) benchmark, which measures models’ ability to adapt to a range of similarity conditions. |
Sagar Vaze; Nicolas Carion; Ishan Misra; |
635 | MetaViewer: Towards A Unified Multi-View Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome them, we propose a novel uniform-to-specific multi-view learning framework from a meta-learning perspective, where the unified representation no longer involves manual manipulation but is automatically derived from a meta-learner named MetaViewer. |
Ren Wang; Haoliang Sun; Yuling Ma; Xiaoming Xi; Yilong Yin; |
636 | MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address UGC Live VQA problems by constructing a first-of-a-kind subjective UGC Live VQA database and developing an effective evaluation tool. |
Zicheng Zhang; Wei Wu; Wei Sun; Danyang Tu; Wei Lu; Xiongkuo Min; Ying Chen; Guangtao Zhai; |
637 | Vision Transformers Are Good Mask Auto-Labelers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Mask Auto-Labeler (MAL), a high-quality Transformer-based mask auto-labeling framework for instance segmentation using only box annotations. |
Shiyi Lan; Xitong Yang; Zhiding Yu; Zuxuan Wu; Jose M. Alvarez; Anima Anandkumar; |
638 | Neural Transformation Fields for Arbitrary-Styled Font Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we model font generation as a continuous transformation process from the source character image to the target font image via the creation and dissipation of font pixels, and embed the corresponding transformations into a neural transformation field. |
Bin Fu; Junjun He; Jianjun Wang; Yu Qiao; |
639 | Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, we introduce Spring — a large, high-resolution, high-detail, computer-generated benchmark for scene flow, optical flow, and stereo. |
Lukas Mehl; Jenny Schmalfuss; Azin Jahedi; Yaroslava Nalivayko; Andrés Bruhn; |
640 | EDICT: Exact Diffusion Inversion Via Coupled Transformations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, DDIM inversion for real images is unstable as it relies on local linearization assumptions, which result in the propagation of errors, leading to incorrect image reconstruction and loss of content. To alleviate these problems, we propose Exact Diffusion Inversion via Coupled Transformations (EDICT), an inversion method that draws inspiration from affine coupling layers. |
Bram Wallace; Akash Gokul; Nikhil Naik; |
641 | Natural Language-Assisted Sign Language Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate the problem, we propose the Natural Language-Assisted Sign Language Recognition (NLA-SLR) framework, which exploits semantic information contained in glosses (sign labels). |
Ronglai Zuo; Fangyun Wei; Brian Mak; |
642 | MAESTER: Masked Autoencoder Guided Segmentation at Pixel Resolution for Accurate, Self-Supervised Subcellular Structure Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce MAESTER (Masked AutoEncoder guided SegmenTation at pixEl Resolution), a self-supervised method for accurate, subcellular structure segmentation at pixel resolution. |
Ronald Xie; Kuan Pang; Gary D. Bader; Bo Wang; |
643 | Learning Semantic Relationship Among Instances for Image-Text Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we argue that sample relations could help learn subtle differences for hard negative instances, and thus transferring shared knowledge for infrequent samples should be promising in obtaining better holistic embeddings. |
Zheren Fu; Zhendong Mao; Yan Song; Yongdong Zhang; |
644 | AeDet: Azimuth-Invariant Multi-View 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To preserve the inherent property of the BEV features and ease the optimization, we propose an azimuth-equivariant convolution (AeConv) and an azimuth-equivariant anchor. |
Chengjian Feng; Zequn Jie; Yujie Zhong; Xiangxiang Chu; Lin Ma; |
645 | OCELOT: Overlapped Cell on Tissue Dataset for Histopathology Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there is a lack of efforts to reflect such behaviors by pathologists in the cell detection models, mainly due to the lack of datasets containing both cell and tissue annotations with overlapping regions. To overcome this limitation, we propose and publicly release OCELOT, a dataset purposely dedicated to the study of cell-tissue relationships for cell detection in histopathology. |
Jeongun Ryu; Aaron Valero Puche; JaeWoong Shin; Seonwook Park; Biagio Brattoli; Jinhee Lee; Wonkyung Jung; Soo Ick Cho; Kyunghyun Paeng; Chan-Young Ock; Donggeun Yoo; Sérgio Pereira; |
646 | Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, a single kind of modeling structure struggles to balance the learning of short-term and long-term temporal correlations, and may bias the network toward one of them, leading to undesirable predictions such as global location shift, temporal inconsistency, and insufficient local details. To solve these problems, we propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, Global-to-Local Transformer (GLoT). |
Xiaolong Shen; Zongxin Yang; Xiaohan Wang; Jianxin Ma; Chang Zhou; Yi Yang; |
647 | BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show, for the first time, that neural networks trained only on synthetic data achieve state-of-the-art accuracy on the problem of 3D human pose and shape (HPS) estimation from real images. |
Michael J. Black; Priyanka Patel; Joachim Tesch; Jinlong Yang; |
648 | Self-Supervised Image-to-Point Distillation Via Semantically Tolerant Contrastive Loss Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to alleviate the self-similarity problem through a novel semantically tolerant image-to-point contrastive loss that takes into consideration the semantic distance between positive and negative image regions to minimize contrasting semantically similar point and image regions. |
Anas Mahmoud; Jordan S. K. Hu; Tianshu Kuai; Ali Harakeh; Liam Paull; Steven L. Waslander; |
649 | ProtoCon: Pseudo-Label Refinement Via Online Clustering and Prototypical Consistency for Efficient Semi-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It relies on including high-confidence predictions made on unlabeled data as additional targets to train the model. We propose ProtoCon, a novel SSL method aimed at the less-explored label-scarce setting, where such methods usually underperform. |
Islam Nassar; Munawar Hayat; Ehsan Abbasnejad; Hamid Rezatofighi; Gholamreza Haffari; |
650 | Image Super-Resolution Using T-Tetromino Pixels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To achieve a higher image quality after upscaling, we propose a novel binning concept using tetromino-shaped pixels. |
Simon Grosche; Andy Regensky; Jürgen Seiler; André Kaup; |
651 | GFIE: A Dataset and Baseline for Gaze-Following From 2D to 3D in Indoor Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce GFIE, a novel dataset recorded by a gaze data collection system we developed. |
Zhengxi Hu; Yuxue Yang; Xiaolin Zhai; Dingye Yang; Bohan Zhou; Jingtai Liu; |
652 | Efficient Robust Principal Component Analysis Via Block Krylov Iteration and CUR Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, an adaptive rank-estimate-based RPCA achieved top performance in low-level vision tasks without a prior rank, but both the rank estimate and the RPCA optimization algorithm involve singular value decomposition, which requires enormous computational resources for large-scale matrices. To address these issues, an efficient RPCA (eRPCA) algorithm based on block Krylov iteration and CUR decomposition is proposed in this paper. |
Shun Fang; Zhengqin Xu; Shiqian Wu; Shoulie Xie; |
653 | VIVE3D: Viewpoint-Independent Video Editing Using 3D-Aware GANs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce VIVE3D, a novel approach that extends the capabilities of image-based 3D GANs to video editing and is able to represent the input video in an identity-preserving and temporally consistent way. |
Anna Frühstück; Nikolaos Sarafianos; Yuanlu Xu; Peter Wonka; Tony Tung; |
654 | Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To promote the sampling process of stochastic prediction, we propose a novel method, called BOsampler, to adaptively mine potential paths with Bayesian optimization in an unsupervised manner, as a sequential design strategy in which each new prediction depends on the previously drawn samples. |
Guangyi Chen; Zhenhao Chen; Shunxing Fan; Kun Zhang; |
655 | BKinD-3D: Self-Supervised 3D Keypoint Discovery From Multi-View Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new method to perform self-supervised keypoint discovery in 3D from multi-view videos of behaving agents, without any keypoint or bounding box supervision in 2D or 3D. |
Jennifer J. Sun; Lili Karashchuk; Amil Dravid; Serim Ryou; Sonia Fereidooni; John C. Tuthill; Aggelos Katsaggelos; Bingni W. Brunton; Georgia Gkioxari; Ann Kennedy; Yisong Yue; Pietro Perona; |
656 | StyleRF: Zero-Shot 3D Style Transfer of Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose StyleRF (Style Radiance Fields), an innovative 3D style transfer technique that resolves the three-way dilemma by performing style transformation within the feature space of a radiance field. |
Kunhao Liu; Fangneng Zhan; Yiwen Chen; Jiahui Zhang; Yingchen Yu; Abdulmotaleb El Saddik; Shijian Lu; Eric P. Xing; |
657 | Semantic Prompt for Few-Shot Image Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Semantic Prompt (SP) approach for few-shot learning. |
Wentao Chen; Chenyang Si; Zhang Zhang; Liang Wang; Zilei Wang; Tieniu Tan; |
658 | Accidental Light Probes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study recovering lighting from accidental light probes (ALPs)—common, shiny objects like Coke cans, which often accidentally appear in daily scenes. |
Hong-Xing Yu; Samir Agarwala; Charles Herrmann; Richard Szeliski; Noah Snavely; Jiajun Wu; Deqing Sun; |
659 | Iterative Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Iterative Vision-and-Language Navigation (IVLN), a paradigm for evaluating language-guided agents navigating in a persistent environment over time. |
Jacob Krantz; Shurjo Banerjee; Wang Zhu; Jason Corso; Peter Anderson; Stefan Lee; Jesse Thomason; |
660 | DPE: Disentanglement of Pose and Expression for General Video Portrait Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel self-supervised disentanglement framework to decouple pose and expression without 3DMMs and paired data, which consists of a motion editing module, a pose generator, and an expression generator. |
Youxin Pang; Yong Zhang; Weize Quan; Yanbo Fan; Xiaodong Cun; Ying Shan; Dong-Ming Yan; |
661 | Adversarial Counterfactual Visual Explanations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The proposed approach hypothesizes that Denoising Diffusion Probabilistic Models are excellent regularizers for avoiding high-frequency and out-of-distribution perturbations when generating adversarial attacks. The paper’s key idea is to build attacks through a diffusion model to polish them. |
Guillaume Jeanneret; Loïc Simon; Frédéric Jurie; |
662 | MaLP: Manipulation Localization Using A Proactive Scheme Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, prior works termed as passive for manipulation localization exhibit poor generalization performance over unseen GMs and attribute modifications. To combat this issue, we propose a proactive scheme for manipulation localization, termed MaLP. |
Vishal Asnani; Xi Yin; Tal Hassner; Xiaoming Liu; |
663 | Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In turn, they may even reinforce such biases. To help combat these undesired side effects, we present safe latent diffusion (SLD). |
Patrick Schramowski; Manuel Brack; Björn Deiseroth; Kristian Kersting; |
664 | MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the first joint audio-video generation framework that brings engaging watching and listening experiences simultaneously, towards high-quality realistic videos. |
Ludan Ruan; Yiyang Ma; Huan Yang; Huiguo He; Bei Liu; Jianlong Fu; Nicholas Jing Yuan; Qin Jin; Baining Guo; |
665 | HexPlane: A Fast Representation for Dynamic Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that dynamic 3D scenes can be explicitly represented by six planes of learned features, leading to an elegant solution we call HexPlane. |
Ang Cao; Justin Johnson; |
666 | Boosting Semi-Supervised Learning By Exploiting All Unlabeled Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To better leverage all unlabeled examples, we propose two novel techniques: Entropy Meaning Loss (EML) and Adaptive Negative Learning (ANL). |
Yuhao Chen; Xin Tan; Borui Zhao; Zhaowei Chen; Renjie Song; Jiajun Liang; Xuequan Lu; |
667 | Novel-View Acoustic Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a neural rendering approach: Visually-Guided Acoustic Synthesis (ViGAS) network that learns to synthesize the sound of an arbitrary point in space by analyzing the input audio-visual cues. |
Changan Chen; Alexander Richard; Roman Shapovalov; Vamsi Krishna Ithapu; Natalia Neverova; Kristen Grauman; Andrea Vedaldi; |
668 | Robust Generalization Against Photon-Limited Corruptions Via Worst-Case Sharpness Minimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, because the over-parameterized model is optimized on scarce worst-case data, DRO fails to produce a smooth loss landscape and thus struggles to generalize well to the test set. Therefore, instead of focusing on worst-case risk minimization, we propose SharpDRO, which penalizes the sharpness of the worst-case distribution by measuring the loss changes in the neighborhood of the learned parameters. |
Zhuo Huang; Miaoxi Zhu; Xiaobo Xia; Li Shen; Jun Yu; Chen Gong; Bo Han; Bo Du; Tongliang Liu; |
669 | Point2Pix: Photo-Realistic Point Cloud Rendering Via Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Point2Pix as a novel point renderer to link the 3D sparse point clouds with 2D dense image pixels. |
Tao Hu; Xiaogang Xu; Shu Liu; Jiaya Jia; |
670 | Superclass Learning With Representation Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to the lack of common semantic features, existing classification techniques cannot tractably recognize superclasses without raw class labels, and thus suffer severe performance degradation or require huge annotation costs. To narrow this gap, this paper proposes a superclass learning framework, called SuperClass Learning with Representation Enhancement (SCLRE), to recognize super categories by leveraging enhanced representation. |
Jinlong Kang; Liyuan Shang; Suyun Zhao; Hong Chen; Cuiping Li; Zeyu Gan; |
671 | Visual Prompt Tuning for Generative Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a recipe for learning vision transformers by generative knowledge transfer. |
Kihyuk Sohn; Huiwen Chang; José Lezama; Luisa Polania; Han Zhang; Yuan Hao; Irfan Essa; Lu Jiang; |
672 | NICO++: Towards Better Benchmarking for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a large-scale benchmark with extensive labeled domains named NICO++ along with more rational evaluation methods for comprehensively evaluating DG algorithms. |
Xingxuan Zhang; Yue He; Renzhe Xu; Han Yu; Zheyan Shen; Peng Cui; |
673 | CHMATCH: Contrastive Hierarchical Matching and Robust Adaptive Threshold Boosted Semi-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel CHMatch method, which can learn robust adaptive thresholds for instance-level prediction matching as well as discriminative features by contrastive hierarchical matching. |
Jianlong Wu; Haozhe Yang; Tian Gan; Ning Ding; Feijun Jiang; Liqiang Nie; |
674 | Neural Dependencies Emerging From Learning Massive Categories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents two astonishing findings on neural networks learned for large-scale image classification. |
Ruili Feng; Kecheng Zheng; Kai Zhu; Yujun Shen; Jian Zhao; Yukun Huang; Deli Zhao; Jingren Zhou; Michael Jordan; Zheng-Jun Zha; |
675 | ReLight My NeRF: A Dataset for Novel View Synthesis and Relighting of Real World Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the problem of rendering novel views from a Neural Radiance Field (NeRF) under unobserved light conditions. |
Marco Toschi; Riccardo De Matteo; Riccardo Spezialetti; Daniele De Gregorio; Luigi Di Stefano; Samuele Salti; |
676 | ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce ARCTIC — a dataset of two hands that dexterously manipulate objects, containing 2.1M video frames paired with accurate 3D hand and object meshes and detailed, dynamic contact information. |
Zicong Fan; Omid Taheri; Dimitrios Tzionas; Muhammed Kocabas; Manuel Kaufmann; Michael J. Black; Otmar Hilliges; |
677 | Constrained Evolutionary Diffusion Filter for Monocular Endoscope Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Many current methods suffer from an imbalance between exploration and exploitation due to particle degeneracy and impoverishment, resulting in local optima. To address this imbalance, this work proposes a new constrained evolutionary diffusion filter for nonlinear optimization. |
Xiongbiao Luo; |
678 | MAGVIT: Masked Generative Video Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a 3D tokenizer to quantize a video into spatial-temporal visual tokens and propose an embedding method for masked video token modeling to facilitate multi-task learning. |
Lijun Yu; Yong Cheng; Kihyuk Sohn; José Lezama; Han Zhang; Huiwen Chang; Alexander G. Hauptmann; Ming-Hsuan Yang; Yuan Hao; Irfan Essa; Lu Jiang; |
679 | Content-Aware Token Sharing for Efficient Semantic Segmentation With Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces Content-aware Token Sharing (CTS), a token reduction approach that improves the computational efficiency of semantic segmentation networks that use Vision Transformers (ViTs). |
Chenyang Lu; Daan de Geus; Gijs Dubbelman; |
680 | Toward Accurate Post-Training Quantization for Image Super Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study post-training quantization (PTQ) for image super resolution using only a few unlabeled calibration images. |
Zhijun Tu; Jie Hu; Hanting Chen; Yunhe Wang; |
681 | Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work proposes a novel approach to 4D radar-based scene flow estimation via cross-modal learning. |
Fangqiang Ding; Andras Palffy; Dariu M. Gavrila; Chris Xiaoxuan Lu; |
682 | OmniMAE: Single Model Masked Pretraining on Images and Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that masked autoencoding can be used to train a simple Vision Transformer on images and videos, without requiring any labeled data. |
Rohit Girdhar; Alaaeldin El-Nouby; Mannat Singh; Kalyan Vasudev Alwala; Armand Joulin; Ishan Misra; |
683 | Omnimatte3D: Associating Objects and Their Effects in Unconstrained Monocular Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method to decompose a video into a background and a set of foreground layers, where the background captures stationary elements while the foreground layers capture moving objects along with their associated effects (e.g. shadows and reflections). |
Mohammed Suhail; Erika Lu; Zhengqi Li; Noah Snavely; Leonid Sigal; Forrester Cole; |
684 | Real-Time Neural Light Field on Mobile Devices Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an efficient network that runs in real-time on mobile devices for neural rendering. |
Junli Cao; Huan Wang; Pavlo Chemerys; Vladislav Shakhrai; Ju Hu; Yun Fu; Denys Makoviichuk; Sergey Tulyakov; Jian Ren; |
685 | Incrementer: Transformer for Class-Incremental Semantic Segmentation With Knowledge Distillation Focusing on Old Class Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing methods are based on convolutional networks and prevent forgetting through knowledge distillation, but they (1) need additional convolutional layers to predict new classes, and (2) fail to distinguish the regions corresponding to old and new classes during knowledge distillation, roughly distilling all the features and thus limiting the learning of new classes. Based on these observations, we propose a new transformer framework for class-incremental semantic segmentation, dubbed Incrementer, which only needs to add new class tokens to the transformer decoder for new-class learning. |
Chao Shang; Hongliang Li; Fanman Meng; Qingbo Wu; Heqian Qiu; Lanxiao Wang; |
686 | End-to-End Video Matting With Trimap Propagation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we present a more robust and faster end-to-end video matting model equipped with trimap propagation called FTP-VM (Fast Trimap Propagation – Video Matting). |
Wei-Lun Huang; Ming-Sui Lee; |
687 | DropMAE: Masked Autoencoders With Spatial-Attention Dropout for Tracking Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study masked autoencoder (MAE) pretraining on videos for matching-based downstream tasks, including visual object tracking (VOT) and video object segmentation (VOS). |
Qiangqiang Wu; Tianyu Yang; Ziquan Liu; Baoyuan Wu; Ying Shan; Antoni B. Chan; |
688 | Are Binary Annotations Sufficient? Video Moment Retrieval Via Hierarchical Uncertainty-Based Active Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore a new interactive manner to stimulate the process of human-in-the-loop annotation in video moment retrieval task. |
Wei Ji; Renjie Liang; Zhedong Zheng; Wenqiao Zhang; Shengyu Zhang; Juncheng Li; Mengze Li; Tat-seng Chua; |
689 | High-Fidelity Clothed Avatar Reconstruction From A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a framework for efficient 3D clothed avatar reconstruction. |
Tingting Liao; Xiaomei Zhang; Yuliang Xiu; Hongwei Yi; Xudong Liu; Guo-Jun Qi; Yong Zhang; Xuan Wang; Xiangyu Zhu; Zhen Lei; |
690 | Zero-Shot Object Counting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Starting from a class name, we propose a method that can accurately identify the optimal patches which can then be used as counting exemplars. |
Jingyi Xu; Hieu Le; Vu Nguyen; Viresh Ranjan; Dimitris Samaras; |
691 | Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose a novel ViT-based module called PatchMix that effectively builds up the intermediate domain, i.e., probability distribution, by learning to sample patches from both domains based on game-theoretic models. |
Jinjing Zhu; Haotian Bai; Lin Wang; |
692 | Implicit Diffusion Models for Continuous Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces an Implicit Diffusion Model (IDM) for high-fidelity continuous image super-resolution. |
Sicheng Gao; Xuhui Liu; Bohan Zeng; Sheng Xu; Yanjing Li; Xiaoyan Luo; Jianzhuang Liu; Xiantong Zhen; Baochang Zhang; |
693 | VGFlow: Visibility Guided Flow Network for Human Reposing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These difficulties are further exacerbated by the fact that the space of possible human pose orientations is large and variable, clothing items are highly non-rigid, and body shapes vary widely across the population. To alleviate these difficulties and synthesize perceptually accurate images, we propose VGFlow, a model which uses a visibility guided flow module to disentangle the flow into visible and invisible parts of the target for simultaneous texture preservation and style manipulation. |
Rishabh Jain; Krishna Kumar Singh; Mayur Hemani; Jingwan Lu; Mausoom Sarkar; Duygu Ceylan; Balaji Krishnamurthy; |
694 | Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, a novel differentiable angle coder named phase-shifting coder (PSC) is proposed to accurately predict the orientation of objects, along with a dual-frequency version (PSCD). |
Yi Yu; Feipeng Da; |
695 | Improving Selective Visual Question Answering By Learning From Your Peers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore Selective VQA in both in-distribution (ID) and OOD scenarios, where models are presented with mixtures of ID and OOD data. |
Corentin Dancette; Spencer Whitehead; Rishabh Maheshwary; Ramakrishna Vedantam; Stefan Scherer; Xinlei Chen; Matthieu Cord; Marcus Rohrbach; |
696 | CAMS: CAnonicalized Manipulation Spaces for Category-Level Functional Hand-Object Manipulation Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on a novel task of category-level functional hand-object manipulation synthesis covering both rigid and articulated object categories. |
Juntian Zheng; Qingyuan Zheng; Lixing Fang; Yun Liu; Li Yi; |
697 | Neural Lens Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose NeuroLens, a neural lens model for distortion and vignetting that can be used for point projection and ray casting and can be optimized through both operations. |
Wenqi Xian; Aljaž Božič; Noah Snavely; Christoph Lassner; |
698 | CoralStyleCLIP: Co-Optimized Region and Layer Selection for Image Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose CoralStyleCLIP, which incorporates a multi-layer attention-guided blending strategy in the feature space of StyleGAN2 for obtaining high-fidelity edits. |
Ambareesh Revanur; Debraj Basu; Shradha Agrawal; Dhwanit Agarwal; Deepak Pai; |
699 | GLeaD: Improving GANs With A Generator-Leading Task Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We believe that the pioneering attempt presented in this work could inspire the community with better designed generator-leading tasks for GAN improvement. |
Qingyan Bai; Ceyuan Yang; Yinghao Xu; Xihui Liu; Yujiu Yang; Yujun Shen; |
700 | GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To enable high-quality, efficient, fast, and controllable text-to-image synthesis, we propose Generative Adversarial CLIPs, namely GALIP. |
Ming Tao; Bing-Kun Bao; Hao Tang; Changsheng Xu; |
701 | Look, Radiate, and Learn: Self-Supervised Localisation Via Radio-Visual Correspondence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Deep learning has revolutionised computer vision but has had limited application to radio perception tasks, in part due to lack of systematic datasets and benchmarks dedicated to the study of the performance and promise of radio sensing. To address this gap, we present MaxRay: a synthetic radio-visual dataset and benchmark that facilitate precise target localisation in radio. |
Mohammed Alloulah; Maximilian Arnold; |
702 | Multiplicative Fourier Level of Detail Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a simple yet surprisingly effective implicit representation scheme called Multiplicative Fourier Level of Detail (MFLOD), motivated by the recent success of multiplicative filter networks. |
Yishun Dou; Zhong Zheng; Qiaoqiao Jin; Bingbing Ni; |
703 | Indiscernible Object Counting in Underwater Scenes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Due to a lack of appropriate IOC datasets, we present a large-scale dataset IOCfish5K which contains a total of 5,637 high-resolution images and 659,024 annotated center points. |
Guolei Sun; Zhaochong An; Yun Liu; Ce Liu; Christos Sakaridis; Deng-Ping Fan; Luc Van Gool; |
704 | Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a shape-erased feature learning paradigm that decorrelates modality-shared features in two orthogonal subspaces. |
Jiawei Feng; Ancong Wu; Wei-Shi Zheng; |
705 | Relational Context Learning for Human-Object Interaction Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose the multiplex relation network (MUREN) that performs rich context exchange between three decoder branches using unary, pairwise, and ternary relations of human, object, and interaction tokens. |
Sanghyun Kim; Deunsol Jung; Minsu Cho; |
706 | Low-Light Image Enhancement Via Structure Modeling and Guidance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a new framework for low-light image enhancement that simultaneously conducts appearance and structure modeling. |
Xiaogang Xu; Ruixing Wang; Jiangbo Lu; |
707 | On Calibrating Semantic Segmentation Models: Analyses and An Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We provide a systematic study on the calibration of semantic segmentation models and propose a simple yet effective approach. |
Dongdong Wang; Boqing Gong; Liqiang Wang; |
708 | Visual Atoms: Pre-Training Vision Transformers With Sinusoidal Waves Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the present work, we develop a novel methodology based on circular harmonics for systematically investigating the design space of contour-oriented synthetic datasets. |
Sora Takashima; Ryo Hayamizu; Nakamasa Inoue; Hirokatsu Kataoka; Rio Yokota; |
709 | Multi-Label Compound Expression Recognition: C-EXPR Database & Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present an in-the-wild A/V database, C-EXPR-DB, consisting of 400 videos of 200K frames, annotated in terms of 13 compound expressions, valence-arousal emotion descriptors, action units, speech, facial landmarks and attributes. |
Dimitrios Kollias; |
710 | Masked Autoencoding Does Not Help Natural Language Supervision at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we investigate whether a similar approach can be effective when trained with a much larger amount of data. |
Floris Weers; Vaishaal Shankar; Angelos Katharopoulos; Yinfei Yang; Tom Gunter; |
711 | CORA: Adapting CLIP for Open-Vocabulary Detection With Region Prompting and Anchor Pre-Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We identify the two core obstacles that need to be tackled when incorporating these models into detector training: (1) the distribution mismatch that happens when applying a VL-model trained on whole images to region recognition tasks; (2) the difficulty of localizing objects of unseen classes. To overcome these obstacles, we propose CORA, a DETR-style framework that adapts CLIP for Open-vocabulary detection by Region prompting and Anchor pre-matching. |
Xiaoshi Wu; Feng Zhu; Rui Zhao; Hongsheng Li; |
712 | 3DAvatarGAN: Bridging Domains for Personalized Editable Avatars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose an adaptation framework, where the source domain is a pre-trained 3D-GAN, while the target domain is a 2D-GAN trained on artistic datasets. |
Rameen Abdal; Hsin-Ying Lee; Peihao Zhu; Menglei Chai; Aliaksandr Siarohin; Peter Wonka; Sergey Tulyakov; |
713 | Physics-Driven Diffusion Models for Impact Sound Synthesis From Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a physics-driven diffusion model that can synthesize high-fidelity impact sound for a silent video clip. |
Kun Su; Kaizhi Qian; Eli Shlizerman; Antonio Torralba; Chuang Gan; |
714 | Transductive Few-Shot Learning With Prototype-Based Label Propagation By Iterative Graph Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel prototype-based label propagation to solve these issues. |
Hao Zhu; Piotr Koniusz; |
715 | Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Discriminative co-saliency and background Mining Transformer framework (DMT) based on several economical multi-grained correlation modules to explicitly mine both co-saliency and background information and effectively model their discrimination. |
Long Li; Junwei Han; Ni Zhang; Nian Liu; Salman Khan; Hisham Cholakkal; Rao Muhammad Anwer; Fahad Shahbaz Khan; |
716 | Alias-Free Convnets: Fractional Shift Invariance Via Polynomial Activations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an extended anti-aliasing method that tackles both down-sampling and non-linear layers, thus creating truly alias-free, shift-invariant CNNs. |
Hagay Michaeli; Tomer Michaeli; Daniel Soudry; |
717 | Binary Latent Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that a binary latent space can be explored for compact yet expressive image representations. |
Ze Wang; Jiang Wang; Zicheng Liu; Qiang Qiu; |
718 | Person Image Synthesis Via Denoising Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show how denoising diffusion models can be applied for high-fidelity person image synthesis with strong sample diversity and enhanced mode coverage of the learnt data distribution. |
Ankan Kumar Bhunia; Salman Khan; Hisham Cholakkal; Rao Muhammad Anwer; Jorma Laaksonen; Mubarak Shah; Fahad Shahbaz Khan; |
719 | Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, model randomization testing can be overinterpreted if regarded as a primary criterion for selecting or discarding explanation methods. To address shortcomings of this test, we start by observing an experimental gap in the ranking of explanation methods between randomization-based sanity checks [1] and model output faithfulness measures (e.g. [20]). |
Alexander Binder; Leander Weber; Sebastian Lapuschkin; Grégoire Montavon; Klaus-Robert Müller; Wojciech Samek; |
720 | Neural Part Priors: Learning To Optimize Part-Based Object Completion in RGB-D Scans Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We thus propose to learn Neural Part Priors (NPPs), parametric spaces of objects and their parts, that enable optimizing to fit to a new input 3D scan geometry with global scene consistency constraints. |
Aleksei Bokhovkin; Angela Dai; |
721 | Adaptive Assignment for Geometry Aware Local Feature Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Accordingly, we introduce AdaMatcher, which first accomplishes the feature correlation and co-visible area estimation through an elaborate feature interaction module, then performs adaptive assignment on patch-level matching while estimating the scales between images, and finally refines the co-visible matches through scale alignment and sub-pixel regression module. |
Dihe Huang; Ying Chen; Yong Liu; Jianlin Liu; Shang Xu; Wenlong Wu; Yikang Ding; Fan Tang; Chengjie Wang; |
722 | Initialization Noise in Image Gradients and Saliency Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we examine gradients of logits of image classification CNNs with respect to input pixel values. |
Ann-Christin Woerl; Jan Disselhoff; Michael Wand; |
723 | FLAG3D: A 3D Fitness Activity Dataset With Language Instruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present FLAG3D, a large-scale 3D fitness activity dataset with language instruction containing 180K sequences of 60 categories. |
Yansong Tang; Jinpeng Liu; Aoyang Liu; Bin Yang; Wenxun Dai; Yongming Rao; Jiwen Lu; Jie Zhou; Xiu Li; |
724 | Implicit Neural Head Synthesis Via Controllable Local Deformation Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We build on part-based implicit shape models that decompose a global deformation field into local ones. |
Chuhan Chen; Matthew O’Toole; Gaurav Bharaj; Pablo Garrido; |
725 | NeuralUDF: Learning Unsigned Distance Fields for Multi-View Reconstruction of Surfaces With Arbitrary Topologies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel method, called NeuralUDF, for reconstructing surfaces with arbitrary topologies from 2D images via volume rendering. |
Xiaoxiao Long; Cheng Lin; Lingjie Liu; Yuan Liu; Peng Wang; Christian Theobalt; Taku Komura; Wenping Wang; |
726 | Towards Trustable Skin Cancer Diagnosis Via Rewriting Model’s Decision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When a model performs decision-making based on these spurious correlations, it can become untrustable and lead to catastrophic outcomes when deployed in the real-world scene. In this paper, we explore and try to solve this problem in the context of skin cancer diagnosis. |
Siyuan Yan; Zhen Yu; Xuelin Zhang; Dwarikanath Mahapatra; Shekhar S. Chandra; Monika Janda; Peter Soyer; Zongyuan Ge; |
727 | Curricular Object Manipulation in LiDAR-Based Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper explores the potential of curriculum learning in LiDAR-based 3D object detection by proposing a curricular object manipulation (COM) framework. |
Ziyue Zhu; Qiang Meng; Xiao Wang; Ke Wang; Liujiang Yan; Jian Yang; |
728 | Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel framework in which a static vision-language stream and a dynamic vision-language stream are developed to collaboratively reason the target tube. |
Zihang Lin; Chaolei Tan; Jian-Fang Hu; Zhi Jin; Tiancai Ye; Wei-Shi Zheng; |
729 | Shape-Constraint Recurrent Flow for 6D Object Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a shape-constraint recurrent flow network for 6D object pose estimation, which embeds the 3D shape information of the targets into the matching procedure. |
Yang Hai; Rui Song; Jiaojiao Li; Yinlin Hu; |
730 | FeatER: An Efficient Network for Human Reconstruction Via Feature Map-Based TransformER Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, much of the performance benefit in recent HPE and HMR methods has come at the cost of ever-increasing computation and memory needs. Therefore, to simultaneously address these problems, we propose FeatER, a novel transformer design which preserves the inherent structure of feature map representations when modeling attention while reducing the memory and computational costs. |
Ce Zheng; Matias Mendieta; Taojiannan Yang; Guo-Jun Qi; Chen Chen; |
731 | Micron-BERT: BERT-Based Facial Micro-Expression Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents Micron-BERT (µ-BERT), a novel approach to facial micro-expression recognition. |
Xuan-Bac Nguyen; Chi Nhan Duong; Xin Li; Susan Gauch; Han-Seok Seo; Khoa Luu; |
732 | Residual Degradation Learning Unfolding Framework With Mixing Priors Across Spectral and Spatial for Compressive Spectral Imaging Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Residual Degradation Learning Unfolding Framework (RDLUF), which bridges the gap between the sensing matrix and the degradation process. |
Yubo Dong; Dahua Gao; Tian Qiu; Yuyan Li; Minxi Yang; Guangming Shi; |
733 | Visibility Constrained Wide-Band Illumination Spectrum Design for Seeing-in-the-Dark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we try to robustify NIR2RGB translation by designing the optimal spectrum of auxiliary illumination in the wide-band VIS-NIR range, while keeping visual friendliness. |
Muyao Niu; Zhuoxiao Li; Zhihang Zhong; Yinqiang Zheng; |
734 | PanelNet: Understanding 360 Indoor Environment Via Panel Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Indoor 360 panoramas have two essential properties. (1) The panoramas are continuous and seamless in the horizontal direction. (2) Gravity plays an important role in indoor environment design. By leveraging these properties, we present PanelNet, a framework that understands indoor environments using a novel panel representation of 360 images. |
Haozheng Yu; Lu He; Bing Jian; Weiwei Feng; Shan Liu; |
735 | Learning With Noisy Labels Via Self-Supervised Adversarial Noisy Masking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous efforts tend to mitigate this problem by identifying and removing noisy samples or correcting their labels according to statistical properties (e.g., loss values) among training samples. In this paper, we aim to tackle this problem from a new perspective: delving into deep feature maps, we empirically find that models trained with clean and mislabeled samples manifest distinguishable activation feature distributions. |
Yuanpeng Tu; Boshen Zhang; Yuxi Li; Liang Liu; Jian Li; Jiangning Zhang; Yabiao Wang; Chengjie Wang; Cai Rong Zhao; |
736 | PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a learning-based testing method, termed PoseExaminer, that automatically diagnoses HPS algorithms by searching over the parameter space of human pose images to find the failure modes. |
Qihao Liu; Adam Kortylewski; Alan L. Yuille; |
737 | GamutMLP: A Lightweight MLP for Color Loss Recovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by neural implicit representations for 2D images, we propose a method that optimizes a lightweight multi-layer-perceptron (MLP) model during the gamut reduction step to predict the clipped values. |
Hoang M. Le; Brian Price; Scott Cohen; Michael S. Brown; |
738 | Instance-Aware Domain Generalization for Face Anti-Spoofing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Besides, such domain-aware methods focus on domain-level alignment, which is not fine-grained enough to ensure that learned representations are insensitive to domain styles. To address these issues, we propose a novel perspective for DG FAS that aligns features on the instance level without the need for domain labels. |
Qianyu Zhou; Ke-Yue Zhang; Taiping Yao; Xuequan Lu; Ran Yi; Shouhong Ding; Lizhuang Ma; |
739 | GANHead: Towards Generative Animatable Neural Head Avatars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This task is challenging, and it is difficult for existing methods to satisfy all the requirements at once. To achieve these goals, we propose GANHead (Generative Animatable Neural Head Avatar), a novel generative head model that takes advantage of both the fine-grained control over explicit expression parameters and the realistic rendering results of implicit representations. |
Sijing Wu; Yichao Yan; Yunhao Li; Yuhao Cheng; Wenhan Zhu; Ke Gao; Xiaobo Li; Guangtao Zhai; |
740 | Towards Domain Generalization for Multi-View 3D Object Detection in Bird-Eye-View Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To acquire a robust depth prediction, we propose to decouple the depth estimation from the intrinsic parameters of the camera (i.e. the focal length) by converting the prediction of metric depth to that of scale-invariant depth, and to perform dynamic perspective augmentation to increase the diversity of the extrinsic parameters (i.e. the camera poses) by utilizing homography. |
Shuo Wang; Xinhai Zhao; Hai-Ming Xu; Zehui Chen; Dameng Yu; Jiahao Chang; Zhen Yang; Feng Zhao; |
741 | Robust and Scalable Gaussian Process Regression and Its Applications Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a robust and scalable Gaussian process regression (GPR) model via variational learning. |
Yifan Lu; Jiayi Ma; Leyuan Fang; Xin Tian; Junjun Jiang; |
742 | Deep Dive Into Gradients: Better Optimization for 3D Object Detection With Gradient-Corrected IoU Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Gradient-Corrected IoU (GCIoU) loss to achieve fast and accurate 3D bounding box regression. |
Qi Ming; Lingjuan Miao; Zhe Ma; Lin Zhao; Zhiqiang Zhou; Xuhui Huang; Yuanpei Chen; Yufei Guo; |
743 | Doubly Right Object Recognition: A Why Prompt for Visual Rationales Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a "doubly right" object recognition benchmark, where the metric requires the model to simultaneously produce both the right labels as well as the right rationales. |
Chengzhi Mao; Revant Teotia; Amrutha Sundar; Sachit Menon; Junfeng Yang; Xin Wang; Carl Vondrick; |
744 | Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce a novel OCL framework for single-view images, SLot Attention via SHepherding (SLASH), which consists of two simple-yet-effective modules on top of Slot Attention. |
Jinwoo Kim; Janghyuk Choi; Ho-Jin Choi; Seon Joo Kim; |
745 | High-Fidelity Event-Radiance Recovery Via Transient Event Frequency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to use event cameras with bio-inspired silicon sensors, which are sensitive to radiance changes, to recover precise radiance values. |
Jin Han; Yuta Asano; Boxin Shi; Yinqiang Zheng; Imari Sato; |
746 | NeMo: Learning 3D Neural Motion Fields From Multiple Video Instances of The Same Action Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our study aims to bridge the gap between monocular HMR and multi-view MoCap systems by leveraging information shared across multiple video instances of the same action. |
Kuan-Chieh Wang; Zhenzhen Weng; Maria Xenochristou; João Pedro Araújo; Jeffrey Gu; Karen Liu; Serena Yeung; |
747 | RIATIG: Reliable and Imperceptible Adversarial Text-to-Image Generation With Natural Prompts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose RIATIG, a reliable and imperceptible adversarial attack against text-to-image models via inconspicuous examples. |
Han Liu; Yuhao Wu; Shixuan Zhai; Bo Yuan; Ning Zhang; |
748 | Distilling Neural Fields for Real-Time Articulated Shape Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method for reconstructing articulated 3D models from videos in real-time, without test-time optimization or manual 3D supervision at training time. |
Jeff Tan; Gengshan Yang; Deva Ramanan; |
749 | GLIGEN: Open-Set Grounded Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the status quo is to use text input alone, which can impede controllability. In this work, we propose GLIGEN: Open-Set Grounded Text-to-Image Generation, a novel approach that builds upon and extends the functionality of existing pre-trained text-to-image diffusion models by enabling them to also be conditioned on grounding inputs. |
Yuheng Li; Haotian Liu; Qingyang Wu; Fangzhou Mu; Jianwei Yang; Jianfeng Gao; Chunyuan Li; Yong Jae Lee; |
750 | Q: How To Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce SelTDA (Self-Taught Data Augmentation), a strategy for finetuning large VLMs on small-scale VQA datasets. |
Zaid Khan; Vijay Kumar BG; Samuel Schulter; Xiang Yu; Yun Fu; Manmohan Chandraker; |
751 | IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose IPCC-TP, a novel relevance-aware module based on Incremental Pearson Correlation Coefficient to improve multi-agent interaction modeling. |
Dekai Zhu; Guangyao Zhai; Yan Di; Fabian Manhardt; Hendrik Berkemeyer; Tuan Tran; Nassir Navab; Federico Tombari; Benjamin Busam; |
752 | Improving Robust Generalization By Direct PAC-Bayesian Bound Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although previous work provided theoretical explanations for this phenomenon using a robust PAC-Bayesian bound over the adversarial test error, related algorithmic derivations are at best only loosely connected to this bound, which implies that there is still a gap between their empirical success and our understanding of adversarial robustness theory. To close this gap, in this paper we consider a different form of the robust PAC-Bayesian bound and directly minimize it with respect to the model posterior. |
Zifan Wang; Nan Ding; Tomer Levinboim; Xi Chen; Radu Soricut; |
753 | MobileOne: An Improved One Millisecond Mobile Backbone Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these metrics may not correlate well with latency of the network when deployed on a mobile device. Therefore, we perform extensive analysis of different metrics by deploying several mobile-friendly networks on a mobile device. |
Pavan Kumar Anasosalu Vasu; James Gabriel; Jeff Zhu; Oncel Tuzel; Anurag Ranjan; |
754 | A Data-Based Perspective on Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a framework for probing the impact of the source dataset’s composition on transfer learning performance. |
Saachi Jain; Hadi Salman; Alaa Khaddaj; Eric Wong; Sung Min Park; Aleksander Mądry; |
755 | AssemblyHands: Towards Egocentric Activity Understanding Via 3D Hand Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present AssemblyHands, a large-scale benchmark dataset with accurate 3D hand pose annotations, to facilitate the study of egocentric activities with challenging hand-object interactions. |
Takehiko Ohkawa; Kun He; Fadime Sener; Tomas Hodan; Luan Tran; Cem Keskin; |
756 | Scene-Aware Egocentric 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods still struggle in challenging poses where the human body is highly occluded or is closely interacting with the scene. To address this issue, we propose a scene-aware egocentric pose estimation method that guides the prediction of the egocentric pose with scene constraints. |
Jian Wang; Diogo Luvizon; Weipeng Xu; Lingjie Liu; Kripasindhu Sarkar; Christian Theobalt; |
757 | Learning Geometry-Aware Representations By Sketching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Understanding geometric concepts, such as distance and shape, is essential for understanding the real world and also for many vision tasks. To incorporate such information into a visual representation of a scene, we propose learning to represent the scene by sketching, inspired by human behavior. |
Hyundo Lee; Inwoo Hwang; Hyunsung Go; Won-Seok Choi; Kibeom Kim; Byoung-Tak Zhang; |
758 | SVFormer: Semi-Supervised Video Transformer for Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the use of transformer models under the SSL setting for action recognition. |
Zhen Xing; Qi Dai; Han Hu; Jingjing Chen; Zuxuan Wu; Yu-Gang Jiang; |
759 | X-Avatar: Expressive Human Avatars Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present X-Avatar, a novel avatar model that captures the full expressiveness of digital humans to bring about life-like experiences in telepresence, AR/VR and beyond. |
Kaiyue Shen; Chen Guo; Manuel Kaufmann; Juan Jose Zarate; Julien Valentin; Jie Song; Otmar Hilliges; |
760 | AccelIR: Task-Aware Image Compression for Accelerating Neural Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present AccelIR, a framework that optimizes image compression considering the end-to-end pipeline of IR tasks. |
Juncheol Ye; Hyunho Yeo; Jinwoo Park; Dongsu Han; |
761 | BEV-Guided Multi-Modality Fusion for Driving Perception Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce BEVGuide, a novel Bird’s Eye-View (BEV) representation learning framework, representing the first attempt to unify a wide range of sensors under direct BEV guidance in an end-to-end fashion. |
Yunze Man; Liang-Yan Gui; Yu-Xiong Wang; |
762 | Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main limitation of conventional VLN algorithms is that if an action is mistaken, the agent fails to follow the instructions or explores unnecessary regions, leading the agent to an irrecoverable path. To tackle this problem, we propose Meta-Explore, a hierarchical navigation method deploying an exploitation policy to correct misled recent actions. |
Minyoung Hwang; Jaeyeon Jeong; Minsoo Kim; Yoonseon Oh; Songhwai Oh; |
763 | Proximal Splitting Adversarial Attack for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The methods proposed in these works do not accurately solve the adversarial segmentation problem and, therefore, overestimate the size of the perturbations required to fool models. Here, we propose a white-box attack for these models based on a proximal splitting to produce adversarial perturbations with much smaller l_infinity norms. |
Jérôme Rony; Jean-Christophe Pesquet; Ismail Ben Ayed; |
764 | Improved Test-Time Adaptation for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work addresses those two factors by proposing an Improved Test-Time Adaptation (ITTA) method. |
Liang Chen; Yong Zhang; Yibing Song; Ying Shan; Lingqiao Liu; |
765 | Recovering 3D Hand Mesh Sequence From A Single Blurry Image: A New Dataset and Temporal Unfolding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We demonstrate the usefulness of BlurHand for the 3D hand mesh recovery from blurry images in our experiments. |
Yeonguk Oh; JoonKyu Park; Jaeha Kim; Gyeongsik Moon; Kyoung Mu Lee; |
766 | NaQ: Leveraging Narrations As Queries To Supervise Episodic Memory Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Narrations-as-Queries (NaQ), a data augmentation strategy that transforms standard video-text narrations into training data for a video query localization model. |
Santhosh Kumar Ramakrishnan; Ziad Al-Halah; Kristen Grauman; |
767 | Correspondence Transformers With Asymmetric Feature Learning and Matching Flow Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper solves the problem of learning dense visual correspondences between different object instances of the same category with only sparse annotations. |
Yixuan Sun; Dongyang Zhao; Zhangyue Yin; Yiwen Huang; Tao Gui; Wenqiang Zhang; Weifeng Ge; |
768 | Adjustment and Alignment for Unbiased Open Set Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although causality has been studied to remove semantic-level bias, the unavailability of novel-class samples causes existing causal solutions to fail in OSDA. To break through this barrier, we propose a novel causality-driven solution built on the unexplored front-door adjustment theory, and implement it with a theoretically grounded framework, coined AdjustmeNt aNd Alignment (ANNA), to achieve unbiased OSDA. |
Wuyang Li; Jie Liu; Bo Han; Yixuan Yuan; |
769 | FedSeg: Class-Heterogeneous Federated Learning for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose FedSeg, a basic federated learning approach for class-heterogeneous semantic segmentation. |
Jiaxu Miao; Zongxin Yang; Leilei Fan; Yi Yang; |
770 | NeuralField-LDM: Scene Generation With Hierarchical Latent Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Automatically generating high-quality real-world 3D scenes is of enormous interest for applications such as virtual reality and robotics simulation. Towards this goal, we introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments. |
Seung Wook Kim; Bradley Brown; Kangxue Yin; Karsten Kreis; Katja Schwarz; Daiqing Li; Robin Rombach; Antonio Torralba; Sanja Fidler; |
771 | DPF: Learning Dense Prediction Fields With Weak Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In stark contrast to them, we propose a new paradigm that makes predictions for point coordinate queries, as inspired by the recent success of implicit representations, like distance or radiance fields. |
Xiaoxue Chen; Yuhang Zheng; Yupeng Zheng; Qiang Zhou; Hao Zhao; Guyue Zhou; Ya-Qin Zhang; |
772 | Fast Monocular Scene Reconstruction With Global-Sparse Local-Dense Grids Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to directly use signed distance function (SDF) in sparse voxel block grids for fast and accurate scene reconstruction without MLPs. |
Wei Dong; Christopher Choy; Charles Loop; Or Litany; Yuke Zhu; Anima Anandkumar; |
773 | Thermal Spread Functions (TSF): Physics-Guided Material Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a physics-guided material classification framework that relies on thermal properties of the object. |
Aniket Dashpute; Vishwanath Saragadam; Emma Alexander; Florian Willomitzer; Aggelos Katsaggelos; Ashok Veeraraghavan; Oliver Cossairt; |
774 | ESLAM: Efficient Dense SLAM System Based on Hybrid Representation of Signed Distance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ESLAM, an efficient implicit neural representation method for Simultaneous Localization and Mapping (SLAM). |
Mohammad Mahdi Johari; Camilla Carta; François Fleuret; |
775 | CNVid-3.5M: Build, Filter, and Pre-Train The Large-Scale Public Chinese Video-Text Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve the data quality, we propose a novel method to filter out 1M weakly-paired videos, resulting in the CNVid-3.5M dataset. |
Tian Gan; Qing Wang; Xingning Dong; Xiangyuan Ren; Liqiang Nie; Qingpei Guo; |
776 | Unsupervised Space-Time Network for Temporally-Consistent Segmentation of Multiple Motions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an original unsupervised spatio-temporal framework for motion segmentation from optical flow that fully investigates the temporal dimension of the problem. |
Etienne Meunier; Patrick Bouthemy; |
777 | Unsupervised 3D Point Cloud Representation Learning By Triangle Constrained Contrast for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we design the Triangle Constrained Contrast (TriCC) framework tailored for autonomous driving scenes, which learns unsupervised 3D representations through both multimodal information and the dynamics of temporal sequences. |
Bo Pang; Hongchi Xia; Cewu Lu; |
778 | IDisc: Internal Discretization for Monocular Depth Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose iDisc to learn those patterns with internal discretized representations. |
Luigi Piccinelli; Christos Sakaridis; Fisher Yu; |
779 | Balancing Logit Variation for Long-Tailed Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Due to the imbalanced number of samples across categories, the features of those tail classes may get squeezed into a narrow area in the feature space. Towards a balanced feature distribution, we introduce category-wise variation into the network predictions in the training phase such that an instance is no longer projected to a feature point, but a small region instead. |
Yuchao Wang; Jingjing Fei; Haochen Wang; Wei Li; Tianpeng Bao; Liwei Wu; Rui Zhao; Yujun Shen; |
780 | Prompt-Guided Zero-Shot Anomaly Action Recognition Using Pretrained Deep Skeleton Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a unified, user prompt-guided zero-shot learning framework using a target domain-independent skeleton feature extractor, which is pretrained on a large-scale action recognition dataset. |
Fumiaki Sato; Ryo Hachiuma; Taiki Sekii; |
781 | IQuery: Instruments As Queries for Audio-Visual Sound Separation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We re-formulate the visual-sound separation task and propose Instruments as Queries (iQuery) with a flexible query expansion mechanism. |
Jiaben Chen; Renrui Zhang; Dongze Lian; Jiaqi Yang; Ziyao Zeng; Jianbo Shi; |
782 | Sampling Is Matter: Point-Guided 3D Human Mesh Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a simple yet powerful method for 3D human mesh reconstruction from a single RGB image. |
Jeonghwan Kim; Mi-Gyeong Gwon; Hyunwoo Park; Hyukmin Kwon; Gi-Mun Um; Wonjun Kim; |
783 | Efficient Multimodal Fusion Via Interactive Prompting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient and flexible multimodal fusion method, namely PMF, tailored for fusing unimodally pretrained transformers. |
Yaowei Li; Ruijie Quan; Linchao Zhu; Yi Yang; |
784 | Look Around for Anomalies: Weakly-Supervised Anomaly Detection Via Context-Motion Relational Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For example, even for the same motion of a running person, the abnormality varies depending on whether the surroundings are a playground or a roadway. Therefore, our aim is to extract discriminative features by widening the relative gap between classes’ features from a single branch. |
MyeongAh Cho; Minjung Kim; Sangwon Hwang; Chaewon Park; Kyungjae Lee; Sangyoun Lee; |
785 | Depth Estimation From Indoor Panoramas With Neural Scene Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a practical framework to improve the accuracy and efficiency of depth estimation from multi-view indoor panoramic images with the Neural Radiance Field technology. |
Wenjie Chang; Yueyi Zhang; Zhiwei Xiong; |
786 | Task-Specific Fine-Tuning Via Variational Information Bottleneck for Weakly-Supervised Pathology Whole Slide Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Though Self-supervised Learning (SSL) proposes viable representation learning schemes, the downstream task-specific features via partial label tuning are not explored. To alleviate this problem, we propose an efficient WSI fine-tuning framework motivated by the Information Bottleneck theory. |
Honglin Li; Chenglu Zhu; Yunlong Zhang; Yuxuan Sun; Zhongyi Shui; Wenwei Kuang; Sunyi Zheng; Lin Yang; |
787 | Detecting Everything in The Open World: Towards Universal Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we formally address universal object detection, which aims to detect every scene and predict every category. |
Zhenyu Wang; Yali Li; Xi Chen; Ser-Nam Lim; Antonio Torralba; Hengshuang Zhao; Shengjin Wang; |
788 | Single Image Depth Prediction Made Better: A Multivariate Gaussian Take Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, an SIDP approach must be mindful of the expected depth variations in the model’s prediction at test time. Accordingly, we introduce an approach that performs continuous modeling of per-pixel depth, where we can predict and reason about the per-pixel depth and its distribution. |
Ce Liu; Suryansh Kumar; Shuhang Gu; Radu Timofte; Luc Van Gool; |
789 | NUWA-LIP: Language-Guided Image Inpainting With Defect-Free VQGAN Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better adapt the text guidance to the inpainting task, this paper proposes NUWA-LIP, which involves defect-free VQGAN (DF-VQGAN) and a multi-perspective sequence-to-sequence module (MP-S2S). |
Minheng Ni; Xiaoming Li; Wangmeng Zuo; |
790 | One-Shot Model for Mixed-Precision Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We focus on a specific class of methods that find tensor bit width using gradient-based optimization. First, we theoretically derive several methods that were empirically proposed earlier. Second, we present a novel One-Shot method that finds a diverse set of Pareto-front architectures in O(1) time. |
Ivan Koryakovskiy; Alexandra Yakovleva; Valentin Buchnev; Temur Isaev; Gleb Odinokikh; |
791 | MARLIN: Masked Autoencoder for Facial Video Representation LearnINg Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a self-supervised approach to learn universal facial representations from videos that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS). |
Zhixi Cai; Shreya Ghosh; Kalin Stefanov; Abhinav Dhall; Jianfei Cai; Hamid Rezatofighi; Reza Haffari; Munawar Hayat; |
792 | Language Adaptive Weight Generation for Multi-Task Visual Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The active perception can take expressions as priors to extract relevant visual features, which can effectively alleviate the mismatches. Inspired by this, we propose an active perception Visual Grounding framework based on Language Adaptive Weights, called VG-LAW. |
Wei Su; Peihan Miao; Huanzhang Dou; Gaoang Wang; Liang Qiao; Zheyang Li; Xi Li; |
793 | Continuous Intermediate Token Learning With Implicit Motion Manifold for Keyframe Based Motion Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel framework to formulate latent motion manifolds with keyframe-based constraints, from which the continuous nature of intermediate token representations is considered. |
Clinton A. Mo; Kun Hu; Chengjiang Long; Zhiyong Wang; |
794 | Dynamic Coarse-To-Fine Learning for Oriented Tiny Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, the position prior, positive sample feature, and instance are mismatched, and the learning of extreme-shaped objects is biased and unbalanced due to little proper feature supervision. To tackle these issues, we propose a dynamic prior along with the coarse-to-fine assigner, dubbed DCFL. |
Chang Xu; Jian Ding; Jinwang Wang; Wen Yang; Huai Yu; Lei Yu; Gui-Song Xia; |
795 | Controllable Mesh Generation Through Sparse Latent Point Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we design a novel sparse latent point diffusion model for mesh generation. |
Zhaoyang Lyu; Jinyi Wang; Yuwei An; Ya Zhang; Dahua Lin; Bo Dai; |
796 | Query-Centric Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents QCNet, a modeling framework toward pushing the boundaries of trajectory prediction. |
Zikang Zhou; Jianping Wang; Yung-Hui Li; Yu-Kai Huang; |
797 | The Enemy of My Enemy Is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it may have a negative impact if a natural example is misclassified. To circumvent this issue, we propose a novel adversarial training scheme that encourages the model to produce similar output probabilities for an adversarial example and its "inverse adversarial" counterpart. |
Junhao Dong; Seyed-Mohsen Moosavi-Dezfooli; Jianhuang Lai; Xiaohua Xie; |
798 | Look Before You Match: Instance Understanding Matters in Video Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that instance understanding matters in VOS, and integrating it with memory-based matching can enjoy the synergy, which is intuitively sensible from the definition of VOS task, i.e., identifying and segmenting object instances within the video. |
Junke Wang; Dongdong Chen; Zuxuan Wu; Chong Luo; Chuanxin Tang; Xiyang Dai; Yucheng Zhao; Yujia Xie; Lu Yuan; Yu-Gang Jiang; |
799 | SGLoc: Scene Geometry Encoding for Outdoor LiDAR Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the accuracy of existing methods still has room for improvement due to the difficulty of effectively encoding the scene geometry and the unsatisfactory quality of the data. In this work, we propose a novel LiDAR localization framework, SGLoc, which decouples pose estimation into point cloud correspondence regression followed by pose estimation via this correspondence. |
Wen Li; Shangshu Yu; Cheng Wang; Guosheng Hu; Siqi Shen; Chenglu Wen; |
800 | Boundary Unlearning: Rapid Forgetting of Deep Networks Via Shifting The Decision Boundary Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we refocus our attention from the parameter space to the decision space of the DNN model, and propose Boundary Unlearning, a rapid yet effective way to unlearn an entire class from a trained DNN model. |
Min Chen; Weizhuo Gao; Gaoyang Liu; Kai Peng; Chen Wang; |
801 | Bridging Search Region Interaction With Template for RGB-T Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Many other methods sample candidate boxes from search frames and conduct various fusion approaches on isolated pairs of RGB and TIR boxes, which limits the cross-modal interaction within local regions and brings about inadequate context modeling. To alleviate these limitations, we propose a novel Template-Bridged Search region Interaction (TBSI) module which exploits templates as the medium to bridge the cross-modal interaction between RGB and TIR search regions by gathering and distributing target-relevant object and environment contexts. |
Tianrui Hui; Zizheng Xun; Fengguang Peng; Junshi Huang; Xiaoming Wei; Xiaolin Wei; Jiao Dai; Jizhong Han; Si Liu; |
802 | Indescribable Multi-Modal Spatial Evaluator Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we developed a self-supervised approach, Indescribable Multi-modal Spatial Evaluator (IMSE), to address multi-modal image registration. |
Lingke Kong; X. Sharon Qi; Qijin Shen; Jiacheng Wang; Jingyi Zhang; Yanle Hu; Qichao Zhou; |
803 | ImageBind: One Embedding Space To Bind Them All Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present ImageBind, an approach to learn a joint embedding across six different modalities – images, text, audio, depth, thermal, and IMU data. |
Rohit Girdhar; Alaaeldin El-Nouby; Zhuang Liu; Mannat Singh; Kalyan Vasudev Alwala; Armand Joulin; Ishan Misra; |
804 | Orthogonal Annotation Benefits Barely-Supervised Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Subsequently, by introducing unlabeled volumes, we propose a dual-network paradigm named Dense-Sparse Co-training (DeSCO) that exploits dense pseudo labels in the early stage and sparse labels in the later stage, while enforcing consistent outputs from the two networks. |
Heng Cai; Shumeng Li; Lei Qi; Qian Yu; Yinghuan Shi; Yang Gao; |
805 | Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we design a simple, efficient, yet powerful O(N) guided cross-scale pyramid alignment (GCSPA) module, where multi-scale information is highly exploited. |
Kun Zhou; Wenbo Li; Xiaoguang Han; Jiangbo Lu; |
806 | Knowledge Distillation for 6D Pose Estimation By Aligning Distributions of Local Predictions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the first knowledge distillation method driven by the 6D pose estimation task. |
Shuxuan Guo; Yinlin Hu; Jose M. Alvarez; Mathieu Salzmann; |
807 | Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose universally slimmable self-supervised learning (dubbed US3L) to achieve better accuracy-efficiency trade-offs for deploying self-supervised models across different devices. |
Yun-Hao Cao; Peiqin Sun; Shuchang Zhou; |
808 | Adaptive Annealing for Robust Geometric Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a principled approach for adaptively annealing the scale for GNC by tracking the positive-definiteness (i.e. local convexity) of the Hessian of the cost function. |
Chitturi Sidhartha; Lalit Manam; Venu Madhav Govindu; |
809 | MetaFusion: Infrared and Visible Image Fusion Via Meta-Feature Embedding From Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the feature gap between these two different-level tasks hinders the progress. Addressing this issue, this paper proposes an infrared and visible image fusion via meta-feature embedding from object detection. |
Wenda Zhao; Shigeng Xie; Fan Zhao; You He; Huchuan Lu; |
810 | Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Transformers have shown potential in capturing long-range dependencies, but few attempts have been made with a specifically designed Transformer to model the spatial and spectral correlation in HSIs. In this paper, we address these issues by proposing a spectral enhanced rectangle Transformer, driving it to explore the non-local spatial similarity and global spectral low-rank property of HSIs. |
Miaoyu Li; Ji Liu; Ying Fu; Yulun Zhang; Dejing Dou; |
811 | End-to-End Vectorized HD-Map Construction With Piecewise Bezier Curve Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, by delving into parameterization-based methods, we pioneer a concise and elegant scheme that adopts unified piecewise Bezier curve. |
Limeng Qiao; Wenjie Ding; Xi Qiu; Chi Zhang; |
812 | PointListNet: Deep Learning on 3D Point Lists Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there are a few kinds of data that exhibit both regular 1D list and irregular 3D set structures, such as proteins and non-coding RNAs. In this paper, we refer to them as 3D point lists and propose a Transformer-style PointListNet to model them. |
Hehe Fan; Linchao Zhu; Yi Yang; Mohan Kankanhalli; |
813 | On Data Scaling in Masked Image Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, scaling properties seem to be unintentionally neglected in the recent trending studies on masked image modeling (MIM), and some arguments even suggest that MIM cannot benefit from large-scale data. In this work, we try to break down these preconceptions and systematically study the scaling behaviors of MIM through extensive experiments, with data ranging from 10% of ImageNet-1K to full ImageNet-22K, model parameters ranging from 49 million to one billion, and training length ranging from 125K to 500K iterations. |
Zhenda Xie; Zheng Zhang; Yue Cao; Yutong Lin; Yixuan Wei; Qi Dai; Han Hu; |
814 | Upcycling Models Under Domain and Category Shift Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce an innovative global and local clustering learning technique (GLC). |
Sanqing Qu; Tianpei Zou; Florian Röhrbein; Cewu Lu; Guang Chen; Dacheng Tao; Changjun Jiang; |
815 | Single Domain Generalization for LiDAR Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a single domain generalization method for LiDAR semantic segmentation (DGLSS) that aims to ensure good performance not only in the source domain but also in the unseen domain by learning only on the source domain. |
Hyeonseong Kim; Yoonsu Kang; Changgyoon Oh; Kuk-Jin Yoon; |
816 | Balanced Energy Regularization Loss for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a balanced energy regularization loss that is simple but generally effective for a variety of tasks. |
Hyunjun Choi; Hawook Jeong; Jin Young Choi; |
817 | 3D-Aware Face Swapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel 3D-aware face swapping method that generates high-fidelity and multi-view-consistent swapped faces from single-view source and target images. |
Yixuan Li; Chao Ma; Yichao Yan; Wenhan Zhu; Xiaokang Yang; |
818 | UMat: Uncertainty-Aware Single Image High Resolution Material Capture Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a learning-based method to recover normals, specularity, and roughness from a single diffuse image of a material, using microgeometry appearance as our primary cue. |
Carlos Rodriguez-Pardo; Henar Domínguez-Elvira; David Pascual-Hernández; Elena Garces; |
819 | Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an effective way to refine a phrase grounding model by considering self-similarity maps extracted from the latent representation of the model’s image encoder. |
Tal Shaharabany; Lior Wolf; |
820 | SCOOP: Self-Supervised Correspondence and Optimization-Based Scene Flow Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce SCOOP, a new method for scene flow estimation that can be learned on a small amount of data without employing ground-truth flow supervision. |
Itai Lang; Dror Aiger; Forrester Cole; Shai Avidan; Michael Rubinstein; |
821 | SLACK: Stable Learning of Augmentations With Cold-Start and KL Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most recent approaches still rely on some prior information; they start from a small pool of manually-selected default transformations that are either used to pretrain the network or forced to be part of the policy learned by the automatic data augmentation algorithm. In this paper, we propose to directly learn the augmentation policy without leveraging such prior knowledge. |
Juliette Marrie; Michael Arbel; Diane Larlus; Julien Mairal; |
822 | Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus we present first-order flatness, a stronger measure of flatness focusing on the maximal gradient norm within a perturbation radius which bounds both the maximal eigenvalue of Hessian at local minima and the regularization function of SAM. |
Xingxuan Zhang; Renzhe Xu; Han Yu; Hao Zou; Peng Cui; |
823 | Phone2Proc: Bringing Robust Robots Into Our Chaotic World Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Phone2Proc, a method that uses a 10-minute phone scan and conditional procedural generation to create a distribution of training scenes that are semantically similar to the target environment. |
Matt Deitke; Rose Hendrix; Ali Farhadi; Kiana Ehsani; Aniruddha Kembhavi; |
824 | Latency Matters: Real-Time Action Forecasting Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present RAFTformer, a real-time action forecasting transformer for latency-aware real-world action forecasting applications. |
Harshayu Girase; Nakul Agarwal; Chiho Choi; Karttikeya Mangalam; |
825 | HierVL: Learning Hierarchical Video-Language Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a hierarchical contrastive training objective that encourages text-visual alignment at both the clip level and video level. |
Kumar Ashutosh; Rohit Girdhar; Lorenzo Torresani; Kristen Grauman; |
826 | GraVoS: Voxel Selection for 3D Point-Cloud Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Differently, we propose to modify the scenes by removing elements (voxels), rather than adding ones. |
Oren Shrout; Yizhak Ben-Shabat; Ayellet Tal; |
827 | Learning Articulated Shape With Keypoint Pseudo-Labels From Web Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper shows that it is possible to learn models for monocular 3D reconstruction of articulated objects (e.g. horses, cows, sheep), using as few as 50-150 images labeled with 2D keypoints. |
Anastasis Stathopoulos; Georgios Pavlakos; Ligong Han; Dimitris N. Metaxas; |
828 | Rethinking Image Super Resolution From Long-Tailed Distribution Learning Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we try to give a feasible answer from a machine learning perspective, i.e., the twin fitting problem caused by the long-tailed pixel distribution in natural images. |
Yuanbiao Gou; Peng Hu; Jiancheng Lv; Hongyuan Zhu; Xi Peng; |
829 | RobustNeRF: Ignoring Distractors With Robust Losses Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To cope with distractors, we advocate a form of robust estimation for NeRF training, modeling distractors in training data as outliers of an optimization problem. |
Sara Sabour; Suhani Vora; Daniel Duckworth; Ivan Krasin; David J. Fleet; Andrea Tagliasacchi; |
830 | Spherical Transformer for LiDAR-Based 3D Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study the varying-sparsity distribution of LiDAR points and present SphereFormer to directly aggregate information from dense close points to the sparse distant ones. |
Xin Lai; Yukang Chen; Fanbin Lu; Jianhui Liu; Jiaya Jia; |
831 | Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As an abstraction of life, art incorporates humans in both natural and artificial scenes. We take advantage of it and introduce the Human-Art dataset to bridge related tasks in natural and artificial scenarios. |
Xuan Ju; Ailing Zeng; Jianan Wang; Qiang Xu; Lei Zhang; |
832 | Watch or Listen: Robust Audio-Visual Speech Recognition With Visual Corruption Modeling and Reliability Scoring Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper deals with Audio-Visual Speech Recognition (AVSR) under multimodal input corruption situations where audio inputs and visual inputs are both corrupted, which is not well addressed in previous research directions. |
Joanna Hong; Minsu Kim; Jeongsoo Choi; Yong Man Ro; |
833 | Turning A CLIP Model Into A Scene Text Detector Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recently, pretraining approaches based on vision language models have made effective progress in the field of text detection. In contrast to these works, this paper proposes a new method, termed TCM, focusing on Turning the CLIP Model directly for text detection without the pretraining process. |
Wenwen Yu; Yuliang Liu; Wei Hua; Deqiang Jiang; Bo Ren; Xiang Bai; |
834 | VisFusion: Visibility-Aware Online 3D Scene Reconstruction From Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose VisFusion, a visibility-aware online 3D scene reconstruction approach from posed monocular videos. |
Huiyu Gao; Wei Mao; Miaomiao Liu; |
835 | SCOTCH and SODA: A Transformer Video Shadow Detection Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we argue that accounting for shadow deformation is essential when designing a video shadow detection method. |
Lihao Liu; Jean Prost; Lei Zhu; Nicolas Papadakis; Pietro Liò; Carola-Bibiane Schönlieb; Angelica I. Aviles-Rivero; |
836 | RODIN: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a 3D diffusion model that automatically generates 3D digital avatars represented as neural radiance fields (NeRFs). |
Tengfei Wang; Bo Zhang; Ting Zhang; Shuyang Gu; Jianmin Bao; Tadas Baltrusaitis; Jingjing Shen; Dong Chen; Fang Wen; Qifeng Chen; Baining Guo; |
837 | On The Pitfall of Mixup for Uncertainty Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By decomposing the mixup process into data transformation and random perturbation, we suggest that the confidence penalty nature of the data transformation is the reason for calibration degradation. |
Deng-Bao Wang; Lanqing Li; Peilin Zhao; Pheng-Ann Heng; Min-Ling Zhang; |
838 | Feature Shrinkage Pyramid for Camouflaged Object Detection With Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they suffer from two major limitations: less effective locality modeling and insufficient feature aggregation in decoders, which are not conducive to camouflaged object detection that explores subtle cues from indistinguishable backgrounds. To address these issues, in this paper, we propose a novel transformer-based Feature Shrinkage Pyramid Network (FSPNet), which aims to hierarchically decode locality-enhanced neighboring transformer features through progressive shrinking for camouflaged object detection. |
Zhou Huang; Hang Dai; Tian-Zhu Xiang; Shuo Wang; Huai-Xin Chen; Jie Qin; Huan Xiong; |
839 | Matching Is Not Enough: A Two-Stage Framework for Category-Agnostic Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To calibrate the inaccurate matching results, we introduce a two-stage framework, where matched keypoints from the first stage are viewed as similarity-aware position proposals. |
Min Shi; Zihao Huang; Xianzheng Ma; Xiaowei Hu; Zhiguo Cao; |
840 | High-Fidelity Guided Image Synthesis With Latent Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we find that prior works suffer from an intrinsic domain shift problem wherein the generated outputs often lack details and resemble simplistic representations of the target domain. In this paper, we propose a novel guided image synthesis framework, which addresses this problem by modeling the output image as the solution of a constrained optimization problem. |
Jaskirat Singh; Stephen Gould; Liang Zheng; |
841 | CodeTalker: Speech-Driven 3D Facial Animation With Discrete Motion Prior Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty. |
Jinbo Xing; Menghan Xia; Yuechen Zhang; Xiaodong Cun; Jue Wang; Tien-Tsin Wong; |
842 | Towards Transferable Targeted Adversarial Examples Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Transferable Targeted Adversarial Attack (TTAA), which can capture the distribution information of the target class from both label-wise and feature-wise perspectives, to generate highly transferable targeted adversarial examples. |
Zhibo Wang; Hongshan Yang; Yunhe Feng; Peng Sun; Hengchang Guo; Zhifei Zhang; Kui Ren; |
843 | Semi-Supervised Parametric Real-World Image Harmonization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This simulated data does not model many important appearance mismatches (illumination, object boundaries, etc.) between foreground and background in real composites, leading to models that do not generalize well and cannot model complex local changes. We propose a new semi-supervised training strategy that addresses this problem and lets us learn complex local appearance harmonization from unpaired real composites, where foreground and background come from different images. |
Ke Wang; Michaël Gharbi; He Zhang; Zhihao Xia; Eli Shechtman; |
844 | C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent state-of-the-art (SOTA) methods on SFDA mostly focus on pseudo-label refinement based self-training which generally suffers from two issues: i) inevitable occurrence of noisy pseudo-labels that could lead to early training time memorization, ii) refinement process requires maintaining a memory bank which creates a significant burden in resource constraint scenarios. To address these concerns, we propose C-SFDA, a curriculum learning aided self-training framework for SFDA that adapts efficiently and reliably to changes across domains based on selective pseudo-labeling. |
Nazmul Karim; Niluthpol Chowdhury Mithun; Abhinav Rajvanshi; Han-pang Chiu; Supun Samarasekera; Nazanin Rahnavard; |
845 | Learning Visibility Field for Detailed 3D Human Reconstruction and Relighting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel sparse-view 3D human reconstruction framework that closely incorporates the occupancy field and albedo field with an additional visibility field: it not only resolves occlusion ambiguity in multiview feature aggregation, but can also be used to evaluate light attenuation for self-shadowed relighting. |
Ruichen Zheng; Peng Li; Haoqian Wang; Tao Yu; |
846 | Improving Zero-Shot Generalization and Robustness of Multi-Modal Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While the top-5 zero-shot accuracies of these models are very high, the top-1 accuracies are much lower (over 25% gap in some cases). We investigate the reasons for this performance gap and find that many of the failure cases are caused by ambiguity in the text prompts. |
Yunhao Ge; Jie Ren; Andrew Gallagher; Yuxiao Wang; Ming-Hsuan Yang; Hartwig Adam; Laurent Itti; Balaji Lakshminarayanan; Jiaping Zhao; |
847 | Improving Robustness of Vision Transformers By Reducing Sensitivity To Patch Corruptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, when we only occlude a small number of patches with random noise (e.g., 10%), these patch corruptions would lead to severe accuracy drops and greatly distract intermediate attention layers. To address this, we propose a new training method that improves the robustness of transformers from a new perspective — reducing sensitivity to patch corruptions (RSPC). |
Yong Guo; David Stutz; Bernt Schiele; |
848 | VecFontSDF: Learning To Reconstruct and Synthesize High-Quality Vector Fonts Via Signed Distance Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an end-to-end trainable method, VecFontSDF, to reconstruct and synthesize high-quality vector fonts using signed distance functions (SDFs). |
Zeqing Xia; Bojun Xiong; Zhouhui Lian; |
849 | MSF: Motion-Guided Sequential Fusion for Efficient 3D Object Detection From Point Cloud Sequences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an efficient Motion-guided Sequential Fusion (MSF) method, which exploits the continuity of object motion to mine useful sequential contexts for object detection in the current frame. |
Chenhang He; Ruihuang Li; Yabin Zhang; Shuai Li; Lei Zhang; |
850 | Modeling The Distributional Uncertainty for Salient Object Detection Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate a particular type of epistemic uncertainty, namely distributional uncertainty, for salient object detection. |
Xinyu Tian; Jing Zhang; Mochu Xiang; Yuchao Dai; |
851 | Kernel Aware Resampler Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing works do not provide a sound framework to handle them jointly. In this paper we propose a framework for generic image resampling that not only addresses all of the above-mentioned issues but also extends the set of possible transforms from upscaling to generic transforms. |
Michael Bernasconi; Abdelaziz Djelouah; Farnood Salehi; Markus Gross; Christopher Schroers; |
852 | LaserMix for Semi-Supervised LiDAR Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study the underexplored semi-supervised learning (SSL) in LiDAR semantic segmentation. |
Lingdong Kong; Jiawei Ren; Liang Pan; Ziwei Liu; |
853 | CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We instead propose to learn a set of prompt components which are assembled with input-conditioned weights to produce input-conditioned prompts, resulting in a novel attention-based end-to-end key-query scheme. |
James Seale Smith; Leonid Karlinsky; Vyshnavi Gutta; Paola Cascante-Bonilla; Donghyun Kim; Assaf Arbelle; Rameswar Panda; Rogerio Feris; Zsolt Kira; |
854 | HypLiLoc: Towards Effective LiDAR Pose Regression With Hyperbolic Fusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose HypLiLoc, a new model for LiDAR pose regression. |
Sijie Wang; Qiyu Kang; Rui She; Wei Wang; Kai Zhao; Yang Song; Wee Peng Tay; |
855 | Complementary Intrinsics From Neural Radiance Fields and CNNs for Outdoor Scene Relighting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes to complement the intrinsic estimation from volume rendering using NeRFs and from inverting the photometric image formation model using convolutional neural networks (CNNs). |
Siqi Yang; Xuanning Cui; Yongjie Zhu; Jiajun Tang; Si Li; Zhaofei Yu; Boxin Shi; |
856 | VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work presents a decomposed diffusion process via resolving the per-frame noise into a base noise that is shared among all frames and a residual noise that varies along the time axis. |
Zhengxiong Luo; Dayou Chen; Yingya Zhang; Yan Huang; Liang Wang; Yujun Shen; Deli Zhao; Jingren Zhou; Tieniu Tan; |
857 | Real-Time Multi-Person Eyeblink Detection in The Wild for Untrimmed Video Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the multi-person scenario within untrimmed videos is also important for practical applications, and it has not been well addressed yet. To address this, we shed light on this research field for the first time with essential contributions on dataset, theory, and practices. |
Wenzheng Zeng; Yang Xiao; Sicheng Wei; Jinfang Gan; Xintao Zhang; Zhiguo Cao; Zhiwen Fang; Joey Tianyi Zhou; |
858 | Category Query Learning for Human-Object Interaction Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike most previous HOI methods that focus on learning better human-object features, we propose a novel and complementary approach called category query learning. |
Chi Xie; Fangao Zeng; Yue Hu; Shuang Liang; Yichen Wei; |
859 | MDQE: Mining Discriminative Query Embeddings To Segment Occluded Instances on Challenging Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is mainly because instance queries in these methods cannot encode well the discriminative embeddings of instances, making it difficult for the query-based segmenter to distinguish those ‘hard’ instances. To address these issues, we propose to mine discriminative query embeddings (MDQE) to segment occluded instances on challenging videos. |
Minghan Li; Shuai Li; Wangmeng Xiang; Lei Zhang; |
860 | Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate the problem, we propose the Autonomous-driving StreAming Perception (ASAP) benchmark, which is the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving. |
Xiaofeng Wang; Zheng Zhu; Yunpeng Zhang; Guan Huang; Yun Ye; Wenbo Xu; Ziwei Chen; Xingang Wang; |
861 | Robust Model-Based Face Reconstruction Through Weakly-Supervised Outlier Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to enhance model-based face reconstruction by avoiding fitting the model to outliers, i.e. regions that cannot be well-expressed by the model such as occluders or make-up. |
Chunlu Li; Andreas Morel-Forster; Thomas Vetter; Bernhard Egger; Adam Kortylewski; |
862 | Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we borrow the idea of importance perception from classical image coding theory and propose a novel two-stage framework, which consists of Masked Quantization VAE (MQ-VAE) and Stackformer, to relieve the model from modeling redundancy. |
Mengqi Huang; Zhendong Mao; Quan Wang; Yongdong Zhang; |
863 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-Supervised Video Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To leverage the advantage of different teachers, we design a spatial-temporal co-teaching method for MVD. |
Rui Wang; Dongdong Chen; Zuxuan Wu; Yinpeng Chen; Xiyang Dai; Mengchen Liu; Lu Yuan; Yu-Gang Jiang; |
864 | Transformer-Based Unified Recognition of Two Hands Manipulating Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Transformer-based unified framework that provides better understanding of two hands manipulating objects. |
Hoseong Cho; Chanwoo Kim; Jihyeon Kim; Seongyeong Lee; Elkhan Ismayilzada; Seungryul Baek; |
865 | Azimuth Super-Resolution for FMCW Radar in Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve the azimuth resolution of MIMO radar, we propose a light, yet efficient, Analog-to-Digital super-resolution model (ADC-SR) that predicts or hallucinates additional radar signals using signals from only a few receivers. |
Yu-Jhe Li; Shawn Hunt; Jinhyung Park; Matthew O’Toole; Kris Kitani; |
866 | PDPP: Projected Diffusion for Procedure Planning in Instructional Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the problem of procedure planning in instructional videos, which aims to make goal-directed plans given the current visual observations in unstructured real-life videos. |
Hanlin Wang; Yilu Wu; Sheng Guo; Limin Wang; |
867 | RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Today, projection-based methods leverage 2D CNNs but recent advances in computer vision show that vision transformers (ViTs) have achieved state-of-the-art results in many image-based benchmarks. In this work, we question if projection-based methods for 3D semantic segmentation can benefit from these latest improvements on ViTs. |
Angelika Ando; Spyros Gidaris; Andrei Bursuc; Gilles Puy; Alexandre Boulch; Renaud Marlet; |
868 | ProTeGe: Untrimmed Pretraining for Video Temporal Grounding By Video Temporal Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ProTeGe as the first method to perform VTG-based untrimmed pretraining to bridge the gap between trimmed pretrained backbones and downstream VTG tasks. |
Lan Wang; Gaurav Mittal; Sandra Sajeev; Ye Yu; Matthew Hall; Vishnu Naresh Boddeti; Mei Chen; |
869 | VQACL: A Novel Visual Question Answering Continual Learning Setting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we establish a novel VQA Continual Learning setting named VQACL, which contains two key components: a dual-level task sequence where visual and linguistic data are nested, and a novel composition testing containing new skill-concept combinations. |
Xi Zhang; Feifei Zhang; Changsheng Xu; |
870 | Efficient Map Sparsification Based on 2D and 3D Discretized Grids Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we formulate map sparsification in an efficient linear form and select uniformly distributed landmarks based on 2D discretized grids. |
Xiaoyu Zhang; Yun-Hui Liu; |
871 | High-Res Facial Appearance Capture From Polarized Smartphone Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel method for high-quality facial texture reconstruction from RGB images using a novel capturing routine based on a single smartphone which we equip with an inexpensive polarization foil. |
Dejan Azinović; Olivier Maury; Christophe Hery; Matthias Nießner; Justus Thies; |
872 | JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents JAWS, an optimization-driven approach that achieves the robust transfer of visual cinematic features from a reference in-the-wild video clip to a newly generated clip. |
Xi Wang; Robin Courant; Jinglei Shi; Eric Marchand; Marc Christie; |
873 | Class Attention Transfer Based Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we focus on proposing a knowledge distillation method that has both high interpretability and competitive performance. |
Ziyao Guo; Haonan Yan; Hui Li; Xiaodong Lin; |
874 | EfficientSCI: Densely Connected Network With Space-Time Factorization for Large-Scale Video Snapshot Compressive Imaging Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although recent deep learning-based state-of-the-art (SOTA) reconstruction algorithms have achieved good results in most tasks, they still face the following challenges due to excessive model complexity and GPU memory limitations: 1) these models need high computational cost, and 2) they are usually unable to reconstruct large-scale video frames at high compression ratios. To address these issues, we develop an efficient network for video SCI by using dense connections and space-time factorization mechanism within a single residual block, dubbed EfficientSCI. |
Lishun Wang; Miao Cao; Xin Yuan; |
875 | Exploring Incompatible Knowledge Transfer in Few-Shot Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate an underexplored issue in FSIG, dubbed as incompatible knowledge transfer, which would significantly degrade the realisticness of synthetic samples. |
Yunqing Zhao; Chao Du; Milad Abdollahzadeh; Tianyu Pang; Min Lin; Shuicheng Yan; Ngai-Man Cheung; |
876 | Temporally Consistent Online Depth Estimation Using Point-Based Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we aim to estimate temporally consistent depth maps of video streams in an online setting. |
Numair Khan; Eric Penner; Douglas Lanman; Lei Xiao; |
877 | Generalizable Implicit Neural Representations Via Instance Pattern Composers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a simple yet effective framework for generalizable INRs that enables a coordinate-based MLP to represent complex data instances by modulating only a small set of weights in an early MLP layer as an instance pattern composer; the remaining MLP weights learn pattern composition rules to learn common representations across instances. |
Chiheon Kim; Doyup Lee; Saehoon Kim; Minsu Cho; Wook-Shin Han; |
878 | MotionTrack: Learning Robust Short-Term and Long-Term Motions for Multi-Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple yet effective multi-object tracker, i.e., MotionTrack, which learns robust short-term and long-term motions in a unified framework to associate trajectories from a short to long range. |
Zheng Qin; Sanping Zhou; Le Wang; Jinghai Duan; Gang Hua; Wei Tang; |
879 | 3D Registration With Maximal Cliques Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a 3D registration method with maximal cliques (MAC). |
Xiyu Zhang; Jiaqi Yang; Shikun Zhang; Yanning Zhang; |
880 | What Can Human Sketches Do for Object Detection? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, for the first time, we cultivate the expressiveness of sketches but for the fundamental vision task of object detection. |
Pinaki Nath Chowdhury; Ayan Kumar Bhunia; Aneeshan Sain; Subhadeep Koley; Tao Xiang; Yi-Zhe Song; |
881 | Identity-Preserving Talking Face Generation With Landmark and Appearance Priors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing person-generic methods have difficulty in generating realistic and lip-synced videos while preserving identity information. To tackle this problem, we propose a two-stage framework consisting of audio-to-landmark generation and landmark-to-video rendering procedures. |
Weizhi Zhong; Chaowei Fang; Yinqi Cai; Pengxu Wei; Gangming Zhao; Liang Lin; Guanbin Li; |
882 | All-in-One Image Restoration for Unknown Degradations Using Adaptive Discriminative Filters for Specific Degradations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, inspired by the filter attribution integrated gradients (FAIG), we propose an adaptive discriminative filter-based model for specific degradations (ADMS) to restore images with unknown degradations. |
Dongwon Park; Byung Hyun Lee; Se Young Chun; |
883 | Weakly Supervised Segmentation With Point Annotations for Histopathology Images Via Contrast-Based Variational Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a contrast-based variational model to generate segmentation results, which serve as reliable complementary supervision to train a deep segmentation model for histopathology images. |
Hongrun Zhang; Liam Burrows; Yanda Meng; Declan Sculthorpe; Abhik Mukherjee; Sarah E. Coupland; Ke Chen; Yalin Zheng; |
884 | Efficient RGB-T Tracking Via Cross-Modality Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, a compact RGB-T tracker may be computationally efficient but encounter non-negligible performance degradation, due to the weakening of feature representation ability. To remedy this situation, a cross-modality distillation framework is presented to bridge the performance gap between a compact tracker and a powerful tracker. |
Tianlu Zhang; Hongyuan Guo; Qiang Jiao; Qiang Zhang; Jungong Han; |
885 | MetaPortrait: Identity-Preserving Talking Head Generation With Fast Personalized Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an ID-preserving talking head generation framework, which advances previous methods in two aspects. |
Bowen Zhang; Chenyang Qi; Pan Zhang; Bo Zhang; HsiangTao Wu; Dong Chen; Qifeng Chen; Yong Wang; Fang Wen; |
886 | UniHCP: A Unified Model for Human-Centric Perceptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, few works have attempted to exploit such homogeneity and design a general-purpose model for human-centric tasks. In this work, we revisit a broad range of human-centric tasks and unify them in a minimalist manner. |
Yuanzheng Ci; Yizhou Wang; Meilin Chen; Shixiang Tang; Lei Bai; Feng Zhu; Rui Zhao; Fengwei Yu; Donglian Qi; Wanli Ouyang; |
887 | Passive Micron-Scale Time-of-Flight With Sunlight Interferometry Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an interferometric technique for passive time-of-flight imaging and depth sensing at micrometer axial resolutions. |
Alankar Kotwal; Anat Levin; Ioannis Gkioulekas; |
888 | VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we instead propose VoxelNeXt for fully sparse 3D object detection. |
Yukang Chen; Jianhui Liu; Xiangyu Zhang; Xiaojuan Qi; Jiaya Jia; |
889 | Behavioral Analysis of Vision-and-Language Navigation Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop a methodology to study agent behavior on a skill-specific basis — examining how well existing agents ground instructions about stopping, turning, and moving towards specified objects or rooms. |
Zijiao Yang; Arjun Majumdar; Stefan Lee; |
890 | Zero-Shot Generative Model Adaptation Via Image-Specific Prompt Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A key reason is that a fixed adaptation direction is applied for all cross-domain image pairs, which leads to identical supervision signals. To address this issue, we propose an Image-specific Prompt Learning (IPL) method, which learns specific prompt vectors for each source-domain image. |
Jiayi Guo; Chaofei Wang; You Wu; Eric Zhang; Kai Wang; Xingqian Xu; Humphrey Shi; Gao Huang; Shiji Song; |
891 | CelebV-Text: A Large-Scale Facial Text-Video Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents CelebV-Text, a large-scale, diverse, and high-quality dataset of facial text-video pairs, to facilitate research on facial text-to-video generation tasks. |
Jianhui Yu; Hao Zhu; Liming Jiang; Chen Change Loy; Weidong Cai; Wayne Wu; |
892 | Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is not well-understood. In this work, we systematically investigate and characterize this phenomenon in Convolutional Neural Networks for computer vision. |
Eugenia Iofinova; Alexandra Peste; Dan Alistarh; |
893 | AttentionShift: Iteratively Estimated Part-Based Attention Map for Pointly Supervised Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose an AttentionShift method, to solve the semantic bias issue by iteratively decomposing the instance attention map to parts and estimating fine-grained semantics of each part. |
Mingxiang Liao; Zonghao Guo; Yuze Wang; Peng Yuan; Bailan Feng; Fang Wan; |
894 | Unsupervised Volumetric Animation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects. |
Aliaksandr Siarohin; Willi Menapace; Ivan Skorokhodov; Kyle Olszewski; Jian Ren; Hsin-Ying Lee; Menglei Chai; Sergey Tulyakov; |
895 | Hard Patches Mining for Masked Image Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose Hard Patches Mining (HPM), a brand-new framework for MIM pre-training. |
Haochen Wang; Kaiyou Song; Junsong Fan; Yuxi Wang; Jin Xie; Zhaoxiang Zhang; |
896 | PlaneDepth: Self-Supervised Depth Estimation Via Orthogonal Planes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose PlaneDepth, a novel orthogonal-planes-based representation, including vertical planes and ground planes. |
Ruoyu Wang; Zehao Yu; Shenghua Gao; |
897 | Diffusion-SDF: Text-To-Shape Via Voxelized Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new generative 3D modeling framework called Diffusion-SDF for the challenging task of text-to-shape synthesis. |
Muheng Li; Yueqi Duan; Jie Zhou; Jiwen Lu; |
898 | Compositor: Bottom-Up Clustering and Compositing for Robust Part and Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a robust approach for joint part and object segmentation. |
Ju He; Jieneng Chen; Ming-Xian Lin; Qihang Yu; Alan L. Yuille; |
899 | Semantic-Conditional Diffusion Networks for Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we break the deeply rooted conventions in learning Transformer-based encoder-decoder, and propose a new diffusion model based paradigm tailored for image captioning, namely Semantic-Conditional Diffusion Networks (SCD-Net). |
Jianjie Luo; Yehao Li; Yingwei Pan; Ting Yao; Jianlin Feng; Hongyang Chao; Tao Mei; |
900 | Unite and Conquer: Plug & Play Multi-Modal Synthesis Using Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, existing methods need retraining using paired data across all modalities to introduce a new condition. This paper proposes a solution to this problem based on denoising diffusion probabilistic models (DDPMs). |
Nithin Gopalakrishnan Nair; Wele Gedara Chaminda Bandara; Vishal M. Patel; |
901 | TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning With Structure-Trajectory Prompted Reconstruction for Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a generic Transformer-based Skeleton Graph prototype contrastive learning (TranSG) approach with structure-trajectory prompted reconstruction to fully capture skeletal relations and valuable spatial-temporal semantics from skeleton graphs for person re-ID. |
Haocong Rao; Chunyan Miao; |
902 | All Are Worth Words: A ViT Backbone for Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We design a simple and general ViT-based architecture (named U-ViT) for image generation with diffusion models. |
Fan Bao; Shen Nie; Kaiwen Xue; Yue Cao; Chongxuan Li; Hang Su; Jun Zhu; |
903 | ZBS: Zero-Shot Background Subtraction Via Instance-Level Background Modeling and Foreground Selection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an unsupervised BGS algorithm based on zero-shot object detection called Zero-shot Background Subtraction (ZBS). |
Yongqi An; Xu Zhao; Tao Yu; Haiyun Guo; Chaoyang Zhao; Ming Tang; Jinqiao Wang; |
904 | MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it is difficult to create a replica of an object in reality, and even 3D reconstructions generated by 3D scanners have artefacts that cause biases in evaluation. To address this issue, we introduce a novel multi-view RGBD dataset captured using a mobile device, which includes highly precise 3D ground-truth annotations for 153 object models featuring a diverse set of 3D structures. |
Kejie Li; Jia-Wang Bian; Robert Castle; Philip H.S. Torr; Victor Adrian Prisacariu; |
905 | GKEAL: Gaussian Kernel Embedded Analytic Learning for Few-Shot Class Incremental Task Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we approach the FSCIL by adopting analytic learning, a technique that converts network training into linear problems. |
Huiping Zhuang; Zhenyu Weng; Run He; Zhiping Lin; Ziqian Zeng; |
906 | SteerNeRF: Accelerating NeRF Rendering Via Smooth Viewpoint Trajectory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To push the frontier of the efficiency-memory trade-off, we explore a new perspective to accelerate NeRF rendering, leveraging a key fact that the viewpoint change is usually smooth and continuous in interactive viewpoint control. |
Sicheng Li; Hao Li; Yue Wang; Yiyi Liao; Lu Yu; |
907 | Active Exploration of Multimodal Complementarity for Few-Shot Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Active Multimodal Few-shot Action Recognition (AMFAR) framework, which can actively find the reliable modality for each sample based on task-dependent context information to improve few-shot reasoning procedure. |
Yuyang Wanyan; Xiaoshan Yang; Chaofan Chen; Changsheng Xu; |
908 | Magic3D: High-Resolution Text-to-3D Content Creation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the method has two inherent limitations: 1) optimization of the NeRF representation is extremely slow, 2) NeRF is supervised by images at a low resolution (64×64), thus leading to low-quality 3D models with a long wait time. In this paper, we address these limitations by utilizing a two-stage coarse-to-fine optimization framework. |
Chen-Hsuan Lin; Jun Gao; Luming Tang; Towaki Takikawa; Xiaohui Zeng; Xun Huang; Karsten Kreis; Sanja Fidler; Ming-Yu Liu; Tsung-Yi Lin; |
909 | Boundary-Aware Backward-Compatible Representation Via Adversarial Learning in Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce AdvBCT, an Adversarial Backward-Compatible Training method with an elastic boundary constraint that takes both compatibility and discrimination into consideration. |
Tan Pan; Furong Xu; Xudong Yang; Sifeng He; Chen Jiang; Qingpei Guo; Feng Qian; Xiaobo Zhang; Yuan Cheng; Lei Yang; Wei Chu; |
910 | Spatial-Frequency Mutual Learning for Face Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Relying on the Fourier transform, we devise a spatial-frequency mutual network (SFMNet) for FSR, which is the first FSR method to explore the correlations between spatial and frequency domains as far as we know. |
Chenyang Wang; Junjun Jiang; Zhiwei Zhong; Xianming Liu; |
911 | Sketch2Saliency: Learning To Detect Salient Objects From Human Drawings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel method that emphasises how a "salient object" could be explained by hand-drawn sketches. |
Ayan Kumar Bhunia; Subhadeep Koley; Amandeep Kumar; Aneeshan Sain; Pinaki Nath Chowdhury; Tao Xiang; Yi-Zhe Song; |
912 | Efficient Frequency Domain-Based Transformers for High-Quality Image Deblurring Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an effective and efficient method that explores the properties of Transformers in the frequency domain for high-quality image deblurring. |
Lingshun Kong; Jiangxin Dong; Jianjun Ge; Mingqiang Li; Jinshan Pan; |
913 | Distilling Focal Knowledge From Imperfect Expert for 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate how to distill the knowledge from an imperfect expert. |
Jia Zeng; Li Chen; Hanming Deng; Lewei Lu; Junchi Yan; Yu Qiao; Hongyang Li; |
914 | ULIP: Learning A Unified Representation of Language, Images, and Point Clouds for 3D Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by this, leveraging multimodal information for 3D modality could be promising to improve 3D understanding under the restricted data regime, but this line of research is not well studied. Therefore, we introduce ULIP to learn a unified representation of images, language, and 3D point clouds by pre-training with object triplets from the three modalities. |
Mingfei Gao; Chen Xing; Roberto Martín-Martín; Jiajun Wu; Caiming Xiong; Le Xue; Ran Xu; Juan Carlos Niebles; Silvio Savarese; |
915 | Being Comes From Not-Being: Open-Vocabulary Text-to-Motion Generation With Wordless Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate offline open-vocabulary text-to-motion generation in a zero-shot learning manner that neither requires paired training data nor extra online optimization to adapt for unseen texts. |
Junfan Lin; Jianlong Chang; Lingbo Liu; Guanbin Li; Liang Lin; Qi Tian; Chang-Wen Chen; |
916 | Deep Learning of Partial Graph Matching Via Differentiable Top-K Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to formulate the partial GM problem as the top-k selection task with a given/estimated number of inliers k. Specifically, we devise a differentiable top-k module that enables effective gradient descent over the optimal-transport layer, which can be readily plugged into SOTA deep GM pipelines including the quadratic matching network NGMv2 as well as the linear matching network GCAN. |
Runzhong Wang; Ziao Guo; Shaofei Jiang; Xiaokang Yang; Junchi Yan; |
917 | Super-CLEVR: A Virtual Benchmark To Diagnose Domain Robustness in Visual Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Due to the multi-modal nature of this task, multiple factors of variation are intertwined, making generalization difficult to analyze. This motivates us to introduce a virtual benchmark, Super-CLEVR, where different factors in VQA domain shifts can be isolated so that their effects can be studied independently. |
Zhuowan Li; Xingrui Wang; Elias Stengel-Eskin; Adam Kortylewski; Wufei Ma; Benjamin Van Durme; Alan L. Yuille; |
918 | MonoHuman: Animatable Human Neural Field From Monocular Video Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework MonoHuman, which robustly renders view-consistent and high-fidelity avatars under arbitrary novel poses. |
Zhengming Yu; Wei Cheng; Xian Liu; Wayne Wu; Kwan-Yee Lin; |
919 | Sliced Optimal Partial Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an efficient algorithm for calculating the OPT problem between two non-negative measures in one dimension. |
Yikun Bai; Bernhard Schmitzer; Matthew Thorpe; Soheil Kolouri; |
920 | Siamese DETR Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Siamese DETR, a Siamese self-supervised pretraining approach for the Transformer architecture in DETR. |
Gengshi Huang; Wei Li; Jianing Teng; Kun Wang; Zeren Chen; Jing Shao; Chen Change Loy; Lu Sheng; |
921 | SINE: Semantic-Driven Image-Based NeRF Editing With Prior-Guided Editing Field Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel semantic-driven NeRF editing approach, which enables users to edit a neural radiance field with a single image, and faithfully delivers edited novel views with high fidelity and multi-view consistency. |
Chong Bao; Yinda Zhang; Bangbang Yang; Tianxing Fan; Zesong Yang; Hujun Bao; Guofeng Zhang; Zhaopeng Cui; |
922 | Turning Strengths Into Weaknesses: A Certified Robustness Inspired Attack Framework Against Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We apply our attack framework to the existing attacks and results show it can significantly enhance the existing attacks’ performance. |
Binghui Wang; Meng Pang; Yun Dong; |
923 | Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network By Adversarial Instrumental Variable Regression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a way of delving into the unexpected vulnerability in adversarially trained networks from a causal perspective, namely adversarial instrumental variable (IV) regression. |
Junho Kim; Byung-Kwan Lee; Yong Man Ro; |
924 | NVTC: Nonlinear Vector Transform Coding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first investigate some toy sources, demonstrating that even if modern neural networks considerably enhance the compression performance of SQ with nonlinear transform, there is still an insurmountable chasm between SQ and VQ. Therefore, revolving around VQ, we propose a novel framework for neural image compression named Nonlinear Vector Transform Coding (NVTC). |
Runsen Feng; Zongyu Guo; Weiping Li; Zhibo Chen; |
925 | B-Spline Texture Coefficients Estimator for Screen Content Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Therefore, we need to properly handle the edges and textures to minimize information distortion of the contents when a display device’s resolution differs from SCIs. To achieve this goal, we propose an implicit neural representation using B-splines for screen content image super-resolution (SCI SR) with arbitrary scales. |
Byeonghyun Pak; Jaewon Lee; Kyong Hwan Jin; |
926 | MetaCLUE: Towards Comprehensive Visual Metaphors Research Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While computer vision benchmarks and approaches predominantly focus on understanding and generating literal interpretations of images, metaphorical comprehension of images remains relatively unexplored. To fill this gap, we introduce MetaCLUE, a set of vision tasks on visual metaphor. |
Arjun R. Akula; Brendan Driscoll; Pradyumna Narayana; Soravit Changpinyo; Zhiwei Jia; Suyash Damle; Garima Pruthi; Sugato Basu; Leonidas Guibas; William T. Freeman; Yuanzhen Li; Varun Jampani; |
927 | Towards End-to-End Generative Modeling of Long Videos With Memory-Efficient Bidirectional Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Memory-efficient Bidirectional Transformer (MeBT) for end-to-end learning of long-term dependency in videos and fast inference. |
Jaehoon Yoo; Semin Kim; Doyup Lee; Chiheon Kim; Seunghoon Hong; |
928 | Domain Expansion of Image Generators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new task — domain expansion — to address this. |
Yotam Nitzan; Michaël Gharbi; Richard Zhang; Taesung Park; Jun-Yan Zhu; Daniel Cohen-Or; Eli Shechtman; |
929 | On The Effectiveness of Partial Variance Reduction in Federated Learning With Heterogeneous Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We observe that while the feature extraction layers are learned efficiently by FedAvg, the substantial diversity of the final classification layers across clients impedes the performance. Motivated by this, we propose to correct model drift by variance reduction only on the final layers. |
Bo Li; Mikkel N. Schmidt; Tommy S. Alstrøm; Sebastian U. Stich; |
930 | Point Cloud Forecasting As A Proxy for 4D Occupancy Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: One promising self-supervised task is 3D point cloud forecasting from unannotated LiDAR sequences. We show that this task requires algorithms to implicitly capture (1) sensor extrinsics (i.e., the egomotion of the autonomous vehicle), (2) sensor intrinsics (i.e., the sampling pattern specific to the particular LiDAR sensor), and (3) the shape and motion of other objects in the scene. |
Tarasha Khurana; Peiyun Hu; David Held; Deva Ramanan; |
931 | Masked Representation Learning for Domain Generalized Stereo Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by masked representation learning and multi-task learning, this paper designs a simple and effective masked representation for domain generalized stereo matching. |
Zhibo Rao; Bangshu Xiong; Mingyi He; Yuchao Dai; Renjie He; Zhelun Shen; Xing Li; |
932 | LVQAC: Lattice Vector Quantization Coupled With Spatially Adaptive Companding for Efficient Learned Image Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel Lattice Vector Quantization scheme coupled with a spatially Adaptive Companding (LVQAC) mapping. |
Xi Zhang; Xiaolin Wu; |
933 | You Can Ground Earlier Than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we pose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input. |
Xiang Fang; Daizong Liu; Pan Zhou; Guoshun Nan; |
934 | EqMotion: Equivariant Multi-Agent Motion Prediction With Invariant Interaction Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, such equivariance and invariance properties are overlooked by most existing methods. To fill this gap, we propose EqMotion, an efficient equivariant motion prediction model with invariant interaction reasoning. |
Chenxin Xu; Robby T. Tan; Yuhong Tan; Siheng Chen; Yu Guang Wang; Xinchao Wang; Yanfeng Wang; |
935 | Fine-Grained Face Swapping Via Regional GAN Inversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We rethink face swapping from the perspective of fine-grained face editing, i.e., editing for swapping (E4S), and propose a framework that is based on the explicit disentanglement of the shape and texture of facial components. |
Zhian Liu; Maomao Li; Yong Zhang; Cairong Wang; Qi Zhang; Jue Wang; Yongwei Nie; |
936 | Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel diffusion-based framework, named Diffusion Co-Speech Gesture (DiffGesture), to effectively capture the cross-modal audio-to-gesture associations and preserve temporal coherence for high-fidelity audio-driven co-speech gesture generation. |
Lingting Zhu; Xian Liu; Xuanyu Liu; Rui Qian; Ziwei Liu; Lequan Yu; |
937 | FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by recent success of masked autoencoding (MAE) pretraining in unleashing transformers’ capacity of encoding visual representation, we propose Masked Cost Volume Autoencoding (MCVA) to enhance FlowFormer by pretraining the cost-volume encoder with a novel MAE scheme. |
Xiaoyu Shi; Zhaoyang Huang; Dasong Li; Manyuan Zhang; Ka Chun Cheung; Simon See; Hongwei Qin; Jifeng Dai; Hongsheng Li; |
938 | NeRFLix: High-Quality Neural View Synthesis By Learning A Degradation-Driven Inter-Viewpoint MiXer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve the synthesis quality of NeRF-based approaches, we propose NeRFLiX, a general NeRF-agnostic restorer paradigm by learning a degradation-driven inter-viewpoint mixer. |
Kun Zhou; Wenbo Li; Yi Wang; Tao Hu; Nianjuan Jiang; Xiaoguang Han; Jiangbo Lu; |
939 | HaLP: Hallucinating Latent Positives for Skeleton-Based Self-Supervised Learning of Actions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new contrastive learning approach to train models for skeleton-based action recognition without labels. |
Anshul Shah; Aniket Roy; Ketul Shah; Shlok Mishra; David Jacobs; Anoop Cherian; Rama Chellappa; |
940 | STMixer: A One-Stage Sparse Action Detector Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new one-stage sparse action detector, termed STMixer. |
Tao Wu; Mengqi Cao; Ziteng Gao; Gangshan Wu; Limin Wang; |
941 | 3D Human Keypoints Estimation From Point Clouds in The Wild Without Human Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose GC-KPL – Geometry Consistency inspired Key Point Learning, an approach for learning 3D human joint locations from point clouds without human labels. |
Zhenzhen Weng; Alexander S. Gorban; Jingwei Ji; Mahyar Najibi; Yin Zhou; Dragomir Anguelov; |
942 | Where Is My Spot? Few-Shot Image Generation Via Latent Subspace Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Image generation relies on massive training data that can hardly produce diverse images of an unseen category according to a few examples. In this paper, we address this dilemma by projecting sparse few-shot samples into a continuous latent space that can potentially generate infinite unseen samples. |
Chenxi Zheng; Bangzhen Liu; Huaidong Zhang; Xuemiao Xu; Shengfeng He; |
943 | FLEX: Full-Body Grasping Without Full-Body Grasps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, 1) these methods do not generalize to different object positions and orientations or to the presence of furniture in the scene, and 2) the diversity of their generated full-body poses is very limited. In this work, we address all the above challenges to generate realistic, diverse full-body grasps in everyday scenes without requiring any 3D full-body grasping data. |
Purva Tendulkar; Dídac Surís; Carl Vondrick; |
944 | Genie: Show Me The Data for Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We thus introduce a post-training quantization scheme for zero-shot quantization that produces high-quality quantized networks within a few hours. |
Yongkweon Jeon; Chungman Lee; Ho-young Kim; |
945 | EVA: Exploring The Limits of Masked Visual Representation Learning at Scale Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data. |
Yuxin Fang; Wen Wang; Binhui Xie; Quan Sun; Ledell Wu; Xinggang Wang; Tiejun Huang; Xinlong Wang; Yue Cao; |
946 | TopNet: Transformer-Based Object Placement Network for Image Compositing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to learn the correlation between object features and all local background features with a transformer module so that detailed information can be provided on all possible location/scale configurations. |
Sijie Zhu; Zhe Lin; Scott Cohen; Jason Kuen; Zhifei Zhang; Chen Chen; |
947 | Discrete Point-Wise Attack Is Not Enough: Generalized Manifold Adversarial Attack for Face Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, such a point-wise attack paradigm exhibits poor generalization against numerous unknown states of identity and can be easily defended. In this paper, by rethinking the inherent relationship between the face of the target identity and its variants, we introduce a new pipeline of Generalized Manifold Adversarial Attack (GMAA) to achieve better attack performance by expanding the attack range. |
Qian Li; Yuxiao Hu; Ye Liu; Dongxiao Zhang; Xin Jin; Yuntian Chen; |
948 | Gloss Attention for Gloss-Free Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Most sign language translation (SLT) methods to date require the use of gloss annotations to provide additional supervision information; however, the acquisition of gloss is not easy. To solve this problem, we first perform an analysis of existing models to confirm how gloss annotations make SLT easier. |
Aoxiong Yin; Tianyun Zhong; Li Tang; Weike Jin; Tao Jin; Zhou Zhao; |
949 | Multi-Agent Automated Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose multi-agent automated machine learning (MA2ML) with the aim to effectively handle joint optimization of modules in automated machine learning (AutoML). |
Zhaozhi Wang; Kefan Su; Jian Zhang; Huizhu Jia; Qixiang Ye; Xiaodong Xie; Zongqing Lu; |
950 | Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation From Image Sequence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we tackle the problem of online camera-to-robot pose estimation from single-view successive frames of an image sequence, a crucial task for robots to interact with the world. |
Yang Tian; Jiyao Zhang; Zekai Yin; Hao Dong; |
951 | FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Fairness Domain Adaptation (FREDOM) approach to semantic scene segmentation. |
Thanh-Dat Truong; Ngan Le; Bhiksha Raj; Jackson Cothren; Khoa Luu; |
952 | IMP: Iterative Matching and Pose Estimation With Adaptive Pooling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, we propose an iterative matching and pose estimation framework (IMP) leveraging the geometric connections between the two tasks: a few good matches are enough for a roughly accurate pose estimation; a roughly accurate pose can be used to guide the matching by providing geometric constraints. |
Fei Xue; Ignas Budvytis; Roberto Cipolla; |
953 | HRDFuse: Monocular 360deg Depth Estimation By Collaboratively Learning Holistic-With-Regional Depth Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework, HRDFuse, that subtly combines the potential of convolutional neural networks (CNNs) and transformers by collaboratively learning the holistic contextual information from the ERP and the regional structural information from the TP. |
Hao Ai; Zidong Cao; Yan-Pei Cao; Ying Shan; Lin Wang; |
954 | Revisiting Rolling Shutter Bundle Adjustment: Toward Accurate and Fast Solution Highlight: We propose a robust and fast bundle adjustment solution that estimates the 6-DoF pose of the camera and the geometry of the environment based on measurements from a rolling shutter (RS) camera. |
Bangyan Liao; Delin Qu; Yifei Xue; Huiqing Zhang; Yizhen Lao; |
955 | StructVPR: Distill Structural Knowledge With Weighting Samples for Visual Place Recognition Highlight: In this paper, we propose StructVPR, a novel training architecture for VPR, to enhance structural knowledge in RGB global features and thus improve feature stability in a constantly changing environment. |
Yanqing Shen; Sanping Zhou; Jingwen Fu; Ruotong Wang; Shitao Chen; Nanning Zheng; |
956 | PATS: Patch Area Transportation With Subdivision for Local Feature Matching Highlight: Recently, detector-free methods present generally better performance but are not satisfactory in image pairs with large scale differences. In this paper, we propose Patch Area Transportation with Subdivision (PATS) to tackle this issue. |
Junjie Ni; Yijin Li; Zhaoyang Huang; Hongsheng Li; Hujun Bao; Zhaopeng Cui; Guofeng Zhang; |
957 | Learning Human-to-Robot Handovers From Point Clouds Highlight: We propose the first framework to learn control policies for vision-based human-to-robot handovers, a critical task for human-robot interaction. |
Sammy Christen; Wei Yang; Claudia Pérez-D’Arpino; Otmar Hilliges; Dieter Fox; Yu-Wei Chao; |
958 | MEDIC: Remove Model Backdoors Via Importance Driven Cloning Highlight: We develop a novel method to remove injected backdoors in deep learning models. |
Qiuling Xu; Guanhong Tao; Jean Honorio; Yingqi Liu; Shengwei An; Guangyu Shen; Siyuan Cheng; Xiangyu Zhang; |
959 | Context-Aware Relative Object Queries To Unify Video Instance and Panoptic Segmentation Highlight: As one answer to both questions, we propose ‘context-aware relative object queries’, which are continuously propagated frame-by-frame. |
Anwesa Choudhuri; Girish Chowdhary; Alexander G. Schwing; |
960 | Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation Highlight: We propose to apply the chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field. |
Haochen Wang; Xiaodan Du; Jiahao Li; Raymond A. Yeh; Greg Shakhnarovich; |
961 | Role of Transients in Two-Bounce Non-Line-of-Sight Imaging Highlight: In this work, we study the role of time-of-flight (ToF) measurements, i.e. transients, in 2B-NLOS under multiplexed illumination. |
Siddharth Somasundaram; Akshat Dave; Connor Henley; Ashok Veeraraghavan; Ramesh Raskar; |
962 | SimpleNet: A Simple Network for Image Anomaly Detection and Localization Highlight: We propose a simple and application-friendly network (called SimpleNet) for detecting and localizing anomalies. |
Zhikang Liu; Yiming Zhou; Yuansheng Xu; Zilei Wang; |
963 | Elastic Aggregation for Federated Optimization Highlight: However, naive aggregation suffers from client-drift when the data is heterogeneous (non-IID), leading to unstable and slow convergence. In this work, we propose a novel aggregation approach, elastic aggregation, to overcome these issues. |
Dengsheng Chen; Jie Hu; Vince Junkai Tan; Xiaoming Wei; Enhua Wu; |
964 | G-MSM: Unsupervised Multi-Shape Matching With Graph-Based Affinity Priors Highlight: We present G-MSM (Graph-based Multi-Shape Matching), a novel unsupervised learning approach for non-rigid shape correspondence. |
Marvin Eisenberger; Aysim Toker; Laura Leal-Taixé; Daniel Cremers; |
965 | Enhancing Deformable Local Features By Jointly Learning To Detect and Describe Keypoints Highlight: We propose DALF (Deformation-Aware Local Features), a novel deformation-aware network for jointly detecting and describing keypoints, to handle the challenging problem of matching deformable surfaces. |
Guilherme Potje; Felipe Cadar; André Araujo; Renato Martins; Erickson R. Nascimento; |
966 | ObjectMatch: Robust Registration Using Canonical Object Correspondences Highlight: In this work, we propose to leverage indirect correspondences obtained via semantic object identification. |
Can Gümeli; Angela Dai; Matthias Nießner; |
967 | Siamese Image Modeling for Self-Supervised Vision Representation Learning Highlight: Two main-stream SSL frameworks have been proposed, i.e., Instance Discrimination (ID) and Masked Image Modeling (MIM). |
Chenxin Tao; Xizhou Zhu; Weijie Su; Gao Huang; Bin Li; Jie Zhou; Yu Qiao; Xiaogang Wang; Jifeng Dai; |
968 | Generating Part-Aware Editable 3D Shapes Without 3D Supervision Highlight: In this work, we devise PartNeRF, a novel part-aware generative model for editable 3D shape synthesis that does not require any explicit 3D supervision. |
Konstantinos Tertikas; Despoina Paschalidou; Boxiao Pan; Jeong Joon Park; Mikaela Angelina Uy; Ioannis Emiris; Yannis Avrithis; Leonidas Guibas; |
969 | Center Focusing Network for Real-Time LiDAR Panoptic Segmentation Highlight: To achieve accurate and real-time LiDAR panoptic segmentation, a novel center focusing network (CFNet) is introduced. |
Xiaoyan Li; Gang Zhang; Boyue Wang; Yongli Hu; Baocai Yin; |
970 | High-Fidelity Facial Avatar Reconstruction From Monocular Video With Generative Priors Highlight: In this work, we propose a new method for NeRF-based facial avatar reconstruction that utilizes a 3D-aware generative prior. |
Yunpeng Bai; Yanbo Fan; Xuan Wang; Yong Zhang; Jingxiang Sun; Chun Yuan; Ying Shan; |
971 | Mixed Autoencoder for Self-Supervised Visual Representation Learning Highlight: To address this, we propose homologous recognition, an auxiliary pretext task, not only to alleviate the MI increase by explicitly requiring each patch to recognize homologous patches, but also to perform object-aware self-supervised pre-training for better downstream dense perception performance. |
Kai Chen; Zhili Liu; Lanqing Hong; Hang Xu; Zhenguo Li; Dit-Yan Yeung; |
972 | Restoration of Hand-Drawn Architectural Drawings Using Latent Space Mapping With Degradation Generator Highlight: This work presents the restoration of drawings of wooden built heritage. |
Nakkwan Choi; Seungjae Lee; Yongsik Lee; Seungjoon Yang; |
973 | CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network With Large Input Highlight: In this paper, we propose a novel method named Content-Aware Bit Mapping (CABM), which can remove the bit selector without any performance loss. |
Senmao Tian; Ming Lu; Jiaming Liu; Yandong Guo; Yurong Chen; Shunli Zhang; |
974 | Decoupling MaxLogit for Out-of-Distribution Detection Highlight: To provide a new viewpoint to study the logit-based scoring function, we reformulate the logit into cosine similarity and logit norm and propose to use MaxCosine and MaxNorm. |
Zihan Zhang; Xiang Xiang; |
975 | ProphNet: Efficient Agent-Centric Motion Forecasting With Anchor-Informed Proposals Highlight: Due to the heterogeneous nature of multi-sourced input, multimodality in agent behavior, and low latency required by onboard deployment, this task is notoriously challenging. To cope with these difficulties, this paper proposes a novel agent-centric model with anchor-informed proposals for efficient multimodal motion forecasting. |
Xishun Wang; Tong Su; Fang Da; Xiaodong Yang; |
976 | Generalizing Dataset Distillation Via Deep Generative Prior Highlight: To overcome the above issues, we propose to use the learned prior from pre-trained deep generative models to synthesize the distilled data. To achieve this, we present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model’s latent space. |
George Cazenavette; Tongzhou Wang; Antonio Torralba; Alexei A. Efros; Jun-Yan Zhu; |
977 | Few-Shot Class-Incremental Learning Via Class-Aware Bilateral Distillation Highlight: To adapt the powerful distillation technique for FSCIL, we propose a novel distillation structure, by taking the unique challenge of overfitting into account. |
Linglan Zhao; Jing Lu; Yunlu Xu; Zhanzhan Cheng; Dashan Guo; Yi Niu; Xiangzhong Fang; |
978 | Adaptive Patch Deformation for Textureless-Resilient Multi-View Stereo Highlight: To detect more anchor pixels to ensure better adaptive patch deformation, we propose to evaluate the matching ambiguity of a certain pixel by checking the convergence of the estimated depth as optimization proceeds. |
Yuesong Wang; Zhaojie Zeng; Tao Guan; Wei Yang; Zhuo Chen; Wenkai Liu; Luoyuan Xu; Yawei Luo; |
979 | Detection of Out-of-Distribution Samples Using Binary Neuron Activation Patterns Highlight: In this work, we introduce a novel method for OOD detection. |
Bartłomiej Olber; Krystian Radlak; Adam Popowicz; Michal Szczepankiewicz; Krystian Chachuła; |
980 | SeaThru-NeRF: Neural Radiance Fields in Scattering Media Highlight: We develop a new rendering model for NeRFs in scattering media, which is based on the SeaThru image formation model, and suggest a suitable architecture for learning both scene information and medium parameters. |
Deborah Levy; Amit Peleg; Naama Pearl; Dan Rosenbaum; Derya Akkaynak; Simon Korman; Tali Treibitz; |
981 | Learning Multi-Modal Class-Specific Tokens for Weakly Supervised Dense Object Localization Highlight: In this paper, we propose to explicitly construct multi-modal class representations by leveraging the Contrastive Language-Image Pre-training (CLIP), to guide dense localization. |
Lian Xu; Wanli Ouyang; Mohammed Bennamoun; Farid Boussaid; Dan Xu; |
982 | Learning To Dub Movies Via Hierarchical Prosody Models Highlight: V2C is more challenging than conventional text-to-speech tasks as it additionally requires the generated speech to exactly match the varying emotions and speaking speed presented in the video. Unlike previous works, we propose a novel movie dubbing architecture to tackle these problems via hierarchical prosody modeling, which bridges the visual information to corresponding speech prosody from three aspects: lip, face, and scene. |
Gaoxiang Cong; Liang Li; Yuankai Qi; Zheng-Jun Zha; Qi Wu; Wenyu Wang; Bin Jiang; Ming-Hsuan Yang; Qingming Huang; |
983 | DiffusionRig: Learning Personalized Priors for Facial Appearance Editing Highlight: We address the problem of learning person-specific facial priors from a small number (e.g., 20) of portrait photos of the same person. |
Zheng Ding; Xuaner Zhang; Zhihao Xia; Lars Jebe; Zhuowen Tu; Xiuming Zhang; |
984 | Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint Highlight: In this work, we propose to first obtain the proper latent code in foundation latent space W. |
Hongyu Liu; Yibing Song; Qifeng Chen; |
985 | MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers Highlight: In this paper, we propose Mixed and Masked AutoEncoder (MixMAE), a simple but efficient pretraining method that is applicable to various hierarchical Vision Transformers. |
Jihao Liu; Xin Huang; Jinliang Zheng; Yu Liu; Hongsheng Li; |
986 | Human Pose Estimation in Extremely Low-Light Conditions Highlight: We study human pose estimation in extremely low-light images. |
Sohyun Lee; Jaesung Rim; Boseung Jeong; Geonu Kim; Byungju Woo; Haechan Lee; Sunghyun Cho; Suha Kwak; |
987 | EventNeRF: Neural Radiance Fields From A Single Colour Event Camera Highlight: Accordingly, this paper proposes the first approach for 3D-consistent, dense and photorealistic novel view synthesis using just a single colour event stream as input. |
Viktor Rudnev; Mohamed Elgharib; Christian Theobalt; Vladislav Golyanik; |
988 | Neighborhood Attention Transformer Highlight: We present Neighborhood Attention (NA), the first efficient and scalable sliding window attention mechanism for vision. |
Ali Hassani; Steven Walton; Jiachen Li; Shen Li; Humphrey Shi; |
989 | Enlarging Instance-Specific and Class-Specific Information for Open-Set Action Recognition Highlight: In this paper, we begin with analyzing the feature representation behavior in the open-set action recognition (OSAR) problem based on the information bottleneck (IB) theory, and propose to enlarge the instance-specific (IS) and class-specific (CS) information contained in the feature for better performance. |
Jun Cen; Shiwei Zhang; Xiang Wang; Yixuan Pei; Zhiwu Qing; Yingya Zhang; Qifeng Chen; |
990 | Decoupled Semantic Prototypes Enable Learning From Diverse Annotation Types for Semi-Weakly Segmentation in Expert-Driven Domains Highlight: In this work, we analyze existing training algorithms towards their flexibility for different annotation types and scalability to small annotation regimes. |
Simon Reiß; Constantin Seibold; Alexander Freytag; Erik Rodner; Rainer Stiefelhagen; |
991 | Progressive Spatio-Temporal Alignment for Efficient Event-Based Motion Estimation Highlight: In this paper, we propose an efficient event-based motion estimation framework for various motion models. |
Xueyan Huang; Yueyi Zhang; Zhiwei Xiong; |
992 | Trap Attention: Monocular Depth Estimation With Manual Traps Highlight: In this paper, we exploit a depth-wise convolution to obtain long-range information, and propose a novel trap attention, which sets some traps on the extended space for each pixel and forms the attention mechanism from the feature retention ratio of the convolution window, so that the quadratic computational complexity can be reduced to linear form. |
Chao Ning; Hongping Gan; |
993 | Iterative Next Boundary Detection for Instance Segmentation of Tree Rings in Microscopy Images of Shrub Cross Sections Highlight: We propose a new iterative method which we term Iterative Next Boundary Detection (INBD). |
Alexander Gillert; Giulia Resente; Alba Anadon-Rosell; Martin Wilmking; Uwe Freiherr von Lukas; |
994 | Learning and Aggregating Lane Graphs for Urban Automated Driving Highlight: Moreover, merging overlapping lane graphs to obtain consistent large-scale graphs remains difficult. To overcome these challenges, we propose a novel bottom-up approach to lane graph estimation from aerial imagery that aggregates multiple overlapping graphs into a single consistent graph. |
Martin Büchner; Jannik Zürn; Ion-George Todoran; Abhinav Valada; Wolfram Burgard; |
995 | Universal Instance Perception As Object Discovery and Retrieval Highlight: In this work, we present a universal instance perception model of the next generation, termed UNINEXT. |
Bin Yan; Yi Jiang; Jiannan Wu; Dong Wang; Ping Luo; Zehuan Yuan; Huchuan Lu; |
996 | GlassesGAN: Eyewear Personalization Using Synthetic Appearance Discovery and Targeted Subspace Modeling Highlight: We present GlassesGAN, a novel image editing framework for custom design of glasses, that sets a new standard in terms of output-image quality, edit realism, and continuous multi-style edit capability. |
Richard Plesh; Peter Peer; Vitomir Struc; |
997 | Representing Volumetric Videos As Dynamic MLP Maps Highlight: This paper introduces a novel representation of volumetric videos for real-time view synthesis of dynamic scenes. |
Sida Peng; Yunzhi Yan; Qing Shuai; Hujun Bao; Xiaowei Zhou; |
998 | Deep Hashing With Minimal-Distance-Separated Hash Centers Highlight: This paper presents an optimization method that finds hash centers with a constraint on the minimal distance between any pair of hash centers, which is non-trivial due to the non-convex nature of the problem. |
Liangdao Wang; Yan Pan; Cong Liu; Hanjiang Lai; Jian Yin; Ye Liu; |
999 | Video-Text As Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning Highlight: In this paper, we creatively model video-text as game players with multivariate cooperative game theory to wisely handle the uncertainty during fine-grained semantic interaction with diverse granularity, flexible combination, and vague intensity. |
Peng Jin; Jinfa Huang; Pengfei Xiong; Shangxuan Tian; Chang Liu; Xiangyang Ji; Li Yuan; Jie Chen; |
1000 | VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud Highlight: Since 2D images provide rich semantics and scene graphs are naturally coupled with language, in this study, we propose a Visual-Linguistic Semantics Assisted Training (VL-SAT) scheme that can significantly empower 3DSSG prediction models with discrimination about long-tailed and ambiguous semantic relations. |
Ziqin Wang; Bowen Cheng; Lichen Zhao; Dong Xu; Yang Tang; Lu Sheng; |
1001 | Learning Emotion Representations From Verbal and Nonverbal Communication Highlight: We present EmotionCLIP, the first pre-training paradigm to extract visual emotion representations from verbal and nonverbal communication using only uncurated data. |
Sitao Zhang; Yimu Pan; James Z. Wang; |
1002 | Transferable Adversarial Attacks on Vision Transformers With Token Gradient Regularization Highlight: To overcome the shortcomings of existing approaches, we propose the Token Gradient Regularization (TGR) method. |
Jianping Zhang; Yizhan Huang; Weibin Wu; Michael R. Lyu; |
1003 | MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation Highlight: We propose a novel mutual correction framework (MCF) to explore network bias correction and improve the performance of SSMIS. |
Yongchao Wang; Bin Xiao; Xiuli Bi; Weisheng Li; Xinbo Gao; |
1004 | Blur Interpolation Transformer for Real-World Motion From Blur Highlight: To this end, we propose a blur interpolation transformer (BiT) to effectively unravel the underlying temporal correlation encoded in blur. |
Zhihang Zhong; Mingdeng Cao; Xiang Ji; Yinqiang Zheng; Imari Sato; |
1005 | Rethinking Few-Shot Medical Segmentation: A Vector Quantization View Highlight: Motivated by the observation, we propose a learning VQ mechanism consisting of grid-format VQ (GFVQ), self-organized VQ (SOVQ) and residual oriented VQ (ROVQ). |
Shiqi Huang; Tingfa Xu; Ning Shen; Feng Mu; Jianan Li; |
1006 | Event-Based Shape From Polarization Highlight: In the real world, we observe, however, that the challenging conditions (i.e., when few events are generated) harm the performance of physics-based solutions. To overcome this, we propose a learning-based approach that learns to estimate surface normals even at low event-rates, improving the physics-based approach by 52% on the real world dataset. |
Manasi Muglikar; Leonard Bauersfeld; Diederik Paul Moeys; Davide Scaramuzza; |
1007 | Architectural Backdoors in Neural Networks Highlight: In this paper, we introduce a new class of backdoor attacks that hide inside model architectures, i.e., in the inductive bias of the functions used to train. |
Mikel Bober-Irizar; Ilia Shumailov; Yiren Zhao; Robert Mullins; Nicolas Papernot; |
1008 | ARO-Net: Learning Implicit Fields From Anchored Radial Observations Highlight: The main idea behind our work is to reason about shapes through partial observations from a set of viewpoints, called anchors. |
Yizhi Wang; Zeyu Huang; Ariel Shamir; Hui Huang; Hao Zhang; Ruizhen Hu; |
1009 | All in One: Exploring Unified Video-Language Pre-Training Highlight: In this work, we for the first time introduce an end-to-end video-language model, namely all-in-one Transformer, that embeds raw video and textual signals into joint representations using a unified backbone architecture. |
Jinpeng Wang; Yixiao Ge; Rui Yan; Yuying Ge; Kevin Qinghong Lin; Satoshi Tsutsui; Xudong Lin; Guanyu Cai; Jianping Wu; Ying Shan; Xiaohu Qie; Mike Zheng Shou; |
1010 | Parametric Implicit Face Representation for Audio-Driven Facial Reenactment Highlight: Existing works either employ explicit intermediate face representations (e.g., 2D facial landmarks or 3D face models) or implicit ones (e.g., Neural Radiance Fields), thus suffering from the trade-offs between interpretability and expressive power, hence between controllability and quality of the results. In this work, we break these trade-offs with our novel parametric implicit face representation and propose a novel audio-driven facial reenactment framework that is both controllable and can generate high-quality talking heads. |
Ricong Huang; Peiwen Lai; Yipeng Qin; Guanbin Li; |
1011 | Semantic Human Parsing Via Scalable Semantic Transfer Over Multiple Label Domains Highlight: This paper presents Scalable Semantic Transfer (SST), a novel training paradigm, to explore how to leverage the mutual benefits of the data from different label domains (i.e. various levels of label granularity) to train a powerful human parsing network. |
Jie Yang; Chaoqun Wang; Zhen Li; Junle Wang; Ruimao Zhang; |
1012 | Making Vision Transformers Efficient From A Token Sparsification View Highlight: In this work, we propose a novel Semantic Token ViT (STViT), for efficient global and local vision transformers, which can also be revised to serve as backbone for downstream tasks. |
Shuning Chang; Pichao Wang; Ming Lin; Fan Wang; David Junhao Zhang; Rong Jin; Mike Zheng Shou; |
1013 | GEN: Pushing The Limits of Softmax-Based Out-of-Distribution Detection Highlight: In this work, we propose Generalized ENtropy score (GEN), a simple but effective entropy-based score function, which can be applied to any pre-trained softmax-based classifier. |
Xixi Liu; Yaroslava Lochman; Christopher Zach; |
1014 | RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension Highlight: In this paper, we resort to the efficient one-stage detector and propose a novel weakly supervised model called RefCLIP. |
Lei Jin; Gen Luo; Yiyi Zhou; Xiaoshuai Sun; Guannan Jiang; Annan Shu; Rongrong Ji; |
1015 | VILA: Learning Image Aesthetics From User Comments With Vision-Language Pretraining Highlight: Conversely, user comments offer more comprehensive information and are a more natural way to express human opinions and preferences regarding image aesthetics. In light of this, we propose learning image aesthetics from user comments, and exploring vision-language pretraining methods to learn multimodal aesthetic representations. |
Junjie Ke; Keren Ye; Jiahui Yu; Yonghui Wu; Peyman Milanfar; Feng Yang; |
1016 | Learnable Skeleton-Aware 3D Point Cloud Sampling Highlight: In this paper, we introduce a new skeleton-aware learning-to-sample method by learning object skeletons as the prior knowledge to preserve the object geometry and topology information during sampling. |
Cheng Wen; Baosheng Yu; Dacheng Tao; |
1017 | Boundary-Enhanced Co-Training for Weakly Supervised Semantic Segmentation Highlight: In this work, we observe that there is an inconsistency between the quality of the pseudo-labels in CAMs and the performance of the final segmentation model, and the mislabeled pixels mainly lie on the boundary areas. |
Shenghai Rong; Bohai Tu; Zilei Wang; Junjie Li; |
1018 | Re-IQA: Unsupervised Learning for Image Quality Assessment in The Wild Highlight: Automatic Perceptual Image Quality Assessment is a challenging problem that impacts billions of internet and social media users daily. To advance research in this field, we propose a Mixture of Experts approach to train two separate encoders to learn high-level content and low-level image quality features in an unsupervised setting. |
Avinab Saha; Sandeep Mishra; Alan C. Bovik; |
1019 | Procedure-Aware Pretraining for Instructional Video Understanding Highlight: Our goal is to learn a video representation that is useful for downstream procedure understanding tasks in instructional videos. |
Honglu Zhou; Roberto Martín-Martín; Mubbasir Kapadia; Silvio Savarese; Juan Carlos Niebles; |
1020 | Sample-Level Multi-View Graph Clustering Highlight: In this paper, we propose to exploit the implied data manifold by learning the topological structure of data. |
Yuze Tan; Yixi Liu; Shudong Huang; Wentao Feng; Jiancheng Lv; |
1021 | Fine-Grained Audible Video Description Highlight: We create two new metrics for this task: an EntityScore to gauge the completeness of entities in the visual descriptions, and an AudioScore to assess the audio descriptions. As a preliminary approach to this task, we propose an audio-visual-language transformer that extends an existing video captioning model with an additional audio branch. |
Xuyang Shen; Dong Li; Jinxing Zhou; Zhen Qin; Bowen He; Xiaodong Han; Aixuan Li; Yuchao Dai; Lingpeng Kong; Meng Wang; Yu Qiao; Yiran Zhong; |
1022 | 3D Semantic Segmentation in The Wild: Learning Generalized Models for Adverse-Condition Point Clouds Highlight: We introduce SemanticSTF, an adverse-weather point cloud dataset that provides dense point-level annotations and allows the study of 3DSS under various adverse weather conditions. |
Aoran Xiao; Jiaxing Huang; Weihao Xuan; Ruijie Ren; Kangcheng Liu; Dayan Guan; Abdulmotaleb El Saddik; Shijian Lu; Eric P. Xing; |
1023 | Catch Missing Details: Image Reconstruction With Frequency Augmented Variational Autoencoder Highlight: In this paper, a Frequency Complement Module (FCM) architecture is proposed to capture the missing frequency information for enhancing reconstruction quality. |
Xinmiao Lin; Yikang Li; Jenhao Hsiao; Chiuman Ho; Yu Kong; |
1024 | RaBit: Parametric Modeling of 3D Biped Cartoon Characters With A Topological-Consistent Dataset Highlight: In this paper, we introduce 3DBiCar, the first large-scale dataset of 3D biped cartoon characters, and RaBit, the corresponding parametric model. |
Zhongjin Luo; Shengcai Cai; Jinguo Dong; Ruibo Ming; Liangdong Qiu; Xiaohang Zhan; Xiaoguang Han; |
1025 | Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars Highlight: We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images. |
Jingxiang Sun; Xuan Wang; Lizhen Wang; Xiaoyu Li; Yong Zhang; Hongwen Zhang; Yebin Liu; |
1026 | Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the task of training a unified 3D detector from multiple datasets. |
Bo Zhang; Jiakang Yuan; Botian Shi; Tao Chen; Yikang Li; Yu Qiao; |
1027 | Linking Garment With Person Via Semantically Associated Landmarks for Virtual Try-On Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a novel virtual try-on algorithm, dubbed SAL-VTON, is proposed, which links the garment with the person via semantically associated landmarks to alleviate misalignment. |
Keyu Yan; Tingwei Gao; Hui Zhang; Chengjun Xie; |
1028 | ACR: Attention Collaboration-Based Regressor for Arbitrary Two-Hand Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents ACR (Attention Collaboration-based Regressor), which makes the first attempt to reconstruct hands in arbitrary scenarios. |
Zhengdi Yu; Shaoli Huang; Chen Fang; Toby P. Breckon; Jue Wang; |
1029 | Rotation-Invariant Transformer for Point Cloud Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce RoITr, a Rotation-Invariant Transformer to cope with the pose variations in the point cloud matching task. |
Hao Yu; Zheng Qin; Ji Hou; Mahdi Saleh; Dongsheng Li; Benjamin Busam; Slobodan Ilic; |
1030 | Devil’s on The Edges: Selective Quad Attention for Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One of the major challenges for the task lies in the presence of distracting objects and relationships in images; contextual reasoning is strongly distracted by irrelevant objects or backgrounds and, more importantly, a vast number of irrelevant candidate relations. To tackle the issue, we propose the Selective Quad Attention Network (SQUAT) that learns to select relevant object pairs and disambiguate them via diverse contextual interactions. |
Deunsol Jung; Sanghyun Kim; Won Hwa Kim; Minsu Cho; |
1031 | NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection Via Neural Instance Feature Forging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the first data-free knowledge distillation (DFKD) approach for G-FSOD that leverages the statistics of the region of interest (RoI) features from the base model to forge instance-level features without accessing the base images. |
Karim Guirguis; Johannes Meier; George Eskandar; Matthias Kayser; Bin Yang; Jürgen Beyerer; |
1032 | Habitat-Matterport 3D Semantics Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the Habitat-Matterport 3D Semantics (HM3DSEM) dataset. |
Karmesh Yadav; Ram Ramrakhya; Santhosh Kumar Ramakrishnan; Theo Gervet; John Turner; Aaron Gokaslan; Noah Maestre; Angel Xuan Chang; Dhruv Batra; Manolis Savva; Alexander William Clegg; Devendra Singh Chaplot; |
1033 | Post-Processing Temporal Action Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This could negatively impact the TAD performance, but is largely ignored by existing methods. To address this problem, in this work we introduce a novel model-agnostic post-processing method without model redesign and retraining. |
Sauradip Nag; Xiatian Zhu; Yi-Zhe Song; Tao Xiang; |
1034 | ConZIC: Controllable Zero-Shot Image Captioning By Sampling-Based Polishing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To move forward, we propose a framework for Controllable Zero-shot IC, named ConZIC. |
Zequn Zeng; Hao Zhang; Ruiying Lu; Dongsheng Wang; Bo Chen; Zhengjue Wang; |
1035 | EDGE: Editable Dance Generation From Music Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce Editable Dance GEneration (EDGE), a state-of-the-art method for editable dance generation that is capable of creating realistic, physically-plausible dances while remaining faithful to the input music. |
Jonathan Tseng; Rodrigo Castellon; Karen Liu; |
1036 | Curricular Contrastive Regularization for Physics-Aware Single Image Dehazing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel curricular contrastive regularization targeted at a consensual contrastive space as opposed to a non-consensual one. |
Yu Zheng; Jiahui Zhan; Shengfeng He; Junyu Dong; Yong Du; |
1037 | Learning From Noisy Labels With Decoupled Meta Label Purifier Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a compromise, previous methods resort to a coupled learning process with alternating updates. In this paper, we empirically find that such simultaneous optimization over both model weights and label distribution cannot achieve an optimal routine, consequently limiting the representation ability of the backbone and the accuracy of corrected labels. |
Yuanpeng Tu; Boshen Zhang; Yuxi Li; Liang Liu; Jian Li; Yabiao Wang; Chengjie Wang; Cai Rong Zhao; |
1038 | Language in A Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: CBMs require manually specified concepts and often under-perform their black box counterparts, preventing their broad adoption. We address these shortcomings and are the first to show how to construct high-performance CBMs, with accuracy similar to black box models, without manual concept specification. |
Yue Yang; Artemis Panagopoulou; Shenghao Zhou; Daniel Jin; Chris Callison-Burch; Mark Yatskar; |
1039 | Sharpness-Aware Gradient Matching for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present two conditions to ensure that the model could converge to a flat minimum with a small loss, and present an algorithm, named Sharpness-Aware Gradient Matching (SAGM), to meet the two conditions for improving model generalization capability. |
Pengfei Wang; Zhaoxiang Zhang; Zhen Lei; Lei Zhang; |
1040 | ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although two-stage HOI detectors have advantages of high efficiency in training and inference, they suffer from lower performance than one-stage methods due to the old backbone networks and the lack of considerations for the HOI perception process of humans in the interaction classifiers. In this paper, we propose Vision Transformer based Pose-Conditioned Self-Loop Graph (ViPLO) to resolve these problems. |
Jeeseung Park; Jin-Woo Park; Jong-Seok Lee; |
1041 | Improving Table Structure Recognition With Visual-Alignment Sequential Coordinate Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, as the logical representation lacks the local visual information, the previous methods often produce imprecise bounding boxes. To address this issue, we propose an end-to-end sequential modeling framework for table structure recognition called VAST. |
Yongshuai Huang; Ning Lu; Dapeng Chen; Yibo Li; Zecheng Xie; Shenggao Zhu; Liangcai Gao; Wei Peng; |
1042 | MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel Twins Contrastive Mechanism (TCM) to provide more appropriate supervision for ReID architecture search. |
Jianyang Gu; Kai Wang; Hao Luo; Chen Chen; Wei Jiang; Yuqiang Fang; Shanghang Zhang; Yang You; Jian Zhao; |
1043 | WIRE: Wavelet Implicit Neural Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A wide range of nonlinearities have been explored, but, unfortunately, current INRs designed to have high accuracy also suffer from poor robustness (to signal noise, parameter variation, etc.). Inspired by harmonic analysis, we develop a new, highly accurate and robust INR that does not exhibit this tradeoff. |
Vishwanath Saragadam; Daniel LeJeune; Jasper Tan; Guha Balakrishnan; Ashok Veeraraghavan; Richard G. Baraniuk; |
1044 | Bi-Directional Feature Fusion Generative Adversarial Network for Ultra-High Resolution Pathological Image Virtual Re-Staining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to eliminate the square effect, we design a bi-directional feature fusion generative adversarial network (BFF-GAN) with a global branch and a local branch. |
Kexin Sun; Zhineng Chen; Gongwei Wang; Jun Liu; Xiongjun Ye; Yu-Gang Jiang; |
1045 | HumanGen: Generating Human Radiance Fields With Explicit Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present HumanGen, a novel 3D human generation scheme with detailed geometry and 360° realistic free-view rendering. |
Suyi Jiang; Haoran Jiang; Ziyu Wang; Haimin Luo; Wenzheng Chen; Lan Xu; |
1046 | Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present InterWild, which brings MoCap and ITW samples to shared domains for robust 3D interacting hands recovery in the wild with a limited amount of ITW 2D/3D interacting hands data. |
Gyeongsik Moon; |
1047 | Local Connectivity-Based Density Estimation for Face Clustering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, those false positive edges, which connect negative node pairs, have the risk of integration of different clusters when their connectivity is incorrectly estimated. This paper proposes a novel face clustering method to address this problem. |
Junho Shin; Hyo-Jun Lee; Hyunseop Kim; Jong-Hyeon Baek; Daehyun Kim; Yeong Jun Koh; |
1048 | Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an Adaptive Zone-aware Hierarchical Planner (AZHP) to explicitly divide the navigation process into two heterogeneous phases, i.e., sub-goal setting via zone partition/selection (high-level action) and sub-goal executing (low-level action), for hierarchical planning. |
Chen Gao; Xingyu Peng; Mi Yan; He Wang; Lirong Yang; Haibing Ren; Hongsheng Li; Si Liu; |
1049 | Towards Practical Plug-and-Play Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For that, the existing practice is to fine-tune the guidance models with labeled data corrupted with noises. In this paper, we argue that this practice has limitations in two aspects: (1) performing well on inputs with widely varying noise levels is too hard for a single guidance model; (2) collecting labeled datasets hinders scaling up for various tasks. |
Hyojun Go; Yunsung Lee; Jin-Young Kim; Seunghyun Lee; Myeongho Jeong; Hyun Seung Lee; Seungtaek Choi; |
1050 | Memory-Friendly Scalable Super-Resolution Via Rewinding Lottery Ticket Hypothesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Memory-friendly Scalable SR framework (MSSR). |
Jin Lin; Xiaotong Luo; Ming Hong; Yanyun Qu; Yuan Xie; Zongze Wu; |
1051 | YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the topics, we propose a trainable bag-of-freebies oriented solution. |
Chien-Yao Wang; Alexey Bochkovskiy; Hong-Yuan Mark Liao; |
1052 | Deep Deterministic Uncertainty: A New Simple Baseline Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Reliable uncertainty from deterministic single-forward pass models is sought after because conventional methods of uncertainty quantification are computationally expensive. We take two complex single-forward-pass uncertainty approaches, DUQ and SNGP, and examine whether they mainly rely on a well-regularized feature space. |
Jishnu Mukhoti; Andreas Kirsch; Joost van Amersfoort; Philip H.S. Torr; Yarin Gal; |
1053 | PartDistillation: Learning Parts From Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a scalable framework to learn part segmentation from object instance labels. |
Jang Hyun Cho; Philipp Krähenbühl; Vignesh Ramanathan; |
1054 | DeltaEdit: Exploring Text-Free Training for Text-Driven Image Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Meanwhile, recent frameworks, which leverage pre-trained vision-language models, are limited by either per text-prompt optimization or inference-time hyper-parameters tuning. In this work, we propose a novel framework named DeltaEdit to address these problems. |
Yueming Lyu; Tianwei Lin; Fu Li; Dongliang He; Jing Dong; Tieniu Tan; |
1055 | Boosting Video Object Segmentation Via Space-Time Correspondence Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: They simply exploit the supervisory signals from the ground-truth masks for learning mask prediction only, without posing any constraint on the space-time correspondence matching, which, however, is the fundamental building block of such a regime. To alleviate this crucial yet commonly ignored issue, we devise a correspondence-aware training framework, which boosts matching-based VOS solutions by explicitly encouraging robust correspondence matching during network learning. |
Yurong Zhang; Liulei Li; Wenguan Wang; Rong Xie; Li Song; Wenjun Zhang; |
1056 | Towards Realistic Long-Tailed Semi-Supervised Learning: Consistency Is All You Need Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Those LTSSL algorithms built upon the assumption can severely suffer when the class distributions of labeled and unlabeled data are mismatched since they utilize biased pseudo-labels from the model. To alleviate this issue, we propose a new simple method that can effectively utilize unlabeled data of unknown class distributions by introducing the adaptive consistency regularizer (ACR). |
Tong Wei; Kai Gan; |
1057 | GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation Via Generalizable and Actionable Parts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For years, researchers have been devoted to generalizable object perception and manipulation, where cross-category generalizability is highly desired yet underexplored. In this work, we propose to learn such cross-category skills via Generalizable and Actionable Parts (GAParts). |
Haoran Geng; Helin Xu; Chengyang Zhao; Chao Xu; Li Yi; Siyuan Huang; He Wang; |
1058 | NeRDi: Single-View NeRF Synthesis With Language-Guided Diffusion As General Image Priors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: 2D-to-3D reconstruction is an ill-posed problem, yet humans are good at solving this problem due to their prior knowledge of the 3D world developed over years. Driven by this observation, we propose NeRDi, a single-view NeRF synthesis framework with general image priors from 2D diffusion models. |
Congyue Deng; Chiyu “Max” Jiang; Charles R. Qi; Xinchen Yan; Yin Zhou; Leonidas Guibas; Dragomir Anguelov; |
1059 | Therbligs in Action: Video Understanding Through Motion Primitives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we introduce a rule-based, compositional, and hierarchical modeling of action using Therbligs as our atoms. |
Eadom Dessalene; Michael Maynord; Cornelia Fermüller; Yiannis Aloimonos; |
1060 | InstantAvatar: Learning Avatars From Monocular Video in 60 Seconds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take one step further towards real-world applicability of monocular neural avatar reconstruction by contributing InstantAvatar, a system that can reconstruct human avatars from a monocular video within seconds, and these avatars can be animated and rendered at an interactive rate. To achieve this efficiency we propose a carefully designed and engineered system, that leverages emerging acceleration structures for neural fields, in combination with an efficient empty-space skipping strategy for dynamic scenes. |
Tianjian Jiang; Xu Chen; Jie Song; Otmar Hilliges; |
1061 | You Only Segment Once: Towards Real-Time Panoptic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose YOSO, a real-time panoptic segmentation framework. |
Jie Hu; Linyan Huang; Tianhe Ren; Shengchuan Zhang; Rongrong Ji; Liujuan Cao; |
1062 | Robust Single Image Reflection Removal Against Adversarial Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper addresses the problem of robust deep single-image reflection removal (SIRR) against adversarial attacks. |
Zhenbo Song; Zhenyuan Zhang; Kaihao Zhang; Wenhan Luo; Zhaoxin Fan; Wenqi Ren; Jianfeng Lu; |
1063 | OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To facilitate the development of 3D perception, reconstruction, and generation in the real world, we propose OmniObject3D, a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects. |
Tong Wu; Jiarui Zhang; Xiao Fu; Yuxin Wang; Jiawei Ren; Liang Pan; Wayne Wu; Lei Yang; Jiaqi Wang; Chen Qian; Dahua Lin; Ziwei Liu; |
1064 | PartMix: Regularization Strategy To Learn Part Discovery for Visible-Infrared Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel data augmentation technique, dubbed PartMix, that synthesizes the augmented samples by mixing the part descriptors across the modalities to improve the performance of part-based VI-ReID models. |
Minsu Kim; Seungryong Kim; Jungin Park; Seongheon Park; Kwanghoon Sohn; |
1065 | Uncovering The Disentanglement Capability in Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous studies have found that generative adversarial networks (GANs) are inherently endowed with such disentanglement capability, so they can perform disentangled image editing without re-training or fine-tuning the network. In this work, we explore whether diffusion models are also inherently equipped with such a capability. |
Qiucheng Wu; Yujian Liu; Handong Zhao; Ajinkya Kale; Trung Bui; Tong Yu; Zhe Lin; Yang Zhang; Shiyu Chang; |
1066 | Feature Representation Learning With Adaptive Displacement Generation and Transformer Fusion for Micro-Expression Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, deep-learning-based methods have been developed for micro-expression recognition using feature extraction and fusion techniques; however, targeted feature learning and efficient feature fusion remain understudied with respect to micro-expression characteristics. To address these issues, we propose a novel framework, Feature Representation Learning with adaptive Displacement Generation and Transformer fusion (FRL-DGT). A convolutional Displacement Generation Module (DGM) with self-supervised learning extracts dynamic features targeted to the subsequent ME recognition task, and a well-designed Transformer fusion mechanism, composed of Transformer-based local, global, and full-face fusion modules, extracts multi-level informative features from the output of the DGM for the final micro-expression prediction. |
Zhijun Zhai; Jianhui Zhao; Chengjiang Long; Wenju Xu; Shuangjiang He; Huijuan Zhao; |
1067 | ViewNet: A Novel Projection-Based Backbone With View Pooling for Few-Shot Point Cloud Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, based on our extensive experiments and analysis, we first show that using a point-based backbone is not the most suitable FSL approach, since (i) a large number of points’ features are discarded by the max pooling operation used in 3D point-based backbones, reducing their ability to represent shape information; (ii) point-based backbones are sensitive to occlusion. To address these issues, we propose employing a projection- and 2D Convolutional Neural Network-based backbone, referred to as the ViewNet, for FSL from 3D point clouds. |
Jiajing Chen; Minmin Yang; Senem Velipasalar; |
1068 | EXIF As Language: Learning Cross-Modal Associations Between Images and Camera Metadata Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We learn a visual representation that captures information about the camera that recorded a given photo. |
Chenhao Zheng; Ayush Shrivastava; Andrew Owens; |
1069 | ANetQA: A Large-Scale Benchmark for Fine-Grained Compositional Reasoning Over Untrimmed Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we present ANetQA, a large-scale benchmark that supports fine-grained compositional reasoning over the challenging untrimmed videos from ActivityNet. |
Zhou Yu; Lixiang Zheng; Zhou Zhao; Fei Wu; Jianping Fan; Kui Ren; Jun Yu; |
1070 | SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation. |
Wenxuan Zhang; Xiaodong Cun; Xuan Wang; Yong Zhang; Xi Shen; Yu Guo; Ying Shan; Fei Wang; |
1071 | HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to regard the encodings as augmented views of the input image. |
Chia-Wen Kuo; Zsolt Kira; |
1072 | CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, we found that building effective connections between pre-trained language models and visual animal keypoints is non-trivial since the gap between text-based descriptions and keypoint-based visual features about animal pose can be significant. To address this issue, we introduce a novel prompt-based Contrastive learning scheme for connecting Language and AniMal Pose (CLAMP) effectively. |
Xu Zhang; Wen Wang; Zhe Chen; Yufei Xu; Jing Zhang; Dacheng Tao; |
1073 | Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work proposes an end-to-end multi-camera 3D multi-object tracking (MOT) framework. |
Ziqi Pang; Jie Li; Pavel Tokmakov; Dian Chen; Sergey Zagoruyko; Yu-Xiong Wang; |
1074 | Learning Sample Relationship for Exposure Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new perspective to conjunct their optimization processes by correlating and constraining the relationship of correction procedure in a mini-batch. |
Jie Huang; Feng Zhao; Man Zhou; Jie Xiao; Naishan Zheng; Kaiwen Zheng; Zhiwei Xiong; |
1075 | TRACE: 5D Temporal Regression of Avatars With Dynamic Cameras in 3D Environments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our method, called TRACE, introduces several novel architectural components. |
Yu Sun; Qian Bao; Wu Liu; Tao Mei; Michael J. Black; |
1076 | TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a method of test-time adaptation for category-level object pose estimation called TTA-COPE. |
Taeyeop Lee; Jonathan Tremblay; Valts Blukis; Bowen Wen; Byeong-Uk Lee; Inkyu Shin; Stan Birchfield; In So Kweon; Kuk-Jin Yoon; |
1077 | TrojDiff: Trojan Attacks on Diffusion Models With Diverse Targets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to explore the vulnerabilities of diffusion models under potential training data manipulations and try to answer: How hard is it to perform Trojan attacks on well-trained diffusion models? |
Weixin Chen; Dawn Song; Bo Li; |
1078 | End-to-End 3D Dense Captioning With Vote2Cap-DETR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple-yet-effective transformer framework Vote2Cap-DETR based on recent popular DEtection TRansformer (DETR). |
Sijin Chen; Hongyuan Zhu; Xin Chen; Yinjie Lei; Gang Yu; Tao Chen; |
1079 | Mitigating Task Interference in Multi-Task Learning Via Explicit Task Routing With Non-Learnable Primitives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ETR-NLP to mitigate task interference through a synergistic combination of non-learnable primitives (NLPs) and explicit task routing (ETR). |
Chuntao Ding; Zhichao Lu; Shangguang Wang; Ran Cheng; Vishnu Naresh Boddeti; |
1080 | Learned Two-Plane Perspective Prior Based Image Resampling for Efficient Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a learnable geometry-guided prior that incorporates rough geometry of the 3D scene (a ground plane and a plane above) to resample images for efficient object detection. |
Anurag Ghosh; N. Dinesh Reddy; Christoph Mertz; Srinivasa G. Narasimhan; |
1081 | Tell Me What Happened: Unifying Text-Guided Video Completion Via Multimodal Masked Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Since there could be different outcomes from the hints of just a few frames, a system that can follow natural language to perform video completion may significantly improve controllability. Inspired by this, we introduce a novel task, text-guided video completion (TVC), which requests the model to generate a video from partial frames guided by an instruction. |
Tsu-Jui Fu; Licheng Yu; Ning Zhang; Cheng-Yang Fu; Jong-Chyi Su; William Yang Wang; Sean Bell; |
1082 | Tracking Through Containers and Occluders in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce TCOW, a new benchmark and model for visual tracking through heavy occlusion and containment. |
Basile Van Hoorick; Pavel Tokmakov; Simon Stent; Jie Li; Carl Vondrick; |
1083 | Geometry and Uncertainty-Aware 3D Point Cloud Class-Incremental Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, to continually learn new categories using previous knowledge, we introduce class-incremental semantic segmentation of 3D point clouds. |
Yuwei Yang; Munawar Hayat; Zhao Jin; Chao Ren; Yinjie Lei; |
1084 | Neural Kernel Surface Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel method for reconstructing a 3D implicit surface from a large-scale, sparse, and noisy point cloud. |
Jiahui Huang; Zan Gojcic; Matan Atzmon; Or Litany; Sanja Fidler; Francis Williams; |
1085 | Cooperation or Competition: Avoiding Player Domination for Multi-Target Robustness Via Adaptive Budgets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most of them defend against only a single type of attack, while recent work steps forward at defending against multiple attacks. In this paper, to understand multi-target robustness, we view this problem as a bargaining game in which different players (adversaries) negotiate to reach an agreement on a joint direction of parameter updating. |
Yimu Wang; Dinghuai Zhang; Yihan Wu; Heng Huang; Hongyang Zhang; |
1086 | Decompose, Adjust, Compose: Effective Normalization By Playing With Frequency for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With the normalization methods, we propose ResNet-variant models, DAC-P and DAC-SC, which are robust to the domain gap. |
Sangrok Lee; Jongseong Bae; Ha Young Kim; |
1087 | Multilateral Semantic Relations Modeling for Image Text Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a Multilateral Semantic Relations Modeling (termed MSRM) for image-text retrieval to capture the one-to-many correspondence between multiple samples and a given query via hypergraph modeling. |
Zheng Wang; Zhenwei Gao; Kangshuai Guo; Yang Yang; Xiaoming Wang; Heng Tao Shen; |
1088 | Optimization-Inspired Cross-Attention Transformer for Compressive Sensing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an Optimization-inspired Cross-attention Transformer (OCT) module as an iterative process, leading to a lightweight OCT-based Unfolding Framework (OCTUF) for image CS. |
Jiechong Song; Chong Mou; Shiqi Wang; Siwei Ma; Jian Zhang; |
1089 | Novel Class Discovery for 3D Point Cloud Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper is presented to advance the state of the art on point cloud data analysis in four directions. |
Luigi Riz; Cristiano Saltori; Elisa Ricci; Fabio Poiesi; |
1090 | CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We observe that humans subconsciously prefer to focus on all foreground objects and then identify each one in detail, rather than localize and identify a single object simultaneously, to alleviate confusion. This motivates us to propose a novel solution called CAT: LoCalization and IdentificAtion Cascade Detection Transformer, which decouples the detection process via the shared decoder in the cascade decoding way. |
Shuailei Ma; Yuefeng Wang; Ying Wei; Jiaqi Fan; Thomas H. Li; Hongli Liu; Fanbing Lv; |
1091 | TruFor: Leveraging All-Round Clues for Trustworthy Image Forgery Detection and Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present TruFor, a forensic framework that can be applied to a large variety of image manipulation methods, from classic cheapfakes to more recent manipulations based on deep learning. |
Fabrizio Guillaro; Davide Cozzolino; Avneesh Sud; Nicholas Dufour; Luisa Verdoliva; |
1092 | LANA: A Language-Capable Navigator for Instruction Following and Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this article, we devise LANA, a language-capable navigation agent which is able to not only execute human-written navigation commands, but also provide route descriptions to humans. |
Xiaohan Wang; Wenguan Wang; Jiayi Shao; Yi Yang; |
1093 | Learning 3D-Aware Image Synthesis With Unknown Pose Distribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes PoF3D that frees generative radiance fields from the requirements of 3D pose priors. |
Zifan Shi; Yujun Shen; Yinghao Xu; Sida Peng; Yiyi Liao; Sheng Guo; Qifeng Chen; Dit-Yan Yeung; |
1094 | Normalizing Flow Based Feature Synthesis for Outlier-Aware Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel outlier-aware object detection framework that distinguishes outliers from inlier objects by learning the joint data distribution of all inlier classes with an invertible normalizing flow. |
Nishant Kumar; Siniša Šegvić; Abouzar Eslami; Stefan Gumhold; |
1095 | DivClust: Controlling Diversity in Deep Clustering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is particularly important, as a diverse set of base clusterings are necessary for consensus clustering, which has been found to produce better and more robust results than relying on a single clustering. To address this gap, we propose DivClust, a diversity controlling loss that can be incorporated into existing deep clustering frameworks to produce multiple clusterings with the desired degree of diversity. |
Ioannis Maniadis Metaxas; Georgios Tzimiropoulos; Ioannis Patras; |
1096 | CAPE: Camera View Position Embedding for Multi-View 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the problem of detecting 3D objects from multi-view images. |
Kaixin Xiong; Shi Gong; Xiaoqing Ye; Xiao Tan; Ji Wan; Errui Ding; Jingdong Wang; Xiang Bai; |
1097 | Train-Once-for-All Personalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our experiments, we however found it suboptimal, perhaps because the model’s weights are kept frozen without being personalized. To address this drawback, we propose Train-once-for-All PERsonalization (TAPER), a framework that is trained just once and can later customize a model for different end-users given their task descriptions. |
Hong-You Chen; Yandong Li; Yin Cui; Mingda Zhang; Wei-Lun Chao; Li Zhang; |
1098 | Bi-Directional Distribution Alignment for Transductive Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel TZSL model (named as Bi-VAEGAN), which largely improves the shift by a strengthened distribution alignment between the visual and auxiliary spaces. |
Zhicai Wang; Yanbin Hao; Tingting Mu; Ouxiang Li; Shuo Wang; Xiangnan He; |
1099 | FlexNeRF: Photorealistic Free-Viewpoint Rendering of Moving Humans From Sparse Views Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present FlexNeRF, a method for photorealistic free-viewpoint rendering of humans in motion from monocular videos. |
Vinoj Jayasundara; Amit Agrawal; Nicolas Heron; Abhinav Shrivastava; Larry S. Davis; |
1100 | DIFu: Depth-Guided Implicit Function for Clothed Human Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Beyond the SMPL, which provides skinned parametric human 3D information, in this paper, we propose a new IF-based method, DIFu, that utilizes a projected depth prior containing textured and non-parametric human 3D information. |
Dae-Young Song; HeeKyung Lee; Jeongil Seo; Donghyeon Cho; |
1101 | Towards Better Gradient Consistency for Neural Signed Distance Functions Via Level Set Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we claim that gradient consistency in the field, indicated by the parallelism of level sets, is the key factor affecting the inference accuracy. |
Baorui Ma; Junsheng Zhou; Yu-Shen Liu; Zhizhong Han; |
1102 | Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Abstract: This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR), however with two significant differentiators to prior art (i) we tackle all variants … |
Fengyin Lin; Mingkang Li; Da Li; Timothy Hospedales; Yi-Zhe Song; Yonggang Qi; |
1103 | Graph Representation for Order-Aware Visual Transformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a new visual reasoning formulation that aims at discovering changes between image pairs and their temporal orders. |
Yue Qiu; Yanjun Sun; Fumiya Matsuzawa; Kenji Iwata; Hirokatsu Kataoka; |
1104 | StarCraftImage: A Dataset for Prototyping Spatial Reasoning Methods for Multi-Agent Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Following the simplicity of these datasets, we construct a benchmark spatial reasoning dataset based on StarCraft II replays that exhibit complex multi-agent behaviors, while still being as easy to use as MNIST and CIFAR10. |
Sean Kulinski; Nicholas R. Waytowich; James Z. Hare; David I. Inouye; |
1105 | Quality-Aware Pre-Trained Models for Blind Image Quality Assessment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to solve the problem by a pretext task customized for BIQA in a self-supervised learning manner, which enables learning representations from orders of magnitude more data. |
Kai Zhao; Kun Yuan; Ming Sun; Mading Li; Xing Wen; |
1106 | Topology-Guided Multi-Class Cell Context Generation for Digital Pathology Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Cells form different mixtures, lineages, clusters and holes. To model such structural patterns in a learnable fashion, we introduce several mathematical tools from spatial statistics and topological data analysis. |
Shahira Abousamra; Rajarsi Gupta; Tahsin Kurc; Dimitris Samaras; Joel Saltz; Chao Chen; |
1107 | Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle the challenges and improve 3D detection for dynamic objects. |
Yingjie Wang; Jiajun Deng; Yao Li; Jinshui Hu; Cong Liu; Yu Zhang; Jianmin Ji; Wanli Ouyang; Yanyong Zhang; |
1108 | Adaptive Graph Convolutional Subspace Clustering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, inspired by graph convolutional networks, we use the graph convolution technique to develop a feature extraction method and a coefficient matrix constraint simultaneously. |
Lai Wei; Zhengwei Chen; Jun Yin; Changming Zhu; Rigui Zhou; Jin Liu; |
1109 | LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A key step to acquire this skill is to identify what part of the object affords each action, which is called affordance grounding. In this paper, we address this problem and propose a framework called LOCATE that can identify matching object parts across images, to transfer knowledge from images where an object is being used (exocentric images used for learning), to images where the object is inactive (egocentric ones used to test). |
Gen Li; Varun Jampani; Deqing Sun; Laura Sevilla-Lara; |
1110 | Learning Steerable Function for Efficient Image Resampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method of Learning Resampling Function (termed LeRF), which takes advantage of both the structural priors learned by DNNs and the locally continuous assumption of interpolation methods. |
Jiacheng Li; Chang Chen; Wei Huang; Zhiqiang Lang; Fenglong Song; Youliang Yan; Zhiwei Xiong; |
1111 | TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation Via Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To leverage the observed findings, we propose a novel critical minority relationship-aware method based on the Transformer architecture in which the facial part relationships can be learned. |
Cheng Zhang; Hai Liu; Yongjian Deng; Bochen Xie; Youfu Li; |
1112 | BioNet: A Biologically-Inspired Network for Face Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To show our ideas and experimental evidence to the discussion, we focus on one of the most broadly researched topics both in Neuroscience and CV fields, i.e., Face Recognition (FR). |
Pengyu Li; |
1113 | Scaling Up GANs for Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We find that naively increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. |
Minguk Kang; Jun-Yan Zhu; Richard Zhang; Jaesik Park; Eli Shechtman; Sylvain Paris; Taesung Park; |
1114 | DepGraph: Towards Any Structural Pruning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study a highly-challenging yet barely-explored task, any structural pruning, to tackle general structural pruning of arbitrary architecture like CNNs, RNNs, GNNs and Transformers. |
Gongfan Fang; Xinyin Ma; Mingli Song; Michael Bi Mi; Xinchao Wang; |
1115 | Exploring Discontinuity for Video Frame Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, many practical videos contain various unnatural objects with discontinuous motions such as logos, user interfaces and subtitles. We propose three techniques that can make the existing deep learning-based VFI architectures robust to these elements. |
Sangjin Lee; Hyeongmin Lee; Chajin Shin; Hanbin Son; Sangyoun Lee; |
1116 | DynamicStereo: Consistent Dynamic Depth From Stereo Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose DynamicStereo, a novel transformer-based architecture to estimate disparity for stereo videos. |
Nikita Karaev; Ignacio Rocco; Benjamin Graham; Natalia Neverova; Andrea Vedaldi; Christian Rupprecht; |
1117 | Cut and Learn for Unsupervised Object Detection and Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Cut-and-LEaRn (CutLER), a simple approach for training unsupervised object detection and segmentation models. |
Xudong Wang; Rohit Girdhar; Stella X. Yu; Ishan Misra; |
1118 | Privacy-Preserving Adversarial Facial Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an adversarial features-based face privacy protection (AdvFace) approach to generate privacy-preserving adversarial features, which can disrupt the mapping from adversarial features to facial images to defend against reconstruction attacks. |
Zhibo Wang; He Wang; Shuaifan Jin; Wenwen Zhang; Jiahui Hu; Yan Wang; Peng Sun; Wei Yuan; Kaixin Liu; Kui Ren; |
1119 | Exploring The Relationship Between Architectural Design and Adversarially Robust Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better understand the nature behind it, we conduct theoretical analysis via the lens of Rademacher complexity. |
Aishan Liu; Shiyu Tang; Siyuan Liang; Ruihao Gong; Boxi Wu; Xianglong Liu; Dacheng Tao; |
1120 | Vid2Avatar: 3D Avatar Reconstruction From Videos in The Wild Via Self-Supervised Scene Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Vid2Avatar, a method to learn human avatars from monocular in-the-wild videos. |
Chen Guo; Tianjian Jiang; Xu Chen; Jie Song; Otmar Hilliges; |
1121 | Task Residual for Tuning Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most existing efficient transfer learning (ETL) approaches for VLMs either damage or are excessively biased towards the prior knowledge, e.g., prompt tuning (PT) discards the pre-trained text-based classifier and builds a new one while adapter-style tuning (AT) fully relies on the pre-trained features. To address this, we propose a new efficient tuning approach for VLMs named Task Residual Tuning (TaskRes), which performs directly on the text-based classifier and explicitly decouples the prior knowledge of the pre-trained models and new knowledge regarding a target task. |
Tao Yu; Zhihe Lu; Xin Jin; Zhibo Chen; Xinchao Wang; |
1122 | Side Adapter Network for Open-Vocabulary Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a new framework for open-vocabulary semantic segmentation with the pre-trained vision-language model, named SAN. |
Mengde Xu; Zheng Zhang; Fangyun Wei; Han Hu; Xiang Bai; |
1123 | Network Expansion for Practical Training Acceleration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a general network expansion method to reduce the practical time cost of the model training process. |
Ning Ding; Yehui Tang; Kai Han; Chao Xu; Yunhe Wang; |
1124 | FCC: Feature Clusters Compression for Long-Tailed Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In practical applications, these features are discretely mapped or even cross the decision boundary resulting in misclassification. Inspired by this observation, we propose a simple and generic method, namely Feature Clusters Compression (FCC), to increase the density of BFs by compressing backbone feature clusters. |
Jian Li; Ziyao Meng; Daqian Shi; Rui Song; Xiaolei Diao; Jingwen Wang; Hao Xu; |
1125 | Rethinking The Learning Paradigm for Dynamic Facial Expression Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We also identify the imbalance of short- and long-term temporal relationships in DFER. Therefore, we introduce the Multi-3D Dynamic Facial Expression Learning (M3DFEL) framework, which utilizes Multi-Instance Learning (MIL) to handle inexact labels. |
Hanyang Wang; Bo Li; Shuang Wu; Siyuan Shen; Feng Liu; Shouhong Ding; Aimin Zhou; |
1126 | Multi-Centroid Task Descriptor for Dynamic Class Incremental Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main difference is whether the task ID is given during evaluation. In this paper, we show this task information is indeed a strong prior knowledge, which will bring significant improvement over class-incremental learning baseline, e.g., DER. |
Tenghao Cai; Zhizhong Zhang; Xin Tan; Yanyun Qu; Guannan Jiang; Chengjie Wang; Yuan Xie; |
1127 | Hierarchical Prompt Learning for Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Significant discrepancies across tasks could cause negative transferring. Considering this, we present Hierarchical Prompt (HiPro) learning, a simple and effective method for jointly adapting a pre-trained VLM to multiple downstream tasks. |
Yajing Liu; Yuning Lu; Hao Liu; Yaozu An; Zhuoran Xu; Zhuokun Yao; Baofeng Zhang; Zhiwei Xiong; Chenguang Gui; |
1128 | Physics-Guided ISO-Dependent Sensor Noise Modeling for Extreme Low-Light Photography Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tap the potential of learning-based sensor noise modeling, we investigate the noise formation in a typical imaging process and propose a novel physics-guided ISO-dependent sensor noise modeling approach. |
Yue Cao; Ming Liu; Shuai Liu; Xiaotao Wang; Lei Lei; Wangmeng Zuo; |
1129 | RIFormer: Keep Your Vision Backbone Effective But Removing Token Mixer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies how to keep a vision backbone effective while removing token mixers in its basic building blocks. |
Jiahao Wang; Songyang Zhang; Yong Liu; Taiqiang Wu; Yujiu Yang; Xihui Liu; Kai Chen; Ping Luo; Dahua Lin; |
1130 | Context-Based Trit-Plane Coding for Progressive Image Compression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the context-based trit-plane coding (CTC) algorithm to achieve progressive compression more compactly. |
Seungmin Jeon; Kwang Pyo Choi; Youngo Park; Chang-Su Kim; |
1131 | Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In turn, methods that purely rely on point clouds are unable to meet the matching quality of mesh-based methods that utilise the additional topological structure. In this work we close this gap by introducing a self-supervised multimodal learning strategy that combines mesh-based functional map regularisation with a contrastive loss that couples mesh and point cloud data. |
Dongliang Cao; Florian Bernard; |
1132 | Recurrent Vision Transformers for Object Detection With Event Cameras Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Recurrent Vision Transformers (RVTs), a novel backbone for object detection with event cameras. |
Mathias Gehrig; Davide Scaramuzza; |
1133 | Ham2Pose: Animating Sign Language Notation Into Pose Sequences Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Translating spoken languages into Sign languages is necessary for open communication between the hearing and hearing-impaired communities. To achieve this goal, we propose the first method for animating a text written in HamNoSys, a lexical Sign language notation, into signed pose sequences. |
Rotem Shalev Arkushin; Amit Moryossef; Ohad Fried; |
1134 | Open-Set Likelihood Maximization for Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the observation that existing transductive methods perform poorly in open-set scenarios, we propose a generalization of the maximum likelihood principle, in which latent scores down-weighing the influence of potential outliers are introduced alongside the usual parametric model. |
Malik Boudiaf; Etienne Bennequin; Myriam Tami; Antoine Toubhans; Pablo Piantanida; Celine Hudelot; Ismail Ben Ayed; |
1135 | DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we point out the reason is insufficient Discriminative feature learning for all of the classes. |
Jiawei Ma; Yulei Niu; Jincheng Xu; Shiyuan Huang; Guangxing Han; Shih-Fu Chang; |
1136 | Boosting Accuracy and Robustness of Student Models Via Adaptive Adversarial Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an adaptive adversarial distillation (AdaAD) that involves the teacher model in the knowledge optimization process in a way interacting with the student model to adaptively search for the inner results. |
Bo Huang; Mingyang Chen; Yi Wang; Junda Lu; Minhao Cheng; Wei Wang; |
1137 | METransformer: Radiology Report Generation By Transformer With Multiple Learnable Expert Tokens Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose METransformer, a method to realize this idea with a transformer-based backbone. |
Zhanyu Wang; Lingqiao Liu; Lei Wang; Luping Zhou; |
1138 | PixHt-Lab: Pixel Height Based Light Effect Generation for Image Compositing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce PixHt-Lab, a system leveraging an explicit mapping from pixel height representation to 3D space. |
Yichen Sheng; Jianming Zhang; Julien Philip; Yannick Hold-Geoffroy; Xin Sun; He Zhang; Lu Ling; Bedrich Benes; |
1139 | A Soma Segmentation Benchmark in Full Adult Fly Brain Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop an efficient soma reconstruction method for obtaining accurate soma distribution and morphology information in a full adult fly brain. |
Xiaoyu Liu; Bo Hu; Mingxing Li; Wei Huang; Yueyi Zhang; Zhiwei Xiong; |
1140 | RGB No More: Minimally-Decoded JPEG Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead, our work focuses on training Vision Transformers (ViT) directly from the encoded features of JPEG. |
Jeongsoo Park; Justin Johnson; |
1141 | Revealing The Dark Secrets of Masked Image Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we compare MIM with the long-dominant supervised pre-trained models from two perspectives, the visualizations and the experiments, to uncover their key representational differences. |
Zhenda Xie; Zigang Geng; Jingcheng Hu; Zheng Zhang; Han Hu; Yue Cao; |
1142 | Fine-Grained Classification With Noisy Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate a rarely studied scenario of LNL on fine-grained datasets (LNL-FG), which is more practical and challenging as large inter-class ambiguities among fine-grained classes cause more noisy labels. |
Qi Wei; Lei Feng; Haoliang Sun; Ren Wang; Chenhui Guo; Yilong Yin; |
1143 | CaPriDe Learning: Confidential and Private Decentralized Learning Based on Encryption-Friendly Distillation Loss Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a framework called Confidential and Private Decentralized (CaPriDe) learning, which optimally leverages the power of fully homomorphic encryption (FHE) to enable collaborative learning without compromising on the confidentiality and privacy of data. |
Nurbek Tastan; Karthik Nandakumar; |
1144 | Hybrid Active Learning Via Deep Clustering for Video Action Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on reducing the annotation cost for video action detection which requires costly frame-wise dense annotations. |
Aayush J. Rana; Yogesh S. Rawat; |
1145 | Fine-Grained Image-Text Matching By Cross-Modal Hard Aligning Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to learn fine-grained image-text matching from the perspective of information coding. |
Zhengxin Pan; Fangyu Wu; Bailing Zhang; |
1146 | Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an approach to learn instance-dependent attention patterns, by devising a lightweight connectivity predictor module that estimates the connectivity score of each pair of tokens. |
Cong Wei; Brendan Duke; Ruowei Jiang; Parham Aarabi; Graham W. Taylor; Florian Shkurti; |
1147 | Structured Sparsity Learning for Efficient Video Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing VSR models contain considerable redundant filters, which drag down the inference efficiency. To prune these unimportant filters, we develop a structured pruning scheme called Structured Sparsity Learning (SSL) according to the properties of VSR. |
Bin Xia; Jingwen He; Yulun Zhang; Yitong Wang; Yapeng Tian; Wenming Yang; Luc Van Gool; |
1148 | CAP: Robust Point Cloud Classification Via Semantic and Structural Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, defending against adversarial examples in point cloud data is extremely difficult due to the emergence of various attack strategies. In this work, with the insight of the fact that the adversarial examples in this task still preserve the same semantic and structural information as the original input, we design a novel defense framework for improving the robustness of existing classification models, which consists of two main modules: the attention-based pooling and the dynamic contrastive learning. |
Daizong Ding; Erling Jiang; Yuanmin Huang; Mi Zhang; Wenxuan Li; Min Yang; |
1149 | "Seeing" Electric Network Frequency From Events Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, the performance of Video-based ENF (V-ENF) estimation largely relies on the imaging quality and thus may suffer from significant interference caused by non-ideal sampling, motion, and extreme lighting conditions. In this paper, we show that the ENF can be extracted without the above limitations from a new modality provided by the so-called event camera, a neuromorphic sensor that encodes the light intensity variations and asynchronously emits events with extremely high temporal resolution and high dynamic range. |
Lexuan Xu; Guang Hua; Haijian Zhang; Lei Yu; Ning Qiao; |
1150 | MMVC: Learned Multi-Mode Video Compression With Block-Based Prediction Mode Selection and Density-Adaptive Entropy Coding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose multi-mode video compression (MMVC), a block wise mode ensemble deep video compression framework that selects the optimal mode for feature domain prediction adapting to different motion patterns. |
Bowen Liu; Yu Chen; Rakesh Chowdary Machineni; Shiyu Liu; Hun-Seok Kim; |
1151 | Visual-Tactile Sensing for In-Hand Object Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With the availability of open-source tactile sensors such as DIGIT, research on visual-tactile learning is becoming more accessible and reproducible. Leveraging this tactile sensor, we propose a novel visual-tactile in-hand object reconstruction framework VTacO, and extend it to VTacOH for hand-object reconstruction. |
Wenqiang Xu; Zhenjun Yu; Han Xue; Ruolin Ye; Siqiong Yao; Cewu Lu; |
1152 | VMAP: Vectorised Object Mapping for Neural Field SLAM Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present vMAP, an object-level dense SLAM system using neural field representations. |
Xin Kong; Shikun Liu; Marwan Taher; Andrew J. Davison; |
1153 | Images Speak in Images: A Generalist Painter for In-Context Visual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: But in computer vision, the difficulties for in-context learning lie in that tasks vary significantly in the output representations, thus it is unclear how to define the general-purpose task prompts that the vision model can understand and transfer to out-of-domain tasks. In this work, we present Painter, a generalist model which addresses these obstacles with an "image"-centric solution, that is, to redefine the output of core vision tasks as images, and specify task prompts as also images. |
Xinlong Wang; Wen Wang; Yue Cao; Chunhua Shen; Tiejun Huang; |
1154 | Omni Aggregation Networks for Lightweight Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While lightweight ViT framework has made tremendous progress in image super-resolution, its uni-dimensional self-attention modeling, as well as homogeneous aggregation scheme, limit its effective receptive field (ERF) to include more comprehensive interactions from both spatial and channel dimensions. To tackle these drawbacks, this work proposes two enhanced components under a new Omni-SR architecture. |
Hang Wang; Xuanhong Chen; Bingbing Ni; Yutian Liu; Jinfan Liu; |
1155 | StyLess: Boosting The Transferability of Adversarial Examples Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve attack transferability, we propose a novel attack method called style-less perturbation (StyLess). |
Kaisheng Liang; Bin Xiao; |
1156 | Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose L2G-NeRF, a Local-to-Global registration method for bundle-adjusting Neural Radiance Fields: first, a pixel-wise flexible alignment, followed by a frame-wise constrained parametric alignment. |
Yue Chen; Xingyu Chen; Xuan Wang; Qi Zhang; Yu Guo; Ying Shan; Fei Wang; |
1157 | Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The coexistence of in-distribution and out-of-distribution samples will exacerbate the model overfitting when no distinction is made. To address this problem, we propose a novel uncertainty-aware optimal transport scheme. |
Fan Lu; Kai Zhu; Wei Zhai; Kecheng Zheng; Yang Cao; |
1158 | FJMP: Factorized Joint Multi-Agent Motion Prediction Over Learned Directed Acyclic Interaction Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the problem of generating a set of scene-level, or joint, future trajectory predictions in multi-agent driving scenarios. |
Luke Rowe; Martin Ethier; Eli-Henry Dykhne; Krzysztof Czarnecki; |
1159 | Exploring The Effect of Primitives for Compositional Generalization in Vision-and-Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore the effect of primitives for compositional generalization in V&L. |
Chuanhao Li; Zhen Li; Chenchen Jing; Yunde Jia; Yuwei Wu; |
1160 | Correlational Image Modeling for Self-Supervised Visual Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Correlational Image Modeling (CIM), a novel but surprisingly effective approach to self-supervised visual pre-training. |
Wei Li; Jiahao Xie; Chen Change Loy; |
1161 | DC2: Dual-Camera Defocus Control By Learning To Refocus Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: At the same time, many smartphones now have multiple cameras with different fixed apertures – specifically, an ultra-wide camera with wider field of view and deeper DoF and a higher resolution primary camera with shallower DoF. In this work, we propose DC^2, a system for defocus control for synthetically varying camera aperture, focus distance and arbitrary defocus effects by fusing information from such a dual-camera system. |
Hadi Alzayer; Abdullah Abuolaim; Leung Chun Chan; Yang Yang; Ying Chen Lou; Jia-Bin Huang; Abhishek Kar; |
1162 | MISC210K: A Large-Scale Dataset for Multi-Instance Semantic Correspondence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We design a dual-path collaborative learning pipeline to train the instance-level co-segmentation task and the fine-grained correspondence task together. |
Yixuan Sun; Yiwen Huang; Haijing Guo; Yuzhou Zhao; Runmin Wu; Yizhou Yu; Weifeng Ge; Wenqiang Zhang; |
1163 | Self-Supervised Implicit Glyph Attention for Text Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the aforementioned issues, we propose a novel attention mechanism for STR, self-supervised implicit glyph attention (SIGA). |
Tongkun Guan; Chaochen Gu; Jingzheng Tu; Xue Yang; Qi Feng; Yudi Zhao; Wei Shen; |
1164 | ACL-SPC: Adaptive Closed-Loop System for Self-Supervised Point Cloud Completion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although there has been steep progress in the supervised methods on the synthetic point cloud completion task, it is hardly applicable in real-world scenarios due to the domain gap between the synthetic and real-world datasets or the requirement of prior information. To overcome these limitations, we propose a novel self-supervised framework ACL-SPC for point cloud completion to train and test on the same data. |
Sangmin Hong; Mohsen Yavartanoo; Reyhaneh Neshatavar; Kyoung Mu Lee; |
1165 | MAGE: MAsked Generative Encoder To Unify Representation Learning and Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose MAsked Generative Encoder (MAGE), the first framework to unify SOTA image generation and self-supervised representation learning. |
Tianhong Li; Huiwen Chang; Shlok Mishra; Han Zhang; Dina Katabi; Dilip Krishnan; |
1166 | Focus on Details: Online Multi-Object Tracking With Diverse Fine-Grained Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose exploring diverse fine-grained representation, which describes appearance comprehensively from global and local perspectives. |
Hao Ren; Shoudong Han; Huilin Ding; Ziwen Zhang; Hongwei Wang; Faquan Wang; |
1167 | DiffPose: Toward More Reliable 3D Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by their capability, we explore a novel pose estimation framework (DiffPose) that formulates 3D pose estimation as a reverse diffusion process. |
Jia Gong; Lin Geng Foo; Zhipeng Fan; Qiuhong Ke; Hossein Rahmani; Jun Liu; |
1168 | Lift3D: Synthesize 3D Training Data By Lifting 2D GAN to 3D Generative Radiance Field Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Lift3D, an inverted 2D-to-3D generation framework to achieve the data generation objectives. |
Leheng Li; Qing Lian; Luozhou Wang; Ningning Ma; Ying-Cong Chen; |
1169 | Hunting Sparsity: Density-Guided Contrastive Learning for Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we argue that adequate supervision can be extracted directly from the geometry of feature space. |
Xiaoyang Wang; Bingfeng Zhang; Limin Yu; Jimin Xiao; |
1170 | Learning Analytical Posterior Probability for Human Mesh Recovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite various probabilistic methods for modeling the uncertainty and ambiguity in human mesh recovery, their overall precision is limited because existing formulations for joint rotations are either not constrained to SO(3) or difficult to learn for neural networks. To address such an issue, we derive a novel analytical formulation for learning posterior probability distributions of human joint rotations conditioned on bone directions in a Bayesian manner, and based on this, we propose a new posterior-guided framework for human mesh recovery. |
Qi Fang; Kang Chen; Yinghui Fan; Qing Shuai; Jiefeng Li; Weidong Zhang; |
1171 | Looking Through The Glass: Neural Surface Reconstruction Against High Specular Reflections Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The complex ambiguity in these scenes violates the multi-view consistency, then makes it challenging for recent methods to reconstruct target objects correctly. To remedy this issue, we present a novel surface reconstruction framework, NeuS-HSR, based on implicit neural rendering. |
Jiaxiong Qiu; Peng-Tao Jiang; Yifan Zhu; Ze-Xin Yin; Ming-Ming Cheng; Bo Ren; |
1172 | Non-Contrastive Unsupervised Learning of Physiological Signals From Video Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present the first non-contrastive unsupervised learning framework for signal regression to mitigate the need for labelled video data. |
Jeremy Speth; Nathan Vance; Patrick Flynn; Adam Czajka; |
1173 | FashionSAP: Symbols and Attributes Prompt for Fine-Grained Fashion Vision-Language Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a method for fine-grained fashion vision-language pre-training based on fashion Symbols and Attributes Prompt (FashionSAP) to model fine-grained multi-modalities fashion attributes and characteristics. |
Yunpeng Han; Lisai Zhang; Qingcai Chen; Zhijian Chen; Zhonghua Li; Jianxin Yang; Zhao Cao; |
1174 | PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds Via Pretrained Image-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores an alternative way for low-shot part segmentation of 3D point clouds by leveraging a pretrained image-language model, GLIP, which achieves superior performance on open-vocabulary 2D detection. |
Minghua Liu; Yinhao Zhu; Hong Cai; Shizhong Han; Zhan Ling; Fatih Porikli; Hao Su; |
1175 | An Erudite Fine-Grained Visual Classification Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an erudite FGVC model jointly trained by several different datasets, which can efficiently and accurately predict an object’s fine-grained label across the combined label space. |
Dongliang Chang; Yujun Tong; Ruoyi Du; Timothy Hospedales; Yi-Zhe Song; Zhanyu Ma; |
1176 | MAGVLT: Masked Generative Vision-and-Language Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore a unified generative vision-and-language (VL) model that can produce both images and text sequences. |
Sungwoong Kim; Daejin Jo; Donghoon Lee; Jongmin Kim; |
1177 | Structure Aggregation for Cross-Spectral Stereo Image Guided Denoising Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, for the first time, we propose a guided denoising framework for cross-spectral stereo images. |
Zehua Sheng; Zhu Yu; Xiongwei Liu; Si-Yuan Cao; Yuqi Liu; Hui-Liang Shen; Huaqi Zhang; |
1178 | Decoupling Human and Camera Motion From Videos in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a method to reconstruct global human trajectories from videos in the wild. |
Vickie Ye; Georgios Pavlakos; Jitendra Malik; Angjoo Kanazawa; |
1179 | DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-Training Via Word-Region Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents DetCLIPv2, an efficient and scalable training framework that incorporates large-scale image-text pairs to achieve open-vocabulary object detection (OVD). |
Lewei Yao; Jianhua Han; Xiaodan Liang; Dan Xu; Wei Zhang; Zhenguo Li; Hang Xu; |
1180 | Adversarially Robust Neural Architecture Search for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, current graph NAS approaches lack robust design and are vulnerable to adversarial attacks. To tackle these challenges, we propose a novel Robust Neural Architecture search framework for GNNs (G-RNA). |
Beini Xie; Heng Chang; Ziwei Zhang; Xin Wang; Daixin Wang; Zhiqiang Zhang; Rex Ying; Wenwu Zhu; |
1181 | Affordance Grounding From Demonstration Video To Target Image Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle them, we propose Affordance Transformer (Afformer), which has a fine-grained transformer-based decoder that gradually refines affordance grounding. |
Joya Chen; Difei Gao; Kevin Qinghong Lin; Mike Zheng Shou; |
1182 | GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the problem of 3D semantic segmentation from raw point clouds. |
Zihui Zhang; Bo Yang; Bing Wang; Bo Li; |
1183 | RONO: Robust Discriminative Learning With Noisy Labels for 2D-3D Cross-Modal Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle the problem, this paper proposes a robust 2D-3D retrieval framework (RONO) to robustly learn from noisy multimodal data. |
Yanglin Feng; Hongyuan Zhu; Dezhong Peng; Xi Peng; Peng Hu; |
1184 | One-Stage 3D Whole-Body Mesh Recovery With Component Aware Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a one-stage pipeline for expressive whole-body mesh recovery, named OSX, without separate networks for each part. |
Jing Lin; Ailing Zeng; Haoqian Wang; Lei Zhang; Yu Li; |
1185 | Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This caveat naturally raises a series of interesting questions about the impact of PEs on accuracy, privacy, prediction consistency, etc. To tackle these issues, we propose a Masked Jigsaw Puzzle (MJP) position embedding method. |
Bin Ren; Yahui Liu; Yue Song; Wei Bi; Rita Cucchiara; Nicu Sebe; Wei Wang; |
1186 | LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a diffusion model named LayoutDiffusion that can obtain higher generation quality and greater controllability than the previous works. |
Guangcong Zheng; Xianpan Zhou; Xuewei Li; Zhongang Qi; Ying Shan; Xi Li; |
1187 | DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, a novel framework termed Mathematical Architecture Design for Deep CNN (DeepMAD) is proposed to design high-performance CNN models in a principled way. |
Xuan Shen; Yaohua Wang; Ming Lin; Yilun Huang; Hao Tang; Xiuyu Sun; Yanzhi Wang; |
1188 | DISC: Learning From Noisy Labels Via Dynamic Instance-Specific Selection and Correction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We observe that the memorization strength of DNNs towards each instance is different and can be represented by the confidence value, which becomes larger and larger during the training process. Based on this, we propose a Dynamic Instance-specific Selection and Correction method (DISC) for learning from noisy labels (LNL). |
Yifan Li; Hu Han; Shiguang Shan; Xilin Chen; |
1189 | BBDM: Image-to-Image Translation With Brownian Bridge Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, a novel image-to-image translation method based on the Brownian Bridge Diffusion Model(BBDM) is proposed, which models image-to-image translation as a stochastic Brownian Bridge process, and learns the translation between two domains directly through the bidirectional diffusion process rather than a conditional generation process. |
Bo Li; Kaitao Xue; Bin Liu; Yu-Kun Lai; |
1190 | ConQueR: Query Contrast Voxel-DETR for 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a simple yet effective sparse 3D detector, named Query Contrast Voxel-DETR (ConQueR), to eliminate the challenging false positives, and achieve more accurate and sparser predictions. |
Benjin Zhu; Zhe Wang; Shaoshuai Shi; Hang Xu; Lanqing Hong; Hongsheng Li; |
1191 | Probing Neural Representations of Scene Perception in A Hippocampally Dependent Task Using Artificial Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a novel scene perception benchmark inspired by a hippocampal dependent task, designed to probe the ability of DNNs to transform scenes viewed from different egocentric perspectives. |
Markus Frey; Christian F. Doeller; Caswell Barry; |
1192 | Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Imagen Editor, a cascaded diffusion model, built by fine-tuning Imagen on text-guided image inpainting. |
Su Wang; Chitwan Saharia; Ceslee Montgomery; Jordi Pont-Tuset; Shai Noy; Stefano Pellegrini; Yasumasa Onoe; Sarah Laszlo; David J. Fleet; Radu Soricut; Jason Baldridge; Mohammad Norouzi; Peter Anderson; William Chan; |
1193 | Robust Multiview Point Cloud Registration With Reliable Pose Graph Initialization and History Reweighting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a new method for the multiview registration of point clouds. |
Haiping Wang; Yuan Liu; Zhen Dong; Yulan Guo; Yu-Shen Liu; Wenping Wang; Bisheng Yang; |
1194 | A Probabilistic Framework for Lifelong Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, existing TTA approaches also lack the ability to provide reliable uncertainty estimates, which is crucial when distribution shifts occur between the source and target domain. To address these issues, we present PETAL (Probabilistic lifElong Test-time Adaptation with seLf-training prior), which solves lifelong TTA using a probabilistic approach, and naturally results in (1) a student-teacher framework, where the teacher model is an exponential moving average of the student model, and (2) regularizing the model updates at inference time using the source model as a regularizer. |
Dhanajit Brahma; Piyush Rai; |
1195 | Sound to Visual Scene Generation By Audio-to-Visual Latent Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a method for generating an image of a scene from sound. |
Kim Sung-Bin; Arda Senocak; Hyunwoo Ha; Andrew Owens; Tae-Hyun Oh; |
1196 | OSRT: Omnidirectional Image Super-Resolution With Distortion-Aware Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Fisheye downsampling, which mimics the real-world imaging process and synthesizes more realistic low-resolution samples. |
Fanghua Yu; Xintao Wang; Mingdeng Cao; Gen Li; Ying Shan; Chao Dong; |
1197 | Text With Knowledge Graph Augmented Transformer for Video Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a text with knowledge graph augmented transformer (TextKG) for video captioning. |
Xin Gu; Guang Chen; Yufei Wang; Libo Zhang; Tiejian Luo; Longyin Wen; |
1198 | Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we improve the following three aspects of the contrastive pre-training pipeline: dataset noise, model initialization and the training objective. |
Filip Radenovic; Abhimanyu Dubey; Abhishek Kadian; Todor Mihaylov; Simon Vandenhende; Yash Patel; Yi Wen; Vignesh Ramanathan; Dhruv Mahajan; |
1199 | PointCMP: Contrastive Mask Prediction for Self-Supervised Learning on Point Cloud Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a contrastive mask prediction (PointCMP) framework for self-supervised learning on point cloud videos. |
Zhiqiang Shen; Xiaoxiao Sheng; Longguang Wang; Yulan Guo; Qiong Liu; Xi Zhou; |
1200 | IS-GGT: Iterative Scene Graph Generation With Generative Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces a generative transformer-based approach to generating scene graphs beyond link prediction. |
Sanjoy Kundu; Sathyanarayanan N. Aakur; |
1201 | Meta Omnium: A Benchmark for General-Purpose Learning-To-Learn Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This naturally raises the question of whether there is any few-shot meta-learning algorithm capable of generalizing across these diverse task types. To support the community in answering this question, we introduce Meta Omnium, a dataset-of-datasets spanning multiple vision tasks including recognition, keypoint localization, semantic segmentation and regression. |
Ondrej Bohdal; Yinbing Tian; Yongshuo Zong; Ruchika Chavhan; Da Li; Henry Gouk; Li Guo; Timothy Hospedales; |
1202 | Multimodal Industrial Anomaly Detection Via Hybrid Fusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Multi-3D-Memory (M3DM), a novel multimodal anomaly detection method with hybrid fusion scheme: firstly, we design an unsupervised feature fusion with patch-wise contrastive learning to encourage the interaction of different modal features; secondly, we use a decision layer fusion with multiple memory banks to avoid loss of information and additional novelty classifiers to make the final decision. |
Yue Wang; Jinlong Peng; Jiangning Zhang; Ran Yi; Yabiao Wang; Chengjie Wang; |
1203 | BEV@DC: Bird’s-Eye View Assisted Training for Depth Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose BEV@DC, a more efficient and powerful multi-modal training scheme, to boost the performance of image-guided depth completion. |
Wending Zhou; Xu Yan; Yinghong Liao; Yuankai Lin; Jin Huang; Gangming Zhao; Shuguang Cui; Zhen Li; |
1204 | BoxTeacher: Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, we find that box-supervised methods can produce some fine segmentation masks, and we wonder whether the detectors could learn from these fine masks while ignoring low-quality masks. To answer this question, we present BoxTeacher, an efficient and end-to-end training framework for high-performance weakly supervised instance segmentation, which leverages a sophisticated teacher to generate high-quality masks as pseudo labels. |
Tianheng Cheng; Xinggang Wang; Shaoyu Chen; Qian Zhang; Wenyu Liu; |
1205 | Change-Aware Sampling and Contrastive Learning for Satellite Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage characteristics unique to satellite images to learn better self-supervised features. |
Utkarsh Mall; Bharath Hariharan; Kavita Bala; |
1206 | Large-Scale Training Data Search for Object Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider a scenario where we have access to the target domain, but cannot afford on-the-fly training data annotation, and instead would like to construct an alternative training set from a large-scale data pool such that a competitive model can be obtained. We propose a search and pruning (SnP) solution to this training data search problem, tailored to object re-identification (re-ID), an application aiming to match the same object captured by different cameras. |
Yue Yao; Tom Gedeon; Liang Zheng; |
1207 | Devil Is in The Queries: Advancing Mask Transformers for Real-World Medical Image Segmentation and Out-of-Distribution Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we adopt the concept of object queries in Mask transformers to formulate semantic segmentation as a soft cluster assignment. |
Mingze Yuan; Yingda Xia; Hexin Dong; Zifan Chen; Jiawen Yao; Mingyan Qiu; Ke Yan; Xiaoli Yin; Yu Shi; Xin Chen; Zaiyi Liu; Bin Dong; Jingren Zhou; Le Lu; Ling Zhang; Li Zhang; |
1208 | KD-DLGAN: Data Limited Image Generation Via Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the recent advances in knowledge distillation (KD), we propose KD-DLGAN, a knowledge-distillation based generation framework that introduces pre-trained vision-language models for training effective data-limited image generation models. |
Kaiwen Cui; Yingchen Yu; Fangneng Zhan; Shengcai Liao; Shijian Lu; Eric P. Xing; |
1209 | Batch Model Consolidation: A Multi-Task Model Consolidation Framework Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With the intuition derived from the widely applied mini-batch training, we propose Batch Model Consolidation (BMC) to support more realistic CL under conditions where multiple agents are exposed to a range of tasks. |
Iordanis Fostiropoulos; Jiaye Zhu; Laurent Itti; |
1210 | SelfME: Self-Supervised Motion Learning for Micro-Expression Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While deep learning-based ME recognition (MER) methods achieved impressive success, these methods typically require pre-processing using conventional optical flow-based methods to extract facial motions as inputs. To overcome this limitation, we propose a novel MER framework using self-supervised learning to extract facial motion for ME (SelfME). |
Xinqi Fan; Xueli Chen; Mingjie Jiang; Ali Raza Shahid; Hong Yan; |
1211 | DR2: Diffusion-Based Robust Degradation Remover for Blind Face Restoration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it is expensive and infeasible to include every type of degradation to cover real-world cases in the training data. To tackle this robustness issue, we propose Diffusion-based Robust Degradation Remover (DR2) to first transform the degraded image to a coarse but degradation-invariant prediction, then employ an enhancement module to restore the coarse prediction to a high-quality image. |
Zhixin Wang; Ziying Zhang; Xiaoyun Zhang; Huangjie Zheng; Mingyuan Zhou; Ya Zhang; Yanfeng Wang; |
1212 | T-SEA: Transfer-Based Self-Ensemble Attack on Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, most existing transfer-based approaches rely on ensembling multiple models to boost the attack transferability, which is time- and resource-intensive, not to mention the difficulty of obtaining diverse models on the same task. To address this limitation, in this work, we focus on the single-model transfer-based black-box attack on object detection, utilizing only one model to achieve a high-transferability adversarial attack on multiple black-box detectors. |
Hao Huang; Ziyan Chen; Huanran Chen; Yongtao Wang; Kevin Zhang; |
1213 | LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an effective LiDAR-based method to build semantic maps. |
Song Wang; Wentong Li; Wenyu Liu; Xiaolu Liu; Jianke Zhu; |
1214 | NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Those segmentations can help compare the semantics in the corresponding scales, but lack a wider view of larger temporal spans, especially when the video is complex and structured. Therefore, we present two abstractive levels of temporal segmentations and study their hierarchy to the existing fine-grained levels. |
Haoqian Wu; Keyu Chen; Haozhe Liu; Mingchen Zhuge; Bing Li; Ruizhi Qiao; Xiujun Shu; Bei Gan; Liangsheng Xu; Bo Ren; Mengmeng Xu; Wentian Zhang; Raghavendra Ramachandra; Chia-Wen Lin; Bernard Ghanem; |
1215 | Token Contrast for Weakly-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Though the recent Vision Transformer (ViT) can remedy this flaw, we observe it also brings the over-smoothing issue, i.e., the final patch tokens tend to be uniform. In this work, we propose Token Contrast (ToCo) to address this issue and further explore the virtue of ViT for WSSS. |
Lixiang Ru; Heliang Zheng; Yibing Zhan; Bo Du; |
1216 | LightedDepth: Video Depth Estimation in Light of Limited Inference View Angles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While recent works consider it a simplified structure-from-motion (SfM) problem, it still differs from SfM in that significantly fewer view angles are available at inference. This setting, however, suits mono-depth and optical flow estimation. This observation motivates us to decouple video depth estimation into two components, a normalized pose estimation over a flowmap and a logged residual depth estimation over a mono-depth map. |
Shengjie Zhu; Xiaoming Liu; |
1217 | Uncertainty-Aware Unsupervised Image Deblurring With Deep Residual Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By organically integrating the respective strengths of deep priors and hand-crafted priors, we propose an unsupervised semi-blind deblurring model which recovers the latent image from the blurry image and inaccurate blur kernel. |
Xiaole Tang; Xile Zhao; Jun Liu; Jianli Wang; Yuchun Miao; Tieyong Zeng; |
1218 | HouseDiffusion: Vector Floorplan Generation Via A Diffusion Model With Discrete and Continuous Denoising Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The paper presents a novel approach for vector-floorplan generation via a diffusion model, which denoises 2D coordinates of room/door corners with two inference objectives: 1) a single-step noise as the continuous quantity to precisely invert the continuous forward process; and 2) the final 2D coordinate as the discrete quantity to establish geometric incident relationships such as parallelism, orthogonality, and corner-sharing. |
Mohammad Amin Shabani; Sepidehsadat Hosseini; Yasutaka Furukawa; |
1219 | FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, we propose FedDM to build the global training objective from multiple local surrogate functions, which enables the server to gain a more global view of the loss landscape. |
Yuanhao Xiong; Ruochen Wang; Minhao Cheng; Felix Yu; Cho-Jui Hsieh; |
1220 | V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on V2X-Seq, we introduce three new tasks for vehicle-infrastructure cooperative (VIC) autonomous driving: VIC3D Tracking, Online-VIC Forecasting, and Offline-VIC Forecasting. |
Haibao Yu; Wenxian Yang; Hongzhi Ruan; Zhenwei Yang; Yingjuan Tang; Xu Gao; Xin Hao; Yifeng Shi; Yifeng Pan; Ning Sun; Juan Song; Jirui Yuan; Ping Luo; Zaiqing Nie; |
1221 | PSVT: End-to-End Multi-Person 3D Pose and Shape Estimation With Progressive Video Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new end-to-end multi-person 3D Pose and Shape estimation framework with progressive Video Transformer, termed PSVT. |
Zhongwei Qiu; Qiansheng Yang; Jian Wang; Haocheng Feng; Junyu Han; Errui Ding; Chang Xu; Dongmei Fu; Jingdong Wang; |
1222 | Bit-Shrinking: Limiting Instantaneous Sharpness for Improving Post-Training Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To smooth the rugged loss surface, we propose to keep the sharpness term small and stable during optimization. |
Chen Lin; Bo Peng; Zheyang Li; Wenming Tan; Ye Ren; Jun Xiao; Shiliang Pu; |
1223 | LSTFE-Net: Long Short-Term Feature Enhancement Network for Video Small Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that context information from the long-term frame and temporal information from the short-term frame are two useful cues for video small object detection. To fully utilize these two cues, we propose a long short-term feature enhancement network (LSTFE-Net) for video small object detection. |
Jinsheng Xiao; Yuanxu Wu; Yunhua Chen; Shurui Wang; Zhongyuan Wang; Jiayi Ma; |
1224 | MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Most previous UDA methods struggle with classes that have a similar visual appearance on the target domain as no ground truth is available to learn the slight appearance differences. To address this problem, we propose a Masked Image Consistency (MIC) module to enhance UDA by learning spatial context relations of the target domain as additional clues for robust visual recognition. |
Lukas Hoyer; Dengxin Dai; Haoran Wang; Luc Van Gool; |
1225 | Bridging The Gap Between Model Explanations in Partially Annotated Multi-Label Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We observe that the explanation of two models, trained with full and partial labels each, highlights similar regions but with different scaling, where the latter tends to have lower attribution scores. Based on these findings, we propose to boost the attribution scores of the model trained with partial labels to make its explanation resemble that of the model trained with full labels. |
Youngwook Kim; Jae Myung Kim; Jieun Jeong; Cordelia Schmid; Zeynep Akata; Jungwoo Lee; |
1226 | SkyEye: Self-Supervised Bird’s-Eye-View Semantic Mapping Using Monocular Frontal View Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing approaches for generating these maps still follow a fully supervised training paradigm and hence rely on large amounts of annotated BEV data. In this work, we address this limitation by proposing the first self-supervised approach for generating a BEV semantic map using a single monocular image from the frontal view (FV). |
Nikhil Gosala; Kürsat Petek; Paulo L. J. Drews-Jr; Wolfram Burgard; Abhinav Valada; |
1227 | Unifying Vision, Text, and Layout for Universal Document Processing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Universal Document Processing (UDOP), a foundation Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation. |
Zineng Tang; Ziyi Yang; Guoxin Wang; Yuwei Fang; Yang Liu; Chenguang Zhu; Michael Zeng; Cha Zhang; Mohit Bansal; |
1228 | SparsePose: Sparse-View Camera Pose Regression and Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose Sparse-View Camera Pose Regression and Refinement (SparsePose) for recovering accurate camera poses given a sparse set of wide-baseline images (fewer than 10). |
Samarth Sinha; Jason Y. Zhang; Andrea Tagliasacchi; Igor Gilitschenski; David B. Lindell; |
1229 | Learning Audio-Visual Source Localization Via False Negative Aware Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, for an audio sample, treating the frames from the same audio class as negative samples may mislead the model and therefore harm the learned representations (e.g., the audio of a siren wailing may reasonably correspond to the ambulances in multiple images). Based on this observation, we propose a new learning strategy named False Negative Aware Contrastive (FNAC) to mitigate the problem of misleading the training with such false negative samples. |
Weixuan Sun; Jiayi Zhang; Jianyuan Wang; Zheyuan Liu; Yiran Zhong; Tianpeng Feng; Yandong Guo; Yanhao Zhang; Nick Barnes; |
1230 | VoxFormer: Sparse Voxel Transformer for Camera-Based 3D Semantic Scene Completion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This appealing ability is vital for recognition and understanding. To enable such capability in AI systems, we propose VoxFormer, a Transformer-based semantic scene completion framework that can output complete 3D volumetric semantics from only 2D images. |
Yiming Li; Zhiding Yu; Christopher Choy; Chaowei Xiao; Jose M. Alvarez; Sanja Fidler; Chen Feng; Anima Anandkumar; |
1231 | Joint Video Multi-Frame Interpolation and Deblurring Under Unknown Exposure Time Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim ambitiously for a more realistic and challenging task – joint video multi-frame interpolation and deblurring under unknown exposure time. |
Wei Shang; Dongwei Ren; Yi Yang; Hongzhi Zhang; Kede Ma; Wangmeng Zuo; |
1232 | Flow Supervision for Deformable NeRF Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we present a new method for deformable NeRF that can directly use optical flow as supervision. |
Chaoyang Wang; Lachlan Ewen MacDonald; László A. Jeni; Simon Lucey; |
1233 | MMG-Ego4D: Multimodal Generalization in Egocentric Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study a novel problem in egocentric action recognition, which we term as "Multimodal Generalization" (MMG). |
Xinyu Gong; Sreyas Mohan; Naina Dhingra; Jean-Charles Bazin; Yilei Li; Zhangyang Wang; Rakesh Ranjan; |
1234 | Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel text-to-parameter translation method (T2P) to achieve zero-shot text-driven game character auto-creation. |
Rui Zhao; Wei Li; Zhipeng Hu; Lincheng Li; Zhengxia Zou; Zhenwei Shi; Changjie Fan; |
1235 | PIVOT: Prompting for Video Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we address the problem of continual learning for video data. |
Andrés Villa; Juan León Alcázar; Motasem Alfarra; Kumail Alhamoud; Julio Hurtado; Fabian Caba Heilbron; Alvaro Soto; Bernard Ghanem; |
1236 | Dual-Bridging With Adversarial Noise Generation for Domain Adaptive RPPG Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, to improve the generalization ability of rPPG models, we propose a dual-bridging network to reduce the domain discrepancy by aligning intermediate domains and synthesizing the target noise in the source domain for better noise reduction. |
Jingda Du; Si-Qi Liu; Bochao Zhang; Pong C. Yuen; |
1237 | Panoptic Video Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Towards building comprehensive real-world visual perception systems, we propose and study a new problem called panoptic video scene graph generation (PVSG). |
Jingkang Yang; Wenxuan Peng; Xiangtai Li; Zujin Guo; Liangyu Chen; Bo Li; Zheng Ma; Kaiyang Zhou; Wayne Zhang; Chen Change Loy; Ziwei Liu; |
1238 | 3D Video Object Detection With Learnable Object-Centric Global Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose BA-Det, an end-to-end optimizable object detector with object-centric temporal correspondence learning and featuremetric object bundle adjustment. |
Jiawei He; Yuntao Chen; Naiyan Wang; Zhaoxiang Zhang; |
1239 | Improving The Transferability of Adversarial Samples By Path-Augmented Method Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the pitfall, we propose the Path-Augmented Method (PAM). |
Jianping Zhang; Jen-tse Huang; Wenxuan Wang; Yichen Li; Weibin Wu; Xiaosen Wang; Yuxin Su; Michael R. Lyu; |
1240 | Robust Mean Teacher for Continual and Gradual Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to use the symmetric cross-entropy as a consistency loss for mean teachers in the setting of TTA, and show that it is better suited than the commonly used cross-entropy. |
Mario Döbler; Robert A. Marsden; Bin Yang; |
1241 | Understanding Imbalanced Semantic Segmentation Through Neural Collapse Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. |
Zhisheng Zhong; Jiequan Cui; Yibo Yang; Xiaoyang Wu; Xiaojuan Qi; Xiangyu Zhang; Jiaya Jia; |
1242 | MOVES: Manipulated Objects in Video Enable Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method that uses manipulation to learn to understand the objects people hold as well as hand-object contact. |
Richard E. L. Higgins; David F. Fouhey; |
1243 | Generating Holistic 3D Human Motion From Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work addresses the problem of generating 3D holistic body motions from human speech. |
Hongwei Yi; Hualin Liang; Yifei Liu; Qiong Cao; Yandong Wen; Timo Bolkart; Dacheng Tao; Michael J. Black; |
1244 | NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies implicit surface reconstruction leveraging differentiable ray casting. |
Bowen Cai; Jinchi Huang; Rongfei Jia; Chengfei Lv; Huan Fu; |
1245 | HOICLIP: Efficient Knowledge Transfer for HOI Detection With Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel HOI detection framework that efficiently extracts prior knowledge from CLIP and achieves better generalization. |
Shan Ning; Longtian Qiu; Yongfei Liu; Xuming He; |
1246 | ShadowNeuS: Neural SDF Reconstruction By Shadow Ray Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the other hand, shadow rays between the light source and the scene have yet to be considered. Therefore, we propose a novel shadow ray supervision scheme that optimizes both the samples along the ray and the ray location. |
Jingwang Ling; Zhibo Wang; Feng Xu; |
1247 | Generalized UAV Object Detection Via Frequency Domain Disentanglement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When deploying the Unmanned Aerial Vehicles object detection (UAV-OD) network to complex and unseen real-world scenarios, the generalization ability is usually reduced due to the domain shift. To address this issue, this paper proposes a novel frequency domain disentanglement method to improve the UAV-OD generalization. |
Kunyu Wang; Xueyang Fu; Yukun Huang; Chengzhi Cao; Gege Shi; Zheng-Jun Zha; |
1248 | Boosting Weakly-Supervised Temporal Action Localization With Text Information Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to leverage the text information to boost WTAL from two aspects, i.e., (a) the discriminative objective to enlarge the inter-class difference, thus reducing the over-complete; (b) the generative objective to enhance the intra-class integrity, thus finding more complete temporal boundaries. |
Guozhang Li; De Cheng; Xinpeng Ding; Nannan Wang; Xiaoyu Wang; Xinbo Gao; |
1249 | DINER: Disorder-Invariant Implicit Neural Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the capacity of INR is limited by the spectral bias in the network training. In this paper, we find that such a frequency-related problem could be largely solved by re-arranging the coordinates of the input signal, for which we propose the disorder-invariant implicit neural representation (DINER) by augmenting a traditional INR backbone with a hash-table. |
Shaowen Xie; Hao Zhu; Zhen Liu; Qi Zhang; You Zhou; Xun Cao; Zhan Ma; |
1250 | A Light Touch Approach to Teaching Transformers Multi-View Geometry Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This flexibility can be problematic in tasks that involve multiple-view geometry, due to the near-infinite possible variations in 3D shapes and viewpoints (requiring flexibility), and the precise nature of projective geometry (obeying rigid laws). To resolve this conundrum, we propose a "light touch" approach, guiding visual Transformers to learn multiple-view geometry but allowing them to break free when needed. |
Yash Bhalgat; João F. Henriques; Andrew Zisserman; |
1251 | Trade-Off Between Robustness and Accuracy of Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Trade-off between Robustness and Accuracy of Vision Transformers (TORA-ViTs), which aims to efficiently transfer ViT models pretrained on natural tasks for both accuracy and robustness. |
Yanxi Li; Chang Xu; |
1252 | Focused and Collaborative Feedback Integration for Interactive Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing methods overlook the importance of feedback or simply concatenate it with the original input, leading to underutilization of feedback and an increase in the number of required annotations. To address this, we propose an approach called Focused and Collaborative Feedback Integration (FCFI) to fully exploit the feedback for click-based interactive image segmentation. |
Qiaoqiao Wei; Hui Zhang; Jun-Hai Yong; |
1253 | Class Prototypes Based Contrastive Learning for Classifying Multi-Label and Fine-Grained Educational Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents an approach for detecting educational content in online videos. |
Rohit Gupta; Anirban Roy; Claire Christensen; Sujeong Kim; Sarah Gerard; Madeline Cincebeaux; Ajay Divakaran; Todd Grindal; Mubarak Shah; |
1254 | Deep Graph-Based Spatial Consistency for Robust Non-Rigid Point Cloud Registration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Graph-based Spatial Consistency Network (GraphSCNet) to filter outliers for non-rigid registration. |
Zheng Qin; Hao Yu; Changjian Wang; Yuxing Peng; Kai Xu; |
1255 | Source-Free Adaptive Gaze Estimation By Uncertainty Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to privacy and efficiency concerns, simultaneous access to annotated source data and to-be-predicted target data can be challenging. In light of this, we present an unsupervised source-free domain adaptation approach for gaze estimation, which adapts a source-trained gaze estimator to unlabeled target domains without source data. |
Xin Cai; Jiabei Zeng; Shiguang Shan; Xilin Chen; |
1256 | Slide-Transformer: Hierarchical Vision Transformer With Local Self-Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel local attention module, Slide Attention, which leverages common convolution operations to achieve high efficiency, flexibility and generalizability. |
Xuran Pan; Tianzhu Ye; Zhuofan Xia; Shiji Song; Gao Huang; |
1257 | NeRF-Supervised Deep Stereo Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a novel framework for training deep stereo networks effortlessly and without any ground-truth. |
Fabio Tosi; Alessio Tonioni; Daniele De Gregorio; Matteo Poggi; |
1258 | Decoupled Multimodal Distilling for Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the impressive performance of previous MER approaches, the inherent multimodal heterogeneities persist, and the contribution of different modalities varies significantly. In this work, we mitigate this issue by proposing a decoupled multimodal distillation (DMD) approach that facilitates flexible and adaptive crossmodal knowledge distillation, aiming to enhance the discriminative features of each modality. |
Yong Li; Yuanzhi Wang; Zhen Cui; |
1259 | SuperDisco: Super-Class Discovery Improves Visual Recognition for The Long-Tail Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Humans, by contrast, effortlessly handle the long-tailed recognition challenge, since they can learn the tail representation based on different levels of semantic abstraction, making the learned tail features more discriminative. This phenomenon motivated us to propose SuperDisco, an algorithm that discovers super-class representations for long-tailed recognition using a graph model. |
Yingjun Du; Jiayi Shen; Xiantong Zhen; Cees G. M. Snoek; |
1260 | DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by traditional structure-from-motion (SfM) principles, we propose the DualRefine model, which tightly couples depth and pose estimation through a feedback loop. |
Antyanta Bangunharcana; Ahmed Magd; Kyung-Soo Kim; |
1261 | Improving Generalization of Meta-Learning With Inverted Regularization at Inner-Level Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new regularization mechanism for meta-learning — Minimax-Meta Regularization, which employs inverted regularization at the inner loop and ordinary regularization at the outer loop during training. |
Lianzhe Wang; Shiji Zhou; Shanghang Zhang; Xu Chu; Heng Chang; Wenwu Zhu; |
1262 | SmallCap: Lightweight Image Captioning Prompted With Retrieval Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As an alternative to large models, we present SmallCap, which generates a caption conditioned on an input image and related captions retrieved from a datastore. |
Rita Ramos; Bruno Martins; Desmond Elliott; Yova Kementchedjhieva; |
1263 | Unifying Layout Generation With A Decoupled Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Diverse application scenarios pose a significant challenge in unifying various layout generation subtasks, including conditional and unconditional generation. In this paper, we propose a Layout Diffusion Generative Model (LDGM) to achieve such unification with a single decoupled diffusion model. |
Mude Hui; Zhizheng Zhang; Xiaoyi Zhang; Wenxuan Xie; Yuwang Wang; Yan Lu; |
1264 | Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Implicit Two Hands (Im2Hands), the first neural implicit representation of two interacting hands. |
Jihyun Lee; Minhyuk Sung; Honggyu Choi; Tae-Kyun Kim; |
1265 | Long-Term Visual Localization With Mobile Sensors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the remarkable advances in image matching and pose estimation, image-based localization of a camera in a temporally-varying outdoor environment is still a challenging problem due to huge appearance disparity between query and reference images caused by illumination, seasonal and structural changes. In this work, we propose to leverage additional sensors on a mobile phone, mainly GPS, compass, and gravity sensor, to solve this challenging problem. |
Shen Yan; Yu Liu; Long Wang; Zehong Shen; Zhen Peng; Haomin Liu; Maojun Zhang; Guofeng Zhang; Xiaowei Zhou; |
1266 | Data-Efficient Large Scale Place Recognition With Graded Similarity Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the fact that two images of the same place only partially share visual cues due to camera pose differences, we deploy an automatic re-annotation strategy to re-label VPR datasets. |
María Leyva-Vallina; Nicola Strisciuglio; Nicolai Petkov; |
1267 | Dynamic Neural Network for Multi-Task Learning Searching Across Diverse Network Topologies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a new MTL framework that searches for structures optimized for multiple tasks with diverse graph topologies and shares features among tasks. |
Wonhyeok Choi; Sunghoon Im; |
1268 | Relightable Neural Human Assets From Multi-View Gradient Illuminations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To promote research in both fields, in this paper, we present UltraStage, a new 3D human dataset that contains more than 2,000 high-quality human assets captured under both multi-view and multi-illumination settings. |
Taotao Zhou; Kai He; Di Wu; Teng Xu; Qixuan Zhang; Kuixiang Shao; Wenzheng Chen; Lan Xu; Jingyi Yu; |
1269 | Probing Sentiment-Oriented Pre-Training Inspired By Human Sentiment Perception Mechanism Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While it boosts performance by a large margin over random initialization, we argue that DCNNs simply pre-trained on ImageNet may focus excessively on recognizing objects but fail to provide high-level concepts in terms of sentiment. To address this long-overlooked problem, we propose a sentiment-oriented pre-training method that is built upon the human visual sentiment perception (VSP) mechanism. |
Tinglei Feng; Jiaxuan Liu; Jufeng Yang; |
1270 | Imitation Learning As State Matching Via Differentiable Physics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we identify the benefits of differentiable physics simulators and propose a new IL method, i.e., Imitation Learning via Differentiable Physics (ILD), which gets rid of the double-loop design and achieves significant improvements in final performance, convergence speed, and stability. |
Siwei Chen; Xiao Ma; Zhongwen Xu; |
1271 | OpenMix: Exploring Outlier Samples for Misclassification Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we exploit the easily available outlier samples, i.e., unlabeled samples coming from non-target classes, for helping detect misclassification errors. |
Fei Zhu; Zhen Cheng; Xu-Yao Zhang; Cheng-Lin Liu; |
1272 | Multivariate, Multi-Frequency and Multimodal: Rethinking Graph Neural Networks for Emotion Recognition in Conversation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a GNN-based model that explores multivariate relationships and captures the varying importance of emotion discrepancy and commonality by valuing multi-frequency signals. |
Feiyu Chen; Jie Shao; Shuyuan Zhu; Heng Tao Shen; |
1273 | Weakly Supervised Class-Agnostic Motion Prediction for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a two-stage weakly supervised approach, where the segmentation model trained with the incomplete binary masks in Stage 1 will facilitate the self-supervised learning of the motion prediction network in Stage 2 by estimating possible moving foregrounds in advance. |
Ruibo Li; Hanyu Shi; Ziang Fu; Zhe Wang; Guosheng Lin; |
1274 | TOPLight: Lightweight Neural Networks With Task-Oriented Pretraining for Visible-Infrared Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel task-oriented pretrained lightweight neural network (TOPLight) for VI recognition. |
Hao Yu; Xu Cheng; Wei Peng; |
1275 | DeFeeNet: Consecutive 3D Human Motion Prediction With Deviation Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose DeFeeNet, a simple yet effective network that can be added to existing one-off prediction models to realize deviation perception and feedback when applied to the consecutive motion prediction task. |
Xiaoning Sun; Huaijiang Sun; Bin Li; Dong Wei; Weiqing Li; Jianfeng Lu; |
1276 | Where We Are and What We’re Looking At: Query Based Worldwide Image Geo-Localization Using Hierarchies and Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels (which we refer to as hierarchies) and the corresponding visual scene information in an image through hierarchical cross-attention. |
Brandon Clark; Alec Kerrigan; Parth Parag Kulkarni; Vicente Vivanco Cepeda; Mubarak Shah; |
1277 | Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, inspired by the train-time calibration methods, we propose a novel auxiliary loss formulation that explicitly aims to align the class confidence of bounding boxes with the accuracy of predictions (i.e., precision). |
Muhammad Akhtar Munir; Muhammad Haris Khan; Salman Khan; Fahad Shahbaz Khan; |
1278 | DyLiN: Making Light Field Networks Dynamic Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Dynamic Light Field Network (DyLiN) method that can handle non-rigid deformations, including topological changes. |
Heng Yu; Joel Julin; Zoltán Á. Milacski; Koichiro Niinuma; László A. Jeni; |
1279 | Critical Learning Periods for Multisensory Integration in Deep Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better understand how the internal representations change according to disturbances or sensory deficits, we introduce a new measure of source sensitivity, which allows us to track the inhibition and integration of sources during training. |
Michael Kleinman; Alessandro Achille; Stefano Soatto; |
1280 | Human Guided Ground-Truth Generation for Realistic Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Though great progress has been achieved, such an LR-HR pair generation scheme has several limitations. First, the perceptual quality of HR images may not be high enough, limiting the quality of Real-ISR outputs. Second, existing schemes do not consider much human perception in GT generation, and the trained models tend to produce over-smoothed results or unpleasant artifacts. With the above considerations, we propose a human guided GT generation scheme. |
Du Chen; Jie Liang; Xindong Zhang; Ming Liu; Hui Zeng; Lei Zhang; |
1281 | GarmentTracking: Category-Level Garment Pose Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a complete package to address the category-level garment pose tracking task: (1) A recording system VR-Garment, with which users can manipulate virtual garment models in simulation through a VR interface. |
Han Xue; Wenqiang Xu; Jieyi Zhang; Tutian Tang; Yutong Li; Wenxin Du; Ruolin Ye; Cewu Lu; |
1282 | Mask DINO: Towards A Unified Transformer-Based Framework for Object Detection and Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we present Mask DINO, a unified object detection and segmentation framework. |
Feng Li; Hao Zhang; Huaizhe Xu; Shilong Liu; Lei Zhang; Lionel M. Ni; Heung-Yeung Shum; |
1283 | Align and Attend: Multimodal Summarization With Dual Contrastive Losses Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods fail to leverage the temporal correspondence between different modalities and ignore the intrinsic correlation between different samples. To address this issue, we introduce Align and Attend Multimodal Summarization (A2Summ), a unified multimodal transformer-based model which can effectively align and attend the multimodal input. |
Bo He; Jun Wang; Jielin Qiu; Trung Bui; Abhinav Shrivastava; Zhaowen Wang; |
1284 | SinGRAF: Learning A 3D Generative Radiance Field for A Single Scene Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce SinGRAF, a 3D-aware generative model that is trained with a few input images of a single scene. |
Minjung Son; Jeong Joon Park; Leonidas Guibas; Gordon Wetzstein; |
1285 | Self-Supervised AutoFlow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Observing a strong correlation between the ground truth search metric and self-supervised losses, we introduce self-supervised AutoFlow to handle real-world videos without ground truth labels. |
Hsin-Ping Huang; Charles Herrmann; Junhwa Hur; Erika Lu; Kyle Sargent; Austin Stone; Ming-Hsuan Yang; Deqing Sun; |
1286 | MagicNet: Semi-Supervised Multi-Organ Segmentation Via Magic-Cube Partition and Recovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel teacher-student model for semi-supervised multi-organ segmentation. |
Duowen Chen; Yunhao Bai; Wei Shen; Qingli Li; Lequan Yu; Yan Wang; |
1287 | Neuralangelo: High-Fidelity Neural Surface Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issue, we present Neuralangelo, which combines the representation power of multi-resolution 3D hash grids with neural surface rendering. |
Zhaoshuo Li; Thomas Müller; Alex Evans; Russell H. Taylor; Mathias Unberath; Ming-Yu Liu; Chen-Hsuan Lin; |
1288 | Re-GAN: Data-Efficient GANs Training Via Architectural Reconfiguration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Re-GAN, a data-efficient GAN training method that dynamically reconfigures the GAN architecture during training to explore different sub-network structures in training time. |
Divya Saxena; Jiannong Cao; Jiahao Xu; Tarun Kulshrestha; |
1289 | Dimensionality-Varying Diffusion Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that, considering the spatial redundancy in image signals, there is no need to maintain a high dimensionality in the evolution process, especially in the early generation phase. |
Han Zhang; Ruili Feng; Zhantao Yang; Lianghua Huang; Yu Liu; Yifei Zhang; Yujun Shen; Deli Zhao; Jingren Zhou; Fan Cheng; |
1290 | FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This results in parameter inefficiency and inability to exploit inter-task relatedness. To address such issues, we propose a novel FAshion-focused Multi-task Efficient learning method for Vision-and-Language tasks (FAME-ViL) in this work. |
Xiao Han; Xiatian Zhu; Licheng Yu; Li Zhang; Yi-Zhe Song; Tao Xiang; |
1291 | Neural Intrinsic Embedding for Non-Rigid Point Cloud Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such discrepancy poses great challenges in directly establishing correspondences between point clouds sampled from deformable shapes. In light of this, we propose Neural Intrinsic Embedding (NIE) to embed each vertex into a high-dimensional space in a way that respects the intrinsic structure. |
Puhua Jiang; Mingze Sun; Ruqi Huang; |
1292 | Rate Gradient Approximation Attack Threats Deep Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we first experimentally observe that layers in these SNNs mostly communicate by rate coding. Based on this rate coding property, we develop a novel rate coding SNN-specified attack method, Rate Gradient Approximation Attack (RGA). |
Tong Bu; Jianhao Ding; Zecheng Hao; Zhaofei Yu; |
1293 | Few-Shot Geometry-Aware Keypoint Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present a novel formulation that learns to localize semantically consistent keypoint definitions, even for occluded regions, for varying object categories. |
Xingzhe He; Gaurav Bharaj; David Ferman; Helge Rhodin; Pablo Garrido; |
1294 | RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present RenderDiffusion, the first diffusion model for 3D generation and inference, trained using only monocular 2D supervision. |
Titas Anciukevičius; Zexiang Xu; Matthew Fisher; Paul Henderson; Hakan Bilen; Niloy J. Mitra; Paul Guerrero; |
1295 | Adaptive Data-Free Quantization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: How can we generate samples with adaptive adaptability to improve Q’s generalization? To answer this question, we propose an Adaptive Data-Free Quantization (AdaDFQ) method, which revisits DFQ from a zero-sum game perspective on the sample adaptability between two players, a generator and a quantized network. |
Biao Qian; Yang Wang; Richang Hong; Meng Wang; |
1296 | Neural Vector Fields: Implicit Representation By Explicit Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Taking advantage of both the advanced explicit learning process and the powerful representation ability of implicit functions, we propose a novel 3D representation method, Neural Vector Fields (NVF). |
Xianghui Yang; Guosheng Lin; Zhenghao Chen; Luping Zhou; |
1297 | Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To assist and direct the 3D generation, we propose to guide our Latent-NeRF using a Sketch-Shape: an abstract geometry that defines the coarse structure of the desired object. |
Gal Metzer; Elad Richardson; Or Patashnik; Raja Giryes; Daniel Cohen-Or; |
1298 | Learning Generative Structure Prior for Blind Text Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a novel prior that focuses more on the character structure. |
Xiaoming Li; Wangmeng Zuo; Chen Change Loy; |
1299 | Overcoming The Trade-Off Between Accuracy and Plausibility in 3D Hand Shape Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel weakly-supervised hand shape estimation framework that integrates non-parametric mesh fitting with MANO models in an end-to-end fashion. |
Ziwei Yu; Chen Li; Linlin Yang; Xiaoxu Zheng; Michael Bi Mi; Gim Hee Lee; Angela Yao; |
1300 | Open-Vocabulary Attribute Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces the Open-Vocabulary Attribute Detection (OVAD) task and the corresponding OVAD benchmark. |
María A. Bravo; Sudhanshu Mittal; Simon Ging; Thomas Brox; |
1301 | PEFAT: Boosting Semi-Supervised Medical Image Classification Via Pseudo-Loss Estimation and Feature Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Pseudo-loss Estimation and Feature Adversarial Training semi-supervised framework, termed as PEFAT, to boost the performance of multi-class and multi-label medical image classification from the point of loss distribution modeling and adversarial training. |
Qingjie Zeng; Yutong Xie; Zilin Lu; Yong Xia; |
1302 | TBP-Former: Learning Temporal Bird’s-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is still a critical challenge to synchronize features obtained at multiple camera views and timestamps due to inevitable geometric distortions and further exploit those spatial-temporal features. To address this issue, we propose a temporal bird’s-eye-view pyramid transformer (TBP-Former) for vision-centric PnP, which includes two novel designs. |
Shaoheng Fang; Zi Wang; Yiqi Zhong; Junhao Ge; Siheng Chen; |
1303 | Ground-Truth Free Meta-Learning for Deep Compressive Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a ground-truth (GT) free meta-learning method for CS, which leverages both external and internal learning for unsupervised high-quality image reconstruction. |
Xinran Qin; Yuhui Quan; Tongyao Pang; Hui Ji; |
1304 | SHS-Net: Learning Signed Hyper Surfaces for Oriented Normal Estimation of Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel method called SHS-Net for oriented normal estimation of point clouds by learning signed hyper surfaces, which can accurately predict normals with global consistent orientation from various point clouds. |
Qing Li; Huifang Feng; Kanle Shi; Yue Gao; Yi Fang; Yu-Shen Liu; Zhizhong Han; |
1305 | DistractFlow: Improving Optical Flow Estimation Via Realistic Distractions and Pseudo-Labeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel data augmentation approach, DistractFlow, for training optical flow estimation models by introducing realistic distractions to the input frames. |
Jisoo Jeong; Hong Cai; Risheek Garrepalli; Fatih Porikli; |
1306 | Test of Time: Instilling Video-Language Models With A Sense of Time Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider a specific aspect of temporal understanding: consistency of time order as elicited by before/after relations. |
Piyush Bagad; Makarand Tapaswi; Cees G. M. Snoek; |
1307 | Learning To Segment Every Referring Object Point By Point Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new partially supervised training paradigm for RES, i.e., training using abundant referring bounding boxes and only a few (e.g., 1%) pixel-level referring masks. |
Mengxue Qu; Yu Wu; Yunchao Wei; Wu Liu; Xiaodan Liang; Yao Zhao; |
1308 | Seeing With Sound: Long-range Acoustic Beamforming for Multimodal Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce the first multimodal long-range acoustic beamforming dataset. We propose a neural aperture expansion method for beamforming and we validate its utility for multimodal automotive object detection. |
Praneeth Chakravarthula; Jim Aldon D’Souza; Ethan Tseng; Joe Bartusek; Felix Heide; |
1309 | OpenScene: 3D Scene Understanding With Open Vocabularies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space. |
Songyou Peng; Kyle Genova; Chiyu “Max” Jiang; Andrea Tagliasacchi; Marc Pollefeys; Thomas Funkhouser; |
1310 | Movies2Scenes: Using Movie Metadata To Learn Scene Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel contrastive learning approach that uses movie metadata to learn a general-purpose scene representation. |
Shixing Chen; Chun-Hao Liu; Xiang Hao; Xiaohan Nie; Maxim Arap; Raffay Hamid; |
1311 | Think Twice Before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to alleviate the aforementioned problem by two principles: (1) fully utilizing the capacity of the encoder; (2) increasing the capacity of the decoder. |
Xiaosong Jia; Penghao Wu; Li Chen; Jiangwei Xie; Conghui He; Junchi Yan; Hongyang Li; |
1312 | DSVT: Dynamic Sparse Voxel Transformer With Rotated Sets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Dynamic Sparse Voxel Transformer (DSVT), a single-stride window-based voxel Transformer backbone for outdoor 3D perception. |
Haiyang Wang; Chen Shi; Shaoshuai Shi; Meng Lei; Sen Wang; Di He; Bernt Schiele; Liwei Wang; |
1313 | Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our quantitative experiments reveal that the impact of pruned tokens on performance should be noticeable. To address this issue, we propose a novel joint Token Pruning & Squeezing module (TPS) for compressing vision transformers with higher efficiency. |
Siyuan Wei; Tianzhu Ye; Shen Zhang; Yao Tang; Jiajun Liang; |
1314 | Enhancing The Self-Universality for Transferable Targeted Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel transfer-based targeted attack method that optimizes the adversarial perturbations without any extra training efforts for auxiliary networks on training data. |
Zhipeng Wei; Jingjing Chen; Zuxuan Wu; Yu-Gang Jiang; |
1315 | Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation With Cross-Scale Distortion Awareness Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the compression procedure confuses the semantics of different planes, yielding inferior performance with ambiguous interpretability. To address this issue, we propose to disentangle this 1D representation by pre-segmenting orthogonal (vertical and horizontal) planes from a complex scene, explicitly capturing the geometric cues for indoor layout estimation. |
Zhijie Shen; Zishuo Zheng; Chunyu Lin; Lang Nie; Kang Liao; Shuai Zheng; Yao Zhao; |
1316 | EditableNeRF: Editing Topologically Varying Neural Radiance Fields By Key Points Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Then end-users can edit the scene by easily dragging the key points to desired new positions. To achieve this, we propose a scene analysis method to detect and initialize key points by considering the dynamics in the scene, and a weighted key points strategy to model topologically varying dynamics by joint key points and weights optimization. |
Chengwei Zheng; Wenbin Lin; Feng Xu; |
1317 | Neural Map Prior for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Neural Map Prior (NMP), a neural representation of global maps that enables automatic global map updates and enhances local map inference performance. |
Xuan Xiong; Yicheng Liu; Tianyuan Yuan; Yue Wang; Yilun Wang; Hang Zhao; |
1318 | Solving Oscillation Problem in Post-Training Quantization Through A Theoretical Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We argue that oscillation is an overlooked problem in PTQ methods. In this paper, we take the initiative to explore this problem and present a theoretical proof of why it is essential in PTQ. |
Yuexiao Ma; Huixia Li; Xiawu Zheng; Xuefeng Xiao; Rui Wang; Shilei Wen; Xin Pan; Fei Chao; Rongrong Ji; |
1319 | PEAL: Prior-Embedded Explicit Attention Learning for Low-Overlap Point Cloud Registration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this global dependency can be ambiguous and lacks distinctiveness, especially in indoor low-overlap scenarios, where dependence on an extensive range of non-overlapping points introduces ambiguity. To address this issue, we present PEAL, a Prior-embedded Explicit Attention Learning model. |
Junle Yu; Luwei Ren; Yu Zhang; Wenhui Zhou; Lili Lin; Guojun Dai; |
1320 | NeuralEditor: Editing Neural Radiance Fields Via Manipulating Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes NeuralEditor, which makes neural radiance fields (NeRFs) natively editable for general shape editing tasks. |
Jun-Kun Chen; Jipeng Lyu; Yu-Xiong Wang; |
1321 | NIKI: Neural Inverse Kinematics With Invertible Neural Networks for 3D Human Pose and Shape Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present NIKI (Neural Inverse Kinematics with Invertible Neural Network), which models bi-directional errors to improve the robustness to occlusions and obtain pixel-aligned accuracy. |
Jiefeng Li; Siyuan Bian; Qi Liu; Jiasheng Tang; Fan Wang; Cewu Lu; |
1322 | Masked Image Modeling With Local Multi-Scale Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Considering the reconstruction task requires non-trivial inter-patch interactions to reason target signals, we apply it to multiple local layers including lower and upper layers. |
Haoqing Wang; Yehui Tang; Yunhe Wang; Jianyuan Guo; Zhi-Hong Deng; Kai Han; |
1323 | Transfer4D: A Framework for Frugal Motion Capture and Deformation Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel skeleton extraction pipeline from single-view depth sequence that incorporates additional geometric information, resulting in superior performance in motion reconstruction and transfer in comparison to the contemporary methods. |
Shubh Maheshwari; Rahul Narain; Ramya Hebbalaguppe; |
1324 | GeoVLN: Learning Geometry-Enhanced Visual Representation With Slot Attention for Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose GeoVLN, which learns Geometry-enhanced visual representation based on slot attention for robust Visual-and-Language Navigation. |
Jingyang Huo; Qiang Sun; Boyan Jiang; Haitao Lin; Yanwei Fu; |
1325 | KiUT: Knowledge-Injected U-Transformer for Radiology Report Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Knowledge-injected U-Transformer (KiUT) to learn multi-level visual representation and adaptively distill the information with contextual and clinical knowledge for word prediction. |
Zhongzhen Huang; Xiaofan Zhang; Shaoting Zhang; |
1326 | Flexible-Cm GAN: Towards Precise 3D Dose Prediction in Radiotherapy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Herein, we propose a novel conditional generative model, Flexible-C^m GAN, utilizing additional information regarding planning types and various beam geometries. |
Riqiang Gao; Bin Lou; Zhoubing Xu; Dorin Comaniciu; Ali Kamen; |
1327 | Randomized Adversarial Training Via Taylor Expansion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Leveraging over the studies that smoothed update on weights during training may help find flat minima and improve generalization, we suggest reconciling the robustness-accuracy trade-off from another perspective, i.e., by adding random noise into deterministic weights. |
Gaojie Jin; Xinping Yi; Dengyu Wu; Ronghui Mu; Xiaowei Huang; |
1328 | Handy: Towards A High Fidelity 3D Hand Shape and Appearance Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose "Handy", a large-scale model of the human hand that captures both shape and appearance, built from over 1200 subjects, which we make publicly available for the benefit of the research community. |
Rolandos Alexandros Potamias; Stylianos Ploumpis; Stylianos Moschoglou; Vasileios Triantafyllou; Stefanos Zafeiriou; |
1329 | Learning To Measure The Point Cloud Reconstruction Loss in A Representation Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a learning-based Contrastive Adversarial Loss (CALoss) to measure the point cloud reconstruction loss dynamically in a non-linear representation space by combining the contrastive constraint with the adversarial strategy. |
Tianxin Huang; Zhonggan Ding; Jiangning Zhang; Ying Tai; Zhenyu Zhang; Mingang Chen; Chengjie Wang; Yong Liu; |
1330 | Progressive Neighbor Consistency Mining for Correspondence Pruning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it is difficult to ensure that these neighbors are always consistent, since the distribution of false correspondences is extremely irregular. To address this problem, we propose a novel global-graph space to search for consistent neighbors based on a weighted global graph that can explicitly explore long-range dependencies among correspondences. |
Xin Liu; Jufeng Yang; |
1331 | Learning To Zoom and Unzoom Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work (LZU), we "learn to zoom" in on the input image, compute spatial features, and then "unzoom" to revert any deformations. |
Chittesh Thavamani; Mengtian Li; Francesco Ferroni; Deva Ramanan; |
1332 | Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, parameter regularization methods face significant forgetting when learning a new task very different from learned tasks, and parameter allocation methods face unnecessary parameter overhead when learning simple tasks. In this paper, we propose Parameter Allocation & Regularization (PAR), which adaptively selects an appropriate strategy for each task, from parameter allocation or regularization, based on its learning difficulty. |
Wenjin Wang; Yunqing Hu; Qianglong Chen; Yin Zhang; |
1333 | Bootstrapping Objectness From Videos By Relaxed Common Fate and Visual Grouping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, common fate is not a reliable indicator of objectness: Parts of an articulated / deformable object may not move at the same speed, whereas shadows / reflections of an object always move with it but are not part of it. Our insight is to bootstrap objectness by first learning image features from relaxed common fate and then refining them based on visual appearance grouping within the image itself and across images statistically. |
Long Lian; Zhirong Wu; Stella X. Yu; |
1334 | From Node Interaction To Hop Interaction: New Effective and Scalable Graph Learning Paradigm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While considerable progress has been made, such node interaction paradigms still have the following limitations. First, the scalability limitation precludes the broad application of GNNs in large-scale industrial settings, since node interaction among rapidly expanding neighbors incurs high computation and memory costs. Second, the over-smoothing problem restricts the discrimination ability of nodes, i.e., node representations of different classes become indistinguishable after repeated node interactions. In this work, we propose a novel hop interaction paradigm to address these limitations simultaneously. |
Jie Chen; Zilong Li; Yin Zhu; Junping Zhang; Jian Pu; |
1335 | Semi-Supervised Hand Appearance Recovery Via Structure Disentanglement and Dual Adversarial Discrimination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The core of our approach is to first disentangle the bare hand structure from those degraded images and then wrap the appearance to this structure with a dual adversarial discrimination (DAD) scheme. |
Zimeng Zhao; Binghui Zuo; Zhiyu Long; Yangang Wang; |
1336 | Understanding and Improving Features Learned in Deep Functional Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that under some mild conditions, the features learned within deep functional map approaches can be used as point-wise descriptors and thus are directly comparable across different shapes, even without the necessity of solving for a functional map at test time. |
Souhaib Attaiki; Maks Ovsjanikov; |
1337 | Back to The Source: Diffusion-Driven Adaptation To Test-Time Corruption Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While re-training can help, it is sensitive to the amount and order of the data and the hyperparameters for optimization. We update the target data instead, and project all test inputs toward the source domain with a generative diffusion model. |
Jin Gao; Jialing Zhang; Xihui Liu; Trevor Darrell; Evan Shelhamer; Dequan Wang; |
1338 | PartManip: Learning Cross-Category Generalizable Part Manipulation Policy From Point Cloud Observations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we build the first large-scale, part-based cross-category object manipulation benchmark, PartManip, which is composed of 11 object categories, 494 objects, and 1432 tasks in 6 task classes. |
Haoran Geng; Ziming Li; Yiran Geng; Jiayi Chen; Hao Dong; He Wang; |
1339 | Polynomial Implicit Neural Representations for Large Diverse Datasets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Higher representational power is needed to go from representing a single given image to representing large and diverse datasets. Our approach addresses this gap by representing an image with a polynomial function and eliminates the need for positional encodings. |
Rajhans Singh; Ankita Shukla; Pavan Turaga; |
1340 | Neural Video Compression With Diverse Contexts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To boost NVC, this paper proposes increasing the context diversity in both temporal and spatial dimensions. |
Jiahao Li; Bin Li; Yan Lu; |
1341 | High-Frequency Stereo Matching Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the Decouple module to alleviate the problem of data coupling and allow features containing subtle details to transfer across iterations, which our ablations show alleviates the problem significantly. |
Haoliang Zhao; Huizhou Zhou; Yongjun Zhang; Jie Chen; Yitong Yang; Yong Zhao; |
1342 | LayoutDM: Discrete Diffusion Model for Controllable Layout Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we try to solve a broad range of layout generation tasks in a single model that is based on discrete state-space diffusion models. |
Naoto Inoue; Kotaro Kikuchi; Edgar Simo-Serra; Mayu Otani; Kota Yamaguchi; |
1343 | Markerless Camera-to-Robot Pose Estimation Via Self-Supervised Sim-to-Real Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an end-to-end pose estimation framework that is capable of online camera-to-robot calibration and a self-supervised training method to scale the training to unlabeled real-world data. |
Jingpei Lu; Florian Richter; Michael C. Yip; |
1344 | CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present CARTO, a novel approach for reconstructing multiple articulated objects from a single stereo RGB observation. |
Nick Heppert; Muhammad Zubair Irshad; Sergey Zakharov; Katherine Liu; Rares Andrei Ambrus; Jeannette Bohg; Abhinav Valada; Thomas Kollar; |
1345 | ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to facilitate the task of editing the geometry of 3D models through the use of natural language. |
Panos Achlioptas; Ian Huang; Minhyuk Sung; Sergey Tulyakov; Leonidas Guibas; |
1346 | Event-Guided Person Re-Identification Via Sparse-Dense Complementary Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the properties of event cameras, in this work, we propose a Sparse-Dense Complementary Learning Framework, which effectively extracts identity features by fully exploiting the complementary information of dense frames and sparse events. |
Chengzhi Cao; Xueyang Fu; Hongjian Liu; Yukun Huang; Kunyu Wang; Jiebo Luo; Zheng-Jun Zha; |
1347 | Regularizing Second-Order Influences for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To regularize the second-order effects, a novel selection objective is proposed, which also has clear connections to two widely adopted criteria. |
Zhicheng Sun; Yadong Mu; Gang Hua; |
1348 | Spatial-Then-Temporal Self-Supervised Learning for Video Correspondence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the previous work concentrates on either spatial-discriminative features or temporal-repetitive features, with little attention to the synergy between spatial and temporal cues. To address this issue, we propose a novel spatial-then-temporal self-supervised learning method. |
Rui Li; Dong Liu; |
1349 | Super-Resolution Neural Operator Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Super-resolution Neural Operator (SRNO), a deep operator learning framework that can resolve high-resolution (HR) images at arbitrary scales from the low-resolution (LR) counterparts. |
Min Wei; Xuesong Zhang; |
1350 | GradICON: Approximate Diffeomorphisms Via Gradient Inverse Consistency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an approach to learning regular spatial transformations between image pairs in the context of medical image registration. |
Lin Tian; Hastings Greer; François-Xavier Vialard; Roland Kwitt; Raúl San José Estépar; Richard Jarrett Rushmore; Nikolaos Makris; Sylvain Bouix; Marc Niethammer; |
1351 | LP-DIF: Learning Local Pattern-Specific Deep Implicit Function for 3D Objects and Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, fine geometric details cannot be preserved well. To solve this problem, we propose a novel Local Pattern-specific Implicit Function, named LP-DIF, which represents a shape with clusters of local regions and multiple decoders, where each decoder focuses only on one cluster of local regions that share a certain pattern. |
Meng Wang; Yu-Shen Liu; Yue Gao; Kanle Shi; Yi Fang; Zhizhong Han; |
1352 | PeakConv: Learning Peak Receptive Field for Radar Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In classic radar signal processing, the object signature is detected according to a local peak response, i.e., CFAR detection. Inspired by this idea, we redefine the receptive field of the convolution operation as the peak receptive field (PRF) and propose the peak convolution operation (PeakConv) to learn the object signatures in an end-to-end network. |
Liwen Zhang; Xinyan Zhang; Youcheng Zhang; Yufei Guo; Yuanpei Chen; Xuhui Huang; Zhe Ma; |
1353 | Unsupervised Contour Tracking of Live Cells By Mechanical and Cycle Consistency Losses Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the first deep learning-based tracking of cellular (or, more generally, viscoelastic material) contours with point correspondence by fusing dense representation between two contours with cross attention. |
Junbong Jang; Kwonmoo Lee; Tae-Kyun Kim; |
1354 | Explaining Image Classifiers With Multiscale Directional Image Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present ShearletX, a novel mask explanation method for image classifiers based on the shearlet transform — a multiscale directional image representation. |
Stefan Kolek; Robert Windesheim; Hector Andrade-Loarca; Gitta Kutyniok; Ron Levie; |
1355 | RGBD2: Generative Scene Synthesis Via Incremental View Inpainting Using RGBD Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address the challenge of recovering an underlying scene geometry and colors from a sparse set of RGBD view observations. In this work, we present a new solution termed RGBD2 that sequentially generates novel RGBD views along a camera trajectory, and the scene geometry is simply the fusion result of these views. |
Jiabao Lei; Jiapeng Tang; Kui Jia; |
1356 | Distribution Shift Inversion for Out-of-Distribution Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore how to bypass the requirement of testing distribution for distribution translator training and make the distribution translation useful for OoD prediction. |
Runpeng Yu; Songhua Liu; Xingyi Yang; Xinchao Wang; |
1357 | Deep Polarization Reconstruction With PDAVIS Events Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we strive to train an effective, yet efficient, DNN model that directly outputs polarization from the input raw polarization events. |
Haiyang Mei; Zuowen Wang; Xin Yang; Xiaopeng Wei; Tobi Delbruck; |
1358 | VideoTrack: Learning To Track Objects Via Video Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we resort to sequence-level target matching that can encode temporal contexts into the spatial features through a neat feedforward video model. |
Fei Xie; Lei Chu; Jiahao Li; Yan Lu; Chao Ma; |
1359 | System-Status-Aware Adaptive Network for Online Streaming Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, training such an agent on various types of hardware configurations is not easy, as labeled training data might not be available, or training can be computationally prohibitive. To address this challenging problem, we propose a Meta Self-supervised Adaptation (MSA) method that adapts the agent’s policy to new hardware configurations at test-time, allowing for easy deployment of the model onto other unseen hardware platforms. |
Lin Geng Foo; Jia Gong; Zhipeng Fan; Jun Liu; |
1360 | Parallel Diffusion Models of Operator and Image for Blind Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that we can indeed solve a family of blind inverse problems by constructing another diffusion prior for the forward operator. |
Hyungjin Chung; Jeongsol Kim; Sehui Kim; Jong Chul Ye; |
1361 | Local-Guided Global: Paired Similarity Representation for Visual Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach, termed self-supervised Paired Similarity Representation Learning (PSRL) for effectively encoding spatial structures in an unsupervised manner. |
Hyesong Choi; Hunsang Lee; Wonil Song; Sangryul Jeon; Kwanghoon Sohn; Dongbo Min; |
1362 | Semidefinite Relaxations for Robust Multiview Triangulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an approach based on convex relaxations for certifiably optimal robust multiview triangulation. |
Linus Härenstam-Nielsen; Niclas Zeller; Daniel Cremers; |
1363 | Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We also explore a practical setup with "mixed" supervision, where a small number of training images contains ground-truth pixel-level labels and the remaining images have only image-level labels. For this mixed setup, we propose to improve the pseudo-labels using a pseudo-label enhancer that was trained using the available ground-truth pixel-level labels. |
Dahyun Kang; Piotr Koniusz; Minsu Cho; Naila Murray; |
1364 | FFCV: Accelerating Training By Removing Data Bottlenecks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present FFCV, a library for easy, fast, resource-efficient training of machine learning models. |
Guillaume Leclerc; Andrew Ilyas; Logan Engstrom; Sung Min Park; Hadi Salman; Aleksander Mądry; |
1365 | Collaborative Noisy Label Cleaner: Learning Scene-Aware Trailers for Multi-Modal Highlight Detection in Movies Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study a more practical and promising setting, i.e., reformulating highlight detection as "learning with noisy labels". |
Bei Gan; Xiujun Shu; Ruizhi Qiao; Haoqian Wu; Keyu Chen; Hanjun Li; Bo Ren; |
1366 | Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to learn video representations by modeling Video as Stochastic Processes (VSP) via a novel process-based contrastive learning framework, which aims to discriminate between video processes and simultaneously capture the temporal dynamics in the processes. |
Heng Zhang; Daqing Liu; Qi Zheng; Bing Su; |
1367 | ContraNeRF: Generalizable Neural Radiance Fields for Synthetic-to-Real Novel View Synthesis Via Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Otherwise, severe artifacts will be produced. To maintain the advantages of using synthetic data while avoiding its negative effects, we propose to introduce geometry-aware contrastive learning to learn multi-view consistent features with geometric constraints. |
Hao Yang; Lanqing Hong; Aoxue Li; Tianyang Hu; Zhenguo Li; Gim Hee Lee; Liwei Wang; |
1368 | Region-Aware Pretraining for Open-Vocabulary Object Detection With Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Region-aware Open-vocabulary Vision Transformers (RO-ViT) — a contrastive image-text pretraining recipe to bridge the gap between image-level pretraining and open-vocabulary object detection. |
Dahun Kim; Anelia Angelova; Weicheng Kuo; |
1369 | PaletteNeRF: Palette-Based Appearance Editing of Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present PaletteNeRF, a novel method for photorealistic appearance editing of neural radiance fields (NeRF) based on 3D color decomposition. |
Zhengfei Kuang; Fujun Luan; Sai Bi; Zhixin Shu; Gordon Wetzstein; Kalyan Sunkavalli; |
1370 | Towards Unsupervised Object Detection From LiDAR Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the problem of unsupervised object detection from 3D point clouds in self-driving scenes. |
Lunjun Zhang; Anqi Joyce Yang; Yuwen Xiong; Sergio Casas; Bin Yang; Mengye Ren; Raquel Urtasun; |
1371 | Contrastive Mean Teacher for Domain Adaptive Object Detectors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we identify the intriguing alignment and synergy between mean-teacher self-training and contrastive learning. |
Shengcao Cao; Dhiraj Joshi; Liang-Yan Gui; Yu-Xiong Wang; |
1372 | Learning Transferable Spatiotemporal Representations From Natural Script Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new pretext task, Turning to Video for Transcript Sorting (TVTS), which sorts shuffled ASR scripts by attending to learned video representations. |
Ziyun Zeng; Yuying Ge; Xihui Liu; Bin Chen; Ping Luo; Shu-Tao Xia; Yixiao Ge; |
1373 | NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Dynamic Neural Radiance Field (NeRF) is a powerful algorithm capable of rendering photo-realistic novel view images from a monocular RGB video of a dynamic scene. Although it warps moving points across frames from the observation spaces to a common canonical space for rendering, dynamic NeRF does not model the change of the reflected color during the warping. As a result, this approach often fails drastically on challenging specular objects in motion. We address this limitation by reformulating the neural radiance field function to be conditioned on surface position and orientation in the observation space. |
Zhiwen Yan; Chen Li; Gim Hee Lee; |
1374 | M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Models trained on these datasets may not generalize well to real-world scenarios. Therefore, this paper introduces a large and diverse document layout analysis dataset called M⁶Doc. |
Hiuyi Cheng; Peirong Zhang; Sihang Wu; Jiaxin Zhang; Qiyuan Zhu; Zecheng Xie; Jing Li; Kai Ding; Lianwen Jin; |
1375 | RealFusion: 360° Reconstruction of Any Object From A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of reconstructing a full 360° photographic model of an object from a single image of it. |
Luke Melas-Kyriazi; Iro Laina; Christian Rupprecht; Andrea Vedaldi; |
1376 | CiCo: Domain-Aware Sign Language Retrieval Via Cross-Lingual Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Different from traditional video-text retrieval, sign language videos not only contain visual signals but also carry abundant semantic meanings by themselves, since sign languages are themselves natural languages. Considering this characteristic, we formulate sign language retrieval as a cross-lingual retrieval problem as well as a video-text retrieval task. |
Jianmin Bao; Dong Chen; Wenqiang Zhang; Yiting Cheng; Fangyun Wei; |
1377 | Relational Space-Time Query in Long-Form Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, real-world applications, e.g., AR assistants, require bundling these problems for both model development and evaluation. In this paper, we propose to study these problems in a joint framework for long video understanding. |
Xitong Yang; Fu-Jen Chu; Matt Feiszli; Raghav Goyal; Lorenzo Torresani; Du Tran; |
1378 | LargeKernel3D: Scaling Up Kernels in 3D Sparse CNNs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, when directly applying large convolutional kernels in 3D CNNs, severe difficulties are met, where those successful module designs in 2D become surprisingly ineffective on 3D networks, including the popular depth-wise convolution. To address this vital challenge, we instead propose the spatial-wise partition convolution and its large-kernel module. |
Yukang Chen; Jianhui Liu; Xiangyu Zhang; Xiaojuan Qi; Jiaya Jia; |
1379 | Video Dehazing Via A Multi-Range Temporal Alignment Network With Physical Prior Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel framework to effectively explore the physical haze priors and aggregate temporal information. |
Jiaqi Xu; Xiaowei Hu; Lei Zhu; Qi Dou; Jifeng Dai; Yu Qiao; Pheng-Ann Heng; |
1380 | 3D Concept Learning and Reasoning From Multi-View Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Humans are able to accurately reason in 3D by gathering multi-view observations of the surrounding world. Inspired by this insight, we introduce a new large-scale benchmark for 3D multi-view visual question answering (3DMV-VQA). |
Yining Hong; Chunru Lin; Yilun Du; Zhenfang Chen; Joshua B. Tenenbaum; Chuang Gan; |
1381 | BiFormer: Learning Bilateral Motion Estimation Via Bilateral Transformer for 4K Video Frame Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A novel 4K video frame interpolator based on bilateral transformer (BiFormer) is proposed in this paper, which performs three steps: global motion estimation, local motion refinement, and frame synthesis. |
Junheum Park; Jintae Kim; Chang-Su Kim; |
1382 | Integrally Pre-Trained Transformer Pyramid Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present an integral pre-training framework based on masked image modeling (MIM). |
Yunjie Tian; Lingxi Xie; Zhaozhi Wang; Longhui Wei; Xiaopeng Zhang; Jianbin Jiao; Yaowei Wang; Qi Tian; Qixiang Ye; |
1383 | Soft Augmentation for Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We draw inspiration from human visual classification studies and propose generalizing augmentation with invariant transforms to soft augmentation where the learning target softens non-linearly as a function of the degree of the transform applied to the sample: e.g., more aggressive image crop augmentations produce less confident learning targets. |
Yang Liu; Shen Yan; Laura Leal-Taixé; James Hays; Deva Ramanan; |
1384 | Learning From Unique Perspectives: User-Aware Saliency Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we identify the critical roles of visual preferences in attention modeling, and for the first time study the problem of user-aware saliency modeling. |
Shi Chen; Nachiappan Valliappan; Shaolei Shen; Xinyu Ye; Kai Kohlhoff; Junfeng He; |
1385 | PREIM3D: 3D Consistent Precise Image Attribute Editing From A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate 3D inconsistency at large camera poses, we propose two novel methods, an alternating training scheme and a multi-view identity loss, to maintain 3D consistency and subject identity. |
Jianhui Li; Jianmin Li; Haoji Zhang; Shilong Liu; Zhengyi Wang; Zihao Xiao; Kaiwen Zheng; Jun Zhu; |
1386 | MaskSketch: Unpaired Structure-Guided Masked Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: MaskSketch utilizes a pre-trained masked generative transformer, requiring no model training or paired supervision, and works with input sketches of different levels of abstraction. We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image, such as scene layout and object shape, and we propose a novel sampling method based on this observation to enable structure-guided generation. |
Dina Bashkirova; José Lezama; Kihyuk Sohn; Kate Saenko; Irfan Essa; |
1387 | Open-Vocabulary Point-Cloud Object Detection Without 3D Annotation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address open-vocabulary 3D point-cloud detection by a dividing-and-conquering strategy, which involves: 1) developing a point-cloud detector that can learn a general representation for localizing various objects, and 2) connecting textual and point-cloud representations to enable the detector to classify novel object categories based on text prompting. |
Yuheng Lu; Chenfeng Xu; Xiaobao Wei; Xiaodong Xie; Masayoshi Tomizuka; Kurt Keutzer; Shanghang Zhang; |
1388 | Adaptive Channel Sparsity for Federated Learning Under System Heterogeneity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To minimize the impact of sparsity on FL convergence, we propose Flado to improve the alignment of client model update trajectories by tailoring the sparsities of individual neurons in each client. |
Dongping Liao; Xitong Gao; Yiren Zhao; Cheng-Zhong Xu; |
1389 | Detecting Backdoors in Pre-Trained Encoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose DECREE, the first backdoor detection approach for pre-trained encoders, requiring neither classifier headers nor input labels. |
Shiwei Feng; Guanhong Tao; Siyuan Cheng; Guangyu Shen; Xiangzhe Xu; Yingqi Liu; Kaiyuan Zhang; Shiqing Ma; Xiangyu Zhang; |
1390 | Sequential Training of GANs Against GAN-Classifiers Reveals Correlated "Knowledge Gaps" Present Among Independently Trained GAN Instances Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We iteratively train GAN-classifiers and train GANs that "fool" the classifiers (in an attempt to fill the knowledge gaps), and examine the effect on GAN training dynamics, output quality, and GAN-classifier generalization. |
Arkanath Pathak; Nicholas Dufour; |
1391 | Lookahead Diffusion Probabilistic Models for Refining Mean Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose lookahead diffusion probabilistic models (LA-DPMs) to exploit the correlation in the outputs of the deep neural networks (DNNs) over subsequent timesteps in diffusion probabilistic models (DPMs) to refine the mean estimation of the conditional Gaussian distributions in the backward process. |
Guoqiang Zhang; Kenta Niwa; W. Bastiaan Kleijn; |
1392 | TensoIR: Tensorial Inverse Rendering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose TensoIR, a novel inverse rendering approach based on tensor factorization and neural fields. |
Haian Jin; Isabella Liu; Peijia Xu; Xiaoshuai Zhang; Songfang Han; Sai Bi; Xiaowei Zhou; Zexiang Xu; Hao Su; |
1393 | NIPQ: Noise Proxy-Based Integrated Pseudo-Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, pseudo-quantization training has been proposed as an alternative approach to updating the learnable parameters using the pseudo-quantization noise instead of STE. In this study, we propose a novel noise proxy-based integrated pseudo-quantization (NIPQ) that enables unified support of pseudo-quantization for both activation and weight with minimal error by integrating the idea of truncation on the pseudo-quantization framework. |
Juncheol Shin; Junhyuk So; Sein Park; Seungyeop Kang; Sungjoo Yoo; Eunhyeok Park; |
1394 | Primitive Generation and Semantic-Related Alignment for Universal Zero-Shot Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a generative model to synthesize features for unseen categories, which links semantic and visual spaces as well as address the issue of lack of unseen training data. |
Shuting He; Henghui Ding; Wei Jiang; |
1395 | Long Range Pooling for 3D Large-Scale Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the success of recent vision transformers and large kernel design in convolutional neural networks (CNNs), in this paper, we analyze and explore essential reasons for their success. |
Xiang-Li Li; Meng-Hao Guo; Tai-Jiang Mu; Ralph R. Martin; Shi-Min Hu; |
1396 | Object-Goal Visual Navigation Via Effective Exploration of Relations Among Historical Navigation States Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a History-inspired Navigation Policy Learning (HiNL) framework to estimate navigation states effectively by exploring relationships among historical navigation states. |
Heming Du; Lincheng Li; Zi Huang; Xin Yu; |
1397 | Causally-Aware Intraoperative Imputation for Overall Survival Time Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Particularly, we build a causal graph, and train the images to estimate the intraoperative attributes for final OS prediction. We present a novel Causally-aware Intraoperative Imputation Model (CAWIM) that can sequentially predict each attribute using its parent nodes in the estimated causal graph. |
Xiang Li; Xuelin Qian; Litian Liang; Lingjie Kong; Qiaole Dong; Jiejun Chen; Dingxia Liu; Xiuzhong Yao; Yanwei Fu; |
1398 | Probabilistic Knowledge Distillation of Face Ensembles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To inherit the uncertainty estimation capability from BEA without the loss of inference efficiency, we propose BEA-KD, a student model to distill knowledge from BEA. |
Jianqing Xu; Shen Li; Ailin Deng; Miao Xiong; Jiaying Wu; Jiaxiang Wu; Shouhong Ding; Bryan Hooi; |
1399 | Twin Contrastive Learning With Noisy Labels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present TCL, a novel twin contrastive learning model to learn robust representations and handle noisy labels for classification. |
Zhizhong Huang; Junping Zhang; Hongming Shan; |
1400 | TriVol: Point Cloud Rendering Via Triple Volumes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a dense while lightweight 3D representation, named TriVol, that can be combined with NeRF to render photo-realistic images from point clouds. |
Tao Hu; Xiaogang Xu; Ruihang Chu; Jiaya Jia; |
1401 | (ML)²P-Encoder: On Exploration of Channel-Class Correlation for Multi-Label Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, our interest is to fully explore the power of channel-class correlation as the unique base for MLZSL. |
Ziming Liu; Song Guo; Xiaocheng Lu; Jingcai Guo; Jiewei Zhang; Yue Zeng; Fushuo Huo; |
1402 | MeMaHand: Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to reconstruct meshes and estimate MANO parameters of two hands from a single RGB image simultaneously to utilize the merits of two kinds of hand representations. |
Congyi Wang; Feida Zhu; Shilei Wen; |
1403 | Asymmetric Feature Fusion for Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an Asymmetric Feature Fusion (AFF) paradigm, which advances existing asymmetric retrieval systems by considering the complementarity among different features just at the gallery side. |
Hui Wu; Min Wang; Wengang Zhou; Zhenbo Lu; Houqiang Li; |
1404 | CREPE: Can Vision-Language Foundation Models Reason Compositionally? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, despite the performance gains contributed by large vision and language pretraining, we find that, across 7 architectures trained with 4 algorithms on massive datasets, they struggle at compositionality. To arrive at this conclusion, we introduce a new compositionality evaluation benchmark, CREPE, which measures two important aspects of compositionality identified by cognitive science literature: systematicity and productivity. |
Zixian Ma; Jerry Hong; Mustafa Omer Gul; Mona Gandhi; Irena Gao; Ranjay Krishna; |
1405 | DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate how modeling 3D facial geometry in image and model space jointly can solve the occlusion and view angle problems. |
Heyuan Li; Bo Wang; Yu Cheng; Mohan Kankanhalli; Robby T. Tan; |
1406 | MoStGAN-V: Video Generation With Temporal Motion Styles Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we argue that a single time-agnostic latent vector of style-based generator is insufficient to model various and temporally-consistent motions. |
Xiaoqian Shen; Xiang Li; Mohamed Elhoseiny; |
1407 | Poly-PC: A Polyhedral Network for Multiple Point Cloud Tasks at Once Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that it is feasible to perform multiple tasks concurrently on point cloud with a straightforward yet effective multi-task network. |
Tao Xie; Shiguang Wang; Ke Wang; Linqi Yang; Zhiqiang Jiang; Xingcheng Zhang; Kun Dai; Ruifeng Li; Jian Cheng; |
1408 | HandsOff: Labeled Dataset Generation With No Additional Human Annotations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our framework avoids the practical drawbacks of prior work by unifying the field of GAN inversion with dataset generation. |
Austin Xu; Mariya I. Vasileva; Achal Dave; Arjun Seshadri; |
1409 | Semi-Supervised 2D Human Pose Estimation Driven By Position Inconsistency Pseudo Label Correction Module Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we delve into semi-supervised 2D human pose estimation. |
Linzhi Huang; Yulong Li; Hongbo Tian; Yue Yang; Xiangang Li; Weihong Deng; Jieping Ye; |
1410 | ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose ARKitTrack, a new RGB-D tracking dataset for both static and dynamic scenes captured by consumer-grade LiDAR scanners equipped on Apple’s iPhone and iPad. |
Haojie Zhao; Junsong Chen; Lijun Wang; Huchuan Lu; |
1411 | Image As A Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a general-purpose multimodal foundation model BEiT-3, which achieves excellent transfer performance on both vision and vision-language tasks. |
Wenhui Wang; Hangbo Bao; Li Dong; Johan Bjorck; Zhiliang Peng; Qiang Liu; Kriti Aggarwal; Owais Khan Mohammed; Saksham Singhal; Subhojit Som; Furu Wei; |
1412 | Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we make the attempt to propose a density-insensitive domain adaption framework to address the density-induced domain gap. |
Qianjiang Hu; Daizong Liu; Wei Hu; |
1413 | Efficient Verification of Neural Networks Against LVM-Based Specifications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present an efficient approach for verifying specifications definable using Latent Variable Models that capture such diverse changes. |
Harleen Hanspal; Alessio Lomuscio; |
1414 | Learning Action Changes By Measuring Verb-Adverb Textual Relationships Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The goal of this work is to understand the way actions are performed in videos. |
Davide Moltisanti; Frank Keller; Hakan Bilen; Laura Sevilla-Lara; |
1415 | Feature Aggregated Queries for Transformer-Based Video Object Detectors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we take a different perspective on video object detection. |
Yiming Cui; |
1416 | Context-Aware Pretraining for Efficient Blind Image Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study Blind Image Decomposition (BID), which is to uniformly remove multiple types of degradation at once without foreknowing the noise type. |
Chao Wang; Zhedong Zheng; Ruijie Quan; Yifan Sun; Yi Yang; |
1417 | Weakly Supervised Posture Mining for Fine-Grained Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel fine-grained framework named PMRC (posture mining and reverse cross-entropy), which is able to combine with different backbones to good effect. |
Zhenchao Tang; Hualin Yang; Calvin Yu-Chian Chen; |
1418 | LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore a unified VidL framework LAVENDER, where Masked Language Modeling (MLM) is used as the common interface for all pre-training and downstream tasks. |
Linjie Li; Zhe Gan; Kevin Lin; Chung-Ching Lin; Zicheng Liu; Ce Liu; Lijuan Wang; |
1419 | Decomposed Cross-Modal Distillation for RGB-Based Temporal Action Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a decomposed cross-modal distillation framework to build a strong RGB-based detector by transferring knowledge of the motion modality. |
Pilhyeon Lee; Taeoh Kim; Minho Shim; Dongyoon Wee; Hyeran Byun; |
1420 | PyramidFlow: High-Resolution Defect Contrastive Localization Using Pyramid Normalizing Flow Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although unsupervised methods have been successful in defect localization, the usual use of pre-trained models results in low-resolution outputs, which damages visual performance. To address this issue, we propose PyramidFlow, the first fully normalizing flow method without pre-trained models that enables high-resolution defect localization. |
Jiarui Lei; Xiaobo Hu; Yue Wang; Dong Liu; |
1421 | On-the-Fly Category Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study on-the-fly category discovery (OCD) aimed at making the model instantaneously aware of novel category samples (i.e., enabling inductive learning and streaming inference). |
Ruoyi Du; Dongliang Chang; Kongming Liang; Timothy Hospedales; Yi-Zhe Song; Zhanyu Ma; |
1422 | A Unified Knowledge Distillation Framework for Deep Directed Graphical Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel unified knowledge distillation framework for deep DGMs on various applications. |
Yizhuo Chen; Kaizhao Liang; Zhe Zeng; Shuochao Yao; Huajie Shao; |
1423 | MAIR: Multi-View Attention Inverse Rendering With 3D Spatially-Varying Lighting Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, a SVBRDF, and 3D spatially-varying lighting. |
JunYong Choi; SeokYeong Lee; Haesol Park; Seung-Won Jung; Ig-Jae Kim; Junghyun Cho; |
1424 | DF-Platter: Multi-Face Heterogeneous Deepfake Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this research, we emulate the real-world scenario of deepfake generation and spreading, and propose the DF-Platter dataset, which contains (i) both low-resolution and high-resolution deepfakes generated using multiple generation techniques and (ii) single-subject and multiple-subject deepfakes, with face images of Indian ethnicity. |
Kartik Narayan; Harsh Agarwal; Kartik Thakral; Surbhi Mittal; Mayank Vatsa; Richa Singh; |
1425 | Shifted Diffusion for Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Corgi, a novel method for text-to-image generation. |
Yufan Zhou; Bingchen Liu; Yizhe Zhu; Xiao Yang; Changyou Chen; Jinhui Xu; |
1426 | Robust Unsupervised StyleGAN Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we make StyleGAN image restoration robust: a single set of hyperparameters works across a wide range of degradation levels. |
Yohan Poirier-Ginter; Jean-François Lalonde; |
1427 | Blemish-Aware and Progressive Face Retouching With Limited Paired Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Blemish-aware and Progressive Face Retouching model, which is referred to as BPFRe. |
Lianxin Xie; Wen Xue; Zhen Xu; Si Wu; Zhiwen Yu; Hau San Wong; |
1428 | Event-Based Frame Interpolation With Ad-Hoc Deblurring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We instead propose a general method for event-based frame interpolation that performs deblurring ad-hoc and thus works both on sharp and blurry input videos. |
Lei Sun; Christos Sakaridis; Jingyun Liang; Peng Sun; Jiezhang Cao; Kai Zhang; Qi Jiang; Kaiwei Wang; Luc Van Gool; |
1429 | OvarNet: Towards Open-Vocabulary Object Attribute Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider the problem of simultaneously detecting objects and inferring their visual attributes in an image, even for those with no manual annotations provided at the training stage, resembling an open-vocabulary scenario. |
Keyan Chen; Xiaolong Jiang; Yao Hu; Xu Tang; Yan Gao; Jianqi Chen; Weidi Xie; |
1430 | Detecting and Grounding Multi-Modal Media Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM^4). |
Rui Shao; Tianxing Wu; Ziwei Liu; |
1431 | Boosting Detection in Crowd Analysis Via Underutilized Output Features Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, the area size and confidence score of output proposals and bounding boxes provide insight into the scale and density of the crowd. To leverage these underutilized features, we propose Crowd Hat, a plug-and-play module that can be easily integrated with existing detection models. |
Shaokai Wu; Fengyu Yang; |
1432 | Human Pose As Compositional Tokens Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a structured representation, named Pose as Compositional Tokens (PCT), to explore the joint dependency. |
Zigang Geng; Chunyu Wang; Yixuan Wei; Ze Liu; Houqiang Li; Han Hu; |
1433 | K3DN: Disparity-Aware Kernel Estimation for Dual-Pixel Defocus Deblurring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The disparity occurs in defocus blurred regions between the two views of the DP pair, while the in-focus sharp regions have zero disparity. This motivates us to propose a K3DN framework for DP pair deblurring, and it has three modules: i) a disparity-aware deblur module. |
Yan Yang; Liyuan Pan; Liu Liu; Miaomiao Liu; |
1434 | 3D Line Mapping Revisited Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we aim to close the gap by introducing LIMAP, a library for 3D line mapping that robustly and efficiently creates 3D line maps from multi-view imagery. |
Shaohui Liu; Yifan Yu; Rémi Pautrat; Marc Pollefeys; Viktor Larsson; |
1435 | DartBlur: Privacy Preservation With Detection Artifact Suppression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel De-artifact Blurring (DartBlur) privacy-preserving method, which capitalizes on a DNN architecture to generate blurred faces. |
Baowei Jiang; Bing Bai; Haozhe Lin; Yu Wang; Yuchen Guo; Lu Fang; |
1436 | Synthesizing Photorealistic Virtual Humans Through Cross-Modal Disentanglement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To allow for virtual human avatars to be used in practical scenarios, we propose an end-to-end framework for synthesizing high-quality virtual human faces capable of speaking with accurate lip motion with a special emphasis on performance. |
Siddarth Ravichandran; Ondřej Texler; Dimitar Dinev; Hyun Jae Kang; |
1437 | Test Time Adaptation With Regularized Loss for Weakly Supervised Salient Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the first approach for test-time Salient Object Detection (SOD) in the context of weak supervision. |
Olga Veksler; |
1438 | Self-Supervised Pre-Training With Masked Shape Prediction for 3D Scene Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, this paper introduces Masked Shape Prediction (MSP), a new framework to conduct masked signal modeling in 3D scenes. |
Li Jiang; Zetong Yang; Shaoshuai Shi; Vladislav Golyanik; Dengxin Dai; Bernt Schiele; |
1439 | Efficient and Explicit Modelling of Image Hierarchies for Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The aim of this paper is to propose a mechanism to efficiently and explicitly model image hierarchies in the global, regional, and local range for image restoration. |
Yawei Li; Yuchen Fan; Xiaoyu Xiang; Denis Demandolx; Rakesh Ranjan; Radu Timofte; Luc Van Gool; |
1440 | Guiding Pseudo-Labels With Uncertainty Estimation for Source-Free Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate Source-free Unsupervised Domain Adaptation (SF-UDA), a specific case of UDA where a model is adapted to a target domain without access to source data. |
Mattia Litrico; Alessio Del Bue; Pietro Morerio; |
1441 | HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent approaches predict a probability distribution over plausible 3D pose and shape parameters conditioned on the image. We show that these approaches exhibit a trade-off between three key properties: (i) accuracy – the likelihood of the ground-truth 3D solution under the predicted distribution, (ii) sample-input consistency – the extent to which 3D samples from the predicted distribution match the visible 2D image evidence, and (iii) sample diversity – the range of plausible 3D solutions modelled by the predicted distribution. |
Akash Sengupta; Ignas Budvytis; Roberto Cipolla; |
1442 | DKT: Diverse Knowledge Transfer Transformer for Class Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To meet the challenge, we propose a novel framework, Diverse Knowledge Transfer Transformer (DKT). |
Xinyuan Gao; Yuhang He; Songlin Dong; Jie Cheng; Xing Wei; Yihong Gong; |
1443 | LipFormer: High-Fidelity and Generalizable Talking Face Generation With A Pre-Learned Facial Codebook Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose LipFormer, a transformer-based framework, to model the audio-visual coherence and predict the lip-codes sequence based on the input audio features. |
Jiayu Wang; Kang Zhao; Shiwei Zhang; Yingya Zhang; Yujun Shen; Deli Zhao; Jingren Zhou; |
1444 | Generalizable Local Feature Pre-Training for Deformable Shape Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In addition, there is currently a lack of understanding of what makes pre-trained features transferable across significantly different 3D shape categories. In this paper, we make a step toward addressing these challenges. |
Souhaib Attaiki; Lei Li; Maks Ovsjanikov; |
1445 | TarViS: A Unified Approach for Target-Based Video Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by recent approaches with multi-task capability, we propose TarViS: a novel, unified network architecture that can be applied to any task that requires segmenting a set of arbitrarily defined ‘targets’ in video. |
Ali Athar; Alexander Hermans; Jonathon Luiten; Deva Ramanan; Bastian Leibe; |
1446 | Progressive Random Convolutions for Single Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To solve the problem, we propose a Progressive Random Convolution (Pro-RandConv) method that recursively stacks random convolution layers with a small kernel size instead of increasing the kernel size. |
Seokeon Choi; Debasmit Das; Sungha Choi; Seunghan Yang; Hyunsin Park; Sungrack Yun; |
1447 | IDGI: A Framework To Eliminate Explanation Noise From Integrated Gradients Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To minimize the noise, we examine the source of the noise analytically and propose a new approach to reduce the explanation noise based on our analytical findings. |
Ruo Yang; Binghui Wang; Mustafa Bilgic; |
1448 | OPE-SR: Orthogonal Position Encoding for Designing A Parameter-Free Upsampling Module in Arbitrary-Scale Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce orthogonal position encoding (OPE), an extension of position encoding, and an OPE-Upscale module to replace the INR-based upsampling module for arbitrary-scale image super-resolution. |
Gaochao Song; Qian Sun; Luo Zhang; Ran Su; Jianfeng Shi; Ying He; |
1449 | Implicit Surface Contrastive Clustering for LiDAR Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose ISCC, a new self-supervised pretraining method, the core of which consists of two pretext tasks newly designed for LiDAR point clouds. |
Zaiwei Zhang; Min Bai; Erran Li; |
1450 | EC2: Emergent Communication for Embodied Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose Emergent Communication for Embodied Control (EC^2), a novel scheme to pre-train video-language representations for few-shot embodied control. |
Yao Mu; Shunyu Yao; Mingyu Ding; Ping Luo; Chuang Gan; |
1451 | Semantic Ray: Learning A Generalizable Semantic Field With Cross-Reprojection Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to learn a semantic radiance field from multiple scenes that is accurate, efficient and generalizable. |
Fangfu Liu; Chubin Zhang; Yu Zheng; Yueqi Duan; |
1452 | DynamicDet: A Unified Dynamic Architecture for Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, it is challenging to design a powerful dynamic detector, because of no suitable dynamic architecture and exiting criterion for object detection. To tackle these difficulties, we propose a dynamic framework for object detection, named DynamicDet. |
Zhihao Lin; Yongtao Wang; Jinhe Zhang; Xiaojie Chu; |
1453 | I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide a novel perspective on using an LLM to provide text supervision for a zero-shot image classification model. |
Muhammad Ferjad Naeem; Muhammad Gul Zain Ali Khan; Yongqin Xian; Muhammad Zeshan Afzal; Didier Stricker; Luc Van Gool; Federico Tombari; |
1454 | MixSim: A Hierarchical Framework for Mixed Reality Traffic Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we present MixSim, a hierarchical framework for mixed reality traffic simulation. |
Simon Suo; Kelvin Wong; Justin Xu; James Tu; Alexander Cui; Sergio Casas; Raquel Urtasun; |
1455 | ORCa: Glossy Objects As Radiance-Field Cameras Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key insight is to convert the object surface into a virtual sensor that captures cast reflections as a 2D projection of the 5D environment radiance field visible to and surrounding the object. |
Kushagra Tiwary; Akshat Dave; Nikhil Behari; Tzofi Klinghoffer; Ashok Veeraraghavan; Ramesh Raskar; |
1456 | SECAD-Net: Self-Supervised CAD Reconstruction By Learning Sketch-Extrude Operations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce SECAD-Net, an end-to-end neural network aimed at reconstructing compact and easy-to-edit CAD models in a self-supervised manner. |
Pu Li; Jianwei Guo; Xiaopeng Zhang; Dong-Ming Yan; |
1457 | Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a generic 3D-language pre-training approach, that tackles multiple facets of 3D-language reasoning by learning universal representations. |
Zhao Jin; Munawar Hayat; Yuwei Yang; Yulan Guo; Yinjie Lei; |
1458 | MDL-NAS: A Joint Multi-Domain Learning Framework for Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce MDL-NAS, a unified framework that integrates multiple vision tasks into a manageable supernet and optimizes these tasks collectively under diverse dataset domains. |
Shiguang Wang; Tao Xie; Jian Cheng; Xingcheng Zhang; Haijun Liu; |
1459 | Dual Alignment Unsupervised Domain Adaptation for Video-Text Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous works merely alleviate the domain shift, which however overlook the pairwise misalignment issue in target domain, i.e., there exist no semantic relationships between target videos and texts. To tackle this, we propose a novel method named Dual Alignment Domain Adaptation (DADA). |
Xiaoshuai Hao; Wanqian Zhang; Dayan Wu; Fei Zhu; Bo Li; |
1460 | Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Earlier works on sparse rigid object reconstruction successfully learned such priors from large datasets such as CO3D. In this paper, we extend this approach to dynamic objects. |
Samarth Sinha; Roman Shapovalov; Jeremy Reizenstein; Ignacio Rocco; Natalia Neverova; Andrea Vedaldi; David Novotny; |
1461 | Generalized Decoding for Pixel, Image, and Language Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly. |
Xueyan Zou; Zi-Yi Dou; Jianwei Yang; Zhe Gan; Linjie Li; Chunyuan Li; Xiyang Dai; Harkirat Behl; Jianfeng Wang; Lu Yuan; Nanyun Peng; Lijuan Wang; Yong Jae Lee; Jianfeng Gao; |
1462 | Towards Unified Scene Text Spotting Based on Sequence Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although some auto-regressive models have demonstrated promising results in end-to-end text spotting, they use specific detection formats while ignoring various text shapes and are limited in the maximum number of text instances that can be detected. To overcome these limitations, we propose a UNIfied scene Text Spotter, called UNITS. |
Taeho Kil; Seonghyeon Kim; Sukmin Seo; Yoonsik Kim; Daehee Kim; |
1463 | Normal-Guided Garment UV Prediction for Human Re-Texturing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that it is possible to edit dressed human images and videos without 3D reconstruction. |
Yasamin Jafarian; Tuanfeng Y. Wang; Duygu Ceylan; Jimei Yang; Nathan Carr; Yi Zhou; Hyun Soo Park; |
1464 | Learning Compact Representations for LiDAR Completion and Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, dense LiDARs are very expensive and the point clouds captured by low-beam LiDAR are often sparse. To address these issues, we present UltraLiDAR, a data-driven framework for scene-level LiDAR completion, LiDAR generation, and LiDAR manipulation. |
Yuwen Xiong; Wei-Chiu Ma; Jingkang Wang; Raquel Urtasun; |
1465 | Computational Flash Photography Through Intrinsics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the computational control of the flash light in photographs taken with or without flash. |
Sepideh Sarajian Maralan; Chris Careaga; Yagiz Aksoy; |
1466 | Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-Shot Learning With Hyperspherical Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the hubness problem in FSL, we first prove that hubness can be eliminated by distributing representations uniformly on the hypersphere. We then propose two new approaches to embed representations on the hypersphere, which we prove optimize a tradeoff between uniformity and local similarity preservation — reducing hubness while retaining class structure. |
Daniel J. Trosten; Rwiddhi Chakraborty; Sigurd Løkse; Kristoffer Knutsen Wickstrøm; Robert Jenssen; Michael C. Kampffmeyer; |
1467 | Improving Graph Representation for Point Cloud Segmentation Via Attentive Filtering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we employ a hybrid architecture design to construct our Graph Convolution Network with Attentive Filtering (AF-GCN), which takes advantage of both graph convolution and self-attention mechanism. |
Nan Zhang; Zhiyi Pan; Thomas H. Li; Wei Gao; Ge Li; |
1468 | SpaText: Spatio-Textual Representation for Controllable Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we present SpaText — a new method for text-to-image generation using open-vocabulary scene control. |
Omri Avrahami; Thomas Hayes; Oran Gafni; Sonal Gupta; Yaniv Taigman; Devi Parikh; Dani Lischinski; Ohad Fried; Xi Yin; |
1469 | The ObjectFolder Benchmark: Multisensory Learning With Neural and Real Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the ObjectFolder Benchmark, a benchmark suite of 10 tasks for multisensory object-centric learning, centered around object recognition, reconstruction, and manipulation with sight, sound, and touch. |
Ruohan Gao; Yiming Dou; Hao Li; Tanmay Agarwal; Jeannette Bohg; Yunzhu Li; Li Fei-Fei; Jiajun Wu; |
1470 | ScaleFL: Resource-Adaptive Federated Learning With Heterogeneous Clients Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents ScaleFL, a novel FL approach with two distinctive mechanisms to handle resource heterogeneity and provide an equitable FL framework for all clients. |
Fatih Ilhan; Gong Su; Ling Liu; |
1471 | X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces X3KD, a comprehensive knowledge distillation framework across different modalities, tasks, and stages for multi-camera 3DOD. |
Marvin Klingner; Shubhankar Borse; Varun Ravi Kumar; Behnaz Rezaei; Venkatraman Narayanan; Senthil Yogamani; Fatih Porikli; |
1472 | PCT-Net: Full Resolution Image Harmonization Using Pixel-Wise Color Transformations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present PCT-Net, a simple and general image harmonization method that can be easily applied to images at full resolution. |
Julian Jorge Andrade Guerreiro; Mitsuru Nakazawa; Björn Stenger; |
1473 | Architecture, Dataset and Model-Scale Agnostic Data-Free Meta-Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address those issues, we propose a unified framework, dubbed PURER, which contains: (1) ePisode cUrriculum inveRsion (ECI) during data-free meta training; and (2) invErsion calibRation following inner loop (ICFIL) during meta testing. |
Zixuan Hu; Li Shen; Zhenyi Wang; Tongliang Liu; Chun Yuan; Dacheng Tao; |
1474 | Egocentric Video Task Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We argue that this calls for a much more unified approach. We propose EgoTask Translation (EgoT2), which takes a collection of models optimized on separate tasks and learns to translate their outputs for improved performance on any or all of them at once. |
Zihui Xue; Yale Song; Kristen Grauman; Lorenzo Torresani; |
1475 | Rawgment: Noise-Accounted RAW Augmentation Enables Recognition in A Wide Variety of Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose a noise-accounted RAW image augmentation method. |
Masakazu Yoshimura; Junji Otsuka; Atsushi Irie; Takeshi Ohashi; |
1476 | Reliable and Interpretable Personalized Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Herein, a reliable personalized federated learning approach, termed RIPFL, is proposed and fully interpreted from the perspective of social learning. |
Zixuan Qin; Liu Yang; Qilong Wang; Yahong Han; Qinghua Hu; |
1477 | Optimal Transport Minimization: Crowd Localization on Density Maps for Semi-Supervised Counting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the optimal transport minimization (OT-M) algorithm for crowd localization with density maps. |
Wei Lin; Antoni B. Chan; |
1478 | AdamsFormer for Spatial Action Localization in The Future Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new task called spatial action localization in the future (SALF), which aims to predict action locations in both observed and future frames. |
Hyung-gun Chi; Kwonjoon Lee; Nakul Agarwal; Yi Xu; Karthik Ramani; Chiho Choi; |
1479 | Leveraging Per Image-Token Consistency for Vision-Language Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To handle those limitations, we propose EPIC (lEveraging Per Image-Token Consistency for vision-language pre-training). |
Yunhao Gou; Tom Ko; Hansi Yang; James Kwok; Yu Zhang; Mingxuan Wang; |
1480 | BITE: Beyond Priors for Improved Three-D Dog Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address the problem of inferring the 3D shape and pose of dogs from images. |
Nadine Rüegg; Shashank Tripathi; Konrad Schindler; Michael J. Black; Silvia Zuffi; |
1481 | Equivalent Transformation and Dual Stream Network Construction for Mobile Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel model for single-image super-resolution based on Equivalent Transformation and Dual Stream network construction (ETDS). |
Jiahao Chao; Zhou Zhou; Hongfan Gao; Jiali Gong; Zhengfeng Yang; Zhenbing Zeng; Lydia Dehbi; |
1482 | UTM: A Unified Multiple Object Tracking Model With Identity-Aware Feature Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome the limitations of existing methods, we introduce a novel Unified Tracking Model (UTM) to bridge those three components for generating a positive feedback loop with mutual benefits. |
Sisi You; Hantao Yao; Bing-Kun Bao; Changsheng Xu; |
1483 | On The Stability-Plasticity Dilemma of Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to shed light on how effectively recent class-incremental learning algorithms address the stability-plasticity trade-off. |
Dongwan Kim; Bohyung Han; |
1484 | Generalization Matters: Loss Minima Flattening Via Parameter Hybridization for Efficient Online Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we strive to fully utilize multi-model settings instead of well-designed modules to achieve a distillation effect with excellent generalization performance. |
Tianli Zhang; Mengqi Xue; Jiangtao Zhang; Haofei Zhang; Yu Wang; Lechao Cheng; Jie Song; Mingli Song; |
1485 | Gaussian Label Distribution Learning for Spherical Image Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we design a simple but effective regression loss based on Gaussian Label Distribution Learning (GLDL) for spherical image object detection. |
Hang Xu; Xinyuan Liu; Qiang Zhao; Yike Ma; Chenggang Yan; Feng Dai; |
1486 | High-Resolution Image Reconstruction With Latent Diffusion Models From Human Brain Activity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose a new method based on a diffusion model (DM) to reconstruct images from human brain activity obtained via functional magnetic resonance imaging (fMRI). |
Yu Takagi; Shinji Nishimoto; |
1487 | L-CoIns: Language-Based Colorization With Instance Awareness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a transformer-based framework to automatically aggregate similar image patches and achieve instance awareness without any additional knowledge. |
Zheng Chang; Shuchen Weng; Peixuan Zhang; Yu Li; Si Li; Boxin Shi; |
1488 | On The Effects of Self-Supervision and Contrastive Alignment in Deep Multi-View Clustering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, we find large variations in the development of self-supervision-based methods for deep MVC, potentially slowing the progress of the field. To address this, we present DeepMVC, a unified framework for deep MVC that includes many recent methods as instances. |
Daniel J. Trosten; Sigurd Løkse; Robert Jenssen; Michael C. Kampffmeyer; |
1489 | Activating More Pixels in Image Super-Resolution Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to activate more input pixels for better reconstruction, we propose a novel Hybrid Attention Transformer (HAT). |
Xiangyu Chen; Xintao Wang; Jiantao Zhou; Yu Qiao; Chao Dong; |
1490 | BEV-SAN: Accurate BEV 3D Object Detection Via Slice Attention Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method named BEV Slice Attention Network (BEV-SAN) for exploiting the intrinsic characteristics of different heights. |
Xiaowei Chi; Jiaming Liu; Ming Lu; Rongyu Zhang; Zhaoqing Wang; Yandong Guo; Shanghang Zhang; |
1491 | The Dark Side of Dynamic Routing Neural Networks: Towards Efficiency Backdoor Injection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we investigate whether adversaries can manipulate DyNNs’ computational costs to create a false sense of efficiency. To address this question, we propose EfficFrog, an adversarial attack that injects universal efficiency backdoors in DyNNs. |
Simin Chen; Hanlin Chen; Mirazul Haque; Cong Liu; Wei Yang; |
1492 | Better "CMOS" Produces Clearer Images: Learning Space-Variant Blur Estimation for Blind Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the datasets, we design a novel Cross-MOdal fuSion network (CMOS) that estimates both blur and semantics simultaneously, which leads to improved SR results. |
Xuhai Chen; Jiangning Zhang; Chao Xu; Yabiao Wang; Chengjie Wang; Yong Liu; |
1493 | MixTeacher: Mining Promising Labels With Mixed Scale Teacher for Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we delve into the scale variation problem, and propose a novel framework by introducing a mixed scale teacher to improve the pseudo labels generation and scale invariant learning. |
Liang Liu; Boshen Zhang; Jiangning Zhang; Wuhao Zhang; Zhenye Gan; Guanzhong Tian; Wenbing Zhu; Yabiao Wang; Chengjie Wang; |
1494 | DARE-GRAM: Unsupervised Domain Adaptation Regression By Aligning Inverse Gram Matrices Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a different perspective for the DAR problem by analyzing the closed-form ordinary least square (OLS) solution to the linear regressor in the deep domain adaptation context. |
Ismail Nejjar; Qin Wang; Olga Fink; |
1495 | Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a straightforward method for alleviating the problem — copy-pasting labeled and unlabeled data bidirectionally, in a simple Mean Teacher architecture. |
Yunhao Bai; Duowen Chen; Qingli Li; Wei Shen; Yan Wang; |
1496 | Learning Discriminative Representations for Skeleton Based Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It results in some ambiguous actions that are hard to be distinguished and tend to be misclassified. To alleviate this problem, we propose an auxiliary feature refinement head (FR Head), which consists of spatial-temporal decoupling and contrastive feature refinement, to obtain discriminative representations of skeletons. |
Huanyu Zhou; Qingjie Liu; Yunhong Wang; |
1497 | NeRF in The Palm of Your Hand: Corrective Augmentation for Robotics Via Novel-View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce SPARTN (Synthetic Perturbations for Augmenting Robot Trajectories via NeRF): a fully-offline data augmentation scheme for improving robot policies that use eye-in-hand cameras. |
Allan Zhou; Moo Jin Kim; Lirui Wang; Pete Florence; Chelsea Finn; |
1498 | NeuMap: Neural Coordinate Mapping By Auto-Transdecoder for Camera Localization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents an end-to-end neural mapping method for camera localization, dubbed NeuMap, encoding a whole scene into a grid of latent codes, with which a Transformer-based auto-decoder regresses 3D coordinates of query pixels. |
Shitao Tang; Sicong Tang; Andrea Tagliasacchi; Ping Tan; Yasutaka Furukawa; |
1499 | AShapeFormer: Semantics-Guided Object-Level Active Shape Encoding for 3D Object Detection Via Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose AShapeFormer, a semantics-guided object-level shape encoding module for 3D object detection. |
Zechuan Li; Hongshan Yu; Zhengeng Yang; Tongjia Chen; Naveed Akhtar; |
1500 | SeSDF: Self-Evolved Signed Distance Field for Implicit 3D Clothed Human Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a flexible framework which, by leveraging the parametric SMPL-X model, can take an arbitrary number of input images to reconstruct a clothed human model under an uncalibrated setting. |
Yukang Cao; Kai Han; Kwan-Yee K. Wong; |
1501 | Deep Depth Estimation From Thermal Image Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, in this paper, we first built a large-scale Multi-Spectral Stereo (MS^2) dataset, including stereo RGB, stereo NIR, stereo thermal, and stereo LiDAR data along with GNSS/IMU information. The collected dataset provides about 195K synchronized data pairs taken from city, residential, road, campus, and suburban areas in the morning, daytime, and nighttime under clear-sky, cloudy, and rainy conditions. Secondly, we conduct an exhaustive validation process of monocular and stereo depth estimation algorithms designed on visible spectrum bands to benchmark their performance in the thermal image domain. |
Ukcheol Shin; Jinsun Park; In So Kweon; |
1502 | Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences Between Pretrained Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an alternative approach that compares a newly developed GAN against a prior baseline. |
Matthew L. Olson; Shusen Liu; Rushil Anirudh; Jayaraman J. Thiagarajan; Peer-Timo Bremer; Weng-Keen Wong; |
1503 | Building Rearticulable Models for Arbitrary 3D Objects From 4D Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We build rearticulable models for arbitrary everyday man-made objects containing an arbitrary number of parts that are connected together in arbitrary ways via 1-degree-of-freedom joints. |
Shaowei Liu; Saurabh Gupta; Shenlong Wang; |
1504 | Backdoor Defense Via Adaptively Splitting Poisoned Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we summarize the training-time defenses from a unified framework as splitting the poisoned dataset into two data pools. |
Kuofeng Gao; Yang Bai; Jindong Gu; Yong Yang; Shu-Tao Xia; |
1505 | Neural Congealing: Aligning Images to A Joint Semantic Atlas Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Neural Congealing — a zero-shot self-supervised framework for detecting and jointly aligning semantically-common content across a given set of images. |
Dolev Ofri-Amar; Michal Geyer; Yoni Kasten; Tali Dekel; |
1506 | Adaptive Spot-Guided Transformer for Consistent Local Feature Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, most methods struggle with large scale variations. To deal with the above issues, we propose Adaptive Spot-Guided Transformer (ASTR) for local feature matching, which jointly models the local consistency and scale variations in a unified coarse-to-fine architecture. |
Jiahuan Yu; Jiahao Chang; Jianfeng He; Tianzhu Zhang; Jiyang Yu; Feng Wu; |
1507 | Wide-Angle Rectification Via Content-Aware Conformal Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing rectification methods adopt a global warping transformation to undistort the input wide-angle image, yet their performances are not entirely satisfactory, leaving many unwanted residue distortions uncorrected or at the sacrifice of the intended wide FoV (field-of-view). This paper proposes a new method to tackle these challenges. |
Qi Zhang; Hongdong Li; Qing Wang; |
1508 | Towards Stable Human Pose Estimation Via Cross-View Fusion and Foot Stabilization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first propose the Cross-View Fusion (CVF) module to obtain a better 3D intermediate representation and alleviate view inconsistency based on the vision transformer encoder. Then an optimization-based method is introduced to reconstruct the foot pose and foot-ground contact for general multi-view datasets including AIST++ and Human3.6M. Besides, a reversible kinematic topology strategy is introduced to incorporate the contact information into the full-body-with-foot pose regressor. |
Li’an Zhuo; Jian Cao; Qi Wang; Bang Zhang; Liefeng Bo; |
1509 | Few-Shot Non-Line-of-Sight Imaging With Signal-Surface Collaborative Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a signal-surface collaborative regularization (SSCR) framework that provides noise-robust reconstructions with a minimal number of measurements. |
Xintong Liu; Jianyu Wang; Leping Xiao; Xing Fu; Lingyun Qiu; Zuoqiang Shi; |
1510 | SINE: SINgle Image Editing With Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work aims to address the problem of single-image editing. |
Zhixing Zhang; Ligong Han; Arnab Ghosh; Dimitris N. Metaxas; Jian Ren; |
1511 | Probabilistic Debiasing of Scene Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose virtual evidence incorporated within-triplet Bayesian Network (BN) to preserve the object-conditional distribution of the relationship label and to eradicate the bias created by the marginal probability of the relationships. |
Bashirul Azam Biswas; Qiang Ji; |
1512 | OSAN: A One-Stage Alignment Network To Unify Multimodal Alignment and Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in most existing two-stage studies, domains and modalities are not associated, and the relationship between them, which could provide complementary information, is not leveraged. In this paper, we unify these two stages into one to align domains and modalities simultaneously. |
Ye Liu; Lingfeng Qiao; Changchong Lu; Di Yin; Chen Lin; Haoyuan Peng; Bo Ren; |
1513 | Token Turing Machines Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Token Turing Machines (TTM), a sequential, autoregressive Transformer model with memory for real-world sequential visual understanding. |
Michael S. Ryoo; Keerthana Gopalakrishnan; Kumara Kahatapitiya; Ted Xiao; Kanishka Rao; Austin Stone; Yao Lu; Julian Ibarz; Anurag Arnab; |
1514 | Solving 3D Inverse Problems Using Pre-Trained 2D Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we combine the ideas from the conventional model-based iterative reconstruction with the modern diffusion models, which leads to a highly effective method for solving 3D medical image reconstruction tasks such as sparse-view tomography, limited angle tomography, compressed sensing MRI from pre-trained 2D diffusion models. |
Hyungjin Chung; Dohoon Ryu; Michael T. McCann; Marc L. Klasky; Jong Chul Ye; |
1515 | Heat Diffusion Based Multi-Scale and Geometric Structure-Aware Transformer for Mesh Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Current point based Transformer models fail to tackle such challenges and thus provide inferior performance for discretized surface segmentation. In this work, a heat diffusion based method is exploited to tackle these problems. |
Chi-Chong Wong; |
1516 | DyNCA: Real-Time Dynamic Texture Synthesis Using Neural Cellular Automata Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Dynamic Neural Cellular Automata (DyNCA), a framework for real-time and controllable dynamic texture synthesis. |
Ehsan Pajouheshgar; Yitao Xu; Tong Zhang; Sabine Süsstrunk; |
1517 | Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose D^2Zero with Semantic-Promoted Debiasing and Background Disambiguation to enhance the performance of Zero-shot instance segmentation. |
Shuting He; Henghui Ding; Wei Jiang; |
1518 | RelightableHands: Efficient Neural Relighting of Articulated Hand Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the first neural relighting approach for rendering high-fidelity personalized hands that can be animated in real-time under novel illumination. |
Shun Iwase; Shunsuke Saito; Tomas Simon; Stephen Lombardi; Timur Bagautdinov; Rohan Joshi; Fabian Prada; Takaaki Shiratori; Yaser Sheikh; Jason Saragih; |
1519 | Paired-Point Lifting for Enhanced Privacy-Preserving Visual Localization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present an alternative lightweight strategy called Paired-Point Lifting (PPL) for constructing 3D line clouds. |
Chunghwan Lee; Jaihoon Kim; Chanhyuk Yun; Je Hyeong Hong; |
1520 | Depth Estimation From Camera Image and MmWave Radar Point Cloud Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a method for inferring dense depth from a camera image and a sparse noisy radar point cloud. |
Akash Deep Singh; Yunhao Ba; Ankur Sarker; Howard Zhang; Achuta Kadambi; Stefano Soatto; Mani Srivastava; Alex Wong; |
1521 | Learning Event Guided High Dynamic Range Video Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a multimodal learning framework for event guided HDR video reconstruction. |
Yixin Yang; Jin Han; Jinxiu Liang; Imari Sato; Boxin Shi; |
1522 | Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on A Knowledge-Guided Relation Graph Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current archaeology depends on trained experts to carry out bronze dating, which is time-consuming and labor-intensive. In this study, we propose a learning-based dating approach that integrates advanced deep learning techniques with archaeological knowledge. |
Rixin Zhou; Jiafu Wei; Qian Zhang; Ruihua Qi; Xi Yang; Chuntao Li; |
1523 | CASP-Net: Rethinking Video Saliency Prediction From An Audio-Visual Consistency Perceptual Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the biological inconsistency-correction within multi-sensory information, in this study, a consistency-aware audio-visual saliency prediction network (CASP-Net) is proposed, which takes a comprehensive consideration of the audio-visual semantic interaction and consistent perception. |
Junwen Xiong; Ganglai Wang; Peng Zhang; Wei Huang; Yufei Zha; Guangtao Zhai; |
1524 | Learning Expressive Prompting With Residuals for Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Expressive Prompts with Residuals (EXPRES) which modifies the prompt learning paradigm specifically for effective adaptation of vision transformers (ViT). |
Rajshekhar Das; Yonatan Dukler; Avinash Ravichandran; Ashwin Swaminathan; |
1525 | Prototypical Residual Networks for Anomaly Detection and Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, anomalies are typically subtle, hard to discern, and varied in appearance, making it difficult to detect anomalies, let alone locate anomalous regions. To address these issues, we propose a framework called Prototypical Residual Network (PRN), which learns feature residuals of varying scales and sizes between anomalous and normal patterns to accurately reconstruct the segmentation maps of anomalous regions. |
Hui Zhang; Zuxuan Wu; Zheng Wang; Zhineng Chen; Yu-Gang Jiang; |
1526 | What Happened 3 Seconds Ago? Inferring The Past With Thermal Imaging Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thermal images, on the other hand, encode traces of past human-object interactions left in the environment via thermal radiation measurement. Based on this observation, we collect the first RGB-Thermal dataset for human motion analysis, dubbed Thermal-IM. |
Zitian Tang; Wenjie Ye; Wei-Chiu Ma; Hang Zhao; |
1527 | Ultrahigh Resolution Image/Video Matting With Spatio-Temporal Sparsity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes SparseMat, a computationally efficient approach for UHR image/video matting. |
Yanan Sun; Chi-Keung Tang; Yu-Wing Tai; |
1528 | AnyFlow: Arbitrary Scale Optical Flow With Implicit Neural Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce AnyFlow, a robust network that estimates accurate flow from images of various resolutions. |
Hyunyoung Jung; Zhuo Hui; Lei Luo; Haitao Yang; Feng Liu; Sungjoo Yoo; Rakesh Ranjan; Denis Demandolx; |
1529 | Zero-Shot Noise2Noise: Efficient Image Denoising Without Any Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we show that a simple 2-layer network, without any training data or knowledge of the noise distribution, can enable high-quality image denoising at low computational cost. |
Youssef Mansour; Reinhard Heckel; |
1530 | Vector Quantization With Self-Attention for Quality-Independent Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by sparse representation in image restoration, we opt to address this issue by learning image-quality-independent feature representation in a simple plug-and-play manner, that is, to introduce discrete vector quantization (VQ) to remove redundancy in recognition models. |
Zhou Yang; Weisheng Dong; Xin Li; Mengluan Huang; Yulin Sun; Guangming Shi; |
1531 | Generating Anomalies for Video Anomaly Detection With Prompt-Based Feature Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to solve the problem of the anomaly gap and scene gap by proposing a prompt-based feature mapping framework (PFMF). |
Zuhao Liu; Xiao-Ming Wu; Dian Zheng; Kun-Yu Lin; Wei-Shi Zheng; |
1532 | Diffusion-Based Signed Distance Fields for 3D Shape Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a 3D shape generation framework (SDF-Diffusion in short) that uses denoising diffusion models with continuous 3D representation via signed distance fields (SDF). |
Jaehyeok Shim; Changwoo Kang; Kyungdon Joo; |
1533 | Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition From Egocentric RGB Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address occlusion and ambiguity, we develop a transformer-based framework to exploit temporal information for robust estimation. |
Yilin Wen; Hao Pan; Lei Yang; Jia Pan; Taku Komura; Wenping Wang; |
1534 | CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a new framework named CAP-VSTNet, which consists of a new reversible residual network and an unbiased linear transform module, for versatile style transfer. |
Linfeng Wen; Chengying Gao; Changqing Zou; |
1535 | FIANCEE: Faster Inference of Adversarial Networks Via Conditional Early Exits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method for diminishing computations by adding so-called early exit branches to the original architecture, and dynamically switching the computational path depending on how difficult it will be to render the output. |
Polina Karpikova; Ekaterina Radionova; Anastasia Yaschenko; Andrei Spiridonov; Leonid Kostyushko; Riccardo Fabbricatore; Aleksei Ivakhnenko; |
1536 | Simultaneously Short- and Long-Term Temporal Modeling for Semi-Supervised Video Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel feature enhancement network to simultaneously model short- and long-term temporal correlation. |
Jiangwei Lao; Weixiang Hong; Xin Guo; Yingying Zhang; Jian Wang; Jingdong Chen; Wei Chu; |
1537 | Federated Domain Generalization With Generalization Adjustment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, without the support of multi-domain data jointly in the mini-batch training, almost all methods cannot guarantee the generalization under domain shift. To overcome this problem, we propose a novel global objective incorporating a new variance reduction regularizer to encourage fairness. |
Ruipeng Zhang; Qinwei Xu; Jiangchao Yao; Ya Zhang; Qi Tian; Yanfeng Wang; |
1538 | Tunable Convolutions With Parametric Multi-Loss Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to optimize a parametric tunable convolutional layer, which includes a number of different kernels, using a parametric multi-loss, which includes an equal number of objectives. |
Matteo Maggioni; Thomas Tanay; Francesca Babiloni; Steven McDonagh; Aleš Leonardis; |
1539 | Learning To Generate Text-Grounded Mask for Open-World Semantic Segmentation From Only Image-Text Pairs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we proposed a novel Text-grounded Contrastive Learning (TCL) framework that enables a model to directly learn region-text alignment. |
Junbum Cha; Jonghwan Mun; Byungseok Roh; |
1540 | CoMFormer: Continual Learning in Semantic and Panoptic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first continual learning model capable of operating on both semantic and panoptic segmentation. |
Fabio Cermelli; Matthieu Cord; Arthur Douillard; |
1541 | DeepSolo: Let Transformer Decoder With Explicit Points Solo for Text Spotting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present DeepSolo, a simple DETR-like baseline that lets a single Decoder with Explicit Points Solo for text detection and recognition simultaneously. |
Maoyuan Ye; Jing Zhang; Shanshan Zhao; Juhua Liu; Tongliang Liu; Bo Du; Dacheng Tao; |
1542 | Conditional Generation of Audio From Video Via Foley Analogies Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The sound effects that designers add to videos are designed to convey a particular artistic effect and, thus, may be quite different from a scene’s true sound. Inspired by the challenges of creating a soundtrack for a video that differs from its true sound, but that nonetheless matches the actions occurring on screen, we propose the problem of conditional Foley. |
Yuexi Du; Ziyang Chen; Justin Salamon; Bryan Russell; Andrew Owens; |
1543 | Diverse 3D Hand Gesture Prediction From Body Dynamics By Bilateral Hand Disentanglement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a novel bilateral hand disentanglement based two-stage 3D hand generation method to achieve natural and diverse 3D hand prediction from body dynamics. |
Xingqun Qi; Chen Liu; Muyi Sun; Lincheng Li; Changjie Fan; Xin Yu; |
1544 | DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a new approach for "personalization" of text-to-image diffusion models. |
Nataniel Ruiz; Yuanzhen Li; Varun Jampani; Yael Pritch; Michael Rubinstein; Kfir Aberman; |
1545 | MOSO: Decomposing MOtion, Scene and Object for Video Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, objects represent the foreground, scenes represent the background, and motion traces their dynamics. Based on this insight, we propose a two-stage MOtion, Scene and Object decomposition framework (MOSO) for video prediction, consisting of MOSO-VQVAE and MOSO-Transformer. |
Mingzhen Sun; Weining Wang; Xinxin Zhu; Jing Liu; |
1546 | Shakes on A Plane: Unsupervised Depth Estimation From Unstabilized Photography Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we devise a test-time optimization approach that fits a neural RGB-D representation to long-burst data and simultaneously estimates scene depth and camera motion. |
Ilya Chugunov; Yuxuan Zhang; Felix Heide; |
1547 | Learning Video Representations From Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce LAVILA, a new approach to learning video-language representations by leveraging Large Language Models (LLMs). |
Yue Zhao; Ishan Misra; Philipp Krähenbühl; Rohit Girdhar; |
1548 | Learning The Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new loss function for joint disparity and uncertainty estimation in deep stereo matching. |
Liyan Chen; Weihan Wang; Philippos Mordohai; |
1549 | Learning Correspondence Uncertainty Via Differentiable Nonlinear Least Squares Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a differentiable nonlinear least squares framework to account for uncertainty in relative pose estimation from feature correspondences. |
Dominik Muhle; Lukas Koestler; Krishna Murthy Jatavallabhula; Daniel Cremers; |
1550 | Samples With Low Loss Curvature Improve Data Efficiency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the second order properties of the loss of trained deep neural networks with respect to the training data points to understand the curvature of the loss surface in the vicinity of these points. |
Isha Garg; Kaushik Roy; |
1551 | Towards Effective Visual Representations for Partial-Label Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we rethink PiCO [24], a state-of-the-art contrastive PLL method, which inspires the design of a simple framework termed PaPi (Partial-label learning with a guided Prototypical classifier), which demonstrates significant scope for improvement in representation learning, thus contributing to label disambiguation. |
Shiyu Xia; Jiaqi Lv; Ning Xu; Gang Niu; Xin Geng; |
1552 | MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a simple yet effective framework MaskCLIP, which incorporates a newly proposed masked self-distillation into contrastive language-image pretraining. |
Xiaoyi Dong; Jianmin Bao; Yinglin Zheng; Ting Zhang; Dongdong Chen; Hao Yang; Ming Zeng; Weiming Zhang; Lu Yuan; Dong Chen; Fang Wen; Nenghai Yu; |
1553 | Open-Vocabulary Semantic Segmentation With Mask-Adapted CLIP Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We identify the performance bottleneck of this paradigm to be the pre-trained CLIP model, since it does not perform well on masked images. To address this, we propose to finetune CLIP on a collection of masked image regions and their corresponding text descriptions. |
Feng Liang; Bichen Wu; Xiaoliang Dai; Kunpeng Li; Yinan Zhao; Hang Zhang; Peizhao Zhang; Peter Vajda; Diana Marculescu; |
1554 | A Loopback Network for Explainable Microvascular Invasion Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, aiming to develop an accurate, objective, and explainable diagnosis tool for MVI, we propose a Loopback Network (LoopNet) for classifying MVI efficiently. |
Shengxuming Zhang; Tianqi Shi; Yang Jiang; Xiuming Zhang; Jie Lei; Zunlei Feng; Mingli Song; |
1555 | TINC: Tree-Structured Implicit Neural Compression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Preliminary studies can only exploit either global or local correlation in the target data and thus achieve limited performance. In this paper, we propose a Tree-structured Implicit Neural Compression (TINC) to conduct compact representation for local regions and extract the shared features of these local representations in a hierarchical manner. |
Runzhao Yang; |
1556 | Unifying Short and Long-Term Tracking With Graph Hierarchies Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we question the need for hybrid approaches and introduce SUSHI, a unified and scalable multi-object tracker. |
Orcun Cetintas; Guillem Brasó; Laura Leal-Taixé; |
1557 | Inferring and Leveraging Parts From Object Shape for Improving Semantic Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To improve part synthesis, this paper presents to infer Parts from Object ShapE (iPOSE) and leverage it for improving semantic image synthesis. |
Yuxiang Wei; Zhilong Ji; Xiaohe Wu; Jinfeng Bai; Lei Zhang; Wangmeng Zuo; |
1558 | MIME: Human-Aware 3D Scene Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose MIME (Mining Interaction and Movement to infer 3D Environments), which is a generative model of indoor scenes that produces furniture layouts that are consistent with the human movement. |
Hongwei Yi; Chun-Hao P. Huang; Shashank Tripathi; Lea Hering; Justus Thies; Michael J. Black; |
1559 | Re-Basin Via Implicit Sinkhorn Differentiation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Sinkhorn re-basin network with the ability to obtain the transportation plan that better suits a given objective. |
Fidel A. Guerrero Peña; Heitor Rapela Medeiros; Thomas Dubail; Masih Aminbeidokhti; Eric Granger; Marco Pedersoli; |
1560 | NerVE: Neural Volumetric Edges for Parametric Curve Extraction From Point Cloud Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing approaches mainly rely on keypoint detection, a challenging procedure that tends to generate noisy output, making the subsequent edge extraction error-prone. To address this issue, we propose to directly detect structured edges to circumvent the limitations of the previous point-wise methods. |
Xiangyu Zhu; Dong Du; Weikai Chen; Zhiyou Zhao; Yinyu Nie; Xiaoguang Han; |
1561 | ShapeClipper: Scalable 3D Shape Learning From Single-View Images Via Geometric and CLIP-Based Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present ShapeClipper, a novel method that reconstructs 3D object shapes from real-world single-view RGB images. |
Zixuan Huang; Varun Jampani; Anh Thai; Yuanzhen Li; Stefan Stojanov; James M. Rehg; |
1562 | Supervised Masked Knowledge Distillation for Few-Shot Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by recent advances in self-supervised knowledge distillation and masked image modeling (MIM), we propose a novel Supervised Masked Knowledge Distillation model (SMKD) for few-shot Transformers which incorporates label information into self-distillation frameworks. |
Han Lin; Guangxing Han; Jiawei Ma; Shiyuan Huang; Xudong Lin; Shih-Fu Chang; |
1563 | RIDCP: Revitalizing Real Image Dehazing Via High-Quality Codebook Priors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, (1) instead of adopting the de facto physical scattering model, we rethink the degradation of real hazy images and propose a phenomenological pipeline considering diverse degradation types. |
Rui-Qi Wu; Zheng-Peng Duan; Chun-Le Guo; Zhi Chai; Chongyi Li; |
1564 | Exact-NeRF: An Exploration of A Precise Volumetric Parameterization for Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the use of an exact approach for calculating the IPE by using a pyramid-based integral formulation instead of an approximated conical-based one. |
Brian K. S. Isaac-Medina; Chris G. Willcocks; Toby P. Breckon; |
1565 | Backdoor Attacks Against Deep Image Compression Via Adaptive Frequency Trigger Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel backdoor attack with multiple triggers against learned image compression models. |
Yap-Peng Tan; Alex C. Kot; Yi Yu; Yufei Wang; Wenhan Yang; Shijian Lu; |
1566 | Recurrence Without Recurrence: Stable Video Landmark Detection With Deep Equilibrium Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Cascaded computation, whereby predictions are recurrently refined over several stages, has been a persistent theme throughout the development of landmark detection models. In this work, we show that the recently proposed Deep Equilibrium Model (DEQ) can be naturally adapted to this form of computation. |
Paul Micaelli; Arash Vahdat; Hongxu Yin; Jan Kautz; Pavlo Molchanov; |
1567 | Generalized Relation Modeling for Transformer Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This could potentially lead to target-background confusion when the extracted feature representations are not sufficiently discriminative. To alleviate this issue, we propose a generalized relation modeling method based on adaptive token division. |
Shenyuan Gao; Chunluan Zhou; Jun Zhang; |
1568 | Non-Line-of-Sight Imaging With Signal Superresolution Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a general learning-based pipeline for increasing imaging quality with only a few scanning points. |
Jianyu Wang; Xintong Liu; Leping Xiao; Zuoqiang Shi; Lingyun Qiu; Xing Fu; |
1569 | WildLight: In-the-Wild Inverse Rendering With A Flashlight Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a practical photometric solution for the challenging problem of in-the-wild inverse rendering under unknown ambient lighting. |
Ziang Cheng; Junxuan Li; Hongdong Li; |
1570 | A Probabilistic Attention Model With Occlusion-Aware Texture Regression for 3D Hand Reconstruction From A Single RGB Image Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These approaches can be roughly divided into model-based approaches, which are heavily dependent on the model’s parameter space, and model-free approaches, which require large numbers of 3D ground truths to reduce depth ambiguity and struggle in weakly-supervised scenarios. To overcome these issues, we propose a novel probabilistic model to achieve the robustness of model-based approaches and reduced dependence on the model’s parameter space of model-free approaches. |
Zheheng Jiang; Hossein Rahmani; Sue Black; Bryan M. Williams; |
1571 | MixNeRF: Modeling A Ray With Mixture Density for Novel View Synthesis From Sparse Inputs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose MixNeRF, an effective training strategy for novel view synthesis from sparse inputs by modeling a ray with a mixture density model. |
Seunghyeon Seo; Donghoon Han; Yeonjin Chang; Nojun Kwak; |
1572 | A New Path: Scaling Vision-and-Language Navigation With Synthetic Instructions and Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate large-scale augmentation with synthetic instructions. We take 500+ indoor environments captured in densely-sampled 360 degree panoramas, construct navigation trajectories through these panoramas, and generate a visually-grounded instruction for each trajectory using Marky, a high-quality multilingual navigation instruction generator. |
Aishwarya Kamath; Peter Anderson; Su Wang; Jing Yu Koh; Alexander Ku; Austin Waters; Yinfei Yang; Jason Baldridge; Zarana Parekh; |
1573 | Layout-Based Causal Inference for Object Navigation Highlight: Motivated by keeping the positive effect and removing the negative effect of the experience, we propose the layout-based soft Total Direct Effect (L-sTDE) framework based on the causal inference to adjust the prediction of the navigation policy. |
Sixian Zhang; Xinhang Song; Weijie Li; Yubing Bai; Xinyao Yu; Shuqiang Jiang; |
1574 | Pose-Disentangled Contrastive Learning for Self-Supervised Facial Representation Highlight: More specifically, existing contrastive learning (CL) tends to learn pose-invariant features that cannot depict the pose details of faces, compromising the learning performance. To conquer the above limitation of CL, we propose a novel Pose-disentangled Contrastive Learning (PCL) method for general self-supervised facial representation. |
Yuanyuan Liu; Wenbin Wang; Yibing Zhan; Shaoze Feng; Kejun Liu; Zhe Chen; |
1575 | Cross-Domain 3D Hand Pose Estimation With Dual Modalities Highlight: Recent advances in hand pose estimation have shed light on utilizing synthetic data to train neural networks, which however inevitably hinders generalization to real-world data due to domain gaps. To solve this problem, we present a framework for cross-domain semi-supervised hand pose estimation and target the challenging scenario of learning models from labelled multi-modal synthetic data and unlabelled real-world data. |
Qiuxia Lin; Linlin Yang; Angela Yao; |
1576 | Attribute-Preserving Face Dataset Anonymization Via Latent Code Optimization Highlight: We accordingly present a task-agnostic anonymization procedure that directly optimises the images’ latent representation in the latent space of a pre-trained GAN. |
Simone Barattin; Christos Tzelepis; Ioannis Patras; Nicu Sebe; |
1577 | Inverse Rendering of Translucent Objects Using Physical and Neural Renderers Highlight: In this work, we propose an inverse rendering model that estimates 3D shape, spatially-varying reflectance, homogeneous subsurface scattering parameters, and an environment illumination jointly from only a pair of captured images of a translucent object. |
Chenhao Li; Trung Thanh Ngo; Hajime Nagahara; |
1578 | Towards Building Self-Aware Object Detectors Via Reliable Uncertainty Quantification and Calibration Highlight: The current approach for testing the robustness of object detectors suffers from serious deficiencies such as improper methods of performing out-of-distribution detection and using calibration metrics which do not consider both localisation and classification quality. In this work, we address these issues, and introduce the Self Aware Object Detection (SAOD) task, a unified testing framework which respects and adheres to the challenges that object detectors face in safety-critical environments such as autonomous driving. |
Kemal Oksuz; Tom Joy; Puneet K. Dokania; |
1579 | Ensemble-Based Blackbox Attacks on Dense Prediction Highlight: We propose an approach for adversarial attacks on dense prediction models (such as object detectors and segmentation). |
Zikui Cai; Yaoteng Tan; M. Salman Asif; |
1580 | Improving Fairness in Facial Albedo Estimation Via Visual-Textual Cues Highlight: In this paper, we reconsider the relationship between albedo and face attributes and propose an ID2Albedo to directly estimate albedo without constraining illumination. |
Xingyu Ren; Jiankang Deng; Chao Ma; Yichao Yan; Xiaokang Yang; |
1581 | Source-Free Video Domain Adaptation With Spatial-Temporal-Historical Consistency Learning Highlight: In this paper, we propose a simple and highly flexible method for Source-Free Video Domain Adaptation (SFVDA), which extensively exploits consistency learning for videos from spatial, temporal, and historical perspectives. |
Kai Li; Deep Patel; Erik Kruus; Martin Renqiang Min; |
1582 | SmartAssign: Learning A Smart Knowledge Assignment Strategy for Deraining and Desnowing Highlight: In this paper, we focus on the very correlated rain and snow to explore their connections at deep representation level. |
Yinglong Wang; Chao Ma; Jianzhuang Liu; |
1583 | Delving Into Discrete Normalizing Flows on SO(3) Manifold for Probabilistic Rotation Modeling Highlight: In this paper, we propose a novel normalizing flow on SO(3) by combining a Möbius transformation-based coupling layer and a quaternion affine transformation. |
Yulin Liu; Haoran Liu; Yingda Yin; Yang Wang; Baoquan Chen; He Wang; |
1584 | SfM-TTR: Using Structure From Motion for Test-Time Refinement of Single-View Depth Networks Highlight: In this work, we combine the strengths of both approaches by proposing a novel test-time refinement (TTR) method, denoted as SfM-TTR, that boosts the performance of single-view depth networks at test time using SfM multi-view cues. |
Sergio Izquierdo; Javier Civera; |
1585 | Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning Highlight: In this work, we propose ESPER (Extending Sensory PErception with Reinforcement learning) which enables text-only pretrained models to address multimodal tasks such as visual commonsense reasoning. |
Youngjae Yu; Jiwan Chung; Heeseung Yun; Jack Hessel; Jae Sung Park; Ximing Lu; Rowan Zellers; Prithviraj Ammanabrolu; Ronan Le Bras; Gunhee Kim; Yejin Choi; |
1586 | MELTR: Meta Loss Transformer for Learning To Fine-Tune Video Foundation Models Highlight: We formulate the auxiliary learning as a bi-level optimization problem and present an efficient optimization algorithm based on Approximate Implicit Differentiation (AID). |
Dohwan Ko; Joonmyung Choi; Hyeong Kyu Choi; Kyoung-Woon On; Byungseok Roh; Hyunwoo J. Kim; |
1587 | Dense Network Expansion for Class Incremental Learning Highlight: A new NE method, dense network expansion (DNE), is proposed to achieve a better trade-off between accuracy and model complexity. |
Zhiyuan Hu; Yunsheng Li; Jiancheng Lyu; Dashan Gao; Nuno Vasconcelos; |
1588 | Meta-Personalizing Vision-Language Models To Find Named Instances in Video Highlight: While these models allow category-level queries, they currently struggle with personalized searches for moments in a video where a specific object instance such as "My dog Biscuit" appears. We present the following three contributions to address this problem. |
Chun-Hsiao Yeh; Bryan Russell; Josef Sivic; Fabian Caba Heilbron; Simon Jenni; |
1589 | Regularize Implicit Neural Representation By Itself Highlight: This paper proposes a regularizer called Implicit Neural Representation Regularizer (INRR) to improve the generalization ability of the Implicit Neural Representation (INR). |
Zhemin Li; Hongxia Wang; Deyu Meng; |
1590 | Egocentric Audio-Visual Object Localization Highlight: In this paper, we explore the challenging egocentric audio-visual object localization task and observe that 1) egomotion commonly exists in first-person recordings, even within a short duration; 2) The out-of-view sound components can be created while wearers shift their attention. |
Chao Huang; Yapeng Tian; Anurag Kumar; Chenliang Xu; |
1591 | DropKey for Vision Transformer Highlight: In this paper, we focus on analyzing and improving the dropout technique for self-attention layers of Vision Transformer, which is important while surprisingly ignored by prior works. |
Bonan Li; Yinhan Hu; Xuecheng Nie; Congying Han; Xiangjian Jiang; Tiande Guo; Luoqi Liu; |
1592 | SRGB Real Noise Synthesizing With Neighboring Correlation-Aware Noise Model Highlight: While most deep noise generators synthesize sRGB real noise using an end-to-end trained model, the lack of explicit noise modeling degrades the quality of their synthesized noise. In this work, we propose to model the real noise as not only dependent on the underlying clean image pixel intensity, but also highly correlated to its neighboring noise realization within the local region. |
Zixuan Fu; Lanqing Guo; Bihan Wen; |
1593 | Meta Architecture for Point Cloud Analysis Highlight: In this paper, we take the initiative to explore and propose a unified framework called PointMeta, to which the popular 3D point cloud analysis approaches could fit. |
Haojia Lin; Xiawu Zheng; Lijiang Li; Fei Chao; Shanshan Wang; Yan Wang; Yonghong Tian; Rongrong Ji; |
1594 | Ambiguous Medical Image Segmentation Using Diffusion Models Highlight: In this paper, we introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights. |
Aimon Rahman; Jeya Maria Jose Valanarasu; Ilker Hacihaliloglu; Vishal M. Patel; |
1595 | CIRCLE: Capture in Rich Contextual Environments Highlight: We propose a novel motion acquisition system in which the actor perceives and operates in a highly contextual virtual world while being motion captured in the real world. |
João Pedro Araújo; Jiaman Li; Karthik Vetrivel; Rishi Agarwal; Jiajun Wu; Deepak Gopinath; Alexander William Clegg; Karen Liu; |
1596 | Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation Highlight: In this work, we revisit the weak-to-strong consistency framework, popularized by FixMatch from semi-supervised classification, where the prediction of a weakly perturbed image serves as supervision for its strongly perturbed version. |
Lihe Yang; Lei Qi; Litong Feng; Wayne Zhang; Yinghuan Shi; |
1597 | Implicit View-Time Interpolation of Stereo Videos Using Multi-Plane Disparities and Non-Uniform Coordinates Highlight: In this paper, we propose an approach for view-time interpolation of stereo videos. |
Avinash Paliwal; Andrii Tsarov; Nima Khademi Kalantari; |
1598 | PyPose: A Library for Robot Learning With Physics-Based Optimization Highlight: By contrast, physics-based optimization generalizes better, but it does not perform as well in complicated tasks due to the lack of high-level semantic information and reliance on manual parametric tuning. To take advantage of these two complementary worlds, we present PyPose: a robotics-oriented, PyTorch-based library that combines deep perceptual models with physics-based optimization. |
Chen Wang; Dasong Gao; Kuan Xu; Junyi Geng; Yaoyu Hu; Yuheng Qiu; Bowen Li; Fan Yang; Brady Moon; Abhinav Pandey; Aryan; Jiahe Xu; Tianhao Wu; Haonan He; Daning Huang; Zhongqiang Ren; Shibo Zhao; Taimeng Fu; Pranay Reddy; Xiao Lin; Wenshan Wang; Jingnan Shi; Rajat Talak; Kun Cao; Yi Du; Han Wang; Huai Yu; Shanzhao Wang; Siyu Chen; Ananth Kashyap; Rohan Bandaru; Karthik Dantu; Jiajun Wu; Lihua Xie; Luca Carlone; Marco Hutter; Sebastian Scherer; |
1599 | Make Landscape Flatter in Differentially Private Federated Learning Highlight: However, existing DPFL methods tend to make a sharper loss landscape and have poorer weight perturbation robustness, resulting in severe performance degradation. To alleviate these issues, we propose a novel DPFL algorithm named DP-FedSAM, which leverages gradient perturbation to mitigate the negative impact of DP. |
Yifan Shi; Yingqi Liu; Kang Wei; Li Shen; Xueqian Wang; Dacheng Tao; |
1600 | BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning Highlight: In this work, we propose black-box visual prompting (BlackVIP), which efficiently adapts the PTMs without knowledge about model architectures and parameters. |
Changdae Oh; Hyeji Hwang; Hee-young Lee; YongTaek Lim; Geunyoung Jung; Jiyoung Jung; Hosik Choi; Kyungwoo Song; |
1601 | DeepVecFont-v2: Exploiting Transformers To Synthesize Vector Fonts With Higher Quality Highlight: Thus, vector glyphs synthesized by DeepVecFont still often contain some distortions and artifacts and cannot rival human-designed results. To address the above problems, this paper proposes an enhanced version of DeepVecFont mainly by making the following three novel technical contributions. First, we adopt Transformers instead of RNNs to process sequential data and design a relaxation representation for vector outlines, markedly improving the model’s capability and stability of synthesizing long and complex outlines. |
Yuqing Wang; Yizhi Wang; Longhui Yu; Yuesheng Zhu; Zhouhui Lian; |
1602 | PCON: Polarimetric Coordinate Networks for Neural Scene Representations Highlight: We propose polarimetric coordinate networks (pCON), a new model architecture for neural scene representations aimed at preserving polarimetric information while accurately parameterizing the scene. |
Henry Peters; Yunhao Ba; Achuta Kadambi; |
1603 | Soft-Landing Strategy for Alleviating The Task Discrepancy Problem in Temporal Action Localization Tasks Highlight: In this work, we introduce Soft-Landing (SoLa) strategy, an efficient yet effective framework to bridge the transferability gap between the pretrained encoder and the downstream tasks by incorporating a light-weight neural network, i.e., a SoLa module, on top of the frozen encoder. |
Hyolim Kang; Hanjung Kim; Joungbin An; Minsu Cho; Seon Joo Kim; |
1604 | Visibility Aware Human-Object Interaction Tracking From Single RGB Camera Highlight: In this work, we propose a novel method to track the 3D human, object, contacts, and relative translation across frames from a single RGB camera, while being robust to heavy occlusions. |
Xianghui Xie; Bharat Lal Bhatnagar; Gerard Pons-Moll; |
1605 | Uncertainty-Aware Vision-Based Metric Cross-View Geolocalization Highlight: This paper proposes a novel method for vision-based metric cross-view geolocalization (CVGL) that matches the camera images captured from a ground-based vehicle with an aerial image to determine the vehicle’s geo-pose. |
Florian Fervers; Sebastian Bullinger; Christoph Bodensteiner; Michael Arens; Rainer Stiefelhagen; |
1606 | DANI-Net: Uncalibrated Photometric Stereo By Differentiable Shadow Handling, Anisotropic Reflectance Modeling, and Neural Inverse Rendering Highlight: To exploit cues from shadow and reflectance to solve UPS and improve performance on general materials, we propose DANI-Net, an inverse rendering framework with differentiable shadow handling and anisotropic reflectance modeling. |
Zongrui Li; Qian Zheng; Boxin Shi; Gang Pan; Xudong Jiang; |
1607 | Towards Better Stability and Adaptability: Improve Online Self-Training for Model Adaptation in Semantic Segmentation Highlight: In this paper, we focus on unsupervised model adaptation (UMA), also called source-free domain adaptation, which adapts a source-trained model to the target domain without accessing source data. |
Dong Zhao; Shuang Wang; Qi Zang; Dou Quan; Xiutiao Ye; Licheng Jiao; |
1608 | Continuous Landmark Detection With 3D Queries Highlight: We propose the first facial landmark detection network that can predict continuous, unlimited landmarks, allowing the number and location of the desired landmarks to be specified at inference time. |
Prashanth Chandran; Gaspard Zoss; Paulo Gotardo; Derek Bradley; |
1609 | Ranking Regularization for Critical Rare Classes: Minimizing False Positives at A High True Positive Rate Highlight: In this paper, we present a novel approach to address the challenge of minimizing false positives for systems that need to operate at a high true positive rate. |
Kiarash Mohammadi; He Zhao; Mengyao Zhai; Frederick Tung; |
1610 | Rethinking Gradient Projection Continual Learning: Stability / Plasticity Feature Space Decoupling Highlight: In this paper, we propose a space decoupling (SD) algorithm to decouple the feature space into a pair of complementary subspaces, i.e., the stability space I, and the plasticity space R. I is established by conducting space intersection between the historic and current feature space, and thus I contains more task-shared bases. |
Zhen Zhao; Zhizhong Zhang; Xin Tan; Jun Liu; Yanyun Qu; Yuan Xie; Lizhuang Ma; |
1611 | Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset Highlight: In this work, we develop, for the first time to our best knowledge, an HDR image dataset by using mobile phone cameras, namely Mobile-HDR dataset. |
Shuaizheng Liu; Xindong Zhang; Lingchen Sun; Zhetong Liang; Hui Zeng; Lei Zhang; |
1612 | FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer Highlight: This inefficiency comes from point clouds’ sparse and irregular nature, whereas transformers are designed for dense, regular workloads. This paper presents FlatFormer to close this latency gap by trading spatial proximity for better computational regularity. |
Zhijian Liu; Xinyu Yang; Haotian Tang; Shang Yang; Song Han; |
1613 | Unbiased Scene Graph Generation in Videos Highlight: This often leads to the generation of biased scene graphs. To address these challenges, we introduce a new framework called TEMPURA: TEmporal consistency and Memory Prototype guided UnceRtainty Attenuation for unbiased dynamic SGG. |
Sayak Nag; Kyle Min; Subarna Tripathi; Amit K. Roy-Chowdhury; |
1614 | Dynamic Graph Learning With Content-Guided Spatial-Frequency Relation Reasoning for Deepfake Detection Highlight: Due to the inadequate information interaction with image content, the extracted frequency features are thus spatially irrelevant, struggling to generalize well on increasingly realistic counterfeit types. To address this issue, we propose a Spatial-Frequency Dynamic Graph method to exploit the relation-aware features in spatial and frequency domains via dynamic graph learning. |
Yuan Wang; Kun Yu; Chen Chen; Xiyuan Hu; Silong Peng; |
1615 | Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images Highlight: In this paper we present MI-Zero, a simple and intuitive framework for unleashing the zero-shot transfer capabilities of contrastively aligned image and text models to gigapixel histopathology whole slide images, enabling multiple downstream diagnostic tasks to be carried out by pretrained encoders without requiring any additional labels. |
Ming Y. Lu; Bowen Chen; Andrew Zhang; Drew F. K. Williamson; Richard J. Chen; Tong Ding; Long Phi Le; Yung-Sung Chuang; Faisal Mahmood; |
1616 | MIST: Multi-Modal Iterative Spatial-Temporal Transformer for Long-Form Video Question Answering Highlight: In this work, we introduce a new model named Multi-modal Iterative Spatial-temporal Transformer (MIST) to better adapt pre-trained models for long-form VideoQA. |
Difei Gao; Luowei Zhou; Lei Ji; Linchao Zhu; Yi Yang; Mike Zheng Shou; |
1617 | PMR: Prototypical Modal Rebalance for Multimodal Learning Highlight: To better exploit multimodal features, we propose Prototypical Modality Rebalance (PMR) to perform stimulation on the particular slow-learning modality without interference from other modalities. |
Yunfeng Fan; Wenchao Xu; Haozhao Wang; Junxiao Wang; Song Guo; |
1618 | Two-Stage Co-Segmentation Network Based on Discriminative Representation for Recovering Human Mesh From Videos Highlight: However, most of the existing methods focus on the temporal consistency of videos, while ignoring the spatial representation in complex scenes, thus failing to recover a reasonable and smooth human mesh sequence under extreme illumination and chaotic backgrounds. To alleviate this problem, we propose a two-stage co-segmentation network based on discriminative representation for recovering human body meshes from videos. |
Boyang Zhang; Kehua Ma; Suping Wu; Zhixiang Yuan; |
1619 | Multi-Sensor Large-Scale Dataset for Multi-View 3D Reconstruction Highlight: We present a new multi-sensor dataset for multi-view 3D surface reconstruction. |
Oleg Voynov; Gleb Bobrovskikh; Pavel Karpyshev; Saveliy Galochkin; Andrei-Timotei Ardelean; Arseniy Bozhenko; Ekaterina Karmanova; Pavel Kopanev; Yaroslav Labutin-Rymsho; Ruslan Rakhimov; Aleksandr Safin; Valerii Serpiva; Alexey Artemov; Evgeny Burnaev; Dzmitry Tsetserukou; Denis Zorin; |
1620 | Privacy-Preserving Representations Are Not Enough: Recovering Scene Content From Camera Poses Highlight: In this paper, we show that an attacker can learn about details of a scene without any access by simply querying a localization service. |
Kunal Chelani; Torsten Sattler; Fredrik Kahl; Zuzana Kukelova; |
1621 | Learning Anchor Transformations for 3D Garment Animation Highlight: This paper proposes an anchor-based deformation model, namely AnchorDEF, to predict 3D garment animation from a body motion sequence. |
Fang Zhao; Zekun Li; Shaoli Huang; Junwu Weng; Tianfei Zhou; Guo-Sen Xie; Jue Wang; Ying Shan; |
1622 | Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition Highlight: To realize the adaptive action modeling of both parts, we propose an Actionlet-Dependent Contrastive Learning method (ActCLR). |
Lilang Lin; Jiahang Zhang; Jiaying Liu; |
1623 | Ref-NPR: Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization Highlight: Current 3D scene stylization methods transfer textures and colors as styles using arbitrary style references, lacking meaningful semantic correspondences. We introduce Reference-Based Non-Photorealistic Radiance Fields (Ref-NPR) to address this limitation. |
Yuechen Zhang; Zexin He; Jinbo Xing; Xufeng Yao; Jiaya Jia; |
1624 | PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360° Highlight: We propose PanoHead, the first 3D-aware generative model that enables high-quality view-consistent image synthesis of full heads in 360° with diverse appearance and detailed geometry using only in-the-wild unstructured images for training. |
Sizhe An; Hongyi Xu; Yichun Shi; Guoxian Song; Umit Y. Ogras; Linjie Luo; |
1625 | Rethinking Feature-Based Knowledge Distillation for Face Recognition Highlight: In this work, we attempt to remove identity supervision in student training, to spare the GPU memory from saving massive class centers. |
Jingzhi Li; Zidong Guo; Hui Li; Seungju Han; Ji-won Baek; Min Yang; Ran Yang; Sungjoo Suh; |
1626 | NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization Highlight: In this work, we present NeurOCS, a framework that uses instance masks and 3D boxes as input to learn 3D object shapes by means of differentiable rendering, which further serves as supervision for learning dense object coordinates. |
Zhixiang Min; Bingbing Zhuang; Samuel Schulter; Buyu Liu; Enrique Dunn; Manmohan Chandraker; |
1627 | Tree Instance Segmentation With Temporal Contour Graph Highlight: We present a novel approach to perform instance segmentation, and counting, for densely packed self-similar trees using a top-view RGB image sequence. |
Adnan Firoze; Cameron Wingren; Raymond A. Yeh; Bedrich Benes; Daniel Aliaga; |
1628 | A New Dataset Based on Images Taken By Blind People for Testing The Robustness of Image Classification Models Trained for ImageNet Categories Highlight: Our goal is to improve upon the status quo for designing image classification models trained in one domain that perform well on images from another domain. |
Reza Akbarian Bafghi; Danna Gurari; |
1629 | Detecting Backdoors During The Inference Stage Based on Corruption Robustness Consistency Highlight: In this paper, we propose the test-time corruption robustness consistency evaluation (TeCo), a novel test-time trigger sample detection method that only needs the hard-label outputs of the victim models without any extra information. |
Xiaogeng Liu; Minghui Li; Haoyu Wang; Shengshan Hu; Dengpan Ye; Hai Jin; Libing Wu; Chaowei Xiao; |
1630 | Black-Box Sparse Adversarial Attack Via Multi-Objective Optimisation Highlight: In contrast, other methods that limit the number of modified pixels often permit unbounded modifications, making them easily detectable. To address these limitations, we propose a novel multi-objective sparse attack algorithm that efficiently minimizes the number of modified pixels and their size during the attack process. |
Phoenix Neale Williams; Ke Li; |
1631 | Renderable Neural Radiance Map for Visual Navigation Highlight: We propose a novel type of map for visual navigation, a renderable neural radiance map (RNR-Map), which is designed to contain the overall visual information of a 3D environment. |
Obin Kwon; Jeongho Park; Songhwai Oh; |
1632 | Revisiting Reverse Distillation for Anomaly Detection Highlight: Another approach that employs Reversed Distillation (RD) can perform well while maintaining low latency. In this paper, we revisit this idea to improve its performance, establishing a new state-of-the-art benchmark on the challenging MVTec dataset for both anomaly detection and localization. |
Tran Dinh Tien; Anh Tuan Nguyen; Nguyen Hoang Tran; Ta Duc Huy; Soan T.M. Duong; Chanh D. Tr. Nguyen; Steven Q. H. Truong; |
1633 | Diffusion-Based Generation, Optimization, and Planning in 3D Scenes Highlight: We introduce SceneDiffuser, a conditional generative model for 3D scene understanding. |
Siyuan Huang; Zan Wang; Puhao Li; Baoxiong Jia; Tengyu Liu; Yixin Zhu; Wei Liang; Song-Chun Zhu; |
1634 | TMO: Textured Mesh Acquisition of Objects With A Mobile Device By Using Differentiable Rendering Highlight: We present a new pipeline for acquiring a textured mesh in the wild with a single smartphone which offers access to images, depth maps, and valid poses. |
Jaehoon Choi; Dongki Jung; Taejae Lee; Sangwook Kim; Youngdong Jung; Dinesh Manocha; Donghwan Lee; |
1635 | Meta-Causal Learning for Single Domain Generalization Highlight: In this paper, we propose a new learning paradigm, namely simulate-analyze-reduce, which first simulates the domain shift by building an auxiliary domain as the target domain, then learns to analyze the causes of domain shift, and finally learns to reduce the domain shift for model adaptation. Under this paradigm, we propose a meta-causal learning method to learn meta-knowledge, that is, how to infer the causes of domain shift between the auxiliary and source domains during training. |
Jin Chen; Zhi Gao; Xinxiao Wu; Jiebo Luo; |
1636 | Grad-PU: Arbitrary-Scale Point Cloud Upsampling Via Gradient Descent With Learned Distance Functions Highlight: To address them, we propose a new framework for accurate point cloud upsampling that supports arbitrary upsampling rates. |
Yun He; Danhang Tang; Yinda Zhang; Xiangyang Xue; Yanwei Fu; |
1637 | Trainable Projected Gradient Method for Robust Fine-Tuning Highlight: However, most of these methods employ manually crafted heuristics or expensive hyper-parameter search, which prevent them from scaling up to large datasets and neural networks. To solve this problem, we propose Trainable Projected Gradient Method (TPGM) to automatically learn the constraint imposed for each layer for a fine-grained fine-tuning regularization. |
Junjiao Tian; Zecheng He; Xiaoliang Dai; Chih-Yao Ma; Yen-Cheng Liu; Zsolt Kira; |
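The TPGM highlight above describes constraining each fine-tuned layer to stay close to its pretrained weights via projection. A minimal sketch of the core projection step, assuming an L2-ball constraint per layer (the helper name and a fixed radius are illustrative; the paper learns the per-layer constraint rather than hand-setting it):

```python
import numpy as np

def project_to_ball(w_finetuned, w_pretrained, radius):
    """Project fine-tuned weights onto an L2 ball of the given radius
    centred on the pretrained weights; weights already inside the ball
    are returned unchanged."""
    delta = w_finetuned - w_pretrained
    norm = np.linalg.norm(delta)
    if norm <= radius:
        return w_finetuned
    return w_pretrained + delta * (radius / norm)
```

Applied after each optimizer step, this keeps the fine-tuned model within a trust region of the pretrained one, which is the regularization effect the method automates.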
1638 | Text2Scene: Text-Driven Indoor Scene Stylization With Part-Aware Details Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Text2Scene, a method to automatically create realistic textures for virtual scenes composed of multiple objects. |
Inwoo Hwang; Hyeonwoo Kim; Young Min Kim; |
1639 | FEND: A Future Enhanced Distribution-Aware Contrastive Learning Framework for Long-Tail Trajectory Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on dealing with the long-tail phenomenon in trajectory prediction. |
Yuning Wang; Pu Zhang; Lei Bai; Jianru Xue; |
1640 | MP-Former: Mask-Piloted Transformer for Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The improvement is based on our observation that Mask2Former suffers from inconsistent mask predictions between consecutive decoder layers, which leads to inconsistent optimization goals and low utilization of decoder queries. To address this problem, we propose a mask-piloted training approach, which additionally feeds noised ground-truth masks in masked-attention and trains the model to reconstruct the original ones. |
Hao Zhang; Feng Li; Huaizhe Xu; Shijia Huang; Shilong Liu; Lionel M. Ni; Lei Zhang; |
1641 | HDR Imaging With Spatially Varying Signal-to-Noise Ratios Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new exposure-shared block within our custom-designed multi-scale transformer framework. |
Yiheng Chi; Xingguang Zhang; Stanley H. Chan; |
1642 | Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new idea of leveraging Projection onto Orthogonal Prototypes (POP), which updates features to identify novel classes without compromising base classes. |
Sun-Ao Liu; Yiheng Zhang; Zhaofan Qiu; Hongtao Xie; Yongdong Zhang; Ting Yao; |
1643 | TAPS3D: Text-Guided 3D Textured Shape Generation From Pseudo Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate an open research task of generating controllable 3D textured shapes from the given textual descriptions. |
Jiacheng Wei; Hao Wang; Jiashi Feng; Guosheng Lin; Kim-Hui Yap; |
1644 | Are Deep Neural Networks SMARTer Than Second Graders? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Such dramatic progress raises the question: how generalizable are neural networks in solving problems that demand broad skills? To answer this question, we propose SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset, for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children in the 6–8 age group. |
Anoop Cherian; Kuan-Chuan Peng; Suhas Lohit; Kevin A. Smith; Joshua B. Tenenbaum; |
1645 | Reliability in Semantic Segmentation: Are We on The Right Track? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We analyze a broad variety of models, spanning from older ResNet-based architectures to novel transformers and assess their reliability based on four metrics: robustness, calibration, misclassification detection and out-of-distribution (OOD) detection. |
Pau de Jorge; Riccardo Volpi; Philip H.S. Torr; Grégory Rogez; |
1646 | Video Test-Time Adaptation for Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, test-time adaptation of video action recognition models against common distribution shifts has so far not been demonstrated. We propose to address this problem with an approach tailored to spatio-temporal models that is capable of adaptation on a single video sample at a step. |
Wei Lin; Muhammad Jehanzeb Mirza; Mateusz Kozinski; Horst Possegger; Hilde Kuehne; Horst Bischof; |
1647 | Bi-Level Meta-Learning for Few-Shot Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the problem of Few-shot domain generalization (FSDG), which is a more challenging variant of few-shot classification. |
Xiaorong Qin; Xinhang Song; Shuqiang Jiang; |
1648 | Tensor4D: Efficient Neural 4D Decomposition for High-Fidelity Dynamic Reconstruction and Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Tensor4D, an efficient yet effective approach to dynamic scene modeling. |
Ruizhi Shao; Zerong Zheng; Hanzhang Tu; Boning Liu; Hongwen Zhang; Yebin Liu; |
1649 | Blowing in The Wind: CycleNet for Human Cinemagraphs From Still Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an automatic method that allows generating human cinemagraphs from single RGB images. |
Hugo Bertiche; Niloy J. Mitra; Kuldeep Kulkarni; Chun-Hao P. Huang; Tuanfeng Y. Wang; Meysam Madadi; Sergio Escalera; Duygu Ceylan; |
1650 | Learning Personalized High Quality Volumetric Head Avatars From Monocular RGB Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild. |
Ziqian Bai; Feitong Tan; Zeng Huang; Kripasindhu Sarkar; Danhang Tang; Di Qiu; Abhimitra Meka; Ruofei Du; Mingsong Dou; Sergio Orts-Escolano; Rohit Pandey; Ping Tan; Thabo Beeler; Sean Fanello; Yinda Zhang; |
1651 | Multi-Modal Learning With Missing Modality Via Shared-Specific Feature Modelling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Shared-Specific Feature Modelling (ShaSpec) method that is considerably simpler and more effective than competing approaches that address the issues above. |
Hu Wang; Yuanhong Chen; Congbo Ma; Jodie Avery; Louise Hull; Gustavo Carneiro; |
1652 | Panoptic Compositional Feature Field for Editable Scene Rendering With Network-Inferred Labels Via Metric Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel framework named Panoptic Compositional Feature Field (PCFF), which introduces an instance quadruplet metric learning to build a discriminating panoptic feature space for reliable scene editing. |
Xinhua Cheng; Yanmin Wu; Mengxi Jia; Qian Wang; Jian Zhang; |
1653 | Progressive Backdoor Erasing Via Connecting Backdoor and Adversarial Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Deep neural networks (DNNs) are known to be vulnerable to both backdoor attacks as well as adversarial attacks. In the literature, these two types of attacks are commonly treated as distinct problems and solved separately, since they belong to training-time and inference-time attacks respectively. However, in this paper we find an intriguing connection between them: for a model planted with backdoors, we observe that its adversarial examples have similar behaviors as its triggered samples, i.e., both activate the same subset of DNN neurons. |
Bingxu Mu; Zhenxing Niu; Le Wang; Xue Wang; Qiguang Miao; Rong Jin; Gang Hua; |
1654 | LayoutFormer++: Conditional Graphic Layout Generation Via Constraint Serialization and Decoding Space Restriction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose LayoutFormer++ to tackle the above problems. First, to flexibly handle diverse constraints, we propose a constraint serialization scheme, which represents different user constraints as sequences of tokens with a predefined format. Then, we formulate conditional layout generation as a sequence-to-sequence transformation, and leverage an encoder-decoder framework with Transformer as the basic architecture. |
Zhaoyun Jiang; Jiaqi Guo; Shizhao Sun; Huayu Deng; Zhongkai Wu; Vuksan Mijovic; Zijiang James Yang; Jian-Guang Lou; Dongmei Zhang; |
1655 | DisWOT: Student Architecture Search for Distillation WithOut Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast to previous adaptive distillation methods to reduce the teacher-student gap, we explore a novel training-free framework to search for the best student architectures for a given teacher. |
Peijie Dong; Lujun Li; Zimian Wei; |
1656 | Stare at What You See: Masked Image Modeling Without Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient MIM paradigm named MaskAlign. |
Hongwei Xue; Peng Gao; Hongyang Li; Yu Qiao; Hao Sun; Houqiang Li; Jiebo Luo; |
1657 | Joint Visual Grounding and Tracking With Natural Language Specification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Besides, the separated framework can hardly be trained end-to-end. To handle these issues, we propose a joint visual grounding and tracking framework, which reformulates grounding and tracking as a unified task: localizing the referred target based on the given visual-language references. |
Li Zhou; Zikun Zhou; Kaige Mao; Zhenyu He; |
1658 | Neural Kaleidoscopic Space Sculpting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a method that recovers full-surround 3D reconstructions from a single kaleidoscopic image using a neural surface representation. |
Byeongjoo Ahn; Michael De Zeeuw; Ioannis Gkioulekas; Aswin C. Sankaranarayanan; |
1659 | Few-Shot Semantic Image Synthesis With Class Affinity Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate the high annotation cost, we propose a transfer method that leverages a model trained on a large source dataset to improve the learning ability on small target datasets via estimated pairwise relations between source and target classes. |
Marlène Careil; Jakob Verbeek; Stéphane Lathuilière; |
1660 | Implicit Identity Driven Deepfake Face Swapping Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider the face swapping detection from the perspective of face identity. |
Baojin Huang; Zhongyuan Wang; Jifan Yang; Jiaxin Ai; Qin Zou; Qian Wang; Dengpan Ye; |
1661 | Class Relationship Embedded Learning for Source-Free Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fully utilize source knowledge, we propose to transfer the class relationship, which is domain-invariant but still under-explored in previous works. |
Yixin Zhang; Zilei Wang; Weinan He; |
1662 | Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a logically consistent prediction loss, LCPLoss, to aid learning of logical consistency across attributes, and also a label compensation training strategy to eliminate the problem of no positive prediction across a set of related attributes. |
Haiyu Wu; Grace Bezold; Aman Bhatta; Kevin W. Bowyer; |
1663 | One-to-Few Label Assignment for End-to-End Dense Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a simple yet effective one-to-few (o2f) label assignment strategy for end-to-end dense detection. |
Shuai Li; Minghan Li; Ruihuang Li; Chenhang He; Lei Zhang; |
1664 | Spatio-Temporal Pixel-Level Contrastive Learning-Based Source-Free Domain Adaptation for Video Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose Spatio-Temporal Pixel-Level (STPL) contrastive learning, a novel method that takes full advantage of spatio-temporal information to tackle the absence of source data better. |
Shao-Yuan Lo; Poojan Oza; Sumanth Chennupati; Alejandro Galindo; Vishal M. Patel; |
1665 | InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain the gain from increasing parameters and training data like ViTs. |
Wenhai Wang; Jifeng Dai; Zhe Chen; Zhenhang Huang; Zhiqi Li; Xizhou Zhu; Xiaowei Hu; Tong Lu; Lewei Lu; Hongsheng Li; Xiaogang Wang; Yu Qiao; |
1666 | DAA: A Delta Age AdaIN Operation for Age Estimation Via Binary Code Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by transfer learning, we designed the Delta Age AdaIN (DAA) operation to obtain the feature difference with each age, which obtains the style map of each age through the learned values representing the mean and standard deviation. |
Ping Chen; Xingpeng Zhang; Ye Li; Ju Tao; Bin Xiao; Bing Wang; Zongjie Jiang; |
1667 | Fake It Till You Make It: Learning Transferable Representations From Synthetic ImageNet Clones Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Could such models render real images obsolete for training image prediction models? In this paper, we answer part of this provocative question by investigating the need for real images when training models for ImageNet classification. |
Mert Bülent Sarıyıldız; Karteek Alahari; Diane Larlus; Yannis Kalantidis; |
1668 | Mind The Label Shift of Augmentation-Based Graph OOD Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This causes the label shift in augmentations and brings inconsistent predictive relationships among augmented environments. To address this issue, we propose LiSA, which generates label-invariant augmentations to facilitate graph OOD generalization. |
Junchi Yu; Jian Liang; Ran He; |
1669 | Unsupervised Intrinsic Image Decomposition With LiDAR Intensity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose unsupervised intrinsic image decomposition with LiDAR intensity (IID-LI). |
Shogo Sato; Yasuhiro Yao; Taiga Yoshida; Takuhiro Kaneko; Shingo Ando; Jun Shimamura; |
1670 | HIER: Metric Learning Beyond Class Labels Via Hierarchical Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although this type of supervision has been a basis of metric learning for decades, we argue that it hinders further advances in the field. In this regard, we propose a new regularization method, dubbed HIER, to discover the latent semantic hierarchy of training data, and to deploy the hierarchy to provide richer and more fine-grained supervision than inter-class separability induced by common metric learning losses. |
Sungyeon Kim; Boseung Jeong; Suha Kwak; |
1671 | Diffusion Probabilistic Model Made Slim Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prior methods towards efficient DPM, however, have largely focused on accelerating the testing yet overlooked their huge complexity and size. In this paper, we make a dedicated attempt to lighten DPM while striving to preserve its favourable performance. |
Xingyi Yang; Daquan Zhou; Jiashi Feng; Xinchao Wang; |
1672 | Confidence-Aware Personalized Federated Learning Via Variational Expectation Maximization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a novel framework for PFL based on hierarchical Bayesian modeling and variational inference. |
Junyi Zhu; Xingchen Ma; Matthew B. Blaschko; |
1673 | Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Besides, the data augmentation of the point cloud in the typical teacher-student framework is too weak, and only contains basic downsampling and flip-and-shift (i.e., rotation and scaling), which hinders the effective learning of feature information. Hence, we address these issues by introducing a novel approach of Hierarchical Supervision and Shuffle Data Augmentation (HSSDA), which is a simple yet effective teacher-student framework. |
Chuandong Liu; Chenqiang Gao; Fangcen Liu; Pengcheng Li; Deyu Meng; Xinbo Gao; |
1674 | Interactive and Explainable Region-Guided Radiology Report Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a simple yet effective region-guided report generation model that detects anatomical regions and then describes individual, salient regions to form the final report. |
Tim Tanida; Philip Müller; Georgios Kaissis; Daniel Rueckert; |
1675 | MED-VT: Multiscale Encoder-Decoder Video Transformer With Application To Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a unified multiscale encoder-decoder transformer that is focused on dense prediction tasks in videos. |
Rezaul Karim; He Zhao; Richard P. Wildes; Mennatullah Siam; |
1676 | PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We build on the successful recent method NeuS to extend it by three new components. |
Yiqun Wang; Ivan Skorokhodov; Peter Wonka; |
1677 | ZegCLIP: Towards Adapting CLIP for Zero-Shot Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we pursue a simpler-and-efficient one-stage solution that directly extends CLIP’s zero-shot prediction capability from image to pixel level. |
Ziqin Zhou; Yinjie Lei; Bowen Zhang; Lingqiao Liu; Yifan Liu; |
1678 | AdaptiveMix: Improving GAN Training Via Feature Space Shrinkage Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the problem of training GANs from a novel perspective, i.e., robust image classification. |
Haozhe Liu; Wentian Zhang; Bing Li; Haoqian Wu; Nanjun He; Yawen Huang; Yuexiang Li; Bernard Ghanem; Yefeng Zheng; |
1679 | Specialist Diffusion: Plug-and-Play Sample-Efficient Fine-Tuning of Text-to-Image Diffusion Models To Learn Any Unseen Style Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Diffusion models have demonstrated impressive capability of text-conditioned image synthesis, and broader application horizons are emerging by personalizing those pretrained diffusion models toward generating some specialized target object or style. In this paper, we aim to learn an unseen style by simply fine-tuning a pre-trained diffusion model with a handful of images (e.g., less than 10), so that the fine-tuned model can generate high-quality images of arbitrary objects in this style. |
Haoming Lu; Hazarapet Tunanyan; Kai Wang; Shant Navasardyan; Zhangyang Wang; Humphrey Shi; |
1680 | Benchmarking Self-Supervised Learning on Diverse Pathology Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, there are no principled studies that compare SSL methods and discuss how to adapt them for pathology. To address this need, we execute the largest-scale study of SSL pre-training on pathology image data, to date. |
Mingu Kang; Heon Song; Seonwook Park; Donggeun Yoo; Sérgio Pereira; |
1681 | Planning-Oriented Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Unified Autonomous Driving (UniAD), an up-to-date, comprehensive framework that incorporates full-stack driving tasks in one network. |
Yihan Hu; Jiazhi Yang; Li Chen; Keyu Li; Chonghao Sima; Xizhou Zhu; Siqi Chai; Senyao Du; Tianwei Lin; Wenhai Wang; Lewei Lu; Xiaosong Jia; Qiang Liu; Jifeng Dai; Yu Qiao; Hongyang Li; |
1682 | HyperCUT: Video Sequence From A Single Blurry Image Using Unsupervised Ordering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes an effective self-supervised ordering scheme that allows training high-quality image-to-video deblurring models. |
Bang-Dang Pham; Phong Tran; Anh Tran; Cuong Pham; Rang Nguyen; Minh Hoai; |
1683 | Can’t Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first instantiate the conventional stealing attacks against encoders and demonstrate their more severe vulnerability compared with downstream classifiers. |
Zeyang Sha; Xinlei He; Ning Yu; Michael Backes; Yang Zhang; |
1684 | Document Image Shadow Removal Guided By Color-Aware Background Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a color-aware background extraction network (CBENet) for extracting a spatially varying background image that accurately depicts the background colors of the document. |
Ling Zhang; Yinghao He; Qing Zhang; Zheng Liu; Xiaolong Zhang; Chunxia Xiao; |
1685 | Independent Component Alignment for Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose using a condition number of a linear system of gradients as a stability criterion of an MTL optimization. |
Dmitry Senushkin; Nikolay Patakin; Arseny Kuznetsov; Anton Konushin; |
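The highlight above proposes the condition number of the linear system of task gradients as a stability criterion for multi-task optimization. A minimal sketch of how that quantity can be computed from per-task gradients (the function name is an assumption, not the paper's API):

```python
import numpy as np

def gradient_condition_number(task_grads):
    """Stack per-task gradient vectors into a matrix and return its
    condition number (ratio of largest to smallest singular value).
    A large value signals conflicting or ill-conditioned task gradients."""
    G = np.stack(task_grads)                    # shape: (num_tasks, num_params)
    s = np.linalg.svd(G, compute_uv=False)      # singular values, descending
    return s[0] / s[-1]
```

Orthogonal task gradients of equal magnitude give a condition number of 1, while nearly parallel or strongly imbalanced gradients drive it up, which is the instability the method aims to detect and reduce.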
1686 | Edges to Shapes to Concepts: Adversarial Augmentation for Robust Vision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple, lightweight adversarial augmentation technique that explicitly incentivizes the network to learn holistic shapes for accurate prediction in an object classification setting. |
Aditay Tripathi; Rishubh Singh; Anirban Chakraborty; Pradeep Shenoy; |
1687 | ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prior works on improving speech quality with visual input typically study each type of auditory distortion separately (e.g., separation, inpainting, video-to-speech) and present tailored algorithms. This paper proposes to unify these subjects and study Generalized Speech Regeneration, where the goal is not to reconstruct the exact reference clean signal, but to focus on improving certain aspects of speech while not necessarily preserving the rest such as voice. |
Wei-Ning Hsu; Tal Remez; Bowen Shi; Jacob Donley; Yossi Adi; |
1688 | Improved Distribution Matching for Dataset Condensation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel dataset condensation method based on distribution matching, which is more efficient and promising. |
Ganlong Zhao; Guanbin Li; Yipeng Qin; Yizhou Yu; |
1689 | Feature Separation and Recalibration for Adversarial Robustness Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel, easy-to-plugin approach named Feature Separation and Recalibration (FSR) that recalibrates the malicious, non-robust activations for more robust feature maps through Separation and Recalibration. |
Woo Jae Kim; Yoonki Cho; Junsik Jung; Sung-Eui Yoon; |
1690 | Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation From 2D Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nerflets are our key contribution: a set of local neural radiance fields that together represent a scene. |
Xiaoshuai Zhang; Abhijit Kundu; Thomas Funkhouser; Leonidas Guibas; Hao Su; Kyle Genova; |
1691 | CLIP Is Also An Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the potential of Contrastive Language-Image Pre-training models (CLIP) to localize different categories with only image-level labels and without further training. |
Yuqi Lin; Minghao Chen; Wenxiao Wang; Boxi Wu; Ke Li; Binbin Lin; Haifeng Liu; Xiaofei He; |
1692 | Slimmable Dataset Condensation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Accordingly, we propose a novel training objective for slimmable dataset condensation to explicitly account for both factors. |
Songhua Liu; Jingwen Ye; Runpeng Yu; Xinchao Wang; |
1693 | Spatially Adaptive Self-Supervised Learning for Real-World Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although pixel-shuffle downsampling has been suggested for breaking the noise correlation, it breaks the original information of images, which limits the denoising performance. In this paper, we propose a novel perspective to solve this problem, i.e., seeking for spatially adaptive supervision for real-world sRGB image denoising. |
Junyi Li; Zhilu Zhang; Xiaoyu Liu; Chaoyu Feng; Xiaotao Wang; Lei Lei; Wangmeng Zuo; |
1694 | Data-Free Knowledge Distillation Via Feature Exchange and Activation Region Constraint Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is naive to expect that a simple combination of generative network-based data synthesis and data augmentation will solve these issues. Therefore, this paper proposes a novel data-free knowledge distillation method (SpaceshipNet) based on channel-wise feature exchange (CFE) and multi-scale spatial activation region consistency (mSARC) constraint. |
Shikang Yu; Jiachen Chen; Hu Han; Shuqiang Jiang; |
1695 | CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes From Natural Language Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods generate shapes with limited fidelity and diversity. We introduce CLIP-Sculptor, a method to address these constraints by producing high-fidelity and diverse 3D shapes without the need for (text, shape) pairs during training. |
Aditya Sanghi; Rao Fu; Vivian Liu; Karl D.D. Willis; Hooman Shayani; Amir H. Khasahmadi; Srinath Sridhar; Daniel Ritchie; |
1696 | Mask-Free Video Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to remove the mask-annotation requirement. |
Lei Ke; Martin Danelljan; Henghui Ding; Yu-Wing Tai; Chi-Keung Tang; Fisher Yu; |
1697 | Continual Detection Transformer for Incremental Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, KD and ER do not work well if applied directly to state-of-the-art transformer-based object detectors such as Deformable DETR and UP-DETR. In this paper, we solve these issues by proposing a ContinuaL DEtection TRansformer (CL-DETR), a new method for transformer-based IOD which enables effective usage of KD and ER in this context. |
Yaoyao Liu; Bernt Schiele; Andrea Vedaldi; Christian Rupprecht; |
1698 | Two-Stream Networks for Weakly-Supervised Temporal Action Localization With Semantic-Aware Mechanisms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we hypothesize that snippets with similar representations should be considered as the same action class despite the absence of supervision signals on each snippet. |
Yu Wang; Yadong Li; Hongbin Wang; |
1699 | HyperMatch: Noise-Tolerant Semi-Supervised Learning Via Relaxed Contrastive Constraint Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, mismatched instance pairs caused by inaccurate pseudo labels would assign an unlabeled instance to the incorrect class in feature space, hence exacerbating SSL’s renowned confirmation bias. To address this issue, we introduce a novel SSL approach, HyperMatch, which is a plug-in to several SSL designs enabling noise-tolerant utilization of unlabeled data. |
Beitong Zhou; Jing Lu; Kerui Liu; Yunlu Xu; Zhanzhan Cheng; Yi Niu; |
1700 | From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: End-to-end training on vision and language data may bridge the disconnections, but is inflexible and computationally expensive. To address this issue, we propose Img2Prompt, a plug-and-play module that provides the prompts that can bridge the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training. |
Jiaxian Guo; Junnan Li; Dongxu Li; Anthony Meng Huat Tiong; Boyang Li; Dacheng Tao; Steven Hoi; |
1701 | LEGO-Net: Learning Regular Rearrangements of Objects in Rooms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present LEGO-Net, a data-driven transformer-based iterative method for LEarning reGular rearrangement of Objects in messy rooms. |
Qiuhong Anna Wei; Sijie Ding; Jeong Joon Park; Rahul Sajnani; Adrien Poulenard; Srinath Sridhar; Leonidas Guibas; |
1702 | FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show the strong potential of query-based models on efficient instance segmentation algorithm designs. |
Junjie He; Pengyu Li; Yifeng Geng; Xuansong Xie; |
1703 | Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that a basic Kalman filter can still obtain state-of-the-art tracking performance if proper care is taken to fix the noise accumulated during occlusion. |
Jinkun Cao; Jiangmiao Pang; Xinshuo Weng; Rawal Khirodkar; Kris Kitani; |
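The OC-SORT highlight above argues that a basic Kalman filter can still deliver strong tracking if occlusion-induced noise is handled carefully. A minimal sketch of one constant-velocity Kalman predict/update cycle on a 1-D position measurement (the state layout and noise values are illustrative assumptions, not OC-SORT's actual box parameterization):

```python
import numpy as np

def kalman_step(x, P, z, dt=1.0, q=1e-2, r=1e-1):
    """One predict+update cycle of a constant-velocity Kalman filter on
    a 1-D position measurement z. State x = [position, velocity]."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity motion model
    H = np.array([[1.0, 0.0]])              # we observe position only
    Q = q * np.eye(2)                       # process noise covariance
    R = np.array([[r]])                     # measurement noise covariance
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = z - H @ x                           # innovation (measurement residual)
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

During occlusion, a tracker can only run the predict half of this cycle, which is exactly where the accumulated noise the paper addresses comes from.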
1704 | Multi-View Azimuth Stereo Via Tangent Space Consistency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a method for 3D reconstruction only using calibrated multi-view surface azimuth maps. |
Xu Cao; Hiroaki Santo; Fumio Okura; Yasuyuki Matsushita; |
1705 | VectorFusion: Text-to-SVG By Abstracting Pixel-Based Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that a text-conditioned diffusion model trained on pixel representations of images can be used to generate SVG-exportable vector graphics. |
Ajay Jain; Amber Xie; Pieter Abbeel; |
1706 | The Dialog Must Go On: Improving Visual Dialog Via Generative Self-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a semi-supervised learning approach for visually-grounded dialog, called Generative Self-Training (GST), to leverage unlabeled images on the Web. |
Gi-Cheon Kang; Sungdong Kim; Jin-Hwa Kim; Donghyun Kwak; Byoung-Tak Zhang; |
1707 | Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose binary sparse convolutional networks called BSC-Net for efficient point cloud analysis. |
Xiuwei Xu; Ziwei Wang; Jie Zhou; Jiwen Lu; |
1708 | Transformer-Based Learned Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new approach to learned optimization where we represent the computation of an optimizer’s update step using a neural network. |
Erik Gärtner; Luke Metz; Mykhaylo Andriluka; C. Daniel Freeman; Cristian Sminchisescu; |
1709 | Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study image retrieval frameworks that enable us to compare generated images with training samples and detect when content has been replicated. |
Gowthami Somepalli; Vasu Singla; Micah Goldblum; Jonas Geiping; Tom Goldstein; |
1710 | Neuralizer: General Neuroimage Analysis Without Re-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Neuralizer, a single model that generalizes to previously unseen neuroimaging tasks and modalities without the need for re-training or fine-tuning. |
Steffen Czolbe; Adrian V. Dalca; |
1711 | Quantum-Inspired Spectral-Spatial Pyramid Network for Hyperspectral Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the quantum circuit (QC) model, we propose a quantum-inspired spectral-spatial network (QSSN) for HSI feature extraction. |
Jie Zhang; Yongshan Zhang; Yicong Zhou; |
1712 | Towards Benchmarking and Assessing Visual Naturalness of Physical World Adversarial Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we take the first step to benchmark and assess the visual naturalness of physical world attacks, taking the autonomous driving scenario as the first attempt. |
Simin Li; Shuning Zhang; Gujun Chen; Dong Wang; Pu Feng; Jiakai Wang; Aishan Liu; Xin Yi; Xianglong Liu; |
1713 | Visual Prompt Multi-Modal Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, inspired by the recent success of the prompt learning in language models, we develop Visual Prompt multi-modal Tracking (ViPT), which learns the modal-relevant prompts to adapt the frozen pre-trained foundation model to various downstream multimodal tracking tasks. |
Jiawen Zhu; Simiao Lai; Xin Chen; Dong Wang; Huchuan Lu; |
1714 | Self-Supervised Representation Learning for CAD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes to leverage unlabeled CAD geometry on supervised learning tasks. |
Benjamin T. Jones; Michael Hu; Milin Kodnongbua; Vladimir G. Kim; Adriana Schulz; |
1715 | DETRs With Hybrid Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a simple yet effective method based on a hybrid matching scheme that combines the original one-to-one matching branch with an auxiliary one-to-many matching branch during training. |
Ding Jia; Yuhui Yuan; Haodi He; Xiaopei Wu; Haojun Yu; Weihong Lin; Lei Sun; Chao Zhang; Han Hu; |
1716 | Dealing With Cross-Task Class Discrimination in Online Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A novel optimization objective with a gradient-based adaptive method is proposed to dynamically deal with the problem in the online CL process. |
Yiduo Guo; Bing Liu; Dongyan Zhao; |
1717 | Angelic Patches for Improving Third-Party Object Detector Performance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore whether we can adopt the characteristics of adversarial attack methods to help improve perturbation robustness for object detection. |
Wenwen Si; Shuo Li; Sangdon Park; Insup Lee; Osbert Bastani; |
1718 | UniDexGrasp: Universal Robotic Dexterous Grasping Via Learning Diverse Proposal Generation and Goal-Conditioned Policy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we tackle the problem of learning universal robotic dexterous grasping from a point cloud observation under a table-top setting. |
Yinzhen Xu; Weikang Wan; Jialiang Zhang; Haoran Liu; Zikang Shan; Hao Shen; Ruicheng Wang; Haoran Geng; Yijia Weng; Jiayi Chen; Tengyu Liu; Li Yi; He Wang; |
1719 | A Rotation-Translation-Decoupled Solution for Robust and Efficient Visual-Inertial Initialization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel visual-inertial odometry (VIO) initialization method, which decouples rotation and translation estimation, and achieves higher efficiency and better robustness. |
Yijia He; Bo Xu; Zhanpeng Ouyang; Hongdong Li; |
1720 | GIVL: Improving Geographical Inclusivity of Vision-Language Models With Pre-Training Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose GIVL, a Geographically Inclusive Vision-and-Language Pre-trained model. |
Da Yin; Feng Gao; Govind Thattai; Michael Johnston; Kai-Wei Chang; |
1721 | Bi3D: Bi-Domain Active Learning for Cross-Domain 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a Bi-domain active learning approach, namely Bi3D, to solve the cross-domain 3D object detection task. |
Jiakang Yuan; Bo Zhang; Xiangchao Yan; Tao Chen; Botian Shi; Yikang Li; Yu Qiao; |
1722 | Towards Fast Adaptation of Pretrained Contrastive Models for Multi-Channel Video-Language Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we identify a principled model design space with two axes: how to represent videos and how to fuse video and text information. |
Xudong Lin; Simran Tiwari; Shiyuan Huang; Manling Li; Mike Zheng Shou; Heng Ji; Shih-Fu Chang; |
1723 | Mask-Free OVIS: Open-Vocabulary Instance Segmentation Without Manual Mask Annotations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This difference between strong and weak supervision leads to overfitting on base categories, resulting in poor generalization towards novel categories. In this work, we overcome this issue by learning both base and novel categories from pseudo-mask annotations generated by the vision-language model in a weakly supervised manner using our proposed Mask-free OVIS pipeline. |
Vibashan VS; Ning Yu; Chen Xing; Can Qin; Mingfei Gao; Juan Carlos Niebles; Vishal M. Patel; Ran Xu; |
1724 | Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most existing self-supervised point cloud representation learning methods only consider geometry from a static snapshot, omitting the fact that sequential observations of dynamic scenes could reveal more comprehensive geometric details. To overcome such issues, this paper proposes a new 4D self-supervised pre-training method called Complete-to-Partial 4D Distillation. |
Zhuoyang Zhang; Yuhao Dong; Yunze Liu; Li Yi; |
1725 | BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a near real-time (10Hz) method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence, while simultaneously performing neural 3D reconstruction of the object. |
Bowen Wen; Jonathan Tremblay; Valts Blukis; Stephen Tyree; Thomas Müller; Alex Evans; Dieter Fox; Jan Kautz; Stan Birchfield; |
1726 | Multi-Modal Gait Recognition Via Effective Spatial-Temporal Feature Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To obtain a more robust and comprehensive gait representation for recognition, we propose a transformer-based gait recognition framework called MMGaitFormer, which effectively fuses and aggregates the spatial-temporal information from the skeletons and silhouettes. |
Yufeng Cui; Yimei Kang; |
1727 | Crowd3D: Towards Hundreds of People Reconstruction From A Single Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Crowd3D, the first framework to reconstruct the 3D poses, shapes and locations of hundreds of people with global consistency from a single large-scene image. |
Hao Wen; Jing Huang; Huili Cui; Haozhe Lin; Yu-Kun Lai; Lu Fang; Kun Li; |
1728 | Highly Confident Local Structure Based Consensus Graph Learning for Incomplete Multi-View Clustering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Considering the reality of a large amount of incomplete data, in this paper, we propose a simple but effective method for incomplete multi-view clustering based on consensus graph learning, termed as HCLS_CGL. |
Jie Wen; Chengliang Liu; Gehui Xu; Zhihao Wu; Chao Huang; Lunke Fei; Yong Xu; |
1729 | Humans As Light Bulbs: 3D Human Reconstruction From Thermal Reflection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an analysis-by-synthesis framework that jointly models the objects, people, and their thermal reflections, which allows us to combine generative models with differentiable rendering of reflections. |
Ruoshi Liu; Carl Vondrick; |
1730 | Hierarchical Discriminative Learning Improves Visual Representations of Biomedical Microscopy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: HiDisc uses a self-supervised contrastive learning framework in which positive patch pairs are defined based on a common ancestry in the data hierarchy, and a unified patch, slide, and patient discriminative learning objective is used for visual SSL. |
Cheng Jiang; Xinhai Hou; Akhil Kondepudi; Asadur Chowdury; Christian W. Freudiger; Daniel A. Orringer; Honglak Lee; Todd C. Hollon; |
1731 | ProD: Prompting-To-Disentangle Domain Knowledge for Cross-Domain Few-Shot Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate the domain gap, we propose a prompting-to-disentangle (ProD) method through a novel exploration with the prompting mechanism. |
Tianyi Ma; Yifan Sun; Zongxin Yang; Yi Yang; |
1732 | Clothing-Change Feature Augmentation for Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel Clothing-Change Feature Augmentation (CCFA) model for CC Re-ID to largely expand clothing-change data in the feature space rather than visual image space. |
Ke Han; Shaogang Gong; Yan Huang; Liang Wang; Tieniu Tan; |
1733 | CafeBoost: Causal Feature Boost To Eliminate Task-Induced Bias for Class Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we find a new type of bias appearing in continual learning, coined as task-induced bias. |
Benliu Qiu; Hongliang Li; Haitao Wen; Heqian Qiu; Lanxiao Wang; Fanman Meng; Qingbo Wu; Lili Pan; |
1734 | A-La-Carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce A-la-carte Prompt Tuning (APT), a transformer-based scheme to tune prompts on distinct data so that they can be arbitrarily composed at inference time. |
Benjamin Bowman; Alessandro Achille; Luca Zancato; Matthew Trager; Pramuditha Perera; Giovanni Paolini; Stefano Soatto; |
1735 | ImageNet-E: Benchmarking Neural Network Robustness Via Attribute Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, instead of following the traditional research paradigm that investigates new out-of-distribution corruptions or perturbations deep models may encounter, we conduct model debugging in in-distribution data to explore which object attributes a model may be sensitive to. |
Xiaodan Li; Yuefeng Chen; Yao Zhu; Shuhui Wang; Rong Zhang; Hui Xue; |
1736 | Learning With Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the observations made, we propose Semantic-Aware Virtual Contrastive model (SAVC), a novel method that facilitates separation between new classes and base classes by introducing virtual classes to SCL. |
Zeyin Song; Yifan Zhao; Yujun Shi; Peixi Peng; Li Yuan; Yonghong Tian; |
1737 | ViLEM: Visual-Language Error Modeling for Image-Text Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel proxy task, named Visual-Language Error Modeling (ViLEM), to inject detailed image-text association into "dual-encoder" model by "proofreading" each word in the text against the corresponding image. |
Yuxin Chen; Zongyang Ma; Ziqi Zhang; Zhongang Qi; Chunfeng Yuan; Ying Shan; Bing Li; Weiming Hu; Xiaohu Qie; Jianping Wu; |
1738 | Egocentric Auditory Attention Localization in Conversations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the new and challenging Selective Auditory Attention Localization problem, we propose an end-to-end deep learning approach that uses egocentric video and multichannel audio to predict the heatmap of the camera wearer’s auditory attention. |
Fiona Ryan; Hao Jiang; Abhinav Shukla; James M. Rehg; Vamsi Krishna Ithapu; |
1739 | Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel USOD method to mine rich and accurate saliency knowledge from both easy and hard samples. |
Huajun Zhou; Bo Qiao; Lingxiao Yang; Jianhuang Lai; Xiaohua Xie; |
1740 | AltFreezing for More General Video Face Forgery Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to capture both spatial and temporal artifacts in one model for face forgery detection. |
Zhendong Wang; Jianmin Bao; Wengang Zhou; Weilun Wang; Houqiang Li; |
1741 | Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present the Local Implicit Transformer (LIT), which integrates an attention mechanism and a frequency encoding technique into a local implicit image function. |
Hao-Wei Chen; Yu-Syuan Xu; Min-Fong Hong; Yi-Min Tsai; Hsien-Kai Kuo; Chun-Yi Lee; |
1742 | Learning Partial Correlation Based Deep Visual Representation for Image Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formulate SICE as a novel structured layer of CNN. |
Saimunur Rahman; Piotr Koniusz; Lei Wang; Luping Zhou; Peyman Moghadam; Changming Sun; |
1743 | Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the problem of learning goal-conditioned policies in Minecraft, a popular, widely accessible yet challenging open-ended environment for developing human-level multi-task agents. |
Shaofei Cai; Zihao Wang; Xiaojian Ma; Anji Liu; Yitao Liang; |
1744 | MoDi: Unconditional Motion Synthesis From Diverse Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present MoDi — a generative model trained in an unsupervised setting from an extremely diverse, unstructured and unlabeled dataset. |
Sigal Raab; Inbal Leibovitch; Peizhuo Li; Kfir Aberman; Olga Sorkine-Hornung; Daniel Cohen-Or; |
1745 | Visual Localization Using Imperfect 3D Models From The Internet Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper studies how the imperfections of these models affect localization accuracy. |
Vojtech Panek; Zuzana Kukelova; Torsten Sattler; |
1746 | Network-Free, Unsupervised Semantic Segmentation With Synthetic Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We derive a method that yields highly accurate semantic segmentation maps without the use of any additional neural network, layers, manually annotated training data, or supervised training. |
Qianli Feng; Raghudeep Gadde; Wentong Liao; Eduard Ramon; Aleix Martinez; |
1747 | Hierarchical Dense Correlation Distillation for Few-Shot Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we design Hierarchically Decoupled Matching Network (HDMNet) mining pixel-level support correlation based on the transformer architecture. |
Bohao Peng; Zhuotao Tian; Xiaoyang Wu; Chengyao Wang; Shu Liu; Jingyong Su; Jiaya Jia; |
1748 | PVO: Panoptic Visual Odometry Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present PVO, a novel panoptic visual odometry framework to achieve more comprehensive modeling of the scene motion, geometry, and panoptic segmentation information. |
Weicai Ye; Xinyue Lan; Shuo Chen; Yuhang Ming; Xingyuan Yu; Hujun Bao; Zhaopeng Cui; Guofeng Zhang; |
1749 | Generative Diffusion Prior for Unified Image Restoration and Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the Generative Diffusion Prior (GDP) to effectively model the posterior distributions in an unsupervised sampling manner. |
Ben Fei; Zhaoyang Lyu; Liang Pan; Junzhe Zhang; Weidong Yang; Tianyue Luo; Bo Zhang; Bo Dai; |
1750 | Real-Time Controllable Denoising for Image and Video Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Real-time Controllable Denoising (RCD), the first deep image and video denoising pipeline that provides a fully controllable user interface to edit arbitrary denoising levels in real-time with only one-time network inference. |
Zhaoyang Zhang; Yitong Jiang; Wenqi Shao; Xiaogang Wang; Ping Luo; Kaimo Lin; Jinwei Gu; |
1751 | ISBNet: A 3D Point Cloud Instance Segmentation Network With Instance-Aware Sampling and Box-Aware Dynamic Convolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, by relying on the quality of the clusters, these methods produce susceptible results when (1) nearby objects with the same semantic class are packed together, or (2) large objects have loosely connected regions. To address these limitations, we introduce ISBNet, a novel cluster-free method that represents instances as kernels and decodes instance masks via dynamic convolution. |
Tuan Duc Ngo; Binh-Son Hua; Khoi Nguyen; |
1752 | Hi4D: 4D Instance Segmentation of Close Human Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Hi4D, a method and dataset for the automatic analysis of physically close human-human interaction under prolonged contact. |
Yifei Yin; Chen Guo; Manuel Kaufmann; Juan Jose Zarate; Jie Song; Otmar Hilliges; |
1753 | Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery From Sparse Image Ensemble Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Hi-LASSIE, which performs 3D articulated reconstruction from only 20-30 online images in the wild without any user-defined shape or skeleton templates. |
Chun-Han Yao; Wei-Chih Hung; Yuanzhen Li; Michael Rubinstein; Ming-Hsuan Yang; Varun Jampani; |
1754 | IterativePFN: True Iterative Point Cloud Filtering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose IterativePFN (iterative point cloud filtering network), which consists of multiple IterationModules that model the true iterative filtering process internally, within a single network. |
Dasith de Silva Edirimuni; Xuequan Lu; Zhiwen Shao; Gang Li; Antonio Robles-Kelly; Ying He; |
1755 | Computationally Budgeted Continual Learning: What Does Matter? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is unreasonable for applications in-the-wild, where systems are primarily constrained by computational and time budgets, not storage. We revisit this problem with a large-scale benchmark and analyze the performance of traditional CL approaches in a compute-constrained setting, where effective memory samples used in training can be implicitly restricted as a consequence of limited computation. |
Ameya Prabhu; Hasan Abed Al Kader Hammoud; Puneet K. Dokania; Philip H.S. Torr; Ser-Nam Lim; Bernard Ghanem; Adel Bibi; |
1756 | Decentralized Learning With Multi-Headed Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel distillation-based decentralized learning technique that allows multiple agents with private non-iid data to learn from each other, without having to share their data, weights or weight updates. |
Andrey Zhmoginov; Mark Sandler; Nolan Miller; Gus Kristiansen; Max Vladymyrov; |
1757 | SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Radiography imaging protocols focus on particular body regions, therefore producing images of great similarity and yielding recurrent anatomical structures across patients. To exploit this structured information, we propose the use of Space-aware Memory Queues for In-painting and Detecting anomalies from radiography images (abbreviated as SQUID). |
Tiange Xiang; Yixiao Zhang; Yongyi Lu; Alan L. Yuille; Chaoyi Zhang; Weidong Cai; Zongwei Zhou; |
1758 | CF-Font: Content Fusion for Few-Shot Font Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the content feature extracted using a representative font might not be optimal. In light of this, we propose a content fusion module (CFM) to project the content feature into a linear space defined by the content features of basis fonts, which can take the variation of content features caused by different fonts into consideration. |
Chi Wang; Min Zhou; Tiezheng Ge; Yuning Jiang; Hujun Bao; Weiwei Xu; |
1759 | On The Convergence of IRLS and Its Variants in Outlier-Robust Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Outlier-robust estimation involves estimating some parameters (e.g., 3D rotations) from data samples in the presence of outliers, and is typically formulated as a non-convex and non-smooth problem. For this problem, the classical method called iteratively reweighted least-squares (IRLS) and its variants have shown impressive performance. This paper makes several contributions towards understanding why these algorithms work so well. |
Liangzu Peng; Christian Kümmerle; René Vidal; |
1760 | CLIP-S4: Language-Guided Self-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present CLIP-S^4 that leverages self-supervised pixel representation learning and vision-language models to enable various semantic segmentation tasks (e.g., unsupervised, transfer learning, language-driven segmentation) without any human annotations and unknown class information. |
Wenbin He; Suphanut Jamonnak; Liang Gou; Liu Ren; |
1761 | Deep Incomplete Multi-View Clustering With Cross-View Partial Sample and Prototype Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Although several attempts have been proposed to address IMVC, they suffer from the following drawbacks: i) Existing methods mainly adopt cross-view contrastive learning forcing the representations of each sample across views to be exactly the same, which might ignore view discrepancy and flexibility in representations; ii) Due to the absence of non-observed samples across multiple views, the obtained prototypes of clusters might be unaligned and biased, leading to incorrect fusion. To address the above issues, we propose a Cross-view Partial Sample and Prototype Alignment Network (CPSPAN) for Deep Incomplete Multi-view Clustering. |
Jiaqi Jin; Siwei Wang; Zhibin Dong; Xinwang Liu; En Zhu; |
1762 | A New Comprehensive Benchmark for Semi-Supervised Video Anomaly Detection and Anticipation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a new comprehensive dataset, NWPU Campus, containing 43 scenes, 28 classes of abnormal events, and 16 hours of videos. |
Congqi Cao; Yue Lu; Peng Wang; Yanning Zhang; |
1763 | Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, direct aligning cross-modal information using such representations is challenging, as visual patches and text tokens differ in semantic levels and granularities. To alleviate this issue, we propose a Finite Discrete Tokens (FDT) based multimodal representation. |
Yuxiao Chen; Jianbo Yuan; Yu Tian; Shijie Geng; Xinyu Li; Ding Zhou; Dimitris N. Metaxas; Hongxia Yang; |
1764 | 3Mformer: Multi-Order Multi-Mode Transformer for Skeletal Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to form hypergraph to model hyper-edges between graph nodes (e.g., third- and fourth-order hyper-edges capture three and four nodes) which help capture higher-order motion patterns of groups of body joints. |
Lei Wang; Piotr Koniusz; |
1765 | HumanBench: Towards General Human-Centric Perception With Projector Assisted Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we propose a HumanBench based on existing datasets to comprehensively evaluate on the common ground the generalization abilities of different pretraining methods on 19 datasets from 6 diverse downstream tasks, including person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting. |
Shixiang Tang; Cheng Chen; Qingsong Xie; Meilin Chen; Yizhou Wang; Yuanzheng Ci; Lei Bai; Feng Zhu; Haiyang Yang; Li Yi; Rui Zhao; Wanli Ouyang; |
1766 | Heterogeneous Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel framework and a solution to tackle the continual learning (CL) problem with changing network architectures. |
Divyam Madaan; Hongxu Yin; Wonmin Byeon; Jan Kautz; Pavlo Molchanov; |
1767 | Object Pose Estimation With Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we inject two fundamental changes, namely conformal keypoint detection and geometric uncertainty propagation, into the two-stage paradigm and propose the first pose estimator that endows an estimation with provable and computable worst-case error bounds. |
Heng Yang; Marco Pavone; |
1768 | Transformer Scale Gate for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Leveraging from the inherent properties of Vision Transformers, we propose a simple yet effective module, Transformer Scale Gate (TSG), to optimally combine multi-scale features. |
Hengcan Shi; Munawar Hayat; Jianfei Cai; |
1769 | Deep Graph Reprogramming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we explore a novel model reusing task tailored for graph neural networks (GNNs), termed as "deep graph reprogramming". |
Yongcheng Jing; Chongbin Yuan; Li Ju; Yiding Yang; Xinchao Wang; Dacheng Tao; |
1770 | Compacting Binary Neural Networks By Sparse Kernel Selection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper is motivated by a previously revealed phenomenon that the binary kernels in successful BNNs are nearly power-law distributed: their values are mostly clustered into a small number of codewords. |
Yikai Wang; Wenbing Huang; Yinpeng Dong; Fuchun Sun; Anbang Yao; |
1771 | EMT-NAS: Transferring Architectural Knowledge Between Tasks From Different Datasets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the joint training of weighting parameters on multiple related tasks may lead to performance degradation, known as negative transfer. To address this issue, this work proposes an evolutionary multi-tasking neural architecture search (EMT-NAS) algorithm to accelerate the search process by transferring architectural knowledge across multiple related tasks. |
Peng Liao; Yaochu Jin; Wenli Du; |
1772 | 3D-Aware Multi-Class Image-to-Image Translation With NeRFs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To further reduce the view-consistency problems, we propose several new techniques, including a U-net-like adaptor network design, a hierarchical representation constraint and a relative regularization loss. |
Senmao Li; Joost van de Weijer; Yaxing Wang; Fahad Shahbaz Khan; Meiqin Liu; Jian Yang; |
1773 | Learning Joint Latent Space EBM Prior Model for Multi-Layer Generator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such a prior model usually focuses on modeling inter-layer relations between latent variables by assuming non-informative (conditional) Gaussian distributions, which can be limited in model expressivity. To tackle this issue and learn more expressive prior models, we propose an energy-based model (EBM) on the joint latent space over all layers of latent variables with the multi-layer generator as its backbone. |
Jiali Cui; Ying Nian Wu; Tian Han; |
1774 | Unsupervised Visible-Infrared Person Re-Identification Via Progressive Graph Matching and Alternate Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, we devise a Progressive Graph Matching method to globally mine cross-modality correspondences under cluster imbalance scenarios. |
Zesen Wu; Mang Ye; |
1775 | Hierarchical B-Frame Video Coding Using Two-Layer CANF Without Motion Coding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel B-frame coding architecture based on two-layer Conditional Augmented Normalizing Flows (CANF). |
David Alexandre; Hsueh-Ming Hang; Wen-Hsiao Peng; |
1776 | Benchmarking Robustness of 3D Object Detection to Common Corruptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To comprehensively and rigorously benchmark the corruption robustness of 3D detectors, in this paper we design 27 types of common corruptions for both LiDAR and camera inputs considering real-world driving scenarios. |
Yinpeng Dong; Caixin Kang; Jinlai Zhang; Zijian Zhu; Yikai Wang; Xiao Yang; Hang Su; Xingxing Wei; Jun Zhu; |
1777 | Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The objective of this paper is self-supervised learning of video object segmentation. |
Liulei Li; Wenguan Wang; Tianfei Zhou; Jianwu Li; Yi Yang; |
1778 | Seeing Beyond The Brain: Conditional Diffusion Model With Sparse Masked Modeling for Vision Decoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present MinD-Vis: Sparse Masked Brain Modeling with Double-Conditioned Latent Diffusion Model for Human Vision Decoding. |
Zijiao Chen; Jiaxin Qing; Tiange Xiang; Wan Lin Yue; Juan Helen Zhou; |
1779 | PointAvatar: Deformable Point-Based Head Avatars From Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, we propose PointAvatar, a deformable point-based representation that disentangles the source color into intrinsic albedo and normal-dependent shading. |
Yufeng Zheng; Wang Yifan; Gordon Wetzstein; Michael J. Black; Otmar Hilliges; |
1780 | Seeing Through The Glass: Neural 3D Reconstruction of Object Inside A Transparent Container Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we define a new problem of recovering the 3D geometry of an object confined in a transparent enclosure. |
Jinguang Tong; Sundaram Muthu; Fahira Afzal Maken; Chuong Nguyen; Hongdong Li; |
1781 | OrienterNet: Visual Localization in 2D Public Maps With Neural Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: OrienterNet is supervised only by camera poses but learns to perform semantic matching with a wide range of map elements in an end-to-end manner. To enable this, we introduce a large crowd-sourced dataset of images captured across 12 cities from the diverse viewpoints of cars, bikes, and pedestrians. |
Paul-Edouard Sarlin; Daniel DeTone; Tsun-Yi Yang; Armen Avetisyan; Julian Straub; Tomasz Malisiewicz; Samuel Rota Bulò; Richard Newcombe; Peter Kontschieder; Vasileios Balntas; |
1782 | PMatch: Paired Masked Image Modeling for Dense Geometric Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Further, to be robust to the textureless area, we propose a novel cross-frame global matching module (CFGM). Since most textureless areas are planar surfaces, we propose a homography loss to further regularize its learning. |
Shengjie Zhu; Xiaoming Liu; |
1783 | Neural Voting Field for Camera-Space 3D Hand Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a unified framework for camera-space 3D hand pose estimation from a single RGB image based on 3D implicit representation. |
Lin Huang; Chung-Ching Lin; Kevin Lin; Lin Liang; Lijuan Wang; Junsong Yuan; Zicheng Liu; |
1784 | STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of human action recognition using motion capture (MoCap) sequences. |
Xiaoyu Zhu; Po-Yao Huang; Junwei Liang; Celso M. de Melo; Alexander G. Hauptmann; |
1785 | Visual Recognition-Driven Image Restoration for Multiple Degradation With Intrinsic Semantics Recovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, to address more practical scenarios, we propose a Visual Recognition-Driven Image Restoration network for multiple degradation, dubbed VRD-IR, to recover high-quality images from various unknown corruption types from the perspective of visual recognition within one model. |
Zizheng Yang; Jie Huang; Jiahao Chang; Man Zhou; Hu Yu; Jinghao Zhang; Feng Zhao; |
1786 | High-Fidelity Generalized Emotional Talking Face Generation With Multi-Modal Emotion Space Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a more flexible and generalized framework. |
Chao Xu; Junwei Zhu; Jiangning Zhang; Yue Han; Wenqing Chu; Ying Tai; Chengjie Wang; Zhifeng Xie; Yong Liu; |
1787 | Masked and Adaptive Transformer for Exemplar Based Image Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel framework for exemplar based image translation. |
Chang Jiang; Fei Gao; Biao Ma; Yuhao Lin; Nannan Wang; Gang Xu; |
1788 | Knowledge Combination To Learn Rotated Detection Without Rotated Annotation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a framework that allows the model to predict precise rotated boxes only requiring cheaper axis-aligned annotation of the target dataset. |
Tianyu Zhu; Bryce Ferenczi; Pulak Purkait; Tom Drummond; Hamid Rezatofighi; Anton van den Hengel; |
1789 | Teaching Matters: Investigating The Role of Supervision in Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, their behavior under different learning paradigms is not well explored. We compare ViTs trained through different methods of supervision, and show that they learn a diverse range of behaviors in terms of their attention, representations, and downstream performance. |
Matthew Walmer; Saksham Suri; Kamal Gupta; Abhinav Shrivastava; |
1790 | Imagic: Text-Based Real Image Editing With Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we demonstrate, for the very first time, the ability to apply complex (e.g., non-rigid) text-based semantic edits to a single real image. |
Bahjat Kawar; Shiran Zada; Oran Lang; Omer Tov; Huiwen Chang; Tali Dekel; Inbar Mosseri; Michal Irani; |
1791 | Pointersect: Neural Rendering With Cloud-Ray Intersection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel method that renders point clouds as if they are surfaces. |
Jen-Hao Rick Chang; Wei-Yu Chen; Anurag Ranjan; Kwang Moo Yi; Oncel Tuzel; |
1792 | Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we emphasize the importance of diverse global semantics and propose an efficient token decoupling and merging method that can jointly consider the token importance and diversity for token pruning. |
Sifan Long; Zhen Zhao; Jimin Pi; Shengsheng Wang; Jingdong Wang; |
1793 | You Are Catching My Attention: Are Vision Transformers Bad Learners Under Backdoor Attacks? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, compared with CNNs, we find that this mechanism of capturing global information within patches makes ViTs more sensitive to patch-wise triggers. Under such observations, we delicately design a novel backdoor attack framework for ViTs, dubbed BadViT, which utilizes a universal patch-wise trigger to divert the model’s attention from patches beneficial for classification to those containing triggers, thereby turning the very mechanism ViTs rely on against the model. |
Zenghui Yuan; Pan Zhou; Kai Zou; Yu Cheng; |
1794 | STDLens: Model Hijacking-Resilient Federated Learning for Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The attacker can control how the object detection system should misbehave by implanting Trojaned gradients using only a small number of compromised clients in the collaborative learning process. This paper introduces STDLens, a principled approach to safeguarding FL against such attacks. |
Ka-Ho Chow; Ling Liu; Wenqi Wei; Fatih Ilhan; Yanzhao Wu; |
1795 | Contrastive Grouping With Transformer for Referring Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a mask classification framework, Contrastive Grouping with Transformer network (CGFormer), which explicitly captures object-level information via token-based querying and grouping strategy. |
Jiajin Tang; Ge Zheng; Cheng Shi; Sibei Yang; |
1796 | MagicPony: Learning Articulated 3D Animals in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of predicting the 3D shape, articulation, viewpoint, texture, and lighting of an articulated animal like a horse given a single test image as input. We present a new method, dubbed MagicPony, that learns this predictor purely from in-the-wild single-view images of the object category, with minimal assumptions about the topology of deformation. |
Shangzhe Wu; Ruining Li; Tomas Jakab; Christian Rupprecht; Andrea Vedaldi; |
1797 | PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The patch-to-patch attention suffers from the quadratic complexity issue, and also makes it non-trivial to explain learned ViTs. To address these issues in ViT, this paper proposes to learn Patch-to-Cluster attention (PaCa) in ViT. |
Ryan Grainger; Thomas Paniagua; Xi Song; Naresh Cuntoor; Mun Wai Lee; Tianfu Wu; |
1798 | Pix2Map: Cross-Modal Retrieval for Inferring Street Maps From Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Pix2Map, a method for inferring urban street map topology directly from ego-view images, as needed to continually update and expand existing maps. |
Xindi Wu; KwunFung Lau; Francesco Ferroni; Aljoša Ošep; Deva Ramanan; |
1799 | LightPainter: Interactive Portrait Relighting With Freehand Scribble Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce LightPainter, a scribble-based relighting system that allows users to interactively manipulate portrait lighting effect with ease. |
Yiqun Mei; He Zhang; Xuaner Zhang; Jianming Zhang; Zhixin Shu; Yilin Wang; Zijun Wei; Shi Yan; HyunJoon Jung; Vishal M. Patel; |
1800 | Affordances From Human Videos As A Versatile Representation for Robotics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, despite some successful results on static datasets, it remains unclear how current models can be used on a robot directly. In this paper, we aim to bridge this gap by leveraging videos of human interactions in an environment-centric manner. |
Shikhar Bahl; Russell Mendonca; Lili Chen; Unnat Jain; Deepak Pathak; |
1801 | Unsupervised Inference of Signed Distance Functions From Single Sparse Point Clouds Without Learning Priors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our insight here is to learn surface parameterization and SDFs inference in an end-to-end manner. |
Chao Chen; Yu-Shen Liu; Zhizhong Han; |
1802 | AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present All-Pairs Multi-Field Transforms (AMT), a new network architecture for video frame interpolation. |
Zhen Li; Zuo-Liang Zhu; Ling-Hao Han; Qibin Hou; Chun-Le Guo; Ming-Ming Cheng; |
1803 | Vision Transformers Are Parameter-Efficient Audio-Visual Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Vision transformers (ViTs) have achieved impressive results on various computer vision tasks in the last several years. In this work, we study the capability of frozen ViTs, pretrained only on visual data, to generalize to audio-visual data without finetuning any of its original parameters. |
Yan-Bo Lin; Yi-Lin Sung; Jie Lei; Mohit Bansal; Gedas Bertasius; |
1804 | Deep Discriminative Spatial and Temporal Network for Efficient Video Deblurring Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As adjacent frames usually contain different contents, directly stacking features of adjacent frames without discrimination may affect the latent clear frame restoration. Therefore, we develop a simple yet effective discriminative temporal feature fusion module to obtain useful temporal features for latent frame restoration. |
Jinshan Pan; Boming Xu; Jiangxin Dong; Jianjun Ge; Jinhui Tang; |
1805 | Training Debiased Subnetworks With Contrastive Weight Pruning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We then further elucidate the importance of bias-conflicting samples on structure learning. Motivated by these observations, we propose a Debiased Contrastive Weight Pruning (DCWP) algorithm, which probes unbiased subnetworks without expensive group annotations. |
Geon Yeong Park; Sangmin Lee; Sang Wan Lee; Jong Chul Ye; |
1806 | SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce SparseViT that revisits activation sparsity for recent window-based vision transformers (ViTs). |
Xuanyao Chen; Zhijian Liu; Haotian Tang; Li Yi; Hang Zhao; Song Han; |
1807 | Prototype-Based Embedding Network for Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The above challenges prevent current SGG methods from acquiring robust features for reliable relation prediction. In this paper, we claim that predicate’s category-inherent semantics can serve as class-wise prototypes in the semantic space for relieving the above challenges caused by the diverse visual appearances. |
Chaofan Zheng; Xinyu Lyu; Lianli Gao; Bo Dai; Jingkuan Song; |
1808 | Toward RAW Object Detection: A New Benchmark and A New Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to achieve object detection on RAW sensor data, which naturally saves the HDR information from image sensors without extra equipment costs. |
Ruikang Xu; Chang Chen; Jingyang Peng; Cheng Li; Yibin Huang; Fenglong Song; Youliang Yan; Zhiwei Xiong; |
1809 | Music-Driven Group Choreography Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present AIOZ-GDANCE, a new large-scale dataset for music-driven group dance generation. |
Nhat Le; Thang Pham; Tuong Do; Erman Tjiputra; Quang D. Tran; Anh Nguyen; |
1810 | Cascade Evidential Learning for Open-World Weakly-Supervised Temporal Action Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Compared with traditional open-set recognition tasks, Open-world WTAL (OWTAL) is challenging since not only are the annotations of unknown samples unavailable, but also the fine-grained annotations of known action instances can only be inferred ambiguously from the video category labels. To address this problem, we propose a Cascade Evidential Learning framework at an evidence level, which targets OWTAL for the first time. |
Mengyuan Chen; Junyu Gao; Changsheng Xu; |
1811 | Efficient Movie Scene Detection Using State-Space Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work proposes a State-Space Transformer model that can efficiently capture dependencies in long movie videos for accurate movie scene detection. |
Md Mohaiminul Islam; Mahmudul Hasan; Kishan Shamsundar Athrey; Tony Braskich; Gedas Bertasius; |
1812 | Multispectral Video Semantic Segmentation: A Benchmark Dataset and Baseline Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, the present focus on single RGBT image input restricts existing methods from well addressing dynamic real-world scenes. Motivated by the above observations, in this paper, we set out to address a relatively new task of semantic segmentation of multispectral video input, which we refer to as Multispectral Video Semantic Segmentation, or MVSS in short. |
Wei Ji; Jingjing Li; Cheng Bian; Zongwei Zhou; Jiaying Zhao; Alan L. Yuille; Li Cheng; |
1813 | Reducing The Label Bias for Timestamp Supervised Temporal Action Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, previous approaches suffer from severe label bias due to over-reliance on sparse timestamp annotations, resulting in unsatisfactory performance. In this paper, we propose the Debiasing-TSTAS (D-TSTAS) framework by exploiting unannotated frames to alleviate this bias in two phases: 1) Initialization. |
Kaiyuan Liu; Yunheng Li; Shenglan Liu; Chenwei Tan; Zihang Shao; |
1814 | Efficient Semantic Segmentation By Altering Resolutions for Compressed Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an altering resolution framework called AR-Seg for compressed videos to achieve efficient VSS. |
Yubin Hu; Yuze He; Yanghao Li; Jisheng Li; Yuxing Han; Jiangtao Wen; Yong-Jin Liu; |
1815 | STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, the semantic ambiguity causes inconsistent annotation and negatively affects the model’s convergence, leading to worse accuracy and unstable predictions. To solve this problem, we propose a Self-adapTive Ambiguity Reduction (STAR) loss by exploiting the properties of semantic ambiguity. |
Zhenglin Zhou; Huaxia Li; Hong Liu; Nanyang Wang; Gang Yu; Rongrong Ji; |
1816 | A Meta-Learning Approach to Predicting Performance and Data Requirements Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an approach to estimate the number of samples required for a model to reach a target performance. |
Achin Jain; Gurumurthy Swaminathan; Paolo Favaro; Hao Yang; Avinash Ravichandran; Hrayr Harutyunyan; Alessandro Achille; Onkar Dabeer; Bernt Schiele; Ashwin Swaminathan; Stefano Soatto; |
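The highlight gives no formula; a common baseline for such estimates (not necessarily the paper's meta-learned predictor) is to fit a power law err(n) ≈ a·n^(−b) to a few measured (dataset size, error) points and invert it for the target error. A minimal sketch with hypothetical numbers:

```python
import numpy as np

# Hypothetical (training-set size, validation error) measurements.
sizes = np.array([100.0, 200.0, 400.0, 800.0])
errors = np.array([0.40, 0.31, 0.24, 0.185])

# Fit err(n) ~ a * n**(-b) by least squares in log-log space.
slope, log_a = np.polyfit(np.log(sizes), np.log(errors), 1)
a, b = np.exp(log_a), -slope

# Invert the fit: samples needed to reach a target error.
target = 0.10
n_needed = (a / target) ** (1.0 / b)
print(f"fit: err(n) = {a:.2f} * n^(-{b:.2f}); need ~{n_needed:.0f} samples")
```

Extrapolations of this kind degrade quickly far beyond the measured sizes, which is precisely the regime the paper's meta-learning approach targets.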
1817 | Seeing What You Said: Talking Face Generation Guided By A Lip Reading Expert Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With a lip-reading expert, we propose a novel contrastive learning strategy to enhance lip-speech synchronization, and a transformer to encode audio synchronously with video, while considering the global temporal dependency of audio. |
Jiadong Wang; Xinyuan Qian; Malu Zhang; Robby T. Tan; Haizhou Li; |
1818 | Deep Curvilinear Editing: Commutative and Nonlinear Image Manipulation for Pretrained Deep Generative Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study proposes a novel method called deep curvilinear editing (DeCurvEd) to determine semantic commuting vector fields on the latent space. |
Takehiro Aoshima; Takashi Matsubara; |
1819 | Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Without semantic priors, a network may easily deviate from a region’s original color. To address this issue, we propose a novel semantic-aware knowledge-guided framework (SKF) that can assist a low-light enhancement model in learning rich and diverse priors encapsulated in a semantic segmentation model. |
Yuhui Wu; Chen Pan; Guoqing Wang; Yang Yang; Jiwei Wei; Chongyi Li; Heng Tao Shen; |
1820 | SimpSON: Simplifying Photo Cleanup With Single-Click Distracting Object Segmentation Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an interactive distractor selection method that is optimized to achieve the task with just a single click. |
Chuong Huynh; Yuqian Zhou; Zhe Lin; Connelly Barnes; Eli Shechtman; Sohrab Amirghodsi; Abhinav Shrivastava; |
1821 | Learning Neural Duplex Radiance Fields for Real-Time View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach to distill and bake NeRFs into highly efficient mesh-based neural representations that are fully compatible with the massively parallel graphics rendering pipeline. |
Ziyu Wan; Christian Richardt; Aljaž Božič; Chao Li; Vijay Rengarajan; Seonghyeon Nam; Xiaoyu Xiang; Tuotuo Li; Bo Zhu; Rakesh Ranjan; Jing Liao; |
1822 | Deep Arbitrary-Scale Image Super-Resolution Via Scale-Equivariance Pursuit Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Scale-equivariant processing blocks play a central role in arbitrary-scale image super-resolution tasks. Inspired by this crucial observation, this work proposes two novel scale-equivariant modules within a transformer-style framework to enhance arbitrary-scale image super-resolution (ASISR) performance, especially in high upsampling rate image extrapolation. |
Xiaohang Wang; Xuanhong Chen; Bingbing Ni; Hang Wang; Zhengyan Tong; Yutian Liu; |
1823 | Towards Modality-Agnostic Person Re-Identification With Descriptive Query Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This motivates us to study a new and challenging modality-agnostic person re-identification problem. Towards this goal, we propose a unified person re-identification (UNIReID) architecture that can effectively adapt to cross-modality and multi-modality tasks. |
Cuiqun Chen; Mang Ye; Ding Jiang; |
1824 | Discriminating Known From Unknown Objects Via Structure-Enhanced Recurrent Variational AutoEncoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, to boost the performance of object localization, we explore utilizing the classical Laplacian of Gaussian (LoG) operator to enhance the structure information in the extracted low-level features. |
Aming Wu; Cheng Deng; |
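The Laplacian of Gaussian named above is a classical operator: smooth with a Gaussian, then take a Laplacian, which responds strongly at edges and blob boundaries — the "structure information" the highlight refers to. A minimal NumPy sketch on a hypothetical 2D feature map (the paper's exact integration into the detector is not reproduced here):

```python
import numpy as np

def gaussian_1d(sigma, radius):
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def smooth(img, sigma=1.5, radius=4):
    # Separable Gaussian blur: 1D convolution along rows, then columns.
    k = gaussian_1d(sigma, radius)
    out = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, out, k, mode="same")

def laplacian(img):
    # 5-point discrete Laplacian (periodic boundary via np.roll).
    return (np.roll(img, 1, 0) + np.roll(img, -1, 0)
            + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4.0 * img)

# Hypothetical low-level feature map: one bright square.
feat = np.zeros((32, 32))
feat[12:20, 12:20] = 1.0
log_response = laplacian(smooth(feat))
# Structure-enhanced feature: subtract the LoG response (unsharp-style).
enhanced = feat - log_response
```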
1825 | Occlusion-Free Scene Recovery Via Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel method for occlusion removal by directly building a mapping between position and viewing angles and the corresponding occlusion-free scene details leveraging Neural Radiance Fields (NeRF). |
Chengxuan Zhu; Renjie Wan; Yunkai Tang; Boxin Shi; |
1826 | OmniAL: A Unified CNN Framework for Unsupervised Anomaly Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a unified CNN framework for unsupervised anomaly localization, named OmniAL. |
Ying Zhao; |
1827 | An In-Depth Exploration of Person Re-Identification and Gait Recognition in Cloth-Changing Conditions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For the cloth-changing problem, video-based ReID is rarely studied due to the lack of a suitable cloth-changing benchmark, and gait recognition is often researched under controlled conditions. To tackle this problem, we propose a Cloth-Changing benchmark for Person re-identification and Gait recognition (CCPG). |
Weijia Li; Saihui Hou; Chunjie Zhang; Chunshui Cao; Xu Liu; Yongzhen Huang; Yao Zhao; |
1828 | Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide an in-depth analysis of current multi-task learning methods under different common settings and find that existing methods make progress but still leave a large performance gap compared with single-task baselines. To alleviate this dilemma in autonomous driving, we present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting to guide the model toward learning high-quality task-specific representations. |
Xiwen Liang; Minzhe Niu; Jianhua Han; Hang Xu; Chunjing Xu; Xiaodan Liang; |
1829 | Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a standardized and well-defined human evaluation protocol to facilitate verifiable and reproducible human evaluation in future works. |
Mayu Otani; Riku Togashi; Yu Sawai; Ryosuke Ishigami; Yuta Nakashima; Esa Rahtu; Janne Heikkilä; Shin’ichi Satoh; |
1830 | Semi-Supervised Domain Adaptation With Source Label Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a novel source-adaptive paradigm that adapts the source data to match the target data. |
Yu-Chu Yu; Hsuan-Tien Lin; |
1831 | Range-Nullspace Video Frame Interpolation With Focalized Motion Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, how to infer accurate intermediate motion and synthesize high-quality video frames are two critical challenges. In this paper, we present a novel VFI framework with improved treatment for these challenges. |
Zhiyang Yu; Yu Zhang; Dongqing Zou; Xijun Chen; Jimmy S. Ren; Shunqing Ren; |
1832 | FlowGrad: Controlling The Output of Generative ODEs With Gradients Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to optimize the output of ODE models according to a guidance function to achieve controllable generation. |
Xingchao Liu; Lemeng Wu; Shujian Zhang; Chengyue Gong; Wei Ping; Qiang Liu; |
1833 | Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we find that distorted images under different weather conditions contain general characteristics as well as their specific characteristics. |
Yurui Zhu; Tianyu Wang; Xueyang Fu; Xuanyu Yang; Xin Guo; Jifeng Dai; Yu Qiao; Xiaowei Hu; |
1834 | Generalized Deep 3D Shape Prior Via Part-Discretized Diffusion Process Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop a generalized 3D shape generation prior model, tailored for multiple 3D tasks including unconditional shape generation, point cloud completion, and cross-modality shape generation, etc. |
Yuhan Li; Yishun Dou; Xuanhong Chen; Bingbing Ni; Yilin Sun; Yutian Liu; Fuzhen Wang; |
1835 | Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a new conflict-based cross-view consistency (CCVC) method based on a two-branch co-training framework which aims at enforcing the two sub-nets to learn informative features from irrelevant views. |
Zicheng Wang; Zhen Zhao; Xiaoxia Xing; Dong Xu; Xiangyu Kong; Luping Zhou; |
1836 | Learning A 3D Morphable Face Reflectance Model From Low-Cost Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes the first 3D morphable face reflectance model with spatially varying BRDF using only low-cost publicly-available data. |
Yuxuan Han; Zhibo Wang; Feng Xu; |
1837 | SCoDA: Domain Adaptive Shape Completion for Real Scans Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A new dataset, ScanSalon, is contributed with a bunch of elaborate 3D models created by skillful artists according to scans. To address this new task, we propose a novel cross-domain feature fusion method for knowledge transfer and a novel volume-consistent self-training framework for robust learning from real data. |
Yushuang Wu; Zizheng Yan; Ce Chen; Lai Wei; Xiao Li; Guanbin Li; Yihao Li; Shuguang Cui; Xiaoguang Han; |
1838 | Recurrent Homography Estimation Using Homography-Guided Image Warping and Focus Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the Recurrent homography estimation framework using Homography-guided image Warping and Focus transformer (FocusFormer), named RHWF. |
Si-Yuan Cao; Runmin Zhang; Lun Luo; Beinan Yu; Zehua Sheng; Junwei Li; Hui-Liang Shen; |
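The homography-guided warping this entry relies on boils down to a standard backward warp: map each target pixel through the inverse homography and sample the source image there. A minimal nearest-neighbor sketch (FocusFormer itself and the recurrent estimation loop are not reproduced here):

```python
import numpy as np

def warp_by_homography(img, H):
    """Backward-warp a 2D image by a 3x3 homography H (nearest neighbor)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    # Target pixel coords -> source coords via the inverse homography.
    src = np.linalg.inv(H) @ pts
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    ok = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out_flat = out.ravel()
    out_flat[ok] = img[sy[ok], sx[ok]]
    return out

# Pure translation by +2 pixels in x, as a sanity-check homography.
H_shift = np.array([[1.0, 0.0, 2.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
```

In a coarse-to-fine estimator, warping one image by the current homography estimate progressively reduces the residual misalignment the next stage must resolve.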
1839 | I2-SDF: Intrinsic Indoor Scene Reconstruction and Editing Via Raytracing in Neural SDFs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present I^2-SDF, a new method for intrinsic indoor scene reconstruction and editing using differentiable Monte Carlo raytracing on neural signed distance fields (SDFs). |
Jingsen Zhu; Yuchi Huo; Qi Ye; Fujun Luan; Jifan Li; Dianbing Xi; Lisha Wang; Rui Tang; Wei Hua; Hujun Bao; Rui Wang; |
1840 | DLBD: A Self-Supervised Direct-Learned Binary Descriptor Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Since their binarization processes are not a component of the network, learning-based binary descriptors cannot fully utilize the advances of deep learning. To solve this issue, we propose a model-agnostic plugin binary transformation layer (BTL), making the network directly generate binary descriptors. |
Bin Xiao; Yang Hu; Bo Liu; Xiuli Bi; Weisheng Li; Xinbo Gao; |
1841 | Fuzzy Positive Learning for Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce Fuzzy Positive Learning (FPL) for accurate SSL semantic segmentation in a plug-and-play fashion, which adaptively encourages fuzzy positive predictions and suppresses highly-probable negatives. |
Pengchong Qiao; Zhidan Wei; Yu Wang; Zhennan Wang; Guoli Song; Fan Xu; Xiangyang Ji; Chang Liu; Jie Chen; |
1842 | Canonical Fields: Self-Supervised Learning of Pose-Canonicalized Neural Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Canonical Field Network (CaFi-Net), a self-supervised method to canonicalize the 3D pose of instances from an object category represented as neural fields, specifically neural radiance fields (NeRFs). |
Rohith Agaram; Shaurya Dewan; Rahul Sajnani; Adrien Poulenard; Madhava Krishna; Srinath Sridhar; |
1843 | TransFlow: Transformer As Flow Learner Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose TransFlow, a pure transformer architecture for optical flow estimation. |
Yawen Lu; Qifan Wang; Siqi Ma; Tong Geng; Yingjie Victor Chen; Huaijin Chen; Dongfang Liu; |
1844 | Multi-View Inverse Rendering for Large-Scale Real-World Indoor Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an efficient multi-view inverse rendering method for large-scale real-world indoor scenes that reconstructs global illumination and physically-reasonable SVBRDFs. |
Zhen Li; Lingli Wang; Mofang Cheng; Cihui Pan; Jiaqi Yang; |
1845 | AutoFocusFormer: Image Segmentation Off The Grid Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Intuitively, retaining more pixels representing small objects during downsampling helps to preserve important information. To achieve this, we propose AutoFocusFormer (AFF), a local-attention transformer image recognition backbone, which performs adaptive downsampling by learning to retain the most important pixels for the task. |
Chen Ziwen; Kaushik Patnaik; Shuangfei Zhai; Alvin Wan; Zhile Ren; Alexander G. Schwing; Alex Colburn; Li Fuxin; |
1846 | Boosting Transductive Few-Shot Fine-Tuning With Margin-Based Uncertainty Weighting and Probability Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we first observe that the few-shot fine-tuned methods are learned with the imbalanced class marginal distribution. This observation further motivates us to propose the Transductive Fine-tuning with Margin-based uncertainty weighting and Probability regularization (TF-MP), which learns a more balanced class marginal distribution. |
Ran Tao; Hao Chen; Marios Savvides; |
1847 | SMPConv: Self-Moving Point Representations for Continuous Convolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present self-moving point representations where weight parameters freely move, and interpolation schemes are used to implement continuous functions. |
Sanghyeon Kim; Eunbyung Park; |
1848 | CLIP2Protect: Protecting Facial Privacy Using Text-Guided Makeup Via Adversarial Latent Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel two-step approach for facial privacy protection that relies on finding adversarial latent codes in the low-dimensional manifold of a pretrained generative model. |
Fahad Shamshad; Muzammal Naseer; Karthik Nandakumar; |
1849 | Improving Weakly Supervised Temporal Action Localization By Bridging Train-Test Gap in Pseudo Labels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose to generate high-quality pseudo labels from the predicted action boundaries. |
Jingqiu Zhou; Linjiang Huang; Liang Wang; Si Liu; Hongsheng Li; |
1850 | PRISE: Demystifying Deep Lucas-Kanade With Strongly Star-Convex Constraints for Multimodel Image Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The Lucas-Kanade (LK) method is a classic iterative homography estimation algorithm for image alignment, but often suffers from poor local optimality especially when image pairs have large distortions. To address this challenge, in this paper we propose a novel Deep Star-Convexified Lucas-Kanade (PRISE) method for multimodel image alignment by introducing strongly star-convex constraints into the optimization problem. |
Yiqing Zhang; Xinming Huang; Ziming Zhang; |
1851 | Learning To Exploit Temporal Structure for Biomedical Vision-Language Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explicitly account for prior images and reports when available during both training and fine-tuning. |
Shruthi Bannur; Stephanie Hyland; Qianchu Liu; Fernando Pérez-García; Maximilian Ilse; Daniel C. Castro; Benedikt Boecking; Harshita Sharma; Kenza Bouzid; Anja Thieme; Anton Schwaighofer; Maria Wetscherek; Matthew P. Lungren; Aditya Nori; Javier Alvarez-Valle; Ozan Oktay; |
1852 | Simple Cues Lead to A Strong Multi-Object Tracker Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we ask ourselves whether simple good old TbD methods are also capable of achieving the performance of end-to-end models. |
Jenny Seidenschwarz; Guillem Brasó; Víctor Castro Serrano; Ismail Elezi; Laura Leal-Taixé; |
1853 | Marching-Primitives: Shape Abstraction From Signed Distance Function Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unlike previous works which extract polygonal meshes from a signed distance function (SDF), in this paper, we present a novel method, named Marching-Primitives, to obtain a primitive-based abstraction directly from an SDF. |
Weixiao Liu; Yuwei Wu; Sipu Ruan; Gregory S. Chirikjian; |
1854 | BiasAdv: Bias-Adversarial Augmentation for Model Debiasing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel data augmentation approach termed Bias-Adversarial augmentation (BiasAdv) that supplements bias-conflicting samples with adversarial images. |
Jongin Lim; Youngdong Kim; Byungjai Kim; Chanho Ahn; Jinwoo Shin; Eunho Yang; Seungju Han; |
1855 | CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle the challenge in modeling cross-modality features and decomposing desirable modality-specific and modality-shared features, we propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network. |
Zixiang Zhao; Haowen Bai; Jiangshe Zhang; Yulun Zhang; Shuang Xu; Zudi Lin; Radu Timofte; Luc Van Gool; |
1856 | Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Besides, these works use prior information to explore explicit part alignments, which may lead to the distortion of intra-modality information. To alleviate these issues, we present IRRA: a cross-modal Implicit Relation Reasoning and Aligning framework that learns relations between local visual-textual tokens and enhances global image-text matching without requiring additional prior supervision. |
Ding Jiang; Mang Ye; |
1857 | REVEAL: Retrieval-Augmented Visual-Language Pre-Training With Multi-Source Multimodal Knowledge Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an end-to-end Retrieval-Augmented Visual Language Model (REVEAL) that learns to encode world knowledge into a large-scale memory, and to retrieve from it to answer knowledge-intensive queries. |
Ziniu Hu; Ahmet Iscen; Chen Sun; Zirui Wang; Kai-Wei Chang; Yizhou Sun; Cordelia Schmid; David A. Ross; Alireza Fathi; |
1858 | Learning To Retain While Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, at every generator update, we aim to maintain the student’s performance on previously encountered examples while acquiring knowledge from samples of the current distribution. |
Gaurav Patel; Konda Reddy Mopuri; Qiang Qiu; |
1859 | Why Is The Winner The Best? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To address this gap in the literature, we performed a multi-center study with all 80 competitions that were conducted in the scope of IEEE ISBI 2021 and MICCAI 2021. |
Matthias Eisenmann; Annika Reinke; Vivienn Weru; Minu D. Tizabi; Fabian Isensee; Tim J. Adler; Sharib Ali; Vincent Andrearczyk; Marc Aubreville; Ujjwal Baid; Spyridon Bakas; Niranjan Balu; Sophia Bano; Jorge Bernal; Sebastian Bodenstedt; Alessandro Casella; Veronika Cheplygina; Marie Daum; Marleen de Bruijne; Adrien Depeursinge; Reuben Dorent; Jan Egger; David G. Ellis; Sandy Engelhardt; Melanie Ganz; Noha Ghatwary; Gabriel Girard; Patrick Godau; Anubha Gupta; Lasse Hansen; Kanako Harada; Mattias P. Heinrich; Nicholas Heller; Alessa Hering; Arnaud Huaulmé; Pierre Jannin; Ali Emre Kavur; Oldřich Kodym; Michal Kozubek; Jianning Li; Hongwei Li; Jun Ma; Carlos Martín-Isla; Bjoern Menze; Alison Noble; Valentin Oreiller; Nicolas Padoy; Sarthak Pati; Kelly Payette; Tim Rädsch; Jonathan Rafael-Patiño; Vivek Singh Bawa; Stefanie Speidel; Carole H. Sudre; Kimberlin van Wijnen; Martin Wagner; Donglai Wei; Amine Yamlahi; Moi Hoon Yap; Chun Yuan; Maximilian Zenk; Aneeq Zia; David Zimmerer; Dogu Baran Aydogan; Binod Bhattarai; Louise Bloch; Raphael Brüngel; Jihoon Cho; Chanyeol Choi; Qi Dou; Ivan Ezhov; Christoph M. Friedrich; Clifton D. Fuller; Rebati Raman Gaire; Adrian Galdran; Álvaro García Faura; Maria Grammatikopoulou; SeulGi Hong; Mostafa Jahanifar; Ikbeom Jang; Abdolrahim Kadkhodamohammadi; Inha Kang; Florian Kofler; Satoshi Kondo; Hugo Kuijf; Mingxing Li; Minh Luu; Tomaž Martinčič; Pedro Morais; Mohamed A. Naser; Bruno Oliveira; David Owen; Subeen Pang; Jinah Park; Sung-Hong Park; Szymon Plotka; Elodie Puybareau; Nasir Rajpoot; Kanghyun Ryu; Numan Saeed; Adam Shephard; Pengcheng Shi; Dejan Štepec; Ronast Subedi; Guillaume Tochon; Helena R. Torres; Helene Urien; João L. Vilaça; Kareem A. Wahid; Haojie Wang; Jiacheng Wang; Liansheng Wang; Xiyue Wang; Benedikt Wiestler; Marek Wodzinski; Fangfang Xia; Juanying Xie; Zhiwei Xiong; Sen Yang; Yanwu Yang; Zixuan Zhao; Klaus Maier-Hein; Paul F. Jäger; Annette Kopp-Schneider; Lena Maier-Hein; |
1860 | HGNet: Learning Hierarchical Geometry From Points, Edges, and Surfaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This motivates us to design a deep architecture to model the hierarchical geometry from points, edges, surfaces (triangles), to super-surfaces (adjacent surfaces) for the thorough analysis of point clouds. In this paper, we present a novel Hierarchical Geometry Network (HGNet) that integrates such hierarchical geometry structures from super-surfaces, surfaces, edges, to points in a top-down manner for learning point cloud representations. |
Ting Yao; Yehao Li; Yingwei Pan; Tao Mei; |
1861 | PointVector: A Vector Representation in Point Cloud Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, standard MLPs are limited in their ability to extract local features effectively. To address this limitation, we propose a Vector-oriented Point Set Abstraction that can aggregate neighboring features through higher-dimensional vectors. |
Xin Deng; WenYu Zhang; Qing Ding; XinMing Zhang; |
1862 | BAEFormer: Bi-Directional and Early Interaction Transformers for Bird’s Eye View Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing Transformer-based methods confront difficulties in transforming Perspective View (PV) to BEV due to their unidirectional and posterior interaction mechanisms. To address this issue, we propose a novel Bi-directional and Early Interaction Transformers framework named BAEFormer, consisting of (i) an early-interaction PV-BEV pipeline and (ii) a bi-directional cross-attention mechanism. |
Cong Pan; Yonghao He; Junran Peng; Qian Zhang; Wei Sui; Zhaoxiang Zhang; |
1863 | Good Is Bad: Causality Inspired Cloth-Debiasing for Cloth-Changing Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we analyze the effect of clothing on the model inference and adopt a dual-branch model to simulate causal intervention. |
Zhengwei Yang; Meng Lin; Xian Zhong; Yu Wu; Zheng Wang; |
1864 | Use Your Head: Improving Long-Tail Video Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents an investigation into long-tail video recognition. |
Toby Perrett; Saptarshi Sinha; Tilo Burghardt; Majid Mirmehdi; Dima Damen; |
1865 | Revisiting The P3P Problem Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we algebraically formulate the problem as finding the intersection of two conics. |
Yaqing Ding; Jian Yang; Viktor Larsson; Carl Olsson; Kalle Åström; |
1866 | Generic-to-Specific Distillation of Masked Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose generic-to-specific distillation (G2SD), to tap the potential of small ViT models under the supervision of large models pre-trained by masked autoencoders. |
Wei Huang; Zhiliang Peng; Li Dong; Furu Wei; Jianbin Jiao; Qixiang Ye; |
1867 | PAniC-3D: Stylized Single-View 3D Reconstruction From Portraits of Anime Characters Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose PAniC-3D, a system to reconstruct stylized 3D character heads directly from illustrated (p)ortraits of (ani)me (c)haracters. |
Shuhong Chen; Kevin Zhang; Yichun Shi; Heng Wang; Yiheng Zhu; Guoxian Song; Sizhe An; Janus Kristjansson; Xiao Yang; Matthias Zwicker; |
1868 | Combining Implicit-Explicit View Correlation for Light Field Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel network called LF-IENet for light field semantic segmentation. |
Ruixuan Cong; Da Yang; Rongshan Chen; Sizhe Wang; Zhenglong Cui; Hao Sheng; |
1869 | TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe that these representations complement each other depending on the nature of the action. Based on this observation, we propose a student-teacher semi-supervised learning framework, TimeBalance, where we distill the knowledge from a temporally-invariant and a temporally-distinctive teacher. |
Ishan Rajendrakumar Dave; Mamshad Nayeem Rizve; Chen Chen; Mubarak Shah; |
1870 | RiDDLE: Reversible and Diversified De-Identification With Latent Encryptor Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work presents RiDDLE, short for Reversible and Diversified De-identification with Latent Encryptor, to protect the identity information of people from being misused. |
Dongze Li; Wei Wang; Kang Zhao; Jing Dong; Tieniu Tan; |
1871 | SunStage: Portrait Reconstruction and Relighting Using The Sun As A Light Stage Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present SunStage: a lightweight alternative to a light stage that captures comparable data using only a smartphone camera and the sun. |
Yifan Wang; Aleksander Holynski; Xiuming Zhang; Xuaner Zhang; |
1872 | Private Image Generation With Dual-Purpose Auxiliary Classifier Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Jointly considering these two views of utility, the standard and the reversed, could help the generation model better improve transferability between fake and real data. Therefore, we propose a novel private image generation method that incorporates a dual-purpose auxiliary classifier, which alternates between learning from real data and fake data, into the training of differentially private GANs. |
Chen Chen; Daochang Liu; Siqi Ma; Surya Nepal; Chang Xu; |
1873 | 3D-POP – An Automated Annotation Approach to Facilitate Markerless 2D-3D Tracking of Freely Moving Birds With Marker-Based Motion Capture Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we propose a method that uses a motion capture (mo-cap) system to obtain a large amount of annotated data on animal movement and posture (2D and 3D) in a semi-automatic manner. |
Hemal Naik; Alex Hoi Hang Chan; Junran Yang; Mathilde Delacoux; Iain D. Couzin; Fumihiro Kano; Máté Nagy; |
1874 | SOOD: Towards Semi-Supervised Oriented Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a novel Semi-supervised Oriented Object Detection model, termed SOOD, built upon the mainstream pseudo-labeling framework. |
Wei Hua; Dingkang Liang; Jingyu Li; Xiaolong Liu; Zhikang Zou; Xiaoqing Ye; Xiang Bai; |
1875 | Unified Keypoint-Based Action Recognition Framework Via Structured Keypoint Pooling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A point cloud deep-learning paradigm is introduced to action recognition, and a unified framework along with a novel deep neural network architecture called Structured Keypoint Pooling is proposed. |
Ryo Hachiuma; Fumiaki Sato; Taiki Sekii; |
1876 | Multi-View Reconstruction Using Signed Ray Distance Functions (SRDF) Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate a new optimization framework for multi-view 3D shape reconstructions. |
Pierre Zins; Yuanlu Xu; Edmond Boyer; Stefanie Wuhrer; Tony Tung; |
1877 | Beyond MAP: Towards Better Evaluation of Instance Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We therefore cannot rely on AP to choose a model that provides an optimal tradeoff between false positives and high recall. To resolve this dilemma, we review alternative metrics in the literature and propose two new measures to explicitly measure the amount of both spatial and categorical duplicate predictions. We also propose a Semantic Sorting and NMS module to remove these duplicates based on a pixel occupancy matching scheme. |
Rohit Jena; Lukas Zhornyak; Nehal Doiphode; Pratik Chaudhari; Vivek Buch; James Gee; Jianbo Shi; |
1878 | Generating Aligned Pseudo-Supervision From Non-Aligned Data for Image Restoration in Under-Display Camera Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we revisit the classic stereo setup for training data collection — capturing two images of the same scene with one UDC and one standard camera. |
Ruicheng Feng; Chongyi Li; Huaijin Chen; Shuai Li; Jinwei Gu; Chen Change Loy; |
1879 | Improving Cross-Modal Retrieval With Set of Diverse Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel set-based embedding method, which is distinct from previous work in two aspects. |
Dongwon Kim; Namyup Kim; Suha Kwak; |
1880 | BASiS: Batch Aligned Spectral Embedding Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a different approach of directly learning the graph's eigenspace. |
Or Streicher; Ido Cohen; Guy Gilboa; |
1881 | Neural Pixel Composition for 3D-4D View Synthesis From Multi-Views Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Neural Pixel Composition (NPC), a novel approach for continuous 3D-4D view synthesis given only a discrete set of multi-view observations as input. |
Aayush Bansal; Michael Zollhöfer; |
1882 | DCFace: Synthetic Face Generation With Dual Condition Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we approach the problem from the aspect of combining subject appearance (ID) and external factor (style) conditions. |
Minchul Kim; Feng Liu; Anil Jain; Xiaoming Liu; |
1883 | CRAFT: Concept Recursive Activation FacTorization for Explainability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, recent work has shown that these methods have limited utility in practice, presumably because they only highlight the most salient parts of an image (i.e., "where" the model looked) and do not communicate any information about "what" the model saw at those locations. In this work, we try to fill in this gap with Craft — a novel approach to identify both "what" and "where" by generating concept-based explanations. |
Thomas Fel; Agustin Picard; Louis Béthune; Thibaut Boissin; David Vigouroux; Julien Colin; Rémi Cadène; Thomas Serre; |
1884 | Policy Adaptation From Foundation Model Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Policy Adaptation from Foundation model Feedback (PAFF). |
Yuying Ge; Annabella Macaluso; Li Erran Li; Ping Luo; Xiaolong Wang; |
1885 | Recognizing Rigid Patterns of Unlabeled Point Clouds By Complete and Continuous Isometry Invariants With No False Negatives and No False Positives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the first continuous and complete invariant of unlabeled clouds in any Euclidean space. |
Daniel Widdowson; Vitaliy Kurlin; |
1886 | N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Using the N-Gram context, we propose NGswin, an efficient SR network with SCDP bottleneck taking multi-scale outputs of the hierarchical encoder. |
Haram Choi; Jeongmin Lee; Jihoon Yang; |
1887 | Semi-DETR: Semi-Supervised Object Detection With Detection Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, we introduce a Cross-view Query Consistency method to learn the semantic feature invariance of object queries from different views while avoiding the need to find deterministic query correspondence. |
Jiacheng Zhang; Xiangru Lin; Wei Zhang; Kuo Wang; Xiao Tan; Junyu Han; Errui Ding; Jingdong Wang; Guanbin Li; |
1888 | Infinite Photorealistic Worlds Using Procedural Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Infinigen, a procedural generator of photorealistic 3D scenes of the natural world. |
Alexander Raistrick; Lahav Lipson; Zeyu Ma; Lingjie Mei; Mingzhe Wang; Yiming Zuo; Karhan Kayan; Hongyu Wen; Beining Han; Yihan Wang; Alejandro Newell; Hei Law; Ankit Goyal; Kaiyu Yang; Jia Deng; |
1889 | Diversity-Measurable Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, to better solve the tradeoff problem, we propose the Diversity-Measurable Anomaly Detection (DMAD) framework to enhance reconstruction diversity while avoiding undesired generalization on anomalies. |
Wenrui Liu; Hong Chang; Bingpeng Ma; Shiguang Shan; Xilin Chen; |
1890 | Hybrid Neural Rendering for Large-Scale Scenes With Motion Blur Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Accordingly, we propose strategies to simulate blur effects on the rendered images to mitigate the negative influence of blurry images and reduce their importance during training based on precomputed quality-aware weights. |
Peng Dai; Yinda Zhang; Xin Yu; Xiaoyang Lyu; Xiaojuan Qi; |
1891 | Perception-Oriented Single Image Super-Resolution Using Optimal Objective Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, in this paper, we propose a new SISR framework that applies optimal objectives for each region to generate plausible results in overall areas of high-resolution outputs. |
Seung Ho Park; Young Su Moon; Nam Ik Cho; |
1892 | GP-VTON: Towards General Purpose Virtual Try-On Via Collaborative Local-Flow Global-Parsing Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The above inferior performance hinders existing methods from real-world applications. To address these problems and take a step towards real-world virtual try-on, we propose a General-Purpose Virtual Try-ON framework, named GP-VTON, by developing an innovative Local-Flow Global-Parsing (LFGP) warping module and a Dynamic Gradient Truncation (DGT) training strategy. |
Zhenyu Xie; Zaiyu Huang; Xin Dong; Fuwei Zhao; Haoye Dong; Xijin Zhang; Feida Zhu; Xiaodan Liang; |
1893 | A Large-Scale Robustness Analysis of Video Action Recognition Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: There are several models based on convolutional neural network (CNN) and some recent transformer based approaches which provide top performance on existing benchmarks. In this work, we perform a large-scale robustness analysis of these existing models for video action recognition. |
Madeline Chantry Schiappa; Naman Biyani; Prudvi Kamtam; Shruti Vyas; Hamid Palangi; Vibhav Vineet; Yogesh S. Rawat; |
1894 | Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing methods either learn the combined state-object representation, challenging the generalization of unseen compositions, or design two classifiers to identify state and object separately from image features, ignoring the intrinsic relationship between them. To jointly eliminate the above issues and construct a more robust CZSL system, we propose a novel framework termed Decomposed Fusion with Soft Prompt (DFSP), by involving vision-language models (VLMs) for unseen composition recognition. |
Xiaocheng Lu; Song Guo; Ziming Liu; Jingcai Guo; |
1895 | Hierarchical Semantic Contrast for Scene-Aware Video Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a hierarchical semantic contrast (HSC) method to learn a scene-aware VAD model from normal videos. |
Shengyang Sun; Xiaojin Gong; |
1896 | All-in-Focus Imaging From Event Focal Stack Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, to restore an all-in-focus image, we propose the event focal stack which is defined as event streams captured during a continuous focal sweep. |
Hanyue Lou; Minggui Teng; Yixin Yang; Boxin Shi; |
1897 | Video Probabilistic Diffusion Models in Projected Latent Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent works on diffusion models have shown their potential to solve this challenge, yet they suffer from severe computation- and memory-inefficiency that limit the scalability. To handle this issue, we propose a novel generative model for videos, coined projected latent video diffusion models (PVDM), a probabilistic diffusion model which learns a video distribution in a low-dimensional latent space and thus can be efficiently trained with high-resolution videos under limited resources. |
Sihyun Yu; Kihyuk Sohn; Subin Kim; Jinwoo Shin; |
1898 | Learning 3D Scene Priors With 2D Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent works have shown advances in 3D scene estimation from various input modalities (e.g., images, 3D scans), by leveraging 3D supervision (e.g., 3D bounding boxes or CAD models), for which collection at scale is expensive and often intractable. To address this shortcoming, we propose a new method to learn 3D scene priors of layout and shape without requiring any 3D ground truth. |
Yinyu Nie; Angela Dai; Xiaoguang Han; Matthias Nießner; |
1899 | Blind Video Deflickering By Neural Filtering With A Flawed Atlas Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a general flicker removal framework that only receives a single flickering video as input without additional guidance. |
Chenyang Lei; Xuanchi Ren; Zhaoxiang Zhang; Qifeng Chen; |
1900 | Label-Free Liver Tumor Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We demonstrate that AI models can accurately segment liver tumors without the need for manual annotation by using synthetic tumors in CT scans. |
Qixin Hu; Yixiong Chen; Junfei Xiao; Shuwen Sun; Jieneng Chen; Alan L. Yuille; Zongwei Zhou; |
1901 | Grid-Guided Neural Radiance Fields for Large Urban Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a new framework that realizes high-fidelity rendering on large urban scenes while being computationally efficient. |
Linning Xu; Yuanbo Xiangli; Sida Peng; Xingang Pan; Nanxuan Zhao; Christian Theobalt; Bo Dai; Dahua Lin; |
1902 | Defining and Quantifying The Emergence of Sparse Concepts in DNNs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper aims to illustrate the concept-emerging phenomenon in a trained DNN. |
Jie Ren; Mingjie Li; Qirui Chen; Huiqi Deng; Quanshi Zhang; |
1903 | Uncurated Image-Text Datasets: Shedding Light on Demographic Bias Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our first contribution is to annotate part of the Google Conceptual Captions dataset, widely used for training vision-and-language models, with four demographic and two contextual attributes. |
Noa Garcia; Yusuke Hirota; Yankun Wu; Yuta Nakashima; |
1904 | FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence in this paper, we propose FreeSeg, a generic framework to accomplish Unified, Universal and Open-Vocabulary Image Segmentation. |
Jie Qin; Jie Wu; Pengxiang Yan; Ming Li; Ren Yuxi; Xuefeng Xiao; Yitong Wang; Rui Wang; Shilei Wen; Xin Pan; Xingang Wang; |
1905 | AVFormer: Injecting Vision Into Frozen Speech Models for Zero-Shot AV-ASR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present AVFormer, a simple method for augmenting audio-only models with visual information while performing lightweight domain adaptation. |
Paul Hongsuck Seo; Arsha Nagrani; Cordelia Schmid; |
1906 | FreeNeRF: Improving Few-Shot Neural Rendering With Free Frequency Regularization Highlight: In this paper, we present Frequency regularized NeRF (FreeNeRF), a surprisingly simple baseline that outperforms previous methods with minimal modifications to plain NeRF. |
Jiawei Yang; Marco Pavone; Yue Wang; |
1907 | Adversarial Robustness Via Random Projection Filters Highlight: Taking advantage of the properties of random projection, we propose to replace part of the convolutional filters with random projection filters, and theoretically explore the geometric representation preservation of the proposed synthesized filters via the Johnson-Lindenstrauss lemma. |
Minjing Dong; Chang Xu; |
1908 | VNE: An Effective Method for Improving Deep Representation By Manipulating Eigenvalue Distribution Highlight: However, manipulating such properties can be challenging in terms of implementational effectiveness and general applicability. To address these limitations, we propose to regularize von Neumann entropy (VNE) of representation. |
Jaeill Kim; Suhyun Kang; Duhun Hwang; Jungwook Shin; Wonjong Rhee; |
1909 | Self-Guided Diffusion Models Highlight: However, guidance requires a large amount of image-annotation pairs for training and is thus dependent on their availability and correctness. In this paper, we eliminate the need for such annotation by instead exploiting the flexibility of self-supervision signals to design a framework for self-guided diffusion models. |
Vincent Tao Hu; David W. Zhang; Yuki M. Asano; Gertjan J. Burghouts; Cees G. M. Snoek; |
1910 | NeuWigs: A Neural Dynamic Model for Volumetric Hair Capture and Animation Highlight: Both problems are highly challenging, because hair has complex geometry and appearance, and exhibits challenging motion. In this paper, we present a two-stage approach that models hair independently from the head to address these challenges in a data-driven manner. |
Ziyan Wang; Giljoo Nam; Tuur Stuyck; Stephen Lombardi; Chen Cao; Jason Saragih; Michael Zollhöfer; Jessica Hodgins; Christoph Lassner; |
1911 | CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data Highlight: On top of that, we propose a cross-modal contrastive objective to learn semantic and instance-level aligned point cloud representation. |
Yihan Zeng; Chenhan Jiang; Jiageng Mao; Jianhua Han; Chaoqiang Ye; Qingqiu Huang; Dit-Yan Yeung; Zhen Yang; Xiaodan Liang; Hang Xu; |
1912 | HNeRV: A Hybrid Neural Representation for Videos Highlight: In this paper, we propose a Hybrid Neural Representation for Videos (HNeRV), where learnable and content-adaptive embeddings act as decoder input. |
Hao Chen; Matthew Gwilliam; Ser-Nam Lim; Abhinav Shrivastava; |
1913 | Model-Agnostic Gender Debiased Image Captioning Highlight: Image captioning models are known to perpetuate and amplify harmful societal bias in the training set. In this work, we aim to mitigate such gender bias in image captioning models. |
Yusuke Hirota; Yuta Nakashima; Noa Garcia; |
1914 | Local Implicit Ray Function for Generalizable Radiance Field Representation Highlight: We propose LIRF (Local Implicit Ray Function), a generalizable neural rendering approach for novel view rendering. |
Xin Huang; Qi Zhang; Ying Feng; Xiaoyu Li; Xuan Wang; Qing Wang; |
1915 | One-Shot High-Fidelity Talking-Head Synthesis With Deformable Neural Radiance Field Highlight: In this paper, we propose HiDe-NeRF, which achieves high-fidelity and free-view talking-head synthesis. |
Weichuang Li; Longhao Zhang; Dong Wang; Bin Zhao; Zhigang Wang; Mulin Chen; Bang Zhang; Zhongjian Wang; Liefeng Bo; Xuelong Li; |
1916 | FitMe: Deep Photorealistic 3D Morphable Model Avatars Highlight: In this paper, we introduce FitMe, a facial reflectance model and a differentiable rendering optimization pipeline, that can be used to acquire high-fidelity renderable human avatars from single or multiple images. |
Alexandros Lattas; Stylianos Moschoglou; Stylianos Ploumpis; Baris Gecer; Jiankang Deng; Stefanos Zafeiriou; |
1917 | Dense Distinct Query for End-to-End Object Detection Highlight: This paper shows that the solution should be Dense Distinct Queries (DDQ). |
Shilong Zhang; Xinjiang Wang; Jiaqi Wang; Jiangmiao Pang; Chengqi Lyu; Wenwei Zhang; Ping Luo; Kai Chen; |
1918 | CLIPPO: Image-and-Language Understanding From Pixels Only Highlight: We explore an additional unification: the use of a pure pixel-based model to perform image, text, and multimodal tasks. |
Michael Tschannen; Basil Mustafa; Neil Houlsby; |
1919 | Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting Highlight: In this paper, we propose a novel Trajectory-Aware Body Interaction Transformer (TBIFormer) for multi-person pose forecasting via effectively modeling body part interactions. |
Xiaogang Peng; Siyuan Mao; Zizhao Wu; |
1920 | Conditional Image-to-Video Generation With Latent Flow Diffusion Models Highlight: In this paper, we propose an approach for cI2V using novel latent flow diffusion models (LFDM) that synthesize an optical flow sequence in the latent space based on the given condition to warp the given image. |
Haomiao Ni; Changhao Shi; Kai Li; Sharon X. Huang; Martin Renqiang Min; |
1921 | Virtual Sparse Convolution for Multimodal 3D Object Detection Highlight: This paper proposes a fast yet effective backbone, termed VirConvNet, based on a new operator VirConv (Virtual Sparse Convolution), for virtual-point-based 3D object detection. |
Hai Wu; Chenglu Wen; Shaoshuai Shi; Xin Li; Cheng Wang; |
1922 | DETR With Additional Global Aggregation for Cross-Domain Weakly Supervised Object Detection Highlight: This paper presents a DETR-based method for cross-domain weakly supervised object detection (CDWSOD), aiming at adapting the detector from source to target domain through weak supervision. |
Zongheng Tang; Yifan Sun; Si Liu; Yi Yang; |
1923 | Divide and Adapt: Active Domain Adaptation Via Customized Learning Highlight: We present Divide-and-Adapt (DiaNA), a new ADA framework that partitions the target instances into four categories with stratified transferable properties. |
Duojun Huang; Jichang Li; Weikai Chen; Junshi Huang; Zhenhua Chai; Guanbin Li; |
1924 | Towards Universal Fake Image Detectors That Generalize Across Generative Models Highlight: The real class becomes a ‘sink’ class holding anything that is not fake, including generated images from models not accessible during training. Building upon this discovery, we propose to perform real-vs-fake classification without learning; i.e., using a feature space not explicitly trained to distinguish real from fake images. |
Utkarsh Ojha; Yuheng Li; Yong Jae Lee; |
1925 | Towards Bridging The Performance Gaps of Joint Energy-Based Models Highlight: In this paper, we introduce a variety of training techniques to bridge the accuracy gap and the generation quality gap of JEM. |
Xiulong Yang; Qing Su; Shihao Ji; |
1926 | Learning Spatial-Temporal Implicit Neural Representations for Event-Guided Video Super-Resolution Highlight: To this end, we propose a novel framework that incorporates the spatial-temporal interpolation of events into VSR in a unified framework. |
Yunfan Lu; Zipeng Wang; Minjie Liu; Hongjian Wang; Lin Wang; |
1927 | Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation Highlight: In this paper, we propose a novel yet flexible dual-path UDA framework, DPPASS, taking ERP and tangent projection (TP) images as inputs. |
Xu Zheng; Jinjing Zhu; Yexin Liu; Zidong Cao; Chong Fu; Lin Wang; |
1928 | ExpOSE: Accurate Initialization-Free Projective Factorization Using Exponential Regularization Highlight: In this paper, we show that pOSE has an undesirable penalization of large depths. |
José Pedro Iglesias; Amanda Nilsson; Carl Olsson; |
1929 | OpenGait: Revisiting Gait Recognition Towards Better Practicality Highlight: More importantly, we also find that some conclusions drawn from indoor datasets cannot be generalized to real applications. Therefore, the primary goal of this paper is to present a comprehensive benchmark study for better practicality rather than only a particular model for better performance. |
Chao Fan; Junhao Liang; Chuanfu Shen; Saihui Hou; Yongzhen Huang; Shiqi Yu; |
1930 | ALTO: Alternating Latent Topologies for Implicit 3D Reconstruction Highlight: In this paper, we propose ALTO to sequentially alternate between geometric representations, before converging to an easy-to-decode latent. |
Zhen Wang; Shijie Zhou; Jeong Joon Park; Despoina Paschalidou; Suya You; Gordon Wetzstein; Leonidas Guibas; Achuta Kadambi; |
1931 | Learning Debiased Representations Via Conditional Attribute Interpolation Highlight: When a dataset is biased, i.e., most samples have attributes spuriously correlated with the target label, a Deep Neural Network (DNN) is prone to make predictions by the "unintended" attribute, especially if it is easier to learn. To improve the generalization ability when training on such a biased dataset, we propose a χ²-model to learn debiased representations. |
Yi-Kai Zhang; Qi-Wei Wang; De-Chuan Zhan; Han-Jia Ye; |
1932 | A Large-Scale Homography Benchmark Highlight: We present a large-scale dataset of Planes in 3D, Pi3D, of roughly 1,000 planes observed in 10,000 images from the 1DSfM dataset, and HEB, a large-scale homography estimation benchmark leveraging Pi3D. |
Daniel Barath; Dmytro Mishkin; Michal Polic; Wolfgang Förstner; Jiri Matas; |
1933 | Modeling Inter-Class and Intra-Class Constraints in Novel Class Discovery Highlight: To this end, in this paper, we propose to model both inter-class and intra-class constraints in NCD based on the symmetric Kullback-Leibler divergence (sKLD). |
Wenbin Li; Zhichen Fan; Jing Huo; Yang Gao; |
1934 | Weakly Supervised Video Emotion Detection and Prediction Via Cross-Modal Temporal Erasing Network Highlight: To tackle that, in this paper, we propose a cross-modal temporal erasing network that locates not only keyframes but also context and audio-related information in a weakly-supervised manner. |
Zhicheng Zhang; Lijuan Wang; Jufeng Yang; |
1935 | Multiple Instance Learning Via Iterative Self-Paced Supervised Contrastive Learning Highlight: Unfortunately, in real-world applications such as medical image classification, there is often class imbalance, so randomly-selected instances mostly belong to the same majority class, which precludes CSSL from learning inter-class differences. To address this issue, we propose a novel framework, Iterative Self-paced Supervised Contrastive Learning for MIL Representations (ItS2CLR), which improves the learned representation by exploiting instance-level pseudo labels derived from the bag-level labels. |
Kangning Liu; Weicheng Zhu; Yiqiu Shen; Sheng Liu; Narges Razavian; Krzysztof J. Geras; Carlos Fernandez-Granda; |
1936 | Consistent View Synthesis With Pose-Guided Diffusion Models Highlight: In this work, we propose a pose-guided diffusion model to generate a consistent long-term video of novel views from a single image. |
Hung-Yu Tseng; Qinbo Li; Changil Kim; Suhib Alsisan; Jia-Bin Huang; Johannes Kopf; |
1937 | MSMDFusion: Fusing LiDAR and Camera at Multiple Scales With Multi-Depth Seeds for 3D Object Detection Highlight: To this end, we propose a novel framework with better utilization of the depth information and fine-grained cross-modal interaction between LiDAR and camera, which consists of two important components. |
Yang Jiao; Zequn Jie; Shaoxiang Chen; Jingjing Chen; Lin Ma; Yu-Gang Jiang; |
1938 | Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline Highlight: To better adapt to real-life applications, in this paper we focus on the task of dense-localizing audio-visual events, which aims to jointly localize and recognize all audio-visual events occurring in an untrimmed video. |
Tiantian Geng; Teng Wang; Jinming Duan; Runmin Cong; Feng Zheng; |
1939 | Weak-Shot Object Detection Through Mutual Knowledge Transfer Highlight: Weak-shot Object Detection methods exploit a fully-annotated source dataset to facilitate the detection performance on the target dataset which only contains image-level labels for novel categories. To bridge the gap between these two datasets, we aim to transfer the object knowledge between the source (S) and target (T) datasets in a bi-directional manner. |
Xuanyi Du; Weitao Wan; Chong Sun; Chen Li; |
1940 | DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model Highlight: Here we propose DATID-3D, a domain adaptation method tailored for 3D generative models using text-to-image diffusion models that can synthesize diverse images per text prompt without collecting additional images and camera information for the target domain. |
Gwanghyun Kim; Se Young Chun; |
1941 | CrowdCLIP: Unsupervised Crowd Counting Via Vision-Language Model Highlight: To alleviate the problem, we propose a novel unsupervised framework for crowd counting, named CrowdCLIP. |
Dingkang Liang; Jiahao Xie; Zhikang Zou; Xiaoqing Ye; Wei Xu; Xiang Bai; |
1942 | Toward Stable, Interpretable, and Lightweight Hyperspectral Super-Resolution Highlight: In this paper, we develop a new coordination optimization framework for stable, interpretable, and lightweight HSI-SR. |
Weiying Xie; Kai Jiang; Yunsong Li; Jie Lei; Leyuan Fang; Wen-jin Guo; |
1943 | Masked Auto-Encoders Meet Generative Adversarial Networks and Beyond Highlight: In this paper, we introduce a novel Generative Adversarial Networks alike framework, referred to as GAN-MAE, where a generator is used to generate the masked patches according to the remaining visible patches, and a discriminator is employed to predict whether the patch is synthesized by the generator. |
Zhengcong Fei; Mingyuan Fan; Li Zhu; Junshi Huang; Xiaoming Wei; Xiaolin Wei; |
1944 | ICLIP: Bridging Image Classification and Contrastive Language-Image Pre-Training for Visual Recognition Highlight: This paper presents a method that effectively combines two prevalent visual recognition methods, i.e., image classification and contrastive language-image pre-training, dubbed iCLIP. |
Yixuan Wei; Yue Cao; Zheng Zhang; Houwen Peng; Zhuliang Yao; Zhenda Xie; Han Hu; Baining Guo; |
1945 | Learning Neural Volumetric Representations of Dynamic Humans in Minutes Highlight: In this paper, we propose a novel method for learning neural volumetric representations of dynamic humans in minutes with competitive visual quality. |
Chen Geng; Sida Peng; Zhen Xu; Hujun Bao; Xiaowei Zhou; |
1946 | Streaming Video Model Highlight: In contrast, we propose to unify video understanding tasks into one novel streaming video architecture, referred to as Streaming Vision Transformer (S-ViT). |
Yucheng Zhao; Chong Luo; Chuanxin Tang; Dongdong Chen; Noel Codella; Zheng-Jun Zha; |
1947 | CapDet: Unifying Dense Captioning and Open-World Detection Pretraining Highlight: To introduce a "real" open-world detector, in this paper, we propose a novel method named CapDet to either predict under a given category list or directly generate the category of predicted bounding boxes. |
Yanxin Long; Youpeng Wen; Jianhua Han; Hang Xu; Pengzhen Ren; Wei Zhang; Shen Zhao; Xiaodan Liang; |
1948 | Bayesian Posterior Approximation With Stochastic Ensembles Highlight: We introduce ensembles of stochastic neural networks to approximate the Bayesian posterior, combining stochastic methods such as dropout with deep ensembles. |
Oleksandr Balabanov; Bernhard Mehlig; Hampus Linander; |
1949 | RILS: Masked Visual Reconstruction in Language Semantic Space Highlight: In this work, we seek the synergy between two paradigms and study the emerging properties when MIM meets natural language supervision. |
Shusheng Yang; Yixiao Ge; Kun Yi; Dian Li; Ying Shan; Xiaohu Qie; Xinggang Wang; |
1950 | Decoupling Learning and Remembering: A Bilevel Memory Framework With Knowledge Projection for Task-Incremental Learning Highlight: To approach this issue, we reveal that the model learned by the working memory actually resides in a redundant high-dimensional space, and the knowledge incorporated in the model can have a quite compact representation under a group of pattern bases shared by all incremental learning tasks. Therefore, we propose a knowledge projection process to adaptively maintain the shared bases, with which the loosely organized model knowledge of the working memory is projected into the compact representation to be remembered in the long-term memory. |
Wenju Sun; Qingyong Li; Jing Zhang; Wen Wang; Yangli-ao Geng; |
1951 | R2Former: Unified Retrieval and Reranking Transformer for Place Recognition Highlight: In this paper, we propose a unified place recognition framework that handles both retrieval and reranking with a novel transformer model, named R2Former. |
Sijie Zhu; Linjie Yang; Chen Chen; Mubarak Shah; Xiaohui Shen; Heng Wang; |
1952 | RepMode: Learning to Re-Parameterize Diverse Experts for Subcellular Structure Prediction Highlight: In this paper, we model it as a deep learning task termed subcellular structure prediction (SSP), aiming to predict the 3D fluorescent images of multiple subcellular structures from a 3D transmitted-light image. |
Donghao Zhou; Chunbin Gu; Junde Xu; Furui Liu; Qiong Wang; Guangyong Chen; Pheng-Ann Heng; |
1953 | Symmetric Shape-Preserving Autoencoder for Unsupervised Real Scene Point Cloud Completion Highlight: Unsupervised completion of real scene objects is of vital importance but still remains extremely challenging in preserving input shapes, predicting accurate results, and adapting to multi-category data. To solve these problems, we propose in this paper an Unsupervised Symmetric Shape-Preserving Autoencoding Network, termed USSPA, to predict complete point clouds of objects from real scenes. |
Changfeng Ma; Yinuo Chen; Pengxiao Guo; Jie Guo; Chongjun Wang; Yanwen Guo; |
1954 | Modality-Agnostic Debiasing for Single Domain Generalization Highlight: In contrast, we target a versatile Modality-Agnostic Debiasing (MAD) framework for single-DG, that enables generalization for different modalities. |
Sanqing Qu; Yingwei Pan; Guang Chen; Ting Yao; Changjun Jiang; Tao Mei; |
1955 | Difficulty-Based Sampling for Debiased Contrastive Representation Learning Highlight: In this paper, we go beyond the statistical approach and explore the connection between hard negative samples and data bias. |
Taeuk Jang; Xiaoqian Wang; |
1956 | Masked Motion Encoding for Self-Supervised Video Representation Learning Highlight: However, simply masking and recovering appearance contents may not be sufficient to model temporal clues as the appearance contents can be easily reconstructed from a single frame. To overcome this limitation, we present Masked Motion Encoding (MME), a new pre-training paradigm that reconstructs both appearance and motion information to explore temporal clues. |
Xinyu Sun; Peihao Chen; Liangwei Chen; Changhao Li; Thomas H. Li; Mingkui Tan; Chuang Gan; |
1957 | CompletionFormer: Depth Completion With Convolutions and Vision Transformers Highlight: This paper proposes a joint convolutional attention and Transformer block (JCAT), which deeply couples the convolutional attention layer and Vision Transformer into one block, as the basic unit to construct our depth completion model in a pyramidal structure. |
Youmin Zhang; Xianda Guo; Matteo Poggi; Zheng Zhu; Guan Huang; Stefano Mattoccia; |
1958 | Comprehensive and Delicate: An Efficient Transformer for Image Restoration Highlight: In this paper, we propose a novel efficient image restoration Transformer that first captures the superpixel-wise global dependency, and then transfers it into each pixel. |
Haiyu Zhao; Yuanbiao Gou; Boyun Li; Dezhong Peng; Jiancheng Lv; Xi Peng; |
1959 | Zero-Shot Model Diagnosis Highlight: This paper argues the case that Zero-shot Model Diagnosis (ZOOM) is possible without the need for a test set or labeling. |
Jinqi Luo; Zhaoning Wang; Chen Henry Wu; Dong Huang; Fernando De la Torre; |
1960 | Improving Visual Grounding By Encouraging Consistent Gradient-Based Explanations Highlight: We propose a margin-based loss for tuning joint vision-language models so that their gradient-based explanations are consistent with region-level annotations provided by humans for relatively smaller grounding datasets. |
Ziyan Yang; Kushal Kafle; Franck Dernoncourt; Vicente Ordonez; |
1961 | Physically Realizable Natural-Looking Clothing Textures Evade Person Detectors Via 3D Modeling Highlight: In order to craft natural-looking adversarial clothes that can evade person detectors at multiple viewing angles, we propose adversarial camouflage textures (AdvCaT) that resemble one kind of typical texture of daily clothes, camouflage textures. |
Zhanhao Hu; Wenda Chu; Xiaopei Zhu; Hui Zhang; Bo Zhang; Xiaolin Hu; |
1962 | ShadowDiffusion: When Degradation Prior Meets Diffusion Model for Shadow Removal Highlight: However, their restored images still suffer from unsatisfactory boundary artifacts, due to the lack of degradation prior and the deficiency in modeling capacity. Our work addresses these issues by proposing a unified diffusion framework that integrates both the image and degradation priors for highly effective shadow removal. |
Lanqing Guo; Chong Wang; Wenhan Yang; Siyu Huang; Yufei Wang; Hanspeter Pfister; Bihan Wen; |
1963 | FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction Highlight: We present a large-scale facial UV-texture dataset that contains over 50,000 high-quality texture UV-maps with even illuminations, neutral expressions, and cleaned facial regions, which are desired characteristics for rendering realistic 3D face models under different lighting conditions. |
Haoran Bai; Di Kang; Haoxian Zhang; Jinshan Pan; Linchao Bao; |
1964 | Pruning Parameterization With Bi-Level Optimization for Efficient Semantic Segmentation on The Edge Highlight: In this paper, we aim to derive ViTs with fewer computations and fast inference speed to facilitate the dense prediction of semantic segmentation on edge devices. To achieve this, we propose a pruning parameterization method to formulate the pruning problem of semantic segmentation. |
Changdi Yang; Pu Zhao; Yanyu Li; Wei Niu; Jiexiong Guan; Hao Tang; Minghai Qin; Bin Ren; Xue Lin; Yanzhi Wang; |
1965 | Camouflaged Object Detection With Feature Decomposition and Edge Reconstruction Highlight: In this paper, we propose the FEature Decomposition and Edge Reconstruction (FEDER) model for COD. |
Chunming He; Kai Li; Yachao Zhang; Longxiang Tang; Yulun Zhang; Zhenhua Guo; Xiu Li; |
1966 | ALOFT: A Lightweight MLP-Like Architecture With Dynamic Low-Frequency Transform for Domain Generalization Highlight: Recently, several MLP-based methods have achieved promising results in supervised learning tasks by learning global interactions among different patches of the image. Inspired by this, in this paper, we first analyze the difference between CNN and MLP methods in DG and find that MLP methods exhibit a better generalization ability because they can better capture the global representations (e.g., structure) than CNN methods. Then, based on a recent lightweight MLP method, we obtain a strong baseline that outperforms most state-of-the-art CNN-based methods. |
Jintao Guo; Na Wang; Lei Qi; Yinghuan Shi; |
1967 | NLOST: Non-Line-of-Sight Imaging With Transformer Highlight: To boost the performance, we present NLOST, the first transformer-based neural network for NLOS reconstruction. |
Yue Li; Jiayong Peng; Juntian Ye; Yueyi Zhang; Feihu Xu; Zhiwei Xiong; |
1968 | Text-Visual Prompting for Efficient 2D Temporal Video Grounding Highlight: In this paper, we study the problem of temporal video grounding (TVG), which aims to predict the starting/ending time points of moments described by a text sentence within a long untrimmed video. |
Yimeng Zhang; Xin Chen; Jinghan Jia; Sijia Liu; Ke Ding; |
1969 | SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes Highlight: We introduce SurfelNeRF, a variant of neural radiance field which employs a flexible and scalable neural surfel representation to store geometric attributes and extracted appearance features from input images. |
Yiming Gao; Yan-Pei Cao; Ying Shan; |
1970 | Learning Visual Representations Via Language-Guided Sampling Highlight: Language allows us to abstract away visual variation to represent and communicate concepts. Building on this intuition, we propose an alternative approach to visual representation learning: using language similarity to sample semantically similar image pairs for contrastive learning. |
Mohamed El Banani; Karan Desai; Justin Johnson; |
1971 | Logical Implications for Visual Question Answering Consistency Highlight: Instead, we propose a novel strategy intended to improve model performance by directly reducing logical inconsistencies. |
Sergio Tascon-Morales; Pablo Márquez-Neila; Raphael Sznitman; |
1972 | NeUDF: Leaning Neural Unsigned Distance Fields With Volume Rendering Highlight: In this work, we introduce a new neural rendering framework, coded NeUDF, that can reconstruct surfaces with arbitrary topologies solely from multi-view supervision. |
Yu-Tao Liu; Li Wang; Jie Yang; Weikai Chen; Xiaoxu Meng; Bo Yang; Lin Gao; |
1973 | Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer Highlight: In this paper, we devise a novel Transformer model termed as Master specifically for style transfer. |
Hao Tang; Songhua Liu; Tianwei Lin; Shaoli Huang; Fu Li; Dongliang He; Xinchao Wang; |
1974 | Affordance Diffusion: Synthesizing Hand-Object Interactions Highlight: In contrast, in this work we focus on synthesizing complex interactions (i.e., an articulated hand) with a given object. |
Yufei Ye; Xueting Li; Abhinav Gupta; Shalini De Mello; Stan Birchfield; Jiaming Song; Shubham Tulsiani; Sifei Liu; |
1975 | NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction From Multi-View Images Highlight: We study the problem of reconstructing 3D feature curves of an object from a set of calibrated multi-view images. |
Yunfan Ye; Renjiao Yi; Zhirui Gao; Chenyang Zhu; Zhiping Cai; Kai Xu; |
1976 | Geometric Visual Similarity Learning in 3D Medical Image Self-Supervised Pre-Training Highlight: We propose a novel visual similarity learning paradigm, Geometric Visual Similarity Learning, which embeds the prior of topological invariance into the measurement of the inter-image similarity for consistent representation of semantic regions. |
Yuting He; Guanyu Yang; Rongjun Ge; Yang Chen; Jean-Louis Coatrieux; Boyu Wang; Shuo Li; |
1977 | Towards Artistic Image Aesthetics Assessment: A Large-Scale Dataset and A New Method Highlight: However, little light has been shed on measuring the aesthetic quality of artistic images, and the existing datasets only contain relatively few artworks. Such a defect is a great obstacle to the aesthetic assessment of artistic images. To fill the gap in the field of artistic image aesthetics assessment (AIAA), we first introduce a large-scale AIAA dataset: Boldbrush Artistic Image Dataset (BAID), which consists of 60,337 artistic images covering various art forms, with more than 360,000 votes from online users. |
Ran Yi; Haoyuan Tian; Zhihao Gu; Yu-Kun Lai; Paul L. Rosin; |
1978 | MM-3DScene: 3D Scene Understanding By Customizing Masked Modeling With Informative-Preserved Reconstruction and Self-Distilled Consistency Highlight: To this end, we propose a novel informative-preserved reconstruction, which explores local statistics to discover and preserve the representative structured points, effectively enhancing the pretext masking task for 3D scene understanding. |
Mingye Xu; Mutian Xu; Tong He; Wanli Ouyang; Yali Wang; Xiaoguang Han; Yu Qiao; |
1979 | Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation Highlight: In this paper, we present a new framework that takes text-to-image synthesis to the realm of image-to-image translation — given a guidance image and a target text prompt as input, our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text, while preserving the semantic layout of the guidance image. |
Narek Tumanyan; Michal Geyer; Shai Bagon; Tali Dekel; |
1980 | Inverting The Imaging Process By Learning An Implicit Camera Model Highlight: In contrast to existing implicit neural representations which focus on modelling the scene only, this paper proposes a novel implicit camera model which represents the physical imaging process of a camera as a deep neural network. |
Xin Huang; Qi Zhang; Ying Feng; Hongdong Li; Qing Wang; |
1981 | Fast Contextual Scene Graph Generation With Unbiased Context Augmentation Highlight: Accordingly, we propose a contextual scene graph generation (C-SGG) method without using visual information and introduce a context augmentation method. |
Tianlei Jin; Fangtai Guo; Qiwei Meng; Shiqiang Zhu; Xiangming Xi; Wen Wang; Zonghao Mu; Wei Song; |
1982 | Less Is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation Highlight: To effectively sub-sample our training data, we propose a new Spatio-Temporal Redundant Frame Downsampling (ST-RFD) method that leverages knowledge of sensor motion within the environment to extract a more diverse subset of training data frame samples. |
Li Li; Hubert P. H. Shum; Toby P. Breckon; |
1983 | Re-Thinking Federated Active Learning Based on Inter-Class Diversity Highlight: Based on our findings, we propose LoGo, a FAL sampling strategy robust to varying local heterogeneity levels and global imbalance ratio, that integrates both models by two steps of active selection scheme. |
SangMook Kim; Sangmin Bae; Hwanjun Song; Se-Young Yun; |
1984 | Enhanced Training of Query-Based Object Detection Via Selective Query Recollection Highlight: This paper investigates a phenomenon where query-based object detectors mispredict at the last decoding stage while predicting correctly at an intermediate stage. |
Fangyi Chen; Han Zhang; Kai Hu; Yu-Kai Huang; Chenchen Zhu; Marios Savvides; |
1985 | AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning With Masked Autoencoders Highlight: This paper proposes AdaMAE, an adaptive masking strategy for MAEs that is end-to-end trainable. |
Wele Gedara Chaminda Bandara; Naman Patel; Ali Gholami; Mehdi Nikkhah; Motilal Agrawal; Vishal M. Patel; |
1986 | Detecting Human-Object Contact in Images Highlight: However, there exists no robust method to detect contact between the body and the scene from an image, and there exists no dataset to learn such a detector. We fill this gap with HOT (Human-Object conTact), a new dataset of human-object contacts in images. |
Yixin Chen; Sai Kumar Dwivedi; Michael J. Black; Dimitrios Tzionas; |
1987 | PointClustering: Unsupervised Point Cloud Pre-Training Using Transformation Invariance in Clustering Highlight: In this paper, we present PointClustering, a new unsupervised representation learning scheme that leverages transformation invariance for point cloud pre-training. |
Fuchen Long; Ting Yao; Zhaofan Qiu; Lusong Li; Tao Mei; |
1988 | CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution Highlight: Such a local ensemble suffers from some limitations: i) it has no learnable parameters and it neglects the similarity of the visual features; ii) it has a limited receptive field and cannot ensemble relevant features in a large field which are important in an image. To address these issues, this paper proposes a continuous implicit attention-in-attention network, called CiaoSR. |
Jiezhang Cao; Qin Wang; Yongqin Xian; Yawei Li; Bingbing Ni; Zhiming Pi; Kai Zhang; Yulun Zhang; Radu Timofte; Luc Van Gool; |
1989 | Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning Highlight: In this paper, we take an initiative step to explore and propose a unified framework termed OOD Semantic Pruning (OSP), which aims at pruning OOD semantics out from the in-distribution (ID) features. |
Yu Wang; Pengchong Qiao; Chang Liu; Guoli Song; Xiawu Zheng; Jie Chen; |
1990 | The Best Defense Is A Good Offense: Adversarial Augmentation Against Adversarial Attacks Highlight: Many defenses against adversarial attacks (e.g. robust classifiers, randomization, or image purification) use countermeasures put to work only after the attack has been crafted. We adopt a different perspective to introduce A^5 (Adversarial Augmentation Against Adversarial Attacks), a novel framework including the first certified preemptive defense against adversarial attacks. |
Iuri Frosio; Jan Kautz; |
1991 | GaitGCI: Generative Counterfactual Intervention for Gait Recognition Highlight: However, prevailing methods are susceptible to confounders, resulting in the networks hardly focusing on the regions that reflect effective walking patterns. To address this fundamental problem in gait recognition, we propose a Generative Counterfactual Intervention framework, dubbed GaitGCI, consisting of Counterfactual Intervention Learning (CIL) and Diversity-Constrained Dynamic Convolution (DCDC). |
Huanzhang Dou; Pengyi Zhang; Wei Su; Yunlong Yu; Yining Lin; Xi Li; |
1992 | Constructing Deep Spiking Neural Networks From Artificial Neural Networks With Knowledge Distillation Highlight: Because they operate on discrete signals, typical SNNs cannot directly apply gradient-descent rules to parameter adjustment as artificial neural networks (ANNs) do. To address this limitation, we propose a novel method of constructing deep SNN models with knowledge distillation (KD) that uses an ANN as the teacher model and an SNN as the student model. |
Qi Xu; Yaxin Li; Jiangrong Shen; Jian K. Liu; Huajin Tang; Gang Pan; |
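In its generic form, the distillation objective mentioned above is a KL divergence between temperature-softened teacher and student output distributions. The sketch below shows that standard soft-target KD loss; it is a generic illustration, not the paper's exact SNN formulation, and the temperature value is an assumption:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing "dark knowledge"
    # in the teacher's non-argmax classes.
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-target distillation loss: KL(teacher || student) over
    temperature-softened distributions, scaled by T^2 as is conventional
    so gradients keep a comparable magnitude across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * temperature ** 2

loss_same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
loss_diff = distillation_loss([0.0, 2.0, -1.0], [2.0, 0.5, -1.0])
print(loss_same < loss_diff)  # matching the teacher gives lower loss
```

In practice this term is combined with the ordinary cross-entropy loss on ground-truth labels when training the student.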
1993 | Understanding and Improving Visual Prompting: A Label-Mapping Perspective Highlight: To optimize LM, we propose a new VP framework, termed ILM-VP (iterative label mapping-based visual prompting), which automatically re-maps the source labels to the target labels and progressively improves the target task accuracy of VP. |
Aochuan Chen; Yuguang Yao; Pin-Yu Chen; Yihua Zhang; Sijia Liu; |
1994 | Directional Connectivity-Based Segmentation of Medical Images Highlight: In this work, we demonstrate that effective disentanglement of directional sub-space from the shared latent space can significantly enhance the feature representation in the connectivity-based network. |
Ziyun Yang; Sina Farsiu; |
1995 | Towards Flexible Multi-Modal Document Models Highlight: In this work, we attempt to build a holistic model that can jointly solve many different design tasks. |
Naoto Inoue; Kotaro Kikuchi; Edgar Simo-Serra; Mayu Otani; Kota Yamaguchi; |
1996 | DegAE: A New Pretraining Paradigm for Low-Level Vision Highlight: What is the core problem of pretraining in low-level vision? In this paper, we aim to answer these essential questions and establish a new pretraining scheme for low-level vision. |
Yihao Liu; Jingwen He; Jinjin Gu; Xiangtao Kong; Yu Qiao; Chao Dong; |
1997 | The Differentiable Lens: Compound Lens Search Over Glass Surfaces and Materials for Object Detection Highlight: In this work, we develop a differentiable spherical lens simulation model that accurately captures geometrical aberrations. |
Geoffroi Côté; Fahim Mannan; Simon Thibault; Jean-François Lalonde; Felix Heide; |
1998 | Adversarially Masking Synthetic To Mimic Real: Adaptive Noise Injection for Point Cloud Segmentation Adaptation Highlight: In this paper, we aim to mitigate the domain gap caused by target noise via learning to mask the source points during the adaptation procedure. |
Guangrui Li; Guoliang Kang; Xiaohan Wang; Yunchao Wei; Yi Yang; |
1999 | KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation Highlight: As knowledge provides crucial information which is complementary to visible content, in this paper, we propose a Knowledge Enhanced Reasoning Model (KERM) to leverage knowledge to improve agent navigation ability. |
Xiangyang Li; Zihan Wang; Jiahao Yang; Yaowei Wang; Shuqiang Jiang; |
2000 | LiDAR-in-the-Loop Hyperparameter Optimization Highlight: To investigate the optimization of LiDAR system parameters, we devise a realistic LiDAR simulation method that generates raw waveforms as input to a LiDAR DSP pipeline. |
Félix Goudreault; Dominik Scheuble; Mario Bijelic; Nicolas Robidoux; Felix Heide; |
2001 | Local 3D Editing Via 3D Distillation of CLIP Knowledge Highlight: To overcome the problems, we propose Local Editing NeRF (LENeRF), which only requires text inputs for fine-grained and localized manipulation. |
Junha Hyung; Sungwon Hwang; Daejin Kim; Hyunji Lee; Jaegul Choo; |
2002 | Abstract Visual Reasoning: An Algebraic Approach for Solving Raven’s Progressive Matrices Highlight: We introduce algebraic machine reasoning, a new reasoning framework that is well-suited for abstract reasoning. |
Jingyi Xu; Tushar Vaidya; Yufei Wu; Saket Chandra; Zhangsheng Lai; Kai Fong Ernest Chong; |
2003 | 3D-Aware Conditional Image Synthesis Highlight: We propose pix2pix3D, a 3D-aware conditional generative model for controllable photorealistic image synthesis. |
Kangle Deng; Gengshan Yang; Deva Ramanan; Jun-Yan Zhu; |
2004 | Understanding Deep Generative Models With Generalized Empirical Likelihoods Highlight: In this work, we demonstrate that generalized empirical likelihood (GEL) methods offer a family of diagnostic tools that can identify many deficiencies of deep generative models (DGMs). |
Suman Ravuri; Mélanie Rey; Shakir Mohamed; Marc Peter Deisenroth; |
2005 | ABCD: Arbitrary Bitwise Coefficient for De-Quantization Highlight: To this end, we propose an implicit neural function with a bit query to recover de-quantized images from arbitrarily quantized inputs. |
Woo Kyoung Han; Byeonghun Lee; Sang Hyun Park; Kyong Hwan Jin; |
2006 | Event-Based Blurry Frame Interpolation Under Blind Exposure Highlight: In this paper, we study the problem of blurry frame interpolation under blind exposure with the assistance of an event camera. |
Wenming Weng; Yueyi Zhang; Zhiwei Xiong; |
2007 | Human Body Shape Completion With Implicit Shape and Flow Learning Highlight: In this paper, we investigate how to complete human body shape models by combining shape and flow estimation given two consecutive depth images. |
Boyao Zhou; Di Meng; Jean-Sébastien Franco; Edmond Boyer; |
2008 | Spider GAN: Leveraging Friendly Neighbors To Accelerate GAN Training Highlight: In this paper, we propose a novel approach for training GANs with images as inputs, but without enforcing any pairwise constraints. |
Siddarth Asokan; Chandra Sekhar Seelamantula; |
2009 | CLIPPING: Distilling CLIP-Based Models With A Student Base for Video-Language Retrieval Highlight: In this paper, we propose a novel knowledge distillation method, named CLIPPING, where the plentiful knowledge of a large teacher model that has been fine-tuned for video-language tasks with the powerful pre-trained CLIP can be effectively transferred to a small student only at the fine-tuning stage. |
Renjing Pei; Jianzhuang Liu; Weimian Li; Bin Shao; Songcen Xu; Peng Dai; Juwei Lu; Youliang Yan; |
2010 | ScaleDet: A Scalable Multi-Dataset Object Detector Highlight: Multi-dataset training provides a viable solution for exploiting heterogeneous large-scale datasets without extra annotation cost. In this work, we propose a scalable multi-dataset detector (ScaleDet) that can scale up its generalization across datasets when increasing the number of training datasets. |
Yanbei Chen; Manchen Wang; Abhay Mittal; Zhenlin Xu; Paolo Favaro; Joseph Tighe; Davide Modolo; |
2011 | Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection Highlight: To this end, we propose a new MIL framework: Unbiased MIL (UMIL), to learn unbiased anomaly features that improve WSVAD. |
Hui Lv; Zhongqi Yue; Qianru Sun; Bin Luo; Zhen Cui; Hanwang Zhang; |
2012 | BEVHeight: A Robust Framework for Vision-Based Roadside 3D Object Detection Highlight: This is because these methods mainly focus on recovering the depth regarding the camera center, where the depth difference between the car and the ground quickly shrinks while the distance increases. In this paper, we propose a simple yet effective approach, dubbed BEVHeight, to address this issue. |
Lei Yang; Kaicheng Yu; Tao Tang; Jun Li; Kun Yuan; Li Wang; Xinyu Zhang; Peng Chen; |
2013 | Towards Unbiased Volume Rendering of Neural Implicit Surfaces With Geometry Priors Highlight: Although they build a bridge between volume rendering and Signed Distance Function (SDF), the accuracy is still limited. In this paper, we argue that this limited accuracy is due to the bias of their volume rendering strategies, especially when the viewing direction is close to tangent to the surface. |
Yongqiang Zhang; Zhipeng Hu; Haoqian Wu; Minda Zhao; Lincheng Li; Zhengxia Zou; Changjie Fan; |
2014 | Modular Memorability: Tiered Representations for Video Memorability Prediction Highlight: In this paper, we propose to explore how different key properties of images and videos affect their consolidation into memory. |
Théo Dumont; Juan Segundo Hevia; Camilo L. Fosco; |
2015 | Weakly-Supervised Domain Adaptive Semantic Segmentation With Prototypical Contrastive Learning Highlight: There has been a lot of effort in improving the performance of unsupervised domain adaptation for the semantic segmentation task; however, there is still a huge gap in performance when compared with supervised learning. In this work, we propose a common framework to use different weak labels, e.g. image, point and coarse labels from the target domain to reduce this performance gap. |
Anurag Das; Yongqin Xian; Dengxin Dai; Bernt Schiele; |
2016 | Language-Guided Music Recommendation for Video Via Prompt Analogies Highlight: We propose a method to recommend music for an input video while allowing a user to guide music selection with free-form natural language. |
Daniel McKee; Justin Salamon; Josef Sivic; Bryan Russell; |
2017 | Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization Highlight: In this work, to extend the potential in TAL networks, we propose a novel end-to-end method Re2TAL, which rewires pretrained video backbones for reversible TAL. |
Chen Zhao; Shuming Liu; Karttikeya Mangalam; Bernard Ghanem; |
2018 | Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation Highlight: We take inspiration from biologically plausible learning, where neuron responses are tuned based on a local synapse-change procedure and activated by competitive lateral inhibition rules. Based on these feed-forward learning rules, we design a soft Hebbian learning process which provides an unsupervised and effective mechanism for online adaptation. |
Yushun Tang; Ce Zhang; Heng Xu; Shuoshuo Chen; Jie Cheng; Luziwei Leng; Qinghai Guo; Zhihai He; |
2019 | NeRFLight: Fast and Light Neural Radiance Fields Using A Shared Feature Grid Highlight: In this paper, we extend the grid-based approach to achieve real-time view synthesis at more than 150 FPS using a lightweight model. |
Fernando Rivas-Manzaneque; Jorge Sierra-Acosta; Adrian Penate-Sanchez; Francesc Moreno-Noguer; Angela Ribeiro; |
2020 | MVImgNet: A Large-Scale Dataset of Multi-View Images Highlight: However, due to the laborious collection of real-world 3D data, there is yet no generic dataset serving as a counterpart of ImageNet in 3D vision, thus how such a dataset can impact the 3D community remains unexplored. To remedy this defect, we introduce MVImgNet, a large-scale dataset of multi-view images, which is highly convenient to gain by shooting videos of real-world objects in human daily life. |
Xianggang Yu; Mutian Xu; Yidan Zhang; Haolin Liu; Chongjie Ye; Yushuang Wu; Zizheng Yan; Chenming Zhu; Zhangyang Xiong; Tianyou Liang; Guanying Chen; Shuguang Cui; Xiaoguang Han; |
2021 | LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models Highlight: To this end, in this paper, we make the following 4 contributions: (1) To alleviate base class overfitting, we propose a novel Language-Aware Soft Prompting (LASP) learning method by means of a text-to-text cross-entropy loss that maximizes the probability of the learned prompts to be correctly classified with respect to pre-defined hand-crafted textual prompts. |
Adrian Bulat; Georgios Tzimiropoulos; |
2022 | Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization Highlight: In this paper, we analyse the generalization ability of binary classifiers for the task of deepfake detection. |
Shichao Dong; Jin Wang; Renhe Ji; Jiajun Liang; Haoqiang Fan; Zheng Ge; |
2023 | Learning Federated Visual Prompt in Null Space for MRI Reconstruction Highlight: In this paper, we propose a new algorithm, FedPR, to learn federated visual prompts in the null space of global prompt for MRI reconstruction. |
Chun-Mei Feng; Bangjun Li; Xinxing Xu; Yong Liu; Huazhu Fu; Wangmeng Zuo; |
2024 | A New Benchmark: On The Utility of Synthetic Data With Blender for Bare Supervised Learning and Downstream Domain Adaptation Highlight: Specifically, under the well-controlled, IID data setting enabled by 3D rendering, we systematically verify the typical, important learning insights, e.g., shortcut learning, and discover the new laws of various data regimes and network architectures in generalization. |
Hui Tang; Kui Jia; |
2025 | Data-Driven Feature Tracking for Event Cameras Highlight: Existing feature tracking methods for event cameras are either handcrafted or derived from first principles but require extensive parameter tuning, are sensitive to noise, and do not generalize to different scenarios due to unmodeled effects. To tackle these deficiencies, we introduce the first data-driven feature tracker for event cameras, which leverages low-latency events to track features detected in a grayscale frame. |
Nico Messikommer; Carter Fang; Mathias Gehrig; Davide Scaramuzza; |
2026 | Temporal Consistent 3D LiDAR Representation Learning for Semantic Perception in Autonomous Driving Highlight: This paper aims at taking an alternative path, proposing a self-supervised representation learning method for 3D LiDAR data. |
Lucas Nunes; Louis Wiesmann; Rodrigo Marcuzzi; Xieyuanli Chen; Jens Behley; Cyrill Stachniss; |
2027 | AutoAD: Movie Description in Context Highlight: The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form. |
Tengda Han; Max Bain; Arsha Nagrani; Gül Varol; Weidi Xie; Andrew Zisserman; |
2028 | DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation Highlight: To this end, in this paper, we turn attention to the emerging powerful Latent Diffusion Models, and model talking head generation as an audio-driven temporally coherent denoising process (DiffTalk). |
Shuai Shen; Wenliang Zhao; Zibin Meng; Wanhua Li; Zheng Zhu; Jie Zhou; Jiwen Lu; |
2029 | Autoregressive Visual Tracking Highlight: We present ARTrack, an autoregressive framework for visual object tracking. |
Xing Wei; Yifan Bai; Yongchao Zheng; Dahu Shi; Yihong Gong; |
2030 | SceneComposer: Any-Level Semantic Image Synthesis Highlight: We propose a new framework for conditional image synthesis from semantic layouts of any precision levels, ranging from pure text to a 2D semantic canvas with precise shapes. |
Yu Zeng; Zhe Lin; Jianming Zhang; Qing Liu; John Collomosse; Jason Kuen; Vishal M. Patel; |
2031 | Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning Highlight: We propose visual query tuning (VQT), a simple yet effective approach to aggregate intermediate features of Vision Transformers. |
Cheng-Hao Tu; Zheda Mai; Wei-Lun Chao; |
2032 | MaPLe: Multi-Modal Prompt Learning Highlight: In this work, we propose Multi-modal Prompt Learning (MaPLe) for both vision and language branches to improve alignment between the vision and language representations. |
Muhammad Uzair Khattak; Hanoona Rasheed; Muhammad Maaz; Salman Khan; Fahad Shahbaz Khan; |
2033 | Unsupervised Domain Adaption With Pixel-Level Discriminator for Image-Aware Layout Generation Highlight: There exists a domain gap between inpainted posters (source domain data) and clean product images (target domain data). Therefore, this paper combines unsupervised domain adaption techniques to design a GAN with a novel pixel-level discriminator (PD), called PDA-GAN, to generate graphic layouts according to image contents. |
Chenchen Xu; Min Zhou; Tiezheng Ge; Yuning Jiang; Weiwei Xu; |
2034 | Compressing Volumetric Radiance Fields to 1 MB Highlight: However, these methods typically require a tremendous storage overhead, costing up to hundreds of megabytes of disk space and runtime memory for a single scene. We address this issue in this paper by introducing a simple yet effective framework, called vector quantized radiance fields (VQRF), for compressing these volume-grid-based radiance fields. |
Lingzhi Li; Zhen Shen; Zhongshu Wang; Li Shen; Liefeng Bo; |
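Vector quantization, the compression tool named in the highlight above, replaces each per-voxel feature vector with the index of its nearest entry in a small shared codebook, so only the indices plus the codebook need to be stored. A minimal sketch with toy 2-D features and a hypothetical two-entry codebook (the real method also learns the codebook and treats important voxels specially):

```python
def vector_quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook
    entry under squared L2 distance. Storing small integer indices and
    one shared codebook, instead of a raw float vector per voxel, is
    the basic idea behind vector-quantized radiance fields."""
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(range(len(codebook)), key=lambda k: dist2(f, codebook[k]))
            for f in features]

codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, -0.2], [0.9, 1.1], [0.4, 0.6]]
print(vector_quantize(features, codebook))  # → [0, 1, 0]
```

Decompression is a simple codebook lookup per stored index, which is why the on-disk size drops to roughly `log2(len(codebook))` bits per voxel plus the fixed codebook cost.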
2035 | Real-Time 6K Image Rescaling With Rate-Distortion Optimization Highlight: Existing image rescaling methods do not optimize the LR image file size and recent flow-based rescaling methods are not real-time yet for HR image reconstruction (e.g., 6K). To address these two challenges, we propose a novel framework (HyperThumbnail) for real-time 6K rate-distortion-aware image rescaling. |
Chenyang Qi; Xin Yang; Ka Leong Cheng; Ying-Cong Chen; Qifeng Chen; |
2036 | Gated Stereo: Joint Depth Estimation From Gated and Wide-Baseline Active Stereo Cues Highlight: We propose Gated Stereo, a high-resolution and long-range depth estimation technique that operates on active gated stereo images. |
Stefanie Walz; Mario Bijelic; Andrea Ramazzina; Amanpreet Walia; Fahim Mannan; Felix Heide; |
2037 | Label Information Bottleneck for Label Enhancement Highlight: In this work, we focus on the challenging problem of Label Enhancement (LE), which aims to exactly recover label distributions from logical labels, and present a novel Label Information Bottleneck (LIB) method for LE. |
Haoyu Tang; Qinghai Zheng; Jihua Zhu; |
2038 | Multi-Modal Representation Learning With Text-Driven Soft Masks Highlight: We propose a visual-linguistic representation learning approach within a self-supervised learning framework by introducing a new operation, loss, and data augmentation strategy. |
Jaeyoo Park; Bohyung Han; |
2039 | Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention Highlight: In response, we pose a new task called ZeroGaze, a variant of zero-shot learning where gaze is predicted for never-before-searched objects, and we develop a novel model, Gazeformer, to solve the ZeroGaze problem. |
Sounak Mondal; Zhibo Yang; Seoyoung Ahn; Dimitris Samaras; Gregory Zelinsky; Minh Hoai; |
2040 | MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding Highlight: Thus, we propose MammalNet, a new large-scale animal behavior dataset with taxonomy-guided annotations of mammals and their common behaviors. |
Jun Chen; Ming Hu; Darren J. Coker; Michael L. Berumen; Blair Costelloe; Sara Beery; Anna Rohrbach; Mohamed Elhoseiny; |
2041 | Hand Avatar: Free-Pose Hand Animation and Rendering From Monocular Video Highlight: We present HandAvatar, a novel representation for hand animation and rendering, which can generate smoothly compositional geometry and self-occlusion-aware texture. |
Xingyu Chen; Baoyuan Wang; Heung-Yeung Shum; |
2042 | Rethinking The Correlation in Few-Shot Segmentation: A Buoys View Highlight: In this work, we rethink how to mitigate the false matches from the perspective of representative reference features (referred to as buoys), and propose a novel adaptive buoys correlation (ABC) network to rectify direct pairwise pixel-level correlation, including a buoys mining module and an adaptive correlation module. |
Yuan Wang; Rui Sun; Tianzhu Zhang; |
2043 | VindLU: A Recipe for Effective Video-and-Language Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, instead of proposing yet another new VidL model, this paper conducts a thorough empirical study demystifying the most important factors in the VidL model design. |
Feng Cheng; Xizi Wang; Jie Lei; David Crandall; Mohit Bansal; Gedas Bertasius; |
2044 | Scaling Language-Image Pre-Training Via Masking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Fast Language-Image Pre-training (FLIP), a simple and more efficient method for training CLIP. |
Yanghao Li; Haoqi Fan; Ronghang Hu; Christoph Feichtenhofer; Kaiming He; |
2045 | OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present OmniAvatar, a novel geometry-guided 3D head synthesis model trained from in-the-wild unstructured images that is capable of synthesizing diverse identity-preserved 3D heads with compelling dynamic details under full disentangled control over camera poses, facial expressions, head shapes, articulated neck and jaw poses. |
Hongyi Xu; Guoxian Song; Zihang Jiang; Jianfeng Zhang; Yichun Shi; Jing Liu; Wanchun Ma; Jiashi Feng; Linjie Luo; |
2046 | DiffRF: Rendering-Guided 3D Radiance Field Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce DiffRF, a novel approach for 3D radiance field synthesis based on denoising diffusion probabilistic models. |
Norman Müller; Yawar Siddiqui; Lorenzo Porzi; Samuel Rota Bulò; Peter Kontschieder; Matthias Nießner; |
2047 | DNF: Decouple and Feedback Network for Seeing in The Dark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The multi-stage methods propagate the information merely through the resulting image of each stage, neglecting the abundant features in the lossy image-level dataflow. In this paper, we probe a generalized solution to these bottlenecks and propose a Decouple aNd Feedback framework, abbreviated as DNF. |
Xin Jin; Ling-Hao Han; Zhen Li; Chun-Le Guo; Zhi Chai; Chongyi Li; |
2048 | SUDS: Scalable Urban Dynamic Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a step towards truly open-world reconstructions of dynamic cities, we introduce two key innovations: (a) we factorize the scene into three separate hash table data structures to efficiently encode static, dynamic, and far-field radiance fields, and (b) we make use of unlabeled target signals consisting of RGB images, sparse LiDAR, off-the-shelf self-supervised 2D descriptors, and most importantly, 2D optical flow. |
Haithem Turki; Jason Y. Zhang; Francesco Ferroni; Deva Ramanan; |
2049 | Deformable Mesh Transformer for 3D Human Mesh Recovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Deformable mesh transFormer (DeFormer), a novel vertex-based approach to monocular 3D human mesh recovery. |
Yusuke Yoshiyasu; |
2050 | Vita-CLIP: Video and Text Adaptive CLIP Via Multimodal Prompting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a multimodal prompt learning scheme that works to balance the supervised and zero-shot performance under a single unified training. |
Syed Talal Wasim; Muzammal Naseer; Salman Khan; Fahad Shahbaz Khan; Mubarak Shah; |
2051 | HS-Pose: Hybrid Scope Feature Extraction for Category-Level Object Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on the problem of category-level object pose estimation, which is challenging due to the large intra-category shape variation. |
Linfang Zheng; Chen Wang; Yinghan Sun; Esha Dasgupta; Hua Chen; Aleš Leonardis; Wei Zhang; Hyung Jin Chang; |
2052 | Cloud-Device Collaborative Adaptation to Continual Changing Environments in The Real-World Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To enable the device model to deal with changing environments, we propose a new learning paradigm of Cloud-Device Collaborative Continual Adaptation. |
Mingjie Pan; Rongyu Zhang; Zijian Ling; Yulu Gan; Lingran Zhao; Jiaming Liu; Shanghang Zhang; |
2053 | Parts2Words: Learning Joint Embedding of Point Clouds and Texts By Bidirectional Matching Between Parts and Words Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current methods mainly represent a 3D shape as multiple 2D rendered views, which obviously can not be understood well due to the structural ambiguity caused by self-occlusion in the limited number of views. To resolve this issue, we directly represent 3D shapes as point clouds, and propose to learn joint embedding of point clouds and texts by bidirectional matching between parts from shapes and words from texts. |
Chuan Tang; Xi Yang; Bojian Wu; Zhizhong Han; Yi Chang; |
2054 | Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the objective for acquiring segment-level scores during training is not consistent with the target for acquiring proposal-level scores during testing, leading to suboptimal results. To deal with this problem, we propose a novel Proposal-based Multiple Instance Learning (P-MIL) framework that directly classifies the candidate proposals in both the training and testing stages, which includes three key designs: 1) a surrounding contrastive feature extraction module to suppress the discriminative short proposals by considering the surrounding contrastive information, 2) a proposal completeness evaluation module to inhibit the low-quality proposals with the guidance of the completeness pseudo labels, and 3) an instance-level rank consistency loss to achieve robust detection by leveraging the complementarity of RGB and FLOW modalities. |
Huan Ren; Wenfei Yang; Tianzhu Zhang; Yongdong Zhang; |
2055 | LayoutDM: Transformer-Based Diffusion Model for Layout Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the recent success of diffusion models in generating high-quality images, this paper explores their potential for conditional layout generation and proposes Transformer-based Layout Diffusion Model (LayoutDM) by instantiating the conditional denoising diffusion probabilistic model (DDPM) with a purely transformer-based architecture. |
Shang Chai; Liansheng Zhuang; Fengying Yan; |
2056 | HandNeRF: Neural Radiance Fields for Animatable Interacting Hands Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands, enabling the rendering of photo-realistic images and videos for gesture animation from arbitrary views. |
Zhiyang Guo; Wengang Zhou; Min Wang; Li Li; Houqiang Li; |
2057 | ASPnet: Action Segmentation With Shared-Private Representation of Multiple Data Sources Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to improve multimodal representation learning for action segmentation, we propose to disentangle hidden features of a multi-stream segmentation model into modality-shared components, containing common information across data sources, and private components; we then use an attention bottleneck to capture long-range temporal dependencies in the data while preserving disentanglement in consecutive processing layers. |
Beatrice van Amsterdam; Abdolrahim Kadkhodamohammadi; Imanol Luengo; Danail Stoyanov; |
2058 | Seasoning Model Soups for Robustness to Adversarial and Natural Distribution Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we describe how to obtain adversarially-robust model soups (i.e., linear combinations of parameters) that smoothly trade-off robustness to different l_p-norm bounded adversaries. |
Francesco Croce; Sylvestre-Alvise Rebuffi; Evan Shelhamer; Sven Gowal; |
2059 | Introducing Competition To Boost The Transferability of Targeted Adversarial Examples Through Clean Feature Mixup Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To enhance the transferability of targeted adversarial examples, we propose introducing competition into the optimization process. |
Junyoung Byun; Myung-Joon Kwon; Seungju Cho; Yoonji Kim; Changick Kim; |
2060 | Ingredient-Oriented Multi-Degradation Learning for Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel perspective to delve into the degradation via an ingredients-oriented rather than previous task-oriented manner for scalable learning. |
Jinghao Zhang; Jie Huang; Mingde Yao; Zizheng Yang; Hu Yu; Man Zhou; Feng Zhao; |
2061 | How To Prevent The Continuous Damage of Noises To Model Training? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we put forward a Gradient Switching Strategy (GSS) to prevent the continuous damage of noise samples to the classifier. |
Xiaotian Yu; Yang Jiang; Tianqi Shi; Zunlei Feng; Yuexuan Wang; Mingli Song; Li Sun; |
2062 | A Whac-a-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Even methods explicitly designed to combat shortcuts struggle in a Whac-A-Mole dilemma. To tackle this challenge, we propose Last Layer Ensemble, a simple-yet-effective method to mitigate multiple shortcuts without Whac-A-Mole behavior. |
Zhiheng Li; Ivan Evtimov; Albert Gordo; Caner Hazirbas; Tal Hassner; Cristian Canton Ferrer; Chenliang Xu; Mark Ibrahim; |
2063 | Skinned Motion Retargeting With Residual Perception of Motion Semantics & Geometry Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel Residual RETargeting network (R2ET) structure, which relies on two neural modification modules, to adjust the source motions to fit the target skeletons and shapes progressively. |
Jiaxu Zhang; Junwu Weng; Di Kang; Fang Zhao; Shaoli Huang; Xuefei Zhe; Linchao Bao; Ying Shan; Jue Wang; Zhigang Tu; |
2064 | Weakly-Supervised Single-View Image Relighting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a learning-based approach to relight a single image of Lambertian and low-frequency specular objects. |
Renjiao Yi; Chenyang Zhu; Kai Xu; |
2065 | DualVector: Unsupervised Vector Font Synthesis With Dual-Part Representation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To push the quality of vector font synthesis to the next level, we propose a novel dual-part representation for vector glyphs, where each glyph is modeled as a collection of closed "positive" and "negative" path pairs. |
Ying-Tian Liu; Zhifei Zhang; Yuan-Chen Guo; Matthew Fisher; Zhaowen Wang; Song-Hai Zhang; |
2066 | Efficient Scale-Invariant Generator With Column-Row Entangled Pixel Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Column-Row Entangled Pixel Synthesis (CREPS), a new generative model that is both efficient and scale-equivariant without using any spatial convolutions or coarse-to-fine design. |
Thuan Hoang Nguyen; Thanh Van Le; Anh Tran; |
2067 | ReasonNet: End-to-End Driving With Temporal and Global Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present ReasonNet, a novel end-to-end driving framework that extensively exploits both temporal and global information of the driving scene. |
Hao Shao; Letian Wang; Ruobing Chen; Steven L. Waslander; Hongsheng Li; Yu Liu; |
2068 | Learning Situation Hyper-Graphs for Video Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an architecture for Video Question Answering (VQA) that enables answering questions related to video content by predicting situation hyper-graphs, coined Situation Hyper-Graph based Video Question Answering (SHG-VQA). |
Aisha Urooj; Hilde Kuehne; Bo Wu; Kim Chheu; Walid Bousselham; Chuang Gan; Niels Lobo; Mubarak Shah; |
2069 | H2ONet: Hand-Occlusion-and-Orientation-Aware Network for Real-Time 3D Hand Mesh Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Beyond the previous methods, we design H2ONet to fully exploit non-occluded information from multiple frames to boost the reconstruction quality. |
Hao Xu; Tianyu Wang; Xiao Tang; Chi-Wing Fu; |
2070 | Interventional Bag Multi-Instance Learning on Whole-Slide Pathological Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel scheme, Interventional Bag Multi-Instance Learning (IBMIL), to achieve deconfounded bag-level prediction. |
Tiancheng Lin; Zhimiao Yu; Hongyu Hu; Yi Xu; Chang-Wen Chen; |
2071 | GazeNeRF: 3D-Aware Gaze Redirection With Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose GazeNeRF, a 3D-aware method for the task of gaze redirection. |
Alessandro Ruzzi; Xiangwei Shi; Xi Wang; Gengyan Li; Shalini De Mello; Hyung Jin Chang; Xucong Zhang; Otmar Hilliges; |
2072 | How Can Objects Help Action Recognition? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate how we can use knowledge of objects to design better video models, namely to process fewer tokens and to improve recognition accuracy. |
Xingyi Zhou; Anurag Arnab; Chen Sun; Cordelia Schmid; |
2073 | Realistic Saliency Guided Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a realism loss for saliency-guided image enhancement to maintain high realism across varying image types, while attenuating distractors and amplifying objects of interest. |
S. Mahdi H. Miangoleh; Zoya Bylinskii; Eric Kee; Eli Shechtman; Yağiz Aksoy; |
2074 | SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present SLOPER4D, a novel scene-aware dataset collected in large urban environments to facilitate the research of global human pose estimation (GHPE) with human-scene interaction in the wild. |
Yudi Dai; Yitai Lin; Xiping Lin; Chenglu Wen; Lan Xu; Hongwei Yi; Siqi Shen; Yuexin Ma; Cheng Wang; |
2075 | SegLoc: Learning Segmentation-Based Representations for Privacy-Preserving Visual Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new localization framework, SegLoc, that leverages image segmentation to create robust, compact, and privacy-preserving scene representations, i.e., 3D maps. |
Maxime Pietrantoni; Martin Humenberger; Torsten Sattler; Gabriela Csurka; |
2076 | Efficient Hierarchical Entropy Model for Learned Point Cloud Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve the efficiency of the attention model, we propose a hierarchical attention structure that has a linear complexity to the context scale and maintains the global receptive field. |
Rui Song; Chunyang Fu; Shan Liu; Ge Li; |
2077 | RankMix: Data Augmentation for Weakly Supervised Learning of Classifying Whole Slide Images With Diverse Sizes and Imbalanced Categories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose, RankMix, a data augmentation method of mixing ranked features in a pair of WSIs. |
Yuan-Chih Chen; Chun-Shien Lu; |
2078 | ActMAD: Activation Matching To Align Distributions for Test-Time-Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Test-Time-Training (TTT) is an approach to cope with out-of-distribution (OOD) data by adapting a trained model to distribution shifts occurring at test-time. We propose to perform this adaptation via Activation Matching (ActMAD): We analyze activations of the model and align activation statistics of the OOD test data to those of the training data. |
Muhammad Jehanzeb Mirza; Pol Jané Soneira; Wei Lin; Mateusz Kozinski; Horst Possegger; Horst Bischof; |
2079 | DKM: Dense Kernelized Feature Matching for Geometry Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we consider the dense approach instead of the more common sparse paradigm, thus striving to find all correspondences. |
Johan Edstedt; Ioannis Athanasiadis; Mårten Wadenbäck; Michael Felsberg; |
2080 | Image Cropping With Spatial-Aware Feature and Rank Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the first issue, we propose spatial-aware feature to encode the spatial relationship between candidate crops and aesthetic elements, by feeding the concatenation of crop mask and selectively aggregated feature maps to a light-weighted encoder. |
Chao Wang; Li Niu; Bo Zhang; Liqing Zhang; |
2081 | SVGformer: Representation Learning for Continuous Vector Graphics Using Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a transformer-based representation learning model (SVGformer) that directly operates on continuous input values and manipulates the geometric information of SVG to encode outline details and long-distance dependencies. |
Defu Cao; Zhaowen Wang; Jose Echevarria; Yan Liu; |
2082 | Structured 3D Features for Reconstructing Controllable Avatars Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Structured 3D Features, a model based on a novel implicit 3D representation that pools pixel-aligned image features onto dense 3D points sampled from a parametric, statistical human mesh surface. |
Enric Corona; Mihai Zanfir; Thiemo Alldieck; Eduard Gabriel Bazavan; Andrei Zanfir; Cristian Sminchisescu; |
2083 | Mask-Guided Matting in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a simple yet effective learning framework based on two core insights: 1) learning a generalized matting model that can better understand the given mask guidance and 2) leveraging weak supervision datasets (e.g., instance segmentation dataset) to alleviate the limited diversity and scale of existing matting datasets. |
Kwanyong Park; Sanghyun Woo; Seoung Wug Oh; In So Kweon; Joon-Young Lee; |
2084 | Dynamic Conceptional Contrastive Learning for Generalized Category Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a Dynamic Conceptional Contrastive Learning (DCCL) framework, which can effectively improve clustering accuracy by alternately estimating underlying visual conceptions and learning conceptional representation. |
Nan Pu; Zhun Zhong; Nicu Sebe; |
2085 | Neumann Network With Recursive Kernels for Single Image Defocus Deblurring Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the strong correlation among defocus kernels of different sizes and the blob-type structure of defocus kernels, we propose a learnable recursive kernel representation (RKR) for defocus kernels that expresses a defocus kernel by a linear combination of recursive, separable and positive atom kernels, leading to a compact yet effective and physics-encoded parametrization of the spatially-varying defocus blurring process. |
Yuhui Quan; Zicong Wu; Hui Ji; |
2086 | Active Finetuning: Exploiting Annotation Budget in The Pretraining-Finetuning Paradigm Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel method called ActiveFT for active finetuning task to select a subset of data distributing similarly with the entire unlabeled pool and maintaining enough diversity by optimizing a parametric model in the continuous space. |
Yichen Xie; Han Lu; Junchi Yan; Xiaokang Yang; Masayoshi Tomizuka; Wei Zhan; |
2087 | Learning Attribute and Class-Specific Representation Duet for Fine-Grained Fashion Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing works often learn fine-grained fashion representations at the attribute-level without considering their relationships and inter-dependencies across different classes. In this work, we propose to learn an attribute and class specific fashion representation duet to better model such attribute relationships and inter-dependencies by leveraging prior knowledge about the taxonomy of fashion attributes and classes. |
Yang Jiao; Yan Gao; Jingjing Meng; Jin Shang; Yi Sun; |
2088 | Pixels, Regions, and Objects: Multiple Enhancement for Salient Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Accuracy and robustness need to be further improved, particularly in complex scenes with multiple objects and background clutter. To address this issue, we propose a novel approach called Multiple Enhancement Network (MENet) that adopts the boundary sensibility, content integrity, iterative refinement, and frequency decomposition mechanisms of HVS. |
Yi Wang; Ruili Wang; Xin Fan; Tianzhu Wang; Xiangjian He; |
2089 | Leveraging Temporal Context in Low Representational Power Regimes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the Event Transition Matrix (ETM), computed from action labels in an untrimmed video dataset, which captures the temporal context of a given action, operationalized as the likelihood that it was preceded or followed by each other action in the set. |
Camilo L. Fosco; SouYoung Jin; Emilie Josephs; Aude Oliva; |
2090 | Guided Recommendation for Model Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The prior knowledge on model capacity and dataset also can not be easily integrated into the existing criteria. To address these issues, we propose to convert model selection as a recommendation problem and to learn from the past training history. |
Hao Li; Charless Fowlkes; Hao Yang; Onkar Dabeer; Zhuowen Tu; Stefano Soatto; |
2091 | Masked Image Training for Generalizable Deep Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, deep models trained on Gaussian noise may perform poorly when tested on other noise distributions. To address this issue, we present a novel approach to enhance the generalization performance of denoising networks, known as masked training. |
Haoyu Chen; Jinjin Gu; Yihao Liu; Salma Abdel Magid; Chao Dong; Qiong Wang; Hanspeter Pfister; Lei Zhu; |
2092 | In-Hand 3D Object Scanning From An RGB Sequence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a method for in-hand 3D scanning of an unknown object with a monocular camera. |
Shreyas Hampali; Tomas Hodan; Luan Tran; Lingni Ma; Cem Keskin; Vincent Lepetit; |
2093 | Zero-Shot Referring Image Segmentation With Global-Local Context Features Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Collecting labelled datasets for this task, however, is notoriously costly and labor-intensive. To overcome this issue, we propose a simple yet effective zero-shot referring image segmentation method by leveraging the pre-trained cross-modal knowledge from CLIP. |
Seonghoon Yu; Paul Hongsuck Seo; Jeany Son; |
2094 | SketchXAI: A First Look at Explainability for Human Sketches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper, for the very first time, introduces human sketches to the landscape of XAI (Explainable Artificial Intelligence). |
Zhiyu Qu; Yulia Gryaditskaya; Ke Li; Kaiyue Pang; Tao Xiang; Yi-Zhe Song; |
2095 | Omni3D: A Large Benchmark and Model for 3D Object Detection in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a model, called Cube R-CNN, designed to generalize across camera and scene types with a unified approach. |
Garrick Brazil; Abhinav Kumar; Julian Straub; Nikhila Ravi; Justin Johnson; Georgia Gkioxari; |
2096 | OT-Filter: An Optimal Transport Filter for Learning With Noisy Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we revamp the sample selection from the perspective of optimal transport theory and propose a novel method, called the OT-Filter. |
Chuanwen Feng; Yilong Ren; Xike Xie; |
2097 | Rebalancing Batch Normalization for Exemplar-Based Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To that end, we develop a new update patch for BN, particularly tailored for the exemplar-based class-incremental learning (CIL). |
Sungmin Cha; Sungjun Cho; Dasol Hwang; Sunwon Hong; Moontae Lee; Taesup Moon; |
2098 | OmniVidar: Omnidirectional Depth Estimation From Multi-Fisheye Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Estimating depth from four large field of view (FoV) cameras has been a difficult and understudied problem. In this paper, we propose a novel and simple system that can convert this difficult problem into easier binocular depth estimation. |
Sheng Xie; Daochuan Wang; Yun-Hui Liu; |
2099 | RWSC-Fusion: Region-Wise Style-Controlled Fusion Network for The Prohibited X-Ray Security Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to solve the data insufficiency, we propose a Region-Wise Style-Controlled Fusion (RWSC-Fusion) network, which superimposes the prohibited items onto the normal X-ray security images, to synthesize the prohibited X-ray security images. |
Luwen Duan; Min Wu; Lijian Mao; Jun Yin; Jianping Xiong; Xi Li; |
2100 | Octree Guided Unoriented Surface Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a two-step approach, OG-INR, where we (1) construct a discrete octree and label what is inside and outside (2) optimize for a continuous and high-fidelity shape using an INR that is initially guided by the octree’s labelling. |
Chamin Hewa Koneputugodage; Yizhak Ben-Shabat; Stephen Gould; |
2101 | Rigidity-Aware Detection for 6D Object Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the general object detection methods they use are ill-suited to handle cluttered scenes, thus producing poor initialization to the subsequent pose network. To address this, we propose a rigidity-aware detection method exploiting the fact that, in 6D pose estimation, the target objects are rigid. |
Yang Hai; Rui Song; Jiaojiao Li; Mathieu Salzmann; Yinlin Hu; |
2102 | ToThePoint: Efficient Contrastive Learning of 3D Point Clouds Via Recycling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing models, for self-supervised learning of 3D point clouds, rely on a large number of data samples, and require significant amount of computational resources and training time. To address this issue, we propose a novel contrastive learning approach, referred to as ToThePoint. |
Xinglin Li; Jiajing Chen; Jinhui Ouyang; Hanhui Deng; Senem Velipasalar; Di Wu; |
2103 | Clover: Towards A Unified Video-Language Alignment and Fusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Though offering attractive generality, the resulted models have to compromise between efficiency and performance. They mostly adopt different architectures to deal with different downstream tasks. We find this is because the pair-wise training cannot well align and fuse features from different modalities. We then introduce Clover–a Correlated Video-Language pre-training method–towards a universal video-language model for solving multiple video understanding tasks with neither performance nor efficiency compromise. |
Jingjia Huang; Yinan Li; Jiashi Feng; Xinglong Wu; Xiaoshuai Sun; Rongrong Ji; |
2104 | Weakly Supervised Monocular 3D Object Detection Using Multi-View Projection and Direction Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This inconsistency between the training and inference makes it hard to utilize the large-scale feedback data and increases the data collection expenses. To bridge this gap, we propose a new weakly supervised monocular 3D object detection method, which can train the model with only 2D labels marked on images. |
Runzhou Tao; Wencheng Han; Zhongying Qiu; Cheng-Zhong Xu; Jianbing Shen; |
2105 | Self-Supervised Learning From Images With A Joint-Embedding Predictive Architecture Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. |
Mahmoud Assran; Quentin Duval; Ishan Misra; Piotr Bojanowski; Pascal Vincent; Michael Rabbat; Yann LeCun; Nicolas Ballas; |
2106 | EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, existing methods either extract the sentence-level features coupling all words or focus more on object names, which would lose the word-level information or neglect other attributes. To alleviate these issues, we present EDA that Explicitly Decouples the textual attributes in a sentence and conducts Dense Alignment between such fine-grained language and point cloud objects. |
Yanmin Wu; Xinhua Cheng; Renrui Zhang; Zesen Cheng; Jian Zhang; |
2107 | A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation From A Single RGB Image Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our key idea is to equip A2J with strong local-global aware ability to well capture interacting hands’ local fine details and global articulated clues among joints jointly. |
Changlong Jiang; Yang Xiao; Cunlin Wu; Mingyang Zhang; Jinghong Zheng; Zhiguo Cao; Joey Tianyi Zhou; |
2108 | The Treasure Beneath Multiple Annotations: An Uncertainty-Aware Edge Detector Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel uncertainty-aware edge detector (UAED), which employs uncertainty to investigate the subjectivity and ambiguity of diverse annotations. |
Caixia Zhou; Yaping Huang; Mengyang Pu; Qingji Guan; Li Huang; Haibin Ling; |
2109 | DP-NeRF: Deblurred Neural Radiance Field With Physical Scene Priors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, this paper proposes DP-NeRF, a novel clean NeRF framework for blurred images, which is constrained with two physical priors. |
Dogyoon Lee; Minhyeok Lee; Chajin Shin; Sangyoun Lee; |
2110 | MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose MixPHM, a redundancy-aware parameter-efficient tuning method that outperforms full finetuning in low-resource VQA. |
Jingjing Jiang; Nanning Zheng; |
2111 | Self-Supervised Blind Motion Deblurring With Deep Expectation Maximization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, this paper presents a dataset-free deep learning method for removing uniform and non-uniform blur effects from images of static scenes. |
Ji Li; Weixi Wang; Yuesong Nan; Hui Ji; |
2112 | DeAR: Debiasing Vision-Language Models With Additive Residuals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present DeAR (Debiasing with Additive Residuals), a novel debiasing method that learns additive residual image representations to offset the original representations, ensuring fair output representations. |
Ashish Seth; Mayur Hemani; Chirag Agarwal; |
2113 | E2PN: Efficient SE(3)-Equivariant Point Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a convolution structure for learning SE(3)-equivariant features from 3D point clouds. |
Minghan Zhu; Maani Ghaffari; William A. Clark; Huei Peng; |
2114 | Understanding Masked Image Modeling Via Learning Occlusion Invariant Feature Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new viewpoint: MIM implicitly learns occlusion-invariant features, which is analogous to other siamese methods while the latter learns other invariance. |
Xiangwen Kong; Xiangyu Zhang; |
2115 | Grounding Counterfactual Explanation of Image Classifiers to Textual Concept Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose counterfactual explanation with text-driven concepts (CounTEX), where the concepts are defined only from text by leveraging a pre-trained multi-modal joint embedding space without additional concept-annotated datasets. |
Siwon Kim; Jinoh Oh; Sungjin Lee; Seunghak Yu; Jaeyoung Do; Tara Taghavi; |
2116 | A Dynamic Multi-Scale Voxel Flow Network for Video Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For efficiency consideration, in this paper, we propose a Dynamic Multi-scale Voxel Flow Network (DMVFN) to achieve better video prediction performance at lower computational costs with only RGB images, than previous methods. |
Xiaotao Hu; Zhewei Huang; Ailin Huang; Jun Xu; Shuchang Zhou; |
2117 | UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird’s-Eye View Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a universal cross-modality knowledge distillation framework (UniDistill) to improve the performance of single-modality detectors. |
Shengchao Zhou; Weizhou Liu; Chen Hu; Shuchang Zhou; Chao Ma; |
2118 | SemiCVT: Semi-Supervised Convolutional Vision Transformer for Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we unify the current dominant Mean-Teacher approaches by reconciling intra-model and inter-model properties for semi-supervised segmentation to produce a novel algorithm, SemiCVT, that absorbs the quintessence of CNNs and Transformer in a comprehensive way. |
Huimin Huang; Shiao Xie; Lanfen Lin; Ruofeng Tong; Yen-Wei Chen; Yuexiang Li; Hong Wang; Yawen Huang; Yefeng Zheng; |
2119 | Fine-Tuned CLIP Models Are Efficient Video Learners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that a simple Video Fine-tuned CLIP (ViFi-CLIP) baseline is generally sufficient to bridge the domain gap from images to videos. |
Hanoona Rasheed; Muhammad Uzair Khattak; Muhammad Maaz; Salman Khan; Fahad Shahbaz Khan; |
2120 | Towards Open-World Segmentation of Parts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The largest dataset nowadays contains merely two hundred object categories, implying the difficulty to scale up part segmentation to an unconstrained setting. To address this, we propose to explore a seemingly simplified but empirically useful and scalable task, class-agnostic part segmentation. |
Tai-Yu Pan; Qing Liu; Wei-Lun Chao; Brian Price; |
2121 | Stitchable Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we present Stitchable Neural Networks (SN-Net), a novel scalable and efficient framework for model deployment. |
Zizheng Pan; Jianfei Cai; Bohan Zhuang; |
2122 | Collaborative Diffusion for Multi-Modal Face Generation and Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training. |
Ziqi Huang; Kelvin C.K. Chan; Yuming Jiang; Ziwei Liu; |
2123 | DejaVu: Conditional Regenerative Learning To Enhance Dense Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present DejaVu, a novel framework which leverages conditional image regeneration as additional supervision during training to improve deep networks for dense prediction tasks such as segmentation, depth estimation, and surface normal prediction. |
Shubhankar Borse; Debasmit Das; Hyojin Park; Hong Cai; Risheek Garrepalli; Fatih Porikli; |
2124 | MACARONS: Mapping and Coverage Anticipation With RGB Online Self-Supervision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a method that simultaneously learns to explore new large environments and to reconstruct them in 3D from color images only. |
Antoine Guédon; Tom Monnier; Pascal Monasse; Vincent Lepetit; |
2125 | Audio-Visual Grouping Network for Sound Localization From Mixtures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite their promising performance, they can only handle a fixed number of sources, and they cannot learn compact class-aware representations for individual sources. To alleviate this shortcoming, in this paper, we propose a novel audio-visual grouping network, namely AVGN, that can directly learn category-wise semantic features for each source from the input audio mixture and frame to localize multiple sources simultaneously. |
Shentong Mo; Yapeng Tian; |
2126 | Fair Federated Medical Image Segmentation Via Client Contribution Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel method to optimize both types of fairness simultaneously. |
Meirui Jiang; Holger R. Roth; Wenqi Li; Dong Yang; Can Zhao; Vishwesh Nath; Daguang Xu; Qi Dou; Ziyue Xu; |
2127 | Dynamic Generative Targeted Attacks With Pattern Injection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on this analysis, we introduce a generative attack model composed of a cross-attention guided convolution module and a pattern injection module. |
Weiwei Feng; Nanqing Xu; Tianzhu Zhang; Yongdong Zhang; |
2128 | Tracking Multiple Deformable Objects in Egocentric Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose DETracker, a new MOT method that jointly detects and tracks deformable objects in egocentric videos. |
Mingzhen Huang; Xiaoxing Li; Jun Hu; Honghong Peng; Siwei Lyu; |
2129 | Visual Recognition By Request Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we establish a new paradigm named visual recognition by request (ViRReq) to bridge the gap. |
Chufeng Tang; Lingxi Xie; Xiaopeng Zhang; Xiaolin Hu; Qi Tian; |
2130 | SmartBrush: Text and Shape Guided Object Inpainting With Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new diffusion-based model named SmartBrush for completing a missing region with an object using both text and shape-guidance. |
Shaoan Xie; Zhifei Zhang; Zhe Lin; Tobias Hinz; Kun Zhang; |
2131 | REC-MV: REconstructing 3D Dynamic Cloth From Monocular Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a novel approach, called REC-MV to jointly optimize the explicit feature curves and the implicit signed distance field (SDF) of the garments. |
Lingteng Qiu; Guanying Chen; Jiapeng Zhou; Mutian Xu; Junle Wang; Xiaoguang Han; |
2132 | JRDB-Pose: A Large-Scale Dataset for Multi-Person Pose Estimation and Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce JRDB-Pose, a large-scale dataset and benchmark for multi-person pose estimation and tracking. |
Edward Vendrow; Duy Tho Le; Jianfei Cai; Hamid Rezatofighi; |
2133 | AsyFOD: An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study few-shot domain adaptive object detection (FSDAOD), where only a few target labeled images are available for training in addition to sufficient source labeled images. |
Yipeng Gao; Kun-Yu Lin; Junkai Yan; Yaowei Wang; Wei-Shi Zheng; |
2134 | RUST: Latent Neural Scene Representations From Unposed Imagery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Scene Representation Transformer (SRT) has shown promise in this direction, but scaling it to a larger set of diverse scenes is challenging and necessitates accurately posed ground truth data. To address this problem, we propose RUST (Really Unposed Scene representation Transformer), a pose-free approach to novel view synthesis trained on RGB images alone. |
Mehdi S. M. Sajjadi; Aravindh Mahendran; Thomas Kipf; Etienne Pot; Daniel Duckworth; Mario Lučić; Klaus Greff; |
2135 | PointCert: Point Cloud Classification With Deterministic Certified Robustness Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a general framework, namely PointCert, that can transform an arbitrary point cloud classifier to be certifiably robust against adversarial point clouds with deterministic guarantees. |
Jinghuai Zhang; Jinyuan Jia; Hongbin Liu; Neil Zhenqiang Gong; |
2136 | Open Set Action Recognition Via Multi-Label Evidential Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new method for open set action recognition and novelty detection via MUlti-Label Evidential learning (MULE), that goes beyond previous novel action detection methods by addressing the more general problems of single or multiple actors in the same scene, with simultaneous action(s) by any actor. |
Chen Zhao; Dawei Du; Anthony Hoogs; Christopher Funk; |
2137 | MAP: Multimodal Uncertainty-Aware Vision-Language Pre-Training Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we project the representations of all modalities as probabilistic distributions via a Probability Distribution Encoder (PDE) by utilizing sequence-level interactions. |
Yatai Ji; Junjie Wang; Yuan Gong; Lin Zhang; Yanru Zhu; Hongfa Wang; Jiaxing Zhang; Tetsuya Sakai; Yujiu Yang; |
2138 | DualRel: Semi-Supervised Mitochondria Segmentation From A Prototype Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we analyze the gap between mitochondrial images and natural images and rethink how to achieve effective semi-supervised mitochondria segmentation, from the perspective of reliable prototype-level supervision. |
Huayu Mai; Rui Sun; Tianzhu Zhang; Zhiwei Xiong; Feng Wu; |
2139 | Federated Learning With Data-Agnostic Distribution Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel data-agnostic distribution fusion based model aggregation method called FedFusion to optimize federated learning with non-IID local datasets, based on which the heterogeneous clients’ data distributions can be represented by a global distribution of several virtual fusion components with different parameters and weights. |
Jian-hui Duan; Wenzhong Li; Derun Zou; Ruichen Li; Sanglu Lu; |
2140 | Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Given the generated captions, a natural question arises: what benefits do they bring to text-video retrieval? To answer this, we introduce Cap4Video, a new framework that leverages captions in three ways: i) Input data: video-caption pairs can augment the training data. |
Wenhao Wu; Haipeng Luo; Bo Fang; Jingdong Wang; Wanli Ouyang; |
2141 | Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we deploy the dual semantic-visual transformer module (DSVTM) to progressively model the correspondences between attribute prototypes and visual features, constituting a progressive semantic-visual mutual adaption (PSVMA) network for semantic disambiguation and knowledge transferability improvement. |
Man Liu; Feng Li; Chunjie Zhang; Yunchao Wei; Huihui Bai; Yao Zhao; |
2142 | Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The existing up-sampling methods cannot effectively utilize the advantages of single-stage and progressive up-sampling strategies with conventional and/or recent up-samplers at the same time. To address these challenges, we propose a novel Gated Multi-Resolution Transfer Network (GMTNet) to reconstruct a spatially precise high-quality image from a burst of low-quality raw images. |
Nancy Mehta; Akshay Dudhane; Subrahmanyam Murala; Syed Waqas Zamir; Salman Khan; Fahad Shahbaz Khan; |
2143 | Improving Commonsense in Vision-Language Models Via Knowledge Graph Riddles Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Rather than collecting a new VL training dataset, we propose a more scalable strategy, i.e., "Data Augmentation with kNowledge graph linearization for CommonsensE capability" (DANCE). |
Shuquan Ye; Yujia Xie; Dongdong Chen; Yichong Xu; Lu Yuan; Chenguang Zhu; Jing Liao; |
2144 | S3C: Semi-Supervised VQA Natural Language Explanation Via Self-Critical Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new Semi-Supervised VQA-NLE via Self-Critical Learning (S3C), which evaluates the candidate explanations by answering rewards to improve the logical consistency between answers and rationales. |
Wei Suo; Mengyang Sun; Weisong Liu; Yiqi Gao; Peng Wang; Yanning Zhang; Qi Wu; |
2145 | Spatio-Focal Bidirectional Disparity Estimation From A Dual-Pixel Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a self-supervised learning method that learns bidirectional disparity by utilizing the nature of anisotropic blur kernels in dual-pixel photography. |
Donggun Kim; Hyeonjoong Jang; Inchul Kim; Min H. Kim; |
2146 | Block Selection Method for Using Feature Norm in Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we first reveal that the norm of the feature map obtained from a block other than the last block can be a better indicator of OOD detection. Motivated by this, we propose a simple framework consisting of FeatureNorm: a norm of the feature map and NormRatio: a ratio of FeatureNorm for ID and OOD to measure the OOD detection performance of each block. |
Yeonguk Yu; Sungho Shin; Seongju Lee; Changhyun Jun; Kyoobin Lee; |
2147 | PIDNet: A Real-Time Semantic Segmentation Network Inspired By PID Controllers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we make a connection between Convolutional Neural Networks (CNN) and Proportional-Integral-Derivative (PID) controllers and reveal that a two-branch network is equivalent to a Proportional-Integral (PI) controller, which inherently suffers from similar overshoot issues. |
Jiacong Xu; Zixiang Xiong; Shankar P. Bhattacharyya; |
2148 | Four-View Geometry With Unknown Radial Distortion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present novel solutions to previously unsolved problems of relative pose estimation from images whose calibration parameters, namely focal lengths and radial distortion, are unknown. |
Petr Hruby; Viktor Korotynskiy; Timothy Duff; Luke Oeding; Marc Pollefeys; Tomas Pajdla; Viktor Larsson; |
2149 | Rethinking Optical Flow From Geometric Matching Consistent Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a rethinking of previous optical flow estimation. |
Qiaole Dong; Chenjie Cao; Yanwei Fu; |
2150 | Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Deep reinforcement learning (DRL) gives the promise that an agent learns a good policy from high-dimensional information, whereas representation learning removes irrelevant and redundant information and retains pertinent information. In this work, we demonstrate that the learned representation of the Q-network and its target Q-network should, in theory, satisfy a favorable distinguishable representation property. |
Qiang He; Huangyuan Su; Jieyu Zhang; Xinwen Hou; |
2151 | PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fill the gap, this paper proposes PointDistiller, a structured knowledge distillation framework for point clouds-based 3D detection. |
Linfeng Zhang; Runpei Dong; Hung-Shuo Tai; Kaisheng Ma; |
2152 | Learning Optical Expansion From Scale Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous methods estimate optical expansion mainly from optical flow results, but this two-stage architecture makes their results limited by the accuracy of optical flow and less robust. To solve these problems, we propose the concept of 3D optical flow by integrating optical expansion into the 2D optical flow, which is implemented by a plug-and-play module, namely TPCV. |
Han Ling; Yinghui Sun; Quansen Sun; Zhenwen Ren; |
2153 | LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a simple yet effective self-supervised pretraining method for image harmonization which can leverage large-scale unannotated image datasets. |
Sheng Liu; Cong Phuoc Huynh; Cong Chen; Maxim Arap; Raffay Hamid; |
2154 | How To Prevent The Poor Performance Clients for Personalized Federated Learning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we embed our proposed PLGU strategy into two pFL schemes concluded in this paper: with/without a global model, and present the training procedures in detail. |
Zhe Qu; Xingyu Li; Xiao Han; Rui Duan; Chengchao Shen; Lixing Chen; |
2155 | TopDiG: Class-Agnostic Topological Directional Graph Extraction From Remote Sensing Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the vast majority of existing works concentrate on a specific target, are fragile to category variety, and hardly achieve stable performance across different categories. In this work, we propose an innovative class-agnostic model, namely TopDiG, to directly extract topological directional graphs from remote sensing images and solve these issues. |
Bingnan Yang; Mi Zhang; Zhan Zhang; Zhili Zhang; Xiangyun Hu; |
2156 | Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-per-Second Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Galactic, a large-scale simulation and reinforcement-learning (RL) framework for robotic mobile manipulation in indoor environments. |
Vincent-Pierre Berges; Andrew Szot; Devendra Singh Chaplot; Aaron Gokaslan; Roozbeh Mottaghi; Dhruv Batra; Eric Undersander; |
2157 | StyleIPSB: Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Face Swapping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: StyleIPSB gives us a novel tool for high-fidelity face swapping, and we propose a three-stage framework for face swapping with StyleIPSB. |
Diqiong Jiang; Dan Song; Ruofeng Tong; Min Tang; |
2158 | Unknown Sniffer for Object Detection: Don’t Turn A Blind Eye to Unknown Objects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the unknown sniffer (UnSniffer) to find both unknown and known objects. |
Wenteng Liang; Feng Xue; Yihao Liu; Guofeng Zhong; Anlong Ming; |
2159 | Discriminator-Cooperated Feature Map Distillation for GAN Compression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the irreplaceability of teacher discriminator and present an inventive discriminator-cooperated distillation, abbreviated as DCD, towards refining better feature maps from the generator. |
Tie Hu; Mingbao Lin; Lizhou You; Fei Chao; Rongrong Ji; |
2160 | Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a novel detection framework, named Learning on Gradients (LGrad), designed for identifying GAN-generated images, with the aim of constructing a generalized detector with cross-model and cross-data generalization. |
Chuangchuang Tan; Yao Zhao; Shikui Wei; Guanghua Gu; Yunchao Wei; |
2161 | Don’t Lie to Me! Robust and Efficient Explainability With Verified Perturbation Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce EVA (Explaining using Verified perturbation Analysis) — the first explainability method guaranteed to have an exhaustive exploration of a perturbation space. |
Thomas Fel; Melanie Ducoffe; David Vigouroux; Rémi Cadène; Mikaël Capelle; Claire Nicodème; Thomas Serre; |
2162 | StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus, inspired by vanilla adversarial learning, a novel model-agnostic meta Style Adversarial training (StyleAdv) method together with a novel style adversarial attack method is proposed for CD-FSL. |
Yuqian Fu; Yu Xie; Yanwei Fu; Yu-Gang Jiang; |
2163 | Multi-Concept Customization of Text-to-Image Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models. |
Nupur Kumari; Bingliang Zhang; Richard Zhang; Eli Shechtman; Jun-Yan Zhu; |
2164 | Defending Against Patch-Based Backdoor Attacks on Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It was shown that an adversary can poison a small part of the unlabeled data so that when a victim trains an SSL model on it, the final model will have a backdoor that the adversary can exploit. This work aims to defend self-supervised learning against such attacks. |
Ajinkya Tejankar; Maziar Sanjabi; Qifan Wang; Sinong Wang; Hamed Firooz; Hamed Pirsiavash; Liang Tan; |
2165 | Long-Tailed Visual Recognition Via Self-Heterogeneous Integration With Knowledge Excavation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel MoE-based method called Self-Heterogeneous Integration with Knowledge Excavation (SHIKE). |
Yan Jin; Mengke Li; Yang Lu; Yiu-ming Cheung; Hanzi Wang; |
2166 | GeoNet: Benchmarking Unsupervised Adaptation Across Geographies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the problem of geographic robustness and make three main contributions. First, we introduce a large-scale dataset GeoNet for geographic adaptation containing benchmarks across diverse tasks like scene recognition (GeoPlaces), image classification (GeoImNet) and universal adaptation (GeoUniDA). |
Tarun Kalluri; Wangdong Xu; Manmohan Chandraker; |
2167 | Context De-Confounded Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Concretely, the harmful bias is a confounder that misleads existing models to learn spurious correlations based on conventional likelihood estimation, significantly limiting the models’ performance. To tackle the issue, this paper provides a causality-based perspective to disentangle the models from the impact of such bias, and formulates the causalities among variables in the CAER task via a tailored causal graph. |
Dingkang Yang; Zhaoyu Chen; Yuzheng Wang; Shunli Wang; Mingcheng Li; Siao Liu; Xiao Zhao; Shuai Huang; Zhiyan Dong; Peng Zhai; Lihua Zhang; |
2168 | LinK: Linear Kernel for LiDAR-Based 3D Perception Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, to reduce the feature variations within a block, it only employs modest block sizes and fails to achieve larger kernels like 21x21x21. To address this issue, we propose a new method, called LinK, to achieve a wider-range perception receptive field in a convolution-like manner with two core designs. |
Tao Lu; Xiang Ding; Haisong Liu; Gangshan Wu; Limin Wang; |
2169 | CP3: Channel Pruning Plug-In for Point-Based Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose CP^3, a Channel Pruning Plug-in for Point-based networks. |
Yaomin Huang; Ning Liu; Zhengping Che; Zhiyuan Xu; Chaomin Shen; Yaxin Peng; Guixu Zhang; Xinmei Liu; Feifei Feng; Jian Tang; |
2170 | InstructPix2Pix: Learning To Follow Image Editing Instructions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. |
Tim Brooks; Aleksander Holynski; Alexei A. Efros; |
2171 | Learning Transformation-Predictive Representations for Detection and Description of Local Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such pseudo-labeled samples prevent deep neural networks from learning discriminative descriptions for accurate matching. To tackle this challenge, we propose to learn transformation-predictive representations with self-supervised contrastive learning. |
Zihao Wang; Chunxu Wu; Yifei Yang; Zhen Li; |
2172 | Two-Way Multi-Label Loss Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a multi-label loss by bridging a gap between the softmax loss and the multi-label scenario. |
Takumi Kobayashi; |
2173 | Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel one-shot talking head synthesis method that achieves disentangled and fine-grained control over lip motion, eye gaze&blink, head pose, and emotional expression. |
Duomin Wang; Yu Deng; Zixin Yin; Heung-Yeung Shum; Baoyuan Wang; |
2174 | Breaking The "Object" in Video Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we close the gap by collecting a new dataset for Video Object Segmentation under Transformations (VOST). |
Pavel Tokmakov; Jie Li; Adrien Gaidon; |
2175 | Where Is My Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, we focus on the challenging problem of egocentric visual query localization. |
Mengmeng Xu; Yanghao Li; Cheng-Yang Fu; Bernard Ghanem; Tao Xiang; Juan-Manuel Pérez-Rúa; |
2176 | Dionysus: Recovering Scene Structures By Dividing Into Semantic Pieces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper tackles the problem by proposing a novel learning-based 3D reconstruction framework named Dionysus. |
Likang Wang; Lei Chen; |
2177 | ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods focused on changing gaze directions of images that only include eyes or restricted ranges of faces with low resolution (less than 128*128) to largely reduce interference from other attributes such as hair, which limits application scenarios. To cope with this limitation, we propose a portable network, called ReDirTrans, achieving latent-to-latent translation for redirecting gaze directions and head orientations in an interpretable manner. |
Shiwei Jin; Zhen Wang; Lei Wang; Ning Bi; Truong Nguyen; |
2178 | Advancing Visual Grounding With Scene Knowledge: Benchmark and Method Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Therefore, in this paper, we propose a novel benchmark of Scene Knowledge-guided Visual Grounding (SK-VG), where the image content and referring expressions are not sufficient to ground the target objects, forcing the models to have a reasoning ability on the long-form scene knowledge. |
Yibing Song; Ruifei Zhang; Zhihong Chen; Xiang Wan; Guanbin Li; |
2179 | Noisy Correspondence Learning With Meta Similarity Correction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Training on such noisy correspondence datasets causes performance degradation because the cross-modal retrieval methods can wrongly enforce the mismatched data to be similar. To tackle this problem, we propose a Meta Similarity Correction Network (MSCN) to provide reliable similarity scores. |
Haochen Han; Kaiyao Miao; Qinghua Zheng; Minnan Luo; |
2180 | CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To better evaluate L-ZSON, we introduce the Pasture benchmark, which considers finding uncommon objects, objects described by spatial and appearance attributes, and hidden objects described relative to visible objects. |
Samir Yitzhak Gadre; Mitchell Wortsman; Gabriel Ilharco; Ludwig Schmidt; Shuran Song; |
2181 | CIGAR: Cross-Modality Graph Reasoning for Domain Adaptive Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, these methods build the graph solely based on the visual features and do not consider the linguistic knowledge carried by the semantic prototypes, e.g., dataset labels. To overcome these problems, we propose a cross-modality graph reasoning adaptation (CIGAR) method to take advantage of both visual and linguistic knowledge. |
Yabo Liu; Jinghua Wang; Chao Huang; Yaowei Wang; Yong Xu; |
2182 | Multiview Compressive Coding for 3D Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore single-view 3D reconstruction by learning generalizable representations inspired by advances in self-supervised learning. |
Chao-Yuan Wu; Justin Johnson; Jitendra Malik; Christoph Feichtenhofer; Georgia Gkioxari; |
2183 | HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a method that leverages graph neural networks, multi-level message passing, and unsupervised training to enable real-time prediction of realistic clothing dynamics. |
Artur Grigorev; Michael J. Black; Otmar Hilliges; |
2184 | HyperReel: High-Fidelity 6-DoF Video With Ray-Conditioned Sampling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, existing methods fail to simultaneously achieve real-time performance, small memory footprint, and high-quality rendering for challenging real-world scenes. To address these issues, we present HyperReel — a novel 6-DoF video representation. |
Benjamin Attal; Jia-Bin Huang; Christian Richardt; Michael Zollhöfer; Johannes Kopf; Matthew O’Toole; Changil Kim; |
2185 | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a simple approach which can turn a ViT encoder into an efficient video model, which can seamlessly work with both image and video inputs. |
AJ Piergiovanni; Weicheng Kuo; Anelia Angelova; |
2186 | Modeling Entities As Semantic Points for Visual Information Extraction in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the benchmarks used to assess these methods are relatively plain, i.e., scenarios with real-world complexity are not fully represented in these benchmarks. As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken from real applications, and difficulties such as blur, partial occlusion, and printing shift are quite common. |
Zhibo Yang; Rujiao Long; Pengfei Wang; Sibo Song; Humen Zhong; Wenqing Cheng; Xiang Bai; Cong Yao; |
2187 | MobileVOS: Real-Time Video Object Segmentation Contrastive Learning Meets Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we provide a theoretically grounded framework that unifies knowledge distillation with supervised contrastive representation learning. |
Roy Miles; Mehmet Kerim Yucel; Bruno Manganelli; Albert Saà-Garriga; |
2188 | PCR: Proxy-Based Contrastive Replay for Online Class-Incremental Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although these two replay manners are effective, the former would incline to new classes due to class imbalance issues, and the latter is unstable and hard to converge because of the limited number of samples. In this paper, we conduct a comprehensive analysis of these two replay manners and find that they can be complementary. |
Huiwei Lin; Baoquan Zhang; Shanshan Feng; Xutao Li; Yunming Ye; |
2189 | Pose Synchronization Under Multiple Pair-Wise Relative Poses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a three-step algorithm for pose synchronization under multiple relative pose inputs. |
Yifan Sun; Qixing Huang; |
2190 | Unsupervised Continual Semantic Adaptation Through Neural Rendering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study continual multi-scene adaptation for the task of semantic segmentation, assuming that no ground-truth labels are available during deployment and that performance on the previous scenes should be maintained. |
Zhizheng Liu; Francesco Milano; Jonas Frey; Roland Siegwart; Hermann Blum; Cesar Cadena; |
2191 | Controllable Light Diffusion for Portraits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce light diffusion, a novel method to improve lighting in portraits, softening harsh shadows and specular highlights while preserving overall scene illumination. |
David Futschik; Kelvin Ritland; James Vecore; Sean Fanello; Sergio Orts-Escolano; Brian Curless; Daniel Sýkora; Rohit Pandey; |
2192 | Token Boosting for Robust Self-Supervised Visual Transformer Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Pre-training VTs on such corrupted data can be challenging, especially when we pre-train via the masked autoencoding approach, where both the inputs and masked "ground truth" targets can potentially be unreliable in this case. To address this limitation, we introduce the Token Boosting Module (TBM) as a plug-and-play component for VTs that effectively allows the VT to learn to extract clean and robust features during masked autoencoding pre-training. |
Tianjiao Li; Lin Geng Foo; Ping Hu; Xindi Shang; Hossein Rahmani; Zehuan Yuan; Jun Liu; |
2193 | Multi-View Adversarial Discriminator: Mine The Non-Causal Factors for Object Detection in Unseen Domains Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This is mainly due to the single-view nature of DAL. In this work, we present an idea to remove non-causal factors from common features by multi-view adversarial training on source domains, because we observe that such insignificant non-causal factors may still be significant in other latent spaces (views) due to the multi-mode structure of data. |
Mingjun Xu; Lingyun Qin; Weijie Chen; Shiliang Pu; Lei Zhang; |
2194 | MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a contrastive learning method, called masked contrastive learning (MaskCon) to address the under-explored problem setting, where we learn with a coarse-labelled dataset in order to address a finer labelling problem. |
Chen Feng; Ioannis Patras; |
2195 | Boosting Low-Data Instance Segmentation By Unsupervised Pre-Training With Saliency Prompt Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the recent success of the Prompting technique, we introduce a new pre-training method that boosts QEIS models by giving Saliency Prompt for queries/kernels. |
Hao Li; Dingwen Zhang; Nian Liu; Lechao Cheng; Yalun Dai; Chao Zhang; Xinggang Wang; Junwei Han; |
2196 | Virtual Occlusions Through Implicit Depth Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we challenge the need for depth-regression as an intermediate step. |
Jamie Watson; Mohamed Sayed; Zawar Qureshi; Gabriel J. Brostow; Sara Vicente; Oisin Mac Aodha; Michael Firman; |
2197 | AGAIN: Adversarial Training With Attribution Span Enlargement and Hybrid Feature Fusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a generic method to boost the robust generalization of AT methods from the novel perspective of attribution span. |
Shenglin Yin; Kelu Yao; Sheng Shi; Yangzhou Du; Zhen Xiao; |
2198 | Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the SFDA setting for the task of adaptive object detection. |
Vibashan VS; Poojan Oza; Vishal M. Patel; |
2199 | Instant Multi-View Head Capture Through Learnable Registration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing methods for capturing datasets of 3D heads in dense semantic correspondence are slow and commonly address the problem in two separate steps; multi-view stereo (MVS) reconstruction followed by non-rigid registration. To simplify this process, we introduce TEMPEH (Towards Estimation of 3D Meshes from Performances of Expressive Heads) to directly infer 3D heads in dense correspondence from calibrated multi-view images. |
Timo Bolkart; Tianye Li; Michael J. Black; |
2200 | DiGA: Distil To Generalize and Then Adapt for Domain Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this popular approach still faces several challenges in each stage: for warm-up, the widely adopted adversarial training often results in limited performance gain, due to blind feature alignment; for self-training, finding proper categorical thresholds is very tricky. To alleviate these issues, we first propose to replace the adversarial training in the warm-up stage by a novel symmetric knowledge distillation module that only accesses the source domain data and makes the model domain generalizable. Surprisingly, this domain generalizable warm-up model brings substantial performance improvement, which can be further amplified via our proposed cross-domain mixture data augmentation technique. Then, for the self-training stage, we propose a threshold-free dynamic pseudo-label selection mechanism to ease the aforementioned threshold problem and make the model better adapted to the target domain. |
Fengyi Shen; Akhil Gurram; Ziyuan Liu; He Wang; Alois Knoll; |
2201 | DiffSwap: High-Fidelity and Controllable Face Swapping Via 3D-Aware Masked Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose DiffSwap, a diffusion model based framework for high-fidelity and controllable face swapping. |
Wenliang Zhao; Yongming Rao; Weikang Shi; Zuyan Liu; Jie Zhou; Jiwen Lu; |
2202 | GINA-3D: Learning To Generate Implicit Neural Assets in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce GINA-3D, a generative model that uses real-world driving data from camera and LiDAR sensors to create photo-realistic 3D implicit neural assets of diverse vehicles and pedestrians. |
Bokui Shen; Xinchen Yan; Charles R. Qi; Mahyar Najibi; Boyang Deng; Leonidas Guibas; Yin Zhou; Dragomir Anguelov; |
2203 | Consistent Direct Time-of-Flight Video Depth Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, limited by manufacturing capabilities in a compact module, the dToF data has low spatial resolution (e.g., 20×30 for iPhone dToF), and it requires a super-resolution step before being passed to downstream tasks. In this paper, we solve this super-resolution problem by fusing the low-resolution dToF data with the corresponding high-resolution RGB guidance. |
Zhanghao Sun; Wei Ye; Jinhui Xiong; Gyeongmin Choe; Jialiang Wang; Shuochen Su; Rakesh Ranjan; |
2204 | Crossing The Gap: Domain Generalization for Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where the data from the target domain is unseen in the learning process. |
Yuchen Ren; Zhendong Mao; Shancheng Fang; Yan Lu; Tong He; Hao Du; Yongdong Zhang; Wanli Ouyang; |
2205 | Probabilistic Prompt Learning for Dense Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel probabilistic prompt learning to fully exploit the vision-language knowledge in dense prediction tasks. |
Hyeongjun Kwon; Taeyong Song; Somi Jeong; Jin Kim; Jinhyun Jang; Kwanghoon Sohn; |
2206 | Learned Image Compression With Mixed Transformer-CNN Architectures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an efficient parallel Transformer-CNN Mixture (TCM) block with a controllable complexity to incorporate the local modeling ability of CNN and the non-local modeling ability of transformers to improve the overall architecture of image compression models. |
Jinming Liu; Heming Sun; Jiro Katto; |
2207 | Exploring Intra-Class Variation Factors With Learnable Cluster Prompts for Semi-Supervised Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Learnable Cluster Prompt-based GAN (LCP-GAN) to capture class-wise characteristics and intra-class variation factors with a broader source of supervision. |
Yunfei Zhang; Xiaoyang Huo; Tianyi Chen; Si Wu; Hau San Wong; |
2208 | NeAT: Learning Neural Implicit Surfaces With Arbitrary Topologies From Multi-View Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose NeAT, a new neural rendering framework that can learn implicit surfaces with arbitrary topologies from multi-view images. |
Xiaoxu Meng; Weikai Chen; Bo Yang; |
2209 | Quantum Multi-Model Fitting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recently, quantum optimization has been shown to enhance robust fitting for the case of a single model, while leaving the question of multi-model fitting open. In response to this challenge, this paper shows that the latter case can significantly benefit from quantum hardware and proposes the first quantum approach to multi-model fitting (MMF). |
Matteo Farina; Luca Magri; Willi Menapace; Elisa Ricci; Vladislav Golyanik; Federica Arrigoni; |
2210 | SPARF: Neural Radiance Fields From Sparse and Noisy Poses Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce Sparse Pose Adjusting Radiance Field (SPARF), to address the challenge of novel-view synthesis given only few wide-baseline input images (as low as 3) with noisy camera poses. |
Prune Truong; Marie-Julie Rakotosaona; Fabian Manhardt; Federico Tombari; |
2211 | ABLE-NeRF: Attention-Based Rendering With Learnable Embeddings for Neural Radiance Field Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present an alternative to the physics-based VR approach by introducing a self-attention-based framework on volumes along a ray. |
Zhe Jun Tang; Tat-Jen Cham; Haiyu Zhao; |
2212 | Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nonetheless, previous arbitrary-scale SR methods ignore the ill-posed problem and train the model with per-pixel L1 loss, leading to blurry SR outputs. In this work, we propose "Local Implicit Normalizing Flow" (LINF) as a unified solution to the above problems. |
Jie-En Yao; Li-Yuan Tsao; Yi-Chen Lo; Roy Tseng; Chia-Che Chang; Chun-Yi Lee; |
2213 | WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The focus of prior research in the field has been on training custom models for each quality inspection task, which requires task-specific images and annotation. In this paper, we move away from this regime, addressing zero-shot and few-normal-shot anomaly classification and segmentation. |
Jongheon Jeong; Yang Zou; Taewan Kim; Dongqing Zhang; Avinash Ravichandran; Onkar Dabeer; |
2214 | PermutoSDF: Fast Multi-View Reconstruction With Implicit Surfaces Using Permutohedral Lattices Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current SDF methods are overly smooth and miss fine geometric details. In this work, we combine the strengths of these two lines of work in a novel hash-based implicit surface representation. |
Radu Alexandru Rosu; Sven Behnke; |
2215 | TriDet: Temporal Action Detection With Relative Boundary Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a one-stage framework TriDet for temporal action detection. |
Dingfeng Shi; Yujie Zhong; Qiong Cao; Lin Ma; Jia Li; Dacheng Tao; |
2216 | Detection Hub: Unifying Object Detection Datasets Via Query Adaptation on Language Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: But a similar trend has not been witnessed in object detection when combining multiple datasets, due to two inconsistencies among detection datasets: taxonomy difference and domain gap. In this paper, we address these challenges by a new design (named Detection Hub) that is dataset-aware and category-aligned. |
Lingchen Meng; Xiyang Dai; Yinpeng Chen; Pengchuan Zhang; Dongdong Chen; Mengchen Liu; Jianfeng Wang; Zuxuan Wu; Lu Yuan; Yu-Gang Jiang; |
2217 | Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the challenging text-to-shape generation task, we present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model. |
Jiale Xu; Xintao Wang; Weihao Cheng; Yan-Pei Cao; Ying Shan; Xiaohu Qie; Shenghua Gao; |
2218 | Adversarial Normalization: I Can Visualize Everything (ICE) Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose ICE (Adversarial Normalization: I Can visualize Everything), a novel method that enables a model to directly predict a class for each patch in an image; thus, advancing the effective visualization of the explainability of a vision transformer. |
Hoyoung Choi; Seungwan Jin; Kyungsik Han; |
2219 | Reinforcement Learning-Based Black-Box Model Inversion Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the other hand, current black-box model inversion attacks that utilize GANs suffer from issues such as being unable to guarantee the completion of the attack process within a predetermined number of query accesses or achieve the same level of performance as white-box attacks. To overcome these limitations, we propose a reinforcement learning-based black-box model inversion attack. |
Gyojin Han; Jaehyun Choi; Haeil Lee; Junmo Kim; |
2220 | Learning A Deep Color Difference Metric for Photographic Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to learn a deep CD metric for photographic images with four desirable properties. |
Haoyu Chen; Zhihua Wang; Yang Yang; Qilin Sun; Kede Ma; |
2221 | 1000 FPS HDR Video With A Spike-RGB Hybrid Camera Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a hybrid camera system composed of a spiking and an alternating-exposure RGB camera to capture HFR&HDR scenes with high fidelity. |
Yakun Chang; Chu Zhou; Yuchen Hong; Liwen Hu; Chao Xu; Tiejun Huang; Boxin Shi; |
2222 | DINN360: Deformable Invertible Neural Network for Latitude-Aware 360° Image Rescaling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we first analyze two 360° image datasets and observe several findings that characterize how 360° images typically change along their latitudes. Inspired by these findings, we propose a novel deformable invertible neural network (INN), named DINN360, for latitude-aware 360° image rescaling. |
Yichen Guo; Mai Xu; Lai Jiang; Leonid Sigal; Yunjin Chen; |
2223 | Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models, or Zero Real 3D Pairs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper explores an alternative learning method by leveraging a lightweight and publicly available type of 3D data in the form of CAD models. |
Pattaramanee Arsomngern; Sarana Nutanong; Supasorn Suwajanakorn; |
2224 | Texts As Images in Prompt Tuning for Multi-Label Image Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we advocate that the effectiveness of image-text contrastive learning in aligning the two modalities (for training CLIP) further makes it feasible to treat texts as images for prompt tuning and introduce TaI prompting. |
Zixian Guo; Bowen Dong; Zhilong Ji; Jinfeng Bai; Yiwen Guo; Wangmeng Zuo; |
2225 | Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The learned network does not have the capability to characterize the prediction error, generate feedback information from the test sample, and correct the prediction error on the fly for each individual test sample, which results in degraded performance in generalization. In this work, we introduce a self-correctable and adaptable inference (SCAI) method to address the generalization challenge of network prediction and use human pose estimation as an example to demonstrate its effectiveness and performance. |
Zhehan Kan; Shuoshuo Chen; Ce Zhang; Yushun Tang; Zhihai He; |
2226 | Few-Shot Learning With Visual Distribution Calibration and Cross-Modal Distribution Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To deal with the distraction problem, we propose a Selective Attack module, which consists of trainable adapters that generate spatial attention maps of images to guide the attacks on class-irrelevant image areas. |
Runqi Wang; Hao Zheng; Xiaoyue Duan; Jianzhuang Liu; Yuning Lu; Tian Wang; Songcen Xu; Baochang Zhang; |
2227 | Referring Multi-Object Tracking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new and general referring understanding task, termed referring multi-object tracking (RMOT). |
Dongming Wu; Wencheng Han; Tiancai Wang; Xingping Dong; Xiangyu Zhang; Jianbing Shen; |
2228 | Finetune Like You Pretrain: Improved Finetuning of Zero-Shot Vision Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that a natural and simple approach of mimicking contrastive pretraining consistently outperforms alternative finetuning approaches. |
Sachin Goyal; Ananya Kumar; Sankalp Garg; Zico Kolter; Aditi Raghunathan; |
2229 | GradMA: A Gradient-Memory-Based Accelerated Federated Learning With Alleviated Catastrophic Forgetting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle the problems, we propose a new FL approach (namely GradMA), which takes inspiration from continual learning to simultaneously correct the server-side and worker-side update directions as well as take full advantage of server’s rich computing and memory resources. |
Kangyang Luo; Xiang Li; Yunshi Lan; Ming Gao; |
2230 | Weakly Supervised Temporal Sentence Grounding With Uncertainty-Guided Self-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to the complex temporal structure of videos, proposals distinct from the negative ones may correspond to several video segments but not necessarily the correct ground truth. To alleviate this problem, we propose an uncertainty-guided self-training technique to provide extra self-supervision signals to guide the weakly-supervised learning. |
Yifei Huang; Lijin Yang; Yoichi Sato; |
2231 | Hint-Aug: Drawing Hints From Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a framework called Hint-based Data Augmentation (Hint-Aug), which aims to boost FViT in few-shot tuning by augmenting the over-fitted parts of tuning samples with the learned features of pretrained FViTs. |
Zhongzhi Yu; Shang Wu; Yonggan Fu; Shunyao Zhang; Yingyan (Celine) Lin; |
2232 | A Strong Baseline for Generalized Few-Shot Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a generalized few-shot segmentation framework with a straightforward training process and an easy-to-optimize inference phase. |
Sina Hajimiri; Malik Boudiaf; Ismail Ben Ayed; Jose Dolz; |
2233 | AutoRecon: Automated 3D Object Discovery and Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework named AutoRecon for the automated discovery and reconstruction of an object from multi-view images. |
Yuang Wang; Xingyi He; Sida Peng; Haotong Lin; Hujun Bao; Xiaowei Zhou; |
2234 | POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a pure transformer architecture named POoling aTtention TransformER (POTTER) for the HMR task from single images. |
Ce Zheng; Xianpeng Liu; Guo-Jun Qi; Chen Chen; |
2235 | Learning A Practical SDR-to-HDRTV Up-Conversion Using New Dataset and Degradation Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Consequently, we propose new HDRTV dataset (dubbed HDRTV4K) and new HDR-to-SDR degradation models. |
Cheng Guo; Leidong Fan; Ziyu Xue; Xiuhua Jiang; |
2236 | Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis From Monocular Image Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a 3D-consistent novel view synthesis approach for monocular portrait images based on a recent proposed 3D-aware GAN, namely Generative Radiance Manifolds (GRAM), which has shown strong 3D consistency at multiview image generation of virtual subjects via the radiance manifolds representation. |
Yu Deng; Baoyuan Wang; Heung-Yeung Shum; |
2237 | Patch-Craft Self-Supervised Training for Correlated Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes a novel self-supervised training technique suitable for the removal of unknown correlated noise. |
Gregory Vaksman; Michael Elad; |
2238 | Learning To Fuse Monocular and Multi-View Cues for Multi-Frame Depth Estimation in Dynamic Scenes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel method to learn to fuse the multi-view and monocular cues encoded as volumes without needing the heuristically crafted masks. |
Rui Li; Dong Gong; Wei Yin; Hao Chen; Yu Zhu; Kaixuan Wang; Xiaozhi Chen; Jinqiu Sun; Yanning Zhang; |
2239 | DynaFed: Tackling Client Data Heterogeneity With Global Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we put forth an idea to collect and leverage global knowledge on the server without hindering data privacy. |
Renjie Pi; Weizhong Zhang; Yueqi Xie; Jiahui Gao; Xiaoyu Wang; Sunghun Kim; Qifeng Chen; |
2240 | Bias-Eliminating Augmentation Learning for Debiased Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the challenging task of debiased federated learning, we present a novel FL framework of Bias-Eliminating Augmentation Learning (FedBEAL), which learns to deploy Bias-Eliminating Augmenters (BEA) for producing client-specific bias-conflicting samples at each client. |
Yuan-Yi Xu; Ci-Siang Lin; Yu-Chiang Frank Wang; |
2241 | DistilPose: Tokenized Pose Regression With Heatmap Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel human pose estimation framework termed DistilPose, which bridges the gaps between heatmap-based and regression-based methods. |
Suhang Ye; Yingyi Zhang; Jie Hu; Liujuan Cao; Shengchuan Zhang; Lei Shen; Jun Wang; Shouhong Ding; Rongrong Ji; |
2242 | Understanding The Robustness of 3D Object Detection With Bird’s-Eye-View Representations in Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we evaluate the natural and adversarial robustness of various representative models under extensive settings, to fully understand their behaviors influenced by explicit BEV features compared with those without BEV. |
Zijian Zhu; Yichi Zhang; Hai Chen; Yinpeng Dong; Shu Zhao; Wenbo Ding; Jiachen Zhong; Shibao Zheng; |
2243 | Neural Volumetric Memory for Visual Locomotion Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we consider the difficult problem of locomotion on challenging terrains using a single forward-facing depth camera. |
Ruihan Yang; Ge Yang; Xiaolong Wang; |
2244 | CUF: Continuous Upsampling Filters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider one of the most important operations in image processing: upsampling. |
Cristina N. Vasconcelos; Cengiz Oztireli; Mark Matthews; Milad Hashemi; Kevin Swersky; Andrea Tagliasacchi; |
2245 | Generalist: Decoupling Natural and Robust Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the issue, we decouple the natural generalization and the robust generalization from joint training and formulate different training strategies for each one. Specifically, instead of minimizing a global loss on the expectation over these two generalization errors, we propose a bi-expert framework called Generalist where we simultaneously train base learners with task-aware strategies so that they can specialize in their own fields. |
Hongjun Wang; Yisen Wang; |
2246 | Propagate and Calibrate: Real-Time Passive Non-Line-of-Sight Tracking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, we propose a purely passive method to track a person walking in an invisible room by only observing a relay wall, which is more in line with real application scenarios, e.g., security. |
Yihao Wang; Zhigang Wang; Bin Zhao; Dong Wang; Mulin Chen; Xuelong Li; |
2247 | Learning Decorrelated Representations Efficiently Using Fast Fourier Transform Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a relaxed decorrelating regularizer that can be computed in O(n d log d) time by Fast Fourier Transform. |
Yutaro Shigeto; Masashi Shimbo; Yuya Yoshikawa; Akikazu Takeuchi; |
2248 | Quantitative Manipulation of Custom Attributes on 3D-Aware Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While 3D-based GAN techniques have been successfully applied to render photo-realistic 3D images with a variety of attributes while preserving view consistency, there has been little research on how to fine-control 3D images without being limited to a specific category of objects or their properties. To fill this research gap, we propose a novel image manipulation model of 3D-based GAN representations for a fine-grained control of specific custom attributes. |
Hoseok Do; EunKyung Yoo; Taehyeong Kim; Chul Lee; Jin Young Choi; |
2249 | Explicit Visual Prompting for Low-Level Structure Segmentations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We take inspiration from the widely-used pre-training and then prompt tuning protocols in NLP and propose a new visual prompting model, named Explicit Visual Prompting (EVP). |
Weihuang Liu; Xi Shen; Chi-Man Pun; Xiaodong Cun; |
2250 | HOTNAS: Hierarchical Optimal Transport for Neural Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing OT-based NAS methods either ignore the cell similarity or focus solely on searching for a single cell architecture. To address these issues, we propose a hierarchical optimal transport metric called HOTNN for measuring the similarity of different networks. |
Jiechao Yang; Yong Liu; Hongteng Xu; |
2251 | Two-Shot Video Object Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate the feasibility of training a satisfactory VOS model on sparsely annotated videos–we merely require two labeled frames per training video while the performance is sustained. |
Kun Yan; Xiao Li; Fangyun Wei; Jinglu Wang; Chenbin Zhang; Ping Wang; Yan Lu; |
2252 | Neural Fields Meet Explicit Geometric Representations for Inverse Rendering of Urban Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel inverse rendering framework for large urban scenes capable of jointly reconstructing the scene geometry, spatially-varying materials, and HDR lighting from a set of posed RGB images with optional depth. |
Zian Wang; Tianchang Shen; Jun Gao; Shengyu Huang; Jacob Munkberg; Jon Hasselgren; Zan Gojcic; Wenzheng Chen; Sanja Fidler; |
2253 | Practical Network Acceleration With Tiny Sets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous methods mainly adopt filter-level pruning to accelerate networks with scarce training samples. In this paper, we reveal that dropping blocks is a fundamentally superior approach in this scenario. |
Guo-Hua Wang; Jianxin Wu; |
2254 | NeRF-RPN: A General Framework for Object Detection in NeRFs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents the first significant object detection framework, NeRF-RPN, which directly operates on NeRF. |
Benran Hu; Junkai Huang; Yichen Liu; Yu-Wing Tai; Chi-Keung Tang; |
2255 | Cross-Image-Attention for Conditional Embeddings in Deep Metric Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we propose during training to condition the embedding of an image on the image we want to compare it to. |
Dmytro Kotovenko; Pingchuan Ma; Timo Milbich; Björn Ommer; |
2256 | Masked Wavelet Representation for Compact Neural Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present a method to reduce the size without compromising the advantages of having additional data structures. |
Daniel Rho; Byeonghyeon Lee; Seungtae Nam; Joo Chan Lee; Jong Hwan Ko; Eunbyung Park; |
2257 | PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we focus on point cloud and RGB image data, two modalities that are often presented together in the real world, and explore their meaningful interactions. |
Anthony Chen; Kevin Zhang; Renrui Zhang; Zihan Wang; Yuheng Lu; Yandong Guo; Shanghang Zhang; |
2258 | ObjectStitch: Object Compositing With Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To preserve the input object’s characteristics, we introduce a content adaptor that helps to maintain categorical semantics and object appearance. |
Yizhi Song; Zhifei Zhang; Zhe Lin; Scott Cohen; Brian Price; Jianming Zhang; Soo Ye Kim; Daniel Aliaga; |
2259 | High-Fidelity 3D GAN Inversion By Pseudo-Multi-View Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views while preserving specific details of the input image. |
Jiaxin Xie; Hao Ouyang; Jingtan Piao; Chenyang Lei; Qifeng Chen; |
2260 | Anchor3DLane: Learning To Regress 3D Anchors for Monocular 3D Lane Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we define 3D lane anchors in the 3D space and propose a BEV-free method named Anchor3DLane to predict 3D lanes directly from FV representations. |
Shaofei Huang; Zhenwei Shen; Zehao Huang; Zi-han Ding; Jiao Dai; Jizhong Han; Naiyan Wang; Si Liu; |
2261 | Class-Balancing Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Especially in tail classes, the generations largely lose diversity and we observe severe mode-collapse issues. To tackle this problem, we start from the hypothesis that the data distribution is not class-balanced, and propose Class-Balancing Diffusion Models (CBDM) that are trained with a distribution adjustment regularizer as a solution. |
Yiming Qin; Huangjie Zheng; Jiangchao Yao; Mingyuan Zhou; Ya Zhang; |
2262 | AstroNet: When Astrocyte Meets Artificial Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: Network structure learning aims to optimize network architectures and make them more efficient without compromising performance. In this paper, we first study the astrocytes, a … |
Mengqiao Han; Liyuan Pan; Xiabi Liu; |
2263 | Feature Alignment and Uniformity for Test Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For test time feature alignment, we propose a memorized spatial local clustering strategy to align the representations among the neighborhood samples for the upcoming batch. |
Shuai Wang; Daoan Zhang; Zipei Yan; Jianguo Zhang; Rui Li; |
2264 | Balanced Product of Calibrated Experts for Long-Tailed Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we take an analytical approach and extend the notion of logit adjustment to ensembles to form a Balanced Product of Experts (BalPoE). |
Emanuel Sanchez Aimar; Arvi Jonnarth; Michael Felsberg; Marco Kuhlmann; |
2265 | Single Image Backdoor Inversion Via Robust Smoothed Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that one can reliably recover the backdoor trigger with as few as a single image. |
Mingjie Sun; Zico Kolter; |
2266 | PanoSwin: A Pano-Style Swin Transformer for Panorama Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple yet effective architecture named PanoSwin to learn panorama representations with ERP. |
Zhixin Ling; Zhen Xing; Xiangdong Zhou; Manliang Cao; Guichun Zhou; |
2267 | Parameter Efficient Local Implicit Image Function Network for Face Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we make use of the structural consistency of the human face to propose a lightweight face-parsing method using a Local Implicit Function network, FP-LIIF. |
Mausoom Sarkar; Nikitha SR; Mayur Hemani; Rishabh Jain; Balaji Krishnamurthy; |
2268 | A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction From In-the-Wild Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we in this paper present a novel hierarchical representation network (HRN) to achieve accurate and detailed face reconstruction from a single image. |
Biwen Lei; Jianqiang Ren; Mengyang Feng; Miaomiao Cui; Xuansong Xie; |
2269 | PersonNeRF: Personalized Reconstruction From Photo Collections Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present PersonNeRF, a method that takes a collection of photos of a subject (e.g., Roger Federer) captured across multiple years with arbitrary body poses and appearances, and enables rendering the subject with arbitrary novel combinations of viewpoint, body pose, and appearance. |
Chung-Yi Weng; Pratul P. Srinivasan; Brian Curless; Ira Kemelmacher-Shlizerman; |
2270 | Enhanced Multimodal Representation Learning With Cross-Modal KD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The widely adopted mutual information maximization-based objective leads to a short-cut solution of the weak teacher, i.e., achieving the maximum mutual information by simply making the teacher model as weak as the student model. To prevent such a weak solution, we introduce an additional objective term, i.e., the mutual information between the teacher and the auxiliary modality model. |
Mengxi Chen; Linyu Xing; Yu Wang; Ya Zhang; |
2271 | Learning A Depth Covariance Function Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose learning a depth covariance function with applications to geometric vision tasks. |
Eric Dexheimer; Andrew J. Davison; |
2272 | Evading DeepFake Detectors Via Adversarial Statistical Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to explicitly minimize the statistical differences to evade state-of-the-art DeepFake detectors. |
Yang Hou; Qing Guo; Yihao Huang; Xiaofei Xie; Lei Ma; Jianjun Zhao; |
2273 | Referring Image Matting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Different from conventional image matting, which either requires user-defined scribbles/trimap to extract a specific foreground object or directly extracts all the foreground objects in the image indiscriminately, we introduce a new task named Referring Image Matting (RIM) in this paper. RIM aims to extract the meticulous alpha matte of the specific object that best matches the given natural language description, thus enabling a more natural and simpler instruction for image matting. |
Jizhizi Li; Jing Zhang; Dacheng Tao; |
2274 | V2V4Real: A Real-World Large-Scale Dataset for Vehicle-to-Vehicle Cooperative Perception Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To facilitate the development of cooperative perception, we present V2V4Real, the first large-scale real-world multi-modal dataset for V2V perception. |
Runsheng Xu; Xin Xia; Jinlong Li; Hanzhao Li; Shuo Zhang; Zhengzhong Tu; Zonglin Meng; Hao Xiang; Xiaoyu Dong; Rui Song; Hongkai Yu; Bolei Zhou; Jiaqi Ma; |
2275 | RMLVQA: A Margin Loss Approach for Visual Question Answering With Language Biases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address this through the second component, where instance-specific margins are learnt, allowing the model to distinguish between samples of varying complexity. We introduce a bias-injecting component to our model, and compute the instance-specific margins from the confidence of this component. |
Abhipsa Basu; Sravanti Addepalli; R. Venkatesh Babu; |
2276 | NeuralLift-360: Lifting An In-the-Wild 2D Photo to A 3D Object With 360° Views Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel framework, dubbed NeuralLift-360, that utilizes a depth-aware neural radiance representation (NeRF) and learns to craft the scene guided by denoising diffusion models. |
Dejia Xu; Yifan Jiang; Peihao Wang; Zhiwen Fan; Yi Wang; Zhangyang Wang; |
2277 | ViP3D: End-to-End Visual Trajectory Prediction Via 3D Agent Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose ViP3D, a query-based visual trajectory prediction pipeline that exploits rich information from raw videos to directly predict future trajectories of agents in a scene. |
Junru Gu; Chenxu Hu; Tianyuan Zhang; Xuanyao Chen; Yilun Wang; Yue Wang; Hang Zhao; |
2278 | Modality-Invariant Visual Odometry for Embodied Vision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a Transformer-based modality-invariant VO approach that can deal with diverse or changing sensor suites of navigation agents. |
Marius Memmel; Roman Bachmann; Amir Zamir; |
2279 | What You Can Reconstruct From A Shadow Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a method that uses the shadows cast by an unobserved object in order to infer the possible 3D volumes under occlusion. |
Ruoshi Liu; Sachit Menon; Chengzhi Mao; Dennis Park; Simon Stent; Carl Vondrick; |
2280 | Adaptive Sparse Convolutional Networks With Global Context Enhancement for Faster Object Detection on Drone Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address the issues above, we propose a novel global context-enhanced adaptive sparse convolutional network (CEASC). |
Bowei Du; Yecheng Huang; Jiaxin Chen; Di Huang; |
2281 | LidarGait: Benchmarking 3D Gait Recognition With Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead of extracting gait features from images, this work explores precise 3D gait features from point clouds and proposes a simple yet efficient 3D gait recognition framework, termed LidarGait. |
Chuanfu Shen; Chao Fan; Wei Wu; Rui Wang; George Q. Huang; Shiqi Yu; |
2282 | Command-Driven Articulated Object Understanding and Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Cart, a new approach towards articulated-object manipulations by human commands. |
Ruihang Chu; Zhengzhe Liu; Xiaoqing Ye; Xiao Tan; Xiaojuan Qi; Chi-Wing Fu; Jiaya Jia; |
2283 | D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors Via Agent-Based Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, achieving robust image matching remains challenging because CNN extracted descriptors usually lack discriminative ability in texture-less regions and keypoint detectors are only good at identifying keypoints with a specific level of structure. To deal with these issues, a novel image matching method is proposed by Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-based Transformers (D2Former), including a contextual feature descriptor learning (CFDL) module and a hierarchical keypoint detector learning (HKDL) module. |
Jianfeng He; Yuan Gao; Tianzhu Zhang; Zhe Zhang; Feng Wu; |
2284 | ConStruct-VL: Data-Free Continual Structured VL Concepts Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce the first Continual Data-Free Structured VL Concepts Learning (ConStruct-VL) benchmark and show it is challenging for many existing data-free CL strategies. |
James Seale Smith; Paola Cascante-Bonilla; Assaf Arbelle; Donghyun Kim; Rameswar Panda; David Cox; Diyi Yang; Zsolt Kira; Rogerio Feris; Leonid Karlinsky; |
2285 | Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Lite DETR, a simple yet efficient end-to-end object detection framework that can effectively reduce the GFLOPs of the detection head by 60% while keeping 99% of the original performance. |
Feng Li; Ailing Zeng; Shilong Liu; Hao Zhang; Hongyang Li; Lei Zhang; Lionel M. Ni; |
2286 | HelixSurf: A Robust and Efficient Neural Implicit Surface Learning of Indoor Scenes With Iterative Intertwined Regularization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Meanwhile, traditional multi-view stereo can recover the geometry of scenes with rich textures, by globally optimizing the local, pixel-wise correspondences across multiple views. We are thus motivated to make use of the complementary benefits from the two strategies, and propose a method termed Helix-shaped neural implicit Surface learning or HelixSurf; HelixSurf uses the intermediate prediction from one strategy as the guidance to regularize the learning of the other one, and conducts such intertwined regularization iteratively during the learning process. |
Zhihao Liang; Zhangjin Huang; Changxing Ding; Kui Jia; |
2287 | Joint Appearance and Motion Learning for Efficient Rolling Shutter Correction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a single-stage encoder-decoder-based network, named JAMNet, for efficient RSC. |
Bin Fan; Yuxin Mao; Yuchao Dai; Zhexiong Wan; Qi Liu; |
2288 | Towards A Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the lightweight student model lacks adequate representation capacity for effective knowledge imitation during the most critical early training period, causing final performance degradation. To tackle this issue, we propose a Capacity Dynamic Distillation framework, which constructs a student model with editable representation capacity. |
Yi Xie; Huaidong Zhang; Xuemiao Xu; Jianqing Zhu; Shengfeng He; |
2289 | Federated Incremental Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, new clients collecting novel classes may join in the global training of FSS, which further exacerbates catastrophic forgetting. To surmount the above challenges, we propose a Forgetting-Balanced Learning (FBL) model to address heterogeneous forgetting on old classes from both intra-client and inter-client aspects. |
Jiahua Dong; Duzhen Zhang; Yang Cong; Wei Cong; Henghui Ding; Dengxin Dai; |
2290 | 3D-Aware Facial Landmark Detection Via Multi-View Consistent Training on Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Fortunately, with the recent advances in generative visual models and neural rendering, we have witnessed rapid progress towards high quality 3D image synthesis. In this work, we leverage such approaches to construct a synthetic dataset and propose a novel multi-view consistent learning strategy to improve 3D facial landmark detection accuracy on in-the-wild images. |
Libing Zeng; Lele Chen; Wentao Bao; Zhong Li; Yi Xu; Junsong Yuan; Nima Khademi Kalantari; |
2291 | Attention-Based Point Cloud Edge Sampling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the Canny edge detection algorithm for images and with the help of the attention mechanism, this paper proposes a non-generative Attention-based Point cloud Edge Sampling method (APES), which captures salient points in the point cloud outline. |
Chengzhi Wu; Junwei Zheng; Julius Pfrommer; Jürgen Beyerer; |
2292 | Avatars Grow Legs: Generating Smooth Human Motion From Sparse Tracking Inputs With Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present AGRoL, a novel conditional diffusion model specifically designed to track full bodies given sparse upper-body tracking signals. |
Yuming Du; Robin Kips; Albert Pumarola; Sebastian Starke; Ali Thabet; Artsiom Sanakoyeu; |
2293 | MobileNeRF: Exploiting The Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a new NeRF representation based on textured polygons that can synthesize novel images efficiently with standard rendering pipelines. |
Zhiqin Chen; Thomas Funkhouser; Peter Hedman; Andrea Tagliasacchi; |
2294 | Pseudo-Label Guided Contrastive Learning for Semi-Supervised Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel semi-supervised patch-based CL framework for medical image segmentation without using any explicit pretext task. |
Hritam Basak; Zhaozheng Yin; |
2295 | Learning Neural Proto-Face Field for Disentangled 3D Face Modeling in The Wild Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While plausible details and resolutions are achieved, these models easily fail under extreme conditions of pose, shadow or appearance, due to the entangled fitting or lack of multi-view priors. To address this problem, this paper presents a novel Neural Proto-face Field (NPF) for unsupervised robust 3D face modeling. |
Zhenyu Zhang; Renwang Chen; Weijian Cao; Ying Tai; Chengjie Wang; |
2296 | Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While studies on extending 2D StyleGAN to 3D faces have emerged, a corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing. In this paper, we study the challenging problem of 3D GAN inversion where a latent code is predicted given a single face image to faithfully recover its 3D shapes and detailed textures. |
Yushi Lan; Xuyi Meng; Shuai Yang; Chen Change Loy; Bo Dai; |
2297 | PC2: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method for single-image 3D reconstruction which generates a sparse point cloud via a conditional denoising diffusion process. |
Luke Melas-Kyriazi; Christian Rupprecht; Andrea Vedaldi; |
2298 | Gradient-Based Uncertainty Attribution for Explainable Bayesian Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we introduce a gradient-based uncertainty attribution method to identify the most problematic regions of the input that contribute to the prediction uncertainty. |
Hanjing Wang; Dhiraj Joshi; Shiqiang Wang; Qiang Ji; |
2299 | Manipulating Transfer Learning for Property Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study how an adversary with control over an upstream model used in transfer learning can conduct property inference attacks on a victim’s tuned downstream model. |
Yulong Tian; Fnu Suya; Anshuman Suri; Fengyuan Xu; David Evans; |
2300 | POEM: Reconstructing Hand in A Point Embedded Multi-View Stereo Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, we present a novel method, named POEM, that directly operates on the 3D POints Embedded in the Multi-view stereo for reconstructing the hand mesh within it. |
Lixin Yang; Jian Xu; Licheng Zhong; Xinyu Zhan; Zhicheng Wang; Kejian Wu; Cewu Lu; |
2301 | BUFFER: Balancing Accuracy, Efficiency, and Generalizability in Point Cloud Registration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose BUFFER, a point cloud registration method for balancing accuracy, efficiency, and generalizability. |
Sheng Ao; Qingyong Hu; Hanyun Wang; Kai Xu; Yulan Guo; |
2302 | CrOC: Cross-View Online Clustering for Dense Visual Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Learning dense visual representations without labels is an arduous task, and more so from scene-centric data. We tackle this challenging problem by proposing a Cross-view consistency objective with an Online Clustering mechanism (CrOC) to discover and segment the semantics of the views. |
Thomas Stegmüller; Tim Lebailly; Behzad Bozorgtabar; Tinne Tuytelaars; Jean-Philippe Thiran; |
2303 | Class Adaptive Network Calibration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our method builds on a general Augmented Lagrangian approach, a well-established technique in constrained optimization, but we introduce several modifications to tailor it for large-scale, class-adaptive training. |
Bingyuan Liu; Jérôme Rony; Adrian Galdran; Jose Dolz; Ismail Ben Ayed; |
2304 | DrapeNet: Garment Generation and Self-Supervised Draping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In our work, we rely on self-supervision to train a single network to drape multiple garments. This is achieved by predicting a 3D deformation field conditioned on the latent codes of a generative network, which models garments as unsigned distance fields. |
Luca De Luigi; Ren Li; Benoît Guillard; Mathieu Salzmann; Pascal Fua; |
2305 | Evading Forensic Classifiers With Attribute-Conditioned Adversarial Faces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we go one step further and show that it is possible to successfully generate adversarial fake faces with a specified set of attributes (e.g., hair color, eye size, race, gender, etc.). |
Fahad Shamshad; Koushik Srivatsan; Karthik Nandakumar; |
2306 | FeatureBooster: Boosting Feature Descriptors With A Lightweight Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a lightweight network to improve descriptors of keypoints within the same image. |
Xinjiang Wang; Zeyu Liu; Yu Hu; Wei Xi; Wenxian Yu; Danping Zou; |
2307 | Progressively Optimized Local Radiance Fields for Robust View Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an algorithm for reconstructing the radiance field of a large-scale scene from a single casually captured video. |
Andréas Meuleman; Yu-Lun Liu; Chen Gao; Jia-Bin Huang; Changil Kim; Min H. Kim; Johannes Kopf; |
2308 | Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Iterative Multi-scale Feature Aggregation (IMFA) – a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors. |
Gongjie Zhang; Zhipeng Luo; Zichen Tian; Jingyi Zhang; Xiaoqin Zhang; Shijian Lu; |
2309 | Delivering Arbitrary-Modal Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Aside from this, we provide this dataset in four severe weather conditions as well as five sensor failure cases to exploit modal complementarity and resolve partial outages. To make use of this data, we present the arbitrary cross-modal segmentation model CMNeXt. |
Jiaming Zhang; Ruiping Liu; Hao Shi; Kailun Yang; Simon Reiß; Kunyu Peng; Haodong Fu; Kaiwei Wang; Rainer Stiefelhagen; |
2310 | GeoMVSNet: Learning Multi-View Stereo With Geometry Perception Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a geometry awareness model, termed GeoMVSNet, to explicitly integrate geometric clues implied in coarse stages for delicate depth estimation. |
Zhe Zhang; Rui Peng; Yuxi Hu; Ronggang Wang; |
2311 | Consistent-Teacher: Towards Reducing Inconsistent Pseudo-Targets in Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we dive deep into the inconsistency of pseudo targets in semi-supervised object detection (SSOD). |
Xinjiang Wang; Xingyi Yang; Shilong Zhang; Yijiang Li; Litong Feng; Shijie Fang; Chengqi Lyu; Kai Chen; Wayne Zhang; |
2312 | OCTET: Object-Aware Counterfactual Explanations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, previous methods struggle to explain decision models trained on images with many objects, e.g., urban scenes, which are more difficult to work with but also arguably more critical to explain. In this work, we propose to tackle this issue with an object-centric framework for counterfactual explanation generation. |
Mehdi Zemni; Mickaël Chen; Éloi Zablocki; Hédi Ben-Younes; Patrick Pérez; Matthieu Cord; |
2313 | TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Most recent test-time adaptation methods focus on only classification tasks, use specialized network architectures, destroy model calibration or rely on lightweight information from the source domain. To tackle these issues, this paper proposes a novel Test-time Self-Learning method with automatic Adversarial augmentation dubbed TeSLA for adapting a pre-trained source model to the unlabeled streaming test data. |
Devavrat Tomar; Guillaume Vray; Behzad Bozorgtabar; Jean-Philippe Thiran; |
2314 | DNeRV: Modeling Inherent Dynamics Via Difference Neural Representation for Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To use explicit motion information, we propose Difference Neural Representation for Videos (DNeRV), which consists of two streams for content and frame difference. |
Qi Zhao; M. Salman Asif; Zhan Ma; |
2315 | RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first attempt at semi-supervised learning for REC and propose a strong baseline method called RefTeacher. |
Jiamu Sun; Gen Luo; Yiyi Zhou; Xiaoshuai Sun; Guannan Jiang; Zhiyu Wang; Rongrong Ji; |
2316 | Handwritten Text Generation From Visual Archetypes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we devise a Transformer-based model for Few-Shot styled handwritten text generation and focus on obtaining a robust and informative representation of both the text and the style. |
Vittorio Pippi; Silvia Cascianelli; Rita Cucchiara; |
2317 | Unicode Analogies: An Anti-Objectivist Visual Reasoning Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the Unicode Analogies challenge, consisting of polysemic, character-based PMPs to benchmark fluid conceptualisation ability in vision systems. |
Steven Spratley; Krista A. Ehinger; Tim Miller; |
2318 | FFF: Fragment-Guided Flexible Fitting for Building Complete Protein Structures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new method named FFF that bridges protein structure prediction and protein structure recognition with flexible fitting. |
Weijie Chen; Xinyan Wang; Yuhang Wang; |
2319 | Polarized Color Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It is a challenge for traditional image processing pipelines because the physical constraints exerted implicitly in the channels are excessively complicated. In this paper, we propose to tackle this issue through a noise modeling method for realistic data synthesis and a powerful network structure inspired by vision Transformer. |
Zhuoxiao Li; Haiyang Jiang; Mingdeng Cao; Yinqiang Zheng; |
2320 | Continuous Pseudo-Label Rectified Domain Adaptive Semantic Segmentation With Implicit Neural Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to estimate the rectification values of the predicted pseudo-labels with implicit neural representations. |
Rui Gong; Qin Wang; Martin Danelljan; Dengxin Dai; Luc Van Gool; |
2321 | Hyperbolic Contrastive Learning for Visual Representations Beyond Objects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although self-/un-supervised methods have led to rapid progress in visual representation learning, these methods generally treat objects and scenes using the same lens. In this paper, we focus on learning representations of objects and scenes that preserve the structure among them. |
Songwei Ge; Shlok Mishra; Simon Kornblith; Chun-Liang Li; David Jacobs; |
2322 | Align Your Latents: High-Resolution Video Synthesis With Latent Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task. |
Andreas Blattmann; Robin Rombach; Huan Ling; Tim Dockhorn; Seung Wook Kim; Sanja Fidler; Karsten Kreis; |
2323 | AligNeRF: High-Fidelity Neural Radiance Fields Via Alignment-Aware Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we conduct the first pilot study on training NeRF with high-resolution data and propose the corresponding solutions: 1) marrying the multilayer perceptron (MLP) with convolutional layers which can encode more neighborhood information while reducing the total number of parameters; 2) a novel training strategy to address misalignment caused by moving objects or small camera calibration errors; and 3) a high-frequency aware loss. |
Yifan Jiang; Peter Hedman; Ben Mildenhall; Dejia Xu; Jonathan T. Barron; Zhangyang Wang; Tianfan Xue; |
2324 | NAR-Former: Neural Architecture Representation Learning Towards Holistic Attributes Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These models can be used to estimate attributes of different neural network architectures such as the accuracy and latency, without running the actual training or inference tasks. In this paper, we propose a neural architecture representation model that can be used to estimate these attributes holistically. |
Yun Yi; Haokui Zhang; Wenze Hu; Nannan Wang; Xiaoyu Wang; |
2325 | Implicit 3D Human Mesh Recovery Using Consistency With Pose and Shape From Unseen-View Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, we propose "Implicit 3D Human Mesh Recovery (ImpHMR)" that can implicitly imagine a person in 3D space at the feature-level via Neural Feature Fields. |
Hanbyel Cho; Yooshin Cho; Jaesung Ahn; Junmo Kim; |
2326 | UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer Via Hierarchical Mask Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We design UniDAformer, a unified domain adaptive panoptic segmentation transformer that is simple but can achieve domain adaptive instance segmentation and semantic segmentation simultaneously within a single network. |
Jingyi Zhang; Jiaxing Huang; Xiaoqin Zhang; Shijian Lu; |
2327 | Non-Contrastive Learning Meets Language-Image Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we explore the validity of non-contrastive language-image pre-training (nCLIP) and study whether nice properties exhibited in visual self-supervised models can emerge. |
Jinghao Zhou; Li Dong; Zhe Gan; Lijuan Wang; Furu Wei; |
2328 | Teaching Structured Vision & Language Concepts to Vision & Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose various techniques based on language structure understanding that can be used to manipulate the textual part of off-the-shelf paired VL datasets. |
Sivan Doveh; Assaf Arbelle; Sivan Harary; Eli Schwartz; Roei Herzig; Raja Giryes; Rogerio Feris; Rameswar Panda; Shimon Ullman; Leonid Karlinsky; |
2329 | Teleidoscopic Imaging System for Microscale 3D Shape Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a practical method for capturing microscale 3D shapes with a teleidoscopic imaging system. |
Ryo Kawahara; Meng-Yu Jennifer Kuo; Shohei Nobuhara; |
2330 | UV Volumes for Real-Time Rendering of Editable Free-View Human Performance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: But the practice is severely limited by high computational costs in the rendering process. To solve this problem, we propose the UV Volumes, a new approach that can render an editable free-view video of a human performer in real-time. |
Yue Chen; Xuan Wang; Xingyu Chen; Qi Zhang; Xiaoyu Li; Yu Guo; Jue Wang; Fei Wang; |
2331 | NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce an accurate inversion technique and thus facilitate an intuitive text-based modification of the image. |
Ron Mokady; Amir Hertz; Kfir Aberman; Yael Pritch; Daniel Cohen-Or; |
2332 | JacobiNeRF: NeRF Shaping With Mutual Information Gradients Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a method that trains a neural radiance field (NeRF) to encode not only the appearance of the scene but also semantic correlations between scene points, regions, or entities — aiming to capture their mutual co-variation patterns. |
Xiaomeng Xu; Yanchao Yang; Kaichun Mo; Boxiao Pan; Li Yi; Leonidas Guibas; |
2333 | Selective Structured State-Spaces for Long-Form Video Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we demonstrate that treating all image tokens equally, as done by the S4 model, can adversely affect its efficiency and accuracy. To address this limitation, we present a novel Selective S4 (i.e., S5) model that employs a lightweight mask generator to adaptively select informative image tokens, resulting in more efficient and accurate modeling of long-term spatiotemporal dependencies in videos. |
Jue Wang; Wentao Zhu; Pichao Wang; Xiang Yu; Linda Liu; Mohamed Omar; Raffay Hamid; |
2334 | Open-Set Representation Learning Through Combinatorial Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We are interested in identifying novel concepts in a dataset through representation learning based on both labeled and unlabeled examples, and extending the horizon of recognition to both known and novel classes. To address this challenging task, we propose a combinatorial learning approach, which naturally clusters the examples in unseen classes using the compositional knowledge given by multiple supervised meta-classifiers on heterogeneous label spaces. |
Geeho Kim; Junoh Kang; Bohyung Han; |
2335 | Multi-View Stereo Representation Revist: Region-Aware MVSNet Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we predict the distance volume from cost volume to estimate the signed distance of points around the surface. |
Yisu Zhang; Jianke Zhu; Lixiang Lin; |
2336 | A Unified HDR Imaging Method With Pixel and Patch Level Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To generate visually pleasing HDR images in various cases, we propose a hybrid HDR deghosting network, called HyHDRNet, to learn the complicated relationship between reference and non-reference images. |
Qingsen Yan; Weiye Chen; Song Zhang; Yu Zhu; Jinqiu Sun; Yanning Zhang; |
2337 | Motion Information Propagation for Neural Video Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that, through information interactions, the synergy between motion coding and frame coding can be achieved. |
Linfeng Qi; Jiahao Li; Bin Li; Houqiang Li; Yan Lu; |
2338 | Accelerated Coordinate Encoding: Learning to Relocalize in Minutes Using RGB and Poses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since training needs to happen on each new scene again, long training times make learning-based relocalization impractical for most applications, despite its promise of high accuracy. In this paper, we show how such a system can actually achieve the same accuracy in less than 5 minutes. |
Eric Brachmann; Tommaso Cavallari; Victor Adrian Prisacariu; |
2339 | Switchable Representation Learning Framework With Self-Compatibility Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Switchable representation learning Framework with Self-Compatibility (SFSC). |
Shengsen Wu; Yan Bai; Yihang Lou; Xiongkun Linghu; Jianzhong He; Ling-Yu Duan; |
2340 | Partial Network Cloning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study a novel task that enables partial knowledge transfer from pre-trained models, which we term Partial Network Cloning (PNC). |
Jingwen Ye; Songhua Liu; Xinchao Wang; |
2341 | MOTRv2: Bootstrapping End-to-End Multi-Object Tracking By Pretrained Object Detectors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose MOTRv2, a simple yet effective pipeline to bootstrap end-to-end multi-object tracking with a pretrained object detector. |
Yuang Zhang; Tiancai Wang; Xiangyu Zhang; |
2342 | Zero-Shot Dual-Lens Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a zero-shot solution for dual-lens SR (ZeDuSR), where only the dual-lens pair at test time is used to learn an image-specific SR model. |
Ruikang Xu; Mingde Yao; Zhiwei Xiong; |
2343 | Robust Dynamic Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These methods, thus, are unreliable as SfM algorithms often fail or produce erroneous poses on challenging videos with highly dynamic objects, poorly textured surfaces, and rotating camera motion. We address this issue by jointly estimating the static and dynamic radiance fields along with the camera parameters (poses and focal length). |
Yu-Lun Liu; Chen Gao; Andréas Meuleman; Hung-Yu Tseng; Ayush Saraf; Changil Kim; Yung-Yu Chuang; Johannes Kopf; Jia-Bin Huang; |
2344 | Improving Vision-and-Language Navigation By Generating Future-View Image Semantics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to take one step further and explore whether the agent can benefit from generating the potential future view during navigation. |
Jialu Li; Mohit Bansal; |
2345 | PLIKS: A Pseudo-Linear Inverse Kinematic Solver for 3D Human Body Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce PLIKS (Pseudo-Linear Inverse Kinematic Solver) for reconstruction of a 3D mesh of the human body from a single 2D image. |
Karthik Shetty; Annette Birkhold; Srikrishna Jaganathan; Norbert Strobel; Markus Kowarschik; Andreas Maier; Bernhard Egger; |
2346 | Promoting Semantic Connectivity: Dual Nearest Neighbors Contrastive Learning for Unsupervised Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first delve into the failure of vanilla contrastive learning and point out that semantic connectivity is the key to UDG. |
Yuchen Liu; Yaoming Wang; Yabo Chen; Wenrui Dai; Chenglin Li; Junni Zou; Hongkai Xiong; |
2347 | Interactive Segmentation of Radiance Fields Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present the ISRF method to interactively segment objects with fine structure and appearance. |
Rahul Goel; Dhawal Sirikonda; Saurabh Saini; P. J. Narayanan; |
2348 | GSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we exploit the hand structure and use it as guidance for SDF-based shape reconstruction. |
Zerui Chen; Shizhe Chen; Cordelia Schmid; Ivan Laptev; |
2349 | Principles of Forgetting in Domain-Incremental Semantic Segmentation in Adverse Weather Conditions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite recent progress in reducing catastrophic forgetting, its causes and effects remain obscure. Therefore, we study how the representations of semantic segmentation models are affected during domain-incremental learning in adverse weather conditions. |
Tobias Kalb; Jürgen Beyerer; |
2350 | Neural Texture Synthesis With Guided Correspondence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to re-promote the combination of MRFs and neural networks, i.e., the CNNMRF model, for texture synthesis, with two key observations made. |
Yang Zhou; Kaijian Chen; Rongjun Xiao; Hui Huang; |
2351 | Exploring and Utilizing Pattern Imbalance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast to prior methods, which are mostly concerned with category or domain granularity and ignore the potential finer structure that exists in datasets, we give a new definition of seed category as an appropriate optimization unit to distinguish different patterns in the same category or domain. |
Shibin Mei; Chenglong Zhao; Shengchao Yuan; Bingbing Ni; |
2352 | Are Data-Driven Explanations Robust Against Out-of-Distribution Data? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: How to develop robust explanations against out-of-distribution data? To address this problem, we propose an end-to-end model-agnostic learning framework Distributionally Robust Explanations (DRE). |
Tang Li; Fengchun Qiao; Mengmeng Ma; Xi Peng; |
2353 | Top-Down Visual Attention From Analysis By Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider top-down attention from a classic Analysis-by-Synthesis (AbS) perspective of vision. |
Baifeng Shi; Trevor Darrell; Xin Wang; |
2354 | Hierarchical Fine-Grained Image Forgery Detection and Localization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we present a hierarchical fine-grained formulation for IFDL representation learning. |
Xiao Guo; Xiaohong Liu; Zhiyuan Ren; Steven Grosz; Iacopo Masi; Xiaoming Liu; |
2355 | CIMI4D: A Large Multimodal Climbing Motion Dataset Under Human-Scene Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To evaluate the merit of CIMI4D, we perform four tasks which include human pose estimations (with/without scene constraints), pose prediction, and pose generation. |
Ming Yan; Xin Wang; Yudi Dai; Siqi Shen; Chenglu Wen; Lan Xu; Yuexin Ma; Cheng Wang; |
2356 | Fantastic Breaks: A Dataset of Paired 3D Scans of Real-World Broken Objects and Their Complete Counterparts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Fantastic Breaks (and Where to Find Them: https://terascale-all-sensing-research-studio.github.io/FantasticBreaks), a dataset containing scanned, waterproofed, and cleaned 3D meshes for 150 broken objects, paired and geometrically aligned with complete counterparts. |
Nikolas Lamb; Cameron Palmer; Benjamin Molloy; Sean Banerjee; Natasha Kholgade Banerjee; |
2357 | Modernizing Old Photos Using Multiple References Via Photorealistic Style Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In order to modernize old photos, we propose a novel multi-reference-based old photo modernization (MROPM) framework consisting of a network MROPM-Net and a novel synthetic data generation scheme. |
Agus Gunawan; Soo Ye Kim; Hyeonjun Sim; Jae-Ho Lee; Munchurl Kim; |
2358 | Interactive Cartoonization With Controllable Perceptual Factors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Previous deep methods have focused only on end-to-end translation, preventing artists from manipulating the results. To tackle this, in this work, we propose a novel solution with editing features of texture and color based on the cartoon creation process. |
Namhyuk Ahn; Patrick Kwon; Jihye Back; Kibeom Hong; Seungkwon Kim; |
2359 | Curvature-Balanced Feature Manifold Learning for Long-Tailed Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we systematically propose a series of geometric measures for perceptual manifolds in deep neural networks, and then explore the effect of the geometric characteristics of perceptual manifolds on classification difficulty and how learning shapes the geometric characteristics of perceptual manifolds. |
Yanbiao Ma; Licheng Jiao; Fang Liu; Shuyuan Yang; Xu Liu; Lingling Li; |