Paper Digest: CVPR 2022 Highlights
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) is one of the top computer vision conferences in the world. In 2022, it is to be held in New Orleans.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly get the main idea of each paper.
Based in New York, Paper Digest is dedicated to helping people generate content and reason over unstructured data. Unlike black-box approaches, we build deep models on semantics, which allows results to be produced with explanations. These models power this website and are behind our services, including “search engine”, “summarization”, “question answering”, and “literature review”.
If you do not want to miss interesting academic papers, you are welcome to sign up for our daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: CVPR 2022 Highlights
# | Paper | Author(s)
---|---|---
1 | Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification Highlight: In this work, we explore how to extend self-attention modules to better learn subtle feature embeddings for recognizing fine-grained objects, e.g., different bird species or person identities. |
Haowei Zhu; Wenjing Ke; Dong Li; Ji Liu; Lu Tian; Yi Shan; |
2 | SimAN: Exploring Self-Supervised Representation Learning of Scene Text Via Similarity-Aware Normalization Highlight: Specifically, we propose a Similarity-Aware Normalization (SimAN) module to identify the different patterns and align the corresponding styles from the guiding patch. |
Canjie Luo; Lianwen Jin; Jingdong Chen; |
3 | GASP, A Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation Highlight: We propose a theoretical framework that generalizes simple and fast algorithms for hierarchical agglomerative clustering to weighted graphs with both attractive and repulsive interactions between the nodes. |
Alberto Bailoni; Constantin Pape; Nathan Hütsch; Steffen Wolf; Thorsten Beier; Anna Kreshuk; Fred A. Hamprecht; |
4 | Estimating Example Difficulty Using Variance of Gradients Highlight: In this work, we propose Variance of Gradients (VoG) as a valuable and efficient metric to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing. |
Chirag Agarwal; Daniel D’souza; Sara Hooker; |
5 | One Loss for Quantization: Deep Hashing With Discrete Wasserstein Distributional Matching Highlight: This paper considers an alternative approach to learning the quantization constraints. |
Khoa D. Doan; Peng Yang; Ping Li; |
6 | Pixel Screening Based Intermediate Correction for Blind Deblurring Highlight: However, these methods still fail while dealing with images containing saturations and large blurs. To address this problem, we propose an intermediate image correction method which utilizes Bayes posterior estimation to screen through the intermediate image and exclude those unfavorable pixels to reduce their influence for kernel estimation. |
Meina Zhang; Yingying Fang; Guoxi Ni; Tieyong Zeng; |
7 | Weakly Supervised Semantic Segmentation By Pixel-to-Prototype Contrast Highlight: In this study, we propose weakly-supervised pixel-to-prototype contrast that can provide pixel-level supervisory signals to narrow the gap. |
Ye Du; Zehua Fu; Qingjie Liu; Yunhong Wang; |
8 | Controllable Animation of Fluid Elements in Still Images Highlight: We propose a method to interactively control the animation of fluid elements in still images to generate cinemagraphs. |
Aniruddha Mahapatra; Kuldeep Kulkarni; |
9 | Holocurtains: Programming Light Curtains Via Binary Holography Highlight: In this work, we propose Holocurtains: a light-efficient approach to producing light curtains of arbitrary shape. |
Dorian Chan; Srinivasa G. Narasimhan; Matthew O’Toole; |
10 | Recurrent Dynamic Embedding for Video Object Segmentation Highlight: In this paper, we propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size. |
Mingxing Li; Li Hu; Zhiwei Xiong; Bang Zhang; Pan Pan; Dong Liu; |
11 | Deep Hierarchical Semantic Segmentation Highlight: In this paper, we instead address hierarchical semantic segmentation (HSS), which aims at structured, pixel-wise description of visual observation in terms of a class hierarchy. |
Liulei Li; Tianfei Zhou; Wenguan Wang; Jianwu Li; Yi Yang; |
12 | F-SfT: Shape-From-Template With A Physics-Based Deformation Model Highlight: In contrast to previous works, this paper proposes a new SfT approach explaining 2D observations through physical simulations accounting for forces and material properties. |
Navami Kairanda; Edith Tretschk; Mohamed Elgharib; Christian Theobalt; Vladislav Golyanik; |
13 | Continual Object Detection Via Prototypical Task Correlation Guided Gating Mechanism Highlight: Different from previous works that tune the whole network for all tasks, in this work, we present a simple and flexible framework for continual object detection via pRotOtypical taSk corrElaTion guided gaTing mechAnism (ROSETTA). |
Binbin Yang; Xinchi Deng; Han Shi; Changlin Li; Gengwei Zhang; Hang Xu; Shen Zhao; Liang Lin; Xiaodan Liang; |
14 | DATA: Domain-Aware and Task-Aware Self-Supervised Learning Highlight: In this paper, we present DATA, a simple yet effective NAS approach specialized for SSL that provides Domain-Aware and Task-Aware pre-training. |
Qing Chang; Junran Peng; Lingxi Xie; Jiajun Sun; Haoran Yin; Qi Tian; Zhaoxiang Zhang; |
15 | TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation Highlight: To leverage the unlabeled data to boost model performance, we present a novel Two-Way Inter-label Self-Training framework named TWIST. |
Ruihang Chu; Xiaoqing Ye; Zhengzhe Liu; Xiao Tan; Xiaojuan Qi; Chi-Wing Fu; Jiaya Jia; |
16 | Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection From Point Clouds Highlight: In this paper, we propose a novel voxel-based architecture, namely Voxel Set Transformer (VoxSeT), to detect 3D objects from point clouds by means of set-to-set translation. |
Chenhang He; Ruihuang Li; Shuai Li; Lei Zhang; |
17 | Learning Adaptive Warping for Real-World Rolling Shutter Correction Highlight: This paper proposes a real-world rolling shutter (RS) correction dataset, BS-RSC, and a corresponding model to correct the RS frames in a distorted video. |
Mingdeng Cao; Zhihang Zhong; Jiahao Wang; Yinqiang Zheng; Yujiu Yang; |
18 | Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning Highlight: In this paper, we propose a novel Siamese Contrastive Embedding Network (SCEN) for unseen composition recognition. |
Xiangyu Li; Xu Yang; Kun Wei; Cheng Deng; Muli Yang; |
19 | Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions Highlight: We introduce Bongard-HOI, a new visual reasoning benchmark that focuses on compositional learning of human-object interactions (HOIs) from natural images. |
Huaizu Jiang; Xiaojian Ma; Weili Nie; Zhiding Yu; Yuke Zhu; Anima Anandkumar; |
20 | RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures Highlight: We introduce RIM-Net, a neural network which learns recursive implicit fields for unsupervised inference of hierarchical shape structures. |
Chengjie Niu; Manyi Li; Kai Xu; Hao Zhang; |
21 | Do Learned Representations Respect Causal Relationships? Highlight: Data often has many semantic attributes that are causally associated with each other. But do attribute-specific learned representations of data also respect the same causal relations? We answer this question in three steps. |
Lan Wang; Vishnu Naresh Boddeti; |
22 | ZebraPose: Coarse To Fine Surface Encoding for 6DoF Object Pose Estimation Highlight: In this work, we present a discrete descriptor, which can represent the object surface densely. |
Yongzhi Su; Mahdi Saleh; Torben Fetzer; Jason Rambach; Nassir Navab; Benjamin Busam; Didier Stricker; Federico Tombari; |
23 | ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic Highlight: While such models can provide a powerful score for matching and subsequent zero-shot tasks, they are not capable of generating a caption given an image. In this work, we repurpose such models to generate a descriptive text given an image at inference time, without any further training or tuning step. |
Yoad Tewel; Yoav Shalev; Idan Schwartz; Lior Wolf; |
24 | Learning To Affiliate: Mutual Centralized Learning for Few-Shot Classification Highlight: They generally explore a unidirectional paradigm, e.g., find the nearest support feature for every query feature and aggregate these local matches for a joint classification. In this paper, we propose a novel Mutual Centralized Learning (MCL) to fully affiliate these two disjoint dense feature sets in a bidirectional paradigm. |
Yang Liu; Weifeng Zhang; Chao Xiang; Tu Zheng; Deng Cai; Xiaofei He; |
25 | CAPRI-Net: Learning Compact CAD Shapes With Adaptive Primitive Assembly Highlight: We introduce CAPRI-Net, a self-supervised neural network for learning compact and interpretable implicit representations of 3D computer-aided design (CAD) models, in the form of adaptive primitive assemblies. |
Fenggen Yu; Zhiqin Chen; Manyi Li; Aditya Sanghi; Hooman Shayani; Ali Mahdavi-Amiri; Hao Zhang; |
26 | ATPFL: Automatic Trajectory Prediction Model Design Under Federated Learning Framework Highlight: Besides, the existing works ignore Federated Learning (FL) scenarios, failing to make full use of distributed multi-source datasets with rich actual scenes to learn a more powerful TP model. In this paper, we make up for the above defects and propose ATPFL to help users federate multi-source trajectory datasets to automatically design and train a powerful TP model. |
Chunnan Wang; Xiang Chen; Junzhe Wang; Hongzhi Wang; |
27 | Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning Highlight: These affine parameters were introduced to maintain the expressive powers of the model following normalization. While this hypothesis holds true for classification within the same domain, this work illustrates that these parameters are detrimental to downstream performance on common few-shot transfer tasks. |
Moslem Yazdanpanah; Aamer Abdul Rahman; Muawiz Chaudhary; Christian Desrosiers; Mohammad Havaei; Eugene Belilovsky; Samira Ebrahimi Kahou; |
28 | Bridging The Gap Between Classification and Localization for Weakly Supervised Object Localization Highlight: In this work, we find the gap between classification and localization in terms of the misalignment of the directions between an input feature and a class-specific weight. |
Eunji Kim; Siwon Kim; Jungbeom Lee; Hyunwoo Kim; Sungroh Yoon; |
29 | Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation Highlight: This paper proposes a new transformer-based framework to learn class-specific object localization maps as pseudo labels for weakly supervised semantic segmentation (WSSS). |
Lian Xu; Wanli Ouyang; Mohammed Bennamoun; Farid Boussaid; Dan Xu; |
30 | 3D Moments From Near-Duplicate Photos Highlight: We introduce 3D Moments, a new computational photography effect. |
Qianqian Wang; Zhengqi Li; David Salesin; Noah Snavely; Brian Curless; Janne Kontkanen; |
31 | Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization Highlight: In this work, we, for the first time to our best knowledge, propose to perform Exact Feature Distribution Matching (EFDM) by exactly matching the empirical Cumulative Distribution Functions (eCDFs) of image features, which could be implemented by applying the Exact Histogram Matching (EHM) in the image feature space. |
Yabin Zhang; Minghan Li; Ruihuang Li; Kui Jia; Lei Zhang; |
32 | Blind2Unblind: Self-Supervised Image Denoising With Visible Blind Spots Highlight: In this paper, we propose a simple yet efficient approach called Blind2Unblind to overcome the information loss in blindspot-driven denoising methods. |
Zejin Wang; Jiazheng Liu; Guoqing Li; Hua Han; |
33 | Balanced and Hierarchical Relation Learning for One-Shot Object Detection Highlight: In this paper, we introduce the balanced and hierarchical learning for our detector. |
Hanqing Yang; Sijia Cai; Hualian Sheng; Bing Deng; Jianqiang Huang; Xian-Sheng Hua; Yong Tang; Yu Zhang; |
34 | End-to-End Generative Pretraining for Multimodal Video Captioning Highlight: We present Multimodal Video Generative Pretraining (MV-GPT), a new pretraining framework for learning from unlabelled videos which can be effectively used for generative tasks such as multimodal video captioning. |
Paul Hongsuck Seo; Arsha Nagrani; Anurag Arnab; Cordelia Schmid; |
35 | Delving Deep Into The Generalization of Vision Transformers Under Distribution Shifts Highlight: In this work, we provide a comprehensive study on the out-of-distribution generalization of Vision Transformers. |
Chongzhi Zhang; Mingyuan Zhang; Shanghang Zhang; Daisheng Jin; Qiang Zhou; Zhongang Cai; Haiyu Zhao; Xianglong Liu; Ziwei Liu; |
36 | NICE-SLAM: Neural Implicit Scalable Encoding for SLAM Highlight: In this paper, we present NICE-SLAM, a dense SLAM system that incorporates multi-level local information by introducing a hierarchical scene representation. |
Zihan Zhu; Songyou Peng; Viktor Larsson; Weiwei Xu; Hujun Bao; Zhaopeng Cui; Martin R. Oswald; Marc Pollefeys; |
37 | HyperDet3D: Learning A Scene-Conditioned 3D Object Detector Highlight: In this paper, we propose HyperDet3D to explore scene-conditioned prior knowledge for 3D object detection. |
Yu Zheng; Yueqi Duan; Jiwen Lu; Jie Zhou; Qi Tian; |
38 | Stochastic Trajectory Prediction Via Motion Indeterminacy Diffusion Highlight: In this paper, we present a new framework to formulate the trajectory prediction task as a reverse process of motion indeterminacy diffusion (MID), in which we progressively discard indeterminacy from all the walkable areas until reaching the desired trajectory. |
Tianpei Gu; Guangyi Chen; Junlong Li; Chunze Lin; Yongming Rao; Jie Zhou; Jiwen Lu; |
39 | CLRNet: Cross Layer Refinement Network for Lane Detection Highlight: In this work, we present Cross Layer Refinement Network (CLRNet) aiming at fully utilizing both high-level and low-level features in lane detection. |
Tu Zheng; Yifei Huang; Yang Liu; Wenjian Tang; Zheng Yang; Deng Cai; Xiaofei He; |
40 | Cross-Modal Map Learning for Vision and Language Navigation Highlight: In this work, we propose a cross-modal map learning model for vision-and-language navigation that first learns to predict the top-down semantics on an egocentric map for both observed and unobserved regions, and then predicts a path towards the goal as a set of waypoints. |
Georgios Georgakis; Karl Schmeckpeper; Karan Wanchoo; Soham Dan; Eleni Miltsakaki; Dan Roth; Kostas Daniilidis; |
41 | Motion-Aware Contrastive Video Representation Learning Via Foreground-Background Merging Highlight: Such bias makes the model suffer from weak generalization ability, leading to worse performance on downstream tasks such as action recognition. To alleviate such bias, we propose Foreground-background Merging (FAME) to deliberately compose the moving foreground region of the selected video onto the static background of others. |
Shuangrui Ding; Maomao Li; Tianyu Yang; Rui Qian; Haohang Xu; Qingyi Chen; Jue Wang; Hongkai Xiong; |
42 | Incremental Transformer Structure Enhanced Image Inpainting With Masking Positional Encoding Highlight: On the other hand, attention-based models can learn better long-range dependency for the structure recovery, but they are limited by the heavy computation for inference with large image sizes. To address these issues, we propose to leverage an additional structure restorer to facilitate the image inpainting incrementally. |
Qiaole Dong; Chenjie Cao; Yanwei Fu; |
43 | Pointly-Supervised Instance Segmentation Highlight: We propose an embarrassingly simple point annotation scheme to collect weak supervision for instance segmentation. |
Bowen Cheng; Omkar Parkhi; Alexander Kirillov; |
44 | Cross-Modal Clinical Graph Transformer for Ophthalmic Report Generation Highlight: To endow models with the capability of incorporating expert knowledge, we propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG), in which clinical relation triples are injected into the visual features as prior knowledge to drive the decoding procedure. |
Mingjie Li; Wenjia Cai; Karin Verspoor; Shirui Pan; Xiaodan Liang; Xiaojun Chang; |
45 | Human-Object Interaction Detection Via Disentangled Transformer Highlight: Our main motivation is that detecting the human-object instances and classifying interactions accurately needs to learn representations that focus on different regions. To this end, we present Disentangled Transformer, where both encoder and decoder are disentangled to facilitate learning of two subtasks. |
Desen Zhou; Zhichao Liu; Jian Wang; Leshan Wang; Tao Hu; Errui Ding; Jingdong Wang; |
46 | DINE: Domain Adaptation From Single and Multiple Black-Box Predictors Highlight: This paper studies a practical and interesting setting for UDA, where only black-box source models (i.e., only network predictions are available) are provided during adaptation in the target domain. To solve this problem, we propose a new two-step knowledge adaptation framework called DIstill and fine-tuNE (DINE). |
Jian Liang; Dapeng Hu; Jiashi Feng; Ran He; |
47 | LGT-Net: Indoor Panoramic Room Layout Estimation With Geometry-Aware Transformer Network Highlight: We show that using horizon-depth along with room height can obtain omnidirectional-geometry awareness of room layout in both horizontal and vertical directions. |
Zhigang Jiang; Zhongzheng Xiang; Jinhua Xu; Ming Zhao; |
48 | CRIS: CLIP-Driven Referring Image Segmentation Highlight: Inspired by the recent advance in Contrastive Language-Image Pretraining (CLIP), in this paper, we propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS). |
Zhaoqing Wang; Yu Lu; Qiang Li; Xunqiang Tao; Yandong Guo; Mingming Gong; Tongliang Liu; |
49 | Multi-View Mesh Reconstruction With Neural Deferred Shading Highlight: We propose an analysis-by-synthesis method for fast multi-view 3D reconstruction of opaque objects with arbitrary materials and illumination. |
Markus Worchel; Rodrigo Diaz; Weiwen Hu; Oliver Schreer; Ingo Feldmann; Peter Eisert; |
50 | CVF-SID: Cyclic Multi-Variate Function for Self-Supervised Image Denoising By Disentangling Noise From Image Highlight: To address the aforementioned challenges, we propose a novel and powerful self-supervised denoising method called CVF-SID based on a Cyclic multi-Variate Function (CVF) module and a self-supervised image disentangling (SID) framework. |
Reyhaneh Neshatavar; Mohsen Yavartanoo; Sanghyun Son; Kyoung Mu Lee; |
51 | Infrared Invisible Clothing: Hiding From Infrared Detectors at Multiple Angles in Real World Highlight: We propose infrared adversarial clothing, which can fool infrared pedestrian detectors at different angles. |
Xiaopei Zhu; Zhanhao Hu; Siyuan Huang; Jianmin Li; Xiaolin Hu; |
52 | Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation Highlight: In this paper, we present a novel Distribution-Aware Single-stage (DAS) model for tackling the challenging multi-person 3D pose estimation problem. |
Zitian Wang; Xuecheng Nie; Xiaochao Qu; Yunpeng Chen; Si Liu; |
53 | FaceFormer: Speech-Driven 3D Facial Animation With Transformers Highlight: Prior works typically focus on learning phoneme-level features of short audio windows with limited context, occasionally resulting in inaccurate lip movements. To tackle this limitation, we propose a Transformer-based autoregressive model, FaceFormer, which encodes the long-term audio context and autoregressively predicts a sequence of animated 3D face meshes. |
Yingruo Fan; Zhaojiang Lin; Jun Saito; Wenping Wang; Taku Komura; |
54 | Exploring Patch-Wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks Highlight: However, the methods often ignore the diverse semantic relation within the images. To address this, here we propose a novel semantic relation consistency (SRC) regularization along with the decoupled contrastive learning (DCL), which utilizes the diverse semantics by focusing on the heterogeneous semantics between the image patches of a single image. |
Chanyong Jung; Gihyun Kwon; Jong Chul Ye; |
55 | High-Resolution Face Swapping Via Latent Semantics Disentanglement Highlight: We present a novel high-resolution face swapping method using the inherent prior knowledge of a pre-trained GAN model. |
Yangyang Xu; Bailin Deng; Junle Wang; Yanqing Jing; Jia Pan; Shengfeng He; |
56 | Searching The Deployable Convolution Neural Networks for GPUs Highlight: This paper intends to expedite the model customization with a model hub that contains the optimized models tiered by their inference latency using Neural Architecture Search (NAS). |
Linnan Wang; Chenhan Yu; Satish Salian; Slawomir Kierat; Szymon Migacz; Alex Fit Florea; |
57 | Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning Highlight: In this paper, we propose a Sparse Local Patch Transformer (SLPT) for learning the inherent relation. |
Jiahao Xia; Weiwei Qu; Wenjian Huang; Jianguo Zhang; Xi Wang; Min Xu; |
58 | DeepFake Disrupter: The Detector of DeepFake Is My Friend Highlight: In this paper, we propose a novel DeepFake disruption algorithm called "DeepFake Disrupter". |
Xueyu Wang; Jiajun Huang; Siqi Ma; Surya Nepal; Chang Xu; |
59 | Rotationally Equivariant 3D Object Detection Highlight: To incorporate object-level rotation equivariance into 3D object detectors, we need a mechanism to extract equivariant features with local object-level spatial support while being able to model cross-object context information. To this end, we propose Equivariant Object detection Network (EON) with a rotation equivariance suspension design to achieve object-level equivariance. |
Hong-Xing Yu; Jiajun Wu; Li Yi; |
60 | Accelerating DETR Convergence Via Semantic-Aligned Matching Highlight: We observe that the slow convergence is largely attributed to the complication in matching object queries with target features in different feature embedding spaces. This paper presents SAM-DETR, a Semantic-Aligned-Matching DETR that greatly accelerates DETR’s convergence without sacrificing its accuracy. |
Gongjie Zhang; Zhipeng Luo; Yingchen Yu; Kaiwen Cui; Shijian Lu; |
61 | Long-Short Temporal Contrastive Learning of Video Transformers Highlight: In this paper, we empirically demonstrate that self-supervised pretraining of video transformers on video-only datasets can lead to action recognition results that are on par or better than those obtained with supervised pretraining on large-scale image datasets, even massive ones such as ImageNet-21K. |
Jue Wang; Gedas Bertasius; Du Tran; Lorenzo Torresani; |
62 | Vision Transformer With Deformable Attention Highlight: This flexible scheme enables the self-attention module to focus on relevant regions and capture more informative features. On this basis, we present Deformable Attention Transformer, a general backbone model with deformable attention for both image classification and dense prediction tasks. |
Zhuofan Xia; Xuran Pan; Shiji Song; Li Erran Li; Gao Huang; |
63 | Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture Highlight: In this paper, we propose GPV-1, a task-agnostic vision-language architecture that can learn and perform tasks that involve receiving an image and producing text and/or bounding boxes, including classification, localization, visual question answering, captioning, and more. |
Tanmay Gupta; Amita Kamath; Aniruddha Kembhavi; Derek Hoiem; |
64 | Deep Vanishing Point Detection: Geometric Priors Make Dataset Variations Vanish Highlight: Yet, deep networks require expensive annotated datasets trained on costly hardware and do not generalize to even slightly different domains, and minor problem variants. Here, we address these issues by injecting deep vanishing point detection networks with prior knowledge. |
Yancong Lin; Ruben Wiersma; Silvia L. Pintea; Klaus Hildebrandt; Elmar Eisemann; Jan C. van Gemert; |
65 | RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes Highlight: In this paper, an unsupervised learning framework is proposed to jointly predict monocular depth and complete 3D motion including the motions of moving objects and camera. |
Tak-Wai Hui; |
66 | LiT: Zero-Shot Transfer With Locked-Image Text Tuning Highlight: This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training. |
Xiaohua Zhai; Xiao Wang; Basil Mustafa; Andreas Steiner; Daniel Keysers; Alexander Kolesnikov; Lucas Beyer; |
67 | Cloning Outfits From Real-World Images to 3D Characters for Generalizable Person Re-Identification Highlight: However, synthesized persons in existing datasets are mostly cartoon-like and in random dress collocation, which limits their performance. To address this, in this work, an automatic approach is proposed to directly clone the whole outfits from real-world person images to virtual 3D characters, such that any virtual person thus created will appear very similar to its real-world counterpart. |
Yanan Wang; Xuezhi Liang; Shengcai Liao; |
68 | GeoNeRF: Generalizing NeRF With Geometry Priors Highlight: We present GeoNeRF, a generalizable photorealistic novel view synthesis method based on neural radiance fields. |
Mohammad Mahdi Johari; Yann Lepoittevin; François Fleuret; |
69 | ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo Highlight: In this paper, we propose a novel adaptive blend pyramid network, which aims to achieve fast local retouching on ultra high-resolution photos. |
Biwen Lei; Xiefan Guo; Hongyu Yang; Miaomiao Cui; Xuansong Xie; Di Huang; |
70 | PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation With Photometrically Challenging Objects Highlight: To provide a benchmark with high-quality ground truth annotations to the community, we introduce a multimodal dataset for category-level object pose estimation with photometrically challenging objects termed PhoCaL. |
Pengyuan Wang; HyunJun Jung; Yitong Li; Siyuan Shen; Rahul Parthasarathy Srikanth; Lorenzo Garattoni; Sven Meier; Nassir Navab; Benjamin Busam; |
71 | Neural Compression-Based Feature Learning for Video Restoration Highlight: This paper proposes learning noise-robust feature representations to help video restoration. |
Cong Huang; Jiahao Li; Bin Li; Dong Liu; Yan Lu; |
72 | Expanding Low-Density Latent Regions for Open-Set Object Detection Highlight: In this work, we propose to identify unknown objects by separating high/low-density regions in the latent space, based on the consensus that unknown objects are usually distributed in low-density latent regions. |
Jiaming Han; Yuqiang Ren; Jian Ding; Xingjia Pan; Ke Yan; Gui-Song Xia; |
73 | Drop The GAN: In Defense of Patches Nearest Neighbors As Single Image Generative Models Highlight: However, despite their impressiveness, single-image GANs require long training time (usually hours) for each image and each task and often suffer from visual artifacts. In this paper we revisit the classical patch-based methods, and show that, unlike previously believed, classical methods can be adapted to tackle these novel "GAN-only" tasks. |
Niv Granot; Ben Feinstein; Assaf Shocher; Shai Bagon; Michal Irani; |
74 | Uformer: A General U-Shaped Transformer for Image Restoration Highlight: In this paper, we present Uformer, an effective and efficient Transformer-based architecture for image restoration, in which we build a hierarchical encoder-decoder network using the Transformer block. |
Zhendong Wang; Xiaodong Cun; Jianmin Bao; Wengang Zhou; Jianzhuang Liu; Houqiang Li; |
75 | Exploring Dual-Task Correlation for Pose Guided Person Image Generation Highlight: Most of the existing methods only focus on the ill-posed source-to-target task and fail to capture reasonable texture mapping. To address this problem, we propose a novel Dual-task Pose Transformer Network (DPTN), which introduces an auxiliary task (i.e., source-to-source task) and exploits the dual-task correlation to promote the performance of PGPIG. |
Pengze Zhang; Lingxiao Yang; Jian-Huang Lai; Xiaohua Xie; |
76 | Portrait Eyeglasses and Shadow Removal By Leveraging 3D Synthetic Data Highlight: In this paper, we propose a novel framework to remove eyeglasses as well as their cast shadows from face images. |
Junfeng Lyu; Zhibo Wang; Feng Xu; |
77 | Neural Rays for Occlusion-Aware Image-Based Rendering Highlight: We present a new neural representation, called Neural Ray (NeuRay), for the novel view synthesis task. |
Yuan Liu; Sida Peng; Lingjie Liu; Qianqian Wang; Peng Wang; Christian Theobalt; Xiaowei Zhou; Wenping Wang; |
78 | Modeling 3D Layout for Group Re-Identification Highlight: However, layout ambiguity is introduced because these methods only consider the 2D layout on the imaging plane. In this paper, we overcome the above limitations by 3D layout modeling. |
Quan Zhang; Kaiheng Dang; Jian-Huang Lai; Zhanxiang Feng; Xiaohua Xie; |
79 | Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity Highlight: Here we propose a novel approach for mask proposals, Generic Grouping Networks (GGNs), constructed without semantic supervision. |
Weiyao Wang; Matt Feiszli; Heng Wang; Jitendra Malik; Du Tran; |
80 | SIOD: Single Instance Annotated Per Category Per Image for Object Detection Highlight: Under the SIOD setting, we propose a simple yet effective framework, termed Dual-Mining (DMiner), which consists of a Similarity-based Pseudo Label Generating module (SPLG) and a Pixel-level Group Contrastive Learning module (PGCL). |
Hanjun Li; Xingjia Pan; Ke Yan; Fan Tang; Wei-Shi Zheng; |
81 | Toward Fast, Flexible, and Robust Low-Light Image Enhancement Highlight: In this paper, we develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening of images in real-world low-light scenarios. |
Long Ma; Tengyu Ma; Risheng Liu; Xin Fan; Zhongxuan Luo; |
82 | Online Learning of Reusable Abstract Models for Object Goal Navigation Highlight: In this paper, we present a novel approach to incrementally learn an Abstract Model of an unknown environment, and show how an agent can reuse the learned model for tackling the Object Goal Navigation task. |
Tommaso Campari; Leonardo Lamanna; Paolo Traverso; Luciano Serafini; Lamberto Ballan; |
83 | Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos Highlight: In this paper, we propose a prompt-based framework, Bridge-Prompt (Br-Prompt), to model the semantics across adjacent actions, so that it simultaneously exploits both out-of-context and contextual information from a series of ordinal actions in instructional videos. |
Muheng Li; Lei Chen; Yueqi Duan; Zhilan Hu; Jianjiang Feng; Jie Zhou; Jiwen Lu; |
84 | SimMatch: Semi-Supervised Learning With Similarity Matching Highlight: In this paper, we introduced a new semi-supervised learning framework, SimMatch, which simultaneously considers semantic similarity and instance similarity. |
Mingkai Zheng; Shan You; Lang Huang; Fei Wang; Chen Qian; Chang Xu; |
85 | OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks Highlight: This paper proposes a new eXplanation framework, called OrphicX, for generating causal explanations for any graph neural networks (GNNs) based on learned latent causal factors. |
Wanyu Lin; Hao Lan; Hao Wang; Baochun Li; |
86 | HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network Highlight: Thus, in this work, we propose a novel 3D hand mesh estimation network HandOccNet, that can fully exploit the information at occluded regions as a secondary means to enhance image features and make them much richer. |
JoonKyu Park; Yeonguk Oh; Gyeongsik Moon; Hongsuk Choi; Kyoung Mu Lee; |
87 | EfficientNeRF Efficient Neural Radiance Fields Highlight: In this paper, we present EfficientNeRF as an efficient NeRF-based method to represent 3D scene and synthesize novel-view images. |
Tao Hu; Shu Liu; Yilun Chen; Tiancheng Shen; Jiaya Jia; |
88 | Quantifying Societal Bias Amplification in Image Captioning Highlight: We provide a comprehensive study on the strengths and limitations of each metric, and propose LIC, a metric to study captioning bias amplification. |
Yusuke Hirota; Yuta Nakashima; Noa Garcia; |
89 | Modular Action Concept Grounding in Semantic Video Prediction Highlight: Inspired by the idea of Mixture of Experts, we embody each abstract label by a structured combination of various visual concept learners and propose a novel video prediction model, Modular Action Concept Network (MAC). |
Wei Yu; Wenxin Chen; Songheng Yin; Steve Easterbrook; Animesh Garg; |
90 | StyleSwin: Transformer-Based GAN for High-Resolution Image Generation Highlight: In this paper, we seek to explore using pure transformers to build a generative adversarial network for high-resolution image synthesis. |
Bowen Zhang; Shuyang Gu; Bo Zhang; Jianmin Bao; Dong Chen; Fang Wen; Yong Wang; Baining Guo; |
91 | Reinforced Structured State-Evolution for Vision-Language Navigation Highlight: In this paper, we propose a novel Structured state-Evolution (SEvol) model to effectively maintain the environment layout clues for VLN. |
Jinyu Chen; Chen Gao; Erli Meng; Qiong Zhang; Si Liu; |
92 | Sub-Word Level Lip Reading With Visual Attention Highlight: The goal of this paper is to learn strong lip reading models that can recognise speech in silent videos. |
K R Prajwal; Triantafyllos Afouras; Andrew Zisserman; |
93 | Weakly Supervised High-Fidelity Clothing Model Generation Highlight: However, the expensive proprietary model images challenge the existing image virtual try-on methods in this scenario, as most of them need to be trained on considerable amounts of model images accompanied with paired clothes images. In this paper, we propose a cheap yet scalable weakly-supervised method called Deep Generative Projection (DGP) to address this specific scenario. |
Ruili Feng; Cheng Ma; Chengji Shen; Xin Gao; Zhenjiang Liu; Xiaobo Li; Kairi Ou; Deli Zhao; Zheng-Jun Zha; |
94 | Highly-Efficient Incomplete Large-Scale Multi-View Clustering With Consensus Bipartite Graph Highlight: Although many IMVC methods have been recently proposed, they always encounter high complexity and expensive time expenditure when applied to large-scale tasks. In this paper, we present a flexible highly-efficient incomplete large-scale multi-view clustering approach based on a bipartite graph framework to solve these issues. |
Siwei Wang; Xinwang Liu; Li Liu; Wenxuan Tu; Xinzhong Zhu; Jiyuan Liu; Sihang Zhou; En Zhu; |
95 | Towards Principled Disentanglement for Domain Generalization Highlight: Based on the transformation, we propose a primal-dual algorithm for joint representation disentanglement and domain generalization. |
Hanlin Zhang; Yi-Fan Zhang; Weiyang Liu; Adrian Weller; Bernhard Schölkopf; Eric P. Xing; |
96 | Discrete Cosine Transform Network for Guided Depth Map Super-Resolution Highlight: To solve the challenges in interpreting the working mechanism, extracting cross-modal features and RGB texture over-transferred, we propose a novel Discrete Cosine Transform Network (DCTNet) to alleviate the problems from three aspects. |
Zixiang Zhao; Jiangshe Zhang; Shuang Xu; Zudi Lin; Hanspeter Pfister; |
97 | Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing Highlight: In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. |
Xiaoxue Chen; Tianyu Liu; Hao Zhao; Guyue Zhou; Ya-Qin Zhang; |
98 | E2V-SDE: From Asynchronous Events to Fast and Continuous Video Reconstruction Via Neural Stochastic Differential Equations Highlight: However, these works do not provide videos with sufficiently good quality due to unrealistic artifacts, such as lack of temporal information from irregular and discontinuous data and deterministic modeling for continuous-time stochastic process. In this study, we overcome these difficulties by introducing a new model called E2V-SDE, which is a neural continuous time-state model consisting of a latent stochastic differential equation and a conditional distribution of the observation. |
Jongwan Kim; DongJin Lee; Byunggook Na; Seongsik Park; Sungroh Yoon; |
99 | CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning Highlight: However, the more realistic setting of class-imbalanced data – called imbalanced SSL – is largely underexplored and standard SSL tends to underperform. In this paper, we propose a novel co-learning framework (CoSSL), which decouples representation and classifier learning while coupling them closely. |
Yue Fan; Dengxin Dai; Anna Kukleva; Bernt Schiele; |
100 | Discovering Objects That Can Move Highlight: This paper studies the problem of object discovery — separating objects from the background without manual labels. |
Zhipeng Bao; Pavel Tokmakov; Allan Jabri; Yu-Xiong Wang; Adrien Gaidon; Martial Hebert; |
101 | Knowledge Mining With Scene Text for Fine-Grained Recognition Highlight: We propose an end-to-end trainable network that mines implicit contextual knowledge behind scene text image and enhances the semantics and correlation to fine-tune the image representation. |
Hao Wang; Junchao Liao; Tianheng Cheng; Zewen Gao; Hao Liu; Bo Ren; Xiang Bai; Wenyu Liu; |
102 | Self-Supervised Learning of Object Parts for Semantic Segmentation Highlight: However, learning dense representations is challenging, as in the unsupervised context it is not clear how to guide the model to learn representations that correspond to various potential object categories. In this paper, we argue that self-supervised learning of object parts is a solution to this issue. |
Adrian Ziegler; Yuki M. Asano; |
103 | Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects Highlight: In the following, we thus propose ICG, a novel probabilistic tracker that fuses region and depth information and only requires the object geometry. |
Manuel Stoiber; Martin Sundermeyer; Rudolph Triebel; |
104 | Single-Photon Structured Light Highlight: We present a novel structured light technique that uses Single Photon Avalanche Diode (SPAD) arrays to enable 3D scanning at high-frame rates and low-light levels. |
Varun Sundar; Sizhuo Ma; Aswin C. Sankaranarayanan; Mohit Gupta; |
105 | Deblurring Via Stochastic Refinement Highlight: We present an alternative framework for blind deblurring based on conditional diffusion models. |
Jay Whang; Mauricio Delbracio; Hossein Talebi; Chitwan Saharia; Alexandros G. Dimakis; Peyman Milanfar; |
106 | 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds Highlight: Observing that the 3D captioning task and the 3D grounding task contain both shared and complementary information in nature, in this work, we propose a unified framework to jointly solve these two distinct but closely related tasks in a synergistic fashion, which consists of both shared task-agnostic modules and lightweight task-specific modules. |
Daigang Cai; Lichen Zhao; Jing Zhang; Lu Sheng; Dong Xu; |
107 | TransGeo: Transformer Is All You Need for Cross-View Image Geo-Localization Highlight: The dominant CNN-based methods for cross-view image geo-localization rely on polar transform and fail to model global correlation. We propose a pure transformer-based approach (TransGeo) to address these limitations from a different perspective. |
Sijie Zhu; Mubarak Shah; Chen Chen; |
108 | R(Det)2: Randomized Decision Routing for Object Detection Highlight: In this paper, we propose a novel approach to combine decision trees and deep neural networks in an end-to-end learning manner for object detection. |
Yali Li; Shengjin Wang; |
109 | Abandoning The Bayer-Filter To See in The Dark Highlight: Due to the fact that not all photons can pass the Bayer-Filter on the sensor of the color camera, in this work, we first present a De-Bayer-Filter simulator based on deep neural networks to generate a monochrome raw image from the colored raw image. Next, a fully convolutional network is proposed to achieve the low-light image enhancement by fusing colored raw data with synthesized monochrome data. |
Xingbo Dong; Wanyan Xu; Zhihui Miao; Lan Ma; Chao Zhang; Jiewen Yang; Zhe Jin; Andrew Beng Jin Teoh; Jiajun Shen; |
110 | SASIC: Stereo Image Compression With Latent Shifts and Stereo Attention Highlight: We propose a learned method for stereo image compression that leverages the similarity of the left and right images in a stereo pair due to overlapping fields of view. |
Matthias Wödlinger; Jan Kotera; Jan Xu; Robert Sablatnig; |
111 | Exploiting Temporal Relations on Radar Perception for Autonomous Driving Highlight: To enhance the capacity of automotive radar, in this work, we exploit the temporal information from successive ego-centric bird-eye-view radar image frames for radar object recognition. |
Peizhao Li; Pu Wang; Karl Berntorp; Hongfu Liu; |
112 | Multi-Instance Point Cloud Registration By Efficient Correspondence Clustering Highlight: We propose to directly group the set of noisy correspondences into different clusters based on a distance invariance matrix. |
Weixuan Tang; Danping Zou; |
113 | Contrastive Boundary Learning for Point Cloud Segmentation Highlight: In this paper, we focus on the segmentation of scene boundaries. |
Liyao Tang; Yibing Zhan; Zhe Chen; Baosheng Yu; Dacheng Tao; |
114 | Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution Highlight: In this paper, we demonstrate that it is possible to train a GAN-based SISR model which can stably generate perceptually realistic details while inhibiting visual artifacts. |
Jie Liang; Hui Zeng; Lei Zhang; |
115 | CVNet: Contour Vibration Network for Building Extraction Highlight: Inspired by the physical vibration theory, we propose a contour vibration network (CVNet) for automatic building boundary delineation. |
Ziqiang Xu; Chunyan Xu; Zhen Cui; Xiangwei Zheng; Jian Yang; |
116 | Hyperbolic Image Segmentation Highlight: In this work, we show that hyperbolic manifolds provide a valuable alternative for image segmentation and propose a tractable formulation of hierarchical pixel-level classification in hyperbolic space. |
Mina Ghadimi Atigh; Julian Schoep; Erman Acar; Nanne van Noord; Pascal Mettes; |
117 | Forward Compatible Training for Large-Scale Embedding Retrieval Systems Highlight: In this work, we propose a new learning paradigm for representation learning: forward compatible training (FCT). |
Vivek Ramanujan; Pavan Kumar Anasosalu Vasu; Ali Farhadi; Oncel Tuzel; Hadi Pouransari; |
118 | Everything at Once – Multi-Modal Fusion Transformer for Video Retrieval Highlight: Multi-modal learning from video data has seen increased attention recently as it allows training of semantically meaningful embeddings without human annotation, enabling tasks like zero-shot retrieval and action localization. In this work, we present a multi-modal, modality agnostic fusion transformer that learns to exchange information between multiple modalities, such as video, audio, and text, and integrate them into a fused representation in a joined multi-modal embedding space. |
Nina Shvetsova; Brian Chen; Andrew Rouditchenko; Samuel Thomas; Brian Kingsbury; Rogerio S. Feris; David Harwath; James Glass; Hilde Kuehne; |
119 | Swin Transformer V2: Scaling Up Capacity and Resolution Highlight: We present techniques for scaling Swin Transformer up to 3 billion parameters and making it capable of training with images of up to 1,536×1,536 resolution. |
Ze Liu; Han Hu; Yutong Lin; Zhuliang Yao; Zhenda Xie; Yixuan Wei; Jia Ning; Yue Cao; Zheng Zhang; Li Dong; Furu Wei; Baining Guo; |
120 | Neural Template: Topology-Aware Reconstruction and Disentangled Generation of 3D Meshes Highlight: This paper introduces a novel framework called DT-Net for 3D mesh reconstruction and generation via Disentangled Topology. |
Ka-Hei Hui; Ruihui Li; Jingyu Hu; Chi-Wing Fu; |
121 | DEFEAT: Deep Hidden Feature Backdoor Attacks By Imperceptible Perturbation and Latent Representation Constraints Highlight: In this paper, we propose a novel and stealthy backdoor attack – DEFEAT. |
Zhendong Zhao; Xiaojun Chen; Yuexin Xuan; Ye Dong; Dakui Wang; Kaitai Liang; |
122 | Projective Manifold Gradient Layer for Deep Rotation Regression Highlight: In this paper, we propose a manifold-aware gradient that directly backpropagates into deep network weights. |
Jiayi Chen; Yingda Yin; Tolga Birdal; Baoquan Chen; Leonidas J. Guibas; He Wang; |
123 | CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation Highlight: In this paper, we propose a novel Cross Language Image Matching (CLIMS) framework, based on the recently introduced Contrastive Language-Image Pre-training (CLIP) model, for WSSS. |
Jinheng Xie; Xianxu Hou; Kai Ye; Linlin Shen; |
124 | Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we explore two orthogonal but complementary aspects of a video snippet, i.e., the action features and the co-occurrence features. |
Kun Xia; Le Wang; Sanping Zhou; Nanning Zheng; Wei Tang; |
125 | It’s Time for Artistic Correspondence in Music and Video Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present an approach for recommending a music track for a given video, and vice versa, based on both their temporal alignment and their correspondence at an artistic level. |
Dídac Surís; Carl Vondrick; Bryan Russell; Justin Salamon; |
126 | Mixed Differential Privacy in Computer Vision Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce AdaMix, an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data. |
Aditya Golatkar; Alessandro Achille; Yu-Xiang Wang; Aaron Roth; Michael Kearns; Stefano Soatto; |
127 | AdaFace: Quality Adaptive Margin for Face Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce another aspect of adaptiveness in the loss function, namely the image quality. |
Minchul Kim; Anil K. Jain; Xiaoming Liu; |
128 | Learning Soft Estimator of Keypoint Scale and Orientation With Probabilistic Covariant Loss Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, hard estimators struggle to handle local patches containing structures of different objects or multiple edges. In this paper, a Soft Self-Supervised Estimator (S3Esti) is proposed to overcome this problem by learning to predict multiple scales and orientations. |
Pei Yan; Yihua Tan; Shengzhou Xiong; Yuan Tai; Yansheng Li; |
129 | DN-DETR: Accelerate DETR Training By Introducing Query DeNoising Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present in this paper a novel denoising training method to speed up DETR (DEtection TRansformer) training and offer a deepened understanding of the slow convergence issue of DETR-like methods. |
Feng Li; Hao Zhang; Shilong Liu; Jian Guo; Lionel M. Ni; Lei Zhang; |
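The query-denoising idea behind this entry can be illustrated with a tiny sketch: ground-truth boxes are jittered and fed to the decoder as extra queries that must be reconstructed back to the originals. The snippet below is only a minimal illustration under assumed conventions (normalized box format and the `box_noise_scale` hyperparameter are our assumptions), not the authors' released code.

```python
# Minimal sketch (not the authors' implementation): jitter ground-truth boxes
# to build denoising queries for a DETR-style decoder. Boxes are assumed to be
# normalized (cx, cy, w, h); box_noise_scale is an assumed hyperparameter.
import torch

def make_denoising_queries(gt_boxes: torch.Tensor, box_noise_scale: float = 0.4) -> torch.Tensor:
    """gt_boxes: (N, 4) tensor of normalized (cx, cy, w, h) boxes."""
    cxcy, wh = gt_boxes[:, :2], gt_boxes[:, 2:]
    # Shift each center by up to +/- half the (scaled) box size and rescale
    # width/height by a random factor around 1.
    center_noise = (torch.rand_like(cxcy) - 0.5) * box_noise_scale * wh
    scale_noise = 1.0 + (torch.rand_like(wh) - 0.5) * 2.0 * box_noise_scale
    noised = torch.cat([cxcy + center_noise, wh * scale_noise], dim=1)
    return noised.clamp(min=1e-4, max=1.0)

# During training, the decoder would receive these noised boxes as additional
# queries and be supervised to output the original gt_boxes, which sidesteps
# bipartite matching for that part of the loss.
```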
130 | HCSC: Hierarchical Contrastive Selective Coding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In addition, the negative pairs used in these methods are not guaranteed to be semantically distinct, which could further hamper the structural correctness of learned image representations. To tackle these limitations, we propose a novel contrastive learning framework called Hierarchical Contrastive Selective Coding (HCSC). |
Yuanfan Guo; Minghao Xu; Jiawen Li; Bingbing Ni; Xuanyu Zhu; Zhenbang Sun; Yi Xu; |
131 | TransRank: Self-Supervised Video Representation Learning Via Ranking-Based Transformation Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on hard-label classification, existing RecogTrans approaches suffer from noisy supervision signals in pre-training. To mitigate this problem, we developed TransRank, a unified framework for recognizing Transformations in a Ranking formulation. |
Haodong Duan; Nanxuan Zhao; Kai Chen; Dahua Lin; |
132 | KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the theory of Non-Rigid Structure from Motion prescribes constraining the deformations for 3D reconstruction. We thus propose a new model that departs significantly from this prior work. |
David Novotny; Ignacio Rocco; Samarth Sinha; Alexandre Carlier; Gael Kerchenbaum; Roman Shapovalov; Nikita Smetanin; Natalia Neverova; Benjamin Graham; Andrea Vedaldi; |
133 | Invariant Grounding for Video Question Answering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we first take a causal look at VideoQA and argue that invariant grounding is the key to ruling out the spurious correlations. Towards this end, we propose a new learning framework, Invariant Grounding for VideoQA (IGV), to ground the question-critical scene, whose causal relations with answers are invariant across different interventions on the complement. |
Yicong Li; Xiang Wang; Junbin Xiao; Wei Ji; Tat-Seng Chua; |
134 | Prompt Distribution Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present prompt distribution learning for effectively adapting a pre-trained vision-language model to address downstream recognition tasks. |
Yuning Lu; Jianzhuang Liu; Yonggang Zhang; Yajing Liu; Xinmei Tian; |
135 | RAGO: Recurrent Graph Optimizer for Multiple Rotation Averaging Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a deep recurrent Rotation Averaging Graph Optimizer (RAGO) for Multiple Rotation Averaging (MRA). |
Heng Li; Zhaopeng Cui; Shuaicheng Liu; Ping Tan; |
136 | Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this line of research, effectively modeling task correlations is vital yet highly neglected. Therefore, we propose Arch-Graph, a transferable NAS method that predicts task-specific optimal architectures with respect to given task embeddings. |
Minbin Huang; Zhijian Huang; Changlin Li; Xin Chen; Hang Xu; Zhenguo Li; Xiaodan Liang; |
137 | On Aliased Resizing and Surprising Subtleties in GAN Evaluation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper shows that choices in low-level image processing have been an under-appreciated aspect of generative modeling. |
Gaurav Parmar; Richard Zhang; Jun-Yan Zhu; |
138 | Lepard: Learning Partial Point Cloud Matching in Rigid and Deformable Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Lepard, a Learning based approach for partial point cloud matching in rigid and deformable scenes. |
Yang Li; Tatsuya Harada; |
139 | Virtual Elastic Objects Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present Virtual Elastic Objects (VEOs): virtual objects that not only look like their real-world counterparts but also behave like them, even when subject to novel interactions. |
Hsiao-yu Chen; Edith Tretschk; Tuur Stuyck; Petr Kadlecek; Ladislav Kavan; Etienne Vouga; Christoph Lassner; |
140 | DiSparse: Disentangled Sparsification for Multitask Model Compression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose DiSparse, a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme. |
Xinglong Sun; Ali Hassani; Zhangyang Wang; Gao Huang; Humphrey Shi; |
141 | Pushing The Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make A Difference Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We seek to push the limits of a simple-but-effective pipeline for real-world few-shot image classification in practice. To this end, we explore few-shot learning from the perspective of neural architecture, as well as a three stage pipeline of pre-training on external data, meta-training with labelled few-shot tasks, and task-specific fine-tuning on unseen tasks. |
Shell Xu Hu; Da Li; Jan Stühmer; Minyoung Kim; Timothy M. Hospedales; |
142 | Opening Up Open World Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper addresses this evaluation deficit and lays out the landscape and evaluation methodology for detecting and tracking both known and unknown objects in the open-world setting. |
Yang Liu; Idil Esen Zulfikar; Jonathon Luiten; Achal Dave; Deva Ramanan; Bastian Leibe; Aljoša Ošep; Laura Leal-Taixé; |
143 | Towards Efficient and Scalable Sharpness-Aware Minimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel algorithm, LookSAM, that only periodically calculates the inner gradient ascent, to significantly reduce the additional training cost of SAM. |
Yong Liu; Siqi Mai; Xiangning Chen; Cho-Jui Hsieh; Yang You; |
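For readers unfamiliar with sharpness-aware minimization, the following is a heavily simplified sketch of the "periodic inner ascent" idea this highlight mentions: the extra gradient-ascent perturbation is recomputed only every k steps and the cached perturbation is reused in between. The reuse rule in the actual paper is more refined; the function signature, `rho`, and `k` below are assumptions made for illustration only.

```python
# Simplified sketch of periodic sharpness-aware updates (not the authors'
# LookSAM code): recompute the ascent perturbation only every k steps.
import torch

def sam_like_step(model, loss_fn, x, y, optimizer, state, step, rho=0.05, k=5):
    params = [p for p in model.parameters() if p.requires_grad]
    if step % k == 0:
        # Full inner ascent: move weights toward the worst-case neighborhood point.
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
        state["eps"] = [rho * g / norm for g in grads]   # cache the perturbation
    with torch.no_grad():                                # apply cached perturbation
        for p, e in zip(params, state["eps"]):
            p.add_(e)
    loss = loss_fn(model(x), y)                          # gradient at perturbed weights
    optimizer.zero_grad()
    loss.backward()
    with torch.no_grad():                                # restore original weights
        for p, e in zip(params, state["eps"]):
            p.sub_(e)
    optimizer.step()
    return loss.item()
```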
144 | VISTA: Boosting 3D Object Detection Via Dual Cross-VIew SpaTial Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to adaptively fuse multi-view features in a global spatial context via Dual Cross-VIew SpaTial Attention (VISTA). |
Shengheng Deng; Zhihao Liang; Lin Sun; Kui Jia; |
145 | Rethinking Deep Face Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Because the human visual system is very sensitive to faces, even minor facial changes may alter the identity and significantly degrade the perceptual quality. In this work, we argue the problems of existing models can be traced down to the two sub-tasks of the face restoration problem, i.e. face generation and face reconstruction, and the fragile balance between them. |
Yang Zhao; Yu-Chuan Su; Chun-Te Chu; Yandong Li; Marius Renn; Yukun Zhu; Changyou Chen; Xuhui Jia; |
146 | OSSO: Obtaining Skeletal Shape From Outside Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We address the problem of inferring the anatomic skeleton of a person, in an arbitrary pose, from the 3D surface of the body; i.e. we predict the inside (bones) from the outside (skin). |
Marilyn Keller; Silvia Zuffi; Michael J. Black; Sergi Pujades; |
147 | Temporal Alignment Networks for Long-Term Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The objective of this paper is a temporal alignment network that ingests long term video sequences, and associated text sentences, in order to: (1) determine if a sentence is alignable with the video; and (2) if it is alignable, then determine its alignment. |
Tengda Han; Weidi Xie; Andrew Zisserman; |
148 | Few-Shot Head Swapping in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present the Head Swapper (HeSer), which achieves few-shot head swapping in the wild through two dedicated modules. |
Changyong Shu; Hemao Wu; Hang Zhou; Jiaming Liu; Zhibin Hong; Changxing Ding; Junyu Han; Jingtuo Liu; Errui Ding; Jingdong Wang; |
149 | A Study on The Distribution of Social Biases in Self-Supervised Learning Visual Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study the biases of a varied set of SSL visual models, trained using ImageNet data, using a method and dataset designed by psychological experts to measure social biases. |
Kirill Sirotkin; Pablo Carballeira; Marcos Escudero-Viñolo; |
150 | LAR-SR: A Local Autoregressive Model for Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Based on the fact that given the structural information, the textural details in the natural images are locally related without long term dependency, in this paper we propose a novel autoregressive model-based SR approach, namely LAR-SR, which can efficiently generate realistic SR images using a novel local autoregressive (LAR) module. |
Baisong Guo; Xiaoyun Zhang; Haoning Wu; Yu Wang; Ya Zhang; Yan-Feng Wang; |
151 | Bayesian Invariant Risk Minimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our empirical evidence also provides support: IRM methods that work well in typical settings significantly deteriorate even if we slightly enlarge the model size or lessen the training data. To alleviate this issue, we propose Bayesian Invariant Risk Minimization (BIRM) by introducing Bayesian inference into the IRM. |
Yong Lin; Hanze Dong; Hao Wang; Tong Zhang; |
152 | Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we aim to mine comprehensive co-salient features with democracy and reduce background interference without introducing any extra information. |
Siyue Yu; Jimin Xiao; Bingfeng Zhang; Eng Gee Lim; |
153 | Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation Via Structure Consistency Constraint Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we focus on the low-level I2I translation, where the structure of images is highly related to their semantics. |
Jiaxian Guo; Jiachen Li; Huan Fu; Mingming Gong; Kun Zhang; Dacheng Tao; |
154 | Doodle It Yourself: Class Incremental Learning By Drawing A Few Sketches Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: For that, we present a framework that infuses (i) gradient consensus for domain invariant learning, (ii) knowledge distillation for preserving old class information, and (iii) graph attention networks for message passing between old and novel classes. |
Ayan Kumar Bhunia; Viswanatha Reddy Gajjala; Subhadeep Koley; Rohit Kundu; Aneeshan Sain; Tao Xiang; Yi-Zhe Song; |
155 | Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, instead of following previous literature, we propose Self-Supervised Predictive Learning (SSPL), a negative-free method for sound localization via explicit positive mining. |
Zengjie Song; Yuxi Wang; Junsong Fan; Tieniu Tan; Zhaoxiang Zhang; |
156 | ICON: Implicit Clothed Humans Obtained From Normals Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast, our goal is to learn the avatar from only 2D images of people in unconstrained poses. |
Yuliang Xiu; Jinlong Yang; Dimitrios Tzionas; Michael J. Black; |
157 | Comparing Correspondences: Video Prediction With Correspondence-Wise Losses Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Image prediction methods often struggle on tasks that require changing the positions of objects, such as video prediction, producing blurry images that average over the many positions that objects might occupy. In this paper, we propose a simple change to existing image similarity metrics that makes them more robust to positional errors: we match the images using optical flow, then measure the visual similarity of corresponding pixels. |
Daniel Geng; Max Hamilton; Andrew Owens; |
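The mechanism described in this entry (match the images with optical flow, then compare corresponding pixels) is simple enough to sketch. The snippet below is an illustrative PyTorch version, not the authors' code; `estimate_flow` is a placeholder standing in for any off-the-shelf flow network.

```python
# Illustrative correspondence-wise L1 loss: warp the prediction onto the target
# with optical flow, then compare corresponding pixels. `estimate_flow` is a
# placeholder for any flow estimator returning (B, 2, H, W) pixel offsets.
import torch
import torch.nn.functional as F

def flow_warp(img, flow):
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(img.device)      # (2, H, W)
    coords = base.unsqueeze(0) + flow                                # sampling locations
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0                          # normalize to [-1, 1]
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                             # (B, H, W, 2)
    return F.grid_sample(img, grid, align_corners=True)

def correspondence_wise_l1(pred, target, estimate_flow):
    flow = estimate_flow(target, pred)       # flow from target pixels to pred pixels
    pred_aligned = flow_warp(pred, flow)     # pull matched pred pixels onto target grid
    return (pred_aligned - target).abs().mean()
```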
158 | Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a generic perception architecture named Uni-Perceiver, which processes a variety of modalities and tasks with unified modeling and shared parameters. |
Xizhou Zhu; Jinguo Zhu; Hao Li; Xiaoshi Wu; Hongsheng Li; Xiaohua Wang; Jifeng Dai; |
159 | The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To build computer vision systems that truly solve real-world problems at global scale, we need benchmarks that fully capture real-world complexity, including geographic domain shift, long-tailed distributions, and data noise. We propose urban forest monitoring as an ideal testbed for studying and improving upon these computer vision challenges, while simultaneously working towards filling a crucial environmental and societal need. |
Sara Beery; Guanhang Wu; Trevor Edwards; Filip Pavetic; Bo Majewski; Shreyasee Mukherjee; Stanley Chan; John Morgan; Vivek Rathod; Jonathan Huang; |
160 | On The Instability of Relative Pose Estimation and RANSAC’s Role Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: These cases arise due to numerical instability of the 5- and 7-point minimal problems. This paper characterizes these instabilities, both in terms of minimal world scene configurations that lead to infinite condition number in epipolar estimation, and also in terms of the related minimal image feature pair correspondence configurations. |
Hongyi Fan; Joe Kileel; Benjamin Kimia; |
161 | Shape From Polarization for Complex Scenes in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a new data-driven approach with physics-based priors to scene-level normal estimation from a single polarization image. |
Chenyang Lei; Chenyang Qi; Jiaxin Xie; Na Fan; Vladlen Koltun; Qifeng Chen; |
162 | Real-Time, Accurate, and Consistent Video Semantic Segmentation Via Unsupervised Adaptation and Cross-Unit Deployment on Mobile Device Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This demonstration showcases our innovations on efficient, accurate, and temporally consistent video semantic segmentation on mobile device. |
Hyojin Park; Alan Yessenbayev; Tushar Singhal; Navin Kumar Adhikari; Yizhe Zhang; Shubhankar Mangesh Borse; Hong Cai; Nilesh Prasad Pandey; Fei Yin; Frank Mayer; Balaji Calidas; Fatih Porikli; |
163 | SNUG: Self-Supervised Neural Dynamic Garments Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a self-supervised method to learn dynamic 3D deformations of garments worn by parametric human bodies. |
Igor Santesteban; Miguel A. Otaduy; Dan Casas; |
164 | Towards Fewer Annotations: Active Learning Via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a simple region-based active learning approach for semantic segmentation under a domain shift, aiming to automatically query a small partition of image regions to be labeled while maximizing segmentation performance. |
Binhui Xie; Longhui Yuan; Shuang Li; Chi Harold Liu; Xinjing Cheng; |
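The acquisition idea suggested by this entry (rank image regions by how mixed and how uncertain the predictions inside them are, then query only the top regions for labels) can be sketched compactly. The snippet below is a hedged interpretation, not the released implementation: the exact impurity definition, region shape, and kernel size `k` are assumptions and may differ from the paper.

```python
# Hedged sketch of region scoring for active semantic-segmentation labeling:
# combine the class impurity of hard predictions in a kxk region with the mean
# predictive uncertainty inside the same region. High scores are queried first.
import torch
import torch.nn.functional as F

def region_scores(logits: torch.Tensor, k: int = 33) -> torch.Tensor:
    """logits: (B, C, H, W) -> per-pixel region score of shape (B, H, W)."""
    probs = logits.softmax(dim=1)
    num_classes = logits.shape[1]
    # Per-pixel predictive uncertainty (entropy of the softmax).
    pixel_entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1, keepdim=True)
    # Class fractions of the hard predictions inside each kxk region.
    hard = probs.argmax(dim=1)                                    # (B, H, W)
    onehot = F.one_hot(hard, num_classes).permute(0, 3, 1, 2).float()
    fractions = F.avg_pool2d(onehot, k, stride=1, padding=k // 2)
    impurity = -(fractions * fractions.clamp_min(1e-12).log()).sum(dim=1, keepdim=True)
    # Mean uncertainty inside the same region.
    region_uncert = F.avg_pool2d(pixel_entropy, k, stride=1, padding=k // 2)
    return (impurity * region_uncert).squeeze(1)
```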
165 | Glass Segmentation Using Intensity and Spectral Polarization Cues Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we exploit that the light-matter interactions on glass materials provide unique intensity-polarization cues for each observed wavelength of light. |
Haiyang Mei; Bo Dong; Wen Dong; Jiaxi Yang; Seung-Hwan Baek; Felix Heide; Pieter Peers; Xiaopeng Wei; Xin Yang; |
166 | CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We observe in the real world that humans are capable of mapping the visual concepts learnt from 2D images to understand the 3D world. Encouraged by this insight, we propose CrossPoint, a simple cross-modal contrastive learning approach to learn transferable 3D point cloud representations. |
Mohamed Afham; Isuru Dissanayake; Dinithi Dissanayake; Amaya Dharmasiri; Kanchana Thilakarathna; Ranga Rodrigo; |
167 | Few Shot Generative Model Adaption Via Relaxed Spatial Structural Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, existing methods are prone to model overfitting and collapse in the extremely few-shot setting (less than 10). To solve this problem, we propose a relaxed spatial structural alignment (RSSA) method to calibrate the target generative models during the adaption. |
Jiayu Xiao; Liang Li; Chaofei Wang; Zheng-Jun Zha; Qingming Huang; |
168 | Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Despite great effort made in single-source domain adaptation, a more generalized task with multiple source domains remains not well explored, due to knowledge degradation during their combination. To address this issue, we propose a novel approach, namely target-relevant knowledge preservation (TRKP), to unsupervised multi-source DAOD. |
Jiaxi Wu; Jiaxin Chen; Mengzhe He; Yiru Wang; Bo Li; Bingqi Ma; Weihao Gan; Wei Wu; Yali Wang; Di Huang; |
169 | Pyramid Grafting Network for One-Stage High Resolution Saliency Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most existing SOD models designed for low-resolution input perform poorly on high-resolution images due to the contradiction between the sampling depth and the receptive field size. Aiming at resolving this contradiction, we propose a novel one-stage framework called Pyramid Grafting Network (PGNet), using transformer and CNN backbones to extract features from different resolution images independently and then graft the features from the transformer branch to the CNN branch. |
Chenxi Xie; Changqun Xia; Mingcan Ma; Zhirui Zhao; Xiaowu Chen; Jia Li; |
170 | A Style-Aware Discriminator for Controllable Image Translation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This limitation largely arises because labels do not consider the semantic distance. To mitigate such problems, we propose a style-aware discriminator that acts as a critic as well as a style encoder to provide conditions. |
Kunhee Kim; Sanghun Park; Eunyeong Jeon; Taehun Kim; Daijin Kim; |
171 | Non-Iterative Recovery From Nonlinear Observations Using Generative Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we aim to estimate the direction of an underlying signal from its nonlinear observations following the semi-parametric single index model (SIM). |
Jiulong Liu; Zhaoqiang Liu; |
172 | Incremental Cross-View Mutual Distillation for Self-Supervised Medical CT Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, this paper builds a novel medical slice synthesis task to increase the inter-slice resolution. Considering that the ground-truth intermediate medical slices are always absent in clinical practice, we introduce the incremental cross-view mutual distillation strategy to accomplish this task in a self-supervised learning manner. |
Chaowei Fang; Liang Wang; Dingwen Zhang; Jun Xu; Yixuan Yuan; Junwei Han; |
173 | Enhancing Adversarial Training With Second-Order Statistics of Weights Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we show that treating model weights as random variables allows for enhancing adversarial training through Second-Order Statistics Optimization (S^2O) with respect to the weights. |
Gaojie Jin; Xinping Yi; Wei Huang; Sven Schewe; Xiaowei Huang; |
174 | Partially Does It: Towards Scene-Level FG-SBIR With Partial Input Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: A quick pilot study reveals: (i) a scene sketch does not necessarily contain all objects in the corresponding photo, due to the subjective holistic interpretation of scenes, (ii) there exist significant empty (white) regions as a result of object-level abstraction, and (iii) existing scene-level fine-grained sketch-based image retrieval methods collapse as scene sketches become more partial. To solve this "partial" problem, we advocate for a simple set-based approach using optimal transport (OT) to model cross-modal region associativity in a partially-aware fashion. |
Pinaki Nath Chowdhury; Ayan Kumar Bhunia; Viswanatha Reddy Gajjala; Aneeshan Sain; Tao Xiang; Yi-Zhe Song; |
175 | Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our findings motivate us to simplify MoCo v2 via the removal of its dictionary as well as momentum. |
Chaoning Zhang; Kang Zhang; Trung X. Pham; Axi Niu; Zhinan Qiao; Chang D. Yoo; In So Kweon; |
176 | Moving Window Regression: A Novel Approach to Ordinal Regression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A novel ordinal regression algorithm, called moving window regression (MWR), is proposed in this paper. |
Nyeong-Ho Shin; Seon-Ho Lee; Chang-Su Kim; |
177 | UniCoRN: A Unified Conditional Image Repainting Network Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing methods based on two-phase architecture design assume dependency between phases and cause color-image incongruity. To solve these problems, we propose a novel Unified Conditional image Repainting Network (UniCoRN). |
Jimeng Sun; Shuchen Weng; Zheng Chang; Si Li; Boxin Shi; |
178 | Forecasting Characteristic 3D Poses of Human Actions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To predict characteristic poses, we propose a probabilistic approach that models the possible multi-modality in the distribution of likely characteristic poses. |
Christian Diller; Thomas Funkhouser; Angela Dai; |
179 | ACPL: Anti-Curriculum Pseudo-Labelling for Semi-Supervised Medical Image Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, unlike traditional methods that select confident pseudo-labels by thresholding, we propose a new SSL algorithm, called anti-curriculum pseudo-labelling (ACPL), which introduces novel techniques to select informative unlabelled samples, improving training balance and allowing the model to work for both multi-label and multi-class problems, and to estimate pseudo labels by an accurate ensemble of classifiers (improving pseudo label accuracy). |
Fengbei Liu; Yu Tian; Yuanhong Chen; Yuyuan Liu; Vasileios Belagiannis; Gustavo Carneiro; |
180 | Learning to Deblur Using Light Field Generated and Real Defocus Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel deep defocus deblurring network that leverages the strength and overcomes the shortcoming of light fields. |
Lingyan Ruan; Bin Chen; Jizhou Li; Miuling Lam; |
181 | Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Different from related methods, we propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block. |
Nicolae-Cătălin Ristea; Neelu Madan; Radu Tudor Ionescu; Kamal Nasrollahi; Fahad Shahbaz Khan; Thomas B. Moeslund; Mubarak Shah; |
182 | Safe Self-Refinement for Transformer-Based Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we propose a novel solution named SSRT (Safe Self-Refinement for Transformer-based domain adaptation), which brings improvement from two aspects. |
Tao Sun; Cheng Lu; Tianshuo Zhang; Haibin Ling; |
183 | Density-Preserving Deep Point Cloud Compression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Local density of point clouds is crucial for representing local details, but has been overlooked by existing point cloud compression methods. To address this, we propose a novel deep point cloud compression method that preserves local density information. |
Yun He; Xinlin Ren; Danhang Tang; Yinda Zhang; Xiangyang Xue; Yanwei Fu; |
184 | StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We apply style transfer on mesh reconstructions of indoor scenes. |
Lukas Höllein; Justin Johnson; Matthias Nießner; |
185 | Which Model To Transfer? Finding The Needle in The Growing Haystack Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: As these repositories keep growing exponentially, efficiently selecting a good model for the task at hand becomes paramount. We provide a formalization of this problem through a familiar notion of regret and introduce the predominant strategies, namely task-agnostic (e.g. ranking models by their ImageNet performance) and task-aware search strategies (such as linear or kNN evaluation). |
Cedric Renggli; André Susano Pinto; Luka Rimanic; Joan Puigcerver; Carlos Riquelme; Ce Zhang; Mario Lučić; |
186 | Fast and Unsupervised Action Boundary Detection for Action Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To deal with the great number of untrimmed videos produced every day, we propose an efficient unsupervised action segmentation method by detecting boundaries, named action boundary detection (ABD). |
Zexing Du; Xue Wang; Guoqing Zhou; Qing Wang; |
187 | Class-Incremental Learning With Strong Pre-Trained Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a 2-stage training scheme, i) feature augmentation – cloning part of the backbone and fine-tuning it on the novel data, and ii) fusion – combining the base and novel classifiers into a unified classifier. |
Tz-Ying Wu; Gurumurthy Swaminathan; Zhizhong Li; Avinash Ravichandran; Nuno Vasconcelos; Rahul Bhotika; Stefano Soatto; |
188 | Robust Optimization As Data Augmentation for Large-Scale Graphs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training. |
Kezhi Kong; Guohao Li; Mucong Ding; Zuxuan Wu; Chen Zhu; Bernard Ghanem; Gavin Taylor; Tom Goldstein; |
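The "gradient-based adversarial augmentation of node features" in this highlight amounts to a small inner ascent loop on a feature perturbation while model gradients accumulate. The sketch below conveys that idea only; the PyG-style model signature `model(x, edge_index)`, the step size, and the number of ascent steps are assumptions rather than the released FLAG code.

```python
# Hedged sketch of gradient-based adversarial feature augmentation for graph
# training: perturb node features by a few signed-gradient ascent steps while
# accumulating (averaged) model gradients, then take one optimizer step.
import torch

def flag_style_step(model, node_feats, edge_index, labels, loss_fn, optimizer,
                    step_size=1e-3, n_ascent=3):
    optimizer.zero_grad()
    delta = torch.zeros_like(node_feats).uniform_(-step_size, step_size)
    delta.requires_grad_(True)
    for _ in range(n_ascent):
        loss = loss_fn(model(node_feats + delta, edge_index), labels) / n_ascent
        loss.backward()  # accumulates grads on both the weights and delta
        with torch.no_grad():
            # Ascend on the perturbation; the weights descend later.
            delta.add_(step_size * delta.grad.sign())
            delta.grad.zero_()
    optimizer.step()
    return loss.item()
```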
189 | Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast to the literature, we propose a family of robust structured declarative classifiers for point cloud classification, where the internal constrained optimization mechanism can effectively defend adversarial attacks through implicit gradients. |
Kaidong Li; Ziming Zhang; Cuncong Zhong; Guanghui Wang; |
190 | PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Most indoor 3D scene reconstruction methods focus on recovering 3D geometry and scene layout. In this work, we go beyond this to propose PhotoScene, a framework that takes input image(s) of a scene along with approximately aligned CAD geometry (either reconstructed automatically or manually specified) and builds a photorealistic digital twin with high-quality materials and similar lighting. |
Yu-Ying Yeh; Zhengqin Li; Yannick Hold-Geoffroy; Rui Zhu; Zexiang Xu; Miloš Hašan; Kalyan Sunkavalli; Manmohan Chandraker; |
191 | Improving The Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, prior works utilize simple image transformations such as resizing, which limits input diversity. To tackle this limitation, we propose the object-based diverse input (ODI) method that draws an adversarial image on a 3D object and induces the rendered image to be classified as the target class. |
Junyoung Byun; Seungju Cho; Myung-Joon Kwon; Hee-Seon Kim; Changick Kim; |
192 | IRON: Inverse Rendering By Optimizing Neural SDFs and Materials From Photometric Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a neural inverse rendering pipeline called IRON that operates on photometric images and outputs high-quality 3D content in the format of triangle meshes and material textures readily deployable in existing graphics pipelines. |
Kai Zhang; Fujun Luan; Zhengqi Li; Noah Snavely; |
193 | ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present ObjectFolder 2.0, a large-scale, multisensory dataset of common household objects in the form of implicit neural representations that significantly enhances ObjectFolder 1.0 in three aspects. |
Ruohan Gao; Zilin Si; Yen-Yu Chang; Samuel Clarke; Jeannette Bohg; Li Fei-Fei; Wenzhen Yuan; Jiajun Wu; |
194 | Versatile Multi-Modal Pre-Training for Human-Centric Perception Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Therefore, it is desirable to have a versatile pre-train model that serves as a foundation for data-efficient downstream tasks transfer. To this end, we propose the Human-Centric Multi-Modal Contrastive Learning framework HCMoCo that leverages the multi-modal nature of human data (e.g. RGB, depth, 2D keypoints) for effective representation learning. |
Fangzhou Hong; Liang Pan; Zhongang Cai; Ziwei Liu; |
195 | 360MonoDepth: High-Resolution 360° Monocular Depth Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a flexible framework for monocular depth estimation from high-resolution 360° images using tangent images. |
Manuel Rey-Area; Mingze Yuan; Christian Richardt; |
196 | Splicing ViT Features for Semantic Appearance Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a method for semantically transferring the visual appearance of one natural image to another. |
Narek Tumanyan; Omer Bar-Tal; Shai Bagon; Tali Dekel; |
197 | Contrastive Regression for Domain Adaptation on Gaze Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel gaze adaptation approach, namely Contrastive Regression Gaze Adaptation (CRGA), for generalizing gaze estimation on the target domain in an unsupervised manner. |
Yaoming Wang; Yangzhou Jiang; Jin Li; Bingbing Ni; Wenrui Dai; Chenglin Li; Hongkai Xiong; Teng Li; |
198 | MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose MUSE-VAE, a new probabilistic modeling framework based on a cascade of Conditional VAEs, which tackles the long-term, uncertain trajectory prediction task using a coarse-to-fine multi-factor forecasting architecture. |
Mihee Lee; Samuel S. Sohn; Seonghyeon Moon; Sejong Yoon; Mubbasir Kapadia; Vladimir Pavlovic; |
199 | Multi-View Consistent Generative Adversarial Networks for 3D-Aware Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, one key challenge remains: existing approaches lack geometry constraints, hence usually fail to generate multi-view consistent images. To address this challenge, we propose Multi-View Consistent Generative Adversarial Networks (MVCGAN) for high-quality 3D-aware image synthesis with geometry constraints. |
Xuanmeng Zhang; Zhedong Zheng; Daiheng Gao; Bang Zhang; Pan Pan; Yi Yang; |
200 | Putting People in Their Place: Monocular Regression of 3D People in Depth Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Given an image with multiple people, our goal is to directly regress the pose and shape of all the people as well as their relative depth. |
Yu Sun; Wu Liu; Qian Bao; Yili Fu; Tao Mei; Michael J. Black; |
201 | POCO: Point Convolution for Surface Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Besides, relying on fixed patch sizes may require discretization tuning. To address these issues, we propose to use point cloud convolutions and compute latent vectors at each input point. |
Alexandre Boulch; Renaud Marlet; |
202 | Memory-Augmented Non-Local Attention for Video Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a simple yet effective video super-resolution method that aims at generating high-fidelity high-resolution (HR) videos from low-resolution (LR) ones. |
Jiyang Yu; Jingen Liu; Liefeng Bo; Tao Mei; |
203 | Neural Texture Extraction and Distribution for Controllable Person Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Observing that person images are highly structured, we propose to generate desired images by extracting and distributing semantic entities of reference images. |
Yurui Ren; Xiaoqing Fan; Ge Li; Shan Liu; Thomas H. Li; |
204 | Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Today’s VidSGG models are all proposal-based methods, i.e., they first generate numerous paired subject-object snippets as proposals, and then conduct predicate classification for each proposal. In this paper, we argue that this prevalent proposal-based framework has three inherent drawbacks: 1) The ground-truth predicate labels for proposals are partially correct. |
Kaifeng Gao; Long Chen; Yulei Niu; Jian Shao; Jun Xiao; |
205 | Transformer-Empowered Multi-Scale Contextual Matching and Aggregation for Multi-Contrast MRI Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, existing methods still have two shortcomings: (1) they neglect that the multi-contrast features at different scales contain different anatomical details and hence lack effective mechanisms to match and fuse these features for better reconstruction; and (2) they are still deficient in capturing long-range dependencies, which are essential for the regions with complicated anatomical structures. We propose a novel network to comprehensively address these problems by developing a set of innovative Transformer-empowered multi-scale contextual matching and aggregation techniques; we call it McMRSR. |
Guangyuan Li; Jun Lv; Yapeng Tian; Qi Dou; Chengyan Wang; Chenliang Xu; Jing Qin; |
206 | GazeOnce: Real-Time Multi-Person Gaze Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the first one-stage end-to-end gaze estimation method, GazeOnce, which is capable of simultaneously predicting gaze directions for multiple faces (>10) in an image. |
Mingfang Zhang; Yunfei Liu; Feng Lu; |
207 | GateHUB: Gated History Unit With Background Suppression for Online Action Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: It is therefore important to accentuate parts of the history that are more informative to the prediction of the current frame. We present GateHUB, Gated History Unit with Background Suppression, that comprises a novel position-guided gated cross-attention mechanism to enhance or suppress parts of the history as per how informative they are for current frame prediction. |
Junwen Chen; Gaurav Mittal; Ye Yu; Yu Kong; Mei Chen; |
208 | Few-Shot Font Generation By Learning Fine-Grained Local Styles Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new font generation approach by learning 1) the fine-grained local styles from references, and 2) the spatial correspondence between the content and reference glyphs. |
Licheng Tang; Yiyang Cai; Jiaming Liu; Zhibin Hong; Mingming Gong; Minhu Fan; Junyu Han; Jingtuo Liu; Errui Ding; Jingdong Wang; |
209 | Bridging Video-Text Retrieval With Multiple Choice Questions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we enable fine-grained video-text interactions while maintaining high efficiency for retrieval via a novel pretext task, dubbed as Multiple Choice Questions (MCQ), where a parametric module BridgeFormer is trained to answer the "questions" constructed by the text features via resorting to the video features. |
Yuying Ge; Yixiao Ge; Xihui Liu; Dian Li; Ying Shan; Xiaohu Qie; Ping Luo; |
210 | Depth-Aware Generative Adversarial Network for Talking Head Video Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a self-supervised face-depth learning method to automatically recover dense 3D facial geometry (i.e. depth) from the face videos without the requirement of any expensive 3D annotation data. |
Fa-Ting Hong; Longhao Zhang; Li Shen; Dan Xu; |
211 | Dual-Path Image Inpainting With Auxiliary GAN Inversion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we develop a dual-path inpainting network with inversion path and feed-forward path, in which inversion path provides auxiliary information to help feed-forward path. |
Wentao Wang; Li Niu; Jianfu Zhang; Xue Yang; Liqing Zhang; |
212 | DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To these ends, we propose a simpler but more effective Deep Fusion Generative Adversarial Network (DF-GAN). |
Ming Tao; Hao Tang; Fei Wu; Xiao-Yuan Jing; Bing-Kun Bao; Changsheng Xu; |
213 | Generative Flows With Invertible Attentions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To fill the gap, this paper introduces two types of invertible attention mechanisms, i.e., map-based and transformer-based attentions, for both unconditional and conditional generative flows. |
Rhea Sanjay Sukthanker; Zhiwu Huang; Suryansh Kumar; Radu Timofte; Luc Van Gool; |
214 | Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an effective solution by simply clipping the Euclidean feature magnitude while training HNNs. |
Yunhui Guo; Xudong Wang; Yubei Chen; Stella X. Yu; |
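The fix proposed in this entry is easy to illustrate: cap the Euclidean norm of the backbone feature before it is mapped into hyperbolic space. The snippet below is a minimal sketch, not the authors' code; the clipping radius `r` and curvature `c` are assumed values, and the exponential map shown is the standard curvature-c map at the origin of the Poincaré ball.

```python
# Minimal sketch of feature-norm clipping before the hyperbolic mapping used by
# hyperbolic neural-network classifiers. r and c are assumed hyperparameters.
import math
import torch

def clip_feature(x: torch.Tensor, r: float = 1.0) -> torch.Tensor:
    """Cap the Euclidean norm of each feature vector at r."""
    norm = x.norm(dim=-1, keepdim=True).clamp_min(1e-12)
    return torch.where(norm > r, x * (r / norm), x)

def expmap0(v: torch.Tensor, c: float = 1.0) -> torch.Tensor:
    """Exponential map at the origin of the curvature-c Poincaré ball."""
    sc = math.sqrt(c)
    norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-12)
    return torch.tanh(sc * norm) * v / (sc * norm)

# Usage: z = expmap0(clip_feature(backbone_features, r=1.0), c=1.0), followed by
# any hyperbolic classifier head operating on z.
```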
215 | Estimating Fine-Grained Noise Model Via Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we combine both noise modeling and estimation, and propose an innovative noise model estimation and noise synthesis pipeline for realistic noisy image generation. |
Yunhao Zou; Ying Fu; |
216 | DiffPoseNet: Direct Differentiable Camera Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce NFlowNet, a network for normal flow estimation that is used to enforce robust and direct constraints. |
Chethan M. Parameshwara; Gokul Hari; Cornelia Fermüller; Nitin J. Sanket; Yiannis Aloimonos; |
217 | The Flag Median and FlagIRLS Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While a number of different subspace prototypes have been described, the calculation of some of these prototypes has proven to be computationally expensive while other prototypes are affected by outliers and produce highly imperfect clustering on noisy data. This work proposes a new subspace prototype, the flag median, and introduces the FlagIRLS algorithm for its calculation. |
Nathan Mankovich; Emily J. King; Chris Peterson; Michael Kirby; |
218 | Implicit Feature Decoupling With Depthwise Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Depthwise Quantization (DQ) where quantization is applied to a decomposed sub-tensor along the feature axis of weak statistical dependence. |
Iordanis Fostiropoulos; Barry Boehm; |
219 | Graph-Context Attention Networks for Size-Varied Deep Graph Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To tackle this, we first propose to formulate the combinatorial problem of graph matching as an Integer Linear Programming (ILP) problem, which is more flexible and efficient to facilitate comparing graphs of varied sizes. A novel Graph-context Attention Network (GCAN), which jointly captures intrinsic graph structure and cross-graph information for improving the discrimination of node features, is then proposed and trained to resolve this ILP problem with node correspondence supervision. |
Zheheng Jiang; Hossein Rahmani; Plamen Angelov; Sue Black; Bryan M. Williams; |
220 | FENeRF: Face Editing in Neural Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: 3D-aware GAN methods can maintain view consistency but their generated images are not locally editable. To overcome these limitations, we propose FENeRF, a 3D-aware generator that can produce view-consistent and locally-editable portrait images. |
Jingxiang Sun; Xuan Wang; Yong Zhang; Xiaoyu Li; Qi Zhang; Yebin Liu; Jue Wang; |
221 | CoNeRF: Controllable Neural Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our key idea is to treat the attributes as latent variables that are regressed by the neural network given the scene encoding. |
Kacper Kania; Kwang Moo Yi; Marek Kowalski; Tomasz Trzciński; Andrea Tagliasacchi; |
222 | Noise2NoiseFlow: Realistic Camera Noise Modeling Without Clean Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a framework for training a noise model and a denoiser simultaneously while relying only on pairs of noisy images rather than noisy/clean paired image data. |
Ali Maleky; Shayan Kousha; Michael S. Brown; Marcus A. Brubaker; |
223 | ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we take a step towards computer-aided waste detection and present the first in-the-wild industrial-grade waste detection and segmentation dataset, ZeroWaste. |
Dina Bashkirova; Mohamed Abdelfattah; Ziliang Zhu; James Akl; Fadi Alladkani; Ping Hu; Vitaly Ablavsky; Berk Calli; Sarah Adel Bargal; Kate Saenko; |
224 | Remember Intentions: Retrospective-Memory-Based Trajectory Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To provide a more explicit link between the current situation and the seen instances, we imitate the mechanism of retrospective memory in neuropsychology and propose MemoNet, an instance-based approach that predicts the movement intentions of agents by looking for similar scenarios in the training data. |
Chenxin Xu; Weibo Mao; Wenjun Zhang; Siheng Chen; |
225 | Measuring Compositional Consistency for Video Question Answering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we develop a question decomposition engine that programmatically deconstructs a compositional question into a directed acyclic graph of sub-questions. |
Mona Gandhi; Mustafa Omer Gul; Eva Prakash; Madeleine Grunde-McLaughlin; Ranjay Krishna; Maneesh Agrawala; |
226 | Category Contrast for Unsupervised Domain Adaptation in Visual Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we explore the idea of instance contrastive learning in unsupervised domain adaptation (UDA) and propose a novel Category Contrast technique (CaCo) that introduces semantic priors on top of instance discrimination for visual UDA tasks. |
Jiaxing Huang; Dayan Guan; Aoran Xiao; Shijian Lu; Ling Shao; |
227 | SwapMix: Diagnosing and Regularizing The Over-Reliance on Visual Context in Visual Question Answering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we study the robustness of VQA models from a novel perspective: visual context. |
Vipul Gupta; Zhuowan Li; Adam Kortylewski; Chenyu Zhang; Yingwei Li; Alan Yuille; |
228 | UNIST: Unpaired Neural Implicit Shape Translation Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce UNIST, the first deep neural implicit model for general-purpose, unpaired shape-to-shape translation, in both 2D and 3D domains. |
Qimin Chen; Johannes Merz; Aditya Sanghi; Hooman Shayani; Ali Mahdavi-Amiri; Hao Zhang; |
229 | Local-Adaptive Face Recognition Via Graph-Based Meta-Clustering and Regularized Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To support continuous learning and fill the last-mile quality gap, we introduce a new problem setup called Local-Adaptive Face Recognition (LaFR). |
Wenbin Zhu; Chien-Yi Wang; Kuan-Lun Tseng; Shang-Hong Lai; Baoyuan Wang; |
230 | The DEVIL Is in The Details: A Diagnostic Evaluation Benchmark for Video Inpainting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although attributes such as camera and background scene motion inherently change the difficulty of the task and affect methods differently, existing evaluation schemes fail to control for them, thereby providing minimal insight into inpainting failure modes. To address this gap, we propose the Diagnostic Evaluation of Video Inpainting on Landscapes (DEVIL) benchmark, which consists of two contributions: (i) a novel dataset of videos and masks labeled according to several key inpainting failure modes, and (ii) an evaluation scheme that samples slices of the dataset characterized by a fixed content attribute, and scores performance on each slice according to reconstruction, realism, and temporal consistency quality. |
Ryan Szeto; Jason J. Corso; |
231 | Mutual Information-Driven Pan-Sharpening Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This leads to information redundancy not being handled well, which further limits the performance of these methods. To address the above issue, we propose a novel mutual information-driven Pan-sharpening framework in this paper. |
Man Zhou; Keyu Yan; Jie Huang; Zihe Yang; Xueyang Fu; Feng Zhao; |
232 | Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Query-modulated Refinement Network (QRNet) to address the inconsistent issue by adjusting intermediate features in the visual backbone with a novel Query-aware Dynamic Attention (QD-ATT) mechanism and query-aware multiscale fusion. |
Jiabo Ye; Junfeng Tian; Ming Yan; Xiaoshan Yang; Xuwu Wang; Ji Zhang; Liang He; Xin Lin; |
233 | A Framework for Learning Ante-Hoc Explainable Models Via Concepts Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Self-explaining deep models are designed to learn the latent concept-based explanations implicitly during training, which eliminates the requirement of any post-hoc explanation generation technique. In this work, we propose one such model that appends an explanation generation module on top of any basic network and jointly trains the whole module that shows high predictive performance and generates meaningful explanations in terms of concepts. |
Anirban Sarkar; Deepak Vijaykeerthy; Anindya Sarkar; Vineeth N Balasubramanian; |
234 | Generating Useful Accident-Prone Driving Scenarios Via A Learned Traffic Prior Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we introduce STRIVE, a method to automatically generate challenging scenarios that cause a given planner to produce undesirable behavior, like collisions. |
Davis Rempe; Jonah Philion; Leonidas J. Guibas; Sanja Fidler; Or Litany; |
235 | FLOAT: Factorized Learning of Object Attributes for Improved Multi-Object Multi-Part Scene Parsing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose FLOAT, a factorized label space framework for scalable multi-object multi-part parsing. |
Rishubh Singh; Pranav Gupta; Pradeep Shenoy; Ravikiran Sarvadevabhatla; |
236 | Efficient Geometry-Aware 3D Generative Adversarial Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. |
Eric R. Chan; Connor Z. Lin; Matthew A. Chan; Koki Nagano; Boxiao Pan; Shalini De Mello; Orazio Gallo; Leonidas J. Guibas; Jonathan Tremblay; Sameh Khamis; Tero Karras; Gordon Wetzstein; |
237 | DO-GAN: A Double Oracle Framework for Generative Adversarial Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new approach to train Generative Adversarial Networks (GANs) where we deploy a double-oracle framework using the generator and discriminator oracles. |
Aye Phyu Phyu Aung; Xinrun Wang; Runsheng Yu; Bo An; Senthilnath Jayavelu; Xiaoli Li; |
238 | Dancing Under The Stars: Video Denoising in Starlight Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we demonstrate photorealistic video under starlight (no moon present, <0.001 lux) for the first time. |
Kristina Monakhova; Stephan R. Richter; Laura Waller; Vladlen Koltun; |
239 | FocusCut: Diving Into A Focus View in Interactive Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the global view makes the model lose focus from later clicks, and is not in line with user intentions. In this paper, we dive into the view of clicks’ eyes to endow them with the decisive role in object details again. |
Zheng Lin; Zheng-Peng Duan; Zhao Zhang; Chun-Le Guo; Ming-Ming Cheng; |
240 | Medial Spectral Coordinates for 3D Shape Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Yet, surprisingly, such coordinates have thus far typically considered only local surface positional or derivative information. In the present article, we propose to equip spectral coordinates with medial (object width) information, so as to enrich them. |
Morteza Rezanejad; Mohammad Khodadad; Hamidreza Mahyar; Herve Lombaert; Michael Gruninger; Dirk Walther; Kaleem Siddiqi; |
241 | Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present Contextualized Spatio-Temporal Contrastive Learning (ConST-CL) to effectively learn spatio-temporally fine-grained video representations via self-supervision. |
Liangzhe Yuan; Rui Qian; Yin Cui; Boqing Gong; Florian Schroff; Ming-Hsuan Yang; Hartwig Adam; Ting Liu; |
242 | Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we demonstrate that self-attention-based architectures (e.g., Transformers) are more robust to distribution shifts and hence improve federated learning over heterogeneous data. |
Liangqiong Qu; Yuyin Zhou; Paul Pu Liang; Yingda Xia; Feifei Wang; Ehsan Adeli; Li Fei-Fei; Daniel Rubin; |
243 | APES: Articulated Part Extraction From Sprite Sheets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Creating these puppets requires partitioning characters into independently moving parts. In this work, we present a method to automatically identify such articulated parts from a small set of character poses shown in a sprite sheet, which is an illustration of the character that artists often draw before puppet creation. |
Zhan Xu; Matthew Fisher; Yang Zhou; Deepali Aneja; Rushikesh Dudhat; Li Yi; Evangelos Kalogerakis; |
244 | Dressing in The Wild By Watching Dance Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper, therefore, attends to virtual try-on in real-world scenes and brings essential improvements in authenticity and naturalness especially for loose garment (e.g., skirts, formal dresses), challenging poses (e.g., cross arms, bent legs), and cluttered backgrounds. |
Xin Dong; Fuwei Zhao; Zhenyu Xie; Xijin Zhang; Daniel K. Du; Min Zheng; Xiang Long; Xiaodan Liang; Jianchao Yang; |
245 | SPAct: Self-Supervised Privacy Preservation for Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: For the first time, we present a novel training framework which removes privacy information from input video in a self-supervised manner without requiring privacy labels. |
Ishan Rajendrakumar Dave; Chen Chen; Mubarak Shah; |
246 | Uni6D: A Unified CNN Framework Without Projection Breakdown for 6D Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: As a consequence, the 3D structure is no longer preserved by a modified depth image or feature. To address this issue, we propose a simple yet effective method denoted as Uni6D that explicitly takes the extra UV data along with RGB-D images as input. |
Xiaoke Jiang; Donghai Li; Hao Chen; Ye Zheng; Rui Zhao; Liwei Wu; |
247 | De-Rendering 3D Objects in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a weakly supervised method that is able to decompose a single image of an object into shape (depth and normals), material (albedo, reflectivity and shininess) and global lighting parameters. |
Felix Wimbauer; Shangzhe Wu; Christian Rupprecht; |
248 | SPAMs: Structured Implicit Parametric Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We observe that deformable object motion is often semantically structured, and thus propose to learn Structured-implicit PArametric Models (SPAMs) as a deformable object representation that structurally decomposes non-rigid object motion into part-based disentangled representations of shape and pose, with each being represented by deep implicit functions. |
Pablo Palafox; Nikolaos Sarafianos; Tony Tung; Angela Dai; |
249 | Global Sensing and Measurements Reuse for Image Compressed Sensing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Moreover, using measurements only once may not be enough to extract richer information from them. To address these issues, we propose a novel Measurements Reuse Convolutional Compressed Sensing Network (MR-CCSNet), which employs a Global Sensing Module (GSM) to collect features at all levels for efficient sensing and a Measurements Reuse Block (MRB) to reuse measurements multiple times at multiple scales. |
Zi-En Fan; Feng Lian; Jia-Ni Quan; |
250 | SeeThroughNet: Resurrection of Auxiliary Loss By Preserving Class Probability Information Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce Class Probability Preserving (CPP) pooling to alleviate information loss in down-sampling the ground truth in semantic segmentation tasks. |
Dasol Han; Jaewook Yoo; Dokwan Oh; |
251 | Representing 3D Shapes With Probabilistic Directed Distance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we endeavour to address both shortcomings with a novel shape representation that allows fast differentiable rendering within an implicit architecture. |
Tristan Aumentado-Armstrong; Stavros Tsogkas; Sven Dickinson; Allan D. Jepson; |
252 | Learning ABCs: Approximate Bijective Correspondence for Isolating Factors of Variation With Weak Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel algorithm that utilizes a weak form of supervision where the data is partitioned into sets according to certain inactive (common) factors of variation which are invariant across elements of each set. |
Kieran A. Murphy; Varun Jampani; Srikumar Ramalingam; Ameesh Makadia; |
253 | ABO: Dataset and Benchmarks for Real-World 3D Object Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Amazon Berkeley Objects (ABO), a new large-scale dataset designed to help bridge the gap between real and virtual 3D worlds. |
Jasmine Collins; Shubham Goel; Kenan Deng; Achleshwar Luthra; Leon Xu; Erhan Gundogdu; Xi Zhang; Tomas F. Yago Vicente; Thomas Dideriksen; Himanshu Arora; Matthieu Guillaumin; Jitendra Malik; |
254 | DETReg: Unsupervised Pretraining With Region Priors for Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Instead, we introduce DETReg, a new self-supervised method that pretrains the entire object detection network, including the object localization and embedding components. |
Amir Bar; Xin Wang; Vadim Kantorov; Colorado J. Reed; Roei Herzig; Gal Chechik; Anna Rohrbach; Trevor Darrell; Amir Globerson; |
255 | Learning To Restore 3D Face From In-the-Wild Degraded Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In-the-wild 3D face modelling is a challenging problem as the predicted facial geometry and texture suffer from a lack of reliable clues or priors, when the input images are degraded. To address such a problem, in this paper we propose a novel Learning to Restore (L2R) 3D face framework for unsupervised high-quality face reconstruction from low-resolution images. |
Zhenyu Zhang; Yanhao Ge; Ying Tai; Xiaoming Huang; Chengjie Wang; Hao Tang; Dongjin Huang; Zhifeng Xie; |
256 | Practical Evaluation of Adversarial Robustness Via Adaptive Auto Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A practical evaluation method should be convenient (i.e., parameter-free), efficient (i.e., fewer iterations) and reliable (i.e., approaching the lower bound of robustness). Toward this goal, we propose a parameter-free Adaptive Auto Attack (A3) evaluation method which addresses efficiency and reliability in a test-time-training fashion. |
Ye Liu; Yaya Cheng; Lianli Gao; Xianglong Liu; Qilong Zhang; Jingkuan Song; |
257 | Convolutions for Spatial Interaction Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we consider the problem of spatial interaction modeling in the context of predicting the motion of actors around autonomous vehicles, and investigate alternatives to GNNs. |
Zhaoen Su; Chao Wang; David Bradley; Carlos Vallespi-Gonzalez; Carl Wellington; Nemanja Djuric; |
258 | MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: For detecting actions in those complex videos, efficiently capturing both short-term and long-term temporal information in the video is critical. To this end, we propose a novel ConvTransformer network for action detection. |
Rui Dai; Srijan Das; Kumara Kahatapitiya; Michael S. Ryoo; François Brémond; |
259 | Salvage of Supervision in Weakly Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To bridge the performance and technical gaps between WSOD and FSOD, this paper proposes a new framework, Salvage of Supervision (SoS), with the key idea being to harness every potentially useful supervisory signal in WSOD: the weak image-level labels, the pseudo-labels, and the power of semi-supervised object detection. |
Lin Sui; Chen-Lin Zhang; Jianxin Wu; |
260 | Cross-View Transformers for Real-Time Map-View Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present cross-view transformers, an efficient attention-based model for map-view semantic segmentation from multiple cameras. |
Brady Zhou; Philipp Krähenbühl; |
261 | Distinguishing Unseen From Seen for Generalized Zero-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a novel method which leverages both visual and semantic modalities to distinguish seen and unseen categories. |
Hongzu Su; Jingjing Li; Zhi Chen; Lei Zhu; Ke Lu; |
262 | Online Continual Learning on A Contaminated Data Stream With Blurry Task Boundaries Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To balance diversity and purity in the episodic memory, we propose a novel strategy to manage and use the memory by a unified approach of label noise aware diverse sampling and robust learning with semi-supervised learning. |
Jihwan Bang; Hyunseo Koh; Seulki Park; Hwanjun Song; Jung-Woo Ha; Jonghyun Choi; |
263 | Controllable Dynamic Multi-Task Architectures Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This challenge motivates models which allow control over the relative importance of tasks and total compute cost during inference time. In this work, we propose such a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints. |
Dripta S. Raychaudhuri; Yumin Suh; Samuel Schulter; Xiang Yu; Masoud Faraki; Amit K. Roy-Chowdhury; Manmohan Chandraker; |
264 | Learning To Imagine: Diversify Memory for Incremental Learning Using Unlabeled Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although maintaining a handful of samples (called "exemplars") of each task can alleviate forgetting to some extent, existing methods are still limited because such exemplars are too few to carry enough task-specific knowledge, so forgetting remains. To overcome this problem, we propose to "imagine" diverse counterparts of the given exemplars by drawing on the abundant, semantically irrelevant information in unlabeled data. |
Yu-Ming Tang; Yi-Xing Peng; Wei-Shi Zheng; |
265 | SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we ask, and answer, a question that spans all MBODFs: how to expose the right set of execution branches, and how to schedule the optimal one at inference time? |
Ran Xu; Fangzhou Mu; Jayoung Lee; Preeti Mukherjee; Somali Chaterji; Saurabh Bagchi; Yin Li; |
266 | VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Hence, in this paper, we introduce adapter-based parameter-efficient transfer learning techniques to V&L models such as VL-BART and VL-T5. |
Yi-Lin Sung; Jaemin Cho; Mohit Bansal; |
267 | Deep Hybrid Models for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a principled and practical method for out-of-distribution (OoD) detection with deep hybrid models (DHMs), which model the joint density p(x,y) of features and labels with a single forward pass. |
Senqi Cao; Zhongfei Zhang; |
268 | Accelerating Video Object Segmentation With Compressed Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose an efficient plug-and-play acceleration framework for semi-supervised video object segmentation by exploiting the temporal redundancies in videos presented by the compressed bitstream. |
Kai Xu; Angela Yao; |
269 | Exploring Domain-Invariant Parameters for Source Free Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The motivation behind this insight is clear: domain-invariant representations are governed by only part of the parameters of an available deep source model. We devise the Domain-Invariant Parameter Exploring (DIPE) approach to capture such domain-invariant parameters in the source model and thereby generate domain-invariant representations. |
Fan Wang; Zhongyi Han; Yongshun Gong; Yilong Yin; |
270 | FastDOG: Fast Discrete Optimization on GPU Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a massively parallel Lagrange decomposition method for solving 0–1 integer linear programs occurring in structured prediction. |
Ahmed Abbas; Paul Swoboda; |
271 | Fire Together Wire Together: A Dynamic Pruning Approach With Self-Supervised Mask Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our contribution is two-fold: 1) decoupled task and pruning training; 2) simple hyperparameter selection that enables FLOPs-reduction estimation before training. |
Sara Elkerdawy; Mostafa Elhoushi; Hong Zhang; Nilanjan Ray; |
272 | Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel multi-source uncertainty mining method to facilitate unsupervised deep learning from multiple noisy labels generated by traditional handcrafted SOD methods. |
Yifan Wang; Wenbo Zhang; Lijun Wang; Ting Liu; Huchuan Lu; |
273 | Self-Supervised Equivariant Learning for Oriented Keypoint Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To learn to detect robust oriented keypoints, we introduce a self-supervised learning framework using rotation-equivariant CNNs. |
Jongmin Lee; Byungjin Kim; Minsu Cho; |
274 | Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The results show that GANs, especially small GANs, lack the ability to generate high-quality, high-frequency information. To address this problem, we propose a novel knowledge distillation method referred to as wavelet knowledge distillation. |
Linfeng Zhang; Xin Chen; Xiaobing Tu; Pengfei Wan; Ning Xu; Kaisheng Ma; |
275 | Focal and Global Knowledge Distillation for Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we point out that in object detection, the features of the teacher and student vary greatly in different areas, especially in the foreground and background. |
Zhendong Yang; Zhe Li; Xiaohu Jiang; Yuan Gong; Zehuan Yuan; Danpei Zhao; Chun Yuan; |
276 | Learning To Prompt for Continual Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Typical methods rely on a rehearsal buffer or known task identity at test time to retrieve learned knowledge and address forgetting, while this work presents a new paradigm for continual learning that aims to train a more succinct memory system without accessing task identity at test time. |
Zifeng Wang; Zizhao Zhang; Chen-Yu Lee; Han Zhang; Ruoxi Sun; Xiaoqi Ren; Guolong Su; Vincent Perot; Jennifer Dy; Tomas Pfister; |
277 | Human Mesh Recovery From Multiple Shots Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the richness of data comes at the expense of fundamental challenges such as abrupt shot changes and close up shots of actors with heavy truncation, which limits the applicability of existing 3D human understanding methods. In this paper, we address these limitations with the insight that while shot changes of the same scene incur a discontinuity between frames, the 3D structure of the scene still changes smoothly. |
Georgios Pavlakos; Jitendra Malik; Angjoo Kanazawa; |
278 | Improving Adversarial Transferability Via Neuron Attribution-Based Attacks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, existing feature-level attacks generally employ inaccurate neuron importance estimations, which deteriorates their transferability. To overcome such pitfalls, in this paper, we propose the Neuron Attribution-based Attack (NAA), which conducts feature-level attacks with more accurate neuron importance estimations. |
Jianping Zhang; Weibin Wu; Jen-tse Huang; Yizhan Huang; Wenxuan Wang; Yuxin Su; Michael R. Lyu; |
279 | Better Trigger Inversion Optimization in Backdoor Scanning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We develop a new optimization method that directly minimizes individual pixel changes, without using a mask. |
Guanhong Tao; Guangyu Shen; Yingqi Liu; Shengwei An; Qiuling Xu; Shiqing Ma; Pan Li; Xiangyu Zhang; |
280 | GANSeg: Learning To Segment By Unsupervised Hierarchical Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Weakly-supervised and unsupervised methods exist, but they depend on comparing pairs of images, such as multiple views, video frames, or augmented images, which limits their applicability. To address this, we propose a GAN-based approach that generates images conditioned on latent masks, thereby alleviating the full or weak annotations required in previous approaches. |
Xingzhe He; Bastian Wandt; Helge Rhodin; |
281 | Dense Learning Based Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although a few works have proposed various self-training-based or consistency-regularization-based methods, they all target anchor-based detectors, ignoring the anchor-free detectors that actual industrial deployments depend on. To this end, in this paper, we bridge this gap by proposing a DenSe Learning (DSL) based algorithm for anchor-free SSOD. |
Binghui Chen; Pengyu Li; Xiang Chen; Biao Wang; Lei Zhang; Xian-Sheng Hua; |
282 | Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To mimic humans’ mental simulation process, we present FixNet, a novel framework that seamlessly incorporates perception and physical dynamics. |
Yining Hong; Kaichun Mo; Li Yi; Leonidas J. Guibas; Antonio Torralba; Joshua B. Tenenbaum; Chuang Gan; |
283 | Convolution of Convolution: Let Kernels Spatially Collaborate Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In the biological visual pathway, especially the retina, neurons are tiled along spatial dimensions with electrical coupling as their local association, while in a convolution layer, kernels are arranged along the channel dimension in isolation. We propose Convolution of Convolution, associating kernels in a layer and letting them collaborate spatially. |
Rongzhen Zhao; Jian Li; Zhenzhi Wu; |
284 | Make It Move: Controllable Image-to-Video Generation With Text Descriptions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The key challenges of TI2V task lie both in aligning appearance and motion from different modalities, and in handling uncertainty in text descriptions. To address these challenges, we propose a Motion Anchor-based video GEnerator (MAGE) with an innovative motion anchor (MA) structure to store appearance-motion aligned representation. |
Yaosi Hu; Chong Luo; Zhenzhong Chen; |
285 | C2AM Loss: Chasing A Better Decision Boundary for Long-Tail Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Hence, we devise a Category-Aware Angular Margin Loss (C2AM Loss) to introduce an adaptive angular margin between any two categories. |
Tong Wang; Yousong Zhu; Yingying Chen; Chaoyang Zhao; Bin Yu; Jinqiao Wang; Ming Tang; |
286 | Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Neural Points, a novel point cloud representation, and apply it to the arbitrary-factor upsampling task. |
Wanquan Feng; Jin Li; Hongrui Cai; Xiaonan Luo; Juyong Zhang; |
287 | Distribution Consistent Neural Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Instead, in this paper, we propose a novel distribution consistent one-shot neural architecture search algorithm. |
Junyi Pan; Chong Sun; Yizhou Zhou; Ying Zhang; Chen Li; |
288 | Video-Text Representation Learning Via Differentiable Weak Temporal Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel multi-modal self-supervised framework Video-Text Temporally Weak Alignment-based Contrastive Learning (VT-TWINS) to capture significant information from noisy and weakly correlated data using a variant of Dynamic Time Warping (DTW). |
Dohwan Ko; Joonmyung Choi; Juyeon Ko; Shinyeong Noh; Kyoung-Woon On; Eun-Sol Kim; Hyunwoo J. Kim; |
289 | Bi-Directional Object-Context Prioritization Learning for Saliency Ranking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This inspires us to model the region-level interactions, in addition to the object-level reasoning, for saliency ranking. To this end, we propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking. |
Xin Tian; Ke Xu; Xin Yang; Lin Du; Baocai Yin; Rynson W.H. Lau; |
290 | FreeSOLO: Learning To Segment Objects Without Annotations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present FreeSOLO, a self-supervised instance segmentation framework built on top of the simple instance segmentation method SOLO. |
Xinlong Wang; Zhiding Yu; Shalini De Mello; Jan Kautz; Anima Anandkumar; Chunhua Shen; Jose M. Alvarez; |
291 | What Do Navigation Agents Learn About Their Environment? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce the Interpretability System for Embodied agEnts (iSEE) for Point Goal (PointNav) and Object Goal (ObjectNav) navigation models. |
Kshitij Dwivedi; Gemma Roig; Aniruddha Kembhavi; Roozbeh Mottaghi; |
292 | Progressive Minimal Path Method With Embedded CNN Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose Path-CNN, a method for the segmentation of centerlines of tubular structures by embedding convolutional neural networks (CNNs) into the progressive minimal path method. |
Wei Liao; |
293 | FIFO: Learning Fog-Invariant Features for Foggy Scene Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Robust visual recognition under adverse weather conditions is of great importance in real-world applications. In this context, we propose a new method for learning semantic segmentation models robust against fog. |
Sohyun Lee; Taeyoung Son; Suha Kwak; |
294 | 3D Human Tongue Reconstruction From Single "In-the-Wild" Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work we present the first, to the best of our knowledge, end-to-end trainable pipeline that accurately reconstructs the 3D face together with the tongue. |
Stylianos Ploumpis; Stylianos Moschoglou; Vasileios Triantafyllou; Stefanos Zafeiriou; |
295 | Enhancing Adversarial Robustness for Deep Metric Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Conversely, we propose Hardness Manipulation to efficiently perturb the training triplet until a specified level of hardness for adversarial training, according to a harder benign triplet or a pseudo-hardness function. |
Mo Zhou; Vishal M. Patel; |
296 | Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, ViTs are mainly designed for image classification and generate single-scale, low-resolution representations, which makes dense prediction tasks such as semantic segmentation challenging for ViTs. Therefore, we propose HRViT, which enhances ViTs to learn semantically rich and spatially precise multi-scale representations by integrating high-resolution multi-branch architectures with ViTs. |
Jiaqi Gu; Hyoukjun Kwon; Dilin Wang; Wei Ye; Meng Li; Yu-Hsin Chen; Liangzhen Lai; Vikas Chandra; David Z. Pan; |
297 | Lite-MDETR: A Lightweight Multi-Modal Detector Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a Lightweight modulated detector, Lite-MDETR, to facilitate efficient end-to-end multi-modal understanding on mobile devices. |
Qian Lou; Yen-Chang Hsu; Burak Uzkent; Ting Hua; Yilin Shen; Hongxia Jin; |
298 | CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we introduce Coordinate GAN (CoordGAN), a structure-texture disentangled GAN that learns a dense correspondence map for each generated image. |
Jiteng Mu; Shalini De Mello; Zhiding Yu; Nuno Vasconcelos; Xiaolong Wang; Jan Kautz; Sifei Liu; |
299 | A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a simple transfer learning baseline for sign language translation. |
Yutong Chen; Fangyun Wei; Xiao Sun; Zhirong Wu; Stephen Lin; |
300 | Unsupervised Visual Representation Learning By Online Constrained K-Means Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address these challenges, we first investigate the objective of clustering-based representation learning. Based on this, we propose a novel clustering-based pretext task with online Constrained K-means (CoKe). |
Qi Qian; Yuanhong Xu; Juhua Hu; Hao Li; Rong Jin; |
301 | Neural Point Light Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce Neural Point Light Fields that represent scenes implicitly with a light field living on a sparse point cloud. |
Julian Ost; Issam Laradji; Alejandro Newell; Yuval Bahat; Felix Heide; |
302 | Vehicle Trajectory Prediction Works, But Not Everywhere Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel method that automatically generates realistic scenes causing state-of-the-art models to go off-road. |
Mohammadhossein Bahari; Saeed Saadatnejad; Ahmad Rahimi; Mohammad Shaverdikondori; Amir Hossein Shahidzadeh; Seyed-Mohsen Moosavi-Dezfooli; Alexandre Alahi; |
303 | PSMNet: Position-Aware Stereo Merging Network for Room Layout Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new deep learning-based method for estimating room layout given a pair of 360-degree panoramas. |
Haiyan Wang; Will Hutchcroft; Yuguang Li; Zhiqiang Wan; Ivaylo Boyadzhiev; Yingli Tian; Sing Bing Kang; |
304 | MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Some existing methods leverage depth information from an off-the-shelf depth estimator to assist 3D detection, but suffer from the additional computational burden and achieve only limited performance due to inaccurate depth priors. To alleviate this, we propose MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection. |
Kuan-Chih Huang; Tsung-Han Wu; Hung-Ting Su; Winston H. Hsu; |
305 | Learning Graph Regularisation for Guided Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a novel formulation for guided super-resolution. |
Riccardo de Lutio; Alexander Becker; Stefano D’Aronco; Stefania Russo; Jan D. Wegner; Konrad Schindler; |
306 | Instance-Wise Occlusion and Depth Orders in Natural Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a new dataset, named InstaOrder, that can be used to understand the spatial relationships of instances in a 3D space. |
Hyunmin Lee; Jaesik Park; |
307 | Look for The Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we seek to temporally localize object states (e.g. "empty" and "full" cup) together with the corresponding state-modifying actions ("pouring coffee") in long uncurated videos with minimal supervision. |
Tomáš Souček; Jean-Baptiste Alayrac; Antoine Miech; Ivan Laptev; Josef Sivic; |
308 | Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents new hierarchically cascaded transformers that can improve data efficiency through attribute surrogates learning and spectral tokens pooling. |
Yangji He; Weihan Liang; Dongyang Zhao; Hong-Yu Zhou; Weifeng Ge; Yizhou Yu; Wenqiang Zhang; |
309 | Generalized Category Discovery Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we consider a highly general image recognition setting wherein, given a labelled and unlabelled set of images, the task is to categorize all images in the unlabelled set. |
Sagar Vaze; Kai Han; Andrea Vedaldi; Andrew Zisserman; |
310 | Maximum Consensus By Weighted Influences of Monotone Boolean Functions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper studies the concept of weighted influences for solving MaxCon. |
Erchuan Zhang; David Suter; Ruwan Tennakoon; Tat-Jun Chin; Alireza Bab-Hadiashar; Giang Truong; Syed Zulqarnain Gilani; |
311 | TransforMatcher: Match-to-Match Attention for Semantic Correspondence Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce a strong semantic image matching learner, dubbed TransforMatcher, which builds on the success of transformer networks in vision domains. |
Seungwook Kim; Juhong Min; Minsu Cho; |
312 | Robust Outlier Detection By De-Biasing VAE Likelihoods Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose novel analytical and algorithmic approaches to ameliorate key biases with VAE likelihoods. |
Kushal Chauhan; Barath Mohan U; Pradeep Shenoy; Manish Gupta; Devarajan Sridharan; |
313 | Contour-Hugging Heatmaps for Landmark Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose an effective and easy-to-implement method for simultaneously performing landmark detection in images and obtaining an ingenious uncertainty measurement for each landmark. |
James McCouat; Irina Voiculescu; |
314 | Voxel Field Fusion for 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion. |
Yanwei Li; Xiaojuan Qi; Yukang Chen; Liwei Wang; Zeming Li; Jian Sun; Jiaya Jia; |
315 | Divide and Conquer: Compositional Experts for Generalized Novel Class Discovery Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Despite showing competitive performance on novel classes, they fail to generalize to recognizing samples from both base and novel sets. In this paper, we focus on this generalized setting of NCD (GNCD), and propose to divide and conquer it with two groups of Compositional Experts (ComEx). |
Muli Yang; Yuehua Zhu; Jiaping Yu; Aming Wu; Cheng Deng; |
316 | Programmatic Concept Learning for Human Motion Description and Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce Programmatic Motion Concepts, a hierarchical motion representation for human actions that captures both low level motion and high level description as motion concepts. |
Sumith Kulal; Jiayuan Mao; Alex Aiken; Jiajun Wu; |
317 | Interpretable Part-Whole Hierarchies and Conceptual-Semantic Relationships in Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we want to make a step forward towards interpretability in neural networks, providing new tools to interpret their behavior. |
Nicola Garau; Niccolò Bisagno; Zeno Sambugaro; Nicola Conci; |
318 | Fast Algorithm for Low-Rank Tensor Completion in Delay-Embedded Space Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Recent studies have shown high completion performance with a relatively small window size, but experiments with large window sizes require a huge amount of memory and cannot be computed easily. In this study, we address this serious computational issue and propose a fast and efficient algorithm. |
Ryuki Yamamoto; Hidekata Hontani; Akira Imakura; Tatsuya Yokota; |
319 | Panoptic, Instance and Semantic Relations: A Relational Context Encoder To Enhance Panoptic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a novel framework to integrate both semantic and instance contexts for panoptic segmentation. |
Shubhankar Borse; Hyojin Park; Hong Cai; Debasmit Das; Risheek Garrepalli; Fatih Porikli; |
320 | Point2Seq: Detecting 3D Objects As Sequences Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a simple and effective framework, named Point2Seq, for 3D object detection from point clouds. |
Yujing Xue; Jiageng Mao; Minzhe Niu; Hang Xu; Michael Bi Mi; Wei Zhang; Xiaogang Wang; Xinchao Wang; |
321 | Less Is More: Generating Grounded Navigation Instructions From Landmarks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes. |
Su Wang; Ceslee Montgomery; Jordi Orbay; Vighnesh Birodkar; Aleksandra Faust; Izzeddin Gur; Natasha Jaques; Austin Waters; Jason Baldridge; Peter Anderson; |
322 | Task-Adaptive Negative Envision for Few-Shot Open-Set Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we instead propose task-adaptive negative class envision for FSOR to integrate threshold tuning into the learning process. |
Shiyuan Huang; Jiawei Ma; Guangxing Han; Shih-Fu Chang; |
323 | DisARM: Displacement Aware Relation Module for 3D Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Displacement Aware Relation Module (DisARM), a novel neural network module for enhancing the performance of 3D object detection in point cloud scenes. |
Yao Duan; Chenyang Zhu; Yuqing Lan; Renjiao Yi; Xinwang Liu; Kai Xu; |
324 | ETHSeg: An Amodel Instance Segmentation Network and A Real-World Dataset for X-Ray Waste Inspection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a novel problem of instance-level waste segmentation in X-ray images for intelligent waste inspection, and contribute a real dataset consisting of 5,038 X-ray images (30,881 waste items in total) with high-quality annotations (i.e., waste categories, object boxes, and instance-level masks) as a benchmark for this problem. |
Lingteng Qiu; Zhangyang Xiong; Xuhao Wang; Kenkun Liu; Yihan Li; Guanying Chen; Xiaoguang Han; Shuguang Cui; |
325 | MixFormer: Mixing Features Across Windows and Dimensions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While local-window self-attention performs notably well in vision tasks, it suffers from a limited receptive field and weak modeling capability. This is mainly because it performs self-attention within non-overlapping windows and shares weights along the channel dimension. We propose MixFormer to find a solution. |
Qiang Chen; Qiman Wu; Jian Wang; Qinghao Hu; Tao Hu; Errui Ding; Jian Cheng; Jingdong Wang; |
326 | Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs By Partial FC Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a sparsely updating variant of the FC layer, named Partial FC (PFC). |
Xiang An; Jiankang Deng; Jia Guo; Ziyong Feng; XuHan Zhu; Jing Yang; Tongliang Liu; |
327 | NeRF-Editing: Geometry Editing of Neural Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a method that allows users to perform controllable shape deformation on the implicit representation of the scene, and synthesizes the novel view images of the edited scene without re-training the network. |
Yu-Jie Yuan; Yang-Tian Sun; Yu-Kun Lai; Yuewen Ma; Rongfei Jia; Lin Gao; |
328 | Optimal Correction Cost for Object Detection Evaluation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To alleviate the gap between downstream tasks and the evaluation scenario, we propose Optimal Correction Cost (OC-cost), which assesses detection accuracy at image level. |
Mayu Otani; Riku Togashi; Yuta Nakashima; Esa Rahtu; Janne Heikkilä; Shin’ichi Satoh; |
329 | Contextual Similarity Distillation for Asymmetric Image Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, existing approaches either fail to achieve feature coherence or make strong assumptions, e.g., requiring labeled datasets or classifiers from the large model, which limits their practical application. To this end, we propose a flexible contextual similarity distillation framework to enhance the small query model and keep its output features compatible with those of the large gallery model, which is crucial for asymmetric retrieval. |
Hui Wu; Min Wang; Wengang Zhou; Houqiang Li; Qi Tian; |
330 | FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, we propose to parse pairwise query and exemplar action instances into consecutive steps with diverse semantic and temporal correspondences. |
Jinglin Xu; Yongming Rao; Xumin Yu; Guangyi Chen; Jie Zhou; Jiwen Lu; |
331 | Artistic Style Discovery With Independent Components Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In particular, we take a closer look into the mechanism of style transfer and obtain different artistic style components from the latent space consisting of different style features. |
Xin Xie; Yi Li; Huaibo Huang; Haiyan Fu; Wanwan Wang; Yanqing Guo; |
332 | HEAT: Holistic Edge Attention Transformer for Structured Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a novel attention-based neural network for structured reconstruction, which takes a 2D raster image as an input and reconstructs a planar graph depicting an underlying geometric structure. |
Jiacheng Chen; Yiming Qian; Yasutaka Furukawa; |
333 | HyperStyle: StyleGAN Inversion With HyperNetworks for Real Image Editing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce this approach into the realm of encoder-based inversion. |
Yuval Alaluf; Omer Tov; Ron Mokady; Rinon Gal; Amit Bermano; |
334 | DASO: Distribution-Aware Semantics-Oriented Pseudo-Label for Imbalanced Semi-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The capability of the traditional semi-supervised learning (SSL) methods is far from real-world application due to severely biased pseudo-labels caused by (1) class imbalance and (2) class distribution mismatch between labeled and unlabeled data. This paper addresses such a relatively under-explored problem. |
Youngtaek Oh; Dong-Jin Kim; In So Kweon; |
335 | Mobile-Former: Bridging MobileNet and Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Mobile-Former, a parallel design of MobileNet and transformer with a two-way bridge in between. |
Yinpeng Chen; Xiyang Dai; Dongdong Chen; Mengchen Liu; Xiaoyi Dong; Lu Yuan; Zicheng Liu; |
336 | Exploiting Pseudo Labels in A Self-Supervised Learning Framework for Improved Monocular Depth Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel self-distillation based self-supervised monocular depth estimation (SD-SSMDE) learning framework. |
Andra Petrovai; Sergiu Nedevschi; |
337 | DESTR: Object Detection With Split Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: First, we propose a new Detection Split Transformer (DESTR) that separates estimation of cross-attention into two independent branches — one tailored for classification and the other for box regression. |
Liqiang He; Sinisa Todorovic; |
338 | LTP: Lane-Based Trajectory Prediction for Autonomous Driving Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a two-stage proposal-based motion forecasting method that exploits the sliced lane segments as fine-grained, shareable, and interpretable proposals. |
Jingke Wang; Tengju Ye; Ziqing Gu; Junbo Chen; |
339 | CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the difficulties, we propose a new framework for scribble learning-based medical image segmentation, which is composed of mix augmentation and cycle consistency and thus is referred to as CycleMix. |
Ke Zhang; Xiahai Zhuang; |
340 | VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most of them only support a fixed up-sampling scale, which limits their flexibility and applications. In this work, instead of following the discrete representations, we propose Video Implicit Neural Representation (VideoINR), and we show its applications for STVSR. |
Zeyuan Chen; Yinbo Chen; Jingwen Liu; Xingqian Xu; Vidit Goel; Zhangyang Wang; Humphrey Shi; Xiaolong Wang; |
341 | Towards End-to-End Unified Scene Text Detection and Layout Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Scene text detection and document layout analysis have long been treated as two separate tasks in different image domains. In this paper, we bring them together and introduce the task of unified scene text detection and layout analysis. |
Shangbang Long; Siyang Qin; Dmitry Panteleev; Alessandro Bissacco; Yasuhisa Fujii; Michalis Raptis; |
342 | Image Based Reconstruction of Liquids From 2D Surface Detections Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we present a solution to the challenging problem of reconstructing liquids from image data. |
Florian Richter; Ryan K. Orosco; Michael C. Yip; |
343 | Contextual Outpainting With Object-Level Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To explore the semantic cues provided by the remaining foreground contents, we propose a novel ConTextual Outpainting GAN (CTO-GAN), leveraging the semantic layout as a bridge to synthesize coherent and diverse background contents. |
Jiacheng Li; Chang Chen; Zhiwei Xiong; |
344 | AP-BSN: Self-Supervised Denoising for Real-World Images Via Asymmetric PD and Blind-Spot Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, it is not trivial to integrate PD and BSN directly, which prevents fully self-supervised denoising on real-world images. We propose an Asymmetric PD (AP) to address this issue, which introduces different PD stride factors for training and inference. |
Wooseok Lee; Sanghyun Son; Kyoung Mu Lee; |
345 | AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an autoregressive prior for 3D shapes to solve multimodal 3D tasks such as shape completion, reconstruction, and generation. |
Paritosh Mittal; Yen-Chi Cheng; Maneesh Singh; Shubham Tulsiani; |
346 | ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we first show that optimal neural architectures in the DIP framework are image-dependent. Leveraging this insight, we then propose an image-specific NAS strategy for the DIP framework that requires substantially less training than typical NAS approaches, effectively enabling image-specific NAS. |
Metin Ersin Arican; Ozgur Kara; Gustav Bredell; Ender Konukoglu; |
347 | Depth-Guided Sparse Structure-From-Motion for Movies and TV Shows Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The resulting small-motion parallax between video frames makes standard geometry-based SfM approaches not as effective for movies and TV shows. To address this challenge, we propose a simple yet effective approach that uses single-frame depth-prior obtained from a pretrained network to significantly improve geometry-based SfM for our small-parallax setting. |
Sheng Liu; Xiaohan Nie; Raffay Hamid; |
348 | End-to-End Referring Video Object Segmentation With Multimodal Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a simple Transformer-based approach to RVOS. |
Adam Botach; Evgenii Zheltonozhskii; Chaim Baskin; |
349 | Unpaired Cartoon Image Synthesis Via Gated Cycle Mapping Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a general-purpose solution to cartoon image synthesis with unpaired training data. |
Yifang Men; Yuan Yao; Miaomiao Cui; Zhouhui Lian; Xuansong Xie; Xian-Sheng Hua; |
350 | IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present IterMVS, a new data-driven method for high-resolution multi-view stereo. |
Fangjinhua Wang; Silvano Galliani; Christoph Vogel; Marc Pollefeys; |
351 | Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In particular, the foreground points are inherently more important than background points for object detectors. Motivated by this, we propose a highly-efficient single-stage point-based 3D detector in this paper, termed IA-SSD. |
Yifan Zhang; Qingyong Hu; Guoquan Xu; Yanxin Ma; Jianwei Wan; Yulan Guo; |
352 | FedCorr: Multi-Stage Federated Learning for Label Noise Correction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose FedCorr, a general multi-stage framework to tackle heterogeneous label noise in FL, without making any assumptions on the noise models of local clients, while still maintaining client data privacy. |
Jingyi Xu; Zihan Chen; Tony Q.S. Quek; Kai Fong Ernest Chong; |
353 | Detecting Camouflaged Object in Frequency Domain Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To well involve the frequency clues into the CNN models, we present a powerful network with two special components. |
Yijie Zhong; Bo Li; Lv Tang; Senyun Kuang; Shuang Wu; Shouhong Ding; |
354 | RigNeRF: Fully Controllable Neural 3D Portraits Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose RigNeRF, a system that goes beyond just novel view synthesis and enables full control of head pose and facial expressions learned from a single portrait video. |
ShahRukh Athar; Zexiang Xu; Kalyan Sunkavalli; Eli Shechtman; Zhixin Shu; |
355 | CLIP-Forge: Towards Zero-Shot Text-To-Shape Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While significant recent progress has been made in text-to-image generation, text-to-shape generation remains a challenging problem due to the unavailability of paired text and shape data at a large scale. We present a simple yet effective method for zero-shot text-to-shape generation that circumvents such data scarcity. |
Aditya Sanghi; Hang Chu; Joseph G. Lambourne; Ye Wang; Chin-Yi Cheng; Marco Fumero; Kamal Rahimi Malekshan; |
356 | Style-Based Global Appearance Flow for Virtual Try-On Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: They are thus intrinsically susceptible to difficult body poses/occlusions and large mis-alignments between person and garment images. To overcome this limitation, a novel global appearance flow estimation model is proposed in this work. |
Sen He; Yi-Zhe Song; Tao Xiang; |
357 | Source-Free Object Detection By Learning To Overlook Domain Style Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This approach suffers from both unsatisfactory pseudo-label accuracy due to domain shift and limited use of target-domain training data. In this work, we present a novel Learning to Overlook Domain Style (LODS) method that resolves these limitations in a principled manner. |
Shuaifeng Li; Mao Ye; Xiatian Zhu; Lihua Zhou; Lin Xiong; |
358 | Active Learning for Open-Set Annotation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, in real annotation tasks, the unlabeled data usually contains a large amount of examples from unknown classes, resulting in the failure of most active learning methods. To tackle this open-set annotation (OSA) problem, we propose a new active learning framework called LfOSA, which boosts the classification performance with an effective sampling strategy to precisely detect examples from known classes for annotation. |
Kun-Peng Ning; Xun Zhao; Yu Li; Sheng-Jun Huang; |
359 | SceneSqueezer: Learning To Compress Scene for Camera Relocalization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We design a novel framework that compresses a scene while still maintaining localization accuracy. |
Luwei Yang; Rakesh Shrestha; Wenbo Li; Shuaicheng Liu; Guofeng Zhang; Zhaopeng Cui; Ping Tan; |
360 | SelfRecon: Self Reconstruction Your Digital Avatar From Monocular Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose SelfRecon, a clothed human body reconstruction method that combines implicit and explicit representations to recover space-time coherent geometries from a monocular self-rotating human video. |
Boyi Jiang; Yang Hong; Hujun Bao; Juyong Zhang; |
361 | Instance-Dependent Label-Noise Learning With Manifold-Regularized Transition Matrix Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address this problem, we note that there is psychological and physiological evidence that humans are more likely to annotate instances of similar appearance with the same classes, and thus poor-quality or ambiguous instances of similar appearance are more likely to be mislabeled as correlated or identical noisy classes. Therefore, we propose an assumption on the geometry of T(x): the closer two instances are, the more similar their corresponding transition matrices should be. |
De Cheng; Tongliang Liu; Yixiong Ning; Nannan Wang; Bo Han; Gang Niu; Xinbo Gao; Masashi Sugiyama; |
362 | Rethinking The Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance With Expanded Views Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Treating each type of augmentation equally during training makes the model learn non-optimal representations for various downstream tasks and limits the flexibility to choose augmentation types beforehand. Second, the strong data augmentations used in classic contrastive learning methods may bring too much invariance in some cases, and fine-grained information that is essential to some downstream tasks may be lost. This paper proposes a general method to alleviate these two problems by considering "where" and "what" to contrast in a general contrastive learning framework. |
Junbo Zhang; Kaisheng Ma; |
363 | Self-Supervised Models Are Continual Learners Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we show that self-supervised loss functions can be seamlessly converted into distillation mechanisms for CL by adding a predictor network that maps the current state of the representations to their past state. |
Enrico Fini; Victor G. Turrisi da Costa; Xavier Alameda-Pineda; Elisa Ricci; Karteek Alahari; Julien Mairal; |
364 | Dreaming To Prune Image Deraining Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We note that it is overstretched to fine-tune the compressed model using self-collected data, as it exhibits poor generalization over images with different degradation characteristics. To address this problem, we propose a novel data-free compression framework for deraining networks. |
Weiqi Zou; Yang Wang; Xueyang Fu; Yang Cao; |
365 | Equivariant Point Cloud Analysis Via Learning Orientations for Message Passing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel and simple framework to achieve equivariance for point cloud analysis based on the message passing (graph neural network) scheme. |
Shitong Luo; Jiahan Li; Jiaqi Guan; Yufeng Su; Chaoran Cheng; Jian Peng; Jianzhu Ma; |
366 | When Does Contrastive Visual Representation Learning Work? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: By looking through the lenses of data quantity, data domain, data quality, and task granularity, we provide new insights into the necessary conditions for successful self-supervised learning. |
Elijah Cole; Xuan Yang; Kimberly Wilber; Oisin Mac Aodha; Serge Belongie; |
367 | One Step at A Time: Long-Horizon Vision-and-Language Navigation With Milestones Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, when it comes to long-horizon tasks with extended sequences of actions, an agent can easily ignore some instructions or get stuck in the middle of the long instructions and eventually fail the task. To address this challenge, we propose a model-agnostic milestone-based task tracker (M-Track) to guide the agent and monitor its progress. |
Chan Hee Song; Jihyung Kil; Tai-Yu Pan; Brian M. Sadler; Wei-Lun Chao; Yu Su; |
368 | Node Representation Learning in Graph Via Node-to-Neighbourhood Mutual Information Maximization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a simple-yet-effective self-supervised node representation learning strategy via directly maximizing the mutual information between the hidden representations of nodes and their neighbourhood, which can be theoretically justified by its link to graph smoothing. |
Wei Dong; Junsheng Wu; Yi Luo; Zongyuan Ge; Peng Wang; |
369 | Point Cloud Pre-Training With Natural 3D Structures Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Therefore, constructing a large-scale 3D point cloud dataset is difficult. In order to remedy this issue, we propose a newly developed point cloud fractal database (PC-FractalDB), which is a novel family of formula-driven supervised learning inspired by fractal geometry encountered in natural 3D structures. |
Ryosuke Yamada; Hirokatsu Kataoka; Naoya Chiba; Yukiyasu Domae; Tetsuya Ogata; |
370 | Scene Consistency Representation Learning for Video Scene Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Spotting the correct scene boundary from the long-term video is a challenging task, since a model must understand the storyline of the video to figure out where a scene starts and ends. To this end, we propose an effective Self-Supervised Learning (SSL) framework to learn better shot representations from unlabeled long-term videos. |
Haoqian Wu; Keyu Chen; Yanan Luo; Ruizhi Qiao; Bo Ren; Haozhe Liu; Weicheng Xie; Linlin Shen; |
371 | Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A complementary way towards robustness is to introduce a rejection option, allowing the model to not return predictions on uncertain inputs, where confidence is a commonly used certainty proxy. Along with this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which could provably distinguish wrongly classified inputs from correctly classified ones. |
Tianyu Pang; Huishuai Zhang; Di He; Yinpeng Dong; Hang Su; Wei Chen; Jun Zhu; Tie-Yan Liu; |
372 | Exploiting Explainable Metrics for Augmented SGD Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we address the following question: can we probe intermediate layers of a deep neural network to identify and quantify the learning quality of each layer? |
Mahdi S. Hosseini; Mathieu Tuli; Konstantinos N. Plataniotis; |
373 | Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This is called inner-video overfitting, and it would actually lead to inferior performance. To tackle this issue, we propose a novel inter-frame feature reconstruction (IFR) technique to leverage the ground-truth labels to supervise the model training on unlabeled frames. |
Jiafan Zhuang; Zilei Wang; Yuan Gao; |
374 | GenDR: A Generalized Differentiable Renderer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present and study a generalized family of differentiable renderers. |
Felix Petersen; Bastian Goldluecke; Christian Borgelt; Oliver Deussen; |
375 | Improving Neural Implicit Surfaces Geometry With Patch Warping Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Neural implicit surfaces have become an important technique for multi-view 3D reconstruction but their accuracy remains limited. In this paper, we argue that this comes from the difficulty to learn and render high frequency textures with neural networks. |
François Darmon; Bénédicte Bascle; Jean-Clément Devaux; Pascal Monasse; Mathieu Aubry; |
376 | XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a robust layout-aware multimodal network named XYLayoutLM to capture and leverage rich layout information from proper reading orders produced by our Augmented XY Cut. |
Zhangxuan Gu; Changhua Meng; Ke Wang; Jun Lan; Weiqiang Wang; Ming Gu; Liqing Zhang; |
377 | Amodal Segmentation Through Out-of-Task and Out-of-Distribution Generalization With A Bayesian Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This task is particularly challenging for deep neural networks because data is difficult to obtain and annotate. Therefore, we formulate amodal segmentation as an out-of-task and out-of-distribution generalization problem. |
Yihong Sun; Adam Kortylewski; Alan Yuille; |
378 | How Well Do Sparse ImageNet Models Transfer? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Generally, more accurate models on the "upstream" dataset tend to provide better transfer accuracy "downstream". In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset, which have been pruned, that is, compressed by sparsifying their connections. |
Eugenia Iofinova; Alexandra Peste; Mark Kurtz; Dan Alistarh; |
379 | REX: Reasoning-Aware and Grounded Explanation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper aims to close the gap from three distinct perspectives: first, we define a new type of multi-modal explanations that explain the decisions by progressively traversing the reasoning process and grounding keywords in the images. |
Shi Chen; Qi Zhao; |
380 | Dynamic Dual-Output Diffusion Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we reveal some of the causes that affect the generation quality of diffusion models, especially when sampling with few iterations, and come up with a simple, yet effective, solution to mitigate them. |
Yaniv Benny; Lior Wolf; |
381 | StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a new framework, StyleT2I, to improve the compositionality of text-to-image synthesis. |
Zhiheng Li; Martin Renqiang Min; Kai Li; Chenliang Xu; |
382 | JoinABLe: Learning Bottom-Up Assembly of Parametric CAD Joints Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we introduce JoinABLe, a learning-based method that assembles parts together to form joints. |
Karl D.D. Willis; Pradeep Kumar Jayaraman; Hang Chu; Yunsheng Tian; Yifei Li; Daniele Grandi; Aditya Sanghi; Linh Tran; Joseph G. Lambourne; Armando Solar-Lezama; Wojciech Matusik; |
383 | CaDeX: Learning Canonical Deformation Coordinate Space for Dynamic Surface Representation Via Neural Homeomorphism Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Canonical Deformation Coordinate Space (CaDeX), a unified representation of both shape and nonrigid motion. |
Jiahui Lei; Kostas Daniilidis; |
384 | Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we disentangle the direct offset into Local Canonical Coordinates (LCC), box scales and box orientations. |
Yang You; Zelin Ye; Yujing Lou; Chengkun Li; Yong-Lu Li; Lizhuang Ma; Weiming Wang; Cewu Lu; |
385 | V-Doc: Visual Questions Answers With Documents Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose V-Doc, a question-answering tool using document images and PDF files, mainly for researchers and general non-deep-learning experts looking to generate, process, and understand document visual question answering tasks. |
Yihao Ding; Zhe Huang; Runlin Wang; YanHang Zhang; Xianru Chen; Yuzhong Ma; Hyunsuk Chung; Soyeon Caren Han; |
386 | AEGNN: Asynchronous Event-Based Graph Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: For this reason, recent works have adopted Graph Neural Networks (GNNs), which process events as static spatio-temporal graphs, which are inherently sparse. We take this trend one step further by introducing Asynchronous, Event-based Graph Neural Networks (AEGNNs), a novel event-processing paradigm that generalizes standard GNNs to process events as evolving spatio-temporal graphs. |
Simon Schaefer; Daniel Gehrig; Davide Scaramuzza; |
387 | Layer-Wised Model Aggregation for Personalized Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel pFL training framework dubbed Layer-wised Personalized Federated learning (pFedLA) that can discern the importance of each layer from different clients, and thus is able to optimize the personalized model aggregation for clients with heterogeneous data. |
Xiaosong Ma; Jie Zhang; Song Guo; Wenchao Xu; |
388 | Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks Via Singular Values Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Polarity Sampling, a theoretically justified plug-and-play method for controlling the generation quality and diversity of any pre-trained deep generative network (DGN). |
Ahmed Imtiaz Humayun; Randall Balestriero; Richard Baraniuk; |
389 | Style-Structure Disentangled Features and Normalizing Flows for Diverse Icon Colorization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we present a colorization network that generates flat-color icons according to given sketches and semantic colorization styles. |
Yuan-kui Li; Yun-Hsuan Lien; Yu-Shuen Wang; |
390 | Object-Aware Video-Language Pre-Training for Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present Object-aware Transformers, an object-centric approach that extends video-language transformer to incorporate object representations. |
Jinpeng Wang; Yixiao Ge; Guanyu Cai; Rui Yan; Xudong Lin; Ying Shan; Xiaohu Qie; Mike Zheng Shou; |
391 | OSKDet: Orientation-Sensitive Keypoint Localization for Rotated Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an orientation-sensitive keypoint based rotated detector OSKDet. |
Dongchen Lu; Dongmei Li; Yali Li; Shengjin Wang; |
392 | MAT: Mask-Aware Transformer for Large Hole Image Inpainting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel transformer-based model for large hole inpainting, which unifies the merits of transformers and convolutions to efficiently process high-resolution images. |
Wenbo Li; Zhe Lin; Kun Zhou; Lu Qi; Yi Wang; Jiaya Jia; |
393 | Exploring Geometric Consistency for Monocular 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In particular, we design a series of geometric manipulations to diagnose existing detectors and then illustrate their vulnerability to consistently associate the depth with object apparent sizes and positions. To alleviate this issue, we propose four geometry-aware data augmentation approaches to enhance the geometric consistency of the detectors. |
Qing Lian; Botao Ye; Ruijia Xu; Weilong Yao; Tong Zhang; |
394 | Neural Window Fully-Connected CRFs for Monocular Depth Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While recent works design increasingly complicated and powerful networks to directly regress the depth map, we take the path of CRFs optimization. |
Weihao Yuan; Xiaodong Gu; Zuozhuo Dai; Siyu Zhu; Ping Tan; |
395 | CodedVTR: Codebook-Based Sparse Voxel Transformer With Geometric Guidance Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose CodedVTR (Codebook-based Voxel TRansformer), which improves data efficiency and generalization ability for 3D sparse voxel transformers. |
Tianchen Zhao; Niansong Zhang; Xuefei Ning; He Wang; Li Yi; Yu Wang; |
396 | Uncertainty-Aware Deep Multi-View Photometric Stereo Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a simple and effective solution to the longstanding classical multi-view photometric stereo (MVPS) problem. |
Berk Kaya; Suryansh Kumar; Carlos Oliveira; Vittorio Ferrari; Luc Van Gool; |
397 | Coherent Point Drift Revisited for Non-Rigid Shape Matching and Registration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we explore a new type of extrinsic method to directly align two geometric shapes with point-to-point correspondences in ambient space by recovering a deformation, which allows more continuous and smooth maps to be obtained. |
Aoxiang Fan; Jiayi Ma; Xin Tian; Xiaoguang Mei; Wei Liu; |
398 | Unleashing Potential of Unsupervised Pre-Training With Intra-Identity Regularization for Person Re-Identification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by the great success of self-supervised representation learning with contrastive objectives, in this paper, we design an Unsupervised Pre-training framework for ReID based on the contrastive learning (CL) pipeline, dubbed UP-ReID. |
Zizheng Yang; Xin Jin; Kecheng Zheng; Feng Zhao; |
399 | Align and Prompt: Video-and-Language Pre-Training With Entity Prompts Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Align and Prompt: an efficient and effective video-and-language pre-training framework with better cross-modal alignment. |
Dongxu Li; Junnan Li; Hongdong Li; Juan Carlos Niebles; Steven C.H. Hoi; |
400 | A Unified Query-Based Paradigm for Point Cloud Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel Embedding-Querying paradigm (EQ-Paradigm) for 3D understanding tasks including detection, segmentation and classification. |
Zetong Yang; Li Jiang; Yanan Sun; Bernt Schiele; Jiaya Jia; |
401 | It’s About Time: Analog Clock Reading in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a framework for reading analog clocks in natural images or videos. |
Charig Yang; Weidi Xie; Andrew Zisserman; |
402 | MSG-Transformer: Exchanging Local Spatial Information By Manipulating Messenger Tokens Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper aims to alleviate the conflict between efficiency and flexibility, for which we propose a specialized token for each region that serves as a messenger (MSG). |
Jiemin Fang; Lingxi Xie; Xinggang Wang; Xiaopeng Zhang; Wenyu Liu; Qi Tian; |
403 | Cross Modal Retrieval With Querybank Normalisation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Drawing inspiration from the NLP literature, we formulate a simple but effective framework called Querybank Normalisation (QB-Norm) that re-normalises query similarities to account for hubs in the embedding space. |
Simion-Vlad Bogolin; Ioana Croitoru; Hailin Jin; Yang Liu; Samuel Albanie; |
404 | Contrastive Dual Gating: Learning Sparse Features With Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, we found that such salience predictors cannot be easily trained when they are naively applied to contrastive learning from scratch. To address this issue, we propose contrastive dual gating (CDG), a novel dynamic pruning algorithm that skips the uninformative features during contrastive learning without hurting the trainability of the networks. |
Jian Meng; Li Yang; Jinwoo Shin; Deliang Fan; Jae-sun Seo; |
405 | Universal Photometric Stereo Network Using Global Lighting Contexts Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Unlike existing tasks that assume specific physical lighting models, and hence drastically limit their usability, a solution to this task is supposed to work for objects with diverse shapes and materials under arbitrary lighting variations without assuming any specific model. To solve this extremely challenging task, we present a purely data-driven method, which eliminates the prior assumption of lighting by replacing the recovery of physical lighting parameters with the extraction of a generic lighting representation, named global lighting contexts. |
Satoshi Ikehata; |
406 | Hire-MLP: Vision MLP Via Hierarchical Rearrangement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents Hire-MLP, a simple yet competitive vision MLP architecture via Hierarchical rearrangement, which contains two levels of rearrangements. |
Jianyuan Guo; Yehui Tang; Kai Han; Xinghao Chen; Han Wu; Chao Xu; Chang Xu; Yunhe Wang; |
407 | Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel monocular ray-based 3D (Ray3D) absolute human pose estimation method with a calibrated camera. |
Yu Zhan; Fenghai Li; Renliang Weng; Wongun Choi; |
408 | Occluded Human Mesh Recovery Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Consequently, top-down methods have difficulties in recovering accurate 3D human meshes under severe person-person occlusion. To address this, we present Occluded Human Mesh Recovery (OCHMR) – a novel top-down mesh recovery approach that incorporates image spatial context to overcome the limitations of the single-human assumption. |
Rawal Khirodkar; Shashank Tripathi; Kris Kitani; |
409 | Multi-Object Tracking Meets Moving UAV Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a UAVMOT network specially for multi-object tracking in UAV views. |
Shuai Liu; Xin Li; Huchuan Lu; You He; |
410 | ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, this formulation typically treats snippets in a video as independent instances, ignoring the underlying temporal structures within and across action segments. To address this problem, we propose ASM-Loc, a novel WTAL framework that enables explicit, action-aware segment modeling beyond standard MIL-based methods. |
Bo He; Xitong Yang; Le Kang; Zhiyu Cheng; Xin Zhou; Abhinav Shrivastava; |
411 | Uncertainty-Guided Probabilistic Transformer for Complex Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper introduces a probabilistic model named Uncertainty-Guided Probabilistic Transformer (UGPT) for complex action recognition. |
Hongji Guo; Hanjing Wang; Qiang Ji; |
412 | Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Following the guidelines, we propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31×31, in contrast to the commonly used 3×3 kernels. |
Xiaohan Ding; Xiangyu Zhang; Jungong Han; Guiguang Ding; |
413 | End-to-End Multi-Person Pose Estimation With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the first fully end-to-end multi-person Pose Estimation framework with TRansformers, termed PETR. |
Dahu Shi; Xing Wei; Liangqi Li; Ye Ren; Wenming Tan; |
414 | REGTR: End-to-End Point Cloud Correspondences With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we conjecture that attention mechanisms can replace the role of explicit feature matching and RANSAC, and thus propose an end-to-end framework to directly predict the final set of correspondences. |
Zi Jian Yew; Gim Hee Lee; |
415 | Neural 3D Scene Reconstruction With The Manhattan-World Assumption Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we show that the planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods. |
Haoyu Guo; Sida Peng; Haotong Lin; Qianqian Wang; Guofeng Zhang; Hujun Bao; Xiaowei Zhou; |
416 | V2C: Visual Voice Cloning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, there also exist many scenarios that cannot be well reflected by these VC tasks, such as movie dubbing, which requires the speech to carry emotions consistent with the movie plot. To fill this gap, in this work we propose a new task named Visual Voice Cloning (V2C), which seeks to convert a paragraph of text to speech with both a desired voice specified by a reference audio and a desired emotion specified by a reference video. |
Qi Chen; Mingkui Tan; Yuankai Qi; Jiaqiu Zhou; Yuanqing Li; Qi Wu; |
417 | Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we revisit the average precision (AP) loss and reveal that the crucial element is that of selecting the ranking pairs between positive and negative samples. |
Dongli Xu; Jinhong Deng; Wen Li; |
418 | 3DeformRS: Certifying Spatial Deformations on Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose 3DeformRS, a method to certify the robustness of point cloud Deep Neural Networks (DNNs) against real-world deformations. |
Gabriel Pérez S.; Juan C. Pérez; Motasem Alfarra; Silvio Giancola; Bernard Ghanem; |
419 | ElePose: Unsupervised 3D Human Pose Estimation By Predicting Camera Elevation and Learning Normalizing Flows on 2D Poses Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Unfortunately, labeled training data does not yet exist for many human activities since 3D annotation requires dedicated motion capture systems. Therefore, we propose an unsupervised approach that learns to predict a 3D human pose from a single image while only being trained with 2D pose data, which can be crowd-sourced and is already widely available. |
Bastian Wandt; James J. Little; Helge Rhodin; |
420 | MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present MAD (Movie Audio Descriptions), a novel benchmark that departs from the paradigm of augmenting existing video datasets with text annotations and focuses on crawling and aligning available audio descriptions of mainstream movies. |
Mattia Soldan; Alejandro Pardo; Juan León Alcázar; Fabian Caba; Chen Zhao; Silvio Giancola; Bernard Ghanem; |
421 | EvUnroll: Neuromorphic Events Based Rolling Shutter Image Correction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes to use neuromorphic events for correcting rolling shutter (RS) images as consecutive global shutter (GS) frames. |
Xinyu Zhou; Peiqi Duan; Yi Ma; Boxin Shi; |
422 | Gait Recognition in The Wild With Dense 3D Representations and A Benchmark Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In particular, we propose a novel framework to explore the 3D Skinned Multi-Person Linear (SMPL) model of the human body for gait recognition, named SMPLGait. |
Jinkai Zheng; Xinchen Liu; Wu Liu; Lingxiao He; Chenggang Yan; Tao Mei; |
423 | ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation Via Online Exploration and Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, constructing both valid and diverse hand-object interactions and efficiently learning from the vast synthetic data is still challenging. To address the above issues, we propose ArtiBoost, a lightweight online data enhancement method. |
Lixin Yang; Kailin Li; Xinyu Zhan; Jun Lv; Wenqiang Xu; Jiefeng Li; Cewu Lu; |
424 | Temporal Context Matters: Enhancing Single Image Prediction With Disease Progression Representations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a deep learning approach that leverages temporal progression information to improve clinical outcome predictions from single-timepoint images. |
Aishik Konwer; Xuan Xu; Joseph Bae; Chao Chen; Prateek Prasanna; |
425 | QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To get the best of two worlds, we propose QueryDet that uses a novel query mechanism to accelerate the inference speed of feature-pyramid based object detectors. |
Chenhongyi Yang; Zehao Huang; Naiyan Wang; |
426 | IDEA-Net: Dynamic 3D Point Cloud Interpolation Via Deep Embedding Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To tackle the challenges, we propose IDEA-Net, an end-to-end deep learning framework, which disentangles the problem under the assistance of the explicitly learned temporal consistency. |
Yiming Zeng; Yue Qian; Qijian Zhang; Junhui Hou; Yixuan Yuan; Ying He; |
427 | UniCon: Combating Label Noise Through Uniform Selection and Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose UNICON, a simple yet effective sample selection method which is robust to high label noise. |
Nazmul Karim; Mamshad Nayeem Rizve; Nazanin Rahnavard; Ajmal Mian; Mubarak Shah; |
428 | Learning From All Vehicles Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a system to train driving policies from experiences collected not just from the ego-vehicle, but all vehicles that it observes. |
Dian Chen; Philipp Krähenbühl; |
429 | BEHAVE: Dataset and Method for Tracking Human Object Interactions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our key insight is to predict correspondences from the human and the object to a statistical body model to obtain human-object contacts during interactions. |
Bharat Lal Bhatnagar; Xianghui Xie; Ilya A. Petrov; Cristian Sminchisescu; Christian Theobalt; Gerard Pons-Moll; |
430 | Disentangled3D: Learning A 3D Generative Model With Disentangled Geometry and Appearance From Monocular Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we design a 3D GAN which can learn a disentangled model of objects, just from monocular observations. |
Ayush Tewari; Mallikarjun B R; Xingang Pan; Ohad Fried; Maneesh Agrawala; Christian Theobalt; |
431 | Revisiting Random Channel Pruning for Neural Network Compression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we try to determine the channel configuration of the pruned models by random search. |
Yawei Li; Kamil Adamczewski; Wen Li; Shuhang Gu; Radu Timofte; Luc Van Gool; |
432 | One-Bit Active Query With Contrastive Pairs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a new active learning framework, which for the first time incorporates contrastive learning into recently proposed one-bit supervision. |
Yuhang Zhang; Xiaopeng Zhang; Lingxi Xie; Jie Li; Robert C. Qiu; Hengtong Hu; Qi Tian; |
433 | Estimating Egocentric 3D Human Pose in The Wild With External Weak Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a new egocentric pose estimation method, which can be trained on the new dataset with weak external supervision. |
Jian Wang; Lingjie Liu; Weipeng Xu; Kripasindhu Sarkar; Diogo Luvizon; Christian Theobalt; |
434 | Performance-Aware Mutual Knowledge Distillation for Improving Neural Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In existing MKD methods, mutual knowledge distillation is performed between models without scrutiny: a worse-performing model is allowed to generate knowledge to train a better-performing model, which may lead to collective failures. To address this problem, we propose a performance-aware MKD (PAMKD) approach for NAS, where knowledge generated by model A is allowed to train model B only if the performance of A is better than B. |
Pengtao Xie; Xuefeng Du; |
435 | Does Text Attract Attention on E-Commerce Images: A Novel Saliency Prediction Dataset and Method Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we establish the first dataset of saliency e-commerce images (SalECI), which allows for learning to predict saliency on the e-commerce images. |
Lai Jiang; Yifei Li; Shengxi Li; Mai Xu; Se Lei; Yichen Guo; Bo Huang; |
436 | Topologically-Aware Deformation Fields for Single-View 3D Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a new framework to learn dense 3D reconstruction and correspondence from a single 2D image. |
Shivam Duggal; Deepak Pathak; |
437 | HyperInverter: Improving StyleGAN Inversion Via Hypernetwork Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unfortunately, the majority of existing GAN inversion methods fail to meet at least one of the three requirements listed below: high reconstruction quality, editability, and fast inference. We present a novel two-phase strategy in this research that fits all requirements at the same time. |
Tan M. Dinh; Anh Tuan Tran; Rang Nguyen; Binh-Son Hua; |
438 | Sparse Non-Local CRF Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new sparse non-local CRF: it has a sparse number of connections, but it has both local and non-local ones. |
Olga Veksler; Yuri Boykov; |
439 | Dataset Distillation By Matching Training Trajectories Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new formulation that optimizes our distilled data to guide networks to a similar state as those trained on real data across many training steps. |
George Cazenavette; Tongzhou Wang; Antonio Torralba; Alexei A. Efros; Jun-Yan Zhu; |
440 | Towards Driving-Oriented Metric for Lane Detection Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we design 2 new driving-oriented metrics for lane detection: End-to-End Lateral Deviation metric (E2E-LD) is directly formulated based on the requirements of autonomous driving, a core task downstream of lane detection; Per-frame Simulated Lateral Deviation metric (PSLD) is a lightweight surrogate metric of E2E-LD. |
Takami Sato; Qi Alfred Chen; |
441 | EPro-PnP: Generalized End-to-End Probabilistic Perspective-N-Points for Monocular Object Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the EPro-PnP, a probabilistic PnP layer for general end-to-end pose estimation, which outputs a distribution of pose on the SE(3) manifold, essentially bringing categorical Softmax to the continuous domain. |
Hansheng Chen; Pichao Wang; Fan Wang; Wei Tian; Lu Xiong; Hao Li; |
442 | Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: With desirable characteristics, reconstruction autoencoder-based methods deal with this problem by using input reconstruction error as a metric of novelty vs. normality. We formulate the essence of such approach as a quadruplet domain translation with an intrinsic bias to only query for a proxy of conditional data uncertainty. |
Yibo Zhou; |
443 | XYDeblur: Divide and Conquer for Single Image Deblurring Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Toward an effective network architecture, we present complemental sub-solutions learning with a one-encoder-two-decoder architecture for single image deblurring. |
Seo-Won Ji; Jeongmin Lee; Seung-Wook Kim; Jun-Pyo Hong; Seung-Jin Baek; Seung-Won Jung; Sung-Jea Ko; |
444 | Generating Diverse and Natural 3D Human Motions From Text Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Instead of directly engaging with pose sequences, we propose motion snippet code as our internal motion representation, which captures local semantic motion contexts and is empirically shown to facilitate the generation of plausible motions faithful to the input text. |
Chuan Guo; Shihao Zou; Xinxin Zuo; Sen Wang; Wei Ji; Xingyu Li; Li Cheng; |
445 | E-CIR: Event-Enhanced Continuous Intensity Recovery Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents E-CIR, which converts a blurry image into a sharp video represented as a parametric function from time to intensity. |
Chen Song; Qixing Huang; Chandrajit Bajaj; |
446 | Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A systematic evaluation of key modules in existing methods is performed in terms of their robustness against adversarial attacks. From the insights of our analysis, we construct a more robust deraining method by integrating these effective modules. |
Yi Yu; Wenhan Yang; Yap-Peng Tan; Alex C. Kot; |
447 | STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To better evaluate pedestrian perception algorithms in crowded scenarios, we introduce a large-scale multimodal dataset, STCrowd. |
Peishan Cong; Xinge Zhu; Feng Qiao; Yiming Ren; Xidong Peng; Yuenan Hou; Lan Xu; Ruigang Yang; Dinesh Manocha; Yuexin Ma; |
448 | Deep Decomposition for Stochastic Normal-Abnormal Transport Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We develop a machine learning model, D^2-SONATA, built upon a stochastic advection-diffusion equation, which predicts the velocity and diffusion fields that drive 2D/3D image time-series of transport. |
Peirong Liu; Yueh Lee; Stephen Aylward; Marc Niethammer; |
449 | Global Context With Discrete Diffusion in Vector Quantised Modelling for Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We show that with the help of a content-rich discrete visual codebook from VQ-VAE, the discrete diffusion model can also generate high fidelity images with global context, which compensates for the deficiency of the classical autoregressive model along pixel space. |
Minghui Hu; Yujie Wang; Tat-Jen Cham; Jianfei Yang; P.N. Suganthan; |
450 | Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a keypoint-based object-level SLAM framework that can provide globally consistent 6DoF pose estimates for symmetric and asymmetric objects alike. |
Nathaniel Merrill; Yuliang Guo; Xingxing Zuo; Xinyu Huang; Stefan Leutenegger; Xi Peng; Liu Ren; Guoquan Huang; |
451 | AziNorm: Exploiting The Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Point cloud, the most important data format for 3D environmental perception, is naturally endowed with strong radial symmetry. In this work, we exploit this radial symmetry via a divide-and-conquer strategy to boost 3D perception performance and ease optimization. |
Shaoyu Chen; Xinggang Wang; Tianheng Cheng; Wenqiang Zhang; Qian Zhang; Chang Huang; Wenyu Liu; |
452 | Towards Multimodal Depth Estimation From Light Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This allows us to supervise the multimodal depth prediction and also validate all methods by measuring the KL divergence of the predicted posteriors. With our thorough analysis and novel dataset, we aim to start a new line of depth estimation research that overcomes some of the long-standing limitations of this field. |
Titus Leistner; Radek Mackowiak; Lynton Ardizzone; Ullrich Köthe; Carsten Rother; |
453 | Learning To Recognize Procedural Activities With Distant Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we consider the problem of classifying fine-grained, multi-step activities (e.g., cooking different recipes, making disparate home improvements, creating various forms of arts and crafts) from long videos spanning up to several minutes. |
Xudong Lin; Fabio Petroni; Gedas Bertasius; Marcus Rohrbach; Shih-Fu Chang; Lorenzo Torresani; |
454 | Multimodal Material Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce the MCubeS dataset (from MultiModal Material Segmentation) which contains 500 sets of multimodal images capturing 42 street scenes. |
Yupeng Liang; Ryosuke Wakaki; Shohei Nobuhara; Ko Nishino; |
455 | Multi-Frame Self-Supervised Depth With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we revisit feature matching for self-supervised monocular depth estimation, and propose a novel transformer architecture for cost volume generation. |
Vitor Guizilini; Rareș Ambruș; Dian Chen; Sergey Zakharov; Adrien Gaidon; |
456 | Weakly Supervised Rotation-Invariant Aerial Object Detection Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Meanwhile, current solutions have been prone to fall into the issue with unstable detectors, as they ignore lower-scored instances and may regard them as backgrounds. To address these issues, in this paper, we construct a novel end-to-end weakly supervised Rotation-Invariant aerial object detection Network (RINet). |
Xiaoxu Feng; Xiwen Yao; Gong Cheng; Junwei Han; |
457 | Modeling Motion With Multi-Modal Features for Text-Based Video Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we design a method to fuse and align appearance, motion, and linguistic features to achieve accurate segmentation. |
Wangbo Zhao; Kai Wang; Xiangxiang Chu; Fuzhao Xue; Xinchao Wang; Yang You; |
458 | Surface Reconstruction From Point Clouds By Learning Predictive Context Priors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, this requires the local context prior to generalize to a wide variety of unseen target regions, which is hard to achieve. To resolve this issue, we introduce Predictive Context Priors by learning Predictive Queries for each specific point cloud at inference time. |
Baorui Ma; Yu-Shen Liu; Matthias Zwicker; Zhizhong Han; |
459 | Deformable Video Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce the Deformable Video Transformer (DVT), which dynamically predicts a small subset of video patches to attend for each query location based on motion information, thus allowing the model to decide where to look in the video based on correspondences across frames. |
Jue Wang; Lorenzo Torresani; |
460 | Self-Supervised Keypoint Discovery in Behavioral Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a method for learning the posture and structure of agents from unlabelled behavioral videos. |
Jennifer J. Sun; Serim Ryou; Roni H. Goldshmid; Brandon Weissbourd; John O. Dabiri; David J. Anderson; Ann Kennedy; Yisong Yue; Pietro Perona; |
461 | IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, our intuition is that the long-range attention learned by transformer architectures is ideally suited to solve longstanding challenges in single-image inverse rendering. |
Rui Zhu; Zhengqin Li; Janarbek Matai; Fatih Porikli; Manmohan Chandraker; |
462 | DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To that end, we propose the DynamicEarthNet dataset that consists of daily, multi-spectral satellite observations of 75 selected areas of interest distributed over the globe with imagery from Planet Labs. |
Aysim Toker; Lukas Kondmann; Mark Weber; Marvin Eisenberger; Andrés Camero; Jingliang Hu; Ariadna Pregel Hoderlein; Çağlar Şenaras; Timothy Davis; Daniel Cremers; Giovanni Marchisio; Xiao Xiang Zhu; Laura Leal-Taixé; |
463 | Connecting The Complementary-View Videos: Joint Camera Identification and Subject Association Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we develop a new approach that can simultaneously handle three tasks: i) localizing the side-view camera in the top view; ii) estimating the view direction of the side-view camera; iii) detecting and associating the same subjects on the ground across the complementary views. |
Ruize Han; Yiyang Gan; Jiacheng Li; Feifan Wang; Wei Feng; Song Wang; |
464 | End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we aim to forecast a future trajectory distribution of a moving agent in the real world, given the social scene images and historical trajectories. |
Ke Guo; Wenxi Liu; Jia Pan; |
465 | Fast, Accurate and Memory-Efficient Partial Permutation Synchronization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here we overcome the restriction of CEMP to compact groups and propose an improved algorithm, CEMP-Partial, for estimating the corruption levels of the observed partial permutations. |
Shaohan Li; Yunpeng Shi; Gilad Lerman; |
466 | Quantization-Aware Deep Optics for Diffractive Snapshot Hyperspectral Imaging Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the quantization-aware deep optics for diffractive snapshot hyperspectral imaging. |
Lingen Li; Lizhi Wang; Weitao Song; Lei Zhang; Zhiwei Xiong; Hua Huang; |
467 | Weakly Supervised Temporal Action Localization Via Representative Snippet Knowledge Propagation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Many existing methods seek to generate pseudo labels for bridging the discrepancy between classification and localization, but usually only make use of limited contextual information for pseudo label generation. To alleviate this problem, we propose a representative snippet summarization and propagation framework. |
Linjiang Huang; Liang Wang; Hongsheng Li; |
468 | Parametric Scattering Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The wavelet filters used in the scattering transform are typically selected to create a tight frame via a parameterized mother wavelet. In this work, we investigate whether this standard wavelet filterbank construction is optimal. |
Shanel Gauthier; Benjamin Thérien; Laurent Alsène-Racicot; Muawiz Chaudhary; Irina Rish; Eugene Belilovsky; Michael Eickenberg; Guy Wolf; |
469 | SketchEdit: Mask-Free Local Image Manipulation With Partial Sketches Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Although this setup simplifies data preparation and model design, it complicates user interaction and discards useful information in masked regions. To this end, we propose a new framework for sketch-based image manipulation that only requires sketch inputs from users and utilizes the entire original image. |
Yu Zeng; Zhe Lin; Vishal M. Patel; |
470 | ScaleNet: A Shallow Architecture for Scale Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we address the problem of estimating scale factors between images. |
Axel Barroso-Laguna; Yurun Tian; Krystian Mikolajczyk; |
471 | E2EC: An End-to-End Contour-Based Method for High-Quality High-Speed Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a novel contour-based method, named E2EC, for high-quality instance segmentation. |
Tao Zhang; Shiqing Wei; Shunping Ji; |
472 | Bounded Adversarial Attack on Deep Content Features Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel adversarial attack targeting content features in some deep layer, that is, individual neurons in the layer. |
Qiuling Xu; Guanhong Tao; Xiangyu Zhang; |
473 | BatchFormer: Learning To Explore Sample Relationships for Robust Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the above-mentioned issues, a variety of methods have been devised to explore the sample relationships in a vanilla way (i.e., from the perspectives of either the input or the loss function), failing to explore the internal structure of deep neural networks for learning with sample relationships. Inspired by this, we propose to enable deep neural networks themselves with the ability to learn the sample relationships from each mini-batch. |
Zhi Hou; Baosheng Yu; Dacheng Tao; |
474 | Self-Supervised Image-Specific Prototype Exploration for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the classifier focuses only on the discriminative regions while ignoring other useful information in each image, resulting in incomplete localization maps. To address this issue, we propose a Self-supervised Image-specific Prototype Exploration (SIPE) that consists of an Image-specific Prototype Exploration (IPE) and a General-Specific Consistency (GSC) loss. |
Qi Chen; Lingxiao Yang; Jian-Huang Lai; Xiaohua Xie; |
475 | CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, inspired by the transformer style self-attention mechanism, we propose a strategy to cross-attend and re-weight discriminative features for few-shot classification. |
Philip Chikontwe; Soopil Kim; Sang Hyun Park; |
476 | Fingerprinting Deep Neural Networks Globally Via Universal Adversarial Perturbations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel and practical mechanism which enables the service provider to verify whether a suspect model is stolen from the victim model via model extraction attacks. |
Zirui Peng; Shaofeng Li; Guoxing Chen; Cheng Zhang; Haojin Zhu; Minhui Xue; |
477 | Learning Multi-View Aggregation in The Wild for Large-Scale 3D Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast, we propose an end-to-end trainable multi-view aggregation model leveraging the viewing conditions of 3D points to merge features from images taken at arbitrary positions. |
Damien Robert; Bruno Vallet; Loic Landrieu; |
478 | ManiTrans: Entity-Level Text-Guided Image Manipulation Via Token-Wise Semantic Alignment and Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we study a novel task on text-guided image manipulation on the entity level in the real world. |
Jianan Wang; Guansong Lu; Hang Xu; Zhenguo Li; Chunjing Xu; Yanwei Fu; |
479 | Improving Video Model Transfer With Dynamic Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, the problem of dynamic representation learning (DRL) is studied. |
Yi Li; Nuno Vasconcelos; |
480 | PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: These methods can be negatively influenced by strong illumination conditions causing shading-reflectance leakages. Therefore, in this paper, an end-to-end edge-driven hybrid CNN approach is proposed for intrinsic image decomposition. |
Partha Das; Sezer Karaoglu; Theo Gevers; |
481 | Clothes-Changing Person Re-Identification With RGB Modality Only Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Clothes-based Adversarial Loss (CAL) to mine clothes-irrelevant features from the original RGB images by penalizing the predictive power of re-id model w.r.t. clothes. |
Xinqian Gu; Hong Chang; Bingpeng Ma; Shutao Bai; Shiguang Shan; Xilin Chen; |
482 | Chitransformer: Towards Reliable Stereo From Cues Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While single image depth estimation is spared from these challenges and can achieve satisfactory results with the extracted monocular cues, the lack of stereoscopic relationship renders the monocular prediction less reliable on its own, especially in highly dynamic or cluttered environments. To address these issues in both scenarios, we present an optic-chiasm-inspired self-supervised binocular depth estimation method, wherein a vision transformer (ViT) with gated positional cross-attention (GPCA) layers is designed to enable feature-sensitive pattern retrieval between views while retaining the extensive context information aggregated through self-attentions. |
Qing Su; Shihao Ji; |
483 | Robust Image Forgery Detection Over Online Social Network Shared Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To fight against the OSN-shared forgeries, in this work, a novel robust training scheme is proposed. |
Haiwei Wu; Jiantao Zhou; Jinyu Tian; Jun Liu; |
484 | QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Moreover, the feature itself does not reflect the relation with others. This paper deals with these problems by intentionally selecting significant anchor points for contrastive learning. |
Xueqi Hu; Xinyue Zhou; Qiusheng Huang; Zhengyi Shi; Li Sun; Qingli Li; |
485 | Physically Disentangled Intra- and Inter-Domain Adaptation for Varicolored Haze Removal Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, our goal is to handle the new task: real-world varicolored haze removal. |
Yi Li; Yi Chang; Yan Gao; Changfeng Yu; Luxin Yan; |
486 | Modality-Agnostic Learning for Radar-Lidar Fusion in Vehicle Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: If one of the sensors is unavailable or missing, the model may fail catastrophically. To mitigate this problem, we propose the Self-Training Multimodal Vehicle Detection Network (ST-MVDNet) which leverages a Teacher-Student mutual learning framework and a simulated sensor noise model used in strong data augmentation for Lidar and Radar. |
Yu-Jhe Li; Jinhyung Park; Matthew O’Toole; Kris Kitani; |
487 | A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: By reducing the weights of the majority classes, such instances would become more difficult to learn and hurt the overall performance consequently. To tackle this problem, we propose a novel instance-level re-balancing strategy, which dynamically adjusts the sampling probabilities of instances according to the instance difficulty. |
Sihao Yu; Jiafeng Guo; Ruqing Zhang; Yixing Fan; Zizhen Wang; Xueqi Cheng; |
488 | Representation Compensation Networks for Continual Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we study the continual semantic segmentation problem, where the deep neural networks are required to incorporate new classes continually without catastrophic forgetting. |
Chang-Bin Zhang; Jia-Wen Xiao; Xialei Liu; Ying-Cong Chen; Ming-Ming Cheng; |
489 | Adaptive Gating for Single-Photon 3D Imaging Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an adaptive gating scheme built upon Thompson sampling. |
Ryan Po; Adithya Pediredla; Ioannis Gkioulekas; |
490 | Tracking People By Predicting 3D Appearance, Location and Pose Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present an approach for tracking people in monocular videos by predicting their future 3D representations. |
Jathushan Rajasegaran; Georgios Pavlakos; Angjoo Kanazawa; Jitendra Malik; |
491 | Text2Mesh: Text-Driven Neural Stylization for Meshes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we develop intuitive controls for editing the style of 3D objects. |
Oscar Michel; Roi Bar-On; Richard Liu; Sagie Benaim; Rana Hanocka; |
492 | Learning To Solve Hard Minimal Problems Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present an approach to solving hard geometric optimization problems in the RANSAC framework. |
Petr Hruby; Timothy Duff; Anton Leykin; Tomas Pajdla; |
493 | H4D: Human 4D Modeling By Learning Neural Compositional Representation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work presents a novel framework that can effectively learn a compact and compositional representation for dynamic human by exploiting the human body prior from the widely used SMPL parametric model. |
Boyan Jiang; Yinda Zhang; Xingkui Wei; Xiangyang Xue; Yanwei Fu; |
494 | FWD: Real-Time Novel View Synthesis With Forward Warping and Depth Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In our paper, we propose a generalizable NVS method with sparse inputs, called FWD, which gives high-quality synthesis in real-time. |
Ang Cao; Chris Rockwell; Justin Johnson; |
495 | Non-Generative Generalized Zero-Shot Learning Via Task-Correlated Disentanglement and Controllable Samples Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a non-generative model to address these problems correspondingly in two modules: (1) Task-correlated feature disentanglement, to exclude the task-correlated features from task-independent ones by adversarial learning of domain adaptation towards reasonable synthesis; (2) Controllable pseudo sample synthesis, to synthesize edge-pseudo and center-pseudo samples with certain characteristics towards more diverse generation and intuitive transfer. |
Yaogong Feng; Xiaowen Huang; Pengbo Yang; Jian Yu; Jitao Sang; |
496 | C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we find that there are mainly two challenges of medical images in WSSS: i) the boundary of object foreground and background is not clear; ii) the co-occurrence phenomenon is very severe in training stage. We thus propose a Causal CAM (C-CAM) method to overcome the above challenges. |
Zhang Chen; Zhiqiang Tian; Jihua Zhu; Ce Li; Shaoyi Du; |
497 | Leveraging Real Talking Faces Via Self-Supervision for Robust Forgery Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: One of the most pressing challenges for the detection of face-manipulated videos is generalising to forgery methods not seen during training while remaining effective under common corruptions such as compression. In this paper, we examine whether we can tackle this issue by harnessing videos of real talking faces, which contain rich information on natural facial appearance and behaviour and are readily available in large quantities online. |
Alexandros Haliassos; Rodrigo Mira; Stavros Petridis; Maja Pantic; |
498 | Forward Compatible Few-Shot Class-Incremental Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: By contrast, we suggest learning prospectively to prepare for future updates, and propose ForwArd Compatible Training (FACT) for FSCIL. |
Da-Wei Zhou; Fu-Yun Wang; Han-Jia Ye; Liang Ma; Shiliang Pu; De-Chuan Zhan; |
499 | BaLeNAS: Differentiable Architecture Search Via The Bayesian Learning Rule Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Rather than directly optimizing the architecture parameters, this paper formulates the neural architecture search as a distribution learning problem through relaxing the architecture weights into Gaussian distributions. |
Miao Zhang; Shirui Pan; Xiaojun Chang; Steven Su; Jilin Hu; Gholamreza (Reza) Haffari; Bin Yang; |
500 | Cannot See The Forest for The Trees: Aggregating Multiple Viewpoints To Better Classify Objects in Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a set classifier that improves accuracy of classifying tracklets by aggregating information from multiple viewpoints contained in a tracklet. |
Sukjun Hwang; Miran Heo; Seoung Wug Oh; Seon Joo Kim; |
501 | Learning Canonical F-Correlation Projection for Compact Multiview Representation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we address the preceding problem and propose a novel canonical F-correlation framework by exploring and exploiting the nonlinear relationship between different features. |
Yun-Hao Yuan; Jin Li; Yun Li; Jipeng Qiang; Yi Zhu; Xiaobo Shen; Jianping Gou; |
502 | DIFNet: Boosting Visual Information Flow for Image Captioning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, for most cases, the partially generated sentence may dominate the target word prediction due to the insufficiency of visual information, making the generated descriptions irrelevant to the content of the given image. In this paper, we propose a Dual Information Flow Network (DIFNet) to address this issue, which takes segmentation feature as another visual information source to enhance the contribution of visual information for prediction. |
Mingrui Wu; Xuying Zhang; Xiaoshuai Sun; Yiyi Zhou; Chao Chen; Jiaxin Gu; Xing Sun; Rongrong Ji; |
503 | Weakly Supervised Object Localization As Domain Adaption Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the MIL mechanism makes CAM only activate discriminative object parts rather than the whole object, weakening its performance for localizing objects. To avoid this problem, this work provides a novel perspective that models WSOL as a domain adaption (DA) task, where the score estimator trained on the source/image domain is tested on the target/pixel domain to locate objects. |
Lei Zhu; Qi She; Qian Chen; Yunfei You; Boyu Wang; Yanye Lu; |
504 | Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the Tencent-MVSE dataset, which is the first benchmark dataset for the multi-modal video similarity evaluation task. |
Zhaoyang Zeng; Yongsheng Luo; Zhenhua Liu; Fengyun Rao; Dian Li; Weidong Guo; Zhen Wen; |
505 | Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, these interaction approaches usually cannot well capture the intrinsic object details in the query images that are widely encountered in FSS, e.g., if the query object to be segmented has holes and slots, inaccurate segmentation almost always happens. To this end, we propose a dynamic prototype convolution network (DPCN) to fully capture the aforementioned intrinsic details for accurate FSS. |
Jie Liu; Yanqi Bao; Guo-Sen Xie; Huan Xiong; Jan-Jakob Sonke; Efstratios Gavves; |
506 | Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Meanwhile, recent advances in the functional map framework allow to enforce orientation preservation using a functional representation for tangent vector field transfer, through so-called complex functional maps. Using this representation, we propose a new deep learning approach to learn orientation-aware features in a fully unsupervised setting. |
Nicolas Donati; Etienne Corman; Maks Ovsjanikov; |
507 | Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel tree energy loss for SASS by providing semantic guidance for unlabeled pixels. |
Zhiyuan Liang; Tiancai Wang; Xiangyu Zhang; Jian Sun; Jianbing Shen; |
508 | Mr.BiQ: Post-Training Non-Uniform Quantization Based on Minimizing The Reconstruction Error Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We thus propose a new post-training non-uniform quantization method, called Mr.BiQ, allowing low bit-width quantization even on Transformer models. |
Yongkweon Jeon; Chungman Lee; Eulrang Cho; Yeonju Ro; |
509 | MatteFormer: Transformer-Based Image Matting Via Prior-Tokens Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a transformer-based image matting model called MatteFormer, which takes full advantage of trimap information in the transformer block. |
GyuTae Park; SungJoon Son; JaeYoung Yoo; SeHo Kim; Nojun Kwak; |
510 | Video Shadow Detection Via Spatio-Temporal Interpolation Consistency Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Applying a model trained on labeled images directly to video frames may lead to high generalization error and temporally inconsistent results. In this paper, we address these challenges by proposing a Spatio-Temporal Interpolation Consistency Training (STICT) framework to rationally feed the unlabeled video frames together with the labeled images into image shadow detection network training. |
Xiao Lu; Yihong Cao; Sheng Liu; Chengjiang Long; Zipei Chen; Xuanyu Zhou; Yimin Yang; Chunxia Xiao; |
511 | Ranking Distance Calibration for Cross-Domain Few-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Hence, starting from a generic representation pre-trained with a cross-entropy loss and a conventional distance-based classifier, we take an image-retrieval view and employ a re-ranking process to calibrate a target distance matrix by discovering the k-reciprocal neighbours within the task. |
Pan Li; Shaogang Gong; Chengjie Wang; Yanwei Fu; |
512 | Robust and Accurate Superquadric Recovery: A Probabilistic Approach Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The superquadric recovery is formulated as a Maximum Likelihood Estimation (MLE) problem. We propose an algorithm, Expectation, Maximization, and Switching (EMS), to solve this problem, where: (1) outliers are predicted from the posterior perspective; (2) the superquadric parameter is optimized by the trust-region reflective algorithm; and (3) local optima are avoided by globally searching and switching among parameters encoding similar superquadrics. |
Weixiao Liu; Yuwei Wu; Sipu Ruan; Gregory S. Chirikjian; |
513 | Zero-Shot Text-Guided Object Generation With Dream Fields Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To improve fidelity and visual quality, we introduce simple geometric priors, including sparsity-inducing transmittance regularization, scene bounds, and new MLP architectures. |
Ajay Jain; Ben Mildenhall; Jonathan T. Barron; Pieter Abbeel; Ben Poole; |
514 | Learning Pixel Trajectories With Multiscale Contrastive Random Walks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The main contribution is introducing hierarchy into the search problem by computing the transition matrix in a coarse-to-fine manner, forming a multiscale contrastive random walk. |
Zhangxing Bian; Allan Jabri; Alexei A. Efros; Andrew Owens; |
515 | Self-Supervised Correlation Mining Network for Person Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a Self-supervised Correlation Mining Network (SCM-Net) to rearrange the source images in the feature space, in which two collaborative modules are integrated, Decomposed Style Encoder (DSE) and Correlation Mining Module (CMM). |
Zijian Wang; Xingqun Qi; Kun Yuan; Muyi Sun; |
516 | Grounding Answers for Visual Questions Asked By Visually Impaired People Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce the VizWiz-VQA-Grounding dataset, the first dataset that visually grounds answers to visual questions asked by people with visual impairments. |
Chongyan Chen; Samreen Anjum; Danna Gurari; |
517 | Task Adaptive Parameter Sharing for Multi-Task Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To efficiently learn multiple downstream tasks we introduce Task Adaptive Parameter Sharing (TAPS), a simple method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers. |
Matthew Wallingford; Hao Li; Alessandro Achille; Avinash Ravichandran; Charless Fowlkes; Rahul Bhotika; Stefano Soatto; |
518 | Sparse Instance Activation for Real-Time Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation. |
Tianheng Cheng; Xinggang Wang; Shaoyu Chen; Wenqiang Zhang; Qian Zhang; Chang Huang; Zhaoxiang Zhang; Wenyu Liu; |
519 | Automatic Color Image Stitching Using Quaternion Rank-1 Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To quantitatively evaluate image stitching performance, we propose a perceptual seam quality (PSQ) measure to calculate misalignments of local regions along the seamline. |
Jiaxue Li; Yicong Zhou; |
520 | VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose VisualGPT, which employs a novel self-resurrecting encoder-decoder attention mechanism to quickly adapt the PLM with a small amount of in-domain image-text data. |
Jun Chen; Han Guo; Kai Yi; Boyang Li; Mohamed Elhoseiny; |
521 | ESCNet: Gaze Target Detection With The Understanding of 3D Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper aims to address the single image gaze target detection problem. |
Jun Bao; Buyu Liu; Jun Yu; |
522 | Can You Spot The Chameleon? Adversarially Camouflaging Images From Co-Salient Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, it inevitably poses an entirely new safety and security issue, i.e., highly personal and sensitive content can potentially be extracted by powerful CoSOD methods. In this paper, we address this problem from the perspective of adversarial attacks and identify a novel task: adversarial co-saliency attack. |
Ruijun Gao; Qing Guo; Felix Juefei-Xu; Hongkai Yu; Huazhu Fu; Wei Feng; Yang Liu; Song Wang; |
523 | Finding Badly Drawn Bunnies Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Geometry-Aware Classification Layer (GACL), a generic method that makes feature-magnitude-as-quality-metric possible and importantly does it without the need for specific quality annotations from humans. |
Lan Yang; Kaiyue Pang; Honggang Zhang; Yi-Zhe Song; |
524 | Point2Cyl: Reverse Engineering 3D Objects From Point Clouds to Extrusion Cylinders Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose Point2Cyl, a supervised network transforming a raw 3D point cloud to a set of extrusion cylinders. |
Mikaela Angelina Uy; Yen-Yu Chang; Minhyuk Sung; Purvi Goel; Joseph G. Lambourne; Tolga Birdal; Leonidas J. Guibas; |
525 | All-Photon Polarimetric Time-of-Flight Imaging Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We depart from the principle of first arrival and instead propose an all-photon ToF imaging method relying on polarization changes that analyzes both first- and late-arriving photons for shape and material scene understanding. |
Seung-Hwan Baek; Felix Heide; |
526 | MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, those works ignore the fact that it is an inverse problem where multiple feasible solutions (i.e., hypotheses) exist. To relieve this limitation, we propose a Multi-Hypothesis Transformer (MHFormer) that learns spatio-temporal representations of multiple plausible pose hypotheses. |
Wenhao Li; Hong Liu; Hao Tang; Pichao Wang; Luc Van Gool; |
527 | Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new method for reconstructing controllable implicit 3D human models from sparse multi-view RGB videos. |
Tianhan Xu; Yasuhiro Fujita; Eiichi Matsumoto; |
528 | Learning From Temporal Gradient for Semi-Supervised Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To better leverage the encoded temporal information in videos, we introduce temporal gradient as an additional modality for more attentive feature extraction in this paper. |
Junfei Xiao; Longlong Jing; Lin Zhang; Ju He; Qi She; Zongwei Zhou; Alan Yuille; Yingwei Li; |
529 | Towards Implicit Text-Guided 3D Shape Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we explore the challenging task of generating 3D shapes from text. |
Zhengzhe Liu; Yi Wang; Xiaojuan Qi; Chi-Wing Fu; |
530 | Audio-Driven Neural Gesture Reenactment With Video Motion Graphs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a method that reenacts a high-quality video with gestures matching a target speech audio. |
Yang Zhou; Jimei Yang; Dingzeyu Li; Jun Saito; Deepali Aneja; Evangelos Kalogerakis; |
531 | SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose SoftCollage, a novel method that employs a neural-based differentiable probabilistic tree generator to produce the probability distribution of correlation-preserving collage tree conditioned on deep image feature, aspect ratio and canvas size. |
Jiahao Yu; Li Chen; Mingrui Zhang; Mading Li; |
532 | Transforming Model Prediction for Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While this inductive bias integrates valuable domain knowledge, it limits the expressivity of the tracking network. In this work, we therefore propose a tracker architecture employing a Transformer-based model prediction module. |
Christoph Mayer; Martin Danelljan; Goutam Bhat; Matthieu Paul; Danda Pani Paudel; Fisher Yu; Luc Van Gool; |
533 | A Unified Framework for Implicit Sinkhorn Differentiation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To allow for an efficient training of respective neural networks, we propose an algorithm that obtains analytical gradients of a Sinkhorn layer via implicit differentiation. |
Marvin Eisenberger; Aysim Toker; Laura Leal-Taixé; Florian Bernard; Daniel Cremers; |
534 | DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, most of them hardly consider the geometric features in 3D space, and ignore the topology cues when performing differentiable RANSAC algorithms. To this end, we propose a Depth-Guided Edge Convolutional Network (DGECN) for the 6D pose estimation task. |
Tuo Cao; Fei Luo; Yanping Fu; Wenxiao Zhang; Shengjie Zheng; Chunxia Xiao; |
535 | Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs With Language Structures Via Dependency Relationships Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, how to construct a joint vision-language (VL) structure has barely been investigated. As a more challenging but worthwhile direction, we introduce a new task that targets inducing such a joint VL structure in an unsupervised manner. |
Chao Lou; Wenjuan Han; Yuhuan Lin; Zilong Zheng; |
536 | Open-Vocabulary Instance Segmentation Via Robust Cross-Modal Pseudo-Labeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the high-level textual information learned from caption pretraining alone cannot effectively encode the details required for pixel-wise segmentation. To address this, we propose a cross-modal pseudo-labeling framework, which generates training pseudo masks by aligning word semantics in captions with visual features of object masks in images. |
Dat Huynh; Jason Kuen; Zhe Lin; Jiuxiang Gu; Ehsan Elhamifar; |
537 | Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We develop LIIR, a locality-aware inter-and intra-video reconstruction framework that fills in three missing pieces, i.e., instance discrimination, location awareness, and spatial compactness, of self-supervised correspondence learning puzzle. |
Liulei Li; Tianfei Zhou; Wenguan Wang; Lu Yang; Jianwu Li; Yi Yang; |
538 | A Versatile Multi-View Framework for LiDAR-Based 3D Object Detection With Guidance From Panoptic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel multi-task framework that jointly performs 3D object detection and panoptic segmentation. |
Hamidreza Fazlali; Yixuan Xu; Yuan Ren; Bingbing Liu; |
539 | Query and Attention Augmentation for Knowledge-Based Explainable Reasoning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To bridge this research gap, we present Query and Attention Augmentation, a general approach that augments neural module networks to jointly reason about visual and external knowledge. |
Yifeng Zhang; Ming Jiang; Qi Zhao; |
540 | Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel task and dataset for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning, which we call Winoground. |
Tristan Thrush; Ryan Jiang; Max Bartolo; Amanpreet Singh; Adina Williams; Douwe Kiela; Candace Ross; |
541 | RFNet: Unsupervised Network for Mutually Reinforcing Multi-Modal Image Registration and Fusion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel method to realize multi-modal image registration and fusion in a mutually reinforcing framework, termed as RFNet. |
Han Xu; Jiayi Ma; Jiteng Yuan; Zhuliang Le; Wei Liu; |
542 | Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The main challenge of this task is perceiving various temporal variations of diverse event boundaries. To this end, this paper presents an effective and end-to-end learnable framework (DDM-Net). |
Jiaqi Tang; Zhaoyang Liu; Chen Qian; Wayne Wu; Limin Wang; |
543 | Interactron: Embodied Adaptive Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Interactron, a method for adaptive object detection in an interactive setting, where the goal is to perform object detection in images observed by an embodied agent navigating in different environments. |
Klemen Kotar; Roozbeh Mottaghi; |
544 | 3D Scene Painting Via Semantic Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel approach to 3D scene painting using a configurable 3D scene layout. |
Jaebong Jeong; Janghun Jo; Sunghyun Cho; Jaesik Park; |
545 | MeMOT: Multi-Object Tracking With Memory Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an online tracking algorithm that performs the object detection and data association under a common framework, capable of linking objects after a long time span. |
Jiarui Cai; Mingze Xu; Wei Li; Yuanjun Xiong; Wei Xia; Zhuowen Tu; Stefano Soatto; |
546 | Revisiting Weakly Supervised Pre-Training of Visual Perception Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper revisits weakly-supervised pre-training of models using hashtag supervision with modern versions of residual networks and the largest-ever dataset of images and corresponding hashtags. |
Mannat Singh; Laura Gustafson; Aaron Adcock; Vinicius de Freitas Reis; Bugra Gedik; Raj Prateek Kosaraju; Dhruv Mahajan; Ross Girshick; Piotr Dollár; Laurens van der Maaten; |
547 | Semi-Supervised Semantic Segmentation With Error Localization Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The unlabeled images are usually assigned pseudo labels to be used in training, which however often causes the risk of performance degradation due to the confirmation bias towards errors on the pseudo labels. We present a novel method that resolves this chronic issue of pseudo labeling. |
Donghyeon Kwon; Suha Kwak; |
548 | Meta Convolutional Neural Networks for Single Domain Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new model, termed meta convolutional neural network, to solve the single domain generalization problem in image recognition. |
Chaoqun Wan; Xu Shen; Yonggang Zhang; Zhiheng Yin; Xinmei Tian; Feng Gao; Jianqiang Huang; Xian-Sheng Hua; |
549 | Generalizing Gaze Estimation With Rotation Consistency Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we discover the rotation-consistency property in gaze estimation and introduce the ‘sub-label’ for unsupervised domain adaptation. |
Yiwei Bao; Yunfei Liu; Haofei Wang; Feng Lu; |
550 | Anomaly Detection Via Reverse Distillation From One-Class Embedding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, using similar or identical architectures to build the teacher and student models in previous studies hinders the diversity of anomalous representations. To tackle this problem, we propose a novel T-S model consisting of a teacher encoder and a student decoder and introduce a simple yet effective "reverse distillation" paradigm accordingly. |
Hanqiu Deng; Xingyu Li; |
551 | Fine-Grained Object Classification Via Self-Supervised Pose Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: For discounting pose variations, this paper proposes to learn a novel graph based object representation to reveal a global configuration of local parts for self-supervised pose alignment across classes, which is employed as an auxiliary feature regularization on a deep representation learning network. |
Xuhui Yang; Yaowei Wang; Ke Chen; Yong Xu; Yonghong Tian; |
552 | Spatio-Temporal Gating-Adjacency GCN for Human Motion Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, considering the variants and diverse action types in human motion data, the cross-dependency of the spatio-temporal relationships will be difficult to depict due to the decoupled modeling strategy, which may also exacerbate the problem of insufficient generalization. Therefore, we propose the Spatio-Temporal Gating-Adjacency GCN (GAGCN) to learn the complex spatio-temporal dependencies over diverse action types. |
Chongyang Zhong; Lei Hu; Zihao Zhang; Yongjing Ye; Shihong Xia; |
553 | CellTypeGraph: A New Geometric Computer Vision Benchmark Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Classifying all cells in an organ is a relevant and difficult problem from plant developmental biology. We here abstract the problem into a new benchmark for node classification in a geo-referenced graph. |
Lorenzo Cerrone; Athul Vijayan; Tejasvinee Mody; Kay Schneitz; Fred A. Hamprecht; |
554 | Clustering Plotted Data By Image Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a different way of clustering points in 2-dimensional space, inspired by how humans cluster data: by training neural networks to perform instance segmentation on plotted data. |
Tarek Naous; Srinjay Sarkar; Abubakar Abid; James Zou; |
555 | Accelerating Neural Network Optimization Through An Automated Control Theory Lens Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper studies the optimizer for accelerating the time-consuming deep network training through an automated control theory lens. |
Jiahao Wang; Baoyuan Wu; Rui Su; Mingdeng Cao; Shuwei Shi; Wanli Ouyang; Yujiu Yang; |
556 | Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, existing animal behavior datasets have limitations in multiple aspects, including limited numbers of animal classes, data samples and provided tasks, and also limited variations in environmental conditions and viewpoints. To address these limitations, we create a large and diverse dataset, Animal Kingdom, that provides multiple annotated tasks to enable a more thorough understanding of natural animal behaviors. |
Xun Long Ng; Kian Eng Ong; Qichen Zheng; Yun Ni; Si Yong Yeo; Jun Liu; |
557 | Learning To Learn Across Diverse Data Biases in Deep Face Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we show that many bias variations such as ethnicity, head pose, occlusion and blur can jointly affect the accuracy significantly. |
Chang Liu; Xiang Yu; Yi-Hsuan Tsai; Masoud Faraki; Ramin Moslemi; Manmohan Chandraker; Yun Fu; |
558 | Back to Reality: Weakly-Supervised 3D Object Detection With Shape-Guided Label Enhancement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a weakly-supervised approach for 3D object detection, which makes it possible to train a strong 3D detector with position-level annotations (i.e. annotations of object centers). |
Xiuwei Xu; Yifan Wang; Yu Zheng; Yongming Rao; Jie Zhou; Jiwen Lu; |
559 | Long-Tail Recognition Via Compositional Knowledge Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we introduce a novel strategy for long-tail recognition that addresses the tail classes’ few-shot problem via training-free knowledge transfer. |
Sarah Parisot; Pedro M. Esperança; Steven McDonagh; Tamas J. Madarasz; Yongxin Yang; Zhenguo Li; |
560 | EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: By formulating such an out-of-distribution finetuning process in the Causal Inference paradigm, we view the erroneous semantics of these special entities as confounders that cause the retrieval failure. To rectify these semantics for aligning with e-commerce domain knowledge, we propose an intervention-based entity-aware contrastive learning framework with two modules, i.e., the Confounding Entity Selection Module and Entity-Aware Learning Module. |
Haoyu Ma; Handong Zhao; Zhe Lin; Ajinkya Kale; Zhangyang Wang; Tong Yu; Jiuxiang Gu; Sunav Choudhary; Xiaohui Xie; |
561 | Multi-Dimensional, Nuanced and Subjective – Measuring The Perception of Facial Expressions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a methodology for collecting and modeling multidimensional modulated expression annotations from human annotators. |
De’Aira Bryant; Siqi Deng; Nashlie Sephus; Wei Xia; Pietro Perona; |
562 | PyMiceTracking: An Open-Source Toolbox for Real-Time Behavioral Neuroscience Experiments Related Papers Related Patents Related Grants Related Orgs Related Experts View Abstract: The development of computational tools allows the advancement of research in behavioral neuroscience and elevates the limits of experiment design. Many behavioral experiments need … |
Richardson Menezes; Aron de Miranda; Helton Maia; |
563 | Self-Taught Metric Learning Without Labels Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel self-taught framework for unsupervised metric learning, which alternates between predicting class-equivalence relations between data through a moving average of an embedding model and learning the model with the predicted relations as pseudo labels. |
Sungyeon Kim; Dongwon Kim; Minsu Cho; Suha Kwak; |
564 | MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Most existing video architectures can only process <5 seconds of a video without hitting the computation or memory bottlenecks. In this paper, we propose a new strategy to overcome this challenge. |
Chao-Yuan Wu; Yanghao Li; Karttikeya Mangalam; Haoqi Fan; Bo Xiong; Jitendra Malik; Christoph Feichtenhofer; |
565 | Fine-Grained Temporal Contrastive Learning for Weakly-Supervised Temporal Action Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in WSAL and helps identify coherent action instances. |
Junyu Gao; Mengyuan Chen; Changsheng Xu; |
566 | Embracing Single Stride 3D Object Detector With Sparse Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Overlooking this difference, many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds. In this paper, we start by rethinking how such multi-stride stereotype affects the LiDAR-based 3D object detectors. |
Lue Fan; Ziqi Pang; Tianyuan Zhang; Yu-Xiong Wang; Hang Zhao; Feng Wang; Naiyan Wang; Zhaoxiang Zhang; |
567 | Multidimensional Belief Quantification for Label-Efficient Meta-Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel uncertainty-aware task selection model for label efficient meta-learning. |
Deep Shankar Pandey; Qi Yu; |
568 | UTC: A Unified Transformer With Inter-Task Contrastive Learning for Visual Dialog Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a contrastive learning-based framework UTC to unify and facilitate both discriminative and generative tasks in visual dialog with a single model. |
Cheng Chen; Zhenshan Tan; Qingrong Cheng; Xin Jiang; Qun Liu; Yudong Zhu; Xiaodong Gu; |
569 | Relieving Long-Tailed Instance Segmentation Via Pairwise Class Balance Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we explore excavating the confusion matrix, which carries the fine-grained misclassification details, to relieve the pairwise biases, generalizing the coarse one. |
Yin-Yin He; Peizhen Zhang; Xiu-Shen Wei; Xiangyu Zhang; Jian Sun; |
570 | Online Convolutional Re-Parameterization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present online convolutional re-parameterization (OREPA), a two-stage pipeline, aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution. |
Mu Hu; Junyi Feng; Jiashen Hua; Baisheng Lai; Jianqiang Huang; Xiaojin Gong; Xian-Sheng Hua; |
571 | Mimicking The Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We find that, with fewer training classes, the data representations of each class lie in a long and narrow region; with more training classes, the representations of each class scatter more uniformly. Inspired by this observation, we propose Class-wise Decorrelation (CwD) that effectively regularizes representations of each class to scatter more uniformly, thus mimicking the model jointly trained with all classes (i.e., the oracle model). |
Yujun Shi; Kuangqi Zhou; Jian Liang; Zihang Jiang; Jiashi Feng; Philip H.S. Torr; Song Bai; Vincent Y. F. Tan; |
572 | RIDDLE: Lidar Data Compression With Range Image Deep Delta Encoding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we show that directly compressing the range images can leverage the lidar scanning pattern, compared to compressing the unprojected point clouds. |
Xuanyu Zhou; Charles R. Qi; Yin Zhou; Dragomir Anguelov; |
573 | RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper shows that modeling an effective message-passing flow through an attention mechanism can be critical to tackling the compositionality and long-tail challenges in VRR. |
Jun Chen; Aniket Agarwal; Sherif Abdelkarim; Deyao Zhu; Mohamed Elhoseiny; |
574 | HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, many of the existing solutions, though they can efficiently reduce CNN model sizes, struggle to bring considerable savings in computational cost, especially when the compression ratio is not large, thereby causing a severe computation inefficiency problem. To overcome this challenge, in this paper we propose efficient High-Order DEcomposed Convolution (HODEC). |
Miao Yin; Yang Sui; Wanzhao Yang; Xiao Zang; Yu Gong; Bo Yuan; |
575 | RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds By Local Rigidity Prior Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we focus on scene flow learning on point clouds in a self-supervised manner. |
Ruibo Li; Chi Zhang; Guosheng Lin; Zhe Wang; Chunhua Shen; |
576 | Smooth Maximum Unit: Smooth Activation Function for Deep Networks Using Smoothing Maximum Technique Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Rectified Linear Unit (ReLU) is a popular hand-designed activation function and the most common choice in the deep learning community due to its simplicity, though it has some drawbacks. In this paper, we propose two novel activation functions based on approximation of the maximum function, and we call these functions Smooth Maximum Unit (SMU and SMU-1). |
Koushik Biswas; Sandeep Kumar; Shilpak Banerjee; Ashish Kumar Pandey; |
577 | Learning Invisible Markers for Hidden Codes in Offline-to-Online Photography Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a novel invisible information hiding architecture for display/print-camera scenarios, consisting of hiding, locating, correcting, and recovery, where invisible markers are learned to make hidden codes truly invisible. |
Jun Jia; Zhongpai Gao; Dandan Zhu; Xiongkuo Min; Guangtao Zhai; Xiaokang Yang; |
578 | Personalized Image Aesthetics Assessment With Rich Attributes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To solve the dilemma, we conduct, to date, the most comprehensive subjective study of personalized image aesthetics and introduce a new Personalized image Aesthetics database with Rich Attributes (PARA), which consists of 31,220 images with annotations by 438 subjects. |
Yuzhe Yang; Liwu Xu; Leida Li; Nan Qie; Yaqian Li; Peng Zhang; Yandong Guo; |
579 | Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: It is thus better to tailor synthetic pre-training data to a specific downstream task, for best performance. We introduce Task2Sim, a unified model mapping downstream task representations to optimal simulation parameters to generate synthetic pre-training data for them. |
Samarth Mishra; Rameswar Panda; Cheng Perng Phoo; Chun-Fu (Richard) Chen; Leonid Karlinsky; Kate Saenko; Venkatesh Saligrama; Rogerio S. Feris; |
580 | Part-Based Pseudo Label Refinement for Unsupervised Person Re-Identification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel Part-based Pseudo Label Refinement (PPLR) framework that reduces the label noise by employing the complementary relationship between global and part features. |
Yoonki Cho; Woo Jae Kim; Seunghoon Hong; Sung-Eui Yoon; |
581 | Bridging The Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To bridge the discrete-to-continuous gap, we propose a predictor to generate a set of candidate waypoints during navigation, so that agents designed with high-level actions can be transferred to and trained in continuous environments. |
Yicong Hong; Zun Wang; Qi Wu; Stephen Gould; |
582 | HDNet: High-Resolution Dual-Domain Learning for Spectral Compressive Imaging Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: So we propose a high-resolution dual-domain learning network (HDNet) for HSI reconstruction. |
Xiaowan Hu; Yuanhao Cai; Jing Lin; Haoqian Wang; Xin Yuan; Yulun Zhang; Radu Timofte; Luc Van Gool; |
583 | OW-DETR: Open-World Detection Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here, we introduce a novel end-to-end transformer-based framework, OW-DETR, for open-world object detection. |
Akshita Gupta; Sanath Narayan; K J Joseph; Salman Khan; Fahad Shahbaz Khan; Mubarak Shah; |
584 | Learning Deep Implicit Functions for 3D Shapes With Dynamic Code Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the local codes are constrained at discrete and regular positions like grid points, which makes the code positions difficult to be optimized and limits their representation ability. To solve this problem, we propose to learn DIF with Dynamic Code Cloud, named DCC-DIF. |
Tianyang Li; Xin Wen; Yu-Shen Liu; Hua Su; Zhizhong Han; |
585 | Reversible Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Reversible Vision Transformers, a memory efficient architecture design for visual recognition. |
Karttikeya Mangalam; Haoqi Fan; Yanghao Li; Chao-Yuan Wu; Bo Xiong; Christoph Feichtenhofer; Jitendra Malik; |
586 | Amodal Panoptic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This ability of amodal perception forms the basis of our perceptual and cognitive understanding of our world. To enable robots to reason with this capability, we formulate and propose a novel task that we name amodal panoptic segmentation. |
Rohit Mohan; Abhinav Valada; |
587 | Gravitationally Lensed Black Hole Emission Tomography Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose BH-NeRF, a novel tomography approach that leverages gravitational lensing to recover the continuous 3D emission field near a black hole. |
Aviad Levis; Pratul P. Srinivasan; Andrew A. Chael; Ren Ng; Katherine L. Bouman; |
588 | 3D-Aware Image Synthesis Via Learning Structural and Textural Representations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Meanwhile, NeRF is built on volume rendering which can be too costly to produce high-resolution results, increasing the optimization difficulty. To alleviate these two problems, we propose a novel framework, termed as VolumeGAN, for high-fidelity 3D-aware image synthesis, through explicitly learning a structural representation and a textural representation. |
Yinghao Xu; Sida Peng; Ceyuan Yang; Yujun Shen; Bolei Zhou; |
589 | Text-to-Image Synthesis Based on Object-Guided Joint-Decoding Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an object-guided joint-decoding module to simultaneously generate the image and the corresponding layout. |
Fuxiang Wu; Liu Liu; Fusheng Hao; Fengxiang He; Jun Cheng; |
590 | Correlation Verification for Image Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we propose a novel image retrieval re-ranking network named Correlation Verification Networks (CVNet). |
Seongwon Lee; Hongje Seong; Suhyeon Lee; Euntai Kim; |
591 | Unsupervised Vision-and-Language Pre-Training Via Retrieval-Based Multi-Granular Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose unsupervised Vision-and-Language pre-training (UVLP) to learn the cross-modal representation from non-parallel image and text datasets. |
Mingyang Zhou; Licheng Yu; Amanpreet Singh; Mengjiao Wang; Zhou Yu; Ning Zhang; |
592 | Protecting Facial Privacy: Generating Adversarial Identity Masks Via Style-Robust Makeup Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose adversarial makeup transfer GAN (AMT-GAN), a novel face protection method aiming at constructing adversarial face images that preserve stronger black-box transferability and better visual quality simultaneously. |
Shengshan Hu; Xiaogeng Liu; Yechao Zhang; Minghui Li; Leo Yu Zhang; Hai Jin; Libing Wu; |
593 | PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of ‘where to look?’ |
Santhosh Kumar Ramakrishnan; Devendra Singh Chaplot; Ziad Al-Halah; Jitendra Malik; Kristen Grauman; |
594 | Noise Is Also Useful: Negative Correlation-Steered Latent Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, they cannot guarantee the robust generalization of models because they ignore useful information hidden in noisy data. To address this issue, we propose a new effective method named LaCoL (Latent Contrastive Learning) to leverage the negative correlations from the noisy data. |
Jiexi Yan; Lei Luo; Chenghao Xu; Cheng Deng; Heng Huang; |
595 | Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: More importantly, existing approaches build upon the straightforward pose estimation loss, which unfortunately cannot constrain the network to fully leverage useful information from neighboring frames. To tackle these problems, we present a novel hierarchical alignment framework, which leverages coarse-to-fine deformations to progressively update a neighboring frame to align with the current frame at the feature level. |
Zhenguang Liu; Runyang Feng; Haoming Chen; Shuang Wu; Yixing Gao; Yunjun Gao; Xiang Wang; |
596 | Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing GAN inversion and editing methods work well for aligned objects with a clean background, such as portraits and animal faces, but often struggle for more difficult categories with complex scene layouts and object occlusions, such as cars, animals, and outdoor images. We propose a new method to invert and edit such complex images in the latent space of GANs, such as StyleGAN2. |
Gaurav Parmar; Yijun Li; Jingwan Lu; Richard Zhang; Jun-Yan Zhu; Krishna Kumar Singh; |
597 | Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we show a graph-based method that uses the self-supervised transformer features to discover an object from an image. |
Yangtao Wang; Xi Shen; Shell Xu Hu; Yuan Yuan; James L. Crowley; Dominique Vaufreydaz; |
598 | Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we design a novel Transformer-style HOI detector, i.e., Structure-aware Transformer over Interaction Proposals (STIP), for HOI detection. |
Yong Zhang; Yingwei Pan; Ting Yao; Rui Huang; Tao Mei; Chang-Wen Chen; |
599 | Towards Robust Adaptive Object Detection Under Noisy Annotations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing methods assume that the source domain labels are completely clean, yet large-scale datasets often contain error-prone annotations due to instance ambiguity, which may lead to a biased source distribution and severely degrade the performance of the domain adaptive detector de facto. In this paper, we present the first effort to formulate noisy DAOD and propose a Noise Latent Transferability Exploration (NLTE) framework to address this issue. |
Xinyu Liu; Wuyang Li; Qiushi Yang; Baopu Li; Yixuan Yuan; |
600 | Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper probes intrinsic factors behind typical failure cases (e.g., spatial inconsistency and boundary confusion) produced by the existing state-of-the-art method in face parsing. To tackle these problems, we propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) for face parsing. |
Qingping Zheng; Jiankang Deng; Zheng Zhu; Ying Li; Stefanos Zafeiriou; |
601 | Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we explore more challenging exemplar-based high-resolution portrait style transfer by introducing a novel DualStyleGAN with flexible control of dual styles of the original face domain and the extended artistic portrait domain. |
Shuai Yang; Liming Jiang; Ziwei Liu; Chen Change Loy; |
602 | Learning To Memorize Feature Hallucination for One-Shot Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel model to explicitly learn and memorize reusable features that can help hallucinate novel category images. |
Yu Xie; Yanwei Fu; Ying Tai; Yun Cao; Junwei Zhu; Chengjie Wang; |
603 | AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we address the problem of texture representation for 3D shapes for the challenging and underexplored tasks of texture transfer and synthesis. |
Zhiqin Chen; Kangxue Yin; Sanja Fidler; |
604 | Open-Vocabulary One-Stage Detection With Hierarchical Visual-Language Knowledge Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a hierarchical visual-language knowledge distillation method, i.e., HierKD, for open-vocabulary one-stage detection. |
Zongyang Ma; Guan Luo; Jin Gao; Liang Li; Yuxin Chen; Shaoru Wang; Congxuan Zhang; Weiming Hu; |
605 | Glass: Geometric Latent Augmentation for Shape Spaces Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We investigate the problem of training generative models on very sparse collections of 3D models. |
Sanjeev Muralikrishnan; Siddhartha Chaudhuri; Noam Aigerman; Vladimir G. Kim; Matthew Fisher; Niloy J. Mitra; |
606 | COAP: Compositional Articulated Occupancy of People Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel neural implicit representation for articulated human bodies. |
Marko Mihajlovic; Shunsuke Saito; Aayush Bansal; Michael Zollhöfer; Siyu Tang; |
607 | Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here we describe an approach that learns the two tasks simultaneously and exploits their intrinsic correlations to boost the training of each: the follower judges whether the speaker-created instruction explains the original navigation route correctly, and vice versa. |
Hanqing Wang; Wei Liang; Jianbing Shen; Luc Van Gool; Wenguan Wang; |
608 | Evading The Simplicity Bias: Training A Diverse Set of Models Discovers Solutions With Superior OOD Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We demonstrate that the simplicity bias can be mitigated and OOD generalization improved. |
Damien Teney; Ehsan Abbasnejad; Simon Lucey; Anton van den Hengel; |
609 | Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 take-apart toy vehicles. |
Fadime Sener; Dibyadip Chatterjee; Daniel Shelepov; Kun He; Dipika Singhania; Robert Wang; Angela Yao; |
610 | Deterministic Point Cloud Registration Via Novel Transformation Decomposition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Given a set of putative 3D-3D point correspondences, we aim to remove outliers and estimate rigid transformation with 6 degrees of freedom (DOF). |
Wen Chen; Haoang Li; Qiang Nie; Yun-Hui Liu; |
611 | Motion-Adjustable Neural Implicit Video Representation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: By exploiting the relation between the phase information in sinusoidal functions and their displacements, we incorporate into the conventional image-based INR model a phase-varying positional encoding module, and couple it with a phase-shift generation module that determines the phase-shift values at each frame. |
Long Mai; Feng Liu; |
612 | Neural Prior for Trajectory Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here, we propose a neural trajectory prior to capture continuous spatio-temporal information without the need for offline data. |
Chaoyang Wang; Xueqian Li; Jhony Kaesemodel Pontes; Simon Lucey; |
613 | DPICT: Deep Progressive Image Compression Using Trit-Planes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose the deep progressive image compression using trit-planes (DPICT) algorithm, which is the first learning-based codec supporting fine granular scalability (FGS). |
Jae-Han Lee; Seungmin Jeon; Kwang Pyo Choi; Youngo Park; Chang-Su Kim; |
614 | Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel representation, termed Unification, to unify the advantages of regression and classification. |
Rui Peng; Rongjie Wang; Zhenyu Wang; Yawen Lai; Ronggang Wang; |
615 | Long-Tailed Recognition Via Weight Balancing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Surprisingly, weight decay, although underexplored in LTR, significantly improves over prior work. Therefore, we adopt a two-stage training paradigm and propose a simple approach to LTR: (1) learning features using the cross-entropy loss by tuning weight decay, and (2) learning classifiers using class-balanced loss by tuning weight decay and MaxNorm. |
Shaden Alshammari; Yu-Xiong Wang; Deva Ramanan; Shu Kong; |
616 | Text to Image Generation With Semantic-Spatial Aware GAN Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A close inspection of their generated images reveals a major limitation: even though the generated image holistically matches the description, individual image regions or parts of something are often not recognizable or consistent with words in the sentence, e.g. "a white crown". To address this problem, we propose a novel framework, Semantic-Spatial Aware GAN, for synthesizing images from input text. |
Wentong Liao; Kai Hu; Michael Ying Yang; Bodo Rosenhahn; |
617 | The Norm Must Go On: Dynamic Unsupervised Domain Adaptation By Normalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This can be a hurdle in fields which require continuous dynamic adaptation or suffer from scarcity of data, e.g. autonomous driving in challenging weather conditions. To address this problem of continuous adaptation to distribution shifts, we propose Dynamic Unsupervised Adaptation (DUA). |
M. Jehanzeb Mirza; Jakub Micorek; Horst Possegger; Horst Bischof; |
618 | ShapeFormer: Transformer-Based Shape Completion Via Sparse Representation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present ShapeFormer, a transformer-based network that produces a distribution of object completions, conditioned on incomplete, and possibly noisy, point clouds. |
Xingguang Yan; Liqiang Lin; Niloy J. Mitra; Dani Lischinski; Daniel Cohen-Or; Hui Huang; |
619 | PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Similarly, strong data augmentation and regularization techniques often improve OOD robustness but harm anomaly detection, raising the question of whether a Pareto improvement on all existing safety measures is possible. To meet this challenge, we design a new data augmentation strategy utilizing the natural structural complexity of pictures such as fractals, which outperforms numerous baselines, is near Pareto-optimal, and roundly improves safety measures. |
Dan Hendrycks; Andy Zou; Mantas Mazeika; Leonard Tang; Bo Li; Dawn Song; Jacob Steinhardt; |
620 | Eigencontours: Novel Contour Descriptors Based on Low-Rank Approximation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Novel contour descriptors, called eigencontours, based on low-rank approximation are proposed in this paper. |
Wonhui Park; Dongkwon Jin; Chang-Su Kim; |
621 | Generalizable Cross-Modality Medical Image Segmentation Via Style Augmentation and Dual Normalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This setting, namely generalizable cross-modality segmentation, despite its clinical potential, is much more challenging than other related settings, e.g., domain adaptation. To achieve this goal, in this paper we propose a novel dual-normalization model by leveraging the augmented source-similar and source-dissimilar images during our generalizable segmentation. |
Ziqi Zhou; Lei Qi; Xin Yang; Dong Ni; Yinghuan Shi; |
622 | Learning Optical Flow With Kernel Patch Attention Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce a novel approach, called kernel patch attention (KPA), to better resolve the ambiguity in dense matching by explicitly taking the local context relations into consideration. |
Ao Luo; Fan Yang; Xin Li; Shuaicheng Liu; |
623 | Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a novel method, detection prompt (DetPro), to learn continuous prompt representations for open-vocabulary object detection based on the pre-trained vision-language model. |
Yu Du; Fangyun Wei; Zihe Zhang; Miaojing Shi; Yue Gao; Guoqi Li; |
624 | TimeReplayer: Unlocking The Potential of Event Cameras for Video Interpolation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To fully unlock the potential of event cameras, this paper proposes a novel TimeReplayer algorithm to interpolate videos captured by commodity cameras with events. |
Weihua He; Kaichao You; Zhendong Qiao; Xu Jia; Ziyang Zhang; Wenhui Wang; Huchuan Lu; Yaoyuan Wang; Jianxing Liao; |
625 | General Incremental Learning With Domain-Aware Categorical Representations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we consider a general and yet under-explored incremental learning problem in which both the class distribution and class-specific domain distribution change over time. |
Jiangwei Xie; Shipeng Yan; Xuming He; |
626 | Interactive Segmentation and Visualization for Tiny Objects in Multi-Megapixel Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce an interactive image segmentation and visualization framework for identifying, inspecting, and editing tiny objects (just a few pixels wide) in large multi-megapixel high-dynamic-range (HDR) images. |
Chengyuan Xu; Boning Dong; Noah Stier; Curtis McCully; D. Andrew Howell; Pradeep Sen; Tobias Höllerer; |
627 | ActiveZero: Mixed Domain Learning for Active Stereovision With Zero Annotation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Since ground truth depth is readily available in the simulation domain but quite difficult to obtain in the real domain, we propose a method that leverages the best of both worlds. |
Isabella Liu; Edward Yang; Jianyu Tao; Rui Chen; Xiaoshuai Zhang; Qing Ran; Zhu Liu; Hao Su; |
628 | DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose an early knowledge distillation framework, which is termed as DearKD, to improve the data efficiency required by transformers. |
Xianing Chen; Qiong Cao; Yujie Zhong; Jing Zhang; Shenghua Gao; Dacheng Tao; |
629 | Global-Aware Registration of Less-Overlap RGB-D Scans Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel method of registering less-overlap RGB-D scans. |
Che Sun; Yunde Jia; Yi Guo; Yuwei Wu; |
630 | RayMVSNet: Learning Ray-Based 1D Implicit Fields for Accurate Multi-View Stereo Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In particular, we propose RayMVSNet which learns sequential prediction of a 1D implicit field along each camera ray with the zero-crossing point indicating scene depth. |
Junhua Xi; Yifei Shi; Yijie Wang; Yulan Guo; Kai Xu; |
631 | ContrastMask: Contrastive Learning To Segment Every Thing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unlike previous methods that learn such models only on base categories, in this paper, we propose a new method, named ContrastMask, which learns a mask segmentation model on both base and novel categories under a unified pixel-level contrastive learning framework. |
Xuehui Wang; Kai Zhao; Ruixin Zhang; Shouhong Ding; Yan Wang; Wei Shen; |
632 | Efficient Deep Embedded Subspace Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new mechanism for deep clustering. |
Jinyu Cai; Jicong Fan; Wenzhong Guo; Shiping Wang; Yunhe Zhang; Zhao Zhang; |
633 | Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we focus on exploiting the high-precision and non-differentiable physics simulator to incorporate dynamical constraints in motion capture. |
Buzhen Huang; Liang Pan; Yuan Yang; Jingyi Ju; Yangang Wang; |
634 | Revisiting Temporal Alignment for Video Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a novel, generic iterative alignment module which employs a gradual refinement scheme for sub-alignments, yielding more accurate motion compensation. |
Kun Zhou; Wenbo Li; Liying Lu; Xiaoguang Han; Jiangbo Lu; |
635 | Scaling Vision Transformers to Gigapixel Images Via Hierarchical Self-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a new ViT architecture called the Hierarchical Image Pyramid Transformer (HIPT), which leverages the natural hierarchical structure inherent in WSIs using two levels of self-supervised learning to learn high-resolution image representations. |
Richard J. Chen; Chengkuan Chen; Yicong Li; Tiffany Y. Chen; Andrew D. Trister; Rahul G. Krishnan; Faisal Mahmood; |
636 | Neural Reflectance for Shape Recovery With Shadow Handling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper aims at recovering the shape of a scene with unknown, non-Lambertian, and possibly spatially-varying surface materials. |
Junxuan Li; Hongdong Li; |
637 | Rep-Net: Efficient On-Device Learning Via Feature Reprogramming Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To develop memory-efficient on-device transfer learning, in this work, we are the first to approach the concept of transfer learning from a new perspective of intermediate feature reprogramming of a pre-trained model (i.e., backbone). |
Li Yang; Adnan Siraj Rakin; Deliang Fan; |
638 | Surface Representation for Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present RepSurf (representative surfaces), a novel representation of point clouds to explicitly depict the very local structure. |
Haoxi Ran; Jun Liu; Chengjie Wang; |
639 | Implicit Motion Handling for Video Camouflaged Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new video camouflaged object detection (VCOD) framework that can exploit both short-term dynamics and long-term temporal consistency to detect camouflaged objects from video frames. |
Xuelian Cheng; Huan Xiong; Deng-Ping Fan; Yiran Zhong; Mehrtash Harandi; Tom Drummond; Zongyuan Ge; |
640 | OVE6D: Object Viewpoint Encoding for Depth-Based 6D Object Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a universal framework, called OVE6D, for model-based 6D object pose estimation from a single depth image and a target object mask. |
Dingding Cai; Janne Heikkilä; Esa Rahtu; |
641 | DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present DeepLIIF (https://deepliif.org), the first free online platform for efficient and reproducible IHC scoring. |
Parmida Ghahremani; Joseph Marino; Ricardo Dodds; Saad Nadeem; |
642 | Joint Video Summarization and Moment Localization By Cross-Task Sample Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Approximately, the video summary can be treated as a sparse, redundancy-free version of the video moments. Inspired by this observation, we propose an importance Propagation based collaborative Teaching Network (iPTNet). |
Hao Jiang; Yadong Mu; |
643 | WALT: Watch and Learn 2D Amodal Representation From Time-Lapse Imagery Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present the best of both the real and synthetic worlds for automatic occlusion supervision using a large readily available source of data: time-lapse imagery from stationary webcams observing street intersections over weeks, months, or even years. |
N. Dinesh Reddy; Robert Tamburo; Srinivasa G. Narasimhan; |
644 | Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study an untouched problem in visible-infrared person re-identification (VI-ReID), namely, Twin Noise Labels (TNL), which refers to noisy annotation and correspondence. |
Mouxing Yang; Zhenyu Huang; Peng Hu; Taihao Li; Jiancheng Lv; Xi Peng; |
645 | Optical Flow Estimation for Spiking Camera Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, frame-based and event-based methods are not well suited to spike streams from the spiking camera due to the different data modalities. To this end, we present SCFlow, a tailored deep learning pipeline to estimate optical flow in high-speed scenes from spike streams. |
Liwen Hu; Rui Zhao; Ziluo Ding; Lei Ma; Boxin Shi; Ruiqin Xiong; Tiejun Huang; |
646 | MetaFormer Is Actually What You Need for Vision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To verify this, we deliberately replace the attention module in transformers with an embarrassingly simple spatial pooling operator to conduct only basic token mixing. Surprisingly, we observe that the derived model, termed as PoolFormer, achieves competitive performance on multiple computer vision tasks. |
Weihao Yu; Mi Luo; Pan Zhou; Chenyang Si; Yichen Zhou; Xinchao Wang; Jiashi Feng; Shuicheng Yan; |
647 | GradViT: Gradient Inversion of Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work we demonstrate the vulnerability of vision transformers (ViTs) to gradient-based inversion attacks. |
Ali Hatamizadeh; Hongxu Yin; Holger R. Roth; Wenqi Li; Jan Kautz; Daguang Xu; Pavlo Molchanov; |
648 | Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution Via Cycle-Projected Mutual Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a one-stage based Cycle-projected Mutual learning network (CycMu-Net) for ST-VSR, which makes full use of spatial-temporal correlations via the mutual learning between S-VSR and T-VSR. |
Mengshun Hu; Kui Jiang; Liang Liao; Jing Xiao; Junjun Jiang; Zheng Wang; |
649 | InstaFormer: Instance-Aware Image-to-Image Translation With Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel Transformer-based network architecture for instance-aware image-to-image translation, dubbed InstaFormer, to effectively integrate global- and instance-level information. |
Soohyun Kim; Jongbeom Baek; Jihye Park; Gyeongnyeon Kim; Seungryong Kim; |
650 | Revisiting Near/Remote Sensing With Geospatial Attention Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an approach for computing geospatial attention that incorporates geometric features and the appearance of the overhead and ground-level imagery. |
Scott Workman; M. Usman Rafique; Hunter Blanton; Nathan Jacobs; |
651 | Joint Global and Local Hierarchical Priors for Learned Image Compression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, CNNs have a limitation in modeling long-range dependencies due to their nature of local connectivity, which can be a significant bottleneck in image compression where reducing spatial redundancy is a key point. To overcome this issue, we propose a novel entropy model called Information Transformer (Informer) that exploits both global and local information in a content-dependent manner using an attention mechanism. |
Jun-Hyuk Kim; Byeongho Heo; Jong-Seok Lee; |
652 | Knowledge Distillation Via The Target-Aware Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This greatly undermines the underlying assumption of the one-to-one distillation approach. To this end, we propose a novel one-to-all spatial matching knowledge distillation approach. |
Sihao Lin; Hongwei Xie; Bing Wang; Kaicheng Yu; Xiaojun Chang; Xiaodan Liang; Gang Wang; |
653 | Recurring The Transformer for Video Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a novel Recurrent Vision Transformer (RViT) framework for spatial-temporal representation learning to achieve the video action recognition task. |
Jiewen Yang; Xingbo Dong; Liujun Liu; Chao Zhang; Jiajun Shen; Dahai Yu; |
654 | Subspace Adversarial Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To control the growth of the gradient, we propose a new AT method, Subspace Adversarial Training (Sub-AT), which constrains AT in a carefully extracted subspace. |
Tao Li; Yingwen Wu; Sizhe Chen; Kun Fang; Xiaolin Huang; |
655 | 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we substantially improve the generalization of 3D object detectors to out-of-domain data by deforming point clouds during training. |
Alexander Lehner; Stefano Gasperini; Alvaro Marcos-Ramiro; Michael Schmidt; Mohammad-Ali Nikouei Mahani; Nassir Navab; Benjamin Busam; Federico Tombari; |
656 | Image Segmentation Using Text and Image Prompts Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here we propose a system that can generate image segmentations based on arbitrary prompts at test time. |
Timo Lüddecke; Alexander Ecker; |
657 | AutoMine: An Unmanned Mine Dataset Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Moreover, the open-pit mine is one of their typical representatives. Therefore, in this paper we introduce the Autonomous driving dataset on the Mining scene (AutoMine) for positioning and perception tasks. |
Yuchen Li; Zixuan Li; Siyu Teng; Yu Zhang; Yuhang Zhou; Yuchang Zhu; Dongpu Cao; Bin Tian; Yunfeng Ai; Zhe Xuanyuan; Long Chen; |
658 | Neural Data-Dependent Transform for Learned Image Compression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Learned image compression has achieved great success due to its excellent modeling capacity, but seldom further considers the Rate-Distortion Optimization (RDO) of each input image. To explore this potential in the learned codec, we make the first attempt to build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image. |
Dezhao Wang; Wenhan Yang; Yueyu Hu; Jiaying Liu; |
659 | Background Activation Suppression for Weakly Supervised Object Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Background Activation Suppression (BAS) method. |
Pingyu Wu; Wei Zhai; Yang Cao; |
660 | How Many Observations Are Enough? Knowledge Distillation for Trajectory Forecasting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: On this account, the model would be fed with corrupted and noisy input data, thus fatally affecting its prediction performance. In this regard, we focus on delivering accurate predictions when only a few input observations are used, thus potentially lowering the risks associated with automatic perception. |
Alessio Monti; Angelo Porrello; Simone Calderara; Pasquale Coscia; Lamberto Ballan; Rita Cucchiara; |
661 | Evaluation-Oriented Knowledge Distillation for Deep Face Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by the ultimate goal of KD methods, we propose a novel Evaluation oriented KD method (EKD) for deep face recognition to directly reduce the performance gap between the teacher and student models during training. |
Yuge Huang; Jiaxiang Wu; Xingkun Xu; Shouhong Ding; |
662 | Improving Subgraph Recognition With Variational Graph Information Bottleneck Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, GIB suffers from training instability and degenerated results due to its intrinsic optimization process. To tackle these issues, we reformulate the subgraph recognition problem into two steps: graph perturbation and subgraph selection, leading to a novel Variational Graph Information Bottleneck (VGIB) framework. |
Junchi Yu; Jie Cao; Ran He; |
663 | Slot-VPS: Object-Centric Representation Learning for Video Panoptic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, this divide-and-conquer strategy requires complex post-processing in both spatial and temporal domains and is vulnerable to failures from surrogate tasks. In this paper, inspired by object-centric learning which learns compact and robust object representations, we present Slot-VPS, the first end-to-end framework for this task. |
Yi Zhou; Hui Zhang; Hana Lee; Shuyang Sun; Pingjun Li; Yangguang Zhu; ByungIn Yoo; Xiaojuan Qi; Jae-Joon Han; |
664 | Motion-From-Blur: 3D Shape and Motion Estimation of Motion-Blurred Objects in Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a method for jointly estimating the 3D motion, 3D shape, and appearance of highly motion-blurred objects from a video. |
Denys Rozumnyi; Martin R. Oswald; Vittorio Ferrari; Marc Pollefeys; |
665 | Efficient Video Instance Segmentation Via Tracklet Query and Proposal Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes EfficientVIS, a fully end-to-end framework with efficient training and inference. |
Jialian Wu; Sudhir Yarram; Hui Liang; Tian Lan; Junsong Yuan; Jayan Eledath; Gérard Medioni; |
666 | Synthetic Generation of Face Videos With Plethysmograph Physiology Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a scalable biophysical learning based method to generate physio-realistic synthetic rPPG videos given any reference image and target rPPG signal and shows that it could further improve the state-of-the-art physiological measurement and reduce the bias among different groups. |
Zhen Wang; Yunhao Ba; Pradyumna Chari; Oyku Deniz Bozkurt; Gianna Brown; Parth Patwa; Niranjan Vaddi; Laleh Jalilian; Achuta Kadambi; |
667 | TransRAC: Encoding Multi-Scale Temporal Correlation With Transformers for Repetitive Action Counting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In the data-driven era, the degradation of such generalization capability is mainly attributed to the lack of long video datasets. To fill this gap, we introduce a new large-scale repetitive action counting dataset covering a wide variety of video lengths, along with more realistic situations where action interruption or action inconsistencies occur in the video. |
Huazhang Hu; Sixun Dong; Yiqun Zhao; Dongze Lian; Zhengxin Li; Shenghua Gao; |
668 | Hallucinated Neural Radiance Fields in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing solutions adopt NeRF with a controllable appearance embedding to render novel views under various conditions, but they cannot render view-consistent images with an unseen appearance. To solve this problem, we present an end-to-end framework for constructing a hallucinated NeRF, dubbed as Ha-NeRF. |
Xingyu Chen; Qi Zhang; Xiaoyu Li; Yue Chen; Ying Feng; Xuan Wang; Jue Wang; |
669 | NeuralHDHair: Automatic High-Fidelity Hair Modeling From A Single Image Using Implicit Neural Representations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce NeuralHDHair, a flexible, fully automatic system for modeling high-fidelity hair from a single image. |
Keyu Wu; Yifan Ye; Lingchen Yang; Hongbo Fu; Kun Zhou; Youyi Zheng; |
670 | The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing literature discussing this "hard-to-learn" concept mainly expands either along the dimension of the samples or the dimension of the features. In this paper, we aim to introduce a simple view merging these two dimensions, leading to a new, simple yet effective heuristic to train machine learning models by emphasizing the worst cases on both the sample and the feature dimensions. |
Zeyi Huang; Haohan Wang; Dong Huang; Yong Jae Lee; Eric P. Xing; |
671 | Global Tracking Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel transformer-based architecture for global multi-object tracking. |
Xingyi Zhou; Tianwei Yin; Vladlen Koltun; Philipp Krähenbühl; |
672 | Backdoor Attacks on Self-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Backdoor attacks have been studied extensively in supervised learning and to the best of our knowledge, we are the first to study them for self-supervised learning. |
Aniruddha Saha; Ajinkya Tejankar; Soroush Abbasi Koohpayegani; Hamed Pirsiavash; |
673 | Multimodal Token Fusion for Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a multimodal token fusion method (TokenFusion), tailored for transformer-based vision tasks. |
Yikai Wang; Xinghao Chen; Lele Cao; Wenbing Huang; Fuchun Sun; Yunhe Wang; |
674 | Exploring Frequency Adversarial Attacks for Face Forgery Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, instead of injecting adversarial perturbations into the spatial domain, we propose a frequency adversarial attack method against face forgery detectors. |
Shuai Jia; Chao Ma; Taiping Yao; Bangjie Yin; Shouhong Ding; Xiaokang Yang; |
675 | GMFlow: Learning Optical Flow Via Global Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, we propose a GMFlow framework, which consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation. |
Haofei Xu; Jing Zhang; Jianfei Cai; Hamid Rezatofighi; Dacheng Tao; |
676 | Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To fully utilize the rich connections between speech audio and human gestures, we propose a novel framework named Hierarchical Audio-to-Gesture (HA2G) for co-speech gesture generation. |
Xian Liu; Qianyi Wu; Hang Zhou; Yinghao Xu; Rui Qian; Xinyi Lin; Xiaowei Zhou; Wayne Wu; Bo Dai; Bolei Zhou; |
677 | FLAVA: A Foundational Language and Vision Alignment Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A promising direction would be to use a single holistic universal model, as a "foundation", that targets all modalities at once—a true vision and language foundation model should be good at vision tasks, language tasks, and cross- and multi-modal vision and language tasks. We introduce FLAVA as such a model and demonstrate impressive performance on a wide range of 35 tasks spanning these target modalities. |
Amanpreet Singh; Ronghang Hu; Vedanuj Goswami; Guillaume Couairon; Wojciech Galuba; Marcus Rohrbach; Douwe Kiela; |
678 | Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we tackle large-scale SLP by learning to co-articulate between dictionary signs, a method capable of producing smooth signing while scaling to unconstrained domains of discourse. |
Ben Saunders; Necati Cihan Camgoz; Richard Bowden; |
679 | Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We endeavor on a rarely explored task named Insubstantial Object Detection (IOD), which aims to localize the object with the following characteristics: (1) amorphous shape with indistinct boundary; (2) similarity to surroundings; (3) absence in color. Accordingly, it is far more challenging to distinguish insubstantial objects in a single static frame, and the collaborative representation of spatial and temporal information is crucial. |
Kailai Zhou; Yibo Wang; Tao Lv; Yunqian Li; Linsen Chen; Qiu Shen; Xun Cao; |
680 | OCSampler: Compressing Videos to One Clip With Single-Step Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: There have been works that formulate frame sampling as a sequential decision task by selecting frames one by one according to their importance. In this paper, we present a more efficient framework named OCSampler, which explores such a representation with one short clip. |
Jintao Lin; Haodong Duan; Kai Chen; Dahua Lin; Limin Wang; |
681 | Learning Bayesian Sparse Networks With Full Experience Replay for Continual Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Despite their performance, they still suffer from interference across tasks which leads to catastrophic forgetting. To ameliorate this problem, we propose to only activate and select sparse neurons for learning current and past tasks at any stage. |
Qingsen Yan; Dong Gong; Yuhang Liu; Anton van den Hengel; Javen Qinfeng Shi; |
682 | Graph-Based Spatial Transformer With Memory Replay for Multi-Future Pedestrian Trajectory Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our model aims to forecast multiple paths based on a historical trajectory by modeling multi-scale graph-based spatial transformers combined with a trajectory smoothing algorithm named "Memory Replay" utilizing a memory graph. |
Lihuan Li; Maurice Pagnucco; Yang Song; |
683 | Scanline Homographies for Rolling-Shutter Plane Absolute Pose Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we give a solution to the absolute pose problem free of motion assumptions. |
Fang Bai; Agniva Sengupta; Adrien Bartoli; |
684 | TableFormer: Table Structure Understanding With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a new table-structure identification model. |
Ahmed Nassar; Nikolaos Livathinos; Maksym Lysak; Peter Staar; |
685 | Exemplar-Based Pattern Synthesis With Implicit Periodic Field Network Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an exemplar-based visual pattern synthesis framework that aims to model the inner statistics of visual patterns and generate new, versatile patterns that meet the aforementioned requirements. |
Haiwei Chen; Jiayi Liu; Weikai Chen; Shichen Liu; Yajie Zhao; |
686 | Grounded Language-Image Pre-Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a grounded language-image pre-training (GLIP) model for learning object-level, language-aware, and semantic-rich visual representations. |
Liunian Harold Li; Pengchuan Zhang; Haotian Zhang; Jianwei Yang; Chunyuan Li; Yiwu Zhong; Lijuan Wang; Lu Yuan; Lei Zhang; Jenq-Neng Hwang; Kai-Wei Chang; Jianfeng Gao; |
687 | Spectral Unsupervised Domain Adaptation for Visual Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose Spectral UDA (SUDA), an effective and efficient UDA technique that works in the spectral space and can generalize across different visual recognition tasks. |
Jingyi Zhang; Jiaxing Huang; Zichen Tian; Shijian Lu; |
688 | AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-Time Image Enhancement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: They adopt a sub-optimal uniform sampling point allocation, limiting the expressiveness of the learned LUTs since the (tri-)linear interpolation between uniform sampling points in the LUT transform might fail to model local non-linearities of the color transform. Focusing on this problem, we present AdaInt (Adaptive Intervals Learning), a novel mechanism to achieve a more flexible sampling point allocation by adaptively learning the non-uniform sampling intervals in the 3D color space. |
Canqian Yang; Meiguang Jin; Xu Jia; Yi Xu; Ying Chen; |
689 | PatchFormer: An Efficient Point Transformer With Patch Attention Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, existing point Transformers are computationally expensive since they need to generate a large attention map, which has quadratic complexity (both in space and time) with respect to input size. To solve this shortcoming, we introduce patch-attention (PAT) to adaptively learn a much smaller set of bases upon which the attention maps are computed. |
Cheng Zhang; Haocheng Wan; Xinyi Shen; Zizhao Wu; |
690 | Recurrent Glimpse-Based Decoder for Detection With Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Alternative to existing studies that mainly develop advanced feature or embedding designs to tackle the training issue, we point out that the Region-of-Interest (RoI) based detection refinement can easily help mitigate the difficulty of training for DETR methods. Based on this, we introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper. |
Zhe Chen; Jing Zhang; Dacheng Tao; |
691 | Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction To Treat Diabetic Foot Ulcers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce AiD Regen, a novel system that generates 3D wound models combining 2D semantic segmentation with 3D reconstruction so that they can be printed via 3D bio-printers during the surgery to treat diabetic foot ulcers (DFUs). |
Han Joo Chae; Seunghwan Lee; Hyewon Son; Seungyeob Han; Taebin Lim; |
692 | SimMIM: A Simple Framework for Masked Image Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents SimMIM, a simple framework for masked image modeling. |
Zhenda Xie; Zheng Zhang; Yue Cao; Yutong Lin; Jianmin Bao; Zhuliang Yao; Qi Dai; Han Hu; |
693 | OmniFusion: 360 Monocular Depth Estimation Via Geometry-Aware Fusion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a 360 monocular depth estimation pipeline, OmniFusion, to tackle the spherical distortion issue. |
Yuyan Li; Yuliang Guo; Zhixin Yan; Xinyu Huang; Ye Duan; Liu Ren; |
694 | Label Matching Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Despite the promising results, the label mismatch problem is not yet fully explored in the previous works, leading to severe confirmation bias during self-training. In this paper, we delve into this problem and propose a simple yet effective LabelMatch framework from two different yet complementary perspectives, i.e., distribution-level and instance-level. |
Binbin Chen; Weijie Chen; Shicai Yang; Yunyi Xuan; Jie Song; Di Xie; Shiliang Pu; Mingli Song; Yueting Zhuang; |
695 | RegionCLIP: Region-Based Language-Image Pretraining Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To mitigate this issue, we propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations, thus enabling fine-grained alignment between image regions and textual concepts. |
Yiwu Zhong; Jianwei Yang; Pengchuan Zhang; Chunyuan Li; Noel Codella; Liunian Harold Li; Luowei Zhou; Xiyang Dai; Lu Yuan; Yin Li; Jianfeng Gao; |
696 | Video Frame Interpolation Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing methods for video interpolation heavily rely on deep convolution neural networks, and thus suffer from their intrinsic limitations, such as content-agnostic kernel weights and restricted receptive field. To address these issues, we propose a Transformer-based video interpolation framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations. |
Zhihao Shi; Xiangyu Xu; Xiaohong Liu; Jun Chen; Ming-Hsuan Yang; |
697 | An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We address weakly supervised point cloud segmentation by proposing a new model, MIL-derived transformer, to mine additional supervisory signals. |
Cheng-Kun Yang; Ji-Jia Wu; Kai-Syun Chen; Yung-Yu Chuang; Yen-Yu Lin; |
698 | Fast Light-Weight Near-Field Photometric Stereo Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce the first end-to-end learning-based solution to near-field Photometric Stereo (PS), where the light sources are close to the object of interest. |
Daniel Lichy; Soumyadip Sengupta; David W. Jacobs; |
699 | BCOT: A Markerless High-Precision 3D Object Tracking Benchmark Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a multi-view approach to estimate the accurate 3D poses of real moving objects, and then use binocular data to construct a new benchmark for monocular textureless 3D object tracking. |
Jiachen Li; Bin Wang; Shiqiang Zhu; Xin Cao; Fan Zhong; Wenxuan Chen; Te Li; Jason Gu; Xueying Qin; |
700 | Omni-DETR: Omni-Supervised Object Detection With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We consider the problem of omni-supervised object detection, which can use unlabeled, fully labeled and weakly labeled annotations, such as image tags, counts, points, etc., for object detection. |
Pei Wang; Zhaowei Cai; Hao Yang; Gurumurthy Swaminathan; Nuno Vasconcelos; Bernt Schiele; Stefano Soatto; |
701 | Uniform Subdivision of Omnidirectional Camera Space for Efficient Spherical Stereo Matching Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, to address the distortion problem of omnidirectional images, we devise a novel subdivision scheme of a spherical geodesic grid. |
Donghun Kang; Hyeonjoong Jang; Jungeon Lee; Chong-Min Kyung; Min H. Kim; |
702 | High-Resolution Image Synthesis With Latent Diffusion Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner. |
Robin Rombach; Andreas Blattmann; Dominik Lorenz; Patrick Esser; Björn Ommer; |
703 | Improving Adversarially Robust Few-Shot Image Classification With Generalizable Representations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we consider the FSIC problem in the case of adversarial examples. |
Junhao Dong; Yuan Wang; Jian-Huang Lai; Xiaohua Xie; |
704 | Transferable Sparse Adversarial Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we focus on sparse adversarial attack based on the l_0 norm constraint, which can succeed by only modifying a few pixels of an image. |
Ziwen He; Wei Wang; Jing Dong; Tieniu Tan; |
705 | CREAM: Weakly Supervised Object Localization Via Class RE-Activation Mapping Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we empirically prove that this problem is associated with the mixup of the activation values between less discriminative foreground regions and the background. To address it, we propose Class RE-Activation Mapping (CREAM), a novel clustering-based approach to boost the activation values of the integral object regions. |
Jilan Xu; Junlin Hou; Yuejie Zhang; Rui Feng; Rui-Wei Zhao; Tao Zhang; Xuequan Lu; Shang Gao; |
706 | Semi-Weakly-Supervised Learning of Complex Actions From Instructional Task Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a general SWSL framework that can efficiently learn from both types of videos and can leverage any of the existing weakly-supervised action segmentation methods. |
Yuhan Shen; Ehsan Elhamifar; |
707 | APRIL: Finding The Achilles’ Heel on Privacy for Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we analyse the gradient leakage risk of the self-attention-based mechanism from both theoretical and practical perspectives. |
Jiahao Lu; Xi Sheryl Zhang; Tianli Zhao; Xiangyu He; Jian Cheng; |
708 | Text Spotting Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework using Transformers for text detection and recognition in the wild. |
Xiang Zhang; Yongwen Su; Subarna Tripathi; Zhuowen Tu; |
709 | Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present an extension of mip-NeRF (a NeRF variant that addresses sampling and aliasing) that uses a non-linear scene parameterization, online distillation, and a novel distortion-based regularizer to overcome the challenges presented by unbounded scenes. |
Jonathan T. Barron; Ben Mildenhall; Dor Verbin; Pratul P. Srinivasan; Peter Hedman; |
710 | VALHALLA: Visual Hallucination for Machine Translation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a visual hallucination framework, called VALHALLA, which requires only source sentences at inference time and instead uses hallucinated visual representations for multimodal machine translation. |
Yi Li; Rameswar Panda; Yoon Kim; Chun-Fu (Richard) Chen; Rogerio S. Feris; David Cox; Nuno Vasconcelos; |
711 | StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a high resolution, 3D-consistent image and shape generation technique which we call StyleSDF. |
Roy Or-El; Xuan Luo; Mengyi Shan; Eli Shechtman; Jeong Joon Park; Ira Kemelmacher-Shlizerman; |
712 | Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we suggest to incorporate semi-supervised and positive-unlabeled (PU) learning for exploiting unlabeled data while mitigating the adverse effect of outliers. |
Yue Cao; Zhaolin Wan; Dongwei Ren; Zifei Yan; Wangmeng Zuo; |
713 | GLAMR: Global Occlusion-Aware Human Mesh Recovery With Dynamic Cameras Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present an approach for 3D global human mesh recovery from monocular videos recorded with dynamic cameras. |
Ye Yuan; Umar Iqbal; Pavlo Molchanov; Kris Kitani; Jan Kautz; |
714 | HINT: Hierarchical Neuron Concept Explainer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study hierarchical concepts inspired by the hierarchical cognition process of human beings. |
Andong Wang; Wei-Ning Lee; Xiaojuan Qi; |
715 | Capturing and Inferring Dense Full-Body Human-Scene Contact Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our key insight is that regions in contact are always occluded so the network needs the ability to explore the whole image for evidence. |
Chun-Hao P. Huang; Hongwei Yi; Markus Höschle; Matvey Safroshkin; Tsvetelina Alexiadis; Senya Polikovsky; Daniel Scharstein; Michael J. Black; |
716 | Advancing High-Resolution Video-Language Representation With Large-Scale Video Transcriptions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel High-resolution and Diversified VIdeo-LAnguage pre-training model (HD-VILA) for many visual tasks. |
Hongwei Xue; Tiankai Hang; Yanhong Zeng; Yuchong Sun; Bei Liu; Huan Yang; Jianlong Fu; Baining Guo; |
717 | Target-Aware Dual Adversarial Learning and A Multi-Scenario Multi-Modality Benchmark To Fuse Infrared and Visible for Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network. |
Jinyuan Liu; Xin Fan; Zhanbo Huang; Guanyao Wu; Risheng Liu; Wei Zhong; Zhongxuan Luo; |
718 | En-Compactness: Self-Distillation Embedding & Contrastive Generation for Generalized Zero-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Instead, in this paper, we propose an Intra-Class Compactness Enhancement method (ICCE) for GZSL. |
Xia Kong; Zuodong Gao; Xiaofan Li; Ming Hong; Jun Liu; Chengjie Wang; Yuan Xie; Yanyun Qu; |
719 | Neural Face Identification in A 2D Wireframe Projection of A Manifold Object Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we approach the classical problem of face identification from a novel data-driven point of view. |
Kehan Wang; Jia Zheng; Zihan Zhou; |
720 | LC-FDNet: Learned Lossless Image Compression With Frequency Decomposition Network Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new lossless image compression method that proceeds the encoding in a coarse-to-fine manner to separate and process low and high-frequency regions differently. |
Hochang Rhee; Yeong Il Jang; Seyun Kim; Nam Ik Cho; |
721 | Nonuniform-to-Uniform Quantization: Towards Accurate Quantization Via Generalized Straight-Through Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we propose Nonuniform-to-Uniform Quantization (N2UQ), a method that can maintain the strong representation ability of nonuniform methods while being hardware-friendly and efficient as the uniform quantization for model inference. |
Zechun Liu; Kwang-Ting Cheng; Dong Huang; Eric P. Xing; Zhiqiang Shen; |
722 | Deep Rectangling for Image Stitching: A Learning Baseline Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, these solutions only work for images with rich linear structures, leading to noticeable distortions for portraits and landscapes with non-linear objects. In this paper, we address these issues by proposing the first deep learning solution to image rectangling. |
Lang Nie; Chunyu Lin; Kang Liao; Shuaicheng Liu; Yao Zhao; |
723 | PCL: Proxy-Based Contrastive Learning for Domain Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We argue that aligning positive sample-to-sample pairs tends to hinder the model generalization due to the significant distribution gaps between different domains. To address this issue, we propose a novel proxy-based contrastive learning method, which replaces the original sample-to-sample relations with proxy-to-sample relations, significantly alleviating the positive alignment issue. |
Xufeng Yao; Yang Bai; Xinyun Zhang; Yuechen Zhang; Qi Sun; Ran Chen; Ruiyu Li; Bei Yu; |
724 | SurfEmb: Dense and Continuous Correspondence Distributions for Object Pose Estimation With Learnt Surface Embeddings Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present an approach to learn dense, continuous 2D-3D correspondence distributions over the surface of objects from data with no prior knowledge of visual ambiguities like symmetry. |
Rasmus Laurvig Haugaard; Anders Glent Buch; |
725 | Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a completion method using a transformer for scene modeling and novel methods to improve the properties of a 360-degree image on the output image. |
Naofumi Akimoto; Yuhi Matsuo; Yoshimitsu Aoki; |
726 | Learning 3D Object Shape and Layout Without 3D Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While there have been recent advances in predicting 3D shape and layout from a single image, most approaches rely on 3D ground truth for training which is expensive to collect at scale. We overcome these limitations and propose a method that learns to predict 3D shape and layout for objects without any ground truth shape or layout information: instead we rely on multi-view images with 2D supervision which can more easily be collected at scale. |
Georgia Gkioxari; Nikhila Ravi; Justin Johnson; |
727 | An Empirical Study of End-to-End Temporal Action Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present an empirical study of end-to-end temporal action detection. |
Xiaolong Liu; Song Bai; Xiang Bai; |
728 | SimVP: Simpler Yet Better Video Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes SimVP, a simple video prediction model that is completely built upon CNN and trained by MSE loss in an end-to-end fashion. |
Zhangyang Gao; Cheng Tan; Lirong Wu; Stan Z. Li; |
729 | Object Localization Under Single Coarse Point Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we propose a POL method using coarse point annotations, relaxing the supervision signals from accurate key points to freely spotted points. |
Xuehui Yu; Pengfei Chen; Di Wu; Najmul Hassan; Guorong Li; Junchi Yan; Humphrey Shi; Qixiang Ye; Zhenjun Han; |
730 | Unsupervised Learning of Accurate Siamese Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel unsupervised tracking framework, in which we can learn temporal correspondence both on the classification branch and regression branch. |
Qiuhong Shen; Lei Qiao; Jinyang Guo; Peixia Li; Xin Li; Bo Li; Weitao Feng; Weihao Gan; Wei Wu; Wanli Ouyang; |
731 | Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose to conduct novel Bayesian non-parametric submodular video partition (BN-SVP) to significantly improve MIL model training that can offer a highly reliable solution for robust anomaly detection in practical settings that include outlier segments or multiple types of abnormal events. |
Hitesh Sapkota; Qi Yu; |
732 | Brain-Supervised Image Editing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here, we present a novel alternative: the utilization of brain responses as a supervision signal for learning semantic feature representations. |
Keith M. Davis III; Carlos de la Torre-Ortiz; Tuukka Ruotsalo; |
733 | 3D Shape Variational Autoencoder Latent Disentanglement Via Mini-Batch Feature Swapping for Bodies and Faces Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an intuitive yet effective self-supervised approach to train a 3D shape variational autoencoder (VAE) which encourages a disentangled latent representation of identity features. |
Simone Foti; Bongjin Koo; Danail Stoyanov; Matthew J. Clarkson; |
734 | Unified Transformer Tracker for Object Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm. |
Fan Ma; Mike Zheng Shou; Linchao Zhu; Haoqi Fan; Yilei Xu; Yi Yang; Zhicheng Yan; |
735 | Non-Parametric Depth Distribution Modelling Based Depth Inference for Multi-View Stereo Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast, we propose constructing the cost volume by non-parametric depth distribution modeling to handle pixels with unimodal and multi-modal distributions. |
Jiayu Yang; Jose M. Alvarez; Miaomiao Liu; |
736 | Equalized Focal Loss for Dense Long-Tailed Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, in the long-tailed scenario, this line of work has not been explored so far. In this paper, we investigate whether one-stage detectors can perform well in this case. |
Bo Li; Yongqiang Yao; Jingru Tan; Gang Zhang; Fengwei Yu; Jianwei Lu; Ye Luo; |
737 | Generating High Fidelity Data From Low-Density Regions Using Diffusion Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our work focuses on addressing sample deficiency from low-density regions of data manifold in common image datasets. |
Vikash Sehwag; Caner Hazirbas; Albert Gordo; Firat Ozgenel; Cristian Canton; |
738 | DeepDPM: Deep Clustering With An Unknown Number of Clusters Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: When K is unknown, however, using model-selection criteria to choose its optimal value might become computationally expensive, especially in DL as the training process would have to be repeated numerous times. In this work, we bridge this gap by introducing an effective deep-clustering method that does not require knowing the value of K as it infers it during the learning. |
Meitar Ronen; Shahaf E. Finder; Oren Freifeld; |
739 | Spiking Transformers for Event-Based Single Object Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a spiking transformer network, STNet, for single object tracking. |
Jiqing Zhang; Bo Dong; Haiwei Zhang; Jianchuan Ding; Felix Heide; Baocai Yin; Xin Yang; |
740 | FocalClick: Towards Practical Interactive Image Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although explored by many previous works, there is still a gap between academic approaches and industrial needs: first, existing models are not efficient enough to work on low-power devices; second, they perform poorly when used to refine preexisting masks, as they cannot avoid destroying the correct parts. FocalClick solves both issues at once by predicting and updating the mask in localized areas. |
Xi Chen; Zhiyan Zhao; Yilei Zhang; Manni Duan; Donglian Qi; Hengshuang Zhao; |
741 | ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose ISDNet, a novel ultra-high resolution segmentation framework that integrates the shallow and deep networks in a new manner, which significantly accelerates the inference speed while achieving accurate segmentation. |
Shaohua Guo; Liang Liu; Zhenye Gan; Yabiao Wang; Wuhao Zhang; Chengjie Wang; Guannan Jiang; Wei Zhang; Ran Yi; Lizhuang Ma; Ke Xu; |
742 | Unsupervised Domain Adaptation for Nighttime Aerial Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work instead develops a novel unsupervised domain adaptation framework for nighttime aerial tracking (named UDAT). |
Junjie Ye; Changhong Fu; Guangze Zheng; Danda Pani Paudel; Guang Chen; |
743 | Balanced Multimodal Learning Via On-the-Fly Gradient Modulation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, in this paper we point out that existing audio-visual discriminative models, in which a uniform objective is designed for all modalities, can leave uni-modal representations under-optimized due to a dominant modality in some scenarios, e.g., sound in a blowing-wind event, vision in a drawing-picture event, etc. To alleviate this optimization imbalance, we propose on-the-fly gradient modulation to adaptively control the optimization of each modality, by monitoring the discrepancy of their contributions to the learning objective. |
Xiaokang Peng; Yake Wei; Andong Deng; Dong Wang; Di Hu; |
744 | RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As face images contain abundant contextual information, we propose a method, RestoreFormer, which explores fully-spatial attentions to model contextual information and surpasses existing works that use local convolutions. |
Zhouxia Wang; Jiawei Zhang; Runjian Chen; Wenping Wang; Ping Luo; |
745 | Understanding Uncertainty Maps in Vision With Statistical Testing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, specifically for uncertainties defined on images, we show how revisiting results from Random Field theory (RFT) when paired with DNN tools (to get around computational hurdles) leads to efficient frameworks that can provide hypothesis-testing capabilities, not otherwise available, for uncertainty maps from models used in many vision tasks. |
Jurijs Nazarovs; Zhichun Huang; Songwong Tasneeyapant; Rudrasis Chakraborty; Vikas Singh; |
746 | CAFE: Learning To Condense Dataset By Aligning Features Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel scheme to Condense dataset by Aligning FEatures (CAFE), which explicitly attempts to preserve the real-feature distribution as well as the discriminant power of the resulting synthetic set, lending itself to strong generalization capability to various architectures. |
Kai Wang; Bo Zhao; Xiangyu Peng; Zheng Zhu; Shuo Yang; Shuo Wang; Guan Huang; Hakan Bilen; Xinchao Wang; Yang You; |
747 | Causality Inspired Representation Learning for Domain Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: When the dependence changes with the target distribution, the statistical models may fail to generalize. In this regard, we introduce a general structural causal model to formalize the DG problem. |
Fangrui Lv; Jian Liang; Shuang Li; Bin Zang; Chi Harold Liu; Ziteng Wang; Di Liu; |
748 | Mask-Guided Spectral-Wise Transformer for Efficient Hyperspectral Image Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel framework, Mask-guided Spectral-wise Transformer (MST), for HSI reconstruction. |
Yuanhao Cai; Jing Lin; Xiaowan Hu; Haoqian Wang; Xin Yuan; Yulun Zhang; Radu Timofte; Luc Van Gool; |
749 | A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel variational Bayesian formulation for diffeomorphic non-rigid registration of medical images, which learns in an unsupervised way a data-specific similarity metric. |
Daniel Grzech; Mohammad Farid Azampour; Ben Glocker; Julia Schnabel; Nassir Navab; Bernhard Kainz; Loïc Le Folgoc; |
750 | Not Just Selection, But Exploration: Online Class-Incremental Continual Learning Via Dual View Consistency Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel yet effective framework for online class-incremental continual learning, which considers not only the selection of stored samples, but also the full exploration of the data stream. |
Yanan Gu; Xu Yang; Kun Wei; Cheng Deng; |
751 | PPDL: Predicate Probability Distribution Based Loss for Unbiased Scene Graph Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a Predicate Probability Distribution based Loss (PPDL) to train the biased SGG models and obtain unbiased Scene Graphs ultimately. |
Wei Li; Haiwei Zhang; Qijie Bai; Guoqing Zhao; Ning Jiang; Xiaojie Yuan; |
752 | Block-NeRF: Scalable Large Scene Neural View Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Block-NeRF, a variant of Neural Radiance Fields that can represent large-scale environments. |
Matthew Tancik; Vincent Casser; Xinchen Yan; Sabeek Pradhan; Ben Mildenhall; Pratul P. Srinivasan; Jonathan T. Barron; Henrik Kretzschmar; |
753 | Coupling Vision and Proprioception for Navigation of Legged Robots Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We exploit the complementary strengths of vision and proprioception to develop a point-goal navigation system for legged robots, called VP-Nav. |
Zipeng Fu; Ashish Kumar; Ananye Agarwal; Haozhi Qi; Jitendra Malik; Deepak Pathak; |
754 | Fine-Grained Predicates Learning for Scene Graph Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While general SGG models are prone to predict head predicates and existing re-balancing strategies prefer tail categories, none of them can appropriately handle these hard-to-distinguish predicates. To tackle this issue, inspired by fine-grained image classification, which focuses on differentiating among hard-to-distinguish object classes, we propose a method named Fine-Grained Predicates Learning (FGPL) which aims at differentiating among hard-to-distinguish predicates for Scene Graph Generation task. |
Xinyu Lyu; Lianli Gao; Yuyu Guo; Zhou Zhao; Hao Huang; Heng Tao Shen; Jingkuan Song; |
755 | Generalized Few-Shot Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a new benchmark, called Generalized Few-Shot Semantic Segmentation (GFS-Seg), to analyze the generalization ability of simultaneously segmenting the novel categories with very few examples and the base categories with sufficient examples. |
Zhuotao Tian; Xin Lai; Li Jiang; Shu Liu; Michelle Shu; Hengshuang Zhao; Jiaya Jia; |
756 | Exploiting Rigidity Constraints for LiDAR Scene Flow Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel optimization method based on a recurrent neural network to predict LiDAR scene flow in a weakly supervised manner. |
Guanting Dong; Yueyi Zhang; Hanlin Li; Xiaoyan Sun; Zhiwei Xiong; |
757 | Neural Head Avatars From Monocular RGB Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present Neural Head Avatars, a novel neural representation that explicitly models the surface geometry and appearance of an animatable human avatar that can be used for teleconferencing in AR/VR or other applications in the movie or games industry that rely on a digital human. |
Philip-William Grassal; Malte Prinzler; Titus Leistner; Carsten Rother; Matthias Nießner; Justus Thies; |
758 | B-Cos Networks: Alignment Is All We Need for Interpretability Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a new direction for increasing the interpretability of deep neural networks (DNNs) by promoting weight-input alignment during training. |
Moritz Böhle; Mario Fritz; Bernt Schiele; |
759 | EMOCA: Emotion Driven Monocular Face Capture and Animation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The result is facial geometries that do not match the emotional content of the input image. We address this with EMOCA (EMOtion Capture and Animation), by introducing a novel deep perceptual emotion consistency loss during training, which helps ensure that the reconstructed 3D expression matches the expression depicted in the input image. |
Radek Daněček; Michael J. Black; Timo Bolkart; |
760 | Burst Image Restoration and Enhancement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our central idea is to create a set of pseudo-burst features that combine complementary information from all the input burst frames to seamlessly exchange information. |
Akshay Dudhane; Syed Waqas Zamir; Salman Khan; Fahad Shahbaz Khan; Ming-Hsuan Yang; |
761 | What Makes Transfer Learning Work for Medical Images: Feature Reuse & Other Factors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Through a series of experiments on several medical image benchmark datasets, we explore the relationship between transfer learning, data size, the capacity and inductive bias of the model, as well as the distance between the source and target domain. Our findings suggest that transfer learning is beneficial in most cases, and we characterize the important role feature reuse plays in its success. |
Christos Matsoukas; Johan Fredin Haslum; Moein Sorkhei; Magnus Söderberg; Kevin Smith; |
762 | Towards Diverse and Natural Scene-Aware 3D Human Motion Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we focus on the problem of synthesizing diverse scene-aware human motions under the guidance of target action sequences. |
Jingbo Wang; Yu Rong; Jingyuan Liu; Sijie Yan; Dahua Lin; Bo Dai; |
763 | Quarantine: Sparsity Can Uncover The Trojan Attack Trigger for Free Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our crucial observation is that the Trojan features are significantly more stable to network pruning than benign features. Leveraging that, we propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork. |
Tianlong Chen; Zhenyu Zhang; Yihua Zhang; Shiyu Chang; Sijia Liu; Zhangyang Wang; |
764 | Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement By Re-Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel audio-visual speech enhancement framework for high-fidelity telecommunications in AR/VR. |
Karren Yang; Dejan Marković; Steven Krenn; Vasu Agrawal; Alexander Richard; |
765 | Localized Adversarial Domain Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Adversarial domain generalization is a popular approach to DG, but conventional approaches (1) struggle to sufficiently align features so that local neighborhoods are mixed across domains; and (2) can suffer from feature-space over-collapse, which can threaten generalization performance. To address these limitations, we propose localized adversarial domain generalization with space compactness maintenance (LADG), which makes two major contributions. First, we propose an adversarial localized classifier as the domain discriminator, along with a principled primary branch. |
Wei Zhu; Le Lu; Jing Xiao; Mei Han; Jiebo Luo; Adam P. Harrison; |
766 | X-Trans2Cap: Cross-Modal Knowledge Transfer Using Transformer for 3D Dense Captioning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we investigate a cross-modal knowledge transfer using Transformer for 3D dense captioning, X-Trans2Cap, to effectively boost the performance of single-modal 3D caption through knowledge distillation using a teacher-student framework. |
Zhihao Yuan; Xu Yan; Yinghong Liao; Yao Guo; Guanbin Li; Shuguang Cui; Zhen Li; |
767 | How Much Does Input Data Type Impact Final Face Model Accuracy? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: If a particular application domain requires accuracy X, which kinds of input data are suitable? Does the input data need to be 3D, or will 2D data suffice? This paper takes a step toward answering these questions using synthetic data. |
Jiahao Luo; Fahim Hasan Khan; Issei Mori; Akila de Silva; Eric Sandoval Ruezga; Minghao Liu; Alex Pang; James Davis; |
768 | Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Yet annotating 3D Lidar data for these tasks is tedious and costly. In this context, we propose a self-supervised pre-training method for 3D perception models that is tailored to autonomous driving data. |
Corentin Sautier; Gilles Puy; Spyros Gidaris; Alexandre Boulch; Andrei Bursuc; Renaud Marlet; |
769 | HumanNeRF: Free-Viewpoint Rendering of Moving People From Monocular Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a free-viewpoint rendering method — HumanNeRF — that works on a given monocular video of a human performing complex body motions, e.g. a video from YouTube. |
Chung-Yi Weng; Brian Curless; Pratul P. Srinivasan; Jonathan T. Barron; Ira Kemelmacher-Shlizerman; |
770 | PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we show that audio signals recorded along with an image, provide complementary information to reconstruct the metric 3D pose of the person. |
Zhijian Yang; Xiaoran Fan; Volkan Isler; Hyun Soo Park; |
771 | Which Images To Label for Few-Shot Medical Landmark Detection? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We herein propose a novel Sample Choosing Policy (SCP) to select "the most worthy" images as the templates, in the context of medical landmark detection. |
Quan Quan; Qingsong Yao; Jun Li; S. Kevin Zhou; |
772 | Why Discard If You Can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We first show that this traditional approach causes only a fraction of the 3D points to contribute to the permutation-invariant features, and discards the rest of the points. In order to address this issue and improve the performance of any baseline 3D point classification or segmentation model, we propose a new module, referred to as the Recycling MaxPooling (RMP) module, to recycle and utilize the features of some of the discarded points. |
Jiajing Chen; Burak Kakillioglu; Huantao Ren; Senem Velipasalar; |
773 | Explaining Deep Convolutional Neural Networks Via Latent Visual-Semantic Filter Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, generating semantic explanations about the learned representation is challenging without direct supervision to produce such explanations. We propose a general framework, Latent Visual Semantic Explainer (LaViSE), to teach any existing convolutional neural network to generate text descriptions about its own latent representations at the filter level. |
Yu Yang; Seungbae Kim; Jungseock Joo; |
774 | AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, existing approaches ignored the distribution difference between training and testing data, thereby inducing a large quantization error in inference. To address this issue, we propose a new quantization scheme, Alignment Quantization with ADMM-based Correlation Preservation (AlignQ), which exploits the cumulative distribution function (CDF) to align the data to be i.i.d. (independently and identically distributed) for quantization error minimization. |
Ting-An Chen; De-Nian Yang; Ming-Syan Chen; |
775 | Self-Distillation From The Last Mini-Batch for Consistency Regularization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, they either require extra network architecture modification, or are difficult to parallelize. To cope with these challenges, we propose an efficient and reliable self-distillation framework, named Self-Distillation from Last Mini-Batch (DLB). |
Yiqing Shen; Liwu Xu; Yuzhe Yang; Yaqian Li; Yandong Guo; |
776 | Interactive Multi-Class Tiny-Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Such imagery typically contains objects from various categories, yet the multi-class interactive annotation setting for the detection task has thus far been unexplored. To address these needs, we propose a novel interactive annotation method for multiple instances of tiny objects from multiple classes, based on a few point-based user inputs. |
Chunggi Lee; Seonwook Park; Heon Song; Jeongun Ryu; Sanghoon Kim; Haejoon Kim; Sérgio Pereira; Donggeun Yoo; |
777 | Learning From Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to learn light field saliency from pixel-level noisy labels obtained from unsupervised, hand-crafted feature-based saliency methods. |
Mingtao Feng; Kendong Liu; Liang Zhang; Hongshan Yu; Yaonan Wang; Ajmal Mian; |
778 | UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel framework for unsupervised/supervised GEBD, by using the Temporal Self-similarity Matrix (TSM) as the video representation. |
Hyolim Kang; Jinwoo Kim; Taehyun Kim; Seon Joo Kim; |
779 | Multi-View Depth Estimation By Fusing Single-View Depth Probability With Multi-View Geometry Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: For such failure modes, single-view depth estimation methods are often more reliable. To this end, we propose MaGNet, a novel framework for fusing single-view depth probability with multi-view geometry, to improve the accuracy, robustness and efficiency of multi-view depth estimation. |
Gwangbin Bae; Ignas Budvytis; Roberto Cipolla; |
780 | Learning To Collaborate in Decentralized Learning of Personalized Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we dynamically update the mixing-weights to improve the personalized model for each node’s task while also learning a sparse topology to reduce communication costs. |
Shuangtong Li; Tianyi Zhou; Xinmei Tian; Dacheng Tao; |
781 | CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present CLIP-NeRF, a multi-modal 3D object manipulation method for neural radiance fields (NeRF). |
Can Wang; Menglei Chai; Mingming He; Dongdong Chen; Jing Liao; |
782 | ART-Point: Improving Rotation Robustness of Point Cloud Classifiers Via Adversarial Rotation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, for the first time, we show that the rotation robustness of point cloud classifiers can also be acquired via adversarial training with better performance on both rotated and clean datasets. |
Ruibin Wang; Yibo Yang; Dacheng Tao; |
783 | Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While NeRF-based techniques excel at representing fine geometric structures with smoothly varying view-dependent appearance, they often fail to accurately capture and reproduce the appearance of glossy surfaces. We address this limitation by introducing Ref-NeRF, which replaces NeRF’s parameterization of view-dependent outgoing radiance with a representation of reflected radiance and structures this function using a collection of spatially-varying scene properties. |
Dor Verbin; Peter Hedman; Ben Mildenhall; Todd Zickler; Jonathan T. Barron; Pratul P. Srinivasan; |
784 | 360-Attack: Distortion-Aware Perturbations From Perspective-Views Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an adversarial attack targeting spherical images, called 360-attack, which transfers adversarial perturbations from perspective-view (PV) images to a final adversarial spherical image. |
Yunjian Zhang; Yanwei Liu; Jinxia Liu; Jingbo Miao; Antonios Argyriou; Liming Wang; Zhen Xu; |
785 | Targeted Supervised Contrastive Learning for Long-Tailed Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we show that while supervised contrastive learning can help improve performance, past baselines suffer from poor uniformity brought in by imbalanced data distribution. |
Tianhong Li; Peng Cao; Yuan Yuan; Lijie Fan; Yuzhe Yang; Rogerio S. Feris; Piotr Indyk; Dina Katabi; |
786 | Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Thus, we propose a new pipeline to cumulatively adapt style, fog and the dual-factor (style and fog). |
Xianzheng Ma; Zhixiang Wang; Yacheng Zhan; Yinqiang Zheng; Zheng Wang; Dengxin Dai; Chia-Wen Lin; |
787 | Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Ev-TTA, a simple, effective test-time adaptation algorithm for event-based object recognition. |
Junho Kim; Inwoo Hwang; Young Min Kim; |
788 | Balanced Contrastive Learning for Long-Tailed Visual Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we focus on representation learning for imbalanced data. |
Jianggang Zhu; Zheng Wang; Jingjing Chen; Yi-Ping Phoebe Chen; Yu-Gang Jiang; |
789 | Slimmable Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank, from which models of different capacities can be sampled to accommodate different accuracy-efficiency trade-offs. |
Rang Meng; Weijie Chen; Shicai Yang; Jie Song; Luojun Lin; Di Xie; Shiliang Pu; Xinchao Wang; Mingli Song; Yueting Zhuang; |
790 | Bandits for Structure Perturbation-Based Black-Box Attacks To Graph Neural Networks With Theoretical Guarantees Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing attacks to GNNs are either under the less practical threat model where the attacker is assumed to access the GNN model parameters, or under the practical black-box threat model but consider perturbing node features, which has been shown to be insufficiently effective. In this paper, we aim to bridge this gap and consider black-box attacks to GNNs with structure perturbation as well as with theoretical guarantees. |
Binghui Wang; Youqi Li; Pan Zhou; |
791 | NODEO: A Neural Ordinary Differential Equation Based Optimization Framework for Deformable Image Registration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a novel, generic, and accurate diffeomorphic image registration framework that utilizes neural ordinary differential equations (NODEs). |
Yifan Wu; Tom Z. Jiahao; Jiancong Wang; Paul A. Yushkevich; M. Ani Hsieh; James C. Gee; |
792 | DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel Patchmatch-based framework to work on high-resolution optical flow estimation. |
Zihua Zheng; Ni Nie; Zhi Ling; Pengfei Xiong; Jiangyu Liu; Hao Wang; Jiankun Li; |
793 | Few-Shot Object Detection With Fully Cross-Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by the recent work on vision transformers and vision-language transformers, we propose a novel Fully Cross-Transformer based model (FCT) for FSOD by incorporating cross-transformer into both the feature backbone and detection head. |
Guangxing Han; Jiawei Ma; Shiyuan Huang; Long Chen; Shih-Fu Chang; |
794 | Pyramid Architecture for Multi-Scale Processing in Point Cloud Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we investigate how to efficiently and effectively integrate features at varying scales and varying stages in a point cloud segmentation network. |
Dong Nie; Rui Lan; Ling Wang; Xiaofeng Ren; |
795 | Decoupling Makes Weakly Supervised Local Feature Better Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a decoupled describe-then-detect training pipeline tailored for weakly supervised local feature learning. |
Kunhong Li; Longguang Wang; Li Liu; Qing Ran; Kai Xu; Yulan Guo; |
796 | Cross-Architecture Self-Supervised Video Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a new cross-architecture contrastive learning (CACL) framework for self-supervised video representation learning. |
Sheng Guo; Zihua Xiong; Yujie Zhong; Limin Wang; Xiaobo Guo; Bing Han; Weilin Huang; |
797 | High-Resolution Image Harmonization Via Collaborative Dual Transformations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a high-resolution image harmonization network with Collaborative Dual Transformation (CDTNet) to combine pixel-to-pixel transformation and RGB-to-RGB transformation coherently in an end-to-end network. |
Wenyan Cong; Xinhao Tao; Li Niu; Jing Liang; Xuesong Gao; Qihao Sun; Liqing Zhang; |
798 | Homography Loss for Monocular 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel method that takes all the objects into consideration and explores their mutual relationships to help better estimate the 3D boxes. |
Jiaqi Gu; Bojian Wu; Lubin Fan; Jianqiang Huang; Shen Cao; Zhiyu Xiang; Xian-Sheng Hua; |
799 | A Unified Model for Line Projections in Catadioptric Cameras With Rotationally Symmetric Mirrors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We start by taking some general point reflection assumptions and derive a line reflection constraint. This constraint is then used to define a line projection into the image. |
Pedro Miraldo; José Pedro Iglesias; |
800 | Dynamic Sparse R-CNN Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose to improve Sparse R-CNN with two dynamic designs. |
Qinghang Hong; Fengming Liu; Dong Li; Ji Liu; Lu Tian; Yi Shan; |
801 | MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation. |
Inkyu Shin; Yi-Hsuan Tsai; Bingbing Zhuang; Samuel Schulter; Buyu Liu; Sparsh Garg; In So Kweon; Kuk-Jin Yoon; |
802 | Stable Long-Term Recurrent Video Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we expose instabilities of existing recurrent VSR networks on long sequences with low motion. |
Benjamin Naoto Chiche; Arnaud Woiselle; Joana Frontera-Pons; Jean-Luc Starck; |
803 | Dual-Generator Face Reenactment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose the Dual-Generator (DG) network for large-pose face reenactment. |
Gee-Sern Hsu; Chun-Hung Tsai; Hung-Yi Wu; |
804 | Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our proposed method is the first to treat arbitrary rescaling, both upscaling and downscaling, as one unified process. |
Zhihong Pan; Baopu Li; Dongliang He; Mingde Yao; Wenhao Wu; Tianwei Lin; Xin Li; Errui Ding; |
805 | Self-Supervised Neural Articulated Shape and Appearance Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel approach for learning a representation of the geometry, appearance, and motion of a class of articulated objects given only a set of color images as input. |
Fangyin Wei; Rohan Chabra; Lingni Ma; Christoph Lassner; Michael Zollhöfer; Szymon Rusinkiewicz; Chris Sweeney; Richard Newcombe; Mira Slavcheva; |
806 | A Hybrid Quantum-Classical Algorithm for Robust Fitting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a hybrid quantum-classical algorithm for robust fitting. |
Anh-Dzung Doan; Michele Sasdelli; David Suter; Tat-Jun Chin; |
807 | Topology Preserving Local Road Network Estimation From Single Onboard Camera Image Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper aims at extracting the local road network topology, directly in the bird’s-eye-view (BEV), all in a complex urban setting. |
Yigit Baran Can; Alexander Liniger; Danda Pani Paudel; Luc Van Gool; |
808 | Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A novel algorithm to detect road lanes in the eigenlane space is proposed in this paper. |
Dongkwon Jin; Wonhui Park; Seong-Gyun Jeong; Heeyeon Kwon; Chang-Su Kim; |
809 | Human Instance Matting Via Mutual Guidance and Multi-Instance Refinement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper introduces a new matting task called human instance matting (HIM), which requires the pertinent model to automatically predict a precise alpha matte for each human instance. |
Yanan Sun; Chi-Keung Tang; Yu-Wing Tai; |
810 | TCTrack: Temporal Contexts for Aerial Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present TCTrack, a comprehensive framework to fully exploit temporal contexts for aerial tracking. |
Ziang Cao; Ziyuan Huang; Liang Pan; Shiwei Zhang; Ziwei Liu; Changhong Fu; |
811 | SpaceEdit: Learning A Unified Editing Space for Open-Domain Image Color Editing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Recently, large pretrained models (e.g., BERT, StyleGAN, CLIP) show great knowledge transfer and generalization capability on various downstream tasks within their domains. Inspired by these efforts, in this paper we propose a unified model for open-domain image editing focusing on color and tone adjustment of open-domain images while keeping their original content and structure. |
Jing Shi; Ning Xu; Haitian Zheng; Alex Smith; Jiebo Luo; Chenliang Xu; |
812 | GAN-Supervised Dense Visual Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose GAN-Supervised Learning, a framework for learning discriminative models and their GAN-generated training data jointly end-to-end. |
William Peebles; Jun-Yan Zhu; Richard Zhang; Antonio Torralba; Alexei A. Efros; Eli Shechtman; |
813 | SwinTextSpotter: Scene Text Spotting Via Better Synergy Between Text Detection and Text Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new end-to-end scene text spotting framework termed SwinTextSpotter. |
Mingxin Huang; Yuliang Liu; Zhenghao Peng; Chongyu Liu; Dahua Lin; Shenggao Zhu; Nicholas Yuan; Kai Ding; Lianwen Jin; |
814 | Multi-Level Feature Learning for Contrastive Multi-View Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new framework of multi-level feature learning for contrastive multi-view clustering to address the aforementioned issue. |
Jie Xu; Huayi Tang; Yazhou Ren; Liang Peng; Xiaofeng Zhu; Lifang He; |
815 | RendNet: Unified 2D/3D Recognizer With Latent Space Rendering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: As a result, we propose RendNet, a unified architecture for recognition on both 2D and 3D scenarios, which considers both VG/RG representations and exploits their interaction by incorporating the VG-to-RG rasterization process. |
Ruoxi Shi; Xinyang Jiang; Caihua Shan; Yansen Wang; Dongsheng Li; |
816 | IPLAN: Interactive and Procedural Layout Planning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the capability to involve humans in the loop has been largely ignored by existing methods, which are mostly end-to-end approaches. To this end, we propose a new human-in-the-loop generative model, iPLAN, which is capable not only of automatically generating layouts but also of interacting with designers throughout the whole procedure, enabling humans and AI to gradually co-evolve a sketchy idea into the final design. |
Feixiang He; Yanlong Huang; He Wang; |
817 | Video Frame Interpolation With Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing methods built upon convolutional networks generally face challenges of handling large motion due to the locality of convolution operations. To overcome this limitation, we introduce a novel framework, which takes advantage of Transformer to model long-range pixel correlation among video frames. |
Liying Lu; Ruizheng Wu; Huaijia Lin; Jiangbo Lu; Jiaya Jia; |
818 | GIFS: Neural Implicit Function for General Shape Representation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel method to represent general shapes including non-watertight shapes and shapes with multi-layer surfaces. |
Jianglong Ye; Yuntao Chen; Naiyan Wang; Xiaolong Wang; |
819 | Deblur-NeRF: Neural Radiance Fields From Blurry Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, image blurriness caused by defocus or motion, which often occurs when capturing scenes in the wild, significantly degrades its reconstruction quality. To address this problem, we propose Deblur-NeRF, the first method that can recover a sharp NeRF from blurry input. |
Li Ma; Xiaoyu Li; Jing Liao; Qi Zhang; Xuan Wang; Jue Wang; Pedro V. Sander; |
820 | Egocentric Prediction of Action Target in 3D Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: It is important in fields like human-robot collaboration, but has not yet received enough attention from vision and learning communities. To stimulate more research on this challenging egocentric vision task, we propose a large multimodality dataset of more than 1 million frames of RGB-D and IMU streams, and provide evaluation metrics based on our high-quality 2D and 3D labels from semi-automatic annotation. |
Yiming Li; Ziang Cao; Andrew Liang; Benjamin Liang; Luoyao Chen; Hang Zhao; Chen Feng; |
821 | TemporalUV: Capturing Loose Clothing With Temporally Coherent UV Coordinates Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel approach to generate temporally coherent UV coordinates for loose clothing. |
You Xie; Huiqi Mao; Angela Yao; Nils Thuerey; |
822 | Whose Track Is It Anyway? Improving Robustness to Tracking Errors With Affinity-Based Trajectory Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This is typically caused by the propagation of errors from tracking to prediction, such as noisy tracks, fragments, and identity switches. To alleviate this propagation of errors, we propose a new prediction paradigm that uses detections and their affinity matrices across frames as inputs, removing the need for error-prone data association during tracking. |
Xinshuo Weng; Boris Ivanovic; Kris Kitani; Marco Pavone; |
823 | DoubleField: Bridging The Neural Surface and Radiance Fields for High-Fidelity Human Reconstruction and Rendering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce DoubleField, a novel framework combining the merits of both surface field and radiance field for high-fidelity human reconstruction and rendering. |
Ruizhi Shao; Hongwen Zhang; He Zhang; Mingjia Chen; Yan-Pei Cao; Tao Yu; Yebin Liu; |
824 | Towards Real-World Navigation With Deep Differentiable Planners Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We train embodied neural networks to plan and navigate unseen complex 3D environments, emphasising real-world deployment. |
Shu Ishida; João F. Henriques; |
825 | An Iterative Quantum Approach for Transformation Estimation From Point Sets Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an iterative method for estimating rigid transformations from point sets using adiabatic quantum computation. |
Natacha Kuete Meli; Florian Mannel; Jan Lellmann; |
826 | Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents Video K-Net, a simple, strong, and unified framework for fully end-to-end video panoptic segmentation. |
Xiangtai Li; Wenwei Zhang; Jiangmiao Pang; Kai Chen; Guangliang Cheng; Yunhai Tong; Chen Change Loy; |
827 | UnweaveNet: Unweaving Activity Stories Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose and showcase the efficacy of pretraining UnweaveNet in a self-supervised manner. |
Will Price; Carl Vondrick; Dima Damen; |
828 | Balanced MSE for Imbalanced Visual Regression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we identify that the widely used Mean Square Error (MSE) loss function can be ineffective in imbalanced regression. |
Jiawei Ren; Mingyuan Zhang; Cunjun Yu; Ziwei Liu; |
829 | Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Instead, we consider rethinking solutions to data heterogeneity in FL with a focus on local learning generality rather than proximal restriction. |
Matias Mendieta; Taojiannan Yang; Pu Wang; Minwoo Lee; Zhengming Ding; Chen Chen; |
830 | PhysFormer: Facial Video-Based Physiological Measurement With Temporal Difference Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture, to adaptively aggregate both local and global spatio-temporal features for rPPG representation enhancement. |
Zitong Yu; Yuming Shen; Jingang Shi; Hengshuang Zhao; Philip H.S. Torr; Guoying Zhao; |
831 | Dimension Embeddings for Monocular 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a general method to learn appropriate embeddings for dimension estimation in monocular 3D object detection. |
Yunpeng Zhang; Wenzhao Zheng; Zheng Zhu; Guan Huang; Dalong Du; Jie Zhou; Jiwen Lu; |
832 | Look Closer To Supervise Better: One-Shot Font Generation Via Component-Based Discriminator Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We investigate the drawbacks of previous studies and find that a coarse-grained discriminator is insufficient for supervising a font generator. To this end, we propose a novel Component-Aware Module (CAM), which supervises the generator to decouple content and style at a more fine-grained level, i.e., the component level. |
Yuxin Kong; Canjie Luo; Weihong Ma; Qiyuan Zhu; Shenggao Zhu; Nicholas Yuan; Lianwen Jin; |
833 | NeRFReN: Neural Radiance Fields With Reflections Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, we propose to split a scene into transmitted and reflected components, and model the two components with separate neural radiance fields. |
Yuan-Chen Guo; Di Kang; Linchao Bao; Yu He; Song-Hai Zhang; |
834 | Blind Image Super-Resolution With Elaborate Degradation Modeling on Noise and Kernel Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the above issues, this paper proposes a model-based blind SISR method under the probabilistic framework, which elaborately models image degradation from the perspectives of noise and blur kernel. |
Zongsheng Yue; Qian Zhao; Jianwen Xie; Lei Zhang; Deyu Meng; Kwan-Yee K. Wong; |
835 | Finding Good Configurations of Planar Primitives in Unorganized Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present an algorithm for detecting planar primitives from unorganized 3D point clouds. |
Mulin Yu; Florent Lafarge; |
836 | PhyIR: Physics-Based Inverse Rendering for Panoramic Indoor Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present PhyIR, a neural inverse rendering method with a more complete SVBRDF representation and a physics-based in-network rendering layer, which can handle complex material and incorporate physical constraints by re-rendering realistic and detailed specular reflectance. |
Zhen Li; Lingli Wang; Xiang Huang; Cihui Pan; Jiaqi Yang; |
837 | SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Moreover, recent region-aware adaptive instance normalization achieves great success but only considers the global background feature distribution, making the aligned foreground feature distribution biased. To address these issues, we propose a self-consistent style contrastive learning scheme (SCS-Co). |
Yucheng Hang; Bin Xia; Wenming Yang; Qingmin Liao; |
838 | Beyond Fixation: Dynamic Window Visual Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel method, named Dynamic Window Vision Transformer (DW-ViT). |
Pengzhen Ren; Changlin Li; Guangrun Wang; Yun Xiao; Qing Du; Xiaodan Liang; Xiaojun Chang; |
839 | Progressive End-to-End Object Detection in Crowded Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new query-based detection framework for crowd detection. |
Anlin Zheng; Yuang Zhang; Xiangyu Zhang; Xiaojuan Qi; Jian Sun; |
840 | FMCNet: Feature-Level Modality Compensation for Visible-Infrared Person Re-Identification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Alternatively, we present a novel Feature-level Modality Compensation Network (FMCNet) for VIReID in this paper, which aims to compensate the missing modality-specific information in the feature level rather than in the image level, i.e., directly generating those missing modality-specific features of one modality from existing modality-shared features of the other modality. |
Qiang Zhang; Changzhou Lai; Jianan Liu; Nianchang Huang; Jungong Han; |
841 | Improving GAN Equilibrium By Raising Spatial Awareness Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To alleviate the issue of D dominating the competition in GANs, we aim to raise the spatial awareness of G. Randomly sampled multi-level heatmaps are encoded into the intermediate layers of G as an inductive bias. |
Jianyuan Wang; Ceyuan Yang; Yinghao Xu; Yujun Shen; Hongdong Li; Bolei Zhou; |
842 | Neural Convolutional Surfaces Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work is concerned with the representation of shapes while disentangling fine, local, and possibly repeating geometry from global, coarse structures. |
Luca Morreale; Noam Aigerman; Paul Guerrero; Vladimir G. Kim; Niloy J. Mitra; |
843 | HyperSegNAS: Bridging One-Shot Neural Architecture Search With 3D Medical Image Segmentation Using HyperNet Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To enable one-shot NAS for medical image segmentation, our method, named HyperSegNAS, introduces a HyperNet to assist super-net training by incorporating architecture topology information. |
Cheng Peng; Andriy Myronenko; Ali Hatamizadeh; Vishwesh Nath; Md Mahfuzur Rahman Siddiquee; Yufan He; Daguang Xu; Rama Chellappa; Dong Yang; |
844 | A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We find that, somewhat surprisingly, in ResNets, adversarial training makes models more sensitive to the background compared to foreground than standard training. |
Mazda Moayeri; Phillip Pope; Yogesh Balaji; Soheil Feizi; |
845 | ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: ConDor is a self-supervised method that learns to Canonicalize the 3D orientation and position for full and partial 3D point clouds. |
Rahul Sajnani; Adrien Poulenard; Jivitesh Jain; Radhika Dua; Leonidas J. Guibas; Srinath Sridhar; |
846 | Source-Free Domain Adaptation Via Distribution Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel framework called SFDA-DE to address SFDA task via source Distribution Estimation. |
Ning Ding; Yixing Xu; Yehui Tang; Chao Xu; Yunhe Wang; Dacheng Tao; |
847 | Robust Combination of Distributed Gradients Under Adversarial Perturbations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new server-side learning algorithm that robustly combines gradients. |
Kwang In Kim; |
848 | Exploring Endogenous Shift for Cross-Domain Detection: A Large-Scale Benchmark and Perturbation Suppression Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To support this study, we contribute the first Endogenous Domain Shift (EDS) benchmark, X-ray security inspection, where the endogenous shifts among the domains are mainly caused by different X-ray machine types with different hardware parameters, wear degrees, etc. |
Renshuai Tao; Hainan Li; Tianbo Wang; Yanlu Wei; Yifu Ding; Bowei Jin; Hongping Zhi; Xianglong Liu; Aishan Liu; |
849 | VisCUIT: Visual Auditor for Bias in CNN Image Classifier Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present VisCUIT, an interactive visualization system that reveals how and why a CNN classifier is biased. |
Seongmin Lee; Zijie J. Wang; Judy Hoffman; Duen Horng (Polo) Chau; |
850 | Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To reduce expert effort, we present AutoSWAP: a framework for automatically synthesizing data-efficient task-level LFs. |
Albert Tseng; Jennifer J. Sun; Yisong Yue; |
851 | Transferability Estimation Using Bhattacharyya Class Separability Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose Gaussian Bhattacharyya Coefficient (GBC), a novel method for quantifying transferability between a source model and a target dataset. |
Michal Pándy; Andrea Agostinelli; Jasper Uijlings; Vittorio Ferrari; Thomas Mensink; |
852 | DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work presents a novel end-to-end Transformer-based Directed Attention (DirecFormer) framework for robust action recognition. |
Thanh-Dat Truong; Quoc-Huy Bui; Chi Nhan Duong; Han-Seok Seo; Son Lam Phung; Xin Li; Khoa Luu; |
853 | Hierarchical Self-Supervised Representation Learning for Movie Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In contrast, in this paper we focus on self-supervised video learning for movie understanding and propose a novel hierarchical self-supervised pretraining strategy that separately pretrains each level of our hierarchical movie understanding model. |
Fanyi Xiao; Kaustav Kundu; Joseph Tighe; Davide Modolo; |
854 | Robust Egocentric Photo-Realistic Facial Expression Transfer for Virtual Reality Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our key contribution to improving robustness and generalization is that our method implicitly decouples, in an unsupervised manner, the facial expression from nuisance factors (e.g., headset, environment, facial appearance). |
Amin Jourabloo; Fernando De la Torre; Jason Saragih; Shih-En Wei; Stephen Lombardi; Te-Li Wang; Danielle Belko; Autumn Trimble; Hernan Badino; |
855 | Does Robustness on ImageNet Transfer to Downstream Tasks? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: For object detection and semantic segmentation, we find that a vanilla Swin Transformer, a variant of Vision Transformer tailored for dense prediction tasks, transfers robustness better than Convolutional Neural Networks that are trained to be robust to the corrupted version of ImageNet. |
Yutaro Yamada; Mayu Otani; |
856 | Propagation Regularizer for Semi-Supervised Learning With Extremely Scarce Labeled Samples Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we focus on the SSL environment with extremely scarce labeled samples, only 1 or 2 labeled samples per class, where most existing methods fail to learn. |
Noo-ri Kim; Jee-Hyong Lee; |
857 | Bailando: 3D Dance Generation By Actor-Critic GPT With Choreographic Memory Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In addition, the generated dance sequence also needs to maintain temporal coherency with different music genres. To tackle these challenges, we propose a novel music-to-dance framework, Bailando, with two powerful components: 1) a choreographic memory that learns to summarize meaningful dancing units from 3D pose sequences into a quantized codebook, 2) an actor-critic Generative Pre-trained Transformer (GPT) that composes these units into a fluent dance coherent with the music. |
Li Siyao; Weijiang Yu; Tianpei Gu; Chunze Lin; Quan Wang; Chen Qian; Chen Change Loy; Ziwei Liu; |
858 | Faithful Extreme Rescaling Via Generative Prior Reciprocated Invertible Representations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a Generative prior ReciprocAted Invertible rescaling Network (GRAIN) for generating faithful high-resolution (HR) images from low-resolution (LR) invertible images with an extreme upscaling factor (64x). |
Zhixuan Zhong; Liangyu Chai; Yang Zhou; Bailin Deng; Jia Pan; Shengfeng He; |
859 | Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To handle the first problem, we propose an efficient knowledge distillation model, named Distillation using Oracle Queries (DOQ), which shares parameters between teacher and student networks. |
Xian Qu; Changxing Ding; Xingao Li; Xubin Zhong; Dacheng Tao; |
860 | Proto2Proto: Can You Recognize The Car, The Way I Do? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Proto2Proto, a novel method to transfer interpretability of one prototypical part network to another via knowledge distillation. |
Monish Keswani; Sriranjani Ramakrishnan; Nishant Reddy; Vineeth N Balasubramanian; |
861 | Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: With a new and strong observation that the localization issue of the center-offset formulation can be remedied in a local-window search scheme in an ideal situation, we propose a multi-person pose estimation approach, dubbed as LOGO-CAP, by learning the LOcal-GlObal Contextual Adaptation for human Pose. |
Nan Xue; Tianfu Wu; Gui-Song Xia; Liangpei Zhang; |
862 | Learning Video Representations of Human Motion From Synthetic Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we take an early step towards video representation learning of human actions with the help of largescale synthetic videos, particularly for human motion representation enhancement. |
Xi Guo; Wei Wu; Dongliang Wang; Jing Su; Haisheng Su; Weihao Gan; Jian Huang; Qin Yang; |
863 | TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We observe that these applications naturally exhibit the characteristics of large intra-image (spatial) variance and small cross-image variance. This observation motivates our efficient translation variant convolution (TVConv) for layout-aware visual processing. |
Jierun Chen; Tianlang He; Weipeng Zhuo; Li Ma; Sangtae Ha; S.-H. Gary Chan; |
864 | Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we investigate a novel and practical task coded cross-device SR, which strives to adapt a real-world SR model trained on the paired images captured by one camera to low-resolution (LR) images captured by arbitrary target devices. |
Xiaoqian Xu; Pengxu Wei; Weikai Chen; Yang Liu; Mingzhi Mao; Liang Lin; Guanbin Li; |
865 | FS6D: Few-Shot 6D Pose Estimation of Novel Objects Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we study a new open-set problem, few-shot 6D object pose estimation: estimating the 6D pose of an unknown object from a few support views without extra training. |
Yisheng He; Yao Wang; Haoqiang Fan; Jian Sun; Qifeng Chen; |
866 | Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a large-scale study of imitating human demonstrations on tasks that require a virtual robot to search for objects in new environments – (1) ObjectGoal Navigation (e.g. ‘find & go to a chair’) and (2) Pick&Place (e.g. ‘find mug, pick mug, find counter, place mug on counter’). |
Ram Ramrakhya; Eric Undersander; Dhruv Batra; Abhishek Das; |
867 | The Probabilistic Normal Epipolar Constraint for Frame-to-Frame Rotation Optimization Under Uncertain Feature Positions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, their approach does not take into account uncertainties, so that the accuracy of the estimated relative pose is highly dependent on accurate feature positions in the target frame. In this work, we introduce the probabilistic normal epipolar constraint (PNEC) that overcomes this limitation by accounting for anisotropic and inhomogeneous uncertainties in the feature positions. |
Dominik Muhle; Lukas Koestler; Nikolaus Demmel; Florian Bernard; Daniel Cremers; |
868 | Vision-Language Pre-Training for Boosting Scene Text Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we specifically adapt vision-language joint learning for scene text detection, a task that intrinsically involves cross-modal interaction between the two modalities: vision and language, since text is the written form of language. |
Sibo Song; Jianqiang Wan; Zhibo Yang; Jun Tang; Wenqing Cheng; Xiang Bai; Cong Yao; |
869 | Reflection and Rotation Symmetry Detection Via Equivariant Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce a group-equivariant convolutional network for symmetry detection, dubbed EquiSym, which leverages equivariant feature maps with respect to a dihedral group of reflection and rotation. |
Ahyun Seo; Byungjin Kim; Suha Kwak; Minsu Cho; |
870 | BoostMIS: Boosting Medical Image Semi-Supervised Learning With Adaptive Pseudo Labeling and Informative Active Annotation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel semi-supervised learning (SSL) framework named BoostMIS that combines adaptive pseudo labeling and informative active annotation to unleash the potential of medical image SSL models: (1) BoostMIS can adaptively leverage the cluster assumption and consistency regularization of the unlabeled data according to the current learning status. |
Wenqiao Zhang; Lei Zhu; James Hallinan; Shengyu Zhang; Andrew Makmur; Qingpeng Cai; Beng Chin Ooi; |
871 | Simple But Effective: CLIP Embeddings for Embodied AI Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We investigate the effectiveness of CLIP visual backbones for Embodied AI tasks. |
Apoorv Khandelwal; Luca Weihs; Roozbeh Mottaghi; Aniruddha Kembhavi; |
872 | NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Alternatively, a more graceful way is that global and local context can adaptively contribute per se to accommodate different visual data. To achieve this goal, we in this paper propose a novel ViT architecture, termed NomMer, which can dynamically Nominate the synergistic global-local context in vision transforMer. |
Hao Liu; Xinghua Jiang; Xin Li; Zhimin Bao; Deqiang Jiang; Bo Ren; |
873 | HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present HOI4D, a large-scale 4D egocentric dataset with rich annotations, to catalyze the research of category-level human-object interaction. |
Yunze Liu; Yun Liu; Che Jiang; Kangbo Lyu; Weikang Wan; Hao Shen; Boqiang Liang; Zhoujie Fu; He Wang; Li Yi; |
874 | Collaborative Transformers for Grounded Situation Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Grounded situation recognition is the task of predicting the main activity, entities playing certain roles within the activity, and bounding-box groundings of the entities in the given image. To effectively deal with this challenging task, we introduce a novel approach where the two processes for activity classification and entity estimation are interactive and complementary. |
Junhyeong Cho; Youngseok Yoon; Suha Kwak; |
875 | DyRep: Bootstrapping Training With Dynamic Re-Parameterization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As such, the price to pay is an expensive computational overhead to manipulate these unnecessary behaviors. To eliminate the above caveats, we aim to bootstrap the training with minimal cost by devising a dynamic re-parameterization (DyRep) method, which encodes Rep technique into the training process that dynamically evolves the network structures. |
Tao Huang; Shan You; Bohan Zhang; Yuxuan Du; Fei Wang; Chen Qian; Chang Xu; |
876 | Not All Labels Are Equal: Rationalizing The Labeling Costs for Training Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a unified framework for active learning, that considers both the uncertainty and the robustness of the detector, ensuring that the network performs well in all classes. |
Ismail Elezi; Zhiding Yu; Anima Anandkumar; Laura Leal-Taixé; Jose M. Alvarez; |
877 | CPPF: Towards Robust Category-Level 9D Pose Estimation in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we tackle the problem of category-level 9D pose estimation in the wild, given a single RGB-D frame. |
Yang You; Ruoxi Shi; Weiming Wang; Cewu Lu; |
878 | Interact Before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Cross-modal knowledge interaction allows other modalities to supplement missing transferable information because of the cross-modal complementarity. Also, the most transferable aspects of data can be highlighted using cross-modal consensus. In this work, we present a novel model that jointly considers these two characteristics for domain adaptive action recognition. |
Lijin Yang; Yifei Huang; Yusuke Sugano; Yoichi Sato; |
879 | Interactive Disentanglement: Learning Concepts By Interacting With Their Prototype Representations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we show the advantages of prototype representations for understanding and revising the latent space of neural concept learners. |
Wolfgang Stammer; Marius Memmel; Patrick Schramowski; Kristian Kersting; |
880 | CDGNet: Class Distribution Guided Network for Human Parsing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Intuitively, a human head is unlikely to be below the feet, and arms are likely to be near the torso. Inspired by this observation, we build instance class distributions by accumulating the original human parsing label in the horizontal and vertical directions, which can be utilized as supervision signals. |
Kunliang Liu; Ouk Choi; Jianming Wang; Wonjun Hwang; |
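As a rough sketch of the accumulation described in the CDGNet highlight, the snippet below turns a human-parsing label map into horizontal and vertical class-distribution targets; normalizing each profile by the image extent is our own assumption, not necessarily the paper's choice.

```python
import numpy as np

def class_distribution_targets(label_map, num_classes):
    """label_map: (H, W) integer parsing labels. Returns per-column and per-row
    class-occurrence profiles, usable as auxiliary supervision signals.
    Normalization by the image extent is an illustrative choice."""
    H, W = label_map.shape
    onehot = np.eye(num_classes)[label_map]        # (H, W, C) one-hot labels
    horizontal = onehot.sum(axis=0) / max(H, 1)    # (W, C): class distribution along x
    vertical = onehot.sum(axis=1) / max(W, 1)      # (H, C): class distribution along y
    return horizontal, vertical
```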
881 | Recall@k Surrogate Loss With Large Batches and Similarity Mixup Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A differentiable surrogate loss for the recall is proposed in this work. |
Yash Patel; Giorgos Tolias; Jiří Matas; |
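The highlight above does not spell out the surrogate, so the sketch below only illustrates the generic trick of replacing the hard ranking indicator with a sigmoid relaxation; it should not be read as the paper's exact loss, and the temperature and the "at least one relevant item in the top k" convention are assumptions.

```python
import torch

def smooth_recall_at_k_loss(sim, pos_mask, k, tau=0.01):
    """sim: (N,) similarities of one query to the gallery;
    pos_mask: (N,) boolean mask of relevant gallery items.
    Returns 1 - (smooth indicator that some positive ranks inside the top k)."""
    pos_sim = sim[pos_mask]                                   # (P,)
    diff = sim.unsqueeze(0) - pos_sim.unsqueeze(1)            # (P, N)
    # smooth rank: 1 + soft count of items scored above each positive
    smooth_rank = 1.0 + torch.sigmoid(diff / tau).sum(dim=1) - 0.5  # drop the self term
    in_top_k = torch.sigmoid((k - smooth_rank) / tau)         # soft [rank <= k]
    return 1.0 - in_top_k.max()
```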
882 | Direct Voxel Grid Optimization: Super-Fast Convergence for Radiance Fields Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a super-fast convergence approach to reconstructing the per-scene radiance field from a set of images that capture the scene with known poses. |
Cheng Sun; Min Sun; Hwann-Tzong Chen; |
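The core object in a direct voxel-grid approach is just a dense grid of parameters that is optimized by gradient descent and queried with trilinear interpolation, which the sketch below illustrates; color modeling and the paper's additional optimization tricks are omitted, so this is only a minimal illustration.

```python
import torch
import torch.nn.functional as F

class DensityVoxelGrid(torch.nn.Module):
    """Minimal sketch: a dense density grid optimized directly and queried via
    trilinear interpolation. The resolution and the omission of color modeling
    are illustrative simplifications."""
    def __init__(self, resolution=128):
        super().__init__()
        self.grid = torch.nn.Parameter(
            torch.zeros(1, 1, resolution, resolution, resolution))

    def forward(self, xyz):
        # xyz: (N, 3) query points normalized to [-1, 1]
        coords = xyz.view(1, -1, 1, 1, 3)
        density = F.grid_sample(self.grid, coords, align_corners=True)
        return density.view(-1)
```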
883 | Continual Test-Time Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The noisy pseudo-labels can further lead to error accumulation and catastrophic forgetting. To tackle these issues, we propose a continual test-time adaptation approach (CoTTA) which comprises two parts. |
Qin Wang; Olga Fink; Luc Van Gool; Dengxin Dai; |
884 | URetinex-Net: Retinex-Based Deep Unfolding Network for Low-Light Image Enhancement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the commonly used hand-crafted priors and optimization-driven solutions lead to the absence of adaptivity and efficiency. To address these issues, in this paper, we propose a Retinex-based deep unfolding network (URetinex-Net), which unfolds an optimization problem into a learnable network to decompose a low-light image into reflectance and illumination layers. |
Wenhui Wu; Jian Weng; Pingping Zhang; Xu Wang; Wenhan Yang; Jianmin Jiang; |
885 | Towards Multi-Domain Single Image Dehazing Via Test-Time Training Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, we observe that this training strategy tends to compromise the model performance on individual datasets. Motivated by this observation, we propose a test-time training method which leverages a helper network to assist the dehazing model in better adapting to a domain of interest. |
Huan Liu; Zijun Wu; Liangyan Li; Sadaf Salehkalaibar; Jun Chen; Keyan Wang; |
886 | Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces From 3D MRI Scans With Geometric Deep Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although traditional and deep learning-based algorithmic pipelines exist for this purpose, they have two major drawbacks: lengthy runtimes of multiple hours (traditional) or intricate post-processing, such as mesh extraction and topology correction (deep learning-based). In this work, we address both of these issues and propose Vox2Cortex, a deep learning-based algorithm that directly yields topologically correct, three-dimensional meshes of the boundaries of the cortex. |
Fabian Bongratz; Anne-Marie Rickmann; Sebastian Pölsterl; Christian Wachinger; |
887 | Deep Safe Multi-View Clustering: Reducing The Risk of Clustering Performance Degradation Caused By View Increase Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, we observe that learning from data with more views is not guaranteed to achieve better clustering performance than from data with fewer views. To address this issue, we propose a general deep learning based framework that is guaranteed to reduce the risk of performance degradation caused by view increase. |
Huayi Tang; Yong Liu; |
888 | Dynamic MLP for Fine-Grained Image Classification By Leveraging Geographical and Temporal Information Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To fully explore the potential of multimodal information, we propose a dynamic MLP on top of the image representation, which interacts with multimodal features at a higher and broader dimension. |
Lingfeng Yang; Xiang Li; Renjie Song; Borui Zhao; Juntian Tao; Shihao Zhou; Jiajun Liang; Jian Yang; |
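A minimal sketch of the general "dynamic MLP" idea referenced above: a small generator predicts per-sample weights from the geographical/temporal metadata features and applies them to the image representation. The dimensions, residual fusion, and layer names here are our assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class DynamicMLP(nn.Module):
    """Sketch (assumption, not the paper's exact design): metadata features
    generate the weights of a per-sample linear map applied to the image feature."""
    def __init__(self, img_dim=256, meta_dim=32, hidden=64):
        super().__init__()
        self.reduce = nn.Linear(img_dim, hidden)
        self.weight_gen = nn.Linear(meta_dim, hidden * hidden)
        self.expand = nn.Linear(hidden, img_dim)

    def forward(self, img_feat, meta_feat):
        x = self.reduce(img_feat)                      # (B, hidden)
        w = self.weight_gen(meta_feat)                 # (B, hidden * hidden)
        w = w.view(-1, x.size(1), x.size(1))           # (B, hidden, hidden)
        x = torch.relu(torch.bmm(x.unsqueeze(1), w)).squeeze(1)
        return img_feat + self.expand(x)               # residual fusion (assumption)
```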
889 | HP-Capsule: Unsupervised Face Part Discovery By Hierarchical Parsing Capsule Network Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a Hierarchical Parsing Capsule Network (HP-Capsule) for unsupervised face subpart-part discovery. |
Chang Yu; Xiangyu Zhu; Xiaomei Zhang; Zidu Wang; Zhaoxiang Zhang; Zhen Lei; |
890 | ScanQA: 3D Question Answering for Spatial Scene Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a baseline model for 3D-QA, named ScanQA model, where the model learns a fused descriptor from 3D object proposals and encoded sentence embeddings. |
Daichi Azuma; Taiki Miyanishi; Shuhei Kurita; Motoaki Kawanabe; |
891 | MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose MuKEA to represent multimodal knowledge by an explicit triplet to correlate visual objects and fact answers with implicit relations. |
Yang Ding; Jing Yu; Bang Liu; Yue Hu; Mingxin Cui; Qi Wu; |
892 | Class-Incremental Learning By Knowledge Distillation With Adaptive Feature Consolidation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel class incremental learning approach based on deep neural networks, which continually learns new tasks with limited memory for storing examples in the previous tasks. |
Minsoo Kang; Jaeyoo Park; Bohyung Han; |
893 | Learning Program Representations for Food Images and Cooking Recipes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we are interested in modeling a how-to instructional procedure, such as a cooking recipe, with a meaningful and rich high-level representation. |
Dim P. Papadopoulos; Enrique Mora; Nadiia Chepurko; Kuan Wei Huang; Ferda Ofli; Antonio Torralba; |
894 | Bending Graphs: Hierarchical Shape Matching Using Gated Optimal Transport Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we investigate a hierarchical learning design, to which we incorporate local patch-level information and global shape-level structures. |
Mahdi Saleh; Shun-Cheng Wu; Luca Cosmo; Nassir Navab; Benjamin Busam; Federico Tombari; |
895 | Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we call for an alternative paradigm for the OK-VQA task, which transforms the image into plain text, so that we can enable knowledge passage retrieval, and generative question-answering in the natural language space. |
Feng Gao; Qing Ping; Govind Thattai; Aishwarya Reganti; Ying Nian Wu; Prem Natarajan; |
896 | Federated Learning With Position-Aware Neurons Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Hence, we propose Position-Aware Neurons (PANs) as an alternative, fusing position-related values (i.e., position encodings) into neuron outputs. |
Xin-Chun Li; Yi-Chu Xu; Shaoming Song; Bingshuai Li; Yinchuan Li; Yunfeng Shao; De-Chuan Zhan; |
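The sketch below shows one way to fuse fixed, client-shared position encodings into a layer's outputs so that neurons keep a consistent "position" across federated clients; the additive sinusoidal fusion and the scale are illustrative assumptions, not necessarily the paper's exact design.

```python
import torch
import torch.nn as nn

class PositionAwareLinear(nn.Module):
    """Sketch of a linear layer whose outputs are fused with fixed, non-trainable
    per-neuron position encodings (additive fusion is an illustrative choice)."""
    def __init__(self, in_dim, out_dim, scale=0.1):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        # deterministic sinusoidal code per output neuron, identical on all clients
        pos = torch.arange(out_dim, dtype=torch.float32)
        self.register_buffer("pos_enc", scale * torch.sin(pos))

    def forward(self, x):
        return self.linear(x) + self.pos_enc
```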
897 | Fair Contrastive Learning for Facial Attribute Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we for the first time analyze unfairness caused by supervised contrastive learning and propose a new Fair Supervised Contrastive Loss (FSCL) for fair visual representation learning. |
Sungho Park; Jewook Lee; Pilhyeon Lee; Sunhee Hwang; Dohyung Kim; Hyeran Byun; |
898 | MDAN: Multi-Level Dependent Attention Network for Visual Emotion Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present the Multi-level Dependent Attention Network (MDAN) with two branches, to leverage the emotion hierarchy and the correlation between different affective levels and semantic levels. |
Liwen Xu; Zhengtao Wang; Bin Wu; Simon Lui; |
899 | Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a novel fully hyperbolic neural network which uses the concept of projections (embeddings) followed by an intrinsic aggregation and a nonlinearity all within the hyperbolic space. |
Xiran Fan; Chun-Hao Yang; Baba C. Vemuri; |
900 | BNUDC: A Two-Branched Deep Neural Network for Restoring Images From Under-Display Cameras Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a deep neural network with two branches to reverse each type of degradation, which is more effective than performing both restorations in a single forward network. |
Jaihyun Koh; Jangho Lee; Sungroh Yoon; |
901 | RGB-Depth Fusion GAN for Indoor Depth Completion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we design a novel two-branch end-to-end fusion network, which takes a pair of RGB and incompleted depth images as input to predict a dense and completed depth map. |
Haowen Wang; Mingyuan Wang; Zhengping Che; Zhiyuan Xu; Xiuquan Qiao; Mengshi Qi; Feifei Feng; Jian Tang; |
902 | Training Object Detectors From Scratch: An Empirical Study in The Era of Vision Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Instead of proposing a specific vision transformer based detector, in this work, our goal is to reveal the insights of training vision transformer based detectors from scratch. |
Weixiang Hong; Jiangwei Lao; Wang Ren; Jian Wang; Jingdong Chen; Wei Chu; |
903 | RCL: Recurrent Continuous Localization for Temporal Action Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: State-of-the-art methods mostly rely on a dense anchoring scheme, where anchors are sampled uniformly over the temporal domain with a discretized grid, and then regress the accurate boundaries. In this paper, we revisit this foundational stage and introduce Recurrent Continuous Localization (RCL), which learns a fully continuous anchoring representation. |
Qiang Wang; Yanhao Zhang; Yun Zheng; Pan Pan; |
904 | C2SLR: Consistency-Enhanced Continuous Sign Language Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose two auxiliary constraints to enhance the CSLR backbones from the perspective of consistency. |
Ronglai Zuo; Brian Mak; |
905 | Human Trajectory Prediction With Momentary Observation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study a task named momentary trajectory prediction, which reduces the observed history from a long time sequence to an extreme situation of two frames, one frame for social and scene contexts and both frames for the velocity of agents. |
Jianhua Sun; Yuxuan Li; Liang Chai; Hao-Shu Fang; Yong-Lu Li; Cewu Lu; |
906 | FoggyStereo: Stereo Matching With Fog Volume Representation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Prior methods deem the fog as noise and discard it before matching. Different from them, we propose to explore depth hints from fog and improve stereo matching via these hints. |
Chengtang Yao; Lidong Yu; |
907 | Trajectory Optimization for Physics-Based Reconstruction of 3D Human Pose From Monocular Video Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We focus on the task of estimating a physically plausible articulated human motion from monocular video. |
Erik Gärtner; Mykhaylo Andriluka; Hongyi Xu; Cristian Sminchisescu; |
908 | Directional Self-Supervised Learning for Heavy Image Augmentations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a directional self-supervised learning paradigm (DSSL), which is compatible with significantly more augmentations. |
Yalong Bai; Yifan Yang; Wei Zhang; Tao Mei; |
909 | Lifelong Unsupervised Domain Adaptive Person Re-Identification With Coordinated Anti-Forgetting and Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, to address more practical scenarios, we propose a new task, Lifelong Unsupervised Domain Adaptive (LUDA) person ReID. |
Zhipeng Huang; Zhizheng Zhang; Cuiling Lan; Wenjun Zeng; Peng Chu; Quanzeng You; Jiang Wang; Zicheng Liu; Zheng-Jun Zha; |
910 | No-Reference Point Cloud Quality Assessment Via Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel no-reference quality assessment metric, the image transferred point cloud quality assessment (IT-PCQA), for 3D point clouds. |
Qi Yang; Yipeng Liu; Siheng Chen; Yiling Xu; Jun Sun; |
911 | Generating Representative Samples for Few-Shot Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Few-shot class representations are often biased due to data scarcity. To mitigate this issue, we propose to generate visual samples based on semantic embeddings using a conditional variational autoencoder (CVAE) model. |
Jingyi Xu; Hieu Le; |
912 | Comprehending and Ordering Semantics for Image Captioning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new recipe of Transformer-style structure, namely Comprehending and Ordering Semantics Networks (COS-Net), that novelly unifies an enriched semantic comprehending and a learnable semantic ordering processes into a single architecture. |
Yehao Li; Yingwei Pan; Ting Yao; Tao Mei; |
913 | Dynamic Scene Graph Generation Via Anticipatory Pre-Training Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by the ability of humans to infer the visual relationship, we propose a novel anticipatory pre-training paradigm based on Transformer to explicitly model the temporal correlation of visual relationships in different frames to improve dynamic scene graph generation. |
Yiming Li; Xiaoshan Yang; Changsheng Xu; |
914 | A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce VCSL (Video Copy Segment Localization), a new comprehensive segment-level annotated video copy dataset. |
Sifeng He; Xudong Yang; Chen Jiang; Gang Liang; Wei Zhang; Tan Pan; Qing Wang; Furong Xu; Chunguang Li; JinXiong Liu; Hui Xu; Kaiming Huang; Yuan Cheng; Feng Qian; Xiaobo Zhang; Lei Yang; |
915 | GaTector: A Unified Framework for Gaze Object Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we build a novel framework named GaTector to tackle the gaze object prediction problem in a unified way. |
Binglu Wang; Tao Hu; Baoshan Li; Xiaojuan Chen; Zhijie Zhang; |
916 | ELIC: Efficient Learned Image Compression With Unevenly Grouped Space-Channel Contextual Adaptive Coding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Combining the proposed uneven grouping model with existing context models, we obtain a spatial-channel contextual adaptive model to improve the coding performance without damage to running speed. Then we study the structure of the main transform and propose an efficient model, ELIC, to achieve state-of-the-art speed and compression ability. |
Dailan He; Ziming Yang; Weikun Peng; Rui Ma; Hongwei Qin; Yan Wang; |
917 | CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present CSWin Transformer, an efficient and effective Transformer-based backbone for general-purpose vision tasks. |
Xiaoyi Dong; Jianmin Bao; Dongdong Chen; Weiming Zhang; Nenghai Yu; Lu Yuan; Dong Chen; Baining Guo; |
918 | LaTr: Layout-Aware Transformer for Scene-Text VQA Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel multimodal architecture for Scene Text Visual Question Answering (STVQA), named Layout-Aware Transformer (LaTr). |
Ali Furkan Biten; Ron Litman; Yusheng Xie; Srikar Appalaraju; R. Manmatha; |
919 | Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study the HMC problem in which objects are labeled at any level of the hierarchy. |
Jingzhou Chen; Peng Wang; Jian Liu; Yuntao Qian; |
920 | ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we attempt to uncover, through the lens of shortcut learning, an important factor that hinders the networks from generalizing across domains. |
WeiQin Chuah; Ruwan Tennakoon; Reza Hoseinnezhad; Alireza Bab-Hadiashar; David Suter; |
921 | Enhancing Face Recognition With Self-Supervised 3D Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose to enhance face recognition with a bypass of self-supervised 3D reconstruction, which forces the neural backbone to focus on identity-related depth and albedo information while neglecting identity-irrelevant pose and illumination information. |
Mingjie He; Jie Zhang; Shiguang Shan; Xilin Chen; |
922 | HeadNeRF: A Real-Time NeRF-Based Parametric Head Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose HeadNeRF, a novel NeRF-based parametric head model that integrates the neural radiance field to the parametric representation of the human head. |
Yang Hong; Bo Peng; Haiyao Xiao; Ligang Liu; Juyong Zhang; |
923 | FvOR: Robust Joint Shape and Pose Optimization for Few-View Object Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present FvOR, a learning-based object reconstruction method that predicts accurate 3D models given a few images with noisy input poses. |
Zhenpei Yang; Zhile Ren; Miguel Angel Bautista; Zaiwei Zhang; Qi Shan; Qixing Huang; |
924 | Reduce Information Loss in Transformers for Pluralistic Image Inpainting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To keep input information as much as possible, we propose a new transformer based framework "PUT". |
Qiankun Liu; Zhentao Tan; Dongdong Chen; Qi Chu; Xiyang Dai; Yinpeng Chen; Mengchen Liu; Lu Yuan; Nenghai Yu; |
925 | Replacing Labeled Real-Image Datasets With Auto-Generated Contours Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k without the use of real images, human-, and self-supervision during the pre-training of Vision Transformers (ViTs). |
Hirokatsu Kataoka; Ryo Hayamizu; Ryosuke Yamada; Kodai Nakashima; Sora Takashima; Xinyu Zhang; Edgar Josafat Martinez-Noriega; Nakamasa Inoue; Rio Yokota; |
926 | Cross-Modal Transferable Adversarial Attacks From Images to Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, motivated by the observation that the low-level feature space between images and video frames are similar, we propose a simple yet effective cross-modal attack method, named as Image To Video (I2V) attack. |
Zhipeng Wei; Jingjing Chen; Zuxuan Wu; Yu-Gang Jiang; |
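A generic sketch of the underlying idea of attacking shared low-level features: a PGD-style loop that pushes each frame's intermediate features, taken from a pretrained image model, away from the clean features under an L-infinity budget. The step size, budget, and cosine objective are assumptions, and this is not the paper's exact I2V procedure.

```python
import torch
import torch.nn.functional as F

def feature_divergence_attack(frames, feat_extractor, eps=8/255, alpha=1/255, steps=10):
    """frames: (B, C, H, W) clean frames in [0, 1]; feat_extractor: a pretrained
    image model returning intermediate features. Minimizes the cosine similarity
    between adversarial and clean features with sign-gradient steps."""
    frames = frames.detach()
    clean_feat = feat_extractor(frames).detach()
    adv = frames.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        feat = feat_extractor(adv)
        sim = F.cosine_similarity(feat.flatten(1), clean_feat.flatten(1)).mean()
        grad = torch.autograd.grad(sim, adv)[0]
        adv = adv.detach() - alpha * grad.sign()          # descend on feature similarity
        adv = frames + (adv - frames).clamp(-eps, eps)    # project back into the L-inf ball
        adv = adv.clamp(0.0, 1.0)
    return adv.detach()
```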
927 | Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a simple yet effective transformer-based architecture for scene text detection. |
Jingqun Tang; Wenqing Zhang; Hongye Liu; MingKun Yang; Bo Jiang; Guanglong Hu; Xiang Bai; |
928 | Do Explanations Explain? Model Knows Best Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Thus we propose an empirical framework for axiomatic evaluation of explanation methods. |
Ashkan Khakzar; Pedram Khorsandi; Rozhin Nobahari; Nassir Navab; |
929 | WebQA: Multihop and Multimodal QA Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce WebQA, a challenging new benchmark that proves difficult for large-scale state-of-the-art models which lack language groundable visual representations for novel objects and the ability to reason, yet trivial for humans. |
Yingshan Chang; Mridu Narang; Hisami Suzuki; Guihong Cao; Jianfeng Gao; Yonatan Bisk; |
930 | Occlusion-Robust Face Alignment Using A Viewpoint-Invariant Hierarchical Network Architecture Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a new network architecture called GlomFace to model the facial hierarchies against various occlusions, which draws inspiration from the viewpoint-invariant hierarchy of facial structure. |
Congcong Zhu; Xintong Wan; Shaorong Xie; Xiaoqiang Li; Yinzheng Gu; |
931 | BasicVSR++: Improving Video Super-Resolution With Enhanced Propagation and Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we redesign BasicVSR by proposing second-order grid propagation and flow-guided deformable alignment. |
Kelvin C.K. Chan; Shangchen Zhou; Xiangyu Xu; Chen Change Loy; |
932 | IDR: Self-Supervised Image Denoising Via Iterative Data Refinement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a practical unsupervised image denoising method to achieve state-of-the-art denoising performance. |
Yi Zhang; Dasong Li; Ka Lung Law; Xiaogang Wang; Hongwei Qin; Hongsheng Li; |
933 | MogFace: Towards A Deeper Appreciation on Face Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we focus on resolving the three aforementioned challenges that existing methods struggle to address, and present a novel face detector, termed MogFace. |
Yang Liu; Fei Wang; Jiankang Deng; Zhipeng Zhou; Baigui Sun; Hao Li; |
934 | GuideFormer: Transformers for Image Guided Depth Completion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose GuideFormer, a fully transformer-based architecture for dense depth completion. |
Kyeongha Rho; Jinsung Ha; Youngjung Kim; |
935 | Multi-Label Iterated Learning for Image Classification With Label Ambiguity Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by language emergence literature, we propose multi-label iterated learning (MILe) to incorporate the inductive biases of multi-label learning from single labels using the framework of iterated learning. |
Sai Rajeswar; Pau Rodríguez; Soumye Singhal; David Vazquez; Aaron Courville; |
936 | Region-Aware Face Swapping Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a novel Region-Aware Face Swapping (RAFSwap) network to achieve identity-consistent harmonious high-resolution face generation in a local-global manner: 1) Local Facial Region-Aware (FRA) branch augments local identity-relevant features by introducing the Transformer to effectively model misaligned cross-scale semantic interaction. |
Chao Xu; Jiangning Zhang; Miao Hua; Qian He; Zili Yi; Yong Liu; |
937 | Towards Language-Free Training for Text-to-Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the first work to train text-to-image generation models without any text data. |
Yufan Zhou; Ruiyi Zhang; Changyou Chen; Chunyuan Li; Chris Tensmeyer; Tong Yu; Jiuxiang Gu; Jinhui Xu; Tong Sun; |
938 | Learning Affinity From Attention: End-to-End Weakly-Supervised Semantic Segmentation With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, to address the aforementioned problem, we introduce Transformers, which naturally integrate global information, to generate more integral initial pseudo labels for end-to-end WSSS. |
Lixiang Ru; Yibing Zhan; Baosheng Yu; Bo Du; |
939 | Pushing The Envelope of Gradient Boosting Forests Via Globally-Optimized Oblique Trees Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To date, all successful GB versions use axis-aligned trees trained in a suboptimal way via greedy recursive partitioning. We address this gap by using a more powerful type of trees (having hyperplane splits) and an algorithm that can optimize, globally over all the tree parameters, the objective function that GB dictates. |
Magzhan Gabidolla; Miguel Á. Carreira-Perpiñán; |
940 | Physical Simulation Layer for Accurate 3D Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a novel approach for generative 3D modeling that explicitly encourages the physical and thus functional consistency of the generated shapes. |
Mariem Mezghanni; Théo Bodrito; Malika Boulkenafed; Maks Ovsjanikov; |
941 | Deformable Sprites for Unsupervised Video Decomposition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We describe a method to extract persistent elements of a dynamic scene from an input video. |
Vickie Ye; Zhengqi Li; Richard Tucker; Angjoo Kanazawa; Noah Snavely; |
942 | CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study the problem of jointly estimating the optical flow and scene flow from synchronized 2D and 3D data. |
Haisong Liu; Tao Lu; Yihui Xu; Jia Liu; Wenjie Li; Lijun Chen; |
943 | FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: For example, the "Happy" expression with high intensity in Talk-Show is more discriminative than the same expression with low intensity in Official-Event. To fill this gap, we build a large-scale multi-scene dataset, coined as FERV39k. |
Yan Wang; Yixuan Sun; Yiwen Huang; Zhongying Liu; Shuyong Gao; Wei Zhang; Weifeng Ge; Wenqiang Zhang; |
944 | Learning To Detect Mobile Objects From LiDAR Scans Without Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes an alternative approach entirely based on unlabeled data, which can be collected cheaply and in abundance almost everywhere on earth. |
Yurong You; Katie Luo; Cheng Perng Phoo; Wei-Lun Chao; Wen Sun; Bharath Hariharan; Mark Campbell; Kilian Q. Weinberger; |
945 | BNV-Fusion: Dense 3D Reconstruction Using Bi-Level Neural Volume Fusion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In order to incrementally integrate new depth maps into a global neural implicit representation, we propose a novel bi-level fusion strategy that considers both efficiency and reconstruction quality by design. |
Kejie Li; Yansong Tang; Victor Adrian Prisacariu; Philip H.S. Torr; |
946 | Probabilistic Representations for Video Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents Probabilistic Video Contrastive Learning, a self-supervised representation learning method that bridges contrastive learning with probabilistic representation. |
Jungin Park; Jiyoung Lee; Ig-Jae Kim; Kwanghoon Sohn; |
947 | EnvEdit: Environment Editing for Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Due to limited available data for agent training and finite diversity in navigation environments, it is challenging for the agent to generalize to new, unseen environments. To address this problem, we propose EnvEdit, a data augmentation method that creates new environments by editing existing environments, which are used to train a more generalizable agent. |
Jialu Li; Hao Tan; Mohit Bansal; |
948 | Omnivore: A Single Model for Many Visual Modalities Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Instead, in this paper, we propose a single model which excels at classifying images, videos, and single-view 3D data using exactly the same model parameters. |
Rohit Girdhar; Mannat Singh; Nikhila Ravi; Laurens van der Maaten; Armand Joulin; Ishan Misra; |
949 | Neural Shape Mating: Self-Supervised Object Assembly With Adversarial Shape Priors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: By focusing on shape alignment rather than semantic cues, we can achieve cross-category generalization and scaling. In this paper, we introduce a novel task, pairwise 3D geometric shape mating, and propose Neural Shape Mating (NSM) to tackle this problem. |
Yun-Chun Chen; Haoda Li; Dylan Turpin; Alec Jacobson; Animesh Garg; |
950 | Reflash Dropout in Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, in this paper, we show that appropriate usage of dropout benefits SR networks and improves the generalization ability. |
Xiangtao Kong; Xina Liu; Jinjin Gu; Yu Qiao; Chao Dong; |
951 | WildNet: Learning Domain Generalized Semantic Segmentation From The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to diversify both the content and style of the source domain with the help of the wild. |
Suhyeon Lee; Hongje Seong; Seongwon Lee; Euntai Kim; |
952 | Auditing Privacy Defenses in Federated Learning Via Generative Gradient Leakage Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we validate that the private training data can still be leaked under certain defense settings with a new type of leakage, i.e., Generative Gradient Leakage (GGL). |
Zhuohang Li; Jiaxin Zhang; Luyang Liu; Jian Liu; |
953 | DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: DAIR-V2X comprises 71254 LiDAR frames and 71254 Camera frames, and all frames are captured from real scenes with 3D annotations. |
Haibao Yu; Yizhen Luo; Mao Shu; Yiyi Huo; Zebang Yang; Yifeng Shi; Zhenglong Guo; Hanyu Li; Xing Hu; Jirui Yuan; Zaiqing Nie; |
954 | DECORE: Deep Compression With Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present DECORE, a reinforcement learning based approach to automate the network compression process. |
Manoj Alwani; Yang Wang; Vashisht Madhavan; |
955 | Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose jointly training 3D detection and 3D tracking from only monocular videos in an end-to-end manner. |
Peixuan Li; Jieyu Jin; |
956 | MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To benefit from both the powerful feature representation in DNN and pixel-level geometric constraints, we reformulate the monocular object depth estimation as a progressive refinement problem and propose a joint semantic and geometric cost volume to model the depth error. |
Qing Lian; Peiliang Li; Xiaozhi Chen; |
957 | Task Discrepancy Maximization for Fine-Grained Few-Shot Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our objective is to localize the class-wise discriminative regions by highlighting channels encoding distinct information of the class. |
SuBeen Lee; WonJun Moon; Jae-Pil Heo; |
958 | FedDC: Federated Learning With Non-IID Data Via Local Drift Decoupling and Correction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the key challenge in federated learning is that the clients have significant statistical heterogeneity among their local data distributions, which would cause inconsistent optimized local models on the client-side. To address this fundamental dilemma, we propose a novel federated learning algorithm with local drift decoupling and correction (FedDC). |
Liang Gao; Huazhu Fu; Li Li; Yingwen Chen; Ming Xu; Cheng-Zhong Xu; |
959 | Efficient Classification of Very Large Images With Tiny Objects Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present an end-to-end CNN model termed Zoom-In network that leverages hierarchical attention sampling for classification of large images with tiny objects using a single GPU. |
Fanjie Kong; Ricardo Henao; |
960 | SWEM: Towards Real-Time Video Object Segmentation With Sequential Weighted Expectation-Maximization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, continuously growing and redundant template features lead to inefficient inference. To alleviate this, we propose a novel Sequential Weighted Expectation-Maximization (SWEM) network to greatly reduce the redundancy of memory features. |
Zhihui Lin; Tianyu Yang; Maomao Li; Ziyu Wang; Chun Yuan; Wenhao Jiang; Wei Liu; |
961 | Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To tackle the aforementioned problems, we propose the Point-to-Voxel Knowledge Distillation (PVD), which transfers the hidden knowledge from both point level and voxel level. |
Yuenan Hou; Xinge Zhu; Yuexin Ma; Chen Change Loy; Yikang Li; |
962 | Leveling Down in Computer Vision: Pareto Inefficiencies in Fair Deep Classifiers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We corroborate this analysis with extensive experimental support that shows that many of the fairness heuristics used in computer vision also degrade performance on the most disadvantaged groups. Building on these insights, we propose an adaptive augmentation strategy that, uniquely, of all methods tested, improves performance for the disadvantaged groups. |
Dominik Zietlow; Michael Lohaus; Guha Balakrishnan; Matthäus Kleindessner; Francesco Locatello; Bernhard Schölkopf; Chris Russell; |
963 | Generating Diverse 3D Reconstructions From A Single Occluded Face Image Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Furthermore, while a plurality of 3D reconstructions is plausible in the occluded regions, existing approaches are limited to generating only a single solution. To address both of these challenges, we present Diverse3DFace, which is specifically designed to simultaneously generate a diverse and realistic set of 3D reconstructions from a single occluded face image. |
Rahul Dey; Vishnu Naresh Boddeti; |
964 | RBGNet: Ray-Based Grouping for 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the RBGNet framework, a voting-based 3D detector for accurate 3D object detection from point clouds. |
Haiyang Wang; Shaoshuai Shi; Ze Yang; Rongyao Fang; Qi Qian; Hongsheng Li; Bernt Schiele; Liwei Wang; |
965 | Stand-Alone Inter-Frame Attention in Video Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a new recipe of inter-frame attention block, namely Stand-alone Inter-Frame Attention (SIFA), that novelly delves into the deformation across frames to estimate local self-attention on each spatial location. |
Fuchen Long; Zhaofan Qiu; Yingwei Pan; Ting Yao; Jiebo Luo; Tao Mei; |
966 | Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations: a) model-free joint localization and b) model-based parametric regression. |
Jogendra Nath Kundu; Siddharth Seth; Pradyumna YM; Varun Jampani; Anirban Chakraborty; R. Venkatesh Babu; |
967 | Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images Via Online Resources Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The internet is being used as the go-to way to verify information using different sources and modalities. Our goal is an inspectable method that automates this time-consuming and reasoning-intensive process by fact-checking the image-caption pairing using Web evidence. |
Sahar Abdelnabi; Rakibul Hasan; Mario Fritz; |
968 | Memory-Augmented Deep Conditional Unfolding Network for Pan-Sharpening Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Additionally, they ignore the different characteristics of each band of MS images and directly concatenate them with panchromatic (PAN) images, leading to severe copy artifacts. To address the above issues, we propose an interpretable deep neural network, namely Memory-augmented Deep Conditional Unfolding Network with two specified core designs. |
Gang Yang; Man Zhou; Keyu Yan; Aiping Liu; Xueyang Fu; Fan Wang; |
969 | Semi-Supervised Wide-Angle Portraits Correction By Multi-Scale Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we design a semi-supervised scheme and build a high-quality unlabeled dataset with rich scenarios, allowing us to simultaneously use labeled and unlabeled data to improve performance. |
Fushun Zhu; Shan Zhao; Peng Wang; Hao Wang; Hua Yan; Shuaicheng Liu; |
970 | Large-Scale Pre-Training for Person Re-Identification With Noisy Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper aims to address the problem of pre-training for person re-identification (Re-ID) with noisy labels. |
Dengpan Fu; Dongdong Chen; Hao Yang; Jianmin Bao; Lu Yuan; Lei Zhang; Houqiang Li; Fang Wen; Dong Chen; |
971 | Adiabatic Quantum Computing for Multi Object Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, current MOT formulations are unsuitable for quantum computing due to their scaling properties. In this work, we therefore propose the first MOT formulation designed to be solved with AQC. |
Jan-Nico Zaech; Alexander Liniger; Martin Danelljan; Dengxin Dai; Luc Van Gool; |
972 | Feature Erasing and Diffusion Network for Occluded Person Re-Identification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel Feature Erasing and Diffusion Network (FED) to simultaneously handle challenges from NPO and NTP. |
Zhikang Wang; Feng Zhu; Shixiang Tang; Rui Zhao; Lihuo He; Jiangning Song; |
973 | Is Mapping Necessary for Realistic PointGoal Navigation? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, for PointNav in a realistic setting (RGB-D and actuation noise, no GPS+Compass), this is an open question; one we tackle in this paper. |
Ruslan Partsey; Erik Wijmans; Naoki Yokoyama; Oles Dobosevych; Dhruv Batra; Oleksandr Maksymets; |
974 | Node-Aligned Graph Convolutional Network for Whole-Slide Image Representation and Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a hierarchical global-to-local clustering strategy to build a Node-Aligned GCN (NAGCN) to represent WSI with rich local structural information as well as global distribution. |
Yonghang Guan; Jun Zhang; Kuan Tian; Sen Yang; Pei Dong; Jinxi Xiang; Wei Yang; Junzhou Huang; Yuyao Zhang; Xiao Han; |
975 | Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a similarity-aware CAC framework that jointly learns representation and similarity metric. |
Min Shi; Hao Lu; Chen Feng; Chengxin Liu; Zhiguo Cao; |
976 | Masked Feature Prediction for Self-Supervised Visual Pre-Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models. |
Chen Wei; Haoqi Fan; Saining Xie; Chao-Yuan Wu; Alan Yuille; Christoph Feichtenhofer; |
977 | Critical Regularizations for Neural Surface Reconstruction in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present RegSDF, which shows that proper point cloud supervisions and geometry regularizations are sufficient to produce high-quality and robust reconstruction results. |
Jingyang Zhang; Yao Yao; Shiwei Li; Tian Fang; David McKinnon; Yanghai Tsin; Long Quan; |
978 | EASE: Unsupervised Discriminant Subspace Learning for Transductive Few-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present an unsupErvised discriminAnt Subspace lEarning (EASE) that improves transductive few-shot learning performance by learning a linear projection onto a subspace built from features of the support set and the unlabeled query set in the test time. |
Hao Zhu; Piotr Koniusz; |
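A minimal sketch of projecting support and query features onto a low-dimensional subspace built at test time, as the EASE highlight describes. Plain PCA via SVD is used here purely for illustration; the paper's discriminant objective is more involved, so this is not the authors' method.

```python
import numpy as np

def subspace_projection(support, query, dim=5):
    """support: (Ns, D) labeled support features; query: (Nq, D) unlabeled query
    features. Builds a linear projection from the pooled features (plain PCA here,
    an illustrative stand-in for a discriminant subspace) and projects both sets."""
    feats = np.concatenate([support, query], axis=0)
    feats = feats - feats.mean(axis=0, keepdims=True)
    # top right-singular vectors span the subspace
    _, _, vt = np.linalg.svd(feats, full_matrices=False)
    proj = vt[:dim]                            # (dim, D)
    return support @ proj.T, query @ proj.T
```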
979 | Object-Relation Reasoning Graph for Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an Object-Relation Reasoning Graph (OR2G) for reasoning about action in videos. |
Yangjun Ou; Li Mi; Zhenzhong Chen; |
980 | Semantic Segmentation By Early Region Proxy Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a novel and efficient modeling that starts from interpreting the image as a tessellation of learnable regions, each of which has flexible geometrics and carries homogeneous semantics. |
Yifan Zhang; Bo Pang; Cewu Lu; |
981 | GIQE: Generic Image Quality Enhancement Via Nth Order Iterative Degradation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we utilize a synthetic degradation model that recursively applies sets of random degradations to generate naturalistic degradation images of varying complexity, which are used as input. |
Pranjay Shyam; Kyung-Soo Kim; Kuk-Jin Yoon; |
982 | Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present an end-to-end instance segmentation method that regresses a polygonal boundary for each object instance. |
Justin Lazarow; Weijian Xu; Zhuowen Tu; |
983 | FaceVerse: A Fine-Grained and Detail-Controllable 3D Face Morphable Model From A Hybrid Dataset Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present FaceVerse, a fine-grained 3D Neural Face Model, which is built from hybrid East Asian face datasets containing 60K fused RGB-D images and 2K high-fidelity 3D head scan models. |
Lizhen Wang; Zhiyuan Chen; Tao Yu; Chenguang Ma; Liang Li; Yebin Liu; |
984 | Bring Evanescent Representations to Life in Lifelong Class Incremental Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, if not updated, those representations become increasingly outdated as the incremental learning progresses with new classes. To address the aforementioned problems, we propose a framework which aims to (i) model the semantic drift by learning the relationship between representations of past and novel classes among incremental steps, and (ii) estimate the feature drift, defined as the evolution of the representations learned by models at each incremental step. |
Marco Toldo; Mete Ozay; |
985 | Single-Stage 3D Geometry-Preserving Depth Estimation Model Training on Dataset Mixtures With Uncalibrated Stereo Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: On the contrary, we propose GP2, a General-Purpose and Geometry-Preserving training scheme, and show that conventional SVDE models can learn correct shifts themselves without any post-processing, benefiting from using stereo data even in the geometry-preserving setting. |
Nikolay Patakin; Anna Vorontsova; Mikhail Artemyev; Anton Konushin; |
986 | LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To apply gesture recognition to long-distance interactive scenes such as meetings and smart homes, a large RGB-D video dataset LD-ConGR is established in this paper. |
Dan Liu; Libo Zhang; Yanjun Wu; |
987 | SimVQA: Exploring Simulated Environments for Visual Question Answering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work we explore using synthetic computer-generated data to fully control the visual and language space, allowing us to provide more diverse scenarios. |
Paola Cascante-Bonilla; Hui Wu; Letao Wang; Rogerio S. Feris; Vicente Ordonez; |
988 | Thin-Plate Spline Motion Model for Image Animation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, it remains a significant challenge for current unsupervised methods when there is a large pose gap between the objects in the source and driving images. In this paper, a new end-to-end unsupervised motion transfer framework is proposed to overcome this issue. |
Jian Zhao; Hui Zhang; |
989 | Learning Local Displacements for Point Cloud Completion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel approach aimed at object and semantic scene completion from a partial scan represented as a 3D point cloud. |
Yida Wang; David Joseph Tan; Nassir Navab; Federico Tombari; |
990 | Human Hands As Probes for Interactive Object Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Interactive object understanding, or what we can do to objects and how, is a long-standing goal of computer vision. In this paper, we tackle this problem through observation of human hands in in-the-wild egocentric videos. |
Mohit Goyal; Sahil Modi; Rishabh Goyal; Saurabh Gupta; |
991 | Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We develop a theoretical framework for adversarial training with FW optimization (FW-AT) that reveals a geometric connection between the loss landscape and the distortion of l-inf FW attacks (the attack’s l-2 norm). |
Theodoros Tsiligkaridis; Jay Roberts; |
992 | Certified Patch Robustness Via Smoothed Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We demonstrate how using vision transformers enables significantly better certified patch robustness that is also more computationally efficient and does not incur a substantial drop in standard accuracy. |
Hadi Salman; Saachi Jain; Eric Wong; Aleksander Madry; |
993 | Look Back and Forth: Video Super-Resolution With Explicit Temporal Difference Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to explore the role of explicit temporal difference modeling in both LR and HR space. |
Takashi Isobe; Xu Jia; Xin Tao; Changlin Li; Ruihuang Li; Yongjie Shi; Jing Mu; Huchuan Lu; Yu-Wing Tai; |
994 | UCC: Uncertainty Guided Cross-Head Co-Training for Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel learning framework called Uncertainty guided Cross-head Co-training (UCC) for semi-supervised semantic segmentation. |
Jiashuo Fan; Bin Gao; Huan Jin; Lihui Jiang; |
995 | HVH: Learning A Hybrid Neural Volumetric Representation for Dynamic Hair Performance Capture Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we address the aforementioned problems: 1) we use a novel, volumetric hair representation that is composed of thousands of primitives. |
Ziyan Wang; Giljoo Nam; Tuur Stuyck; Stephen Lombardi; Michael Zollhöfer; Jessica Hodgins; Christoph Lassner; |
996 | RADU: Ray-Aligned Depth Update Convolutions for ToF Data Denoising Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an iterative correction approach operating in 3D space that is designed to learn on 2.5D data by enabling 3D point convolutions to correct the points’ positions along the view direction. |
Michael Schelling; Pedro Hermosilla; Timo Ropinski; |
997 | Rethinking Visual Geo-Localization for Large-Scale Applications Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We find that current methods fail to scale to such large datasets, therefore we design a new highly scalable training technique, called CosPlace, which casts the training as a classification problem avoiding the expensive mining needed by the commonly used contrastive learning. |
Gabriele Berton; Carlo Masone; Barbara Caputo; |
998 | Learning Based Multi-Modality Image and Video Compression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work proposes a multi-modality compression framework for infrared and visible image pairs by exploiting the cross-modality redundancy. |
Guo Lu; Tianxiong Zhong; Jing Geng; Qiang Hu; Dong Xu; |
999 | A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we argue for intervening at train time itself, so as to directly produce calibrated DNN models. |
Ramya Hebbalaguppe; Jatin Prakash; Neelabh Madan; Chetan Arora; |
1000 | The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In view of them, we advocate a principle of diversity for training ViTs, by presenting corresponding regularizers that encourage representation diversity and coverage at each of those levels, thereby enabling the model to capture more discriminative information. |
Tianlong Chen; Zhenyu Zhang; Yu Cheng; Ahmed Awadallah; Zhangyang Wang; |
1001 | Deep Image-Based Illumination Harmonization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we formulate seamless illumination harmonization as an illumination exchange and aggregation problem. |
Zhongyun Bao; Chengjiang Long; Gang Fu; Daquan Liu; Yuanzhen Li; Jiaming Wu; Chunxia Xiao; |
1002 | ViM: Out-of-Distribution With Virtual-Logit Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: There are OOD samples that are easy to identify in the feature space while hard to distinguish in the logit space and vice versa. Motivated by this observation, we propose a novel OOD scoring method named Virtual-logit Matching (ViM), which combines the class-agnostic score from feature space and the In-Distribution (ID) class-dependent logits. |
Haoqi Wang; Zhizhong Li; Litong Feng; Wayne Zhang; |
1003 | Active Learning By Feature Mixing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel method for batch AL called ALFA-Mix. |
Amin Parvaneh; Ehsan Abbasnejad; Damien Teney; Gholamreza (Reza) Haffari; Anton van den Hengel; Javen Qinfeng Shi; |
1004 | Towards Accurate Facial Landmark Detection Via Cascaded Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, an accurate facial landmark detector is proposed based on cascaded transformers. |
Hui Li; Zidong Guo; Seon-Min Rhee; Seungju Han; Jae-Joon Han; |
1005 | Class-Aware Contrastive Semi-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Moreover, the model’s judgment becomes noisier in real-world applications with extensive out-of-distribution data. To address this issue, we propose a general method named Class-aware Contrastive Semi-Supervised Learning (CCSSL), which is a drop-in helper to improve the pseudo-label quality and enhance the model’s robustness in the real-world setting. |
Fan Yang; Kai Wu; Shuyi Zhang; Guannan Jiang; Yong Liu; Feng Zheng; Wei Zhang; Chengjie Wang; Long Zeng; |
1006 | Long-Term Visual Map Sparsification With Heterogeneous GNN Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we aim to overcome the environmental changes and reduce the map size at the same time by selecting points that are valuable to future localization. |
Ming-Fang Chang; Yipu Zhao; Rajvi Shah; Jakob J. Engel; Michael Kaess; Simon Lucey; |
1007 | Debiased Learning From Naturally Imbalanced Pseudo-Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To eliminate the model bias, we propose a simple yet effective method, DebiasMatch, consisting of an adaptive debiasing module and an adaptive marginal loss. |
Xudong Wang; Zhirong Wu; Long Lian; Stella X. Yu; |
1008 | RNNPose: Recurrent 6-DoF Object Pose Refinement With Robust Correspondence Field Estimation and Pose Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a framework based on a recurrent neural network (RNN) for object pose refinement, which is robust to erroneous initial poses and occlusions. |
Yan Xu; Kwan-Yee Lin; Guofeng Zhang; Xiaogang Wang; Hongsheng Li; |
1009 | Ditto: Building Digital Twins of Articulated Objects From Interaction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Ditto to learn articulation model estimation and 3D geometry reconstruction of an articulated object through interactive perception. |
Zhenyu Jiang; Cheng-Chun Hsu; Yuke Zhu; |
1010 | Dual-AI: Dual-Path Actor Interaction Learning for Group Activity Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Hence, it is often difficult to model complex group activities from a single view of spatial-temporal actor evolution. To tackle this problem, we propose a distinct Dual-path Actor Interaction (Dual-AI) framework, which flexibly arranges spatial and temporal transformers in two complementary orders, enhancing actor relations by integrating merits from different spatio-temporal paths. |
Mingfei Han; David Junhao Zhang; Yali Wang; Rui Yan; Lina Yao; Xiaojun Chang; Yu Qiao; |
1011 | Harmony: A Generic Unsupervised Approach for Disentangling Semantic Content From Parameterized Transformations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Disentangling images into semantic content factors and transformations can provide significant benefits into many domain-specific image analysis tasks. To this end, we propose a generic unsupervised framework, Harmony, that simultaneously and explicitly disentangles semantic content from multiple parameterized transformations. |
Mostofa Rafid Uddin; Gregory Howe; Xiangrui Zeng; Min Xu; |
1012 | Talking Face Generation With Multilingual TTS Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we introduce a talking face generation system that generalizes to different languages. |
Hyoung-Kyu Song; Sang Hoon Woo; Junhyeok Lee; Seungmin Yang; Hyunjae Cho; Youseong Lee; Dongho Choi; Kang-wook Kim; |
1013 | A Brand New Dance Partner: Music-Conditioned Pluralistic Dancing Controlled By Multiple Dance Genres Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Yet, a serious limitation remains that all existing algorithms can return repeated patterns for a given initial pose sequence, which may be inferior. To mitigate this issue, we propose MNET, a novel and scalable approach that can perform music-conditioned pluralistic dance generation synthesized by multiple dance genres using only a single model. |
Jinwoo Kim; Heeseok Oh; Seongjean Kim; Hoseok Tong; Sanghoon Lee; |
1014 | Kernelized Few-Shot Object Detection With Efficient Integral Aggregation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We design a Kernelized Few-shot Object Detector by leveraging kernelized matrices computed over multiple proposal regions, which yield expressive non-linear representations whose model complexity is learned on the fly. |
Shan Zhang; Lei Wang; Naila Murray; Piotr Koniusz; |
1015 | Transformer Based Line Segment Classifier With Image Context for Real-Time Vanishing Point Detection in Manhattan World Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We find that image context can also contribute to accurate line classification. Based on this observation, in this work we propose to classify line segments into three groups according to three unknown-but-sought vanishing points under the Manhattan world assumption, using both geometric information and image context. |
Xin Tong; Xianghua Ying; Yongjie Shi; Ruibin Wang; Jinfa Yang; |
1016 | Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: It is a challenging task since representation optimization and feature retention can only be achieved under supervision from new classes. To address this problem, we propose a novel self-sustaining representation expansion scheme. |
Kai Zhu; Wei Zhai; Yang Cao; Jiebo Luo; Zheng-Jun Zha; |
1017 | Adaptive Early-Learning Correction for Segmentation From Noisy Annotations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we study the learning dynamics of deep segmentation networks trained on inaccurately-annotated data. |
Sheng Liu; Kangning Liu; Weicheng Zhu; Yiqiu Shen; Carlos Fernandez-Granda; |
1018 | Cross-Domain Correlation Distillation for Unsupervised Domain Adaptation in Nighttime Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing works, e.g., using the twilight as the intermediate target domain to perform the adaptation from daytime to nighttime, may fail to cope with the inherent difference between datasets caused by the camera equipment and the urban style. Faced with these two types of domain shifts, i.e., the illumination and the inherent difference of the datasets, we propose a novel domain adaptation framework via cross-domain correlation distillation, called CCDistill. |
Huan Gao; Jichang Guo; Guoli Wang; Qian Zhang; |
1019 | Context-Aware Video Reconstruction for Rolling Shutter Cameras Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, these methods generate intermediate GS frames through image warping based on the RS model, which inevitably result in black holes and noticeable motion artifacts. In this paper, we alleviate these issues by proposing a context-aware GS video reconstruction architecture. |
Bin Fan; Yuchao Dai; Zhiyuan Zhang; Qi Liu; Mingyi He; |
1020 | Towards Efficient Data Free Black-Box Adversarial Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, by rethinking the collaborative relationship between the generator and the substitute model, we design a novel black-box attack framework. |
Jie Zhang; Bo Li; Jianghe Xu; Shuang Wu; Shouhong Ding; Lei Zhang; Chao Wu; |
1021 | Robust Contrastive Learning Against Noisy Views Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a new contrastive loss function that is robust against noisy views. |
Ching-Yao Chuang; R Devon Hjelm; Xin Wang; Vibhav Vineet; Neel Joshi; Antonio Torralba; Stefanie Jegelka; Yale Song; |
1022 | More Than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we present VDTTS, a Visually-Driven Text-to-Speech model. |
Michael Hassid; Michelle Tadmor Ramanovich; Brendan Shillingford; Miaosen Wang; Ye Jia; Tal Remez; |
1023 | Cross-Modal Perceptionist: Can Face Geometry Be Gleaned From Voices? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose our analysis framework, Cross-Modal Perceptionist, under both supervised and unsupervised learning. |
Cho-Ying Wu; Chin-Cheng Hsu; Ulrich Neumann; |
1024 | On Generalizing Beyond Domains in Cross-Domain Continual Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Many recent methods focus on preventing catastrophic forgetting under a typical assumption of the train and test data following a similar distribution. In this work, we consider the more realistic scenario of continual learning under domain shifts where the model is able to generalize its inference to an unseen domain. |
Christian Simon; Masoud Faraki; Yi-Hsuan Tsai; Xiang Yu; Samuel Schulter; Yumin Suh; Mehrtash Harandi; Manmohan Chandraker; |
1025 | RSTT: Real-Time Spatial Temporal Transformer for Space-Time Video Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The existing methods based on Convolutional Neural Networks (CNNs) succeed in achieving visually satisfying results but suffer from slow inference speed due to their heavy architectures. We propose to resolve this issue by using a spatial-temporal transformer that naturally incorporates the spatial and temporal super resolution modules into a single model. |
Zhicheng Geng; Luming Liang; Tianyu Ding; Ilya Zharkov; |
1026 | Learning Memory-Augmented Unidirectional Metrics for Cross-Modality Person Re-Identification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In response, we propose a Memory-Augmented Unidirectional Metric (MAUM) learning method consisting of two novel designs, i.e., unidirectional metrics, and memory-based augmentation. |
Jialun Liu; Yifan Sun; Feng Zhu; Hongbin Pei; Yi Yang; Wenhui Li; |
1027 | A Closer Look at Few-Shot Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: As our first contribution, we propose a framework to analyze existing methods during the adaptation. |
Yunqing Zhao; Henghui Ding; Houjing Huang; Ngai-Man Cheung; |
1028 | Depth-Supervised NeRF: Fewer Views and Faster Training for Free Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: One potential reason is that standard volumetric rendering does not enforce the constraint that most of a scene’s geometry consist of empty space and opaque surfaces. We formalize the above assumption through DS-NeRF (Depth-supervised Neural Radiance Fields), a loss for learning radiance fields that takes advantage of readily-available depth supervision. |
Kangle Deng; Andrew Liu; Jun-Yan Zhu; Deva Ramanan; |
1029 | Unsupervised Domain Generalization By Learning A Bridge Across Domains Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, different from most cross-domain works that utilize some (or full) source domain supervision, we approach a relatively new and very practical Unsupervised Domain Generalization (UDG) setup of having no training supervision in neither source nor target domains. |
Sivan Harary; Eli Schwartz; Assaf Arbelle; Peter Staar; Shady Abu-Hussein; Elad Amrani; Roei Herzig; Amit Alfassy; Raja Giryes; Hilde Kuehne; Dina Katabi; Kate Saenko; Rogerio S. Feris; Leonid Karlinsky; |
1030 | Partial Class Activation Attention for Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Beyond the previous CAM generated from image-level classification, we present Partial CAM, which subdivides the task into region-level prediction and achieves better localization performance. |
Sun-Ao Liu; Hongtao Xie; Hai Xu; Yongdong Zhang; Qi Tian; |
1031 | Multi-Scale Memory-Based Video Deblurring Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To achieve fine-grained deblurring, we designed a memory branch to memorize the blurry-sharp feature pairs in the memory bank, thus providing useful information for the blurry query input. |
Bo Ji; Angela Yao; |
1032 | SkinningNet: Two-Stream Graph Convolutional Neural Network for Skinning Prediction of Synthetic Characters Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work presents SkinningNet, an end-to-end Two-Stream Graph Neural Network architecture that computes skinning weights from an input mesh and its associated skeleton, without making any assumptions on shape class and structure of the provided mesh. |
Albert Mosella-Montoro; Javier Ruiz-Hidalgo; |
1033 | A Scalable Combinatorial Solver for Elastic Geometrically Consistent 3D Shape Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a scalable combinatorial algorithm for globally optimizing over the space of geometrically consistent mappings between 3D shapes. |
Paul Roetzer; Paul Swoboda; Daniel Cremers; Florian Bernard; |
1034 | Learning Trajectory-Aware Transformer for Video Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel Trajectory-aware Transformer for Video Super-Resolution (TTVSR). |
Chengxu Liu; Huan Yang; Jianlong Fu; Xueming Qian; |
1035 | Differentiable Dynamics for Articulated 3D Human Motion Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce DiffPhy, a differentiable physics-based model for articulated 3d human motion reconstruction from video. |
Erik Gärtner; Mykhaylo Andriluka; Erwin Coumans; Cristian Sminchisescu; |
1036 | Geometric Structure Preserving Warp for Natural Image Stitching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most of the existing methods ignore the large-scale layouts reflected by straight lines or curves, decreasing overall stitching quality. To address this issue, this work presents a structure-preserving stitching approach that produces images with natural visual effects and less distortion. |
Peng Du; Jifeng Ning; Jiguang Cui; Shaoli Huang; Xinchao Wang; Jiaxin Wang; |
1037 | GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: For the first time, we address the problem of generating full-body, hand and head motions of an avatar grasping an unknown object. |
Omid Taheri; Vasileios Choutas; Michael J. Black; Dimitrios Tzionas; |
1038 | Multi-Robot Active Mapping Via Neural Bipartite Graph Matching Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel algorithm, namely NeuralCoMapping, which takes advantage of both approaches. |
Kai Ye; Siyan Dong; Qingnan Fan; He Wang; Li Yi; Fei Xia; Jue Wang; Baoquan Chen; |
1039 | Adversarial Texture for Fooling Person Detectors in The Physical World Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a generative method, named Toroidal-Cropping-based Expandable Generative Attack (TC-EGA), to craft AdvTexture with repetitive structures. |
Zhanhao Hu; Siyuan Huang; Xiaopei Zhu; Fuchun Sun; Bo Zhang; Xiaolin Hu; |
1040 | Focal Length and Object Pose Estimation Via Render and Compare Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce FocalPose, a neural render-and-compare method for jointly estimating the camera-object 6D pose and camera focal length given a single RGB input image depicting a known object. |
Georgy Ponimatkin; Yann Labbé; Bryan Russell; Mathieu Aubry; Josef Sivic; |
1041 | TO-FLOW: Efficient Continuous Normalizing Flows With Temporal Optimization Adjoint With Moving Speed Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, a temporal optimization is proposed by optimizing the evolutionary time for forward propagation of the neural ODE training. |
Shian Du; Yihong Luo; Wei Chen; Jian Xu; Delu Zeng; |
1042 | Arbitrary-Scale Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose the design of scale-consistent positional encodings invariant to our generator’s layers transformations. |
Evangelos Ntavelis; Mohamad Shahbazi; Iason Kastanis; Radu Timofte; Martin Danelljan; Luc Van Gool; |
1043 | Cross-Modal Representation Learning for Zero-Shot Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR). |
Chung-Ching Lin; Kevin Lin; Lijuan Wang; Zicheng Liu; Linjie Li; |
1044 | Conditional Prompt Learning for Vision-Language Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). |
Kaiyang Zhou; Jingkang Yang; Chen Change Loy; Ziwei Liu; |
1045 | Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To do so, in this paper, we propose an efficient mini-batch sampling method, called graph sampling (GS), for large-scale deep metric learning. |
Shengcai Liao; Ling Shao; |
1046 | Retrieval-Based Spatially Adaptive Normalization for Semantic Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel normalization module, termed as REtrieval-based Spatially AdaptIve normaLization (RESAIL), for introducing pixel level fine-grained guidance to the normalization architecture. |
Yupeng Shi; Xiao Liu; Yuxiang Wei; Zhongqin Wu; Wangmeng Zuo; |
1047 | Undoing The Damage of Label Shift for Cross-Domain Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we give an in-depth analysis and show that the damage of label shift can be overcome by aligning the data conditional distribution and correcting the posterior probability. |
Yahao Liu; Jinhong Deng; Jiale Tao; Tong Chu; Lixin Duan; Wen Li; |
1048 | GPV-Pose: Category-Level Object Pose Estimation Via Geometry-Guided Point-Wise Voting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This is, however, a much more challenging task due to severe intra-class shape variations. To address this issue, we propose GPV-Pose, a novel framework for robust category-level pose estimation, harnessing geometric insights to enhance the learning of category-level pose-sensitive features. |
Yan Di; Ruida Zhang; Zhiqiang Lou; Fabian Manhardt; Xiangyang Ji; Nassir Navab; Federico Tombari; |
1049 | Dynamic 3D Gaze From Afar: Deep Gaze Estimation From Temporal Eye-Head-Body Coordination Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a novel method and dataset for 3D gaze estimation of a freely moving person from a distance, typically in surveillance views. |
Soma Nonaka; Shohei Nobuhara; Ko Nishino; |
1050 | Expressive Talking Head Generation With Granular Audio-Visual Control Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the Granularly Controlled Audio-Visual Talking Heads (GC-AVT), which controls lip movements, head poses, and facial expressions of a talking head in a granular manner. |
Borong Liang; Yan Pan; Zhizhi Guo; Hang Zhou; Zhibin Hong; Xiaoguang Han; Junyu Han; Jingtuo Liu; Errui Ding; Jingdong Wang; |
1051 | Trustworthy Long-Tailed Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address these issues, we propose a Trustworthy Long-tailed Classification (TLC) method to jointly conduct classification and uncertainty estimation to identify hard samples in a multi-expert framework. |
Bolian Li; Zongbo Han; Haining Li; Huazhu Fu; Changqing Zhang; |
1052 | Primitive3D: 3D Object Dataset Synthesis From Randomly Assembled Primitives Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, such a dataset is prohibitively expensive in 3D computer vision due to the substantial collection cost. To alleviate this issue, we propose a cost-effective method for automatically generating a large amount of 3D objects with annotations. |
Xinke Li; Henghui Ding; Zekun Tong; Yuwei Wu; Yeow Meng Chee; |
1053 | Mix and Localize: Localizing Sound Sources in Mixtures Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a method for simultaneously localizing multiple sound sources within a visual scene. |
Xixi Hu; Ziyang Chen; Andrew Owens; |
1054 | FisherMatch: Semi-Supervised Rotation Regression Via Entropy-Based Filtering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by the popular semi-supervised approach, FixMatch, we propose to leverage pseudo label filtering to facilitate the information flow from labeled data to unlabeled data in a teacher-student mutual learning framework. |
Yingda Yin; Yingcheng Cai; He Wang; Baoquan Chen; |
1055 | NPBG++: Accelerating Neural Point-Based Graphics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a new system (NPBG++) for the novel view synthesis (NVS) task that achieves high rendering realism with low scene fitting time. |
Ruslan Rakhimov; Andrei-Timotei Ardelean; Victor Lempitsky; Evgeny Burnaev; |
1056 | SphericGAN: Semi-Supervised Hyper-Spherical Generative Adversarial Networks for Fine-Grained Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To reduce the dependence of generative models on labeled data, we propose a semi-supervised hyper-spherical GAN for class-conditional fine-grained image generation, and our model is referred to as SphericGAN. |
Tianyi Chen; Yunfei Zhang; Xiaoyang Huo; Si Wu; Yong Xu; Hau San Wong; |
1057 | HairMapper: Removing Hair From Portraits Using GANs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Removing hair from portrait images is challenging due to the complex occlusions between hair and face, as well as the lack of paired portrait data with/without hair. To this end, we present a dataset and a baseline method for removing hair from portrait images using generative adversarial networks (GANs). |
Yiqian Wu; Yong-Liang Yang; Xiaogang Jin; |
1058 | Affine Medical Image Registration With Coarse-To-Fine Vision Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a fast and robust learning-based algorithm, Coarse-to-Fine Vision Transformer (C2FViT), for 3D affine medical image registration. |
Tony C. W. Mok; Albert C. S. Chung; |
1059 | SMPL-A: Modeling Person-Specific Deformable Anatomy Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To the best of our knowledge, this work is the first to present a learning-based approach to estimate the patient’s internal organ deformation for arbitrary human poses in order to assist with radiotherapy and similar medical protocols. |
Hengtao Guo; Benjamin Planche; Meng Zheng; Srikrishna Karanam; Terrence Chen; Ziyan Wu; |
1060 | Image Dehazing Transformer With Transmission-Aware 3D Position Embedding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The key insight of this study is to investigate how to combine CNN and Transformer for image dehazing. |
Chun-Le Guo; Qixin Yan; Saeed Anwar; Runmin Cong; Wenqi Ren; Chongyi Li; |
1061 | Out-of-Distribution Generalization With Causal Invariant Transformations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To leverage the generally unknown causal mechanism, existing works assume a linear form of causal feature or require sufficiently many and diverse training domains, which are usually restrictive in practice. In this work, we obviate these assumptions and tackle the OOD problem without explicitly recovering the causal feature. |
Ruoyu Wang; Mingyang Yi; Zhitang Chen; Shengyu Zhu; |
1062 | Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic Segmentation Via Clustering Pseudo Heatmap Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a fast and high-performance LiDAR-based framework, referred to as Panoptic-PHNet, with three attractive aspects: 1) We introduce a clustering pseudo heatmap as a new paradigm, which, followed by a center grouping module, yields instance centers for efficient clustering without object-level learning tasks. |
Jinke Li; Xiao He; Yang Wen; Yuan Gao; Xiaoqiang Cheng; Dan Zhang; |
1063 | Dual-Key Multimodal Backdoors for Visual Question Answering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we show that multimodal networks are vulnerable to a novel type of attack that we refer to as Dual-Key Multimodal Backdoors. |
Matthew Walmer; Karan Sikka; Indranil Sur; Abhinav Shrivastava; Susmit Jha; |
1064 | A Differentiable Two-Stage Alignment Scheme for Burst Image Reconstruction With Large Shift Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In addition, the higher resolution (e.g., 4K) of modern imaging devices results in larger displacement between frames. To address these challenges, we design a differentiable two-stage alignment scheme sequentially in patch and pixel level for effective JDD-B. |
Shi Guo; Xi Yang; Jianqi Ma; Gaofeng Ren; Lei Zhang; |
1065 | Unifying Panoptic Segmentation for Autonomous Driving Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper aims to improve panoptic segmentation for real-world applications in three ways. |
Oliver Zendel; Matthias Schörghuber; Bernhard Rainer; Markus Murschitz; Csaba Beleznai; |
1066 | Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans From A Single Camera Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a compact motion representation by enforcing equivariance—a representation is expected to be transformed in the way that the pose is transformed. |
Jae Shin Yoon; Duygu Ceylan; Tuanfeng Y. Wang; Jingwan Lu; Jimei Yang; Zhixin Shu; Hyun Soo Park; |
1067 | On The Road to Online Adaptation for Semantic Image Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new problem formulation and a corresponding evaluation framework to advance research on unsupervised domain adaptation for semantic image segmentation. |
Riccardo Volpi; Pau De Jorge; Diane Larlus; Gabriela Csurka; |
1068 | Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a deformable prototypical part network (Deformable ProtoPNet), an interpretable image classifier that integrates the power of deep learning and the interpretability of case-based reasoning. |
Jon Donnelly; Alina Jade Barnett; Chaofan Chen; |
1069 | Context-Aware Sequence Alignment Using 4D Skeletal Augmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, based on off-the-shelf human pose estimators, we propose a novel context-aware self-supervised learning architecture to align sequences of actions. |
Taein Kwon; Bugra Tekin; Siyu Tang; Marc Pollefeys; |
1070 | Perturbed and Strict Mean Teachers for Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we address the prediction accuracy problem of consistency learning methods with novel extensions of the mean-teacher (MT) model, which include a new auxiliary teacher, and the replacement of MT’s mean square error (MSE) by a stricter confidence-weighted cross-entropy (Conf-CE) loss. |
Yuyuan Liu; Yu Tian; Yuanhong Chen; Fengbei Liu; Vasileios Belagiannis; Gustavo Carneiro; |
1071 | Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While the majority of FSL models focus on image classification, the extension to action recognition is rather challenging due to the additional temporal dimension in videos. To address this issue, we propose an end-to-end Motion-modulated Temporal Fragment Alignment Network (MTFAN) by jointly exploring the task-specific motion modulation and the multi-level temporal fragment alignment for Few-Shot Action Recognition (FSAR). |
Jiamin Wu; Tianzhu Zhang; Zhe Zhang; Feng Wu; Yongdong Zhang; |
1072 | Focal Sparse Convolutional Networks for 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce two new modules to enhance the capability of Sparse CNNs, both are based on making feature sparsity learnable with position-wise importance prediction. |
Yukang Chen; Yanwei Li; Xiangyu Zhang; Jian Sun; Jiaya Jia; |
1073 | Masked Autoencoders Are Scalable Vision Learners Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. |
Kaiming He; Xinlei Chen; Saining Xie; Yanghao Li; Piotr Dollár; Ross Girshick; |
1074 | Point-BERT: Pre-Training 3D Point Cloud Transformers With Masked Point Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Point-BERT, a novel paradigm for learning Transformers to generalize the concept of BERT onto 3D point cloud. |
Xumin Yu; Lulu Tang; Yongming Rao; Tiejun Huang; Jie Zhou; Jiwen Lu; |
1075 | Nested Collaborative Learning for Long-Tailed Visual Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The networks trained on the long-tailed dataset vary remarkably, despite the same training settings, which shows the great uncertainty in long-tailed learning. To alleviate the uncertainty, we propose a Nested Collaborative Learning (NCL), which tackles the problem by collaboratively learning multiple experts together. |
Jun Li; Zichang Tan; Jun Wan; Zhen Lei; Guodong Guo; |
1076 | Crowd Counting in The Frequency Domain Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: By transforming the density map into the frequency domain and using the nice properties of the characteristic function, we propose a novel method that is simple, effective, and efficient. |
Weibo Shu; Jia Wan; Kay Chen Tan; Sam Kwong; Antoni B. Chan; |
1077 | Restormer: Efficient Transformer for High-Resolution Image Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose an efficient Transformer model by making several key designs in the building blocks (multi-head attention and feed-forward network) such that it can capture long-range pixel interactions, while still remaining applicable to large images. |
Syed Waqas Zamir; Aditya Arora; Salman Khan; Munawar Hayat; Fahad Shahbaz Khan; Ming-Hsuan Yang; |
1078 | STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Spatiotemporal Residual Predictive Model (STRPM) for high-resolution video prediction. |
Zheng Chang; Xinfeng Zhang; Shanshe Wang; Siwei Ma; Wen Gao; |
1079 | Learning From Untrimmed Videos: Self-Supervised Video Representation Learning With Hierarchical Consistency Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we aim to learn representations by leveraging more abundant information in untrimmed videos. |
Zhiwu Qing; Shiwei Zhang; Ziyuan Huang; Yi Xu; Xiang Wang; Mingqian Tang; Changxin Gao; Rong Jin; Nong Sang; |
1080 | Aladdin: Joint Atlas Building and Diffeomorphic Registration Learning With Pairwise Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work explores using a convolutional neural network (CNN) to jointly predict the atlas and a stationary velocity field (SVF) parameterization for diffeomorphic image registration with respect to the atlas. |
Zhipeng Ding; Marc Niethammer; |
1081 | IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we devise an efficient encoder-decoder based network, termed IFRNet, for fast intermediate frame synthesizing. |
Lingtong Kong; Boyuan Jiang; Donghao Luo; Wenqing Chu; Xiaoming Huang; Ying Tai; Chengjie Wang; Jie Yang; |
1082 | Large Loss Matters in Weakly Supervised Multi-Label Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: That is, the model first learns the representation of clean labels, and then starts memorizing noisy labels. Based on this finding, we propose novel methods for WSML which reject or correct the large loss samples to prevent model from memorizing the noisy label. |
Youngwook Kim; Jae Myung Kim; Zeynep Akata; Jungwoo Lee; |
1083 | Toward Practical Monocular Indoor Depth Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To obtain more robustness, we propose a structure distillation approach to learn knacks from an off-the-shelf relative depth estimator that produces structured but metric-agnostic depth. |
Cho-Ying Wu; Jialiang Wang; Michael Hall; Ulrich Neumann; Shuochen Su; |
1084 | Attention Concatenation Volume for Accurate and Efficient Stereo Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel cost volume construction method which generates attention weights from correlation clues to suppress redundant information and enhance matching-related information in the concatenation volume. |
Gangwei Xu; Junda Cheng; Peng Guo; Xin Yang; |
1085 | Learning Distinctive Margin Toward Active Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a concise but effective ADA method called Select-by-Distinctive-Margin (SDM), which consists of a maximum margin loss and a margin sampling algorithm for data selection. |
Ming Xie; Yuxi Li; Yabiao Wang; Zekun Luo; Zhenye Gan; Zhongyi Sun; Mingmin Chi; Chengjie Wang; Pei Wang; |
1086 | Zero-Query Transfer Attacks on Context-Aware Object Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check of black-box object detectors operating on complex, natural scenes. |
Zikui Cai; Shantanu Rane; Alejandro E. Brito; Chengyu Song; Srikanth V. Krishnamurthy; Amit K. Roy-Chowdhury; M. Salman Asif; |
1087 | Neural Inertial Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes the inertial localization problem, the task of estimating the absolute location from a sequence of inertial sensor measurements. |
Sachini Herath; David Caruso; Chen Liu; Yufan Chen; Yasutaka Furukawa; |
1088 | Speed Up Object Detection on Gigapixel-Level Images With Patch Arrangement Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Different from them, we devise a novel patch arrangement framework for fast object detection on gigapixel-level images. |
Jiahao Fan; Huabin Liu; Wenjie Yang; John See; Aixin Zhang; Weiyao Lin; |
1089 | Finding Fallen Objects Via Asynchronous Audio-Visual Integration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce a setting in which to study multi-modal object localization in 3D virtual environments. |
Chuang Gan; Yi Gu; Siyuan Zhou; Jeremy Schwartz; Seth Alter; James Traer; Dan Gutfreund; Joshua B. Tenenbaum; Josh H. McDermott; Antonio Torralba; |
1090 | Learning SRGB-to-Raw-RGB De-Rendering With Content-Aware Metadata Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper shows how to improve the de-rendering results by jointly learning sampling and reconstruction. |
Seonghyeon Nam; Abhijith Punnappurath; Marcus A. Brubaker; Michael S. Brown; |
1091 | GraftNet: Towards Domain Generalized Stereo Matching With A Broad-Spectrum and Task-Oriented Feature Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to leverage the feature of a model trained on large-scale datasets to deal with the domain shift since it has seen various styles of images. |
Biyang Liu; Huimin Yu; Guodong Qi; |
1092 | Towards Total Recall in Industrial Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The best performing approaches combine embeddings from ImageNet models with an outlier detection model. In this paper, we extend this line of work and propose PatchCore, which uses a maximally representative memory bank of nominal patch-features. |
Karsten Roth; Latha Pemula; Joaquin Zepeda; Bernhard Schölkopf; Thomas Brox; Peter Gehler; |
1093 | DTA: Physical Camouflage Attacks Using Differentiable Transformation Network Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the Differentiable Transformation Attack (DTA), a framework for generating a robust physical adversarial pattern on a target object to camouflage it against object detection models with a wide range of transformations. |
Naufal Suryanto; Yongsu Kim; Hyoeun Kang; Harashta Tatimma Larasati; Youngyeo Yun; Thi-Thu-Huong Le; Hunmin Yang; Se-Yoon Oh; Howon Kim; |
1094 | Neural Recognition of Dashed Curves With Gestalt Law of Continuity Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an innovative Transformer-based framework to recognize dashed curves based on both high-level features and low-level clues. |
Hanyuan Liu; Chengze Li; Xueting Liu; Tien-Tsin Wong; |
1095 | Semi-Supervised Object Detection Via Multi-Instance Alignment With Global Class Prototypes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To make full use of labeled data, we propose a Multi-instance Alignment model which enhances the prediction consistency based on Global Class Prototypes (MA-GCP). |
Aoxue Li; Peng Yuan; Zhenguo Li; |
1096 | HODOR: High-Level Object Descriptors for Object Re-Segmentation in Video Learned From Static Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This requires a large amount of densely annotated video data, which is costly to annotate, and largely redundant since frames within a video are highly correlated. In light of this, we propose HODOR: a novel method that tackles VOS by effectively leveraging annotated static images for understanding object appearance and scene context. |
Ali Athar; Jonathon Luiten; Alexander Hermans; Deva Ramanan; Bastian Leibe; |
1097 | Point Cloud Color Constancy Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present Point Cloud Color Constancy, in short PCCC, an illumination chromaticity estimation algorithm exploiting a point cloud. |
Xiaoyan Xing; Yanlin Qian; Sibo Feng; Yuhan Dong; Jiří Matas; |
1098 | VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning, without requiring any human annotation. |
Wenjia Xu; Yongqin Xian; Jiuniu Wang; Bernt Schiele; Zeynep Akata; |
1099 | Catching Both Gray and Black Swans: Open-Set Supervised Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel approach that learns disentangled representations of abnormalities illustrated by seen anomalies, pseudo anomalies, and latent residual anomalies (i.e., samples that have unusual residuals compared to the normal data in a latent space), with the last two abnormalities designed to detect unseen anomalies. |
Choubo Ding; Guansong Pang; Chunhua Shen; |
1100 | MLSLT: Towards Multilingual Sign Language Translation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, such models are inefficient in building multilingual sign language translation systems. To solve this problem, we introduce the multilingual sign language translation (MSLT) task. |
Aoxiong Yin; Zhou Zhao; Weike Jin; Meng Zhang; Xingshan Zeng; Xiaofei He; |
1101 | Towards An End-to-End Framework for Flow-Guided Video Inpainting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an End-to-End framework for Flow-Guided Video Inpainting through elaborately designed three trainable modules, namely, flow completion, feature propagation, and content hallucination modules. |
Zhen Li; Cheng-Ze Lu; Jianhua Qin; Chun-Le Guo; Ming-Ming Cheng; |
1102 | Contrastive Test-Time Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel way to leverage self-supervised contrastive learning to facilitate target feature learning, along with an online pseudo labeling scheme with refinement that significantly denoises pseudo labels. |
Dian Chen; Dequan Wang; Trevor Darrell; Sayna Ebrahimi; |
1103 | Multimodal Colored Point Cloud to Image Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A major challenge in acquiring such ground truth data is the accurate alignment between RGB images and the point cloud measured by a depth scanner. To overcome this difficulty, we consider a differential optimization method that aligns a colored point cloud with a given color image through iterative geometric and color matching. |
Noam Rotstein; Amit Bracha; Ron Kimmel; |
1104 | MotionAug: Augmentation With Physical Correction for Human Motion Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a motion data augmentation scheme incorporating motion synthesis encouraging diversity and motion correction imposing physical plausibility. |
Takahiro Maeda; Norimichi Ukita; |
1105 | Active Teacher for Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study teacher-student learning from the perspective of data initialization and propose a novel algorithm called Active Teacher for semi-supervised object detection (SSOD). |
Peng Mi; Jianghang Lin; Yiyi Zhou; Yunhang Shen; Gen Luo; Xiaoshuai Sun; Liujuan Cao; Rongrong Fu; Qiang Xu; Rongrong Ji; |
1106 | CrossLoc: Scalable Aerial Localization Assisted By Multimodal Synthetic Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a visual localization system that learns to estimate camera poses in the real world with the help of synthetic data. |
Qi Yan; Jianhao Zheng; Simon Reding; Shanci Li; Iordan Doytchinov; |
1107 | Audio-Adaptive Activity Recognition Across Video Domains Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose an audio-adaptive encoder and associated learning methods that discriminatively adjust the visual feature representation as well as addressing shifts in the semantic distribution. |
Yunhua Zhang; Hazel Doughty; Ling Shao; Cees G. M. Snoek; |
1108 | Collaborative Learning for Hand and Object Reconstruction With Attention-Guided Graph Convolution Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, we transfer hand mesh information to the object branch and vice versa for the hand branch. |
Tze Ho Elden Tse; Kwang In Kim; Aleš Leonardis; Hyung Jin Chang; |
1109 | On Learning Contrastive Representations for Learning With Noisy Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address this issue, we focus on learning robust contrastive representations of data on which it is hard for the classifier to memorize the label noise under the CE loss. We propose a novel contrastive regularization function to learn such representations over noisy data where the label noise does not dominate the representation learning. |
Li Yi; Sheng Liu; Qi She; A. Ian McLeod; Boyu Wang; |
1110 | Unsupervised Deraining: Where Contrastive Learning Meets Self-Similarity Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel non-local contrastive learning (NLCL) method for unsupervised image deraining. |
Yuntong Ye; Changfeng Yu; Yi Chang; Lin Zhu; Xi-Le Zhao; Luxin Yan; Yonghong Tian; |
1111 | Modeling Indirect Illumination for Inverse Rendering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel approach to efficiently recovering spatially-varying indirect illumination. |
Yuanqing Zhang; Jiaming Sun; Xingyi He; Huan Fu; Rongfei Jia; Xiaowei Zhou; |
1112 | BACON: Band-Limited Coordinate Networks for Multiscale Scene Representation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce band-limited coordinate networks (BACON), a network architecture with an analytical Fourier spectrum. |
David B. Lindell; Dave Van Veen; Jeong Joon Park; Gordon Wetzstein; |
1113 | Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In particular, we propose regional semantic contrast and aggregation (RCA). |
Tianfei Zhou; Meijie Zhang; Fang Zhao; Jianwu Li; |
1114 | Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As a result, given a class, its hot CAM pixels may wrongly invade the area belonging to other classes, or the non-hot ones may actually be part of the class. To this end, we introduce an embarrassingly simple yet surprisingly effective method: reactivating the converged CAM with BCE by using a softmax cross-entropy loss (SCE), dubbed ReCAM. |
Zhaozheng Chen; Tan Wang; Xiongwei Wu; Xian-Sheng Hua; Hanwang Zhang; Qianru Sun; |
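For readers unfamiliar with class activation maps, the following sketch shows only the standard CAM computation that entry 1114 starts from; it is a generic illustration and not the authors' ReCAM re-activation procedure, and all tensor shapes and names are assumptions.

```python
import torch
import torch.nn.functional as F

def class_activation_maps(features: torch.Tensor, fc_weight: torch.Tensor) -> torch.Tensor:
    """Vanilla CAM: weight each feature channel by the classifier weight of each class.

    features:  [B, C, H, W] convolutional features before global average pooling (assumed shapes).
    fc_weight: [K, C] weights of the final linear classifier (K classes).
    Returns:   [B, K, H, W] per-class activation maps, min-max normalized per map.
    """
    cams = F.relu(torch.einsum("kc,bchw->bkhw", fc_weight, features))
    flat = cams.flatten(2)
    lo = flat.min(dim=2).values[..., None, None]
    hi = flat.max(dim=2).values[..., None, None]
    return (cams - lo) / (hi - lo + 1e-6)
```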
1115 | TransWeather: Transformer-Based Restoration of Images Degraded By Adverse Weather Conditions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we focus on developing an efficient solution to the problem of removing all types of adverse weather degradation. |
Jeya Maria Jose Valanarasu; Rajeev Yasarla; Vishal M. Patel; |
1116 | Merry Go Round: Rotate A Frame and Fool A DNN Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we make a key novel suggestion to use perturbation in optical flow to carry out adversarial attacks (AAs) on a video analysis system. |
Daksh Thapar; Aditya Nigam; Chetan Arora; |
1117 | H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing methods usually focus on partial detection components for domain alignment. In contrast, this paper considers that all the detection components are important and proposes a Holistic and Hierarchical Feature Alignment (H^2FA) R-CNN. |
Yunqiu Xu; Yifan Sun; Zongxin Yang; Jiaxu Miao; Yi Yang; |
1118 | Modeling SRGB Camera Noise With Normalizing Flows Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new sRGB-domain noise model based on normalizing flows that is capable of learning the complex noise distribution found in sRGB images under various ISO levels. |
Shayan Kousha; Ali Maleky; Michael S. Brown; Marcus A. Brubaker; |
1119 | A ConvNet for The 2020s Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. |
Zhuang Liu; Hanzi Mao; Chao-Yuan Wu; Christoph Feichtenhofer; Trevor Darrell; Saining Xie; |
1120 | Reference-Based Video Super-Resolution Using Multi-Camera Video Triplets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose the first reference-based video super-resolution (RefVSR) approach that utilizes reference videos for high-fidelity results. |
Junyong Lee; Myeonghee Lee; Sunghyun Cho; Seungyong Lee; |
1121 | Self-Supervised Image Representation Learning With Geometric Set Consistency Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a method for self-supervised image representation learning under the guidance of 3D geometric consistency. |
Nenglun Chen; Lei Chu; Hao Pan; Yan Lu; Wenping Wang; |
1122 | Deep Anomaly Discovery From Unlabeled Videos Via Normality Advantage and Self-Paced Refinement Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a full deep neural network (DNN) based solution that can realize highly effective UVAD. |
Guang Yu; Siqi Wang; Zhiping Cai; Xinwang Liu; Chuanfu Xu; Chengkun Wu; |
1123 | P3Depth: Monocular Depth Estimation With A Piecewise Planarity Prior Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on knowledge about the high regularity of real 3D scenes, we propose a method that learns to selectively leverage information from coplanar pixels to improve the predicted depth. |
Vaishakh Patil; Christos Sakaridis; Alexander Liniger; Luc Van Gool; |
1124 | GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we reveal and address the disadvantages of conventional query-driven HOI detectors from two aspects. |
Yue Liao; Aixi Zhang; Miao Lu; Yongliang Wang; Xiaobo Li; Si Liu; |
1125 | Simple Multi-Dataset Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a simple method for training a unified detector on multiple large-scale datasets. |
Xingyi Zhou; Vladlen Koltun; Philipp Krähenbühl; |
1126 | MLP-3D: A MLP-Like 3D Architecture With Grouped Time Mixing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present MLP-3D networks, a novel MLP-like 3D architecture for video recognition. |
Zhaofan Qiu; Ting Yao; Chong-Wah Ngo; Tao Mei; |
1127 | Proactive Image Manipulation Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: By contrast, we propose a proactive scheme for image manipulation detection. |
Vishal Asnani; Xi Yin; Tal Hassner; Sijia Liu; Xiaoming Liu; |
1128 | Sketch3T: Test-Time Training for Zero-Shot SBIR Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Zero-shot sketch-based image retrieval typically asks for a trained model to be applied as is to unseen categories. In this paper, we argue that this setup is by definition incompatible with the inherently abstract and subjective nature of sketches: the model might transfer well to new categories, but it will not understand sketches drawn from a different test-time distribution. |
Aneeshan Sain; Ayan Kumar Bhunia; Vaishnav Potlapalli; Pinaki Nath Chowdhury; Tao Xiang; Yi-Zhe Song; |
1129 | BANMo: Building Animatable 3D Neural Models From Many Casual Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our key insight is to merge three schools of thought: (1) classic deformable shape models that make use of articulated bones and blend skinning, (2) canonical embeddings that establish correspondences between pixels and a canonical 3D model, and (3) volumetric neural radiance fields (NeRFs) that are amenable to gradient-based optimization. |
Gengshan Yang; Minh Vo; Natalia Neverova; Deva Ramanan; Andrea Vedaldi; Hanbyul Joo; |
1130 | StyTr2: Image Style Transfer With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Therefore, traditional neural style transfer methods face biased content representation. To address this critical issue, we take long-range dependencies of input images into account for image style transfer by proposing a transformer-based approach called StyTr^2. |
Yingying Deng; Fan Tang; Weiming Dong; Chongyang Ma; Xingjia Pan; Lei Wang; Changsheng Xu; |
1131 | Towards Discriminative Representation: Multi-View Trajectory Contrastive Learning for Online Multi-Object Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Although this strategy is effective, it fails to fully exploit the information contained in a whole trajectory. To this end, we propose a strategy, namely multi-view trajectory contrastive learning, in which each trajectory is represented as a center vector. |
En Yu; Zhuoling Li; Shoudong Han; |
1132 | Global Matching With Overlapping Attention for Optical Flow Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, inspired by the traditional matching-optimization methods where matching is introduced to handle large displacements before energy-based optimizations, we introduce a simple but effective global matching step before the direct regression and develop a learning-based matching-optimization framework, namely GMFlowNet. |
Shiyu Zhao; Long Zhao; Zhixing Zhang; Enyu Zhou; Dimitris Metaxas; |
1133 | Language As Queries for Referring Video Object Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a simple and unified framework built upon Transformer, termed ReferFormer. |
Jiannan Wu; Yi Jiang; Peize Sun; Zehuan Yuan; Ping Luo; |
1134 | Investigating The Impact of Multi-LiDAR Placement on Object Detection for Autonomous Driving Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While most of the existing works focus on developing new deep learning algorithms or model architectures, we study the problem from the physical design perspective, i.e., how different placements of multiple LiDARs influence learning-based perception. To this end, we introduce an easy-to-compute information-theoretic surrogate metric to quickly and quantitatively evaluate LiDAR placement for 3D detection of different types of objects. |
Hanjiang Hu; Zuxin Liu; Sharad Chitlangia; Akhil Agnihotri; Ding Zhao; |
1135 | MViTv2: Improved Multiscale Vision Transformers for Classification and Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection. |
Yanghao Li; Chao-Yuan Wu; Haoqi Fan; Karttikeya Mangalam; Bo Xiong; Jitendra Malik; Christoph Feichtenhofer; |
1136 | Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Focusing on the relatively underexplored task of audio-visual zero-shot learning, we propose to learn multi-modal representations from audio-visual data using cross-modal attention and exploit textual label embeddings for transferring knowledge from seen classes to unseen classes. |
Otniel-Bogdan Mercea; Lukas Riesch; A. Sophia Koepke; Zeynep Akata; |
1137 | Rethinking Efficient Lane Detection Via Curve Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a novel parametric curve-based method for lane detection in RGB images. |
Zhengyang Feng; Shaohua Guo; Xin Tan; Ke Xu; Min Wang; Lizhuang Ma; |
1138 | GreedyNASv2: Greedier Search With A Greedy Path Filter Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we leverage an explicit path filter to capture the characteristics of paths and directly filter those weak ones, so that the search can be thus implemented on the shrunk space more greedily and efficiently. |
Tao Huang; Shan You; Fei Wang; Chen Qian; Changshui Zhang; Xiaogang Wang; Chang Xu; |
1139 | Self-Supervised Arbitrary-Scale Point Clouds Upsampling Via Implicit Neural Representation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel approach that achieves self-supervised and magnification-flexible point cloud upsampling simultaneously. |
Wenbo Zhao; Xianming Liu; Zhiwei Zhong; Junjun Jiang; Wei Gao; Ge Li; Xiangyang Ji; |
1140 | Co-Advise: Cross Inductive Bias Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unlike previous works, where merely heavy convolution-based teachers are provided, in this paper we delve into the influence of models' inductive biases in knowledge distillation (e.g., convolution and involution). |
Sucheng Ren; Zhengqi Gao; Tianyu Hua; Zihui Xue; Yonglong Tian; Shengfeng He; Hang Zhao; |
1141 | AdaMixer: A Fast-Converging Query-Based Object Detector Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, this paradigm still suffers from slow convergence, limited performance, and design complexity of extra networks between backbone and decoder. In this paper, we find that the key to these issues is the adaptability of decoders for casting queries to varying objects. |
Ziteng Gao; Limin Wang; Bing Han; Sheng Guo; |
1142 | DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In these, there are a limited number of WSI slides (bags), while the resolution of a single WSI is huge, which leads to a large number of patches (instances) cropped from this slide. To address this issue, we propose to virtually enlarge the number of bags by introducing the concept of pseudo-bags, on which a double-tier MIL framework is built to effectively use the intrinsic features. |
Hongrun Zhang; Yanda Meng; Yitian Zhao; Yihong Qiao; Xiaoyun Yang; Sarah E. Coupland; Yalin Zheng; |
1143 | BEVT: BERT Pretraining of Video Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce BEVT which decouples video representation learning into spatial representation learning and temporal dynamics learning. |
Rui Wang; Dongdong Chen; Zuxuan Wu; Yinpeng Chen; Xiyang Dai; Mengchen Liu; Yu-Gang Jiang; Luowei Zhou; Lu Yuan; |
1144 | Deep Generalized Unfolding Networks for Image Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Deep Generalized Unfolding Network (DGUNet) for image restoration. |
Chong Mou; Qian Wang; Jian Zhang; |
1145 | Automatic Relation-Aware Graph Network Proliferation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Automatic Relation-aware Graph Network Proliferation (ARGNP) for efficiently searching GNNs with a relation-guided message passing mechanism. |
Shaofei Cai; Liang Li; Xinzhe Han; Jiebo Luo; Zheng-Jun Zha; Qingming Huang; |
1146 | AIM: An Auto-Augmenter for Images and Meshes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents an auto-augmenter for images and meshes (AIM) that easily incorporates into neural networks at training and inference times. |
Vinit Veerendraveer Singh; Chandra Kambhamettu; |
1147 | VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel single-stage framework for online VIS built based on the grid structured feature representation. |
Su Ho Han; Sukjun Hwang; Seoung Wug Oh; Yeonchool Park; Hyunwoo Kim; Min-Jung Kim; Seon Joo Kim; |
1148 | Deep Unlearning Via Randomized Conditionally Independent Hessians Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We use a variant of a new conditional independence coefficient, L-CODEC, to identify a subset of the model parameters with the most semantic overlap on an individual sample level. |
Ronak Mehta; Sourav Pal; Vikas Singh; Sathya N. Ravi; |
1149 | Patch-Level Representation Learning for Self-Supervised Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Recent self-supervised learning (SSL) methods have shown impressive results in learning visual representations from unlabeled images. This paper aims to improve their performance further by utilizing the architectural advantages of the underlying neural network, as the current state-of-the-art visual pretext tasks for SSL do not enjoy the benefit, i.e., they are architecture-agnostic. |
Sukmin Yun; Hankook Lee; Jaehyung Kim; Jinwoo Shin; |
1150 | Sylph: A Hypernetwork Framework for Incremental Few-Shot Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We study the challenging incremental few-shot object detection (iFSD) setting. |
Li Yin; Juan M. Perez-Rua; Kevin J. Liang; |
1151 | Incremental Learning in Semantic Segmentation From Image Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a novel framework for Weakly Incremental Learning for Semantic Segmentation, that aims at learning to segment new classes from cheap and largely available image-level labels. |
Fabio Cermelli; Dario Fontanel; Antonio Tavera; Marco Ciccone; Barbara Caputo; |
1152 | Playable Environments: Video Manipulation in Space and Time Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Playable Environments – a new representation for interactive video generation and manipulation in space and time. |
Willi Menapace; Stéphane Lathuilière; Aliaksandr Siarohin; Christian Theobalt; Sergey Tulyakov; Vladislav Golyanik; Elisa Ricci; |
1153 | Robust Cross-Modal Representation Learning With Progressive Self-Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The learning objective of the vision-language approach of CLIP does not effectively account for the noisy many-to-many correspondences found in web-harvested image captioning datasets, which contributes to its compute and data inefficiency. To address this challenge, we introduce a novel training framework based on cross-modal contrastive learning that uses progressive self-distillation and soft image-text alignments to more efficiently learn robust representations from noisy data. |
Alex Andonian; Shixing Chen; Raffay Hamid; |
1154 | What To Look at and Where: Semantic and Spatial Refined Transformer for Detecting Human-Object Interactions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel one-stage Transformer-based semantic and spatial refined transformer (SSRT) to solve the Human-Object Interaction detection task, which requires to localize humans and objects, and predicts their interactions. |
A S M Iftekhar; Hao Chen; Kaustav Kundu; Xinyu Li; Joseph Tighe; Davide Modolo; |
1155 | Compressive Single-Photon 3D Cameras Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: One major reason for SPAD’s bandwidth-intensive operation is the tight coupling that exists between depth resolution and histogram resolution. To weaken this coupling, we propose compressive single-photon histograms (CSPH). |
Felipe Gutierrez-Barragan; Atul Ingle; Trevor Seets; Mohit Gupta; Andreas Velten; |
1156 | Stereo Magnification With Multi-Layer Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce a new view synthesis approach based on multiple semitransparent layers with scene-adapted geometry. |
Taras Khakhulin; Denis Korzhenkov; Pavel Solovev; Gleb Sterkin; Andrei-Timotei Ardelean; Victor Lempitsky; |
1157 | CO-SNE: Dimensionality Reduction and Visualization for Hyperbolic Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose CO-SNE, which extends the Euclidean space visualization tool, t-SNE, to hyperbolic space. |
Yunhui Guo; Haoran Guo; Stella X. Yu; |
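As background for entry 1157, hyperbolic embeddings are typically placed in the Poincaré ball, where distances grow sharply near the boundary. The snippet below evaluates only that standard Poincaré distance; it is not the CO-SNE algorithm itself, and the tensor names are illustrative.

```python
import torch

def poincare_distance(u: torch.Tensor, v: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Geodesic distance between points of the unit Poincare ball (norms < 1).

    d(u, v) = arccosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    """
    sq_diff = (u - v).pow(2).sum(dim=-1)
    denom = (1 - u.pow(2).sum(dim=-1)) * (1 - v.pow(2).sum(dim=-1))
    arg = 1 + 2 * sq_diff / denom.clamp_min(eps)
    return torch.acosh(arg.clamp_min(1 + eps))
```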
1158 | Revisiting Skeleton-Based Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose PoseConv3D, a new approach to skeleton-based action recognition. |
Haodong Duan; Yue Zhao; Kai Chen; Dahua Lin; Bo Dai; |
1159 | Rethinking Controllable Variational Autoencoders Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, when it comes to disentangled representation learning, ControlVAE does not delve into the rationale behind it. The goal of this paper is to develop a deeper understanding of ControlVAE in learning disentangled representations, including the choice of a desired KL-divergence (i.e., set point) and its stability during training. |
Huajie Shao; Yifei Yang; Haohong Lin; Longzhong Lin; Yizhuo Chen; Qinmin Yang; Han Zhao; |
1160 | Contextual Instance Decoupling for Robust Multi-Person Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes the Contextual Instance Decoupling (CID), which presents a new pipeline for multi-person pose estimation. |
Dongkai Wang; Shiliang Zhang; |
1161 | LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a mathematically elegant multi-camera multiple object tracking approach based on a spatial-temporal lifted multicut formulation. |
Duy M. H. Nguyen; Roberto Henschel; Bodo Rosenhahn; Daniel Sonntag; Paul Swoboda; |
1162 | Boosting Crowd Counting Via Multifaceted Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As large-scale variations often exist within crowd images, neither the fixed-size convolution kernels of CNNs nor the fixed-size attention of recent vision transformers can handle this kind of variation well. To address this problem, we propose a Multifaceted Attention Network (MAN), which incorporates global attention from a vanilla transformer, learnable local attention, attention regularization, and instance attention into a counting model. |
Hui Lin; Zhiheng Ma; Rongrong Ji; Yaowei Wang; Xiaopeng Hong; |
1163 | Stereo Depth From Events Cameras: Concentrate and Focus on The Future Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To alleviate the event missing or overriding issue, we propose to learn to concentrate on the dense events to produce a compact event representation with high details for depth estimation. |
Yeongwoo Nam; Mohammad Mostafavi; Kuk-Jin Yoon; Jonghyun Choi; |
1164 | A Probabilistic Graphical Model Based on Neural-Symbolic Reasoning for Visual Relationship Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To overcome the aforementioned weaknesses, we integrate symbolic knowledge into deep learning models and propose a bi-level probabilistic graphical reasoning framework called BPGR. |
Dongran Yu; Bo Yang; Qianhao Wei; Anchen Li; Shirui Pan; |
1165 | A Simple Data Mixing Prior for Improving Self-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Data mixing (e.g., Mixup, Cutmix, ResizeMix) is an essential component for advancing recognition models. In this paper, we focus on studying its effectiveness in the self-supervised setting. |
Sucheng Ren; Huiyu Wang; Zhengqi Gao; Shengfeng He; Alan Yuille; Yuyin Zhou; Cihang Xie; |
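Entry 1165 studies data mixing in the self-supervised setting. Purely as a reminder of what the mixing operation itself looks like, here is vanilla Mixup on a batch; it is not the prior proposed in the paper, and the alpha value is an arbitrary placeholder.

```python
import torch

def mixup_batch(images: torch.Tensor, alpha: float = 0.2):
    """Standard Mixup: convexly combine each image with a randomly permuted partner."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()  # mixing weight
    perm = torch.randperm(images.size(0))                         # random partner assignment
    mixed = lam * images + (1.0 - lam) * images[perm]
    return mixed, perm, lam
```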
1166 | Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we investigate an alternative strategy for pre-training, namely Knowledge Distillation as Efficient Pre-training (KDEP), aiming to efficiently transfer the learned feature representation from existing pre-trained models to new student models for future downstream tasks. |
Ruifei He; Shuyang Sun; Jihan Yang; Song Bai; Xiaojuan Qi; |
1167 | LOLNeRF: Learn From One Look Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a method for learning a generative 3D model based on neural radiance fields, trained solely from data with only single views of each object. |
Daniel Rebain; Mark Matthews; Kwang Moo Yi; Dmitry Lagun; Andrea Tagliasacchi; |
1168 | Geometry-Aware Guided Loss for Deep Crack Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the geometry-aware guided loss (GAGL) that enhances the discrimination ability and is only applied in the training stage without extra computation and memory during inference. |
Zhuangzhuang Chen; Jin Zhang; Zhuonan Lai; Jie Chen; Zun Liu; Jianqiang Li; |
1169 | Multi-Modal Alignment Using Representation Codebook Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose to align at a higher and more stable level using cluster representation. |
Jiali Duan; Liqun Chen; Son Tran; Jinyu Yang; Yi Xu; Belinda Zeng; Trishul Chilimbi; |
1170 | Maintaining Reasoning Consistency in Compositional Visual Question Answering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a dialog-like reasoning method for maintaining reasoning consistency in answering a compositional question and its sub-questions. |
Chenchen Jing; Yunde Jia; Yuwei Wu; Xinyu Liu; Qi Wu; |
1171 | Structure-Aware Motion Transfer With Deformable Anchor Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel structure-aware motion modeling approach, the deformable anchor model (DAM), which can automatically discover the motion structure of arbitrary objects without leveraging their prior structure information. |
Jiale Tao; Biao Wang; Borun Xu; Tiezheng Ge; Yuning Jiang; Wen Li; Lixin Duan; |
1172 | BigDL 2.0: Seamless Scaling of AI Pipelines From Laptops to Distributed Cluster Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address this challenge, we have open sourced BigDL 2.0 at https://github.com/intel-analytics/BigDL/ under the Apache 2.0 license (combining the original BigDL [19] and Analytics Zoo [18] projects); using BigDL 2.0, users can simply build conventional Python notebooks on their laptops (with possible AutoML support), which can then be transparently accelerated on a single node (with up to 9.6x speedup in our experiments) and seamlessly scaled out to a large cluster (across several hundred servers in real-world use cases). |
Jason (Jinquan) Dai; Ding Ding; Dongjie Shi; Shengsheng Huang; Jiao Wang; Xin Qiu; Kai Huang; Guoqiong Song; Yang Wang; Qiyuan Gong; Jiaming Song; Shan Yu; Le Zheng; Yina Chen; Junwei Deng; Ge Song; |
1173 | Integrative Few-Shot Learning for Classification and Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce the integrative task of few-shot classification and segmentation (FS-CS) that aims to both classify and segment target objects in a query image when the target classes are given with a few examples. |
Dahyun Kang; Minsu Cho; |
1174 | Acquiring A Dynamic Light Field Through A Single-Shot Coded Image Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a method for compressively acquiring a dynamic light field (a 5-D volume) through a single-shot coded image (a 2-D measurement). |
Ryoya Mizuno; Keita Takahashi; Michitaka Yoshida; Chihiro Tsutake; Toshiaki Fujii; Hajime Nagahara; |
1175 | Attentive Fine-Grained Structured Sparsity for Image Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To further optimize the trade-off between the efficiency and the restoration accuracy, we propose a novel pruning method that determines the pruning ratio for N:M structured sparsity at each layer. |
Junghun Oh; Heewon Kim; Seungjun Nah; Cheeun Hong; Jonghyun Choi; Kyoung Mu Lee; |
1176 | Pix2NeRF: Unsupervised Conditional P-GAN for Single Image to Neural Radiance Fields Translation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a pipeline to generate Neural Radiance Fields (NeRF) of an object or a scene of a specific class, conditioned on a single input image. |
Shengqu Cai; Anton Obukhov; Dengxin Dai; Luc Van Gool; |
1177 | HARA: A Hierarchical Approach for Robust Rotation Averaging Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel hierarchical approach for multiple rotation averaging, dubbed HARA. |
Seong Hun Lee; Javier Civera; |
1178 | Diffusion Autoencoders: Toward A Meaningful and Decodable Representation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our key idea is to use a learnable encoder for discovering the high-level semantics, and a DPM as the decoder for modeling the remaining stochastic variations. |
Konpat Preechakul; Nattanat Chatthee; Suttisak Wizadwongsa; Supasorn Suwajanakorn; |
1179 | Learning Fair Classifiers With Partially Annotated Group Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we consider a more practical scenario, dubbed as Algorithmic Group Fairness with the Partially annotated Group labels (Fair-PG). |
Sangwon Jung; Sanghyuk Chun; Taesup Moon; |
1180 | StylizedNeRF: Consistent 3D Scene Stylization As Stylized NeRF Via 2D-3D Mutual Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, there is a significant domain gap between style examples, which are 2D images, and NeRF, which is an implicit volumetric representation. To address this problem, we propose a novel mutual learning framework for 3D scene stylization that combines a 2D image stylization network and NeRF to fuse the stylization ability of the 2D stylization network with the 3D consistency of NeRF. |
Yi-Hua Huang; Yue He; Yu-Jie Yuan; Yu-Kun Lai; Lin Gao; |
1181 | NightLab: A Dual-Level Architecture With Hardness Detection for Segmentation at Night Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose NightLab, a novel nighttime segmentation framework that leverages multiple deep learning models imbued with night-aware features to yield State-of-The-Art (SoTA) performance on multiple night segmentation benchmarks. |
Xueqing Deng; Peng Wang; Xiaochen Lian; Shawn Newsam; |
1182 | Knowledge Distillation With The Reused Teacher Classifier Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast, we empirically show that a simple knowledge distillation technique is enough to significantly narrow down the teacher-student performance gap. |
Defang Chen; Jian-Ping Mei; Hailin Zhang; Can Wang; Yan Feng; Chun Chen; |
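Entry 1182 argues that a simple distillation technique suffices once the teacher classifier is reused. For orientation, the snippet below is the classic temperature-scaled distillation loss that such comparisons build on; the classifier-reuse idea from the paper is not reproduced here, and the temperature is a placeholder value.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 4.0):
    """Classic knowledge distillation: KL divergence between softened distributions."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    p_teacher = F.softmax(teacher_logits / t, dim=1)
    # Scaling by t*t keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)
```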
1183 | Contrastive Learning for Unsupervised Video Highlight Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a simple contrastive learning framework for unsupervised highlight detection. |
Taivanbat Badamdorj; Mrigank Rochan; Yang Wang; Li Cheng; |
1184 | InfoGCN: Representation Learning for Human Skeleton-Based Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: InfoGCN proposes a learning framework for action recognition combining a novel learning objective and an encoding method. |
Hyung-gun Chi; Myoung Hoon Ha; Seunggeun Chi; Sang Wan Lee; Qixing Huang; Karthik Ramani; |
1185 | Rethinking Image Cropping: Exploring Diverse Compositions From Global Views Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we regard image cropping as a set prediction problem. |
Gengyun Jia; Huaibo Huang; Chaoyou Fu; Ran He; |
1186 | Constrained Few-Shot Class-Incremental Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To meet the above constraints, we propose C-FSCIL, which is architecturally composed of a frozen meta-learned feature extractor, a trainable fixed-size fully connected layer, and a rewritable dynamically growing memory that stores as many vectors as the number of encountered classes. |
Michael Hersche; Geethan Karunaratne; Giovanni Cherubini; Luca Benini; Abu Sebastian; Abbas Rahimi; |
1187 | Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present our material and texture based self-supervision method named MATTER (MATerial and TExture Representation Learning), which is inspired by classical material and texture methods. |
Peri Akiva; Matthew Purri; Matthew Leotta; |
1188 | Threshold Matters in WSSS: Manipulating The Activation for The Robust and Accurate Segmentation Model Against Thresholds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Then, we show that this issue can be mitigated by satisfying two conditions: 1) reducing the imbalance in the foreground activation and 2) increasing the gap between the foreground and the background activation. Based on these findings, we propose a novel activation manipulation network with a per-pixel classification loss and a label conditioning module. |
Minhyun Lee; Dongseob Kim; Hyunjung Shim; |
1189 | Data-Free Network Compression Via Parametric Non-Uniform Mixed Precision Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: These are unavailable in confidential scenarios due to personal privacy and security concerns. Focusing on this issue, we propose a novel data-free method for network compression called PNMQ, which employs Parametric Non-uniform Mixed precision Quantization to generate a quantized network. |
Vladimir Chikin; Mikhail Antiukh; |
1190 | Sparse to Dense Dynamic 3D Facial Expression Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a solution to the task of generating dynamic 3D facial expressions from a neutral 3D face and an expression label. |
Naima Otberdout; Claudio Ferrari; Mohamed Daoudi; Stefano Berretti; Alberto Del Bimbo; |
1191 | Think Twice Before Detecting GAN-Generated Fake Images From Their Spectral Domain Imprints Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our study in this paper introduces a pipeline to mitigate the spectral artifacts. |
Chengdong Dong; Ajay Kumar; Eryun Liu; |
1192 | Crafting Better Contrastive Views for Siamese Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose ContrastiveCrop, which could effectively generate better crops for Siamese representation learning. |
Xiangyu Peng; Kai Wang; Zheng Zhu; Mang Wang; Yang You; |
1193 | RSCFed: Random Sampling Consensus Federated Semi-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a Random Sampling Consensus Federated learning, namely RSCFed, by considering the uneven reliability among models from labeled clients and unlabeled clients. |
Xiaoxiao Liang; Yiqun Lin; Huazhu Fu; Lei Zhu; Xiaomeng Li; |
1194 | TransMVSNet: Global Context-Aware Multi-View Stereo Network With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present TransMVSNet, based on our exploration of feature matching in multi-view stereo (MVS). |
Yikang Ding; Wentao Yuan; Qingtian Zhu; Haotian Zhang; Xiangyue Liu; Yuanjiang Wang; Xiao Liu; |
1195 | ROCA: Robust CAD Model Retrieval and Alignment From A Single Image Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present ROCA, a novel end-to-end approach that retrieves and aligns 3D CAD models from a shape database to a single input image. |
Can Gümeli; Angela Dai; Matthias Nießner; |
1196 | Continual Learning for Visual Search With Backward Consistent Feature Embedding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the issues of long-term visual search, we introduce a continual learning (CL) approach that can handle the incrementally growing gallery set with backward embedding consistency. |
Timmy S. T. Wan; Jun-Cheng Chen; Tzer-Yi Wu; Chu-Song Chen; |
1197 | IFS-RCNN: An Incremental Few-Shot Instance Segmenter Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper addresses incremental few-shot instance segmentation, where a few examples of new object classes arrive when access to training examples of old classes is not available anymore, and the goal is to perform well on both old and new classes. |
Khoi Nguyen; Sinisa Todorovic; |
1198 | DPGEN: Differentially Private Generative Energy-Guided Network for Natural Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here, we propose DPGEN, a network model designed to synthesize high-resolution natural images while satisfying differential privacy. |
Jia-Wei Chen; Chia-Mu Yu; Ching-Chia Kao; Tzai-Wei Pang; Chun-Shien Lu; |
1199 | MetaFSCIL: A Meta-Learning Approach for Few-Shot Class Incremental Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we tackle the problem of few-shot class incremental learning (FSCIL). |
Zhixiang Chi; Li Gu; Huan Liu; Yang Wang; Yuanhao Yu; Jin Tang; |
1200 | The Majority Can Help The Minority: Context-Rich Minority Oversampling for Long-Tailed Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel minority over-sampling method to augment diversified minority samples by leveraging the rich context of the majority classes as background images. |
Seulki Park; Youngkyu Hong; Byeongho Heo; Sangdoo Yun; Jin Young Choi; |
1201 | Dense Depth Priors for Neural Radiance Fields From Sparse Input Views Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our method aims to synthesize novel views of whole rooms from an order of magnitude fewer images. |
Barbara Roessle; Jonathan T. Barron; Ben Mildenhall; Pratul P. Srinivasan; Matthias Nießner; |
1202 | EyePAD++: A Distillation-Based Approach for Joint Eye Authentication and Presentation Attack Detection Using Periocular Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here, we introduce a joint framework for EA and PAD using periocular images. |
Prithviraj Dhar; Amit Kumar; Kirsten Kaplan; Khushi Gupta; Rakesh Ranjan; Rama Chellappa; |
1203 | IntentVizor: Towards Generic Query Guided Interactive Video Summarization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: First, the text query might not be enough to describe the exact and diverse needs of the user. Second, the user cannot edit once the summaries are produced, even though we assume the user's needs are subtle and need to be adjusted interactively. To solve these two problems, we propose IntentVizor, an interactive video summarization framework guided by generic multi-modality queries. |
Guande Wu; Jianzhe Lin; Claudio T. Silva; |
1204 | Wnet: Audio-Guided Video Object Segmentation Via Wavelet-Based Cross-Modal Denoising Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we consider the problem of audio-guided video semantic segmentation from the viewpoint of end-to-end denoised encoder-decoder network learning. |
Wenwen Pan; Haonan Shi; Zhou Zhao; Jieming Zhu; Xiuqiang He; Zhigeng Pan; Lianli Gao; Jun Yu; Fei Wu; Qi Tian; |
1205 | Camera Pose Estimation Using Implicit Distortion Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we explore an alternative approach which implicitly models the lens distortion. |
Linfei Pan; Marc Pollefeys; Viktor Larsson; |
1206 | Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose the Scene Representation Transformer (SRT), a method which processes posed or unposed RGB images of a new area, infers a "set-latent scene representation", and synthesises novel views, all in a single feed-forward pass. |
Mehdi S. M. Sajjadi; Henning Meyer; Etienne Pot; Urs Bergmann; Klaus Greff; Noha Radwan; Suhani Vora; Mario Lučić; Daniel Duckworth; Alexey Dosovitskiy; Jakob Uszkoreit; Thomas Funkhouser; Andrea Tagliasacchi; |
1207 | Shape-Invariant 3D Adversarial Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel Point-Cloud Sensitivity Map to boost both the efficiency and imperceptibility of point perturbations. |
Qidong Huang; Xiaoyi Dong; Dongdong Chen; Hang Zhou; Weiming Zhang; Nenghai Yu; |
1208 | LAS-AT: Adversarial Training With Learnable Attack Strategy Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel framework for adversarial training by introducing the concept of "learnable attack strategy", dubbed LAS-AT, which learns to automatically produce attack strategies to improve the model robustness. |
Xiaojun Jia; Yong Zhang; Baoyuan Wu; Ke Ma; Jue Wang; Xiaochun Cao; |
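Entry 1208 replaces hand-tuned attack settings with a learnable strategy. As a point of reference only, the sketch below is the standard fixed-strategy PGD attack used in ordinary adversarial training; the epsilon, step size, and iteration count are illustrative choices, not values from the paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, images, labels, eps=8/255, step=2/255, iters=10):
    """Standard L-infinity PGD: ascend the loss, then project back into the eps-ball."""
    adv = (images + torch.empty_like(images).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(iters):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad, = torch.autograd.grad(loss, adv)
        adv = adv.detach() + step * grad.sign()
        adv = (images + (adv - images).clamp(-eps, eps)).clamp(0, 1).detach()
    return adv
```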
1209 | Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we strive to liberate ViTs from pre-training by introducing CNNs’ inductive biases back to ViTs while preserving their network architectures for higher upper bound and setting up more suitable optimization objectives. |
Haofei Zhang; Jiarui Duan; Mengqi Xue; Jie Song; Li Sun; Mingli Song; |
1210 | PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, one of the greatest challenges remains the creation of datasets with complete, unambiguous ground truth at scale. To address this, we develop a new, more comprehensive dataset for table extraction, called PubTables-1M. |
Brandon Smock; Rohith Pesala; Robin Abraham; |
1211 | Styleformer: Transformer Based Generative Adversarial Networks With Style Vector Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we effectively apply a modified Transformer structure (e.g., increased multi-head attention and pre-layer normalization) and attention style injection, which is a style modulation and demodulation method for the self-attention operation. |
Jeeseung Park; Younggeun Kim; |
1212 | Efficient Two-Stage Detection of Human-Object Interactions With A Novel Unary-Pairwise Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose the Unary-Pairwise Transformer, a two-stage detector that exploits unary and pairwise representations for HOIs. |
Frederic Z. Zhang; Dylan Campbell; Stephen Gould; |
1213 | ELSR: Efficient Line Segment Reconstruction With Planes and Points Guidance Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an efficient line segment reconstruction method called ELSR. |
Dong Wei; Yi Wan; Yongjun Zhang; Xinyi Liu; Bin Zhang; Xiqi Wang; |
1214 | Meta-Attention for ViT-Backed Continual Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study ViT-backed continual learning to strive for higher performance riding on recent advances of ViTs. |
Mengqi Xue; Haofei Zhang; Jie Song; Mingli Song; |
1215 | DST: Dynamic Substitute Training for Data-Free Black-Box Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel dynamic substitute training attack method to encourage substitute model to learn better and faster from the target model. |
Wenxuan Wang; Xuelin Qian; Yanwei Fu; Xiangyang Xue; |
1216 | Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present PHORHUM, a novel, end-to-end trainable, deep neural network methodology for photorealistic 3D human reconstruction given just a monocular RGB image. |
Thiemo Alldieck; Mihai Zanfir; Cristian Sminchisescu; |
1217 | A Low-Cost & Real-Time Motion Capture System Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Traditional marker-based motion capture requires excessive and specialized equipment, hindering accessibility and wider adoption. In this work, we demonstrate such a system but rely on a very sparse set of low-cost consumer-grade sensors. |
Anargyros Chatzitofis; Georgios Albanis; Nikolaos Zioulis; Spyridon Thermos; |
1218 | Unified Contrastive Learning in Image-Text-Label Space Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce a new formulation by combining the two data sources into a common image-text-label space. |
Jianwei Yang; Chunyuan Li; Pengchuan Zhang; Bin Xiao; Ce Liu; Lu Yuan; Jianfeng Gao; |
1219 | Unifying Motion Deblurring and Frame Interpolation With Events Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Slow shutter speed and long exposure time of frame-based cameras often cause visual blur and loss of inter-frame information, degrading the overall quality of captured videos. To this end, we present a unified framework of event-based motion deblurring and frame interpolation for blurry video enhancement, where the extremely low latency of events is leveraged to alleviate motion blur and facilitate intermediate frame prediction. |
Xiang Zhang; Lei Yu; |
1220 | Generalizing Interactive Backpropagating Refinement for Dense Prediction Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, in order to generalize backpropagating refinement for a wide range of dense prediction tasks, we introduce a set of G-BRS (Generalized Backpropagating Refinement Scheme) layers that enable both global and localized refinement for the following tasks: interactive segmentation, semantic segmentation, image matting and monocular depth estimation. |
Fanqing Lin; Brian Price; Tony Martinez; |
1221 | Unsupervised Pre-Training for Temporal Action Localization Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: These pre-trained models can be sub-optimal for temporal localization tasks due to the inherent discrepancy between video-level classification and clip-level localization. To bridge this gap, we make the first attempt to propose a self-supervised pretext task, coined as Pseudo Action Localization (PAL) to Unsupervisedly Pre-train feature encoders for Temporal Action Localization tasks (UP-TAL). |
Can Zhang; Tianyu Yang; Junwu Weng; Meng Cao; Jue Wang; Yuexian Zou; |
1222 | Light Field Neural Rendering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Methods based on geometric reconstruction need only sparse views, but cannot accurately model non-Lambertian effects. We introduce a model that combines the strengths and mitigates the limitations of these two directions. |
Mohammed Suhail; Carlos Esteves; Leonid Sigal; Ameesh Makadia; |
1223 | Fast Point Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper introduces Fast Point Transformer that consists of a new lightweight self-attention layer. |
Chunghyun Park; Yoonwoo Jeong; Minsu Cho; Jaesik Park; |
1224 | Look Outside The Room: Synthesizing A Consistent Long-Term 3D Scene Video From A Single Image Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel approach to synthesize a consistent long-term video given a single scene image and a trajectory of large camera motions. |
Xuanchi Ren; Xiaolong Wang; |
1225 | Unimodal-Concentrated Loss: Fully Adaptive Label Distribution Learning for Ordinal Regression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we emphatically summarize that learning an adaptive label distribution on ordinal regression tasks should follow three principles. |
Qiang Li; Jingjing Wang; Zhaoliang Yao; Yachun Li; Pengju Yang; Jingwei Yan; Chunmao Wang; Shiliang Pu; |
1226 | Augmented Geometric Distillation for Data-Free Incremental Person ReID Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, due to the strict privacy licenses and the open-set retrieval setting, it is intractable to adapt existing class IL methods to ReID. In this work, we propose an Augmented Geometric Distillation (AGD) framework to tackle these issues. |
Yichen Lu; Mei Wang; Weihong Deng; |
1227 | Deep Stereo Image Compression Via Bi-Directional Coding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a novel bi-directional coding-based end-to-end stereo image compression network (BCSIC-Net). |
Jianjun Lei; Xiangrui Liu; Bo Peng; Dengchao Jin; Wanqing Li; Jingxiao Gu; |
1228 | Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems Through Stochastic Contraction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we show that starting from Gaussian noise is unnecessary. |
Hyungjin Chung; Byeongsu Sim; Jong Chul Ye; |
1229 | Smooth-Swap: A Simple Enhancement for Face-Swapping With Smoothness Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new face-swapping model called ‘Smooth-Swap’, which excludes complex handcrafted designs and allows fast and stable training. |
Jiseob Kim; Jihoon Lee; Byoung-Tak Zhang; |
1230 | Full-Range Virtual Try-On With Recurrent Tri-Level Transform Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a principled framework, Recurrent Tri-Level Transform (RT-VTON), that performs full-range virtual try-on on both standard and non-standard clothes. |
Han Yang; Xinrui Yu; Ziwei Liu; |
1231 | Style Neophile: Constantly Seeking Novel Styles for Domain Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, these methods are restricted to a finite set of styles since they obtain styles for augmentation from a fixed set of external images or by interpolating those of training data. To address this limitation and maximize the benefit of style augmentation, we propose a new method that synthesizes novel styles constantly during training. |
Juwon Kang; Sohyun Lee; Namyup Kim; Suha Kwak; |
1232 | High-Fidelity Human Avatars From A Single RGB Camera Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a coarse-to-fine framework to reconstruct a personalized high-fidelity human avatar from a monocular video. |
Hao Zhao; Jinsong Zhang; Yu-Kun Lai; Zerong Zheng; Yingdi Xie; Yebin Liu; Kun Li; |
1233 | ADAPT: Vision-Language Navigation With Modality-Aligned Action Prompts Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose modAlity-aligneD Action PrompTs (ADAPT), which provides the VLN agent with action prompts to enable the explicit learning of action-level modality alignment to pursue successful navigation. |
Bingqian Lin; Yi Zhu; Zicong Chen; Xiwen Liang; Jianzhuang Liu; Xiaodan Liang; |
1234 | Multiview Transformers for Video Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although transformer architectures have recently advanced the state-of-the-art, they have not explicitly modelled different spatiotemporal resolutions. To this end, we present Multiview Transformers for Video Recognition (MTV). |
Shen Yan; Xuehan Xiong; Anurag Arnab; Zhichao Lu; Mi Zhang; Chen Sun; Cordelia Schmid; |
1235 | RIO: Rotation-Equivariance Supervised Learning of Robust Inertial Odometry Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper introduces rotation-equivariance as a self-supervisor to train inertial odometry models. |
Xiya Cao; Caifa Zhou; Dandan Zeng; Yongliang Wang; |
1236 | How Good Is Aesthetic Ability of A Fashion Model? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce A100 (Aesthetic 100) to assess the aesthetic ability of fashion compatibility models. |
Xingxing Zou; Kaicheng Pang; Wen Zhang; Waikeung Wong; |
1237 | Mining Multi-View Information: A Strong Self-Supervised Framework for Depth-Based 3D Hand Pose and Mesh Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we study the cross-view information fusion problem in the task of self-supervised 3D hand pose estimation from the depth image. |
Pengfei Ren; Haifeng Sun; Jiachang Hao; Jingyu Wang; Qi Qi; Jianxin Liao; |
1238 | Automated Progressive Learning for Efficient Training of Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we take a practical step towards efficient training of ViTs by customizing and automating progressive learning. |
Changlin Li; Bohan Zhuang; Guangrun Wang; Xiaodan Liang; Xiaojun Chang; Yi Yang; |
1239 | BTS: A Bi-Lingual Benchmark for Text Segmentation in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Different from English, which has a limited alphabet of letters, Chinese has many more basic characters with complex structures, making the problem more difficult to deal with. To better analyze this problem, we propose the Bi-lingual Text Segmentation (BTS) dataset, a benchmark that covers various common Chinese scenes, including 14,250 diverse and finely annotated text images. |
Xixi Xu; Zhongang Qi; Jianqi Ma; Honglun Zhang; Ying Shan; Xiaohu Qie; |
1240 | Learning Structured Gaussians To Approximate Deep Ensembles Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes using a sparse-structured multivariate Gaussian to provide a closed-form approximator for the output of probabilistic ensemble models used for dense image prediction tasks. |
Ivor J. A. Simpson; Sara Vicente; Neill D. F. Campbell; |
1241 | Adaptive Trajectory Prediction Via Transferable GNN Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This issue results in an inevitable performance decrease. To address this issue, we propose a novel Transferable Graph Neural Network (T-GNN) framework, which jointly conducts trajectory prediction as well as domain alignment in a unified framework. |
Yi Xu; Lichen Wang; Yizhou Wang; Yun Fu; |
1242 | Total Variation Optimization Layers for Computer Vision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To study question (a), in this work, we propose total variation (TV) minimization as a layer for computer vision. |
Raymond A. Yeh; Yuan-Ting Hu; Zhongzheng Ren; Alexander G. Schwing; |
1243 | Defensive Patches for Robust Recognition in The Physical World Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by the fact that robust recognition depends on both local and global features, we propose a defensive patch generation framework to address these problems by helping models better exploit these features. |
Jiakai Wang; Zixin Yin; Pengfei Hu; Aishan Liu; Renshuai Tao; Haotong Qin; Xianglong Liu; Dacheng Tao; |
1244 | Single-Stage Is Enough: Multi-Person Absolute 3D Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We argue that it is more desirable to simplify such two-stage paradigm to a single-stage one to promote both efficiency and performance. To this end, we present an efficient single-stage solution, Decoupled Regression Model (DRM), with three distinct novelties. |
Lei Jin; Chenyang Xu; Xiaojuan Wang; Yabo Xiao; Yandong Guo; Xuecheng Nie; Jian Zhao; |
1245 | Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, large disparities between existing synthetic datasets and real scenes lead to poor model transfer. We make two major contributions to address this issue. |
Zhao Jin; Yinjie Lei; Naveed Akhtar; Haifeng Li; Munawar Hayat; |
1246 | Learn From Others and Be Yourself in Heterogeneous Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose FCCL (Federated Cross-Correlation and Continual Learning). |
Wenke Huang; Mang Ye; Bo Du; |
1247 | Sequential Voting With Relational Box Fields for Active Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To leverage each pixel as evidence to determine the bounding box of the active object, we propose a pixel-wise voting function. |
Qichen Fu; Xingyu Liu; Kris Kitani; |
1248 | Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we switch from D models to G models using the classical auto-encoder (AE). |
Guangrun Wang; Yansong Tang; Liang Lin; Philip H.S. Torr; |
1249 | Learning Transferable Human-Object Interaction Detector With Natural Language Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we aim to develop a transferable HOI detector for unseen interactions. |
Suchen Wang; Yueqi Duan; Henghui Ding; Yap-Peng Tan; Kim-Hui Yap; Junsong Yuan; |
1250 | Fourier Document Restoration for Robust Document Dewarping and Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents FDRNet, a Fourier Document Restoration Network that can restore documents with different distortions and improve document recognition in a reliable and simpler manner. |
Chuhui Xue; Zichen Tian; Fangneng Zhan; Shijian Lu; Song Bai; |
1251 | Consistency Learning Via Decoding Path Augmentation for Transformers in Human Object Interaction Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by various inference paths for HOI detection, we propose cross-path consistency learning (CPC), which is a novel end-to-end learning strategy to improve HOI detection for transformers by leveraging augmented decoding paths. |
Jihwan Park; SeungJun Lee; Hwan Heo; Hyeong Kyu Choi; Hyunwoo J. Kim; |
1252 | Consistent Explanations By Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Given an interpretation algorithm, e.g., Grad-CAM, we introduce a novel training method to train the model to produce more consistent explanations. |
Vipin Pillai; Soroush Abbasi Koohpayegani; Ashley Ouligian; Dennis Fong; Hamed Pirsiavash; |
1253 | Text2Pos: Text-to-Point-Cloud Cross-Modal Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In particular, we propose Text2Pos, a cross-modal localization module that learns to align textual descriptions with localization cues in a coarse-to-fine manner. |
Manuel Kolmet; Qunjie Zhou; Aljoša Ošep; Laura Leal-Taixé; |
1254 | MulT: An End-to-End Multitask Learning Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks, including depth estimation, semantic segmentation, reshading, surface normal estimation, 2D keypoint detection, and edge detection. |
Deblina Bhattacharjee; Tong Zhang; Sabine Süsstrunk; Mathieu Salzmann; |
1255 | Hierarchical Modular Network for Video Captioning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a hierarchical modular network to bridge video representations and linguistic semantics from three levels before generating captions. |
Hanhua Ye; Guorong Li; Yuankai Qi; Shuhui Wang; Qingming Huang; Ming-Hsuan Yang; |
1256 | Learning With Neighbor Consistency for Noisy Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, collecting large datasets in a time- and cost-efficient manner often results in label noise. We present a method for learning from noisy labels that leverages similarities between training examples in feature space, encouraging the prediction of each example to be similar to its nearest neighbours. |
Ahmet Iscen; Jack Valmadre; Anurag Arnab; Cordelia Schmid; |
1257 | Depth Estimation By Combining Binocular Stereo and Monocular Structured-Light Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel stereo system, which consists of two cameras (an RGB camera and an IR camera) and an IR speckle projector. |
Yuhua Xu; Xiaoli Yang; Yushan Yu; Wei Jia; Zhaobi Chu; Yulan Guo; |
1258 | Salient-to-Broad Transition for Video Person Re-Identification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Due to the limited utilization of temporal relations in video re-id, the frame-level attention regions of mainstream methods are partial and highly similar. To address this problem, we propose a Salient-to-Broad Module (SBM) to enlarge the attention regions gradually. |
Shutao Bai; Bingpeng Ma; Hong Chang; Rui Huang; Xilin Chen; |
1259 | Object-Region Video Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we present Object-Region Video Transformers (ORViT), an object-centric approach that extends video transformer layers with a block that directly incorporates object representations. |
Roei Herzig; Elad Ben-Avraham; Karttikeya Mangalam; Amir Bar; Gal Chechik; Anna Rohrbach; Trevor Darrell; Amir Globerson; |
1260 | DeeCap: Dynamic Early Exiting for Efficient Image Captioning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: On the other hand, the exiting decisions made by internal classifiers are sometimes unreliable. To solve these issues, we propose the DeeCap framework for efficient image captioning, which dynamically selects proper-sized decoding layers from a global perspective to exit early. |
Zhengcong Fei; Xu Yan; Shuhui Wang; Qi Tian; |
1261 | AME: Attention and Memory Enhancement in Hyper-Parameter Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Training Deep Neural Networks (DNNs) is inherently subject to sensitive hyper-parameters and untimely feedback from performance evaluation. To solve these two difficulties, an efficient parallel hyper-parameter optimization model is proposed under the framework of Deep Reinforcement Learning (DRL). |
Nuo Xu; Jianlong Chang; Xing Nie; Chunlei Huo; Shiming Xiang; Chunhong Pan; |
1262 | Alignment-Uniformity Aware Representation Learning for Zero-Shot Video Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To enhance model generalizability, this paper presents an end-to-end framework that preserves alignment and uniformity properties for representations on both seen and unseen classes. |
Shi Pu; Kaili Zhao; Mao Zheng; |
1263 | RepMLPNet: Hierarchical Vision MLP With Re-Parameterized Locality Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a methodology, Locality Injection, to incorporate local priors into an FC layer via merging the trained parameters of a parallel conv kernel into the FC kernel. |
Xiaohan Ding; Honghao Chen; Xiangyu Zhang; Jungong Han; Guiguang Ding; |
1264 | DR.VIC: Decomposition and Reasoning for Video Individual Counting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose to conduct pedestrian counting from a new perspective – Video Individual Counting (VIC), which counts the total number of individual pedestrians in the given video (a person is only counted once). |
Tao Han; Lei Bai; Junyu Gao; Qi Wang; Wanli Ouyang; |
1265 | LiDARCap: Long-Range Marker-Less 3D Human Motion Capture With LiDAR Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing motion capture datasets are largely short-range and cannot yet fit the need of long-range applications. We propose LiDARHuman26M, a new human motion capture dataset captured by LiDAR at a much longer range to overcome this limitation. |
Jialian Li; Jingyi Zhang; Zhiyong Wang; Siqi Shen; Chenglu Wen; Yuexin Ma; Lan Xu; Jingyi Yu; Cheng Wang; |
1266 | GeoEngine: A Platform for Production-Ready Geospatial Research Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we introduce the GeoEngine platform for reproducible and production-ready geospatial machine learning research. |
Sagar Verma; Siddharth Gupta; Hal Shin; Akash Panigrahi; Shubham Goswami; Shweta Pardeshi; Natanael Exe; Ujwal Dutta; Tanka Raj Joshi; Nitin Bhojwani; |
1267 | Revisiting Document Image Dewarping By Grid Regularization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper addresses the problem of document image dewarping, which aims at eliminating the geometric distortion in document images for document digitization. |
Xiangwei Jiang; Rujiao Long; Nan Xue; Zhibo Yang; Cong Yao; Gui-Song Xia; |
1268 | Semi-Supervised Few-Shot Learning Via Multi-Factor Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the relationship between unlabeled and labeled data is not well exploited in generating pseudo labels, the noise of which will directly harm model learning. In this paper, we propose a Clustering-based semi-supervised Few-Shot Learning (cluster-FSL) method to solve the above problems in image classification. |
Jie Ling; Lei Liao; Meng Yang; Jia Shuai; |
1269 | CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose Clustering Mask Transformer (CMT-DeepLab), a transformer-based framework for panoptic segmentation designed around clustering. |
Qihang Yu; Huiyu Wang; Dahun Kim; Siyuan Qiao; Maxwell Collins; Yukun Zhu; Hartwig Adam; Alan Yuille; Liang-Chieh Chen; |
1270 | Weakly-Supervised Generation and Grounding of Visual Descriptions With Conditional Generative Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, this leads to sub-optimal grounding, since attention coefficients are computed without taking into account the word that needs to be localized. To address this shortcoming, we propose a novel Grounded Visual Description Conditional Variational Autoencoder (GVD-CVAE) and leverage its latent variables for grounding. |
Effrosyni Mavroudi; René Vidal; |
1271 | Novel Class Discovery in Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS), which aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes. |
Yuyang Zhao; Zhun Zhong; Nicu Sebe; Gim Hee Lee; |
1272 | ARCS: Accurate Rotation and Correspondence Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper is about the old Wahba problem in its more general form, which we call simultaneous rotation and correspondence search. |
Liangzu Peng; Manolis C. Tsakiris; René Vidal; |
1273 | Learning To Anticipate Future With Dynamic Context Removal Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this field, previous methods usually care more about model architecture design, while little attention has been paid to how to train an anticipation model with a proper learning policy. To this end, in this work, we propose a novel training scheme called Dynamic Context Removal (DCR), which dynamically schedules the visibility of the observed future in the learning procedure. |
Xinyu Xu; Yong-Lu Li; Cewu Lu; |
1274 | GCFSR: A Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a generative and controllable face SR framework, called GCFSR, which can reconstruct images with faithful identity information without any additional priors. |
Jingwen He; Wu Shi; Kai Chen; Lean Fu; Chao Dong; |
1275 | Perception Prioritized Training of Diffusion Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we show that restoring data corrupted with certain noise levels offers a proper pretext task for the model to learn rich visual concepts. |
Jooyoung Choi; Jungbeom Lee; Chaehun Shin; Sungwon Kim; Hyunwoo Kim; Sungroh Yoon; |
1276 | Using 3D Topological Connectivity for Ghost Particle Reduction in Flow Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In an alternative method, 3D Particle Streak Velocimetry (3D-PSV), the exposure time is increased, and the particles’ pathlines are imaged as "streaks". We treat these streaks (a) as connected endpoints and (b) as conic section segments and develop a theoretical model that describes the mechanisms of 3D ambiguity generation and shows that streaks can drastically reduce reconstruction ambiguities. |
Christina Tsalicoglou; Thomas Rösgen; |
1277 | On The Integration of Self-Attention and Convolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Convolution and self-attention are two powerful techniques for representation learning, and they are usually considered as two peer approaches that are distinct from each other. In this paper, we show that there exists a strong underlying relation between them, in the sense that the bulk of computations of these two paradigms are in fact done with the same operation. |
Xuran Pan; Chunjiang Ge; Rui Lu; Shiji Song; Guanfu Chen; Zeyi Huang; Gao Huang; |
1278 | Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a high-quality human motion prediction method that accurately predicts future human poses given observed ones. |
Tiezheng Ma; Yongwei Nie; Chengjiang Long; Qing Zhang; Guiqing Li; |
1279 | CHEX: CHannel EXploration for CNN Model Compression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Such limitations may lead to sub-optimal model quality as well as excessive memory and training cost. In this paper, we propose a novel Channel Exploration methodology, dubbed as CHEX, to rectify these problems. |
Zejiang Hou; Minghai Qin; Fei Sun; Xiaolong Ma; Kun Yuan; Yi Xu; Yen-Kuang Chen; Rong Jin; Yuan Xie; Sun-Yuan Kung; |
1280 | M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we exploit the underlying relations between interacting agents and decouple the joint prediction problem into marginal prediction problems. |
Qiao Sun; Xin Huang; Junru Gu; Brian C. Williams; Hang Zhao; |
1281 | Domain Adaptation on Point Clouds Via Geometry-Aware Implicits Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here we propose a simple yet effective method for unsupervised domain adaptation on point clouds by employing a self-supervised task of learning geometry-aware implicits, which plays two critical roles in one shot. |
Yuefan Shen; Yanchao Yang; Mi Yan; He Wang; Youyi Zheng; Leonidas J. Guibas; |
1282 | Consistency Driven Sequential Transformers Attention Model for Partially Observable Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we develop a Sequential Transformers Attention Model (STAM) that only partially observes a complete image and predicts informative glimpse locations solely based on past glimpses. |
Samrudhdhi B. Rangrej; Chetan L. Srinidhi; James J. Clark; |
1283 | GroupViT: Semantic Segmentation Emerges From Text Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Instead, in this paper, we propose to bring back the grouping mechanism into deep networks, which allows semantic segments to emerge automatically with only text supervision. |
Jiarui Xu; Shalini De Mello; Sifei Liu; Wonmin Byeon; Thomas Breuel; Jan Kautz; Xiaolong Wang; |
1284 | NeuralHOFusion: Neural Volumetric Rendering Under Human-Object Interactions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose NeuralHOFusion, a neural approach for volumetric human-object capture and rendering using sparse consumer RGBD sensors. |
Yuheng Jiang; Suyi Jiang; Guoxing Sun; Zhuo Su; Kaiwen Guo; Minye Wu; Jingyi Yu; Lan Xu; |
1285 | Generalizable Human Pose Triangulation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a stochastic framework for human pose triangulation and demonstrate a superior generalization across different camera arrangements on two public datasets. |
Kristijan Bartol; David Bojanić; Tomislav Petković; Tomislav Pribanić; |
1286 | DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, these approaches often have difficulties in reconstructing images with novel poses, views, and highly variable contents compared to the training data, altering object identity, or producing unwanted image artifacts. To mitigate these problems and enable faithful manipulation of real images, we propose a novel method, dubbed DiffusionCLIP, that performs text-driven image manipulation using diffusion models. |
Gwanghyun Kim; Taesung Kwon; Jong Chul Ye; |
1287 | Occlusion-Aware Cost Constructor for Light Field Depth Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a simple and fast cost constructor to construct matching cost for LF depth estimation. |
Yingqian Wang; Longguang Wang; Zhengyu Liang; Jungang Yang; Wei An; Yulan Guo; |
1288 | SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a dataset of 1000 video sequences of human portraits recorded in real and uncontrolled conditions by using a handheld smartphone accompanied by an external high-quality depth camera. |
Anastasiia Kornilova; Marsel Faizullin; Konstantin Pakulev; Andrey Sadkov; Denis Kukushkin; Azat Akhmetyanov; Timur Akhtyamov; Hekmat Taherinejad; Gonzalo Ferrer; |
1289 | BppAttack: Stealthy and Efficient Trojan Attacks Against Deep Neural Networks Via Image Quantization and Contrastive Adversarial Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose stealthy and efficient Trojan attacks, BppAttack. |
Zhenting Wang; Juan Zhai; Shiqing Ma; |
1290 | GlideNet: Global, Local and Intrinsic Based Dense Embedding NETwork for Multi-Category Attributes Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Yet significant challenges remain in: 1) predicting a large number of attributes over multiple object categories, 2) modeling category-dependence of attributes, 3) methodically capturing both global and local scene context, and 4) robustly predicting attributes of objects with low pixel-count. To address these issues, we propose a novel multi-category attribute prediction deep architecture named GlideNet, which contains three distinct feature extractors. |
Kareem Metwaly; Aerin Kim; Elliot Branson; Vishal Monga; |
1291 | Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Towards this end, in this paper, we first present a novel Stacked Hybrid-Attention network, which facilitates the intra-modal refinement as well as the inter-modal interaction, to serve as the encoder. We then devise an innovative Group Collaborative Learning strategy to optimize the decoder. |
Xingning Dong; Tian Gan; Xuemeng Song; Jianlong Wu; Yuan Cheng; Liqiang Nie; |
1292 | Ensembling Off-the-Shelf Models for GAN Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose an effective selection mechanism, by probing the linear separability between real and fake samples in pretrained model embeddings, choosing the most accurate model, and progressively adding it to the discriminator ensemble. |
Nupur Kumari; Richard Zhang; Eli Shechtman; Jun-Yan Zhu; |
1293 | Towards Better Plasticity-Stability Trade-Off in Incremental Learning: A Simple Linear Connector Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose to employ mode connectivity in loss landscapes to achieve better plasticity-stability trade-off without any previous samples. |
Guoliang Lin; Hanlu Chu; Hanjiang Lai; |
1294 | Topology-Preserving Shape Reconstruction and Registration Via Neural Diffeomorphic Flow Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new model called Neural Diffeomorphic Flow (NDF) to learn deep implicit shape templates, representing shapes as conditional diffeomorphic deformations of templates, intrinsically preserving shape topologies. |
Shanlin Sun; Kun Han; Deying Kong; Hao Tang; Xiangyi Yan; Xiaohui Xie; |
1295 | Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Segment and Complete defense (SAC), a general framework for defending object detectors against patch attacks through detection and removal of adversarial patches. |
Jiang Liu; Alexander Levine; Chun Pong Lau; Rama Chellappa; Soheil Feizi; |
1296 | Cross-Domain Few-Shot Learning With Task-Specific Adapters Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we look at the problem of cross-domain few-shot classification that aims to learn a classifier from previously unseen classes and domains with few labeled samples. |
Wei-Hong Li; Xialei Liu; Hakan Bilen; |
1297 | MAXIM: Multi-Axis MLP for Image Processing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a multi-axis MLP-based architecture, called MAXIM, that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. |
Zhengzhong Tu; Hossein Talebi; Han Zhang; Feng Yang; Peyman Milanfar; Alan Bovik; Yinxiao Li; |
1298 | Learning Part Segmentation Through Unsupervised Domain Adaptation From Synthetic Vehicles Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the idea of learning part segmentation through unsupervised domain adaptation (UDA) from synthetic data. |
Qing Liu; Adam Kortylewski; Zhishuai Zhang; Zizhang Li; Mengqi Guo; Qihao Liu; Xiaoding Yuan; Jiteng Mu; Weichao Qiu; Alan Yuille; |
1299 | Delving Into The Estimation Shift of Batch Normalization in A Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper focuses on investigating the estimation of population statistics. |
Lei Huang; Yi Zhou; Tian Wang; Jie Luo; Xianglong Liu; |
1300 | Towards Better Understanding Attribution Methods Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Evaluating such methods is challenging since no ground truth attributions exist. We thus propose three novel evaluation schemes to more reliably measure the faithfulness of those methods, to make comparisons between them more fair, and to make visual inspection more systematic. |
Sukrut Rao; Moritz Böhle; Bernt Schiele; |
1301 | Learning Object Context for Novel-View Scene Layout Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Such a problem is challenging as it involves accurate understanding of the 3D geometry and semantics of the scene from as little as a single 2D scene layout. To tackle this challenging problem, we propose a deep model to capture contextualized object representation by explicitly modeling the object context transformation in the scene. |
Xiaotian Qiao; Gerhard P. Hancke; Rynson W.H. Lau; |
1302 | PSTR: End-to-End One-Step Person Search With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel one-step transformer-based person search framework, PSTR, that jointly performs person detection and re-identification (re-id) in a single architecture. |
Jiale Cao; Yanwei Pang; Rao Muhammad Anwer; Hisham Cholakkal; Jin Xie; Mubarak Shah; Fahad Shahbaz Khan; |
1303 | Neural Fields As Learnable Kernels for 3D Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present Neural Kernel Fields: a novel method for reconstructing implicit 3D shapes based on a learned kernel ridge regression. |
Francis Williams; Zan Gojcic; Sameh Khamis; Denis Zorin; Joan Bruna; Sanja Fidler; Or Litany; |
1304 | A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static Vs. Dynamic Information Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: For example, while it has been observed that action recognition algorithms are heavily influenced by visual appearance in single static frames, there is no quantitative methodology for evaluating such static bias in the latent representation compared to bias toward dynamic information (e.g., motion). We tackle this challenge by proposing a novel approach for quantifying the static and dynamic biases of any spatiotemporal model. |
Matthew Kowal; Mennatullah Siam; Md Amirul Islam; Neil D. B. Bruce; Richard P. Wildes; Konstantinos G. Derpanis; |
1305 | Detector-Free Weakly Supervised Group Activity Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing models for this task are often impractical in that they demand ground-truth bounding box labels of actors even in testing or rely on off-the-shelf object detectors. Motivated by this, we propose a novel model for group activity recognition that depends neither on bounding box labels nor on object detectors. |
Dongkeun Kim; Jinsung Lee; Minsu Cho; Suha Kwak; |
1306 | NFormer: Robust Person Re-Identification With Neighbor Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, due to the high intra-identity variations, ignoring such interactions typically leads to outlier features. To tackle this issue, we propose a Neighbor Transformer Network, or NFormer, which explicitly models interactions across all input images, thus suppressing outlier features and leading to more robust representations overall. |
Haochen Wang; Jiayi Shen; Yongtuo Liu; Yan Gao; Efstratios Gavves; |
1307 | Joint Forecasting of Panoptic Segmentations With Difference Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address both issues, we study a new panoptic segmentation forecasting model that jointly forecasts all object instances in a scene using a transformer model based on ‘difference attention.’ |
Colin Graber; Cyril Jazra; Wenjie Luo; Liangyan Gui; Alexander G. Schwing; |
1308 | HairCLIP: Design Your Hair By Text and Reference Image Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: For this purpose, we encode the image and text conditions in a shared embedding space and propose a unified hair editing framework by leveraging the powerful image text representation capability of the Contrastive Language-Image Pre-Training (CLIP) model. |
Tianyi Wei; Dongdong Chen; Wenbo Zhou; Jing Liao; Zhentao Tan; Lu Yuan; Weiming Zhang; Nenghai Yu; |
1309 | Imposing Consistency for Optical Flow Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper introduces novel and effective consistency strategies for optical flow estimation, a problem where labels from real-world data are very challenging to derive. |
Jisoo Jeong; Jamie Menjay Lin; Fatih Porikli; Nojun Kwak; |
1310 | Style Transformer for Image Inversion and Editing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a transformer-based image inversion and editing model for pretrained StyleGAN that not only introduces fewer distortions but also offers high quality and flexibility for editing. |
Xueqi Hu; Qiusheng Huang; Zhengyi Shi; Siyuan Li; Changxin Gao; Li Sun; Qingli Li; |
1311 | OakInk: A Large-Scale Knowledge Repository for Understanding Hand-Object Interaction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a multi-modal and rich-annotated knowledge repository, OakInk, for visual and cognitive understanding of hand-object interactions. |
Lixin Yang; Kailin Li; Xinyu Zhan; Fei Wu; Anran Xu; Liu Liu; Cewu Lu; |
1312 | Pyramid Adversarial Training Improves ViT Performance Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present pyramid adversarial training (PyramidAT), a simple and effective technique to improve ViT’s overall performance. |
Charles Herrmann; Kyle Sargent; Lu Jiang; Ramin Zabih; Huiwen Chang; Ce Liu; Dilip Krishnan; Deqing Sun; |
1313 | Bridging Global Context Interactions for High-Fidelity Image Completion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence. |
Chuanxia Zheng; Tat-Jen Cham; Jianfei Cai; Dinh Phung; |
1314 | SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present SwinBERT, an end-to-end transformer-based model for video captioning, which takes video frame patches directly as inputs, and outputs a natural language description. |
Kevin Lin; Linjie Li; Chung-Ching Lin; Faisal Ahmed; Zhe Gan; Zicheng Liu; Yumao Lu; Lijuan Wang; |
1315 | Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a universal regularization technique called maximum spatial perturbation consistency (MSPC), which enforces a spatial perturbation function (T) and the translation operator (G) to be commutative (i.e., T ∘ G = G ∘ T). |
Yanwu Xu; Shaoan Xie; Wenhao Wu; Kun Zhang; Mingming Gong; Kayhan Batmanghelich; |
1316 | Unseen Classes at A Later Time? No Problem Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A few recent efforts towards tackling CGZSL have been limited by differences in settings, practicality, data splits and protocols followed – inhibiting fair comparison and a clear direction forward. Motivated by these observations, in this work, we first consolidate the different CGZSL setting variants and propose a new Online CGZSL setting, which is more practical and flexible. |
Hari Chandana Kuchibhotla; Sumitra S Malagi; Shivam Chandhok; Vineeth N Balasubramanian; |
1317 | InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present an information-theoretic regularization technique for few-shot novel view synthesis based on neural implicit representation. |
Mijeong Kim; Seonguk Seo; Bohyung Han; |
1318 | Learning The Degradation Distribution for Blind Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a probabilistic degradation model (PDM), which studies the degradation D as a random variable, and learns its distribution by modeling the mapping from a priori random variable z to D. |
Zhengxiong Luo; Yan Huang; Shang Li; Liang Wang; Tieniu Tan; |
1319 | Dist-PU: Positive-Unlabeled Learning From A Label Distribution Perspective Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While existing cost-sensitive-based methods have achieved state-of-the-art performances, they explicitly minimize the risk of classifying unlabeled data as negative samples, which might result in a negative-prediction preference of the classifier. To alleviate this issue, we resort to a label distribution perspective for PU learning in this paper. |
Yunrui Zhao; Qianqian Xu; Yangbangyan Jiang; Peisong Wen; Qingming Huang; |
1320 | SC2-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a second-order spatial compatibility (SC^2) measure-based method for efficient and robust point cloud registration (PCR), called SC^2-PCR. |
Zhi Chen; Kun Sun; Fan Yang; Wenbing Tao; |
1321 | Relative Pose From A Calibrated and An Uncalibrated Smartphone Image Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new minimal and a non-minimal solver for estimating the relative camera pose together with the unknown focal length of the second camera. |
Yaqing Ding; Daniel Barath; Jian Yang; Zuzana Kukelova; |
1322 | Towards Robust and Reproducible Active Learning Using Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we demonstrate that under identical experimental conditions, different types of AL algorithms (uncertainty-based, diversity-based, and committee-based) produce an inconsistent gain over a random sampling baseline. |
Prateek Munjal; Nasir Hayat; Munawar Hayat; Jamshid Sourati; Shadab Khan; |
1323 | Retrieval Augmented Classification for Long-Tail Visual Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce Retrieval Augmented Classification (RAC), a generic approach to augmenting standard image classification pipelines with an explicit retrieval module. |
Alexander Long; Wei Yin; Thalaiyasingam Ajanthan; Vu Nguyen; Pulak Purkait; Ravi Garg; Alan Blair; Chunhua Shen; Anton van den Hengel; |
1324 | Not All Tokens Are Equal: Human-Centric Visual Analysis Via Token Clustering Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, not all regions are equally important in human-centric vision tasks, e.g., the human body needs a fine representation with many tokens, while the image background can be modeled by a few tokens. To address this problem, we propose a novel Vision Transformer, called Token Clustering Transformer (TCFormer), which merges tokens by progressive clustering, where the tokens can be merged from different locations with flexible shapes and sizes. |
Wang Zeng; Sheng Jin; Wentao Liu; Chen Qian; Ping Luo; Wanli Ouyang; Xiaogang Wang; |
1325 | Temporally Efficient Vision Transformer for Video Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS). |
Shusheng Yang; Xinggang Wang; Yu Li; Yuxin Fang; Jiemin Fang; Wenyu Liu; Xun Zhao; Ying Shan; |
1326 | The Devil Is in The Margin: Margin-Based Label Smoothing for Network Calibration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Following our observations, we propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances. |
Bingyuan Liu; Ismail Ben Ayed; Adrian Galdran; Jose Dolz; |
1327 | NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce NLX-GPT, a general, compact and faithful language model that can simultaneously predict an answer and explain it. |
Fawaz Sammani; Tanmoy Mukherjee; Nikos Deligiannis; |
1328 | Bringing Old Films Back to Life Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a learning-based framework, recurrent transformer network (RTN), to restore heavily degraded old films. |
Ziyu Wan; Bo Zhang; Dongdong Chen; Jing Liao; |
1329 | Sound and Visual Representation Learning With Multiple Pretraining Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The learned feature representations can exhibit different performance for each downstream task. In this light, this work aims to combine Multiple SSL tasks (Multi-SSL) so that the learned representations generalize well to all downstream tasks. |
Arun Balajee Vasudevan; Dengxin Dai; Luc Van Gool; |
1330 | WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose WarpingGAN, an effective and efficient 3D point cloud generation network. |
Yingzhi Tang; Yue Qian; Qijian Zhang; Yiming Zeng; Junhui Hou; Xuefei Zhe; |
1331 | RePaint: Inpainting Using Denoising Diffusion Probabilistic Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks. |
Andreas Lugmayr; Martin Danelljan; Andres Romero; Fisher Yu; Radu Timofte; Luc Van Gool; |
1332 | Revealing Occlusions With 4D Neural Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a framework for learning to estimate 4D visual representations from monocular RGB-D video, which is able to persist objects, even once they become obstructed by occlusions. |
Basile Van Hoorick; Purva Tendulkar; Dídac Surís; Dennis Park; Simon Stent; Carl Vondrick; |
1333 | Meta Agent Teaming Active Learning for Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To reduce the human efforts on pose annotations, we propose a novel Meta Agent Teaming Active Learning (MATAL) framework to actively select and label informative images for effective learning. |
Jia Gong; Zhipeng Fan; Qiuhong Ke; Hossein Rahmani; Jun Liu; |
1334 | Forward Propagation, Backward Regression, and Pose Association for Hand Tracking in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose HandLer, a novel convolutional architecture that can jointly detect and track hands online in unconstrained videos. |
Mingzhen Huang; Supreeth Narasimhaswamy; Saif Vazir; Haibin Ling; Minh Hoai; |
1335 | Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To eliminate the heavy dependence on human annotations, we present a novel method, named Pseudo-Q, to automatically generate pseudo language queries for supervised training. |
Haojun Jiang; Yuanze Lin; Dongchen Han; Shiji Song; Gao Huang; |
1336 | E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we show that event data is a very valuable modality for egocentric action recognition. |
Chiara Plizzari; Mirco Planamente; Gabriele Goletto; Marco Cannici; Emanuele Gusso; Matteo Matteucci; Barbara Caputo; |
1337 | ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, a computation efficient regression framework is presented for estimating the 6D pose of rigid objects from a single RGB-D image, which is applicable to handling symmetric objects. |
Ningkai Mo; Wanshui Gan; Naoto Yokoya; Shifeng Chen; |
1338 | Self-Supervised Deep Image Restoration Via Adaptive Stochastic Gradient Langevin Dynamics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on the neuralization of a Bayesian estimator of the problem, this paper presents a self-supervised deep learning approach to general image restoration problems. |
Weixi Wang; Ji Li; Hui Ji; |
1339 | Towards Discovering The Effectiveness of Moderately Confident Samples for Semi-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we study whether the moderately confident samples are useless and how to select the useful ones to improve model optimization. |
Hui Tang; Kui Jia; |
1340 | OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we first identify and measure two distinct kinds of distribution shifts that are ubiquitous in various datasets. Next, through extensive experiments, we compare OoD generalization algorithms across two groups of benchmarks, each dominated by one of the distribution shifts, revealing their strengths on one shift as well as limitations on the other shift. |
Nanyang Ye; Kaican Li; Haoyue Bai; Runpeng Yu; Lanqing Hong; Fengwei Zhou; Zhenguo Li; Jun Zhu; |
1341 | An Empirical Study of Training End-to-End Vision-and-Language Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present METER, a Multimodal End-to-end TransformER framework, through which we investigate how to design and pre-train a fully transformer-based VL model in an end-to-end manner. |
Zi-Yi Dou; Yichong Xu; Zhe Gan; Jianfeng Wang; Shuohang Wang; Lijuan Wang; Chenguang Zhu; Pengchuan Zhang; Lu Yuan; Nanyun Peng; Zicheng Liu; Michael Zeng; |
1342 | Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, conventional approaches are basically weak in providing trustworthy multimodal fusion, especially for safety-critical applications (e.g., medical diagnosis). To address this issue, we propose a novel trustworthy multimodal classification algorithm termed Multimodal Dynamics, which dynamically evaluates both the feature-level and modality-level informativeness for different samples and thus trustworthily integrates multiple modalities. |
Zongbo Han; Fan Yang; Junzhou Huang; Changqing Zhang; Jianhua Yao; |
1343 | The Neurally-Guided Shape Parser: Grammar-Based Labeling of 3D Shape Regions With Approximate Inference Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose the Neurally-Guided Shape Parser (NGSP), a method that learns how to assign fine-grained semantic labels to regions of a 3D shape. |
R. Kenny Jones; Aalia Habib; Rana Hanocka; Daniel Ritchie; |
1344 | Unsupervised Homography Estimation With Coplanarity-Aware GAN Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel method HomoGAN to guide unsupervised homography estimation to focus on the dominant plane. |
Mingbo Hong; Yuhang Lu; Nianjin Ye; Chunyu Lin; Qijun Zhao; Shuaicheng Liu; |
1345 | LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel LiDAR Image Fusion Transformer (LIFT) to model the mutual interaction relationship of cross-sensor data over time. |
Yihan Zeng; Da Zhang; Chunwei Wang; Zhenwei Miao; Ting Liu; Xin Zhan; Dayang Hao; Chao Ma; |
1346 | AutoLoss-Zero: Searching Loss Functions From Scratch for Generic Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose AutoLoss-Zero, which is a general framework for searching loss functions from scratch for generic tasks. |
Hao Li; Tianwen Fu; Jifeng Dai; Hongsheng Li; Gao Huang; Xizhou Zhu; |
1347 | PatchNet: A Simple Face Anti-Spoofing Framework Via Fine-Grained Patch Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose PatchNet which reformulates face anti-spoofing as a fine-grained patch-type recognition problem. |
Chien-Yi Wang; Yu-Ding Lu; Shang-Ta Yang; Shang-Hong Lai; |
1348 | OnePose: One-Shot Object Pose Estimation Without CAD Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new method named OnePose for object pose estimation. |
Jiaming Sun; Zihao Wang; Siyu Zhang; Xingyi He; Hongcheng Zhao; Guofeng Zhang; Xiaowei Zhou; |
1349 | Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper addresses a new problem of weakly-supervised online action segmentation in instructional videos. We present a framework to segment streaming videos online at test time using Dynamic Programming and show its advantages over a greedy sliding-window approach. |
Reza Ghoddoosian; Isht Dwivedi; Nakul Agarwal; Chiho Choi; Behzad Dariush; |
1350 | Rethinking Minimal Sufficient Representation in Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This reveals a new problem: contrastive learning models risk over-fitting to the information shared between views. To alleviate this problem, we propose to increase the mutual information between the representation and input as regularization to approximately introduce more task-relevant information, since we cannot utilize any downstream task information during training. |
Haoqing Wang; Xun Guo; Zhi-Hong Deng; Yan Lu; |
1351 | Disentangling Visual Embeddings for Attributes and Objects Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We study the problem of compositional zero-shot learning for object-attribute recognition. |
Nirat Saini; Khoi Pham; Abhinav Shrivastava; |
1352 | Scalable Penalized Regression for Noise Detection in Learning With Noisy Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose using a theoretically guaranteed noisy label detection framework to detect and remove noisy data for Learning with Noisy Labels (LNL). |
Yikai Wang; Xinwei Sun; Yanwei Fu; |
1353 | Effective Conditioned and Composed Image Retrieval Combining CLIP-Based Features Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this demo, we present an interactive system based on a combiner network, trained using contrastive learning, that combines visual and textual features obtained from the OpenAI CLIP network to address conditioned CBIR. |
Alberto Baldrati; Marco Bertini; Tiberio Uricchio; Alberto Del Bimbo; |
1354 | Registering Explicit to Implicit: Towards High-Fidelity Garment Mesh Reconstruction From Single Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, a common problem for the implicit-based methods is that they cannot produce a separated and topology-consistent mesh for each garment piece, which is crucial for the current 3D content creation pipeline. To address this issue, we propose a novel geometry inference framework, ReEF, that reconstructs topology-consistent layered garment meshes by registering the explicit garment template to the whole-body implicit fields predicted from single images. |
Heming Zhu; Lingteng Qiu; Yuda Qiu; Xiaoguang Han; |
1355 | Federated Class-Incremental Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To tackle the global forgetting brought by the non-i.i.d class imbalance across clients, we propose a proxy server that selects the best old global model to assist the local relation distillation. |
Jiahua Dong; Lixu Wang; Zhen Fang; Gan Sun; Shichao Xu; Xiao Wang; Qi Zhu; |
1356 | MiniViT: Compressing Vision Transformers With Weight Multiplexing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, ViT models suffer from a huge number of parameters, restricting their applicability on devices with limited computation. To alleviate this problem, we propose MiniViT, a new compression framework, which achieves parameter reduction in vision transformers while retaining the same performance. |
Jinnian Zhang; Houwen Peng; Kan Wu; Mengchen Liu; Bin Xiao; Jianlong Fu; Lu Yuan; |
1357 | Practical Stereo Matching Via Cascaded Recurrent Network With Adaptive Correlation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a set of innovative designs to tackle the problem of practical stereo matching: 1) to better recover fine depth details, we design a hierarchical network with recurrent refinement to update disparities in a coarse-to-fine manner, as well as a stacked cascaded architecture for inference; 2) we propose an adaptive group correlation layer to mitigate the impact of erroneous rectification; 3) we introduce a new synthetic dataset with special attention to difficult cases for better generalizing to real-world scenes. |
Jiankun Li; Peisen Wang; Pengfei Xiong; Tao Cai; Ziwei Yan; Lei Yang; Jiangyu Liu; Haoqiang Fan; Shuaicheng Liu; |
1358 | D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce the dynamic grasp synthesis task: given an object with a known 6D pose and a grasp reference, our goal is to generate motions that move the object to a target 6D pose. |
Sammy Christen; Muhammed Kocabas; Emre Aksan; Jemin Hwangbo; Jie Song; Otmar Hilliges; |
1359 | Show, Deconfound and Tell: Image Captioning With Causal Inference Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we first use Structural Causal Models (SCMs) to show how two confounders damage the image captioning. Then we apply the backdoor adjustment to propose a novel causal inference based image captioning (CIIC) framework, which consists of an interventional object detector (IOD) and an interventional transformer decoder (ITD) to jointly confront both confounders. |
Bing Liu; Dong Wang; Xu Yang; Yong Zhou; Rui Yao; Zhiwen Shao; Jiaqi Zhao; |
1360 | Extracting Triangular 3D Models, Materials, and Lighting From Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present an efficient method for joint optimization of topology, materials and lighting from multi-view image observations. |
Jacob Munkberg; Jon Hasselgren; Tianchang Shen; Jun Gao; Wenzheng Chen; Alex Evans; Thomas Müller; Sanja Fidler; |
1361 | Weakly Supervised Segmentation on Outdoor 4D Point Clouds With Temporal Matching and Spatial Graph Propagation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study how to achieve scene understanding with limited annotated data. |
Hanyu Shi; Jiacheng Wei; Ruibo Li; Fayao Liu; Guosheng Lin; |
1362 | ImFace: A Nonlinear 3D Morphable Face Model With Implicit Neural Representations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a novel 3D morphable face model, namely ImFace, to learn a nonlinear and continuous space with implicit neural representations. |
Mingwu Zheng; Hongyu Yang; Di Huang; Liming Chen; |
1363 | MobRecon: Mobile-Friendly Hand Mesh Reconstruction From Monocular Image Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a framework for single-view hand mesh reconstruction, which can simultaneously achieve high reconstruction accuracy, fast inference speed, and temporal coherence. |
Xingyu Chen; Yufeng Liu; Yajiao Dong; Xiong Zhang; Chongyang Ma; Yanmin Xiong; Yuan Zhang; Xiaoyan Guo; |
1364 | Layered Depth Refinement With Mask Guidance Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Hence, in this paper, we formulate a novel problem of mask-guided depth refinement that utilizes a generic mask to refine the depth prediction of SIDE models. |
Soo Ye Kim; Jianming Zhang; Simon Niklaus; Yifei Fan; Simon Chen; Zhe Lin; Munchurl Kim; |
1365 | Parameter-Free Online Test-Time Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by the inherent uncertainty around the conditions that will ultimately be encountered at test time, we propose a particularly "conservative" approach, which addresses the problem with a Laplacian Adjusted Maximum-likelihood Estimation (LAME) objective. |
Malik Boudiaf; Romain Mueller; Ismail Ben Ayed; Luca Bertinetto; |
1366 | SIGMA: Semantic-Complete Graph Matching for Domain Adaptive Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Despite their great success, they ignore the significant within-class variance and the domain-mismatched semantics within the training batch, leading to a sub-optimal adaptation. To overcome these challenges, we propose a novel SemantIc-complete Graph MAtching (SIGMA) framework for DAOD, which completes mismatched semantics and reformulates the adaptation with graph matching. |
Wuyang Li; Xinyu Liu; Yixuan Yuan; |
1367 | Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we first prove that MAML with over-parameterized DNNs is guaranteed to converge to global optima at a linear rate. Our convergence analysis indicates that MAML with over-parameterized DNNs is equivalent to kernel regression with a novel class of kernels, which we name Meta Neural Tangent Kernels (MetaNTK). Then, we propose MetaNTK-NAS, a new training-free neural architecture search (NAS) method for few-shot learning that uses MetaNTK to rank and select architectures. |
Haoxiang Wang; Yite Wang; Ruoyu Sun; Bo Li; |
1368 | LAKe-Net: Topology-Aware Point Cloud Completion By Localizing Aligned Keypoints Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To better tackle the missing topology part, we propose LAKe-Net, a novel topology-aware point cloud completion model by localizing aligned keypoints, with a novel Keypoints-Skeleton-Shape prediction manner. |
Junshu Tang; Zhijun Gong; Ran Yi; Yuan Xie; Lizhuang Ma; |
1369 | Scribble-Supervised LiDAR Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose using scribbles to annotate LiDAR point clouds and release ScribbleKITTI, the first scribble-annotated dataset for LiDAR semantic segmentation. |
Ozan Unal; Dengxin Dai; Luc Van Gool; |
1370 | AlignMixup: Improving Representations By Interpolating Aligned Features Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we revisit mixup from the deformation perspective and introduce AlignMixup, where we geometrically align two images in the feature space. |
Shashanka Venkataramanan; Ewa Kijak; Laurent Amsaleg; Yannis Avrithis; |
1371 | No Pain, Big Gain: Classify Dynamic Point Cloud Sequences With Static Models By Fitting Feature-Level Space-Time Surfaces Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To capture 3D motions without explicitly tracking correspondences, we propose a kinematics-inspired neural network (Kinet) by generalizing the kinematic concept of ST-surfaces to the feature space. |
Jia-Xing Zhong; Kaichen Zhou; Qingyong Hu; Bing Wang; Niki Trigoni; Andrew Markham; |
1372 | HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, existing methods neglect the symmetries of the problem and suffer from the expensive computational cost, facing the challenge of making real-time multi-agent motion prediction without sacrificing the prediction performance. To tackle this challenge, we propose Hierarchical Vector Transformer (HiVT) for fast and accurate multi-agent motion prediction. |
Zikang Zhou; Luyao Ye; Jianping Wang; Kui Wu; Kejie Lu; |
1373 | HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive Imaging Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel Hyperspectral Explicable Reconstruction and Optimal Sampling deep Network for SCI, dubbed HerosNet, which includes several phases under the ISTA-unfolding framework. |
Xuanyu Zhang; Yongbing Zhang; Ruiqin Xiong; Qilin Sun; Jian Zhang; |
1374 | Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper explores the feasibility of finding an optimal sub-model from a vision transformer and introduces a pure vision transformer slimming (ViT-Slim) framework. |
Arnav Chavan; Zhiqiang Shen; Zhuang Liu; Zechun Liu; Kwang-Ting Cheng; Eric P. Xing; |
1375 | Brain-Inspired Multilayer Perceptron With Spiking Neurons Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we incorporate the mechanism of LIF neurons into the MLP models, to achieve better accuracy without extra FLOPs. |
Wenshuo Li; Hanting Chen; Jianyuan Guo; Ziyang Zhang; Yunhe Wang; |
1376 | Learning To Estimate Robust 3D Human Mesh From In-the-Wild Crowded Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We consider the problem of recovering a single person’s 3D human mesh from in-the-wild crowded scenes. |
Hongsuk Choi; Gyeongsik Moon; JoonKyu Park; Kyoung Mu Lee; |
1377 | ObjectFormer for Image Manipulation Detection and Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose ObjectFormer to detect and localize image manipulations. |
Junke Wang; Zuxuan Wu; Jingjing Chen; Xintong Han; Abhinav Shrivastava; Ser-Nam Lim; Yu-Gang Jiang; |
1378 | Detecting Deepfakes With Self-Blended Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present novel synthetic training data called self-blended images (SBIs) to detect deepfakes. |
Kaede Shiohara; Toshihiko Yamasaki; |
1379 | Correlation-Aware Deep Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While most methods focus on designing robust matching operations, we propose a novel target-dependent feature network inspired by the self-/cross-attention scheme. |
Fei Xie; Chunyu Wang; Guangting Wang; Yue Cao; Wankou Yang; Wenjun Zeng; |
1380 | Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, effectively leveraging the audio modality in vision-specific annotated videos for action recognition is particularly challenging. To tackle this challenge, we propose a novel audio-visual framework that effectively leverages the audio modality in any solely vision-specific annotated dataset. |
Saghir Alfasly; Jian Lu; Chen Xu; Yuru Zou; |
1381 | NeurMiPs: Neural Mixture of Planar Experts for View Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Neural Mixtures of Planar Experts (NeurMiPs), a novel planar-based scene representation for modeling geometry and appearance. |
Zhi-Hao Lin; Wei-Chiu Ma; Hao-Yu Hsu; Yu-Chiang Frank Wang; Shenlong Wang; |
1382 | Implicit Sample Extension for Unsupervised Person Re-Identification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Due to the limited samples for each identity, we suppose some underlying information may be lacking to reveal the accurate clusters well. To discover this information, we propose an Implicit Sample Extension (ISE) method to generate what we call support samples around the cluster boundaries. |
Xinyu Zhang; Dongdong Li; Zhigang Wang; Jian Wang; Errui Ding; Javen Qinfeng Shi; Zhaoxiang Zhang; Jingdong Wang; |
1383 | Energy-Based Latent Aligner for Incremental Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose ELI: Energy-based Latent Aligner for Incremental Learning, which first learns an energy manifold for the latent representations such that previous task latents will have low energy and the current task latents have high energy values. |
K J Joseph; Salman Khan; Fahad Shahbaz Khan; Rao Muhammad Anwer; Vineeth N Balasubramanian; |
1384 | Towards Semi-Supervised Deep Facial Expression Recognition With An Adaptive Confidence Margin Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we learn an Adaptive Confidence Margin (Ada-CM) to fully leverage all unlabeled data for semi-supervised deep facial expression recognition. |
Hangyu Li; Nannan Wang; Xi Yang; Xiaoyu Wang; Xinbo Gao; |
1385 | GanOrCon: Are Generative Models Useful for Few-Shot Segmentation? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: But how do these improvements stack up against recent advances in self-supervised learning? Motivated by this, we present an alternative approach based on contrastive learning and compare their performance on standard few-shot part segmentation benchmarks. |
Oindrila Saha; Zezhou Cheng; Subhransu Maji; |
1386 | Bi-Level Doubly Variational Learning for Energy-Based Latent Variable Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose Bi-level doubly variational learning (BiDVL), which is based on a new bi-level optimization framework and two tractable variational distributions to facilitate learning EBLVMs. |
Ge Kan; Jinhu Lü; Tian Wang; Baochang Zhang; Aichun Zhu; Lei Huang; Guodong Guo; Hichem Snoussi; |
1387 | SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To achieve an optimal balance among computation, communication, and performance, a split-aware neural architecture search framework, SplitNets, is introduced to conduct model designing, splitting, and communication reduction simultaneously. |
Xin Dong; Barbara De Salvo; Meng Li; Chiao Liu; Zhongnan Qu; H.T. Kung; Ziyun Li; |
1388 | Masked-Attention Mask Transformer for Universal Image Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). |
Bowen Cheng; Ishan Misra; Alexander G. Schwing; Alexander Kirillov; Rohit Girdhar; |
1389 | Reading To Listen at The Cocktail Party: Multi-Modal Speech Separation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The goal of this paper is speech separation and enhancement in multi-speaker and noisy environments using a combination of different modalities. |
Akam Rahimi; Triantafyllos Afouras; Andrew Zisserman; |
1390 | AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: First, it is rank-insensitive: It ignores the rank positions of successfully localised moments in the top-K ranked list by treating the list as a set. Second, it binarises the Intersection over Union (IoU) of each retrieved video moment using the threshold θ, thereby ignoring the fine-grained localisation quality of ranked moments. We propose an alternative measure for evaluating VMR, called Average Max IoU (AxIoU), which is free from the above two problems. |
Riku Togashi; Mayu Otani; Yuta Nakashima; Esa Rahtu; Janne Heikkilä; Tetsuya Sakai; |
1391 | NOC-REK: Novel Object Captioning With Retrieved Vocabulary From External Knowledge Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an end-to-end Novel Object Captioning with Retrieved vocabulary from External Knowledge method (NOC-REK), which simultaneously learns vocabulary retrieval and caption generation, successfully describing novel objects outside of the training dataset. |
Duc Minh Vo; Hong Chen; Akihiro Sugimoto; Hideki Nakayama; |
1392 | Boosting Robustness of Image Matting With Context Assembling and Strong Data Augmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Although some works propose to either refine the trimaps or adapt the algorithms to real-world images via extra data augmentation, none of them has taken both into consideration, not to mention the significant performance deterioration on benchmarks while using those data augmentations. To fill this gap, we propose an image matting method which achieves higher robustness (RMat) via multilevel context assembling and strong data augmentation targeting matting. |
Yutong Dai; Brian Price; He Zhang; Chunhua Shen; |
1393 | Group R-CNN for Weakly Semi-Supervised Object Detection With Points Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on the classic R-CNN architecture, we propose an effective point-to-box regressor: Group R-CNN. |
Shilong Zhang; Zhuoran Yu; Liyang Liu; Xinjiang Wang; Aojun Zhou; Kai Chen; |
1394 | Weakly-Supervised Action Transition Learning for Stochastic Human Motion Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce the task of action-driven stochastic human motion prediction, which aims to predict multiple plausible future motions given a sequence of action labels and a short motion history. |
Wei Mao; Miaomiao Liu; Mathieu Salzmann; |
1395 | Speech Driven Tongue Animation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we introduce a large-scale speech and mocap dataset that focuses on capturing tongue, jaw, and lip motion. |
Salvador Medina; Denis Tome; Carsten Stoll; Mark Tiede; Kevin Munhall; Alexander G. Hauptmann; Iain Matthews; |
1396 | Hybrid Relation Guided Set Matching for Few-Shot Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To overcome the two limitations, we propose a novel Hybrid Relation guided Set Matching (HyRSM) approach that incorporates two key components: hybrid relation module and set matching metric. |
Xiang Wang; Shiwei Zhang; Zhiwu Qing; Mingqian Tang; Zhengrong Zuo; Changxin Gao; Rong Jin; Nong Sang; |
1397 | Self-Supervised Spatial Reasoning on Multi-View Line Drawings Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on the fact that self-supervised learning is helpful when a large amount of data is available, we propose two self-supervised learning approaches to improve the baseline performance for view consistency reasoning and camera pose reasoning tasks on the SPARE3D dataset. |
Siyuan Xiang; Anbang Yang; Yanfei Xue; Yaoqing Yang; Chen Feng; |
1398 | Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, these methods suffer from spatial misalignment or false distractors due to delayed and implicit spatial-temporal interaction occurring in the decoding phase. To tackle these limitations, we propose a Language-Bridged Duplex Transfer (LBDT) module which utilizes language as an intermediary bridge to accomplish explicit and adaptive spatial-temporal interaction earlier in the encoding phase. |
Zihan Ding; Tianrui Hui; Junshi Huang; Xiaoming Wei; Jizhong Han; Si Liu; |
1399 | Cross-Patch Dense Contrastive Learning for Semi-Supervised Segmentation of Cellular Nuclei in Histopathologic Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We study the semi-supervised learning problem, using a few labeled data and a large amount of unlabeled data to train the network, by developing a cross-patch dense contrastive learning framework, to segment cellular nuclei in histopathologic images. |
Huisi Wu; Zhaoze Wang; Youyi Song; Lin Yang; Jing Qin; |
1400 | Frame-Wise Action Representations for Long Videos Via Sequence Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a novel contrastive action representation learning (CARL) framework to learn frame-wise action representations, especially for long videos, in a self-supervised manner. |
Minghao Chen; Fangyun Wei; Chong Li; Deng Cai; |
1401 | Coarse-To-Fine Deep Video Coding With Hyperprior-Guided Mode Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, using hyperprior information as the input, we propose two mode prediction networks to respectively predict the optimal block resolutions for better motion coding and decide whether to skip residual information from each block for better residual coding without introducing additional bit cost while bringing negligible extra computation cost. |
Zhihao Hu; Guo Lu; Jinyang Guo; Shan Liu; Wei Jiang; Dong Xu; |
1402 | Generalized Binary Search Network for Highly-Efficient Multi-View Stereo Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel method for highly efficient MVS that remarkably decreases the memory footprint while clearly advancing state-of-the-art depth prediction performance. |
Zhenxing Mi; Chang Di; Dan Xu; |
1403 | SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce the largest synthetic dataset for autonomous driving, SHIFT. |
Tao Sun; Mattia Segu; Janis Postels; Yuxuan Wang; Luc Van Gool; Bernt Schiele; Federico Tombari; Fisher Yu; |
1404 | Adaptive Hierarchical Representation Learning for Long-Tailed Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel method, named Adaptive Hierarchical Representation Learning (AHRL), from a metric learning perspective to address long-tailed object detection. |
Banghuai Li; |
1405 | FlexIT: Towards Flexible Semantic Image Translation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing. |
Guillaume Couairon; Asya Grechka; Jakob Verbeek; Holger Schwenk; Matthieu Cord; |
1406 | Face2Exp: Combating Data Biases for Facial Expression Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To combat the mismatch, we propose the Meta-Face2Exp framework, which consists of a base network and an adaptation network. |
Dan Zeng; Zhiyuan Lin; Xiao Yan; Yuting Liu; Fei Wang; Bo Tang; |
1407 | SAR-Net: Shape Alignment and Recovery Network for Category-Level 6D Object Pose and Size Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Given a single scene image, this paper proposes a method of Category-level 6D Object Pose and Size Estimation (COPSE) from the point cloud of the target object, without external real pose-annotated training data. |
Haitao Lin; Zichang Liu; Chilam Cheang; Yanwei Fu; Guodong Guo; Xiangyang Xue; |
1408 | Whose Hands Are These? Hand Detection and Hand-Body Association in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel end-to-end trainable convolutional network that can jointly detect hands and the body location for the corresponding person. |
Supreeth Narasimhaswamy; Thanh Nguyen; Mingzhen Huang; Minh Hoai; |
1409 | Mega-NERF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a simple geometric clustering algorithm for data parallelism that partitions training images (or rather pixels) into different NeRF submodules that can be trained in parallel. |
Haithem Turki; Deva Ramanan; Mahadev Satyanarayanan; |
1410 | PINA: Learning A Personalized Implicit Neural Avatar From A Single RGB-D Video Sequence Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel method to learn Personalized Implicit Neural Avatars (PINA) from a short RGB-D sequence. |
Zijian Dong; Chen Guo; Jie Song; Xu Chen; Andreas Geiger; Otmar Hilliges; |
1411 | Forecasting From LiDAR Via Future Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an end-to-end approach for motion forecasting based on raw sensor measurement as opposed to ground truth tracks. |
Neehar Peri; Jonathon Luiten; Mengtian Li; Aljoša Ošep; Laura Leal-Taixé; Deva Ramanan; |
1412 | CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: On large displacements with motion blur, noisy correlations could cause severe errors in the estimated flow. To overcome this challenge, we propose a new architecture "CRoss-Attentional Flow Transformer" (CRAFT), aiming to revitalize the correlation volume computation. |
Xiuchao Sui; Shaohua Li; Xue Geng; Yan Wu; Xinxing Xu; Yong Liu; Rick Goh; Hongyuan Zhu; |
1413 | Adversarial Eigen Attack on Black-Box Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we deal with a more practical setting where a pre-trained white-box model with network parameters is provided without extra training data. |
Linjun Zhou; Peng Cui; Xingxuan Zhang; Yinan Jiang; Shiqiang Yang; |
1414 | Training Quantised Neural Networks With STE Variants: The Additive Noise Annealing Algorithm Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we analyse STE variants and study their impact on QNN training. |
Matteo Spallanzani; Gian Paolo Leonardi; Luca Benini; |
1415 | Split Hierarchical Variational Compression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: That is, not only do state-of-the-art methods, such as normalizing flows, often demonstrate better performance, but the initial bits required in coding make single and parallel image compression challenging. To remedy this, we introduce Split Hierarchical Variational Compression (SHVC). |
Tom Ryder; Chen Zhang; Ning Kang; Shifeng Zhang; |
1416 | Video Swin Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we instead advocate an inductive bias of locality in video Transformers, which leads to a better speed-accuracy trade-off compared to previous approaches which compute self-attention globally even with spatial-temporal factorization. |
Ze Liu; Jia Ning; Yue Cao; Yixuan Wei; Zheng Zhang; Stephen Lin; Han Hu; |
1417 | Privacy Preserving Partial Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, in many scenarios, the pose itself might be sensitive information. We propose a principled approach overcoming these limitations, based on two observations. |
Marcel Geppert; Viktor Larsson; Johannes L. Schönberger; Marc Pollefeys; |
1418 | Cross-Modal Background Suppression for Audio-Visual Event Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Hence this paper proposes a novel cross-modal background suppression network for AVE task, operating at the time- and event-level, aiming to improve localization performance through suppressing asynchronous audiovisual background frames from the examined events and reducing redundant noise. |
Yan Xia; Zhou Zhao; |
1419 | Mutual Quantization for Cross-Modal Search With Noisy Labels Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address the challenge, we present a general robust cross-modal hashing framework to correlate distinct modalities and combat noisy labels simultaneously. |
Erkun Yang; Dongren Yao; Tongliang Liu; Cheng Deng; |
1420 | Lagrange Motion Analysis and View Embeddings for Improved Gait Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we analyze human walking using Lagrange’s equation and conclude that second-order information in the temporal dimension is necessary for identification. |
Tianrui Chai; Annan Li; Shaoxiong Zhang; Zilong Li; Yunhong Wang; |
1421 | SphereSR: 360° Image Super-Resolution With Arbitrary Projection Via Continuous Spherical Image Representation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose SphereSR, a novel framework to generate a continuous spherical image representation from an LR 360° image, with the goal of predicting the RGB values at given spherical coordinates for super-resolution with an arbitrary 360° image projection. |
Youngho Yoon; Inchul Chung; Lin Wang; Kuk-Jin Yoon; |
1422 | Neural Mesh Simplification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work we attempt to tackle the novel task of learnable and differentiable mesh simplification. |
Rolandos Alexandros Potamias; Stylianos Ploumpis; Stefanos Zafeiriou; |
1423 | Cloth-Changing Person Re-Identification From A Single Image With Gait Prediction and Regularization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we focus on handling well the CC-ReID problem under a more challenging setting, i.e., just from a single image, which enables an efficient and latency-free person identity matching for surveillance. |
Xin Jin; Tianyu He; Kecheng Zheng; Zhiheng Yin; Xu Shen; Zhen Huang; Ruoyu Feng; Jianqiang Huang; Zhibo Chen; Xian-Sheng Hua; |
1424 | BoxeR: Box-Attention for 2D and 3D Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a simple attention mechanism, which we call Box-Attention. |
Duy-Kien Nguyen; Jihong Ju; Olaf Booij; Martin R. Oswald; Cees G. M. Snoek; |
1425 | Neural Architecture Search With Representation Mutual Information Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing strategies, such as employing standard training or performance predictor, often suffer from high computational complexity and low generality. To address this issue, we propose to rank architectures by Representation Mutual Information (RMI). |
Xiawu Zheng; Xiang Fei; Lei Zhang; Chenglin Wu; Fei Chao; Jianzhuang Liu; Wei Zeng; Yonghong Tian; Rongrong Ji; |
1426 | Deep Hyperspectral-Depth Reconstruction Using Single Color-Dot Projection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel single-shot hyperspectral-depth reconstruction method using an off-the-shelf RGB camera and projector. |
Chunyu Li; Yusuke Monno; Masatoshi Okutomi; |
1427 | M3T: Three-Dimensional Medical Image Classifier Using Multi-Plane and Multi-Slice Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this study, we propose a three-dimensional Medical image classifier using Multi-plane and Multi-slice Transformer (M3T) network to classify Alzheimer’s disease (AD) in 3D MRI images. |
Jinseong Jang; Dosik Hwang; |
1428 | 3MASSIV: Multilingual, Multimodal and Multi-Aspect Dataset of Social Media Short Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present 3MASSIV, a multilingual, multimodal and multi-aspect, expertly-annotated dataset of diverse short videos extracted from a social media platform. |
Vikram Gupta; Trisha Mittal; Puneet Mathur; Vaibhav Mishra; Mayank Maheshwari; Aniket Bera; Debdoot Mukherjee; Dinesh Manocha; |
1429 | Can Neural Nets Learn The Same Model Twice? Investigating Reproducibility and Double Descent From The Decision Boundary Perspective Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We discuss methods for visualizing neural network decision boundaries and decision regions. |
Gowthami Somepalli; Liam Fowl; Arpit Bansal; Ping Yeh-Chiang; Yehuda Dar; Richard Baraniuk; Micah Goldblum; Tom Goldstein; |
1430 | Cross Domain Object Detection By Target-Perceived Dual Branch Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing approaches mainly focus on either of these two difficulties, even though they are closely coupled in cross domain object detection. To solve this problem, we propose a novel Target-perceived Dual-branch Distillation (TDD) framework. |
Mengzhe He; Yali Wang; Jiaxi Wu; Yiru Wang; Hanqing Li; Bo Li; Weihao Gan; Wei Wu; Yu Qiao; |
1431 | A Proposal-Based Paradigm for Self-Supervised Sound Source Localization in Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we advocate a novel proposal-based paradigm that can directly perform semantic object-level localization, without any manual annotations. |
Hanyu Xuan; Zhiliang Wu; Jian Yang; Yan Yan; Xavier Alameda-Pineda; |
1432 | Overcoming Catastrophic Forgetting in Incremental Object Detection Via Elastic Response Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a response-based incremental distillation method, dubbed Elastic Response Distillation (ERD), which focuses on elastically learning responses from the classification head and the regression head. |
Tao Feng; Mang Wang; Hangjie Yuan; |
1433 | GroupNet: Multiscale Hypergraph Neural Networks for Trajectory Prediction With Relational Reasoning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To promote more comprehensive interaction modeling for relational reasoning, we propose GroupNet, a multiscale hypergraph neural network, which is novel in terms of both interaction capturing and representation learning. |
Chenxin Xu; Maosen Li; Zhenyang Ni; Ya Zhang; Siheng Chen; |
1434 | Unbiased Subclass Regularization for Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents an unbiased subclass regularization network (USRN) that alleviates the class imbalance issue by learning class-unbiased segmentation from balanced subclass distributions. |
Dayan Guan; Jiaxing Huang; Aoran Xiao; Shijian Lu; |
1435 | P3IV: Probabilistic Procedure Planning From Instructional Videos With Weak Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study the problem of procedure planning in instructional videos. |
He Zhao; Isma Hadji; Nikita Dvornik; Konstantinos G. Derpanis; Richard P. Wildes; Allan D. Jepson; |
1436 | Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a novel method based on a hierarchy built on 1-nearest neighbor graphs in the original space which is used to preserve the grouping properties of the data distribution on multiple levels. |
Saquib Sarfraz; Marios Koulakis; Constantin Seibold; Rainer Stiefelhagen; |
1437 | Coupled Iterative Refinement for 6D Multi-Object Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new approach to 6D object pose estimation which consists of an end-to-end differentiable architecture that makes use of geometric knowledge. |
Lahav Lipson; Zachary Teed; Ankit Goyal; Jia Deng; |
1438 | Multi-View Transformer for 3D Visual Grounding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Multi-View Transformer (MVT) for 3D visual grounding. |
Shijia Huang; Yilun Chen; Jiaya Jia; Liwei Wang; |
1439 | Structured Sparse R-CNN for Direct Scene Graph Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Instead, from a perspective on SGG as a direct set prediction, this paper presents a simple, sparse, and unified framework, termed as Structured Sparse R-CNN. |
Yao Teng; Limin Wang; |
1440 | Multi-Grained Spatio-Temporal Features Perceived Network for Event-Based Lip-Reading Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we introduce a novel type of sensing device, event cameras, for the task of ALR. |
Ganchao Tan; Yang Wang; Han Han; Yang Cao; Feng Wu; Zheng-Jun Zha; |
1441 | Semi-Supervised Video Paragraph Grounding With Contrastive Encoder Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, we find the existing VPG methods may not perform well on context modeling and highly rely on video-paragraph annotations. To tackle this problem, we propose a novel VPG method termed Semi-supervised Video-Paragraph TRansformer (SVPTR), which can more effectively exploit contextual information in paragraphs and significantly reduce the dependency on annotated data. |
Xun Jiang; Xing Xu; Jingran Zhang; Fumin Shen; Zuo Cao; Heng Tao Shen; |
1442 | Continual Predictive Learning From Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Can we develop predictive learning algorithms that can deal with more realistic, non-stationary physical environments? In this paper, we study a new continual learning problem in the context of video prediction, and observe that most existing methods suffer from severe catastrophic forgetting in this setup. |
Geng Chen; Wendong Zhang; Han Lu; Siyu Gao; Yunbo Wang; Mingsheng Long; Xiaokang Yang; |
1443 | Weakly Paired Associative Learning for Sound and Image Representations Via Bimodal Associative Memory Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Based on the observation, we propose a new problem to deal with the weakly paired condition: How to boost a certain modal representation even by using other unpaired modal data. |
Sangmin Lee; Hyung-Il Kim; Yong Man Ro; |
1444 | BARC: Learning To Regress 3D Dog Shape From Images By Exploiting Breed Information Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our goal is to recover the 3D shape and pose of dogs from a single image. |
Nadine Rüegg; Silvia Zuffi; Konrad Schindler; Michael J. Black; |
1445 | Knowledge Distillation: A Good Teacher Is Patient and Consistent Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: There is a growing discrepancy in computer vision between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications. In this paper we address this issue and significantly bridge the gap between these two types of models. |
Lucas Beyer; Xiaohua Zhai; Amélie Royer; Larisa Markeeva; Rohan Anil; Alexander Kolesnikov; |
1446 | PCA-Based Knowledge Distillation Towards Lightweight and Content-Style Balanced Photorealistic Style Transfer Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce PCA-based knowledge distillation to distill lightweight models and show it is motivated by theory. |
Tai-Yin Chiu; Danna Gurari; |
1447 | Frame Averaging for Equivariant Shape Space Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a framework for incorporating equivariance in encoders and decoders by introducing two contributions: (i) adapting the recent Frame Averaging (FA) framework for building generic, efficient, and maximally expressive Equivariant autoencoders; and (ii) constructing autoencoders equivariant to piecewise Euclidean motions applied to different parts of the shape. |
Matan Atzmon; Koki Nagano; Sanja Fidler; Sameh Khamis; Yaron Lipman; |
1448 | Transformer Tracking With Cyclic Shifting Window Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new transformer architecture with multi-scale cyclic shifting window attention for visual object tracking, elevating the attention from pixel to window level. |
Zikai Song; Junqing Yu; Yi-Ping Phoebe Chen; Wei Yang; |
1449 | ProposalCLIP: Unsupervised Open-Category Object Proposal Generation Via Exploiting CLIP Cues Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose ProposalCLIP, a method towards unsupervised open-category object proposal generation. |
Hengcan Shi; Munawar Hayat; Yicheng Wu; Jianfei Cai; |
1450 | Towards Understanding Adversarial Robustness of Optical Flow Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we analyze the cause of the problem and show that the lack of robustness is rooted in the classical aperture problem of optical flow estimation in combination with bad choices in the details of the network architecture. |
Simon Schrodi; Tonmoy Saikia; Thomas Brox; |
1451 | Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Panoptic SegFormer, a general framework for panoptic segmentation with transformers. |
Zhiqi Li; Wenhai Wang; Enze Xie; Zhiding Yu; Anima Anandkumar; Jose M. Alvarez; Ping Luo; Tong Lu; |
1452 | Training High-Performance Low-Latency Spiking Neural Networks By Differentiation on Spike Representation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance that is competitive to ANNs yet with low latency. |
Qingyan Meng; Mingqing Xiao; Shen Yan; Yisen Wang; Zhouchen Lin; Zhi-Quan Luo; |
1453 | AnyFace: Free-Style Text-To-Face Synthesis and Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper therefore proposes the first free-style text-to-face method, namely AnyFace, enabling much wider open-world applications such as the metaverse, social media, cosmetics, forensics, etc. |
Jianxin Sun; Qiyao Deng; Qi Li; Muyi Sun; Min Ren; Zhenan Sun; |
1454 | HL-Net: Heterophily Learning Network for Scene Graph Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Accordingly, in this paper, we propose a novel Heterophily Learning Network (HL-Net) to comprehensively explore the homophily and heterophily between objects/relationships in scene graphs. |
Xin Lin; Changxing Ding; Yibing Zhan; Zijian Li; Dacheng Tao; |
1455 | Lifelong Graph Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we bridge GNN and lifelong learning by converting a continual graph learning problem to a regular graph learning problem so GNN can inherit the lifelong learning techniques developed for convolutional neural networks (CNN). |
Chen Wang; Yuheng Qiu; Dasong Gao; Sebastian Scherer; |
1456 | Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Hypergraph-Induced Semantic Tuplet (HIST) loss for deep metric learning that leverages the multilateral semantic relations of multiple samples to multiple classes via hypergraph modeling. |
Jongin Lim; Sangdoo Yun; Seulki Park; Jin Young Choi; |
1457 | Computing Wasserstein-p Distance Between Images With Linear Cost Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel algorithm to compute the Wasserstein-p distance between discrete measures by restricting the optimal transport (OT) problem on a subset. |
Yidong Chen; Chen Li; Zhonghua Lu; |
1458 | DLFormer: Discrete Latent Transformer for Video Inpainting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While noticing the capability of discrete representation for complex reasoning and predictive learning, we propose a novel Discrete Latent Transformer (DLFormer) to reformulate video inpainting tasks into the discrete latent space rather than the previous continuous feature space. |
Jingjing Ren; Qingqing Zheng; Yuanyuan Zhao; Xuemiao Xu; Chen Li; |
1459 | Unsupervised Representation Learning for Binary Networks By Joint Classifier Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: But such networks are not readily deployable to edge devices. To accelerate deployment of models with the benefit of unsupervised representation learning to such resource limited devices for various downstream tasks, we propose a self-supervised learning method for binary networks that uses a moving target network. |
Dahyun Kim; Jonghyun Choi; |
1460 | High Quality Segmentation for Ultra High-Resolution Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by the fact that humans distinguish among objects continuously from coarse to precise levels, we propose the Continuous Refinement Model (CRM) for the ultra high-resolution segmentation refinement task. |
Tiancheng Shen; Yuechen Zhang; Lu Qi; Jason Kuen; Xingyu Xie; Jianlong Wu; Zhe Lin; Jiaya Jia; |
1461 | Investigating Tradeoffs in Real-World Video Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences containing rich textures and patterns. |
Kelvin C.K. Chan; Shangchen Zhou; Xiangyu Xu; Chen Change Loy; |
1462 | MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce MERLOT Reserve, a model that represents videos jointly over time — through a new training objective that learns from audio, subtitles, and video frames. |
Rowan Zellers; Jiasen Lu; Ximing Lu; Youngjae Yu; Yanpeng Zhao; Mohammadreza Salehi; Aditya Kusupati; Jack Hessel; Ali Farhadi; Yejin Choi; |
1463 | Differentiable Stereopsis: Meshes From Multiple Views Using Differentiable Rendering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Differentiable Stereopsis, a multi-view stereo approach that reconstructs shape and texture from few input views and noisy cameras. |
Shubham Goel; Georgia Gkioxari; Jitendra Malik; |
1464 | Towards Practical Certifiable Patch Defense With Vision Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To move towards a practical certifiable patch defense, we introduce Vision Transformer (ViT) into the framework of Derandomized Smoothing (DS). |
Zhaoyu Chen; Bo Li; Jianghe Xu; Shuang Wu; Shouhong Ding; Wenqiang Zhang; |
1465 | A Conservative Approach for Unbiased Learning on Unknown Biases Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To mitigate this issue, we present a new scenario that does not necessitate a predefined bias. Under the observation that CNNs do have multi-variant and unbiased representations in the model, we propose a conservative framework that employs this internal information for unbiased learning. |
Myeongho Jeon; Daekyung Kim; Woochul Lee; Myungjoo Kang; Joonseok Lee; |
1466 | Large-Scale Video Panoptic Segmentation in The Wild: A Benchmark Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a new large-scale dataset for the video panoptic segmentation task, which aims to assign semantic classes and track identities to all pixels in a video. |
Jiaxu Miao; Xiaohan Wang; Yu Wu; Wei Li; Xu Zhang; Yunchao Wei; Yi Yang; |
1467 | Label, Verify, Correct: A Simple Few Shot Object Detection Method Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The objective of this paper is few-shot object detection (FSOD) – the task of expanding an object detector for a new category given only a few training instances. |
Prannay Kaul; Weidi Xie; Andrew Zisserman; |
1468 | Aesthetic Text Logo Synthesis Via Content-Aware Layout Inferring Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a content-aware layout generation network which takes glyph images and their corresponding text as input and synthesizes aesthetic layouts for them automatically. |
Yizhi Wang; Guo Pu; Wenhan Luo; Yexin Wang; Pengfei Xiong; Hongwen Kang; Zhouhui Lian; |
1469 | Global Tracking Via Ensemble of Local Trackers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we combine the advantages of both strategies: tracking the target in a global view while exploiting the temporal context. |
Zikun Zhou; Jianqiu Chen; Wenjie Pei; Kaige Mao; Hongpeng Wang; Zhenyu He; |
1470 | Autoregressive Image Generation Using Residual Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we propose the two-stage framework, which consists of Residual-Quantized VAE (RQ-VAE) and RQ-Transformer, to effectively generate high-resolution images. |
Doyup Lee; Chiheon Kim; Saehoon Kim; Minsu Cho; Wook-Shin Han; |
1471 | MPC: Multi-View Probabilistic Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel unified framework for incomplete and complete MVC named multi-view probabilistic clustering (MPC). |
Junjie Liu; Junlong Liu; Shaotian Yan; Rongxin Jiang; Xiang Tian; Boxuan Gu; Yaowu Chen; Chen Shen; Jianqiang Huang; |
1472 | End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To that end, we propose a new end-to-end compressed video representation learning method for event boundary detection that leverages the rich information in the compressed domain, i.e., RGB, motion vectors, residuals, and the internal group of pictures (GOP) structure, without fully decoding the video. |
Congcong Li; Xinyao Wang; Longyin Wen; Dexiang Hong; Tiejian Luo; Libo Zhang; |
1473 | GrainSpace: A Large-Scale Dataset for Fine-Grained and Domain-Adaptive Recognition of Cereal Grains Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we formulate GAI as three ubiquitous computer vision tasks: fine-grained recognition, domain adaptation and out-of-distribution recognition. |
Lei Fan; Yiwen Ding; Dongdong Fan; Donglin Di; Maurice Pagnucco; Yang Song; |
1474 | BokehMe: When Neural Rendering Meets Classical Rendering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose BokehMe, a hybrid bokeh rendering framework that marries a neural renderer with a classical physically motivated renderer. |
Juewen Peng; Zhiguo Cao; Xianrui Luo; Hao Lu; Ke Xian; Jianming Zhang; |
1475 | Learning Modal-Invariant and Temporal-Memory for Video-Based Visible-Infrared Person Re-Identification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we primarily study the video-based cross-modal person Re-ID method. |
Xinyu Lin; Jinxing Li; Zeyu Ma; Huafeng Li; Shuang Li; Kaixiong Xu; Guangming Lu; David Zhang; |
1476 | MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Prior works either simply align the global features of an image with its associated class semantic vector or utilize unidirectional attention to learn the limited latent semantic representations, which could not effectively discover the intrinsic semantic knowledge (e.g., attribute semantics) between visual and attribute features. To solve the above dilemma, we propose a Mutually Semantic Distillation Network (MSDN), which progressively distills the intrinsic semantic representations between visual and attribute features for ZSL. |
Shiming Chen; Ziming Hong; Guo-Sen Xie; Wenhan Yang; Qinmu Peng; Kai Wang; Jian Zhao; Xinge You; |
1477 | Oriented RepPoints for Aerial Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unlike the mainstreamed approaches regressing the bounding box orientations, this paper proposes an effective adaptive points learning approach to aerial object detection by taking advantage of the adaptive points representation, which is able to capture the geometric information of the arbitrary-oriented instances. |
Wentong Li; Yijie Chen; Kaixuan Hu; Jianke Zhu; |
1478 | OccAM’s Laser: Occlusion-Based Attribution Maps for 3D Object Detectors on LiDAR Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While 3D object detection in LiDAR point clouds is well-established in academia and industry, the explainability of these models is a largely unexplored field. In this paper, we propose a method to generate attribution maps for the detected objects in order to better understand the behavior of such models. |
David Schinagl; Georg Krispel; Horst Possegger; Peter M. Roth; Horst Bischof; |
1479 | BigDatasetGAN: Synthesizing ImageNet With Pixel-Wise Annotations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here, we scale DatasetGAN to ImageNet scale of class diversity. |
Daiqing Li; Huan Ling; Seung Wook Kim; Karsten Kreis; Sanja Fidler; Antonio Torralba; |
1480 | Align Representations With Base: A New Approach to Self-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, the proposed intermediate variables are the nearest group of base vectors to representations. Hence, we call the proposed method ARB (Align Representations with Base). |
Shaofeng Zhang; Lyn Qiu; Feng Zhu; Junchi Yan; Hengrui Zhang; Rui Zhao; Hongyang Li; Xiaokang Yang; |
1481 | Exploring Denoised Cross-Video Contrast for Weakly-Supervised Temporal Action Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by recent success in unsupervised contrastive representation learning, we propose a novel denoised cross-video contrastive algorithm, aiming to enhance the feature discrimination ability of video snippets for accurate temporal action localization in the weakly-supervised setting. |
Jingjing Li; Tianyu Yang; Wei Ji; Jue Wang; Li Cheng; |
1482 | Pre-Train, Self-Train, Distill: A Simple Recipe for Supersizing 3D Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our work learns a unified model for single-view 3D reconstruction of objects from hundreds of semantic categories. |
Kalyan Vasudev Alwala; Abhinav Gupta; Shubham Tulsiani; |
1483 | Meta Distribution Alignment for Generalizable Person Re-Identification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, they do not consider the target domain information which is unavailable in the training phase of DG. To address this issue, we propose a novel Meta Distribution Alignment (MDA) method to enable them to share a similar distribution in a test-time-training fashion. |
Hao Ni; Jingkuan Song; Xiaopeng Luo; Feng Zheng; Wen Li; Heng Tao Shen; |
1484 | TeachAugment: Data Augmentation Optimization Using Teacher Knowledge Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a data augmentation optimization method based on the adversarial strategy called TeachAugment, which can produce informative transformed images to the model without requiring careful tuning by leveraging a teacher model. |
Teppei Suzuki; |
1485 | SVIP: Sequence VerIfication for Procedures in Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel sequence verification task that aims to distinguish positive video pairs performing the same action sequence from negative ones with step-level transformations but still conducting the same task. |
Yicheng Qian; Weixin Luo; Dongze Lian; Xu Tang; Peilin Zhao; Shenghua Gao; |
1486 | Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Moreover, they train their model to distinguish positive visual-language pairs from negative ones randomly collected from other videos, ignoring the highly confusing video segments within the same video. In this paper, we propose Contrastive Proposal Learning (CPL) to overcome the above limitations. |
Minghang Zheng; Yanjie Huang; Qingchao Chen; Yuxin Peng; Yang Liu; |
1487 | Low-Resource Adaptation for Personalized Co-Speech Gesture Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose an approach, named DiffGAN, that efficiently personalizes co-speech gesture generation models of a high-resource source speaker to a target speaker with just 2 minutes of target training data. |
Chaitanya Ahuja; Dong Won Lee; Louis-Philippe Morency; |
1488 | BoosterNet: Improving Domain Generalization of Deep Neural Nets Using Culpability-Ranked Features Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose BoosterNet, a lean add-on network that can be simply appended to any arbitrary core network to improve its generalization capability without requiring any changes in its architecture or training procedure. |
Nourhan Bayasi; Ghassan Hamarneh; Rafeef Garbi; |
1489 | Task-Specific Inconsistency Alignment for Domain Adaptive Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To alleviate this problem of domain shift, conventional wisdom typically concentrates solely on reducing the discrepancy between the source and target domains via attached domain classifiers, yet ignoring the difficulty of such transferable features in coping with both classification and localization subtasks in object detection. To address this issue, in this paper, we propose Task-specific Inconsistency Alignment (TIA), by developing a new alignment mechanism in separate task spaces, improving the performance of the detector on both subtasks. |
Liang Zhao; Limin Wang; |
1490 | HDR-NeRF: High Dynamic Range Neural Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present High Dynamic Range Neural Radiance Fields (HDR-NeRF) to recover an HDR radiance field from a set of low dynamic range (LDR) views with different exposures. |
Xin Huang; Qi Zhang; Ying Feng; Hongdong Li; Xuan Wang; Qing Wang; |
1491 | MS2DG-Net: Progressive Correspondence Learning Via Multiple Sparse Semantics Dynamic Graph Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most such works ignore the similar sparse semantic information between two given images and cannot capture the local topology among correspondences well. Therefore, to deal with the above problems, a Multiple Sparse Semantics Dynamic Graph Network (MS^2DG-Net) is proposed in this paper to predict the probabilities of correspondences as inliers and recover camera poses. |
Luanyuan Dai; Yizhang Liu; Jiayi Ma; Lifang Wei; Taotao Lai; Changcai Yang; Riqing Chen; |
1492 | Neural Emotion Director: Speech-Preserving Semantic Control of Facial Expressions in "In-the-Wild" Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce a novel deep learning method for photo-realistic manipulation of the emotional state of actors in "in-the-wild" videos. |
Foivos Paraperas Papantoniou; Panagiotis P. Filntisis; Petros Maragos; Anastasios Roussos; |
1493 | Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a framework for modeling interactional communication in dyadic conversations: given multimodal inputs of a speaker, we autoregressively output multiple possibilities of corresponding listener motion. |
Evonne Ng; Hanbyul Joo; Liwen Hu; Hao Li; Trevor Darrell; Angjoo Kanazawa; Shiry Ginosar; |
1494 | 3PSDF: Three-Pole Signed Distance Function for Learning Surfaces With Arbitrary Topologies Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel learnable implicit representation, called the three-pole signed distance function (3PSDF), that can represent non-watertight 3D shapes with arbitrary topologies while supporting easy field-to-mesh conversion using the classic Marching Cubes algorithm. |
Weikai Chen; Cheng Lin; Weiyang Li; Bo Yang; |
1495 | Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation From Monocular Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the existing methods mainly rely on recurrent or convolutional operation to model such temporal information, which limits the ability to capture non-local context relations of human motion. To address this problem, we propose a motion pose and shape network (MPS-Net) to effectively capture humans in motion to estimate accurate and temporally coherent 3D human pose and shape from a video. |
Wen-Li Wei; Jen-Chun Lin; Tyng-Luh Liu; Hong-Yuan Mark Liao; |
1496 | MixFormer: End-to-End Tracking With Iterative Mixed Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To simplify this pipeline and unify the process of feature extraction and target information integration, we present a compact tracking framework, termed MixFormer, built upon transformers. |
Yutao Cui; Cheng Jiang; Limin Wang; Gangshan Wu; |
1497 | Sparse Fuse Dense: Towards High Quality 3D Detection With Depth Completion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel multi-modal framework SFD (Sparse Fuse Dense), which utilizes pseudo point clouds generated from depth completion to tackle the issues mentioned above. |
Xiaopei Wu; Liang Peng; Honghui Yang; Liang Xie; Chenxi Huang; Chengqi Deng; Haifeng Liu; Deng Cai; |
1498 | GIRAFFE HD: A High-Resolution 3D-Aware Generative Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose GIRAFFE HD, a high-resolution 3D-aware generative model that inherits all of GIRAFFE’s controllable features while generating high-quality, high-resolution images (512^2 resolution and above). |
Yang Xue; Yuheng Li; Krishna Kumar Singh; Yong Jae Lee; |
1499 | InOut: Diverse Image Outpainting Via GAN Inversion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we formulate the problem from the perspective of inverting generative adversarial networks. |
Yen-Chi Cheng; Chieh Hubert Lin; Hsin-Ying Lee; Jian Ren; Sergey Tulyakov; Ming-Hsuan Yang; |
1500 | PNP: Robust Learning From Noisy Labels By Probabilistic Noise Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, these approaches focus on dividing samples by order sorting or threshold selection, inevitably introducing hyper-parameters (e.g., selection ratio / threshold) that are hard-to-tune and dataset-dependent. To this end, we propose a simple yet effective approach named PNP (Probabilistic Noise Prediction) to explicitly model label noise. |
Zeren Sun; Fumin Shen; Dan Huang; Qiong Wang; Xiangbo Shu; Yazhou Yao; Jinhui Tang; |
1501 | Estimating Structural Disparities for Face Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: On the other hand, in many scenarios noisy groupings may be obtainable using some form of a proxy, which would allow measuring disparity metrics across sub-populations. Here we explore performing such analysis on computer vision models trained on human faces, and on tasks such as face attribute prediction and affect estimation. |
Shervin Ardeshir; Cristina Segalin; Nathan Kallus; |
1502 | Revisiting The Transferability of Supervised Pretraining: An MLP Perspective Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Recent progress on unsupervised pretraining methods shows superior transfer performance to their supervised counterparts. This paper revisits this phenomenon and sheds new light on understanding the transferability gap between unsupervised and supervised pretraining from a multilayer perceptron (MLP) perspective. |
Yizhou Wang; Shixiang Tang; Feng Zhu; Lei Bai; Rui Zhao; Donglian Qi; Wanli Ouyang; |
1503 | Plenoxels: Radiance Fields Without Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Plenoxels (plenoptic voxels), a system for photorealistic view synthesis. |
Sara Fridovich-Keil; Alex Yu; Matthew Tancik; Qinhong Chen; Benjamin Recht; Angjoo Kanazawa; |
1504 | What Matters for Meta-Learning Vision Regression Tasks? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Meta-learning is widely used in few-shot classification and function regression due to its ability to quickly adapt to unseen tasks. However, it has not yet been well explored on regression tasks with high dimensional inputs such as images. This paper makes two main contributions that help understand this barely explored area. |
Ning Gao; Hanna Ziesche; Ngo Anh Vien; Michael Volpp; Gerhard Neumann; |
1505 | Knowledge-Driven Self-Supervised Representation Learning for Facial Action Unit Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the complex labeling process makes it challenging to provide AU annotations for large amounts of facial images. To remedy this, we utilize AU labeling rules defined by the Facial Action Coding System (FACS) to design a novel knowledge-driven self-supervised representation learning framework for AU recognition. |
Yanan Chang; Shangfei Wang; |
1506 | Selective-Supervised Contrastive Learning With Noisy Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To learn robust representations and handle noisy labels, we propose selective-supervised contrastive learning (Sel-CL) in this paper. |
Shikun Li; Xiaobo Xia; Shiming Ge; Tongliang Liu; |
1507 | Learning Second Order Local Anomaly for General Face Forgery Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel method to improve the generalization ability of CNN-based face forgery detectors. |
Jianwei Fei; Yunshu Dai; Peipeng Yu; Tianrun Shen; Zhihua Xia; Jian Weng; |
1508 | ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a direct adaptation strategy (ADAS), which aims to directly adapt a single model to multiple target domains in a semantic segmentation task without pretrained domain-specific models. |
Seunghun Lee; Wonhyeok Choi; Changjae Kim; Minwoo Choi; Sunghoon Im; |
1509 | The Devil Is in The Labels: Noisy Label Correction for Robust Scene Graph Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, almost all existing SGG models have overlooked the ground-truth annotation qualities of prevailing SGG datasets, i.e., they always assume: 1) all the manually annotated positive samples are equally correct; 2) all the un-annotated negative samples are absolutely background. In this paper, we argue that both assumptions are inapplicable to SGG: there are numerous "noisy" ground-truth predicate labels that break these two assumptions, and these noisy samples actually harm the training of unbiased SGG models. |
Lin Li; Long Chen; Yifeng Huang; Zhimeng Zhang; Songyang Zhang; Jun Xiao; |
1510 | LAVT: Language-Aware Vision Transformer for Referring Image Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Adopting a different approach in this work, we show that significantly better cross-modal alignments can be achieved through the early fusion of linguistic and visual features in intermediate layers of a vision Transformer encoder network. |
Zhao Yang; Jiaqi Wang; Yansong Tang; Kai Chen; Hengshuang Zhao; Philip H.S. Torr; |
1511 | SimT: Handling Open-Set Noise for Domain Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a simplex noise transition matrix (SimT) to model the mixed noise distributions in DA semantic segmentation and formulate the problem as estimation of SimT. |
Xiaoqing Guo; Jie Liu; Tongliang Liu; Yixuan Yuan; |
1512 | Interspace Pruning: Using Adaptive Filter Representations To Improve Training of Sparse CNNs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we provide mathematical evidence for IP’s superior performance and demonstrate that IP outperforms SP on all tested state-of-the-art unstructured pruning methods. |
Paul Wimmer; Jens Mehnert; Alexandru Condurache; |
1513 | PLAD: Learning To Infer Shape Programs With Pseudo-Labels and Approximate Distributions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In self-training, shapes are passed through a recognition model, which predicts programs that are treated as ‘pseudo-labels’ for those shapes. Related to these approaches, we introduce a novel self-training variant unique to program inference, where program pseudo-labels are paired with their executed output shapes, avoiding label mismatch at the cost of an approximate shape distribution. |
R. Kenny Jones; Homer Walke; Daniel Ritchie; |
1514 | PTTR: Relational 3D Point Cloud Object Tracking With Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by the success of transformers, we propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results in a coarse-to-fine manner with the help of transformer operations. |
Changqing Zhou; Zhipeng Luo; Yueru Luo; Tianrui Liu; Liang Pan; Zhongang Cai; Haiyu Zhao; Shijian Lu; |
1515 | Frequency-Driven Imperceptible Adversarial Attack on Semantic Similarity Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To circumvent the former problem, we propose a novel algorithm that attacks semantic similarity on feature representations. |
Cheng Luo; Qinliang Lin; Weicheng Xie; Bizhu Wu; Jinheng Xie; Linlin Shen; |
1516 | ZZ-Net: A Universal Rotation Equivariant Architecture for 2D Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we are concerned with rotation equivariance on 2D point cloud data. |
Georg Bökman; Fredrik Kahl; Axel Flinth; |
1517 | Video Demoireing With Relation-Based Temporal Consistency Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Considering the increasing demands for capturing videos, we study how to remove such undesirable moire patterns in videos, namely video demoireing. To this end, we introduce the first hand-held video demoireing dataset with a dedicated data collection pipeline to ensure spatial and temporal alignments of captured data. |
Peng Dai; Xin Yu; Lan Ma; Baoheng Zhang; Jia Li; Wenbo Li; Jiajun Shen; Xiaojuan Qi; |
1518 | Co-Domain Symmetry for Complex-Valued Deep Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We study complex-valued scaling as a type of symmetry natural and unique to complex-valued measurements and representations. |
Utkarsh Singhal; Yifei Xing; Stella X. Yu; |
1519 | Industrial Style Transfer With Large-Scale Geometric Warping and Content Preservation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel style transfer method to quickly create a new visual product with a nice appearance for industrial designers’ reference. |
Jinchao Yang; Fei Guo; Shuo Chen; Jun Li; Jian Yang; |
1520 | Modeling Image Composition for Complex Scene Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a method that achieves state-of-the-art results on challenging (few-shot) layout-to-image generation tasks by accurately modeling textures, structures and relationships contained in a complex scene. |
Zuopeng Yang; Daqing Liu; Chaoyue Wang; Jie Yang; Dacheng Tao; |
1521 | SS3D: Sparsely-Supervised 3D Object Detection From Point Cloud Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a sparsely-supervised 3D object detection method, named SS3D. |
Chuandong Liu; Chenqiang Gao; Fangcen Liu; Jiang Liu; Deyu Meng; Xinbo Gao; |
1522 | Remember The Difference: Cross-Domain Few-Shot Semantic Segmentation Via Meta-Memory Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The actual domain shift problem significantly reduces the performance of few-shot learning. To remedy this problem, we propose an interesting and challenging cross-domain few-shot semantic segmentation task, where the training and test tasks perform on different domains. |
Wenjian Wang; Lijuan Duan; Yuxi Wang; Qing En; Junsong Fan; Zhaoxiang Zhang; |
1523 | GRAM: Generative Radiance Manifolds for 3D-Aware Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel approach that regulates point sampling and radiance field learning on 2D manifolds, embodied as a set of learned implicit surfaces in the 3D volume. |
Yu Deng; Jiaolong Yang; Jianfeng Xiang; Xin Tong; |
1524 | UniVIP: A Unified Framework for Self-Supervised Visual Pre-Training Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the success of popular SSL methods has been limited to single-centric-object images like those in ImageNet, ignoring the correlation between the scene and its instances, as well as the semantic differences among instances in the scene. To address the above problems, we propose Unified Self-supervised Visual Pre-training (UniVIP), a novel self-supervised framework to learn versatile visual representations on either single-centric-object or non-iconic datasets. |
Zhaowen Li; Yousong Zhu; Fan Yang; Wei Li; Chaoyang Zhao; Yingying Chen; Zhiyang Chen; Jiahao Xie; Liwei Wu; Rui Zhao; Ming Tang; Jinqiao Wang; |
1525 | GraFormer: Graph-Oriented Transformer for 3D Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To better model the relations among joints for 3D pose estimation, we propose an effective yet simple network, called GraFormer, in which a novel transformer architecture is designed by embedding graph convolution layers after the multi-head attention block. |
Weixi Zhao; Weiqiang Wang; Yunjie Tian; |
1526 | Decoupling Zero-Shot Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on the decoupling formulation, we propose a simple and effective zero-shot semantic segmentation model, called ZegFormer, which outperforms the previous methods on ZS3 standard benchmarks by large margins, e.g., 22 points on the PASCAL VOC and 3 points on the COCO-Stuff in terms of mIoU for unseen classes. |
Jian Ding; Nan Xue; Gui-Song Xia; Dengxin Dai; |
1527 | Neural Collaborative Graph Machines for Table Structure Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: With the aim of filling this gap, we present a novel Neural Collaborative Graph Machines (NCGM) equipped with stacked collaborative blocks, which alternatively extracts intra-modality context and models inter-modality interactions in a hierarchical way. |
Hao Liu; Xin Li; Bing Liu; Deqiang Jiang; Yinsong Liu; Bo Ren; |
1528 | Towards Robust Vision Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we conduct systematic evaluation on components of ViTs in terms of their impact on robustness to adversarial examples, common corruptions and distribution shifts. |
Xiaofeng Mao; Gege Qi; Yuefeng Chen; Xiaodan Li; Ranjie Duan; Shaokai Ye; Yuan He; Hui Xue; |
1529 | DeepCurrents: Learning Implicit Representations of Shapes With Boundaries Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a hybrid shape representation that combines explicit boundary curves with implicit learned interiors. |
David Palmer; Dmitriy Smirnov; Stephanie Wang; Albert Chern; Justin Solomon; |
1530 | Learning Affordance Grounding From Exocentric Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Humans have the ability to transform various exocentric interactions into invariant egocentric affordances so as to counter the impact of interaction diversity. To empower an agent with such an ability, this paper proposes the task of affordance grounding from an exocentric view, i.e., given exocentric human-object interaction and egocentric object images, learning the affordance knowledge of the object and transferring it to the egocentric image using only the affordance label as supervision. |
Hongchen Luo; Wei Zhai; Jing Zhang; Yang Cao; Dacheng Tao; |
1531 | Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a method that can recognize new objects and estimate their 3D pose in RGB images even under partial occlusions. |
Van Nguyen Nguyen; Yinlin Hu; Yang Xiao; Mathieu Salzmann; Vincent Lepetit; |
1532 | Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting The Adversarial Transferability Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we treat the iterative ensemble attack as a stochastic gradient descent optimization process, in which the variance of the gradients on different models may lead to poor local optima. |
Yifeng Xiong; Jiadong Lin; Min Zhang; John E. Hopcroft; Kun He; |
1533 | Unknown-Aware Object Detection: Learning What You Don’t Know From Videos in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new unknown-aware object detection framework through Spatial-Temporal Unknown Distillation (STUD), which distills unknown objects from videos in the wild and meaningfully regularizes the model’s decision boundary. |
Xuefeng Du; Xin Wang; Gabriel Gozum; Yixuan Li; |
1534 | Multi-Modal Extreme Classification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper develops the MUFIN technique for extreme classification (XC) tasks with millions of labels where datapoints and labels are endowed with visual and textual descriptors. |
Anshul Mittal; Kunal Dahiya; Shreya Malani; Janani Ramaswamy; Seba Kuruvilla; Jitendra Ajmera; Keng-hao Chang; Sumeet Agarwal; Purushottam Kar; Manik Varma; |
1535 | IFOR: Iterative Flow Minimization for Robotic Object Rearrangement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose IFOR, Iterative Flow Minimization for Robotic Object Rearrangement, an end-to-end method for the challenging problem of object rearrangement for unknown objects given an RGBD image of the original and final scenes. |
Ankit Goyal; Arsalan Mousavian; Chris Paxton; Yu-Wei Chao; Brian Okorn; Jia Deng; Dieter Fox; |
1536 | Training-Free Transformer Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, for the first time, we investigate how to conduct TAS in a training-free manner and devise an effective training-free TAS (TF-TAS) scheme. |
Qinqin Zhou; Kekai Sheng; Xiawu Zheng; Ke Li; Xing Sun; Yonghong Tian; Jie Chen; Rongrong Ji; |
1537 | Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a unified approach to visual navigation using a novel modular transfer learning model. |
Ziad Al-Halah; Santhosh Kumar Ramakrishnan; Kristen Grauman; |
1538 | Non-Isotropy Regularization for Proxy-Based Deep Metric Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Given the inherent non-bijectiveness of the distance functions used, this can induce locally isotropic sample distributions, leading to crucial semantic context being missed due to difficulties in resolving local structures and intra-class relations between samples. To alleviate this problem, we propose non-isotropy regularization (NIR) for proxy-based Deep Metric Learning. |
Karsten Roth; Oriol Vinyals; Zeynep Akata; |
1539 | C2AM: Contrastive Learning of Class-Agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Contrastive learning for Class-agnostic Activation Map (C^2AM) generation only using unlabeled image data, without the involvement of image-level supervision. |
Jinheng Xie; Jianfeng Xiang; Junliang Chen; Xianxu Hou; Xiaodong Zhao; Linlin Shen; |
1540 | TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a mobile-friendly architecture named Token Pyramid Vision Transformer (TopFormer). |
Wenqiang Zhang; Zilong Huang; Guozhong Luo; Tao Chen; Xinggang Wang; Wenyu Liu; Gang Yu; Chunhua Shen; |
1541 | 3DAC: Learning Attribute Compression for Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Through an in-depth exploration of the relationships between different encoding steps and different attribute channels, we introduce a deep compression network, termed 3DAC, to explicitly compress the attributes of 3D point clouds and reduce storage usage in this paper. |
Guangchi Fang; Qingyong Hu; Hanyun Wang; Yiling Xu; Yulan Guo; |
1542 | Learning A Structured Latent Space for Unsupervised Point Cloud Completion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel framework, which learns a unified and structured latent space that encodes both partial and complete point clouds. |
Yingjie Cai; Kwan-Yee Lin; Chao Zhang; Qiang Wang; Xiaogang Wang; Hongsheng Li; |
1543 | The Wanderings of Odysseus in 3D Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our goal is to populate digital environments, in which digital humans have diverse body shapes, move perpetually, and have plausible body-scene contact. |
Yan Zhang; Siyu Tang; |
1544 | Few-Shot Learning With Noisy Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address mislabeled samples in FSL settings, we make several technical contributions. |
Kevin J. Liang; Samrudhdhi B. Rangrej; Vladan Petrovic; Tal Hassner; |
1545 | Understanding 3D Object Articulation in Internet Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose to investigate detecting and characterizing the 3D planar articulation of objects from ordinary RGB videos. |
Shengyi Qian; Linyi Jin; Chris Rockwell; Siyi Chen; David F. Fouhey; |
1546 | Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The limitation of visual representation is prone to causing vision-language mismatching and producing poor segmentation results. To address this, we propose a novel multi-level representation learning approach, which explores the inherent structure of the video content to provide a set of discriminative visual embedding, enabling more effective vision-language semantic alignment. |
Dongming Wu; Xingping Dong; Ling Shao; Jianbing Shen; |
1547 | Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better Than Dot-Product Self-Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel scalable and effective mixing building block called Paramixer. |
Tong Yu; Ruslan Khalitov; Lei Cheng; Zhirong Yang; |
1548 | Interactive Image Synthesis With Panoptic Layout Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: When placement of bounding boxes is subject to perturbation, layout-based models suffer from "missing regions" in the constructed semantic layouts and hence undesirable artifacts in the generated images. In this work, we propose Panoptic Layout Generative Adversarial Network (PLGAN) to address this challenge. |
Bo Wang; Tao Wu; Minfeng Zhu; Peng Du; |
1549 | Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The gap in image-to-image generation for stereo views is much smaller than that in image-to-LiDAR generation. Motivated by this, we propose a Pseudo-Stereo 3D detection framework with three novel virtual view generation methods, including image-level generation, feature-level generation, and feature-clone, for detecting 3D objects from a single image. |
Yi-Nan Chen; Hang Dai; Yong Ding; |
1550 | All-in-One Image Restoration for Unknown Corruption Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study a challenging problem in image restoration, namely, how to develop an all-in-one method that could recover images from a variety of unknown corruption types and levels. |
Boyun Li; Xiao Liu; Peng Hu; Zhongqin Wu; Jiancheng Lv; Xi Peng; |
1551 | Syntax-Aware Network for Handwritten Mathematical Expression Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network. |
Ye Yuan; Xiao Liu; Wondimu Dikubab; Hui Liu; Zhilong Ji; Zhongqin Wu; Xiang Bai; |
1552 | Sketching Without Worrying: Noise-Tolerant Sketch-Based Image Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Sketching enables many exciting applications, notably, image retrieval. The fear-to-sketch problem (i.e., I can’t sketch) has however proven to be fatal for its widespread adoption. This paper tackles this fear head on, and for the first time, proposes an auxiliary module for existing retrieval models that predominantly lets the users sketch without having to worry. |
Ayan Kumar Bhunia; Subhadeep Koley; Abdullah Faiz Ur Rahman Khilji; Aneeshan Sain; Pinaki Nath Chowdhury; Tao Xiang; Yi-Zhe Song; |
1553 | PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to explicitly integrate two matching priors in a single loss in order to learn local descriptors without supervision. |
Jérome Revaud; Vincent Leroy; Philippe Weinzaepfel; Boris Chidlovskii; |
1554 | PlanarRecon: Real-Time 3D Plane Detection and Reconstruction From Posed Monocular Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present PlanarRecon — a novel framework for globally coherent detection and reconstruction of 3D planes from a posed monocular video. |
Yiming Xie; Matheus Gadelha; Fengting Yang; Xiaowei Zhou; Huaizu Jiang; |
1555 | Deep Equilibrium Optical Flow Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: They can converge poorly and thereby suffer from performance degradation. To combat these drawbacks, we propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer (using any black-box solver), and differentiates through this fixed point analytically (thus requiring O(1) training memory). |
Shaojie Bai; Zhengyang Geng; Yash Savani; J. Zico Kolter; |
1556 | Optimizing Video Prediction Via Video Frame Interpolation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by the photo-realistic results of video frame interpolation, we present a new optimization framework for video prediction via video frame interpolation, in which we solve an extrapolation problem based on an interpolation model. |
Yue Wu; Qiang Wen; Qifeng Chen; |
1557 | Motron: Multimodal Probabilistic Human Motion Forecasting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Motron, a multimodal, probabilistic, graph-structured model that captures the multimodality of human motion using probabilistic methods while being able to output deterministic maximum-likelihood motions and corresponding confidence values for each mode. |
Tim Salzmann; Marco Pavone; Markus Ryll; |
1558 | Episodic Memory Question Answering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Towards that end, we introduce (1) a new task — Episodic Memory Question Answering (EMQA) wherein an egocentric AI assistant is provided with a video sequence (the tour) and a question as an input and is asked to localize its answer to the question within the tour, (2) a dataset of grounded questions designed to probe the agent’s spatio-temporal understanding of the tour, and (3) a model for the task that encodes the scene as an allocentric, top-down semantic feature map and grounds the question into the map to localize the answer. |
Samyak Datta; Sameer Dharur; Vincent Cartillier; Ruta Desai; Mukul Khanna; Dhruv Batra; Devi Parikh; |
1559 | Continual Stereo Matching of Continuous Driving Scenes With Growing Architecture Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address these challenges, we propose to perform continual stereo matching where a model is tasked to 1) continually learn new scenes, 2) overcome forgetting previously learned scenes, and 3) continuously predict disparities at deployment. We achieve this goal by introducing a Reusable Architecture Growth (RAG) framework. |
Chenghao Zhang; Kun Tian; Bin Fan; Gaofeng Meng; Zhaoxiang Zhang; Chunhong Pan; |
1560 | Few-Shot Backdoor Defense Using Shapley Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To determine the triggered neurons and protect against backdoor attacks, we exploit Shapley value and develop a new approach called Shapley Pruning (ShapPruning) that successfully mitigates backdoor attacks from models in a data-insufficient situation (1 image per class or even free of data). |
Jiyang Guan; Zhuozhuo Tu; Ran He; Dacheng Tao; |
1561 | Cycle-Consistent Counterfactuals By Latent Transformations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel approach, Cycle-Consistent Counterfactuals by Latent Transformations (C3LT), which learns a latent transformation that automatically generates visual CFs by steering in the latent space of generative models. |
Saeed Khorram; Li Fuxin; |
1562 | ADeLA: Automatic Dense Labeling With Attention for Viewpoint Shift in Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We describe a method to deal with performance drop in semantic segmentation caused by viewpoint changes within multi-camera systems, where temporally paired images are readily available, but the annotations may only be abundant for a few typical views. |
Hanxiang Ren; Yanchao Yang; He Wang; Bokui Shen; Qingnan Fan; Youyi Zheng; C. Karen Liu; Leonidas J. Guibas; |
1563 | Joint Hand Motion and Interaction Hotspots Prediction From Egocentric Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose to forecast future hand-object interactions given an egocentric video. |
Shaowei Liu; Subarna Tripathi; Somdeb Majumdar; Xiaolong Wang; |
1564 | Blind Face Restoration Via Integrating Face Shape and Generative Priors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose to integrate shape and generative priors to guide the challenging blind face restoration. |
Feida Zhu; Junwei Zhu; Wenqing Chu; Xinyi Zhang; Xiaozhong Ji; Chengjie Wang; Ying Tai; |
1565 | MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose MixSTE (Mixed Spatio-Temporal Encoder), which has a temporal transformer block to separately model the temporal motion of each joint and a spatial transformer block to learn inter-joint spatial correlation. |
Jinlu Zhang; Zhigang Tu; Jianyu Yang; Yujin Chen; Junsong Yuan; |
1566 | Safe-Student for Safe Deep Semi-Supervised Learning With Unseen-Class Unlabeled Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we consider the safe SSL scenario, where unseen-class instances appear in the unlabeled data. |
Rundong He; Zhongyi Han; Xiankai Lu; Yilong Yin; |
1567 | Learning To Zoom Inside Camera Imaging Pipeline Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes to zoom an image from RAW to RAW inside the camera imaging pipeline. |
Chengzhou Tang; Yuqiang Yang; Bing Zeng; Ping Tan; Shuaicheng Liu; |
1568 | High-Fidelity GAN Inversion for Image Attribute Editing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel high-fidelity generative adversarial network (GAN) inversion framework that enables attribute editing with image-specific details well-preserved (e.g., background, appearance, and illumination). |
Tengfei Wang; Yong Zhang; Yanbo Fan; Jue Wang; Qifeng Chen; |
1569 | RCP: Recurrent Closest Point for Point Cloud Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, these methods are limited by the fact that it is difficult to define a search window on point clouds because of the irregular data structure. In this paper, we avoid this irregularity by a simple yet effective method. |
Xiaodong Gu; Chengzhou Tang; Weihao Yuan; Zuozhuo Dai; Siyu Zhu; Ping Tan; |
1570 | GDNA: Towards Generative Detailed Neural Avatars Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel method that learns to generate detailed 3D shapes of people in a variety of garments with corresponding skinning weights. |
Xu Chen; Tianjian Jiang; Jie Song; Jinlong Yang; Michael J. Black; Andreas Geiger; Otmar Hilliges; |
1571 | A Dual Weighting Label Assignment Scheme for Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we explore a new weighting paradigm, termed dual weighting (DW), to specify pos and neg weights separately. |
Shuai Li; Chenhang He; Ruihuang Li; Lei Zhang; |
1572 | FAM: Visual Explanations for The Feature Representations From Deep Convolutional Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this case, employing an existing class or the similarity with another image is unable to provide a complete and reliable visual explanation. To handle this task, we propose a novel visual explanation paradigm called Feature Activation Mapping (FAM) in this paper. |
Yuxi Wu; Changhuai Chen; Jun Che; Shiliang Pu; |
1573 | Hyperbolic Vision Transformers: Combining Improvements in Metric Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: An emerging interest in learning hyperbolic data embeddings suggests that hyperbolic geometry can be beneficial for natural data. Following this line of work, we propose a new hyperbolic-based model for metric learning. |
Aleksandr Ermolov; Leyla Mirvakhabova; Valentin Khrulkov; Nicu Sebe; Ivan Oseledets; |
1574 | MaskGIT: Masked Generative Image Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a novel image synthesis paradigm using a bidirectional transformer decoder, which we term MaskGIT. |
Huiwen Chang; Han Zhang; Lu Jiang; Ce Liu; William T. Freeman; |
1575 | Revisiting The "Video" in Video-Language Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose the atemporal probe (ATP), a new model for video-language analysis which provides a stronger bound on the baseline accuracy of multimodal models constrained by image-level understanding. |
Shyamal Buch; Cristóbal Eyzaguirre; Adrien Gaidon; Jiajun Wu; Li Fei-Fei; Juan Carlos Niebles; |
1576 | Local Texture Estimator for Implicit Representation Function Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Local Texture Estimator (LTE), a dominant-frequency estimator for natural images, enabling an implicit function to capture fine details while reconstructing images in a continuous manner. |
Jaewon Lee; Kyong Hwan Jin; |
1577 | Instance-Aware Dynamic Neural Network Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to conduct low-bit quantization for each image individually, and develop a dynamic quantization scheme for exploring their optimal bit-widths. |
Zhenhua Liu; Yunhe Wang; Kai Han; Siwei Ma; Wen Gao; |
1578 | When To Prune? A Policy Towards Early Structural Pruning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We attempt to combine the benefits of both directions and propose a policy that prunes as early as possible during training without hurting performance. |
Maying Shen; Pavlo Molchanov; Hongxu Yin; Jose M. Alvarez; |
1579 | COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Recently, two-stream methods like CLIP and ALIGN with high inference efficiency have also shown promising performance, however, they only consider instance-level alignment between the two streams (thus there is still room for improvement). To overcome these limitations, we propose a novel COllaborative Two-Stream vision-language pre-training model termed COTS for image-text retrieval by enhancing cross-modal interaction. |
Haoyu Lu; Nanyi Fei; Yuqi Huo; Yizhao Gao; Zhiwu Lu; Ji-Rong Wen; |
1580 | Degree-of-Linear-Polarization-Based Color Constancy Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper shows that the degree of linear polarization dramatically solves the color constancy problem because it allows us to find achromatic pixels stably. |
Taishi Ono; Yuhi Kondo; Legong Sun; Teppei Kurita; Yusuke Moriuchi; |
1581 | A Voxel Graph CNN for Object Classification With Event Cameras Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This study aims to address the core problem of balancing accuracy and model complexity for event-based classification models. |
Yongjian Deng; Hao Chen; Hai Liu; Youfu Li; |
1582 | On The Importance of Asymmetry for Siamese Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we conduct a formal study on the importance of asymmetry by explicitly distinguishing the two encoders within the network — one produces source encodings and the other targets. |
Xiao Wang; Haoqi Fan; Yuandong Tian; Daisuke Kihara; Xinlei Chen; |
1583 | Probing Representation Forgetting in Supervised and Unsupervised Continual Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work we consider the concept of representation forgetting, observed by using the difference in performance of an optimal linear classifier before and after a new task is introduced. |
MohammadReza Davari; Nader Asadi; Sudhir Mudur; Rahaf Aljundi; Eugene Belilovsky; |
1584 | ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To tackle the modality missing problem of scene text, we propose a novel fusion token based transformer aggregation approach to exchange the necessary scene text information only through the fusion token and concentrate on the most important features in each modality. |
Mengjun Cheng; Yipeng Sun; Longchao Wang; Xiongwei Zhu; Kun Yao; Jie Chen; Guoli Song; Junyu Han; Jingtuo Liu; Errui Ding; Jingdong Wang; |
1585 | DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP. |
Yongming Rao; Wenliang Zhao; Guangyi Chen; Yansong Tang; Zheng Zhu; Guan Huang; Jie Zhou; Jiwen Lu; |
1586 | Exploring Effective Data for Surrogate Training Towards Black-Box Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a triple-player framework by introducing a discriminator into the traditional data-free framework. |
Xuxiang Sun; Gong Cheng; Hongda Li; Lei Pei; Junwei Han; |
1587 | JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce JRDB-Act, as an extension of the existing JRDB, which is captured by a social mobile manipulator and reflects a real distribution of human daily-life actions in a university campus environment. |
Mahsa Ehsanpour; Fatemeh Saleh; Silvio Savarese; Ian Reid; Hamid Rezatofighi; |
1588 | AR-NeRF: Unsupervised Learning of Depth and Defocus Effects From Natural Images With Aperture Rendering Neural Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: As an alternative to an AR-GAN, we propose an aperture rendering NeRF (AR-NeRF), which can utilize viewpoint and defocus cues in a unified manner by representing both factors in a common ray-tracing framework. |
Takuhiro Kaneko; |
1589 | Likert Scoring With Grade Decoupling for Long-Term Action Assessment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Hence, the final score should be determined by the comprehensive effect of different grades exhibited in the video. To explore this latent relationship, we design a novel Likert scoring paradigm inspired by the Likert scale in psychometrics, in which we quantify the grades explicitly and generate the final quality score by combining the quantitative values and the corresponding responses estimated from the video, instead of performing direct regression. |
Angchi Xu; Ling-An Zeng; Wei-Shi Zheng; |
1590 | Many-to-Many Splatting for Efficient Video Frame Interpolation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a fully differentiable Many-to-Many (M2M) splatting framework to interpolate frames efficiently. |
Ping Hu; Simon Niklaus; Stan Sclaroff; Kate Saenko; |
1591 | Investigating Top-k White-Box and Transferable Black-Box Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a new normalized CE loss that guides the logit to be updated in the direction of implicitly maximizing its rank distance from the ground-truth class. |
Chaoning Zhang; Philipp Benz; Adil Karjauv; Jae Won Cho; Kang Zhang; In So Kweon; |
1592 | Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although previous RGB-D-based motion recognition methods have achieved promising performance through tightly coupled multi-modal spatiotemporal representations, they still suffer from (i) optimization difficulty in small-data settings due to the tightly entangled spatiotemporal modeling; (ii) information redundancy, as such representations usually contain much marginal information that is weakly relevant to classification; and (iii) low interaction between multi-modal spatiotemporal information caused by insufficient late fusion. To alleviate these drawbacks, we propose to decouple and recouple spatiotemporal representation for RGB-D-based motion recognition. |
Benjia Zhou; Pichao Wang; Jun Wan; Yanyan Liang; Fan Wang; Du Zhang; Zhen Lei; Hao Li; Rong Jin; |
1593 | Learning To Learn By Jointly Optimizing Neural Architecture and Weights Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we aim to obtain better meta-learners by co-optimizing the architecture and meta-weights simultaneously. |
Yadong Ding; Yu Wu; Chengyue Huang; Siliang Tang; Yi Yang; Longhui Wei; Yueting Zhuang; Qi Tian; |
1594 | Attributable Visual Similarity Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images. |
Borui Zhang; Wenzhao Zheng; Jie Zhou; Jiwen Lu; |
1595 | A Self-Supervised Descriptor for Image Copy Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce SSCD, a model that builds on a recent self-supervised contrastive training objective. |
Ed Pizzi; Sreya Dutta Roy; Sugosh Nagavara Ravindra; Priya Goyal; Matthijs Douze; |
1596 | DyTox: Transformers for Continual Learning With DYnamic TOken EXpansion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a transformer architecture based on a dedicated encoder/decoder framework. |
Arthur Douillard; Alexandre Ramé; Guillaume Couairon; Matthieu Cord; |
1597 | Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Yet, the current paradigm suffers from two shortcomings: brittle under distribution shifts and inefficient for knowledge transfer. In this work, we propose to address these challenges from a causal representation perspective. |
Yuejiang Liu; Riccardo Cadei; Jonas Schweizer; Sherwin Bahmani; Alexandre Alahi; |
1598 | Manifold Learning Benefits GANs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we improve Generative Adversarial Networks by incorporating a manifold learning step into the discriminator. |
Yao Ni; Piotr Koniusz; Richard Hartley; Richard Nock; |
1599 | A Keypoint-Based Global Association Network for Lane Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Global Association Network (GANet) to formulate the lane detection problem from a new perspective, where each keypoint is directly regressed to the starting point of the lane line instead of point-by-point extension. |
Jinsheng Wang; Yinchao Ma; Shaofei Huang; Tianrui Hui; Fei Wang; Chen Qian; Tianzhu Zhang; |
1600 | Negative-Aware Attention Framework for Image-Text Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We thereby propose a novel Negative-Aware Attention Framework (NAAF), which explicitly exploits both the positive effect of matched fragments and the negative effect of mismatched fragments to jointly infer image-text similarity. |
Kun Zhang; Zhendong Mao; Quan Wang; Yongdong Zhang; |
1601 | Semantic-Aligned Fusion Transformer for One-Shot Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Yet, their performances are often unsatisfactory. In this paper, we attribute this to inappropriate correlation methods that misalign query-support semantics by overlooking spatial structures and scale variances. |
Yizhou Zhao; Xun Guo; Yan Lu; |
1602 | Beyond Supervised Vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we compare methods using performance-based benchmarks such as linear evaluation, nearest neighbor classification, and clustering for several different datasets, demonstrating the lack of a clear frontrunner within the current state-of-the-art. |
Matthew Gwilliam; Abhinav Shrivastava; |
1603 | Few-Shot Incremental Learning for Label-to-Image Translation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce a few-shot incremental learning method for label-to-image translation. |
Pei Chen; Yangkang Zhang; Zejian Li; Lingyun Sun; |
1604 | Discrete Time Convolution for Fast Event-Based Stereo Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by continuous dynamics of biological neuron models, we propose a novel encoding method for sparse events – continuous time convolution (CTC) – which learns to model the spatial feature of the data with intrinsic dynamics. |
Kaixuan Zhang; Kaiwei Che; Jianguo Zhang; Jie Cheng; Ziyang Zhang; Qinghai Guo; Luziwei Leng; |
1605 | An Image Patch Is A Wave: Phase-Aware Vision MLP Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To dynamically aggregate tokens, we propose to represent each token as a wave function with two parts, amplitude and phase. |
Yehui Tang; Kai Han; Jianyuan Guo; Chang Xu; Yanxi Li; Chao Xu; Yunhe Wang; |
1606 | Escaping Data Scarcity for High-Resolution Heterogeneous Face Hallucination Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new face hallucination paradigm for HFR, which not only enables data-efficient synthesis but also allows to scale up model training without breaking any privacy policy. |
Yiqun Mei; Pengfei Guo; Vishal M. Patel; |
1607 | Visual Acoustic Matching Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Given an image of the target environment and a waveform for the source audio, the goal is to re-synthesize the audio to match the target room acoustics as suggested by its visible geometry and materials. To address this novel task, we propose a cross-modal transformer model that uses audio-visual attention to inject visual properties into the audio and generate realistic audio output. |
Changan Chen; Ruohan Gao; Paul Calamia; Kristen Grauman; |
1608 | Shunted Self-Attention Via Multi-Scale Token Aggregation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Such a constraint inevitably limits the ability of each self-attention layer in capturing multi-scale features, thereby leading to performance degradation in handling images with multiple objects of different scales. To address this issue, we propose a novel and generic strategy, termed shunted self-attention (SSA), that allows ViTs to model the attentions at hybrid scales per attention layer. |
Sucheng Ren; Daquan Zhou; Shengfeng He; Jiashi Feng; Xinchao Wang; |
1609 | Shadows Can Be Dangerous: Stealthy and Effective Physical-World Adversarial Attack By Natural Phenomenon Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study a new type of optical adversarial examples, in which the perturbations are generated by a very common natural phenomenon, shadow, to achieve naturalistic and stealthy physical-world adversarial attack under the black-box setting. |
Yiqi Zhong; Xianming Liu; Deming Zhai; Junjun Jiang; Xiangyang Ji; |
1610 | ImplicitAtlas: Learning Deformable Shape Templates in Medical Imaging Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This is in part because large training databases do not exist and in part because biomedical annotations are often noisy. In this paper, we show that by introducing templates within the deep learning pipeline we can overcome these problems. |
Jiancheng Yang; Udaranga Wickramasinghe; Bingbing Ni; Pascal Fua; |
1611 | Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, a multivariate Gaussian mixture is proposed with means and covariances to be estimated. Then, a novel probabilistic vector quantization is utilized to effectively approximate means, and remaining covariances are further induced to a unified mixture and solved by cascaded estimation without context models involved. |
Xiaosu Zhu; Jingkuan Song; Lianli Gao; Feng Zheng; Heng Tao Shen; |
1612 | 3D Photo Stylization: Learning To Generate Stylized Novel Views From A Single Image Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we make a connection between the two, and address the challenging task of 3D photo stylization – generating stylized novel views from a single image given an arbitrary style. |
Fangzhou Mu; Jian Wang; Yicheng Wu; Yin Li; |
1613 | Improving Visual Grounding With Visual-Linguistic Verification and Iterative Reasoning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a transformer-based framework for accurate visual grounding by establishing text-conditioned discriminative features and performing multi-stage cross-modal reasoning. |
Li Yang; Yan Xu; Chunfeng Yuan; Wei Liu; Bing Li; Weiming Hu; |
1614 | Contrastive Learning for Space-Time Correspondence Via Self-Cycle Consistency Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel probabilistic method employing Bayesian Model Averaging and self-cycle regularization for spatio-temporal correspondence learning in videos within a self-supervised learning framework. |
Jeany Son; |
1615 | Learning Robust Image-Based Rendering on Sparse Scene Geometry Via Depth Completion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: When only few views are provided, the performance of these methods drops off significantly, as the scene geometry becomes sparse as well. Therefore, in this paper, we propose Sparse-IBRNet (SIBRNet) to perform robust IBR on sparse scene geometry by depth completion. |
Yuqi Sun; Shili Zhou; Ri Cheng; Weimin Tan; Bo Yan; Lang Fu; |
1616 | Scale-Equivalent Distillation for Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Further, we overcome these challenges by introducing a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance. |
Qiushan Guo; Yao Mu; Jianyu Chen; Tianqi Wang; Yizhou Yu; Ping Luo; |
1617 | Recurrent Variational Network: A Deep Learning Inverse Problem Solver Applied to The Task of Accelerated MRI Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work we present a novel Deep Learning-based Inverse Problem solver applied to the task of Accelerated MRI Reconstruction, called the Recurrent Variational Network (RecurrentVarNet), by exploiting the properties of Convolutional Recurrent Neural Networks and unrolled algorithms for solving Inverse Problems. |
George Yiasemis; Jan-Jakob Sonke; Clarisa Sánchez; Jonas Teuwen; |
1618 | SelfD: Self-Learning Large-Scale Driving Policies From The Web Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we introduce SelfD, a framework for learning scalable driving by utilizing large amounts of online monocular images. |
Jimuyang Zhang; Ruizhao Zhu; Eshed Ohn-Bar; |
1619 | "The Pedestrian Next to The Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While such an increase in error is entirely expected – localization is harder at distance – much of the drop in performance can be attributed to the cues used by current texture-based models, in particular, they make heavy use of object-ground intersections (such as shadows), which become increasingly sparse and uncertain for distant objects. In this work, we address these shortcomings in BEV-mapping by learning the spatial relationship between objects in a scene. |
Avishkar Saha; Oscar Mendez; Chris Russell; Richard Bowden; |
1620 | Attribute Group Editing for Reliable Few-Shot Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a new "editing-based" method, i.e., Attribute Group Editing (AGE), for few-shot image generation. |
Guanqi Ding; Xinzhe Han; Shuhui Wang; Shuzhe Wu; Xin Jin; Dandan Tu; Qingming Huang; |
1621 | Surpassing The Human Accuracy: Detecting Gallbladder Cancer From USG Images With Curriculum Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We explore the potential of CNN-based models for gallbladder cancer (GBC) detection from ultrasound (USG) images, as no prior study is known. |
Soumen Basu; Mayank Gupta; Pratyaksha Rana; Pankaj Gupta; Chetan Arora; |
1622 | CroMo: Cross-Modal Learning for Monocular Depth Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we systematically investigate key trade-offs associated with sensor and modality design choices as well as related model training strategies. |
Yannick Verdié; Jifei Song; Barnabé Mas; Benjamin Busam; Ales̆ Leonardis; Steven McDonagh; |
1623 | Self-Supervised Object Detection From Audio-Visual Correspondence Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We tackle the problem of learning object detectors without supervision. |
Triantafyllos Afouras; Yuki M. Asano; Francois Fagan; Andrea Vedaldi; Florian Metze; |
1624 | Autofocus for Event Cameras Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In particular, the inherent differences between event and frame data in terms of sensing modality, noise, temporal resolutions, etc., bring many challenges in designing an effective AF method for event cameras. To address these challenges, we develop a novel event-based autofocus framework consisting of an event-specific focus measure called event rate (ER) and a robust search strategy called event-based golden search (EGS). |
Shijie Lin; Yinqiang Zhang; Lei Yu; Bin Zhou; Xiaowei Luo; Jia Pan; |
1625 | Learning Multiple Adverse Weather Removal Via Two-Stage Knowledge Learning and Multi-Contrastive Regularization: Toward A Unified Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, an ill-posed problem of multiple adverse weather removal is investigated. |
Wei-Ting Chen; Zhi-Kai Huang; Cheng-Che Tsai; Hao-Hsiang Yang; Jian-Jiun Ding; Sy-Yen Kuo; |
1626 | Polymorphic-GAN: Generating Aligned Samples Across Multiple Domains With Learned Morph Maps Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we introduce a generative adversarial network that can simultaneously generate aligned image samples from multiple related domains. |
Seung Wook Kim; Karsten Kreis; Daiqing Li; Antonio Torralba; Sanja Fidler; |
1627 | Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Accordingly, we propose our defense strategy, namely Appearance and Structure Aware Robust Graph Matching (ASAR-GM). |
Qibing Ren; Qingquan Bao; Runzhong Wang; Junchi Yan; |
1628 | Super-Fibonacci Spirals: Fast, Low-Discrepancy Sampling of SO(3) Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: A comprehensive evaluation against other methods shows that the generated sets of orientations have low discrepancy, minimal spurious components in the power spectrum, and almost identical Voronoi volumes. |
Marc Alexa; |
1629 | TrackFormer: Multi-Object Tracking With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The challenging task of multi-object tracking (MOT) requires simultaneous reasoning about track initialization, identity, and spatio-temporal trajectories. We formulate this task as a frame-to-frame set prediction problem and introduce TrackFormer, an end-to-end trainable MOT approach based on an encoder-decoder Transformer architecture. |
Tim Meinhardt; Alexander Kirillov; Laura Leal-Taixé; Christoph Feichtenhofer; |
1630 | L-Verse: Bidirectional Generation Between Image and Text Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To better leverage the correlation between image and text, we propose L-Verse, a novel architecture consisting of feature-augmented variational autoencoder (AugVAE) and bidirectional auto-regressive transformer (BiART) for image-to-text and text-to-image generation. |
Taehoon Kim; Gwangmo Song; Sihaeng Lee; Sangyun Kim; Yewon Seo; Soonyoung Lee; Seung Hwan Kim; Honglak Lee; Kyunghoon Bae; |
1631 | PanopticDepth: A Unified Framework for Depth-Aware Panoptic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a unified framework for depth-aware panoptic segmentation (DPS), which aims to reconstruct the 3D scene with instance-level semantics from a single image. |
Naiyu Gao; Fei He; Jian Jia; Yanhu Shan; Haoyang Zhang; Xin Zhao; Kaiqi Huang; |
1632 | 3D Shape Reconstruction From 2D Images With Disentangled Attribute Flow Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Since the semantic attributes of a single image are usually implicit and entangled with each other, it is still challenging to reconstruct 3D shape with detailed semantic structures represented by the input image. To address this problem, we propose 3DAttriFlow to disentangle and extract semantic attributes through different semantic levels in the input images. |
Xin Wen; Junsheng Zhou; Yu-Shen Liu; Hua Su; Zhen Dong; Zhizhong Han; |
1633 | Feature Statistics Mixing Regularization for Generative Adversarial Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As a remedy, we propose feature statistics mixing regularization (FSMR) that encourages the discriminator’s prediction to be invariant to the styles of input images. |
Junho Kim; Yunjey Choi; Youngjung Uh; |
1634 | Learning To Learn and Remember Super Long Multi-Domain Task Sequence Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a simple yet effective learning to learn approach, i.e., meta optimizer, to mitigate the CF problem in SDML. |
Zhenyi Wang; Li Shen; Tiehang Duan; Donglin Zhan; Le Fang; Mingchen Gao; |
1635 | OpenTAL: Towards Open Set Temporal Action Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we, for the first time, step toward the Open Set TAL (OSTAL) problem and propose a general framework OpenTAL based on Evidential Deep Learning (EDL). |
Wentao Bao; Qi Yu; Yu Kong; |
1636 | Urban Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The goal of this work is to perform 3D reconstruction and novel view synthesis from data captured by scanning platforms commonly deployed for world mapping in urban outdoor environments (e.g., Street View). |
Konstantinos Rematas; Andrew Liu; Pratul P. Srinivasan; Jonathan T. Barron; Andrea Tagliasacchi; Thomas Funkhouser; Vittorio Ferrari; |
1637 | Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work addresses the generalizable deepfake detection from a simple principle: a generalizable representation should be sensitive to diverse types of forgeries. Following this principle, we propose to enrich the "diversity" of forgeries by synthesizing augmented forgeries with a pool of forgery configurations and strengthen the "sensitivity" to the forgeries by enforcing the model to predict the forgery configurations. |
Liang Chen; Yong Zhang; Yibing Song; Lingqiao Liu; Jue Wang; |
1638 | Domain-Agnostic Prior for Transfer Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a simple and effective mechanism that regularizes cross-domain representation learning with a domain-agnostic prior (DAP) that constrains the features extracted from source and target domains to align with a domain-agnostic space. |
Xinyue Huo; Lingxi Xie; Hengtong Hu; Wengang Zhou; Houqiang Li; Qi Tian; |
1639 | Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present MetaDOCK, a task-specific dynamic kernel selection strategy for designing compressed CNN models that generalize well on unseen tasks in meta-learning. |
Arnav Chavan; Rishabh Tiwari; Udbhav Bamba; Deepak K. Gupta; |
1640 | Ego4D: Around The World in 3,000 Hours of Egocentric Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. |
Kristen Grauman; Andrew Westbury; Eugene Byrne; Zachary Chavis; Antonino Furnari; Rohit Girdhar; Jackson Hamburger; Hao Jiang; Miao Liu; Xingyu Liu; Miguel Martin; Tushar Nagarajan; Ilija Radosavovic; Santhosh Kumar Ramakrishnan; Fiona Ryan; Jayant Sharma; Michael Wray; Mengmeng Xu; Eric Zhongcong Xu; Chen Zhao; Siddhant Bansal; Dhruv Batra; Vincent Cartillier; Sean Crane; Tien Do; Morrie Doulaty; Akshay Erapalli; Christoph Feichtenhofer; Adriano Fragomeni; Qichen Fu; Abrham Gebreselasie; Cristina González; James Hillis; Xuhua Huang; Yifei Huang; Wenqi Jia; Weslie Khoo; Jáchym Kolář; Satwik Kottur; Anurag Kumar; Federico Landini; Chao Li; Yanghao Li; Zhenqiang Li; Karttikeya Mangalam; Raghava Modhugu; Jonathan Munro; Tullie Murrell; Takumi Nishiyasu; Will Price; Paola Ruiz; Merey Ramazanova; Leda Sari; Kiran Somasundaram; Audrey Southerland; Yusuke Sugano; Ruijie Tao; Minh Vo; Yuchen Wang; Xindi Wu; Takuma Yagi; Ziwei Zhao; Yunyi Zhu; Pablo Arbeláez; David Crandall; Dima Damen; Giovanni Maria Farinella; Christian Fuegen; Bernard Ghanem; Vamsi Krishna Ithapu; C. V. Jawahar; Hanbyul Joo; Kris Kitani; Haizhou Li; Richard Newcombe; Aude Oliva; Hyun Soo Park; James M. Rehg; Yoichi Sato; Jianbo Shi; Mike Zheng Shou; Antonio Torralba; Lorenzo Torresani; Mingfei Yan; Jitendra Malik; |
1641 | Differentially Private Federated Learning With Local Regularization and Sparsification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study the cause of model performance degradation in federated learning with user-level DP guarantee. |
Anda Cheng; Peisong Wang; Xi Sheryl Zhang; Jian Cheng; |
1642 | Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Vision Transformers (ViT)s have shown great performance in self-supervised learning of global and local representations that can be transferred to downstream applications. Inspired by these results, we introduce a novel self-supervised learning framework with tailored proxy tasks for medical image analysis. |
Yucheng Tang; Dong Yang; Wenqi Li; Holger R. Roth; Bennett Landman; Daguang Xu; Vishwesh Nath; Ali Hatamizadeh; |
1643 | Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, such cross-view training samples could be unavailable under the ISolated Camera Supervised (ISCS) setting, e.g., a surveillance system deployed across distant scenes. To handle this challenging problem, a new pipeline is introduced by synthesizing the cross-camera samples in the feature space for model training. |
Chao Wu; Wenhang Ge; Ancong Wu; Xiaobin Chang; |
1644 | Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a method, W-OoD, for utilizing the hard OoDs. |
Jungbeom Lee; Seong Joon Oh; Sangdoo Yun; Junsuk Choe; Eunji Kim; Sungroh Yoon; |
1645 | Point-Level Region Contrast for Object Detection Pre-Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work we present point-level region contrast, a self-supervised pre-training approach for the task of object detection. |
Yutong Bai; Xinlei Chen; Alexander Kirillov; Alan Yuille; Alexander C. Berg; |
1646 | Upright-Net: Learning Upright Orientation for 3D Point Cloud Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose Upright-Net, a deep-learning-based approach for estimating the upright orientation of 3D point clouds. |
Xufang Pang; Feng Li; Ning Ding; Xiaopin Zhong; |
1647 | Learning Semantic Associations for Mirror Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We observe that humans tend to place mirrors in relation to certain objects for specific functional purposes, e.g., a mirror above the sink. Inspired by this observation, we propose a model to exploit the semantic associations between the mirror and its surrounding objects for a reliable mirror localization. |
Huankang Guan; Jiaying Lin; Rynson W.H. Lau; |
1648 | Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an approach to estimate arm and hand dynamics from monocular video by utilizing the relationship between arm and hand. |
Shuying Liu; Wenbin Wu; Jiaxian Wu; Yue Lin; |
1649 | Failure Modes of Domain Generalization Algorithms Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we propose an evaluation framework for domain generalization algorithms that allows decomposition of the error into components capturing distinct aspects of generalization. |
Tigran Galstyan; Hrayr Harutyunyan; Hrant Khachatrian; Greg Ver Steeg; Aram Galstyan; |
1650 | Geometric and Textural Augmentation for Domain Gap Reduction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: These new approaches improve performance, but they ignore the geometric variations in object shape that real art exhibits: artists deform and warp objects for artistic effect. Motivated by this observation, we propose a method to reduce bias by jointly increasing the texture and geometry diversities of the training data. |
Xiao-Chang Liu; Yong-Liang Yang; Peter Hall; |
1651 | Class Similarity Weighted Knowledge Distillation for Continual Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We identify a problem in CSS: A model tends to be confused between old and new classes that are visually similar, which makes it forget the old ones. To address this gap, we propose REMINDER – a new CSS framework and a novel class similarity knowledge distillation (CSW-KD) method. |
Minh Hieu Phan; The-Anh Ta; Son Lam Phung; Long Tran-Thanh; Abdesselam Bouzerdoum; |
1652 | DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From A Single Image Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present DAD-3DHeads, a dense and diverse large-scale dataset, and a robust model for 3D Dense Head Alignment in-the-wild. |
Tetiana Martyniuk; Orest Kupyn; Yana Kurlyak; Igor Krashenyi; Jiří Matas; Viktoriia Sharmanska; |
1653 | Reconstructing Surfaces for Sparse Point Clouds With On-Surface Priors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our key idea is to infer signed distances by pushing both the query projections to be on the surface and the projection distance to be the minimum. |
Baorui Ma; Yu-Shen Liu; Zhizhong Han; |
1654 | HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation Via Hybrid Contrastive Regularization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the huge labeling cost in large-scale point cloud semantic segmentation, we propose a novel hybrid contrastive regularization (HybridCR) framework in weakly-supervised setting, which obtains competitive performance compared to its fully-supervised counterpart. |
Mengtian Li; Yuan Xie; Yunhang Shen; Bo Ke; Ruizhi Qiao; Bo Ren; Shaohui Lin; Lizhuang Ma; |
1655 | Fine-Tuning Image Transformers Using Learnable Memory Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we propose augmenting Vision Transformer models with learnable memory tokens. |
Mark Sandler; Andrey Zhmoginov; Max Vladymyrov; Andrew Jackson; |
1656 | Contrastive Conditional Neural Processes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Instead, noise contrastive estimation might be able to provide more robust representations by learning distributional matching objectives to combat such inherent limitation of generative models. In light of this, we propose to equip CNPs by 1) aligning prediction with encoded ground-truth observation, and 2) decoupling meta-representation adaptation from generative reconstruction. |
Zesheng Ye; Lina Yao; |
1657 | VCLIMB: A Novel Video Class Incremental Learning Benchmark Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce vCLIMB, a novel video continual learning benchmark. |
Andrés Villa; Kumail Alhamoud; Victor Escorcia; Fabian Caba; Juan León Alcázar; Bernard Ghanem; |
1658 | Bending Reality: Distortion-Aware Transformers for Adapting to Panoramic Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Distortions and the distinct image-feature distribution in 360° panoramas impede the transfer from the annotation-rich pinhole domain and therefore come with a big dent in performance. To get around this domain difference and bring together semantic annotations from pinhole- and 360° surround-visuals, we propose to learn object deformations and panoramic image distortions in the Deformable Patch Embedding (DPE) and Deformable MLP (DMLP) components which blend into our Transformer for PAnoramic Semantic Segmentation (Trans4PASS) model. |
Jiaming Zhang; Kailun Yang; Chaoxiang Ma; Simon Reiß; Kunyu Peng; Rainer Stiefelhagen; |
1659 | Sparse and Complete Latent Organization for Geospatial Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: First, foreground objects are tiny in the remote sensing images and are represented by only a few pixels, which leads to large foreground intra-class variance and undermines the discrimination between foreground classes (an issue first considered in this work). Second, the background class contains complex context, which results in false alarms due to large background intra-class variance. To alleviate these two issues, we construct a sparse and complete latent structure via prototypes. |
Fengyu Yang; Chenyang Ma; |
1660 | Robust Equivariant Imaging: A Fully Unsupervised Framework for Learning To Image From Noisy and Partial Measurements Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Robust Equivariant Imaging (REI) framework which can learn to image from noisy partial measurements alone. |
Dongdong Chen; Julián Tachella; Mike E. Davies; |
1661 | Not All Relations Are Equal: Mining Informative Labels for Scene Graph Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Learning on trivial relations that indicate generic spatial configuration like ‘on’ instead of informative relations such as ‘parked on’ does not enforce this complex reasoning, harming generalization. To address this problem, we propose a novel framework for SGG training that exploits relation labels based on their informativeness. |
Arushi Goel; Basura Fernando; Frank Keller; Hakan Bilen; |
1662 | Learning To Detect Scene Landmarks for Camera Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a new learned camera localization technique that eliminates the need to store features or a detailed 3D point cloud. |
Tien Do; Ondrej Miksik; Joseph DeGol; Hyun Soo Park; Sudipta N. Sinha; |
1663 | INS-Conv: Incremental Sparse Convolution for Online 3D Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose INS-Conv, an INcremental Sparse Convolutional network which enables online accurate 3D semantic and instance segmentation. |
Leyao Liu; Tian Zheng; Yun-Jou Lin; Kai Ni; Lu Fang; |
1664 | ST++: Make Self-Training Work Better for Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by the impressive results, we thoroughly investigate the SDA and provide some empirical analysis. |
Lihe Yang; Wei Zhuo; Lei Qi; Yinghuan Shi; Yang Gao; |
1665 | Visual Vibration Tomography: Estimating Interior Material Properties From Monocular Video Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: An object’s interior material properties, while invisible to the human eye, determine motion observed on its surface. We propose an approach that estimates heterogeneous material properties of an object from a monocular video of its surface vibrations. |
Berthy T. Feng; Alexander C. Ogren; Chiara Daraio; Katherine L. Bouman; |
1666 | Self-Supervised Global-Local Structure Modeling for Point Cloud Domain Adaptation With Reliable Voted Pseudo Labels Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an unsupervised domain adaptation method for deep point cloud representation learning. |
Hehe Fan; Xiaojun Chang; Wanyue Zhang; Yi Cheng; Ying Sun; Mohan Kankanhalli; |
1667 | Interacting Attention Graph for Single Image Two-Hand Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present Interacting Attention Graph Hand (IntagHand), the first graph convolution based network that reconstructs two interacting hands from a single RGB image. |
Mengcheng Li; Liang An; Hongwen Zhang; Lianpeng Wu; Feng Chen; Tao Yu; Yebin Liu; |
1668 | Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To accelerate the progress of roadside perception, we present the first high-diversity challenging Roadside Perception 3D dataset- Rope3D from a novel view. |
Xiaoqing Ye; Mao Shu; Hanyu Li; Yifeng Shi; Yingying Li; Guangjie Wang; Xiao Tan; Errui Ding; |
1669 | Noisy Boundaries: Lemon or Lemonade for Semi-Supervised Instance Segmentation? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we formally address semi-supervised instance segmentation, where unlabeled images are employed to boost the performance. |
Zhenyu Wang; Yali Li; Shengjin Wang; |
1670 | Boosting View Synthesis With Residual Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a simple but effective technique to boost the rendering quality, which can be easily integrated with most view synthesis methods. |
Xuejian Rong; Jia-Bin Huang; Ayush Saraf; Changil Kim; Johannes Kopf; |
1671 | Input-Level Inductive Biases for 3D Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we tackle 3D reconstruction using a domain agnostic architecture and study how instead to inject the same type of inductive biases directly as extra inputs to the model. |
Wang Yifan; Carl Doersch; Relja Arandjelović; João Carreira; Andrew Zisserman; |
1672 | Exploring and Evaluating Image Restoration Potential in Dynamic Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, to better study an image’s potential value that can be explored for restoration, we propose a novel concept, referring to image restoration potential (IRP). |
Cheng Zhang; Shaolin Su; Yu Zhu; Qingsen Yan; Jinqiu Sun; Yanning Zhang; |
1673 | FashionVLP: Vision Language Transformer for Fashion Retrieval With Feedback Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new vision-language transformer based model, FashionVLP, that brings the prior knowledge contained in large image-text corpora to the domain of fashion image retrieval, and combines visual information from multiple levels of context to effectively capture fashion-related information. |
Sonam Goenka; Zhaoheng Zheng; Ayush Jaiswal; Rakesh Chada; Yue Wu; Varsha Hedau; Pradeep Natarajan; |
1674 | Cross-Image Relational Knowledge Distillation for Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a novel Cross-Image Relational KD (CIRKD), which focuses on transferring structured pixel-to-pixel and pixel-to-region relations across whole images. |
Chuanguang Yang; Helong Zhou; Zhulin An; Xue Jiang; Yongjun Xu; Qian Zhang; |
1675 | A-ViT: Adaptive Tokens for Efficient Vision Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce A-ViT, a method that adaptively adjusts the inference cost of vision transformer ViT for images of different complexity. |
Hongxu Yin; Arash Vahdat; Jose M. Alvarez; Arun Mallya; Jan Kautz; Pavlo Molchanov; |
1676 | Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a dual-scale graph transformer (DUET) for joint long-term action planning and fine-grained cross-modal understanding. |
Shizhe Chen; Pierre-Louis Guhur; Makarand Tapaswi; Cordelia Schmid; Ivan Laptev; |
1677 | Towards Layer-Wise Image Vectorization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose Layer-wise Image Vectorization, namely LIVE, to convert raster images to SVGs and simultaneously maintain their image topology. |
Xu Ma; Yuqian Zhou; Xingqian Xu; Bin Sun; Valerii Filev; Nikita Orlov; Yun Fu; Humphrey Shi; |
1678 | Scenic: A JAX Library for Computer Vision Research and Beyond Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Scenic is an open-source (https://github.com/google-research/scenic) JAX library with a focus on transformer-based models for computer vision research and beyond. |
Mostafa Dehghani; Alexey Gritsenko; Anurag Arnab; Matthias Minderer; Yi Tay; |
1679 | CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While ongoing research efforts are engaging these problems from various angles, in most computer vision related cases these approaches can be generalized to investigations of the effects of distribution shifts in image data. In this context, we propose to study the shifts in the learned weights of trained CNN models. |
Paul Gavrikov; Janis Keuper; |
1680 | ScePT: Scene-Consistent, Policy-Based Trajectory Predictions for Planning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present ScePT, a policy planning-based trajectory prediction model that generates accurate, scene-consistent trajectory predictions suitable for autonomous system motion planning. |
Yuxiao Chen; Boris Ivanovic; Marco Pavone; |
1681 | Calibrating Deep Neural Networks By Pairwise Constraints Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: It is well known that deep neural networks (DNNs) produce poorly calibrated estimates of class-posterior probabilities. We hypothesize that this is due to the limited calibration supervision provided by the cross-entropy loss, which places all emphasis on the probability of the true class and mostly ignores the remaining. |
Jiacheng Cheng; Nuno Vasconcelos; |
1682 | Deep Saliency Prior for Reducing Visual Distraction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present results on a variety of natural images and conduct a perceptual study to evaluate and validate the changes in viewers’ eye-gaze between the original images and our edited results. |
Kfir Aberman; Junfeng He; Yossi Gandelsman; Inbar Mosseri; David E. Jacobs; Kai Kohlhoff; Yael Pritch; Michael Rubinstein; |
1683 | Efficient Large-Scale Localization By Global Instance Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an efficient and accurate large-scale localization framework based on the recognition of buildings, which are not only discriminative for coarse localization but also robust for fine localization. |
Fei Xue; Ignas Budvytis; Daniel Olmeda Reino; Roberto Cipolla; |
1684 | Sign Language Video Retrieval With Free-Form Textual Queries Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We identify that a key bottleneck in the performance of the system is the quality of the sign video embedding which suffers from a scarcity of labelled training data. We, therefore, propose SPOT-ALIGN, a framework for interleaving iterative rounds of sign spotting and feature alignment to expand the scope and scale of available training data. |
Amanda Duarte; Samuel Albanie; Xavier Giró-i-Nieto; Gül Varol; |
1685 | Real-Time Object Detection for Streaming Perception Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While past works ignore the inevitable changes in the environment after processing, streaming perception is proposed to jointly evaluate latency and accuracy with a single metric for online video perception. In this paper, instead of searching for trade-offs between accuracy and speed like previous works, we point out that endowing real-time models with the ability to predict the future is the key to dealing with this problem. |
Jinrong Yang; Songtao Liu; Zeming Li; Xiaoping Li; Jian Sun; |
1686 | Simulated Adversarial Testing of Face Recognition Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a framework for learning how to test machine learning algorithms using simulators in an adversarial manner in order to find weaknesses in the model before deploying it in critical scenarios. |
Nataniel Ruiz; Adam Kortylewski; Weichao Qiu; Cihang Xie; Sarah Adel Bargal; Alan Yuille; Stan Sclaroff; |
1687 | VisualHow: Multimodal Problem Solving Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: With an overarching goal of developing intelligent systems to assist humans in various daily activities, we propose VisualHow, a free-form and open-ended research that focuses on understanding a real-life problem and deriving its solution by incorporating key components across multiple modalities. |
Jinhui Yang; Xianyu Chen; Ming Jiang; Shi Chen; Louis Wang; Qi Zhao; |
1688 | Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Invariant representation learning, by itself, is ill-suited to fully model the data generation process. In this paper, we show how bringing recent results on equivariant representation learning (for studying symmetries in neural networks) together with simple use of classical results on causal inference provides an effective practical solution to this problem. |
Vishnu Suresh Lokhande; Rudrasis Chakraborty; Sathya N. Ravi; Vikas Singh; |
1689 | Spatial Commonsense Graph for Object Localisation in Partial Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We solve object localisation in partial scenes, a new problem of estimating the unknown position of an object (e.g. where is the bag?) |
Francesco Giuliari; Geri Skenderi; Marco Cristani; Yiming Wang; Alessio Del Bue; |
1690 | CAT-Det: Contrastively Augmented Transformer for Multi-Modal 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, it is quite difficult to sufficiently use them, due to large inter-modal discrepancies. To address this issue, we propose a novel framework, namely Contrastively Augmented Transformer for multi-modal 3D object Detection (CAT-Det). |
Yanan Zhang; Jiaxin Chen; Di Huang; |
1691 | OSSGAN: Open-Set Semi-Supervised Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a challenging training scheme of conditional GANs, called open-set semi-supervised image generation, where the training dataset consists of two parts: (i) labeled data and (ii) unlabeled data with samples belonging to one of the labeled data classes, namely, a closed-set, and samples not belonging to any of the labeled data classes, namely, an open-set. |
Kai Katsumata; Duc Minh Vo; Hideki Nakayama; |
1692 | Lite Vision Transformer With Enhanced Self-Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Lite Vision Transformer (LVT), a novel light-weight transformer network with two enhanced self-attention mechanisms to improve the model performances for mobile deployment. |
Chenglin Yang; Yilin Wang; Jianming Zhang; He Zhang; Zijun Wei; Zhe Lin; Alan Yuille; |
1693 | Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, these assumptions do not always hold in practical applications. To tackle this problem, we propose a depth solving system that fully explores the visual clues from the subtasks in M3OD and generates multiple estimations for the depth of each target. |
Zhuoling Li; Zhan Qu; Yang Zhou; Jianzhuang Liu; Haoqian Wang; Lihui Jiang; |
1694 | NinjaDesc: Content-Concealing Visual Descriptors Via Adversarial Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In particular, we propose an adversarial learning framework for training visual descriptors that prevent image reconstruction, while maintaining the matching accuracy. |
Tony Ng; Hyo Jin Kim; Vincent T. Lee; Daniel DeTone; Tsun-Yi Yang; Tianwei Shen; Eddy Ilg; Vassileios Balntas; Krystian Mikolajczyk; Chris Sweeney; |
1695 | Physically-Guided Disentangled Implicit Rendering for 3D Face Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a novel Physically-guided Disentangled Implicit Rendering (PhyDIR) framework for high-fidelity 3D face modeling. |
Zhenyu Zhang; Yanhao Ge; Ying Tai; Weijian Cao; Renwang Chen; Kunlin Liu; Hao Tang; Xiaoming Huang; Chengjie Wang; Zhifeng Xie; Dongjin Huang; |
1696 | M5Product: Self-Harmonized Contrastive Learning for E-Commercial Multi-Modal Pretraining Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: By leveraging the natural suitability of E-commerce, where different modalities capture complementary semantic information, we contribute a large-scale multi-modal pre-training dataset M5Product. |
Xiao Dong; Xunlin Zhan; Yangxin Wu; Yunchao Wei; Michael C. Kampffmeyer; Xiaoyong Wei; Minlong Lu; Yaowei Wang; Xiaodan Liang; |
1697 | Bi-Level Alignment for Cross-Domain Crowd Counting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we aim to develop a new adversarial learning based method, which is simple and efficient to apply. |
Shenjian Gong; Shanshan Zhang; Jian Yang; Dengxin Dai; Bernt Schiele; |
1698 | ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: VFI can be extremely challenging, particularly in sequences containing large motions, occlusions or dynamic textures, where existing approaches fail to offer perceptually robust interpolation performance. In this context, we present a novel deep learning based VFI method, ST-MFNet, based on a Spatio-Temporal Multi-Flow architecture. |
Duolikun Danier; Fan Zhang; David Bull; |
1699 | Self-Supervised Super-Resolution for Multi-Exposure Push-Frame Satellites Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Modern Earth observation satellites capture multi-exposure bursts of push-frame images that can be super-resolved via computational means. In this work, we propose a super-resolution method for such multi-exposure sequences, a problem that has received very little attention in the literature. |
Ngoc Long Nguyen; Jérémy Anger; Axel Davy; Pablo Arias; Gabriele Facciolo; |
1700 | Efficient Multi-View Stereo By Iterative Dynamic Cost Volume Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel iterative dynamic cost volume for multi-view stereo. |
Shaoqian Wang; Bo Li; Yuchao Dai; |
1701 | Learning To Generate Line Drawings That Convey Geometry and Semantics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents an unpaired method for creating line drawings from photographs. |
Caroline Chan; Frédo Durand; Phillip Isola; |
1702 | On Guiding Visual Attention With Language Specification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our insight is to use high-level language specification as advice for constraining the prediction evidence to task-relevant features, instead of distractors. |
Suzanne Petryk; Lisa Dunlap; Keyan Nasseri; Joseph Gonzalez; Trevor Darrell; Anna Rohrbach; |
1703 | ReSTR: Convolution-Free Referring Image Segmentation Using Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Most existing methods for this task rely heavily on convolutional neural networks, which, however, have trouble capturing long-range dependencies between entities in the language expression and are not flexible enough for modeling interactions between the two different modalities. To address these issues, we present the first convolution-free model for referring image segmentation using transformers, dubbed ReSTR. |
Namyup Kim; Dongwon Kim; Cuiling Lan; Wenjun Zeng; Suha Kwak; |
1704 | A Graph Matching Perspective With Transformers on Video Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a graph matching-based method to formulate VIS. |
Zheyun Qin; Xiankai Lu; Xiushan Nie; Yilong Yin; Jianbing Shen; |
1705 | TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we highlight the importance of interaction in a dual-space GAN for more controllable editing. |
Yanbo Xu; Yueqin Yin; Liming Jiang; Qianyi Wu; Chengyao Zheng; Chen Change Loy; Bo Dai; Wayne Wu; |
1706 | FLAG: Flow-Based 3D Avatar Generation From Sparse Observations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While these signals are valuable, they are an incomplete representation of the human body, making it challenging to generate a faithful full-body avatar. We address this challenge by developing a flow-based generative model of the 3D human body from sparse observations, wherein we learn not only a conditional distribution of 3D human pose, but also a probabilistic mapping from observations to the latent space from which we can generate a plausible pose along with uncertainty estimates for the joints. |
Sadegh Aliakbarian; Pashmina Cameron; Federica Bogo; Andrew Fitzgibbon; Thomas J. Cashman; |
1707 | Stability-Driven Contact Reconstruction From Monocular Color Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our key idea is to reconstruct the contact pattern directly from monocular images and utilize the physical stability criterion in the simulation to drive the optimization process described above. |
Zimeng Zhao; Binghui Zuo; Wei Xie; Yangang Wang; |
1708 | Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes. |
Shu Zhang; Ran Xu; Caiming Xiong; Chetan Ramaiah; |
1709 | SGTR: End-to-End Scene Graph Generation With Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel SGG method to address the aforementioned issues, formulating the task as a bipartite graph construction problem. |
Rongjie Li; Songyang Zhang; Xuming He; |
1710 | Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present Panoptic Neural Fields (PNF), an object-aware neural scene representation that decomposes a scene into a set of objects (things) and background (stuff). |
Abhijit Kundu; Kyle Genova; Xiaoqi Yin; Alireza Fathi; Caroline Pantofaru; Leonidas J. Guibas; Andrea Tagliasacchi; Frank Dellaert; Thomas Funkhouser; |
1711 | Texture-Based Error Analysis for Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our key contribution is to leverage a texture classifier, which enables us to assign patches with semantic labels, to identify the source of SR errors both globally and locally. |
Salma Abdel Magid; Zudi Lin; Donglai Wei; Yulun Zhang; Jinjin Gu; Hanspeter Pfister; |
1712 | PILC: Practical Image Lossless Compression With An End-to-End GPU Oriented Neural Framework Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose PILC, an end-to-end image lossless compression framework that achieves 200 MB/s for both compression and decompression with a single NVIDIA Tesla V100 GPU, 10x faster than the most efficient one before. |
Ning Kang; Shanzhao Qiu; Shifeng Zhang; Zhenguo Li; Shu-Tao Xia; |
1713 | Set-Supervised Action Learning in Procedural Task Videos Via Pairwise Order Consistency Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our key observation is that videos within the same task have similar ordering of actions, which can be leveraged for effective learning. Therefore, we propose an attention-based method with a new Pairwise Ordering Consistency (POC) loss that encourages that for each common action pair in two videos of the same task, the attentions of actions follow a similar ordering. |
Zijia Lu; Ehsan Elhamifar; |
1714 | Learning To Align Sequential Actions in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an approach to align sequential actions in the wild that involve diverse temporal variations. |
Weizhe Liu; Bugra Tekin; Huseyin Coskun; Vibhav Vineet; Pascal Fua; Marc Pollefeys; |
1715 | Decoupled Knowledge Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: More importantly, we reveal that the classical KD loss is a coupled formulation, which (1) suppresses the effectiveness of NCKD and (2) limits the flexibility to balance these two parts. To address these issues, we present Decoupled Knowledge Distillation (DKD), enabling TCKD and NCKD to play their roles more efficiently and flexibly. |
Borui Zhao; Quan Cui; Renjie Song; Yiyu Qiu; Jiajun Liang; |
1716 | DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose two novel techniques: InverseAug that inverses geometric-related augmentations, e.g., rotation, to enable accurate geometric alignment between lidar points and image pixels, and LearnableAlign that leverages cross-attention to dynamically capture the correlations between image and lidar features during fusion. |
Yingwei Li; Adams Wei Yu; Tianjian Meng; Ben Caine; Jiquan Ngiam; Daiyi Peng; Junyang Shen; Yifeng Lu; Denny Zhou; Quoc V. Le; Alan Yuille; Mingxing Tan; |
1717 | Neural Volumetric Object Selection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce an approach for selecting objects in neural volumetric 3D representations, such as multi-plane images (MPI) and neural radiance fields (NeRF). |
Zhongzheng Ren; Aseem Agarwala; Bryan Russell; Alexander G. Schwing; Oliver Wang; |
1718 | GCR: Gradient Coreset Based Replay Buffer Selection for Continual Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose Gradient Coreset Replay (GCR), a novel strategy for replay buffer selection and update using a carefully designed optimization criterion. |
Rishabh Tiwari; Krishnateja Killamsetty; Rishabh Iyer; Pradeep Shenoy; |
1719 | PointCLIP: Point Cloud Understanding By CLIP Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, it remains underexplored whether CLIP, pre-trained on large-scale image-text pairs in 2D, can be generalized to 3D recognition. In this paper, we show that such a setting is feasible by proposing PointCLIP, which conducts alignment between CLIP-encoded point clouds and 3D category texts. |
Renrui Zhang; Ziyu Guo; Wei Zhang; Kunchang Li; Xupeng Miao; Bin Cui; Yu Qiao; Peng Gao; Hongsheng Li; |
1720 | NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose NeRFusion, a method that combines the advantages of NeRF and TSDF-based fusion techniques to achieve efficient large-scale reconstruction and photo-realistic rendering. |
Xiaoshuai Zhang; Sai Bi; Kalyan Sunkavalli; Hao Su; Zexiang Xu; |
1721 | DeepFace-EMD: Re-Ranking Using Patch-Wise Earth Mover’s Distance Improves Out-of-Distribution Face Identification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here, we propose a re-ranking approach that compares two faces using the Earth Mover’s Distance on the deep, spatial features of image patches. |
Hai Phan; Anh Nguyen; |
1722 | A Sampling-Based Approach for Efficient Clustering in Large Datasets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. |
Georgios Exarchakis; Omar Oubari; Gregor Lenz; |
1723 | General Facial Representation Learning in A Visual-Linguistic Manner Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study the transfer performance of pre-trained models on face analysis tasks and introduce a framework, called FaRL, for general facial representation learning. |
Yinglin Zheng; Hao Yang; Ting Zhang; Jianmin Bao; Dongdong Chen; Yangyu Huang; Lu Yuan; Dong Chen; Ming Zeng; Fang Wen; |
1724 | Deep Color Consistent Network for Low-Light Image Enhancement Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we decouple a color image into two main components, a gray image and a color histogram. |
Zhao Zhang; Huan Zheng; Richang Hong; Mingliang Xu; Shuicheng Yan; Meng Wang; |
1725 | AdaSTE: An Adaptive Straight-Through Estimator To Train Binary Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new algorithm for training deep neural networks (DNNs) with binary weights. |
Huu Le; Rasmus Kjær Høier; Che-Tsung Lin; Christopher Zach; |
1726 | Reusing The Task-Specific Classifier As A Discriminator: Discriminator-Free Adversarial Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most of these methods fail to effectively leverage the predicted discriminative information, and thus cause mode collapse in the generator. In this work, we address this problem from a different perspective and design a simple yet effective adversarial paradigm in the form of a discriminator-free adversarial learning network (DALN), wherein the category classifier is reused as a discriminator. This achieves explicit domain alignment and category discrimination through a unified objective, enabling the DALN to leverage the predicted discriminative information for sufficient feature alignment. |
Lin Chen; Huaian Chen; Zhixiang Wei; Xin Jin; Xiao Tan; Yi Jin; Enhong Chen; |
1727 | Pooling Revisited: Your Receptive Field Is Suboptimal Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Hence, we propose a simple yet effective Dynamically Optimized Pooling operation, referred to as DynOPool, which learns the optimized scale factors of feature maps end-to-end. |
Dong-Hwan Jang; Sanghyeok Chu; Joonhyuk Kim; Bohyung Han; |
1728 | Dual Task Learning By Leveraging Both Dense Correspondence and Mis-Correspondence for Robust Change Detection With Imperfect Matches Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Thus, contemporary approaches fail to show the reported performance in real-world settings. To overcome this limitation, we propose SimSaC. |
Jin-Man Park; Ue-Hwan Kim; Seon-Hoon Lee; Jong-Hwan Kim; |
1729 | Show Me What and Tell Me How: Video Synthesis Via Multimodal Conditioning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work presents a multimodal video generation framework that benefits from text and images provided jointly or separately. |
Ligong Han; Jian Ren; Hsin-Ying Lee; Francesco Barbieri; Kyle Olszewski; Shervin Minaee; Dimitris Metaxas; Sergey Tulyakov; |
1730 | Patch Slimming for Efficient Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Considering that the attention mechanism aggregates different patches layer-by-layer, we present a novel patch slimming approach that discards useless patches in a top-down paradigm. |
Yehui Tang; Kai Han; Yunhe Wang; Chang Xu; Jianyuan Guo; Chao Xu; Dacheng Tao; |
1731 | Bijective Mapping Network for Shadow Removal Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we argue that shadow removal and generation are interrelated and could provide informative supervision for each other. |
Yurui Zhu; Jie Huang; Xueyang Fu; Feng Zhao; Qibin Sun; Zheng-Jun Zha; |
1732 | End-to-End Semi-Supervised Learning for Video Action Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we focus on semi-supervised learning for video action detection, which utilizes both labeled and unlabeled data. |
Akash Kumar; Yogesh Singh Rawat; |
1733 | Causal Transportability for Visual Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, we then show that the causal effect, which severs all sources of confounding, remains invariant across domains. This motivates us to develop an algorithm to estimate the causal effect for image classification, which is transportable (i.e., invariant) across source and target environments. |
Chengzhi Mao; Kevin Xia; James Wang; Hao Wang; Junfeng Yang; Elias Bareinboim; Carl Vondrick; |
1734 | Local Attention Pyramid for Scene Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, the synthesis quality of small and less frequent object classes tends to be low. To address this, we propose a novel attention module, the Local Attention Pyramid (LAP), tailored for scene image synthesis, which encourages GANs to generate diverse object classes at high quality by explicitly spreading high attention scores to local regions, since objects in scene images are scattered over the entire image. |
Sang-Heon Shim; Sangeek Hyun; DaeHyun Bae; Jae-Pil Heo; |
1735 | Multi-Objective Diverse Human Motion Prediction With Knowledge Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we aim to design a prediction framework that can balance accuracy and diversity in sampling during the testing phase. |
Hengbo Ma; Jiachen Li; Ramtin Hosseini; Masayoshi Tomizuka; Chiho Choi; |
1736 | GridShift: A Faster Mode-Seeking Algorithm for Image Segmentation and Object Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Therefore, it is very slow for large-scale datasets. To address this issue, we propose GridShift, a faster mode-seeking algorithm principally based on MeanShift that uses a grid-based approach. |
Abhishek Kumar; Oladayo S. Ajani; Swagatam Das; Rammohan Mallipeddi; |
1737 | Confidence Propagation Cluster: Unleash Full Potential of Object Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, inspired by belief propagation (BP), we propose the Confidence Propagation Cluster (CP-Cluster) to replace NMS-based methods, which is fully parallelizable and more accurate. |
Yichun Shen; Wanli Jiang; Zhen Xu; Rundong Li; Junghyun Kwon; Siyi Li; |
1738 | Cluster-Guided Image Synthesis With Unconditional Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we focus on controllable image generation by leveraging GANs that are well-trained in an unsupervised fashion. |
Markos Georgopoulos; James Oldfield; Grigorios G. Chrysos; Yannis Panagakis; |
1739 | ISNet: Shape Matters for Infrared Small Target Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: How to detect the precise shape information of infrared targets remains challenging. In this paper, we propose a novel infrared shape network (ISNet), where a Taylor finite difference (TFD)-inspired edge block and a two-orientation attention aggregation (TOAA) block are devised to address this problem. |
Mingjin Zhang; Rui Zhang; Yuxiang Yang; Haichen Bai; Jing Zhang; Jie Guo; |
1740 | Robust Region Feature Synthesizer for Zero-Shot Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Zero-shot object detection aims at incorporating class semantic vectors to realize the detection of (both seen and) unseen classes given an unconstrained test image. In this study, we reveal the core challenges in this research area: how to synthesize robust region features (for unseen objects) that are as intra-class diverse and inter-class separable as the real samples, so that strong unseen object detectors can be trained upon them. |
Peiliang Huang; Junwei Han; De Cheng; Dingwen Zhang; |
1741 | Virtual Correspondence: Humans As A Cue for Extreme-View Geometry Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a method to find virtual correspondences based on humans in the scene. |
Wei-Chiu Ma; Anqi Joyce Yang; Shenlong Wang; Raquel Urtasun; Antonio Torralba; |
1742 | Segment, Magnify and Reiterate: Detecting Camouflaged Objects The Hard Way Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To tackle camouflaged object detection (COD), we are inspired by human attention coupled with the coarse-to-fine detection strategy, and thereby propose an iterative refinement framework, coined SegMaR, which integrates Segment, Magnify and Reiterate in a multi-stage detection fashion. |
Qi Jia; Shuilian Yao; Yu Liu; Xin Fan; Risheng Liu; Zhongxuan Luo; |
1743 | SIMBAR: Single Image-Based Scene Relighting for Effective Data Augmentation for Automated Driving Vision Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a novel image-based relighting pipeline, SIMBAR, that can work with a single image as input. |
Xianling Zhang; Nathan Tseng; Ameerah Syed; Rohan Bhasin; Nikita Jaipuria; |
1744 | Shape From Thermal Radiation: Passive Ranging Using Multi-Spectral LWIR Measurements Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new cue of depth sensing using thermal radiation. |
Yasuto Nagase; Takahiro Kushida; Kenichiro Tanaka; Takuya Funatomi; Yasuhiro Mukaigawa; |
1745 | Multi-Label Classification With Partial Annotations Using Class-Aware Selective Loss Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose to estimate the class distribution using a dedicated temporary model, and we show its improved efficiency over a naive estimation computed using the dataset’s partial annotations. |
Emanuel Ben-Baruch; Tal Ridnik; Itamar Friedman; Avi Ben-Cohen; Nadav Zamir; Asaf Noy; Lihi Zelnik-Manor; |
1746 | HSC4D: Human-Centered 4D Scene Capture in Large-Scale Indoor-Outdoor Space Using Wearable IMUs and LiDAR Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Human-centered 4D Scene Capture (HSC4D) to accurately and efficiently create a dynamic digital world, containing large-scale indoor-outdoor scenes, diverse human motions, and rich interactions between humans and environments. |
Yudi Dai; Yitai Lin; Chenglu Wen; Siqi Shen; Lan Xu; Jingyi Yu; Yuexin Ma; Cheng Wang; |
1747 | CADTransformer: Panoptic Symbol Spotting Transformer for CAD Drawings Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing methods, based on convolutional neural networks (CNNs) and/or graph neural networks (GNNs), regress instance bounding boxes in the pixel domain and then convert the predictions into symbols. In this paper, we present a novel framework named CADTransformer, which can painlessly modify existing vision transformer (ViT) backbones to tackle the above limitations for the panoptic symbol spotting task. |
Zhiwen Fan; Tianlong Chen; Peihao Wang; Zhangyang Wang; |
1748 | IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Learning to synthesize data has emerged as a promising direction in zero-shot quantization (ZSQ), which represents neural networks with low-bit integers without accessing any of the real data. In this paper, we observe an interesting phenomenon of intra-class heterogeneity in real data and show that existing methods fail to retain this property in their synthetic images, which causes a limited performance increase. |
Yunshan Zhong; Mingbao Lin; Gongrui Nan; Jianzhuang Liu; Baochang Zhang; Yonghong Tian; Rongrong Ji; |
1749 | M3L: Language-Based Video Editing Via Multi-Modal Multi-Level Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper introduces the language-based video editing (LBVE) task, which allows the model to edit, guided by text instruction, a source video into a target video. |
Tsu-Jui Fu; Xin Eric Wang; Scott T. Grafton; Miguel P. Eckstein; William Yang Wang; |
1750 | I M Avatar: Implicit Morphable Head Avatars From Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Neural volumetric representations approach photorealism but are hard to animate and do not generalize well to unseen expressions. To tackle this problem, we propose IMavatar (Implicit Morphable avatar), a novel method for learning implicit head avatars from monocular videos. |
Yufeng Zheng; Victoria Fernández Abrevaya; Marcel C. Bühler; Xu Chen; Michael J. Black; Otmar Hilliges; |
1751 | BodyMap: Learning Full-Body Dense Correspondence Map Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we present BodyMap, a new framework for obtaining high-definition full-body and continuous dense correspondence between in-the-wild images of clothed humans and the surface of a 3D template model. |
Anastasia Ianina; Nikolaos Sarafianos; Yuanlu Xu; Ignacio Rocco; Tony Tung; |
1752 | Weakly-Supervised Metric Learning With Cross-Module Communications for The Classification of Anterior Chamber Angle Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel end-to-end framework GCNet for automated Glaucoma Classification based on ACA images or other Glaucoma-related medical images. |
Jingqi Huang; Yue Ning; Dong Nie; Linan Guan; Xiping Jia; |
1753 | A Hybrid Egocentric Activity Anticipation Framework Via Memory-Augmented Recurrent and One-Shot Representation Forecasting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the limitations of current recursive prediction models arise from two aspects: (i) The vanilla recurrent units are prone to accumulated errors in relatively long periods of anticipation. (ii) The anticipated representations may be insufficient to reflect the desired semantics of the target activity, due to lack of contextual clues. To address these issues, we propose HRO, a hybrid framework that integrates both the memory-augmented recurrent and one-shot representation forecasting strategies. |
Tianshan Liu; Kin-Man Lam; |
1754 | It’s All in The Teacher: Zero-Shot Quantization Brought Closer to The Teacher Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on the observations, we propose AIT, a simple yet powerful technique for zero-shot quantization, which addresses the aforementioned two problems in the following way: AIT i) uses a KL distance loss only without a cross-entropy loss, and ii) manipulates gradients to guarantee that a certain portion of weights are properly updated after crossing the rounding thresholds. |
Kanghyun Choi; Hye Yoon Lee; Deokki Hong; Joonsang Yu; Noseong Park; Youngsok Kim; Jinho Lee; |
1755 | Improving Segmentation of The Inferior Alveolar Nerve Through Deep Label Propagation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper exploits and publicly releases a new 3D densely annotated dataset, through which we are able to train a deep label propagation model that obtains better results than those available in the literature. |
Marco Cipriano; Stefano Allegretti; Federico Bolelli; Federico Pollastri; Costantino Grana; |
1756 | A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This is because the current CNN-based methods adopt locality-based operations, which are not effective to deal with the variation caused by deformations. In this paper, we propose a CNN based Text ATTention network (TATT) to address this problem. |
Jianqi Ma; Zhetong Liang; Lei Zhang; |
1757 | Multi-Modal Dynamic Graph Transformer for Visual Grounding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Their performance depends on the density and quality of the candidate regions and is capped by the inability to optimize the located regions continuously. To address these issues, we propose to remodel VG into a progressively optimized visual semantic alignment process. |
Sijia Chen; Baochun Li; |
1758 | OSOP: A Multi-Stage One Shot Object Pose Estimation Framework Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel one-shot method for object detection and 6 DoF pose estimation that does not require training on target objects. |
Ivan Shugurov; Fu Li; Benjamin Busam; Slobodan Ilic; |
1759 | Generative Cooperative Learning for Unsupervised Video Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This problem is challenging yet rewarding as it can completely eradicate the costs of obtaining laborious annotations and enable such systems to be deployed without human intervention. To this end, we propose a novel unsupervised Generative Cooperative Learning (GCL) approach for video anomaly detection that exploits the low frequency of anomalies towards building a cross-supervision between a generator and a discriminator. |
M. Zaigham Zaheer; Arif Mahmood; M. Haris Khan; Mattia Segu; Fisher Yu; Seung-Ik Lee; |
1760 | Rethinking Semantic Segmentation: A Prototype View Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category by considering the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, this study uncovers several limitations of such a parametric segmentation regime, and proposes a nonparametric alternative based on non-learnable prototypes. |
Tianfei Zhou; Wenguan Wang; Ender Konukoglu; Luc Van Gool; |
1761 | Geometric Transformer for Fast and Robust Point Cloud Registration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Geometric Transformer to learn geometric features for robust superpoint matching. |
Zheng Qin; Hao Yu; Changjian Wang; Yulan Guo; Yuxing Peng; Kai Xu; |
1762 | Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Typically in recent work, the pseudo-labels are obtained by training a model on the labeled data, and then using confident predictions from the model to teach itself. In this work, we propose a more effective pseudo-labeling scheme, called Cross-Model Pseudo-Labeling (CMPL). |
Yinghao Xu; Fangyun Wei; Xiao Sun; Ceyuan Yang; Yujun Shen; Bo Dai; Bolei Zhou; Stephen Lin; |
1763 | UMT: Unified Multi-Modal Transformers for Joint Video Moment Retrieval and Highlight Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Nevertheless, jointly conducting moment retrieval and highlight detection is an emerging research topic, even though its component problems and some related tasks have already been studied for a while. In this paper, we present the first unified framework, named Unified Multi-modal Transformers (UMT), capable of realizing such joint optimization while also being easily degenerated for solving individual problems. |
Ye Liu; Siyuan Li; Yang Wu; Chang-Wen Chen; Ying Shan; Xiaohu Qie; |
1764 | Dual-Shutter Optical Vibration Sensing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel method for sensing vibrations at high speeds (up to 63kHz), for multiple scene sources at once, using sensors rated for only 130Hz operation. |
Mark Sheinin; Dorian Chan; Matthew O’Toole; Srinivasa G. Narasimhan; |
1765 | Demystifying The Neural Tangent Kernel From A Practical Perspective: Can It Be Trusted for Neural Architecture Search Without Training? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we revisit several at-initialization metrics that can be derived from the NTK and reveal their key shortcomings. |
Jisoo Mok; Byunggook Na; Ji-Hoon Kim; Dongyoon Han; Sungroh Yoon; |
1766 | Learning To Find Good Models in RANSAC Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose the Model Quality Network, MQ-Net in short, for predicting the quality, e.g. the pose error of essential matrices, of models generated inside RANSAC. |
Daniel Barath; Luca Cavalli; Marc Pollefeys; |
1767 | Interactiveness Field in Human-Object Interactions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce a previously overlooked interactiveness bimodal prior: given an object in an image, after pairing it with the humans, the generated pairs are either mostly non-interactive, or mostly interactive, with the former more frequent than the latter. |
Xinpeng Liu; Yong-Lu Li; Xiaoqian Wu; Yu-Wing Tai; Cewu Lu; Chi-Keung Tang; |
1768 | BodyGAN: General-Purpose Controllable Neural Human Body Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In addition, such a unimodal strategy is prone to causing severe artifacts in the generated images like color distortions and unrealistic textures. To tackle these issues, this paper proposes a multi-factor conditioned method dubbed BodyGAN. |
Chaojie Yang; Hanhui Li; Shengjie Wu; Shengkai Zhang; Haonan Yan; Nianhong Jiao; Jie Tang; Runnan Zhou; Xiaodan Liang; Tianxiang Zheng; |
1769 | Image Disentanglement Autoencoder for Steganography Without Embedding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Image DisEntanglement Autoencoder for Steganography (IDEAS) as a novel steganography without embedding (SWE) technique. |
Xiyao Liu; Ziping Ma; Junxing Ma; Jian Zhang; Gerald Schaefer; Hui Fang; |
1770 | Self-Supervised Dense Consistency Regularization for Image-to-Image Translation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a simple but effective regularization technique for improving GAN-based image-to-image translation. |
Minsu Ko; Eunju Cha; Sungjoo Suh; Huijin Lee; Jae-Joon Han; Jinwoo Shin; Bohyung Han; |
1771 | The Devil Is in The Details: Window-Based Attention for Image Compression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we first extensively study the effects of multiple kinds of attention mechanisms for local features learning, then introduce a more straightforward yet effective window-based local attention block. |
Renjie Zou; Chunfeng Song; Zhaoxiang Zhang; |
1772 | Category-Aware Transformer Network for Better Human-Object Interaction Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study how to improve transformer-based HOI detectors by initializing the Object Query with category-aware semantic information. |
Leizhen Dong; Zhimin Li; Kunlun Xu; Zhijun Zhang; Luxin Yan; Sheng Zhong; Xu Zou; |
1773 | Deep Depth From Focus With Differential Focus Volume Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a convolutional neural network (CNN) to find the best-focused pixels in a focal stack and infer depth from the focus estimation. |
Fengting Yang; Xiaolei Huang; Zihan Zhou; |
1774 | DiLiGenT102: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a new real-world photometric stereo dataset with "ground truth" normal maps, which is 10 times larger than the widely adopted one. |
Jieji Ren; Feishi Wang; Jiahao Zhang; Qian Zheng; Mingjun Ren; Boxin Shi; |
1775 | Robust Fine-Tuning of Zero-Shot Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although existing fine-tuning methods substantially improve accuracy on a given target distribution, they often reduce robustness to distribution shifts. We address this tension by introducing a simple and effective method for improving robustness while fine-tuning: ensembling the weights of the zero-shot and fine-tuned models (WiSE-FT). |
Mitchell Wortsman; Gabriel Ilharco; Jong Wook Kim; Mike Li; Simon Kornblith; Rebecca Roelofs; Raphael Gontijo Lopes; Hannaneh Hajishirzi; Ali Farhadi; Hongseok Namkoong; Ludwig Schmidt; |
1776 | Towards Data-Free Model Stealing in A Hard Label Setting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we show that it is indeed possible to steal Machine Learning models by accessing only top-1 predictions (Hard Label setting) as well, without access to model gradients (Black-Box setting) or even the training dataset (Data-Free setting) within a low query budget. |
Sunandini Sanyal; Sravanti Addepalli; R. Venkatesh Babu; |
1777 | PolyWorld: Polygonal Building Extraction With Graph Neural Networks in Satellite Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper introduces PolyWorld, a neural network that directly extracts building vertices from an image and connects them correctly to create precise polygons. |
Stefano Zorzi; Shabab Bazrafkan; Stefan Habenschuss; Friedrich Fraundorfer; |
1778 | GAT-CADNet: Graph Attention Network for Panoptic Symbol Spotting in CAD Drawings Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: By treating each CAD drawing as a graph, we propose a novel graph attention network GAT-CADNet to solve the panoptic symbol spotting problem: vertex features derived from the GAT branch are mapped to semantic labels, while their attention scores are cascaded and mapped to instance prediction. |
Zhaohua Zheng; Jianfang Li; Lingjie Zhu; Honghua Li; Frank Petzold; Ping Tan; |
1779 | Multi-Granularity Alignment Domain Adaptation for Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a unified multi-granularity alignment based object detection framework towards domain-invariant feature learning. |
Wenzhang Zhou; Dawei Du; Libo Zhang; Tiejian Luo; Yanjun Wu; |
1780 | LARGE: Latent-Based Regression Through GAN Semantics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel method for solving regression tasks using few-shot or weak supervision. |
Yotam Nitzan; Rinon Gal; Ofir Brenner; Daniel Cohen-Or; |
1781 | Are Multimodal Transformers Robust to Missing Modality? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: What surprised us is that the optimal fusion strategy is dataset dependent even for the same Transformer model; there does not exist a universal strategy that works in general cases. Based on these findings, we propose a principled method to improve the robustness of Transformer models by automatically searching for an optimal fusion strategy regarding input data. |
Mengmeng Ma; Jian Ren; Long Zhao; Davide Testuggine; Xi Peng; |
1782 | Degradation-Agnostic Correspondence From Resolution-Asymmetric Stereo Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study the problem of stereo matching from a pair of images with different resolutions, e.g., those acquired with a tele-wide camera system. |
Xihao Chen; Zhiwei Xiong; Zhen Cheng; Jiayong Peng; Yueyi Zhang; Zheng-Jun Zha; |
1783 | Fisher Information Guidance for Learned Time-of-Flight Imaging Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a Fisher-information guided framework to jointly optimize the coding functions (light modulation and sensor demodulation functions) and the reconstruction network of iToF imaging, with the supervision of the proposed discriminative Fisher loss. |
Jiaqu Li; Tao Yue; Sijie Zhao; Xuemei Hu; |
1784 | VRDFormer: End-to-End Video Visual Relation Detection With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Most previous works adopt a multi-stage framework for video visual relation detection (VidVRD), which cannot capture long-term spatiotemporal contexts in different stages and also suffers from inefficiency. In this paper, we propose a transformer-based framework called VRDFormer to unify these decoupled stages. |
Sipeng Zheng; Shizhe Chen; Qin Jin; |
1785 | Robust Federated Learning With Noisy and Heterogeneous Clients Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel solution RHFL (Robust Heterogeneous Federated Learning), which simultaneously handles the label noise and performs federated learning in a single framework. |
Xiuwen Fang; Mang Ye; |
1786 | Enabling Equivariance for Arbitrary Lie Groups Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a rigorous mathematical framework to permit invariance to any Lie group of warps, exclusively using convolutions (over Lie groups), without the need for capsules. |
Lachlan E. MacDonald; Sameera Ramasinghe; Simon Lucey; |
1787 | Unbiased Teacher V2: Semi-Supervised Object Detection for Anchor-Free and Anchor-Based Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present Unbiased Teacher v2, which shows the generalization of SS-OD method to anchor-free detectors and also introduces Listen2Student mechanism for the unsupervised regression loss. |
Yen-Cheng Liu; Chih-Yao Ma; Zsolt Kira; |
1788 | GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we show that HC can be parallelized on a GPU, achieving significant speedups of up to 56 times on polynomial benchmarks. |
Chiang-Heng Chien; Hongyi Fan; Ahmad Abdelfattah; Elias Tsigaridas; Stanimire Tomov; Benjamin Kimia; |
1789 | Learning Pixel-Level Distinctions for Video Highlight Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose to learn pixel-level distinctions to improve video highlight detection. |
Fanyue Wei; Biao Wang; Tiezheng Ge; Yuning Jiang; Wen Li; Lixin Duan; |
1790 | Noise Distribution Adaptive Self-Supervised Image Denoising Using Tweedie Distribution and Score Matching Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here, we reveal that Tweedie distributions also play key roles in modern deep learning era, leading to a distribution independent self-supervised image denoising formula without clean reference images. |
Kwanyoung Kim; Taesung Kwon; Jong Chul Ye; |
1791 | Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study efficient architecture design for real-time multi-person pose estimation on edge. |
Yihan Wang; Muyang Li; Han Cai; Wei-Ming Chen; Song Han; |
1792 | Boosting Black-Box Attack With Partially Transferred Conditional Adversarial Distribution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, due to possible differences in model architectures and training datasets between surrogate and target models, dubbed "surrogate biases", the contribution of adversarial transferability to improving the attack performance may be weakened. To tackle this issue, we propose a black-box attack method by developing a novel mechanism of adversarial transferability, which is robust to the surrogate biases. |
Yan Feng; Baoyuan Wu; Yanbo Fan; Li Liu; Zhifeng Li; Shu-Tao Xia; |
1793 | CLIPstyler: Image Style Transfer With A Single Text Condition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, in many practical situations, users may not have reference style images but still be interested in transferring styles by just imagining them. In order to deal with such applications, we propose a new framework that enables a style transfer ‘without’ a style image, but only with a text description of the desired style. |
Gihyun Kwon; Jong Chul Ye; |
1794 | Ray Priors Through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, instead of exploiting few-shot image synthesis, we study the novel view extrapolation setting that (1) the training images can well describe an object, and (2) there is a notable discrepancy between the training and test viewpoints’ distributions. |
Jian Zhang; Yuanqing Zhang; Huan Fu; Xiaowei Zhou; Bowen Cai; Jinchi Huang; Rongfei Jia; Binqiang Zhao; Xing Tang; |
1795 | Spatio-Temporal Relation Modeling for Few-Shot Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel few-shot action recognition framework, STRM, which enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations. |
Anirudh Thatipelli; Sanath Narayan; Salman Khan; Rao Muhammad Anwer; Fahad Shahbaz Khan; Bernard Ghanem; |
1796 | Pop-Out Motion: 3D-Aware Image Deformation Via Learning The Shape Laplacian Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a framework that can deform an object in a 2D image as it exists in 3D space. |
Jihyun Lee; Minhyuk Sung; Hyunjin Kim; Tae-Kyun Kim; |
1797 | Volumetric Bundle Adjustment for Online Photorealistic Scene Capture Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a system that can reconstruct photorealistic models of complex scenes in an efficient manner. |
Ronald Clark; |
1798 | Multi-Person Extreme Motion Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While this problem has recently received increasing attention, it has mostly been tackled for single humans in isolation. In this paper, we explore this problem for humans performing collaborative tasks: we seek to predict the future motion of two interacting persons given two sequences of their past skeletons. |
Wen Guo; Xiaoyu Bie; Xavier Alameda-Pineda; Francesc Moreno-Noguer; |
1799 | Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To bridge adversarial robustness and model compression, we propose a novel adversarial pruning method, Masking Adversarial Damage (MAD) that employs second-order information of adversarial loss. |
Byung-Kwan Lee; Junho Kim; Yong Man Ro; |
1800 | Channel Balancing for Accurate Quantization of Winograd Convolutions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a novel class of Winograd algorithms that balances the filter and input channels in the Winograd domain. |
Vladimir Chikin; Vladimir Kryzhanovskiy; |
1801 | RegNeRF: Regularizing Neural Radiance Fields for View Synthesis From Sparse Inputs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We observe that the majority of artifacts in sparse input scenarios are caused by errors in the estimated scene geometry, and by divergent behavior at the start of training. We address this by regularizing the geometry and appearance of patches rendered from unobserved viewpoints, and annealing the ray sampling space during training. |
Michael Niemeyer; Jonathan T. Barron; Ben Mildenhall; Mehdi S. M. Sajjadi; Andreas Geiger; Noha Radwan; |
1802 | Structured Local Radiance Fields for Human Avatar Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To learn our representation from RGB data and facilitate pose generalization, we propose to learn the node translations and the detail variations in a conditional generative latent space. |
Zerong Zheng; Han Huang; Tao Yu; Hongwen Zhang; Yandong Guo; Yebin Liu; |
1803 | Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we train a SANCE model which utilizes an auxiliary segmentation module to supplement high-level semantic information for contour training by backbone feature sharing and online label supervision. |
Jing Li; Junsong Fan; Zhaoxiang Zhang; |
1804 | Ranking-Based Siamese Visual Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Moreover, such a tracking paradigm takes only the classification confidence of proposals for the final prediction, which may yield misalignment between classification and localization. To resolve these issues, this paper proposes a ranking-based optimization algorithm to explore the relationship among different proposals. |
Feng Tang; Qiang Ling; |
1805 | Learnable Lookup Table for Neural Network Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we formulate the quantization process as a simple lookup operation and propose to learn lookup tables as quantizers. |
Longguang Wang; Xiaoyu Dong; Yingqian Wang; Li Liu; Wei An; Yulan Guo; |
1806 | SEEG: Semantic Energized Co-Speech Gesture Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel method, SEmantic Energized Generation (SEEG), for semantic-aware gesture generation. |
Yuanzhi Liang; Qianyu Feng; Linchao Zhu; Li Hu; Pan Pan; Yi Yang; |
1807 | AdaViT: Adaptive Vision Transformers for Efficient Image Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we argue that, due to the large variations among images, their need for modeling long-range dependencies between patches differs. |
Lingchen Meng; Hengduo Li; Bor-Chun Chen; Shiyi Lan; Zuxuan Wu; Yu-Gang Jiang; Ser-Nam Lim; |
1808 | Compound Domain Generalization Via Meta-Knowledge Encoding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study a practical problem of compound DG, which relaxes the discrete domain assumption to the mixed source domains setting. |
Chaoqi Chen; Jiongcheng Li; Xiaoguang Han; Xiaoqing Liu; Yizhou Yu; |
1809 | NAN: Noise-Aware NeRFs for Burst-Denoising Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We show that Neural Radiance Fields (NeRFs), originally suggested for physics-based novel-view rendering, can serve as a powerful framework for burst denoising. |
Naama Pearl; Tali Treibitz; Simon Korman; |
1810 | Physical Inertial Poser (PIP): Physics-Aware Real-Time Human Motion Tracking From Sparse Inertial Sensors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In consequence, recent state-of-the-art methods can barely handle very long period motions, and unrealistic artifacts are common due to the unawareness of physical constraints. To this end, we present the first method which combines a neural kinematics estimator and a physics-aware motion optimizer to track body motions with only 6 inertial sensors. |
Xinyu Yi; Yuxiao Zhou; Marc Habermann; Soshi Shimada; Vladislav Golyanik; Christian Theobalt; Feng Xu; |
1811 | B-DARTS: Beta-Decay Regularization for Differentiable Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, they suffer from two main issues: weak robustness to performance collapse and poor generalization ability of the searched architectures. To solve these two problems, a simple-but-efficient regularization method, termed Beta-Decay, is proposed to regularize the DARTS-based NAS searching process. |
Peng Ye; Baopu Li; Yikang Li; Tao Chen; Jiayuan Fan; Wanli Ouyang; |
1812 | Vector Quantized Diffusion Model for Text-to-Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation. |
Shuyang Gu; Dong Chen; Jianmin Bao; Fang Wen; Bo Zhang; Dongdong Chen; Lu Yuan; Baining Guo; |
1813 | CMT: Convolutional Neural Networks Meet Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, there are still gaps in both performance and computational cost between transformers and existing convolutional neural networks (CNNs). In this paper, we aim to address this issue and develop a network that can outperform not only the canonical transformers, but also the high-performance convolutional models. |
Jianyuan Guo; Kai Han; Han Wu; Yehui Tang; Xinghao Chen; Yunhe Wang; Chang Xu; |
1814 | Hyperspherical Consistency Regularization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we systematically explore the relationship between self-supervised learning and supervised learning, and study how self-supervised learning helps robust data-efficient deep learning. |
Cheng Tan; Zhangyang Gao; Lirong Wu; Siyuan Li; Stan Z. Li; |
1815 | Unsupervised Image-to-Image Translation With Generative Prior Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm. |
Shuai Yang; Liming Jiang; Ziwei Liu; Chen Change Loy; |
1816 | KNN Local Attention for Image Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, by focusing only on adjacent positions, the local attention suffers from an insufficient receptive field for image restoration. In this paper, we propose a new attention mechanism for image restoration, called the k-NN Image Transformer (KiT), that rectifies the above-mentioned limitations. |
Hunsang Lee; Hyesong Choi; Kwanghoon Sohn; Dongbo Min; |
1817 | Face Relighting With Geometrically Consistent Shadows Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel differentiable algorithm for synthesizing hard shadows based on ray tracing, which we incorporate into training our face relighting model. |
Andrew Hou; Michel Sarkis; Ning Bi; Yiying Tong; Xiaoming Liu; |
1818 | Open-Set Text Recognition Via Character-Context Decoupling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Under open-set scenarios, the intractable bias in contextual information can be passed down to visual information, consequently impairing the classification performance. In this paper, a Character-Context Decoupling framework is proposed to alleviate this problem by separating contextual information and character-visual information. |
Chang Liu; Chun Yang; Xu-Cheng Yin; |
1819 | Multi-Marginal Contrastive Learning for Multi-Label Subcellular Protein Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a deep protein subcellular localization method with multi-marginal contrastive learning to perceive the same PSLs in different tissue images and different PSLs within the same tissue image. |
Ziyi Liu; Zengmao Wang; Bo Du; |
1820 | Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Probabilistic Warp Consistency, a weakly-supervised learning objective for semantic matching. |
Prune Truong; Martin Danelljan; Fisher Yu; Luc Van Gool; |
1821 | Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered By Pre-Trained Vision-Language Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel framework, i.e., Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation that requires little manual annotation while being applicable to a wide variety of manipulations. |
Zipeng Xu; Tianwei Lin; Hao Tang; Fu Li; Dongliang He; Nicu Sebe; Radu Timofte; Luc Van Gool; Errui Ding; |
1822 | Optimizing Elimination Templates By Greedy Parameter Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new method for constructing elimination templates for efficient polynomial system solving of minimal problems in structure from motion, image matching, and camera tracking. |
Evgeniy Martyushev; Jana Vráblíková; Tomas Pajdla; |
1823 | TransMix: Attend To Mix for Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This may lead to a strange phenomenon: sometimes there is no valid object in the mixed image due to the random augmentation process, yet there is still a response in the label space. To bridge such a gap between the input and label spaces, we propose TransMix, which mixes labels based on the attention maps of Vision Transformers. |
Jie-Neng Chen; Shuyang Sun; Ju He; Philip H.S. Torr; Alan Yuille; Song Bai; |
1824 | HOP: History-and-Order Aware Pre-Training for Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, to promote the learning of spatio-temporal visual-textual correspondence as well as the agent’s capability of decision making, we propose a novel history-and-order aware pre-training paradigm (HOP) with VLN-specific objectives that exploit the past observations and support future action prediction. |
Yanyuan Qiao; Yuankai Qi; Yicong Hong; Zheng Yu; Peng Wang; Qi Wu; |
1825 | Inertia-Guided Flow Completion and Style Fusion for Video Inpainting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Nevertheless, the existing flow-guided cross-frame warping methods fail to consider the lighting and sharpness variation across video frames, which leads to spatial incoherence after warping from other frames. To alleviate this problem, we propose the Adaptive Style Fusion Network (ASFN), which utilizes the style information extracted from the valid regions to guide the gradient refinement in the warped regions. |
Kaidong Zhang; Jingjing Fu; Dong Liu; |
1826 | RU-Net: Regularized Unrolling Network for Scene Graph Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing SGG methods usually suffer from several issues, including 1) ambiguous object representations, as graph neural network-based message passing (GMP) modules are typically sensitive to spurious inter-node correlations, and 2) low diversity in relationship predictions due to severe class imbalance and a large number of missing annotations. To address both problems, in this paper, we propose a regularized unrolling network (RU-Net). |
Xin Lin; Changxing Ding; Jing Zhang; Yibing Zhan; Dacheng Tao; |
1827 | Long-Tailed Visual Recognition Via Gaussian Clouded Logit Adjustment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: It is unfavorable for training on balanced data, but can be utilized to adjust the validity of the samples in long-tailed data, thereby solving the distorted embedding space of long-tailed problems. To this end, this paper proposes the Gaussian clouded logit adjustment by Gaussian perturbation of different class logits with varied amplitude. |
Mengke Li; Yiu-ming Cheung; Yang Lu; |
1828 | Image Animation With Perturbed Masks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel approach for image-animation of a source image by a driving video, both depicting the same type of object. |
Yoav Shalev; Lior Wolf; |
1829 | Exploring The Equivalence of Siamese Self-Supervised Learning Via A Unified Gradient Framework Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The final accuracy numbers also vary, where different networks and tricks are utilized in different works. In this work, we demonstrate that these methods can be unified into the same form. |
Chenxin Tao; Honghui Wang; Xizhou Zhu; Jiahua Dong; Shiji Song; Gao Huang; Jifeng Dai; |
1830 | Point Density-Aware Voxels for LiDAR 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our proposed solution, Point Density-Aware Voxel network (PDV), is an end-to-end two stage LiDAR 3D object detection architecture that is designed to account for these point density variations. |
Jordan S. K. Hu; Tianshu Kuai; Steven L. Waslander; |
1831 | Integrating Language Guidance Into Vision-Based Deep Metric Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This causes learned embedding spaces to encode incomplete semantic context and misrepresent the semantic relation between classes, impacting the generalizability of the learned metric space. To tackle this issue, we propose a language guidance objective for visual similarity learning. |
Karsten Roth; Oriol Vinyals; Zeynep Akata; |
1832 | PartGlot: Learning Shape Part Segmentation From Language Reference Games Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce PartGlot, a neural framework and associated architectures for learning semantic part segmentation of 3D shape geometry, based solely on part referential language. |
Juil Koo; Ian Huang; Panos Achlioptas; Leonidas J. Guibas; Minhyuk Sung; |
1833 | Domain Generalization Via Shuffled Style Assembly for Face Anti-Spoofing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we separate the complete representation into content and style ones. |
Zhuo Wang; Zezheng Wang; Zitong Yu; Weihong Deng; Jiahong Li; Tingting Gao; Zhongyuan Wang; |
1834 | A Simple Episodic Linear Probe Improves Visual Recognition in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an episodic linear probing (ELP) classifier to reflect the generalization of visual representations in an online manner. |
Yuanzhi Liang; Linchao Zhu; Xiaohan Wang; Yi Yang; |
1835 | Matching Feature Sets for Few-Shot Image Classification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Few-shot classification methods also mostly follow this trend. In this work, we depart from this established direction and instead propose to extract sets of feature vectors for each image. |
Arman Afrasiyabi; Hugo Larochelle; Jean-François Lalonde; Christian Gagné; |
1836 | DIVeR: Real-Time and Accurate Neural Radiance Fields With Deterministic Integration for Volume Rendering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: DIVeR builds on the key ideas of NeRF and its variants — density models and volume rendering — to learn 3D object models that can be rendered realistically from small numbers of images. |
Liwen Wu; Jae Yong Lee; Anand Bhattad; Yu-Xiong Wang; David Forsyth; |
1837 | Enhancing Classifier Conservativeness and Robustness By Polynomiality Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We illustrate the detrimental effect, such as overconfident decisions, that exponential behavior can have in methods like classical LDA and logistic regression. We then show how polynomiality can remedy the situation. |
Ziqi Wang; Marco Loog; |
1838 | Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Differently from existing methods, which are purely based on deep learning, we take inspiration from traditional spectral segmentation methods by reframing image decomposition as a graph partitioning problem. |
Luke Melas-Kyriazi; Christian Rupprecht; Iro Laina; Andrea Vedaldi; |
1839 | OcclusionFusion: Occlusion-Aware Motion Estimation for Real-Time Dynamic 3D Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This problem is even more severe for single-view-based systems due to strong occlusions. Based on these observations, we propose OcclusionFusion, a novel method to calculate occlusion-aware 3D motion to guide the reconstruction. |
Wenbin Lin; Chengwei Zheng; Jun-Hai Yong; Feng Xu; |
1840 | ContIG: Self-Supervised Multimodal Contrastive Learning for Medical Imaging With Genetics Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose ContIG, a self-supervised method that can learn from large datasets of unlabeled medical images and genetic data. |
Aiham Taleb; Matthias Kirchler; Remo Monti; Christoph Lippert; |
1841 | Revisiting Domain Generalized Stereo Matching Networks From A Feature Consistency Perspective Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We argue that maintaining feature consistency between matching pixels is a vital factor for promoting the generalization capability of stereo matching networks, which has not been adequately considered. Here we address this issue by proposing a simple pixel-wise contrastive learning scheme across viewpoints. |
Jiawei Zhang; Xiang Wang; Xiao Bai; Chen Wang; Lei Huang; Yimin Chen; Lin Gu; Jun Zhou; Tatsuya Harada; Edwin R. Hancock; |
1842 | MonoScene: Monocular 3D Semantic Scene Completion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Along with architectural contributions, we introduce novel global scene and local frustums losses. |
Anh-Quan Cao; Raoul de Charette; |
1843 | TubeFormer-DeepLab: Video Mask Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present TubeFormer-DeepLab, the first attempt to tackle multiple core video segmentation tasks in a unified manner. |
Dahun Kim; Jun Xie; Huiyu Wang; Siyuan Qiao; Qihang Yu; Hong-Seok Kim; Hartwig Adam; In So Kweon; Liang-Chieh Chen; |
1844 | XMP-Font: Self-Supervised Cross-Modality Pre-Training for Few-Shot Font Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, these few-shot font generation methods either fail to capture content-independent style representations, or employ localized component-wise style representations, which is insufficient to model many Chinese font styles that involve hyper-component features such as inter-component spacing and "connected-stroke". To resolve these drawbacks and make the style representations more reliable, we propose a self-supervised cross-modality pre-training strategy and a cross-modality transformer-based encoder that is conditioned jointly on the glyph image and the corresponding stroke labels. |
Wei Liu; Fangyue Liu; Fei Ding; Qian He; Zili Yi; |
1845 | Disentangling Visual and Written Concepts in CLIP Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The CLIP network measures the similarity between natural text and images; in this work, we investigate the entanglement of the representation of word images and natural images in its image encoder. |
Joanna Materzyńska; Antonio Torralba; David Bau; |
1846 | Gradient-SDF: A Semi-Implicit Surface Representation for 3D Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Gradient-SDF, a novel representation for 3D geometry that combines the advantages of implicit and explicit representations. |
Christiane Sommer; Lu Sang; David Schubert; Daniel Cremers; |
1847 | Bilateral Video Magnification Filter Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To enhance EVM performance on real videos, this paper proposes a bilateral video magnification filter (BVMF) that offers simple yet robust temporal filtering. |
Shoichiro Takeda; Kenta Niwa; Mariko Isogawa; Shinya Shimizu; Kazuki Okami; Yushi Aono; |
1848 | AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work reformulates the training of AdaFocus as a simple one-stage algorithm by introducing a differentiable interpolation-based patch selection operation, enabling efficient end-to-end optimization. |
Yulin Wang; Yang Yue; Yuanze Lin; Haojun Jiang; Zihang Lai; Victor Kulikov; Nikita Orlov; Humphrey Shi; Gao Huang; |
1849 | Localization Distillation for Dense Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, by reformulating the knowledge distillation process on localization, we present a novel localization distillation (LD) method which can efficiently transfer the localization knowledge from the teacher to the student. |
Zhaohui Zheng; Rongguang Ye; Ping Wang; Dongwei Ren; Wangmeng Zuo; Qibin Hou; Ming-Ming Cheng; |
1850 | What’s in Your Hands? 3D Reconstruction of Generic Objects in Hands Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our key insight is that hand articulation is highly predictive of the object shape, and we propose an approach that conditionally reconstructs the object based on the articulation and the visual input. |
Yufei Ye; Abhinav Gupta; Shubham Tulsiani; |
1851 | Continuous Scene Representations for Embodied AI Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Continuous Scene Representations (CSR), a scene representation constructed by an embodied agent navigating within a space, where objects and their relationships are modeled by continuous valued embeddings. |
Samir Yitzhak Gadre; Kiana Ehsani; Shuran Song; Roozbeh Mottaghi; |
1852 | Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, beyond 3D Siamese tracking, we introduce a motion-centric paradigm to handle 3D SOT from a new perspective. |
Chaoda Zheng; Xu Yan; Haiming Zhang; Baoyuan Wang; Shenghui Cheng; Shuguang Cui; Zhen Li; |
1853 | Neural Mean Discrepancy for Efficient Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Based upon this observation, we propose a novel metric called Neural Mean Discrepancy (NMD), which compares neural means of the input examples and training data. Leveraging the simplicity of NMD, we propose an efficient OOD detector that computes neural means by a standard forward pass followed by a lightweight classifier. |
Xin Dong; Junfeng Guo; Ang Li; Wei-Te Ting; Cong Liu; H.T. Kung; |
1854 | Non-Probability Sampling Network for Stochastic Human Trajectory Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we analyze the problem by reconstructing and comparing probabilistic distributions from prediction samples and socially-acceptable paths, respectively. |
Inhwan Bae; Jin-Hwi Park; Hae-Gon Jeon; |
1855 | Marginal Contrastive Correspondence for Guided Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We design a Marginal Contrastive Learning Network (MCL-Net) that explores contrastive learning to learn domain-invariant features for realistic exemplar-based image translation. |
Fangneng Zhan; Yingchen Yu; Rongliang Wu; Jiahui Zhang; Shijian Lu; Changgong Zhang; |
1856 | Complex Backdoor Detection By Symmetric Feature Differencing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We develop a new detection method. It first uses a trigger inversion technique to generate triggers, namely, universal input patterns flipping victim class samples to a target class. It then checks if any such trigger is composed of features that are not natural distinctive features between the victim and target classes. |
Yingqi Liu; Guangyu Shen; Guanhong Tao; Zhenting Wang; Shiqing Ma; Xiangyu Zhang; |
1857 | Time Lens++: Event-Based Frame Interpolation With Parametric Non-Linear Flow and Multi-Scale Fusion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Moreover, previous methods were only tested on datasets consisting of planar and far-away scenes, which do not capture the full complexity of the real world. In this work, we address the above problems by introducing multi-scale feature-level fusion and computing one-shot non-linear inter-frame motion—which can be efficiently sampled for image warping—from events and images. |
Stepan Tulyakov; Alfredo Bochicchio; Daniel Gehrig; Stamatios Georgoulis; Yuanyou Li; Davide Scaramuzza; |
1858 | ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: So we propose ResSFL, a Split Federated Learning Framework that is designed to be MI-resistant during training. |
Jingtao Li; Adnan Siraj Rakin; Xing Chen; Zhezhi He; Deliang Fan; Chaitali Chakrabarti; |
1859 | RecDis-SNN: Rectifying Membrane Potential Distribution for Directly Training Spiking Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Such undesired shifts would prevent the SNN from performing well and going deep. To tackle these problems, we attempt to rectify the membrane potential distribution (MPD) by designing a novel distribution loss, MPD-Loss, which can explicitly penalize the undesired shifts without introducing any additional operations in the inference phase. |
Yufei Guo; Xinyi Tong; Yuanpei Chen; Liwen Zhang; Xiaode Liu; Zhe Ma; Xuhui Huang; |
1860 | Human-Aware Object Placement for Visual Environment Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our key idea is that, as a person moves through a scene and interacts with it, we accumulate HSIs across multiple input images, and use these in optimizing the 3D scene to reconstruct a consistent, physically plausible, 3D scene layout. |
Hongwei Yi; Chun-Hao P. Huang; Dimitrios Tzionas; Muhammed Kocabas; Mohamed Hassan; Siyu Tang; Justus Thies; Michael J. Black; |
1861 | X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Common text-agnostic aggregation schemes include mean-pooling or self-attention over the frames, but these are likely to encode misleading visual information not described in the given text. To address this, we propose a cross-modal attention model called X-Pool that reasons between a text and the frames of a video. |
Satya Krishna Gorti; Noël Vouitsis; Junwei Ma; Keyvan Golestan; Maksims Volkovs; Animesh Garg; Guangwei Yu; |
1862 | Learning of Global Objective for Network Flow in Multi-Object Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Most previous studies focus on learning the cost function by only taking into account two frames during training; therefore, the learned cost function is sub-optimal for MCF where a multi-frame data association must be considered during inference. In order to address this problem, in this paper we propose a novel differentiable framework that ties training and inference together during learning by solving a bi-level optimization problem, where the lower-level solves a linear program and the upper-level contains a loss function that incorporates the global tracking result. |
Shuai Li; Yu Kong; Hamid Rezatofighi; |
1863 | Towards Weakly-Supervised Text Spotting Using A Multi-Task Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting and the first text spotting framework which may be trained with both fully- and weakly-supervised settings. |
Yair Kittenplon; Inbal Lavi; Sharon Fogel; Yarin Bar; R. Manmatha; Pietro Perona; |
1864 | Gated2Gated: Self-Supervised Depth Estimation From Gated Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although existing methods have shown that it is possible to decode high-resolution depth from such measurements, these methods require synchronized and calibrated LiDAR to supervise the gated depth decoder – prohibiting fast adoption across geographies, training on large unpaired datasets, and exploring alternative applications outside of automotive use cases. In this work, we fill this gap and propose an entirely self-supervised depth estimation method that uses gated intensity profiles and temporal consistency as a training signal. |
Amanpreet Walia; Stefanie Walz; Mario Bijelic; Fahim Mannan; Frank Julca-Aguilar; Michael Langer; Werner Ritter; Felix Heide; |
1865 | RAMA: A Rapid Multicut Algorithm on GPU Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a highly parallel primal-dual algorithm for the multicut (a.k.a. correlation clustering) problem, a classical graph clustering problem widely used in machine learning and computer vision. |
Ahmed Abbas; Paul Swoboda; |
1866 | Adversarial Parametric Pose Prior Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose learning a prior that restricts the SMPL parameters to values that produce realistic poses via adversarial training. |
Andrey Davydov; Anastasia Remizova; Victor Constantin; Sina Honari; Mathieu Salzmann; Pascal Fua; |
1867 | DC-SSL: Addressing Mismatched Class Distribution in Semi-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, this paper revisits the Exponentially Moving Average (EMA) model and utilizes it to estimate RCD in an iteratively improved manner, which is achieved with a momentum-update scheme throughout the training procedure. |
Zhen Zhao; Luping Zhou; Yue Duan; Lei Wang; Lei Qi; Yinghuan Shi; |
1868 | Mask Transfiner for High-Quality Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present Mask Transfiner for high-quality and efficient instance segmentation. |
Lei Ke; Martin Danelljan; Xia Li; Yu-Wing Tai; Chi-Keung Tang; Fisher Yu; |
1869 | End-to-End Reconstruction-Classification Learning for Face Forgery Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, from a new perspective, we propose a forgery detection framework emphasizing the common compact representations of genuine faces based on reconstruction-classification learning. |
Junyi Cao; Chao Ma; Taiping Yao; Shen Chen; Shouhong Ding; Xiaokang Yang; |
1870 | It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning By Contrastive Data Collection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We show that collecting new data, in the same way, is not effective in mitigating this emotional bias. To remedy this problem, we propose a contrastive data collection approach to balance ArtEmis with a new complementary dataset such that a pair of similar images have contrasting emotions (one positive and one negative). |
Youssef Mohamed; Faizan Farooq Khan; Kilichbek Haydarov; Mohamed Elhoseiny; |
1871 | Transferability Metrics for Selecting Source Model Ensembles Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Since fine-tuning all possible ensembles is computationally prohibitive, we aim at predicting performance on the target dataset using a computationally efficient transferability metric. |
Andrea Agostinelli; Jasper Uijlings; Thomas Mensink; Vittorio Ferrari; |
1872 | Neural Global Shutter: Learn To Restore Video From A Rolling Shutter Camera With Global Reset Feature Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we investigate using rolling shutter with a global reset feature (RSGR) to restore clean global shutter (GS) videos. |
Zhixiang Wang; Xiang Ji; Jia-Bin Huang; Shin’ichi Satoh; Xiao Zhou; Yinqiang Zheng; |
1873 | DiRA: Discriminative, Restorative, and Adversarial Learning for Self-Supervised Medical Image Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing efforts, however, omit their synergistic effects on each other in a ternary setup, which, we envision, can significantly benefit deep semantic representation learning. To realize this vision, we have developed DiRA, the first framework that unites discriminative, restorative, and adversarial learning in a unified manner to collaboratively glean complementary visual information from unlabeled medical images for fine-grained semantic representation learning. |
Fatemeh Haghighi; Mohammad Reza Hosseinzadeh Taher; Michael B. Gotway; Jianming Liang; |
1874 | Open Challenges in Deep Stereo: The Booster Dataset Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel high-resolution and challenging stereo dataset framing indoor scenes annotated with dense and accurate ground-truth disparities. |
Pierluigi Zama Ramirez; Fabio Tosi; Matteo Poggi; Samuele Salti; Stefano Mattoccia; Luigi Di Stefano; |
1875 | Location-Free Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, it is rather time-consuming to collect high-quality and fine-grained annotations for the human body. To alleviate this issue, we revisit HPE and propose a location-free framework without supervision of keypoint locations. |
Xixia Xu; Yingguo Gao; Ke Yan; Xue Lin; Qi Zou; |
1876 | Self-Supervised Bulk Motion Artifact Removal in Optical Coherence Tomography Angiography Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In addition, these methods discard the rich structural and appearance information carried in the BMA stripe region. To address these issues, in this paper we propose a self-supervised content-aware BMA removal model. |
Jiaxiang Ren; Kicheon Park; Yingtian Pan; Haibin Ling; |
1877 | Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unfortunately, assuming the structure to be known, as existing methods do, precludes the ability to work on new object categories. We propose to learn both the appearance and the structure of previously unseen articulated objects by observing them move from multiple views, with no joint annotation supervision or information about the structure. |
Atsuhiro Noguchi; Umar Iqbal; Jonathan Tremblay; Tatsuya Harada; Orazio Gallo; |
1878 | PoseTrack21: A Dataset for Person Search, Multi-Object Tracking and Multi-Person Pose Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Consequently, approaches on these respective tasks are eligible to complement each other. Therefore, we introduce PoseTrack21, a large-scale dataset for person search, multi-object tracking and multi-person pose tracking in real-world scenarios with a high diversity of poses. |
Andreas Döring; Di Chen; Shanshan Zhang; Bernt Schiele; Jürgen Gall; |
1879 | Event-Based Video Reconstruction Via Potential-Assisted Spiking Neural Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel Event-based Video reconstruction framework based on a fully Spiking Neural Network (EVSNN), which utilizes Leaky-Integrate-and-Fire (LIF) neurons and Membrane Potential (MP) neurons. |
Lin Zhu; Xiao Wang; Yi Chang; Jianing Li; Tiejun Huang; Yonghong Tian; |
1880 | Efficient Maximal Coding Rate Reduction By Variational Forms Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: By taking advantage of variational forms of spectral functions of a matrix, we reformulate the MCR2 objective to a form that can scale significantly without compromising training accuracy. |
Christina Baek; Ziyang Wu; Kwan Ho Ryan Chan; Tianjiao Ding; Yi Ma; Benjamin D. Haeffele; |
1881 | Ithaca365: Dataset and Driving Perception Under Repeated and Challenging Weather Conditions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a new dataset to enable robust autonomous driving via a novel data collection process — data is repeatedly recorded along a 15 km route under diverse scene (urban, highway, rural, campus), weather (snow, rain, sun), time (day/night), and traffic conditions (pedestrians, cyclists and cars). |
Carlos A. Diaz-Ruiz; Youya Xia; Yurong You; Jose Nino; Junan Chen; Josephine Monica; Xiangyu Chen; Katie Luo; Yan Wang; Marc Emond; Wei-Lun Chao; Bharath Hariharan; Kilian Q. Weinberger; Mark Campbell; |
1882 | AutoLoss-GMS: Searching Generalized Margin-Based Softmax Loss Function for Person Re-Identification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel method, AutoLoss-GMS, to search the better loss function in the space of generalized margin-based softmax loss function for person re-identification automatically. |
Hongyang Gu; Jianmin Li; Guangyuan Fu; Chifong Wong; Xinghao Chen; Jun Zhu; |
1883 | YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Many video understanding tasks require analyzing multi-shot videos, but existing datasets for video object segmentation (VOS) only consider single-shot videos. To address this challenge, we collected a new dataset—YouMVOS—of 200 popular YouTube videos spanning ten genres, where each video is on average five minutes long and with 75 shots. |
Donglai Wei; Siddhant Kharbanda; Sarthak Arora; Roshan Roy; Nishant Jain; Akash Palrecha; Tanav Shah; Shray Mathur; Ritik Mathur; Abhijay Kemkar; Anirudh Chakravarthy; Zudi Lin; Won-Dong Jang; Yansong Tang; Song Bai; James Tompkin; Philip H.S. Torr; Hanspeter Pfister; |
1884 | DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As the influence of recent network architectures has not been systematically studied, we first benchmark different network architectures for UDA and newly reveal the potential of Transformers for UDA semantic segmentation. Based on the findings, we propose a novel UDA method, DAFormer. |
Lukas Hoyer; Dengxin Dai; Luc Van Gool; |
1885 | Sound-Guided Semantic Image Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here, we propose a framework that directly encodes sound into the multi-modal (image-text) embedding space and manipulates an image from the space. |
Seung Hyun Lee; Wonseok Roh; Wonmin Byeon; Sang Ho Yoon; Chanyoung Kim; Jinkyu Kim; Sangpil Kim; |
1886 | Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a deep Brownian Distance Covariance (DeepBDC) method for few-shot classification. |
Jiangtao Xie; Fei Long; Jiaming Lv; Qilong Wang; Peihua Li; |
1887 | Proper Reuse of Image Classification Features Improves Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Recent works show this is not strictly necessary under longer training regimes and provide recipes for training the backbone from scratch. We investigate the opposite direction of this end-to-end training trend: we show that an extreme form of knowledge preservation, freezing the classifier-initialized backbone, consistently improves many different detection models, and leads to considerable resource savings. |
Cristina Vasconcelos; Vighnesh Birodkar; Vincent Dumoulin; |
1888 | MetaPose: Fast 3D Pose From Multiple Views Without 3D Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In the era of deep learning, human pose estimation from multiple cameras with unknown calibration has received little attention to date. We show how to train a neural model to perform this task with high precision and minimal latency overhead. |
Ben Usman; Andrea Tagliasacchi; Kate Saenko; Avneesh Sud; |
1889 | End-to-End Human-Gaze-Target Detection With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an effective and efficient method for Human-Gaze-Target (HGT) detection, i.e., gaze following. |
Danyang Tu; Xiongkuo Min; Huiyu Duan; Guodong Guo; Guangtao Zhai; Wei Shen; |
1890 | The Devil Is in The Pose: Ambiguity-Free 3D Rotation-Invariant Learning Via Pose-Aware Convolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we reveal that the global information loss stems from an unexplored pose information loss problem, i.e., common convolution layers cannot capture the relative poses between RI features, thus hindering the global information from being hierarchically aggregated in the deep networks. |
Ronghan Chen; Yang Cong; |
1891 | Compositional Temporal Grounding With Structured Variational Cross-Graph Correspondence Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Evaluating the state-of-the-art methods on our new dataset splits, we empirically find that they fail to generalize to queries with novel combinations of seen words. To tackle this challenge, we propose a variational cross-graph reasoning framework that explicitly decomposes video and language into multiple structured hierarchies and learns fine-grained semantic correspondence among them. |
Juncheng Li; Junlin Xie; Long Qian; Linchao Zhu; Siliang Tang; Fei Wu; Yi Yang; Yueting Zhuang; Xin Eric Wang; |
1892 | Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV), including 500 sequences with 1.7 million high-resolution (1920*1080 pixels) frame pairs. |
Pengyu Zhang; Jie Zhao; Dong Wang; Huchuan Lu; Xiang Ruan; |
1893 | Future Transformer for Long-Term Action Anticipation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In a similar spirit, we propose an end-to-end attention model for action anticipation, dubbed Future Transformer (FUTR), that leverages global attention over all input frames and output tokens to predict a minutes-long sequence of future actions. |
Dayoung Gong; Joonseok Lee; Manjin Kim; Seong Jong Ha; Minsu Cho; |
1894 | Optimal LED Spectral Multiplexing for NIR2RGB Translation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we aim to break the fundamental restrictions on reliable NIR-to-RGB (NIR2RGB) translation by examining the imaging mechanism of single-chip silicon-based RGB cameras under NIR illuminations, and propose to retrieve the optimal LED multiplexing via deep learning. |
Lei Liu; Yuze Chen; Junchi Yan; Yinqiang Zheng; |
1895 | Rethinking Spatial Invariance of Convolutional Networks for Object Counting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we try to use locally connected Gaussian kernels to replace the original convolution filter to estimate the spatial position in the density map. |
Zhi-Qi Cheng; Qi Dai; Hong Li; Jingkuan Song; Xiao Wu; Alexander G. Hauptmann; |
1896 | Self-Supervised Video Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose self-supervised training for video transformers using unlabeled video data. |
Kanchana Ranasinghe; Muzammal Naseer; Salman Khan; Fahad Shahbaz Khan; Michael S. Ryoo; |
1897 | AutoRF: Learning 3D Object Radiance Fields From Single View Observations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce AutoRF – a new approach for learning neural 3D object representations where each object in the training set is observed by only a single view. |
Norman Müller; Andrea Simonelli; Lorenzo Porzi; Samuel Rota Bulò; Matthias Nießner; Peter Kontschieder; |
1898 | Expanding Large Pre-Trained Unimodal Models With Multimodal Information Injection for Image-Text Multimodal Classification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose the Multimodal Information Injection Plug-in (MI2P) which is attached to different layers of the unimodal models (e.g., DenseNet and BERT). |
Tao Liang; Guosheng Lin; Mingyang Wan; Tianrui Li; Guojun Ma; Fengmao Lv; |
1899 | Neural RGB-D Surface Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The volumetric representation of the surface based on densities leads to artifacts when a surface is extracted using Marching Cubes, since during optimization, densities are accumulated along the ray and are not used at a single sample point in isolation. Instead of this volumetric representation of the surface, we propose to represent the surface using an implicit function (truncated signed distance function). |
Dejan Azinović; Ricardo Martin-Brualla; Dan B Goldman; Matthias Nießner; Justus Thies; |
1900 | ClusterGNN: Cluster-Based Coarse-To-Fine Graph Neural Network for Efficient Feature Matching Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Motivated by a prior observation that self- and cross-attention matrices converge to a sparse representation, we propose ClusterGNN, an attentional GNN architecture which operates on clusters for learning the feature matching task. |
Yan Shi; Jun-Xiong Cai; Yoli Shavit; Tai-Jiang Mu; Wensen Feng; Kai Zhang; |
1901 | AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation By Learnable Motion Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose AdaptPose, an end-to-end framework that generates synthetic 3D human motions from a source dataset and uses them to fine-tune a 3D pose estimator. |
Mohsen Gholami; Bastian Wandt; Helge Rhodin; Rabab Ward; Z. Jane Wang; |
1902 | ClothFormer: Taming Video Virtual Try-On in All Module Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Moreover, there are two other key challenges: 1) how to generate accurate warping when occlusions appear in the clothing region; 2) how to generate clothes and non-target body parts (e.g. arms, neck) in harmony with the complicated background. To address them, we propose a novel video virtual try-on framework, ClothFormer, which successfully synthesizes realistic, harmonious, and spatio-temporally consistent results in complicated environments. |
Jianbin Jiang; Tan Wang; He Yan; Junhui Liu; |
1903 | Cross-Domain Adaptive Teacher for Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, it suffers from the domain shift and generates many low-quality pseudo labels (e.g., false positives), which leads to sub-optimal performance. To mitigate this problem, we propose a teacher-student framework named Adaptive Teacher (AT) which leverages domain adversarial learning and weak-strong data augmentation to address the domain gap. |
Yu-Jhe Li; Xiaoliang Dai; Chih-Yao Ma; Yen-Cheng Liu; Kan Chen; Bichen Wu; Zijian He; Kris Kitani; Peter Vajda; |
1904 | Geometric Anchor Correspondence Mining With Uncertainty Modeling for Universal Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Previous works rarely explore the intrinsic geometrical relationship between the two domains, and they manually set a threshold for the overconfident closed-world classifier to reject "unknown" samples. Therefore, in this paper, we propose a Geometric anchor-guided Adversarial and conTrastive learning framework with uncErtainty modeling called GATE to alleviate these issues. |
Liang Chen; Yihang Lou; Jianzhong He; Tao Bai; Minghua Deng; |
1905 | Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the produced pseudo labels often contain much noise because the model is biased to source domain as well as majority categories. To address the above issues, we propose to directly explore the intrinsic pixel distributions of target domain data, instead of heavily relying on the source domain. |
Ruihuang Li; Shuai Li; Chenhang He; Yabin Zhang; Xu Jia; Lei Zhang; |
1906 | Coopernaut: End-to-End Driving With Cooperative Perception for Networked Vehicles Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce COOPERNAUT, an end-to-end learning model that uses cross-vehicle perception for vision-based cooperative driving. |
Jiaxun Cui; Hang Qiu; Dian Chen; Peter Stone; Yuke Zhu; |
1907 | Condensing CNNs With Partial Differential Equations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new feature layer, called the Global layer, that enforces PDE constraints on the feature maps, resulting in rich features. |
Anil Kag; Venkatesh Saligrama; |
1908 | Few-Shot Keypoint Detection With Uncertainty Learning for Unseen Species Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Thus, we propose a versatile Few-shot Keypoint Detection (FSKD) pipeline, which can detect a varying number of keypoints of different kinds. |
Changsheng Lu; Piotr Koniusz; |
1909 | Improving Robustness Against Stealthy Weight Bit-Flip Attacks By Output Code Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This renders the attack undetectable from a DNN operation perspective. We propose a DNN defense mechanism to improve robustness in such realistic stealthy weight bit-flip attack scenarios. |
Ozan Özdenizci; Robert Legenstein; |
1910 | Unsupervised Hierarchical Semantic Segmentation With Multiview Cosegmentation and Clustering Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our idea is that a good representation must be able to reveal not just a particular level of grouping, but any level of grouping in a consistent and predictable manner across different levels of granularity. |
Tsung-Wei Ke; Jyh-Jing Hwang; Yunhui Guo; Xudong Wang; Stella X. Yu; |
1911 | 3D-SPS: Single-Stage 3D Visual Grounding Via Referred Point Progressive Selection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a 3D Single-Stage Referred Point Progressive Selection (3D-SPS) method, which progressively selects keypoints with the guidance of language and directly locates the target. |
Junyu Luo; Jiahui Fu; Xianghao Kong; Chen Gao; Haibing Ren; Hao Shen; Huaxia Xia; Si Liu; |
1912 | TubeR: Tubelet Transformer for Video Action Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose TubeR: a simple solution for spatio-temporal video action detection. |
Jiaojiao Zhao; Yanyi Zhang; Xinyu Li; Hao Chen; Bing Shuai; Mingze Xu; Chunhui Liu; Kaustav Kundu; Yuanjun Xiong; Davide Modolo; Ivan Marsic; Cees G. M. Snoek; Joseph Tighe; |
1913 | LASER: LAtent SpacE Rendering for 2D Visual Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present LASER, an image-based Monte Carlo Localization (MCL) framework for 2D floor maps. |
Zhixiang Min; Naji Khosravan; Zachary Bessinger; Manjunath Narayana; Sing Bing Kang; Enrique Dunn; Ivaylo Boyadzhiev; |
1914 | MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Especially when extending SSL to semi-supervised object detection (SSOD), many strong augmentation methodologies related to image geometry and interpolation-regularization are hard to utilize since they possibly hurt the location information of the bounding box in the object detection task. To address this, we introduce a simple yet effective data augmentation method, Mix/UnMix (MUM), which unmixes feature tiles for the mixed image tiles for the SSOD framework. |
JongMok Kim; JooYoung Jang; Seunghyeon Seo; Jisoo Jeong; Jongkeun Na; Nojun Kwak; |
1915 | On Adversarial Robustness of Trajectory Prediction for Autonomous Vehicles Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, few studies have analyzed the adversarial robustness of trajectory prediction or investigated whether the worst-case prediction can still lead to safe planning. To bridge this gap, we study the adversarial robustness of trajectory prediction models by proposing a new adversarial attack that perturbs normal vehicle trajectories to maximize the prediction error. |
Qingzhao Zhang; Shengtuo Hu; Jiachen Sun; Qi Alfred Chen; Z. Morley Mao; |
1916 | Kubric: A Scalable Dataset Generator Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unfortunately, software tools for effective data generation are less mature than those for architecture design and training, which leads to fragmented generation efforts. To address these problems, we introduce Kubric, an open-source Python framework that interfaces with PyBullet and Blender to generate photo-realistic scenes, with rich annotations, and seamlessly scales to large jobs distributed over thousands of machines, generating TBs of data. |
Klaus Greff; Francois Belletti; Lucas Beyer; Carl Doersch; Yilun Du; Daniel Duckworth; David J. Fleet; Dan Gnanapragasam; Florian Golemo; Charles Herrmann; Thomas Kipf; Abhijit Kundu; Dmitry Lagun; Issam Laradji; Hsueh-Ti (Derek) Liu; Henning Meyer; Yishu Miao; Derek Nowrouzezahrai; Cengiz Oztireli; Etienne Pot; Noha Radwan; Daniel Rebain; Sara Sabour; Mehdi S. M. Sajjadi; Matan Sela; Vincent Sitzmann; Austin Stone; Deqing Sun; Suhani Vora; Ziyu Wang; Tianhao Wu; Kwang Moo Yi; Fangcheng Zhong; Andrea Tagliasacchi; |
1917 | Unpaired Deep Image Deraining Using Dual Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we develop an effective unpaired SID adversarial framework which explores mutual properties of the unpaired exemplars by a dual contrastive learning manner in a deep feature space, named as DCD-GAN. |
Xiang Chen; Jinshan Pan; Kui Jiang; Yufeng Li; Yufeng Huang; Caihua Kong; Longgang Dai; Zhentao Fan; |
1918 | Learning Multiple Dense Prediction Tasks From Partially Annotated Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a label efficient approach and look at jointly learning of multiple dense prediction tasks on partially annotated data (i.e. not all the task labels are available for each image), which we call multi-task partially-supervised learning. |
Wei-Hong Li; Xialei Liu; Hakan Bilen; |
1919 | Pushing The Performance Limit of Scene Text Recognizer Without Human Annotation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we aim to boost STR models by leveraging both synthetic data and the numerous real unlabeled images, exempting human annotation cost thoroughly. |
Caiyuan Zheng; Hui Li; Seon-Min Rhee; Seungju Han; Jae-Joon Han; Peng Wang; |
1920 | Boosting 3D Object Detection By Simulating Multimodality on Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector. |
Wu Zheng; Mingxuan Hong; Li Jiang; Chi-Wing Fu; |
1921 | Towards Low-Cost and Efficient Malaria Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a dataset to further the research on malaria microscopy over the low-cost microscopes at low magnification. |
Waqas Sultani; Wajahat Nawaz; Syed Javed; Muhammad Sohail Danish; Asma Saadia; Mohsen Ali; |
1922 | Learning Neural Light Fields With Ray-Space Embedding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel neural light field representation that, in contrast to prior work, is fast, memory efficient, and excels at modeling complicated view dependence. |
Benjamin Attal; Jia-Bin Huang; Michael Zollhöfer; Johannes Kopf; Changil Kim; |
1923 | Exposure Normalization and Compensation for Multiple-Exposure Correction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Besides, to further alleviate the imbalanced performance caused by variations in the optimization process, we introduce a parameter regularization fine-tuning strategy to improve the performance of the worst-performed exposure without degrading other exposures. |
Jie Huang; Yajing Liu; Xueyang Fu; Man Zhou; Yang Wang; Feng Zhao; Zhiwei Xiong; |
1924 | UDA-COPE: Unsupervised Domain Adaptation for Category-Level Object Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Learning to estimate object pose often requires ground-truth (GT) labels, such as a CAD model and absolute-scale object pose, which are expensive and laborious to obtain in the real world. To tackle this problem, we propose an unsupervised domain adaptation (UDA) method for category-level object pose estimation, called UDA-COPE. |
Taeyeop Lee; Byeong-Uk Lee; Inkyu Shin; Jaesung Choe; Ukcheol Shin; In So Kweon; Kuk-Jin Yoon; |
1925 | Learning Non-Target Knowledge for Few-Shot Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing studies in few-shot semantic segmentation focus only on mining the target object information; however, they often struggle to distinguish ambiguous regions, especially non-target regions, which include background (BG) and Distracting Objects (DOs). To alleviate this problem, we propose a novel framework, namely the Non-Target Region Eliminating (NTRE) network, to explicitly mine and eliminate BG and DO regions in the query. |
Yuanwei Liu; Nian Liu; Qinglong Cao; Xiwen Yao; Junwei Han; Ling Shao; |
1926 | TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions. |
Xuyang Bai; Zeyu Hu; Xinge Zhu; Qingqiu Huang; Yilun Chen; Hongbo Fu; Chiew-Lan Tai; |
1927 | Real-Time Hyperspectral Imaging in Hardware Via Trained Metasurface Encoders Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work introduces Hyplex, a new integrated architecture addressing the limitations discussed above. |
Maksim Makarenko; Arturo Burguete-Lopez; Qizhou Wang; Fedor Getman; Silvio Giancola; Bernard Ghanem; Andrea Fratalocchi; |
1928 | Clean Implicit 3D Structure From Noisy 2D STEM Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unfortunately, these 2D images can be too noisy to be fused into a useful 3D structure and facilitating good denoisers is challenging due to the lack of clean-noisy pairs. Additionally, representing detailed 3D structure can be difficult even for clean data when using regular 3D grids. Addressing these two limitations, we suggest a differentiable image formation model for STEM, allowing to learn a joint model of 2D sensor noise in STEM together with an implicit 3D model. |
Hannah Kniesel; Timo Ropinski; Tim Bergner; Kavitha Shaga Devan; Clarissa Read; Paul Walther; Tobias Ritschel; Pedro Hermosilla; |
1929 | UKPGAN: A General Self-Supervised Keypoint Detector Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we reckon keypoint detection as information compression, and force the model to distill out important points of an object. |
Yang You; Wenhai Liu; Yanjie Ze; Yong-Lu Li; Weiming Wang; Cewu Lu; |
1930 | Learning Optimal K-Space Acquisition and Reconstruction Using Physics-Informed Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel framework to learn optimized k-space sampling trajectories using deep learning by considering it as an Ordinary Differential Equation (ODE) problem that can be solved using neural ODE. |
Wei Peng; Li Feng; Guoying Zhao; Fang Liu; |
1931 | Leveraging Adversarial Examples To Quantify Membership Information Leakage Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The strategy we propose consists of measuring the magnitude of a perturbation necessary to build an adversarial example. |
Ganesh Del Grosso; Hamid Jalalzai; Georg Pichler; Catuscia Palamidessi; Pablo Piantanida; |
1932 | Raw High-Definition Radar for Multi-Task Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel HD radar sensing model, FFT-RadNet, that eliminates the overhead of computing the range-azimuth-Doppler 3D tensor, learning instead to recover angles from a range-Doppler spectrum. |
Julien Rebut; Arthur Ouaknine; Waqas Malik; Patrick Pérez; |
1933 | Point-NeRF: Point-Based Neural Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Volumetric neural rendering methods like NeRF generate high-quality view synthesis results but are optimized per-scene leading to prohibitive reconstruction time. On the other hand, deep multi-view stereo methods can quickly reconstruct scene geometry via direct network inference. Point-NeRF combines the advantages of these two approaches by using neural 3D point clouds, with associated neural features, to model a radiance field. |
Qiangeng Xu; Zexiang Xu; Julien Philip; Sai Bi; Zhixin Shu; Kalyan Sunkavalli; Ulrich Neumann; |
1934 | Contextual Debiasing for Visual Recognition With Causal Mechanisms Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a simple but effective framework employing causal inference to mitigate contextual bias. |
Ruyang Liu; Hao Liu; Ge Li; Haodi Hou; TingHao Yu; Tao Yang; |
1935 | Complex Video Action Reasoning Via Learnable Markov Logic Network Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The main contributions of this work are two-fold: 1) Different from existing black-box models, the proposed model simultaneously implements the localization of temporal boundaries and the recognition of action categories by grounding the logical rules of MLN in videos. |
Yang Jin; Linchao Zhu; Yadong Mu; |
1936 | Per-Clip Video Object Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The scheme provides two potential benefits: accuracy gain by clip-level optimization and efficiency gain by parallel computation of multiple frames. To this end, we propose a new method tailored for the per-clip inference. |
Kwanyong Park; Sanghyun Woo; Seoung Wug Oh; In So Kweon; Joon-Young Lee; |
1937 | Exploring Set Similarity for Dense Self-Supervised Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the pixel-level correspondence tends to be noisy because of many similar misleading pixels, e.g., backgrounds. To address this issue, in this paper, we propose to explore set similarity (SetSim) for dense self-supervised representation learning. |
Zhaoqing Wang; Qiang Li; Guoxin Zhang; Pengfei Wan; Wen Zheng; Nannan Wang; Mingming Gong; Tongliang Liu; |
1938 | Coarse-To-Fine Feature Mining for Video Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, there is no research about how to simultaneously learn static and motional contexts which are highly correlated and complementary to each other. To address this problem, we propose a Coarse-to-Fine Feature Mining (CFFM) technique to learn a unified presentation of static contexts and motional contexts. |
Guolei Sun; Yun Liu; Henghui Ding; Thomas Probst; Luc Van Gool; |
1939 | ONCE-3DLanes: Building Monocular 3D Lane Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, existing 3D lane detection datasets are either unpublished or synthesized from a simulated environment, severely hampering the development of this field. In this paper, we take steps towards addressing these issues. |
Fan Yan; Ming Nie; Xinyue Cai; Jianhua Han; Hang Xu; Zhen Yang; Chaoqiang Ye; Yanwei Fu; Michael Bi Mi; Li Zhang; |
1940 | Weakly But Deeply Supervised Occlusion-Reasoned Parametric Road Layouts Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an end-to-end network that takes a single perspective RGB image of a complex road scene as input, to produce occlusion-reasoned layouts in perspective space as well as a parametric bird’s-eye-view (BEV) space. |
Buyu Liu; Bingbing Zhuang; Manmohan Chandraker; |
1941 | Compressing Models With Few Samples: Mimicking Then Replacing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new framework named Mimicking then Replacing (MiR) for few-sample compression, which firstly urges the pruned model to output the same features as the teacher’s in the penultimate layer, and then replaces the teacher’s layers before the penultimate one with a well-tuned compact one. |
Huanyu Wang; Junjie Liu; Xin Ma; Yang Yong; Zhenhua Chai; Jianxin Wu; |
1942 | FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose FedCor—an FL framework built on a correlation-based client selection strategy, to boost the convergence rate of FL. |
Minxue Tang; Xuefei Ning; Yitu Wang; Jingwei Sun; Yu Wang; Hai Li; Yiran Chen; |
1943 | Modulated Contrast for Versatile Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents MoNCE, a versatile metric that introduces image contrast to learn a calibrated metric for the perception of multifaceted inter-image distances. |
Fangneng Zhan; Jiahui Zhang; Yingchen Yu; Rongliang Wu; Shijian Lu; |
1944 | PokeBNN: A Binary Pursuit of Lightweight Accuracy Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Binary neural networks (BNNs) have the potential to significantly lower the compute intensity but existing models suffer from low quality. To overcome this deficiency, we propose PokeConv, a binary convolution block which improves quality of BNNs by techniques such as adding multiple residual paths, and tuning the activation function. |
Yichi Zhang; Zhiru Zhang; Lukasz Lew; |
1945 | HumanNeRF: Efficiently Generated Human Radiance Field From Sparse Inputs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present HumanNeRF – a neural representation with efficient generalization ability – for high-fidelity free-view synthesis of dynamic humans. |
Fuqiang Zhao; Wei Yang; Jiakai Zhang; Pei Lin; Yingliang Zhang; Jingyi Yu; Lan Xu; |
1946 | Zoom in and Out: A Mixed-Scale Triplet Network for Camouflaged Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Apart from high intrinsic similarity between the camouflaged objects and their background, the objects are usually diverse in scale, fuzzy in appearance, and even severely occluded. To deal with these problems, we propose a mixed-scale triplet network, ZoomNet, which mimics the behavior of humans when observing vague images, i.e., zooming in and out. |
Youwei Pang; Xiaoqi Zhao; Tian-Zhu Xiang; Lihe Zhang; Huchuan Lu; |
1947 | Identifying Ambiguous Similarity Conditions Via Semantic Matching Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: For example, the previous comparison becomes invalid once the conditional label changes to "is vehicle". To this end, we introduce a novel evaluation criterion by predicting the comparison’s correctness after assigning the learned embeddings to their optimal conditions, which measures how much WS-CSL could cover latent semantics as the supervised model. |
Han-Jia Ye; Yi Shi; De-Chuan Zhan; |
1948 | MISF: Multi-Level Interactive Siamese Filtering for High-Fidelity Image Inpainting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the issues while adopting the respective advantages, we propose a novel filtering technique, i.e., Multi-level Interactive Siamese Filtering (MISF) containing two branches: kernel prediction branch (KPB) and semantic & image filtering branch (SIFB). |
Xiaoguang Li; Qing Guo; Di Lin; Ping Li; Wei Feng; Song Wang; |
1949 | Cascade Transformers for End-to-End Person Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the Cascade Occluded Attention Transformer (COAT) for end-to-end person search. |
Rui Yu; Dawei Du; Rodney LaLonde; Daniel Davila; Christopher Funk; Anthony Hoogs; Brian Clipp; |
1950 | MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, they are limited to single-scale feature resolution, providing suboptimal performance in scenes containing humans, objects, and their interactions with vastly different scales and distances. To tackle this problem, we propose a Multi-Scale TRansformer (MSTR) for HOI detection powered by two novel HOI-aware deformable attention modules called Dual-Entity attention and Entity-conditioned Context attention. |
Bumsoo Kim; Jonghwan Mun; Kyoung-Woon On; Minchul Shin; Junhyun Lee; Eun-Sol Kim; |
1951 | LSVC: A Learning-Based Stereo Video Compression Framework Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose the first end-to-end optimized framework for compressing automotive stereo videos (i.e., stereo videos from autonomous driving applications) from both left and right views. |
Zhenghao Chen; Guo Lu; Zhihao Hu; Shan Liu; Wei Jiang; Dong Xu; |
1952 | How Do You Do It? Fine-Grained Action Understanding With Pseudo-Adverbs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We aim to understand how actions are performed and identify subtle differences, such as ‘fold firmly’ vs. ‘fold gently’. To this end, we propose a method which recognizes adverbs across different actions. |
Hazel Doughty; Cees G. M. Snoek; |
1953 | InsetGAN for Full-Body Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While GANs can produce photo-realistic images in ideal conditions for certain domains, the generation of full-body human images remains difficult due to the diversity of identities, hairstyles, clothing, and the variance in pose. Instead of modeling this complex domain with a single GAN, we propose a novel method to combine multiple pretrained GANs, where one GAN generates a global canvas (e.g., human body) and a set of specialized GANs, or insets, focus on different parts (e.g., faces, shoes) that can be seamlessly inserted onto the global canvas. |
Anna Frühstück; Krishna Kumar Singh; Eli Shechtman; Niloy J. Mitra; Peter Wonka; Jingwan Lu; |
1954 | DetectorDetective: Investigating The Effects of Adversarial Examples on Object Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose DetectorDetective, an interactive visual tool that aims to help users better understand the behaviors of a model as adversarial images journey through an object detector. |
Sivapriya Vellaichamy; Matthew Hull; Zijie J. Wang; Nilaksh Das; ShengYun Peng; Haekyu Park; Duen Horng (Polo) Chau; |
1955 | SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel MSI representation called Soft Occlusion MSI (SOMSI) that enables modelling high-dimensional appearance features in MSI while retaining the fast rendering times of a standard MSI. |
Tewodros Habtegebrial; Christiano Gava; Marcel Rogge; Didier Stricker; Varun Jampani; |
1956 | EMScore: Evaluating Video Captioning Via Coarse-Grained and Fine-Grained Embedding Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by human evaluation, we propose EMScore (Embedding Matching-based score), a novel reference-free metric for video captioning, which directly measures similarity between video and candidate captions. |
Yaya Shi; Xu Yang; Haiyang Xu; Chunfeng Yuan; Bing Li; Weiming Hu; Zheng-Jun Zha; |
1957 | SNR-Aware Low-Light Image Enhancement Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a new solution for low-light image enhancement by collectively exploiting Signal-to-Noise-Ratio-aware transformers and convolutional models to dynamically enhance pixels with spatial-varying operations. |
Xiaogang Xu; Ruixing Wang; Chi-Wing Fu; Jiaya Jia; |
1958 | 3D Common Corruptions and Data Augmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a set of image transformations that can be used as corruptions to evaluate the robustness of models as well as data augmentation mechanisms for training neural networks. |
Oğuzhan Fatih Kar; Teresa Yeo; Andrei Atanov; Amir Zamir; |
1959 | PoseTriplet: Co-Evolving 3D Human Pose Estimation, Imitation, and Hallucination Under Self-Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel self-supervised approach that allows us to explicitly generate 2D-3D pose pairs for augmenting supervision, through a self-enhancing dual-loop learning framework. |
Kehong Gong; Bingbing Li; Jianfeng Zhang; Tao Wang; Jing Huang; Michael Bi Mi; Jiashi Feng; Xinchao Wang; |
1960 | Injecting Semantic Concepts Into End-to-End Image Captioning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we are concerned with a better-performing detector-free image captioning model, and propose a pure vision transformer-based image captioning model, dubbed as ViTCAP, in which grid representations are used without extracting the regional features. |
Zhiyuan Fang; Jianfeng Wang; Xiaowei Hu; Lin Liang; Zhe Gan; Lijuan Wang; Yezhou Yang; Zicheng Liu; |
1961 | An Efficient Training Approach for Very Large Scale Face Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The heavy computational and memory costs mainly result from the million-level dimensionality of the fully connected (FC) layer. To this end, we propose a novel training approach, termed Faster Face Classification (F2C), to reduce time and cost without sacrificing performance. |
Kai Wang; Shuo Wang; Panpan Zhang; Zhipeng Zhou; Zheng Zhu; Xiaobo Wang; Xiaojiang Peng; Baigui Sun; Hao Li; Yang You; |
1962 | Long-Term Video Frame Interpolation Via Feature Propagation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Hence, VFI works perform well for small frame gaps and perform poorly as the frame gap increases. In this work, we propose a novel framework to address this problem. |
Dawit Mureja Argaw; In So Kweon; |
1963 | Coarse-To-Fine Q-Attention: Efficient Learning for Visual Robotic Manipulation Via Discretisation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a coarse-to-fine discretisation method that enables the use of discrete reinforcement learning approaches in place of unstable and data-inefficient actor-critic methods in continuous robotics domains. |
Stephen James; Kentaro Wada; Tristan Laidlow; Andrew J. Davison; |
1964 | Event-Aided Direct Sparse Odometry Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce EDS, a direct monocular visual odometry using events and frames. |
Javier Hidalgo-Carrió; Guillermo Gallego; Davide Scaramuzza; |
1965 | Group Contextualization for Video Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an efficient feature refinement method that decomposes the feature channels into several groups and separately refines them with different axial contexts in parallel. |
Yanbin Hao; Hao Zhang; Chong-Wah Ngo; Xiangnan He; |
1966 | Single-Domain Generalized Object Detection in Urban Scene Via Cyclic-Disentangled Self-Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we are concerned with enhancing the generalization capability of object detectors. |
Aming Wu; Cheng Deng; |
1967 | Visual Abductive Reasoning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining abductive reasoning ability of machine intelligence in everyday visual situations. |
Chen Liang; Wenguan Wang; Tianfei Zhou; Yi Yang; |
1968 | L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present L2G, a simple online local-to-global knowledge transfer framework for high-quality object attention mining. |
Peng-Tao Jiang; Yuqi Yang; Qibin Hou; Yunchao Wei; |
1969 | Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, unifying the overall architecture under a Bayesian perspective gives it a rigorous theoretical basis, so that each part of the architecture has a clear probabilistic interpretation. Therefore, to solve these problems, we propose a new generative Bayesian deep learning (GBDL) architecture. |
Jianfeng Wang; Thomas Lukasiewicz; |
1970 | Continual Learning With Lifelong Vision Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel attention-based framework, Lifelong Vision Transformer (LVT), to achieve a better stability-plasticity trade-off for continual learning. |
Zhen Wang; Liu Liu; Yiqun Duan; Yajing Kong; Dacheng Tao; |
1971 | MPViT: Multi-Path Vision Transformer for Dense Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, with a different perspective from existing Transformers, we explore multi-scale patch embedding and multi-path structure, constructing the Multi-Path Vision Transformer (MPViT). |
Youngwan Lee; Jonghee Kim; Jeffrey Willette; Sung Ju Hwang; |
1972 | NICGSlowDown: Evaluating The Efficiency Robustness of Neural Image Caption Generation Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To further understand such efficiency-oriented threats, we propose a new attack approach, NICGSlowDown, to evaluate the efficiency robustness of NICG models. |
Simin Chen; Zihe Song; Mirazul Haque; Cong Liu; Wei Yang; |
1973 | Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a robust and accurate method for estimating the 3D poses of two hands in close interaction from a single color image. |
Shreyas Hampali; Sayan Deb Sarkar; Mahdi Rad; Vincent Lepetit; |
1974 | SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present SemanticStyleGAN, where a generator is trained to model local semantic parts separately and synthesizes images in a compositional way. |
Yichun Shi; Xiao Yang; Yangyue Wan; Xiaohui Shen; |
1975 | Accurate 3D Body Shape Regression Using Metric and Semantic Attributes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We exploit the anthropometric measurements and linguistic shape attributes in several novel ways to train a neural network, called SHAPY, that regresses 3D human pose and shape from an RGB image. |
Vasileios Choutas; Lea Müller; Chun-Hao P. Huang; Siyu Tang; Dimitrios Tzionas; Michael J. Black; |
1976 | VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we demonstrate the functionalities of VL-InterpreT through the analysis of KD-VLP, an end-to-end pretraining vision-language multimodal transformer-based model, in the tasks of Visual Commonsense Reasoning (VCR) and WebQA, two visual question answering benchmarks. |
Estelle Aflalo; Meng Du; Shao-Yen Tseng; Yongfei Liu; Chenfei Wu; Nan Duan; Vasudev Lal; |
1977 | Label-Only Model Inversion Attacks Via Boundary Repulsion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce an algorithm, Boundary-Repelling Model Inversion (BREP-MI), to invert private training data using only the target model’s predicted labels. |
Mostafa Kahla; Si Chen; Hoang Anh Just; Ruoxi Jia; |
1978 | Privacy-Preserving Online AutoML for Domain-Specific Face Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce "HyperFD", a new privacy-preserving online AutoML framework for face detection. |
Chenqian Yan; Yuge Zhang; Quanlu Zhang; Yaming Yang; Xinyang Jiang; Yuqing Yang; Baoyuan Wang; |
1979 | Self-Augmented Unpaired Image Dehazing Via Density and Depth Decomposition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a self-augmented image dehazing framework, termed D^4 (Dehazing via Decomposing transmission map into Density and Depth) for haze generation and removal. |
Yang Yang; Chaoyue Wang; Risheng Liu; Lin Zhang; Xiaojie Guo; Dacheng Tao; |
1980 | Neural 3D Video Synthesis From Multi-View Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel approach for 3D video synthesis that is able to represent multi-view video recordings of a dynamic real-world scene in a compact, yet expressive representation that enables high-quality view synthesis and motion interpolation. |
Tianye Li; Mira Slavcheva; Michael Zollhöfer; Simon Green; Christoph Lassner; Changil Kim; Tanner Schmidt; Steven Lovegrove; Michael Goesele; Richard Newcombe; Zhaoyang Lv; |
1981 | LiDAR Snowfall Simulation for Robust 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we address the problem of LiDAR-based 3D object detection under snowfall. |
Martin Hahner; Christos Sakaridis; Mario Bijelic; Felix Heide; Fisher Yu; Dengxin Dai; Luc Van Gool; |
1982 | Learning Where To Learn in Cross-View Self-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a new approach, Learning Where to Learn (LEWEL), to adaptively aggregate spatial information of features, so that the projected embeddings could be exactly aligned and thus guide the feature learning better. |
Lang Huang; Shan You; Mingkai Zheng; Fei Wang; Chen Qian; Toshihiko Yamasaki; |
1983 | SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on this technique, we propose SemAffiNet for point cloud semantic segmentation, which utilizes the attention mechanism in the Transformer module to implicitly and explicitly capture global structural knowledge within local parts for overall comprehension of each category. |
Ziyi Wang; Yongming Rao; Xumin Yu; Jie Zhou; Jiwen Lu; |
1984 | Sparse Object-Level Supervision for Instance Segmentation With Pixel Embeddings Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose to address the dense annotation bottleneck by introducing a proposal-free segmentation approach based on non-spatial embeddings, which exploits the structure of the learned embedding space to extract individual instances in a differentiable way. |
Adrian Wolny; Qin Yu; Constantin Pape; Anna Kreshuk; |
1985 | How Much More Data Do I Need? Estimating Requirements for Downstream Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we consider a broad class of computer vision tasks and systematically investigate a family of functions that generalize the power-law function to allow for better estimation of data requirements. |
Rafid Mahmood; James Lucas; David Acuna; Daiqing Li; Jonah Philion; Jose M. Alvarez; Zhiding Yu; Sanja Fidler; Marc T. Law; |
1986 | Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we intend to take full advantage of both structural and statistical texture knowledge and propose a novel Structural and Statistical Texture Knowledge Distillation (SSTKD) framework for Semantic Segmentation. |
Deyi Ji; Haoran Wang; Mingyuan Tao; Jianqiang Huang; Xian-Sheng Hua; Hongtao Lu; |
1987 | Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Shapley value based method to evaluate operation contribution (Shapley-NAS) for neural architecture search. |
Han Xiao; Ziwei Wang; Zheng Zhu; Jie Zhou; Jiwen Lu; |
1988 | The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work we explore how, from a bundle of these measurements acquired during viewfinding, we can combine dense micro-baseline parallax cues with kilopixel LiDAR depth to distill a high-fidelity depth map. |
Ilya Chugunov; Yuxuan Zhang; Zhihao Xia; Xuaner Zhang; Jiawen Chen; Felix Heide; |
1989 | Learning What Not To Segment: A New Perspective on Few-Shot Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a fresh and straightforward insight to alleviate the problem. |
Chunbo Lang; Gong Cheng; Binfei Tu; Junwei Han; |
1990 | Blended Diffusion for Text-Driven Editing of Natural Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce the first solution for performing local (region-based) edits in generic natural images, based on a natural language description along with an ROI mask. |
Omri Avrahami; Dani Lischinski; Ohad Fried; |
1991 | Towards Unsupervised Domain Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, we study a novel generalization problem called unsupervised domain generalization (UDG), which aims to learn generalizable models with unlabeled data and analyze the effects of pre-training on DG. |
Xingxuan Zhang; Linjun Zhou; Renzhe Xu; Peng Cui; Zheyan Shen; Haoxin Liu; |
1992 | HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel attention mechanism for pansharpening called HyperTransformer, in which features of LR-HSI and PAN are formulated as queries and keys in a transformer, respectively. |
Wele Gedara Chaminda Bandara; Vishal M. Patel; |
1993 | Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents Segment-Fusion, a novel attention-based method for hierarchical fusion of semantic and instance information to address the part misclassifications. |
Anirud Thyagharajan; Benjamin Ummenhofer; Prashant Laddha; Om Ji Omer; Sreenivas Subramoney; |
1994 | Robust Invertible Image Steganography Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a novel flow-based framework for robust invertible image steganography, dubbed as RIIS. |
Youmin Xu; Chong Mou; Yujie Hu; Jingfen Xie; Jian Zhang; |
1995 | Entropy-Based Active Learning for Object Detection With Progressive Diversity Constraint Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Active learning for object detection is more challenging and existing efforts on it are relatively rare. In this paper, we propose a novel hybrid approach to address this problem, where the instance-level uncertainty and diversity are jointly considered in a bottom-up manner. |
Jiaxi Wu; Jiaxin Chen; Di Huang; |
1996 | BE-STI: Spatial-Temporal Integrated Network for Class-Agnostic Motion Prediction With Bidirectional Enhancement Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel Spatial-Temporal Integrated network with Bidirectional Enhancement, BE-STI, to improve the temporal motion prediction performance by spatial semantic features, which points out an efficient way to combine semantic segmentation and motion prediction. |
Yunlong Wang; Hongyu Pan; Jun Zhu; Yu-Huan Wu; Xin Zhan; Kun Jiang; Diange Yang; |
1997 | A Structured Dictionary Perspective on Implicit Neural Representations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel unified perspective to theoretically analyse INRs. |
Gizem Yüce; Guillermo Ortiz-Jiménez; Beril Besbinar; Pascal Frossard; |
1998 | Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel end-to-end deep learning approach that is able to give robust voice activity detection and localization results. |
Hao Jiang; Calvin Murdock; Vamsi Krishna Ithapu; |
1999 | Vision-Language Pre-Training With Triple Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose triple contrastive learning (TCL) for vision-language pre-training by leveraging both cross-modal and intra-modal self-supervision. |
Jinyu Yang; Jiali Duan; Son Tran; Yi Xu; Sampath Chanda; Liqun Chen; Belinda Zeng; Trishul Chilimbi; Junzhou Huang; |
2000 | Structure-Aware Flow Generation for Human Body Reshaping Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Due to the complicated structure and multifarious appearance of human bodies, existing methods either fall back on the 3D domain via a body morphable model or resort to keypoint-based image deformation, leading to inefficiency and unsatisfactory visual quality. In this paper, we address these limitations by formulating an end-to-end flow generation architecture under the guidance of body structural priors, including skeletons and Part Affinity Fields, and achieve unprecedentedly controllable performance under arbitrary poses and garments. |
Jianqiang Ren; Yuan Yao; Biwen Lei; Miaomiao Cui; Xuansong Xie; |
2001 | Practical Learned Lossless JPEG Recompression With Multi-Level Cross-Channel Entropy Model in The DCT Domain Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, most recent progress on image compression mainly focuses on uncompressed images while ignoring trillions of already-existing JPEG images. To compress these JPEG images adequately and restore them back to JPEG format losslessly when needed, we propose a deep learning based JPEG recompression method that operates in the DCT domain, together with a Multi-Level Cross-Channel Entropy Model to compress the most informative Y component. |
Lina Guo; Xinjie Shi; Dailan He; Yuanyuan Wang; Rui Ma; Hongwei Qin; Yan Wang; |
2002 | Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-Time Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a novel Fourier PlenOctree (FPO) technique to tackle efficient neural modeling and real-time rendering of dynamic scenes captured under the free-view video (FVV) setting. |
Liao Wang; Jiakai Zhang; Xinhang Liu; Fuqiang Zhao; Yanshun Zhang; Yingliang Zhang; Minye Wu; Jingyi Yu; Lan Xu; |
2003 | Learning To Answer Questions in Dynamic Audio-Visual Scenarios Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos. |
Guangyao Li; Yake Wei; Yapeng Tian; Chenliang Xu; Ji-Rong Wen; Di Hu; |
2004 | Leveraging Equivariant Features for Absolute Pose Regression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we demonstrate how a translation and rotation equivariant Convolutional Neural Network directly induces representations of camera motions into the feature space. |
Mohamed Adel Musallam; Vincent Gaudillière; Miguel Ortiz del Castillo; Kassem Al Ismaeil; Djamila Aouada; |
2005 | Synthetic Aperture Imaging With Events and Frames Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the performance of E-SAI is not consistent under sparse occlusions due to the dramatic decrease of signal events. This paper addresses this problem by leveraging the merits of both events and frames, leading to a fusion-based SAI (EF-SAI) that performs consistently under the different densities of occlusions. |
Wei Liao; Xiang Zhang; Lei Yu; Shijie Lin; Wen Yang; Ning Qiao; |
2006 | CLIP-Event: Connecting Text and Images With Event Structures Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a contrastive learning framework to enforce vision-language pretraining models to comprehend events and associated argument (participant) roles. |
Manling Li; Ruochen Xu; Shuohang Wang; Luowei Zhou; Xudong Lin; Chenguang Zhu; Michael Zeng; Heng Ji; Shih-Fu Chang; |
2007 | MonoGround: Detecting Monocular 3D Objects From The Ground Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Due to the ill-posed nature of the 2D-to-3D mapping in the monocular imaging process, monocular 3D object detection suffers from inaccurate depth estimation and thus has poor 3D detection results. To alleviate this problem, we propose to introduce the ground plane as a prior in monocular 3D object detection. |
Zequn Qin; Xi Li; |
2008 | Deep Visual Geo-Localization Benchmark Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new open-source benchmarking framework for Visual Geo-localization (VG) that allows to build, train, and test a wide range of commonly used architectures, with the flexibility to change individual components of a geo-localization pipeline. |
Gabriele Berton; Riccardo Mereu; Gabriele Trivigno; Carlo Masone; Gabriela Csurka; Torsten Sattler; Barbara Caputo; |
2009 | Scaling Up Vision-Language Pre-Training for Image Captioning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present LEMON, a LargE-scale iMage captiONer, and provide the first empirical study on the scaling behavior of VLP for image captioning. |
Xiaowei Hu; Zhe Gan; Jianfeng Wang; Zhengyuan Yang; Zicheng Liu; Yumao Lu; Lijuan Wang; |
2010 | Semiconductor Defect Detection By Hybrid Classical-Quantum Deep Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we utilize the information processing advantages of quantum computing to promote the defect learning defect review (DLDR). |
Yuan-Fu Yang; Min Sun; |
2011 | StyleGAN-V: A Continuous Video Generator With The Price, Image Quality and Perks of StyleGAN2 Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Videos show continuous events, yet most, if not all, video synthesis frameworks treat them discretely in time. In this work, we treat videos as what they should be: time-continuous signals, and extend the paradigm of neural representations to build a continuous-time video generator. |
Ivan Skorokhodov; Sergey Tulyakov; Mohamed Elhoseiny; |
2012 | Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To approach realistic practicality, we propose the first gray-box and physically realizable weights attack algorithm for backdoor injection, namely subnet replacement attack (SRA), which only requires architecture information of the victim model and can support physical triggers in the real world. |
Xiangyu Qi; Tinghao Xie; Ruizhe Pan; Jifeng Zhu; Yong Yang; Kai Bu; |
2013 | Scaling Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While the laws for scaling Transformer language models have been studied, it is unknown how Vision Transformers scale. To address this, we scale ViT models and data, both up and down, and characterize the relationships between error rate, data, and compute. |
Xiaohua Zhai; Alexander Kolesnikov; Neil Houlsby; Lucas Beyer; |
2014 | Unsupervised Action Segmentation By Joint Representation Learning and Online Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel approach for unsupervised activity segmentation which uses video frame clustering as a pretext task and simultaneously performs representation learning and online clustering. |
Sateesh Kumar; Sanjay Haresh; Awais Ahmed; Andrey Konin; M. Zeeshan Zia; Quoc-Huy Tran; |
2015 | Pin The Memory: Learning To Generalize Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel memory-guided domain generalization method for semantic segmentation based on meta-learning framework. |
Jin Kim; Jiyoung Lee; Jungin Park; Dongbo Min; Kwanghoon Sohn; |
2016 | LISA: Learning Implicit Shape and Appearance of Hands Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a do-it-all neural model of human hands, named LISA. |
Enric Corona; Tomas Hodan; Minh Vo; Francesc Moreno-Noguer; Chris Sweeney; Richard Newcombe; Lingni Ma; |
2017 | DiGS: Divergence Guided Shape Implicit Neural Representation for Unoriented Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a divergence guided shape representation learning approach that does not require normal vectors as input. |
Yizhak Ben-Shabat; Chamin Hewa Koneputugodage; Stephen Gould; |
2018 | Iterative Deep Homography Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Iterative Homography Network, namely IHN, a new deep homography estimation architecture. |
Si-Yuan Cao; Jianxin Hu; Zehua Sheng; Hui-Liang Shen; |
2019 | Semi-Supervised Learning of Semantic Correspondence With Pseudo-Labels Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a simple, but effective solution for semantic correspondence, called SemiMatch, that learns the networks in a semi-supervised manner by supplementing few ground-truth correspondences via utilization of a large amount of confident correspondences as pseudo-labels. |
Jiwon Kim; Kwangrok Ryoo; Junyoung Seo; Gyuseong Lee; Daehwan Kim; Hansang Cho; Seungryong Kim; |
2020 | Learned Queries for Efficient Local Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new shift-invariant local attention layer, called query and attend (QnA), that aggregates the input locally in an overlapping manner, much like convolutions. |
Moab Arar; Ariel Shamir; Amit H. Bermano; |
2021 | Stereoscopic Universal Perturbations Across Different Architectures and Datasets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a method to craft a single set of perturbations that, when added to any stereo image pair in a dataset, can fool a stereo network to significantly alter the perceived scene geometry. |
Zachary Berger; Parth Agrawal; Tian Yu Liu; Stefano Soatto; Alex Wong; |
2022 | Colar: Effective and Efficient Online Action Detection By Consulting Exemplars Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper develops an effective exemplar-consultation mechanism that first measures the similarity between a frame and exemplary frames, and then aggregates exemplary features based on the similarity weights. |
Le Yang; Junwei Han; Dingwen Zhang; |
2023 | AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Others try to use conventional task-agnostic approaches designed for domain generalization problems with no task prior knowledge considered. To solve the above issues, we propose AutoGPart, a generic method enabling training generalizable 3D part segmentation networks with the task prior considered. |
Xueyi Liu; Xiaomeng Xu; Anyi Rao; Chuang Gan; Li Yi; |
2024 | DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: With DeltaCNN, we present a sparse convolutional neural network framework that enables sparse frame-by-frame updates to accelerate video inference in practice. |
Mathias Parger; Chengcheng Tang; Christopher D. Twigg; Cem Keskin; Robert Wang; Markus Steinberger; |
2025 | HLRTF: Hierarchical Low-Rank Tensor Factorization for Inverse Problems in Multi-Dimensional Imaging Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address the vanishing gradient in extreme scenarios, e.g., structural missing pixels, we introduce a parametric total variation regularization to constrain the DNN parameters and the tensor factor parameters with theoretical analysis. |
Yisi Luo; Xi-Le Zhao; Deyu Meng; Tai-Xiang Jiang; |
2026 | Leveraging Self-Supervision for Cross-Domain Crowd Counting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Unfortunately, due to domain shift, the resulting models generalize poorly on real imagery. We remedy this shortcoming by training with both synthetic images (along with their associated labels) and unlabeled real images. |
Weizhe Liu; Nikita Durasov; Pascal Fua; |
2027 | MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Due to the unstructured and irregular nature of 3D object data, it is usually difficult to obtain high-quality surface details and geometry textures at a low cost. In this article, we propose an effective multimodal-driven deep neural network to perform 3D surface super-resolution in 2D normal domain, which is simple, accurate, and robust to the above difficulty. |
Wuyuan Xie; Tengcong Huang; Miaohui Wang; |
2028 | Gaussian Process Modeling of Approximate Inference Errors for Variational Autoencoders Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we address the problem in a completely different way by considering a random inference model, where we model the mean and variance functions of the variational posterior as random Gaussian processes (GP). |
Minyoung Kim; |
2029 | PlaneMVS: 3D Plane Reconstruction From Multi-View Stereo Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel framework named PlaneMVS for 3D plane reconstruction from multiple input views with known camera poses. |
Jiachen Liu; Pan Ji; Nitin Bansal; Changjiang Cai; Qingan Yan; Xiaolei Huang; Yi Xu; |
2030 | Scene Graph Expansion for Semantics-Guided Image Outpainting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we address the task of semantics-guided image outpainting, which is to complete an image by generating semantically practical content. |
Chiao-An Yang; Cheng-Yo Tan; Wan-Cyuan Fan; Cheng-Fu Yang; Meng-Lin Wu; Yu-Chiang Frank Wang; |
2031 | SoftGroup for 3D Instance Segmentation on Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the aforementioned problems, this paper proposes a 3D instance segmentation method referred to as SoftGroup by performing bottom-up soft grouping followed by top-down refinement. |
Thang Vu; Kookhoi Kim; Tung M. Luu; Thanh Nguyen; Chang D. Yoo; |
2032 | SharpContour: A Contour-Based Boundary Refinement Approach for Efficient and Accurate Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an efficient contour-based boundary refinement approach, named SharpContour, to tackle the segmentation of boundary area. |
Chenming Zhu; Xuanye Zhang; Yanran Li; Liangdong Qiu; Kai Han; Xiaoguang Han; |
2033 | MVS2D: Efficient Multi-View Stereo Via Attention-Driven 2D Convolutions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present MVS2D, a highly efficient multi-view stereo algorithm that seamlessly integrates multi-view constraints into single-view networks via an attention mechanism. |
Zhenpei Yang; Zhile Ren; Qi Shan; Qixing Huang; |
2034 | FIBA: Frequency-Injection Based Backdoor Attack in Medical Image Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Most existing BA methods are designed to attack natural image classification models, which apply spatial triggers to training images and inevitably corrupt the semantics of poisoned pixels, leading to failures when attacking dense prediction models. To address this issue, we propose a novel Frequency-Injection based Backdoor Attack method (FIBA) that is capable of delivering attacks in various MIA tasks. |
Yu Feng; Benteng Ma; Jing Zhang; Shanshan Zhao; Yong Xia; Dacheng Tao; |
2035 | Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation Via Semantic Knowledge Transfer and Self-Refinement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel approach including two innovative components. |
Beomyoung Kim; YoungJoon Yoo; Chae Eun Rhee; Junmo Kim; |
2036 | Bridged Transformer for Vision and Point Cloud 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To that end, we propose Bridged Transformer (BrT), an end-to-end architecture for 3D object detection. |
Yikai Wang; TengQi Ye; Lele Cao; Wenbing Huang; Fuchun Sun; Fengxiang He; Dacheng Tao; |
2037 | Deep Constrained Least Squares for Blind Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we tackle the problem of blind image super-resolution (SR) with a reformulated degradation model and two novel modules. |
Ziwei Luo; Haibin Huang; Lei Yu; Youwei Li; Haoqiang Fan; Shuaicheng Liu; |
2038 | EDTER: Edge Detection With Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Recently, vision transformer has shown excellent capability in capturing long-range dependencies. Inspired by this, we propose a novel transformer-based edge detector, Edge Detection TransformER (EDTER), to extract clear and crisp object boundaries and meaningful edges by exploiting the full image context information and detailed local cues simultaneously. |
Mengyang Pu; Yaping Huang; Yuming Liu; Qingji Guan; Haibin Ling; |
2039 | Fine-Tuning Global Model Via Data-Free Knowledge Distillation for Non-IID Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Instead, we propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG), which relieves the issue of direct model aggregation. |
Lin Zhang; Li Shen; Liang Ding; Dacheng Tao; Ling-Yu Duan; |
2040 | JIFF: Jointly-Aligned Implicit Face Function for High Quality Single View Clothed Human Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we focus on improving the quality of face in the reconstruction and propose a novel Jointly-aligned Implicit Face Function (JIFF) that combines the merits of the implicit function based approach and model based approach. |
Yukang Cao; Guanying Chen; Kai Han; Wenqi Yang; Kwan-Yee K. Wong; |
2041 | Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them From 2D Renderings Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Unfortunately, retrieving messages from 2D renderings of 3D meshes is still challenging and underexplored. We introduce a novel end-to-end learning framework to solve this problem through: 1) an encoder to covertly embed messages in both mesh geometry and textures; 2) a differentiable renderer to render watermarked 3D objects from different camera angles and under varied lighting conditions; 3) a decoder to recover the messages from 2D rendered images. |
Innfarn Yoo; Huiwen Chang; Xiyang Luo; Ondrej Stava; Ce Liu; Peyman Milanfar; Feng Yang; |
2042 | Beyond A Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The assumption that such outputs can represent all necessary information is unrealistic, especially when the detector is transferred across datasets. In this work, we reason about the graphical model induced by this assumption, and propose to add an auxiliary input to represent missing information such as object relationships. |
Chia-Wen Kuo; Zsolt Kira; |
2043 | Symmetry-Aware Neural Architecture for Embodied Visual Exploration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we attempt to improve the generalization ability by utilizing the inductive biases available for the task. |
Shuang Liu; Takayuki Okatani; |
2044 | AirObject: A Temporally Evolving Graph Embedding for Object Identification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Furthermore, given the vast distribution of unknown novel objects in the real world, the object identification process must be class-agnostic. In this context, we propose a novel temporal 3D object encoding approach, dubbed AirObject, to obtain global keypoint graph-based embeddings of objects. |
Nikhil Varma Keetha; Chen Wang; Yuheng Qiu; Kuan Xu; Sebastian Scherer; |
2045 | From Representation to Reasoning: Towards Both Evidence and Commonsense Reasoning for Video Question-Answering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To facilitate deeper video understanding towards video reasoning, we present the task of Causal-VidQA, which includes four types of questions ranging from scene description (description) to evidence reasoning (explanation) and commonsense reasoning (prediction and counterfactual). |
Jiangtong Li; Li Niu; Liqing Zhang; |
2046 | Semantic-Aware Domain Generalized Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we address domain generalized semantic segmentation, where a segmentation model is trained to be domain-invariant without using any target domain data. |
Duo Peng; Yinjie Lei; Munawar Hayat; Yulan Guo; Wen Li; |
2047 | TransVPR: Transformer-Based Place Recognition With Multi-Level Attention Aggregation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce a novel holistic place recognition model, TransVPR, based on vision Transformers. |
Ruotong Wang; Yanqing Shen; Weiliang Zuo; Sanping Zhou; Nanning Zheng; |
2048 | DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In response to such bias, we would like to re-emphasize that methods for multi-object tracking should also work when object appearance is not sufficiently discriminative. To this end, we propose a large-scale dataset for multi-human tracking, where humans have similar appearance, diverse motion and extreme articulation. |
Peize Sun; Jinkun Cao; Yi Jiang; Zehuan Yuan; Song Bai; Kris Kitani; Ping Luo; |
2049 | Unsupervised Learning of Debiased Representations With Pseudo-Attributes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Although existing works often handle this issue using human supervision, obtaining the proper annotations is impractical and even unrealistic. To better tackle this challenge, we propose a simple but effective debiasing technique in an unsupervised manner. |
Seonguk Seo; Joon-Young Lee; Bohyung Han; |
2050 | Protecting Celebrities From DeepFake With Identity Consistency Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose Identity Consistency Transformer, a novel face forgery detection method that focuses on high-level semantics, specifically identity information, and detects a suspect face by finding identity inconsistency in inner and outer face regions. |
Xiaoyi Dong; Jianmin Bao; Dongdong Chen; Ting Zhang; Weiming Zhang; Nenghai Yu; Dong Chen; Fang Wen; Baining Guo; |
2051 | Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we show how the global reasoning of (scaled) dot-product attention can be the source of a major vulnerability when confronted with adversarial patch attacks. |
Giulio Lovisotto; Nicole Finnie; Mauricio Munoz; Chaithanya Kumar Mummadi; Jan Hendrik Metzen; |
2052 | TubeDETR: Spatio-Temporal Video Grounding With Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We consider the problem of localizing a spatio-temporal tube in a video corresponding to a given text query. |
Antoine Yang; Antoine Miech; Josef Sivic; Ivan Laptev; Cordelia Schmid; |
2053 | KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While previous works tackle the problem by learning embeddings for the compositions jointly, here we revisit a simple CZSL baseline and predict the primitives, i.e., states and objects, independently. |
Shyamgopal Karthik; Massimiliano Mancini; Zeynep Akata; |
2054 | SLIC: Self-Supervised Learning With Iterative Clustering for Human Action Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A typical assumption is that similar clips only occur temporally close within a single video, leading to insufficient examples of motion similarity. To mitigate this, we propose SLIC, a clustering-based self-supervised contrastive learning method for human action videos. |
Salar Hosseini Khorasgani; Yuxuan Chen; Florian Shkurti; |
2055 | CD2-pFed: Cyclic Distillation-Guided Channel Decoupling for Model Personalization in Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose CD^2-pFed, a novel Cyclic Distillation-guided Channel Decoupling framework, to personalize the global model in FL, under various settings of data heterogeneity. |
Yiqing Shen; Yuyin Zhou; Lequan Yu; |
2056 | UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This is a closed-set scenario that fails to test the capability of systems at detecting new anomaly types. To this end, we propose UBnormal, a new supervised open-set benchmark composed of multiple virtual scenes for video anomaly detection. |
Andra Acsintoae; Andrei Florescu; Mariana-Iuliana Georgescu; Tudor Mare; Paul Sumedrea; Radu Tudor Ionescu; Fahad Shahbaz Khan; Mubarak Shah; |
2057 | Beyond Cross-View Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Departing from the conventional wisdom of image retrieval, this paper presents a novel solution that can achieve highly-accurate localization. |
Yujiao Shi; Hongdong Li; |
2058 | Closing The Generalization Gap of Cross-Silo Federated Medical Image Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel training framework FedSM to avoid the client drift issue and successfully close the generalization gap compared with the centralized training for medical image segmentation tasks for the first time. |
An Xu; Wenqi Li; Pengfei Guo; Dong Yang; Holger R. Roth; Ali Hatamizadeh; Can Zhao; Daguang Xu; Heng Huang; Ziyue Xu; |
2059 | AKB-48: A Real-World Articulated Object Knowledge Base Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To build the AKB-48, we present a fast articulation knowledge modeling (FArM) pipeline, which can fulfill the ArtiKG for an articulated object within 10-15 minutes, and largely reduce the cost for object modeling in the real world. |
Liu Liu; Wenqiang Xu; Haoyuan Fu; Sucheng Qian; Qiaojun Yu; Yang Han; Cewu Lu; |
2060 | Style-ERD: Responsive and Coherent Online Motion Style Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, for online animation applications, such as real-time avatar animation from motion capture, motions need to be processed as a stream with minimal latency. In this work, we realize a flexible, high-quality motion style transfer method for this setting. |
Tianxin Tao; Xiaohang Zhan; Zhongquan Chen; Michiel van de Panne; |
2061 | Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In essence, this strategy ignores the fact that two crops may truly contain different image information, e.g., background and small objects, and thus tends to restrain the diversity of the learned representations. In this work, we address this issue by introducing a new self-supervised learning strategy, LoGo, that explicitly reasons about Local and Global crops. |
Tong Zhang; Congpei Qiu; Wei Ke; Sabine Süsstrunk; Mathieu Salzmann; |
2062 | Stratified Transformer for 3D Point Cloud Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Stratified Transformer that is able to capture long-range contexts and demonstrates strong generalization ability and high performance. |
Xin Lai; Jianhui Liu; Li Jiang; Liwei Wang; Hengshuang Zhao; Shu Liu; Xiaojuan Qi; Jiaya Jia; |
2063 | NeRF in The Dark: High Dynamic Range View Synthesis From Noisy Raw Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We modify NeRF to instead train directly on linear raw images, preserving the scene’s full dynamic range. |
Ben Mildenhall; Peter Hedman; Ricardo Martin-Brualla; Pratul P. Srinivasan; Jonathan T. Barron; |
2064 | DArch: Dental Arch Prior-Assisted 3D Tooth Instance Segmentation With Weak Annotations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing learning-based methods rely heavily on expensive point-wise annotations. To alleviate this problem, we are the first to explore a low-cost annotation way for 3D tooth instance segmentation, i.e., labeling all tooth centroids and only a few teeth for each dental model. |
Liangdong Qiu; Chongjie Ye; Pei Chen; Yunbi Liu; Xiaoguang Han; Shuguang Cui; |
2065 | Task Decoupled Framework for Reference-Based Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Those methods conduct the super-resolution task of the input low-resolution (LR) image and the texture transfer task from the reference image together in one module, easily introducing interference between LR and reference features. Inspired by this finding, we propose a novel framework, which decouples the two tasks of RefSR, eliminating the interference between the LR image and the reference image. |
Yixuan Huang; Xiaoyun Zhang; Yu Fu; Siheng Chen; Ya Zhang; Yan-Feng Wang; Dazhi He; |
2066 | Aug-NeRF: Training Stronger Neural Radiance Fields With Triple-Level Physically-Grounded Augmentations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by that, we propose Augmented NeRF (Aug-NeRF), which for the first time brings the power of robust data augmentations into regularizing the NeRF training. |
Tianlong Chen; Peihao Wang; Zhiwen Fan; Zhangyang Wang; |
2067 | RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To tackle the task, we propose a deep learning architecture trained in a self-supervised manner by exploiting a further RGB camera, required only during training data acquisition. |
Fabio Tosi; Pierluigi Zama Ramirez; Matteo Poggi; Samuele Salti; Stefano Mattoccia; Luigi Di Stefano; |
2068 | Id-Free Person Similarity Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we present a contrastive learning framework to learn person similarity without using manually labeled identity annotations. |
Bing Shuai; Xinyu Li; Kaustav Kundu; Joseph Tighe; |
2069 | Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing methods treat it as a cross-modality retrieval task and learn common latent embeddings from the image and video modalities, which is both less effective and less efficient due to the large modality gap and the redundant feature learning incurred by utilizing all video frames. In this work, we first regard this task as a point-to-set matching problem, analogous to the human decision process, and propose a novel Temporal Complementarity-Guided Reinforcement Learning (TCRL) approach for image-to-video person re-identification. |
Wei Wu; Jiawei Liu; Kecheng Zheng; Qibin Sun; Zheng-Jun Zha; |
2070 | Globetrotter: Connecting Languages By Connecting Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a method that uses visual observations to bridge the gap between languages, rather than relying on parallel corpora or topological properties of the representations. |
Dídac Surís; Dave Epstein; Carl Vondrick; |
2071 | Fairness-Aware Adversarial Perturbation Towards Bias Mitigation for Deployed Deep Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: By contrast, we propose a more flexible approach, i.e., fairness-aware adversarial perturbation (FAAP), which learns to perturb input data to blind deployed models on fairness-related features, e.g., gender and ethnicity. |
Zhibo Wang; Xiaowei Dong; Henry Xue; Zhifei Zhang; Weifeng Chiu; Tao Wei; Kui Ren; |
2072 | Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a memory efficient method, named Stochastic Backpropagation (SBP), for training deep neural networks on videos. |
Feng Cheng; Mingze Xu; Yuanjun Xiong; Hao Chen; Xinyu Li; Wei Li; Wei Xia; |
2073 | Semantic-Shape Adaptive Feature Modulation for Semantic Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In order to exploit the part-level layouts, we propose a Shape-aware Position Descriptor (SPD) to describe each pixel’s positional feature, where object shape is explicitly encoded into the SPD feature. |
Zhengyao Lv; Xiaoming Li; Zhenxing Niu; Bing Cao; Wangmeng Zuo; |
2074 | Egocentric Scene Understanding Via Multimodal Spatial Rectifier Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study a problem of egocentric scene understanding, i.e., predicting depths and surface normals from an egocentric image. |
Tien Do; Khiem Vuong; Hyun Soo Park; |
2075 | Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Hence, such a pixel can be convincingly treated as a negative sample to those most unlikely categories. Based on this insight, we develop an effective pipeline to make sufficient use of unlabeled data. |
Yuchao Wang; Haochen Wang; Yujun Shen; Jingjing Fei; Wei Li; Guoqiang Jin; Liwei Wu; Rui Zhao; Xinyi Le; |
2076 | Day-to-Night Image Synthesis for Training Nighttime Neural ISPs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address this problem, we propose a method that synthesizes nighttime images from daytime images. |
Abhijith Punnappurath; Abdullah Abuolaim; Abdelrahman Abdelhamed; Alex Levinshtein; Michael S. Brown; |
2077 | Commonality in Natural Images Rescues GANs: Pretraining GANs With Generic and Privacy-Free Synthetic Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, existing studies show that a model pretrained on a single benchmark dataset does not generalize to various target datasets. More importantly, the pretrained model can be vulnerable to copyright or privacy risks as membership inference attacks advance. To resolve both issues, we propose an effective and unbiased data synthesizer, namely Primitives-PS, inspired by the generic characteristics of natural images. |
Kyungjune Baek; Hyunjung Shim; |