CVPR 2022 Papers with Code/Data
Readers are also encouraged to read our CVPR 2022 highlights, which associate each CVPR 2022 paper with a one-sentence highlight. You may also like to explore our “Best Paper” Digest (CVPR), which lists the most influential CVPR papers since 1988.
To help the community quickly catch up on the work presented in this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper. Based in New York, Paper Digest is dedicated to producing high-quality text analysis results that people can actually use on a daily basis. Since 2018, we have been serving users across the world with a number of exclusive services on ranking, search, tracking and automatic literature review.
If you do not want to miss interesting academic papers, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: CVPR 2022 Papers with Code/Data
No. | Paper | Author(s) | Code |
---|---|---|---|
1 | Controllable Animation of Fluid Elements in Still Images Highlight: We propose a method to interactively control the animation of fluid elements in still images to generate cinemagraphs. |
Aniruddha Mahapatra; Kuldeep Kulkarni; | code |
2 | F-SfT: Shape-From-Template With A Physics-Based Deformation Model Highlight: In contrast to previous works, this paper proposes a new SfT approach explaining 2D observations through physical simulations accounting for forces and material properties. |
Navami Kairanda; Edith Tretschk; Mohamed Elgharib; Christian Theobalt; Vladislav Golyanik; | code |
3 | TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation Highlight: To leverage the unlabeled data to boost model performance, we present a novel Two-Way Inter-label Self-Training framework named TWIST. |
Ruihang Chu; Xiaoqing Ye; Zhengzhe Liu; Xiao Tan; Xiaojuan Qi; Chi-Wing Fu; Jiaya Jia; | code |
4 | Do Learned Representations Respect Causal Relationships? Highlight: Data often has many semantic attributes that are causally associated with each other. But do attribute-specific learned representations of data also respect the same causal relations? We answer this question in three steps. |
Lan Wang; Vishnu Naresh Boddeti; | code |
5 | ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic Highlight: While such models can provide a powerful score for matching and subsequent zero-shot tasks, they are not capable of generating caption given an image. In this work, we repurpose such models to generate a descriptive text given an image at inference time, without any further training or tuning step. |
Yoad Tewel; Yoav Shalev; Idan Schwartz; Lior Wolf; | code |
6 | 3D Moments From Near-Duplicate Photos Highlight: We introduce 3D Moments, a new computational photography effect. |
Qianqian Wang; Zhengqi Li; David Salesin; Noah Snavely; Brian Curless; Janne Kontkanen; | code |
7 | Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization Highlight: In this work, we, for the first time to our best knowledge, propose to perform Exact Feature Distribution Matching (EFDM) by exactly matching the empirical Cumulative Distribution Functions (eCDFs) of image features, which could be implemented by applying the Exact Histogram Matching (EHM) in the image feature space. |
Yabin Zhang; Minghan Li; Ruihuang Li; Kui Jia; Lei Zhang; | code |
8 | Blind2Unblind: Self-Supervised Image Denoising With Visible Blind Spots Highlight: In this paper, we propose a simple yet efficient approach called Blind2Unblind to overcome the information loss in blindspot-driven denoising methods. |
Zejin Wang; Jiazheng Liu; Guoqing Li; Hua Han; | code |
9 | Balanced and Hierarchical Relation Learning for One-Shot Object Detection Highlight: In this paper, we introduce the balanced and hierarchical learning for our detector. |
Hanqing Yang; Sijia Cai; Hualian Sheng; Bing Deng; Jianqiang Huang; Xian-Sheng Hua; Yong Tang; Yu Zhang; | code |
10 | NICE-SLAM: Neural Implicit Scalable Encoding for SLAM Highlight: In this paper, we present NICE-SLAM, a dense SLAM system that incorporates multi-level local information by introducing a hierarchical scene representation. |
Zihan Zhu; Songyou Peng; Viktor Larsson; Weiwei Xu; Hujun Bao; Zhaopeng Cui; Martin R. Oswald; Marc Pollefeys; | code |
11 | Stochastic Trajectory Prediction Via Motion Indeterminacy Diffusion Highlight: In this paper, we present a new framework to formulate the trajectory prediction task as a reverse process of motion indeterminacy diffusion (MID), in which we progressively discard indeterminacy from all the walkable areas until reaching the desired trajectory. |
Tianpei Gu; Guangyi Chen; Junlong Li; Chunze Lin; Yongming Rao; Jie Zhou; Jiwen Lu; | code |
12 | CLRNet: Cross Layer Refinement Network for Lane Detection Highlight: In this work, we present Cross Layer Refinement Network (CLRNet) aiming at fully utilizing both high-level and low-level features in lane detection. |
Tu Zheng; Yifei Huang; Yang Liu; Wenjian Tang; Zheng Yang; Deng Cai; Xiaofei He; | code |
13 | Motion-Aware Contrastive Video Representation Learning Via Foreground-Background Merging Highlight: Such bias makes the model suffer from weak generalization ability, leading to worse performance on downstream tasks such as action recognition. To alleviate such bias, we propose Foreground-background Merging (FAME) to deliberately compose the moving foreground region of the selected video onto the static background of others. |
Shuangrui Ding; Maomao Li; Tianyu Yang; Rui Qian; Haohang Xu; Qingyi Chen; Jue Wang; Hongkai Xiong; | code |
14 | DINE: Domain Adaptation From Single and Multiple Black-Box Predictors Highlight: This paper studies a practical and interesting setting for UDA, where only black-box source models (i.e., only network predictions are available) are provided during adaptation in the target domain. To solve this problem, we propose a new two-step knowledge adaptation framework called DIstill and fine-tuNE (DINE). |
Jian Liang; Dapeng Hu; Jiashi Feng; Ran He; | code |
15 | FaceFormer: Speech-Driven 3D Facial Animation With Transformers Highlight: Prior works typically focus on learning phoneme-level features of short audio windows with limited context, occasionally resulting in inaccurate lip movements. To tackle this limitation, we propose a Transformer-based autoregressive model, FaceFormer, which encodes the long-term audio context and autoregressively predicts a sequence of animated 3D face meshes. |
Yingruo Fan; Zhaojiang Lin; Jun Saito; Wenping Wang; Taku Komura; | code |
16 | Rotationally Equivariant 3D Object Detection Highlight: To incorporate object-level rotation equivariance into 3D object detectors, we need a mechanism to extract equivariant features with local object-level spatial support while being able to model cross-object context information. To this end, we propose Equivariant Object detection Network (EON) with a rotation equivariance suspension design to achieve object-level equivariance. |
Hong-Xing Yu; Jiajun Wu; Li Yi; | code |
17 | Accelerating DETR Convergence Via Semantic-Aligned Matching Highlight: We observe that the slow convergence is largely attributed to the complication in matching object queries with target features in different feature embedding spaces. This paper presents SAM-DETR, a Semantic-Aligned-Matching DETR that greatly accelerates DETR’s convergence without sacrificing its accuracy. |
Gongjie Zhang; Zhipeng Luo; Yingchen Yu; Kaiwen Cui; Shijian Lu; | code |
18 | Cloning Outfits From Real-World Images to 3D Characters for Generalizable Person Re-Identification Highlight: However, synthesized persons in existing datasets are mostly cartoon-like and in random dress collocation, which limits their performance. To address this, in this work, an automatic approach is proposed to directly clone the whole outfits from real-world person images to virtual 3D characters, such that any virtual person thus created will appear very similar to its real-world counterpart. |
Yanan Wang; Xuezhi Liang; Shengcai Liao; | code |
19 | GeoNeRF: Generalizing NeRF With Geometry Priors Highlight: We present GeoNeRF, a generalizable photorealistic novel view synthesis method based on neural radiance fields. |
Mohammad Mahdi Johari; Yann Lepoittevin; François Fleuret; | code |
20 | ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo Highlight: In this paper, we propose a novel adaptive blend pyramid network, which aims to achieve fast local retouching on ultra high-resolution photos. |
Biwen Lei; Xiefan Guo; Hongyu Yang; Miaomiao Cui; Xuansong Xie; Di Huang; | code |
21 | Expanding Low-Density Latent Regions for Open-Set Object Detection Highlight: In this work, we propose to identify unknown objects by separating high/low-density regions in the latent space, based on the consensus that unknown objects are usually distributed in low-density latent regions. |
Jiaming Han; Yuqiang Ren; Jian Ding; Xingjia Pan; Ke Yan; Gui-Song Xia; | code |
22 | Uformer: A General U-Shaped Transformer for Image Restoration Highlight: In this paper, we present Uformer, an effective and efficient Transformer-based architecture for image restoration, in which we build a hierarchical encoder-decoder network using the Transformer block. |
Zhendong Wang; Xiaodong Cun; Jianmin Bao; Wengang Zhou; Jianzhuang Liu; Houqiang Li; | code |
23 | Exploring Dual-Task Correlation for Pose Guided Person Image Generation Highlight: Most of the existing methods only focus on the ill-posed source-to-target task and fail to capture reasonable texture mapping. To address this problem, we propose a novel Dual-task Pose Transformer Network (DPTN), which introduces an auxiliary task (i.e., source-to-source task) and exploits the dual-task correlation to promote the performance of PGPIG. |
Pengze Zhang; Lingxiao Yang; Jian-Huang Lai; Xiaohua Xie; | code |
24 | Portrait Eyeglasses and Shadow Removal By Leveraging 3D Synthetic Data Highlight: In this paper, we propose a novel framework to remove eyeglasses as well as their cast shadows from face images. |
Junfeng Lyu; Zhibo Wang; Feng Xu; | code |
25 | Modeling 3D Layout for Group Re-Identification Highlight: However, layout ambiguity is introduced because these methods only consider the 2D layout on the imaging plane. In this paper, we overcome the above limitations by 3D layout modeling. |
Quan Zhang; Kaiheng Dang; Jian-Huang Lai; Zhanxiang Feng; Xiaohua Xie; | code |
26 | Toward Fast, Flexible, and Robust Low-Light Image Enhancement Highlight: In this paper, we develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening images in real-world low-light scenarios. |
Long Ma; Tengyu Ma; Risheng Liu; Xin Fan; Zhongxuan Luo; | code |
27 | Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos Highlight: In this paper, we propose a prompt-based framework, Bridge-Prompt (Br-Prompt), to model the semantics across adjacent actions, so that it simultaneously exploits both out-of-context and contextual information from a series of ordinal actions in instructional videos. |
Muheng Li; Lei Chen; Yueqi Duan; Zhilan Hu; Jianjiang Feng; Jie Zhou; Jiwen Lu; | code |
28 | HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network Highlight: Thus, in this work, we propose a novel 3D hand mesh estimation network, HandOccNet, that can fully exploit the information at occluded regions as a secondary means to enhance image features and make them much richer. |
JoonKyu Park; Yeonguk Oh; Gyeongsik Moon; Hongsuk Choi; Kyoung Mu Lee; | code |
29 | Modular Action Concept Grounding in Semantic Video Prediction Highlight: Inspired by the idea of Mixture of Experts, we embody each abstract label by a structured combination of various visual concept learners and propose a novel video prediction model, Modular Action Concept Network (MAC). |
Wei Yu; Wenxin Chen; Songheng Yin; Steve Easterbrook; Animesh Garg; | code |
30 | StyleSwin: Transformer-Based GAN for High-Resolution Image Generation Highlight: In this paper, we seek to explore using pure transformers to build a generative adversarial network for high-resolution image synthesis. |
Bowen Zhang; Shuyang Gu; Bo Zhang; Jianmin Bao; Dong Chen; Fang Wen; Yong Wang; Baining Guo; | code |
31 | Discrete Cosine Transform Network for Guided Depth Map Super-Resolution Highlight: To solve the challenges in interpreting the working mechanism, extracting cross-modal features and RGB texture over-transferred, we propose a novel Discrete Cosine Transform Network (DCTNet) to alleviate the problems from three aspects. |
Zixiang Zhao; Jiangshe Zhang; Shuang Xu; Zudi Lin; Hanspeter Pfister; | code |
32 | Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing Highlight: In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. |
Xiaoxue Chen; Tianyu Liu; Hao Zhao; Guyue Zhou; Ya-Qin Zhang; | code |
33 | TransGeo: Transformer Is All You Need for Cross-View Image Geo-Localization Highlight: The dominant CNN-based methods for cross-view image geo-localization rely on polar transform and fail to model global correlation. We propose a pure transformer-based approach (TransGeo) to address these limitations from a different perspective. |
Sijie Zhu; Mubarak Shah; Chen Chen; | code |
34 | Contrastive Boundary Learning for Point Cloud Segmentation Highlight: In this paper, we focus on the segmentation of scene boundaries. |
Liyao Tang; Yibing Zhan; Zhe Chen; Baosheng Yu; Dacheng Tao; | code |
35 | Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution Highlight: In this paper, we demonstrate that it is possible to train a GAN-based SISR model which can stably generate perceptually realistic details while inhibiting visual artifacts. |
Jie Liang; Hui Zeng; Lei Zhang; | code |
36 | CVNet: Contour Vibration Network for Building Extraction Highlight: Inspired by the physical vibration theory, we propose a contour vibration network (CVNet) for automatic building boundary delineation. |
Ziqiang Xu; Chunyan Xu; Zhen Cui; Xiangwei Zheng; Jian Yang; | code |
37 | Swin Transformer V2: Scaling Up Capacity and Resolution Highlight: We present techniques for scaling Swin Transformer up to 3 billion parameters and making it capable of training with images of up to 1,536×1,536 resolution. |
Ze Liu; Han Hu; Yutong Lin; Zhuliang Yao; Zhenda Xie; Yixuan Wei; Jia Ning; Yue Cao; Zheng Zhang; Li Dong; Furu Wei; Baining Guo; | code |
38 | Projective Manifold Gradient Layer for Deep Rotation Regression Highlight: In this paper, we propose a manifold-aware gradient that directly backpropagates into deep network weights. |
Jiayi Chen; Yingda Yin; Tolga Birdal; Baoquan Chen; Leonidas J. Guibas; He Wang; | code |
39 | HCSC: Hierarchical Contrastive Selective Coding Highlight: In addition, the negative pairs used in these methods are not guaranteed to be semantically distinct, which could further hamper the structural correctness of learned image representations. To tackle these limitations, we propose a novel contrastive learning framework called Hierarchical Contrastive Selective Coding (HCSC). |
Yuanfan Guo; Minghao Xu; Jiawen Li; Bingbing Ni; Xuanyu Zhu; Zhenbang Sun; Yi Xu; | code |
40 | TransRank: Self-Supervised Video Representation Learning Via Ranking-Based Transformation Recognition Highlight: Based on hard-label classification, existing RecogTrans approaches suffer from noisy supervision signals in pre-training. To mitigate this problem, we developed TransRank, a unified framework for recognizing Transformations in a Ranking formulation. |
Haodong Duan; Nanxuan Zhao; Kai Chen; Dahua Lin; | code |
41 | DiSparse: Disentangled Sparsification for Multitask Model Compression Highlight: In this paper, we propose DiSparse, a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme. |
Xinglong Sun; Ali Hassani; Zhangyang Wang; Gao Huang; Humphrey Shi; | code |
42 | Pushing The Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make A Difference Highlight: We seek to push the limits of a simple-but-effective pipeline for real-world few-shot image classification in practice. To this end, we explore few-shot learning from the perspective of neural architecture, as well as a three stage pipeline of pre-training on external data, meta-training with labelled few-shot tasks, and task-specific fine-tuning on unseen tasks. |
Shell Xu Hu; Da Li; Jan Stühmer; Minyoung Kim; Timothy M. Hospedales; | code |
43 | Towards Efficient and Scalable Sharpness-Aware Minimization Highlight: In this paper, we propose a novel algorithm, LookSAM, that only periodically calculates the inner gradient ascent, to significantly reduce the additional training cost of SAM. |
Yong Liu; Siqi Mai; Xiangning Chen; Cho-Jui Hsieh; Yang You; | code |
44 | OSSO: Obtaining Skeletal Shape From Outside Highlight: We address the problem of inferring the anatomic skeleton of a person, in an arbitrary pose, from the 3D surface of the body; i.e. we predict the inside (bones) from the outside (skin). |
Marilyn Keller; Silvia Zuffi; Michael J. Black; Sergi Pujades; | code |
45 | A Study on The Distribution of Social Biases in Self-Supervised Learning Visual Models Highlight: In this paper, we study the biases of a varied set of SSL visual models, trained using ImageNet data, using a method and dataset designed by psychological experts to measure social biases. |
Kirill Sirotkin; Pablo Carballeira; Marcos Escudero-Viñolo; | code |
46 | Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes Highlight: In this paper, instead of following previous literature, we propose Self-Supervised Predictive Learning (SSPL), a negative-free method for sound localization via explicit positive mining. |
Zengjie Song; Yuxi Wang; Junsong Fan; Tieniu Tan; Zhaoxiang Zhang; | code |
47 | Comparing Correspondences: Video Prediction With Correspondence-Wise Losses Highlight: Image prediction methods often struggle on tasks that require changing the positions of objects, such as video prediction, producing blurry images that average over the many positions that objects might occupy. In this paper, we propose a simple change to existing image similarity metrics that makes them more robust to positional errors: we match the images using optical flow, then measure the visual similarity of corresponding pixels. |
Daniel Geng; Max Hamilton; Andrew Owens; | code |
48 | Towards Fewer Annotations: Active Learning Via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation Highlight: In this paper, we propose a simple region-based active learning approach for semantic segmentation under a domain shift, aiming to automatically query a small partition of image regions to be labeled while maximizing segmentation performance. |
Binhui Xie; Longhui Yuan; Shuang Li; Chi Harold Liu; Xinjing Cheng; | code |
49 | CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding Highlight: We observe in the real world that humans are capable of mapping the visual concepts learnt from 2D images to understand the 3D world. Encouraged by this insight, we propose CrossPoint, a simple cross-modal contrastive learning approach to learn transferable 3D point cloud representations. |
Mohamed Afham; Isuru Dissanayake; Dinithi Dissanayake; Amaya Dharmasiri; Kanchana Thilakarathna; Ranga Rodrigo; | code |
50 | Few Shot Generative Model Adaption Via Relaxed Spatial Structural Alignment Highlight: However, existing methods are prone to model overfitting and collapse in extremely few shot setting (less than 10). To solve this problem, we propose a relaxed spatial structural alignment (RSSA) method to calibrate the target generative models during the adaption. |
Jiayu Xiao; Liang Li; Chaofei Wang; Zheng-Jun Zha; Qingming Huang; | code |
51 | Enhancing Adversarial Training With Second-Order Statistics of Weights Highlight: In this paper, we show that treating model weights as random variables allows for enhancing adversarial training through Second-Order Statistics Optimization (S^2O) with respect to the weights. |
Gaojie Jin; Xinping Yi; Wei Huang; Sven Schewe; Xiaowei Huang; | code |
52 | Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo Highlight: Our findings motivate us to simplify MoCo v2 via the removal of its dictionary as well as momentum. |
Chaoning Zhang; Kang Zhang; Trung X. Pham; Axi Niu; Zhinan Qiao; Chang D. Yoo; In So Kweon; | code |
53 | Moving Window Regression: A Novel Approach to Ordinal Regression Highlight: A novel ordinal regression algorithm, called moving window regression (MWR), is proposed in this paper. |
Nyeong-Ho Shin; Seon-Ho Lee; Chang-Su Kim; | code |
54 | Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection Highlight: Different from related methods, we propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block. |
Nicolae-Cătălin Ristea; Neelu Madan; Radu Tudor Ionescu; Kamal Nasrollahi; Fahad Shahbaz Khan; Thomas B. Moeslund; Mubarak Shah; | code |
55 | Robust Optimization As Data Augmentation for Large-Scale Graphs Highlight: We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training. |
Kezhi Kong; Guohao Li; Mucong Ding; Zuxuan Wu; Chen Zhu; Bernard Ghanem; Gavin Taylor; Tom Goldstein; | code |
56 | Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients Highlight: In contrast to the literature, we propose a family of robust structured declarative classifiers for point cloud classification, where the internal constrained optimization mechanism can effectively defend adversarial attacks through implicit gradients. |
Kaidong Li; Ziming Zhang; Cuncong Zhong; Guanghui Wang; | code |
57 | Improving The Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input Highlight: However, prior works utilize simple image transformations such as resizing, which limits input diversity. To tackle this limitation, we propose the object-based diverse input (ODI) method that draws an adversarial image on a 3D object and induces the rendered image to be classified as the target class. |
Junyoung Byun; Seungju Cho; Myung-Joon Kwon; Hee-Seon Kim; Changick Kim; | code |
58 | ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer Highlight: We present ObjectFolder 2.0, a large-scale, multisensory dataset of common household objects in the form of implicit neural representations that significantly enhances ObjectFolder 1.0 in three aspects. |
Ruohan Gao; Zilin Si; Yen-Yu Chang; Samuel Clarke; Jeannette Bohg; Li Fei-Fei; Wenzhen Yuan; Jiajun Wu; | code |
59 | 360MonoDepth: High-Resolution 360° Monocular Depth Estimation Highlight: In this work, we propose a flexible framework for monocular depth estimation from high-resolution 360° images using tangent images. |
Manuel Rey-Area; Mingze Yuan; Christian Richardt; | code |
60 | POCO: Point Convolution for Surface Reconstruction Highlight: Besides, relying on fixed patch sizes may require discretization tuning. To address these issues, we propose to use point cloud convolutions and compute latent vectors at each input point. |
Alexandre Boulch; Renaud Marlet; | code |
61 | Neural Texture Extraction and Distribution for Controllable Person Image Synthesis Highlight: Observing that person images are highly structured, we propose to generate desired images by extracting and distributing semantic entities of reference images. |
Yurui Ren; Xiaoqing Fan; Ge Li; Shan Liu; Thomas H. Li; | code |
62 | Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs Highlight: Today’s VidSGG models are all proposal-based methods, i.e., they first generate numerous paired subject-object snippets as proposals, and then conduct predicate classification for each proposal. In this paper, we argue that this prevalent proposal-based framework has three inherent drawbacks: 1) The ground-truth predicate labels for proposals are partially correct. |
Kaifeng Gao; Long Chen; Yulei Niu; Jian Shao; Jun Xiao; | code |
63 | DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis Highlight: To these ends, we propose a simpler but more effective Deep Fusion Generative Adversarial Networks (DF-GAN). |
Ming Tao; Hao Tang; Fei Wu; Xiao-Yuan Jing; Bing-Kun Bao; Changsheng Xu; | code |
64 | ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes Highlight: In this paper, we take a step towards computer-aided waste detection and present the first in-the-wild industrial-grade waste detection and segmentation dataset, ZeroWaste. |
Dina Bashkirova; Mohamed Abdelfattah; Ziliang Zhu; James Akl; Fadi Alladkani; Ping Hu; Vitaly Ablavsky; Berk Calli; Sarah Adel Bargal; Kate Saenko; | code |
65 | UNIST: Unpaired Neural Implicit Shape Translation Network Highlight: We introduce UNIST, the first deep neural implicit model for general-purpose, unpaired shape-to-shape translation, in both 2D and 3D domains. |
Qimin Chen; Johannes Merz; Aditya Sanghi; Hooman Shayani; Ali Mahdavi-Amiri; Hao Zhang; | code |
66 | APES: Articulated Part Extraction From Sprite Sheets Highlight: Creating these puppets requires partitioning characters into independently moving parts. In this work, we present a method to automatically identify such articulated parts from a small set of character poses shown in a sprite sheet, which is an illustration of the character that artists often draw before puppet creation. |
Zhan Xu; Matthew Fisher; Yang Zhou; Deepali Aneja; Rushikesh Dudhat; Li Yi; Evangelos Kalogerakis; | code |
67 | SPAct: Self-Supervised Privacy Preservation for Action Recognition Highlight: For the first time, we present a novel training framework which removes privacy information from input video in a self-supervised manner without requiring privacy labels. |
Ishan Rajendrakumar Dave; Chen Chen; Mubarak Shah; | code |
68 | De-Rendering 3D Objects in The Wild Highlight: We present a weakly supervised method that is able to decompose a single image of an object into shape (depth and normals), material (albedo, reflectivity and shininess) and global lighting parameters. |
Felix Wimbauer; Shangzhe Wu; Christian Rupprecht; | code |
69 | Global Sensing and Measurements Reuse for Image Compressed Sensing Highlight: Moreover, using measurements only once may not be enough for extracting richer information from measurements. To address these issues, we propose a novel Measurements Reuse Convolutional Compressed Sensing Network (MR-CCSNet) which employs Global Sensing Module (GSM) to collect all level features for achieving an efficient sensing and Measurements Reuse Block (MRB) to reuse measurements multiple times on multi-scale. |
Zi-En Fan; Feng Lian; Jia-Ni Quan; | code |
70 | Practical Evaluation of Adversarial Robustness Via Adaptive Auto Attack Highlight: A practical evaluation method should be convenient (i.e., parameter-free), efficient (i.e., fewer iterations) and reliable (i.e., approaching the lower bound of robustness). Towards this target, we propose a parameter-free Adaptive Auto Attack (A3) evaluation method which addresses the efficiency and reliability in a test-time-training fashion. |
Ye Liu; Yaya Cheng; Lianli Gao; Xianglong Liu; Qilong Zhang; Jingkuan Song; | code |
71 | Cross-View Transformers for Real-Time Map-View Semantic Segmentation Highlight: We present cross-view transformers, an efficient attention-based model for map-view semantic segmentation from multiple cameras. |
Brady Zhou; Philipp Krähenbühl; | code |
72 | Controllable Dynamic Multi-Task Architectures Highlight: This challenge motivates models which allow control over the relative importance of tasks and total compute cost during inference time. In this work, we propose such a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints. |
Dripta S. Raychaudhuri; Yumin Suh; Samuel Schulter; Xiang Yu; Masoud Faraki; Amit K. Roy-Chowdhury; Manmohan Chandraker; | code |
73 | FastDOG: Fast Discrete Optimization on GPU Highlight: We present a massively parallel Lagrange decomposition method for solving 0–1 integer linear programs occurring in structured prediction. |
Ahmed Abbas; Paul Swoboda; | code |
74 | Focal and Global Knowledge Distillation for Detectors Highlight: In this paper, we point out that in object detection, the features of the teacher and student vary greatly in different areas, especially in the foreground and background. |
Zhendong Yang; Zhe Li; Xiaohu Jiang; Yuan Gong; Zehuan Yuan; Danpei Zhao; Chun Yuan; | code |
75 | Learning To Prompt for Continual Learning Highlight: Typical methods rely on a rehearsal buffer or known task identity at test time to retrieve learned knowledge and address forgetting, while this work presents a new paradigm for continual learning that aims to train a more succinct memory system without accessing task identity at test time. |
Zifeng Wang; Zizhao Zhang; Chen-Yu Lee; Han Zhang; Ruoxi Sun; Xiaoqi Ren; Guolong Su; Vincent Perot; Jennifer Dy; Tomas Pfister; | code |
76 | Human Mesh Recovery From Multiple Shots Highlight: However, the richness of data comes at the expense of fundamental challenges such as abrupt shot changes and close up shots of actors with heavy truncation, which limits the applicability of existing 3D human understanding methods. In this paper, we address these limitations with the insight that while shot changes of the same scene incur a discontinuity between frames, the 3D structure of the scene still changes smoothly. |
Georgios Pavlakos; Jitendra Malik; Angjoo Kanazawa; | code |
77 | Convolution of Convolution: Let Kernels Spatially Collaborate Highlight: In the biological visual pathway, especially the retina, neurons are tiled along spatial dimensions with the electrical coupling as their local association, while in a convolution layer, kernels are placed along the channel dimension singly. We propose Convolution of Convolution, associating kernels in a layer and letting them collaborate spatially. |
Rongzhen Zhao; Jian Li; Zhenzhi Wu; | code |
78 | Make It Move: Controllable Image-to-Video Generation With Text Descriptions Highlight: The key challenges of TI2V task lie both in aligning appearance and motion from different modalities, and in handling uncertainty in text descriptions. To address these challenges, we propose a Motion Anchor-based video GEnerator (MAGE) with an innovative motion anchor (MA) structure to store appearance-motion aligned representation. |
Yaosi Hu; Chong Luo; Zhenzhong Chen; | code |
79 | Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling Highlight: In this paper, we propose Neural Points, a novel point cloud representation and apply it to the arbitrary-factored upsampling task. |
Wanquan Feng; Jin Li; Hongrui Cai; Xiaonan Luo; Juyong Zhang; | code |
80 | Video-Text Representation Learning Via Differentiable Weak Temporal Alignment Highlight: In this paper, we propose a novel multi-modal self-supervised framework Video-Text Temporally Weak Alignment-based Contrastive Learning (VT-TWINS) to capture significant information from noisy and weakly correlated data using a variant of Dynamic Time Warping (DTW). |
Dohwan Ko; Joonmyung Choi; Juyeon Ko; Shinyeong Noh; Kyoung-Woon On; Eun-Sol Kim; Hyunwoo J. Kim; | code |
81 | Bi-Directional Object-Context Prioritization Learning for Saliency Ranking Highlight: This inspires us to model the region-level interactions, in addition to the object-level reasoning, for saliency ranking. To this end, we propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking. |
Xin Tian; Ke Xu; Xin Yang; Lin Du; Baocai Yin; Rynson W.H. Lau; | code |
82 | Vehicle Trajectory Prediction Works, But Not Everywhere Highlight: We present a novel method that automatically generates realistic scenes causing state-of-the-art models to go off-road. |
Mohammadhossein Bahari; Saeed Saadatnejad; Ahmad Rahimi; Mohammad Shaverdikondori; Amir Hossein Shahidzadeh; Seyed-Mohsen Moosavi-Dezfooli; Alexandre Alahi; | code |
83 | MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer Highlight: Some existing methods leverage depth information from an off-the-shelf depth estimator to assist 3D detection, but suffer from the additional computational burden and achieve limited performance caused by inaccurate depth priors. To alleviate this, we propose MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection. |
Kuan-Chih Huang; Tsung-Han Wu; Hung-Ting Su; Winston H. Hsu; | code |
84 | Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning Highlight: This paper presents new hierarchically cascaded transformers that can improve data efficiency through attribute surrogates learning and spectral tokens pooling. |
Yangji He; Weihan Liang; Dongyang Zhao; Hong-Yu Zhou; Weifeng Ge; Yizhou Yu; Wenqiang Zhang; | code |
85 | Generalized Category Discovery Highlight: In this paper, we consider a highly general image recognition setting wherein, given a labelled and unlabelled set of images, the task is to categorize all images in the unlabelled set. |
Sagar Vaze; Kai Han; Andrea Vedaldi; Andrew Zisserman; | code |
86 | Contour-Hugging Heatmaps for Landmark Detection Highlight: We propose an effective and easy-to-implement method for simultaneously performing landmark detection in images and obtaining an ingenious uncertainty measurement for each landmark. |
James McCouat; Irina Voiculescu; | code |
87 | Voxel Field Fusion for 3D Object Detection Highlight: In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion. |
Yanwei Li; Xiaojuan Qi; Yukang Chen; Liwei Wang; Zeming Li; Jian Sun; Jiaya Jia; | code |
88 | DisARM: Displacement Aware Relation Module for 3D Detection Highlight: We introduce Displacement Aware Relation Module (DisARM), a novel neural network module for enhancing the performance of 3D object detection in point cloud scenes. |
Yao Duan; Chenyang Zhu; Yuqing Lan; Renjiao Yi; Xinwang Liu; Kai Xu; | code |
89 | MixFormer: Mixing Features Across Windows and Dimensions Highlight: While local-window self-attention performs notably in vision tasks, it suffers from limited receptive field and weak modeling capability issues. This is mainly because it performs self-attention within non-overlapped windows and shares weights on the channel dimension. We propose MixFormer to find a solution. |
Qiang Chen; Qiman Wu; Jian Wang; Qinghao Hu; Tao Hu; Errui Ding; Jian Cheng; Jingdong Wang; | code |
90 | FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment Highlight: Specifically, we propose to parse pairwise query and exemplar action instances into consecutive steps with diverse semantic and temporal correspondences. |
Jinglin Xu; Yongming Rao; Xumin Yu; Guangyi Chen; Jie Zhou; Jiwen Lu; | code |
91 | HEAT: Holistic Edge Attention Transformer for Structured Reconstruction Highlight: This paper presents a novel attention-based neural network for structured reconstruction, which takes a 2D raster image as an input and reconstructs a planar graph depicting an underlying geometric structure. |
Jiacheng Chen; Yiming Qian; Yasutaka Furukawa; | code |
92 | Mobile-Former: Bridging MobileNet and Transformer Highlight: We present Mobile-Former, a parallel design of MobileNet and transformer with a two-way bridge in between. |
Yinpeng Chen; Xiyang Dai; Dongdong Chen; Mengchen Liu; Xiaoyi Dong; Lu Yuan; Zicheng Liu; | code |
93 | CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision Highlight: To address the difficulties, we propose a new framework for scribble learning-based medical image segmentation, which is composed of mix augmentation and cycle consistency and thus is referred to as CycleMix. |
Ke Zhang; Xiahai Zhuang; | code |
94 | VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution Highlight: However, most of them only support a fixed up-sampling scale, which limits their flexibility and applications. In this work, instead of following the discrete representations, we propose Video Implicit Neural Representation (VideoINR), and we show its applications for STVSR. |
Zeyuan Chen; Yinbo Chen; Jingwen Liu; Xingqian Xu; Vidit Goel; Zhangyang Wang; Humphrey Shi; Xiaolong Wang; | code |
95 | Towards End-to-End Unified Scene Text Detection and Layout Analysis Highlight: Scene text detection and document layout analysis have long been treated as two separate tasks in different image domains. In this paper, we bring them together and introduce the task of unified scene text detection and layout analysis. |
Shangbang Long; Siyang Qin; Dmitry Panteleev; Alessandro Bissacco; Yasuhisa Fujii; Michalis Raptis; | code |
96 | AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation Highlight: In this paper, we propose an autoregressive prior for 3D shapes to solve multimodal 3D tasks such as shape completion, reconstruction, and generation. |
Paritosh Mittal; Yen-Chi Cheng; Maneesh Singh; Shubham Tulsiani; | code |
97 | ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior Highlight: In this work, we first show that optimal neural architectures in the DIP framework are image-dependent. Leveraging this insight, we then propose an image-specific NAS strategy for the DIP framework that requires substantially less training than typical NAS approaches, effectively enabling image-specific NAS. |
Metin Ersin Arican; Ozgur Kara; Gustav Bredell; Ender Konukoglu; | code |
98 | End-to-End Referring Video Object Segmentation With Multimodal Transformers Highlight: In this paper, we propose a simple Transformer-based approach to RVOS. |
Adam Botach; Evgenii Zheltonozhskii; Chaim Baskin; | code |
99 | IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo Highlight: We present IterMVS, a new data-driven method for high-resolution multi-view stereo. |
Fangjinhua Wang; Silvano Galliani; Christoph Vogel; Marc Pollefeys; | code |
100 | Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds Highlight: In particular, the foreground points are inherently more important than background points for object detectors. Motivated by this, we propose a highly-efficient single-stage point-based 3D detector in this paper, termed IA-SSD. |
Yifan Zhang; Qingyong Hu; Guoquan Xu; Yanxin Ma; Jianwei Wan; Yulan Guo; | code |
101 | Detecting Camouflaged Object in Frequency Domain Highlight: To incorporate frequency clues into CNN models, we present a powerful network with two special components. |
Yijie Zhong; Bo Li; Lv Tang; Senyun Kuang; Shuang Wu; Shouhong Ding; | code |
102 | SelfRecon: Self Reconstruction Your Digital Avatar From Monocular Video Highlight: We propose SelfRecon, a clothed human body reconstruction method that combines implicit and explicit representations to recover space-time coherent geometries from a monocular self-rotating human video. |
Boyi Jiang; Yang Hong; Hujun Bao; Juyong Zhang; | code |
103 | Equivariant Point Cloud Analysis Via Learning Orientations for Message Passing Highlight: In this work, we propose a novel and simple framework to achieve equivariance for point cloud analysis based on the message passing (graph neural network) scheme. |
Shitong Luo; Jiahan Li; Jiaqi Guan; Yufeng Su; Chaoran Cheng; Jian Peng; Jianzhu Ma; | code |
104 | Node Representation Learning in Graph Via Node-to-Neighbourhood Mutual Information Maximization Highlight: In this work, we present a simple-yet-effective self-supervised node representation learning strategy via directly maximizing the mutual information between the hidden representations of nodes and their neighbourhood, which can be theoretically justified by its link to graph smoothing. |
Wei Dong; Junsheng Wu; Yi Luo; Zongyuan Ge; Peng Wang; | code |
105 | Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction Highlight: This is called inner-video overfitting, and it would actually lead to inferior performance. To tackle this issue, we propose a novel inter-frame feature reconstruction (IFR) technique to leverage the ground-truth labels to supervise the model training on unlabeled frames. |
Jiafan Zhuang; Zilei Wang; Yuan Gao; | code |
106 | Amodal Segmentation Through Out-of-Task and Out-of-Distribution Generalization With A Bayesian Model Highlight: This task is particularly challenging for deep neural networks because data is difficult to obtain and annotate. Therefore, we formulate amodal segmentation as an out-of-task and out-of-distribution generalization problem. |
Yihong Sun; Adam Kortylewski; Alan Yuille; | code |
107 | How Well Do Sparse ImageNet Models Transfer? Highlight: Generally, more accurate models on the "upstream" dataset tend to provide better transfer accuracy "downstream". In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset, which have been pruned, that is, compressed by sparsifying their connections. |
Eugenia Iofinova; Alexandra Peste; Mark Kurtz; Dan Alistarh; | code |
108 | REX: Reasoning-Aware and Grounded Explanation Highlight: This paper aims to close the gap from three distinct perspectives: first, we define a new type of multi-modal explanations that explain the decisions by progressively traversing the reasoning process and grounding keywords in the images. |
Shi Chen; Qi Zhao; | code |
109 | Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes Highlight: In this work, we disentangle the direct offset into Local Canonical Coordinates (LCC), box scales and box orientations. |
Yang You; Zelin Ye; Yujing Lou; Chengkun Li; Yong-Lu Li; Lizhuang Ma; Weiming Wang; Cewu Lu; | code |
110 | Object-Aware Video-Language Pre-Training for Retrieval Highlight: In this work, we present Object-aware Transformers, an object-centric approach that extends video-language transformer to incorporate object representations. |
Jinpeng Wang; Yixiao Ge; Guanyu Cai; Rui Yan; Xudong Lin; Ying Shan; Xiaohu Qie; Mike Zheng Shou; | code |
111 | MAT: Mask-Aware Transformer for Large Hole Image Inpainting Highlight: In this paper, we present a novel transformer-based model for large hole inpainting, which unifies the merits of transformers and convolutions to efficiently process high-resolution images. |
Wenbo Li; Zhe Lin; Kun Zhou; Lu Qi; Yi Wang; Jiaya Jia; | code |
112 | Align and Prompt: Video-and-Language Pre-Training With Entity Prompts Highlight: We propose Align and Prompt: an efficient and effective video-and-language pre-training framework with better cross-modal alignment. |
Dongxu Li; Junnan Li; Hongdong Li; Juan Carlos Niebles; Steven C.H. Hoi; | code |
113 | MSG-Transformer: Exchanging Local Spatial Information By Manipulating Messenger Tokens Highlight: This paper aims to alleviate the conflict between efficiency and flexibility, for which we propose a specialized token for each region that serves as a messenger (MSG). |
Jiemin Fang; Lingxi Xie; Xinggang Wang; Xiaopeng Zhang; Wenyu Liu; Qi Tian; | code |
114 | Cross Modal Retrieval With Querybank Normalisation Highlight: Drawing inspiration from the NLP literature, we formulate a simple but effective framework called Querybank Normalisation (QB-Norm) that re-normalises query similarities to account for hubs in the embedding space. |
Simion-Vlad Bogolin; Ioana Croitoru; Hailin Jin; Yang Liu; Samuel Albanie; | code |
115 | Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization Highlight: In this paper, we propose a novel monocular ray-based 3D (Ray3D) absolute human pose estimation method with a calibrated camera. |
Yu Zhan; Fenghai Li; Renliang Weng; Wongun Choi; | code |
116 | ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization Highlight: However, this formulation typically treats snippets in a video as independent instances, ignoring the underlying temporal structures within and across action segments. To address this problem, we propose ASM-Loc, a novel WTAL framework that enables explicit, action-aware segment modeling beyond standard MIL-based methods. |
Bo He; Xitong Yang; Le Kang; Zhiyu Cheng; Xin Zhou; Abhinav Shrivastava; | code |
117 | Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs Highlight: Following the guidelines, we propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31×31, in contrast to commonly used 3×3. |
Xiaohan Ding; Xiangyu Zhang; Jungong Han; Guiguang Ding; | code |
118 | End-to-End Multi-Person Pose Estimation With Transformers Highlight: In this paper, we propose the first fully end-to-end multi-person Pose Estimation framework with TRansformers, termed PETR. |
Dahu Shi; Xing Wei; Liangqi Li; Ye Ren; Wenming Tan; | code |
119 | REGTR: End-to-End Point Cloud Correspondences With Transformers Highlight: In this work, we conjecture that attention mechanisms can replace the role of explicit feature matching and RANSAC, and thus propose an end-to-end framework to directly predict the final set of correspondences. |
Zi Jian Yew; Gim Hee Lee; | code |
120 | Neural 3D Scene Reconstruction With The Manhattan-World Assumption Highlight: In this work, we show that the planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods. |
Haoyu Guo; Sida Peng; Haotong Lin; Qianqian Wang; Guofeng Zhang; Hujun Bao; Xiaowei Zhou; | code |
121 | V2C: Visual Voice Cloning Highlight: However, there also exist many scenarios that cannot be well reflected by these VC tasks, such as movie dubbing, which requires the speech to be with emotions consistent with the movie plots. To fill this gap, in this work we propose a new task named Visual Voice Cloning (V2C), which seeks to convert a paragraph of text to a speech with both desired voice specified by a reference audio and desired emotion specified by a reference video. |
Qi Chen; Mingkui Tan; Yuankai Qi; Jiaqiu Zhou; Yuanqing Li; Qi Wu; | code |
122 | Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection Highlight: In this work, we revisit the average precision (AP) loss and reveal that the crucial element is that of selecting the ranking pairs between positive and negative samples. |
Dongli Xu; Jinhong Deng; Wen Li; | code |
123 | MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions Highlight: In this work, we present MAD (Movie Audio Descriptions), a novel benchmark that departs from the paradigm of augmenting existing video datasets with text annotations and focuses on crawling and aligning available audio descriptions of mainstream movies. |
Mattia Soldan; Alejandro Pardo; Juan León Alcázar; Fabian Caba; Chen Zhao; Silvio Giancola; Bernard Ghanem; | code |
124 | Gait Recognition in The Wild With Dense 3D Representations and A Benchmark Highlight: In particular, we propose a novel framework to explore the 3D Skinned Multi-Person Linear (SMPL) model of the human body for gait recognition, named SMPLGait. |
Jinkai Zheng; Xinchen Liu; Wu Liu; Lingxiao He; Chenggang Yan; Tao Mei; | code |
125 | ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation Via Online Exploration and Synthesis Highlight: However, constructing both valid and diverse hand-object interactions and efficiently learning from the vast synthetic data is still challenging. To address the above issues, we propose ArtiBoost, a lightweight online data enhancement method. |
Lixin Yang; Kailin Li; Xinyu Zhan; Jun Lv; Wenqiang Xu; Jiefeng Li; Cewu Lu; | code |
126 | QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection Highlight: To get the best of two worlds, we propose QueryDet that uses a novel query mechanism to accelerate the inference speed of feature-pyramid based object detectors. |
Chenhongyi Yang; Zehao Huang; Naiyan Wang; | code |
127 | IDEA-Net: Dynamic 3D Point Cloud Interpolation Via Deep Embedding Alignment Highlight: To tackle the challenges, we propose IDEA-Net, an end-to-end deep learning framework, which disentangles the problem under the assistance of the explicitly learned temporal consistency. |
Yiming Zeng; Yue Qian; Qijian Zhang; Junhui Hou; Yixuan Yuan; Ying He; | code |
128 | BEHAVE: Dataset and Method for Tracking Human Object Interactions Highlight: Our key insight is to predict correspondences from the human and the object to a statistical body model to obtain human-object contacts during interactions. |
Bharat Lal Bhatnagar; Xianghui Xie; Ilya A. Petrov; Cristian Sminchisescu; Christian Theobalt; Gerard Pons-Moll; | code |
129 | Revisiting Random Channel Pruning for Neural Network Compression Highlight: In this paper, we try to determine the channel configuration of the pruned models by random search. |
Yawei Li; Kamil Adamczewski; Wen Li; Shuhang Gu; Radu Timofte; Luc Van Gool; | code |
130 | Generating Diverse and Natural 3D Human Motions From Text Highlight: Instead of directly engaging with pose sequences, we propose motion snippet code as our internal motion representation, which captures local semantic motion contexts and is empirically shown to facilitate the generation of plausible motions faithful to the input text. |
Chuan Guo; Shihao Zou; Xinxin Zuo; Sen Wang; Wei Ji; Xingyu Li; Li Cheng; | code |
131 | E-CIR: Event-Enhanced Continuous Intensity Recovery Highlight: This paper presents E-CIR, which converts a blurry image into a sharp video represented as a parametric function from time to intensity. |
Chen Song; Qixing Huang; Chandrajit Bajaj; | code |
132 | Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond Highlight: A systematic evaluation of key modules in existing methods is performed in terms of their robustness against adversarial attacks. From the insights of our analysis, we construct a more robust deraining method by integrating these effective modules. |
Yi Yu; Wenhan Yang; Yap-Peng Tan; Alex C. Kot; | code |
133 | Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation Highlight: We propose a keypoint-based object-level SLAM framework that can provide globally consistent 6DoF pose estimates for symmetric and asymmetric objects alike. |
Nathaniel Merrill; Yuliang Guo; Xingxing Zuo; Xinyu Huang; Stefan Leutenegger; Xi Peng; Liu Ren; Guoquan Huang; | code |
134 | AziNorm: Exploiting The Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception Highlight: Point cloud, the most important data format for 3D environmental perception, is naturally endowed with strong radial symmetry. In this work, we exploit this radial symmetry via a divide-and-conquer strategy to boost 3D perception performance and ease optimization. |
Shaoyu Chen; Xinggang Wang; Tianheng Cheng; Wenqiang Zhang; Qian Zhang; Chang Huang; Wenyu Liu; | code |
135 | Weakly Supervised Rotation-Invariant Aerial Object Detection Network Highlight: Meanwhile, current solutions have been prone to fall into the issue with unstable detectors, as they ignore lower-scored instances and may regard them as backgrounds. To address these issues, in this paper, we construct a novel end-to-end weakly supervised Rotation-Invariant aerial object detection Network (RINet). |
Xiaoxu Feng; Xiwen Yao; Gong Cheng; Junwei Han; | code |
136 | Surface Reconstruction From Point Clouds By Learning Predictive Context Priors Highlight: However, this requires the local context prior to generalize to a wide variety of unseen target regions, which is hard to achieve. To resolve this issue, we introduce Predictive Context Priors by learning Predictive Queries for each specific point cloud at inference time. |
Baorui Ma; Yu-Shen Liu; Matthias Zwicker; Zhizhong Han; | code |
137 | IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes Highlight: In this work, our intuition is that the long-range attention learned by transformer architectures is ideally suited to solve longstanding challenges in single-image inverse rendering. |
Rui Zhu; Zhengqin Li; Janarbek Matai; Fatih Porikli; Manmohan Chandraker; | code |
138 | DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation Highlight: To that end, we propose the DynamicEarthNet dataset that consists of daily, multi-spectral satellite observations of 75 selected areas of interest distributed over the globe with imagery from Planet Labs. |
Aysim Toker; Lukas Kondmann; Mark Weber; Marvin Eisenberger; Andrés Camero; Jingliang Hu; Ariadna Pregel Hoderlein; Çağlar Şenaras; Timothy Davis; Daniel Cremers; Giovanni Marchisio; Xiao Xiang Zhu; Laura Leal-Taixé; | code |
139 | Weakly Supervised Temporal Action Localization Via Representative Snippet Knowledge Propagation Highlight: Many existing methods seek to generate pseudo labels for bridging the discrepancy between classification and localization, but usually only make use of limited contextual information for pseudo label generation. To alleviate this problem, we propose a representative snippet summarization and propagation framework. |
Linjiang Huang; Liang Wang; Hongsheng Li; | code |
140 | E2EC: An End-to-End Contour-Based Method for High-Quality High-Speed Instance Segmentation Highlight: In this paper, we introduce a novel contour-based method, named E2EC, for high-quality instance segmentation. |
Tao Zhang; Shiqing Wei; Shunping Ji; | code |
141 | BatchFormer: Learning To Explore Sample Relationships for Robust Representation Learning Highlight: To address the above-mentioned issues, a variety of methods have been devised to explore the sample relationships in a vanilla way (i.e., from the perspectives of either the input or the loss function), failing to explore the internal structure of deep neural networks for learning with sample relationships. Inspired by this, we propose to equip deep neural networks themselves with the ability to learn the sample relationships from each mini-batch. |
Zhi Hou; Baosheng Yu; Dacheng Tao; | code |
142 | Self-Supervised Image-Specific Prototype Exploration for Weakly Supervised Semantic Segmentation Highlight: However, the classifier focuses only on the discriminative regions while ignoring other useful information in each image, resulting in incomplete localization maps. To address this issue, we propose a Self-supervised Image-specific Prototype Exploration (SIPE) that consists of an Image-specific Prototype Exploration (IPE) and a General-Specific Consistency (GSC) loss. |
Qi Chen; Lingxiao Yang; Jian-Huang Lai; Xiaohua Xie; | code |
143 | Learning Multi-View Aggregation in The Wild for Large-Scale 3D Semantic Segmentation Highlight: In contrast, we propose an end-to-end trainable multi-view aggregation model leveraging the viewing conditions of 3D points to merge features from images taken at arbitrary positions. |
Damien Robert; Bruno Vallet; Loic Landrieu; | code |
144 | PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition Highlight: These methods can be negatively influenced by strong illumination conditions causing shading-reflectance leakages. Therefore, in this paper, an end-to-end edge-driven hybrid CNN approach is proposed for intrinsic image decomposition. |
Partha Das; Sezer Karaoglu; Theo Gevers; | code |
145 | Clothes-Changing Person Re-Identification With RGB Modality Only Highlight: In this paper, we propose a Clothes-based Adversarial Loss (CAL) to mine clothes-irrelevant features from the original RGB images by penalizing the predictive power of re-id model w.r.t. clothes. |
Xinqian Gu; Hong Chang; Bingpeng Ma; Shutao Bai; Shiguang Shan; Xilin Chen; | code |
146 | Robust Image Forgery Detection Over Online Social Network Shared Images Highlight: To fight against the OSN-shared forgeries, in this work, a novel robust training scheme is proposed. |
Haiwei Wu; Jiantao Zhou; Jinyu Tian; Jun Liu; | code |
147 | Representation Compensation Networks for Continual Semantic Segmentation Highlight: In this work, we study the continual semantic segmentation problem, where the deep neural networks are required to incorporate new classes continually without catastrophic forgetting. |
Chang-Bin Zhang; Jia-Wen Xiao; Xialei Liu; Ying-Cong Chen; Ming-Ming Cheng; | code |
148 | Tracking People By Predicting 3D Appearance, Location and Pose Highlight: We present an approach for tracking people in monocular videos by predicting their future 3D representations. |
Jathushan Rajasegaran; Georgios Pavlakos; Angjoo Kanazawa; Jitendra Malik; | code |
149 | Text2Mesh: Text-Driven Neural Stylization for Meshes Highlight: In this work, we develop intuitive controls for editing the style of 3D objects. |
Oscar Michel; Roi Bar-On; Richard Liu; Sagie Benaim; Rana Hanocka; | code |
150 | C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image Highlight: In this paper, we find that there are mainly two challenges of medical images in WSSS: i) the boundary of object foreground and background is not clear; ii) the co-occurrence phenomenon is very severe in the training stage. We thus propose a Causal CAM (C-CAM) method to overcome the above challenges. |
Zhang Chen; Zhiqiang Tian; Jihua Zhu; Ce Li; Shaoyi Du; | code |
151 | Forward Compatible Few-Shot Class-Incremental Learning Highlight: By contrast, we suggest learning prospectively to prepare for future updates, and propose ForwArd Compatible Training (FACT) for FSCIL. |
Da-Wei Zhou; Fu-Yun Wang; Han-Jia Ye; Liang Ma; Shiliang Pu; De-Chuan Zhan; | code |
152 | Weakly Supervised Object Localization As Domain Adaption Highlight: However, the MIL mechanism makes CAM only activate discriminative object parts rather than the whole object, weakening its performance for localizing objects. To avoid this problem, this work provides a novel perspective that models WSOL as a domain adaption (DA) task, where the score estimator trained on the source/image domain is tested on the target/pixel domain to locate objects. |
Lei Zhu; Qi She; Qian Chen; Yunfei You; Boyu Wang; Yanye Lu; | code |
153 | Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation Highlight: In this paper, we propose the Tencent-MVSE dataset, which is the first benchmark dataset for the multi-modal video similarity evaluation task. |
Zhaoyang Zeng; Yongsheng Luo; Zhenhua Liu; Fengyun Rao; Dian Li; Weidong Guo; Zhen Wen; | code |
154 | Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching Highlight: Meanwhile, recent advances in the functional map framework allow enforcing orientation preservation using a functional representation for tangent vector field transfer, through so-called complex functional maps. Using this representation, we propose a new deep learning approach to learn orientation-aware features in a fully unsupervised setting. |
Nicolas Donati; Etienne Corman; Maks Ovsjanikov; | code |
155 | Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation Highlight: In this paper, we propose a novel tree energy loss for SASS by providing semantic guidance for unlabeled pixels. |
Zhiyuan Liang; Tiancai Wang; Xiangyu Zhang; Jian Sun; Jianbing Shen; | code |
156 | MatteFormer: Transformer-Based Image Matting Via Prior-Tokens Highlight: In this paper, we propose a transformer-based image matting model called MatteFormer, which takes full advantage of trimap information in the transformer block. |
GyuTae Park; SungJoon Son; JaeYoung Yoo; SeHo Kim; Nojun Kwak; | code |
157 | Video Shadow Detection Via Spatio-Temporal Interpolation Consistency Training Highlight: Applying a model trained on labeled images directly to video frames may lead to high generalization error and temporally inconsistent results. In this paper, we address these challenges by proposing a Spatio-Temporal Interpolation Consistency Training (STICT) framework to rationally feed the unlabeled video frames together with the labeled images into an image shadow detection network training. |
Xiao Lu; Yihong Cao; Sheng Liu; Chengjiang Long; Zipei Chen; Xuanyu Zhou; Yimin Yang; Chunxia Xiao; | code |
158 | Robust and Accurate Superquadric Recovery: A Probabilistic Approach Highlight: The superquadric recovery is formulated as a Maximum Likelihood Estimation (MLE) problem. We propose an algorithm, Expectation, Maximization, and Switching (EMS), to solve this problem, where: (1) outliers are predicted from the posterior perspective; (2) the superquadric parameter is optimized by the trust-region reflective algorithm; and (3) local optima are avoided by globally searching and switching among parameters encoding similar superquadrics. |
Weixiao Liu; Yuwei Wu; Sipu Ruan; Gregory S. Chirikjian; | code |
159 | Grounding Answers for Visual Questions Asked By Visually Impaired People Highlight: We introduce the VizWiz-VQA-Grounding dataset, the first dataset that visually grounds answers to visual questions asked by people with visual impairments. |
Chongyan Chen; Samreen Anjum; Danna Gurari; | code |
160 | Sparse Instance Activation for Real-Time Instance Segmentation Highlight: In this paper, we propose a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation. |
Tianheng Cheng; Xinggang Wang; Shaoyu Chen; Wenqiang Zhang; Qian Zhang; Chang Huang; Zhaoxiang Zhang; Wenyu Liu; | code |
161 | VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning Highlight: We propose VisualGPT, which employs a novel self-resurrecting encoder-decoder attention mechanism to quickly adapt the PLM with a small amount of in-domain image-text data. |
Jun Chen; Han Guo; Kai Yi; Boyang Li; Mohamed Elhoseiny; | code |
162 | MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation Highlight: However, those works ignore the fact that it is an inverse problem where multiple feasible solutions (i.e., hypotheses) exist. To relieve this limitation, we propose a Multi-Hypothesis Transformer (MHFormer) that learns spatio-temporal representations of multiple plausible pose hypotheses. |
Wenhao Li; Hong Liu; Hao Tang; Pichao Wang; Luc Van Gool; | code |
163 | Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis Highlight: We propose a new method for reconstructing controllable implicit 3D human models from sparse multi-view RGB videos. |
Tianhan Xu; Yasuhiro Fujita; Eiichi Matsumoto; | code |
164 | Towards Implicit Text-Guided 3D Shape Generation Highlight: In this work, we explore the challenging task of generating 3D shapes from text. |
Zhengzhe Liu; Yi Wang; Xiaojuan Qi; Chi-Wing Fu; | code |
165 | SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage Highlight: We propose SoftCollage, a novel method that employs a neural-based differentiable probabilistic tree generator to produce the probability distribution of correlation-preserving collage tree conditioned on deep image feature, aspect ratio and canvas size. |
Jiahao Yu; Li Chen; Mingrui Zhang; Mading Li; | code |
166 | Query and Attention Augmentation for Knowledge-Based Explainable Reasoning Highlight: To bridge this research gap, we present Query and Attention Augmentation, a general approach that augments neural module networks to jointly reason about visual and external knowledge. |
Yifeng Zhang; Ming Jiang; Qi Zhao; | code |
167 | Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality Highlight: We present a novel task and dataset for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning, which we call Winoground. |
Tristan Thrush; Ryan Jiang; Max Bartolo; Amanpreet Singh; Adina Williams; Douwe Kiela; Candace Ross; | code |
168 | Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection Highlight: The main challenge of this task is perceiving various temporal variations of diverse event boundaries. To this end, this paper presents an effective and end-to-end learnable framework (DDM-Net). |
Jiaqi Tang; Zhaoyang Liu; Chen Qian; Wayne Wu; Limin Wang; | code |
169 | Fine-Grained Object Classification Via Self-Supervised Pose Alignment Highlight: For discounting pose variations, this paper proposes to learn a novel graph based object representation to reveal a global configuration of local parts for self-supervised pose alignment across classes, which is employed as an auxiliary feature regularization on a deep representation learning network. |
Xuhui Yang; Yaowei Wang; Ke Chen; Yong Xu; Yonghong Tian; | code |
170 | Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding Highlight: However, existing animal behavior datasets have limitations in multiple aspects, including limited numbers of animal classes, data samples and provided tasks, and also limited variations in environmental conditions and viewpoints. To address these limitations, we create a large and diverse dataset, Animal Kingdom, that provides multiple annotated tasks to enable a more thorough understanding of natural animal behaviors. |
Xun Long Ng; Kian Eng Ong; Qichen Zheng; Yun Ni; Si Yong Yeo; Jun Liu; | code |
171 | Fine-Grained Temporal Contrastive Learning for Weakly-Supervised Temporal Action Localization Highlight: This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in WSAL and helps identify coherent action instances. |
Junyu Gao; Mengyuan Chen; Changsheng Xu; | code |
172 | Relieving Long-Tailed Instance Segmentation Via Pairwise Class Balance Highlight: In this paper, we excavate the confusion matrix, which carries the fine-grained misclassification details, to relieve the pairwise biases, generalizing the coarse one. |
Yin-Yin He; Peizhen Zhang; Xiu-Shen Wei; Xiangyu Zhang; Jian Sun; | code |
173 | Online Convolutional Re-Parameterization Highlight: In this paper, we present online convolutional re-parameterization (OREPA), a two-stage pipeline, aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution. |
Mu Hu; Junyi Feng; Jiashen Hua; Baisheng Lai; Jianqiang Huang; Xiaojin Gong; Xian-Sheng Hua; | code |
174 | Mimicking The Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning Highlight: We find that, with fewer training classes, the data representations of each class lie in a long and narrow region; with more training classes, the representations of each class scatter more uniformly. Inspired by this observation, we propose Class-wise Decorrelation (CwD) that effectively regularizes representations of each class to scatter more uniformly, thus mimicking the model jointly trained with all classes (i.e., the oracle model). |
Yujun Shi; Kuangqi Zhou; Jian Liang; Zihang Jiang; Jiashi Feng; Philip H.S. Torr; Song Bai; Vincent Y. F. Tan; | code |
175 | RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition Highlight: This paper shows that modeling an effective message-passing flow through an attention mechanism can be critical to tackling the compositionality and long-tail challenges in VRR. |
Jun Chen; Aniket Agarwal; Sherif Abdelkarim; Deyao Zhu; Mohamed Elhoseiny; | code |
176 | Personalized Image Aesthetics Assessment With Rich Attributes Highlight: To solve the dilemma, we conduct the most comprehensive subjective study of personalized image aesthetics so far and introduce a new Personalized image Aesthetics database with Rich Attributes (PARA), which consists of 31,220 images with annotations by 438 subjects. |
Yuzhe Yang; Liwu Xu; Leida Li; Nan Qie; Yaqian Li; Peng Zhang; Yandong Guo; | code |
177 | Part-Based Pseudo Label Refinement for Unsupervised Person Re-Identification Highlight: In this paper, we propose a novel Part-based Pseudo Label Refinement (PPLR) framework that reduces the label noise by employing the complementary relationship between global and part features. |
Yoonki Cho; Woo Jae Kim; Seunghoon Hong; Sung-Eui Yoon; | code |
178 | HDNet: High-Resolution Dual-Domain Learning for Spectral Compressive Imaging Highlight: So we propose a high-resolution dual-domain learning network (HDNet) for HSI reconstruction. |
Xiaowan Hu; Yuanhao Cai; Jing Lin; Haoqian Wang; Xin Yuan; Yulun Zhang; Radu Timofte; Luc Van Gool; | code |
179 | OW-DETR: Open-World Detection Transformer Highlight: Here, we introduce a novel end-to-end transformer-based framework, OW-DETR, for open-world object detection. |
Akshita Gupta; Sanath Narayan; K J Joseph; Salman Khan; Fahad Shahbaz Khan; Mubarak Shah; | code |
180 | Learning Deep Implicit Functions for 3D Shapes With Dynamic Code Clouds Highlight: However, the local codes are constrained at discrete and regular positions like grid points, which makes the code positions difficult to optimize and limits their representation ability. To solve this problem, we propose to learn DIF with Dynamic Code Cloud, named DCC-DIF. |
Tianyang Li; Xin Wen; Yu-Shen Liu; Hua Su; Zhizhong Han; | code |
181 | Reversible Vision Transformers Highlight: We present Reversible Vision Transformers, a memory efficient architecture design for visual recognition. |
Karttikeya Mangalam; Haoqi Fan; Yanghao Li; Chao-Yuan Wu; Bo Xiong; Christoph Feichtenhofer; Jitendra Malik; | code |
182 | Amodal Panoptic Segmentation Highlight: This ability of amodal perception forms the basis of our perceptual and cognitive understanding of our world. To enable robots to reason with this capability, we formulate and propose a novel task that we name amodal panoptic segmentation. |
Rohit Mohan; Abhinav Valada; | code |
183 | Correlation Verification for Image Retrieval Highlight: In this study, we propose a novel image retrieval re-ranking network named Correlation Verification Networks (CVNet). |
Seongwon Lee; Hongje Seong; Suhyeon Lee; Euntai Kim; | code |
184 | Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation Highlight: More importantly, existing approaches build upon the straightforward pose estimation loss, which unfortunately cannot constrain the network to fully leverage useful information from neighboring frames. To tackle these problems, we present a novel hierarchical alignment framework, which leverages coarse-to-fine deformations to progressively update a neighboring frame to align with the current frame at the feature level. |
Zhenguang Liu; Runyang Feng; Haoming Chen; Shuang Wu; Yixing Gao; Yunjun Gao; Xiang Wang; | code |
185 | Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut Highlight: In this paper, we show a graph-based method that uses the self-supervised transformer features to discover an object from an image. |
Yangtao Wang; Xi Shen; Shell Xu Hu; Yuan Yuan; James L. Crowley; Dominique Vaufreydaz; | code |
186 | Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection Highlight: In this work, we design a novel Transformer-style HOI detector, i.e., Structure-aware Transformer over Interaction Proposals (STIP), for HOI detection. |
Yong Zhang; Yingwei Pan; Ting Yao; Rui Huang; Tao Mei; Chang-Wen Chen; | code |
187 | Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing Highlight: This paper probes intrinsic factors behind typical failure cases (e.g., spatial inconsistency and boundary confusion) produced by the existing state-of-the-art method in face parsing. To tackle these problems, we propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) for face parsing. |
Qingping Zheng; Jiankang Deng; Zheng Zhu; Ying Li; Stefanos Zafeiriou; | code |
188 | Glass: Geometric Latent Augmentation for Shape Spaces Highlight: We investigate the problem of training generative models on very sparse collections of 3D models. |
Sanjeev Muralikrishnan; Siddhartha Chaudhuri; Noam Aigerman; Vladimir G. Kim; Matthew Fisher; Niloy J. Mitra; | code |
189 | DPICT: Deep Progressive Image Compression Using Trit-Planes Highlight: We propose the deep progressive image compression using trit-planes (DPICT) algorithm, which is the first learning-based codec supporting fine granular scalability (FGS). |
Jae-Han Lee; Seungmin Jeon; Kwang Pyo Choi; Youngo Park; Chang-Su Kim; | code |
190 | Text to Image Generation With Semantic-Spatial Aware GAN Highlight: A close inspection of their generated images reveals a major limitation: even though the generated image holistically matches the description, individual image regions or parts of things are often not recognizable or consistent with words in the sentence, e.g. "a white crown". To address this problem, we propose a novel framework Semantic-Spatial Aware GAN for synthesizing images from input text. |
Wentong Liao; Kai Hu; Michael Ying Yang; Bodo Rosenhahn; | code |
191 | Generalizable Cross-Modality Medical Image Segmentation Via Style Augmentation and Dual Normalization Highlight: This setting, namely generalizable cross-modality segmentation, owing to its clinical potential, is much more challenging than other related settings, e.g., domain adaptation. To achieve this goal, in this paper we propose a novel dual-normalization model by leveraging the augmented source-similar and source-dissimilar images during our generalizable segmentation. |
Ziqi Zhou; Lei Qi; Xin Yang; Dong Ni; Yinghuan Shi; | code |
192 | Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model Highlight: In this paper, we introduce a novel method, detection prompt (DetPro), to learn continuous prompt representations for open-vocabulary object detection based on the pre-trained vision-language model. |
Yu Du; Fangyun Wei; Zihe Zhang; Miaojing Shi; Yue Gao; Guoqi Li; | code |
193 | Interactive Segmentation and Visualization for Tiny Objects in Multi-Megapixel Images Highlight: We introduce an interactive image segmentation and visualization framework for identifying, inspecting, and editing tiny objects (just a few pixels wide) in large multi-megapixel high-dynamic-range (HDR) images. |
Chengyuan Xu; Boning Dong; Noah Stier; Curtis McCully; D. Andrew Howell; Pradeep Sen; Tobias Höllerer; | code |
194 | Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture Highlight: In this work, we focus on exploiting the high-precision and non-differentiable physics simulator to incorporate dynamical constraints in motion capture. |
Buzhen Huang; Liang Pan; Yuan Yang; Jingyi Ju; Yangang Wang; | code |
195 | Surface Representation for Point Clouds Highlight: In this paper, we present RepSurf (representative surfaces), a novel representation of point clouds to explicitly depict the very local structure. |
Haoxi Ran; Jun Liu; Chengjie Wang; | code |
196 | Implicit Motion Handling for Video Camouflaged Object Detection Highlight: We propose a new video camouflaged object detection (VCOD) framework that can exploit both short-term dynamics and long-term temporal consistency to detect camouflaged objects from video frames. |
Xuelian Cheng; Huan Xiong; Deng-Ping Fan; Yiran Zhong; Mehrtash Harandi; Tom Drummond; Zongyuan Ge; | code |
197 | DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides Highlight: We present DeepLIIF (https://deepliif.org), a first free online platform for efficient and reproducible IHC scoring. |
Parmida Ghahremani; Joseph Marino; Ricardo Dodds; Saad Nadeem; | code |
198 | Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification Highlight: In this paper, we study an untouched problem in visible-infrared person re-identification (VI-ReID), namely Twin Noise Labels (TNL), which refers to noisy annotation and noisy correspondence. |
Mouxing Yang; Zhenyu Huang; Peng Hu; Taihao Li; Jiancheng Lv; Xi Peng; | code |
199 | Optical Flow Estimation for Spiking Camera Highlight: However, frame-based and event-based methods are not well suited to spike streams from the spiking camera due to the different data modalities. To this end, we present SCFlow, a tailored deep learning pipeline to estimate optical flow in high-speed scenes from spike streams. |
Liwen Hu; Rui Zhao; Ziluo Ding; Lei Ma; Boxin Shi; Ruiqin Xiong; Tiejun Huang; | code |
200 | GradViT: Gradient Inversion of Vision Transformers Highlight: In this work we demonstrate the vulnerability of vision transformers (ViTs) to gradient-based inversion attacks. |
Ali Hatamizadeh; Hongxu Yin; Holger R. Roth; Wenqi Li; Jan Kautz; Daguang Xu; Pavlo Molchanov; | code |
201 | Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution Via Cycle-Projected Mutual Learning Highlight: To this end, we propose a one-stage based Cycle-projected Mutual learning network (CycMu-Net) for ST-VSR, which makes full use of spatial-temporal correlations via the mutual learning between S-VSR and T-VSR. |
Mengshun Hu; Kui Jiang; Liang Liao; Jing Xiao; Junjun Jiang; Zheng Wang; | code |
202 | Joint Global and Local Hierarchical Priors for Learned Image Compression Highlight: However, CNNs have a limitation in modeling long-range dependencies due to their nature of local connectivity, which can be a significant bottleneck in image compression where reducing spatial redundancy is a key point. To overcome this issue, we propose a novel entropy model called Information Transformer (Informer) that exploits both global and local information in a content-dependent manner using an attention mechanism. |
Jun-Hyuk Kim; Byeongho Heo; Jong-Seok Lee; | code |
203 | Knowledge Distillation Via The Target-Aware Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This greatly undermines the underlying assumption of the one-to-one distillation approach. To this end, we propose a novel one-to-all spatial matching knowledge distillation approach. |
Sihao Lin; Hongwei Xie; Bing Wang; Kaicheng Yu; Xiaojun Chang; Xiaodan Liang; Gang Wang; | code |
204 | Subspace Adversarial Training Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To control the growth of the gradient, we propose a new AT method, Subspace Adversarial Training (Sub-AT), which constrains AT in a carefully extracted subspace. |
Tao Li; Yingwen Wu; Sizhe Chen; Kun Fang; Xiaolin Huang; | code |
205 | 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we substantially improve the generalization of 3D object detectors to out-of-domain data by deforming point clouds during training. |
Alexander Lehner; Stefano Gasperini; Alvaro Marcos-Ramiro; Michael Schmidt; Mohammad-Ali Nikouei Mahani; Nassir Navab; Benjamin Busam; Federico Tombari; | code |
206 | Image Segmentation Using Text and Image Prompts Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here we propose a system that can generate image segmentations based on arbitrary prompts at test time. |
Timo Lüddecke; Alexander Ecker; | code |
207 | AutoMine: An Unmanned Mine Dataset Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Moreover, the open-pit mine is one of the typical representatives for them. Therefore, we introduce the Autonomous driving dataset on the Mining scene (AutoMine) for positioning and perception tasks in this paper. |
Yuchen Li; Zixuan Li; Siyu Teng; Yu Zhang; Yuhang Zhou; Yuchang Zhu; Dongpu Cao; Bin Tian; Yunfeng Ai; Zhe Xuanyuan; Long Chen; | code |
208 | Background Activation Suppression for Weakly Supervised Object Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Background Activation Suppression (BAS) method. |
Pingyu Wu; Wei Zhai; Yang Cao; | code |
209 | Synthetic Generation of Face Videos With Plethysmograph Physiology Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a scalable biophysical learning based method to generate physio-realistic synthetic rPPG videos given any reference image and target rPPG signal and shows that it could further improve the state-of-the-art physiological measurement and reduce the bias among different groups. |
Zhen Wang; Yunhao Ba; Pradyumna Chari; Oyku Deniz Bozkurt; Gianna Brown; Parth Patwa; Niranjan Vaddi; Laleh Jalilian; Achuta Kadambi; | code |
210 | Hallucinated Neural Radiance Fields in The Wild Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing solutions adopt NeRF with a controllable appearance embedding to render novel views under various conditions, but they cannot render view-consistent images with an unseen appearance. To solve this problem, we present an end-to-end framework for constructing a hallucinated NeRF, dubbed as Ha-NeRF. |
Xingyu Chen; Qi Zhang; Xiaoyu Li; Yue Chen; Ying Feng; Xuan Wang; Jue Wang; | code |
211 | Global Tracking Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a novel transformer-based architecture for global multi-object tracking. |
Xingyi Zhou; Tianwei Yin; Vladlen Koltun; Philipp Krähenbühl; | code |
212 | Backdoor Attacks on Self-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Backdoor attacks have been studied extensively in supervised learning and to the best of our knowledge, we are the first to study them for self-supervised learning. |
Aniruddha Saha; Ajinkya Tejankar; Soroush Abbasi Koohpayegani; Hamed Pirsiavash; | code |
213 | GMFlow: Learning Optical Flow Via Global Matching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Specifically, we propose a GMFlow framework, which consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation. |
Haofei Xu; Jing Zhang; Jianfei Cai; Hamid Rezatofighi; Dacheng Tao; | code |
214 | Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To fully utilize the rich connections between speech audio and human gestures, we propose a novel framework named Hierarchical Audio-to-Gesture (HA2G) for co-speech gesture generation. |
Xian Liu; Qianyi Wu; Hang Zhou; Yinghao Xu; Rui Qian; Xinyi Lin; Xiaowei Zhou; Wayne Wu; Bo Dai; Bolei Zhou; | code |
215 | Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We endeavor on a rarely explored task named Insubstantial Object Detection (IOD), which aims to localize the object with following characteristics: (1) amorphous shape with indistinct boundary; (2) similarity to surroundings; (3) absence in color. Accordingly, it is far more challenging to distinguish insubstantial objects in a single static frame and the collaborative representation of spatial and temporal information is crucial. |
Kailai Zhou; Yibo Wang; Tao Lv; Yunqian Li; Linsen Chen; Qiu Shen; Xun Cao; | code |
216 | Graph-Based Spatial Transformer With Memory Replay for Multi-Future Pedestrian Trajectory Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our model aims to forecast multiple paths based on a historical trajectory by modeling multi-scale graph-based spatial transformers combined with a trajectory smoothing algorithm named "Memory Replay" utilizing a memory graph. |
Lihuan Li; Maurice Pagnucco; Yang Song; | code |
217 | Scanline Homographies for Rolling-Shutter Plane Absolute Pose Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we give a solution to the absolute pose problem free of motion assumptions. |
Fang Bai; Agniva Sengupta; Adrien Bartoli; | code |
218 | AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-Time Image Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: They adopt a sub-optimal uniform sampling point allocation, limiting the expressiveness of the learned LUTs since the (tri-)linear interpolation between uniform sampling points in the LUT transform might fail to model local non-linearities of the color transform. Focusing on this problem, we present AdaInt (Adaptive Intervals Learning), a novel mechanism to achieve a more flexible sampling point allocation by adaptively learning the non-uniform sampling intervals in the 3D color space. |
Canqian Yang; Meiguang Jin; Xu Jia; Yi Xu; Ying Chen; | code |
219 | Recurrent Glimpse-Based Decoder for Detection With Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Alternative to existing studies that mainly develop advanced feature or embedding designs to tackle the training issue, we point out that the Region-of-Interest (RoI) based detection refinement can easily help mitigate the difficulty of training for DETR methods. Based on this, we introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper. |
Zhe Chen; Jing Zhang; Dacheng Tao; | code |
220 | SimMIM: A Simple Framework for Masked Image Modeling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents SimMIM, a simple framework for masked image modeling. |
Zhenda Xie; Zheng Zhang; Yue Cao; Yutong Lin; Jianmin Bao; Zhuliang Yao; Qi Dai; Han Hu; | code |
221 | Label Matching Semi-Supervised Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Despite the promising results, the label mismatch problem is not yet fully explored in the previous works, leading to severe confirmation bias during self-training. In this paper, we delve into this problem and propose a simple yet effective LabelMatch framework from two different yet complementary perspectives, i.e., distribution-level and instance-level. |
Binbin Chen; Weijie Chen; Shicai Yang; Yunyi Xuan; Jie Song; Di Xie; Shiliang Pu; Mingli Song; Yueting Zhuang; | code |
222 | RegionCLIP: Region-Based Language-Image Pretraining Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To mitigate this issue, we propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations, thus enabling fine-grained alignment between image regions and textual concepts. |
Yiwu Zhong; Jianwei Yang; Pengchuan Zhang; Chunyuan Li; Noel Codella; Liunian Harold Li; Luowei Zhou; Xiyang Dai; Lu Yuan; Yin Li; Jianfeng Gao; | code |
223 | Video Frame Interpolation Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing methods for video interpolation heavily rely on deep convolution neural networks, and thus suffer from their intrinsic limitations, such as content-agnostic kernel weights and restricted receptive field. To address these issues, we propose a Transformer-based video interpolation framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations. |
Zhihao Shi; Xiangyu Xu; Xiaohong Liu; Jun Chen; Ming-Hsuan Yang; | code |
224 | BCOT: A Markerless High-Precision 3D Object Tracking Benchmark Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a multi-view approach to estimate the accurate 3D poses of real moving objects, and then use binocular data to construct a new benchmark for monocular textureless 3D object tracking. |
Jiachen Li; Bin Wang; Shiqiang Zhu; Xin Cao; Fan Zhong; Wenxuan Chen; Te Li; Jason Gu; Xueying Qin; | code |
225 | Omni-DETR: Omni-Supervised Object Detection With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider the problem of omni-supervised object detection, which can use unlabeled, fully labeled and weakly labeled annotations, such as image tags, counts, points, etc., for object detection. |
Pei Wang; Zhaowei Cai; Hao Yang; Gurumurthy Swaminathan; Nuno Vasconcelos; Bernt Schiele; Stefano Soatto; | code |
226 | Transferable Sparse Adversarial Attack Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we focus on sparse adversarial attack based on the l_0 norm constraint, which can succeed by only modifying a few pixels of an image. |
Ziwen He; Wei Wang; Jing Dong; Tieniu Tan; | code |
227 | CREAM: Weakly Supervised Object Localization Via Class RE-Activation Mapping Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we empirically prove that this problem is associated with the mixup of the activation values between less discriminative foreground regions and the background. To address it, we propose Class RE-Activation Mapping (CREAM), a novel clustering-based approach to boost the activation values of the integral object regions. |
Jilan Xu; Junlin Hou; Yuejie Zhang; Rui Feng; Rui-Wei Zhao; Tao Zhang; Xuequan Lu; Shang Gao; | code |
228 | VALHALLA: Visual Hallucination for Machine Translation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a visual hallucination framework, called VALHALLA, which requires only source sentences at inference time and instead uses hallucinated visual representations for multimodal machine translation. |
Yi Li; Rameswar Panda; Yoon Kim; Chun-Fu (Richard) Chen; Rogerio S. Feris; David Cox; Nuno Vasconcelos; | code |
229 | HINT: Hierarchical Neuron Concept Explainer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study hierarchical concepts inspired by the hierarchical cognition process of human beings. |
Andong Wang; Wei-Ning Lee; Xiaojuan Qi; | code |
230 | Neural Face Identification in A 2D Wireframe Projection of A Manifold Object Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we approach the classical problem of face identification from a novel data-driven point of view. |
Kehan Wang; Jia Zheng; Zihan Zhou; | code |
231 | Nonuniform-to-Uniform Quantization: Towards Accurate Quantization Via Generalized Straight-Through Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we propose Nonuniform-to-Uniform Quantization (N2UQ), a method that can maintain the strong representation ability of nonuniform methods while being hardware-friendly and efficient as the uniform quantization for model inference. |
Zechun Liu; Kwang-Ting Cheng; Dong Huang; Eric P. Xing; Zhiqiang Shen; | code |
232 | An Empirical Study of End-to-End Temporal Action Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present an empirical study of end-to-end temporal action detection. |
Xiaolong Liu; Song Bai; Xiang Bai; | code |
233 | Object Localization Under Single Coarse Point Supervision Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we propose a POL method using coarse point annotations, relaxing the supervision signals from accurate key points to freely spotted points. |
Xuehui Yu; Pengfei Chen; Di Wu; Najmul Hassan; Guorong Li; Junchi Yan; Humphrey Shi; Qixiang Ye; Zhenjun Han; | code |
234 | Unsupervised Learning of Accurate Siamese Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a novel unsupervised tracking framework, in which we can learn temporal correspondence both on the classification branch and regression branch. |
Qiuhong Shen; Lei Qiao; Jinyang Guo; Peixia Li; Xin Li; Bo Li; Weitao Feng; Weihao Gan; Wei Wu; Wanli Ouyang; | code |
235 | Non-Parametric Depth Distribution Modelling Based Depth Inference for Multi-View Stereo Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In contrast, we propose constructing the cost volume by non-parametric depth distribution modeling to handle pixels with unimodal and multi-modal distributions. |
Jiayu Yang; Jose M. Alvarez; Miaomiao Liu; | code |
236 | Equalized Focal Loss for Dense Long-Tailed Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, in the long-tailed scenario, this line of work has not been explored so far. In this paper, we investigate whether one-stage detectors can perform well in this case. |
Bo Li; Yongqiang Yao; Jingru Tan; Gang Zhang; Fengwei Yu; Jianwei Lu; Ye Luo; | code |
237 | DeepDPM: Deep Clustering With An Unknown Number of Clusters Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: When K is unknown, however, using model-selection criteria to choose its optimal value might become computationally expensive, especially in DL as the training process would have to be repeated numerous times. In this work, we bridge this gap by introducing an effective deep-clustering method that does not require knowing the value of K as it infers it during the learning. |
Meitar Ronen; Shahaf E. Finder; Oren Freifeld; | code |
238 | ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose ISDNet, a novel ultra-high resolution segmentation framework that integrates the shallow and deep networks in a new manner, which significantly accelerates the inference speed while achieving accurate segmentation. |
Shaohua Guo; Liang Liu; Zhenye Gan; Yabiao Wang; Wuhao Zhang; Chengjie Wang; Guannan Jiang; Wei Zhang; Ran Yi; Lizhuang Ma; Ke Xu; | code |
239 | Unsupervised Domain Adaptation for Nighttime Aerial Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work instead develops a novel unsupervised domain adaptation framework for nighttime aerial tracking (named UDAT). |
Junjie Ye; Changhong Fu; Guangze Zheng; Danda Pani Paudel; Guang Chen; | code |
240 | RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: As face image contains abundant contextual information, we propose a method, RestoreFormer, which explores fully-spatial attentions to model contextual information and surpasses existing works that use local convolutions. |
Zhouxia Wang; Jiawei Zhang; Runjian Chen; Wenping Wang; Ping Luo; | code |
241 | Mask-Guided Spectral-Wise Transformer for Efficient Hyperspectral Image Reconstruction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel framework, Mask-guided Spectral-wise Transformer (MST), for HSI reconstruction. |
Yuanhao Cai; Jing Lin; Xiaowan Hu; Haoqian Wang; Xin Yuan; Yulun Zhang; Radu Timofte; Luc Van Gool; | code |
242 | A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel variational Bayesian formulation for diffeomorphic non-rigid registration of medical images, which learns in an unsupervised way a data-specific similarity metric. |
Daniel Grzech; Mohammad Farid Azampour; Ben Glocker; Julia Schnabel; Nassir Navab; Bernhard Kainz; Loïc Le Folgoc; | code |
243 | Not Just Selection, But Exploration: Online Class-Incremental Continual Learning Via Dual View Consistency Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel yet effective framework for online class-incremental continual learning, which considers not only the selection of stored samples, but also the full exploration of the data stream. |
Yanan Gu; Xu Yang; Kun Wei; Cheng Deng; | code |
244 | Coupling Vision and Proprioception for Navigation of Legged Robots Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We exploit the complementary strengths of vision and proprioception to develop a point-goal navigation system for legged robots, called VP-Nav. |
Zipeng Fu; Ashish Kumar; Ananye Agarwal; Haozhi Qi; Jitendra Malik; Deepak Pathak; | code |
245 | Exploiting Rigidity Constraints for LiDAR Scene Flow Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel optimization method based on a recurrent neural network to predict LiDAR scene flow in a weakly supervised manner. |
Guanting Dong; Yueyi Zhang; Hanlin Li; Xiaoyan Sun; Zhiwei Xiong; | code |
246 | EMOCA: Emotion Driven Monocular Face Capture and Animation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The result is facial geometries that do not match the emotional content of the input image. We address this with EMOCA (EMOtion Capture and Animation), by introducing a novel deep perceptual emotion consistency loss during training, which helps ensure that the reconstructed 3D expression matches the expression depicted in the input image. |
Radek Daněček; Michael J. Black; Timo Bolkart; | code |
247 | Quarantine: Sparsity Can Uncover The Trojan Attack Trigger for Free Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our crucial observation is that the Trojan features are significantly more stable to network pruning than benign features. Leveraging that, we propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork. |
Tianlong Chen; Zhenyu Zhang; Yihua Zhang; Shiyu Chang; Sijia Liu; Zhangyang Wang; | code |
248 | AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, existing approaches ignored the distribution difference between training and testing data, thereby inducing a large quantization error in inference. To address this issue, we propose a new quantization scheme, Alignment Quantization with ADMM-based Correlation Preservation (AlignQ), which exploits the cumulative distribution function (CDF) to align the data to be i.i.d. (independently and identically distributed) for quantization error minimization. |
Ting-An Chen; De-Nian Yang; Ming-Syan Chen; | code |
249 | Interactive Multi-Class Tiny-Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Such imagery typically contains objects from various categories, yet the multi-class interactive annotation setting for the detection task has thus far been unexplored. To address these needs, we propose a novel interactive annotation method for multiple instances of tiny objects from multiple classes, based on a few point-based user inputs. |
Chunggi Lee; Seonwook Park; Heon Song; Jeongun Ryu; Sanghoon Kim; Haejoon Kim; Sérgio Pereira; Donggeun Yoo; | code |
250 | Learning From Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to learn light field saliency from pixel-level noisy labels obtained from unsupervised hand-crafted feature-based saliency methods. |
Mingtao Feng; Kendong Liu; Liang Zhang; Hongshan Yu; Yaonan Wang; Ajmal Mian; | code |
251 | Multi-View Depth Estimation By Fusing Single-View Depth Probability With Multi-View Geometry Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: For such failure modes, single-view depth estimation methods are often more reliable. To this end, we propose MaGNet, a novel framework for fusing single-view depth probability with multi-view geometry, to improve the accuracy, robustness and efficiency of multi-view depth estimation. |
Gwangbin Bae; Ignas Budvytis; Roberto Cipolla; | code |
252 | Slimmable Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank, from which models of different capacities can be sampled to accommodate different accuracy-efficiency trade-offs. |
Rang Meng; Weijie Chen; Shicai Yang; Jie Song; Luojun Lin; Di Xie; Shiliang Pu; Xinchao Wang; Mingli Song; Yueting Zhuang; | code |
253 | High-Resolution Image Harmonization Via Collaborative Dual Transformations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a high-resolution image harmonization network with Collaborative Dual Transformation (CDTNet) to combine pixel-to-pixel transformation and RGB-to-RGB transformation coherently in an end-to-end network. |
Wenyan Cong; Xinhao Tao; Li Niu; Jing Liang; Xuesong Gao; Qihao Sun; Liqing Zhang; | code |
254 | MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation. |
Inkyu Shin; Yi-Hsuan Tsai; Bingbing Zhuang; Samuel Schulter; Buyu Liu; Sparsh Garg; In So Kweon; Kuk-Jin Yoon; | code |
255 | Self-Supervised Neural Articulated Shape and Appearance Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel approach for learning a representation of the geometry, appearance, and motion of a class of articulated objects given only a set of color images as input. |
Fangyin Wei; Rohan Chabra; Lingni Ma; Christoph Lassner; Michael Zollhöfer; Szymon Rusinkiewicz; Chris Sweeney; Richard Newcombe; Mira Slavcheva; | code |
256 | Topology Preserving Local Road Network Estimation From Single Onboard Camera Image Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper aims at extracting the local road network topology, directly in the bird’s-eye-view (BEV), all in a complex urban setting. |
Yigit Baran Can; Alexander Liniger; Danda Pani Paudel; Luc Van Gool; | code |
257 | Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A novel algorithm to detect road lanes in the eigenlane space is proposed in this paper. |
Dongkwon Jin; Wonhui Park; Seong-Gyun Jeong; Heeyeon Kwon; Chang-Su Kim; | code |
258 | SwinTextSpotter: Scene Text Spotting Via Better Synergy Between Text Detection and Text Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new end-to-end scene text spotting framework termed SwinTextSpotter. |
Mingxin Huang; Yuliang Liu; Zhenghao Peng; Chongyu Liu; Dahua Lin; Shenggao Zhu; Nicholas Yuan; Kai Ding; Lianwen Jin; | code |
259 | Deblur-NeRF: Neural Radiance Fields From Blurry Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, image blurriness caused by defocus or motion, which often occurs when capturing scenes in the wild, significantly degrades its reconstruction quality. To address this problem, we propose Deblur-NeRF, the first method that can recover a sharp NeRF from blurry input. |
Li Ma; Xiaoyu Li; Jing Liao; Qi Zhang; Xuan Wang; Jue Wang; Pedro V. Sander; | code |
260 | Whose Track Is It Anyway? Improving Robustness to Tracking Errors With Affinity-Based Trajectory Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This is typically caused by the propagation of errors from tracking to prediction, such as noisy tracks, fragments, and identity switches. To alleviate this propagation of errors, we propose a new prediction paradigm that uses detections and their affinity matrices across frames as inputs, removing the need for error-prone data association during tracking. |
Xinshuo Weng; Boris Ivanovic; Kris Kitani; Marco Pavone; | code |
261 | Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents Video K-Net, a simple, strong, and unified framework for fully end-to-end video panoptic segmentation. |
Xiangtai Li; Wenwei Zhang; Jiangmiao Pang; Kai Chen; Guangliang Cheng; Yunhai Tong; Chen Change Loy; | code |
262 | Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Instead, we consider rethinking solutions to data heterogeneity in FL with a focus on local learning generality rather than proximal restriction. |
Matias Mendieta; Taojiannan Yang; Pu Wang; Minwoo Lee; Zhengming Ding; Chen Chen; | code |
263 | Blind Image Super-Resolution With Elaborate Degradation Modeling on Noise and Kernel Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the above issues, this paper proposes a model-based blind SISR method under the probabilistic framework, which elaborately models image degradation from the perspectives of noise and blur kernel. |
Zongsheng Yue; Qian Zhao; Jianwen Xie; Lei Zhang; Deyu Meng; Kwan-Yee K. Wong; | code |
264 | Faithful Extreme Rescaling Via Generative Prior Reciprocated Invertible Representations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a Generative prior ReciprocAted Invertible rescaling Network (GRAIN) for generating faithful high-resolution (HR) images from low-resolution (LR) invertible images with an extreme upscaling factor (64x). |
Zhixuan Zhong; Liangyu Chai; Yang Zhou; Bailin Deng; Jia Pan; Shengfeng He; | code |
265 | Proto2Proto: Can You Recognize The Car, The Way I Do? Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present Proto2Proto, a novel method to transfer interpretability of one prototypical part network to another via knowledge distillation. |
Monish Keswani; Sriranjani Ramakrishnan; Nishant Reddy; Vineeth N Balasubramanian; | code |
266 | TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We observe that these applications naturally exhibit the characteristics of large intra-image (spatial) variance and small cross-image variance. This observation motivates our efficient translation variant convolution (TVConv) for layout-aware visual processing. |
Jierun Chen; Tianlang He; Weipeng Zhuo; Li Ma; Sangtae Ha; S.-H. Gary Chan; | code |
267 | Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate a novel and practical task called cross-device SR, which strives to adapt a real-world SR model trained on the paired images captured by one camera to low-resolution (LR) images captured by arbitrary target devices. |
Xiaoqian Xu; Pengxu Wei; Weikai Chen; Yang Liu; Mingzhi Mao; Liang Lin; Guanbin Li; | code |
268 | Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a large-scale study of imitating human demonstrations on tasks that require a virtual robot to search for objects in new environments – (1) ObjectGoal Navigation (e.g. ‘find & go to a chair’) and (2) Pick&Place (e.g. ‘find mug, pick mug, find counter, place mug on counter’). |
Ram Ramrakhya; Eric Undersander; Dhruv Batra; Abhishek Das; | code |
269 | Simple But Effective: CLIP Embeddings for Embodied AI Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We investigate the effectiveness of CLIP visual backbones for Embodied AI tasks. |
Apoorv Khandelwal; Luca Weihs; Roozbeh Mottaghi; Aniruddha Kembhavi; | code |
270 | NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Alternatively, a more graceful way is that global and local context can adaptively contribute per se to accommodate different visual data. To achieve this goal, we in this paper propose a novel ViT architecture, termed NomMer, which can dynamically Nominate the synergistic global-local context in vision transforMer. |
Hao Liu; Xinghua Jiang; Xin Li; Zhimin Bao; Deqiang Jiang; Bo Ren; | code |
271 | Collaborative Transformers for Grounded Situation Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Grounded situation recognition is the task of predicting the main activity, entities playing certain roles within the activity, and bounding-box groundings of the entities in the given image. To effectively deal with this challenging task, we introduce a novel approach where the two processes for activity classification and entity estimation are interactive and complementary. |
Junhyeong Cho; Youngseok Yoon; Suha Kwak; | code |
272 | CPPF: Towards Robust Category-Level 9D Pose Estimation in The Wild Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we tackle the problem of category-level 9D pose estimation in the wild, given a single RGB-D frame. |
Yang You; Ruoxi Shi; Weiming Wang; Cewu Lu; | code |
273 | Continual Test-Time Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The noisy pseudo-labels can further lead to error accumulation and catastrophic forgetting. To tackle these issues, we propose a continual test-time adaptation approach (CoTTA) which comprises two parts. |
Qin Wang; Olga Fink; Luc Van Gool; Dengxin Dai; | code |
274 | Dynamic MLP for Fine-Grained Image Classification By Leveraging Geographical and Temporal Information Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To fully explore the potential of multimodal information, we propose a dynamic MLP on top of the image representation, which interacts with multimodal features at a higher and broader dimension. |
Lingfeng Yang; Xiang Li; Renjie Song; Borui Zhao; Juntian Tao; Shihao Zhou; Jiajun Liang; Jian Yang; | code |
275 | MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose MuKEA to represent multimodal knowledge by an explicit triplet to correlate visual objects and fact answers with implicit relations. |
Yang Ding; Jing Yu; Bang Liu; Yue Hu; Mingxin Cui; Qi Wu; | code |
276 | Fair Contrastive Learning for Facial Attribute Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we for the first time analyze unfairness caused by supervised contrastive learning and propose a new Fair Supervised Contrastive Loss (FSCL) for fair visual representation learning. |
Sungho Park; Jewook Lee; Pilhyeon Lee; Sunhee Hwang; Dohyung Kim; Hyeran Byun; | code |
277 | Directional Self-Supervised Learning for Heavy Image Augmentations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a directional self-supervised learning paradigm (DSSL), which is compatible with significantly more augmentations. |
Yalong Bai; Yifan Yang; Wei Zhang; Tao Mei; | code |
278 | No-Reference Point Cloud Quality Assessment Via Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a novel no-reference quality assessment metric, the image transferred point cloud quality assessment (IT-PCQA), for 3D point clouds. |
Qi Yang; Yipeng Liu; Siheng Chen; Yiling Xu; Jun Sun; | code |
279 | Comprehending and Ordering Semantics for Image Captioning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new recipe of Transformer-style structure, namely Comprehending and Ordering Semantics Networks (COS-Net), that novelly unifies an enriched semantic comprehending and a learnable semantic ordering processes into a single architecture. |
Yehao Li; Yingwei Pan; Ting Yao; Tao Mei; | code |
280 | A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce VCSL (Video Copy Segment Localization), a new comprehensive segment-level annotated video copy dataset. |
Sifeng He; Xudong Yang; Chen Jiang; Gang Liang; Wei Zhang; Tan Pan; Qing Wang; Furong Xu; Chunguang Li; JinXiong Liu; Hui Xu; Kaiming Huang; Yuan Cheng; Feng Qian; Xiaobo Zhang; Lei Yang; | code |
281 | Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study the HMC problem in which objects are labeled at any level of the hierarchy. |
Jingzhou Chen; Peng Wang; Jian Liu; Yuntao Qian; | code |
282 | HeadNeRF: A Real-Time NeRF-Based Parametric Head Model Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose HeadNeRF, a novel NeRF-based parametric head model that integrates the neural radiance field to the parametric representation of the human head. |
Yang Hong; Bo Peng; Haiyao Xiao; Ligang Liu; Juyong Zhang; | code |
283 | Occlusion-Robust Face Alignment Using A Viewpoint-Invariant Hierarchical Network Architecture Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a new network architecture called GlomFace to model the facial hierarchies against various occlusions, which draws inspiration from the viewpoint-invariant hierarchy of facial structure. |
Congcong Zhu; Xintong Wan; Shaorong Xie; Xiaoqiang Li; Yinzheng Gu; | code |
284 | IDR: Self-Supervised Image Denoising Via Iterative Data Refinement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a practical unsupervised image denoising method to achieve state-of-the-art denoising performance. |
Yi Zhang; Dasong Li; Ka Lung Law; Xiaogang Wang; Hongwei Qin; Hongsheng Li; | code |
285 | MogFace: Towards A Deeper Appreciation on Face Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we focus on resolving three aforementioned challenges that existing methods struggle to overcome and present a novel face detector, termed MogFace. |
Yang Liu; Fei Wang; Jiankang Deng; Zhipeng Zhou; Baigui Sun; Hao Li; | code |
286 | Learning Affinity From Attention: End-to-End Weakly-Supervised Semantic Segmentation With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, to address the aforementioned problem, we introduce Transformers, which naturally integrate global information, to generate more integral initial pseudo labels for end-to-end WSSS. |
Lixiang Ru; Yibing Zhan; Baosheng Yu; Bo Du; | code |
287 | CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study the problem of jointly estimating the optical flow and scene flow from synchronized 2D and 3D data. |
Haisong Liu; Tao Lu; Yihui Xu; Jia Liu; Wenjie Li; Lijun Chen; | code |
288 | FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: For example, the "Happy" expression with high intensity in Talk-Show is more discriminating than the same expression with low intensity in Official-Event. To fill this gap, we build a large-scale multi-scene dataset, coined as FERV39k. |
Yan Wang; Yixuan Sun; Yiwen Huang; Zhongying Liu; Shuyong Gao; Wei Zhang; Weifeng Ge; Wenqiang Zhang; | code |
289 | Learning To Detect Mobile Objects From LiDAR Scans Without Labels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes an alternative approach entirely based on unlabeled data, which can be collected cheaply and in abundance almost everywhere on earth. |
Yurong You; Katie Luo; Cheng Perng Phoo; Wei-Lun Chao; Wen Sun; Bharath Hariharan; Mark Campbell; Kilian Q. Weinberger; | code |
290 | WildNet: Learning Domain Generalized Semantic Segmentation From The Wild Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to diversify both the content and style of the source domain with the help of the wild. |
Suhyeon Lee; Hongje Seong; Seongwon Lee; Euntai Kim; | code |
291 | DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: DAIR-V2X comprises 71254 LiDAR frames and 71254 Camera frames, and all frames are captured from real scenes with 3D annotations. |
Haibao Yu; Yizhen Luo; Mao Shu; Yiyi Huo; Zebang Yang; Yifeng Shi; Zhenglong Guo; Hanyu Li; Xing Hu; Jirui Yuan; Zaiqing Nie; | code |
292 | Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To tackle the aforementioned problems, we propose the Point-to-Voxel Knowledge Distillation (PVD), which transfers the hidden knowledge from both point level and voxel level. |
Yuenan Hou; Xinge Zhu; Yuexin Ma; Chen Change Loy; Yikang Li; | code |
293 | Generating Diverse 3D Reconstructions From A Single Occluded Face Image Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Furthermore, while a plurality of 3D reconstructions is plausible in the occluded regions, existing approaches are limited to generating only a single solution. To address both of these challenges, we present Diverse3DFace, which is specifically designed to simultaneously generate a diverse and realistic set of 3D reconstructions from a single occluded face image. |
Rahul Dey; Vishnu Naresh Boddeti; | code |
294 | Stand-Alone Inter-Frame Attention in Video Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a new recipe of inter-frame attention block, namely Stand-alone Inter-Frame Attention (SIFA), that novelly delves into the deformation across frames to estimate local self-attention on each spatial location. |
Fuchen Long; Zhaofan Qiu; Yingwei Pan; Ting Yao; Jiebo Luo; Tao Mei; | code |
295 | Large-Scale Pre-Training for Person Re-Identification With Noisy Labels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper aims to address the problem of pre-training for person re-identification (Re-ID) with noisy labels. |
Dengpan Fu; Dongdong Chen; Hao Yang; Jianmin Bao; Lu Yuan; Lei Zhang; Houqiang Li; Fang Wen; Dong Chen; | code |
296 | Semantic Segmentation By Early Region Proxy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present a novel and efficient modeling that starts from interpreting the image as a tessellation of learnable regions, each of which has flexible geometrics and carries homogeneous semantics. |
Yifan Zhang; Bo Pang; Cewu Lu; | code |
297 | LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To apply gesture recognition to long-distance interactive scenes such as meetings and smart homes, a large RGB-D video dataset LD-ConGR is established in this paper. |
Dan Liu; Libo Zhang; Yanjun Wu; | code |
298 | HVH: Learning A Hybrid Neural Volumetric Representation for Dynamic Hair Performance Capture Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we address the aforementioned problems: 1) we use a novel, volumetric hair representation that is composed of thousands of primitives. |
Ziyan Wang; Giljoo Nam; Tuur Stuyck; Stephen Lombardi; Michael Zollhöfer; Jessica Hodgins; Christoph Lassner; | code |
299 | Rethinking Visual Geo-Localization for Large-Scale Applications Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We find that current methods fail to scale to such large datasets, therefore we design a new highly scalable training technique, called CosPlace, which casts the training as a classification problem avoiding the expensive mining needed by the commonly used contrastive learning. |
Gabriele Berton; Carlo Masone; Barbara Caputo; | code |
300 | The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In view of them, we advocate a principle of diversity for training ViTs, by presenting corresponding regularizers that encourage the representation diversity and coverage at each of those levels, thereby enabling the capture of more discriminative information. |
Tianlong Chen; Zhenyu Zhang; Yu Cheng; Ahmed Awadallah; Zhangyang Wang; | code |
301 | ViM: Out-of-Distribution With Virtual-Logit Matching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: There are OOD samples that are easy to identify in the feature space while hard to distinguish in the logit space and vice versa. Motivated by this observation, we propose a novel OOD scoring method named Virtual-logit Matching (ViM), which combines the class-agnostic score from feature space and the In-Distribution (ID) class-dependent logits. |
Haoqi Wang; Zhizhong Li; Litong Feng; Wayne Zhang; | code |
302 | Class-Aware Contrastive Semi-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Moreover, the model’s judgment becomes noisier in real-world applications with extensive out-of-distribution data. To address this issue, we propose a general method named Class-aware Contrastive Semi-Supervised Learning (CCSSL), which is a drop-in helper to improve the pseudo-label quality and enhance the model’s robustness in the real-world setting. |
Fan Yang; Kai Wu; Shuyi Zhang; Guannan Jiang; Yong Liu; Feng Zheng; Wei Zhang; Chengjie Wang; Long Zeng; | code |
303 | Ditto: Building Digital Twins of Articulated Objects From Interaction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce Ditto to learn articulation model estimation and 3D geometry reconstruction of an articulated object through interactive perception. |
Zhenyu Jiang; Cheng-Chun Hsu; Yuke Zhu; | code |
304 | Adaptive Early-Learning Correction for Segmentation From Noisy Annotations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we study the learning dynamics of deep segmentation networks trained on inaccurately-annotated data. |
Sheng Liu; Kangning Liu; Weicheng Zhu; Yiqiu Shen; Carlos Fernandez-Granda; | code |
305 | Cross-Domain Correlation Distillation for Unsupervised Domain Adaptation in Nighttime Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing works, e.g., using the twilight as the intermediate target domain to perform the adaptation from daytime to nighttime, may fail to cope with the inherent difference between datasets caused by the camera equipment and the urban style. Faced with these two types of domain shifts, i.e., the illumination and the inherent difference of the datasets, we propose a novel domain adaptation framework via cross-domain correlation distillation, called CCDistill. |
Huan Gao; Jichang Guo; Guoli Wang; Qian Zhang; | code |
306 | RSTT: Real-Time Spatial Temporal Transformer for Space-Time Video Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The existing methods based on Convolutional Neural Network (CNN) succeed in achieving visually satisfying results but suffer from slow inference speed due to their heavy architectures. We propose to resolve this issue by using a spatial-temporal transformer that naturally incorporates the spatial and temporal super resolution modules into a single model. |
Zhicheng Geng; Luming Liang; Tianyu Ding; Ilya Zharkov; | code |
307 | Partial Class Activation Attention for Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Beyond the previous CAM generated from image-level classification, we present Partial CAM, which subdivides the task into region-level prediction and achieves better localization performance. |
Sun-Ao Liu; Hongtao Xie; Hai Xu; Yongdong Zhang; Qi Tian; | code |
308 | Multi-Scale Memory-Based Video Deblurring Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To achieve fine-grained deblurring, we designed a memory branch to memorize the blurry-sharp feature pairs in the memory bank, thus providing useful information for the blurry query input. |
Bo Ji; Angela Yao; | code |
309 | A Scalable Combinatorial Solver for Elastic Geometrically Consistent 3D Shape Matching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a scalable combinatorial algorithm for globally optimizing over the space of geometrically consistent mappings between 3D shapes. |
Paul Roetzer; Paul Swoboda; Daniel Cremers; Florian Bernard; | code |
310 | Geometric Structure Preserving Warp for Natural Image Stitching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, most of the existing methods ignore the large-scale layouts reflected by straight lines or curves, decreasing overall stitching quality. To address this issue, this work presents a structure-preserving stitching approach that produces images with natural visual effects and less distortion. |
Peng Du; Jifeng Ning; Jiguang Cui; Shaoli Huang; Xinchao Wang; Jiaxin Wang; | code |
311 | GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: For the first time, we address the problem of generating full-body, hand and head motions of an avatar grasping an unknown object. |
Omid Taheri; Vasileios Choutas; Michael J. Black; Dimitrios Tzionas; | code |
312 | Conditional Prompt Learning for Vision-Language Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector). |
Kaiyang Zhou; Jingkang Yang; Chen Change Loy; Ziwei Liu; | code |
313 | Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To do so, in this paper, we propose an efficient mini-batch sampling method, called graph sampling (GS), for large-scale deep metric learning. |
Shengcai Liao; Ling Shao; | code |
314 | Undoing The Damage of Label Shift for Cross-Domain Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we give an in-depth analysis and show that the damage of label shift can be overcome by aligning the data conditional distribution and correcting the posterior probability. |
Yahao Liu; Jinhong Deng; Jiale Tao; Tong Chu; Lixin Duan; Wen Li; | code |
315 | FisherMatch: Semi-Supervised Rotation Regression Via Entropy-Based Filtering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Inspired by the popular semi-supervised approach, FixMatch, we propose to leverage pseudo label filtering to facilitate the information flow from labeled data to unlabeled data in a teacher-student mutual learning framework. |
Yingda Yin; Yingcheng Cai; He Wang; Baoquan Chen; | code |
316 | Affine Medical Image Registration With Coarse-To-Fine Vision Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a fast and robust learning-based algorithm, Coarse-to-Fine Vision Transformer (C2FViT), for 3D affine medical image registration. |
Tony C. W. Mok; Albert C. S. Chung; | code |
317 | A Differentiable Two-Stage Alignment Scheme for Burst Image Reconstruction With Large Shift Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In addition, the higher resolution (e.g., 4K) of modern imaging devices results in larger displacement between frames. To address these challenges, we design a differentiable two-stage alignment scheme sequentially in patch and pixel level for effective JDD-B. |
Shi Guo; Xi Yang; Jianqi Ma; Gaofeng Ren; Lei Zhang; | code |
318 | Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a deformable prototypical part network (Deformable ProtoPNet), an interpretable image classifier that integrates the power of deep learning and the interpretability of case-based reasoning. |
Jon Donnelly; Alina Jade Barnett; Chaofan Chen; | code |
319 | Restormer: Efficient Transformer for High-Resolution Image Restoration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose an efficient Transformer model by making several key designs in the building blocks (multi-head attention and feed-forward network) such that it can capture long-range pixel interactions, while still remaining applicable to large images. |
Syed Waqas Zamir; Aditya Arora; Salman Khan; Munawar Hayat; Fahad Shahbaz Khan; Ming-Hsuan Yang; | code |
320 | IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we devise an efficient encoder-decoder based network, termed IFRNet, for fast intermediate frame synthesizing. |
Lingtong Kong; Boyuan Jiang; Donghao Luo; Wenqing Chu; Xiaoming Huang; Ying Tai; Chengjie Wang; Jie Yang; | code |
321 | Large Loss Matters in Weakly Supervised Multi-Label Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: That is, the model first learns the representation of clean labels, and then starts memorizing noisy labels. Based on this finding, we propose novel methods for WSML which reject or correct the large loss samples to prevent model from memorizing the noisy label. |
Youngwook Kim; Jae Myung Kim; Zeynep Akata; Jungwoo Lee; | code |
322 | Neural Inertial Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes the inertial localization problem, the task of estimating the absolute location from a sequence of inertial sensor measurements. |
Sachini Herath; David Caruso; Chen Liu; Yufan Chen; Yasutaka Furukawa; | code |
323 | GraftNet: Towards Domain Generalized Stereo Matching With A Broad-Spectrum and Task-Oriented Feature Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to leverage the feature of a model trained on large-scale datasets to deal with the domain shift since it has seen various styles of images. |
Biyang Liu; Huimin Yu; Guodong Qi; | code |
324 | VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning, without requiring any human annotation. |
Wenjia Xu; Yongqin Xian; Jiuniu Wang; Bernt Schiele; Zeynep Akata; | code |
325 | Catching Both Gray and Black Swans: Open-Set Supervised Anomaly Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel approach that learns disentangled representations of abnormalities illustrated by seen anomalies, pseudo anomalies, and latent residual anomalies (i.e., samples that have unusual residuals compared to the normal data in a latent space), with the last two abnormalities designed to detect unseen anomalies. |
Choubo Ding; Guansong Pang; Chunhua Shen; | code |
326 | MLSLT: Towards Multilingual Sign Language Translation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, such models are inefficient in building multilingual sign language translation systems. To solve this problem, we introduce the multilingual sign language translation (MSLT) task. |
Aoxiong Yin; Zhou Zhao; Weike Jin; Meng Zhang; Xingshan Zeng; Xiaofei He; | code |
327 | Towards An End-to-End Framework for Flow-Guided Video Inpainting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an End-to-End framework for Flow-Guided Video Inpainting through elaborately designed three trainable modules, namely, flow completion, feature propagation, and content hallucination modules. |
Zhen Li; Cheng-Ze Lu; Jianhua Qin; Chun-Le Guo; Ming-Ming Cheng; | code |
328 | Contrastive Test-Time Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel way to leverage self-supervised contrastive learning to facilitate target feature learning, along with an online pseudo labeling scheme with refinement that significantly denoises pseudo labels. |
Dian Chen; Dequan Wang; Trevor Darrell; Sayna Ebrahimi; | code |
329 | MotionAug: Augmentation With Physical Correction for Human Motion Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a motion data augmentation scheme incorporating motion synthesis encouraging diversity and motion correction imposing physical plausibility. |
Takahiro Maeda; Norimichi Ukita; | code |
330 | Modeling Indirect Illumination for Inverse Rendering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel approach to efficiently recovering spatially-varying indirect illumination. |
Yuanqing Zhang; Jiaming Sun; Xingyi He; Huan Fu; Rongfei Jia; Xiaowei Zhou; | code |
331 | TransWeather: Transformer-Based Restoration of Images Degraded By Adverse Weather Conditions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we focus on developing an efficient solution for the all adverse weather removal problem. |
Jeya Maria Jose Valanarasu; Rajeev Yasarla; Vishal M. Patel; | code |
332 | H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing methods usually focus on partial detection components for domain alignment. In contrast, this paper considers that all the detection components are important and proposes a Holistic and Hierarchical Feature Alignment (H^2FA) R-CNN. |
Yunqiu Xu; Yifan Sun; Zongxin Yang; Jiaxu Miao; Yi Yang; | code |
333 | P3Depth: Monocular Depth Estimation With A Piecewise Planarity Prior Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Based on knowledge about the high regularity of real 3D scenes, we propose a method that learns to selectively leverage information from coplanar pixels to improve the predicted depth. |
Vaishakh Patil; Christos Sakaridis; Alexander Liniger; Luc Van Gool; | code |
334 | GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we reveal and address the disadvantages of the conventional query-driven HOI detectors from the two aspects. |
Yue Liao; Aixi Zhang; Miao Lu; Yongliang Wang; Xiaobo Li; Si Liu; | code |
335 | Simple Multi-Dataset Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a simple method for training a unified detector on multiple large-scale datasets. |
Xingyi Zhou; Vladlen Koltun; Philipp Krähenbühl; | code |
336 | Proactive Image Manipulation Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: By contrast, we propose a proactive scheme to image manipulation detection. |
Vishal Asnani; Xi Yin; Tal Hassner; Sijia Liu; Xiaoming Liu; | code |
337 | StyTr2: Image Style Transfer With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Therefore, traditional neural style transfer methods face biased content representation. To address this critical issue, we take long-range dependencies of input images into account for image style transfer by proposing a transformer-based approach called StyTr^2. |
Yingying Deng; Fan Tang; Weiming Dong; Chongyang Ma; Xingjia Pan; Lei Wang; Changsheng Xu; | code |
338 | Global Matching With Overlapping Attention for Optical Flow Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, inspired by the traditional matching-optimization methods where matching is introduced to handle large displacements before energy-based optimizations, we introduce a simple but effective global matching step before the direct regression and develop a learning-based matching-optimization framework, namely GMFlowNet. |
Shiyu Zhao; Long Zhao; Zhixing Zhang; Enyu Zhou; Dimitris Metaxas; | code |
339 | Language As Queries for Referring Video Object Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a simple and unified framework built upon Transformer, termed ReferFormer. |
Jiannan Wu; Yi Jiang; Peize Sun; Zehuan Yuan; Ping Luo; | code |
340 | MViTv2: Improved Multiscale Vision Transformers for Classification and Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection. |
Yanghao Li; Chao-Yuan Wu; Haoqi Fan; Karttikeya Mangalam; Bo Xiong; Jitendra Malik; Christoph Feichtenhofer; | code |
341 | Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Focusing on the relatively underexplored task of audio-visual zero-shot learning, we propose to learn multi-modal representations from audio-visual data using cross-modal attention and exploit textual label embeddings for transferring knowledge from seen classes to unseen classes. |
Otniel-Bogdan Mercea; Lukas Riesch; A. Sophia Koepke; Zeynep Akata; | code |
342 | Rethinking Efficient Lane Detection Via Curve Modeling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a novel parametric curve-based method for lane detection in RGB images. |
Zhengyang Feng; Shaohua Guo; Xin Tan; Ke Xu; Min Wang; Lizhuang Ma; | code |
343 | Self-Supervised Arbitrary-Scale Point Clouds Upsampling Via Implicit Neural Representation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel approach that achieves self-supervised and magnification-flexible point clouds upsampling simultaneously. |
Wenbo Zhao; Xianming Liu; Zhiwei Zhong; Junjun Jiang; Wei Gao; Ge Li; Xiangyang Ji; | code |
344 | Co-Advise: Cross Inductive Bias Distillation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Unlike previous works, where merely heavy convolution-based teachers are provided, in this paper, we delve into the influence of models' inductive biases in knowledge distillation (e.g., convolution and involution). |
Sucheng Ren; Zhengqi Gao; Tianyu Hua; Zihui Xue; Yonglong Tian; Shengfeng He; Hang Zhao; | code |
345 | AdaMixer: A Fast-Converging Query-Based Object Detector Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, this paradigm still suffers from slow convergence, limited performance, and design complexity of extra networks between backbone and decoder. In this paper, we find that the key to these issues is the adaptability of decoders for casting queries to varying objects. |
Ziteng Gao; Limin Wang; Bing Han; Sheng Guo; | code |
346 | DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In these, there is a limited number of WSI slides (bags), while the resolution of a single WSI is huge, which leads to a large number of patches (instances) cropped from this slide. To address this issue, we propose to virtually enlarge the number of bags by introducing the concept of pseudo-bags, on which a double-tier MIL framework is built to effectively use the intrinsic features. |
Hongrun Zhang; Yanda Meng; Yitian Zhao; Yihong Qiao; Xiaoyun Yang; Sarah E. Coupland; Yalin Zheng; | code |
347 | BEVT: BERT Pretraining of Video Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce BEVT which decouples video representation learning into spatial representation learning and temporal dynamics learning. |
Rui Wang; Dongdong Chen; Zuxuan Wu; Yinpeng Chen; Xiyang Dai; Mengchen Liu; Yu-Gang Jiang; Luowei Zhou; Lu Yuan; | code |
348 | Deep Generalized Unfolding Networks for Image Restoration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Deep Generalized Unfolding Network (DGUNet) for image restoration. |
Chong Mou; Qian Wang; Jian Zhang; | code |
349 | VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel single-stage framework for online VIS built based on the grid structured feature representation. |
Su Ho Han; Sukjun Hwang; Seoung Wug Oh; Yeonchool Park; Hyunwoo Kim; Min-Jung Kim; Seon Joo Kim; | code |
350 | Deep Unlearning Via Randomized Conditionally Independent Hessians Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We use a variant of a new conditional independence coefficient, L-CODEC, to identify a subset of the model parameters with the most semantic overlap on an individual sample level. |
Ronak Mehta; Sourav Pal; Vikas Singh; Sathya N. Ravi; | code |
351 | Revisiting Skeleton-Based Action Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose PoseConv3D, a new approach to skeleton-based action recognition. |
Haodong Duan; Yue Zhao; Kai Chen; Dahua Lin; Bo Dai; | code |
352 | Stereo Depth From Events Cameras: Concentrate and Focus on The Future Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To alleviate the event missing or overriding issue, we propose to learn to concentrate on the dense events to produce a compact event representation with high details for depth estimation. |
Yeongwoo Nam; Mohammad Mostafavi; Kuk-Jin Yoon; Jonghyun Choi; | code |
353 | A Simple Data Mixing Prior for Improving Self-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Data mixing (e.g., Mixup, Cutmix, ResizeMix) is an essential component for advancing recognition models. In this paper, we focus on studying its effectiveness in the self-supervised setting. |
Sucheng Ren; Huiyu Wang; Zhengqi Gao; Shengfeng He; Alan Yuille; Yuyin Zhou; Cihang Xie; | code |
354 | Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we investigate an alternative strategy for pre-training, namely Knowledge Distillation as Efficient Pre-training (KDEP), aiming to efficiently transfer the learned feature representation from existing pre-trained models to new student models for future downstream tasks. |
Ruifei He; Shuyang Sun; Jihan Yang; Song Bai; Xiaojuan Qi; | code |
355 | BigDL 2.0: Seamless Scaling of AI Pipelines From Laptops to Distributed Cluster Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address this challenge, we have open sourced BigDL 2.0 at https://github.com/intel-analytics/BigDL/ under Apache 2.0 license (combining the original BigDL [19] and Analytics Zoo [18] projects); using BigDL 2.0, users can simply build conventional Python notebooks on their laptops (with possible AutoML support), which can then be transparently accelerated on a single node (with up to 9.6x speedup in our experiments), and seamlessly scaled out to a large cluster (across several hundred servers in real-world use cases). |
Jason (Jinquan) Dai; Ding Ding; Dongjie Shi; Shengsheng Huang; Jiao Wang; Xin Qiu; Kai Huang; Guoqiong Song; Yang Wang; Qiyuan Gong; Jiaming Song; Shan Yu; Le Zheng; Yina Chen; Junwei Deng; Ge Song; | code |
356 | Attentive Fine-Grained Structured Sparsity for Image Restoration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To further optimize the trade-off between the efficiency and the restoration accuracy, we propose a novel pruning method that determines the pruning ratio for N:M structured sparsity at each layer. |
Junghun Oh; Heewon Kim; Seungjun Nah; Cheeun Hong; Jonghyun Choi; Kyoung Mu Lee; | code |
357 | Learning Fair Classifiers With Partially Annotated Group Labels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we consider a more practical scenario, dubbed as Algorithmic Group Fairness with the Partially annotated Group labels (Fair-PG). |
Sangwon Jung; Sanghyuk Chun; Taesup Moon; | code |
358 | NightLab: A Dual-Level Architecture With Hardness Detection for Segmentation at Night Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose NightLab, a novel nighttime segmentation framework that leverages multiple deep learning models imbued with night-aware features to yield state-of-the-art (SoTA) performance on multiple night segmentation benchmarks. |
Xueqing Deng; Peng Wang; Xiaochen Lian; Shawn Newsam; | code |
359 | Constrained Few-Shot Class-Incremental Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To meet the above constraints, we propose C-FSCIL, which is architecturally composed of a frozen meta-learned feature extractor, a trainable fixed-size fully connected layer, and a rewritable dynamically growing memory that stores as many vectors as the number of encountered classes. |
Michael Hersche; Geethan Karunaratne; Giovanni Cherubini; Luca Benini; Abu Sebastian; Abbas Rahimi; | code |
360 | Threshold Matters in WSSS: Manipulating The Activation for The Robust and Accurate Segmentation Model Against Thresholds Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Then, we show that this issue can be mitigated by satisfying two conditions; 1) reducing the imbalance in the foreground activation and 2) increasing the gap between the foreground and the background activation. Based on these findings, we propose a novel activation manipulation network with a per-pixel classification loss and a label conditioning module. |
Minhyun Lee; Dongseob Kim; Hyunjung Shim; | code |
361 | TransMVSNet: Global Context-Aware Multi-View Stereo Network With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present TransMVSNet, based on our exploration of feature matching in multi-view stereo (MVS). |
Yikang Ding; Wentao Yuan; Qingtian Zhu; Haotian Zhang; Xiangyue Liu; Yuanjiang Wang; Xiao Liu; | code |
362 | DPGEN: Differentially Private Generative Energy-Guided Network for Natural Image Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we propose DPGEN, a network model designed to synthesize high-resolution natural images while satisfying differential privacy. |
Jia-Wei Chen; Chia-Mu Yu; Ching-Chia Kao; Tzai-Wei Pang; Chun-Shien Lu; | code |
363 | The Majority Can Help The Minority: Context-Rich Minority Oversampling for Long-Tailed Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel minority over-sampling method to augment diversified minority samples by leveraging the rich context of the majority classes as background images. |
Seulki Park; Youngkyu Hong; Byeongho Heo; Sangdoo Yun; Jin Young Choi; | code |
364 | IntentVizor: Towards Generic Query Guided Interactive Video Summarization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: First, the text query might not be enough to describe the exact and diverse needs of the user. Second, the user cannot edit once the summaries are produced, while we assume the needs of the user should be subtle and need to be adjusted interactively. To solve these two problems, we propose IntentVizor, an interactive video summarization framework guided by generic multi-modality queries. |
Guande Wu; Jianzhe Lin; Claudio T. Silva; | code |
365 | Shape-Invariant 3D Adversarial Point Clouds Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel Point-Cloud Sensitivity Map to boost both the efficiency and imperceptibility of point perturbations. |
Qidong Huang; Xiaoyi Dong; Dongdong Chen; Hang Zhou; Weiming Zhang; Nenghai Yu; | code |
366 | Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we strive to liberate ViTs from pre-training by introducing CNNs’ inductive biases back to ViTs while preserving their network architectures for a higher upper bound and setting up more suitable optimization objectives. |
Haofei Zhang; Jiarui Duan; Mengqi Xue; Jie Song; Li Sun; Mingli Song; | code |
367 | PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, one of the greatest challenges remains the creation of datasets with complete, unambiguous ground truth at scale. To address this, we develop a new, more comprehensive dataset for table extraction, called PubTables-1M. |
Brandon Smock; Rohith Pesala; Robin Abraham; | code |
368 | Meta-Attention for ViT-Backed Continual Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study ViT-backed continual learning to strive for higher performance riding on recent advances of ViTs. |
Mengqi Xue; Haofei Zhang; Jie Song; Mingli Song; | code |
369 | DST: Dynamic Substitute Training for Data-Free Black-Box Attack Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel dynamic substitute training attack method to encourage the substitute model to learn better and faster from the target model. |
Wenxuan Wang; Xuelin Qian; Yanwei Fu; Xiangyang Xue; | code |
370 | Unified Contrastive Learning in Image-Text-Label Space Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we introduce a new formulation by combining the two data sources into a common image-text-label space. |
Jianwei Yang; Chunyuan Li; Pengchuan Zhang; Bin Xiao; Ce Liu; Lu Yuan; Jianfeng Gao; | code |
371 | Unsupervised Pre-Training for Temporal Action Localization Tasks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: These pre-trained models can be sub-optimal for temporal localization tasks due to the inherent discrepancy between video-level classification and clip-level localization. To bridge this gap, we make the first attempt to propose a self-supervised pretext task, coined as Pseudo Action Localization (PAL) to Unsupervisedly Pre-train feature encoders for Temporal Action Localization tasks (UP-TAL). |
Can Zhang; Tianyu Yang; Junwu Weng; Meng Cao; Jue Wang; Yuexian Zou; | code |
372 | Look Outside The Room: Synthesizing A Consistent Long-Term 3D Scene Video From A Single Image Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel approach to synthesize a consistent long-term video given a single scene image and a trajectory of large camera motions. |
Xuanchi Ren; Xiaolong Wang; | code |
373 | High-Fidelity Human Avatars From A Single RGB Camera Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a coarse-to-fine framework to reconstruct a personalized high-fidelity human avatar from a monocular video. |
Hao Zhao; Jinsong Zhang; Yu-Kun Lai; Zerong Zheng; Yingdi Xie; Yebin Liu; Kun Li; | code |
374 | Multiview Transformers for Video Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Although transformer architectures have recently advanced the state-of-the-art, they have not explicitly modelled different spatiotemporal resolutions. To this end, we present Multiview Transformers for Video Recognition (MTV). |
Shen Yan; Xuehan Xiong; Anurag Arnab; Zhichao Lu; Mi Zhang; Chen Sun; Cordelia Schmid; | code |
375 | How Good Is Aesthetic Ability of A Fashion Model? Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce A100 (Aesthetic 100) to assess the aesthetic ability of the fashion compatibility models. |
Xingxing Zou; Kaicheng Pang; Wen Zhang; Waikeung Wong; | code |
376 | Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, large disparities between existing synthetic datasets and real scenes lead to poor model transfer. We make two major contributions to address that. |
Zhao Jin; Yinjie Lei; Naveed Akhtar; Haifeng Li; Munawar Hayat; | code |
377 | Sequential Voting With Relational Box Fields for Active Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To leverage each pixel as evidence to determine the bounding box of the active object, we propose a pixel-wise voting function. |
Qichen Fu; Xingyu Liu; Kris Kitani; | code |
378 | Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we switch from discriminative (D) models to generative (G) models using the classical auto-encoder (AE). |
Guangrun Wang; Yansong Tang; Liang Lin; Philip H.S. Torr; | code |
379 | Consistency Learning Via Decoding Path Augmentation for Transformers in Human Object Interaction Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Motivated by various inference paths for HOI detection, we propose cross-path consistency learning (CPC), which is a novel end-to-end learning strategy to improve HOI detection for transformers by leveraging augmented decoding paths. |
Jihwan Park; SeungJun Lee; Hwan Heo; Hyeong Kyu Choi; Hyunwoo J. Kim; | code |
380 | Consistent Explanations By Contrastive Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Given an interpretation algorithm, e.g., Grad-CAM, we introduce a novel training method to train the model to produce more consistent explanations. |
Vipin Pillai; Soroush Abbasi Koohpayegani; Ashley Ouligian; Dennis Fong; Hamed Pirsiavash; | code |
381 | Hierarchical Modular Network for Video Captioning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a hierarchical modular network to bridge video representations and linguistic semantics from three levels before generating captions. |
Hanhua Ye; Guorong Li; Yuankai Qi; Shuhui Wang; Qingming Huang; Ming-Hsuan Yang; | code |
382 | Depth Estimation By Combining Binocular Stereo and Monocular Structured-Light Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a novel stereo system, which consists of two cameras (an RGB camera and an IR camera) and an IR speckle projector. |
Yuhua Xu; Xiaoli Yang; Yushan Yu; Wei Jia; Zhaobi Chu; Yulan Guo; | code |
383 | Salient-to-Broad Transition for Video Person Re-Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Due to the limited utilization of temporal relations in video re-id, the frame-level attention regions of mainstream methods are partial and highly similar. To address this problem, we propose a Salient-to-Broad Module (SBM) to enlarge the attention regions gradually. |
Shutao Bai; Bingpeng Ma; Hong Chang; Rui Huang; Xilin Chen; | code |
384 | DeeCap: Dynamic Early Exiting for Efficient Image Captioning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: On the other hand, the exiting decisions made by internal classifiers are sometimes unreliable. To solve these issues, we propose the DeeCap framework for efficient image captioning, which dynamically selects proper-sized decoding layers from a global perspective to exit early. |
Zhengcong Fei; Xu Yan; Shuhui Wang; Qi Tian; | code |
385 | RepMLPNet: Hierarchical Vision MLP With Re-Parameterized Locality Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a methodology, Locality Injection, to incorporate local priors into an FC layer via merging the trained parameters of a parallel conv kernel into the FC kernel. |
Xiaohan Ding; Honghao Chen; Xiangyu Zhang; Jungong Han; Guiguang Ding; | code |
386 | DR.VIC: Decomposition and Reasoning for Video Individual Counting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose to conduct the pedestrian counting from a new perspective – Video Individual Counting (VIC), which counts the total number of individual pedestrians in the given video (a person is only counted once). |
Tao Han; Lei Bai; Junyu Gao; Qi Wang; Wanli Ouyang; | code |
387 | ARCS: Accurate Rotation and Correspondence Search Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper is about the old Wahba problem in its more general form, which we call simultaneous rotation and correspondence search. |
Liangzu Peng; Manolis C. Tsakiris; René Vidal; | code |
388 | Learning To Anticipate Future With Dynamic Context Removal Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this field, previous methods usually care more about the model architecture design, but little attention has been paid to how to train an anticipation model with a proper learning policy. To this end, in this work, we propose a novel training scheme called Dynamic Context Removal (DCR), which dynamically schedules the visibility of observed future in the learning procedure. |
Xinyu Xu; Yong-Lu Li; Cewu Lu; | code |
389 | GCFSR: A Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a generative and controllable face SR framework, called GCFSR, which can reconstruct images with faithful identity information without any additional priors. |
Jingwen He; Wu Shi; Kai Chen; Lean Fu; Chao Dong; | code |
390 | On The Integration of Self-Attention and Convolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Convolution and self-attention are two powerful techniques for representation learning, and they are usually considered as two peer approaches that are distinct from each other. In this paper, we show that there exists a strong underlying relation between them, in the sense that the bulk of computations of these two paradigms are in fact done with the same operation. |
Xuran Pan; Chunjiang Ge; Rui Lu; Shiji Song; Guanfu Chen; Zeyi Huang; Gao Huang; | code |
391 | Domain Adaptation on Point Clouds Via Geometry-Aware Implicits Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here we propose a simple yet effective method for unsupervised domain adaptation on point clouds by employing a self-supervised task of learning geometry-aware implicits, which plays two critical roles in one shot. |
Yuefan Shen; Yanchao Yang; Mi Yan; He Wang; Youyi Zheng; Leonidas J. Guibas; | code |
392 | GroupViT: Semantic Segmentation Emerges From Text Supervision Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Instead, in this paper, we propose to bring back the grouping mechanism into deep networks, which allows semantic segments to emerge automatically with only text supervision. |
Jiarui Xu; Shalini De Mello; Sifei Liu; Wonmin Byeon; Thomas Breuel; Jan Kautz; Xiaolong Wang; | code |
393 | DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Specifically, these approaches often have difficulties in reconstructing images with novel poses, views, and highly variable contents compared to the training data, altering object identity, or producing unwanted image artifacts. To mitigate these problems and enable faithful manipulation of real images, we propose a novel method, dubbed DiffusionCLIP, that performs text-driven image manipulation using diffusion models. |
Gwanghyun Kim; Taesung Kwon; Jong Chul Ye; | code |
394 | BppAttack: Stealthy and Efficient Trojan Attacks Against Deep Neural Networks Via Image Quantization and Contrastive Adversarial Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose stealthy and efficient Trojan attacks, BppAttack. |
Zhenting Wang; Juan Zhai; Shiqing Ma; | code |
395 | Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Towards this end, in this paper, we first present a novel Stacked Hybrid-Attention network, which facilitates the intra-modal refinement as well as the inter-modal interaction, to serve as the encoder. We then devise an innovative Group Collaborative Learning strategy to optimize the decoder. |
Xingning Dong; Tian Gan; Xuemeng Song; Jianlong Wu; Yuan Cheng; Liqiang Nie; | code |
396 | Towards Better Plasticity-Stability Trade-Off in Incremental Learning: A Simple Linear Connector Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose to employ mode connectivity in loss landscapes to achieve a better plasticity-stability trade-off without any previous samples. |
Guoliang Lin; Hanlu Chu; Hanjiang Lai; | code |
397 | Topology-Preserving Shape Reconstruction and Registration Via Neural Diffeomorphic Flow Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new model called Neural Diffeomorphic Flow (NDF) to learn deep implicit shape templates, representing shapes as conditional diffeomorphic deformations of templates, intrinsically preserving shape topologies. |
Shanlin Sun; Kun Han; Deying Kong; Hao Tang; Xiangyi Yan; Xiaohui Xie; | code |
398 | Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose Segment and Complete defense (SAC), a general framework for defending object detectors against patch attacks through detection and removal of adversarial patches. |
Jiang Liu; Alexander Levine; Chun Pong Lau; Rama Chellappa; Soheil Feizi; | code |
399 | MAXIM: Multi-Axis MLP for Image Processing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present a multi-axis MLP based architecture called MAXIM, that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. |
Zhengzhong Tu; Hossein Talebi; Han Zhang; Feng Yang; Peyman Milanfar; Alan Bovik; Yinxiao Li; | code |
400 | Learning Part Segmentation Through Unsupervised Domain Adaptation From Synthetic Vehicles Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose the idea of learning part segmentation through unsupervised domain adaptation (UDA) from synthetic data. |
Qing Liu; Adam Kortylewski; Zhishuai Zhang; Zizhang Li; Mengqi Guo; Qihao Liu; Xiaoding Yuan; Jiteng Mu; Weichao Qiu; Alan Yuille; | code |
401 | PSTR: End-to-End One-Step Person Search With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel one-step transformer-based person search framework, PSTR, that jointly performs person detection and re-identification (re-id) in a single architecture. |
Jiale Cao; Yanwei Pang; Rao Muhammad Anwer; Hisham Cholakkal; Jin Xie; Mubarak Shah; Fahad Shahbaz Khan; | code |
402 | NFormer: Robust Person Re-Identification With Neighbor Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, due to the high intra-identity variations, ignoring such interactions typically leads to outlier features. To tackle this issue, we propose a Neighbor Transformer Network, or NFormer, which explicitly models interactions across all input images, thus suppressing outlier features and leading to more robust representations overall. |
Haochen Wang; Jiayi Shen; Yongtuo Liu; Yan Gao; Efstratios Gavves; | code |
403 | Bridging Global Context Interactions for High-Fidelity Image Completion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence. |
Chuanxia Zheng; Tat-Jen Cham; Jianfei Cai; Dinh Phung; | code |
404 | SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present SwinBERT, an end-to-end transformer-based model for video captioning, which takes video frame patches directly as inputs, and outputs a natural language description. |
Kevin Lin; Linjie Li; Chung-Ching Lin; Faisal Ahmed; Zhe Gan; Zicheng Liu; Yumao Lu; Lijuan Wang; | code |
405 | Not All Tokens Are Equal: Human-Centric Visual Analysis Via Token Clustering Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, not all regions are equally important in human-centric vision tasks, e.g., the human body needs a fine representation with many tokens, while the image background can be modeled by a few tokens. To address this problem, we propose a novel Vision Transformer, called Token Clustering Transformer (TCFormer), which merges tokens by progressive clustering, where the tokens can be merged from different locations with flexible shapes and sizes. |
Wang Zeng; Sheng Jin; Wentao Liu; Chen Qian; Ping Luo; Wanli Ouyang; Xiaogang Wang; | code |
406 | Temporally Efficient Vision Transformer for Video Instance Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS). |
Shusheng Yang; Xinggang Wang; Yu Li; Yuxin Fang; Jiemin Fang; Wenyu Liu; Xun Zhao; Ying Shan; | code |
407 | The Devil Is in The Margin: Margin-Based Label Smoothing for Network Calibration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Following our observations, we propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances. |
Bingyuan Liu; Ismail Ben Ayed; Adrian Galdran; Jose Dolz; | code |
408 | NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce NLX-GPT, a general, compact and faithful language model that can simultaneously predict an answer and explain it. |
Fawaz Sammani; Tanmoy Mukherjee; Nikos Deligiannis; | code |
409 | WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose WarpingGAN, an effective and efficient 3D point cloud generation network. |
Yingzhi Tang; Yue Qian; Qijian Zhang; Yiming Zeng; Junhui Hou; Xuefei Zhe; | code |
410 | Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To eliminate the heavy dependence on human annotations, we present a novel method, named Pseudo-Q, to automatically generate pseudo language queries for supervised training. |
Haojun Jiang; Yuanze Lin; Dongchen Han; Shiji Song; Gao Huang; | code |
411 | E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we show that event data is a very valuable modality for egocentric action recognition. |
Chiara Plizzari; Mirco Planamente; Gabriele Goletto; Marco Cannici; Emanuele Gusso; Matteo Matteucci; Barbara Caputo; | code |
412 | OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we first identify and measure two distinct kinds of distribution shifts that are ubiquitous in various datasets. Next, through extensive experiments, we compare OoD generalization algorithms across two groups of benchmarks, each dominated by one of the distribution shifts, revealing their strengths on one shift as well as limitations on the other shift. |
Nanyang Ye; Kaican Li; Haoyue Bai; Runpeng Yu; Lanqing Hong; Fengwei Zhou; Zhenguo Li; Jun Zhu; | code |
413 | OnePose: One-Shot Object Pose Estimation Without CAD Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a new method named OnePose for object pose estimation. |
Jiaming Sun; Zihao Wang; Siyu Zhang; Xingyi He; Hongcheng Zhao; Guofeng Zhang; Xiaowei Zhou; | code |
414 | Rethinking Minimal Sufficient Representation in Contrastive Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This reveals a new problem that the contrastive learning models have the risk of over-fitting to the shared information between views. To alleviate this problem, we propose to increase the mutual information between the representation and input as regularization to approximately introduce more task-relevant information, since we cannot utilize any downstream task information during training. |
Haoqing Wang; Xun Guo; Zhi-Hong Deng; Yan Lu; | code |
415 | Scalable Penalized Regression for Noise Detection in Learning With Noisy Labels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose using a theoretically guaranteed noisy label detection framework to detect and remove noisy data for Learning with Noisy Labels (LNL). |
Yikai Wang; Xinwei Sun; Yanwei Fu; | code |
416 | Federated Class-Incremental Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To tackle the global forgetting brought by the non-i.i.d class imbalance across clients, we propose a proxy server that selects the best old global model to assist the local relation distillation. |
Jiahua Dong; Lixu Wang; Zhen Fang; Gan Sun; Shichao Xu; Xiao Wang; Qi Zhu; | code |
417 | Show, Deconfound and Tell: Image Captioning With Causal Inference Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we first use Structural Causal Models (SCMs) to show how two confounders damage the image captioning. Then we apply the backdoor adjustment to propose a novel causal inference based image captioning (CIIC) framework, which consists of an interventional object detector (IOD) and an interventional transformer decoder (ITD) to jointly confront both confounders. |
Bing Liu; Dong Wang; Xu Yang; Yong Zhou; Rui Yao; Zhiwen Shao; Jiaqi Zhao; | code |
418 | MobRecon: Mobile-Friendly Hand Mesh Reconstruction From Monocular Image Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a framework for single-view hand mesh reconstruction, which can simultaneously achieve high reconstruction accuracy, fast inference speed, and temporal coherence. |
Xingyu Chen; Yufeng Liu; Yajiao Dong; Xiong Zhang; Chongyang Ma; Yanmin Xiong; Yuan Zhang; Xiaoyan Guo; | code |
419 | Parameter-Free Online Test-Time Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Motivated by the inherent uncertainty around the conditions that will ultimately be encountered at test time, we propose a particularly "conservative" approach, which addresses the problem with a Laplacian Adjusted Maximum-likelihood Estimation (LAME) objective. |
Malik Boudiaf; Romain Mueller; Ismail Ben Ayed; Luca Bertinetto; | code |
420 | SIGMA: Semantic-Complete Graph Matching for Domain Adaptive Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Despite their great success, they ignore the significant within-class variance and the domain-mismatched semantics within the training batch, leading to a sub-optimal adaptation. To overcome these challenges, we propose a novel SemantIc-complete Graph MAtching (SIGMA) framework for DAOD, which completes mismatched semantics and reformulates the adaptation with graph matching. |
Wuyang Li; Xinyu Liu; Yixuan Yuan; | code |
421 | No Pain, Big Gain: Classify Dynamic Point Cloud Sequences With Static Models By Fitting Feature-Level Space-Time Surfaces Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To capture 3D motions without explicitly tracking correspondences, we propose a kinematics-inspired neural network (Kinet) by generalizing the kinematic concept of ST-surfaces to the feature space. |
Jia-Xing Zhong; Kaichen Zhou; Qingyong Hu; Bing Wang; Niki Trigoni; Andrew Markham; | code |
422 | HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive Imaging Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel Hyperspectral Explicable Reconstruction and Optimal Sampling deep Network for SCI, dubbed HerosNet, which includes several phases under the ISTA-unfolding framework. |
Xuanyu Zhang; Yongbing Zhang; Ruiqin Xiong; Qilin Sun; Jian Zhang; | code |
423 | Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper explores the feasibility of finding an optimal sub-model from a vision transformer and introduces a pure vision transformer slimming (ViT-Slim) framework. |
Arnav Chavan; Zhiqiang Shen; Zhuang Liu; Zechun Liu; Kwang-Ting Cheng; Eric P. Xing; | code |
424 | Learning To Estimate Robust 3D Human Mesh From In-the-Wild Crowded Scenes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider the problem of recovering a single person’s 3D human mesh from in-the-wild crowded scenes. |
Hongsuk Choi; Gyeongsik Moon; JoonKyu Park; Kyoung Mu Lee; | code |
425 | Detecting Deepfakes With Self-Blended Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present novel synthetic training data called self-blended images (SBIs) to detect deepfakes. |
Kaede Shiohara; Toshihiko Yamasaki; | code |
426 | Implicit Sample Extension for Unsupervised Person Re-Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Due to the limited samples in each identity, we suppose that some underlying information needed to reveal the accurate clusters may be lacking. To discover this information, we propose an Implicit Sample Extension (ISE) method to generate what we call support samples around the cluster boundaries. |
Xinyu Zhang; Dongdong Li; Zhigang Wang; Jian Wang; Errui Ding; Javen Qinfeng Shi; Zhaoxiang Zhang; Jingdong Wang; | code |
427 | Energy-Based Latent Aligner for Incremental Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose ELI: Energy-based Latent Aligner for Incremental Learning, which first learns an energy manifold for the latent representations such that previous task latents will have low energy and the current task latents have high energy values. |
K J Joseph; Salman Khan; Fahad Shahbaz Khan; Rao Muhammad Anwer; Vineeth N Balasubramanian; | code |
428 | Towards Semi-Supervised Deep Facial Expression Recognition With An Adaptive Confidence Margin Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we learn an Adaptive Confidence Margin (Ada-CM) to fully leverage all unlabeled data for semi-supervised deep facial expression recognition. |
Hangyu Li; Nannan Wang; Xi Yang; Xiaoyu Wang; Xinbo Gao; | code |
429 | Group R-CNN for Weakly Semi-Supervised Object Detection With Points Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Based on the classic R-CNN architecture, we propose an effective point-to-box regressor: Group R-CNN. |
Shilong Zhang; Zhuoran Yu; Liyang Liu; Xinjiang Wang; Aojun Zhou; Kai Chen; | code |
430 | Weakly-Supervised Action Transition Learning for Stochastic Human Motion Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce the task of action-driven stochastic human motion prediction, which aims to predict multiple plausible future motions given a sequence of action labels and a short motion history. |
Wei Mao; Miaomiao Liu; Mathieu Salzmann; | code |
431 | Hybrid Relation Guided Set Matching for Few-Shot Action Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To overcome the two limitations, we propose a novel Hybrid Relation guided Set Matching (HyRSM) approach that incorporates two key components: hybrid relation module and set matching metric. |
Xiang Wang; Shiwei Zhang; Zhiwu Qing; Mingqian Tang; Zhengrong Zuo; Changxin Gao; Rong Jin; Nong Sang; | code |
432 | Cross-Patch Dense Contrastive Learning for Semi-Supervised Segmentation of Cellular Nuclei in Histopathologic Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We study the semi-supervised learning problem, using a few labeled data and a large amount of unlabeled data to train the network, by developing a cross-patch dense contrastive learning framework, to segment cellular nuclei in histopathologic images. |
Huisi Wu; Zhaoze Wang; Youyi Song; Lin Yang; Jing Qin; | code |
433 | Generalized Binary Search Network for Highly-Efficient Multi-View Stereo Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel method for highly efficient MVS that remarkably decreases the memory footprint, meanwhile clearly advancing state-of-the-art depth prediction performance. |
Zhenxing Mi; Chang Di; Dan Xu; | code |
434 | SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce the largest synthetic dataset for autonomous driving, SHIFT. |
Tao Sun; Mattia Segu; Janis Postels; Yuxuan Wang; Luc Van Gool; Bernt Schiele; Federico Tombari; Fisher Yu; | code |
435 | FlexIT: Towards Flexible Semantic Image Translation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing. |
Guillaume Couairon; Asya Grechka; Jakob Verbeek; Holger Schwenk; Matthieu Cord; | code |
436 | CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: On large displacements with motion blur, noisy correlations could cause severe errors in the estimated flow. To overcome this challenge, we propose a new architecture "CRoss-Attentional Flow Transformer" (CRAFT), aiming to revitalize the correlation volume computation. |
Xiuchao Sui; Shaohua Li; Xue Geng; Yan Wu; Xinxing Xu; Yong Liu; Rick Goh; Hongyuan Zhu; | code |
437 | BoxeR: Box-Attention for 2D and 3D Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a simple attention mechanism that we call Box-Attention. |
Duy-Kien Nguyen; Jihong Ju; Olaf Booij; Martin R. Oswald; Cees G. M. Snoek; | code |
438 | Neural Architecture Search With Representation Mutual Information Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing strategies, such as employing standard training or performance predictor, often suffer from high computational complexity and low generality. To address this issue, we propose to rank architectures by Representation Mutual Information (RMI). |
Xiawu Zheng; Xiang Fei; Lei Zhang; Chenglin Wu; Fei Chao; Jianzhuang Liu; Wei Zeng; Yonghong Tian; Rongrong Ji; | code |
439 | Can Neural Nets Learn The Same Model Twice? Investigating Reproducibility and Double Descent From The Decision Boundary Perspective Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We discuss methods for visualizing neural network decision boundaries and decision regions. |
Gowthami Somepalli; Liam Fowl; Arpit Bansal; Ping Yeh-Chiang; Yehuda Dar; Richard Baraniuk; Micah Goldblum; Tom Goldstein; | code |
440 | Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a novel method based on a hierarchy built on 1-nearest neighbor graphs in the original space which is used to preserve the grouping properties of the data distribution on multiple levels. |
Saquib Sarfraz; Marios Koulakis; Constantin Seibold; Rainer Stiefelhagen; | code |
441 | Multi-View Transformer for 3D Visual Grounding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Multi-View Transformer (MVT) for 3D visual grounding. |
Shijia Huang; Yilun Chen; Jiaya Jia; Liwei Wang; | code |
442 | Structured Sparse R-CNN for Direct Scene Graph Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Instead, from a perspective on SGG as a direct set prediction, this paper presents a simple, sparse, and unified framework, termed as Structured Sparse R-CNN. |
Yao Teng; Limin Wang; | code |
443 | BARC: Learning To Regress 3D Dog Shape From Images By Exploiting Breed Information Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our goal is to recover the 3D shape and pose of dogs from a single image. |
Nadine Rüegg; Silvia Zuffi; Konrad Schindler; Michael J. Black; | code |
444 | PCA-Based Knowledge Distillation Towards Lightweight and Content-Style Balanced Photorealistic Style Transfer Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce PCA-based knowledge distillation to distill lightweight models and show it is motivated by theory. |
Tai-Yin Chiu; Danna Gurari; | code |
445 | Towards Understanding Adversarial Robustness of Optical Flow Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we analyze the cause of the problem and show that the lack of robustness is rooted in the classical aperture problem of optical flow estimation in combination with bad choices in the details of the network architecture. |
Simon Schrodi; Tonmoy Saikia; Thomas Brox; | code |
446 | Lifelong Graph Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we bridge GNN and lifelong learning by converting a continual graph learning problem to a regular graph learning problem so GNN can inherit the lifelong learning techniques developed for convolutional neural networks (CNN). |
Chen Wang; Yuheng Qiu; Dasong Gao; Sebastian Scherer; | code |
447 | Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose Hypergraph-Induced Semantic Tuplet (HIST) loss for deep metric learning that leverages the multilateral semantic relations of multiple samples to multiple classes via hypergraph modeling. |
Jongin Lim; Sangdoo Yun; Seulki Park; Jin Young Choi; | code |
448 | Computing Wasserstein-p Distance Between Images With Linear Cost Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel algorithm to compute the Wasserstein-p distance between discrete measures by restricting the optimal transport (OT) problem on a subset. |
Yidong Chen; Chen Li; Zhonghua Lu; | code |
449 | Unsupervised Representation Learning for Binary Networks By Joint Classifier Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: But such networks are not readily deployable to edge devices. To accelerate deployment of models with the benefit of unsupervised representation learning to such resource limited devices for various downstream tasks, we propose a self-supervised learning method for binary networks that uses a moving target network. |
Dahyun Kim; Jonghyun Choi; | code |
450 | Large-Scale Video Panoptic Segmentation in The Wild: A Benchmark Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a new large-scale dataset for the video panoptic segmentation task, which aims to assign semantic classes and track identities to all pixels in a video. |
Jiaxu Miao; Xiaohan Wang; Yu Wu; Wei Li; Xu Zhang; Yunchao Wei; Yi Yang; | code |
451 | GrainSpace: A Large-Scale Dataset for Fine-Grained and Domain-Adaptive Recognition of Cereal Grains Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we formulate GAI as three ubiquitous computer vision tasks: fine-grained recognition, domain adaptation and out-of-distribution recognition. |
Lei Fan; Yiwen Ding; Dongdong Fan; Donglin Di; Maurice Pagnucco; Yang Song; | code |
452 | Learning Modal-Invariant and Temporal-Memory for Video-Based Visible-Infrared Person Re-Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we primarily study the video-based cross-modal person Re-ID method. |
Xinyu Lin; Jinxing Li; Zeyu Ma; Huafeng Li; Shuang Li; Kaixiong Xu; Guangming Lu; David Zhang; | code |
453 | MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Prior works either simply align the global features of an image with its associated class semantic vector or utilize unidirectional attention to learn the limited latent semantic representations, which could not effectively discover the intrinsic semantic knowledge (e.g., attribute semantics) between visual and attribute features. To solve the above dilemma, we propose a Mutually Semantic Distillation Network (MSDN), which progressively distills the intrinsic semantic representations between visual and attribute features for ZSL. |
Shiming Chen; Ziming Hong; Guo-Sen Xie; Wenhan Yang; Qinmu Peng; Kai Wang; Jian Zhao; Xinge You; | code |
454 | Oriented RepPoints for Aerial Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Unlike the mainstream approaches regressing the bounding box orientations, this paper proposes an effective adaptive points learning approach to aerial object detection by taking advantage of the adaptive points representation, which is able to capture the geometric information of the arbitrary-oriented instances. |
Wentong Li; Yijie Chen; Kaixuan Hu; Jianke Zhu; | code |
455 | Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Moreover, they train their model to distinguish positive visual-language pairs from negative ones randomly collected from other videos, ignoring the highly confusing video segments within the same video. In this paper, we propose Contrastive Proposal Learning (CPL) to overcome the above limitations. |
Minghang Zheng; Yanjie Huang; Qingchao Chen; Yuxin Peng; Yang Liu; | code |
456 | Low-Resource Adaptation for Personalized Co-Speech Gesture Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an approach, named DiffGAN, that efficiently personalizes co-speech gesture generation models of a high-resource source speaker to a target speaker with just 2 minutes of target training data. |
Chaitanya Ahuja; Dong Won Lee; Louis-Philippe Morency; | code |
457 | Task-Specific Inconsistency Alignment for Domain Adaptive Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To alleviate this problem of domain shift, conventional wisdom typically concentrates solely on reducing the discrepancy between the source and target domains via attached domain classifiers, yet ignoring the difficulty of such transferable features in coping with both classification and localization subtasks in object detection. To address this issue, in this paper, we propose Task-specific Inconsistency Alignment (TIA), by developing a new alignment mechanism in separate task spaces, improving the performance of the detector on both subtasks. |
Liang Zhao; Limin Wang; | code |
458 | MS2DG-Net: Progressive Correspondence Learning Via Multiple Sparse Semantics Dynamic Graph Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, most such works ignore similar sparse semantics information between two given images and cannot capture local topology among correspondences well. Therefore, to deal with the above problems, Multiple Sparse Semantics Dynamic Graph Network (MS2DG-Net) is proposed, in this paper, to predict probabilities of correspondences as inliers and recover camera poses. |
Luanyuan Dai; Yizhang Liu; Jiayi Ma; Lifang Wei; Taotao Lai; Changcai Yang; Riqing Chen; | code |
459 | Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a framework for modeling interactional communication in dyadic conversations: given multimodal inputs of a speaker, we autoregressively output multiple possibilities of corresponding listener motion. |
Evonne Ng; Hanbyul Joo; Liwen Hu; Hao Li; Trevor Darrell; Angjoo Kanazawa; Shiry Ginosar; | code |
460 | Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation From Monocular Video Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the existing methods mainly rely on recurrent or convolutional operation to model such temporal information, which limits the ability to capture non-local context relations of human motion. To address this problem, we propose a motion pose and shape network (MPS-Net) to effectively capture humans in motion to estimate accurate and temporally coherent 3D human pose and shape from a video. |
Wen-Li Wei; Jen-Chun Lin; Tyng-Luh Liu; Hong-Yuan Mark Liao; | code |
461 | MixFormer: End-to-End Tracking With Iterative Mixed Attention Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To simplify this pipeline and unify the process of feature extraction and target information integration, we present a compact tracking framework, termed as MixFormer, built upon transformers. |
Yutao Cui; Cheng Jiang; Limin Wang; Gangshan Wu; | code |
462 | Plenoxels: Radiance Fields Without Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce Plenoxels (plenoptic voxels), a system for photorealistic view synthesis. |
Sara Fridovich-Keil; Alex Yu; Matthew Tancik; Qinhong Chen; Benjamin Recht; Angjoo Kanazawa; | code |
463 | Selective-Supervised Contrastive Learning With Noisy Labels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To learn robust representations and handle noisy labels, we propose selective-supervised contrastive learning (Sel-CL) in this paper. |
Shikun Li; Xiaobo Xia; Shiming Ge; Tongliang Liu; | code |
464 | SimT: Handling Open-Set Noise for Domain Adaptive Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a simplex noise transition matrix (SimT) to model the mixed noise distributions in DA semantic segmentation and formulate the problem as estimation of SimT. |
Xiaoqing Guo; Jie Liu; Tongliang Liu; Yixuan Yuan; | code |
465 | Frequency-Driven Imperceptible Adversarial Attack on Semantic Similarity Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To circumvent the former problem, we propose a novel algorithm that attacks semantic similarity on feature representations. |
Cheng Luo; Qinliang Lin; Weicheng Xie; Bizhu Wu; Jinheng Xie; Linlin Shen; | code |
466 | Video Demoireing With Relation-Based Temporal Consistency Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Considering the increasing demands for capturing videos, we study how to remove such undesirable moire patterns in videos, namely video demoireing. To this end, we introduce the first hand-held video demoireing dataset with a dedicated data collection pipeline to ensure spatial and temporal alignments of captured data. |
Peng Dai; Xin Yu; Lan Ma; Baoheng Zhang; Jia Li; Wenbo Li; Jiajun Shen; Xiaojuan Qi; | code |
467 | Industrial Style Transfer With Large-Scale Geometric Warping and Content Preservation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel style transfer method to quickly create a new visual product with a nice appearance for industrial designers’ reference. |
Jinchao Yang; Fei Guo; Shuo Chen; Jun Li; Jian Yang; | code |
468 | Modeling Image Composition for Complex Scene Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a method that achieves state-of-the-art results on challenging (few-shot) layout-to-image generation tasks by accurately modeling textures, structures and relationships contained in a complex scene. |
Zuopeng Yang; Daqing Liu; Chaoyue Wang; Jie Yang; Dacheng Tao; | code |
469 | Decoupling Zero-Shot Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Based on the decoupling formulation, we propose a simple and effective zero-shot semantic segmentation model, called ZegFormer, which outperforms the previous methods on ZS3 standard benchmarks by large margins, e.g., 22 points on the PASCAL VOC and 3 points on the COCO-Stuff in terms of mIoU for unseen classes. |
Jian Ding; Nan Xue; Gui-Song Xia; Dengxin Dai; | code |
470 | Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a method that can recognize new objects and estimate their 3D pose in RGB images even under partial occlusions. |
Van Nguyen Nguyen; Yinlin Hu; Yang Xiao; Mathieu Salzmann; Vincent Lepetit; | code |
471 | Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting The Adversarial Transferability Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we treat the iterative ensemble attack as a stochastic gradient descent optimization process, in which the variance of the gradients on different models may lead to poor local optima. |
Yifeng Xiong; Jiadong Lin; Min Zhang; John E. Hopcroft; Kun He; | code |
472 | IFOR: Iterative Flow Minimization for Robotic Object Rearrangement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose IFOR, Iterative Flow Minimization for Robotic Object Rearrangement, an end-to-end method for the challenging problem of object rearrangement for unknown objects given an RGBD image of the original and final scenes. |
Ankit Goyal; Arsalan Mousavian; Chris Paxton; Yu-Wei Chao; Brian Okorn; Jia Deng; Dieter Fox; | code |
473 | Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a unified approach to visual navigation using a novel modular transfer learning model. |
Ziad Al-Halah; Santhosh Kumar Ramakrishnan; Kristen Grauman; | code |
474 | TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a mobile-friendly architecture named Token Pyramid Vision Transformer (TopFormer). |
Wenqiang Zhang; Zilong Huang; Guozhong Luo; Tao Chen; Xinggang Wang; Wenyu Liu; Gang Yu; Chunhua Shen; | code |
475 | The Wanderings of Odysseus in 3D Scenes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our goal is to populate digital environments, in which digital humans have diverse body shapes, move perpetually, and have plausible body-scene contact. |
Yan Zhang; Siyu Tang; | code |
476 | All-in-One Image Restoration for Unknown Corruption Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study a challenging problem in image restoration, namely, how to develop an all-in-one method that could recover images from a variety of unknown corruption types and levels. |
Boyun Li; Xiao Liu; Peng Hu; Zhongqin Wu; Jiancheng Lv; Xi Peng; | code |
477 | PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to explicitly integrate two matching priors in a single loss in order to learn local descriptors without supervision. |
Jérome Revaud; Vincent Leroy; Philippe Weinzaepfel; Boris Chidlovskii; | code |
478 | MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose MixSTE (Mixed Spatio-Temporal Encoder), which has a temporal transformer block to separately model the temporal motion of each joint and a spatial transformer block to learn inter-joint spatial correlation. |
Jinlu Zhang; Zhigang Tu; Jianyu Yang; Yujin Chen; Junsong Yuan; | code |
479 | RCP: Recurrent Closest Point for Point Cloud Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, these methods are limited by the fact that it is difficult to define a search window on point clouds because of the irregular data structure. In this paper, we avoid this irregularity by a simple yet effective method. |
Xiaodong Gu; Chengzhou Tang; Weihao Yuan; Zuozhuo Dai; Siyu Zhu; Ping Tan; | code |
480 | A Dual Weighting Label Assignment Scheme for Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we explore a new weighting paradigm, termed dual weighting (DW), to specify pos and neg weights separately. |
Shuai Li; Chenhang He; Ruihuang Li; Lei Zhang; | code |
481 | Hyperbolic Vision Transformers: Combining Improvements in Metric Learning Highlight: An emerging interest in learning hyperbolic data embeddings suggests that hyperbolic geometry can be beneficial for natural data. Following this line of work, we propose a new hyperbolic-based model for metric learning. |
Aleksandr Ermolov; Leyla Mirvakhabova; Valentin Khrulkov; Nicu Sebe; Ivan Oseledets; | code |
482 | Instance-Aware Dynamic Neural Network Quantization Highlight: In this paper, we propose to conduct low-bit quantization for each image individually, and develop a dynamic quantization scheme for exploring their optimal bit-widths. |
Zhenhua Liu; Yunhe Wang; Kai Han; Siwei Ma; Wen Gao; | code |
483 | Exploring Effective Data for Surrogate Training Towards Black-Box Attack Highlight: To this end, we propose a triple-player framework by introducing a discriminator into the traditional data-free framework. |
Xuxiang Sun; Gong Cheng; Hongda Li; Lei Pei; Junwei Han; | code |
484 | JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection Highlight: In this paper, we introduce JRDB-Act, as an extension of the existing JRDB, which is captured by a social mobile manipulator and reflects a real distribution of human daily-life actions in a university campus environment. |
Mahsa Ehsanpour; Fatemeh Saleh; Silvio Savarese; Ian Reid; Hamid Rezatofighi; | code |
485 | Investigating Top-k White-Box and Transferable Black-Box Attack Highlight: To this end, we propose a new normalized CE loss that guides the logit to be updated in the direction of implicitly maximizing its rank distance from the ground-truth class. |
Chaoning Zhang; Philipp Benz; Adil Karjauv; Jae Won Cho; Kang Zhang; In So Kweon; | code |
486 | Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition Highlight: Although previous RGB-D-based motion recognition methods have achieved promising performance through the tightly coupled multi-modal spatiotemporal representation, they still suffer from (i) optimization difficulty under small data setting due to the tightly spatiotemporal-entangled modeling; (ii) information redundancy as it usually contains lots of marginal information that is weakly relevant to classification; and (iii) low interaction between multi-modal spatiotemporal information caused by insufficient late fusion. To alleviate these drawbacks, we propose to decouple and recouple spatiotemporal representation for RGB-D-based motion recognition. |
Benjia Zhou; Pichao Wang; Jun Wan; Yanyan Liang; Fan Wang; Du Zhang; Zhen Lei; Hao Li; Rong Jin; | code |
487 | A Self-Supervised Descriptor for Image Copy Detection Highlight: We introduce SSCD, a model that builds on a recent self-supervised contrastive training objective. |
Ed Pizzi; Sreya Dutta Roy; Sugosh Nagavara Ravindra; Priya Goyal; Matthijs Douze; | code |
488 | Negative-Aware Attention Framework for Image-Text Matching Highlight: We thereby propose a novel Negative-Aware Attention Framework (NAAF), which explicitly exploits both the positive effect of matched fragments and the negative effect of mismatched fragments to jointly infer image-text similarity. |
Kun Zhang; Zhendong Mao; Quan Wang; Yongdong Zhang; | code |
489 | An Image Patch Is A Wave: Phase-Aware Vision MLP Highlight: To dynamically aggregate tokens, we propose to represent each token as a wave function with two parts, amplitude and phase. |
Yehui Tang; Kai Han; Jianyuan Guo; Chang Xu; Yanxi Li; Chao Xu; Yunhe Wang; | code |
490 | Shunted Self-Attention Via Multi-Scale Token Aggregation Highlight: Such a constraint inevitably limits the ability of each self-attention layer in capturing multi-scale features, thereby leading to performance degradation in handling images with multiple objects of different scales. To address this issue, we propose a novel and generic strategy, termed shunted self-attention (SSA), that allows ViTs to model the attentions at hybrid scales per attention layer. |
Sucheng Ren; Daquan Zhou; Shengfeng He; Jiashi Feng; Xinchao Wang; | code |
491 | Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression Highlight: Specifically, a multivariate Gaussian mixture is proposed with means and covariances to be estimated. Then, a novel probabilistic vector quantization is utilized to effectively approximate means, and remaining covariances are further induced to a unified mixture and solved by cascaded estimation without context models involved. |
Xiaosu Zhu; Jingkuan Song; Lianli Gao; Feng Zheng; Heng Tao Shen; | code |
492 | Recurrent Variational Network: A Deep Learning Inverse Problem Solver Applied to The Task of Accelerated MRI Reconstruction Highlight: In this work we present a novel Deep Learning-based Inverse Problem solver applied to the task of Accelerated MRI Reconstruction, called the Recurrent Variational Network (RecurrentVarNet), by exploiting the properties of Convolutional Recurrent Neural Networks and unrolled algorithms for solving Inverse Problems. |
George Yiasemis; Jan-Jakob Sonke; Clarisa Sánchez; Jonas Teuwen; | code |
493 | Surpassing The Human Accuracy: Detecting Gallbladder Cancer From USG Images With Curriculum Learning Highlight: We explore the potential of CNN-based models for gallbladder cancer (GBC) detection from ultrasound (USG) images as no prior study is known. |
Soumen Basu; Mayank Gupta; Pratyaksha Rana; Pankaj Gupta; Chetan Arora; | code |
494 | Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond Highlight: Accordingly, we propose our defense strategy, namely Appearance and Structure Aware Robust Graph Matching (ASAR-GM). |
Qibing Ren; Qingquan Bao; Runzhong Wang; Junchi Yan; | code |
495 | TrackFormer: Multi-Object Tracking With Transformers Highlight: The challenging task of multi-object tracking (MOT) requires simultaneous reasoning about track initialization, identity, and spatio-temporal trajectories. We formulate this task as a frame-to-frame set prediction problem and introduce TrackFormer, an end-to-end trainable MOT approach based on an encoder-decoder Transformer architecture. |
Tim Meinhardt; Alexander Kirillov; Laura Leal-Taixé; Christoph Feichtenhofer; | code |
496 | 3D Shape Reconstruction From 2D Images With Disentangled Attribute Flow Highlight: Since the semantic attributes of a single image are usually implicit and entangled with each other, it is still challenging to reconstruct 3D shape with detailed semantic structures represented by the input image. To address this problem, we propose 3DAttriFlow to disentangle and extract semantic attributes through different semantic levels in the input images. |
Xin Wen; Junsheng Zhou; Yu-Shen Liu; Hua Su; Zhen Dong; Zhizhong Han; | code |
497 | Feature Statistics Mixing Regularization for Generative Adversarial Networks Highlight: As a remedy, we propose feature statistics mixing regularization (FSMR) that encourages the discriminator’s prediction to be invariant to the styles of input images. |
Junho Kim; Yunjey Choi; Youngjung Uh; | code |
498 | OpenTAL: Towards Open Set Temporal Action Localization Highlight: In this paper, we, for the first time, step toward the Open Set TAL (OSTAL) problem and propose a general framework OpenTAL based on Evidential Deep Learning (EDL). |
Wentao Bao; Qi Yu; Yu Kong; | code |
499 | Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection Highlight: This work addresses the generalizable deepfake detection from a simple principle: a generalizable representation should be sensitive to diverse types of forgeries. Following this principle, we propose to enrich the "diversity" of forgeries by synthesizing augmented forgeries with a pool of forgery configurations and strengthen the "sensitivity" to the forgeries by enforcing the model to predict the forgery configurations. |
Liang Chen; Yong Zhang; Yibing Song; Lingqiao Liu; Jue Wang; | code |
500 | Ego4D: Around The World in 3,000 Hours of Egocentric Video Highlight: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. |
Kristen Grauman; Andrew Westbury; Eugene Byrne; Zachary Chavis; Antonino Furnari; Rohit Girdhar; Jackson Hamburger; Hao Jiang; Miao Liu; Xingyu Liu; Miguel Martin; Tushar Nagarajan; Ilija Radosavovic; Santhosh Kumar Ramakrishnan; Fiona Ryan; Jayant Sharma; Michael Wray; Mengmeng Xu; Eric Zhongcong Xu; Chen Zhao; Siddhant Bansal; Dhruv Batra; Vincent Cartillier; Sean Crane; Tien Do; Morrie Doulaty; Akshay Erapalli; Christoph Feichtenhofer; Adriano Fragomeni; Qichen Fu; Abrham Gebreselasie; Cristina González; James Hillis; Xuhua Huang; Yifei Huang; Wenqi Jia; Weslie Khoo; Jáchym Kolář; Satwik Kottur; Anurag Kumar; Federico Landini; Chao Li; Yanghao Li; Zhenqiang Li; Karttikeya Mangalam; Raghava Modhugu; Jonathan Munro; Tullie Murrell; Takumi Nishiyasu; Will Price; Paola Ruiz; Merey Ramazanova; Leda Sari; Kiran Somasundaram; Audrey Southerland; Yusuke Sugano; Ruijie Tao; Minh Vo; Yuchen Wang; Xindi Wu; Takuma Yagi; Ziwei Zhao; Yunyi Zhu; Pablo Arbeláez; David Crandall; Dima Damen; Giovanni Maria Farinella; Christian Fuegen; Bernard Ghanem; Vamsi Krishna Ithapu; C. V. Jawahar; Hanbyul Joo; Kris Kitani; Haizhou Li; Richard Newcombe; Aude Oliva; Hyun Soo Park; James M. Rehg; Yoichi Sato; Jianbo Shi; Mike Zheng Shou; Antonio Torralba; Lorenzo Torresani; Mingfei Yan; Jitendra Malik; | code |
501 | Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis Highlight: Vision Transformers (ViTs) have shown great performance in self-supervised learning of global and local representations that can be transferred to downstream applications. Inspired by these results, we introduce a novel self-supervised learning framework with tailored proxy tasks for medical image analysis. |
Yucheng Tang; Dong Yang; Wenqi Li; Holger R. Roth; Bennett Landman; Daguang Xu; Vishwesh Nath; Ali Hatamizadeh; | code |
502 | Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data Highlight: We propose a method, W-OoD, for utilizing the hard OoDs. |
Jungbeom Lee; Seong Joon Oh; Sangdoo Yun; Junsuk Choe; Eunji Kim; Sungroh Yoon; | code |
503 | DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From A Single Image Highlight: We present DAD-3DHeads, a dense and diverse large-scale dataset, and a robust model for 3D Dense Head Alignment in-the-wild. |
Tetiana Martyniuk; Orest Kupyn; Yana Kurlyak; Igor Krashenyi; Jiří Matas; Viktoriia Sharmanska; | code |
504 | Reconstructing Surfaces for Sparse Point Clouds With On-Surface Priors Highlight: Our key idea is to infer signed distances by pushing both the query projections to be on the surface and the projection distance to be the minimum. |
Baorui Ma; Yu-Shen Liu; Zhizhong Han; | code |
505 | vCLIMB: A Novel Video Class Incremental Learning Benchmark Highlight: We introduce vCLIMB, a novel video continual learning benchmark. |
Andrés Villa; Kumail Alhamoud; Victor Escorcia; Fabian Caba; Juan León Alcázar; Bernard Ghanem; | code |
506 | Robust Equivariant Imaging: A Fully Unsupervised Framework for Learning To Image From Noisy and Partial Measurements Highlight: In this paper, we propose a Robust Equivariant Imaging (REI) framework which can learn to image from noisy partial measurements alone. |
Dongdong Chen; Julián Tachella; Mike E. Davies; | code |
507 | ST++: Make Self-Training Work Better for Semi-Supervised Semantic Segmentation Highlight: Inspired by the impressive results, we thoroughly investigate strong data augmentations (SDA) and provide some empirical analysis. |
Lihe Yang; Wei Zhuo; Lei Qi; Yinghuan Shi; Yang Gao; | code |
508 | Interacting Attention Graph for Single Image Two-Hand Reconstruction Highlight: In this paper, we present Interacting Attention Graph Hand (IntagHand), the first graph convolution based network that reconstructs two interacting hands from a single RGB image. |
Mengcheng Li; Liang An; Hongwen Zhang; Lianpeng Wu; Feng Chen; Tao Yu; Yebin Liu; | code |
509 | Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task Highlight: To accelerate the progress of roadside perception, we present the first high-diversity challenging Roadside Perception 3D dataset, Rope3D, from a novel view. |
Xiaoqing Ye; Mao Shu; Hanyu Li; Yifeng Shi; Yingying Li; Guangjie Wang; Xiao Tan; Errui Ding; | code |
510 | Cross-Image Relational Knowledge Distillation for Semantic Segmentation Highlight: This paper proposes a novel Cross-Image Relational KD (CIRKD), which focuses on transferring structured pixel-to-pixel and pixel-to-region relations among the whole images. |
Chuanguang Yang; Helong Zhou; Zhulin An; Xue Jiang; Yongjun Xu; Qian Zhang; | code |
511 | Towards Layer-Wise Image Vectorization Highlight: In this work, we propose Layer-wise Image Vectorization, namely LIVE, to convert raster images to SVGs and simultaneously maintain their image topology. |
Xu Ma; Yuqian Zhou; Xingqian Xu; Bin Sun; Valerii Filev; Nikita Orlov; Yun Fu; Humphrey Shi; | code |
512 | Scenic: A JAX Library for Computer Vision Research and Beyond Highlight: Scenic is an open-source (https://github.com/google-research/scenic) JAX library with a focus on transformer-based models for computer vision research and beyond. |
Mostafa Dehghani; Alexey Gritsenko; Anurag Arnab; Matthias Minderer; Yi Tay; | code |
513 | Real-Time Object Detection for Streaming Perception Highlight: While past works ignore the inevitable changes in the environment after processing, streaming perception is proposed to jointly evaluate the latency and accuracy into a single metric for video online perception. In this paper, instead of searching trade-offs between accuracy and speed like previous works, we point out that endowing real-time models with the ability to predict the future is the key to dealing with this problem. |
Jinrong Yang; Songtao Liu; Zeming Li; Xiaoping Li; Jian Sun; | code |
514 | VisualHow: Multimodal Problem Solving Highlight: With an overarching goal of developing intelligent systems to assist humans in various daily activities, we propose VisualHow, a free-form and open-ended research that focuses on understanding a real-life problem and deriving its solution by incorporating key components across multiple modalities. |
Jinhui Yang; Xianyu Chen; Ming Jiang; Shi Chen; Louis Wang; Qi Zhao; | code |
515 | Spatial Commonsense Graph for Object Localisation in Partial Scenes Highlight: We solve object localisation in partial scenes, a new problem of estimating the unknown position of an object (e.g. where is the bag?) |
Francesco Giuliari; Geri Skenderi; Marco Cristani; Yiming Wang; Alessio Del Bue; | code |
516 | OSSGAN: Open-Set Semi-Supervised Image Generation Highlight: We introduce a challenging training scheme of conditional GANs, called open-set semi-supervised image generation, where the training dataset consists of two parts: (i) labeled data and (ii) unlabeled data with samples belonging to one of the labeled data classes, namely, a closed-set, and samples not belonging to any of the labeled data classes, namely, an open-set. |
Kai Katsumata; Duc Minh Vo; Hideki Nakayama; | code |
517 | Bi-Level Alignment for Cross-Domain Crowd Counting Highlight: In this work, we aim to develop a new adversarial learning based method, which is simple and efficient to apply. |
Shenjian Gong; Shanshan Zhang; Jian Yang; Dengxin Dai; Bernt Schiele; | code |
518 | ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation Highlight: Video frame interpolation (VFI) can be extremely challenging, particularly in sequences containing large motions, occlusions or dynamic textures, where existing approaches fail to offer perceptually robust interpolation performance. In this context, we present a novel deep learning based VFI method, ST-MFNet, based on a Spatio-Temporal Multi-Flow architecture. |
Duolikun Danier; Fan Zhang; David Bull; | code |
519 | Efficient Multi-View Stereo By Iterative Dynamic Cost Volume Highlight: In this paper, we propose a novel iterative dynamic cost volume for multi-view stereo. |
Shaoqian Wang; Bo Li; Yuchao Dai; | code |
520 | TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing Highlight: In this study, we highlight the importance of interaction in a dual-space GAN for more controllable editing. |
Yanbo Xu; Yueqin Yin; Liming Jiang; Qianyi Wu; Chengyao Zheng; Chen Change Loy; Bo Dai; Wayne Wu; | code |
521 | Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework Highlight: In this paper, we present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes. |
Shu Zhang; Ran Xu; Caiming Xiong; Chetan Ramaiah; | code |
522 | SGTR: End-to-End Scene Graph Generation With Transformer Highlight: In this work, we propose a novel SGG method to address the aforementioned issues, formulating the task as a bipartite graph construction problem. |
Rongjie Li; Songyang Zhang; Xuming He; | code |
523 | Decoupled Knowledge Distillation Highlight: More importantly, we reveal that the classical KD loss is a coupled formulation, which (1) suppresses the effectiveness of NCKD and (2) limits the flexibility to balance these two parts. To address these issues, we present Decoupled Knowledge Distillation (DKD), enabling TCKD and NCKD to play their roles more efficiently and flexibly. |
Borui Zhao; Quan Cui; Renjie Song; Yiyu Qiu; Jiajun Liang; | code |
524 | DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection Highlight: In this paper, we propose two novel techniques: InverseAug that inverts geometric-related augmentations, e.g., rotation, to enable accurate geometric alignment between lidar points and image pixels, and LearnableAlign that leverages cross-attention to dynamically capture the correlations between image and lidar features during fusion. |
Yingwei Li; Adams Wei Yu; Tianjian Meng; Ben Caine; Jiquan Ngiam; Daiyi Peng; Junyang Shen; Yifeng Lu; Denny Zhou; Quoc V. Le; Alan Yuille; Mingxing Tan; | code |
525 | Reusing The Task-Specific Classifier As A Discriminator: Discriminator-Free Adversarial Domain Adaptation Highlight: However, most of these methods fail to effectively leverage the predicted discriminative information, and thus cause mode collapse for the generator. In this work, we address this problem from a different perspective and design a simple yet effective adversarial paradigm in the form of a discriminator-free adversarial learning network (DALN), wherein the category classifier is reused as a discriminator, which achieves explicit domain alignment and category distinguishment through a unified objective, enabling the DALN to leverage the predicted discriminative information for sufficient feature alignment. |
Lin Chen; Huaian Chen; Zhixiang Wei; Xin Jin; Xiao Tan; Yi Jin; Enhong Chen; | code |
526 | Show Me What and Tell Me How: Video Synthesis Via Multimodal Conditioning Highlight: This work presents a multimodal video generation framework that benefits from text and images provided jointly or separately. |
Ligong Han; Jian Ren; Hsin-Ying Lee; Francesco Barbieri; Kyle Olszewski; Shervin Minaee; Dimitris Metaxas; Sergey Tulyakov; | code |
527 | SIMBAR: Single Image-Based Scene Relighting for Effective Data Augmentation for Automated Driving Vision Tasks Highlight: This paper presents a novel image-based relighting pipeline, SIMBAR, that can work with a single image as input. |
Xianling Zhang; Nathan Tseng; Ameerah Syed; Rohan Bhasin; Nikita Jaipuria; | code |
528 | Multi-Label Classification With Partial Annotations Using Class-Aware Selective Loss Highlight: We propose to estimate the class distribution using a dedicated temporary model, and we show its improved efficiency over a naive estimation computed using the dataset’s partial annotations. |
Emanuel Ben-Baruch; Tal Ridnik; Itamar Friedman; Avi Ben-Cohen; Nadav Zamir; Asaf Noy; Lihi Zelnik-Manor; | code |
529 | CADTransformer: Panoptic Symbol Spotting Transformer for CAD Drawings Highlight: Existing methods, based on convolutional neural networks (CNNs) and/or graph neural networks (GNNs), regress instance bounding boxes in the pixel domain and then convert the predictions into symbols. In this paper, we present a novel framework named CADTransformer, that can painlessly modify existing vision transformer (ViT) backbones to tackle the above limitations for the panoptic symbol spotting task. |
Zhiwen Fan; Tianlong Chen; Peihao Wang; Zhangyang Wang; | code |
530 | IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization Highlight: Learning to synthesize data has emerged as a promising direction in zero-shot quantization (ZSQ), which represents neural networks with low-bit integers without accessing any of the real data. In this paper, we observe an interesting phenomenon of intra-class heterogeneity in real data and show that existing methods fail to retain this property in their synthetic images, which causes a limited performance increase. |
Yunshan Zhong; Mingbao Lin; Gongrui Nan; Jianzhuang Liu; Baochang Zhang; Yonghong Tian; Rongrong Ji; | code |
531 | I M Avatar: Implicit Morphable Head Avatars From Videos Highlight: Neural volumetric representations approach photorealism but are hard to animate and do not generalize well to unseen expressions. To tackle this problem, we propose IMavatar (Implicit Morphable avatar), a novel method for learning implicit head avatars from monocular videos. |
Yufeng Zheng; Victoria Fernández Abrevaya; Marcel C. Bühler; Xu Chen; Michael J. Black; Otmar Hilliges; | code |
532 | Weakly-Supervised Metric Learning With Cross-Module Communications for The Classification of Anterior Chamber Angle Images Highlight: In this paper, we propose a novel end-to-end framework GCNet for automated Glaucoma Classification based on ACA images or other Glaucoma-related medical images. |
Jingqi Huang; Yue Ning; Dong Nie; Linan Guan; Xiping Jia; | code |
533 | A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-Resolution Highlight: This is because the current CNN-based methods adopt locality-based operations, which are not effective in dealing with the variation caused by deformations. In this paper, we propose a CNN based Text ATTention network (TATT) to address this problem. |
Jianqi Ma; Zhetong Liang; Lei Zhang; | code |
534 | Multi-Modal Dynamic Graph Transformer for Visual Grounding Highlight: Their performance depends on the density and quality of the candidate regions and is capped by the inability to optimize the located regions continuously. To address these issues, we propose to remodel VG into a progressively optimized visual semantic alignment process. |
Sijia Chen; Baochun Li; | code |
535 | Geometric Transformer for Fast and Robust Point Cloud Registration Highlight: We propose Geometric Transformer to learn geometric features for robust superpoint matching. |
Zheng Qin; Hao Yu; Changjian Wang; Yulan Guo; Yuxing Peng; Kai Xu; | code |
536 | UMT: Unified Multi-Modal Transformers for Joint Video Moment Retrieval and Highlight Detection Highlight: Nevertheless, jointly conducting moment retrieval and highlight detection is an emerging research topic, even though its component problems and some related tasks have already been studied for a while. In this paper, we present the first unified framework, named Unified Multi-modal Transformers (UMT), capable of realizing such joint optimization while also being easily degenerated to solve the individual problems. |
Ye Liu; Siyuan Li; Yang Wu; Chang-Wen Chen; Ying Shan; Xiaohu Qie; | code |
537 | Demystifying The Neural Tangent Kernel From A Practical Perspective: Can It Be Trusted for Neural Architecture Search Without Training? Highlight: In this work, we revisit several at-initialization metrics that can be derived from the NTK and reveal their key shortcomings. |
Jisoo Mok; Byunggook Na; Ji-Hoon Kim; Dongyoon Han; Sungroh Yoon; | code |
538 | The Devil Is in The Details: Window-Based Attention for Image Compression Highlight: In this paper, we first extensively study the effects of multiple kinds of attention mechanisms for local feature learning, then introduce a more straightforward yet effective window-based local attention block. |
Renjie Zou; Chunfeng Song; Zhaoxiang Zhang; | code |
539 | DiLiGenT10²: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation Highlight: This paper presents a new real-world photometric stereo dataset with "ground truth" normal maps, which is 10 times larger than the widely adopted one. |
Jieji Ren; Feishi Wang; Jiahao Zhang; Qian Zheng; Mingjun Ren; Boxin Shi; | code |
540 | PolyWorld: Polygonal Building Extraction With Graph Neural Networks in Satellite Images Highlight: This paper introduces PolyWorld, a neural network that directly extracts building vertices from an image and connects them correctly to create precise polygons. |
Stefano Zorzi; Shabab Bazrafkan; Stefan Habenschuss; Friedrich Fraundorfer; | code |
541 | Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation Highlight: In this paper, we study efficient architecture design for real-time multi-person pose estimation on edge. |
Yihan Wang; Muyang Li; Han Cai; Wei-Ming Chen; Song Han; | code |
542 | Spatio-Temporal Relation Modeling for Few-Shot Action Recognition Highlight: We propose a novel few-shot action recognition framework, STRM, which enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations. |
Anirudh Thatipelli; Sanath Narayan; Salman Khan; Rao Muhammad Anwer; Fahad Shahbaz Khan; Bernard Ghanem; | code |
543 | Multi-Person Extreme Motion Prediction Highlight: While this problem has recently received increasing attention, it has mostly been tackled for single humans in isolation. In this paper, we explore this problem when dealing with humans performing collaborative tasks: we seek to predict the future motion of two interacting persons given two sequences of their past skeletons. |
Wen Guo; Xiaoyu Bie; Xavier Alameda-Pineda; Francesc Moreno-Noguer; | code |
544 | β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search Highlight: However, they suffer from two main issues, the weak robustness to the performance collapse and the poor generalization ability of the searched architectures. To solve these two problems, a simple-but-efficient regularization method, termed Beta-Decay, is proposed to regularize the DARTS-based NAS searching process. |
Peng Ye; Baopu Li; Yikang Li; Tao Chen; Jiayuan Fan; Wanli Ouyang; | code |
545 | CMT: Convolutional Neural Networks Meet Vision Transformers Highlight: However, there are still gaps in both performance and computational cost between transformers and existing convolutional neural networks (CNNs). In this paper, we aim to address this issue and develop a network that can outperform not only the canonical transformers, but also the high-performance convolutional models. |
Jianyuan Guo; Kai Han; Han Wu; Yehui Tang; Xinghao Chen; Yunhe Wang; Chang Xu; | code |
546 | KNN Local Attention for Image Restoration Highlight: However, by focusing only on adjacent positions, the local attention suffers from an insufficient receptive field for image restoration. In this paper, we propose a new attention mechanism for image restoration, called k-NN Image Transformer (KiT), that rectifies the above-mentioned limitations. |
Hunsang Lee; Hyesong Choi; Kwanghoon Sohn; Dongbo Min; | code |
547 | Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered By Pre-Trained Vision-Language Model Highlight: We propose a novel framework, i.e., Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation that requires little manual annotation while being applicable to a wide variety of manipulations. |
Zipeng Xu; Tianwei Lin; Hao Tang; Fu Li; Dongliang He; Nicu Sebe; Radu Timofte; Luc Van Gool; Errui Ding; | code |
548 | TransMix: Attend To Mix for Vision Transformers Highlight: This may lead to a strange phenomenon that sometimes there is no valid object in the mixed image due to the random process in augmentation but there is still response in the label space. To bridge such gap between the input and label spaces, we propose TransMix, which mixes labels based on the attention maps of Vision Transformers. |
Jie-Neng Chen; Shuyang Sun; Ju He; Philip H.S. Torr; Alan Yuille; Song Bai; | code |
549 | Inertia-Guided Flow Completion and Style Fusion for Video Inpainting Highlight: Nevertheless, the existing flow-guided cross-frame warping methods fail to consider the lighting and sharpness variation across video frames, which leads to spatial incoherence after warping from other frames. To alleviate this problem, we propose the Adaptive Style Fusion Network (ASFN), which utilizes the style information extracted from the valid regions to guide the gradient refinement in the warped regions. |
Kaidong Zhang; Jingjing Fu; Dong Liu; | code |
550 | Long-Tailed Visual Recognition Via Gaussian Clouded Logit Adjustment Highlight: It is unfavorable for training on balanced data, but can be utilized to adjust the validity of the samples in long-tailed data, thereby solving the distorted embedding space of long-tailed problems. To this end, this paper proposes the Gaussian clouded logit adjustment by Gaussian perturbation of different class logits with varied amplitude. |
Mengke Li; Yiu-ming Cheung; Yang Lu; | code |
551 | Image Animation With Perturbed Masks Highlight: We present a novel approach for image-animation of a source image by a driving video, both depicting the same type of object. |
Yoav Shalev; Lior Wolf; | code |
552 | Domain Generalization Via Shuffled Style Assembly for Face Anti-Spoofing Highlight: In this work, we separate the complete representation into content and style ones. |
Zhuo Wang; Zezheng Wang; Zitong Yu; Weihong Deng; Jiahong Li; Tingting Gao; Zhongyuan Wang; | code |
553 | OcclusionFusion: Occlusion-Aware Motion Estimation for Real-Time Dynamic 3D Reconstruction Highlight: This problem is even more severe for single-view-based systems due to strong occlusions. Based on these observations, we propose OcclusionFusion, a novel method to calculate occlusion-aware 3D motion to guide the reconstruction. |
Wenbin Lin; Chengwei Zheng; Jun-Hai Yong; Feng Xu; | code |
554 | MonoScene: Monocular 3D Semantic Scene Completion Highlight: Along with architectural contributions, we introduce novel global scene and local frustums losses. |
Anh-Quan Cao; Raoul de Charette; | code |
555 | AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition Highlight: This work reformulates the training of AdaFocus as a simple one-stage algorithm by introducing a differentiable interpolation-based patch selection operation, enabling efficient end-to-end optimization. |
Yulin Wang; Yang Yue; Yuanze Lin; Haojun Jiang; Zihang Lai; Victor Kulikov; Nikita Orlov; Humphrey Shi; Gao Huang; | code |
556 | Continuous Scene Representations for Embodied AI Highlight: We propose Continuous Scene Representations (CSR), a scene representation constructed by an embodied agent navigating within a space, where objects and their relationships are modeled by continuous valued embeddings. |
Samir Yitzhak Gadre; Kiana Ehsani; Shuran Song; Roozbeh Mottaghi; | code |
557 | Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds Highlight: In this work, beyond 3D Siamese tracking, we introduce a motion-centric paradigm to handle 3D SOT from a new perspective. |
Chaoda Zheng; Xu Yan; Haiming Zhang; Baoyuan Wang; Shenghui Cheng; Shuguang Cui; Zhen Li; | code |
558 | Non-Probability Sampling Network for Stochastic Human Trajectory Prediction Highlight: In this paper, we analyze the problem by reconstructing and comparing probabilistic distributions from prediction samples and socially-acceptable paths, respectively. |
Inhwan Bae; Jin-Hwi Park; Hae-Gon Jeon; | code |
559 | ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning Highlight: So we propose ResSFL, a Split Federated Learning Framework that is designed to be MI-resistant during training. |
Jingtao Li; Adnan Siraj Rakin; Xing Chen; Zhezhi He; Deliang Fan; Chaitali Chakrabarti; | code |
560 | Human-Aware Object Placement for Visual Environment Reconstruction Highlight: Our key idea is that, as a person moves through a scene and interacts with it, we accumulate HSIs across multiple input images, and use these in optimizing the 3D scene to reconstruct a consistent, physically plausible, 3D scene layout. |
Hongwei Yi; Chun-Hao P. Huang; Dimitrios Tzionas; Muhammed Kocabas; Mohamed Hassan; Siyu Tang; Justus Thies; Michael J. Black; | code |
561 | X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval Highlight: Common text-agnostic aggregations schemes include mean-pooling or self-attention over the frames, but these are likely to encode misleading visual information not described in the given text. To address this, we propose a cross-modal attention model called X-Pool that reasons between a text and the frames of a video. |
Satya Krishna Gorti; Noël Vouitsis; Junwei Ma; Keyvan Golestan; Maksims Volkovs; Animesh Garg; Guangwei Yu; | code |
562 | RAMA: A Rapid Multicut Algorithm on GPU Highlight: We propose a highly parallel primal-dual algorithm for the multicut (a.k.a. correlation clustering) problem, a classical graph clustering problem widely used in machine learning and computer vision. |
Ahmed Abbas; Paul Swoboda; | code |
563 | Adversarial Parametric Pose Prior Highlight: We propose learning a prior that restricts the SMPL parameters to values that produce realistic poses via adversarial training. |
Andrey Davydov; Anastasia Remizova; Victor Constantin; Sina Honari; Mathieu Salzmann; Pascal Fua; | code |
564 | Mask Transfiner for High-Quality Instance Segmentation Highlight: In this paper, we present Mask Transfiner for high-quality and efficient instance segmentation. |
Lei Ke; Martin Danelljan; Xia Li; Yu-Wing Tai; Chi-Keung Tang; Fisher Yu; | code |
565 | It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning By Contrastive Data Collection Highlight: We show that collecting new data, in the same way, is not effective in mitigating this emotional bias. To remedy this problem, we propose a contrastive data collection approach to balance ArtEmis with a new complementary dataset such that a pair of similar images have contrasting emotions (one positive and one negative). |
Youssef Mohamed; Faizan Farooq Khan; Kilichbek Haydarov; Mohamed Elhoseiny; | code |
566 | DiRA: Discriminative, Restorative, and Adversarial Learning for Self-Supervised Medical Image Analysis Highlight: Existing efforts, however, omit their synergistic effects on each other in a ternary setup, which, we envision, can significantly benefit deep semantic representation learning. To realize this vision, we have developed DiRA, the first framework that unites discriminative, restorative, and adversarial learning in a unified manner to collaboratively glean complementary visual information from unlabeled medical images for fine-grained semantic representation learning. |
Fatemeh Haghighi; Mohammad Reza Hosseinzadeh Taher; Michael B. Gotway; Jianming Liang; | code |
567 | Event-Based Video Reconstruction Via Potential-Assisted Spiking Neural Network Highlight: We propose a novel Event-based Video reconstruction framework based on a fully Spiking Neural Network (EVSNN), which utilizes Leaky-Integrate-and-Fire (LIF) neuron and Membrane Potential (MP) neuron. |
Lin Zhu; Xiao Wang; Yi Chang; Jianing Li; Tiejun Huang; Yonghong Tian; | code |
568 | YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset Highlight: Many video understanding tasks require analyzing multi-shot videos, but existing datasets for video object segmentation (VOS) only consider single-shot videos. To address this challenge, we collected a new dataset, YouMVOS, of 200 popular YouTube videos spanning ten genres, where each video is on average five minutes long and has 75 shots. |
Donglai Wei; Siddhant Kharbanda; Sarthak Arora; Roshan Roy; Nishant Jain; Akash Palrecha; Tanav Shah; Shray Mathur; Ritik Mathur; Abhijay Kemkar; Anirudh Chakravarthy; Zudi Lin; Won-Dong Jang; Yansong Tang; Song Bai; James Tompkin; Philip H.S. Torr; Hanspeter Pfister; | code |
569 | DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation Highlight: As the influence of recent network architectures has not been systematically studied, we first benchmark different network architectures for UDA and newly reveal the potential of Transformers for UDA semantic segmentation. Based on the findings, we propose a novel UDA method, DAFormer. |
Lukas Hoyer; Dengxin Dai; Luc Van Gool; | code |
570 | Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification Highlight: In this paper, we propose a deep Brownian Distance Covariance (DeepBDC) method for few-shot classification. |
Jiangtao Xie; Fei Long; Jiaming Lv; Qilong Wang; Peihua Li; | code |
571 | Self-Supervised Video Transformer Highlight: In this paper, we propose self-supervised training for video transformers using unlabeled video data. |
Kanchana Ranasinghe; Muzammal Naseer; Salman Khan; Fahad Shahbaz Khan; Michael S. Ryoo; | code |
572 | AutoRF: Learning 3D Object Radiance Fields From Single View Observations Highlight: We introduce AutoRF – a new approach for learning neural 3D object representations where each object in the training set is observed by only a single view. |
Norman Müller; Andrea Simonelli; Lorenzo Porzi; Samuel Rota Bulò; Matthias Nießner; Peter Kontschieder; | code |
573 | Coopernaut: End-to-End Driving With Cooperative Perception for Networked Vehicles Highlight: We introduce COOPERNAUT, an end-to-end learning model that uses cross-vehicle perception for vision-based cooperative driving. |
Jiaxun Cui; Hang Qiu; Dian Chen; Peter Stone; Yuke Zhu; | code |
574 | TubeR: Tubelet Transformer for Video Action Detection Highlight: We propose TubeR: a simple solution for spatio-temporal video action detection. |
Jiaojiao Zhao; Yanyi Zhang; Xinyu Li; Hao Chen; Bing Shuai; Mingze Xu; Chunhui Liu; Kaustav Kundu; Yuanjun Xiong; Davide Modolo; Ivan Marsic; Cees G. M. Snoek; Joseph Tighe; | code |
575 | MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection Highlight: Especially when extending SSL to semi-supervised object detection (SSOD), many strong augmentation methodologies related to image geometry and interpolation-regularization are hard to utilize since they possibly hurt the location information of the bounding box in the object detection task. To address this, we introduce a simple yet effective data augmentation method, Mix/UnMix (MUM), which unmixes feature tiles for the mixed image tiles for the SSOD framework. |
JongMok Kim; JooYoung Jang; Seunghyeon Seo; Jisoo Jeong; Jongkeun Na; Nojun Kwak; | code |
576 | Learning Non-Target Knowledge for Few-Shot Semantic Segmentation Highlight: Existing studies in few-shot semantic segmentation focus only on mining the target object information; however, they often struggle to distinguish ambiguous regions, especially in non-target regions, which include background (BG) and Distracting Objects (DOs). To alleviate this problem, we propose a novel framework, namely Non-Target Region Eliminating (NTRE) network, to explicitly mine and eliminate BG and DO regions in the query. |
Yuanwei Liu; Nian Liu; Qinglong Cao; Xiwen Yao; Junwei Han; Ling Shao; | code |
577 | UKPGAN: A General Self-Supervised Keypoint Detector Highlight: In this work, we reckon keypoint detection as information compression, and force the model to distill out important points of an object. |
Yang You; Wenhai Liu; Yanjie Ze; Yong-Lu Li; Weiming Wang; Cewu Lu; | code |
578 | Raw High-Definition Radar for Multi-Task Learning Highlight: In this paper, we propose a novel HD radar sensing model, FFT-RadNet, that eliminates the overhead of computing the range-azimuth-Doppler 3D tensor, learning instead to recover angles from a range-Doppler spectrum. |
Julien Rebut; Arthur Ouaknine; Waqas Malik; Patrick Pérez; | code |
579 | Coarse-To-Fine Feature Mining for Video Semantic Segmentation Highlight: However, there is no research about how to simultaneously learn static and motional contexts which are highly correlated and complementary to each other. To address this problem, we propose a Coarse-to-Fine Feature Mining (CFFM) technique to learn a unified presentation of static contexts and motional contexts. |
Guolei Sun; Yun Liu; Henghui Ding; Thomas Probst; Luc Van Gool; | code |
580 | Compressing Models With Few Samples: Mimicking Then Replacing Highlight: In this paper, we propose a new framework named Mimicking then Replacing (MiR) for few-sample compression, which firstly urges the pruned model to output the same features as the teacher’s in the penultimate layer, and then replaces teacher’s layers before penultimate with a well-tuned compact one. |
Huanyu Wang; Junjie Liu; Xin Ma; Yang Yong; Zhenhua Chai; Jianxin Wu; | code |
581 | PokeBNN: A Binary Pursuit of Lightweight Accuracy Highlight: Binary neural networks (BNNs) have the potential to significantly lower the compute intensity but existing models suffer from low quality. To overcome this deficiency, we propose PokeConv, a binary convolution block which improves quality of BNNs by techniques such as adding multiple residual paths, and tuning the activation function. |
Yichi Zhang; Zhiru Zhang; Lukasz Lew; | code |
582 | Zoom in and Out: A Mixed-Scale Triplet Network for Camouflaged Object Detection Highlight: Apart from high intrinsic similarity between the camouflaged objects and their background, the objects are usually diverse in scale, fuzzy in appearance, and even severely occluded. To deal with these problems, we propose a mixed-scale triplet network, ZoomNet, which mimics the behavior of humans when observing vague images, i.e., zooming in and out. |
Youwei Pang; Xiaoqi Zhao; Tian-Zhu Xiang; Lihe Zhang; Huchuan Lu; | code |
583 | SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images Highlight: We propose a novel MSI representation called Soft Occlusion MSI (SOMSI) that enables modelling high-dimensional appearance features in MSI while retaining the fast rendering times of a standard MSI. |
Tewodros Habtegebrial; Christiano Gava; Marcel Rogge; Didier Stricker; Varun Jampani; | code |
584 | EMScore: Evaluating Video Captioning Via Coarse-Grained and Fine-Grained Embedding Matching Highlight: Inspired by human evaluation, we propose EMScore (Embedding Matching-based score), a novel reference-free metric for video captioning, which directly measures similarity between video and candidate captions. |
Yaya Shi; Xu Yang; Haiyang Xu; Chunfeng Yuan; Bing Li; Weiming Hu; Zheng-Jun Zha; | code |
585 | PoseTriplet: Co-Evolving 3D Human Pose Estimation, Imitation, and Hallucination Under Self-Supervision Highlight: In this paper, we propose a novel self-supervised approach that allows us to explicitly generate 2D-3D pose pairs for augmenting supervision, through a self-enhancing dual-loop learning framework. |
Kehong Gong; Bingbing Li; Jianfeng Zhang; Tao Wang; Jing Huang; Michael Bi Mi; Jiashi Feng; Xinchao Wang; | code |
586 | Group Contextualization for Video Recognition Highlight: In this paper, we propose an efficient feature refinement method that decomposes the feature channels into several groups and separately refines them with different axial contexts in parallel. |
Yanbin Hao; Hao Zhang; Chong-Wah Ngo; Xiangnan He; | code |
587 | Single-Domain Generalized Object Detection in Urban Scene Via Cyclic-Disentangled Self-Distillation Highlight: In this paper, we are concerned with enhancing the generalization capability of object detectors. |
Aming Wu; Cheng Deng; | code |
588 | L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation Highlight: In this paper, we present L2G, a simple online local-to-global knowledge transfer framework for high-quality object attention mining. |
Peng-Tao Jiang; Yuqi Yang; Qibin Hou; Yunchao Wei; | code |
589 | Self-Augmented Unpaired Image Dehazing Via Density and Depth Decomposition Highlight: In this paper, we propose a self-augmented image dehazing framework, termed D^4 (Dehazing via Decomposing transmission map into Density and Depth) for haze generation and removal. |
Yang Yang; Chaoyue Wang; Risheng Liu; Lin Zhang; Xiaojie Guo; Dacheng Tao; | code |
590 | Neural 3D Video Synthesis From Multi-View Video Highlight: We propose a novel approach for 3D video synthesis that is able to represent multi-view video recordings of a dynamic real-world scene in a compact, yet expressive representation that enables high-quality view synthesis and motion interpolation. |
Tianye Li; Mira Slavcheva; Michael Zollhöfer; Simon Green; Christoph Lassner; Changil Kim; Tanner Schmidt; Steven Lovegrove; Michael Goesele; Richard Newcombe; Zhaoyang Lv; | code |
591 | SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation Highlight: Based on this technique, we propose SemAffiNet for point cloud semantic segmentation, which utilizes the attention mechanism in the Transformer module to implicitly and explicitly capture global structural knowledge within local parts for overall comprehension of each category. |
Ziyi Wang; Yongming Rao; Xumin Yu; Jie Zhou; Jiwen Lu; | code |
592 | Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search Highlight: In this paper, we propose a Shapley value based method to evaluate operation contribution (Shapley-NAS) for neural architecture search. |
Han Xiao; Ziwei Wang; Zheng Zhu; Jie Zhou; Jiwen Lu; | code |
593 | HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening Highlight: In this paper, we present a novel attention mechanism for pansharpening called HyperTransformer, in which features of LR-HSI and PAN are formulated as queries and keys in a transformer, respectively. |
Wele Gedara Chaminda Bandara; Vishal M. Patel; | code |
594 | Structure-Aware Flow Generation for Human Body Reshaping Highlight: Due to the complicated structure and multifarious appearance of human bodies, existing methods either fall back on the 3D domain via body morphable model or resort to keypoint-based image deformation, leading to inefficiency and unsatisfactory visual quality. In this paper, we address these limitations by formulating an end-to-end flow generation architecture under the guidance of body structural priors, including skeletons and Part Affinity Fields, and achieve unprecedentedly controllable performance under arbitrary poses and garments. |
Jianqiang Ren; Yuan Yao; Biwen Lei; Miaomiao Cui; Xuansong Xie; | code |
595 | Learning To Answer Questions in Dynamic Audio-Visual Scenarios Highlight: In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos. |
Guangyao Li; Yake Wei; Yapeng Tian; Chenliang Xu; Ji-Rong Wen; Di Hu; | code |
596 | Synthetic Aperture Imaging With Events and Frames Highlight: However, the performance of E-SAI is not consistent under sparse occlusions due to the dramatic decrease of signal events. This paper addresses this problem by leveraging the merits of both events and frames, leading to a fusion-based SAI (EF-SAI) that performs consistently under the different densities of occlusions. |
Wei Liao; Xiang Zhang; Lei Yu; Shijie Lin; Wen Yang; Ning Qiao; | code |
597 | MonoGround: Detecting Monocular 3D Objects From The Ground Highlight: Due to the ill-posed 2D to 3D mapping essence from the monocular imaging process, monocular 3D object detection suffers from inaccurate depth estimation and thus has poor 3D detection results. To alleviate this problem, we propose to introduce the ground plane as a prior in monocular 3D object detection. |
Zequn Qin; Xi Li; | code |
598 | Deep Visual Geo-Localization Benchmark Highlight: In this paper, we propose a new open-source benchmarking framework for Visual Geo-localization (VG) that allows building, training, and testing a wide range of commonly used architectures, with the flexibility to change individual components of a geo-localization pipeline. |
Gabriele Berton; Riccardo Mereu; Gabriele Trivigno; Carlo Masone; Gabriela Csurka; Torsten Sattler; Barbara Caputo; | code |
599 | StyleGAN-V: A Continuous Video Generator With The Price, Image Quality and Perks of StyleGAN2 Highlight: Videos show continuous events, yet most, if not all, video synthesis frameworks treat them discretely in time. In this work, we think of videos as what they should be, time-continuous signals, and extend the paradigm of neural representations to build a continuous-time video generator. |
Ivan Skorokhodov; Sergey Tulyakov; Mohamed Elhoseiny; | code |
600 | LISA: Learning Implicit Shape and Appearance of Hands Highlight: This paper proposes a do-it-all neural model of human hands, named LISA. |
Enric Corona; Tomas Hodan; Minh Vo; Francesc Moreno-Noguer; Chris Sweeney; Richard Newcombe; Lingni Ma; | code |
601 | Iterative Deep Homography Estimation Highlight: We propose Iterative Homography Network, namely IHN, a new deep homography estimation architecture. |
Si-Yuan Cao; Jianxin Hu; Zehua Sheng; Hui-Liang Shen; | code |
602 | Learned Queries for Efficient Local Attention Highlight: In this paper, we propose a new shift-invariant local attention layer, called query and attend (QnA), that aggregates the input locally in an overlapping manner, much like convolutions. |
Moab Arar; Ariel Shamir; Amit H. Bermano; | code |
603 | Colar: Effective and Efficient Online Action Detection By Consulting Exemplars Highlight: This paper develops an effective exemplar-consultation mechanism that first measures the similarity between a frame and exemplary frames, and then aggregates exemplary features based on the similarity weights. |
Le Yang; Junwei Han; Dingwen Zhang; | code |
604 | SoftGroup for 3D Instance Segmentation on Point Clouds Highlight: To address the aforementioned problems, this paper proposes a 3D instance segmentation method referred to as SoftGroup by performing bottom-up soft grouping followed by top-down refinement. |
Thang Vu; Kookhoi Kim; Tung M. Luu; Thanh Nguyen; Chang D. Yoo; | code |
605 | MVS2D: Efficient Multi-View Stereo Via Attention-Driven 2D Convolutions Highlight: We present MVS2D, a highly efficient multi-view stereo algorithm that seamlessly integrates multi-view constraints into single-view networks via an attention mechanism. |
Zhenpei Yang; Zhile Ren; Qi Shan; Qixing Huang; | code |
606 | Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation Via Semantic Knowledge Transfer and Self-Refinement Highlight: In this paper, we propose a novel approach including two innovative components. |
Beomyoung Kim; YoungJoon Yoo; Chae Eun Rhee; Junmo Kim; | code |
607 | Deep Constrained Least Squares for Blind Image Super-Resolution Highlight: In this paper, we tackle the problem of blind image super-resolution (SR) with a reformulated degradation model and two novel modules. |
Ziwei Luo; Haibin Huang; Lei Yu; Youwei Li; Haoqiang Fan; Shuaicheng Liu; | code |
608 | EDTER: Edge Detection With Transformer Highlight: Recently, vision transformer has shown excellent capability in capturing long-range dependencies. Inspired by this, we propose a novel transformer-based edge detector, Edge Detection TransformER (EDTER), to extract clear and crisp object boundaries and meaningful edges by exploiting the full image context information and detailed local cues simultaneously. |
Mengyang Pu; Yaping Huang; Yuming Liu; Qingji Guan; Haibin Ling; | code |
609 | AirObject: A Temporally Evolving Graph Embedding for Object Identification Highlight: Furthermore, given the vast distribution of unknown novel objects in the real world, the object identification process must be class-agnostic. In this context, we propose a novel temporal 3D object encoding approach, dubbed AirObject, to obtain global keypoint graph-based embeddings of objects. |
Nikhil Varma Keetha; Chen Wang; Yuheng Qiu; Kuan Xu; Sebastian Scherer; | code |
610 | From Representation to Reasoning: Towards Both Evidence and Commonsense Reasoning for Video Question-Answering Highlight: To facilitate deeper video understanding towards video reasoning, we present the task of Causal-VidQA, which includes four types of questions ranging from scene description (description) to evidence reasoning (explanation) and commonsense reasoning (prediction and counterfactual). |
Jiangtong Li; Li Niu; Liqing Zhang; | code |
611 | Semantic-Aware Domain Generalized Segmentation Highlight: In this paper, we address domain generalized semantic segmentation, where a segmentation model is trained to be domain-invariant without using any target domain data. |
Duo Peng; Yinjie Lei; Munawar Hayat; Yulan Guo; Wen Li; | code |
612 | DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion Highlight: In response to such bias, we would like to re-emphasize that methods for multi-object tracking should also work when object appearance is not sufficiently discriminative. To this end, we propose a large-scale dataset for multi-human tracking, where humans have similar appearance, diverse motion and extreme articulation. |
Peize Sun; Jinkun Cao; Yi Jiang; Zehuan Yuan; Song Bai; Kris Kitani; Ping Luo; | code |
613 | UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection Highlight: This is a closed-set scenario that fails to test the capability of systems at detecting new anomaly types. To this end, we propose UBnormal, a new supervised open-set benchmark composed of multiple virtual scenes for video anomaly detection. |
Andra Acsintoae; Andrei Florescu; Mariana-Iuliana Georgescu; Tudor Mare; Paul Sumedrea; Radu Tudor Ionescu; Fahad Shahbaz Khan; Mubarak Shah; | code |
614 | AKB-48: A Real-World Articulated Object Knowledge Base Highlight: To build the AKB-48, we present a fast articulation knowledge modeling (FArM) pipeline, which can fulfill the ArtiKG for an articulated object within 10-15 minutes, and largely reduce the cost for object modeling in the real world. |
Liu Liu; Wenqiang Xu; Haoyuan Fu; Sucheng Qian; Qiaojun Yu; Yang Han; Cewu Lu; | code |
615 | Stratified Transformer for 3D Point Cloud Segmentation Highlight: In this paper, we propose Stratified Transformer that is able to capture long-range contexts and demonstrates strong generalization ability and high performance. |
Xin Lai; Jianhui Liu; Li Jiang; Liwei Wang; Hengshuang Zhao; Shu Liu; Xiaojuan Qi; Jiaya Jia; | code |
616 | Aug-NeRF: Training Stronger Neural Radiance Fields With Triple-Level Physically-Grounded Augmentations Highlight: Inspired by that, we propose Augmented NeRF (Aug-NeRF), which for the first time brings the power of robust data augmentations into regularizing the NeRF training. |
Tianlong Chen; Peihao Wang; Zhiwen Fan; Zhangyang Wang; | code |
617 | Semantic-Shape Adaptive Feature Modulation for Semantic Image Synthesis Highlight: In order to exploit the part-level layouts, we propose a Shape-aware Position Descriptor (SPD) to describe each pixel’s positional feature, where object shape is explicitly encoded into the SPD feature. |
Zhengyao Lv; Xiaoming Li; Zhenxing Niu; Bing Cao; Wangmeng Zuo; | code |
618 | Day-to-Night Image Synthesis for Training Nighttime Neural ISPs Highlight: To address this problem, we propose a method that synthesizes nighttime images from daytime images. |
Abhijith Punnappurath; Abdullah Abuolaim; Abdelrahman Abdelhamed; Alex Levinshtein; Michael S. Brown; | code |