CVPR 2022 Papers with Code/Data

June 7, 2022 | June 18, 2022 | admin

We identified more than 600 CVPR 2022 papers with published code or data and list all of them in the following table. Because the extraction step is automated, we may have missed some papers. Let us know if more papers can be added to this table.

Readers are also encouraged to read our CVPR 2022 highlights, which associate each CVPR 2022 paper with a one-sentence highlight. You may also like to explore our “Best Paper” Digest (CVPR), which lists the most influential CVPR papers since 1988.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper. Based in New York, Paper Digest is dedicated to producing high-quality text analysis results that people can actually use on a daily basis. Since 2018, we have been serving users across the world with a number of exclusive services on ranking, search, tracking and automatic literature review.

If you do not want to miss interesting academic papers, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to get updates on new conference digests.

Paper Digest Team
New York City, New York, 10017
team@paperdigest.org

TABLE 1: CVPR 2022 Papers with Code/Data

#	Paper	Author(s)	Code
1	Controllable Animation of Fluid Elements in Still Images Highlight: We propose a method to interactively control the animation of fluid elements in still images to generate cinemagraphs.	Aniruddha Mahapatra; Kuldeep Kulkarni;	code
2	F-SfT: Shape-From-Template With A Physics-Based Deformation Model Highlight: In contrast to previous works, this paper proposes a new SfT approach explaining 2D observations through physical simulations accounting for forces and material properties.	Navami Kairanda; Edith Tretschk; Mohamed Elgharib; Christian Theobalt; Vladislav Golyanik;	code
3	TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation Highlight: To leverage the unlabeled data to boost model performance, we present a novel Two-Way Inter-label Self-Training framework named TWIST.	Ruihang Chu; Xiaoqing Ye; Zhengzhe Liu; Xiao Tan; Xiaojuan Qi; Chi-Wing Fu; Jiaya Jia;	code
4	Do Learned Representations Respect Causal Relationships? Highlight: Data often has many semantic attributes that are causally associated with each other. But do attribute-specific learned representations of data also respect the same causal relations? We answer this question in three steps.	Lan Wang; Vishnu Naresh Boddeti;	code
5	ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic Highlight: While such models can provide a powerful score for matching and subsequent zero-shot tasks, they are not capable of generating caption given an image. In this work, we repurpose such models to generate a descriptive text given an image at inference time, without any further training or tuning step.	Yoad Tewel; Yoav Shalev; Idan Schwartz; Lior Wolf;	code
6	3D Moments From Near-Duplicate Photos Highlight: We introduce 3D Moments, a new computational photography effect.	Qianqian Wang; Zhengqi Li; David Salesin; Noah Snavely; Brian Curless; Janne Kontkanen;	code
7	Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization Highlight: In this work, we, for the first time to our best knowledge, propose to perform Exact Feature Distribution Matching (EFDM) by exactly matching the empirical Cumulative Distribution Functions (eCDFs) of image features, which could be implemented by applying the Exact Histogram Matching (EHM) in the image feature space.	Yabin Zhang; Minghan Li; Ruihuang Li; Kui Jia; Lei Zhang;	code
8	Blind2Unblind: Self-Supervised Image Denoising With Visible Blind Spots Highlight: In this paper, we propose a simple yet efficient approach called Blind2Unblind to overcome the information loss in blindspot-driven denoising methods.	Zejin Wang; Jiazheng Liu; Guoqing Li; Hua Han;	code
9	Balanced and Hierarchical Relation Learning for One-Shot Object Detection Highlight: In this paper, we introduce the balanced and hierarchical learning for our detector.	Hanqing Yang; Sijia Cai; Hualian Sheng; Bing Deng; Jianqiang Huang; Xian-Sheng Hua; Yong Tang; Yu Zhang;	code
10	NICE-SLAM: Neural Implicit Scalable Encoding for SLAM Highlight: In this paper, we present NICE-SLAM, a dense SLAM system that incorporates multi-level local information by introducing a hierarchical scene representation.	Zihan Zhu; Songyou Peng; Viktor Larsson; Weiwei Xu; Hujun Bao; Zhaopeng Cui; Martin R. Oswald; Marc Pollefeys;	code
11	Stochastic Trajectory Prediction Via Motion Indeterminacy Diffusion Highlight: In this paper, we present a new framework to formulate the trajectory prediction task as a reverse process of motion indeterminacy diffusion (MID), in which we progressively discard indeterminacy from all the walkable areas until reaching the desired trajectory.	Tianpei Gu; Guangyi Chen; Junlong Li; Chunze Lin; Yongming Rao; Jie Zhou; Jiwen Lu;	code
12	CLRNet: Cross Layer Refinement Network for Lane Detection Highlight: In this work, we present Cross Layer Refinement Network (CLRNet) aiming at fully utilizing both high-level and low-level features in lane detection.	Tu Zheng; Yifei Huang; Yang Liu; Wenjian Tang; Zheng Yang; Deng Cai; Xiaofei He;	code
13	Motion-Aware Contrastive Video Representation Learning Via Foreground-Background Merging Highlight: Such bias makes the model suffer from weak generalization ability, leading to worse performance on downstream tasks such as action recognition. To alleviate such bias, we propose Foreground-background Merging (FAME) to deliberately compose the moving foreground region of the selected video onto the static background of others.	Shuangrui Ding; Maomao Li; Tianyu Yang; Rui Qian; Haohang Xu; Qingyi Chen; Jue Wang; Hongkai Xiong;	code
14	DINE: Domain Adaptation From Single and Multiple Black-Box Predictors Highlight: This paper studies a practical and interesting setting for UDA, where only black-box source models (i.e., only network predictions are available) are provided during adaptation in the target domain. To solve this problem, we propose a new two-step knowledge adaptation framework called DIstill and fine-tuNE (DINE).	Jian Liang; Dapeng Hu; Jiashi Feng; Ran He;	code
15	FaceFormer: Speech-Driven 3D Facial Animation With Transformers Highlight: Prior works typically focus on learning phoneme-level features of short audio windows with limited context, occasionally resulting in inaccurate lip movements. To tackle this limitation, we propose a Transformer-based autoregressive model, FaceFormer, which encodes the long-term audio context and autoregressively predicts a sequence of animated 3D face meshes.	Yingruo Fan; Zhaojiang Lin; Jun Saito; Wenping Wang; Taku Komura;	code
16	Rotationally Equivariant 3D Object Detection Highlight: To incorporate object-level rotation equivariance into 3D object detectors, we need a mechanism to extract equivariant features with local object-level spatial support while being able to model cross-object context information. To this end, we propose Equivariant Object detection Network (EON) with a rotation equivariance suspension design to achieve object-level equivariance.	Hong-Xing Yu; Jiajun Wu; Li Yi;	code
17	Accelerating DETR Convergence Via Semantic-Aligned Matching Highlight: We observe that the slow convergence is largely attributed to the complication in matching object queries with target features in different feature embedding spaces. This paper presents SAM-DETR, a Semantic-Aligned-Matching DETR that greatly accelerates DETR’s convergence without sacrificing its accuracy.	Gongjie Zhang; Zhipeng Luo; Yingchen Yu; Kaiwen Cui; Shijian Lu;	code
18	Cloning Outfits From Real-World Images to 3D Characters for Generalizable Person Re-Identification Highlight: However, synthesized persons in existing datasets are mostly cartoon-like and in random dress collocation, which limits their performance. To address this, in this work, an automatic approach is proposed to directly clone the whole outfits from real-world person images to virtual 3D characters, such that any virtual person thus created will appear very similar to its real-world counterpart.	Yanan Wang; Xuezhi Liang; Shengcai Liao;	code
19	GeoNeRF: Generalizing NeRF With Geometry Priors Highlight: We present GeoNeRF, a generalizable photorealistic novel view synthesis method based on neural radiance fields.	Mohammad Mahdi Johari; Yann Lepoittevin; François Fleuret;	code
20	ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo Highlight: In this paper, we propose a novel adaptive blend pyramid network, which aims to achieve fast local retouching on ultra high-resolution photos.	Biwen Lei; Xiefan Guo; Hongyu Yang; Miaomiao Cui; Xuansong Xie; Di Huang;	code
21	Expanding Low-Density Latent Regions for Open-Set Object Detection Highlight: In this work, we propose to identify unknown objects by separating high/low-density regions in the latent space, based on the consensus that unknown objects are usually distributed in low-density latent regions.	Jiaming Han; Yuqiang Ren; Jian Ding; Xingjia Pan; Ke Yan; Gui-Song Xia;	code
22	Uformer: A General U-Shaped Transformer for Image Restoration Highlight: In this paper, we present Uformer, an effective and efficient Transformer-based architecture for image restoration, in which we build a hierarchical encoder-decoder network using the Transformer block.	Zhendong Wang; Xiaodong Cun; Jianmin Bao; Wengang Zhou; Jianzhuang Liu; Houqiang Li;	code
23	Exploring Dual-Task Correlation for Pose Guided Person Image Generation Highlight: Most of the existing methods only focus on the ill-posed source-to-target task and fail to capture reasonable texture mapping. To address this problem, we propose a novel Dual-task Pose Transformer Network (DPTN), which introduces an auxiliary task (i.e., source-to-source task) and exploits the dual-task correlation to promote the performance of PGPIG.	Pengze Zhang; Lingxiao Yang; Jian-Huang Lai; Xiaohua Xie;	code
24	Portrait Eyeglasses and Shadow Removal By Leveraging 3D Synthetic Data Highlight: In this paper, we propose a novel framework to remove eyeglasses as well as their cast shadows from face images.	Junfeng Lyu; Zhibo Wang; Feng Xu;	code
25	Modeling 3D Layout for Group Re-Identification Highlight: However, layout ambiguity is introduced because these methods only consider the 2D layout on the imaging plane. In this paper, we overcome the above limitations by 3D layout modeling.	Quan Zhang; Kaiheng Dang; Jian-Huang Lai; Zhanxiang Feng; Xiaohua Xie;	code
26	Toward Fast, Flexible, and Robust Low-Light Image Enhancement Highlight: In this paper, we develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening images in real-world low-light scenarios.	Long Ma; Tengyu Ma; Risheng Liu; Xin Fan; Zhongxuan Luo;	code
27	Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos Highlight: In this paper, we propose a prompt-based framework, Bridge-Prompt (Br-Prompt), to model the semantics across adjacent actions, so that it simultaneously exploits both out-of-context and contextual information from a series of ordinal actions in instructional videos.	Muheng Li; Lei Chen; Yueqi Duan; Zhilan Hu; Jianjiang Feng; Jie Zhou; Jiwen Lu;	code
28	HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network Highlight: Thus, in this work, we propose a novel 3D hand mesh estimation network, HandOccNet, that can fully exploit the information at occluded regions as a secondary means to enhance image features and make them much richer.	JoonKyu Park; Yeonguk Oh; Gyeongsik Moon; Hongsuk Choi; Kyoung Mu Lee;	code
29	Modular Action Concept Grounding in Semantic Video Prediction Highlight: Inspired by the idea of Mixture of Experts, we embody each abstract label by a structured combination of various visual concept learners and propose a novel video prediction model, Modular Action Concept Network (MAC).	Wei Yu; Wenxin Chen; Songheng Yin; Steve Easterbrook; Animesh Garg;	code
30	StyleSwin: Transformer-Based GAN for High-Resolution Image Generation Highlight: In this paper, we seek to explore using pure transformers to build a generative adversarial network for high-resolution image synthesis.	Bowen Zhang; Shuyang Gu; Bo Zhang; Jianmin Bao; Dong Chen; Fang Wen; Yong Wang; Baining Guo;	code
31	Discrete Cosine Transform Network for Guided Depth Map Super-Resolution Highlight: To solve the challenges in interpreting the working mechanism, extracting cross-modal features and RGB texture over-transferred, we propose a novel Discrete Cosine Transform Network (DCTNet) to alleviate the problems from three aspects.	Zixiang Zhao; Jiangshe Zhang; Shuang Xu; Zudi Lin; Hanspeter Pfister;	code
32	Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing Highlight: In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing.	Xiaoxue Chen; Tianyu Liu; Hao Zhao; Guyue Zhou; Ya-Qin Zhang;	code
33	TransGeo: Transformer Is All You Need for Cross-View Image Geo-Localization Highlight: The dominant CNN-based methods for cross-view image geo-localization rely on polar transform and fail to model global correlation. We propose a pure transformer-based approach (TransGeo) to address these limitations from a different perspective.	Sijie Zhu; Mubarak Shah; Chen Chen;	code
34	Contrastive Boundary Learning for Point Cloud Segmentation Highlight: In this paper, we focus on the segmentation of scene boundaries.	Liyao Tang; Yibing Zhan; Zhe Chen; Baosheng Yu; Dacheng Tao;	code
35	Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution Highlight: In this paper, we demonstrate that it is possible to train a GAN-based SISR model which can stably generate perceptually realistic details while inhibiting visual artifacts.	Jie Liang; Hui Zeng; Lei Zhang;	code
36	CVNet: Contour Vibration Network for Building Extraction Highlight: Inspired by the physical vibration theory, we propose a contour vibration network (CVNet) for automatic building boundary delineation.	Ziqiang Xu; Chunyan Xu; Zhen Cui; Xiangwei Zheng; Jian Yang;	code
37	Swin Transformer V2: Scaling Up Capacity and Resolution Highlight: We present techniques for scaling Swin Transformer up to 3 billion parameters and making it capable of training with images of up to 1,536×1,536 resolution.	Ze Liu; Han Hu; Yutong Lin; Zhuliang Yao; Zhenda Xie; Yixuan Wei; Jia Ning; Yue Cao; Zheng Zhang; Li Dong; Furu Wei; Baining Guo;	code
38	Projective Manifold Gradient Layer for Deep Rotation Regression Highlight: In this paper, we propose a manifold-aware gradient that directly backpropagates into deep network weights.	Jiayi Chen; Yingda Yin; Tolga Birdal; Baoquan Chen; Leonidas J. Guibas; He Wang;	code
39	HCSC: Hierarchical Contrastive Selective Coding Highlight: In addition, the negative pairs used in these methods are not guaranteed to be semantically distinct, which could further hamper the structural correctness of learned image representations. To tackle these limitations, we propose a novel contrastive learning framework called Hierarchical Contrastive Selective Coding (HCSC).	Yuanfan Guo; Minghao Xu; Jiawen Li; Bingbing Ni; Xuanyu Zhu; Zhenbang Sun; Yi Xu;	code
40	TransRank: Self-Supervised Video Representation Learning Via Ranking-Based Transformation Recognition Highlight: Based on hard-label classification, existing RecogTrans approaches suffer from noisy supervision signals in pre-training. To mitigate this problem, we developed TransRank, a unified framework for recognizing Transformations in a Ranking formulation.	Haodong Duan; Nanxuan Zhao; Kai Chen; Dahua Lin;	code
41	DiSparse: Disentangled Sparsification for Multitask Model Compression Highlight: In this paper, we propose DiSparse, a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme.	Xinglong Sun; Ali Hassani; Zhangyang Wang; Gao Huang; Humphrey Shi;	code
42	Pushing The Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make A Difference Highlight: We seek to push the limits of a simple-but-effective pipeline for real-world few-shot image classification in practice. To this end, we explore few-shot learning from the perspective of neural architecture, as well as a three stage pipeline of pre-training on external data, meta-training with labelled few-shot tasks, and task-specific fine-tuning on unseen tasks.	Shell Xu Hu; Da Li; Jan Stühmer; Minyoung Kim; Timothy M. Hospedales;	code
43	Towards Efficient and Scalable Sharpness-Aware Minimization Highlight: In this paper, we propose a novel algorithm LookSAM – that only periodically calculates the inner gradient ascent, to significantly reduce the additional training cost of SAM.	Yong Liu; Siqi Mai; Xiangning Chen; Cho-Jui Hsieh; Yang You;	code
44	OSSO: Obtaining Skeletal Shape From Outside Highlight: We address the problem of inferring the anatomic skeleton of a person, in an arbitrary pose, from the 3D surface of the body; i.e. we predict the inside (bones) from the outside (skin).	Marilyn Keller; Silvia Zuffi; Michael J. Black; Sergi Pujades;	code
45	A Study on The Distribution of Social Biases in Self-Supervised Learning Visual Models Highlight: In this paper, we study the biases of a varied set of SSL visual models, trained using ImageNet data, using a method and dataset designed by psychological experts to measure social biases.	Kirill Sirotkin; Pablo Carballeira; Marcos Escudero-Viñolo;	code
46	Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes Highlight: In this paper, instead of following previous literature, we propose Self-Supervised Predictive Learning (SSPL), a negative-free method for sound localization via explicit positive mining.	Zengjie Song; Yuxi Wang; Junsong Fan; Tieniu Tan; Zhaoxiang Zhang;	code
47	Comparing Correspondences: Video Prediction With Correspondence-Wise Losses Highlight: Image prediction methods often struggle on tasks that require changing the positions of objects, such as video prediction, producing blurry images that average over the many positions that objects might occupy. In this paper, we propose a simple change to existing image similarity metrics that makes them more robust to positional errors: we match the images using optical flow, then measure the visual similarity of corresponding pixels.	Daniel Geng; Max Hamilton; Andrew Owens;	code
48	Towards Fewer Annotations: Active Learning Via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation Highlight: In this paper, we propose a simple region-based active learning approach for semantic segmentation under a domain shift, aiming to automatically query a small partition of image regions to be labeled while maximizing segmentation performance.	Binhui Xie; Longhui Yuan; Shuang Li; Chi Harold Liu; Xinjing Cheng;	code
49	CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding Highlight: We observe in the real world that humans are capable of mapping the visual concepts learnt from 2D images to understand the 3D world. Encouraged by this insight, we propose CrossPoint, a simple cross-modal contrastive learning approach to learn transferable 3D point cloud representations.	Mohamed Afham; Isuru Dissanayake; Dinithi Dissanayake; Amaya Dharmasiri; Kanchana Thilakarathna; Ranga Rodrigo;	code
50	Few Shot Generative Model Adaption Via Relaxed Spatial Structural Alignment Highlight: However, existing methods are prone to model overfitting and collapse in extremely few shot setting (less than 10). To solve this problem, we propose a relaxed spatial structural alignment (RSSA) method to calibrate the target generative models during the adaption.	Jiayu Xiao; Liang Li; Chaofei Wang; Zheng-Jun Zha; Qingming Huang;	code
51	Enhancing Adversarial Training With Second-Order Statistics of Weights Highlight: In this paper, we show that treating model weights as random variables allows for enhancing adversarial training through Second-Order Statistics Optimization (S^2O) with respect to the weights.	Gaojie Jin; Xinping Yi; Wei Huang; Sven Schewe; Xiaowei Huang;	code
52	Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo Highlight: Our findings motivate us to simplify MoCo v2 via the removal of its dictionary as well as momentum.	Chaoning Zhang; Kang Zhang; Trung X. Pham; Axi Niu; Zhinan Qiao; Chang D. Yoo; In So Kweon;	code
53	Moving Window Regression: A Novel Approach to Ordinal Regression Highlight: A novel ordinal regression algorithm, called moving window regression (MWR), is proposed in this paper.	Nyeong-Ho Shin; Seon-Ho Lee; Chang-Su Kim;	code
54	Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection Highlight: Different from related methods, we propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block.	Nicolae-Cătălin Ristea; Neelu Madan; Radu Tudor Ionescu; Kamal Nasrollahi; Fahad Shahbaz Khan; Thomas B. Moeslund; Mubarak Shah;	code
55	Robust Optimization As Data Augmentation for Large-Scale Graphs Highlight: We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training.	Kezhi Kong; Guohao Li; Mucong Ding; Zuxuan Wu; Chen Zhu; Bernard Ghanem; Gavin Taylor; Tom Goldstein;	code
56	Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients Highlight: In contrast to the literature, we propose a family of robust structured declarative classifiers for point cloud classification, where the internal constrained optimization mechanism can effectively defend adversarial attacks through implicit gradients.	Kaidong Li; Ziming Zhang; Cuncong Zhong; Guanghui Wang;	code
57	Improving The Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input Highlight: However, prior works utilize simple image transformations such as resizing, which limits input diversity. To tackle this limitation, we propose the object-based diverse input (ODI) method that draws an adversarial image on a 3D object and induces the rendered image to be classified as the target class.	Junyoung Byun; Seungju Cho; Myung-Joon Kwon; Hee-Seon Kim; Changick Kim;	code
58	ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer Highlight: We present ObjectFolder 2.0, a large-scale, multisensory dataset of common household objects in the form of implicit neural representations that significantly enhances ObjectFolder 1.0 in three aspects.	Ruohan Gao; Zilin Si; Yen-Yu Chang; Samuel Clarke; Jeannette Bohg; Li Fei-Fei; Wenzhen Yuan; Jiajun Wu;	code
59	360MonoDepth: High-Resolution 360° Monocular Depth Estimation Highlight: In this work, we propose a flexible framework for monocular depth estimation from high-resolution 360° images using tangent images.	Manuel Rey-Area; Mingze Yuan; Christian Richardt;	code
60	POCO: Point Convolution for Surface Reconstruction Highlight: Besides, relying on fixed patch sizes may require discretization tuning. To address these issues, we propose to use point cloud convolutions and compute latent vectors at each input point.	Alexandre Boulch; Renaud Marlet;	code
61	Neural Texture Extraction and Distribution for Controllable Person Image Synthesis Highlight: Observing that person images are highly structured, we propose to generate desired images by extracting and distributing semantic entities of reference images.	Yurui Ren; Xiaoqing Fan; Ge Li; Shan Liu; Thomas H. Li;	code
62	Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs Highlight: Today’s VidSGG models are all proposal-based methods, i.e., they first generate numerous paired subject-object snippets as proposals, and then conduct predicate classification for each proposal. In this paper, we argue that this prevalent proposal-based framework has three inherent drawbacks: 1) The ground-truth predicate labels for proposals are partially correct.	Kaifeng Gao; Long Chen; Yulei Niu; Jian Shao; Jun Xiao;	code
63	DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis Highlight: To these ends, we propose a simpler but more effective Deep Fusion Generative Adversarial Networks (DF-GAN).	Ming Tao; Hao Tang; Fei Wu; Xiao-Yuan Jing; Bing-Kun Bao; Changsheng Xu;	code
64	ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes Highlight: In this paper, we take a step towards computer-aided waste detection and present the first in-the-wild industrial-grade waste detection and segmentation dataset, ZeroWaste.	Dina Bashkirova; Mohamed Abdelfattah; Ziliang Zhu; James Akl; Fadi Alladkani; Ping Hu; Vitaly Ablavsky; Berk Calli; Sarah Adel Bargal; Kate Saenko;	code
65	UNIST: Unpaired Neural Implicit Shape Translation Network Highlight: We introduce UNIST, the first deep neural implicit model for general-purpose, unpaired shape-to-shape translation, in both 2D and 3D domains.	Qimin Chen; Johannes Merz; Aditya Sanghi; Hooman Shayani; Ali Mahdavi-Amiri; Hao Zhang;	code
66	APES: Articulated Part Extraction From Sprite Sheets Highlight: Creating these puppets requires partitioning characters into independently moving parts. In this work, we present a method to automatically identify such articulated parts from a small set of character poses shown in a sprite sheet, which is an illustration of the character that artists often draw before puppet creation.	Zhan Xu; Matthew Fisher; Yang Zhou; Deepali Aneja; Rushikesh Dudhat; Li Yi; Evangelos Kalogerakis;	code
67	SPAct: Self-Supervised Privacy Preservation for Action Recognition Highlight: For the first time, we present a novel training framework which removes privacy information from input video in a self-supervised manner without requiring privacy labels.	Ishan Rajendrakumar Dave; Chen Chen; Mubarak Shah;	code
68	De-Rendering 3D Objects in The Wild Highlight: We present a weakly supervised method that is able to decompose a single image of an object into shape (depth and normals), material (albedo, reflectivity and shininess) and global lighting parameters.	Felix Wimbauer; Shangzhe Wu; Christian Rupprecht;	code
69	Global Sensing and Measurements Reuse for Image Compressed Sensing Highlight: Moreover, using measurements only once may not be enough for extracting richer information from measurements. To address these issues, we propose a novel Measurements Reuse Convolutional Compressed Sensing Network (MR-CCSNet) which employs Global Sensing Module (GSM) to collect all level features for achieving an efficient sensing and Measurements Reuse Block (MRB) to reuse measurements multiple times on multi-scale.	Zi-En Fan; Feng Lian; Jia-Ni Quan;	code
70	Practical Evaluation of Adversarial Robustness Via Adaptive Auto Attack Highlight: A practical evaluation method should be convenient (i.e., parameter-free), efficient (i.e., fewer iterations) and reliable (i.e., approaching the lower bound of robustness). Towards this target, we propose a parameter-free Adaptive Auto Attack (A3) evaluation method which addresses the efficiency and reliability in a test-time-training fashion.	Ye Liu; Yaya Cheng; Lianli Gao; Xianglong Liu; Qilong Zhang; Jingkuan Song;	code
71	Cross-View Transformers for Real-Time Map-View Semantic Segmentation Highlight: We present cross-view transformers, an efficient attention-based model for map-view semantic segmentation from multiple cameras.	Brady Zhou; Philipp Krähenbühl;	code
72	Controllable Dynamic Multi-Task Architectures Highlight: This challenge motivates models which allow control over the relative importance of tasks and total compute cost during inference time. In this work, we propose such a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints.	Dripta S. Raychaudhuri; Yumin Suh; Samuel Schulter; Xiang Yu; Masoud Faraki; Amit K. Roy-Chowdhury; Manmohan Chandraker;	code
73	FastDOG: Fast Discrete Optimization on GPU Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a massively parallel Lagrange decomposition method for solving 0–1 integer linear programs occurring in structured prediction.	Ahmed Abbas; Paul Swoboda;	code
74	Focal and Global Knowledge Distillation for Detectors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we point out that in object detection, the features of the teacher and student vary greatly in different areas, especially in the foreground and background.	Zhendong Yang; Zhe Li; Xiaohu Jiang; Yuan Gong; Zehuan Yuan; Danpei Zhao; Chun Yuan;	code
75	Learning To Prompt for Continual Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Typical methods rely on a rehearsal buffer or known task identity at test time to retrieve learned knowledge and address forgetting, while this work presents a new paradigm for continual learning that aims to train a more succinct memory system without accessing task identity at test time.	Zifeng Wang; Zizhao Zhang; Chen-Yu Lee; Han Zhang; Ruoxi Sun; Xiaoqi Ren; Guolong Su; Vincent Perot; Jennifer Dy; Tomas Pfister;	code
76	Human Mesh Recovery From Multiple Shots Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the richness of data comes at the expense of fundamental challenges such as abrupt shot changes and close up shots of actors with heavy truncation, which limits the applicability of existing 3D human understanding methods. In this paper, we address these limitations with the insight that while shot changes of the same scene incur a discontinuity between frames, the 3D structure of the scene still changes smoothly.	Georgios Pavlakos; Jitendra Malik; Angjoo Kanazawa;	code
77	Convolution of Convolution: Let Kernels Spatially Collaborate Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In the biological visual pathway, especially the retina, neurons are tiled along spatial dimensions with the electrical coupling as their local association, while in a convolution layer, kernels are placed along the channel dimension singly. We propose Convolution of Convolution, associating kernels in a layer and letting them collaborate spatially.	Rongzhen Zhao; Jian Li; Zhenzhi Wu;	code
78	Make It Move: Controllable Image-to-Video Generation With Text Descriptions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The key challenges of TI2V task lie both in aligning appearance and motion from different modalities, and in handling uncertainty in text descriptions. To address these challenges, we propose a Motion Anchor-based video GEnerator (MAGE) with an innovative motion anchor (MA) structure to store appearance-motion aligned representation.	Yaosi Hu; Chong Luo; Zhenzhong Chen;	code
79	Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose Neural Points, a novel point cloud representation and apply it to the arbitrary-factored upsampling task.	Wanquan Feng; Jin Li; Hongrui Cai; Xiaonan Luo; Juyong Zhang;	code
80	Video-Text Representation Learning Via Differentiable Weak Temporal Alignment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel multi-modal self-supervised framework Video-Text Temporally Weak Alignment-based Contrastive Learning (VT-TWINS) to capture significant information from noisy and weakly correlated data using a variant of Dynamic Time Warping (DTW).	Dohwan Ko; Joonmyung Choi; Juyeon Ko; Shinyeong Noh; Kyoung-Woon On; Eun-Sol Kim; Hyunwoo J. Kim;	code
81	Bi-Directional Object-Context Prioritization Learning for Saliency Ranking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This inspires us to model the region-level interactions, in addition to the object-level reasoning, for saliency ranking. To this end, we propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking.	Xin Tian; Ke Xu; Xin Yang; Lin Du; Baocai Yin; Rynson W.H. Lau;	code
82	Vehicle Trajectory Prediction Works, But Not Everywhere Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a novel method that automatically generates realistic scenes causing state-of-the-art models to go off-road.	Mohammadhossein Bahari; Saeed Saadatnejad; Ahmad Rahimi; Mohammad Shaverdikondori; Amir Hossein Shahidzadeh; Seyed-Mohsen Moosavi-Dezfooli; Alexandre Alahi;	code
83	MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Some existing methods leverage depth information from an off-the-shelf depth estimator to assist 3D detection, but suffer from the additional computational burden and achieve limited performance caused by inaccurate depth priors. To alleviate this, we propose MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection.	Kuan-Chih Huang; Tsung-Han Wu; Hung-Ting Su; Winston H. Hsu;	code
84	Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents new hierarchically cascaded transformers that can improve data efficiency through attribute surrogates learning and spectral tokens pooling.	Yangji He; Weihan Liang; Dongyang Zhao; Hong-Yu Zhou; Weifeng Ge; Yizhou Yu; Wenqiang Zhang;	code
85	Generalized Category Discovery Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we consider a highly general image recognition setting wherein, given a labelled and unlabelled set of images, the task is to categorize all images in the unlabelled set.	Sagar Vaze; Kai Han; Andrea Vedaldi; Andrew Zisserman;	code
86	Contour-Hugging Heatmaps for Landmark Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an effective and easy-to-implement method for simultaneously performing landmark detection in images and obtaining an ingenious uncertainty measurement for each landmark.	James McCouat; Irina Voiculescu;	code
87	Voxel Field Fusion for 3D Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion.	Yanwei Li; Xiaojuan Qi; Yukang Chen; Liwei Wang; Zeming Li; Jian Sun; Jiaya Jia;	code
88	DisARM: Displacement Aware Relation Module for 3D Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce Displacement Aware Relation Module (DisARM), a novel neural network module for enhancing the performance of 3D object detection in point cloud scenes.	Yao Duan; Chenyang Zhu; Yuqing Lan; Renjiao Yi; Xinwang Liu; Kai Xu;	code
89	MixFormer: Mixing Features Across Windows and Dimensions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: While local-window self-attention performs notably in vision tasks, it suffers from limited receptive field and weak modeling capability issues. This is mainly because it performs self-attention within non-overlapped windows and shares weights on the channel dimension. We propose MixFormer to find a solution.	Qiang Chen; Qiman Wu; Jian Wang; Qinghao Hu; Tao Hu; Errui Ding; Jian Cheng; Jingdong Wang;	code
90	FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Specifically, we propose to parse pairwise query and exemplar action instances into consecutive steps with diverse semantic and temporal correspondences.	Jinglin Xu; Yongming Rao; Xumin Yu; Guangyi Chen; Jie Zhou; Jiwen Lu;	code
91	HEAT: Holistic Edge Attention Transformer for Structured Reconstruction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a novel attention-based neural network for structured reconstruction, which takes a 2D raster image as an input and reconstructs a planar graph depicting an underlying geometric structure.	Jiacheng Chen; Yiming Qian; Yasutaka Furukawa;	code
92	Mobile-Former: Bridging MobileNet and Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present Mobile-Former, a parallel design of MobileNet and transformer with a two-way bridge in between.	Yinpeng Chen; Xiyang Dai; Dongdong Chen; Mengchen Liu; Xiaoyi Dong; Lu Yuan; Zicheng Liu;	code
93	CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the difficulties, we propose a new framework for scribble learning-based medical image segmentation, which is composed of mix augmentation and cycle consistency and thus is referred to as CycleMix.	Ke Zhang; Xiahai Zhuang;	code
94	VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, most of them only support a fixed up-sampling scale, which limits their flexibility and applications. In this work, instead of following the discrete representations, we propose Video Implicit Neural Representation (VideoINR), and we show its applications for STVSR.	Zeyuan Chen; Yinbo Chen; Jingwen Liu; Xingqian Xu; Vidit Goel; Zhangyang Wang; Humphrey Shi; Xiaolong Wang;	code
95	Towards End-to-End Unified Scene Text Detection and Layout Analysis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Scene text detection and document layout analysis have long been treated as two separate tasks in different image domains. In this paper, we bring them together and introduce the task of unified scene text detection and layout analysis.	Shangbang Long; Siyang Qin; Dmitry Panteleev; Alessandro Bissacco; Yasuhisa Fujii; Michalis Raptis;	code
96	AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an autoregressive prior for 3D shapes to solve multimodal 3D tasks such as shape completion, reconstruction, and generation.	Paritosh Mittal; Yen-Chi Cheng; Maneesh Singh; Shubham Tulsiani;	code
97	ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we first show that optimal neural architectures in the DIP framework are image-dependent. Leveraging this insight, we then propose an image-specific NAS strategy for the DIP framework that requires substantially less training than typical NAS approaches, effectively enabling image-specific NAS.	Metin Ersin Arican; Ozgur Kara; Gustav Bredell; Ender Konukoglu;	code
98	End-to-End Referring Video Object Segmentation With Multimodal Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a simple Transformer-based approach to RVOS.	Adam Botach; Evgenii Zheltonozhskii; Chaim Baskin;	code
99	IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present IterMVS, a new data-driven method for high-resolution multi-view stereo.	Fangjinhua Wang; Silvano Galliani; Christoph Vogel; Marc Pollefeys;	code
100	Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In particular, the foreground points are inherently more important than background points for object detectors. Motivated by this, we propose a highly-efficient single-stage point-based 3D detector in this paper, termed IA-SSD.	Yifan Zhang; Qingyong Hu; Guoquan Xu; Yanxin Ma; Jianwei Wan; Yulan Guo;	code
101	Detecting Camouflaged Object in Frequency Domain Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To better incorporate frequency clues into CNN models, we present a powerful network with two special components.	Yijie Zhong; Bo Li; Lv Tang; Senyun Kuang; Shuang Wu; Shouhong Ding;	code
102	SelfRecon: Self Reconstruction Your Digital Avatar From Monocular Video Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose SelfRecon, a clothed human body reconstruction method that combines implicit and explicit representations to recover space-time coherent geometries from a monocular self-rotating human video.	Boyi Jiang; Yang Hong; Hujun Bao; Juyong Zhang;	code
103	Equivariant Point Cloud Analysis Via Learning Orientations for Message Passing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel and simple framework to achieve equivariance for point cloud analysis based on the message passing (graph neural network) scheme.	Shitong Luo; Jiahan Li; Jiaqi Guan; Yufeng Su; Chaoran Cheng; Jian Peng; Jianzhu Ma;	code
104	Node Representation Learning in Graph Via Node-to-Neighbourhood Mutual Information Maximization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present a simple-yet-effective self-supervised node representation learning strategy via directly maximizing the mutual information between the hidden representations of nodes and their neighbourhood, which can be theoretically justified by its link to graph smoothing.	Wei Dong; Junsheng Wu; Yi Luo; Zongyuan Ge; Peng Wang;	code
105	Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This is called inner-video overfitting, and it would actually lead to inferior performance. To tackle this issue, we propose a novel inter-frame feature reconstruction (IFR) technique to leverage the ground-truth labels to supervise the model training on unlabeled frames.	Jiafan Zhuang; Zilei Wang; Yuan Gao;	code
106	Amodal Segmentation Through Out-of-Task and Out-of-Distribution Generalization With A Bayesian Model Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This task is particularly challenging for deep neural networks because data is difficult to obtain and annotate. Therefore, we formulate amodal segmentation as an out-of-task and out-of-distribution generalization problem.	Yihong Sun; Adam Kortylewski; Alan Yuille;	code
107	How Well Do Sparse ImageNet Models Transfer? Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Generally, more accurate models on the "upstream" dataset tend to provide better transfer accuracy "downstream". In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset, which have been pruned, that is, compressed by sparsifying their connections.	Eugenia Iofinova; Alexandra Peste; Mark Kurtz; Dan Alistarh;	code
108	REX: Reasoning-Aware and Grounded Explanation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper aims to close the gap from three distinct perspectives: first, we define a new type of multi-modal explanations that explain the decisions by progressively traversing the reasoning process and grounding keywords in the images.	Shi Chen; Qi Zhao;	code
109	Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we disentangle the direct offset into Local Canonical Coordinates (LCC), box scales and box orientations.	Yang You; Zelin Ye; Yujing Lou; Chengkun Li; Yong-Lu Li; Lizhuang Ma; Weiming Wang; Cewu Lu;	code
110	Object-Aware Video-Language Pre-Training for Retrieval Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present Object-aware Transformers, an object-centric approach that extends video-language transformer to incorporate object representations.	Jinpeng Wang; Yixiao Ge; Guanyu Cai; Rui Yan; Xudong Lin; Ying Shan; Xiaohu Qie; Mike Zheng Shou;	code
111	MAT: Mask-Aware Transformer for Large Hole Image Inpainting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a novel transformer-based model for large hole inpainting, which unifies the merits of transformers and convolutions to efficiently process high-resolution images.	Wenbo Li; Zhe Lin; Kun Zhou; Lu Qi; Yi Wang; Jiaya Jia;	code
112	Align and Prompt: Video-and-Language Pre-Training With Entity Prompts Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose Align and Prompt: an efficient and effective video-and-language pre-training framework with better cross-modal alignment.	Dongxu Li; Junnan Li; Hongdong Li; Juan Carlos Niebles; Steven C.H. Hoi;	code
113	MSG-Transformer: Exchanging Local Spatial Information By Manipulating Messenger Tokens Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper aims to alleviate the conflict between efficiency and flexibility, for which we propose a specialized token for each region that serves as a messenger (MSG).	Jiemin Fang; Lingxi Xie; Xinggang Wang; Xiaopeng Zhang; Wenyu Liu; Qi Tian;	code
114	Cross Modal Retrieval With Querybank Normalisation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Drawing inspiration from the NLP literature, we formulate a simple but effective framework called Querybank Normalisation (QB-Norm) that re-normalises query similarities to account for hubs in the embedding space.	Simion-Vlad Bogolin; Ioana Croitoru; Hailin Jin; Yang Liu; Samuel Albanie;	code
115	Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel monocular ray-based 3D (Ray3D) absolute human pose estimation with calibrated camera.	Yu Zhan; Fenghai Li; Renliang Weng; Wongun Choi;	code
116	ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, this formulation typically treats snippets in a video as independent instances, ignoring the underlying temporal structures within and across action segments. To address this problem, we propose ASM-Loc, a novel WTAL framework that enables explicit, action-aware segment modeling beyond standard MIL-based methods.	Bo He; Xitong Yang; Le Kang; Zhiyu Cheng; Xin Zhou; Abhinav Shrivastava;	code
117	Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Following the guidelines, we propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31×31, in contrast to commonly used 3×3.	Xiaohan Ding; Xiangyu Zhang; Jungong Han; Guiguang Ding;	code
118	End-to-End Multi-Person Pose Estimation With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose the first fully end-to-end multi-person Pose Estimation framework with TRansformers, termed PETR.	Dahu Shi; Xing Wei; Liangqi Li; Ye Ren; Wenming Tan;	code
119	REGTR: End-to-End Point Cloud Correspondences With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we conjecture that attention mechanisms can replace the role of explicit feature matching and RANSAC, and thus propose an end-to-end framework to directly predict the final set of correspondences.	Zi Jian Yew; Gim Hee Lee;	code
120	Neural 3D Scene Reconstruction With The Manhattan-World Assumption Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we show that the planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods.	Haoyu Guo; Sida Peng; Haotong Lin; Qianqian Wang; Guofeng Zhang; Hujun Bao; Xiaowei Zhou;	code
121	V2C: Visual Voice Cloning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, there also exist many scenarios that cannot be well reflected by these VC tasks, such as movie dubbing, which requires the speech to be with emotions consistent with the movie plots. To fill this gap, in this work we propose a new task named Visual Voice Cloning (V2C), which seeks to convert a paragraph of text to a speech with both desired voice specified by a reference audio and desired emotion specified by a reference video.	Qi Chen; Mingkui Tan; Yuankai Qi; Jiaqiu Zhou; Yuanqing Li; Qi Wu;	code
122	Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we revisit the average precision (AP) loss and reveal that the crucial element is that of selecting the ranking pairs between positive and negative samples.	Dongli Xu; Jinhong Deng; Wen Li;	code
123	MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present MAD (Movie Audio Descriptions), a novel benchmark that departs from the paradigm of augmenting existing video datasets with text annotations and focuses on crawling and aligning available audio descriptions of mainstream movies.	Mattia Soldan; Alejandro Pardo; Juan León Alcázar; Fabian Caba; Chen Zhao; Silvio Giancola; Bernard Ghanem;	code
124	Gait Recognition in The Wild With Dense 3D Representations and A Benchmark Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In particular, we propose a novel framework to explore the 3D Skinned Multi-Person Linear (SMPL) model of the human body for gait recognition, named SMPLGait.	Jinkai Zheng; Xinchen Liu; Wu Liu; Lingxiao He; Chenggang Yan; Tao Mei;	code
125	ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation Via Online Exploration and Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, constructing both valid and diverse hand-object interactions and efficiently learning from the vast synthetic data is still challenging. To address the above issues, we propose ArtiBoost, a lightweight online data enhancement method.	Lixin Yang; Kailin Li; Xinyu Zhan; Jun Lv; Wenqiang Xu; Jiefeng Li; Cewu Lu;	code
126	QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To get the best of two worlds, we propose QueryDet that uses a novel query mechanism to accelerate the inference speed of feature-pyramid based object detectors.	Chenhongyi Yang; Zehao Huang; Naiyan Wang;	code
127	IDEA-Net: Dynamic 3D Point Cloud Interpolation Via Deep Embedding Alignment Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To tackle the challenges, we propose IDEA-Net, an end-to-end deep learning framework, which disentangles the problem under the assistance of the explicitly learned temporal consistency.	Yiming Zeng; Yue Qian; Qijian Zhang; Junhui Hou; Yixuan Yuan; Ying He;	code
128	BEHAVE: Dataset and Method for Tracking Human Object Interactions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our key insight is to predict correspondences from the human and the object to a statistical body model to obtain human-object contacts during interactions.	Bharat Lal Bhatnagar; Xianghui Xie; Ilya A. Petrov; Cristian Sminchisescu; Christian Theobalt; Gerard Pons-Moll;	code
129	Revisiting Random Channel Pruning for Neural Network Compression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we try to determine the channel configuration of the pruned models by random search.	Yawei Li; Kamil Adamczewski; Wen Li; Shuhang Gu; Radu Timofte; Luc Van Gool;	code
130	Generating Diverse and Natural 3D Human Motions From Text Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Instead of directly engaging with pose sequences, we propose motion snippet code as our internal motion representation, which captures local semantic motion contexts and is empirically shown to facilitate the generation of plausible motions faithful to the input text.	Chuan Guo; Shihao Zou; Xinxin Zuo; Sen Wang; Wei Ji; Xingyu Li; Li Cheng;	code
131	E-CIR: Event-Enhanced Continuous Intensity Recovery Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents E-CIR, which converts a blurry image into a sharp video represented as a parametric function from time to intensity.	Chen Song; Qixing Huang; Chandrajit Bajaj;	code
132	Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A systematic evaluation of key modules in existing methods is performed in terms of their robustness against adversarial attacks. From the insights of our analysis, we construct a more robust deraining method by integrating these effective modules.	Yi Yu; Wenhan Yang; Yap-Peng Tan; Alex C. Kot;	code
133	Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a keypoint-based object-level SLAM framework that can provide globally consistent 6DoF pose estimates for symmetric and asymmetric objects alike.	Nathaniel Merrill; Yuliang Guo; Xingxing Zuo; Xinyu Huang; Stefan Leutenegger; Xi Peng; Liu Ren; Guoquan Huang;	code
134	AziNorm: Exploiting The Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception Highlight: Point cloud, the most important data format for 3D environmental perception, is naturally endowed with strong radial symmetry. In this work, we exploit this radial symmetry via a divide-and-conquer strategy to boost 3D perception performance and ease optimization.	Shaoyu Chen; Xinggang Wang; Tianheng Cheng; Wenqiang Zhang; Qian Zhang; Chang Huang; Wenyu Liu;	code
135	Weakly Supervised Rotation-Invariant Aerial Object Detection Network Highlight: Meanwhile, current solutions have been prone to fall into the issue with unstable detectors, as they ignore lower-scored instances and may regard them as backgrounds. To address these issues, in this paper, we construct a novel end-to-end weakly supervised Rotation-Invariant aerial object detection Network (RINet).	Xiaoxu Feng; Xiwen Yao; Gong Cheng; Junwei Han;	code
136	Surface Reconstruction From Point Clouds By Learning Predictive Context Priors Highlight: However, this requires the local context prior to generalize to a wide variety of unseen target regions, which is hard to achieve. To resolve this issue, we introduce Predictive Context Priors by learning Predictive Queries for each specific point cloud at inference time.	Baorui Ma; Yu-Shen Liu; Matthias Zwicker; Zhizhong Han;	code
137	IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes Highlight: In this work, our intuition is that the long-range attention learned by transformer architectures is ideally suited to solve longstanding challenges in single-image inverse rendering.	Rui Zhu; Zhengqin Li; Janarbek Matai; Fatih Porikli; Manmohan Chandraker;	code
138	DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation Highlight: To that end, we propose the DynamicEarthNet dataset that consists of daily, multi-spectral satellite observations of 75 selected areas of interest distributed over the globe with imagery from Planet Labs.	Aysim Toker; Lukas Kondmann; Mark Weber; Marvin Eisenberger; Andrés Camero; Jingliang Hu; Ariadna Pregel Hoderlein; Çağlar Şenaras; Timothy Davis; Daniel Cremers; Giovanni Marchisio; Xiao Xiang Zhu; Laura Leal-Taixé;	code
139	Weakly Supervised Temporal Action Localization Via Representative Snippet Knowledge Propagation Highlight: Many existing methods seek to generate pseudo labels for bridging the discrepancy between classification and localization, but usually only make use of limited contextual information for pseudo label generation. To alleviate this problem, we propose a representative snippet summarization and propagation framework.	Linjiang Huang; Liang Wang; Hongsheng Li;	code
140	E2EC: An End-to-End Contour-Based Method for High-Quality High-Speed Instance Segmentation Highlight: In this paper, we introduce a novel contour-based method, named E2EC, for high-quality instance segmentation.	Tao Zhang; Shiqing Wei; Shunping Ji;	code
141	BatchFormer: Learning To Explore Sample Relationships for Robust Representation Learning Highlight: To address the above-mentioned issues, a variety of methods have been devised to explore the sample relationships in a vanilla way (i.e., from the perspectives of either the input or the loss function), failing to explore the internal structure of deep neural networks for learning with sample relationships. Inspired by this, we propose to enable deep neural networks themselves with the ability to learn the sample relationships from each mini-batch.	Zhi Hou; Baosheng Yu; Dacheng Tao;	code
142	Self-Supervised Image-Specific Prototype Exploration for Weakly Supervised Semantic Segmentation Highlight: However, the classifier focuses only on the discriminative regions while ignoring other useful information in each image, resulting in incomplete localization maps. To address this issue, we propose a Self-supervised Image-specific Prototype Exploration (SIPE) that consists of an Image-specific Prototype Exploration (IPE) and a General-Specific Consistency (GSC) loss.	Qi Chen; Lingxiao Yang; Jian-Huang Lai; Xiaohua Xie;	code
143	Learning Multi-View Aggregation in The Wild for Large-Scale 3D Semantic Segmentation Highlight: In contrast, we propose an end-to-end trainable multi-view aggregation model leveraging the viewing conditions of 3D points to merge features from images taken at arbitrary positions.	Damien Robert; Bruno Vallet; Loic Landrieu;	code
144	PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition Highlight: These methods can be negatively influenced by strong illumination conditions causing shading-reflectance leakages. Therefore, in this paper, an end-to-end edge-driven hybrid CNN approach is proposed for intrinsic image decomposition.	Partha Das; Sezer Karaoglu; Theo Gevers;	code
145	Clothes-Changing Person Re-Identification With RGB Modality Only Highlight: In this paper, we propose a Clothes-based Adversarial Loss (CAL) to mine clothes-irrelevant features from the original RGB images by penalizing the predictive power of re-id model w.r.t. clothes.	Xinqian Gu; Hong Chang; Bingpeng Ma; Shutao Bai; Shiguang Shan; Xilin Chen;	code
146	Robust Image Forgery Detection Over Online Social Network Shared Images Highlight: To fight against the OSN-shared forgeries, in this work, a novel robust training scheme is proposed.	Haiwei Wu; Jiantao Zhou; Jinyu Tian; Jun Liu;	code
147	Representation Compensation Networks for Continual Semantic Segmentation Highlight: In this work, we study the continual semantic segmentation problem, where the deep neural networks are required to incorporate new classes continually without catastrophic forgetting.	Chang-Bin Zhang; Jia-Wen Xiao; Xialei Liu; Ying-Cong Chen; Ming-Ming Cheng;	code
148	Tracking People By Predicting 3D Appearance, Location and Pose Highlight: We present an approach for tracking people in monocular videos by predicting their future 3D representations.	Jathushan Rajasegaran; Georgios Pavlakos; Angjoo Kanazawa; Jitendra Malik;	code
149	Text2Mesh: Text-Driven Neural Stylization for Meshes Highlight: In this work, we develop intuitive controls for editing the style of 3D objects.	Oscar Michel; Roi Bar-On; Richard Liu; Sagie Benaim; Rana Hanocka;	code
150	C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image Highlight: In this paper, we find that there are mainly two challenges of medical images in WSSS: i) the boundary of object foreground and background is not clear; ii) the co-occurrence phenomenon is very severe in training stage. We thus propose a Causal CAM (C-CAM) method to overcome the above challenges.	Zhang Chen; Zhiqiang Tian; Jihua Zhu; Ce Li; Shaoyi Du;	code
151	Forward Compatible Few-Shot Class-Incremental Learning Highlight: By contrast, we suggest learning prospectively to prepare for future updates, and propose ForwArd Compatible Training (FACT) for FSCIL.	Da-Wei Zhou; Fu-Yun Wang; Han-Jia Ye; Liang Ma; Shiliang Pu; De-Chuan Zhan;	code
152	Weakly Supervised Object Localization As Domain Adaption Highlight: However, the MIL mechanism makes CAM only activate discriminative object parts rather than the whole object, weakening its performance for localizing objects. To avoid this problem, this work provides a novel perspective that models WSOL as a domain adaption (DA) task, where the score estimator trained on the source/image domain is tested on the target/pixel domain to locate objects.	Lei Zhu; Qi She; Qian Chen; Yunfei You; Boyu Wang; Yanye Lu;	code
153	Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation Highlight: In this paper, we propose the Tencent-MVSE dataset, which is the first benchmark dataset for the multi-modal video similarity evaluation task.	Zhaoyang Zeng; Yongsheng Luo; Zhenhua Liu; Fengyun Rao; Dian Li; Weidong Guo; Zhen Wen;	code
154	Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching Highlight: Meanwhile, recent advances in the functional map framework make it possible to enforce orientation preservation using a functional representation for tangent vector field transfer, through so-called complex functional maps. Using this representation, we propose a new deep learning approach to learn orientation-aware features in a fully unsupervised setting.	Nicolas Donati; Etienne Corman; Maks Ovsjanikov;	code
155	Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation Highlight: In this paper, we propose a novel tree energy loss for SASS by providing semantic guidance for unlabeled pixels.	Zhiyuan Liang; Tiancai Wang; Xiangyu Zhang; Jian Sun; Jianbing Shen;	code
156	MatteFormer: Transformer-Based Image Matting Via Prior-Tokens Highlight: In this paper, we propose a transformer-based image matting model called MatteFormer, which takes full advantage of trimap information in the transformer block.	GyuTae Park; SungJoon Son; JaeYoung Yoo; SeHo Kim; Nojun Kwak;	code
157	Video Shadow Detection Via Spatio-Temporal Interpolation Consistency Training Highlight: Directly applying a model trained on labeled images to video frames may lead to high generalization error and temporally inconsistent results. In this paper, we address these challenges by proposing a Spatio-Temporal Interpolation Consistency Training (STICT) framework to rationally feed the unlabeled video frames together with the labeled images into an image shadow detection network training.	Xiao Lu; Yihong Cao; Sheng Liu; Chengjiang Long; Zipei Chen; Xuanyu Zhou; Yimin Yang; Chunxia Xiao;	code
158	Robust and Accurate Superquadric Recovery: A Probabilistic Approach Highlight: The superquadric recovery is formulated as a Maximum Likelihood Estimation (MLE) problem. We propose an algorithm, Expectation, Maximization, and Switching (EMS), to solve this problem, where: (1) outliers are predicted from the posterior perspective; (2) the superquadric parameter is optimized by the trust-region reflective algorithm; and (3) local optima are avoided by globally searching and switching among parameters encoding similar superquadrics.	Weixiao Liu; Yuwei Wu; Sipu Ruan; Gregory S. Chirikjian;	code
159	Grounding Answers for Visual Questions Asked By Visually Impaired People Highlight: We introduce the VizWiz-VQA-Grounding dataset, the first dataset that visually grounds answers to visual questions asked by people with visual impairments.	Chongyan Chen; Samreen Anjum; Danna Gurari;	code
160	Sparse Instance Activation for Real-Time Instance Segmentation Highlight: In this paper, we propose a conceptually novel, efficient, and fully convolutional framework for real-time instance segmentation.	Tianheng Cheng; Xinggang Wang; Shaoyu Chen; Wenqiang Zhang; Qian Zhang; Chang Huang; Zhaoxiang Zhang; Wenyu Liu;	code
161	VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning Highlight: We propose VisualGPT, which employs a novel self-resurrecting encoder-decoder attention mechanism to quickly adapt the PLM with a small amount of in-domain image-text data.	Jun Chen; Han Guo; Kai Yi; Boyang Li; Mohamed Elhoseiny;	code
162	MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation Highlight: However, those works ignore the fact that it is an inverse problem where multiple feasible solutions (i.e., hypotheses) exist. To relieve this limitation, we propose a Multi-Hypothesis Transformer (MHFormer) that learns spatio-temporal representations of multiple plausible pose hypotheses.	Wenhao Li; Hong Liu; Hao Tang; Pichao Wang; Luc Van Gool;	code
163	Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis Highlight: We propose a new method for reconstructing controllable implicit 3D human models from sparse multi-view RGB videos.	Tianhan Xu; Yasuhiro Fujita; Eiichi Matsumoto;	code
164	Towards Implicit Text-Guided 3D Shape Generation Highlight: In this work, we explore the challenging task of generating 3D shapes from text.	Zhengzhe Liu; Yi Wang; Xiaojuan Qi; Chi-Wing Fu;	code
165	SoftCollage: A Differentiable Probabilistic Tree Generator for Image Collage Highlight: We propose SoftCollage, a novel method that employs a neural-based differentiable probabilistic tree generator to produce the probability distribution of correlation-preserving collage tree conditioned on deep image feature, aspect ratio and canvas size.	Jiahao Yu; Li Chen; Mingrui Zhang; Mading Li;	code
166	Query and Attention Augmentation for Knowledge-Based Explainable Reasoning Highlight: To bridge this research gap, we present Query and Attention Augmentation, a general approach that augments neural module networks to jointly reason about visual and external knowledge.	Yifeng Zhang; Ming Jiang; Qi Zhao;	code
167	Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality Highlight: We present a novel task and dataset for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning, which we call Winoground.	Tristan Thrush; Ryan Jiang; Max Bartolo; Amanpreet Singh; Adina Williams; Douwe Kiela; Candace Ross;	code
168	Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection Highlight: The main challenge of this task is perceiving various temporal variations of diverse event boundaries. To this end, this paper presents an effective and end-to-end learnable framework (DDM-Net).	Jiaqi Tang; Zhaoyang Liu; Chen Qian; Wayne Wu; Limin Wang;	code
169	Fine-Grained Object Classification Via Self-Supervised Pose Alignment Highlight: For discounting pose variations, this paper proposes to learn a novel graph based object representation to reveal a global configuration of local parts for self-supervised pose alignment across classes, which is employed as an auxiliary feature regularization on a deep representation learning network.	Xuhui Yang; Yaowei Wang; Ke Chen; Yong Xu; Yonghong Tian;	code
170	Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding Highlight: However, existing animal behavior datasets have limitations in multiple aspects, including limited numbers of animal classes, data samples and provided tasks, and also limited variations in environmental conditions and viewpoints. To address these limitations, we create a large and diverse dataset, Animal Kingdom, that provides multiple annotated tasks to enable a more thorough understanding of natural animal behaviors.	Xun Long Ng; Kian Eng Ong; Qichen Zheng; Yun Ni; Si Yong Yeo; Jun Liu;	code
171	Fine-Grained Temporal Contrastive Learning for Weakly-Supervised Temporal Action Localization Highlight: This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in WSAL and helps identify coherent action instances.	Junyu Gao; Mengyuan Chen; Changsheng Xu;	code
172	Relieving Long-Tailed Instance Segmentation Via Pairwise Class Balance Highlight: In this paper, we explore to excavate the confusion matrix, which carries the fine-grained misclassification details, to relieve the pairwise biases, generalizing the coarse one.	Yin-Yin He; Peizhen Zhang; Xiu-Shen Wei; Xiangyu Zhang; Jian Sun;	code
173	Online Convolutional Re-Parameterization Highlight: In this paper, we present online convolutional re-parameterization (OREPA), a two-stage pipeline, aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.	Mu Hu; Junyi Feng; Jiashen Hua; Baisheng Lai; Jianqiang Huang; Xiaojin Gong; Xian-Sheng Hua;	code
174	Mimicking The Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning Highlight: We find that, with fewer training classes, the data representations of each class lie in a long and narrow region; with more training classes, the representations of each class scatter more uniformly. Inspired by this observation, we propose Class-wise Decorrelation (CwD) that effectively regularizes representations of each class to scatter more uniformly, thus mimicking the model jointly trained with all classes (i.e., the oracle model).	Yujun Shi; Kuangqi Zhou; Jian Liang; Zihang Jiang; Jiashi Feng; Philip H.S. Torr; Song Bai; Vincent Y. F. Tan;	code
175	RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition Highlight: This paper shows that modeling an effective message-passing flow through an attention mechanism can be critical to tackling the compositionality and long-tail challenges in VRR.	Jun Chen; Aniket Agarwal; Sherif Abdelkarim; Deyao Zhu; Mohamed Elhoseiny;	code
176	Personalized Image Aesthetics Assessment With Rich Attributes Highlight: To solve the dilemma, we conduct the most comprehensive subjective study of personalized image aesthetics to date and introduce a new Personalized image Aesthetics database with Rich Attributes (PARA), which consists of 31,220 images with annotations by 438 subjects.	Yuzhe Yang; Liwu Xu; Leida Li; Nan Qie; Yaqian Li; Peng Zhang; Yandong Guo;	code
177	Part-Based Pseudo Label Refinement for Unsupervised Person Re-Identification Highlight: In this paper, we propose a novel Part-based Pseudo Label Refinement (PPLR) framework that reduces the label noise by employing the complementary relationship between global and part features.	Yoonki Cho; Woo Jae Kim; Seunghoon Hong; Sung-Eui Yoon;	code
178	HDNet: High-Resolution Dual-Domain Learning for Spectral Compressive Imaging Highlight: So we propose a high-resolution dual-domain learning network (HDNet) for HSI reconstruction.	Xiaowan Hu; Yuanhao Cai; Jing Lin; Haoqian Wang; Xin Yuan; Yulun Zhang; Radu Timofte; Luc Van Gool;	code
179	OW-DETR: Open-World Detection Transformer Highlight: Here, we introduce a novel end-to-end transformer-based framework, OW-DETR, for open-world object detection.	Akshita Gupta; Sanath Narayan; K J Joseph; Salman Khan; Fahad Shahbaz Khan; Mubarak Shah;	code
180	Learning Deep Implicit Functions for 3D Shapes With Dynamic Code Clouds Highlight: However, the local codes are constrained at discrete and regular positions like grid points, which makes the code positions difficult to be optimized and limits their representation ability. To solve this problem, we propose to learn DIF with Dynamic Code Cloud, named DCC-DIF.	Tianyang Li; Xin Wen; Yu-Shen Liu; Hua Su; Zhizhong Han;	code
181	Reversible Vision Transformers Highlight: We present Reversible Vision Transformers, a memory efficient architecture design for visual recognition.	Karttikeya Mangalam; Haoqi Fan; Yanghao Li; Chao-Yuan Wu; Bo Xiong; Christoph Feichtenhofer; Jitendra Malik;	code
182	Amodal Panoptic Segmentation Highlight: This ability of amodal perception forms the basis of our perceptual and cognitive understanding of our world. To enable robots to reason with this capability, we formulate and propose a novel task that we name amodal panoptic segmentation.	Rohit Mohan; Abhinav Valada;	code
183	Correlation Verification for Image Retrieval Highlight: In this study, we propose a novel image retrieval re-ranking network named Correlation Verification Networks (CVNet).	Seongwon Lee; Hongje Seong; Suhyeon Lee; Euntai Kim;	code
184	Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation Highlight: More importantly, existing approaches build upon the straightforward pose estimation loss, which unfortunately cannot constrain the network to fully leverage useful information from neighboring frames. To tackle these problems, we present a novel hierarchical alignment framework, which leverages coarse-to-fine deformations to progressively update a neighboring frame to align with the current frame at the feature level.	Zhenguang Liu; Runyang Feng; Haoming Chen; Shuang Wu; Yixing Gao; Yunjun Gao; Xiang Wang;	code
185	Self-Supervised Transformers for Unsupervised Object Discovery Using Normalized Cut Highlight: In this paper, we show a graph-based method that uses the self-supervised transformer features to discover an object from an image.	Yangtao Wang; Xi Shen; Shell Xu Hu; Yuan Yuan; James L. Crowley; Dominique Vaufreydaz;	code
186	Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection Highlight: In this work, we design a novel Transformer-style HOI detector, i.e., Structure-aware Transformer over Interaction Proposals (STIP), for HOI detection.	Yong Zhang; Yingwei Pan; Ting Yao; Rui Huang; Tao Mei; Chang-Wen Chen;	code
187	Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing Highlight: This paper probes intrinsic factors behind typical failure cases (e.g., spatial inconsistency and boundary confusion) produced by the existing state-of-the-art method in face parsing. To tackle these problems, we propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) for face parsing.	Qingping Zheng; Jiankang Deng; Zheng Zhu; Ying Li; Stefanos Zafeiriou;	code
188	Glass: Geometric Latent Augmentation for Shape Spaces Highlight: We investigate the problem of training generative models on very sparse collections of 3D models.	Sanjeev Muralikrishnan; Siddhartha Chaudhuri; Noam Aigerman; Vladimir G. Kim; Matthew Fisher; Niloy J. Mitra;	code
189	DPICT: Deep Progressive Image Compression Using Trit-Planes Highlight: We propose the deep progressive image compression using trit-planes (DPICT) algorithm, which is the first learning-based codec supporting fine granular scalability (FGS).	Jae-Han Lee; Seungmin Jeon; Kwang Pyo Choi; Youngo Park; Chang-Su Kim;	code
190	Text to Image Generation With Semantic-Spatial Aware GAN Highlight: A close inspection of their generated images reveals a major limitation: even though the generated image holistically matches the description, individual image regions or parts of objects are often not recognizable or consistent with words in the sentence, e.g., "a white crown". To address this problem, we propose a novel framework Semantic-Spatial Aware GAN for synthesizing images from input text.	Wentong Liao; Kai Hu; Michael Ying Yang; Bodo Rosenhahn;	code
191	Generalizable Cross-Modality Medical Image Segmentation Via Style Augmentation and Dual Normalization Highlight: This setting, namely generalizable cross-modality segmentation, valued for its clinical potential, is much more challenging than other related settings, e.g., domain adaptation. To achieve this goal, we propose a novel dual-normalization model that leverages augmented source-similar and source-dissimilar images during generalizable segmentation.	Ziqi Zhou; Lei Qi; Xin Yang; Dong Ni; Yinghuan Shi;	code
192	Learning To Prompt for Open-Vocabulary Object Detection With Vision-Language Model Highlight: In this paper, we introduce a novel method, detection prompt (DetPro), to learn continuous prompt representations for open-vocabulary object detection based on the pre-trained vision-language model.	Yu Du; Fangyun Wei; Zihe Zhang; Miaojing Shi; Yue Gao; Guoqi Li;	code
193	Interactive Segmentation and Visualization for Tiny Objects in Multi-Megapixel Images Highlight: We introduce an interactive image segmentation and visualization framework for identifying, inspecting, and editing tiny objects (just a few pixels wide) in large multi-megapixel high-dynamic-range (HDR) images.	Chengyuan Xu; Boning Dong; Noah Stier; Curtis McCully; D. Andrew Howell; Pradeep Sen; Tobias Höllerer;	code
194	Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture Highlight: In this work, we focus on exploiting the high-precision and non-differentiable physics simulator to incorporate dynamical constraints in motion capture.	Buzhen Huang; Liang Pan; Yuan Yang; Jingyi Ju; Yangang Wang;	code
195	Surface Representation for Point Clouds Highlight: In this paper, we present RepSurf (representative surfaces), a novel representation of point clouds to explicitly depict the very local structure.	Haoxi Ran; Jun Liu; Chengjie Wang;	code
196	Implicit Motion Handling for Video Camouflaged Object Detection Highlight: We propose a new video camouflaged object detection (VCOD) framework that can exploit both short-term dynamics and long-term temporal consistency to detect camouflaged objects from video frames.	Xuelian Cheng; Huan Xiong; Deng-Ping Fan; Yiran Zhong; Mehrtash Harandi; Tom Drummond; Zongyuan Ge;	code
197	DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides Highlight: We present DeepLIIF (https://deepliif.org), a first free online platform for efficient and reproducible IHC scoring.	Parmida Ghahremani; Joseph Marino; Ricardo Dodds; Saad Nadeem;	code
198	Learning With Twin Noisy Labels for Visible-Infrared Person Re-Identification Highlight: In this paper, we study an untouched problem in visible-infrared person re-identification (VI-ReID), namely Twin Noise Labels (TNL), which refers to noisy annotation and noisy correspondence.	Mouxing Yang; Zhenyu Huang; Peng Hu; Taihao Li; Jiancheng Lv; Xi Peng;	code
199	Optical Flow Estimation for Spiking Camera Highlight: However, frame-based and event-based methods are not well suited to spike streams from the spiking camera due to the different data modalities. To this end, we present SCFlow, a tailored deep learning pipeline to estimate optical flow in high-speed scenes from spike streams.	Liwen Hu; Rui Zhao; Ziluo Ding; Lei Ma; Boxin Shi; Ruiqin Xiong; Tiejun Huang;	code
200	GradViT: Gradient Inversion of Vision Transformers Highlight: In this work we demonstrate the vulnerability of vision transformers (ViTs) to gradient-based inversion attacks.	Ali Hatamizadeh; Hongxu Yin; Holger R. Roth; Wenqi Li; Jan Kautz; Daguang Xu; Pavlo Molchanov;	code
201	Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution Via Cycle-Projected Mutual Learning Highlight: To this end, we propose a one-stage based Cycle-projected Mutual learning network (CycMu-Net) for ST-VSR, which makes full use of spatial-temporal correlations via the mutual learning between S-VSR and T-VSR.	Mengshun Hu; Kui Jiang; Liang Liao; Jing Xiao; Junjun Jiang; Zheng Wang;	code
202	Joint Global and Local Hierarchical Priors for Learned Image Compression Highlight: However, CNNs have a limitation in modeling long-range dependencies due to their nature of local connectivity, which can be a significant bottleneck in image compression where reducing spatial redundancy is a key point. To overcome this issue, we propose a novel entropy model called Information Transformer (Informer) that exploits both global and local information in a content-dependent manner using an attention mechanism.	Jun-Hyuk Kim; Byeongho Heo; Jong-Seok Lee;	code
203	Knowledge Distillation Via The Target-Aware Transformer Highlight: This greatly undermines the underlying assumption of the one-to-one distillation approach. To this end, we propose a novel one-to-all spatial matching knowledge distillation approach.	Sihao Lin; Hongwei Xie; Bing Wang; Kaicheng Yu; Xiaojun Chang; Xiaodan Liang; Gang Wang;	code
204	Subspace Adversarial Training Highlight: To control the growth of the gradient, we propose a new AT method, Subspace Adversarial Training (Sub-AT), which constrains AT in a carefully extracted subspace.	Tao Li; Yingwen Wu; Sizhe Chen; Kun Fang; Xiaolin Huang;	code
205	3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection Highlight: In this work, we substantially improve the generalization of 3D object detectors to out-of-domain data by deforming point clouds during training.	Alexander Lehner; Stefano Gasperini; Alvaro Marcos-Ramiro; Michael Schmidt; Mohammad-Ali Nikouei Mahani; Nassir Navab; Benjamin Busam; Federico Tombari;	code
206	Image Segmentation Using Text and Image Prompts Highlight: Here we propose a system that can generate image segmentations based on arbitrary prompts at test time.	Timo Lüddecke; Alexander Ecker;	code
207	AutoMine: An Unmanned Mine Dataset Highlight: Moreover, the open-pit mine is one of the typical representatives for them. Therefore, we introduce the Autonomous driving dataset on the Mining scene (AutoMine) for positioning and perception tasks in this paper.	Yuchen Li; Zixuan Li; Siyu Teng; Yu Zhang; Yuhang Zhou; Yuchang Zhu; Dongpu Cao; Bin Tian; Yunfeng Ai; Zhe Xuanyuan; Long Chen;	code
208	Background Activation Suppression for Weakly Supervised Object Localization Highlight: In this paper, we propose a Background Activation Suppression (BAS) method.	Pingyu Wu; Wei Zhai; Yang Cao;	code
209	Synthetic Generation of Face Videos With Plethysmograph Physiology Highlight: This paper proposes a scalable biophysical learning based method to generate physio-realistic synthetic rPPG videos given any reference image and target rPPG signal and shows that it could further improve the state-of-the-art physiological measurement and reduce the bias among different groups.	Zhen Wang; Yunhao Ba; Pradyumna Chari; Oyku Deniz Bozkurt; Gianna Brown; Parth Patwa; Niranjan Vaddi; Laleh Jalilian; Achuta Kadambi;	code
210	Hallucinated Neural Radiance Fields in The Wild Highlight: Existing solutions adopt NeRF with a controllable appearance embedding to render novel views under various conditions, but they cannot render view-consistent images with an unseen appearance. To solve this problem, we present an end-to-end framework for constructing a hallucinated NeRF, dubbed as Ha-NeRF.	Xingyu Chen; Qi Zhang; Xiaoyu Li; Yue Chen; Ying Feng; Xuan Wang; Jue Wang;	code
211	Global Tracking Transformers Highlight: We present a novel transformer-based architecture for global multi-object tracking.	Xingyi Zhou; Tianwei Yin; Vladlen Koltun; Philipp Krähenbühl;	code
212	Backdoor Attacks on Self-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Backdoor attacks have been studied extensively in supervised learning and to the best of our knowledge, we are the first to study them for self-supervised learning.	Aniruddha Saha; Ajinkya Tejankar; Soroush Abbasi Koohpayegani; Hamed Pirsiavash;	code
213	GMFlow: Learning Optical Flow Via Global Matching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Specifically, we propose a GMFlow framework, which consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation.	Haofei Xu; Jing Zhang; Jianfei Cai; Hamid Rezatofighi; Dacheng Tao;	code
214	Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To fully utilize the rich connections between speech audio and human gestures, we propose a novel framework named Hierarchical Audio-to-Gesture (HA2G) for co-speech gesture generation.	Xian Liu; Qianyi Wu; Hang Zhou; Yinghao Xu; Rui Qian; Xinyi Lin; Xiaowei Zhou; Wayne Wu; Bo Dai; Bolei Zhou;	code
215	Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We endeavor on a rarely explored task named Insubstantial Object Detection (IOD), which aims to localize the object with following characteristics: (1) amorphous shape with indistinct boundary; (2) similarity to surroundings; (3) absence in color. Accordingly, it is far more challenging to distinguish insubstantial objects in a single static frame and the collaborative representation of spatial and temporal information is crucial.	Kailai Zhou; Yibo Wang; Tao Lv; Yunqian Li; Linsen Chen; Qiu Shen; Xun Cao;	code
216	Graph-Based Spatial Transformer With Memory Replay for Multi-Future Pedestrian Trajectory Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our model aims to forecast multiple paths based on a historical trajectory by modeling multi-scale graph-based spatial transformers combined with a trajectory smoothing algorithm named "Memory Replay" utilizing a memory graph.	Lihuan Li; Maurice Pagnucco; Yang Song;	code
217	Scanline Homographies for Rolling-Shutter Plane Absolute Pose Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we give a solution to the absolute pose problem free of motion assumptions.	Fang Bai; Agniva Sengupta; Adrien Bartoli;	code
218	AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-Time Image Enhancement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: They adopt a sub-optimal uniform sampling point allocation, limiting the expressiveness of the learned LUTs since the (tri-)linear interpolation between uniform sampling points in the LUT transform might fail to model local non-linearities of the color transform. Focusing on this problem, we present AdaInt (Adaptive Intervals Learning), a novel mechanism to achieve a more flexible sampling point allocation by adaptively learning the non-uniform sampling intervals in the 3D color space.	Canqian Yang; Meiguang Jin; Xu Jia; Yi Xu; Ying Chen;	code
219	Recurrent Glimpse-Based Decoder for Detection With Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Alternative to existing studies that mainly develop advanced feature or embedding designs to tackle the training issue, we point out that the Region-of-Interest (RoI) based detection refinement can easily help mitigate the difficulty of training for DETR methods. Based on this, we introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper.	Zhe Chen; Jing Zhang; Dacheng Tao;	code
220	SimMIM: A Simple Framework for Masked Image Modeling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents SimMIM, a simple framework for masked image modeling.	Zhenda Xie; Zheng Zhang; Yue Cao; Yutong Lin; Jianmin Bao; Zhuliang Yao; Qi Dai; Han Hu;	code
221	Label Matching Semi-Supervised Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Despite the promising results, the label mismatch problem is not yet fully explored in the previous works, leading to severe confirmation bias during self-training. In this paper, we delve into this problem and propose a simple yet effective LabelMatch framework from two different yet complementary perspectives, i.e., distribution-level and instance-level.	Binbin Chen; Weijie Chen; Shicai Yang; Yunyi Xuan; Jie Song; Di Xie; Shiliang Pu; Mingli Song; Yueting Zhuang;	code
222	RegionCLIP: Region-Based Language-Image Pretraining Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To mitigate this issue, we propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations, thus enabling fine-grained alignment between image regions and textual concepts.	Yiwu Zhong; Jianwei Yang; Pengchuan Zhang; Chunyuan Li; Noel Codella; Liunian Harold Li; Luowei Zhou; Xiyang Dai; Lu Yuan; Yin Li; Jianfeng Gao;	code
223	Video Frame Interpolation Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing methods for video interpolation heavily rely on deep convolution neural networks, and thus suffer from their intrinsic limitations, such as content-agnostic kernel weights and restricted receptive field. To address these issues, we propose a Transformer-based video interpolation framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations.	Zhihao Shi; Xiangyu Xu; Xiaohong Liu; Jun Chen; Ming-Hsuan Yang;	code
224	BCOT: A Markerless High-Precision 3D Object Tracking Benchmark Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a multi-view approach to estimate the accurate 3D poses of real moving objects, and then use binocular data to construct a new benchmark for monocular textureless 3D object tracking.	Jiachen Li; Bin Wang; Shiqiang Zhu; Xin Cao; Fan Zhong; Wenxuan Chen; Te Li; Jason Gu; Xueying Qin;	code
225	Omni-DETR: Omni-Supervised Object Detection With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider the problem of omni-supervised object detection, which can use unlabeled, fully labeled and weakly labeled annotations, such as image tags, counts, points, etc., for object detection.	Pei Wang; Zhaowei Cai; Hao Yang; Gurumurthy Swaminathan; Nuno Vasconcelos; Bernt Schiele; Stefano Soatto;	code
226	Transferable Sparse Adversarial Attack Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we focus on sparse adversarial attack based on the l_0 norm constraint, which can succeed by only modifying a few pixels of an image.	Ziwen He; Wei Wang; Jing Dong; Tieniu Tan;	code
227	CREAM: Weakly Supervised Object Localization Via Class RE-Activation Mapping Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we empirically prove that this problem is associated with the mixup of the activation values between less discriminative foreground regions and the background. To address it, we propose Class RE-Activation Mapping (CREAM), a novel clustering-based approach to boost the activation values of the integral object regions.	Jilan Xu; Junlin Hou; Yuejie Zhang; Rui Feng; Rui-Wei Zhao; Tao Zhang; Xuequan Lu; Shang Gao;	code
228	VALHALLA: Visual Hallucination for Machine Translation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a visual hallucination framework, called VALHALLA, which requires only source sentences at inference time and instead uses hallucinated visual representations for multimodal machine translation.	Yi Li; Rameswar Panda; Yoon Kim; Chun-Fu (Richard) Chen; Rogerio S. Feris; David Cox; Nuno Vasconcelos;	code
229	HINT: Hierarchical Neuron Concept Explainer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study hierarchical concepts inspired by the hierarchical cognition process of human beings.	Andong Wang; Wei-Ning Lee; Xiaojuan Qi;	code
230	Neural Face Identification in A 2D Wireframe Projection of A Manifold Object Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we approach the classical problem of face identification from a novel data-driven point of view.	Kehan Wang; Jia Zheng; Zihan Zhou;	code
231	Nonuniform-to-Uniform Quantization: Towards Accurate Quantization Via Generalized Straight-Through Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we propose Nonuniform-to-Uniform Quantization (N2UQ), a method that can maintain the strong representation ability of nonuniform methods while being hardware-friendly and efficient as the uniform quantization for model inference.	Zechun Liu; Kwang-Ting Cheng; Dong Huang; Eric P. Xing; Zhiqiang Shen;	code
232	An Empirical Study of End-to-End Temporal Action Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present an empirical study of end-to-end temporal action detection.	Xiaolong Liu; Song Bai; Xiang Bai;	code
233	Object Localization Under Single Coarse Point Supervision Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we propose a POL method using coarse point annotations, relaxing the supervision signals from accurate key points to freely spotted points.	Xuehui Yu; Pengfei Chen; Di Wu; Najmul Hassan; Guorong Li; Junchi Yan; Humphrey Shi; Qixiang Ye; Zhenjun Han;	code
234	Unsupervised Learning of Accurate Siamese Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a novel unsupervised tracking framework, in which we can learn temporal correspondence both on the classification branch and regression branch.	Qiuhong Shen; Lei Qiao; Jinyang Guo; Peixia Li; Xin Li; Bo Li; Weitao Feng; Weihao Gan; Wei Wu; Wanli Ouyang;	code
235	Non-Parametric Depth Distribution Modelling Based Depth Inference for Multi-View Stereo Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In contrast, we propose constructing the cost volume by non-parametric depth distribution modeling to handle pixels with unimodal and multi-modal distributions.	Jiayu Yang; Jose M. Alvarez; Miaomiao Liu;	code
236	Equalized Focal Loss for Dense Long-Tailed Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, in the long-tailed scenario, this line of work has not been explored so far. In this paper, we investigate whether one-stage detectors can perform well in this case.	Bo Li; Yongqiang Yao; Jingru Tan; Gang Zhang; Fengwei Yu; Jianwei Lu; Ye Luo;	code
237	DeepDPM: Deep Clustering With An Unknown Number of Clusters Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: When K is unknown, however, using model-selection criteria to choose its optimal value might become computationally expensive, especially in DL as the training process would have to be repeated numerous times. In this work, we bridge this gap by introducing an effective deep-clustering method that does not require knowing the value of K as it infers it during the learning.	Meitar Ronen; Shahaf E. Finder; Oren Freifeld;	code
238	ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose ISDNet, a novel ultra-high resolution segmentation framework that integrates the shallow and deep networks in a new manner, which significantly accelerates the inference speed while achieving accurate segmentation.	Shaohua Guo; Liang Liu; Zhenye Gan; Yabiao Wang; Wuhao Zhang; Chengjie Wang; Guannan Jiang; Wei Zhang; Ran Yi; Lizhuang Ma; Ke Xu;	code
239	Unsupervised Domain Adaptation for Nighttime Aerial Tracking Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work instead develops a novel unsupervised domain adaptation framework for nighttime aerial tracking (named UDAT).	Junjie Ye; Changhong Fu; Guangze Zheng; Danda Pani Paudel; Guang Chen;	code
240	RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: As face image contains abundant contextual information, we propose a method, RestoreFormer, which explores fully-spatial attentions to model contextual information and surpasses existing works that use local convolutions.	Zhouxia Wang; Jiawei Zhang; Runjian Chen; Wenping Wang; Ping Luo;	code
241	Mask-Guided Spectral-Wise Transformer for Efficient Hyperspectral Image Reconstruction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel framework, Mask-guided Spectral-wise Transformer (MST), for HSI reconstruction.	Yuanhao Cai; Jing Lin; Xiaowan Hu; Haoqian Wang; Xin Yuan; Yulun Zhang; Radu Timofte; Luc Van Gool;	code
242	A Variational Bayesian Method for Similarity Learning in Non-Rigid Image Registration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel variational Bayesian formulation for diffeomorphic non-rigid registration of medical images, which learns in an unsupervised way a data-specific similarity metric.	Daniel Grzech; Mohammad Farid Azampour; Ben Glocker; Julia Schnabel; Nassir Navab; Bernhard Kainz; Loïc Le Folgoc;	code
243	Not Just Selection, But Exploration: Online Class-Incremental Continual Learning Via Dual View Consistency Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel yet effective framework for online class-incremental continual learning, which considers not only the selection of stored samples, but also the full exploration of the data stream.	Yanan Gu; Xu Yang; Kun Wei; Cheng Deng;	code
244	Coupling Vision and Proprioception for Navigation of Legged Robots Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We exploit the complementary strengths of vision and proprioception to develop a point-goal navigation system for legged robots, called VP-Nav.	Zipeng Fu; Ashish Kumar; Ananye Agarwal; Haozhi Qi; Jitendra Malik; Deepak Pathak;	code
245	Exploiting Rigidity Constraints for LiDAR Scene Flow Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel optimization method based on a recurrent neural network to predict LiDAR scene flow in a weakly supervised manner.	Guanting Dong; Yueyi Zhang; Hanlin Li; Xiaoyan Sun; Zhiwei Xiong;	code
246	EMOCA: Emotion Driven Monocular Face Capture and Animation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The result is facial geometries that do not match the emotional content of the input image. We address this with EMOCA (EMOtion Capture and Animation), by introducing a novel deep perceptual emotion consistency loss during training, which helps ensure that the reconstructed 3D expression matches the expression depicted in the input image.	Radek Daněček; Michael J. Black; Timo Bolkart;	code
247	Quarantine: Sparsity Can Uncover The Trojan Attack Trigger for Free Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our crucial observation is that the Trojan features are significantly more stable to network pruning than benign features. Leveraging that, we propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork.	Tianlong Chen; Zhenyu Zhang; Yihua Zhang; Shiyu Chang; Sijia Liu; Zhangyang Wang;	code
248	AlignQ: Alignment Quantization With ADMM-Based Correlation Preservation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, existing approaches ignored the distribution difference between training and testing data, thereby inducing a large quantization error in inference. To address this issue, we propose a new quantization scheme, Alignment Quantization with ADMM-based Correlation Preservation (AlignQ), which exploits the cumulative distribution function (CDF) to align the data to be i.i.d. (independently and identically distributed) for quantization error minimization.	Ting-An Chen; De-Nian Yang; Ming-Syan Chen;	code
249	Interactive Multi-Class Tiny-Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Such imagery typically contains objects from various categories, yet the multi-class interactive annotation setting for the detection task has thus far been unexplored. To address these needs, we propose a novel interactive annotation method for multiple instances of tiny objects from multiple classes, based on a few point-based user inputs.	Chunggi Lee; Seonwook Park; Heon Song; Jeongun Ryu; Sanghoon Kim; Haejoon Kim; Sérgio Pereira; Donggeun Yoo;	code
250	Learning From Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to learn light field saliency from pixel-level noisy labels obtained from unsupervised hand crafted featured-based saliency methods.	Mingtao Feng; Kendong Liu; Liang Zhang; Hongshan Yu; Yaonan Wang; Ajmal Mian;	code
251	Multi-View Depth Estimation By Fusing Single-View Depth Probability With Multi-View Geometry Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: For such failure modes, single-view depth estimation methods are often more reliable. To this end, we propose MaGNet, a novel framework for fusing single-view depth probability with multi-view geometry, to improve the accuracy, robustness and efficiency of multi-view depth estimation.	Gwangbin Bae; Ignas Budvytis; Roberto Cipolla;	code
252	Slimmable Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank, from which models of different capacities can be sampled to accommodate different accuracy-efficiency trade-offs.	Rang Meng; Weijie Chen; Shicai Yang; Jie Song; Luojun Lin; Di Xie; Shiliang Pu; Xinchao Wang; Mingli Song; Yueting Zhuang;	code
253	High-Resolution Image Harmonization Via Collaborative Dual Transformations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a high-resolution image harmonization network with Collaborative Dual Transformation (CDTNet) to combine pixel-to-pixel transformation and RGB-to-RGB transformation coherently in an end-to-end network.	Wenyan Cong; Xinhao Tao; Li Niu; Jing Liang; Xuesong Gao; Qihao Sun; Liqing Zhang;	code
254	MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation.	Inkyu Shin; Yi-Hsuan Tsai; Bingbing Zhuang; Samuel Schulter; Buyu Liu; Sparsh Garg; In So Kweon; Kuk-Jin Yoon;	code
255	Self-Supervised Neural Articulated Shape and Appearance Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel approach for learning a representation of the geometry, appearance, and motion of a class of articulated objects given only a set of color images as input.	Fangyin Wei; Rohan Chabra; Lingni Ma; Christoph Lassner; Michael Zollhöfer; Szymon Rusinkiewicz; Chris Sweeney; Richard Newcombe; Mira Slavcheva;	code
256	Topology Preserving Local Road Network Estimation From Single Onboard Camera Image Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper aims at extracting the local road network topology, directly in the bird’s-eye-view (BEV), all in a complex urban setting.	Yigit Baran Can; Alexander Liniger; Danda Pani Paudel; Luc Van Gool;	code
257	Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse Lanes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: A novel algorithm to detect road lanes in the eigenlane space is proposed in this paper.	Dongkwon Jin; Wonhui Park; Seong-Gyun Jeong; Heeyeon Kwon; Chang-Su Kim;	code
258	SwinTextSpotter: Scene Text Spotting Via Better Synergy Between Text Detection and Text Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new end-to-end scene text spotting framework termed SwinTextSpotter.	Mingxin Huang; Yuliang Liu; Zhenghao Peng; Chongyu Liu; Dahua Lin; Shenggao Zhu; Nicholas Yuan; Kai Ding; Lianwen Jin;	code
259	Deblur-NeRF: Neural Radiance Fields From Blurry Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, image blurriness caused by defocus or motion, which often occurs when capturing scenes in the wild, significantly degrades its reconstruction quality. To address this problem, we propose Deblur-NeRF, the first method that can recover a sharp NeRF from blurry input.	Li Ma; Xiaoyu Li; Jing Liao; Qi Zhang; Xuan Wang; Jue Wang; Pedro V. Sander;	code
260	Whose Track Is It Anyway? Improving Robustness to Tracking Errors With Affinity-Based Trajectory Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This is typically caused by the propagation of errors from tracking to prediction, such as noisy tracks, fragments, and identity switches. To alleviate this propagation of errors, we propose a new prediction paradigm that uses detections and their affinity matrices across frames as inputs, removing the need for error-prone data association during tracking.	Xinshuo Weng; Boris Ivanovic; Kris Kitani; Marco Pavone;	code
261	Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents Video K-Net, a simple, strong, and unified framework for fully end-to-end video panoptic segmentation.	Xiangtai Li; Wenwei Zhang; Jiangmiao Pang; Kai Chen; Guangliang Cheng; Yunhai Tong; Chen Change Loy;	code
262	Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Instead, we consider rethinking solutions to data heterogeneity in FL with a focus on local learning generality rather than proximal restriction.	Matias Mendieta; Taojiannan Yang; Pu Wang; Minwoo Lee; Zhengming Ding; Chen Chen;	code
263	Blind Image Super-Resolution With Elaborate Degradation Modeling on Noise and Kernel Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the above issues, this paper proposes a model-based blind SISR method under the probabilistic framework, which elaborately models image degradation from the perspectives of noise and blur kernel.	Zongsheng Yue; Qian Zhao; Jianwen Xie; Lei Zhang; Deyu Meng; Kwan-Yee K. Wong;	code
264	Faithful Extreme Rescaling Via Generative Prior Reciprocated Invertible Representations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a Generative prior ReciprocAted Invertible rescaling Network (GRAIN) for generating faithful high-resolution (HR) images from low-resolution (LR) invertible images with an extreme upscaling factor (64x).	Zhixuan Zhong; Liangyu Chai; Yang Zhou; Bailin Deng; Jia Pan; Shengfeng He;	code
265	Proto2Proto: Can You Recognize The Car, The Way I Do? Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present Proto2Proto, a novel method to transfer interpretability of one prototypical part network to another via knowledge distillation.	Monish Keswani; Sriranjani Ramakrishnan; Nishant Reddy; Vineeth N Balasubramanian;	code
266	TVConv: Efficient Translation Variant Convolution for Layout-Aware Visual Processing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We observe that these applications naturally exhibit the characteristics of large intra-image (spatial) variance and small cross-image variance. This observation motivates our efficient translation variant convolution (TVConv) for layout-aware visual processing.	Jierun Chen; Tianlang He; Weipeng Zhuo; Li Ma; Sangtae Ha; S.-H. Gary Chan;	code
267	Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we investigate a novel and practical task coded cross-device SR, which strives to adapt a real-world SR model trained on the paired images captured by one camera to low-resolution (LR) images captured by arbitrary target devices.	Xiaoqian Xu; Pengxu Wei; Weikai Chen; Yang Liu; Mingzhi Mao; Liang Lin; Guanbin Li;	code
268	Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a large-scale study of imitating human demonstrations on tasks that require a virtual robot to search for objects in new environments – (1) ObjectGoal Navigation (e.g. ‘find & go to a chair’) and (2) Pick&Place (e.g. ‘find mug, pick mug, find counter, place mug on counter’).	Ram Ramrakhya; Eric Undersander; Dhruv Batra; Abhishek Das;	code
269	Simple But Effective: CLIP Embeddings for Embodied AI Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We investigate the effectiveness of CLIP visual backbones for Embodied AI tasks.	Apoorv Khandelwal; Luca Weihs; Roozbeh Mottaghi; Aniruddha Kembhavi;	code
270	NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Alternatively, a more graceful way is that global and local context can adaptively contribute per se to accommodate different visual data. To achieve this goal, we in this paper propose a novel ViT architecture, termed NomMer, which can dynamically Nominate the synergistic global-local context in vision transforMer.	Hao Liu; Xinghua Jiang; Xin Li; Zhimin Bao; Deqiang Jiang; Bo Ren;	code
271	Collaborative Transformers for Grounded Situation Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Grounded situation recognition is the task of predicting the main activity, entities playing certain roles within the activity, and bounding-box groundings of the entities in the given image. To effectively deal with this challenging task, we introduce a novel approach where the two processes for activity classification and entity estimation are interactive and complementary.	Junhyeong Cho; Youngseok Yoon; Suha Kwak;	code
272	CPPF: Towards Robust Category-Level 9D Pose Estimation in The Wild Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we tackle the problem of category-level 9D pose estimation in the wild, given a single RGB-D frame.	Yang You; Ruoxi Shi; Weiming Wang; Cewu Lu;	code
273	Continual Test-Time Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The noisy pseudo-labels can further lead to error accumulation and catastrophic forgetting. To tackle these issues, we propose a continual test-time adaptation approach (CoTTA) which comprises two parts.	Qin Wang; Olga Fink; Luc Van Gool; Dengxin Dai;	code
274	Dynamic MLP for Fine-Grained Image Classification By Leveraging Geographical and Temporal Information Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To fully explore the potential of multimodal information, we propose a dynamic MLP on top of the image representation, which interacts with multimodal features at a higher and broader dimension.	Lingfeng Yang; Xiang Li; Renjie Song; Borui Zhao; Juntian Tao; Shihao Zhou; Jiajun Liang; Jian Yang;	code
275	MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-Based Visual Question Answering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose MuKEA to represent multimodal knowledge by an explicit triplet to correlate visual objects and fact answers with implicit relations.	Yang Ding; Jing Yu; Bang Liu; Yue Hu; Mingxin Cui; Qi Wu;	code
276	Fair Contrastive Learning for Facial Attribute Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we for the first time analyze unfairness caused by supervised contrastive learning and propose a new Fair Supervised Contrastive Loss (FSCL) for fair visual representation learning.	Sungho Park; Jewook Lee; Pilhyeon Lee; Sunhee Hwang; Dohyung Kim; Hyeran Byun;	code
277	Directional Self-Supervised Learning for Heavy Image Augmentations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a directional self-supervised learning paradigm (DSSL), which is compatible with significantly more augmentations.	Yalong Bai; Yifan Yang; Wei Zhang; Tao Mei;	code
278	No-Reference Point Cloud Quality Assessment Via Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a novel no-reference quality assessment metric, the image transferred point cloud quality assessment (IT-PCQA), for 3D point clouds.	Qi Yang; Yipeng Liu; Siheng Chen; Yiling Xu; Jun Sun;	code
279	Comprehending and Ordering Semantics for Image Captioning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new recipe of Transformer-style structure, namely Comprehending and Ordering Semantics Networks (COS-Net), that novelly unifies an enriched semantic comprehending and a learnable semantic ordering processes into a single architecture.	Yehao Li; Yingwei Pan; Ting Yao; Tao Mei;	code
280	A Large-Scale Comprehensive Dataset and Copy-Overlap Aware Evaluation Protocol for Segment-Level Video Copy Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce VCSL (Video Copy Segment Localization), a new comprehensive segment-level annotated video copy dataset.	Sifeng He; Xudong Yang; Chen Jiang; Gang Liang; Wei Zhang; Tan Pan; Qing Wang; Furong Xu; Chunguang Li; JinXiong Liu; Hui Xu; Kaiming Huang; Yuan Cheng; Feng Qian; Xiaobo Zhang; Lei Yang;	code
281	Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study the HMC problem in which objects are labeled at any level of the hierarchy.	Jingzhou Chen; Peng Wang; Jian Liu; Yuntao Qian;	code
282	HeadNeRF: A Real-Time NeRF-Based Parametric Head Model Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose HeadNeRF, a novel NeRF-based parametric head model that integrates the neural radiance field to the parametric representation of the human head.	Yang Hong; Bo Peng; Haiyao Xiao; Ligang Liu; Juyong Zhang;	code
283	Occlusion-Robust Face Alignment Using A Viewpoint-Invariant Hierarchical Network Architecture Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a new network architecture called GlomFace to model the facial hierarchies against various occlusions, which draws inspiration from the viewpoint-invariant hierarchy of facial structure.	Congcong Zhu; Xintong Wan; Shaorong Xie; Xiaoqiang Li; Yinzheng Gu;	code
284	IDR: Self-Supervised Image Denoising Via Iterative Data Refinement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a practical unsupervised image denoising method to achieve state-of-the-art denoising performance.	Yi Zhang; Dasong Li; Ka Lung Law; Xiaogang Wang; Hongwei Qin; Hongsheng Li;	code
285	MogFace: Towards A Deeper Appreciation on Face Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we focus on resolving three aforementioned challenges that existing methods struggle to overcome, and present a novel face detector, termed MogFace.	Yang Liu; Fei Wang; Jiankang Deng; Zhipeng Zhou; Baigui Sun; Hao Li;	code
286	Learning Affinity From Attention: End-to-End Weakly-Supervised Semantic Segmentation With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, to address the aforementioned problem, we introduce Transformers, which naturally integrate global information, to generate more integral initial pseudo labels for end-to-end WSSS.	Lixiang Ru; Yibing Zhan; Baosheng Yu; Bo Du;	code
287	CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study the problem of jointly estimating the optical flow and scene flow from synchronized 2D and 3D data.	Haisong Liu; Tao Lu; Yihui Xu; Jia Liu; Wenjie Li; Lijun Chen;	code
288	FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: For example, the "Happy" expression with high intensity in Talk-Show is more discriminating than the same expression with low intensity in Official-Event. To fill this gap, we build a large-scale multi-scene dataset, coined as FERV39k.	Yan Wang; Yixuan Sun; Yiwen Huang; Zhongying Liu; Shuyong Gao; Wei Zhang; Weifeng Ge; Wenqiang Zhang;	code
289	Learning To Detect Mobile Objects From LiDAR Scans Without Labels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes an alternative approach entirely based on unlabeled data, which can be collected cheaply and in abundance almost everywhere on earth.	Yurong You; Katie Luo; Cheng Perng Phoo; Wei-Lun Chao; Wen Sun; Bharath Hariharan; Mark Campbell; Kilian Q. Weinberger;	code
290	WildNet: Learning Domain Generalized Semantic Segmentation From The Wild Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to diversify both the content and style of the source domain with the help of the wild.	Suhyeon Lee; Hongje Seong; Seongwon Lee; Euntai Kim;	code
291	DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: DAIR-V2X comprises 71254 LiDAR frames and 71254 Camera frames, and all frames are captured from real scenes with 3D annotations.	Haibao Yu; Yizhen Luo; Mao Shu; Yiyi Huo; Zebang Yang; Yifeng Shi; Zhenglong Guo; Hanyu Li; Xing Hu; Jirui Yuan; Zaiqing Nie;	code
292	Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To tackle the aforementioned problems, we propose the Point-to-Voxel Knowledge Distillation (PVD), which transfers the hidden knowledge from both point level and voxel level.	Yuenan Hou; Xinge Zhu; Yuexin Ma; Chen Change Loy; Yikang Li;	code
293	Generating Diverse 3D Reconstructions From A Single Occluded Face Image Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Furthermore, while a plurality of 3D reconstructions is plausible in the occluded regions, existing approaches are limited to generating only a single solution. To address both of these challenges, we present Diverse3DFace, which is specifically designed to simultaneously generate a diverse and realistic set of 3D reconstructions from a single occluded face image.	Rahul Dey; Vishnu Naresh Boddeti;	code
294	Stand-Alone Inter-Frame Attention in Video Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a new recipe of inter-frame attention block, namely Stand-alone Inter-Frame Attention (SIFA), that novelly delves into the deformation across frames to estimate local self-attention on each spatial location.	Fuchen Long; Zhaofan Qiu; Yingwei Pan; Ting Yao; Jiebo Luo; Tao Mei;	code
295	Large-Scale Pre-Training for Person Re-Identification With Noisy Labels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper aims to address the problem of pre-training for person re-identification (Re-ID) with noisy labels.	Dengpan Fu; Dongdong Chen; Hao Yang; Jianmin Bao; Lu Yuan; Lei Zhang; Houqiang Li; Fang Wen; Dong Chen;	code
296	Semantic Segmentation By Early Region Proxy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present a novel and efficient modeling that starts from interpreting the image as a tessellation of learnable regions, each of which has flexible geometrics and carries homogeneous semantics.	Yifan Zhang; Bo Pang; Cewu Lu;	code
297	LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To apply gesture recognition to long-distance interactive scenes such as meetings and smart homes, a large RGB-D video dataset LD-ConGR is established in this paper.	Dan Liu; Libo Zhang; Yanjun Wu;	code
298	HVH: Learning A Hybrid Neural Volumetric Representation for Dynamic Hair Performance Capture Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we address the aforementioned problems: 1) we use a novel, volumetric hair representation that is composed of thousands of primitives.	Ziyan Wang; Giljoo Nam; Tuur Stuyck; Stephen Lombardi; Michael Zollhöfer; Jessica Hodgins; Christoph Lassner;	code
299	Rethinking Visual Geo-Localization for Large-Scale Applications Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We find that current methods fail to scale to such large datasets, therefore we design a new highly scalable training technique, called CosPlace, which casts the training as a classification problem avoiding the expensive mining needed by the commonly used contrastive learning.	Gabriele Berton; Carlo Masone; Barbara Caputo;	code
300	The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In view of them, we advocate a principle of diversity for training ViTs, by presenting corresponding regularizers that encourage representation diversity and coverage at each of those levels, enabling the capture of more discriminative information.	Tianlong Chen; Zhenyu Zhang; Yu Cheng; Ahmed Awadallah; Zhangyang Wang;	code
301	ViM: Out-of-Distribution With Virtual-Logit Matching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: There are OOD samples that are easy to identify in the feature space while hard to distinguish in the logit space and vice versa. Motivated by this observation, we propose a novel OOD scoring method named Virtual-logit Matching (ViM), which combines the class-agnostic score from feature space and the In-Distribution (ID) class-dependent logits.	Haoqi Wang; Zhizhong Li; Litong Feng; Wayne Zhang;	code
302	Class-Aware Contrastive Semi-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Moreover, the model’s judgment becomes noisier in real-world applications with extensive out-of-distribution data. To address this issue, we propose a general method named Class-aware Contrastive Semi-Supervised Learning (CCSSL), which is a drop-in helper to improve the pseudo-label quality and enhance the model’s robustness in the real-world setting.	Fan Yang; Kai Wu; Shuyi Zhang; Guannan Jiang; Yong Liu; Feng Zheng; Wei Zhang; Chengjie Wang; Long Zeng;	code
303	Ditto: Building Digital Twins of Articulated Objects From Interaction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce Ditto to learn articulation model estimation and 3D geometry reconstruction of an articulated object through interactive perception.	Zhenyu Jiang; Cheng-Chun Hsu; Yuke Zhu;	code
304	Adaptive Early-Learning Correction for Segmentation From Noisy Annotations Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we study the learning dynamics of deep segmentation networks trained on inaccurately-annotated data.	Sheng Liu; Kangning Liu; Weicheng Zhu; Yiqiu Shen; Carlos Fernandez-Granda;	code
305	Cross-Domain Correlation Distillation for Unsupervised Domain Adaptation in Nighttime Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing works, e.g., using the twilight as the intermediate target domain to perform the adaptation from daytime to nighttime, may fail to cope with the inherent difference between datasets caused by the camera equipment and the urban style. Faced with these two types of domain shifts, i.e., the illumination and the inherent difference of the datasets, we propose a novel domain adaptation framework via cross-domain correlation distillation, called CCDistill.	Huan Gao; Jichang Guo; Guoli Wang; Qian Zhang;	code
306	RSTT: Real-Time Spatial Temporal Transformer for Space-Time Video Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The existing methods based on Convolutional Neural Networks (CNNs) succeed in achieving visually satisfying results but suffer from slow inference speed due to their heavy architectures. We propose to resolve this issue by using a spatial-temporal transformer that naturally incorporates the spatial and temporal super resolution modules into a single model.	Zhicheng Geng; Luming Liang; Tianyu Ding; Ilya Zharkov;	code
307	Partial Class Activation Attention for Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Beyond the previous CAM generated from image-level classification, we present Partial CAM, which subdivides the task into region-level prediction and achieves better localization performance.	Sun-Ao Liu; Hongtao Xie; Hai Xu; Yongdong Zhang; Qi Tian;	code
308	Multi-Scale Memory-Based Video Deblurring Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To achieve fine-grained deblurring, we designed a memory branch to memorize the blurry-sharp feature pairs in the memory bank, thus providing useful information for the blurry query input.	Bo Ji; Angela Yao;	code
309	A Scalable Combinatorial Solver for Elastic Geometrically Consistent 3D Shape Matching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a scalable combinatorial algorithm for globally optimizing over the space of geometrically consistent mappings between 3D shapes.	Paul Roetzer; Paul Swoboda; Daniel Cremers; Florian Bernard;	code
310	Geometric Structure Preserving Warp for Natural Image Stitching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, most of the existing methods ignore the large-scale layouts reflected by straight lines or curves, decreasing overall stitching quality. To address this issue, this work presents a structure-preserving stitching approach that produces images with natural visual effects and less distortion.	Peng Du; Jifeng Ning; Jiguang Cui; Shaoli Huang; Xinchao Wang; Jiaxin Wang;	code
311	GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: For the first time, we address the problem of generating full-body, hand and head motions of an avatar grasping an unknown object.	Omid Taheri; Vasileios Choutas; Michael J. Black; Dimitrios Tzionas;	code
312	Conditional Prompt Learning for Vision-Language Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address the problem, we propose Conditional Context Optimization (CoCoOp), which extends CoOp by further learning a lightweight neural network to generate for each image an input-conditional token (vector).	Kaiyang Zhou; Jingkang Yang; Chen Change Loy; Ziwei Liu;	code
313	Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To do so, in this paper, we propose an efficient mini-batch sampling method, called graph sampling (GS), for large-scale deep metric learning.	Shengcai Liao; Ling Shao;	code
314	Undoing The Damage of Label Shift for Cross-Domain Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we give an in-depth analysis and show that the damage of label shift can be overcome by aligning the data conditional distribution and correcting the posterior probability.	Yahao Liu; Jinhong Deng; Jiale Tao; Tong Chu; Lixin Duan; Wen Li;	code
315	FisherMatch: Semi-Supervised Rotation Regression Via Entropy-Based Filtering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Inspired by the popular semi-supervised approach, FixMatch, we propose to leverage pseudo label filtering to facilitate the information flow from labeled data to unlabeled data in a teacher-student mutual learning framework.	Yingda Yin; Yingcheng Cai; He Wang; Baoquan Chen;	code
316	Affine Medical Image Registration With Coarse-To-Fine Vision Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a fast and robust learning-based algorithm, Coarse-to-Fine Vision Transformer (C2FViT), for 3D affine medical image registration.	Tony C. W. Mok; Albert C. S. Chung;	code
317	A Differentiable Two-Stage Alignment Scheme for Burst Image Reconstruction With Large Shift Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In addition, the higher resolution (e.g., 4K) of modern imaging devices results in larger displacement between frames. To address these challenges, we design a differentiable two-stage alignment scheme sequentially in patch and pixel level for effective JDD-B.	Shi Guo; Xi Yang; Jianqi Ma; Gaofeng Ren; Lei Zhang;	code
318	Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a deformable prototypical part network (Deformable ProtoPNet), an interpretable image classifier that integrates the power of deep learning and the interpretability of case-based reasoning.	Jon Donnelly; Alina Jade Barnett; Chaofan Chen;	code
319	Restormer: Efficient Transformer for High-Resolution Image Restoration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose an efficient Transformer model by making several key designs in the building blocks (multi-head attention and feed-forward network) such that it can capture long-range pixel interactions, while still remaining applicable to large images.	Syed Waqas Zamir; Aditya Arora; Salman Khan; Munawar Hayat; Fahad Shahbaz Khan; Ming-Hsuan Yang;	code
320	IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we devise an efficient encoder-decoder based network, termed IFRNet, for fast intermediate frame synthesizing.	Lingtong Kong; Boyuan Jiang; Donghao Luo; Wenqing Chu; Xiaoming Huang; Ying Tai; Chengjie Wang; Jie Yang;	code
321	Large Loss Matters in Weakly Supervised Multi-Label Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: That is, the model first learns the representation of clean labels, and then starts memorizing noisy labels. Based on this finding, we propose novel methods for WSML which reject or correct the large loss samples to prevent the model from memorizing the noisy labels.	Youngwook Kim; Jae Myung Kim; Zeynep Akata; Jungwoo Lee;	code
322	Neural Inertial Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes the inertial localization problem, the task of estimating the absolute location from a sequence of inertial sensor measurements.	Sachini Herath; David Caruso; Chen Liu; Yufan Chen; Yasutaka Furukawa;	code
323	GraftNet: Towards Domain Generalized Stereo Matching With A Broad-Spectrum and Task-Oriented Feature Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to leverage the feature of a model trained on large-scale datasets to deal with the domain shift since it has seen various styles of images.	Biyang Liu; Huimin Yu; Guodong Qi;	code
324	VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning, without requiring any human annotation.	Wenjia Xu; Yongqin Xian; Jiuniu Wang; Bernt Schiele; Zeynep Akata;	code
325	Catching Both Gray and Black Swans: Open-Set Supervised Anomaly Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel approach that learns disentangled representations of abnormalities illustrated by seen anomalies, pseudo anomalies, and latent residual anomalies (i.e., samples that have unusual residuals compared to the normal data in a latent space), with the last two abnormalities designed to detect unseen anomalies.	Choubo Ding; Guansong Pang; Chunhua Shen;	code
326	MLSLT: Towards Multilingual Sign Language Translation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, such models are inefficient in building multilingual sign language translation systems. To solve this problem, we introduce the multilingual sign language translation (MSLT) task.	Aoxiong Yin; Zhou Zhao; Weike Jin; Meng Zhang; Xingshan Zeng; Xiaofei He;	code
327	Towards An End-to-End Framework for Flow-Guided Video Inpainting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose an End-to-End framework for Flow-Guided Video Inpainting through three elaborately designed trainable modules, namely, flow completion, feature propagation, and content hallucination modules.	Zhen Li; Cheng-Ze Lu; Jianhua Qin; Chun-Le Guo; Ming-Ming Cheng;	code
328	Contrastive Test-Time Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel way to leverage self-supervised contrastive learning to facilitate target feature learning, along with an online pseudo labeling scheme with refinement that significantly denoises pseudo labels.	Dian Chen; Dequan Wang; Trevor Darrell; Sayna Ebrahimi;	code
329	MotionAug: Augmentation With Physical Correction for Human Motion Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a motion data augmentation scheme incorporating motion synthesis encouraging diversity and motion correction imposing physical plausibility.	Takahiro Maeda; Norimichi Ukita;	code
330	Modeling Indirect Illumination for Inverse Rendering Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel approach to efficiently recovering spatially-varying indirect illumination.	Yuanqing Zhang; Jiaming Sun; Xingyi He; Huan Fu; Rongfei Jia; Xiaowei Zhou;	code
331	TransWeather: Transformer-Based Restoration of Images Degraded By Adverse Weather Conditions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we focus on developing an efficient solution for the all adverse weather removal problem.	Jeya Maria Jose Valanarasu; Rajeev Yasarla; Vishal M. Patel;	code
332	H2FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-Domain Weakly Supervised Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing methods usually focus on partial detection components for domain alignment. In contrast, this paper considers that all the detection components are important and proposes a Holistic and Hierarchical Feature Alignment (H^2FA) R-CNN.	Yunqiu Xu; Yifan Sun; Zongxin Yang; Jiaxu Miao; Yi Yang;	code
333	P3Depth: Monocular Depth Estimation With A Piecewise Planarity Prior Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Based on knowledge about the high regularity of real 3D scenes, we propose a method that learns to selectively leverage information from coplanar pixels to improve the predicted depth.	Vaishakh Patil; Christos Sakaridis; Alexander Liniger; Luc Van Gool;	code
334	GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we reveal and address the disadvantages of conventional query-driven HOI detectors from two aspects.	Yue Liao; Aixi Zhang; Miao Lu; Yongliang Wang; Xiaobo Li; Si Liu;	code
335	Simple Multi-Dataset Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a simple method for training a unified detector on multiple large-scale datasets.	Xingyi Zhou; Vladlen Koltun; Philipp Krähenbühl;	code
336	Proactive Image Manipulation Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: By contrast, we propose a proactive scheme for image manipulation detection.	Vishal Asnani; Xi Yin; Tal Hassner; Sijia Liu; Xiaoming Liu;	code
337	StyTr2: Image Style Transfer With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Therefore, traditional neural style transfer methods face biased content representation. To address this critical issue, we take long-range dependencies of input images into account for image style transfer by proposing a transformer-based approach called StyTr^2.	Yingying Deng; Fan Tang; Weiming Dong; Chongyang Ma; Xingjia Pan; Lei Wang; Changsheng Xu;	code
338	Global Matching With Overlapping Attention for Optical Flow Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, inspired by the traditional matching-optimization methods where matching is introduced to handle large displacements before energy-based optimizations, we introduce a simple but effective global matching step before the direct regression and develop a learning-based matching-optimization framework, namely GMFlowNet.	Shiyu Zhao; Long Zhao; Zhixing Zhang; Enyu Zhou; Dimitris Metaxas;	code
339	Language As Queries for Referring Video Object Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a simple and unified framework built upon Transformer, termed ReferFormer.	Jiannan Wu; Yi Jiang; Peize Sun; Zehuan Yuan; Ping Luo;	code
340	MViTv2: Improved Multiscale Vision Transformers for Classification and Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection.	Yanghao Li; Chao-Yuan Wu; Haoqi Fan; Karttikeya Mangalam; Bo Xiong; Jitendra Malik; Christoph Feichtenhofer;	code
341	Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Focusing on the relatively underexplored task of audio-visual zero-shot learning, we propose to learn multi-modal representations from audio-visual data using cross-modal attention and exploit textual label embeddings for transferring knowledge from seen classes to unseen classes.	Otniel-Bogdan Mercea; Lukas Riesch; A. Sophia Koepke; Zeynep Akata;	code
342	Rethinking Efficient Lane Detection Via Curve Modeling Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a novel parametric curve-based method for lane detection in RGB images.	Zhengyang Feng; Shaohua Guo; Xin Tan; Ke Xu; Min Wang; Lizhuang Ma;	code
343	Self-Supervised Arbitrary-Scale Point Clouds Upsampling Via Implicit Neural Representation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel approach that achieves self-supervised and magnification-flexible point clouds upsampling simultaneously.	Wenbo Zhao; Xianming Liu; Zhiwei Zhong; Junjun Jiang; Wei Gao; Ge Li; Xiangyang Ji;	code
344	Co-Advise: Cross Inductive Bias Distillation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Unlike previous works, where merely heavy convolution-based teachers are provided, in this paper, we delve into the influence of models' inductive biases in knowledge distillation (e.g., convolution and involution).	Sucheng Ren; Zhengqi Gao; Tianyu Hua; Zihui Xue; Yonglong Tian; Shengfeng He; Hang Zhao;	code
345	AdaMixer: A Fast-Converging Query-Based Object Detector Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, this paradigm still suffers from slow convergence, limited performance, and design complexity of extra networks between backbone and decoder. In this paper, we find that the key to these issues is the adaptability of decoders for casting queries to varying objects.	Ziteng Gao; Limin Wang; Bing Han; Sheng Guo;	code
346	DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In these, there is a limited number of WSI slides (bags), while the resolution of a single WSI is huge, which leads to a large number of patches (instances) cropped from each slide. To address this issue, we propose to virtually enlarge the number of bags by introducing the concept of pseudo-bags, on which a double-tier MIL framework is built to effectively use the intrinsic features.	Hongrun Zhang; Yanda Meng; Yitian Zhao; Yihong Qiao; Xiaoyun Yang; Sarah E. Coupland; Yalin Zheng;	code
347	BEVT: BERT Pretraining of Video Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce BEVT which decouples video representation learning into spatial representation learning and temporal dynamics learning.	Rui Wang; Dongdong Chen; Zuxuan Wu; Yinpeng Chen; Xiyang Dai; Mengchen Liu; Yu-Gang Jiang; Luowei Zhou; Lu Yuan;	code
348	Deep Generalized Unfolding Networks for Image Restoration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Deep Generalized Unfolding Network (DGUNet) for image restoration.	Chong Mou; Qian Wang; Jian Zhang;	code
349	VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel single-stage framework for online VIS built based on the grid structured feature representation.	Su Ho Han; Sukjun Hwang; Seoung Wug Oh; Yeonchool Park; Hyunwoo Kim; Min-Jung Kim; Seon Joo Kim;	code
350	Deep Unlearning Via Randomized Conditionally Independent Hessians Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We use a variant of a new conditional independence coefficient, L-CODEC, to identify a subset of the model parameters with the most semantic overlap on an individual sample level.	Ronak Mehta; Sourav Pal; Vikas Singh; Sathya N. Ravi;	code
351	Revisiting Skeleton-Based Action Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose PoseConv3D, a new approach to skeleton-based action recognition.	Haodong Duan; Yue Zhao; Kai Chen; Dahua Lin; Bo Dai;	code
352	Stereo Depth From Events Cameras: Concentrate and Focus on The Future Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To alleviate the event missing or overriding issue, we propose to learn to concentrate on the dense events to produce a compact event representation with high details for depth estimation.	Yeongwoo Nam; Mohammad Mostafavi; Kuk-Jin Yoon; Jonghyun Choi;	code
353	A Simple Data Mixing Prior for Improving Self-Supervised Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Data mixing (e.g., Mixup, Cutmix, ResizeMix) is an essential component for advancing recognition models. In this paper, we focus on studying its effectiveness in the self-supervised setting.	Sucheng Ren; Huiyu Wang; Zhengqi Gao; Shengfeng He; Alan Yuille; Yuyin Zhou; Cihang Xie;	code
354	Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we investigate an alternative strategy for pre-training, namely Knowledge Distillation as Efficient Pre-training (KDEP), aiming to efficiently transfer the learned feature representation from existing pre-trained models to new student models for future downstream tasks.	Ruifei He; Shuyang Sun; Jihan Yang; Song Bai; Xiaojuan Qi;	code
355	BigDL 2.0: Seamless Scaling of AI Pipelines From Laptops to Distributed Cluster Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To address this challenge, we have open sourced BigDL 2.0 at https://github.com/intel-analytics/BigDL/ under Apache 2.0 license (combining the original BigDL [19] and Analytics Zoo [18] projects); using BigDL 2.0, users can simply build conventional Python notebooks on their laptops (with possible AutoML support), which can then be transparently accelerated on a single node (with up to 9.6x speedup in our experiments), and seamlessly scaled out to a large cluster (across several hundred servers in real-world use cases).	Jason (Jinquan) Dai; Ding Ding; Dongjie Shi; Shengsheng Huang; Jiao Wang; Xin Qiu; Kai Huang; Guoqiong Song; Yang Wang; Qiyuan Gong; Jiaming Song; Shan Yu; Le Zheng; Yina Chen; Junwei Deng; Ge Song;	code
356	Attentive Fine-Grained Structured Sparsity for Image Restoration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To further optimize the trade-off between the efficiency and the restoration accuracy, we propose a novel pruning method that determines the pruning ratio for N:M structured sparsity at each layer.	Junghun Oh; Heewon Kim; Seungjun Nah; Cheeun Hong; Jonghyun Choi; Kyoung Mu Lee;	code
357	Learning Fair Classifiers With Partially Annotated Group Labels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we consider a more practical scenario, dubbed as Algorithmic Group Fairness with the Partially annotated Group labels (Fair-PG).	Sangwon Jung; Sanghyuk Chun; Taesup Moon;	code
358	NightLab: A Dual-Level Architecture With Hardness Detection for Segmentation at Night Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose NightLab, a novel nighttime segmentation framework that leverages multiple deep learning models imbued with night-aware features to yield State-of-The-Art (SoTA) performance on multiple night segmentation benchmarks.	Xueqing Deng; Peng Wang; Xiaochen Lian; Shawn Newsam;	code
359	Constrained Few-Shot Class-Incremental Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To meet the above constraints, we propose C-FSCIL, which is architecturally composed of a frozen meta-learned feature extractor, a trainable fixed-size fully connected layer, and a rewritable dynamically growing memory that stores as many vectors as the number of encountered classes.	Michael Hersche; Geethan Karunaratne; Giovanni Cherubini; Luca Benini; Abu Sebastian; Abbas Rahimi;	code
360	Threshold Matters in WSSS: Manipulating The Activation for The Robust and Accurate Segmentation Model Against Thresholds Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Then, we show that this issue can be mitigated by satisfying two conditions: 1) reducing the imbalance in the foreground activation and 2) increasing the gap between the foreground and the background activation. Based on these findings, we propose a novel activation manipulation network with a per-pixel classification loss and a label conditioning module.	Minhyun Lee; Dongseob Kim; Hyunjung Shim;	code
361	TransMVSNet: Global Context-Aware Multi-View Stereo Network With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present TransMVSNet, based on our exploration of feature matching in multi-view stereo (MVS).	Yikang Ding; Wentao Yuan; Qingtian Zhu; Haotian Zhang; Xiangyue Liu; Yuanjiang Wang; Xiao Liu;	code
362	DPGEN: Differentially Private Generative Energy-Guided Network for Natural Image Synthesis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here, we propose DPGEN, a network model designed to synthesize high-resolution natural images while satisfying differential privacy.	Jia-Wei Chen; Chia-Mu Yu; Ching-Chia Kao; Tzai-Wei Pang; Chun-Shien Lu;	code
363	The Majority Can Help The Minority: Context-Rich Minority Oversampling for Long-Tailed Classification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel minority over-sampling method to augment diversified minority samples by leveraging the rich context of the majority classes as background images.	Seulki Park; Youngkyu Hong; Byeongho Heo; Sangdoo Yun; Jin Young Choi;	code
364	IntentVizor: Towards Generic Query Guided Interactive Video Summarization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: First, the text query might not be enough to describe the exact and diverse needs of the user. Second, the user cannot edit once the summaries are produced, while we assume the needs of the user should be subtle and need to be adjusted interactively. To solve these two problems, we propose IntentVizor, an interactive video summarization framework guided by generic multi-modality queries.	Guande Wu; Jianzhe Lin; Claudio T. Silva;	code
365	Shape-Invariant 3D Adversarial Point Clouds Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel Point-Cloud Sensitivity Map to boost both the efficiency and imperceptibility of point perturbations.	Qidong Huang; Xiaoyi Dong; Dongdong Chen; Hang Zhou; Weiming Zhang; Nenghai Yu;	code
366	Bootstrapping ViTs: Towards Liberating Vision Transformers From Pre-Training Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we strive to liberate ViTs from pre-training by introducing CNNs’ inductive biases back to ViTs while preserving their network architectures for higher upper bound and setting up more suitable optimization objectives.	Haofei Zhang; Jiarui Duan; Mengqi Xue; Jie Song; Li Sun; Mingli Song;	code
367	PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, one of the greatest challenges remains the creation of datasets with complete, unambiguous ground truth at scale. To address this, we develop a new, more comprehensive dataset for table extraction, called PubTables-1M.	Brandon Smock; Rohith Pesala; Robin Abraham;	code
368	Meta-Attention for ViT-Backed Continual Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study ViT-backed continual learning to strive for higher performance riding on recent advances of ViTs.	Mengqi Xue; Haofei Zhang; Jie Song; Mingli Song;	code
369	DST: Dynamic Substitute Training for Data-Free Black-Box Attack Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel dynamic substitute training attack method to encourage the substitute model to learn better and faster from the target model.	Wenxuan Wang; Xuelin Qian; Yanwei Fu; Xiangyang Xue;	code
370	Unified Contrastive Learning in Image-Text-Label Space Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we introduce a new formulation by combining the two data sources into a common image-text-label space.	Jianwei Yang; Chunyuan Li; Pengchuan Zhang; Bin Xiao; Ce Liu; Lu Yuan; Jianfeng Gao;	code
371	Unsupervised Pre-Training for Temporal Action Localization Tasks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: These pre-trained models can be sub-optimal for temporal localization tasks due to the inherent discrepancy between video-level classification and clip-level localization. To bridge this gap, we make the first attempt to propose a self-supervised pretext task, coined as Pseudo Action Localization (PAL) to Unsupervisedly Pre-train feature encoders for Temporal Action Localization tasks (UP-TAL).	Can Zhang; Tianyu Yang; Junwu Weng; Meng Cao; Jue Wang; Yuexian Zou;	code
372	Look Outside The Room: Synthesizing A Consistent Long-Term 3D Scene Video From A Single Image Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel approach to synthesize a consistent long-term video given a single scene image and a trajectory of large camera motions.	Xuanchi Ren; Xiaolong Wang;	code
373	High-Fidelity Human Avatars From A Single RGB Camera Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a coarse-to-fine framework to reconstruct a personalized high-fidelity human avatar from a monocular video.	Hao Zhao; Jinsong Zhang; Yu-Kun Lai; Zerong Zheng; Yingdi Xie; Yebin Liu; Kun Li;	code
374	Multiview Transformers for Video Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Although transformer architectures have recently advanced the state-of-the-art, they have not explicitly modelled different spatiotemporal resolutions. To this end, we present Multiview Transformers for Video Recognition (MTV).	Shen Yan; Xuehan Xiong; Anurag Arnab; Zhichao Lu; Mi Zhang; Chen Sun; Cordelia Schmid;	code
375	How Good Is Aesthetic Ability of A Fashion Model? Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce A100 (Aesthetic 100) to assess the aesthetic ability of the fashion compatibility models.	Xingxing Zou; Kaicheng Pang; Wen Zhang; Waikeung Wong;	code
376	Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, large disparities between existing synthetic datasets and real scenes lead to poor model transfer. We make two major contributions to address that.	Zhao Jin; Yinjie Lei; Naveed Akhtar; Haifeng Li; Munawar Hayat;	code
377	Sequential Voting With Relational Box Fields for Active Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To leverage each pixel as evidence to determine the bounding box of the active object, we propose a pixel-wise voting function.	Qichen Fu; Xingyu Liu; Kris Kitani;	code
378	Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we switch from D models to G models using the classical auto-encoder (AE).	Guangrun Wang; Yansong Tang; Liang Lin; Philip H.S. Torr;	code
379	Consistency Learning Via Decoding Path Augmentation for Transformers in Human Object Interaction Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Motivated by various inference paths for HOI detection, we propose cross-path consistency learning (CPC), which is a novel end-to-end learning strategy to improve HOI detection for transformers by leveraging augmented decoding paths.	Jihwan Park; SeungJun Lee; Hwan Heo; Hyeong Kyu Choi; Hyunwoo J. Kim;	code
380	Consistent Explanations By Contrastive Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Given an interpretation algorithm, e.g., Grad-CAM, we introduce a novel training method to train the model to produce more consistent explanations.	Vipin Pillai; Soroush Abbasi Koohpayegani; Ashley Ouligian; Dennis Fong; Hamed Pirsiavash;	code
381	Hierarchical Modular Network for Video Captioning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a hierarchical modular network to bridge video representations and linguistic semantics from three levels before generating captions.	Hanhua Ye; Guorong Li; Yuankai Qi; Shuhui Wang; Qingming Huang; Ming-Hsuan Yang;	code
382	Depth Estimation By Combining Binocular Stereo and Monocular Structured-Light Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a novel stereo system, which consists of two cameras (an RGB camera and an IR camera) and an IR speckle projector.	Yuhua Xu; Xiaoli Yang; Yushan Yu; Wei Jia; Zhaobi Chu; Yulan Guo;	code
383	Salient-to-Broad Transition for Video Person Re-Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Due to the limited utilization of temporal relations in video re-id, the frame-level attention regions of mainstream methods are partial and highly similar. To address this problem, we propose a Salient-to-Broad Module (SBM) to enlarge the attention regions gradually.	Shutao Bai; Bingpeng Ma; Hong Chang; Rui Huang; Xilin Chen;	code
384	DeeCap: Dynamic Early Exiting for Efficient Image Captioning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: On the other hand, the exiting decisions made by internal classifiers are sometimes unreliable. To solve these issues, we propose the DeeCap framework for efficient image captioning, which dynamically selects proper-sized decoding layers from a global perspective to exit early.	Zhengcong Fei; Xu Yan; Shuhui Wang; Qi Tian;	code
385	RepMLPNet: Hierarchical Vision MLP With Re-Parameterized Locality Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a methodology, Locality Injection, to incorporate local priors into an FC layer via merging the trained parameters of a parallel conv kernel into the FC kernel.	Xiaohan Ding; Honghao Chen; Xiangyu Zhang; Jungong Han; Guiguang Ding;	code
386	DR.VIC: Decomposition and Reasoning for Video Individual Counting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose to conduct the pedestrian counting from a new perspective – Video Individual Counting (VIC), which counts the total number of individual pedestrians in the given video (a person is only counted once).	Tao Han; Lei Bai; Junyu Gao; Qi Wang; Wanli Ouyang;	code
387	ARCS: Accurate Rotation and Correspondence Search Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper is about the old Wahba problem in its more general form, which we call simultaneous rotation and correspondence search.	Liangzu Peng; Manolis C. Tsakiris; René Vidal;	code
388	Learning To Anticipate Future With Dynamic Context Removal Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this field, previous methods usually care more about the model architecture design, but little attention has been paid to how to train an anticipation model with a proper learning policy. To this end, in this work, we propose a novel training scheme called Dynamic Context Removal (DCR), which dynamically schedules the visibility of observed future in the learning procedure.	Xinyu Xu; Yong-Lu Li; Cewu Lu;	code
389	GCFSR: A Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a generative and controllable face SR framework, called GCFSR, which can reconstruct images with faithful identity information without any additional priors.	Jingwen He; Wu Shi; Kai Chen; Lean Fu; Chao Dong;	code
390	On The Integration of Self-Attention and Convolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Convolution and self-attention are two powerful techniques for representation learning, and they are usually considered as two peer approaches that are distinct from each other. In this paper, we show that there exists a strong underlying relation between them, in the sense that the bulk of computations of these two paradigms are in fact done with the same operation.	Xuran Pan; Chunjiang Ge; Rui Lu; Shiji Song; Guanfu Chen; Zeyi Huang; Gao Huang;	code
391	Domain Adaptation on Point Clouds Via Geometry-Aware Implicits Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Here we propose a simple yet effective method for unsupervised domain adaptation on point clouds by employing a self-supervised task of learning geometry-aware implicits, which plays two critical roles in one shot.	Yuefan Shen; Yanchao Yang; Mi Yan; He Wang; Youyi Zheng; Leonidas J. Guibas;	code
392	GroupViT: Semantic Segmentation Emerges From Text Supervision Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Instead, in this paper, we propose to bring back the grouping mechanism into deep networks, which allows semantic segments to emerge automatically with only text supervision.	Jiarui Xu; Shalini De Mello; Sifei Liu; Wonmin Byeon; Thomas Breuel; Jan Kautz; Xiaolong Wang;	code
393	DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Specifically, these approaches often have difficulties in reconstructing images with novel poses, views, and highly variable contents compared to the training data, altering object identity, or producing unwanted image artifacts. To mitigate these problems and enable faithful manipulation of real images, we propose a novel method, dubbed DiffusionCLIP, that performs text-driven image manipulation using diffusion models.	Gwanghyun Kim; Taesung Kwon; Jong Chul Ye;	code
394	BppAttack: Stealthy and Efficient Trojan Attacks Against Deep Neural Networks Via Image Quantization and Contrastive Adversarial Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose stealthy and efficient Trojan attacks, BppAttack.	Zhenting Wang; Juan Zhai; Shiqing Ma;	code
395	Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Towards this end, in this paper, we first present a novel Stacked Hybrid-Attention network, which facilitates the intra-modal refinement as well as the inter-modal interaction, to serve as the encoder. We then devise an innovative Group Collaborative Learning strategy to optimize the decoder.	Xingning Dong; Tian Gan; Xuemeng Song; Jianlong Wu; Yuan Cheng; Liqiang Nie;	code
396	Towards Better Plasticity-Stability Trade-Off in Incremental Learning: A Simple Linear Connector Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose to employ mode connectivity in loss landscapes to achieve better plasticity-stability trade-off without any previous samples.	Guoliang Lin; Hanlu Chu; Hanjiang Lai;	code
397	Topology-Preserving Shape Reconstruction and Registration Via Neural Diffeomorphic Flow Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a new model called Neural Diffeomorphic Flow (NDF) to learn deep implicit shape templates, representing shapes as conditional diffeomorphic deformations of templates, intrinsically preserving shape topologies.	Shanlin Sun; Kun Han; Deying Kong; Hao Tang; Xiangyi Yan; Xiaohui Xie;	code
398	Segment and Complete: Defending Object Detectors Against Adversarial Patch Attacks With Robust Patch Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose Segment and Complete defense (SAC), a general framework for defending object detectors against patch attacks through detection and removal of adversarial patches.	Jiang Liu; Alexander Levine; Chun Pong Lau; Rama Chellappa; Soheil Feizi;	code
399	MAXIM: Multi-Axis MLP for Image Processing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present a multi-axis MLP based architecture called MAXIM, that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks.	Zhengzhong Tu; Hossein Talebi; Han Zhang; Feng Yang; Peyman Milanfar; Alan Bovik; Yinxiao Li;	code
400	Learning Part Segmentation Through Unsupervised Domain Adaptation From Synthetic Vehicles Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose the idea of learning part segmentation through unsupervised domain adaptation (UDA) from synthetic data.	Qing Liu; Adam Kortylewski; Zhishuai Zhang; Zizhang Li; Mengqi Guo; Qihao Liu; Xiaoding Yuan; Jiteng Mu; Weichao Qiu; Alan Yuille;	code
401	PSTR: End-to-End One-Step Person Search With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel one-step transformer-based person search framework, PSTR, that jointly performs person detection and re-identification (re-id) in a single architecture.	Jiale Cao; Yanwei Pang; Rao Muhammad Anwer; Hisham Cholakkal; Jin Xie; Mubarak Shah; Fahad Shahbaz Khan;	code
402	NFormer: Robust Person Re-Identification With Neighbor Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, due to the high intra-identity variations, ignoring such interactions typically leads to outlier features. To tackle this issue, we propose a Neighbor Transformer Network, or NFormer, which explicitly models interactions across all input images, thus suppressing outlier features and leading to more robust representations overall.	Haochen Wang; Jiayi Shen; Yongtuo Liu; Yan Gao; Efstratios Gavves;	code
403	Bridging Global Context Interactions for High-Fidelity Image Completion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence.	Chuanxia Zheng; Tat-Jen Cham; Jianfei Cai; Dinh Phung;	code
404	SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we present SwinBERT, an end-to-end transformer-based model for video captioning, which takes video frame patches directly as inputs, and outputs a natural language description.	Kevin Lin; Linjie Li; Chung-Ching Lin; Faisal Ahmed; Zhe Gan; Zicheng Liu; Yumao Lu; Lijuan Wang;	code
405	Not All Tokens Are Equal: Human-Centric Visual Analysis Via Token Clustering Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, not all regions are equally important in human-centric vision tasks, e.g., the human body needs a fine representation with many tokens, while the image background can be modeled by a few tokens. To address this problem, we propose a novel Vision Transformer, called Token Clustering Transformer (TCFormer), which merges tokens by progressive clustering, where the tokens can be merged from different locations with flexible shapes and sizes.	Wang Zeng; Sheng Jin; Wentao Liu; Chen Qian; Ping Luo; Wanli Ouyang; Xiaogang Wang;	code
406	Temporally Efficient Vision Transformer for Video Instance Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS).	Shusheng Yang; Xinggang Wang; Yu Li; Yuxin Fang; Jiemin Fang; Wenyu Liu; Xun Zhao; Ying Shan;	code
407	The Devil Is in The Margin: Margin-Based Label Smoothing for Network Calibration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Following our observations, we propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances.	Bingyuan Liu; Ismail Ben Ayed; Adrian Galdran; Jose Dolz;	code
408	NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce NLX-GPT, a general, compact and faithful language model that can simultaneously predict an answer and explain it.	Fawaz Sammani; Tanmoy Mukherjee; Nikos Deligiannis;	code
409	WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose WarpingGAN, an effective and efficient 3D point cloud generation network.	Yingzhi Tang; Yue Qian; Qijian Zhang; Yiming Zeng; Junhui Hou; Xuefei Zhe;	code
410	Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To eliminate the heavy dependence on human annotations, we present a novel method, named Pseudo-Q, to automatically generate pseudo language queries for supervised training.	Haojun Jiang; Yuanze Lin; Dongchen Han; Shiji Song; Gao Huang;	code
411	E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we show that event data is a very valuable modality for egocentric action recognition.	Chiara Plizzari; Mirco Planamente; Gabriele Goletto; Marco Cannici; Emanuele Gusso; Matteo Matteucci; Barbara Caputo;	code
412	OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we first identify and measure two distinct kinds of distribution shifts that are ubiquitous in various datasets. Next, through extensive experiments, we compare OoD generalization algorithms across two groups of benchmarks, each dominated by one of the distribution shifts, revealing their strengths on one shift as well as limitations on the other shift.	Nanyang Ye; Kaican Li; Haoyue Bai; Runpeng Yu; Lanqing Hong; Fengwei Zhou; Zhenguo Li; Jun Zhu;	code
413	OnePose: One-Shot Object Pose Estimation Without CAD Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a new method named OnePose for object pose estimation.	Jiaming Sun; Zihao Wang; Siyu Zhang; Xingyi He; Hongcheng Zhao; Guofeng Zhang; Xiaowei Zhou;	code
414	Rethinking Minimal Sufficient Representation in Contrastive Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This reveals a new problem that the contrastive learning models have the risk of over-fitting to the shared information between views. To alleviate this problem, we propose to increase the mutual information between the representation and input as regularization to approximately introduce more task-relevant information, since we cannot utilize any downstream task information during training.	Haoqing Wang; Xun Guo; Zhi-Hong Deng; Yan Lu;	code
415	Scalable Penalized Regression for Noise Detection in Learning With Noisy Labels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose using a theoretically guaranteed noisy label detection framework to detect and remove noisy data for Learning with Noisy Labels (LNL).	Yikai Wang; Xinwei Sun; Yanwei Fu;	code
416	Federated Class-Incremental Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To tackle the global forgetting brought by the non-i.i.d class imbalance across clients, we propose a proxy server that selects the best old global model to assist the local relation distillation.	Jiahua Dong; Lixu Wang; Zhen Fang; Gan Sun; Shichao Xu; Xiao Wang; Qi Zhu;	code
417	Show, Deconfound and Tell: Image Captioning With Causal Inference Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we first use Structural Causal Models (SCMs) to show how two confounders damage the image captioning. Then we apply the backdoor adjustment to propose a novel causal inference based image captioning (CIIC) framework, which consists of an interventional object detector (IOD) and an interventional transformer decoder (ITD) to jointly confront both confounders.	Bing Liu; Dong Wang; Xu Yang; Yong Zhou; Rui Yao; Zhiwen Shao; Jiaqi Zhao;	code
418	MobRecon: Mobile-Friendly Hand Mesh Reconstruction From Monocular Image Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a framework for single-view hand mesh reconstruction, which can simultaneously achieve high reconstruction accuracy, fast inference speed, and temporal coherence.	Xingyu Chen; Yufeng Liu; Yajiao Dong; Xiong Zhang; Chongyang Ma; Yanmin Xiong; Yuan Zhang; Xiaoyan Guo;	code
419	Parameter-Free Online Test-Time Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Motivated by the inherent uncertainty around the conditions that will ultimately be encountered at test time, we propose a particularly "conservative" approach, which addresses the problem with a Laplacian Adjusted Maximum-likelihood Estimation (LAME) objective.	Malik Boudiaf; Romain Mueller; Ismail Ben Ayed; Luca Bertinetto;	code
420	SIGMA: Semantic-Complete Graph Matching for Domain Adaptive Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Though great success, they ignore the significant within-class variance and the domain-mismatched semantics within the training batch, leading to a sub-optimal adaptation. To overcome these challenges, we propose a novel SemantIc-complete Graph MAtching (SIGMA) framework for DAOD, which completes mismatched semantics and reformulates the adaptation with graph matching.	Wuyang Li; Xinyu Liu; Yixuan Yuan;	code
421	No Pain, Big Gain: Classify Dynamic Point Cloud Sequences With Static Models By Fitting Feature-Level Space-Time Surfaces Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To capture 3D motions without explicitly tracking correspondences, we propose a kinematics-inspired neural network (Kinet) by generalizing the kinematic concept of ST-surfaces to the feature space.	Jia-Xing Zhong; Kaichen Zhou; Qingyong Hu; Bing Wang; Niki Trigoni; Andrew Markham;	code
422	HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive Imaging Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel Hyperspectral Explicable Reconstruction and Optimal Sampling deep Network for SCI, dubbed HerosNet, which includes several phases under the ISTA-unfolding framework.	Xuanyu Zhang; Yongbing Zhang; Ruiqin Xiong; Qilin Sun; Jian Zhang;	code
423	Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper explores the feasibility of finding an optimal sub-model from a vision transformer and introduces a pure vision transformer slimming (ViT-Slim) framework.	Arnav Chavan; Zhiqiang Shen; Zhuang Liu; Zechun Liu; Kwang-Ting Cheng; Eric P. Xing;	code
424	Learning To Estimate Robust 3D Human Mesh From In-the-Wild Crowded Scenes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We consider the problem of recovering a single person’s 3D human mesh from in-the-wild crowded scenes.	Hongsuk Choi; Gyeongsik Moon; JoonKyu Park; Kyoung Mu Lee;	code
425	Detecting Deepfakes With Self-Blended Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present novel synthetic training data called self-blended images (SBIs) to detect deepfakes.	Kaede Shiohara; Toshihiko Yamasaki;	code
426	Implicit Sample Extension for Unsupervised Person Re-Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Due to the limited samples in each identity, we suppose there may lack some underlying information to well reveal the accurate clusters. To discover these information, we propose an Implicit Sample Extension (ISE) method to generate what we call support samples around the cluster boundaries.	Xinyu Zhang; Dongdong Li; Zhigang Wang; Jian Wang; Errui Ding; Javen Qinfeng Shi; Zhaoxiang Zhang; Jingdong Wang;	code
427	Energy-Based Latent Aligner for Incremental Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose ELI: Energy-based Latent Aligner for Incremental Learning, which first learns an energy manifold for the latent representations such that previous task latents will have low energy and the current task latents have high energy values.	K J Joseph; Salman Khan; Fahad Shahbaz Khan; Rao Muhammad Anwer; Vineeth N Balasubramanian;	code
428	Towards Semi-Supervised Deep Facial Expression Recognition With An Adaptive Confidence Margin Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we learn an Adaptive Confidence Margin (Ada-CM) to fully leverage all unlabeled data for semi-supervised deep facial expression recognition.	Hangyu Li; Nannan Wang; Xi Yang; Xiaoyu Wang; Xinbo Gao;	code
429	Group R-CNN for Weakly Semi-Supervised Object Detection With Points Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Based on the classic R-CNN architecture, we propose an effective point-to-box regressor: Group R-CNN.	Shilong Zhang; Zhuoran Yu; Liyang Liu; Xinjiang Wang; Aojun Zhou; Kai Chen;	code
430	Weakly-Supervised Action Transition Learning for Stochastic Human Motion Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce the task of action-driven stochastic human motion prediction, which aims to predict multiple plausible future motions given a sequence of action labels and a short motion history.	Wei Mao; Miaomiao Liu; Mathieu Salzmann;	code
431	Hybrid Relation Guided Set Matching for Few-Shot Action Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To overcome the two limitations, we propose a novel Hybrid Relation guided Set Matching (HyRSM) approach that incorporates two key components: hybrid relation module and set matching metric.	Xiang Wang; Shiwei Zhang; Zhiwu Qing; Mingqian Tang; Zhengrong Zuo; Changxin Gao; Rong Jin; Nong Sang;	code
432	Cross-Patch Dense Contrastive Learning for Semi-Supervised Segmentation of Cellular Nuclei in Histopathologic Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We study the semi-supervised learning problem, using a few labeled data and a large amount of unlabeled data to train the network, by developing a cross-patch dense contrastive learning framework, to segment cellular nuclei in histopathologic images.	Huisi Wu; Zhaoze Wang; Youyi Song; Lin Yang; Jing Qin;	code
433	Generalized Binary Search Network for Highly-Efficient Multi-View Stereo Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel method for highly efficient MVS that remarkably decreases the memory footprint, meanwhile clearly advancing state-of-the-art depth prediction performance.	Zhenxing Mi; Chang Di; Dan Xu;	code
434	SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce the largest synthetic dataset for autonomous driving, SHIFT.	Tao Sun; Mattia Segu; Janis Postels; Yuxuan Wang; Luc Van Gool; Bernt Schiele; Federico Tombari; Fisher Yu;	code
435	FlexIT: Towards Flexible Semantic Image Translation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing.	Guillaume Couairon; Asya Grechka; Jakob Verbeek; Holger Schwenk; Matthieu Cord;	code
436	CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: On large displacements with motion blur, noisy correlations could cause severe errors in the estimated flow. To overcome this challenge, we propose a new architecture "CRoss-Attentional Flow Transformer" (CRAFT), aiming to revitalize the correlation volume computation.	Xiuchao Sui; Shaohua Li; Xue Geng; Yan Wu; Xinxing Xu; Yong Liu; Rick Goh; Hongyuan Zhu;	code
437	BoxeR: Box-Attention for 2D and 3D Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a simple attention mechanism, we call Box-Attention.	Duy-Kien Nguyen; Jihong Ju; Olaf Booij; Martin R. Oswald; Cees G. M. Snoek;	code
438	Neural Architecture Search With Representation Mutual Information Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing strategies, such as employing standard training or performance predictor, often suffer from high computational complexity and low generality. To address this issue, we propose to rank architectures by Representation Mutual Information (RMI).	Xiawu Zheng; Xiang Fei; Lei Zhang; Chenglin Wu; Fei Chao; Jianzhuang Liu; Wei Zeng; Yonghong Tian; Rongrong Ji;	code
439	Can Neural Nets Learn The Same Model Twice? Investigating Reproducibility and Double Descent From The Decision Boundary Perspective Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We discuss methods for visualizing neural network decision boundaries and decision regions.	Gowthami Somepalli; Liam Fowl; Arpit Bansal; Ping Yeh-Chiang; Yehuda Dar; Richard Baraniuk; Micah Goldblum; Tom Goldstein;	code
440	Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a novel method based on a hierarchy built on 1-nearest neighbor graphs in the original space which is used to preserve the grouping properties of the data distribution on multiple levels.	Saquib Sarfraz; Marios Koulakis; Constantin Seibold; Rainer Stiefelhagen;	code
441	Multi-View Transformer for 3D Visual Grounding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Multi-View Transformer (MVT) for 3D visual grounding.	Shijia Huang; Yilun Chen; Jiaya Jia; Liwei Wang;	code
442	Structured Sparse R-CNN for Direct Scene Graph Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Instead, from a perspective on SGG as a direct set prediction, this paper presents a simple, sparse, and unified framework, termed as Structured Sparse R-CNN.	Yao Teng; Limin Wang;	code
443	BARC: Learning To Regress 3D Dog Shape From Images By Exploiting Breed Information Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our goal is to recover the 3D shape and pose of dogs from a single image.	Nadine Rüegg; Silvia Zuffi; Konrad Schindler; Michael J. Black;	code
444	PCA-Based Knowledge Distillation Towards Lightweight and Content-Style Balanced Photorealistic Style Transfer Models Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce PCA-based knowledge distillation to distill lightweight models and show it is motivated by theory.	Tai-Yin Chiu; Danna Gurari;	code
445	Towards Understanding Adversarial Robustness of Optical Flow Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we analyze the cause of the problem and show that the lack of robustness is rooted in the classical aperture problem of optical flow estimation in combination with bad choices in the details of the network architecture.	Simon Schrodi; Tonmoy Saikia; Thomas Brox;	code
446	Lifelong Graph Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we bridge GNN and lifelong learning by converting a continual graph learning problem to a regular graph learning problem so GNN can inherit the lifelong learning techniques developed for convolutional neural networks (CNN).	Chen Wang; Yuheng Qiu; Dasong Gao; Sebastian Scherer;	code
447	Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose Hypergraph-Induced Semantic Tuplet (HIST) loss for deep metric learning that leverages the multilateral semantic relations of multiple samples to multiple classes via hypergraph modeling.	Jongin Lim; Sangdoo Yun; Seulki Park; Jin Young Choi;	code
448	Computing Wasserstein-p Distance Between Images With Linear Cost Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel algorithm to compute the Wasserstein-p distance between discrete measures by restricting the optimal transport (OT) problem on a subset.	Yidong Chen; Chen Li; Zhonghua Lu;	code
449	Unsupervised Representation Learning for Binary Networks By Joint Classifier Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: But such networks are not readily deployable to edge devices. To accelerate deployment of models with the benefit of unsupervised representation learning to such resource limited devices for various downstream tasks, we propose a self-supervised learning method for binary networks that uses a moving target network.	Dahyun Kim; Jonghyun Choi;	code
450	Large-Scale Video Panoptic Segmentation in The Wild: A Benchmark Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a new large-scale dataset for the video panoptic segmentation task, which aims to assign semantic classes and track identities to all pixels in a video.	Jiaxu Miao; Xiaohan Wang; Yu Wu; Wei Li; Xu Zhang; Yunchao Wei; Yi Yang;	code
451	GrainSpace: A Large-Scale Dataset for Fine-Grained and Domain-Adaptive Recognition of Cereal Grains Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper we formulate GAI as three ubiquitous computer vision tasks: fine-grained recognition, domain adaptation and out-of-distribution recognition.	Lei Fan; Yiwen Ding; Dongdong Fan; Donglin Di; Maurice Pagnucco; Yang Song;	code
452	Learning Modal-Invariant and Temporal-Memory for Video-Based Visible-Infrared Person Re-Identification Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we primarily study the video-based cross-modal person Re-ID method.	Xinyu Lin; Jinxing Li; Zeyu Ma; Huafeng Li; Shuang Li; Kaixiong Xu; Guangming Lu; David Zhang;	code
453	MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Prior works either simply align the global features of an image with its associated class semantic vector or utilize unidirectional attention to learn the limited latent semantic representations, which could not effectively discover the intrinsic semantic knowledge (e.g., attribute semantics) between visual and attribute features. To solve the above dilemma, we propose a Mutually Semantic Distillation Network (MSDN), which progressively distills the intrinsic semantic representations between visual and attribute features for ZSL.	Shiming Chen; Ziming Hong; Guo-Sen Xie; Wenhan Yang; Qinmu Peng; Kai Wang; Jian Zhao; Xinge You;	code
454	Oriented RepPoints for Aerial Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Unlike the mainstreamed approaches regressing the bounding box orientations, this paper proposes an effective adaptive points learning approach to aerial object detection by taking advantage of the adaptive points representation, which is able to capture the geometric information of the arbitrary-oriented instances.	Wentong Li; Yijie Chen; Kaixuan Hu; Jianke Zhu;	code
455	Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Moreover, they train their model to distinguish positive visual-language pairs from negative ones randomly collected from other videos, ignoring the highly confusing video segments within the same video. In this paper, we propose Contrastive Proposal Learning (CPL) to overcome the above limitations.	Minghang Zheng; Yanjie Huang; Qingchao Chen; Yuxin Peng; Yang Liu;	code
456	Low-Resource Adaptation for Personalized Co-Speech Gesture Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose an approach, named DiffGAN, that efficiently personalizes co-speech gesture generation models of a high-resource source speaker to target speaker with just 2 minutes of target training data.	Chaitanya Ahuja; Dong Won Lee; Louis-Philippe Morency;	code
457	Task-Specific Inconsistency Alignment for Domain Adaptive Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To alleviate this problem of domain shift, conventional wisdom typically concentrates solely on reducing the discrepancy between the source and target domains via attached domain classifiers, yet ignoring the difficulty of such transferable features in coping with both classification and localization subtasks in object detection. To address this issue, in this paper, we propose Task-specific Inconsistency Alignment (TIA), by developing a new alignment mechanism in separate task spaces, improving the performance of the detector on both subtasks.	Liang Zhao; Limin Wang;	code
458	MS2DG-Net: Progressive Correspondence Learning Via Multiple Sparse Semantics Dynamic Graph Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, most such works ignore similar sparse semantics information between two given images and cannot capture local topology among correspondences well. Therefore, to deal with the above problems, Multiple Sparse Semantics Dynamic Graph Network (MS²DG-Net) is proposed, in this paper, to predict probabilities of correspondences as inliers and recover camera poses.	Luanyuan Dai; Yizhang Liu; Jiayi Ma; Lifang Wei; Taotao Lai; Changcai Yang; Riqing Chen;	code
459	Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a framework for modeling interactional communication in dyadic conversations: given multimodal inputs of a speaker, we autoregressively output multiple possibilities of corresponding listener motion.	Evonne Ng; Hanbyul Joo; Liwen Hu; Hao Li; Trevor Darrell; Angjoo Kanazawa; Shiry Ginosar;	code
460	Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation From Monocular Video Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, the existing methods mainly rely on recurrent or convolutional operation to model such temporal information, which limits the ability to capture non-local context relations of human motion. To address this problem, we propose a motion pose and shape network (MPS-Net) to effectively capture humans in motion to estimate accurate and temporally coherent 3D human pose and shape from a video.	Wen-Li Wei; Jen-Chun Lin; Tyng-Luh Liu; Hong-Yuan Mark Liao;	code
461	MixFormer: End-to-End Tracking With Iterative Mixed Attention Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To simplify this pipeline and unify the process of feature extraction and target information integration, we present a compact tracking framework, termed as MixFormer, built upon transformers.	Yutao Cui; Cheng Jiang; Limin Wang; Gangshan Wu;	code
462	Plenoxels: Radiance Fields Without Neural Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce Plenoxels (plenoptic voxels), a system for photorealistic view synthesis.	Sara Fridovich-Keil; Alex Yu; Matthew Tancik; Qinhong Chen; Benjamin Recht; Angjoo Kanazawa;	code
463	Selective-Supervised Contrastive Learning With Noisy Labels Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To learn robust representations and handle noisy labels, we propose selective-supervised contrastive learning (Sel-CL) in this paper.	Shikun Li; Xiaobo Xia; Shiming Ge; Tongliang Liu;	code
464	SimT: Handling Open-Set Noise for Domain Adaptive Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a simplex noise transition matrix (SimT) to model the mixed noise distributions in DA semantic segmentation and formulate the problem as estimation of SimT.	Xiaoqing Guo; Jie Liu; Tongliang Liu; Yixuan Yuan;	code
465	Frequency-Driven Imperceptible Adversarial Attack on Semantic Similarity Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To circumvent the former problem, we propose a novel algorithm that attacks semantic similarity on feature representations.	Cheng Luo; Qinliang Lin; Weicheng Xie; Bizhu Wu; Jinheng Xie; Linlin Shen;	code
466	Video Demoireing With Relation-Based Temporal Consistency Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Considering the increasing demands for capturing videos, we study how to remove such undesirable moire patterns in videos, namely video demoireing. To this end, we introduce the first hand-held video demoireing dataset with a dedicated data collection pipeline to ensure spatial and temporal alignments of captured data.	Peng Dai; Xin Yu; Lan Ma; Baoheng Zhang; Jia Li; Wenbo Li; Jiajun Shen; Xiaojuan Qi;	code
467	Industrial Style Transfer With Large-Scale Geometric Warping and Content Preservation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel style transfer method to quickly create a new visual product with a nice appearance for industrial designers’ reference.	Jinchao Yang; Fei Guo; Shuo Chen; Jun Li; Jian Yang;	code
468	Modeling Image Composition for Complex Scene Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a method that achieves state-of-the-art results on challenging (few-shot) layout-to-image generation tasks by accurately modeling textures, structures and relationships contained in a complex scene.	Zuopeng Yang; Daqing Liu; Chaoyue Wang; Jie Yang; Dacheng Tao;	code
469	Decoupling Zero-Shot Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Based on the decoupling formulation, we propose a simple and effective zero-shot semantic segmentation model, called ZegFormer, which outperforms the previous methods on ZS3 standard benchmarks by large margins, e.g., 22 points on the PASCAL VOC and 3 points on the COCO-Stuff in terms of mIoU for unseen classes.	Jian Ding; Nan Xue; Gui-Song Xia; Dengxin Dai;	code
470	Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a method that can recognize new objects and estimate their 3D pose in RGB images even under partial occlusions.	Van Nguyen Nguyen; Yinlin Hu; Yang Xiao; Mathieu Salzmann; Vincent Lepetit;	code
471	Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting The Adversarial Transferability Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we treat the iterative ensemble attack as a stochastic gradient descent optimization process, in which the variance of the gradients on different models may lead to poor local optima.	Yifeng Xiong; Jiadong Lin; Min Zhang; John E. Hopcroft; Kun He;	code
472	IFOR: Iterative Flow Minimization for Robotic Object Rearrangement Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose IFOR, Iterative Flow Minimization for Robotic Object Rearrangement, an end-to-end method for the challenging problem of object rearrangement for unknown objects given an RGBD image of the original and final scenes.	Ankit Goyal; Arsalan Mousavian; Chris Paxton; Yu-Wei Chao; Brian Okorn; Jia Deng; Dieter Fox;	code
473	Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present a unified approach to visual navigation using a novel modular transfer learning model.	Ziad Al-Halah; Santhosh Kumar Ramakrishnan; Kristen Grauman;	code
474	TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a mobile-friendly architecture named Token Pyramid Vision Transformer (TopFormer).	Wenqiang Zhang; Zilong Huang; Guozhong Luo; Tao Chen; Xinggang Wang; Wenyu Liu; Gang Yu; Chunhua Shen;	code
475	The Wanderings of Odysseus in 3D Scenes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our goal is to populate digital environments, in which digital humans have diverse body shapes, move perpetually, and have plausible body-scene contact.	Yan Zhang; Siyu Tang;	code
476	All-in-One Image Restoration for Unknown Corruption Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study a challenging problem in image restoration, namely, how to develop an all-in-one method that could recover images from a variety of unknown corruption types and levels.	Boyun Li; Xiao Liu; Peng Hu; Zhongqin Wu; Jiancheng Lv; Xi Peng;	code
477	PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose to explicitly integrate two matching priors in a single loss in order to learn local descriptors without supervision.	Jérome Revaud; Vincent Leroy; Philippe Weinzaepfel; Boris Chidlovskii;	code
478	MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose MixSTE (Mixed Spatio-Temporal Encoder), which has a temporal transformer block to separately model the temporal motion of each joint and a spatial transformer block to learn inter-joint spatial correlation.	Jinlu Zhang; Zhigang Tu; Jianyu Yang; Yujin Chen; Junsong Yuan;	code
479	RCP: Recurrent Closest Point for Point Cloud Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, these methods are limited by the fact that it is difficult to define a search window on point clouds because of the irregular data structure. In this paper, we avoid this irregularity by a simple yet effective method.	Xiaodong Gu; Chengzhou Tang; Weihao Yuan; Zuozhuo Dai; Siyu Zhu; Ping Tan;	code
480	A Dual Weighting Label Assignment Scheme for Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we explore a new weighting paradigm, termed dual weighting (DW), to specify pos and neg weights separately.	Shuai Li; Chenhang He; Ruihuang Li; Lei Zhang;	code
481	Hyperbolic Vision Transformers: Combining Improvements in Metric Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: An emerging interest in learning hyperbolic data embeddings suggests that hyperbolic geometry can be beneficial for natural data. Following this line of work, we propose a new hyperbolic-based model for metric learning.	Aleksandr Ermolov; Leyla Mirvakhabova; Valentin Khrulkov; Nicu Sebe; Ivan Oseledets;	code
482	Instance-Aware Dynamic Neural Network Quantization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present to conduct the low-bit quantization for each image individually, and develop a dynamic quantization scheme for exploring their optimal bit-widths.	Zhenhua Liu; Yunhe Wang; Kai Han; Siwei Ma; Wen Gao;	code
483	Exploring Effective Data for Surrogate Training Towards Black-Box Attack Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To this end, we propose a triple-player framework by introducing a discriminator into the traditional data-free framework.	Xuxiang Sun; Gong Cheng; Hongda Li; Lei Pei; Junwei Han;	code
484	JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we introduce JRDB-Act, as an extension of the existing JRDB, which is captured by a social mobile manipulator and reflects a real distribution of human daily-life actions in a university campus environment.	Mahsa Ehsanpour; Fatemeh Saleh; Silvio Savarese; Ian Reid; Hamid Rezatofighi;	code
485	Investigating Top-k White-Box and Transferable Black-Box Attack Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To this end, we propose a new normalized CE loss that guides the logit to be updated in the direction of implicitly maximizing its rank distance from the ground-truth class.	Chaoning Zhang; Philipp Benz; Adil Karjauv; Jae Won Cho; Kang Zhang; In So Kweon;	code
486	Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Although previous RGB-D-based motion recognition methods have achieved promising performance through the tightly coupled multi-modal spatiotemporal representation, they still suffer from (i) optimization difficulty under small data setting due to the tightly spatiotemporal-entangled modeling; (ii) information redundancy as it usually contains lots of marginal information that is weakly relevant to classification; and (iii) low interaction between multi-modal spatiotemporal information caused by insufficient late fusion. To alleviate these drawbacks, we propose to decouple and recouple spatiotemporal representation for RGB-D-based motion recognition.	Benjia Zhou; Pichao Wang; Jun Wan; Yanyan Liang; Fan Wang; Du Zhang; Zhen Lei; Hao Li; Rong Jin;	code
487	A Self-Supervised Descriptor for Image Copy Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce SSCD, a model that builds on a recent self-supervised contrastive training objective.	Ed Pizzi; Sreya Dutta Roy; Sugosh Nagavara Ravindra; Priya Goyal; Matthijs Douze;	code
488	Negative-Aware Attention Framework for Image-Text Matching Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We thereby propose a novel Negative-Aware Attention Framework (NAAF), which explicitly exploits both the positive effect of matched fragments and the negative effect of mismatched fragments to jointly infer image-text similarity.	Kun Zhang; Zhendong Mao; Quan Wang; Yongdong Zhang;	code
489	An Image Patch Is A Wave: Phase-Aware Vision MLP Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To dynamically aggregate tokens, we propose to represent each token as a wave function with two parts, amplitude and phase.	Yehui Tang; Kai Han; Jianyuan Guo; Chang Xu; Yanxi Li; Chao Xu; Yunhe Wang;	code
490	Shunted Self-Attention Via Multi-Scale Token Aggregation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Such a constraint inevitably limits the ability of each self-attention layer in capturing multi-scale features, thereby leading to performance degradation in handling images with multiple objects of different scales. To address this issue, we propose a novel and generic strategy, termed shunted self-attention (SSA), that allows ViTs to model the attentions at hybrid scales per attention layer.	Sucheng Ren; Daquan Zhou; Shengfeng He; Jiashi Feng; Xinchao Wang;	code
491	Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Specifically, a multivariate Gaussian mixture is proposed with means and covariances to be estimated. Then, a novel probabilistic vector quantization is utilized to effectively approximate means, and remaining covariances are further induced to a unified mixture and solved by cascaded estimation without context models involved.	Xiaosu Zhu; Jingkuan Song; Lianli Gao; Feng Zheng; Heng Tao Shen;	code
492	Recurrent Variational Network: A Deep Learning Inverse Problem Solver Applied to The Task of Accelerated MRI Reconstruction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work we present a novel Deep Learning-based Inverse Problem solver applied to the task of Accelerated MRI Reconstruction, called the Recurrent Variational Network (RecurrentVarNet), by exploiting the properties of Convolutional Recurrent Neural Networks and unrolled algorithms for solving Inverse Problems.	George Yiasemis; Jan-Jakob Sonke; Clarisa Sánchez; Jonas Teuwen;	code
493	Surpassing The Human Accuracy: Detecting Gallbladder Cancer From USG Images With Curriculum Learning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We explore the potential of CNN-based models for gallbladder cancer (GBC) detection from ultrasound (USG) images as no prior study is known.	Soumen Basu; Mayank Gupta; Pratyaksha Rana; Pankaj Gupta; Chetan Arora;	code
494	Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Accordingly, we propose our defense strategy, namely Appearance and Structure Aware Robust Graph Matching (ASAR-GM).	Qibing Ren; Qingquan Bao; Runzhong Wang; Junchi Yan;	code
495	TrackFormer: Multi-Object Tracking With Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: The challenging task of multi-object tracking (MOT) requires simultaneous reasoning about track initialization, identity, and spatio-temporal trajectories. We formulate this task as a frame-to-frame set prediction problem and introduce TrackFormer, an end-to-end trainable MOT approach based on an encoder-decoder Transformer architecture.	Tim Meinhardt; Alexander Kirillov; Laura Leal-Taixé; Christoph Feichtenhofer;	code
496	3D Shape Reconstruction From 2D Images With Disentangled Attribute Flow Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Since the semantic attributes of a single image are usually implicit and entangled with each other, it is still challenging to reconstruct 3D shape with detailed semantic structures represented by the input image. To address this problem, we propose 3DAttriFlow to disentangle and extract semantic attributes through different semantic levels in the input images.	Xin Wen; Junsheng Zhou; Yu-Shen Liu; Hua Su; Zhen Dong; Zhizhong Han;	code
497	Feature Statistics Mixing Regularization for Generative Adversarial Networks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: As a remedy, we propose feature statistics mixing regularization (FSMR) that encourages the discriminator’s prediction to be invariant to the styles of input images.	Junho Kim; Yunjey Choi; Youngjung Uh;	code
498	OpenTAL: Towards Open Set Temporal Action Localization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we, for the first time, step toward the Open Set TAL (OSTAL) problem and propose a general framework OpenTAL based on Evidential Deep Learning (EDL).	Wentao Bao; Qi Yu; Yu Kong;	code
499	Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work addresses the generalizable deepfake detection from a simple principle: a generalizable representation should be sensitive to diverse types of forgeries. Following this principle, we propose to enrich the "diversity" of forgeries by synthesizing augmented forgeries with a pool of forgery configurations and strengthen the "sensitivity" to the forgeries by enforcing the model to predict the forgery configurations.	Liang Chen; Yong Zhang; Yibing Song; Lingqiao Liu; Jue Wang;	code
500	Ego4D: Around The World in 3,000 Hours of Egocentric Video Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.	Kristen Grauman; Andrew Westbury; Eugene Byrne; Zachary Chavis; Antonino Furnari; Rohit Girdhar; Jackson Hamburger; Hao Jiang; Miao Liu; Xingyu Liu; Miguel Martin; Tushar Nagarajan; Ilija Radosavovic; Santhosh Kumar Ramakrishnan; Fiona Ryan; Jayant Sharma; Michael Wray; Mengmeng Xu; Eric Zhongcong Xu; Chen Zhao; Siddhant Bansal; Dhruv Batra; Vincent Cartillier; Sean Crane; Tien Do; Morrie Doulaty; Akshay Erapalli; Christoph Feichtenhofer; Adriano Fragomeni; Qichen Fu; Abrham Gebreselasie; Cristina González; James Hillis; Xuhua Huang; Yifei Huang; Wenqi Jia; Weslie Khoo; Jáchym Kolář; Satwik Kottur; Anurag Kumar; Federico Landini; Chao Li; Yanghao Li; Zhenqiang Li; Karttikeya Mangalam; Raghava Modhugu; Jonathan Munro; Tullie Murrell; Takumi Nishiyasu; Will Price; Paola Ruiz; Merey Ramazanova; Leda Sari; Kiran Somasundaram; Audrey Southerland; Yusuke Sugano; Ruijie Tao; Minh Vo; Yuchen Wang; Xindi Wu; Takuma Yagi; Ziwei Zhao; Yunyi Zhu; Pablo Arbeláez; David Crandall; Dima Damen; Giovanni Maria Farinella; Christian Fuegen; Bernard Ghanem; Vamsi Krishna Ithapu; C. V. Jawahar; Hanbyul Joo; Kris Kitani; Haizhou Li; Richard Newcombe; Aude Oliva; Hyun Soo Park; James M. Rehg; Yoichi Sato; Jianbo Shi; Mike Zheng Shou; Antonio Torralba; Lorenzo Torresani; Mingfei Yan; Jitendra Malik;	code
501	Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Vision Transformers (ViTs) have shown great performance in self-supervised learning of global and local representations that can be transferred to downstream applications. Inspired by these results, we introduce a novel self-supervised learning framework with tailored proxy tasks for medical image analysis.	Yucheng Tang; Dong Yang; Wenqi Li; Holger R. Roth; Bennett Landman; Daguang Xu; Vishwesh Nath; Ali Hatamizadeh;	code
502	Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a method, W-OoD, for utilizing the hard OoDs.	Jungbeom Lee; Seong Joon Oh; Sangdoo Yun; Junsuk Choe; Eunji Kim; Sungroh Yoon;	code
503	DAD-3DHeads: A Large-Scale Dense, Accurate and Diverse Dataset for 3D Head Alignment From A Single Image Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We present DAD-3DHeads, a dense and diverse large-scale dataset, and a robust model for 3D Dense Head Alignment in-the-wild.	Tetiana Martyniuk; Orest Kupyn; Yana Kurlyak; Igor Krashenyi; Jiří Matas; Viktoriia Sharmanska;	code
504	Reconstructing Surfaces for Sparse Point Clouds With On-Surface Priors Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Our key idea is to infer signed distances by pushing both the query projections to be on the surface and the projection distance to be the minimum.	Baorui Ma; Yu-Shen Liu; Zhizhong Han;	code
505	vCLIMB: A Novel Video Class Incremental Learning Benchmark Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce vCLIMB, a novel video continual learning benchmark.	Andrés Villa; Kumail Alhamoud; Victor Escorcia; Fabian Caba; Juan León Alcázar; Bernard Ghanem;	code
506	Robust Equivariant Imaging: A Fully Unsupervised Framework for Learning To Image From Noisy and Partial Measurements Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a Robust Equivariant Imaging (REI) framework which can learn to image from noisy partial measurements alone.	Dongdong Chen; Julián Tachella; Mike E. Davies;	code
507	ST++: Make Self-Training Work Better for Semi-Supervised Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Inspired by the impressive results, we thoroughly investigate strong data augmentations (SDA) and provide some empirical analysis.	Lihe Yang; Wei Zhuo; Lei Qi; Yinghuan Shi; Yang Gao;	code
508	Interacting Attention Graph for Single Image Two-Hand Reconstruction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present Interacting Attention Graph Hand (IntagHand), the first graph convolution based network that reconstructs two interacting hands from a single RGB image.	Mengcheng Li; Liang An; Hongwen Zhang; Lianpeng Wu; Feng Chen; Tao Yu; Yebin Liu;	code
509	Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: To accelerate the progress of roadside perception, we present the first high-diversity challenging Roadside Perception 3D dataset, Rope3D, from a novel view.	Xiaoqing Ye; Mao Shu; Hanyu Li; Yifeng Shi; Yingying Li; Guangjie Wang; Xiao Tan; Errui Ding;	code
510	Cross-Image Relational Knowledge Distillation for Semantic Segmentation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper proposes a novel Cross-Image Relational KD (CIRKD), which focuses on transferring structured pixel-to-pixel and pixel-to-region relations among the whole images.	Chuanguang Yang; Helong Zhou; Zhulin An; Xue Jiang; Yongjun Xu; Qian Zhang;	code
511	Towards Layer-Wise Image Vectorization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose Layer-wise Image Vectorization, namely LIVE, to convert raster images to SVGs and simultaneously maintain its image topology.	Xu Ma; Yuqian Zhou; Xingqian Xu; Bin Sun; Valerii Filev; Nikita Orlov; Yun Fu; Humphrey Shi;	code
512	Scenic: A JAX Library for Computer Vision Research and Beyond Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Scenic is an open-source (https://github.com/google-research/scenic) JAX library with a focus on transformer-based models for computer vision research and beyond.	Mostafa Dehghani; Alexey Gritsenko; Anurag Arnab; Matthias Minderer; Yi Tay;	code
513	Real-Time Object Detection for Streaming Perception Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: While past works ignore the inevitable changes in the environment after processing, streaming perception is proposed to jointly evaluate the latency and accuracy into a single metric for video online perception. In this paper, instead of searching trade-offs between accuracy and speed like previous works, we point out that endowing real-time models with the ability to predict the future is the key to dealing with this problem.	Jinrong Yang; Songtao Liu; Zeming Li; Xiaoping Li; Jian Sun;	code
514	VisualHow: Multimodal Problem Solving Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: With an overarching goal of developing intelligent systems to assist humans in various daily activities, we propose VisualHow, a free-form and open-ended research that focuses on understanding a real-life problem and deriving its solution by incorporating key components across multiple modalities.	Jinhui Yang; Xianyu Chen; Ming Jiang; Shi Chen; Louis Wang; Qi Zhao;	code
515	Spatial Commonsense Graph for Object Localisation in Partial Scenes Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We solve object localisation in partial scenes, a new problem of estimating the unknown position of an object (e.g. where is the bag?)	Francesco Giuliari; Geri Skenderi; Marco Cristani; Yiming Wang; Alessio Del Bue;	code
516	OSSGAN: Open-Set Semi-Supervised Image Generation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We introduce a challenging training scheme of conditional GANs, called open-set semi-supervised image generation, where the training dataset consists of two parts: (i) labeled data and (ii) unlabeled data with samples belonging to one of the labeled data classes, namely, a closed-set, and samples not belonging to any of the labeled data classes, namely, an open-set.	Kai Katsumata; Duc Minh Vo; Hideki Nakayama;	code
517	Bi-Level Alignment for Cross-Domain Crowd Counting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we aim to develop a new adversarial learning based method, which is simple and efficient to apply.	Shenjian Gong; Shanshan Zhang; Jian Yang; Dengxin Dai; Bernt Schiele;	code
518	ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: VFI can be extremely challenging, particularly in sequences containing large motions, occlusions or dynamic textures, where existing approaches fail to offer perceptually robust interpolation performance. In this context, we present a novel deep learning based VFI method, ST-MFNet, based on a Spatio-Temporal Multi-Flow architecture.	Duolikun Danier; Fan Zhang; David Bull;	code
519	Efficient Multi-View Stereo By Iterative Dynamic Cost Volume Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel iterative dynamic cost volume for multi-view stereo.	Shaoqian Wang; Bo Li; Yuchao Dai;	code
520	TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this study, we highlight the importance of interaction in a dual-space GAN for more controllable editing.	Yanbo Xu; Yueqin Yin; Liming Jiang; Qianyi Wu; Chengyao Zheng; Chen Change Loy; Bo Dai; Wayne Wu;	code
521	Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes.	Shu Zhang; Ran Xu; Caiming Xiong; Chetan Ramaiah;	code
522	SGTR: End-to-End Scene Graph Generation With Transformer Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we propose a novel SGG method to address the aforementioned issues, formulating the task as a bipartite graph construction problem.	Rongjie Li; Songyang Zhang; Xuming He;	code
523	Decoupled Knowledge Distillation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: More importantly, we reveal that the classical KD loss is a coupled formulation, which (1) suppresses the effectiveness of NCKD and (2) limits the flexibility to balance these two parts. To address these issues, we present Decoupled Knowledge Distillation (DKD), enabling TCKD and NCKD to play their roles more efficiently and flexibly.	Borui Zhao; Quan Cui; Renjie Song; Yiyu Qiu; Jiajun Liang;	code
524	DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose two novel techniques: InverseAug that inverses geometric-related augmentations, e.g., rotation, to enable accurate geometric alignment between lidar points and image pixels, and LearnableAlign that leverages cross-attention to dynamically capture the correlations between image and lidar features during fusion.	Yingwei Li; Adams Wei Yu; Tianjian Meng; Ben Caine; Jiquan Ngiam; Daiyi Peng; Junyang Shen; Yifeng Lu; Denny Zhou; Quoc V. Le; Alan Yuille; Mingxing Tan;	code
525	Reusing The Task-Specific Classifier As A Discriminator: Discriminator-Free Adversarial Domain Adaptation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, most of these methods fail to effectively leverage the predicted discriminative information, and thus cause mode collapse in the generator. In this work, we address this problem from a different perspective and design a simple yet effective adversarial paradigm in the form of a discriminator-free adversarial learning network (DALN), wherein the category classifier is reused as a discriminator, which achieves explicit domain alignment and category distinguishment through a unified objective, enabling the DALN to leverage the predicted discriminative information for sufficient feature alignment.	Lin Chen; Huaian Chen; Zhixiang Wei; Xin Jin; Xiao Tan; Yi Jin; Enhong Chen;	code
526	Show Me What and Tell Me How: Video Synthesis Via Multimodal Conditioning Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This work presents a multimodal video generation framework that benefits from text and images provided jointly or separately.	Ligong Han; Jian Ren; Hsin-Ying Lee; Francesco Barbieri; Kyle Olszewski; Shervin Minaee; Dimitris Metaxas; Sergey Tulyakov;	code
527	SIMBAR: Single Image-Based Scene Relighting for Effective Data Augmentation for Automated Driving Vision Tasks Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a novel image-based relighting pipeline, SIMBAR, that can work with a single image as input.	Xianling Zhang; Nathan Tseng; Ameerah Syed; Rohan Bhasin; Nikita Jaipuria;	code
528	Multi-Label Classification With Partial Annotations Using Class-Aware Selective Loss Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose to estimate the class distribution using a dedicated temporary model, and we show its improved efficiency over a naive estimation computed using the dataset’s partial annotations.	Emanuel Ben-Baruch; Tal Ridnik; Itamar Friedman; Avi Ben-Cohen; Nadav Zamir; Asaf Noy; Lihi Zelnik-Manor;	code
529	CADTransformer: Panoptic Symbol Spotting Transformer for CAD Drawings Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Existing methods, based on convolutional neural networks (CNNs) and/or graph neural networks (GNNs), regress instance bounding boxes in the pixel domain and then convert the predictions into symbols. In this paper, we present a novel framework named CADTransformer, that can painlessly modify existing vision transformer (ViT) backbones to tackle the above limitations for the panoptic symbol spotting task.	Zhiwen Fan; Tianlong Chen; Peihao Wang; Zhangyang Wang;	code
530	IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Learning to synthesize data has emerged as a promising direction in zero-shot quantization (ZSQ), which represents neural networks by low-bit integer without accessing any of the real data. In this paper, we observe an interesting phenomenon of intra-class heterogeneity in real data and show that existing methods fail to retain this property in their synthetic images, which causes a limited performance increase.	Yunshan Zhong; Mingbao Lin; Gongrui Nan; Jianzhuang Liu; Baochang Zhang; Yonghong Tian; Rongrong Ji;	code
531	I M Avatar: Implicit Morphable Head Avatars From Videos Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Neural volumetric representations approach photorealism but are hard to animate and do not generalize well to unseen expressions. To tackle this problem, we propose IMavatar (Implicit Morphable avatar), a novel method for learning implicit head avatars from monocular videos.	Yufeng Zheng; Victoria Fernández Abrevaya; Marcel C. Bühler; Xu Chen; Michael J. Black; Otmar Hilliges;	code
532	Weakly-Supervised Metric Learning With Cross-Module Communications for The Classification of Anterior Chamber Angle Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we propose a novel end-to-end framework GCNet for automated Glaucoma Classification based on ACA images or other Glaucoma-related medical images.	Jingqi Huang; Yue Ning; Dong Nie; Linan Guan; Xiping Jia;	code
533	A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-Resolution Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This is because the current CNN-based methods adopt locality-based operations, which are not effective to deal with the variation caused by deformations. In this paper, we propose a CNN based Text ATTention network (TATT) to address this problem.	Jianqi Ma; Zhetong Liang; Lei Zhang;	code
534	Multi-Modal Dynamic Graph Transformer for Visual Grounding Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Their performance depends on the density and quality of the candidate regions and is capped by the inability to optimize the located regions continuously. To address these issues, we propose to remodel VG into a progressively optimized visual semantic alignment process.	Sijia Chen; Baochun Li;	code
535	Geometric Transformer for Fast and Robust Point Cloud Registration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose Geometric Transformer to learn geometric feature for robust superpoint matching.	Zheng Qin; Hao Yu; Changjian Wang; Yulan Guo; Yuxing Peng; Kai Xu;	code
536	UMT: Unified Multi-Modal Transformers for Joint Video Moment Retrieval and Highlight Detection Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Nevertheless, jointly conducting moment retrieval and highlight detection is an emerging research topic, even though its component problems and some related tasks have already been studied for a while. In this paper, we present the first unified framework, named Unified Multi-modal Transformers (UMT), capable of realizing such joint optimization while also being easily degenerated for solving individual problems.	Ye Liu; Siyuan Li; Yang Wu; Chang-Wen Chen; Ying Shan; Xiaohu Qie;	code
537	Demystifying The Neural Tangent Kernel From A Practical Perspective: Can It Be Trusted for Neural Architecture Search Without Training? Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this work, we revisit several at-initialization metrics that can be derived from the NTK and reveal their key shortcomings.	Jisoo Mok; Byunggook Na; Ji-Hoon Kim; Dongyoon Han; Sungroh Yoon;	code
538	The Devil Is in The Details: Window-Based Attention for Image Compression Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we first extensively study the effects of multiple kinds of attention mechanisms for local features learning, then introduce a more straightforward yet effective window-based local attention block.	Renjie Zou; Chunfeng Song; Zhaoxiang Zhang;	code
539	DiLiGenT10²: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper presents a new real-world photometric stereo dataset with "ground truth" normal maps, which is 10 times larger than the widely adopted one.	Jieji Ren; Feishi Wang; Jiahao Zhang; Qian Zheng; Mingjun Ren; Boxin Shi;	code
540	PolyWorld: Polygonal Building Extraction With Graph Neural Networks in Satellite Images Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This paper introduces PolyWorld, a neural network that directly extracts building vertices from an image and connects them correctly to create precise polygons.	Stefano Zorzi; Shabab Bazrafkan; Stefan Habenschuss; Friedrich Fraundorfer;	code
541	Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: In this paper, we study efficient architecture design for real-time multi-person pose estimation on edge.	Yihan Wang; Muyang Li; Han Cai; Wei-Ming Chen; Song Han;	code
542	Spatio-Temporal Relation Modeling for Few-Shot Action Recognition Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel few-shot action recognition framework, STRM, which enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations.	Anirudh Thatipelli; Sanath Narayan; Salman Khan; Rao Muhammad Anwer; Fahad Shahbaz Khan; Bernard Ghanem;	code
543	Multi-Person Extreme Motion Prediction Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: While this problem has recently received increasing attention, it has mostly been tackled for single humans in isolation. In this paper, we explore this problem when dealing with humans performing collaborative tasks: we seek to predict the future motion of two interacting persons given two sequences of their past skeletons.	Wen Guo; Xiaoyu Bie; Xavier Alameda-Pineda; Francesc Moreno-Noguer;	code
544	β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, they suffer from two main issues, the weak robustness to the performance collapse and the poor generalization ability of the searched architectures. To solve these two problems, a simple-but-efficient regularization method, termed as Beta-Decay, is proposed to regularize the DARTS-based NAS searching process.	Peng Ye; Baopu Li; Yikang Li; Tao Chen; Jiayuan Fan; Wanli Ouyang;	code
545	CMT: Convolutional Neural Networks Meet Vision Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, there are still gaps in both performance and computational cost between transformers and existing convolutional neural networks (CNNs). In this paper, we aim to address this issue and develop a network that can outperform not only the canonical transformers, but also the high-performance convolutional models.	Jianyuan Guo; Kai Han; Han Wu; Yehui Tang; Xinghao Chen; Yunhe Wang; Chang Xu;	code
546	KNN Local Attention for Image Restoration Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: However, by focusing only on adjacent positions, the local attention suffers from an insufficient receptive field for image restoration. In this paper, we propose a new attention mechanism for image restoration, called k-NN Image Transformer (KiT), that rectifies the above-mentioned limitations.	Hunsang Lee; Hyesong Choi; Kwanghoon Sohn; Dongbo Min;	code
547	Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered By Pre-Trained Vision-Language Model Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: We propose a novel framework, i.e., Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation that requires little manual annotation while being applicable to a wide variety of manipulations.	Zipeng Xu; Tianwei Lin; Hao Tang; Fu Li; Dongliang He; Nicu Sebe; Radu Timofte; Luc Van Gool; Errui Ding;	code
548	TransMix: Attend To Mix for Vision Transformers Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: This may lead to a strange phenomenon that sometimes there is no valid object in the mixed image due to the random process in augmentation but there is still response in the label space. To bridge such gap between the input and label spaces, we propose TransMix, which mixes labels based on the attention maps of Vision Transformers.	Jie-Neng Chen; Shuyang Sun; Ju He; Philip H.S. Torr; Alan Yuille; Song Bai;	code
549	Inertia-Guided Flow Completion and Style Fusion for Video Inpainting Literature Review Related Patents Related Grants Related Orgs Related Experts Details Highlight: Nevertheless, the existing flow-guided cross-frame warping methods fail to consider the lighting and sharpness variation across video frames, which leads to spatial incoherence after warping from other frames. To alleviate this problem, we propose the Adaptive Style Fusion Network (ASFN), which utilizes the style information extracted from the valid regions to guide the gradient refinement in the warped regions.	Kaidong Zhang; Jingjing Fu; Dong Liu;	code
550	Long-Tailed Visual Recognition Via Gaussian Clouded Logit Adjustment Highlight: It is unfavorable for training on balanced data, but can be utilized to adjust the validity of the samples in long-tailed data, thereby solving the distorted embedding space of long-tailed problems. To this end, this paper proposes the Gaussian clouded logit adjustment by Gaussian perturbation of different class logits with varied amplitude.	Mengke Li; Yiu-ming Cheung; Yang Lu;	code
551	Image Animation With Perturbed Masks Highlight: We present a novel approach for image-animation of a source image by a driving video, both depicting the same type of object.	Yoav Shalev; Lior Wolf;	code
552	Domain Generalization Via Shuffled Style Assembly for Face Anti-Spoofing Highlight: In this work, we separate the complete representation into content and style ones.	Zhuo Wang; Zezheng Wang; Zitong Yu; Weihong Deng; Jiahong Li; Tingting Gao; Zhongyuan Wang;	code
553	OcclusionFusion: Occlusion-Aware Motion Estimation for Real-Time Dynamic 3D Reconstruction Highlight: This problem is even more severe for single-view-based systems due to strong occlusions. Based on these observations, we propose OcclusionFusion, a novel method to calculate occlusion-aware 3D motion to guide the reconstruction.	Wenbin Lin; Chengwei Zheng; Jun-Hai Yong; Feng Xu;	code
554	MonoScene: Monocular 3D Semantic Scene Completion Highlight: Along with architectural contributions, we introduce novel global scene and local frustums losses.	Anh-Quan Cao; Raoul de Charette;	code
555	AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition Highlight: This work reformulates the training of AdaFocus as a simple one-stage algorithm by introducing a differentiable interpolation-based patch selection operation, enabling efficient end-to-end optimization.	Yulin Wang; Yang Yue; Yuanze Lin; Haojun Jiang; Zihang Lai; Victor Kulikov; Nikita Orlov; Humphrey Shi; Gao Huang;	code
556	Continuous Scene Representations for Embodied AI Highlight: We propose Continuous Scene Representations (CSR), a scene representation constructed by an embodied agent navigating within a space, where objects and their relationships are modeled by continuous valued embeddings.	Samir Yitzhak Gadre; Kiana Ehsani; Shuran Song; Roozbeh Mottaghi;	code
557	Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds Highlight: In this work, beyond 3D Siamese tracking, we introduce a motion-centric paradigm to handle 3D SOT from a new perspective.	Chaoda Zheng; Xu Yan; Haiming Zhang; Baoyuan Wang; Shenghui Cheng; Shuguang Cui; Zhen Li;	code
558	Non-Probability Sampling Network for Stochastic Human Trajectory Prediction Highlight: In this paper, we analyze the problem by reconstructing and comparing probabilistic distributions from prediction samples and socially-acceptable paths, respectively.	Inhwan Bae; Jin-Hwi Park; Hae-Gon Jeon;	code
559	ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated Learning Highlight: So we propose ResSFL, a Split Federated Learning Framework that is designed to be MI-resistant during training.	Jingtao Li; Adnan Siraj Rakin; Xing Chen; Zhezhi He; Deliang Fan; Chaitali Chakrabarti;	code
560	Human-Aware Object Placement for Visual Environment Reconstruction Highlight: Our key idea is that, as a person moves through a scene and interacts with it, we accumulate HSIs across multiple input images, and use these in optimizing the 3D scene to reconstruct a consistent, physically plausible, 3D scene layout.	Hongwei Yi; Chun-Hao P. Huang; Dimitrios Tzionas; Muhammed Kocabas; Mohamed Hassan; Siyu Tang; Justus Thies; Michael J. Black;	code
561	X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval Highlight: Common text-agnostic aggregation schemes include mean-pooling or self-attention over the frames, but these are likely to encode misleading visual information not described in the given text. To address this, we propose a cross-modal attention model called X-Pool that reasons between a text and the frames of a video.	Satya Krishna Gorti; Noël Vouitsis; Junwei Ma; Keyvan Golestan; Maksims Volkovs; Animesh Garg; Guangwei Yu;	code
562	RAMA: A Rapid Multicut Algorithm on GPU Highlight: We propose a highly parallel primal-dual algorithm for the multicut (a.k.a. correlation clustering) problem, a classical graph clustering problem widely used in machine learning and computer vision.	Ahmed Abbas; Paul Swoboda;	code
563	Adversarial Parametric Pose Prior Highlight: We propose learning a prior that restricts the SMPL parameters to values that produce realistic poses via adversarial training.	Andrey Davydov; Anastasia Remizova; Victor Constantin; Sina Honari; Mathieu Salzmann; Pascal Fua;	code
564	Mask Transfiner for High-Quality Instance Segmentation Highlight: In this paper, we present Mask Transfiner for high-quality and efficient instance segmentation.	Lei Ke; Martin Danelljan; Xia Li; Yu-Wing Tai; Chi-Keung Tang; Fisher Yu;	code
565	It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning By Contrastive Data Collection Highlight: We show that collecting new data, in the same way, is not effective in mitigating this emotional bias. To remedy this problem, we propose a contrastive data collection approach to balance ArtEmis with a new complementary dataset such that a pair of similar images have contrasting emotions (one positive and one negative).	Youssef Mohamed; Faizan Farooq Khan; Kilichbek Haydarov; Mohamed Elhoseiny;	code
566	DiRA: Discriminative, Restorative, and Adversarial Learning for Self-Supervised Medical Image Analysis Highlight: Existing efforts, however, omit their synergistic effects on each other in a ternary setup, which, we envision, can significantly benefit deep semantic representation learning. To realize this vision, we have developed DiRA, the first framework that unites discriminative, restorative, and adversarial learning in a unified manner to collaboratively glean complementary visual information from unlabeled medical images for fine-grained semantic representation learning.	Fatemeh Haghighi; Mohammad Reza Hosseinzadeh Taher; Michael B. Gotway; Jianming Liang;	code
567	Event-Based Video Reconstruction Via Potential-Assisted Spiking Neural Network Highlight: We propose a novel Event-based Video reconstruction framework based on a fully Spiking Neural Network (EVSNN), which utilizes Leaky-Integrate-and-Fire (LIF) neuron and Membrane Potential (MP) neuron.	Lin Zhu; Xiao Wang; Yi Chang; Jianing Li; Tiejun Huang; Yonghong Tian;	code
568	YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset Highlight: Many video understanding tasks require analyzing multi-shot videos, but existing datasets for video object segmentation (VOS) only consider single-shot videos. To address this challenge, we collected a new dataset, YouMVOS, of 200 popular YouTube videos spanning ten genres, where each video is on average five minutes long with 75 shots.	Donglai Wei; Siddhant Kharbanda; Sarthak Arora; Roshan Roy; Nishant Jain; Akash Palrecha; Tanav Shah; Shray Mathur; Ritik Mathur; Abhijay Kemkar; Anirudh Chakravarthy; Zudi Lin; Won-Dong Jang; Yansong Tang; Song Bai; James Tompkin; Philip H.S. Torr; Hanspeter Pfister;	code
569	DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation Highlight: As the influence of recent network architectures has not been systematically studied, we first benchmark different network architectures for UDA and newly reveal the potential of Transformers for UDA semantic segmentation. Based on the findings, we propose a novel UDA method, DAFormer.	Lukas Hoyer; Dengxin Dai; Luc Van Gool;	code
570	Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification Highlight: In this paper, we propose a deep Brownian Distance Covariance (DeepBDC) method for few-shot classification.	Jiangtao Xie; Fei Long; Jiaming Lv; Qilong Wang; Peihua Li;	code
571	Self-Supervised Video Transformer Highlight: In this paper, we propose self-supervised training for video transformers using unlabeled video data.	Kanchana Ranasinghe; Muzammal Naseer; Salman Khan; Fahad Shahbaz Khan; Michael S. Ryoo;	code
572	AutoRF: Learning 3D Object Radiance Fields From Single View Observations Highlight: We introduce AutoRF – a new approach for learning neural 3D object representations where each object in the training set is observed by only a single view.	Norman Müller; Andrea Simonelli; Lorenzo Porzi; Samuel Rota Bulò; Matthias Nießner; Peter Kontschieder;	code
573	Coopernaut: End-to-End Driving With Cooperative Perception for Networked Vehicles Highlight: We introduce COOPERNAUT, an end-to-end learning model that uses cross-vehicle perception for vision-based cooperative driving.	Jiaxun Cui; Hang Qiu; Dian Chen; Peter Stone; Yuke Zhu;	code
574	TubeR: Tubelet Transformer for Video Action Detection Highlight: We propose TubeR: a simple solution for spatio-temporal video action detection.	Jiaojiao Zhao; Yanyi Zhang; Xinyu Li; Hao Chen; Bing Shuai; Mingze Xu; Chunhui Liu; Kaustav Kundu; Yuanjun Xiong; Davide Modolo; Ivan Marsic; Cees G. M. Snoek; Joseph Tighe;	code
575	MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection Highlight: Especially when extending SSL to semi-supervised object detection (SSOD), many strong augmentation methodologies related to image geometry and interpolation-regularization are hard to utilize since they possibly hurt the location information of the bounding box in the object detection task. To address this, we introduce a simple yet effective data augmentation method, Mix/UnMix (MUM), which unmixes feature tiles for the mixed image tiles for the SSOD framework.	JongMok Kim; JooYoung Jang; Seunghyeon Seo; Jisoo Jeong; Jongkeun Na; Nojun Kwak;	code
576	Learning Non-Target Knowledge for Few-Shot Semantic Segmentation Highlight: Existing studies in few-shot semantic segmentation focus only on mining the target object information and often struggle to distinguish ambiguous regions, especially in non-target regions, which include background (BG) and Distracting Objects (DOs). To alleviate this problem, we propose a novel framework, namely Non-Target Region Eliminating (NTRE) network, to explicitly mine and eliminate BG and DO regions in the query.	Yuanwei Liu; Nian Liu; Qinglong Cao; Xiwen Yao; Junwei Han; Ling Shao;	code
577	UKPGAN: A General Self-Supervised Keypoint Detector Highlight: In this work, we reckon keypoint detection as information compression, and force the model to distill out important points of an object.	Yang You; Wenhai Liu; Yanjie Ze; Yong-Lu Li; Weiming Wang; Cewu Lu;	code
578	Raw High-Definition Radar for Multi-Task Learning Highlight: In this paper, we propose a novel HD radar sensing model, FFT-RadNet, that eliminates the overhead of computing the range-azimuth-Doppler 3D tensor, learning instead to recover angles from a range-Doppler spectrum.	Julien Rebut; Arthur Ouaknine; Waqas Malik; Patrick Pérez;	code
579	Coarse-To-Fine Feature Mining for Video Semantic Segmentation Highlight: However, there is no research about how to simultaneously learn static and motional contexts which are highly correlated and complementary to each other. To address this problem, we propose a Coarse-to-Fine Feature Mining (CFFM) technique to learn a unified representation of static contexts and motional contexts.	Guolei Sun; Yun Liu; Henghui Ding; Thomas Probst; Luc Van Gool;	code
580	Compressing Models With Few Samples: Mimicking Then Replacing Highlight: In this paper, we propose a new framework named Mimicking then Replacing (MiR) for few-sample compression, which firstly urges the pruned model to output the same features as the teacher’s in the penultimate layer, and then replaces teacher’s layers before penultimate with a well-tuned compact one.	Huanyu Wang; Junjie Liu; Xin Ma; Yang Yong; Zhenhua Chai; Jianxin Wu;	code
581	PokeBNN: A Binary Pursuit of Lightweight Accuracy Highlight: Binary neural networks (BNNs) have the potential to significantly lower the compute intensity but existing models suffer from low quality. To overcome this deficiency, we propose PokeConv, a binary convolution block which improves quality of BNNs by techniques such as adding multiple residual paths, and tuning the activation function.	Yichi Zhang; Zhiru Zhang; Lukasz Lew;	code
582	Zoom in and Out: A Mixed-Scale Triplet Network for Camouflaged Object Detection Highlight: Apart from high intrinsic similarity between the camouflaged objects and their background, the objects are usually diverse in scale, fuzzy in appearance, and even severely occluded. To deal with these problems, we propose a mixed-scale triplet network, ZoomNet, which mimics the behavior of humans when observing vague images, i.e., zooming in and out.	Youwei Pang; Xiaoqi Zhao; Tian-Zhu Xiang; Lihe Zhang; Huchuan Lu;	code
583	SOMSI: Spherical Novel View Synthesis With Soft Occlusion Multi-Sphere Images Highlight: We propose a novel MSI representation called Soft Occlusion MSI (SOMSI) that enables modelling high-dimensional appearance features in MSI while retaining the fast rendering times of a standard MSI.	Tewodros Habtegebrial; Christiano Gava; Marcel Rogge; Didier Stricker; Varun Jampani;	code
584	EMScore: Evaluating Video Captioning Via Coarse-Grained and Fine-Grained Embedding Matching Highlight: Inspired by human evaluation, we propose EMScore (Embedding Matching-based score), a novel reference-free metric for video captioning, which directly measures similarity between video and candidate captions.	Yaya Shi; Xu Yang; Haiyang Xu; Chunfeng Yuan; Bing Li; Weiming Hu; Zheng-Jun Zha;	code
585	PoseTriplet: Co-Evolving 3D Human Pose Estimation, Imitation, and Hallucination Under Self-Supervision Highlight: In this paper, we propose a novel self-supervised approach that allows us to explicitly generate 2D-3D pose pairs for augmenting supervision, through a self-enhancing dual-loop learning framework.	Kehong Gong; Bingbing Li; Jianfeng Zhang; Tao Wang; Jing Huang; Michael Bi Mi; Jiashi Feng; Xinchao Wang;	code
586	Group Contextualization for Video Recognition Highlight: In this paper, we propose an efficient feature refinement method that decomposes the feature channels into several groups and separately refines them with different axial contexts in parallel.	Yanbin Hao; Hao Zhang; Chong-Wah Ngo; Xiangnan He;	code
587	Single-Domain Generalized Object Detection in Urban Scene Via Cyclic-Disentangled Self-Distillation Highlight: In this paper, we are concerned with enhancing the generalization capability of object detectors.	Aming Wu; Cheng Deng;	code
588	L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation Highlight: In this paper, we present L2G, a simple online local-to-global knowledge transfer framework for high-quality object attention mining.	Peng-Tao Jiang; Yuqi Yang; Qibin Hou; Yunchao Wei;	code
589	Self-Augmented Unpaired Image Dehazing Via Density and Depth Decomposition Highlight: In this paper, we propose a self-augmented image dehazing framework, termed D^4 (Dehazing via Decomposing transmission map into Density and Depth) for haze generation and removal.	Yang Yang; Chaoyue Wang; Risheng Liu; Lin Zhang; Xiaojie Guo; Dacheng Tao;	code
590	Neural 3D Video Synthesis From Multi-View Video Highlight: We propose a novel approach for 3D video synthesis that is able to represent multi-view video recordings of a dynamic real-world scene in a compact, yet expressive representation that enables high-quality view synthesis and motion interpolation.	Tianye Li; Mira Slavcheva; Michael Zollhöfer; Simon Green; Christoph Lassner; Changil Kim; Tanner Schmidt; Steven Lovegrove; Michael Goesele; Richard Newcombe; Zhaoyang Lv;	code
591	SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation Highlight: Based on this technique, we propose SemAffiNet for point cloud semantic segmentation, which utilizes the attention mechanism in the Transformer module to implicitly and explicitly capture global structural knowledge within local parts for overall comprehension of each category.	Ziyi Wang; Yongming Rao; Xumin Yu; Jie Zhou; Jiwen Lu;	code
592	Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search Highlight: In this paper, we propose a Shapley value based method to evaluate operation contribution (Shapley-NAS) for neural architecture search.	Han Xiao; Ziwei Wang; Zheng Zhu; Jie Zhou; Jiwen Lu;	code
593	HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening Highlight: In this paper, we present a novel attention mechanism for pansharpening called HyperTransformer, in which features of LR-HSI and PAN are formulated as queries and keys in a transformer, respectively.	Wele Gedara Chaminda Bandara; Vishal M. Patel;	code
594	Structure-Aware Flow Generation for Human Body Reshaping Highlight: Due to the complicated structure and multifarious appearance of human bodies, existing methods either fall back on the 3D domain via body morphable model or resort to keypoint-based image deformation, leading to inefficiency and unsatisfactory visual quality. In this paper, we address these limitations by formulating an end-to-end flow generation architecture under the guidance of body structural priors, including skeletons and Part Affinity Fields, and achieve unprecedentedly controllable performance under arbitrary poses and garments.	Jianqiang Ren; Yuan Yao; Biwen Lei; Miaomiao Cui; Xuansong Xie;	code
595	Learning To Answer Questions in Dynamic Audio-Visual Scenarios Highlight: In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.	Guangyao Li; Yake Wei; Yapeng Tian; Chenliang Xu; Ji-Rong Wen; Di Hu;	code
596	Synthetic Aperture Imaging With Events and Frames Highlight: However, the performance of E-SAI is not consistent under sparse occlusions due to the dramatic decrease of signal events. This paper addresses this problem by leveraging the merits of both events and frames, leading to a fusion-based SAI (EF-SAI) that performs consistently under the different densities of occlusions.	Wei Liao; Xiang Zhang; Lei Yu; Shijie Lin; Wen Yang; Ning Qiao;	code
597	MonoGround: Detecting Monocular 3D Objects From The Ground Highlight: Due to the ill-posed nature of the 2D-to-3D mapping in the monocular imaging process, monocular 3D object detection suffers from inaccurate depth estimation and thus has poor 3D detection results. To alleviate this problem, we propose to introduce the ground plane as a prior in monocular 3D object detection.	Zequn Qin; Xi Li;	code
598	Deep Visual Geo-Localization Benchmark Highlight: In this paper, we propose a new open-source benchmarking framework for Visual Geo-localization (VG) that allows users to build, train, and test a wide range of commonly used architectures, with the flexibility to change individual components of a geo-localization pipeline.	Gabriele Berton; Riccardo Mereu; Gabriele Trivigno; Carlo Masone; Gabriela Csurka; Torsten Sattler; Barbara Caputo;	code
599	StyleGAN-V: A Continuous Video Generator With The Price, Image Quality and Perks of StyleGAN2 Highlight: Videos show continuous events, yet most, if not all, video synthesis frameworks treat them discretely in time. In this work, we think of videos as what they should be, time-continuous signals, and extend the paradigm of neural representations to build a continuous-time video generator.	Ivan Skorokhodov; Sergey Tulyakov; Mohamed Elhoseiny;	code
600	LISA: Learning Implicit Shape and Appearance of Hands Highlight: This paper proposes a do-it-all neural model of human hands, named LISA.	Enric Corona; Tomas Hodan; Minh Vo; Francesc Moreno-Noguer; Chris Sweeney; Richard Newcombe; Lingni Ma;	code
601	Iterative Deep Homography Estimation Highlight: We propose Iterative Homography Network, namely IHN, a new deep homography estimation architecture.	Si-Yuan Cao; Jianxin Hu; Zehua Sheng; Hui-Liang Shen;	code
602	Learned Queries for Efficient Local Attention Highlight: In this paper, we propose a new shift-invariant local attention layer, called query and attend (QnA), that aggregates the input locally in an overlapping manner, much like convolutions.	Moab Arar; Ariel Shamir; Amit H. Bermano;	code
603	Colar: Effective and Efficient Online Action Detection By Consulting Exemplars Highlight: This paper develops an effective exemplar-consultation mechanism that first measures the similarity between a frame and exemplary frames, and then aggregates exemplary features based on the similarity weights.	Le Yang; Junwei Han; Dingwen Zhang;	code
604	SoftGroup for 3D Instance Segmentation on Point Clouds Highlight: To address the aforementioned problems, this paper proposes a 3D instance segmentation method referred to as SoftGroup by performing bottom-up soft grouping followed by top-down refinement.	Thang Vu; Kookhoi Kim; Tung M. Luu; Thanh Nguyen; Chang D. Yoo;	code
605	MVS2D: Efficient Multi-View Stereo Via Attention-Driven 2D Convolutions Highlight: We present MVS2D, a highly efficient multi-view stereo algorithm that seamlessly integrates multi-view constraints into single-view networks via an attention mechanism.	Zhenpei Yang; Zhile Ren; Qi Shan; Qixing Huang;	code
606	Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation Via Semantic Knowledge Transfer and Self-Refinement Highlight: In this paper, we propose a novel approach including two innovative components.	Beomyoung Kim; YoungJoon Yoo; Chae Eun Rhee; Junmo Kim;	code
607	Deep Constrained Least Squares for Blind Image Super-Resolution Highlight: In this paper, we tackle the problem of blind image super-resolution (SR) with a reformulated degradation model and two novel modules.	Ziwei Luo; Haibin Huang; Lei Yu; Youwei Li; Haoqiang Fan; Shuaicheng Liu;	code
608	EDTER: Edge Detection With Transformer Highlight: Recently, vision transformer has shown excellent capability in capturing long-range dependencies. Inspired by this, we propose a novel transformer-based edge detector, Edge Detection TransformER (EDTER), to extract clear and crisp object boundaries and meaningful edges by exploiting the full image context information and detailed local cues simultaneously.	Mengyang Pu; Yaping Huang; Yuming Liu; Qingji Guan; Haibin Ling;	code
609	AirObject: A Temporally Evolving Graph Embedding for Object Identification Highlight: Furthermore, given the vast distribution of unknown novel objects in the real world, the object identification process must be class-agnostic. In this context, we propose a novel temporal 3D object encoding approach, dubbed AirObject, to obtain global keypoint graph-based embeddings of objects.	Nikhil Varma Keetha; Chen Wang; Yuheng Qiu; Kuan Xu; Sebastian Scherer;	code
610	From Representation to Reasoning: Towards Both Evidence and Commonsense Reasoning for Video Question-Answering Highlight: To facilitate deeper video understanding towards video reasoning, we present the task of Causal-VidQA, which includes four types of questions ranging from scene description (description) to evidence reasoning (explanation) and commonsense reasoning (prediction and counterfactual).	Jiangtong Li; Li Niu; Liqing Zhang;	code
611	Semantic-Aware Domain Generalized Segmentation Highlight: In this paper, we address domain generalized semantic segmentation, where a segmentation model is trained to be domain-invariant without using any target domain data.	Duo Peng; Yinjie Lei; Munawar Hayat; Yulan Guo; Wen Li;	code
612	DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion Highlight: In response to such bias, we would like to re-emphasize that methods for multi-object tracking should also work when object appearance is not sufficiently discriminative. To this end, we propose a large-scale dataset for multi-human tracking, where humans have similar appearance, diverse motion and extreme articulation.	Peize Sun; Jinkun Cao; Yi Jiang; Zehuan Yuan; Song Bai; Kris Kitani; Ping Luo;	code
613	UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection Highlight: This is a closed-set scenario that fails to test the capability of systems at detecting new anomaly types. To this end, we propose UBnormal, a new supervised open-set benchmark composed of multiple virtual scenes for video anomaly detection.	Andra Acsintoae; Andrei Florescu; Mariana-Iuliana Georgescu; Tudor Mare; Paul Sumedrea; Radu Tudor Ionescu; Fahad Shahbaz Khan; Mubarak Shah;	code
614	AKB-48: A Real-World Articulated Object Knowledge Base Highlight: To build the AKB-48, we present a fast articulation knowledge modeling (FArM) pipeline, which can fulfill the ArtiKG for an articulated object within 10-15 minutes, and largely reduce the cost for object modeling in the real world.	Liu Liu; Wenqiang Xu; Haoyuan Fu; Sucheng Qian; Qiaojun Yu; Yang Han; Cewu Lu;	code
615	Stratified Transformer for 3D Point Cloud Segmentation Highlight: In this paper, we propose Stratified Transformer that is able to capture long-range contexts and demonstrates strong generalization ability and high performance.	Xin Lai; Jianhui Liu; Li Jiang; Liwei Wang; Hengshuang Zhao; Shu Liu; Xiaojuan Qi; Jiaya Jia;	code
616	Aug-NeRF: Training Stronger Neural Radiance Fields With Triple-Level Physically-Grounded Augmentations Highlight: Inspired by that, we propose Augmented NeRF (Aug-NeRF), which for the first time brings the power of robust data augmentations into regularizing the NeRF training.	Tianlong Chen; Peihao Wang; Zhiwen Fan; Zhangyang Wang;	code
617	Semantic-Shape Adaptive Feature Modulation for Semantic Image Synthesis Highlight: In order to exploit the part-level layouts, we propose a Shape-aware Position Descriptor (SPD) to describe each pixel’s positional feature, where object shape is explicitly encoded into the SPD feature.	Zhengyao Lv; Xiaoming Li; Zhenxing Niu; Bing Cao; Wangmeng Zuo;	code
618	Day-to-Night Image Synthesis for Training Nighttime Neural ISPs Highlight: To address this problem, we propose a method that synthesizes nighttime images from daytime images.	Abhijith Punnappurath; Abdullah Abuolaim; Abdelrahman Abdelhamed; Alex Levinshtein; Michael S. Brown;	code