Paper Digest: ECCV 2022 Highlights & Code
To help the community quickly catch up on the work presented at this conference, the Paper Digest team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
Based in New York, Paper Digest is dedicated to helping people generate content and reason over unstructured data. Unlike black-box approaches, we build deep models on semantics, which allows results to be produced with explanations. Such models power this website and are behind our services, including “search engine”, “summarization”, “question answering”, and “literature review”.
If you do not want to miss interesting academic papers, you are welcome to sign up for our daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: ECCV 2022 Highlights & Code
No. | Paper | Highlight | Author(s)
---|---|---|---
1 | Learning Depth from Focus in The Wild [Code] | In this work, we present a convolutional neural network-based depth estimation from single focal stacks. In addition, for the generalization of the proposed network, we develop a simulator to realistically reproduce the features of commercial cameras, such as changes in field of view, focal length and principal points. | Changyeon Won; Hae-Gon Jeon
2 | Learning-Based Point Cloud Registration for 6D Object Pose Estimation in The Real World | In this work, we tackle the task of estimating the 6D pose of an object from point cloud data. | Zheng Dang; Lizhou Wang; Yu Guo; Mathieu Salzmann
3 | An End-to-End Transformer Model for Crowd Localization | In this paper, we propose an elegant, end-to-end Crowd Localization TRansformer named CLTR that solves the task in the regression-based paradigm. | Dingkang Liang; Wei Xu; Xiang Bai
4 | Few-Shot Single-View 3D Reconstruction with Memory Prior Contrastive Network | In this paper, we present a Memory Prior Contrastive Network (MPCN) that can store shape prior knowledge in a few-shot learning based 3D reconstruction framework. | Zhen Xing; Yijiang Chen; Zhixin Ling; Xiangdong Zhou; Yu Xiang
5 | DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection [Code] | Instance depth is coupled by visual depth clues and instance attribute clues, making it hard to be directly learned in the network. Therefore, we propose to reformulate the instance depth as the combination of the instance visual surface depth (visual depth) and the instance attribute depth (attribute depth). | Liang Peng; Xiaopei Wu; Zheng Yang; Haifeng Liu; Deng Cai
6 | Adaptive Co-Teaching for Unsupervised Monocular Depth Estimation | Unsupervised depth estimation using photometric losses suffers from local minima and training instability. We address this issue by proposing an adaptive co-teaching framework to distill the learned knowledge from unsupervised teacher networks to a student network. | Weisong Ren; Lijun Wang; Yongri Piao; Miao Zhang; Huchuan Lu; Ting Liu
7 | Fusing Local Similarities for Retrieval-Based 3D Orientation Estimation of Unseen Objects [Code] | In this paper, we tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images. | Chen Zhao; Yinlin Hu; Mathieu Salzmann
8 | Lidar Point Cloud Guided Monocular 3D Object Detection [Code] | We delve into this underlying mechanism and empirically find that, concerning label accuracy, the 3D location part of the label is preferred over the other parts. Motivated by this conclusion and considering the precise LiDAR 3D measurement, we propose a simple and effective framework, dubbed LiDAR point cloud guided monocular 3D object detection (LPCG). | Liang Peng; Fei Liu; Zhengxu Yu; Senbo Yan; Dan Deng; Zheng Yang; Haifeng Liu; Deng Cai
9 | Structural Causal 3D Reconstruction | This paper considers the problem of unsupervised 3D object reconstruction from in-the-wild single-view images. | Weiyang Liu; Zhen Liu; Liam Paull; Adrian Weller; Bernhard Schölkopf
10 | 3D Human Pose Estimation Using Möbius Graph Convolutional Networks | A major limitation of GCNs is their inability to encode all the transformations between joints explicitly. To address this issue, we propose a novel spectral GCN using the Möbius transformation (MöbiusGCN). | Niloofar Azizi; Horst Possegger; Emanuele Rodolà; Horst Bischof
11 | Learning to Train A Point Cloud Reconstruction Network Without Matching [Code] | In this work, we propose a novel framework named PCLossNet which learns to train a point cloud reconstruction network without any matching. | Tianxin Huang; Xuemeng Yang; Jiangning Zhang; Jinhao Cui; Hao Zou; Jun Chen; Xiangrui Zhao; Yong Liu
12 | PanoFormer: Panorama Transformer for Indoor 360° Depth Estimation | This paper proposes the panorama Transformer (named PanoFormer) to estimate the depth in panorama images, with tangent patches from the spherical domain, learnable token flows, and panorama-specific metrics. | Zhijie Shen; Chunyu Lin; Kang Liao; Lang Nie; Zishuo Zheng; Yao Zhao
13 | Self-supervised Human Mesh Recovery with Cross-Representation Alignment | Synthetic dense correspondence maps (i.e., IUV) have been little explored, since the domain gap between synthetic training data and real testing data is hard to address for 2D dense representations. To alleviate this domain gap on IUV, we propose cross-representation alignment utilizing the complementary information from the robust but sparse representation (2D keypoints). | Xuan Gong; Meng Zheng; Benjamin Planche; Srikrishna Karanam; Terrence Chen; David Doermann; Ziyan Wu
14 | AlignSDF: Pose-Aligned Signed Distance Fields for Hand-Object Reconstruction [Code] | In particular, we propose a joint learning framework that disentangles the pose and the shape. | Zerui Chen; Yana Hasson; Cordelia Schmid; Ivan Laptev
15 | A Reliable Online Method for Joint Estimation of Focal Length and Camera Rotation [Code] | Linear perspective cues deriving from regularities of the built environment can be used to recalibrate both intrinsic and extrinsic camera parameters online, but these estimates can be unreliable due to irregularities in the scene, uncertainties in line segment estimation and background clutter. Here we address this challenge through four initiatives. | Yiming Qian; James H. Elder
16 | PS-NeRF: Neural Inverse Rendering for Multi-View Photometric Stereo [Code] | In this paper, we present a neural inverse rendering method for MVPS based on implicit representation. | Wenqi Yang; Guanying Chen; Chaofeng Chen; Zhenfang Chen; Kwan-Yee K. Wong
17 | Share with Thy Neighbors: Single-View Reconstruction By Cross-Instance Consistency [Code] | Our main contributions are two ways for leveraging cross-instance consistency: (i) progressive conditioning, a training strategy to gradually specialize the model from category to instances in a curriculum learning fashion, and (ii) neighbor reconstruction, a loss enforcing consistency between instances having similar shape or texture. | Tom Monnier; Matthew Fisher; Alexei A. Efros; Mathieu Aubry
18 | Towards Comprehensive Representation Enhancement in Semantics-Guided Self-Supervised Monocular Depth Estimation | In this work, we propose an attention-based module to enhance task-specific features by addressing their feature uniqueness within instances. | Jingyuan Ma; Xiangyu Lei; Nan Liu; Xian Zhao; Shiliang Pu
19 | AvatarCap: Animatable Avatar Conditioned Monocular Human Volumetric Capture [Code] | To address the ill-posed problem caused by partial observations in monocular human volumetric capture, we present AvatarCap, a novel framework that introduces animatable avatars into the capture pipeline for high-fidelity reconstruction in both visible and invisible regions. | Zhe Li; Zerong Zheng; Hongwen Zhang; Chaonan Ji; Yebin Liu
20 | Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers [Code] | In this paper, we propose a novel transformer encoder-decoder architecture for 3D human mesh reconstruction from a single image, called FastMETRO. | Junhyeong Cho; Kim Youwang; Tae-Hyun Oh
21 | GeoRefine: Self-Supervised Online Depth Refinement for Accurate Dense Mapping | We present a robust and accurate depth refinement system, named GeoRefine, for geometrically-consistent dense mapping from monocular sequences. | Pan Ji; Qingan Yan; Yuxin Ma; Yi Xu
22 | Multi-modal Masked Pre-training for Monocular Panoramic Depth Completion | In this paper, we formulate a potentially valuable panoramic depth completion (PDC) task, as panoramic 3D cameras often produce 360° depth with missing data in complex scenes. | Zhiqiang Yan; Xiang Li; Kun Wang; Zhenyu Zhang; Jun Li; Jian Yang
23 | GitNet: Geometric Prior-Based Transformation for Birds-Eye-View Segmentation | We present a novel two-stage Geometry PrIor-based Transformation framework named GitNet, consisting of (i) the geometry-guided pre-alignment and (ii) a ray-based transformer. | Shi Gong; Xiaoqing Ye; Xiao Tan; Jingdong Wang; Errui Ding; Yu Zhou; Xiang Bai
24 | Learning Visibility for Robust Dense Human Body Estimation [Code] | In this work, we learn dense human body estimation that is robust to partial observations. | Chun-Han Yao; Jimei Yang; Duygu Ceylan; Yi Zhou; Yang Zhou; Ming-Hsuan Yang
25 | Towards High-Fidelity Single-View Holistic Reconstruction of Indoor Scenes [Code] | We present a new framework to reconstruct holistic 3D indoor scenes, including both the room background and indoor objects, from single-view images. | Haolin Liu; Yujian Zheng; Guanying Chen; Shuguang Cui; Xiaoguang Han
26 | CompNVS: Novel View Synthesis with Scene Completion | We introduce a scalable framework for novel view synthesis from RGB-D images with largely incomplete scene coverage. | Zuoyue Li; Tianxing Fan; Zhenqiang Li; Zhaopeng Cui; Yoichi Sato; Marc Pollefeys; Martin R. Oswald
27 | SketchSampler: Sketch-Based 3D Reconstruction Via View-Dependent Depth Sampling [Code] | Through analyzing the 3D-to-2D projection process, we notice that the density map that characterizes the distribution of 2D point clouds (i.e., the probability of points projected at each location of the projection plane) can be used as a proxy to facilitate the reconstruction process. | Chenjian Gao; Qian Yu; Lu Sheng; Yi-Zhe Song; Dong Xu
28 | LocalBins: Improving Depth Estimation By Learning Local Distributions [Code] | We propose a novel architecture for depth estimation from a single image. | Shariq Farooq Bhat; Ibraheem Alhashim; Peter Wonka
29 | 2D GANs Meet Unsupervised Single-View 3D Reconstruction | Less attention has been devoted to 3D vision tasks. In light of this, we propose a novel image-conditioned neural implicit field, which can leverage 2D supervision from GAN-generated multi-view images and perform single-view reconstruction of generic objects. | Feng Liu; Xiaoming Liu
30 | InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images [Code] | We present a method for learning to generate unbounded flythrough videos of natural scenes starting from a single view. | Zhengqi Li; Qianqian Wang; Noah Snavely; Angjoo Kanazawa
31 | Semi-Supervised Single-View 3D Reconstruction Via Prototype Shape Priors [Code] | In particular, we introduce an attention-guided prototype shape prior module for guiding realistic object reconstruction. | Zhen Xing; Hengduo Li; Zuxuan Wu; Yu-Gang Jiang
32 | Bilateral Normal Integration | To model discontinuities, we introduce the assumption that the surface to be recovered is semi-smooth, i.e., the surface is one-sided differentiable (hence one-sided continuous) everywhere in the horizontal and vertical directions. | Xu Cao; Hiroaki Santo; Boxin Shi; Fumio Okura; Yasuyuki Matsushita
33 | S$^2$Contact: Graph-Based Network for 3D Hand-Object Contact Estimation with Semi-Supervised Learning | In this paper, we propose a novel semi-supervised framework that allows us to learn contact from monocular videos. | Tze Ho Elden Tse; Zhongqun Zhang; Kwang In Kim; Aleš Leonardis; Feng Zheng; Hyung Jin Chang
34 | SC-wLS: Towards Interpretable Feed-Forward Camera Re-localization [Code] | In order to have the best of both worlds, we propose a feed-forward method termed SC-wLS that exploits all scene coordinate estimates for weighted least squares pose regression. | Xin Wu; Hao Zhao; Shunkai Li; Yingdian Cao; Hongbin Zha
35 | FloatingFusion: Depth from ToF and Image-Stabilized Stereo Cameras | Leveraging ToF depth estimates and a wide-angle RGB camera, we design an automatic calibration technique based on dense 2D/3D matching that can estimate the camera pose, intrinsic, and distortion parameters of a stabilized main RGB sensor from a single snapshot. | Andreas Meuleman; Hakyeong Kim; James Tompkin; Min H. Kim
36 | DELTAR: Depth Estimation from A Light-Weight ToF Sensor and RGB Image [Code] | In this paper, we propose DELTAR, a novel method to empower light-weight ToF sensors with the capability of measuring high-resolution and accurate depth by cooperating with a color image. | Yijin Li; Xinyang Liu; Wenqi Dong; Han Zhou; Hujun Bao; Guofeng Zhang; Yinda Zhang; Zhaopeng Cui
37 | 3D Room Layout Estimation from A Cubemap of Panorama Image Via Deep Manhattan Hough Transform [Code] | Significant geometric structures can be compactly described by global wireframes in the estimation of 3D room layout from a single panoramic image. Based on this observation, we present an alternative approach to estimate the walls in 3D space by modeling long-range geometric patterns in a learnable Hough Transform block. | Yining Zhao; Chao Wen; Zhou Xue; Yue Gao
38 | RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation [Code] | However, their shape prior integration strategy boosts pose estimation only indirectly, leading to insufficient pose-sensitive feature extraction and slow inference. To tackle this problem, we propose RBP-Pose, a novel geometry-guided Residual Object Bounding Box Projection network that jointly predicts the object pose and residual vectors describing the displacements from the shape-prior-indicated surface projections on the bounding box towards the real surface projections. | Ruida Zhang; Yan Di; Zhiqiang Lou; Fabian Manhardt; Federico Tombari; Xiangyang Ji
39 | Monocular 3D Object Reconstruction with GAN Inversion [Code] | In this work, we present MeshInversion, a novel framework to improve reconstruction by exploiting the generative prior of a 3D GAN pre-trained for 3D textured mesh synthesis. | Junzhe Zhang; Daxuan Ren; Zhongang Cai; Chai Kiat Yeo; Bo Dai; Chen Change Loy
40 | Map-Free Visual Relocalization: Metric Pose Relative to A Single Image [Code] | In contrast, we propose Map-free Relocalization, i.e., using only one photo of a scene to enable instant, metric-scaled relocalization. Thus, we have constructed a new dataset of 655 small places of interest, such as sculptures, murals and fountains, collected worldwide. | Eduardo Arnold; Jamie Wynn; Sara Vicente; Guillermo Garcia-Hernando; Aron Monszpart; Victor Prisacariu; Daniyar Turmukhambetov; Eric Brachmann
41 | Self-Distilled Feature Aggregation for Self-Supervised Monocular Depth Estimation [Code] | Most existing works aggregate multi-scale features for depth prediction via straightforward concatenation or element-wise addition; however, such feature aggregation operations generally neglect the contextual consistency between multi-scale features. Addressing this problem, we propose the Self-Distilled Feature Aggregation (SDFA) module for simultaneously aggregating a pair of low-scale and high-scale features while maintaining their contextual consistency. | Zhengming Zhou; Qiulei Dong
42 | Planes Vs. Chairs: Category-Guided 3D Shape Learning Without Any 3D Cues | We present a novel 3D shape reconstruction method which learns to predict an implicit 3D shape representation from a single RGB image. | Zixuan Huang; Stefan Stojanov; Anh Thai; Varun Jampani; James M. Rehg
43 | MHR-Net: Multiple-Hypothesis Reconstruction of Non-rigid Shapes from 2D Views [Code] | We propose MHR-Net, a novel method for recovering Non-Rigid Shapes from Motion (NRSfM). | Haitian Zeng; Xin Yu; Jiaxu Miao; Yi Yang
44 | Depth Map Decomposition for Monocular Depth Estimation [Code] | We propose a novel algorithm for monocular depth estimation that decomposes a metric depth map into a normalized depth map and scale features. | Jinyoung Jun; Jae-Han Lee; Chul Lee; Chang-Su Kim
45 | Monitored Distillation for Positive Congruent Depth Completion [Code] | We propose a method to infer a dense depth map from a single image, its calibration, and the associated sparse point cloud. | Tian Yu Liu; Parth Agrawal; Allison Chen; Byung-Woo Hong; Alex Wong
46 | Resolution-Free Point Cloud Sampling Network with Data Distillation [Code] | In this work, we propose a novel resolution-free point cloud sampling network to directly sample the original point cloud to different resolutions, conducted by optimizing non-learning-based initial sampled points to better positions. | Tianxin Huang; Jiangning Zhang; Jun Chen; Yuang Liu; Yong Liu
47 | Organic Priors in Non-rigid Structure from Motion | It is shown that such priors reside in the factorized matrices, and quite surprisingly, existing methods generally disregard them. The paper’s main contribution is to put forward a simple, methodical, and practical method that can effectively exploit such organic priors to solve NRSfM. | Suryansh Kumar; Luc Van Gool
48 | Perspective Flow Aggregation for Data-Limited 6D Object Pose Estimation [Code] | In this paper, we propose a method that can be trained solely on synthetic images, or optionally using a few additional real ones. | Yinlin Hu; Pascal Fua; Mathieu Salzmann
49 | DANBO: Disentangled Articulated Neural Body Representations Via Graph Neural Networks [Code] | We introduce a three-stage method that induces two inductive biases to better disentangle pose-dependent deformation. | Shih-Yang Su; Timur Bagautdinov; Helge Rhodin
50 | CHORE: Contact, Human and Object REconstruction from A Single RGB Image [Code] | In this paper, we introduce CHORE, a novel method that learns to jointly reconstruct the human and the object from a single RGB image. | Xianghui Xie; Bharat Lal Bhatnagar; Gerard Pons-Moll
51 | Learned Vertex Descent: A New Direction for 3D Human Model Fitting | We propose a novel optimization-based paradigm for 3D human shape fitting on images. | Enric Corona; Gerard Pons-Moll; Guillem Alenyà; Francesc Moreno-Noguer
52 | Self-Calibrating Photometric Stereo By Neural Inverse Rendering [Code] | We propose a new method that jointly optimizes object shape, light directions, and light intensities, all under general surface and light assumptions. | Junxuan Li; Hongdong Li
53 | 3D Clothed Human Reconstruction in The Wild [Code] | However, such datasets contain simple human poses and less natural image appearances compared to real in-the-wild datasets, which makes generalizing to in-the-wild images extremely challenging. To resolve this issue, we propose ClothWild, a 3D clothed human reconstruction framework that firstly addresses robustness on in-the-wild images. | Gyeongsik Moon; Hyeongjin Nam; Takaaki Shiratori; Kyoung Mu Lee
54 | Directed Ray Distance Functions for 3D Scene Reconstruction | We present an approach for full 3D scene reconstruction from a single new image that can be trained on realistic non-watertight scans. | Nilesh Kulkarni; Justin Johnson; David F. Fouhey
55 | Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation from Monocular RGB Image | Recently, RGBD-based category-level 6D object pose estimation has achieved promising performance; however, the requirement of depth information prohibits broader applications. To relieve this problem, this paper proposes a novel approach, the Object Level Depth reconstruction Network (OLD-Net), which takes only RGB images as input for category-level 6D object pose estimation. | Zhaoxin Fan; Zhenbo Song; Jian Xu; Zhicheng Wang; Kejian Wu; Hongyan Liu; Jun He
56 | Uncertainty Quantification in Depth Estimation Via Constrained Ordinal Regression | This paper provides an uncertainty quantification method for supervised MDE models. | Dongting Hu; Liuhua Peng; Tingjin Chu; Xiaoxing Zhang; Yinian Mao; Howard Bondell; Mingming Gong
57 | CostDCNet: Cost Volume Based Depth Completion for A Single RGB-D Image | We propose a novel depth completion framework, CostDCNet, based on the cost-volume-based depth estimation approach that has been successfully employed for multi-view stereo (MVS). | Jaewon Kam; Jungeon Kim; Soongjin Kim; Jaesik Park; Seungyong Lee
58 | ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization [Code] | We present ShAPO, a method for joint multi-object detection, 3D textured reconstruction, and 6D object pose and size estimation. | Muhammad Zubair Irshad; Sergey Zakharov; Rareș Ambruș; Thomas Kollar; Zsolt Kira; Adrien Gaidon
59 | 3D Siamese Transformer Network for Single Object Tracking on Point Clouds [Code] | In this paper, we explicitly use a Transformer to form a 3D Siamese Transformer network for learning robust cross-correlation between the template and the search area of point clouds. | Le Hui; Lingpeng Wang; Linghua Tang; Kaihao Lan; Jin Xie; Jian Yang
60 | Object Wake-Up: 3D Object Rigging from A Single Image | This new problem not only goes beyond image-based object reconstruction but also involves articulated animation of generic objects in 3D, which could give rise to numerous downstream augmented and virtual reality applications. In this paper, we propose an automated approach to tackle the entire process of reconstructing such generic 3D objects, including rigging and animation, all from single images. | Ji Yang; Xinxin Zuo; Sen Wang; Zhenbo Yu; Xingyu Li; Bingbing Ni; Minglun Gong; Li Cheng
61 | IntegratedPIFu: Integrated Pixel Aligned Implicit Function for Single-View Human Reconstruction [Code] | We propose IntegratedPIFu, a new pixel-aligned implicit model that builds on the foundation set by PIFuHD. | Kennard Yanting Chan; Guosheng Lin; Haiyu Zhao; Weisi Lin
62 | Realistic One-Shot Mesh-Based Head Avatars | We present a system for the creation of realistic one-shot mesh-based (ROME) human head avatars. | Taras Khakhulin; Vanessa Sklyarova; Victor Lempitsky; Egor Zakharov
63 | A Kendall Shape Space Approach to 3D Shape Estimation from 2D Landmarks | Here, we present a new approach based on Kendall’s shape space to reconstruct 3D shapes from single monocular 2D images. | Martha Paskin; Daniel Baum; Mason N. Dean; Christoph von Tycowicz
64 | Neural Light Field Estimation for Street Scenes with Differentiable Virtual Object Insertion | In this work, we propose a neural approach that estimates the 5D HDR light field from a single image, and a differentiable object insertion formulation that enables end-to-end training with image-based losses that encourage realism. | Zian Wang; Wenzheng Chen; David Acuna; Jan Kautz; Sanja Fidler
65 | Perspective Phase Angle Model for Polarimetric 3D Reconstruction [Code] | In the case of a large field of view, however, this assumption does not hold and may result in significant reconstruction errors in methods that make it. To address this problem, we present the perspective phase angle (PPA) model, which is applicable to perspective cameras. | Guangcheng Chen; Li He; Yisheng Guan; Hong Zhang
66 | DeepShadow: Neural Shape from Shadow [Code] | This paper presents ‘DeepShadow’, a one-shot method for recovering the depth map and surface normals from photometric stereo shadow maps. | Asaf Karnieli; Ohad Fried; Yacov Hel-Or
67 | Camera Auto-Calibration from The Steiner Conic of The Fundamental Matrix | We thus propose a method to fully calibrate the camera. | Yu Liu; Hui Zhang
68 | Super-Resolution 3D Human Shape from A Single Low-Resolution Image [Code] | We propose a novel framework to reconstruct super-resolution human shape from a single low-resolution input image. | Marco Pesavento; Marco Volino; Adrian Hilton
69 | Minimal Neural Atlas: Parameterizing Complex Surfaces with Minimal Charts and Distortion [Code] | In this work, we present Minimal Neural Atlas, a novel atlas-based explicit neural surface representation. | Weng Fei Low; Gim Hee Lee
70 | ExtrudeNet: Unsupervised Inverse Sketch-and-Extrude for Shape Parsing [Code] | We present ExtrudeNet, an unsupervised end-to-end network for discovering sketch and extrude from point clouds. | Daxuan Ren; Jianmin Zheng; Jianfei Cai; Jiatong Li; Junzhe Zhang
71 | CATRE: Iterative Point Clouds Alignment for Category-Level Object Pose Refinement [Code] | Specifically, we propose a novel disentangled architecture that is aware of the inherent distinctions between rotation and translation/size estimation. | Xingyu Liu; Gu Wang; Yi Li; Xiangyang Ji
72 | Optimization Over Disentangled Encoding: Unsupervised Cross-Domain Point Cloud Completion Via Occlusion Factor Manipulation [Code] | In this paper, we disentangle partial scans into three factors (domain, shape, and occlusion) to handle the output gap in cross-domain completion. | Jingyu Gong; Fengqi Liu; Jiachen Xu; Min Wang; Xin Tan; Zhizhong Zhang; Ran Yi; Haichuan Song; Yuan Xie; Lizhuang Ma
73 | Unsupervised Learning of 3D Semantic Keypoints with Mutual Reconstruction | From a novel mutual reconstruction perspective, we present an unsupervised method to generate consistent semantic keypoints from point clouds explicitly. | Haocheng Yuan; Chen Zhao; Shichao Fan; Jiaxi Jiang; Jiaqi Yang
74 | MvDeCor: Multi-View Dense Correspondence Learning for Fine-Grained 3D Segmentation [Code] | We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks. | Gopal Sharma; Kangxue Yin; Subhransu Maji; Evangelos Kalogerakis; Or Litany; Sanja Fidler
75 | SUPR: A Sparse Unified Part-Based Human Representation [Code] | Consequently, we propose a new learning scheme that jointly trains a full-body model and specific part models using a federated dataset of full-body and body-part scans. | Ahmed A. A. Osman; Timo Bolkart; Dimitrios Tzionas; Michael J. Black
76 | Revisiting Point Cloud Simplification: A Learnable Feature Preserving Approach | Traditional simplification techniques usually rely on solving a time-consuming optimization problem, hence they are impractical for large-scale datasets. In an attempt to alleviate this computational burden, we propose a fast point cloud simplification method by learning to sample salient points. |
Rolandos Alexandros Potamias; Giorgos Bouritsas; Stefanos Zafeiriou; |
77 | Masked Autoencoders for Point Cloud Self-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As a promising scheme of self-supervised learning, masked autoencoding has significantly advanced natural language processing and computer vision. Inspired by this, we propose a neat scheme of masked autoencoders for point cloud self-supervised learning, addressing the challenges posed by point cloud’s properties, including leakage of location information and uneven information density. |
Yatian Pang; Wenxiao Wang; Francis E.H. Tay; Wei Liu; Yonghong Tian; Li Yuan; |
78 | Intrinsic Neural Fields: Learning Functions on Manifolds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The extrinsic embedding ignores known intrinsic manifold properties and is inflexible w.r.t. transfer of the learned function. To overcome these limitations, this work introduces intrinsic neural fields, a novel and versatile representation for neural fields on manifolds. |
Lukas Koestler; Daniel Grittner; Michael Moeller; Daniel Cremers; Zorah Lähner; |
79 | Skeleton-Free Pose Transfer for Stylized 3D Characters Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present the first method that automatically transfers poses between stylized 3D characters without skeletal rigging. |
Zhouyingcheng Liao; Jimei Yang; Jun Saito; Gerard Pons-Moll; Yang Zhou; |
80 | Masked Discrimination for Self-Supervised Learning on Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, mask based pretraining has yet to show benefits for point cloud understanding, likely due to standard backbones like PointNet being unable to properly handle the training versus testing distribution mismatch introduced by masking during training. In this paper, we bridge this gap by proposing a discriminative mask pretraining Transformer framework, MaskPoint, for point clouds. |
Haotian Liu; Mu Cai; Yong Jae Lee; |
81 | FBNet: Feedback Network for Point Cloud Completion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a novel Feedback Network (FBNet) for point cloud completion, in which present features are efficiently refined by rerouting subsequent fine-grained ones. |
Xuejun Yan; Hongyu Yan; Jingjing Wang; Hang Du; Zhihong Wu; Di Xie; Shiliang Pu; Li Lu; |
82 | Meta-Sampler: Almost-Universal Yet Task-Oriented Sampling for Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose an almost-universal sampler, in our quest for a sampler that can learn to preserve the most useful points for a particular task, yet be inexpensive to adapt to different tasks, models or datasets. |
Ta-Ying Cheng; Qingyong Hu; Qian Xie; Niki Trigoni; Andrew Markham; |
83 | A Level Set Theory for Neural Implicit Evolution Under Explicit Flows Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: They effectively act as parametric level sets with the zero-level set defining the surface of interest. We present a framework that allows applying deformation operations defined for triangle meshes onto such implicit surfaces. |
Ishit Mehta; Manmohan Chandraker; Ravi Ramamoorthi; |
84 | Efficient Point Cloud Analysis Using Hilbert Curve Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this way, we propose the HilbertNet to maintain the locality advantage of voxel-based methods while significantly reducing the computational cost. |
Wanli Chen; Xinge Zhu; Guojin Chen; Bei Yu; |
85 | TOCH: Spatio-Temporal Object-to-Hand Correspondence for Motion Refinement Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present TOCH, a method for refining incorrect 3D hand-object interaction sequences using a data prior. |
Keyang Zhou; Bharat Lal Bhatnagar; Jan Eric Lenssen; Gerard Pons-Moll; |
86 | LaTeRF: Label and Text Driven Object Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we introduce LaTeRF, a method for extracting an object of interest from a scene given 2D images of the entire scene and known camera poses, a natural language description of the object, and a small number of point-labels of object and non-object points in the input images. |
Ashkan Mirzaei; Yash Kant; Jonathan Kelly; Igor Gilitschenski; |
87 | MeshMAE: Masked Autoencoders for 3D Mesh Data Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Recently, self-supervised pre-training has advanced Vision Transformers on various tasks w.r.t. different data modalities, e.g., image and 3D point cloud data. In this paper, we explore this learning paradigm for 3D mesh data analysis based on Transformers. |
Yaqian Liang; Shanshan Zhao; Baosheng Yu; Jing Zhang; Fazhi He; |
88 | Unsupervised Deep Multi-Shape Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel approach for deep multi-shape matching that ensures cycle-consistent multi-matchings while not depending on an explicit template shape. |
Dongliang Cao; Florian Bernard; |
89 | Texturify: Generating Textures on 3D Shape Surfaces Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We thus propose Texturify, a GAN-based method that leverages a 3D shape dataset of an object class and learns to reproduce the distribution of appearances observed in real images by generating high-quality textures. |
Yawar Siddiqui; Justus Thies; Fangchang Ma; Qi Shan; Matthias Nießner; Angela Dai; |
90 | Autoregressive 3D Shape Generation Via Canonical Mapping Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Yet, taming them in generating less structured and voluminous data formats such as high-resolution point clouds have seldom been explored due to ambiguous sequentialization processes and infeasible computation burden. In this paper, we aim to further exploit the power of transformers and employ them for the task of 3D point cloud generation. |
An-Chieh Cheng; Xueting Li; Sifei Liu; Min Sun; Ming-Hsuan Yang; |
91 | PointTree: Transformation-Robust Point Cloud Encoder with Relaxed K-D Trees Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Despite rapid progress, state-of-the-art encoders are restrictive to canonicalized point clouds, and have weaker than necessary performance when encountering geometric transformation distortions. To overcome this challenge, we propose PointTree, a general-purpose point cloud encoder that is robust to transformations based on relaxed K-D trees. |
Jun-Kun Chen; Yu-Xiong Wang; |
92 | UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose united implicit functions (UNIF), a part-based method for clothed human reconstruction and animation with raw scans and skeletons as the input. |
Shenhan Qian; Jiale Xu; Ziwei Liu; Liqian Ma; Shenghua Gao; |
93 | PRIF: Primary Ray-Based Implicit Function Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a new implicit shape representation called Primary Ray-based Implicit Function (PRIF). |
Brandon Y. Feng; Yinda Zhang; Danhang Tang; Ruofei Du; Amitabh Varshney; |
94 | Point Cloud Domain Adaptation Via Masked Local 3D Structure Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Masked Local Structure Prediction (MLSP) method to encode target data. |
Hanxue Liang; Hehe Fan; Zhiwen Fan; Yi Wang; Tianlong Chen; Yu Cheng; Zhangyang Wang; |
95 | CLIP-Actor: Text-Driven Recommendation and Stylization for Animating Human Meshes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose CLIP-Actor, a text-driven motion recommendation and neural mesh stylization system for human mesh animation. |
Kim Youwang; Kim Ji-Yeon; Tae-Hyun Oh; |
96 | PlaneFormers: From Sparse View Planes to 3D Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present an approach for the planar surface reconstruction of a scene from images with limited overlap. |
Samir Agarwala; Linyi Jin; Chris Rockwell; David F. Fouhey; |
97 | Learning Implicit Templates for Point-Based Clothed Human Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present FITE, a First-Implicit-Then-Explicit framework for modeling human avatars in clothing. |
Siyou Lin; Hongwen Zhang; Zerong Zheng; Ruizhi Shao; Yebin Liu; |
98 | Exploring The Devil in Graph Spectral Domain for 3D Point Cloud Attacks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Instead, we propose point cloud attacks from a new perspective—Graph Spectral Domain Attack (GSDA), aiming to perturb transform coefficients in the graph spectral domain that corresponds to varying certain geometric structure. |
Qianjiang Hu; Daizong Liu; Wei Hu; |
99 | Structure-Aware Editable Morphable Model for 3D Facial Detail Animation and Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper augments morphable models in representing facial details by learning a Structure-aware Editable Morphable Model (SEMM). |
Jingwang Ling; Zhibo Wang; Ming Lu; Quan Wang; Chen Qian; Feng Xu; |
100 | MoFaNeRF: Morphable Facial Neural Radiance Field Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a parametric model that maps free-view images into a vector space of coded facial shape, expression and appearance with a neural radiance field, namely Morphable Facial NeRF. |
Yiyu Zhuang; Hao Zhu; Xusen Sun; Xun Cao; |
101 | PointInst3D: Segmenting 3D Instances By Points Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In contrast, we propose a fully convolutional 3D point cloud instance segmentation method that works in a per-point prediction fashion. |
Tong He; Wei Yin; Chunhua Shen; Anton van den Hengel; |
102 | Cross-Modal 3D Shape Generation and Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a generic multi-modal generative model that couples the 2D modalities and implicit 3D representations through shared latent spaces. |
Zezhou Cheng; Menglei Chai; Jian Ren; Hsin-Ying Lee; Kyle Olszewski; Zeng Huang; Subhransu Maji; Sergey Tulyakov; |
103 | Latent Partition Implicit with Surface Codes for 3D Representation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Current solutions learn various primitives and blend the primitives directly in the spatial space, which still struggle to approximate the 3D shape accurately. To resolve this problem, we introduce a novel implicit representation to represent a single 3D shape as a set of parts in the latent space, towards both highly accurate and plausibly interpretable shape modeling. |
Chao Chen; Yu-Shen Liu; Zhizhong Han; |
104 | Implicit Field Supervision for Robust Non-rigid Shape Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce an approach based on an auto-decoder framework, that learns a continuous shape-wise deformation field over a fixed template. |
Ramana Sundararaman; Gautam Pai; Maks Ovsjanikov; |
105 | Learning Self-Prior for Mesh Denoising Using Dual Graph Convolutional Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This study proposes a deep-learning framework for mesh denoising from a single noisy input, where two graph convolutional networks are trained jointly to filter vertex positions and facet normals apart. |
Shota Hattori; Tatsuya Yatagawa; Yutaka Ohtake; Hiromasa Suzuki; |
106 | DiffConv: Analyzing Irregular Point Clouds with An Irregular View Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel graph convolution named Difference Graph Convolution (diffConv), which does not rely on a regular view. |
Manxi Lin; Aasa Feragen; |
107 | PD-Flow: A Point Cloud Denoising Framework with Normalizing Flows Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel deep learning-based denoising model, that incorporates normalizing flows and noise disentanglement techniques to achieve high denoising accuracy. |
Aihua Mao; Zihui Du; Yu-Hui Wen; Jun Xuan; Yong-Jin Liu; |
108 | SeedFormer: Patch Seeds Based Point Cloud Completion with Upsample Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel SeedFormer to improve the ability of detail preservation and recovery in point cloud completion. |
Haoran Zhou; Yun Cao; Wenqing Chu; Junwei Zhu; Tong Lu; Ying Tai; Chengjie Wang; |
109 | DeepMend: Learning Occupancy Functions to Represent Shape for Repair Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present DeepMend, a novel approach to reconstruct restorations to fractured shapes using learned occupancy functions. |
Nikolas Lamb; Sean Banerjee; Natasha Kholgade Banerjee; |
110 | A Repulsive Force Unit for Garment Collision Handling in Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Despite recent success, deep learning-based methods for predicting 3D garment deformation under body motion suffer from interpenetration problems between the garment and the body. To address this problem, we propose a novel collision handling neural network layer called Repulsive Force Unit (ReFU). |
Qingyang Tan; Yi Zhou; Tuanfeng Wang; Duygu Ceylan; Xin Sun; Dinesh Manocha; |
111 | Shape-Pose Disentanglement Using SE(3)-Equivariant Vector Neurons Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce an unsupervised technique for encoding point clouds into a canonical shape representation, by disentangling shape and pose. |
Oren Katzir; Dani Lischinski; Daniel Cohen-Or; |
112 | 3D Equivariant Graph Implicit Functions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In recent years, neural implicit representations have made remarkable progress in modeling of 3D shapes with arbitrary topology. In this work, we address two key limitations of such representations, in failing to capture local 3D geometric fine details, and to learn from and generalize to shapes with unseen 3D transformations. |
Yunlu Chen; Basura Fernando; Hakan Bilen; Matthias Nießner; Efstratios Gavves; |
113 | PatchRD: Detail-Preserving Shape Completion By Learning Patch Retrieval and Deformation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper introduces a data-driven shape completion approach that focuses on completing geometric details of missing regions of 3D shapes. |
Bo Sun; Vladimir G. Kim; Noam Aigerman; Qixing Huang; Siddhartha Chaudhuri; |
114 | 3D Shape Sequence of Human Comparison and Classification Using Current and Varifolds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we address the task of the comparison and classification of 3D human shape sequences. |
Emery Pierson; Mohamed Daoudi; Sylvain Arguillere; |
115 | Conditional-Flow NeRF: Accurate 3D Modelling with Reliable Uncertainty Quantification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This information is paramount in real applications such as medical diagnosis or autonomous driving where, to reduce potentially catastrophic failures, the confidence on the model outputs must be included into the decision-making process. In this context, we introduce Conditional-Flow NeRF (CF-NeRF), a novel probabilistic framework to incorporate uncertainty quantification into NeRF-based approaches. |
Jianxiong Shen; Antonio Agudo; Francesc Moreno-Noguer; Adria Ruiz; |
116 | Unsupervised Pose-Aware Part Decomposition for Man-Made Articulated Objects Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose PPD (unsupervised Pose-aware Part Decomposition) to address a novel setting that explicitly targets man-made articulated objects with mechanical joints, considering the part poses in part parsing. |
Yuki Kawana; Yusuke Mukuta; Tatsuya Harada; |
117 | MeshUDF: Fast and Differentiable Meshing of Unsigned Distance Field Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here, we extend the marching cube algorithm to handle UDFs, both fast and accurately. |
Benoît Guillard; Federico Stella; Pascal Fua; |
118 | SPE-Net: Boosting Point Cloud Analysis Via Rotation Robustness Enhancement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel deep architecture tailored for 3D point cloud applications, named as SPE-Net. |
Zhaofan Qiu; Yehao Li; Yu Wang; Yingwei Pan; Ting Yao; Tao Mei; |
119 | The Shape Part Slot Machine: Contact-Based Reasoning for Generating 3D Shapes from Parts Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present the Shape Part Slot Machine, a new method for assembling novel 3D shapes from existing parts by performing contact-based reasoning. |
Kai Wang; Paul Guerrero; Vladimir G. Kim; Siddhartha Chaudhuri; Minhyuk Sung; Daniel Ritchie; |
120 | Spatiotemporal Self-Attention Modeling with Temporal Patch Shift for Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Temporal Patch Shift (TPS) method for efficient 3D self-attention modeling in transformers for video-based action recognition. |
Wangmeng Xiang; Chao Li; Biao Wang; Xihan Wei; Xian-Sheng Hua; Lei Zhang; |
121 | Proposal-Free Temporal Action Detection Via Global Segmentation Mask Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, for the first time, we propose a proposal-free Temporal Action detection model with Global Segmentation mask (TAGS). |
Sauradip Nag; Xiatian Zhu; Yi-Zhe Song; Tao Xiang; |
122 | Semi-Supervised Temporal Action Detection with Proposal-Free Masking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Due to their sequential localization (e.g., proposal generation) and classification design, they are prone to proposal error propagation. To overcome this limitation, in this work we propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT) with a parallel localization (mask generation) and classification architecture. |
Sauradip Nag; Xiatian Zhu; Yi-Zhe Song; Tao Xiang; |
123 | Zero-Shot Temporal Action Detection Via Vision-Language Prompting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, due to the sequential localization (e.g., proposal generation) and classification design, it is prone to localization error propagation. To overcome this problem, in this paper we propose a novel zero-Shot Temporal Action detection model via vision-LanguagE prompting (STALE). |
Sauradip Nag; Xiatian Zhu; Yi-Zhe Song; Tao Xiang; |
124 | CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This poses two major challenges: (1) spatial domain shift between web images and video frames, and (2) modality gap between image and video data. To address these challenges, we propose Cycle Domain Adaptation (CycDA), a cycle-based approach for unsupervised image-to-video domain adaptation. |
Wei Lin; Anna Kukleva; Kunyang Sun; Horst Possegger; Hilde Kuehne; Horst Bischof; |
125 | S2N: Suppression-Strengthen Network for Event-Based Recognition Under Variant Illuminations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the event degradation due to imaging under low illumination obscures the correlation between event signals and brings uncertainty into event representation. Targeting this issue, we present a novel suppression-strengthen network (S2N) to augment the event feature representation after suppressing the influence of degradation. |
Zengyu Wan; Yang Wang; Ganchao Tan; Yang Cao; Zheng-Jun Zha; |
126 | CMD: Self-Supervised 3D Action Representation Learning with Cross-Modal Mutual Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem. |
Yunyao Mao; Wengang Zhou; Zhenbo Lu; Jiajun Deng; Houqiang Li; |
127 | Expanding Language-Image Pretrained Models for General Video Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a simple yet effective approach that adapts the pretrained language-image models to video recognition directly, instead of pretraining a new model from scratch. |
Bolin Ni; Houwen Peng; Minghao Chen; Songyang Zhang; Gaofeng Meng; Jianlong Fu; Shiming Xiang; Haibin Ling; |
128 | Hunting Group Clues with Transformers for Social Group Activity Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a novel framework for social group activity recognition. |
Masato Tamura; Rahul Vishwakarma; Ravigopal Vennelakanti; |
129 | Contrastive Positive Mining for Unsupervised 3D Action Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, a Contrastive Positive Mining (CPM) framework is proposed for unsupervised skeleton 3D action representation learning. |
Haoyuan Zhang; Yonghong Hou; Wenjing Zhang; Wanqing Li; |
130 | Target-Absent Human Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a data-driven computational model that addresses the search-termination problem and predicts the scanpath of search fixations made by people searching for targets that do not appear in images. |
Zhibo Yang; Sounak Mondal; Seoyoung Ahn; Gregory Zelinsky; Minh Hoai; Dimitris Samaras; |
131 | Uncertainty-Based Spatial-Temporal Attention for Online Action Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an uncertainty-based spatial-temporal attention for online action detection. |
Hongji Guo; Zhou Ren; Yi Wu; Gang Hua; Qiang Ji; |
132 | Iwin: Human-Object Interaction Detection Via Transformer with Irregular Windows Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a new vision Transformer, named Iwin Transformer, which is specifically designed for human-object interaction (HOI) detection, a detailed scene understanding task involving a sequential process of human/object detection and interaction recognition. |
Danyang Tu; Xiongkuo Min; Huiyu Duan; Guodong Guo; Guangtao Zhai; Wei Shen; |
133 | Rethinking Zero-Shot Action Recognition: Learning from Latent Atomic Actions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: It enables humans to quickly understand an unseen action given a bunch of atomic actions learned from seen actions. Inspired by this, we propose Jigsaw Network (JigsawNet) which recognizes complex actions through unsupervisedly decomposing them into combinations of atomic actions and bridging group-to-group relationships between visual features and semantic representations. |
Yijun Qian; Lijun Yu; Wenhe Liu; Alexander G. Hauptmann; |
134 | Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we argue that comparing body-parts of multi-person simultaneously can afford us more useful and supplementary interactiveness cues. |
Xiaoqian Wu; Yong-Lu Li; Xinpeng Liu; Junyi Zhang; Yuzhe Wu; Cewu Lu; |
135 | Collaborating Domain-Shared and Target-Specific Feature Clustering for Cross-Domain 3D Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we consider the problem of cross-domain 3D action recognition in the open-set setting, which has been rarely explored before. |
Qinying Liu; Zilei Wang; |
136 | Is Appearance Free Action Recognition Possible? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our results show a notable decrease in performance for all architectures on AFD compared to RGB. We also conducted a complementary study with humans that shows their recognition accuracy on AFD and RGB is very similar and much better than the evaluated architectures on AFD. |
Filip Ilic; Thomas Pock; Richard P. Wildes; |
137 | Learning Spatial-Preserved Skeleton Representations for Few-Shot Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, existing skeleton-based spatial-temporal models tend to deteriorate the positional distinguishability of joints, which leads to fuzzy spatial matching and poor explainability. To address these issues, we propose a novel spatial matching strategy consisting of spatial disentanglement and spatial activation. |
Ning Ma; Hongyi Zhang; Xuhui Li; Sheng Zhou; Zhen Zhang; Jun Wen; Haifeng Li; Jingjun Gu; Jiajun Bu; |
138 | Dual-Evidential Learning for Weakly-Supervised Temporal Action Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Despite great progress, existing methods suffer from severe action-background ambiguity, which mainly comes from background noise introduced by aggregation operations and large intra-action variations caused by the task gap between classification and localization. To address this issue, we propose a generalized evidential deep learning (EDL) framework for WS-TAL, called Dual-Evidential Learning for Uncertainty modeling (DELU), which extends the traditional paradigm of EDL to adapt to the weakly-supervised multi-label classification goal. |
Mengyuan Chen; Junyu Gao; Shicai Yang; Changsheng Xu; |
139 | Global-Local Motion Transformer for Unsupervised Skeleton-Based Action Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new transformer model for the task of unsupervised learning of skeleton motion sequences. |
Boeun Kim; Hyung Jin Chang; Jungho Kim; Jin Young Choi; |
140 | AdaFocusV3: On Unified Spatial-Temporal Dynamic Video Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper explores the unified formulation of spatial-temporal dynamic computation on top of the recently proposed AdaFocusV2 algorithm, contributing to an improved AdaFocusV3 framework. |
Yulin Wang; Yang Yue; Xinhong Xu; Ali Hassani; Victor Kulikov; Nikita Orlov; Shiji Song; Humphrey Shi; Gao Huang; |
141 | Panoramic Human Activity Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To obtain a more comprehensive activity understanding for a crowded scene, in this paper, we propose a new problem of panoramic human activity recognition (PAR), which aims to simultaneously achieve the recognition of individual actions, social group activities, and global activities. |
Ruize Han; Haomin Yan; Jiacheng Li; Songmiao Wang; Wei Feng; Song Wang; |
142 | Delving Into Details: Synopsis-to-Detail Networks for Video Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we explore the details in video recognition with the aim to improve the accuracy. |
Shuxian Liang; Xu Shen; Jianqiang Huang; Xian-Sheng Hua; |
143 | A Generalized &amp; Robust Framework for Timestamp Supervision in Temporal Action Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel Expectation-Maximization (EM) based approach which leverages label uncertainty of unlabelled frames and is robust enough to accommodate possible annotation errors. |
Rahul Rahaman; Dipika Singhania; Alexandre Thiery; Angela Yao; |
144 | Few-Shot Action Recognition with Hierarchical Matching and Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a hierarchical matching model to support comprehensive similarity measure at global, temporal and spatial levels via a zoom-in matching module. We further propose a mixed-supervised hierarchical contrastive learning (HCL) in training, which not only employs supervised contrastive learning to differentiate videos at different levels, but also utilizes cycle consistency as weak supervision to align discriminative temporal clips or spatial patches. |
Sipeng Zheng; Shizhe Chen; Qin Jin; |
145 | PrivHAR: Recognizing Human Actions from Privacy-Preserving Lens Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an optimizing framework to provide robust visual privacy protection along the human action recognition pipeline. |
Carlos Hinojosa; Miguel Marquez; Henry Arguello; Ehsan Adeli; Li Fei-Fei; Juan Carlos Niebles; |
146 | Scale-Aware Spatio-Temporal Relation Learning for Video Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a scale-aware weakly supervised learning approach to capture local and salient anomalous patterns from the background, using only coarse video-level labels as supervision. |
Guoqiu Li; Guanxiong Cai; Xingyu Zeng; Rui Zhao; |
147 | Compound Prototype Matching for Few-Shot Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel approach that first summarizes each video into compound prototypes consisting of a group of global prototypes and a group of focused prototypes, and then compares video similarity based on the prototypes. |
Yifei Huang; Lijin Yang; Yoichi Sato; |
148 | Continual 3D Convolutional Neural Networks for Real-Time Processing of Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Continual 3D Convolutional Neural Networks (Co3D CNNs), a new computational formulation of spatio-temporal 3D CNNs, in which videos are processed frame-by-frame rather than by clip. |
Lukas Hedegaard; Alexandros Iosifidis; |
149 | Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The goal of fine-grained action recognition is to successfully discriminate between action categories with subtle differences. To tackle this, we derive inspiration from the human visual system which contains specialized regions in the brain that are dedicated towards handling specific tasks. |
Tianjiao Li; Lin Geng Foo; Qiuhong Ke; Hossein Rahmani; Anran Wang; Jinghua Wang; Jun Liu; |
150 | Dynamic Local Aggregation Network with Adaptive Clusterer for Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To overcome these drawbacks, we introduce DLAN-AC, a Dynamic Local Aggregation Network with Adaptive Clusterer, for anomaly detection. |
Zhiwei Yang; Peng Wu; Jing Liu; Xiaotao Liu; |
151 | Action Quality Assessment with Temporal Parsing Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing state-of-the-art methods typically rely on the holistic video representations for score regression or ranking, which limits the generalization to capture fine-grained intra-class variation. To overcome the above limitation, we propose a temporal parsing transformer to decompose the holistic feature into temporal part-level representations. |
Yang Bai; Desen Zhou; Songyang Zhang; Jian Wang; Errui Ding; Yu Guan; Yang Long; Jingdong Wang; |
152 | Entry-Flipped Transformer for Inference and Prediction of Participant Behavior Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our key idea is to model the spatio-temporal relations among participants in a manner that is robust to error accumulation during frame-wise inference and prediction. |
Bo Hu; Tat-Jen Cham; |
153 | Pairwise Contrastive Learning Network for Action Quality Assessment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, it ignores the subtle and critical difference between videos. To address this problem, a new pairwise contrastive learning network (PCLN) is proposed to concern these differences and form an end-to-end AQA model with basic regression network. |
Mingzhe Li; Hong-Bo Zhang; Qing Lei; Zongwen Fan; Jinghua Liu; Ji-Xiang Du; |
154 | Geometric Features Informed Multi-Person Human-Object Interaction Recognition in Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Considering that geometric features such as human pose and object position provide meaningful information for understanding HOIs, we propose to combine the benefits of both visual and geometric features in HOI recognition via a novel Two-level Geometric feature-informed Graph Convolutional Network (2G-GCN). To demonstrate the novelty and effectiveness of our method in challenging scenarios, we propose a new multi-person HOI dataset (MPHOI-72). |
Tanqiu Qiao; Qianhui Men; Frederick W. B. Li; Yoshiki Kubotani; Shigeo Morishima; Hubert P. H. Shum; |
155 | ActionFormer: Localizing Moments of Actions with Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we present ActionFormer–a simple yet powerful model to identify actions in time and recognize their categories in a single shot, without using action proposals or relying on pre-defined anchor windows. |
Chen-Lin Zhang; Jianxin Wu; Yin Li; |
156 | SocialVAE: Human Trajectory Prediction Using Timewise Latents Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose SocialVAE, a novel approach for human trajectory prediction. |
Pei Xu; Jean-Bernard Hayet; Ioannis Karamouzas; |
157 | Shape Matters: Deformable Patch Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Previous work always assumes patches to have fixed shapes, such as circles or rectangles, and it does not consider the shape of patches as a factor in patch attacks. To explore this issue, we propose a novel Deformable Patch Representation (DPR) that can harness the geometric structure of triangles to support the differentiable mapping between contour modeling and masks. |
Zhaoyu Chen; Bo Li; Shuang Wu; Jianghe Xu; Shouhong Ding; Wenqiang Zhang; |
158 | Frequency Domain Model Augmentation for Adversarial Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by the observation that the transferability of adversarial examples can be improved by attacking diverse models simultaneously, we propose model augmentation methods that simulate different models by using transformed images. |
Yuyang Long; Qilong Zhang; Boheng Zeng; Lianli Gao; Xianglong Liu; Jian Zhang; Jingkuan Song; |
159 | Prior-Guided Adversarial Initialization for Fast Adversarial Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we explore the difference between the training processes of SAT and FAT and observe that the attack success rate of adversarial examples (AEs) of FAT gets worse gradually in the late training stage, resulting in overfitting. |
Xiaojun Jia; Yong Zhang; Xingxing Wei; Baoyuan Wu; Ke Ma; Jue Wang; Xiaochun Cao; |
160 | Enhanced Accuracy and Robustness Via Multi-Teacher Adversarial Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To improve the robust and clean accuracy of small models, we introduce the Multi-Teacher Adversarial Robustness Distillation (MTARD) to guide the adversarial training process of small models. |
Shiji Zhao; Jie Yu; Zhenlong Sun; Bo Zhang; Xingxing Wei; |
161 | LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose transferability from Large Geometric Vicinity (LGV), a new technique to increase the transferability of black-box adversarial attacks. |
Martin Gubri; Maxime Cordy; Mike Papadakis; Yves Le Traon; Koushik Sen; |
162 | A Large-Scale Multiple-Objective Method for Black-Box Attack Against Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Most existing attack methods aim to minimize the true positive rate, which often shows poor attack performance, as another sub-optimal bounding box may be detected around the attacked bounding box to be the new true positive one. To settle this challenge, we propose to minimize the true positive rate and maximize the false positive rate, which can encourage more false positive objects to block the generation of new true positive bounding boxes. |
Siyuan Liang; Longkang Li; Yanbo Fan; Xiaojun Jia; Jingzhi Li; Baoyuan Wu; Xiaochun Cao; |
163 | GradAuto: Energy-Oriented Attack on Dynamic Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we investigate the robustness of dynamic neural networks against energy-oriented attacks. |
Jianhong Pan; Qichen Zheng; Zhipeng Fan; Hossein Rahmani; Qiuhong Ke; Jun Liu; |
164 | A Spectral View of Randomized Smoothing Under Common Corruptions: Benchmarking and Improving Certified Robustness Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we explore a new problem setting to critically examine how the adversarial robustness guarantees change when state-of-the-art randomized smoothing-based certifications encounter common corruptions of the test data. |
Jiachen Sun; Akshay Mehra; Bhavya Kailkhura; Pin-Yu Chen; Dan Hendrycks; Jihun Hamm; Z. Morley Mao; |
165 | Improving Adversarial Robustness of 3D Point Cloud Classification Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we design two innovative methodologies to improve the adversarial robustness of 3D point cloud classification models. |
Guanlin Li; Guowen Xu; Han Qiu; Ruan He; Jiwei Li; Tianwei Zhang; |
166 | Learning Extremely Lightweight and Robust Model with Differentiable Constraints on Sparsity and Condition Number Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a framework for building extremely lightweight models, which combines tensor product with the differentiable constraints for reducing condition number and promoting sparsity. |
Xian Wei; Yangyu Xu; Yanhui Huang; Hairong Lv; Hai Lan; Mingsong Chen; Xuan Tang; |
167 | RIBAC: Towards Robust and Imperceptible Backdoor Attack Against Compact DNN Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to study and develop Robust and Imperceptible Backdoor Attack against Compact DNN models (RIBAC). |
Huy Phan; Cong Shi; Yi Xie; Tianfang Zhang; Zhuohang Li; Tianming Zhao; Jian Liu; Yan Wang; Yingying Chen; Bo Yuan; |
168 | Boosting Transferability of Targeted Adversarial Examples Via Hierarchical Generative Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we develop a simple yet effective framework to craft targeted transfer-based adversarial examples, applying a hierarchical generative network. |
Xiao Yang; Yinpeng Dong; Tianyu Pang; Hang Su; Jun Zhu; |
169 | Adaptive Image Transformations for Transfer-Based Adversarial Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel architecture, called Adaptive Image Transformation Learner (AITL), which incorporates different image transformation operations into a unified framework to further improve the transferability of adversarial examples. |
Zheng Yuan; Jie Zhang; Shiguang Shan; |
170 | Generative Multiplane Images: Making A 2D GAN 3D-Aware Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: What is really needed to make an existing 2D GAN 3D-aware? To answer this question, we modify a classical GAN, i.e., StyleGANv2, as little as possible. |
Xiaoming Zhao; Fangchang Ma; David Güera; Zhile Ren; Alexander G. Schwing; Alex Colburn; |
171 | AdvDO: Realistic Adversarial Attacks for Trajectory Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While many prior works aim to achieve higher prediction accuracy, few study the adversarial robustness of their methods. To bridge this gap, we propose to study the adversarial robustness of data-driven trajectory prediction systems. |
Yulong Cao; Chaowei Xiao; Anima Anandkumar; Danfei Xu; Marco Pavone; |
172 | Adversarial Contrastive Learning Via Asymmetric InfoNCE Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, this mechanism can be potentially flawed, since adversarial perturbations may cause instance-level identity confusion, which can impede CL performance by pulling together different instances with separate identities. To address this issue, we propose to treat adversarial samples unequally when contrasted to positive and negative samples, with an asymmetric InfoNCE objective (A-InfoNCE) that allows discriminating considerations of adversarial samples. |
Qiying Yu; Jieming Lou; Xianyuan Zhan; Qizhang Li; Wangmeng Zuo; Yang Liu; Jingjing Liu; |
173 | One Size Does NOT Fit All: Data-Adaptive Adversarial Training Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we argue that, for the attackable examples, traditional adversarial training which utilizes a fixed size perturbation ball can create adversarial examples that deviate far away from the original class towards the target class. |
Shuo Yang; Chang Xu; |
174 | UniCR: Universally Approximated Certified Robustness Via Randomized Smoothing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In particular, we propose the first universally approximated certified robustness (UniCR) framework, which can approximate the robustness certification of any input on any classifier against any ℓ_p perturbations with noise generated by any continuous probability distribution. |
Hanbin Hong; Binghui Wang; Yuan Hong; |
175 | Hardly Perceptible Trojan Attack Against Neural Networks with Bit Flips Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel attack, namely hardly perceptible Trojan attack (HPT). |
Jiawang Bai; Kuofeng Gao; Dihong Gong; Shu-Tao Xia; Zhifeng Li; Wei Liu; |
176 | Robust Network Architecture Search Via Feature Distortion Restraining Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Robust Network Architecture Search (RNAS) to obtain a robust network against adversarial attacks. |
Yaguan Qian; Shenghui Huang; Bin Wang; Xiang Ling; Xiaohui Guan; Zhaoquan Gu; Shaoning Zeng; Wujie Zhou; Haijiang Wang; |
177 | SecretGen: Privacy Recovery on Pre-trained Models Via Distribution Discrimination Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, it raises extensive concerns on whether these pre-trained models would leak privacy-sensitive information of their training data. Thus, in this work, we aim to answer the following questions: Can we effectively recover private information from these pre-trained models? |
Zhuowen Yuan; Fan Wu; Yunhui Long; Chaowei Xiao; Bo Li; |
178 | Triangle Attack: A Query-Efficient Decision-Based Adversarial Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we find that a benign sample, the current and the next adversarial examples can naturally construct a triangle in a subspace for any iterative attacks. |
Xiaosen Wang; Zeliang Zhang; Kangheng Tong; Dihong Gong; Kun He; Zhifeng Li; Wei Liu; |
179 | Data-Free Backdoor Removal Based on Channel Lipschitzness Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce a novel concept called Channel Lipschitz Constant (CLC), which is defined as the Lipschitz constant of the mapping from the input images to the output of each channel. |
Runkai Zheng; Rongjun Tang; Jianze Li; Li Liu; |
180 | Black-Box Dissector: Towards Erasing-Based Hard-Label Model Stealing Attack Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose a novel hard-label model stealing method termed black-box dissector, which consists of two erasing-based modules. |
Yixu Wang; Jie Li; Hong Liu; Yan Wang; Yongjian Wu; Feiyue Huang; Rongrong Ji; |
181 | Learning Energy-Based Models with Adversarial Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We study a new approach to learning energy-based models (EBMs) based on adversarial training (AT). |
Xuwang Yin; Shiying Li; Gustavo K. Rohde; |
182 | Adversarial Label Poisoning Attack on Graph Neural Networks Via Label Propagation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, we propose a label poisoning attack framework for graph convolutional networks (GCNs), inspired by the equivalence between label propagation and decoupled GCNs that separate message passing from neural networks. |
Ganlin Liu; Xiaowei Huang; Xinping Yi; |
183 | Revisiting Outer Optimization in Adversarial Training Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose an optimization method called ENGM which regularizes the contribution of each input example to the average mini-batch gradients. |
Ali Dabouei; Fariborz Taherkhani; Sobhan Soleymani; Nasser M. Nasrabadi; |
184 | Zero-Shot Attribute Attacks on Fine-Grained Recognition Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Such attacks, in particular universal perturbations that are class-agnostic and should ideally generalize to unseen classes, cannot leverage or capture small distinctions among fine-grained classes. Therefore, we propose a compositional attribute-based framework for generating adversarial attacks on zero-shot fine-grained recognition models. |
Nasim Shafiee; Ehsan Elhamifar; |
185 | Towards Effective and Robust Neural Trojan Defenses Via Input Filtering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Most defense methods still make out-of-date assumptions about Trojan triggers and target classes and thus can be easily circumvented by modern Trojan attacks. To deal with this problem, we propose two novel filtering defenses called Variational Input Filtering (VIF) and Adversarial Input Filtering (AIF), which leverage lossy data compression and adversarial learning respectively to effectively purify all potential Trojan triggers in the input at run time, without making assumptions about the number of triggers/target classes or the input dependence property of triggers. |
Kien Do; Haripriya Harikumar; Hung Le; Dung Nguyen; Truyen Tran; Santu Rana; Dang Nguyen; Willy Susilo; Svetha Venkatesh; |
186 | Scaling Adversarial Training to Large Perturbation Bounds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we aim to achieve adversarial robustness within larger bounds, against perturbations that may be perceptible, but do not change human (or Oracle) prediction. |
Sravanti Addepalli; Samyak Jain; Gaurang Sriramanan; R. Venkatesh Babu; |
187 | Exploiting The Local Parabolic Landscapes of Adversarial Losses to Accelerate Black-Box Adversarial Attack Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose to improve the query efficiency of black-box methods by exploiting the smoothness of the local loss landscape. |
Hoang Tran; Dan Lu; Guannan Zhang; |
188 | Generative Domain Adaptation for Face Anti-Spoofing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, insufficient supervision of unlabeled target domains and neglect of low-level feature alignment degrade the performances of existing methods. To address these issues, we propose a novel perspective of UDA FAS that directly fits the target data to the models, i.e., stylizes the target data to the source-domain style via image translation, and further feeds the stylized data into the well-trained source model for classification. |
Qianyu Zhou; Ke-Yue Zhang; Taiping Yao; Ran Yi; Kekai Sheng; Shouhong Ding; Lizhuang Ma; |
189 | MetaGait: Learning to Learn An Omni Sample Adaptive Representation for Gait Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, gait recognition still suffers from the conflicts between the limited binary visual clues of the silhouette and numerous covariates with diverse scales, which brings challenges to the model’s adaptiveness. In this paper, we address this conflict by developing a novel MetaGait that learns to learn an omni sample adaptive representation. |
Huanzhang Dou; Pengyi Zhang; Wei Su; Yunlong Yu; Xi Li; |
190 | GaitEdge: Beyond Plain End-to-End Gait Recognition for Better Practicality Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel end-to-end framework named GaitEdge which can effectively block gait-irrelevant information and release end-to-end training potential. |
Junhao Liang; Chao Fan; Saihui Hou; Chuanfu Shen; Yongzhen Huang; Shiqi Yu; |
191 | UIA-ViT: Unsupervised Inconsistency-Aware Method Based on Vision Transformer for Face Forgery Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Some existing methods generate large-scale synthesized data with location annotations, which is time-consuming. Others generate forgery location labels by subtracting paired real and fake images, yet such paired data is difficult to collect and the generated labels are usually discontinuous. To overcome these limitations, we propose a novel Unsupervised Inconsistency-Aware method based on Vision Transformer, called UIA-ViT. |
Wanyi Zhuang; Qi Chu; Zhentao Tan; Qiankun Liu; Haojie Yuan; Changtao Miao; Zixiang Luo; Nenghai Yu; |
192 | Effective Presentation Attack Detection Driven By Face Related Task Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unlike this specific PAD task, other face related tasks trained on huge amounts of real faces (e.g. face recognition and attribute editing) can be effectively adopted in different application scenarios. Inspired by this, we propose to swap the positions of PAD and face related tasks in a face system and apply the freely acquired prior knowledge from face related tasks to solve face PAD, so as to improve the generalization ability in detecting PAs. |
Wentian Zhang; Haozhe Liu; Feng Liu; Raghavendra Ramachandra; Christoph Busch; |
193 | PPT: Token-Pruned Pose Transformer for Monocular and Multi-View Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the token-Pruned Pose Transformer (PPT) for 2D human pose estimation, which can locate a rough human mask and performs self-attention only within selected tokens. |
Haoyu Ma; Zhe Wang; Yifei Chen; Deying Kong; Liangjian Chen; Xingwei Liu; Xiangyi Yan; Hao Tang; Xiaohui Xie; |
194 | AvatarPoser: Articulated Full-Body Pose Tracking from Sparse Motion Sensing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present AvatarPoser, the first learning-based method that predicts full-body poses in world coordinates using only motion input from the user’s head and hands. |
Jiaxi Jiang; Paul Streli; Huajian Qiu; Andreas Fender; Larissa Laich; Patrick Snape; Christian Holz; |
195 | P-STMO: Pre-trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper introduces a novel Pre-trained Spatial Temporal Many-to-One (P-STMO) model for 2D-to-3D human pose estimation task. |
Wenkang Shan; Zhenhua Liu; Xinfeng Zhang; Shanshe Wang; Siwei Ma; Wen Gao; |
196 | D&D: Learning Human Dynamics from Dynamic Camera Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present D&D (Learning Human Dynamics from Dynamic Camera), which leverages the laws of physics to reconstruct 3D human motion from in-the-wild videos with a moving camera. |
Jiefeng Li; Siyuan Bian; Chao Xu; Gang Liu; Gang Yu; Cewu Lu; |
197 | Explicit Occlusion Reasoning for Multi-Person 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by the remarkable ability of humans to infer occluded joints from visible cues, we develop a method to explicitly model this process that significantly improves bottom-up multi-person human pose estimation with or without occlusions. |
Qihao Liu; Yi Zhang; Song Bai; Alan Yuille; |
198 | COUCH: Towards Controllable Human-Chair Interactions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing works on synthesizing human scene interaction focus on the high-level control of interacting with a particular object without considering fine-grained control of limb motion variations within one task. In this work, we drive this direction and study the problem of synthesizing scene interactions conditioned on a wide range of contact positions on the object. |
Xiaohan Zhang; Bharat Lal Bhatnagar; Sebastian Starke; Vladimir Guzov; Gerard Pons-Moll; |
199 | Identity-Aware Hand Mesh Estimation and Personalization from RGB Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an identity-aware hand mesh estimation model, which can incorporate the identity information represented by the intrinsic shape parameters of the subject. |
Deying Kong; Linguang Zhang; Liangjian Chen; Haoyu Ma; Xiangyi Yan; Shanlin Sun; Xingwei Liu; Kun Han; Xiaohui Xie; |
200 | C3P: Cross-Domain Pose Prior Propagation for Weakly Supervised 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose to transfer 2D HPE annotation information within the existing large-scale RGB datasets (e.g., MS COCO) to 3D task, using unlabelled RGB-point cloud sequence easy to acquire for linking 2D and 3D domains. |
Cunlin Wu; Yang Xiao; Boshen Zhang; Mingyang Zhang; Zhiguo Cao; Joey Tianyi Zhou; |
201 | Pose-NDF: Modeling Human Pose Manifolds with Neural Distance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present Pose-NDF, a continuous model for plausible human poses based on neural distance fields (NDFs). |
Garvita Tiwari; Dimitrije Antić; Jan Eric Lenssen; Nikolaos Sarafianos; Tony Tung; Gerard Pons-Moll; |
202 | CLIFF: Carrying Location Information in Full Frames Into Human Pose and Shape Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, cropping, their first step, discards the location information from the very beginning, which makes themselves unable to accurately predict the global rotation in the original camera coordinate system. To address this problem, we propose to Carry Location Information in Full Frames (CLIFF) into this task. |
Zhihao Li; Jianzhuang Liu; Zhensong Zhang; Songcen Xu; Youliang Yan; |
203 | DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a simple baseline framework for video-based 2D/3D human pose estimation that can achieve 10 times efficiency improvement over existing works without any performance degradation, named DeciWatch. |
Ailing Zeng; Xuan Ju; Lei Yang; Ruiyuan Gao; Xizhou Zhu; Bo Dai; Qiang Xu; |
204 | SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast, for rarely seen or occluded actions, the estimated positions of multiple joints largely deviate from the ground truth values for a consecutive sequence of frames, rendering significant jitters on them. To tackle this problem, we propose to attach a dedicated temporal-only refinement network to existing pose estimators for jitter mitigation, named SmoothNet. |
Ailing Zeng; Lei Yang; Xuan Ju; Jiefeng Li; Jianyi Wang; Qiang Xu; |
205 | PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a simple yet effective data augmentation method, termed Pose Transformation (PoseTrans), to alleviate the aforementioned problems. |
Wentao Jiang; Sheng Jin; Wentao Liu; Chen Qian; Ping Luo; Si Liu; |
206 | Multi-Person 3D Pose and Shape Estimation Via Inverse Kinematics and Refinement Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To tackle the challenges, we propose a coarse-to-fine pipeline that benefits from 1) inverse kinematics from the occlusion-robust 3D skeleton estimation and 2) transformer-based relation-aware refinement techniques. |
Junuk Cha; Muhammad Saqlain; GeonU Kim; Mingyu Shin; Seungryul Baek; |
207 | Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a new prediction pattern, which introduces previously overlooked human poses, to implement the prediction task from the view of interpolation. |
Xiaoning Sun; Qiongjie Cui; Huaijiang Sun; Bin Li; Weiqing Li; Jianfeng Lu; |
208 | Structural Triangulation: A Closed-Form Solution to Constrained 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose Structural Triangulation, a closed-form solution for optimal 3D human pose considering multi-view 2D pose estimations, calibrated camera parameters, and bone lengths. |
Zhuo Chen; Xu Zhao; Xiaoyue Wan; |
209 | Audio-Driven Stylized Gesture Generation with Flow-Based Model Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new end-to-end flow-based model, which can generate audio-driven gestures of arbitrary styles without the preprocessing procedure and style labels. |
Sheng Ye; Yu-Hui Wen; Yanan Sun; Ying He; Ziyang Zhang; Yaoyuan Wang; Weihua He; Yong-Jin Liu; |
210 | Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we develop a self-constrained prediction-verification network to characterize and learn the structural correlation between keypoints during training. |
Zhehan Kan; Shuoshuo Chen; Zeng Li; Zhihai He; |
211 | UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present UnrealEgo, a new large-scale naturalistic dataset for egocentric 3D human pose estimation.We next generate a large corpus of human motions. |
Hiroyasu Akada; Jian Wang; Soshi Shimada; Masaki Takahashi; Christian Theobalt; Vladislav Golyanik; |
212 | Skeleton-Parted Graph Scattering Networks for 3D Human Motion Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the first issue, we propose adaptive graph scattering, which leverages multiple trainable band-pass graph filters to decompose pose features into various graph spectrum bands to provide richer information, promoting more comprehensive feature extraction. To address the second issue, body parts are modeled separately to learn diverse dynamics, which enables finer feature extraction along the spatial dimensions. Integrating the above two designs, we propose a novel skeleton-parted graph scattering network (SPGSN). |
Maosen Li; Siheng Chen; Zijing Zhang; Lingxi Xie; Qi Tian; Ya Zhang; |
213 | Rethinking Keypoint Representations: Modeling Keypoints and Poses As Objects for Multi-Person Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated to find a more efficient solution, we propose to model individual keypoints and sets of spatially related keypoints (i.e., poses) as objects within a dense single-stage anchor-based detection framework. |
William McNally; Kanav Vats; Alexander Wong; John McPhee; |
214 | VirtualPose: Learning Generalizable 3D Human Pose Models from Virtual Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we perform a systematic evaluation of the existing methods and find that they get notably larger errors when tested on different cameras, human poses and appearance. To address the problem, we introduce VirtualPose, a two-stage learning framework to exploit the hidden free lunch specific to this task, i.e. generating an infinite number of poses and cameras for training models at no cost. |
Jiajun Su; Chunyu Wang; Xiaoxuan Ma; Wenjun Zeng; Yizhou Wang; |
215 | Poseur: Direct Human Pose Regression with Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a direct, regression-based approach to 2D human pose estimation from single images. |
Weian Mao; Yongtao Ge; Chunhua Shen; Zhi Tian; Xinlong Wang; Zhibin Wang; Anton van den Hengel; |
216 | SimCC: A Simple Coordinate Classification Perspective for Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the long-standing quantization error problem in the 2D heatmap-based methods leads to several well-known drawbacks: 1) The performance for low-resolution inputs is limited; 2) To improve the feature map resolution for higher localization precision, multiple costly upsampling layers are required; 3) Extra post-processing is adopted to reduce the quantization error. To address these issues, we aim to explore a brand new scheme, called SimCC, which reformulates HPE as two classification tasks for horizontal and vertical coordinates. |
Yanjie Li; Sen Yang; Peidong Liu; Shoukui Zhang; Yunxiao Wang; Zhicheng Wang; Wankou Yang; Shu-Tao Xia; |
217 | Regularizing Vector Embedding in Bottom-Up Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We observe that the different dimensions of embeddings are highly linearly correlated. To address this issue, we impose an additional constraint on the embeddings during the training phase. |
Haixin Wang; Lu Zhou; Yingying Chen; Ming Tang; Jinqiao Wang; |
218 | A Visual Navigation Perspective for Category-Level Object Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, convergence and efficiency are two challenges of this inference procedure. In this paper, we take a deeper look at the inference of analysis-by-synthesis from the perspective of visual navigation, and investigate what is a good navigation policy for this specific task. |
Jiaxin Guo; Fangxun Zhong; Rong Xiong; Yun-Hui Liu; Yue Wang; Yiyi Liao; |
219 | Faster VoxelPose: Real-Time 3D Human Pose Estimation By Orthographic Projection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While the voxel-based methods have achieved promising results for multi-person 3D pose estimation from multi-cameras, they suffer from heavy computation burdens, especially for large scenes. We present Faster VoxelPose to address the challenge by re-projecting the feature volume to the three two-dimensional coordinate planes and estimating X, Y, Z coordinates from them separately. |
Hang Ye; Wentao Zhu; Chunyu Wang; Rujie Wu; Yizhou Wang; |
220 | Learning to Fit Morphable Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we build upon recent advances in learned optimization and propose an update rule inspired by the classic Levenberg-Marquardt algorithm. |
Vasileios Choutas; Federica Bogo; Jingjing Shen; Julien Valentin; |
221 | EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing datasets are limited in terms of either size, capture/annotation modalities, ground-truth quality, or interaction diversity. We fill this gap by proposing EgoBody, a novel large-scale dataset for human pose, shape and motion estimation from egocentric views, during interactions in complex 3D scenes. |
Siwei Zhang; Qianli Ma; Yan Zhang; Zhiyin Qian; Taein Kwon; Marc Pollefeys; Federica Bogo; Siyu Tang; |
222 | Grasp’D: Differentiable Contact-Rich Grasp Synthesis for Multi-Fingered Hands Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents Grasp’D, an approach to grasp synthesis by differentiable contact simulation that can work with both known models and visual inputs. |
Dylan Turpin; Liquan Wang; Eric Heiden; Yun-Chun Chen; Miles Macklin; Stavros Tsogkas; Sven Dickinson; Animesh Garg; |
223 | AutoAvatar: Autoregressive Neural Fields for Dynamic Avatar Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Neural fields such as implicit surfaces have recently enabled avatar modeling from raw scans without explicit temporal correspondences. In this work, we exploit autoregressive modeling to further extend this notion to capture dynamic effects, such as soft-tissue deformations. |
Ziqian Bai; Timur Bagautdinov; Javier Romero; Michael Zollhöfer; Ping Tan; Shunsuke Saito; |
224 | Deep Radial Embedding for Visual Sequence Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this study, we propose an objective function named RadialCTC that constrains sequence features on a hypersphere while retaining the iterative alignment mechanism of CTC. |
Yuecong Min; Peiqi Jiao; Yanan Li; Xiaotao Wang; Lei Lei; Xiujuan Chai; Xilin Chen; |
225 | SAGA: Stochastic Whole-Body Grasping with Contact Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, we propose a multi-task generative model, to jointly learn static whole-body grasping poses and human-object contacts. |
Yan Wu; Jiahao Wang; Yan Zhang; Siwei Zhang; Otmar Hilliges; Fisher Yu; Siyu Tang; |
226 | Neural Capture of Animatable 3D Human from Monocular Video Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel paradigm of building an animatable 3D human representation from a monocular video input, such that it can be rendered in any unseen poses and views. |
Gusi Te; Xiu Li; Xiao Li; Jinglu Wang; Wei Hu; Yan Lu; |
227 | General Object Pose Transformation Network from Unpaired Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we address the novel problem of general object pose transformation from unpaired data. |
Yukun Su; Guosheng Lin; Ruizhou Sun; Qingyao Wu; |
228 | Compositional Human-Scene Interaction Synthesis with Semantic Control Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our goal is to synthesize humans interacting with a given 3D scene controlled by high-level semantic specifications as pairs of action categories and object instances, e.g., “sit on the chair”. |
Kaifeng Zhao; Shaofei Wang; Yan Zhang; Thabo Beeler; Siyu Tang; |
229 | PressureVision: Estimating Hand Pressure from A Single RGB Image Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We explore the possibility of using a conventional RGB camera to infer hand pressure, enabling machine perception of hand pressure from uninstrumented hands and surfaces. |
Patrick Grady; Chengcheng Tang; Samarth Brahmbhatt; Christopher D. Twigg; Chengde Wan; James Hays; Charles C. Kemp; |
230 | PoseScript: 3D Human Poses from Natural Language Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce the PoseScript dataset, which pairs a few thousand 3D human poses from AMASS with rich human-annotated descriptions of the body parts and their spatial relationships. |
Ginger Delmas; Philippe Weinzaepfel; Thomas Lucas; Francesc Moreno-Noguer; Grégory Rogez; |
231 | DProST: Dynamic Projective Spatial Transformer Network for 6D Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, projective geometry in the camera space is not considered in those methods and causes performance degradation. In this regard, we propose a new pose estimation system based on a projective grid instead of object vertices. |
Jaewoo Park; Nam Ik Cho; |
232 | 3D Interacting Hand Pose Estimation By Hand De-Occlusion and Removal Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To tackle these two challenges, we propose a novel Hand De-occlusion and Removal (HDR) framework to perform hand de-occlusion and distractor removal. We also propose the first large-scale synthetic amodal hand dataset, termed Amodal InterHand Dataset (AIH), to facilitate model training and promote the development of the related research. |
Hao Meng; Sheng Jin; Wentao Liu; Chen Qian; Mengxiang Lin; Wanli Ouyang; Ping Luo; |
233 | Pose for Everything: Towards Category-Agnostic Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition. |
Lumin Xu; Sheng Jin; Wang Zeng; Wentao Liu; Chen Qian; Wanli Ouyang; Ping Luo; Xiaogang Wang; |
234 | PoseGPT: Quantization-Based 3D Human Motion Generation and Forecasting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast, we generate motion conditioned on observations of arbitrary length, including none. To solve this generalized problem, we propose PoseGPT, an auto-regressive transformer-based approach which internally compresses human motion into quantized latent sequences. |
Thomas Lucas; Fabien Baradel; Philippe Weinzaepfel; Grégory Rogez; |
235 | DH-AUG: DH Forward Kinematics Model Driven Augmentation for 3D Human Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Due to the lack of diversity of datasets, the generalization ability of the pose estimator is poor. To solve this problem, we propose a pose augmentation solution via a DH forward kinematics model, which we call DH-AUG. |
Linzhi Huang; Jiahao Liang; Weihong Deng; |
236 | Estimating Spatially-Varying Lighting in Urban Scenes with Disentangled Representation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present an end-to-end network for spatially-varying outdoor lighting estimation in urban scenes given a single limited field-of-view LDR image and any assigned 2D pixel position. |
Jiajun Tang; Yongjie Zhu; Haoyu Wang; Jun Hoong Chan; Si Li; Boxin Shi; |
237 | Boosting Event Stream Super-Resolution with A Recurrent Neural Network Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing methods for event stream super-resolution (SR) either require high-quality and high-resolution frames or underperform for large factor SR. To address these problems, we propose a recurrent neural network for event SR without frames. |
Wenming Weng; Yueyi Zhang; Zhiwei Xiong; |
238 | Projective Parallel Single-Pixel Imaging to Overcome Global Illumination in 3D Structure Light Scanning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present projective parallel single-pixel imaging (pPSI), wherein the 4D LTCs are reduced to multiple projection functions to facilitate a highly efficient data capture process. |
Yuxi Li; Huijie Zhao; Hongzhi Jiang; Xudong Li; |
239 | Semantic-Sparse Colorization Network for Deep Exemplar-Based Colorization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Previous approaches have attempted to construct such a correspondence but are faced with two obstacles. First, using luminance channels for the calculation of correspondence is inaccurate. Second, the dense correspondence they built introduces wrong matching results and increases the computation burden. To address these two problems, we propose Semantic-Sparse Colorization Network (SSCN) to transfer both the global image style and detailed semantic-related colors to the gray-scale image in a coarse-to-fine manner. |
Yunpeng Bai; Chao Dong; Zenghao Chai; Andong Wang; Zhengzhuo Xu; Chun Yuan; |
240 | Practical and Scalable Desktop-Based High-Quality Facial Capture Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel desktop-based system for high-quality facial capture including geometry and facial appearance. We additionally present a novel set of binary illumination patterns for efficient acquisition of reflectance and photometric normals using our setup, with diffuse-specular separation. |
Alexandros Lattas; Yiming Lin; Jayanth Kannan; Ekin Ozturk; Luca Filipi; Giuseppe Claudio Guarnera; Gaurav Chawla; Abhijeet Ghosh; |
241 | FAST-VQA: Efficient End-to-End Video Quality Assessment with Fragment Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Grid Mini-patch Sampling (GMS), which allows consideration of local quality by sampling patches at their raw resolution and covers global quality with contextual relations via mini-patches sampled in uniform grids. |
Haoning Wu; Chaofeng Chen; Jingwen Hou; Liang Liao; Annan Wang; Wenxiu Sun; Qiong Yan; Weisi Lin; |
242 | Physically-Based Editing of Indoor Scene Lighting from A Single Image Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a method to edit complex indoor lighting from a single image with its predicted depth and light source segmentation masks. |
Zhengqin Li; Jia Shi; Sai Bi; Rui Zhu; Kalyan Sunkavalli; Miloš Hašan; Zexiang Xu; Ravi Ramamoorthi; Manmohan Chandraker; |
243 | LEDNet: Joint Low-Light Enhancement and Deblurring in The Dark Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Training an end-to-end network is also infeasible as no paired data is available to characterize the coexistence of low light and blurs. We address the problem by introducing a novel data synthesis pipeline that models realistic low-light blurring degradations, especially for blurs in saturated regions, e.g., light streaks, that often appear in the night images. |
Shangchen Zhou; Chongyi Li; Chen Change Loy; |
244 | MPIB: An MPI-Based Bokeh Rendering Framework for Realistic Partial Occlusion Effects Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on this analysis, we propose an MPI representation module combined with a background inpainting module to implement high-resolution scene representation. |
Juewen Peng; Jianming Zhang; Xianrui Luo; Hao Lu; Ke Xian; Zhiguo Cao; |
245 | Real-RawVSR: Real-World Raw Video Super-Resolution with A Benchmark Dataset Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Considering the superiority of raw image SR over sRGB image SR, we construct a real-world raw video SR (Real-RawVSR) dataset and propose a corresponding SR method. |
Huanjing Yue; Zhiming Zhang; Jingyu Yang; |
246 | Transform Your Smartphone Into A DSLR Camera: Learning The ISP in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a trainable Image Signal Processing (ISP) framework that produces DSLR quality images given RAW images captured by a smartphone. |
Ardhendu Shekhar Tripathi; Martin Danelljan; Samarth Shukla; Radu Timofte; Luc Van Gool; |
247 | Learning Deep Non-Blind Image Deconvolution Without Ground Truths Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes an unsupervised deep learning approach for NBID which avoids accessing GT images. |
Yuhui Quan; Zhuojie Chen; Huan Zheng; Hui Ji; |
248 | NEST: Neural Event Stack for Event-Based Image Enhancement Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a novel event representation named Neural Event STack (NEST), which satisfies physical constraints and encodes comprehensive motion and temporal information sufficient for image enhancement. |
Minggui Teng; Chu Zhou; Hanyue Lou; Boxin Shi; |
249 | Editable Indoor Lighting Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a method for estimating lighting from a single perspective image of an indoor scene. |
Henrique Weber; Mathieu Garon; Jean-François Lalonde; |
250 | Fast Two-Step Blind Optical Aberration Correction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a two-step scheme to correct optical aberrations in a single raw or JPEG image, i.e., without any prior information on the camera or lens. |
Thomas Eboli; Jean-Michel Morel; Gabriele Facciolo; |
251 | Seeing Far in The Dark with Patterned Flash Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new flash technique, named “patterned flash”, for flash imaging at a long distance. |
Zhanghao Sun; Jian Wang; Yicheng Wu; Shree Nayar; |
252 | PseudoClick: Interactive Image Segmentation with Click Imitation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose PseudoClick, a generic framework that enables existing segmentation networks to propose candidate next clicks. |
Qin Liu; Meng Zheng; Benjamin Planche; Srikrishna Karanam; Terrence Chen; Marc Niethammer; Ziyan Wu; |
253 | CT$^2$: Colorization Transformer Via Color Tokens Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Automatic image colorization is an ill-posed problem with multi-modal uncertainty, and there remain two main challenges with previous methods: incorrect semantic colors and under-saturation. In this paper, we propose an end-to-end transformer-based model to overcome these challenges. |
Shuchen Weng; Jimeng Sun; Yu Li; Si Li; Boxin Shi; |
254 | Simple Baselines for Image Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a simple baseline that exceeds the SOTA methods and is computationally efficient. |
Liangyu Chen; Xiaojie Chu; Xiangyu Zhang; Jian Sun; |
255 | Spike Transformer: Monocular Depth Estimation for Spiking Camera Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we focus on the depth estimation task, which is challenging due to the natural properties of spike streams, such as irregularity, continuity, and spatial-temporal correlation, and has not been explored for the spiking camera. Furthermore, we build two spike-based depth datasets. |
Jiyuan Zhang; Lulu Tang; Zhaofei Yu; Jiwen Lu; Tiejun Huang; |
256 | Improving Image Restoration By Revisiting Global Information Aggregation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To reduce the inconsistency and improve test-time performance, we propose a simple method called Test-time Local Converter (TLC). |
Xiaojie Chu; Liangyu Chen; Chengpeng Chen; Xin Lu; |
257 | Data Association Between Event Streams and Intensity Frames Under Diverse Baselines Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a learning-based framework to associate event streams and intensity frames under diverse camera baselines, to simultaneously benefit camera pose estimation under large baselines and depth estimation under small baselines. |
Dehao Zhang; Qiankun Ding; Peiqi Duan; Chu Zhou; Boxin Shi; |
258 | D2HNet: Joint Denoising and Deblurring with Hierarchical Network for Robust Night Image Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To exploit the information from successive long- and short-exposure images, we propose a learning-based pipeline to fuse them. |
Yuzhi Zhao; Yongzhe Xu; Qiong Yan; Dingdong Yang; Xuehui Wang; Lai-Man Po; |
259 | Learning Graph Neural Networks for Image Style Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study a novel semi-parametric neural style transfer framework that alleviates the deficiency of both parametric and non-parametric stylization. |
Yongcheng Jing; Yining Mao; Yiding Yang; Yibing Zhan; Mingli Song; Xinchao Wang; Dacheng Tao; |
260 | DeepPS2: Revisiting Photometric Stereo Using Two Differently Illuminated Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we attempt to address an under-explored problem of photometric stereo using just two differently illuminated images, referred to as the PS2 problem. |
Ashish Tiwari; Shanmuganathan Raman; |
261 | Instance Contour Adjustment Via Structure-Driven CNN Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Since off-the-shelf image editing methods ignore these requirements, they are unsuited here. Therefore, we propose a specialized two-stage method. |
Shuchen Weng; Yi Wei; Ming-Ching Chang; Boxin Shi; |
262 | Synthesizing Light Field Video from Monocular Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Hence, we propose a self-supervised learning-based algorithm for LF video reconstruction from monocular videos. |
Shrisudhan Govindarajan; Prasan Shedligeri; Sarah; Kaushik Mitra; |
263 | Human-Centric Image Cropping with Partition-Aware and Content-Preserving Features Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we consider a specific and practical application: human-centric image cropping, which focuses on the depiction of a person. |
Bo Zhang; Li Niu; Xing Zhao; Liqing Zhang; |
264 | DeMFI: Deep Joint Deblurring and Multi-Frame Interpolation with Flow-Guided Attentive Correlation and Recursive Boosting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel joint deblurring and multi-frame interpolation (DeMFI) framework in a two-stage manner, called DeMFI-Net, which converts blurry videos of lower-frame-rate to sharp videos at higher-frame-rate based on a flow-guided attentive-correlation-based feature bolstering (FAC-FB) module and recursive boosting (RB), in terms of multi-frame interpolation (MFI). |
Jihyong Oh; Munchurl Kim; |
265 | Neural Image Representations for Multi-Image Fusion and Layer Separation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a framework for aligning and fusing multiple images into a single view using neural image representations (NIRs), also known as implicit or coordinate-based neural representations. |
Seonghyeon Nam; Marcus A. Brubaker; Michael S. Brown; |
266 | Bringing Rolling Shutter Images Alive with Dual Reversed Distortion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, since RS distortion is coupled with other factors such as readout settings and the relative velocity of scene elements to the camera, models that only exploit the geometric correlation between temporally adjacent images suffer from poor generality in processing data with different readout settings and dynamic scenes with both camera motion and object motion. In this paper, instead of two consecutive frames, we propose to exploit a pair of images captured by dual RS cameras with reversed RS directions for this highly challenging task. |
Zhihang Zhong; Mingdeng Cao; Xiao Sun; Zhirong Wu; Zhongyi Zhou; Yinqiang Zheng; Stephen Lin; Imari Sato; |
267 | FILM: Frame Interpolation for Large Motion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a frame interpolation algorithm that synthesizes an engaging slow-motion video from near-duplicate photos which often exhibit large scene motion. |
Fitsum Reda; Janne Kontkanen; Eric Tabellion; Deqing Sun; Caroline Pantofaru; Brian Curless; |
268 | Video Interpolation By Event-Driven Anisotropic Adjustment of Optical Flow Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an end-to-end training method A^2OF for video frame interpolation with event-driven Anisotropic Adjustment of Optical Flows. |
Song Wu; Kaichao You; Weihua He; Chen Yang; Yang Tian; Yaoyuan Wang; Ziyang Zhang; Jianxing Liao; |
269 | EvAC3D: From Event-Based Apparent Contours to 3D Models Via Continuous Visual Hulls Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study the problem of 3D reconstruction from event cameras, motivated by the advantages of event-based cameras in terms of low power and latency, as well as by the biological evidence that eyes in nature capture the same data and still perceive 3D shape well. |
Ziyun Wang; Kenneth Chaney; Kostas Daniilidis; |
270 | DCCF: Deep Comprehensible Color Filter Learning Framework for High-Resolution Image Harmonization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel Deep Comprehensible Color Filter (DCCF) learning framework for high-resolution image harmonization. |
Ben Xue; Shenghui Ran; Quan Chen; Rongfei Jia; Binqiang Zhao; Xing Tang; |
271 | SelectionConv: Convolutional Neural Networks for Non-Rectilinear Image Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Such data are usually processed using networks and algorithms specialized for each type. In this work, we show that it may not always be necessary to use specialized neural networks to operate on such spaces. |
David Hart; Michael Whitney; Bryan Morse; |
272 | Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the model size and computational cost limit the ability of their models on edge devices and higher-resolution images. In this paper, we propose a spatial-separated curve rendering network (S2CRNet), a novel framework to prove that the simple global editing can effectively address this task as well as the challenge of high-resolution image harmonization for the first time. |
Jingtang Liang; Xiaodong Cun; Chi-Man Pun; Jue Wang; |
273 | BigColor: Colorization Using A Generative Color Prior for Natural Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose BigColor, a novel colorization approach that provides vivid colorization for diverse in-the-wild images with complex structures. |
Geonung Kim; Kyoungkook Kang; Seongtae Kim; Hwayoon Lee; Sehoon Kim; Jonghyun Kim; Seung-Hwan Baek; Sunghyun Cho; |
274 | CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, to achieve high average bit-reduction with less accuracy loss, we propose a novel Content-Aware Dynamic Quantization (CADyQ) method for SR networks that allocates optimal bits to local regions and layers adaptively based on the local contents of an input image. |
Cheeun Hong; Sungyong Baik; Heewon Kim; Seungjun Nah; Kyoung Mu Lee; |
275 | Deep Semantic Statistics Matching (D2SM) Denoising Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce the Deep Semantic Statistics Matching (D2SM) Denoising Network. |
Kangfu Mei; Vishal M. Patel; Rui Huang; |
276 | 3D Scene Inference from Transient Histograms Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose low-cost and low-power imaging modalities that capture scene information from minimal time-resolved image sensors with as few as one pixel. |
Sacha Jungerman; Atul Ingle; Yin Li; Mohit Gupta; |
277 | Neural Space-Filling Curves Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present Neural Space-filling Curves (SFCs), a data-driven approach to infer a context-based scan order for a set of images. |
Hanyu Wang; Kamal Gupta; Larry Davis; Abhinav Shrivastava; |
278 | Exposure-Aware Dynamic Weighted Learning for Single-Shot HDR Imaging Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel single-shot high dynamic range (HDR) imaging algorithm based on exposure-aware dynamic weighted learning, which reconstructs an HDR image from a spatially varying exposure (SVE) raw image. |
An Gia Vien; Chul Lee; |
279 | Seeing Through A Black Box: Toward High-Quality Terahertz Imaging Via Subspace-and-Attention Guided Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address the problem, we propose a novel Subspace-and-Attention-guided Restoration Network (SARNet) that fuses multi-spectral features of a THz image for effective restoration. |
Weng-Tai Su; Yi-Chun Hung; Po-Jen Yu; Shang-Hua Yang; Chia-Wen Lin; |
280 | Tomography of Turbulence Strength Based on Scintillation Imaging Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: As far as we know, this work is the first to propose reconstruction of a TS horizontal field, using passive optical scintillation measurements. |
Nir Shaul; Yoav Y. Schechner; |
281 | Realistic Blur Synthesis for Learning Image Deblurring Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we present RSBlur, a novel dataset with real blurred images and the corresponding sharp image sequences to enable a detailed analysis of the difference between real and synthetic blur. |
Jaesung Rim; Geonung Kim; Jungeon Kim; Junyong Lee; Seungyong Lee; Sunghyun Cho; |
282 | Learning Phase Mask for Privacy-Preserving Passive Depth Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The key question we address is: Can cameras be enhanced with a scalable solution to preserve users’ privacy without degrading their machine intelligence capabilities? |
Zaid Tasneem; Giovanni Milione; Yi-Hsuan Tsai; Xiang Yu; Ashok Veeraraghavan; Manmohan Chandraker; Francesco Pittaluga; |
283 | LWGNet – Learned Wirtinger Gradients for Fourier Ptychographic Phase Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a hybrid model-driven residual network that combines the knowledge of the forward imaging system with a deep data-driven network. |
Atreyee Saha; Salman S. Khan; Sagar Sehrawat; Sanjana S. Prabhu; Shanti Bhattacharya; Kaushik Mitra; |
284 | PANDORA: Polarization-Aided Neural Decomposition of Radiance Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose PANDORA, a polarimetric inverse rendering approach based on implicit neural representations. |
Akshat Dave; Yongyi Zhao; Ashok Veeraraghavan; |
285 | HuMMan: Multi-modal 4D Human Dataset for Versatile Sensing and Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we contribute HuMMan, a large-scale multi-modal 4D human dataset with 1000 human subjects, 400k sequences and 60M frames. |
Zhongang Cai; Daxuan Ren; Ailing Zeng; Zhengyu Lin; Tao Yu; Wenjia Wang; Xiangyu Fan; Yang Gao; Yifan Yu; Liang Pan; Fangzhou Hong; Mingyuan Zhang; Chen Change Loy; Lei Yang; Ziwei Liu; |
286 | DVS-Voltmeter: Stochastic Process-Based Event Simulator for Dynamic Vision Sensors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an event simulator, dubbed DVS-Voltmeter, to enable high-performance deep networks for DVS applications. |
Songnan Lin; Ye Ma; Zhenhua Guo; Bihan Wen; |
287 | Benchmarking Omni-Vision Representation Through The Lens of Visual Realms Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Omni-Realm Benchmark (OmniBenchmark) that enables systematically measuring the generalization ability across a wide range of visual realms. |
Yuanhan Zhang; Zhenfei Yin; Jing Shao; Ziwei Liu; |
288 | BEAT: A Large-Scale Semantic and Emotional Multi-modal Dataset for Conversational Gestures Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on this observation, we propose a baseline model, Cascaded Motion Network (CaMN), which consists of above six modalities modeled in a cascaded architecture for gesture synthesis. |
Haiyang Liu; Zihao Zhu; Naoya Iwamoto; Yichen Peng; Zhengqing Li; You Zhou; Elif Bozkurt; Bo Zheng; |
289 | Neuromorphic Data Augmentation for Training Spiking Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This issue remains unexplored by previous academic works. In an effort to minimize this generalization gap, we propose Neuromorphic Data Augmentation (NDA), a family of geometric augmentations specifically designed for event-based datasets with the goal of significantly stabilizing SNN training and reducing the generalization gap between training and test performance. |
Yuhang Li; Youngeun Kim; Hyoungseob Park; Tamar Geller; Priyadarshini Panda; |
290 | CelebV-HQ: A Large-Scale Video Facial Attributes Dataset Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a large-scale, high-quality, and diverse video dataset, named the High-Quality Celebrity Video Dataset (CelebV-HQ), with rich facial attribute annotations. |
Hao Zhu; Wayne Wu; Wentao Zhu; Liming Jiang; Siwei Tang; Li Zhang; Ziwei Liu; Chen Change Loy; |
291 | MovieCuts: A New Dataset and Benchmark for Cut Type Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper introduces the Cut type recognition task, which requires modeling multi-modal information. |
Alejandro Pardo; Fabian Caba; Juan León Alcázar; Ali Thabet; Bernard Ghanem; |
292 | LaMAR: Benchmarking Localization and Mapping for Augmented Reality Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Furthermore, ground-truth (GT) accuracy is mostly insufficient to satisfy AR requirements. To close this gap, we introduce a new benchmark with a comprehensive capture and GT pipeline, which allows us to co-register realistic AR trajectories in diverse scenes and from heterogeneous devices at scale. |
Paul-Edouard Sarlin; Mihai Dusmanu; Johannes L. Schönberger; Pablo Speciale; Lukas Gruber; Viktor Larsson; Ondrej Miksik; Marc Pollefeys; |
293 | Unitail: Detecting, Reading, and Matching in Retail Scene Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To make full use of computer vision technology in stores, it is required to consider the actual needs that fit the characteristics of the retail scene. Pursuing this goal, we introduce the United Retail Datasets (Unitail), a large-scale benchmark of basic visual tasks on products that challenges algorithms for detecting, reading, and matching. |
Fangyi Chen; Han Zhang; Zaiwang Li; Jiachen Dou; Shentong Mo; Hao Chen; Yongxin Zhang; Uzair Ahmed; Chenchen Zhu; Marios Savvides; |
294 | Not Just Streaks: Towards Ground Truth for Single Image Deraining Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a large-scale dataset of real-world rainy and clean image pairs and a method to remove degradations, induced by rain streaks and rain accumulation, from the image. |
Yunhao Ba; Howard Zhang; Ethan Yang; Akira Suzuki; Arnold Pfahnl; Chethan Chinder Chandrappa; Celso M. de Melo; Suya You; Stefano Soatto; Alex Wong; Achuta Kadambi; |
295 | ECCV Caption: Correcting False Negatives By Collecting Machine-and-Human-Verified Image-Caption Associations for MS-COCO Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To correct the massive false negatives, we construct the Extended COCO Validation (ECCV) Caption dataset by supplying the missing associations with machine and human annotators. |
Sanghyuk Chun; Wonjae Kim; Song Park; Minsuk Chang; Seong Joon Oh; |
296 | MOTCOM: The Multi-Object Tracking Dataset Complexity Metric Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As a remedy, we present the novel MOT dataset complexity metric (MOTCOM), which is a combination of three sub-metrics inspired by key problems in MOT: occlusion, erratic motion, and visual similarity. |
Malte Pedersen; Joakim Bruslund Haurum; Patrick Dendorfer; Thomas B. Moeslund; |
297 | How to Synthesize A Large-Scale and Trainable Micro-Expression Dataset? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper does not contain technical novelty but introduces our key discoveries in a data generation protocol, a database and insights. |
Yuchi Liu; Zhongdao Wang; Tom Gedeon; Liang Zheng; |
298 | A Real World Dataset for Multi-View 3D Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a dataset of 371 3D models of everyday tabletop objects along with their 320,000 real world RGB and depth images. |
Rakesh Shrestha; Siqi Hu; Minghao Gou; Ziyuan Liu; Ping Tan; |
299 | REALY: Rethinking The Evaluation of 3D Face Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel evaluation approach with a new benchmark, REALY, which consists of 100 globally aligned face scans with accurate facial keypoints, high-quality region masks, and topology-consistent meshes. |
Zenghao Chai; Haoxian Zhang; Jing Ren; Di Kang; Zhengzhuo Xu; Xuefei Zhe; Chun Yuan; Linchao Bao; |
300 | Capturing, Reconstructing, and Simulating: The UrbanScene3D Dataset Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present UrbanScene3D, a large-scale data platform for research of urban scene perception and reconstruction. |
Liqiang Lin; Yilin Liu; Yue Hu; Xingguang Yan; Ke Xie; Hui Huang; |
301 | 3D CoMPaT: Composition of Materials on Parts of 3D Things Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present 3D CoMPaT, a richly annotated large-scale dataset of more than 7.19 million rendered compositions of materials on parts of 7262 unique 3D models (990 compositions per model on average). |
Yuchen Li; Ujjwal Upadhyay; Habib Slim; Tezuesh Varshney; Ahmed Abdelreheem; Arpit Prajapati; Suhail Pothigara; Peter Wonka; Mohamed Elhoseiny; |
302 | PartImageNet: A Large, High-Quality Dataset of Parts Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This is partly due to the difficulty and high cost of annotating object parts, so it has rarely been done except for humans (where there exists a large literature on part-based models). To help address this problem, we propose PartImageNet, a large, high-quality dataset with part segmentation annotations. |
Ju He; Shuo Yang; Shaokang Yang; Adam Kortylewski; Xiaoding Yuan; Jie-Neng Chen; Shuai Liu; Cheng Yang; Qihang Yu; Alan Yuille; |
303 | A-OKVQA: A Benchmark for Visual Question Answering Using World Knowledge Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce A-OKVQA, a crowdsourced dataset composed of a diverse set of about 25K questions requiring a broad base of commonsense and world knowledge to answer. |
Dustin Schwenk; Apoorv Khandelwal; Christopher Clark; Kenneth Marino; Roozbeh Mottaghi; |
304 | OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce ROBIN, a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context and the weather conditions, and enables benchmarking models for image classification, object detection, and 3D pose estimation. |
Bingchen Zhao; Shaozuo Yu; Wufei Ma; Mingxin Yu; Shenxiao Mei; Angtian Wang; Ju He; Alan Yuille; Adam Kortylewski; |
305 | Facial Depth and Normal Estimation Using Single Dual-Pixel Camera Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce a DP-oriented Depth/Normal estimation network that reconstructs the 3D facial geometry. In addition, to train the network, we collect DP facial data with more than 135K images for 101 persons captured with our multi-camera structured light systems. |
Minjun Kang; Jaesung Choe; Hyowon Ha; Hae-Gon Jeon; Sunghoon Im; In So Kweon; Kuk-Jin Yoon; |
306 | The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work introduces the Anatomy of Video Editing, a dataset and benchmark to foster research in AI-assisted video editing. |
Dawit Mureja Argaw; Fabian Caba; Joon-Young Lee; Markus Woodson; In So Kweon; |
307 | StyleBabel: Artistic Style Tagging and Captioning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present StyleBabel, a unique open access dataset of natural language captions and free-form tags describing the artistic style of over 135K digital artworks, collected via a novel participatory method from experts studying at specialist art and design schools. |
Dan Ruta; Andrew Gilbert; Pranav Aggarwal; Naveen Marri; Ajinkya Kale; Jo Briggs; Chris Speed; Hailin Jin; Baldo Faieta; Alex Filipkowski; Zhe Lin; John Collomosse; |
308 | PANDORA: A Panoramic Detection Dataset for Object with Orientation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a new bounding box representation, Rotated Bounding Field of View (RBFoV), for the panoramic image object detection task. Then, based on the RBFoV, we present a PANoramic Detection dataset for Object with oRientAtion (PANDORA). |
Hang Xu; Qiang Zhao; Yike Ma; Xiaodong Li; Peng Yuan; Bailan Feng; Chenggang Yan; Feng Dai; |
309 | FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Namely, we propose a hierarchical sketch decoder, which we leverage at a sketch-specific “pretext” task. We will release the dataset upon acceptance. |
Pinaki Nath Chowdhury; Aneeshan Sain; Ayan Kumar Bhunia; Tao Xiang; Yulia Gryaditskaya; Yi-Zhe Song; |
310 | Exploring Fine-Grained Audiovisual Categorization with The SSW60 Dataset Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a new benchmark dataset, Sapsucker Woods 60 (SSW60), for advancing research on audiovisual fine-grained categorization. |
Grant Van Horn; Rui Qian; Kimberly Wilber; Hartwig Adam; Oisin Mac Aodha; Serge Belongie; |
311 | The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for detecting, tracking, and counting fish in sonar videos. |
Justin Kay; Peter Kulits; Suzanne Stathatos; Siqi Deng; Erik Young; Sara Beery; Grant Van Horn; Pietro Perona; |
312 | A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To study VLN with unknown command feasibility, we introduce a new dataset Mobile app Tasks with Iterative Feedback (MoTIF), where the goal is to complete a natural language command in a mobile app. |
Andrea Burns; Deniz Arsan; Sanjna Agrawal; Ranjitha Kumar; Kate Saenko; Bryan A. Plummer; |
313 | BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: These characteristics are found in all existing datasets for dance motion synthesis, and indeed recent methods can achieve good results. We introduce a new dataset aiming to challenge these common assumptions, compiling a set of dynamic dance sequences displaying complex human poses. |
Davide Moltisanti; Jinyi Wu; Bo Dai; Chen Change Loy; |
314 | Dress Code: High-Resolution Multi-Category Virtual Try-On Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This shortcoming arises from a main factor: current publicly available datasets for image-based virtual try-on do not account for this variety, thus limiting progress in the field. To address this deficiency, we introduce Dress Code, which contains images of multi-category clothes. |
Davide Morelli; Matteo Fincato; Marcella Cornia; Federico Landi; Fabio Cesari; Rita Cucchiara; |
315 | A Data-Centric Approach for Improving Ambiguous Labels with Combined Semi-Supervised Classification and Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Subjective annotations by annotators often lead to ambiguous labels in real-world datasets. We propose a data-centric approach to relabel such ambiguous labels instead of implementing the handling of this issue in a neural network. |
Lars Schmarje; Monty Santarossa; Simon-Martin Schröder; Claudius Zelenka; Rainer Kiko; Jenny Stracke; Nina Volkmann; Reinhard Koch; |
316 | ClearPose: Large-Scale Transparent Object Dataset and Benchmark Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we contribute a large-scale real-world RGB-Depth transparent object dataset named ClearPose to serve as a benchmark dataset for segmentation, scene-level depth completion, and object-centric pose estimation tasks. |
Xiaotong Chen; Huijie Zhang; Zeren Yu; Anthony Opipari; Odest Chadwicke Jenkins; |
317 | When Deep Classifiers Agree: Analyzing Correlations Between Learning Order and Image Statistics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: It has been hypothesized that neural networks converge not only to similar representations, but also exhibit a notion of empirical agreement on which data instances are learned first. Following in the latter works’ footsteps, we define a metric to quantify the relationship between such classification agreement over time, and posit that the agreement phenomenon can be mapped to core statistics of the investigated dataset. |
Iuliia Pliushch; Martin Mundt; Nicolas Lupp; Visvanathan Ramesh; |
318 | AnimeCeleb: Large-Scale Animation CelebHeads Dataset for Head Reenactment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel Animation CelebHeads dataset (AnimeCeleb) to address an animation head reenactment. |
Kangyeol Kim; Sunghyun Park; Jaeseong Lee; Sunghyo Chung; Junsoo Lee; Jaegul Choo; |
319 | MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we present a large-scale video-audio-text dataset MUGEN, collected using the open-sourced platform game CoinRun. |
Thomas Hayes; Songyang Zhang; Xi Yin; Guan Pang; Sasha Sheng; Harry Yang; Songwei Ge; Qiyuan Hu; Devi Parikh; |
320 | A Dense Material Segmentation Dataset for Indoor and Outdoor Scene Parsing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A key algorithm for understanding the world is material segmentation, which assigns a label (metal, glass, etc.) to each pixel. We find that a model trained on existing data underperforms in some settings and propose to address this with a large-scale dataset of 3.2 million dense segments on 44,560 indoor and outdoor images, which is 23x more segments than existing data. |
Paul Upchurch; Ransen Niu; |
321 | MimicME: A Large Scale Diverse 4D Database for Facial Expression Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This lack of large datasets hinders the exploitation of the great advances that DNNs can provide. In this paper, we overcome these limitations by introducing MimicMe, a novel large-scale database of dynamic high-resolution 3D faces. |
Athanasios Papaioannou; Baris Gecer; Shiyang Cheng; Grigorios G. Chrysos; Jiankang Deng; Eftychia Fotiadou; Christos Kampouris; Dimitrios Kollias; Stylianos Moschoglou; Kritaphat Songsri-In; Stylianos Ploumpis; George Trigeorgis; Panagiotis Tzirakis; Evangelos Ververas; Yuxiang Zhou; Allan Ponniah; Anastasios Roussos; Stefanos Zafeiriou; |
322 | Delving Into Universal Lesion Segmentation: Method, Dataset, and Benchmark Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Considering that it is easy to encode CT slices owing to the limited CT scenarios, we propose a Knowledge Embedding Module (KEM) to adapt the concept of dictionary learning for this task. |
Yu Qiu; Jing Xu; |
323 | Large Scale Real-World Multi-person Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a new large scale multi-person tracking dataset. |
Bing Shuai; Alessandro Bergamo; Uta Büchler; Andrew Berneshawi; Alyssa Boden; Joseph Tighe; |
324 | D2-TPred: Discontinuous Dependency for Trajectory Prediction Under Traffic Lights Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a trajectory prediction approach with respect to traffic lights, D2-TPred, which uses a spatial dynamic interaction graph (SDG) and a behavior dependency graph (BDG) to handle the problem of discontinuous dependency in the spatial-temporal space. |
Yuzhen Zhang; Wentong Wang; Weizhi Guo; Pei Lv; Mingliang Xu; Wei Chen; Dinesh Manocha; |
325 | The Missing Link: Finding Label Relations Across Datasets Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we explore the automatic discovery of visual-semantic relations between labels across datasets. |
Jasper Uijlings; Thomas Mensink; Vittorio Ferrari; |
326 | Learning Omnidirectional Flow in 360° Video Via Siamese Representation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To accommodate the omnidirectional nature, we present a novel Siamese representation Learning framework for Omnidirectional Flow (SLOF). |
Keshav Bhandari; Bin Duan; Gaowen Liu; Hugo Latapie; Ziliang Zong; Yan Yan; |
327 | VizWiz-FewShot: Locating Objects in Images Taken By People with Visual Impairments Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a few-shot localization dataset originating from photographers who authentically were trying to learn about the visual content in the images they took. |
Yu-Yun Tseng; Alexander Bell; Danna Gurari; |
328 | TRoVE: Transforming Road Scene Datasets Into Photorealistic Virtual Environments Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work proposes a synthetic data generation pipeline that utilizes existing datasets, like nuScenes, to address the difficulties and domain-gaps present in simulated datasets. |
Shubham Dokania; Anbumani Subramanian; Manmohan Chandraker; C.V. Jawahar; |
329 | Trapped in Texture Bias? A Large Scale Comparison of Deep Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this study, we aim to understand if certain design decisions such as framework, architecture or pre-training contribute to the semantic understanding of instance segmentation. |
Johannes Theodoridis; Jessica Hofmann; Johannes Maucher; Andreas Schilling; |
330 | Deformable Feature Aggregation for Dynamic Multi-modal 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose AutoAlignV2, a faster and stronger multi-modal 3D detection framework, built on top of AutoAlign. |
Zehui Chen; Zhenyu Li; Shiquan Zhang; Liangji Fang; Qinhong Jiang; Feng Zhao; |
331 | WeLSA: Learning to Predict 6D Pose from Weakly Labeled Data Using Shape Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a weakly-supervised approach for object pose estimation from RGB-D data using training sets composed of very few labeled images with pose annotations along with weakly-labeled images with ground truth segmentation masks without pose labels. |
Shishir Reddy Vutukur; Ivan Shugurov; Benjamin Busam; Andreas Hutter; Slobodan Ilic; |
332 | Graph R-CNN: Towards Accurate 3D Object Detection with Semantic-Decorated Local Graph Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose the patch search to quickly search points in a local region for each 3D proposal. |
Honghui Yang; Zili Liu; Xiaopei Wu; Wenxiao Wang; Wei Qian; Xiaofei He; Deng Cai; |
333 | MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a flexible and high-performance 3D detection framework, named MPPNet, for 3D temporal object detection with point cloud sequences. |
Xuesong Chen; Shaoshuai Shi; Benjin Zhu; Ka Chun Cheung; Hang Xu; Hongsheng Li; |
334 | Long-Tail Detection with Effective Class-Margins Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we provide a theoretical understanding of the long-tail detection problem. |
Jang Hyun Cho; Philipp Krähenbühl; |
335 | Semi-Supervised Monocular 3D Object Detection By Multi-View Consistency Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To alleviate the annotation effort, we propose MVC-MonoDet, the first semi-supervised training framework that improves Monocular 3D object detection by enforcing multi-view consistency. |
Qing Lian; Yanbo Xu; Weilong Yao; Yingcong Chen; Tong Zhang; |
336 | PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the issues, we take a progressive approach to introduce both temporal information and spatial information for an integrated enhancement. |
Han Wang; Jun Tang; Xiaodong Liu; Shanyan Guan; Rong Xie; Li Song; |
337 | BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images Via Spatiotemporal Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. |
Zhiqi Li; Wenhai Wang; Hongyang Li; Enze Xie; Chonghao Sima; Tong Lu; Yu Qiao; Jifeng Dai; |
338 | Category-Level 6D Object Pose and Size Estimation Using Self-Supervised Deep Prior Deformation Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the easy annotations in synthetic domains bring the downside effect of synthetic-to-real (Sim2Real) domain gap. In this work, we aim to address this issue in the task setting of Sim2Real, unsupervised domain adaptation for category-level 6D object pose and size estimation. |
Jiehong Lin; Zewei Wei; Changxing Ding; Kui Jia; |
339 | Dense Teacher: Dense Pseudo-Labels for Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose replacing the sparse pseudo-boxes with the dense prediction as a united and straightforward form of pseudo-label. |
Hongyu Zhou; Zheng Ge; Songtao Liu; Weixin Mao; Zeming Li; Haiyan Yu; Jian Sun; |
340 | Point-to-Box Network for Accurate Object Detection Via Single Point Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the performance gap between point supervised object detection (PSOD) and bounding box supervised detection remains large. In this paper, we attribute such a large performance gap to the failure of generating high-quality proposal bags which are crucial for multiple instance learning (MIL). |
Pengfei Chen; Xuehui Yu; Xumeng Han; Najmul Hassan; Kai Wang; Jiachen Li; Jian Zhao; Humphrey Shi; Zhenjun Han; Qixiang Ye; |
341 | Domain Adaptive Hand Keypoint and Pixel Localization in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose to utilize the divergence of two predictions to estimate the confidence of the target image for both tasks. |
Takehiko Ohkawa; Yu-Jhe Li; Qichen Fu; Ryosuke Furuta; Kris M. Kitani; Yoichi Sato; |
342 | Towards Data-Efficient Detection Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In other words, the detection transformers are generally data-hungry. To tackle this problem, we empirically analyze the factors that affect data efficiency, through a step-by-step transition from a data-efficient RCNN variant to the representative DETR. |
Wen Wang; Jing Zhang; Yang Cao; Yongliang Shen; Dacheng Tao; |
343 | Open-Vocabulary DETR with Conditional Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a novel open-vocabulary detector based on DETR—hence the name OV-DETR—which, once trained, can detect any object given its class name or an exemplar image. |
Yuhang Zang; Wei Li; Kaiyang Zhou; Chen Huang; Chen Change Loy; |
344 | Prediction-Guided Distillation for Dense Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we show that only a very small fraction of features within a ground-truth bounding box are responsible for a teacher’s high detection performance. |
Chenhongyi Yang; Mateusz Ochal; Amos Storkey; Elliot J. Crowley; |
345 | Multimodal Object Detection Via Probabilistic Ensembling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our key contribution is a probabilistic ensembling technique, ProbEn, a simple non-learned method that fuses together detections from multi-modalities. |
Yi-Ting Chen; Jinghao Shi; Zelin Ye; Christoph Mertz; Deva Ramanan; Shu Kong; |
346 | Exploiting Unlabeled Data with Vision and Language Models for Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectively generating pseudo labels for object detection. |
Shiyu Zhao; Zhixing Zhang; Samuel Schulter; Long Zhao; Vijay Kumar B G; Anastasis Stathopoulos; Manmohan Chandraker; Dimitris N. Metaxas; |
347 | CPO: Change Robust Panorama to Point Cloud Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present CPO, a fast and robust algorithm that localizes a 2D panorama with respect to a 3D point cloud of a scene possibly containing changes. |
Junho Kim; Hojun Jang; Changwoon Choi; Young Min Kim; |
348 | INT: Towards Infinite-Frames 3D Detection with An Efficient Framework Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although increasing the number of frames might improve performance, previous multi-frame studies only used very limited frames to build their systems due to the dramatically increased computational and memory cost. To address these issues, we propose a novel on-stream training and prediction framework that, in theory, can employ an infinite number of frames while keeping the same amount of computation as a single-frame detector. |
Jianyun Xu; Zhenwei Miao; Da Zhang; Hongyu Pan; Kaixuan Liu; Peihan Hao; Jun Zhu; Zhengyang Sun; Hongmin Li; Xin Zhan; |
349 | End-to-End Weakly Supervised Object Detection with Sparse Proposal Evolution Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this study, we propose a sparse proposal evolution (SPE) approach, which advances WSOD from the two-stage pipeline with dense proposals to an end-to-end framework with sparse proposals. |
Mingxiang Liao; Fang Wan; Yuan Yao; Zhenjun Han; Jialing Zou; Yuze Wang; Bailan Feng; Peng Yuan; Qixiang Ye; |
350 | Calibration-Free Multi-View Crowd Counting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To extend and apply MVCC to more practical situations, in this paper we propose calibration-free multi-view crowd counting (CF-MVCC), which obtains the scene-level count directly from the density map predictions for each camera view without needing the camera calibrations in the test. |
Qi Zhang; Antoni B. Chan; |
351 | Unsupervised Domain Adaptation for Monocular 3D Object Detection Via Self-Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To mitigate the depth-shift, we introduce the geometry-aligned multi-scale training strategy to disentangle the camera parameters and guarantee the geometry consistency of domains. |
Zhenyu Li; Zehui Chen; Ang Li; Liangji Fang; Qinhong Jiang; Xianming Liu; Junjun Jiang; |
352 | SuperLine3D: Self-Supervised Line Segmentation and Description for LiDAR Point Cloud Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Poles and building edges are frequently observable objects on urban roads, conveying reliable hints for various computer vision tasks. To repetitively extract them as features and perform association between discrete LiDAR frames for registration, we propose the first learning-based feature segmentation and description model for 3D lines in LiDAR point cloud. |
Xiangrui Zhao; Sheng Yang; Tianxin Huang; Jun Chen; Teng Ma; Mingyang Li; Yong Liu; |
353 | Exploring Plain Vision Transformer Backbones for Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for object detection. |
Yanghao Li; Hanzi Mao; Ross Girshick; Kaiming He; |
354 | Adversarially-Aware Robust Object Detector Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we empirically explore model training for adversarial robustness in object detection, which is largely attributed to the conflict between learning clean images and adversarial images. |
Ziyi Dong; Pengxu Wei; Liang Lin; |
355 | HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Conventional homogeneous KD (homo-KD) methods suffer from such a gap and are hard to directly obtain satisfactory performance for hetero-KD. In this paper, we propose the HEtero-Assists Distillation (HEAD) framework, leveraging heterogeneous detection heads as assistants to guide the optimization of the student detector to reduce this gap. |
Luting Wang; Xiaojie Li; Yue Liao; Zeren Jiang; Jianlong Wu; Fei Wang; Chen Qian; Si Liu; |
356 | You Should Look at All Objects Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, this paper first revisits FPN in the detection framework and reveals the nature of the success of FPN from the perspective of optimization. Then, we point out that the degraded performance of large-scale objects is due to the arising of improper back-propagation paths after integrating FPN. |
Zhenchao Jin; Dongdong Yu; Luchuan Song; Zehuan Yuan; Lequan Yu; |
357 | Detecting Twenty-Thousand Classes Using Image-Level Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Detic, which simply trains the classifiers of a detector on image classification data and thus expands the vocabulary of detectors to tens of thousands of concepts. |
Xingyi Zhou; Rohit Girdhar; Armand Joulin; Philipp Krähenbühl; Ishan Misra; |
358 | DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, surrogate objectives of correspondence learning in 3D space are a step away from the true ones of object pose estimation, making the learning suboptimal for the end task. In this paper, we address this shortcoming by introducing a new method of Deep Correspondence Learning Network for direct 6D object pose estimation, shortened as DCL-Net. |
Hongyang Li; Jiehong Lin; Kui Jia; |
359 | Monocular 3D Object Detection with Depth from Motion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by binocular methods for 3D object detection, we take advantage of the strong geometry structure provided by camera ego-motion for accurate object depth estimation and detection. |
Tai Wang; Jiangmiao Pang; Dahua Lin; |
360 | DISP6D: Disentangled Implicit Shape and Pose Learning for Scalable 6D Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Building on a well-known auto-encoding framework to cope with object symmetry and the lack of labeled training data, we achieve scalability by disentangling the latent representation of auto-encoder into shape and pose sub-spaces. |
Yilin Wen; Xiangyu Li; Hao Pan; Lei Yang; Zheng Wang; Taku Komura; Wenping Wang; |
361 | Distilling Object Detectors with Global Knowledge Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, a novel prototype generation module (PGM) is proposed to find the common basis vectors, dubbed prototypes, in the two feature spaces. Then, a robust distilling module (RDM) is applied to construct the global knowledge based on the prototypes and filtrate noisy global and local knowledge by measuring the discrepancy of the representations in two feature spaces. |
Sanli Tang; Zhongyu Zhang; Zhanzhan Cheng; Jing Lu; Yunlu Xu; Yi Niu; Fan He; |
362 | Unifying Visual Perception By Dispersible Points Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a conceptually simple, flexible, and universal visual perception head for variant visual tasks, e.g., classification, object detection, instance segmentation and pose estimation, and different frameworks, such as one-stage or two-stage pipelines. |
Jianming Liang; Guanglu Song; Biao Leng; Yu Liu; |
363 | PseCo: Pseudo Labeling and Consistency Training for Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we delve into two key techniques in Semi-Supervised Object Detection (SSOD), namely pseudo labeling and consistency training. |
Gang Li; Xiang Li; Yujie Wang; Yichao Wu; Ding Liang; Shanshan Zhang; |
364 | Exploring Resolution and Degradation Clues As Self-Supervised Signal for Low Quality Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here, we propose a novel self-supervised framework to detect objects in degraded low-resolution images. |
Ziteng Cui; Yingying Zhu; Lin Gu; Guo-Jun Qi; Xiaoxiao Li; Renrui Zhang; Zenghui Zhang; Tatsuya Harada; |
365 | Robust Category-Level 6D Pose Estimation with Coarse-to-Fine Rendering of Neural Features Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Instead, we introduce a coarse-to-fine optimization strategy that utilizes the rendering process to estimate a sparse set of 6D object proposals, which are subsequently refined with gradient-based optimization. |
Wufei Ma; Angtian Wang; Alan Yuille; Adam Kortylewski; |
366 | Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we mainly address the challenge of cross-modal weakly misalignment in aerial RGB-IR images. |
Maoxun Yuan; Yinyan Wang; Xingxing Wei; |
367 | RFLA: Gaussian Receptive Field Based Label Assignment for Tiny Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we point out that either box prior in the anchor-based detector or point prior in the anchor-free detector is sub-optimal for tiny objects. |
Chang Xu; Jinwang Wang; Wen Yang; Huai Yu; Lei Yu; Gui-Song Xia; |
368 | Rethinking IoU-Based Optimization for Single-Stage 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel Rotation-Decoupled IoU (RDIoU) method that can mitigate the rotation-sensitivity issue, and produce more efficient optimization objectives compared with 3D IoU during the training stage. |
Hualian Sheng; Sijia Cai; Na Zhao; Bing Deng; Jianqiang Huang; Xian-Sheng Hua; Min-Jian Zhao; Gim Hee Lee; |
369 | TD-Road: Top-Down Road Network Extraction with Holistic Graph Construction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In contrast to the bottom-up graph-based approaches, which rely on orientation information, we propose a novel top-down approach to generate road network graphs with a holistic model, namely TD-Road. |
Yang He; Ravi Garg; Amber Roy Chowdhury; |
370 | Multi-faceted Distillation of Base-Novel Commonality for Few-Shot Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work we propose to learn three types of class-agnostic commonalities between base and novel classes explicitly: recognition-related semantic commonalities, localization-related semantic commonalities and distribution commonalities. |
Shuang Wu; Wenjie Pei; Dianwen Mei; Fanglin Chen; Jiandong Tian; Guangming Lu; |
371 | PointCLM: A Contrastive Learning-Based Framework for Multi-Instance Point Cloud Registration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose PointCLM, a contrastive learning-based framework for multi-instance point cloud registration. |
Mingzhi Yuan; Zhihao Li; Qiuye Jin; Xinrong Chen; Manning Wang; |
372 | Weakly Supervised Object Localization Via Transformer with Implicit Spatial Calibration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the long-range modeling in Transformer neglects the inherent spatial coherence of the object, and it usually diffuses the semantic-aware regions far from the object boundary, making localization results significantly larger or far smaller. To address such an issue, we introduce a simple yet effective Spatial Calibration Module (SCM) for accurate WSOL, incorporating semantic similarities of patch tokens and their spatial relationships into a unified diffusion model. |
Haotian Bai; Ruimao Zhang; Jiong Wang; Xiang Wan; |
373 | MTTrans: Cross-Domain Object Detection with Mean Teacher Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, it requires large-scale labeled data and suffers from domain shift, especially when no labeled data is available in the target domain. To solve this problem, we propose an end-to-end cross-domain detection Transformer based on the mean teacher framework, MTTrans, which can fully exploit unlabeled target domain data in object detection training and transfer knowledge between domains via pseudo labels. |
Jinze Yu; Jiaming Liu; Xiaobao Wei; Haoyi Zhou; Yohei Nakata; Denis Gudovskiy; Tomoyuki Okuno; Jianxin Li; Kurt Keutzer; Shanghang Zhang; |
374 | Multi-Domain Multi-Definition Landmark Localization for Small Datasets Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel method for multi image domain and multi-landmark definition learning for small dataset facial localization. |
David Ferman; Gaurav Bharaj; |
375 | DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Since the depth is the hardest to estimate for monocular detection, this paper proposes Depth EquiVarIAnt NeTwork (DEVIANT) built with existing scale equivariant steerable blocks. |
Abhinav Kumar; Garrick Brazil; Enrique Corona; Armin Parchami; Xiaoming Liu; |
376 | Label-Guided Auxiliary Training Improves 3D Object Detector Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Label-Guided auxiliary training method for 3D object detection (LG3D), which serves as an auxiliary network to enhance the feature learning of existing 3D object detectors. |
Yaomin Huang; Xinmei Liu; Yichen Zhu; Zhiyuan Xu; Chaomin Shen; Zhengping Che; Guixu Zhang; Yaxin Peng; Feifei Feng; Jian Tang; |
377 | PromptDet: Towards Open-Vocabulary Detection Using Uncurated Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations. |
Chengjian Feng; Yujie Zhong; Zequn Jie; Xiangxiang Chu; Haibing Ren; Xiaolin Wei; Weidi Xie; Lin Ma; |
378 | Densely Constrained Depth Estimator for Monocular 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a method that utilizes dense projection constraints from edges of any direction. |
Yingyan Li; Yuntao Chen; Jiawei He; Zhaoxiang Zhang; |
379 | Polarimetric Pose Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper explores how complementary polarisation information, i.e. the orientation of light wave oscillations, influences the accuracy of pose predictions. |
Daoyi Gao; Yitong Li; Patrick Ruhkamp; Iuliia Skobleva; Magdalena Wysocki; HyunJun Jung; Pengyuan Wang; Arturo Guridi; Benjamin Busam; |
380 | DFNet: Enhance Absolute Pose Regression with Direct Feature Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a camera relocalization pipeline that combines absolute pose regression (APR) and direct feature matching. |
Shuai Chen; Xinghui Li; Zirui Wang; Victor Adrian Prisacariu; |
381 | Cornerformer: Purifying Instances for Corner-Based Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Accordingly, this paper presents an elegant framework named Cornerformer that is composed of two factors. |
Haoran Wei; Xin Chen; Lingxi Xie; Qi Tian; |
382 | PillarNet: Real-Time and High-Performance Pillar-Based 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, by examining the primary performance gap between pillar- and voxel-based detectors, we develop a real-time and high-performance pillar-based detector, dubbed PillarNet. |
Guangsheng Shi; Ruifeng Li; Chao Ma; |
383 | Robust Object Detection with Inaccurate Bounding Boxes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we aim to address the challenge of learning robust object detectors with inaccurate bounding boxes. |
Chengxin Liu; Kewei Wang; Hao Lu; Zhiguo Cao; Ziming Zhang; |
384 | Efficient Decoder-Free Object Detection with Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As a result, transformer-based object detection could not prevail in large-scale applications. To overcome these issues, we propose a novel decoder-free fully transformer-based (DFFT) object detector, achieving high efficiency in both training and inference stages for the first time. |
Peixian Chen; Mengdan Zhang; Yunhang Shen; Kekai Sheng; Yuting Gao; Xing Sun; Ke Li; Chunhua Shen; |
385 | Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the Cross-Modality Knowledge Distillation (CMKD) network for monocular 3D detection to efficiently and directly transfer the knowledge from LiDAR modality to image modality on both features and responses. |
Yu Hong; Hang Dai; Yong Ding; |
386 | ReAct: Temporal Action Detection with Relational Queries Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries, similar to DETR, which has shown great success in object detection. |
Dingfeng Shi; Yujie Zhong; Qiong Cao; Jing Zhang; Lin Ma; Jia Li; Dacheng Tao; |
387 | Towards Accurate Active Camera Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we tackle the problem of active camera localization, which controls the camera movements actively to achieve an accurate camera pose. |
Qihang Fang; Yingda Yin; Qingnan Fan; Fei Xia; Siyan Dong; Sheng Wang; Jue Wang; Leonidas J. Guibas; Baoquan Chen; |
388 | Camera Pose Auto-Encoders for Improving Pose Regression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce Camera Pose Auto-Encoders (PAEs), multilayer perceptrons that are trained via a Teacher-Student approach to encode camera poses using APRs as their teachers. |
Yoli Shavit; Yosi Keller; |
389 | Improving The Intra-Class Long-Tail in 3D Detection Via Rare Example Mining Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this study, we identify a new conceptual dimension – rareness – to mine new data for improving the long-tail performance of models. |
Chiyu Max Jiang; Mahyar Najibi; Charles R. Qi; Yin Zhou; Dragomir Anguelov; |
390 | Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Thus only the discriminative locations are activated when feeding pixel-level features into this classifier. To solve this issue, this paper elaborates a plug-and-play mechanism called BagCAMs to better project a well-trained classifier for the localization task without refining or re-training the baseline structure. |
Lei Zhu; Qian Chen; Lujia Jin; Yunfei You; Yanye Lu; |
391 | UC-OWOD: Unknown-Classified Open World Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel OWOD problem called Unknown-Classified Open World Object Detection (UC-OWOD). |
Zhiheng Wu; Yue Lu; Xingyu Chen; Zhengxing Wu; Liwen Kang; Junzhi Yu; |
392 | RayTran: 3D Pose Estimation and Shape Reconstruction of Multiple Objects from Videos with Ray-Traced Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a transformer-based neural network architecture for multi-object 3D reconstruction from RGB videos. |
Michał J. Tyszkiewicz; Kevis-Kokitsi Maninis; Stefan Popov; Vittorio Ferrari; |
393 | GTCaR: Graph Transformer for Camera Re-Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we propose a neural network approach with a graph Transformer backbone, namely GTCaR (Graph Transformer for Camera Re-localization), to address the multi-view camera re-localization problem. |
Xinyi Li; Haibin Ling; |
394 | 3D Object Detection with A Self-Supervised Lidar Scene Flow Backbone Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our main contribution leverages learned flow and motion representations and combines a self-supervised backbone with a supervised 3D detection head. |
Emeç Erçelik; Ekim Yurtsever; Mingyu Liu; Zhijie Yang; Hanzhen Zhang; Pınar Topçam; Maximilian Listl; Yılmaz Kaan Çaylı; Alois Knoll; |
395 | Open Vocabulary Object Detection with Pseudo Bounding-Box Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To enlarge the set of base classes, we propose a method to automatically generate pseudo bounding-box annotations of diverse objects from large-scale image-caption pairs. |
Mingfei Gao; Chen Xing; Juan Carlos Niebles; Junnan Li; Ran Xu; Wenhao Liu; Caiming Xiong; |
396 | Few-Shot Object Detection By Knowledge Distillation Using Bag-of-Visual-Words Representations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work we design a novel knowledge distillation framework to guide the learning of the object detector and thereby restrain the overfitting in both the pre-training stage on base classes and fine-tuning stage on novel classes. |
Wenjie Pei; Shuang Wu; Dianwen Mei; Fanglin Chen; Jiandong Tian; Guangming Lu; |
397 | SALISA: Saliency-Based Input Sampling for Efficient Video Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection that allows for heavy down-sampling of unimportant background regions while preserving the fine-grained details of a high-resolution image. |
Babak Ehteshami Bejnordi; Amirhossein Habibian; Fatih Porikli; Amir Ghodrati; |
398 | ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an efficient structure named Efficient Correspondence Transformer (ECO-TR) that finds correspondences in a coarse-to-fine manner, which significantly improves the efficiency of the model. |
Dongli Tan; Jiang-Jiang Liu; Xingyu Chen; Chao Chen; Ruixin Zhang; Yunhang Shen; Shouhong Ding; Rongrong Ji; |
399 | Vote from The Center: 6 DoF Pose Estimation in RGB-D Images By Radial Keypoint Voting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel keypoint voting scheme based on intersecting spheres, that is more accurate than existing schemes and allows for fewer, more disperse keypoints. |
Yangzheng Wu; Mohsen Zand; Ali Etemad; Michael Greenspan; |
400 | Long-Tailed Instance Segmentation Using Gumbel Optimized Loss Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we identify that Sigmoid or Softmax functions used in deep detectors are a major reason for low performance and are suboptimal for long-tailed detection and segmentation. |
Konstantinos Panagiotis Alexandridis; Jiankang Deng; Anh Nguyen; Shan Luo; |
401 | DetMatch: Two Teachers Are Better Than One for Joint 2D and 3D Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Observing that the distinct characteristics of each sensor cause them to be biased towards detecting different objects, we propose DetMatch, a flexible framework for joint semi-supervised learning on 2D and 3D modalities. |
Jinhyung Park; Chenfeng Xu; Yiyang Zhou; Masayoshi Tomizuka; Wei Zhan; |
402 | ObjectBox: From Centers to Boxes for Anchor-Free Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present ObjectBox, a novel single-stage anchor-free and highly generalizable object detection approach. |
Mohsen Zand; Ali Etemad; Michael Greenspan; |
403 | Is Geometry Enough for Matching in Visual Localization? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to go beyond the well-established approach to vision-based localization that relies on visual descriptor matching between a query image and a 3D point cloud. |
Qunjie Zhou; Sérgio Agostinho; Aljoša Ošep; Laura Leal-Taixé; |
404 | SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose Sparse Window Transformer (SWFormer), a scalable and accurate model for 3D object detection, which can take full advantage of the sparsity of point clouds. |
Pei Sun; Mingxing Tan; Weiyue Wang; Chenxi Liu; Fei Xia; Zhaoqi Leng; Dragomir Anguelov; |
405 | PCR-CG: Point Cloud Registration Via Deep Explicit Color and Geometry Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce PCR-CG: a novel 3D point cloud registration module explicitly embedding the color signals into geometry representation. |
Yu Zhang; Junle Yu; Xiaolin Huang; Wenhui Zhou; Ji Hou; |
406 | GLAMD: Global and Local Attention Mask Distillation for Object Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To overcome such challenging issues, we propose a novel knowledge distillation method, GLAMD, distilling both global and local knowledge from the teacher. |
Younho Jang; Wheemyung Shin; Jinbeom Kim; Simon Woo; Sung-Ho Bae; |
407 | FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present FCAF3D — a first-in-class fully convolutional anchor-free indoor 3D object detection method. |
Danila Rukhovich; Anna Vorontsova; Anton Konushin; |
408 | Video Anomaly Detection By Solving Decoupled Spatio-Temporal Jigsaw Puzzles Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by the recent advances in self-supervised learning, this paper addresses VAD by solving an intuitive yet challenging pretext task, i.e., spatio-temporal jigsaw puzzles, which is cast as a multi-label fine-grained classification problem. |
Guodong Wang; Yunhong Wang; Jie Qin; Dongming Zhang; Xiuguo Bao; Di Huang; |
409 | Class-Agnostic Object Detection with Multi-modal Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we advocate that existing methods lack a top-down supervision signal governed by human-understandable semantics. |
Muhammad Maaz; Hanoona Rasheed; Salman Khan; Fahad Shahbaz Khan; Rao Muhammad Anwer; Ming-Hsuan Yang; |
410 | Enhancing Multi-modal Features Using Local Self-Attention for 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose EMMF-Det to do multi-modal fusion leveraging range and camera images. |
Hao Li; Zehan Zhang; Xian Zhao; Yulong Wang; Yuxi Shen; Shiliang Pu; Hui Mao; |
411 | Object Detection As Probabilistic Set Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose to view object detection as a set prediction task where detectors predict the distribution over the set of objects. |
Georg Hess; Christoffer Petersson; Lennart Svensson; |
412 | Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose to model actions as the combinations of reusable atomic actions which are automatically discovered from data through self-supervised clustering, in order to capture the commonality and individuality of fine-grained actions. |
Zhi Li; Lu He; Huijuan Xu; |
413 | Neural Correspondence Field for Object Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image. |
Lin Huang; Tomas Hodan; Lingni Ma; Linguang Zhang; Luan Tran; Christopher D. Twigg; Po-Chen Wu; Junsong Yuan; Cem Keskin; Robert Wang; |
414 | On Label Granularity and Object Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we study the role of label granularity in WSOL. |
Elijah Cole; Kimberly Wilber; Grant Van Horn; Xuan Yang; Marco Fornoni; Pietro Perona; Serge Belongie; Andrew Howard; Oisin Mac Aodha; |
415 | OIMNet++: Prototypical Normalization and Localization-Aware Learning for Person Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce OIMNet++ that addresses the aforementioned limitations. |
Sanghoon Lee; Youngmin Oh; Donghyeon Baek; Junghyup Lee; Bumsub Ham; |
416 | Out-of-Distribution Identification: Let Detector Tell Which I Am Not Sure Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, the Feature structured OOD-IDentification (FOOD-ID) model is proposed to reduce the uncertainty of detection results by identifying the OOD instances. |
Ruoqi Li; Chongyang Zhang; Hao Zhou; Chao Shi; Yan Luo; |
417 | Learning with Free Object Segments for Long-Tailed Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we explore the possibility to increase the training examples without laborious data collection and annotation. |
Cheng Zhang; Tai-Yu Pan; Tianle Chen; Jike Zhong; Wenjin Fu; Wei-Lun Chao; |
418 | Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose methods for leveraging our autoregressive model to make high confidence predictions and meaningful uncertainty measures, achieving strong results on SUN-RGBD, Scannet, KITTI, and our new dataset. |
YuXuan Liu; Nikhil Mishra; Maximilian Sieb; Yide Shentu; Pieter Abbeel; Xi Chen; |
419 | 3D Random Occlusion and Multi-layer Projection for Deep Multi-Camera Pedestrian Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Using multi-view information fusion is a potential solution but has limited applications, due to the lack of annotated training samples in existing multi-view datasets, which increases the risk of overfitting. To address this problem, a data augmentation method is proposed to randomly generate 3D cylinder occlusions, on the ground plane, which are of the average size of pedestrians and projected to multiple views, to relieve the impact of overfitting in the training. |
Rui Qiu; Ming Xu; Yuyao Yan; Jeremy S. Smith; Xi Yang; |
420 | A Simple Single-Scale Vision Transformer for Object Detection and Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we comprehensively study three architecture design choices on ViT — spatial reduction, doubled channels, and multiscale features — and demonstrate that a vanilla ViT architecture can fulfill this goal without handcrafting multiscale features, maintaining the original ViT design philosophy. |
Wuyang Chen; Xianzhi Du; Fan Yang; Lucas Beyer; Xiaohua Zhai; Tsung-Yi Lin; Huizhong Chen; Jing Li; Xiaodan Song; Zhangyang Wang; Denny Zhou; |
421 | Simple Open-Vocabulary Object Detection with Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a strong recipe for transferring image-text models to open-vocabulary object detection. |
Matthias Minderer; Alexey Gritsenko; Austin Stone; Maxim Neumann; Dirk Weissenborn; Alexey Dosovitskiy; Aravindh Mahendran; Anurag Arnab; Mostafa Dehghani; Zhuoran Shen; Xiao Wang; Xiaohua Zhai; Thomas Kipf; Neil Houlsby; |
422 | A Simple Approach and Benchmark for 21,000-Category Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unlike previous efforts that usually transfer knowledge from base detectors to image classification data, we propose to rely more on a reverse information flow from a base image classifier to object detection data. |
Yutong Lin; Chen Li; Yue Cao; Zheng Zhang; Jianfeng Wang; Lijuan Wang; Zicheng Liu; Han Hu; |
423 | Knowledge Condensation Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Knowledge Condensation Distillation (KCD). |
Chenxin Li; Mingbao Lin; Zhiyuan Ding; Nie Lin; Yihong Zhuang; Yue Huang; Xinghao Ding; Liujuan Cao; |
424 | Reducing Information Loss for Spiking Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Meanwhile, quantifying the membrane potential to 0/1 spikes at the firing instants will inevitably introduce the quantization error thus bringing about information loss too. To address these problems, we propose a “Soft Reset mechanism for the supervised training-based SNNs, which will drive the membrane potential to a dynamic reset potential according to its magnitude, and Membrane Potential Rectifier (MPR) to reduce the quantization error via redistributing the membrane potential to a range close to the spikes. |
Yufei Guo; Yuanpei Chen; Liwen Zhang; YingLei Wang; Xiaode Liu; Xinyi Tong; Yuanyuan Ou; Xuhui Huang; Zhe Ma; |
425 | Masked Generative Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper shows that teachers can also improve students’ representation power by guiding students’ feature recovery. From this point of view, we propose Masked Generative Distillation (MGD), which is simple: we mask random pixels of the student’s feature and force it to generate the teacher’s full feature through a simple block. |
Zhendong Yang; Zhe Li; Mingqi Shao; Dachuan Shi; Zehuan Yuan; Chun Yuan; |
426 | Fine-Grained Data Distribution Alignment for Post-Training Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While post-training quantization receives popularity mostly due to its evasion in accessing the original complete training dataset, its poor performance also stems from scarce images. To alleviate this limitation, in this paper, we leverage the synthetic data introduced by zero-shot quantization with calibration dataset and propose a fine-grained data distribution alignment (FDDA) method to boost the performance of post-training quantization. |
Yunshan Zhong; Mingbao Lin; Mengzhao Chen; Ke Li; Yunhang Shen; Fei Chao; Yongjian Wu; Rongrong Ji; |
427 | Learning with Recoverable Forgetting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we explore a novel learning scheme, termed as \textbf{L}earning w\textbf{I}th \textbf{R}ecoverable \textbf{F}orgetting (LIRF), that explicitly handles the task- or sample-specific knowledge removal and recovery. |
Jingwen Ye; Yifang Fu; Jie Song; Xingyi Yang; Songhua Liu; Xin Jin; Mingli Song; Xinchao Wang; |
428 | Efficient One Pass Self-Distillation with Zipf’s Label Smoothing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes an efficient self-distillation method named Zipf’s Label Smoothing (Zipf’s LS), which uses the on-the-fly prediction of a network to generate soft supervision that conforms to Zipf distribution without using any contrastive samples or auxiliary parameters. |
Jiajun Liang; Linze Li; Zhaodong Bing; Borui Zhao; Yao Tang; Bo Lin; Haoqiang Fan; |
429 | Prune Your Model Before Distill It Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose the novel framework, “prune, then distill,” that prunes the model first to make it more transferrable and then distill it to the student. |
Jinhyuk Park; Albert No; |
430 | Deep Partial Updating: Towards Communication Efficient Updating for On-Device Inference Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the weight-wise deep partial updating paradigm, which smartly selects a small subset of weights to update in each server-to-edge communication round, while achieving a similar performance compared to full updating. |
Zhongnan Qu; Cong Liu; Lothar Thiele; |
431 | Patch Similarity Aware Data-Free Quantization for Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose PSAQ-ViT, a Patch Similarity Aware data-free Quantization framework for Vision Transformers, to enable the generation of realistic samples based on the vision transformer’s unique properties for calibrating the quantization parameters. |
Zhikai Li; Liping Ma; Mengjuan Chen; Junrui Xiao; Qingyi Gu; |
432 | L3: Accelerator-Friendly Lossless Image Format for High-Resolution, High-Throughput DNN Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Thus, we propose L3, a custom lightweight, lossless image format for high-resolution, high-throughput DNN training. |
Jonghyun Bae; Woohyeon Baek; Tae Jun Ham; Jae W. Lee; |
433 | Streaming Multiscale Deep Equilibrium Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present StreamDEQ, a method that infers frame-wise representations on videos with minimal per-frame computation. |
Can Ufuk Ertenli; Emre Akbas; Ramazan Gokberk Cinbis; |
434 | Symmetry Regularization and Saturating Nonlinearity for Robust Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we perform extensive analyses to identify the sources of quantization error and present three insights to robustify the network against quantization: reduction of error propagation, range clamping for error minimization, and inherited robustness against quantization. |
Sein Park; Yeongsang Jang; Eunhyeok Park; |
435 | SP-Net: Slowly Progressing Dynamic Inference Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To alleviate the problems above, we propose a slowly progressing dynamic inference network to stabilize the optimization. |
Huanyu Wang; Wenhu Zhang; Shihao Su; Hui Wang; Zhenwei Miao; Xin Zhan; Xi Li; |
436 | Equivariance and Invariance Inductive Bias for Learning from Insufficient Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We are interested in learning robust models from insufficient data, without the need for any externally pre-trained checkpoints. |
Tan Wang; Qianru Sun; Sugiri Pranata; Karlekar Jayashree; Hanwang Zhang; |
437 | Mixed-Precision Neural Network Quantization Via Learned Layer-Wise Importance Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we reveal that some unique learnable parameters in quantization, namely the scale factors in the quantizer, can serve as importance indicators of a layer, reflecting the contribution of that layer to the final accuracy at certain bit-widths. |
Chen Tang; Kai Ouyang; Zhi Wang; Yifei Zhu; Wen Ji; Yaowei Wang; Wenwu Zhu; |
438 | Event Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Such redundancy occurs at multiple levels of complexity, from low-level pixel values to textures and high-level semantics. We propose Event Neural Networks (EvNets), which leverage this redundancy to achieve considerable computation savings during video inference. |
Matthew Dutson; Yin Li; Mohit Gupta; |
439 | EdgeViTs: Competing Light-Weight CNNs on Mobile Devices with Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, pushing further along this under-studied direction we introduce EdgeViTs, a new family of light-weight ViTs that, for the first time, enable attention-based vision models to compete with the best light-weight CNNs in the tradeoff between accuracy and on-device efficiency. |
Junting Pan; Adrian Bulat; Fuwen Tan; Xiatian Zhu; Lukasz Dudziak; Hongsheng Li; Georgios Tzimiropoulos; Brais Martinez; |
440 | PalQuant: Accelerating High-Precision Networks on Low-Precision Accelerators Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the PArallel Low-precision Quantization (PalQuant) method that approximates high-precision computations via learning parallel low-precision representations from scratch. |
Qinghao Hu; Gang Li; Qiman Wu; Jian Cheng; |
441 | Disentangled Differentiable Network Pruning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel channel pruning method for compression and acceleration of Convolutional Neural Networks (CNNs). |
Shangqian Gao; Feihu Huang; Yanfu Zhang; Heng Huang; |
442 | IDa-Det: An Information Discrepancy-Aware Distillation for 1-Bit Detectors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents an Information Discrepancy-aware strategy (IDa-Det) to distill 1-bit detectors that can effectively eliminate information discrepancies and significantly reduce the performance gap between a 1-bit detector and its real-valued counterpart. |
Sheng Xu; Yanjing Li; Bohan Zeng; Teli Ma; Baochang Zhang; Xianbin Cao; Peng Gao; Jinhu Lü; |
443 | Learning to Weight Samples for Dynamic Early-Exiting Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the early-exiting behavior during testing has been ignored, leading to a gap between training and testing. In this paper, we propose to bridge this gap by sample weighting. |
Yizeng Han; Yifan Pu; Zihang Lai; Chaofei Wang; Shiji Song; Junfeng Cao; Wenhui Huang; Chao Deng; Gao Huang; |
444 | AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we present a simple yet effective approach called AdaBin to adaptively obtain the optimal binary sets {b_1, b_2} (b_1, b_2 belong to R) of weights and activations for each layer instead of a fixed set (i.e., {-1, +1}). |
Zhijun Tu; Xinghao Chen; Pengju Ren; Yunhe Wang; |
445 | Adaptive Token Sampling for Efficient Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although the GFLOPs of a vision transformer can be decreased by reducing the number of tokens in the network, there is no setting that is optimal for all input images. In this work, we therefore introduce a differentiable parameter-free Adaptive Token Sampler (ATS) module, which can be plugged into any existing vision transformer architecture. |
Mohsen Fayyaz; Soroush Abbasi Koohpayegani; Farnoush Rezaei Jafari; Sunando Sengupta; Hamid Reza Vaezi Joze; Eric Sommerlade; Hamed Pirsiavash; Jürgen Gall; |
446 | Weight Fixing Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new method, which we call Weight Fixing Networks (WFN) that we design to realise four model outcome objectives: i) very few unique weights, ii) low-entropy weight encodings, iii) unique weight values which are amenable to energy-saving versions of hardware multiplication, and iv) lossless task-performance. |
Christopher Subia-Waud; Srinandan Dasmahapatra; |
447 | Self-Slimmed Vision Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To solve the issue, we propose a generic self-slimmed learning approach for vanilla ViTs, namely SiT. |
Zhuofan Zong; Kunchang Li; Guanglu Song; Yali Wang; Yu Qiao; Biao Leng; Yu Liu; |
448 | Switchable Online Knowledge Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Several crucial bottlenecks over the gap between them — e.g., Why and when does a large gap harm the performance, especially for the student? How to quantify the gap between teacher and student? — have received limited formal study. In this paper, we propose Switchable Online Knowledge Distillation (SwitOKD) to answer these questions. |
Biao Qian; Yang Wang; Hongzhi Yin; Richang Hong; Meng Wang; |
449 | $\ell_\infty$-Robustness and Beyond: Unleashing Efficient Adversarial Training Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, by leveraging the theory of coreset selection, we show how selecting a small subset of training data provides a general, more principled approach toward reducing the time complexity of robust training. |
Hadi M. Dolatabadi; Sarah Erfani; Christopher Leckie; |
450 | Multi-Granularity Pruning for Model Acceleration on Mobile Devices Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a unified framework for the Joint Channel pruning and Weight pruning, named JCW, which achieves an optimal pruning proportion between channel and weight pruning. |
Tianli Zhao; Xi Sheryl Zhang; Wentao Zhu; Jiaxing Wang; Sen Yang; Ji Liu; Jian Cheng; |
451 | Deep Ensemble Learning By Diverse Knowledge Distillation for Fine-Grained Object Classification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a knowledge distillation for ensemble by optimizing the elements of knowledge distillation as hyperparameters. |
Naoki Okamoto; Tsubasa Hirakawa; Takayoshi Yamashita; Hironobu Fujiyoshi; |
452 | Helpful or Harmful: Inter-Task Association in Continual Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel approach to differentiate helpful and harmful information for old tasks using a model search to learn a current task effectively. |
Hyundong Jin; Eunwoo Kim; |
453 | Towards Accurate Binary Neural Networks Via Modeling Contextual Dependencies Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, such simple bit operations lack the ability of modeling contextual dependencies, which is critical for learning discriminative deep representations in vision models. In this work, we tackle this issue by presenting new designs of binary neural modules, which enable BNNs to learn effective contextual dependencies. |
Xingrun Xing; Yangguang Li; Wei Li; Wenrui Ding; Yalong Jiang; Yufeng Wang; Jing Shao; Chunlei Liu; Xianglong Liu; |
454 | SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we perform an empirical evaluation on methods for sharing parameters in isotropic networks (SPIN). |
Chien-Yu Lin; Anish Prabhu; Thomas Merth; Sachin Mehta; Anurag Ranjan; Maxwell Horton; Mohammad Rastegari; |
455 | Ensemble Knowledge Guided Sub-network Search and Fine-Tuning for Filter Pruning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a novel sub-network search and fine-tuning method that is named Ensemble Knowledge Guidance (EKG). |
Seunghyun Lee; Byung Cheol Song; |
456 | Network Binarization Via Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To mitigate the information degradation caused by the binarization operation from FP to binary activations, we establish a novel contrastive learning framework while training BNNs through the lens of Mutual Information (MI) maximization. |
Yuzhang Shang; Dan Xu; Ziliang Zong; Liqiang Nie; Yan Yan; |
457 | Lipschitz Continuity Retained Binary Neural Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce the Lipschitz continuity, a well-defined functional property, as the rigorous criteria to define the model robustness for BNN. |
Yuzhang Shang; Dan Xu; Bin Duan; Ziliang Zong; Liqiang Nie; Yan Yan; |
458 | SPViT: Enabling Faster Vision Transformers Via Latency-Aware Soft Token Pruning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Considering the computation complexity, the internal data pattern of ViTs, and the edge device deployment, we propose a latency-aware soft token pruning framework, SPViT, which can be set up on vanilla Transformers of both flatten and hierarchical structures, such as DeiTs and Swin-Transformers (Swin). |
Zhenglun Kong; Peiyan Dong; Xiaolong Ma; Xin Meng; Wei Niu; Mengshu Sun; Xuan Shen; Geng Yuan; Bin Ren; Hao Tang; Minghai Qin; Yanzhi Wang; |
459 | Soft Masking for Cost-Constrained Channel Pruning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Soft Masking for cost-constrained Channel Pruning (SMCP) to allow pruned channels to adaptively return to the network while simultaneously pruning towards a target cost constraint. |
Ryan Humble; Maying Shen; Jorge Albericio Latorre; Eric Darve; Jose Alvarez; |
460 | Non-uniform Step Size Quantization for Accurate Post-Training Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we propose a novel PTQ scheme to bridge the gap, with minimal impact on hardware cost. |
Sangyun Oh; Hyeonuk Sim; Jounghyun Kim; Jongeun Lee; |
461 | SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets Via Jointly Architecture Searching and Parameter Pruning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we discover for the first time that both efficient DNNs and their lottery subnetworks (i.e., lottery tickets) can be directly identified from a supernet, which we term as SuperTickets, via a two-in-one training scheme with jointly architecture searching and parameter pruning. |
Haoran You; Baopu Li; Zhanyi Sun; Xu Ouyang; Yingyan Lin; |
462 | Meta-GF: Training Dynamic-Depth Neural Networks Harmoniously Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The interference would reduce the performance of the models and cause negative influences on the convergence speed. To address this problem, we investigate the gradient conflict of these multi-exit networks, and propose a novel meta-learning based training paradigm, namely Meta-GF (meta gradient fusion), to harmoniously train these exits. |
Yi Sun; Jian Li; Xin Xu; |
463 | Towards Ultra Low Latency Spiking Neural Networks for Vision and Sequential Tasks Using Temporal Pruning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To optimize the accuracy-energy-latency trade-off, we propose a temporal pruning method which starts with an SNN of T timesteps, and reduces T every iteration of training, with threshold and leak as trainable parameters. |
Sayeed Shafayet Chowdhury; Nitin Rathi; Kaushik Roy; |
464 | Towards Accurate Network Quantization with Equivalent Smooth Regularizer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, they still suffer from accuracy degradation due to inappropriate gradients in the optimization phase, especially for low-bit precision network and low-level vision tasks. To alleviate this issue, this paper defines a family of equivalent smooth regularizers for neural network quantization, named as SQR, which represents the equivalent of actual quantization error. |
Kirill Solodskikh; Vladimir Chikin; Ruslan Aydarkhanov; Dehua Song; Irina Zhelavskaya; Jiansheng Wei; |
465 | Explicit Model Size Control and Relaxation Via Smooth Regularization for Mixed-Precision Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The main challenge of the mixed-precision approach is to define the bit-widths for each layer, while staying under memory and latency requirements. Motivated by this challenge, we introduce a novel technique for explicit complexity control of DNNs quantized to mixed-precision, which uses smooth optimization on the surface containing neural networks of constant size. |
Vladimir Chikin; Kirill Solodskikh; Irina Zhelavskaya; |
466 | BASQ: Branch-Wise Activation-Clipping Search Quantization for Sub-4-Bit Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Branch-wise Activation-clipping Search Quantization (BASQ), which is a novel quantization method for low-bit activation. |
Han-Byul Kim; Eunhyeok Park; Sungjoo Yoo; |
467 | You Already Have It: A Generator-Free Low-Precision DNN Training Framework Using Stochastic Rounding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we innovatively propose to employ the stochastic property of DNN training process itself and directly extract random numbers from DNNs in a self-sufficient manner. |
Geng Yuan; Sung-En Chang; Qing Jin; Alec Lu; Yanyu Li; Yushu Wu; Zhenglun Kong; Yanyue Xie; Peiyan Dong; Minghai Qin; Xiaolong Ma; Xulong Tang; Zhenman Fang; Yanzhi Wang; |
468 | Real Spike: Learning Real-Valued Spikes for Spiking Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we argue that SNNs may not benefit from the weight-sharing mechanism, which can effectively reduce parameters and improve inference efficiency in DNNs, in some hardwares, and assume that an SNN with unshared convolution kernels could perform better. |
Yufei Guo; Liwen Zhang; Yuanpei Chen; Xinyi Tong; Xiaode Liu; YingLei Wang; Xuhui Huang; Zhe Ma; |
469 | FedLTN: Federated Learning for Sparse and Personalized Lottery Ticket Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose FedLTN, a novel approach motivated by the well-known Lottery Ticket Hypothesis to learn sparse and personalized lottery ticket networks (LTNs) for communication-efficient and personalized FL under non-identically and independently distributed (non-IID) data settings. |
Vaikkunth Mugunthan; Eric Lin; Vignesh Gokul; Christian Lau; Lalana Kagal; Steve Pieper; |
470 | Theoretical Understanding of The Information Flow on Continual Learning Performance Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While different CL training regimes have been extensively studied empirically, insufficient attention has been paid to the underlying theory. In this paper, we establish a probabilistic framework to analyze information flow through layers in networks for sequential tasks and its impact on learning performance. |
Joshua Andle; Salimeh Yasaei Sekeh; |
471 | Exploring Lottery Ticket Hypothesis in Spiking Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the iterative searching process of LTH brings a huge training computational cost when combined with the multiple timesteps of SNNs. To alleviate such heavy searching cost, we propose Early-Time (ET) ticket where we find the important weight connectivity from a smaller number of timesteps. |
Youngeun Kim; Yuhang Li; Hyoungseob Park; Yeshwanth Venkatesha; Ruokai Yin; Priyadarshini Panda; |
472 | On The Angular Update and Hyperparameter Tuning of A Scale-Invariant Network Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We first find a common feature of good hyperparameter combinations on such a scale-invariant network, including learning rate, weight decay, number of data samples, and batch size. Then we observe that hyperparameter setups that lead to good performance show similar degrees of angular update during one epoch. |
Juseung Yun; Janghyeon Lee; Hyounguk Shon; Eojindl Yi; Seung Hwan Kim; Junmo Kim; |
473 | LANA: Latency Aware Network Acceleration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce latency-aware network acceleration (LANA)-an approach that builds on neural architecture search technique to accelerate neural networks. |
Pavlo Molchanov; Jimmy Hall; Hongxu Yin; Jan Kautz; Nicolo Fusi; Arash Vahdat; |
474 | RDO-Q: Extremely Fine-Grained Channel-Wise Quantization Via Rate-Distortion Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we address the problem of efficiently exploring the hyperparameter space of channel bit widths. |
Zhe Wang; Jie Lin; Xue Geng; Mohamed M. Sabry Aly; Vijay Chandrasekhar; |
475 | U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel hardware-aware NAS framework that does not only optimize for task accuracy and inference latency, but also for resource utilization. |
Ahmet Caner Yüzügüler; Nikolaos Dimitriadis; Pascal Frossard; |
476 | PTQ4ViT: Post-Training Quantization for Vision Transformers with Twin Uniform Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the twin uniform quantization method to reduce the quantization error on these activation values. |
Zhihang Yuan; Chenhao Xue; Yiqi Chen; Qiang Wu; Guangyu Sun; |
477 | Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Deep neural network quantization with adaptive bitwidths has gained increasing attention due to the ease of model deployment on various platforms with different resource budgets. In this paper, we propose a meta-learning approach to achieve this goal. |
Jiseok Youn; Jaehun Song; Hyung-Sin Kim; Saewoong Bahk; |
478 | Understanding The Dynamics of DNNs Using Graph Modularity Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we move a tiny step towards understanding the dynamics of feature representations over layers. |
Yao Lu; Wen Yang; Yunzhe Zhang; Zuohui Chen; Jinyin Chen; Qi Xuan; Zhen Wang; Xiaoniu Yang; |
479 | Latent Discriminant Deterministic Uncertainty Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most successful approaches are computationally intensive. In this work, we attempt to address these challenges in the context of autonomous driving perception tasks. |
Gianni Franchi; Xuanlong Yu; Andrei Bursuc; Emanuel Aldea; Severine Dubuisson; David Filliat; |
480 | Making Heads or Tails: Towards Semantically Consistent Visual Counterfactuals Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a novel framework for computing visual counterfactual explanations based on two key ideas. |
Simon Vandenhende; Dhruv Mahajan; Filip Radenovic; Deepti Ghadiyaram; |
481 | HIVE: Evaluating The Human Interpretability of Visual Explanations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce HIVE (Human Interpretability of Visual Explanations), a novel human evaluation framework that assesses the utility of explanations to human users in AI-assisted decision making scenarios, and enables falsifiable hypothesis testing, cross-method comparison, and human-centered evaluation of visual interpretability methods. |
Sunnie S. Y. Kim; Nicole Meister; Vikram V. Ramaswamy; Ruth Fong; Olga Russakovsky; |
482 | BayesCap: Bayesian Identity Cap for Calibrated Uncertainty in Frozen Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Moreover, many of the high-performing deep learning models that are already trained and deployed are non-Bayesian in nature, and do not provide uncertainty estimates. To address these issues, we propose BayesCap that learns a Bayesian identity mapping for the frozen model, allowing uncertainty estimation. |
Uddeshya Upadhyay; Shyamgopal Karthik; Yanbei Chen; Massimiliano Mancini; Zeynep Akata; |
483 | SESS: Saliency Enhancing with Scaling and Sliding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel saliency enhancing approach called \textbf{SESS} (\textbf{S}aliency \textbf{E}nhancing with \textbf{S}caling and \textbf{S}liding). |
Osman Tursun; Simon Denman; Sridha Sridharan; Clinton Fookes; |
484 | No Token Left Behind: Explainability-Aided Image Classification and Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To mitigate it, we present a novel explainability-based approach, which adds a loss term to ensure that CLIP focuses on all relevant semantic parts of the input, in addition to employing the CLIP similarity loss used in previous works. |
Roni Paiss; Hila Chefer; Lior Wolf; |
485 | Interpretable Image Classification with Differentiable Prototypes Assignment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address those shortcomings, we introduce ProtoPool, an interpretable prototype-based model with positive reasoning and three main novelties. |
Dawid Rymarczyk; Łukasz Struski; Michał Górszczak; Koryna Lewandowska; Jacek Tabor; Bartosz Zieliński; |
486 | Contributions of Shape, Texture, and Color in Visual Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We investigate the contributions of three important features of the human visual system (HVS) — shape, texture, and color — to object classification. |
Yunhao Ge; Yao Xiao; Zhi Xu; Xingrui Wang; Laurent Itti; |
487 | STEEX: Steering Counterfactual Explanations with Semantics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we address the problem of producing counterfactual explanations for high-quality images and complex scenes. |
Paul Jacob; Éloi Zablocki; Hédi Ben-Younes; Mickaël Chen; Patrick Pérez; Matthieu Cord; |
488 | Are Vision Transformers Robust to Patch Perturbations? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we study the robustness of ViT to patch-wise perturbations. |
Jindong Gu; Volker Tresp; Yao Qin; |
489 | A Dataset Generation Framework for Evaluating Megapixel Image Classifiers & Their Explanations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To investigate classification and explanation performance, we introduce a framework to (a) generate synthetic control images that reflect common properties of megapixel images and (b) evaluate average test-set correctness. |
Gautam Machiraju; Sylvia Plevritis; Parag Mallick; |
490 | Cartoon Explanations of Image Classifiers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present CartoonX (Cartoon Explanation), a novel model-agnostic explanation method tailored towards image classifiers and based on the rate-distortion explanation (RDE) framework. |
Stefan Kolek; Duc Anh Nguyen; Ron Levie; Joan Bruna; Gitta Kutyniok; |
491 | Shap-CAM: Visual Explanations for Convolutional Neural Networks Based on Shapley Value Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we develop a novel post-hoc visual explanation method called Shap-CAM based on class activation mapping. |
Quan Zheng; Ziwei Wang; Jie Zhou; Jiwen Lu; |
492 | Privacy-Preserving Face Recognition with Learnable Privacy Budgets in Frequency Domain Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a privacy-preserving face recognition method using differential privacy in the frequency domain. |
Jiazhen Ji; Huan Wang; Yuge Huang; Jiaxiang Wu; Xingkun Xu; Shouhong Ding; ShengChuan Zhang; Liujuan Cao; Rongrong Ji; |
493 | Contrast-Phys: Unsupervised Video-Based Remote Physiological Measurement Via Spatiotemporal Contrast Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an unsupervised rPPG measurement method that does not require ground truth signals for training. |
Zhaodong Sun; Xiaobai Li; |
494 | Source-Free Domain Adaptation with Contrastive Domain Alignment and Self-Supervised Exploration for Face Anti-Spoofing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel Source-free Domain Adaptation framework for Face Anti-Spoofing, namely SDA-FAS, that addresses the problems of source knowledge adaptation and target data exploration under the source-free setting. |
Yuchen Liu; Yabo Chen; Wenrui Dai; Mengran Gou; Chun-Ting Huang; Hongkai Xiong; |
495 | On Mitigating Hard Clusters for Face Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce two novel modules, Neighborhood-Diffusion-based Density (NDDe) and Transition-Probability-based Distance (TPDi), based on which we can simply apply the standard Density Peak Clustering algorithm with a uniform threshold. |
Yingjie Chen; Huasong Zhong; Chong Chen; Chen Shen; Jianqiang Huang; Tao Wang; Yun Liang; Qianru Sun; |
496 | OneFace: One Threshold for All Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we rethink the limitations of existing evaluation protocols for FR and propose to evaluate the performance of FR models from a new perspective. |
Jiaheng Liu; Zhipeng Yu; Haoyu Qin; Yichao Wu; Ding Liang; Gangming Zhao; Ke Xu; |
497 | Label2Label: A Language Modeling Framework for Multi-Attribute Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a simple yet generic framework named Label2Label to exploit the complex attribute correlations. |
Wanhua Li; Zhexuan Cao; Jianjiang Feng; Jie Zhou; Jiwen Lu; |
498 | AgeTransGAN for Facial Age Transformation with Rectified Performance Metrics Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose the AgeTransGAN for facial age transformation and the improvements to the metrics for performance evaluation. |
Gee-Sern Hsu; Rui-Cang Xie; Zhi-Ting Chen; Yu-Hong Lin; |
499 | Hierarchical Contrastive Inconsistency Learning for Deepfake Video Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel Hierarchical Contrastive Inconsistency Learning framework (HCIL) with a two-level contrastive paradigm. |
Zhihao Gu; Taiping Yao; Yang Chen; Shouhong Ding; Lizhuang Ma; |
500 | Rethinking Robust Representation Learning Under Fine-Grained Noisy Faces Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Different types of noisy faces can be generated by adjusting the values of N, K, and C. Based on this unified formulation, we found that the main barrier behind the noise-robust representation learning is the flexibility of the algorithm under different N, K, and C. For this potential problem, we constructively propose a new method, named Evolving Sub-centers Learning (ESL), to find optimal hyperplanes to accurately describe the latent space of massive noisy faces. |
Bingqi Ma; Guanglu Song; Boxiao Liu; Yu Liu; |
501 | Teaching Where to Look: Attention Similarity Knowledge Distillation for Low Resolution Face Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose an attention similarity knowledge distillation approach, which transfers attention maps obtained from a high resolution (HR) network as a teacher into an LR network as a student to boost LR recognition performance. |
Sungho Shin; Joosoon Lee; Junseok Lee; Yeonguk Yu; Kyoobin Lee; |
502 | Teaching with Soft Label Smoothing for Mitigating Noisy Labels in Facial Expressions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Recent studies have highlighted the problem of noisy labels in large scale in-the-wild facial expressions datasets due to the uncertainties caused by ambiguous facial expressions, low-quality facial images, and the subjectiveness of annotators. To solve the problem of noisy labels, we propose Soft Label Smoothing (SLS), which smooths out multiple high-confidence classes in the logits by assigning them a probability based on the corresponding confidence, and at the same time assigning a fixed low probability to the low-confidence classes. |
Tohar Lukov; Na Zhao; Gim Hee Lee; Ser-Nam Lim; |
503 | Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Dynamic Facial Radiance Fields (DFRF) for few-shot talking head synthesis, which can rapidly generalize to an unseen identity with few training data. |
Shuai Shen; Wanhua Li; Zheng Zhu; Yueqi Duan; Jie Zhou; Jiwen Lu; |
504 | CoupleFace: Relation Matters for Face Recognition Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we observe that mutual relation knowledge between samples is also important to improve the discriminative ability of the learned representation of the student model, and propose an effective face recognition distillation method called CoupleFace by additionally introducing the Mutual Relation Distillation (MRD) into the existing distillation framework. |
Jiaheng Liu; Haoyu Qin; Yichao Wu; Jinyang Guo; Ding Liang; Ke Xu; |
505 | Controllable and Guided Face Synthesis for Unconstrained Face Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although significant advances have been made in face recognition (FR), FR in unconstrained environments remains challenging due to the domain gap between the semi-constrained training datasets and unconstrained testing scenarios. To address this problem, we propose a controllable face synthesis model (CFSM) that can mimic the distribution of target datasets in a style latent space. |
Feng Liu; Minchul Kim; Anil Jain; Xiaoming Liu; |
506 | Towards Robust Face Recognition with Comprehensive Search Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Previously, the research community tried to improve the performance of each single aspect but failed to present a unified solution for the joint search of the optimal designs for all three aspects. In this paper, we for the first time identify that these aspects are tightly coupled to each other. |
Manyuan Zhang; Guanglu Song; Yu Liu; Hongsheng Li; |
507 | Towards Unbiased Label Distribution Learning for Facial Pose Estimation Using Anisotropic Spherical Gaussian Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an Anisotropic Spherical Gaussian (ASG)-based LDL approach for facial pose estimation. |
Zhiwen Cao; Dongfang Liu; Qifan Wang; Yingjie Chen; |
508 | AU-Aware 3D Face Reconstruction Through Personalized AU-Specific Blendshape Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a multi-stage learning framework that recovers AU-interpretable 3D facial details by learning personalized AU-specific blendshapes from images. |
Chenyi Kuang; Zijun Cui; Jeffrey O. Kephart; Qiang Ji; |
509 | BézierPalm: A Free Lunch for Palmprint Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, by observing that palmar creases are the key information to deep-learning-based palmprint recognition, we propose to synthesize training data by manipulating palmar creases. |
Kai Zhao; Lei Shen; Yingyi Zhang; Chuhan Zhou; Tao Wang; Ruixin Zhang; Shouhong Ding; Wei Jia; Wei Shen; |
510 | Adaptive Transformers for Robust Few-Shot Cross-Domain Face Anti-Spoofing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present adaptive vision transformers (ViT) for robust cross-domain face anti-spoofing. |
Hsin-Ping Huang; Deqing Sun; Yaojie Liu; Wen-Sheng Chu; Taihong Xiao; Jinwei Yuan; Hartwig Adam; Ming-Hsuan Yang; |
511 | Face2Face^ρ: Real-Time High-Resolution One-Shot Face Reenactment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce Face2Face^ρ, the first Real-time High-resolution and One-shot (RHO, ρ) face reenactment framework. |
Kewei Yang; Kang Chen; Daoliang Guo; Song-Hai Zhang; Yuan-Chen Guo; Weidong Zhang; |
512 | Towards Racially Unbiased Skin Tone Estimation Via Scene Disambiguation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We find that current methods are biased towards light skin tones due to (1) strongly biased priors that prefer lighter pigmentation and (2) algorithmic solutions that disregard the light/albedo ambiguity. To address this, we propose a new evaluation dataset (FAIR) and an algorithm (TRUST) to improve albedo estimation and, hence, fairness. |
Haiwen Feng; Timo Bolkart; Joachim Tesch; Michael J. Black; Victoria Abrevaya; |
513 | BoundaryFace: A Mining Framework with Noise Label Self-Correction for Face Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, starting from the perspective of decision boundary, we propose a novel mining framework that focuses on the relationship between a sample’s ground truth class center and its nearest negative class center. |
Shijie Wu; Xun Gong; |
514 | Pre-training Strategies and Datasets for Facial Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our main two findings are: (1) Unsupervised pre-training on completely in-the-wild, uncurated data provides consistent and, in some cases, significant accuracy improvements for all facial tasks considered. (2) Many existing facial video datasets seem to have a large amount of redundancy. |
Adrian Bulat; Shiyang Cheng; Jing Yang; Andrew Garbett; Enrique Sanchez; Georgios Tzimiropoulos; |
515 | Look Both Ways: Self-Supervising Driver Gaze Estimation and Road Scene Saliency Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a new on-road driving dataset, called “Look Both Ways”, which contains synchronized video of both driver faces and the forward road scene, along with ground truth gaze data registered from eye tracking glasses worn by the drivers. |
Isaac Kasahara; Simon Stent; Hyun Soo Park; |
516 | MFIM: Megapixel Facial Identity Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel face-swapping framework called Megapixel Facial Identity Manipulation (MFIM). |
Sanghyeon Na; |
517 | 3D Face Reconstruction with Dense Landmarks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In answer, we present the first method that accurately predicts 10x as many landmarks as usual, covering the whole head, including the eyes and teeth. |
Erroll Wood; Tadas Baltrušaitis; Charlie Hewitt; Matthew Johnson; Jingjing Shen; Nikola Milosavljević; Daniel Wilde; Stephan Garbin; Toby Sharp; Ivan Stojiljković; Tom Cashman; Julien Valentin; |
518 | Emotion-Aware Multi-View Contrastive Learning for Facial Emotion Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a novel approach to generate features related to emotional expression through feature transformation and to use them for emotional representation learning. |
Daeha Kim; Byung Cheol Song; |
519 | Order Learning Using Partially Ordered Data Via Chainization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose the chainization algorithm for effective order learning when only partially ordered data are available. |
Seon-Ho Lee; Chang-Su Kim; |
520 | Unsupervised High-Fidelity Facial Texture Generation and Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel unified pipeline for both tasks, generation of texture with coupled geometry, and reconstruction of high-fidelity texture. |
Ron Slossberg; Ibrahim Jubran; Ron Kimmel; |
521 | Multi-Domain Learning for Updating Face Anti-Spoofing Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we study multi-domain learning for face anti-spoofing (MD-FAS), where a pre-trained FAS model needs to be updated to perform equally well on both source and target domains while only using target domain data for updating. |
Xiao Guo; Yaojie Liu; Anil Jain; Xiaoming Liu; |
522 | Towards Metrical Reconstruction of Human Faces Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we take advantage of a face recognition network pretrained on a large-scale 2D image dataset, which provides distinct features for different faces and is robust to expression, illumination, and camera changes. Using these features, we train our face shape estimator in a supervised fashion, inheriting the robustness and generalization of the face recognition network. |
Wojciech Zielonka; Timo Bolkart; Justus Thies; |
523 | Discover and Mitigate Unknown Biases with Debiasing Alternate Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To resolve those problems, we propose Debiasing Alternate Networks (DebiAN), which comprises two networks—a Discoverer and a Classifier. |
Zhiheng Li; Anthony Hoogs; Chenliang Xu; |
524 | Unsupervised and Semi-Supervised Bias Benchmarking in Face Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce Semi-supervised Performance Evaluation for Face Recognition (SPE-FR). |
Alexandra Chouldechova; Siqi Deng; Yongxin Wang; Wei Xia; Pietro Perona; |
525 | Towards Efficient Adversarial Training on Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we first comprehensively study fast adversarial training on a variety of vision transformers and illustrate the relationship between efficiency and robustness. Then, to expedite adversarial training on ViTs, we propose an efficient Attention Guided Adversarial Training mechanism. |
Boxi Wu; Jindong Gu; Zhifeng Li; Deng Cai; Xiaofei He; Wei Liu; |
526 | MIME: Minority Inclusion for Majority Group Enhancement of AI Performance Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we make the surprising finding that including minority samples can improve test error for the majority group. |
Pradyumna Chari; Yunhao Ba; Shreeram Athreya; Achuta Kadambi; |
527 | Studying Bias in GANs Through The Lens of Race Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we study how the performance and evaluation of generative image models are impacted by the racial composition of the datasets upon which these models are trained. |
Vongani H. Maluleke; Neerja Thakkar; Tim Brooks; Ethan Weber; Trevor Darrell; Alexei A. Efros; Angjoo Kanazawa; Devin Guillory; |
528 | Trust, But Verify: Using Self-Supervised Probing to Improve Trustworthiness Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a new approach of self-supervised probing, which enables us to check and mitigate the overconfidence issue for a trained model, thereby improving its trustworthiness. |
Ailin Deng; Shen Li; Miao Xiong; Zhirui Chen; Bryan Hooi; |
529 | Learning to Censor By Noisy Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The goal of this work is to protect sensitive information when learning from point clouds by censoring signal before the point cloud is released for downstream tasks. |
Ayush Chopra; Abhinav Java; Abhishek Singh; Vivek Sharma; Ramesh Raskar; |
530 | An Invisible Black-Box Backdoor Attack Through Frequency Domain Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a simple but effective and invisible black-box backdoor attack FTrojan through trojaning the frequency domain. |
Tong Wang; Yuan Yao; Feng Xu; Shengwei An; Hanghang Tong; Ting Wang; |
531 | FairGRAPE: Fairness-Aware GRAdient Pruning MEthod for Face Attribute Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel pruning method, Fairness-aware GRAdient Pruning mEthod (FairGRAPE), that minimizes the disproportionate impacts of pruning on different sub-groups. |
Xiaofeng Lin; Seungbae Kim; Jungseock Joo; |
532 | Attaining Class-Level Forgetting in Pretrained Model Using Few Samples Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The available data may also be limited due to privacy/ethical concerns, and re-training the model will not be possible. We propose a novel approach to address this problem without affecting the model’s prediction power for the remaining classes. |
Pravendra Singh; Pratik Mazumder; Mohammed Asad Karim; |
533 | Anti-Neuron Watermarking: Protecting Personal Data Against Unauthorized Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We study protecting a user’s data (e.g., images in this work) against a learner’s unauthorized use in training neural networks. |
Zihang Zou; Boqing Gong; Liqiang Wang; |
534 | An Impartial Take to The CNN Vs Transformer Robustness Contest Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we perform extensive empirical analyses showing that recent state-of-the-art CNNs (particularly, ConvNeXt) can be as robust and reliable as, and sometimes even more so than, the current state-of-the-art Transformers. |
Francesco Pinto; Philip H. S. Torr; Puneet K. Dokania; |
535 | Recover Fair Deep Classification Models Via Altering Pre-trained Structure Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a novel intra-processing method to improve model fairness by altering the deep network structure. |
Yanfu Zhang; Shangqian Gao; Heng Huang; |
536 | Decouple-and-Sample: Protecting Sensitive Information in Task Agnostic Data Release Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose sanitizer, a framework for secure and task-agnostic data release. |
Abhishek Singh; Ethan Garza; Ayush Chopra; Praneeth Vepakomma; Vivek Sharma; Ramesh Raskar; |
537 | Privacy-Preserving Action Recognition Via Motion Difference Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: On the one hand, we want these systems to assist in our daily lives by understanding their surroundings, but on the other hand, we want them to do so without capturing any sensitive information. Towards this direction, this paper proposes a simple, yet robust privacy-preserving encoder called BDQ for the task of privacy-preserving human action recognition that is composed of three modules: Blur, Difference, and Quantization. |
Sudhakar Kumawat; Hajime Nagahara; |
538 | Latent Space Smoothing for Individually Fair Representations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce LASSI, the first representation learning method for certifying individual fairness of high-dimensional data. |
Momchil Peychev; Anian Ruoss; Mislav Balunović; Maximilian Baader; Martin Vechev; |
539 | Parameterized Temperature Scaling for Boosting The Expressive Power in Post-Hoc Uncertainty Calibration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We address the problem of uncertainty calibration and introduce a novel calibration method, Parametrized Temperature Scaling (PTS). |
Christian Tomani; Daniel Cremers; Florian Buettner; |
540 | FairStyle: Debiasing StyleGAN2 with Style Channel Manipulations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a method for directly modifying a pre-trained StyleGAN2 model that can be used to generate a balanced set of images with respect to one (e.g., eyeglasses) or more attributes (e.g., gender and eyeglasses). |
Cemre Efe Karakas; Alara Dirik; Eylül Yalçınkaya; Pinar Yanardag; |
541 | Distilling The Undistillable: Learning from A Nasty Teacher Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, we analyze Nasty Teacher from two different directions and subsequently leverage them carefully to develop simple yet efficient methodologies, named HTC and SCM, which increase the learning from Nasty Teacher by up to 68.63% on standard datasets. |
Surgan Jandial; Yash Khasbage; Arghya Pal; Vineeth N Balasubramanian; Balaji Krishnamurthy; |
542 | SOS! Self-Supervised Learning Over Sets of Handled Objects in Egocentric Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To overcome both limitations, we introduce Self-supervised learning Over Sets (SOS), an approach to pre-train a generic Objects In Contact (OIC) representation model from video object regions detected by an off-the-shelf hand-object contact detector. |
Victor Escorcia; Ricardo Guerrero; Xiatian Zhu; Brais Martinez; |
543 | Egocentric Activity Recognition and Localization on A 3D Map Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Given a video captured from a first person perspective and the environment context of where the video is recorded, can we recognize what the person is doing and identify where the action occurs in the 3D space? We address this challenging problem of jointly recognizing and localizing actions of a mobile user on a known 3D map from egocentric videos. |
Miao Liu; Lingni Ma; Kiran Somasundaram; Yin Li; Kristen Grauman; James M. Rehg; Chao Li; |
544 | Generative Adversarial Network for Future Hand Segmentation from Egocentric Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce the novel problem of anticipating a time series of future hand masks from egocentric video. |
Wenqi Jia; Miao Liu; James M. Rehg; |
545 | My View Is The Best View: Procedure Learning from Egocentric Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we present a novel self-supervised Correspond and Cut (CnC) framework for procedure learning. |
Siddhant Bansal; Chetan Arora; C.V. Jawahar; |
546 | GIMO: Gaze-Informed Human Motion Prediction in Context Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To reduce the gap, we propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, as well as ego-centric views with the eye gaze that serves as a surrogate for inferring human intent. |
Yang Zheng; Yanchao Yang; Kaichun Mo; Jiaman Li; Tao Yu; Yebin Liu; Karen Liu; Leonidas J. Guibas; |
547 | Image-Based CLIP-Guided Essence Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our blending operator combines the powerful StyleGAN generator and the semantic encoder of CLIP in a novel way that is simultaneously additive in both latent spaces, resulting in a mechanism that guarantees both identity preservation and high-level feature transfer without relying on a facial recognition network. |
Hila Chefer; Sagie Benaim; Roni Paiss; Lior Wolf; |
548 | Detecting and Recovering Sequential DeepFake Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This new threat requires us to detect a sequence of facial manipulations, which is vital for both detecting deepfake media and recovering original faces afterwards. Motivated by this observation, we emphasize the need and propose a novel research problem called Detecting Sequential DeepFake Manipulation (Seq-DeepFake). |
Rui Shao; Tianxing Wu; Ziwei Liu; |
549 | Self-Supervised Sparse Representation for Video Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To establish a unified approach to solving the two VAD settings, we introduce a self-supervised sparse representation (S3R) framework that models the concept of anomaly at feature level by exploring the synergy between dictionary-based representation and self-supervised learning. |
Jhih-Ciang Wu; He-Yen Hsieh; Ding-Jie Chen; Chiou-Shann Fuh; Tyng-Luh Liu; |
550 | Watermark Vaccine: Adversarial Attacks to Prevent Watermark Removal Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by the vulnerability of DNNs on adversarial perturbations, we propose a novel defence mechanism by adversarial machine learning for good. |
Xinwei Liu; Jian Liu; Yang Bai; Jindong Gu; Tao Chen; Xiaojun Jia; Xiaochun Cao; |
551 | Explaining Deepfake Detection By Analysing Image Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper aims to interpret how deepfake detection models learn artifact features of images when just supervised by binary labels. |
Shichao Dong; Jin Wang; Jiajun Liang; Haoqiang Fan; Renhe Ji; |
552 | FrequencyLowCut Pooling – Plug & Play Against Catastrophic Overfitting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Recent work [18] in the context of adversarial attacks and distribution shifts showed that there is a strong correlation between the vulnerability of CNNs and aliasing artifacts induced by poor down-sampling operations. This paper builds on these findings and introduces an aliasing-free down-sampling operation which can easily be plugged into any CNN architecture: FrequencyLowCut pooling. |
Julia Grabinski; Steffen Jung; Janis Keuper; Margret Keuper; |
553 | TAFIM: Targeted Adversarial Attacks Against Facial Image Manipulations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we introduce a novel data-driven approach that produces image-specific perturbations which are embedded in the original images. |
Shivangi Aneja; Lev Markhasin; Matthias Nießner; |
554 | FingerprintNet: Synthesized Fingerprints for Generated Image Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To overcome this problem, we analyze the distinctive characteristic of the generated images called ‘fingerprints,’ and propose a new framework to reproduce diverse types of fingerprints generated by various generative models. |
Yonghyun Jeong; Doyeon Kim; Youngmin Ro; Pyounggeon Kim; Jongwon Choi; |
555 | Detecting Generated Images By Real Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We observed that the noise pattern of real images exhibits similar characteristics in the frequency domain, while the generated images are far different. Therefore, we can perform image authentication by checking whether an image follows the patterns of authentic images. |
Bo Liu; Fan Yang; Xiuli Bi; Bin Xiao; Weisheng Li; Xinbo Gao; |
556 | An Information Theoretic Approach for Attention-Driven Face Forgery Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our key observation is that most of the forgery clues are hidden in the informative region, which can be measured quantitatively by classical information maximization theory. Motivated by this, we make the first attempt to introduce the self-information metric to enhance the forgery feature representation. |
Ke Sun; Hong Liu; Taiping Yao; Xiaoshuai Sun; Shen Chen; Shouhong Ding; Rongrong Ji; |
557 | Exploring Disentangled Content Information for Face Forgery Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We observe that the detector is prone to focus more on content information than artifact traces, suggesting that the detector is sensitive to the intrinsic bias of the dataset, which leads to severe overfitting. Motivated by this key observation, we design an easily embeddable disentanglement framework for content information removal, and further propose a Content Consistency Constraint (C2C) and a Global Representation Contrastive Constraint (GRCC) to enhance the independence of disentangled features. |
Jiahao Liang; Huafeng Shi; Weihong Deng; |
558 | RepMix: Representation Mixing for Robust Attribution of Synthesized Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Rapid advances in Generative Adversarial Networks (GANs) raise new challenges for image attribution: detecting whether an image is synthetic and, if so, determining which GAN architecture created it. Uniquely, we present a solution to this task capable of 1) matching images invariant to their semantic content, and 2) remaining robust to benign transformations (changes in quality, resolution, shape, etc.) commonly encountered as images are re-shared online. |
Tu Bui; Ning Yu; John Collomosse; |
559 | Totems: Physical Objects for Verifying Visual Integrity Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a new approach to image forensics: placing physical refractive objects, which we call totems, into a scene so as to protect any photograph taken of that scene. |
Jingwei Ma; Lucy Chai; Minyoung Huh; Tongzhou Wang; Ser-Nam Lim; Phillip Isola; Antonio Torralba; |
560 | Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Such a reconstruction constraint spends much effort on frame-level temporal context changes without focusing on video-level global semantics that are more useful for retrieval. Hence, we address this problem by decomposing video information into reconstruction-dependent and semantic-dependent information, which disentangles semantic extraction from the reconstruction constraint. |
Pandeng Li; Hongtao Xie; Jiannan Ge; Lei Zhang; Shaobo Min; Yongdong Zhang; |
561 | PASS: Part-Aware Self-Supervised Pre-training for Person Re-identification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a ReID-specific pre-training method, Part-Aware Self-Supervised pre-training (PASS), which can generate part-level features to offer fine-grained information and is more suitable for ReID. |
Kuan Zhu; Haiyun Guo; Tianyi Yan; Yousong Zhu; Jinqiao Wang; Ming Tang; |
562 | Adaptive Cross-Domain Learning for Generalizable Person Re-identification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Most existing methods struggle to deal with the shared and specific characteristics among different domains, which is known as the domain conflict problem. To address this problem, we present an Adaptive Cross-domain Learning (ACL) framework equipped with a CrOss-Domain Embedding Block (CODE-Block) to maintain a common feature space for capturing both the domain-invariant and the domain-specific features, while dynamically mining the relations across different domains. |
Pengyi Zhang; Huanzhang Dou; Yunlong Yu; Xi Li; |
563 | Multi-Query Video Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Despite recent progress, imperfect annotations in existing video retrieval datasets have posed significant challenges on model evaluation and development. In this paper, we tackle this issue by focusing on the less-studied setting of multi-query video retrieval, where multiple descriptions are provided to the model for searching over the video archive. |
Zeyu Wang; Yu Wu; Karthik Narasimhan; Olga Russakovsky; |
564 | Hierarchical Average Precision Training for Pertinent Image Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper introduces a new hierarchical AP training method for pertinent image retrieval (HAPPIER). |
Elias Ramzi; Nicolas Audebert; Nicolas Thome; Clément Rambour; Xavier Bitot; |
565 | Learning Semantic Correspondence with Sparse Annotations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we aim to address the challenge of label sparsity in semantic correspondence by enriching supervision signals from sparse keypoint annotations. |
Shuaiyi Huang; Luyu Yang; Bo He; Songyang Zhang; Xuming He; Abhinav Shrivastava; |
566 | Dynamically Transformed Instance Normalization Network for Generalizable Person Re-identification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a new normalization scheme called Dynamically Transformed Instance Normalization (DTIN) to alleviate the drawback of IN. |
Bingliang Jiao; Lingqiao Liu; Liying Gao; Guosheng Lin; Lu Yang; Shizhou Zhang; Peng Wang; Yanning Zhang; |
567 | Domain Adaptive Person Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we take a further step and present Domain Adaptive Person Search (DAPS), which aims to generalize the model from a labeled source domain to the unlabeled target domain. |
Junjie Li; Yichao Yan; Guanshuo Wang; Fufu Yu; Qiong Jia; Shouhong Ding; |
568 | TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Token Shift and Selection Network (TS2-Net), a novel token shift and selection transformer architecture, which dynamically adjusts the token sequence and selects informative tokens in both temporal and spatial dimensions from input video samples. |
Yuqi Liu; Pengfei Xiong; Luhui Xu; Shengming Cao; Qin Jin; |
569 | Unstructured Feature Decoupling for Vehicle Re-identification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To align the features without requiring additional annotation, this paper proposes an Unstructured Feature Decoupling Network (UFDN), which consists of a transformer-based feature decomposing head (TDH) and a novel cluster-based decoupling constraint (CDC). |
Wen Qian; Hao Luo; Silong Peng; Fan Wang; Chen Chen; Hao Li; |
570 | Deep Hash Distillation for Image Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel self-distilled hashing scheme to minimize the discrepancy while exploiting the potential of augmented data. |
Young Kyun Jang; Geonmo Gu; Byungsoo Ko; Isaac Kang; Nam Ik Cho; |
571 | Mimic Embedding Via Adaptive Aggregation: Learning Generalizable Person Re-identification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To handle the two issues above, this paper presents a new approach called Mimic Embedding via adapTive Aggregation (META) for DG person ReID. |
Boqiang Xu; Jian Liang; Lingxiao He; Zhenan Sun; |
572 | Granularity-Aware Adaptation for Image Retrieval Over Multiple Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We address it with the proposed Grappa, an approach that starts from a strong pretrained model, and adapts it to tackle multiple retrieval tasks concurrently, using only unlabeled images from the different task domains. |
Jon Almazán; Byungsoo Ko; Geonmo Gu; Diane Larlus; Yannis Kalantidis; |
573 | Learning Audio-Video Modalities from Image Captions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Obtaining large-scale, high quality data for video in the form of text-video and text-audio pairs however, is more challenging. To close this gap we propose a new video mining pipeline which involves transferring captions from image captioning datasets to video clips with no additional manual effort. |
Arsha Nagrani; Paul Hongsuck Seo; Bryan Seybold; Anja Hauth; Santiago Manen; Chen Sun; Cordelia Schmid; |
574 | RVSL: Robust Vehicle Similarity Learning in Real Hazy Scenes Based on Semi-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Though some strategies could resolve this problem, they still leave room for improvement due to limited performance in real-world scenarios and the lack of real-world clear ground truth. Thus, to resolve this problem, inspired by CycleGAN, we construct a training paradigm called RVSL which integrates ReID and domain transformation techniques. |
Wei-Ting Chen; I-Hsiang Chen; Chih-Yuan Yeh; Hao-Hsiang Yang; Hua-En Chang; Jian-Jiun Ding; Sy-Yen Kuo; |
575 | Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we revisit feature fusion, an old-fashioned topic, in the new context of text-to-video retrieval. |
Fan Hu; Aozhu Chen; Ziyue Wang; Fangming Zhou; Jianfeng Dong; Xirong Li; |
576 | Modality Synergy Complement Learning with Cascaded Aggregation for Visible-Infrared Person Re-identification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Differently, this paper proposes a novel framework, named Modality Synergy Complement Learning Network (MSCLNet) with Cascaded Aggregation. |
Yiyuan Zhang; Sanyuan Zhao; Yuhao Kang; Jianbing Shen; |
577 | Cross-Modality Transformer for Visible-Infrared Person Re-identification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, these methods usually damage the modality-specific information and identification information contained in the features. To alleviate the above issues, we propose a novel Cross-Modality Transformer (CMT) to jointly explore a modality-level alignment module and an instance-level module for VI-ReID. |
Kongzhu Jiang; Tianzhu Zhang; Xiang Liu; Bingqiao Qian; Yongdong Zhang; Feng Wu; |
578 | Audio-Visual Mismatch-Aware Video Retrieval Via Association and Adjustment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Mismatch condition can be categorized into two cases: (i) Audio itself does not exist, (ii) Audio exists but does not match with visual. To deal with (i), we introduce audio-visual associative memory (AVA-Memory) to associate audio cues even from videos without audio data. The associated audio cues can guide the video embedding feature to be aware of audio information even in the missing audio condition. To address (ii), we propose audio embedding adjustment by considering the degree of matching between visual and audio data. |
Sangmin Lee; Sungjune Park; Yong Man Ro; |
579 | Connecting Compression Spaces with Transformer for Approximate Nearest Neighbor Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a generic feature compression method for Approximate Nearest Neighbor Search (ANNS) problems, which speeds up existing ANNS methods in a plug-and-play manner. |
Haokui Zhang; Buzhou Tang; Wenze Hu; Xiaoyu Wang; |
580 | SEMICON: A Learning-to-Hash Solution for Large-Scale Fine-Grained Image Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Suppression-Enhancing Mask based attention and Interactive Channel transformatiON (SEMICON) to learn binary hash codes for dealing with large-scale fine-grained image retrieval tasks. |
Yang Shen; Xuhao Sun; Xiu-Shen Wei; Qing-Yuan Jiang; Jian Yang; |
581 | CAViT: Contextual Alignment Vision Transformer for Video Object Re-identification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, there are dilemmas within existing approaches: (1) 3D solutions model the spatio-temporal interaction but are often troubled by the misalignment of adjacent frames, and (2) 2D solutions adopt a divide-and-conquer strategy against the misalignment but cannot take advantage of the spatio-temporal interactions. To address the above problems, we propose a Contextual Alignment Vision Transformer (CAViT) to model the spatio-temporal interaction with a 2D solution. |
Jinlin Wu; Lingxiao He; Wu Liu; Yang Yang; Zhen Lei; Tao Mei; Stan Z. Li; |
582 | Text-Based Temporal Localization of Novel Events Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Moreover, acquiring videos and text comprising all possible scenarios for training is not practical. In this regard, this paper introduces and tackles the problem of text-based temporal localization of novel/unseen events. |
Sudipta Paul; Niluthpol Chowdhury Mithun; Amit K. Roy-Chowdhury; |
583 | Reliability-Aware Prediction Via Uncertainty Learning for Person Image Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, they rarely describe the reliability of the prediction. In this paper, we propose an Uncertainty-Aware Learning (UAL) method to remedy this issue. |
Zhaopeng Dou; Zhongdao Wang; Weihua Chen; Yali Li; Shengjin Wang; |
584 | Relighting4D: Neural Relightable Human from Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a principled framework, Relighting4D, that enables free-viewpoint relighting from only human videos under unknown illuminations. |
Zhaoxi Chen; Ziwei Liu; |
585 | Real-Time Intermediate Flow Estimation for Video Frame Interpolation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose RIFE, a Real-time Intermediate Flow Estimation algorithm for VFI. |
Zhewei Huang; Tianyuan Zhang; Wen Heng; Boxin Shi; Shuchang Zhou; |
586 | PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a progressive pixel synthesis network towards efficient image generation, coined as PixelFolder. |
Jing He; Yiyi Zhou; Qi Zhang; Jun Peng; Yunhang Shen; Xiaoshuai Sun; Chao Chen; Rongrong Ji; |
587 | StyleSwap: Style-Based Generator Empowers Robust Face Swapping Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we introduce a concise and effective framework named StyleSwap. |
Zhiliang Xu; Hang Zhou; Zhibin Hong; Ziwei Liu; Jiaming Liu; Zhizhi Guo; Junyu Han; Jingtuo Liu; Errui Ding; Jingdong Wang; |
588 | Paint2Pix: Interactive Painting Based Progressive Image Synthesis and Editing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, for the first time we study the problem of photorealistic image synthesis from incomplete and primitive human paintings. |
Jaskirat Singh; Liang Zheng; Cameron Smith; Jose Echevarria; |
589 | FurryGAN: High Quality Foreground-Aware Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present FurryGAN with three key components: 1) imposing both the foreground image and the composite image to be realistic, 2) designing a mask as a combination of coarse and fine masks, and 3) guiding the generator by an auxiliary mask predictor in the discriminator. |
Jeongmin Bae; Mingi Kwon; Youngjung Uh; |
590 | SCAM! Transferring Humans Between Images with Semantic Cross Attention Modulation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce SCAM (Semantic Cross Attention Modulation), a system that encodes rich and diverse information in each semantic region of the image (including foreground and background), thus achieving precise generation with emphasis on fine details. |
Nicolas Dufour; David Picard; Vicky Kalogeiton; |
591 | Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In light of recent advances in NeRF-based 3D-aware generative models, we introduce a new task, Semantic-to-NeRF translation, that aims to reconstruct a 3D scene modelled by NeRF, conditioned on one single-view semantic mask as input. To kick off this novel task, we propose the Sem2NeRF framework. |
Yuedong Chen; Qianyi Wu; Chuanxia Zheng; Tat-Jen Cham; Jianfei Cai; |
592 | WaveGAN: Frequency-Aware GAN for High-Fidelity Few-Shot Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, previous approaches struggle to synthesize high-frequency signals with fine details, deteriorating the synthesis quality. To address this, we propose WaveGAN, a frequency-aware model for few-shot image generation. |
Mengping Yang; Zhe Wang; Ziqiu Chi; Wenyi Feng; |
593 | End-to-End Visual Editing with A Generatively Pre-trained Artist Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose a self-supervised approach that simulates edits by augmenting off-the-shelf images in a target domain. |
Andrew Brown; Cheng-Yang Fu; Omkar Parkhi; Tamara L. Berg; Andrea Vedaldi; |
594 | High-Fidelity GAN Inversion with Padding Space Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose to involve the padding space of the generator to complement the latent space with spatial information. |
Qingyan Bai; Yinghao Xu; Jiapeng Zhu; Weihao Xia; Yujiu Yang; Yujun Shen; |
595 | Designing One Unified Framework for High-Fidelity Face Reenactment and Swapping Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an effective end-to-end unified framework to achieve both tasks. |
Chao Xu; Jiangning Zhang; Yue Han; Guanzhong Tian; Xianfang Zeng; Ying Tai; Yabiao Wang; Chengjie Wang; Yong Liu; |
596 | Sobolev Training for Implicit Neural Representations with Approximated Image Derivatives Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a training paradigm for INRs whose target output is image pixels, to encode image derivatives in addition to image values in the neural network. |
Wentao Yuan; Qingtian Zhu; Xiangyue Liu; Yikang Ding; Haotian Zhang; Chi Zhang; |
597 | Make-a-Scene: Scene-Based Text-to-Image Generation with Human Priors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While these methods have incrementally improved the generated image fidelity and text relevancy, several pivotal gaps remain, limiting applicability and quality. We propose a novel text-to-image method that addresses these gaps by (i) enabling a simple control mechanism complementary to text in the form of a scene, (ii) introducing elements that substantially improve the tokenization process by employing domain-specific knowledge over key image regions (faces and salient objects), and (iii) adapting classifier-free guidance for the transformer use case. |
Oran Gafni; Adam Polyak; Oron Ashual; Shelly Sheynin; Devi Parikh; Yaniv Taigman; |
598 | 3D-FM GAN: Towards 3D-Controllable Face Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While concatenating GAN inversion and a 3D-aware, noise-to-image GAN is a straightforward solution, it is inefficient and may lead to a noticeable drop in editing quality. To fill this gap, we propose 3D-FM GAN, a novel conditional GAN framework designed specifically for 3D-controllable Face Manipulation, which does not require any tuning after the end-to-end learning phase. |
Yuchen Liu; Zhixin Shu; Yijun Li; Zhe Lin; Richard Zhang; S.Y. Kung; |
599 | Multi-Curve Translator for High-Resolution Photorealistic Image Translation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we present the Multi-Curve Translator (MCT), which not only predicts the translated pixels for the corresponding input pixels but also for their neighboring pixels. |
Yuda Song; Hui Qian; Xin Du; |
600 | Deep Bayesian Video Frame Interpolation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present deep Bayesian video frame interpolation, a novel approach for upsampling a low frame-rate video temporally to its higher frame-rate counterpart. |
Zhiyang Yu; Yu Zhang; Xujie Xiang; Dongqing Zou; Xijun Chen; Jimmy S. Ren; |
601 | Cross Attention Based Style Distribution for Controllable Person Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a cross attention based style distribution module that computes between the source semantic styles and target pose for pose transfer. |
Xinyue Zhou; Mingyu Yin; Xinyuan Chen; Li Sun; Changxin Gao; Qingli Li; |
602 | KeypointNeRF: Generalizing Image-Based Volumetric Avatars Using Relative Spatial Encoding of Keypoints Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we investigate common issues with existing spatial encodings and propose a simple yet highly effective approach to modeling high-fidelity volumetric avatars from sparse views. |
Marko Mihajlovic; Aayush Bansal; Michael Zollhöfer; Siyu Tang; Shunsuke Saito; |
603 | ViewFormer: NeRF-Free Neural Rendering from Few Images Using Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a 2D-only method that maps multiple context views and a query pose to a new image in a single pass of a neural network. |
Jonáš Kulhánek; Erik Derner; Torsten Sattler; Robert Babuška; |
604 | L-Tracing: Fast Light Visibility Estimation on Neural Surfaces By Sphere Tracing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a highly efficient light visibility estimation method, called L-Tracing, for reflectance factorization on neural implicit surfaces. |
Ziyu Chen; Chenjing Ding; Jianfei Guo; Dongliang Wang; Yikang Li; Xuan Xiao; Wei Wu; Li Song; |
605 | A Perceptual Quality Metric for Video Frame Interpolation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a dedicated perceptual quality metric for measuring video frame interpolation results. To train our metric, we collected a new video frame interpolation quality assessment dataset. |
Qiqi Hou; Abhijay Ghildyal; Feng Liu; |
606 | Adaptive Feature Interpolation for Low-Shot Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Training of generative models, especially Generative Adversarial Networks, can easily diverge in low-data settings. To mitigate this issue, we propose a novel implicit data augmentation approach which facilitates stable training and synthesizes high-quality samples without the need for label information. |
Mengyu Dai; Haibin Hang; Xiaoyang Guo; |
607 | PalGAN: Image Colorization with Palette Generative Adversarial Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Multimodal ambiguity and color bleeding remain challenging in colorization. To tackle these problems, we propose a new GAN-based colorization approach PalGAN, integrated with palette estimation and chromatic attention. |
Yi Wang; Menghan Xia; Lu Qi; Jing Shao; Yu Qiao; |
608 | Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a spatial-temporal compression framework, Fast-Vid2Vid, which focuses on data aspects of generative models. |
Long Zhuo; Guangcong Wang; Shikai Li; Wayne Wu; Ziwei Liu; |
609 | Learning Prior Feature and Attention Enhanced Image Inpainting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, this paper incorporates the pre-training based Masked AutoEncoder (MAE) into the inpainting model, which enjoys richer informative priors to enhance the inpainting process. |
Chenjie Cao; Qiaole Dong; Yanwei Fu; |
610 | Temporal-MPI: Enabling Multi-Plane Images for Dynamic Scene Modelling Via Temporal Basis Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel Temporal-MPI representation which is able to encode the rich 3D and dynamic variation information throughout the entire video as compact temporal bases and coefficients that are jointly learned. |
Wenpeng Xing; Jie Chen; |
611 | 3D-Aware Semantic-Guided Generative Model for Human Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a 3D-aware Semantic-Guided Generative Model (3D-SGAN) for human image synthesis, which combines a GNeRF with a texture generator. |
Jichao Zhang; Enver Sangineto; Hao Tang; Aliaksandr Siarohin; Zhun Zhong; Nicu Sebe; Wei Wang; |
612 | Temporally Consistent Semantic Video Editing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a simple yet effective method to facilitate temporally coherent video editing. |
Yiran Xu; Badour AlBahar; Jia-Bin Huang; |
613 | Error Compensation Framework for Flow-Guided Video Inpainting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose an Error Compensation Framework for Flow-guided Video Inpainting (ECFVI), which takes advantage of the flow-based method and offsets its weaknesses. In addition, we present a new benchmark dataset for evaluation by supplementing the weaknesses of existing test datasets. |
Jaeyeon Kang; Seoung Wug Oh; Seon Joo Kim; |
614 | Scraping Textures from Natural Images for Synthesis and Editing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper aims to scrape textures directly from natural images of everyday objects and scenes, build texture models, and employ them for texture synthesis, texture editing, etc. |
Xueting Li; Xiaolong Wang; Ming-Hsuan Yang; Alexei A. Efros; Sifei Liu; |
615 | Single Stage Virtual Try-On Via Deformable Attention Flows Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing methods usually build up multi-stage frameworks to deal with clothes warping and body blending respectively, or rely heavily on intermediate parser-based labels which may be noisy or even inaccurate. To solve the above challenges, we propose a single-stage try-on framework by developing a novel Deformable Attention Flow (DAFlow), which applies the deformable attention scheme to multi-flow estimation. |
Shuai Bai; Huiling Zhou; Zhikang Li; Chang Zhou; Hongxia Yang; |
616 | Improving GANs for Long-Tailed Data Through Group Spectral Regularization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast, we aim to train conditional Generative Adversarial Networks, a class of image generation models on long-tailed distributions. |
Harsh Rangwani; Naman Jaswani; Tejan Karmali; Varun Jampani; R. Venkatesh Babu; |
617 | Hierarchical Semantic Regularization of Latent Spaces in StyleGANs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a Hierarchical Semantic Regularizer (HSR) which aligns the hierarchical representations learnt by the generator to corresponding powerful features learnt by pretrained networks on large amounts of data. |
Tejan Karmali; Rishubh Parihar; Susmit Agrawal; Harsh Rangwani; Varun Jampani; Maneesh Singh; R. Venkatesh Babu; |
618 | IntereStyle: Encoding An Interest Region for Robust StyleGAN Inversion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we point out that the existing encoders try to lower the distortion not only on the interest region, e.g., the human facial region, but also on the uninterest region, e.g., background patterns and obstacles. |
Seung-Jun Moon; Gyeong-Moon Park; |
619 | StyleLight: HDR Panorama Generation for Lighting Estimation and Editing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a new lighting estimation and editing framework to generate high-dynamic-range (HDR) indoor panorama lighting from a single limited field-of-view (LFOV) image captured by low-dynamic-range (LDR) cameras. |
Guangcong Wang; Yinuo Yang; Chen Change Loy; Ziwei Liu; |
620 | Contrastive Monotonic Pixel-Level Modulation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a new formulation called MonoPix, an unsupervised and contrastive continuous modulation model, and take a step further to enable a pixel-level spatial control which is critical but could not be properly handled previously. |
Kun Lu; Rongpeng Li; Honggang Zhang; |
621 | Learning Cross-Video Neural Representations for High-Quality Frame Interpolation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Cross-Video Neural Representation (CURE) as the first video interpolation method based on neural fields (NF). |
Wentao Shangguan; Yu Sun; Weijie Gan; Ulugbek S. Kamilov; |
622 | Learning Continuous Implicit Representation for Near-Periodic Patterns Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To further improve the robustness, we introduce a periodicity proposal module to search and use multiple candidate periodicities in our pipeline. |
Bowei Chen; Tiancheng Zhi; Martial Hebert; Srinivasa G. Narasimhan; |
623 | End-to-End Graph-Constrained Vectorized Floorplan Generation with Panoptic Refinement Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we aim to synthesize floorplans as sequences of 1-D vectors, which eases user interaction and design customization. |
Jiachen Liu; Yuan Xue; Jose Duarte; Krishnendra Shekhawat; Zihan Zhou; Xiaolei Huang; |
624 | Few-Shot Image Generation with Mixup-Based Distance Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we consider a challenging task of pretraining-free few-shot image synthesis, and seek to train existing generative models with minimal overfitting and mode collapse. |
Chaerin Kong; Jeesoo Kim; Donghoon Han; Nojun Kwak; |
625 | A Style-Based GAN Encoder for High Fidelity Reconstruction of Images and Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a new encoder architecture for GAN inversion. |
Xu Yao; Alasdair Newson; Yann Gousseau; Pierre Hellier; |
626 | FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we revisit and compare different contrastive learning strategies in DE-GANs, and identify that (i) the current bottleneck of generative performance is the discontinuity of the latent space, and (ii) compared to other contrastive learning strategies, Instance-perturbation works towards latent space continuity, which brings the major improvement to DE-GANs. |
Ziqiang Li; Chaoyue Wang; Heliang Zheng; Jing Zhang; Bin Li; |
627 | BlobGAN: Spatially Disentangled Scene Representations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose an unsupervised, mid-level representation for a generative model of scenes. |
Dave Epstein; Taesung Park; Richard Zhang; Eli Shechtman; Alexei A. Efros; |
628 | Unified Implicit Neural Stylization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To regularize the geometry in 3D scenes, we propose a novel self-distillation geometry consistency loss which preserves the geometry fidelity of the stylized scenes. |
Zhiwen Fan; Yifan Jiang; Peihao Wang; Xinyu Gong; Dejia Xu; Zhangyang Wang; |
629 | GAN with Multivariate Disentangling for Controllable Hair Editing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Going a further step, we propose an efficiently controllable method that can provide a set of sliding bars to do continuous and fine hair editing. |
Xuyang Guo; Meina Kan; Tianle Chen; Shiguang Shan; |
630 | Discovering Transferable Forensic Features for CNN-Generated Images Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we conduct the first analytical study to discover and understand T-FF in universal detectors. |
Keshigeyan Chandrasegaran; Ngoc-Trung Tran; Alexander Binder; Ngai-Man Cheung; |
631 | Harmonizer: Learning to Perform White-Box Image and Video Harmonization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we observe that adjusting the input arguments of basic image filters, e.g., brightness and contrast, is sufficient for humans to produce realistic images from the composite ones. |
Zhanghan Ke; Chunyi Sun; Lei Zhu; Ke Xu; Rynson W.H. Lau; |
632 | Text2LIVE: Text-Driven Layered Image and Video Editing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a method for zero-shot, text-driven editing of natural images and videos. |
Omer Bar-Tal; Dolev Ofri-Amar; Rafail Fridman; Yoni Kasten; Tali Dekel; |
633 | Digging Into Radiance Grid for Real-Time View Synthesis with Detail Preservation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we dig into the Radiance Grid representation and present a set of improvements, which together result in boosted performance in terms of both speed and quality. |
Jian Zhang; Jinchi Huang; Bowen Cai; Huan Fu; Mingming Gong; Chaohui Wang; Jiaming Wang; Hongchen Luo; Rongfei Jia; Binqiang Zhao; Xing Tang; |
634 | StyleGAN-Human: A Data-Centric Odyssey of Human Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work takes a data-centric perspective and investigates multiple critical aspects in “data engineering”, which we believe would complement the current practice. |
Jianglin Fu; Shikai Li; Yuming Jiang; Kwan-Yee Lin; Chen Qian; Chen Change Loy; Wayne Wu; Ziwei Liu; |
635 | ColorFormer: Image Colorization Via Color Memory Assisted Hybrid-Attention Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose an automatic image colorization method via color memory assisted hybrid-attention transformer, namely ColorFormer. |
Xiaozhong Ji; Boyuan Jiang; Donghao Luo; Guangpin Tao; Wenqing Chu; Zhifeng Xie; Chengjie Wang; Ying Tai; |
636 | EAGAN: Efficient Two-Stage Evolutionary Architecture Search for GANs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To alleviate the instability, we propose an efficient two-stage evolutionary algorithm-based NAS framework to search GANs, namely EAGAN. |
Guohao Ying; Xin He; Bin Gao; Bo Han; Xiaowen Chu; |
637 | Weakly-Supervised Stitching Network for Real-World Panoramic Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the most challenging point in deep learning-based stitching is to obtain pairs of input images with a narrow field of view and ground truth images with a wide field of view captured from real-world scenes. To overcome this difficulty, we develop a weakly-supervised learning mechanism to train the stitching model without requiring genuine ground truth images. |
Dae-Young Song; Geonsoo Lee; HeeKyung Lee; Gi-Mun Um; Donghyeon Cho; |
638 | DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a dynamic sparse attention based Transformer model, termed Dynamic Sparse Transformer (DynaST), to achieve fine-level matching with favorable efficiency. |
Songhua Liu; Jingwen Ye; Sucheng Ren; Xinchao Wang; |
639 | Multimodal Conditional Image Synthesis with Product-of-Experts GANs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This reduces their practicality as multimodal inputs are more expressive and complement each other. To address this limitation, we propose the Product-of-Experts Generative Adversarial Networks (PoE-GAN) framework, which can synthesize images conditioned on multiple input modalities or any subset of them, even the empty set. |
Xun Huang; Arun Mallya; Ting-Chun Wang; Ming-Yu Liu; |
640 | Auto-Regressive Image Synthesis with Integrated Quantization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a versatile framework for conditional image generation which incorporates the inductive bias of CNNs and powerful sequence modeling of auto-regression that naturally leads to diverse image generation. |
Fangneng Zhan; Yingchen Yu; Rongliang Wu; Jiahui Zhang; Kaiwen Cui; Changgong Zhang; Shijian Lu; |
641 | JoJoGAN: One Shot Face Stylization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper describes a simple procedure — JoJoGAN — to learn a style mapper from a single example of the style. |
Min Jin Chong; David Forsyth; |
642 | VecGAN: Image-to-Image Translation with Interpretable Latent Directions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose VecGAN, an image-to-image translation framework for facial attribute editing with interpretable latent directions. |
Yusuf Dalva; Said Fahri Altındiş; Aysegul Dundar; |
643 | Any-Resolution Training for High-Resolution Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To take advantage of varied-size data, we introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions. |
Lucy Chai; Michaël Gharbi; Eli Shechtman; Phillip Isola; Richard Zhang; |
644 | CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we aim to devise a universally versatile style transfer method capable of performing artistic, photo-realistic, and video style transfer jointly, without seeing videos during training. |
Zijie Wu; Zhen Zhu; Junping Du; Xiang Bai; |
645 | CANF-VC: Conditional Augmented Normalizing Flows for Video Compression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents an end-to-end learning-based video compression system, termed CANF-VC, based on conditional augmented normalizing flows (CANF). |
Yung-Han Ho; Chih-Peng Chang; Peng-Yu Chen; Alessandro Gnutti; Wen-Hsiao Peng; |
646 | Bi-Level Feature Alignment for Versatile Image Translation and Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance in image generation by explicitly building a correspondence. |
Fangneng Zhan; Yingchen Yu; Rongliang Wu; Jiahui Zhang; Kaiwen Cui; Aoran Xiao; Shijian Lu; Chunyan Miao; |
647 | High-Fidelity Image Inpainting with GAN Inversion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Nevertheless, ignoring hard constraints in these algorithms may leave a gap between GAN inversion and image inpainting. To address this problem, in this paper we devise a novel GAN inversion model for image inpainting, dubbed InvertFill, mainly consisting of an encoder with a pre-modulation module and a GAN generator with an F&W+ latent space. |
Yongsheng Yu; Libo Zhang; Heng Fan; Tiejian Luo; |
648 | DeltaGAN: Towards Diverse Few-Shot Image Generation with Sample-Specific Delta Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel Delta Generative Adversarial Network (DeltaGAN), which consists of a reconstruction subnetwork and a generation subnetwork. |
Yan Hong; Li Niu; Jianfu Zhang; Liqing Zhang; |
649 | Image Inpainting with Cascaded Modulation GAN and Object-Aware Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose cascaded modulation GAN (CM-GAN), a new network design consisting of an encoder with Fourier convolution blocks that extract multi-scale feature representations from the input image with holes and a dual-stream decoder with a novel cascaded global-spatial modulation block at each scale level. |
Haitian Zheng; Zhe Lin; Jingwan Lu; Scott Cohen; Eli Shechtman; Connelly Barnes; Jianming Zhang; Ning Xu; Sohrab Amirghodsi; Jiebo Luo; |
650 | StyleFace: Towards Identity-Disentangled Face Generation on Megapixels Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose StyleFace, a unified framework for 1024^2 resolution high-fidelity identity swapping and de-identification. |
Yuchen Luo; Junwei Zhu; Keke He; Wenqing Chu; Ying Tai; Chengjie Wang; Junchi Yan; |
651 | Video Extrapolation in Space and Time Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by these observations, we propose to study the problem of Video Extrapolation in Space and Time (VEST). We propose a model that tackles this problem and leverages the self-supervision from both tasks, while existing methods are designed to solve one of them. |
Yunzhi Zhang; Jiajun Wu; |
652 | Contrastive Learning for Diverse Disentangled Foreground Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a new method for diverse foreground generation with explicit control over various factors. |
Yuheng Li; Yijun Li; Jingwan Lu; Eli Shechtman; Yong Jae Lee; Krishna Kumar Singh; |
653 | BIPS: Bi-modal Indoor Panorama Synthesis Via Residual Depth-Aided Adversarial Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study a new problem: RGB-D panorama synthesis under the various configurations of cameras and depth sensors. |
Changgyoon Oh; Wonjune Cho; Yujeong Chae; Daehee Park; Lin Wang; Kuk-Jin Yoon; |
654 | Augmentation of RPPG Benchmark Datasets: Learning to Remove and Embed RPPG Signals Via Double Cycle Consistent Learning from Unpaired Facial Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we focus on the estimation of remote photoplethysmography (rPPG) from facial videos and address the deficiency issues of large-scale benchmarking datasets. |
Cheng-Ju Hsieh; Wei-Hao Chung; Chiou-Ting Hsu; |
655 | Geometry-Aware Single-Image Full-Body Human Relighting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Although plausible relighting results can be achieved, previous methods suffer from both the entanglement between albedo and lighting and the lack of hard shadows, which significantly decrease the realism. To tackle these two problems, we propose a geometry-aware single-image human relighting framework that leverages single-image geometry reconstruction for joint deployment of traditional graphics rendering and neural rendering techniques. |
Chaonan Ji; Tao Yu; Kaiwen Guo; Jingxin Liu; Yebin Liu; |
656 | 3D-Aware Indoor Scene Synthesis with Depth Priors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We argue that indoor scenes do not have a shared intrinsic structure, and hence only using 2D images cannot adequately guide the model with the 3D geometry. In this work, we fill in this gap by introducing depth as a 3D prior. |
Zifan Shi; Yujun Shen; Jiapeng Zhu; Dit-Yan Yeung; Qifeng Chen; |
657 | Deep Portrait Delighting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a deep neural network for removing undesirable shading features from an unconstrained portrait image, recovering the underlying texture. |
Joshua Weir; Junhong Zhao; Andrew Chalmers; Taehyun Rhee; |
658 | Vector Quantized Image-to-Image Translation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose introducing the vector quantization technique into the image-to-image translation framework. |
Yu-Jie Chen; Shin-I Cheng; Wei-Chen Chiu; Hung-Yu Tseng; Hsin-Ying Lee; |
659 | The Surprisingly Straightforward Scene Text Removal Method with Gated Attention and Region of Interest Generation: A Comprehensive Prominent Model Analysis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We also introduce a simple yet extremely effective Gated Attention (GA) and Region-of-Interest Generation (RoIG) methodology in this paper. |
Hyeonsu Lee; Chankyu Choi; |
660 | Free-Viewpoint RGB-D Human Performance Capture and Rendering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While prior work has shown impressive performance capture results in laboratory settings, it is non-trivial to achieve casual free-viewpoint human capture and rendering for unseen identities with high fidelity, especially for facial expressions, hands, and clothes. To tackle these challenges we introduce a novel view synthesis framework that generates realistic renders from unseen views of any human captured from a single-view and sparse RGB-D sensor, similar to a low-cost depth camera, and without actor-specific models. |
Phong Nguyen-Ha; Nikolaos Sarafianos; Christoph Lassner; Janne Heikkilä; Tony Tung; |
661 | Multiview Regenerative Morphing with Dual Flows Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper aims to address a new task of image morphing under a multiview setting, which takes two sets of multiview images as the input and generates intermediate renderings that not only exhibit smooth transitions between the two input sets but also ensure visual consistency across different views at any transition state. To achieve this goal, we propose a novel approach called Multiview Regenerative Morphing that formulates the morphing process as an optimization to solve for rigid transformation and optimal-transport interpolation. |
Chih-Jung Tsai; Cheng Sun; Hwann-Tzong Chen; |
662 | Hallucinating Pose-Compatible Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: What does human pose tell us about a scene? We propose a task to answer this question: given human pose as input, hallucinate a compatible scene. |
Tim Brooks; Alexei A. Efros; |
663 | Motion and Appearance Adaptation for Cross-Domain Motion Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: When there are considerable differences between the object in the driving video and that in the source image, traditional single-domain motion transfer approaches often produce notable artifacts; for example, the synthesized image may fail to preserve the human shape of the source image (cf. Fig. 1 (a)). To address this issue, in the present work, we propose a Motion and Appearance Adaptation (MAA) approach for cross-domain motion transfer, in which we regularize the object in the synthesized image to capture the motion of the object in the driving frame, while still preserving the shape and appearance of the object in the source image. |
Borun Xu; Biao Wang; Jinhong Deng; Jiale Tao; Tiezheng Ge; Yuning Jiang; Wen Li; Lixin Duan; |
664 | Layered Controllable Video Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce layered controllable video generation, where we, without any supervision, decompose the initial frame of a video into foreground and background layers, with which the user can control the video generation process by simply manipulating the foreground mask. |
Jiahui Huang; Yuhe Jin; Kwang Moo Yi; Leonid Sigal; |
665 | Custom Structure Preservation in Face Aging Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel architecture for face age editing that can produce structural modifications while maintaining relevant details present in the original image. |
Guillermo Gomez-Trenado; Stéphane Lathuilière; Pablo Mesejo; Óscar Cordón; |
666 | Spatio-Temporal Deformable Attention Network for Video Deblurring Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Actually, not all the pixels in the video frames are sharp and beneficial for deblurring. To address this problem, we propose the spatio-temporal deformable attention network (STDANet) for video deblurring, which extracts the information of sharp pixels by considering the pixel-wise blur levels of the video frames. |
Huicong Zhang; Haozhe Xie; Hongxun Yao; |
667 | NeuMesh: Learning Disentangled Neural Mesh-Based Implicit Field for Geometry and Texture Editing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel mesh-based representation by encoding the neural implicit field with disentangled geometry and texture codes on mesh vertices, which facilitates a set of editing functionalities, including mesh-guided geometry editing, designated texture editing with texture swapping, filling and painting operations. |
Bangbang Yang; Chong Bao; Junyi Zeng; Hujun Bao; Yinda Zhang; Zhaopeng Cui; Guofeng Zhang; |
668 | NeRF for Outdoor Scene Relighting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present NeRF-OSR, i.e., the first approach for outdoor scene relighting based on neural radiance fields. For evaluation, we collect a new benchmark dataset of several outdoor sites photographed from multiple viewpoints and at different times. |
Viktor Rudnev; Mohamed Elgharib; William Smith; Lingjie Liu; Vladislav Golyanik; Christian Theobalt; |
669 | CoGS: Controllable Generation and Search from Sketch and Style Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present CoGS, a novel method for the style-conditioned, sketch-driven synthesis of images. |
Cusuh Ham; Gemma Canet Tarrés; Tu Bui; James Hays; Zhe Lin; John Collomosse; |
670 | HairNet: Hairstyle Transfer with Pose Changes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel algorithm for automatic hairstyle transfer, specifically targeting complicated inputs that do not match in pose. |
Peihao Zhu; Rameen Abdal; John Femiani; Peter Wonka; |
671 | Unbiased Multi-Modality Guidance for Image Inpainting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Besides, some methods are time-consuming because they are implemented as multiple stages of complex neural networks. To solve this issue, we develop an end-to-end multi-modality guided transformer network, including one inpainting branch and two auxiliary branches for semantic segmentation and edge textures. |
Yongsheng Yu; Dawei Du; Libo Zhang; Tiejian Luo; |
672 | Intelli-Paint: Towards Developing More Human-Intelligible Painting Agents Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we motivate the need to learn more human-intelligible painting sequences in order to facilitate the use of autonomous painting systems in a more interactive context (e.g. as a painting assistant tool for human users or for robotic painting applications). |
Jaskirat Singh; Cameron Smith; Jose Echevarria; Liang Zheng; |
673 | Motion Transformer for Unsupervised Image Animation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: More specifically, we introduce two types of tokens in our proposed method: i) image tokens formed from patch features and corresponding position encoding and ii) motion tokens encoded with motion information. |
Jiale Tao; Biao Wang; Tiezheng Ge; Yuning Jiang; Wen Li; Lixin Duan; |
674 | NÜWA: Visual Synthesis Pre-training for Neural VisUal World CreAtion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a unified multimodal pre-trained model called NÜWA that can generate new or manipulate existing visual data (i.e., image and video) for various visual synthesis tasks. |
Chenfei Wu; Jian Liang; Lei Ji; Fan Yang; Yuejian Fang; Daxin Jiang; Nan Duan; |
675 | EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose Exquisite and locally editable GAN for makeup transfer (EleGANt). |
Chenyu Yang; Wanrong He; Yingqing Xu; Yang Gao; |
676 | Editing Out-of-Domain GAN Inversion Via Differential Activations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel GAN prior based editing framework to tackle the out-of-domain inversion problem with a composition-decomposition paradigm. |
Haorui Song; Yong Du; Tianyi Xiang; Junyu Dong; Jing Qin; Shengfeng He; |
677 | On The Robustness of Quality Measures for GANs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work evaluates the robustness of quality measures of generative models such as Inception Score (IS) and Fréchet Inception Distance (FID). |
Motasem Alfarra; Juan C. Pérez; Anna Frühstück; Philip H. S. Torr; Peter Wonka; Bernard Ghanem; |
678 | Sound-Guided Semantic Video Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a framework to generate realistic videos by leveraging multimodal (sound-image-text) embedding space. We provide the new high-resolution landscape video dataset (audio-visual pair) for the sound-guided video generation task. |
Seung Hyun Lee; Gyeongrok Oh; Wonmin Byeon; Chanyoung Kim; Won Jeong Ryoo; Sang Ho Yoon; Hyunjun Cho; Jihyun Bae; Jinkyu Kim; Sangpil Kim; |
679 | Inpainting at Modern Camera Resolution By Guided PatchMatch with Auto-Curation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We contribute an inpainting benchmark dataset of photos at 4K and above representative of modern sensors. |
Lingzhi Zhang; Connelly Barnes; Kevin Wampler; Sohrab Amirghodsi; Eli Shechtman; Zhe Lin; Jianbo Shi; |
680 | Controllable Video Generation Through Global and Local Motion Dynamics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present GLASS, a method for Global and Local Action-driven Sequence Synthesis. |
Aram Davtyan; Paolo Favaro; |
681 | StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation Via Pre-trained StyleGAN Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we provide a solution from a novel perspective that differs from existing frameworks. |
Fei Yin; Yong Zhang; Xiaodong Cun; Mingdeng Cao; Yanbo Fan; Xuan Wang; Qingyan Bai; Baoyuan Wu; Jue Wang; Yujiu Yang; |
682 | Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a method that builds on 3D-VQGAN and transformers to generate videos with thousands of frames. |
Songwei Ge; Thomas Hayes; Harry Yang; Xi Yin; Guan Pang; David Jacobs; Jia-Bin Huang; Devi Parikh; |
683 | Combining Internal and External Constraints for Unrolling Shutter in Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we propose a space-time solution to the RS problem. |
Eyal Naor; Itai Antebi; Shai Bagon; Michal Irani; |
684 | WISE: Whitebox Image Stylization By Example-Based Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, adapting or extending these techniques to produce new styles is often a tedious and error-prone task that requires expert knowledge. We propose a new paradigm to alleviate this problem: implementing algorithmic image filtering techniques as differentiable operations that can learn parametrizations aligned to certain reference styles. |
Winfried Lötzsch; Max Reimann; Martin Büssemeyer; Amir Semmo; Jürgen Döllner; Matthias Trapp; |
685 | Neural Radiance Transfer Fields for Relightable Novel-View Synthesis with Global Illumination Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: On the other hand, mature Computer Graphics tools allow modeling of complex photo-realistic light transport given all the scene parameters. Combining these approaches, we propose a method for scene relighting under novel views by learning a neural precomputed radiance transfer function, which implicitly handles global illumination effects using novel environment maps. |
Linjie Lyu; Ayush Tewari; Thomas Leimkühler; Marc Habermann; Christian Theobalt; |
686 | Transformers As Meta-Learners for Implicit Neural Representations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by a generalized formulation of gradient-based meta-learning, we propose a formulation that uses Transformers as hypernetworks for INRs, where it can directly build the whole set of INR weights with Transformers specialized as set-to-set mapping. |
Yinbo Chen; Xiaolong Wang; |
687 | Style Your Hair: Latent Optimization for Pose-Invariant Hairstyle Transfer Via Local-Style-Aware Hair Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: HairFIT, a pose-invariant hairstyle transfer model, alleviates this limitation yet still shows unsatisfactory quality in preserving delicate hair textures. To solve these limitations, we propose a high-performing pose-invariant hairstyle transfer model equipped with latent optimization and a newly presented local-style-matching loss. |
Taewoo Kim; Chaeyeon Chung; Yoonseo Kim; Sunghyun Park; Kangyeol Kim; Jaegul Choo; |
688 | High-Resolution Virtual Try-On with Misalignment and Occlusion-Handled Conditions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To settle the issues, we propose a novel try-on condition generator as a unified module of the two stages (i.e., warping and segmentation generation stages). |
Sangyun Lee; Gyojung Gu; Sunghyun Park; Seunghwan Choi; Jaegul Choo; |
689 | A Codec Information Assisted Framework for Efficient Compressed Video Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, considering the characteristics of compressed videos, we propose a Codec Information Assisted Framework (CIAF) to boost and accelerate recurrent VSR models for compressed videos. |
Hengsheng Zhang; Xueyi Zou; Jiaming Guo; Youliang Yan; Rong Xie; Li Song; |
690 | Injecting 3D Perception of Controllable NeRF-GAN Into StyleGAN for Editable Portrait Image Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The controllability and interpretability of 3D GANs have not been much explored. In this work, we propose two solutions to overcome these weaknesses of 2D GANs and 3D-aware GANs. |
Jeong-gi Kwak; Yuanming Li; Dongsik Yoon; Donghyeon Kim; David Han; Hanseok Ko; |
691 | AdaNeRF: Adaptive Sampling for Real-Time Rendering of Neural Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel dual-network architecture that takes an orthogonal direction by learning how to best reduce the number of required sample points. |
Andreas Kurz; Thomas Neff; Zhaoyang Lv; Michael Zollhöfer; Markus Steinberger; |
692 | Improving The Perceptual Quality of 2D Animation Interpolation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we address challenges unexplored in previous animation interpolation systems, with a focus on improving perceptual quality. |
Shuhong Chen; Matthias Zwicker; |
693 | Selective TransHDR: Transformer-Based Selective HDR Imaging Using Ghost Region Mask Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Therefore, the CNN-based methods specialized for local feature extraction cannot obtain satisfactory results. To address this issue, we propose a transformer-based selective HDR image reconstruction network that uses a ghost region mask. |
Jou Won Song; Ye-In Park; Kyeongbo Kong; Jaeho Kwak; Suk-Ju Kang; |
694 | Learning Series-Parallel Lookup Tables for Efficient Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Besides, their frameworks of single-layer lookup tables limit the extension and generalization capacities of the model. In this paper, we propose a framework of series-parallel lookup tables (SPLUT) to alleviate the above issues and achieve efficient image super-resolution. |
Cheng Ma; Jingyi Zhang; Jie Zhou; Jiwen Lu; |
695 | GeoAug: Data Augmentation for Few-Shot NeRF with Geometry Constraints Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We hereby present GeoAug: a data augmentation method for NeRF, which enriches training data based on multi-view geometric constraint. |
Di Chen; Yu Liu; Lianghua Huang; Bin Wang; Pan Pan; |
696 | DoodleFormer: Creative Sketch Drawing with Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here, we propose a novel coarse-to-fine two-stage framework, DoodleFormer, that decomposes the creative sketch generation problem into the creation of coarse sketch composition followed by the incorporation of fine-details in the sketch. |
Ankan Kumar Bhunia; Salman Khan; Hisham Cholakkal; Rao Muhammad Anwer; Fahad Shahbaz Khan; Jorma Laaksonen; Michael Felsberg; |
697 | Implicit Neural Representations for Variable Length Human Motion Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose an action-conditional human motion generation method using variational implicit neural representations (INR). |
Pablo Cervantes; Yusuke Sekikawa; Ikuro Sato; Koichi Shinoda; |
698 | Learning Object Placement Via Dual-Path Graph Completion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we treat object placement as a graph completion problem and propose a novel graph completion module (GCM). |
Siyuan Zhou; Liu Liu; Li Niu; Liqing Zhang; |
699 | Expanded Adaptive Scaling Normalization for End to End Image Compression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To handle the limitations of GDN, we construct an expanded form of the adaptive scaling module, named Expanded Adaptive Scaling Normalization (EASN). |
Chajin Shin; Hyeongmin Lee; Hanbin Son; Sangjin Lee; Dogyoon Lee; Sangyoun Lee; |
700 | Generator Knows What Discriminator Should Learn in Unconditional GANs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: From our empirical evidences, we propose a new generator-guided discriminator regularization (GGDR) in which the generator feature maps supervise the discriminator to have rich semantic representations in unconditional generation. |
Gayoung Lee; Hyunsu Kim; Junho Kim; Seonghyeon Kim; Jung-Woo Ha; Yunjey Choi; |
701 | Compositional Visual Generation with Composable Diffusion Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an alternative structured approach for compositional generation using diffusion models. |
Nan Liu; Shuang Li; Yilun Du; Antonio Torralba; Joshua B. Tenenbaum; |
702 | ManiFest: Manifold Deformation for Few-Shot Image Translation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We instead propose ManiFest: a framework for few-shot image translation that learns a context-aware representation of a target domain from a few images only. |
Fabio Pizzati; Jean-François Lalonde; Raoul de Charette; |
703 | Supervised Attribute Information Removal and Reconstruction for Image Manipulation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an Attribute Information Removal and Reconstruction (AIRR) network that prevents such information hiding by learning how to remove the attribute information entirely, creating attribute excluded features, and then learns to directly inject the desired attributes in a reconstructed image. |
Nannan Li; Bryan A. Plummer; |
704 | BLT: Bidirectional Layout Transformer for Controllable Layout Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To advance conditional layout generation, we introduce BLT, a bidirectional layout transformer. |
Xiang Kong; Lu Jiang; Huiwen Chang; Han Zhang; Yuan Hao; Haifeng Gong; Irfan Essa; |
705 | Diverse Generation from A Single Video Made Possible Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we question the necessity of a GAN for generation from a single video, and introduce a non-parametric baseline for a variety of generation and manipulation tasks. |
Niv Haim; Ben Feinstein; Niv Granot; Assaf Shocher; Shai Bagon; Tali Dekel; Michal Irani; |
706 | Rayleigh EigenDirections (REDs): Nonlinear GAN Latent Space Traversals for Multidimensional Features Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a method for finding paths in a deep generative model’s latent space that can maximally vary one set of image features while holding others constant. |
Guha Balakrishnan; Raghudeep Gadde; Aleix Martinez; Pietro Perona; |
707 | Bridging The Domain Gap Towards Generalization in Automatic Colorization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel automatic colorization technique that learns domain-invariance across multiple source domains and is able to leverage such invariance to colorize grayscale images in unseen target domains. |
Hyejin Lee; Daehee Kim; Daeun Lee; Jinkyu Kim; Jaekoo Lee; |
708 | Generating Natural Images with Direct Patch Distributions Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we leverage the Sliced Wasserstein Distance to develop an algorithm that explicitly and efficiently minimizes the distance between patch distributions in two images. |
Ariel Elnekave; Yair Weiss; |
709 | Context-Consistent Semantic Image Editing with Style-Preserved Modulation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We attribute this to the fact that SPADE only uses an image-independent local semantic layout but ignores the image-specific styles included in the known pixels. To address this issue, we propose a style-preserved modulation (SPM) comprising two modulation processes: the first modulation incorporates the contextual style and semantic layout, and then generates two fused modulation parameters. |
Wuyang Luo; Su Yang; Hong Wang; Bo Long; Weishan Zhang; |
710 | Eliminating Gradient Conflict in Reference-Based Line-Art Colorization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This phenomenon motivates us to alleviate the gradient issue by preserving the dominant gradient branch while removing the conflicting ones. We propose a novel attention mechanism using this training strategy, Stop-Gradient Attention (SGA), outperforming the attention baseline by a large margin with better training stability. |
Zekun Li; Zhengyang Geng; Zhao Kang; Wenyu Chen; Yibo Yang; |
711 | Unsupervised Learning of Efficient Geometry-Aware Neural Articulated Representations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an unsupervised method for 3D geometry-aware representation learning of articulated objects, in which no image-pose pairs or foreground masks are used for training. |
Atsuhiro Noguchi; Xiao Sun; Stephen Lin; Tatsuya Harada; |
712 | JPEG Artifacts Removal Via Contrastive Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, they may fail to estimate unseen compression types, affecting the subsequent restoration performance. To remedy this issue, we propose an unsupervised compression quality representation learning strategy for blind JPEG artifacts removal. |
Xi Wang; Xueyang Fu; Yurui Zhu; Zheng-Jun Zha; |
713 | Unpaired Deep Image Dehazing Using Contrastive Disentanglement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper provides a new perspective to treat image dehazing as a two-class separated factor disentanglement task, i.e., the task-relevant factor of clear image reconstruction and the task-irrelevant factor of haze-relevant distribution. |
Xiang Chen; Zhentao Fan; Pengpeng Li; Longgang Dai; Caihua Kong; Zhuoran Zheng; Yufeng Huang; Yufeng Li; |
714 | Efficient Long-Range Attention Network for Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose an efficient long-range attention network (ELAN) for image SR. |
Xindong Zhang; Hui Zeng; Shi Guo; Lei Zhang; |
715 | FlowFormer: A Transformer Architecture for Optical Flow Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce optical Flow transFormer, dubbed FlowFormer, a transformer-based neural network architecture for learning optical flow. |
Zhaoyang Huang; Xiaoyu Shi; Chao Zhang; Qiang Wang; Ka Chun Cheung; Hongwei Qin; Jifeng Dai; Hongsheng Li; |
716 | Coarse-to-Fine Sparse Transformer for Hyperspectral Image Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel Transformer-based method, coarse-to-fine sparse Transformer (CST), firstly embedding HSI sparsity into deep learning for HSI reconstruction. |
Yuanhao Cai; Jing Lin; Xiaowan Hu; Haoqian Wang; Xin Yuan; Yulun Zhang; Radu Timofte; Luc Van Gool; |
717 | Learning Shadow Correspondence for Video Shadow Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a novel Shadow-Consistent Correspondence method (SC-Cor) to enhance pixel-wise similarity of the specific shadow regions across frames for video shadow detection. |
Xinpeng Ding; Jingwen Yang; Xiaowei Hu; Xiaomeng Li; |
718 | Metric Learning Based Interactive Modulation for Real-World Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a Metric Learning based Interactive Modulation for Real-World Super-Resolution (MM-RealSR). |
Chong Mou; Yanze Wu; Xintao Wang; Chao Dong; Jian Zhang; Ying Shan; |
719 | Dynamic Dual Trainable Bounds for Ultra-Low Precision Super-Resolution Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we identify that the performance drop comes from the contradiction between the layer-wise symmetric quantizer and the highly asymmetric activation distribution in SR models. |
Yunshan Zhong; Mingbao Lin; Xunchao Li; Ke Li; Yunhang Shen; Fei Chao; Yongjian Wu; Rongrong Ji; |
720 | OSFormer: One-Stage Camouflaged Instance Segmentation with Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present OSFormer, the first one-stage transformer framework for camouflaged instance segmentation (CIS). |
Jialun Pei; Tianyang Cheng; Deng-Ping Fan; He Tang; Chuanbo Chen; Luc Van Gool; |
721 | Highly Accurate Dichotomous Image Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a systematic study on a new task called dichotomous image segmentation (DIS), which aims to segment highly accurate objects from natural images. |
Xuebin Qin; Hang Dai; Xiaobin Hu; Deng-Ping Fan; Ling Shao; Luc Van Gool; |
722 | Boosting Supervised Dehazing Methods Via Bi-Level Patch Reweighting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose a bi-level dehazing (BILD) framework by designing an internal loop for weighted supervised dehazing and an external loop for training patch reweighting. |
Xingyu Jiang; Hongkun Dou; Chengwei Fu; Bingquan Dai; Tianrun Xu; Yue Deng; |
723 | Flow-Guided Transformer for Video Inpainting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a flow-guided transformer, which innovatively leverages the motion discrepancy exposed by optical flows to instruct the attention retrieval in the transformer for high-fidelity video inpainting. |
Kaidong Zhang; Jingjing Fu; Dong Liu; |
724 | Shift-tolerant Perceptual Similarity Metric Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper studies the effect of small misalignment, specifically a small shift between the input and reference image, on existing metrics, and accordingly develops a shift-tolerant similarity metric. |
Abhijay Ghildyal; Feng Liu; |
725 | Perception-Distortion Balanced ADMM Optimization for Single-Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel super-resolution model with a low-frequency constraint (LFc-SR), which balances the objective and perceptual quality through a single model and yields super-resolved images with high PSNR and perceptual scores. |
Yuehan Zhang; Bo Ji; Jia Hao; Angela Yao; |
726 | VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by the classical dictionary-based methods and the recent vector quantization (VQ) technique, we propose a VQ-based face restoration method – VQFR. |
Yuchao Gu; Xintao Wang; Liangbin Xie; Chao Dong; Gen Li; Ying Shan; Ming-Ming Cheng; |
727 | Uncertainty Learning in Kernel Estimation for Multi-stage Blind Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Such sequential approaches suffer from two fundamental weaknesses – i.e., the lack of robustness (the performance drops when the estimated degradation is inaccurate) and the lack of transparency (network architectures are heuristic without incorporating domain knowledge). To address these issues, we propose a joint Maximum a Posterior (MAP) approach for estimating the unknown kernel and high-resolution image simultaneously. |
Zhenxuan Fang; Weisheng Dong; Xin Li; Jinjian Wu; Leida Li; Guangming Shi; |
728 | Learning Spatio-Temporal Downsampling for Effective Video Upscaling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we aim to solve the space-time aliasing problem by learning a spatio-temporal downsampler. |
Xiaoyu Xiang; Yapeng Tian; Vijay Rengarajan; Lucas D. Young; Bo Zhu; Rakesh Ranjan; |
729 | Learning Local Implicit Fourier Representation for Image Warping Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a local texture estimator for image warping (LTEW) followed by an implicit neural representation to deform images into continuous shapes. |
Jaewon Lee; Kwang Pyo Choi; Kyong Hwan Jin; |
730 | SepLUT: Separable Image-Adaptive Lookup Tables for Real-Time Image Enhancement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: On the other hand, 3D LUTs present enhanced component-correlated transform capability but suffer from a heavy memory footprint, high training difficulty, and limited cell utilization. Inspired by the conventional divide-and-conquer practice in the image signal processor, we present SepLUT (separable image-adaptive lookup table) to tackle the above limitations. |
Canqian Yang; Meiguang Jin; Yi Xu; Rui Zhang; Ying Chen; Huaida Liu; |
731 | Blind Image Decomposition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose and study a novel task named Blind Image Decomposition (BID), which requires separating a superimposed image into constituent underlying images in a blind setting, that is, both the source components involved in mixing as well as the mixing mechanism are unknown. |
Junlin Han; Weihao Li; Pengfei Fang; Chunyi Sun; Jie Hong; Mohammad Ali Armin; Lars Petersson; Hongdong Li; |
732 | MuLUT: Cooperating Multiple Look-Up Tables for Efficient Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Consequently, the receptive field of a single LUT is restricted, resulting in inferior performance. To address this issue, we extend SR-LUT by enabling the cooperation of Multiple LUTs, termed MuLUT. |
Jiacheng Li; Chang Chen; Zhen Cheng; Zhiwei Xiong; |
733 | Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel Frequency-Transformer for compressed video super-resolution (FTVSR) that conducts self-attention over a joint space-time-frequency domain. |
Zhongwei Qiu; Huan Yang; Jianlong Fu; Dongmei Fu; |
734 | Spatial-Frequency Domain Information Integration for Pan-Sharpening Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we make a first attempt to address pan-sharpening in both the spatial and frequency domains, and propose a Spatial-Frequency Information Integration Network, dubbed SFIIN. |
Man Zhou; Jie Huang; Keyu Yan; Hu Yu; Xueyang Fu; Aiping Liu; Xian Wei; Feng Zhao; |
735 | Adaptive Patch Exiting for Scalable Single Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: As image can be divided into patches, which have various restoration difficulties, we present a scalable method based on Adaptive Patch Exiting (APE) to achieve more practical speedup. |
Shizun Wang; Jiaming Liu; Kaixin Chen; Xiaoqi Li; Ming Lu; Yandong Guo; |
736 | Efficient Meta-Tuning for Content-Aware Neural Video Delivery Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a method named Efficient Meta-Tuning (EMT) to reduce the computational cost. |
Xiaoqi Li; Jiaming Liu; Shizun Wang; Cheng Lyu; Ming Lu; Yurong Chen; Anbang Yao; Yandong Guo; Shanghang Zhang; |
737 | Reference-Based Image Super-Resolution with Deformable Attention Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, addressing the RefSR problem has two critical challenges: (i) it is difficult to match the correspondence between LR and Ref images when they are significantly different; (ii) it is very challenging to transfer the relevant texture from Ref images to compensate for the missing details in LR images. To address these issues of RefSR, this paper proposes a deformable attention Transformer, namely DATSR, with multiple scales, each of which consists of a texture feature encoder (TFE) module, a reference-based deformable attention (RDA) module, and a residual feature aggregation (RFA) module. |
Jiezhang Cao; Jingyun Liang; Kai Zhang; Yawei Li; Yulun Zhang; Wenguan Wang; Luc Van Gool; |
738 | Local Color Distributions Prior for Image Enhancement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on this observation, we propose in this paper to exploit these LCDs as a prior for locating and enhancing the two types of regions (i.e., over-/under-exposed regions). Third, we construct a new dataset to facilitate the learning process, by following the camera image signal processing (ISP) pipeline to render standard RGB images with both under-/over-exposures from raw data. |
Haoyuan Wang; Ke Xu; Rynson W.H. Lau; |
739 | L-CoDer: Language-Based Colorization with Color-Object Decoupling Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we introduce transformer into language-based colorization to tackle the aforementioned issues while keeping the language decoupling property. |
Zheng Chang; Shuchen Weng; Yu Li; Si Li; Boxin Shi; |
740 | From Face to Natural Image: Learning Real Degradation for Blind Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: More importantly, our method provides a new way to handle the real-world complex scenarios by learning their degradation representations from the facial portions, which can be used to significantly improve the quality of non-facial areas. |
Xiaoming Li; Chaofeng Chen; Xianhui Lin; Wangmeng Zuo; Lei Zhang; |
741 | Towards Interpretable Video Super-Resolution Via Alternating Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study a practical space-time video super-resolution (STVSR) problem which aims at generating a high-framerate high-resolution sharp video from a low-framerate low-resolution blurry video. |
Jiezhang Cao; Jingyun Liang; Kai Zhang; Wenguan Wang; Qin Wang; Yulun Zhang; Hao Tang; Luc Van Gool; |
742 | Event-Based Fusion for Motion Deblurring with Cross-Modal Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we rethink the event-based image deblurring problem and unfold it into an end-to-end two-stage image restoration network. |
Lei Sun; Christos Sakaridis; Jingyun Liang; Qi Jiang; Kailun Yang; Peng Sun; Yaozu Ye; Kaiwei Wang; Luc Van Gool; |
743 | Fast and High Quality Image Denoising Via Malleable Convolution Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we present Malleable Convolution (MalleConv), which performs spatially varying processing with minimal computational overhead. |
Yifan Jiang; Bartlomiej Wronski; Ben Mildenhall; Jonathan T. Barron; Zhangyang Wang; Tianfan Xue; |
744 | TAPE: Task-Agnostic Prior Embedding for Image Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel approach that embeds a task-agnostic prior into a transformer. |
Lin Liu; Lingxi Xie; Xiaopeng Zhang; Shanxin Yuan; Xiangyu Chen; Wengang Zhou; Houqiang Li; Qi Tian; |
745 | Uncertainty Inspired Underwater Image Enhancement Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we resolve UIE into distribution estimation and consensus process. |
Zhenqi Fu; Wu Wang; Yue Huang; Xinghao Ding; Kai-Kuang Ma; |
746 | Hourglass Attention Network for Image Inpainting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Compared to convolution, attention has a lower inductive bias, and its output is highly correlated with the input, making it more suitable for processing images with various types of breakage. Inspired by this, in this paper we propose a novel attention-based network (transformer), called the hourglass attention network (HAN), for image inpainting, which builds an hourglass-shaped attention structure to generate appropriate features for complemented images. |
Ye Deng; Siqi Hui; Rongye Meng; Sanping Zhou; Jinjun Wang; |
747 | Unfolded Deep Kernel Estimation for Blind Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel unfolded deep kernel estimation (UDKE) method, which, for the first time to our best knowledge, explicitly solves the data term with high efficiency. |
Hongyi Zheng; Hongwei Yong; Lei Zhang; |
748 | Event-Guided Deblurring of Unknown Exposure Time Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we address the event-guided motion deblurring assuming dynamically variable unknown exposure time of the frame-based camera. |
Taewoo Kim; Jeongmin Lee; Lin Wang; Kuk-Jin Yoon; |
749 | ReCoNet: Recurrent Correction Network for Fast and Efficient Multi-Modality Image Fusion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Concretely, we design a deformation module to explicitly compensate geometrical distortions and an attention mechanism to mitigate ghosting-like artifacts, respectively. |
Zhanbo Huang; Jinyuan Liu; Xin Fan; Risheng Liu; Wei Zhong; Zhongxuan Luo; |
750 | Content Adaptive Latents and Decoder for Neural Image Compression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a new NIC framework that improves the content adaptability of both the latents and the decoder. |
Guanbo Pan; Guo Lu; Zhihao Hu; Dong Xu; |
751 | Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an efficient and effective degradation-adaptive super-resolution (DASR) network, whose parameters are adaptively specified by estimating the degradation of each input image. |
Jie Liang; Hui Zeng; Lei Zhang; |
752 | Unidirectional Video Denoising By Mimicking Backward Recurrent Modules with Look-Ahead Forward Ones Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the offline issue of BiRNN, we present a novel recurrent network consisting of forward and look-ahead recurrent modules for unidirectional video denoising. |
Junyi Li; Xiaohe Wu; Zhenxing Niu; Wangmeng Zuo; |
753 | Self-Supervised Learning for Real-World Super-Resolution from Dual Zoomed Observations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we consider two challenging issues in reference-based super-resolution (RefSR), (i) how to choose a proper reference image, and (ii) how to learn real-world RefSR in a self-supervised manner. |
Zhilu Zhang; Ruohao Wang; Hongzhi Zhang; Yunjin Chen; Wangmeng Zuo; |
754 | Secrets of Event-Based Optical Flow Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We develop a principled method to extend the Contrast Maximization framework to estimate optical flow from events alone. |
Shintaro Shiba; Yoshimitsu Aoki; Guillermo Gallego; |
755 | Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoiréing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we explore moiré pattern removal for ultra-high-definition images. |
Xin Yu; Peng Dai; Wenbo Li; Lan Ma; Jiajun Shen; Jia Li; Xiaojuan Qi; |
756 | ERDN: Equivalent Receptive Field Deformable Network for Video Deblurring Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose an equivalent receptive field deformable network (ERDN) to perform alignment at the feature level without estimating optical flow. |
Bangrui Jiang; Zhihuai Xie; Zhen Xia; Songnan Li; Shan Liu; |
757 | Rethinking Generic Camera Models for Deep Single Image Camera Calibration to Recover Rotation and Fisheye Distortion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This degradation is caused by mismatching between the actual projection and expected projection. To address this problem, we propose a generic camera model that has the potential to address various types of distortion. |
Nobuhiko Wakai; Satoshi Sato; Yasunori Ishii; Takayoshi Yamashita; |
758 | ART-SS: An Adaptive Rejection Technique for Semi-Supervised Restoration for Adverse Weather-Affected Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although various weather degradation synthesis methods exist in the literature, the use of synthetically generated weather-degraded images often results in sub-optimal performance on real weather-degraded images due to the domain gap between synthetic and real-world images. To deal with this problem, various semi-supervised restoration (SSR) methods have been proposed for deraining or dehazing, which learn to restore the clean image using synthetically generated datasets while generalizing better using unlabeled real-world images. |
Rajeev Yasarla; Carey E. Priebe; Vishal M. Patel; |
759 | Fusion from Decomposition: A Self-Supervised Decomposition Approach for Image Fusion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a powerful image decomposition model for fusion task via the self-supervised representation learning, dubbed Decomposition for Fusion (DeFusion). |
Pengwei Liang; Junjun Jiang; Xianming Liu; Jiayi Ma; |
760 | Learning Degradation Representations for Image Deblurring Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a framework to learn spatially adaptive degradation representations of blurry images. |
Dasong Li; Yi Zhang; Ka Chun Cheung; Xiaogang Wang; Hongwei Qin; Hongsheng Li; |
761 | Learning Mutual Modulation for Self-Supervised Cross-Modal Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing methods utilize pseudo or weak supervision in LR space and thus deliver results that are blurry or not faithful to the source modality. To address this issue, we present a mutual modulation SR (MMSR) model, which tackles the task by a mutual modulation strategy, including a source-to-guide modulation and a guide-to-source modulation. |
Xiaoyu Dong; Naoto Yokoya; Longguang Wang; Tatsumi Uezato; |
762 | Spectrum-Aware and Transferable Architecture Search for Hyperspectral Image Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we disentangle the 3D convolution into lightweight 2D spatial and spectral convolutions, and build a spectrum-aware search space for HSI restoration. |
Wei He; Quanming Yao; Naoto Yokoya; Tatsumi Uezato; Hongyan Zhang; Liangpei Zhang; |
763 | Neural Color Operators for Sequential Image Retouching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel image retouching method by modeling the retouching process as performing a sequence of newly introduced trainable neural color operators. |
Yili Wang; Xin Li; Kun Xu; Dongliang He; Qi Zhang; Fu Li; Errui Ding; |
764 | Optimizing Image Compression Via Joint Learning with Denoising Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, we propose a novel two-branch, weight-sharing architecture with plug-in feature denoisers to allow a simple and effective realization of the goal with little computational cost. |
Ka Leong Cheng; Yueqi Xie; Qifeng Chen; |
765 | Restore Globally, Refine Locally: A Mask-Guided Scheme to Accelerate Super-Resolution Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Different areas of an image often require different SR intensities from networks of different complexity. Motivated by this, in this paper, we propose a Mask Guided Acceleration (MGA) scheme to reduce the computational costs of existing SR networks while maintaining their SR capability. |
Xiaotao Hu; Jun Xu; Shuhang Gu; Ming-Ming Cheng; Li Liu; |
766 | Compiler-Aware Neural Architecture Search for On-Mobile Real-Time Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, prior methods typically suffer from large amounts of computations and huge power consumption, causing difficulties for real-time inference, especially on resourcelimited platforms such as mobile devices. To mitigate this, we propose a compiler-aware SR neural architecture search (NAS) framework that conducts depth search and per-layer width search with adaptive SR blocks. |
Yushu Wu; Yifan Gong; Pu Zhao; Yanyu Li; Zheng Zhan; Wei Niu; Hao Tang; Minghai Qin; Bin Ren; Yanzhi Wang; |
767 | Modeling Mask Uncertainty in Hyperspectral Image Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This mask-specific training style will lead to a hardware miscalibration issue, which sets up barriers to deploying deep HSI models among different hardware and noisy environments. To address this challenge, we introduce mask uncertainty for HSI with a complete variational Bayesian learning treatment and explicitly model it through a mask decomposition inspired by real hardware. |
Jiamian Wang; Yulun Zhang; Xin Yuan; Ziyi Meng; Zhiqiang Tao; |
768 | Perceiving and Modeling Density for Image Dehazing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address the problem of modeling real-world haze degradation, we propose a novel Separable Hybrid Attention (SHA) module that perceives haze density by capturing position-sensitive features in orthogonal directions. |
Tian Ye; Yunchen Zhang; Mingchao Jiang; Liang Chen; Yun Liu; Sixiang Chen; Erkang Chen; |
769 | Stripformer: Strip Transformer for Fast Image Deblurring Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by the current success of transformers on computer vision and image processing tasks, we develop Stripformer, a transformer-based architecture that constructs intra- and inter-strip tokens to reweight image features in the horizontal and vertical directions to catch blurred patterns with different orientations. |
Fu-Jen Tsai; Yan-Tsung Peng; Yen-Yu Lin; Chung-Chi Tsai; Chia-Wen Lin; |
770 | Deep Fourier-Based Exposure Correction Network with Spatial-Frequency Interaction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a new perspective for exposure correction with spatial-frequency interaction. |
Jie Huang; Yajing Liu; Feng Zhao; Keyu Yan; Jinghao Zhang; Yukun Huang; Man Zhou; Zhiwei Xiong; |
771 | Frequency and Spatial Dual Guidance for Image Dehazing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel image dehazing framework with frequency and spatial dual guidance. |
Hu Yu; Naishan Zheng; Man Zhou; Jie Huang; Zeyu Xiao; Feng Zhao; |
772 | Towards Real-World HDRTV Reconstruction: A Data Synthesis-Based Approach Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we argue that, although traditional TMOs exploit efficient dynamic range compression priors, they have several drawbacks in modeling realistic degradation: information over-preservation, color bias and possible artifacts, making the trained reconstruction networks hard to generalize well to real-world cases. |
Zhen Cheng; Tao Wang; Yong Li; Fenglong Song; Chang Chen; Zhiwei Xiong; |
773 | Learning Discriminative Shrinkage Deep Networks for Image Deconvolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, explicitly designing these two terms is quite challenging and usually leads to complex optimization problems which are difficult to solve. This paper proposes an effective non-blind deconvolution approach by learning discriminative shrinkage functions to model these terms implicitly. |
Pin-Hung Kuo; Jinshan Pan; Shao-Yi Chien; Ming-Hsuan Yang; |
774 | KXNet: A Model-Driven Deep Neural Network for Blind Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, to solve the classical SISR model, we propose a simple-yet-effective iterative algorithm. |
Jiahong Fu; Hong Wang; Qi Xie; Qian Zhao; Deyu Meng; Zongben Xu; |
775 | ARM: Any-Time Super-Resolution Method Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes an Any-time super-Resolution Method (ARM) to tackle the over-parameterized single image super-resolution (SISR) models. |
Bohong Chen; Mingbao Lin; Kekai Sheng; Mengdan Zhang; Peixian Chen; Ke Li; Liujuan Cao; Rongrong Ji; |
776 | Attention-Aware Learning for Hyperparameter Prediction in Image Processing Pipelines Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose an attention-aware learning method that integrates the parameter prediction network into ISP tuning and utilizes the multi-attention mechanism to generate the attentive mapping between the input RAW image and the parameter space. |
Haina Qin; Longfei Han; Juan Wang; Congxuan Zhang; Yanwei Li; Bing Li; Weiming Hu; |
777 | RealFlow: EM-Based Realistic Optical Flow Dataset Generation from Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Besides, existing approaches try to adapt the trained model on synthetic datasets to authentic videos, which inevitably suffers from domain discrepancy and hinders the performance for real-world applications. To solve these problems, we propose RealFlow, an Expectation-Maximization based framework that can create large-scale optical flow datasets directly from any unlabeled realistic videos. |
Yunhui Han; Kunming Luo; Ao Luo; Jiangyu Liu; Haoqiang Fan; Guiming Luo; Shuaicheng Liu; |
778 | Memory-Augmented Model-Driven Network for Pansharpening Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel memory-augmented model-driven deep unfolding network for pan-sharpening. |
Keyu Yan; Man Zhou; Li Zhang; Chengjun Xie; |
779 | All You Need Is RAW: Defending Against Adversarial Attacks with Camera Image Pipelines Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, we propose a model-agnostic adversarial defensive method, which maps the input RGB images to Bayer RAW space and back to output RGB using a learned camera image signal processing (ISP) pipeline to eliminate potential adversarial patterns. |
Yuxuan Zhang; Bo Dong; Felix Heide; |
780 | Ghost-Free High Dynamic Range Imaging with Context-Aware Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel Context-aware Vision Transformer (CA-ViT) for ghost-free high dynamic range imaging. |
Zhen Liu; Yinglong Wang; Bing Zeng; Shuaicheng Liu; |
781 | Style-Guided Shadow Removal Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: After shadow removal, the shadow and non-shadow regions may exhibit inconsistent appearance, leading to a visually disharmonious image. To address this problem, we propose a style-guided shadow removal network (SG-ShadowNet) for better image style consistency after shadow removal. |
Jin Wan; Hui Yin; Zhenyao Wu; Xinyi Wu; Yanting Liu; Song Wang; |
782 | D2C-SR: A Divergence to Convergence Approach for Real-World Image Super-Resolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present D2C-SR, a novel framework for the task of real-world image super-resolution. |
Youwei Li; Haibin Huang; Lanpeng Jia; Haoqiang Fan; Shuaicheng Liu; |
783 | GRIT-VLP: Grouped Mini-Batch Sampling for Efficient Vision and Language Pre-training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast to the mainstream VLP methods, we highlight that two routinely applied steps during pre-training have crucial impact on the performance of the pre-trained model: in-batch hard negative sampling for image-text matching (ITM) and assigning a large masking probability for masked language modeling (MLM). |
Jaeseok Byun; Taebaek Hwang; Jianlong Fu; Taesup Moon; |
784 | Efficient Video Deblurring Guided By Motion Magnitude Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a novel framework that utilizes the motion magnitude prior (MMP) as guidance for efficient deep video deblurring. We then build a dataset including the blurry frame and MMP pairs. |
Yusheng Wang; Yunfan Lu; Ye Gao; Lin Wang; Zhihang Zhong; Yinqiang Zheng; Atsushi Yamashita; |
785 | Single Frame Atmospheric Turbulence Mitigation: A Benchmark Study and A New Physics-Inspired Transformer Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address this problem, in this paper, we propose a physics-inspired transformer model for imaging through atmospheric turbulence. |
Zhiyuan Mao; Ajay Jaiswal; Zhangyang Wang; Stanley H. Chan; |
786 | Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by the adaptive characteristics of the transformers, we propose a transformer-based context model, named Contextformer, which generalizes the de facto standard attention mechanism to spatio-channel attention. |
A. Burakhan Koyuncu; Han Gao; Atanas Boev; Georgii Gaikov; Elena Alshina; Eckehard Steinbach; |
787 | Image Super-Resolution with Deep Dictionary Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose an end-to-end super-resolution network with a deep dictionary (SRDD), where a high-resolution dictionary is explicitly learned without sacrificing the advantages of deep learning. |
Shunta Maeda; |
788 | TempFormer: Temporally Consistent Transformer for Video Denoising Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a modified ViT architecture for video processing tasks, introducing a new training strategy and loss function to enhance temporal consistency without compromising spatial quality. |
Mingyang Song; Yang Zhang; Tunç O. Aydın; |
789 | RAWtoBit: A Fully End-to-End Camera ISP Network Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we investigate the designing of a fully end-to-end optimized camera ISP incorporating image compression. |
Wooseok Jeong; Seung-Won Jung; |
790 | DRCNet: Dynamic Image Restoration Contrastive Network Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Most existing image restoration methods employ static CNN-based models, whose fixed learned filters cannot fit diverse degradations well. To address this, in this paper, we propose a novel Dynamic Image Restoration Contrastive Network (DRCNet). |
Fei Li; Lingfeng Shen; Yang Mi; Zhenbo Li; |
791 | Zero-Shot Learning for Reflection Removal of Single 360-Degree Image Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, in many cases, real reflection artifacts are sharp and intense enough that even humans cannot completely distinguish between the transmitted and reflected scenes. In this paper, we attempt to remove such challenging reflection artifacts using 360-degree images. |
Byeong-Ju Han; Jae-Young Sim; |
792 | Transformer with Implicit Edges for Particle-Based Physics Simulation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Consequently, in this paper we propose a novel Transformer-based method, dubbed as Transformer with Implicit Edges (TIE), to capture the rich semantics of particle interactions in an edge-free manner. |
Yidi Shao; Chen Change Loy; Bo Dai; |
793 | Rethinking Video Rain Streak Removal: A New Synthesis Model and A Deraining Network with Video Rain Prior Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing video rain synthesis models and deraining methods are mostly built on a simplified video rain model assuming that rain streak layers of different video frames are uncorrelated, thereby producing degraded performance on real-world rainy videos. To address this problem, we devise a new video rain synthesis model with the concept of rain streak motions to enforce a consistency of rain layers between video frames, thereby generating more realistic rainy video data for network training, and then develop a recurrent disentangled deraining network (RDD-Net) based on our video rain model for boosting video deraining. |
Shuai Wang; Lei Zhu; Huazhu Fu; Jing Qin; Carola-Bibiane Schönlieb; Wei Feng; Song Wang; |
794 | Super-Resolution By Predicting Offsets: An Ultra-Efficient Super-Resolution Network for Rasterized Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a new method for real-time SR for computer graphics, namely Super-Resolution by Predicting Offsets (SRPO). |
Jinjin Gu; Haoming Cai; Chenyu Dong; Ruofan Zhang; Yulun Zhang; Wenming Yang; Chun Yuan; |
795 | Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Therefore, the results tend to converge to the mean of the multi-modal possibilities. In this paper, we explicitly account for such motion ambiguity, allowing us to generate multiple plausible solutions all in sharp detail. |
Zhihang Zhong; Xiao Sun; Zhirong Wu; Yinqiang Zheng; Stephen Lin; Imari Sato; |
796 | AlphaVC: High-Performance and Efficient Learned Video Compression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose several techniques to effectively improve the performance. |
Yibo Shi; Yunying Ge; Jing Wang; Jue Mao; |
797 | Content-Oriented Learned Image Compression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a content-oriented image compression method, which handles different kinds of image contents with different strategies. |
Meng Li; Shangyin Gao; Yihui Feng; Yibo Shi; Jing Wang; |
798 | RRSR: Reciprocal Reference-Based Image Super-Resolution with Progressive Feature Alignment and Selection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While previous state-of-the-art RefSR methods mainly focus on improving the efficacy and robustness of reference feature transfer, it is generally overlooked that a well-reconstructed SR image should enable better SR reconstruction for similar LR images when it is used as a reference. Therefore, in this work, we propose a reciprocal learning framework that can appropriately leverage such a fact to reinforce the learning of a RefSR network. |
Lin Zhang; Xin Li; Dongliang He; Fu Li; Yili Wang; Zhaoxiang Zhang; |
799 | Contrastive Prototypical Network with Wasserstein Confidence Penalty Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose Wasserstein Confidence Penalty which can impose appropriate penalty on overconfident predictions based on the semantic relationships among pseudo classes. |
Haoqing Wang; Zhi-Hong Deng; |
800 | Learn-to-Decompose: Cascaded Decomposition Network for Cross-Domain Few-Shot Facial Expression Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we address the compound FER task in the cross-domain few-shot learning (FSL) setting, which requires only a few samples of compound expressions in the target domain. |
Xinyi Zou; Yan Yan; Jing-Hao Xue; Si Chen; Hanzi Wang; |
801 | Self-Support Few-Shot Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by the simple Gestalt principle that pixels belonging to the same object are more similar than those belonging to different objects of the same class, we propose a novel self-support matching idea to alleviate this problem. |
Qi Fan; Wenjie Pei; Yu-Wing Tai; Chi-Keung Tang; |
802 | Few-Shot Object Detection with Model Calibration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we pinpoint and comprehensively investigate the model bias problem in FSOD models and propose a simple yet effective method to address the model bias problem with the facilitation of model calibrations in three levels: 1) Backbone calibration to preserve the well-learned prior knowledge and relieve the model bias toward base classes, 2) RPN calibration to rescue unlabeled objects of novel classes, and 3) Detector calibration to prevent the model bias toward a few training samples for novel classes. |
Qi Fan; Chi-Keung Tang; Yu-Wing Tai; |
803 | Self-Supervision Can Be A Good Few-Shot Learner Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: From an information-theoretic perspective, we propose an effective unsupervised FSL method, learning representations with self-supervision. |
Yuning Lu; Liangjian Wen; Jianzhuang Liu; Yajing Liu; Xinmei Tian; |
804 | TSF: Transformer-Based Semantic Filter for Few-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose a light and universal module named transformer-based Semantic Filter (tSF), which can be applied for different FSL tasks. |
Jinxiang Lai; Siqian Yang; Wenlong Liu; Yi Zeng; Zhongyi Huang; Wenlong Wu; Jun Liu; Bin-Bin Gao; Chengjie Wang; |
805 | Adversarial Feature Augmentation for Cross-Domain Few-Shot Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most existing works may fail to generalize to novel classes due to the probably large domain discrepancy across domains. To address this issue, we propose a novel adversarial feature augmentation (AFA) method to bridge the domain gap in few-shot learning. |
Yanxu Hu; Andy J. Ma; |
806 | Constructing Balance from Imbalance for Long-Tailed Image Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To alleviate the head-to-tail bias, we propose a concise paradigm by progressively adjusting label space and dividing the head classes and tail classes, dynamically constructing balance from imbalance to facilitate the classification. |
Yue Xu; Yong-Lu Li; Jiefeng Li; Cewu Lu; |
807 | On Multi-Domain Long-Tailed Recognition, Imbalanced Domain Generalization and Beyond Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We formalize the task of Multi-Domain Long-Tailed Recognition (MDLT), which learns from multi-domain imbalanced data, addresses label imbalance, domain shift, and divergent label distributions across domains, and generalizes to all domain-class pairs. |
Yuzhe Yang; Hao Wang; Dina Katabi; |
808 | Few-Shot Video Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Few-Shot Video Object Detection (FSVOD) with three contributions to visual learning in our highly diverse and dynamic world: 1) a large-scale video dataset FSVOD-500 comprising 500 classes with class-balanced videos in each category for few-shot learning; 2) a novel Tube Proposal Network (TPN) to generate high-quality video tube proposals for aggregating feature representation for the target video object, which can be highly dynamic; and 3) a strategically improved Temporal Matching Network (TMN+) for matching representative query tube features with better discriminative ability, thus achieving higher diversity. |
Qi Fan; Chi-Keung Tang; Yu-Wing Tai; |
809 | Worst Case Matters for Few-Shot Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Since this objective is not accessible, we propose to reduce the standard deviation and increase the average accuracy simultaneously. |
Minghao Fu; Yun-Hao Cao; Jianxin Wu; |
810 | Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a Hierarchical Graphical knowledge Representation framework for the confidence-based classification method, dubbed as HGR-Net. |
Kai Yi; Xiaoqian Shen; Yunhao Gou; Mohamed Elhoseiny; |
811 | Doubly Deformable Aggregation of Covariance Matrices for Few-Shot Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: For the few-shot segmentation task, the main challenge is how to accurately measure the semantic correspondence between the support and query samples with limited training data. To address this problem, we propose to aggregate the learnable covariance matrices with a deformable 4D Transformer to effectively predict the segmentation map. |
Zhitong Xiong; Haopeng Li; Xiao Xiang Zhu; |
812 | Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose Dense pixel-wise Cross-query-and-support Attention weighted Mask Aggregation (DCAMA), where both foreground and background support information are fully exploited via multi-level pixel-wise correlations between paired query and support features. |
Xinyu Shi; Dong Wei; Yu Zhang; Donghuan Lu; Munan Ning; Jiashun Chen; Kai Ma; Yefeng Zheng; |
813 | Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, it often suffers from label inconsistency and limited diversity, which leads to poor performance. In this work, we prove that the core reason for this comes from the lack of a clustering-friendly property in the embedding space. |
Xingping Dong; Jianbing Shen; Ling Shao; |
814 | CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel cluster-based representation, which regularizes the learning process, yielding a representation that generalizes well to instances from unseen classes. |
Shreyank N Gowda; Laura Sevilla-Lara; Frank Keller; Marcus Rohrbach; |
815 | Few-Shot Class-Incremental Learning for 3D Point Cloud Objects Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we address FSCIL in the 3D domain. |
Townim Chowdhury; Ali Cheraghian; Sameera Ramasinghe; Sahar Ahmadi; Morteza Saberi; Shafin Rahman; |
816 | Meta-Learning with Less Forgetting on Large-Scale Non-stationary Task Distributions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Two key challenges arise in this more realistic setting: (i) how to use unlabeled data in the presence of a large amount of unlabeled out-of-distribution (OOD) data and (ii) how to prevent catastrophic forgetting on previously learned task distributions due to the task distribution shift. We propose an OOD Robust and knowleDge presErved semi-supeRvised meta-learning approach (ORDER), where ORDER denotes that the task distributions arrive sequentially in some order, to tackle these two major challenges. |
Zhenyi Wang; Li Shen; Le Fang; Qiuling Suo; Donglin Zhan; Tiehang Duan; Mingchen Gao; |
817 | DNA: Improving Few-Shot Transfer Learning with Low-Rank Decomposition and Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose to boost the transferability of self-supervised pre-trained models on cross-domain tasks via a novel self-supervised alignment step on the target domain using only unlabeled data, before conducting the downstream supervised fine-tuning. |
Ziyu Jiang; Tianlong Chen; Xuxi Chen; Yu Cheng; Luowei Zhou; Lu Yuan; Ahmed Awadallah; Zhangyang Wang; |
818 | Learning Instance and Task-Aware Dynamic Kernels for Few-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose to learn the dynamic kernels of a convolution network as a function of the task at hand, enabling faster generalization. |
Rongkai Ma; Pengfei Fang; Gil Avraham; Yan Zuo; Tianyu Zhu; Tom Drummond; Mehrtash Harandi; |
819 | Open-World Semantic Segmentation Via Contrasting and Clustering Vision-Language Embedding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any efforts on dense annotations, by purely exploiting the image-caption data that naturally exist on the Internet. |
Quande Liu; Youpeng Wen; Jianhua Han; Chunjing Xu; Hang Xu; Xiaodan Liang; |
820 | Few-Shot Classification with Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel contrastive learning-based framework that seamlessly integrates contrastive learning into both stages to improve the performance of few-shot classification. |
Zhanyuan Yang; Jinghua Wang; Yingying Zhu; |
821 | Time-rEversed DiffusioN TEnsor Transformer: A New TENET of Few-Shot Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we tackle the challenging problem of Few-shot Object Detection. |
Shan Zhang; Naila Murray; Lei Wang; Piotr Koniusz; |
822 | Self-Promoted Supervision for Few-Shot Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we empirically find that with the same few-shot learning frameworks, replacing the widely used CNN feature extractor with a ViT model often severely impairs few-shot classification performance. |
Bowen Dong; Pan Zhou; Shuicheng Yan; Wangmeng Zuo; |
823 | Few-Shot Object Counting and Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Given a few exemplar bounding boxes of a target object class, we seek to count and detect all the objects of the target class. |
Thanh Nguyen; Chau Pham; Khoi Nguyen; Minh Hoai; |
824 | Rethinking Few-Shot Object Detection on A Multi-Domain Benchmark Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a Multi-dOmain Few-Shot Object Detection (MoFSOD) benchmark consisting of 10 datasets from a wide range of domains to evaluate FSOD algorithms. |
Kibok Lee; Hao Yang; Satyaki Chakraborty; Zhaowei Cai; Gurumurthy Swaminathan; Avinash Ravichandran; Onkar Dabeer; |
825 | Cross-Domain Cross-Set Few-Shot Learning Via Learning Compact and Aligned Representations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we consider the domain shift problem in FSL and aim to address the domain gap between the support set and the query set. |
Wentao Chen; Zhang Zhang; Wei Wang; Liang Wang; Zilei Wang; Tieniu Tan; |
826 | Mutually Reinforcing Structure with Proposal Contrastive Consistency for Few-Shot Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The majority of former solutions are mainly based on meta-learning or transfer-learning, neglecting the fact that images from the base set might contain unlabeled novel-class objects, which easily leads to performance degradation and poor plasticity since those novel objects are treated as background. Based on the above phenomena, we propose a Mutually Reinforcing Structure Network (MRSN) to make rational use of unlabeled novel-class instances in the base set. |
Tianxue Ma; Mingwei Bi; Jian Zhang; Wang Yuan; Zhizhong Zhang; Yuan Xie; Shouhong Ding; Lizhuang Ma; |
827 | Dual Contrastive Learning with Anatomical Auxiliary Supervision for Few-Shot Medical Image Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a few-shot segmentation model that employs anatomical auxiliary information from medical images without target classes for dual contrastive learning. |
Huisi Wu; Fangyan Xiao; Chongxin Liang; |
828 | Improving Few-Shot Learning Through Multi-task Representation Learning Theory Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we consider the framework of multi-task representation (MTR) learning where the goal is to use source tasks to learn a representation that reduces the sample complexity of solving a target task. |
Quentin Bouniot; Ievgen Redko; Romaric Audigier; Angélique Loesch; Amaury Habrard; |
829 | Tree Structure-Aware Few-Shot Image Classification Via Hierarchical Aggregation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we mainly focus on the problem of how to learn additional feature representations for few-shot image classification through pretext tasks (e.g., rotation or color permutation). |
Min Zhang; Siteng Huang; Wenbin Li; Donglin Wang; |
830 | Inductive and Transductive Few-Shot Video Classification Via Appearance and Temporal Alignments Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel method for few-shot video classification, which performs appearance and temporal alignments. |
Khoi D. Nguyen; Quoc-Huy Tran; Khoi Nguyen; Binh-Son Hua; Rang Nguyen; |
831 | Temporal and Cross-Modal Attention for Audio-Visual Zero-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a multi-modal and Temporal Cross-attention Framework for audio-visual generalised zero-shot learning. |
Otniel-Bogdan Mercea; Thomas Hummel; A. Sophia Koepke; Zeynep Akata; |
832 | HM: Hybrid Masking for Few-Shot Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we develop a simple, effective, and efficient approach to enhance feature masking (FM). |
Seonghyeon Moon; Samuel S. Sohn; Honglu Zhou; Sejong Yoon; Vladimir Pavlovic; Muhammad Haris Khan; Mubbasir Kapadia; |
833 | TransVLAD: Focusing on Locally Aggregated Descriptors for Few-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a transformer framework for few-shot learning, termed TransVLAD, with one focus showing the power of locally aggregated descriptors for few-shot learning. |
Haoquan Li; Laoming Zhang; Daoan Zhang; Lang Fu; Peng Yang; Jianguo Zhang; |
834 | Kernel Relative-Prototype Spectral Filtering for Few-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a framework of spectral filtering (shrinkage) for measuring the difference between query samples and prototypes, or namely the relative prototypes, in a reproducing kernel Hilbert space (RKHS). |
Tao Zhang; Wu Huang; |
835 | “This Is My Unicorn, Fluffy”: Personalizing Frozen Vision-Language Representations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a new learning setup called Personalized Vision & Language (PerVL) with two new benchmark datasets for retrieving and segmenting user-specific (personalized) concepts “in the wild”. |
Niv Cohen; Rinon Gal; Eli A. Meirom; Gal Chechik; Yuval Atzmon; |
836 | CLOSE: Curriculum Learning on The Sharing Extent Towards Better One-Shot NAS Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: But these improved methods introduce a large number of extra parameters and thus cause an undesirable trade-off between the training costs and the ranking quality. To alleviate the above issues, we propose to apply Curriculum Learning On Sharing Extent (CLOSE) to train the supernet both efficiently and effectively. |
Zixuan Zhou; Xuefei Ning; Yi Cai; Jiashu Han; Yiping Deng; Yuhan Dong; Huazhong Yang; Yu Wang; |
837 | Streamable Neural Fields Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose streamable neural fields, a single model that consists of executable sub-networks of various widths. |
Junwoo Cho; Seungtae Nam; Daniel Rho; Jong Hwan Ko; Eunbyung Park; |
838 | Gradient-Based Uncertainty for Monocular Depth Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a post hoc uncertainty estimation approach for an already trained and thus fixed depth estimation model, represented by a deep neural network. |
Julia Hornauer; Vasileios Belagiannis; |
839 | Online Continual Learning with Contrastive Vision Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a framework Contrastive Vision Transformer (CVT), which designs a focal contrastive learning strategy based on a transformer architecture, to achieve a better stability-plasticity trade-off for online CL. |
Zhen Wang; Liu Liu; Yajing Kong; Jiaxian Guo; Dacheng Tao; |
840 | CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose CPrune, a compiler-informed model pruning for efficient target-aware DNN execution to support an application with a required target accuracy. |
Taeho Kim; Yongin Kwon; Jemin Lee; Taeho Kim; Sangtae Ha; |
841 | EAutoDet: Efficient Architecture Search for Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast, this paper introduces an efficient framework, named EAutoDet, that can discover practical backbone and FPN architectures for object detection in 1.4 GPU-days. |
Xiaoxing Wang; Jiale Lin; Juanping Zhao; Xiaokang Yang; Junchi Yan; |
842 | A Max-Flow Based Approach for Neural Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Unlike previous NAS strategies based on reinforcement learning, genetic algorithm, Bayesian optimization, and differential programming method, we formulate the NAS task as a Max-Flow problem on search space consisting of Directed Acyclic Graph (DAG) and thus propose a novel NAS approach, called MF-NAS, which defines the search space and designs the search strategy in a fully graphic manner. |
Chao Xue; Xiaoxing Wang; Junchi Yan; Chun-Guang Li; |
843 | OccamNets: Mitigating Dataset Bias By Favoring Simpler Hypotheses Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new direction: modifying the network architecture to impose inductive biases that make the network robust to dataset bias. |
Robik Shrestha; Kushal Kafle; Christopher Kanan; |
844 | ERA: Enhanced Rational Activations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Despite their apparent potential, prior formulations are either not safe, not smooth or not true rational functions, and they only work with careful initialisation. Aiming to mitigate these issues, we propose a novel, enhanced rational function, ERA, and investigate how to better accommodate the specific needs of these activations, to both network components and training regime. |
Martin Trimmel; Mihai Zanfir; Richard Hartley; Cristian Sminchisescu; |
845 | Convolutional Embedding Makes Hierarchical Vision Transformer Stronger Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we investigate the problem by profoundly exploring how the macro architecture of the hybrid CNNs/ViTs enhances the performances of hierarchical ViTs. |
Cong Wang; Hongmin Xu; Xiong Zhang; Li Wang; Zhitong Zheng; Haifeng Liu; |
846 | Active Label Correction Using Robust Parameter Update and Entropy Propagation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Also, naively selecting a batch of low confidence examples can result in redundant labeling of spatially adjacent examples. We present a new ALC algorithm that addresses these challenges. |
Kwang In Kim; |
847 | Unpaired Image Translation Via Vector Symbolic Architectures Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, if the source and target domains have a large semantic mismatch, existing techniques often suffer from source content corruption aka semantic flipping. To address this problem, we propose a new paradigm for image-to-image translation using Vector Symbolic Architectures (VSA), a theoretical framework which defines algebraic operations in a high-dimensional vector (hypervector) space. |
Justin Theiss; Jay Leverett; Daeil Kim; Aayush Prakash; |
848 | UniNet: Unified Architecture Search with Convolution, Transformer, and MLP Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we study the learnable combination of convolution, transformer, and MLP by proposing a novel unified architecture search approach. |
Jihao Liu; Xin Huang; Guanglu Song; Hongsheng Li; Yu Liu; |
849 | AMixer: Adaptive Weight Mixing for Self-Attention Free Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we thoroughly investigate the key differences between vision Transformers and recent all-MLP models. |
Yongming Rao; Wenliang Zhao; Jie Zhou; Jiwen Lu; |
850 | TinyViT: Fast Pretraining Distillation for Small Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most prevailing ViT models suffer from a huge number of parameters, restricting their applicability on devices with limited resources. To alleviate this issue, we propose TinyViT, a new family of tiny and efficient vision transformers pretrained on large-scale datasets with our proposed fast distillation framework. |
Kan Wu; Jinnian Zhang; Houwen Peng; Mengchen Liu; Bin Xiao; Jianlong Fu; Lu Yuan; |
851 | Equivariant Hypergraph Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: On the other hand, tensor-based equivariant neural networks enjoy maximal expressiveness, but their application has been limited in hypergraphs due to heavy computation and strict assumptions on fixed-order hyperedges. We resolve these problems and present Equivariant Hypergraph Neural Network (EHNN), the first attempt to realize maximally expressive equivariant layers for general hypergraph learning. |
Jinwoo Kim; Saeyoon Oh; Sungjun Cho; Seunghoon Hong; |
852 | ScaleNet: Searching for The Model to Scale Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we bridge both two components and propose ScaleNet to jointly search base model and scaling strategy so that the scaled large model can have more promising performance. |
Jiyang Xie; Xiu Su; Shan You; Zhanyu Ma; Fei Wang; Chen Qian; |
853 | Complementing Brightness Constancy with Deep Networks for Optical Flow Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Since BC is an approximate physical model violated in several situations, we propose to train a physically-constrained network complemented with a data-driven network. |
Vincent Le Guen; Clément Rambour; Nicolas Thome; |
854 | ViTAS: Vision Transformer Architecture Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we argue that since ViTs mainly operate on token embeddings with little inductive bias, imbalance of channels for different architectures would worsen the weight-sharing assumption and cause the training instability as a result. |
Xiu Su; Shan You; Jiyang Xie; Mingkai Zheng; Fei Wang; Chen Qian; Changshui Zhang; Xiaogang Wang; Chang Xu; |
855 | LidarNAS: Unifying and Searching Neural Architectures for 3D Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we begin by proposing such a unified framework, with the key idea of factorizing the neural networks into a series of view transforms and neural layers. |
Chenxi Liu; Zhaoqi Leng; Pei Sun; Shuyang Cheng; Charles R. Qi; Yin Zhou; Mingxing Tan; Dragomir Anguelov; |
856 | Uncertainty-DTW for Time Series and Sequences Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Thus, in this paper, we propose to model the so-called aleatoric uncertainty of a differentiable (soft) version of DTW. |
Lei Wang; Piotr Koniusz; |
857 | Black-Box Few-Shot Knowledge Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The distillation process often happens at an external party side where we do not have access to much data, and the teacher does not disclose its parameters due to security and privacy concerns. To overcome these challenges, we propose a black-box few-shot KD method to train the student with few unlabeled training samples and a black-box teacher. |
Dang Nguyen; Sunil Gupta; Kien Do; Svetha Venkatesh; |
858 | Revisiting Batch Norm Initialization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We revisit the BN formulation and present a new initialization method and update approach for BN to address the aforementioned issues. |
Jim Davis; Logan Frank; |
859 | SSBNet: Improving Visual Recognition Efficiency By Adaptive Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we show that using adaptive sampling as the main component in a deep neural network can improve network efficiency. |
Ho Man Kwan; Shenghui Song; |
860 | Filter Pruning Via Feature Discrimination in Deep Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Hence, we propose Distinguishing Layer Pruning based on RFC (DLRFC), i.e., discriminately prune the filters in different layers, which avoids measuring filters between different layers directly against filter criteria. |
Zhiqiang He; Yaguan Qian; Yuqi Wang; Bin Wang; Xiaohui Guan; Zhaoquan Gu; Xiang Ling; Shaoning Zeng; Haijiang Wang; Wujie Zhou; |
861 | LA3: Efficient Label-Aware AutoAugment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel two-stage data augmentation algorithm, named Label-Aware AutoAugment (LA3), which takes advantage of the label information, and learns augmentation policies separately for samples of different labels. |
Mingjun Zhao; Shan Lu; Zixuan Wang; Xiaoli Wang; Di Niu; |
862 | Interpretations Steered Network Pruning Via Amortized Inferred Saliency Maps Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, these metrics mainly focus on the model’s ‘outputs’ or ‘weights’ and neglect its ‘interpretations’ information. To fill in this gap, we propose to address the channel pruning problem from a novel perspective by leveraging the interpretations of a model to steer the pruning process, thereby utilizing information from both inputs and outputs of the model. |
Alireza Ganjdanesh; Shangqian Gao; Heng Huang; |
863 | BA-Net: Bridge Attention for Deep Convolutional Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In attention mechanism research, most existing methods are hard to utilize well the information of the neural network with high computing efficiency due to heavy feature compression in the attention layer. This paper proposes a simple and general approach named Bridge Attention to address this issue. |
Yue Zhao; Junzhou Chen; Zirui Zhang; Ronghui Zhang; |
864 | SAU: Smooth Activation Function Using Convolution with Approximate Identities Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose new smooth approximations of a non-differentiable activation function by convolving it with approximate identities. |
Koushik Biswas; Sandeep Kumar; Shilpak Banerjee; Ashish Kumar Pandey; |
865 | Multi-Exit Semantic Segmentation Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose a framework for converting state-of-the-art segmentation CNNs to Multi-Exit Semantic Segmentation (MESS) networks: specially trained models that employ parametrised early exits along their depth to i) dynamically save computation during inference on easier samples and ii) save training and maintenance cost by offering a post-training customisable speed-accuracy trade-off. |
Alexandros Kouris; Stylianos I. Venieris; Stefanos Laskaridis; Nicholas Lane; |
866 | Almost-Orthogonal Layers for Efficient General-Purpose Lipschitz Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a new technique for constructing such Lipschitz networks that has a number of desirable properties: it can be applied to any linear network layer (fully-connected or convolutional), it provides formal guarantees on the Lipschitz constant, it is easy to implement and efficient to run, and it can be combined with any training objective and optimization method. |
Bernd Prach; Christoph H. Lampert; |
867 | PointScatter: Point Set Representation for Tubular Structure Extraction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Compared with the traditional mask representation, the point set representation enjoys its flexibility and representation ability, which would not be restricted by the fixed grid as the mask. Inspired by this, we propose PointScatter, an alternative to the segmentation models for the tubular structure extraction task. |
Dong Wang; Zhao Zhang; Ziwei Zhao; Yuhang Liu; Yihong Chen; Liwei Wang; |
868 | Check and Link: Pairwise Lesion Correspondence Guides Mammogram Mass Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new transformer-based framework CL-Net to learn lesion detection and pairwise correspondence in an end-to-end manner. |
Ziwei Zhao; Dong Wang; Yihong Chen; Ziteng Wang; Liwei Wang; |
869 | Graph-Constrained Contrastive Regularization for Semi-Weakly Volumetric Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we investigate how models can be trained from sparsely annotated volumes, i.e. volumes with only individual slices annotated. |
Simon Reiß; Constantin Seibold; Alexander Freytag; Erik Rodner; Rainer Stiefelhagen; |
870 | Generalizable Medical Image Segmentation Via Random Amplitude Mixup and Domain-Specific Image Restoration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we present a novel generalizable medical image segmentation method. |
Ziqi Zhou; Lei Qi; Yinghuan Shi; |
871 | Auto-FedRL: Federated Hyperparameter Optimization for Multi-Institutional Medical Image Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose an efficient reinforcement learning (RL)-based federated hyperparameter optimization algorithm, termed Auto-FedRL, in which an online RL agent can dynamically adjust hyperparameters of each client based on the current training progress. |
Pengfei Guo; Dong Yang; Ali Hatamizadeh; An Xu; Ziyue Xu; Wenqi Li; Can Zhao; Daguang Xu; Stephanie Harmon; Evrim Turkbey; Baris Turkbey; Bradford Wood; Francesca Patella; Elvira Stellato; Gianpaolo Carrafiello; Vishal M. Patel; Holger R. Roth; |
872 | Personalizing Federated Medical Image Segmentation Via Local Calibration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a personalized federated framework with Local Calibration (LC-Fed), to leverage the inter-site inconsistencies at both the feature and prediction levels to boost the segmentation. |
Jiacheng Wang; Yueming Jin; Liansheng Wang; |
873 | One-Shot Medical Landmark Localization By Edge-Guided Transform and Noisy Landmark Refinement Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To handle the significant structure variations, we learn an end-to-end cascade of global alignment and local deformations, under the guidance of novel loss functions which incorporate edge information. |
Zihao Yin; Ping Gong; Chunyu Wang; Yizhou Yu; Yizhou Wang; |
874 | Ultra-High-Resolution Unpaired Stain Transformation Via Kernelized Instance Normalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Hence, we propose a strategy for ultra-high-resolution unpaired image-to-image translation: Kernelized Instance Normalization (KIN), which preserves local information and successfully achieves seamless stain transformation with constant GPU memory usage. |
Ming-Yang Ho; Min-Sheng Wu; Che-Ming Wu; |
875 | Med-DANet: Dynamic Architecture Network for Efficient Medical Volumetric Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we focus on multi-modal 3D MRI brain tumor segmentation and propose a dynamic architecture network named Med-DANet based on adaptive model selection to achieve effective accuracy and efficiency trade-off. |
Wenxuan Wang; Chen Chen; Jing Wang; Sen Zha; Yan Zhang; Jiangyun Li; |
876 | ConCL: Concept Contrastive Learning for Dense Prediction Pre-training in Pathology Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Despite the extensive benchmarks in natural images for dense tasks, such studies are, unfortunately, absent in current works for pathology. Our paper intends to narrow this gap. |
Jiawei Yang; Hanbo Chen; Yuan Liang; Junzhou Huang; Lei He; Jianhua Yao; |
877 | CryoAI: Amortized Inference of Poses for Ab Initio Reconstruction of 3D Molecular Volumes from Real Cryo-EM Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce cryoAI, an ab initio reconstruction algorithm for homogeneous conformations that uses direct gradient-based optimization of particle poses and the electron scattering potential from single-particle cryo-EM data. |
Axel Levy; Frédéric Poitevin; Julien Martel; Youssef Nashed; Ariana Peck; Nina Miolane; Daniel Ratner; Mike Dunne; Gordon Wetzstein; |
878 | UniMiSS: Universal Medical Self-Supervised Learning Via Breaking Dimensionality Barrier Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we advocate bringing a wealth of 2D images like chest X-rays as compensation for the lack of 3D data, aiming to build a universal medical self-supervised representation learning framework, called UniMiSS. |
Yutong Xie; Jianpeng Zhang; Yong Xia; Qi Wu; |
879 | DLME: Deep Local-Flatness Manifold Embedding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The poor local connectivity of under-sampling data in the former step and inappropriate optimization objectives in the latter step leads to two problems: structural distortion and underconstrained embedding. This paper proposes a novel ML framework named Deep Local-flatness Manifold Embedding (DLME) to solve these problems. |
Zelin Zang; Siyuan Li; Di Wu; Ge Wang; Kai Wang; Lei Shang; Baigui Sun; Hao Li; Stan Z. Li; |
880 | Semi-Supervised Keypoint Detector and Descriptor for Retinal Image Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: For retinal image matching (RIM), we propose SuperRetina, the first end-to-end method with jointly trainable keypoint detector and descriptor. |
Jiazhen Liu; Xirong Li; Qijie Wei; Jie Xu; Dayong Ding; |
881 | Graph Neural Network for Cell Tracking in Microscopy Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel graph neural network (GNN) approach for cell tracking in high-throughput microscopy videos. |
Tal Ben-Haim; Tammy Riklin Raviv; |
882 | CXR Segmentation By AdaIN-Based Domain Adaptation and Knowledge Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by recent multi-domain image translation approaches, here we propose a novel segmentation framework using adaptive instance normalization (AdaIN), so that a single generator is trained to perform both domain adaptation and semi-supervised segmentation tasks via knowledge distillation by simply changing task-specific AdaIN codes. |
Yujin Oh; Jong Chul Ye; |
883 | Accurate Detection of Proteins in Cryo-Electron Tomograms from Sparse Labels Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Classical template-based methods have high false-positive rates due to the very low signal-to-noise ratios (SNR) typical of CET volumes, while more recent neural-network based detection algorithms require extensive labeling, are very slow to train and can take days to run. To address these issues, we propose a novel particle detection framework that uses positive-unlabeled learning and exploits the unique properties of 3D tomograms to improve detection performance. |
Qinwen Huang; Ye Zhou; Hsuan-Fu Liu; Alberto Bartesaghi; |
884 | K-SALSA: K-Anonymous Synthetic Averaging of Retinal Images Via Local Style Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While prior works have explored image de-identification strategies based on synthetic averaging of images in other domains (e.g. facial images), existing techniques face difficulty in preserving both privacy and clinical utility in retinal images, as we demonstrate in our work. We therefore introduce k-SALSA, a generative adversarial network (GAN)-based framework for synthesizing retinal fundus images that summarize a given private dataset while satisfying the privacy notion of k-anonymity. |
Minkyu Jeon; Hyeonjin Park; Hyunwoo J. Kim; Michael Morley; Hyunghoon Cho; |
885 | RadioTransformer: A Cascaded Global-Focal Transformer for Visual Attention-Guided Disease Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present RadioTransformer, a novel visual attention-driven transformer framework, that leverages radiologists’ gaze patterns and models their visuo-cognitive behavior for disease diagnosis on chest radiographs. |
Moinak Bhattacharya; Shubham Jain; Prateek Prasanna; |
886 | Differentiable Zooming for Multiple Instance Learning on Whole-Slide Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, inspired by the pathological diagnostic process, we propose ZoomMIL, a method that learns to perform multi-level zooming in an end-to-end manner. |
Kevin Thandiackal; Boqi Chen; Pushpak Pati; Guillaume Jaume; Drew F. K. Williamson; Maria Gabrani; Orcun Goksel; |
887 | Learning Uncoupled-Modulation CVAE for 3D Action-Conditioned Human Motion Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose the Uncoupled-Modulation Conditional Variational AutoEncoder(UM-CVAE) to generate action-conditioned motions from scratch in an uncoupled manner. |
Chongyang Zhong; Lei Hu; Zihao Zhang; Shihong Xia; |
888 | Towards Grand Unification of Object Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a unified method, termed Unicorn, that can simultaneously solve four tracking problems (SOT, MOT, VOS, MOTS) with a single network using the same model parameters. |
Bin Yan; Yi Jiang; Peize Sun; Dong Wang; Zehuan Yuan; Ping Luo; Huchuan Lu; |
889 | ByteTrack: Multi-Object Tracking By Associating Every Detection Box Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The objects with low detection scores, e.g. occluded objects, are simply thrown away, which causes non-negligible missed objects and fragmented trajectories. To solve this problem, we present a simple, effective and generic association method, tracking by associating almost every detection box instead of only the high score ones. |
Yifu Zhang; Peize Sun; Yi Jiang; Dongdong Yu; Fucheng Weng; Zehuan Yuan; Ping Luo; Wenyu Liu; Xinggang Wang; |
890 | Robust Multi-Object Tracking By Marginal Inference Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address the problem, we present an efficient approach to compute a marginal probability for each pair of objects in real time. |
Yifu Zhang; Chunyu Wang; Xinggang Wang; Wenjun Zeng; Wenyu Liu; |
891 | PolarMOT: How Far Can Geometric Relations Take Us in 3D Multi-Object Tracking? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Most (3D) multi-object tracking methods rely on appearance-based cues for data association. By contrast, we investigate how far we can get by only encoding geometric relationships between objects in 3D space as cues for data-driven data association. |
Aleksandr Kim; Guillem Brasó; Aljoša Ošep; Laura Leal-Taixé; |
892 | Particle Video Revisited: Tracking Through Occlusions Using Point Trajectories Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we revisit Sand and Teller’s particle video approach, and study pixel tracking as a long-range motion estimation problem, where every pixel is described with a trajectory that locates it in multiple future frames. |
Adam W. Harley; Zhaoyuan Fang; Katerina Fragkiadaki; |
893 | Tracking Objects As Pixel-Wise Distributions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unlike tracking via detected bounding boxes or center points, we propose tracking objects as pixel-wise distributions. |
Zelin Zhao; Ze Wu; Yueqing Zhuang; Boxun Li; Jiaya Jia; |
894 | CMT: Context-Matching-Guided Transformer for 3D Tracking in Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose Context-Matching-Guided Transformer (CMT), a Siamese tracking paradigm for 3D single object tracking. |
Zhiyang Guo; Yunyao Mao; Wengang Zhou; Min Wang; Houqiang Li; |
895 | Towards Generic 3D Tracking in RGBD Videos: Benchmark and Baseline Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Thus, in this paper, we investigate a novel problem: is it possible to track a generic (class-agnostic) 3D object in RGBD videos and predict 3D bounding boxes of the object of interest? |
Jinyu Yang; Zhongqun Zhang; Zhe Li; Hyung Jin Chang; Aleš Leonardis; Feng Zheng; |
896 | Hierarchical Latent Structure for Multi-modal Vehicle Trajectory Forecasting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We observe that a similar problem, in which the generated trajectory is located between adjacent lanes, often arises in VAE-based trajectory forecasting models. To mitigate this problem, we introduce a hierarchical latent structure into the VAE-based forecasting model. |
Dooseop Choi; KyoungWook Min; |
897 | AiATrack: Attention in Attention for Transformer Visual Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the independent correlation computation in the attention mechanism could result in noisy and ambiguous attention weights, which inhibits further performance improvement. To address this issue, we propose an attention in attention (AiA) module, which enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors. |
Shenyuan Gao; Chunluan Zhou; Chao Ma; Xinggang Wang; Junsong Yuan; |
898 | Disentangling Architecture and Training for Optical Flow Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To explore these questions, rather than develop a new model, we revisit three prominent models, PWC-Net, IRR-PWC and RAFT, with a common set of modern training techniques, and observe significantly better performance, demonstrating the importance and generality of these training details. |
Deqing Sun; Charles Herrmann; Fitsum Reda; Michael Rubinstein; David J. Fleet; William T. Freeman; |
899 | A Perturbation-Constrained Adversarial Attack for Evaluating The Robustness of Optical Flow Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Hence, in this work, we propose a novel adversarial attack – the Perturbation-Constrained Flow Attack (PCFA) – that emphasizes destructivity over applicability as a real-world attack. |
Jenny Schmalfuss; Philipp Scholze; Andrés Bruhn; |
900 | Robust Landmark-Based Stent Tracking in X-Ray Fluoroscopy Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an end-to-end deep learning framework for single stent tracking, which consists of three hierarchical modules: a U-Net for landmark detection, a ResNet for stent proposal and feature extraction, and a graph convolutional neural network for stent tracking that temporally aggregates both spatial information and appearance features. |
Luojie Huang; Yikang Liu; Li Chen; Eric Z. Chen; Xiao Chen; Shanhui Sun; |
901 | Social ODE: Multi-agent Trajectory Forecasting with Neural Ordinary Differential Equations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Most previous methods use RNNs or Transformers to model agent dynamics in the temporal dimension and social pooling or GNNs to model interactions with other agents; these approaches usually fail to learn the underlying continuous temporal dynamics and agent interactions explicitly. To address these problems, we propose Social ODE which explicitly models temporal agent dynamics and agent interactions. |
Song Wen; Hao Wang; Dimitris N. Metaxas; |
902 | Social-SSL: Self-Supervised Cross-Sequence Representation Learning Based on Transformers for Multi-agent Trajectory Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose Social-SSL that captures cross-sequence trajectory structures via self-supervised pre-training, which plays a crucial role in improving both data efficiency and generalizability of Transformer networks for trajectory prediction. |
Li-Wu Tsao; Yan-Kai Wang; Hao-Siang Lin; Hong-Han Shuai; Lai-Kuan Wong; Wen-Huang Cheng; |
903 | Diverse Human Motion Prediction Guided By Multi-level Spatial-Temporal Anchors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a simple yet effective approach that disentangles randomly sampled codes with a deterministic learnable component named anchors to promote sample precision and diversity. |
Sirui Xu; Yu-Xiong Wang; Liang-Yan Gui; |
904 | Learning Pedestrian Group Representations for Multi-modal Trajectory Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel architecture named GP-Graph which has collective group representations for effective pedestrian trajectory prediction in crowded environments, and is compatible with all types of existing approaches. |
Inhwan Bae; Jin-Hwi Park; Hae-Gon Jeon; |
905 | Sequential Multi-View Fusion Network for Fast LiDAR Point Motion Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Thus, we propose a novel sequential multi-view fusion network (SMVF), composed of a BEV branch and an RV branch, in charge of encoding the motion information and spatial information, respectively. |
Gang Zhang; Xiaoyan Li; Zhenhua Wang; |
906 | E-Graph: Minimal Solution for Rigid Rotation with Extensibility Graphs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, a new minimal solution is proposed to solve relative rotation estimation between two images without overlapping areas by exploiting a new graph structure, which we call Extensibility Graph (E-Graph). |
Yanyan Li; Federico Tombari; |
907 | Point Cloud Compression with Range Image-Based Entropy Model for Autonomous Driving Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a range image-based three-stage framework to compress the scanning LiDAR’s point clouds using the entropy model. |
Sukai Wang; Ming Liu; |
908 | Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The current popular two-stream, two-stage tracking framework extracts the template and the search region features separately and then performs relation modeling, thus the extracted features lack the awareness of the target and have limited target-background discriminability. To tackle the above issue, we propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling by bridging the template-search image pairs with bidirectional information flows. |
Botao Ye; Hong Chang; Bingpeng Ma; Shiguang Shan; Xilin Chen; |
909 | MotionCLIP: Exposing Human Motion Generation to CLIP Space Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce MotionCLIP, a 3D human motion auto-encoder featuring a latent embedding that is disentangled, well behaved, and supports highly semantic textual descriptions. |
Guy Tevet; Brian Gordon; Amir Hertz; Amit H. Bermano; Daniel Cohen-Or; |
910 | Backbone Is All Your Need: A Simplified Architecture for Visual Object Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a Simplified Tracking architecture (SimTrack) by leveraging a transformer backbone for joint feature extraction and interaction. |
Boyu Chen; Peixia Li; Lei Bai; Lei Qiao; Qiuhong Shen; Bo Li; Weihao Gan; Wei Wu; Wanli Ouyang; |
911 | Aware of The History: Trajectory Forecasting with The Local Behavior Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Despite great improvements in trajectory forecasting with the guidance of high-definition maps, only a few works have explored such local historical information. In this work, we re-introduce this information as a new type of input data for trajectory forecasting systems: the local behavior data, which we conceptualize as a collection of location-specific historical trajectories. |
Yiqi Zhong; Zhenyang Ni; Siheng Chen; Ulrich Neumann; |
912 | Optical Flow Training Under Limited Label Budget Via Active Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We use a simple yet effective semi-supervised training method to show that even a small fraction of labels can improve flow accuracy by a significant margin over unsupervised training. |
Shuai Yuan; Xian Sun; Hannah Kim; Shuzhi Yu; Carlo Tomasi; |
913 | Hierarchical Feature Embedding for Visual Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: A more favorable way is to produce features that emphasize both types of information in visual tracking. To achieve this, we propose a hierarchical feature embedding model which separately learns the instance and category information, and progressively embeds them. |
Zhixiong Pi; Weitao Wan; Chong Sun; Changxin Gao; Nong Sang; Chen Li; |
914 | Tackling Background Distraction in Video Object Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: One of the main challenges in this task is the existence of background distractors that appear similar to the target objects. We propose three novel strategies to suppress such distractors: 1) a spatio-temporally diversified template construction scheme to obtain generalized properties of the target objects; 2) a learnable distance-scoring function to exclude spatially-distant distractors by exploiting the temporal consistency between two consecutive frames; 3) swap-and-attach augmentation to force each object to have unique features by providing training samples containing entangled objects. |
Suhwan Cho; Heansung Lee; Minhyeok Lee; Chaewon Park; Sungjun Jang; Minjung Kim; Sangyoun Lee; |
915 | Social-Implicit: Rethinking Trajectory Prediction Evaluation and The Effectiveness of Implicit Maximum Likelihood Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Yet, the BoN does not quantify the whole generated samples, resulting in an incomplete view of the model’s prediction quality and performance. We propose a new metric, Average Mahalanobis Distance (AMD) to tackle this issue. |
Abduallah Mohamed; Deyao Zhu; Warren Vu; Mohamed Elhoseiny; Christian Claudel; |
916 | TEMOS: Generating Diverse Human Motions from Textual Descriptions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data, in combination with a text encoder that produces distribution parameters compatible with the VAE latent space. |
Mathis Petrovich; Michael J. Black; Gül Varol; |
917 | Tracking Every Thing in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a new metric, Track Every Thing Accuracy (TETA), breaking tracking measurement into three sub-factors: localization, association, and classification, allowing comprehensive benchmarking of tracking performance even under inaccurate classification. |
Siyuan Li; Martin Danelljan; Henghui Ding; Thomas E. Huang; Fisher Yu; |
918 | HULC: 3D HUman Motion Capture with Pose Manifold SampLing and Dense Contact Guidance Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Due to the inherent depth ambiguity of monocular settings, 3D motions captured with existing methods often contain severe artefacts such as incorrect body-scene inter-penetrations, jitter and body floating. To tackle these issues, we propose HULC, a new approach for 3D human MoCap which is aware of the scene geometry. |
Soshi Shimada; Vladislav Golyanik; Zhi Li; Patrick Pérez; Weipeng Xu; Christian Theobalt; |
919 | Towards Sequence-Level Training for Visual Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work introduces a sequence-level training strategy for visual tracking based on reinforcement learning and discusses how a sequence-level design of data sampling, learning objectives, and data augmentation can improve the accuracy and robustness of tracking algorithms. |
Minji Kim; Seungkwan Lee; Jungseul Ok; Bohyung Han; Minsu Cho; |
920 | Learned Monocular Depth Priors in Visual-Inertial Initialization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In practical scenarios where high-parallax or variable acceleration assumptions are rarely met (e.g. hovering aerial robot, smartphone AR user not gesticulating with phone), classical visual-inertial initialization formulations often become ill-conditioned and/or fail to meaningfully converge. In this paper we target visual-inertial initialization specifically for these low-excitation scenarios critical to in-the-wild usage. |
Yunwen Zhou; Abhishek Kar; Eric Turner; Adarsh Kowdle; Chao X. Guo; Ryan C. DuToit; Konstantine Tsotsos; |
921 | Robust Visual Tracking By Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a segmentation-centric tracking pipeline that not only produces a highly accurate segmentation mask, but also internally works with segmentation masks instead of bounding boxes. |
Matthieu Paul; Martin Danelljan; Christoph Mayer; Luc Van Gool; |
922 | MeshLoc: Mesh-Based Visual Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we thus explore a more flexible alternative based on dense 3D meshes that does not require features matching between database images to build the scene representation. |
Vojtech Panek; Zuzana Kukelova; Torsten Sattler; |
923 | S2F2: Single-Stage Flow Forecasting for Future Multiple Trajectories Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we present a single-stage framework, named S2F2, for forecasting multiple human trajectories from raw video images by predicting future optical flows. |
Yu-Wen Chen; Hsuan-Kung Yang; Chu-Chi Chiu; Chun-Yi Lee; |
924 | Large-Displacement 3D Object Tracking with Hybrid Non-local Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we propose a fast and effective non-local 3D tracking method. |
Xuhui Tian; Xinran Lin; Fan Zhong; Xueying Qin; |
925 | FEAR: Fast, Efficient, Accurate and Robust Visual Tracker Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present FEAR, a family of fast, efficient, accurate, and robust Siamese visual trackers. |
Vasyl Borsuk; Roman Vei; Orest Kupyn; Tetiana Martyniuk; Igor Krashenyi; Jiří Matas; |
926 | PREF: Predictability Regularized Neural Motion Fields Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we leverage a neural motion field for estimating the motion of all points in a multiview setting. |
Liangchen Song; Xuan Gong; Benjamin Planche; Meng Zheng; David Doermann; Junsong Yuan; Terrence Chen; Ziyan Wu; |
927 | View Vertically: A Hierarchical Network for Trajectory Prediction Via Fourier Spectrums Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Accordingly, we propose a hierarchical network V²-Net, which contains two sub-networks, to hierarchically model and predict agents’ trajectories with trajectory spectrums. |
Conghao Wong; Beihao Xia; Ziming Hong; Qinmu Peng; Wei Yuan; Qiong Cao; Yibo Yang; Xinge You; |
928 | HVC-Net: Unifying Homography, Visibility, and Confidence Learning for Planar Object Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing methods tend to obtain wrong correspondences with changing appearance variations, camera-object relative motions and occlusions. To alleviate this problem, we present a unified convolutional neural network (CNN) model that jointly considers homography, visibility, and confidence. |
Haoxian Zhang; Yonggen Ling; |
929 | RamGAN: Region Attentive Morphing GAN for Region-Level Makeup Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a region adaptive makeup transfer GAN, called RamGAN, for precise region-level makeup transfer. |
Jianfeng Xiang; Junliang Chen; Wenshuang Liu; Xianxu Hou; Linlin Shen; |
930 | SinNeRF: Training Neural Radiance Fields on Complex Scenes from A Single Image Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we consider a more ambitious task: training neural radiance field, over realistically complex visual scenes, by “looking only once”, i.e., using only a single view. |
Dejia Xu; Yifan Jiang; Peihao Wang; Zhiwen Fan; Humphrey Shi; Zhangyang Wang; |
931 | Entropy-Driven Sampling and Training Scheme for Conditional Diffusion Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, due to the ability of the classifier to easily discriminate an incompletely generated image only with high-level structure, the gradient, which is a kind of class information guidance, tends to vanish early, leading to the collapse from conditional generation process into the unconditional process. To address this problem, we propose two simple but effective approaches from two perspectives. |
Guangcong Zheng; Shengming Li; Hui Wang; Taiping Yao; Yang Chen; Shouhong Ding; Xi Li; |
932 | Accelerating Score-Based Generative Models with Preconditioned Diffusion Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We investigate this problem by viewing the diffusion sampling process as a Metropolis adjusted Langevin algorithm, which helps reveal the underlying cause to be ill-conditioned curvature. Under this insight, we propose a model-agnostic preconditioned diffusion sampling (PDS) method that leverages matrix preconditioning to alleviate the aforementioned problem. |
Hengyuan Ma; Li Zhang; Xiatian Zhu; Jianfeng Feng; |
933 | Learning to Generate Realistic LiDAR Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present LiDARGen, a novel, effective, and controllable generative model that produces realistic LiDAR point cloud sensory readings. |
Vlas Zyrianov; Xiyue Zhu; Shenlong Wang; |
934 | RFNet-4D: Joint Object Reconstruction and Flow Estimation from 4D Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new network architecture, namely RFNet-4D, that jointly reconstruct objects and their motion flows from 4D point clouds. |
Tuan-Anh Vu; Thanh Nguyen; Binh-Son Hua; Quang-Hieu Pham; Sai-Kit Yeung; |
935 | Diverse Image Inpainting with Normalizing Flow Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose Flow-Fill, a novel two-stage image inpainting framework that utilizes a conditional normalizing flow model to generate diverse structural priors in the first stage. |
Cairong Wang; Yiming Zhu; Chun Yuan; |
936 | Improved Masked Image Generation with Token-Critic Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we introduce Token-Critic, an auxiliary model to guide the sampling of a non-autoregressive generative transformer. |
José Lezama; Huiwen Chang; Lu Jiang; Irfan Essa; |
937 | TREND: Truncated Generalized Normal Density Estimation of Inception Embeddings for GAN Evaluation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The Frechet Inception distance is one of the most widely used metrics for evaluation of GANs, which assumes that the features from a trained Inception model for a set of images follow a normal distribution. In this paper, we argue that this is an over-simplified assumption, which may lead to unreliable evaluation results, and more accurate density estimation can be achieved using a truncated generalized normal distribution. |
Junghyuk Lee; Jong-Seok Lee; |
938 | Exploring Gradient-Based Multi-directional Controls in GANs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, they often suffer from imperfect disentanglement, or are unable to obtain multi-directional controls. In this work, in light of the above challenges, we propose a novel approach that discovers nonlinear controls, which enables multi-directional manipulation as well as effective disentanglement, based on gradient information in the learned GAN latent space. |
Zikun Chen; Ruowei Jiang; Brendan Duke; Han Zhao; Parham Aarabi; |
939 | Spatially Invariant Unsupervised 3D Object-Centric Learning and Scene Decomposition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In particular, we introduce a framework, SPAIR3D, to factorize a 3D point cloud into a spatial mixture model where each component corresponds to one object. |
Tianyu Wang; Miaomiao Liu; Kee Siong Ng; |
940 | Neural Scene Decoration from A Single Photograph Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a new problem of domain-specific indoor scene image synthesis, namely neural scene decoration. |
Hong-Wing Pang; Yingshu Chen; Phuoc-Hieu Le; Binh-Son Hua; Thanh Nguyen; Sai-Kit Yeung; |
941 | Outpainting By Queries Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, motivated by the flexible self-attention mechanism with minimal inductive biases in transformer architecture, we reframe the generalised image outpainting problem as a patch-wise sequence-to-sequence autoregression problem, enabling query-based image outpainting. |
Kai Yao; Penglei Gao; Xi Yang; Jie Sun; Rui Zhang; Kaizhu Huang; |
942 | Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: By contrast, in this paper we propose a novel discrete diffusion probabilistic model prior which enables parallel prediction of Vector-Quantized tokens by using an unconstrained Transformer architecture as the backbone. |
Sam Bond-Taylor; Peter Hessey; Hiroshi Sasaki; Toby P. Breckon; Chris G. Willcocks; |
943 | ChunkyGAN: Real Image Inversion Via Segments Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present ChunkyGAN, a novel paradigm for modeling and editing images using generative adversarial networks. |
Adéla Šubrtová; David Futschik; Jan Čech; Michal Lukáč; Eli Shechtman; Daniel Sýkora; |
944 | GAN Cocktail: Mixing GANs Without Dataset Access Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work we tackle the problem of model merging, given two constraints that often come up in the real world: (1) no access to the original training data, and (2) without increasing the size of the neural network. |
Omri Avrahami; Dani Lischinski; Ohad Fried; |
945 | Geometry-Guided Progressive NeRF for Generalizable and Efficient Neural Human Rendering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work we develop a generalizable and efficient Neural Radiance Field (NeRF) pipeline for high-fidelity free-viewpoint human body synthesis under settings with sparse camera views. |
Mingfei Chen; Jianfeng Zhang; Xiangyu Xu; Lijuan Liu; Yujun Cai; Jiashi Feng; Shuicheng Yan; |
946 | Controllable Shadow Generation Using Pixel Height Maps Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce “Pixel Height”, a novel geometry representation that encodes the correlations between objects, ground, and camera pose. |
Yichen Sheng; Yifan Liu; Jianming Zhang; Wei Yin; A. Cengiz Oztireli; He Zhang; Zhe Lin; Eli Shechtman; Bedrich Benes; |
947 | Learning Where to Look – Generative NAS Is Surprisingly Efficient Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a generative model, paired with a surrogate predictor, that iteratively learns to generate samples from increasingly promising latent subspaces. |
Jovita Lukasik; Steffen Jung; Margret Keuper; |
948 | Subspace Diffusion Generative Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Score-based models generate samples by mapping noise to data (and vice versa) via a high-dimensional diffusion process. We question whether it is necessary to run this entire process at high dimensionality and incur all the inconveniences thereof. |
Bowen Jing; Gabriele Corso; Renato Berlinghieri; Tommi Jaakkola; |
949 | DuelGAN: A Duel Between Two Discriminators Stabilizes The GAN Training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce DuelGAN, a generative adversarial network (GAN) solution to improve the stability of the generated samples and to mitigate mode collapse. |
Jiaheng Wei; Minghao Liu; Jiahao Luo; Andrew Zhu; James Davis; Yang Liu; |
950 | MINER: Multiscale Implicit Neural Representation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a new neural signal model designed for efficient high-resolution representation of large-scale signals. |
Vishwanath Saragadam; Jasper Tan; Guha Balakrishnan; Richard G. Baraniuk; Ashok Veeraraghavan; |
951 | An Embedded Feature Whitening Approach to Deep Neural Network Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, existing feature whitening methods have a few limitations, such as the large computation and memory cost, incapability to adopt pre-trained DNN models, the introduction of additional parameters, etc., making them impractical to use in optimizing DNNs. To overcome these drawbacks, we propose a novel Embedded Feature Whitening (EFW) approach to DNN optimization. |
Hongwei Yong; Lei Zhang; |
952 | Q-FW: A Hybrid Classical-Quantum Frank-Wolfe for Quadratic Binary Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a hybrid classical-quantum framework based on the Frank-Wolfe algorithm, Q-FW, for solving quadratic, linearly-constrained, binary optimization problems on quantum annealers (QA). |
Alp Yurtsever; Tolga Birdal; Vladislav Golyanik; |
953 | Self-Supervised Learning of Visual Graph Matching Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Inspired by recent progress in self-supervised contrastive learning, we propose an end-to-end label-free self-supervised contrastive graph matching framework (SCGM). |
Chang Liu; Shaofeng Zhang; Xiaokang Yang; Junchi Yan; |
954 | Scalable Learning to Optimize: A Learned Optimizer Can Train Big Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the primary barrier, scalability, persists for this paradigm: as the typical L2O models create massive memory overhead due to unrolled computational graphs, it disables L2O’s applicability to large-scale tasks. To overcome this core challenge, we propose a new scalable learning to optimize (SL2O) framework which (i) first constrains the network updates in a tiny subspace and (ii) then explores learning rules on top of it. |
Xuxi Chen; Tianlong Chen; Yu Cheng; Weizhu Chen; Ahmed Awadallah; Zhangyang Wang; |
955 | QISTA-ImageNet: A Deep Compressive Image Sensing Framework Solving ℓq-Norm Optimization Problem Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we study how to reconstruct the original images from the given sensed samples/measurements by proposing a so-called deep compressive image sensing framework. |
Gang-Xuan Lin; Shih-Wei Hu; Chun-Shien Lu; |
956 | R-DFCIL: Relation-Guided Representation Learning for Data-Free Class Incremental Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Though recent DFCIL works introduce techniques such as model inversion to synthesize data for previous classes, they fail to overcome forgetting due to the severe domain gap between the synthetic and real data. To address this issue, this paper proposes relation-guided representation learning (RRL) for DFCIL, dubbed R-DFCIL. |
Qiankun Gao; Chen Zhao; Bernard Ghanem; Jian Zhang; |
957 | Domain Generalization By Mutual-Information Regularization with Pre-trained Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Instead, we re-formulate the DG objective using mutual information with the oracle model, a model generalized to any possible domain. |
Junbum Cha; Kyungjae Lee; Sungrae Park; Sanghyuk Chun; |
958 | Predicting Is Not Understanding: Recognizing and Addressing Underspecification in Machine Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we formalize the notion of underspecification and propose a method to identify and address the issue. |
Damien Teney; Maxime Peyrard; Ehsan Abbasnejad; |
959 | Neural-Sim: Learning to Generate Training Data with NeRF Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present the first fully differentiable synthetic data generation pipeline that uses Neural Radiance Fields (NeRFs) in a closed-loop with a target application’s loss function to generate data, on demand, with no human labor, to maximise accuracy for a target task. |
Yunhao Ge; Harkirat Behl; Jiashu Xu; Suriya Gunasekar; Neel Joshi; Yale Song; Xin Wang; Laurent Itti; Vibhav Vineet; |
960 | Bayesian Optimization with Clustering and Rollback for CNN Auto Pruning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel clustering algorithm that reduces the dimension of the design space to speed up the searching process. |
Hanwei Fan; Jiandong Mu; Wei Zhang; |
961 | Learned Variational Video Color Propagation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel method for color propagation that is used to recolor gray-scale videos (e.g. historic movies). |
Markus Hofinger; Erich Kobler; Alexander Effland; Thomas Pock; |
962 | Continual Variational Autoencoder Learning Via Online Cooperative Memorization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we firstly analyze the forgetting behaviour of VAEs by developing a new theoretical framework that formulates CL as a dynamic optimal transport problem. This framework proves approximate bounds to the data likelihood without requiring the task information and explains how the prior knowledge is lost during the training process. We then propose a novel memory buffering approach, namely the Online Cooperative Memorization (OCM) framework, which consists of a Short-Term Memory (STM) that continually stores recent samples to provide future information for the model, and a Long-Term Memory (LTM) aiming to preserve a wide diversity of samples. |
Fei Ye; Adrian G. Bors; |
963 | Learning to Learn with Smooth Regularization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by the stability property that should be satisfied by an ideal optimizer, we propose a regularization term that can enforce the smoothness and stability of the learned optimizers. |
Yuanhao Xiong; Cho-Jui Hsieh; |
964 | Incremental Task Learning with Incremental Rank Updates Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new incremental task learning framework based on low-rank factorization. |
Rakib Hyder; Ken Shao; Boyu Hou; Panos Markopoulos; Ashley Prater-Bennette; M. Salman Asif; |
965 | Batch-Efficient EigenDecomposition for Small and Medium Matrices Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a QR-based ED method dedicated to the application scenarios of computer vision. |
Yue Song; Nicu Sebe; Wei Wang; |
966 | Ensemble Learning Priors Driven Deep Unfolding for Scalable Video Snapshot Compressive Imaging Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing DL algorithms are limited by two bottlenecks: 1) a high accuracy network is usually large and requires a long running time; 2) DL algorithms are limited by scalability, i.e., a well trained network in general can not be applied to new systems. Towards this end, this paper proposes to use ensemble learning priors in DL to keep high reconstruction speed and accuracy in a single network. |
Chengshuai Yang; Shiyu Zhang; Xin Yuan; |
967 | Approximate Discrete Optimal Transport Plan with Auxiliary Measure Method Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work proposes an auxiliary measure method to use the semi-discrete OT maps to estimate the sparsity of the discrete OT plan with squared Euclidean cost. |
Dongsheng An; Na Lei; Xianfeng Gu; |
968 | A Comparative Study of Graph Matching Algorithms in Computer Vision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address these shortcomings, we present a comparative study of graph matching algorithms. We create a uniform benchmark where we collect and categorize a large set of existing and publicly available computer vision graph matching problems in a common format. |
Stefan Haller; Lorenz Feineis; Lisa Hutschenreiter; Florian Bernard; Carsten Rother; Dagmar Kainmüller; Paul Swoboda; Bogdan Savchynskyy; |
969 | Improving Generalization in Federated Learning By Seeking Flat Minima Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Models trained in federated settings often suffer from degraded performances and fail at generalizing, especially when facing heterogeneous scenarios. In this work, we investigate such behavior through the lens of geometry of the loss and Hessian eigenspectrum, linking the model’s lack of generalization capacity to the sharpness of the solution. |
Debora Caldarola; Barbara Caputo; Marco Ciccone; |
970 | Semidefinite Relaxations of Truncated Least-Squares in Robust Rotation Search: Tight or Not Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Whether or not this SDR is theoretically tight in the presence of noise, outliers, or both has remained largely unexplored. We derive conditions that characterize the tightness of this SDR, showing that the tightness depends on the noise level, the truncation parameters of TLS, and the outlier distribution (random or clustered). |
Liangzu Peng; Mahyar Fazlyab; René Vidal; |
971 | Transfer Without Forgetting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unfortunately, this issue leads to the under-exploitation of knowledge transfer during later tasks. On this ground, we propose Transfer without Forgetting (TwF), a hybrid approach building upon a fixed pretrained sibling network, which continuously propagates the knowledge inherent in the source domain through a layer-wise loss term. |
Matteo Boschini; Lorenzo Bonicelli; Angelo Porrello; Giovanni Bellitto; Matteo Pennisi; Simone Palazzo; Concetto Spampinato; Simone Calderara; |
972 | AdaBest: Minimizing Client Drift in Federated Learning Via Adaptive Bias Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose an adaptive algorithm that accurately estimates drift across clients. |
Farshid Varno; Marzie Saghayi; Laya Rafiee Sevyeri; Sharut Gupta; Stan Matwin; Mohammad Havaei; |
973 | Tackling Long-Tailed Category Distribution Under Domain Shifts Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we took a step forward and looked into the problem of long-tailed classification under domain shifts. |
Xiao Gu; Yao Guo; Zeju Li; Jianing Qiu; Qi Dou; Yuxuan Liu; Benny Lo; Guang-Zhong Yang; |
974 | Doubly-Fused ViT: Fuse Information from Vision Transformer Doubly with Local Representation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an efficient ViT architecture, named Doubly-Fused ViT (DFvT), where we feed low-resolution feature maps to self-attention (SA) to achieve larger context with efficiency (by moving downsampling prior to SA), and enhance it with fine-detailed spatial information. |
Li Gao; Dong Nie; Bo Li; Xiaofeng Ren; |
975 | Improving Vision Transformers By Revisiting High-Frequency Components Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, compared with training Convolutional Neural Network (CNN) models, training Vision Transformer (ViT) models is more difficult and relies on large-scale training sets. To explain this observation, we hypothesize that ViT models are less effective in capturing the high-frequency components of images than CNN models, and verify it by a frequency analysis. |
Jiawang Bai; Li Yuan; Shu-Tao Xia; Shuicheng Yan; Zhifeng Li; Wei Liu; |
976 | Recurrent Bilinear Optimization for Binary Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our work is the first attempt to optimize BNNs from the bilinear perspective. |
Sheng Xu; Yanjing Li; Tiancheng Wang; Teli Ma; Baochang Zhang; Peng Gao; Yu Qiao; Jinhu Lü; Guodong Guo; |
977 | Neural Architecture Search for Spiking Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most prior SNN methods use ANN-like architectures (e.g., VGG-Net or ResNet), which could provide sub-optimal performance for temporal sequence processing of binary information in SNNs. To address this, in this paper, we introduce a novel Neural Architecture Search (NAS) approach for finding better SNN architectures. |
Youngeun Kim; Yuhang Li; Hyoungseob Park; Yeshwanth Venkatesha; Priyadarshini Panda; |
978 | Where to Focus: Investigating Hierarchical Attention Relationship for Fine-Grained Visual Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This granularity-wise attention is confirmed by our collected human real-time gaze data on different hierarchy classifications. To leverage this mechanism, we propose a Cross-Hierarchical Region Feature (CHRF) learning framework. |
Yang Liu; Lei Zhou; Pengcheng Zhang; Xiao Bai; Lin Gu; Xiaohan Yu; Jun Zhou; Edwin R. Hancock; |
979 | DaViT: Dual Attention Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce Dual Attention Vision Transformers (DaViT), a simple yet effective vision transformer architecture that is able to capture global context while maintaining computational efficiency. |
Mingyu Ding; Bin Xiao; Noel Codella; Ping Luo; Jingdong Wang; Lu Yuan; |
980 | Optimal Transport for Label-Efficient Visible-Infrared Person Re-identification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we raise a new label-efficient training pipeline for VI-ReID. |
Jiangming Wang; Zhizhong Zhang; Mingang Chen; Yi Zhang; Cong Wang; Bin Sheng; Yanyun Qu; Yuan Xie; |
981 | Locality Guidance for Improving Vision Transformers on Tiny Datasets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While the Vision Transformer (VT) architecture is becoming trendy in computer vision, pure VT models perform poorly on tiny datasets. To address this issue, this paper proposes the locality guidance for improving the performance of VTs on tiny datasets. |
Kehan Li; Runyi Yu; Zhennan Wang; Li Yuan; Guoli Song; Jie Chen; |
982 | Neighborhood Collective Estimation for Noisy Label Identification and Correction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias. To mitigate this issue, we propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors. |
Jichang Li; Guanbin Li; Feng Liu; Yizhou Yu; |
983 | Few-Shot Class-Incremental Learning Via Entropy-Regularized Data-Free Replay Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we show through empirical results that adopting the data replay is surprisingly favorable. |
Huan Liu; Li Gu; Zhixiang Chi; Yang Wang; Yuanhao Yu; Jun Chen; Jin Tang; |
984 | Anti-Retroactive Interference for Lifelong Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we design a paradigm for lifelong learning based on meta-learning and associative mechanism of the brain. |
Runqi Wang; Yuxiang Bao; Baochang Zhang; Jianzhuang Liu; Wentao Zhu; Guodong Guo; |
985 | Towards Calibrated Hyper-Sphere Representation Via Distribution Overlap Coefficient for Long-Tailed Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, little attention has been given to how to quantify the dominance severity of head classes in the representation space. Motivated by this, we generalize cosine-based classifiers to a von Mises-Fisher (vMF) mixture model, denoted as the vMF classifier, which enables quantitative measurement of representation quality on the hyper-sphere space by calculating the distribution overlap coefficient. |
Hualiang Wang; Siming Fu; Xiaoxuan He; Hangxiang Fang; Zuozhu Liu; Haoji Hu; |
986 | Dynamic Metric Learning with Cross-Level Concept Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To facilitate the cross-level semantic structure of the image representations, we propose a hierarchical concept refiner to construct multiple levels of concept embeddings of an image and then pull closer the distance of the corresponding concepts. |
Wenzhao Zheng; Yuanhui Huang; Borui Zhang; Jie Zhou; Jiwen Lu; |
987 | MENet: A Memory-Based Network with Dual-Branch for Efficient Event Stream Processing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To efficiently extract strong features for event streams containing dynamic information, this paper proposes a novel memory-based network with dual-branch, namely MENet. |
Linhui Sun; Yifan Zhang; Ke Cheng; Jian Cheng; Hanqing Lu; |
988 | Out-of-Distribution Detection with Boundary Aware Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: For tackling this problem, previous studies either use real outliers for training or generate synthetic OOD data under strong assumptions, which are either costly or intractable to generalize. In this paper, we propose boundary aware learning (BAL), a novel framework that can learn the distribution of OOD features adaptively. |
Sen Pei; Xin Zhang; Bin Fan; Gaofeng Meng; |
989 | Learning Hierarchy Aware Features for Reducing Mistake Severity Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel approach for learning Hierarchy Aware Features (HAF) that leverages classifiers at each level of the hierarchy that are constrained to generate predictions consistent with the label hierarchy. |
Ashima Garg; Depanshu Sani; Saket Anand; |
990 | Learning to Detect Every Thing in An Open World Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The key issue lies in their assumption that regions without any annotations should be suppressed as negatives, which teaches the model to treat any unannotated (hidden) objects as background. To address this issue, we propose a simple yet surprisingly powerful data augmentation and training scheme we call Learning to Detect Every Thing (LDET). |
Kuniaki Saito; Ping Hu; Trevor Darrell; Kate Saenko; |
991 | KVT: $k$-NN Attention for Boosting Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, since the current dense self-attention uses all image patches (tokens) to compute the attention matrix, it may neglect the locality of image patches and involve noisy tokens (e.g., cluttered background and occlusion), leading to a slow training process and potential degradation of performance. To address these problems, we propose the $k$-NN attention for boosting vision transformers. |
Pichao Wang; Xue Wang; Fan Wang; Ming Lin; Shuning Chang; Hao Li; Rong Jin; |
992 | Registration Based Few-Shot Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper considers few-shot anomaly detection (FSAD), a practical yet under-studied setting for anomaly detection (AD), where only a limited number of normal images are provided for each category at training. |
Chaoqin Huang; Haoyan Guan; Aofan Jiang; Ya Zhang; Michael Spratling; Yan-Feng Wang; |
993 | Improving Robustness By Enhancing Weak Subnets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we investigate model robustness on perturbed inputs by studying the performance of internal sub-networks (subnets). |
Yong Guo; David Stutz; Bernt Schiele; |
994 | Learning Invariant Visual Representations for Compositional Zero-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, we propose an invariant feature learning framework to align different domains at the representation and gradient levels to capture the intrinsic characteristics associated with the tasks. |
Tian Zhang; Kongming Liang; Ruoyi Du; Xian Sun; Zhanyu Ma; Jun Guo; |
995 | Improving Covariance Conditioning of The SVD Meta-Layer By Orthogonality Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we systematically study how to improve the covariance conditioning by enforcing orthogonality to the Pre-SVD layer. |
Yue Song; Nicu Sebe; Wei Wang; |
996 | Out-of-Distribution Detection with Semantic Mismatch Under Masking Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a novel out-of-distribution (OOD) detection framework named MOODCat for image classifiers. |
Yijun Yang; Ruiyuan Gao; Qiang Xu; |
997 | Data-Free Neural Architecture Search Via Recursive Label Calibration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper aims to explore the feasibility of neural architecture search (NAS) given only a pre-trained model without using any original training data. |
Zechun Liu; Zhiqiang Shen; Yun Long; Eric Xing; Kwang-Ting Cheng; Chas Leichner; |
998 | Learning from Multiple Annotator Noisy Labels Via Sample-Wise Label Fusion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Previous approaches usually assume that all data samples share the same set of parameters related to annotator errors, while we demonstrate that label error learning should be both annotator and data sample dependent. Motivated by this observation, we propose a novel learning algorithm. |
Zhengqi Gao; Fan-Keng Sun; Mingran Yang; Sucheng Ren; Zikai Xiong; Marc Engeler; Antonio Burazer; Linda Wildling; Luca Daniel; Duane S. Boning; |
999 | Acknowledging The Unknown for Multi-Label Learning with Single Positive Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we choose to treat all unannotated labels from an alternative perspective, i.e. acknowledging they are unknown. |
Donghao Zhou; Pengfei Chen; Qiong Wang; Guangyong Chen; Pheng-Ann Heng; |
1000 | AutoMix: Unveiling The Power of Mixup for Stronger Classifiers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, there arises a trade-off between precise mixing policies and optimization complexity. To address this challenge, we propose a novel automatic mixup (AutoMix) framework, where the mixup policy is parameterized and serves the ultimate classification goal directly. |
Zicheng Liu; Siyuan Li; Di Wu; Zihan Liu; Zhiyuan Chen; Lirong Wu; Stan Z. Li; |
1001 | MaxViT: Multi-axis Vision Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we introduce an efficient and scalable attention model we call multi-axis attention, which consists of two aspects: blocked local and dilated global attention. |
Zhengzhong Tu; Hossein Talebi; Han Zhang; Feng Yang; Peyman Milanfar; Alan Bovik; Yinxiao Li; |
1002 | ScalableViT: Rethinking The Context-Oriented Generalization of Vision Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Such inflexibility restricts it from possessing context-oriented generalization that can bring more contextual cues and graphic representations. To mitigate this issue, we propose a Scalable Self-Attention (SSA) mechanism that leverages two scaling factors to release dimensions of query, key, and value matrices while unbinding them with the input. |
Rui Yang; Hailong Ma; Jie Wu; Yansong Tang; Xuefeng Xiao; Min Zheng; Xiu Li; |
1003 | Three Things Everyone Should Know About Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We offer three insights based on simple and easy to implement variants of vision transformers. |
Hugo Touvron; Matthieu Cord; Alaaeldin El-Nouby; Jakob Verbeek; Hervé Jégou; |
1004 | DeiT III: Revenge of The ViT Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we revisit the supervised training of ViTs. |
Hugo Touvron; Matthieu Cord; Hervé Jégou; |
1005 | MixSKD: Self-Knowledge Distillation from Mixup for Image Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unlike the conventional Knowledge Distillation (KD), Self-KD allows a network to learn knowledge from itself without any guidance from extra networks. This paper proposes to perform Self-KD from image Mixture (MixSKD), which integrates these two techniques into a unified framework. |
Chuanguang Yang; Zhulin An; Helong Zhou; Linhang Cai; Xiang Zhi; Jiwen Wu; Yongjun Xu; Qian Zhang; |
1006 | Self-Feature Distillation with Uncertainty Modeling for Degraded Image Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: It treats each pixel in the feature equally and may result in relatively poor reconstruction performance in some difficult regions. To address this issue, in this paper we propose a novel self-feature distillation method with uncertainty modeling to better produce HQ-like features from low-quality observations. |
Zhou Yang; Weisheng Dong; Xin Li; Jinjian Wu; Leida Li; Guangming Shi; |
1007 | Novel Class Discovery Without Forgetting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose 1) a method to generate pseudo-latent representations which act as a proxy for (no longer available) labeled data, thereby alleviating forgetting, 2) a mutual-information based regularizer which enhances unsupervised discovery of novel classes, and 3) a simple Known Class Identifier which aids generalized inference when the testing data contains instances from both seen and unseen categories. |
K J Joseph; Sujoy Paul; Gaurav Aggarwal; Soma Biswas; Piyush Rai; Kai Han; Vineeth N Balasubramanian; |
1008 | SAFA: Sample-Adaptive Feature Augmentation for Long-Tailed Image Classification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing methods augment tail-class features to compensate for tail classes in feature space, but these methods fail to generalize at test time. To mitigate this problem, we propose a novel Sample-Adaptive Feature Augmentation (SAFA) to augment features for tail classes, thereby improving classifier performance. |
Yan Hong; Jianfu Zhang; Zhongyi Sun; Ke Yan; |
1009 | Negative Samples Are at Large: Leveraging Hard-Distance Elastic Loss for Re-identification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a Momentum Re-identification (MoReID) framework that can leverage a very large number of negative samples in training for general re-identification task. |
Hyungtae Lee; Sungmin Eum; Heesung Kwon; |
1010 | Discrete-Constrained Regression for Local Counting Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To mitigate the sensitivity, we loosen the regression formulation from a continuous scale to a discrete ordering and propose a novel discrete-constrained (DC) regression. |
Haipeng Xiong; Angela Yao; |
1011 | Breadcrumbs: Adversarial Class-Balanced Sampling for Long-Tailed Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: A new feature augmentation strategy, EMANATE, based on back-tracking of features across epochs during training, is proposed. |
Bo Liu; Haoxiang Li; Hao Kang; Gang Hua; Nuno Vasconcelos; |
1012 | Chairs Can Be Stood On: Overcoming Object Bias in Human-Object Interaction Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To overcome the object bias problem, we propose a novel plug-and-play Object-wise Debiasing Memory (ODM) method for re-balancing the distribution of interactions under detected objects. |
Guangzhi Wang; Yangyang Guo; Yongkang Wong; Mohan Kankanhalli; |
1013 | A Fast Knowledge Distillation Framework for Visual Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this study, we present a Fast Knowledge Distillation (FKD) framework that replicates the distillation training phase and generates soft labels using the multi-crop KD approach, meanwhile training faster than ReLabel since no post-processes such as RoI align and softmax operations are used. |
Zhiqiang Shen; Eric Xing; |
1014 | DICE: Leveraging Sparsification for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we reveal important insights that reliance on unimportant weights and units can directly attribute to the brittleness of OOD detection. |
Yiyou Sun; Yixuan Li; |
1015 | Invariant Feature Learning for Generalized Long-Tailed Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose an Invariant Feature Learning (IFL) method as the first strong baseline for GLT. |
Kaihua Tang; Mingyuan Tao; Jiaxin Qi; Zhenguang Liu; Hanwang Zhang; |
1016 | Sliced Recursive Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To reduce the additional computation caused by the recursive operation while maintaining the superior accuracy, we propose an approximation method using multiple sliced group self-attentions across recursive layers, which can reduce the computational cost by 10-30% without sacrificing performance. |
Zhiqiang Shen; Zechun Liu; Eric Xing; |
1017 | Cross-Domain Ensemble Distillation for Domain Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a simple yet effective method for domain generalization, named cross-domain ensemble distillation (XDED), that learns domain-invariant features while encouraging the model to converge to flat minima, which recently turned out to be a sufficient condition for domain generalization. |
Kyungmoon Lee; Sungyeon Kim; Suha Kwak; |
1018 | Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a two-stage clean samples identification method to address the aforementioned challenge. |
Ganlong Zhao; Guanbin Li; Yipeng Qin; Feng Liu; Yizhou Yu; |
1019 | Hyperspherical Learning in Multi-Label Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing (partial) multi-label methods are usually studied in the Euclidean space, where the relationship between the label embeddings and image features is not symmetrical and thus can be challenging to learn. To alleviate this problem, we propose reformulating the task into a hyperspherical space, where an angular margin can be incorporated into a hyperspherical multi-label loss function. |
Bo Ke; Yunquan Zhu; Mengtian Li; Xiujun Shu; Ruizhi Qiao; Bo Ren; |
1020 | When Active Learning Meets Implicit Semantic Data Augmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes diversity-aware semantic transformation active learning, or the DAST-AL framework, which looks ahead to the effect of ISDA in the selection of unlabeled samples. |
Zhuangzhuang Chen; Jin Zhang; Pan Wang; Jie Chen; Jianqiang Li; |
1021 | VL-LTR: Learning Class-Wise Visual-Linguistic Representation for Long-Tailed Visual Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a novel framework based on pre-trained visual-linguistic models for long-tailed recognition (LTR), termed VL-LTR, and conduct empirical studies on the benefits of introducing text modality for long-tailed recognition tasks. |
Changyao Tian; Wenhai Wang; Xizhou Zhu; Jifeng Dai; Yu Qiao; |
1022 | Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-of-Distribution Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We argue that the widely adopted assumption in prior work, that the context bias can be directly annotated or estimated from biased class prediction, renders the context incomplete or even incorrect. |
Jiaxin Qi; Kaihua Tang; Qianru Sun; Xian-Sheng Hua; Hanwang Zhang; |
1023 | Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, when contaminated with unlabeled abnormal samples in training set under semi-supervised settings, current contrastive-based methods generally 1) ignore the comprehensive relation between training data, leading to suboptimal performance, and 2) require fine-tuning, resulting in low efficiency. To address the above two issues, in this paper, we propose a novel hierarchical semi-supervised contrastive learning (HSCL) framework, for contamination-resistant anomaly detection. |
Gaoang Wang; Yibing Zhan; Xinchao Wang; Mingli Song; Klara Nahrstedt; |
1024 | Tracking By Associating Clips Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we investigate an alternative by treating object association as clip-wise matching. |
Sanghyun Woo; Kwanyong Park; Seoung Wug Oh; In So Kweon; Joon-Young Lee; |
1025 | RealPatch: A Statistical Matching Framework for Model Patching with Real Samples Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose RealPatch, a framework for simpler, faster, and more data-efficient data augmentation based on statistical matching. |
Sara Romiti; Christopher Inskip; Viktoriia Sharmanska; Novi Quadrianto; |
1026 | Background-Insensitive Scene Text Recognition with Text Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a Background-Insensitive approach BINet by explicitly leveraging the text Semantic Segmentation (SSN) to extract texts more accurately. |
Liang Zhao; Zhenyao Wu; Xinyi Wu; Greg Wilsbacher; Song Wang; |
1027 | Semantic Novelty Detection Via Relational Reasoning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We claim that a tailored representation learning strategy may be the right solution for effective and efficient semantic novelty detection. Besides extensively testing state-of-the-art approaches for this task, we propose a novel representation learning paradigm based on relational reasoning. |
Francesco Cappio Borlino; Silvia Bucci; Tatiana Tommasi; |
1028 | Improving Closed and Open-Vocabulary Attribute Prediction Using Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose TAP, a new Transformer-based model that can utilize context and predict attributes for multiple objects in a scene in a single forward pass, and a training scheme that allows this model to learn attribute prediction from image-text datasets. |
Khoi Pham; Kushal Kafle; Zhe Lin; Zhihong Ding; Scott Cohen; Quan Tran; Abhinav Shrivastava; |
1029 | Training Vision Transformers with Only 2040 Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we investigate how to train ViTs with limited data (e.g., 2040 images). |
Yun-Hao Cao; Hao Yu; Jianxin Wu; |
1030 | Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To resolve these challenges, we present an effective unified learning framework that takes full advantage of all available training data to learn detection and tracking without losing the ability to recognize any LVIS categories. |
Sanghyun Woo; Kwanyong Park; Seoung Wug Oh; In So Kweon; Joon-Young Lee; |
1031 | TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Accordingly, in this work, we propose a lightweight top-down attention module (TDAM) that iteratively generates a “visual searchlight” to perform channel and spatial modulation of its inputs and outputs more contextually-relevant feature maps at each computation step. |
Shantanu Jaiswal; Basura Fernando; Cheston Tan; |
1032 | Automatic Check-Out Via Prototype-Based Classifier Learning from Single-Product Exemplars Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To mitigate the gap, we propose a method, termed as PSP, to perform Prototype-based classifier learning from Single-Product exemplars. |
Hao Chen; Xiu-Shen Wei; Faen Zhang; Yang Shen; Hui Xu; Liang Xiao; |
1033 | Overcoming Shortcut Learning in A Target Domain By Generalizing Basic Visual Factors from A Source Domain Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Prior works have shown how this impairs the compositional generalization capability of deep learning models. To address this problem, we propose a novel approach to mitigate shortcut learning in uncontrolled target domains. |
Piyapat Saranrittichai; Chaithanya Kumar Mummadi; Claudia Blaiotta; Mauricio Munoz; Volker Fischer; |
1034 | Photo-Realistic Neural Domain Randomization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we show that the recent progress in neural rendering enables a new unified approach we call Photo-realistic Neural Domain Randomization (PNDR). |
Sergey Zakharov; Rareș Ambruș; Vitor Guizilini; Wadim Kehl; Adrien Gaidon; |
1035 | Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Thus, existing solutions commonly employ down-sampling operations (e.g., average pooling) over keys/values to dramatically reduce the computational cost. In this work, we argue that such over-aggressive down-sampling design is not invertible and inevitably causes information dropping especially for high-frequency components in objects (e.g., texture details). |
Ting Yao; Yingwei Pan; Yehao Li; Chong-Wah Ngo; Tao Mei; |
1036 | Tailoring Self-Supervision for Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Subsequently, to show how existing pretext tasks can fulfill these and be tailored for supervised learning, we propose a simple auxiliary self-supervision task, predicting localizable rotation (LoRot). |
WonJun Moon; Ji-Hwan Kim; Jae-Pil Heo; |
1037 | Difficulty-Aware Simulator for Open Set Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Therefore, we present a novel framework, DIfficulty-Aware Simulator (DIAS), that generates fakes with diverse difficulty levels to simulate the real world. |
WonJun Moon; Junho Park; Hyun Seok Seong; Cheol-Ho Cho; Jae-Pil Heo; |
1038 | Few-Shot Class-Incremental Learning from An Open-Set Perspective Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we first reevaluate the current task setting and propose a more comprehensive and practical setting for the FSCIL task. |
Can Peng; Kun Zhao; Tianren Wang; Meng Li; Brian C. Lovell; |
1039 | FOSTER: Feature Boosting and Compression for Class-Incremental Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by the gradient boosting algorithm to gradually fit the residuals between the target model and the previous ensemble model, we propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively. |
Fu-Yun Wang; Da-Wei Zhou; Han-Jia Ye; De-Chuan Zhan; |
1040 | Visual Knowledge Tracing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel task of tracing the evolving classification behavior of human learners as they engage in challenging visual classification tasks. |
Neehar Kondapaneni; Pietro Perona; Oisin Mac Aodha; |
1041 | S3C: Self-Supervised Stochastic Classifiers for Few-Shot Class-Incremental Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: FSCIL suffers from two major challenges: (i) over-fitting on the new classes due to limited amount of data, (ii) catastrophically forgetting about the old classes due to unavailability of data from these classes in the incremental stages. In this work, we propose a self-supervised stochastic classifier (S3C) to counter both these challenges in FSCIL. |
Jayateja Kalla; Soma Biswas; |
1042 | Improving Fine-Grained Visual Recognition in Low Data Regimes Via Self-Boosting Attention Mechanism Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In low data regimes, a network often struggles to choose the correct regions for recognition and tends to overfit spurious correlated patterns from the training data. To tackle this issue, this paper proposes the self-boosting attention mechanism, a novel method for regularizing the network to focus on the key regions shared across samples and classes. |
Yangyang Shu; Baosheng Yu; Haiming Xu; Lingqiao Liu; |
1043 | VSA: Learning Varied-Size Window Attention in Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, current models adopt a hand-crafted fixed-size window design, which restricts their capacity of modeling long-term dependencies and adapting to objects of different sizes. To address this drawback, we propose Varied-Size Window Attention (VSA) to learn adaptive window configurations from data. |
Qiming Zhang; Yufei Xu; Jing Zhang; Dacheng Tao; |
1044 | Unbiased Manifold Augmentation for Coarse Class Subdivision Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Leveraging the recent progress of factor-disentangled generators, Unbiased Manifold Augmentation (UMA) is proposed for CCS. |
Baoming Yan; Ke Gao; Bo Gao; Lin Wang; Jiang Yang; Xiaobo Li; |
1045 | DenseHybrid: Hybrid Anomaly Detection for Dense Open-Set Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We therefore design a novel hybrid algorithm based on reinterpreting discriminative logits as a logarithm of the unnormalized joint distribution $\hat{p}(\mathbf{x},\mathbf{y})$. |
Matej Grcić; Petra Bevandić; Siniša Šegvić; |
1046 | Rethinking Confidence Calibration for Failure Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we find a general, widely-existed but actually-neglected phenomenon that most confidence calibration methods are useless or harmful for failure prediction. |
Fei Zhu; Zhen Cheng; Xu-Yao Zhang; Cheng-Lin Liu; |
1047 | Uncertainty-Guided Source-Free Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose quantifying the uncertainty in the source model predictions and utilizing it to guide the target adaptation. |
Subhankar Roy; Martin Trapp; Andrea Pilzer; Juho Kannala; Nicu Sebe; Elisa Ricci; Arno Solin; |
1048 | Should All Proposals Be Treated Equally in Object Detection? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Previous works have emphasized detectors implemented with efficient backbones. This work instead investigates the impact of proposal processing by the detection head on the accuracy-complexity trade-off. |
Yunsheng Li; Yinpeng Chen; Xiyang Dai; Dongdong Chen; Mengchen Liu; Pei Yu; Ying Jin; Lu Yuan; Zicheng Liu; Nuno Vasconcelos; |
1049 | VIP: Unified Certified Detection and Recovery for Patch Attack with Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a unified framework to analyze certified patch defense tasks (including both certified detection and certified recovery) using the recently emerged vision transformer. |
Junbo Li; Huan Zhang; Cihang Xie; |
1050 | IncDFM: Incremental Deep Feature Modeling for Continual Novelty Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: They scale poorly under more realistic, continual learning regimes in which data distribution shifts occur. To address this critical gap, this paper proposes incDFM (incremental Deep Feature Modeling), a self-supervised continual novelty detector. |
Amanda Rios; Nilesh Ahuja; Ibrahima Ndiour; Utku Genc; Laurent Itti; Omesh Tickoo; |
1051 | IGFormer: Interaction Graph Transformer for Skeleton-Based Human Interaction Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel Interaction Graph Transformer (IGFormer) network for skeleton-based interaction recognition via modeling the interactive body parts as graphs. |
Yunsheng Pang; Qiuhong Ke; Hossein Rahmani; James Bailey; Jun Liu; |
1052 | PRIME: A Few Primitives Can Boost Robustness to Common Corruptions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we take a step back and follow a principled approach to achieve robustness to common corruptions. |
Apostolos Modas; Rahul Rade; Guillermo Ortiz-Jiménez; Seyed-Mohsen Moosavi-Dezfooli; Pascal Frossard; |
1053 | Rotation Regularization Without Rotation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a regularization method based on random rotation of feature vectors. |
Takumi Kobayashi; |
1054 | Towards Accurate Open-Set Recognition Via Background-Class Regularization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To conduct OSR via a simple inference process (without offline analyses) in standard classifier architectures, we use distance-based classifiers instead of conventional Softmax classifiers. |
Wonwoo Cho; Jaegul Choo; |
1055 | In Defense of Image Pre-training for Spatiotemporal Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Nonetheless, interestingly, by taking a closer look at these from-scratch learned CNNs, we note there exist certain 3D kernels that exhibit much stronger appearance modeling ability than others, arguably suggesting appearance information is already well disentangled in learning. Inspired by this observation, we hypothesize that the key to effectively leveraging image pre-training lies in the decomposition of learning spatial and temporal features, and revisiting image pre-training as the appearance prior to initializing 3D kernels. |
Xianhang Li; Huiyu Wang; Chen Wei; Jieru Mei; Alan Yuille; Yuyin Zhou; Cihang Xie; |
1056 | Augmenting Deep Classifiers with Polynomial Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we cast the study of deep classifiers under a unifying framework. |
Grigorios G. Chrysos; Markos Georgopoulos; Jiankang Deng; Jean Kossaifi; Yannis Panagakis; Anima Anandkumar; |
1057 | Learning with Noisy Labels By Efficient Transition Matrix Estimation to Combat Label Miscorrection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Moreover, every training step requires at least three back-propagations, significantly slowing down the training speed. To mitigate these issues, we propose a robust and efficient method, FasTEN, which learns a label transition matrix on the fly. |
Seong Min Kye; Kwanghee Choi; Joonyoung Yi; Buru Chang; |
1058 | Online Task-Free Continual Learning with Dynamic Sparse Distributed Memory Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose in this paper an efficient semi-distributed associative memory algorithm called Dynamic Sparse Distributed Memory (DSDM) where learning and evaluating can be carried out at any point of time. |
Julien Pourcel; Ngoc-Son Vu; Robert M. French; |
1059 | Contrastive Deep Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, deep supervision conflicts with the well-known observation that the shallow layers learn low-level features instead of task-biased high-level semantic features. To address this issue, this paper proposes a novel training framework named Contrastive Deep Supervision, which supervises the intermediate layers with augmentation-based contrastive learning. |
Linfeng Zhang; Xin Chen; Junbo Zhang; Runpei Dong; Kaisheng Ma; |
1060 | Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work simultaneously considers the discriminability and transferability properties of deep representations in the typical supervised learning task, i.e., image classification. By a comprehensive temporal analysis, we observe a trade-off between these two properties. |
Quan Cui; Bingchen Zhao; Zhao-Min Chen; Borui Zhao; Renjie Song; Boyan Zhou; Jiajun Liang; Osamu Yoshie; |
1061 | LocVTP: Video-Text Pre-training for Temporal Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we experimentally analyze and demonstrate the incompatibility of current VTP methods with localization tasks, and propose a novel Localization-oriented Video-Text Pre-training framework, dubbed LocVTP. |
Meng Cao; Tianyu Yang; Junwu Weng; Can Zhang; Jue Wang; Yuexian Zou; |
1062 | Few-Shot End-to-End Object Detection Via Constantly Concentrated Encoding Across Heads Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a few-shot adaptation strategy, Constantly Concentrated Encoding across heads (CoCo-RCNN), for the end-to-end detectors. |
Jiawei Ma; Guangxing Han; Shiyuan Huang; Yuncong Yang; Shih-Fu Chang; |
1063 | Implicit Neural Representations for Image Compression Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Such compression algorithms are promising candidates as a general purpose approach for any coordinate-based data modality. However, in order to live up to this promise current INR-based compression algorithms need to improve their rate-distortion performance by a large margin. This work progresses on this problem. |
Yannick Strümpler; Janis Postels; Ren Yang; Luc Van Gool; Federico Tombari; |
1064 | LiP-Flow: Learning Inference-Time Priors for Codec Avatars Via Normalizing Flows in Latent Space Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, at inference time, they must be driven by limited inputs such as partial views recorded by headset-mounted cameras or a front-facing camera, and sparse facial landmarks. To mitigate this asymmetry, we introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space. |
Emre Aksan; Shugao Ma; Akin Caliskan; Stanislav Pidhorskyi; Alexander Richard; Shih-En Wei; Jason Saragih; Otmar Hilliges; |
1065 | Learning to Drive By Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we aim to pretrain policy representations for driving tasks by watching hours-long uncurated YouTube videos. |
Qihang Zhang; Zhenghao Peng; Bolei Zhou; |
1066 | Learning Ego 3D Representation As Ray Tracing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel end-to-end architecture for ego 3D representation learning from an arbitrary number of unconstrained camera views. |
Jiachen Lu; Zheyuan Zhou; Xiatian Zhu; Hang Xu; Li Zhang; |
1067 | Static and Dynamic Concepts for Self-Supervised Video Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel learning scheme for self-supervised video representation learning. |
Rui Qian; Shuangrui Ding; Xian Liu; Dahua Lin; |
1068 | SphereFed: Hyperspherical Federated Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce the Hyperspherical Federated Learning (SphereFed) framework to address the non-i.i.d. issue by constraining learned representations of data points to be on a unit hypersphere shared by clients. |
Xin Dong; Sai Qian Zhang; Ang Li; H.T. Kung; |
1069 | Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Recent studies focus on learning video-level temporal and discriminative information using contrastive learning, but overlook the hierarchical spatial-temporal nature of human skeletons. Different from such superficial supervision at the video level, we propose a self-supervised hierarchical pre-training scheme incorporated into a hierarchical Transformer-based skeleton sequence encoder (Hi-TRS), to explicitly capture spatial, short-term, and long-term temporal dependencies at frame, clip, and video levels, respectively. |
Yuxiao Chen; Long Zhao; Jianbo Yuan; Yu Tian; Zhaoyang Xia; Shijie Geng; Ligong Han; Dimitris N. Metaxas; |
1070 | Posterior Refinement on Metric Matrix Improves Generalization Bound in Metric Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we attempt to fill this research gap and theoretically analyze the impact of the refined metric matrix property on the generalization gap. |
Mingda Wang; Canqian Yang; Yi Xu; |
1071 | Balancing Stability and Plasticity Through Advanced Null Space in Continual Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new continual learning approach, Advanced Null Space (AdNS), to balance the stability and plasticity without storing any old data of previous tasks. |
Yajing Kong; Liu Liu; Zhen Wang; Dacheng Tao; |
1072 | DisCo: Remedying Self-Supervised Learning on Lightweight Models with Distilled Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Since current SSL methods mainly rely on contrastive learning to train the network, we propose a simple yet effective method termed Distilled Contrastive Learning (DisCo) to ease this issue. |
Yuting Gao; Jia-Xin Zhuang; Shaohui Lin; Hao Cheng; Xing Sun; Ke Li; Chunhua Shen; |
1073 | CoSCL: Cooperation of Small Continual Learners Is Stronger Than A Big One Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we theoretically analyze the generalization errors for learning plasticity and memory stability in continual learning, which can be uniformly upper-bounded by (1) discrepancy between task distributions, (2) flatness of loss landscape and (3) cover of parameter space. |
Liyuan Wang; Xingxing Zhang; Qian Li; Jun Zhu; Yi Zhong; |
1074 | Manifold Adversarial Learning for Cross-Domain 3D Shape Representation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose to learn 3D point cloud representation on a seen source domain and generalize to an unseen target domain via adversarial learning. |
Hao Huang; Cheng Chen; Yi Fang; |
1075 | Fast-MoCo: Boost Momentum-Based Contrastive Learning with Combinatorial Patches Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Fast-MoCo – a novel framework that utilizes combinatorial patches to construct multiple positive pairs from two augmented views, which provides abundant supervision signals that bring significant acceleration with neglectable extra computational cost. |
Yuanzheng Ci; Chen Lin; Lei Bai; Wanli Ouyang; |
1076 | LoRD: Local 4D Implicit Representation for High-Fidelity Dynamic Human Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While many deep local representations have shown promising results for 3D shape modeling, their 4D counterpart does not exist yet. In this paper, we fill this blank by proposing a novel Local 4D implicit Representation for Dynamic clothed human, named LoRD, which has the merits of both 4D human modeling and local representation, and enables high-fidelity reconstruction with detailed surface deformations, such as clothing wrinkles. |
Boyan Jiang; Xinlin Ren; Mingsong Dou; Xiangyang Xue; Yanwei Fu; Yinda Zhang; |
1077 | On The Versatile Uses of Partial Distance Correlation in Deep Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Approaches such as canonical correlation analysis (CCA) are applicable in principle, but have been sparingly used so far. In this paper, we revisit a (less widely known) idea from statistics, called distance correlation (and its partial variant), designed to evaluate correlation between feature spaces of different dimensions. |
Xingjian Zhen; Zihang Meng; Rudrasis Chakraborty; Vikas Singh; |
1078 | Self-Regulated Feature Learning Via Teacher-Free Feature Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Conventional feature distillation framework demands extra selecting/training budgets of teachers and complex transformations to align the features between teacher-student models. To address the problem, we analyze teacher roles in feature distillation and have an intriguing observation: additional teacher architectures are not always necessary. |
Lujun Li; |
1079 | Balancing Between Forgetting and Acquisition in Incremental Subpopulation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel two-stage learning scheme to explicitly disentangle the acquisition and forgetting for achieving a better balance between subpopulation learning and seen population forgetting: in the first “gain-acquisition” stage, we progressively learn a new classifier based on the margin-enforce loss, which enforces the hard samples and population to have a larger weight for classifier updating and avoids uniformly updating all the population; in the second “counter-forgetting” stage, we search for the proper combination of the new and old classifiers by optimizing a novel objective based on proxies of forgetting and acquisition. |
Mingfu Liang; Jiahuan Zhou; Wei Wei; Ying Wu; |
1080 | Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We analyze that the well-trained input features weaken the learning of graph topology, making it not generalized enough during the inference process. In this paper, we propose a Counterfactual Intervention Feature Transfer (CIFT) method to tackle these problems. |
Xulin Li; Yan Lu; Bin Liu; Yating Liu; Guojun Yin; Qi Chu; Jinyang Huang; Feng Zhu; Rui Zhao; Nenghai Yu; |
1081 | DAS: Densely-Anchored Sampling for Deep Metric Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we investigate how to alleviate the “missing embedding” issue to improve the sampling quality and achieve effective DML. |
Lizhao Liu; Shangxin Huang; Zhuangwei Zhuang; Ran Yang; Mingkui Tan; Yaowei Wang; |
1082 | Learn from All: Erasing Attention Consistency for Noisy Label Facial Expression Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we explore dealing with noisy labels from a new feature-learning perspective. |
Yuhang Zhang; Chengrui Wang; Xu Ling; Weihong Deng; |
1083 | A Non-Isotropic Probabilistic Take On Proxy-Based Deep Metric Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, this disregards the embedding norm, which can carry additional beneficial context such as class- or image-intrinsic uncertainty. In addition, proxy-based DML struggles to learn class-internal structures. To address both issues at once, we introduce non-isotropic probabilistic proxy-based DML. |
Michael Kirchhof; Karsten Roth; Zeynep Akata; Enkelejda Kasneci; |
1084 | TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel data augmentation technique TokenMix to improve the performance of vision transformers. |
Jihao Liu; Boxiao Liu; Hang Zhou; Hongsheng Li; Yu Liu; |
1085 | UFO: Unified Feature Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a novel Unified Feature Optimization (UFO) paradigm for training and deploying deep models under real-world and large-scale scenarios, which requires a collection of multiple AI functions. |
Teng Xi; Yifan Sun; Deli Yu; Bi Li; Nan Peng; Gang Zhang; Xinyu Zhang; Zhigang Wang; Jinwen Chen; Jian Wang; Lufei Liu; Haocheng Feng; Junyu Han; Jingtuo Liu; Errui Ding; Jingdong Wang; |
1086 | Sound Localization By Self-Supervised Time Delay Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Estimating a sound’s time delay requires finding correspondences between the signals recorded by each microphone. We propose to learn these correspondences through self-supervision, drawing on recent techniques from visual tracking. |
Ziyang Chen; David F. Fouhey; Andrew Owens; |
1087 | X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a representation learning framework called X-Learner, which learns the universal feature of multiple vision tasks supervised by various sources, with expansion and squeeze stages: 1) Expansion Stage: X-Learner learns the task-specific feature to alleviate task interference and enrich the representation by reconciliation layer. |
Yinan He; Gengshi Huang; Siyu Chen; Jianing Teng; Kun Wang; Zhenfei Yin; Lu Sheng; Ziwei Liu; Yu Qiao; Jing Shao; |
1088 | SLIP: Self-Supervision Meets Language-Image Pre-training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we explore whether self-supervised learning can aid in the use of language supervision for visual representation learning with Vision Transformers. |
Norman Mu; Alexander Kirillov; David Wagner; Saining Xie; |
1089 | Discovering Deformable Keypoint Pyramids Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In natural scenes, these keypoints are often hierarchically grouped into sets corresponding to coherently moving objects and their moveable and deformable parts. Motivated by this observation, we propose Keypoint Pyramids, an approach to exploit this property for discovering keypoints without explicit supervision. |
Jianing Qian; Anastasios Panagopoulos; Dinesh Jayaraman; |
1090 | Neural Video Compression Using GANs for Detail Synthesis and Propagation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present the first neural video compression method based on generative adversarial networks (GANs). |
Fabian Mentzer; Eirikur Agustsson; Johannes Ballé; David Minnen; Nick Johnston; George Toderici; |
1091 | A Contrastive Objective for Learning Disentangled Representations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a new approach, proposing a new domain-wise contrastive objective for ensuring invariant representations. |
Jonathan Kahana; Yedid Hoshen; |
1092 | PT4AL: Using Self-Supervised Pretext Tasks for Active Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel active learning approach that utilizes self-supervised pretext tasks and a unique data sampler to select data that are both difficult and representative. |
John Seon Keun Yi; Minseok Seo; Jongchan Park; Dong-Geol Choi; |
1093 | ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, in the area of small models for mobile or resource constrained devices, ConvNet still has its own advantages in both performance and model complexity. We propose ParC-Net, a pure ConvNet based backbone model that further strengthens these advantages by fusing the merits of vision transformers into ConvNets. |
Haokui Zhang; Wenze Hu; Xiaoyu Wang; |
1094 | DualPrompt: Complementary Prompting for Rehearsal-Free Continual Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present a simple yet effective framework, DualPrompt, which learns a tiny set of parameters, called prompt, to properly instruct a pre-trained model to learn tasks arriving sequentially, without buffering past examples. |
Zifeng Wang; Zizhao Zhang; Sayna Ebrahimi; Ruoxi Sun; Han Zhang; Chen-Yu Lee; Xiaoqi Ren; Guolong Su; Vincent Perot; Jennifer Dy; Tomas Pfister; |
1095 | Unifying Visual Contrastive Learning for Object Recognition from A Graph Perspective Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose to Unify existing unsupervised Visual Contrastive Learning methods by using a GCN layer as the predictor layer (UniVCL), which brings two merits to unsupervised learning in object recognition. |
Shixiang Tang; Feng Zhu; Lei Bai; Rui Zhao; Chenyu Wang; Wanli Ouyang; |
1096 | Decoupled Contrastive Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: By removing the NPC effect, we propose decoupled contrastive learning (DCL) loss, which removes the positive term from the denominator and significantly improves the learning efficiency. |
Chun-Hsiao Yeh; Cheng-Yao Hong; Yen-Chi Hsu; Tyng-Luh Liu; Yubei Chen; Yann LeCun; |
1097 | Joint Learning of Localized Representations from Medical Images and Reports Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Still, most existing methods target image classification downstream tasks and may not be optimal for localized tasks like semantic segmentation or object detection. We therefore propose Localized representation learning from Vision and Text (LoVT), a text-supervised pre-training method that explicitly targets localized medical imaging tasks. |
Philip Müller; Georgios Kaissis; Congyu Zou; Daniel Rueckert; |
1098 | The Challenges of Continuous Self-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose the use of replay buffers as an approach to alleviate the issues of inefficiency and temporal correlations. |
Senthil Purushwalkam; Pedro Morgado; Abhinav Gupta; |
1099 | Conditional Stroke Recovery for Fine-Grained Sketch-Based Image Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To complete the auxiliary task, we propose an unsupervised stroke disorder algorithm, which does well in stroke extraction and sketch augmentation. |
Zhixin Ling; Zhen Xing; Jian Zhou; Xiangdong Zhou; |
1100 | Identifying Hard Noise in Long-Tailed Sample Distribution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Such imbalance makes a classifier less discriminative for the tail classes, whose previously “easy” noises are now turned into “hard” ones–they are almost as outliers as the tail samples. We introduce this new challenge as Noisy Long-Tailed Classification (NLT). |
Xuanyu Yi; Kaihua Tang; Xian-Sheng Hua; Joo-Hwee Lim; Hanwang Zhang; |
1101 | Relative Contrastive Loss for Unsupervised Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Motivated by the ability of humans in recognizing relatively positive/negative samples, we propose the Relative Contrastive Loss (RCL) to learn feature representation from relatively positive/negative pairs, which not only learns more real world semantic variations than the single-instance-positive methods but also respects positive-negative relativeness compared with absolute prototype-positive methods. |
Shixiang Tang; Feng Zhu; Lei Bai; Rui Zhao; Wanli Ouyang; |
1102 | Fine-Grained Fashion Representation Learning By Online Deep Clustering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we present a deep learning based online clustering method to jointly learn fine-grained fashion representations for all attributes at both instance and cluster level, where the attribute-specific cluster centers are online estimated. |
Yang Jiao; Ning Xie; Yan Gao; Chien-chih Wang; Yi Sun; |
1103 | NashAE: Disentangling Representations Through Adversarial Covariance Minimization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a self-supervised method to disentangle factors of variation in high-dimensional data that does not rely on prior knowledge of the underlying variation profile (e.g., no assumptions on the number or distribution of the individual variables to be extracted). |
Eric Yeats; Frank Liu; David Womble; Hai Li; |
1104 | A Gyrovector Space Approach for Symmetric Positive Semi-Definite Matrix Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While these works share a common idea of generalizing some basic operations in deep neural networks (DNNs) to the SPSD manifold setting, their proposed generalizations are usually designed in an ad hoc manner. In this work, we make an attempt to propose a principled framework for building such generalizations. |
Xuan Son Nguyen; |
1105 | Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, recent work suggests that transformers can support learning across multiple modalities and allow knowledge sharing. Inspired by this, we investigate a variety of Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP) frameworks. |
Haoxuan You; Luowei Zhou; Bin Xiao; Noel Codella; Yu Cheng; Ruochen Xu; Shih-Fu Chang; Lu Yuan; |
1106 | Contrasting Quadratic Assignments for Set-Based Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we note that the approach of considering individual pairs cannot account for both intra-set and inter-set similarities when the sets are formed from the views of the data. |
Artem Moskalev; Ivan Sosnovik; Volker Fischer; Arnold Smeulders; |
1107 | Class-Incremental Learning with Cross-Space Clustering and Controlled Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose two distillation-based objectives for class incremental learning that leverage the structure of the feature space to maintain accuracy on previous classes, as well as enable learning the new classes. |
Arjun Ashok; K J Joseph; Vineeth N Balasubramanian; |
1108 | Object Discovery and Representation Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, by introducing hand-crafted image segmentations to define regions of interest, or specialized augmentation strategies, these methods sacrifice the simplicity and generality that makes SSL so powerful. Instead, we propose a self-supervised learning paradigm that discovers this image structure by itself. |
Olivier J. Hénaff; Skanda Koppula; Evan Shelhamer; Daniel Zoran; Andrew Jaegle; Andrew Zisserman; João Carreira; Relja Arandjelović; |
1109 | Trading Positional Complexity Vs Deepness in Coordinate Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Hitherto, the rationale for the effectiveness of these positional encodings has been mainly studied through a Fourier lens. In this paper, we strive to broaden this understanding by showing that alternative non-Fourier embedding functions can indeed be used for positional encoding. |
Jianqiao Zheng; Sameera Ramasinghe; Xueqian Li; Simon Lucey; |
1110 | MVDG: A Unified Multi-View Framework for Domain Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel multi-view DG framework to effectively reduce the overfitting in both the training and test stage. |
Jian Zhang; Lei Qi; Yinghuan Shi; Yang Gao; |
1111 | Panoptic Scene Graph Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce panoptic scene graph generation (PSG task), a new problem that requires the model to generate more comprehensive scene graph representations based on panoptic segmentations rather than rigid bounding boxes. |
Jingkang Yang; Yi Zhe Ang; Zujin Guo; Kaiyang Zhou; Wayne Zhang; Ziwei Liu; |
1112 | Object-Compositional Neural Implicit Surfaces Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a novel framework, ObjectSDF, to build an object-compositional neural implicit representation with high fidelity in 3D reconstruction and object representation. |
Qianyi Wu; Xian Liu; Yuedong Chen; Kejie Li; Chuanxia Zheng; Jianfei Cai; Jianmin Zheng; |
1113 | RigNet: Repetitive Image Guided Network for Depth Completion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, blurry guidance in the image and unclear structure in the depth still impede the performance of the image guided frameworks. To tackle these problems, we explore a repetitive design in our image guided network to gradually and sufficiently recover depth values. |
Zhiqiang Yan; Kun Wang; Xiang Li; Zhenyu Zhang; Jun Li; Jian Yang; |
1114 | FADE: Fusing The Assets of Decoder and Encoder for Task-Agnostic Upsampling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present FADE, a novel, plug-and-play, and task-agnostic upsampling operator. |
Hao Lu; Wenze Liu; Hongtao Fu; Zhiguo Cao; |
1115 | LiDAL: Inter-Frame Uncertainty Based Active Learning for 3D LiDAR Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose LiDAL, a novel active learning method for 3D LiDAR semantic segmentation by exploiting inter-frame uncertainty among LiDAR frames. |
Zeyu Hu; Xuyang Bai; Runze Zhang; Xin Wang; Guangyuan Sun; Hongbo Fu; Chiew-Lan Tai; |
1116 | Hierarchical Memory Learning for Fine-Grained Scene Graph Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In order to alleviate the impact of the suboptimum mixed-granularity annotation and long-tail effect problems, this paper proposes a novel Hierarchical Memory Learning (HML) framework to learn the model from simple to complex, which is similar to the human beings’ hierarchical memory learning process. |
Youming Deng; Yansheng Li; Yongjun Zhang; Xiang Xiang; Jian Wang; Jingdong Chen; Jiayi Ma; |
1117 | DODA: Data-Oriented Sim-to-Real Domain Adaptation for 3D Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a Data-Oriented Domain Adaptation (DODA) framework to mitigate pattern and context gaps caused by different sensing mechanisms and layout placements across domains. |
Runyu Ding; Jihan Yang; Li Jiang; Xiaojuan Qi; |
1118 | MTFormer: Multi-task Learning Via Transformer and Cross-Task Reasoning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we explore the advantages of utilizing transformer structures for addressing multi-task learning (MTL). |
Xiaogang Xu; Hengshuang Zhao; Vibhav Vineet; Ser-Nam Lim; Antonio Torralba; |
1119 | MonoPLFlowNet: Permutohedral Lattice FlowNet for Real-Scale 3D Scene Flow Estimation with Monocular Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a deep learning architecture on permutohedral lattice – MonoPLFlowNet. |
Runfa Li; Truong Nguyen; |
1120 | TO-Scene: A Large-Scale Dataset for Understanding 3D Tabletop Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unfortunately, it is hard to meet this demand by directly deploying data-driven algorithms, since 3D tabletop scenes are rarely available in current datasets. To remedy this defect, we introduce TO-Scene, a large-scale dataset focusing on tabletop scenes, which contains 20,740 scenes with three variants. |
Mutian Xu; Pei Chen; Haolin Liu; Xiaoguang Han; |
1121 | Is It Necessary to Transfer Temporal Knowledge for Domain Adaptive Video Semantic Segmentation? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we argue that it is not necessary to transfer temporal knowledge since the temporal continuity of video segmentation in the target domain can be estimated and enforced without reference to videos in the source domain. |
Xinyi Wu; Zhenyao Wu; Jin Wan; Lili Ju; Song Wang; |
1122 | Meta Spatio-Temporal Debiasing for Video Scene Graph Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, due to the long-tailed training data in datasets, the generalization performance of existing VidSGG models can be affected by the spatio-temporal conditional bias problem. In this work, from the perspective of meta-learning, we propose a novel Meta Video Scene Graph Generation (MVSGG) framework to address such a bias problem. |
Li Xu; Haoxuan Qu; Jason Kuen; Jiuxiang Gu; Jun Liu; |
1123 | Improving The Reliability for Confidence Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a meta-learning framework that can simultaneously improve upon both qualities in a confidence estimation model. Specifically, we first construct virtual training and testing sets with some intentionally designed distribution differences between them. |
Haoxuan Qu; Yanchao Li; Lin Geng Foo; Jason Kuen; Jiuxiang Gu; Jun Liu; |
1124 | Fine-Grained Scene Graph Generation with Data Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To deal with the problems above, we propose a novel Internal and External Data Transfer (IETrans) method, which can be applied in a plug-and-play fashion and expanded to large SGG with 1,807 predicate classes. |
Ao Zhang; Yuan Yao; Qianyu Chen; Wei Ji; Zhiyuan Liu; Maosong Sun; Tat-Seng Chua; |
1125 | Pose2Room: Understanding 3D Scenes from Human Activities Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we pose the question: Can we reason about object structure in real-world environments solely from human trajectory information? |
Yinyu Nie; Angela Dai; Xiaoguang Han; Matthias Nießner; |
1126 | Towards Hard-Positive Query Mining for DETR-Based Human-Object Interaction Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Accordingly, in this paper, we propose to enhance DETR’s robustness by mining hard-positive queries, which are forced to make correct predictions using partial visual cues. |
Xubin Zhong; Changxing Ding; Zijian Li; Shaoli Huang; |
1127 | Discovering Human-Object Interaction Concepts Via Self-Compositional Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, 1) we introduce a novel and challenging task for a comprehensive HOI understanding, which is termed as HOI Concept Discovery and 2) we devise a self-compositional learning framework (or SCL) for HOI concept discovery. |
Zhi Hou; Baosheng Yu; Dacheng Tao; |
1128 | Primitive-Based Shape Abstraction Via Nonparametric Bayesian Inference Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel non-parametric Bayesian statistical method to infer an abstraction, consisting of an unknown number of geometric primitives, from a point cloud. |
Yuwei Wu; Weixiao Liu; Sipu Ruan; Gregory S. Chirikjian; |
1129 | Stereo Depth Estimation with Echoes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Motivated by the reciprocal relationship between both modalities, in this paper, we propose an end-to-end framework named StereoEchoes for stereo depth estimation with echoes. |
Chenghao Zhang; Kun Tian; Bolin Ni; Gaofeng Meng; Bin Fan; Zhaoxiang Zhang; Chunhong Pan; |
1130 | Inverted Pyramid Multi-task Transformer for Dense Scene Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel end-to-end Inverted Pyramid multi-task Transformer (InvPT) to perform simultaneous modeling of spatial positions and multiple tasks in a unified framework. |
Hanrong Ye; Dan Xu; |
1131 | PETR: Position Embedding Transformation for Multi-View 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we develop position embedding transformation (PETR) for multi-view 3D object detection. |
Yingfei Liu; Tiancai Wang; Xiangyu Zhang; Jian Sun; |
1132 | S2Net: Stochastic Sequential Pointcloud Forecasting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we tackle the stochastic SPF problem by proposing a generative model with two main components: (1) a conditional variational recurrent neural network that models a temporally-dependent latent space, and (2) a pyramid-LSTM that increases the fidelity of predictions with temporally-aligned skip connections. |
Xinshuo Weng; Junyu Nan; Kuan-Hui Lee; Rowan McAllister; Adrien Gaidon; Nicholas Rhinehart; Kris M. Kitani; |
1133 | RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a resolution adaptive self-supervised monocular depth estimation method (RA-Depth) by learning the scale invariance of the scene depth. |
Mu He; Le Hui; Yikai Bian; Jian Ren; Jin Xie; Jian Yang; |
1134 | PolyphonicFormer: Unified Query Learning for Depth-Aware Video Panoptic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the relationship between the depth and panoptic segmentation is not well explored — simply combining existing methods leads to competition and needs careful weight balancing. In this paper, we present PolyphonicFormer, a vision transformer to unify these sub-tasks under the DVPS task and lead to more robust results. |
Haobo Yuan; Xiangtai Li; Yibo Yang; Guangliang Cheng; Jing Zhang; Yunhai Tong; Lefei Zhang; Dacheng Tao; |
1135 | SQN: Weakly-Supervised Semantic Segmentation of Large-Scale 3D Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We observe that, as point clouds are samples of the 3D world, the distribution of points in a local neighbourhood is relatively homogeneous, exhibiting strong semantic similarity. Motivated by this, we propose a new weak supervision method to implicitly augment these highly sparse supervision signals. |
Qingyong Hu; Bo Yang; Guangchi Fang; Yulan Guo; Aleš Leonardis; Niki Trigoni; Andrew Markham; |
1136 | PointMixer: MLP-Mixer for Point Cloud Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unlike images, point clouds are inherently sparse, unordered and irregular, which limits the direct use of MLP-Mixer for point cloud understanding. To overcome these limitations, we propose PointMixer, a universal point set operator that facilitates information sharing among unstructured 3D point clouds. |
Jaesung Choe; Chunghyun Park; Francois Rameau; Jaesik Park; In So Kweon; |
1137 | Initialization and Alignment for Adversarial Texture Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To improve robustness, particularly of recent adversarial texture optimization, we develop an explicit initialization and an alignment procedure. |
Xiaoming Zhao; Zhizhen Zhao; Alexander G. Schwing; |
1138 | MOTR: End-to-End Multiple-Object Tracking with TRansformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose MOTR, which extends DETR and introduces “track query” to model the tracked instances in the entire video. |
Fangao Zeng; Bin Dong; Yuang Zhang; Tiancai Wang; Xiangyu Zhang; Yichen Wei; |
1139 | GALA: Toward Geometry-and-Lighting-Aware Object Search for Compositing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To move a step further, this paper proposes GALA (Geometry-and-Lighting-Aware), a generic foreground object search method with discriminative modeling on geometry and lighting compatibility for open-world image compositing. |
Sijie Zhu; Zhe Lin; Scott Cohen; Jason Kuen; Zhifei Zhang; Chen Chen; |
1140 | LaLaLoc++: Global Floor Plan Comprehension for Layout Localisation in Unvisited Environments Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present LaLaLoc++, a method for floor plan localisation in unvisited environments through latent representations of room layout. |
Henry Howard-Jenkins; Victor Adrian Prisacariu; |
1141 | 3D-PL: Domain Adaptive Depth Estimation with 3D-Aware Pseudo-Labeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we develop a domain adaptation framework via generating reliable pseudo ground truths of depth from real data to provide direct supervisions. |
Yu-Ting Yen; Chia-Ni Lu; Wei-Chen Chiu; Yi-Hsuan Tsai; |
1142 | Panoptic-PartFormer: Learning A Unified Model for Panoptic Part Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Previous work mainly utilizes separated approaches to handle thing, stuff, and part predictions individually without performing any shared computation and task association. In this work, we aim to unify these tasks at the architectural level, designing the first end-to-end unified method named Panoptic-PartFormer. |
Xiangtai Li; Shilin Xu; Yibo Yang; Guangliang Cheng; Yunhai Tong; Dacheng Tao; |
1143 | Salient Object Detection for Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Differing from SOD for images, we find the attention shift of point clouds may provoke saliency conflict, i.e., an object paradoxically belongs to salient and non-salient categories. To eschew this issue, we present a novel view-dependent perspective of salient objects, reasonably reflecting the most eye-catching objects in point cloud scenarios. |
Songlin Fan; Wei Gao; Ge Li; |
1144 | Learning Semantic Segmentation from Multiple Datasets with Label Shifts Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: While it is desirable to train segmentation models on an aggregation of multiple datasets, a major challenge is that the label space of each dataset may be in conflict with one another. To tackle this challenge, we propose UniSeg, an effective and model-agnostic approach to automatically train segmentation models across multiple datasets with heterogeneous label spaces, without requiring any manual relabeling efforts. |
Dongwan Kim; Yi-Hsuan Tsai; Yumin Suh; Masoud Faraki; Sparsh Garg; Manmohan Chandraker; Bohyung Han; |
1145 | Weakly Supervised 3D Scene Segmentation with Region-Level Boundary Awareness and Instance Discrimination Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The paper introduces an effective approach to tackle the 3D scene understanding problem when labeled scenes are limited. |
Kangcheng Liu; Yuzhi Zhao; Qiang Nie; Zhi Gao; Ben M. Chen; |
1146 | Towards Open-Vocabulary Scene Graph Generation with Prompt-Based Finetuning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting, in which a model is trained on a small set of base object classes but is required to infer relations for unseen target object classes. |
Tao He; Lianli Gao; Jingkuan Song; Yuan-Fang Li; |
1147 | Variance-Aware Weight Initialization for Point Convolutional Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: While well-founded weight initialization strategies can render batch normalization unnecessary and thus avoid these drawbacks, no such approaches have been proposed for point convolutional networks. To fill this gap, we propose a framework to unify the multitude of continuous convolutions. |
Pedro Hermosilla; Michael Schelling; Tobias Ritschel; Timo Ropinski; |
1148 | Break and Make: Interactive Structural Understanding Using LEGO Bricks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In order to advance research in interactive reasoning for part-based geometric understanding, we propose a challenging new assembly problem using LEGO bricks that we call Break and Make. |
Aaron Walsman; Muru Zhang; Klemen Kotar; Karthik Desingh; Ali Farhadi; Dieter Fox; |
1149 | Bi-PointFlowNet: Bidirectional Learning for Point Cloud Based Scene Flow Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a novel scene flow estimation architecture using bidirectional flow embedding layers. |
Wencan Cheng; Jong Hwan Ko; |
1150 | 3DG-STFM: 3D Geometric Guided Student-Teacher Feature Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We tackle the essential task of finding dense visual correspondences between a pair of images. |
Runyu Mao; Chen Bai; Yatong An; Fengqing Zhu; Cheng Lu; |
1151 | Video Restoration Framework and Its Meta-Adaptations to Data-Poor Conditions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose a generic architecture that is effective for any weather condition due to the ability to extract robust feature maps without any domain-specific knowledge. |
Prashant W Patil; Sunil Gupta; Santu Rana; Svetha Venkatesh; |
1152 | MonteBoxFinder: Detecting and Filtering Primitives to Fit A Noisy Point Cloud Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present MonteBoxFinder, a method that, given a noisy input point cloud, detects a dense set of imperfect boxes, and employs a discrete optimization algorithm that efficiently explores the space of all box arrangements in order to find the arrangement that best fits the point cloud. |
Michaël Ramamonjisoa; Sinisa Stekovic; Vincent Lepetit; |
1153 | Scene Text Recognition with Permuted Autoregressive Sequence Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our method, PARSeq, learns an ensemble of internal AR LMs with shared weights using Permutation Language Modeling. |
Darwin Bautista; Rowel Atienza; |
1154 | When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, such methods may fail to accurately read formulas with complicated structure or generate long markup sequences, as the attention results are often inaccurate due to the large variance of writing styles or spatial layouts. To alleviate this problem, we propose an unconventional network for HMER named Counting-Aware Network (CAN), which jointly optimizes two tasks: HMER and symbol counting. |
Bohan Li; Ye Yuan; Dingkang Liang; Xiao Liu; Zhilong Ji; Jinfeng Bai; Wenyu Liu; Xiang Bai; |
1155 | Detecting Tampered Scene Text in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a new task, named Tampered Scene Text Detection (TSTD), to localize text instances and recognize the texture authenticity in an end-to-end manner. |
Yuxin Wang; Hongtao Xie; Mengting Xing; Jing Wang; Shenggao Zhu; Yongdong Zhang; |
1156 | Optimal Boxes: Boosting End-to-End Scene Text Recognition By Adjusting Annotated Bounding Boxes Via Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose Box Adjuster, a reinforcement learning-based method for adjusting the shape of each text bounding box to make it more compatible with text recognition models. |
Jingqun Tang; Wenming Qian; Luchuan Song; Xiena Dong; Lan Li; Xiang Bai; |
1157 | GLASS: Global to Local Attention for Scene-Text Spotting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Among the main challenges that end-to-end approaches face is the performance degradation when recognizing text across scale variations (smaller or larger text), and arbitrary word rotation angles. In this work, we address these challenges by proposing a novel global-to-local attention mechanism for text spotting, termed GLASS, that fuses together global and local features. |
Roi Ronen; Shahar Tsiper; Oron Anschel; Inbal Lavi; Amir Markovitz; R. Manmatha; |
1158 | COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Thus, we propose a novel task that predicts the link between truncated texts. |
Jeonghun Baek; Yusuke Matsui; Kiyoharu Aizawa; |
1159 | Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a weakly supervised pre-training method, oCLIP, which can acquire effective scene text representations by jointly learning and aligning visual and textual information. |
Chuhui Xue; Wenqing Zhang; Yu Hao; Shijian Lu; Philip H. S. Torr; Song Bai; |
1160 | Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To alleviate these problems, we propose to recognize the artistic text at three levels. Besides, we provide an artistic text dataset to benchmark the performance. |
Xudong Xie; Ling Fu; Zhifei Zhang; Zhaowen Wang; Xiang Bai; |
1161 | Levenshtein OCR Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, we cast the problem of scene text recognition as an iterative sequence refinement process. |
Cheng Da; Peng Wang; Cong Yao; |
1162 | Multi-Granularity Prediction for Scene Text Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we first draw inspiration from the recent progress in Vision Transformer (ViT) to construct a conceptually simple yet powerful vision STR model, which is built upon ViT and outperforms previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods. |
Peng Wang; Cheng Da; Cong Yao; |
1163 | Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, the input scale has always been a tough trade-off since recognizing a small text instance usually requires enlarging the whole image, which brings high computational costs. In this paper, to address this problem, we propose a novel cost-efficient Dynamic Low-resolution Distillation (DLD) text spotting framework, which aims to infer images in different small but recognizable resolutions and achieve a better balance between accuracy and efficiency. |
Ying Chen; Liang Qiao; Zhanzhan Cheng; Shiliang Pu; Yi Niu; Xi Li; |
1164 | Contextual Text Block Detection Towards Scene Text Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents contextual text detection, a new setup that detects CTBs for better understanding of texts in scenes. |
Chuhui Xue; Jiaxing Huang; Wenqing Zhang; Shijian Lu; Changhu Wang; Song Bai; |
1165 | CoMER: Modeling Coverage for Transformer-Based Handwritten Mathematical Expression Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose CoMER, a model that adopts the coverage information in the transformer decoder. |
Wenqi Zhao; Liangcai Gao; |
1166 | Don’t Forget Me: Accurate Background Recovery for Text Removal Via Modeling Local-Global Context Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most of the existing methods often generate inconsistent results for complex background. To address this issue, we propose a Contextual-guided Text Removal Network, termed as CTRNet. |
Chongyu Liu; Lianwen Jin; Yuliang Liu; Canjie Luo; Bangdong Chen; Fengjun Guo; Kai Ding; |
1167 | TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In text recognition, we reveal another such shortcut, whereby recognizers overly depend on local image statistics. Motivated by this, we suggest an approach to regulate the reliance on local statistics that improves text recognition performance. |
Oren Nuriel; Sharon Fogel; Ron Litman; |
1168 | Multi-modal Text Recognition Networks: Interactive Enhancements Between Visual and Semantic Features Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper introduces a novel method, called Multi-modAl Text Recognition Network (MATRN), that enables interactions between visual and semantic features for better recognition performances. |
Byeonghu Na; Yoonsik Kim; Sungrae Park; |
1169 | SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel Semantic GAN and Balanced Attention Network (SGBANet) to recognize the texts in scene images. |
Dajian Zhong; Shujing Lyu; Palaiahnakote Shivakumara; Bing Yin; Jiajia Wu; Umapada Pal; Yue Lu; |
1170 | Pure Transformer with Integrated Experts for Scene Text Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work proposes the use of a transformer-only model as a simple baseline which outperforms hybrid CNN-transformer models. |
Yew Lee Tan; Adams Wai-Kin Kong; Jung-Jae Kim; |
1171 | OCR-Free Document Understanding Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of documents; and 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. |
Geewook Kim; Teakgyu Hong; Moonbin Yim; JeongYeon Nam; Jinyoung Park; Jinyeong Yim; Wonseok Hwang; Sangdoo Yun; Dongyoon Han; Seunghyun Park; |
1172 | CAR: Class-Aware Regularizations for Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, aiming to use class level information more effectively, we propose a universal Class-Aware Regularization (CAR) approach to optimize the intra-class variance and inter-class distance during feature learning, motivated by the fact that humans can recognize an object by itself no matter which other objects it appears with. |
Ye Huang; Di Kang; Liang Chen; Xuefei Zhe; Wenjing Jia; Linchao Bao; Xiangjian He; |
1173 | Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study the task of synthetic-to-real domain generalized semantic segmentation, which aims to learn a model that is robust to unseen real-world scenes using only synthetic data. |
Yuyang Zhao; Zhun Zhong; Na Zhao; Nicu Sebe; Gim Hee Lee; |
1174 | SeqFormer: Sequential Transformer for Video Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present SeqFormer for video instance segmentation. |
Junfeng Wu; Yi Jiang; Song Bai; Wenqing Zhang; Xiang Bai; |
1175 | Saliency Hierarchy Modeling Via Generative Kernels for Salient Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To alleviate the problem, we propose a Saliency Hierarchy Network (SHNet), modeling saliency patterns via generative kernels from two perspectives: region-level and sample-level. |
Wenhu Zhang; Liangli Zheng; Huanyu Wang; Xintian Wu; Xi Li; |
1176 | In Defense of Online Models for Video Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: By dissecting current online models and offline models, we demonstrate that the main cause of the performance gap is the error-prone association between frames caused by the similar appearance among different instances in the feature space. Observing this, we propose an online framework based on contrastive learning that is able to learn more discriminative instance embeddings for association and fully exploit history information for stability. |
Junfeng Wu; Qihao Liu; Yi Jiang; Song Bai; Alan Yuille; Xiang Bai; |
1177 | Active Pointly-Supervised Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present an economic active learning setting, named active pointly-supervised instance segmentation (APIS), which starts with box-level annotations and iteratively samples a point within the box and asks if it falls on the object. |
Chufeng Tang; Lingxi Xie; Gang Zhang; Xiaopeng Zhang; Qi Tian; Xiaolin Hu; |
1178 | A Transformer-Based Decoder for Semantic Segmentation with Multi-level Context Mining Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we find that a lightweight decoder matters for segmentation, and propose a pure transformer-based segmentation decoder, named SegDeformer, to seamlessly incorporate into current varied transformer-based encoders. |
Bowen Shi; Dongsheng Jiang; Xiaopeng Zhang; Han Li; Wenrui Dai; Junni Zou; Hongkai Xiong; Qi Tian; |
1179 | XMem: Long-Term Video Object Segmentation with An Atkinson-Shiffrin Memory Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. |
Ho Kei Cheng; Alexander G. Schwing; |
1180 | Self-Distillation for Robust LiDAR Semantic Segmentation in Autonomous Driving Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new and effective self-distillation framework with our new Test-Time Augmentation (TTA) and Transformer based Voxel Feature Encoder (TransVFE) for robust LiDAR semantic segmentation in autonomous driving, where the robustness is mission-critical but usually neglected. |
Jiale Li; Hang Dai; Yong Ding; |
1181 | 2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Thus, in this work, we propose the 2D Priors Assisted Semantic Segmentation (2DPASS), a general training scheme, to boost the representation learning on point clouds, by fully taking advantage of 2D images with rich appearance. |
Xu Yan; Jiantao Gao; Chaoda Zheng; Chao Zheng; Ruimao Zhang; Shuguang Cui; Zhen Li; |
1182 | Extract Free Dense Labels from CLIP Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we wish to examine the intrinsic potential of CLIP for pixel-level dense prediction, specifically in semantic segmentation. |
Chong Zhou; Chen Change Loy; Bo Dai; |
1183 | 3D Compositional Zero-Shot Learning with DeCompositional Consensus Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: As a solution, we propose DeCompositional Consensus, which combines a part segmentation network with a part scoring network. |
Muhammad Ferjad Naeem; Evin Pınar Örnek; Yongqin Xian; Luc Van Gool; Federico Tombari; |
1184 | Video Mask Transfiner for High-Quality Video Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we set out to tackle these issues, with the aim of achieving highly detailed and more temporally stable mask predictions for VIS. |
Lei Ke; Henghui Ding; Martin Danelljan; Yu-Wing Tai; Chi-Keung Tang; Fisher Yu; |
1185 | Box-Supervised Instance Segmentation with Level Set Evolution Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel single-shot box-supervised instance segmentation approach, which integrates the classical level set model with deep neural network delicately. |
Wentong Li; Wenyu Liu; Jianke Zhu; Miaomiao Cui; Xian-Sheng Hua; Lei Zhang; |
1186 | Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a 4D backbone for long-term point cloud video understanding. |
Hao Wen; Yunze Liu; Jingwei Huang; Bo Duan; Li Yi; |
1187 | Adaptive Agent Transformer for Few-Shot Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a novel end-to-end adaptive agent transformer (AAFormer) to integrate prototypical and affinity learning to exploit the complementarity between them via a transformer encoder-decoder architecture, including a representation encoder, an agent learning decoder and an agent matching decoder. |
Yuan Wang; Rui Sun; Zhe Zhang; Tianzhu Zhang; |
1188 | Waymo Open Dataset: Panoramic Video Panoptic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We therefore present the Waymo Open Dataset: Panoramic Video Panoptic Segmentation dataset, a large-scale dataset that offers high-quality panoptic segmentation labels for autonomous driving. |
Jieru Mei; Alex Zihao Zhu; Xinchen Yan; Hang Yan; Siyuan Qiao; Yukun Zhu; Liang-Chieh Chen; Henrik Kretzschmar; |
1189 | TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast, we propose the first top-down unsupervised semantic segmentation framework for fine-grained segmentation in extremely complicated scenarios. |
Zhaoyuan Yin; Pichao Wang; Fan Wang; Xianzhe Xu; Hanling Zhang; Hao Li; Rong Jin; |
1190 | AdaAfford: Learning to Adapt Manipulation Affordance for 3D Articulated Objects Via Few-Shot Interactions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel framework, named AdaAfford, that learns to perform very few test-time interactions for quickly adapting the affordance priors to more accurate instance-specific posteriors. We will release our code and data upon paper acceptance. |
Yian Wang; Ruihai Wu; Kaichun Mo; Jiaqi Ke; Qingnan Fan; Leonidas J. Guibas; Hao Dong; |
1191 | Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel cost aggregation network, called Volumetric Aggregation with Transformers (VAT), for few-shot segmentation. |
Sunghwan Hong; Seokju Cho; Jisu Nam; Stephen Lin; Seungryong Kim; |
1192 | Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a context-aware compositional data augmentation technique to adapt to out-of-the-distribution YouTube egocentric video. |
Lingzhi Zhang; Shenghao Zhou; Simon Stent; Jianbo Shi; |
1193 | Perceptual Artifacts Localization for Inpainting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by this workflow, we propose a new learning task of automatic segmentation of inpainting perceptual artifacts, and apply the model for inpainting model evaluation and iterative refinement. |
Lingzhi Zhang; Yuqian Zhou; Connelly Barnes; Sohrab Amirghodsi; Zhe Lin; Eli Shechtman; Jianbo Shi; |
1194 | 2D Amodal Instance Segmentation Guided By 3D Shape Prior Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper builds a bridge to link the 2D occluded instances with the 3D complete models by 3D reconstruction and utilizes 3D shape prior for 2D AIS. |
Zhixuan Li; Weining Ye; Tingting Jiang; Tiejun Huang; |
1195 | Data Efficient 3D Learner Via Knowledge Transferred from 2D Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we deal with the data scarcity challenge of 3D tasks by transferring knowledge from strong 2D models via abundant RGB-D images. |
Ping-Chung Yu; Cheng Sun; Min Sun; |
1196 | Adaptive Spatial-BCE Loss for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an adaptive Spatial Binary Cross-Entropy (Spatial-BCE) Loss for WSSS, which aims to enhance the discrimination between pixels. |
Tong Wu; Guangyu Gao; Junshi Huang; Xiaolin Wei; Xiaoming Wei; Chi Harold Liu; |
1197 | Dense Gaussian Processes for Few-Shot Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a few-shot segmentation method based on dense Gaussian process (GP) regression. |
Joakim Johnander; Johan Edstedt; Michael Felsberg; Fahad Shahbaz Khan; Martin Danelljan; |
1198 | 3D Instances As 1D Kernels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a 3D instance representation, termed instance kernels, where instances are represented by one-dimensional vectors that encode the semantic, positional, and shape information of 3D instances. |
Yizheng Wu; Min Shi; Shuaiyuan Du; Hao Lu; Zhiguo Cao; Weicai Zhong; |
1199 | TransMatting: Enhancing Transparent Objects Matting with Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a Transformer-based network, TransMatting, to model transparent objects with a big receptive field. |
Huanqia Cai; Fanglei Xue; Lele Xu; Lili Guo; |
1200 | MVSalNet: Multi-View Augmentation for RGB-D Salient Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the geometry information conveyed by depth maps is mostly under-explored in existing RGB-D SOD methods. In this paper, we propose a new framework to address this issue. |
Jiayuan Zhou; Lijun Wang; Huchuan Lu; Kaining Huang; Xinchu Shi; Bocong Liu; |
1201 | k-Means Mask Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we rethink the relationship between pixels and object queries and propose to reformulate the cross-attention learning as a clustering process. |
Qihang Yu; Huiyu Wang; Siyuan Qiao; Maxwell Collins; Yukun Zhu; Hartwig Adam; Alan Yuille; Liang-Chieh Chen; |
1202 | SegPGD: An Effective and Efficient Adversarial Attack for Evaluating and Boosting Segmentation Robustness Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose an effective and efficient segmentation attack method, dubbed SegPGD. |
Jindong Gu; Hengshuang Zhao; Volker Tresp; Philip H. S. Torr; |
1203 | Adversarial Erasing Framework Via Triplet with Gated Pyramid Pooling Layer for Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To resolve the problems, we propose a Gated Pyramid Pooling (GPP) layer which is a substitute for a Global Average Pooling (GAP) layer, and an Adversarial Erasing Framework via Triplet (AEFT). |
Sung-Hoon Yoon; Hyeokjun Kweon; Jegyeong Cho; Shinjeong Kim; Kuk-Jin Yoon; |
1204 | Continual Semantic Segmentation Via Structure Preserving and Projected Feature Alignment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Deep networks have been shown to suffer from catastrophic forgetting. In this work, we try to alleviate this phenomenon in the field of continual semantic segmentation (CSS). |
Zihan Lin; Zilei Wang; Yixin Zhang; |
1205 | Interclass Prototype Relation for Few-Shot Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This study proposes the Interclass Prototype Relation Network (IPRNet), which improves the separation performance by reducing the similarity between other classes. |
Atsuro Okazawa; |
1206 | Slim Scissors: Segmenting Thin Object from Synthetic Background Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Our core idea is to segment thin parts by learning to compare the original image to a synthesized background without thin structures. |
Kunyang Han; Jun Hao Liew; Jiashi Feng; Huawei Tian; Yao Zhao; Yunchao Wei; |
1207 | Abstracting Sketches Through Simple Primitives Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Toward equipping machines with such capabilities, we propose the Primitive-based Sketch Abstraction task where the goal is to represent sketches using a fixed set of drawing primitives under the influence of a budget. |
Stephan Alaniz; Massimiliano Mancini; Anjan Dutta; Diego Marcos; Zeynep Akata; |
1208 | Multi-Scale and Cross-Scale Contrastive Learning for Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our key methodological insight is to leverage samples from the feature spaces emanating from multiple stages of a model’s encoder itself, requiring neither data augmentation nor online memory banks to obtain a diverse set of samples. |
Theodoros Pissas; Claudio S. Ravasio; Lyndon Da Cruz; Christos Bergeles; |
1209 | One-Trimap Video Matting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Recent studies made great progress in video matting by extending the success of trimap-based image matting to the video domain. In this paper, we push this task toward a more practical setting and propose One-Trimap Video Matting network (OTVM) that performs video matting robustly using only one user-annotated trimap. |
Hongje Seong; Seoung Wug Oh; Brian Price; Euntai Kim; Joon-Young Lee; |
1210 | D2ADA: Dynamic Density-Aware Active Domain Adaptation for Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we present D2ADA, a general active domain adaptation framework for semantic segmentation. |
Tsung-Han Wu; Yi-Syuan Liou; Shao-Ji Yuan; Hsin-Ying Lee; Tung-I Chen; Kuan-Chih Huang; Winston H. Hsu; |
1211 | Learning Quality-Aware Dynamic Memory for Video Object Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a Quality-aware Dynamic Memory Network (QDMN) to evaluate the segmentation quality of each frame, allowing the memory bank to selectively store accurately segmented frames to prevent the error accumulation problem. |
Yong Liu; Ran Yu; Fei Yin; Xinyuan Zhao; Wei Zhao; Weihao Xia; Yujiu Yang; |
1212 | Learning Implicit Feature Alignment Function for Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, bilinear up-sampling blurs the precise information learned in these feature maps and convolutions incur extra computation costs. To address these issues, we propose the Implicit Feature Alignment function (IFA). |
Hanzhe Hu; Yinbo Chen; Jiarui Xu; Shubhankar Borse; Hong Cai; Fatih Porikli; Xiaolong Wang; |
1213 | Quantum Motion Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper introduces the first algorithm for motion segmentation that relies on adiabatic quantum optimization of the objective function. |
Federica Arrigoni; Willi Menapace; Marcel Seelbach Benkner; Elisa Ricci; Vladislav Golyanik; |
1214 | Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a new online VIS paradigm named Instance As Identity (IAI), which models temporal information for both detection and tracking in an efficient way. |
Feng Zhu; Zongxin Yang; Xin Yu; Yi Yang; Yunchao Wei; |
1215 | Laplacian Mesh Transformer: Dual Attention and Topology Aware Network for 3D Mesh Classification and Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Despite the great progress that has been made, the existing approaches fail to efficiently capture sophisticated structure information and critical part features simultaneously, limiting their capability of providing discriminative deep shape features. To address the above issue, we propose a novel deep learning framework, Laplacian Mesh Transformer, to extract the critical structure and geometry features. |
Xiao-Juan Li; Jie Yang; Fang-Lue Zhang; |
1216 | Geodesic-Former: A Geodesic-Guided Few-Shot 3D Point Cloud Instance Segmenter Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper introduces a new problem in 3D point cloud: few-shot instance segmentation. |
Tuan Ngo; Khoi Nguyen; |
1217 | Union-Set Multi-source Model Adaptation for Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: For the new setting named union-set multi-source model adaptation, we propose a method with a novel learning strategy named model-invariant feature learning, which takes full advantage of the diverse characteristics of the source-domain models, thereby improving the generalization in the target domain. |
Zongyao Li; Ren Togo; Takahiro Ogawa; Miki Haseyama; |
1218 | Point MixSwap: Attentional Point Cloud Mixing Via Swapping Matched Structural Divisions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a 3D augmentation method that explores the structural variance across multiple point clouds, and generates more diverse point clouds to enrich the training set. |
Ardian Umam; Cheng-Kun Yang; Yung-Yu Chuang; Jen-Hui Chuang; Yen-Yu Lin; |
1219 | BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel Bilateral Attention Transformer in Motion-Appearance Neighboring space (BATMAN) for semi-supervised VOS. |
Ye Yu; Jialin Yuan; Gaurav Mittal; Li Fuxin; Mei Chen; |
1220 | SPSN: Superpixel Prototype Sampling Network for RGB-D Salient Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, despite advances in deep learning-based methods, RGB-D SOD is still challenging due to the large domain gap between an RGB image and the depth map and low-quality depth maps. To solve this problem, we propose a novel superpixel prototype sampling network (SPSN) architecture. |
Minhyeok Lee; Chaewon Park; Suhwan Cho; Sangyoun Lee; |
1221 | Global Spectral Filter Memory Network for Video Object Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose Global Spectral Filter Memory network (GSFM), which improves intra-frame interaction through learning long-term spatial dependencies in the spectral domain. |
Yong Liu; Ran Yu; Jiahao Wang; Xinyuan Zhao; Yitong Wang; Yansong Tang; Yujiu Yang; |
1222 | Video Instance Segmentation Via Multi-Scale Spatio-Temporal Split Attention Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We argue that such an attention computation ignores the multi-scale spatio-temporal feature relationships that are crucial to tackle target appearance deformations in videos. To address this issue, we propose a transformer-based VIS framework, named MS-STS VIS, that comprises a novel multi-scale spatio-temporal split (MS-STS) attention module in the encoder. |
Omkar Thawakar; Sanath Narayan; Jiale Cao; Hisham Cholakkal; Rao Muhammad Anwer; Muhammad Haris Khan; Salman Khan; Michael Felsberg; Fahad Shahbaz Khan; |
1223 | RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: On the other hand, in a typical image or video, only a few categories, i.e., a small subset of the complete label set, are present. Motivated by this intuition, in this paper, we propose to decompose segmentation into two sub-problems: (i) image-level or video-level multi-label classification and (ii) pixel-level rank-adaptive selected-label classification. |
Haodi He; Yuhui Yuan; Xiangyu Yue; Han Hu; |
1224 | Learning Topological Interactions for Multi-Class Medical Image Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a novel topological interaction module to encode the topological interactions into a deep neural network. |
Saumya Gupta; Xiaoling Hu; James Kaan; Michael Jin; Mutshipay Mpoy; Katherine Chung; Gagandeep Singh; Mary Saltz; Tahsin Kurc; Joel Saltz; Apostolos Tassiopoulos; Prateek Prasanna; Chao Chen; |
1225 | Unsupervised Segmentation in Real-World Images Via Spelke Object Inference Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce the Excitatory-Inhibitory Segment Extraction Network (EISEN), which learns to extract pairwise affinity graphs for static scenes from motion-based training signals. |
Honglin Chen; Rahul Venkatesh; Yoni Friedman; Jiajun Wu; Joshua B. Tenenbaum; Daniel L. K. Yamins; Daniel M. Bear; |
1226 | A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-Language Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper targets open-vocabulary semantic segmentation by building it on an off-the-shelf pre-trained vision-language model, i.e., CLIP. |
Mengde Xu; Zheng Zhang; Fangyun Wei; Yutong Lin; Yue Cao; Han Hu; Xiang Bai; |
1227 | Fast Two-View Motion Segmentation Using Christoffel Polynomials Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we propose a fast segmentation algorithm that scales linearly with the number of correspondences and show that on benchmark datasets it offers the best trade-off between error and computational time: it is at least one order of magnitude faster than the best method (with comparable or better accuracy), with the ratio growing up to three orders of magnitude for larger number of correspondences. |
Bengisu Ozbay; Octavia Camps; Mario Sznaier; |
1228 | UCTNet: Uncertainty-Aware Cross-Modal Transformer Network for Indoor RGB-D Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we tackle the problem of RGB-D Semantic Segmentation. |
Xiaowen Ying; Mooi Choo Chuah; |
1229 | Bi-directional Contrastive Learning for Domain Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel unsupervised domain adaptation method for semantic segmentation that generalizes a model trained with source images and corresponding ground-truth labels to a target domain. |
Geon Lee; Chanho Eom; Wonkyung Lee; Hyekang Park; Bumsub Ham; |
1230 | Learning Regional Purity for Instance Segmentation on 3D Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we define a novel concept of “regional purity” as the percentage of neighboring points belonging to the same instance within a fixed-radius 3D space. |
Shichao Dong; Guosheng Lin; Tzu-Yi Hung; |
1231 | Cross-Domain Few-Shot Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we extend few-shot semantic segmentation to a new task, called Cross-Domain Few-Shot Semantic Segmentation (CD-FSS), which aims to generalize the meta-knowledge from domains with sufficient training labels to low-resource domains. |
Shuo Lei; Xuchao Zhang; Jianfeng He; Fanglan Chen; Bowen Du; Chang-Tien Lu; |
1232 | Generative Subgraph Contrast for Self-Supervised Graph Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, in this paper, we propose a novel adaptive subgraph generation based contrastive learning framework for efficient and robust self-supervised graph representation learning, and the optimal transport distance is utilized as the similarity metric between the subgraphs. |
Yuehui Han; Le Hui; Haobo Jiang; Jianjun Qian; Jin Xie; |
1233 | SdAE: Self-Distillated Masked Autoencoder Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: MAE does not require a pre-training codebook process, but setting pixels as reconstruction targets may introduce an optimization gap between pre-training and downstream tasks, in that good reconstruction quality may not always lead to high descriptive capability of the model. Considering the above issues, in this paper, we propose a simple Self-distillated masked AutoEncoder network, namely SdAE. |
Yabo Chen; Yuchen Liu; Dongsheng Jiang; Xiaopeng Zhang; Wenrui Dai; Hongkai Xiong; Qi Tian; |
1234 | Demystifying Unsupervised Semantic Correspondence Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We thoroughly evaluate several recently proposed unsupervised methods across multiple challenging datasets using a standardized evaluation protocol where we vary factors such as the backbone architecture, the pre-training strategy, and the pre-training and finetuning datasets. To better understand the failure modes of these methods, and in order to provide a clearer path for improvement, we provide a new diagnostic framework along with a new performance metric that is better suited to the semantic matching task. |
Mehmet Aygün; Oisin Mac Aodha; |
1235 | Open-Set Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we consider a more practical yet challenging problem, Open-Set Semi-Supervised Object Detection (OSSOD). |
Yen-Cheng Liu; Chih-Yao Ma; Xiaoliang Dai; Junjiao Tian; Peter Vajda; Zijian He; Zsolt Kira; |
1236 | Vibration-Based Uncertainty Estimation for Learning from Limited Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a novel approach that measures uncertainty from the vibration of sequential data, e.g., the output probability during the training procedure. |
Hengtong Hu; Lingxi Xie; Xinyue Huo; Richang Hong; Qi Tian; |
1237 | Concurrent Subsidiary Supervision for Unsupervised Source-Free Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Thus, we theoretically analyze how and when a subsidiary pretext task could be leveraged to assist the goal task of a given DA problem and develop objective subsidiary task suitability criteria. Based on these criteria, we devise a novel process of sticker intervention and cast sticker classification as a supervised subsidiary DA problem concurrent to the goal task unsupervised DA. |
Jogendra Nath Kundu; Suvaansh Bhambri; Akshay Kulkarni; Hiran Sarkar; Varun Jampani; R. Venkatesh Babu; |
1238 | Weakly Supervised Object Localization Through Inter-class Feature Similarity and Intra-Class Appearance Consistency Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a simple but effective WSOL model (named ISIC) through Inter-class feature Similarity and Intra-class appearance Consistency. |
Jun Wei; Sheng Wang; S. Kevin Zhou; Shuguang Cui; Zhen Li; |
1239 | Active Learning Strategies for Weakly-Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, there is still a significant performance gap between them. We propose to narrow this gap by fine-tuning a base pre-trained weakly-supervised detector with a few fully-annotated samples automatically selected from the training set using “box-in-box” (BiB), a novel active learning strategy designed specifically to address the well-documented failure modes of weakly-supervised detectors. |
Huy V. Vo; Oriane Siméoni; Spyros Gidaris; Andrei Bursuc; Patrick Pérez; Jean Ponce; |
1240 | Mc-BEiT: Multi-Choice Discretization for Image BERT Pre-training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce an improved BERT-style image pre-training method, namely mc-BEiT, which performs MIM proxy tasks towards eased and refined multi-choice training objectives. |
Xiaotong Li; Yixiao Ge; Kun Yi; Zixuan Hu; Ying Shan; Ling-Yu Duan; |
1241 | Bootstrapped Masked Autoencoders for Vision BERT Pretraining Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose bootstrapped masked autoencoders (BootMAE), a new approach for vision BERT pretraining. |
Xiaoyi Dong; Jianmin Bao; Ting Zhang; Dongdong Chen; Weiming Zhang; Lu Yuan; Dong Chen; Fang Wen; Nenghai Yu; |
1242 | Unsupervised Visual Representation Learning By Synchronous Momentum Grouping Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a genuine group-level contrastive visual representation learning method whose linear evaluation performance on ImageNet surpasses the vanilla supervised learning. |
Bo Pang; Yifan Zhang; Yaoyi Li; Jia Cai; Cewu Lu; |
1243 | Improving Few-Shot Part Segmentation Using Coarse Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a framework to exploit coarse labels such as figure-ground masks and keypoint locations that are readily available for some categories to improve part segmentation models. |
Oindrila Saha; Zezhou Cheng; Subhransu Maji; |
1244 | What to Hide from Your Students: Attention-Guided Masked Image Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we argue that image token masking differs from token masking in text, due to the amount and correlation of tokens in an image. |
Ioannis Kakogeorgiou; Spyros Gidaris; Bill Psomas; Yannis Avrithis; Andrei Bursuc; Konstantinos Karantzalos; Nikos Komodakis; |
1245 | Pointly-Supervised Panoptic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new approach to applying point-level annotations for weakly-supervised panoptic segmentation. |
Junsong Fan; Zhaoxiang Zhang; Tieniu Tan; |
1246 | MVP: Multimodality-Guided Visual Pre-training Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In the context of vision transformers, MIM learns effective visual representation by aligning the token-level features with a pre-defined space (e.g., BEiT used a d-VAE trained on a large image corpus as the tokenizer). In this paper, we go one step further by introducing guidance from other modalities and validating that such additional knowledge leads to impressive gains for visual pre-training. |
Longhui Wei; Lingxi Xie; Wengang Zhou; Houqiang Li; Qi Tian; |
1247 | Locally Varying Distance Transform for Unsupervised Visual Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a new embedding using a set of locally varying data projections, with each projection responsible for preserving the variations that distinguish a local cluster of instances from all other instances. |
Wen-Yan Lin; Zhonghang Liu; Siying Liu; |
1248 | HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The alternative of training with random crops of high-resolution images alleviates this problem but falls short in capturing long-range, domain-robust context information. Therefore, we propose HRDA, a multi-resolution training approach for UDA, that combines the strengths of small high-resolution crops to preserve fine segmentation details and large low-resolution crops to capture long-range context dependencies with a learned scale attention, while maintaining a manageable GPU memory footprint. |
Lukas Hoyer; Dengxin Dai; Luc Van Gool; |
1249 | SPot-the-Difference Self-Supervised Pre-training for Anomaly Detection and Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a new dataset as well as a new self-supervised learning method for ImageNet pre-training to improve anomaly detection and segmentation in 1-class and 2-class 5/10/high-shot training setups. |
Yang Zou; Jongheon Jeong; Latha Pemula; Dongqing Zhang; Onkar Dabeer; |
1250 | Dual-Domain Self-Supervised Learning and Model Adaption for Deep Compressive Imaging Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Aiming at addressing the limitations of supervised deep learning-based methods caused by their prerequisite on the ground truths of latent images, this paper proposes an unsupervised approach that trains a deep image reconstruction model using only a set of compressive measurements. |
Yuhui Quan; Xinran Qin; Tongyao Pang; Hui Ji; |
1251 | Unsupervised Selective Labeling for More Effective Semi-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Given an unlabeled dataset and an annotation budget, we study how to selectively label a fixed number of instances so that semi-supervised learning (SSL) on such a partially labeled dataset is most effective. |
Xudong Wang; Long Lian; Stella X. Yu; |
1252 | Max Pooling with Vision Transformers Reconciles Class and Shape in Weakly Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work proposes a new WSSS method dubbed ViT-PCM (ViT Patch-Class Mapping), not based on CAM. |
Simone Rossetti; Damiano Zappia; Marta Sanzari; Marco Schaerf; Fiora Pirri; |
1253 | Dense Siamese Network for Dense Unsupervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents Dense Siamese Network (DenseSiam), a simple unsupervised learning framework for dense prediction tasks. |
Wenwei Zhang; Jiangmiao Pang; Kai Chen; Chen Change Loy; |
1254 | Multi-Granularity Distillation Scheme Towards Lightweight Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Consequently, we offer the first attempt to provide lightweight SSSS models via a novel multi-granularity distillation (MGD) scheme, where multi-granularity is captured from three aspects: i) complementary teacher structure; ii) labeled-unlabeled data cooperative distillation; iii) hierarchical and multi-levels loss setting. |
Jie Qin; Jie Wu; Ming Li; Xuefeng Xiao; Min Zheng; Xingang Wang; |
1255 | CP2: Copy-Paste Contrastive Pretraining for Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a pixel-wise contrastive learning method called CP2 (Copy-Paste Contrastive Pretraining), which facilitates both image- and pixel-level representation learning and therefore is more suitable for downstream dense prediction tasks. |
Feng Wang; Huiyu Wang; Chen Wei; Alan Yuille; Wei Shen; |
1256 | Self-Filtering: A Noise-Aware Sample Selection for Label Noise with Confidence Penalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel selection strategy, Self-Filtering (SFT), that utilizes the fluctuation of noisy examples in historical predictions to filter them, which can avoid the selection bias of the small-loss criterion for the boundary examples. |
Qi Wei; Haoliang Sun; Xiankai Lu; Yilong Yin; |
1257 | RDA: Reciprocal Distribution Alignment for Robust Semi-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose Reciprocal Distribution Alignment (RDA) to address semi-supervised learning (SSL), which is a hyperparameter-free framework that is independent of confidence threshold and works with both the matched (conventionally) and the mismatched class distributions. |
Yue Duan; Lei Qi; Lei Wang; Luping Zhou; Yinghuan Shi; |
1258 | MemSAC: Memory Augmented Sample Consistency for Large Scale Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work we propose MemSAC, which exploits sample level similarity across source and target domains to achieve discriminative transfer, along with architectures that scale to a large number of categories. |
Tarun Kalluri; Astuti Sharma; Manmohan Chandraker; |
1259 | United Defocus Blur Detection and Deblurring Via Adversarial Promoting Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper makes the earliest effort to jointly learn both defocus detection and deblurring without using pixel-level defocus detection annotation and paired defocus deblurring ground truth. |
Wenda Zhao; Fei Wei; You He; Huchuan Lu; |
1260 | Synergistic Self-Supervised and Quantization Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a method called synergistic self-supervised and quantization learning (SSQL) to pretrain quantization-friendly self-supervised models facilitating downstream deployment. |
Yun-Hao Cao; Peiqin Sun; Yechang Huang; Jianxin Wu; Shuchang Zhou; |
1261 | Semi-Supervised Vision Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Surprisingly, we show Vision Transformers perform significantly worse than Convolutional Neural Networks when only a small set of labeled data is available. Inspired by this observation, we introduce a joint semi-supervised learning framework, Semiformer, which contains a transformer stream, a convolutional stream and a carefully designed fusion module for knowledge sharing between these streams. |
Zejia Weng; Xitong Yang; Ang Li; Zuxuan Wu; Yu-Gang Jiang; |
1262 | Domain Adaptive Video Segmentation Via Temporal Pseudo Supervision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We design temporal pseudo supervision (TPS), a simple and effective method that explores the idea of consistency training for learning effective representations from unlabelled target videos. |
Yun Xing; Dayan Guan; Jiaxing Huang; Shijian Lu; |
1263 | Diverse Learner: Exploring Diverse Supervision for Semi-Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Although the performance is desirable, many remaining issues still need to be resolved, for example: (1) the teacher updated by the student using EMA tends to lose its distinctiveness and hence generates predictions similar to the student's, causing potential noise accumulation as training proceeds; (2) the exploitation of pseudo labels still has much room for improvement. We present a diverse learner semi-supervised object detection framework to tackle these issues. |
Linfeng Li; Minyue Jiang; Yue Yu; Wei Zhang; Xiangru Lin; Yingying Li; Xiao Tan; Jingdong Wang; Errui Ding; |
1264 | A Closer Look at Invariances in Self-Supervised Pre-training for 3D Vision Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although they have achieved promising results, previous researches lack a systematic and fair comparison of these invariances. To address this issue, our work, for the first time, introduces a unified framework, into which previous works fit. |
Lanxiao Li; Michael Heizmann; |
1265 | ConMatch: Semi-Supervised Learning with Confidence-Guided Consistency Regularization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel semi-supervised learning framework that intelligently leverages the consistency regularization between the model’s predictions from two strongly-augmented views of an image, weighted by a confidence of pseudo-label, dubbed ConMatch. |
Jiwon Kim; Youngjo Min; Daehwan Kim; Gyuseong Lee; Junyoung Seo; Kwangrok Ryoo; Seungryong Kim; |
1266 | FedX: Unsupervised Federated Learning with Cross Knowledge Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents FedX, an unsupervised federated learning framework. |
Sungwon Han; Sungwon Park; Fangzhao Wu; Sundong Kim; Chuhan Wu; Xing Xie; Meeyoung Cha; |
1267 | W2N: Switching from Weak Supervision to Noisy Supervision for Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, these approaches simply divide the training set into labeled and unlabeled sets according to the image-level criteria, such that sufficient mislabeled or wrongly localized box predictions are chosen as pseudo ground-truths, resulting in a sub-optimal solution of detection performance. To overcome this issue, we propose a novel WSOD framework with a new paradigm that switches from weak supervision to noisy supervision (W2N). |
Zitong Huang; Yiping Bao; Bowen Dong; Erjin Zhou; Wangmeng Zuo; |
1268 | Decoupled Adversarial Contrastive Learning for Self-Supervised Adversarial Robustness Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This motivation shifts the focus of the task from seeking an optimal integrating strategy for a coupled problem to finding sub-solutions for sub-problems. With this said, this work discards prior practices of directly introducing AT to SSL frameworks and proposes a two-stage framework termed Decoupled Adversarial Contrastive Learning (DeACL). |
Chaoning Zhang; Kang Zhang; Chenshuang Zhang; Axi Niu; Jiu Feng; Chang D. Yoo; In So Kweon; |
1269 | GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a principled way to combine two modalities. |
Huseyin Coskun; Alireza Zareian; Joshua L. Moore; Federico Tombari; Chen Wang; |
1270 | Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We generalize the mean-shift idea by constraining the search space of NNs using another source of knowledge so that NNs are far from the query while still being semantically related. |
K L Navaneet; Soroush Abbasi Koohpayegani; Ajinkya Tejankar; Kossar Pourahmadi; Akshayvarun Subramanya; Hamed Pirsiavash; |
1271 | Revisiting The Critical Factors of Augmentation-Invariant Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We focus on better understanding the critical factors of augmentation-invariant representation learning. |
Junqiang Huang; Xiangwen Kong; Xiangyu Zhang; |
1272 | CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a novel Class-Agnostic Semi-Supervised Learning (CA-SSL) framework to achieve a more favorable task-specificity balance in extracting training signals from unlabeled data. |
Lu Qi; Jason Kuen; Zhe Lin; Jiuxiang Gu; Fengyun Rao; Dian Li; Weidong Guo; Zhen Wen; Ming-Hsuan Yang; Jiaya Jia; |
1273 | Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, we propose a novel DAT (Dual Adaptive Transformations) model for weakly supervised point cloud segmentation, where the dual adaptive transformations are performed via an adversarial strategy at both point-level and region-level, aiming at enforcing the local and structural smoothness constraints on 3D point clouds. |
Zhonghua Wu; Yicheng Wu; Guosheng Lin; Jianfei Cai; Chen Qian; |
1274 | Semantic-Aware Fine-Grained Correspondence Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In contrast, human vision is capable of distinguishing between distinct objects as a pretext to tracking. Inspired by this paradigm, we propose to learn semantic-aware fine-grained correspondence. |
Yingdong Hu; Renhao Wang; Kaifeng Zhang; Yang Gao; |
1275 | Self-Supervised Classification Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Self-Classifier — a novel self-supervised end-to-end classification learning approach. |
Elad Amrani; Leonid Karlinsky; Alex Bronstein; |
1276 | Data Invariants to Understand Unsupervised Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: By performing a large-scale evaluation on different benchmarks and image modalities, we show in this work that most popular state-of-the-art methods are unable to consistently outperform a simple anomaly detector based on pre-trained features and the Mahalanobis distance (MahaAD). |
Lars Doorenbos; Raphael Sznitman; Pablo Márquez-Neila; |
1277 | Domain Invariant Masked Autoencoders for Self-Supervised Learning from Multi-Domains Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a Domain-invariant Masked AutoEncoder (DiMAE) for self-supervised learning from multi-domains, which designs a new pretext task, i.e., the cross-domain reconstruction task, to learn domain-invariant features. |
Haiyang Yang; Shixiang Tang; Meilin Chen; Yizhou Wang; Feng Zhu; Lei Bai; Rui Zhao; Wanli Ouyang; |
1278 | Semi-Supervised Object Detection Via Virtual Category Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, handling confusing samples is nontrivial: discarding valuable confusing samples would compromise the model generalisation while using them for training would exacerbate the confirmation bias issue caused by inevitable mislabelling. To solve this problem, this paper proposes to use confusing samples proactively without label correction. |
Changrui Chen; Kurt Debattista; Jungong Han; |
1279 | Completely Self-Supervised Crowd Counting Via Distribution Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Though existing self-supervised approaches could learn good representations, they require some labeled data to map these features to the end task of density estimation. We mitigate this issue with the proposed paradigm of complete self-supervision, which does not need even a single labeled image. |
Deepak Babu Sam; Abhinav Agarwalla; Jimmy Joseph; Vishwanath A. Sindagi; R. Venkatesh Babu; Vishal M. Patel; |
1280 | Coarse-to-Fine Incremental Few-Shot Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Such images form a new training set (i.e., support set) so that the incremental model is expected to recognize a basenji (i.e., query) as a basenji next time. This paper formulates such a hybrid natural problem of coarse-to-fine few-shot (C2FS) recognition as a CIL problem named C2FSCIL, and proposes a simple, effective, and theoretically-sound strategy Knowe: to learn, normalize, and freeze a classifier’s weights from fine labels, once learning an embedding space contrastively from coarse labels. |
Xiang Xiang; Yuwen Tan; Qian Wan; Jing Ma; Alan Yuille; Gregory D. Hager; |
1281 | Learning Unbiased Transferability for Domain Adaptation By Uncertainty Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, due to the significant imbalance between the amount of annotated data in the source and target domains, usually only the target distribution is aligned to the source domain, leading to adapting unnecessary source specific knowledge to the target domain, i.e., biased domain adaptation. To resolve this problem, in this work, we delve into the transferability estimation problem in domain adaptation, proposing a non-intrusive Unbiased Transferability Estimation Plug-in (UTEP) by modeling the uncertainty of a discriminator in adversarial-based DA methods to optimize unbiased transfer. |
Jian Hu; Haowen Zhong; Fei Yang; Shaogang Gong; Guile Wu; Junchi Yan; |
1282 | Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose to learn what makes a “good” video for action recognition and select only high-quality samples for augmentation. |
Shreyank N Gowda; Marcus Rohrbach; Frank Keller; Laura Sevilla-Lara; |
1283 | CYBORGS: Contrastively Bootstrapping Object Representations By Grounding in Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Previous works use preprocessing pipelines to localize salient objects for improved cropping, but an end-to-end solution is still elusive. In this work, we propose a framework which accomplishes this goal via joint learning of representations and segmentation. |
Renhao Wang; Hang Zhao; Yang Gao; |
1284 | PSS: Progressive Sample Selection for Open-World Visual Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a novel progressive approach which, at each iteration, selects unlabeled samples that attain a high homogeneity while belonging to classes that are distant to the current set of known classes in the feature space. |
Tianyue Cao; Yongxin Wang; Yifan Xing; Tianjun Xiao; Tong He; Zheng Zhang; Hao Zhou; Joseph Tighe; |
1285 | Improving Self-Supervised Lightweight Model Learning Via Hard-Aware Metric Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Existing SSL methods suffer a precipitous drop in lightweight models, which is important for many mobile devices. To address this problem, we propose a method to improve the lightweight network (as student) via distilling the metric knowledge in a larger SSL model (as teacher). |
Hao Liu; Mang Ye; |
1286 | Object Discovery Via Contrastive Learning for Weakly Supervised Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Current state-of-the-art models benefit from self-supervised instance-level supervision, but since weak supervision does not include count or location information, the most common “argmax” labeling method often ignores many instances of objects. To alleviate this issue, we propose a novel multiple instance labeling method called object discovery. |
Jinhwan Seo; Wonho Bae; Danica J. Sutherland; Junhyug Noh; Daijin Kim; |
1287 | Stochastic Consensus: Enhancing Semi-Supervised Learning with Consistency of Stochastic Classifiers Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Semi-supervised learning (SSL) has achieved new progress recently with the emerging framework of self-training deep networks, where the criteria for selection of unlabeled samples with pseudo labels play a key role in the empirical success. In this work, we propose such a new criterion based on consistency among multiple, stochastic classifiers, termed Stochastic Consensus (STOCO). |
Hui Tang; Lin Sun; Kui Jia; |
1288 | DiffuseMorph: Unsupervised Deformable Image Registration Using Diffusion Model Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Although deep-learning-based methods have been developed for fast image registration, it is still challenging to obtain realistic continuous deformations from a moving image to a fixed image with less topological folding problem. To address this, here we present a novel diffusion-model-based image registration method, called DiffuseMorph. |
Boah Kim; Inhwa Han; Jong Chul Ye; |
1289 | Semi-Leak: Membership Inference Attacks Against Semi-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we take a different angle by studying the training data privacy of SSL. |
Xinlei He; Hongbin Liu; Neil Zhenqiang Gong; Yang Zhang; |
1290 | OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, this is hardly the case in many real-world scenarios, which limits their applicability. In this work, instead, we attempt to solve the challenging open-world SSL problem that does not make such an assumption. |
Mamshad Nayeem Rizve; Navid Kardan; Salman Khan; Fahad Shahbaz Khan; Mubarak Shah; |
1291 | Embedding Contrastive Unsupervised Features to Cluster In- and Out-of-Distribution Noise in Corrupted Image Datasets Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The latter are, in practice, the dominant type of noisy images retrieved. To tackle this noise duality, we propose a two stage algorithm starting with a detection step where we use unsupervised contrastive feature learning to represent images in a feature space. |
Paul Albert; Eric Arazo; Noel E. O’Connor; Kevin McGuinness; |
1292 | Unsupervised Few-Shot Image Classification By Learning Features Into Clustering Space Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on our LF2CS, we put forward an image sampling and c-way k-shot task building method. With this, we propose a novel unsupervised few-shot image classification method, which jointly learns the learnable model, clustering and few-shot image classification. |
Shuo Li; Fang Liu; Zehua Hao; Kaibo Zhao; Licheng Jiao; |
1293 | Towards Realistic Semi-Supervised Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel pseudo-label based approach to tackle SSL in open-world setting. |
Mamshad Nayeem Rizve; Navid Kardan; Mubarak Shah; |
1294 | Masked Siamese Networks for Label-Efficient Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. |
Mahmoud Assran; Mathilde Caron; Ishan Misra; Piotr Bojanowski; Florian Bordes; Pascal Vincent; Armand Joulin; Michael Rabbat; Nicolas Ballas; |
1295 | Natural Synthetic Anomalies for Self-Supervised Anomaly Detection and Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce a simple and intuitive self-supervision task, Natural Synthetic Anomalies (NSA), for training an end-to-end model for anomaly detection and localization using only normal training data. |
Hannah M. Schlüter; Jeremy Tan; Benjamin Hou; Bernhard Kainz; |
1296 | Understanding Collapse in Non-Contrastive Siamese Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We empirically analyze these non-contrastive methods and find that SimSiam is extraordinarily sensitive to model size. |
Alexander C. Li; Alexei A. Efros; Deepak Pathak; |
1297 | Federated Self-Supervised Learning for Video Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we evaluate the performance of current state-of-the-art (SOTA) video-SSL techniques and identify their shortcomings when integrated into the large-scale FL setting simulated with kinetics-400 dataset. |
Yasar Abbas Ur Rehman; Yan Gao; Jiajun Shen; Pedro Porto Buarque de Gusmão; Nicholas Lane; |
1298 | Towards Efficient and Effective Self-Supervised Learning of Visual Representations Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we explore reasons for the slow convergence of these methods, and further propose to strengthen them using well-posed auxiliary tasks that converge significantly faster, and are also useful for representation learning. |
Sravanti Addepalli; Kaushal Bhogale; Priyam Dey; R. Venkatesh Babu; |
1299 | DSR – A Dual Subspace Re-Projection Network for Surface Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an architecture based on quantized feature space representation with dual decoders, DSR, that avoids the image-level anomaly synthesis requirement. |
Vitjan Zavrtanik; Matej Kristan; Danijel Skočaj; |
1300 | PseudoAugment: Learning to Use Unlabeled Data for Data Augmentation in Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we recognize that pseudo labeling and data augmentation are complementary, thus propose to leverage unlabeled data for data augmentation to enrich the training data. |
Zhaoqi Leng; Shuyang Cheng; Benjamin Caine; Weiyue Wang; Xiao Zhang; Jonathon Shlens; Mingxing Tan; Dragomir Anguelov; |
1301 | MVSTER: Epipolar Transformer for Efficient Multi-View Stereo Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, previous methods utilize extra networks to learn 2D information as fusing cues, underusing 3D spatial correlations and bringing additional computation costs. Therefore, we present MVSTER, which leverages the proposed epipolar Transformer to learn both 2D semantics and 3D spatial associations efficiently. |
Xiaofeng Wang; Zheng Zhu; Guan Huang; Fangbo Qin; Yun Ye; Yijia He; Xu Chi; Xingang Wang; |
1302 | RelPose: Predicting Probabilistic Relative Rotation for Single Objects in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object. |
Jason Y. Zhang; Deva Ramanan; Shubham Tulsiani; |
1303 | R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we present a deep residual MLP network (88 layers) to effectively learn the light field. |
Huan Wang; Jian Ren; Zeng Huang; Kyle Olszewski; Menglei Chai; Yun Fu; Sergey Tulyakov; |
1304 | KD-MVS: Knowledge Distillation Based Self-Supervised Learning for Multi-View Stereo Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel self-supervised training pipeline for MVS based on knowledge distillation, termed KD-MVS, which mainly consists of self-supervised teacher training and distillation-based student training. |
Yikang Ding; Qingtian Zhu; Xiangyue Liu; Wentao Yuan; Haotian Zhang; Chi Zhang; |
1305 | SALVe: Semantic Alignment Verification for Floorplan Reconstruction from Sparse Panoramas Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new system for automatic 2D floorplan reconstruction that is enabled by SALVe, our novel pairwise learned alignment verifier. |
John Lambert; Yuguang Li; Ivaylo Boyadzhiev; Lambert Wixson; Manjunath Narayana; Will Hutchcroft; James Hays; Frank Dellaert; Sing Bing Kang; |
1306 | RC-MVSNet: Unsupervised Multi-View Stereo with Neural Rendering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, multi-view images in real scenarios observe non-Lambertian surfaces and experience occlusions. In this work, we propose a novel approach with neural rendering (RC-MVSNet) to solve such ambiguity issues of correspondences among views. |
Di Chang; Aljaž Božič; Tong Zhang; Qingsong Yan; Yingcong Chen; Sabine Süsstrunk; Matthias Nießner; |
1307 | Box2Mask: Weakly Supervised 3D Semantic Instance Segmentation Using Bounding Boxes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we look at weakly-supervised 3D semantic instance segmentation. |
Julian Chibane; Francis Engelmann; Tuan Anh Tran; Gerard Pons-Moll; |
1308 | NeILF: Neural Incident Light Field for Physically-Based Material Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a differentiable rendering framework for material and lighting estimation from multi-view images and a reconstructed geometry. |
Yao Yao; Jingyang Zhang; Jingbo Liu; Yihang Qu; Tian Fang; David McKinnon; Yanghai Tsin; Long Quan; |
1309 | ARF: Artistic Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a method for transferring the artistic features of an arbitrary style image to a 3D scene. |
Kai Zhang; Nick Kolkin; Sai Bi; Fujun Luan; Zexiang Xu; Eli Shechtman; Noah Snavely; |
1310 | Multiview Stereo with Cascaded Epipolar RAFT Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose CER-MVS (Cascaded Epipolar RAFT Multiview Stereo), a new approach based on the RAFT (Recurrent All-Pairs Field Transforms) architecture developed for optical flow. |
Zeyu Ma; Zachary Teed; Jia Deng; |
1311 | ARAH: Animatable Volume Rendering of Articulated Human SDFs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Further, animating avatars in out-of-distribution poses is not yet possible because the mapping from observation space to canonical space does not generalize faithfully to unseen poses. In this work, we address these shortcomings and propose a model to create animatable clothed human avatars with detailed geometry that generalize well to out-of-distribution poses. |
Shaofei Wang; Katja Schwarz; Andreas Geiger; Siyu Tang; |
1312 | ASpanFormer: Detector-Free Image Matching with Adaptive Span Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To capture context at both global and local granularity, we propose ASpanFormer, a Transformer-based detector-free matcher that is built on hierarchical attention structure, adopting a novel attention operation which is capable of adjusting attention span in a self-adaptive manner. |
Hongkai Chen; Zixin Luo; Lei Zhou; Yurun Tian; Mingmin Zhen; Tian Fang; David McKinnon; Yanghai Tsin; Long Quan; |
1313 | NDF: Neural Deformable Fields for Dynamic Human Modelling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to learn a neural deformable field wrapped around a fitted parametric body model to represent the dynamic human. |
Ruiqi Zhang; Jie Chen; |
1314 | Neural Density-Distance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes Neural Density-Distance Field (NeDDF), a novel 3D representation that reciprocally constrains the distance and density fields. |
Itsuki Ueda; Yoshihiro Fukuhara; Hirokatsu Kataoka; Hiroaki Aizawa; Hidehiko Shishido; Itaru Kitahara; |
1315 | NeXT: Towards High Quality Neural Radiance Fields Via Multi-Skip Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most existing NeRF-based methods and their variants treat each sample point individually as input, while ignoring the inherent relationships between adjacent sample points from the corresponding rays, thus hindering the reconstruction performance. To address this issue, we explore a brand-new scheme, namely NeXT, introducing a multi-skip transformer to capture the rich relationships between various sample points in a ray-level query. |
Yunxiao Wang; Yanjie Li; Peidong Liu; Tao Dai; Shu-Tao Xia; |
1316 | Learning Online Multi-sensor Depth Fusion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we introduce SenFuNet, a depth fusion approach that learns sensor-specific noise and outlier statistics and combines the data streams of depth frames from different sensors in an online fashion. |
Erik Sandström; Martin R. Oswald; Suryansh Kumar; Silvan Weder; Fisher Yu; Cristian Sminchisescu; Luc Van Gool; |
1317 | BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-Scale Scene Rendering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we focus on multi-scale cases where large changes in imagery are observed at drastically different scales. |
Yuanbo Xiangli; Linning Xu; Xingang Pan; Nanxuan Zhao; Anyi Rao; Christian Theobalt; Bo Dai; Dahua Lin; |
1318 | Decomposing The Tangent of Occluding Boundaries According to Curvatures and Torsions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper develops new insight into the local structure of occluding boundaries on 3D surfaces. |
Huizong Yang; Anthony Yezzi; |
1319 | NeuRIS: Neural Reconstruction of Indoor Scenes Using Normal Priors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a new method, dubbed NeuRIS, for high-quality reconstruction of indoor scenes. |
Jiepeng Wang; Peng Wang; Xiaoxiao Long; Christian Theobalt; Taku Komura; Lingjie Liu; Wenping Wang; |
1320 | Generalizable Patch-Based Neural Rendering Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a different paradigm, where no deep features and no NeRF-like volume rendering are needed. |
Mohammed Suhail; Carlos Esteves; Leonid Sigal; Ameesh Makadia; |
1321 | Improving RGB-D Point Cloud Registration By Learning Multi-Scale Local Linear Transformation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, it is not trivial to effectively fuse the geometric and visual information from these two distinctive modalities, especially for the registration problem. In this work, we propose a new Geometry-Aware Visual Feature Extractor (GAVE) that employs multi-scale local linear transformation to progressively fuse these two modalities, where the geometric features from the depth data act as the geometry-dependent convolution kernels to transform the visual features from the RGB data. |
Ziming Wang; Xiaoliang Huo; Zhenghao Chen; Jing Zhang; Lu Sheng; Dong Xu; |
1322 | Real-Time Neural Character Rendering with Pose-Guided Multiplane Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose pose-guided multiplane image (MPI) synthesis which can render an animatable character in real scenes with photorealistic quality. |
Hao Ouyang; Bo Zhang; Pan Zhang; Hao Yang; Jiaolong Yang; Dong Chen; Qifeng Chen; Fang Wen; |
1323 | SparseNeuS: Fast Generalizable Neural Surface Reconstruction from Sparse Views Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce SparseNeuS, a novel neural rendering based method for the task of surface reconstruction from multi-view images. |
Xiaoxiao Long; Cheng Lin; Peng Wang; Taku Komura; Wenping Wang; |
1324 | Disentangling Object Motion and Occlusion for Unsupervised Multi-Frame Monocular Depth Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing dynamic-object-focused methods only partially solved the mismatch problem at the training loss level. In this paper, we accordingly propose a novel multi-frame monocular depth prediction method to solve these problems at both the prediction and supervision loss levels. |
Ziyue Feng; Liang Yang; Longlong Jing; Haiyan Wang; YingLi Tian; Bing Li; |
1325 | Depth Field Networks for Generalizable Multi-View Scene Representation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Recently, generalist Transformer architectures have achieved impressive results in tasks such as optical flow and depth estimation by encoding geometric priors as inputs rather than as enforced constraints. In this paper, we extend this idea and propose to learn an implicit, multi-view consistent scene representation, introducing a series of 3D data augmentation techniques as a geometric inductive prior to increase view diversity. |
Vitor Guizilini; Igor Vasiljevic; Jiading Fang; Rareș Ambruș; Greg Shakhnarovich; Matthew R. Walter; Adrien Gaidon; |
1326 | Context-Enhanced Stereo Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, existing methods struggle to generalize and predict reliably in hazardous regions, such as large uniform regions. To overcome these limitations, we propose Context Enhanced Path (CEP). |
Weiyu Guo; Zhaoshuo Li; Yongkui Yang; Zheng Wang; Russell H. Taylor; Mathias Unberath; Alan Yuille; Yingwei Li; |
1327 | PCW-Net: Pyramid Combination and Warping Cost Volume for Stereo Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing deep learning based stereo matching methods either focus on achieving optimal performance on the target dataset while generalizing poorly to other datasets, or focus on handling cross-domain generalization by suppressing domain-sensitive features, which results in a significant sacrifice of performance. To tackle these problems, we propose PCW-Net, a Pyramid Combination and Warping cost volume-based network to achieve good performance on both cross-domain generalization and stereo matching accuracy on various benchmarks. |
Zhelun Shen; Yuchao Dai; Xibin Song; Zhibo Rao; Dingfu Zhou; Liangjun Zhang; |
1328 | Gen6D: Generalizable Model-Free 6-DoF Object Pose Estimation from RGB Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a generalizable model-free 6-DoF object pose estimator called Gen6D. |
Yuan Liu; Yilin Wen; Sida Peng; Cheng Lin; Xiaoxiao Long; Taku Komura; Wenping Wang; |
1329 | Latency-Aware Collaborative Perception Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To mitigate the effect caused by the inevitable latency, from a machine learning perspective, we present the first latency-aware collaborative perception system, which actively adapts asynchronous perceptual features from multiple agents to the same time stamp, promoting the robustness and effectiveness of collaboration. |
Zixing Lei; Shunli Ren; Yue Hu; Wenjun Zhang; Siheng Chen; |
1330 | TensoRF: Tensorial Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present TensoRF, a novel approach to model and reconstruct radiance fields. |
Anpei Chen; Zexiang Xu; Andreas Geiger; Jingyi Yu; Hao Su; |
1331 | NeFSAC: Neurally Filtered Minimal Samples Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose NeFSAC, an efficient algorithm for neural filtering of motion-inconsistent and poorly-conditioned minimal samples. |
Luca Cavalli; Marc Pollefeys; Daniel Barath; |
1332 | SNeS: Learning Probably Symmetric Neural Surfaces from Incomplete Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a method for the accurate 3D reconstruction of partly-symmetric objects. |
Eldar Insafutdinov; Dylan Campbell; João F. Henriques; Andrea Vedaldi; |
1333 | HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose high dynamic range (HDR) radiance fields, HDR-Plenoxels, which learn a plenoptic function of 3D HDR radiance fields, geometry information, and varying camera settings inherent in 2D low dynamic range (LDR) images. |
Kim Jun-Seong; Kim Yu-Ji; Moon Ye-Bin; Tae-Hyun Oh; |
1334 | NeuMan: Neural Human Radiance Field from A Single Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a novel framework to reconstruct the human and the scene that can be rendered with novel human poses and views from just a single in-the-wild video. |
Wei Jiang; Kwang Moo Yi; Golnoosh Samei; Oncel Tuzel; Anurag Ranjan; |
1335 | TAVA: Template-Free Animatable Volumetric Actors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose TAVA, a method to create Template-free Animatable Volumetric Actors, based on neural representations. |
Ruilong Li; Julian Tanke; Minh Vo; Michael Zollhöfer; Jürgen Gall; Angjoo Kanazawa; Christoph Lassner; |
1336 | EASNet: Searching Elastic and Accurate Network Architecture for Stereo Matching Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose to train an elastic and accurate network for stereo matching (EASNet) that supports various 3D architectural settings on devices with different compute capability. |
Qiang Wang; Shaohuai Shi; Kaiyong Zhao; Xiaowen Chu; |
1337 | Relative Pose from SIFT Features Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper derives the geometric relationship between epipolar geometry and orientation- and scale-covariant features, e.g., SIFT. |
Daniel Barath; Zuzana Kukelova; |
1338 | Selection and Cross Similarity for Event-Image Deep Stereo Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we aim to effectively deal with events that continuously occur with different disparity in the scene depending on the camera’s movement. |
Hoonhee Cho; Kuk-Jin Yoon; |
1339 | D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Also, how to discriminatively describe objects in complex 3D environments is not fully studied yet. To address these challenges, we present D3Net, an end-to-end neural speaker-listener architecture that can detect, describe and discriminate. |
Zhenyu Chen; Qirui Wu; Matthias Nießner; Angel X. Chang; |
1340 | CIRCLE: Convolutional Implicit Reconstruction and Completion for Large-Scale Indoor Scene Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present CIRCLE, a framework for large-scale scene completion and geometric refinement based on local implicit signed distance functions. |
Hao-Xiang Chen; Jiahui Huang; Tai-Jiang Mu; Shi-Min Hu; |
1341 | ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Estimating the pose of a moving camera from monocular video is a challenging problem, especially due to the presence of moving objects in dynamic environments, where the performance of existing camera pose estimation methods is susceptible to pixels that are not geometrically consistent. To tackle this challenge, we present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence initialized from pairwise optical flow. |
Wang Zhao; Shaohui Liu; Hengkai Guo; Wenping Wang; Yong-Jin Liu; |
1342 | 4DContrast: Contrastive Learning with Dynamic Correspondences for 3D Scene Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a new approach to instill 4D dynamic object priors into learned 3D representations by unsupervised pre-training. |
Yujin Chen; Matthias Nießner; Angela Dai; |
1343 | Few ‘Zero Level Set’-Shot Learning of Shape Signed Distance Functions in Feature Space Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We explore a new idea for learning-based shape reconstruction from a point cloud, based on the recently popularized implicit neural shape representations. |
Amine Ouasfi; Adnane Boukhayma; |
1344 | Solution Space Analysis of Essential Matrix Based on Algebraic Error Minimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper reports on a solution space analysis of the essential matrix based on algebraic error minimization. |
Gaku Nakano; |
1345 | Approximate Differentiable Rendering with Algebraic Surfaces Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we develop an approximate differentiable renderer for a compact, interpretable representation, which we call Fuzzy Metaballs. |
Leonid Keselman; Martial Hebert; |
1346 | CoVisPose: Co-Visibility Pose Transformer for Wide-Baseline Relative Pose Estimation in 360° Indoor Panoramas Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present CoVisPose, a new end-to-end supervised learning method for relative camera pose estimation in wide baseline 360 indoor panoramas. |
Will Hutchcroft; Yuguang Li; Ivaylo Boyadzhiev; Zhiqiang Wan; Haiyan Wang; Sing Bing Kang; |
1347 | Affine Correspondences Between Multi-Camera Systems for 6DOF Relative Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel method to compute the 6DOF relative pose of multi-camera systems using two affine correspondences (ACs). |
Banglei Guan; Ji Zhao; |
1348 | GraphFit: Learning Multi-Scale Graph-Convolutional Representation for Point Cloud Normal Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a precise and efficient normal estimation method that can deal with noise and nonuniform density for unstructured 3D point clouds. |
Keqiang Li; Mingyang Zhao; Huaiyu Wu; Dong-Ming Yan; Zhen Shen; Fei-Yue Wang; Gang Xiong; |
1349 | IS-MVSNet: Importance Sampling-Based MVSNet Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents a novel coarse-to-fine multi-view stereo (MVS) algorithm called importance-sampling-based MVSNet (IS-MVSNet) to address a crucial problem of limited depth resolution adopted by current learning-based MVS methods. |
Likang Wang; Yue Gong; Xinjun Ma; Qirui Wang; Kaixuan Zhou; Lei Chen; |
1350 | Point Scene Understanding Via Disentangled Instance Mesh Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To circumvent the hurdle, we propose a Disentangled Instance Mesh Reconstruction (DIMR) framework for effective point scene understanding. |
Jiaxiang Tang; Xiaokang Chen; Jingbo Wang; Gang Zeng; |
1351 | DiffuStereo: High Quality Human Reconstruction Via Diffusion-Based Stereo Using Sparse Cameras Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose DiffuStereo, a novel system using only sparse cameras (8 in this work) for high-quality 3D human reconstruction. |
Ruizhi Shao; Zerong Zheng; Hongwen Zhang; Jingxiang Sun; Yebin Liu; |
1352 | Space-Partitioning RANSAC Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: A new algorithm is proposed to accelerate the RANSAC model quality calculations. |
Daniel Barath; Gábor Valasek; |
1353 | SimpleRecon: 3D Reconstruction Without 3D Convolutions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we instead go back to the traditional route, and show how focusing on high quality multi-view depth prediction leads to highly accurate 3D reconstructions using simple off-the-shelf depth fusion. |
Mohamed Sayed; John Gibson; Jamie Watson; Victor Prisacariu; Michael Firman; Clément Godard; |
1354 | Structure and Motion from Casual Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Under such conditions, state-of-the-art SfM methods tend to produce erroneous results, often failing entirely. To address these issues, we propose CasualSAM, a method to estimate camera poses and dense depth maps from a monocular, casually-captured video. |
Zhoutong Zhang; Forrester Cole; Zhengqi Li; Noah Snavely; Michael Rubinstein; William T. Freeman; |
1355 | What Matters for 3D Scene Flow Network Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In addition, the estimated correspondence is usually from the forward direction of the adjacent point clouds, and may not be consistent with the estimated correspondence acquired from the backward direction. To tackle these problems, we propose a novel all-to-all flow embedding layer with backward reliability validation during the initial scene flow estimation. |
Guangming Wang; Yunzhe Hu; Zhe Liu; Yiyang Zhou; Masayoshi Tomizuka; Wei Zhan; Hesheng Wang; |
1356 | Correspondence Reweighted Translation Averaging Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we introduce weights for individual correspondences which are iteratively refined to yield improved translation directions. |
Lalit Manam; Venu Madhav Govindu; |
1357 | Neural Strands: Learning Hair Geometry and Appearance from Multi-View Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present Neural Strands, a novel learning framework for modeling accurate hair geometry and appearance from multi-view image inputs. |
Radu Alexandru Rosu; Shunsuke Saito; Ziyan Wang; Chenglei Wu; Sven Behnke; Giljoo Nam; |
1358 | GraphCSPN: Geometry-Aware Depth Completion Via Dynamic GCNs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion. |
Xin Liu; Xiaofei Shao; Bo Wang; Yali Li; Shengjin Wang; |
1359 | Objects Can Move: 3D Change Detection By Geometric Transformation Consistency Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a 3D object discovery method that is based only on scene changes. |
Aikaterini Adam; Torsten Sattler; Konstantinos Karantzalos; Tomas Pajdla; |
1360 | Language-Grounded Indoor 3D Semantic Segmentation in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This large number of class categories also induces a large natural class imbalance, both of which are challenging for existing 3D semantic segmentation methods. To learn more robust 3D features in this context, we propose a language-driven pre-training method to encourage learned 3D features that might have limited training examples to lie close to their pre-trained text embeddings. |
Dávid Rozenberszki; Or Litany; Angela Dai; |
1361 | Beyond Periodicity: Towards A Unifying Framework for Activations in Coordinate-MLPs Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we attempt to broaden the current understanding of the effect of activations in coordinate-MLPs, and show that there exists a broader class of activations that are suitable for encoding signals. |
Sameera Ramasinghe; Simon Lucey; |
1362 | Deforming Radiance Fields with Cages Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a method that enables a new type of deformation of the radiance field: free-form radiance field deformation. |
Tianhan Xu; Tatsuya Harada; |
1363 | FLEX: Extrinsic Parameters-Free Multi-View 3D Human Motion Reconstruction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce FLEX (Free muLti-view rEconstruXion), an end-to-end extrinsic parameter-free multi-view model. |
Brian Gordon; Sigal Raab; Guy Azov; Raja Giryes; Daniel Cohen-Or; |
1364 | MODE: Multi-View Omnidirectional Depth Estimation with 360° Cameras Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a two-stage omnidirectional depth estimation framework with multi-view 360-degree cameras. |
Ming Li; Xueqian Jin; Xuejiao Hu; Jingzhao Dai; Sidan Du; Yang Li; |
1365 | GigaDepth: Learning Depth from Structured Light with Branching Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose to split the regression problem into smaller classification sub-problems in a coarse-to-fine manner with the use of a weight-adaptive layer that efficiently implements branching per-pixel Multilayer Perceptrons applied to features extracted by a Convolutional Neural Network. |
Simon Schreiberhuber; Jean-Baptiste Weibel; Timothy Patten; Markus Vincze; |
1366 | ActiveNeRF: Learning Where to See with Uncertainty Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel learning framework, ActiveNeRF, aiming to model a 3D scene with a constrained input budget. |
Xuran Pan; Zihang Lai; Shiji Song; Gao Huang; |
1367 | PoserNet: Refining Relative Camera Poses Exploiting Object Detections Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose the Pose Refiner Network (PoserNet), a light-weight Graph Neural Network to refine the approximate pair-wise relative camera poses. |
Matteo Taiana; Matteo Toso; Stuart James; Alessio Del Bue; |
1368 | Gaussian Activated Neural Radiance Fields for High Fidelity Reconstruction & Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present Gaussian Activated Neural Radiance Fields (GARF), a new positional embedding-free neural radiance field architecture, employing Gaussian activations, that is competitive with the current state-of-the-art in terms of high fidelity reconstruction and pose estimation. |
Shin-Fang Chng; Sameera Ramasinghe; Jamie Sherrah; Simon Lucey; |
1369 | Unbiased Gradient Estimation for Differentiable Surface Splatting Via Poisson Sampling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an efficient and GPU-accelerated sampling framework which enables unbiased gradient approximation for differentiable point cloud rendering based on surface splatting. |
Jan U. Müller; Michael Weinmann; Reinhard Klein; |
1370 | Towards Learning Neural Representations from Shadows Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a method that learns neural shadow fields, which are neural scene representations that are only learnt from the shadows present in the scene. |
Kushagra Tiwary; Tzofi Klinghoffer; Ramesh Raskar; |
1371 | Class-Incremental Novel Class Discovery Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by rehearsal-based incremental learning methods, in this paper we propose a novel approach for class-iNCD which prevents forgetting of past information about the base classes by jointly exploiting base class feature prototypes and feature-level knowledge distillation. |
Subhankar Roy; Mingxuan Liu; Zhun Zhong; Nicu Sebe; Elisa Ricci; |
1372 | Unknown-Oriented Learning for Open Set Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Despite their impressive performance, existing works neglect the complex semantic information and huge intra-category variation of the unknown category, and are incapable of representing its complicated distribution. To overcome this, we propose a novel Unknown-Oriented Learning (UOL) framework for OSDA, which is composed of three stages: true unknown excavation, false unknown suppression and known alignment. |
Jie Liu; Xiaoqing Guo; Yixuan Yuan; |
1373 | Prototype-Guided Continual Adaptation for Class-Incremental Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper studies a new, practical but challenging problem, called Class-Incremental Unsupervised Domain Adaptation (CI-UDA), where the labeled source domain contains all classes, but the classes in the unlabeled target domain increase sequentially. |
Hongbin Lin; Yifan Zhang; Zhen Qiu; Shuaicheng Niu; Chuang Gan; Yanxia Liu; Mingkui Tan; |
1374 | DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we observe two main issues of existing domain-invariant learning framework. |
Xin Lai; Zhuotao Tian; Xiaogang Xu; Yingcong Chen; Shu Liu; Hengshuang Zhao; Liwei Wang; Jiaya Jia; |
1375 | Class-Agnostic Object Counting Robust to Intraclass Diversity Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we focus on class-agnostic counting, i.e., counting object instances in an image by simply specifying a few exemplar boxes of interest. |
Shenjian Gong; Shanshan Zhang; Jian Yang; Dengxin Dai; Bernt Schiele; |
1376 | Burn After Reading: Online Adaptation for Cross-Domain Streaming Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an online framework called Burn After Reading, i.e. each online sample is permanently deleted after it is processed. |
Luyu Yang; Mingfei Gao; Zeyuan Chen; Ran Xu; Abhinav Shrivastava; Chetan Ramaiah; |
1377 | Mind The Gap in Distilling StyleGANs Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our key insight is that the main challenge of StyleGAN distillation lies in the output discrepancy issue, where the teacher and student model yield different outputs given the same input latent code. |
Guodong Xu; Yuenan Hou; Ziwei Liu; Chen Change Loy; |
1378 | Improving Test-Time Adaptation Via Shift-Agnostic Weight Regularization and Nearest Source Prototypes Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper proposes a novel test-time adaptation strategy that adjusts the model pre-trained on the source domain using only unlabeled online data from the target domain to alleviate the performance degradation due to the distribution shift between the source and target domains. |
Sungha Choi; Seunghan Yang; Seokeon Choi; Sungrack Yun; |
1379 | Learning Instance-Specific Adaptation for Cross-Domain Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a test-time adaptation method for cross-domain image segmentation. |
Yuliang Zou; Zizhao Zhang; Chun-Liang Li; Han Zhang; Tomas Pfister; Jia-Bin Huang; |
1380 | RegionCL: Exploring Contrastive Region Pairs for Self-Supervised Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we make the first attempt to demonstrate the importance of both regions in cropping from a complete perspective and the effectiveness of using both regions via designing a simple yet effective pretext task called Region Contrastive Learning (RegionCL). |
Yufei Xu; Qiming Zhang; Jing Zhang; Dacheng Tao; |
1381 | Long-Tailed Class Incremental Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work we propose two long-tailed CIL scenarios, which we term Ordered and Shuffled LT-CIL. |
Xialei Liu; Yu-Song Hu; Xu-Sheng Cao; Andrew D. Bagdanov; Ke Li; Ming-Ming Cheng; |
1382 | DLCFT: Deep Linear Continual Fine-Tuning for General Incremental Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we explore an alternative framework to incremental learning where we continually fine-tune the model from a pre-trained representation. |
Hyounguk Shon; Janghyeon Lee; Seung Hwan Kim; Junmo Kim; |
1383 | Adversarial Partial Domain Adaptation By Cycle Inconsistency Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Accordingly, we propose to filter out source samples of outlier classes by weight suppression and align the distributions of shared classes between the source and target domains by adversarial learning. |
Kun-Yu Lin; Jiaming Zhou; Yukun Qiu; Wei-Shi Zheng; |
1384 | Combating Label Distribution Shift for Active Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We consider the problem of active domain adaptation (ADA) to unlabeled target data, of which subset is actively selected and labeled given a budget constraint. |
Sehyun Hwang; Sohyun Lee; Sungyeon Kim; Jungseul Ok; Suha Kwak; |
1385 | GIPSO: Geometrically Informed Propagation for Online Adaptation in 3D LiDAR Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Most approaches in the literature neglect an important aspect, i.e., how to deal with domain shift when handling dynamic scenes. This can significantly hinder the navigation capabilities of self-driving vehicles. This paper advances the state of the art in this research field. |
Cristiano Saltori; Evgeny Krivosheev; Stéphane Lathuilière; Nicu Sebe; Fabio Galasso; Giuseppe Fiameni; Elisa Ricci; Fabio Poiesi; |
1386 | CoSMix: Compositional Semantic Mix for Domain Adaptation in 3D LiDAR Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a new approach of sample mixing for point cloud UDA, namely Compositional Semantic Mix (CoSMix), the first UDA approach for point cloud segmentation based on sample mixing. |
Cristiano Saltori; Fabio Galasso; Giuseppe Fiameni; Nicu Sebe; Elisa Ricci; Fabio Poiesi; |
1387 | A Unified Framework for Domain Adaptive Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we investigate the problem of domain adaptive 2D pose estimation that transfers knowledge learned on a synthetic source domain to a target domain without supervision. |
Donghyun Kim; Kaihong Wang; Kate Saenko; Margrit Betke; Stan Sclaroff; |
1388 | A Broad Study of Pre-training for Domain Generalization and Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we provide a broad study and in-depth analysis of pre-training for domain adaptation and generalization, namely: network architectures, size, pre-training loss, and datasets. |
Donghyun Kim; Kaihong Wang; Stan Sclaroff; Kate Saenko; |
1389 | Prior Knowledge Guided Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The absence of labels in the target domain makes Unsupervised Domain Adaptation (UDA) an attractive technique in many real-world applications, though it also brings great challenges as model adaptation becomes harder without labeled target data. In this paper, we address this issue by seeking compensation from target domain prior knowledge, which is often (partially) available in practice, e.g., from human expertise. |
Tao Sun; Cheng Lu; Haibin Ling; |
1390 | GCISG: Guided Causal Invariant Learning for Improved Syn-to-Real Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we characterize the domain gap by using a causal framework for data generation. |
Gilhyun Nam; Gyeongjae Choi; Kyungmin Lee; |
1391 | AcroFOD: An Adaptive Method for Cross-Domain Few-Shot Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: There exist two significant challenges: (1) highly insufficient target domain data; (2) potential over-adaptation and misleading supervision caused by inappropriately amplified target samples without any restriction. To address these challenges, we propose an adaptive method consisting of two parts. |
Yipeng Gao; Lingxiao Yang; Yunmu Huang; Song Xie; Shiyong Li; Wei-Shi Zheng; |
1392 | Unsupervised Domain Adaptation for One-Stage Object Detector Using Offsets to Bounding Box Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: With a very simple and effective conditioning method, we propose OADA (Offset-Aware Domain Adaptive object detector) that achieves state-of-the-art performances in various experimental settings. |
Jayeon Yoo; Inseop Chung; Nojun Kwak; |
1393 | Visual Prompt Tuning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. |
Menglin Jia; Luming Tang; Bor-Chun Chen; Claire Cardie; Serge Belongie; Bharath Hariharan; Ser-Nam Lim; |
1394 | Quasi-Balanced Self-Training on Noise-Aware Synthesis of Object Point Clouds for Closing Domain Gap Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose an integrated scheme consisting of physically realistic synthesis of object point clouds, rendering stereo images by projecting speckle patterns onto CAD models, and a novel quasi-balanced self-training designed for a more balanced data distribution via sparsity-driven selection of pseudo-labeled samples for long-tailed classes. |
Yongwei Chen; Zihao Wang; Longkun Zou; Ke Chen; Kui Jia; |
1395 | Interpretable Open-Set Domain Adaptation Via Angular Margin Separation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Following that line, in this work, we propose a representation learning framework termed Angular Margin Separation (AMS) that unveils the power of discriminative and robust representation for both open-set domain adaptation and cross-domain semantic recovery. |
Xinhao Li; Jingjing Li; Zhekai Du; Lei Zhu; Wen Li; |
1396 | TACS: Taxonomy Adaptive Cross-Domain Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In many real-world settings, the target domain task requires a different taxonomy than the one imposed by the source domain. We therefore introduce the more general taxonomy adaptive cross-domain semantic segmentation (TACS) problem, allowing for inconsistent taxonomies between the two domains. |
Rui Gong; Martin Danelljan; Dengxin Dai; Danda Pani Paudel; Ajad Chhatkuli; Fisher Yu; Luc Van Gool; |
1397 | Prototypical Contrast Adaptation for Domain Adaptive Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present Prototypical Contrast Adaptation (ProCA), a simple and efficient contrastive learning method for unsupervised domain adaptive semantic segmentation. |
Zhengkai Jiang; Yuxi Li; Ceyuan Yang; Peng Gao; Yabiao Wang; Ying Tai; Chengjie Wang; |
1398 | RBC: Rectifying The Biased Context in Continual Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To tackle the obstacle, we propose a biased-context-rectified CSS framework with a context-rectified image-duplet learning scheme and a biased-context-insensitive consistency loss. |
Hanbin Zhao; Fengyu Yang; Xinghe Fu; Xi Li; |
1399 | Factorizing Knowledge in Neural Networks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we explore a novel and ambitious knowledge-transfer task, termed Knowledge Factorization (KF). |
Xingyi Yang; Jingwen Ye; Xinchao Wang; |
1400 | Contrastive Vicinal Space for Unsupervised Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an instance-wise minimax strategy that minimizes the entropy of high uncertainty instances in the vicinal space to tackle the stated problem. |
Jaemin Na; Dongyoon Han; Hyung Jin Chang; Wonjun Hwang; |
1401 | Cross-Modal Knowledge Transfer Without Task-Relevant Source Data Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: For reasons like memory and privacy, it may not be possible to access the source data, and knowledge transfer needs to work with only the source models. We describe an effective solution, SOCKET: SOurce-free Cross-modal KnowledgE Transfer for this challenging task of transferring knowledge from one source modality to a different target modality without access to task-relevant source data. |
Sk Miraj Ahmed; Suhas Lohit; Kuan-Chuan Peng; Michael J. Jones; Amit K. Roy-Chowdhury; |
1402 | Online Domain Adaptation for Semantic Segmentation in Ever-Changing Conditions Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we tackle Online Domain Adaptation (OnDA) for semantic segmentation. |
Theodoros Panagiotakopoulos; Pier Luigi Dovesi; Linus Härenstam-Nielsen; Matteo Poggi; |
1403 | Source-Free Video Domain Adaptation By Learning Temporal Consistency for Action Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel Attentive Temporal Consistent Network (ATCoN) to address SFVDA by learning temporal consistency, guaranteed by two novel consistency objectives, namely feature consistency and source prediction consistency, performed across local temporal features. |
Yuecong Xu; Jianfei Yang; Haozhi Cao; Keyu Wu; Min Wu; Zhenghua Chen; |
1404 | BMD: A General Class-Balanced Multicentric Dynamic Prototype Strategy for Source-Free Domain Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In addition, we found that a monocentric feature prototype may be ineffective at representing each category and may introduce negative transfer, especially for those hard-transfer data. To address these issues, we propose a general class-Balanced Multicentric Dynamic prototype (BMD) strategy for the SFDA task. |
Sanqing Qu; Guang Chen; Jing Zhang; Zhijun Li; Wei He; Dacheng Tao; |
1405 | Generalized Brain Image Synthesis with Transferable Convolutional Sparse Coding Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a novel generalized brain image synthesis method, powered by our transferable convolutional sparse coding networks, to address the lack of interpretable cross-modal medical image representation learning. |
Yawen Huang; Feng Zheng; Xu Sun; Yuexiang Li; Ling Shao; Yefeng Zheng; |
1406 | Incomplete Multi-View Domain Adaptation Via Channel Enhancement and Knowledge Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This scenario is defined as incomplete multi-view domain adaptation (IMVDA), which considers that the source domain consists of multi-view data while the target domain only includes single-view instances. To overcome this practical demand, this paper proposes a novel Channel Enhancement and Knowledge Transfer (CEKT) framework with two modules. |
Haifeng Xia; Pu Wang; Zhengming Ding; |
1407 | DistPro: Searching A Fast Knowledge Distillation Process Via Meta Optimization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose DistPro, a novel framework which searches for an optimal KD process via differentiable meta-learning. |
Xueqing Deng; Dawei Sun; Shawn Newsam; Peng Wang; |
1408 | ML-BPM: Multi-Teacher Learning with Bidirectional Photometric Mixing for Open Compound Domain Adaptation in Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we introduce a multi-teacher framework with bidirectional photometric mixing to adapt to every target subdomain separately. |
Fei Pan; Sungsu Hur; Seokju Lee; Junsik Kim; In So Kweon; |
1409 | PACTran: PAC-Bayesian Metrics for Estimating The Transferability of Pretrained Models to Classification Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we present PACTran, a theoretically grounded family of metrics for pretrained model selection and transferability measurement. |
Nan Ding; Xi Chen; Tomer Levinboim; Soravit Changpinyo; Radu Soricut; |
1410 | Personalized Education: Blind Knowledge Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: By designing exploratory experiments with theoretical analysis, we find that model capacity differences are not necessarily the root cause; instead, the distillation data matter when the student capacity is greater than a threshold. In light of this, we propose personalized education (PE) to first help each student adaptively find its own blind knowledge region (BKR), where the student has not captured the knowledge from the teacher, and then teach the student on this region. |
Xiang Deng; Jian Zheng; Zhongfei Zhang; |
1411 | Not All Models Are Equal: Predicting Model Transferability in A Self-Challenging Fisher Space Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper addresses an important problem of ranking the pre-trained deep neural networks and screening the most transferable ones for downstream tasks. |
Wenqi Shao; Xun Zhao; Yixiao Ge; Zhaoyang Zhang; Lei Yang; Xiaogang Wang; Ying Shan; Ping Luo; |
1412 | How Stable Are Transferability Metrics Evaluations? Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper we conduct a large-scale study by systematically constructing a broad range of 715k experimental setup variations. |
Andrea Agostinelli; Michal Pándy; Jasper Uijlings; Thomas Mensink; Vittorio Ferrari; |
1413 | Attention Diversification for Domain Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: After investigating this issue from the perspective of shortcut learning, we find the devil lies in the fact that models trained on different domains merely bias toward different domain-specific features yet overlook diverse task-related features. Under this guidance, a novel Attention Diversification framework is proposed, in which Intra-Model and Inter-Model Attention Diversification Regularization collaborate to reassign appropriate attention to diverse task-related features. |
Rang Meng; Xianfeng Li; Weijie Chen; Shicai Yang; Jie Song; Xinchao Wang; Lei Zhang; Mingli Song; Di Xie; Shiliang Pu; |
1414 | ESS: Learning Event-Based Semantic Segmentation from Still Images Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce ESS (Event-based Semantic Segmentation), which tackles this problem by directly transferring the semantic segmentation task from existing labeled image datasets to unlabeled events via unsupervised domain adaptation (UDA). |
Zhaoning Sun; Nico Messikommer; Daniel Gehrig; Davide Scaramuzza; |
1415 | An Efficient Spatio-Temporal Pyramid Transformer for Action Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we present an efficient hierarchical Spatio-Temporal Pyramid Transformer (STPT) for action detection, building upon the fact that the early self-attention layers in Transformers still focus on local patterns. |
Yuetian Weng; Zizheng Pan; Mingfei Han; Xiaojun Chang; Bohan Zhuang; |
1416 | Human Trajectory Prediction Via Neural Social Physics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new method combining both methodologies based on a new Neural Differential Equation model. |
Jiangbei Yue; Dinesh Manocha; He Wang; |
1417 | Towards Open Set Video Anomaly Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, we propose to use graph neural networks and triplet loss to learn discriminative features for training the EDL classifier, where the EDL is capable of identifying the unknown anomalies by quantifying the uncertainty. |
Yuansheng Zhu; Wentao Bao; Qi Yu; |
1418 | ECLIPSE: Efficient Long-Range Video Retrieval Using Sight and Sound Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce an audiovisual method for long-range text-to-video retrieval. |
Yan-Bo Lin; Jie Lei; Mohit Bansal; Gedas Bertasius; |
1419 | Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a training strategy to identify and remove modality-specific noisy labels dynamically. |
Haoyue Cheng; Zhaoyang Liu; Hang Zhou; Chen Qian; Wayne Wu; Limin Wang; |
1420 | Less Than Few: Self-Shot Video Instance Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The goal of this paper is to bypass the need for labelled examples in few-shot video understanding at run time. |
Pengwan Yang; Yuki M. Asano; Pascal Mettes; Cees G. M. Snoek; |
1421 | Adaptive Face Forgery Detection in Cross Domain Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In addition to this, the inconsistency problem in previous methods is significantly exacerbated by the diversity among various forgery methods. To address this problem, we propose a novel deep learning framework for cross-domain face forgery detection. |
Luchuan Song; Zheng Fang; Xiaodan Li; Xiaoyi Dong; Zhenchao Jin; Yuefeng Chen; Siwei Lyu; |
1422 | Real-Time Online Video Detection with Temporal Smoothing Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Unfortunately, in most existing methods, the computational complexity grows linearly or quadratically with the length of the considered dynamics. This issue is particularly pronounced in transformer-based architectures. To address this issue, we reformulate the cross-attention in a video transformer through the lens of kernels and apply two kinds of temporal smoothing kernels: a box kernel and a Laplace kernel. |
Yue Zhao; Philipp Krähenbühl; |
1423 | TALLFormer: Temporal Action Localization with A Long-Memory Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This issue becomes even worse with the recent video transformer models, many of which have quadratic memory complexity. To address these issues, we propose TALLFormer, a memory-efficient and end-to-end trainable Temporal Action Localization Transformer with Long-term memory. |
Feng Cheng; Gedas Bertasius; |
1424 | Mining Relations Among Cross-Frame Affinities for Video Semantic Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by traditional feature processing, we propose Single-scale Affinity Refinement (SAR) and Multi-scale Affinity Aggregation (MAA). |
Guolei Sun; Yun Liu; Hao Tang; Ajad Chhatkuli; Le Zhang; Luc Van Gool; |
1425 | TLDW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we focus on summarizing instructional videos, an under-explored area of video summarization. |
Medhini Narasimhan; Arsha Nagrani; Chen Sun; Michael Rubinstein; Trevor Darrell; Anna Rohrbach; Cordelia Schmid; |
1426 | Rethinking Learning Approaches for Long-Term Action Anticipation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce ANTICIPATR which performs long-term action anticipation leveraging segment-level representations learned using individual segments from different activities, in addition to a video-level representation. |
Megha Nawhal; Akash Abdu Jyothi; Greg Mori; |
1427 | DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a new transformer architecture termed DualFormer, which can efficiently perform space-time attention for video recognition. |
Yuxuan Liang; Pan Zhou; Roger Zimmermann; Shuicheng Yan; |
1428 | Hierarchical Feature Alignment Network for Unsupervised Video Object Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, optical flow is intrinsically an instantaneous velocity of all pixels among consecutive frames, thus making the motion features not aligned well with the primary objects among the corresponding frames. To solve the above challenge, we propose a concise, practical, and efficient architecture for appearance and motion feature alignment, dubbed hierarchical feature alignment network (HFAN). |
Gensheng Pei; Fumin Shen; Yazhou Yao; Guo-Sen Xie; Zhenmin Tang; Jinhui Tang; |
1429 | PAC-Net: Highlight Your Video Via History Preference Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a Preference-Adaptive Classification (PAC-Net) framework, which can model users’ personalized preferences from their user history. |
Hang Wang; Penghao Zhou; Chong Zhou; Zhao Zhang; Xing Sun; |
1430 | How Severe Is Benchmark-Sensitivity in Video Self-Supervised Learning? Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we investigate how sensitive video self-supervised learning is to the current conventional benchmark and whether methods generalize beyond the canonical evaluation setting. |
Fida Mohammad Thoker; Hazel Doughty; Piyush Bagad; Cees G. M. Snoek; |
1431 | A Sliding Window Scheme for Online Temporal Action Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we propose Online Anchor Transformer (OAT) to extend the anchor-based action localization model to the online setting. |
Young Hwi Kim; Hyolim Kang; Seon Joo Kim; |
1432 | ERA: Expert Retrieval and Assembly for Early Action Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel Expert Retrieval and Assembly (ERA) module that retrieves and assembles a set of experts most specialized at using discriminative subtle differences, to distinguish an input sample from other highly similar samples. |
Lin Geng Foo; Tianjiao Li; Hossein Rahmani; Qiuhong Ke; Jun Liu; |
1433 | Dual Perspective Network for Audio-Visual Event Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Past works have traditionally viewed videos as temporally sequenced multi-modal streams. We improve and extend on this view by proposing a novel architecture, the Dual Perspective Network (DPNet), that – (1) additionally operates on an intuitive graph perspective of a video to simultaneously facilitate cross-modal guidance and short-term temporal aggregation using a Graph Neural Network (GNN), (2) deploys a Temporal Convolutional Network (TCN) to achieve long-term dependency resolution, and (3) encourages interactive feature learning using an acyclic feature refinement process that alternates between the GNN and TCN. |
Varshanth Rao; Md Ibrahim Khalil; Haoda Li; Peng Dai; Juwei Lu; |
1434 | NSNet: Non-Saliency Suppression Sampler for Efficient Video Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Existing methods take all frames as positive samples; few pay attention to the discrimination between positive samples (salient frames) and negative samples (non-salient frames) in supervision. To fill this gap, in this paper, we propose a novel Non-saliency Suppression Network (NSNet), which effectively suppresses the responses of non-salient frames. |
Boyang Xia; Wenhao Wu; Haoran Wang; Rui Su; Dongliang He; Haosen Yang; Xiaoran Fan; Wanli Ouyang; |
1435 | Video Activity Localisation with Uncertainties in Temporal Boundary Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Such uncertainties in temporal labelling are currently ignored in model training, resulting in learning mismatched video-text correlations that generalise poorly at test time. In this work, we solve this problem by introducing Elastic Moment Bounding (EMB) to accommodate flexible and adaptive activity temporal boundaries towards modelling universally interpretable video-text correlation with tolerance to underlying temporal uncertainties in pre-fixed annotations. |
Jiabo Huang; Hailin Jin; Shaogang Gong; Yang Liu; |
1436 | Temporal Saliency Query Network for Efficient Video Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To achieve it, we propose a Temporal Saliency Query Network (TSQNet) that includes two instantiations of the TSQ mechanism based on visual appearance similarities and textual event-object relations. |
Boyang Xia; Zhihao Wang; Wenhao Wu; Haoran Wang; Jungong Han; |
1437 | Efficient One-Stage Video Object Detection By Exploiting Temporal Consistency Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on the analysis, we present a simple yet efficient framework to address the computational bottlenecks and achieve efficient one-stage VOD by exploiting the temporal consistency in video frames. |
Guanxiong Sun; Yang Hua; Guosheng Hu; Neil Robertson; |
1438 | Leveraging Action Affinity and Continuity for Semi-Supervised Temporal Action Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a semi-supervised learning approach to the temporal action segmentation task. |
Guodong Ding; Angela Yao; |
1439 | Spotting Temporally Precise, Fine-Grained Events in Video Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In response, we propose E2E-Spot, a compact, end-to-end model that performs well on the precise spotting task and can be trained quickly on a single GPU. |
James Hong; Haotian Zhang; Michaël Gharbi; Matthew Fisher; Kayvon Fatahalian; |
1440 | Unified Fully and Timestamp Supervised Temporal Action Segmentation Via Sequence to Sequence Translation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper introduces a unified framework for video action segmentation via sequence to sequence (seq2seq) translation in a fully and timestamp supervised setup. |
Nadine Behrmann; S. Alireza Golestaneh; Zico Kolter; Jürgen Gall; Mehdi Noroozi; |
1441 | Efficient Video Transformers with Spatial-Temporal Token Selection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present STTS, a token selection framework that dynamically selects a few informative tokens in both temporal and spatial dimensions conditioned on input video samples. |
Junke Wang; Xitong Yang; Hengduo Li; Li Liu; Zuxuan Wu; Yu-Gang Jiang; |
1442 | Long Movie Clip Classification with State-Space Video Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Instead, we propose ViS4mer, an efficient long-range video model that combines the strengths of self-attention and the recently introduced structured state-space sequence (S4) layer. |
Md Mohaiminul Islam; Gedas Bertasius; |
1443 | Prompting Visual-Language Models for Efficient Video Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents a simple but strong baseline to efficiently adapt the pre-trained I-VL model for video understanding tasks, with minimal training. |
Chen Ju; Tengda Han; Kunhao Zheng; Ya Zhang; Weidi Xie; |
1444 | Asymmetric Relation Consistency Reasoning for Video Relation Grounding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel Asymmetric Relation Consistency (ARC) reasoning model to solve the video relation grounding problem. |
Huan Li; Ping Wei; Jiapeng Li; Zeyu Ma; Jiahui Shang; Nanning Zheng; |
1445 | Self-Supervised Social Relation Representation for Human Group Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new two-stage multi-head framework for human group detection. |
Jiacheng Li; Ruize Han; Haomin Yan; Zekun Qian; Wei Feng; Song Wang; |
1446 | K-Centered Patch Sampling for Efficient Video Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: For decades, it has been a common practice to choose a subset of video frames for reducing the computational burden of a video understanding model. In this paper, we argue that this popular heuristic might be sub-optimal under recent transformer-based models. |
Seong Hyeon Park; Jihoon Tack; Byeongho Heo; Jung-Woo Ha; Jinwoo Shin; |
1447 | A Deep Moving-Camera Background Model Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a new method, called DeepMCBM, that eliminates all the aforementioned issues and achieves state-of-the-art results. |
Guy Erez; Ron Shapira Weber; Oren Freifeld; |
1448 | GraphVid: It Only Takes A Few Nodes to Understand A Video Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a concise representation of videos that encodes perceptually meaningful features into graphs. |
Eitan Kosman; Dotan Di Castro; |
1449 | Delta Distillation for Efficient Video Processing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper aims to accelerate video stream processing, such as object detection and semantic segmentation, by leveraging the temporal redundancies that exist between video frames. |
Amirhossein Habibian; Haitam Ben Yahia; Davide Abati; Efstratios Gavves; Fatih Porikli; |
1450 | MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, whether it is possible to build a generic MLP-Like architecture in the video domain has not been explored, due to complex spatial-temporal modeling with a large computation burden. To fill this gap, we present an efficient self-attention free backbone, namely MorphMLP, which flexibly leverages the concise Fully-Connected (FC) layer for video representation learning. |
David Junhao Zhang; Kunchang Li; Yali Wang; Yunpeng Chen; Shashwat Chandra; Yu Qiao; Luoqi Liu; Mike Zheng Shou; |
1451 | COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose COMPOSER, a Multiscale Transformer based architecture that performs attention-based reasoning over tokens at each scale and learns group activity compositionally. |
Honglu Zhou; Asim Kadav; Aviv Shamsian; Shijie Geng; Farley Lai; Long Zhao; Ting Liu; Mubbasir Kapadia; Hans Peter Graf; |
1452 | E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose E-NeRV, which dramatically expedites NeRV by decomposing the image-wise implicit neural representation into separate spatial and temporal contexts. |
Zizhang Li; Mengmeng Wang; Huaijin Pi; Kechun Xu; Jianbiao Mei; Yong Liu; |
1453 | TDViT: Temporal Dilated Video Transformer for Dense Video Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, these models are expensive to deploy, less effective when handling redundant frames, and struggle to capture long-range temporal correlations. To overcome these issues, we propose a Temporal Dilated Video Transformer (TDViT) that consists of carefully-designed temporal dilated transformer blocks (TDTB). |
Guanxiong Sun; Yang Hua; Guosheng Hu; Neil Robertson; |
1454 | Semi-Supervised Learning of Optical Flow By Flow Supervisor Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Specifically, we propose a flow supervisor for self-supervision, which consists of parameter separation and a student output connection. |
Woobin Im; Sebin Lee; Sung-Eui Yoon; |
1455 | Flow Graph to Video Grounding for Weakly-Supervised Multi-step Localization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we consider the problem of weakly-supervised multi-step localization in instructional videos. To this end, we introduce the new problem of flow graph to video grounding. |
Nikita Dvornik; Isma Hadji; Hai Pham; Dhaivat Bhatt; Brais Martinez; Afsaneh Fazly; Allan D. Jepson; |
1456 | Deep 360° Optical Flow Estimation Based on Multi-Projection Fusion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address the distortions of panoramic representations when applying convolutional neural networks, we propose a novel multi-projection fusion framework that fuses the optical flow predicted by the models trained using different projection methods. |
Yiheng Li; Connelly Barnes; Kun Huang; Fang-Lue Zhang; |
1457 | MaCLR: Motion-Aware Contrastive Learning of Representations for Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present MaCLR, a novel method to explicitly perform cross-modal self-supervised video representation learning from visual and motion modalities. |
Fanyi Xiao; Joseph Tighe; Davide Modolo; |
1458 | Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present SPELL, a novel spatial-temporal graph learning framework that can solve complex tasks such as ASD. |
Kyle Min; Sourya Roy; Subarna Tripathi; Tanaya Guha; Somdeb Majumdar; |
1459 | Frozen CLIP Models Are Efficient Video Learners Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present Efficient Video Learning (EVL) – an efficient framework for directly training high-quality video recognition models with frozen CLIP features. |
Ziyi Lin; Shijie Geng; Renrui Zhang; Peng Gao; Gerard de Melo; Xiaogang Wang; Jifeng Dai; Yu Qiao; Hongsheng Li; |
1460 | PIP: Physical Interaction Prediction Via Mental Simulation with Span Selection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: With these motivations, we propose a novel scheme: Physical Interaction Prediction via Mental Simulation with Span Selection (PIP). |
Jiafei Duan; Samson Yu; Soujanya Poria; Bihan Wen; Cheston Tan; |
1461 | Panoramic Vision Transformer for Saliency Detection in 360° Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a new framework named Panoramic Vision Transformer (PAVER). |
Heeseung Yun; Sehun Lee; Gunhee Kim; |
1462 | Bayesian Tracking of Video Graphs Using Joint Kalman Smoothing and Registration Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper develops a Kalman-smoothing method for estimating graphs from noisy, cluttered, and incomplete data. |
Aditi Basu Bal; Ramy Mounir; Sathyanarayanan Aakur; Sudeep Sarkar; Anuj Srivastava; |
1463 | Motion Sensitive Contrastive Learning for Self-Supervised Video Representation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose Motion Sensitive Contrastive Learning (MSCL) that injects the motion information captured by optical flows into RGB frames to strengthen feature learning. |
Jingcheng Ni; Nan Zhou; Jie Qin; Qian Wu; Junqi Liu; Boxun Li; Di Huang; |
1464 | Dynamic Temporal Filtering In Video Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a new recipe of temporal feature learning, namely Dynamic Temporal Filter (DTF), which performs spatial-aware temporal modeling in the frequency domain with a large temporal receptive field. |
Fuchen Long; Zhaofan Qiu; Yingwei Pan; Ting Yao; Chong-Wah Ngo; Tao Mei; |
1465 | Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a training-free adaption method for CLIP to conduct few-shot classification, termed as Tip-Adapter, which not only inherits the training-free advantage of zero-shot CLIP but also performs comparably to those training-required approaches. |
Renrui Zhang; Wei Zhang; Rongyao Fang; Peng Gao; Kunchang Li; Jifeng Dai; Yu Qiao; Hongsheng Li; |
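To make the training-free idea behind cache-style CLIP adapters concrete, here is a minimal sketch of the general mechanism (few-shot features stored as cache keys, one-hot labels as values, blended with zero-shot CLIP logits). All function and variable names, the `alpha`/`beta` hyperparameters, and the toy data are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def tip_adapter_logits(test_feat, cache_keys, cache_values, clip_weights,
                       alpha=1.0, beta=5.5):
    """Combine zero-shot CLIP logits with a training-free few-shot cache term.

    test_feat:    (d,) L2-normalized test image feature
    cache_keys:   (n_shots, d) L2-normalized few-shot image features
    cache_values: (n_shots, n_classes) one-hot labels of the few-shot images
    clip_weights: (d, n_classes) L2-normalized class text embeddings
    """
    # Zero-shot CLIP logits: cosine similarity with each class text embedding.
    clip_logits = test_feat @ clip_weights
    # Affinity of the test feature to each cached few-shot feature,
    # sharpened by an exponential activation.
    affinity = np.exp(-beta * (1.0 - test_feat @ cache_keys.T))
    # Cache logits: affinity-weighted sum of the one-hot labels.
    cache_logits = affinity @ cache_values
    # No gradient step is ever taken, hence "training-free".
    return clip_logits + alpha * cache_logits
```

A test feature identical to a cached class-0 shot receives a boost on class 0 from both the zero-shot term and the cache term, illustrating how the cache refines zero-shot predictions without any fine-tuning.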
1466 | Temporal Lift Pooling for Continuous Sign Language Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we derive temporal lift pooling (TLP) from the Lifting Scheme in signal processing to intelligently downsample features of different temporal hierarchies. |
Lianyu Hu; Liqing Gao; Zekang Liu; Wei Feng; |
1467 | MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, aiming at improving 3D dense captioning via capturing and utilizing the complex relations in the 3D scene, we propose MORE, a Multi-Order RElation mining model, to support generating more descriptive and comprehensive captions. |
Yang Jiao; Shaoxiang Chen; Zequn Jie; Jingjing Chen; Lin Ma; Yu-Gang Jiang; |
1468 | SiRi: A Simple Selective Retraining Mechanism for Transformer-Based Visual Grounding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we investigate how to achieve better referring visual grounding with modern vision-language transformers, and propose a simple yet powerful Selective Retraining (SiRi) mechanism. |
Mengxue Qu; Yu Wu; Wu Liu; Qiqi Gong; Xiaodan Liang; Olga Russakovsky; Yao Zhao; Yunchao Wei; |
1469 | Cross-Modal Prototype Driven Network for Radiology Report Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Here we propose a Cross-modal PROtotype driven NETwork (XPRONET) to promote cross-modal pattern learning and exploit it to improve the task of radiology report generation. |
Jun Wang; Abhir Bhalerao; Yulan He; |
1470 | TM2T: Stochastic and Tokenized Modeling for The Reciprocal Generation of 3D Human Motions and Texts Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Inspired by the strong ties between vision and language, the two intimate human sensing and communication modalities, our paper aims to explore the generation of 3D human full-body motions from texts, as well as its reciprocal task, shorthanded for text2motion and motion2text, respectively. |
Chuan Guo; Xinxin Zuo; Sen Wang; Li Cheng; |
1471 | SeqTR: A Simple Yet Universal Network for Visual Grounding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a simple yet universal network termed SeqTR for visual grounding tasks, e.g., phrase localization, referring expression comprehension (REC) and segmentation (RES). |
Chaoyang Zhu; Yiyi Zhou; Yunhang Shen; Gen Luo; Xingjia Pan; Mingbao Lin; Chao Chen; Liujuan Cao; Xiaoshuai Sun; Rongrong Ji; |
1472 | VTC: Improving Video-Text Retrieval with User Comments Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we a) introduce a new dataset of videos, titles and comments b) present an attention-based mechanism that allows the model to learn from sometimes irrelevant data such as comments c) show that by using comments, our method is able to learn better, more contextualised, representations for image, video and audio representations. |
Laura Hanu; James Thewlis; Yuki M. Asano; Christian Rupprecht; |
1473 | FashionViL: Fashion-Focused Vision-and-Language Representation Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel fashion-focused V+L representation learning framework, dubbed as FashionViL. |
Xiao Han; Licheng Yu; Xiatian Zhu; Li Zhang; Yi-Zhe Song; Tao Xiang; |
1474 | Weakly Supervised Grounding for VQA in Vision-Language Transformers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most systems that show good performance of those tasks still rely on pre-trained object detectors during training, which limits their applicability to the object classes available for those detectors. To mitigate this limitation, this paper focuses on the problem of weakly supervised grounding in the context of visual question answering in transformers. |
Aisha Urooj; Hilde Kuehne; Chuang Gan; Niels Da Vitoria Lobo; Mubarak Shah; |
1475 | Automatic Dense Annotation of Large-Vocabulary Sign Language Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we propose a simple, scalable framework to vastly increase the density of automatic annotations. |
Liliane Momeni; Hannah Bull; K R Prajwal; Samuel Albanie; Gül Varol; Andrew Zisserman; |
1476 | MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we for the first time investigate masked visual modeling in video-text pre-training with the dual-encoder architecture. |
Yuying Ge; Yixiao Ge; Xihui Liu; Jinpeng Wang; Jianping Wu; Ying Shan; Xiaohu Qie; Ping Luo; |
1477 | GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a new dataset called Kinetic-GEB+. |
Yuxuan Wang; Difei Gao; Licheng Yu; Weixian Lei; Matt Feiszli; Mike Zheng Shou; |
1478 | A Simple and Robust Correlation Filtering Method for Text-Based Person Search Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel end-to-end Simple and Robust Correlation Filtering (SRCF) method which can effectively extract key clues and adaptively align the discriminative features. |
Wei Suo; Mengyang Sun; Kai Niu; Yiqi Gao; Peng Wang; Yanning Zhang; Qi Wu; |
1479 | Making The Most of Text Semantics to Improve Biomedical Vision-Language Processing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we show that principled textual semantic modelling can substantially improve contrastive learning in self-supervised vision-language processing. |
Benedikt Boecking; Naoto Usuyama; Shruthi Bannur; Daniel C. Castro; Anton Schwaighofer; Stephanie Hyland; Maria Wetscherek; Tristan Naumann; Aditya Nori; Javier Alvarez-Valle; Hoifung Poon; Ozan Oktay; |
1480 | Generative Negative Text Replay for Continual Vision-Language Pretraining Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we focus on learning a VLP model with sequential data chunks of image-text pairs. |
Shipeng Yan; Lanqing Hong; Hang Xu; Jianhua Han; Tinne Tuytelaars; Zhenguo Li; Xuming He; |
1481 | Video Graph Transformer for Video Question Answering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a Video Graph Transformer (VGT) model for Video Question Answering (VideoQA). |
Junbin Xiao; Pan Zhou; Tat-Seng Chua; Shuicheng Yan; |
1482 | Trace Controlled Text to Image Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Motivated by this, we propose a Trace Controlled Text to Image Generation model (TCTIG), which takes trace as a bridge between semantic concepts and spatial conditions. |
Kun Yan; Lei Ji; Chenfei Wu; Jianmin Bao; Ming Zhou; Nan Duan; Shuai Ma; |
1483 | Video Question Answering with Iterative Video-Text Co-Tokenization Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel multi-stream video encoder for video question answering that uses multiple video inputs and a new video-text iterative co-tokenization approach to answer a variety of questions related to videos. |
AJ Piergiovanni; Kairo Morton; Weicheng Kuo; Michael S. Ryoo; Anelia Angelova; |
1484 | Rethinking Data Augmentation for Robust Visual Question Answering Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a new Knowledge Distillation based Data Augmentation for VQA, dubbed KDDAug. |
Long Chen; Yuhang Zheng; Jun Xiao; |
1485 | Explicit Image Caption Editing Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we introduce a new task: Explicit Caption Editing (ECE). To further facilitate ECE research, we propose two new ECE benchmarks by re-organizing two existing datasets, dubbed COCO-EE and Flickr30K-EE, respectively. |
Zhen Wang; Long Chen; Wenbo Ma; Guangxing Han; Yulei Niu; Jian Shao; Jun Xiao; |
1486 | Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, this paper proposes a novel training framework for grounding models to use shuffled videos to address temporal bias problem without losing grounding accuracy. |
Jiachang Hao; Haifeng Sun; Pengfei Ren; Jingyu Wang; Qi Qi; Jianxin Liao; |
1487 | Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we promote a problem formulation for reliable VQA, where we prefer abstention over providing an incorrect answer. |
Spencer Whitehead; Suzanne Petryk; Vedaad Shakib; Joseph Gonzalez; Trevor Darrell; Anna Rohrbach; Marcus Rohrbach; |
1488 | GRIT: Faster and Better Image Captioning Transformer Using Dual Visual Features Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper proposes a Transformer-only neural architecture, dubbed GRIT (Grid- and Region-based Image captioning Transformer), that effectively utilizes the two visual features to generate better captions. |
Van-Quang Nguyen; Masanori Suganuma; Takayuki Okatani; |
1489 | Selective Query-Guided Debiasing for Video Corpus Moment Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Although recent debiasing methods have focused on removing this retrieval bias, we argue that these biased predictions sometimes should be preserved because there are many queries where biased predictions are rather helpful. To conjugate this retrieval bias, we propose a Selective Query-guided Debiasing network (SQuiDNet), which incorporates the following two main properties: (1) Biased Moment Retrieval that intentionally uncovers the biased moments inherent in objects of the query and (2) Selective Query-guided Debiasing that performs selective debiasing guided by the meaning of the query. |
Sunjae Yoon; Ji Woo Hong; Eunseop Yoon; Dahyun Kim; Junyeong Kim; Hee Suk Yoon; Chang D. Yoo; |
1490 | Spatial and Visual Perspective-Taking Via View Rotation and Relation Reasoning for Embodied Reference Understanding Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a REasoning from your Perspective (REP) method to tackle the challenge by modeling relations between the receiver and the sender as well as the sender and the objects via the proposed novel view rotation and relation reasoning. |
Cheng Shi; Sibei Yang; |
1491 | Object-Centric Unsupervised Image Captioning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we explore the task of unsupervised image captioning which utilizes unpaired images and texts to train the model so that the texts can come from different sources than the images. |
Zihang Meng; David Yang; Xuefei Cao; Ashish Shah; Ser-Nam Lim; |
1492 | Contrastive Vision-Language Pre-training with Limited Resources Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we propose a stack of novel methods, which significantly cut down the heavy resource dependency and allow us to conduct dual-encoder multi-modal representation alignment with limited resources. |
Quan Cui; Boyan Zhou; Yu Guo; Weidong Yin; Hao Wu; Osamu Yoshie; Yubo Chen; |
1493 | Learning Linguistic Association Towards Efficient Text-Video Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a general framework, LINguistic ASsociation (LINAS), which utilizes the complementarity between captions corresponding to the same video. |
Sheng Fang; Shuhui Wang; Junbao Zhuo; Xinzhe Han; Qingming Huang; |
1494 | ASSISTER: Assistive Navigation Via Conditional Instruction Generation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a novel vision-and-language navigation (VLN) task of learning to provide real-time guidance to a blind follower situated in complex dynamic navigation scenarios. |
Zanming Huang; Zhongkai Shangguan; Jimuyang Zhang; Gilad Bar; Matthew Boyd; Eshed Ohn-Bar; |
1495 | X-DETR: A Versatile Architecture for Instance-Wise Vision-Language Tasks Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study the challenging instance-wise vision-language tasks, where the free-form language is required to align with the objects instead of the whole image. |
Zhaowei Cai; Gukyeong Kwon; Avinash Ravichandran; Erhan Bas; Zhuowen Tu; Rahul Bhotika; Stefano Soatto; |
1496 | Learning Disentanglement with Decoupled Labels for Vision-Language Navigation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, most methods only utilize the whole complex instruction or inaccurate sub-instructions due to the lack of accurate disentanglement as an intermediate supervision stage. To address this problem, we propose a new Disentanglement framework with Decoupled Labels (DDL) for VLN. |
Wenhao Cheng; Xingping Dong; Salman Khan; Jianbing Shen; |
1497 | Switch-BERT: Learning to Model Multimodal Interactions By Switching Attention and Input Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: They can achieve exceptional performances on specific tasks, but face a particularly challenging problem of modality mismatch because of diversity of input modalities and their fixed structures. In this paper, we present Switch-BERT for joint vision and language representation learning to address this problem. |
Qingpei Guo; Kaisheng Yao; Wei Chu; |
1498 | Word-Level Fine-Grained Story Visualization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Current works still struggle with output images’ quality and consistency, and rely on additional semantic information or auxiliary captioning networks. To address these challenges, we first introduce a new sentence representation, which incorporates word information from all story sentences to mitigate the inconsistency problem. Then, we propose a new discriminator with fusion features and further extend the spatial attention to improve image quality and story consistency. |
Bowen Li; |
1499 | Unifying Event Detection and Captioning As Sequence Generation Via Pre-training Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Besides, previous event detection methods normally ignore temporal dependencies between events, leading to event redundancy or inconsistency problems. To tackle the above two defects, in this paper we define event detection as a sequence generation task and propose a unified pre-training and fine-tuning framework to naturally enhance the inter-task association between event detection and captioning. |
Qi Zhang; Yuqing Song; Qin Jin; |
1500 | Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Considering a single fixed-length vector is often insufficient to capture long-term temporal context, in this paper, we introduce Multimodal Transformer with Variable-length Memory (MTVM) for visually-grounded natural language navigation by modeling the temporal context explicitly. |
Chuang Lin; Yi Jiang; Jianfei Cai; Lizhen Qu; Gholamreza Haffari; Zehuan Yuan; |
1501 | Fine-Grained Visual Entailment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Visual entailment is a recently proposed multimodal reasoning task where the goal is to predict the logical relationship of a piece of text to an image. In this paper, we propose an extension of this task, where the goal is to predict the logical relationship of fine-grained knowledge elements within a piece of text to an image. |
Christopher Thomas; Yipeng Zhang; Shih-Fu Chang; |
1502 | Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a language grounding model that attends on the referential utterance and on the object proposal pool computed from a pre-trained detector to decode referenced objects with a detection head, without selecting them from the pool. |
Ayush Jain; Nikolaos Gkanatsios; Ishita Mediratta; Katerina Fragkiadaki; |
1503 | New Datasets and Models for Contextual Reasoning in Visual Dialog Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we focus on developing new datasets and models to highlight the role of contextual reasoning in VD. |
Yifeng Zhang; Ming Jiang; Qi Zhao; |
1504 | VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis Via Speech-Visage Feature Selection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: The goal of this work is to reconstruct speech from a silent talking face video. |
Joanna Hong; Minsu Kim; Yong Man Ro; |
1505 | Classification-Regression for Chart Comprehension Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Most existing CQA datasets and models are based on simplifying assumptions that often enable surpassing human performance. In this work, we address this outcome and propose a new model that jointly learns classification and regression. |
Matan Levy; Rami Ben-Ari; Dani Lischinski; |
1506 | AssistQ: Affordance-Centric Question-Driven Task Completion for Egocentric Assistant Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we define a new task called Affordance-centric Question-driven Task Completion, where the AI assistant should learn from instructional videos to provide step-by-step help in the user’s view. |
Benita Wong; Joya Chen; You Wu; Stan Weixian Lei; Dongxing Mao; Difei Gao; Mike Zheng Shou; |
1507 | FindIt: Generalized Localization with Natural Language Queries Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose FindIt, a simple and versatile framework that unifies a variety of visual grounding and localization tasks including referring expression comprehension, text-based localization, and object detection. |
Weicheng Kuo; Fred Bertsch; Wei Li; AJ Piergiovanni; Mohammad Saffar; Anelia Angelova; |
1508 | UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose UniTAB that Unifies Text And Box outputs for grounded vision-language (VL) modeling. |
Zhengyuan Yang; Zhe Gan; Jianfeng Wang; Xiaowei Hu; Faisal Ahmed; Zicheng Liu; Yumao Lu; Lijuan Wang; |
1509 | Scaling Open-Vocabulary Image Segmentation with Image-Level Labels Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We argue that these models miss an important step of visual grouping, which organizes pixels into groups before learning visual-semantic alignments. We propose OpenSeg to address the above issue while still making use of scalable image-level supervision of captions. |
Golnaz Ghiasi; Xiuye Gu; Yin Cui; Tsung-Yi Lin; |
1510 | The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents. |
Jack Hessel; Jena D. Hwang; Jae Sung Park; Rowan Zellers; Chandra Bhagavatula; Anna Rohrbach; Kate Saenko; Yejin Choi; |
1511 | Speaker-Adaptive Lip Reading with User-Dependent Padding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, to remedy the performance degradation of lip reading model on unseen speakers, we propose a speaker-adaptive lip reading method, namely user-dependent padding. |
Minsu Kim; Hyunjun Kim; Yong Man Ro; |
1512 | TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we conduct a study on the state-of-the-art methods for text-to-image synthesis and propose a framework to evaluate these methods. To overcome these issues, we propose a combined set of existing and new metrics to systematically evaluate the methods. |
Tan M. Dinh; Rang Nguyen; Binh-Son Hua; |
1513 | SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an effective technique for image augmentation by injecting contextually meaningful knowledge into the scenes. |
Morgan Heisler; Amin Banitalebi-Dehkordi; Yong Zhang; |
1514 | Referring Object Manipulation of Natural Images with Conditional Classifier-Free Guidance Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce the problem of referring object manipulation (ROM), which aims to generate photo-realistic image edits regarding two textual descriptions: 1) a text referring to an object in the input image and 2) a text describing how to manipulate the referred object. |
Myungsub Choi; |
1515 | NewsStories: Illustrating Articles with Visual Summaries Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, many tasks require reasoning about multiple images and long text narratives, such as describing news articles with visual summaries. Thus, we explore a novel setting where the goal is to learn a self-supervised visual-language representation that is robust to varying text length and the number of images. |
Reuben Tan; Bryan A. Plummer; Kate Saenko; JP Lewis; Avneesh Sud; Thomas Leung; |
1516 | Webly Supervised Concept Expansion for General Purpose Vision Models Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work presents an effective and inexpensive alternative: learn skills from supervised datasets, learn concepts from web image search, and leverage a key characteristic of GPVs — the ability to transfer visual knowledge across skills. |
Amita Kamath; Christopher Clark; Tanmay Gupta; Eric Kolve; Derek Hoiem; Aniruddha Kembhavi; |
1517 | FedVLN: Privacy-Preserving Federated Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we introduce privacy-preserving embodied agent learning for the task of Vision-and-Language Navigation (VLN), where an embodied agent navigates house environments by following natural language instructions. |
Kaiwen Zhou; Xin Eric Wang; |
1518 | CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose our novel Coupled Diversity-Sensitive Momentum Constrastive Learning (CODER) for improving cross-modal representation. |
Haoran Wang; Dongliang He; Wenhao Wu; Boyang Xia; Min Yang; Fu Li; Yunlong Yu; Zhong Ji; Errui Ding; Jingdong Wang; |
1519 | Language-Driven Artistic Style Transfer Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce a new task—language-driven artistic style transfer (LDAST)—to manipulate the style of a content image, guided by a text. |
Tsu-Jui Fu; Xin Eric Wang; William Yang Wang; |
1520 | Single-Stream Multi-level Alignment for Vision-Language Pretraining Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a single stream architecture that aligns images and language at multiple levels: global, fine-grained patch-token, and conceptual/semantic, using two novel tasks: symmetric cross-modality reconstruction (XMM) and a pseudo-labeled key word prediction (PSL). |
Zaid Khan; Vijay Kumar B G; Xiang Yu; Samuel Schulter; Manmohan Chandraker; Yun Fu; |
1521 | Most and Least Retrievable Images in Visual-Language Query Systems Related Papers Related Patents Related Grants Related Orgs Related Experts View Abstract: This is the first work to introduce the Most Retrievable Image (MRI) and Least Retrievable Image (LRI) concepts in modern text-to-image retrieval systems. An MRI is associated with … |
Liuwan Zhu; Rui Ning; Jiang Li; Chunsheng Xin; Hongyi Wu; |
1522 | Sports Video Analysis on Large-Scale Data Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: There are several major reasons: (1) the used dataset is collected from non-official providers, which naturally creates a gap between models trained on those datasets and real-world applications; (2) previously proposed methods require extensive annotation efforts (i.e., player and ball segmentation at pixel level) on localizing useful visual features to yield acceptable results; (3) very few public datasets are available. In this paper, we propose a novel large-scale NBA dataset for Sports Video Analysis (NSVA) with a focus on captioning, to address the above challenges. |
Dekun Wu; He Zhao; Xingce Bao; Richard P. Wildes; |
1523 | Grounding Visual Representations with Texts for Domain Generalization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we advocate for leveraging natural language supervision for the domain generalization task. |
Seonwoo Min; Nokyung Park; Siwon Kim; Seunghyun Park; Jinkyu Kim; |
1524 | Bridging The Visual Semantic Gap in VLN Via Semantically Richer Instructions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To encourage a more suitable use of the visual information, we propose a new data augmentation method that fosters the inclusion of more explicit visual information in the generation of textual navigational instructions. |
Joaquín Ossandón; Benjamín Earle; Alvaro Soto; |
1525 | StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Hence, we first propose the task of story continuation, where the generated visual story is conditioned on a source image, allowing for better generalization to narratives with new characters. Then, we enhance or ‘retro-fit’ the pretrained text-to-image synthesis models with task-specific modules for (a) sequential image generation and (b) copying relevant elements from an initial frame. |
Adyasha Maharana; Darryl Hannan; Mohit Bansal; |
1526 | VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Current methods rely heavily on training to a specific domain (e.g., only faces), manual work or algorithm tuning for latent vector discovery, and manual effort in mask selection to alter only a part of an image. We address all of these usability constraints while producing images of high visual and semantic quality through a unique combination of OpenAI’s CLIP (Radford et al., 2021), VQGAN (Esser et al., 2021), and a generation augmentation strategy to produce VQGAN-CLIP. |
Katherine Crowson; Stella Biderman; Daniel Kornis; Dashiell Stander; Eric Hallahan; Louis Castricato; Edward Raff; |
1527 | Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose Semantic-aware Speaking Portrait NeRF (SSP-NeRF), which creates delicate audio-driven portraits using one unified set of NeRF. |
Xian Liu; Yinghao Xu; Qianyi Wu; Hang Zhou; Wayne Wu; Bolei Zhou; |
1528 | End-to-End Active Speaker Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an end-to-end ASD workflow where feature learning and contextual predictions are jointly learned. |
Juan León Alcázar; Moritz Cordes; Chen Zhao; Bernard Ghanem;
1529 | Emotion Recognition for Multiple Context Awareness Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To alleviate these issues, we present a context-aware emotion recognition framework that combines four complementary contexts. |
Dingkang Yang; Shuai Huang; Shunli Wang; Yang Liu; Peng Zhai; Liuzhen Su; Mingcheng Li; Lihua Zhang; |
1530 | Adaptive Fine-Grained Sketch-Based Image Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we offer a novel perspective — instead of asking for a model that generalises, we advocate for one that quickly adapts, with just very few samples during testing (in a few-shot manner). |
Ayan Kumar Bhunia; Aneeshan Sain; Parth Hiren Shah; Animesh Gupta; Pinaki Nath Chowdhury; Tao Xiang; Yi-Zhe Song; |
1531 | Quantized GAN for Complex Music Generation from Dance Videos Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates complex musical samples conditioned on dance videos. |
Ye Zhu; Kyle Olszewski; Yu Wu; Panos Achlioptas; Menglei Chai; Yan Yan; Sergey Tulyakov; |
1532 | Uncertainty-Aware Multi-modal Learning Via Cross-Modal Random Network Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new Uncertainty-aware Multi-modal Learner that estimates uncertainty by measuring feature density via Cross-modal Random Network Prediction (CRNP). |
Hu Wang; Jianpeng Zhang; Yuanhong Chen; Congbo Ma; Jodie Avery; Louise Hull; Gustavo Carneiro; |
1533 | Localizing Visual Sounds The Easy Way Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a simple yet effective approach for Easy Visual Sound Localization, namely EZ-VSL, without relying on the construction of positive and/or negative regions during training. |
Shentong Mo; Pedro Morgado; |
1534 | Learning Visual Styles from Audio-Visual Associations Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a method for learning visual styles from unlabeled audio-visual data. |
Tingle Li; Yichen Liu; Andrew Owens; Hang Zhao; |
1535 | Remote Respiration Monitoring of Moving Person Using Radio Signals Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this study, we examine the task of estimating the respiration signal of a non-stationary subject (a person with large body movements or even walking around) based on radio signals. |
Jae-Ho Choi; Ki-Bong Kang; Kyung-Tae Kim; |
1536 | Camera Pose Estimation and Localization with Active Audio Sensing Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we show how to estimate a device’s position and orientation indoors by echolocation, i.e., by interpreting the echoes of an audio signal that the device itself emits. |
Karren Yang; Michael Firman; Eric Brachmann; Clément Godard;
1537 | PACS: A Dataset for Physical Audiovisual Commonsense Reasoning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our paper takes a step towards real-world physical commonsense reasoning by contributing PACS: the first audiovisual benchmark annotated for physical commonsense attributes. |
Samuel Yu; Peter Wu; Paul Pu Liang; Ruslan Salakhutdinov; Louis-Philippe Morency; |
1538 | VoViT: Low Latency Graph-Based Audio-Visual Voice Separation Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper presents an audio-visual approach for voice separation which produces state-of-the-art results at a low latency in two scenarios: speech and singing voice. |
Juan F. Montesinos; Venkatesh S. Kadandale; Gloria Haro; |
1539 | Telepresence Video Quality Assessment Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Here we address the significant challenges of Telepresence Video Quality Assessment (TVQA) in several ways. |
Zhenqiang Ying; Deepti Ghadiyaram; Alan Bovik; |
1540 | MultiMAE: Multi-modal Multi-task Masked Autoencoders Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a pre-training strategy called Multi-modal Multi-task Masked Autoencoders (MultiMAE). |
Roman Bachmann; David Mizrahi; Andrei Atanov; Amir Zamir; |
1541 | AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce AudioScopeV2, a state-of-the-art universal audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos. |
Efthymios Tzinis; Scott Wisdom; Tal Remez; John R. Hershey; |
1542 | Audio-Visual Segmentation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To deal with the AVS problem, we propose a new method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process. |
Jinxing Zhou; Jianyuan Wang; Jiayi Zhang; Weixuan Sun; Jing Zhang; Stan Birchfield; Dan Guo; Lingpeng Kong; Meng Wang; Yiran Zhong; |
1543 | Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address this problem, we need to suppress the light effects in bright regions while, at the same time, boosting the intensity of dark regions. With this idea in mind, we introduce an unsupervised method that integrates a layer decomposition network and a light-effects suppression network. |
Yeying Jin; Wenhan Yang; Robby T. Tan; |
1544 | Relationformer: A Unified Framework for Image-to-Graph Generation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work proposes a unified one-stage transformer-based framework, namely Relationformer, that jointly predicts objects and their relations. |
Suprosanna Shit; Rajat Koner; Bastian Wittmann; Johannes Paetzold; Ivan Ezhov; Hongwei Li; Jiazhen Pan; Sahand Sharifzadeh; Georgios Kaissis; Volker Tresp; Bjoern Menze; |
1545 | GAMa: Cross-view Video Geo-localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we focus on ground videos instead of images, which provide additional contextual cues that are important for this task. |
Shruti Vyas; Chen Chen; Mubarak Shah; |
1546 | Revisiting A KNN-based Image Classification System with High-capacity Storage Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we investigate a system that stores knowledge for image classification, such as image feature maps, labels, and original images, not in model parameters but in external storage. |
Kengo Nakata; Youyang Ng; Daisuke Miyashita; Asuka Maki; Yu-Chieh Lin; Jun Deguchi; |
1547 | Geometric Representation Learning for Document Image Rectification Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To this end, we present DocGeoNet for document image rectification by introducing explicit geometric representation. |
Hao Feng; Wengang Zhou; Jiajun Deng; Yuechen Wang; Houqiang Li; |
1548 | S2-VER: Semi-Supervised Visual Emotion Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Both of these would induce suboptimal performance of the learned model. To address these issues, we propose S2-VER, the first SSL algorithm for VER, which consists of two components. |
Guoli Jia; Jufeng Yang; |
1549 | Image Coding for Machines with Omnipotent Feature Learning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we attempt to develop an ICM framework by learning universal features while also considering compression. |
Ruoyu Feng; Xin Jin; Zongyu Guo; Runsen Feng; Yixin Gao; Tianyu He; Zhizheng Zhang; Simeng Sun; Zhibo Chen; |
1550 | Feature Representation Learning for Unsupervised Cross-Domain Image Retrieval Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we investigate the unsupervised cross-domain image retrieval task, where class labels and pairing annotations are no longer a prerequisite for training. |
Conghui Hu; Gim Hee Lee; |
1551 | Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we focus on joint human fashion segmentation and attribute recognition. |
Shilin Xu; Xiangtai Li; Jingbo Wang; Guangliang Cheng; Yunhai Tong; Dacheng Tao; |
1552 | Semantic-Guided Multi-Mask Image Harmonization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel way to edit the inharmonious images by predicting a series of operator masks. |
Xuqian Ren; Yifan Liu; |
1553 | Learning An Isometric Surface Parameterization for Texture Unwrapping Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present a novel approach to learn texture mapping for an isometrically deformed 3D surface and apply it for texture unwrapping of documents or other objects. |
Sagnik Das; Ke Ma; Zhixin Shu; Dimitris Samaras; |
1554 | Towards Regression-Free Neural Networks for Diverse Compute Platforms Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We introduce REGression constrained Neural Architecture Search (REG-NAS) to design a family of highly accurate models that engender fewer negative flips. |
Rahul Duggal; Hao Zhou; Shuo Yang; Jun Fang; Yuanjun Xiong; Wei Xia; |
1555 | Relationship Spatialization for Depth Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To this end, we strive to spatialize the relationships by devising a novel learning-based framework. |
Xiaoyu Xu; Jiayan Qiu; Xinchao Wang; Zhou Wang; |
1556 | Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Our paper explores the potential of transferring 2D model architectures and weights to understand 3D point-clouds, by empirically investigating the feasibility of the transfer, the benefits of the transfer, and shedding light on why the transfer works. |
Chenfeng Xu; Shijia Yang; Tomer Galanti; Bichen Wu; Xiangyu Yue; Bohan Zhai; Wei Zhan; Peter Vajda; Kurt Keutzer; Masayoshi Tomizuka; |
1557 | FAR: Fourier Aerial Video Recognition Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a method, Fourier Activity Recognition (FAR), for UAV video activity recognition. |
Divya Kothandaraman; Tianrui Guan; Xijun Wang; Shuowen Hu; Ming Lin; Dinesh Manocha; |
1558 | Translating A Visual LEGO Manual to A Machine-Executable Plan Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This task poses the challenge of establishing a 2D-3D correspondence between the manual image and the real 3D object, and 3D pose estimation for unseen 3D objects, since a new component to be added in a step can be an object built from previous steps. To address these two challenges, we present a novel learning-based framework, the Manual-to-Executable-Plan Network (MEPNet), which reconstructs the assembly steps from a sequence of manual images. |
Ruocheng Wang; Yunzhi Zhang; Jiayuan Mao; Chin-Yi Cheng; Jiajun Wu; |
1559 | Fabric Material Recovery from Video Using Multi-Scale Geometric Auto-Encoder Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose an end-to-end network model that uses video input to estimate the fabric materials of the garment worn by a human or an avatar in a virtual world. |
Junbang Liang; Ming Lin; |
1560 | MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose MegBA, a GPU-based distributed BA library. |
Jie Ren; Wenteng Liang; Ran Yan; Luo Mai; Shiwen Liu; Xiao Liu; |
1561 | The One Where They Reconstructed 3D Humans and Environments in TV Shows Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the majority of the existing work focuses on 2D recognition tasks. In this paper, we make the observation that there is a certain persistence in TV shows, i.e., repetition of the environments and the humans, which makes possible the 3D reconstruction of this content. |
Georgios Pavlakos; Ethan Weber; Matthew Tancik; Angjoo Kanazawa; |
1562 | TALISMAN: Targeted Active Learning for Object Detection with Rare Classes and Slices Using Submodular Mutual Information Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose TALISMAN, a novel framework for Targeted Active Learning for object detectIon with rare slices using Submodular MutuAl iNformation. |
Suraj Kothawade; Saikat Ghosh; Sumit Shekhar; Yu Xiang; Rishabh Iyer; |
1563 | An Efficient Person Clustering Algorithm for Open Checkout-Free Groceries Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Motivated by unique challenges in the open checkout-free grocery, we propose an efficient and effective person clustering method. |
Junde Wu; Yu Zhang; Rao Fu; Yuanpei Liu; Jing Gao; |
1564 | POP: Mining POtential Performance of New Fashion Products Via Webly Cross-Modal Query Expansion Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a data-centric pipeline able to generate exogenous observation data for the New Fashion Product Performance Forecasting (NFPPF) problem, i.e., predicting the performance of a brand-new clothing probe with no available past observations. |
Christian Joppi; Geri Skenderi; Marco Cristani; |
1565 | Pose Forecasting in Industrial Human-Robot Collaboration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Pushing back the frontiers of collaborative robots in industrial environments, we propose a new Separable-Sparse Graph Convolutional Network (SeS-GCN) for pose forecasting. As a second contribution, we present a new benchmark of Cobots and Humans in Industrial COllaboration (CHICO). |
Alessio Sampieri; Guido Maria D’Amely di Melendugno; Andrea Avogaro; Federico Cunico; Francesco Setti; Geri Skenderi; Marco Cristani; Fabio Galasso; |
1566 | Actor-Centered Representations for Action Localization in Streaming Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a framework driven by the notion of hierarchical predictive learning to construct actor-centered features by attention-based contextualization. |
Sathyanarayanan Aakur; Sudeep Sarkar; |
1567 | Bandwidth-Aware Adaptive Codec for DNN Inference Offloading in IoT Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This paper presents AutoJPEG, a bandwidth-aware adaptive compression solution that learns the JPEG encoding parameters to optimize the DNN inference accuracy under bandwidth constraints. |
Xiufeng Xie; Ning Zhou; Wentao Zhu; Ji Liu; |
1568 | Domain Knowledge-Informed Self-Supervised Representations for Workout Form Assessment Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To aggravate the problem, the errors to be detected in the workouts are very subtle. To that end, we propose to learn exercise-oriented image and video representations from unlabeled samples such that a small dataset annotated by experts suffices for supervised error detection. |
Paritosh Parmar; Amol Gharat; Helge Rhodin; |
1569 | Responsive Listening Head Generation: A Benchmark Dataset and Baseline Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a novel dataset ViCo, highlighting the listening head generation during a face-to-face conversation. We present a new listening head generation benchmark for synthesizing responsive feedback of a listener (e.g., nod, smile) during a face-to-face conversation. |
Mohan Zhou; Yalong Bai; Wei Zhang; Ting Yao; Tiejun Zhao; Tao Mei; |
1570 | Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation By Integrating IMU Motion Dynamics Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Although current methods have reached a high up-to-scale accuracy, they usually fail to learn the true scale metric due to the inherent scale ambiguity from training with monocular sequences. In this work, we tackle this problem and propose DynaDepth, a novel scale-aware framework that integrates information from vision and IMU motion dynamics. |
Sen Zhang; Jing Zhang; Dacheng Tao; |
1571 | TIPS: Text-Induced Pose Synthesis Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we first present the shortcomings of current pose transfer algorithms and then propose a novel text-based pose transfer technique to address those issues. |
Prasun Roy; Subhankar Ghosh; Saumik Bhattacharya; Umapada Pal; Michael Blumenstein; |
1572 | Addressing Heterogeneity in Federated Learning Via Distributional Transformation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel framework, called DisTrans, to improve FL performance (i.e., model accuracy) via train and test-time distributional transformations along with a double-input-channel model structure. |
Haolin Yuan; Bo Hui; Yuchen Yang; Philippe Burlina; Neil Zhenqiang Gong; Yinzhi Cao; |
1573 | Where in The World Is This Image? Transformer-Based Geo-Localization in The Wild Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we focus on developing an efficient solution to planet-scale single-image geo-localization. |
Shraman Pramanick; Ewa M. Nowara; Joshua Gleason; Carlos D. Castillo; Rama Chellappa; |
1574 | Colorization for In Situ Marine Plankton Images Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel deep networks-based vision system IsPlanktonCLR for automatic colorization of in situ marine plankton images. |
Guannan Guo; Qi Lin; Tao Chen; Zhenghui Feng; Zheng Wang; Jianping Li; |
1575 | Efficient Deep Visual and Inertial Odometry with Adaptive Visual Modality Selection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose an adaptive deep-learning based VIO method that reduces computational redundancy by opportunistically disabling the visual modality. |
Mingyu Yang; Yu Chen; Hun-Seok Kim; |
1576 | A Sketch Is Worth A Thousand Words: Image Retrieval with Text and Sketch Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present TASK-former (Text And SKetch transformer), an end-to-end trainable model for image retrieval using a text description and a sketch as input. |
Patsorn Sangkloy; Wittawat Jitkrittum; Diyi Yang; James Hays; |
1577 | A Cloud 3D Dataset and Application-Specific Learned Image Compression in Cloud 3D Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This paper explores computation time reduction techniques for learned image compression to make it more suitable for cloud 3D. |
Tianyi Liu; Sen He; Vinodh Kumaran Jayakumar; Wei Wang; |
1578 | AutoTransition: Learning to Recommend Video Transition Effects Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present the premier work on performing automatic video transitions recommendation (VTR): given a sequence of raw video shots and companion audio, recommend video transitions for each pair of neighboring shots. |
Yaojie Shen; Libo Zhang; Kai Xu; Xiaojie Jin; |
1579 | Online Segmentation of LiDAR Sequences: Dataset and Algorithm Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To address this issue, we first introduce HelixNet, a 10 billion point dataset with fine-grained labels, timestamps, and sensor rotation information necessary to accurately assess the real-time readiness of segmentation algorithms. Second, we propose Helix4D, a compact and efficient spatio-temporal transformer architecture specifically designed for rotating LiDAR sequences. |
Romain Loiseau; Mathieu Aubry; Loïc Landrieu;
1580 | Open-World Semantic Segmentation for LIDAR Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a REdundAncy cLassifier (REAL) framework to provide a general architecture for both open-set semantic segmentation and incremental learning. |
Jun Cen; Peng Yun; Shiwei Zhang; Junhao Cai; Di Luan; Mingqian Tang; Ming Liu; Michael Yu Wang; |
1581 | KING: Generating Safety-Critical Driving Scenarios for Robust Imitation Via Kinematics Gradients Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we study this approach to safety-critical driving scenario generation using the CARLA simulator. |
Niklas Hanselmann; Katrin Renz; Kashyap Chitta; Apratim Bhattacharyya; Andreas Geiger; |
1582 | Differentiable Raycasting for Self-Supervised Occupancy Forecasting Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we use geometric occupancy as a natural alternative to view-dependent representations such as freespace. |
Tarasha Khurana; Peiyun Hu; Achal Dave; Jason Ziglar; David Held; Deva Ramanan; |
1583 | InAction: Interpretable Action Decision Making for Autonomous Driving Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel Interpretable Action decision making (InAction) model to provide an enriched explanation from both explicit human annotation and implicit visual semantics. |
Taotao Jing; Haifeng Xia; Renran Tian; Haoran Ding; Xiao Luo; Joshua Domeyer; Rini Sherony; Zhengming Ding; |
1584 | CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose the camera-radar matching network CramNet, an efficient approach to fuse the sensor readings from camera and radar in a joint 3D space. |
Jyh-Jing Hwang; Henrik Kretzschmar; Joshua Manela; Sean Rafferty; Nicholas Armstrong-Crews; Tiffany Chen; Dragomir Anguelov; |
1585 | CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: One main reason that impedes the development of truly reliable self-driving systems is the lack of public datasets for evaluating the performance of object detectors on corner cases. Hence, we introduce a challenging dataset named CODA that exposes this critical problem of vision-based detectors. |
Kaican Li; Kai Chen; Haoyu Wang; Lanqing Hong; Chaoqiang Ye; Jianhua Han; Yukuai Chen; Wei Zhang; Chunjing Xu; Dit-Yan Yeung; Xiaodan Liang; Zhenguo Li; Hang Xu; |
1586 | Motion Inspired Unsupervised Perception and Prediction in Autonomous Driving Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: To address this difficulty, this paper pioneers a novel and challenging direction, i.e., training perception and prediction models to understand open-set moving objects, with no human supervision. |
Mahyar Najibi; Jingwei Ji; Yin Zhou; Charles R. Qi; Xinchen Yan; Scott Ettinger; Dragomir Anguelov; |
1587 | StretchBEV: Stretching Future Instance Prediction Spatially and Temporally Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, the quality of future predictions degrades over time while extending to longer time horizons due to multiple plausible predictions. In this work, we address this inherent uncertainty in future predictions with a stochastic temporal model. |
Adil Kaan Akan; Fatma Güney;
1588 | RCLane: Relay Chain Prediction for Lane Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we present a new method for lane detection based on relay chain prediction. |
Shenghua Xu; Xinyue Cai; Bin Zhao; Li Zhang; Hang Xu; Yanwei Fu; Xiangyang Xue; |
1589 | Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes Via Cross-Modal Distillation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: This work investigates learning pixel-wise semantic image segmentation in urban scenes without any manual annotation, just from the raw non-curated data collected by cars which, equipped with cameras and LiDAR sensors, drive around a city. |
Antonin Vobecky; David Hurych; Oriane Siméoni; Spyros Gidaris; Andrei Bursuc; Patrick Pérez; Josef Sivic;
1590 | CenterFormer: Center-based Transformer for 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose CenterFormer, a center-based transformer network for 3D object detection. |
Zixiang Zhou; Xiangchen Zhao; Yu Wang; Panqu Wang; Hassan Foroosh; |
1591 | Physical Attack on Monocular Depth Estimation with Optimal Adversarial Patches Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we develop an attack against learning-based MDE. |
Zhiyuan Cheng; James Liang; Hongjun Choi; Guanhong Tao; Zhiwen Cao; Dongfang Liu; Xiangyu Zhang; |
1592 | ST-P3: End-to-End Vision-Based Autonomous Driving Via Spatial-Temporal Feature Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In particular, we propose a spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously, which is called ST-P3. |
Shengchao Hu; Li Chen; Penghao Wu; Hongyang Li; Junchi Yan; Dacheng Tao; |
1593 | PersFormer: 3D Lane Detection Via Perspective Transformer and The OpenLane Benchmark Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Previous work struggled in complex cases due to their simple designs of the spatial transformation between front view and bird’s eye view (BEV) and the lack of a realistic dataset. To address these issues, we present PersFormer: an end-to-end monocular 3D lane detector with a novel Transformer-based spatial feature transformation module. |
Li Chen; Chonghao Sima; Yang Li; Zehan Zheng; Jiajie Xu; Xiangwei Geng; Hongyang Li; Conghui He; Jianping Shi; Yu Qiao; Junchi Yan; |
1594 | PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, previous methods often fail to counteract particular regions related to dynamic objects with more severe environmental changes. To mitigate this issue, we propose to incorporate an auxiliary point-selective network into a meta-learning framework, called PointFix, to provide a robust initialization of stereo models for online stereo adaptation. |
Kwonyoung Kim; Jungin Park; Jiyoung Lee; Dongbo Min; Kwanghoon Sohn; |
1595 | BRNet: Exploring Comprehensive Features for Monocular Depth Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we address the comprehensive feature representation problem for self-supervised depth estimation, i.e., paying attention to both local and global feature representations. |
Wencheng Han; Junbo Yin; Xiaogang Jin; Xiangdong Dai; Jianbing Shen; |
1596 | SiamDoGe: Domain Generalizable Semantic Segmentation Using Siamese Network Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel domain generalizable semantic segmentation method, “SiamDoGe”, which builds upon a DR approach without using auxiliary domains and employs a Siamese architecture to learn domain-agnostic features from the training dataset. |
Zhenyao Wu; Xinyi Wu; Xiaoping Zhang; Lili Ju; Song Wang; |
1597 | Context-Aware Streaming Perception in Dynamic Environments Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose to maximize streaming accuracy for every environment context. |
Gur-Eyal Sela; Ionel Gog; Justin Wong; Kumar Krishna Agrawal; Xiangxi Mo; Sukrit Kalra; Peter Schafhalter; Eric Leong; Xin Wang; Bharathan Balaji; Joseph Gonzalez; Ion Stoica; |
1598 | SpOT: Spatiotemporal Modeling for 3D Object Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we develop a holistic representation of traffic scenes that leverages both spatial and temporal information of the actors in the scene. |
Colton Stearns; Davis Rempe; Jie Li; Rareş Ambruş; Sergey Zakharov; Vitor Guizilini; Yanchao Yang; Leonidas J. Guibas; |
1599 | Multimodal Transformer for Automatic 3D Annotation and Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: To automate the annotation and facilitate the production of various customized datasets, we propose an end-to-end multimodal transformer (MTrans) autolabeler, which leverages both LiDAR scans and images to generate precise 3D box annotations from weak 2D bounding boxes. |
Chang Liu; Xiaoyan Qian; Binxiao Huang; Xiaojuan Qi; Edmund Lam; Siew-Chong Tan; Ngai Wong; |
1600 | Dynamic 3D Scene Analysis By Point Cloud Accumulation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In the present paper, we explore multi-frame point cloud accumulation as a mid-level representation of 3D scan sequences, and develop a method that exploits inductive biases of outdoor street scenes, including their geometric layout and object-level rigidity. |
Shengyu Huang; Zan Gojcic; Jiahui Huang; Andreas Wieser; Konrad Schindler; |
1601 | Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a homogeneous multi-modal feature fusion and interaction method (HMFI) for 3D object detection. |
Xin Li; Botian Shi; Yuenan Hou; Xingjiao Wu; Tianlong Ma; Yikang Li; Liang He; |
1602 | JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we address these issues by proposing a novel joint perception framework named JPerceiver, which can simultaneously estimate scale-aware depth and VO as well as BEV layout from a monocular video sequence. |
Haimei Zhao; Jing Zhang; Sen Zhang; Dacheng Tao; |
1603 | Semi-Supervised 3D Object Detection with Proficient Teachers Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a new Pseudo-Labeling framework for semi-supervised 3D object detection, by enhancing the teacher model to a proficient one with several necessary designs. |
Junbo Yin; Jin Fang; Dingfu Zhou; Liangjun Zhang; Cheng-Zhong Xu; Jianbing Shen; Wenguan Wang; |
1604 | Point Cloud Compression with Sibling Context and Surface Priors Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We present a novel octree-based multi-level framework for large-scale point cloud compression, which can organize sparse and unstructured point clouds in a memory-efficient way. |
Zhili Chen; Zian Qian; Sukai Wang; Qifeng Chen; |
1605 | Lane Detection Transformer Based on Multi-Frame Horizontal and Vertical Attention and Visual Transformer Module Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a novel lane detection Transformer based on multi-frame input to regress the parameters of lanes under a lane shape model. |
Han Zhang; Yunchao Gu; Xinliang Wang; Junjun Pan; Minghui Wang; |
1606 | ProposalContrast: Unsupervised Pre-training for LiDAR-Based 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Considering region-level representations are more suitable for 3D object detection, we devise a new unsupervised point cloud pre-training framework, called ProposalContrast, that learns robust 3D representations by contrasting region proposals. |
Junbo Yin; Dingfu Zhou; Liangjun Zhang; Jin Fang; Cheng-Zhong Xu; Jianbing Shen; Wenguan Wang; |
1607 | PreTraM: Self-Supervised Pre-training Via Connecting Trajectory and Map Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose PreTraM, a self-supervised Pre-training scheme via connecting Trajectories and Maps for trajectory forecasting. |
Chenfeng Xu; Tian Li; Chen Tang; Lingfeng Sun; Kurt Keutzer; Masayoshi Tomizuka; Alireza Fathi; Wei Zhan; |
1608 | Master of All: Simultaneous Generalization of Urban-Scene Segmentation to All Adverse Weather Conditions Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Even then, they typically adapt to a single (specific) target domain. To remedy this, we propose a novel, fully test-time adaptation technique, named Master of ALL (MALL), for simultaneous generalization to multiple target domains. |
Nikhil Reddy; Abhinav Singhal; Abhishek Kumar; Mahsa Baktashmotlagh; Chetan Arora; |
1609 | LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Specifically, we leverage geometry patterns in outdoor scenes to have a heuristic pre-segmentation to reduce the manual labeling and jointly design the learning targets with the labeling process. |
Minghua Liu; Yin Zhou; Charles R. Qi; Boqing Gong; Hao Su; Dragomir Anguelov; |
1610 | Visual Cross-View Metric Localization with Dense Uncertainty Estimates Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: This work addresses visual cross-view metric localization for outdoor robotics. |
Zimin Xia; Olaf Booij; Marco Manfredi; Julian F. P. Kooij; |
1611 | V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we investigate the application of Vehicle-to-Everything (V2X) communication to improve the perception performance of autonomous vehicles. |
Runsheng Xu; Hao Xiang; Zhengzhong Tu; Xin Xia; Ming-Hsuan Yang; Jiaqi Ma; |
1612 | DevNet: Self-Supervised Monocular Depth Learning Via Density Volume Construction Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, they neither fully exploit the 3D point-wise geometric correspondences, nor effectively tackle the ambiguities in the photometric warping caused by occlusions or illumination inconsistency. To address these problems, this work proposes Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework, that can consider 3D spatial information, and exploit stronger geometric constraints among adjacent camera frustums. |
Kaichen Zhou; Lanqing Hong; Changhao Chen; Hang Xu; Chaoqiang Ye; Qingyong Hu; Zhenguo Li; |
1613 | Action-Based Contrastive Learning for Trajectory Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we address the problem of predicting future pedestrian trajectories in a first person view setting with a moving camera. |
Marah Halawa; Olaf Hellwich; Pia Bideau; |
1614 | Radatron: Accurate Detection Using Multi-Resolution Cascaded MIMO Radar Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we present Radatron, a system capable of accurate object detection using mmWave radar as a stand-alone sensor. |
Sohrab Madani; Jayden Guan; Waleed Ahmed; Saurabh Gupta; Haitham Hassanieh; |
1615 | LiDAR Distillation: Bridging The Beam-Induced Domain Gap for 3D Object Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose the LiDAR Distillation to bridge the domain gap induced by different LiDAR beams for 3D object detection. |
Yi Wei; Zibu Wei; Yongming Rao; Jiaxin Li; Jie Zhou; Jiwen Lu; |
1616 | Efficient Point Cloud Segmentation with Geometry-Aware Sparse Networks Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, these works fail to maintain the balance among performance, efficiency, and memory consumption, showing an inability to integrate sparsity and geometry appropriately. To address these issues, we propose the Geometry-aware Sparse Networks (GASN) by utilizing the sparsity and geometry of a point cloud in a single voxel representation. |
Maosheng Ye; Rui Wan; Shuangjie Xu; Tongyi Cao; Qifeng Chen; |
1617 | FH-Net: A Fast Hierarchical Network for Scene Flow Estimation on Real-World Point Clouds Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose a fast hierarchical network, FH-Net, which directly gets the key points flow through a lightweight Trans-flow layer utilizing the reliable local geometry prior, and optionally back-propagates the computed sparse flows through an inverse Trans-up layer to obtain hierarchical flows at different resolutions. |
Lihe Ding; Shaocong Dong; Tingfa Xu; Xinli Xu; Jie Wang; Jianan Li; |
1618 | SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Based on the key idea of DETR, this paper introduces an object-centric 3D object detection framework that operates on a limited number of 3D object queries instead of dense bounding box proposals followed by non-maximum suppression. |
Simon Doll; Richard Schulz; Lukas Schneider; Viviane Benzin; Markus Enzweiler; Hendrik P.A. Lensch; |
1619 | Pixel-Wise Energy-Biased Abstention Learning for Anomaly Segmentation on Complex Urban Driving Scenes Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a new anomaly segmentation method, named pixel-wise energy-biased abstention learning (PEBAL), that explores pixel-wise abstention learning (AL) with a model that learns an adaptive pixel-level anomaly class, and an energy-based model (EBM) that learns inlier pixel distribution. |
Yu Tian; Yuyuan Liu; Guansong Pang; Fengbei Liu; Yuanhong Chen; Gustavo Carneiro; |
1620 | Rethinking Closed-Loop Training for Autonomous Driving Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we present the first empirical study which analyzes the effects of different training benchmark designs on the success of learning agents, such as how to design traffic scenarios and scale training environments. |
Chris Zhang; Runsheng Guo; Wenyuan Zeng; Yuwen Xiong; Binbin Dai; Rui Hu; Mengye Ren; Raquel Urtasun; |
1621 | SLiDE: Self-Supervised LiDAR De-Snowing Through Reconstruction Difficulty Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Semantic segmentation with snow labels would be a straightforward solution for removing them, but it requires laborious point-wise annotation. To address this problem, we propose a novel self-supervised learning framework for snow points removal in LiDAR point clouds. |
Gwangtak Bae; Byungjun Kim; Seongyong Ahn; Jihong Min; Inwook Shim; |
1622 | Generative Meta-Adversarial Network for Unseen Object Navigation Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we focus on the problem of navigating to unseen objects in new environments only based on limited training knowledge. |
Sixian Zhang; Weijie Li; Xinhang Song; Yubing Bai; Shuqiang Jiang; |
1623 | Object Manipulation Via Visual Target Localization Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose Manipulation via Visual Object Location Estimation (m-VOLE), an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible, thus robustly aiding the task of manipulating these objects throughout the episode. |
Kiana Ehsani; Ali Farhadi; Aniruddha Kembhavi; Roozbeh Mottaghi; |
1624 | MoDA: Map Style Transfer for Self-Supervised Domain Adaptation of Embodied Agents Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We propose a domain adaptation method, MoDA, which adapts a pretrained embodied agent to a new, noisy environment without ground-truth supervision. |
Eun Sun Lee; Junho Kim; SangWon Park; Young Min Kim; |
1625 | Housekeep: Tidying Virtual Households Using Commonsense Reasoning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce Housekeep, a benchmark to evaluate commonsense reasoning in the home for embodied AI. |
Yash Kant; Arun Ramachandran; Sriram Yenamandra; Igor Gilitschenski; Dhruv Batra; Andrew Szot; Harsh Agrawal; |
1626 | Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: Commercial depth sensors usually generate noisy and missing depths, especially on specular and transparent objects, which poses critical issues to downstream depth or point cloud-based tasks. To mitigate this problem, we propose a powerful RGBD fusion network, SwinDRNet, for depth restoration. |
Qiyu Dai; Jiyao Zhang; Qiwei Li; Tianhao Wu; Hao Dong; Ziyuan Liu; Ping Tan; He Wang; |
1627 | Resolving Copycat Problems in Visual Imitation Learning Via Residual Action Prediction Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: However, people surprisingly find that imitation from observation histories sometimes performs worse than imitation from the most recent observation alone. In this paper, we explain this phenomenon from the perspective of information flow within the neural network. |
Chia-Chi Chuang; Donglin Yang; Chuan Wen; Yang Gao; |
1628 | OPD: Single-View 3D Openable Part Detection Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We address the task of predicting what parts of an object can open and how they move when they do so. |
Hanxiao Jiang; Yongsen Mao; Manolis Savva; Angel X. Chang; |
1629 | AirDet: Few-Shot Detection Without Fine-Tuning for Autonomous Exploration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We find that their major limitation is that the little but valuable information from a few support images is not fully exploited. To solve this problem, we propose a brand new architecture, AirDet, and surprisingly find that, by learning class-agnostic relation with the support images in all modules, including cross-scale object proposal network, shots aggregation module, and localization network, AirDet without fine-tuning achieves comparable or even better results than many fine-tuned methods, reaching up to 30-40% improvements. |
Bowen Li; Chen Wang; Pranay Reddy; Seungchan Kim; Sebastian Scherer; |
1630 | TransGrasp: Grasp Pose Estimation of A Category of Objects By Transferring Grasps from Only One Labeled Instance Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: However, most existing methods require exact 3D object models available beforehand or a large amount of grasp annotations for training. To avoid these problems, we propose TransGrasp, a category-level grasp pose estimation method that predicts grasp poses of a category of objects by labeling only one object instance. |
Hongtao Wen; Jianhang Yan; Wanli Peng; Yi Sun; |
1631 | StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we propose State-Action-Reward Transformer (StARformer) for visual RL, which explicitly models short-term state-action-reward representations (StAR-representations), essentially introducing a Markovian-like inductive bias to improve long-term modeling. |
Jinghuan Shang; Kumara Kahatapitiya; Xiang Li; Michael S. Ryoo; |
1632 | TIDEE: Tidying Up Novel Rooms Using Visuo-Semantic Commonsense Priors Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We introduce TIDEE, an embodied agent that tidies up a disordered scene based on learned commonsense object placement and room arrangement priors. |
Gabriel Sarch; Zhaoyuan Fang; Adam W. Harley; Paul Schydlo; Michael J. Tarr; Saurabh Gupta; Katerina Fragkiadaki; |
1633 | Learning Efficient Multi-agent Cooperative Visual Exploration Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper, we propose a novel RL-based multi-agent planning module, Multi-agent Spatial Planner (MSP). |
Chao Yu; Xinyi Yang; Jiaxuan Gao; Huazhong Yang; Yu Wang; Yi Wu; |
1634 | Zero-Shot Category-Level Object Pose Estimation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this paper we tackle the problem of estimating the pose of novel object categories in a zero-shot manner. |
Walter Goodwin; Sagar Vaze; Ioannis Havoutis; Ingmar Posner; |
1635 | Sim-to-Real 6D Object Pose Estimation Via Iterative Self-Training for Robotic Bin Picking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose an iterative self-training framework for sim-to-real 6D object pose estimation to facilitate cost-effective robotic grasping. |
Kai Chen; Rui Cao; Stephen James; Yichuan Li; Yun-Hui Liu; Pieter Abbeel; Qi Dou; |
1636 | Active Audio-Visual Separation of Dynamic Sound Sources Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We propose a reinforcement learning agent equipped with a novel transformer memory that learns motion policies to control its camera and microphone to recover the dynamic target audio, using self-attention to make high-quality estimates for current timesteps and also simultaneously improve its past estimates. |
Sagnik Majumder; Kristen Grauman; |
1637 | DexMV: Imitation Learning for Dexterous Manipulation from Human Videos Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this paper, we propose a new platform and pipeline, DexMV (Dexterous Manipulation from Videos), for imitation learning to bridge the gap between computer vision and robot learning. |
Yuzhe Qin; Yueh-Hua Wu; Shaowei Liu; Hanwen Jiang; Ruihan Yang; Yang Fu; Xiaolong Wang; |
1638 | Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Despite sharing the high-level task and even the underlying instruction-path data, performance on VLN-CE lags behind VLN significantly. In this work, we explore this gap by transferring an agent from the abstract environment of VLN to the continuous environment of VLN-CE. |
Jacob Krantz; Stefan Lee; |
1639 | Style-Agnostic Reinforcement Learning Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: We present a novel method of learning style-agnostic representation using both style transfer and adversarial learning in the reinforcement learning framework. |
Juyong Lee; Seokjun Ahn; Jaesik Park; |
1640 | Self-Supervised Interactive Object Segmentation Through A Singulation-and-Grasping Approach Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Instance segmentation with unseen objects is a challenging problem in unstructured environments. To solve this problem, we propose a robot learning approach to actively interact with novel objects and collect each object’s training label for further fine-tuning to improve the segmentation model performance, while avoiding the time-consuming process of manually labeling a dataset. |
Houjian Yu; Changhyun Choi; |
1641 | Learning from Unlabeled 3D Environments for Vision-and-Language Navigation Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: In this work, we address the data scarcity issue by proposing to automatically create a large-scale VLN dataset from 900 unlabeled 3D buildings from HM3D. |
Shizhe Chen; Pierre-Louis Guhur; Makarand Tapaswi; Cordelia Schmid; Ivan Laptev; |
1642 | BodySLAM: Joint Camera Localisation, Mapping, and Human Motion Tracking Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: Many in-the-wild sequences of human motion are captured by a moving camera, which adds the complication of conflated camera and human motion to the estimation. We therefore present BodySLAM, a monocular SLAM system that jointly estimates the position, shape, and posture of human bodies, as well as the camera trajectory. |
Dorian F. Henning; Tristan Laidlow; Stefan Leutenegger; |
1643 | FusionVAE: A Deep Hierarchical Variational Autoencoder for RGB Image Fusion Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: We overcome this shortcoming by presenting a novel deep hierarchical variational autoencoder called FusionVAE that can serve as a basis for many fusion tasks. In order to assess the fusion capabilities of our model thoroughly, we created three novel datasets for image fusion based on popular computer vision datasets. |
Fabian Duffhauss; Ngo Anh Vien; Hanna Ziesche; Gerhard Neumann; |
1644 | Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning Related Papers Related Patents Related Grants Related Orgs Related Experts View Highlight: In this work, we follow the classicist’s call and propose a hybrid approach to improve systematic generalization in reasoning. |
Chi Zhang; Sirui Xie; Baoxiong Jia; Ying Nian Wu; Song-Chun Zhu; Yixin Zhu; |
1645 | Video Dialog As Conversation About Objects Living in Space-Time Related Papers Related Patents Related Grants Related Orgs Related Experts Related Code View Highlight: The task poses great visual, linguistic, and reasoning challenges that cannot be easily overcome without an appropriate representation scheme over video and dialog that supports high-level reasoning. To tackle these challenges we present a new object-centric framework for video dialog that supports neural reasoning dubbed COST – which stands for Conversation about Objects in Space-Time. |
Hoang-Anh Pham; Thao Minh Le; Vuong Le; Tu Minh Phuong; Truyen Tran; |