Paper Digest: CVPR 2020 Highlights
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) is one of the top computer vision conferences in the world. In 2020, it is to be held virtually due to the COVID-19 pandemic. There were more than 6,600 paper submissions, of which about 1,470 were accepted. More than 200 papers also published their code.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly grasp the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: Paper Digest: CVPR 2020 Highlights v0
# | Title | Authors | Highlight |
---|---|---|---|
1 | Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild | Shangzhe Wu; Christian Rupprecht; Andrea Vedaldi; | We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. |
2 | Footprints and Free Space From a Single Color Image | Jamie Watson; Michael Firman; Aron Monszpart; Gabriel J. Brostow; | We introduce a model to predict the geometry of both visible and occluded traversable surfaces, given a single RGB image as input. |
3 | Dynamic Fluid Surface Reconstruction Using Deep Neural Network | Simron Thapa; Nianyi Li; Jinwei Ye; | Here we present a learning-based single-image approach for 3D fluid surface reconstruction. |
4 | CvxNet: Learnable Convex Decomposition | Boyang Deng; Kyle Genova; Soroosh Yazdani; Sofien Bouaziz; Geoffrey Hinton; Andrea Tagliasacchi; | We introduce a network architecture to represent a low dimensional family of convexes. |
5 | BSP-Net: Generating Compact Meshes via Binary Space Partitioning | Zhiqin Chen; Andrea Tagliasacchi; Hao Zhang; | The core ingredient of BSP is an operation for recursive subdivision of space to obtain convex sets. By exploiting this property, we devise BSP-Net, a network that learns to represent a 3D shape via convex decomposition. |
6 | Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes From a Single Image | Yinyu Nie; Xiaoguang Han; Shihui Guo; Yujian Zheng; Jian Chang; Jian Jun Zhang; | In this paper, we bridge the gap between understanding and reconstruction, and propose an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image. |
7 | Generating and Exploiting Probabilistic Monocular Depth Estimates | Zhihao Xia; Patrick Sullivan; Ayan Chakrabarti; | We propose a versatile task-agnostic monocular model that outputs a probability distribution over scene depth given an input color image, as a sample approximation of outputs from a patch-wise conditional VAE. |
8 | Neural Cages for Detail-Preserving 3D Deformations | Wang Yifan; Noam Aigerman; Vladimir G. Kim; Siddhartha Chaudhuri; Olga Sorkine-Hornung; | We propose a novel learnable representation for detail-preserving shape deformation. |
9 | PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization | Shunsuke Saito; Tomas Simon; Jason Saragih; Hanbyul Joo; | Due to memory limitations in current hardware, previous approaches tend to take low resolution images as input to cover large spatial context, and produce less precise (or low resolution) 3D estimates as a result. We address this limitation by formulating a multi-level architecture that is end-to-end trainable. |
10 | A Lighting-Invariant Point Processor for Shading | Kathryn Heal; Jialiang Wang; Steven J. Gortler; Todd Zickler; | Under diffuse shading with unknown directional lighting, the quadratic surface shapes consistent with the intensity derivatives at a single image point form an algebraic variety; we describe the geometry of this variety, and we introduce a concise feedforward model that computes an explicit, differentiable approximation of the variety from the intensity and its derivatives at any single image point. |
11 | ActiveMoCap: Optimized Viewpoint Selection for Active Human Motion Capture | Sena Kiciroglu; Helge Rhodin; Sudipta N. Sinha; Mathieu Salzmann; Pascal Fua; | Given a short video sequence, we introduce an algorithm that predicts which viewpoints should be chosen to capture future frames so as to maximize 3D human pose estimation accuracy. |
12 | Peek-a-Boo: Occlusion Reasoning in Indoor Scenes With Plane Representations | Ziyu Jiang; Buyu Liu; Samuel Schulter; Zhangyang Wang; Manmohan Chandraker; | We address the challenging task of occlusion-aware indoor 3D scene understanding. |
13 | Multi-Modal Domain Adaptation for Fine-Grained Action Recognition | Jonathan Munro; Dima Damen; | In this work, we exploit the correspondence of modalities as a self-supervised alignment approach for unsupervised domain adaptation (UDA), in addition to adversarial alignment. |
14 | Evolving Losses for Unsupervised Video Representation Learning | AJ Piergiovanni; Anelia Angelova; Michael S. Ryoo; | We present a new method to learn video representations from large-scale unlabeled video data. |
15 | Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition | Ziyu Liu; Hongwen Zhang; Zhenghao Chen; Zhiyong Wang; Wanli Ouyang; | In this work, we present (1) a simple method to disentangle multi-scale graph convolutions and (2) a unified spatial-temporal graph convolutional operator named G3D. |
16 | A Multigrid Method for Efficiently Training Video Models | Chao-Yuan Wu; Ross Girshick; Kaiming He; Christoph Feichtenhofer; Philipp Krahenbuhl; | Inspired by multigrid methods in numerical optimization, we propose to use variable mini-batch shapes with different spatial-temporal resolutions that are varied according to a schedule. |
17 | Ego-Topo: Environment Affordances From Egocentric Video | Tushar Nagarajan; Yanghao Li; Christoph Feichtenhofer; Kristen Grauman; | We introduce a model for environment affordances that is learned directly from egocentric video. |
18 | Generative Hybrid Representations for Activity Forecasting With No-Regret Learning | Jiaqi Guan; Ye Yuan; Kris M. Kitani; Nicholas Rhinehart; | In this work, we develop an efficient deep generative model to jointly forecast a person’s future discrete actions and continuous motions. |
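19 | Skeleton-Based Action Recognition With Shift Graph Convolutional Network | Ke Cheng; Yifan Zhang; Xiangyu He; Weihan Chen; Jian Cheng; Hanqing Lu; | In this paper, we propose a novel shift graph convolutional network (Shift-GCN) to overcome two shortcomings of previous GCN-based methods: heavy computational complexity and inflexible receptive fields in both the spatial and temporal graphs. |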
20 | Predicting Goal-Directed Human Attention Using Inverse Reinforcement Learning | Zhibo Yang; Lihan Huang; Yupei Chen; Zijun Wei; Seoyoung Ahn; Gregory Zelinsky; Dimitris Samaras; Minh Hoai; | We propose the first inverse reinforcement learning (IRL) model to learn the internal reward function and policy used by humans during visual search. |
21 | X3D: Expanding Architectures for Efficient Video Recognition | Christoph Feichtenhofer; | This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth. |
22 | Dynamic Multiscale Graph Neural Networks for 3D Skeleton Based Human Motion Prediction | Maosen Li; Siheng Chen; Yangheng Zhao; Ya Zhang; Yanfeng Wang; Qi Tian; | We propose novel dynamic multiscale graph neural networks (DMGNN) to predict 3D skeleton-based human motions. |
23 | Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects | Kiana Ehsani; Shubham Tulsiani; Saurabh Gupta; Ali Farhadi; Abhinav Gupta; | In this paper, we take a step towards more physical understanding of actions. |
24 | DaST: Data-Free Substitute Training for Adversarial Attacks | Mingyi Zhou; Jing Wu; Yipeng Liu; Shuaicheng Liu; Ce Zhu; | In this paper, we propose a data-free substitute training method (DaST) to obtain substitute models for adversarial black-box attacks without the requirement of any real data. |
25 | Towards Verifying Robustness of Neural Networks Against A Family of Semantic Perturbations | Jeet Mohapatra; Tsui-Wei Weng; Pin-Yu Chen; Sijia Liu; Luca Daniel; | We propose Semantify-NN, a model-agnostic and generic robustness verification approach against semantic perturbations for neural networks. |
26 | The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks | Yuheng Zhang; Ruoxi Jia; Hengzhi Pei; Wenxiao Wang; Bo Li; Dawn Song; | Here we present a novel attack method, termed the generative model-inversion attack, which can invert deep neural networks with high success rates. |
27 | A Self-supervised Approach for Adversarial Robustness | Muzammal Naseer; Salman Khan; Munawar Hayat; Fahad Shahbaz Khan; Fatih Porikli; | In this paper, we take the first step to combine the benefits of adversarial training and input-processing defenses, and propose a self-supervised adversarial training mechanism in the input space. |
28 | Adversarial Vertex Mixup: Toward Better Adversarially Robust Generalization | Saehyung Lee; Hyungyu Lee; Sungroh Yoon; | In this paper, we identify Adversarial Feature Overfitting (AFO), which may cause poor adversarially robust generalization, and we show that adversarial training can overshoot the optimal point in terms of robust generalization, leading to AFO in our simple Gaussian model. |
29 | How Does Noise Help Robustness? Explanation and Exploration under the Neural SDE Framework | Xuanqing Liu; Tesi Xiao; Si Si; Qin Cao; Sanjiv Kumar; Cho-Jui Hsieh; | In this paper, we propose a new continuous neural network framework called Neural Stochastic Differential Equation (Neural SDE), which naturally incorporates various commonly used regularization mechanisms based on random noise injection. |
30 | Unpaired Image Super-Resolution Using Pseudo-Supervision | Shunta Maeda; | In this paper, we propose an unpaired SR method using a generative adversarial network that does not require a paired/aligned training dataset. |
31 | Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs | Soheil Kolouri; Aniruddha Saha; Hamed Pirsiavash; Heiko Hoffmann; | In this paper, we introduce a benchmark technique for detecting backdoor attacks (aka Trojan attacks) on deep convolutional neural networks (CNNs). |
32 | Robustness Guarantees for Deep Neural Networks on Videos | Min Wu; Marta Kwiatkowska; | In this paper, we consider the robustness of deep neural networks on videos, which comprise both the spatial features of individual frames extracted by a convolutional neural network and the temporal dynamics between adjacent frames captured by a recurrent neural network. |
33 | Benchmarking Adversarial Robustness on Image Classification | Yinpeng Dong; Qi-An Fu; Xiao Yang; Tianyu Pang; Hang Su; Zihao Xiao; Jun Zhu; | In this paper, we establish a comprehensive, rigorous, and coherent benchmark to evaluate adversarial robustness on image classification tasks. |
34 | What It Thinks Is Important Is Important: Robustness Transfers Through Input Gradients | Alvin Chan; Yi Tay; Yew-Soon Ong; | Using only natural images, we show here that training a student model’s input gradients to match those of a robust teacher model can gain robustness close to a strong baseline that is robustly trained from scratch. |
35 | Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking | Hongjun Wang; Guangrun Wang; Ya Li; Dongyu Zhang; Liang Lin; | In this work, we examine the insecurity of current best-performing ReID models by proposing a learning-to-mis-rank formulation to perturb the ranking of the system output. |
36 | Video Modeling With Correlation Networks | Heng Wang; Du Tran; Lorenzo Torresani; Matt Feiszli; | This paper proposes an approach based on a learnable correlation operator that can be used to establish frame-to-frame matches over convolutional feature maps in different layers of the network. |
37 | Projection & Probability-Driven Black-Box Attack | Jie Li; Rongrong Ji; Hong Liu; Jianzhuang Liu; Bineng Zhong; Cheng Deng; Qi Tian; | In this paper, we propose Projection & Probability-driven Black-box Attack (PPBA), which reduces the number of queries needed for black-box attacks by reducing the solution space and providing better optimization. |
38 | Auxiliary Training: Towards Accurate and Robust Models | Linfeng Zhang; Muzhou Yu; Tong Chen; Zuoqiang Shi; Chenglong Bao; Kaisheng Ma; | In this paper, we propose a novel training method that introduces auxiliary classifiers for training on corrupted samples, while clean samples are trained normally with the primary classifier. |
39 | PaStaNet: Toward Human Activity Knowledge Engine | Yong-Lu Li; Liang Xu; Xinpeng Liu; Xijie Huang; Yue Xu; Shiyi Wang; Hao-Shu Fang; Ze Ma; Mingyang Chen; Cewu Lu; | We propose a new path: infer human part states first and then reason out the activities based on part-level semantics. |
40 | A Hierarchical Graph Network for 3D Object Detection on Point Clouds | Jintai Chen; Biwen Lei; Qingyu Song; Haochao Ying; Danny Z. Chen; Jian Wu; | In this paper, we propose a new graph convolution (GConv) based hierarchical graph network (HGNet) for 3D object detection, which processes raw point clouds directly to predict 3D bounding boxes. |
41 | Learning Generative Models of Shape Handles | Matheus Gadelha; Giorgio Gori; Duygu Ceylan; Radomir Mech; Nathan Carr; Tamy Boubekeur; Rui Wang; Subhransu Maji; | We present a generative model to synthesize 3D shapes as sets of handles — lightweight proxies that approximate the original 3D shape — for applications in interactive editing, shape parsing, and building compact 3D representations. |
42 | One Man’s Trash Is Another Man’s Treasure: Resisting Adversarial Examples by Adversarial Examples | Chang Xiao; Changxi Zheng; | We embrace the omnipresence of adversarial examples and the numerical procedure of crafting them, and turn this harmful attacking process into a useful defense mechanism. |
43 | Toward a Universal Model for Shape From Texture | Dor Verbin; Todd Zickler; | We consider the shape from texture problem, where the input is a single image of a curved, textured surface, and the texture and shape are both a priori unknown. |
44 | HybridPose: 6D Object Pose Estimation Under Hybrid Representations | Chen Song; Jiaru Song; Qixing Huang; | We introduce HybridPose, a novel 6D object pose estimation approach. |
45 | Boundary-Aware 3D Building Reconstruction From a Single Overhead Image | Jisan Mahmud; True Price; Akash Bapat; Jan-Michael Frahm; | We propose a boundary-aware multi-task deep-learning-based framework for fast 3D building modeling from a single overhead image. |
46 | Articulation-Aware Canonical Surface Mapping | Nilesh Kulkarni; Abhinav Gupta; David F. Fouhey; Shubham Tulsiani; | We tackle the tasks of: 1) predicting a Canonical Surface Mapping (CSM) that indicates the mapping from 2D pixels to corresponding points on a canonical template shape, and 2) inferring the articulation and pose of the template corresponding to the input image. |
47 | BiFuse: Monocular 360 Depth Estimation via Bi-Projection Fusion | Fu-En Wang; Yu-Hsuan Yeh; Min Sun; Wei-Chen Chiu; Yi-Hsuan Tsai; | We propose a bi-projection fusion scheme along with learnable masks to balance the feature maps from the equirectangular and cubemap projections. |
48 | Transformation GAN for Unsupervised Image Synthesis and Representation Learning | Jiayu Wang; Wengang Zhou; Guo-Jun Qi; Zhongqian Fu; Qi Tian; Houqiang Li; | To improve both image synthesis quality and representation learning performance in the unsupervised setting, we propose a simple yet effective Transformation Generative Adversarial Network (TrGAN). |
49 | PPDM: Parallel Point Detection and Matching for Real-Time Human-Object Interaction Detection | Yue Liao; Si Liu; Fei Wang; Yanjie Chen; Chen Qian; Jiashi Feng; | We propose a single-stage Human-Object Interaction (HOI) detection method that outperforms all existing methods on the HICO-DET dataset while running at 37 fps on a single Titan XP GPU. |
50 | Height and Uprightness Invariance for 3D Prediction From a Single View | Manel Baradad; Antonio Torralba; | We propose a system that directly regresses 3D world coordinates for each pixel. |
51 | SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation | Mohsen Fayyaz; Jurgen Gall; | For action segmentation under set supervision, where only the set of actions occurring in each training video is known, we propose an approach that can be trained end-to-end on such data. |
52 | 3DV: 3D Dynamic Voxel for Action Recognition in Depth Video | Yancheng Wang; Yang Xiao; Fu Xiong; Wenxiang Jiang; Zhiguo Cao; Joey Tianyi Zhou; Junsong Yuan; | With 3D space voxelization, the key idea of 3DV is to encode the 3D motion information within depth video into a regular voxel set (i.e., 3DV) compactly, via temporal rank pooling. |
53 | Adaptive Interaction Modeling via Graph Operations Search | Haoxin Li; Wei-Shi Zheng; Yu Tao; Haifeng Hu; Jian-Huang Lai; | In this paper, we automate the process of structure design to learn adaptive structures for interaction modeling. |
54 | Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction | Yuan Yao; Nico Schertler; Enrique Rosales; Helge Rhodin; Leonid Sigal; Alla Sheffer; | In this work, we induce structure and geometric constraints by leveraging three core observations: (1) the surface of most everyday objects is often almost entirely exposed from pairs of typical opposite views; (2) everyday objects often exhibit global reflective symmetries which can be accurately predicted from single views; (3) opposite orthographic views of a 3D shape share consistent silhouettes. |
55 | SDC-Depth: Semantic Divide-and-Conquer Network for Monocular Depth Estimation | Lijun Wang; Jianming Zhang; Oliver Wang; Zhe Lin; Huchuan Lu; | Because monocular depth estimation is a complex, ill-posed problem, we propose a deep neural network model based on a semantic divide-and-conquer approach. |
56 | Single-View View Synthesis With Multiplane Images | Richard Tucker; Noah Snavely; | Our method learns to predict a multiplane image directly from a single image input, and we introduce scale-invariant view synthesis for supervision, enabling us to train on online video. |
57 | Deep Parametric Shape Predictions Using Distance Fields | Dmitriy Smirnov; Matthew Fisher; Vladimir G. Kim; Richard Zhang; Justin Solomon; | We propose a new framework for predicting parametric shape primitives using deep learning. |
58 | Leveraging Photometric Consistency Over Time for Sparsely Supervised Hand-Object Reconstruction | Yana Hasson; Bugra Tekin; Federica Bogo; Ivan Laptev; Marc Pollefeys; Cordelia Schmid; | We present a method that leverages photometric consistency across time when annotations are only available for a sparse subset of frames in a video. |
59 | Ensemble Generative Cleaning With Feedback Loops for Defending Adversarial Attacks | Jianhe Yuan; Zhihai He; | In this paper, we develop a new method called ensemble generative cleaning with feedback loops (EGC-FL) for effective defense of deep neural networks. |
60 | Temporal Pyramid Network for Action Recognition | Ceyuan Yang; Yinghao Xu; Jianping Shi; Bo Dai; Bolei Zhou; | In this work we propose a generic Temporal Pyramid Network (TPN) at the feature-level, which can be flexibly integrated into 2D or 3D backbone networks in a plug-and-play manner. |
61 | FaceScape: A Large-Scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction | Haotian Yang; Hao Zhu; Yanru Wang; Mingkai Huang; Qiu Shen; Ruigang Yang; Xun Cao; | In this paper, we present a large-scale detailed 3D face dataset, FaceScape, and propose a novel algorithm that is able to predict elaborate riggable 3D face models from a single image input. |
62 | Structure-Guided Ranking Loss for Single Image Depth Prediction | Ke Xian; Jianming Zhang; Oliver Wang; Long Mai; Zhe Lin; Zhiguo Cao; | To learn more effectively from pseudo-depth data, we propose a simple pair-wise ranking loss with a novel sampling strategy. |
63 | In Perfect Shape: Certifiably Optimal 3D Shape Reconstruction From 2D Landmarks | Heng Yang; Luca Carlone; | We study the problem of 3D shape reconstruction from 2D landmarks extracted in a single image. |
64 | When NAS Meets Robustness: In Search of Robust Architectures Against Adversarial Attacks | Minghao Guo; Yuzhe Yang; Rui Xu; Ziwei Liu; Dahua Lin; | In this work, we take an architectural perspective and investigate the patterns of network architectures that are resilient to adversarial attacks. |
65 | Towards Transferable Targeted Attack | Maosen Li; Cheng Deng; Tengjiao Li; Junchi Yan; Xinbo Gao; Heng Huang; | We propose a novel targeted attack approach that generates more transferable adversarial examples. |
66 | Self-Supervised Human Depth Estimation From Monocular Videos | Feitong Tan; Hao Zhu; Zhaopeng Cui; Siyu Zhu; Marc Pollefeys; Ping Tan; | This paper presents a self-supervised method that can be trained on YouTube videos without known depth, which makes training data collection simple and improves the generalization of the learned network. |
67 | Recursive Social Behavior Graph for Trajectory Prediction | Jianhua Sun; Qinhong Jiang; Cewu Lu; | In this paper, we present a novel group-based social interaction model to explore relationships among pedestrians. |
68 | Context-Aware and Scale-Insensitive Temporal Repetition Counting | Huaidong Zhang; Xuemiao Xu; Guoqiang Han; Shengfeng He; | In this paper, we tailor a context-aware and scale-insensitive framework to tackle the challenges in repetition counting caused by unknown and diverse cycle lengths. |
69 | OASIS: A Large-Scale Dataset for Single Image 3D in the Wild | Weifeng Chen; Shengyi Qian; David Fan; Noriyuki Kojima; Max Hamilton; Jia Deng; | We present Open Annotations of Single Image Surfaces (OASIS), a dataset for single-image 3D in the wild consisting of annotations of detailed 3D geometry for 140,000 images. |
70 | VPLNet: Deep Single View Normal Estimation With Vanishing Points and Lines | Rui Wang; David Geraghty; Kevin Matzen; Richard Szeliski; Jan-Michael Frahm; | We present a novel single-view surface normal estimation method that combines traditional line and vanishing point analysis with a deep learning approach. |
71 | Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning | Tianlong Chen; Sijia Liu; Shiyu Chang; Yu Cheng; Lisa Amini; Zhangyang Wang; | We introduce adversarial training into self-supervision, to provide general-purpose robust pretrained models for the first time. |
72 | Defending Against Universal Attacks Through Selective Feature Regeneration | Tejas Borkar; Felix Heide; Lina Karam; | Departing from existing defense strategies that work mostly in the image domain, we present a novel defense that operates in the DNN feature domain and effectively defends against universal perturbations. |
73 | Universal Physical Camouflage Attacks on Object Detectors | Lifeng Huang; Chengying Gao; Yuyin Zhou; Cihang Xie; Alan L. Yuille; Changqing Zou; Ning Liu; | In this paper, we study physical adversarial attacks on object detectors in the wild. |
74 | Intra- and Inter-Action Understanding via Temporal Action Parsing | Dian Shao; Yue Zhao; Bo Dai; Dahua Lin; | We construct TAPOS, a new dataset of sports videos with manual annotations of sub-actions, and use it to study temporal action parsing. |
75 | Lightweight Photometric Stereo for Facial Details Recovery | Xueying Wang; Yudong Guo; Bailin Deng; Juyong Zhang; | In this paper, we present a lightweight strategy that only requires sparse inputs or even a single image to recover high-fidelity face shapes with images captured under near-field lights. |
76 | Bundle Pooling for Polygonal Architecture Segmentation Problem | Huayi Zeng; Kevin Joseph; Adam Vest; Yasutaka Furukawa; | This paper introduces a polygonal architecture segmentation problem, proposes bundle-pooling modules for line structure reasoning, and demonstrates a virtual remodeling application that produces production-quality results. |
77 | AvatarMe: Realistically Renderable 3D Facial Reconstruction "In-the-Wild" | Alexandros Lattas; Stylianos Moschoglou; Baris Gecer; Stylianos Ploumpis; Vasileios Triantafyllou; Abhijeet Ghosh; Stefanos Zafeiriou; | In this paper, we introduce AvatarMe, the first method that is able to reconstruct photorealistic 3D faces from a single "in-the-wild" image with an increasing level of detail. |
78 | Defending Against Model Stealing Attacks With Adaptive Misinformation | Sanjay Kariyappa; Moinuddin K. Qureshi; | We propose "Adaptive Misinformation" to defend against model stealing attacks. |
79 | Learning to Generate 3D Training Data Through Hybrid Gradient | Dawei Yang; Jia Deng; | In this work, we propose a new method that optimizes the generation of 3D training data based on what we call "hybrid gradient". |
80 | Cascaded Refinement Network for Point Cloud Completion | Xiaogang Wang; Marcelo H. Ang Jr.; Gim Hee Lee; | We propose a cascaded refinement network together with a coarse-to-fine strategy to synthesize detailed object shapes. |
81 | Enhancing Intrinsic Adversarial Robustness via Feature Pyramid Decoder | Guanlin Li; Shuya Ding; Jun Luo; Chang Liu; | In this paper, we propose an attack-agnostic defence framework to enhance the intrinsic robustness of neural networks without compromising their generalization on clean samples. |
82 | Learning to Discriminate Information for Online Action Detection | Hyunjun Eun; Jinyoung Moon; Jongyoul Park; Chanho Jung; Changick Kim; | For online action detection, we propose a novel recurrent unit that explicitly discriminates information relevant to an ongoing action from other information. |
83 | Adversarial Examples Improve Image Recognition | Cihang Xie; Mingxing Tan; Boqing Gong; Jiang Wang; Alan L. Yuille; Quoc V. Le; | Here we present an opposite perspective: adversarial examples can be used to improve image recognition models if harnessed in the right manner. |
84 | PQ-NET: A Generative Part Seq2Seq Network for 3D Shapes | Rundi Wu; Yixin Zhuang; Kai Xu; Hao Zhang; Baoquan Chen; | We introduce PQ-NET, a deep neural network which represents and generates 3D shapes via sequential part assembly. |
85 | Actor-Transformers for Group Activity Recognition | Kirill Gavrilyuk; Ryan Sanford; Mehrsan Javan; Cees G. M. Snoek; | While existing solutions for this challenging problem explicitly model spatial and temporal relationships based on location of individual actors, we propose an actor-transformer model able to learn and selectively extract information relevant for group activity recognition. |
86 | SG-NN: Sparse Generative Neural Networks for Self-Supervised Scene Completion of RGB-D Scans | Angela Dai; Christian Diller; Matthias Niessner; | We present a novel approach that converts partial and noisy RGB-D scans into high-quality 3D scene reconstructions by inferring unobserved scene geometry. |
87 | Geometry-Aware Satellite-to-Ground Image Synthesis for Urban Areas | Xiaohu Lu; Zuoyue Li; Zhaopeng Cui; Martin R. Oswald; Marc Pollefeys; Rongjun Qin; | We present a novel method for generating panoramic street-view images which are geometrically consistent with a given satellite image. |
88 | Action Modifiers: Learning From Adverbs in Instructional Videos | Hazel Doughty; Ivan Laptev; Walterio Mayol-Cuevas; Dima Damen; | We present a method to learn a representation for adverbs from instructional videos using weak supervision from the accompanying narrations. |
89 | ZSTAD: Zero-Shot Temporal Activity Detection | Lingling Zhang; Xiaojun Chang; Jun Liu; Minnan Luo; Sen Wang; Zongyuan Ge; Alexander Hauptmann; | We propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen during training can still be detected. |
90 | Geometric Structure Based and Regularized Depth Estimation From 360 Indoor Imagery | Lei Jin; Yanyu Xu; Jia Zheng; Junfei Zhang; Rui Tang; Shugong Xu; Jingyi Yu; Shenghua Gao; | Motivated by the correlation between the depth and the geometric structure of a 360 indoor image, we propose a novel learning-based depth estimation framework that leverages the geometric structure of a scene to conduct depth estimation. |
91 | Deep Kinematics Analysis for Monocular 3D Human Pose Estimation | Jingwei Xu; Zhenbo Yu; Bingbing Ni; Jiancheng Yang; Xiaokang Yang; Wenjun Zhang; | In this paper, we propose to systematically exploit human kinematic priors to cope with noisy 2D inputs in monocular 3D human pose estimation. |
92 | TEA: Temporal Excitation and Aggregation for Action Recognition | Yan Li; Bin Ji; Xintian Shi; Jianguo Zhang; Bin Kang; Limin Wang; | In this paper, we propose a Temporal Excitation and Aggregation (TEA) block, including a motion excitation (ME) module and a multiple temporal aggregation (MTA) module, specifically designed to capture both short- and long-range temporal evolution. |
93 | Oops! Predicting Unintentional Action in Video | Dave Epstein; Boyuan Chen; Carl Vondrick; | We introduce a dataset of in-the-wild videos of unintentional action, as well as a suite of tasks for recognizing, localizing, and anticipating its onset. |
94 | Scene Recomposition by Learning-Based ICP | Hamid Izadinia; Steven M. Seitz; | In addition to the fully automatic system, the key technical contribution is a novel approach for aligning CAD models to 3D scans, based on deep reinforcement learning. |
95 | Enhancing Cross-Task Black-Box Transferability of Adversarial Examples With Dispersion Reduction | Yantao Lu; Yunhan Jia; Jianyu Wang; Bai Li; Weiheng Chai; Lawrence Carin; Senem Velipasalar; | Our proposed attack minimizes the “dispersion” of the internal feature map, overcoming the limitations of existing attacks that require task-specific loss functions and/or probing a target model. |
96 | Deep Non-Line-of-Sight Reconstruction | Javier Grau Chopite; Matthias B. Hullin; Michael Wand; Julian Iseringhausen; | In this paper, we employ convolutional feed-forward networks for solving the reconstruction problem efficiently while maintaining good reconstruction quality. |
97 | SSRNet: Scalable 3D Surface Reconstruction Network | Zhenxing Mi; Yiming Luo; Wenbing Tao; | In this paper, we propose SSRNet, a novel scalable learning-based method for surface reconstruction. |
98 | Progressive Relation Learning for Group Activity Recognition | Guyue Hu; Bo Cui; Yuan He; Shan Yu; | In this paper, we propose a novel method based on deep reinforcement learning to progressively refine the low-level features and high-level relations of group activities. |
99 | Cooling-Shrinking Attack: Blinding the Tracker With Imperceptible Noises | Bin Yan; Dong Wang; Huchuan Lu; Xiaoyun Yang; | In this paper, we propose a cooling-shrinking attack method to deceive state-of-the-art SiameseRPN-based trackers. |
100 | Adversarial Camouflage: Hiding Physical-World Attacks With Natural Styles | Ranjie Duan; Xingjun Ma; Yisen Wang; James Bailey; A. K. Qin; Yun Yang; | In this paper, we propose a novel approach, called Adversarial Camouflage (AdvCam), to craft and camouflage physical-world adversarial examples into natural styles that appear legitimate to human observers. |
101 | Weakly-Supervised Action Localization by Generative Attention Modeling | Baifeng Shi; Qi Dai; Yadong Mu; Jingdong Wang; | We propose to model the class-agnostic frame-wise probability conditioned on the frame attention using a conditional Variational Auto-Encoder (VAE). |
102 | Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes | Sravanti Addepalli; Vivek B.S.; Arya Baburaj; Gaurang Sriramanan; R. Venkatesh Babu; | In this work, we aim to improve adversarial robustness by training networks to form coarse impressions based on the information in higher bit planes, and to use the lower bit planes only to refine their predictions. |
103 | Polishing Decision-Based Adversarial Noise With a Customized Sampling | Yucheng Shi; Yahong Han; Qi Tian; | In this paper, we demonstrate the advantage of using current noise and historical queries to customize the variance and mean of sampling in boundary attack to polish adversarial noise. |
104 | Towards Large Yet Imperceptible Adversarial Image Perturbations With Perceptual Color Distance | Zhengyu Zhao; Zhuoran Liu; Martha Larson; | In this work, we drop this assumption by pursuing an approach that exploits human color perception, and more specifically, minimizing perturbation size with respect to perceptual color distance. |
105 | Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks | Joanna Materzynska; Tete Xiao; Roei Herzig; Huijuan Xu; Xiaolong Wang; Trevor Darrell; | In this paper, we study the compositionality of action by looking into the dynamics of subject-object interactions. |
106 | Learning Unsupervised Hierarchical Part Decomposition of 3D Objects From a Single RGB Image | Despoina Paschalidou; Luc Van Gool; Andreas Geiger; | We address this challenging problem by proposing a novel formulation that allows us to jointly recover the geometry of a 3D object as a set of primitives as well as their latent hierarchical structure without part-level supervision. |
107 | Focus on Defocus: Bridging the Synthetic to Real Domain Gap for Depth Estimation | Maxim Maximov; Kevin Galim; Laura Leal-Taixe; | In this paper, we tackle this issue by using domain invariant defocus blur as direct supervision. |
108 | Active Vision for Early Recognition of Human Actions | Boyu Wang; Lihan Huang; Minh Hoai; | We propose a method for early recognition of human actions, one that can take advantage of multiple cameras while satisfying the constraints due to limited communication bandwidth and processing power. |
109 | SmallBigNet: Integrating Core and Contextual Views for Video Classification | Xianhang Li; Yali Wang; Zhipeng Zhou; Yu Qiao; | To alleviate this problem, we propose a concise and novel SmallBig network, with the cooperation of small and big views. |
110 | Gate-Shift Networks for Video Action Recognition | Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz; | In this paper we introduce spatial gating in spatial-temporal decomposition of 3D kernels. |
111 | Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition | Pengfei Zhang; Cuiling Lan; Wenjun Zeng; Junliang Xing; Jianru Xue; Nanning Zheng; | In this paper, we propose a simple yet effective semantics-guided neural network (SGN) for skeleton-based action recognition. |
112 | Exploiting Joint Robustness to Adversarial Perturbations | Ali Dabouei; Sobhan Soleymani; Fariborz Taherkhani; Jeremy Dawson; Nasser M. Nasrabadi; | In this paper, we exploit first-order interactions within ensembles to formalize a reliable and practical defense. |
113 | From Image Collections to Point Clouds With Self-Supervised Shape and Pose Networks | K L Navaneet; Ansu Mathew; Shashank Kashyap; Wei-Chih Hung; Varun Jampani; R. Venkatesh Babu; | In this work, we propose a deep learning technique for 3D object reconstruction from a single image. |
114 | Searching for Actions on the Hyperbole | Teng Long; Pascal Mettes; Heng Tao Shen; Cees G. M. Snoek; | In this paper, we introduce hierarchical action search. |
115 | ColorFool: Semantic Adversarial Colorization | Ali Shahin Shamsabadi; Ricardo Sanchez-Matilla; Andrea Cavallaro; | In this paper, we propose a content-based black-box adversarial attack that generates unrestricted perturbations by exploiting image semantics to selectively modify colors within chosen ranges that are perceived as natural by humans. |
116 | Boosting the Transferability of Adversarial Samples via Attention | Weibin Wu; Yuxin Su; Xixian Chen; Shenglin Zhao; Irwin King; Michael R. Lyu; Yu-Wing Tai; | In this work, we propose a novel mechanism to alleviate the overfitting issue. |
117 | ActionBytes: Learning From Trimmed Videos to Localize Actions | Mihir Jain; Amir Ghodrati; Cees G. M. Snoek; | We propose a method to train an action localization network that segments a video into interpretable fragments, which we call ActionBytes. |
118 | Efficient Adversarial Training With Transferable Adversarial Examples | Haizhong Zheng; Ziqi Zhang; Juncheng Gu; Honglak Lee; Atul Prakash; | Leveraging this property, we propose a novel method, Adversarial Training with Transferable Adversarial Examples (ATTA), that can enhance the robustness of trained models and greatly improve the training efficiency by accumulating adversarial perturbations through epochs. |
119 | Alleviation of Gradient Exploding in GANs: Fake Can Be Real | Song Tao; Jia Wang; | In order to alleviate the notorious mode collapse phenomenon in generative adversarial networks (GANs), we propose a novel training method of GANs in which certain fake samples are considered as real ones during the training process. |
120 | On Isometry Robustness of Deep 3D Point Cloud Models Under Adversarial Attacks | Yue Zhao; Yuwei Wu; Caihua Chen; Andrew Lim; | Incorporating the Restricted Isometry Property, we propose a novel framework of white-box attack on top of spectral norm based perturbation. |
121 | Achieving Robustness in the Wild via Adversarial Mixing With Disentangled Representations | Sven Gowal; Chongli Qin; Po-Sen Huang; Taylan Cemgil; Krishnamurthy Dvijotham; Timothy Mann; Pushmeet Kohli; | In this paper, we propose a novel approach to express and formalize robustness to these kinds of real-world transformations of the input. |
122 | QEBA: Query-Efficient Boundary-Based Blackbox Attack | Huichen Li; Xiaojun Xu; Xiaolu Zhang; Shuang Yang; Bo Li; | In this paper, we propose a Query-Efficient Boundary-based blackbox Attack (QEBA) based only on the model’s final prediction labels. |
123 | Learning to Simulate Dynamic Environments With GameGAN | Seung Wook Kim; Yuhao Zhou; Jonah Philion; Antonio Torralba; Sanja Fidler; | In this paper, we aim to learn a simulator by simply watching an agent interact with an environment. |
124 | Learn2Perturb: An End-to-End Feature Perturbation Learning to Improve Adversarial Robustness | Ahmadreza Jeddi; Mohammad Javad Shafiee; Michelle Karg; Christian Scharfenberger; Alexander Wong; | In this study, we introduce Learn2Perturb, an end-to-end feature perturbation learning approach for improving the adversarial robustness of deep neural networks. |
125 | SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization | Yue Jiang; Dantong Ji; Zhizhong Han; Matthias Zwicker; | We propose SDFDiff, a novel approach for image-based shape optimization using differentiable rendering of 3D shapes represented by signed distance functions (SDFs). |
126 | Through the Looking Glass: Neural 3D Reconstruction of Transparent Shapes | Zhengqin Li; Yu-Ying Yeh; Manmohan Chandraker; | Our novel contributions include a normal representation that enables the network to model complex light transport through local computation, a rendering layer that models refractions and reflections, a cost volume specifically designed for normal refinement of transparent shapes and a feature mapping based on predicted normals for 3D point cloud reconstruction. |
127 | TextureFusion: High-Quality Texture Acquisition for Real-Time RGB-D Scanning | Joo Ho Lee; Hyunho Ha; Yue Dong; Xin Tong; Min H. Kim; | In this work, we propose a progressive texture-fusion method specially designed for real-time RGB-D scanning. |
128 | D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry | Nan Yang; Lukas von Stumberg; Rui Wang; Daniel Cremers; | We propose D3VO as a novel framework for monocular visual odometry that exploits deep networks on three levels: deep depth, pose, and uncertainty estimation. |
129 | Deep Implicit Volume Compression | Danhang Tang; Saurabh Singh; Philip A. Chou; Christian Hane; Mingsong Dou; Sean Fanello; Jonathan Taylor; Philip Davidson; Onur G. Guleryuz; Yinda Zhang; Shahram Izadi; Andrea Tagliasacchi; Sofien Bouaziz; Cem Keskin; | We describe a novel approach for compressing truncated signed distance fields (TSDF) stored in 3D voxel grids, and their corresponding textures. |
130 | MAGSAC++, a Fast, Reliable and Accurate Robust Estimator | Daniel Barath; Jana Noskova; Maksym Ivashechkin; Jiri Matas; | We propose MAGSAC++ and Progressive NAPSAC sampler, P-NAPSAC in short. |
131 | OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression | Lila Huang; Shenlong Wang; Kelvin Wong; Jerry Liu; Raquel Urtasun; | We present a novel deep compression algorithm to reduce the memory footprint of LiDAR point clouds. |
132 | 4D Association Graph for Realtime Multi-Person Motion Capture Using Multiple Video Cameras | Yuxiang Zhang; Liang An; Tao Yu; Xiu Li; Kun Li; Yebin Liu; | This paper contributes a novel realtime multi-person motion capture algorithm using multiview video inputs. |
133 | Upgrading Optical Flow to 3D Scene Flow Through Optical Expansion | Gengshan Yang; Deva Ramanan; | We describe an approach for upgrading 2D optical flow to 3D scene flow. |
134 | Robust 3D Self-Portraits in Seconds | Zhe Li; Tao Yu; Chuanyu Pan; Zerong Zheng; Yebin Liu; | In this paper, we propose an efficient method for robust 3D self-portraits using a single RGBD camera. |
135 | FastDVDnet: Towards Real-Time Deep Video Denoising Without Flow Estimation | Matias Tassano; Julie Delon; Thomas Veit; | In this paper, we propose a state-of-the-art video denoising algorithm based on a convolutional neural network architecture. |
136 | Learning to Have an Ear for Face Super-Resolution | Givi Meishvili; Simon Jenni; Paolo Favaro; | We propose a novel method to use both audio and a low-resolution image to perform extreme face super-resolution (a 16x increase of the input size). |
137 | Deep Optics for Single-Shot High-Dynamic-Range Imaging | Christopher A. Metzler; Hayato Ikoma; Yifan Peng; Gordon Wetzstein; | Inspired by recent deep optical imaging approaches, we interpret this problem as jointly training an optical encoder and electronic decoder where the encoder is parameterized by the point spread function (PSF) of the lens, the bottleneck is the sensor with a limited dynamic range, and the decoder is a convolutional neural network (CNN). |
138 | Learning Rank-1 Diffractive Optics for Single-Shot High Dynamic Range Imaging | Qilin Sun; Ethan Tseng; Qiang Fu; Wolfgang Heidrich; Felix Heide; | In this work, we propose a method for snapshot HDR imaging by learning an optical HDR encoding in a single image which maps saturated highlights into neighboring unsaturated areas using a diffractive optical element (DOE). |
139 | Deep White-Balance Editing | Mahmoud Afifi; Michael S. Brown; | We introduce a deep learning approach to realistically edit an sRGB image’s white balance. |
140 | Non-Line-of-Sight Surface Reconstruction Using the Directional Light-Cone Transform | Sean I. Young; David B. Lindell; Bernd Girod; David Taubman; Gordon Wetzstein; | We propose a joint albedo-normal approach to non-line-of-sight (NLOS) surface reconstruction using the directional light-cone transform (D-LCT). |
141 | Seeing the World in a Bag of Chips | Jeong Joon Park; Aleksander Holynski; Steven M. Seitz; | Our contributions include 1) modeling highly specular objects, 2) modeling inter-reflections and Fresnel effects, and 3) enabling surface light field reconstruction with the same input needed to reconstruct shape alone. |
142 | Correction Filter for Single Image Super-Resolution: Robustifying Off-the-Shelf Deep Super-Resolvers | Shady Abu Hussein; Tom Tirer; Raja Giryes; | Inspired by the literature on generalized sampling, in this work we propose a method for improving the performance of DNNs that have been trained with a fixed kernel on observations acquired by other kernels. |
143 | Retina-Like Visual Image Reconstruction via Spiking Neural Model | Lin Zhu; Siwei Dong; Jianing Li; Tiejun Huang; Yonghong Tian; | In this paper, we design a retina-like visual image reconstruction framework, which is flexible in reconstructing full texture of natural scenes from the totally new spike data. |
144 | Plug-and-Play Algorithms for Large-Scale Snapshot Compressive Imaging | Xin Yuan; Yang Liu; Jinli Suo; Qionghai Dai; | In this paper, we develop fast and flexible algorithms for SCI based on the plug-and-play (PnP) framework. |
145 | Neural Network Pruning With Residual-Connections and Limited-Data | Jian-Hao Luo; Jianxin Wu; | In order to avoid the influence of label noise, we propose a label refinement approach to solve this problem. |
146 | AdderNet: Do We Really Need Multiplications in Deep Learning? | Hanting Chen; Yunhe Wang; Chunjing Xu; Boxin Shi; Chao Xu; Qi Tian; Chang Xu; | In this paper, we present adder networks (AdderNets) to trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. |
147 | NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks | Eugene Lee; Chen-Yi Lee; | In this work, we attempt to search for the neuron (filter) configuration of a fixed network architecture that maximizes accuracy. |
148 | Training Quantized Neural Networks With a Full-Precision Auxiliary Module | Bohan Zhuang; Lingqiao Liu; Mingkui Tan; Chunhua Shen; Ian Reid; | In this paper, we seek to tackle a challenge in training low-precision networks: the notorious difficulty in propagating gradients through a low-precision network due to the non-differentiable quantization function. |
149 | Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation From a Blackbox Model | Dongdong Wang; Yandong Li; Liqiang Wang; Boqing Gong; | To tackle these challenges, we propose an approach that blends mixup and active learning. |
150 | Multi-Dimensional Pruning: A Unified Framework for Model Compression | Jinyang Guo; Wanli Ouyang; Dong Xu; | In this work, we propose a unified model compression framework called Multi-Dimensional Pruning (MDP) to simultaneously compress the convolutional neural networks (CNNs) on multiple dimensions. |
151 | Towards Efficient Model Compression via Learned Global Ranking | Ting-Wu Chin; Ruizhou Ding; Cha Zhang; Diana Marculescu; | To this end, we propose to learn a global ranking of the filters across different layers of the ConvNet, which is used to obtain a set of ConvNet architectures that have different accuracy/latency trade-offs by pruning the bottom-ranked filters. |
152 | HRank: Filter Pruning Using High-Rank Feature Map | Mingbao Lin; Rongrong Ji; Yan Wang; Yichen Zhang; Baochang Zhang; Yonghong Tian; Ling Shao; | In this paper, we propose a novel filter pruning method by exploring the High Rank of feature maps (HRank). |
153 | DMCP: Differentiable Markov Channel Pruning for Neural Networks | Shaopeng Guo; Yujie Wang; Quanquan Li; Junjie Yan; | In this paper, we propose a novel differentiable method for channel pruning, named Differentiable Markov Channel Pruning (DMCP), to efficiently search the optimal sub-structure. |
154 | ReSprop: Reuse Sparsified Backpropagation | Negar Goli; Tor M. Aamodt; | In this work, we focus on accelerating training by observing that about 90% of gradients are reusable during training. |
155 | Adversarial Texture Optimization From RGB-D Scans | Jingwei Huang; Justus Thies; Angela Dai; Abhijit Kundu; Chiyu "Max" Jiang; Leonidas J. Guibas; Matthias Niessner; Thomas Funkhouser; | In this work, we present a novel approach for color texture generation using a conditional adversarial loss obtained from weakly-supervised views. |
156 | Synchronizing Probability Measures on Rotations via Optimal Transport | Tolga Birdal; Michael Arbel; Umut Simsekli; Leonidas J. Guibas; | We propose a nonparametric Riemannian particle optimization approach to solve the problem. |
157 | GhostNet: More Features From Cheap Operations | Kai Han; Yunhe Wang; Qi Tian; Jianyuan Guo; Chunjing Xu; Chang Xu; | This paper proposes a novel Ghost module to generate more feature maps from cheap operations. |
158 | Attention-Aware Multi-View Stereo | Keyang Luo; Tao Guan; Lili Ju; Yuesong Wang; Zhuo Chen; Yawei Luo; | In this paper, we propose an attention-aware deep neural network "AttMVS" for learning multi-view stereo. |
159 | Bi3D: Stereo Depth Estimation via Binary Classifications | Abhishek Badki; Alejandro Troccoli; Kihwan Kim; Jan Kautz; Pradeep Sen; Orazio Gallo; | We present Bi3D, a method that estimates depth via a series of binary classifications. |
160 | Joint Filtering of Intensity Images and Neuromorphic Events for High-Resolution Noise-Robust Imaging | Zihao W. Wang; Peiqi Duan; Oliver Cossairt; Aggelos Katsaggelos; Tiejun Huang; Boxin Shi; | We present a novel computational imaging system with high resolution and low noise. |
161 | SGAS: Sequential Greedy Architecture Search | Guohao Li; Guocheng Qian; Itzel C. Delgadillo; Matthias Muller; Ali Thabet; Bernard Ghanem; | Aiming to alleviate this common issue, we introduce sequential greedy architecture search (SGAS), an efficient method for neural architecture search. |
162 | HVNet: Hybrid Voxel Network for LiDAR Based 3D Object Detection | Maosheng Ye; Shuangjie Xu; Tongyi Cao; | We present Hybrid Voxel Network (HVNet), a novel one-stage unified network for point cloud based 3D object detection for autonomous driving. |
163 | Frequency Domain Compact 3D Convolutional Neural Networks | Hanting Chen; Yunhe Wang; Han Shu; Yehui Tang; Chunjing Xu; Boxin Shi; Chao Xu; Qi Tian; Chang Xu; | In this paper, we develop a novel approach for eliminating redundancy in the time dimensionality of 3D convolution filters by converting them into the frequency domain through a series of learned optimal transforms with far fewer parameters. |
164 | Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline | Yu-Lun Liu; Wei-Sheng Lai; Yu-Sheng Chen; Yi-Lung Kao; Ming-Hsuan Yang; Yung-Yu Chuang; Jia-Bin Huang; | In contrast to existing learning-based methods, our core idea is to incorporate the domain knowledge of the LDR image formation pipeline into our model. |
165 | DNU: Deep Non-Local Unrolling for Computational Spectral Imaging | Lizhi Wang; Chen Sun; Maoqing Zhang; Ying Fu; Hua Huang; | In this paper, we propose an interpretable neural network for computational spectral imaging. |
166 | Single Image Optical Flow Estimation With an Event Camera | Liyuan Pan; Miaomiao Liu; Richard Hartley; | In this paper, we propose an optical flow estimation approach based on a single (potentially blurred) image and events. |
167 | Multi-View Neural Human Rendering | Minye Wu; Yuehao Wang; Qiang Hu; Jingyi Yu; | We present an end-to-end Neural Human Renderer (NHR) for dynamic human captures under the multi-view setting. |
168 | Depth Sensing Beyond LiDAR Range | Kai Zhang; Jiaxin Xie; Noah Snavely; Qifeng Chen; | To that end, we propose a novel three-camera system that utilizes small field of view cameras. |
169 | Event Probability Mask (EPM) and Event Denoising Convolutional Neural Network (EDnCNN) for Neuromorphic Cameras | R. Wes Baldwin; Mohammed Almatrafi; Vijayan Asari; Keigo Hirakawa; | This paper presents a novel method for labeling real-world neuromorphic camera sensor data by calculating the likelihood of generating an event at each pixel within a short time window, which we refer to as "event probability mask" or EPM. |
170 | Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud | Weijing Shi; Raj Rajkumar; | In this paper, we propose a graph neural network to detect objects from a LiDAR point cloud. |
171 | Self-Learning Video Rain Streak Removal: When Cyclic Consistency Meets Temporal Correspondence | Wenhan Yang; Robby T. Tan; Shiqi Wang; Jiaying Liu; | In this paper, we address the problem of rain streak removal in video by developing a self-learned rain streak removal method, which does not require any clean groundtruth images in the training process. |
172 | Neuromorphic Camera Guided High Dynamic Range Imaging | Jin Han; Chu Zhou; Peiqi Duan; Yehui Tang; Chang Xu; Chao Xu; Tiejun Huang; Boxin Shi; | In this paper, we propose a neuromorphic camera guided high dynamic range imaging pipeline, and a network consisting of specially designed modules according to each step in the pipeline, which bridges the domain gaps on resolution, dynamic range, and color representation between two types of sensors and images. |
173 | Learning in the Frequency Domain | Kai Xu; Minghai Qin; Fei Sun; Yuhao Wang; Yen-Kuang Chen; Fengbo Ren; | Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components which can be removed without accuracy loss. |
174 | Polarized Reflection Removal With Perfect Alignment in the Wild | Chenyang Lei; Xuhua Huang; Mengdi Zhang; Qiong Yan; Wenxiu Sun; Qifeng Chen; | We present a novel formulation for removing reflection from polarized images in the wild. |
175 | Learning Multiview 3D Point Cloud Registration | Zan Gojcic; Caifa Zhou; Jan D. Wegner; Leonidas J. Guibas; Tolga Birdal; | We present a novel, end-to-end learnable, multiview 3D point cloud registration algorithm. |
176 | A Sparse Resultant Based Method for Efficient Minimal Solvers | Snehal Bhayani; Zuzana Kukelova; Janne Heikkila; | In this paper we study an alternative algebraic method for solving systems of polynomial equations, i.e., the sparse resultant-based method and propose a novel approach to convert the resultant constraint to an eigenvalue problem. |
177 | Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement | Chunle Guo; Chongyi Li; Jichang Guo; Chen Change Loy; Junhui Hou; Sam Kwong; Runmin Cong; | The paper presents a novel method, Zero-Reference Deep Curve Estimation (Zero-DCE), which formulates light enhancement as a task of image-specific curve estimation with a deep network. |
178 | BlendedMVS: A Large-Scale Dataset for Generalized Multi-View Stereo Networks | Yao Yao; Zixin Luo; Shiwei Li; Jingyang Zhang; Yufan Ren; Lei Zhou; Tian Fang; Long Quan; | In this paper, we introduce BlendedMVS, a novel large-scale dataset, to provide sufficient training ground truth for learning-based MVS. |
179 | Convolution in the Cloud: Learning Deformable Kernels in 3D Graph Convolution Networks for Point Cloud Analysis | Zhi-Hao Lin; Sheng-Yu Huang; Yu-Chiang Frank Wang; | In this paper, we propose 3D Graph Convolution Networks (3D-GCN), which is designed to extract local 3D features from point clouds across scales, while shift and scale-invariance properties are introduced. |
180 | A Semi-Supervised Assessor of Neural Architectures | Yehui Tang; Yunhe Wang; Yixing Xu; Hanting Chen; Boxin Shi; Chao Xu; Chunjing Xu; Qi Tian; Chang Xu; | In contrast with classical performance predictors optimized in a fully supervised way, this paper suggests a semi-supervised assessor of neural architectures. |
181 | Learning a Reinforced Agent for Flexible Exposure Bracketing Selection | Zhouxia Wang; Jiawei Zhang; Mude Lin; Jiong Wang; Ping Luo; Jimmy Ren; | Unlike previous methods that have many restrictions such as requiring camera response function, sensor noise model, and a stream of preview images with different exposures (not accessible in some scenarios e.g. mobile applications), we propose a novel deep neural network to automatically select exposure bracketing, named EBSNet, which is sufficiently flexible without having the above restrictions. |
182 | CARS: Continuous Evolution for Efficient Neural Architecture Search | Zhaohui Yang; Yunhe Wang; Xinghao Chen; Boxin Shi; Chao Xu; Chunjing Xu; Qi Tian; Chang Xu; | In contrast, we develop an efficient continuous evolutionary approach for searching neural networks. |
183 | Joint 3D Instance Segmentation and Object Detection for Autonomous Driving | Dingfu Zhou; Jin Fang; Xibin Song; Liu Liu; Junbo Yin; Yuchao Dai; Hongdong Li; Ruigang Yang; | To tackle this problem, we propose a simple but practical detection framework to jointly predict the 3D BBox and instance segmentation. |
184 | View-GCN: View-Based Graph Convolutional Network for 3D Shape Analysis | Xin Wei; Ruixuan Yu; Jian Sun; | In this work, we propose a novel view-based Graph Convolutional Neural Network, dubbed view-GCN, to recognize 3D shape based on graph representation of multiple views in flexible view configurations. |
185 | Collaborative Distillation for Ultra-Resolution Universal Style Transfer | Huan Wang; Yijun Li; Yuehai Wang; Haoji Hu; Ming-Hsuan Yang; | In this work, we present a new knowledge distillation method (named Collaborative Distillation) for encoder-decoder based neural style transfer to reduce the convolutional filters. |
186 | TomoFluid: Reconstructing Dynamic Fluid From Sparse View Videos | Guangming Zang; Ramzi Idoughi; Congli Wang; Anthony Bennett; Jianguo Du; Scott Skeen; William L. Roberts; Peter Wonka; Wolfgang Heidrich; | In this paper, we present a state-of-the-art 4D tomographic reconstruction framework that integrates several regularizers into a multi-scale matrix free optimization algorithm. |
187 | Instance Shadow Detection | Tianyu Wang; Xiaowei Hu; Qiong Wang; Pheng-Ann Heng; Chi-Wing Fu; | We design LISA, named after Light-guided Instance Shadow-object Association, an end-to-end framework to automatically predict the shadow and object instances, together with the shadow-object associations and light direction. Then, we pair up the predicted shadow and object instances, and match them with the predicted shadow-object associations to generate the final results. |
188 | Self2Self With Dropout: Learning Self-Supervised Denoising From Single Image | Yuhui Quan; Mingqin Chen; Tongyao Pang; Hui Ji; | Taking one step further, this paper proposes a self-supervised learning method which only uses the input noisy image itself for training. |
189 | Discrete Model Compression With Resource Constraint for Deep Neural Networks | Shangqian Gao; Feihu Huang; Jian Pei; Heng Huang; | In this paper, we aim to address the problem of compression and acceleration of Convolutional Neural Networks (CNNs). |
190 | Structured Compression by Weight Encryption for Unstructured Pruning and Quantization | Se Jung Kwon; Dongsoo Lee; Byeongwook Kim; Parichay Kapoor; Baeseong Park; Gu-Yeon Wei; | This paper proposes a new weight representation scheme for Sparse Quantized Neural Networks, specifically achieved by a fine-grained and unstructured pruning method. |
191 | End-to-End Learning Local Multi-View Descriptors for 3D Point Clouds | Lei Li; Siyu Zhu; Hongbo Fu; Ping Tan; Chiew-Lan Tai; | In this work, we propose an end-to-end framework to learn local multi-view descriptors for 3D point clouds. |
192 | Minimal Solutions for Relative Pose With a Single Affine Correspondence | Banglei Guan; Ji Zhao; Zhang Li; Fang Sun; Friedrich Fraundorfer; | In this paper we present four cases of minimal solutions for two-view relative pose estimation by exploiting the affine transformation between feature points and we demonstrate efficient solvers for these cases. |
193 | Point Cloud Completion by Skip-Attention Network With Hierarchical Folding | Xin Wen; Tianyang Li; Zhizhong Han; Yu-Shen Liu; | To address this problem, we propose Skip-Attention Network (SA-Net) for 3D point cloud completion. |
194 | Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement | Zehao Yu; Shenghua Gao; | Towards this end, this paper presents Fast-MVSNet, a novel sparse-to-dense coarse-to-fine framework, for fast and accurate depth estimation in MVS. |
195 | AANet: Adaptive Aggregation Network for Efficient Stereo Matching | Haofei Xu; Juyong Zhang; | In this paper, we aim at completely replacing the commonly used 3D convolutions to achieve fast inference speed while maintaining comparable accuracy. |
196 | Towards Unified INT8 Training for Convolutional Neural Network | Feng Zhu; Ruihao Gong; Fengwei Yu; Xianglong Liu; Yanfei Wang; Zhelong Li; Xiuqi Yang; Junjie Yan; | In this paper, we attempt to build a unified 8-bit (INT8) training framework for common convolutional neural networks from the aspects of both accuracy and speed. |
197 | Active 3D Motion Visualization Based on Spatiotemporal Light-Ray Integration | Fumihiko Sakaue; Jun Sato; | In this paper, we propose a method of visualizing 3D motion with zero latency. |
198 | Block-Wisely Supervised Neural Architecture Search With Knowledge Distillation | Changlin Li; Jiefeng Peng; Liuchun Yuan; Guangrun Wang; Xiaodan Liang; Liang Lin; Xiaojun Chang; | In this work, we propose to modularize the large search space of NAS into blocks to ensure that the potential candidate architectures are fully trained; this reduces the representation shift caused by the shared parameters and leads to the correct rating of the candidates. |
199 | GreedyNAS: Towards Fast One-Shot NAS With Greedy Supernet | Shan You; Tao Huang; Mingmin Yang; Fei Wang; Chen Qian; Changshui Zhang; | In this paper, instead of covering all paths, we ease the burden of supernet by encouraging it to focus more on evaluation of those potentially-good ones, which are identified using a surrogate portion of validation data. |
200 | Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration | Yang He; Yuhang Ding; Ping Liu; Linchao Zhu; Hanwang Zhang; Yi Yang; | In this paper, we propose Learning Filter Pruning Criteria (LFPC) to solve the above problems. |
201 | DIST: Rendering Deep Implicit Signed Distance Function With Differentiable Sphere Tracing | Shaohui Liu; Yinda Zhang; Songyou Peng; Boxin Shi; Marc Pollefeys; Zhaopeng Cui; | We propose a differentiable sphere tracing algorithm to bridge the gap between inverse graphics methods and the recently proposed deep learning based implicit signed distance function. |
202 | Visually Imbalanced Stereo Matching | Yicun Liu; Jimmy Ren; Jiawei Zhang; Jianbo Liu; Mude Lin; | To avoid such collapse, we propose a solution to recover the stereopsis by a joint guided-view-restoration and stereo-reconstruction framework. |
203 | Mesh-Guided Multi-View Stereo With Pyramid Architecture | Yuesong Wang; Tao Guan; Zhuo Chen; Yawei Luo; Keyang Luo; Lili Ju; | To overcome this difficulty, we propose a mesh-guided MVS method with pyramid architecture, which makes use of the surface mesh obtained from coarse-scale images to guide the reconstruction process. |
204 | BiDet: An Efficient Binarized Object Detector | Ziwei Wang; Ziyi Wu; Jiwen Lu; Jie Zhou; | In this paper, we propose a binarized neural network learning method called BiDet for efficient object detection. |
205 | Local Non-Rigid Structure-From-Motion From Diffeomorphic Mappings | Shaifali Parashar; Mathieu Salzmann; Pascal Fua; | We propose a new formulation for non-rigid structure-from-motion that only requires the deforming surface to preserve its differential structure. |
206 | Seeing Around Street Corners: Non-Line-of-Sight Detection and Tracking In-the-Wild Using Doppler Radar | Nicolas Scheiner; Florian Kraus; Fangyin Wei; Buu Phan; Fahim Mannan; Nils Appenrodt; Werner Ritter; Jurgen Dickmann; Klaus Dietmayer; Bernhard Sick; Felix Heide; | In this work, we depart from visible-wavelength approaches and demonstrate detection, classification, and tracking of hidden objects in large-scale dynamic environments using Doppler radars that can be manufactured at low-cost in series production. |
207 | APQ: Joint Search for Network Architecture, Pruning and Quantization Policy | Tianzhe Wang; Kuan Wang; Han Cai; Ji Lin; Zhijian Liu; Hanrui Wang; Yujun Lin; Song Han; | We present APQ, a novel design methodology for efficient deep learning deployment. |
208 | On the Acceleration of Deep Learning Model Parallelism With Staleness | An Xu; Zhouyuan Huo; Heng Huang; | In this paper, we propose Layer-wise Staleness and a novel efficient training algorithm, Diversely Stale Parameters (DSP), to address these challenges. |
209 | RevealNet: Seeing Behind Objects in RGB-D Scans | Ji Hou; Angela Dai; Matthias Niessner; | We tackle this problem by introducing RevealNet, a new data-driven approach that jointly detects object instances and predicts their complete geometry. |
210 | MemNAS: Memory-Efficient Neural Architecture Search With Grow-Trim Learning | Peiye Liu; Bo Wu; Huadong Ma; Mingoo Seok; | To address this challenge, we propose MemNAS, a novel growing and trimming based neural architecture search framework that optimizes not only the performance but also the memory requirements of an inference network. |
211 | StegaStamp: Invisible Hyperlinks in Physical Photographs | Matthew Tancik; Ben Mildenhall; Ren Ng; | Our key technical contribution is StegaStamp, a learned steganographic algorithm to enable robust encoding and decoding of arbitrary hyperlink bitstrings into photos in a manner that approaches perceptual invisibility. |
212 | L2-GCN: Layer-Wise and Learned Efficient Training of Graph Convolutional Networks | Yuning You; Tianlong Chen; Zhangyang Wang; Yang Shen; | In this paper, we propose a novel efficient layer-wise training framework for GCN (L-GCN), that disentangles feature aggregation and feature transformation during training, hence greatly reducing time and memory complexities. |
213 | Polarized Non-Line-of-Sight Imaging | Kenichiro Tanaka; Yasuhiro Mukaigawa; Achuta Kadambi; | This paper presents a method of passive non-line-of-sight (NLOS) imaging using polarization cues. |
214 | AdaBits: Neural Network Quantization With Adaptive Bit-Widths | Qing Jin; Linjie Yang; Zhenyu Liao; | In this paper, we investigate a novel option to achieve this goal by enabling adaptive bit-widths of weights and activations in the model. |
215 | Multi-Scale Boosted Dehazing Network With Dense Feature Fusion | Hang Dong; Jinshan Pan; Lei Xiang; Zhe Hu; Xinyi Zhang; Fei Wang; Ming-Hsuan Yang; | In this paper, we propose a Multi-Scale Boosted Dehazing Network with Dense Feature Fusion based on the U-Net architecture. |
216 | ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings | Jiahui Huang; Sheng Yang; Tai-Jiang Mu; Shi-Min Hu; | We present ClusterVO, a stereo visual odometry method that simultaneously clusters and estimates the motion of both ego and surrounding rigid clusters/objects. |
217 | Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-Based Approach | Haichuan Yang; Shupeng Gui; Yuhao Zhu; Ji Liu; | In this paper, we propose a framework to jointly prune and quantize the DNNs automatically according to a target model size without using any hyper-parameters to manually set the compression ratio for each layer. |
218 | Normal Assisted Stereo Depth Estimation | Uday Kusupati; Shuo Cheng; Rui Chen; Hao Su; | In this paper, we study how to enforce the consistency between surface normal and depth at training time to improve the performance. |
219 | Fusing Wearable IMUs With Multi-View Images for Human Pose Estimation: A Geometric Approach | Zhe Zhang; Chunyu Wang; Wenhu Qin; Wenjun Zeng; | We present a geometric approach to reinforce the visual features of each pair of joints based on the IMUs. |
220 | gDLS*: Generalized Pose-and-Scale Estimation Given Scale and Gravity Priors | Victor Fragoso; Joseph DeGol; Gang Hua; | We present gDLS*, a generalized-camera-model pose-and-scale estimator that utilizes rotation and scale priors. |
221 | Embodied Language Grounding With 3D Visual Feature Representations | Mihir Prabhudesai; Hsiao-Yu Fish Tung; Syed Ashar Javed; Maximilian Sieb; Adam W. Harley; Katerina Fragkiadaki; | We present generative models that condition on the dependency tree of an utterance and generate a corresponding visual 3D feature map as well as reason about its plausibility, and detector models that condition on both the dependency tree of an utterance and a related image and localize the object referents in the 3D feature map inferred from the image. |
222 | Learning to Autofocus | Charles Herrmann; Richard Strong Bowen; Neal Wadhwa; Rahul Garg; Qiurui He; Jonathan T. Barron; Ramin Zabih; | We propose a learning-based approach to this problem, and provide a realistic dataset of sufficient size for effective learning. |
223 | Joint Demosaicing and Denoising With Self Guidance | Lin Liu; Xu Jia; Jianzhuang Liu; Qi Tian; | In this paper, we propose a self-guidance network (SGNet), where the green channels are initially estimated and then serve as guidance to recover all missing values in the input image. |
224 | Forward and Backward Information Retention for Accurate Binary Neural Networks | Haotong Qin; Ruihao Gong; Xianglong Liu; Mingzhu Shen; Ziran Wei; Fengwei Yu; Jingkuan Song; | To address these issues, we propose an Information Retention Network (IR-Net) to retain the information contained in the forward activations and backward gradients. |
225 | Light Field Spatial Super-Resolution via Deep Combinatorial Geometry Embedding and Structural Consistency Regularization | Jing Jin; Junhui Hou; Jie Chen; Sam Kwong; | In this paper, we propose a novel learning-based LF spatial SR framework, in which each view of an LF image is first individually super-resolved by exploring the complementary information among views with combinatorial geometry embedding. |
226 | A Multi-Hypothesis Approach to Color Constancy | Daniel Hernandez-Juarez; Sarah Parisot; Benjamin Busam; Ales Leonardis; Gregory Slabaugh; Steven McDonagh; | We propose a Bayesian framework that naturally handles color constancy ambiguity via a multi-hypothesis strategy. |
227 | Learning to Restore Low-Light Images via Decomposition-and-Enhancement | Ke Xu; Xin Yang; Baocai Yin; Rynson W.H. Lau; | Based on this model, we present a novel network that first learns to recover image objects in the low-frequency layer and then enhances high-frequency details based on the recovered image objects. |
228 | Background Matting: The World Is Your Green Screen | Soumyadip Sengupta; Vivek Jayaram; Brian Curless; Steven M. Seitz; Ira Kemelmacher-Shlizerman; | We propose a method for creating a matte – the per-pixel foreground color and alpha – of a person by taking photos or videos in an everyday setting with a handheld camera. |
229 | Supervised Raw Video Denoising With a Benchmark Dataset on Dynamic Scenes | Huanjing Yue; Cong Cao; Lei Liao; Ronghe Chu; Jingyu Yang; | In this paper, we solve this problem by creating motions for controllable objects, such as toys, and capturing each static moment multiple times to generate clean video frames. |
230 | Photometric Stereo via Discrete Hypothesis-and-Test Search | Kenji Enomoto; Michael Waechter; Kiriakos N. Kutulakos; Yasuyuki Matsushita; | In this paper, we consider the problem of estimating surface normals of a scene with spatially varying, general BRDFs observed by a static camera under varying, known, distant illumination. |
231 | Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference | Thomas Verelst; Tinne Tuytelaars; | To address this inefficiency, we propose a method to dynamically apply convolutions conditioned on the input image. |
232 | Fixed-Point Back-Propagation Training | Xishan Zhang; Shaoli Liu; Rui Zhang; Chang Liu; Di Huang; Shiyi Zhou; Jiaming Guo; Qi Guo; Zidong Du; Tian Zhi; Yunji Chen; | In this paper, we propose a novel training approach, which applies a layer-wise precision-adaptive quantization in deep neural networks. |
233 | Heterogeneous Knowledge Distillation Using Information Flow Modeling | Nikolaos Passalis; Maria Tzelepi; Anastasios Tefas; | In this paper, we propose a novel KD method that works by modeling the information flow through the various layers of the teacher model and then training a student model to mimic this information flow. |
234 | Rethinking Differentiable Search for Mixed-Precision Neural Networks | Zhaowei Cai; Nuno Vasconcelos; | In this work, the problem of optimal mixed-precision network search (MPS) is considered. |
235 | Residual Feature Aggregation Network for Image Super-Resolution | Jie Liu; Wenjie Zhang; Yuting Tang; Jie Tang; Gangshan Wu; | To address this issue, we propose a novel residual feature aggregation (RFA) framework for more efficient feature extraction. |
236 | Resolution Adaptive Networks for Efficient Inference | Le Yang; Yizeng Han; Xi Chen; Shiji Song; Jifeng Dai; Gao Huang; | In this paper, we focus on spatial redundancy of input samples and propose a novel Resolution Adaptive Network (RANet), which is inspired by the intuition that low-resolution representations are sufficient for classifying "easy" inputs containing large objects with prototypical features, while only some "hard" samples need spatially detailed information. |
237 | Learning to Forget for Meta-Learning | Sungyong Baik; Seokil Hong; Kyoung Mu Lee; | Thus, we propose task-and-layer-wise attenuation on the compromised initialization to reduce its influence. |
238 | Deep Learning for Handling Kernel/model Uncertainty in Image Deconvolution | Yuesong Nan; Hui Ji; | Based on an error-in-variable (EIV) model of image blurring that takes kernel error into consideration, this paper presents a deep learning method for deconvolution, which unrolls a total-least-squares (TLS) estimator whose associated priors are learned by neural networks (NNs). |
239 | Reflection Scene Separation From a Single Image | Renjie Wan; Boxin Shi; Haoliang Li; Ling-Yu Duan; Alex C. Kot; | In this paper, instead of removing reflection components from the mixture image, we aim at recovering reflection scenes from the mixture image. |
240 | Wavelet Synthesis Net for Disparity Estimation to Synthesize DSLR Calibre Bokeh Effect on Smartphones | Chenchi Luo; Yingmao Li; Kaimo Lin; George Chen; Seok-Jun Lee; Jihwan Choi; Youngjun Francis Yoo; Michael O. Polley; | Empowered by a novel wavelet synthesis network architecture, we have narrowed the gap between DSLR and smartphone cameras in terms of bokeh more than ever before. |
241 | Bundle Adjustment on a Graph Processor | Joseph Ortiz; Mark Pupilli; Stefan Leutenegger; Andrew J. Davison; | We show for the first time that the classical computer vision problem of bundle adjustment (BA) can be solved extremely fast on a graph processor using Gaussian Belief Propagation. |
242 | 3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset | Malte Pedersen; Joakim Bruslund Haurum; Stefan Hein Bengtson; Thomas B. Moeslund; | In this work we present a novel publicly available stereo based 3D RGB dataset for multi-object zebrafish tracking, called 3D-ZeF. |
243 | PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models | Sachit Menon; Alexandru Damian; Shijia Hu; Nikhil Ravi; Cynthia Rudin; | We present a novel super-resolution algorithm addressing this problem, PULSE (Photo Upsampling via Latent Space Exploration), which generates high-resolution, realistic images at resolutions previously unseen in the literature. |
244 | Scalability in Perception for Autonomous Driving: Waymo Open Dataset | Pei Sun; Henrik Kretzschmar; Xerxes Dotiwalla; Aurelien Chouard; Vijaysai Patnaik; Paul Tsui; James Guo; Yin Zhou; Yuning Chai; Benjamin Caine; Vijay Vasudevan; Wei Han; Jiquan Ngiam; Hang Zhao; Aleksei Timofeev; Scott Ettinger; Maxim Krivokon; Amy Gao; Aditya Joshi; Yu Zhang; Jonathon Shlens; Zhifeng Chen; Dragomir Anguelov; | In an effort to help align the research community’s contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset. |
245 | Extreme Relative Pose Network Under Hybrid Representations | Zhenpei Yang; Siming Yan; Qixing Huang; | In this paper, we introduce a novel RGB-D based relative pose estimation approach that is suitable for small-overlapping or non-overlapping scans and can output multiple relative poses. |
246 | Single-Shot Monocular RGB-D Imaging Using Uneven Double Refraction | Andreas Meuleman; Seung-Hwan Baek; Felix Heide; Min H. Kim; | In this work, we propose a method for monocular single-shot RGB-D imaging. |
247 | Inverse Rendering for Complex Indoor Scenes: Shape, Spatially-Varying Lighting and SVBRDF From a Single Image | Zhengqin Li; Mohammad Shafiei; Ravi Ramamoorthi; Kalyan Sunkavalli; Manmohan Chandraker; | We propose a deep inverse rendering framework for indoor scenes. |
248 | 3D Packing for Self-Supervised Monocular Depth Estimation | Vitor Guizilini; Rares Ambrus; Sudeep Pillai; Allan Raventos; Adrien Gaidon; | In this work, we propose a novel self-supervised monocular depth estimation method combining geometry with a new deep network, PackNet, learned only from unlabeled monocular videos. |
249 | Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching | Xiaodong Gu; Zhiwen Fan; Siyu Zhu; Zuozhuo Dai; Feitong Tan; Ping Tan; | In this paper, we propose a memory- and time-efficient cost volume formulation that is complementary to existing multi-view stereo and stereo matching approaches based on 3D cost volumes. |
250 | From Two Rolling Shutters to One Global Shutter | Cenek Albl; Zuzana Kukelova; Viktor Larsson; Michal Polic; Tomas Pajdla; Konrad Schindler; | We explore a surprisingly simple camera configuration that makes it possible to undo the rolling shutter distortion: two cameras mounted to have different rolling shutter directions. |
251 | Deep Global Registration | Christopher Choy; Wei Dong; Vladlen Koltun; | We present Deep Global Registration, a differentiable framework for pairwise registration of real-world 3D scans. |
252 | Deep Stereo Using Adaptive Thin Volume Representation With Uncertainty Awareness | Shuo Cheng; Zexiang Xu; Shilin Zhu; Zhuwen Li; Li Erran Li; Ravi Ramamoorthi; Hao Su; | We present Uncertainty-aware Cascaded Stereo Network (UCS-Net) for 3D reconstruction from multiple RGB images. |
253 | Why Having 10,000 Parameters in Your Camera Model Is Better Than Twelve | Thomas Schops; Viktor Larsson; Marc Pollefeys; Torsten Sattler; | We propose a calibration pipeline for generic models that is fully automated, easy to use, and can act as a drop-in replacement for parametric calibration, with a focus on accuracy. |
254 | Blur Aware Calibration of Multi-Focus Plenoptic Camera | Mathieu Labussiere; Celine Teuliere; Frederic Bernardin; Omar Ait-Aider; | This paper presents a novel calibration algorithm for Multi-Focus Plenoptic Cameras (MFPCs) using raw images only. |
255 | Learning Fused Pixel and Feature-Based View Reconstructions for Light Fields | Jinglei Shi; Xiaoran Jiang; Christine Guillemot; | In this paper, we present a learning-based framework for light field view synthesis from a subset of input views. |
256 | SAL: Sign Agnostic Learning of Shapes From Raw Data | Matan Atzmon; Yaron Lipman; | In this paper we introduce Sign Agnostic Learning (SAL), a deep learning approach for learning implicit shape representations directly from raw, unsigned geometric data, such as point clouds and triangle soups. |
257 | Google Landmarks Dataset v2 – A Large-Scale Benchmark for Instance-Level Recognition and Retrieval | Tobias Weyand; Andre Araujo; Bingyi Cao; Jack Sim; | We introduce the Google Landmarks Dataset v2 (GLDv2), a new benchmark for large-scale, fine-grained instance recognition and image retrieval in the domain of human-made and natural landmarks. |
258 | Instance Guided Proposal Network for Person Search | Wenkai Dong; Zhaoxiang Zhang; Chunfeng Song; Tieniu Tan; | In this paper, we propose a new detection network for person search, named Instance Guided Proposal Network (IGPN), which can learn the similarity between query persons and proposals. |
259 | Which Is Plagiarism: Fashion Image Retrieval Based on Regional Representation for Design Protection | Yining Lang; Yuan He; Fan Yang; Jianfeng Dong; Hui Xue; | Different from the existing works that mainly focus on identical or similar fashion item retrieval, in this paper, we aim to study plagiarized clothes retrieval, which has been largely overlooked by the academic community despite its great application value. |
260 | Inter-Task Association Critic for Cross-Resolution Person Re-Identification | Zhiyi Cheng; Qi Dong; Shaogang Gong; Xiatian Zhu; | In this paper, we introduce a novel model training regularisation method, called Inter-Task Association Critic (INTACT), to address this fundamental problem. |
261 | FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding | Dian Shao; Yue Zhao; Bo Dai; Dahua Lin; | To take action recognition to a new level, we develop FineGym, a new dataset built on top of gymnasium videos. |
262 | Mapillary Street-Level Sequences: A Dataset for Lifelong Place Recognition | Frederik Warburg; Soren Hauberg; Manuel Lopez-Antequera; Pau Gargallo; Yubin Kuang; Javier Civera; | We contribute Mapillary Street-Level Sequences (SLS), a large dataset for urban and suburban place recognition from image sequences. |
263 | BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning | Fisher Yu; Haofeng Chen; Xin Wang; Wenqi Xian; Yingying Chen; Fangchen Liu; Vashisht Madhavan; Trevor Darrell; | We construct BDD100K, the largest driving video dataset with 100K videos and 10 tasks to evaluate the exciting progress of image recognition algorithms on autonomous driving. |
264 | Rethinking Computer-Aided Tuberculosis Diagnosis | Yun Liu; Yu-Huan Wu; Yunfeng Ban; Huifang Wang; Ming-Ming Cheng; | To solve this problem, we establish a large-scale TB dataset, namely Tuberculosis X-ray (TBX11K) dataset. |
265 | IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning | Xi Yang; Ding Xia; Taichi Kin; Takeo Igarashi; | In this paper, instead of 2D medical images, we introduce an open-access 3D intracranial aneurysm dataset, IntrA, that enables the application of point-based and mesh-based classification and segmentation models. |
266 | Revisiting Saliency Metrics: Farthest-Neighbor Area Under Curve | Sen Jia; Neil D. B. Bruce; | In this paper, we propose a new metric to address the long-standing problem of center bias in saliency evaluation. |
267 | Computing the Testing Error Without a Testing Set | Ciprian A. Corneanu; Sergio Escalera; Aleix M. Martinez; | Here, we derive an algorithm to estimate the performance gap between training and testing without the need for a testing dataset. |
268 | Improving Confidence Estimates for Unfamiliar Examples | Zhizhong Li; Derek Hoiem; | In this paper, we compare and evaluate several methods to improve confidence estimates for unfamiliar and familiar samples. |
269 | CycleISP: Real Image Restoration via Improved Data Synthesis | Syed Waqas Zamir; Aditya Arora; Salman Khan; Munawar Hayat; Fahad Shahbaz Khan; Ming-Hsuan Yang; Ling Shao; | In this paper, we present a framework that models the camera imaging pipeline in forward and reverse directions. |
270 | Enhanced Blind Face Restoration With Multi-Exemplar Images and Adaptive Spatial Feature Fusion | Xiaoming Li; Wenyu Li; Dongwei Ren; Hongzhi Zhang; Meng Wang; Wangmeng Zuo; | To address these issues, this paper proposes enhancing blind face restoration performance by utilizing multi-exemplar images and adaptive fusion of features from guidance and degraded images. |
271 | Explorable Super Resolution | Yuval Bahat; Tomer Michaeli; | In this paper, we introduce the task of explorable super resolution. |
272 | Syn2Real Transfer Learning for Image Deraining Using Gaussian Processes | Rajeev Yasarla; Vishwanath A. Sindagi; Vishal M. Patel; | We propose a Gaussian Process-based semi-supervised learning framework which enables the network to learn to derain using a synthetic dataset while generalizing better using unlabeled real-world images. |
273 | Deblurring by Realistic Blurring | Kaihao Zhang; Wenhan Luo; Yiran Zhong; Lin Ma; Bjorn Stenger; Wei Liu; Hongdong Li; | To address this problem, we propose a new method which combines two GAN models, i.e., a learning-to-Blur GAN (BGAN) and learning-to-DeBlur GAN (DBGAN), in order to learn a better model for image deblurring by primarily learning how to blur images. |
274 | Bringing Old Photos Back to Life | Ziyu Wan; Bo Zhang; Dongdong Chen; Pan Zhang; Dong Chen; Jing Liao; Fang Wen; | We propose to restore old photos that suffer from severe degradation through a deep learning approach. |
275 | A Physics-Based Noise Formation Model for Extreme Low-Light Raw Denoising | Kaixuan Wei; Ying Fu; Jiaolong Yang; Hua Huang; | To address this issue, we present a highly accurate noise formation model based on the characteristics of CMOS photosensors, thereby enabling us to synthesize realistic samples that better match the physics of the image formation process. |
276 | Camouflaged Object Detection | Deng-Ping Fan; Ge-Peng Ji; Guolei Sun; Ming-Ming Cheng; Jianbing Shen; Ling Shao; | We present a comprehensive study on a new task named camouflaged object detection (COD), which aims to identify objects that are "seamlessly" embedded in their surroundings. |
277 | Holistically-Attracted Wireframe Parsing | Nan Xue; Tianfu Wu; Song Bai; Fudong Wang; Gui-Song Xia; Liangpei Zhang; Philip H.S. Torr; | This paper presents a fast and parsimonious parsing method to accurately and robustly detect a vectorized wireframe in an input image with a single forward pass. |
278 | Conv-MPN: Convolutional Message Passing Neural Network for Structured Outdoor Architecture Reconstruction | Fuyang Zhang; Nelson Nauata; Yasutaka Furukawa; | This paper proposes a novel message passing neural (MPN) architecture Conv-MPN, which reconstructs an outdoor building as a planar graph from a single RGB image. |
279 | Domain Adaptation for Image Dehazing | Yuanjie Shao; Lerenhan Li; Wenqi Ren; Changxin Gao; Nong Sang; | To address this issue, we propose a domain adaptation paradigm, which consists of an image translation module and two image dehazing modules. |
280 | Auto-Encoding Twin-Bottleneck Hashing | Yuming Shen; Jie Qin; Jiaxin Chen; Mengyang Yu; Li Liu; Fan Zhu; Fumin Shen; Ling Shao; | In this paper, we tackle the above problems by proposing an efficient and adaptive code-driven graph, which is updated by decoding in the context of an auto-encoder. |
281 | Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis | Mang Tik Chiu; Xingqian Xu; Yunchao Wei; Zilong Huang; Alexander G. Schwing; Robert Brunner; Hrant Khachatrian; Hovnatan Karapetyan; Ivan Dozier; Greg Rose; David Wilson; Adrian Tudor; Naira Hovakimyan; Thomas S. Huang; Honghui Shi; | To encourage research in computer vision for agriculture, we present Agriculture-Vision: a large-scale aerial farmland image dataset for semantic segmentation of agricultural patterns. |
282 | Bi-Directional Interaction Network for Person Search | Wenkai Dong; Zhaoxiang Zhang; Chunfeng Song; Tieniu Tan; | To address this issue, we propose a Siamese network with an additional instance-aware branch, named Bi-directional Interaction Network (BINet). |
283 | Meshlet Priors for 3D Mesh Reconstruction | Abhishek Badki; Orazio Gallo; Jan Kautz; Pradeep Sen; | We introduce meshlets, small patches of mesh that we use to learn local shape priors. |
284 | Space-Time-Aware Multi-Resolution Video Enhancement | Muhammad Haris; Greg Shakhnarovich; Norimichi Ukita; | We consider the problem of space-time super-resolution (ST-SR): increasing spatial resolution of video frames and simultaneously interpolating frames to increase the frame rate. |
285 | FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation | Xiang Li; Tianhan Wei; Yau Pun Chen; Yu-Wing Tai; Chi-Keung Tang; | In this paper, we are interested in few-shot object segmentation where the number of annotated training examples is limited to only 5. To evaluate and validate the performance of our approach, we have built a few-shot segmentation dataset, FSS-1000, which consists of 1000 object classes with pixelwise annotation of ground-truth segmentation. |
286 | MSeg: A Composite Dataset for Multi-Domain Semantic Segmentation | John Lambert; Zhuang Liu; Ozan Sener; James Hays; Vladlen Koltun; | We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains. |
287 | Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification | Yichao Yan; Jie Qin; Jiaxin Chen; Li Liu; Fan Zhu; Ying Tai; Ling Shao; | In this work, we propose a novel graph-based framework, namely Multi-Granular Hypergraph (MGH), to pursue better representational capabilities by modeling spatiotemporal dependencies in terms of multiple granularities. |
288 | Online Joint Multi-Metric Adaptation From Frequent Sharing-Subset Mining for Person Re-Identification | Jiahuan Zhou; Bing Su; Ying Wu; | Therefore, we propose an online joint multi-metric adaptation model to adapt the offline learned P-RID models for the online data by learning a series of metrics for all the sharing-subsets. |
289 | Taking a Deeper Look at Co-Salient Object Detection | Deng-Ping Fan; Zheng Lin; Ge-Peng Ji; Dingwen Zhang; Huazhu Fu; Ming-Ming Cheng; | To tackle this issue, we first collect a new high-quality dataset, named CoSOD3k, which contains 3,316 images divided into 160 groups with multiple level annotations, i.e., category, bounding box, object, and instance levels. |
290 | Single-Stage 6D Object Pose Estimation | Yinlin Hu; Pascal Fua; Wei Wang; Mathieu Salzmann; | In this work, we introduce a deep architecture that directly regresses 6D poses from correspondences. |
291 | OccuSeg: Occupancy-Aware 3D Instance Segmentation | Lei Han; Tian Zheng; Lan Xu; Lu Fang; | In this paper, we define “3D occupancy size” as the number of voxels occupied by each instance. Because this quantity can be predicted robustly, we build on it to propose OccuSeg, an occupancy-aware 3D instance segmentation scheme. |
292 | Camera Trace Erasing | Chang Chen; Zhiwei Xiong; Xiaoming Liu; Feng Wu; | In this paper, we address a new low-level vision problem, camera trace erasing, to reveal the weakness of trace-based forensic methods. |
293 | Deep Metric Learning via Adaptive Learnable Assessment | Wenzhao Zheng; Jiwen Lu; Jie Zhou; | In this paper, we propose a deep metric learning via adaptive learnable assessment (DML-ALA) method for image retrieval and clustering, which aims to learn a sample assessment strategy to maximize the generalization of the trained metric. |
294 | Deep Representation Learning on Long-Tailed Data: A Learnable Embedding Augmentation Perspective | Jialun Liu; Yifan Sun; Chuchu Han; Zhaopeng Dou; Wenhui Li; | To this end, we propose to augment each instance of the tail classes with certain disturbances in the deep feature space. |
295 | Fantastic Answers and Where to Find Them: Immersive Question-Directed Visual Attention | Ming Jiang; Shi Chen; Jinhui Yang; Qi Zhao; | Specifically, we introduce the first dataset of top-down attention in immersive scenes. |
296 | HUMBI: A Large Multiview Dataset of Human Body Expressions | Zhixuan Yu; Jae Shin Yoon; In Kyu Lee; Prashanth Venkatesh; Jaesik Park; Jihun Yu; Hyun Soo Park; | This paper presents a new large multiview dataset called HUMBI for human body expressions with natural clothing. |
297 | Image Search With Text Feedback by Visiolinguistic Attention Learning | Yanbei Chen; Shaogang Gong; Loris Bazzani; | In this work, we tackle this task by a novel Visiolinguistic Attention Learning (VAL) framework. |
298 | Image Processing Using Multi-Code GAN Prior | Jinjin Gu; Yujun Shen; Bolei Zhou; | In this work, we propose a novel approach, called mGANprior, to incorporate the well-trained GANs as effective prior to a variety of image processing tasks. |
299 | What Does Plate Glass Reveal About Camera Calibration? | Qian Zheng; Jinnan Chen; Zhan Lu; Boxin Shi; Xudong Jiang; Kim-Hui Yap; Ling-Yu Duan; Alex C. Kot; | This paper aims to calibrate the orientation of glass and the field of view of the camera from a single reflection-contaminated image. We collect a dataset containing 320 samples as well as their camera parameters for evaluation. |
300 | Zero-Assignment Constraint for Graph Matching With Outliers | Fudong Wang; Nan Xue; Jin-Gang Yu; Gui-Song Xia; | To address this issue, we present the zero-assignment constraint (ZAC) for approaching the graph matching problem in the presence of outliers. |
301 | Cascaded Deep Video Deblurring Using Temporal Sharpness Prior | Jinshan Pan; Haoran Bai; Jinhui Tang; | We present a simple and effective deep convolutional neural network (CNN) model for video deblurring. |
302 | JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection | Keren Fu; Deng-Ping Fan; Ge-Peng Ji; Qijun Zhao; | This paper proposes a novel joint learning and densely-cooperative fusion (JL-DCF) architecture for RGB-D salient object detection. |
303 | From Fidelity to Perceptual Quality: A Semi-Supervised Approach for Low-Light Image Enhancement | Wenhan Yang; Shiqi Wang; Yuming Fang; Yue Wang; Jiaying Liu; | To address these problems, we propose a novel semi-supervised learning approach for low-light image enhancement. |
304 | Unsupervised Adaptation Learning for Hyperspectral Imagery Super-Resolution | Lei Zhang; Jiangtao Nie; Wei Wei; Yanning Zhang; Shengcai Liao; Ling Shao; | To tackle this problem, we present an unsupervised adaptation learning (UAL) framework. |
305 | Central Similarity Quantization for Efficient Image and Video Retrieval | Li Yuan; Tao Wang; Xiaopeng Zhang; Francis EH Tay; Zequn Jie; Wei Liu; Jiashi Feng; | In this work, we propose a new global similarity metric, termed central similarity, with which the hash codes of similar data pairs are encouraged to approach a common center and those for dissimilar pairs to converge to different centers, to improve hash learning efficiency and retrieval accuracy. |
306 | ARCH: Animatable Reconstruction of Clothed Humans | Zeng Huang; Yuanlu Xu; Christoph Lassner; Hao Li; Tony Tung; | In this paper, we propose ARCH (Animatable Reconstruction of Clothed Humans), a novel end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image. |
307 | A Model-Driven Deep Neural Network for Single Image Rain Removal | Hong Wang; Qi Xie; Qian Zhao; Deyu Meng; | To address this issue, we propose a model-driven deep neural network for the task, with fully interpretable network structures. |
308 | Novel Object Viewpoint Estimation Through Reconstruction Alignment | Mohamed El Banani; Jason J. Corso; David F. Fouhey; | The goal of this paper is to estimate the viewpoint for a novel object. |
309 | Creating Something From Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing | Hengtong Hu; Lingxi Xie; Richang Hong; Qi Tian; | In this paper, we propose a novel approach that enables guiding a supervised method using outputs produced by an unsupervised method. |
310 | Evaluating Weakly Supervised Object Localization Methods Right | Junsuk Choe; Seong Joon Oh; Seungho Lee; Sanghyuk Chun; Zeynep Akata; Hyunjung Shim; | In this paper, we argue that the WSOL task is ill-posed with only image-level labels, and propose a new evaluation protocol where full supervision is limited to only a small held-out set not overlapping with the test set. |
311 | Style Normalization and Restitution for Generalizable Person Re-Identification | Xin Jin; Cuiling Lan; Wenjun Zeng; Zhibo Chen; Li Zhang; | In this paper, we aim to design a generalizable person ReID framework which trains a model on source domains yet is able to generalize/perform well on target domains. |
312 | Reconstruct Locally, Localize Globally: A Model Free Method for Object Pose Estimation | Ming Cai; Ian Reid; | Instead, we propose a learning-based method whose input is a collection of images of a target object, and whose output is the pose of the object in a novel view. |
313 | RoboTHOR: An Open Simulation-to-Real Embodied AI Platform | Matt Deitke; Winson Han; Alvaro Herrasti; Aniruddha Kembhavi; Eric Kolve; Roozbeh Mottaghi; Jordi Salvador; Dustin Schwenk; Eli VanderBilt; Matthew Wallingford; Luca Weihs; Mark Yatskar; Ali Farhadi; | In this paper, we introduce RoboTHOR to democratize research in interactive and embodied visual AI. |
314 | All in One Bad Weather Removal Using Architectural Search | Ruoteng Li; Robby T. Tan; Loong-Fah Cheong; | In this paper, we propose a method that can handle multiple bad weather degradations: rain, fog, snow and adherent raindrops using a single network. |
315 | Relation-Aware Global Attention for Person Re-Identification | Zhizheng Zhang; Cuiling Lan; Wenjun Zeng; Xin Jin; Zhibo Chen; | In this work, we propose an effective Relation-Aware Global Attention (RGA) module which captures the global structural information for better attention learning. |
316 | HOnnotate: A Method for 3D Annotation of Hand and Object Poses | Shreyas Hampali; Mahdi Rad; Markus Oberweger; Vincent Lepetit; | We propose a method for annotating images of a hand manipulating an object with the 3D poses of both the hand and the object, together with a dataset created using this method. |
317 | Celeb-DF: A Large-Scale Challenging Dataset for DeepFake Forensics | Yuezun Li; Xin Yang; Pu Sun; Honggang Qi; Siwei Lyu; | We present a new large-scale challenging DeepFake video dataset, Celeb-DF, which contains 5,639 high-quality DeepFake videos of celebrities generated using an improved synthesis process. |
318 | Deep Unfolding Network for Image Super-Resolution | Kai Zhang; Luc Van Gool; Radu Timofte; | To address this issue, this paper proposes an end-to-end trainable unfolding network which leverages both learning-based methods and model-based methods. |
319 | On the Uncertainty of Self-Supervised Monocular Depth Estimation | Matteo Poggi; Filippo Aleotti; Fabio Tosi; Stefano Mattoccia; | To this end, we explore for the first time how to estimate the uncertainty for this task and how it affects depth accuracy, proposing a novel technique specifically designed for self-supervised approaches. |
320 | Proxy Anchor Loss for Deep Metric Learning | Sungyeon Kim; Dongwon Kim; Minsu Cho; Suha Kwak; | This paper presents a new proxy-based loss that takes advantage of both pair- and proxy-based methods and overcomes their limitations. |
321 | Unsupervised Learning for Intrinsic Image Decomposition From a Single Image | Yunfei Liu; Yu Li; Shaodi You; Feng Lu; | In this paper, we propose a novel unsupervised intrinsic image decomposition framework, which relies on neither labeled training data nor hand-crafted priors. |
322 | Multi-Domain Learning for Accurate and Few-Shot Color Constancy | Jin Xiao; Shuhang Gu; Lei Zhang; | In this paper, we present pioneering work that introduces multi-domain learning to the color constancy area. |
323 | PANDA: A Gigapixel-Level Human-Centric Video Dataset | Xueyang Wang; Xiya Zhang; Yinheng Zhu; Yuchen Guo; Xiaoyun Yuan; Liuyu Xiang; Zerun Wang; Guiguang Ding; David Brady; Qionghai Dai; Lu Fang; | We present PANDA, the first gigaPixel-level humAN-centric viDeo dAtaset, for large-scale, long-term, and multi-object visual analysis. |
324 | Cross-View Tracking for Multi-Human 3D Pose Estimation at Over 100 FPS | Long Chen; Haizhou Ai; Rui Chen; Zijie Zhuang; Shuang Liu; | In this paper, we present a novel solution for multi-human 3D pose estimation from multiple calibrated camera views. |
325 | Spatial-Temporal Graph Convolutional Network for Video-Based Person Re-Identification | Jinrui Yang; Wei-Shi Zheng; Qize Yang; Ying-Cong Chen; Qi Tian; | In this work, we propose a novel Spatial-Temporal Graph Convolutional Network (STGCN) to solve these problems. |
326 | Salience-Guided Cascaded Suppression Network for Person Re-Identification | Xuesong Chen; Canmiao Fu; Yong Zhao; Feng Zheng; Jingkuan Song; Rongrong Ji; Yi Yang; | To handle this limitation, we propose a novel Salience-guided Cascaded Suppression Network (SCSN) which enables the model to mine diverse salient features and integrate these features into the final representation in a cascaded manner. |
327 | Fashion Outfit Complementary Item Retrieval | Yen-Liang Lin; Son Tran; Larry S. Davis; | We propose a new framework for outfit complementary item retrieval. |
328 | Learning Event-Based Motion Deblurring | Zhe Jiang; Yu Zhang; Dongqing Zou; Jimmy Ren; Jiancheng Lv; Yebin Liu; | In this paper, we start from a sequential formulation of event-based motion deblurring, then show how its optimization can be unfolded with a novel end-to-end deep architecture. |
329 | Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation | Yunhan Zhao; Shu Kong; Daeyun Shin; Charless Fowlkes; | Based on these observations, we develop an attention module that learns to identify and remove difficult out-of-domain regions in real images in order to improve depth prediction for a model trained primarily on synthetic data. |
330 | Neural Blind Deconvolution Using Deep Priors | Dongwei Ren; Kai Zhang; Qilong Wang; Qinghua Hu; Wangmeng Zuo; | To connect MAP and deep models, in this paper we present two generative networks that model the deep priors of the clean image and the blur kernel, respectively, and propose an unconstrained neural optimization solution to blind deconvolution. |
331 | Anisotropic Convolutional Networks for 3D Semantic Scene Completion | Jie Li; Kai Han; Peng Wang; Yu Liu; Xia Yuan; | To handle such variations, we propose a novel module called anisotropic convolution, which offers flexibility and power that competing methods, such as standard 3D convolution and some of its variations, lack. |
332 | TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution | Yapeng Tian; Yulun Zhang; Yun Fu; Chenliang Xu; | To overcome the limitation, we propose a temporally-deformable alignment network (TDAN) to adaptively align the reference frame and each supporting frame at the feature level without computing optical flow. |
333 | Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution | Xiaoyu Xiang; Yapeng Tian; Yulun Zhang; Yun Fu; Jan P. Allebach; Chenliang Xu; | In this paper, we explore the space-time video super-resolution task, which aims to generate a high-resolution (HR) slow-motion video from a low frame rate (LFR), low-resolution (LR) video. |
334 | Fast MSER | Hailiang Xu; Siqi Xie; Fan Chen; | In this paper, we propose two novel MSER algorithms, called Fast MSER V1 and V2. |
335 | Unsupervised Person Re-Identification via Softened Similarity Learning | Yutian Lin; Lingxi Xie; Yu Wu; Chenggang Yan; Qi Tian; | In this paper, we follow the iterative training mechanism but discard clustering, since it incurs loss from hard quantization, yet its only product, image-level similarity, can be easily replaced by pairwise computation and a softened classification task. |
336 | COCAS: A Large-Scale Clothes Changing Person Dataset for Re-Identification | Shijie Yu; Shihua Li; Dapeng Chen; Rui Zhao; Junjie Yan; Yu Qiao; | To address the clothes changing person re-id problem, we construct a novel large-scale re-id benchmark named Clothes Changing Person Set (COCAS), which provides multiple images of the same identity with different clothes. |
337 | Learning Formation of Physically-Based Face Attributes | Ruilong Li; Karl Bladin; Yajie Zhao; Chinmay Chinara; Owen Ingraham; Pengda Xiang; Xinglei Ren; Pratusha Prasad; Bipin Kishore; Jun Xing; Hao Li; | Based on a combined data set of 4000 high resolution facial scans, we introduce a non-linear morphable face model, capable of producing multifarious face geometry of pore-level resolution, coupled with material attributes for use in physically-based rendering. |
338 | Generalized Product Quantization Network for Semi-Supervised Image Retrieval | Young Kyun Jang; Nam Ik Cho; | To resolve this issue, we propose the first quantization-based semi-supervised image retrieval scheme: Generalized Product Quantization (GPQ) network. |
339 | Stereoscopic Flash and No-Flash Photography for Shape and Albedo Recovery | Xu Cao; Michael Waechter; Boxin Shi; Ye Gao; Bo Zheng; Yasuyuki Matsushita; | We present a minimal imaging setup that harnesses both geometric and photometric approaches for shape and albedo recovery. |
340 | Context-Aware Group Captioning via Self-Attention and Contrastive Features | Zhuowan Li; Quan Tran; Long Mai; Zhe Lin; Alan L. Yuille; | To solve this problem, we propose a framework combining self-attention mechanism with contrastive feature construction to effectively summarize common information from each image group while capturing discriminative information between them. |
341 | MEBOW: Monocular Estimation of Body Orientation in the Wild | Chenyan Wu; Yukun Chen; Jiajia Luo; Che-Chun Su; Anuja Dawane; Bikramjot Hanzra; Zhuo Deng; Bilan Liu; James Z. Wang; Cheng-hao Kuo; | We present COCO-MEBOW (Monocular Estimation of Body Orientation in the Wild), a new large-scale dataset for orientation estimation from a single in-the-wild image. |
342 | Distilling Image Dehazing With Heterogeneous Task Imitation | Ming Hong; Yuan Xie; Cuihua Li; Yanyun Qu; | In this paper, we propose a knowledge-distillation dehazing network which distills image dehazing via heterogeneous task imitation. |
343 | Select, Supplement and Focus for RGB-D Saliency Detection | Miao Zhang; Weisong Ren; Yongri Piao; Zhengkun Rong; Huchuan Lu; | In this paper, we propose a new framework for accurate RGB-D saliency detection that takes into account local and global complementarities between the two modalities. |
344 | Transfer Learning From Synthetic to Real-Noise Denoising With Adaptive Instance Normalization | Yoonsik Kim; Jae Woong Soh; Gu Yong Park; Nam Ik Cho; | In order to cope with diverse and complex real noise, we propose a well-generalized denoising architecture and a transfer learning scheme. |
345 | On Joint Estimation of Pose, Geometry and svBRDF From a Handheld Scanner | Carolin Schmitt; Simon Donne; Gernot Riegler; Vladlen Koltun; Andreas Geiger; | We propose a novel formulation for joint recovery of camera pose, object geometry and spatially-varying BRDF. |
346 | Differentiable Volumetric Rendering: Learning Implicit 3D Representations Without 3D Supervision | Michael Niemeyer; Lars Mescheder; Michael Oechsle; Andreas Geiger; | In this work, we propose a differentiable rendering formulation for implicit shape and texture representations. |
347 | Meta-Transfer Learning for Zero-Shot Super-Resolution | Jae Woong Soh; Sunwoo Cho; Nam Ik Cho; | In this paper, we present Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR), which leverages ZSSR. |
348 | Solving Jigsaw Puzzles With Eroded Boundaries | Dov Bridger; Dov Danon; Ayellet Tal; | This paper focuses on a specific variant of the problem: solving puzzles with eroded boundaries. |
349 | Context-Aware Attention Network for Image-Text Retrieval | Qi Zhang; Zhen Lei; Zhaoxiang Zhang; Stan Z. Li; | In this work, we propose a unified Context-Aware Attention Network (CAAN), which selectively focuses on critical local fragments (regions and words) by aggregating the global context. |
350 | M-LVC: Multiple Frames Prediction for Learned Video Compression | Jianping Lin; Dong Liu; Houqiang Li; Feng Wu; | We propose an end-to-end learned video compression scheme for low-latency scenarios. |
351 | Efficient Dynamic Scene Deblurring Using Spatially Variant Deconvolution Network With Optical Flow Guided Training | Yuan Yuan; Wei Su; Dandan Ma; | In this paper, we start from the deblurring deconvolution operation, then design an effective and real-time deblurring network. |
352 | Single Image Reflection Removal Through Cascaded Refinement | Chao Li; Yixiao Yang; Kun He; Stephen Lin; John E. Hopcroft; | Inspired by iterative structure reduction for hidden community detection in social networks, we propose an Iterative Boost Convolutional LSTM Network (IBCLN) that enables cascaded prediction for reflection removal. |
353 | From Patches to Pictures (PaQ-2-PiQ): Mapping the Perceptual Space of Picture Quality | Zhenqiang Ying; Haoran Niu; Praful Gupta; Dhruv Mahajan; Deepti Ghadiyaram; Alan Bovik; | To advance progress on this problem, we introduce the largest (by far) subjective picture quality database, containing about 40,000 real-world distorted pictures and 120,000 patches, on which we collected about 4M human judgments of picture quality. |
354 | Video to Events: Recycling Video Datasets for Event Cameras | Daniel Gehrig; Mathias Gehrig; Javier Hidalgo-Carrio; Davide Scaramuzza; | In this paper, we present a method that addresses these needs by converting any existing video dataset recorded with conventional cameras to synthetic event data. |
355 | Composed Query Image Retrieval Using Locally Bounded Features | Mehrdad Hosseinzadeh; Yang Wang; | In this paper, we propose a novel method that represents the image using a set of local areas in the image. |
356 | Spatially-Attentive Patch-Hierarchical Network for Adaptive Motion Deblurring | Maitreya Suin; Kuldeep Purohit; A. N. Rajagopalan; | In this work, we propose an efficient pixel-adaptive and feature-attentive design that handles large blur variations across different spatial locations and processes each test image adaptively. |
357 | End-to-End Illuminant Estimation Based on Deep Metric Learning | Bolei Xu; Jingxin Liu; Xianxu Hou; Bozhi Liu; Guoping Qiu; | To overcome this problem, we introduce a deep metric learning approach for color constancy named Illuminant-Guided Triplet Network (IGTN). |
358 | Variational-EM-Based Deep Learning for Noise-Blind Image Deblurring | Yuesong Nan; Yuhui Quan; Hui Ji; | This paper aims at developing a deep learning framework for deblurring images with unknown noise level. |
359 | Image Demoireing with Learnable Bandpass Filters | Bolun Zheng; Shanxin Yuan; Gregory Slabaugh; Ales Leonardis; | In this paper, we propose a novel multiscale bandpass convolutional neural network (MBCNN) to address this problem. |
360 | Assessing Image Quality Issues for Real-World Problems | Tai-Yin Chiu; Yinan Zhao; Danna Gurari; | We introduce a new large-scale dataset that links the assessment of image quality issues to two practical vision tasks: image captioning and visual question answering. |
361 | Memory-Efficient Hierarchical Neural Architecture Search for Image Denoising | Haokui Zhang; Ying Li; Hao Chen; Chunhua Shen; | In this paper, we propose HiNAS (Hierarchical NAS), an effort towards employing NAS to automatically design effective neural network architectures for image denoising. |
362 | Blindly Assess Image Quality in the Wild Guided by a Self-Adaptive Hyper Network | Shaolin Su; Qingsen Yan; Yu Zhu; Cheng Zhang; Xin Ge; Jinqiu Sun; Yanning Zhang; | To deal with the challenge, we propose a self-adaptive hyper network architecture to blindly assess image quality in the wild. |
363 | Perceptual Quality Assessment of Smartphone Photography | Yuming Fang; Hanwei Zhu; Yan Zeng; Kede Ma; Zhou Wang; | We introduce the Smartphone Photography Attribute and Quality (SPAQ) database, consisting of 11,125 pictures taken by 66 smartphones, where each image comes with the richest annotations to date. |
364 | Don’t Hit Me! Glass Detection in Real-World Scenes | Haiyang Mei; Xin Yang; Yang Wang; Yuanyuan Liu; Shengfeng He; Qiang Zhang; Xiaopeng Wei; Rynson W.H. Lau; | In this paper, we address the important problem of detecting glass from a single RGB image. |
365 | Progressive Mirror Detection | Jiaying Lin; Guodong Wang; Rynson W.H. Lau; | Hence, we propose a model in this paper to progressively learn the content similarity between the inside and outside of the mirror while explicitly detecting the mirror edges. |
366 | Category-Level Articulated Object Pose Estimation | Xiaolong Li; He Wang; Li Yi; Leonidas J. Guibas; A. Lynn Abbott; Shuran Song; | We present a novel category-level approach that correctly accommodates object instances previously unseen during training. |
367 | Unbiased Scene Graph Generation From Biased Training | Kaihua Tang; Yulei Niu; Jianqiang Huang; Jiaxin Shi; Hanwang Zhang; | In this paper, we present a novel SGG framework based on causal inference rather than the conventional likelihood. |
368 | Dynamic Graph Message Passing Networks | Li Zhang; Dan Xu; Anurag Arnab; Philip H.S. Torr; | We propose a dynamic graph message passing network, that significantly reduces the computational complexity compared to related works modelling a fully-connected graph. |
369 | Weakly Supervised Visual Semantic Parsing | Alireza Zareian; Svebor Karaman; Shih-Fu Chang; | In this paper, we address those two limitations by first proposing a generalized formulation of SGG, namely Visual Semantic Parsing, which disentangles entity and predicate recognition, and enables sub-quadratic performance. |
370 | GPS-Net: Graph Property Sensing Network for Scene Graph Generation | Xin Lin; Changxing Ding; Jinquan Zeng; Dacheng Tao; | Accordingly, in this paper, we propose a Graph Property Sensing Network (GPS-Net) that fully explores these three properties for SGG. |
371 | End-to-End Optimization of Scene Layout | Andrew Luo; Zhoutong Zhang; Jiajun Wu; Joshua B. Tenenbaum; | We propose an end-to-end variational generative model for scene layout synthesis conditioned on scene graphs. |
372 | Unsupervised Intra-Domain Adaptation for Semantic Segmentation Through Self-Supervision | Fei Pan; Inkyu Shin; Francois Rameau; Seokju Lee; In So Kweon; | In this work, we propose a two-step self-supervised domain adaptation approach to minimize the inter-domain and intra-domain gap together. |
373 | Dual Super-Resolution Learning for Semantic Segmentation | Li Wang; Dong Li; Yousong Zhu; Lu Tian; Yi Shan; | In this paper, we propose a simple and flexible two-stream framework named Dual Super-Resolution Learning (DSRL) to effectively improve the segmentation accuracy without introducing extra computation costs. |
374 | Self-Supervised Scene De-Occlusion | Xiaohang Zhan; Xingang Pan; Bo Dai; Ziwei Liu; Dahua Lin; Chen Change Loy; | In this paper, we investigate the problem of scene de-occlusion, which aims to recover the underlying occlusion ordering and complete the invisible parts of occluded objects. |
375 | BANet: Bidirectional Aggregation Network With Occlusion Handling for Panoptic Segmentation | Yifeng Chen; Guangchen Lin; Songyuan Li; Omar Bourahla; Yiming Wu; Fangfang Wang; Junyi Feng; Mingliang Xu; Xi Li; | Motivated by these observations, we propose a novel deep panoptic segmentation scheme based on a bidirectional learning pipeline. |
376 | CPR-GCN: Conditional Partial-Residual Graph Convolutional Network in Automated Anatomical Labeling of Coronary Arteries | Han Yang; Xingjian Zhen; Ying Chi; Lei Zhang; Xian-Sheng Hua; | Motivated by the wide application of the graph neural network in structured data, in this paper, we propose a conditional partial-residual graph convolutional network (CPR-GCN), which takes both position and CT image into consideration, since CT image contains abundant information such as branch size and spanning direction. |
377 | Cross-View Correspondence Reasoning Based on Bipartite Graph Convolutional Network for Mammogram Mass Detection | Yuhang Liu; Fandong Zhang; Qianyi Zhang; Siwen Wang; Yizhou Wang; Yizhou Yu; | In this paper, we introduce bipartite graph convolutional network to endow existing methods with cross-view reasoning ability of radiologists in mammogram mass detection. |
378 | MPM: Joint Representation of Motion and Position Map for Cell Tracking | Junya Hayashida; Kazuya Nishimura; Ryoma Bise; | In this paper, we propose the Motion and Position Map (MPM) that jointly represents both detection and association for not only migration but also cell division. |
379 | Deep Distance Transform for Tubular Structure Segmentation in CT Scans | Yan Wang; Xu Wei; Fengze Liu; Jieneng Chen; Yuyin Zhou; Wei Shen; Elliot K. Fishman; Alan L. Yuille; | Inspired by this, we propose a geometry-aware tubular structure segmentation method, Deep Distance Transform (DDT), which combines intuitions from the classical distance transform for skeletonization and modern deep segmentation networks. |
380 | Instance Segmentation of Biological Images Using Harmonic Embeddings | Victor Kulikov; Victor Lempitsky; | We present a new instance segmentation approach tailored to biological images, where instances may correspond to individual cells, organisms or plant parts. |
381 | Multi-scale Domain-adversarial Multiple-instance CNN for Cancer Subtype Classification with Unannotated Histopathological Images | Noriaki Hashimoto; Daisuke Fukushima; Ryoichi Koga; Yusuke Takagi; Kaho Ko; Kei Kohno; Masato Nakaguro; Shigeo Nakamura; Hidekata Hontani; Ichiro Takeuchi; | We propose a new method for cancer subtype classification from histopathological images, which can automatically detect tumor-specific features in a given whole slide image (WSI). |
382 | SOS: Selective Objective Switch for Rapid Immunofluorescence Whole Slide Image Classification | Sam Maksoud; Kun Zhao; Peter Hobson; Anthony Jennings; Brian C. Lovell; | In this paper, we demonstrate that conventional patch-based processing is redundant for certain WSI classification tasks where high resolution is only required in a minority of cases. |
383 | Task Agnostic Robust Learning on Corrupt Outputs by Correlation-Guided Mixture Density Networks | Sungjoon Choi; Sanghoon Hong; Kyungjae Lee; Sungbin Lim; | In this paper, we focus on weakly supervised learning with noisy training data for both classification and regression problems. |
384 | METAL: Minimum Effort Temporal Activity Localization in Untrimmed Videos | Da Zhang; Xiyang Dai; Yuan-Fang Wang; | Towards this objective, we propose a novel Similarity Pyramid Network (SPN) that adopts the few-shot learning technique of Relation Network and directly encodes hierarchical multi-scale correlations, which we learn by optimizing two complementary loss functions in an end-to-end manner. |
385 | Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data | Xi Yan; David Acuna; Sanja Fidler; | We introduce Neural Data Server (NDS), a large-scale search engine for finding the most useful transfer learning data for the target domain. |
386 | Revisiting Knowledge Distillation via Label Smoothing Regularization | Li Yuan; Francis EH Tay; Guilin Li; Tao Wang; Jiashi Feng; | In this work, we challenge this common belief with the following experimental observations: 1) beyond the acknowledgment that the teacher can improve the student, the student can also enhance the teacher significantly by reversing the KD procedure; 2) a poorly-trained teacher with much lower accuracy than the student can still improve the latter significantly. |
387 | WCP: Worst-Case Perturbations for Semi-Supervised Deep Learning | Liheng Zhang; Guo-Jun Qi; | In this paper, we present a novel regularization mechanism for training deep networks by minimizing the Worst-Case Perturbation (WCP). |
388 | DEPARA: Deep Attribution Graph for Deep Knowledge Transferability | Jie Song; Yixin Chen; Jingwen Ye; Xinchao Wang; Chengchao Shen; Feng Mao; Mingli Song; | In this paper, we propose the DEeP Attribution gRAph (DEPARA) to investigate the transferability of knowledge learned from PR-DNNs. |
389 | Conditional Channel Gated Networks for Task-Aware Continual Learning | Davide Abati; Jakub Tomczak; Tijmen Blankevoort; Simone Calderara; Rita Cucchiara; Babak Ehteshami Bejnordi; | In this work, we introduce a novel framework to tackle this problem with conditional computation. |
390 | Towards Discriminability and Diversity: Batch Nuclear-Norm Maximization Under Label Insufficient Situations | Shuhao Cui; Shuhui Wang; Junbao Zhuo; Liang Li; Qingming Huang; Qi Tian; | Accordingly, to improve both discriminability and diversity, we propose Batch Nuclear-norm Maximization (BNM) on the output matrix. |
391 | FocalMix: Semi-Supervised Learning for 3D Medical Image Detection | Dong Wang; Yuan Zhang; Kexin Zhang; Liwei Wang; | In this paper, we propose a novel method, called FocalMix, which, to the best of our knowledge, is the first to leverage recent advances in semi-supervised learning (SSL) for 3D medical image detection. |
392 | Learning 3D Semantic Scene Graphs From 3D Indoor Reconstructions | Johanna Wald; Helisa Dhamo; Nassir Navab; Federico Tombari; | In our work, we focus on scene graphs, a data structure that organizes the entities of a scene in a graph, where objects are nodes and their relationships are modeled as edges. |
393 | Self-Supervised Viewpoint Learning From Image Collections | Siva Karthik Mustikovela; Varun Jampani; Shalini De Mello; Sifei Liu; Umar Iqbal; Carsten Rother; Jan Kautz; | We propose a novel learning framework which incorporates an analysis-by-synthesis paradigm to reconstruct images in a viewpoint-aware manner with a generative network, along with symmetry and adversarial constraints to successfully supervise our viewpoint estimation network. |
394 | Two-Shot Spatially-Varying BRDF and Shape Estimation | Mark Boss; Varun Jampani; Kihwan Kim; Hendrik P.A. Lensch; Jan Kautz; | We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF. |
395 | Variational Context-Deformable ConvNets for Indoor Scene Parsing | Zhitong Xiong; Yuan Yuan; Nianhui Guo; Qi Wang; | Thus, in this paper, we propose a novel variational context-deformable (VCD) module to learn adaptive receptive fields in a structured fashion. |
396 | Strip Pooling: Rethinking Spatial Pooling for Scene Parsing | Qibin Hou; Li Zhang; Ming-Ming Cheng; Jiashi Feng; | In this paper, beyond conventional spatial pooling that usually has a regular shape of NxN, we rethink the formulation of spatial pooling by introducing a new pooling strategy, called strip pooling, which considers a long but narrow kernel, i.e., 1xN or Nx1. |
397 | Few-Shot Object Detection With Attention-RPN and Multi-Relation Detector | Qi Fan; Wei Zhuo; Chi-Keung Tang; Yu-Wing Tai; | In this paper, we propose a novel few-shot object detection network that aims at detecting objects of unseen categories with only a few annotated examples. |
398 | What Can Be Transferred: Unsupervised Domain Adaptation for Endoscopic Lesions Segmentation | Jiahua Dong; Yang Cong; Gan Sun; Bineng Zhong; Xiaowei Xu; | To address these challenges, we develop a new unsupervised semantic transfer model including two complementary modules (i.e., T_D and T_F) for endoscopic lesions segmentation, which can alternatively determine where and how to explore transferable domain-invariant knowledge between labeled source lesions dataset (e.g., gastroscope) and unlabeled target diseases dataset (e.g., enteroscopy). |
399 | ADINet: Attribute Driven Incremental Network for Retinal Image Classification | Qier Meng; Satoh Shin’ichi; | In this paper, we design a framework named "Attribute Driven Incremental Network" (ADINet), a new architecture that integrates class label prediction and attribute prediction into an incremental learning framework to boost the classification performance. |
400 | Unsupervised Domain Adaptation With Hierarchical Gradient Synchronization | Lanqing Hu; Meina Kan; Shiguang Shan; Xilin Chen; | Inspired by this, we propose a novel method called Hierarchical Gradient Synchronization to model the synchronization relationship among the local distribution pieces and global distribution, aiming for more precise domain-invariant features. |
401 | Deep Grouping Model for Unified Perceptual Parsing | Zhiheng Li; Wenxuan Bao; Jiayang Zheng; Chenliang Xu; | To overcome these challenges, we propose a deep grouping model (DGM) that tightly marries the two types of representations and defines a bottom-up and a top-down process for feature exchanging. |
402 | Where Am I Looking At? Joint Location and Orientation Estimation by Cross-View Matching | Yujiao Shi; Xin Yu; Dylan Campbell; Hongdong Li; | Therefore, we design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization. |
403 | Gum-Net: Unsupervised Geometric Matching for Fast and Accurate 3D Subtomogram Image Alignment and Averaging | Xiangrui Zeng; Min Xu; | We propose a Geometric unsupervised matching Network (Gum-Net) for finding the geometric correspondence between two images with application to 3D subtomogram alignment and averaging. |
404 | FDA: Fourier Domain Adaptation for Semantic Segmentation | Yanchao Yang; Stefano Soatto; | We describe a simple method for unsupervised domain adaptation, whereby the discrepancy between the source and target distributions is reduced by swapping the low-frequency spectrum of one with the other. |
405 | Foreground-Aware Relation Network for Geospatial Object Segmentation in High Spatial Resolution Remote Sensing Imagery | Zhuo Zheng; Yanfei Zhong; Junjue Wang; Ailong Ma; | In this paper, we argue that the problems lie in the lack of foreground modeling and propose a foreground-aware relation network (FarSeg) from the perspectives of relation-based and optimization-based foreground modeling, to alleviate the above two problems. |
406 | When2com: Multi-Agent Perception via Communication Graph Grouping | Yen-Cheng Liu; Junjiao Tian; Nathaniel Glaser; Zsolt Kira; | In this paper, we address the collaborative perception problem, where one agent is required to perform a perception task and can communicate and share information with other agents on the same task. |
407 | Learning Human-Object Interaction Detection Using Interaction Points | Tiancai Wang; Tong Yang; Martin Danelljan; Fahad Shahbaz Khan; Xiangyu Zhang; Jian Sun; | In this paper, we therefore propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs. |
408 | C2FNAS: Coarse-to-Fine Neural Architecture Search for 3D Medical Image Segmentation | Qihang Yu; Dong Yang; Holger Roth; Yutong Bai; Yixiao Zhang; Alan L. Yuille; Daguang Xu; | In this paper, we propose a coarse-to-fine neural architecture search (C2FNAS) to automatically search a 3D segmentation network from scratch without inconsistency on network size or input size. |
409 | Adaptive Subspaces for Few-Shot Learning | Christian Simon; Piotr Koniusz; Richard Nock; Mehrtash Harandi; | In this paper, we provide a framework for few-shot learning by introducing dynamic classifiers that are constructed from few samples. |
410 | Learning to Detect Important People in Unlabelled Images for Semi-Supervised Important People Detection | Fa-Ting Hong; Wei-Hong Li; Wei-Shi Zheng; | To overcome this problem, we propose learning important people detection on partially annotated images. |
411 | Stochastic Sparse Subspace Clustering | Ying Chen; Chun-Guang Li; Chong You; | In particular, we show that dropout is equivalent to adding a squared l_2 norm regularization on the representation coefficients, and therefore induces denser solutions. Then, we reformulate the optimization problem as a consensus problem over a set of small-scale subproblems. |
412 | CRNet: Cross-Reference Networks for Few-Shot Segmentation | Weide Liu; Chi Zhang; Guosheng Lin; Fayao Liu; | In this paper, we propose a cross-reference network (CRNet) for few-shot segmentation. |
413 | Shoestring: Graph-Based Semi-Supervised Classification With Severely Limited Labeled Data | Wanyu Lin; Zhaolin Gao; Baochun Li; | To address the problem of semi-supervised learning in the presence of severely limited labeled samples, we propose a new framework, called Shoestring, that incorporates metric learning into the paradigm of graph-based semi-supervised learning. |
414 | Uninformed Students: Student-Teacher Anomaly Detection With Discriminative Latent Embeddings | Paul Bergmann; Michael Fauser; David Sattlegger; Carsten Steger; | We introduce a powerful student-teacher framework for the challenging problem of unsupervised anomaly detection and pixel-precise anomaly segmentation in high-resolution images. |
415 | 3D Sketch-Aware Semantic Scene Completion via Semi-Supervised Structure Prior | Xiaokang Chen; Kwan-Yee Lin; Chen Qian; Gang Zeng; Hongsheng Li; | In this paper, we propose a new geometry-based strategy to embed depth information with a low-resolution voxel representation, which can still encode sufficient geometric information, e.g., room layout and objects’ sizes and shapes, to infer the invisible areas of the scene with well-preserved structural details. |
416 | Graph-Guided Architecture Search for Real-Time Semantic Segmentation | Peiwen Lin; Peng Sun; Guangliang Cheng; Sirui Xie; Xi Li; Jianping Shi; | To relieve researchers of these tedious mechanical trials, we propose a Graph-guided Architecture Search (GAS) pipeline to automatically search real-time semantic segmentation networks. |
417 | Composing Good Shots by Exploiting Mutual Relations | Debang Li; Junge Zhang; Kaiqi Huang; Ming-Hsuan Yang; | Motivated by this, we propose a graph-based module with a gated feature update to model the relations between different candidates. |
418 | Organ at Risk Segmentation for Head and Neck Cancer Using Stratified Learning and Neural Architecture Search | Dazhou Guo; Dakai Jin; Zhuotun Zhu; Tsung-Ying Ho; Adam P. Harrison; Chun-Hung Chao; Jing Xiao; Le Lu; | For such scenarios, insights can be gained from the stratification approaches seen in manual clinical OAR delineation. This is the goal of our work, where we introduce stratified organ at risk segmentation (SOARS), an approach that stratifies OARs into anchor, mid-level, and small & hard (S&H) categories. |
419 | G2L-Net: Global to Local Network for Real-Time 6D Pose Estimation With Embedding Vector Features | Wei Chen; Xi Jia; Hyung Jin Chang; Jinming Duan; Ales Leonardis; | In this paper, we propose a novel real-time 6D object pose estimation framework, named G2L-Net. |
420 | Unsupervised Instance Segmentation in Microscopy Images via Panoptic Domain Adaptation and Task Re-Weighting | Dongnan Liu; Donghao Zhang; Yang Song; Fan Zhang; Lauren O’Donnell; Heng Huang; Mei Chen; Weidong Cai; | In this work, we propose a Cycle Consistency Panoptic Domain Adaptive Mask R-CNN (CyC-PDAM) architecture for unsupervised nuclei segmentation in histopathology images, by learning from fluorescence microscopy images. |
421 | Single-Stage Semantic Segmentation From Image Labels | Nikita Araslanov; Stefan Roth; | In this work, we first define three desirable properties of a weakly supervised method: local consistency, semantic fidelity, and completeness. Using these properties as guidelines, we then develop a segmentation-based network model and a self-supervised training scheme to train for semantic masks from image-level annotations in a single stage. |
422 | Cascaded Human-Object Interaction Recognition | Tianfei Zhou; Wenguan Wang; Siyuan Qi; Haibin Ling; Jianbing Shen; | Considering the intrinsic complexity of the task, we introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding. |
423 | DuDoRNet: Learning a Dual-Domain Recurrent Network for Fast MRI Reconstruction With Deep T1 Prior | Bo Zhou; S. Kevin Zhou; | In this work, we address the above two limitations by proposing a Dual Domain Recurrent Network (DuDoRNet) with deep T1 prior embedded to simultaneously recover k-space and images for accelerating the acquisition of MRI with a long imaging protocol. |
424 | Learning Integral Objects With Intra-Class Discriminator for Weakly-Supervised Semantic Segmentation | Junsong Fan; Zhaoxiang Zhang; Chunfeng Song; Tieniu Tan; | In this paper, we argue that the critical factor preventing us from obtaining the full object mask is the classification boundary mismatch problem in applying the CAM to WSSS. |
425 | FPConv: Learning Local Flattening for Point Convolution | Yiqun Lin; Zizheng Yan; Haibin Huang; Dong Du; Ligang Liu; Shuguang Cui; Xiaoguang Han; | We introduce FPConv, a novel surface-style convolution operator designed for 3D point cloud analysis. |
426 | Rotation Equivariant Graph Convolutional Network for Spherical Image Classification | Qin Yang; Chenglin Li; Wenrui Dai; Junni Zou; Guo-Jun Qi; Hongkai Xiong; | In this paper, we generalize the grid-based CNNs to a non-Euclidean space by taking into account the geometry of spherical surfaces and propose a Spherical Graph Convolutional Network (SGCN) to encode rotation equivariant representations. |
427 | FOAL: Fast Online Adaptive Learning for Cardiac Motion Estimation | Hanchao Yu; Shanhui Sun; Haichao Yu; Xiao Chen; Honghui Shi; Thomas S. Huang; Terrence Chen; | In this context, we propose a novel fast online adaptive learning (FOAL) framework: an online gradient descent based optimizer that is optimized by a meta-learner. |
428 | ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation | Sharon Fogel; Hadar Averbuch-Elor; Sarel Cohen; Shai Mazor; Roee Litman; | We present ScrabbleGAN, a semi-supervised approach to synthesize handwritten text images that are versatile both in style and lexicon. |
429 | Cross-Domain Semantic Segmentation via Domain-Invariant Interactive Relation Transfer | Fengmao Lv; Tao Liang; Xiang Chen; Guosheng Lin; | In this paper, we propose a new domain adaptation approach, called Pivot Interaction Transfer (PIT). |
430 | Inflated Episodic Memory With Region Self-Attention for Long-Tailed Visual Recognition | Linchao Zhu; Yi Yang; | To deal with the class imbalance problem, we introduce an Inflated Episodic Memory (IEM) for long-tailed visual recognition. |
431 | Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View With a Reachability Prior | Osama Makansi; Ozgun Cicek; Kevin Buchicchio; Thomas Brox; | In this paper, we investigate the problem of anticipating future dynamics, particularly the future location of other vehicles and pedestrians, in the view of a moving vehicle. |
432 | Structure Preserving Generative Cross-Domain Learning | Haifeng Xia; Zhengming Ding; | To this end, we develop a novel framework, Generative cross-domain learning via Structure-Preserving (GSP), which attempts to transform target data into the source domain in order to take advantage of source supervision. |
433 | Reverse Perspective Network for Perspective-Aware Object Counting | Yifan Yang; Guorong Li; Zhe Wu; Li Su; Qingming Huang; Nicu Sebe; | We propose a reverse perspective network to address the scale variations of input images, instead of generating perspective maps to smooth final outputs. |
434 | Multi-Path Region Mining for Weakly Supervised 3D Semantic Segmentation on Point Clouds | Jiacheng Wei; Guosheng Lin; Kim-Hui Yap; Tzu-Yi Hung; Lihua Xie; | In this paper, we propose a weakly supervised approach to predict point-level results using weak labels on 3D point clouds. |
435 | Reliable Weighted Optimal Transport for Unsupervised Domain Adaptation | Renjun Xu; Pelen Liu; Liyan Wang; Chao Chen; Jindong Wang; | In this paper, we present Reliable Weighted Optimal Transport (RWOT) for unsupervised domain adaptation, including a novel Shrinking Subspace Reliability (SSR) and a weighted optimal transport strategy. |
436 | ImVoteNet: Boosting 3D Object Detection in Point Clouds With Image Votes | Charles R. Qi; Xinlei Chen; Or Litany; Leonidas J. Guibas; | In this work, we build on top of VoteNet and propose a 3D detection architecture called ImVoteNet specialized for RGB-D scenes. |
437 | Understanding Road Layout From Videos as a Whole | Buyu Liu; Bingbing Zhuang; Samuel Schulter; Pan Ji; Manmohan Chandraker; | In this paper, we address the problem of inferring the layout of complex road scenes from video sequences. |
438 | Bi-Directional Relationship Inferring Network for Referring Image Segmentation | Zhiwei Hu; Guang Feng; Jiayu Sun; Lihe Zhang; Huchuan Lu; | In this work, we propose a bi-directional relationship inferring network (BRINet) to model the dependencies of cross-modal information. |
439 | Perspective Plane Program Induction From a Single Image | Yikai Li; Jiayuan Mao; Xiuming Zhang; William T. Freeman; Joshua B. Tenenbaum; Jiajun Wu; | We formulate this problem as jointly finding the camera pose and scene structure that best describe the input image. |
440 | DeepFLASH: An Efficient Network for Learning-Based Medical Image Registration | Jian Wang; Miaomiao Zhang; | This paper presents DeepFLASH, a novel network with efficient training and inference for learning-based medical image registration. |
441 | Semi-Supervised Learning for Few-Shot Image-to-Image Translation | Yaxing Wang; Salman Khan; Abel Gonzalez-Garcia; Joost van de Weijer; Fahad Shahbaz Khan; | In this work, we go one step further and also reduce the amount of labeled data required from the source domain during training. |
442 | Semantic Correspondence as an Optimal Transport Problem | Yanbin Liu; Linchao Zhu; Makoto Yamada; Yi Yang; | The whole procedure is combined into a unified optimal transport algorithm by converting the maximization problem to the optimal transport formulation and incorporating the staircase weights into the optimal transport algorithm to act as empirical distributions. |
443 | How Much Time Do You Have? Modeling Multi-Duration Saliency | Camilo Fosco; Anelise Newman; Pat Sukhum; Yun Bin Zhang; Nanxuan Zhao; Aude Oliva; Zoya Bylinskii; | In this paper we propose to capture gaze as a series of snapshots, by generating population-level saliency heatmaps for multiple viewing durations. We collect the CodeCharts1K dataset, which contains multiple distinct heatmaps per image corresponding to 0.5, 3, and 5 seconds of free-viewing. |
444 | Fine-Grained Generalized Zero-Shot Learning via Dense Attribute-Based Attention | Dat Huynh; Ehsan Elhamifar; | Instead of aligning a global feature vector of an image with its associated class semantic vector, we propose an attribute embedding technique that aligns each attribute-based feature with its attribute semantic vector. |
445 | Online Depth Learning Against Forgetting in Monocular Videos | Zhenyu Zhang; Stephane Lathuiliere; Elisa Ricci; Nicu Sebe; Yan Yan; Jian Yang; | Specifically, to adapt to temporally continuous depth patterns in videos, we introduce a novel meta-learning approach to learn adapter modules by incorporating the online adaptation process into the learning objective. |
446 | Few-Shot Learning of Part-Specific Probability Space for 3D Shape Segmentation | Lingjing Wang; Xiang Li; Yi Fang; | In comparison, we propose a novel 3D shape segmentation method that requires little labeled data for training. |
447 | Pattern-Structure Diffusion for Multi-Task Learning | Ling Zhou; Zhen Cui; Chunyan Xu; Zhenyu Zhang; Chaoqun Wang; Tong Zhang; Jian Yang; | Inspired by the observation that pattern structures recur frequently both within a task and across tasks, we propose a pattern-structure diffusion (PSD) framework to mine and propagate task-specific and cross-task pattern structures in the task-level space for joint depth estimation, segmentation and surface normal prediction. |
448 | Training Noise-Robust Deep Neural Networks via Meta-Learning | Zhen Wang; Guosheng Hu; Qinghua Hu; | In this work, we propose a new loss correction approach, named Meta Loss Correction (MLC), to directly learn the noise transition matrix T from data via the meta-learning framework. |
449 | Fusion-Aware Point Convolution for Online Semantic 3D Scene Segmentation | Jiazhao Zhang; Chenyang Zhu; Lintao Zheng; Kai Xu; | We propose a novel fusion-aware 3D point convolution which operates directly on the geometric surface being reconstructed and effectively exploits the inter-frame correlation for high-quality 3D feature learning. |
450 | Universal Source-Free Domain Adaptation | Jogendra Nath Kundu; Naveen Venkat; Rahul M V; R. Venkatesh Babu; | Free of such impractical assumptions, we propose a novel two-stage learning process. |
451 | Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction | Beibei Jin; Yu Hu; Qiankun Tang; Jingyu Niu; Zhiping Shi; Yinhe Han; Xiaowei Li; | Inspired by the frequency band decomposition characteristic of the Human Visual System (HVS), we propose a video prediction network based on multi-level wavelet analysis to uniformly deal with spatial and temporal information. |
452 | Varicolored Image De-Hazing | Akshay Dudhane; Kuldeep M. Biradar; Prashant W. Patil; Praful Hambarde; Subrahmanyam Murala; | In this paper, we propose a varicolored end-to-end image de-hazing network which restores the color balance in a given varicolored hazy image and recovers the haze-free image. |
453 | SpSequenceNet: Semantic Segmentation Network on 4D Point Clouds | Hanyu Shi; Guosheng Lin; Hao Wang; Tzu-Yi Hung; Zhenhua Wang; | In this paper, we propose SpSequenceNet to address this problem. |
454 | Separating Particulate Matter From a Single Microscopic Image | Tushar Sandhan; Jin Young Choi; | In this work, we thoroughly analyze the physical properties of PM, the microscope, and their inevitable interaction, and propose an optimization scheme that removes the PM from a high-resolution microscopic image within a few seconds. |
455 | Adaptive Dilated Network With Self-Correction Supervision for Counting | Shuai Bai; Zhiqun He; Yu Qiao; Hanzhe Hu; Wei Wu; Junjie Yan; | In this paper, we propose an adaptive dilated convolution and a novel supervised learning framework named self-correction (SC) supervision. |
456 | PointPainting: Sequential Fusion for 3D Object Detection | Sourabh Vora; Alex H. Lang; Bassam Helou; Oscar Beijbom; | In this work, we propose PointPainting: a sequential fusion method to fill this gap. |
457 | Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications | Biagio Brattoli; Joseph Tighe; Fedor Zhdanov; Pietro Perona; Krzysztof Chalupka; | We propose the first end-to-end algorithm for zero-shot learning (ZSL) in video classification. |
458 | Learning to Select Base Classes for Few-Shot Classification | Linjun Zhou; Peng Cui; Xu Jia; Shiqiang Yang; Qi Tian; | In this paper, we utilize a simple yet effective measure, the Similarity Ratio, as an indicator for the generalization performance of a few-shot model. We then formulate the base class selection problem as a submodular optimization problem over Similarity Ratio. |
459 | CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus | Florian Kluger; Eric Brachmann; Hanno Ackermann; Carsten Rother; Michael Ying Yang; Bodo Rosenhahn; | We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements. |
460 | Fast Symmetric Diffeomorphic Image Registration with Convolutional Neural Networks | Tony C.W. Mok; Albert C.S. Chung; | In this paper, we present a novel, efficient unsupervised symmetric image registration method which maximizes the similarity between images within the space of diffeomorphic maps and estimates both forward and inverse transformations simultaneously. |
461 | Distilled Semantics for Comprehensive Scene Understanding from Videos | Fabio Tosi; Filippo Aleotti; Pierluigi Zama Ramirez; Matteo Poggi; Samuele Salti; Luigi Di Stefano; Stefano Mattoccia; | In this paper, we take an additional step toward holistic scene understanding with monocular cameras by learning depth and motion alongside semantics, with supervision for the latter provided by a pre-trained network distilling proxy ground truth images. |
462 | Modeling Biological Immunity to Adversarial Examples | Edward Kim; Jocelyn Rego; Yijing Watkins; Garrett T. Kenyon; | In this work, we explore this gap through the lens of biology and neuroscience in order to understand the robustness exhibited in human perception. |
463 | DOA-GAN: Dual-Order Attentive Generative Adversarial Network for Image Copy-Move Forgery Detection and Localization | Ashraful Islam; Chengjiang Long; Arslan Basharat; Anthony Hoogs; | In this paper, we propose a Generative Adversarial Network with a dual-order attention model to detect and localize copy-move forgeries. |
464 | Correspondence-Free Material Reconstruction using Sparse Surface Constraints | Sebastian Weiss; Robert Maier; Daniel Cremers; Rudiger Westermann; Nils Thuerey; | We present a method to infer physical material parameters, and even external boundaries, from the scanned motion of a homogeneous deformable object via the solution of an inverse problem. |
465 | Augmenting Colonoscopy Using Extended and Directional CycleGAN for Lossy Image Translation | Shawn Mathew; Saad Nadeem; Sruti Kumari; Arie Kaufman; | In this paper, we present a deep learning framework, Extended and Directional CycleGAN, for lossy unpaired image-to-image translation between optical colonoscopy (OC) and virtual colonoscopy (VC), to augment OC video sequences with scale-consistent depth information from VC, and VC with patient-specific textures, color and specular highlights from OC (e.g., for realistic polyp synthesis). |
466 | Attention Scaling for Crowd Counting | Xiaoheng Jiang; Li Zhang; Mingliang Xu; Tianzhu Zhang; Pei Lv; Bing Zhou; Xin Yang; Yanwei Pang; | To overcome this problem, we propose an approach to alleviate the counting performance differences in different regions. |
467 | Shape Reconstruction by Learning Differentiable Surface Representations | Jan Bednarik; Shaifali Parashar; Erhan Gundogdu; Mathieu Salzmann; Pascal Fua; | In this paper, we show that we can exploit the inherent differentiability of deep networks to leverage differential surface properties during training so as to prevent patch collapse and strongly reduce patch overlap. |
468 | A Spatiotemporal Volumetric Interpolation Network for 4D Dynamic Medical Image | Yuyu Guo; Lei Bi; Euijoon Ahn; Dagan Feng; Qian Wang; Jinman Kim; | In this paper, we present a spatiotemporal volumetric interpolation network (SVIN) designed for 4D dynamic medical images. |
469 | Attention-Based Context Aware Reasoning for Situation Recognition | Thilini Cooray; Ngai-Man Cheung; Wei Lu; | Inspired by the success achieved by query-based visual reasoning (e.g., Visual Question Answering), we propose to address semantic role prediction as a query-based visual reasoning problem. |
470 | PatchVAE: Learning Local Latent Codes for Recognition | Kamal Gupta; Saurabh Singh; Abhinav Shrivastava; | Drawing inspiration from the mid-level representation discovery work, we propose PatchVAE, which reasons about images at the patch level. |
471 | Self-Supervised Monocular Trained Depth Estimation Using Self-Attention and Discrete Disparity Volume | Adrian Johnston; Gustavo Carneiro; | In this paper, we propose two new ideas to improve self-supervised monocular trained depth estimation: 1) self-attention, and 2) discrete disparity prediction. |
472 | STAViS: Spatio-Temporal AudioVisual Saliency Network | Antigoni Tsiami; Petros Koutras; Petros Maragos; | We introduce STAViS, a spatio-temporal audiovisual saliency network that combines spatio-temporal visual and auditory information in order to efficiently address the problem of saliency estimation in videos. |
473 | More Grounded Image Captioning by Distilling Image-Text Matching Model | Yuanen Zhou; Meng Wang; Daqing Liu; Zhenzhen Hu; Hanwang Zhang; | To this end, we propose POS-SCAN, a Part-of-Speech (POS) enhanced version of the image-text matching model SCAN, and use it for effective knowledge distillation toward more grounded image captioning. |
474 | DUNIT: Detection-Based Unsupervised Image-to-Image Translation | Deblina Bhattacharjee; Seungryong Kim; Guillaume Vizier; Mathieu Salzmann; | In this paper, we introduce a Detection-based Unsupervised Image-to-image Translation (DUNIT) approach that explicitly accounts for the object instances in the translation process. |
475 | Learning to Observe: Approximating Human Perceptual Thresholds for Detection of Suprathreshold Image Transformations | Alan Dolhasz; Carlo Harvey; Ian Williams; | In this paper, we propose to directly approximate the perceptual function performed by human observers completing a visual detection task. |
476 | Show, Edit and Tell: A Framework for Editing Image Captions | Fawaz Sammani; Luke Melas-Kyriazi; | This paper proposes a novel approach to image captioning based on iterative adaptive refinement of an existing caption. |
477 | Structure Boundary Preserving Segmentation for Medical Image With Ambiguous Boundary | Hong Joo Lee; Jung Uk Kim; Sangmin Lee; Hak Gu Kim; Yong Man Ro; | In this paper, we propose a novel image segmentation method to tackle two critical problems of medical images: (i) ambiguity of structure boundaries in the medical image domain and (ii) uncertainty of the segmented region without specialized domain knowledge. |
478 | Predicting Cognitive Declines Using Longitudinally Enriched Representations for Imaging Biomarkers | Lyujian Lu; Hua Wang; Saad Elbeleidy; Feiping Nie; | To tackle this problem, in this paper we propose a novel formulation to learn an enriched representation for imaging biomarkers that can simultaneously capture both the information conveyed by baseline neuroimaging records and that conveyed by progressive variations across a varying number of available follow-up records over time. |
479 | Predicting Lymph Node Metastasis Using Histopathological Images Based on Multiple Instance Learning With Deep Graph Convolution | Yu Zhao; Fan Yang; Yuqi Fang; Hailing Liu; Niyun Zhou; Jun Zhang; Jiarui Sun; Sen Yang; Bjoern Menze; Xinjuan Fan; Jianhua Yao; | In this paper, we propose a multiple instance learning method based on deep graph convolutional network and feature selection (FS-GCN-MIL) for histopathological image classification. |
480 | Extremely Dense Point Correspondences Using a Learned Feature Descriptor | Xingtong Liu; Yiping Zheng; Benjamin Killeen; Masaru Ishii; Gregory D. Hager; Russell H. Taylor; Mathias Unberath; | In this work, we present an effective self-supervised training scheme and novel loss design for dense descriptor learning. |
481 | Local Deep Implicit Functions for 3D Shape | Kyle Genova; Forrester Cole; Avneesh Sud; Aaron Sarna; Thomas Funkhouser; | Towards this end, we introduce Local Deep Implicit Functions (LDIF), a 3D shape representation that decomposes space into a structured set of learned implicit functions. |
482 | PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation | Li Jiang; Hengshuang Zhao; Shaoshuai Shi; Shu Liu; Chi-Wing Fu; Jiaya Jia; | In this paper, we present PointGroup, a new end-to-end bottom-up architecture, specifically focused on better grouping the points by exploring the void space between objects. |
483 | Cost Volume Pyramid Based Depth Inference for Multi-View Stereo | Jiayu Yang; Wei Mao; Jose M. Alvarez; Miaomiao Liu; | We propose a cost volume-based neural network for depth inference from multi-view images. |
484 | RoutedFusion: Learning Real-Time Depth Map Fusion | Silvan Weder; Johannes Schonberger; Marc Pollefeys; Martin R. Oswald; | To this end, we present a novel real-time capable machine learning-based method for depth map fusion. |
485 | VOLDOR: Visual Odometry From Log-Logistic Dense Optical Flow Residuals | Zhixiang Min; Yiding Yang; Enrique Dunn; | We propose a dense indirect visual odometry method taking as input externally estimated optical flow fields instead of hand-crafted feature correspondences. |
486 | Learning to Optimize Non-Rigid Tracking | Yang Li; Aljaz Bozic; Tianwei Zhang; Yanli Ji; Tatsuya Harada; Matthias Niessner; | In this paper, we employ learnable optimizations to improve tracking robustness and speed up solver convergence. |
487 | KFNet: Learning Temporal Camera Relocalization Using Kalman Filtering | Lei Zhou; Zixin Luo; Tianwei Shen; Jiahui Zhang; Mingmin Zhen; Yao Yao; Tian Fang; Long Quan; | In this work, we improve the temporal relocalization method by using a network architecture that incorporates Kalman filtering (KFNet) for online camera relocalization. |
488 | Information-Driven Direct RGB-D Odometry | Alejandro Fontan; Javier Civera; Rudolph Triebel; | This paper presents an information-theoretic approach to point selection in direct RGB-D odometry. |
489 | SuperGlue: Learning Feature Matching With Graph Neural Networks | Paul-Edouard Sarlin; Daniel DeTone; Tomasz Malisiewicz; Andrew Rabinovich; | This paper introduces SuperGlue, a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points. |
490 | Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task | Aritra Bhowmik; Stefan Gumhold; Carsten Rother; Eric Brachmann; | We propose a new training methodology which embeds the feature detector in a complete vision pipeline, and where the learnable parameters are trained in an end-to-end fashion. |
491 | ReDA: Reinforced Differentiable Attribute for 3D Face Reconstruction | Wenbin Zhu; HsiangTao Wu; Zeyu Chen; Noranart Vesdapunt; Baoyuan Wang; | To further reduce the ambiguities, we present a novel framework called "Reinforced Differentiable Attributes" ("ReDA") which is more general and effective than previous Differentiable Rendering ("DR"). |
492 | EventCap: Monocular 3D Capture of High-Speed Human Motions Using an Event Camera | Lan Xu; Weipeng Xu; Vladislav Golyanik; Marc Habermann; Lu Fang; Christian Theobalt; | In this paper, we propose EventCap, the first approach for 3D capture of high-speed human motions using a single event camera. |
493 | Cross-Modal Deep Face Normals With Deactivable Skip Connections | Victoria Fernandez Abrevaya; Adnane Boukhayma; Philip H.S. Torr; Edmond Boyer; | We present an approach for estimating surface normals from in-the-wild color images of faces. |
494 | Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild | Dominik Kulon; Riza Alp Guler; Iasonas Kokkinos; Michael M. Bronstein; Stefanos Zafeiriou; | We introduce a simple and effective network architecture for monocular 3D hand pose estimation consisting of an image encoder followed by a mesh convolutional decoder that is trained through a direct 3D hand mesh reconstruction loss. |
495 | Face X-Ray for More General Face Forgery Detection | Lingzhi Li; Jianmin Bao; Ting Zhang; Hao Yang; Dong Chen; Fang Wen; Baining Guo; | In this paper we propose a novel image representation called face X-ray for detecting forgery in face images. |
496 | A Morphable Face Albedo Model | William A. P. Smith; Alassane Seck; Hannah Dee; Bernard Tiddeman; Joshua B. Tenenbaum; Bernhard Egger; | In this paper, we bring together two divergent strands of research: photometric face capture and statistical 3D face appearance modelling. |
497 | Cascade EF-GAN: Progressive Facial Expression Editing With Local Focuses | Rongliang Wu; Gongjie Zhang; Shijian Lu; Tao Chen; | To address these limitations, we propose Cascade Expression Focal GAN (Cascade EF-GAN), a novel network that performs progressive facial expression editing with local expression focuses. |
498 | GanHand: Predicting Human Grasp Affordances in Multi-Object Scenes | Enric Corona; Albert Pumarola; Guillem Alenya; Francesc Moreno-Noguer; Gregory Rogez; | To this end, we introduce a generative model that jointly reasons in all these levels and 1) regresses the 3D shape and pose of the objects in the scene; 2) estimates the grasp types; and 3) refines the 51 degrees of freedom (DoF) of a 3D hand model to minimize a graspability loss. |
499 | Deep Spatial Gradient and Temporal Depth Learning for Face Anti-Spoofing | Zezheng Wang; Zitong Yu; Chenxu Zhao; Xiangyu Zhu; Yunxiao Qin; Qiusheng Zhou; Feng Zhou; Zhen Lei; | In contrast, we design a new approach to detect presentation attacks from multiple frames based on two insights: 1) detailed discriminative clues (e.g., spatial gradient magnitude) between live and spoofing faces may be discarded through stacked vanilla convolutions, and 2) the dynamics of 3D moving faces provide important clues in detecting the spoofing faces. |
500 | DeepCap: Monocular Human Performance Capture Using Weak Supervision | Marc Habermann; Weipeng Xu; Michael Zollhofer; Gerard Pons-Moll; Christian Theobalt; | We propose a novel deep learning approach for monocular dense human performance capture. |
501 | Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction | Ruixu Liu; Ju Shen; He Wang; Chen Chen; Sen-ching Cheung; Vijayan Asari; | We propose a novel attention-based framework for 3D human pose estimation from a monocular video. |
502 | Advancing High Fidelity Identity Swapping for Forgery Detection | Lingzhi Li; Jianmin Bao; Hao Yang; Dong Chen; Fang Wen; | In this work, we study various existing benchmarks for deepfake detection research. |
503 | Controllable Person Image Synthesis With Attribute-Decomposed GAN | Yifang Men; Yiming Mao; Yuning Jiang; Wei-Ying Ma; Zhouhui Lian; | This paper introduces the Attribute-Decomposed GAN, a novel generative model for controllable person image synthesis, which can produce realistic person images with desired human attributes (e.g., pose, head, upper clothes and pants) provided in various source inputs. |
504 | Attentive Normalization for Conditional Image Generation | Yi Wang; Ying-Cong Chen; Xiangyu Zhang; Jian Sun; Jiaya Jia; | In this paper, we characterize long-range dependence with attentive normalization (AN), which is an extension to traditional instance normalization. |
505 | SEAN: Image Synthesis With Semantic Region-Adaptive Normalization | Peihao Zhu; Rameen Abdal; Yipeng Qin; Peter Wonka; | We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. |
506 | Blurry Video Frame Interpolation | Wang Shen; Wenbo Bao; Guangtao Zhai; Li Chen; Xiongkuo Min; Zhiyong Gao; | In this paper, we propose a blurry video frame interpolation method to reduce motion blur and up-convert frame rate simultaneously. |
507 | Learning Physics-Guided Face Relighting Under Directional Light | Thomas Nestmeyer; Jean-Francois Lalonde; Iain Matthews; Andreas Lehrmann; | We investigate end-to-end deep learning architectures that both de-light and relight an image of a human face. |
508 | Disentangled Image Generation Through Structured Noise Injection | Yazeed Alharbi; Peter Wonka; | Instead of the traditional approach, we propose feeding multiple noise codes through separate fully-connected layers. |
509 | Cross-Domain Correspondence Learning for Exemplar-Based Image Translation | Pan Zhang; Bo Zhang; Dong Chen; Lu Yuan; Fang Wen; | We present a general framework for exemplar-based image translation, which synthesizes a photo-realistic image from the input in a distinct domain (e.g., semantic segmentation mask, or edge map, or pose keypoints), given an exemplar image. |
510 | Disentangled and Controllable Face Image Generation via 3D Imitative-Contrastive Learning | Yu Deng; Jiaolong Yang; Dong Chen; Fang Wen; Xin Tong; | We propose an approach for face image generation of virtual people with disentangled, precisely-controllable latent representations for identity of non-existing people, expression, pose, and illumination. |
511 | Single Image Reflection Removal With Physically-Based Training Images | Soomin Kim; Yuchi Huo; Sung-Eui Yoon; | In this paper, physically based rendering is used for faithfully synthesizing the required training images, and a corresponding network structure and loss term are proposed. |
512 | SketchyCOCO: Image Generation From Freehand Scene Sketches | Chengying Gao; Qi Liu; Qi Xu; Limin Wang; Jianzhuang Liu; Changqing Zou; | We introduce the first method for automatic image generation from scene-level freehand sketches. We have built a large-scale composite dataset called SketchyCOCO to support and evaluate the solution. |
513 | Image Based Virtual Try-On Network From Unpaired Data | Assaf Neuberger; Eran Borenstein; Bar Hilleli; Eduard Oks; Sharon Alpert; | This paper presents a new image-based virtual try-on approach (Outfit-VITON) that helps visualize how a composition of clothing items selected from various reference images forms a cohesive outfit on a person in a query image. |
514 | PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer | Wentao Jiang; Si Liu; Chen Gao; Jie Cao; Ran He; Jiashi Feng; Shuicheng Yan; | In this paper, we address the makeup transfer task, which aims to transfer the makeup from a reference image to a source image. |
515 | RetinaFace: Single-Shot Multi-Level Face Localisation in the Wild | Jiankang Deng; Jia Guo; Evangelos Ververas; Irene Kotsia; Stefanos Zafeiriou; | In this paper, we present a novel single-shot, multi-level face localisation method, named RetinaFace, which unifies face box prediction, 2D facial landmark localisation and 3D vertices regression under one common target: point regression on the image plane. |
516 | Semantic Image Manipulation Using Scene Graphs | Helisa Dhamo; Azade Farshad; Iro Laina; Nassir Navab; Gregory D. Hager; Federico Tombari; Christian Rupprecht; | Our goal is to encode image information in a given constellation and from there on generate new constellations, such as replacing objects or even changing relationships between objects, while respecting the semantics and style from the original image. |
517 | A Stochastic Conditioning Scheme for Diverse Human Motion Prediction | Sadegh Aliakbarian; Fatemeh Sadat Saleh; Mathieu Salzmann; Lars Petersson; Stephen Gould; | Alternatively, in this paper, we propose to stochastically combine the root of variations with previous pose information, so as to force the model to take the noise into account. |
518 | Transferring Dense Pose to Proximal Animal Classes | Artsiom Sanakoyeu; Vasil Khalidov; Maureen S. McCarthy; Andrea Vedaldi; Natalia Neverova; | We show that, at least for proximal animal classes such as chimpanzees, it is possible to transfer the knowledge existing in dense pose recognition for humans, as well as in more general object detectors and segmenters, to the problem of dense pose recognition in other classes. |
519 | Weakly-Supervised 3D Human Pose Learning via Multi-View Images in the Wild | Umar Iqbal; Pavlo Molchanov; Jan Kautz; | We propose a novel end-to-end learning framework that enables weakly-supervised training using multi-view consistency. |
520 | VIBE: Video Inference for Human Body Pose and Shape Estimation | Muhammed Kocabas; Nikos Athanasiou; Michael J. Black; | To address this problem, we propose "Video Inference for Body Pose and Shape Estimation" (VIBE), which makes use of an existing large-scale motion capture dataset (AMASS) together with unpaired, in-the-wild, 2D keypoint annotations. |
521 | G3AN: Disentangling Appearance and Motion for Video Generation | Yaohui Wang; Piotr Bilinski; Francois Bremond; Antitza Dantcheva; | To tackle this challenge, we introduce G3AN, a novel spatio-temporal generative model, which seeks to capture the distribution of high-dimensional video data and to model appearance and motion in a disentangled manner. |
522 | Domain Adaptive Image-to-Image Translation | Ying-Cong Chen; Xiaogang Xu; Jiaya Jia; | To deal with these issues, we propose the Domain Adaptive Image-To-Image translation (DAI2I) framework that adapts an I2I model for out-of-domain samples. |
523 | GAN Compression: Efficient Architectures for Interactive Conditional GANs | Muyang Li; Ji Lin; Yaoyao Ding; Zhijian Liu; Jun-Yan Zhu; Song Han; | In this work, we propose a general-purpose compression framework for reducing the inference time and model size of the generator in cGANs. |
524 | Searching Central Difference Convolutional Networks for Face Anti-Spoofing | Zitong Yu; Chenxu Zhao; Zezheng Wang; Yunxiao Qin; Zhuo Su; Xiaobai Li; Feng Zhou; Guoying Zhao; | Here we propose a novel frame-level face anti-spoofing (FAS) method based on Central Difference Convolution (CDC), which is able to capture intrinsic detailed patterns by aggregating both intensity and gradient information. |
525 | TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting | Zhuoqian Yang; Wentao Zhu; Wayne Wu; Chen Qian; Qiang Zhou; Bolei Zhou; Chen Change Loy; | We present a lightweight video motion retargeting approach TransMoMo that is capable of transferring motion of a person in a source video realistically to another video of a target person. |
526 | AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation | Hyeongmin Lee; Taeoh Kim; Tae-young Chung; Daehyun Pak; Yuseok Ban; Sangyoun Lee; | To solve this problem, we propose a new warping module named Adaptive Collaboration of Flows (AdaCoF). |
527 | FReeNet: Multi-Identity Face Reenactment | Jiangning Zhang; Xianfang Zeng; Mengmeng Wang; Yusu Pan; Liang Liu; Yong Liu; Yu Ding; Changjie Fan; | This paper presents a novel multi-identity face reenactment framework, named FReeNet, to transfer facial expressions from an arbitrary source face to a target face with a shared model. |
528 | Novel View Synthesis of Dynamic Scenes With Globally Coherent Depths From a Monocular Camera | Jae Shin Yoon; Kihwan Kim; Orazio Gallo; Hyun Soo Park; Jan Kautz; | This paper presents a new method to synthesize an image from arbitrary views and times given a collection of images of a dynamic scene. |
529 | Monocular Real-Time Hand Shape and Motion Capture Using Multi-Modal Data | Yuxiao Zhou; Marc Habermann; Weipeng Xu; Ikhsanul Habibie; Christian Theobalt; Feng Xu; | We present a novel method for monocular hand shape and pose estimation at unprecedented runtime performance of 100fps and at state-of-the-art accuracy. |
530 | The GAN That Warped: Semantic Attribute Editing With Unpaired Data | Garoe Dorta; Sara Vicente; Neill D. F. Campbell; Ivor J. A. Simpson; | This work proposes to learn how to perform semantic image edits through the application of smooth warp fields. |
531 | 4D Visualization of Dynamic Events From Unconstrained Multi-View Videos | Aayush Bansal; Minh Vo; Yaser Sheikh; Deva Ramanan; Srinivasa Narasimhan; | We present a data-driven approach for 4D space-time visualization of dynamic events from videos captured by multiple hand-held cameras. |
532 | Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds | Yongming Rao; Jiwen Lu; Jie Zhou; | Based on this hypothesis, we propose to learn point cloud representation by bidirectional reasoning between the local structures at different abstraction hierarchies and the global shape without human supervision. |
533 | HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation | Bowen Cheng; Bin Xiao; Jingdong Wang; Honghui Shi; Thomas S. Huang; Lei Zhang; | In this paper, we present HigherHRNet: a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids. |
534 | Detecting Attended Visual Targets in Video | Eunji Chong; Yongxin Wang; Nataniel Ruiz; James M. Rehg; | Our goal is to identify where each person in each frame of a video is looking, and correctly handle the case where the gaze target is out-of-frame. We introduce a new annotated dataset, VideoAttentionTarget, containing complex and dynamic patterns of real-world gaze behavior. |
535 | Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution | Yong Guo; Jian Chen; Jingdong Wang; Qi Chen; Jiezhang Cao; Zeshuai Deng; Yanwu Xu; Mingkui Tan; | To address the above issues, we propose a dual regression scheme by introducing an additional constraint on LR data to reduce the space of the possible functions. |
536 | Neural Voxel Renderer: Learning an Accurate and Controllable Rendering Tool | Konstantinos Rematas; Vittorio Ferrari; | We present a neural rendering framework that maps a voxelized scene into a high quality image. |
537 | Neural Contours: Learning to Draw Lines From 3D Shapes | Difan Liu; Mohamed Nabail; Aaron Hertzmann; Evangelos Kalogerakis; | This paper introduces a method for learning to generate line drawings from 3D models. |
538 | Softmax Splatting for Video Frame Interpolation | Simon Niklaus; Feng Liu; | We propose softmax splatting to address this paradigm shift and show its effectiveness on the application of frame interpolation. |
539 | CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks | Maxim Maximov; Ismail Elezi; Laura Leal-Taixe; | We propose and develop CIAGAN, a model for image and video anonymization based on conditional generative adversarial networks. |
540 | Probabilistic Structural Latent Representation for Unsupervised Embedding | Mang Ye; Jianbing Shen; | To tackle these issues, this paper proposes a probabilistic structural latent representation (PSLR), which incorporates an adaptable softmax embedding to approximate the positive concentrated and negative instance separated properties in the graph latent space. |
541 | Semantically Multi-Modal Image Synthesis | Zhen Zhu; Zhiliang Xu; Ansheng You; Xiang Bai; | In this paper, we focus on the semantically multi-modal image synthesis (SMIS) task, namely, generating multi-modal images at the semantic level. |
542 | Nested Scale-Editing for Conditional Image Synthesis | Lingzhi Zhang; Jiancong Wang; Yinshuang Xu; Jie Min; Tarmily Wen; James C. Gee; Jianbo Shi; | We propose an image synthesis approach that provides stratified navigation in the latent code space. |
543 | UnrealText: Synthesizing Realistic Scene Text Images From the Unreal World | Shangbang Long; Cong Yao; | In this paper, we introduce UnrealText, an efficient image synthesis method that renders realistic images via a 3D graphics engine. |
544 | Fast Texture Synthesis via Pseudo Optimizer | Wu Shi; Yu Qiao; | We propose a new efficient method that aims to simulate the optimization process while retaining most of its properties. |
545 | Towards Learning Structure via Consensus for Face Segmentation and Parsing | Iacopo Masi; Joe Mathai; Wael AbdAlmageed; | We thereby offer a novel learning mechanism to enforce structure in the prediction via consensus, guided by a robust loss function that forces pixel objects to be consistent with each other. |
546 | CookGAN: Causality Based Text-to-Image Synthesis | Bin Zhu; Chong-Wah Ngo; | This paper presents a new network architecture, CookGAN, that mimics the visual effects in a causality chain, preserves fine-grained details, and progressively upsamples the image. |
547 | Weakly Supervised Discriminative Feature Learning With State Information for Person Identification | Hong-Xing Yu; Wei-Shi Zheng; | In this work we propose utilizing the state information as weak supervision to address the visual discrepancy caused by different states. |
548 | Future Video Synthesis With Object Motion Prediction | Yue Wu; Rongrong Gao; Jaesik Park; Qifeng Chen; | We present an approach to predict future video frames given a sequence of continuous video frames in the past. |
549 | MaskGAN: Towards Diverse and Interactive Facial Image Manipulation | Cheng-Han Lee; Ziwei Liu; Lingyun Wu; Ping Luo; | To overcome these drawbacks, we propose a novel framework termed MaskGAN, enabling diverse and interactive face manipulation. To facilitate extensive studies, we construct a large-scale high-resolution face dataset with fine-grained mask annotations named CelebAMask-HQ. |
550 | A Graduated Filter Method for Large Scale Robust Estimation | Huu Le; Christopher Zach; | In this paper, we introduce a novel solver for robust estimation that possesses a strong ability to escape poor local minima. |
551 | Deep Face Super-Resolution With Iterative Collaboration Between Attentive Recovery and Landmark Estimation | Cheng Ma; Zhenyu Jiang; Yongming Rao; Jiwen Lu; Jie Zhou; | In this paper, we propose a deep face super-resolution (FSR) method with iterative collaboration between two recurrent networks which focus on facial image recovery and landmark estimation respectively. |
552 | Coherent Reconstruction of Multiple Humans From a Single Image | Wen Jiang; Nikos Kolotouros; Georgios Pavlakos; Xiaowei Zhou; Kostas Daniilidis; | In this work, we address the problem of multi-person 3D pose estimation from a single image. |
553 | PointASNL: Robust Point Clouds Processing Using Nonlocal Neural Networks With Adaptive Sampling | Xu Yan; Chaoda Zheng; Zhen Li; Sheng Wang; Shuguang Cui; | In this paper, we present a novel end-to-end network for robust point cloud processing, named PointASNL, which can effectively handle noisy point clouds. |
554 | A Neural Rendering Framework for Free-Viewpoint Relighting | Zhang Chen; Anpei Chen; Guli Zhang; Chengyuan Wang; Yu Ji; Kiriakos N. Kutulakos; Jingyi Yu; | We present a novel Relightable Neural Renderer (RNR) for simultaneous view synthesis and relighting using multi-view image inputs. |
555 | A Multi-Task Mean Teacher for Semi-Supervised Shadow Detection | Zhihao Chen; Lei Zhu; Liang Wan; Song Wang; Wei Feng; Pheng-Ann Heng; | To boost shadow detection performance, this paper presents a multi-task mean teacher model for semi-supervised shadow detection that leverages unlabeled data and learns multiple kinds of shadow information simultaneously. |
556 | GroupFace: Learning Latent Groups and Constructing Group-Based Representations for Face Recognition | Yonghyun Kim; Wonpyo Park; Myung-Cheol Roh; Jongju Shin; | We propose a novel face-recognition-specialized architecture called GroupFace that utilizes multiple group-aware representations, simultaneously, to improve the quality of the embedding feature. |
557 | Channel Attention Based Iterative Residual Learning for Depth Map Super-Resolution | Xibin Song; Yuchao Dai; Dingfu Zhou; Liu Liu; Wei Li; Hongdong Li; Ruigang Yang; | In this paper, we argue that depth map super-resolution (DSR) models trained under this setting are restrictive and not effective in dealing with real-world DSR tasks. |
558 | Time Flies: Animating a Still Image With Time-Lapse Video As Reference | Chia-Chi Cheng; Hung-Yu Chen; Wei-Chen Chiu; | In this paper, we propose a self-supervised end-to-end model to generate the time-lapse video from a single image and a reference video. |
559 | SER-FIQ: Unsupervised Estimation of Face Image Quality Based on Stochastic Embedding Robustness | Philipp Terhorst; Jan Niklas Kolf; Naser Damer; Florian Kirchbuchner; Arjan Kuijper; | Avoiding the use of inaccurate quality labels, we propose a novel concept to measure face quality based on an arbitrary face recognition model. |
560 | Grid-GCN for Fast and Scalable Point Cloud Learning | Qiangeng Xu; Xudong Sun; Cho-Ying Wu; Panqu Wang; Ulrich Neumann; | In this paper, we present a method, named Grid-GCN, for fast and scalable point cloud learning. |
561 | Domain Balancing: Face Recognition on Long-Tailed Domains | Dong Cao; Xiangyu Zhu; Xingyu Huang; Jianzhu Guo; Zhen Lei; | In this paper, we propose a novel Domain Balancing (DB) mechanism to handle this problem. |
562 | AdversarialNAS: Adversarial Neural Architecture Search for GANs | Chen Gao; Yunpeng Chen; Si Liu; Zhenxiong Tan; Shuicheng Yan; | In this paper, we propose an AdversarialNAS method specially tailored for Generative Adversarial Networks (GANs) to search for a superior generative model on the task of unconditional image generation. |
563 | Image Super-Resolution With Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining | Yiqun Mei; Yuchen Fan; Yuqian Zhou; Lichao Huang; Thomas S. Huang; Honghui Shi; | In this paper, we propose the first Cross-Scale Non-Local (CS-NL) attention module with integration into a recurrent neural network. |
564 | The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation | Junjie Huang; Zheng Zhu; Feng Guo; Guan Huang; | In this paper, we focus on this problem and find that the devil of top-down pose estimator is in the biased data processing. |
565 | Data Uncertainty Learning in Face Recognition | Jie Chang; Zhonghao Lan; Changmao Cheng; Yichen Wei; | This work applies data uncertainty learning to face recognition, such that the feature (mean) and uncertainty (variance) are learnt simultaneously, for the first time. |
566 | Regularizing Discriminative Capability of CGANs for Semi-Supervised Generative Learning | Yi Liu; Guangchang Deng; Xiangping Zeng; Si Wu; Zhiwen Yu; Hau-San Wong; | To address this issue, we propose a regularization technique based on Random Regional Replacement (R^3-regularization) to facilitate the generative learning process. |
567 | FM2u-Net: Face Morphological Multi-Branch Network for Makeup-Invariant Face Verification | Wenxuan Wang; Yanwei Fu; Xuelin Qian; Yu-Gang Jiang; Qi Tian; Xiangyang Xue; | To address these challenges, we propose a unified Face Morphological Multi-branch Network (FMMu-Net) for makeup-invariant face verification, which can simultaneously synthesize many diverse makeup faces through face morphology network (FM-Net) and effectively learn cosmetics-robust face representations using attention-based multi-branch learning network (AttM-Net). |
568 | UCTGAN: Diverse Image Inpainting Based on Unsupervised Cross-Space Translation | Lei Zhao; Qihang Mo; Sihuan Lin; Zhizhong Wang; Zhiwen Zuo; Haibo Chen; Wei Xing; Dongming Lu; | In order to produce multiple and diverse reasonable solutions, we present Unsupervised Cross-space Translation Generative Adversarial Network (called UCTGAN) which mainly consists of three network modules: conditional encoder module, manifold projection module and generation module. |
569 | Decoupled Representation Learning for Skeleton-Based Gesture Recognition | Jianbo Liu; Yongcheng Liu; Ying Wang; Veronique Prinet; Shiming Xiang; Chunhong Pan; | In this paper, we propose to decouple the gesture into hand posture variations and hand movements, which are then modeled separately. |
570 | An Efficient PointLSTM for Point Clouds Based Gesture Recognition | Yuecong Min; Yanxiao Zhang; Xiujuan Chai; Xilin Chen; | In this paper, we formulate gesture recognition as an irregular sequence recognition problem and aim to capture long-term spatial correlations across point cloud sequences. |
571 | Editing in Style: Uncovering the Local Semantics of GANs | Edo Collins; Raja Bala; Bob Price; Sabine Susstrunk; | Focusing on StyleGAN, we introduce a simple and effective method for making local, semantically-aware edits to a target output image. |
572 | On the Detection of Digital Face Manipulation | Hao Dang; Feng Liu; Joel Stehouwer; Xiaoming Liu; Anil K. Jain; | Instead of simply using multi-task learning to simultaneously detect manipulated images and predict the manipulated mask (regions), we propose to utilize an attention mechanism to process and improve the feature maps for the classification task. |
573 | Learning Texture Transformer Network for Image Super-Resolution | Fuzhi Yang; Huan Yang; Jianlong Fu; Hongtao Lu; Baining Guo; | In this paper, we propose a novel Texture Transformer Network for Image Super-Resolution (TTSR), in which the LR and Ref images are formulated as queries and keys in a transformer, respectively. |
574 | Reference-Based Sketch Image Colorization Using Augmented-Self Reference and Dense Semantic Correspondence | Junsoo Lee; Eungyeup Kim; Yunsung Lee; Dongjun Kim; Jaehyuk Chang; Jaegul Choo; | To tackle this challenge, we propose to utilize the identical image with geometric distortion as a virtual reference, which makes it possible to secure the ground truth for a colored output image. |
575 | Deblurring Using Analysis-Synthesis Networks Pair | Adam Kaufman; Raanan Fattal; | We propose a new architecture which breaks the deblurring network into an analysis network which estimates the blur, and a synthesis network that uses this kernel to deblur the image. |
576 | Exploring Unlabeled Faces for Novel Attribute Discovery | Hyojin Bahng; Sunghyo Chung; Seungjoo Yoo; Jaegul Choo; | In this paper, we attempt to alleviate this necessity for labeled data in the facial image translation domain. |
577 | Neural Pose Transfer by Spatially Adaptive Instance Normalization | Jiashun Wang; Chao Wen; Yanwei Fu; Haitao Lin; Tianyun Zou; Xiangyang Xue; Yinda Zhang; | In this paper, we are particularly interested in transferring the pose of a source human mesh to deform a target human mesh, where the source and target meshes may have different identity information. |
578 | Fine-Grained Image-to-Image Transformation Towards Visual Recognition | Wei Xiong; Yutong He; Yixuan Zhang; Wenhan Luo; Lin Ma; Jiebo Luo; | In this paper, we aim at transforming an image with a fine-grained category to synthesize new images that preserve the identity of the input image, which can thereby benefit the subsequent fine-grained image recognition and few-shot learning tasks. |
579 | Deep Facial Non-Rigid Multi-View Stereo | Ziqian Bai; Zhaopeng Cui; Jamal Ahmed Rahim; Xiaoming Liu; Ping Tan; | We present a method for 3D face reconstruction from multi-view images with different expressions. |
580 | Attention-Driven Cropping for Very High Resolution Facial Landmark Detection | Prashanth Chandran; Derek Bradley; Markus Gross; Thabo Beeler; | Building on top of recent progress in attention-based networks, we present a novel, fully convolutional regional architecture that is specially designed for predicting landmarks on very high resolution facial images without downsampling. |
581 | Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis | Yiyi Liao; Katja Schwarz; Lars Mescheder; Andreas Geiger; | We define the new task of 3D controllable image synthesis and propose an approach for solving it by reasoning both in 3D space and in the 2D image domain. |
582 | End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection | Rui Qian; Divyansh Garg; Yan Wang; Yurong You; Serge Belongie; Bharath Hariharan; Mark Campbell; Kilian Q. Weinberger; Wei-Lun Chao; | In this paper, we introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire pseudo-LiDAR (PL) pipeline to be trained end-to-end. |
583 | Towards High-Fidelity 3D Face Reconstruction From In-the-Wild Images Using Graph Convolutional Networks | Jiangke Lin; Yi Yuan; Tianjia Shao; Kun Zhou; | In this paper, we introduce a method to reconstruct 3D facial shapes with high-fidelity textures from single-view images in the wild, without the need to capture a large-scale face texture database. |
584 | CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition | Yuge Huang; Yuhan Wang; Ying Tai; Xiaoming Liu; Pengcheng Shen; Shaoxin Li; Jilin Li; Feiyue Huang; | In this work, we propose a novel Adaptive Curriculum Learning loss (CurricularFace) that embeds the idea of curriculum learning into the loss function to achieve a novel training strategy for deep face recognition, which mainly addresses easy samples in the early training stage and hard ones in the later stage. |
585 | Rotate-and-Render: Unsupervised Photorealistic Face Rotation From Single-View Images | Hang Zhou; Jihao Liu; Ziwei Liu; Yu Liu; Xiaogang Wang; | To overcome these challenges, we propose a novel unsupervised framework that can synthesize photo-realistic rotated faces using only single-view image collections in the wild. |
586 | One-Shot Domain Adaptation for Face Generation | Chao Yang; Ser-Nam Lim; | In this paper, we propose a framework capable of generating face images that fall into the same distribution as that of a given one-shot example. |
587 | BidNet: Binocular Image Dehazing Without Explicit Disparity Estimation | Yanwei Pang; Jing Nie; Jin Xie; Jungong Han; Xuelong Li; | On the assumption that dehazed binocular images are superior to the hazy ones for stereo vision tasks such as 3D object detection and according to the fact that image haze is a function of depth, this paper proposes a Binocular image dehazing Network (BidNet) aiming at dehazing both the left and right images of binocular images within the deep learning framework. |
588 | Deep Shutter Unrolling Network | Peidong Liu; Zhaopeng Cui; Viktor Larsson; Marc Pollefeys; | We present a novel network for rolling shutter effect correction. |
589 | Joint Texture and Geometry Optimization for RGB-D Reconstruction | Yanping Fu; Qingan Yan; Jie Liao; Chunxia Xiao; | In this paper, we propose a novel approach that can jointly optimize the camera poses, texture and geometry of the reconstructed model, and color consistency between the key-frames. |
590 | Deep 3D Capture: Geometry and Reflectance From Sparse Multi-View Images | Sai Bi; Zexiang Xu; Kalyan Sunkavalli; David Kriegman; Ravi Ramamoorthi; | We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object from a sparse set of only six images captured by wide-baseline cameras under collocated point lighting. |
591 | Auto-Tuning Structured Light by Optical Stochastic Gradient Descent | Wenzheng Chen; Parsa Mirdehghan; Sanja Fidler; Kiriakos N. Kutulakos; | We consider the problem of optimizing the performance of an active imaging system by automatically discovering the illuminations it should use, and the way to decode them. |
592 | MARMVS: Matching Ambiguity Reduced Multiple View Stereo for Efficient Large Scale Scene Reconstruction | Zhenyu Xu; Yiguang Liu; Xuelei Shi; Ying Wang; Yunan Zheng; | In this paper, we present a novel method, matching ambiguity reduced multiple view stereo (MARMVS) to address this issue. |
593 | Uncertainty Based Camera Model Selection | Michal Polic; Stanislav Steidl; Cenek Albl; Zuzana Kukelova; Tomas Pajdla; | In this paper, we present a new automatic method for camera model selection in large scale SfM that is based on efficient uncertainty evaluation. |
594 | Local Implicit Grid Representations for 3D Scenes | Chiyu "Max" Jiang; Avneesh Sud; Ameesh Makadia; Jingwei Huang; Matthias Niessner; Thomas Funkhouser; | In this paper, we introduce Local Implicit Grid Representations, a new 3D shape representation designed for scalability and generality. |
595 | TetraTSDF: 3D Human Reconstruction From a Single Image With a Tetrahedral Outer Shell | Hayato Onizuka; Zehra Hayirci; Diego Thomas; Akihiro Sugimoto; Hideaki Uchiyama; Rin-ichiro Taniguchi; | In this paper, we propose the tetrahedral outer shell volumetric truncated signed distance function (TetraTSDF) model for the human body, and its corresponding part connection network (PCN) for 3D human body shape regression. |
596 | Averaging Essential and Fundamental Matrices in Collinear Camera Settings | Amnon Geifman; Yoni Kasten; Meirav Galun; Ronen Basri; | In this paper, we introduce an analysis and algorithms for averaging bifocal tensors (essential or fundamental matrices) when either subsets or all of the camera centers are collinear. |
597 | On the Distribution of Minima in Intrinsic-Metric Rotation Averaging | Kyle Wilson; David Bindel; | In this paper, we study the spatial distribution of local minima. |
598 | Lightweight Multi-View 3D Pose Estimation Through Camera-Disentangled Representation | Edoardo Remelli; Shangchen Han; Sina Honari; Pascal Fua; Robert Wang; | We present a lightweight solution to recover 3D pose from multi-view images captured with spatially calibrated cameras. |
599 | A Novel Recurrent Encoder-Decoder Structure for Large-Scale Multi-View Stereo Reconstruction From an Open Aerial Dataset | Jin Liu; Shunping Ji; | We also introduce in this paper a novel network, called RED-Net, for wide-range depth inference, which builds on a recurrent encoder-decoder structure that regularizes cost maps across depths, with a 2D fully convolutional network as its framework. |
600 | Factorized Higher-Order CNNs With an Application to Spatio-Temporal Emotion Estimation | Jean Kossaifi; Antoine Toisoul; Adrian Bulat; Yannis Panagakis; Timothy M. Hospedales; Maja Pantic; | In this paper, we unify these two approaches by proposing a tensor factorization framework for efficient higher-order multidimensional (separable) convolutions. |
601 | Effectively Unbiased FID and Inception Score and Where to Find Them | Min Jin Chong; David Forsyth; | This paper shows that two commonly used evaluation metrics for generative models, the Frechet Inception Distance (FID) and the Inception Score (IS), are biased — the expected value of the score computed for a finite sample set is not the true value of the score. |
602 | Robust Homography Estimation via Dual Principal Component Pursuit | Tianjiao Ding; Yunchen Yang; Zhihui Zhu; Daniel P. Robinson; Rene Vidal; Laurent Kneip; Manolis C. Tsakiris; | We revisit robust estimation of homographies over point correspondences between two or three views, a fundamental problem in geometric vision. |
603 | Non-Adversarial Video Synthesis With Learned Priors | Abhishek Aich; Akash Gupta; Rameswar Panda; Rakib Hyder; M. Salman Asif; Amit K. Roy-Chowdhury; | Different from these methods, we focus on the problem of generating videos from latent noise vectors, without any reference input frames. |
604 | Uncertainty-Aware Mesh Decoder for High Fidelity 3D Face Reconstruction | Gun-Hee Lee; Seong-Whan Lee; | In this paper, we propose to employ (i) an uncertainty-aware encoder that presents face features as distributions and (ii) a fully nonlinear decoder model combining Graph CNN with GAN. |
605 | 3FabRec: Fast Few-Shot Face Alignment by Reconstruction | Bjorn Browatzki; Christian Wallraven; | We introduce a semi-supervised method in which the crucial idea is to first generate implicit face knowledge from the large amounts of unlabeled images of faces available today. |
606 | Weakly-Supervised Domain Adaptation via GAN and Mesh Model for Estimating 3D Hand Poses Interacting Objects | Seungryul Baek; Kwang In Kim; Tae-Kyun Kim; | In this work, we propose a novel end-to-end trainable pipeline that adapts the hand-object domain to the single hand-only domain, while learning for HPE. |
607 | Vec2Face: Unveil Human Faces From Their Blackbox Features in Face Recognition | Chi Nhan Duong; Thanh-Dat Truong; Khoa Luu; Kha Gia Quach; Hung Bui; Kaushik Roy; | This paper presents a novel generative structure with Bijective Metric Learning, namely Bijective Generative Adversarial Networks in a Distillation framework (DiBiGAN), for synthesizing faces of an identity given that person’s features. |
608 | StyleRig: Rigging StyleGAN for 3D Control Over Portrait Images | Ayush Tewari; Mohamed Elgharib; Gaurav Bharaj; Florian Bernard; Hans-Peter Seidel; Patrick Perez; Michael Zollhofer; Christian Theobalt; | We present the first method to provide a face rig-like control over a pretrained and fixed StyleGAN via a 3DMM. |
609 | Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis | Jogendra Nath Kundu; Siddharth Seth; Varun Jampani; Mugalodi Rakesh; R. Venkatesh Babu; Anirban Chakraborty; | Acknowledging this, we propose a self-supervised learning framework to disentangle such variations from unlabeled video frames. |
610 | Learning Meta Face Recognition in Unseen Domains | Jianzhu Guo; Xiangyu Zhu; Chenxu Zhao; Dong Cao; Zhen Lei; Stan Z. Li; | In this paper, we aim to learn a generalized model that can directly handle new unseen domains without any model updating. Besides, we propose two benchmarks for generalized face recognition evaluation. |
611 | Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data | Shichao Li; Lei Ke; Kevin Pratama; Yu-Wing Tai; Chi-Keung Tang; Kwang-Ting Cheng; | This paper proposes a novel data augmentation method that (1) is scalable for synthesizing a massive amount of training data (over 8 million valid 3D human poses with corresponding 2D projections) for training 2D-to-3D networks, and (2) can effectively reduce dataset bias. |
612 | GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models | Hongyi Xu; Eduard Gabriel Bazavan; Andrei Zanfir; William T. Freeman; Rahul Sukthankar; Cristian Sminchisescu; | We present a statistical, articulated 3D human shape modeling pipeline, within a fully trainable, modular, deep learning framework. |
613 | Generating 3D People in Scenes Without People | Yan Zhang; Mohamed Hassan; Heiko Neumann; Michael J. Black; Siyu Tang; | We present a fully automatic system that takes a 3D scene and generates plausible 3D human bodies that are posed naturally in that 3D scene. |
614 | Transferring Cross-Domain Knowledge for Video Sign Language Recognition | Dongxu Li; Xin Yu; Chenchen Xu; Lars Petersson; Hongdong Li; | Motivated by this observation, we propose a novel method that learns domain-invariant visual concepts and enhances word-level sign language recognition (WSLR) models by transferring knowledge of subtitled news sign to them. |
615 | Bodies at Rest: 3D Human Pose and Shape Estimation From a Pressure Image Using Synthetic Data | Henry M. Clever; Zackory Erickson; Ariel Kapusta; Greg Turk; Karen Liu; Charles C. Kemp; | We describe a physics-based method that simulates human bodies at rest in a bed with a pressure sensing mat, and present PressurePose, a synthetic dataset with 206K pressure images with 3D human poses and shapes. |
616 | Bayesian Adversarial Human Motion Synthesis | Rui Zhao; Hui Su; Qiang Ji; | We propose a generative probabilistic model for human motion synthesis. |
617 | LSM: Learning Subspace Minimization for Low-Level Vision | Chengzhou Tang; Lu Yuan; Ping Tan; | We study the energy minimization problem in low-level vision tasks from a novel perspective. |
618 | Learning a Neural Solver for Multiple Object Tracking | Guillem Braso; Laura Leal-Taixe; | In this work, we exploit the classical network flow formulation of MOT to define a fully differentiable framework based on Message Passing Networks (MPNs). |
619 | GLU-Net: Global-Local Universal Network for Dense Flow and Correspondences | Prune Truong; Martin Danelljan; Radu Timofte; | In this work, we propose a universal network architecture that is directly applicable to all the aforementioned dense correspondence problems. |
620 | SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking | Dongyan Guo; Jun Wang; Ying Cui; Zhenhua Wang; Shengyong Chen; | By decomposing the visual tracking task into two subproblems as classification for pixel category and regression for object bounding box at this pixel, we propose a novel fully convolutional Siamese network to solve visual tracking end-to-end in a per-pixel manner. |
621 | MaskFlownet: Asymmetric Feature Matching With Learnable Occlusion Mask | Shengyu Zhao; Yilun Sheng; Yue Dong; Eric I-Chao Chang; Yan Xu; | In this paper, we propose an asymmetric occlusion-aware feature matching module, which can learn a rough occlusion mask that filters useless (occluded) areas immediately after feature warping without any explicit supervision. |
622 | Tracking by Instance Detection: A Meta-Learning Approach | Guangting Wang; Chong Luo; Xiaoyan Sun; Zhiwei Xiong; Wenjun Zeng; | We propose a principled three-step approach to build a high-performance tracker. |
623 | High-Performance Long-Term Tracking With Meta-Updater | Kenan Dai; Yunhua Zhang; Dong Wang; Jianhua Li; Huchuan Lu; Xiaoyun Yang; | In this work, we propose a novel offline-trained Meta-Updater to address an important but unsolved problem: Is the tracker ready for updating in the current frame? |
624 | TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model | Bo Pang; Yizhuo Li; Yifan Zhang; Muchen Li; Cewu Lu; | To address these challenges, we propose a concise end-to-end model, TubeTK, which needs only one-step training, by introducing the "bounding-tube" to indicate temporal-spatial locations of objects in a short video clip. |
625 | Collaborative Motion Prediction via Neural Motion Message Passing | Yue Hu; Siheng Chen; Ya Zhang; Xiao Gu; | To address this challenge, we propose neural motion message passing (NMMP) to explicitly model the interaction and learn representations for directed interactions between actors. |
626 | P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds | Haozhe Qi; Chen Feng; Zhiguo Cao; Feng Zhao; Yang Xiao; | Our main idea is to first localize potential target centers in the 3D search area embedded with target information. Then point-driven 3D target proposal and verification are executed jointly. |
627 | Self-Supervised Deep Visual Odometry With Online Adaptation | Shunkai Li; Xin Wang; Yingdian Cao; Fei Xue; Zike Yan; Hongbin Zha; | In this paper, we propose an online meta-learning algorithm to enable VO networks to continuously adapt to new environments in a self-supervised manner. |
628 | Globally Optimal Contrast Maximisation for Event-Based Motion Estimation | Daqi Liu; Alvaro Parra; Tat-Jun Chin; | To alleviate this weakness, we propose a new globally optimal event-based motion estimation algorithm. |
629 | D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features | Xuyang Bai; Zixin Luo; Lei Zhou; Hongbo Fu; Long Quan; Chiew-Lan Tai; | In this paper, we leverage a 3D fully convolutional network for 3D point clouds, and propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point. |
630 | Towards Backward-Compatible Representation Learning | Yantao Shen; Yuanjun Xiong; Wei Xia; Stefano Soatto; | We propose a framework to train embedding models, called backward-compatible training (BCT), as a first step towards backward compatible representation learning. |
631 | PointAugment: An Auto-Augmentation Framework for Point Cloud Classification | Ruihui Li; Xianzhi Li; Pheng-Ann Heng; Chi-Wing Fu; | We present PointAugment, a new auto-augmentation framework that automatically optimizes and augments point cloud samples to enrich the data diversity when we train a classification network. |
632 | Cross-Batch Memory for Embedding Learning | Xun Wang; Haozhi Zhang; Weilin Huang; Matthew R. Scott; | In this paper, we identify a "slow drift" phenomenon by observing that the embedding features drift exceptionally slowly even as the model parameters are updated throughout the training process. |
633 | Circle Loss: A Unified Perspective of Pair Similarity Optimization | Yifan Sun; Changmao Cheng; Yuhan Zhang; Chi Zhang; Liang Zheng; Zhongdao Wang; Yichen Wei; | This paper provides a pair similarity optimization viewpoint on deep feature learning, aiming to maximize the within-class similarity s_p and minimize the between-class similarity s_n. |
634 | Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics | Simon Jenni; Hailin Jin; Paolo Favaro; | We introduce a novel principle for self-supervised feature learning based on the discrimination of specific transformations of an image. |
635 | Hyperbolic Image Embeddings | Valentin Khrulkov; Leyla Mirvakhabova; Evgeniya Ustinova; Ivan Oseledets; Victor Lempitsky; | In this work, we demonstrate that in many practical scenarios, hyperbolic embeddings provide a better alternative. |
636 | Controllable Orthogonalization in Training DNNs | Lei Huang; Li Liu; Fan Zhu; Diwen Wan; Zehuan Yuan; Bo Li; Ling Shao; | This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton’s iteration (ONI), to learn a layer-wise orthogonal weight matrix in DNNs. |
637 | An Investigation Into the Stochasticity of Batch Whitening | Lei Huang; Lei Zhao; Yi Zhou; Fan Zhu; Li Liu; Ling Shao; | Based on our analysis, we provide a framework for designing and comparing BW algorithms in different scenarios. |
638 | High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification | Guan’an Wang; Shuo Yang; Huanyu Liu; Zhicheng Wang; Yang Yang; Shuliang Wang; Gang Yu; Erjin Zhou; Jian Sun; | In this paper, we propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment. |
639 | Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance | Jaime Spencer; Richard Bowden; Simon Hadfield; | The aim of this paper is to provide a dense feature representation that can be used to perform localization, sparse matching or image retrieval, regardless of the current seasonal or temporal appearance. |
640 | Learning to Dress 3D People in Generative Clothing | Qianli Ma; Jinlong Yang; Anurag Ranjan; Sergi Pujades; Gerard Pons-Moll; Siyu Tang; Michael J. Black; | To address this, we learn a generative 3D mesh model of clothed people from 3D scans with varying pose and clothing. |
641 | MAST: A Memory-Augmented Self-Supervised Tracker | Zihang Lai; Erika Lu; Weidi Xie; | We propose a dense tracking model trained on videos without any annotations that surpasses previous self-supervised methods on existing benchmarks by a significant margin (+15%), and achieves performance comparable to supervised methods. |
642 | Learning by Analogy: Reliable Supervision From Transformations for Unsupervised Optical Flow Estimation | Liang Liu; Jiangning Zhang; Ruifei He; Yong Liu; Yabiao Wang; Ying Tai; Donghao Luo; Chengjie Wang; Jilin Li; Feiyue Huang; | In this work, we present a framework to use more reliable supervision from transformations. |
643 | GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking With 2D-3D Multi-Feature Learning | Xinshuo Weng; Yongxin Wang; Yunze Man; Kris M. Kitani; | In this work, we propose two techniques to improve the discriminative feature learning for MOT: (1) instead of obtaining features for each object independently, we propose a novel feature interaction mechanism by introducing the Graph Neural Network. |
644 | ClusterFit: Improving Generalization of Visual Representations | Xueting Yan; Ishan Misra; Abhinav Gupta; Deepti Ghadiyaram; Dhruv Mahajan; | In this work, we present a simple strategy, ClusterFit, to improve the robustness of the visual representations learned during pre-training. |
645 | Learning Dynamic Relationships for 3D Human Motion Prediction | Qiongjie Cui; Huaijiang Sun; Fei Yang; | To tackle these issues, we propose a deep generative model based on graph networks and adversarial learning. |
646 | Knowledge As Priors: Cross-Modal Knowledge Generalization for Datasets Without Superior Knowledge | Long Zhao; Xi Peng; Yuxiao Chen; Mubbasir Kapadia; Dimitris N. Metaxas; | In this paper, we propose a novel scheme to train the Student in a Target dataset where the Teacher is unavailable. |
647 | S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation | Yizhe Zhu; Martin Renqiang Min; Asim Kadav; Hans Peter Graf; | We propose a sequential variational autoencoder to learn disentangled representations of sequential data (e.g., videos and audios) under self-supervision. |
648 | Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning | Yuan Yao; Chang Liu; Dezhao Luo; Yu Zhou; Qixiang Ye; | In this paper, we propose a novel self-supervised method, referred to as video Playback Rate Perception (PRP), to learn spatio-temporal representation in a simple-yet-effective way. |
649 | Learning to Manipulate Individual Objects in an Image | Yanchao Yang; Yutong Chen; Stefano Soatto; | The key to our method is the combination of spatial disentanglement, enforced by a Contextual Information Separation loss, and perceptual cycle-consistency, enforced by a loss that penalizes changes in the image partition in response to perturbations of the latent factors. |
650 | PADS: Policy-Adapted Sampling for Visual Similarity Learning | Karsten Roth; Timo Milbich; Bjorn Ommer; | We, therefore, employ reinforcement learning and have a teacher network adjust the sampling distribution based on the current state of the learner network, which represents visual similarity. |
651 | Siam R-CNN: Visual Tracking by Re-Detection | Paul Voigtlaender; Jonathon Luiten; Philip H.S. Torr; Bastian Leibe; | We present Siam R-CNN, a Siamese re-detection architecture which unleashes the full power of two-stage object detection approaches for visual object tracking. |
652 | ASLFeat: Learning Local Features of Accurate Shape and Localization | Zixin Luo; Lei Zhou; Xuyang Bai; Hongkai Chen; Jiahui Zhang; Yao Yao; Shiwei Li; Tian Fang; Long Quan; | In this paper, we present ASLFeat, with three light-weight yet effective modifications to mitigate the above issues. |
653 | Filter Grafting for Deep Neural Networks | Fanxu Meng; Hao Cheng; Ke Li; Zhixin Xu; Rongrong Ji; Xing Sun; Guangming Lu; | This paper proposes a new learning paradigm called filter grafting, which aims to improve the representation capability of Deep Neural Networks (DNNs). |
654 | HOPE-Net: A Graph-Based Model for Hand-Object Pose Estimation | Bardia Doosti; Shujon Naha; Majid Mirbagheri; David J. Crandall; | In this paper, we propose a lightweight model called HOPE-Net which jointly estimates hand and object pose in 2D and 3D in real-time. |
655 | DeepFaceFlow: In-the-Wild Dense 3D Facial Motion Estimation | Mohammad Rami Koujan; Anastasios Roussos; Stefanos Zafeiriou; | In this work, we propose DeepFaceFlow, a robust, fast, and highly-accurate framework for the dense estimation of 3D non-rigid facial flow between pairs of monocular images. |
656 | Learning for Video Compression With Hierarchical Quality and Recurrent Enhancement | Ren Yang; Fabian Mentzer; Luc Van Gool; Radu Timofte; | In this paper, we propose a Hierarchical Learned Video Compression (HLVC) method with three hierarchical quality layers and a recurrent enhancement network. |
657 | Learning Better Lossless Compression Using Lossy Compression | Fabian Mentzer; Luc Van Gool; Michael Tschannen; | We leverage the powerful lossy image compression algorithm BPG to build a lossless image compression system. |
658 | Flow2Stereo: Effective Self-Supervised Learning of Optical Flow and Stereo Matching | Pengpeng Liu; Irwin King; Michael R. Lyu; Jia Xu; | In this paper, we propose a unified method to jointly learn optical flow and stereo matching. |
659 | Multi-Scale Fusion Subspace Clustering Using Similarity Constraint | Zhiyuan Dang; Cheng Deng; Xu Yang; Heng Huang; | In this paper, we propose the Multi-Scale Fusion Subspace Clustering Using Similarity Constraint (SC-MSFSC) network, which learns a more discriminative self-expression coefficient matrix by a novel multi-scale fusion module. |
660 | Siamese Box Adaptive Network for Visual Tracking | Zedu Chen; Bineng Zhong; Guorong Li; Shengping Zhang; Rongrong Ji; | To address this issue, we propose a simple yet effective visual tracking framework (named Siamese Box Adaptive Network, SiamBAN) by exploiting the expressive power of the fully convolutional network (FCN). |
661 | Cross-Domain Face Presentation Attack Detection via Multi-Domain Disentangled Representation Learning | Guoqing Wang; Hu Han; Shiguang Shan; Xilin Chen; | In light of this, we propose an efficient disentangled representation learning approach for cross-domain face PAD. |
662 | Online Deep Clustering for Unsupervised Representation Learning | Xiaohang Zhan; Jiahao Xie; Ziwei Liu; Yew-Soon Ong; Chen Change Loy; | To overcome this challenge, we propose Online Deep Clustering (ODC) that performs clustering and network update simultaneously rather than alternatingly. |
663 | Density-Aware Feature Embedding for Face Clustering | Senhui Guo; Jing Xu; Dapeng Chen; Chao Zhang; Xiaogang Wang; Rui Zhao; | In this paper, we propose a Density-Aware Feature Embedding Network (DA-Net) for the task of face clustering, which utilizes both local and non-local information, to learn a robust feature embedding. |
664 | Self-Supervised Learning of Pretext-Invariant Representations | Ishan Misra; Laurens van der Maaten; | Specifically, we develop Pretext-Invariant Representation Learning (PIRL, pronounced as ‘pearl’) that learns invariant representations based on pretext tasks. |
665 | ROAM: Recurrently Optimizing Tracking Model | Tianyu Yang; Pengfei Xu; Runbo Hu; Hua Chai; Antoni B. Chan; | In this paper, we design a tracking model consisting of response generation and bounding box regression, where the first component produces a heat map to indicate the presence of the object at different positions and the second part regresses the relative bounding box shifts to anchors mounted on sliding-window locations. |
666 | Deformable Siamese Attention Networks for Visual Object Tracking | Yuechen Yu; Yilei Xiong; Weilin Huang; Matthew R. Scott; | In this paper, we propose Deformable Siamese Attention Networks, referred to as SiamAttn, by introducing a new Siamese attention mechanism that computes deformable self-attention and cross-attention. |
667 | 15 Keypoints Is All You Need | Michael Snower; Asim Kadav; Farley Lai; Hans Peter Graf; | We present an efficient multi-person pose-tracking method, KeyTrack, that relies only on keypoint information, without using any RGB or optical flow, to locate and track human keypoints in real time. |
668 | Optical Flow in the Dark | Yinqiang Zheng; Mingfang Zhang; Feng Lu; | We propose an end-to-end data-driven method that avoids error accumulation and learns optical flow directly from low-light noisy images. We also collect a new optical flow dataset in raw format with a large range of exposure to be used as a benchmark. |
669 | Sketch-BERT: Learning Sketch Bidirectional Encoder Representation From Transformers by Self-Supervised Learning of Sketch Gestalt | Hangyu Lin; Yanwei Fu; Xiangyang Xue; Yu-Gang Jiang; | Particularly, towards the pre-training task, we present a novel Sketch Gestalt Model (SGM) to help train the Sketch-BERT. |
670 | A Unified Object Motion and Affinity Model for Online Multi-Object Tracking | Junbo Yin; Wenguan Wang; Qinghao Meng; Ruigang Yang; Jianbing Shen; | In this paper, we propose a novel MOT framework that unifies object motion and affinity model into a single network, named UMA, in order to learn a compact feature that is discriminative for both object motion and affinity measure. |
671 | Sub-Frame Appearance and 6D Pose Estimation of Fast Moving Objects | Denys Rozumnyi; Jan Kotera; Filip Sroubek; Jiri Matas; | We propose a novel method that tracks fast moving objects, mainly non-uniform spherical, in full 6 degrees of freedom, estimating simultaneously their 3D motion trajectory, 3D pose and object appearance changes with a time step that is a fraction of the video frame exposure time. |
672 | How to Train Your Deep Multi-Object Tracker | Yihong Xu; Aljosa Osep; Yutong Ban; Radu Horaud; Laura Leal-Taixe; Xavier Alameda-Pineda; | In this paper, we bridge this gap by proposing a differentiable proxy of MOTA and MOTP, which we combine in a loss function suitable for end-to-end training of deep multi-object trackers. |
673 | TPNet: Trajectory Proposal Network for Motion Prediction | Liangji Fang; Qinhong Jiang; Jianping Shi; Bolei Zhou; | In this work we propose a novel two-stage motion prediction framework, Trajectory Proposal Network (TPNet). |
674 | Large Scale Video Representation Learning via Relational Graph Clustering | Hyodong Lee; Joonseok Lee; Joe Yue-Hei Ng; Paul Natsev; | In this work, we explore two promising scalable representation learning approaches on video domain. |
675 | Towards Universal Representation Learning for Deep Face Recognition | Yichun Shi; Xiang Yu; Kihyuk Sohn; Manmohan Chandraker; Anil K. Jain; | Instead, we propose a universal representation learning framework that can deal with larger variation unseen in the given training data without leveraging target domain knowledge. |
676 | Robust Partial Matching for Person Search in the Wild | Yingji Zhong; Xiaoyu Wang; Shiliang Zhang; | To alleviate this issue, this paper proposes an Align-to-Part Network (APNet) for person detection and re-identification (reID). |
677 | Correlation-Guided Attention for Corner Detection Based Visual Tracking | Fei Du; Peng Liu; Wei Zhao; Xianglong Tang; | We analyze the reasons for their failure and propose a state-of-the-art tracker that performs correlation-guided attentional corner detection in two stages. |
678 | Learning Multi-Object Tracking and Segmentation From Automatic Annotations | Lorenzo Porzi; Markus Hofinger; Idoia Ruiz; Joan Serrat; Samuel Rota Bulo; Peter Kontschieder; | In this work we contribute a novel pipeline to automatically generate training data, and to improve over state-of-the-art multi-object tracking and segmentation (MOTS) methods. |
679 | PandaNet: Anchor-Based Single-Shot Multi-Person 3D Pose Estimation | Abdallah Benzine; Florian Chabot; Bertrand Luvison; Quoc Cuong Pham; Catherine Achard; | In this work, we present PandaNet (Pose estimAtioN and Detection Anchor-based Network), a new single-shot, anchor-based and multi-person 3D pose estimation approach. |
680 | Rotation Consistent Margin Loss for Efficient Low-Bit Face Recognition | Yudong Wu; Yichao Wu; Ruihao Gong; Yuanhao Lv; Ken Chen; Ding Liang; Xiaolin Hu; Xianglong Liu; Junjie Yan; | In this paper, we consider the low-bit quantization problem of face recognition (FR) under the open-set protocol. |
681 | Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking | Peiliang Li; Jieqi Shi; Shaojie Shen; | To benefit from both the powerful object understanding of deep neural networks and precise geometry modeling for consistent trajectory estimation, we propose a joint spatial-temporal optimization-based stereo 3D object tracking method. |
682 | Unity Style Transfer for Person Re-Identification | Chong Liu; Xiaojun Chang; Yi-Dong Shen; | To solve this problem, we propose a UnityStyle adaptation method, which can smooth the style disparities within the same camera and across different cameras. |
683 | Suppressing Uncertainties for Large-Scale Facial Expression Recognition | Kai Wang; Xiaojiang Peng; Jianfei Yang; Shijian Lu; Yu Qiao; | To address this problem, this paper proposes to suppress the uncertainties by a simple yet efficient Self-Cure Network (SCN). |
684 | Multiview-Consistent Semi-Supervised Learning for 3D Human Pose Estimation | Rahul Mitra; Nitesh B. Gundavarapu; Abhishek Sharma; Arjun Jain; | To reduce this annotation dependency, we propose the Multiview-Consistent Semi-Supervised Learning (MCSS) framework, which utilizes similarity in pose information from unannotated, uncalibrated but synchronized multi-view videos of human motions as an additional weak supervision signal to guide 3D human pose regression. |
685 | Regularizing Neural Networks via Minimizing Hyperspherical Energy | Rongmei Lin; Weiyang Liu; Zhen Liu; Chen Feng; Zhiding Yu; James M. Rehg; Li Xiong; Le Song; | To address these problems, we propose the compressive minimum hyperspherical energy (CoMHE) as a more effective regularization for neural networks. |
686 | Learning Representations by Predicting Bags of Visual Words | Spyros Gidaris; Andrei Bursuc; Nikos Komodakis; Patrick Perez; Matthieu Cord; | Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions that encode discrete visual concepts, here called visual words. |
687 | AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces | Muhammad Haris Khan; John McDonagh; Salman Khan; Muhammad Shahabuddin; Aditya Arora; Fahad Shahbaz Khan; Ling Shao; Georgios Tzimiropoulos; | To this end, we introduce a large-scale, hierarchical annotated dataset of animal faces, featuring 22.4K faces from 350 diverse species and 21 animal orders across biological taxonomy. |
688 | A Transductive Approach for Video Object Segmentation | Yizhuo Zhang; Zhirong Wu; Houwen Peng; Stephen Lin; | To address this issue, we propose a simple yet strong transductive method, in which additional modules, datasets, and dedicated architectural designs are not needed. |
689 | Dynamic Face Video Segmentation via Reinforcement Learning | Yujiang Wang; Mingzhi Dong; Jie Shen; Yang Wu; Shiyang Cheng; Maja Pantic; | To overcome this limitation, we model the online key decision process in dynamic video segmentation as a deep reinforcement learning problem and learn an efficient and effective scheduling policy from expert information about decision history and from the process of maximising global return. |
690 | Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion | Julian Chibane; Thiemo Alldieck; Gerard Pons-Moll; | To solve this, we propose Implicit Feature Networks (IF-Nets), which deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data retaining the nice properties of recent learned implicit functions, but critically they can also retain detail when it is present in the input data, and can reconstruct articulated humans. |
691 | Semantic Drift Compensation for Class-Incremental Learning | Lu Yu; Bartlomiej Twardowski; Xialei Liu; Luis Herranz; Kai Wang; Yongmei Cheng; Shangling Jui; Joost van de Weijer; | Therefore, we study incremental learning for embedding networks. In addition, we propose a new method to estimate the drift, called semantic drift, of features and compensate for it without the need of any exemplars. |
692 | Context-Aware Human Motion Prediction | Enric Corona; Albert Pumarola; Guillem Alenya; Francesc Moreno-Noguer; | In this paper, we explore this scenario using a novel context-aware motion prediction architecture. |
693 | DeepDeform: Learning Non-Rigid RGB-D Reconstruction With Semi-Supervised Data | Aljaz Bozic; Michael Zollhofer; Christian Theobalt; Matthias Niessner; | Based on this corpus, we introduce a data-driven non-rigid feature matching approach, which we integrate into an optimization-based reconstruction pipeline. |
694 | Optical Non-Line-of-Sight Physics-Based 3D Human Pose Estimation | Mariko Isogawa; Ye Yuan; Matthew O’Toole; Kris M. Kitani; | We describe a method for 3D human pose estimation from transient images (i.e., a 3D spatio-temporal histogram of photons) acquired by an optical non-line-of-sight (NLOS) imaging system. |
695 | Learning to Transfer Texture From Clothing Images to 3D Humans | Aymen Mir; Thiemo Alldieck; Gerard Pons-Moll; | In this paper, we present a simple yet effective method to automatically transfer textures of clothing images (front and back) to 3D garments worn on top of SMPL, in real time. |
696 | UniPose: Unified Human Pose Estimation in Single Images and Videos | Bruno Artacho; Andreas Savakis; | We propose UniPose, a unified framework for human pose estimation, based on our "Waterfall" Atrous Spatial Pooling architecture, that achieves state-of-the-art results on several pose estimation metrics. |
697 | Minimal Solutions to Relative Pose Estimation From Two Views Sharing a Common Direction With Unknown Focal Length | Yaqing Ding; Jian Yang; Jean Ponce; Hui Kong; | We propose minimal solutions to the relative pose estimation problem from two views sharing a common direction with unknown focal length. |
698 | 3D Human Mesh Regression With Dense Correspondence | Wang Zeng; Wanli Ouyang; Ping Luo; Wentao Liu; Xiaogang Wang; | This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space (i.e. a 2D space used for texture mapping of 3D mesh). |
699 | Cross-Modal Pattern-Propagation for RGB-T Tracking | Chaoqun Wang; Chunyan Xu; Zhen Cui; Ling Zhou; Tong Zhang; Xiaoya Zhang; Jian Yang; | Motivated by our observation on RGB-T data that pattern correlations recur frequently across modalities and along sequence frames, in this paper we propose a cross-modal pattern-propagation (CMPP) tracking framework to diffuse instance patterns across RGB-T data in the spatial domain as well as the temporal domain. |
700 | Distilling Knowledge From Graph Convolutional Networks | Yiding Yang; Jiayan Qiu; Mingli Song; Dacheng Tao; Xinchao Wang; | In this paper, we propose, to the best of our knowledge, the first dedicated approach to distilling knowledge from a pre-trained GCN model. |
701 | Learning Identity-Invariant Motion Representations for Cross-ID Face Reenactment | Po-Hsiang Huang; Fu-En Yang; Yu-Chiang Frank Wang; | In this paper, we propose a unique network, CrossID-GAN, to perform multi-ID face reenactment. |
702 | Distribution-Aware Coordinate Representation for Human Pose Estimation | Feng Zhang; Xiatian Zhu; Hanbin Dai; Mao Ye; Ce Zhu; | For the first time, we find that the process of decoding the predicted heatmaps into the final joint coordinates in the original image space is surprisingly significant for the performance. We further probe the design limitations of the standard coordinate decoding method, and propose a more principled distribution-aware decoding method. |
703 | Parsing-Based View-Aware Embedding Network for Vehicle Re-Identification | Dechao Meng; Liang Li; Xuejing Liu; Yadong Li; Shijie Yang; Zheng-Jun Zha; Xingyu Gao; Shuhui Wang; Qingming Huang; | In this paper, we propose a parsing-based view-aware embedding network (PVEN) to achieve the view-aware feature alignment and enhancement for vehicle ReID. |
704 | HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation From a Single Depth Map | Jameel Malik; Ibrahim Abdelaziz; Ahmed Elhayek; Soshi Shimada; Sk Aziz Ali; Vladislav Golyanik; Christian Theobalt; Didier Stricker; | In contrast, we propose a novel architecture with 3D convolutions trained in a weakly-supervised manner. |
705 | Determinant Regularization for Gradient-Efficient Graph Matching | Tianshu Yu; Junchi Yan; Baoxin Li; | In this paper, we show a novel regularization technique with the tool of determinant analysis on the matching matrix which is relaxed into continuous domain with gradient based optimization. |
706 | D3S – A Discriminative Single Shot Segmentation Tracker | Alan Lukezic; Jiri Matas; Matej Kristan; | We propose a discriminative single-shot segmentation tracker – D3S, which narrows the gap between visual object tracking and video object segmentation. |
707 | MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction | Francesco Marchetti; Federico Becattini; Lorenzo Seidenari; Alberto Del Bimbo; | In this paper we address the problem of multimodal trajectory prediction exploiting a Memory Augmented Neural Network. |
708 | End-to-End Model-Free Reinforcement Learning for Urban Driving Using Implicit Affordances | Marin Toromanoff; Emilie Wirbel; Fabien Moutarde; | We present a novel technique, coined implicit affordances, to effectively leverage RL for urban driving, including lane keeping, pedestrian and vehicle avoidance, and traffic light detection. |
709 | GraphTER: Unsupervised Learning of Graph Transformation Equivariant Representations via Auto-Encoding Node-Wise Transformations | Xiang Gao; Wei Hu; Guo-Jun Qi; | To this end, we propose a novel unsupervised learning of Graph Transformation Equivariant Representations (GraphTER), aiming to capture intrinsic patterns of graph structure under both global and local transformations. |
710 | Can Facial Pose and Expression Be Separated With Weak Perspective Camera? | Evangelos Sariyanidi; Casey J. Zampella; Robert T. Schultz; Birkan Tunc; | This paper critically examines the suitability of WP camera for separating facial pose and expression. |
711 | Probabilistic Regression for Visual Tracking | Martin Danelljan; Luc Van Gool; Radu Timofte; | In this work, we therefore propose a probabilistic regression formulation and apply it to tracking. |
712 | 3DRegNet: A Deep Neural Network for 3D Point Registration | G. Dias Pais; Srikumar Ramalingam; Venu Madhav Govindu; Jacinto C. Nascimento; Rama Chellappa; Pedro Miraldo; | We present 3DRegNet, a novel deep learning architecture for the registration of 3D scans. |
713 | Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation | Matteo Fabbri; Fabio Lanzi; Simone Calderara; Stefano Alletto; Rita Cucchiara; | In this paper we present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images. |
714 | Three-Dimensional Reconstruction of Human Interactions | Mihai Fieraru; Mihai Zanfir; Elisabeta Oneata; Alin-Ionut Popa; Vlad Olaru; Cristian Sminchisescu; | This paper addresses such issues and makes several contributions: (1) we introduce models for interaction signature estimation (ISP) encompassing contact detection, segmentation, and 3D contact signature prediction; (2) we show how such components can be leveraged in order to produce augmented losses that ensure contact consistency during 3D reconstruction; (3) we construct several large datasets for learning and evaluating 3D contact prediction and reconstruction methods; specifically, we introduce CHI3D, a lab-based accurate 3D motion capture dataset with 631 sequences containing 2,525 contact events, 728,664 ground truth 3D poses, as well as FlickrCI3D, a dataset of 11,216 images, with 14,081 processed pairs of people, and 81,233 facet-level surface correspondences within 138,213 selected contact regions. |
715 | Distribution-Induced Bidirectional Generative Adversarial Network for Graph Representation Learning | Shuai Zheng; Zhenfeng Zhu; Xingxing Zhang; Zhizhe Liu; Jian Cheng; Yao Zhao; | In this paper, we propose a Distribution-induced Bidirectional Generative Adversarial Network (named DBGAN) for graph representation learning. |
716 | Minimal Solvers for 3D Scan Alignment With Pairs of Intersecting Lines | Andre Mateus; Srikumar Ramalingam; Pedro Miraldo; | In this paper, we present minimal solvers that combine these different types of constraints: 1) three line intersections and one point match; 2) one line intersection and two point matches; 3) three line intersections and one plane match; 4) one line intersection and two plane matches; and 5) one line intersection, one point match, and one plane match. |
717 | Wavelet Integrated CNNs for Noise-Robust Image Classification | Qiufu Li; Linlin Shen; Sheng Guo; Zhihui Lai; | We present general DWT and Inverse DWT (IDWT) layers applicable to various wavelets such as Haar, Daubechies, and Cohen, and design wavelet integrated CNNs (WaveCNets) using these layers for image classification. |
718 | Embedding Expansion: Augmentation in Embedding Space for Deep Metric Learning | Byungsoo Ko; Geonmo Gu; | In this paper, inspired by query expansion and database augmentation, we propose an augmentation method in an embedding space for pair-based metric learning losses, called embedding expansion. |
719 | PropagationNet: Propagate Points to Curve to Learn Structure Information | Xiehe Huang; Weihong Deng; Haifeng Shen; Xiubao Zhang; Jieping Ye; | In this paper, we explore the intuitions and reasons behind our two proposals, i.e., the Propagation Module and the Focal Wing Loss, to tackle the problem. |
720 | Sequential 3D Human Pose and Shape Estimation From Point Clouds | Kangkan Wang; Jin Xie; Guofeng Zhang; Lei Liu; Jian Yang; | In this paper, we propose a novel sequential 3D human pose and shape estimation framework from a sequence of point clouds. |
721 | Improving the Robustness of Capsule Networks to Image Affine Transformations | Jindong Gu; Volker Tresp; | Furthermore, we explore the limitations of capsule transformations and propose affine CapsNets (Aff-CapsNets), which are more robust to affine transformations. |
722 | Noise Modeling, Synthesis and Classification for Generic Object Anti-Spoofing | Joel Stehouwer; Amin Jourabloo; Yaojie Liu; Xiaoming Liu; | In this work, we define and tackle the problem of Generic Object Anti-Spoofing (GOAS) for the first time. |
723 | Quaternion Product Units for Deep Learning on 3D Rotation Groups | Xuan Zhang; Shaofei Qin; Yi Xu; Hongteng Xu; | We propose a novel quaternion product unit (QPU) to represent data on 3D rotation groups. |
724 | Unsupervised Representation Learning for Gaze Estimation | Yu Yu; Jean-Marc Odobez; | To address this issue, our main contribution in this paper is to propose an effective approach to learn a low-dimensional gaze representation without gaze annotations, which, to the best of our knowledge, is the first work to do so. |
725 | Π-nets: Deep Polynomial Neural Networks | Grigorios G. Chrysos; Stylianos Moschoglou; Giorgos Bouritsas; Yannis Panagakis; Jiankang Deng; Stefanos Zafeiriou; | In this paper, we propose Π-Nets, a new class of DCNNs. |
726 | Hierarchically Robust Representation Learning | Qi Qian; Juhua Hu; Hao Li; | In this work, we investigate this phenomenon and demonstrate that deep features can be suboptimal due to the fact that they are learned by minimizing the empirical risk. |
727 | How Useful Is Self-Supervised Pretraining for Visual Tasks? | Alejandro Newell; Jia Deng; | We investigate what factors may play a role in the utility of these pretraining methods for practitioners. |
728 | Copy and Paste GAN: Face Hallucination From Shaded Thumbnails | Yang Zhang; Ivor W. Tsang; Yawei Luo; Chang-Hui Hu; Xiaobo Lu; Xin Yu; | This paper proposes a Copy and Paste Generative Adversarial Network (CPGAN) to recover authentic high-resolution (HR) face images while compensating for low and non-uniform illumination. |
729 | TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style | Chaitanya Patel; Zhouyingcheng Liao; Gerard Pons-Moll; | In this paper, we present TailorNet, a neural model which predicts clothing deformation in 3D as a function of three factors: pose, shape and style (garment geometry), while retaining wrinkle detail. |
730 | Object-Occluded Human Shape and Pose Estimation From a Single Color Image | Tianshu Zhang; Buzhen Huang; Yangang Wang; | In this paper, we focus on the problem of directly estimating the object-occluded human shape and pose from single color images. To supervise the network training, we further build a novel dataset named 3DOH50K. |
731 | Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking | Jin Gao; Weiming Hu; Yan Lu; | Although many dedicated techniques have been proposed to treat those issues, in this paper we take a new way to strike a compromise between them based on the recursive least-squares estimation (LSE) algorithm. |
732 | Self-Supervised Monocular Scene Flow Estimation | Junhwa Hur; Stefan Roth; | We propose a novel monocular scene flow method that yields competitive accuracy and real-time performance. |
733 | Learning Fast and Robust Target Models for Video Object Segmentation | Andreas Robinson; Felix Jaremo Lawin; Martin Danelljan; Fahad Shahbaz Khan; Michael Felsberg; | We propose a novel VOS architecture consisting of two network components. |
734 | Reciprocal Learning Networks for Human Trajectory Prediction | Hao Sun; Zhiqun Zhao; Zhihai He; | Based on this unique property, we develop a new approach, called reciprocal learning, for human trajectory prediction. |
735 | Nonparametric Object and Parts Modeling With Lie Group Dynamics | David S. Hayden; Jason Pacheco; John W. Fisher III; | Here, we relax such strong assumptions via an unsupervised, Bayesian nonparametric parts model that infers an unknown number of parts with motions coupled by a body dynamic and parameterized by SE(D), the Lie group of rigid transformations. |
736 | Learning to Shadow Hand-Drawn Sketches | Qingyuan Zheng; Zhuoru Li; Adam Bargteil; | We present a fully automatic method to generate detailed and accurate artistic shadows from pairs of line drawing sketches and lighting directions. |
737 | Intuitive, Interactive Beard and Hair Synthesis With Generative Models | Kyle Olszewski; Duygu Ceylan; Jun Xing; Jose Echevarria; Zhili Chen; Weikai Chen; Hao Li; | We present an interactive approach to synthesizing realistic variations in facial hair in images, ranging from subtle edits to existing hair to the addition of complex and challenging hair in images of clean-shaven subjects. |
738 | Semantic Pyramid for Image Generation | Assaf Shocher; Yossi Gandelsman; Inbar Mosseri; Michal Yarom; Michal Irani; William T. Freeman; Tali Dekel; | We present a novel GAN-based model that utilizes the space of deep features learned by a pre-trained classification model. |
739 | SynSin: End-to-End View Synthesis From a Single Image | Olivia Wiles; Georgia Gkioxari; Richard Szeliski; Justin Johnson; | We propose a novel end-to-end model for this task using a single image at test time; it is trained on real images without any ground-truth 3D information. |
740 | A Characteristic Function Approach to Deep Implicit Generative Modeling | Abdul Fatir Ansari; Jonathan Scarlett; Harold Soh; | In this paper, we formulate the problem of learning an IGM as minimizing the expected distance between characteristic functions. |
741 | High-Resolution Daytime Translation Without Domain Labels | Ivan Anokhin; Pavel Solovev; Denis Korzhenkov; Alexey Kharlamov; Taras Khakhulin; Aleksei Silvestrov; Sergey Nikolenko; Victor Lempitsky; Gleb Sterkin; | We present the high-resolution daytime translation (HiDT) model for this task. |
742 | Leveraging 2D Data to Learn Textured 3D Mesh Generation | Paul Henderson; Vagia Tsiminaki; Christoph H. Lampert; | In this work, we present the first generative model of textured 3D meshes. |
743 | Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting | Zili Yi; Qiang Tang; Shekoofeh Azizi; Daesik Jang; Zhan Xu; | Motivated by this, we propose a Contextual Residual Aggregation (CRA) mechanism that can produce high-frequency residuals for missing contents by a weighted aggregation of residuals from contextual patches, thus only requiring a low-resolution prediction from the network. |
744 | Flow Contrastive Estimation of Energy-Based Models | Ruiqi Gao; Erik Nijkamp; Diederik P. Kingma; Zhen Xu; Andrew M. Dai; Ying Nian Wu; | This paper studies a training method to jointly estimate an energy-based model and a flow-based model, in which the two models are iteratively updated based on a shared adversarial value function. |
745 | Hardware-in-the-Loop End-to-End Optimization of Camera Image Processing Pipelines | Ali Mosleh; Avinash Sharma; Emmanuel Onzon; Fahim Mannan; Nicolas Robidoux; Felix Heide; | Departing from such approximations, we present a hardware-in-the-loop method that directly optimizes hardware image processing pipelines for end-to-end domain-specific losses by solving a nonlinear multi-objective optimization problem with a novel 0th-order stochastic solver directly interfaced with the hardware ISP. |
746 | Search to Distill: Pearls Are Everywhere but Not the Eyes | Yu Liu; Xuhui Jia; Mingxing Tan; Raviteja Vemulapalli; Yukun Zhu; Bradley Green; Xiaogang Wang; | To achieve this, we present a new Architecture-aware Knowledge Distillation (AKD) approach that finds student models (pearls for the teacher) that are best for distilling the given teacher model. |
747 | Total Deep Variation for Linear Inverse Problems | Erich Kobler; Alexander Effland; Karl Kunisch; Thomas Pock; | In this paper, we propose a novel learnable general-purpose regularizer exploiting recent architectural design patterns from deep learning. |
748 | Relative Interior Rule in Block-Coordinate Descent | Tomas Werner; Daniel Prusa; Tomas Dlask; | Based on this observation, we develop a theoretical framework for block-coordinate descent applied to general convex problems. |
749 | Learning Combinatorial Solver for Graph Matching | Tao Wang; He Liu; Yidong Li; Yi Jin; Xiaohui Hou; Haibin Ling; | In this paper we propose a fully trainable framework for graph matching, in which learning of affinities and solving for combinatorial optimization are not explicitly separated as in many previous works. |
750 | SampleNet: Differentiable Point Cloud Sampling | Itai Lang; Asaf Manor; Shai Avidan; | We introduce a novel differentiable relaxation for point cloud sampling that approximates sampled points as a mixture of points in the primary input cloud. |
751 | Can We Learn Heuristics for Graphical Model Inference Using Reinforcement Learning? | Safa Messaoud; Maghav Kumar; Alexander G. Schwing; | In this paper, we show that we can learn program heuristics, i.e., policies, for solving inference in higher order CRFs for the task of semantic segmentation, using reinforcement learning. |
752 | Quasi-Newton Solver for Robust Non-Rigid Registration | Yuxin Yao; Bailin Deng; Weiwei Xu; Juyong Zhang; | In this paper, we propose a formulation for robust non-rigid registration based on a globally smooth robust estimator for data fitting and regularization, which can handle outliers and partial overlaps. |
753 | Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition From a Domain Adaptation Perspective | Muhammad Abdullah Jamal; Matthew Brown; Ming-Hsuan Yang; Liqiang Wang; Boqing Gong; | To this end, we propose to augment the classic class-balanced learning by explicitly estimating the differences between the class-conditioned distributions with a meta-learning approach. |
754 | Optimizing Rank-Based Metrics With Blackbox Differentiation | Michal Rolinek; Vit Musil; Anselm Paulus; Marin Vlastelica; Claudio Michaelis; Georg Martius; | We present an efficient, theoretically sound, and general method for differentiating rank-based metrics with mini-batch gradient descent. |
755 | DualSDF: Semantic Shape Manipulation Using a Two-Level Representation | Zekun Hao; Hadar Averbuch-Elor; Noah Snavely; Serge Belongie; | We propose DualSDF, a representation expressing shapes at two levels of granularity, one capturing fine details and the other representing an abstracted proxy shape using simple and semantically consistent shape primitives. |
756 | Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives | Duo Li; Qifeng Chen; | Complementary to previous training strategies, we propose Dynamic Hierarchical Mimicking, a generic feature learning mechanism, to advance CNN training with enhanced generalization ability. |
757 | Deep Homography Estimation for Dynamic Scenes | Hoang Le; Feng Liu; Shu Zhang; Aseem Agarwala; | This paper investigates and discusses how to design and train a deep neural network that handles dynamic scenes. |
758 | PF-Net: Point Fractal Network for 3D Point Cloud Completion | Zitian Huang; Yikuan Yu; Jiawen Xu; Feng Ni; Xinyi Le; | In this paper, we propose a Point Fractal Network (PF-Net), a novel learning-based approach for precise and high-fidelity point cloud completion. |
759 | On the Regularization Properties of Structured Dropout | Ambar Pal; Connor Lane; Rene Vidal; Benjamin D. Haeffele; | In this work we show that for single hidden-layer linear networks, DropBlock induces spectral k-support norm regularization, and promotes solutions that are low-rank and have factors with equal norm. |
760 | Learning Oracle Attention for High-Fidelity Face Completion | Tong Zhou; Changxing Ding; Shaowen Lin; Xinchao Wang; Dacheng Tao; | Accordingly, in this paper, we design a comprehensive framework for face completion based on the U-Net structure. |
761 | Deep Image Spatial Transformation for Person Image Generation | Yurui Ren; Xiaoming Yu; Junming Chen; Thomas H. Li; Ge Li; | In this paper, we propose a differentiable global-flow local-attention framework to reassemble the inputs at the feature level. |
762 | Learning to Optimize on SPD Manifolds | Zhi Gao; Yuwei Wu; Yunde Jia; Mehrtash Harandi; | In this paper, we propose a meta-learning method to automatically learn an iterative optimizer on SPD manifolds. |
763 | Deep 3D Portrait From a Single Image | Sicheng Xu; Jiaolong Yang; Dong Chen; Fang Wen; Yu Deng; Yunde Jia; Xin Tong; | In this paper, we present a learning-based approach for recovering the 3D geometry of human head from a single portrait image. |
764 | RDCFace: Radial Distortion Correction for Face Recognition | He Zhao; Xianghua Ying; Yongjie Shi; Xin Tong; Jingsi Wen; Hongbin Zha; | In this paper, we propose a distortion-invariant face recognition system called RDCFace, which directly and only utilizes the distorted images of faces, to alleviate the effects of radial lens distortion. |
765 | Global-Local GCN: Large-Scale Label Noise Cleansing for Face Recognition | Yaobin Zhang; Weihong Deng; Mei Wang; Jiani Hu; Xian Li; Dongyue Zhao; Dongchao Wen; | To solve this problem, we propose an effective automatic label noise cleansing framework for face recognition datasets, FaceGraph. |
766 | MISC: Multi-Condition Injection and Spatially-Adaptive Compositing for Conditional Person Image Synthesis | Shuchen Weng; Wenbo Li; Dawei Li; Hongxia Jin; Boxin Shi; | In this paper, we explore synthesizing person images with multiple conditions for various backgrounds. |
767 | SAINT: Spatially Aware Interpolation NeTwork for Medical Slice Synthesis | Cheng Peng; Wei-An Lin; Haofu Liao; Rama Chellappa; S. Kevin Zhou; | In this paper, we introduce a Spatially Aware Interpolation NeTwork (SAINT) for medical slice synthesis to alleviate the memory constraint that volumetric data poses. |
768 | Recurrent Feature Reasoning for Image Inpainting | Jingyuan Li; Ning Wang; Lefei Zhang; Bo Du; Dacheng Tao; | In this paper, we devise a Recurrent Feature Reasoning (RFR) network which is mainly constructed by a plug-and-play Recurrent Feature Reasoning module and a Knowledge Consistent Attention (KCA) module. |
769 | Structure-Preserving Super Resolution With Gradient Guidance | Cheng Ma; Yongming Rao; Yean Cheng; Ce Chen; Jiwen Lu; Jie Zhou; | In this paper, we propose a structure-preserving super resolution method to alleviate the above issue while maintaining the merits of GAN-based methods to generate perceptually pleasing details. |
770 | Epipolar Transformers | Yihui He; Rui Yan; Katerina Fragkiadaki; Shoou-I Yu; | Therefore, we propose the differentiable "epipolar transformer", which enables the 2D detector to leverage 3D-aware features to improve 2D pose estimation. |
771 | Diversified Arbitrary Style Transfer via Deep Feature Perturbation | Zhizhong Wang; Lei Zhao; Haibo Chen; Lihong Qiu; Qihang Mo; Sihuan Lin; Wei Xing; Dongming Lu; | In this paper, we tackle these limitations and propose a simple yet effective method for diversified arbitrary style transfer. |
772 | MSG-GAN: Multi-Scale Gradients for Generative Adversarial Networks | Animesh Karnewar; Oliver Wang; | In this work, we propose the Multi-Scale Gradient Generative Adversarial Network (MSG-GAN), a simple but effective technique for addressing this by allowing the flow of gradients from the discriminator to the generator at multiple scales. |
773 | Overcoming Multi-Model Forgetting in One-Shot NAS With Diversity Maximization | Miao Zhang; Huiqi Li; Shirui Pan; Xiaojun Chang; Steven Su; | In this paper, we formulate the supernet training in the One-Shot NAS as a constrained optimization problem of continual learning, in which learning the current architecture should not degrade the performance of previous architectures during the supernet training. |
774 | Select to Better Learn: Fast and Accurate Deep Learning Using Data Selection From Nonlinear Manifolds | Mohsen Joneidi; Saeed Vahidian; Ashkan Esmaeili; Weijia Wang; Nazanin Rahnavard; Bill Lin; Mubarak Shah; | A simple and efficient selection algorithm with linear complexity, referred to as spectrum pursuit (SP), is proposed that pursues spectral components of the dataset using available sample points. |
775 | Neural Point Cloud Rendering via Multi-Plane Projection | Peng Dai; Yinda Zhang; Zhuwen Li; Shuaicheng Liu; Bing Zeng; | We present a new deep point cloud rendering pipeline through multi-plane projections. |
776 | Wish You Were Here: Context-Aware Human Generation | Oran Gafni; Lior Wolf; | We present a novel method for inserting objects, specifically humans, into existing images, such that they blend in a photorealistic manner, while respecting the semantic context of the scene. |
777 | Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content | Han Yang; Ruimao Zhang; Xiaobao Guo; Wei Liu; Wangmeng Zuo; Ping Luo; | To address this issue, we propose a novel visual try-on network, namely Adaptive Content Generating and Preserving Network (ACGPN). |
778 | Breaking the Cycle – Colleagues Are All You Need | Ori Nizan; Ayellet Tal; | This paper proposes a novel approach to performing image-to-image translation between unpaired domains. |
779 | Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation | Hao Tang; Dan Xu; Yan Yan; Philip H.S. Torr; Nicu Sebe; | In this paper, we address the task of semantic-guided scene generation. |
780 | ManiGAN: Text-Guided Image Manipulation | Bowen Li; Xiaojuan Qi; Thomas Lukasiewicz; Philip H.S. Torr; | The goal of our paper is to semantically edit parts of an image matching a given text that describes desired attributes (e.g., texture, colour, and background), while preserving other contents that are irrelevant to the text. |
781 | Watch Your Up-Convolution: CNN Based Generative Deep Neural Networks Are Failing to Reproduce Spectral Distributions | Ricard Durall; Margret Keuper; Janis Keuper; | In this paper, we show that common up-sampling methods, i.e., up-convolution or transposed convolution, prevent such models from correctly reproducing the spectral distributions of natural training data. |
782 | Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems | Patrick Knobelreiter; Christian Sormann; Alexander Shekhovtsov; Friedrich Fraundorfer; Thomas Pock; | In this work we take one of the simplest inference methods, a truncated max-product Belief Propagation, and add what is necessary to make it a proper component of a deep learning model: connect it to learning formulations with losses on marginals and compute the backprop operation. |
783 | Barycenters of Natural Images Constrained Wasserstein Barycenters for Image Morphing | Dror Simon; Aviad Aberdam; | In this work, we propose a novel approach for image morphing that possesses all three desired properties. |
784 | Guided Variational Autoencoder for Disentanglement Learning | Zheng Ding; Yifan Xu; Weijian Xu; Gaurav Parmar; Yang Yang; Max Welling; Zhuowen Tu; | We propose an algorithm, guided variational autoencoder (Guided-VAE), that is able to learn a controllable generative model by performing latent representation disentanglement learning. |
785 | Cross-Spectral Face Hallucination via Disentangling Independent Factors | Boyan Duan; Chaoyou Fu; Yi Li; Xingguang Song; Ran He; | Rather than building a monolithic but complex structure, this paper proposes a Pose Aligned Cross-spectral Hallucination (PACH) approach to disentangle the independent factors and deal with them in individual stages. |
786 | Learned Image Compression With Discretized Gaussian Mixture Likelihoods and Attention Modules | Zhengxue Cheng; Heming Sun; Masaru Takeuchi; Jiro Katto; | Therefore, in this paper, we propose to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which can achieve a more accurate and flexible entropy model. |
787 | C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds | Albert Pumarola; Stefan Popov; Francesc Moreno-Noguer; Vittorio Ferrari; | In this paper, we introduce C-Flow, a novel conditioning scheme that brings normalizing flows to an entirely new scenario with great possibilities for multimodal data modeling. |
788 | Cogradient Descent for Bilinear Optimization | Li’an Zhuo; Baochang Zhang; Linlin Yang; Hanlin Chen; Qixiang Ye; David Doermann; Rongrong Ji; Guodong Guo; | In this paper, we introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem, based on a theoretical framework to coordinate the gradient of hidden variables via a projection function. |
789 | Instance-Aware Image Colorization | Jheng-Wei Su; Hung-Kuo Chu; Jia-Bin Huang; | In this paper, we propose a method for achieving instance-aware colorization. |
790 | Joint Training of Variational Auto-Encoder and Latent Energy-Based Model | Tian Han; Erik Nijkamp; Linqi Zhou; Bo Pang; Song-Chun Zhu; Ying Nian Wu; | This paper proposes a joint training method to learn both the variational auto-encoder (VAE) and the latent energy-based model (EBM). |
791 | Adaptive Loss-Aware Quantization for Multi-Bit Networks | Zhongnan Qu; Zimu Zhou; Yun Cheng; Lothar Thiele; | We propose Adaptive Loss-aware Quantization (ALQ), a new MBN quantization pipeline that is able to achieve an average bitwidth below one bit without notable loss in inference accuracy. |
792 | ScopeFlow: Dynamic Scene Scoping for Optical Flow | Aviram Bar-Haim; Lior Wolf; | We propose to modify the common training protocols of optical flow, leading to sizable accuracy improvements without adding to the computational complexity of the training process. |
793 | Video Super-Resolution With Temporal Group Attention | Takashi Isobe; Songjiang Li; Xu Jia; Shanxin Yuan; Gregory Slabaugh; Chunjing Xu; Ya-Li Li; Shengjin Wang; Qi Tian; | In this work, we propose a novel method that can effectively incorporate temporal information in a hierarchical way. |
794 | Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression | Yawei Li; Shuhang Gu; Christoph Mayer; Luc Van Gool; Radu Timofte; | In this paper, we analyze two popular network compression techniques, i.e. filter pruning and low-rank decomposition, in a unified sense. |
795 | 3D Photography Using Context-Aware Layered Depth Inpainting | Meng-Li Shih; Shih-Yang Su; Johannes Kopf; Jia-Bin Huang; | We propose a method for converting a single RGB-D input image into a 3D photo, i.e., a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view. |
796 | MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation | Yuheng Li; Krishna Kumar Singh; Utkarsh Ojha; Yong Jae Lee; | We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation. |
797 | Low-Rank Compression of Neural Nets: Learning the Rank of Each Layer | Yerlan Idelbayev; Miguel A. Carreira-Perpinan; | We show that, with a suitable formulation, this problem is amenable to a mixed discrete-continuous optimization jointly over the ranks and over the matrix elements, and give a corresponding algorithm. |
798 | Global Texture Enhancement for Fake Face Detection in the Wild | Zhengzhe Liu; Xiaojuan Qi; Philip H.S. Torr; | In this paper, we conduct an empirical study on fake/real faces, and have two important observations: firstly, the texture of fake faces is substantially different from real ones; secondly, global texture statistics are more robust to image editing and transferable to fake faces from different GANs and datasets. |
799 | Panoptic-Based Image Synthesis | Aysegul Dundar; Karan Sapra; Guilin Liu; Andrew Tao; Bryan Catanzaro; | We propose a panoptic aware image synthesis network to generate high fidelity and photorealistic images conditioned on panoptic maps which unify semantic and instance information. |
800 | Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination | Pratul P. Srinivasan; Ben Mildenhall; Matthew Tancik; Jonathan T. Barron; Richard Tucker; Noah Snavely; | We present a deep learning solution for estimating the incident illumination at any 3D location within a scene from an input narrow-baseline stereo image pair. |
801 | Learning to Cartoonize Using White-Box Cartoon Representations | Xinrui Wang; Jinze Yu; | This paper presents an approach for image cartoonization. |
802 | End-to-End Learnable Geometric Vision by Backpropagating PnP Optimization | Bo Chen; Alvaro Parra; Jiewei Cao; Nan Li; Tat-Jun Chin; | Towards this aim, we present BPnP, a novel network module that backpropagates gradients through a Perspective-n-Points (PnP) solver to guide parameter updates of a neural network. |
803 | Analyzing and Improving the Image Quality of StyleGAN | Tero Karras; Samuli Laine; Miika Aittala; Janne Hellsten; Jaakko Lehtinen; Timo Aila; | We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. |
804 | Fashion Editing With Adversarial Parsing Learning | Haoye Dong; Xiaodan Liang; Yixuan Zhang; Xujie Zhang; Xiaohui Shen; Zhenyu Xie; Bowen Wu; Jian Yin; | In this paper, we propose a novel Fashion Editing Generative Adversarial Network (FE-GAN), which is capable of manipulating fashion images by free-form sketches and sparse color strokes. |
805 | Augment Your Batch: Improving Generalization Through Instance Repetition | Elad Hoffer; Tal Ben-Nun; Itay Hubara; Niv Giladi; Torsten Hoefler; Daniel Soudry; | We propose to use batch augmentation: replicating instances of samples within the same batch with different data augmentations. |
806 | ARShadowGAN: Shadow Generative Adversarial Network for Augmented Reality in Single Light Scenes | Daquan Liu; Chengjiang Long; Hongpan Zhang; Hanning Yu; Xinzhi Dong; Chunxia Xiao; | To address this problem, we propose an end-to-end Generative Adversarial Network for shadow generation named ARShadowGAN for augmented reality in single light scenes. |
807 | An End-to-End Edge Aggregation Network for Moving Object Segmentation | Prashant W. Patil; Kuldeep M. Biradar; Akshay Dudhane; Subrahmanyam Murala; | In this paper, the inherent correlation learning-based edge extraction mechanism (EEM) and dense residual block (DRB) are proposed for the discriminative foreground representation. |
808 | Learning Video Stabilization Using Optical Flow | Jiyang Yu; Ravi Ramamoorthi; | We propose a novel neural network that infers the per-pixel warp fields for video stabilization from the optical flow fields of the input video. |
809 | Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation | Runfa Chen; Wenbing Huang; Binghui Huang; Fuchun Sun; Bin Fang; | To tackle this issue, we develop a decoupled training strategy by which the encoder is only trained when maximizing the adversary loss and is kept frozen otherwise. |
810 | Robust Design of Deep Neural Networks Against Adversarial Attacks Based on Lyapunov Theory | Arash Rahnama; Andre T. Nguyen; Edward Raff; | In this work, we take a control theoretic approach to the problem of robustness in DNNs. |
811 | StarGAN v2: Diverse Image Synthesis for Multiple Domains | Yunjey Choi; Youngjung Uh; Jaejun Yoo; Jung-Woo Ha; | We propose StarGAN v2, a single framework that tackles both diversity of generated images and scalability over multiple domains, and shows significantly improved results over the baselines. |
812 | Warping Residual Based Image Stitching for Large Parallax | Kyu-Yul Lee; Jae-Young Sim; | In this paper, we propose an image stitching algorithm robust to large parallax based on the novel concept of warping residuals. |
813 | A U-Net Based Discriminator for Generative Adversarial Networks | Edgar Schonfeld; Bernt Schiele; Anna Khoreva; | To target this issue we propose an alternative U-Net based discriminator architecture, borrowing the insights from the segmentation literature. |
814 | Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping | Ran Yi; Yong-Jin Liu; Yu-Kun Lai; Paul L. Rosin; | To address this problem, we propose a novel asymmetric cycle mapping that enforces the reconstruction information to be visible (by a truncation loss) and only embedded in selective facial regions (by a relaxed forward cycle-consistency loss). |
815 | When to Use Convolutional Neural Networks for Inverse Problems | Nathaniel Chodosh; Simon Lucey; | In this work we argue that for some types of inverse problems the CNN approximation breaks down, leading to poor performance. |
816 | LUVLi Face Alignment: Estimating Landmarks’ Location, Uncertainty, and Visibility Likelihood | Abhinav Kumar; Tim K. Marks; Wenxuan Mou; Ye Wang; Michael Jones; Anoop Cherian; Toshiaki Koike-Akino; Xiaoming Liu; Chen Feng; | In this paper, we present a novel framework for jointly predicting landmark locations, associated uncertainties of these predicted locations, and landmark visibilities. |
817 | Affinity Graph Supervision for Visual Recognition | Chu Wang; Babak Samari; Vladimir G. Kim; Siddhartha Chaudhuri; Kaleem Siddiqi; | Here we propose a principled method to directly supervise the learning of weights in affinity graphs, to exploit meaningful connections between entities in the data source. |
818 | Unsupervised Magnification of Posture Deviations Across Subjects | Michael Dorkenwald; Uta Buchler; Bjorn Ommer; | We present an approach to unsupervised magnification of posture differences across individuals despite large deviations in appearance. |
819 | Accurate Estimation of Body Height From a Single Depth Image via a Four-Stage Developing Network | Fukun Yin; Shizhe Zhou; | In this paper we address the problem of accurately estimating the height of a person with arbitrary postures from a single depth image. |
820 | Fast Soft Color Segmentation | Naofumi Akimoto; Huachun Zhu; Yanghua Jin; Yoshimitsu Aoki; | To address this issue, we propose a neural network based method for this task that decomposes a given image into multiple layers in a single forward pass. |
821 | Global Optimality for Point Set Registration Using Semidefinite Programming | Jose Pedro Iglesias; Carl Olsson; Fredrik Kahl; | In this paper we present a study of global optimality conditions for Point Set Registration (PSR) with missing data. |
822 | Image2StyleGAN++: How to Edit the Embedded Images? | Rameen Abdal; Yipeng Qin; Peter Wonka; | We propose Image2StyleGAN++, a flexible image editing framework with many applications. |
823 | SQE: a Self Quality Evaluation Metric for Parameters Optimization in Multi-Object Tracking | Yanru Huang; Feiyu Zhu; Zheni Zeng; Xi Qiu; Yuan Shen; Jianan Wu; | We present SQE, a novel self quality evaluation metric for parameter optimization in the challenging yet critical multi-object tracking task. |
824 | EventSR: From Asynchronous Events to Image Reconstruction, Restoration, and Super-Resolution via End-to-End Adversarial Learning | Lin Wang; Tae-Kyun Kim; Kuk-Jin Yoon; | To tackle these challenges, we propose EventSR, a novel end-to-end pipeline that reconstructs LR images from event streams, enhances their quality, and upsamples the enhanced images. |
825 | Hierarchical Pyramid Diverse Attention Networks for Face Recognition | Qiangchang Wang; Tianyi Wu; He Zheng; Guodong Guo; | In this work, we propose a hierarchical pyramid diverse attention (HPDA) network. |
826 | RGBD-Dog: Predicting Canine Pose from RGBD Sensors | Sinead Kearney; Wenbin Li; Martin Parsons; Kwang In Kim; Darren Cosker; | In our work, we focus on the problem of 3D canine pose estimation from RGBD images, recording a diverse range of dog breeds with several Microsoft Kinect v2s, simultaneously obtaining the 3D ground truth skeleton via a motion capture system. |
827 | Multi-Scale Progressive Fusion Network for Single Image Deraining | Kui Jiang; Zhongyuan Wang; Peng Yi; Chen Chen; Baojin Huang; Yimin Luo; Jiayi Ma; Junjun Jiang; | In this work, we explore the multi-scale collaborative representation for rain streaks from the perspective of input image scales and hierarchical deep features in a unified framework, termed multi-scale progressive fusion network (MSPFN) for single image rain streak removal. |
828 | Learning a Neural 3D Texture Space From 2D Exemplars | Philipp Henzler; Niloy J. Mitra; Tobias Ritschel; | We propose a generative model of 2D and 3D natural textures that offers diversity, visual fidelity, and high computational efficiency. |
829 | BachGAN: High-Resolution Image Synthesis From Salient Object Layout | Yandong Li; Yu Cheng; Zhe Gan; Licheng Yu; Liqiang Wang; Jingjing Liu; | We propose a new task aimed at more practical applications of image generation: high-quality image synthesis from salient object layout. |
830 | Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy | Jaejun Yoo; Namhyuk Ahn; Kyung-Ah Sohn; | In this paper, we provide a comprehensive analysis of the existing augmentation methods applied to the super-resolution task. |
831 | On Positive-Unlabeled Classification in GAN | Tianyu Guo; Chang Xu; Jiajun Huang; Yunhe Wang; Boxin Shi; Chao Xu; Dacheng Tao; | This paper defines a positive and unlabeled classification problem for standard GANs, which then leads to a novel technique to stabilize the training of the discriminator in GANs. |
832 | DoveNet: Deep Image Harmonization via Domain Verification | Wenyan Cong; Jianfu Zhang; Li Niu; Liu Liu; Zhixin Ling; Weiyuan Li; Liqing Zhang; | In this work, we contribute an image harmonization dataset, iHarmony4, by generating synthesized composite images based on the COCO (resp., Adobe5k, Flickr, day2night) dataset, yielding our HCOCO (resp., HAdobe5k, HFlickr, Hday2night) sub-dataset. |
833 | Noise Robust Generative Adversarial Networks | Takuhiro Kaneko; Tatsuya Harada; | As an alternative, we propose a novel family of GANs called noise robust GANs (NR-GANs), which can learn a clean image generator even when training images are noisy. |
834 | Normalizing Flows With Multi-Scale Autoregressive Priors | Apratim Bhattacharyya; Shweta Mahajan; Mario Fritz; Bernt Schiele; Stefan Roth; | In this work, we improve the representational power of flow-based models by introducing channel-wise dependencies in their latent space through multi-scale autoregressive priors (mAR). |
835 | Robust Reference-Based Super-Resolution With Similarity-Aware Deformable Convolution | Gyumin Shim; Jinsun Park; In So Kweon; | In this paper, we propose a novel and efficient reference feature extraction module referred to as the Similarity Search and Extraction Network (SSEN) for reference-based super-resolution (RefSR) tasks. |
836 | Painting Many Pasts: Synthesizing Time Lapse Videos of Paintings | Amy Zhao; Guha Balakrishnan; Kathleen M. Lewis; Fredo Durand; John V. Guttag; Adrian V. Dalca; | We present a probabilistic model that, given a single image of a completed painting, recurrently synthesizes steps of the painting process. |
837 | GeoDA: A Geometric Framework for Black-Box Adversarial Attacks | Ali Rahmati; Seyed-Mohsen Moosavi-Dezfooli; Pascal Frossard; Huaiyu Dai; | We propose a geometric framework to generate adversarial examples in one of the most challenging black-box settings, where the adversary can make only a small number of queries, each returning the top-1 label of the classifier. |
838 | GAMIN: Generative Adversarial Multiple Imputation Network for Highly Missing Data | Seongwook Yoon; Sanghoon Sull; | We propose a novel imputation method for highly missing data. |
839 | An Internal Covariate Shift Bounding Algorithm for Deep Neural Networks by Unitizing Layers’ Outputs | You Huang; Yuanlong Yu; | This paper proposes a measure for internal covariate shift (ICS) based on the Earth Mover's (EM) distance, and then derives upper and lower bounds for the measure to provide a theoretical analysis of batch normalization (BN). |
840 | A Unified Optimization Framework for Low-Rank Inducing Penalties | Marcus Valtonen Ornhag; Carl Olsson; | In this paper we study the convex envelopes of a new class of functions. |
841 | Single-Side Domain Generalization for Face Anti-Spoofing | Yunpei Jia; Jie Zhang; Shiguang Shan; Xilin Chen; | In this work, we propose an end-to-end single-side domain generalization framework (SSDG) to improve the generalization ability of face anti-spoofing. |
842 | The Knowledge Within: Methods for Data-Free Model Compression | Matan Haroush; Itay Hubara; Elad Hoffer; Daniel Soudry; | We present three methods for generating synthetic samples from trained models, and then demonstrate how these samples can be used to calibrate and fine-tune quantized models without using any real data in the process. |
843 | Scale-Space Flow for End-to-End Optimized Video Compression | Eirikur Agustsson; David Minnen; Nick Johnston; Johannes Balle; Sung Jin Hwang; George Toderici; | In this paper, we show that a generalized warping operator that better handles common failure cases, e.g. disocclusions and fast motion, can provide competitive compression results with a greatly simplified model and training procedure. |
844 | Dynamic Neural Relational Inference | Colin Graber; Alexander G. Schwing; | In response to this, we develop Dynamic Neural Relational Inference (dNRI), which incorporates insights from sequential latent variable models to predict separate relation graphs for every time-step. |
845 | Real-Time Panoptic Segmentation From Dense Detections | Rui Hou; Jie Li; Arjun Bhargava; Allan Raventos; Vitor Guizilini; Chao Fang; Jerome Lynch; Adrien Gaidon; | In this paper, we propose a new single-shot panoptic segmentation network that leverages dense detections and a global self-attention mechanism to operate in real-time with performance approaching the state of the art. |
846 | Deep Snake for Real-Time Instance Segmentation | Sida Peng; Wen Jiang; Huaijin Pi; Xiuli Li; Hujun Bao; Xiaowei Zhou; | This paper introduces a novel contour-based approach named deep snake for real-time instance segmentation. |
847 | AdaCoSeg: Adaptive Shape Co-Segmentation With Group Consistency Loss | Chenyang Zhu; Kai Xu; Siddhartha Chaudhuri; Li Yi; Leonidas J. Guibas; Hao Zhang; | We introduce AdaCoSeg, a deep neural network architecture for adaptive co-segmentation of a set of 3D shapes represented as point clouds. |
848 | Learning Dynamic Routing for Semantic Segmentation | Yanwei Li; Lin Song; Yukang Chen; Zeming Li; Xiangyu Zhang; Xingang Wang; Jian Sun; | This paper studies dynamic routing, a conceptually new method to alleviate scale variance in semantic representation. |
849 | Boosting Semantic Human Matting With Coarse Annotations | Jinlin Liu; Yuan Yao; Wendi Hou; Miaomiao Cui; Xuansong Xie; Changshui Zhang; Xian-Sheng Hua; | In this paper, we propose to leverage coarsely annotated data together with finely annotated data to boost end-to-end semantic human matting without trimaps as extra input. |
850 | BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation | Hao Chen; Kunyang Sun; Zhi Tian; Chunhua Shen; Yongming Huang; Youliang Yan; | Our main contribution is a blender module which draws inspiration from both top-down and bottom-up instance segmentation approaches. |
851 | UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders | Jing Zhang; Deng-Ping Fan; Yuchao Dai; Saeed Anwar; Fatemeh Sadat Saleh; Tong Zhang; Nick Barnes; | In this paper, we propose the first framework (UCNet) to employ uncertainty for RGB-D saliency detection by learning from the data labeling process. |
852 | Deep Geometric Functional Maps: Robust Feature Learning for Shape Correspondence | Nicolas Donati; Abhishek Sharma; Maks Ovsjanikov; | We present a novel learning-based approach for computing correspondences between non-rigid 3D shapes. |
853 | Deep Polarization Cues for Transparent Object Segmentation | Agastya Kalra; Vage Taamazyan; Supreeth Krishna Rao; Kartik Venkataraman; Ramesh Raskar; Achuta Kadambi; | This paper reframes the problem of transparent object segmentation into the realm of light polarization, i.e., the rotation of light waves. |
854 | DualConvMesh-Net: Joint Geodesic and Euclidean Convolutions on 3D Meshes | Jonas Schult; Francis Engelmann; Theodora Kontogianni; Bastian Leibe; | We propose DualConvMesh-Nets (DCM-Net), a family of deep hierarchical convolutional networks over 3D geometric data that combine two types of convolutions. |
855 | F-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation | Konstantin Sofiiuk; Ilia Petrov; Olga Barinova; Anton Konushin; | We propose f-BRS (feature backpropagating refinement scheme), which solves an optimization problem with respect to auxiliary variables instead of the network inputs and requires forward and backward passes through only a small part of the network. |
856 | Approximating Shapes in Images With Low-Complexity Polygons | Muxingzi Li; Florent Lafarge; Renaud Marlet; | We present an algorithm for extracting and vectorizing objects in images with polygons. |
857 | Towards Visually Explaining Variational Autoencoders | Wenqian Liu; Runze Li; Meng Zheng; Srikrishna Karanam; Ziyan Wu; Bir Bhanu; Richard J. Radke; Octavia Camps; | In this work, we take a step towards bridging this crucial gap, proposing the first technique to visually explain VAEs by means of gradient-based attention. |
858 | Towards Global Explanations of Convolutional Neural Networks With Concept Attribution | Weibin Wu; Yuxin Su; Xixian Chen; Shenglin Zhao; Irwin King; Michael R. Lyu; Yu-Wing Tai; | To overcome such drawbacks, we propose a novel two-stage framework, Attacking for Interpretability (AfI), which explains model decisions in terms of the importance of user-defined concepts. |
859 | Interpretable and Accurate Fine-grained Recognition via Region Grouping | Zixuan Huang; Yin Li; | We present an interpretable deep model for fine-grained visual recognition. |
860 | SAM: The Sensitivity of Attribution Methods to Hyperparameters | Naman Bansal; Chirag Agarwal; Anh Nguyen; | In this paper, we provide a thorough empirical study on the sensitivity of existing attribution methods. |
861 | High-Frequency Component Helps Explain the Generalization of Convolutional Neural Networks | Haohan Wang; Xindi Wu; Zeyi Huang; Eric P. Xing; | We investigate the relationship between the frequency spectrum of image data and the generalization behavior of convolutional neural networks (CNN). |
862 | FALCON: A Fourier Transform Based Approach for Fast and Secure Convolutional Neural Network Predictions | Shaohua Li; Kaiping Xue; Bin Zhu; Chenkai Ding; Xindi Gao; David Wei; Tao Wan; | In this paper, we focus on the scenario where clients want to classify private images with a convolutional neural network model hosted in the server, while both parties keep their data private. |
863 | Dreaming to Distill: Data-Free Knowledge Transfer via DeepInversion | Hongxu Yin; Pavlo Molchanov; Jose M. Alvarez; Zhizhong Li; Arun Mallya; Derek Hoiem; Niraj K. Jha; Jan Kautz; | We introduce DeepInversion, a new method for synthesizing images from the image distribution used to train a deep neural network. |
864 | Unsupervised Domain Adaptation via Structurally Regularized Deep Clustering | Hui Tang; Ke Chen; Kui Jia; | To alleviate this risk, we are motivated by the assumption of structural domain similarity, and propose to directly uncover the intrinsic target discrimination via discriminative clustering of target data. |
865 | HyperSTAR: Task-Aware Hyperparameters for Deep Networks | Gaurav Mittal; Chang Liu; Nikolaos Karianakis; Victor Fragoso; Mei Chen; Yun Fu; | To reduce HPO time, we present HyperSTAR (System for Task Aware Hyperparameter Recommendation), a task-aware method to warm-start HPO for deep neural networks. |
866 | ActBERT: Learning Global-Local Video-Text Representations | Linchao Zhu; Yi Yang; | In this paper, we introduce ActBERT for self-supervised learning of joint video-text representations from unlabeled data. |
867 | State-Relabeling Adversarial Active Learning | Beichen Zhang; Liang Li; Shijie Yang; Shuhui Wang; Zheng-Jun Zha; Qingming Huang; | In this paper, we propose a state relabeling adversarial active learning model (SRAAL) that leverages both the annotation and the labeled/unlabeled state information to identify the most informative unlabeled samples. |
868 | Erasing Integrated Learning: A Simple Yet Effective Approach for Weakly Supervised Object Localization | Jinjie Mai; Meng Yang; Wenfeng Luo; | To remedy this, we propose a simple yet powerful approach by introducing a novel adversarial erasing technique, erasing integrated learning (EIL). |
869 | A Shared Multi-Attention Framework for Multi-Label Zero-Shot Learning | Dat Huynh; Ehsan Elhamifar; | In this work, we develop a shared multi-attention model for multi-label zero-shot learning. |
870 | Self-Supervised Learning of Interpretable Keypoints From Unlabelled Videos | Tomas Jakab; Ankush Gupta; Hakan Bilen; Andrea Vedaldi; | We propose a new method for recognizing the pose of objects from a single image that, for learning, uses only unlabelled videos and a weak empirical prior on the object poses. |
871 | Few-Shot Open-Set Recognition Using Meta-Learning | Bo Liu; Hao Kang; Haoxiang Li; Gang Hua; Nuno Vasconcelos; | The proposed approach combines the random selection of a set of novel classes per episode, a loss that maximizes the posterior entropy for examples of those classes, and a new metric learning formulation based on the Mahalanobis distance. |
872 | Few-Shot Learning via Embedding Adaptation With Set-to-Set Functions | Han-Jia Ye; Hexiang Hu; De-Chuan Zhan; Fei Sha; | In this paper, we propose a novel approach that adapts the instance embeddings to the target classification task with a set-to-set function, yielding embeddings that are task-specific and discriminative. |
873 | Temporally Distributed Networks for Fast Video Semantic Segmentation | Ping Hu; Fabian Caba; Oliver Wang; Zhe Lin; Stan Sclaroff; Federico Perazzi; | We present TDNet, a temporally distributed network designed for fast and accurate video semantic segmentation. |
874 | Benchmarking the Robustness of Semantic Segmentation Models | Christoph Kamann; Carsten Rother; | While there are recent robustness studies for full-image classification, we are the first to present an exhaustive study for semantic segmentation, based on the state-of-the-art model DeepLabv3+. |
875 | There and Back Again: Revisiting Backpropagation Saliency Methods | Sylvestre-Alvise Rebuffi; Ruth Fong; Xu Ji; Andrea Vedaldi; | In this work, we conduct a thorough analysis of backpropagation-based saliency methods and propose a single framework under which several such methods can be unified. |
876 | Deep Semantic Clustering by Partition Confidence Maximisation | Jiabo Huang; Shaogang Gong; Xiatian Zhu; | In this work, we propose to solve this problem by learning the most confident clustering solution from all the possible separations, based on the observation that assigning samples from the same semantic categories into different clusters will reduce both the intra-cluster compactness and inter-cluster diversity, i.e. lower partition confidence. |
877 | StructEdit: Learning Structural Shape Variations | Kaichun Mo; Paul Guerrero; Li Yi; Hao Su; Peter Wonka; Niloy J. Mitra; Leonidas J. Guibas; | Instead, we treat shape differences as primary objects in their own right and propose to encode them in their own latent space. |
878 | Harmonizing Transferability and Discriminability for Adapting Object Detectors | Chaoqi Chen; Zebiao Zheng; Xinghao Ding; Yue Huang; Qi Dou; | In this paper, we propose a Hierarchical Transferability Calibration Network (HTCN) that hierarchically (local-region/image/instance) calibrates the transferability of feature representations for harmonizing transferability and discriminability. |
879 | Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching | Xuhua Huang; Jiarui Xu; Yu-Wing Tai; Chi-Keung Tang; | In this paper, we introduce "tracking-by-detection" into VOS, which coherently integrates segmentation into tracking, by proposing a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism that significantly improve performance. |
880 | CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement | Ho Kei Cheng; Jihoon Chung; Yu-Wing Tai; Chi-Keung Tang; | In this paper, we propose a novel approach to address the high-resolution segmentation problem without using any high-resolution training data. |
881 | Correlating Edge, Pose With Parsing | Ziwei Zhang; Chi Su; Liang Zheng; Xiaodong Xie; | To capture such correlations, we propose a Correlation Parsing Machine (CorrPM) that employs a heterogeneous non-local block to discover the spatial affinity among feature maps from the edge, pose, and parsing branches. |
882 | VecRoad: Point-Based Iterative Graph Exploration for Road Graphs Extraction | Yong-Qiang Tan; Shang-Hua Gao; Xuan-Yi Li; Ming-Ming Cheng; Bo Ren; | To enhance the road connectivity while maintaining the precise alignment between the graph and real road, we propose a point-based iterative graph exploration scheme with segmentation-cues guidance and flexible steps. |
883 | Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation | Zeyu Wang; Klint Qinami; Ioannis Christos Karakozis; Kyle Genova; Prem Nair; Kenji Hata; Olga Russakovsky; | We highlight the shortcomings of popular adversarial training approaches for bias mitigation, propose a simple but similarly effective alternative to the inference-time Reducing Bias Amplification method of Zhao et al., and design a domain-independent training technique that outperforms all other methods. |
884 | Hierarchical Human Parsing With Typed Part-Relation Reasoning | Wenguan Wang; Hailong Zhu; Jifeng Dai; Yanwei Pang; Jianbing Shen; Ling Shao; | Focusing on this, we seek to simultaneously exploit the representational capacity of deep graph networks and the hierarchical human structures. |
885 | Compositional Convolutional Neural Networks: A Deep Architecture With Innate Robustness to Partial Occlusion | Adam Kortylewski; Ju He; Qing Liu; Alan L. Yuille; | Inspired by the success of compositional models at classifying partially occluded objects, we propose to integrate compositional models and DCNNs into a unified deep model with innate robustness to partial occlusion. |
886 | Spatial Pyramid Based Graph Reasoning for Semantic Segmentation | Xia Li; Yibo Yang; Qijie Zhao; Tiancheng Shen; Zhouchen Lin; Hong Liu; | In this paper, we apply graph convolution to the semantic segmentation task and propose an improved Laplacian. |
887 | Learning Video Object Segmentation From Unlabeled Videos | Xiankai Lu; Wenguan Wang; Jianbing Shen; Yu-Wing Tai; David J. Crandall; Steven C. H. Hoi; | We propose a new method for video object segmentation (VOS) that addresses object pattern learning from unlabeled videos, unlike most existing methods which rely heavily on extensive annotated data. |
888 | Part-Aware Context Network for Human Parsing | Xiaomei Zhang; Yingying Chen; Bingke Zhu; Jinqiao Wang; Ming Tang; | In this work, we propose a Part-aware Context Network (PCNet), a novel and effective algorithm to deal with the challenge. |
889 | SCOUT: Self-Aware Discriminant Counterfactual Explanations | Pei Wang; Nuno Vasconcelos; | A new family of discriminant explanations is introduced. These produce heatmaps that attribute high scores to image regions informative of a classifier prediction but not of a counter class. |
890 | Weakly-Supervised Semantic Segmentation via Sub-Category Exploration | Yu-Ting Chang; Qiaosong Wang; Wei-Chih Hung; Robinson Piramuthu; Yi-Hsuan Tsai; Ming-Hsuan Yang; | To force the network to attend to other parts of an object, we propose a simple yet effective approach that introduces a self-supervised task by exploiting sub-category information. |
891 | Continual Learning With Extended Kronecker-Factored Approximate Curvature | Janghyeon Lee; Hyeong Gwon Hong; Donggyu Joo; Junmo Kim; | We propose a quadratic penalty method for continual learning of neural networks that contain batch normalization (BN) layers. |
892 | Phase Consistent Ecological Domain Adaptation | Yanchao Yang; Dong Lao; Ganesh Sundaramoorthi; Stefano Soatto; | We introduce two criteria to regularize the optimization involved in learning a classifier in a domain where no annotated data are available, leveraging annotated data in a different domain, a problem known as unsupervised domain adaptation. |
893 | AD-Cluster: Augmented Discriminative Clustering for Domain Adaptive Person Re-Identification | Yunpeng Zhai; Shijian Lu; Qixiang Ye; Xuebo Shan; Jie Chen; Rongrong Ji; Yonghong Tian; | This paper presents a novel augmented discriminative clustering (AD-Cluster) technique that estimates and augments person clusters in target domains and enforces the discrimination ability of re-ID models with the augmented clusters. |
894 | 3D-MPA: Multi-Proposal Aggregation for 3D Semantic Instance Segmentation | Francis Engelmann; Martin Bokeloh; Alireza Fathi; Bastian Leibe; Matthias Niessner; | We present 3D-MPA, a method for instance segmentation on 3D point clouds. |
895 | Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision | Denis Gudovskiy; Alec Hodgkinson; Takuya Yamaguchi; Sotaro Tsukizawa; | To implement such an acquisition function, we propose a low-complexity method for feature density matching using a self-supervised Fisher kernel (FK), as well as several novel pseudo-label estimators. |
896 | Adaptive Graph Convolutional Network With Attention Graph Clustering for Co-Saliency Detection | Kaihua Zhang; Tengpeng Li; Shiwen Shen; Bo Liu; Jin Chen; Qingshan Liu; | For this task, we present a novel adaptive graph convolutional network with attention graph clustering (GCAGC). |
897 | A2dele: Adaptive and Attentive Depth Distiller for Efficient RGB-D Salient Object Detection | Yongri Piao; Zhengkun Rong; Miao Zhang; Weisong Ren; Huchuan Lu; | To tackle these two dilemmas, we propose a depth distiller (A2dele) to explore the way of using network prediction and attention as two bridges to transfer the depth knowledge from the depth stream to the RGB stream. |
898 | Deep Fair Clustering for Visual Learning | Peizhao Li; Han Zhao; Hongfu Liu; | In light of these limitations, in this paper, we propose Deep Fair Clustering (DFC) to simultaneously learn representations that are both fair and clustering-favorable. |
899 | Bidirectional Graph Reasoning Network for Panoptic Segmentation | Yangxin Wu; Gengwei Zhang; Yiming Gao; Xiajun Deng; Ke Gong; Xiaodan Liang; Liang Lin; | We introduce a Bidirectional Graph Reasoning Network (BGRNet), which incorporates graph structure into the conventional panoptic segmentation network to mine the intra-modular and inter-modular relations within and between foreground things and background stuff classes. |
900 | Exploit Clues From Views: Self-Supervised and Regularized Learning for Multiview Object Recognition | Chih-Hui Ho; Bo Liu; Tz-Ying Wu; Nuno Vasconcelos; | In this work, we investigate the problem of multiview self-supervised learning (MV-SSL), where only the image-to-object association is given. |
901 | Spherical Space Domain Adaptation With Robust Pseudo-Label Loss | Xiang Gu; Jian Sun; Zongben Xu; | In this paper, we propose a novel adversarial DA approach defined entirely in spherical feature space, in which we define a spherical classifier for label prediction and a spherical domain discriminator for discriminating domain labels. |
902 | Stochastic Classifiers for Unsupervised Domain Adaptation | Zhihe Lu; Yongxin Yang; Xiatian Zhu; Cong Liu; Yi-Zhe Song; Tao Xiang; | In this paper, we introduce a novel method called STochastic clAssifieRs (STAR) for addressing this problem. |
903 | Unsupervised Learning of Intrinsic Structural Representation Points | Nenglun Chen; Lingjie Liu; Zhiming Cui; Runnan Chen; Duygu Ceylan; Changhe Tu; Wenping Wang; | We present a simple yet interpretable unsupervised method for learning a new structural representation in the form of 3D structure points. |
904 | PolyTransform: Deep Polygon Transformer for Instance Segmentation | Justin Liang; Namdar Homayounfar; Wei-Chiu Ma; Yuwen Xiong; Rui Hu; Raquel Urtasun; | In this paper, we propose PolyTransform, a novel instance segmentation algorithm that produces precise, geometry-preserving masks by combining the strengths of prevailing segmentation approaches and modern polygon-based methods. |
905 | Interactive Two-Stream Decoder for Accurate and Fast Saliency Detection | Huajun Zhou; Xiaohua Xie; Jian-Huang Lai; Zixuan Chen; Lingxiao Yang; | In this paper, we first analyze such correlation and then propose an interactive two-stream decoder to explore multiple cues, including saliency, contour and their correlation. |
906 | Towards Better Generalization: Joint Depth-Pose Learning Without PoseNet | Wang Zhao; Shaohui Liu; Yezhi Shu; Yong-Jin Liu; | In this work, we tackle the essential problem of scale inconsistency in self-supervised joint depth-pose learning. |
907 | LT-Net: Label Transfer by Learning Reversible Voxel-Wise Correspondence for One-Shot Medical Image Segmentation | Shuxin Wang; Shilei Cao; Dong Wei; Renzhen Wang; Kai Ma; Liansheng Wang; Deyu Meng; Yefeng Zheng; | We introduce a one-shot segmentation method to alleviate the burden of manual annotation for medical images. |
908 | FGN: Fully Guided Network for Few-Shot Instance Segmentation | Zhibo Fan; Jin-Gang Yu; Zhihao Liang; Jiarong Ou; Changxin Gao; Gui-Song Xia; Yuanqing Li; | This paper presents a Fully Guided Network (FGN) for few-shot instance segmentation. |
909 | A Quantum Computational Approach to Correspondence Problems on Point Sets | Vladislav Golyanik; Christian Theobalt; | We review adiabatic quantum computing (AQC) and derive a new algorithm for correspondence problems on point sets that is suitable for execution on AQC. |
910 | Data-Efficient Semi-Supervised Learning by Reliable Edge Mining | Peibin Chen; Tao Ma; Xu Qin; Weidi Xu; Shuchang Zhou; | We propose Reliable Edge Mining (REM), which forms a reliable graph by only selecting reliable and useful edges. |
911 | NestedVAE: Isolating Common Factors via Weak Supervision | Matthew J. Vowels; Necati Cihan Camgoz; Richard Bowden; | To isolate the common factors, we combine the theory of deep latent variable models with information bottleneck theory for scenarios in which data may be naturally paired across domains and no additional supervision is required. |
912 | Progressive Adversarial Networks for Fine-Grained Domain Adaptation | Sinan Wang; Xinyang Chen; Yunbo Wang; Mingsheng Long; Jianmin Wang; | This paper presents the Progressive Adversarial Networks (PAN) to align fine-grained categories across domains with a curriculum-based adversarial learning framework. |
913 | A Disentangling Invertible Interpretation Network for Explaining Latent Representations | Patrick Esser; Robin Rombach; Bjorn Ommer; | We formulate interpretation as a translation of hidden representations onto semantic concepts that are comprehensible to the user. |
914 | Modeling the Background for Incremental Learning in Semantic Segmentation | Fabio Cermelli; Massimiliano Mancini; Samuel Rota Bulo; Elisa Ricci; Barbara Caputo; | In this work we revisit classical incremental learning methods, proposing a new distillation-based framework which explicitly accounts for this shift. |
915 | Interpreting the Latent Space of GANs for Semantic Face Editing | Yujun Shen; Jinjin Gu; Xiaoou Tang; Bolei Zhou; | In this work, we propose a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs. |
916 | Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation | Jianqiang Wan; Yang Liu; Donglai Wei; Xiang Bai; Yongchao Xu; | In this paper, we propose a fast image segmentation method based on a novel super boundary-to-pixel direction (super-BPD) and a customized segmentation algorithm with super-BPD. |
917 | Self-Learning With Rectification Strategy for Human Parsing | Tao Li; Zhiyuan Liang; Sanyuan Zhao; Jiahao Gong; Jianbing Shen; | In this paper, we solve the sample shortage problem in the human parsing task. |
918 | Hyperbolic Visual Embedding Learning for Zero-Shot Recognition | Shaoteng Liu; Jingjing Chen; Liangming Pan; Chong-Wah Ngo; Tat-Seng Chua; Yu-Gang Jiang; | This paper proposes a Hyperbolic Visual Embedding Learning Network for zero-shot recognition. |
919 | Sequential Mastery of Multiple Visual Tasks: Networks Naturally Learn to Learn and Forget to Forget | Guy Davidson; Michael C. Mozer; | We explore the behavior of a standard convolutional neural net in a continual-learning setting that introduces visual classification tasks sequentially and requires the net to master new tasks while preserving mastery of previously learned tasks. |
920 | Distilling Effective Supervision From Severe Label Noise | Zizhao Zhang; Han Zhang; Sercan O. Arik; Honglak Lee; Tomas Pfister; | We present a holistic framework to train deep neural networks in a way that is highly robust to label noise. |
921 | Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks | Aditya Golatkar; Alessandro Achille; Stefano Soatto; | We propose a method for "scrubbing" the weights clean of information about a particular set of training data. |
922 | CenterMask: Single Shot Instance Segmentation With Point Representation | Yuqing Wang; Zhaoliang Xu; Hao Shen; Baoshan Cheng; Lirong Yang; | In this paper, we propose a single-shot instance segmentation method, which is simple, fast and accurate. |
923 | Mitigating Bias in Face Recognition Using Skewness-Aware Reinforcement Learning | Mei Wang; Weihong Deng; | A reinforcement learning based race balance network (RL-RBN) is proposed. |
924 | MineGAN: Effective Knowledge Transfer From GANs to Target Domains With Few Images | Yaxing Wang; Abel Gonzalez-Garcia; David Berga; Luis Herranz; Fahad Shahbaz Khan; Joost van de Weijer; | We propose a novel knowledge transfer method for generative models based on mining the knowledge that is most beneficial to a specific target domain, either from a single or multiple pretrained GANs. |
925 | DLWL: Improving Detection for Lowshot Classes With Weakly Labelled Data | Vignesh Ramanathan; Rui Wang; Dhruv Mahajan; | Towards this end, we propose a modification to the FRCNN model to automatically infer label assignment for object proposals from weakly labelled images during training. |
926 | Unsupervised Deep Shape Descriptor With Point Distribution Learning | Yi Shi; Mengchen Xu; Shuaihang Yuan; Yi Fang; | This paper proposes a novel probabilistic framework for the learning of unsupervised deep shape descriptors with point distribution learning. |
927 | Stylization-Based Architecture for Fast Deep Exemplar Colorization | Zhongyou Xu; Tingting Wang; Faming Fang; Yun Sheng; Guixu Zhang; | To tackle these problems, we propose a deep exemplar colorization architecture inspired by the characteristics of stylization in feature extracting and blending. |
928 | Cars Can’t Fly Up in the Sky: Improving Urban-Scene Segmentation via Height-Driven Attention Networks | Sungha Choi; Joanne T. Kim; Jaegul Choo; | This paper exploits the intrinsic features of urban-scene images and proposes a general add-on module, called height-driven attention networks (HANet), for improving semantic segmentation for urban-scene images. |
929 | State-Aware Tracker for Real-Time Video Object Segmentation | Xi Chen; Zuoxin Li; Ye Yuan; Gang Yu; Jianxin Shen; Donglian Qi; | In this work, we address the task of semi-supervised video object segmentation (VOS) and explore how to make efficient use of video properties to tackle the challenge of semi-supervision. |
930 | Iteratively-Refined Interactive 3D Medical Image Segmentation With Multi-Agent Reinforcement Learning | Xuan Liao; Wenhao Li; Qisen Xu; Xiangfeng Wang; Bo Jin; Xiaoyun Zhang; Yanfeng Wang; Ya Zhang; | Here we propose to model the dynamic process of iterative interactive image segmentation as a Markov decision process (MDP) and solve it with reinforcement learning (RL). |
931 | ENSEI: Efficient Secure Inference via Frequency-Domain Homomorphic Convolution for Privacy-Preserving Visual Recognition | Song Bian; Tianchen Wang; Masayuki Hiromoto; Yiyu Shi; Takashi Sato; | In this work, we propose ENSEI, a secure inference (SI) framework based on the frequency-domain secure convolution (FDSC) protocol for the efficient execution of image inference in the encrypted domain. |
932 | Multi-Scale Interactive Network for Salient Object Detection | Youwei Pang; Xiaoqi Zhao; Lihe Zhang; Huchuan Lu; | In this paper, we propose aggregate interaction modules that integrate features from adjacent levels and introduce less noise because they use only small up-/down-sampling rates. |
933 | Interactive Multi-Label CNN Learning With Partial Labels | Dat Huynh; Ehsan Elhamifar; | We introduce a new loss function that regularizes the cross-entropy loss with a cost function that measures the smoothness of labels and features of images on the data manifold. |
934 | ViewAL: Active Learning With Viewpoint Entropy for Semantic Segmentation | Yawar Siddiqui; Julien Valentin; Matthias Niessner; | We propose ViewAL, a novel active learning strategy for semantic segmentation that exploits viewpoint consistency in multi-view datasets. |
935 | Scene-Adaptive Video Frame Interpolation via Meta-Learning | Myungsub Choi; Janghoon Choi; Sungyong Baik; Tae Hyun Kim; Kyoung Mu Lee; | In this work, we propose to adapt the model to each video by making use of additional information that is readily available at test time and yet has not been exploited in previous works. |
936 | Action Segmentation With Joint Self-Supervised Temporal Domain Adaptation | Min-Hung Chen; Baopu Li; Yingze Bao; Ghassan AlRegib; Zsolt Kira; | To reduce the discrepancy, we propose Self-Supervised Temporal Domain Adaptation (SSTDA), which contains two self-supervised auxiliary tasks (binary and sequential domain prediction) to jointly align cross-domain feature spaces embedded with local and global temporal dynamics, achieving better performance than other Domain Adaptation (DA) approaches. |
937 | Pixel Consensus Voting for Panoptic Segmentation | Haochen Wang; Ruotian Luo; Michael Maire; Greg Shakhnarovich; | The core of our approach, Pixel Consensus Voting, is a framework for instance segmentation based on the generalized Hough transform. |
938 | Minimizing Discrete Total Curvature for Image Processing | Qiuxiang Zhong; Yutong Li; Yijie Yang; Yuping Duan; | In this paper, we propose a novel curvature regularity, the total curvature (TC), by minimizing the normal curvatures along different directions. |
939 | Towards Robust Image Classification Using Sequential Attention Models | Daniel Zoran; Mike Chrzanowski; Po-Sen Huang; Sven Gowal; Alex Mott; Pushmeet Kohli; | In this paper we propose to augment a modern neural-network architecture with an attention model inspired by human perception. |
940 | Discovering Synchronized Subsets of Sequences: A Large Scale Solution | Evangelos Sariyanidi; Casey J. Zampella; Keith G. Bartley; John D. Herrington; Theodore D. Satterthwaite; Robert T. Schultz; Birkan Tunc; | We present an approximate, but highly efficient and scalable, method that represents the search space as a union of sets called epsilon-expanded clusters, one of which is theoretically guaranteed to contain the largest subset of synchronized sequences. |
941 | Going Deeper With Lean Point Networks | Eric-Tuan Le; Iasonas Kokkinos; Niloy J. Mitra; | In this work we introduce Lean Point Networks (LPNs) to train deeper and more accurate point processing networks by relying on three novel point processing blocks that improve memory consumption, inference time, and accuracy: a convolution-type block for point sets that blends neighborhood information in a memory-efficient manner; a crosslink block that efficiently shares information across low- and high-resolution processing branches; and a multi-resolution point cloud processing block for faster diffusion of information. |
942 | Efficient and Robust Shape Correspondence via Sparsity-Enforced Quadratic Assignment | Rui Xiang; Rongjie Lai; Hongkai Zhao; | In this work, we introduce a novel local pairwise descriptor and then develop a simple, effective iterative method to solve the resulting quadratic assignment through sparsity control for shape correspondence between two approximately isometric surfaces. |
943 | Explainable Object-Induced Action Decision for Autonomous Vehicles | Yiran Xu; Xiaoyin Yang; Lihang Gong; Hsuan-Chu Lin; Tz-Ying Wu; Yunsheng Li; Nuno Vasconcelos; | A new paradigm is proposed for autonomous driving. The new paradigm lies between the end-to-end and pipelined approaches, and is inspired by how humans solve the problem. |
944 | Spatially Attentive Output Layer for Image Classification | Ildoo Kim; Woonhyuk Baek; Sungwoong Kim; | In this paper, we propose a novel spatial output layer on top of the existing convolutional feature maps to explicitly exploit the location-specific output information. |
945 | Attack to Explain Deep Representation | Mohammad A. A. K. Jalwana; Naveed Akhtar; Mohammed Bennamoun; Ajmal Mian; | This paper counter-argues and proposes the first attack on deep learning that aims at explaining the learned representation instead of fooling it. |
946 | Computing Valid P-Values for Image Segmentation by Selective Inference | Kosuke Tanizaki; Noriaki Hashimoto; Yu Inatsu; Hidekata Hontani; Ichiro Takeuchi; | To overcome this difficulty, we introduce a statistical approach called selective inference, and develop a framework for computing valid p-values in which segmentation bias is properly accounted for. |
947 | Unsupervised Learning From Video With Deep Neural Embeddings | Chengxu Zhuang; Tianwei She; Alex Andonian; Max Sobol Mark; Daniel Yamins; | Here we present the Video Instance Embedding (VIE) framework, which trains deep nonlinear embeddings on video sequence inputs. |
948 | Partial Weight Adaptation for Robust DNN Inference | Xiufeng Xie; Kyu-Han Kim; | We present GearNN, an adaptive inference architecture that accommodates DNN inputs with varying distortions. |
949 | Probability Weighted Compact Feature for Domain Adaptive Retrieval | Fuxiang Huang; Lei Zhang; Yang Yang; Xichuan Zhou; | In this paper, motivated by practical applications, we focus on the challenging problem of cross-domain retrieval. |
950 | Where Does It End? – Reasoning About Hidden Surfaces by Object Intersection Constraints | Michael Strecke; Jorg Stuckler; | In this paper we propose Co-Section, an optimization-based approach to 3D dynamic scene reconstruction, which infers hidden shape information from intersection constraints. |
951 | PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation | Yang Zhang; Zixiang Zhou; Philip David; Xiangyu Yue; Zerong Xi; Boqing Gong; Hassan Foroosh; | The combination of the aforementioned challenges motivates us to propose a new LiDAR-specific, KNN-free segmentation algorithm – PolarNet. |
952 | Pathological Retinal Region Segmentation From OCT Images Using Geometric Relation Based Augmentation | Dwarikanath Mahapatra; Behzad Bozorgtabar; Ling Shao; | We propose improvements over previous GAN-based medical image synthesis methods by jointly encoding the intrinsic relationship of geometry and shape. |
953 | Transferring and Regularizing Prediction for Semantic Segmentation | Yiheng Zhang; Zhaofan Qiu; Ting Yao; Chong-Wah Ngo; Dong Liu; Tao Mei; | In this paper, we take a novel approach of exploiting the intrinsic properties of semantic segmentation to alleviate this problem in model transfer. |
954 | PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition | Kun Su; Xiulong Liu; Eli Shlizerman; | We propose a novel system for unsupervised skeleton-based action recognition. |
955 | Model Adaptation: Unsupervised Domain Adaptation Without Source Data | Rui Li; Qianfen Jiao; Wenming Cao; Hau-San Wong; Si Wu; | In this paper, we investigate a challenging unsupervised domain adaptation setting — unsupervised model adaptation. |
956 | Evade Deep Image Retrieval by Stashing Private Images in the Hash Space | Yanru Xiao; Cong Wang; Xing Gao; | In this paper, we propose a new mechanism based on adversarial examples to "stash" private images in the deep hash space while maintaining perceptual similarity. |
957 | Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules | Jinkyu Kim; Suhong Moon; Anna Rohrbach; Trevor Darrell; John Canny; | We propose a new approach that learns vehicle control with the help of human advice. |
958 | ProAlignNet: Unsupervised Learning for Progressively Aligning Noisy Contours | VSR Veeravasarapu; Abhishek Goel; Deepak Mittal; Maneesh Singh; | This work presents a novel ConvNet, "ProAlignNet," that accounts for large scale misalignments and complex transformations between the contour shapes. |
959 | Attribution in Scale and Space | Shawn Xu; Subhashini Venugopalan; Mukund Sundararajan; | We propose a new technique called Blur Integrated Gradients (Blur IG). |
960 | Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing | Vedika Agarwal; Rakshith Shetty; Mario Fritz; | In this paper, we propose a novel way to analyze and measure the robustness of state-of-the-art models with respect to semantic visual variations, and we propose ways to make models more robust against spurious correlations. |
961 | Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection | Shi-Xue Zhang; Xiaobin Zhu; Jie-Bo Hou; Chang Liu; Chun Yang; Hongfa Wang; Xu-Cheng Yin; | In this paper, we propose a novel unified relational reasoning graph network for arbitrary shape text detection. |
962 | Large-Scale Object Detection in the Wild From Imbalanced Multi-Labels | Junran Peng; Xingyuan Bu; Ming Sun; Zhaoxiang Zhang; Tieniu Tan; Junjie Yan; | In this work, we quantitatively analyze these label problems and provide a simple but effective solution. |
963 | BBN: Bilateral-Branch Network With Cumulative Learning for Long-Tailed Visual Recognition | Boyan Zhou; Quan Cui; Xiu-Shen Wei; Zhao-Min Chen; | Therefore, we propose a unified Bilateral-Branch Network (BBN) to take care of both representation learning and classifier learning simultaneously, with each branch performing its own duty separately. |
964 | Momentum Contrast for Unsupervised Visual Representation Learning | Kaiming He; Haoqi Fan; Yuxin Wu; Saining Xie; Ross Girshick; | We present Momentum Contrast (MoCo) for unsupervised visual representation learning. |
965 | Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation | Gedas Bertasius; Lorenzo Torresani; | We introduce a method for simultaneously classifying, segmenting and tracking object instances in a video sequence. |
966 | Weakly Supervised Fine-Grained Image Classification via Guassian Mixture Model Oriented Discriminative Learning | Zhihui Wang; Shijie Wang; Shuhui Yang; Haojie Li; Jianjun Li; Zezhou Li; | In this paper, we propose an end-to-end Discriminative Feature-oriented Gaussian Mixture Model (DF-GMM), to address the problem of discriminative region diffusion and find better fine-grained details. |
967 | Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection | Shifeng Zhang; Cheng Chi; Yongqiang Yao; Zhen Lei; Stan Z. Li; | In this paper, we first point out that the essential difference between anchor-based and anchor-free detection is actually how to define positive and negative training samples, which leads to the performance gap between them. |
968 | Learning User Representations for Open Vocabulary Image Hashtag Prediction | Thibaut Durand; | In this paper, we introduce an open vocabulary model for image hashtag prediction – the task of mapping an image to its accompanying hashtags. |
969 | Sketch Less for More: On-the-Fly Fine-Grained Sketch-Based Image Retrieval | Ayan Kumar Bhunia; Yongxin Yang; Timothy M. Hospedales; Tao Xiang; Yi-Zhe Song; | In this paper, we reformulate the conventional FG-SBIR framework to tackle these challenges, with the ultimate goal of retrieving the target photo with the least number of strokes possible. |
970 | Few-Shot Pill Recognition | Suiyi Ling; Andreas Pastor; Jing Li; Zhaohui Che; Junle Wang; Jieun Kim; Patrick Le Callet; | In this study, a new pill image database, namely CURE, is first developed with more varied imaging conditions and instances for each pill category. Secondly, a W2-net is proposed for better pill segmentation. Thirdly, a Multi-Stream (MS) deep network that captures task-related features along with a novel two-stage training methodology are proposed. |
971 | PointRend: Image Segmentation As Rendering | Alexander Kirillov; Yuxin Wu; Kaiming He; Ross Girshick; | We present a new method for efficient high-quality image segmentation of objects and scenes. |
972 | ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network | Yuliang Liu; Hao Chen; Chunhua Shen; Tong He; Lianwen Jin; Liangwei Wang; | Our contributions are three-fold: 1) For the first time, we adaptively fit oriented or curved text by a parameterized Bezier curve. 2) We design a novel BezierAlign layer for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods. |
973 | Learning Temporal Co-Attention Models for Unsupervised Video Action Localization | Guoqiang Gong; Xinghan Wang; Yadong Mu; Qi Tian; | To solve action co-localization (ACL), we propose a two-step "clustering + localization" iterative procedure. |
974 | Spatiotemporal Fusion in 3D CNNs: A Probabilistic View | Yizhou Zhou; Xiaoyan Sun; Chong Luo; Zheng-Jun Zha; Wenjun Zeng; | In this paper, we propose to convert the spatiotemporal fusion strategies into a probability space, which allows us to perform network-level evaluations of various fusion strategies without having to train them separately. |
975 | Uncertainty-Aware Score Distribution Learning for Action Quality Assessment | Yansong Tang; Zanlin Ni; Jiahuan Zhou; Danyang Zhang; Jiwen Lu; Ying Wu; Jie Zhou; | To address this issue, we propose an uncertainty-aware score distribution learning (USDL) approach for action quality assessment (AQA). |
976 | Learning Interactions and Relationships Between Movie Characters | Anna Kukleva; Makarand Tapaswi; Ivan Laptev; | In this work, we propose neural models to learn and jointly predict interactions, relationships, and the pair of characters that are involved. |
977 | Video Panoptic Segmentation | Dahun Kim; Sanghyun Woo; Joon-Young Lee; In So Kweon; | In this paper, we propose and explore a new video extension of panoptic segmentation, called video panoptic segmentation. |
978 | Understanding Human Hands in Contact at Internet Scale | Dandan Shan; Jiaqi Geng; Michelle Shu; David F. Fouhey; | This paper proposes steps towards this by inferring a rich representation of hands engaged in interaction that includes: hand location, side, contact state, and a box around the object in contact. |
979 | End-to-End Learning of Visual Representations From Uncurated Instructional Videos | Antoine Miech; Jean-Baptiste Alayrac; Lucas Smaira; Ivan Laptev; Josef Sivic; Andrew Zisserman; | In this work we propose a new learning approach, MIL-NCE, capable of addressing misalignments inherent in narrated videos. |
980 | You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions | Evonne Ng; Donglai Xiang; Hanbyul Joo; Kristen Grauman; | We propose a learning-based approach to estimate the camera wearer’s 3D body pose from egocentric video sequences. |
981 | Learning a Weakly-Supervised Video Actor-Action Segmentation Model With a Wise Selection | Jie Chen; Zhiheng Li; Jiebo Luo; Chenliang Xu; | To overcome these challenges, we propose a general Weakly-Supervised framework with a Wise Selection of training samples and model evaluation criterion (WS^2). |
982 | Learning to Measure the Static Friction Coefficient in Cloth Contact | Abdullah Haroon Rasheed; Victor Romero; Florence Bertails-Descoubes; Stefanie Wuhrer; Jean-Sebastien Franco; Arnaud Lazarus; | We propose the first vision-based measurement network for friction between cloth and a substrate, using a simple and repeatable video acquisition protocol. |
983 | SpeedNet: Learning the Speediness in Videos | Sagie Benaim; Ariel Ephrat; Oran Lang; Inbar Mosseri; William T. Freeman; Michael Rubinstein; Michal Irani; Tali Dekel; | The core component in our approach is SpeedNet, a novel deep network trained to detect whether a video is playing at normal rate or is sped up. |
984 | Telling Left From Right: Learning Spatial Correspondence of Sight and Sound | Karren Yang; Bryan Russell; Justin Salamon; | We propose a novel self-supervised task to leverage an orthogonal principle: matching spatial information in the audio stream to the positions of sound sources in the visual stream. |
985 | Visual-Textual Capsule Routing for Text-Based Video Segmentation | Bruce McIntosh; Kevin Duarte; Yogesh S Rawat; Mubarak Shah; | In this work, we focus on integration of video and text for the task of actor and action video segmentation from a sentence. |
986 | Graph-Structured Referring Expression Reasoning in the Wild | Sibei Yang; Guanbin Li; Yizhou Yu; | In this paper, we propose a scene graph guided modular network (SGMN), which performs reasoning over a semantic graph and a scene graph with neural modules under the guidance of the linguistic structure of the expression. |
987 | Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs | Shizhe Chen; Qin Jin; Peng Wang; Qi Wu; | In this work, we propose the Abstract Scene Graph (ASG) structure to represent user intention at a fine-grained level and control what the generated description covers and how detailed it is. |
988 | Hierarchical Conditional Relation Networks for Video Question Answering | Thao Minh Le; Vuong Le; Svetha Venkatesh; Truyen Tran; | We introduce a general-purpose reusable neural unit called Conditional Relation Network (CRN) that serves as a building block to construct more sophisticated structures for representation and reasoning over video. |
989 | REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments | Yuankai Qi; Qi Wu; Peter Anderson; Xin Wang; William Yang Wang; Chunhua Shen; Anton van den Hengel; | In the hope that it might drive progress towards more flexible and powerful human interactions with robots, we propose a dataset of varied and complex robot tasks, described in natural language, in terms of objects visible in a large set of real images. |
990 | Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA | Ronghang Hu; Amanpreet Singh; Trevor Darrell; Marcus Rohrbach; | In this work, we propose a novel model for the TextVQA task based on a multimodal transformer architecture accompanied by a rich representation for text in images. |
991 | SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions | Ramprasaath R. Selvaraju; Purva Tendulkar; Devi Parikh; Eric Horvitz; Marco Tulio Ribeiro; Besmira Nushi; Ece Kamar; | To address this shortcoming, we propose an approach called Sub-Question Importance-aware Network Tuning (SQuINT), which encourages the model to attend to the same parts of the image when answering the reasoning question and the perception sub-question. |
992 | Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks | Fengda Zhu; Yi Zhu; Xiaojun Chang; Xiaodan Liang; | In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks to exploit the additional training signals derived from this semantic information. |
993 | Sign Language Transformers: Joint End-to-End Sign Language Recognition and Translation | Necati Cihan Camgoz; Oscar Koller; Simon Hadfield; Richard Bowden; | We introduce a novel transformer based architecture that jointly learns Continuous Sign Language Recognition and Translation while being trainable in an end-to-end manner. |
994 | Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation | Gen Luo; Yiyi Zhou; Xiaoshuai Sun; Liujuan Cao; Chenglin Wu; Cheng Deng; Rongrong Ji; | In this paper, we propose a novel Multi-task Collaborative Network (MCN) to achieve joint learning of referring expression comprehension (REC) and referring expression segmentation (RES) for the first time. |
995 | Counterfactual Vision and Language Learning | Ehsan Abbasnejad; Damien Teney; Amin Parvaneh; Javen Shi; Anton van den Hengel; | We propose a method that addresses this problem by introducing counterfactuals in the training. |
996 | Iterative Context-Aware Graph Inference for Visual Dialog | Dan Guo; Hui Wang; Hanwang Zhang; Zheng-Jun Zha; Meng Wang; | To this end, we propose a novel Context-Aware Graph (CAG) neural network. |
997 | TA-Student VQA: Multi-Agents Training by Self-Questioning | Peixi Xiong; Ying Wu; | We introduce our self-questioning model with multi-agent training: TA-student VQA. |
998 | Exploring Self-Attention for Image Recognition | Hengshuang Zhao; Jiaya Jia; Vladlen Koltun; | We explore variations of self-attention and assess their effectiveness for image recognition. |
999 | Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension | Zhenfang Chen; Peng Wang; Lin Ma; Kwan-Yee K. Wong; Qi Wu; | To bridge the gap, we propose a new dataset for visual reasoning in the context of referring expression comprehension with two main features. |
1000 | Improving Convolutional Networks With Self-Calibrated Convolutions | Jiang-Jiang Liu; Qibin Hou; Ming-Ming Cheng; Changhu Wang; Jiashi Feng; | In this paper, we consider how to improve the basic convolutional feature transformation process of CNNs without tuning the model architectures. |
1001 | Modality Shifting Attention Network for Multi-Modal Video Question Answering | Junyeong Kim; Minuk Ma; Trung Pham; Kyungsu Kim; Chang D. Yoo; | This paper considers a network referred to as Modality Shifting Attention Network (MSAN) for Multimodal Video Question Answering (MVQA) task. |
1002 | Learning to Structure an Image With Few Colors | Yunzhong Hou; Liang Zheng; Stephen Gould; | To this end, we propose a color quantization network, ColorCNN, which learns to structure the images from the classification loss in an end-to-end manner. |
1003 | On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering | Xinyu Wang; Yuliang Liu; Chunhua Shen; Chun Chet Ng; Canjie Luo; Lianwen Jin; Chee Seng Chan; Anton van den Hengel; Liangwei Wang; | We present a dataset that takes a step towards addressing this problem in that it contains questions expressed in two languages, and an evaluation process that co-opts a well understood image-based metric to reflect the method’s ability to reason. |
1004 | From Paris to Berlin: Discovering Fashion Style Influences Around the World | Ziad Al-Halah; Kristen Grauman; | We introduce an approach that detects which cities influence which other cities in terms of propagating their styles. |
1005 | A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation | Anyi Rao; Linning Xu; Yu Xiong; Guodong Xu; Qingqiu Huang; Bolei Zhou; Dahua Lin; | Towards this goal, we scale up the scene segmentation task by building a large-scale video dataset MovieScenes, which contains 21K annotated scene segments from 150 movies. We further propose a local-to-global scene segmentation framework, which integrates multi-modal information across three levels, i.e. clip, segment, and movie. |
1006 | G-TAD: Sub-Graph Localization for Temporal Action Detection | Mengmeng Xu; Chen Zhao; David S. Rojas; Ali Thabet; Bernard Ghanem; | In this work, we propose a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic context into video features and cast temporal action detection as a sub-graph localization problem. |
1007 | Detailed 2D-3D Joint Representation for Human-Object Interaction | Yong-Lu Li; Xinpeng Liu; Han Lu; Shiyi Wang; Junqi Liu; Jiefeng Li; Cewu Lu; | In light of these, we propose a detailed 2D-3D joint representation learning method. To better evaluate the 2D ambiguity processing capacity of models, we propose a new benchmark named Ambiguous-HOI consisting of hard ambiguous images. |
1008 | One-Shot Adversarial Attacks on Visual Tracking With Dual Attention | Xuesong Chen; Xiyu Yan; Feng Zheng; Yong Jiang; Shu-Tao Xia; Yong Zhao; Rongrong Ji; | In this paper, we propose a novel one-shot adversarial attack method to generate adversarial examples for model-free single object tracking, where merely adding slight perturbations on the target patch in the initial frame causes state-of-the-art trackers to lose the target in subsequent frames. |
1009 | Rethinking Classification and Localization for Object Detection | Yue Wu; Yinpeng Chen; Lu Yuan; Zicheng Liu; Lijuan Wang; Hongzhi Li; Yun Fu; | Based upon these findings, we propose a Double-Head method, which has a fully connected head focusing on classification and a convolution head for bounding box regression. |
1010 | Correspondence Networks With Adaptive Neighbourhood Consensus | Shuda Li; Kai Han; Theo W. Costain; Henry Howard-Jenkins; Victor Prisacariu; | In this paper, we tackle the task of establishing dense visual correspondences between images containing objects of the same category. |
1011 | Multiple Anchor Learning for Visual Object Detection | Wei Ke; Tianliang Zhang; Zeyi Huang; Qixiang Ye; Jianzhuang Liu; Dong Huang; | In this paper, we propose a Multiple Instance Learning (MIL) approach that selects anchors and jointly optimizes the two modules of a CNN-based object detector. |
1012 | PhraseCut: Language-Based Image Segmentation in the Wild | Chenyun Wu; Zhe Lin; Scott Cohen; Trung Bui; Subhransu Maji; | We consider the problem of segmenting image regions given a natural language phrase, and study it on a novel dataset of 77,262 images and 345,486 phrase-region pairs. |
1013 | Mask Encoding for Single Shot Instance Segmentation | Rufeng Zhang; Zhi Tian; Chunhua Shen; Mingyu You; Youliang Yan; | In this work, we propose a simple single-shot instance segmentation framework, termed mask encoding based instance segmentation (MEInst). |
1014 | Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs | Jingwei Ji; Ranjay Krishna; Li Fei-Fei; Juan Carlos Niebles; | Inspired by evidence that the prototypical unit of an event is an action-object interaction, we introduce Action Genome, a representation that decomposes actions into spatio-temporal scene graphs. |
1015 | Learning Unseen Concepts via Hierarchical Decomposition and Composition | Muli Yang; Cheng Deng; Junchi Yan; Xianglong Liu; Dacheng Tao; | We propose to learn unseen concepts in a hierarchical decomposition-and-composition manner. |
1016 | Hi-CMD: Hierarchical Cross-Modality Disentanglement for Visible-Infrared Person Re-Identification | Seokeon Choi; Sumin Lee; Youngeun Kim; Taekyung Kim; Changick Kim; | To reduce both intra- and cross-modality discrepancies, we propose a Hierarchical Cross-Modality Disentanglement (Hi-CMD) method, which automatically disentangles ID-discriminative factors and ID-excluded factors from visible-thermal images. |
1017 | In Defense of Grid Features for Visual Question Answering | Huaizu Jiang; Ishan Misra; Marcus Rohrbach; Erik Learned-Miller; Xinlei Chen; | In this paper, we revisit grid features for VQA and find that they can work surprisingly well, running more than an order of magnitude faster with the same accuracy (e.g., when pre-trained in a similar fashion). |
1018 | Multi-Mutual Consistency Induced Transfer Subspace Learning for Human Motion Segmentation | Tao Zhou; Huazhu Fu; Chen Gong; Jianbing Shen; Ling Shao; Fatih Porikli; | To this end, we propose a novel multi-mutual consistency induced transfer subspace learning framework for human motion segmentation. |
1019 | Dense Regression Network for Video Grounding | Runhao Zeng; Haoming Xu; Wenbing Huang; Peihao Chen; Mingkui Tan; Chuang Gan; | The key idea of this paper is to use the distances between each frame within the ground truth and the starting (ending) frame as dense supervision to improve video grounding accuracy. |
1020 | Neural Architecture Search for Lightweight Non-Local Networks | Yingwei Li; Xiaojie Jin; Jieru Mei; Xiaochen Lian; Linjie Yang; Cihang Xie; Qihang Yu; Yuyin Zhou; Song Bai; Alan L. Yuille; | We propose AutoNL to overcome the above two obstacles. |
1021 | Learning Saliency Propagation for Semi-Supervised Instance Segmentation | Yanzhao Zhou; Xin Wang; Jianbin Jiao; Trevor Darrell; Fisher Yu; | We propose ShapeProp, which learns to activate the salient regions within the object detection and propagate the areas to the whole instance through an iterative learnable message passing module. |
1022 | Speech2Action: Cross-Modal Supervision for Action Recognition | Arsha Nagrani; Chen Sun; David Ross; Rahul Sukthankar; Cordelia Schmid; Andrew Zisserman; | In this work we investigate the link between spoken words and actions in movies. |
1023 | Normalized and Geometry-Aware Self-Attention Network for Image Captioning | Longteng Guo; Jing Liu; Xinxin Zhu; Peng Yao; Shichen Lu; Hanqing Lu; | In this paper, we improve self-attention (SA) in two respects to promote the performance of image captioning. |
1024 | Memory Enhanced Global-Local Aggregation for Video Object Detection | Yihong Chen; Yue Cao; Han Hu; Liwei Wang; | In this paper we introduce the memory enhanced global-local aggregation (MEGA) network, one of the first attempts to take full consideration of both global and local information. |
1025 | Solving Mixed-Modal Jigsaw Puzzle for Fine-Grained Sketch-Based Image Retrieval | Kaiyue Pang; Yongxin Yang; Timothy M. Hospedales; Tao Xiang; Yi-Zhe Song; | In this paper, we propose a self-supervised alternative for representation pre-training. |
1026 | LG-GAN: Label Guided Adversarial Network for Flexible Targeted Attack of Point Cloud Based Deep Networks | Hang Zhou; Dongdong Chen; Jing Liao; Kejiang Chen; Xiaoyi Dong; Kunlin Liu; Weiming Zhang; Gang Hua; Nenghai Yu; | To overcome these shortcomings, this paper proposes a novel label guided adversarial network (LG-GAN) for real-time flexible targeted point cloud attack. |
1027 | Memory Aggregation Networks for Efficient Interactive Video Object Segmentation | Jiaxu Miao; Yunchao Wei; Yi Yang; | In this work, we propose a unified framework, named Memory Aggregation Networks (MA-Net), to address the challenging iVOS in a more efficient way. |
1028 | VQA With No Questions-Answers Training | Ben-Zion Vatashsky; Shimon Ullman; | We propose a novel method that consists of two main parts: generating a question graph representation, and an answering procedure, guided by the abstract structure of the question graph to invoke an extendable set of visual estimators. |
1029 | Counting Out Time: Class Agnostic Video Repetition Counting in the Wild | Debidatta Dwibedi; Yusuf Aytar; Jonathan Tompson; Pierre Sermanet; Andrew Zisserman; | We present an approach for estimating the period with which an action is repeated in a video. |
1030 | SaccadeNet: A Fast and Accurate Object Detector | Shiyi Lan; Zhou Ren; Yi Wu; Larry S. Davis; Gang Hua; | In this paper, inspired by the saccade mechanism of human eye movement, we propose a fast and accurate object detector called SaccadeNet. |
1031 | Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-Based Person Re-Identification | Zhizheng Zhang; Cuiling Lan; Wenjun Zeng; Zhibo Chen; | In this paper, we propose an attentive feature aggregation module, namely Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA), to delicately aggregate spatio-temporal features into a discriminative video-level feature representation. |
1032 | Video Object Grounding Using Semantic Roles in Language Description | Arka Sadhu; Kan Chen; Ram Nevatia; | Here, we investigate the role of object relations in VOG and propose a novel framework VOGNet to encode multi-modal object relations via self-attention with relative position encoding. |
1033 | Designing Network Design Spaces | Ilija Radosavovic; Raj Prateek Kosaraju; Ross Girshick; Kaiming He; Piotr Dollar; | In this work, we present a new network design paradigm. |
1034 | 12-in-1: Multi-Task Vision and Language Representation Learning | Jiasen Lu; Vedanuj Goswami; Marcus Rohrbach; Devi Parikh; Stefan Lee; | In this work, we investigate these relationships between vision-and-language tasks by developing a large-scale, multi-task model. |
1035 | MLCVNet: Multi-Level Context VoteNet for 3D Object Detection | Qian Xie; Yu-Kun Lai; Jing Wu; Zhoutao Wang; Yiming Zhang; Kai Xu; Jun Wang; | In this paper, we address the 3D object detection task by capturing multi-level contextual information with the self-attention mechanism and multi-scale feature fusion. |
1036 | Listen to Look: Action Recognition by Previewing Audio | Ruohan Gao; Tae-Hyun Oh; Kristen Grauman; Lorenzo Torresani; | We propose a framework for efficient action recognition in untrimmed video that uses audio as a preview mechanism to eliminate both short-term and long-term visual redundancies. |
1037 | Attention Convolutional Binary Neural Tree for Fine-Grained Visual Categorization | Ruyi Ji; Longyin Wen; Libo Zhang; Dawei Du; Yanjun Wu; Chen Zhao; Xianglong Liu; Feiyue Huang; | Specifically, we incorporate convolutional operations along edges of the tree structure, and use the routing functions in each node to determine the root-to-leaf computational paths within the tree. |
1038 | Music Gesture for Visual Sound Separation | Chuang Gan; Deng Huang; Hang Zhao; Joshua B. Tenenbaum; Antonio Torralba; | To address this, we propose "Music Gesture," a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music. |
1039 | Referring Image Segmentation via Cross-Modal Progressive Comprehension | Shaofei Huang; Tianrui Hui; Si Liu; Guanbin Li; Yunchao Wei; Jizhong Han; Luoqi Liu; Bo Li; | In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address the challenging task. |
1040 | Cloth in the Wind: A Case Study of Physical Measurement Through Simulation | Tom F. H. Runia; Kirill Gavrilyuk; Cees G. M. Snoek; Arnold W. M. Smeulders; | In this paper, we propose to measure latent physical properties for cloth in the wind without ever having seen a real example before. |
1041 | The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction | Junwei Liang; Lu Jiang; Kevin Murphy; Ting Yu; Alexander Hauptmann; | This paper studies the problem of predicting the distribution over multiple possible future paths of people as they move through various visual scenes. |
1042 | CentripetalNet: Pursuing High-Quality Keypoint Pairs for Object Detection | Zhiwei Dong; Guoxuan Li; Yue Liao; Fei Wang; Pengju Ren; Chen Qian; | In this paper, we propose CentripetalNet which uses centripetal shift to pair corner keypoints from the same instance. |
1043 | PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection | Shaoshuai Shi; Chaoxu Guo; Li Jiang; Zhe Wang; Jianping Shi; Xiaogang Wang; Hongsheng Li; | We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN), for accurate 3D object detection from point clouds. |
1044 | Graph Embedded Pose Clustering for Anomaly Detection | Amir Markovitz; Gilad Sharir; Itamar Friedman; Lihi Zelnik-Manor; Shai Avidan; | We propose a new method for anomaly detection of human actions. |
1045 | Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation | Jiaming Sun; Linghao Chen; Yiming Xie; Siyu Zhang; Qinhong Jiang; Xiaowei Zhou; Hujun Bao; | In this paper, we propose a novel system named Disp R-CNN for 3D object detection from stereo images. |
1046 | Deepstrip: High-Resolution Boundary Refinement | Peng Zhou; Brian Price; Scott Cohen; Gregg Wilensky; Larry S. Davis; | In this paper, we target refining the boundaries in high resolution images given low resolution masks. |
1047 | Smoothing Adversarial Domain Attack and P-Memory Reconsolidation for Cross-Domain Person Re-Identification | Guangcong Wang; Jian-Huang Lai; Wenqi Liang; Guangrun Wang; | To reduce the gap between the source and target domains, we propose a Smoothing Adversarial Domain Attack (SADA) approach that guides the source domain images to align the target domain images by using a trained camera classifier. |
1048 | Meshed-Memory Transformer for Image Captioning | Marcella Cornia; Matteo Stefanini; Lorenzo Baraldi; Rita Cucchiara; | With the aim of filling this gap, we present M2 – a Meshed Transformer with Memory for Image Captioning. |
1049 | Learning From Noisy Anchors for One-Stage Object Detection | Hengduo Li; Zuxuan Wu; Chen Zhu; Caiming Xiong; Richard Socher; Larry S. Davis; | In this paper, we propose to mitigate noise incurred by imperfect label assignment such that the contributions of anchors are dynamically determined by a carefully constructed cleanliness score associated with each anchor. |
1050 | Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection | Zhongzheng Ren; Zhiding Yu; Xiaodong Yang; Ming-Yu Liu; Yong Jae Lee; Alexander G. Schwing; Jan Kautz; | To target these issues we develop an instance-aware and context-focused unified framework. |
1051 | Density-Based Clustering for 3D Object Detection in Point Clouds | Syeda Mariam Ahmed; Chee Meng Chew; | In this work, we introduce a novel approach for 3D object detection that is significant in two main aspects: a) a cascaded modular approach that focuses the receptive field of each module on specific points in the point cloud, for improved feature learning, and b) a class-agnostic instance segmentation module that is initiated using unsupervised clustering. |
1052 | Few-Shot Video Classification via Temporal Alignment | Kaidi Cao; Jingwei Ji; Zhangjie Cao; Chien-Yi Chang; Juan Carlos Niebles; | In this paper, we propose the Ordered Temporal Alignment Module (OTAM), a novel few-shot learning framework that can learn to classify a previously unseen video. |
1053 | Densely Connected Search Space for More Flexible Neural Architecture Search | Jiemin Fang; Yuzhu Sun; Qian Zhang; Yuan Li; Wenyu Liu; Xinggang Wang; | In this paper, we propose to search block counts and block widths by designing a densely connected search space, i.e., DenseNAS. |
1054 | Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning | Shizhe Chen; Yida Zhao; Qin Jin; Qi Wu; | To improve fine-grained video-text retrieval, we propose a Hierarchical Graph Reasoning (HGR) model, which decomposes video-text matching into global-to-local levels. |
1055 | Warp to the Future: Joint Forecasting of Features and Feature Motion | Josip Saric; Marin Orsic; Tonci Antunovic; Sacha Vrazic; Sinisa Segvic; | To handle future scenery that was not visible in the observed frames, we propose complementing F2M (feature-to-motion) forecasting with the classic F2F (feature-to-feature) approach. |
1056 | Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio | Zhengsu Chen; Jianwei Niu; Lingxi Xie; Xuefeng Liu; Longhui Wei; Qi Tian; | This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs, so that under each network configuration, one can estimate the FLOPs utilization ratio (FUR) for each layer and use it to determine whether to increase or decrease the number of channels on the layer. |
1057 | Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences | Zhu Zhang; Zhou Zhao; Yang Zhao; Qi Wang; Huasheng Liu; Lianli Gao; | In this paper, we consider a novel task, Spatio-Temporal Video Grounding for Multi-Form Sentences (STVG). |
1058 | Cross-Modal Cross-Domain Moment Alignment Network for Person Search | Ya Jing; Wei Wang; Liang Wang; Tieniu Tan; | Specifically, we propose a moment alignment network (MAN) to solve the cross-modal cross-domain person search task in this paper. |
1059 | Self-Training With Noisy Student Improves ImageNet Classification | Qizhe Xie; Minh-Thang Luong; Eduard Hovy; Quoc V. Le; | We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. |
1060 | Learning Longterm Representations for Person Re-Identification Using Radio Signals | Lijie Fan; Tianhong Li; Rongyao Fang; Rumen Hristov; Yuan Yuan; Dina Katabi; | In this paper, we introduce RF-ReID, a novel approach that harnesses radio frequency (RF) signals for long-term person ReID. |
1061 | LatentFusion: End-to-End Differentiable Reconstruction and Rendering for Unseen Object Pose Estimation | Keunhong Park; Arsalan Mousavian; Yu Xiang; Dieter Fox; | We propose a novel framework for 6D pose estimation of unseen objects. |
1062 | Learning Instance Occlusion for Panoptic Segmentation | Justin Lazarow; Kwonjoon Lee; Kunyu Shi; Zhuowen Tu; | To resolve this issue, we propose a branch that is tasked with modeling how two instance masks should overlap one another as a binary relation. |
1063 | Vision-Dialog Navigation by Exploring Cross-Modal Memory | Yi Zhu; Fengda Zhu; Zhaohuan Zhan; Bingqian Lin; Jianbin Jiao; Xiaojun Chang; Xiaodan Liang; | In this paper, we propose the Cross-modal Memory Network (CMN) for remembering and understanding the rich information relevant to historical navigation actions. |
1064 | ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks | Mohit Shridhar; Jesse Thomason; Daniel Gordon; Yonatan Bisk; Winson Han; Roozbeh Mottaghi; Luke Zettlemoyer; Dieter Fox; | We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks. |
1065 | NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing | Xin Huang; Zheng Ge; Zequn Jie; Osamu Yoshie; | To avoid such a dilemma, this paper proposes a novel Representative Region NMS (R2NMS) approach leveraging the less occluded visible parts, effectively removing the redundant boxes without bringing in many false positives. |
1066 | Visual Commonsense R-CNN | Tan Wang; Jianqiang Huang; Hanwang Zhang; Qianru Sun; | We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for high-level tasks such as captioning and VQA. |
1067 | What Deep CNNs Benefit From Global Covariance Pooling: An Optimization Perspective | Qilong Wang; Li Zhang; Banggu Wu; Dongwei Ren; Peihua Li; Wangmeng Zuo; Qinghua Hu; | In this paper, we attempt to understand, from an optimization perspective, what deep CNNs benefit from GCP. |
1068 | EfficientDet: Scalable and Efficient Object Detection | Mingxing Tan; Ruoming Pang; Quoc V. Le; | In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. |
1069 | Fast Template Matching and Update for Video Object Tracking and Segmentation | Mingjie Sun; Jimin Xiao; Eng Gee Lim; Bingfeng Zhang; Yao Zhao; | In this paper, the main task we aim to tackle is the multi-instance semi-supervised video object segmentation across a sequence of frames where only the first-frame box-level ground-truth is provided. |
1070 | Counterfactual Samples Synthesizing for Robust Visual Question Answering | Long Chen; Xin Yan; Jun Xiao; Hanwang Zhang; Shiliang Pu; Yueting Zhuang; | To this end, we propose a model-agnostic Counterfactual Samples Synthesizing (CSS) training scheme. |
1071 | Local-Global Video-Text Interactions for Temporal Grounding | Jonghwan Mun; Minsu Cho; Bohyung Han; | We tackle this problem using a novel regression-based model that learns to extract a collection of mid-level features for semantic phrases in a text query, which correspond to important semantic entities described in the query (e.g., actors, objects, and actions), and to reflect bi-modal interactions between the linguistic features of the query and the visual features of the video at multiple levels. |
1072 | Set-Constrained Viterbi for Set-Supervised Action Segmentation | Jun Li; Sinisa Todorovic; | Our first contribution is the formulation of a new set-constrained Viterbi algorithm (SCV). |
1073 | Probabilistic Video Prediction From Noisy Data With a Posterior Confidence | Yunbo Wang; Jiajun Wu; Mingsheng Long; Joshua B. Tenenbaum; | In this paper, we propose to tackle this problem with an end-to-end trainable model named Bayesian Predictive Network (BP-Net). |
1074 | Beyond Short-Term Snippet: Video Relation Detection With Spatio-Temporal Global Context | Chenchen Liu; Yang Jin; Kehan Xu; Guoqiang Gong; Yadong Mu; | To address these issues, this work proposes a novel sliding-window scheme to simultaneously predict short-term and long-term relationships. |
1075 | Visual Grounding in Video for Unsupervised Word Translation | Gunnar A. Sigurdsson; Jean-Baptiste Alayrac; Aida Nematzadeh; Lucas Smaira; Mateusz Malinowski; Joao Carreira; Phil Blunsom; Andrew Zisserman; | Our goal is to use visual grounding to improve unsupervised word mapping between languages. |
1076 | Two Causal Principles for Improving Visual Dialog | Jiaxin Qi; Yulei Niu; Jianqiang Huang; Hanwang Zhang; | This paper unravels the design tricks adopted by us, the champion team MReaL-BDAI, for the Visual Dialog Challenge 2019: two causal principles for improving Visual Dialog (VisDial). |
1077 | Spatio-Temporal Graph for Video Captioning With Knowledge Distillation | Boxiao Pan; Haoye Cai; De-An Huang; Kuan-Hui Lee; Adrien Gaidon; Ehsan Adeli; Juan Carlos Niebles; | In this paper, we propose a novel spatio-temporal graph model for video captioning that exploits object interactions in space and time. |
1078 | A Real-Time Cross-Modality Correlation Filtering Method for Referring Expression Comprehension | Yue Liao; Si Liu; Guanbin Li; Fei Wang; Yanjie Chen; Chen Qian; Bo Li; | To this end, we propose a novel Realtime Cross-modality Correlation Filtering method (RCCF). |
1079 | Better Captioning With Sequence-Level Exploration | Jia Chen; Qin Jin; | In this work, we show, both theoretically and empirically, the limitation of the current sequence-level learning objective for captioning tasks. |
1080 | Violin: A Large-Scale Dataset for Video-and-Language Inference | Jingzhou Liu; Wenhu Chen; Yu Cheng; Zhe Gan; Licheng Yu; Yiming Yang; Jingjing Liu; | We introduce a new task, Video-and-Language Inference, for joint multimodal understanding of video and text. |
1081 | RiFeGAN: Rich Feature Generation for Text-to-Image Synthesis From Prior Knowledge | Jun Cheng; Fuxiang Wu; Yanling Tian; Lei Wang; Dapeng Tao; | To address this problem, we propose RiFeGAN, a novel text-to-image synthesis method that generates rich features to enrich the given description. |
1082 | Graph Structured Network for Image-Text Matching | Chunxiao Liu; Zhendong Mao; Tianzhu Zhang; Hongtao Xie; Bin Wang; Yongdong Zhang; | In this paper, we present a novel Graph Structured Matching Network (GSMN) to learn fine-grained correspondence. |
1083 | Straight to the Point: Fast-Forwarding Videos via Reinforcement Learning Using Textual Data | Washington Ramos; Michel Silva; Edson Araujo; Leandro Soriano Marcolino; Erickson Nascimento; | In this paper, we present a novel methodology based on a reinforcement learning formulation to accelerate instructional videos. |
1084 | Multi-Modality Cross Attention Network for Image and Sentence Matching | Xi Wei; Tianzhu Zhang; Yan Li; Yongdong Zhang; Feng Wu; | Unlike previous methods, in this work we propose a novel Multi-Modality Cross Attention (MMCA) Network for image and sentence matching by jointly modeling the intra-modality and inter-modality relationships of image regions and sentence words in a unified deep model. |
1085 | Generalized ODIN: Detecting Out-of-Distribution Image Without Learning From Out-of-Distribution Data | Yen-Chang Hsu; Yilin Shen; Hongxia Jin; Zsolt Kira; | We base our work on a popular method, ODIN, and propose two strategies that free it from the need to tune with OoD data while improving its OoD detection performance. |
1086 | Learning Augmentation Network via Influence Functions | Donghoon Lee; Hyunsin Park; Trung Pham; Chang D. Yoo; | This paper considers an influence function that predicts how generalization performance, in terms of validation loss, is affected by a particular augmented training sample. |
1087 | X-Linear Attention Networks for Image Captioning | Yingwei Pan; Ting Yao; Yehao Li; Tao Mei; | In this paper, we introduce a unified attention block, the X-Linear attention block, that fully employs bilinear pooling to selectively capitalize on visual information or perform multi-modal reasoning. |
1088 | Unsupervised Person Re-Identification via Multi-Label Classification | Dongkai Wang; Shiliang Zhang; | This paper formulates unsupervised person ReID as a multi-label classification task to progressively seek true labels. |
1089 | Overcoming Classifier Imbalance for Long-Tail Object Detection With Balanced Group Softmax | Yu Li; Tao Wang; Bingyi Kang; Sheng Tang; Chunfeng Wang; Jintao Li; Jiashi Feng; | In this work, we propose a novel balanced group softmax (BAGS) module for balancing the classifiers within the detection frameworks through group-wise training. |
1090 | What You See is What You Get: Exploiting Visibility for 3D Object Detection | Peiyun Hu; Jason Ziglar; David Held; Deva Ramanan; | We argue that representing 2.5D data as collections of (x,y,z) points fundamentally destroys hidden information about freespace. In this paper, we demonstrate that such knowledge can be efficiently recovered through 3D raycasting and readily incorporated into batch-based gradient learning. |
1091 | Deep Structure-Revealed Network for Texture Recognition | Wei Zhai; Yang Cao; Zheng-Jun Zha; HaiYong Xie; Feng Wu; | To address this problem, we propose a novel Deep Structure-Revealed Network (DSR-Net) that leverages spatial dependency among the captured primitives as structural representation for texture recognition. |
1092 | Online Knowledge Distillation via Collaborative Learning | Qiushan Guo; Xinjiang Wang; Yichao Wu; Zhipeng Yu; Ding Liang; Xiaolin Hu; Ping Luo; | This work presents an efficient yet effective online Knowledge Distillation method via Collaborative Learning, termed KDCL, which is able to consistently improve the generalization ability of deep neural networks (DNNs) that have different learning capacities. |
1093 | Dynamic Convolution: Attention Over Convolution Kernels | Yinpeng Chen; Xiyang Dai; Mengchen Liu; Dongdong Chen; Lu Yuan; Zicheng Liu; | To address this issue, we present Dynamic Convolution, a new design that increases model complexity without increasing the network depth or width. |
1094 | 3DSSD: Point-Based 3D Single Stage Object Detector | Zetong Yang; Yanan Sun; Shu Liu; Jiaya Jia; | In this paper, we present a lightweight point-based 3D single stage object detector 3DSSD to achieve decent balance of accuracy and efficiency. |
1095 | Deep Degradation Prior for Low-Quality Image Classification | Yang Wang; Yang Cao; Zheng-Jun Zha; Jing Zhang; Zhiwei Xiong; | To address this problem, this paper proposes a novel deep degradation prior for low-quality image classification. |
1096 | ViBE: Dressing for Diverse Body Shapes | Wei-Lin Hsiao; Kristen Grauman; | We introduce ViBE, a VIsual Body-aware Embedding that captures clothing’s affinity with different body shapes. |
1097 | Don’t Judge an Object by Its Context: Learning to Overcome Contextual Bias | Krishna Kumar Singh; Dhruv Mahajan; Kristen Grauman; Yong Jae Lee; Matt Feiszli; Deepti Ghadiyaram; | Our goal is to accurately recognize a category in the absence of its context, without compromising on performance when it co-occurs with context. |
1098 | SESS: Self-Ensembling Semi-Supervised 3D Object Detection | Na Zhao; Tat-Seng Chua; Gim Hee Lee; | Inspired by the recent success of self-ensembling technique in semi-supervised image classification task, we propose SESS, a self-ensembling semi-supervised 3D object detection framework. |
1099 | Combining Detection and Tracking for Human Pose Estimation in Videos | Manchen Wang; Joseph Tighe; Davide Modolo; | We propose a novel top-down approach that tackles the problem of multi-person human pose estimation and tracking in videos. |
1100 | SAPIEN: A SimulAted Part-Based Interactive ENvironment | Fanbo Xiang; Yuzhe Qin; Kaichun Mo; Yikuan Xia; Hao Zhu; Fangchen Liu; Minghua Liu; Hanxiao Jiang; Yifu Yuan; He Wang; Li Yi; Angel X. Chang; Leonidas J. Guibas; Hao Su; | SAPIEN enables various robotic vision and interaction tasks that require detailed part-level understanding. |
1101 | RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds | Qingyong Hu; Bo Yang; Linhai Xie; Stefano Rosa; Yulan Guo; Zhihua Wang; Niki Trigoni; Andrew Markham; | In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture to directly infer per-point semantics for large-scale point clouds. |
1102 | SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving | Zhenpei Yang; Yuning Chai; Dragomir Anguelov; Yin Zhou; Pei Sun; Dumitru Erhan; Sean Rafferty; Henrik Kretzschmar; | In this paper, we present a simple yet effective approach to generate realistic scenario sensor data, based only on a limited amount of lidar and camera data collected by an autonomous vehicle. |
1103 | A Programmatic and Semantic Approach to Explaining and Debugging Neural Network Based Object Detectors | Edward Kim; Divya Gopinath; Corina Pasareanu; Sanjit A. Seshia; | In this paper, we present a programmatic and semantic approach to explaining, understanding, and debugging the correct and incorrect behaviors of a neural network based perception system. |
1104 | Predicting Semantic Map Representations From Images Using Pyramid Occupancy Networks | Thomas Roddick; Roberto Cipolla; | In this work we present a simple, unified approach for estimating these map representations directly from monocular images using a single end-to-end deep learning architecture. |
1105 | Efficient Derivative Computation for Cumulative B-Splines on Lie Groups | Christiane Sommer; Vladyslav Usenko; David Schubert; Nikolaus Demmel; Daniel Cremers; | In this work we present an alternative derivation of time derivatives based on recurrence relations that needs O(k) instead of O(k^2) matrix operations (for a spline of order k) and results in simple and elegant expressions. |
1106 | RL-CycleGAN: Reinforcement Learning Aware Simulation-to-Real | Kanishka Rao; Chris Harris; Alex Irpan; Sergey Levine; Julian Ibarz; Mohi Khansari; | In this paper, we introduce the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image. |
1107 | LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World | Sivabalan Manivasagam; Shenlong Wang; Kelvin Wong; Wenyuan Zeng; Mikita Sazanovich; Shuhan Tan; Bin Yang; Wei-Chiu Ma; Raquel Urtasun; | We tackle the problem of producing realistic simulations of LiDAR point clouds, the sensor of choice for most self-driving vehicles. |
1108 | Just Go With the Flow: Self-Supervised Scene Flow Estimation | Himangi Mittal; Brian Okorn; David Held; | As an alternative to supervised training on annotated scene flow data, we present a method of training scene flow networks that uses two self-supervised losses, based on nearest neighbors and cycle consistency. |
1109 | TITAN: Future Forecast Using Action Priors | Srikanth Malla; Behzad Dariush; Chiho Choi; | In an attempt to address this problem, we introduce TITAN (Trajectory Inference using Targeted Action priors Network), a new model that incorporates prior positions, actions, and context to forecast future trajectory of agents and future ego-motion. |
1110 | Robust Learning Through Cross-Task Consistency | Amir R. Zamir; Alexander Sax; Nikhil Cheerla; Rohan Suri; Zhangjie Cao; Jitendra Malik; Leonidas J. Guibas; | We propose a flexible and fully computational framework for learning while enforcing Cross-Task Consistency (X-TAC). |
1111 | Dynamic Refinement Network for Oriented and Densely Packed Object Detection | Xingjia Pan; Yuqiang Ren; Kekai Sheng; Weiming Dong; Haolei Yuan; Xiaowei Guo; Chongyang Ma; Changsheng Xu; | To resolve the first two issues, we present a dynamic refinement network that consists of two novel components, i.e., a feature selection module (FSM) and a dynamic refinement head (DRH). |
1112 | AOWS: Adaptive and Optimal Network Width Search With Latency Constraints | Maxim Berman; Leonid Pishchulin; Ning Xu; Matthew B. Blaschko; Gerard Medioni; | We introduce a novel, efficient one-shot NAS approach that optimally searches for channel numbers given latency constraints on specific hardware. |
1113 | High-Dimensional Convolutional Networks for Geometric Pattern Recognition | Christopher Choy; Junha Lee; Rene Ranftl; Jaesik Park; Vladlen Koltun; | In this work, we present high-dimensional convolutional networks for geometric pattern recognition problems that arise in 2D and 3D registration problems. |
1114 | Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks | Saurabh Singh; Shankar Krishnan; | In this paper we propose the Filter Response Normalization (FRN) layer, a novel combination of a normalization and an activation function, that can be used as a replacement for other normalizations and activations. |
1115 | Deep Iterative Surface Normal Estimation | Jan Eric Lenssen; Christian Osendorfer; Jonathan Masci; | This paper presents an end-to-end differentiable algorithm for robust and detail-preserving surface normal estimation on unstructured point-clouds. |
1116 | Dataless Model Selection With the Deep Frame Potential | Calvin Murdock; Simon Lucey; | Building upon theoretical connections between deep learning and sparse approximation, we propose the deep frame potential: a measure of coherence that is approximately related to representation stability but has minimizers that depend only on network structure. |
1117 | UNAS: Differentiable Architecture Search Meets Reinforcement Learning | Arash Vahdat; Arun Mallya; Ming-Yu Liu; Jan Kautz; | In this work, we present UNAS, a unified framework for NAS, that encapsulates recent DNAS and RL-based approaches under one framework. |
1118 | Local Context Normalization: Revisiting Local Normalization | Anthony Ortiz; Caleb Robinson; Dan Morris; Olac Fuentes; Christopher Kiekintveld; Md Mahmudulla Hassan; Nebojsa Jojic; | We propose an algorithmic solution to make Local Context Normalization (LCN) efficient for arbitrary window sizes, even if every point in the image has a unique window. |
1119 | ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning | Weiwei Sun; Wei Jiang; Eduard Trulls; Andrea Tagliasacchi; Kwang Moo Yi; | In this paper, we propose Attentive Context Normalization (ACN), a simple yet effective technique to build permutation-equivariant networks robust to outliers. |
1120 | Learning Situational Driving | Eshed Ohn-Bar; Aditya Prakash; Aseem Behl; Kashyap Chitta; Andreas Geiger; | Our key idea is to learn a mixture model with a set of policies that can capture multiple driving modes. |
1121 | From Depth What Can You See? Depth Completion via Auxiliary Image Reconstruction | Kaiyue Lu; Nick Barnes; Saeed Anwar; Liang Zheng; | This paper continues this line of research and aims to overcome the above shortcomings. |
1122 | Symmetry and Group in Attribute-Object Compositions | Yong-Lu Li; Yue Xu; Xiaohan Mao; Cewu Lu; | Incorporating the symmetry principle, we build SymNet, a transformation framework inspired by group theory. |
1123 | Noise-Aware Fully Webly Supervised Object Detection | Yunhang Shen; Rongrong Ji; Zhiwei Chen; Xiaopeng Hong; Feng Zheng; Jianzhuang Liu; Mingliang Xu; Qi Tian; | In this work, we propose an end-to-end framework to jointly learn webly supervised detectors and reduce the negative impact of noisy labels. |
1124 | 3D Part Guided Image Editing for Fine-Grained Object Understanding | Zongdai Liu; Feixiang Lu; Peng Wang; Hui Miao; Liangjun Zhang; Ruigang Yang; Bin Zhou; | In this paper, we fill this important missing piece in autonomous driving by solving two critical issues. |
1125 | STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction | Zhishuai Zhang; Jiyang Gao; Junhua Mao; Yukai Liu; Dragomir Anguelov; Congcong Li; | In this work, we present a novel end-to-end two-stage network: Spatio-Temporal-Interactive Network (STINet). |
1126 | Rethinking Performance Estimation in Neural Architecture Search | Xiawu Zheng; Rongrong Ji; Qiang Wang; Qixiang Ye; Zhenguo Li; Yonghong Tian; Qi Tian; | In this paper, we provide a novel yet systematic rethinking of performance estimation (PE) in a resource-constrained regime, termed budgeted PE (BPE), which precisely and effectively estimates the performance of an architecture sampled from an architecture space. |
1127 | Feature-Metric Registration: A Fast Semi-Supervised Approach for Robust Point Cloud Registration Without Correspondences | Xiaoshui Huang; Guofeng Mei; Jian Zhang; | We present a fast feature-metric point cloud registration framework, which enforces the optimisation of registration by minimising a feature-metric projection error without correspondences. |
1128 | Learning Multi-View Camera Relocalization With Graph Neural Networks | Fei Xue; Xin Wu; Shaojun Cai; Junqiu Wang; | We propose to construct a view graph that exploits information from the entire input sequence for absolute camera pose estimation. |
1129 | MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird’s Eye View Maps | Pengxiang Wu; Siheng Chen; Dimitris N. Metaxas; | In this work, we propose an efficient deep model, called MotionNet, to jointly perform perception and motion prediction from 3D point clouds. |
1130 | EcoNAS: Finding Proxies for Economical Neural Architecture Search | Dongzhan Zhou; Xinchi Zhou; Wenwei Zhang; Chen Change Loy; Shuai Yi; Xuesen Zhang; Wanli Ouyang; | In this paper, we observe that most existing proxies exhibit different behaviors in maintaining the rank consistency among network candidates. |
1131 | Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection | Jianyuan Guo; Kai Han; Yunhe Wang; Chao Zhang; Zhaohui Yang; Han Wu; Xinghao Chen; Chang Xu; | To this end, we propose a hierarchical trinity search framework to simultaneously discover efficient architectures for all components (i.e., backbone, neck, and head) of an object detector in an end-to-end manner. |
1132 | Geometrically Principled Connections in Graph Neural Networks | Shunwang Gong; Mehdi Bahri; Michael M. Bronstein; Stefanos Zafeiriou; | In this paper, we argue geometry should remain the primary driving force behind innovation in the emerging field of geometric deep learning. |
1133 | On Vocabulary Reliance in Scene Text Recognition | Zhaoyi Wan; Jielei Zhang; Liang Zhang; Jiebo Luo; Cong Yao; | In this paper, we establish an analytical framework, in which different datasets, metrics and module combinations for quantitative comparisons are devised, to conduct an in-depth study on the problem of vocabulary reliance in scene text recognition. |
1134 | Generating Accurate Pseudo-Labels in Semi-Supervised Learning and Avoiding Overconfident Predictions via Hermite Polynomial Activations | Vishnu Suresh Lokhande; Songwong Tasneeyapant; Abhay Venkatesh; Sathya N. Ravi; Vikas Singh; | Motivated by some of these results, we explore the use of Hermite polynomial expansions as a substitute for ReLUs in deep networks. |
1135 | GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping | Hao-Shu Fang; Chenxi Wang; Minghao Gou; Cewu Lu; | In this work, we contribute a large-scale grasp pose detection dataset with a unified evaluation system. |
1136 | PFRL: Pose-Free Reinforcement Learning for 6D Pose Estimation | Jianzhun Shao; Yuhang Jiang; Gu Wang; Zhigang Li; Xiangyang Ji; | In this work, to get rid of the burden of 6D annotations, we formulate 6D pose refinement as a Markov Decision Process and apply a reinforcement learning approach that uses only 2D image annotations as weakly-supervised 6D pose information, via a carefully designed reward definition and a composite reinforced optimization method for efficient and effective policy training. |
1137 | Through Fog High-Resolution Imaging Using Millimeter Wave Radar | Junfeng Guan; Sohrab Madani; Suraj Jog; Saurabh Gupta; Haitham Hassanieh; | We introduce HawkEye, a system that leverages a cGAN architecture to recover high-frequency shapes from raw low-resolution mmWave heat-maps. |
1138 | Disentangling Physical Dynamics From Unknown Factors for Unsupervised Video Prediction | Vincent Le Guen; Nicolas Thome; | Since physics is too restrictive for describing the full visual content of generic video sequences, we introduce PhyDNet, a two-branch deep architecture, which explicitly disentangles PDE dynamics from unknown complementary information. |
1139 | D2Det: Towards High Quality Object Detection and Instance Segmentation | Jiale Cao; Hisham Cholakkal; Rao Muhammad Anwer; Fahad Shahbaz Khan; Yanwei Pang; Ling Shao; | We propose a novel two-stage detection method, D2Det, that collectively addresses both precise localization and accurate classification. |
1140 | LiDAR-Based Online 3D Video Object Detection With Graph-Based Message Passing and Spatiotemporal Transformer Attention | Junbo Yin; Jianbing Shen; Chenye Guan; Dingfu Zhou; Ruigang Yang; | In this paper, we propose an end-to-end online 3D video object detector that operates on point cloud sequences. |
1141 | Orthogonal Convolutional Neural Networks | Jiayun Wang; Yubei Chen; Rudrasis Chakraborty; Stella X. Yu; | We develop an efficient approach to impose filter orthogonality on a convolutional layer based on the doubly block-Toeplitz matrix representation of the convolutional kernel, instead of the common kernel orthogonality approach, which we show is only necessary but not sufficient for ensuring orthogonal convolutions. |
1142 | Self-Robust 3D Point Recognition via Gather-Vector Guidance | Xiaoyi Dong; Dongdong Chen; Hang Zhou; Gang Hua; Weiming Zhang; Nenghai Yu; | In this paper, we look into the problem of 3D adversarial attacks, and propose to leverage the internal properties of the point clouds and the adversarial examples to design a new self-robust 3D recognition system based on deep neural networks (DNNs). |
1143 | VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation | Jiyang Gao; Chen Sun; Hang Zhao; Yi Shen; Dragomir Anguelov; Congcong Li; Cordelia Schmid; | This paper introduces VectorNet, a hierarchical graph neural network that first exploits the spatial locality of individual road components represented by vectors and then models the high-order interactions among all components. |
1144 | ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks | Qilong Wang; Banggu Wu; Pengfei Zhu; Peihua Li; Wangmeng Zuo; Qinghua Hu; | To overcome the paradox of performance and complexity trade-off, this paper proposes an Efficient Channel Attention (ECA) module, which only involves a handful of parameters while bringing clear performance gain. |
1145 | MTL-NAS: Task-Agnostic Neural Architecture Search Towards General-Purpose Multi-Task Learning | Yuan Gao; Haoping Bai; Zequn Jie; Jiayi Ma; Kui Jia; Wei Liu; | We propose to incorporate neural architecture search (NAS) into general-purpose multi-task learning (GP-MTL). |
1146 | PnPNet: End-to-End Perception and Prediction With Tracking in the Loop | Ming Liang; Bin Yang; Wenyuan Zeng; Yun Chen; Rui Hu; Sergio Casas; Raquel Urtasun; | Towards this goal we propose PnPNet, an end-to-end model that takes as input sequential sensor data, and outputs at each time step object tracks and their future trajectories. |
1147 | Revisiting the Sibling Head in Object Detector | Guanglu Song; Yu Liu; Xiaogang Wang; | This paper provides the observation that the spatial misalignment between the two object functions in the sibling head can considerably hurt the training process, but this misalignment can be resolved by a very simple operator called task-aware spatial disentanglement (TSD). |
1148 | Visual Reaction: Learning to Play Catch With Your Drone | Kuo-Hao Zeng; Roozbeh Mottaghi; Luca Weihs; Ali Farhadi; | In this paper we address the problem of visual reaction: the task of interacting with dynamic environments where the changes in the environment are not necessarily caused by the agent itself. |
1149 | Prime Sample Attention in Object Detection | Yuhang Cao; Kai Chen; Chen Change Loy; Dahua Lin; | In this work, we revisit this paradigm through a careful study on how different samples contribute to the overall performance measured in terms of mAP. |
1150 | SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | Xianzhi Du; Tsung-Yi Lin; Pengchong Jin; Golnaz Ghiasi; Mingxing Tan; Yin Cui; Quoc V. Le; Xiaodan Song; | In this paper, we argue that the encoder-decoder architecture is ineffective in generating strong multi-scale features because of the scale-decreased backbone. |
1151 | KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects | Xingyu Liu; Rico Jonschkowski; Anelia Angelova; Kurt Konolige; | We address two problems: first, we establish an easy method for capturing and labeling 3D keypoints on desktop objects with an RGB camera; and second, we develop a deep neural network, called KeyPose, that learns to accurately predict object poses using 3D keypoints, from stereo input, and works even for transparent objects. |
1152 | SegGCN: Efficient 3D Point Cloud Segmentation With Fuzzy Spherical Kernel | Huan Lei; Naveed Akhtar; Ajmal Mian; | Inspired by this observation, we incorporate a fuzzy mechanism into discrete convolutional kernels for 3D point clouds as our first major contribution. |
1153 | nuScenes: A Multimodal Dataset for Autonomous Driving | Holger Caesar; Varun Bankiti; Alex H. Lang; Sourabh Vora; Venice Erin Liong; Qiang Xu; Anush Krishnan; Yu Pan; Giancarlo Baldan; Oscar Beijbom; | In this work we present nuTonomy scenes (nuScenes), the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars and 1 lidar, all with full 360 degree field of view. |
1154 | PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation | Yisheng He; Wei Sun; Haibin Huang; Jianran Liu; Haoqiang Fan; Jian Sun; | In this work, we present a novel data-driven method for robust 6DoF object pose estimation from a single RGBD image. |
1155 | Probabilistic Pixel-Adaptive Refinement Networks | Anne S. Wannenwetsch; Stefan Roth; | We introduce probabilistic pixel-adaptive convolutions (PPACs), which not only depend on image guidance data for filtering, but also respect the reliability of per-pixel predictions. |
1156 | Discovering Human Interactions With Novel Objects via Zero-Shot Learning | Suchen Wang; Kim-Hui Yap; Junsong Yuan; Yap-Peng Tan; | We aim to detect human interactions with novel objects through zero-shot learning. |
1157 | Equalization Loss for Long-Tailed Object Recognition | Jingru Tan; Changbao Wang; Buyu Li; Quanquan Li; Wanli Ouyang; Changqing Yin; Junjie Yan; | In this work, we analyze this problem from a novel perspective: each positive sample of one category can be seen as a negative sample for other categories, making the tail categories receive more discouraging gradients. |
1158 | Learning Depth-Guided Convolutions for Monocular 3D Object Detection | Mingyu Ding; Yuqi Huo; Hongwei Yi; Zhe Wang; Jianping Shi; Zhiwu Lu; Ping Luo; | In this work, instead of using a pseudo-LiDAR representation, we improve the fundamental 2D fully convolutional design by proposing a new local convolutional network (LCN), termed Depth-guided Dynamic-Depthwise-Dilated LCN (D4LCN), where the filters and their receptive fields can be automatically learned from image-based depth maps, so that different pixels of different images have different filters. |
1159 | Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather | Mario Bijelic; Tobias Gruber; Fahim Mannan; Florian Kraus; Werner Ritter; Klaus Dietmayer; Felix Heide; | To this end, we present a deep fusion network for robust fusion without a large corpus of labeled training data covering all asymmetric distortions. |
1160 | Don’t Even Look Once: Synthesizing Features for Zero-Shot Detection | Pengkai Zhu; Hanxiao Wang; Venkatesh Saligrama; | We propose a novel detection algorithm "Don’t Even Look Once (DELO)," that synthesizes visual features for unseen objects and augments existing training algorithms to incorporate unseen object detection. |
1161 | EPOS: Estimating 6D Pose of Objects With Symmetries | Tomas Hodan; Daniel Barath; Jiri Matas; | We present a new method for estimating the 6D pose of rigid objects with available 3D models from a single RGB input image. |
1162 | Train in Germany, Test in the USA: Making 3D Object Detectors Generalize | Yan Wang; Xiangyu Chen; Yurong You; Li Erran Li; Bharath Hariharan; Mark Campbell; Kilian Q. Weinberger; Wei-Lun Chao; | In this paper we consider the task of adapting 3D object detectors from one dataset to another. |
1163 | Exploring Categorical Regularization for Domain Adaptive Object Detection | Chang-Dong Xu; Xing-Ran Zhao; Xin Jin; Xiu-Shen Wei; | In this paper, we tackle the domain adaptive object detection problem, where the main challenge lies in significant domain gaps between source and target domains. |
1164 | Neural Implicit Embedding for Point Cloud Analysis | Kent Fujiwara; Taiichi Hashimoto; | We present a novel representation for point clouds that encapsulates the local characteristics of the underlying structure. |
1165 | Pose-Guided Visible Part Matching for Occluded Person ReID | Shang Gao; Jingya Wang; Huchuan Lu; Zimo Liu; | To address this issue, we propose a Pose-guided Visible Part Matching (PVPM) method that jointly learns the discriminative features with pose-guided attention and self-mines the part visibility in an end-to-end framework. |
1166 | ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection | Yuxin Wang; Hongtao Xie; Zheng-Jun Zha; Mengting Xing; Zilong Fu; Yongdong Zhang; | In this paper, we propose ContourNet, which effectively handles these two problems, taking a further step toward accurate arbitrary-shaped text detection. |
1167 | Exploring Data Aggregation in Policy Learning for Vision-Based Urban Autonomous Driving | Aditya Prakash; Aseem Behl; Eshed Ohn-Bar; Kashyap Chitta; Andreas Geiger; | Our two key ideas are (1) to sample critical states from the collected on-policy data based on the utility they provide to the learned policy in terms of driving behavior, and (2) to incorporate a replay buffer which progressively focuses on the high uncertainty regions of the policy’s state distribution. |
1168 | Look-Into-Object: Self-Supervised Structure Modeling for Object Recognition | Mohan Zhou; Yalong Bai; Wei Zhang; Tiejun Zhao; Tao Mei; | In this paper, we propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions into the traditional framework. |
1169 | Recognizing Objects From Any View With Object and Viewer-Centered Representations | Sainan Liu; Vincent Nguyen; Isaac Rehg; Zhuowen Tu; | In this paper, we tackle an important task in computer vision: any view object recognition. |
1170 | Gated Channel Transformation for Visual Recognition | Zongxin Yang; Linchao Zhu; Yu Wu; Yi Yang; | In this work, we propose a generally applicable transformation unit for visual recognition with deep convolutional neural networks. |
1171 | Non-Local Neural Networks With Grouped Bilinear Attentional Transforms | Lu Chi; Zehuan Yuan; Yadong Mu; Changhu Wang; | This work proposes a novel non-local operator. It is inspired by the attention mechanism of the human visual system, which can quickly attend to important local parts in sight and suppress other less-relevant information. |
1172 | Generative-Discriminative Feature Representations for Open-Set Recognition | Pramuditha Perera; Vlad I. Morariu; Rajiv Jain; Varun Manjunatha; Curtis Wigington; Vicente Ordonez; Vishal M. Patel; | We propose two techniques to force class activations of open-set samples to be low. |
1173 | RPM-Net: Robust Point Matching Using Learned Features | Zi Jian Yew; Gim Hee Lee; | In this paper, we propose RPM-Net, a deep learning-based approach for rigid point cloud registration that is less sensitive to initialization and more robust. |
1174 | Sideways: Depth-Parallel Training of Video Models | Mateusz Malinowski; Grzegorz Swirszcz; Joao Carreira; Viorica Patraucean; | We propose Sideways, an approximate backpropagation scheme for training video models. |
1175 | Basis Prediction Networks for Effective Burst Denoising With Large Kernels | Zhihao Xia; Federico Perazzi; Michael Gharbi; Kalyan Sunkavalli; Ayan Chakrabarti; | To this end, we introduce a novel basis prediction network that, given an input burst, predicts a set of global basis kernels — shared within the image — and the corresponding mixing coefficients — which are specific to individual pixels. |
1176 | Private-kNN: Practical Differential Privacy for Computer Vision | Yuqing Zhu; Xiang Yu; Manmohan Chandraker; Yu-Xiang Wang; | We propose a practically data-efficient scheme based on private release of k-nearest neighbor (kNN) queries, which altogether avoids splitting the training dataset. |
1177 | SP-NAS: Serial-to-Parallel Backbone Search for Object Detection | Chenhan Jiang; Hang Xu; Wei Zhang; Xiaodan Liang; Zhenguo Li; | In this paper, we propose a two-phase serial-to-parallel architecture search framework named SP-NAS towards a flexible task-oriented detection backbone. |
1178 | Structure Aware Single-Stage 3D Object Detection From Point Cloud | Chenhang He; Hui Zeng; Jianqiang Huang; Xian-Sheng Hua; Lei Zhang; | In this work, we propose to improve the localization precision of single-stage detectors by explicitly leveraging the structure information of 3D point cloud. |
1179 | "Looking at the Right Stuff" – Guided Semantic-Gaze for Autonomous Driving | Anwesan Pal; Sayan Mondal; Henrik I. Christensen; | We propose a novel Semantics Augmented GazE (SAGE) detection approach that captures driving-specific contextual information, in addition to the raw gaze. |
1180 | What’s Hidden in a Randomly Weighted Neural Network? | Vivek Ramanujan; Mitchell Wortsman; Aniruddha Kembhavi; Ali Farhadi; Mohammad Rastegari; | We empirically show that as randomly weighted neural networks with fixed weights grow wider and deeper, an "untrained subnetwork" approaches a network with learned weights in accuracy. |
1181 | Structured Multi-Hashing for Model Compression | Elad Eban; Yair Movshovitz-Attias; Hao Wu; Mark Sandler; Andrew Poon; Yerlan Idelbayev; Miguel A. Carreira-Perpinan; | In this work we combine ideas from weight hashing and dimensionality reduction, resulting in a simple and powerful structured multi-hashing method based on matrix products that allows direct control of the model size of any deep network and is trained end-to-end. |
1182 | DOPS: Learning to Detect 3D Objects and Predict Their 3D Shapes | Mahyar Najibi; Guangda Lai; Abhijit Kundu; Zhichao Lu; Vivek Rathod; Thomas Funkhouser; Caroline Pantofaru; David Ross; Larry S. Davis; Alireza Fathi; | We propose DOPS, a fast single-stage 3D object detection method for LIDAR data. |
1183 | AutoTrack: Towards High-Performance Visual Tracking for UAV With Automatic Spatio-Temporal Regularization | Yiming Li; Changhong Fu; Fangqiang Ding; Ziyuan Huang; Geng Lu; | In this work, a novel approach is proposed to automatically and adaptively learn a spatio-temporal regularization term online. |
1184 | GP-NAS: Gaussian Process Based Neural Architecture Search | Zhihang Li; Teng Xi; Jiankang Deng; Gang Zhang; Shengzhao Wen; Ran He; | In this paper, we aim to address three important questions in NAS: (1) How to measure the correlation between architectures and their performances? (2) How to evaluate the correlation between different architectures? (3) How to learn these correlations with a small number of samples? |
1185 | NAS-FCOS: Fast Neural Architecture Search for Object Detection | Ning Wang; Yang Gao; Hao Chen; Peng Wang; Zhi Tian; Chunhua Shen; Yanning Zhang; | Here we propose to search for the decoder structure of object detectors with search efficiency being taken into consideration. |
1186 | TCTS: A Task-Consistent Two-Stage Framework for Person Search | Cheng Wang; Bingpeng Ma; Hong Chang; Shiguang Shan; Xilin Chen; | To address the consistency problem, we introduce a Task-Consistent Two-Stage (TCTS) person search framework, which includes an identity-guided query (IDGQ) detector and a Detection Results Adapted (DRA) re-ID model. |
1187 | SCATTER: Selective Context Attentional Scene Text Recognizer | Ron Litman; Oron Anschel; Shahar Tsiper; Roee Litman; Shai Mazor; R. Manmatha; | In this paper, we introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER). |
1188 | Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation | Dengsheng Chen; Jun Li; Zheng Wang; Kai Xu; | We present a novel approach to category-level 6D object pose and size estimation. |
1189 | Hierarchical Scene Coordinate Classification and Regression for Visual Localization | Xiaotian Li; Shuzhe Wang; Yi Zhao; Jakob Verbeek; Juho Kannala; | In this work, we present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image. |
1190 | MiLeNAS: Efficient Neural Architecture Search via Mixed-Level Reformulation | Chaoyang He; Haishan Ye; Li Shen; Tong Zhang; | To remedy this, this paper proposes MiLeNAS, a mixed-level reformulation for NAS that can be optimized efficiently and reliably. |
1191 | Scalable Uncertainty for Computer Vision With Functional Variational Inference | Eduardo D. C. Carvalho; Ronald Clark; Andrea Nicastro; Paul H. J. Kelly; | By leveraging the structure of the induced covariance matrices, we propose numerically efficient algorithms which enable fast training in the context of high-dimensional tasks such as depth estimation and semantic segmentation. |
1192 | Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End | Abdelrahman Eldesokey; Michael Felsberg; Karl Holmquist; Michael Persson; | In this work, we thus focus on modeling the uncertainty of depth data in depth completion starting from the sparse noisy input all the way to the final prediction. |
1193 | Butterfly Transform: An Efficient FFT Based Neural Architecture Design | Keivan Alizadeh vahid; Anish Prabhu; Ali Farhadi; Mohammad Rastegari; | In this paper, we show that extending the butterfly operations from the FFT algorithm to a general Butterfly Transform (BFT) can be beneficial in building an efficient block structure for CNN designs. |
1194 | A Certifiably Globally Optimal Solution to Generalized Essential Matrix Estimation | Ji Zhao; Wanting Xu; Laurent Kneip; | We present a convex optimization approach for generalized essential matrix (GEM) estimation. |
1195 | MUXConv: Information Multiplexing in Convolutional Neural Networks | Zhichao Lu; Kalyanmoy Deb; Vishnu Naresh Boddeti; | To overcome this limitation, we present MUXConv, a layer that is designed to increase the flow of information by progressively multiplexing channel and spatial information in the network, while mitigating computational complexity. |
1196 | PointGMM: A Neural GMM Network for Point Clouds | Amir Hertz; Rana Hanocka; Raja Giryes; Daniel Cohen-Or; | We present PointGMM, a neural network that learns to generate hierarchical GMMs (hGMMs) which are characteristic of the shape class, and also coincide with the input point cloud. |
1197 | Noisier2Noise: Learning to Denoise From Unpaired Noisy Data | Nick Moran; Dan Schmidt; Yu Zhong; Patrick Coady; | We present a method for training a neural network to perform image denoising without access to clean training examples or access to paired noisy training examples. |
1198 | TRPLP – Trifocal Relative Pose From Lines at Points | Ricardo Fabbri; Timothy Duff; Hongyi Fan; Margaret H. Regan; David da Costa de Pinho; Elias Tsigaridas; Charles W. Wampler; Jonathan D. Hauenstein; Peter J. Giblin; Benjamin Kimia; Anton Leykin; Tomas Pajdla; | We present a method for solving two minimal problems for relative camera pose estimation from three views, which are based on three view correspondences of (i) three points and one line and (ii) three points and two lines through two of the points. |
1199 | DSNAS: Direct Neural Architecture Search Without Parameter Retraining | Shoukang Hu; Sirui Xie; Hehui Zheng; Chunxiao Liu; Jianping Shi; Xunying Liu; Dahua Lin; | In this work, we propose a new problem definition for NAS, task-specific end-to-end, based on this observation. |
1200 | MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships | Yongjian Chen; Lei Tai; Kai Sun; Mingyang Li; | To this end, we propose a novel method to improve the monocular 3D object detection by considering the relationship of paired samples. |
1201 | Regularization on Spatio-Temporally Smoothed Feature for Action Recognition | Jinhyung Kim; Seunghwan Cha; Dongyoon Wee; Soonmin Bae; Junmo Kim; | In this paper, we propose Random Mean Scaling (RMS), a simple and effective regularization method, to relieve the overfitting problem in 3D residual networks. |
1202 | Towards Accurate Scene Text Recognition With Semantic Reasoning Networks | Deli Yu; Xuan Li; Chengquan Zhang; Tao Liu; Junyu Han; Jingtuo Liu; Errui Ding; | To mitigate these limitations, we propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition, where a global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission. |
1203 | Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation | Juncheng Li; Xin Wang; Siliang Tang; Haizhou Shi; Fei Wu; Yueting Zhuang; William Yang Wang; | In this paper, we focus on visual navigation in the low-resource setting, where we have only a few training environments annotated with object information. |
1204 | Inferring Attention Shift Ranks of Objects for Image Saliency | Avishek Siris; Jianbo Jiao; Gary K.L. Tam; Xianghua Xie; Rynson W.H. Lau; | Following psychological studies, in this paper, we propose to predict the saliency rank by inferring human attention shift. Due to the lack of such data, we first construct a large-scale salient object ranking dataset. |
1205 | Camera On-Boarding for Person Re-Identification Using Hypothesis Transfer Learning | Sk Miraj Ahmed; Aske R. Lejbolle; Rameswar Panda; Amit K. Roy-Chowdhury; | Rather, based on the fact that it is easy to store the learned re-identification models, which mitigates any data privacy concern, we develop an efficient model adaptation approach using hypothesis transfer learning that aims to transfer the knowledge using only source models and limited labeled data, but without using any source camera data from the existing network. |
1206 | Joint Graph-Based Depth Refinement and Normal Estimation | Mattia Rossi; Mireille El Gheche; Andreas Kuhn; Pascal Frossard; | With these settings in mind, we devise a novel depth refinement framework that aims at recovering the underlying piece-wise planarity of those inverse depth maps associated with piece-wise planar scenes. |
1207 | DR Loss: Improving Object Detection by Distributional Ranking | Qi Qian; Lei Chen; Hao Li; Rong Jin; | In this work, we propose a novel distributional ranking (DR) loss to handle the challenge. |
1208 | Self-Trained Deep Ordinal Regression for End-to-End Video Anomaly Detection | Guansong Pang; Cheng Yan; Chunhua Shen; Anton van den Hengel; Xiao Bai; | By formulating a surrogate two-class ordinal regression task we devise an end-to-end trainable video anomaly detection approach that enables joint representation learning and anomaly scoring without manually labeled normal/abnormal data. |
1209 | Few-Shot Class-Incremental Learning | Xiaoyu Tao; Xiaopeng Hong; Xinyuan Chang; Songlin Dong; Xing Wei; Yihong Gong; | To address this problem, we represent the knowledge using a neural gas (NG) network, which can learn and preserve the topology of the feature manifold formed by different classes. On this basis, we propose the TOpology-Preserving knowledge InCrementer (TOPIC) framework. |
1210 | PolarMask: Single Shot Instance Segmentation With Polar Representation | Enze Xie; Peize Sun; Xiaoge Song; Wenhai Wang; Xuebo Liu; Ding Liang; Chunhua Shen; Ping Luo; | In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used by easily embedding it into most off-the-shelf detection methods. |
1211 | DeepEMD: Few-Shot Image Classification With Differentiable Earth Mover’s Distance and Structured Classifiers | Chi Zhang; Yujun Cai; Guosheng Lin; Chunhua Shen; | In this paper, we address the few-shot classification task from a new perspective of optimal matching between image regions. |
1212 | Detection in Crowded Scenes: One Proposal, Multiple Predictions | Xuangeng Chu; Anlin Zheng; Xiangyu Zhang; Jian Sun; | We propose a simple yet effective proposal-based object detector, aiming at detecting highly-overlapped instances in crowded scenes. |
1213 | Autolabeling 3D Objects With Differentiable Rendering of SDF Shape Priors | Sergey Zakharov; Wadim Kehl; Arjun Bhargava; Adrien Gaidon; | We present an automatic annotation pipeline to recover 9D cuboids and 3D shapes from pre-trained off-the-shelf 2D detectors and sparse LIDAR data. |
1214 | Interactive Object Segmentation With Inside-Outside Guidance | Shiyin Zhang; Jun Hao Liew; Yunchao Wei; Shikui Wei; Yao Zhao; | To achieve this, we propose an Inside-Outside Guidance (IOG) approach in this work. |
1215 | Mnemonics Training: Multi-Class Incremental Learning Without Forgetting | Yaoyao Liu; Yuting Su; An-An Liu; Bernt Schiele; Qianru Sun; | This paper proposes a novel and automatic framework we call mnemonics, where we parameterize exemplars and make them optimizable in an end-to-end manner. |
1216 | Learning to Segment 3D Point Clouds in 2D Image Space | Yecheng Lyu; Xinming Huang; Ziming Zhang; | In contrast to the literature where local patterns in 3D point clouds are captured by customized convolutional operators, in this paper we study the problem of how to effectively and efficiently project such point clouds into a 2D image space so that traditional 2D convolutional neural networks (CNNs) such as U-Net can be applied for segmentation. |
1217 | Smooth Shells: Multi-Scale Shape Registration With Functional Maps | Marvin Eisenberger; Zorah Lahner; Daniel Cremers; | We propose a novel 3D shape correspondence method based on the iterative alignment of so-called smooth shells. |
1218 | Self-Supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation | Yude Wang; Jie Zhang; Meina Kan; Shiguang Shan; Xilin Chen; | In this paper, we propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap. |
1219 | Efficient Neural Vision Systems Based on Convolutional Image Acquisition | Pedram Pad; Simon Narduzzi; Clement Kundig; Engin Turetken; Siavash A. Bigdeli; L. Andrea Dunbar; | In this paper, we tackle this fundamental challenge by introducing a hybrid optical-digital implementation of a convolutional neural network (CNN) based on engineering of the point spread function (PSF) of an optical imaging system. |
1220 | Visual Chirality | Zhiqiu Lin; Jin Sun; Abe Davis; Noah Snavely; | In this paper, we investigate how the statistics of visual data are changed by reflection. |
1221 | What Machines See Is Not What They Get: Fooling Scene Text Recognition Models With Adversarial Text Images | Xing Xu; Jiefu Chen; Jinhui Xiao; Lianli Gao; Fumin Shen; Heng Tao Shen; | Specifically, we propose a novel and efficient optimization-based method that can be naturally integrated into different sequential prediction schemes, i.e., connectionist temporal classification (CTC) and attention mechanism. |
1222 | Dynamic Traffic Modeling From Overhead Imagery | Scott Workman; Nathan Jacobs; | Instead, we propose an automatic approach for generating dynamic maps of traffic speeds using convolutional neural networks. |
1223 | Satellite Image Time Series Classification With Pixel-Set Encoders and Temporal Self-Attention | Vivien Sainte Fare Garnot; Loic Landrieu; Sebastien Giordano; Nesrine Chehata; | We propose an alternative approach in which the convolutional layers are advantageously replaced with encoders operating on unordered sets of pixels to exploit the typically coarse resolution of publicly available satellite images. |
1224 | DAVD-Net: Deep Audio-Aided Video Decompression of Talking Heads | Xi Zhang; Xiaolin Wu; Xinliang Zhai; Xianye Ben; Chengjie Tu; | To address this problem, we present a novel deep convolutional neural network (DCNN) method for very low bit rate video reconstruction of talking heads. |
1225 | Learning When and Where to Zoom With Deep Reinforcement Learning | Burak Uzkent; Stefano Ermon; | In this direction, we propose PatchDrop, a reinforcement learning approach to dynamically identify when and where to use/acquire high resolution data conditioned on the paired, cheap, low resolution images. |
1226 | Cross-Domain Detection via Graph-Induced Prototype Alignment | Minghao Xu; Hang Wang; Bingbing Ni; Qi Tian; Wenjun Zhang; | To mitigate these problems, we propose a Graph-induced Prototype Alignment (GPA) framework to seek category-level domain alignment via elaborate prototype representations. |
1227 | Meta-Learning of Neural Architectures for Few-Shot Learning | Thomas Elsken; Benedikt Staffler; Jan Hendrik Metzen; Frank Hutter; | To improve upon this, we propose MetaNAS, the first method which fully integrates NAS with gradient-based meta-learning. |
1228 | Towards Inheritable Models for Open-Set Domain Adaptation | Jogendra Nath Kundu; Naveen Venkat; Ambareesh Revanur; Rahul M V; R. Venkatesh Babu; | Addressing this, we introduce a practical DA paradigm where a source-trained model is used to facilitate adaptation in the absence of the source dataset in the future. |
1229 | Learning From Synthetic Animals | Jiteng Mu; Weichao Qiu; Gregory D. Hager; Alan L. Yuille; | In this paper, we use synthetic images and ground truth generated from CAD animal models to address this challenge. |
1230 | Distilling Cross-Task Knowledge via Relationship Matching | Han-Jia Ye; Su Lu; De-Chuan Zhan; | This paper deals with a general scenario of reusing knowledge from a cross-task teacher, where the two models target non-overlapping label spaces. |
1231 | Open Compound Domain Adaptation | Ziwei Liu; Zhongqi Miao; Xingang Pan; Xiaohang Zhan; Dahua Lin; Stella X. Yu; Boqing Gong; | We propose a new approach based on two technical insights into OCDA: 1) a curriculum domain adaptation strategy to bootstrap generalization across domains in a data-driven self-organizing fashion and 2) a memory module to increase the model’s agility towards novel domains. |
1232 | Context Prior for Scene Segmentation | Changqian Yu; Jingbo Wang; Changxin Gao; Gang Yu; Chunhua Shen; Nong Sang; | In this work, we directly supervise the feature aggregation to clearly distinguish the intra-class and inter-class context. |
1233 | Tangent Images for Mitigating Spherical Distortion | Marc Eder; Mykhailo Shvets; John Lim; Jan-Michael Frahm; | In this work, we propose "tangent images," a spherical image representation that facilitates transferable and scalable 360 degree computer vision. |
1234 | Learning a Dynamic Map of Visual Appearance | Tawfiq Salem; Scott Workman; Nathan Jacobs; | Every day billions of images capture this complex relationship, many of which are associated with precise time and location metadata. We propose to use these images to construct a global-scale, dynamic map of visual appearance attributes. |
1235 | Webly Supervised Knowledge Embedding Model for Visual Reasoning | Wenbo Zheng; Lan Yan; Chao Gou; Fei-Yue Wang; | We present a two-stage approach for the task that can augment knowledge through an effective embedding model with weakly supervised web data. |
1236 | Gradually Vanishing Bridge for Adversarial Domain Adaptation | Shuhao Cui; Shuhui Wang; Junbao Zhuo; Chi Su; Qingming Huang; Qi Tian; | In this paper, we equip adversarial domain adaptation with Gradually Vanishing Bridge (GVB) mechanism on both generator and discriminator. |
1237 | Active Speakers in Context | Juan Leon Alcazar; Fabian Caba; Long Mai; Federico Perazzi; Joon-Young Lee; Pablo Arbelaez; Bernard Ghanem; | This paper introduces the Active Speaker Context, a novel representation that models relationships between multiple speakers over long time horizons. |
1238 | Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation | Bowen Cheng; Maxwell D. Collins; Yukun Zhu; Ting Liu; Thomas S. Huang; Hartwig Adam; Liang-Chieh Chen; | In this work, we introduce Panoptic-DeepLab, a simple, strong, and fast system for panoptic segmentation, aiming to establish a solid baseline for bottom-up methods that can achieve performance comparable to two-stage methods while yielding fast inference speed. |
1239 | Inter-Region Affinity Distillation for Road Marking Segmentation | Yuenan Hou; Zheng Ma; Chunxiao Liu; Tak-Wai Hui; Chen Change Loy; | In this work, we explore a novel knowledge distillation (KD) approach that can transfer ‘knowledge’ on scene structure more effectively from a teacher to a student model. |
1240 | Unified Dynamic Convolutional Network for Super-Resolution With Variational Degradations | Yu-Syuan Xu; Shou-Yao Roy Tseng; Yu Tseng; Hsien-Kai Kuo; Yi-Min Tsai; | To fulfill this requirement, this paper proposes a unified network to accommodate both inter-image (cross-image) and intra-image (spatial) variations. |
1241 | Making Better Mistakes: Leveraging Class Hierarchies With Deep Networks | Luca Bertinetto; Romain Mueller; Konstantinos Tertikas; Sina Samangooei; Nicholas A. Lord; | In this paper, we aim to renew interest in this problem by reviewing past approaches and proposing two simple methods which outperform the prior art under several metrics on two large datasets with complex class hierarchies: tieredImageNet and iNaturalist’19. |
1242 | Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN | Jingwen Ye; Yixin Ji; Xinchao Wang; Xin Gao; Mingli Song; | In this paper, we propose a data-free knowledge amalgamation strategy to craft a well-behaved multi-task student network from multiple single/multi-task teachers. |
1243 | Screencast Tutorial Video Understanding | Kunpeng Li; Chen Fang; Zhaowen Wang; Seokhwan Kim; Hailin Jin; Yun Fu; | In this paper, we propose visual understanding of screencast tutorials as a new research problem for the computer vision community. We collect a new dataset of Adobe Photoshop video tutorials and annotate it with both low-level and high-level semantic labels. |
1244 | DSGN: Deep Stereo Geometry Network for 3D Object Detection | Yilun Chen; Shu Liu; Xiaoyong Shen; Jiaya Jia; | Our method, called Deep Stereo Geometry Network (DSGN), reduces this gap significantly by detecting 3D objects on a differentiable volumetric representation, the 3D geometric volume, which effectively encodes 3D geometric structure in 3D regular space. |
1245 | Weakly-Supervised Salient Object Detection via Scribble Annotations | Jing Zhang; Xin Yu; Aixuan Li; Peipei Song; Bowen Liu; Yuchao Dai; | In this paper, we propose a weakly-supervised salient object detection model to learn saliency from such annotations. |
1246 | Learning to Learn Single Domain Generalization | Fengchun Qiao; Long Zhao; Xi Peng; | We propose a new method named adversarial domain augmentation to solve this Out-of-Distribution (OOD) generalization problem. |
1247 | Severity-Aware Semantic Segmentation With Reinforced Wasserstein Training | Xiaofeng Liu; Wenxuan Ji; Jane You; Georges El Fakhri; Jonghye Woo; | To sidestep this, in this work, we propose to incorporate the severity-aware inter-class correlation into our Wasserstein training framework by configuring its ground distance matrix. |
1248 | Boosting Few-Shot Learning With Adaptive Margin Loss | Aoxue Li; Weiran Huang; Xu Lan; Jiashi Feng; Zhenguo Li; Liwei Wang; | This paper proposes an adaptive margin principle to improve the generalization ability of metric-based meta-learning approaches for few-shot learning problems. |
1249 | JA-POLS: A Moving-Camera Background Model via Joint Alignment and Partially-Overlapping Local Subspaces | Irit Chelly; Vlad Winter; Dor Litvak; David Rosen; Oren Freifeld; | Here we propose a purely 2D, unsupervised, modular method that systematically eliminates those issues. |
1250 | AugFPN: Improving Multi-Scale Feature Learning for Object Detection | Chaoxu Guo; Bin Fan; Qian Zhang; Shiming Xiang; Chunhong Pan; | In this paper, we first analyze the design defects of the feature pyramid in FPN, and then introduce a new feature pyramid architecture named AugFPN to address these problems. |
1251 | xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation | Maximilian Jaritz; Tuan-Hung Vu; Raoul de Charette; Emilie Wirbel; Patrick Perez; | In this work, we explore how to learn from multi-modality and propose cross-modal UDA (xMUDA) where we assume the presence of 2D images and 3D point clouds for 3D semantic segmentation. |
1252 | Norm-Aware Embedding for Efficient Person Search | Di Chen; Shanshan Zhang; Jian Yang; Bernt Schiele; | To this end, we present a novel approach called Norm-Aware Embedding to disentangle the person embedding into norm and angle for detection and re-ID respectively, allowing for both effective and efficient multi-task training. |
1253 | Intelligent Home 3D: Automatic 3D-House Design From Linguistic Descriptions Only | Qi Chen; Qi Wu; Rui Tang; Yuhan Wang; Shuai Wang; Mingkui Tan; | In this paper, we formulate it as a language conditioned visual content generation problem that is further divided into a floor plan generation and an interior texture (such as floor and wall) synthesis task. |
1254 | Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation | Zhonghao Wang; Mo Yu; Yunchao Wei; Rogerio Feris; Jinjun Xiong; Wen-mei Hwu; Thomas S. Huang; Honghui Shi; | We consider the problem of unsupervised domain adaptation for semantic segmentation by easing the domain shift between the source domain (synthetic data) and the target domain (real data) in this work. |
1255 | Robust Object Detection Under Occlusion With Context-Aware CompositionalNets | Angtian Wang; Yihong Sun; Adam Kortylewski; Alan L. Yuille; | In this work, we propose to overcome two limitations of CompositionalNets which will enable them to detect partially occluded objects: 1) CompositionalNets, as well as other DCNN architectures, do not explicitly separate the representation of the context from the object itself. |
1256 | IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-Text Retrieval | Hui Chen; Guiguang Ding; Xudong Liu; Zijia Lin; Ji Liu; Jungong Han; | In this paper, to address such a deficiency, we propose an Iterative Matching with Recurrent Attention Memory (IMRAM) method, in which correspondences between images and texts are captured with multiple steps of alignments. |
1257 | Domain-Aware Visual Bias Eliminating for Generalized Zero-Shot Learning | Shaobo Min; Hantao Yao; Hongtao Xie; Chaoqun Wang; Zheng-Jun Zha; Yongdong Zhang; | In this paper, we propose a novel Domain-aware Visual Bias Eliminating (DVBE) network that constructs two complementary visual representations, i.e., semantic-free and semantic-aligned, to treat seen and unseen domains separately. |
1258 | Semi-Supervised Semantic Segmentation With Cross-Consistency Training | Yassine Ouali; Celine Hudelot; Myriam Tami; | In this paper, we present a novel cross-consistency based semi-supervised approach for semantic segmentation. |
1259 | Learning to Learn Cropping Models for Different Aspect Ratio Requirements | Debang Li; Junge Zhang; Kaiqi Huang; | In this paper, we propose a meta-learning (learning to learn) based image cropping method for specified aspect ratios, called Mars, which can generate cropping results of different expected aspect ratios. |
1260 | What Makes Training Multi-Modal Classification Networks Hard? | Weiyao Wang; Du Tran; Matt Feiszli; | This paper identifies two main causes for this performance drop: first, multi-modal networks are often prone to overfitting due to increased capacity. Second, different modalities overfit and generalize at different rates, so training them jointly with a single optimization strategy is sub-optimal. We address these two problems with a technique we call Gradient-Blending, which computes an optimal blending of modalities based on their overfitting behaviors. |
1261 | Selective Transfer With Reinforced Transfer Network for Partial Domain Adaptation | Zhihong Chen; Chao Chen; Zhaowei Cheng; Boyuan Jiang; Ke Fang; Xinyu Jin; | In this paper, we propose a reinforced transfer network (RTNet), which utilizes both high-level and pixel-level information for the partial domain adaptation (PDA) problem. |
1262 | Semi-Supervised Semantic Image Segmentation With Self-Correcting Networks | Mostafa S. Ibrahim; Arash Vahdat; Mani Ranjbar; William G. Macready; | In this paper, we introduce a principled semi-supervised framework that uses only a small set of fully supervised images (having semantic segmentation labels and box labels) and a set of images with only object bounding box labels (which we call the weak set). |
1263 | Exemplar Normalization for Learning Deep Representation | Ruimao Zhang; Zhanglin Peng; Lingyun Wu; Zhen Li; Ping Luo; | This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN), which is able to learn different normalization methods for different convolutional layers and image samples of a deep network. |
1264 | Imitative Non-Autoregressive Modeling for Trajectory Forecasting and Imputation | Mengshi Qi; Jie Qin; Yu Wu; Yi Yang; | To this end, we propose a novel imitative non-autoregressive modeling method to simultaneously handle the trajectory prediction task and the missing value imputation task. |
1265 | Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text | Difei Gao; Ke Li; Ruiping Wang; Shiguang Shan; Xilin Chen; | Following this idea, we propose a novel VQA approach, Multi-Modal Graph Neural Network (MM-GNN). |
1266 | StereoGAN: Bridging Synthetic-to-Real Domain Gap by Joint Optimization of Domain Translation and Stereo Matching | Rui Liu; Chengxi Yang; Wenxiu Sun; Xiaogang Wang; Hongsheng Li; | In this paper, we propose an end-to-end training framework with domain translation and stereo matching networks to tackle this challenge. |
1267 | Self-Supervised Domain-Aware Generative Network for Generalized Zero-Shot Learning | Jiamin Wu; Tianzhu Zhang; Zheng-Jun Zha; Jiebo Luo; Yongdong Zhang; Feng Wu; | To address this issue, we propose an end-to-end Self-supervised Domain-aware Generative Network (SDGN) that integrates self-supervised learning into a feature-generating model for unbiased GZSL. |
1268 | Sparse Layered Graphs for Multi-Object Segmentation | Niels Jeppesen; Anders N. Christensen; Vedrana A. Dahl; Anders B. Dahl; | We introduce the novel concept of a Sparse Layered Graph (SLG) for s-t graph cut segmentation of image data. |
1269 | Visual-Semantic Matching by Exploring High-Order Attention and Distraction | Yongzhi Li; Duo Zhang; Yadong Mu; | In this work, we address this task from two previously-ignored aspects: high-order semantic information (e.g., object-predicate-subject triplet, object-attribute pair) and visual distraction (i.e., despite the high relevance to textual query, images may also contain many prominent distracting objects or visual relations). |
1270 | End-to-End 3D Point Cloud Instance Segmentation Without Detection | Haiyong Jiang; Feilong Yan; Jianfei Cai; Jianmin Zheng; Jun Xiao; | In this paper, we introduce a novel framework to enable end-to-end instance segmentation without detection and a separate step of grouping. |
1271 | Deep Adversarial Decomposition: A Unified Framework for Separating Superimposed Images | Zhengxia Zou; Sen Lei; Tianyang Shi; Zhenwei Shi; Jieping Ye; | We propose a unified framework named "deep adversarial decomposition" for single superimposed image separation. |
1272 | Differentiable Adaptive Computation Time for Visual Reasoning | Cristobal Eyzaguirre; Alvaro Soto; | This paper presents a novel attention-based algorithm for achieving adaptive computation called DACT, which, unlike existing ones, is end-to-end differentiable. |
1273 | DeepLPF: Deep Local Parametric Filters for Image Enhancement | Sean Moran; Pierre Marza; Steven McDonagh; Sarah Parisot; Gregory Slabaugh; | In this paper, we introduce a novel approach to automatically enhance images using learned spatially local filters of three different types (Elliptical Filter, Graduated Filter, Polynomial Filter). |
1274 | Instance Credibility Inference for Few-Shot Learning | Yikai Wang; Chengming Xu; Chen Liu; Li Zhang; Yanwei Fu; | In contrast, this paper presents a simple statistical approach, dubbed Instance Credibility Inference (ICI) to exploit the distribution support of unlabeled instances for few-shot learning. |
1275 | Learning From Web Data With Self-Organizing Memory Module | Yi Tu; Li Niu; Junjie Chen; Dawei Cheng; Liqing Zhang; | In this paper, we propose a novel method, which is capable of handling these two types of noises together, without the supervision of clean images in the training stage. |
1276 | TransMatch: A Transfer-Learning Scheme for Semi-Supervised Few-Shot Learning | Zhongjie Yu; Lin Chen; Zhongwei Cheng; Jiebo Luo; | In this paper, we propose a new transfer-learning framework for semi-supervised few-shot learning to fully utilize the auxiliary information from labeled base-class data and unlabeled novel-class data. |
1277 | Learning the Redundancy-Free Features for Generalized Zero-Shot Object Recognition | Zongyan Han; Zhenyong Fu; Jian Yang; | To reduce the superfluous information in the fine-grained objects, in this paper, we propose to learn the redundancy-free features for generalized zero-shot learning. |
1278 | Neural Topological SLAM for Visual Navigation | Devendra Singh Chaplot; Ruslan Salakhutdinov; Abhinav Gupta; Saurabh Gupta; | This paper studies the problem of image-goal navigation, which involves navigating to the location indicated by a goal image in a novel, previously unseen environment. |
1279 | WaveletStereo: Learning Wavelet Coefficients of Disparity Map in Stereo Matching | Menglong Yang; Fangrui Wu; Wei Li; | This paper proposes a novel stereo matching method called WaveletStereo, which learns the wavelet coefficients of the disparity rather than the disparity itself. |
1280 | Robust Superpixel-Guided Attentional Adversarial Attack | Xiaoyi Dong; Jiangfan Han; Dongdong Chen; Jiayang Liu; Huanyu Bian; Zehua Ma; Hongsheng Li; Xiaogang Wang; Weiming Zhang; Nenghai Yu; | Based on these two considerations, we propose the first robust superpixel-guided attentional adversarial attack method. |
1281 | BEDSR-Net: A Deep Shadow Removal Network From a Single Document Image | Yun-Hsuan Lin; Wen-Chin Chen; Yung-Yu Chuang; | This paper proposes the Background Estimation Document Shadow Removal Network (BEDSR-Net), the first deep network specifically designed for document image shadow removal. |
1282 | Cross-Domain Document Object Detection: Benchmark Suite and Method | Kai Li; Curtis Wigington; Chris Tensmeyer; Handong Zhao; Nikolaos Barmpalios; Vlad I. Morariu; Varun Manjunatha; Tong Sun; Yun Fu; | We investigate cross-domain DOD, where the goal is to learn a detector for the target domain using labeled data from the source domain and only unlabeled data from the target domain. |
1283 | Explaining Knowledge Distillation by Quantifying the Knowledge | Xu Cheng; Zhefan Rao; Yilan Chen; Quanshi Zhang; | This paper presents a method to interpret the success of knowledge distillation by quantifying and analyzing task-relevant and task-irrelevant visual concepts that are encoded in intermediate layers of a deep neural network (DNN). |
1284 | Exploring Bottom-Up and Top-Down Cues With Attentive Learning for Webly Supervised Object Detection | Zhonghua Wu; Qingyi Tao; Guosheng Lin; Jianfei Cai; | Within our approach, we introduce a bottom-up mechanism based on a well-trained fully supervised object detector (i.e., Faster R-CNN) as an object region estimator for web images, by recognizing the common objectness shared by base and novel classes. |
1285 | Enhancing Generic Segmentation With Learned Region Representations | Or Isaacs; Oran Shayer; Michael Lindenbaum; | We propose an alternative approach called Deep Generic Segmentation (DGS) and try to follow the path used for semantic segmentation. |
1286 | Adaptive Hierarchical Down-Sampling for Point Cloud Classification | Ehsan Nezhadarya; Ehsan Taghavi; Ryan Razani; Bingbing Liu; Jun Luo; | In this paper, we propose a novel deterministic, adaptive, permutation-invariant down-sampling layer, called Critical Points Layer (CPL), which learns to reduce the number of points in an unordered point cloud while retaining the important (critical) ones. |
1287 | FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions | Alvin Wan; Xiaoliang Dai; Peizhao Zhang; Zijian He; Yuandong Tian; Saining Xie; Bichen Wu; Matthew Yu; Tao Xu; Kan Chen; Peter Vajda; Joseph E. Gonzalez; | To address this bottleneck, we propose a memory and computationally efficient DNAS variant: DMaskingNAS. |
1288 | Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation | Myeongjin Kim; Hyeran Byun; | In this paper, considering the fundamental difference between the two domains as the texture, we propose a method to adapt to the target domain’s texture. |
1289 | Putting Visual Object Recognition in Context | Mengmi Zhang; Claire Tseng; Gabriel Kreiman; | We propose a biologically-inspired context-aware object recognition model consisting of a two-stream architecture. |
1290 | SLV: Spatial Likelihood Voting for Weakly Supervised Object Detection | Ze Chen; Zhihang Fu; Rongxin Jiang; Yaowu Chen; Xian-Sheng Hua; | In this paper, we propose a spatial likelihood voting (SLV) module to converge the proposal localizing process without any bounding box annotations. |
1291 | Universal Weighting Metric Learning for Cross-Modal Matching | Jiwei Wei; Xing Xu; Yang Yang; Yanli Ji; Zheng Wang; Heng Tao Shen; | To address this problem, we propose a simple and interpretable universal weighting framework for cross-modal matching, which provides a tool to analyze the interpretability of various loss functions. |
1292 | IDA-3D: Instance-Depth-Aware 3D Object Detection From Stereo Vision for Autonomous Driving | Wanli Peng; Hao Pan; He Liu; Yi Sun; | Considering more general scenes, where there is no LiDAR data in the 3D datasets, we propose a 3D object detection approach from stereo vision which does not rely on LiDAR data either as input or as supervision in training, but solely takes RGB images with corresponding annotated 3D bounding boxes as training data. |
1293 | Label Decoupling Framework for Salient Object Detection | Jun Wei; Shuhui Wang; Zhe Wu; Chi Su; Qingming Huang; Qi Tian; | To address this problem, we propose a label decoupling framework (LDF) which consists of a label decoupling (LD) procedure and a feature interaction network (FIN). |
1294 | Transform and Tell: Entity-Aware News Image Captioning | Alasdair Tran; Alexander Mathews; Lexing Xie; | We propose an end-to-end model which generates captions for images embedded in news articles. |
1295 | HAMBox: Delving Into Mining High-Quality Anchors on Face Detection | Yang Liu; Xu Tang; Junyu Han; Jingtuo Liu; Dinger Rui; Xiang Wu; | In this paper, we propose an Online High-quality Anchor Mining Strategy (HAMBox), which explicitly helps outer faces compensate with high-quality anchors. |
1296 | Hierarchical Feature Embedding for Attribute Recognition | Jie Yang; Jiarou Fan; Yiru Wang; Yige Wang; Weihao Gan; Lin Liu; Wei Wu; | To address this problem, we propose a hierarchical feature embedding (HFE) framework, which learns a fine-grained feature embedding by combining attribute and ID information. |
1297 | Squeeze-and-Attention Networks for Semantic Segmentation | Zilong Zhong; Zhong Qiu Lin; Rene Bidart; Xiaodan Hu; Ibrahim Ben Daya; Zhifeng Li; Wei-Shi Zheng; Jonathan Li; Alexander Wong; | In this paper, we propose a novel squeeze-and-attention network (SANet) architecture that leverages an effective squeeze-and-attention (SA) module to account for two distinctive characteristics of segmentation: i) pixel-group attention, and ii) pixel-wise prediction. |
1298 | Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection | Sara Beery; Guanhang Wu; Vivek Rathod; Ronny Votel; Jonathan Huang; | In this paper we propose a method that leverages temporal context from the unlabeled frames of a novel camera to improve performance at that camera. |
1299 | Mixture Dense Regression for Object Detection and Human Pose Estimation | Ali Varamesh; Tinne Tuytelaars; | To this end, we devise a framework for spatial regression using mixture density networks. |
1300 | Syntax-Aware Action Targeting for Video Captioning | Qi Zheng; Chaoyue Wang; Dacheng Tao; | Specifically, we propose a Syntax-Aware Action Targeting (SAAT) module that first builds a self-attended scene representation to draw global dependence among multiple objects within a scene, and then decodes the visually-related syntax components by setting different queries. |
1301 | Learning Visual Emotion Representations From Web Data | Zijun Wei; Jianming Zhang; Zhe Lin; Joon-Young Lee; Niranjan Balasubramanian; Minh Hoai; Dimitris Samaras; | We present a scalable approach for learning powerful visual features for emotion recognition. |
1302 | The Edge of Depth: Explicit Constraints Between Segmentation and Depth | Shengjie Zhu; Garrick Brazil; Xiaoming Liu; | In this work we study the mutual benefits of two common computer vision tasks, self-supervised depth estimation and semantic segmentation from images. |
1303 | A Context-Aware Loss Function for Action Spotting in Soccer Videos | Anthony Cioppa; Adrien Deliege; Silvio Giancola; Bernard Ghanem; Marc Van Droogenbroeck; Rikke Gade; Thomas B. Moeslund; | In this paper, we propose a novel loss function that specifically considers the temporal context naturally present around each action, rather than focusing on the single annotated frame to spot. |
1304 | Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training | Weituo Hao; Chunyuan Li; Xiujun Li; Lawrence Carin; Jianfeng Gao; | In this paper, we present the first pre-training and fine-tuning paradigm for vision-and-language navigation (VLN) tasks. |
1305 | Video Instance Segmentation Tracking With a Modified VAE Architecture | Chung-Ching Lin; Ying Hung; Rogerio Feris; Linglin He; | We propose a modified variational autoencoder (VAE) architecture built on top of Mask R-CNN for instance-level video segmentation and tracking. |
1306 | Deformation-Aware Unpaired Image Translation for Pose Estimation on Laboratory Animals | Siyuan Li; Semih Gunel; Mirela Ostrek; Pavan Ramdya; Pascal Fua; Helge Rhodin; | Our goal is to capture the pose of real animals using synthetic training examples, without using any manual supervision. |
1307 | ZeroQ: A Novel Zero Shot Quantization Framework | Yaohui Cai; Zhewei Yao; Zhen Dong; Amir Gholami; Michael W. Mahoney; Kurt Keutzer; | Here, we propose ZeroQ, a novel zero-shot quantization framework to address this. |
1308 | Disparity-Aware Domain Adaptation in Stereo Image Restoration | Bo Yan; Chenxi Ma; Bahetiyaer Bare; Weimin Tan; Steven C. H. Hoi; | Towards this end, this paper analyzes how to effectively explore disparity information, and proposes a unified stereo image restoration framework. |
1309 | Offset Bin Classification Network for Accurate Object Detection | Heqian Qiu; Hongliang Li; Qingbo Wu; Hengcan Shi; | In this paper, we propose an offset bin classification network optimized with cross entropy loss to predict more accurate offsets. |
1310 | TBT: Targeted Neural Network Attack With Bit Trojan | Adnan Siraj Rakin; Zhezhi He; Deliang Fan; | In this work, for the first time, we propose a novel Targeted Bit Trojan (TBT) method, which can insert a targeted neural Trojan into a DNN through a bit-flip attack. |
1311 | Maintaining Discrimination and Fairness in Class Incremental Learning | Bowen Zhao; Xi Xiao; Guojun Gan; Bin Zhang; Shu-Tao Xia; | In this paper, we propose a simple and effective solution motivated by the aforementioned observations to address catastrophic forgetting. |
1312 | Background Data Resampling for Outlier-Aware Classification | Yi Li; Nuno Vasconcelos; | The problem of learning an image classifier that allows detection of out-of-distribution (OOD) examples, with the help of auxiliary background datasets, is studied. |
1313 | STEFANN: Scene Text Editor Using Font Adaptive Neural Network | Prasun Roy; Saumik Bhattacharya; Subhankar Ghosh; Umapada Pal; | In this paper, we propose a method to modify text in an image at character-level. |
1314 | Geometry and Learning Co-Supported Normal Estimation for Unstructured Point Cloud | Haoran Zhou; Honghua Chen; Yidan Feng; Qiong Wang; Jing Qin; Haoran Xie; Fu Lee Wang; Mingqiang Wei; Jun Wang; | In this paper, we propose a normal estimation method for unstructured point cloud. |
1315 | Sequential Motif Profiles and Topological Plots for Offline Signature Verification | Elias N. Zois; Evangelos Zervas; Dimitrios Tsourounis; George Economou; | In this paper, inspired by the recent use of image visibility graphs for mapping images into networks, we introduce, for the first time in the offline signature verification (SV) literature, their use as a parameter-free, agnostic representation for exploring global as well as local information. |
1316 | Optical Flow in Dense Foggy Scenes Using Semi-Supervised Learning | Wending Yan; Aashish Sharma; Robby T. Tan; | To address the problem, we introduce a semi-supervised deep learning technique that employs real fog images without optical flow ground truth in the training process. |
1317 | A Spatial RNN Codec for End-to-End Image Compression | Chaoyi Lin; Jiabao Yao; Fangdong Chen; Li Wang; | In this paper, we propose a fast yet effective method for end-to-end image compression by incorporating a novel spatial recurrent neural network. |
1318 | Object Relational Graph With Teacher-Recommended Learning for Video Captioning | Ziqi Zhang; Yaya Shi; Chunfeng Yuan; Bing Li; Peijin Wang; Weiming Hu; Zheng-Jun Zha; | In this paper, we propose a complete video captioning system including both a novel model and an effective training strategy. |
1319 | MMTM: Multimodal Transfer Module for CNN Fusion | Hamid Reza Vaezi Joze; Amirreza Shaban; Michael L. Iuzzolino; Kazuhito Koishida; | In this paper, we present a simple neural network module for leveraging the knowledge from multiple modalities in convolutional neural networks. |
1320 | Generalized Zero-Shot Learning via Over-Complete Distribution | Rohit Keshari; Richa Singh; Mayank Vatsa; | To learn a discriminative classifier which yields good performance in Zero-Shot Learning (ZSL) settings, we propose to generate an Over-Complete Distribution (OCD) using Conditional Variational Autoencoder (CVAE) of both seen and unseen classes. |
1321 | Gait Recognition via Semi-supervised Disentangled Representation Learning to Identity and Covariate Features | Xiang Li; Yasushi Makihara; Chi Xu; Yasushi Yagi; Mingwu Ren; | We therefore propose a method of gait recognition via disentangled representation learning that considers both identity and covariate features. |
1322 | Unifying Training and Inference for Panoptic Segmentation | Qizhu Li; Xiaojuan Qi; Philip H.S. Torr; | We present an end-to-end network to bridge the gap between the training and inference pipelines for panoptic segmentation, a task that seeks to partition an image into semantic regions for "stuff" and object instances for "things". |
1323 | Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection | Liang Du; Xiaoqing Ye; Xiao Tan; Jianfeng Feng; Zhenbo Xu; Errui Ding; Shilei Wen; | In this paper, we propose a novel domain-adaptation-like approach to enhance the robustness of the feature representation. |
1324 | Interactive Image Segmentation With First Click Attention | Zheng Lin; Zhao Zhang; Lin-Zhuo Chen; Ming-Ming Cheng; Shao-Ping Lu; | In this paper, we demonstrate the critical role of the first click in providing the location and main body information of the target object. |
1325 | NETNet: Neighbor Erasing and Transferring Network for Better Single Shot Object Detection | Yazhao Li; Yanwei Pang; Jianbing Shen; Jiale Cao; Ling Shao; | With this observation, we propose a new Neighbor Erasing and Transferring (NET) mechanism to reconfigure the pyramid features and explore scale-aware features. |
1326 | Scale-Equalizing Pyramid Convolution for Object Detection | Xinjiang Wang; Shilong Zhang; Zhuoran Yu; Litong Feng; Wayne Zhang; | Inspired by this, a convolution across the pyramid level is proposed in this study, which is termed pyramid convolution and is a modified 3-D convolution. |
1327 | Learning to Cluster Faces via Confidence and Connectivity Estimation | Lei Yang; Dapeng Chen; Xiaohang Zhan; Rui Zhao; Chen Change Loy; Dahua Lin; | In this paper, we propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs. |
1328 | Cross-Modality Person Re-Identification With Shared-Specific Feature Transfer | Yan Lu; Yue Wu; Bin Liu; Tianzhu Zhang; Baopu Li; Qi Chu; Nenghai Yu; | In this paper, we tackle the above limitation by proposing a novel cross-modality shared-specific feature transfer algorithm (termed cm-SSFT) to explore the potential of both the modality-shared information and the modality-specific characteristics to boost re-identification performance. |
1329 | DPGN: Distribution Propagation Graph Network for Few-Shot Learning | Ling Yang; Liangliang Li; Zilun Zhang; Xinyu Zhou; Erjin Zhou; Yu Liu; | We propose a novel approach named distribution propagation graph network (DPGN) for few-shot learning. |
1330 | Density-Aware Graph for Deep Semi-Supervised Visual Recognition | Suichan Li; Bin Liu; Dongdong Chen; Qi Chu; Lu Yuan; Nenghai Yu; | Motivated by these limitations, this paper proposes to solve the SSL problem by building a novel density-aware graph, based on which the neighborhood information can be easily leveraged and the feature learning and label propagation can also be trained in an end-to-end way. |
1331 | Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation | Moab Arar; Yiftach Ginger; Dov Danon; Amit H. Bermano; Daniel Cohen-Or; | In this work, we bypass the difficulties of developing cross-modality similarity measures, by training an image-to-image translation network on the two input modalities. |
1332 | Binarizing MobileNet via Evolution-Based Searching | Hai Phan; Zechun Liu; Dang Huynh; Marios Savvides; Kwang-Ting Cheng; Zhiqiang Shen; | In this paper, we propose using evolutionary search to facilitate the construction and training scheme when binarizing MobileNet, a compact network with separable depth-wise convolution. |
1333 | Temporal-Context Enhanced Detection of Heavily Occluded Pedestrians | Jialian Wu; Chunluan Zhou; Ming Yang; Qian Zhang; Yuan Li; Junsong Yuan; | In this paper, we exploit the local temporal context of pedestrians in videos and propose a tube feature aggregation network (TFAN) aiming at enhancing pedestrian detectors against severe occlusions. |
1334 | Orderless Recurrent Models for Multi-Label Classification | Vacit Oguz Yazici; Abel Gonzalez-Garcia; Arnau Ramisa; Bartlomiej Twardowski; Joost van de Weijer; | Therefore, in this paper, we propose ways to dynamically order the ground truth labels with the predicted label sequence. |
1335 | Gold Seeker: Information Gain From Policy Distributions for Goal-Oriented Vision-and-Langauge Reasoning | Ehsan Abbasnejad; Iman Abbasnejad; Qi Wu; Javen Shi; Anton van den Hengel; | We propose a reinforcement-learning approach that maintains a distribution over its internal information, thus explicitly representing the ambiguity in what it knows, and needs to know, towards achieving its goal. |
1336 | Rethinking the Route Towards Weakly Supervised Object Localization | Chen-Lin Zhang; Yun-Hao Cao; Jianxin Wu; | In this paper, we demonstrate that weakly supervised object localization should be divided into two parts: class-agnostic object localization and object classification. |
1337 | Adversarial Feature Hallucination Networks for Few-Shot Learning | Kai Li; Yulun Zhang; Kunpeng Li; Yun Fu; | In this paper, we propose Adversarial Feature Hallucination Networks (AFHN) which is based on conditional Wasserstein Generative Adversarial networks (cWGAN) and hallucinates diverse and discriminative features conditioned on the few labeled samples. |
1338 | Conditional Gaussian Distribution Learning for Open Set Recognition | Xin Sun; Zhenning Yang; Chi Zhang; Keck-Voon Ling; Guohao Peng; | In this paper, we propose a novel method, Conditional Gaussian Distribution Learning (CGDL), for open set recognition. |
1339 | Connect-and-Slice: An Hybrid Approach for Reconstructing 3D Objects | Hao Fang; Florent Lafarge; | In this paper, we address this issue with a hybrid method that successively connects and slices planes detected from 3D data. |
1340 | Attentive Weights Generation for Few Shot Learning via Information Maximization | Yiluan Guo; Ngai-Man Cheung; | In this work, we present Attentive Weights Generation for few shot learning via Information Maximization (AWGIM), which introduces two novel contributions: i) Mutual information maximization between generated weights and data within the task; this enables the generated weights to retain information of the task and the specific query sample. |
1341 | Assessing Eye Aesthetics for Automatic Multi-Reference Eye In-Painting | Bo Yan; Qing Lin; Weimin Tan; Shili Zhou; | In this paper, aesthetic assessment is introduced into the eye in-painting task for the first time. We construct an eye aesthetic dataset and train an eye aesthetic assessment network on it. |
1342 | PuppeteerGAN: Arbitrary Portrait Animation With Semantic-Aware Appearance Transformation | Zhuo Chen; Chaoyue Wang; Bo Yuan; Dacheng Tao; | In this paper, we devised a novel two-stage framework called PuppeteerGAN for solving these challenges. |
1343 | SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition | Zhi Qiao; Yu Zhou; Dongbao Yang; Yucan Zhou; Weiping Wang; | In this work, we propose a semantics enhanced encoder-decoder framework to robustly recognize low-quality scene texts. |
1344 | Texture and Shape Biased Two-Stream Networks for Clothing Classification and Attribute Recognition | Yuwei Zhang; Peng Zhang; Chun Yuan; Zhi Wang; | To this end, we propose to use two streams to enhance the extraction of shape and texture, respectively. |
1345 | Distortion Agnostic Deep Watermarking | Xiyang Luo; Ruohan Zhan; Huiwen Chang; Feng Yang; Peyman Milanfar; | In this paper, we propose a new framework for distortion-agnostic watermarking, where the image distortion is not explicitly modeled during training. |
1346 | RMP-SNN: Residual Membrane Potential Neuron for Enabling Deeper High-Accuracy and Low-Latency Spiking Neural Network | Bing Han; Gopalakrishnan Srinivasan; Kaushik Roy; | We propose ANN-SNN conversion using a "soft reset" spiking neuron model, referred to as the Residual Membrane Potential (RMP) spiking neuron, which retains the "residual" membrane potential above threshold at the firing instants. |
1347 | BFBox: Searching Face-Appropriate Backbone and Feature Pyramid Network for Face Detector | Yang Liu; Xu Tang; | To resolve this, the success of Neural Architecture Search (NAS) inspires us to search for a face-appropriate backbone and feature pyramid network (FPN) architecture. |
1348 | PFCNN: Convolutional Neural Networks on 3D Surfaces Using Parallel Frames | Yuqi Yang; Shilin Liu; Hao Pan; Yang Liu; Xin Tong; | We use parallel frames on surfaces to define PFCNNs, which enable effective feature learning on surface meshes by faithfully mimicking standard convolutions. |
1349 | iTAML: An Incremental Task-Agnostic Meta-learning Approach | Jathushan Rajasegaran; Salman Khan; Munawar Hayat; Fahad Shahbaz Khan; Mubarak Shah; | In this pursuit, we introduce a novel meta-learning approach that seeks to maintain an equilibrium between all the encountered tasks. |
1350 | Optimal least-squares solution to the hand-eye calibration problem | Amit Dekel; Linus Harenstam-Nielsen; Sergio Caccamo; | We propose a least-squares formulation for the noisy hand-eye calibration problem using dual quaternions, and introduce efficient algorithms that find the exact optimal solution based on analytic properties of the problem, avoiding non-linear optimization. |
1351 | MnasFPN: Learning Latency-Aware Pyramid Architecture for Object Detection on Mobile Devices | Bo Chen; Golnaz Ghiasi; Hanxiao Liu; Tsung-Yi Lin; Dmitry Kalenichenko; Hartwig Adam; Quoc V. Le; | We propose MnasFPN, a mobile-friendly search space for the detection head, and combine it with latency-aware architecture search to produce efficient object detection models. |
1352 | VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions | Oytun Ulutan; A S M Iftekhar; B. S. Manjunath; | VSGNet extracts visual features from the human-object pairs, refines the features with spatial configurations of the pair, and utilizes the structural connections between the pair via graph convolutions. |
1353 | End-to-End Camera Calibration for Broadcast Videos | Long Sha; Jennifer Hobbs; Panna Felsen; Xinyu Wei; Patrick Lucey; Sujoy Ganguly; | In this paper, we propose an end-to-end approach for single moving camera calibration across challenging scenarios in sports. |
1354 | Regularizing CNN Transfer Learning With Randomised Regression | Yang Zhong; Atsuto Maki; | This paper is about regularizing deep convolutional networks (CNNs) based on an adaptive framework for transfer learning with limited training data in the target domain. |
1355 | KeypointNet: A Large-Scale 3D Keypoint Dataset Aggregated From Numerous Human Annotations | Yang You; Yujing Lou; Chengkun Li; Zhoujun Cheng; Liangwei Li; Lizhuang Ma; Cewu Lu; Weiming Wang; | To handle the inconsistency between annotations from different people, we propose a novel method to aggregate these keypoints automatically, through minimization of a fidelity loss. |
1356 | Hierarchical Clustering With Hard-Batch Triplet Loss for Person Re-Identification | Kaiwei Zeng; Munan Ning; Yaohua Wang; Yang Guo; | In order to improve the quality of pseudo labels in existing methods, we propose the HCT method which combines hierarchical clustering with hard-batch triplet loss. |
1357 | Joint Semantic Segmentation and Boundary Detection Using Iterative Pyramid Contexts | Mingmin Zhen; Jinglu Wang; Lei Zhou; Shiwei Li; Tianwei Shen; Jiaxiang Shang; Tian Fang; Long Quan; | In this paper, we present a joint multi-task learning framework for semantic segmentation and boundary detection. |
1358 | Attention-Guided Hierarchical Structure Aggregation for Image Matting | Yu Qiao; Yuhao Liu; Xin Yang; Dongsheng Zhou; Mingliang Xu; Qiang Zhang; Xiaopeng Wei; | In this paper, we propose an end-to-end Hierarchical Attention Matting Network (HAttMatting), which predicts alpha mattes with better structure from single RGB images without additional input. |
1359 | MetaFuse: A Pre-trained Fusion Model for Human Pose Estimation | Rongchang Xie; Chunyu Wang; Yizhou Wang; | In this work, we introduce MetaFuse, a pre-trained fusion model learned from a large number of cameras in the Panoptic dataset. |
1360 | Prior Guided GAN Based Semantic Inpainting | Avisek Lahiri; Arnav Kumar Jain; Sanskar Agrawal; Pabitra Mitra; Prabir Kumar Biswas; | In this paper, going against the general trend, we focus on the second paradigm of inpainting and address both of its mentioned problems. |
1361 | Weakly Supervised Semantic Point Cloud Segmentation: Towards 10x Fewer Labels | Xun Xu; Gim Hee Lee; | In this work, we propose a weakly supervised point cloud segmentation approach which requires only a tiny fraction of points to be labelled in the training stage. |
1362 | Physically Realizable Adversarial Examples for LiDAR Object Detection | James Tu; Mengye Ren; Sivabalan Manivasagam; Ming Liang; Bin Yang; Richard Du; Frank Cheng; Raquel Urtasun; | In this paper, we address this issue and present a method to generate universal 3D adversarial objects to fool LiDAR detectors. |
1363 | Combating Noisy Labels by Agreement: A Joint Training Method with Co-Regularization | Hongxin Wei; Lei Feng; Xiangyu Chen; Bo An; | In this paper, we start from a different perspective and propose a robust learning paradigm called JoCoR, which aims to reduce the diversity of two networks during training. |
1364 | Light-weight Calibrator: A Separable Component for Unsupervised Domain Adaptation | Shaokai Ye; Kailu Wu; Mu Zhou; Yunfei Yang; Sia Huat Tan; Kaidi Xu; Jiebo Song; Chenglong Bao; Kaisheng Ma; | In this work, instead of training a classifier to adapt to the target domain, we use a separable component called data calibrator to help the fixed source classifier recover discrimination power in the target domain, while preserving the source domain’s performance. |
1365 | Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition | Canjie Luo; Yuanzhi Zhu; Lianwen Jin; Yongpan Wang; | In this paper, we propose a new method for text image augmentation. |
1366 | Learning Selective Self-Mutual Attention for RGB-D Saliency Detection | Nian Liu; Ni Zhang; Junwei Han; | In this paper, we propose to fuse attention learned in both modalities. |
1367 | Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation | Yangtao Zheng; Di Huang; Songtao Liu; Yunhong Wang; | To address such an issue, this paper proposes a novel coarse-to-fine feature adaptation approach to cross-domain object detection. |
1368 | Estimating Low-Rank Region Likelihood Maps | Gabriela Csurka; Zoltan Kato; Andor Juhasz; Martin Humenberger; | Herein, we propose a novel self-supervised low-rank region detection deep network that predicts a low-rank likelihood map from an image. |
1369 | Neural Head Reenactment with Latent Pose Descriptors | Egor Burkov; Igor Pasechnik; Artur Grigorev; Victor Lempitsky; | We propose a neural head reenactment system, which is driven by a latent pose representation and is capable of predicting the foreground segmentation alongside the RGB image. |
1370 | Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis | K R Prajwal; Rudrabha Mukhopadhyay; Vinay P. Namboodiri; C.V. Jawahar; | In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker. |
1371 | Self-Supervised Learning of Video-Induced Visual Invariances | Michael Tschannen; Josip Djolonga; Marvin Ritter; Aravindh Mahendran; Neil Houlsby; Sylvain Gelly; Mario Lucic; | We propose a general framework for self-supervised learning of transferable visual representations based on Video-Induced Visual Invariances (VIVI). |
1372 | Two-Stage Peer-Regularized Feature Recombination for Arbitrary Image Style Transfer | Jan Svoboda; Asha Anoosheh; Christian Osendorfer; Jonathan Masci; | This paper introduces a neural style transfer model to generate a stylized image conditioning on a set of examples describing the desired style. |
1373 | MINA: Convex Mixed-Integer Programming for Non-Rigid Shape Alignment | Florian Bernard; Zeeshan Khan Suri; Christian Theobalt; | To this end, we propose a novel shape deformation model based on an efficient low-dimensional discrete model, so that finding a globally optimal solution is tractable in (most) practical cases. |
1374 | Improving One-Shot NAS by Suppressing the Posterior Fading | Xiang Li; Chen Lin; Chuming Li; Ming Sun; Wei Wu; Junjie Yan; Wanli Ouyang; | In this paper, we analyse existing weight sharing one-shot NAS approaches from a Bayesian point of view and identify the Posterior Fading problem, which compromises the effectiveness of shared weights. |
1375 | Incremental Few-Shot Object Detection | Juan-Manuel Perez-Rua; Xiatian Zhu; Timothy M. Hospedales; Tao Xiang; | We present the first study aiming to go beyond these limitations by considering the Incremental Few-Shot Detection (iFSD) problem setting, where new classes must be registered incrementally (without revisiting base classes) and with few examples. |
1376 | Synthetic Learning: Learn From Distributed Asynchronized Discriminator GAN Without Sharing Medical Image Data | Qi Chang; Hui Qu; Yikai Zhang; Mert Sabuncu; Chao Chen; Tong Zhang; Dimitris N. Metaxas; | In this paper, we propose a data privacy-preserving and communication efficient distributed GAN learning framework named Distributed Asynchronized Discriminator GAN (AsynDGAN). |
1377 | Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation | Yingwei Pan; Ting Yao; Yehao Li; Chong-Wah Ngo; Tao Mei; | In this paper, we address this problem by augmenting the state-of-the-art domain adaptation technique, Self-Ensembling, with category-agnostic clusters in target domain. |
1378 | Regularizing Class-Wise Predictions via Self-Knowledge Distillation | Sukmin Yun; Jongjin Park; Kimin Lee; Jinwoo Shin; | To mitigate the issue, we propose a new regularization method that penalizes the predictive distribution between similar samples. |
1379 | Hierarchical Graph Attention Network for Visual Relationship Detection | Li Mi; Zhenzhong Chen; | In this work, a Hierarchical Graph Attention Network (HGAT) is proposed to capture the dependencies on both object-level and triplet-level. |
1380 | M2m: Imbalanced Classification via Major-to-Minor Translation | Jaehyung Kim; Jongheon Jeong; Jinwoo Shin; | In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples (e.g., images) from more-frequent classes. |
1381 | CenterMask: Real-Time Anchor-Free Instance Segmentation | Youngwan Lee; Jongyoul Park; | We propose a simple yet efficient anchor-free instance segmentation, called CenterMask, that adds a novel spatial attention-guided mask (SAG-Mask) branch to anchor-free one stage object detector (FCOS) in the same vein with Mask R-CNN. |
1382 | Multi-Path Learning for Object Pose Estimation Across Domains | Martin Sundermeyer; Maximilian Durner; En Yen Puang; Zoltan-Csaba Marton; Narunas Vaskevicius; Kai O. Arras; Rudolph Triebel; | We introduce a scalable approach for object pose estimation trained on simulated RGB views of multiple 3D models together. |
1383 | Incremental Learning in Online Scenario | Jiangpeng He; Runyu Mao; Zeman Shao; Fengqing Zhu; | In this paper, we propose an incremental learning framework that can work in the challenging online learning scenario and handle both new classes data and new observations of old classes. |
1384 | Enhanced Transport Distance for Unsupervised Domain Adaptation | Mengxue Li; Yi-Ming Zhai; You-Wei Luo; Peng-Fei Ge; Chuan-Xian Ren; | In this work, we propose an enhanced transport distance (ETD) for UDA. |
1385 | TESA: Tensor Element Self-Attention via Matricization | Francesca Babiloni; Ioannis Marras; Gregory Slabaugh; Stefanos Zafeiriou; | In this paper, we introduce a new method, called Tensor Element Self-Attention (TESA) that generalizes such work to capture interdependencies along all dimensions of the tensor using matricization. |
1386 | Training a Steerable CNN for Guidewire Detection | Donghang Li; Adrian Barbu; | In this paper, we present a steerable Convolutional Neural Network (CNN), which is a Fully Convolutional Neural Network (FCNN) that can detect objects rotated by an arbitrary 2D angle, without being rotation invariant. |
1387 | Superpixel Segmentation With Fully Convolutional Networks | Fengting Yang; Qian Sun; Hailin Jin; Zihan Zhou; | Inspired by an initialization strategy commonly adopted by traditional superpixel algorithms, we present a novel method that employs a simple fully convolutional network to predict superpixels on a regular image grid. |
1388 | SharinGAN: Combining Synthetic and Real Data for Unsupervised Geometry Estimation | Koutilya PNVR; Hao Zhou; David Jacobs; | We propose a novel method for combining synthetic and real images when training networks to determine geometric information from a single image. |
1389 | Label Distribution Learning on Auxiliary Label Space Graphs for Facial Expression Recognition | Shikai Chen; Jianfeng Wang; Yuedong Chen; Zhongchao Shi; Xin Geng; Yong Rui; | To solve the problem, we propose a novel approach named Label Distribution Learning on Auxiliary Label Space Graphs(LDL-ALSG) that leverages the topological information of the labels from related but more distinct tasks, such as action unit recognition and facial landmark detection. |
1390 | Deep Residual Flow for Out of Distribution Detection | Ev Zisselman; Aviv Tamar; | In this work, we present a novel approach that improves upon the state-of-the-art by leveraging an expressive density model based on normalizing flows. |
1391 | FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation | Shurui Gui; Chaoyue Wang; Qihua Chen; Dacheng Tao; | In this work, we devised a novel structure-to-texture generation framework which splits the video interpolation task into two stages: structure-guided interpolation and texture refinement. |
1392 | Learning Nanoscale Motion Patterns of Vesicles in Living Cells | Arif Ahmed Sekh; Ida Sundvor Opstad; Asa Birna Birgisdottir; Truls Myrmel; Balpreet Singh Ahluwalia; Krishna Agarwal; Dilip K. Prasad; | We propose an integrative approach, built upon physics based simulations, nanoscopy algorithms, and shallow residual attention network to make it possible for the first time to analysis sub-resolution motion patterns in vesicles that may also be of sub-resolution diameter. |
1393 | Improving Action Segmentation via Graph-Based Temporal Reasoning | Yifei Huang; Yusuke Sugano; Yoichi Sato; | In this paper, we propose a network module called Graph-based Temporal Reasoning Module (GTRM) that can be built on top of existing action segmentation models to learn the relation of multiple action segments in various time spans. |
1394 | Episode-Based Prototype Generating Network for Zero-Shot Learning | Yunlong Yu; Zhong Ji; Jungong Han; Zhongfei Zhang; | We introduce a simple yet effective episode-based training framework for zero-shot learning (ZSL), where the learning system requires to recognize unseen classes given only the corresponding class semantics. |
1395 | Learning to Segment the Tail | Xinting Hu; Yi Jiang; Kaihua Tang; Jingyuan Chen; Chunyan Miao; Hanwang Zhang; | We propose a "divide&conquer" strategy for the challenging LVIS task: divide the whole data into balanced parts and then apply incremental learning to conquer each one. |
1396 | Learning to Evaluate Perception Models Using Planner-Centric Metrics | Jonah Philion; Amlan Kar; Sanja Fidler; | In this paper, we propose a principled metric for 3D object detection specifically for the task of self-driving. |
1397 | Where, What, Whether: Multi-Modal Learning Meets Pedestrian Detection | Yan Luo; Chongyang Zhang; Muming Zhao; Hao Zhou; Jun Sun; | In this paper, we propose W^3Net, which attempts to address above challenges by decomposing the pedestrian detection task into Where, What and Whether problem directing against pedestrian localization, scale prediction and classification correspondingly. |
1398 | CoverNet: Multimodal Behavior Prediction Using Trajectory Sets | Tung Phan-Minh; Elena Corina Grigore; Freddy A. Boulton; Oscar Beijbom; Eric M. Wolff; | We present CoverNet, a new method for multimodal, probabilistic trajectory prediction for urban driving. |
1399 | Real-World Person Re-Identification via Degradation Invariance Learning | Yukun Huang; Zheng-Jun Zha; Xueyang Fu; Richang Hong; Liang Li; | In this paper, to solve the above problem, we propose a degradation invariance learning framework for real-world person Re-ID. |
1400 | Defending and Harnessing the Bit-Flip Based Adversarial Weight Attack | Zhezhi He; Adnan Siraj Rakin; Jingtao Li; Chaitali Chakrabarti; Deliang Fan; | In this work, we conduct comprehensive investigations on BFA and propose to leverage binarization-aware training and its relaxation — piece-wise clustering as simple and effective countermeasures to BFA. |
1401 | Adversarial Latent Autoencoders | Stanislav Pidhorskyi; Donald A. Adjeroh; Gianfranco Doretto; | We introduce an autoencoder that tackles these issues jointly, which we call Adversarial Latent Autoencoder (ALAE). |
1402 | Adaptive Fractional Dilated Convolution Network for Image Aesthetics Assessment | Qiuyu Chen; Wei Zhang; Ning Zhou; Peng Lei; Yi Xu; Yu Zheng; Jianping Fan; | In this paper, an adaptive fractional dilated convolution (AFDC), which is aspect-ratio-embedded, composition-preserving and parameter-free, is developed to tackle this issue natively in convolutional kernel level. |
1403 | Deep Generative Model for Robust Imbalance Classification | Xinyue Wang; Yilin Lyu; Liping Jing; | In this paper, a deep generative classifier is proposed to mitigate this issue via both data perturbation and model perturbation. |
1404 | Learning Deep Network for Detecting 3D Object Keypoints and 6D Poses | Wanqing Zhao; Shaobo Zhang; Ziyu Guan; Wei Zhao; Jinye Peng; Jianping Fan; | In this paper, we develop a keypoint-based 6D object pose detection method (and its deep network) called Object Keypoint based POSe Estimation (OK-POSE). |
1405 | MetaIQA: Deep Meta-Learning for No-Reference Image Quality Assessment | Hancheng Zhu; Leida Li; Jinjian Wu; Weisheng Dong; Guangming Shi; | With this motivation, this paper presents a no-reference IQA metric based on deep meta-learning. |
1406 | Sketchformer: Transformer-Based Representation for Sketched Structure | Leo Sampaio Ferraz Ribeiro; Tu Bui; John Collomosse; Moacir Ponti; | Sketchformer is a novel transformer-based representation for encoding free-hand sketches input in a vector form, i.e. as a sequence of strokes. |
1407 | Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation | Sunghun Joung; Seungryong Kim; Hanjae Kim; Minsu Kim; Ig-Jae Kim; Junghyun Cho; Kwanghoon Sohn; | To overcome this limitation, we introduce a learnable module, cylindrical convolutional networks (CCNs), that exploit cylindrical representation of a convolutional kernel defined in the 3D space. |
1408 | Learning a Unified Sample Weighting Network for Object Detection | Qi Cai; Yingwei Pan; Yu Wang; Jingen Liu; Ting Yao; Tao Mei; | To this end, we devise a general loss function to cover most region-based object detectors with various sampling strategies, and then based on it we propose a unified sample weighting network to predict a sample’s task weights. |
1409 | Old Is Gold: Redefining the Adversarially Learned One-Class Classifier Training Paradigm | Muhammad Zaigham Zaheer; Jin-Ha Lee; Marcella Astrid; Seung-Ik Lee; | In this study, we propose a framework that effectively generates stable results across a wide range of training steps and allows us to use both the generator and the discriminator of an adversarial model for efficient and robust anomaly detection. |
1410 | An Adaptive Neural Network for Unsupervised Mosaic Consistency Analysis in Image Forensics | Quentin Bammey; Rafael Grompone von Gioi; Jean-Michel Morel; | In this paper we develop a blind method that can train directly on unlabelled and potentially forged images to point out local mosaic inconsistencies. |
1411 | McFlow: Monte Carlo Flow Models for Data Imputation | Trevor W. Richardson; Wencheng Wu; Lei Lin; Beilei Xu; Edgar A. Bernal; | To that end, we propose MCFlow, a deep framework for imputation that leverages normalizing flow generative models and Monte Carlo sampling. |
1412 | Learning to See Through Obstructions | Yu-Lun Liu; Wei-Sheng Lai; Ming-Hsuan Yang; Yung-Yu Chuang; Jia-Bin Huang; | We present a learning-based approach for removing unwanted obstructions, such as window reflections, fence occlusions or raindrops, from a short sequence of images captured by a moving camera. |
1413 | GaitPart: Temporal Part-Based Model for Gait Recognition | Chao Fan; Yunjie Peng; Chunshui Cao; Xu Liu; Saihui Hou; Jiannan Chi; Yongzhen Huang; Qing Li; Zhiqiang He; | Then, we propose a novel part-based model GaitPart and get two aspects effect of boosting the performance: On the one hand, Focal Convolution Layer, a new applying of convolution, is presented to enhance the fine-grained learning of the part-level spatial features. On the other hand, the Micro-motion Capture Module (MCM) is proposed and there are several parallel MCMs in the GaitPart corresponding to the pre-defined parts of the human body, respectively. |
1414 | EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege’s Principle | Trisha Mittal; Pooja Guhan; Uttaran Bhattacharya; Rohan Chandra; Aniket Bera; Dinesh Manocha; | We present EmotiCon, a learning-based algorithm for context-aware perceived human emotion recognition from videos and images. We also introduce a new dataset, GroupWalk, which is a collection of videos captured in multiple real-world settings of people walking. |
1415 | Can Deep Learning Recognize Subtle Human Activities? | Vincent Jacquot; Zhuofan Ying; Gabriel Kreiman; | In this work, we propose a new action classification challenge that is performed well by humans, but poorly by state-of-the-art Deep Learning models. |
1416 | PhysGAN: Generating Physical-World-Resilient Adversarial Examples for Autonomous Driving | Zelun Kong; Junfeng Guo; Ang Li; Cong Liu; | We present PhysGAN, which generates physical-world-resilient adversarial examples for misleading autonomous driving systems in a continuous manner. |
1417 | ILFO: Adversarial Attack on Adaptive Neural Networks | Mirazul Haque; Anki Chauhan; Cong Liu; Wei Yang; | In this paper, we investigate the robustness of neural networks against energy-oriented attacks. |
1418 | On Translation Invariance in CNNs: Convolutional Layers Can Exploit Absolute Spatial Location | Osman Semih Kayhan; Jan C. van Gemert; | In this paper we challenge the common assumption that convolutional layers in modern CNNs are translation invariant. |
1419 | Diverse Image Generation via Self-Conditioned GANs | Steven Liu; Tongzhou Wang; David Bau; Jun-Yan Zhu; Antonio Torralba; | We introduce a simple but effective unsupervised method for generating diverse images. |
1420 | Inducing Hierarchical Compositional Model by Sparsifying Generator Network | Xianglei Xing; Tianfu Wu; Song-Chun Zhu; Ying Nian Wu; | This paper proposes to learn hierarchical compositional AND-OR model for interpretable image synthesis by sparsifying the generator network. |
1421 | CARP: Compression Through Adaptive Recursive Partitioning for Multi-Dimensional Images | Rongjie Liu; Meng Li; Li Ma; | We present such a method for multi-dimensional image compression called Compression via Adaptive Recursive Partitioning (CARP). |
1422 | GrappaNet: Combining Parallel Imaging With Deep Learning for Multi-Coil MRI Reconstruction | Anuroop Sriram; Jure Zbontar; Tullie Murrell; C. Lawrence Zitnick; Aaron Defazio; Daniel K. Sodickson; | In this paper, we present a novel method to integrate traditional parallel imaging methods into deep neural networks that is able to generate high quality reconstructions even for high acceleration factors. |
1423 | Can Weight Sharing Outperform Random Architecture Search? An Investigation With TuNAS | Gabriel Bender; Hanxiao Liu; Bo Chen; Grace Chu; Shuyang Cheng; Pieter-Jan Kindermans; Quoc V. Le; | While the efficacies of both methods are problem-dependent, our experiments demonstrate that there are large, realistic tasks where efficient search methods can provide substantial gains over random search. |
1424 | Context Aware Graph Convolution for Skeleton-Based Action Recognition | Xikun Zhang; Chang Xu; Dacheng Tao; | In this paper, we propose a context aware graph convolutional network (CA-GCN). |
1425 | Fast(er) Reconstruction of Shredded Text Documents via Self-Supervised Deep Asymmetric Metric Learning | Thiago M. Paixao; Rodrigo F. Berriel; Maria C. S. Boeres; Alessandro L. Koerich; Claudine Badue; Alberto F. De Souza; Thiago Oliveira-Santos; | This work proposes a scalable deep learning approach for measuring pairwise compatibility in which the number of inferences scales linearly (rather than quadratically) with the number of shreds. |
1426 | Revisiting Pose-Normalization for Fine-Grained Few-Shot Recognition | Luming Tang; Davis Wertheimer; Bharath Hariharan; | A solution is to use pose-normalized representations: first localize semantic parts in each image, and then describe images by characterizing the appearance of each part. While such representations are out of favor for fully supervised classification, we show that they are extremely effective for few-shot fine-grained classification. |
1427 | RankMI: A Mutual Information Maximizing Ranking Loss | Mete Kemertas; Leila Pishdad; Konstantinos G. Derpanis; Afsaneh Fazly; | We introduce an information-theoretic loss function, RankMI, and an associated training algorithm for deep representation learning for image retrieval. |
1428 | Learning Memory-Guided Normality for Anomaly Detection | Hyunjong Park; Jongyoun Noh; Bumsub Ham; | To address this problem, we present an unsupervised learning approach to anomaly detection that considers the diversity of normal patterns explicitly, while lessening the representation capacity of CNNs. |
1429 | Appearance Shock Grammar for Fast Medial Axis Extraction From Real Images | Charles-Olivier Dufresne Camaro; Morteza Rezanejad; Stavros Tsogkas; Kaleem Siddiqi; Sven Dickinson; | We combine ideas from shock graph theory with more recent appearance-based methods for medial axis extraction from complex natural scenes, improving upon the current best unsupervised method in both efficiency and performance. |
1430 | Generalizing Hand Segmentation in Egocentric Videos With Uncertainty-Guided Model Adaptation | Minjie Cai; Feng Lu; Yoichi Sato; | In this work, we solve the hand segmentation generalization problem without requiring segmentation labels in the target domain. |
1431 | DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised Representation Learning | Jaime Spencer; Richard Bowden; Simon Hadfield; | We propose DeFeat-Net (Depth & Feature network), an approach to simultaneously learn a cross-domain dense feature representation, alongside a robust depth-estimation framework based on warped feature consistency. |
1432 | Learning Visual Motion Segmentation Using Event Surfaces | Anton Mitrokhin; Zhiyuan Hua; Cornelia Fermuller; Yiannis Aloimonos; | In this work, we present a graph convolutional neural network for the task of scene motion segmentation by a moving camera. |
1433 | Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction | Abduallah Mohamed; Kun Qian; Mohamed Elhoseiny; Christian Claudel; | We propose the Social Spatio-Temporal Graph Convolutional Neural Network (Social-STGCNN), which removes the need for aggregation methods by modeling the interactions as a graph. |
1434 | Discriminative Multi-Modality Speech Recognition | Bo Xu; Cheng Lu; Yandong Guo; Jacob Wang; | In this paper, we propose a two-stage speech recognition model. |
1435 | Clean-Label Backdoor Attacks on Video Recognition Models | Shihao Zhao; Xingjun Ma; Xiang Zheng; James Bailey; Jingjing Chen; Yu-Gang Jiang; | In this paper, we show that existing image backdoor attacks are far less effective on videos, and outline 4 strict conditions where existing attacks are likely to fail: 1) scenarios with more input dimensions (e.g., videos), 2) scenarios with high resolution, 3) scenarios with a large number of classes and few examples per class (a “sparse dataset”), and 4) attacks with access to correct labels (e.g., clean-label attacks). |
1436 | Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors | Gilad Cohen; Guillermo Sapiro; Raja Giryes; | In this work, we present a method for detecting adversarial attacks that is suitable for any pre-trained neural network classifier. |
1437 | Unsupervised Model Personalization While Preserving Privacy and Scalability: An Open Problem | Matthias De Lange; Xu Jia; Sarah Parisot; Ales Leonardis; Gregory Slabaugh; Tinne Tuytelaars; | We aim to address this challenge within the continual learning paradigm and provide a novel Dual User-Adaptation framework (DUA) to explore the problem. |
1438 | GIFnets: Differentiable GIF Encoding Framework | Innfarn Yoo; Xiyang Luo; Yilin Wang; Feng Yang; Peyman Milanfar; | To reduce artifacts and provide a better and more efficient GIF encoding, we introduce a differentiable GIF encoding pipeline, which includes three novel neural networks: PaletteNet, DitherNet, and BandingNet. |
1439 | Learning Invariant Representation for Unsupervised Image Restoration | Wenchao Du; Hu Chen; Hongyu Yang; | Instead, we propose an unsupervised learning method that explicitly learns invariant representations from noisy data and reconstructs clear observations. |
1440 | Improved Few-Shot Visual Classification | Peyman Bateni; Raghav Goyal; Vaden Masrani; Frank Wood; Leonid Sigal; | In this paper, we explore the hypothesis that a simple class-covariance-based distance metric, namely the Mahalanobis distance, adopted into a state-of-the-art few-shot learning approach (CNAPS) can, in and of itself, lead to a significant performance improvement. |
1441 | Learning Weighted Submanifolds With Variational Autoencoders and Riemannian Variational Autoencoders | Nina Miolane; Susan Holmes; | In this paper, we study variants of variational autoencoders that can learn potentially highly curved submanifolds of manifold-valued data. |
1442 | Learning Geocentric Object Pose in Oblique Monocular Images | Gordon Christie; Rodrigo Rene Rai Munoz Abujder; Kevin Foster; Shea Hagstrom; Gregory D. Hager; Myron Z. Brown; | Inspired by recent work in monocular height above ground prediction and optical flow prediction from static images, we develop an encoding of geocentric pose to address this challenge and train a deep network to compute the representation densely, supervised by publicly available airborne lidar. |
1443 | Understanding Adversarial Examples From the Mutual Influence of Images and Perturbations | Chaoning Zhang; Philipp Benz; Tooba Imtiaz; In So Kweon; | We propose to treat the DNN logits as a vector for feature representation, and exploit them to analyze the mutual influence of two independent inputs based on the Pearson correlation coefficient (PCC). |
1444 | Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models | Giannis Daras; Augustus Odena; Han Zhang; Alexandros G. Dimakis; | We introduce a new local sparse attention layer that preserves two-dimensional geometry and locality. |
1445 | MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion | Kentaro Wada; Edgar Sucar; Stephen James; Daniel Lenton; Andrew J. Davison; | We present a system that can accurately estimate the poses of multiple known objects in contact and occlusion from real-time, embodied multi-view vision. |
1446 | HCNAF: Hyper-Conditioned Neural Autoregressive Flow and its Application for Probabilistic Occupancy Map Forecasting | Geunseob Oh; Jean-Sebastien Valois; | We introduce Hyper-Conditioned Neural Autoregressive Flow (HCNAF), a powerful universal distribution approximator designed to model arbitrarily complex conditional probability density functions. |
1447 | Detail-recovery Image Deraining via Context Aggregation Networks | Sen Deng; Mingqiang Wei; Jun Wang; Yidan Feng; Luming Liang; Haoran Xie; Fu Lee Wang; Meng Wang; | We propose an end-to-end detail-recovery image deraining network (termed DRDNet) to solve the problem. |
1448 | MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model | Han Fu; Rui Wu; Chenghao Liu; Jianling Sun; | In this paper, we focus on the task of cross-modal retrieval between food images and cooking recipes. |
1449 | Hypergraph Attention Networks for Multimodal Learning | Eun-Sol Kim; Woo Young Kang; Kyoung-Woon On; Yu-Jung Heo; Byoung-Tak Zhang; | To resolve this problem, we propose Hypergraph Attention Networks (HANs), which define a common semantic space among the modalities with symbolic graphs and extract a joint representation of the modalities based on a co-attention map constructed in the semantic space. |
1450 | Moving in the Right Direction: A Regularization for Deep Metric Learning | Deen Dayal Mohan; Nishant Sankaran; Dennis Fedorishin; Srirangaraj Setlur; Venu Govindaraju; | In this work, we identify a shortcoming of existing loss formulations, which fail to consider better directions of pair displacements as an additional criterion for optimization. |
1451 | Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets | Daniel Haase; Manuel Amthor; | We introduce blueprint separable convolutions (BSConv) as highly efficient building blocks for CNNs. |
1452 | Seeing without Looking: Contextual Rescoring of Object Detections for AP Maximization | Lourenco V. Pato; Renato Negrinho; Pedro M. Q. Aguiar; | We propose to incorporate context in object detection by post-processing the output of an arbitrary detector to rescore the confidences of its detections. |
1453 | End-to-End Adversarial-Attention Network for Multi-Modal Clustering | Runwu Zhou; Yi-Dong Shen; | In this paper, we present an End-to-end Adversarial-attention network for Multi-modal Clustering (EAMC), where adversarial learning is leveraged to align the latent feature distributions and an attention mechanism is leveraged to quantify the importance of modalities. |
1454 | Fast Sparse ConvNets | Erich Elsen; Marat Dukhan; Trevor Gale; Karen Simonyan; | In this work, we further expand the arsenal of efficient building blocks for neural network architectures; but instead of combining standard primitives (such as convolution), we advocate for the replacement of these dense primitives with their sparse counterparts. |
1455 | Few Sample Knowledge Distillation for Efficient Network Compression | Tianhong Li; Jianguo Li; Zhuang Liu; Changshui Zhang; | This paper proposes a novel solution for knowledge distillation from a few label-free samples to realize both data efficiency and training/processing efficiency. |
1456 | Predicting Sharp and Accurate Occlusion Boundaries in Monocular Depth Estimation Using Displacement Fields | Michael Ramamonjisoa; Yuming Du; Vincent Lepetit; | We instead learn to predict, given a depth map predicted by some reconstruction method, a 2D displacement field able to re-sample pixels around the occlusion boundaries into sharper reconstructions. |
1457 | Shape correspondence using anisotropic Chebyshev spectral CNNs | Qinsong Li; Shengjun Liu; Ling Hu; Xinru Liu; | In this paper, we propose a novel architecture for shape correspondence, termed Anisotropic Chebyshev spectral CNNs (ACSCNNs), based on a new extension of the manifold convolution operator. |
1458 | RetinaTrack: Online Single Stage Joint Detection and Tracking | Zhichao Lu; Vivek Rathod; Ronny Votel; Jonathan Huang; | In this paper we focus on the tracking-by-detection paradigm for autonomous driving where both tasks are mission critical. |
1459 | Multimodal Categorization of Crisis Events in Social Media | Mahdi Abavisani; Liwei Wu; Shengli Hu; Joel Tetreault; Alejandro Jaimes; | In this paper, we present a new multimodal fusion method that leverages both images and texts as input. |
1460 | SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings | Wenyu Han; Siyuan Xiang; Chenhui Liu; Ruoyu Wang; Chen Feng; | Can deep networks be trained to perform spatial reasoning tasks? How can we measure their “spatial intelligence”? To answer these questions, we present the SPARE3D dataset. |
1461 | SwapText: Image Based Texts Transfer in Scenes | Qiangpeng Yang; Jun Huang; Wei Lin; | In this work, we present SwapText, a three-stage framework to transfer texts across scene images. |
1462 | OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by learning to unfold | Mohamed Yousef; Tom E. Bishop; | We propose a novel and simple neural network module, termed OrigamiNet, that can augment any CTC-trained, fully convolutional single line text recognizer, to convert it into a multi-line version by providing the model with enough spatial capacity to be able to properly collapse a 2D input signal into 1D without losing information. |
1463 | FroDO: From Detections to 3D Objects | Martin Runz; Kejie Li; Meng Tang; Lingni Ma; Chen Kong; Tanner Schmidt; Ian Reid; Lourdes Agapito; Julian Straub; Steven Lovegrove; Richard Newcombe; | We introduce FroDO, a method for accurate 3D reconstruction of object instances from RGB video that infers their location, pose and shape in a coarse to fine manner. |
1464 | Single-Step Adversarial Training With Dropout Scheduling | Vivek B.S.; R. Venkatesh Babu; | In this work, (i) we show that models trained using single-step adversarial training method learn to prevent the generation of single-step adversaries, and this is due to over-fitting of the model during the initial stages of training, and (ii) to mitigate this effect, we propose a single-step adversarial training method with dropout scheduling. |
1465 | Learning to Super Resolve Intensity Images From Events | S. Mohammad Mostafavi I.; Jonghyun Choi; Kuk-Jin Yoon; | We propose an end-to-end network to reconstruct high resolution, high dynamic range (HDR) images directly from the event stream. |
1466 | DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection | Liming Jiang; Ren Li; Wayne Wu; Chen Qian; Chen Change Loy; | In this paper, we present our ongoing effort of constructing a large-scale benchmark, DeeperForensics-1.0, for face forgery detection. |
1467 | CNN-Generated Images Are Surprisingly Easy to Spot… for Now | Sheng-Yu Wang; Oliver Wang; Richard Zhang; Andrew Owens; Alexei A. Efros; | In this work, we ask whether it is possible to create a “universal” detector for telling apart real images from those generated by a CNN, regardless of architecture or dataset used. |