Paper Digest: CVPR 2020 Highlights

June 7, 2020 (updated November 10, 2020)

Download CVPR-2020-Paper-Digests.pdf: highlights of all CVPR 2020 papers. Readers can also read this highlight article on our console, which lets users filter papers by keyword and find related papers.

The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) is one of the top computer vision conferences in the world. In 2020, it is being held virtually due to the COVID-19 pandemic. There were more than 6,600 paper submissions, of which ~1,470 were accepted. Code has also been published for more than 200 of these papers (download link).

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: Paper Digest: CVPR 2020 Highlights v0

	Title	Authors	Highlight
1	Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild	Shangzhe Wu; Christian Rupprecht; Andrea Vedaldi;	We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision.
2	Footprints and Free Space From a Single Color Image	Jamie Watson; Michael Firman; Aron Monszpart; Gabriel J. Brostow;	We introduce a model to predict the geometry of both visible and occluded traversable surfaces, given a single RGB image as input.
3	Dynamic Fluid Surface Reconstruction Using Deep Neural Network	Simron Thapa; Nianyi Li; Jinwei Ye;	Here we present a learning-based single-image approach for 3D fluid surface reconstruction.
4	CvxNet: Learnable Convex Decomposition	Boyang Deng; Kyle Genova; Soroosh Yazdani; Sofien Bouaziz; Geoffrey Hinton; Andrea Tagliasacchi;	We introduce a network architecture to represent a low-dimensional family of convexes.
5	BSP-Net: Generating Compact Meshes via Binary Space Partitioning	Zhiqin Chen; Andrea Tagliasacchi; Hao Zhang;	The core ingredient of BSP is an operation for recursive subdivision of space to obtain convex sets. By exploiting this property, we devise BSP-Net, a network that learns to represent a 3D shape via convex decomposition.
6	Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes From a Single Image	Yinyu Nie; Xiaoguang Han; Shihui Guo; Yujian Zheng; Jian Chang; Jian Jun Zhang;	In this paper, we bridge the gap between understanding and reconstruction, and propose an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image.
7	Generating and Exploiting Probabilistic Monocular Depth Estimates	Zhihao Xia; Patrick Sullivan; Ayan Chakrabarti;	We propose a versatile task-agnostic monocular model that outputs a probability distribution over scene depth given an input color image, as a sample approximation of outputs from a patch-wise conditional VAE.
8	Neural Cages for Detail-Preserving 3D Deformations	Wang Yifan; Noam Aigerman; Vladimir G. Kim; Siddhartha Chaudhuri; Olga Sorkine-Hornung;	We propose a novel learnable representation for detail preserving shape deformation.
9	PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization	Shunsuke Saito; Tomas Simon; Jason Saragih; Hanbyul Joo;	Due to memory limitations in current hardware, previous approaches tend to take low resolution images as input to cover large spatial context, and produce less precise (or low resolution) 3D estimates as a result. We address this limitation by formulating a multi-level architecture that is end-to-end trainable.
10	A Lighting-Invariant Point Processor for Shading	Kathryn Heal; Jialiang Wang; Steven J. Gortler; Todd Zickler;	We describe the geometry of this variety, and we introduce a concise feedforward model that computes an explicit, differentiable approximation of the variety from the intensity and its derivatives at any single image point.
11	ActiveMoCap: Optimized Viewpoint Selection for Active Human Motion Capture	Sena Kiciroglu; Helge Rhodin; Sudipta N. Sinha; Mathieu Salzmann; Pascal Fua;	Specifically, given a short video sequence, we introduce an algorithm that predicts which viewpoints should be chosen to capture future frames so as to maximize 3D human pose estimation accuracy.
12	Peek-a-Boo: Occlusion Reasoning in Indoor Scenes With Plane Representations	Ziyu Jiang; Buyu Liu; Samuel Schulter; Zhangyang Wang; Manmohan Chandraker;	We address the challenging task of occlusion-aware indoor 3D scene understanding.
13	Multi-Modal Domain Adaptation for Fine-Grained Action Recognition	Jonathan Munro; Dima Damen;	In this work, we exploit the correspondence of modalities as a self-supervised alignment approach for unsupervised domain adaptation (UDA), in addition to adversarial alignment.
14	Evolving Losses for Unsupervised Video Representation Learning	AJ Piergiovanni; Anelia Angelova; Michael S. Ryoo;	We present a new method to learn video representations from large-scale unlabeled video data.
15	Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition	Ziyu Liu; Hongwen Zhang; Zhenghao Chen; Zhiyong Wang; Wanli Ouyang;	In this work, we present (1) a simple method to disentangle multi-scale graph convolutions and (2) a unified spatial-temporal graph convolutional operator named G3D.
16	A Multigrid Method for Efficiently Training Video Models	Chao-Yuan Wu; Ross Girshick; Kaiming He; Christoph Feichtenhofer; Philipp Krahenbuhl;	Inspired by multigrid methods in numerical optimization, we propose to use variable mini-batch shapes with different spatial-temporal resolutions that are varied according to a schedule.
17	Ego-Topo: Environment Affordances From Egocentric Video	Tushar Nagarajan; Yanghao Li; Christoph Feichtenhofer; Kristen Grauman;	We introduce a model for environment affordances that is learned directly from egocentric video.
18	Generative Hybrid Representations for Activity Forecasting With No-Regret Learning	Jiaqi Guan; Ye Yuan; Kris M. Kitani; Nicholas Rhinehart;	In this work, we develop an efficient deep generative model to jointly forecast a person’s future discrete actions and continuous motions.
19	Skeleton-Based Action Recognition With Shift Graph Convolutional Network	Ke Cheng; Yifan Zhang; Xiangyu He; Weihan Chen; Jian Cheng; Hanqing Lu;	In this paper, we propose a novel shift graph convolutional network (Shift-GCN) to overcome both shortcomings.
20	Predicting Goal-Directed Human Attention Using Inverse Reinforcement Learning	Zhibo Yang; Lihan Huang; Yupei Chen; Zijun Wei; Seoyoung Ahn; Gregory Zelinsky; Dimitris Samaras; Minh Hoai;	We propose the first inverse reinforcement learning (IRL) model to learn the internal reward function and policy used by humans during visual search.
21	X3D: Expanding Architectures for Efficient Video Recognition	Christoph Feichtenhofer;	This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth.
22	Dynamic Multiscale Graph Neural Networks for 3D Skeleton Based Human Motion Prediction	Maosen Li; Siheng Chen; Yangheng Zhao; Ya Zhang; Yanfeng Wang; Qi Tian;	We propose novel dynamic multiscale graph neural networks (DMGNN) to predict 3D skeleton-based human motions.
23	Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects	Kiana Ehsani; Shubham Tulsiani; Saurabh Gupta; Ali Farhadi; Abhinav Gupta;	In this paper, we take a step towards more physical understanding of actions.
24	DaST: Data-Free Substitute Training for Adversarial Attacks	Mingyi Zhou; Jing Wu; Yipeng Liu; Shuaicheng Liu; Ce Zhu;	In this paper, we propose a data-free substitute training method (DaST) to obtain substitute models for adversarial black-box attacks without the requirement of any real data.
25	Towards Verifying Robustness of Neural Networks Against A Family of Semantic Perturbations	Jeet Mohapatra; Tsui-Wei Weng; Pin-Yu Chen; Sijia Liu; Luca Daniel;	To bridge this gap, we propose Semantify-NN, a model-agnostic and generic robustness verification approach against semantic perturbations for neural networks.
26	The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks	Yuheng Zhang; Ruoxi Jia; Hengzhi Pei; Wenxiao Wang; Bo Li; Dawn Song;	Here we present a novel attack method, termed the generative model-inversion attack, which can invert deep neural networks with high success rates.
27	A Self-supervised Approach for Adversarial Robustness	Muzammal Naseer; Salman Khan; Munawar Hayat; Fahad Shahbaz Khan; Fatih Porikli;	In this paper, we take the first step to combine the benefits of both approaches and propose a self-supervised adversarial training mechanism in the input space.
28	Adversarial Vertex Mixup: Toward Better Adversarially Robust Generalization	Saehyung Lee; Hyungyu Lee; Sungroh Yoon;	In this paper, we identify Adversarial Feature Overfitting (AFO), which may cause poor adversarially robust generalization, and we show that adversarial training can overshoot the optimal point in terms of robust generalization, leading to AFO in our simple Gaussian model.
29	How Does Noise Help Robustness? Explanation and Exploration under the Neural SDE Framework	Xuanqing Liu; Tesi Xiao; Si Si; Qin Cao; Sanjiv Kumar; Cho-Jui Hsieh;	In this paper, we propose a new continuous neural network framework called Neural Stochastic Differential Equation (Neural SDE), which naturally incorporates various commonly used regularization mechanisms based on random noise injection.
30	Unpaired Image Super-Resolution Using Pseudo-Supervision	Shunta Maeda;	In this paper, we propose an unpaired SR method using a generative adversarial network that does not require a paired/aligned training dataset.
31	Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs	Soheil Kolouri; Aniruddha Saha; Hamed Pirsiavash; Heiko Hoffmann;	In this paper, we introduce a benchmark technique for detecting backdoor attacks (aka Trojan attacks) on deep convolutional neural networks (CNNs).
32	Robustness Guarantees for Deep Neural Networks on Videos	Min Wu; Marta Kwiatkowska;	In this paper, we consider the robustness of deep neural networks on videos, which comprise both the spatial features of individual frames extracted by a convolutional neural network and the temporal dynamics between adjacent frames captured by a recurrent neural network.
33	Benchmarking Adversarial Robustness on Image Classification	Yinpeng Dong; Qi-An Fu; Xiao Yang; Tianyu Pang; Hang Su; Zihao Xiao; Jun Zhu;	In this paper, we establish a comprehensive, rigorous, and coherent benchmark to evaluate adversarial robustness on image classification tasks.
34	What It Thinks Is Important Is Important: Robustness Transfers Through Input Gradients	Alvin Chan; Yi Tay; Yew-Soon Ong;	Using only natural images, we show here that training a student model’s input gradients to match those of a robust teacher model can gain robustness close to a strong baseline that is robustly trained from scratch.
35	Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking	Hongjun Wang; Guangrun Wang; Ya Li; Dongyu Zhang; Liang Lin;	In this work, we examine the insecurity of current best-performing ReID models by proposing a learning-to-mis-rank formulation to perturb the ranking of the system output.
36	Video Modeling With Correlation Networks	Heng Wang; Du Tran; Lorenzo Torresani; Matt Feiszli;	This paper proposes an alternative approach based on a learnable correlation operator that can be used to establish frame-to-frame matches over convolutional feature maps in the different layers of the network.
37	Projection & Probability-Driven Black-Box Attack	Jie Li; Rongrong Ji; Hong Liu; Jianzhuang Liu; Bineng Zhong; Cheng Deng; Qi Tian;	In this paper, we propose Projection & Probability-driven Black-box Attack (PPBA) to tackle this problem by reducing the solution space and providing better optimization.
38	Auxiliary Training: Towards Accurate and Robust Models	Linfeng Zhang; Muzhou Yu; Tong Chen; Zuoqiang Shi; Chenglong Bao; Kaisheng Ma;	In this paper, we propose a novel training method that introduces auxiliary classifiers for training on corrupted samples, while clean samples are trained normally with the primary classifier.
39	PaStaNet: Toward Human Activity Knowledge Engine	Yong-Lu Li; Liang Xu; Xinpeng Liu; Xijie Huang; Yue Xu; Shiyi Wang; Hao-Shu Fang; Ze Ma; Mingyang Chen; Cewu Lu;	We propose a new path: infer human part states first, and then reason out the activities based on part-level semantics.
40	A Hierarchical Graph Network for 3D Object Detection on Point Clouds	Jintai Chen; Biwen Lei; Qingyu Song; Haochao Ying; Danny Z. Chen; Jian Wu;	In this paper, we propose a new graph convolution (GConv) based hierarchical graph network (HGNet) for 3D object detection, which processes raw point clouds directly to predict 3D bounding boxes.
41	Learning Generative Models of Shape Handles	Matheus Gadelha; Giorgio Gori; Duygu Ceylan; Radomir Mech; Nathan Carr; Tamy Boubekeur; Rui Wang; Subhransu Maji;	We present a generative model to synthesize 3D shapes as sets of handles — lightweight proxies that approximate the original 3D shape — for applications in interactive editing, shape parsing, and building compact 3D representations.
42	One Man’s Trash Is Another Man’s Treasure: Resisting Adversarial Examples by Adversarial Examples	Chang Xiao; Changxi Zheng;	We embrace the omnipresence of adversarial examples and the numerical procedure of crafting them, and turn this harmful attacking process into a useful defense mechanism.
43	Toward a Universal Model for Shape From Texture	Dor Verbin; Todd Zickler;	We consider the shape from texture problem, where the input is a single image of a curved, textured surface, and the texture and shape are both a priori unknown.
44	HybridPose: 6D Object Pose Estimation Under Hybrid Representations	Chen Song; Jiaru Song; Qixing Huang;	We introduce HybridPose, a novel 6D object pose estimation approach.
45	Boundary-Aware 3D Building Reconstruction From a Single Overhead Image	Jisan Mahmud; True Price; Akash Bapat; Jan-Michael Frahm;	We propose a boundary-aware multi-task deep-learning-based framework for fast 3D building modeling from a single overhead image.
46	Articulation-Aware Canonical Surface Mapping	Nilesh Kulkarni; Abhinav Gupta; David F. Fouhey; Shubham Tulsiani;	We tackle the tasks of: 1) predicting a Canonical Surface Mapping (CSM) that indicates the mapping from 2D pixels to corresponding points on a canonical template shape, and 2) inferring the articulation and pose of the template corresponding to the input image.
47	BiFuse: Monocular 360 Depth Estimation via Bi-Projection Fusion	Fu-En Wang; Yu-Hsuan Yeh; Min Sun; Wei-Chen Chiu; Yi-Hsuan Tsai;	We propose a bi-projection fusion scheme along with learnable masks to balance the feature maps from the two projections.
48	Transformation GAN for Unsupervised Image Synthesis and Representation Learning	Jiayu Wang; Wengang Zhou; Guo-Jun Qi; Zhongqian Fu; Qi Tian; Houqiang Li;	To improve both image synthesis quality and representation learning performance under the unsupervised setting, in this paper, we propose a simple yet effective Transformation Generative Adversarial Networks (TrGAN).
49	PPDM: Parallel Point Detection and Matching for Real-Time Human-Object Interaction Detection	Yue Liao; Si Liu; Fei Wang; Yanjie Chen; Chen Qian; Jiashi Feng;	We propose a single-stage Human-Object Interaction (HOI) detection method that outperforms all existing methods on the HICO-DET dataset while running at 37 fps on a single Titan XP GPU.
50	Height and Uprightness Invariance for 3D Prediction From a Single View	Manel Baradad; Antonio Torralba;	We propose a system that directly regresses 3D world coordinates for each pixel.
51	SCT: Set Constrained Temporal Transformer for Set Supervised Action Segmentation	Mohsen Fayyaz; Jurgen Gall;	In order to address this task, we propose an approach that can be trained end-to-end on such data.
52	3DV: 3D Dynamic Voxel for Action Recognition in Depth Video	Yancheng Wang; Yang Xiao; Fu Xiong; Wenxiang Jiang; Zhiguo Cao; Joey Tianyi Zhou; Junsong Yuan;	With 3D space voxelization, the key idea of 3DV is to encode the 3D motion information within depth video into a regular voxel set (i.e., 3DV) compactly, via temporal rank pooling.
53	Adaptive Interaction Modeling via Graph Operations Search	Haoxin Li; Wei-Shi Zheng; Yu Tao; Haifeng Hu; Jian-Huang Lai;	In this paper, we automate the process of structure design to learn adaptive structures for interaction modeling.
54	Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction	Yuan Yao; Nico Schertler; Enrique Rosales; Helge Rhodin; Leonid Sigal; Alla Sheffer;	In this work, we induce structure and geometric constraints by leveraging three core observations: (1) the surface of most everyday objects is often almost entirely exposed from pairs of typical opposite views; (2) everyday objects often exhibit global reflective symmetries which can be accurately predicted from single views; (3) opposite orthographic views of a 3D shape share consistent silhouettes.
55	SDC-Depth: Semantic Divide-and-Conquer Network for Monocular Depth Estimation	Lijun Wang; Jianming Zhang; Oliver Wang; Zhe Lin; Huchuan Lu;	We propose a deep neural network model based on a semantic divide-and-conquer approach.
56	Single-View View Synthesis With Multiplane Images	Richard Tucker; Noah Snavely;	Our method learns to predict a multiplane image directly from a single image input, and we introduce scale-invariant view synthesis for supervision, enabling us to train on online video.
57	Deep Parametric Shape Predictions Using Distance Fields	Dmitriy Smirnov; Matthew Fisher; Vladimir G. Kim; Richard Zhang; Justin Solomon;	We propose a new framework for predicting parametric shape primitives using deep learning.
58	Leveraging Photometric Consistency Over Time for Sparsely Supervised Hand-Object Reconstruction	Yana Hasson; Bugra Tekin; Federica Bogo; Ivan Laptev; Marc Pollefeys; Cordelia Schmid;	To overcome this challenge we present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
59	Ensemble Generative Cleaning With Feedback Loops for Defending Adversarial Attacks	Jianhe Yuan; Zhihai He;	In this paper, we develop a new method called ensemble generative cleaning with feedback loops (EGC-FL) for effective defense of deep neural networks.
60	Temporal Pyramid Network for Action Recognition	Ceyuan Yang; Yinghao Xu; Jianping Shi; Bo Dai; Bolei Zhou;	In this work we propose a generic Temporal Pyramid Network (TPN) at the feature-level, which can be flexibly integrated into 2D or 3D backbone networks in a plug-and-play manner.
61	FaceScape: A Large-Scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction	Haotian Yang; Hao Zhu; Yanru Wang; Mingkai Huang; Qiu Shen; Ruigang Yang; Xun Cao;	In this paper, we present a large-scale detailed 3D face dataset, FaceScape, and propose a novel algorithm that is able to predict elaborate riggable 3D face models from a single image input.
62	Structure-Guided Ranking Loss for Single Image Depth Prediction	Ke Xian; Jianming Zhang; Oliver Wang; Long Mai; Zhe Lin; Zhiguo Cao;	To more effectively learn from such pseudo-depth data, we propose to use a simple pair-wise ranking loss with a novel sampling strategy.
63	In Perfect Shape: Certifiably Optimal 3D Shape Reconstruction From 2D Landmarks	Heng Yang; Luca Carlone;	We study the problem of 3D shape reconstruction from 2D landmarks extracted in a single image.
64	When NAS Meets Robustness: In Search of Robust Architectures Against Adversarial Attacks	Minghao Guo; Yuzhe Yang; Rui Xu; Ziwei Liu; Dahua Lin;	In this work, we take an architectural perspective and investigate the patterns of network architectures that are resilient to adversarial attacks.
65	Towards Transferable Targeted Attack	Maosen Li; Cheng Deng; Tengjiao Li; Junchi Yan; Xinbo Gao; Heng Huang;	We propose a novel targeted attack approach to effectively generate more transferable adversarial examples.
66	Self-Supervised Human Depth Estimation From Monocular Videos	Feitong Tan; Hao Zhu; Zhaopeng Cui; Siyu Zhu; Marc Pollefeys; Ping Tan;	This paper presents a self-supervised method that can be trained on YouTube videos without known depth, which makes training data collection simple and improves the generalization of the learned network.
67	Recursive Social Behavior Graph for Trajectory Prediction	Jianhua Sun; Qinhong Jiang; Cewu Lu;	In this paper, we present a novel insight into group-based social interaction modeling to explore relationships among pedestrians.
68	Context-Aware and Scale-Insensitive Temporal Repetition Counting	Huaidong Zhang; Xuemiao Xu; Guoqiang Han; Shengfeng He;	In this paper, we tailor a context-aware and scale-insensitive framework to tackle the challenges in repetition counting caused by unknown and diverse cycle lengths.
69	OASIS: A Large-Scale Dataset for Single Image 3D in the Wild	Weifeng Chen; Shengyi Qian; David Fan; Noriyuki Kojima; Max Hamilton; Jia Deng;	We address this issue by presenting Open Annotations of Single Image Surfaces (OASIS), a dataset for single-image 3D in the wild consisting of annotations of detailed 3D geometry for 140,000 images.
70	VPLNet: Deep Single View Normal Estimation With Vanishing Points and Lines	Rui Wang; David Geraghty; Kevin Matzen; Richard Szeliski; Jan-Michael Frahm;	We present a novel single-view surface normal estimation method that combines traditional line and vanishing point analysis with a deep learning approach.
71	Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning	Tianlong Chen; Sijia Liu; Shiyu Chang; Yu Cheng; Lisa Amini; Zhangyang Wang;	We introduce adversarial training into self-supervision, to provide general-purpose robust pretrained models for the first time.
72	Defending Against Universal Attacks Through Selective Feature Regeneration	Tejas Borkar; Felix Heide; Lina Karam;	Departing from existing defense strategies that work mostly in the image domain, we present a novel defense which operates in the DNN feature domain and effectively defends against such universal perturbations.
73	Universal Physical Camouflage Attacks on Object Detectors	Lifeng Huang; Chengying Gao; Yuyin Zhou; Cihang Xie; Alan L. Yuille; Changqing Zou; Ning Liu;	In this paper, we study physical adversarial attacks on object detectors in the wild.
74	Intra- and Inter-Action Understanding via Temporal Action Parsing	Dian Shao; Yue Zhao; Bo Dai; Dahua Lin;	Towards this goal, we construct TAPOS, a new dataset developed on sport videos with manual annotations of sub-actions, and conduct a study of temporal action parsing on top of it.
75	Lightweight Photometric Stereo for Facial Details Recovery	Xueying Wang; Yudong Guo; Bailin Deng; Juyong Zhang;	In this paper, we present a lightweight strategy that only requires sparse inputs or even a single image to recover high-fidelity face shapes with images captured under near-field lights.
76	Bundle Pooling for Polygonal Architecture Segmentation Problem	Huayi Zeng; Kevin Joseph; Adam Vest; Yasutaka Furukawa;	This paper introduces a polygonal architecture segmentation problem, proposes bundle-pooling modules for line structure reasoning, and demonstrates a virtual remodeling application that produces production quality results.
77	AvatarMe: Realistically Renderable 3D Facial Reconstruction "In-the-Wild"	Alexandros Lattas; Stylianos Moschoglou; Baris Gecer; Stylianos Ploumpis; Vasileios Triantafyllou; Abhijeet Ghosh; Stefanos Zafeiriou;	In this paper, we introduce AvatarMe, the first method that is able to reconstruct photorealistic 3D faces from a single "in-the-wild" image with an increasing level of detail.
78	Defending Against Model Stealing Attacks With Adaptive Misinformation	Sanjay Kariyappa; Moinuddin K. Qureshi;	We propose "Adaptive Misinformation" to defend against such model stealing attacks.
79	Learning to Generate 3D Training Data Through Hybrid Gradient	Dawei Yang; Jia Deng;	In this work, we propose a new method that optimizes the generation of 3D training data based on what we call "hybrid gradient".
80	Cascaded Refinement Network for Point Cloud Completion	Xiaogang Wang; Marcelo H. Ang Jr.; Gim Hee Lee;	To this end, we propose a cascaded refinement network together with a coarse-to-fine strategy to synthesize the detailed object shapes.
81	Enhancing Intrinsic Adversarial Robustness via Feature Pyramid Decoder	Guanlin Li; Shuya Ding; Jun Luo; Chang Liu;	In this paper, we propose an attack-agnostic defense framework to enhance the intrinsic robustness of neural networks without jeopardizing their ability to generalize to clean samples.
82	Learning to Discriminate Information for Online Action Detection	Hyunjun Eun; Jinyoung Moon; Jongyoul Park; Chanho Jung; Changick Kim;	For online action detection, in this paper, we propose a novel recurrent unit to explicitly discriminate the information relevant to an ongoing action from others.
83	Adversarial Examples Improve Image Recognition	Cihang Xie; Mingxing Tan; Boqing Gong; Jiang Wang; Alan L. Yuille; Quoc V. Le;	Here we present an opposite perspective: adversarial examples can be used to improve image recognition models if harnessed in the right manner.
84	PQ-NET: A Generative Part Seq2Seq Network for 3D Shapes	Rundi Wu; Yixin Zhuang; Kai Xu; Hao Zhang; Baoquan Chen;	We introduce PQ-NET, a deep neural network which represents and generates 3D shapes via sequential part assembly.
85	Actor-Transformers for Group Activity Recognition	Kirill Gavrilyuk; Ryan Sanford; Mehrsan Javan; Cees G. M. Snoek;	While existing solutions for this challenging problem explicitly model spatial and temporal relationships based on location of individual actors, we propose an actor-transformer model able to learn and selectively extract information relevant for group activity recognition.
86	SG-NN: Sparse Generative Neural Networks for Self-Supervised Scene Completion of RGB-D Scans	Angela Dai; Christian Diller; Matthias Niessner;	We present a novel approach that converts partial and noisy RGB-D scans into high-quality 3D scene reconstructions by inferring unobserved scene geometry.
87	Geometry-Aware Satellite-to-Ground Image Synthesis for Urban Areas	Xiaohu Lu; Zuoyue Li; Zhaopeng Cui; Martin R. Oswald; Marc Pollefeys; Rongjun Qin;	We present a novel method for generating panoramic street-view images which are geometrically consistent with a given satellite image.
88	Action Modifiers: Learning From Adverbs in Instructional Videos	Hazel Doughty; Ivan Laptev; Walterio Mayol-Cuevas; Dima Damen;	We present a method to learn a representation for adverbs from instructional videos using weak supervision from the accompanying narrations.
89	ZSTAD: Zero-Shot Temporal Activity Detection	Lingling Zhang; Xiaojun Chang; Jun Liu; Minnan Luo; Sen Wang; Zongyuan Ge; Alexander Hauptmann;	To solve this challenging problem, we propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen in training can still be detected.
90	Geometric Structure Based and Regularized Depth Estimation From 360 Indoor Imagery	Lei Jin; Yanyu Xu; Jia Zheng; Junfei Zhang; Rui Tang; Shugong Xu; Jingyi Yu; Shenghua Gao;	Motivated by the correlation between the depth and the geometric structure of a 360 indoor image, we propose a novel learning-based depth estimation framework that leverages the geometric structure of a scene to conduct depth estimation.
91	Deep Kinematics Analysis for Monocular 3D Human Pose Estimation	Jingwei Xu; Zhenbo Yu; Bingbing Ni; Jiancheng Yang; Xiaokang Yang; Wenjun Zhang;	In this paper, we propose to address the above issue from a systematic perspective.
92	TEA: Temporal Excitation and Aggregation for Action Recognition	Yan Li; Bin Ji; Xintian Shi; Jianguo Zhang; Bin Kang; Limin Wang;	In this paper, we propose a Temporal Excitation and Aggregation (TEA) block, including a motion excitation (ME) module and a multiple temporal aggregation (MTA) module, specifically designed to capture both short- and long-range temporal evolution.
93	Oops! Predicting Unintentional Action in Video	Dave Epstein; Boyuan Chen; Carl Vondrick;	We introduce a dataset of in-the-wild videos of unintentional action, as well as a suite of tasks for recognizing, localizing, and anticipating its onset.
94	Scene Recomposition by Learning-Based ICP	Hamid Izadinia; Steven M. Seitz;	In addition to the fully automatic system, the key technical contribution is a novel approach for aligning CAD models to 3D scans, based on deep reinforcement learning.
95	Enhancing Cross-Task Black-Box Transferability of Adversarial Examples With Dispersion Reduction	Yantao Lu; Yunhan Jia; Jianyu Wang; Bai Li; Weiheng Chai; Lawrence Carin; Senem Velipasalar;	Our proposed attack minimizes the “dispersion” of the internal feature map, overcoming the limitations of existing attacks that require task-specific loss functions and/or probing a target model.
96	Deep Non-Line-of-Sight Reconstruction	Javier Grau Chopite; Matthias B. Hullin; Michael Wand; Julian Iseringhausen;	In this paper, we employ convolutional feed-forward networks for solving the reconstruction problem efficiently while maintaining good reconstruction quality.
97	SSRNet: Scalable 3D Surface Reconstruction Network	Zhenxing Mi; Yiming Luo; Wenbing Tao;	In this paper, we propose the SSRNet, a novel scalable learning-based method for surface reconstruction.
98	Progressive Relation Learning for Group Activity Recognition	Guyue Hu; Bo Cui; Yuan He; Shan Yu;	In this paper, we propose a novel method based on deep reinforcement learning to progressively refine the low-level features and high-level relations of group activities.
99	Cooling-Shrinking Attack: Blinding the Tracker With Imperceptible Noises	Bin Yan; Dong Wang; Huchuan Lu; Xiaoyun Yang;	In this paper, a cooling-shrinking attack method is proposed to deceive state-of-the-art SiameseRPN-based trackers.
100	Adversarial Camouflage: Hiding Physical-World Attacks With Natural Styles	Ranjie Duan; Xingjun Ma; Yisen Wang; James Bailey; A. K. Qin; Yun Yang;	In this paper, we propose a novel approach, called Adversarial Camouflage (AdvCam), to craft and camouflage physical-world adversarial examples into natural styles that appear legitimate to human observers.
101	Weakly-Supervised Action Localization by Generative Attention Modeling	Baifeng Shi; Qi Dai; Yadong Mu; Jingdong Wang;	To solve the problem, in this paper we propose to model the class-agnostic frame-wise probability conditioned on the frame attention using a conditional Variational Auto-Encoder (VAE).
102	Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes	Sravanti Addepalli; Vivek B.S.; Arya Baburaj; Gaurang Sriramanan; R. Venkatesh Babu;	In this work, we attempt to address this problem by training networks to form coarse impressions based on the information in higher bit planes, and use the lower bit planes only to refine their prediction.
103	Polishing Decision-Based Adversarial Noise With a Customized Sampling	Yucheng Shi; Yahong Han; Qi Tian;	In this paper, we demonstrate the advantage of using current noise and historical queries to customize the variance and mean of sampling in the boundary attack to polish adversarial noise.
104	Towards Large Yet Imperceptible Adversarial Image Perturbations With Perceptual Color Distance	Zhengyu Zhao; Zhuoran Liu; Martha Larson;	In this work, we drop this assumption by pursuing an approach that exploits human color perception, and more specifically, minimizing perturbation size with respect to perceptual color distance.
105	Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks	Joanna Materzynska; Tete Xiao; Roei Herzig; Huijuan Xu; Xiaolong Wang; Trevor Darrell;	In this paper, we study the compositionality of action by looking into the dynamics of subject-object interactions.
106	Learning Unsupervised Hierarchical Part Decomposition of 3D Objects From a Single RGB Image	Despoina Paschalidou; Luc Van Gool; Andreas Geiger;	We address this challenging problem by proposing a novel formulation that allows us to jointly recover the geometry of a 3D object as a set of primitives as well as their latent hierarchical structure without part-level supervision.
107	Focus on Defocus: Bridging the Synthetic to Real Domain Gap for Depth Estimation	Maxim Maximov; Kevin Galim; Laura Leal-Taixe;	In this paper, we tackle this issue by using domain invariant defocus blur as direct supervision.
108	Active Vision for Early Recognition of Human Actions	Boyu Wang; Lihan Huang; Minh Hoai;	We propose a method for early recognition of human actions, one that can take advantage of multiple cameras while satisfying the constraints due to limited communication bandwidth and processing power.
109	SmallBigNet: Integrating Core and Contextual Views for Video Classification	Xianhang Li; Yali Wang; Zhipeng Zhou; Yu Qiao;	To alleviate this problem, we propose a concise and novel SmallBig network, with the cooperation of small and big views.
110	Gate-Shift Networks for Video Action Recognition	Swathikiran Sudhakaran; Sergio Escalera; Oswald Lanz;	In this paper we introduce spatial gating in spatial-temporal decomposition of 3D kernels.
111	Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition	Pengfei Zhang; Cuiling Lan; Wenjun Zeng; Junliang Xing; Jianru Xue; Nanning Zheng;	In this paper, we propose a simple yet effective semantics-guided neural network (SGN) for skeleton-based action recognition.
112	Exploiting Joint Robustness to Adversarial Perturbations	Ali Dabouei; Sobhan Soleymani; Fariborz Taherkhani; Jeremy Dawson; Nasser M. Nasrabadi;	In this paper, we exploit first-order interactions within ensembles to formalize a reliable and practical defense.
113	From Image Collections to Point Clouds With Self-Supervised Shape and Pose Networks	K L Navaneet; Ansu Mathew; Shashank Kashyap; Wei-Chih Hung; Varun Jampani; R. Venkatesh Babu;	In this work, we propose a deep learning technique for 3D object reconstruction from a single image.
114	Searching for Actions on the Hyperbole	Teng Long; Pascal Mettes; Heng Tao Shen; Cees G. M. Snoek;	In this paper, we introduce hierarchical action search.
115	ColorFool: Semantic Adversarial Colorization	Ali Shahin Shamsabadi; Ricardo Sanchez-Matilla; Andrea Cavallaro;	In this paper, we propose a content-based black-box adversarial attack that generates unrestricted perturbations by exploiting image semantics to selectively modify colors within chosen ranges that are perceived as natural by humans.
116	Boosting the Transferability of Adversarial Samples via Attention	Weibin Wu; Yuxin Su; Xixian Chen; Shenglin Zhao; Irwin King; Michael R. Lyu; Yu-Wing Tai;	In this work, we propose a novel mechanism to alleviate the overfitting issue.
117	ActionBytes: Learning From Trimmed Videos to Localize Actions	Mihir Jain; Amir Ghodrati; Cees G. M. Snoek;	We propose a method to train an action localization network that segments a video into interpretable fragments, we call ActionBytes.
118	Efficient Adversarial Training With Transferable Adversarial Examples	Haizhong Zheng; Ziqi Zhang; Juncheng Gu; Honglak Lee; Atul Prakash;	Leveraging this property, we propose a novel method, Adversarial Training with Transferable Adversarial Examples (ATTA), that can enhance the robustness of trained models and greatly improve the training efficiency by accumulating adversarial perturbations through epochs.
119	Alleviation of Gradient Exploding in GANs: Fake Can Be Real	Song Tao; Jia Wang;	In order to alleviate the notorious mode collapse phenomenon in generative adversarial networks (GANs), we propose a novel training method of GANs in which certain fake samples are considered as real ones during the training process.
120	On Isometry Robustness of Deep 3D Point Cloud Models Under Adversarial Attacks	Yue Zhao; Yuwei Wu; Caihua Chen; Andrew Lim;	Incorporating the Restricted Isometry Property, we propose a novel framework of white-box attack on top of spectral norm based perturbation.
121	Achieving Robustness in the Wild via Adversarial Mixing With Disentangled Representations	Sven Gowal; Chongli Qin; Po-Sen Huang; Taylan Cemgil; Krishnamurthy Dvijotham; Timothy Mann; Pushmeet Kohli;	In this paper, we propose a novel approach to express and formalize robustness to these kinds of real-world transformations of the input.
122	QEBA: Query-Efficient Boundary-Based Blackbox Attack	Huichen Li; Xiaojun Xu; Xiaolu Zhang; Shuang Yang; Bo Li;	In this paper, we propose a Query-Efficient Boundary-based blackbox Attack (QEBA) based only on the model’s final prediction labels.
123	Learning to Simulate Dynamic Environments With GameGAN	Seung Wook Kim; Yuhao Zhou; Jonah Philion; Antonio Torralba; Sanja Fidler;	In this paper, we aim to learn a simulator by simply watching an agent interact with an environment.
124	Learn2Perturb: An End-to-End Feature Perturbation Learning to Improve Adversarial Robustness	Ahmadreza Jeddi; Mohammad Javad Shafiee; Michelle Karg; Christian Scharfenberger; Alexander Wong;	In this study, we introduce Learn2Perturb, an end-to-end feature perturbation learning approach for improving the adversarial robustness of deep neural networks.
125	SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization	Yue Jiang; Dantong Ji; Zhizhong Han; Matthias Zwicker;	We propose SDFDiff, a novel approach for image-based shape optimization using differentiable rendering of 3D shapes represented by signed distance functions (SDFs).
126	Through the Looking Glass: Neural 3D Reconstruction of Transparent Shapes	Zhengqin Li; Yu-Ying Yeh; Manmohan Chandraker;	Our novel contributions include a normal representation that enables the network to model complex light transport through local computation, a rendering layer that models refractions and reflections, a cost volume specifically designed for normal refinement of transparent shapes and a feature mapping based on predicted normals for 3D point cloud reconstruction.
127	TextureFusion: High-Quality Texture Acquisition for Real-Time RGB-D Scanning	Joo Ho Lee; Hyunho Ha; Yue Dong; Xin Tong; Min H. Kim;	In this work, we propose a progressive texture-fusion method specially designed for real-time RGB-D scanning.
128	D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry	Nan Yang; Lukas von Stumberg; Rui Wang; Daniel Cremers;	We propose D3VO as a novel framework for monocular visual odometry that exploits deep networks on three levels — deep depth, pose and uncertainty estimation.
129	Deep Implicit Volume Compression	Danhang Tang; Saurabh Singh; Philip A. Chou; Christian Hane; Mingsong Dou; Sean Fanello; Jonathan Taylor; Philip Davidson; Onur G. Guleryuz; Yinda Zhang; Shahram Izadi; Andrea Tagliasacchi; Sofien Bouaziz; Cem Keskin;	We describe a novel approach for compressing truncated signed distance fields (TSDF) stored in 3D voxel grids, and their corresponding textures.
130	MAGSAC++, a Fast, Reliable and Accurate Robust Estimator	Daniel Barath; Jana Noskova; Maksym Ivashechkin; Jiri Matas;	We propose MAGSAC++ and Progressive NAPSAC sampler, P-NAPSAC in short.
131	OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression	Lila Huang; Shenlong Wang; Kelvin Wong; Jerry Liu; Raquel Urtasun;	We present a novel deep compression algorithm to reduce the memory footprint of LiDAR point clouds.
132	4D Association Graph for Realtime Multi-Person Motion Capture Using Multiple Video Cameras	Yuxiang Zhang; Liang An; Tao Yu; Xiu Li; Kun Li; Yebin Liu;	This paper contributes a novel realtime multi-person motion capture algorithm using multiview video inputs.
133	Upgrading Optical Flow to 3D Scene Flow Through Optical Expansion	Gengshan Yang; Deva Ramanan;	We describe an approach for upgrading 2D optical flow to 3D scene flow.
134	Robust 3D Self-Portraits in Seconds	Zhe Li; Tao Yu; Chuanyu Pan; Zerong Zheng; Yebin Liu;	In this paper, we propose an efficient method for robust 3D self-portraits using a single RGBD camera.
135	FastDVDnet: Towards Real-Time Deep Video Denoising Without Flow Estimation	Matias Tassano; Julie Delon; Thomas Veit;	In this paper, we propose a state-of-the-art video denoising algorithm based on a convolutional neural network architecture.
136	Learning to Have an Ear for Face Super-Resolution	Givi Meishvili; Simon Jenni; Paolo Favaro;	We propose a novel method to use both audio and a low-resolution image to perform extreme face super-resolution (a 16x increase of the input size).
137	Deep Optics for Single-Shot High-Dynamic-Range Imaging	Christopher A. Metzler; Hayato Ikoma; Yifan Peng; Gordon Wetzstein;	Inspired by recent deep optical imaging approaches, we interpret this problem as jointly training an optical encoder and electronic decoder where the encoder is parameterized by the point spread function (PSF) of the lens, the bottleneck is the sensor with a limited dynamic range, and the decoder is a convolutional neural network (CNN).
138	Learning Rank-1 Diffractive Optics for Single-Shot High Dynamic Range Imaging	Qilin Sun; Ethan Tseng; Qiang Fu; Wolfgang Heidrich; Felix Heide;	In this work, we propose a method for snapshot HDR imaging by learning an optical HDR encoding in a single image which maps saturated highlights into neighboring unsaturated areas using a diffractive optical element (DOE).
139	Deep White-Balance Editing	Mahmoud Afifi; Michael S. Brown;	We introduce a deep learning approach to realistically edit an sRGB image’s white balance.
140	Non-Line-of-Sight Surface Reconstruction Using the Directional Light-Cone Transform	Sean I. Young; David B. Lindell; Bernd Girod; David Taubman; Gordon Wetzstein;	We propose a joint albedo-normal approach to non-line-of-sight (NLOS) surface reconstruction using the directional light-cone transform (D-LCT).
141	Seeing the World in a Bag of Chips	Jeong Joon Park; Aleksander Holynski; Steven M. Seitz;	Our contributions include 1) modeling highly specular objects, 2) modeling inter-reflections and Fresnel effects, and 3) enabling surface light field reconstruction with the same input needed to reconstruct shape alone.
142	Correction Filter for Single Image Super-Resolution: Robustifying Off-the-Shelf Deep Super-Resolvers	Shady Abu Hussein; Tom Tirer; Raja Giryes;	Inspired by the literature on generalized sampling, in this work we propose a method for improving the performance of DNNs that have been trained with a fixed kernel on observations acquired by other kernels.
143	Retina-Like Visual Image Reconstruction via Spiking Neural Model	Lin Zhu; Siwei Dong; Jianing Li; Tiejun Huang; Yonghong Tian;	In this paper, we design a retina-like visual image reconstruction framework, which is flexible in reconstructing full texture of natural scenes from the totally new spike data.
144	Plug-and-Play Algorithms for Large-Scale Snapshot Compressive Imaging	Xin Yuan; Yang Liu; Jinli Suo; Qionghai Dai;	In this paper, we develop fast and flexible algorithms for SCI based on the plug-and-play (PnP) framework.
145	Neural Network Pruning With Residual-Connections and Limited-Data	Jian-Hao Luo; Jianxin Wu;	In order to avoid the influence of label noise, we propose a label refinement approach to solve this problem.
146	AdderNet: Do We Really Need Multiplications in Deep Learning?	Hanting Chen; Yunhe Wang; Chunjing Xu; Boxin Shi; Chao Xu; Qi Tian; Chang Xu;	In this paper, we present adder networks (AdderNets) to trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs.
147	NeuralScale: Efficient Scaling of Neurons for Resource-Constrained Deep Neural Networks	Eugene Lee; Chen-Yi Lee;	In this work, we attempt to search for the neuron (filter) configuration of a fixed network architecture that maximizes accuracy.
148	Training Quantized Neural Networks With a Full-Precision Auxiliary Module	Bohan Zhuang; Lingqiao Liu; Mingkui Tan; Chunhua Shen; Ian Reid;	In this paper, we seek to tackle a challenge in training low-precision networks: the notorious difficulty in propagating gradient through a low-precision network due to the non-differentiable quantization function.
149	Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation From a Blackbox Model	Dongdong Wang; Yandong Li; Liqiang Wang; Boqing Gong;	To tackle these challenges, we propose an approach that blends mixup and active learning.
150	Multi-Dimensional Pruning: A Unified Framework for Model Compression	Jinyang Guo; Wanli Ouyang; Dong Xu;	In this work, we propose a unified model compression framework called Multi-Dimensional Pruning (MDP) to simultaneously compress the convolutional neural networks (CNNs) on multiple dimensions.
151	Towards Efficient Model Compression via Learned Global Ranking	Ting-Wu Chin; Ruizhou Ding; Cha Zhang; Diana Marculescu;	To this end, we propose to learn a global ranking of the filters across different layers of the ConvNet, which is used to obtain a set of ConvNet architectures that have different accuracy/latency trade-offs by pruning the bottom-ranked filters.
152	HRank: Filter Pruning Using High-Rank Feature Map	Mingbao Lin; Rongrong Ji; Yan Wang; Yichen Zhang; Baochang Zhang; Yonghong Tian; Ling Shao;	In this paper, we propose a novel filter pruning method by exploring the High Rank of feature maps (HRank).
153	DMCP: Differentiable Markov Channel Pruning for Neural Networks	Shaopeng Guo; Yujie Wang; Quanquan Li; Junjie Yan;	In this paper, we propose a novel differentiable method for channel pruning, named Differentiable Markov Channel Pruning (DMCP), to efficiently search the optimal sub-structure.
154	ReSprop: Reuse Sparsified Backpropagation	Negar Goli; Tor M. Aamodt;	In this work, we focus on accelerating training by observing that about 90% of gradients are reusable during training.
155	Adversarial Texture Optimization From RGB-D Scans	Jingwei Huang; Justus Thies; Angela Dai; Abhijit Kundu; Chiyu "Max" Jiang; Leonidas J. Guibas; Matthias Niessner; Thomas Funkhouser;	In this work, we present a novel approach for color texture generation using a conditional adversarial loss obtained from weakly-supervised views.
156	Synchronizing Probability Measures on Rotations via Optimal Transport	Tolga Birdal; Michael Arbel; Umut Simsekli; Leonidas J. Guibas;	We propose a nonparametric Riemannian particle optimization approach to solve the problem.
157	GhostNet: More Features From Cheap Operations	Kai Han; Yunhe Wang; Qi Tian; Jianyuan Guo; Chunjing Xu; Chang Xu;	This paper proposes a novel Ghost module to generate more feature maps from cheap operations.
158	Attention-Aware Multi-View Stereo	Keyang Luo; Tao Guan; Lili Ju; Yuesong Wang; Zhuo Chen; Yawei Luo;	In this paper, we propose an attention-aware deep neural network "AttMVS" for learning multi-view stereo.
159	Bi3D: Stereo Depth Estimation via Binary Classifications	Abhishek Badki; Alejandro Troccoli; Kihwan Kim; Jan Kautz; Pradeep Sen; Orazio Gallo;	We present Bi3D, a method that estimates depth via a series of binary classifications.
160	Joint Filtering of Intensity Images and Neuromorphic Events for High-Resolution Noise-Robust Imaging	Zihao W. Wang; Peiqi Duan; Oliver Cossairt; Aggelos Katsaggelos; Tiejun Huang; Boxin Shi;	We present a novel computational imaging system with high resolution and low noise.
161	SGAS: Sequential Greedy Architecture Search	Guohao Li; Guocheng Qian; Itzel C. Delgadillo; Matthias Muller; Ali Thabet; Bernard Ghanem;	Aiming to alleviate this common issue, we introduce sequential greedy architecture search (SGAS), an efficient method for neural architecture search.
162	HVNet: Hybrid Voxel Network for LiDAR Based 3D Object Detection	Maosheng Ye; Shuangjie Xu; Tongyi Cao;	We present Hybrid Voxel Network (HVNet), a novel one-stage unified network for point cloud based 3D object detection for autonomous driving.
163	Frequency Domain Compact 3D Convolutional Neural Networks	Hanting Chen; Yunhe Wang; Han Shu; Yehui Tang; Chunjing Xu; Boxin Shi; Chao Xu; Qi Tian; Chang Xu;	In this paper, we develop a novel approach for eliminating redundancy in the time dimensionality of 3D convolution filters by converting them into the frequency domain through a series of learned optimal transforms with far fewer parameters.
164	Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline	Yu-Lun Liu; Wei-Sheng Lai; Yu-Sheng Chen; Yi-Lung Kao; Ming-Hsuan Yang; Yung-Yu Chuang; Jia-Bin Huang;	In contrast to existing learning-based methods, our core idea is to incorporate the domain knowledge of the LDR image formation pipeline into our model.
165	DNU: Deep Non-Local Unrolling for Computational Spectral Imaging	Lizhi Wang; Chen Sun; Maoqing Zhang; Ying Fu; Hua Huang;	In this paper, we propose an interpretable neural network for computational spectral imaging.
166	Single Image Optical Flow Estimation With an Event Camera	Liyuan Pan; Miaomiao Liu; Richard Hartley;	In this paper, we propose a single image (potentially blurred) and events based optical flow estimation approach.
167	Multi-View Neural Human Rendering	Minye Wu; Yuehao Wang; Qiang Hu; Jingyi Yu;	We present an end-to-end Neural Human Renderer (NHR) for dynamic human captures under the multi-view setting.
168	Depth Sensing Beyond LiDAR Range	Kai Zhang; Jiaxin Xie; Noah Snavely; Qifeng Chen;	To that end, we propose a novel three-camera system that utilizes small field of view cameras.
169	Event Probability Mask (EPM) and Event Denoising Convolutional Neural Network (EDnCNN) for Neuromorphic Cameras	R. Wes Baldwin; Mohammed Almatrafi; Vijayan Asari; Keigo Hirakawa;	This paper presents a novel method for labeling real-world neuromorphic camera sensor data by calculating the likelihood of generating an event at each pixel within a short time window, which we refer to as "event probability mask" or EPM.
170	Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud	Weijing Shi; Raj Rajkumar;	In this paper, we propose a graph neural network to detect objects from a LiDAR point cloud.
171	Self-Learning Video Rain Streak Removal: When Cyclic Consistency Meets Temporal Correspondence	Wenhan Yang; Robby T. Tan; Shiqi Wang; Jiaying Liu;	In this paper, we address the problem of rain streaks removal in video by developing a self-learned rain streak removal method, which does not require any clean groundtruth images in the training process.
172	Neuromorphic Camera Guided High Dynamic Range Imaging	Jin Han; Chu Zhou; Peiqi Duan; Yehui Tang; Chang Xu; Chao Xu; Tiejun Huang; Boxin Shi;	In this paper, we propose a neuromorphic camera guided high dynamic range imaging pipeline, and a network consisting of specially designed modules according to each step in the pipeline, which bridges the domain gaps on resolution, dynamic range, and color representation between two types of sensors and images.
173	Learning in the Frequency Domain	Kai Xu; Minghai Qin; Fei Sun; Yuhao Wang; Yen-Kuang Chen; Fengbo Ren;	Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components which can be removed without accuracy loss.
174	Polarized Reflection Removal With Perfect Alignment in the Wild	Chenyang Lei; Xuhua Huang; Mengdi Zhang; Qiong Yan; Wenxiu Sun; Qifeng Chen;	We present a novel formulation for removing reflection from polarized images in the wild.
175	Learning Multiview 3D Point Cloud Registration	Zan Gojcic; Caifa Zhou; Jan D. Wegner; Leonidas J. Guibas; Tolga Birdal;	We present a novel, end-to-end learnable, multiview 3D point cloud registration algorithm.
176	A Sparse Resultant Based Method for Efficient Minimal Solvers	Snehal Bhayani; Zuzana Kukelova; Janne Heikkila;	In this paper we study an alternative algebraic method for solving systems of polynomial equations, i.e., the sparse resultant-based method and propose a novel approach to convert the resultant constraint to an eigenvalue problem.
177	Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement	Chunle Guo; Chongyi Li; Jichang Guo; Chen Change Loy; Junhui Hou; Sam Kwong; Runmin Cong;	The paper presents a novel method, Zero-Reference Deep Curve Estimation (Zero-DCE), which formulates light enhancement as a task of image-specific curve estimation with a deep network.
178	BlendedMVS: A Large-Scale Dataset for Generalized Multi-View Stereo Networks	Yao Yao; Zixin Luo; Shiwei Li; Jingyang Zhang; Yufan Ren; Lei Zhou; Tian Fang; Long Quan;	In this paper, we introduce BlendedMVS, a novel large-scale dataset, to provide sufficient training ground truth for learning-based MVS.
179	Convolution in the Cloud: Learning Deformable Kernels in 3D Graph Convolution Networks for Point Cloud Analysis	Zhi-Hao Lin; Sheng-Yu Huang; Yu-Chiang Frank Wang;	In this paper, we propose 3D Graph Convolution Networks (3D-GCN), which is designed to extract local 3D features from point clouds across scales, while shift and scale-invariance properties are introduced.
180	A Semi-Supervised Assessor of Neural Architectures	Yehui Tang; Yunhe Wang; Yixing Xu; Hanting Chen; Boxin Shi; Chao Xu; Chunjing Xu; Qi Tian; Chang Xu;	In contrast with classical performance predictors optimized in a fully supervised way, this paper suggests a semi-supervised assessor of neural architectures.
181	Learning a Reinforced Agent for Flexible Exposure Bracketing Selection	Zhouxia Wang; Jiawei Zhang; Mude Lin; Jiong Wang; Ping Luo; Jimmy Ren;	Unlike previous methods that have many restrictions such as requiring camera response function, sensor noise model, and a stream of preview images with different exposures (not accessible in some scenarios e.g. mobile applications), we propose a novel deep neural network to automatically select exposure bracketing, named EBSNet, which is sufficiently flexible without having the above restrictions.
182	CARS: Continuous Evolution for Efficient Neural Architecture Search	Zhaohui Yang; Yunhe Wang; Xinghao Chen; Boxin Shi; Chao Xu; Chunjing Xu; Qi Tian; Chang Xu;	In contrast, we develop an efficient continuous evolutionary approach for searching neural networks.
183	Joint 3D Instance Segmentation and Object Detection for Autonomous Driving	Dingfu Zhou; Jin Fang; Xibin Song; Liu Liu; Junbo Yin; Yuchao Dai; Hongdong Li; Ruigang Yang;	To tackle this problem, we propose a simple but practical detection framework to jointly predict the 3D BBox and instance segmentation.
184	View-GCN: View-Based Graph Convolutional Network for 3D Shape Analysis	Xin Wei; Ruixuan Yu; Jian Sun;	In this work, we propose a novel view-based Graph Convolutional Neural Network, dubbed view-GCN, to recognize 3D shape based on graph representation of multiple views in flexible view configurations.
185	Collaborative Distillation for Ultra-Resolution Universal Style Transfer	Huan Wang; Yijun Li; Yuehai Wang; Haoji Hu; Ming-Hsuan Yang;	In this work, we present a new knowledge distillation method (named Collaborative Distillation) for encoder-decoder based neural style transfer to reduce the convolutional filters.
186	TomoFluid: Reconstructing Dynamic Fluid From Sparse View Videos	Guangming Zang; Ramzi Idoughi; Congli Wang; Anthony Bennett; Jianguo Du; Scott Skeen; William L. Roberts; Peter Wonka; Wolfgang Heidrich;	In this paper, we present a state-of-the-art 4D tomographic reconstruction framework that integrates several regularizers into a multi-scale matrix free optimization algorithm.
187	Instance Shadow Detection	Tianyu Wang; Xiaowei Hu; Qiong Wang; Pheng-Ann Heng; Chi-Wing Fu;	Second, we design LISA, named after Light-guided Instance Shadow-object Association, an end-to-end framework to automatically predict the shadow and object instances, together with the shadow-object associations and light direction. Then, we pair up the predicted shadow and object instances, and match them with the predicted shadow-object associations to generate the final results.
188	Self2Self With Dropout: Learning Self-Supervised Denoising From Single Image	Yuhui Quan; Mingqin Chen; Tongyao Pang; Hui Ji;	Taking one step further, this paper proposes a self-supervised learning method which only uses the input noisy image itself for training.
189	Discrete Model Compression With Resource Constraint for Deep Neural Networks	Shangqian Gao; Feihu Huang; Jian Pei; Heng Huang;	In this paper, we aim to address the problem of compression and acceleration of Convolutional Neural Networks (CNNs).
190	Structured Compression by Weight Encryption for Unstructured Pruning and Quantization	Se Jung Kwon; Dongsoo Lee; Byeongwook Kim; Parichay Kapoor; Baeseong Park; Gu-Yeon Wei;	This paper proposes a new weight representation scheme for Sparse Quantized Neural Networks, specifically achieved by fine-grained and unstructured pruning method.
191	End-to-End Learning Local Multi-View Descriptors for 3D Point Clouds	Lei Li; Siyu Zhu; Hongbo Fu; Ping Tan; Chiew-Lan Tai;	In this work, we propose an end-to-end framework to learn local multi-view descriptors for 3D point clouds.
192	Minimal Solutions for Relative Pose With a Single Affine Correspondence	Banglei Guan; Ji Zhao; Zhang Li; Fang Sun; Friedrich Fraundorfer;	In this paper we present four cases of minimal solutions for two-view relative pose estimation by exploiting the affine transformation between feature points and we demonstrate efficient solvers for these cases.
193	Point Cloud Completion by Skip-Attention Network With Hierarchical Folding	Xin Wen; Tianyang Li; Zhizhong Han; Yu-Shen Liu;	To address this problem, we propose Skip-Attention Network (SA-Net) for 3D point cloud completion.
194	Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement	Zehao Yu; Shenghua Gao;	Towards this end, this paper presents Fast-MVSNet, a novel sparse-to-dense coarse-to-fine framework, for fast and accurate depth estimation in MVS.
195	AANet: Adaptive Aggregation Network for Efficient Stereo Matching	Haofei Xu; Juyong Zhang;	In this paper, we aim at completely replacing the commonly used 3D convolutions to achieve fast inference speed while maintaining comparable accuracy.
196	Towards Unified INT8 Training for Convolutional Neural Network	Feng Zhu; Ruihao Gong; Fengwei Yu; Xianglong Liu; Yanfei Wang; Zhelong Li; Xiuqi Yang; Junjie Yan;	In this paper, we attempt to build a unified 8-bit (INT8) training framework for common convolutional neural networks from the aspects of both accuracy and speed.
197	Active 3D Motion Visualization Based on Spatiotemporal Light-Ray Integration	Fumihiko Sakaue; Jun Sato;	In this paper, we propose a method of visualizing 3D motion with zero latency.
198	Block-Wisely Supervised Neural Architecture Search With Knowledge Distillation	Changlin Li; Jiefeng Peng; Liuchun Yuan; Guangrun Wang; Xiaodan Liang; Liang Lin; Xiaojun Chang;	In this work, we propose to modularize the large search space of NAS into blocks to ensure that the potential candidate architectures are fully trained; this reduces the representation shift caused by the shared parameters and leads to the correct rating of the candidates.
199	GreedyNAS: Towards Fast One-Shot NAS With Greedy Supernet	Shan You; Tao Huang; Mingmin Yang; Fei Wang; Chen Qian; Changshui Zhang;	In this paper, instead of covering all paths, we ease the burden of supernet by encouraging it to focus more on evaluation of those potentially-good ones, which are identified using a surrogate portion of validation data.
200	Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration	Yang He; Yuhang Ding; Ping Liu; Linchao Zhu; Hanwang Zhang; Yi Yang;	In this paper, we propose Learning Filter Pruning Criteria (LFPC) to solve the above problems.
201	DIST: Rendering Deep Implicit Signed Distance Function With Differentiable Sphere Tracing	Shaohui Liu; Yinda Zhang; Songyou Peng; Boxin Shi; Marc Pollefeys; Zhaopeng Cui;	We propose a differentiable sphere tracing algorithm to bridge the gap between inverse graphics methods and the recently proposed deep learning based implicit signed distance function.
202	Visually Imbalanced Stereo Matching	Yicun Liu; Jimmy Ren; Jiawei Zhang; Jianbo Liu; Mude Lin;	To avoid such collapse, we propose a solution to recover the stereopsis by a joint guided-view-restoration and stereo-reconstruction framework.
203	Mesh-Guided Multi-View Stereo With Pyramid Architecture	Yuesong Wang; Tao Guan; Zhuo Chen; Yawei Luo; Keyang Luo; Lili Ju;	To overcome this difficulty, we propose a mesh-guided MVS method with pyramid architecture, which makes use of the surface mesh obtained from coarse-scale images to guide the reconstruction process.
204	BiDet: An Efficient Binarized Object Detector	Ziwei Wang; Ziyi Wu; Jiwen Lu; Jie Zhou;	In this paper, we propose a binarized neural network learning method called BiDet for efficient object detection.
205	Local Non-Rigid Structure-From-Motion From Diffeomorphic Mappings	Shaifali Parashar; Mathieu Salzmann; Pascal Fua;	We propose a new formulation to non-rigid structure-from-motion that only requires the deforming surface to preserve its differential structure.
206	Seeing Around Street Corners: Non-Line-of-Sight Detection and Tracking In-the-Wild Using Doppler Radar	Nicolas Scheiner; Florian Kraus; Fangyin Wei; Buu Phan; Fahim Mannan; Nils Appenrodt; Werner Ritter; Jurgen Dickmann; Klaus Dietmayer; Bernhard Sick; Felix Heide;	In this work, we depart from visible-wavelength approaches and demonstrate detection, classification, and tracking of hidden objects in large-scale dynamic environments using Doppler radars that can be manufactured at low-cost in series production.
207	APQ: Joint Search for Network Architecture, Pruning and Quantization Policy	Tianzhe Wang; Kuan Wang; Han Cai; Ji Lin; Zhijian Liu; Hanrui Wang; Yujun Lin; Song Han;	We present APQ, a novel design methodology for efficient deep learning deployment.
208	On the Acceleration of Deep Learning Model Parallelism With Staleness	An Xu; Zhouyuan Huo; Heng Huang;	In this paper, we propose Layer-wise Staleness and a novel efficient training algorithm, Diversely Stale Parameters (DSP), to address these challenges.
209	RevealNet: Seeing Behind Objects in RGB-D Scans	Ji Hou; Angela Dai; Matthias Niessner;	We tackle this problem by introducing RevealNet, a new data-driven approach that jointly detects object instances and predicts their complete geometry.
210	MemNAS: Memory-Efficient Neural Architecture Search With Grow-Trim Learning	Peiye Liu; Bo Wu; Huadong Ma; Mingoo Seok;	To address this challenge, we propose MemNAS, a novel growing and trimming based neural architecture search framework that optimizes not only performance but also memory requirement of an inference network.
211	StegaStamp: Invisible Hyperlinks in Physical Photographs	Matthew Tancik; Ben Mildenhall; Ren Ng;	Our key technical contribution is StegaStamp, a learned steganographic algorithm to enable robust encoding and decoding of arbitrary hyperlink bitstrings into photos in a manner that approaches perceptual invisibility.
212	L2-GCN: Layer-Wise and Learned Efficient Training of Graph Convolutional Networks	Yuning You; Tianlong Chen; Zhangyang Wang; Yang Shen;	In this paper, we propose a novel efficient layer-wise training framework for GCN (L-GCN), that disentangles feature aggregation and feature transformation during training, hence greatly reducing time and memory complexities.
213	Polarized Non-Line-of-Sight Imaging	Kenichiro Tanaka; Yasuhiro Mukaigawa; Achuta Kadambi;	This paper presents a method of passive non-line-of-sight (NLOS) imaging using polarization cues.
214	AdaBits: Neural Network Quantization With Adaptive Bit-Widths	Qing Jin; Linjie Yang; Zhenyu Liao;	In this paper, we investigate a novel option to achieve this goal by enabling adaptive bit-widths of weights and activations in the model.
215	Multi-Scale Boosted Dehazing Network With Dense Feature Fusion	Hang Dong; Jinshan Pan; Lei Xiang; Zhe Hu; Xinyi Zhang; Fei Wang; Ming-Hsuan Yang;	In this paper, we propose a Multi-Scale Boosted Dehazing Network with Dense Feature Fusion based on the U-Net architecture.
216	ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings	Jiahui Huang; Sheng Yang; Tai-Jiang Mu; Shi-Min Hu;	We present ClusterVO, a stereo Visual Odometry which simultaneously clusters and estimates the motion of both ego and surrounding rigid clusters/objects.
217	Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-Based Approach	Haichuan Yang; Shupeng Gui; Yuhao Zhu; Ji Liu;	In this paper, we propose a framework to jointly prune and quantize the DNNs automatically according to a target model size without using any hyper-parameters to manually set the compression ratio for each layer.
218	Normal Assisted Stereo Depth Estimation	Uday Kusupati; Shuo Cheng; Rui Chen; Hao Su;	In this paper, we study how to enforce the consistency between surface normal and depth at training time to improve the performance.
219	Fusing Wearable IMUs With Multi-View Images for Human Pose Estimation: A Geometric Approach	Zhe Zhang; Chunyu Wang; Wenhu Qin; Wenjun Zeng;	We present a geometric approach to reinforce the visual features of each pair of joints based on the IMUs.
220	gDLS*: Generalized Pose-and-Scale Estimation Given Scale and Gravity Priors	Victor Fragoso; Joseph DeGol; Gang Hua;	We present gDLS*, a generalized-camera-model pose-and-scale estimator that utilizes rotation and scale priors.
221	Embodied Language Grounding With 3D Visual Feature Representations	Mihir Prabhudesai; Hsiao-Yu Fish Tung; Syed Ashar Javed; Maximilian Sieb; Adam W. Harley; Katerina Fragkiadaki;	We present generative models that condition on the dependency tree of an utterance and generate a corresponding visual 3D feature map as well as reason about its plausibility, and detector models that condition on both the dependency tree of an utterance and a related image and localize the object referents in the 3D feature map inferred from the image.
222	Learning to Autofocus	Charles Herrmann; Richard Strong Bowen; Neal Wadhwa; Rahul Garg; Qiurui He; Jonathan T. Barron; Ramin Zabih;	We propose a learning-based approach to this problem, and provide a realistic dataset of sufficient size for effective learning.
223	Joint Demosaicing and Denoising With Self Guidance	Lin Liu; Xu Jia; Jianzhuang Liu; Qi Tian;	In this paper, we propose a self-guidance network (SGNet), where the green channels are initially estimated and then work as guidance to recover all missing values in the input image.
224	Forward and Backward Information Retention for Accurate Binary Neural Networks	Haotong Qin; Ruihao Gong; Xianglong Liu; Mingzhu Shen; Ziran Wei; Fengwei Yu; Jingkuan Song;	To address these issues, we propose an Information Retention Network (IR-Net) to retain the information contained in the forward activations and backward gradients.
225	Light Field Spatial Super-Resolution via Deep Combinatorial Geometry Embedding and Structural Consistency Regularization	Jing Jin; Junhui Hou; Jie Chen; Sam Kwong;	In this paper, we propose a novel learning-based LF spatial SR framework, in which each view of an LF image is first individually super-resolved by exploring the complementary information among views with combinatorial geometry embedding.
226	A Multi-Hypothesis Approach to Color Constancy	Daniel Hernandez-Juarez; Sarah Parisot; Benjamin Busam; Ales Leonardis; Gregory Slabaugh; Steven McDonagh;	We propose a Bayesian framework that naturally handles color constancy ambiguity via a multi-hypothesis strategy.
227	Learning to Restore Low-Light Images via Decomposition-and-Enhancement	Ke Xu; Xin Yang; Baocai Yin; Rynson W.H. Lau;	Based on this model, we present a novel network that first learns to recover image objects in the low-frequency layer and then enhances high-frequency details based on the recovered image objects.
228	Background Matting: The World Is Your Green Screen	Soumyadip Sengupta; Vivek Jayaram; Brian Curless; Steven M. Seitz; Ira Kemelmacher-Shlizerman;	We propose a method for creating a matte – the per-pixel foreground color and alpha – of a person by taking photos or videos in an everyday setting with a handheld camera.
229	Supervised Raw Video Denoising With a Benchmark Dataset on Dynamic Scenes	Huanjing Yue; Cong Cao; Lei Liao; Ronghe Chu; Jingyu Yang;	In this paper, we solve this problem by creating motions for controllable objects, such as toys, and capturing each static moment multiple times to generate clean video frames.
230	Photometric Stereo via Discrete Hypothesis-and-Test Search	Kenji Enomoto; Michael Waechter; Kiriakos N. Kutulakos; Yasuyuki Matsushita;	In this paper, we consider the problem of estimating surface normals of a scene with spatially varying, general BRDFs observed by a static camera under varying, known, distant illumination.
231	Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference	Thomas Verelst; Tinne Tuytelaars;	To address this inefficiency, we propose a method to dynamically apply convolutions conditioned on the input image.
232	Fixed-Point Back-Propagation Training	Xishan Zhang; Shaoli Liu; Rui Zhang; Chang Liu; Di Huang; Shiyi Zhou; Jiaming Guo; Qi Guo; Zidong Du; Tian Zhi; Yunji Chen;	In this paper, we propose a novel training approach, which applies a layer-wise precision-adaptive quantization in deep neural networks.
233	Heterogeneous Knowledge Distillation Using Information Flow Modeling	Nikolaos Passalis; Maria Tzelepi; Anastasios Tefas;	In this paper, we propose a novel KD method that works by modeling the information flow through the various layers of the teacher model and then trains a student model to mimic this information flow.
234	Rethinking Differentiable Search for Mixed-Precision Neural Networks	Zhaowei Cai; Nuno Vasconcelos;	In this work, the problem of optimal mixed-precision network search (MPS) is considered.
235	Residual Feature Aggregation Network for Image Super-Resolution	Jie Liu; Wenjie Zhang; Yuting Tang; Jie Tang; Gangshan Wu;	To address this issue, we propose a novel residual feature aggregation (RFA) framework for more efficient feature extraction.
236	Resolution Adaptive Networks for Efficient Inference	Le Yang; Yizeng Han; Xi Chen; Shiji Song; Jifeng Dai; Gao Huang;	In this paper, we focus on spatial redundancy of input samples and propose a novel Resolution Adaptive Network (RANet), which is inspired by the intuition that low-resolution representations are sufficient for classifying "easy" inputs containing large objects with prototypical features, while only some "hard" samples need spatially detailed information.
237	Learning to Forget for Meta-Learning	Sungyong Baik; Seokil Hong; Kyoung Mu Lee;	Thus, we propose task-and-layer-wise attenuation on the compromised initialization to reduce its influence.
238	Deep Learning for Handling Kernel/model Uncertainty in Image Deconvolution	Yuesong Nan; Hui Ji;	Based on an error-in-variable (EIV) model of image blurring that takes kernel error into consideration, this paper presents a deep learning method for deconvolution, which unrolls a total-least-squares (TLS) estimator whose relating priors are learned by neural networks (NNs).
239	Reflection Scene Separation From a Single Image	Renjie Wan; Boxin Shi; Haoliang Li; Ling-Yu Duan; Alex C. Kot;	In this paper, instead of removing reflection components from the mixture image, we aim at recovering reflection scenes from the mixture image.
240	Wavelet Synthesis Net for Disparity Estimation to Synthesize DSLR Calibre Bokeh Effect on Smartphones	Chenchi Luo; Yingmao Li; Kaimo Lin; George Chen; Seok-Jun Lee; Jihwan Choi; Youngjun Francis Yoo; Michael O. Polley;	Empowered by a novel wavelet synthesis network architecture, we have narrowed the bokeh gap between DSLR and smartphone cameras more than ever before.
241	Bundle Adjustment on a Graph Processor	Joseph Ortiz; Mark Pupilli; Stefan Leutenegger; Andrew J. Davison;	We show for the first time that the classical computer vision problem of bundle adjustment (BA) can be solved extremely fast on a graph processor using Gaussian Belief Propagation.
242	3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset	Malte Pedersen; Joakim Bruslund Haurum; Stefan Hein Bengtson; Thomas B. Moeslund;	In this work we present a novel publicly available stereo based 3D RGB dataset for multi-object zebrafish tracking, called 3D-ZeF.
243	PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models	Sachit Menon; Alexandru Damian; Shijia Hu; Nikhil Ravi; Cynthia Rudin;	We present a novel super-resolution algorithm addressing this problem, PULSE (Photo Upsampling via Latent Space Exploration), which generates high-resolution, realistic images at resolutions previously unseen in the literature.
244	Scalability in Perception for Autonomous Driving: Waymo Open Dataset	Pei Sun; Henrik Kretzschmar; Xerxes Dotiwalla; Aurelien Chouard; Vijaysai Patnaik; Paul Tsui; James Guo; Yin Zhou; Yuning Chai; Benjamin Caine; Vijay Vasudevan; Wei Han; Jiquan Ngiam; Hang Zhao; Aleksei Timofeev; Scott Ettinger; Maxim Krivokon; Amy Gao; Aditya Joshi; Yu Zhang; Jonathon Shlens; Zhifeng Chen; Dragomir Anguelov;	In an effort to help align the research community’s contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset.
245	Extreme Relative Pose Network Under Hybrid Representations	Zhenpei Yang; Siming Yan; Qixing Huang;	In this paper, we introduce a novel RGB-D based relative pose estimation approach that is suitable for small-overlapping or non-overlapping scans and can output multiple relative poses.
246	Single-Shot Monocular RGB-D Imaging Using Uneven Double Refraction	Andreas Meuleman; Seung-Hwan Baek; Felix Heide; Min H. Kim;	In this work, we propose a method for monocular single-shot RGB-D imaging.
247	Inverse Rendering for Complex Indoor Scenes: Shape, Spatially-Varying Lighting and SVBRDF From a Single Image	Zhengqin Li; Mohammad Shafiei; Ravi Ramamoorthi; Kalyan Sunkavalli; Manmohan Chandraker;	We propose a deep inverse rendering framework for indoor scenes.
248	3D Packing for Self-Supervised Monocular Depth Estimation	Vitor Guizilini; Rares Ambrus; Sudeep Pillai; Allan Raventos; Adrien Gaidon;	In this work, we propose a novel self-supervised monocular depth estimation method combining geometry with a new deep network, PackNet, learned only from unlabeled monocular videos.
249	Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching	Xiaodong Gu; Zhiwen Fan; Siyu Zhu; Zuozhuo Dai; Feitong Tan; Ping Tan;	In this paper, we propose a memory- and time-efficient cost volume formulation that is complementary to existing multi-view stereo and stereo matching approaches based on 3D cost volumes.
250	From Two Rolling Shutters to One Global Shutter	Cenek Albl; Zuzana Kukelova; Viktor Larsson; Michal Polic; Tomas Pajdla; Konrad Schindler;	We explore a surprisingly simple camera configuration that makes it possible to undo the rolling shutter distortion: two cameras mounted to have different rolling shutter directions.
251	Deep Global Registration	Christopher Choy; Wei Dong; Vladlen Koltun;	We present Deep Global Registration, a differentiable framework for pairwise registration of real-world 3D scans.
252	Deep Stereo Using Adaptive Thin Volume Representation With Uncertainty Awareness	Shuo Cheng; Zexiang Xu; Shilin Zhu; Zhuwen Li; Li Erran Li; Ravi Ramamoorthi; Hao Su;	We present Uncertainty-aware Cascaded Stereo Network (UCS-Net) for 3D reconstruction from multiple RGB images.
253	Why Having 10,000 Parameters in Your Camera Model Is Better Than Twelve	Thomas Schops; Viktor Larsson; Marc Pollefeys; Torsten Sattler;	We propose a calibration pipeline for generic models that is fully automated, easy to use, and can act as a drop-in replacement for parametric calibration, with a focus on accuracy.
254	Blur Aware Calibration of Multi-Focus Plenoptic Camera	Mathieu Labussiere; Celine Teuliere; Frederic Bernardin; Omar Ait-Aider;	This paper presents a novel calibration algorithm for Multi-Focus Plenoptic Cameras (MFPCs) using raw images only.
255	Learning Fused Pixel and Feature-Based View Reconstructions for Light Fields	Jinglei Shi; Xiaoran Jiang; Christine Guillemot;	In this paper, we present a learning-based framework for light field view synthesis from a subset of input views.
256	SAL: Sign Agnostic Learning of Shapes From Raw Data	Matan Atzmon; Yaron Lipman;	In this paper we introduce Sign Agnostic Learning (SAL), a deep learning approach for learning implicit shape representations directly from raw, unsigned geometric data, such as point clouds and triangle soups.
257	Google Landmarks Dataset v2 – A Large-Scale Benchmark for Instance-Level Recognition and Retrieval	Tobias Weyand; Andre Araujo; Bingyi Cao; Jack Sim;	We introduce the Google Landmarks Dataset v2 (GLDv2), a new benchmark for large-scale, fine-grained instance recognition and image retrieval in the domain of human-made and natural landmarks.
258	Instance Guided Proposal Network for Person Search	Wenkai Dong; Zhaoxiang Zhang; Chunfeng Song; Tieniu Tan;	In this paper, we propose a new detection network for person search, named Instance Guided Proposal Network (IGPN), which can learn the similarity between query persons and proposals.
259	Which Is Plagiarism: Fashion Image Retrieval Based on Regional Representation for Design Protection	Yining Lang; Yuan He; Fan Yang; Jianfeng Dong; Hui Xue;	Unlike existing works that mainly focus on identical or similar fashion item retrieval, in this paper we study plagiarized clothes retrieval, which has been somewhat ignored in the academic community despite its great application value.
260	Inter-Task Association Critic for Cross-Resolution Person Re-Identification	Zhiyi Cheng; Qi Dong; Shaogang Gong; Xiatian Zhu;	In this paper, we introduce a novel model training regularisation method, called Inter-Task Association Critic (INTACT), to address this fundamental problem.
261	FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding	Dian Shao; Yue Zhao; Bo Dai; Dahua Lin;	To take action recognition to a new level, we develop FineGym, a new dataset built on top of gymnasium videos.
262	Mapillary Street-Level Sequences: A Dataset for Lifelong Place Recognition	Frederik Warburg; Soren Hauberg; Manuel Lopez-Antequera; Pau Gargallo; Yubin Kuang; Javier Civera;	We contribute Mapillary Street-Level Sequences (SLS), a large dataset for urban and suburban place recognition from image sequences.
263	BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning	Fisher Yu; Haofeng Chen; Xin Wang; Wenqi Xian; Yingying Chen; Fangchen Liu; Vashisht Madhavan; Trevor Darrell;	We construct BDD100K, the largest driving video dataset with 100K videos and 10 tasks to evaluate the exciting progress of image recognition algorithms on autonomous driving.
264	Rethinking Computer-Aided Tuberculosis Diagnosis	Yun Liu; Yu-Huan Wu; Yunfeng Ban; Huifang Wang; Ming-Ming Cheng;	To solve this problem, we establish a large-scale TB dataset, namely Tuberculosis X-ray (TBX11K) dataset.
265	IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning	Xi Yang; Ding Xia; Taichi Kin; Takeo Igarashi;	In this paper, instead of 2D medical images, we introduce an open-access 3D intracranial aneurysm dataset, IntrA, which enables the application of point-based and mesh-based classification and segmentation models.
266	Revisiting Saliency Metrics: Farthest-Neighbor Area Under Curve	Sen Jia; Neil D. B. Bruce;	In this paper, we propose a new metric to address the long-standing problem of center bias in saliency evaluation.
267	Computing the Testing Error Without a Testing Set	Ciprian A. Corneanu; Sergio Escalera; Aleix M. Martinez;	Here, we derive an algorithm to estimate the performance gap between training and testing without the need for a testing dataset.
268	Improving Confidence Estimates for Unfamiliar Examples	Zhizhong Li; Derek Hoiem;	In this paper, we compare and evaluate several methods to improve confidence estimates for unfamiliar and familiar samples.
269	CycleISP: Real Image Restoration via Improved Data Synthesis	Syed Waqas Zamir; Aditya Arora; Salman Khan; Munawar Hayat; Fahad Shahbaz Khan; Ming-Hsuan Yang; Ling Shao;	In this paper, we present a framework that models the camera imaging pipeline in forward and reverse directions.
270	Enhanced Blind Face Restoration With Multi-Exemplar Images and Adaptive Spatial Feature Fusion	Xiaoming Li; Wenyu Li; Dongwei Ren; Hongzhi Zhang; Meng Wang; Wangmeng Zuo;	To address these issues, this paper proposes to enhance blind face restoration performance by utilizing multi-exemplar images and adaptive fusion of features from guidance and degraded images.
271	Explorable Super Resolution	Yuval Bahat; Tomer Michaeli;	In this paper, we introduce the task of explorable super resolution.
272	Syn2Real Transfer Learning for Image Deraining Using Gaussian Processes	Rajeev Yasarla; Vishwanath A. Sindagi; Vishal M. Patel;	We propose a Gaussian Process-based semi-supervised learning framework which enables the network to learn to derain using a synthetic dataset while generalizing better using unlabeled real-world images.
273	Deblurring by Realistic Blurring	Kaihao Zhang; Wenhan Luo; Yiran Zhong; Lin Ma; Bjorn Stenger; Wei Liu; Hongdong Li;	To address this problem, we propose a new method which combines two GAN models, i.e., a learning-to-Blur GAN (BGAN) and learning-to-DeBlur GAN (DBGAN), in order to learn a better model for image deblurring by primarily learning how to blur images.
274	Bringing Old Photos Back to Life	Ziyu Wan; Bo Zhang; Dongdong Chen; Pan Zhang; Dong Chen; Jing Liao; Fang Wen;	We propose to restore old photos that suffer from severe degradation through a deep learning approach.
275	A Physics-Based Noise Formation Model for Extreme Low-Light Raw Denoising	Kaixuan Wei; Ying Fu; Jiaolong Yang; Hua Huang;	To address this issue, we present a highly accurate noise formation model based on the characteristics of CMOS photosensors, thereby enabling us to synthesize realistic samples that better match the physics of image formation process.
276	Camouflaged Object Detection	Deng-Ping Fan; Ge-Peng Ji; Guolei Sun; Ming-Ming Cheng; Jianbing Shen; Ling Shao;	We present a comprehensive study on a new task named camouflaged object detection (COD), which aims to identify objects that are "seamlessly" embedded in their surroundings.
277	Holistically-Attracted Wireframe Parsing	Nan Xue; Tianfu Wu; Song Bai; Fudong Wang; Gui-Song Xia; Liangpei Zhang; Philip H.S. Torr;	This paper presents a fast and parsimonious parsing method to accurately and robustly detect a vectorized wireframe in an input image with a single forward pass.
278	Conv-MPN: Convolutional Message Passing Neural Network for Structured Outdoor Architecture Reconstruction	Fuyang Zhang; Nelson Nauata; Yasutaka Furukawa;	This paper proposes a novel message passing neural (MPN) architecture Conv-MPN, which reconstructs an outdoor building as a planar graph from a single RGB image.
279	Domain Adaptation for Image Dehazing	Yuanjie Shao; Lerenhan Li; Wenqi Ren; Changxin Gao; Nong Sang;	To address this issue, we propose a domain adaptation paradigm, which consists of an image translation module and two image dehazing modules.
280	Auto-Encoding Twin-Bottleneck Hashing	Yuming Shen; Jie Qin; Jiaxin Chen; Mengyang Yu; Li Liu; Fan Zhu; Fumin Shen; Ling Shao;	In this paper, we tackle the above problems by proposing an efficient and adaptive code-driven graph, which is updated by decoding in the context of an auto-encoder.
281	Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis	Mang Tik Chiu; Xingqian Xu; Yunchao Wei; Zilong Huang; Alexander G. Schwing; Robert Brunner; Hrant Khachatrian; Hovnatan Karapetyan; Ivan Dozier; Greg Rose; David Wilson; Adrian Tudor; Naira Hovakimyan; Thomas S. Huang; Honghui Shi;	To encourage research in computer vision for agriculture, we present Agriculture-Vision: a large-scale aerial farmland image dataset for semantic segmentation of agricultural patterns.
282	Bi-Directional Interaction Network for Person Search	Wenkai Dong; Zhaoxiang Zhang; Chunfeng Song; Tieniu Tan;	To address this issue, we propose a Siamese network with an additional instance-aware branch, named Bi-directional Interaction Network (BINet).
283	Meshlet Priors for 3D Mesh Reconstruction	Abhishek Badki; Orazio Gallo; Jan Kautz; Pradeep Sen;	We introduce meshlets, small patches of mesh that we use to learn local shape priors.
284	Space-Time-Aware Multi-Resolution Video Enhancement	Muhammad Haris; Greg Shakhnarovich; Norimichi Ukita;	We consider the problem of space-time super-resolution (ST-SR): increasing spatial resolution of video frames and simultaneously interpolating frames to increase the frame rate.
285	FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation	Xiang Li; Tianhan Wei; Yau Pun Chen; Yu-Wing Tai; Chi-Keung Tang;	In this paper, we are interested in few-shot object segmentation, where the number of annotated training examples is limited to only 5. To evaluate and validate the performance of our approach, we have built a few-shot segmentation dataset, FSS-1000, which consists of 1000 object classes with pixelwise annotation of ground-truth segmentation.
286	MSeg: A Composite Dataset for Multi-Domain Semantic Segmentation	John Lambert; Zhuang Liu; Ozan Sener; James Hays; Vladlen Koltun;	We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
287	Learning Multi-Granular Hypergraphs for Video-Based Person Re-Identification	Yichao Yan; Jie Qin; Jiaxin Chen; Li Liu; Fan Zhu; Ying Tai; Ling Shao;	In this work, we propose a novel graph-based framework, namely Multi-Granular Hypergraph (MGH), to pursue better representational capabilities by modeling spatiotemporal dependencies in terms of multiple granularities.
288	Online Joint Multi-Metric Adaptation From Frequent Sharing-Subset Mining for Person Re-Identification	Jiahuan Zhou; Bing Su; Ying Wu;	Therefore, we propose an online joint multi-metric adaptation model to adapt the offline learned P-RID models for the online data by learning a series of metrics for all the sharing-subsets.
289	Taking a Deeper Look at Co-Salient Object Detection	Deng-Ping Fan; Zheng Lin; Ge-Peng Ji; Dingwen Zhang; Huazhu Fu; Ming-Ming Cheng;	To tackle this issue, we first collect a new high-quality dataset, named CoSOD3k, which contains 3,316 images divided into 160 groups with multiple level annotations, i.e., category, bounding box, object, and instance levels.
290	Single-Stage 6D Object Pose Estimation	Yinlin Hu; Pascal Fua; Wei Wang; Mathieu Salzmann;	In this work, we introduce a deep architecture that directly regresses 6D poses from correspondences.
291	OccuSeg: Occupancy-Aware 3D Instance Segmentation	Lei Han; Tian Zheng; Lan Xu; Lu Fang;	In this paper, we define “3D occupancy size” as the number of voxels occupied by each instance. It has the advantage of robustness in prediction, and on this basis we propose OccuSeg, an occupancy-aware 3D instance segmentation scheme.
292	Camera Trace Erasing	Chang Chen; Zhiwei Xiong; Xiaoming Liu; Feng Wu;	In this paper, we address a new low-level vision problem, camera trace erasing, to reveal the weakness of trace-based forensic methods.
293	Deep Metric Learning via Adaptive Learnable Assessment	Wenzhao Zheng; Jiwen Lu; Jie Zhou;	In this paper, we propose a deep metric learning via adaptive learnable assessment (DML-ALA) method for image retrieval and clustering, which aims to learn a sample assessment strategy to maximize the generalization of the trained metric.
294	Deep Representation Learning on Long-Tailed Data: A Learnable Embedding Augmentation Perspective	Jialun Liu; Yifan Sun; Chuchu Han; Zhaopeng Dou; Wenhui Li;	To this end, we propose to augment each instance of the tail classes with certain disturbances in the deep feature space.
295	Fantastic Answers and Where to Find Them: Immersive Question-Directed Visual Attention	Ming Jiang; Shi Chen; Jinhui Yang; Qi Zhao;	Specifically, we introduce the first dataset of top-down attention in immersive scenes.
296	HUMBI: A Large Multiview Dataset of Human Body Expressions	Zhixuan Yu; Jae Shin Yoon; In Kyu Lee; Prashanth Venkatesh; Jaesik Park; Jihun Yu; Hyun Soo Park;	This paper presents a new large multiview dataset called HUMBI for human body expressions with natural clothing.
297	Image Search With Text Feedback by Visiolinguistic Attention Learning	Yanbei Chen; Shaogang Gong; Loris Bazzani;	In this work, we tackle this task with a novel Visiolinguistic Attention Learning (VAL) framework.
298	Image Processing Using Multi-Code GAN Prior	Jinjin Gu; Yujun Shen; Bolei Zhou;	In this work, we propose a novel approach, called mGANprior, to incorporate well-trained GANs as an effective prior for a variety of image processing tasks.
299	What Does Plate Glass Reveal About Camera Calibration?	Qian Zheng; Jinnan Chen; Zhan Lu; Boxin Shi; Xudong Jiang; Kim-Hui Yap; Ling-Yu Duan; Alex C. Kot;	This paper aims to calibrate the orientation of glass and the field of view of the camera from a single reflection-contaminated image. We collect a dataset containing 320 samples as well as their camera parameters for evaluation.
300	Zero-Assignment Constraint for Graph Matching With Outliers	Fudong Wang; Nan Xue; Jin-Gang Yu; Gui-Song Xia;	To address this issue, we present the zero-assignment constraint (ZAC) for approaching the graph matching problem in the presence of outliers.
301	Cascaded Deep Video Deblurring Using Temporal Sharpness Prior	Jinshan Pan; Haoran Bai; Jinhui Tang;	We present a simple and effective deep convolutional neural network (CNN) model for video deblurring.
302	JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection	Keren Fu; Deng-Ping Fan; Ge-Peng Ji; Qijun Zhao;	This paper proposes a novel joint learning and densely-cooperative fusion (JL-DCF) architecture for RGB-D salient object detection.
303	From Fidelity to Perceptual Quality: A Semi-Supervised Approach for Low-Light Image Enhancement	Wenhan Yang; Shiqi Wang; Yuming Fang; Yue Wang; Jiaying Liu;	To address these problems, we propose a novel semi-supervised learning approach for low-light image enhancement.
304	Unsupervised Adaptation Learning for Hyperspectral Imagery Super-Resolution	Lei Zhang; Jiangtao Nie; Wei Wei; Yanning Zhang; Shengcai Liao; Ling Shao;	To tackle this problem, we present an unsupervised adaptation learning (UAL) framework.
305	Central Similarity Quantization for Efficient Image and Video Retrieval	Li Yuan; Tao Wang; Xiaopeng Zhang; Francis EH Tay; Zequn Jie; Wei Liu; Jiashi Feng;	In this work, we propose a new global similarity metric, termed central similarity, with which the hash codes of similar data pairs are encouraged to approach a common center and those for dissimilar pairs to converge to different centers, to improve hash learning efficiency and retrieval accuracy.
306	ARCH: Animatable Reconstruction of Clothed Humans	Zeng Huang; Yuanlu Xu; Christoph Lassner; Hao Li; Tony Tung;	In this paper, we propose ARCH (Animatable Reconstruction of Clothed Humans), a novel end-to-end framework for accurate reconstruction of animation-ready 3D clothed humans from a monocular image.
307	A Model-Driven Deep Neural Network for Single Image Rain Removal	Hong Wang; Qi Xie; Qian Zhao; Deyu Meng;	To address this issue, we propose a model-driven deep neural network for the task, with fully interpretable network structures.
308	Novel Object Viewpoint Estimation Through Reconstruction Alignment	Mohamed El Banani; Jason J. Corso; David F. Fouhey;	The goal of this paper is to estimate the viewpoint for a novel object.
309	Creating Something From Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing	Hengtong Hu; Lingxi Xie; Richang Hong; Qi Tian;	In this paper, we propose a novel approach that enables guiding a supervised method using outputs produced by an unsupervised method.
310	Evaluating Weakly Supervised Object Localization Methods Right	Junsuk Choe; Seong Joon Oh; Seungho Lee; Sanghyuk Chun; Zeynep Akata; Hyunjung Shim;	In this paper, we argue that the WSOL task is ill-posed with only image-level labels, and propose a new evaluation protocol where full supervision is limited to only a small held-out set not overlapping with the test set.
311	Style Normalization and Restitution for Generalizable Person Re-Identification	Xin Jin; Cuiling Lan; Wenjun Zeng; Zhibo Chen; Li Zhang;	In this paper, we aim to design a generalizable person ReID framework which trains a model on source domains yet is able to generalize/perform well on target domains.
312	Reconstruct Locally, Localize Globally: A Model Free Method for Object Pose Estimation	Ming Cai; Ian Reid;	Instead, we propose a learning-based method whose input is a collection of images of a target object, and whose output is the pose of the object in a novel view.
313	RoboTHOR: An Open Simulation-to-Real Embodied AI Platform	Matt Deitke; Winson Han; Alvaro Herrasti; Aniruddha Kembhavi; Eric Kolve; Roozbeh Mottaghi; Jordi Salvador; Dustin Schwenk; Eli VanderBilt; Matthew Wallingford; Luca Weihs; Mark Yatskar; Ali Farhadi;	In this paper, we introduce RoboTHOR to democratize research in interactive and embodied visual AI.
314	All in One Bad Weather Removal Using Architectural Search	Ruoteng Li; Robby T. Tan; Loong-Fah Cheong;	In this paper, we propose a method that can handle multiple bad weather degradations: rain, fog, snow and adherent raindrops using a single network.
315	Relation-Aware Global Attention for Person Re-Identification	Zhizheng Zhang; Cuiling Lan; Wenjun Zeng; Xin Jin; Zhibo Chen;	In this work, we propose an effective Relation-Aware Global Attention (RGA) module which captures the global structural information for better attention learning.
316	HOnnotate: A Method for 3D Annotation of Hand and Object Poses	Shreyas Hampali; Mahdi Rad; Markus Oberweger; Vincent Lepetit;	We propose a method for annotating images of a hand manipulating an object with the 3D poses of both the hand and the object, together with a dataset created using this method.
317	Celeb-DF: A Large-Scale Challenging Dataset for DeepFake Forensics	Yuezun Li; Xin Yang; Pu Sun; Honggang Qi; Siwei Lyu;	We present a new large-scale challenging DeepFake video dataset, Celeb-DF, which contains 5,639 high-quality DeepFake videos of celebrities generated using improved synthesis process.
318	Deep Unfolding Network for Image Super-Resolution	Kai Zhang; Luc Van Gool; Radu Timofte;	To address this issue, this paper proposes an end-to-end trainable unfolding network which leverages both learning-based methods and model-based methods.
319	On the Uncertainty of Self-Supervised Monocular Depth Estimation	Matteo Poggi; Filippo Aleotti; Fabio Tosi; Stefano Mattoccia;	To this end, we explore for the first time how to estimate the uncertainty for this task and how this affects depth accuracy, proposing a novel technique specifically designed for self-supervised approaches.
320	Proxy Anchor Loss for Deep Metric Learning	Sungyeon Kim; Dongwon Kim; Minsu Cho; Suha Kwak;	This paper presents a new proxy-based loss that takes advantage of both pair- and proxy-based methods and overcomes their limitations.
321	Unsupervised Learning for Intrinsic Image Decomposition From a Single Image	Yunfei Liu; Yu Li; Shaodi You; Feng Lu;	In this paper, we propose a novel unsupervised intrinsic image decomposition framework, which relies on neither labeled training data nor hand-crafted priors.
322	Multi-Domain Learning for Accurate and Few-Shot Color Constancy	Jin Xiao; Shuhang Gu; Lei Zhang;	In this paper, we present pioneering work that introduces multi-domain learning to the area of color constancy.
323	PANDA: A Gigapixel-Level Human-Centric Video Dataset	Xueyang Wang; Xiya Zhang; Yinheng Zhu; Yuchen Guo; Xiaoyun Yuan; Liuyu Xiang; Zerun Wang; Guiguang Ding; David Brady; Qionghai Dai; Lu Fang;	We present PANDA, the first gigaPixel-level humAN-centric viDeo dAtaset, for large-scale, long-term, and multi-object visual analysis.
324	Cross-View Tracking for Multi-Human 3D Pose Estimation at Over 100 FPS	Long Chen; Haizhou Ai; Rui Chen; Zijie Zhuang; Shuang Liu;	In this paper, we present a novel solution for multi-human 3D pose estimation from multiple calibrated camera views.
325	Spatial-Temporal Graph Convolutional Network for Video-Based Person Re-Identification	Jinrui Yang; Wei-Shi Zheng; Qize Yang; Ying-Cong Chen; Qi Tian;	In this work, we propose a novel Spatial-Temporal Graph Convolutional Network (STGCN) to solve these problems.
326	Salience-Guided Cascaded Suppression Network for Person Re-Identification	Xuesong Chen; Canmiao Fu; Yong Zhao; Feng Zheng; Jingkuan Song; Rongrong Ji; Yi Yang;	To handle this limitation, we propose a novel Salience-guided Cascaded Suppression Network (SCSN) which enables the model to mine diverse salient features and integrate these features into the final representation in a cascaded manner.
327	Fashion Outfit Complementary Item Retrieval	Yen-Liang Lin; Son Tran; Larry S. Davis;	We propose a new framework for outfit complementary item retrieval.
328	Learning Event-Based Motion Deblurring	Zhe Jiang; Yu Zhang; Dongqing Zou; Jimmy Ren; Jiancheng Lv; Yebin Liu;	In this paper, we start from a sequential formulation of event-based motion deblurring, then show how its optimization can be unfolded with a novel end-to-end deep architecture.
329	Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation	Yunhan Zhao; Shu Kong; Daeyun Shin; Charless Fowlkes;	Based on these observations, we develop an attention module that learns to identify and remove difficult out-of-domain regions in real images in order to improve depth prediction for a model trained primarily on synthetic data.
330	Neural Blind Deconvolution Using Deep Priors	Dongwei Ren; Kai Zhang; Qilong Wang; Qinghua Hu; Wangmeng Zuo;	To connect MAP and deep models, in this paper we present two generative networks that respectively model the deep priors of the clean image and the blur kernel, and propose an unconstrained neural optimization solution to blind deconvolution.
331	Anisotropic Convolutional Networks for 3D Semantic Scene Completion	Jie Li; Kai Han; Peng Wang; Yu Liu; Xia Yuan;	To handle such variations, we propose a novel module called anisotropic convolution, which offers flexibility and power unattainable by competing methods such as standard 3D convolution and some of its variations.
332	TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution	Yapeng Tian; Yulun Zhang; Yun Fu; Chenliang Xu;	To overcome the limitation, we propose a temporally-deformable alignment network (TDAN) to adaptively align the reference frame and each supporting frame at the feature level without computing optical flow.
333	Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution	Xiaoyu Xiang; Yapeng Tian; Yulun Zhang; Yun Fu; Jan P. Allebach; Chenliang Xu;	In this paper, we explore the space-time video super-resolution task, which aims to generate a high-resolution (HR) slow-motion video from a low frame rate (LFR), low-resolution (LR) video.
334	Fast MSER	Hailiang Xu; Siqi Xie; Fan Chen;	In this paper, we propose two novel MSER algorithms, called Fast MSER V1 and V2.
335	Unsupervised Person Re-Identification via Softened Similarity Learning	Yutian Lin; Lingxi Xie; Yu Wu; Chenggang Yan; Qi Tian;	In this paper, we follow the iterative training mechanism but discard clustering, since it incurs loss from hard quantization, yet its only product, image-level similarity, can be easily replaced by pairwise computation and a softened classification task.
336	COCAS: A Large-Scale Clothes Changing Person Dataset for Re-Identification	Shijie Yu; Shihua Li; Dapeng Chen; Rui Zhao; Junjie Yan; Yu Qiao;	To address the clothes changing person re-id problem, we construct a novel large-scale re-id benchmark named Clothes Changing Person Set (COCAS), which provides multiple images of the same identity with different clothes.
337	Learning Formation of Physically-Based Face Attributes	Ruilong Li; Karl Bladin; Yajie Zhao; Chinmay Chinara; Owen Ingraham; Pengda Xiang; Xinglei Ren; Pratusha Prasad; Bipin Kishore; Jun Xing; Hao Li;	Based on a combined data set of 4000 high resolution facial scans, we introduce a non-linear morphable face model, capable of producing multifarious face geometry of pore-level resolution, coupled with material attributes for use in physically-based rendering.
338	Generalized Product Quantization Network for Semi-Supervised Image Retrieval	Young Kyun Jang; Nam Ik Cho;	To resolve this issue, we propose the first quantization-based semi-supervised image retrieval scheme: Generalized Product Quantization (GPQ) network.
339	Stereoscopic Flash and No-Flash Photography for Shape and Albedo Recovery	Xu Cao; Michael Waechter; Boxin Shi; Ye Gao; Bo Zheng; Yasuyuki Matsushita;	We present a minimal imaging setup that harnesses both geometric and photometric approaches for shape and albedo recovery.
340	Context-Aware Group Captioning via Self-Attention and Contrastive Features	Zhuowan Li; Quan Tran; Long Mai; Zhe Lin; Alan L. Yuille;	To solve this problem, we propose a framework combining self-attention mechanism with contrastive feature construction to effectively summarize common information from each image group while capturing discriminative information between them.
341	MEBOW: Monocular Estimation of Body Orientation in the Wild	Chenyan Wu; Yukun Chen; Jiajia Luo; Che-Chun Su; Anuja Dawane; Bikramjot Hanzra; Zhuo Deng; Bilan Liu; James Z. Wang; Cheng-hao Kuo;	We present COCO-MEBOW (Monocular Estimation of Body Orientation in the Wild), a new large-scale dataset for orientation estimation from a single in-the-wild image.
342	Distilling Image Dehazing With Heterogeneous Task Imitation	Ming Hong; Yuan Xie; Cuihua Li; Yanyun Qu;	In this paper, we propose a knowledge-distill dehazing network which distills image dehazing with the heterogeneous task imitation.
343	Select, Supplement and Focus for RGB-D Saliency Detection	Miao Zhang; Weisong Ren; Yongri Piao; Zhengkun Rong; Huchuan Lu;	In this paper, we propose a new framework for accurate RGB-D saliency detection that takes into account local and global complementarities between the two modalities.
344	Transfer Learning From Synthetic to Real-Noise Denoising With Adaptive Instance Normalization	Yoonsik Kim; Jae Woong Soh; Gu Yong Park; Nam Ik Cho;	In order to cope with various and complex real noise, we propose a well-generalized denoising architecture and a transfer learning scheme.
345	On Joint Estimation of Pose, Geometry and svBRDF From a Handheld Scanner	Carolin Schmitt; Simon Donne; Gernot Riegler; Vladlen Koltun; Andreas Geiger;	We propose a novel formulation for joint recovery of camera pose, object geometry and spatially-varying BRDF.
346	Differentiable Volumetric Rendering: Learning Implicit 3D Representations Without 3D Supervision	Michael Niemeyer; Lars Mescheder; Michael Oechsle; Andreas Geiger;	In this work, we propose a differentiable rendering formulation for implicit shape and texture representations.
347	Meta-Transfer Learning for Zero-Shot Super-Resolution	Jae Woong Soh; Sunwoo Cho; Nam Ik Cho;	In this paper, we present Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR), which leverages ZSSR.
348	Solving Jigsaw Puzzles With Eroded Boundaries	Dov Bridger; Dov Danon; Ayellet Tal;	This paper focuses on a specific variant of the problem: solving puzzles with eroded boundaries.
349	Context-Aware Attention Network for Image-Text Retrieval	Qi Zhang; Zhen Lei; Zhaoxiang Zhang; Stan Z. Li;	In this work, we propose a unified Context-Aware Attention Network (CAAN), which selectively focuses on critical local fragments (regions and words) by aggregating the global context.
350	M-LVC: Multiple Frames Prediction for Learned Video Compression	Jianping Lin; Dong Liu; Houqiang Li; Feng Wu;	We propose an end-to-end learned video compression scheme for low-latency scenarios.
351	Efficient Dynamic Scene Deblurring Using Spatially Variant Deconvolution Network With Optical Flow Guided Training	Yuan Yuan; Wei Su; Dandan Ma;	In this paper, we start from the deblurring deconvolution operation, then design an effective and real-time deblurring network.
352	Single Image Reflection Removal Through Cascaded Refinement	Chao Li; Yixiao Yang; Kun He; Stephen Lin; John E. Hopcroft;	Inspired by iterative structure reduction for hidden community detection in social networks, we propose an Iterative Boost Convolutional LSTM Network (IBCLN) that enables cascaded prediction for reflection removal.
353	From Patches to Pictures (PaQ-2-PiQ): Mapping the Perceptual Space of Picture Quality	Zhenqiang Ying; Haoran Niu; Praful Gupta; Dhruv Mahajan; Deepti Ghadiyaram; Alan Bovik;	To advance progress on this problem, we introduce the largest (by far) subjective picture quality database, containing about 40,000 real-world distorted pictures and 120,000 patches, on which we collected about 4M human judgments of picture quality.
354	Video to Events: Recycling Video Datasets for Event Cameras	Daniel Gehrig; Mathias Gehrig; Javier Hidalgo-Carrio; Davide Scaramuzza;	In this paper, we present a method that addresses these needs by converting any existing video dataset recorded with conventional cameras to synthetic event data.
355	Composed Query Image Retrieval Using Locally Bounded Features	Mehrdad Hosseinzadeh; Yang Wang;	In this paper, we propose a novel method that represents the image using a set of local areas in the image.
356	Spatially-Attentive Patch-Hierarchical Network for Adaptive Motion Deblurring	Maitreya Suin; Kuldeep Purohit; A. N. Rajagopalan;	In this work, we propose an efficient pixel-adaptive and feature-attentive design that handles large blur variations across different spatial locations and processes each test image adaptively.
357	End-to-End Illuminant Estimation Based on Deep Metric Learning	Bolei Xu; Jingxin Liu; Xianxu Hou; Bozhi Liu; Guoping Qiu;	To overcome this problem, we introduce a deep metric learning approach to color constancy named Illuminant-Guided Triplet Network (IGTN).
358	Variational-EM-Based Deep Learning for Noise-Blind Image Deblurring	Yuesong Nan; Yuhui Quan; Hui Ji;	This paper aims at developing a deep learning framework for deblurring images with unknown noise level.
359	Image Demoireing with Learnable Bandpass Filters	Bolun Zheng; Shanxin Yuan; Gregory Slabaugh; Ales Leonardis;	In this paper, we propose a novel multiscale bandpass convolutional neural network (MBCNN) to address this problem.
360	Assessing Image Quality Issues for Real-World Problems	Tai-Yin Chiu; Yinan Zhao; Danna Gurari;	We introduce a new large-scale dataset that links the assessment of image quality issues to two practical vision tasks: image captioning and visual question answering.
361	Memory-Efficient Hierarchical Neural Architecture Search for Image Denoising	Haokui Zhang; Ying Li; Hao Chen; Chunhua Shen;	In this paper, we propose HiNAS (Hierarchical NAS), an effort towards employing NAS to automatically design effective neural network architectures for image denoising.
362	Blindly Assess Image Quality in the Wild Guided by a Self-Adaptive Hyper Network	Shaolin Su; Qingsen Yan; Yu Zhu; Cheng Zhang; Xin Ge; Jinqiu Sun; Yanning Zhang;	To deal with the challenge, we propose a self-adaptive hyper network architecture to blindly assess image quality in the wild.
363	Perceptual Quality Assessment of Smartphone Photography	Yuming Fang; Hanwei Zhu; Yan Zeng; Kede Ma; Zhou Wang;	We introduce the Smartphone Photography Attribute and Quality (SPAQ) database, consisting of 11,125 pictures taken by 66 smartphones, each of which comes with the richest annotations to date.
364	Don’t Hit Me! Glass Detection in Real-World Scenes	Haiyang Mei; Xin Yang; Yang Wang; Yuanyuan Liu; Shengfeng He; Qiang Zhang; Xiaopeng Wei; Rynson W.H. Lau;	In this paper, we address the important problem of detecting glass from a single RGB image.
365	Progressive Mirror Detection	Jiaying Lin; Guodong Wang; Rynson W.H. Lau;	Hence, we propose a model in this paper to progressively learn the content similarity between the inside and outside of the mirror while explicitly detecting the mirror edges.
366	Category-Level Articulated Object Pose Estimation	Xiaolong Li; He Wang; Li Yi; Leonidas J. Guibas; A. Lynn Abbott; Shuran Song;	We present a novel category-level approach that correctly accommodates object instances previously unseen during training.
367	Unbiased Scene Graph Generation From Biased Training	Kaihua Tang; Yulei Niu; Jianqiang Huang; Jiaxin Shi; Hanwang Zhang;	In this paper, we present a novel SGG framework based on causal inference rather than the conventional likelihood.
368	Dynamic Graph Message Passing Networks	Li Zhang; Dan Xu; Anurag Arnab; Philip H.S. Torr;	We propose a dynamic graph message passing network that significantly reduces the computational complexity compared to related works modelling a fully-connected graph.
369	Weakly Supervised Visual Semantic Parsing	Alireza Zareian; Svebor Karaman; Shih-Fu Chang;	In this paper, we address those two limitations by first proposing a generalized formulation of SGG, namely Visual Semantic Parsing, which disentangles entity and predicate recognition, and enables sub-quadratic performance.
370	GPS-Net: Graph Property Sensing Network for Scene Graph Generation	Xin Lin; Changxing Ding; Jinquan Zeng; Dacheng Tao;	Accordingly, in this paper, we propose a Graph Property Sensing Network (GPS-Net) that fully explores these three properties for SGG.
371	End-to-End Optimization of Scene Layout	Andrew Luo; Zhoutong Zhang; Jiajun Wu; Joshua B. Tenenbaum;	We propose an end-to-end variational generative model for scene layout synthesis conditioned on scene graphs.
372	Unsupervised Intra-Domain Adaptation for Semantic Segmentation Through Self-Supervision	Fei Pan; Inkyu Shin; Francois Rameau; Seokju Lee; In So Kweon;	In this work, we propose a two-step self-supervised domain adaptation approach to minimize the inter-domain and intra-domain gap together.
373	Dual Super-Resolution Learning for Semantic Segmentation	Li Wang; Dong Li; Yousong Zhu; Lu Tian; Yi Shan;	In this paper, we propose a simple and flexible two-stream framework named Dual Super-Resolution Learning (DSRL) to effectively improve the segmentation accuracy without introducing extra computation costs.
374	Self-Supervised Scene De-Occlusion	Xiaohang Zhan; Xingang Pan; Bo Dai; Ziwei Liu; Dahua Lin; Chen Change Loy;	In this paper, we investigate the problem of scene de-occlusion, which aims to recover the underlying occlusion ordering and complete the invisible parts of occluded objects.
375	BANet: Bidirectional Aggregation Network With Occlusion Handling for Panoptic Segmentation	Yifeng Chen; Guangchen Lin; Songyuan Li; Omar Bourahla; Yiming Wu; Fangfang Wang; Junyi Feng; Mingliang Xu; Xi Li;	Motivated by these observations, we propose a novel deep panoptic segmentation scheme based on a bidirectional learning pipeline.
376	CPR-GCN: Conditional Partial-Residual Graph Convolutional Network in Automated Anatomical Labeling of Coronary Arteries	Han Yang; Xingjian Zhen; Ying Chi; Lei Zhang; Xian-Sheng Hua;	Motivated by the wide application of the graph neural network in structured data, in this paper, we propose a conditional partial-residual graph convolutional network (CPR-GCN), which takes both position and CT image into consideration, since CT image contains abundant information such as branch size and spanning direction.
377	Cross-View Correspondence Reasoning Based on Bipartite Graph Convolutional Network for Mammogram Mass Detection	Yuhang Liu; Fandong Zhang; Qianyi Zhang; Siwen Wang; Yizhou Wang; Yizhou Yu;	In this paper, we introduce bipartite graph convolutional network to endow existing methods with cross-view reasoning ability of radiologists in mammogram mass detection.
378	MPM: Joint Representation of Motion and Position Map for Cell Tracking	Junya Hayashida; Kazuya Nishimura; Ryoma Bise;	In this paper, we propose the Motion and Position Map (MPM) that jointly represents both detection and association for not only migration but also cell division.
379	Deep Distance Transform for Tubular Structure Segmentation in CT Scans	Yan Wang; Xu Wei; Fengze Liu; Jieneng Chen; Yuyin Zhou; Wei Shen; Elliot K. Fishman; Alan L. Yuille;	Inspired by this, we propose a geometry-aware tubular structure segmentation method, Deep Distance Transform (DDT), which combines intuitions from the classical distance transform for skeletonization and modern deep segmentation networks.
380	Instance Segmentation of Biological Images Using Harmonic Embeddings	Victor Kulikov; Victor Lempitsky;	We present a new instance segmentation approach tailored to biological images, where instances may correspond to individual cells, organisms or plant parts.
381	Multi-scale Domain-adversarial Multiple-instance CNN for Cancer Subtype Classification with Unannotated Histopathological Images	Noriaki Hashimoto; Daisuke Fukushima; Ryoichi Koga; Yusuke Takagi; Kaho Ko; Kei Kohno; Masato Nakaguro; Shigeo Nakamura; Hidekata Hontani; Ichiro Takeuchi;	We propose a new method for cancer subtype classification from histopathological images, which can automatically detect tumor-specific features in a given whole slide image (WSI).
382	SOS: Selective Objective Switch for Rapid Immunofluorescence Whole Slide Image Classification	Sam Maksoud; Kun Zhao; Peter Hobson; Anthony Jennings; Brian C. Lovell;	In this paper, we demonstrate that conventional patch-based processing is redundant for certain WSI classification tasks where high resolution is only required in a minority of cases.
383	Task Agnostic Robust Learning on Corrupt Outputs by Correlation-Guided Mixture Density Networks	Sungjoon Choi; Sanghoon Hong; Kyungjae Lee; Sungbin Lim;	In this paper, we focus on weakly supervised learning with noisy training data for both classification and regression problems.
384	METAL: Minimum Effort Temporal Activity Localization in Untrimmed Videos	Da Zhang; Xiyang Dai; Yuan-Fang Wang;	Towards this objective, we propose a novel Similarity Pyramid Network (SPN) that adopts the few-shot learning technique of Relation Network and directly encodes hierarchical multi-scale correlations, which we learn by optimizing two complementary loss functions in an end-to-end manner.
385	Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data	Xi Yan; David Acuna; Sanja Fidler;	We introduce Neural Data Server (NDS), a large-scale search engine for finding the most useful transfer learning data for the target domain.
386	Revisiting Knowledge Distillation via Label Smoothing Regularization	Li Yuan; Francis EH Tay; Guilin Li; Tao Wang; Jiashi Feng;	In this work, we challenge this common belief with the following experimental observations: 1) beyond the acknowledgment that the teacher can improve the student, the student can also enhance the teacher significantly by reversing the KD procedure; 2) a poorly-trained teacher with much lower accuracy than the student can still improve the latter significantly.
387	WCP: Worst-Case Perturbations for Semi-Supervised Deep Learning	Liheng Zhang; Guo-Jun Qi;	In this paper, we present a novel regularization mechanism for training deep networks by minimizing the Worst-Case Perturbation (WCP).
388	DEPARA: Deep Attribution Graph for Deep Knowledge Transferability	Jie Song; Yixin Chen; Jingwen Ye; Xinchao Wang; Chengchao Shen; Feng Mao; Mingli Song;	In this paper, we propose the DEeP Attribution gRAph (DEPARA) to investigate the transferability of knowledge learned from PR-DNNs.
389	Conditional Channel Gated Networks for Task-Aware Continual Learning	Davide Abati; Jakub Tomczak; Tijmen Blankevoort; Simone Calderara; Rita Cucchiara; Babak Ehteshami Bejnordi;	In this work, we introduce a novel framework to tackle this problem with conditional computation.
390	Towards Discriminability and Diversity: Batch Nuclear-Norm Maximization Under Label Insufficient Situations	Shuhao Cui; Shuhui Wang; Junbao Zhuo; Liang Li; Qingming Huang; Qi Tian;	Accordingly, to improve both discriminability and diversity, we propose Batch Nuclear-norm Maximization (BNM) on the output matrix.
391	FocalMix: Semi-Supervised Learning for 3D Medical Image Detection	Dong Wang; Yuan Zhang; Kexin Zhang; Liwei Wang;	In this paper, we propose a novel method, called FocalMix, which, to the best of our knowledge, is the first to leverage recent advances in semi-supervised learning (SSL) for 3D medical image detection.
392	Learning 3D Semantic Scene Graphs From 3D Indoor Reconstructions	Johanna Wald; Helisa Dhamo; Nassir Navab; Federico Tombari;	In our work we focus on scene graphs, a data structure that organizes the entities of a scene in a graph, where objects are nodes and their relationships modeled as edges.
393	Self-Supervised Viewpoint Learning From Image Collections	Siva Karthik Mustikovela; Varun Jampani; Shalini De Mello; Sifei Liu; Umar Iqbal; Carsten Rother; Jan Kautz;	We propose a novel learning framework which incorporates an analysis-by-synthesis paradigm to reconstruct images in a viewpoint aware manner with a generative network, along with symmetry and adversarial constraints to successfully supervise our viewpoint estimation network.
394	Two-Shot Spatially-Varying BRDF and Shape Estimation	Mark Boss; Varun Jampani; Kihwan Kim; Hendrik P.A. Lensch; Jan Kautz;	We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
395	Variational Context-Deformable ConvNets for Indoor Scene Parsing	Zhitong Xiong; Yuan Yuan; Nianhui Guo; Qi Wang;	Thus, in this paper, we propose a novel variational context-deformable (VCD) module to learn adaptive receptive fields in a structured fashion.
396	Strip Pooling: Rethinking Spatial Pooling for Scene Parsing	Qibin Hou; Li Zhang; Ming-Ming Cheng; Jiashi Feng;	In this paper, beyond conventional spatial pooling that usually has a regular shape of NxN, we rethink the formulation of spatial pooling by introducing a new pooling strategy, called strip pooling, which considers a long but narrow kernel, i.e., 1xN or Nx1.
397	Few-Shot Object Detection With Attention-RPN and Multi-Relation Detector	Qi Fan; Wei Zhuo; Chi-Keung Tang; Yu-Wing Tai;	In this paper, we propose a novel few-shot object detection network that aims at detecting objects of unseen categories with only a few annotated examples.
398	What Can Be Transferred: Unsupervised Domain Adaptation for Endoscopic Lesions Segmentation	Jiahua Dong; Yang Cong; Gan Sun; Bineng Zhong; Xiaowei Xu;	To address these challenges, we develop a new unsupervised semantic transfer model including two complementary modules (i.e., T_D and T_F) for endoscopic lesions segmentation, which alternately determine where and how to explore transferable domain-invariant knowledge between the labeled source lesions dataset (e.g., gastroscope) and the unlabeled target diseases dataset (e.g., enteroscopy).
399	ADINet: Attribute Driven Incremental Network for Retinal Image Classification	Qier Meng; Satoh Shin’ichi;	In this paper, we design a framework named "Attribute Driven Incremental Network" (ADINet), a new architecture that integrates class label prediction and attribute prediction into an incremental learning framework to boost the classification performance.
400	Unsupervised Domain Adaptation With Hierarchical Gradient Synchronization	Lanqing Hu; Meina Kan; Shiguang Shan; Xilin Chen;	Inspired by this, we propose a novel method called Hierarchical Gradient Synchronization to model the synchronization relationship among the local distribution pieces and global distribution, aiming for more precise domain-invariant features.
401	Deep Grouping Model for Unified Perceptual Parsing	Zhiheng Li; Wenxuan Bao; Jiayang Zheng; Chenliang Xu;	To overcome these challenges, we propose a deep grouping model (DGM) that tightly marries the two types of representations and defines a bottom-up and a top-down process for feature exchanging.
402	Where Am I Looking At? Joint Location and Orientation Estimation by Cross-View Matching	Yujiao Shi; Xin Yu; Dylan Campbell; Hongdong Li;	Therefore, we design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization.
403	Gum-Net: Unsupervised Geometric Matching for Fast and Accurate 3D Subtomogram Image Alignment and Averaging	Xiangrui Zeng; Min Xu;	We propose a Geometric unsupervised matching Network (Gum-Net) for finding the geometric correspondence between two images with application to 3D subtomogram alignment and averaging.
404	FDA: Fourier Domain Adaptation for Semantic Segmentation	Yanchao Yang; Stefano Soatto;	We describe a simple method for unsupervised domain adaptation, whereby the discrepancy between the source and target distributions is reduced by swapping the low-frequency spectrum of one with the other.
405	Foreground-Aware Relation Network for Geospatial Object Segmentation in High Spatial Resolution Remote Sensing Imagery	Zhuo Zheng; Yanfei Zhong; Junjue Wang; Ailong Ma;	In this paper, we argue that the problems lie in the lack of foreground modeling and propose a foreground-aware relation network (FarSeg) from the perspectives of relation-based and optimization-based foreground modeling, to alleviate the above two problems.
406	When2com: Multi-Agent Perception via Communication Graph Grouping	Yen-Cheng Liu; Junjiao Tian; Nathaniel Glaser; Zsolt Kira;	In this paper, we address the collaborative perception problem, where one agent is required to perform a perception task and can communicate and share information with other agents on the same task.
407	Learning Human-Object Interaction Detection Using Interaction Points	Tiancai Wang; Tong Yang; Martin Danelljan; Fahad Shahbaz Khan; Xiangyu Zhang; Jian Sun;	In this paper, we therefore propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
408	C2FNAS: Coarse-to-Fine Neural Architecture Search for 3D Medical Image Segmentation	Qihang Yu; Dong Yang; Holger Roth; Yutong Bai; Yixiao Zhang; Alan L. Yuille; Daguang Xu;	In this paper, we propose a coarse-to-fine neural architecture search (C2FNAS) to automatically search a 3D segmentation network from scratch without inconsistency on network size or input size.
409	Adaptive Subspaces for Few-Shot Learning	Christian Simon; Piotr Koniusz; Richard Nock; Mehrtash Harandi;	In this paper, we provide a framework for few-shot learning by introducing dynamic classifiers that are constructed from few samples.
410	Learning to Detect Important People in Unlabelled Images for Semi-Supervised Important People Detection	Fa-Ting Hong; Wei-Hong Li; Wei-Shi Zheng;	To overcome this problem, we propose learning important people detection on partially annotated images.
411	Stochastic Sparse Subspace Clustering	Ying Chen; Chun-Guang Li; Chong You;	In particular, we show that dropout is equivalent to adding a squared l_2 norm regularization on the representation coefficients, and therefore induces denser solutions. Then, we reformulate the optimization problem as a consensus problem over a set of small-scale subproblems.
412	CRNet: Cross-Reference Networks for Few-Shot Segmentation	Weide Liu; Chi Zhang; Guosheng Lin; Fayao Liu;	In this paper, we propose a cross-reference network (CRNet) for few-shot segmentation.
413	Shoestring: Graph-Based Semi-Supervised Classification With Severely Limited Labeled Data	Wanyu Lin; Zhaolin Gao; Baochun Li;	To address the problem of semi-supervised learning in the presence of severely limited labeled samples, we propose a new framework, called Shoestring, that incorporates metric learning into the paradigm of graph-based semi-supervised learning.
414	Uninformed Students: Student-Teacher Anomaly Detection With Discriminative Latent Embeddings	Paul Bergmann; Michael Fauser; David Sattlegger; Carsten Steger;	We introduce a powerful student-teacher framework for the challenging problem of unsupervised anomaly detection and pixel-precise anomaly segmentation in high-resolution images.
415	3D Sketch-Aware Semantic Scene Completion via Semi-Supervised Structure Prior	Xiaokang Chen; Kwan-Yee Lin; Chen Qian; Gang Zeng; Hongsheng Li;	In this paper, we devise a new geometry-based strategy to embed depth information with a low-resolution voxel representation that can still encode sufficient geometric information, e.g., room layout and objects’ sizes and shapes, to infer the invisible areas of the scene with well-preserved structural details.
416	Graph-Guided Architecture Search for Real-Time Semantic Segmentation	Peiwen Lin; Peng Sun; Guangliang Cheng; Sirui Xie; Xi Li; Jianping Shi;	In order to release researchers from these tedious mechanical trials, we propose a Graph-guided Architecture Search (GAS) pipeline to automatically search real-time semantic segmentation networks.
417	Composing Good Shots by Exploiting Mutual Relations	Debang Li; Junge Zhang; Kaiqi Huang; Ming-Hsuan Yang;	Motivated by this, we propose a graph-based module with a gated feature update to model the relations between different candidates.
418	Organ at Risk Segmentation for Head and Neck Cancer Using Stratified Learning and Neural Architecture Search	Dazhou Guo; Dakai Jin; Zhuotun Zhu; Tsung-Ying Ho; Adam P. Harrison; Chun-Hung Chao; Jing Xiao; Le Lu;	For such scenarios, insights can be gained from the stratification approaches seen in manual clinical OAR delineation. This is the goal of our work, where we introduce stratified organ at risk segmentation (SOARS), an approach that stratifies OARs into anchor, mid-level, and small & hard (S&H) categories.
419	G2L-Net: Global to Local Network for Real-Time 6D Pose Estimation With Embedding Vector Features	Wei Chen; Xi Jia; Hyung Jin Chang; Jinming Duan; Ales Leonardis;	In this paper, we propose a novel real-time 6D object pose estimation framework, named G2L-Net.
420	Unsupervised Instance Segmentation in Microscopy Images via Panoptic Domain Adaptation and Task Re-Weighting	Dongnan Liu; Donghao Zhang; Yang Song; Fan Zhang; Lauren O’Donnell; Heng Huang; Mei Chen; Weidong Cai;	In this work, we propose a Cycle Consistency Panoptic Domain Adaptive Mask R-CNN (CyC-PDAM) architecture for unsupervised nuclei segmentation in histopathology images, by learning from fluorescence microscopy images.
421	Single-Stage Semantic Segmentation From Image Labels	Nikita Araslanov; Stefan Roth;	In this work, we first define three desirable properties of a weakly supervised method: local consistency, semantic fidelity, and completeness. Using these properties as guidelines, we then develop a segmentation-based network model and a self-supervised training scheme to train for semantic masks from image-level annotations in a single stage.
422	Cascaded Human-Object Interaction Recognition	Tianfei Zhou; Wenguan Wang; Siyuan Qi; Haibin Ling; Jianbing Shen;	Considering the intrinsic complexity of the task, we introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
423	DuDoRNet: Learning a Dual-Domain Recurrent Network for Fast MRI Reconstruction With Deep T1 Prior	Bo Zhou; S. Kevin Zhou;	In this work, we address the above two limitations by proposing a Dual Domain Recurrent Network (DuDoRNet) with deep T1 prior embedded to simultaneously recover k-space and images for accelerating the acquisition of MRI with a long imaging protocol.
424	Learning Integral Objects With Intra-Class Discriminator for Weakly-Supervised Semantic Segmentation	Junsong Fan; Zhaoxiang Zhang; Chunfeng Song; Tieniu Tan;	In this paper, we argue that the critical factor preventing us from obtaining the full object mask is the classification boundary mismatch problem that arises when applying the class activation map (CAM) to weakly supervised semantic segmentation (WSSS).
425	FPConv: Learning Local Flattening for Point Convolution	Yiqun Lin; Zizheng Yan; Haibin Huang; Dong Du; Ligang Liu; Shuguang Cui; Xiaoguang Han;	We introduce FPConv, a novel surface-style convolution operator designed for 3D point cloud analysis.
426	Rotation Equivariant Graph Convolutional Network for Spherical Image Classification	Qin Yang; Chenglin Li; Wenrui Dai; Junni Zou; Guo-Jun Qi; Hongkai Xiong;	In this paper, we generalize the grid-based CNNs to a non-Euclidean space by taking into account the geometry of spherical surfaces and propose a Spherical Graph Convolutional Network (SGCN) to encode rotation equivariant representations.
427	FOAL: Fast Online Adaptive Learning for Cardiac Motion Estimation	Hanchao Yu; Shanhui Sun; Haichao Yu; Xiao Chen; Honghui Shi; Thomas S. Huang; Terrence Chen;	In this context, we propose a novel fast online adaptive learning (FOAL) framework: an online gradient descent based optimizer that is itself optimized by a meta-learner.
428	ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation	Sharon Fogel; Hadar Averbuch-Elor; Sarel Cohen; Shai Mazor; Roee Litman;	We present ScrabbleGAN, a semi-supervised approach to synthesize handwritten text images that are versatile both in style and lexicon.
429	Cross-Domain Semantic Segmentation via Domain-Invariant Interactive Relation Transfer	Fengmao Lv; Tao Liang; Xiang Chen; Guosheng Lin;	In this paper, we propose a new domain adaptation approach, called Pivot Interaction Transfer (PIT).
430	Inflated Episodic Memory With Region Self-Attention for Long-Tailed Visual Recognition	Linchao Zhu; Yi Yang;	To deal with the class imbalance problem, we introduce an Inflated Episodic Memory (IEM) for long-tailed visual recognition.
431	Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View With a Reachability Prior	Osama Makansi; Ozgun Cicek; Kevin Buchicchio; Thomas Brox;	In this paper, we investigate the problem of anticipating future dynamics, particularly the future location of other vehicles and pedestrians, in the view of a moving vehicle.
432	Structure Preserving Generative Cross-Domain Learning	Haifeng Xia; Zhengming Ding;	To this end, we develop a novel Generative cross-domain learning via Structure-Preserving (GSP), which attempts to transform target data into the source domain in order to take advantage of source supervision.
433	Reverse Perspective Network for Perspective-Aware Object Counting	Yifan Yang; Guorong Li; Zhe Wu; Li Su; Qingming Huang; Nicu Sebe;	We propose a reverse perspective network to solve the scale variations of input images, instead of generating perspective maps to smooth final outputs.
434	Multi-Path Region Mining for Weakly Supervised 3D Semantic Segmentation on Point Clouds	Jiacheng Wei; Guosheng Lin; Kim-Hui Yap; Tzu-Yi Hung; Lihua Xie;	In this paper, we propose a weakly supervised approach to predict point-level results using weak labels on 3D point clouds.
435	Reliable Weighted Optimal Transport for Unsupervised Domain Adaptation	Renjun Xu; Pelen Liu; Liyan Wang; Chao Chen; Jindong Wang;	In this paper, we present Reliable Weighted Optimal Transport (RWOT) for unsupervised domain adaptation, including a novel Shrinking Subspace Reliability (SSR) and a weighted optimal transport strategy.
436	ImVoteNet: Boosting 3D Object Detection in Point Clouds With Image Votes	Charles R. Qi; Xinlei Chen; Or Litany; Leonidas J. Guibas;	In this work, we build on top of VoteNet and propose a 3D detection architecture called ImVoteNet specialized for RGB-D scenes.
437	Understanding Road Layout From Videos as a Whole	Buyu Liu; Bingbing Zhuang; Samuel Schulter; Pan Ji; Manmohan Chandraker;	In this paper, we address the problem of inferring the layout of complex road scenes from video sequences.
438	Bi-Directional Relationship Inferring Network for Referring Image Segmentation	Zhiwei Hu; Guang Feng; Jiayu Sun; Lihe Zhang; Huchuan Lu;	In this work, we propose a bi-directional relationship inferring network (BRINet) to model the dependencies of cross-modal information.
439	Perspective Plane Program Induction From a Single Image	Yikai Li; Jiayuan Mao; Xiuming Zhang; William T. Freeman; Joshua B. Tenenbaum; Jiajun Wu;	We formulate this problem as jointly finding the camera pose and scene structure that best describe the input image.
440	DeepFLASH: An Efficient Network for Learning-Based Medical Image Registration	Jian Wang; Miaomiao Zhang;	This paper presents DeepFLASH, a novel network with efficient training and inference for learning-based medical image registration.
441	Semi-Supervised Learning for Few-Shot Image-to-Image Translation	Yaxing Wang; Salman Khan; Abel Gonzalez-Garcia; Joost van de Weijer; Fahad Shahbaz Khan;	In this work, we go one step further and reduce the amount of required labeled data also from the source domain during training.
442	Semantic Correspondence as an Optimal Transport Problem	Yanbin Liu; Linchao Zhu; Makoto Yamada; Yi Yang;	The whole procedure is combined into a unified optimal transport algorithm by converting the maximization problem to the optimal transport formulation and incorporating the staircase weights into optimal transport algorithm to act as empirical distributions.
443	How Much Time Do You Have? Modeling Multi-Duration Saliency	Camilo Fosco; Anelise Newman; Pat Sukhum; Yun Bin Zhang; Nanxuan Zhao; Aude Oliva; Zoya Bylinskii;	In this paper we propose to capture gaze as a series of snapshots, by generating population-level saliency heatmaps for multiple viewing durations. We collect the CodeCharts1K dataset, which contains multiple distinct heatmaps per image corresponding to 0.5, 3, and 5 seconds of free-viewing.
444	Fine-Grained Generalized Zero-Shot Learning via Dense Attribute-Based Attention	Dat Huynh; Ehsan Elhamifar;	Instead of aligning a global feature vector of an image with its associated class semantic vector, we propose an attribute embedding technique that aligns each attribute-based feature with its attribute semantic vector.
445	Online Depth Learning Against Forgetting in Monocular Videos	Zhenyu Zhang; Stephane Lathuiliere; Elisa Ricci; Nicu Sebe; Yan Yan; Jian Yang;	Specifically, to adapt to temporally continuous depth patterns in videos, we introduce a novel meta-learning approach that learns adapter modules by incorporating the online adaptation process into the learning objective.
446	Few-Shot Learning of Part-Specific Probability Space for 3D Shape Segmentation	Lingjing Wang; Xiang Li; Yi Fang;	In comparison, we propose a novel 3D shape segmentation method that requires little labeled data for training.
447	Pattern-Structure Diffusion for Multi-Task Learning	Ling Zhou; Zhen Cui; Chunyan Xu; Zhenyu Zhang; Chaoqun Wang; Tong Zhang; Jian Yang;	Inspired by the observation that pattern structures recur frequently both within a task and across tasks, we propose a pattern-structure diffusion (PSD) framework to mine and propagate task-specific and cross-task pattern structures in the task-level space for joint depth estimation, segmentation, and surface normal prediction.
448	Training Noise-Robust Deep Neural Networks via Meta-Learning	Zhen Wang; Guosheng Hu; Qinghua Hu;	In this work, we propose a new loss correction approach, named Meta Loss Correction (MLC), to directly learn the noise transition matrix T from data via the meta-learning framework.
449	Fusion-Aware Point Convolution for Online Semantic 3D Scene Segmentation	Jiazhao Zhang; Chenyang Zhu; Lintao Zheng; Kai Xu;	We propose a novel fusion-aware 3D point convolution which operates directly on the geometric surface being reconstructed and effectively exploits the inter-frame correlation for high-quality 3D feature learning.
450	Universal Source-Free Domain Adaptation	Jogendra Nath Kundu; Naveen Venkat; Rahul M V; R. Venkatesh Babu;	Free of such impractical assumptions, we propose a novel two-stage learning process.
451	Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction	Beibei Jin; Yu Hu; Qiankun Tang; Jingyu Niu; Zhiping Shi; Yinhe Han; Xiaowei Li;	Inspired by the frequency band decomposition characteristic of the Human Visual System (HVS), we propose a video prediction network based on multi-level wavelet analysis to uniformly deal with spatial and temporal information.
452	Varicolored Image De-Hazing	Akshay Dudhane; Kuldeep M. Biradar; Prashant W. Patil; Praful Hambarde; Subrahmanyam Murala;	In this paper, we propose a varicolored end-to-end image de-hazing network which restores the color balance in a given varicolored hazy image and recovers the haze-free image.
453	SpSequenceNet: Semantic Segmentation Network on 4D Point Clouds	Hanyu Shi; Guosheng Lin; Hao Wang; Tzu-Yi Hung; Zhenhua Wang;	In this paper, we propose SpSequenceNet to address semantic segmentation on 4D point clouds.
454	Separating Particulate Matter From a Single Microscopic Image	Tushar Sandhan; Jin Young Choi;	In this work, we thoroughly analyze the physical properties of particulate matter (PM) and of the microscope, as well as their inevitable interaction, and propose an optimization scheme that removes the PM from a high-resolution microscopic image within a few seconds.
455	Adaptive Dilated Network With Self-Correction Supervision for Counting	Shuai Bai; Zhiqun He; Yu Qiao; Hanzhe Hu; Wei Wu; Junjie Yan;	In this paper, we propose an adaptive dilated convolution and a novel supervised learning framework named self-correction (SC) supervision.
456	PointPainting: Sequential Fusion for 3D Object Detection	Sourabh Vora; Alex H. Lang; Bassam Helou; Oscar Beijbom;	Since camera-lidar fusion methods have lagged behind lidar-only methods on the main benchmark datasets, we propose PointPainting: a sequential fusion method to fill this gap.
457	Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications	Biagio Brattoli; Joseph Tighe; Fedor Zhdanov; Pietro Perona; Krzysztof Chalupka;	We propose the first end-to-end algorithm for zero-shot learning (ZSL) in video classification.
458	Learning to Select Base Classes for Few-Shot Classification	Linjun Zhou; Peng Cui; Xu Jia; Shiqiang Yang; Qi Tian;	In this paper, we utilize a simple yet effective measure, the Similarity Ratio, as an indicator for the generalization performance of a few-shot model. We then formulate the base class selection problem as a submodular optimization problem over Similarity Ratio.
459	CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus	Florian Kluger; Eric Brachmann; Hanno Ackermann; Carsten Rother; Michael Ying Yang; Bodo Rosenhahn;	We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements.
460	Fast Symmetric Diffeomorphic Image Registration with Convolutional Neural Networks	Tony C.W. Mok; Albert C.S. Chung;	In this paper, we present a novel, efficient unsupervised symmetric image registration method which maximizes the similarity between images within the space of diffeomorphic maps and estimates both forward and inverse transformations simultaneously.
461	Distilled Semantics for Comprehensive Scene Understanding from Videos	Fabio Tosi; Filippo Aleotti; Pierluigi Zama Ramirez; Matteo Poggi; Samuele Salti; Luigi Di Stefano; Stefano Mattoccia;	In this paper, we take an additional step toward holistic scene understanding with monocular cameras by learning depth and motion alongside semantics, with supervision for the latter provided by a pre-trained network distilling proxy ground truth images.
462	Modeling Biological Immunity to Adversarial Examples	Edward Kim; Jocelyn Rego; Yijing Watkins; Garrett T. Kenyon;	In this work, we explored this gap through the lens of biology and neuroscience in order to understand the robustness exhibited in human perception.
463	DOA-GAN: Dual-Order Attentive Generative Adversarial Network for Image Copy-Move Forgery Detection and Localization	Ashraful Islam; Chengjiang Long; Arslan Basharat; Anthony Hoogs;	In this paper, we propose a Generative Adversarial Network with a dual-order attention model to detect and localize copy-move forgeries.
464	Correspondence-Free Material Reconstruction using Sparse Surface Constraints	Sebastian Weiss; Robert Maier; Daniel Cremers; Rudiger Westermann; Nils Thuerey;	We present a method to infer physical material parameters, and even external boundaries, from the scanned motion of a homogeneous deformable object via the solution of an inverse problem.
465	Augmenting Colonoscopy Using Extended and Directional CycleGAN for Lossy Image Translation	Shawn Mathew; Saad Nadeem; Sruti Kumari; Arie Kaufman;	In this paper, we present a deep learning framework, Extended and Directional CycleGAN, for lossy unpaired image-to-image translation between optical colonoscopy (OC) and virtual colonoscopy (VC), to augment OC video sequences with scale-consistent depth information from VC, and VC with patient-specific textures, color and specular highlights from OC (e.g. for realistic polyp synthesis).
466	Attention Scaling for Crowd Counting	Xiaoheng Jiang; Li Zhang; Mingliang Xu; Tianzhu Zhang; Pei Lv; Bing Zhou; Xin Yang; Yanwei Pang;	To overcome this problem, we propose an approach to alleviate the counting performance differences in different regions.
467	Shape Reconstruction by Learning Differentiable Surface Representations	Jan Bednarik; Shaifali Parashar; Erhan Gundogdu; Mathieu Salzmann; Pascal Fua;	In this paper, we show that we can exploit the inherent differentiability of deep networks to leverage differential surface properties during training so as to prevent patch collapse and strongly reduce patch overlap.
468	A Spatiotemporal Volumetric Interpolation Network for 4D Dynamic Medical Image	Yuyu Guo; Lei Bi; Euijoon Ahn; Dagan Feng; Qian Wang; Jinman Kim;	In this paper, we present a spatiotemporal volumetric interpolation network (SVIN) designed for 4D dynamic medical images.
469	Attention-Based Context Aware Reasoning for Situation Recognition	Thilini Cooray; Ngai-Man Cheung; Wei Lu;	Inspired by the success achieved by query-based visual reasoning (e.g., Visual Question Answering), we propose to address semantic role prediction as a query-based visual reasoning problem.
470	PatchVAE: Learning Local Latent Codes for Recognition	Kamal Gupta; Saurabh Singh; Abhinav Shrivastava;	Drawing inspiration from the mid-level representation discovery work, we propose PatchVAE, which reasons about images at patch level.
471	Self-Supervised Monocular Trained Depth Estimation Using Self-Attention and Discrete Disparity Volume	Adrian Johnston; Gustavo Carneiro;	In this paper, we propose two new ideas to improve self-supervised monocular trained depth estimation: 1) self-attention, and 2) discrete disparity prediction.
472	STAViS: Spatio-Temporal AudioVisual Saliency Network	Antigoni Tsiami; Petros Koutras; Petros Maragos;	We introduce STAViS, a spatio-temporal audiovisual saliency network that combines spatio-temporal visual and auditory information in order to efficiently address the problem of saliency estimation in videos.
473	More Grounded Image Captioning by Distilling Image-Text Matching Model	Yuanen Zhou; Meng Wang; Daqing Liu; Zhenzhen Hu; Hanwang Zhang;	To this end, we propose POS-SCAN, a Part-of-Speech (POS) enhanced version of the SCAN image-text matching model, and use it as an effective source of knowledge distillation for more grounded image captioning.
474	DUNIT: Detection-Based Unsupervised Image-to-Image Translation	Deblina Bhattacharjee; Seungryong Kim; Guillaume Vizier; Mathieu Salzmann;	In this paper, we introduce a Detection-based Unsupervised Image-to-image Translation (DUNIT) approach that explicitly accounts for the object instances in the translation process.
475	Learning to Observe: Approximating Human Perceptual Thresholds for Detection of Suprathreshold Image Transformations	Alan Dolhasz; Carlo Harvey; Ian Williams;	In this paper, we propose to directly approximate the perceptual function performed by human observers completing a visual detection task.
476	Show, Edit and Tell: A Framework for Editing Image Captions	Fawaz Sammani; Luke Melas-Kyriazi;	This paper proposes a novel approach to image captioning based on iterative adaptive refinement of an existing caption.
477	Structure Boundary Preserving Segmentation for Medical Image With Ambiguous Boundary	Hong Joo Lee; Jung Uk Kim; Sangmin Lee; Hak Gu Kim; Yong Man Ro;	In this paper, we propose a novel image segmentation method to tackle two critical problems in medical images: (i) the ambiguity of structure boundaries in the medical image domain and (ii) the uncertainty of the segmented region without specialized domain knowledge.
478	Predicting Cognitive Declines Using Longitudinally Enriched Representations for Imaging Biomarkers	Lyujian Lu; Hua Wang; Saad Elbeleidy; Feiping Nie;	To tackle this problem, in this paper we propose a novel formulation to learn an enriched representation for imaging biomarkers that can simultaneously capture both the information conveyed by baseline neuroimaging records and that by progressive variations of varied counts of available follow-up records over time.
479	Predicting Lymph Node Metastasis Using Histopathological Images Based on Multiple Instance Learning With Deep Graph Convolution	Yu Zhao; Fan Yang; Yuqi Fang; Hailing Liu; Niyun Zhou; Jun Zhang; Jiarui Sun; Sen Yang; Bjoern Menze; Xinjuan Fan; Jianhua Yao;	In this paper, we propose a multiple instance learning method based on deep graph convolutional network and feature selection (FS-GCN-MIL) for histopathological image classification.
480	Extremely Dense Point Correspondences Using a Learned Feature Descriptor	Xingtong Liu; Yiping Zheng; Benjamin Killeen; Masaru Ishii; Gregory D. Hager; Russell H. Taylor; Mathias Unberath;	In this work, we present an effective self-supervised training scheme and novel loss design for dense descriptor learning.
481	Local Deep Implicit Functions for 3D Shape	Kyle Genova; Forrester Cole; Avneesh Sud; Aaron Sarna; Thomas Funkhouser;	Towards this end, we introduce Local Deep Implicit Functions (LDIF), a 3D shape representation that decomposes space into a structured set of learned implicit functions.
482	PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation	Li Jiang; Hengshuang Zhao; Shaoshuai Shi; Shu Liu; Chi-Wing Fu; Jiaya Jia;	In this paper, we present PointGroup, a new end-to-end bottom-up architecture, specifically focused on better grouping the points by exploring the void space between objects.
483	Cost Volume Pyramid Based Depth Inference for Multi-View Stereo	Jiayu Yang; Wei Mao; Jose M. Alvarez; Miaomiao Liu;	We propose a cost volume-based neural network for depth inference from multi-view images.
484	RoutedFusion: Learning Real-Time Depth Map Fusion	Silvan Weder; Johannes Schonberger; Marc Pollefeys; Martin R. Oswald;	To this end, we present a novel real-time capable machine learning-based method for depth map fusion.
485	VOLDOR: Visual Odometry From Log-Logistic Dense Optical Flow Residuals	Zhixiang Min; Yiding Yang; Enrique Dunn;	We propose a dense indirect visual odometry method taking as input externally estimated optical flow fields instead of hand-crafted feature correspondences.
486	Learning to Optimize Non-Rigid Tracking	Yang Li; Aljaz Bozic; Tianwei Zhang; Yanli Ji; Tatsuya Harada; Matthias Niessner;	In this paper, we employ learnable optimizations to improve tracking robustness and speed up solver convergence.
487	KFNet: Learning Temporal Camera Relocalization Using Kalman Filtering	Lei Zhou; Zixin Luo; Tianwei Shen; Jiahui Zhang; Mingmin Zhen; Yao Yao; Tian Fang; Long Quan;	In this work, we improve the temporal relocalization method by using a network architecture that incorporates Kalman filtering (KFNet) for online camera relocalization.
488	Information-Driven Direct RGB-D Odometry	Alejandro Fontan; Javier Civera; Rudolph Triebel;	This paper presents an information-theoretic approach to point selection in direct RGB-D odometry.
489	SuperGlue: Learning Feature Matching With Graph Neural Networks	Paul-Edouard Sarlin; Daniel DeTone; Tomasz Malisiewicz; Andrew Rabinovich;	This paper introduces SuperGlue, a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points.
490	Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task	Aritra Bhowmik; Stefan Gumhold; Carsten Rother; Eric Brachmann;	We propose a new training methodology which embeds the feature detector in a complete vision pipeline, and where the learnable parameters are trained in an end-to-end fashion.
491	ReDA:Reinforced Differentiable Attribute for 3D Face Reconstruction	Wenbin Zhu; HsiangTao Wu; Zeyu Chen; Noranart Vesdapunt; Baoyuan Wang;	To further reduce the ambiguities, we present a novel framework called "Reinforced Differentiable Attributes" ("ReDA") which is more general and effective than previous Differentiable Rendering ("DR").
492	EventCap: Monocular 3D Capture of High-Speed Human Motions Using an Event Camera	Lan Xu; Weipeng Xu; Vladislav Golyanik; Marc Habermann; Lu Fang; Christian Theobalt;	In this paper, we propose EventCap, the first approach for 3D capture of high-speed human motions using a single event camera.
493	Cross-Modal Deep Face Normals With Deactivable Skip Connections	Victoria Fernandez Abrevaya; Adnane Boukhayma; Philip H.S. Torr; Edmond Boyer;	We present an approach for estimating surface normals from in-the-wild color images of faces.
494	Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild	Dominik Kulon; Riza Alp Guler; Iasonas Kokkinos; Michael M. Bronstein; Stefanos Zafeiriou;	We introduce a simple and effective network architecture for monocular 3D hand pose estimation consisting of an image encoder followed by a mesh convolutional decoder that is trained through a direct 3D hand mesh reconstruction loss.
495	Face X-Ray for More General Face Forgery Detection	Lingzhi Li; Jianmin Bao; Ting Zhang; Hao Yang; Dong Chen; Fang Wen; Baining Guo;	In this paper we propose a novel image representation called face X-ray for detecting forgery in face images.
496	A Morphable Face Albedo Model	William A. P. Smith; Alassane Seck; Hannah Dee; Bernard Tiddeman; Joshua B. Tenenbaum; Bernhard Egger;	In this paper, we bring together two divergent strands of research: photometric face capture and statistical 3D face appearance modelling.
497	Cascade EF-GAN: Progressive Facial Expression Editing With Local Focuses	Rongliang Wu; Gongjie Zhang; Shijian Lu; Tao Chen;	To address these limitations, we propose Cascade Expression Focal GAN (Cascade EF-GAN), a novel network that performs progressive facial expression editing with local expression focuses.
498	GanHand: Predicting Human Grasp Affordances in Multi-Object Scenes	Enric Corona; Albert Pumarola; Guillem Alenya; Francesc Moreno-Noguer; Gregory Rogez;	To this end, we introduce a generative model that jointly reasons in all these levels and 1) regresses the 3D shape and pose of the objects in the scene; 2) estimates the grasp types; and 3) refines the 51 DoF of a 3D hand model to minimize a graspability loss.
499	Deep Spatial Gradient and Temporal Depth Learning for Face Anti-Spoofing	Zezheng Wang; Zitong Yu; Chenxu Zhao; Xiangyu Zhu; Yunxiao Qin; Qiusheng Zhou; Feng Zhou; Zhen Lei;	In contrast, we design a new approach to detect presentation attacks from multiple frames based on two insights: 1) detailed discriminative clues (e.g., spatial gradient magnitude) between live and spoofing faces may be discarded through stacked vanilla convolutions, and 2) the dynamics of 3D moving faces provide important clues in detecting the spoofing faces.
500	DeepCap: Monocular Human Performance Capture Using Weak Supervision	Marc Habermann; Weipeng Xu; Michael Zollhofer; Gerard Pons-Moll; Christian Theobalt;	We propose a novel deep learning approach for monocular dense human performance capture.
501	Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction	Ruixu Liu; Ju Shen; He Wang; Chen Chen; Sen-ching Cheung; Vijayan Asari;	We propose a novel attention-based framework for 3D human pose estimation from a monocular video.
502	Advancing High Fidelity Identity Swapping for Forgery Detection	Lingzhi Li; Jianmin Bao; Hao Yang; Dong Chen; Fang Wen;	In this work, we study various existing benchmarks for deepfake detection research.
503	Controllable Person Image Synthesis With Attribute-Decomposed GAN	Yifang Men; Yiming Mao; Yuning Jiang; Wei-Ying Ma; Zhouhui Lian;	This paper introduces the Attribute-Decomposed GAN, a novel generative model for controllable person image synthesis, which can produce realistic person images with desired human attributes (e.g., pose, head, upper clothes and pants) provided in various source inputs.
504	Attentive Normalization for Conditional Image Generation	Yi Wang; Ying-Cong Chen; Xiangyu Zhang; Jian Sun; Jiaya Jia;	In this paper, we characterize long-range dependence with attentive normalization (AN), which is an extension to traditional instance normalization.
505	SEAN: Image Synthesis With Semantic Region-Adaptive Normalization	Peihao Zhu; Rameen Abdal; Yipeng Qin; Peter Wonka;	We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image.
506	Blurry Video Frame Interpolation	Wang Shen; Wenbo Bao; Guangtao Zhai; Li Chen; Xiongkuo Min; Zhiyong Gao;	In this paper, we propose a blurry video frame interpolation method to reduce motion blur and up-convert frame rate simultaneously.
507	Learning Physics-Guided Face Relighting Under Directional Light	Thomas Nestmeyer; Jean-Francois Lalonde; Iain Matthews; Andreas Lehrmann;	We investigate end-to-end deep learning architectures that both de-light and relight an image of a human face.
508	Disentangled Image Generation Through Structured Noise Injection	Yazeed Alharbi; Peter Wonka;	Instead of following traditional approaches, we propose feeding multiple noise codes through separate fully-connected layers.
509	Cross-Domain Correspondence Learning for Exemplar-Based Image Translation	Pan Zhang; Bo Zhang; Dong Chen; Lu Yuan; Fang Wen;	We present a general framework for exemplar-based image translation, which synthesizes a photo-realistic image from the input in a distinct domain (e.g., semantic segmentation mask, or edge map, or pose keypoints), given an exemplar image.
510	Disentangled and Controllable Face Image Generation via 3D Imitative-Contrastive Learning	Yu Deng; Jiaolong Yang; Dong Chen; Fang Wen; Xin Tong;	We propose an approach for face image generation of virtual people with disentangled, precisely-controllable latent representations for identity of non-existing people, expression, pose, and illumination.
511	Single Image Reflection Removal With Physically-Based Training Images	Soomin Kim; Yuchi Huo; Sung-Eui Yoon;	In this paper, physically based rendering is used for faithfully synthesizing the required training images, and a corresponding network structure and loss term are proposed.
512	SketchyCOCO: Image Generation From Freehand Scene Sketches	Chengying Gao; Qi Liu; Qi Xu; Limin Wang; Jianzhuang Liu; Changqing Zou;	We introduce the first method for automatic image generation from scene-level freehand sketches. We have built a large-scale composite dataset called SketchyCOCO to support and evaluate the solution.
513	Image Based Virtual Try-On Network From Unpaired Data	Assaf Neuberger; Eran Borenstein; Bar Hilleli; Eduard Oks; Sharon Alpert;	This paper presents a new image-based virtual try-on approach (Outfit-VITON) that helps visualize how a composition of clothing items selected from various reference images forms a cohesive outfit on a person in a query image.
514	PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer	Wentao Jiang; Si Liu; Chen Gao; Jie Cao; Ran He; Jiashi Feng; Shuicheng Yan;	In this paper, we address the makeup transfer task, which aims to transfer the makeup from a reference image to a source image.
515	RetinaFace: Single-Shot Multi-Level Face Localisation in the Wild	Jiankang Deng; Jia Guo; Evangelos Ververas; Irene Kotsia; Stefanos Zafeiriou;	In this paper, we present a novel single-shot, multi-level face localisation method, named RetinaFace, which unifies face box prediction, 2D facial landmark localisation and 3D vertices regression under one common target: point regression on the image plane.
516	Semantic Image Manipulation Using Scene Graphs	Helisa Dhamo; Azade Farshad; Iro Laina; Nassir Navab; Gregory D. Hager; Federico Tombari; Christian Rupprecht;	Our goal is to encode image information in a given constellation and from there on generate new constellations, such as replacing objects or even changing relationships between objects, while respecting the semantics and style from the original image.
517	A Stochastic Conditioning Scheme for Diverse Human Motion Prediction	Sadegh Aliakbarian; Fatemeh Sadat Saleh; Mathieu Salzmann; Lars Petersson; Stephen Gould;	Alternatively, in this paper, we propose to stochastically combine the root of variations with previous pose information, so as to force the model to take the noise into account.
518	Transferring Dense Pose to Proximal Animal Classes	Artsiom Sanakoyeu; Vasil Khalidov; Maureen S. McCarthy; Andrea Vedaldi; Natalia Neverova;	We show that, at least for proximal animal classes such as chimpanzees, it is possible to transfer the knowledge existing in dense pose recognition for humans, as well as in more general object detectors and segmenters, to the problem of dense pose recognition in other classes.
519	Weakly-Supervised 3D Human Pose Learning via Multi-View Images in the Wild	Umar Iqbal; Pavlo Molchanov; Jan Kautz;	We propose a novel end-to-end learning framework that enables weakly-supervised training using multi-view consistency.
520	VIBE: Video Inference for Human Body Pose and Shape Estimation	Muhammed Kocabas; Nikos Athanasiou; Michael J. Black;	To address this problem, we propose "Video Inference for Body Pose and Shape Estimation" (VIBE), which makes use of an existing large-scale motion capture dataset (AMASS) together with unpaired, in-the-wild, 2D keypoint annotations.
521	G3AN: Disentangling Appearance and Motion for Video Generation	Yaohui Wang; Piotr Bilinski; Francois Bremond; Antitza Dantcheva;	To tackle this challenge, we introduce G3AN, a novel spatio-temporal generative model, which seeks to capture the distribution of high dimensional video data and to model appearance and motion in a disentangled manner.
522	Domain Adaptive Image-to-Image Translation	Ying-Cong Chen; Xiaogang Xu; Jiaya Jia;	To deal with these issues, we propose the Domain Adaptive Image-To-Image translation (DAI2I) framework that adapts an I2I model for out-of-domain samples.
523	GAN Compression: Efficient Architectures for Interactive Conditional GANs	Muyang Li; Ji Lin; Yaoyao Ding; Zhijian Liu; Jun-Yan Zhu; Song Han;	In this work, we propose a general-purpose compression framework for reducing the inference time and model size of the generator in cGANs.
524	Searching Central Difference Convolutional Networks for Face Anti-Spoofing	Zitong Yu; Chenxu Zhao; Zezheng Wang; Yunxiao Qin; Zhuo Su; Xiaobai Li; Feng Zhou; Guoying Zhao;	Here we propose a novel frame-level FAS method based on Central Difference Convolution (CDC), which is able to capture intrinsic detailed patterns via aggregating both intensity and gradient information.
525	TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting	Zhuoqian Yang; Wentao Zhu; Wayne Wu; Chen Qian; Qiang Zhou; Bolei Zhou; Chen Change Loy;	We present a lightweight video motion retargeting approach TransMoMo that is capable of transferring motion of a person in a source video realistically to another video of a target person.
526	AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation	Hyeongmin Lee; Taeoh Kim; Tae-young Chung; Daehyun Pak; Yuseok Ban; Sangyoun Lee;	To solve this problem, we propose a new warping module named Adaptive Collaboration of Flows (AdaCoF).
527	FReeNet: Multi-Identity Face Reenactment	Jiangning Zhang; Xianfang Zeng; Mengmeng Wang; Yusu Pan; Liang Liu; Yong Liu; Yu Ding; Changjie Fan;	This paper presents a novel multi-identity face reenactment framework, named FReeNet, to transfer facial expressions from an arbitrary source face to a target face with a shared model.
528	Novel View Synthesis of Dynamic Scenes With Globally Coherent Depths From a Monocular Camera	Jae Shin Yoon; Kihwan Kim; Orazio Gallo; Hyun Soo Park; Jan Kautz;	This paper presents a new method to synthesize an image from arbitrary views and times given a collection of images of a dynamic scene.
529	Monocular Real-Time Hand Shape and Motion Capture Using Multi-Modal Data	Yuxiao Zhou; Marc Habermann; Weipeng Xu; Ikhsanul Habibie; Christian Theobalt; Feng Xu;	We present a novel method for monocular hand shape and pose estimation at unprecedented runtime performance of 100fps and at state-of-the-art accuracy.
530	The GAN That Warped: Semantic Attribute Editing With Unpaired Data	Garoe Dorta; Sara Vicente; Neill D. F. Campbell; Ivor J. A. Simpson;	This work proposes to learn how to perform semantic image edits through the application of smooth warp fields.
531	4D Visualization of Dynamic Events From Unconstrained Multi-View Videos	Aayush Bansal; Minh Vo; Yaser Sheikh; Deva Ramanan; Srinivasa Narasimhan;	We present a data-driven approach for 4D space-time visualization of dynamic events from videos captured by hand-held multiple cameras.
532	Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds	Yongming Rao; Jiwen Lu; Jie Zhou;	Based on this hypothesis, we propose to learn point cloud representation by bidirectional reasoning between the local structures at different abstraction hierarchies and the global shape without human supervision.
533	HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation	Bowen Cheng; Bin Xiao; Jingdong Wang; Honghui Shi; Thomas S. Huang; Lei Zhang;	In this paper, we present HigherHRNet: a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids.
534	Detecting Attended Visual Targets in Video	Eunji Chong; Yongxin Wang; Nataniel Ruiz; James M. Rehg;	Our goal is to identify where each person in each frame of a video is looking, and correctly handle the case where the gaze target is out-of-frame. We introduce a new annotated dataset, VideoAttentionTarget, containing complex and dynamic patterns of real-world gaze behavior.
535	Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution	Yong Guo; Jian Chen; Jingdong Wang; Qi Chen; Jiezhang Cao; Zeshuai Deng; Yanwu Xu; Mingkui Tan;	To address the above issues, we propose a dual regression scheme by introducing an additional constraint on LR data to reduce the space of the possible functions.
536	Neural Voxel Renderer: Learning an Accurate and Controllable Rendering Tool	Konstantinos Rematas; Vittorio Ferrari;	We present a neural rendering framework that maps a voxelized scene into a high quality image.
537	Neural Contours: Learning to Draw Lines From 3D Shapes	Difan Liu; Mohamed Nabail; Aaron Hertzmann; Evangelos Kalogerakis;	This paper introduces a method for learning to generate line drawings from 3D models.
538	Softmax Splatting for Video Frame Interpolation	Simon Niklaus; Feng Liu;	We propose softmax splatting to address this paradigm shift and show its effectiveness on the application of frame interpolation.
539	CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks	Maxim Maximov; Ismail Elezi; Laura Leal-Taixe;	We propose and develop CIAGAN, a model for image and video anonymization based on conditional generative adversarial networks.
540	Probabilistic Structural Latent Representation for Unsupervised Embedding	Mang Ye; Jianbing Shen;	To tackle these issues, this paper proposes a probabilistic structural latent representation (PSLR), which incorporates an adaptable softmax embedding to approximate the positive concentrated and negative instance separated properties in the graph latent space.
541	Semantically Multi-Modal Image Synthesis	Zhen Zhu; Zhiliang Xu; Ansheng You; Xiang Bai;	In this paper, we focus on semantically multi-modal image synthesis (SMIS) task, namely, generating multi-modal images at the semantic level.
542	Nested Scale-Editing for Conditional Image Synthesis	Lingzhi Zhang; Jiancong Wang; Yinshuang Xu; Jie Min; Tarmily Wen; James C. Gee; Jianbo Shi;	We propose an image synthesis approach that provides stratified navigation in the latent code space.
543	UnrealText: Synthesizing Realistic Scene Text Images From the Unreal World	Shangbang Long; Cong Yao;	In this paper, we introduce UnrealText, an efficient image synthesis method that renders realistic images via a 3D graphics engine.
544	Fast Texture Synthesis via Pseudo Optimizer	Wu Shi; Yu Qiao;	We propose a new efficient method that aims to simulate the optimization process while retaining most of its properties.
545	Towards Learning Structure via Consensus for Face Segmentation and Parsing	Iacopo Masi; Joe Mathai; Wael AbdAlmageed;	We thereby offer a novel learning mechanism to enforce structure in the prediction via consensus, guided by a robust loss function that forces pixel objects to be consistent with each other.
546	CookGAN: Causality Based Text-to-Image Synthesis	Bin Zhu; Chong-Wah Ngo;	This paper presents a new network architecture, CookGAN, that mimics the visual effect in a causality chain, preserves fine-grained details, and progressively upsamples the image.
547	Weakly Supervised Discriminative Feature Learning With State Information for Person Identification	Hong-Xing Yu; Wei-Shi Zheng;	In this work we propose utilizing the state information as weak supervision to address the visual discrepancy caused by different states.
548	Future Video Synthesis With Object Motion Prediction	Yue Wu; Rongrong Gao; Jaesik Park; Qifeng Chen;	We present an approach to predict future video frames given a sequence of continuous video frames in the past.
549	MaskGAN: Towards Diverse and Interactive Facial Image Manipulation	Cheng-Han Lee; Ziwei Liu; Lingyun Wu; Ping Luo;	To overcome these drawbacks, we propose a novel framework termed MaskGAN, enabling diverse and interactive face manipulation. To facilitate extensive studies, we construct a large-scale high-resolution face dataset with fine-grained mask annotations named CelebAMask-HQ.
550	A Graduated Filter Method for Large Scale Robust Estimation	Huu Le; Christopher Zach;	In this paper, we introduce a novel solver for robust estimation that possesses a strong ability to escape poor local minima.
551	Deep Face Super-Resolution With Iterative Collaboration Between Attentive Recovery and Landmark Estimation	Cheng Ma; Zhenyu Jiang; Yongming Rao; Jiwen Lu; Jie Zhou;	In this paper, we propose a deep face super-resolution (FSR) method with iterative collaboration between two recurrent networks which focus on facial image recovery and landmark estimation respectively.
552	Coherent Reconstruction of Multiple Humans From a Single Image	Wen Jiang; Nikos Kolotouros; Georgios Pavlakos; Xiaowei Zhou; Kostas Daniilidis;	In this work, we address the problem of multi-person 3D pose estimation from a single image.
553	PointASNL: Robust Point Clouds Processing Using Nonlocal Neural Networks With Adaptive Sampling	Xu Yan; Chaoda Zheng; Zhen Li; Sheng Wang; Shuguang Cui;	In this paper, we present a novel end-to-end network for robust point cloud processing, named PointASNL, which can deal with noisy point clouds effectively.
554	A Neural Rendering Framework for Free-Viewpoint Relighting	Zhang Chen; Anpei Chen; Guli Zhang; Chengyuan Wang; Yu Ji; Kiriakos N. Kutulakos; Jingyi Yu;	We present a novel Relightable Neural Renderer (RNR) for simultaneous view synthesis and relighting using multi-view image inputs.
555	A Multi-Task Mean Teacher for Semi-Supervised Shadow Detection	Zhihao Chen; Lei Zhu; Liang Wan; Song Wang; Wei Feng; Pheng-Ann Heng;	To boost shadow detection performance, this paper presents a multi-task mean teacher model for semi-supervised shadow detection that leverages unlabeled data and learns multiple kinds of shadow information simultaneously.
556	GroupFace: Learning Latent Groups and Constructing Group-Based Representations for Face Recognition	Yonghyun Kim; Wonpyo Park; Myung-Cheol Roh; Jongju Shin;	We propose a novel face-recognition-specialized architecture called GroupFace that utilizes multiple group-aware representations, simultaneously, to improve the quality of the embedding feature.
557	Channel Attention Based Iterative Residual Learning for Depth Map Super-Resolution	Xibin Song; Yuchao Dai; Dingfu Zhou; Liu Liu; Wei Li; Hongdong Li; Ruigang Yang;	In this paper, we argue that depth map super-resolution (DSR) models trained under this setting are restrictive and not effective in dealing with real-world DSR tasks.
558	Time Flies: Animating a Still Image With Time-Lapse Video As Reference	Chia-Chi Cheng; Hung-Yu Chen; Wei-Chen Chiu;	In this paper, we propose a self-supervised end-to-end model to generate the time-lapse video from a single image and a reference video.
559	SER-FIQ: Unsupervised Estimation of Face Image Quality Based on Stochastic Embedding Robustness	Philipp Terhorst; Jan Niklas Kolf; Naser Damer; Florian Kirchbuchner; Arjan Kuijper;	Avoiding the use of inaccurate quality labels, we propose a novel concept to measure face quality based on an arbitrary face recognition model.
560	Grid-GCN for Fast and Scalable Point Cloud Learning	Qiangeng Xu; Xudong Sun; Cho-Ying Wu; Panqu Wang; Ulrich Neumann;	In this paper, we present a method, named Grid-GCN, for fast and scalable point cloud learning.
561	Domain Balancing: Face Recognition on Long-Tailed Domains	Dong Cao; Xiangyu Zhu; Xingyu Huang; Jianzhu Guo; Zhen Lei;	In this paper, we propose a novel Domain Balancing (DB) mechanism to handle this problem.
562	AdversarialNAS: Adversarial Neural Architecture Search for GANs	Chen Gao; Yunpeng Chen; Si Liu; Zhenxiong Tan; Shuicheng Yan;	In this paper, we propose an AdversarialNAS method specially tailored for Generative Adversarial Networks (GANs) to search for a superior generative model on the task of unconditional image generation.
563	Image Super-Resolution With Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining	Yiqun Mei; Yuchen Fan; Yuqian Zhou; Lichao Huang; Thomas S. Huang; Honghui Shi;	In this paper, we propose the first Cross-Scale Non-Local (CS-NL) attention module with integration into a recurrent neural network.
564	The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation	Junjie Huang; Zheng Zhu; Feng Guo; Guan Huang;	In this paper, we focus on this problem and find that the devil of top-down pose estimators lies in biased data processing.
565	Data Uncertainty Learning in Face Recognition	Jie Chang; Zhonghao Lan; Changmao Cheng; Yichen Wei;	This work applies data uncertainty learning to face recognition, such that the feature (mean) and uncertainty (variance) are learnt simultaneously, for the first time.
566	Regularizing Discriminative Capability of CGANs for Semi-Supervised Generative Learning	Yi Liu; Guangchang Deng; Xiangping Zeng; Si Wu; Zhiwen Yu; Hau-San Wong;	To address this issue, we propose a regularization technique based on Random Regional Replacement (R^3-regularization) to facilitate the generative learning process.
567	FM2u-Net: Face Morphological Multi-Branch Network for Makeup-Invariant Face Verification	Wenxuan Wang; Yanwei Fu; Xuelin Qian; Yu-Gang Jiang; Qi Tian; Xiangyang Xue;	To address these challenges, we propose a unified Face Morphological Multi-branch Network (FM2u-Net) for makeup-invariant face verification, which can simultaneously synthesize many diverse makeup faces through a face morphology network (FM-Net) and effectively learn cosmetics-robust face representations using an attention-based multi-branch learning network (AttM-Net).
568	UCTGAN: Diverse Image Inpainting Based on Unsupervised Cross-Space Translation	Lei Zhao; Qihang Mo; Sihuan Lin; Zhizhong Wang; Zhiwen Zuo; Haibo Chen; Wei Xing; Dongming Lu;	In order to produce multiple and diverse reasonable solutions, we present Unsupervised Cross-space Translation Generative Adversarial Network (called UCTGAN) which mainly consists of three network modules: conditional encoder module, manifold projection module and generation module.
569	Decoupled Representation Learning for Skeleton-Based Gesture Recognition	Jianbo Liu; Yongcheng Liu; Ying Wang; Veronique Prinet; Shiming Xiang; Chunhong Pan;	In this paper, we propose to decouple the gesture into hand posture variations and hand movements, which are then modeled separately.
570	An Efficient PointLSTM for Point Clouds Based Gesture Recognition	Yuecong Min; Yanxiao Zhang; Xiujuan Chai; Xilin Chen;	In this paper, we formulate gesture recognition as an irregular sequence recognition problem and aim to capture long-term spatial correlations across point cloud sequences.
571	Editing in Style: Uncovering the Local Semantics of GANs	Edo Collins; Raja Bala; Bob Price; Sabine Susstrunk;	Focusing on StyleGAN, we introduce a simple and effective method for making local, semantically-aware edits to a target output image.
572	On the Detection of Digital Face Manipulation	Hao Dang; Feng Liu; Joel Stehouwer; Xiaoming Liu; Anil K. Jain;	Instead of simply using multi-task learning to simultaneously detect manipulated images and predict the manipulated mask (regions), we propose to utilize an attention mechanism to process and improve the feature maps for the classification task.
573	Learning Texture Transformer Network for Image Super-Resolution	Fuzhi Yang; Huan Yang; Jianlong Fu; Hongtao Lu; Baining Guo;	In this paper, we propose a novel Texture Transformer Network for Image Super-Resolution (TTSR), in which the LR and Ref images are formulated as queries and keys in a transformer, respectively.
574	Reference-Based Sketch Image Colorization Using Augmented-Self Reference and Dense Semantic Correspondence	Junsoo Lee; Eungyeup Kim; Yunsung Lee; Dongjun Kim; Jaehyuk Chang; Jaegul Choo;	To tackle this challenge, we propose to utilize the identical image with geometric distortion as a virtual reference, which makes it possible to secure the ground truth for a colored output image.
575	Deblurring Using Analysis-Synthesis Networks Pair	Adam Kaufman; Raanan Fattal;	We propose a new architecture which breaks the deblurring network into an analysis network which estimates the blur, and a synthesis network that uses this kernel to deblur the image.
576	Exploring Unlabeled Faces for Novel Attribute Discovery	Hyojin Bahng; Sunghyo Chung; Seungjoo Yoo; Jaegul Choo;	In this paper, we attempt to alleviate this necessity for labeled data in the facial image translation domain.
577	Neural Pose Transfer by Spatially Adaptive Instance Normalization	Jiashun Wang; Chao Wen; Yanwei Fu; Haitao Lin; Tianyun Zou; Xiangyang Xue; Yinda Zhang;	In this paper, we are particularly interested in transferring the pose of a source human mesh to deform a target human mesh, where the source and target meshes may have different identity information.
578	Fine-Grained Image-to-Image Transformation Towards Visual Recognition	Wei Xiong; Yutong He; Yixuan Zhang; Wenhan Luo; Lin Ma; Jiebo Luo;	In this paper, we aim at transforming an image with a fine-grained category to synthesize new images that preserve the identity of the input image, which can thereby benefit the subsequent fine-grained image recognition and few-shot learning tasks.
579	Deep Facial Non-Rigid Multi-View Stereo	Ziqian Bai; Zhaopeng Cui; Jamal Ahmed Rahim; Xiaoming Liu; Ping Tan;	We present a method for 3D face reconstruction from multi-view images with different expressions.
580	Attention-Driven Cropping for Very High Resolution Facial Landmark Detection	Prashanth Chandran; Derek Bradley; Markus Gross; Thabo Beeler;	Building on top of recent progress in attention-based networks, we present a novel, fully convolutional regional architecture that is specially designed for predicting landmarks on very high resolution facial images without downsampling.
581	Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis	Yiyi Liao; Katja Schwarz; Lars Mescheder; Andreas Geiger;	We define the new task of 3D controllable image synthesis and propose an approach for solving it by reasoning both in 3D space and in the 2D image domain.
582	End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection	Rui Qian; Divyansh Garg; Yan Wang; Yurong You; Serge Belongie; Bharath Hariharan; Mark Campbell; Kilian Q. Weinberger; Wei-Lun Chao;	In this paper, we introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end.
583	Towards High-Fidelity 3D Face Reconstruction From In-the-Wild Images Using Graph Convolutional Networks	Jiangke Lin; Yi Yuan; Tianjia Shao; Kun Zhou;	In this paper, we introduce a method to reconstruct 3D facial shapes with high-fidelity textures from single-view images in the wild, without the need to capture a large-scale face texture database.
584	CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition	Yuge Huang; Yuhan Wang; Ying Tai; Xiaoming Liu; Pengcheng Shen; Shaoxin Li; Jilin Li; Feiyue Huang;	In this work, we propose a novel Adaptive Curriculum Learning loss (CurricularFace) that embeds the idea of curriculum learning into the loss function to achieve a novel training strategy for deep face recognition, which mainly addresses easy samples in the early training stage and hard ones in the later stage.
585	Rotate-and-Render: Unsupervised Photorealistic Face Rotation From Single-View Images	Hang Zhou; Jihao Liu; Ziwei Liu; Yu Liu; Xiaogang Wang;	To overcome these challenges, we propose a novel unsupervised framework that can synthesize photo-realistic rotated faces using only single-view image collections in the wild.
586	One-Shot Domain Adaptation for Face Generation	Chao Yang; Ser-Nam Lim;	In this paper, we propose a framework capable of generating face images that fall into the same distribution as that of a given one-shot example.
587	BidNet: Binocular Image Dehazing Without Explicit Disparity Estimation	Yanwei Pang; Jing Nie; Jin Xie; Jungong Han; Xuelong Li;	On the assumption that dehazed binocular images are superior to the hazy ones for stereo vision tasks such as 3D object detection and according to the fact that image haze is a function of depth, this paper proposes a Binocular image dehazing Network (BidNet) aiming at dehazing both the left and right images of binocular images within the deep learning framework.
588	Deep Shutter Unrolling Network	Peidong Liu; Zhaopeng Cui; Viktor Larsson; Marc Pollefeys;	We present a novel network for rolling shutter effect correction.
589	Joint Texture and Geometry Optimization for RGB-D Reconstruction	Yanping Fu; Qingan Yan; Jie Liao; Chunxia Xiao;	In this paper, we propose a novel approach that can jointly optimize the camera poses, texture and geometry of the reconstructed model, and color consistency between the key-frames.
590	Deep 3D Capture: Geometry and Reflectance From Sparse Multi-View Images	Sai Bi; Zexiang Xu; Kalyan Sunkavalli; David Kriegman; Ravi Ramamoorthi;	We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object from a sparse set of only six images captured by wide-baseline cameras under collocated point lighting.
591	Auto-Tuning Structured Light by Optical Stochastic Gradient Descent	Wenzheng Chen; Parsa Mirdehghan; Sanja Fidler; Kiriakos N. Kutulakos;	We consider the problem of optimizing the performance of an active imaging system by automatically discovering the illuminations it should use, and the way to decode them.
592	MARMVS: Matching Ambiguity Reduced Multiple View Stereo for Efficient Large Scale Scene Reconstruction	Zhenyu Xu; Yiguang Liu; Xuelei Shi; Ying Wang; Yunan Zheng;	In this paper, we present a novel method, matching ambiguity reduced multiple view stereo (MARMVS) to address this issue.
593	Uncertainty Based Camera Model Selection	Michal Polic; Stanislav Steidl; Cenek Albl; Zuzana Kukelova; Tomas Pajdla;	In this paper, we present a new automatic method for camera model selection in large scale SfM that is based on efficient uncertainty evaluation.
594	Local Implicit Grid Representations for 3D Scenes	Chiyu "Max" Jiang; Avneesh Sud; Ameesh Makadia; Jingwei Huang; Matthias Niessner; Thomas Funkhouser;	In this paper, we introduce Local Implicit Grid Representations, a new 3D shape representation designed for scalability and generality.
595	TetraTSDF: 3D Human Reconstruction From a Single Image With a Tetrahedral Outer Shell	Hayato Onizuka; Zehra Hayirci; Diego Thomas; Akihiro Sugimoto; Hideaki Uchiyama; Rin-ichiro Taniguchi;	In this paper, we propose the tetrahedral outer shell volumetric truncated signed distance function (TetraTSDF) model for the human body, and its corresponding part connection network (PCN) for 3D human body shape regression.
596	Averaging Essential and Fundamental Matrices in Collinear Camera Settings	Amnon Geifman; Yoni Kasten; Meirav Galun; Ronen Basri;	In this paper, we introduce an analysis and algorithms for averaging bifocal tensors (essential or fundamental matrices) when either subsets or all of the camera centers are collinear.
597	On the Distribution of Minima in Intrinsic-Metric Rotation Averaging	Kyle Wilson; David Bindel;	In this paper, we study the spatial distribution of local minima.
598	Lightweight Multi-View 3D Pose Estimation Through Camera-Disentangled Representation	Edoardo Remelli; Shangchen Han; Sina Honari; Pascal Fua; Robert Wang;	We present a lightweight solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
599	A Novel Recurrent Encoder-Decoder Structure for Large-Scale Multi-View Stereo Reconstruction From an Open Aerial Dataset	Jin Liu; Shunping Ji;	We also introduce in this paper a novel network, called RED-Net, for wide-range depth inference, which we developed from a recurrent encoder-decoder structure to regularize cost maps across depths and a 2D fully convolutional network as framework.
600	Factorized Higher-Order CNNs With an Application to Spatio-Temporal Emotion Estimation	Jean Kossaifi; Antoine Toisoul; Adrian Bulat; Yannis Panagakis; Timothy M. Hospedales; Maja Pantic;	In this paper, we unify these two approaches by proposing a tensor factorization framework for efficient multidimensional (separable) convolutions of higher-order.
601	Effectively Unbiased FID and Inception Score and Where to Find Them	Min Jin Chong; David Forsyth;	This paper shows that two commonly used evaluation metrics for generative models, the Fréchet Inception Distance (FID) and the Inception Score (IS), are biased: the expected value of the score computed for a finite sample set is not the true value of the score.
602	Robust Homography Estimation via Dual Principal Component Pursuit	Tianjiao Ding; Yunchen Yang; Zhihui Zhu; Daniel P. Robinson; Rene Vidal; Laurent Kneip; Manolis C. Tsakiris;	We revisit robust estimation of homographies over point correspondences between two or three views, a fundamental problem in geometric vision.
603	Non-Adversarial Video Synthesis With Learned Priors	Abhishek Aich; Akash Gupta; Rameswar Panda; Rakib Hyder; M. Salman Asif; Amit K. Roy-Chowdhury;	Different from these methods, we focus on the problem of generating videos from latent noise vectors, without any reference input frames.
604	Uncertainty-Aware Mesh Decoder for High Fidelity 3D Face Reconstruction	Gun-Hee Lee; Seong-Whan Lee;	In this paper, we propose to employ (i) an uncertainty-aware encoder that presents face features as distributions and (ii) a fully nonlinear decoder model combining Graph CNN with GAN.
605	3FabRec: Fast Few-Shot Face Alignment by Reconstruction	Bjorn Browatzki; Christian Wallraven;	We introduce a semi-supervised method in which the crucial idea is to first generate implicit face knowledge from the large amounts of unlabeled images of faces available today.
606	Weakly-Supervised Domain Adaptation via GAN and Mesh Model for Estimating 3D Hand Poses Interacting Objects	Seungryul Baek; Kwang In Kim; Tae-Kyun Kim;	In this work, we propose a novel end-to-end trainable pipeline that adapts the hand-object domain to the single hand-only domain, while learning for HPE.
607	Vec2Face: Unveil Human Faces From Their Blackbox Features in Face Recognition	Chi Nhan Duong; Thanh-Dat Truong; Khoa Luu; Kha Gia Quach; Hung Bui; Kaushik Roy;	This paper presents a novel generative structure with Bijective Metric Learning, namely Bijective Generative Adversarial Networks in a Distillation framework (DiBiGAN), for synthesizing faces of an identity given that person’s features.
608	StyleRig: Rigging StyleGAN for 3D Control Over Portrait Images	Ayush Tewari; Mohamed Elgharib; Gaurav Bharaj; Florian Bernard; Hans-Peter Seidel; Patrick Perez; Michael Zollhofer; Christian Theobalt;	We present the first method to provide a face rig-like control over a pretrained and fixed StyleGAN via a 3DMM.
609	Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis	Jogendra Nath Kundu; Siddharth Seth; Varun Jampani; Mugalodi Rakesh; R. Venkatesh Babu; Anirban Chakraborty;	Acknowledging this, we propose a self-supervised learning framework to disentangle such variations from unlabeled video frames.
610	Learning Meta Face Recognition in Unseen Domains	Jianzhu Guo; Xiangyu Zhu; Chenxu Zhao; Dong Cao; Zhen Lei; Stan Z. Li;	In this paper, we aim to learn a generalized model that can directly handle new unseen domains without any model updating. Besides, we propose two benchmarks for generalized face recognition evaluation.
611	Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data	Shichao Li; Lei Ke; Kevin Pratama; Yu-Wing Tai; Chi-Keung Tang; Kwang-Ting Cheng;	This paper proposes a novel data augmentation method that (1) is scalable for synthesizing a massive amount of training data (over 8 million valid 3D human poses with corresponding 2D projections) for training 2D-to-3D networks, and (2) can effectively reduce dataset bias.
612	GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models	Hongyi Xu; Eduard Gabriel Bazavan; Andrei Zanfir; William T. Freeman; Rahul Sukthankar; Cristian Sminchisescu;	We present a statistical, articulated 3D human shape modeling pipeline, within a fully trainable, modular, deep learning framework.
613	Generating 3D People in Scenes Without People	Yan Zhang; Mohamed Hassan; Heiko Neumann; Michael J. Black; Siyu Tang;	We present a fully automatic system that takes a 3D scene and generates plausible 3D human bodies that are posed naturally in that 3D scene.
614	Transferring Cross-Domain Knowledge for Video Sign Language Recognition	Dongxu Li; Xin Yu; Chenchen Xu; Lars Petersson; Hongdong Li;	Motivated by this observation, we propose a novel method that learns domain-invariant visual concepts and enriches word-level sign language recognition (WSLR) models by transferring knowledge from subtitled news signs to them.
615	Bodies at Rest: 3D Human Pose and Shape Estimation From a Pressure Image Using Synthetic Data	Henry M. Clever; Zackory Erickson; Ariel Kapusta; Greg Turk; Karen Liu; Charles C. Kemp;	We describe a physics-based method that simulates human bodies at rest in a bed with a pressure sensing mat, and present PressurePose, a synthetic dataset with 206K pressure images with 3D human poses and shapes.
616	Bayesian Adversarial Human Motion Synthesis	Rui Zhao; Hui Su; Qiang Ji;	We propose a generative probabilistic model for human motion synthesis.
617	LSM: Learning Subspace Minimization for Low-Level Vision	Chengzhou Tang; Lu Yuan; Ping Tan;	We study the energy minimization problem in low-level vision tasks from a novel perspective.
618	Learning a Neural Solver for Multiple Object Tracking	Guillem Braso; Laura Leal-Taixe;	In this work, we exploit the classical network flow formulation of MOT to define a fully differentiable framework based on Message Passing Networks (MPNs).
619	GLU-Net: Global-Local Universal Network for Dense Flow and Correspondences	Prune Truong; Martin Danelljan; Radu Timofte;	In this work, we propose a universal network architecture that is directly applicable to all the aforementioned dense correspondence problems.
620	SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking	Dongyan Guo; Jun Wang; Ying Cui; Zhenhua Wang; Shengyong Chen;	By decomposing the visual tracking task into two subproblems as classification for pixel category and regression for object bounding box at this pixel, we propose a novel fully convolutional Siamese network to solve visual tracking end-to-end in a per-pixel manner.
621	MaskFlownet: Asymmetric Feature Matching With Learnable Occlusion Mask	Shengyu Zhao; Yilun Sheng; Yue Dong; Eric I-Chao Chang; Yan Xu;	In this paper, we propose an asymmetric occlusion-aware feature matching module, which can learn a rough occlusion mask that filters useless (occluded) areas immediately after feature warping without any explicit supervision.
622	Tracking by Instance Detection: A Meta-Learning Approach	Guangting Wang; Chong Luo; Xiaoyan Sun; Zhiwei Xiong; Wenjun Zeng;	We propose a principled three-step approach to build a high-performance tracker.
623	High-Performance Long-Term Tracking With Meta-Updater	Kenan Dai; Yunhua Zhang; Dong Wang; Jianhua Li; Huchuan Lu; Xiaoyun Yang;	In this work, we propose a novel offline-trained Meta-Updater to address an important but unsolved problem: Is the tracker ready for updating in the current frame?
624	TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model	Bo Pang; Yizhuo Li; Yifan Zhang; Muchen Li; Cewu Lu;	To address these challenges, we propose a concise end-to-end model, TubeTK, which needs only one-step training, by introducing the "bounding-tube" to indicate the temporal-spatial locations of objects in a short video clip.
625	Collaborative Motion Prediction via Neural Motion Message Passing	Yue Hu; Siheng Chen; Ya Zhang; Xiao Gu;	To address this challenge, we propose neural motion message passing (NMMP) to explicitly model the interaction and learn representations for directed interactions between actors.
626	P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds	Haozhe Qi; Chen Feng; Zhiguo Cao; Feng Zhao; Yang Xiao;	Our main idea is to first localize potential target centers in 3D search area embedded with target information. Then point-driven 3D target proposal and verification are executed jointly.
627	Self-Supervised Deep Visual Odometry With Online Adaptation	Shunkai Li; Xin Wang; Yingdian Cao; Fei Xue; Zike Yan; Hongbin Zha;	In this paper, we propose an online meta-learning algorithm to enable VO networks to continuously adapt to new environments in a self-supervised manner.
628	Globally Optimal Contrast Maximisation for Event-Based Motion Estimation	Daqi Liu; Alvaro Parra; Tat-Jun Chin;	To alleviate this weakness, we propose a new globally optimal event-based motion estimation algorithm.
629	D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features	Xuyang Bai; Zixin Luo; Lei Zhou; Hongbo Fu; Long Quan; Chiew-Lan Tai;	In this paper, we leverage a 3D fully convolutional network for 3D point clouds, and propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point.
630	Towards Backward-Compatible Representation Learning	Yantao Shen; Yuanjun Xiong; Wei Xia; Stefano Soatto;	We propose a framework to train embedding models, called backward-compatible training (BCT), as a first step towards backward compatible representation learning.
631	PointAugment: An Auto-Augmentation Framework for Point Cloud Classification	Ruihui Li; Xianzhi Li; Pheng-Ann Heng; Chi-Wing Fu;	We present PointAugment, a new auto-augmentation framework that automatically optimizes and augments point cloud samples to enrich the data diversity when we train a classification network.
632	Cross-Batch Memory for Embedding Learning	Xun Wang; Haozhi Zhang; Weilin Huang; Matthew R. Scott;	In this paper, we identify a "slow drift" phenomenon by observing that the embedding features drift exceptionally slowly even as the model parameters are updated throughout the training process.
633	Circle Loss: A Unified Perspective of Pair Similarity Optimization	Yifan Sun; Changmao Cheng; Yuhan Zhang; Chi Zhang; Liang Zheng; Zhongdao Wang; Yichen Wei;	This paper provides a pair similarity optimization viewpoint on deep feature learning, aiming to maximize the within-class similarity s_p and minimize the between-class similarity s_n.
634	Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics	Simon Jenni; Hailin Jin; Paolo Favaro;	We introduce a novel principle for self-supervised feature learning based on the discrimination of specific transformations of an image.
635	Hyperbolic Image Embeddings	Valentin Khrulkov; Leyla Mirvakhabova; Evgeniya Ustinova; Ivan Oseledets; Victor Lempitsky;	In this work, we demonstrate that in many practical scenarios, hyperbolic embeddings provide a better alternative.
636	Controllable Orthogonalization in Training DNNs	Lei Huang; Li Liu; Fan Zhu; Diwen Wan; Zehuan Yuan; Bo Li; Ling Shao;	This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton’s iteration (ONI), to learn a layer-wise orthogonal weight matrix in DNNs.
637	An Investigation Into the Stochasticity of Batch Whitening	Lei Huang; Lei Zhao; Yi Zhou; Fan Zhu; Li Liu; Ling Shao;	Based on our analysis, we provide a framework for designing and comparing BW algorithms in different scenarios.
638	High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification	Guan’an Wang; Shuo Yang; Huanyu Liu; Zhicheng Wang; Yang Yang; Shuliang Wang; Gang Yu; Erjin Zhou; Jian Sun;	In this paper, we propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
639	Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance	Jaime Spencer; Richard Bowden; Simon Hadfield;	The aim of this paper is to provide a dense feature representation that can be used to perform localization, sparse matching or image retrieval, regardless of the current seasonal or temporal appearance.
640	Learning to Dress 3D People in Generative Clothing	Qianli Ma; Jinlong Yang; Anurag Ranjan; Sergi Pujades; Gerard Pons-Moll; Siyu Tang; Michael J. Black;	To address this, we learn a generative 3D mesh model of clothed people from 3D scans with varying pose and clothing.
641	MAST: A Memory-Augmented Self-Supervised Tracker	Zihang Lai; Erika Lu; Weidi Xie;	We propose a dense tracking model trained on videos without any annotations that surpasses previous self-supervised methods on existing benchmarks by a significant margin (+15%), and achieves performance comparable to supervised methods.
642	Learning by Analogy: Reliable Supervision From Transformations for Unsupervised Optical Flow Estimation	Liang Liu; Jiangning Zhang; Ruifei He; Yong Liu; Yabiao Wang; Ying Tai; Donghao Luo; Chengjie Wang; Jilin Li; Feiyue Huang;	In this work, we present a framework to use more reliable supervision from transformations.
643	GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking With 2D-3D Multi-Feature Learning	Xinshuo Weng; Yongxin Wang; Yunze Man; Kris M. Kitani;	In this work, we propose two techniques to improve the discriminative feature learning for MOT: (1) instead of obtaining features for each object independently, we propose a novel feature interaction mechanism by introducing the Graph Neural Network.
644	ClusterFit: Improving Generalization of Visual Representations	Xueting Yan; Ishan Misra; Abhinav Gupta; Deepti Ghadiyaram; Dhruv Mahajan;	In this work, we present a simple strategy – ClusterFit to improve the robustness of the visual representations learned during pre-training.
645	Learning Dynamic Relationships for 3D Human Motion Prediction	Qiongjie Cui; Huaijiang Sun; Fei Yang;	To tackle these issues, we propose a deep generative model based on graph networks and adversarial learning.
646	Knowledge As Priors: Cross-Modal Knowledge Generalization for Datasets Without Superior Knowledge	Long Zhao; Xi Peng; Yuxiao Chen; Mubbasir Kapadia; Dimitris N. Metaxas;	In this paper, we propose a novel scheme to train the Student in a Target dataset where the Teacher is unavailable.
647	S3VAE: Self-Supervised Sequential VAE for Representation Disentanglement and Data Generation	Yizhe Zhu; Martin Renqiang Min; Asim Kadav; Hans Peter Graf;	We propose a sequential variational autoencoder to learn disentangled representations of sequential data (e.g., videos and audio) under self-supervision.
648	Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning	Yuan Yao; Chang Liu; Dezhao Luo; Yu Zhou; Qixiang Ye;	In this paper, we propose a novel self-supervised method, referred to as video Playback Rate Perception (PRP), to learn spatio-temporal representation in a simple-yet-effective way.
649	Learning to Manipulate Individual Objects in an Image	Yanchao Yang; Yutong Chen; Stefano Soatto;	The key to our method is the combination of spatial disentanglement, enforced by a Contextual Information Separation loss, and perceptual cycle-consistency, enforced by a loss that penalizes changes in the image partition in response to perturbations of the latent factors.
650	PADS: Policy-Adapted Sampling for Visual Similarity Learning	Karsten Roth; Timo Milbich; Bjorn Ommer;	We, therefore, employ reinforcement learning and have a teacher network adjust the sampling distribution based on the current state of the learner network, which represents visual similarity.
651	Siam R-CNN: Visual Tracking by Re-Detection	Paul Voigtlaender; Jonathon Luiten; Philip H.S. Torr; Bastian Leibe;	We present Siam R-CNN, a Siamese re-detection architecture which unleashes the full power of two-stage object detection approaches for visual object tracking.
652	ASLFeat: Learning Local Features of Accurate Shape and Localization	Zixin Luo; Lei Zhou; Xuyang Bai; Hongkai Chen; Jiahui Zhang; Yao Yao; Shiwei Li; Tian Fang; Long Quan;	In this paper, we present ASLFeat, with three light-weight yet effective modifications to mitigate the above issues.
653	Filter Grafting for Deep Neural Networks	Fanxu Meng; Hao Cheng; Ke Li; Zhixin Xu; Rongrong Ji; Xing Sun; Guangming Lu;	This paper proposes a new learning paradigm called filter grafting, which aims to improve the representation capability of Deep Neural Networks (DNNs).
654	HOPE-Net: A Graph-Based Model for Hand-Object Pose Estimation	Bardia Doosti; Shujon Naha; Majid Mirbagheri; David J. Crandall;	In this paper, we propose a lightweight model called HOPE-Net which jointly estimates hand and object pose in 2D and 3D in real-time.
655	DeepFaceFlow: In-the-Wild Dense 3D Facial Motion Estimation	Mohammad Rami Koujan; Anastasios Roussos; Stefanos Zafeiriou;	In this work, we propose DeepFaceFlow, a robust, fast, and highly-accurate framework for the dense estimation of 3D non-rigid facial flow between pairs of monocular images.
656	Learning for Video Compression With Hierarchical Quality and Recurrent Enhancement	Ren Yang; Fabian Mentzer; Luc Van Gool; Radu Timofte;	In this paper, we propose a Hierarchical Learned Video Compression (HLVC) method with three hierarchical quality layers and a recurrent enhancement network.
657	Learning Better Lossless Compression Using Lossy Compression	Fabian Mentzer; Luc Van Gool; Michael Tschannen;	We leverage the powerful lossy image compression algorithm BPG to build a lossless image compression system.
658	Flow2Stereo: Effective Self-Supervised Learning of Optical Flow and Stereo Matching	Pengpeng Liu; Irwin King; Michael R. Lyu; Jia Xu;	In this paper, we propose a unified method to jointly learn optical flow and stereo matching.
659	Multi-Scale Fusion Subspace Clustering Using Similarity Constraint	Zhiyuan Dang; Cheng Deng; Xu Yang; Heng Huang;	In this paper, we propose the Multi-Scale Fusion Subspace Clustering Using Similarity Constraint (SC-MSFSC) network, which learns a more discriminative self-expression coefficient matrix by a novel multi-scale fusion module.
660	Siamese Box Adaptive Network for Visual Tracking	Zedu Chen; Bineng Zhong; Guorong Li; Shengping Zhang; Rongrong Ji;	To address this issue, we propose a simple yet effective visual tracking framework (named Siamese Box Adaptive Network, SiamBAN) by exploiting the expressive power of the fully convolutional network (FCN).
661	Cross-Domain Face Presentation Attack Detection via Multi-Domain Disentangled Representation Learning	Guoqing Wang; Hu Han; Shiguang Shan; Xilin Chen;	In light of this, we propose an efficient disentangled representation learning method for cross-domain face PAD.
662	Online Deep Clustering for Unsupervised Representation Learning	Xiaohang Zhan; Jiahao Xie; Ziwei Liu; Yew-Soon Ong; Chen Change Loy;	To overcome this challenge, we propose Online Deep Clustering (ODC) that performs clustering and network update simultaneously rather than alternatingly.
663	Density-Aware Feature Embedding for Face Clustering	Senhui Guo; Jing Xu; Dapeng Chen; Chao Zhang; Xiaogang Wang; Rui Zhao;	In this paper, we propose a Density-Aware Feature Embedding Network (DA-Net) for the task of face clustering, which utilizes both local and non-local information, to learn a robust feature embedding.
664	Self-Supervised Learning of Pretext-Invariant Representations	Ishan Misra; Laurens van der Maaten;	Specifically, we develop Pretext-Invariant Representation Learning (PIRL, pronounced as "pearl") that learns invariant representations based on pretext tasks.
665	ROAM: Recurrently Optimizing Tracking Model	Tianyu Yang; Pengfei Xu; Runbo Hu; Hua Chai; Antoni B. Chan;	In this paper, we design a tracking model consisting of response generation and bounding box regression, where the first component produces a heat map to indicate the presence of the object at different positions and the second part regresses the relative bounding box shifts to anchors mounted on sliding-window locations.
666	Deformable Siamese Attention Networks for Visual Object Tracking	Yuechen Yu; Yilei Xiong; Weilin Huang; Matthew R. Scott;	In this paper, we propose Deformable Siamese Attention Networks, referred to as SiamAttn, by introducing a new Siamese attention mechanism that computes deformable self-attention and cross-attention.
667	15 Keypoints Is All You Need	Michael Snower; Asim Kadav; Farley Lai; Hans Peter Graf;	We present an efficient multi-person pose-tracking method, KeyTrack, that relies only on keypoint information, without using any RGB or optical flow, to locate and track human keypoints in real-time.
668	Optical Flow in the Dark	Yinqiang Zheng; Mingfang Zhang; Feng Lu;	We propose an end-to-end data-driven method that avoids error accumulation and learns optical flow directly from low-light noisy images. We also collect a new optical flow dataset in raw format with a large range of exposure to be used as a benchmark.
669	Sketch-BERT: Learning Sketch Bidirectional Encoder Representation From Transformers by Self-Supervised Learning of Sketch Gestalt	Hangyu Lin; Yanwei Fu; Xiangyang Xue; Yu-Gang Jiang;	In particular, for the pre-training task, we present a novel Sketch Gestalt Model (SGM) to help train the Sketch-BERT.
670	A Unified Object Motion and Affinity Model for Online Multi-Object Tracking	Junbo Yin; Wenguan Wang; Qinghao Meng; Ruigang Yang; Jianbing Shen;	In this paper, we propose a novel MOT framework, named UMA, that unifies object motion and affinity models into a single network in order to learn a compact feature that is discriminative for both object motion and affinity measure.
671	Sub-Frame Appearance and 6D Pose Estimation of Fast Moving Objects	Denys Rozumnyi; Jan Kotera; Filip Sroubek; Jiri Matas;	We propose a novel method that tracks fast moving objects, mainly non-uniform spherical, in full 6 degrees of freedom, estimating simultaneously their 3D motion trajectory, 3D pose and object appearance changes with a time step that is a fraction of the video frame exposure time.
672	How to Train Your Deep Multi-Object Tracker	Yihong Xu; Aljosa Osep; Yutong Ban; Radu Horaud; Laura Leal-Taixe; Xavier Alameda-Pineda;	In this paper, we bridge this gap by proposing a differentiable proxy of MOTA and MOTP, which we combine in a loss function suitable for end-to-end training of deep multi-object trackers.
673	TPNet: Trajectory Proposal Network for Motion Prediction	Liangji Fang; Qinhong Jiang; Jianping Shi; Bolei Zhou;	In this work we propose a novel two-stage motion prediction framework, Trajectory Proposal Network (TPNet).
674	Large Scale Video Representation Learning via Relational Graph Clustering	Hyodong Lee; Joonseok Lee; Joe Yue-Hei Ng; Paul Natsev;	In this work, we explore two promising scalable representation learning approaches on video domain.
675	Towards Universal Representation Learning for Deep Face Recognition	Yichun Shi; Xiang Yu; Kihyuk Sohn; Manmohan Chandraker; Anil K. Jain;	Instead, we propose a universal representation learning framework that can deal with larger variation unseen in the given training data without leveraging target domain knowledge.
676	Robust Partial Matching for Person Search in the Wild	Yingji Zhong; Xiaoyu Wang; Shiliang Zhang;	To alleviate this issue, this paper proposes an Align-to-Part Network (APNet) for person detection and re-identification (reID).
677	Correlation-Guided Attention for Corner Detection Based Visual Tracking	Fei Du; Peng Liu; Wei Zhao; Xianglong Tang;	We analyze the reasons for their failure and propose a state-of-the-art tracker that performs correlation-guided attentional corner detection in two stages.
678	Learning Multi-Object Tracking and Segmentation From Automatic Annotations	Lorenzo Porzi; Markus Hofinger; Idoia Ruiz; Joan Serrat; Samuel Rota Bulo; Peter Kontschieder;	In this work we contribute a novel pipeline to automatically generate training data, and to improve over state-of-the-art multi-object tracking and segmentation (MOTS) methods.
679	PandaNet: Anchor-Based Single-Shot Multi-Person 3D Pose Estimation	Abdallah Benzine; Florian Chabot; Bertrand Luvison; Quoc Cuong Pham; Catherine Achard;	In this work, we present PandaNet (Pose estimAtioN and Detection Anchor-based Network), a new single-shot, anchor-based and multi-person 3D pose estimation approach.
680	Rotation Consistent Margin Loss for Efficient Low-Bit Face Recognition	Yudong Wu; Yichao Wu; Ruihao Gong; Yuanhao Lv; Ken Chen; Ding Liang; Xiaolin Hu; Xianglong Liu; Junjie Yan;	In this paper, we consider the low-bit quantization problem of face recognition (FR) under the open-set protocol.
681	Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking	Peiliang Li; Jieqi Shi; Shaojie Shen;	To benefit from the powerful object understanding of deep neural networks while also achieving precise geometry modeling for consistent trajectory estimation, we propose a joint spatial-temporal optimization-based stereo 3D object tracking method.
682	Unity Style Transfer for Person Re-Identification	Chong Liu; Xiaojun Chang; Yi-Dong Shen;	To solve this problem, we propose a UnityStyle adaptation method, which can smooth the style disparities within the same camera and across different cameras.
683	Suppressing Uncertainties for Large-Scale Facial Expression Recognition	Kai Wang; Xiaojiang Peng; Jianfei Yang; Shijian Lu; Yu Qiao;	To address this problem, this paper proposes to suppress the uncertainties by a simple yet efficient Self-Cure Network (SCN).
684	Multiview-Consistent Semi-Supervised Learning for 3D Human Pose Estimation	Rahul Mitra; Nitesh B. Gundavarapu; Abhishek Sharma; Arjun Jain;	To reduce this annotation dependency, we propose the Multiview-Consistent Semi-Supervised Learning (MCSS) framework, which utilizes similarity in pose information from unannotated, uncalibrated but synchronized multi-view videos of human motions as an additional weak supervision signal to guide 3D human pose regression.
685	Regularizing Neural Networks via Minimizing Hyperspherical Energy	Rongmei Lin; Weiyang Liu; Zhen Liu; Chen Feng; Zhiding Yu; James M. Rehg; Li Xiong; Le Song;	To address these problems, we propose the compressive minimum hyperspherical energy (CoMHE) as a more effective regularization for neural networks.
686	Learning Representations by Predicting Bags of Visual Words	Spyros Gidaris; Andrei Bursuc; Nikos Komodakis; Patrick Perez; Matthieu Cord;	Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions that encode discrete visual concepts, here called visual words.
687	AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces	Muhammad Haris Khan; John McDonagh; Salman Khan; Muhammad Shahabuddin; Aditya Arora; Fahad Shahbaz Khan; Ling Shao; Georgios Tzimiropoulos;	To this end, we introduce a large-scale, hierarchically annotated dataset of animal faces, featuring 22.4K faces from 350 diverse species and 21 animal orders across biological taxonomy.
688	A Transductive Approach for Video Object Segmentation	Yizhuo Zhang; Zhirong Wu; Houwen Peng; Stephen Lin;	To address this issue, we propose a simple yet strong transductive method, in which additional modules, datasets, and dedicated architectural designs are not needed.
689	Dynamic Face Video Segmentation via Reinforcement Learning	Yujiang Wang; Mingzhi Dong; Jie Shen; Yang Wu; Shiyang Cheng; Maja Pantic;	To overcome this limitation, we model the online key decision process in dynamic video segmentation as a deep reinforcement learning problem and learn an efficient and effective scheduling policy from expert information about decision history and from the process of maximising global return.
690	Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion	Julian Chibane; Thiemo Alldieck; Gerard Pons-Moll;	To solve this, we propose Implicit Feature Networks (IF-Nets), which deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data, retaining the nice properties of recent learned implicit functions; critically, they can also retain detail when it is present in the input data and can reconstruct articulated humans.
691	Semantic Drift Compensation for Class-Incremental Learning	Lu Yu; Bartlomiej Twardowski; Xialei Liu; Luis Herranz; Kai Wang; Yongmei Cheng; Shangling Jui; Joost van de Weijer;	Therefore, we study incremental learning for embedding networks. In addition, we propose a new method to estimate the drift, called semantic drift, of features and compensate for it without the need of any exemplars.
692	Context-Aware Human Motion Prediction	Enric Corona; Albert Pumarola; Guillem Alenya; Francesc Moreno-Noguer;	In this paper, we explore this scenario using a novel context-aware motion prediction architecture.
693	DeepDeform: Learning Non-Rigid RGB-D Reconstruction With Semi-Supervised Data	Aljaz Bozic; Michael Zollhofer; Christian Theobalt; Matthias Niessner;	Based on this corpus, we introduce a data-driven non-rigid feature matching approach, which we integrate into an optimization-based reconstruction pipeline.
694	Optical Non-Line-of-Sight Physics-Based 3D Human Pose Estimation	Mariko Isogawa; Ye Yuan; Matthew O’Toole; Kris M. Kitani;	We describe a method for 3D human pose estimation from transient images (i.e., a 3D spatio-temporal histogram of photons) acquired by an optical non-line-of-sight (NLOS) imaging system.
695	Learning to Transfer Texture From Clothing Images to 3D Humans	Aymen Mir; Thiemo Alldieck; Gerard Pons-Moll;	In this paper, we present a simple yet effective method to automatically transfer textures of clothing images (front and back) to 3D garments worn on top of SMPL, in real time.
696	UniPose: Unified Human Pose Estimation in Single Images and Videos	Bruno Artacho; Andreas Savakis;	We propose UniPose, a unified framework for human pose estimation, based on our "Waterfall" Atrous Spatial Pooling architecture, that achieves state-of-the-art results on several pose estimation metrics.
697	Minimal Solutions to Relative Pose Estimation From Two Views Sharing a Common Direction With Unknown Focal Length	Yaqing Ding; Jian Yang; Jean Ponce; Hui Kong;	We propose minimal solutions to the relative pose estimation problem from two views sharing a common direction with unknown focal length.
698	3D Human Mesh Regression With Dense Correspondence	Wang Zeng; Wanli Ouyang; Ping Luo; Wentao Liu; Xiaogang Wang;	This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space (i.e. a 2D space used for texture mapping of 3D mesh).
699	Cross-Modal Pattern-Propagation for RGB-T Tracking	Chaoqun Wang; Chunyan Xu; Zhen Cui; Ling Zhou; Tong Zhang; Xiaoya Zhang; Jian Yang;	Motivated by our observation on RGB-T data that pattern correlations recur frequently across modalities and along sequence frames, in this paper, we propose a cross-modal pattern-propagation (CMPP) tracking framework to diffuse instance patterns across RGB-T data in both the spatial and temporal domains.
700	Distilling Knowledge From Graph Convolutional Networks	Yiding Yang; Jiayan Qiu; Mingli Song; Dacheng Tao; Xinchao Wang;	In this paper, we propose, to the best of our knowledge, the first dedicated approach to distilling knowledge from a pre-trained GCN model.
701	Learning Identity-Invariant Motion Representations for Cross-ID Face Reenactment	Po-Hsiang Huang; Fu-En Yang; Yu-Chiang Frank Wang;	In this paper, we propose a unique network, CrossID-GAN, to perform multi-ID face reenactment.
702	Distribution-Aware Coordinate Representation for Human Pose Estimation	Feng Zhang; Xiatian Zhu; Hanbin Dai; Mao Ye; Ce Zhu;	For the first time, we find that the process of decoding the predicted heatmaps into the final joint coordinates in the original image space is surprisingly significant for the performance. We further probe the design limitations of the standard coordinate decoding method, and propose a more principled distribution-aware decoding method.
703	Parsing-Based View-Aware Embedding Network for Vehicle Re-Identification	Dechao Meng; Liang Li; Xuejing Liu; Yadong Li; Shijie Yang; Zheng-Jun Zha; Xingyu Gao; Shuhui Wang; Qingming Huang;	In this paper, we propose a parsing-based view-aware embedding network (PVEN) to achieve the view-aware feature alignment and enhancement for vehicle ReID.
704	HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation From a Single Depth Map	Jameel Malik; Ibrahim Abdelaziz; Ahmed Elhayek; Soshi Shimada; Sk Aziz Ali; Vladislav Golyanik; Christian Theobalt; Didier Stricker;	In contrast, we propose a novel architecture with 3D convolutions trained in a weakly-supervised manner.
705	Determinant Regularization for Gradient-Efficient Graph Matching	Tianshu Yu; Junchi Yan; Baoxin Li;	In this paper, we present a novel regularization technique based on determinant analysis of the matching matrix, which is relaxed into the continuous domain for gradient-based optimization.
706	D3S – A Discriminative Single Shot Segmentation Tracker	Alan Lukezic; Jiri Matas; Matej Kristan;	We propose a discriminative single-shot segmentation tracker, D3S, which narrows the gap between visual object tracking and video object segmentation.
707	MANTRA: Memory Augmented Networks for Multiple Trajectory Prediction	Francesco Marchetti; Federico Becattini; Lorenzo Seidenari; Alberto Del Bimbo;	In this paper we address the problem of multimodal trajectory prediction exploiting a Memory Augmented Neural Network.
708	End-to-End Model-Free Reinforcement Learning for Urban Driving Using Implicit Affordances	Marin Toromanoff; Emilie Wirbel; Fabien Moutarde;	We present a novel technique, coined implicit affordances, to effectively leverage RL for urban driving, including lane keeping, pedestrian and vehicle avoidance, and traffic light detection.
709	GraphTER: Unsupervised Learning of Graph Transformation Equivariant Representations via Auto-Encoding Node-Wise Transformations	Xiang Gao; Wei Hu; Guo-Jun Qi;	To this end, we propose a novel unsupervised approach for learning Graph Transformation Equivariant Representations (GraphTER), aiming to capture intrinsic patterns of graph structure under both global and local transformations.
710	Can Facial Pose and Expression Be Separated With Weak Perspective Camera?	Evangelos Sariyanidi; Casey J. Zampella; Robert T. Schultz; Birkan Tunc;	This paper critically examines the suitability of the weak perspective (WP) camera for separating facial pose and expression.
711	Probabilistic Regression for Visual Tracking	Martin Danelljan; Luc Van Gool; Radu Timofte;	In this work, we therefore propose a probabilistic regression formulation and apply it to tracking.
712	3DRegNet: A Deep Neural Network for 3D Point Registration	G. Dias Pais; Srikumar Ramalingam; Venu Madhav Govindu; Jacinto C. Nascimento; Rama Chellappa; Pedro Miraldo;	We present 3DRegNet, a novel deep learning architecture for the registration of 3D scans.
713	Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation	Matteo Fabbri; Fabio Lanzi; Simone Calderara; Stefano Alletto; Rita Cucchiara;	In this paper we present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images.
714	Three-Dimensional Reconstruction of Human Interactions	Mihai Fieraru; Mihai Zanfir; Elisabeta Oneata; Alin-Ionut Popa; Vlad Olaru; Cristian Sminchisescu;	This paper addresses such issues and makes several contributions: (1) we introduce models for interaction signature estimation (ISP) encompassing contact detection, segmentation, and 3d contact signature prediction; (2) we show how such components can be leveraged in order to produce augmented losses that ensure contact consistency during 3d reconstruction; (3) we construct several large datasets for learning and evaluating 3d contact prediction and reconstruction methods; specifically, we introduce CHI3D, a lab-based accurate 3d motion capture dataset with 631 sequences containing 2,525 contact events, 728,664 ground truth 3d poses, as well as FlickrCI3D, a dataset of 11,216 images, with 14,081 processed pairs of people, and 81,233 facet-level surface correspondences within 138,213 selected contact regions.
715	Distribution-Induced Bidirectional Generative Adversarial Network for Graph Representation Learning	Shuai Zheng; Zhenfeng Zhu; Xingxing Zhang; Zhizhe Liu; Jian Cheng; Yao Zhao;	In this paper, we propose a Distribution-induced Bidirectional Generative Adversarial Network (named DBGAN) for graph representation learning.
716	Minimal Solvers for 3D Scan Alignment With Pairs of Intersecting Lines	Andre Mateus; Srikumar Ramalingam; Pedro Miraldo;	In this paper, we present minimal solvers that combine these different types of constraints: 1) three line intersections and one point match; 2) one line intersection and two point matches; 3) three line intersections and one plane match; 4) one line intersection and two plane matches; and 5) one line intersection, one point match, and one plane match.
717	Wavelet Integrated CNNs for Noise-Robust Image Classification	Qiufu Li; Linlin Shen; Sheng Guo; Zhihui Lai;	We present general DWT and Inverse DWT (IDWT) layers applicable to various wavelets, such as Haar, Daubechies, and Cohen, and design wavelet integrated CNNs (WaveCNets) using these layers for image classification.
718	Embedding Expansion: Augmentation in Embedding Space for Deep Metric Learning	Byungsoo Ko; Geonmo Gu;	In this paper, inspired by query expansion and database augmentation, we propose an augmentation method in an embedding space for pair-based metric learning losses, called embedding expansion.
719	PropagationNet: Propagate Points to Curve to Learn Structure Information	Xiehe Huang; Weihong Deng; Haifeng Shen; Xiubao Zhang; Jieping Ye;	In this paper, we explore the intuitions and reasons behind our two proposals, i.e., Propagation Module and Focal Wing Loss, to tackle the problem.
720	Sequential 3D Human Pose and Shape Estimation From Point Clouds	Kangkan Wang; Jin Xie; Guofeng Zhang; Lei Liu; Jian Yang;	In this paper, we propose a novel sequential 3D human pose and shape estimation framework from a sequence of point clouds.
721	Improving the Robustness of Capsule Networks to Image Affine Transformations	Jindong Gu; Volker Tresp;	Furthermore, we explore the limitations of capsule transformations and propose affine CapsNets (Aff-CapsNets), which are more robust to affine transformations.
722	Noise Modeling, Synthesis and Classification for Generic Object Anti-Spoofing	Joel Stehouwer; Amin Jourabloo; Yaojie Liu; Xiaoming Liu;	In this work, we define and tackle the problem of Generic Object Anti-Spoofing (GOAS) for the first time.
723	Quaternion Product Units for Deep Learning on 3D Rotation Groups	Xuan Zhang; Shaofei Qin; Yi Xu; Hongteng Xu;	We propose a novel quaternion product unit (QPU) to represent data on 3D rotation groups.
724	Unsupervised Representation Learning for Gaze Estimation	Yu Yu; Jean-Marc Odobez;	To address this issue, our main contribution in this paper is to propose an effective approach to learn a low-dimensional gaze representation without gaze annotations, which, to the best of our knowledge, is the first work to do so.
725	Π-nets: Deep Polynomial Neural Networks	Grigorios G. Chrysos; Stylianos Moschoglou; Giorgos Bouritsas; Yannis Panagakis; Jiankang Deng; Stefanos Zafeiriou;	In this paper, we propose Π-Nets, a new class of DCNNs.
726	Hierarchically Robust Representation Learning	Qi Qian; Juhua Hu; Hao Li;	In this work, we investigate this phenomenon and demonstrate that deep features can be suboptimal due to the fact that they are learned by minimizing the empirical risk.
727	How Useful Is Self-Supervised Pretraining for Visual Tasks?	Alejandro Newell; Jia Deng;	We investigate what factors may play a role in the utility of these pretraining methods for practitioners.
728	Copy and Paste GAN: Face Hallucination From Shaded Thumbnails	Yang Zhang; Ivor W. Tsang; Yawei Luo; Chang-Hui Hu; Xiaobo Lu; Xin Yu;	This paper proposes a Copy and Paste Generative Adversarial Network (CPGAN) to recover authentic high-resolution (HR) face images while compensating for low and non-uniform illumination.
729	TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style	Chaitanya Patel; Zhouyingcheng Liao; Gerard Pons-Moll;	In this paper, we present TailorNet, a neural model which predicts clothing deformation in 3D as a function of three factors: pose, shape and style (garment geometry), while retaining wrinkle detail.
730	Object-Occluded Human Shape and Pose Estimation From a Single Color Image	Tianshu Zhang; Buzhen Huang; Yangang Wang;	In this paper, we focus on the problem of directly estimating the object-occluded human shape and pose from single color images. To supervise the network training, we further build a novel dataset named 3DOH50K.
731	Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking	Jin Gao; Weiming Hu; Yan Lu;	Despite the many dedicated techniques proposed to address those issues, in this paper we take a new way to strike a compromise between them based on the recursive least-squares estimation (LSE) algorithm.
732	Self-Supervised Monocular Scene Flow Estimation	Junhwa Hur; Stefan Roth;	We propose a novel monocular scene flow method that yields competitive accuracy and real-time performance.
733	Learning Fast and Robust Target Models for Video Object Segmentation	Andreas Robinson; Felix Jaremo Lawin; Martin Danelljan; Fahad Shahbaz Khan; Michael Felsberg;	We propose a novel VOS architecture consisting of two network components.
734	Reciprocal Learning Networks for Human Trajectory Prediction	Hao Sun; Zhiqun Zhao; Zhihai He;	Based on this unique property, we develop a new approach, called reciprocal learning, for human trajectory prediction.
735	Nonparametric Object and Parts Modeling With Lie Group Dynamics	David S. Hayden; Jason Pacheco; John W. Fisher III;	Here, we relax such strong assumptions via an unsupervised, Bayesian nonparametric parts model that infers an unknown number of parts with motions coupled by a body dynamic and parameterized by SE(D), the Lie group of rigid transformations.
736	Learning to Shadow Hand-Drawn Sketches	Qingyuan Zheng; Zhuoru Li; Adam Bargteil;	We present a fully automatic method to generate detailed and accurate artistic shadows from pairs of line drawing sketches and lighting directions.
737	Intuitive, Interactive Beard and Hair Synthesis With Generative Models	Kyle Olszewski; Duygu Ceylan; Jun Xing; Jose Echevarria; Zhili Chen; Weikai Chen; Hao Li;	We present an interactive approach to synthesizing realistic variations in facial hair in images, ranging from subtle edits to existing hair to the addition of complex and challenging hair in images of clean-shaven subjects.
738	Semantic Pyramid for Image Generation	Assaf Shocher; Yossi Gandelsman; Inbar Mosseri; Michal Yarom; Michal Irani; William T. Freeman; Tali Dekel;	We present a novel GAN-based model that utilizes the space of deep features learned by a pre-trained classification model.
739	SynSin: End-to-End View Synthesis From a Single Image	Olivia Wiles; Georgia Gkioxari; Richard Szeliski; Justin Johnson;	We propose a novel end-to-end model for this task using a single image at test time; it is trained on real images without any ground-truth 3D information.
740	A Characteristic Function Approach to Deep Implicit Generative Modeling	Abdul Fatir Ansari; Jonathan Scarlett; Harold Soh;	In this paper, we formulate the problem of learning an implicit generative model (IGM) as minimizing the expected distance between characteristic functions.
741	High-Resolution Daytime Translation Without Domain Labels	Ivan Anokhin; Pavel Solovev; Denis Korzhenkov; Alexey Kharlamov; Taras Khakhulin; Aleksei Silvestrov; Sergey Nikolenko; Victor Lempitsky; Gleb Sterkin;	We present the high-resolution daytime translation (HiDT) model for this task.
742	Leveraging 2D Data to Learn Textured 3D Mesh Generation	Paul Henderson; Vagia Tsiminaki; Christoph H. Lampert;	In this work, we present the first generative model of textured 3D meshes.
743	Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting	Zili Yi; Qiang Tang; Shekoofeh Azizi; Daesik Jang; Zhan Xu;	Motivated by this, we propose a Contextual Residual Aggregation (CRA) mechanism that can produce high-frequency residuals for missing contents through a weighted aggregation of residuals from contextual patches, thus only requiring a low-resolution prediction from the network.
744	Flow Contrastive Estimation of Energy-Based Models	Ruiqi Gao; Erik Nijkamp; Diederik P. Kingma; Zhen Xu; Andrew M. Dai; Ying Nian Wu;	This paper studies a training method to jointly estimate an energy-based model and a flow-based model, in which the two models are iteratively updated based on a shared adversarial value function.
745	Hardware-in-the-Loop End-to-End Optimization of Camera Image Processing Pipelines	Ali Mosleh; Avinash Sharma; Emmanuel Onzon; Fahim Mannan; Nicolas Robidoux; Felix Heide;	Departing from such approximations, we present a hardware-in-the-loop method that directly optimizes hardware image processing pipelines for end-to-end domain-specific losses by solving a nonlinear multi-objective optimization problem with a novel 0th-order stochastic solver directly interfaced with the hardware ISP.
746	Search to Distill: Pearls Are Everywhere but Not the Eyes	Yu Liu; Xuhui Jia; Mingxing Tan; Raviteja Vemulapalli; Yukun Zhu; Bradley Green; Xiaogang Wang;	To achieve this, we present a new Architecture-aware Knowledge Distillation (AKD) approach that finds student models (pearls for the teacher) that are best for distilling the given teacher model.
747	Total Deep Variation for Linear Inverse Problems	Erich Kobler; Alexander Effland; Karl Kunisch; Thomas Pock;	In this paper, we propose a novel learnable general-purpose regularizer exploiting recent architectural design patterns from deep learning.
748	Relative Interior Rule in Block-Coordinate Descent	Tomas Werner; Daniel Prusa; Tomas Dlask;	Based on this observation, we develop a theoretical framework for block-coordinate descent applied to general convex problems.
749	Learning Combinatorial Solver for Graph Matching	Tao Wang; He Liu; Yidong Li; Yi Jin; Xiaohui Hou; Haibin Ling;	In this paper we propose a fully trainable framework for graph matching, in which learning of affinities and solving for combinatorial optimization are not explicitly separated as in much prior work.
750	SampleNet: Differentiable Point Cloud Sampling	Itai Lang; Asaf Manor; Shai Avidan;	We introduce a novel differentiable relaxation for point cloud sampling that approximates sampled points as a mixture of points in the primary input cloud.
751	Can We Learn Heuristics for Graphical Model Inference Using Reinforcement Learning?	Safa Messaoud; Maghav Kumar; Alexander G. Schwing;	In this paper, we show that we can learn program heuristics, i.e., policies, for solving inference in higher order CRFs for the task of semantic segmentation, using reinforcement learning.
752	Quasi-Newton Solver for Robust Non-Rigid Registration	Yuxin Yao; Bailin Deng; Weiwei Xu; Juyong Zhang;	In this paper, we propose a formulation for robust non-rigid registration based on a globally smooth robust estimator for data fitting and regularization, which can handle outliers and partial overlaps.
753	Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition From a Domain Adaptation Perspective	Muhammad Abdullah Jamal; Matthew Brown; Ming-Hsuan Yang; Liqiang Wang; Boqing Gong;	To this end, we propose to augment the classic class-balanced learning by explicitly estimating the differences between the class-conditioned distributions with a meta-learning approach.
754	Optimizing Rank-Based Metrics With Blackbox Differentiation	Michal Rolinek; Vit Musil; Anselm Paulus; Marin Vlastelica; Claudio Michaelis; Georg Martius;	We present an efficient, theoretically sound, and general method for differentiating rank-based metrics with mini-batch gradient descent.
755	DualSDF: Semantic Shape Manipulation Using a Two-Level Representation	Zekun Hao; Hadar Averbuch-Elor; Noah Snavely; Serge Belongie;	We propose DualSDF, a representation expressing shapes at two levels of granularity, one capturing fine details and the other representing an abstracted proxy shape using simple and semantically consistent shape primitives.
756	Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives	Duo Li; Qifeng Chen;	Complementary to previous training strategies, we propose Dynamic Hierarchical Mimicking, a generic feature learning mechanism, to advance CNN training with enhanced generalization ability.
757	Deep Homography Estimation for Dynamic Scenes	Hoang Le; Feng Liu; Shu Zhang; Aseem Agarwala;	This paper investigates and discusses how to design and train a deep neural network that handles dynamic scenes.
758	PF-Net: Point Fractal Network for 3D Point Cloud Completion	Zitian Huang; Yikuan Yu; Jiawen Xu; Feng Ni; Xinyi Le;	In this paper, we propose a Point Fractal Network (PF-Net), a novel learning-based approach for precise and high-fidelity point cloud completion.
759	On the Regularization Properties of Structured Dropout	Ambar Pal; Connor Lane; Rene Vidal; Benjamin D. Haeffele;	In this work we show that for single hidden-layer linear networks, DropBlock induces spectral k-support norm regularization, and promotes solutions that are low-rank and have factors with equal norm.
760	Learning Oracle Attention for High-Fidelity Face Completion	Tong Zhou; Changxing Ding; Shaowen Lin; Xinchao Wang; Dacheng Tao;	Accordingly, in this paper, we design a comprehensive framework for face completion based on the U-Net structure.
761	Deep Image Spatial Transformation for Person Image Generation	Yurui Ren; Xiaoming Yu; Junming Chen; Thomas H. Li; Ge Li;	In this paper, we propose a differentiable global-flow local-attention framework to reassemble the inputs at the feature level.
762	Learning to Optimize on SPD Manifolds	Zhi Gao; Yuwei Wu; Yunde Jia; Mehrtash Harandi;	In this paper, we propose a meta-learning method to automatically learn an iterative optimizer on SPD manifolds.
763	Deep 3D Portrait From a Single Image	Sicheng Xu; Jiaolong Yang; Dong Chen; Fang Wen; Yu Deng; Yunde Jia; Xin Tong;	In this paper, we present a learning-based approach for recovering the 3D geometry of a human head from a single portrait image.
764	RDCFace: Radial Distortion Correction for Face Recognition	He Zhao; Xianghua Ying; Yongjie Shi; Xin Tong; Jingsi Wen; Hongbin Zha;	In this paper, we propose a distortion-invariant face recognition system called RDCFace, which directly and only utilizes the distorted images of faces, to alleviate the effects of radial lens distortion.
765	Global-Local GCN: Large-Scale Label Noise Cleansing for Face Recognition	Yaobin Zhang; Weihong Deng; Mei Wang; Jiani Hu; Xian Li; Dongyue Zhao; Dongchao Wen;	To solve this problem, we propose FaceGraph, an effective automatic label noise cleansing framework for face recognition datasets.
766	MISC: Multi-Condition Injection and Spatially-Adaptive Compositing for Conditional Person Image Synthesis	Shuchen Weng; Wenbo Li; Dawei Li; Hongxia Jin; Boxin Shi;	In this paper, we explore synthesizing person images with multiple conditions for various backgrounds.
767	SAINT: Spatially Aware Interpolation NeTwork for Medical Slice Synthesis	Cheng Peng; Wei-An Lin; Haofu Liao; Rama Chellappa; S. Kevin Zhou;	In this paper, we introduce a Spatially Aware Interpolation NeTwork (SAINT) for medical slice synthesis to alleviate the memory constraint that volumetric data poses.
768	Recurrent Feature Reasoning for Image Inpainting	Jingyuan Li; Ning Wang; Lefei Zhang; Bo Du; Dacheng Tao;	In this paper, we devise a Recurrent Feature Reasoning (RFR) network, which is mainly constructed from a plug-and-play Recurrent Feature Reasoning module and a Knowledge Consistent Attention (KCA) module.
769	Structure-Preserving Super Resolution With Gradient Guidance	Cheng Ma; Yongming Rao; Yean Cheng; Ce Chen; Jiwen Lu; Jie Zhou;	In this paper, we propose a structure-preserving super resolution method to alleviate the above issue while maintaining the merits of GAN-based methods in generating perceptually pleasant details.
770	Epipolar Transformers	Yihui He; Rui Yan; Katerina Fragkiadaki; Shoou-I Yu;	Therefore, we propose the differentiable "epipolar transformer", which enables the 2D detector to leverage 3D-aware features to improve 2D pose estimation.
771	Diversified Arbitrary Style Transfer via Deep Feature Perturbation	Zhizhong Wang; Lei Zhao; Haibo Chen; Lihong Qiu; Qihang Mo; Sihuan Lin; Wei Xing; Dongming Lu;	In this paper, we tackle these limitations and propose a simple yet effective method for diversified arbitrary style transfer.
772	MSG-GAN: Multi-Scale Gradients for Generative Adversarial Networks	Animesh Karnewar; Oliver Wang;	In this work, we propose the Multi-Scale Gradient Generative Adversarial Network (MSG-GAN), a simple but effective technique for addressing this by allowing the flow of gradients from the discriminator to the generator at multiple scales.
773	Overcoming Multi-Model Forgetting in One-Shot NAS With Diversity Maximization	Miao Zhang; Huiqi Li; Shirui Pan; Xiaojun Chang; Steven Su;	In this paper, we formulate the supernet training in One-Shot NAS as a constrained continual-learning optimization problem, in which learning the current architecture should not degrade the performance of previous architectures during supernet training.
774	Select to Better Learn: Fast and Accurate Deep Learning Using Data Selection From Nonlinear Manifolds	Mohsen Joneidi; Saeed Vahidian; Ashkan Esmaeili; Weijia Wang; Nazanin Rahnavard; Bill Lin; Mubarak Shah;	We propose a simple and efficient selection algorithm with linear complexity, referred to as spectrum pursuit (SP), which pursues spectral components of the dataset using available sample points.
775	Neural Point Cloud Rendering via Multi-Plane Projection	Peng Dai; Yinda Zhang; Zhuwen Li; Shuaicheng Liu; Bing Zeng;	We present a new deep point cloud rendering pipeline through multi-plane projections.
776	Wish You Were Here: Context-Aware Human Generation	Oran Gafni; Lior Wolf;	We present a novel method for inserting objects, specifically humans, into existing images, such that they blend in a photorealistic manner, while respecting the semantic context of the scene.
777	Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content	Han Yang; Ruimao Zhang; Xiaobao Guo; Wei Liu; Wangmeng Zuo; Ping Luo;	To address this issue, we propose a novel visual try-on network, namely Adaptive Content Generating and Preserving Network (ACGPN).
778	Breaking the Cycle – Colleagues Are All You Need	Ori Nizan; Ayellet Tal;	This paper proposes a novel approach to performing image-to-image translation between unpaired domains.
779	Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation	Hao Tang; Dan Xu; Yan Yan; Philip H.S. Torr; Nicu Sebe;	In this paper, we address the task of semantic-guided scene generation.
780	ManiGAN: Text-Guided Image Manipulation	Bowen Li; Xiaojuan Qi; Thomas Lukasiewicz; Philip H.S. Torr;	The goal of our paper is to semantically edit parts of an image matching a given text that describes desired attributes (e.g., texture, colour, and background), while preserving other contents that are irrelevant to the text.
781	Watch Your Up-Convolution: CNN Based Generative Deep Neural Networks Are Failing to Reproduce Spectral Distributions	Ricard Durall; Margret Keuper; Janis Keuper;	In this paper, we show that common up-sampling methods, known as up-convolution or transposed convolution, cause the inability of such models to correctly reproduce the spectral distributions of natural training data.
782	Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems	Patrick Knobelreiter; Christian Sormann; Alexander Shekhovtsov; Friedrich Fraundorfer; Thomas Pock;	In this work we take one of the simplest inference methods, a truncated max-product Belief Propagation, and add what is necessary to make it a proper component of a deep learning model: connect it to learning formulations with losses on marginals and compute the backprop operation.
783	Barycenters of Natural Images – Constrained Wasserstein Barycenters for Image Morphing	Dror Simon; Aviad Aberdam;	In this work, we propose a novel approach for image morphing that possesses all three desired properties.
784	Guided Variational Autoencoder for Disentanglement Learning	Zheng Ding; Yifan Xu; Weijian Xu; Gaurav Parmar; Yang Yang; Max Welling; Zhuowen Tu;	We propose an algorithm, guided variational autoencoder (Guided-VAE), that is able to learn a controllable generative model by performing latent representation disentanglement learning.
785	Cross-Spectral Face Hallucination via Disentangling Independent Factors	Boyan Duan; Chaoyou Fu; Yi Li; Xingguang Song; Ran He;	Rather than building a monolithic but complex structure, this paper proposes a Pose Aligned Cross-spectral Hallucination (PACH) approach to disentangle the independent factors and deal with them in individual stages.
786	Learned Image Compression With Discretized Gaussian Mixture Likelihoods and Attention Modules	Zhengxue Cheng; Heming Sun; Masaru Takeuchi; Jiro Katto;	Therefore, in this paper, we propose to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which can achieve a more accurate and flexible entropy model.
787	C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds	Albert Pumarola; Stefan Popov; Francesc Moreno-Noguer; Vittorio Ferrari;	In this paper, we introduce C-Flow, a novel conditioning scheme that brings normalizing flows to an entirely new scenario with great possibilities for multimodal data modeling.
788	Cogradient Descent for Bilinear Optimization	Li’an Zhuo; Baochang Zhang; Linlin Yang; Hanlin Chen; Qixiang Ye; David Doermann; Rongrong Ji; Guodong Guo;	In this paper, we introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem, based on a theoretical framework to coordinate the gradient of hidden variables via a projection function.
789	Instance-Aware Image Colorization	Jheng-Wei Su; Hung-Kuo Chu; Jia-Bin Huang;	In this paper, we propose a method for achieving instance-aware colorization.
790	Joint Training of Variational Auto-Encoder and Latent Energy-Based Model	Tian Han; Erik Nijkamp; Linqi Zhou; Bo Pang; Song-Chun Zhu; Ying Nian Wu;	This paper proposes a joint training method to learn both the variational auto-encoder (VAE) and the latent energy-based model (EBM).
791	Adaptive Loss-Aware Quantization for Multi-Bit Networks	Zhongnan Qu; Zimu Zhou; Yun Cheng; Lothar Thiele;	We propose Adaptive Loss-aware Quantization (ALQ), a new multi-bit network (MBN) quantization pipeline that is able to achieve an average bitwidth below one bit without notable loss in inference accuracy.
792	ScopeFlow: Dynamic Scene Scoping for Optical Flow	Aviram Bar-Haim; Lior Wolf;	We propose to modify the common training protocols of optical flow, leading to sizable accuracy improvements without adding to the computational complexity of the training process.
793	Video Super-Resolution With Temporal Group Attention	Takashi Isobe; Songjiang Li; Xu Jia; Shanxin Yuan; Gregory Slabaugh; Chunjing Xu; Ya-Li Li; Shengjin Wang; Qi Tian;	In this work, we propose a novel method that can effectively incorporate temporal information in a hierarchical way.
794	Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression	Yawei Li; Shuhang Gu; Christoph Mayer; Luc Van Gool; Radu Timofte;	In this paper, we analyze two popular network compression techniques, i.e., filter pruning and low-rank decomposition, in a unified sense.
795	3D Photography Using Context-Aware Layered Depth Inpainting	Meng-Li Shih; Shih-Yang Su; Johannes Kopf; Jia-Bin Huang;	We propose a method for converting a single RGB-D input image into a 3D photo, i.e., a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view.
796	MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation	Yuheng Li; Krishna Kumar Singh; Utkarsh Ojha; Yong Jae Lee;	We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation.
797	Low-Rank Compression of Neural Nets: Learning the Rank of Each Layer	Yerlan Idelbayev; Miguel A. Carreira-Perpinan;	We show that, with a suitable formulation, this problem is amenable to a mixed discrete-continuous optimization jointly over the ranks and over the matrix elements, and give a corresponding algorithm.
798	Global Texture Enhancement for Fake Face Detection in the Wild	Zhengzhe Liu; Xiaojuan Qi; Philip H.S. Torr;	In this paper, we conduct an empirical study on fake/real faces, and have two important observations: firstly, the texture of fake faces is substantially different from real ones; secondly, global texture statistics are more robust to image editing and transferable to fake faces from different GANs and datasets.
799	Panoptic-Based Image Synthesis	Aysegul Dundar; Karan Sapra; Guilin Liu; Andrew Tao; Bryan Catanzaro;	We propose a panoptic-aware image synthesis network to generate high-fidelity and photorealistic images conditioned on panoptic maps, which unify semantic and instance information.
800	Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination	Pratul P. Srinivasan; Ben Mildenhall; Matthew Tancik; Jonathan T. Barron; Richard Tucker; Noah Snavely;	We present a deep learning solution for estimating the incident illumination at any 3D location within a scene from an input narrow-baseline stereo image pair.
801	Learning to Cartoonize Using White-Box Cartoon Representations	Xinrui Wang; Jinze Yu;	This paper presents an approach for image cartoonization.
802	End-to-End Learnable Geometric Vision by Backpropagating PnP Optimization	Bo Chen; Alvaro Parra; Jiewei Cao; Nan Li; Tat-Jun Chin;	Towards this aim, we present BPnP, a novel network module that backpropagates gradients through a Perspective-n-Points (PnP) solver to guide parameter updates of a neural network.
803	Analyzing and Improving the Image Quality of StyleGAN	Tero Karras; Samuli Laine; Miika Aittala; Janne Hellsten; Jaakko Lehtinen; Timo Aila;	We expose and analyze several of StyleGAN's characteristic artifacts, and propose changes in both model architecture and training methods to address them.
804	Fashion Editing With Adversarial Parsing Learning	Haoye Dong; Xiaodan Liang; Yixuan Zhang; Xujie Zhang; Xiaohui Shen; Zhenyu Xie; Bowen Wu; Jian Yin;	In this paper, we propose a novel Fashion Editing Generative Adversarial Network (FE-GAN), which is capable of manipulating fashion images by free-form sketches and sparse color strokes.
805	Augment Your Batch: Improving Generalization Through Instance Repetition	Elad Hoffer; Tal Ben-Nun; Itay Hubara; Niv Giladi; Torsten Hoefler; Daniel Soudry;	We propose to use batch augmentation: replicating instances of samples within the same batch with different data augmentations.
806	ARShadowGAN: Shadow Generative Adversarial Network for Augmented Reality in Single Light Scenes	Daquan Liu; Chengjiang Long; Hongpan Zhang; Hanning Yu; Xinzhi Dong; Chunxia Xiao;	To address this problem, we propose an end-to-end Generative Adversarial Network for shadow generation named ARShadowGAN for augmented reality in single light scenes.
807	An End-to-End Edge Aggregation Network for Moving Object Segmentation	Prashant W. Patil; Kuldeep M. Biradar; Akshay Dudhane; Subrahmanyam Murala;	In this paper, we propose an inherent correlation learning-based edge extraction mechanism (EEM) and a dense residual block (DRB) for discriminative foreground representation.
808	Learning Video Stabilization Using Optical Flow	Jiyang Yu; Ravi Ramamoorthi;	We propose a novel neural network that infers the per-pixel warp fields for video stabilization from the optical flow fields of the input video.
809	Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation	Runfa Chen; Wenbing Huang; Binghui Huang; Fuchun Sun; Bin Fang;	To tackle this issue, we develop a decoupled training strategy in which the encoder is trained only when maximizing the adversarial loss and is kept frozen otherwise.
810	Robust Design of Deep Neural Networks Against Adversarial Attacks Based on Lyapunov Theory	Arash Rahnama; Andre T. Nguyen; Edward Raff;	In this work, we take a control theoretic approach to the problem of robustness in DNNs.
811	StarGAN v2: Diverse Image Synthesis for Multiple Domains	Yunjey Choi; Youngjung Uh; Jaejun Yoo; Jung-Woo Ha;	We propose StarGAN v2, a single framework that tackles both the diversity of generated images and scalability over multiple domains, and shows significantly improved results over the baselines.
812	Warping Residual Based Image Stitching for Large Parallax	Kyu-Yul Lee; Jae-Young Sim;	In this paper, we propose an image stitching algorithm robust to large parallax based on the novel concept of warping residuals.
813	A U-Net Based Discriminator for Generative Adversarial Networks	Edgar Schonfeld; Bernt Schiele; Anna Khoreva;	To target this issue we propose an alternative U-Net based discriminator architecture, borrowing the insights from the segmentation literature.
814	Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping	Ran Yi; Yong-Jin Liu; Yu-Kun Lai; Paul L. Rosin;	To address this problem, we propose a novel asymmetric cycle mapping that enforces the reconstruction information to be visible (by a truncation loss) and only embedded in selective facial regions (by a relaxed forward cycle-consistency loss).
815	When to Use Convolutional Neural Networks for Inverse Problems	Nathaniel Chodosh; Simon Lucey;	In this work we argue that for some types of inverse problems the CNN approximation breaks down, leading to poor performance.
816	LUVLi Face Alignment: Estimating Landmarks’ Location, Uncertainty, and Visibility Likelihood	Abhinav Kumar; Tim K. Marks; Wenxuan Mou; Ye Wang; Michael Jones; Anoop Cherian; Toshiaki Koike-Akino; Xiaoming Liu; Chen Feng;	In this paper, we present a novel framework for jointly predicting landmark locations, associated uncertainties of these predicted locations, and landmark visibilities.
817	Affinity Graph Supervision for Visual Recognition	Chu Wang; Babak Samari; Vladimir G. Kim; Siddhartha Chaudhuri; Kaleem Siddiqi;	Here we propose a principled method to directly supervise the learning of weights in affinity graphs, to exploit meaningful connections between entities in the data source.
818	Unsupervised Magnification of Posture Deviations Across Subjects	Michael Dorkenwald; Uta Buchler; Bjorn Ommer;	We present an approach to unsupervised magnification of posture differences across individuals despite large deviations in appearance.
819	Accurate Estimation of Body Height From a Single Depth Image via a Four-Stage Developing Network	Fukun Yin; Shizhe Zhou;	In this paper we address the problem of accurately estimating the height of a person with arbitrary postures from a single depth image.
820	Fast Soft Color Segmentation	Naofumi Akimoto; Huachun Zhu; Yanghua Jin; Yoshimitsu Aoki;	To address this issue, we propose a neural network based method for this task that decomposes a given image into multiple layers in a single forward pass.
821	Global Optimality for Point Set Registration Using Semidefinite Programming	Jose Pedro Iglesias; Carl Olsson; Fredrik Kahl;	In this paper we present a study of global optimality conditions for Point Set Registration (PSR) with missing data.
822	Image2StyleGAN++: How to Edit the Embedded Images?	Rameen Abdal; Yipeng Qin; Peter Wonka;	We propose Image2StyleGAN++, a flexible image editing framework with many applications.
823	SQE: a Self Quality Evaluation Metric for Parameters Optimization in Multi-Object Tracking	Yanru Huang; Feiyu Zhu; Zheni Zeng; Xi Qiu; Yuan Shen; Jianan Wu;	We present SQE, a novel self-quality evaluation metric for parameter optimization in the challenging yet critical multi-object tracking task.
824	EventSR: From Asynchronous Events to Image Reconstruction, Restoration, and Super-Resolution via End-to-End Adversarial Learning	Lin Wang; Tae-Kyun Kim; Kuk-Jin Yoon;	To tackle these challenges, we propose EventSR, a novel end-to-end pipeline that reconstructs LR images from event streams, enhances their quality, and upsamples the enhanced images.
825	Hierarchical Pyramid Diverse Attention Networks for Face Recognition	Qiangchang Wang; Tianyi Wu; He Zheng; Guodong Guo;	In this work, we propose a hierarchical pyramid diverse attention (HPDA) network.
826	RGBD-Dog: Predicting Canine Pose from RGBD Sensors	Sinead Kearney; Wenbin Li; Martin Parsons; Kwang In Kim; Darren Cosker;	In our work, we focus on the problem of 3D canine pose estimation from RGBD images, recording a diverse range of dog breeds with several Microsoft Kinect v2s, simultaneously obtaining the 3D ground truth skeleton via a motion capture system.
827	Multi-Scale Progressive Fusion Network for Single Image Deraining	Kui Jiang; Zhongyuan Wang; Peng Yi; Chen Chen; Baojin Huang; Yimin Luo; Jiayi Ma; Junjun Jiang;	In this work, we explore the multi-scale collaborative representation for rain streaks from the perspective of input image scales and hierarchical deep features in a unified framework, termed multi-scale progressive fusion network (MSPFN) for single image rain streak removal.
828	Learning a Neural 3D Texture Space From 2D Exemplars	Philipp Henzler; Niloy J. Mitra; Tobias Ritschel;	We propose a generative model of 2D and 3D natural textures that achieves diversity and visual fidelity at high computational efficiency.
829	BachGAN: High-Resolution Image Synthesis From Salient Object Layout	Yandong Li; Yu Cheng; Zhe Gan; Licheng Yu; Liqiang Wang; Jingjing Liu;	We propose a new task toward more practical applications of image generation: high-quality image synthesis from salient object layout.
830	Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy	Jaejun Yoo; Namhyuk Ahn; Kyung-Ah Sohn;	In this paper, we provide a comprehensive analysis of the existing augmentation methods applied to the super-resolution task.
831	On Positive-Unlabeled Classification in GAN	Tianyu Guo; Chang Xu; Jiajun Huang; Yunhe Wang; Boxin Shi; Chao Xu; Dacheng Tao;	This paper defines a positive and unlabeled classification problem for standard GANs, which then leads to a novel technique to stabilize the training of the discriminator in GANs.
832	DoveNet: Deep Image Harmonization via Domain Verification	Wenyan Cong; Jianfu Zhang; Li Niu; Liu Liu; Zhixin Ling; Weiyuan Li; Liqing Zhang;	In this work, we contribute an image harmonization dataset iHarmony4 by generating synthesized composite images based on COCO (resp., Adobe5k, Flickr, day2night) dataset, leading to our HCOCO (resp., HAdobe5k, HFlickr, Hday2night) sub-dataset.
833	Noise Robust Generative Adversarial Networks	Takuhiro Kaneko; Tatsuya Harada;	As an alternative, we propose a novel family of GANs called noise robust GANs (NR-GANs), which can learn a clean image generator even when training images are noisy.
834	Normalizing Flows With Multi-Scale Autoregressive Priors	Apratim Bhattacharyya; Shweta Mahajan; Mario Fritz; Bernt Schiele; Stefan Roth;	In this work, we improve the representational power of flow-based models by introducing channel-wise dependencies in their latent space through multi-scale autoregressive priors (mAR).
835	Robust Reference-Based Super-Resolution With Similarity-Aware Deformable Convolution	Gyumin Shim; Jinsun Park; In So Kweon;	In this paper, we propose a novel and efficient reference feature extraction module referred to as the Similarity Search and Extraction Network (SSEN) for reference-based super-resolution (RefSR) tasks.
836	Painting Many Pasts: Synthesizing Time Lapse Videos of Paintings	Amy Zhao; Guha Balakrishnan; Kathleen M. Lewis; Fredo Durand; John V. Guttag; Adrian V. Dalca;	We present a probabilistic model that, given a single image of a completed painting, recurrently synthesizes steps of the painting process.
837	GeoDA: A Geometric Framework for Black-Box Adversarial Attacks	Ali Rahmati; Seyed-Mohsen Moosavi-Dezfooli; Pascal Frossard; Huaiyu Dai;	We propose a geometric framework to generate adversarial examples in one of the most challenging black-box settings, where the adversary can only issue a small number of queries, each returning the top-1 label of the classifier.
838	GAMIN: Generative Adversarial Multiple Imputation Network for Highly Missing Data	Seongwook Yoon; Sanghoon Sull;	We propose a novel imputation method for highly missing data.
839	An Internal Covariate Shift Bounding Algorithm for Deep Neural Networks by Unitizing Layers’ Outputs	You Huang; Yuanlong Yu;	Thus this paper proposes a measure for internal covariate shift (ICS) based on the Earth Mover's (EM) distance, and then derives upper and lower bounds for the measure to provide a theoretical analysis of batch normalization (BN).
840	A Unified Optimization Framework for Low-Rank Inducing Penalties	Marcus Valtonen Ornhag; Carl Olsson;	In this paper we study the convex envelopes of a new class of functions.
841	Single-Side Domain Generalization for Face Anti-Spoofing	Yunpei Jia; Jie Zhang; Shiguang Shan; Xilin Chen;	In this work, we propose an end-to-end single-side domain generalization framework (SSDG) to improve the generalization ability of face anti-spoofing.
842	The Knowledge Within: Methods for Data-Free Model Compression	Matan Haroush; Itay Hubara; Elad Hoffer; Daniel Soudry;	We present three methods for generating synthetic samples from trained models, and demonstrate how these samples can be used to calibrate and fine-tune quantized models without using any real data in the process.
843	Scale-Space Flow for End-to-End Optimized Video Compression	Eirikur Agustsson; David Minnen; Nick Johnston; Johannes Balle; Sung Jin Hwang; George Toderici;	In this paper, we show that a generalized warping operator that better handles common failure cases, e.g. disocclusions and fast motion, can provide competitive compression results with a greatly simplified model and training procedure.
844	Dynamic Neural Relational Inference	Colin Graber; Alexander G. Schwing;	In response to this, we develop Dynamic Neural Relational Inference (dNRI), which incorporates insights from sequential latent variable models to predict separate relation graphs for every time-step.
845	Real-Time Panoptic Segmentation From Dense Detections	Rui Hou; Jie Li; Arjun Bhargava; Allan Raventos; Vitor Guizilini; Chao Fang; Jerome Lynch; Adrien Gaidon;	In this paper, we propose a new single-shot panoptic segmentation network that leverages dense detections and a global self-attention mechanism to operate in real-time with performance approaching the state of the art.
846	Deep Snake for Real-Time Instance Segmentation	Sida Peng; Wen Jiang; Huaijin Pi; Xiuli Li; Hujun Bao; Xiaowei Zhou;	This paper introduces a novel contour-based approach named deep snake for real-time instance segmentation.
847	AdaCoSeg: Adaptive Shape Co-Segmentation With Group Consistency Loss	Chenyang Zhu; Kai Xu; Siddhartha Chaudhuri; Li Yi; Leonidas J. Guibas; Hao Zhang;	We introduce AdaCoSeg, a deep neural network architecture for adaptive co-segmentation of a set of 3D shapes represented as point clouds.
848	Learning Dynamic Routing for Semantic Segmentation	Yanwei Li; Lin Song; Yukang Chen; Zeming Li; Xiangyu Zhang; Xingang Wang; Jian Sun;	This paper studies a conceptually new method to alleviate the scale variance in semantic representation, named dynamic routing.
849	Boosting Semantic Human Matting With Coarse Annotations	Jinlin Liu; Yuan Yao; Wendi Hou; Miaomiao Cui; Xuansong Xie; Changshui Zhang; Xian-Sheng Hua;	In this paper, we propose to leverage coarse annotated data coupled with fine annotated data to boost end-to-end semantic human matting without trimaps as extra input.
850	BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation	Hao Chen; Kunyang Sun; Zhi Tian; Chunhua Shen; Yongming Huang; Youliang Yan;	Our main contribution is a blender module which draws inspiration from both top-down and bottom-up instance segmentation approaches.
851	UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders	Jing Zhang; Deng-Ping Fan; Yuchao Dai; Saeed Anwar; Fatemeh Sadat Saleh; Tong Zhang; Nick Barnes;	In this paper, we propose the first framework (UC-Net) to employ uncertainty for RGB-D saliency detection by learning from the data labeling process.
852	Deep Geometric Functional Maps: Robust Feature Learning for Shape Correspondence	Nicolas Donati; Abhishek Sharma; Maks Ovsjanikov;	We present a novel learning-based approach for computing correspondences between non-rigid 3D shapes.
853	Deep Polarization Cues for Transparent Object Segmentation	Agastya Kalra; Vage Taamazyan; Supreeth Krishna Rao; Kartik Venkataraman; Ramesh Raskar; Achuta Kadambi;	This paper reframes the problem of transparent object segmentation into the realm of light polarization, i.e., the rotation of light waves.
854	DualConvMesh-Net: Joint Geodesic and Euclidean Convolutions on 3D Meshes	Jonas Schult; Francis Engelmann; Theodora Kontogianni; Bastian Leibe;	We propose DualConvMesh-Nets (DCM-Net), a family of deep hierarchical convolutional networks over 3D geometric data that combines two types of convolutions.
855	F-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation	Konstantin Sofiiuk; Ilia Petrov; Olga Barinova; Anton Konushin;	We propose f-BRS (feature backpropagating refinement scheme) that solves an optimization problem with respect to auxiliary variables instead of the network inputs, and requires running forward and backward passes just for a small part of a network.
856	Approximating shapes in images with low-complexity polygons	Muxingzi Li; Florent Lafarge; Renaud Marlet;	We present an algorithm for extracting and vectorizing objects in images with polygons.
857	Towards Visually Explaining Variational Autoencoders	Wenqian Liu; Runze Li; Meng Zheng; Srikrishna Karanam; Ziyan Wu; Bir Bhanu; Richard J. Radke; Octavia Camps;	In this work, we take a step towards bridging this crucial gap, proposing the first technique to visually explain VAEs by means of gradient-based attention.
858	Towards Global Explanations of Convolutional Neural Networks With Concept Attribution	Weibin Wu; Yuxin Su; Xixian Chen; Shenglin Zhao; Irwin King; Michael R. Lyu; Yu-Wing Tai;	To overcome such drawbacks, we propose a novel two-stage framework, Attacking for Interpretability (AfI), which explains model decisions in terms of the importance of user-defined concepts.
859	Interpretable and Accurate Fine-grained Recognition via Region Grouping	Zixuan Huang; Yin Li;	We present an interpretable deep model for fine-grained visual recognition.
860	SAM: The Sensitivity of Attribution Methods to Hyperparameters	Naman Bansal; Chirag Agarwal; Anh Nguyen;	In this paper, we provide a thorough empirical study on the sensitivity of existing attribution methods.
861	High-Frequency Component Helps Explain the Generalization of Convolutional Neural Networks	Haohan Wang; Xindi Wu; Zeyi Huang; Eric P. Xing;	We investigate the relationship between the frequency spectrum of image data and the generalization behavior of convolutional neural networks (CNN).
862	FALCON: A Fourier Transform Based Approach for Fast and Secure Convolutional Neural Network Predictions	Shaohua Li; Kaiping Xue; Bin Zhu; Chenkai Ding; Xindi Gao; David Wei; Tao Wan;	In this paper, we focus on the scenario where clients want to classify private images with a convolutional neural network model hosted in the server, while both parties keep their data private.
863	Dreaming to Distill: Data-Free Knowledge Transfer via DeepInversion	Hongxu Yin; Pavlo Molchanov; Jose M. Alvarez; Zhizhong Li; Arun Mallya; Derek Hoiem; Niraj K. Jha; Jan Kautz;	We introduce DeepInversion, a new method for synthesizing images from the image distribution used to train a deep neural network.
864	Unsupervised Domain Adaptation via Structurally Regularized Deep Clustering	Hui Tang; Ke Chen; Kui Jia;	To alleviate this risk, we are motivated by the assumption of structural domain similarity, and propose to directly uncover the intrinsic target discrimination via discriminative clustering of target data.
865	HyperSTAR: Task-Aware Hyperparameters for Deep Networks	Gaurav Mittal; Chang Liu; Nikolaos Karianakis; Victor Fragoso; Mei Chen; Yun Fu;	To reduce HPO time, we present HyperSTAR (System for Task Aware Hyperparameter Recommendation), a task-aware method to warm-start HPO for deep neural networks.
866	ActBERT: Learning Global-Local Video-Text Representations	Linchao Zhu; Yi Yang;	In this paper, we introduce ActBERT for self-supervised learning of joint video-text representations from unlabeled data.
867	State-Relabeling Adversarial Active Learning	Beichen Zhang; Liang Li; Shijie Yang; Shuhui Wang; Zheng-Jun Zha; Qingming Huang;	In this paper, we propose a state relabeling adversarial active learning model (SRAAL), that leverages both the annotation and the labeled/unlabeled state information for deriving the most informative unlabeled samples.
868	Erasing Integrated Learning: A Simple Yet Effective Approach for Weakly Supervised Object Localization	Jinjie Mai; Meng Yang; Wenfeng Luo;	To remedy this, we propose a simple yet powerful approach by introducing a novel adversarial erasing technique, erasing integrated learning (EIL).
869	A Shared Multi-Attention Framework for Multi-Label Zero-Shot Learning	Dat Huynh; Ehsan Elhamifar;	In this work, we develop a shared multi-attention model for multi-label zero-shot learning.
870	Self-Supervised Learning of Interpretable Keypoints From Unlabelled Videos	Tomas Jakab; Ankush Gupta; Hakan Bilen; Andrea Vedaldi;	We propose a new method for recognizing the pose of objects from a single image that is trained using only unlabelled videos and a weak empirical prior on the object poses.
871	Few-Shot Open-Set Recognition Using Meta-Learning	Bo Liu; Hao Kang; Haoxiang Li; Gang Hua; Nuno Vasconcelos;	Our approach combines the random selection of a set of novel classes per episode, a loss that maximizes the posterior entropy for examples of those classes, and a new metric learning formulation based on the Mahalanobis distance.
872	Few-Shot Learning via Embedding Adaptation With Set-to-Set Functions	Han-Jia Ye; Hexiang Hu; De-Chuan Zhan; Fei Sha;	In this paper, we propose a novel approach to adapt the instance embeddings to the target classification task with a set-to-set function, yielding embeddings that are task-specific and discriminative.
873	Temporally Distributed Networks for Fast Video Semantic Segmentation	Ping Hu; Fabian Caba; Oliver Wang; Zhe Lin; Stan Sclaroff; Federico Perazzi;	We present TDNet, a temporally distributed network designed for fast and accurate video semantic segmentation.
874	Benchmarking the Robustness of Semantic Segmentation Models	Christoph Kamann; Carsten Rother;	While there are recent robustness studies for full-image classification, we are the first to present an exhaustive study for semantic segmentation, based on the state-of-the-art model DeepLabv3+.
875	There and Back Again: Revisiting Backpropagation Saliency Methods	Sylvestre-Alvise Rebuffi; Ruth Fong; Xu Ji; Andrea Vedaldi;	In this work, we conduct a thorough analysis of backpropagation-based saliency methods and propose a single framework under which several such methods can be unified.
876	Deep Semantic Clustering by Partition Confidence Maximisation	Jiabo Huang; Shaogang Gong; Xiatian Zhu;	In this work, we propose to solve this problem by learning the most confident clustering solution from all the possible separations, based on the observation that assigning samples from the same semantic categories into different clusters will reduce both the intra-cluster compactness and inter-cluster diversity, i.e. lower partition confidence.
877	StructEdit: Learning Structural Shape Variations	Kaichun Mo; Paul Guerrero; Li Yi; Hao Su; Peter Wonka; Niloy J. Mitra; Leonidas J. Guibas;	Instead, we treat shape differences as primary objects in their own right and propose to encode them in their own latent space.
878	Harmonizing Transferability and Discriminability for Adapting Object Detectors	Chaoqi Chen; Zebiao Zheng; Xinghao Ding; Yue Huang; Qi Dou;	In this paper, we propose a Hierarchical Transferability Calibration Network (HTCN) that hierarchically (local-region/image/instance) calibrates the transferability of feature representations for harmonizing transferability and discriminability.
879	Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching	Xuhua Huang; Jiarui Xu; Yu-Wing Tai; Chi-Keung Tang;	In this paper, we introduce "tracking-by-detection" into VOS, which coherently integrates segmentation into tracking, by proposing a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism to achieve significantly improved performance.
880	CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement	Ho Kei Cheng; Jihoon Chung; Yu-Wing Tai; Chi-Keung Tang;	In this paper, we propose a novel approach to address the high-resolution segmentation problem without using any high-resolution training data.
881	Correlating Edge, Pose With Parsing	Ziwei Zhang; Chi Su; Liang Zheng; Xiaodong Xie;	To capture such correlations, we propose a Correlation Parsing Machine (CorrPM) employing a heterogeneous non-local block to discover the spatial affinity among feature maps from the edge, pose and parsing.
882	VecRoad: Point-Based Iterative Graph Exploration for Road Graphs Extraction	Yong-Qiang Tan; Shang-Hua Gao; Xuan-Yi Li; Ming-Ming Cheng; Bo Ren;	To enhance the road connectivity while maintaining the precise alignment between the graph and real road, we propose a point-based iterative graph exploration scheme with segmentation-cues guidance and flexible steps.
883	Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation	Zeyu Wang; Klint Qinami; Ioannis Christos Karakozis; Kyle Genova; Prem Nair; Kenji Hata; Olga Russakovsky;	We highlight the shortcomings of popular adversarial training approaches for bias mitigation, propose a simple but similarly effective alternative to the inference-time Reducing Bias Amplification method of Zhao et al., and design a domain-independent training technique that outperforms all other methods.
884	Hierarchical Human Parsing With Typed Part-Relation Reasoning	Wenguan Wang; Hailong Zhu; Jifeng Dai; Yanwei Pang; Jianbing Shen; Ling Shao;	Focusing on this, we seek to simultaneously exploit the representational capacity of deep graph networks and the hierarchical human structures.
885	Compositional Convolutional Neural Networks: A Deep Architecture With Innate Robustness to Partial Occlusion	Adam Kortylewski; Ju He; Qing Liu; Alan L. Yuille;	Inspired by the success of compositional models at classifying partially occluded objects, we propose to integrate compositional models and DCNNs into a unified deep model with innate robustness to partial occlusion.
886	Spatial Pyramid Based Graph Reasoning for Semantic Segmentation	Xia Li; Yibo Yang; Qijie Zhao; Tiancheng Shen; Zhouchen Lin; Hong Liu;	In this paper, we apply graph convolution to the semantic segmentation task and propose an improved Laplacian.
887	Learning Video Object Segmentation From Unlabeled Videos	Xiankai Lu; Wenguan Wang; Jianbing Shen; Yu-Wing Tai; David J. Crandall; Steven C. H. Hoi;	We propose a new method for video object segmentation (VOS) that addresses object pattern learning from unlabeled videos, unlike most existing methods which rely heavily on extensive annotated data.
888	Part-Aware Context Network for Human Parsing	Xiaomei Zhang; Yingying Chen; Bingke Zhu; Jinqiao Wang; Ming Tang;	In this work, we propose a Part-aware Context Network (PCNet), a novel and effective algorithm to deal with the challenge.
889	SCOUT: Self-Aware Discriminant Counterfactual Explanations	Pei Wang; Nuno Vasconcelos;	A new family of discriminant explanations is introduced. These produce heatmaps that attribute high scores to image regions informative of a classifier prediction but not of a counter class.
890	Weakly-Supervised Semantic Segmentation via Sub-Category Exploration	Yu-Ting Chang; Qiaosong Wang; Wei-Chih Hung; Robinson Piramuthu; Yi-Hsuan Tsai; Ming-Hsuan Yang;	To force the network to pay attention to other parts of an object, we propose a simple yet effective approach that introduces a self-supervised task by exploiting the sub-category information.
891	Continual Learning With Extended Kronecker-Factored Approximate Curvature	Janghyeon Lee; Hyeong Gwon Hong; Donggyu Joo; Junmo Kim;	We propose a quadratic penalty method for continual learning of neural networks that contain batch normalization (BN) layers.
892	Phase Consistent Ecological Domain Adaptation	Yanchao Yang; Dong Lao; Ganesh Sundaramoorthi; Stefano Soatto;	We introduce two criteria to regularize the optimization involved in learning a classifier in a domain where no annotated data are available, leveraging annotated data in a different domain, a problem known as unsupervised domain adaptation.
893	AD-Cluster: Augmented Discriminative Clustering for Domain Adaptive Person Re-Identification	Yunpeng Zhai; Shijian Lu; Qixiang Ye; Xuebo Shan; Jie Chen; Rongrong Ji; Yonghong Tian;	This paper presents a novel augmented discriminative clustering (AD-Cluster) technique that estimates and augments person clusters in target domains and enforces the discrimination ability of re-ID models with the augmented clusters.
894	3D-MPA: Multi-Proposal Aggregation for 3D Semantic Instance Segmentation	Francis Engelmann; Martin Bokeloh; Alireza Fathi; Bastian Leibe; Matthias Niessner;	We present 3D-MPA, a method for instance segmentation on 3D point clouds.
895	Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision	Denis Gudovskiy; Alec Hodgkinson; Takuya Yamaguchi; Sotaro Tsukizawa;	To implement such an acquisition function, we propose a low-complexity method for feature density matching using a self-supervised Fisher kernel (FK), as well as several novel pseudo-label estimators.
896	Adaptive Graph Convolutional Network With Attention Graph Clustering for Co-Saliency Detection	Kaihua Zhang; Tengpeng Li; Shiwen Shen; Bo Liu; Jin Chen; Qingshan Liu;	For this task, we present a novel adaptive graph convolutional network with attention graph clustering (GCAGC).
897	A2dele: Adaptive and Attentive Depth Distiller for Efficient RGB-D Salient Object Detection	Yongri Piao; Zhengkun Rong; Miao Zhang; Weisong Ren; Huchuan Lu;	To tackle these two dilemmas, we propose a depth distiller (A2dele) to explore the way of using network prediction and attention as two bridges to transfer the depth knowledge from the depth stream to the RGB stream.
898	Deep Fair Clustering for Visual Learning	Peizhao Li; Han Zhao; Hongfu Liu;	In light of these limitations, in this paper, we propose Deep Fair Clustering (DFC) to learn fair and clustering-favorable representations for clustering simultaneously.
899	Bidirectional Graph Reasoning Network for Panoptic Segmentation	Yangxin Wu; Gengwei Zhang; Yiming Gao; Xiajun Deng; Ke Gong; Xiaodan Liang; Liang Lin;	We introduce a Bidirectional Graph Reasoning Network (BGRNet), which incorporates graph structure into the conventional panoptic segmentation network to mine the intra-modular and inter-modular relations within and between foreground things and background stuff classes.
900	Exploit Clues From Views: Self-Supervised and Regularized Learning for Multiview Object Recognition	Chih-Hui Ho; Bo Liu; Tz-Ying Wu; Nuno Vasconcelos;	In this work, we investigate the problem of multiview self-supervised learning (MV-SSL), where only the image-to-object association is given.
901	Spherical Space Domain Adaptation With Robust Pseudo-Label Loss	Xiang Gu; Jian Sun; Zongben Xu;	In this paper, we propose a novel adversarial DA approach defined entirely in spherical feature space, in which we define a spherical classifier for label prediction and a spherical domain discriminator for discriminating domain labels.
902	Stochastic Classifiers for Unsupervised Domain Adaptation	Zhihe Lu; Yongxin Yang; Xiatian Zhu; Cong Liu; Yi-Zhe Song; Tao Xiang;	In this paper, we introduce a novel method called STochastic clAssifieRs (STAR) for addressing this problem.
903	Unsupervised Learning of Intrinsic Structural Representation Points	Nenglun Chen; Lingjie Liu; Zhiming Cui; Runnan Chen; Duygu Ceylan; Changhe Tu; Wenping Wang;	We present a simple yet interpretable unsupervised method for learning a new structural representation in the form of 3D structure points.
904	PolyTransform: Deep Polygon Transformer for Instance Segmentation	Justin Liang; Namdar Homayounfar; Wei-Chiu Ma; Yuwen Xiong; Rui Hu; Raquel Urtasun;	In this paper, we propose PolyTransform, a novel instance segmentation algorithm that produces precise, geometry-preserving masks by combining the strengths of prevailing segmentation approaches and modern polygon-based methods.
905	Interactive Two-Stream Decoder for Accurate and Fast Saliency Detection	Huajun Zhou; Xiaohua Xie; Jian-Huang Lai; Zixuan Chen; Lingxiao Yang;	In this paper, we first analyze such correlation and then propose an interactive two-stream decoder to explore multiple cues, including saliency, contour and their correlation.
906	Towards Better Generalization: Joint Depth-Pose Learning Without PoseNet	Wang Zhao; Shaohui Liu; Yezhi Shu; Yong-Jin Liu;	In this work, we tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning.
907	LT-Net: Label Transfer by Learning Reversible Voxel-Wise Correspondence for One-Shot Medical Image Segmentation	Shuxin Wang; Shilei Cao; Dong Wei; Renzhen Wang; Kai Ma; Liansheng Wang; Deyu Meng; Yefeng Zheng;	We introduce a one-shot segmentation method to alleviate the burden of manual annotation for medical images.
908	FGN: Fully Guided Network for Few-Shot Instance Segmentation	Zhibo Fan; Jin-Gang Yu; Zhihao Liang; Jiarong Ou; Changxin Gao; Gui-Song Xia; Yuanqing Li;	This paper presents a Fully Guided Network (FGN) for few-shot instance segmentation.
909	A Quantum Computational Approach to Correspondence Problems on Point Sets	Vladislav Golyanik; Christian Theobalt;	We review adiabatic quantum computation (AQC) and derive a new algorithm for correspondence problems on point sets that is suitable for execution on AQC hardware.
910	Data-Efficient Semi-Supervised Learning by Reliable Edge Mining	Peibin Chen; Tao Ma; Xu Qin; Weidi Xu; Shuchang Zhou;	We propose Reliable Edge Mining (REM), which forms a reliable graph by only selecting reliable and useful edges.
911	NestedVAE: Isolating Common Factors via Weak Supervision	Matthew J. Vowels; Necati Cihan Camgoz; Richard Bowden;	To isolate the common factors we combine the theory of deep latent variable models with information bottleneck theory for scenarios whereby data may be naturally paired across domains and no additional supervision is required.
912	Progressive Adversarial Networks for Fine-Grained Domain Adaptation	Sinan Wang; Xinyang Chen; Yunbo Wang; Mingsheng Long; Jianmin Wang;	This paper presents the Progressive Adversarial Networks (PAN) to align fine-grained categories across domains with a curriculum-based adversarial learning framework.
913	A Disentangling Invertible Interpretation Network for Explaining Latent Representations	Patrick Esser; Robin Rombach; Bjorn Ommer;	We formulate interpretation as a translation of hidden representations onto semantic concepts that are comprehensible to the user.
914	Modeling the Background for Incremental Learning in Semantic Segmentation	Fabio Cermelli; Massimiliano Mancini; Samuel Rota Bulo; Elisa Ricci; Barbara Caputo;	In this work we revisit classical incremental learning methods, proposing a new distillation-based framework which explicitly accounts for this shift.
915	Interpreting the Latent Space of GANs for Semantic Face Editing	Yujun Shen; Jinjin Gu; Xiaoou Tang; Bolei Zhou;	In this work, we propose a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs.
916	Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation	Jianqiang Wan; Yang Liu; Donglai Wei; Xiang Bai; Yongchao Xu;	In this paper, we propose a fast image segmentation method based on a novel super boundary-to-pixel direction (super-BPD) and a customized segmentation algorithm with super-BPD.
917	Self-Learning With Rectification Strategy for Human Parsing	Tao Li; Zhiyuan Liang; Sanyuan Zhao; Jiahao Gong; Jianbing Shen;	In this paper, we solve the sample shortage problem in the human parsing task.
918	Hyperbolic Visual Embedding Learning for Zero-Shot Recognition	Shaoteng Liu; Jingjing Chen; Liangming Pan; Chong-Wah Ngo; Tat-Seng Chua; Yu-Gang Jiang;	This paper proposes a Hyperbolic Visual Embedding Learning Network for zero-shot recognition.
919	Sequential Mastery of Multiple Visual Tasks: Networks Naturally Learn to Learn and Forget to Forget	Guy Davidson; Michael C. Mozer;	We explore the behavior of a standard convolutional neural net in a continual-learning setting that introduces visual classification tasks sequentially and requires the net to master new tasks while preserving mastery of previously learned tasks.
920	Distilling Effective Supervision From Severe Label Noise	Zizhao Zhang; Han Zhang; Sercan O. Arik; Honglak Lee; Tomas Pfister;	We present a holistic framework to train deep neural networks in a way that is highly robust to label noise.
921	Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks	Aditya Golatkar; Alessandro Achille; Stefano Soatto;	We propose a method for "scrubbing" the weights clean of information about a particular set of training data.
922	CenterMask: Single Shot Instance Segmentation With Point Representation	Yuqing Wang; Zhaoliang Xu; Hao Shen; Baoshan Cheng; Lirong Yang;	In this paper, we propose a single-shot instance segmentation method, which is simple, fast and accurate.
923	Mitigating Bias in Face Recognition Using Skewness-Aware Reinforcement Learning	Mei Wang; Weihong Deng;	A reinforcement learning based race balance network (RL-RBN) is proposed.
924	MineGAN: Effective Knowledge Transfer From GANs to Target Domains With Few Images	Yaxing Wang; Abel Gonzalez-Garcia; David Berga; Luis Herranz; Fahad Shahbaz Khan; Joost van de Weijer;	We propose a novel knowledge transfer method for generative models based on mining the knowledge that is most beneficial to a specific target domain, either from a single or multiple pretrained GANs.
925	DLWL: Improving Detection for Lowshot Classes With Weakly Labelled Data	Vignesh Ramanathan; Rui Wang; Dhruv Mahajan;	Towards this end, we propose a modification to the FRCNN model to automatically infer label assignments for object proposals from weakly labelled images during training.
926	Unsupervised Deep Shape Descriptor With Point Distribution Learning	Yi Shi; Mengchen Xu; Shuaihang Yuan; Yi Fang;	This paper proposes a novel probabilistic framework for the learning of unsupervised deep shape descriptors with point distribution learning.
927	Stylization-Based Architecture for Fast Deep Exemplar Colorization	Zhongyou Xu; Tingting Wang; Faming Fang; Yun Sheng; Guixu Zhang;	To tackle these problems, we propose a deep exemplar colorization architecture inspired by the characteristics of stylization in feature extracting and blending.
928	Cars Can’t Fly Up in the Sky: Improving Urban-Scene Segmentation via Height-Driven Attention Networks	Sungha Choi; Joanne T. Kim; Jaegul Choo;	This paper exploits the intrinsic features of urban-scene images and proposes a general add-on module, called height-driven attention networks (HANet), for improving semantic segmentation for urban-scene images.
929	State-Aware Tracker for Real-Time Video Object Segmentation	Xi Chen; Zuoxin Li; Ye Yuan; Gang Yu; Jianxin Shen; Donglian Qi;	In this work, we address the task of semi-supervised video object segmentation (VOS) and explore how to make efficient use of video property to tackle the challenge of semi-supervision.
930	Iteratively-Refined Interactive 3D Medical Image Segmentation With Multi-Agent Reinforcement Learning	Xuan Liao; Wenhao Li; Qisen Xu; Xiangfeng Wang; Bo Jin; Xiaoyun Zhang; Yanfeng Wang; Ya Zhang;	We here propose to model the dynamic process of iterative interactive image segmentation as a Markov decision process (MDP) and solve it with reinforcement learning (RL).
931	ENSEI: Efficient Secure Inference via Frequency-Domain Homomorphic Convolution for Privacy-Preserving Visual Recognition	Song Bian; Tianchen Wang; Masayuki Hiromoto; Yiyu Shi; Takashi Sato;	In this work, we propose ENSEI, a secure inference (SI) framework based on the frequency-domain secure convolution (FDSC) protocol for the efficient execution of image inference in the encrypted domain.
932	Multi-Scale Interactive Network for Salient Object Detection	Youwei Pang; Xiaoqi Zhao; Lihe Zhang; Huchuan Lu;	In this paper, we propose aggregate interaction modules to integrate the features from adjacent levels, which introduce less noise because only small up-/down-sampling rates are used.
933	Interactive Multi-Label CNN Learning With Partial Labels	Dat Huynh; Ehsan Elhamifar;	We introduce a new loss function that regularizes the cross-entropy loss with a cost function that measures the smoothness of labels and features of images on the data manifold.
934	ViewAL: Active Learning With Viewpoint Entropy for Semantic Segmentation	Yawar Siddiqui; Julien Valentin; Matthias Niessner;	We propose ViewAL, a novel active learning strategy for semantic segmentation that exploits viewpoint consistency in multi-view datasets.
935	Scene-Adaptive Video Frame Interpolation via Meta-Learning	Myungsub Choi; Janghoon Choi; Sungyong Baik; Tae Hyun Kim; Kyoung Mu Lee;	In this work, we propose to adapt the model to each video by making use of additional information that is readily available at test time and yet has not been exploited in previous works.
936	Action Segmentation With Joint Self-Supervised Temporal Domain Adaptation	Min-Hung Chen; Baopu Li; Yingze Bao; Ghassan AlRegib; Zsolt Kira;	To reduce the discrepancy, we propose Self-Supervised Temporal Domain Adaptation (SSTDA), which contains two self-supervised auxiliary tasks (binary and sequential domain prediction) to jointly align cross-domain feature spaces embedded with local and global temporal dynamics, achieving better performance than other Domain Adaptation (DA) approaches.
937	Pixel Consensus Voting for Panoptic Segmentation	Haochen Wang; Ruotian Luo; Michael Maire; Greg Shakhnarovich;	The core of our approach, Pixel Consensus Voting, is a framework for instance segmentation based on the generalized Hough transform.
938	Minimizing Discrete Total Curvature for Image Processing	Qiuxiang Zhong; Yutong Li; Yijie Yang; Yuping Duan;	In this paper, we propose a novel curvature regularity, the total curvature (TC), by minimizing the normal curvatures along different directions.
939	Towards Robust Image Classification Using Sequential Attention Models	Daniel Zoran; Mike Chrzanowski; Po-Sen Huang; Sven Gowal; Alex Mott; Pushmeet Kohli;	In this paper we propose to augment a modern neural-network architecture with an attention model inspired by human perception.
940	Discovering Synchronized Subsets of Sequences: A Large Scale Solution	Evangelos Sariyanidi; Casey J. Zampella; Keith G. Bartley; John D. Herrington; Theodore D. Satterthwaite; Robert T. Schultz; Birkan Tunc;	We present an approximate, but highly efficient and scalable, method that represents the search space as a union of sets called epsilon-expanded clusters, one of which is theoretically guaranteed to contain the largest subset of synchronized sequences.
941	Going Deeper With Lean Point Networks	Eric-Tuan Le; Iasonas Kokkinos; Niloy J. Mitra;	In this work we introduce Lean Point Networks (LPNs) to train deeper and more accurate point processing networks by relying on three novel point processing blocks that improve memory consumption, inference time, and accuracy: a convolution-type block for point sets that blends neighborhood information in a memory-efficient manner; a crosslink block that efficiently shares information across low- and high-resolution processing branches; and a multi-resolution point cloud processing block for faster diffusion of information.
942	Efficient and Robust Shape Correspondence via Sparsity-Enforced Quadratic Assignment	Rui Xiang; Rongjie Lai; Hongkai Zhao;	In this work, we introduce a novel local pairwise descriptor and then develop a simple, effective iterative method to solve the resulting quadratic assignment through sparsity control for shape correspondence between two approximately isometric surfaces.
943	Explainable Object-Induced Action Decision for Autonomous Vehicles	Yiran Xu; Xiaoyin Yang; Lihang Gong; Hsuan-Chu Lin; Tz-Ying Wu; Yunsheng Li; Nuno Vasconcelos;	A new paradigm is proposed for autonomous driving. The new paradigm lies between the end-to-end and pipelined approaches, and is inspired by how humans solve the problem.
944	Spatially Attentive Output Layer for Image Classification	Ildoo Kim; Woonhyuk Baek; Sungwoong Kim;	In this paper, we propose a novel spatial output layer on top of the existing convolutional feature maps to explicitly exploit the location-specific output information.
945	Attack to Explain Deep Representation	Mohammad A. A. K. Jalwana; Naveed Akhtar; Mohammed Bennamoun; Ajmal Mian;	This paper counter-argues and proposes the first attack on deep learning that aims at explaining the learned representation instead of fooling it.
946	Computing Valid P-Values for Image Segmentation by Selective Inference	Kosuke Tanizaki; Noriaki Hashimoto; Yu Inatsu; Hidekata Hontani; Ichiro Takeuchi;	To overcome this difficulty, we introduce a statistical approach called selective inference, and develop a framework for computing valid p-values in which segmentation bias is properly accounted for.
947	Unsupervised Learning From Video With Deep Neural Embeddings	Chengxu Zhuang; Tianwei She; Alex Andonian; Max Sobol Mark; Daniel Yamins;	Here we present the Video Instance Embedding (VIE) framework, which trains deep nonlinear embeddings on video sequence inputs.
948	Partial Weight Adaptation for Robust DNN Inference	Xiufeng Xie; Kyu-Han Kim;	We present GearNN, an adaptive inference architecture that accommodates DNN inputs with varying distortions.
949	Probability Weighted Compact Feature for Domain Adaptive Retrieval	Fuxiang Huang; Lei Zhang; Yang Yang; Xichuan Zhou;	In this paper, considering practical applications, we focus on the challenging problem of cross-domain retrieval.
950	Where Does It End? – Reasoning About Hidden Surfaces by Object Intersection Constraints	Michael Strecke; Jorg Stuckler;	In this paper we propose Co-Section, an optimization-based approach to 3D dynamic scene reconstruction, which infers hidden shape information from intersection constraints.
951	PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation	Yang Zhang; Zixiang Zhou; Philip David; Xiangyu Yue; Zerong Xi; Boqing Gong; Hassan Foroosh;	The combination of the aforementioned challenges motivates us to propose a new LiDAR-specific, KNN-free segmentation algorithm – PolarNet.
952	Pathological Retinal Region Segmentation From OCT Images Using Geometric Relation Based Augmentation	Dwarikanath Mahapatra; Behzad Bozorgtabar; Ling Shao;	We propose improvements over previous GAN-based medical image synthesis methods by jointly encoding the intrinsic relationship of geometry and shape.
953	Transferring and Regularizing Prediction for Semantic Segmentation	Yiheng Zhang; Zhaofan Qiu; Ting Yao; Chong-Wah Ngo; Dong Liu; Tao Mei;	In this paper, we exploit the intrinsic properties of semantic segmentation in a novel way to alleviate this problem for model transfer.
954	PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition	Kun Su; Xiulong Liu; Eli Shlizerman;	We propose a novel system for unsupervised skeleton-based action recognition.
955	Model Adaptation: Unsupervised Domain Adaptation Without Source Data	Rui Li; Qianfen Jiao; Wenming Cao; Hau-San Wong; Si Wu;	In this paper, we investigate a challenging unsupervised domain adaptation setting: unsupervised model adaptation.
956	Evade Deep Image Retrieval by Stashing Private Images in the Hash Space	Yanru Xiao; Cong Wang; Xing Gao;	In this paper, we propose a new mechanism based on adversarial examples to "stash" private images in the deep hash space while maintaining perceptual similarity.
957	Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules	Jinkyu Kim; Suhong Moon; Anna Rohrbach; Trevor Darrell; John Canny;	We propose a new approach that learns vehicle control with the help of human advice.
958	ProAlignNet: Unsupervised Learning for Progressively Aligning Noisy Contours	VSR Veeravasarapu; Abhishek Goel; Deepak Mittal; Maneesh Singh;	This work presents a novel ConvNet, "ProAlignNet," that accounts for large-scale misalignments and complex transformations between the contour shapes.
959	Attribution in Scale and Space	Shawn Xu; Subhashini Venugopalan; Mukund Sundararajan;	We propose a new technique called Blur Integrated Gradients (Blur IG).
960	Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing	Vedika Agarwal; Rakshith Shetty; Mario Fritz;	In this paper, we propose a novel way to analyze and measure the robustness of state-of-the-art models with respect to semantic visual variations, and we propose ways to make models more robust against spurious correlations.
961	Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection	Shi-Xue Zhang; Xiaobin Zhu; Jie-Bo Hou; Chang Liu; Chun Yang; Hongfa Wang; Xu-Cheng Yin;	In this paper, we propose a novel unified relational reasoning graph network for arbitrary shape text detection.
962	Large-Scale Object Detection in the Wild From Imbalanced Multi-Labels	Junran Peng; Xingyuan Bu; Ming Sun; Zhaoxiang Zhang; Tieniu Tan; Junjie Yan;	In this work, we quantitatively analyze these label problems and provide a simple but effective solution.
963	BBN: Bilateral-Branch Network With Cumulative Learning for Long-Tailed Visual Recognition	Boyan Zhou; Quan Cui; Xiu-Shen Wei; Zhao-Min Chen;	Therefore, we propose a unified Bilateral-Branch Network (BBN) to take care of both representation learning and classifier learning simultaneously, where each branch performs its own duty separately.
964	Momentum Contrast for Unsupervised Visual Representation Learning	Kaiming He; Haoqi Fan; Yuxin Wu; Saining Xie; Ross Girshick;	We present Momentum Contrast (MoCo) for unsupervised visual representation learning.
965	Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation	Gedas Bertasius; Lorenzo Torresani;	We introduce a method for simultaneously classifying, segmenting and tracking object instances in a video sequence.
966	Weakly Supervised Fine-Grained Image Classification via Guassian Mixture Model Oriented Discriminative Learning	Zhihui Wang; Shijie Wang; Shuhui Yang; Haojie Li; Jianjun Li; Zezhou Li;	In this paper, we propose an end-to-end Discriminative Feature-oriented Gaussian Mixture Model (DF-GMM), to address the problem of discriminative region diffusion and find better fine-grained details.
967	Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection	Shifeng Zhang; Cheng Chi; Yongqiang Yao; Zhen Lei; Stan Z. Li;	In this paper, we first point out that the essential difference between anchor-based and anchor-free detection is actually how to define positive and negative training samples, which leads to the performance gap between them.
968	Learning User Representations for Open Vocabulary Image Hashtag Prediction	Thibaut Durand;	In this paper, we introduce an open vocabulary model for image hashtag prediction – the task of mapping an image to its accompanying hashtags.
969	Sketch Less for More: On-the-Fly Fine-Grained Sketch-Based Image Retrieval	Ayan Kumar Bhunia; Yongxin Yang; Timothy M. Hospedales; Tao Xiang; Yi-Zhe Song;	In this paper, we reformulate the conventional FG-SBIR framework to tackle these challenges, with the ultimate goal of retrieving the target photo with the least number of strokes possible.
970	Few-Shot Pill Recognition	Suiyi Ling; Andreas Pastor; Jing Li; Zhaohui Che; Junle Wang; Jieun Kim; Patrick Le Callet;	In this study, a new pill image database, namely CURE, is first developed with more varied imaging conditions and instances for each pill category. Secondly, a W2-net is proposed for better pill segmentation. Thirdly, a Multi-Stream (MS) deep network that captures task-related features along with a novel two-stage training methodology are proposed.
971	PointRend: Image Segmentation As Rendering	Alexander Kirillov; Yuxin Wu; Kaiming He; Ross Girshick;	We present a new method for efficient high-quality image segmentation of objects and scenes.
972	ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network	Yuliang Liu; Hao Chen; Chunhua Shen; Tong He; Lianwen Jin; Liangwei Wang;	Our contributions are three-fold: 1) For the first time, we adaptively fit oriented or curved text by a parameterized Bezier curve. 2) We design a novel BezierAlign layer for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods.
973	Learning Temporal Co-Attention Models for Unsupervised Video Action Localization	Guoqiang Gong; Xinghan Wang; Yadong Mu; Qi Tian;	To solve ACL, we propose a two-step "clustering + localization" iterative procedure.
974	Spatiotemporal Fusion in 3D CNNs: A Probabilistic View	Yizhou Zhou; Xiaoyan Sun; Chong Luo; Zheng-Jun Zha; Wenjun Zeng;	In this paper, we propose to convert the spatiotemporal fusion strategies into a probability space, which allows us to perform network-level evaluations of various fusion strategies without having to train them separately.
975	Uncertainty-Aware Score Distribution Learning for Action Quality Assessment	Yansong Tang; Zanlin Ni; Jiahuan Zhou; Danyang Zhang; Jiwen Lu; Ying Wu; Jie Zhou;	To address this issue, we propose an uncertainty-aware score distribution learning (USDL) approach for action quality assessment (AQA).
976	Learning Interactions and Relationships Between Movie Characters	Anna Kukleva; Makarand Tapaswi; Ivan Laptev;	In this work, we propose neural models to learn and jointly predict interactions, relationships, and the pair of characters that are involved.
977	Video Panoptic Segmentation	Dahun Kim; Sanghyun Woo; Joon-Young Lee; In So Kweon;	In this paper, we propose and explore a new video extension of the panoptic segmentation task, called video panoptic segmentation.
978	Understanding Human Hands in Contact at Internet Scale	Dandan Shan; Jiaqi Geng; Michelle Shu; David F. Fouhey;	This paper proposes steps towards this by inferring a rich representation of hands engaged in interaction that includes: hand location, side, contact state, and a box around the object in contact.
979	End-to-End Learning of Visual Representations From Uncurated Instructional Videos	Antoine Miech; Jean-Baptiste Alayrac; Lucas Smaira; Ivan Laptev; Josef Sivic; Andrew Zisserman;	In this work we propose a new learning approach, MIL-NCE, capable of addressing misalignments inherent in narrated videos.
980	You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions	Evonne Ng; Donglai Xiang; Hanbyul Joo; Kristen Grauman;	We propose a learning-based approach to estimate the camera wearer’s 3D body pose from egocentric video sequences.
981	Learning a Weakly-Supervised Video Actor-Action Segmentation Model With a Wise Selection	Jie Chen; Zhiheng Li; Jiebo Luo; Chenliang Xu;	To overcome these challenges, we propose a general Weakly-Supervised framework with a Wise Selection of training samples and model evaluation criterion (WS^2).
982	Learning to Measure the Static Friction Coefficient in Cloth Contact	Abdullah Haroon Rasheed; Victor Romero; Florence Bertails-Descoubes; Stefanie Wuhrer; Jean-Sebastien Franco; Arnaud Lazarus;	We propose the first vision-based measurement network for friction between cloth and a substrate, using a simple and repeatable video acquisition protocol.
983	SpeedNet: Learning the Speediness in Videos	Sagie Benaim; Ariel Ephrat; Oran Lang; Inbar Mosseri; William T. Freeman; Michael Rubinstein; Michal Irani; Tali Dekel;	The core component in our approach is SpeedNet–a novel deep network trained to detect if a video is playing at normal rate, or if it is sped up.
984	Telling Left From Right: Learning Spatial Correspondence of Sight and Sound	Karren Yang; Bryan Russell; Justin Salamon;	We propose a novel self-supervised task to leverage an orthogonal principle: matching spatial information in the audio stream to the positions of sound sources in the visual stream.
985	Visual-Textual Capsule Routing for Text-Based Video Segmentation	Bruce McIntosh; Kevin Duarte; Yogesh S Rawat; Mubarak Shah;	In this work, we focus on integration of video and text for the task of actor and action video segmentation from a sentence.
986	Graph-Structured Referring Expression Reasoning in the Wild	Sibei Yang; Guanbin Li; Yizhou Yu;	In this paper, we propose a scene graph guided modular network (SGMN), which performs reasoning over a semantic graph and a scene graph with neural modules under the guidance of the linguistic structure of the expression.
987	Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs	Shizhe Chen; Qin Jin; Peng Wang; Qi Wu;	In this work, we propose the Abstract Scene Graph (ASG) structure to represent user intention at a fine-grained level and to control what the generated description covers and how detailed it is.
988	Hierarchical Conditional Relation Networks for Video Question Answering	Thao Minh Le; Vuong Le; Svetha Venkatesh; Truyen Tran;	We introduce a general-purpose reusable neural unit called Conditional Relation Network (CRN) that serves as a building block to construct more sophisticated structures for representation and reasoning over video.
989	REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments	Yuankai Qi; Qi Wu; Peter Anderson; Xin Wang; William Yang Wang; Chunhua Shen; Anton van den Hengel;	In the hope that it might drive progress towards more flexible and powerful human interactions with robots, we propose a dataset of varied and complex robot tasks, described in natural language, in terms of objects visible in a large set of real images.
990	Iterative Answer Prediction With Pointer-Augmented Multimodal Transformers for TextVQA	Ronghang Hu; Amanpreet Singh; Trevor Darrell; Marcus Rohrbach;	In this work, we propose a novel model for the TextVQA task based on a multimodal transformer architecture accompanied by a rich representation for text in images.
991	SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions	Ramprasaath R. Selvaraju; Purva Tendulkar; Devi Parikh; Eric Horvitz; Marco Tulio Ribeiro; Besmira Nushi; Ece Kamar;	To address this shortcoming, we propose an approach called Sub-Question Importance-aware Network Tuning (SQuINT), which encourages the model to attend to the same parts of the image when answering the reasoning question and the perception sub-question.
992	Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks	Fengda Zhu; Yi Zhu; Xiaojun Chang; Xiaodan Liang;	In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks to exploit the additional training signals derived from this semantic information.
993	Sign Language Transformers: Joint End-to-End Sign Language Recognition and Translation	Necati Cihan Camgoz; Oscar Koller; Simon Hadfield; Richard Bowden;	We introduce a novel transformer-based architecture that jointly learns Continuous Sign Language Recognition and Translation while being trainable in an end-to-end manner.
994	Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation	Gen Luo; Yiyi Zhou; Xiaoshuai Sun; Liujuan Cao; Chenglin Wu; Cheng Deng; Rongrong Ji;	In this paper, we propose a novel Multi-task Collaborative Network (MCN) to achieve a joint learning of REC and RES for the first time.
995	Counterfactual Vision and Language Learning	Ehsan Abbasnejad; Damien Teney; Amin Parvaneh; Javen Shi; Anton van den Hengel;	We propose a method that addresses this problem by introducing counterfactuals in the training.
996	Iterative Context-Aware Graph Inference for Visual Dialog	Dan Guo; Hui Wang; Hanwang Zhang; Zheng-Jun Zha; Meng Wang;	To this end, we propose a novel Context-Aware Graph (CAG) neural network.
997	TA-Student VQA: Multi-Agents Training by Self-Questioning	Peixi Xiong; Ying Wu;	We introduce our self-questioning model with multi-agent training: TA-student VQA.
998	Exploring Self-Attention for Image Recognition	Hengshuang Zhao; Jiaya Jia; Vladlen Koltun;	We explore variations of self-attention and assess their effectiveness for image recognition.
999	Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension	Zhenfang Chen; Peng Wang; Lin Ma; Kwan-Yee K. Wong; Qi Wu;	To bridge the gap, we propose a new dataset for visual reasoning in context of referring expression comprehension with two main features.
1000	Improving Convolutional Networks With Self-Calibrated Convolutions	Jiang-Jiang Liu; Qibin Hou; Ming-Ming Cheng; Changhu Wang; Jiashi Feng;	In this paper, we consider how to improve the basic convolutional feature transformation process of CNNs without tuning the model architectures.
1001	Modality Shifting Attention Network for Multi-Modal Video Question Answering	Junyeong Kim; Minuk Ma; Trung Pham; Kyungsu Kim; Chang D. Yoo;	This paper considers a network referred to as Modality Shifting Attention Network (MSAN) for Multimodal Video Question Answering (MVQA) task.
1002	Learning to Structure an Image With Few Colors	Yunzhong Hou; Liang Zheng; Stephen Gould;	To this end, we propose a color quantization network, ColorCNN, which learns to structure the images from the classification loss in an end-to-end manner.
1003	On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering	Xinyu Wang; Yuliang Liu; Chunhua Shen; Chun Chet Ng; Canjie Luo; Lianwen Jin; Chee Seng Chan; Anton van den Hengel; Liangwei Wang;	We present a dataset that takes a step towards addressing this problem in that it contains questions expressed in two languages, and an evaluation process that co-opts a well understood image-based metric to reflect the method’s ability to reason.
1004	From Paris to Berlin: Discovering Fashion Style Influences Around the World	Ziad Al-Halah; Kristen Grauman;	We introduce an approach that detects which cities influence which other cities in terms of propagating their styles.
1005	A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation	Anyi Rao; Linning Xu; Yu Xiong; Guodong Xu; Qingqiu Huang; Bolei Zhou; Dahua Lin;	Towards this goal, we scale up the scene segmentation task by building a large-scale video dataset MovieScenes, which contains 21K annotated scene segments from 150 movies. We further propose a local-to-global scene segmentation framework, which integrates multi-modal information across three levels, i.e., clip, segment, and movie.
1006	G-TAD: Sub-Graph Localization for Temporal Action Detection	Mengmeng Xu; Chen Zhao; David S. Rojas; Ali Thabet; Bernard Ghanem;	In this work, we propose a graph convolutional network (GCN) model to adaptively incorporate multi-level semantic context into video features and cast temporal action detection as a sub-graph localization problem.
1007	Detailed 2D-3D Joint Representation for Human-Object Interaction	Yong-Lu Li; Xinpeng Liu; Han Lu; Shiyi Wang; Junqi Liu; Jiefeng Li; Cewu Lu;	In light of these, we propose a detailed 2D-3D joint representation learning method. To better evaluate the 2D ambiguity processing capacity of models, we propose a new benchmark named Ambiguous-HOI consisting of hard ambiguous images.
1008	One-Shot Adversarial Attacks on Visual Tracking With Dual Attention	Xuesong Chen; Xiyu Yan; Feng Zheng; Yong Jiang; Shu-Tao Xia; Yong Zhao; Rongrong Ji;	In this paper, we propose a novel one-shot adversarial attack method to generate adversarial examples for model-free single object tracking, where merely adding slight perturbations to the target patch in the initial frame causes state-of-the-art trackers to lose the target in subsequent frames.
1009	Rethinking Classification and Localization for Object Detection	Yue Wu; Yinpeng Chen; Lu Yuan; Zicheng Liu; Lijuan Wang; Hongzhi Li; Yun Fu;	Based upon these findings, we propose a Double-Head method, which has a fully connected head focusing on classification and a convolution head for bounding box regression.
1010	Correspondence Networks With Adaptive Neighbourhood Consensus	Shuda Li; Kai Han; Theo W. Costain; Henry Howard-Jenkins; Victor Prisacariu;	In this paper, we tackle the task of establishing dense visual correspondences between images containing objects of the same category.
1011	Multiple Anchor Learning for Visual Object Detection	Wei Ke; Tianliang Zhang; Zeyi Huang; Qixiang Ye; Jianzhuang Liu; Dong Huang;	In this paper, we propose a Multiple Instance Learning (MIL) approach that selects anchors and jointly optimizes the two modules of a CNN-based object detector.
1012	PhraseCut: Language-Based Image Segmentation in the Wild	Chenyun Wu; Zhe Lin; Scott Cohen; Trung Bui; Subhransu Maji;	We consider the problem of segmenting image regions given a natural language phrase, and study it on a novel dataset of 77,262 images and 345,486 phrase-region pairs.
1013	Mask Encoding for Single Shot Instance Segmentation	Rufeng Zhang; Zhi Tian; Chunhua Shen; Mingyu You; Youliang Yan;	In this work, we propose a simple single-shot instance segmentation framework, termed mask encoding based instance segmentation (MEInst).
1014	Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs	Jingwei Ji; Ranjay Krishna; Li Fei-Fei; Juan Carlos Niebles;	Inspired by evidence that the prototypical unit of an event is an action-object interaction, we introduce Action Genome, a representation that decomposes actions into spatio-temporal scene graphs.
1015	Learning Unseen Concepts via Hierarchical Decomposition and Composition	Muli Yang; Cheng Deng; Junchi Yan; Xianglong Liu; Dacheng Tao;	We propose to learn unseen concepts in a hierarchical decomposition-and-composition manner.
1016	Hi-CMD: Hierarchical Cross-Modality Disentanglement for Visible-Infrared Person Re-Identification	Seokeon Choi; Sumin Lee; Youngeun Kim; Taekyung Kim; Changick Kim;	To reduce both intra- and cross-modality discrepancies, we propose a Hierarchical Cross-Modality Disentanglement (Hi-CMD) method, which automatically disentangles ID-discriminative factors and ID-excluded factors from visible-thermal images.
1017	In Defense of Grid Features for Visual Question Answering	Huaizu Jiang; Ishan Misra; Marcus Rohrbach; Erik Learned-Miller; Xinlei Chen;	In this paper, we revisit grid features for VQA and find that they can work surprisingly well: they run more than an order of magnitude faster with the same accuracy (e.g., if pre-trained in a similar fashion).
1018	Multi-Mutual Consistency Induced Transfer Subspace Learning for Human Motion Segmentation	Tao Zhou; Huazhu Fu; Chen Gong; Jianbing Shen; Ling Shao; Fatih Porikli;	To this end, we propose a novel multi-mutual consistency induced transfer subspace learning framework for human motion segmentation.
1019	Dense Regression Network for Video Grounding	Runhao Zeng; Haoming Xu; Wenbing Huang; Peihao Chen; Mingkui Tan; Chuang Gan;	The key idea of this paper is to use the distances between each frame within the ground truth and the starting (ending) frame as dense supervision to improve video grounding accuracy.
1020	Neural Architecture Search for Lightweight Non-Local Networks	Yingwei Li; Xiaojie Jin; Jieru Mei; Xiaochen Lian; Linjie Yang; Cihang Xie; Qihang Yu; Yuyin Zhou; Song Bai; Alan L. Yuille;	We propose AutoNL to overcome the above two obstacles.
1021	Learning Saliency Propagation for Semi-Supervised Instance Segmentation	Yanzhao Zhou; Xin Wang; Jianbin Jiao; Trevor Darrell; Fisher Yu;	We propose ShapeProp, which learns to activate the salient regions within the object detection and propagate the areas to the whole instance through an iterative learnable message passing module.
1022	Speech2Action: Cross-Modal Supervision for Action Recognition	Arsha Nagrani; Chen Sun; David Ross; Rahul Sukthankar; Cordelia Schmid; Andrew Zisserman;	In this work we investigate the link between spoken words and actions in movies.
1023	Normalized and Geometry-Aware Self-Attention Network for Image Captioning	Longteng Guo; Jing Liu; Xinxin Zhu; Peng Yao; Shichen Lu; Hanqing Lu;	In this paper, we improve self-attention (SA) in two respects to promote the performance of image captioning.
1024	Memory Enhanced Global-Local Aggregation for Video Object Detection	Yihong Chen; Yue Cao; Han Hu; Liwei Wang;	In this paper we introduce the memory enhanced global-local aggregation (MEGA) network, which is among the first attempts to take full account of both global and local information.
1025	Solving Mixed-Modal Jigsaw Puzzle for Fine-Grained Sketch-Based Image Retrieval	Kaiyue Pang; Yongxin Yang; Timothy M. Hospedales; Tao Xiang; Yi-Zhe Song;	In this paper, we propose a self-supervised alternative for representation pre-training.
1026	LG-GAN: Label Guided Adversarial Network for Flexible Targeted Attack of Point Cloud Based Deep Networks	Hang Zhou; Dongdong Chen; Jing Liao; Kejiang Chen; Xiaoyi Dong; Kunlin Liu; Weiming Zhang; Gang Hua; Nenghai Yu;	To overcome these shortcomings, this paper proposes a novel label guided adversarial network (LG-GAN) for real-time flexible targeted point cloud attack.
1027	Memory Aggregation Networks for Efficient Interactive Video Object Segmentation	Jiaxu Miao; Yunchao Wei; Yi Yang;	In this work, we propose a unified framework, named Memory Aggregation Networks (MA-Net), to address the challenging iVOS in a more efficient way.
1028	VQA With No Questions-Answers Training	Ben-Zion Vatashsky; Shimon Ullman;	We propose a novel method that consists of two main parts: generating a question graph representation, and an answering procedure, guided by the abstract structure of the question graph to invoke an extendable set of visual estimators.
1029	Counting Out Time: Class Agnostic Video Repetition Counting in the Wild	Debidatta Dwibedi; Yusuf Aytar; Jonathan Tompson; Pierre Sermanet; Andrew Zisserman;	We present an approach for estimating the period with which an action is repeated in a video.
1030	SaccadeNet: A Fast and Accurate Object Detector	Shiyi Lan; Zhou Ren; Yi Wu; Larry S. Davis; Gang Hua;	In this paper, inspired by such a mechanism, we propose a fast and accurate object detector called SaccadeNet.
1031	Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-Based Person Re-Identification	Zhizheng Zhang; Cuiling Lan; Wenjun Zeng; Zhibo Chen;	In this paper, we propose an attentive feature aggregation module, namely Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA), to delicately aggregate spatio-temporal features into a discriminative video-level feature representation.
1032	Video Object Grounding Using Semantic Roles in Language Description	Arka Sadhu; Kan Chen; Ram Nevatia;	Here, we investigate the role of object relations in VOG and propose a novel framework VOGNet to encode multi-modal object relations via self-attention with relative position encoding.
1033	Designing Network Design Spaces	Ilija Radosavovic; Raj Prateek Kosaraju; Ross Girshick; Kaiming He; Piotr Dollar;	In this work, we present a new network design paradigm.
1034	12-in-1: Multi-Task Vision and Language Representation Learning	Jiasen Lu; Vedanuj Goswami; Marcus Rohrbach; Devi Parikh; Stefan Lee;	In this work, we investigate these relationships between vision-and-language tasks by developing a large-scale, multi-task model.
1035	MLCVNet: Multi-Level Context VoteNet for 3D Object Detection	Qian Xie; Yu-Kun Lai; Jing Wu; Zhoutao Wang; Yiming Zhang; Kai Xu; Jun Wang;	In this paper, we address the 3D object detection task by capturing multi-level contextual information with the self-attention mechanism and multi-scale feature fusion.
1036	Listen to Look: Action Recognition by Previewing Audio	Ruohan Gao; Tae-Hyun Oh; Kristen Grauman; Lorenzo Torresani;	We propose a framework for efficient action recognition in untrimmed video that uses audio as a preview mechanism to eliminate both short-term and long-term visual redundancies.
1037	Attention Convolutional Binary Neural Tree for Fine-Grained Visual Categorization	Ruyi Ji; Longyin Wen; Libo Zhang; Dawei Du; Yanjun Wu; Chen Zhao; Xianglong Liu; Feiyue Huang;	Specifically, we incorporate convolutional operations along edges of the tree structure, and use the routing functions in each node to determine the root-to-leaf computational paths within the tree.
1038	Music Gesture for Visual Sound Separation	Chuang Gan; Deng Huang; Hang Zhao; Joshua B. Tenenbaum; Antonio Torralba;	To address this, we propose "Music Gesture," a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
1039	Referring Image Segmentation via Cross-Modal Progressive Comprehension	Shaofei Huang; Tianrui Hui; Si Liu; Guanbin Li; Yunchao Wei; Jizhong Han; Luoqi Liu; Bo Li;	In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address the challenging task.
1040	Cloth in the Wind: A Case Study of Physical Measurement Through Simulation	Tom F. H. Runia; Kirill Gavrilyuk; Cees G. M. Snoek; Arnold W. M. Smeulders;	In this paper, we propose to measure latent physical properties for cloth in the wind without ever having seen a real example before.
1041	The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction	Junwei Liang; Lu Jiang; Kevin Murphy; Ting Yu; Alexander Hauptmann;	This paper studies the problem of predicting the distribution over multiple possible future paths of people as they move through various visual scenes.
1042	CentripetalNet: Pursuing High-Quality Keypoint Pairs for Object Detection	Zhiwei Dong; Guoxuan Li; Yue Liao; Fei Wang; Pengju Ren; Chen Qian;	In this paper, we propose CentripetalNet which uses centripetal shift to pair corner keypoints from the same instance.
1043	PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection	Shaoshuai Shi; Chaoxu Guo; Li Jiang; Zhe Wang; Jianping Shi; Xiaogang Wang; Hongsheng Li;	We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN), for accurate 3D object detection from point clouds.
1044	Graph Embedded Pose Clustering for Anomaly Detection	Amir Markovitz; Gilad Sharir; Itamar Friedman; Lihi Zelnik-Manor; Shai Avidan;	We propose a new method for anomaly detection of human actions.
1045	Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation	Jiaming Sun; Linghao Chen; Yiming Xie; Siyu Zhang; Qinhong Jiang; Xiaowei Zhou; Hujun Bao;	In this paper, we propose a novel system named Disp R-CNN for 3D object detection from stereo images.
1046	Deepstrip: High-Resolution Boundary Refinement	Peng Zhou; Brian Price; Scott Cohen; Gregg Wilensky; Larry S. Davis;	In this paper, we target the refinement of boundaries in high-resolution images given low-resolution masks.
1047	Smoothing Adversarial Domain Attack and P-Memory Reconsolidation for Cross-Domain Person Re-Identification	Guangcong Wang; Jian-Huang Lai; Wenqi Liang; Guangrun Wang;	To reduce the gap between the source and target domains, we propose a Smoothing Adversarial Domain Attack (SADA) approach that guides the source domain images to align with the target domain images by using a trained camera classifier.
1048	Meshed-Memory Transformer for Image Captioning	Marcella Cornia; Matteo Stefanini; Lorenzo Baraldi; Rita Cucchiara;	With the aim of filling this gap, we present M2 – a Meshed Transformer with Memory for Image Captioning.
1049	Learning From Noisy Anchors for One-Stage Object Detection	Hengduo Li; Zuxuan Wu; Chen Zhu; Caiming Xiong; Richard Socher; Larry S. Davis;	In this paper, we propose to mitigate noise incurred by imperfect label assignment such that the contributions of anchors are dynamically determined by a carefully constructed cleanliness score associated with each anchor.
1050	Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection	Zhongzheng Ren; Zhiding Yu; Xiaodong Yang; Ming-Yu Liu; Yong Jae Lee; Alexander G. Schwing; Jan Kautz;	To target these issues we develop an instance-aware and context-focused unified framework.
1051	Density-Based Clustering for 3D Object Detection in Point Clouds	Syeda Mariam Ahmed; Chee Meng Chew;	In this work, we introduce a novel approach for 3D object detection that is significant in two main aspects: a) a cascaded modular approach that focuses the receptive field of each module on specific points in the point cloud, for improved feature learning, and b) a class-agnostic instance segmentation module that is initiated using unsupervised clustering.
1052	Few-Shot Video Classification via Temporal Alignment	Kaidi Cao; Jingwei Ji; Zhangjie Cao; Chien-Yi Chang; Juan Carlos Niebles;	In this paper, we propose the Ordered Temporal Alignment Module (OTAM), a novel few-shot learning framework that can learn to classify a previously unseen video.
1053	Densely Connected Search Space for More Flexible Neural Architecture Search	Jiemin Fang; Yuzhu Sun; Qian Zhang; Yuan Li; Wenyu Liu; Xinggang Wang;	In this paper, we propose to search block counts and block widths by designing a densely connected search space, i.e., DenseNAS.
1054	Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning	Shizhe Chen; Yida Zhao; Qin Jin; Qi Wu;	To improve fine-grained video-text retrieval, we propose a Hierarchical Graph Reasoning (HGR) model, which decomposes video-text matching into global-to-local levels.
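1055	Warp to the Future: Joint Forecasting of Features and Feature Motion	Josip Saric; Marin Orsic; Tonci Antunovic; Sacha Vrazic; Sinisa Segvic;	Since warping observed features cannot explain future scene content that was not visible in the observed frames, we propose to complement F2M (feature-to-motion) forecasting with the classic F2F (feature-to-feature) approach.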
1056	Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio	Zhengsu Chen; Jianwei Niu; Lingxi Xie; Xuefeng Liu; Longhui Wei; Qi Tian;	This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs, so that under each network configuration, one can estimate the FLOPs utilization ratio (FUR) for each layer and use it to determine whether to increase or decrease the number of channels on the layer.
1057	Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences	Zhu Zhang; Zhou Zhao; Yang Zhao; Qi Wang; Huasheng Liu; Lianli Gao;	In this paper, we consider a novel task, Spatio-Temporal Video Grounding for Multi-Form Sentences (STVG).
1058	Cross-Modal Cross-Domain Moment Alignment Network for Person Search	Ya Jing; Wei Wang; Liang Wang; Tieniu Tan;	Specifically, we propose a moment alignment network (MAN) to solve the cross-modal cross-domain person search task.
1059	Self-Training With Noisy Student Improves ImageNet Classification	Qizhe Xie; Minh-Thang Luong; Eduard Hovy; Quoc V. Le;	We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images.
1060	Learning Longterm Representations for Person Re-Identification Using Radio Signals	Lijie Fan; Tianhong Li; Rongyao Fang; Rumen Hristov; Yuan Yuan; Dina Katabi;	In this paper, we introduce RF-ReID, a novel approach that harnesses radio frequency (RF) signals for long-term person ReID.
1061	LatentFusion: End-to-End Differentiable Reconstruction and Rendering for Unseen Object Pose Estimation	Keunhong Park; Arsalan Mousavian; Yu Xiang; Dieter Fox;	We propose a novel framework for 6D pose estimation of unseen objects.
1062	Learning Instance Occlusion for Panoptic Segmentation	Justin Lazarow; Kwonjoon Lee; Kunyu Shi; Zhuowen Tu;	Since ordering instances by detection confidence does not correlate well with natural occlusion relationships, we propose a branch that is tasked with modeling how two instance masks should overlap one another as a binary relation.
1063	Vision-Dialog Navigation by Exploring Cross-Modal Memory	Yi Zhu; Fengda Zhu; Zhaohuan Zhan; Bingqian Lin; Jianbin Jiao; Xiaojun Chang; Xiaodan Liang;	In this paper, we propose the Cross-modal Memory Network (CMN) for remembering and understanding the rich information relevant to historical navigation actions.
1064	ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks	Mohit Shridhar; Jesse Thomason; Daniel Gordon; Yonatan Bisk; Winson Han; Roozbeh Mottaghi; Luke Zettlemoyer; Dieter Fox;	We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks.
1065	NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing	Xin Huang; Zheng Ge; Zequn Jie; Osamu Yoshie;	Because a low IoU threshold in standard NMS misses heavily overlapped pedestrians while a high one admits many false positives, this paper proposes a novel Representative Region NMS (R2NMS) approach that leverages the less occluded visible parts, effectively removing redundant boxes without bringing in many false positives.
1066	Visual Commonsense R-CNN	Tan Wang; Jianqiang Huang; Hanwang Zhang; Qianru Sun;	We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for high-level tasks such as captioning and VQA.
1067	What Deep CNNs Benefit From Global Covariance Pooling: An Optimization Perspective	Qilong Wang; Li Zhang; Banggu Wu; Dongwei Ren; Peihua Li; Wangmeng Zuo; Qinghua Hu;	In this paper, we attempt to understand, from an optimization viewpoint, what deep CNNs gain from global covariance pooling (GCP).
1068	EfficientDet: Scalable and Efficient Object Detection	Mingxing Tan; Ruoming Pang; Quoc V. Le;	In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency.
1069	Fast Template Matching and Update for Video Object Tracking and Segmentation	Mingjie Sun; Jimin Xiao; Eng Gee Lim; Bingfeng Zhang; Yao Zhao;	In this paper, we tackle multi-instance semi-supervised video object segmentation across a sequence of frames, where only first-frame box-level ground truth is provided.
1070	Counterfactual Samples Synthesizing for Robust Visual Question Answering	Long Chen; Xin Yan; Jun Xiao; Hanwang Zhang; Shiliang Pu; Yueting Zhuang;	To make VQA models robust to language biases, we propose a model-agnostic Counterfactual Samples Synthesizing (CSS) training scheme.
1071	Local-Global Video-Text Interactions for Temporal Grounding	Jonghwan Mun; Minsu Cho; Bohyung Han;	We tackle temporal grounding with a novel regression-based model that learns to extract a collection of mid-level features for semantic phrases in a text query, which correspond to important semantic entities described in the query (e.g., actors, objects, and actions), and to reflect bi-modal interactions between the linguistic features of the query and the visual features of the video at multiple levels.
1072	Set-Constrained Viterbi for Set-Supervised Action Segmentation	Jun Li; Sinisa Todorovic;	Our first contribution is the formulation of a new set-constrained Viterbi algorithm (SCV).
1073	Probabilistic Video Prediction From Noisy Data With a Posterior Confidence	Yunbo Wang; Jiajun Wu; Mingsheng Long; Joshua B. Tenenbaum;	In this paper, we tackle video prediction from noisy input data with an end-to-end trainable model named Bayesian Predictive Network (BP-Net).
1074	Beyond Short-Term Snippet: Video Relation Detection With Spatio-Temporal Global Context	Chenchen Liu; Yang Jin; Kehan Xu; Guoqiang Gong; Yadong Mu;	To detect video relations beyond short-term snippets, this work proposes a novel sliding-window scheme that simultaneously predicts short-term and long-term relationships.
1075	Visual Grounding in Video for Unsupervised Word Translation	Gunnar A. Sigurdsson; Jean-Baptiste Alayrac; Aida Nematzadeh; Lucas Smaira; Mateusz Malinowski; Joao Carreira; Phil Blunsom; Andrew Zisserman;	Our goal is to use visual grounding to improve unsupervised word mapping between languages.
1076	Two Causal Principles for Improving Visual Dialog	Jiaxin Qi; Yulei Niu; Jianqiang Huang; Hanwang Zhang;	This paper presents the design tricks adopted by us, the champion team MReaL-BDAI, in the Visual Dialog Challenge 2019: two causal principles for improving Visual Dialog (VisDial).
1077	Spatio-Temporal Graph for Video Captioning With Knowledge Distillation	Boxiao Pan; Haoye Cai; De-An Huang; Kuan-Hui Lee; Adrien Gaidon; Ehsan Adeli; Juan Carlos Niebles;	In this paper, we propose a novel spatio-temporal graph model for video captioning that exploits object interactions in space and time.
1078	A Real-Time Cross-Modality Correlation Filtering Method for Referring Expression Comprehension	Yue Liao; Si Liu; Guanbin Li; Fei Wang; Yanjie Chen; Chen Qian; Bo Li;	To make referring expression comprehension fast enough for real-time use, we propose a novel Real-time Cross-modality Correlation Filtering method (RCCF).
1079	Better Captioning With Sequence-Level Exploration	Jia Chen; Qin Jin;	In this work, we show the limitation of the current sequence-level learning objective for captioning tasks, both theoretically and empirically.
1080	Violin: A Large-Scale Dataset for Video-and-Language Inference	Jingzhou Liu; Wenhu Chen; Yu Cheng; Zhe Gan; Licheng Yu; Yiming Yang; Jingjing Liu;	We introduce a new task, Video-and-Language Inference, for joint multimodal understanding of video and text.
1081	RiFeGAN: Rich Feature Generation for Text-to-Image Synthesis From Prior Knowledge	Jun Cheng; Fuxiang Wu; Yanling Tian; Lei Wang; Dapeng Tao;	Because a textual description carries limited information compared with the corresponding image, we propose RiFeGAN, a novel rich-feature-generating text-to-image synthesis method that enriches the given description.
1082	Graph Structured Network for Image-Text Matching	Chunxiao Liu; Zhendong Mao; Tianzhu Zhang; Hongtao Xie; Bin Wang; Yongdong Zhang;	In this paper, we present a novel Graph Structured Matching Network (GSMN) to learn fine-grained correspondence.
1083	Straight to the Point: Fast-Forwarding Videos via Reinforcement Learning Using Textual Data	Washington Ramos; Michel Silva; Edson Araujo; Leandro Soriano Marcolino; Erickson Nascimento;	In this paper, we present a novel methodology based on a reinforcement learning formulation to accelerate instructional videos.
1084	Multi-Modality Cross Attention Network for Image and Sentence Matching	Xi Wei; Tianzhu Zhang; Yan Li; Yongdong Zhang; Feng Wu;	In this work, we propose a novel Multi-Modality Cross Attention (MMCA) Network for image and sentence matching that jointly models the intra-modality and inter-modality relationships of image regions and sentence words in a unified deep model.
1085	Generalized ODIN: Detecting Out-of-Distribution Image Without Learning From Out-of-Distribution Data	Yen-Chang Hsu; Yilin Shen; Hongxia Jin; Zsolt Kira;	We build on the popular ODIN method, proposing two strategies that free it from the need for tuning with OoD data while improving its OoD detection performance.
1086	Learning Augmentation Network via Influence Functions	Donghoon Lee; Hyunsin Park; Trung Pham; Chang D. Yoo;	This paper considers an influence function that predicts how generalization performance, in terms of validation loss, is affected by a particular augmented training sample.
1087	X-Linear Attention Networks for Image Captioning	Yingwei Pan; Ting Yao; Yehao Li; Tao Mei;	In this paper, we introduce a unified attention block, the X-Linear attention block, that fully employs bilinear pooling to selectively capitalize on visual information or perform multi-modal reasoning.
1088	Unsupervised Person Re-Identification via Multi-Label Classification	Dongkai Wang; Shiliang Zhang;	This paper formulates unsupervised person ReID as a multi-label classification task to progressively seek true labels.
1089	Overcoming Classifier Imbalance for Long-Tail Object Detection With Balanced Group Softmax	Yu Li; Tao Wang; Bingyi Kang; Sheng Tang; Chunfeng Wang; Jintao Li; Jiashi Feng;	In this work, we propose a novel balanced group softmax (BAGS) module for balancing the classifiers within the detection frameworks through group-wise training.
1090	What You See is What You Get: Exploiting Visibility for 3D Object Detection	Peiyun Hu; Jason Ziglar; David Held; Deva Ramanan;	We argue that representing 2.5D data as collections of (x,y,z) points fundamentally destroys hidden information about free space. In this paper, we demonstrate that such knowledge can be efficiently recovered through 3D raycasting and readily incorporated into batch-based gradient learning.
1091	Deep Structure-Revealed Network for Texture Recognition	Wei Zhai; Yang Cao; Zheng-Jun Zha; HaiYong Xie; Feng Wu;	We propose a novel Deep Structure-Revealed Network (DSR-Net) that leverages the spatial dependency among captured primitives as a structural representation for texture recognition.
1092	Online Knowledge Distillation via Collaborative Learning	Qiushan Guo; Xinjiang Wang; Yichao Wu; Zhipeng Yu; Ding Liang; Xiaolin Hu; Ping Luo;	This work presents an efficient yet effective online Knowledge Distillation method via Collaborative Learning, termed KDCL, which is able to consistently improve the generalization ability of deep neural networks (DNNs) that have different learning capacities.
1093	Dynamic Convolution: Attention Over Convolution Kernels	Yinpeng Chen; Xiyang Dai; Mengchen Liu; Dongdong Chen; Lu Yuan; Zicheng Liu;	To address the limited representation capability of light-weight CNNs, whose low computational budgets constrain both depth and width, we present Dynamic Convolution, a new design that increases model complexity without increasing the network depth or width.
1094	3DSSD: Point-Based 3D Single Stage Object Detector	Zetong Yang; Yanan Sun; Shu Liu; Jiaya Jia;	In this paper, we present 3DSSD, a lightweight point-based 3D single stage object detector that achieves a good balance between accuracy and efficiency.
1095	Deep Degradation Prior for Low-Quality Image Classification	Yang Wang; Yang Cao; Zheng-Jun Zha; Jing Zhang; Zhiwei Xiong;	To improve the classification of low-quality images, this paper proposes a novel deep degradation prior.
1096	ViBE: Dressing for Diverse Body Shapes	Wei-Lin Hsiao; Kristen Grauman;	We introduce ViBE, a VIsual Body-aware Embedding that captures clothing’s affinity with different body shapes.
1097	Don’t Judge an Object by Its Context: Learning to Overcome Contextual Bias	Krishna Kumar Singh; Dhruv Mahajan; Kristen Grauman; Yong Jae Lee; Matt Feiszli; Deepti Ghadiyaram;	Our goal is to accurately recognize a category in the absence of its context, without compromising on performance when it co-occurs with context.
1098	SESS: Self-Ensembling Semi-Supervised 3D Object Detection	Na Zhao; Tat-Seng Chua; Gim Hee Lee;	Inspired by the recent success of self-ensembling techniques in semi-supervised image classification, we propose SESS, a self-ensembling semi-supervised 3D object detection framework.
1099	Combining Detection and Tracking for Human Pose Estimation in Videos	Manchen Wang; Joseph Tighe; Davide Modolo;	We propose a novel top-down approach that tackles the problem of multi-person human pose estimation and tracking in videos.
1100	SAPIEN: A SimulAted Part-Based Interactive ENvironment	Fanbo Xiang; Yuzhe Qin; Kaichun Mo; Yikuan Xia; Hao Zhu; Fangchen Liu; Minghua Liu; Hanxiao Jiang; Yifu Yuan; He Wang; Li Yi; Angel X. Chang; Leonidas J. Guibas; Hao Su;	SAPIEN is a simulated part-based interactive environment that enables various robotic vision and interaction tasks requiring detailed part-level understanding.
1101	RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds	Qingyong Hu; Bo Yang; Linhai Xie; Stefano Rosa; Yulan Guo; Zhihua Wang; Niki Trigoni; Andrew Markham;	In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture to directly infer per-point semantics for large-scale point clouds.
1102	SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving	Zhenpei Yang; Yuning Chai; Dragomir Anguelov; Yin Zhou; Pei Sun; Dumitru Erhan; Sean Rafferty; Henrik Kretzschmar;	In this paper, we present a simple yet effective approach to generate realistic sensor data for driving scenarios, based only on a limited amount of lidar and camera data collected by an autonomous vehicle.
1103	A Programmatic and Semantic Approach to Explaining and Debugging Neural Network Based Object Detectors	Edward Kim; Divya Gopinath; Corina Pasareanu; Sanjit A. Seshia;	In this paper, we present a programmatic and semantic approach to explaining, understanding, and debugging the correct and incorrect behaviors of a neural network based perception system.
1104	Predicting Semantic Map Representations From Images Using Pyramid Occupancy Networks	Thomas Roddick; Roberto Cipolla;	In this work we present a simple, unified approach for estimating bird's-eye-view semantic map representations directly from monocular images using a single end-to-end deep learning architecture.
1105	Efficient Derivative Computation for Cumulative B-Splines on Lie Groups	Christiane Sommer; Vladyslav Usenko; David Schubert; Nikolaus Demmel; Daniel Cremers;	In this work we present an alternative derivation of time derivatives based on recurrence relations that needs O(k) instead of O(k^2) matrix operations (for a spline of order k) and results in simple and elegant expressions.
1106	RL-CycleGAN: Reinforcement Learning Aware Simulation-to-Real	Kanishka Rao; Chris Harris; Alex Irpan; Sergey Levine; Julian Ibarz; Mohi Khansari;	In this paper, we introduce the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image.
1107	LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World	Sivabalan Manivasagam; Shenlong Wang; Kelvin Wong; Wenyuan Zeng; Mikita Sazanovich; Shuhan Tan; Bin Yang; Wei-Chiu Ma; Raquel Urtasun;	We tackle the problem of producing realistic simulations of LiDAR point clouds, the preferred sensor for most self-driving vehicles.
1108	Just Go With the Flow: Self-Supervised Scene Flow Estimation	Himangi Mittal; Brian Okorn; David Held;	As an alternative to supervised training, we present a method for training scene flow estimation with two self-supervised losses, based on nearest neighbors and cycle consistency.
1109	TITAN: Future Forecast Using Action Priors	Srikanth Malla; Behzad Dariush; Chiho Choi;	To forecast the future trajectories of agents and future ego-motion, we introduce TITAN (Trajectory Inference using Targeted Action priors Network), a new model that incorporates prior positions, actions, and context.
1110	Robust Learning Through Cross-Task Consistency	Amir R. Zamir; Alexander Sax; Nikhil Cheerla; Rohan Suri; Zhangjie Cao; Jitendra Malik; Leonidas J. Guibas;	We propose a flexible and fully computational framework for learning while enforcing Cross-Task Consistency (X-TAC).
1111	Dynamic Refinement Network for Oriented and Densely Packed Object Detection	Xingjia Pan; Yuqiang Ren; Kekai Sheng; Weiming Dong; Haolei Yuan; Xiaowei Guo; Chongyang Ma; Changsheng Xu;	To address axis-aligned, uniformly shaped receptive fields and the limited ability of generically trained models to handle specific objects at test time, we present a dynamic refinement network that consists of two novel components, i.e., a feature selection module (FSM) and a dynamic refinement head (DRH).
1112	AOWS: Adaptive and Optimal Network Width Search With Latency Constraints	Maxim Berman; Leonid Pishchulin; Ning Xu; Matthew B. Blaschko; Gerard Medioni;	We introduce a novel efficient one-shot NAS approach to optimally search for channel numbers, given latency constraints on specific hardware.
1113	High-Dimensional Convolutional Networks for Geometric Pattern Recognition	Christopher Choy; Junha Lee; Rene Ranftl; Jaesik Park; Vladlen Koltun;	In this work, we present high-dimensional convolutional networks for geometric pattern recognition problems that arise in 2D and 3D registration.
1114	Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks	Saurabh Singh; Shankar Krishnan;	In this paper we propose the Filter Response Normalization (FRN) layer, a novel combination of a normalization and an activation function, that can be used as a replacement for other normalizations and activations.
1115	Deep Iterative Surface Normal Estimation	Jan Eric Lenssen; Christian Osendorfer; Jonathan Masci;	This paper presents an end-to-end differentiable algorithm for robust and detail-preserving surface normal estimation on unstructured point clouds.
1116	Dataless Model Selection With the Deep Frame Potential	Calvin Murdock; Simon Lucey;	Building upon theoretical connections between deep learning and sparse approximation, we propose the deep frame potential: a measure of coherence that is approximately related to representation stability but has minimizers that depend only on network structure.
1117	UNAS: Differentiable Architecture Search Meets Reinforcement Learning	Arash Vahdat; Arun Mallya; Ming-Yu Liu; Jan Kautz;	In this work, we present UNAS, a unified framework for NAS that encapsulates recent differentiable NAS (DNAS) and RL-based approaches.
1118	Local Context Normalization: Revisiting Local Normalization	Anthony Ortiz; Caleb Robinson; Dan Morris; Olac Fuentes; Christopher Kiekintveld; Md Mahmudulla Hassan; Nebojsa Jojic;	We propose Local Context Normalization (LCN), along with an algorithmic solution that makes it efficient for arbitrary window sizes, even if every point in the image has a unique window.
1119	ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning	Weiwei Sun; Wei Jiang; Eduard Trulls; Andrea Tagliasacchi; Kwang Moo Yi;	In this paper, we propose Attentive Context Normalization (ACN), a simple yet effective technique to build permutation-equivariant networks robust to outliers.
1120	Learning Situational Driving	Eshed Ohn-Bar; Aditya Prakash; Aseem Behl; Kashyap Chitta; Andreas Geiger;	Our key idea is to learn a mixture model with a set of policies that can capture multiple driving modes.
1121	From Depth What Can You See? Depth Completion via Auxiliary Image Reconstruction	Kaiyue Lu; Nick Barnes; Saeed Anwar; Liang Zheng;	This paper addresses depth completion, using image reconstruction as an auxiliary task.
1122	Symmetry and Group in Attribute-Object Compositions	Yong-Lu Li; Yue Xu; Xiaohan Mao; Cewu Lu;	Incorporating the symmetry principle, we build SymNet, a transformation framework inspired by group theory.
1123	Noise-Aware Fully Webly Supervised Object Detection	Yunhang Shen; Rongrong Ji; Zhiwei Chen; Xiaopeng Hong; Feng Zheng; Jianzhuang Liu; Mingliang Xu; Qi Tian;	In this work, we propose an end-to-end framework to jointly learn webly supervised detectors and reduce the negative impact of noisy labels.
1124	3D Part Guided Image Editing for Fine-Grained Object Understanding	Zongdai Liu; Feixiang Lu; Peng Wang; Hui Miao; Liangjun Zhang; Ruigang Yang; Bin Zhou;	In this paper, we address the understanding of objects' movable 3D parts, an important missing piece in autonomous driving, by solving two critical issues.
1125	STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction	Zhishuai Zhang; Jiyang Gao; Junhua Mao; Yukai Liu; Dragomir Anguelov; Congcong Li;	In this work, we present a novel end-to-end two-stage network: Spatio-Temporal-Interactive Network (STINet).
1126	Rethinking Performance Estimation in Neural Architecture Search	Xiawu Zheng; Rongrong Ji; Qiang Wang; Qixiang Ye; Zhenguo Li; Yonghong Tian; Qi Tian;	In this paper, we provide a novel yet systematic rethinking of performance estimation (PE) in a resource-constrained regime, termed budgeted PE (BPE), which precisely and effectively estimates the performance of an architecture sampled from an architecture space.
1127	Feature-Metric Registration: A Fast Semi-Supervised Approach for Robust Point Cloud Registration Without Correspondences	Xiaoshui Huang; Guofeng Mei; Jian Zhang;	We present a fast feature-metric point cloud registration framework, which enforces the optimisation of registration by minimising a feature-metric projection error without correspondences.
1128	Learning Multi-View Camera Relocalization With Graph Neural Networks	Fei Xue; Xin Wu; Shaojun Cai; Junqiu Wang;	We propose to construct a view graph that exploits the information of the whole given sequence for absolute camera pose estimation.
1129	MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird’s Eye View Maps	Pengxiang Wu; Siheng Chen; Dimitris N. Metaxas;	In this work, we propose an efficient deep model, called MotionNet, to jointly perform perception and motion prediction from 3D point clouds.
1130	EcoNAS: Finding Proxies for Economical Neural Architecture Search	Dongzhan Zhou; Xinchi Zhou; Wenwei Zhang; Chen Change Loy; Shuai Yi; Xuesen Zhang; Wanli Ouyang;	In this paper, we observe that most existing proxies exhibit different behaviors in maintaining the rank consistency among network candidates.
1131	Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection	Jianyuan Guo; Kai Han; Yunhe Wang; Chao Zhang; Zhaohui Yang; Han Wu; Xinghao Chen; Chang Xu;	To this end, we propose a hierarchical trinity search framework to simultaneously discover efficient architectures for all components (i.e. backbone, neck, and head) of object detector in an end-to-end manner.
1132	Geometrically Principled Connections in Graph Neural Networks	Shunwang Gong; Mehdi Bahri; Michael M. Bronstein; Stefanos Zafeiriou;	In this paper, we argue geometry should remain the primary driving force behind innovation in the emerging field of geometric deep learning.
1133	On Vocabulary Reliance in Scene Text Recognition	Zhaoyi Wan; Jielei Zhang; Liang Zhang; Jiebo Luo; Cong Yao;	In this paper, we establish an analytical framework, in which different datasets, metrics and module combinations for quantitative comparisons are devised, to conduct an in-depth study on the problem of vocabulary reliance in scene text recognition.
1134	Generating Accurate Pseudo-Labels in Semi-Supervised Learning and Avoiding Overconfident Predictions via Hermite Polynomial Activations	Vishnu Suresh Lokhande; Songwong Tasneeyapant; Abhay Venkatesh; Sathya N. Ravi; Vikas Singh;	We explore the use of Hermite polynomial expansions as a substitute for ReLUs in deep networks.
1135	GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping	Hao-Shu Fang; Chenxi Wang; Minghao Gou; Cewu Lu;	In this work, we contribute a large-scale grasp pose detection dataset with a unified evaluation system.
1136	PFRL: Pose-Free Reinforcement Learning for 6D Pose Estimation	Jianzhun Shao; Yuhang Jiang; Gu Wang; Zhigang Li; Xiangyang Ji;	In this work, to remove the burden of 6D annotations, we formulate 6D pose refinement as a Markov Decision Process and apply reinforcement learning using only 2D image annotations as weakly supervised 6D pose information, via a carefully designed reward and a composite reinforced optimization method for efficient and effective policy training.
1137	Through Fog High-Resolution Imaging Using Millimeter Wave Radar	Junfeng Guan; Sohrab Madani; Suraj Jog; Saurabh Gupta; Haitham Hassanieh;	We introduce HawkEye, a system that leverages a cGAN architecture to recover high-frequency shapes from raw low-resolution mmWave heat-maps.
1138	Disentangling Physical Dynamics From Unknown Factors for Unsupervised Video Prediction	Vincent Le Guen; Nicolas Thome;	Since physics is too restrictive for describing the full visual content of generic video sequences, we introduce PhyDNet, a two-branch deep architecture, which explicitly disentangles PDE dynamics from unknown complementary information.
1139	D2Det: Towards High Quality Object Detection and Instance Segmentation	Jiale Cao; Hisham Cholakkal; Rao Muhammad Anwer; Fahad Shahbaz Khan; Yanwei Pang; Ling Shao;	We propose a novel two-stage detection method, D2Det, that collectively addresses both precise localization and accurate classification.
1140	LiDAR-Based Online 3D Video Object Detection With Graph-Based Message Passing and Spatiotemporal Transformer Attention	Junbo Yin; Jianbing Shen; Chenye Guan; Dingfu Zhou; Ruigang Yang;	In this paper, we propose an end-to-end online 3D video object detector that operates on point cloud sequences.
1141	Orthogonal Convolutional Neural Networks	Jiayun Wang; Yubei Chen; Rudrasis Chakraborty; Stella X. Yu;	We develop an efficient approach to impose filter orthogonality on a convolutional layer based on the doubly block-Toeplitz matrix representation of the convolutional kernel, instead of the common kernel orthogonality approach, which we show is necessary but not sufficient for ensuring orthogonal convolutions.
1142	Self-Robust 3D Point Recognition via Gather-Vector Guidance	Xiaoyi Dong; Dongdong Chen; Hang Zhou; Gang Hua; Weiming Zhang; Nenghai Yu;	In this paper, we look into the problem of 3D adversarial attacks and propose to leverage the internal properties of point clouds and adversarial examples to design a new self-robust 3D recognition system based on deep neural networks (DNNs).
1143	VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation	Jiyang Gao; Chen Sun; Hang Zhao; Yi Shen; Dragomir Anguelov; Congcong Li; Cordelia Schmid;	This paper introduces VectorNet, a hierarchical graph neural network that first exploits the spatial locality of individual road components represented by vectors and then models the high-order interactions among all components.
1144	ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks	Qilong Wang; Banggu Wu; Pengfei Zhu; Peihua Li; Wangmeng Zuo; Qinghua Hu;	To overcome the trade-off between performance and complexity, this paper proposes an Efficient Channel Attention (ECA) module, which involves only a handful of parameters while bringing a clear performance gain.
1145	MTL-NAS: Task-Agnostic Neural Architecture Search Towards General-Purpose Multi-Task Learning	Yuan Gao; Haoping Bai; Zequn Jie; Jiayi Ma; Kui Jia; Wei Liu;	We propose to incorporate neural architecture search (NAS) into general-purpose multi-task learning (GP-MTL).
1146	PnPNet: End-to-End Perception and Prediction With Tracking in the Loop	Ming Liang; Bin Yang; Wenyuan Zeng; Yun Chen; Rui Hu; Sergio Casas; Raquel Urtasun;	Towards this goal we propose PnPNet, an end-to-end model that takes as input sequential sensor data, and outputs at each time step object tracks and their future trajectories.
1147	Revisiting the Sibling Head in Object Detector	Guanglu Song; Yu Liu; Xiaogang Wang;	This paper observes that the spatial misalignment between the two object functions in the sibling head (classification and localization) can considerably hurt the training process, and that this misalignment can be resolved by a very simple operator called task-aware spatial disentanglement (TSD).
1148	Visual Reaction: Learning to Play Catch With Your Drone	Kuo-Hao Zeng; Roozbeh Mottaghi; Luca Weihs; Ali Farhadi;	In this paper we address the problem of visual reaction: the task of interacting with dynamic environments where the changes in the environment are not necessarily caused by the agent itself.
1149	Prime Sample Attention in Object Detection	Yuhang Cao; Kai Chen; Chen Change Loy; Dahua Lin;	In this work, we revisit the common object detection paradigm of treating all samples equally, through a careful study of how different samples contribute to the overall performance measured in terms of mAP.
1150	SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization	Xianzhi Du; Tsung-Yi Lin; Pengchong Jin; Golnaz Ghiasi; Mingxing Tan; Yin Cui; Quoc V. Le; Xiaodan Song;	In this paper, we argue that the encoder-decoder architecture is ineffective in generating strong multi-scale features because of its scale-decreased backbone.
1151	KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects	Xingyu Liu; Rico Jonschkowski; Anelia Angelova; Kurt Konolige;	We address two problems: first, we establish an easy method for capturing and labeling 3D keypoints on desktop objects with an RGB camera; and second, we develop a deep neural network, called KeyPose, that learns to accurately predict object poses using 3D keypoints, from stereo input, and works even for transparent objects.
1152	SegGCN: Efficient 3D Point Cloud Segmentation With Fuzzy Spherical Kernel	Huan Lei; Naveed Akhtar; Ajmal Mian;	As our first major contribution, we incorporate a fuzzy mechanism into discrete convolutional kernels for 3D point clouds.
1153	nuScenes: A Multimodal Dataset for Autonomous Driving	Holger Caesar; Varun Bankiti; Alex H. Lang; Sourabh Vora; Venice Erin Liong; Qiang Xu; Anush Krishnan; Yu Pan; Giancarlo Baldan; Oscar Beijbom;	In this work we present nuTonomy scenes (nuScenes), the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars and 1 lidar, all with full 360 degree field of view.
1154	PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation	Yisheng He; Wei Sun; Haibin Huang; Jianran Liu; Haoqiang Fan; Jian Sun;	In this work, we present a novel data-driven method for robust 6DoF object pose estimation from a single RGBD image.
1155	Probabilistic Pixel-Adaptive Refinement Networks	Anne S. Wannenwetsch; Stefan Roth;	We introduce probabilistic pixel-adaptive convolutions (PPACs), which not only depend on image guidance data for filtering, but also respect the reliability of per-pixel predictions.
1156	Discovering Human Interactions With Novel Objects via Zero-Shot Learning	Suchen Wang; Kim-Hui Yap; Junsong Yuan; Yap-Peng Tan;	We aim to detect human interactions with novel objects through zero-shot learning.
1157	Equalization Loss for Long-Tailed Object Recognition	Jingru Tan; Changbao Wang; Buyu Li; Quanquan Li; Wanli Ouyang; Changqing Yin; Junjie Yan;	In this work, we analyze this problem from a novel perspective: each positive sample of one category can be seen as a negative sample for other categories, making the tail categories receive more discouraging gradients.
1158	Learning Depth-Guided Convolutions for Monocular 3D Object Detection	Mingyu Ding; Yuqi Huo; Hongwei Yi; Zhe Wang; Jianping Shi; Zhiwu Lu; Ping Luo;	In this work, instead of using a pseudo-LiDAR representation, we improve fundamental 2D fully convolutional networks by proposing a new local convolutional network (LCN), termed Depth-guided Dynamic-Depthwise-Dilated LCN (D4LCN), where the filters and their receptive fields can be automatically learned from image-based depth maps, so that different pixels of different images have different filters.
1159	Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather	Mario Bijelic; Tobias Gruber; Fahim Mannan; Florian Kraus; Werner Ritter; Klaus Dietmayer; Felix Heide;	To this end, we present a deep fusion network for robust fusion without a large corpus of labeled training data covering all asymmetric distortions.
1160	Don’t Even Look Once: Synthesizing Features for Zero-Shot Detection	Pengkai Zhu; Hanxiao Wang; Venkatesh Saligrama;	We propose a novel detection algorithm "Don’t Even Look Once (DELO)," that synthesizes visual features for unseen objects and augments existing training algorithms to incorporate unseen object detection.
1161	EPOS: Estimating 6D Pose of Objects With Symmetries	Tomas Hodan; Daniel Barath; Jiri Matas;	We present a new method for estimating the 6D pose of rigid objects with available 3D models from a single RGB input image.
1162	Train in Germany, Test in the USA: Making 3D Object Detectors Generalize	Yan Wang; Xiangyu Chen; Yurong You; Li Erran Li; Bharath Hariharan; Mark Campbell; Kilian Q. Weinberger; Wei-Lun Chao;	In this paper we consider the task of adapting 3D object detectors from one dataset to another.
1163	Exploring Categorical Regularization for Domain Adaptive Object Detection	Chang-Dong Xu; Xing-Ran Zhao; Xin Jin; Xiu-Shen Wei;	In this paper, we tackle the domain adaptive object detection problem, where the main challenge lies in significant domain gaps between source and target domains.
1164	Neural Implicit Embedding for Point Cloud Analysis	Kent Fujiwara; Taiichi Hashimoto;	We present a novel representation for point clouds that encapsulates the local characteristics of the underlying structure.
1165	Pose-Guided Visible Part Matching for Occluded Person ReID	Shang Gao; Jingya Wang; Huchuan Lu; Zimo Liu;	To address this issue, we propose a Pose-guided Visible Part Matching (PVPM) method that jointly learns the discriminative features with pose-guided attention and self-mines the part visibility in an end-to-end framework.
1166	ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection	Yuxin Wang; Hongtao Xie; Zheng-Jun Zha; Mengting Xing; Zilong Fu; Yongdong Zhang;	In this paper, we propose ContourNet, which effectively handles these two problems, taking a further step toward accurate arbitrary-shaped text detection.
1167	Exploring Data Aggregation in Policy Learning for Vision-Based Urban Autonomous Driving	Aditya Prakash; Aseem Behl; Eshed Ohn-Bar; Kashyap Chitta; Andreas Geiger;	Our two key ideas are (1) to sample critical states from the collected on-policy data based on the utility they provide to the learned policy in terms of driving behavior, and (2) to incorporate a replay buffer which progressively focuses on the high uncertainty regions of the policy’s state distribution.
1168	Look-Into-Object: Self-Supervised Structure Modeling for Object Recognition	Mohan Zhou; Yalong Bai; Wei Zhang; Tiejun Zhao; Tao Mei;	In this paper, we propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions into the traditional framework.
1169	Recognizing Objects From Any View With Object and Viewer-Centered Representations	Sainan Liu; Vincent Nguyen; Isaac Rehg; Zhuowen Tu;	In this paper, we tackle an important task in computer vision: any view object recognition.
1170	Gated Channel Transformation for Visual Recognition	Zongxin Yang; Linchao Zhu; Yu Wu; Yi Yang;	In this work, we propose a generally applicable transformation unit for visual recognition with deep convolutional neural networks.
1171	Non-Local Neural Networks With Grouped Bilinear Attentional Transforms	Lu Chi; Zehuan Yuan; Yadong Mu; Changhu Wang;	This work proposes a novel non-local operator. It is inspired by the attention mechanism of human visual system, which can quickly attend to important local parts in sight and suppress other less-relevant information.
1172	Generative-Discriminative Feature Representations for Open-Set Recognition	Pramuditha Perera; Vlad I. Morariu; Rajiv Jain; Varun Manjunatha; Curtis Wigington; Vicente Ordonez; Vishal M. Patel;	We propose two techniques to force class activations of open-set samples to be low.
1173	RPM-Net: Robust Point Matching Using Learned Features	Zi Jian Yew; Gim Hee Lee;	In this paper, we propose RPM-Net, a deep learning-based approach for rigid point cloud registration that is less sensitive to initialization and more robust.
1174	Sideways: Depth-Parallel Training of Video Models	Mateusz Malinowski; Grzegorz Swirszcz; Joao Carreira; Viorica Patraucean;	We propose Sideways, an approximate backpropagation scheme for training video models.
1175	Basis Prediction Networks for Effective Burst Denoising With Large Kernels	Zhihao Xia; Federico Perazzi; Michael Gharbi; Kalyan Sunkavalli; Ayan Chakrabarti;	To this end, we introduce a novel basis prediction network that, given an input burst, predicts a set of global basis kernels, shared within the image, and the corresponding mixing coefficients, which are specific to individual pixels.
1176	Private-kNN: Practical Differential Privacy for Computer Vision	Yuqing Zhu; Xiang Yu; Manmohan Chandraker; Yu-Xiang Wang;	We propose a practically data-efficient scheme based on private release of k-nearest neighbor (kNN) queries, which altogether avoids splitting the training dataset.
1177	SP-NAS: Serial-to-Parallel Backbone Search for Object Detection	Chenhan Jiang; Hang Xu; Wei Zhang; Xiaodan Liang; Zhenguo Li;	In this paper, we propose a two-phase serial-to-parallel architecture search framework named SP-NAS towards a flexible task-oriented detection backbone.
1178	Structure Aware Single-Stage 3D Object Detection From Point Cloud	Chenhang He; Hui Zeng; Jianqiang Huang; Xian-Sheng Hua; Lei Zhang;	In this work, we propose to improve the localization precision of single-stage detectors by explicitly leveraging the structure information of 3D point cloud.
1179	"Looking at the Right Stuff" – Guided Semantic-Gaze for Autonomous Driving	Anwesan Pal; Sayan Mondal; Henrik I. Christensen;	We propose a novel Semantics Augmented GazE (SAGE) detection approach that captures driving specific contextual information, in addition to the raw gaze.
1180	What’s Hidden in a Randomly Weighted Neural Network?	Vivek Ramanujan; Mitchell Wortsman; Aniruddha Kembhavi; Ali Farhadi; Mohammad Rastegari;	We empirically show that as randomly weighted neural networks with fixed weights grow wider and deeper, an "untrained subnetwork" approaches a network with learned weights in accuracy.
1181	Structured Multi-Hashing for Model Compression	Elad Eban; Yair Movshovitz-Attias; Hao Wu; Mark Sandler; Andrew Poon; Yerlan Idelbayev; Miguel A. Carreira-Perpinan;	In this work we combine ideas from weight hashing and dimensionality reduction, resulting in a simple and powerful structured multi-hashing method based on matrix products that allows direct control of the model size of any deep network and is trained end-to-end.
1182	DOPS: Learning to Detect 3D Objects and Predict Their 3D Shapes	Mahyar Najibi; Guangda Lai; Abhijit Kundu; Zhichao Lu; Vivek Rathod; Thomas Funkhouser; Caroline Pantofaru; David Ross; Larry S. Davis; Alireza Fathi;	We propose DOPS, a fast single-stage 3D object detection method for LIDAR data.
1183	AutoTrack: Towards High-Performance Visual Tracking for UAV With Automatic Spatio-Temporal Regularization	Yiming Li; Changhong Fu; Fangqiang Ding; Ziyuan Huang; Geng Lu;	In this work, a novel approach is proposed to learn the spatio-temporal regularization term automatically and adaptively online.
1184	GP-NAS: Gaussian Process Based Neural Architecture Search	Zhihang Li; Teng Xi; Jiankang Deng; Gang Zhang; Shengzhao Wen; Ran He;	In this paper, we aim to address three important questions in NAS: (1) How to measure the correlation between architectures and their performances? (2) How to evaluate the correlation between different architectures? (3) How to learn these correlations with a small number of samples?
1185	NAS-FCOS: Fast Neural Architecture Search for Object Detection	Ning Wang; Yang Gao; Hao Chen; Peng Wang; Zhi Tian; Chunhua Shen; Yanning Zhang;	Here we propose to search for the decoder structure of object detectors with search efficiency being taken into consideration.
1186	TCTS: A Task-Consistent Two-Stage Framework for Person Search	Cheng Wang; Bingpeng Ma; Hong Chang; Shiguang Shan; Xilin Chen;	To address the consistency problem, we introduce a Task-Consistent Two-Stage (TCTS) person search framework, which includes an identity-guided query (IDGQ) detector and a Detection Results Adapted (DRA) re-ID model.
1187	SCATTER: Selective Context Attentional Scene Text Recognizer	Ron Litman; Oron Anschel; Shahar Tsiper; Roee Litman; Shai Mazor; R. Manmatha;	In this paper, we introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER).
1188	Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation	Dengsheng Chen; Jun Li; Zheng Wang; Kai Xu;	We present a novel approach to category-level 6D object pose and size estimation.
1189	Hierarchical Scene Coordinate Classification and Regression for Visual Localization	Xiaotian Li; Shuzhe Wang; Yi Zhao; Jakob Verbeek; Juho Kannala;	In this work, we present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image.
1190	MiLeNAS: Efficient Neural Architecture Search via Mixed-Level Reformulation	Chaoyang He; Haishan Ye; Li Shen; Tong Zhang;	To remedy this, this paper proposes MiLeNAS, a mixed-level reformulation for NAS that can be optimized efficiently and reliably.
1191	Scalable Uncertainty for Computer Vision With Functional Variational Inference	Eduardo D. C. Carvalho; Ronald Clark; Andrea Nicastro; Paul H. J. Kelly;	By leveraging the structure of the induced covariance matrices, we propose numerically efficient algorithms which enable fast training in the context of high-dimensional tasks such as depth estimation and semantic segmentation.
1192	Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End	Abdelrahman Eldesokey; Michael Felsberg; Karl Holmquist; Michael Persson;	In this work, we thus focus on modeling the uncertainty of depth data in depth completion starting from the sparse noisy input all the way to the final prediction.
1193	Butterfly Transform: An Efficient FFT Based Neural Architecture Design	Keivan Alizadeh vahid; Anish Prabhu; Ali Farhadi; Mohammad Rastegari;	In this paper, we show that extending the butterfly operations from the FFT algorithm to a general Butterfly Transform (BFT) can be beneficial in building an efficient block structure for CNN designs.
1194	A Certifiably Globally Optimal Solution to Generalized Essential Matrix Estimation	Ji Zhao; Wanting Xu; Laurent Kneip;	We present a convex optimization approach for generalized essential matrix (GEM) estimation.
1195	MUXConv: Information Multiplexing in Convolutional Neural Networks	Zhichao Lu; Kalyanmoy Deb; Vishnu Naresh Boddeti;	To overcome this limitation, we present MUXConv, a layer that is designed to increase the flow of information by progressively multiplexing channel and spatial information in the network, while mitigating computational complexity.
1196	PointGMM: A Neural GMM Network for Point Clouds	Amir Hertz; Rana Hanocka; Raja Giryes; Daniel Cohen-Or;	We present PointGMM, a neural network that learns to generate hGMMs which are characteristic of the shape class, and also coincide with the input point cloud.
1197	Noisier2Noise: Learning to Denoise From Unpaired Noisy Data	Nick Moran; Dan Schmidt; Yu Zhong; Patrick Coady;	We present a method for training a neural network to perform image denoising without access to clean training examples or access to paired noisy training examples.
1198	TRPLP – Trifocal Relative Pose From Lines at Points	Ricardo Fabbri; Timothy Duff; Hongyi Fan; Margaret H. Regan; David da Costa de Pinho; Elias Tsigaridas; Charles W. Wampler; Jonathan D. Hauenstein; Peter J. Giblin; Benjamin Kimia; Anton Leykin; Tomas Pajdla;	We present a method for solving two minimal problems for relative camera pose estimation from three views, which are based on three view correspondences of (i) three points and one line and (ii) three points and two lines through two of the points.
1199	DSNAS: Direct Neural Architecture Search Without Parameter Retraining	Shoukang Hu; Sirui Xie; Hehui Zheng; Chunxiao Liu; Jianping Shi; Xunying Liu; Dahua Lin;	In this work, we propose a new problem definition for NAS, task-specific end-to-end, based on this observation.
1200	MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships	Yongjian Chen; Lei Tai; Kai Sun; Mingyang Li;	To this end, we propose a novel method to improve monocular 3D object detection by considering the relationships of paired samples.
1201	Regularization on Spatio-Temporally Smoothed Feature for Action Recognition	Jinhyung Kim; Seunghwan Cha; Dongyoon Wee; Soonmin Bae; Junmo Kim;	In this paper, we propose Random Mean Scaling (RMS), a simple and effective regularization method, to relieve the overfitting problem in 3D residual networks.
1202	Towards Accurate Scene Text Recognition With Semantic Reasoning Networks	Deli Yu; Xuan Li; Chengquan Zhang; Tao Liu; Junyu Han; Jingtuo Liu; Errui Ding;	To mitigate these limitations, we propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition, where a global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission.
1203	Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation	Juncheng Li; Xin Wang; Siliang Tang; Haizhou Shi; Fei Wu; Yueting Zhuang; William Yang Wang;	In this paper, we focus on visual navigation in the low-resource setting, where we have only a few training environments annotated with object information.
1204	Inferring Attention Shift Ranks of Objects for Image Saliency	Avishek Siris; Jianbo Jiao; Gary K.L. Tam; Xianghua Xie; Rynson W.H. Lau;	Following psychological studies, in this paper, we propose to predict the saliency rank by inferring human attention shift. Due to the lack of such data, we first construct a large-scale salient object ranking dataset.
1205	Camera On-Boarding for Person Re-Identification Using Hypothesis Transfer Learning	Sk Miraj Ahmed; Aske R. Lejbolle; Rameswar Panda; Amit K. Roy-Chowdhury;	Rather, based on the fact that learned re-identification models are easy to store, which mitigates any data privacy concern, we develop an efficient model adaptation approach using hypothesis transfer learning that aims to transfer knowledge using only source models and limited labeled data, without using any source camera data from the existing network.
1206	Joint Graph-Based Depth Refinement and Normal Estimation	Mattia Rossi; Mireille El Gheche; Andreas Kuhn; Pascal Frossard;	With these settings in mind, we devise a novel depth refinement framework that aims at recovering the underlying piece-wise planarity of those inverse depth maps associated with piece-wise planar scenes.
1207	DR Loss: Improving Object Detection by Distributional Ranking	Qi Qian; Lei Chen; Hao Li; Rong Jin;	In this work, we propose a novel distributional ranking (DR) loss to handle the challenge.
1208	Self-Trained Deep Ordinal Regression for End-to-End Video Anomaly Detection	Guansong Pang; Cheng Yan; Chunhua Shen; Anton van den Hengel; Xiao Bai;	By formulating a surrogate two-class ordinal regression task we devise an end-to-end trainable video anomaly detection approach that enables joint representation learning and anomaly scoring without manually labeled normal/abnormal data.
1209	Few-Shot Class-Incremental Learning	Xiaoyu Tao; Xiaopeng Hong; Xinyuan Chang; Songlin Dong; Xing Wei; Yihong Gong;	To address this problem, we represent the knowledge using a neural gas (NG) network, which can learn and preserve the topology of the feature manifold formed by different classes. On this basis, we propose the TOpology-Preserving knowledge InCrementer (TOPIC) framework.
1210	PolarMask: Single Shot Instance Segmentation With Polar Representation	Enze Xie; Peize Sun; Xiaoge Song; Wenhai Wang; Xuebo Liu; Ding Liang; Chunhua Shen; Ping Luo;	In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used by easily embedding it into most off-the-shelf detection methods.
1211	DeepEMD: Few-Shot Image Classification With Differentiable Earth Mover’s Distance and Structured Classifiers	Chi Zhang; Yujun Cai; Guosheng Lin; Chunhua Shen;	In this paper, we address the few-shot classification task from a new perspective of optimal matching between image regions.
1212	Detection in Crowded Scenes: One Proposal, Multiple Predictions	Xuangeng Chu; Anlin Zheng; Xiangyu Zhang; Jian Sun;	We propose a simple yet effective proposal-based object detector, aiming at detecting highly-overlapped instances in crowded scenes.
1213	Autolabeling 3D Objects With Differentiable Rendering of SDF Shape Priors	Sergey Zakharov; Wadim Kehl; Arjun Bhargava; Adrien Gaidon;	We present an automatic annotation pipeline to recover 9D cuboids and 3D shapes from pre-trained off-the-shelf 2D detectors and sparse LIDAR data.
1214	Interactive Object Segmentation With Inside-Outside Guidance	Shiyin Zhang; Jun Hao Liew; Yunchao Wei; Shikui Wei; Yao Zhao;	To achieve this, we propose an Inside-Outside Guidance (IOG) approach in this work.
1215	Mnemonics Training: Multi-Class Incremental Learning Without Forgetting	Yaoyao Liu; Yuting Su; An-An Liu; Bernt Schiele; Qianru Sun;	This paper proposes a novel and automatic framework we call mnemonics, where we parameterize exemplars and make them optimizable in an end-to-end manner.
1216	Learning to Segment 3D Point Clouds in 2D Image Space	Yecheng Lyu; Xinming Huang; Ziming Zhang;	In contrast to the literature where local patterns in 3D point clouds are captured by customized convolutional operators, in this paper we study the problem of how to effectively and efficiently project such point clouds into a 2D image space so that traditional 2D convolutional neural networks (CNNs) such as U-Net can be applied for segmentation.
1217	Smooth Shells: Multi-Scale Shape Registration With Functional Maps	Marvin Eisenberger; Zorah Lahner; Daniel Cremers;	We propose a novel 3D shape correspondence method based on the iterative alignment of so-called smooth shells.
1218	Self-Supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation	Yude Wang; Jie Zhang; Meina Kan; Shiguang Shan; Xilin Chen;	In this paper, we propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap.
1219	Efficient Neural Vision Systems Based on Convolutional Image Acquisition	Pedram Pad; Simon Narduzzi; Clement Kundig; Engin Turetken; Siavash A. Bigdeli; L. Andrea Dunbar;	In this paper, we tackle this fundamental challenge by introducing a hybrid optical-digital implementation of a convolutional neural network (CNN) based on engineering of the point spread function (PSF) of an optical imaging system.
1220	Visual Chirality	Zhiqiu Lin; Jin Sun; Abe Davis; Noah Snavely;	In this paper, we investigate how the statistics of visual data are changed by reflection.
1221	What Machines See Is Not What They Get: Fooling Scene Text Recognition Models With Adversarial Text Images	Xing Xu; Jiefu Chen; Jinhui Xiao; Lianli Gao; Fumin Shen; Heng Tao Shen;	Specifically, we propose a novel and efficient optimization-based method that can be naturally integrated into different sequential prediction schemes, i.e., connectionist temporal classification (CTC) and attention mechanism.
1222	Dynamic Traffic Modeling From Overhead Imagery	Scott Workman; Nathan Jacobs;	Instead, we propose an automatic approach for generating dynamic maps of traffic speeds using convolutional neural networks.
1223	Satellite Image Time Series Classification With Pixel-Set Encoders and Temporal Self-Attention	Vivien Sainte Fare Garnot; Loic Landrieu; Sebastien Giordano; Nesrine Chehata;	We propose an alternative approach in which the convolutional layers are advantageously replaced with encoders operating on unordered sets of pixels to exploit the typically coarse resolution of publicly available satellite images.
1224	DAVD-Net: Deep Audio-Aided Video Decompression of Talking Heads	Xi Zhang; Xiaolin Wu; Xinliang Zhai; Xianye Ben; Chengjie Tu;	To address this problem, we present a novel deep convolutional neural network (DCNN) method for very low bit rate video reconstruction of talking heads.
1225	Learning When and Where to Zoom With Deep Reinforcement Learning	Burak Uzkent; Stefano Ermon;	In this direction, we propose PatchDrop, a reinforcement learning approach to dynamically identify when and where to use/acquire high-resolution data conditioned on the paired, cheap, low-resolution images.
1226	Cross-Domain Detection via Graph-Induced Prototype Alignment	Minghao Xu; Hang Wang; Bingbing Ni; Qi Tian; Wenjun Zhang;	To mitigate these problems, we propose a Graph-induced Prototype Alignment (GPA) framework to seek category-level domain alignment via elaborate prototype representations.
1227	Meta-Learning of Neural Architectures for Few-Shot Learning	Thomas Elsken; Benedikt Staffler; Jan Hendrik Metzen; Frank Hutter;	To improve upon this, we propose MetaNAS, the first method which fully integrates NAS with gradient-based meta-learning.
1228	Towards Inheritable Models for Open-Set Domain Adaptation	Jogendra Nath Kundu; Naveen Venkat; Ambareesh Revanur; Rahul M V; R. Venkatesh Babu;	Addressing this, we introduce a practical DA paradigm where a source-trained model is used to facilitate adaptation in the absence of the source dataset in the future.
1229	Learning From Synthetic Animals	Jiteng Mu; Weichao Qiu; Gregory D. Hager; Alan L. Yuille;	In this paper, we use synthetic images and ground truth generated from CAD animal models to address this challenge.
1230	Distilling Cross-Task Knowledge via Relationship Matching	Han-Jia Ye; Su Lu; De-Chuan Zhan;	This paper deals with a general scenario of reusing knowledge from a cross-task teacher, where the two models target non-overlapping label spaces.
1231	Open Compound Domain Adaptation	Ziwei Liu; Zhongqi Miao; Xingang Pan; Xiaohang Zhan; Dahua Lin; Stella X. Yu; Boqing Gong;	We propose a new approach based on two technical insights into OCDA: 1) a curriculum domain adaptation strategy to bootstrap generalization across domains in a data-driven self-organizing fashion and 2) a memory module to increase the model’s agility towards novel domains.
1232	Context Prior for Scene Segmentation	Changqian Yu; Jingbo Wang; Changxin Gao; Gang Yu; Chunhua Shen; Nong Sang;	In this work, we directly supervise the feature aggregation to clearly distinguish the intra-class and inter-class context.
1233	Tangent Images for Mitigating Spherical Distortion	Marc Eder; Mykhailo Shvets; John Lim; Jan-Michael Frahm;	In this work, we propose "tangent images," a spherical image representation that facilitates transferable and scalable 360 degree computer vision.
1234	Learning a Dynamic Map of Visual Appearance	Tawfiq Salem; Scott Workman; Nathan Jacobs;	Every day billions of images capture this complex relationship, many of which are associated with precise time and location metadata. We propose to use these images to construct a global-scale, dynamic map of visual appearance attributes.
1235	Webly Supervised Knowledge Embedding Model for Visual Reasoning	Wenbo Zheng; Lan Yan; Chao Gou; Fei-Yue Wang;	We present a two-stage approach for the task that can augment knowledge through an effective embedding model with weakly supervised web data.
1236	Gradually Vanishing Bridge for Adversarial Domain Adaptation	Shuhao Cui; Shuhui Wang; Junbao Zhuo; Chi Su; Qingming Huang; Qi Tian;	In this paper, we equip adversarial domain adaptation with Gradually Vanishing Bridge (GVB) mechanism on both generator and discriminator.
1237	Active Speakers in Context	Juan Leon Alcazar; Fabian Caba; Long Mai; Federico Perazzi; Joon-Young Lee; Pablo Arbelaez; Bernard Ghanem;	This paper introduces the Active Speaker Context, a novel representation that models relationships between multiple speakers over long time horizons.
1238	Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation	Bowen Cheng; Maxwell D. Collins; Yukun Zhu; Ting Liu; Thomas S. Huang; Hartwig Adam; Liang-Chieh Chen;	In this work, we introduce Panoptic-DeepLab, a simple, strong, and fast system for panoptic segmentation, aiming to establish a solid baseline for bottom-up methods that can achieve performance comparable to two-stage methods while yielding fast inference speed.
1239	Inter-Region Affinity Distillation for Road Marking Segmentation	Yuenan Hou; Zheng Ma; Chunxiao Liu; Tak-Wai Hui; Chen Change Loy;	In this work, we explore a novel knowledge distillation (KD) approach that can transfer ‘knowledge’ on scene structure more effectively from a teacher to a student model.
1240	Unified Dynamic Convolutional Network for Super-Resolution With Variational Degradations	Yu-Syuan Xu; Shou-Yao Roy Tseng; Yu Tseng; Hsien-Kai Kuo; Yi-Min Tsai;	To fulfill this requirement, this paper proposes a unified network to accommodate both inter-image (cross-image) and intra-image (spatial) variations.
1241	Making Better Mistakes: Leveraging Class Hierarchies With Deep Networks	Luca Bertinetto; Romain Mueller; Konstantinos Tertikas; Sina Samangooei; Nicholas A. Lord;	In this paper, we aim to renew interest in this problem by reviewing past approaches and proposing two simple methods which outperform the prior art under several metrics on two large datasets with complex class hierarchies: tieredImageNet and iNaturalist’19.
1242	Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN	Jingwen Ye; Yixin Ji; Xinchao Wang; Xin Gao; Mingli Song;	In this paper, we propose a data-free knowledge amalgamation strategy to craft a well-behaved multi-task student network from multiple single/multi-task teachers.
1243	Screencast Tutorial Video Understanding	Kunpeng Li; Chen Fang; Zhaowen Wang; Seokhwan Kim; Hailin Jin; Yun Fu;	In this paper, we propose visual understanding of screencast tutorials as a new research problem to the computer vision community. We collect a new dataset of Adobe Photoshop video tutorials and annotate it with both low-level and high-level semantic labels.
1244	DSGN: Deep Stereo Geometry Network for 3D Object Detection	Yilun Chen; Shu Liu; Xiaoyong Shen; Jiaya Jia;	Our method, called Deep Stereo Geometry Network (DSGN), reduces this gap significantly by detecting 3D objects on a differentiable volumetric representation — 3D geometric volume, which effectively encodes 3D geometric structure for 3D regular space.
1245	Weakly-Supervised Salient Object Detection via Scribble Annotations	Jing Zhang; Xin Yu; Aixuan Li; Peipei Song; Bowen Liu; Yuchao Dai;	In this paper, we propose a weakly-supervised salient object detection model to learn saliency from such annotations.
1246	Learning to Learn Single Domain Generalization	Fengchun Qiao; Long Zhao; Xi Peng;	We propose a new method named adversarial domain augmentation to solve this Out-of-Distribution (OOD) generalization problem.
1247	Severity-Aware Semantic Segmentation With Reinforced Wasserstein Training	Xiaofeng Liu; Wenxuan Ji; Jane You; Georges El Fakhri; Jonghye Woo;	To sidestep this, in this work, we propose to incorporate the severity-aware inter-class correlation into our Wasserstein training framework by configuring its ground distance matrix.
1248	Boosting Few-Shot Learning With Adaptive Margin Loss	Aoxue Li; Weiran Huang; Xu Lan; Jiashi Feng; Zhenguo Li; Liwei Wang;	This paper proposes an adaptive margin principle to improve the generalization ability of metric-based meta-learning approaches for few-shot learning problems.
1249	JA-POLS: A Moving-Camera Background Model via Joint Alignment and Partially-Overlapping Local Subspaces	Irit Chelly; Vlad Winter; Dor Litvak; David Rosen; Oren Freifeld;	Here we propose a purely-2D unsupervised modular method that systematically eliminates those issues.
1250	AugFPN: Improving Multi-Scale Feature Learning for Object Detection	Chaoxu Guo; Bin Fan; Qian Zhang; Shiming Xiang; Chunhong Pan;	In this paper, we first analyze the design defects of the feature pyramid in FPN, and then introduce a new feature pyramid architecture named AugFPN to address these problems.
1251	xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation	Maximilian Jaritz; Tuan-Hung Vu; Raoul de Charette; Emilie Wirbel; Patrick Perez;	In this work, we explore how to learn from multi-modality and propose cross-modal UDA (xMUDA) where we assume the presence of 2D images and 3D point clouds for 3D semantic segmentation.
1252	Norm-Aware Embedding for Efficient Person Search	Di Chen; Shanshan Zhang; Jian Yang; Bernt Schiele;	To this end, we present a novel approach called Norm-Aware Embedding to disentangle the person embedding into norm and angle for detection and re-ID respectively, allowing for both effective and efficient multi-task training.
1253	Intelligent Home 3D: Automatic 3D-House Design From Linguistic Descriptions Only	Qi Chen; Qi Wu; Rui Tang; Yuhan Wang; Shuai Wang; Mingkui Tan;	In this paper, we formulate it as a language conditioned visual content generation problem that is further divided into a floor plan generation and an interior texture (such as floor and wall) synthesis task.
1254	Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation	Zhonghao Wang; Mo Yu; Yunchao Wei; Rogerio Feris; Jinjun Xiong; Wen-mei Hwu; Thomas S. Huang; Honghui Shi;	We consider the problem of unsupervised domain adaptation for semantic segmentation by easing the domain shift between the source domain (synthetic data) and the target domain (real data) in this work.
1255	Robust Object Detection Under Occlusion With Context-Aware CompositionalNets	Angtian Wang; Yihong Sun; Adam Kortylewski; Alan L. Yuille;	In this work, we propose to overcome two limitations of CompositionalNets which will enable them to detect partially occluded objects: 1) CompositionalNets, as well as other DCNN architectures, do not explicitly separate the representation of the context from the object itself.
1256	IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-Text Retrieval	Hui Chen; Guiguang Ding; Xudong Liu; Zijia Lin; Ji Liu; Jungong Han;	In this paper, to address such a deficiency, we propose an Iterative Matching with Recurrent Attention Memory (IMRAM) method, in which correspondences between images and texts are captured with multiple steps of alignments.
1257	Domain-Aware Visual Bias Eliminating for Generalized Zero-Shot Learning	Shaobo Min; Hantao Yao; Hongtao Xie; Chaoqun Wang; Zheng-Jun Zha; Yongdong Zhang;	In this paper, we propose a novel Domain-aware Visual Bias Eliminating (DVBE) network that constructs two complementary visual representations, i.e., semantic-free and semantic-aligned, to treat seen and unseen domains separately.
1258	Semi-Supervised Semantic Segmentation With Cross-Consistency Training	Yassine Ouali; Celine Hudelot; Myriam Tami;	In this paper, we present a novel cross-consistency based semi-supervised approach for semantic segmentation.
1259	Learning to Learn Cropping Models for Different Aspect Ratio Requirements	Debang Li; Junge Zhang; Kaiqi Huang;	In this paper, we propose a meta-learning (learning to learn) based aspect ratio specified image cropping method called Mars, which can generate cropping results of different expected aspect ratios.
1260	What Makes Training Multi-Modal Classification Networks Hard?	Weiyao Wang; Du Tran; Matt Feiszli;	This paper identifies two main causes for the performance drop of multi-modal networks relative to single-modal networks: first, multi-modal networks are often prone to overfitting due to increased capacity; second, different modalities overfit and generalize at different rates, so training them jointly with a single optimization strategy is sub-optimal. We address these two problems with a technique we call Gradient-Blending, which computes an optimal blending of modalities based on their overfitting behaviors.
1261	Selective Transfer With Reinforced Transfer Network for Partial Domain Adaptation	Zhihong Chen; Chao Chen; Zhaowei Cheng; Boyuan Jiang; Ke Fang; Xinyu Jin;	In this paper, we propose a reinforced transfer network (RTNet), which utilizes both high-level and pixel-level information for the partial domain adaptation (PDA) problem.
1262	Semi-Supervised Semantic Image Segmentation With Self-Correcting Networks	Mostafa S. Ibrahim; Arash Vahdat; Mani Ranjbar; William G. Macready;	In this paper, we introduce a principled semi-supervised framework that uses only a small set of fully supervised images (having semantic segmentation labels and box labels) and a set of images with only object bounding box labels (called the weak-set).
1263	Exemplar Normalization for Learning Deep Representation	Ruimao Zhang; Zhanglin Peng; Lingyun Wu; Zhen Li; Ping Luo;	This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN), which is able to learn different normalization methods for different convolutional layers and image samples of a deep network.
1264	Imitative Non-Autoregressive Modeling for Trajectory Forecasting and Imputation	Mengshi Qi; Jie Qin; Yu Wu; Yi Yang;	To this end, we propose a novel imitative non-autoregressive modeling method to simultaneously handle the trajectory prediction task and the missing value imputation task.
1265	Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text	Difei Gao; Ke Li; Ruiping Wang; Shiguang Shan; Xilin Chen;	Following this idea, we propose a novel VQA approach, Multi-Modal Graph Neural Network (MM-GNN).
1266	StereoGAN: Bridging Synthetic-to-Real Domain Gap by Joint Optimization of Domain Translation and Stereo Matching	Rui Liu; Chengxi Yang; Wenxiu Sun; Xiaogang Wang; Hongsheng Li;	In this paper, we propose an end-to-end training framework with domain translation and stereo matching networks to tackle this challenge.
1267	Self-Supervised Domain-Aware Generative Network for Generalized Zero-Shot Learning	Jiamin Wu; Tianzhu Zhang; Zheng-Jun Zha; Jiebo Luo; Yongdong Zhang; Feng Wu;	To address this issue, we propose an end-to-end Self-supervised Domain-aware Generative Network (SDGN) by integrating self-supervised learning into a feature-generating model for unbiased GZSL.
1268	Sparse Layered Graphs for Multi-Object Segmentation	Niels Jeppesen; Anders N. Christensen; Vedrana A. Dahl; Anders B. Dahl;	We introduce the novel concept of a Sparse Layered Graph (SLG) for s-t graph cut segmentation of image data.
1269	Visual-Semantic Matching by Exploring High-Order Attention and Distraction	Yongzhi Li; Duo Zhang; Yadong Mu;	In this work, we address this task from two previously ignored aspects: high-order semantic information (e.g., object-predicate-subject triplet, object-attribute pair) and visual distraction (i.e., despite being highly relevant to the textual query, images may also contain many prominent distracting objects or visual relations).
1270	End-to-End 3D Point Cloud Instance Segmentation Without Detection	Haiyong Jiang; Feilong Yan; Jianfei Cai; Jianmin Zheng; Jun Xiao;	In this paper, we introduce a novel framework that enables end-to-end instance segmentation without detection or a separate grouping step.
1271	Deep Adversarial Decomposition: A Unified Framework for Separating Superimposed Images	Zhengxia Zou; Sen Lei; Tianyang Shi; Zhenwei Shi; Jieping Ye;	We propose a unified framework named "deep adversarial decomposition" for single superimposed image separation.
1272	Differentiable Adaptive Computation Time for Visual Reasoning	Cristobal Eyzaguirre; Alvaro Soto;	This paper presents a novel attention-based algorithm for achieving adaptive computation called DACT, which, unlike existing ones, is end-to-end differentiable.
1273	DeepLPF: Deep Local Parametric Filters for Image Enhancement	Sean Moran; Pierre Marza; Steven McDonagh; Sarah Parisot; Gregory Slabaugh;	In this paper, we introduce a novel approach to automatically enhance images using learned spatially local filters of three different types (Elliptical Filter, Graduated Filter, Polynomial Filter).
1274	Instance Credibility Inference for Few-Shot Learning	Yikai Wang; Chengming Xu; Chen Liu; Li Zhang; Yanwei Fu;	In contrast, this paper presents a simple statistical approach, dubbed Instance Credibility Inference (ICI), to exploit the distribution support of unlabeled instances for few-shot learning.
1275	Learning From Web Data With Self-Organizing Memory Module	Yi Tu; Li Niu; Junjie Chen; Dawei Cheng; Liqing Zhang;	In this paper, we propose a novel method capable of handling these two types of noise (label noise and background noise) together, without the supervision of clean images in the training stage.
1276	TransMatch: A Transfer-Learning Scheme for Semi-Supervised Few-Shot Learning	Zhongjie Yu; Lin Chen; Zhongwei Cheng; Jiebo Luo;	In this paper, we propose a new transfer-learning framework for semi-supervised few-shot learning to fully utilize the auxiliary information from labeled base-class data and unlabeled novel-class data.
1277	Learning the Redundancy-Free Features for Generalized Zero-Shot Object Recognition	Zongyan Han; Zhenyong Fu; Jian Yang;	To reduce the superfluous information in the fine-grained objects, in this paper, we propose to learn the redundancy-free features for generalized zero-shot learning.
1278	Neural Topological SLAM for Visual Navigation	Devendra Singh Chaplot; Ruslan Salakhutdinov; Abhinav Gupta; Saurabh Gupta;	This paper studies the problem of image-goal navigation, which involves navigating to the location indicated by a goal image in a novel, previously unseen environment.
1279	WaveletStereo: Learning Wavelet Coefficients of Disparity Map in Stereo Matching	Menglong Yang; Fangrui Wu; Wei Li;	This paper proposes a novel stereo matching method called WaveletStereo, which learns the wavelet coefficients of the disparity rather than the disparity itself.
1280	Robust Superpixel-Guided Attentional Adversarial Attack	Xiaoyi Dong; Jiangfan Han; Dongdong Chen; Jiayang Liu; Huanyu Bian; Zehua Ma; Hongsheng Li; Xiaogang Wang; Weiming Zhang; Nenghai Yu;	Based on these two considerations, we propose the first robust superpixel-guided attentional adversarial attack method.
1281	BEDSR-Net: A Deep Shadow Removal Network From a Single Document Image	Yun-Hsuan Lin; Wen-Chin Chen; Yung-Yu Chuang;	This paper proposes the Background Estimation Document Shadow Removal Network (BEDSR-Net), the first deep network specifically designed for document image shadow removal.
1282	Cross-Domain Document Object Detection: Benchmark Suite and Method	Kai Li; Curtis Wigington; Chris Tensmeyer; Handong Zhao; Nikolaos Barmpalios; Vlad I. Morariu; Varun Manjunatha; Tong Sun; Yun Fu;	We investigate cross-domain document object detection (DOD), where the goal is to learn a detector for the target domain using labeled data from the source domain and only unlabeled data from the target domain.
1283	Explaining Knowledge Distillation by Quantifying the Knowledge	Xu Cheng; Zhefan Rao; Yilan Chen; Quanshi Zhang;	This paper presents a method to interpret the success of knowledge distillation by quantifying and analyzing task-relevant and task-irrelevant visual concepts that are encoded in intermediate layers of a deep neural network (DNN).
1284	Exploring Bottom-Up and Top-Down Cues With Attentive Learning for Webly Supervised Object Detection	Zhonghua Wu; Qingyi Tao; Guosheng Lin; Jianfei Cai;	Within our approach, we introduce a bottom-up mechanism based on a well-trained fully supervised object detector (i.e., Faster R-CNN) as an object region estimator for web images by recognizing the common objectness shared by base and novel classes.
1285	Enhancing Generic Segmentation With Learned Region Representations	Or Isaacs; Oran Shayer; Michael Lindenbaum;	We propose an alternative approach called Deep Generic Segmentation (DGS) and try to follow the path used for semantic segmentation.
1286	Adaptive Hierarchical Down-Sampling for Point Cloud Classification	Ehsan Nezhadarya; Ehsan Taghavi; Ryan Razani; Bingbing Liu; Jun Luo;	In this paper, we propose a novel deterministic, adaptive, permutation-invariant down-sampling layer, called Critical Points Layer (CPL), which learns to reduce the number of points in an unordered point cloud while retaining the important (critical) ones.
1287	FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions	Alvin Wan; Xiaoliang Dai; Peizhao Zhang; Zijian He; Yuandong Tian; Saining Xie; Bichen Wu; Matthew Yu; Tao Xu; Kan Chen; Peter Vajda; Joseph E. Gonzalez;	To address this bottleneck, we propose a memory- and computationally efficient DNAS variant: DMaskingNAS.
1288	Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation	Myeongjin Kim; Hyeran Byun;	In this paper, regarding texture as the fundamental difference between the two domains, we propose a method to adapt to the target domain’s texture.
1289	Putting Visual Object Recognition in Context	Mengmi Zhang; Claire Tseng; Gabriel Kreiman;	We propose a biologically-inspired context-aware object recognition model consisting of a two-stream architecture.
1290	SLV: Spatial Likelihood Voting for Weakly Supervised Object Detection	Ze Chen; Zhihang Fu; Rongxin Jiang; Yaowu Chen; Xian-Sheng Hua;	In this paper, we propose a spatial likelihood voting (SLV) module to converge the proposal localizing process without any bounding box annotations.
1291	Universal Weighting Metric Learning for Cross-Modal Matching	Jiwei Wei; Xing Xu; Yang Yang; Yanli Ji; Zheng Wang; Heng Tao Shen;	To address this problem, we propose a simple and interpretable universal weighting framework for cross-modal matching, which provides a tool to analyze the interpretability of various loss functions.
1292	IDA-3D: Instance-Depth-Aware 3D Object Detection From Stereo Vision for Autonomous Driving	Wanli Peng; Hao Pan; He Liu; Yi Sun;	Considering more general scenes, where there is no LiDAR data in the 3D datasets, we propose a 3D object detection approach from stereo vision which does not rely on LiDAR data either as input or as supervision in training, but solely takes RGB images with corresponding annotated 3D bounding boxes as training data.
1293	Label Decoupling Framework for Salient Object Detection	Jun Wei; Shuhui Wang; Zhe Wu; Chi Su; Qingming Huang; Qi Tian;	To address this problem, we propose a label decoupling framework (LDF) which consists of a label decoupling (LD) procedure and a feature interaction network (FIN).
1294	Transform and Tell: Entity-Aware News Image Captioning	Alasdair Tran; Alexander Mathews; Lexing Xie;	We propose an end-to-end model which generates captions for images embedded in news articles.
1295	HAMBox: Delving Into Mining High-Quality Anchors on Face Detection	Yang Liu; Xu Tang; Junyu Han; Jingtuo Liu; Dinger Rui; Xiang Wu;	In this paper, we propose an Online High-quality Anchor Mining Strategy (HAMBox), which explicitly helps outer faces compensate with high-quality anchors.
1296	Hierarchical Feature Embedding for Attribute Recognition	Jie Yang; Jiarou Fan; Yiru Wang; Yige Wang; Weihao Gan; Lin Liu; Wei Wu;	To address this problem, we propose a hierarchical feature embedding (HFE) framework, which learns a fine-grained feature embedding by combining attribute and ID information.
1297	Squeeze-and-Attention Networks for Semantic Segmentation	Zilong Zhong; Zhong Qiu Lin; Rene Bidart; Xiaodan Hu; Ibrahim Ben Daya; Zhifeng Li; Wei-Shi Zheng; Jonathan Li; Alexander Wong;	In this paper, we propose a novel squeeze-and-attention network (SANet) architecture that leverages an effective squeeze-and-attention (SA) module to account for two distinctive characteristics of segmentation: i) pixel-group attention, and ii) pixel-wise prediction.
1298	Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection	Sara Beery; Guanhang Wu; Vivek Rathod; Ronny Votel; Jonathan Huang;	In this paper we propose a method that leverages temporal context from the unlabeled frames of a novel camera to improve performance at that camera.
1299	Mixture Dense Regression for Object Detection and Human Pose Estimation	Ali Varamesh; Tinne Tuytelaars;	To this end, we devise a framework for spatial regression using mixture density networks.
1300	Syntax-Aware Action Targeting for Video Captioning	Qi Zheng; Chaoyue Wang; Dacheng Tao;	Specifically, we propose a Syntax-Aware Action Targeting (SAAT) module that first builds a self-attended scene representation to draw global dependence among multiple objects within a scene, and then decodes the visually related syntax components by setting different queries.
1301	Learning Visual Emotion Representations From Web Data	Zijun Wei; Jianming Zhang; Zhe Lin; Joon-Young Lee; Niranjan Balasubramanian; Minh Hoai; Dimitris Samaras;	We present a scalable approach for learning powerful visual features for emotion recognition.
1302	The Edge of Depth: Explicit Constraints Between Segmentation and Depth	Shengjie Zhu; Garrick Brazil; Xiaoming Liu;	In this work, we study the mutual benefits of two common computer vision tasks: self-supervised depth estimation and semantic segmentation from images.
1303	A Context-Aware Loss Function for Action Spotting in Soccer Videos	Anthony Cioppa; Adrien Deliege; Silvio Giancola; Bernard Ghanem; Marc Van Droogenbroeck; Rikke Gade; Thomas B. Moeslund;	In this paper, we propose a novel loss function that specifically considers the temporal context naturally present around each action, rather than focusing on the single annotated frame to spot.
1304	Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training	Weituo Hao; Chunyuan Li; Xiujun Li; Lawrence Carin; Jianfeng Gao;	In this paper, we present the first pre-training and fine-tuning paradigm for vision-and-language navigation (VLN) tasks.
1305	Video Instance Segmentation Tracking With a Modified VAE Architecture	Chung-Ching Lin; Ying Hung; Rogerio Feris; Linglin He;	We propose a modified variational autoencoder (VAE) architecture built on top of Mask R-CNN for instance-level video segmentation and tracking.
1306	Deformation-Aware Unpaired Image Translation for Pose Estimation on Laboratory Animals	Siyuan Li; Semih Gunel; Mirela Ostrek; Pavan Ramdya; Pascal Fua; Helge Rhodin;	Our goal is to capture the pose of real animals using synthetic training examples, without using any manual supervision.
1307	ZeroQ: A Novel Zero Shot Quantization Framework	Yaohui Cai; Zhewei Yao; Zhen Dong; Amir Gholami; Michael W. Mahoney; Kurt Keutzer;	Here, we propose ZeroQ, a novel zero-shot quantization framework to address this.
1308	Disparity-Aware Domain Adaptation in Stereo Image Restoration	Bo Yan; Chenxi Ma; Bahetiyaer Bare; Weimin Tan; Steven C. H. Hoi;	Towards this end, this paper analyses how to effectively explore disparity information, and proposes a unified stereo image restoration framework.
1309	Offset Bin Classification Network for Accurate Object Detection	Heqian Qiu; Hongliang Li; Qingbo Wu; Hengcan Shi;	In this paper, we propose an offset bin classification network optimized with cross entropy loss to predict more accurate offsets.
1310	TBT: Targeted Neural Network Attack With Bit Trojan	Adnan Siraj Rakin; Zhezhi He; Deliang Fan;	In this work, for the first time, we propose a novel Targeted Bit Trojan (TBT) method, which can insert a targeted neural Trojan into a DNN through a bit-flip attack.
1311	Maintaining Discrimination and Fairness in Class Incremental Learning	Bowen Zhao; Xi Xiao; Guojun Gan; Bin Zhang; Shu-Tao Xia;	In this paper, we propose a simple and effective solution motivated by the aforementioned observations to address catastrophic forgetting.
1312	Background Data Resampling for Outlier-Aware Classification	Yi Li; Nuno Vasconcelos;	The problem of learning an image classifier that allows detection of out-of-distribution (OOD) examples, with the help of auxiliary background datasets, is studied.
1313	STEFANN: Scene Text Editor Using Font Adaptive Neural Network	Prasun Roy; Saumik Bhattacharya; Subhankar Ghosh; Umapada Pal;	In this paper, we propose a method to modify text in an image at character-level.
1314	Geometry and Learning Co-Supported Normal Estimation for Unstructured Point Cloud	Haoran Zhou; Honghua Chen; Yidan Feng; Qiong Wang; Jing Qin; Haoran Xie; Fu Lee Wang; Mingqiang Wei; Jun Wang;	In this paper, we propose a normal estimation method for unstructured point clouds.
1315	Sequential Motif Profiles and Topological Plots for Offline Signature Verification	Elias N. Zois; Evangelos Zervas; Dimitrios Tsourounis; George Economou;	In this paper, inspired by the recent use of image visibility graphs for mapping images into networks, we introduce, for the first time in the offline signature verification (SV) literature, their use as a parameter-free, agnostic representation for exploring global as well as local information.
1316	Optical Flow in Dense Foggy Scenes Using Semi-Supervised Learning	Wending Yan; Aashish Sharma; Robby T. Tan;	To address the problem, we introduce a semi-supervised deep learning technique that employs real fog images without optical flow ground-truths in the training process.
1317	A Spatial RNN Codec for End-to-End Image Compression	Chaoyi Lin; Jiabao Yao; Fangdong Chen; Li Wang;	In this paper, we propose a fast yet effective method for end-to-end image compression by incorporating a novel spatial recurrent neural network.
1318	Object Relational Graph With Teacher-Recommended Learning for Video Captioning	Ziqi Zhang; Yaya Shi; Chunfeng Yuan; Bing Li; Peijin Wang; Weiming Hu; Zheng-Jun Zha;	In this paper, we propose a complete video captioning system including both a novel model and an effective training strategy.
1319	MMTM: Multimodal Transfer Module for CNN Fusion	Hamid Reza Vaezi Joze; Amirreza Shaban; Michael L. Iuzzolino; Kazuhito Koishida;	In this paper, we present a simple neural network module for leveraging the knowledge from multiple modalities in convolutional neural networks.
1320	Generalized Zero-Shot Learning via Over-Complete Distribution	Rohit Keshari; Richa Singh; Mayank Vatsa;	To learn a discriminative classifier that yields good performance in Zero-Shot Learning (ZSL) settings, we propose to generate an Over-Complete Distribution (OCD) of both seen and unseen classes using a Conditional Variational Autoencoder (CVAE).
1321	Gait Recognition via Semi-supervised Disentangled Representation Learning to Identity and Covariate Features	Xiang Li; Yasushi Makihara; Chi Xu; Yasushi Yagi; Mingwu Ren;	We therefore propose a method of gait recognition via disentangled representation learning that considers both identity and covariate features.
1322	Unifying Training and Inference for Panoptic Segmentation	Qizhu Li; Xiaojuan Qi; Philip H.S. Torr;	We present an end-to-end network to bridge the gap between training and inference pipeline for panoptic segmentation, a task that seeks to partition an image into semantic regions for "stuff" and object instances for "things".
1323	Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection	Liang Du; Xiaoqing Ye; Xiao Tan; Jianfeng Feng; Zhenbo Xu; Errui Ding; Shilei Wen;	In this paper, we propose a domain-adaptation-like approach to enhance the robustness of the feature representation.
1324	Interactive Image Segmentation With First Click Attention	Zheng Lin; Zhao Zhang; Lin-Zhuo Chen; Ming-Ming Cheng; Shao-Ping Lu;	In this paper, we demonstrate the critical role of the first click in providing the location and main body information of the target object.
1325	NETNet: Neighbor Erasing and Transferring Network for Better Single Shot Object Detection	Yazhao Li; Yanwei Pang; Jianbing Shen; Jiale Cao; Ling Shao;	With this observation, we propose a new Neighbor Erasing and Transferring (NET) mechanism to reconfigure the pyramid features and explore scale-aware features.
1326	Scale-Equalizing Pyramid Convolution for Object Detection	Xinjiang Wang; Shilong Zhang; Zhuoran Yu; Litong Feng; Wayne Zhang;	Inspired by this, this study proposes a convolution across pyramid levels, termed pyramid convolution, which is a modified 3-D convolution.
1327	Learning to Cluster Faces via Confidence and Connectivity Estimation	Lei Yang; Dapeng Chen; Xiaohang Zhan; Rui Zhao; Chen Change Loy; Dahua Lin;	In this paper, we propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs.
1328	Cross-Modality Person Re-Identification With Shared-Specific Feature Transfer	Yan Lu; Yue Wu; Bin Liu; Tianzhu Zhang; Baopu Li; Qi Chu; Nenghai Yu;	In this paper, we tackle the above limitation by proposing a novel cross-modality shared-specific feature transfer algorithm (termed cm-SSFT) to explore the potential of both the modality-shared information and the modality-specific characteristics to boost re-identification performance.
1329	DPGN: Distribution Propagation Graph Network for Few-Shot Learning	Ling Yang; Liangliang Li; Zilun Zhang; Xinyu Zhou; Erjin Zhou; Yu Liu;	We propose a novel approach named distribution propagation graph network (DPGN) for few-shot learning.
1330	Density-Aware Graph for Deep Semi-Supervised Visual Recognition	Suichan Li; Bin Liu; Dongdong Chen; Qi Chu; Lu Yuan; Nenghai Yu;	Motivated by these limitations, this paper proposes to solve the SSL problem by building a novel density-aware graph, based on which the neighborhood information can be easily leveraged and the feature learning and label propagation can also be trained in an end-to-end way.
1331	Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation	Moab Arar; Yiftach Ginger; Dov Danon; Amit H. Bermano; Daniel Cohen-Or;	In this work, we bypass the difficulties of developing cross-modality similarity measures, by training an image-to-image translation network on the two input modalities.
1332	Binarizing MobileNet via Evolution-Based Searching	Hai Phan; Zechun Liu; Dang Huynh; Marios Savvides; Kwang-Ting Cheng; Zhiqiang Shen;	In this paper, we propose using evolutionary search to facilitate the construction and training scheme when binarizing MobileNet, a compact network with depth-wise separable convolutions.
1333	Temporal-Context Enhanced Detection of Heavily Occluded Pedestrians	Jialian Wu; Chunluan Zhou; Ming Yang; Qian Zhang; Yuan Li; Junsong Yuan;	In this paper, we exploit the local temporal context of pedestrians in videos and propose a tube feature aggregation network (TFAN) aiming at enhancing pedestrian detectors against severe occlusions.
1334	Orderless Recurrent Models for Multi-Label Classification	Vacit Oguz Yazici; Abel Gonzalez-Garcia; Arnau Ramisa; Bartlomiej Twardowski; Joost van de Weijer;	Therefore, in this paper, we propose ways to dynamically order the ground truth labels with the predicted label sequence.
1335	Gold Seeker: Information Gain From Policy Distributions for Goal-Oriented Vision-and-Langauge Reasoning	Ehsan Abbasnejad; Iman Abbasnejad; Qi Wu; Javen Shi; Anton van den Hengel;	We propose a reinforcement-learning approach that maintains a distribution over its internal information, thus explicitly representing the ambiguity in what it knows, and needs to know, towards achieving its goal.
1336	Rethinking the Route Towards Weakly Supervised Object Localization	Chen-Lin Zhang; Yun-Hao Cao; Jianxin Wu;	In this paper, we demonstrate that weakly supervised object localization should be divided into two parts: class-agnostic object localization and object classification.
1337	Adversarial Feature Hallucination Networks for Few-Shot Learning	Kai Li; Yulun Zhang; Kunpeng Li; Yun Fu;	In this paper, we propose Adversarial Feature Hallucination Networks (AFHN) which is based on conditional Wasserstein Generative Adversarial networks (cWGAN) and hallucinates diverse and discriminative features conditioned on the few labeled samples.
1338	Conditional Gaussian Distribution Learning for Open Set Recognition	Xin Sun; Zhenning Yang; Chi Zhang; Keck-Voon Ling; Guohao Peng;	In this paper, we propose a novel method, Conditional Gaussian Distribution Learning (CGDL), for open set recognition.
1339	Connect-and-Slice: An Hybrid Approach for Reconstructing 3D Objects	Hao Fang; Florent Lafarge;	In this paper, we address this issue with a hybrid method that successively connects and slices planes detected from 3D data.
1340	Attentive Weights Generation for Few Shot Learning via Information Maximization	Yiluan Guo; Ngai-Man Cheung;	In this work, we present Attentive Weights Generation for few shot learning via Information Maximization (AWGIM), which introduces two novel contributions. The first is mutual information maximization between the generated weights and the data within the task, which enables the generated weights to retain information about the task and the specific query sample.
1341	Assessing Eye Aesthetics for Automatic Multi-Reference Eye In-Painting	Bo Yan; Qing Lin; Weimin Tan; Shili Zhou;	In this paper, aesthetic assessment is introduced into the eye in-painting task for the first time. We construct an eye aesthetic dataset and train an eye aesthetic assessment network on it.
1342	PuppeteerGAN: Arbitrary Portrait Animation With Semantic-Aware Appearance Transformation	Zhuo Chen; Chaoyue Wang; Bo Yuan; Dacheng Tao;	In this paper, we devise a novel two-stage framework called PuppeteerGAN to solve these challenges.
1343	SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition	Zhi Qiao; Yu Zhou; Dongbao Yang; Yucan Zhou; Weiping Wang;	In this work, we propose a semantics enhanced encoder-decoder framework to robustly recognize low-quality scene texts.
1344	Texture and Shape Biased Two-Stream Networks for Clothing Classification and Attribute Recognition	Yuwei Zhang; Peng Zhang; Chun Yuan; Zhi Wang;	To this end, we propose to use two streams to enhance the extraction of shape and texture, respectively.
1345	Distortion Agnostic Deep Watermarking	Xiyang Luo; Ruohan Zhan; Huiwen Chang; Feng Yang; Peyman Milanfar;	In this paper, we propose a new framework for distortion-agnostic watermarking, where the image distortion is not explicitly modeled during training.
1346	RMP-SNN: Residual Membrane Potential Neuron for Enabling Deeper High-Accuracy and Low-Latency Spiking Neural Network	Bing Han; Gopalakrishnan Srinivasan; Kaushik Roy;	We propose ANN-SNN conversion using a "soft reset" spiking neuron model, referred to as the Residual Membrane Potential (RMP) spiking neuron, which retains the "residual" membrane potential above threshold at the firing instants.
1347	BFBox: Searching Face-Appropriate Backbone and Feature Pyramid Network for Face Detector	Yang Liu; Xu Tang;	To resolve this, inspired by the success of Neural Architecture Search (NAS), we search for a face-appropriate backbone and feature pyramid network (FPN) architecture.
1348	PFCNN: Convolutional Neural Networks on 3D Surfaces Using Parallel Frames	Yuqi Yang; Shilin Liu; Hao Pan; Yang Liu; Xin Tong;	We use parallel frames on surfaces to define PFCNNs, which enable effective feature learning on surface meshes by faithfully mimicking standard convolutions.
1349	iTAML: An Incremental Task-Agnostic Meta-learning Approach	Jathushan Rajasegaran; Salman Khan; Munawar Hayat; Fahad Shahbaz Khan; Mubarak Shah;	In this pursuit, we introduce a novel meta-learning approach that seeks to maintain an equilibrium between all the encountered tasks.
1350	Optimal least-squares solution to the hand-eye calibration problem	Amit Dekel; Linus Harenstam-Nielsen; Sergio Caccamo;	We propose a least-squares formulation of the noisy hand-eye calibration problem using dual quaternions, and introduce efficient algorithms that find the exact optimal solution based on analytic properties of the problem, avoiding non-linear optimization.
1351	MnasFPN: Learning Latency-Aware Pyramid Architecture for Object Detection on Mobile Devices	Bo Chen; Golnaz Ghiasi; Hanxiao Liu; Tsung-Yi Lin; Dmitry Kalenichenko; Hartwig Adam; Quoc V. Le;	We propose MnasFPN, a mobile-friendly search space for the detection head, and combine it with latency-aware architecture search to produce efficient object detection models.
1352	VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions	Oytun Ulutan; A S M Iftekhar; B. S. Manjunath;	VSGNet extracts visual features from the human-object pairs, refines the features with spatial configurations of the pair, and utilizes the structural connections between the pair via graph convolutions.
1353	End-to-End Camera Calibration for Broadcast Videos	Long Sha; Jennifer Hobbs; Panna Felsen; Xinyu Wei; Patrick Lucey; Sujoy Ganguly;	In this paper, we propose an end-to-end approach for single moving camera calibration across challenging scenarios in sports.
1354	Regularizing CNN Transfer Learning With Randomised Regression	Yang Zhong; Atsuto Maki;	This paper is about regularizing deep convolutional networks (CNNs) based on an adaptive framework for transfer learning with limited training data in the target domain.
1355	KeypointNet: A Large-Scale 3D Keypoint Dataset Aggregated From Numerous Human Annotations	Yang You; Yujing Lou; Chengkun Li; Zhoujun Cheng; Liangwei Li; Lizhuang Ma; Cewu Lu; Weiming Wang;	To handle the inconsistency between annotations from different people, we propose a novel method to aggregate these keypoints automatically, through minimization of a fidelity loss.
1356	Hierarchical Clustering With Hard-Batch Triplet Loss for Person Re-Identification	Kaiwei Zeng; Munan Ning; Yaohua Wang; Yang Guo;	In order to improve the quality of pseudo labels in existing methods, we propose the HCT method which combines hierarchical clustering with hard-batch triplet loss.
1357	Joint Semantic Segmentation and Boundary Detection Using Iterative Pyramid Contexts	Mingmin Zhen; Jinglu Wang; Lei Zhou; Shiwei Li; Tianwei Shen; Jiaxiang Shang; Tian Fang; Long Quan;	In this paper, we present a joint multi-task learning framework for semantic segmentation and boundary detection.
1358	Attention-Guided Hierarchical Structure Aggregation for Image Matting	Yu Qiao; Yuhao Liu; Xin Yang; Dongsheng Zhou; Mingliang Xu; Qiang Zhang; Xiaopeng Wei;	In this paper, we propose an end-to-end Hierarchical Attention Matting Network (HAttMatting), which can predict alpha mattes with better structure from single RGB images without additional input.
1359	MetaFuse: A Pre-trained Fusion Model for Human Pose Estimation	Rongchang Xie; Chunyu Wang; Yizhou Wang;	In this work, we introduce MetaFuse, a pre-trained fusion model learned from a large number of cameras in the Panoptic dataset.
1360	Prior Guided GAN Based Semantic Inpainting	Avisek Lahiri; Arnav Kumar Jain; Sanskar Agrawal; Pabitra Mitra; Prabir Kumar Biswas;	In this paper, going against the general trend, we focus on the second paradigm of inpainting and address both of its mentioned problems.
1361	Weakly Supervised Semantic Point Cloud Segmentation: Towards 10x Fewer Labels	Xun Xu; Gim Hee Lee;	In this work, we propose a weakly supervised point cloud segmentation approach which requires only a tiny fraction of points to be labelled in the training stage.
1362	Physically Realizable Adversarial Examples for LiDAR Object Detection	James Tu; Mengye Ren; Sivabalan Manivasagam; Ming Liang; Bin Yang; Richard Du; Frank Cheng; Raquel Urtasun;	In this paper, we address this issue and present a method to generate universal 3D adversarial objects to fool LiDAR detectors.
1363	Combating Noisy Labels by Agreement: A Joint Training Method with Co-Regularization	Hongxin Wei; Lei Feng; Xiangyu Chen; Bo An;	In this paper, we start from a different perspective and propose a robust learning paradigm called JoCoR, which aims to reduce the diversity of two networks during training.
1364	Light-weight Calibrator: A Separable Component for Unsupervised Domain Adaptation	Shaokai Ye; Kailu Wu; Mu Zhou; Yunfei Yang; Sia Huat Tan; Kaidi Xu; Jiebo Song; Chenglong Bao; Kaisheng Ma;	In this work, instead of training a classifier to adapt to the target domain, we use a separable component called data calibrator to help the fixed source classifier recover discrimination power in the target domain, while preserving the source domain’s performance.
1365	Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition	Canjie Luo; Yuanzhi Zhu; Lianwen Jin; Yongpan Wang;	In this paper, we propose a new method for text image augmentation.
1366	Learning Selective Self-Mutual Attention for RGB-D Saliency Detection	Nian Liu; Ni Zhang; Junwei Han;	In this paper, we propose to fuse attention learned in both modalities.
1367	Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation	Yangtao Zheng; Di Huang; Songtao Liu; Yunhong Wang;	To address such an issue, this paper proposes a novel coarse-to-fine feature adaptation approach to cross-domain object detection.
1368	Estimating Low-Rank Region Likelihood Maps	Gabriela Csurka; Zoltan Kato; Andor Juhasz; Martin Humenberger;	Herein, we propose a novel self-supervised low-rank region detection deep network that predicts a low-rank likelihood map from an image.
1369	Neural Head Reenactment with Latent Pose Descriptors	Egor Burkov; Igor Pasechnik; Artur Grigorev; Victor Lempitsky;	We propose a neural head reenactment system, which is driven by a latent pose representation and is capable of predicting the foreground segmentation alongside the RGB image.
1370	Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis	K R Prajwal; Rudrabha Mukhopadhyay; Vinay P. Namboodiri; C.V. Jawahar;	In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker.
1371	Self-Supervised Learning of Video-Induced Visual Invariances	Michael Tschannen; Josip Djolonga; Marvin Ritter; Aravindh Mahendran; Neil Houlsby; Sylvain Gelly; Mario Lucic;	We propose a general framework for self-supervised learning of transferable visual representations based on Video-Induced Visual Invariances (VIVI).
1372	Two-Stage Peer-Regularized Feature Recombination for Arbitrary Image Style Transfer	Jan Svoboda; Asha Anoosheh; Christian Osendorfer; Jonathan Masci;	This paper introduces a neural style transfer model to generate a stylized image conditioned on a set of examples describing the desired style.
1373	MINA: Convex Mixed-Integer Programming for Non-Rigid Shape Alignment	Florian Bernard; Zeeshan Khan Suri; Christian Theobalt;	To this end, we propose a novel shape deformation model based on an efficient low-dimensional discrete model, so that finding a globally optimal solution is tractable in (most) practical cases.
1374	Improving One-Shot NAS by Suppressing the Posterior Fading	Xiang Li; Chen Lin; Chuming Li; Ming Sun; Wei Wu; Junjie Yan; Wanli Ouyang;	In this paper, we analyse existing weight sharing one-shot NAS approaches from a Bayesian point of view and identify the Posterior Fading problem, which compromises the effectiveness of shared weights.
1375	Incremental Few-Shot Object Detection	Juan-Manuel Perez-Rua; Xiatian Zhu; Timothy M. Hospedales; Tao Xiang;	We present the first study aiming to go beyond these limitations by considering the Incremental Few-Shot Detection (iFSD) problem setting, where new classes must be registered incrementally (without revisiting base classes) and with few examples.
1376	Synthetic Learning: Learn From Distributed Asynchronized Discriminator GAN Without Sharing Medical Image Data	Qi Chang; Hui Qu; Yikai Zhang; Mert Sabuncu; Chao Chen; Tong Zhang; Dimitris N. Metaxas;	In this paper, we propose a data privacy-preserving and communication efficient distributed GAN learning framework named Distributed Asynchronized Discriminator GAN (AsynDGAN).
1377	Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation	Yingwei Pan; Ting Yao; Yehao Li; Chong-Wah Ngo; Tao Mei;	In this paper, we address this problem by augmenting the state-of-the-art domain adaptation technique, Self-Ensembling, with category-agnostic clusters in target domain.
1378	Regularizing Class-Wise Predictions via Self-Knowledge Distillation	Sukmin Yun; Jongjin Park; Kimin Lee; Jinwoo Shin;	To mitigate the issue, we propose a new regularization method that penalizes the discrepancy between the predictive distributions of similar samples.
1379	Hierarchical Graph Attention Network for Visual Relationship Detection	Li Mi; Zhenzhong Chen;	In this work, a Hierarchical Graph Attention Network (HGAT) is proposed to capture dependencies at both the object level and the triplet level.
1380	M2m: Imbalanced Classification via Major-to-Minor Translation	Jaehyung Kim; Jongheon Jeong; Jinwoo Shin;	In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples (e.g., images) from more-frequent classes.
1381	CenterMask: Real-Time Anchor-Free Instance Segmentation	Youngwan Lee; Jongyoul Park;	We propose a simple yet efficient anchor-free instance segmentation method, called CenterMask, that adds a novel spatial attention-guided mask (SAG-Mask) branch to the anchor-free one-stage object detector FCOS, in the same vein as Mask R-CNN.
1382	Multi-Path Learning for Object Pose Estimation Across Domains	Martin Sundermeyer; Maximilian Durner; En Yen Puang; Zoltan-Csaba Marton; Narunas Vaskevicius; Kai O. Arras; Rudolph Triebel;	We introduce a scalable approach for object pose estimation trained on simulated RGB views of multiple 3D models together.
1383	Incremental Learning in Online Scenario	Jiangpeng He; Runyu Mao; Zeman Shao; Fengqing Zhu;	In this paper, we propose an incremental learning framework that can work in the challenging online learning scenario and handle both data from new classes and new observations of old classes.
1384	Enhanced Transport Distance for Unsupervised Domain Adaptation	Mengxue Li; Yi-Ming Zhai; You-Wei Luo; Peng-Fei Ge; Chuan-Xian Ren;	In this work, we propose an enhanced transport distance (ETD) for UDA.
1385	TESA: Tensor Element Self-Attention via Matricization	Francesca Babiloni; Ioannis Marras; Gregory Slabaugh; Stefanos Zafeiriou;	In this paper, we introduce a new method, called Tensor Element Self-Attention (TESA) that generalizes such work to capture interdependencies along all dimensions of the tensor using matricization.
1386	Training a Steerable CNN for Guidewire Detection	Donghang Li; Adrian Barbu;	In this paper, we present a steerable Convolutional Neural Network (CNN), which is a Fully Convolutional Neural Network (FCNN) that can detect objects rotated by an arbitrary 2D angle, without being rotation invariant.
1387	Superpixel Segmentation With Fully Convolutional Networks	Fengting Yang; Qian Sun; Hailin Jin; Zihan Zhou;	Inspired by an initialization strategy commonly adopted by traditional superpixel algorithms, we present a novel method that employs a simple fully convolutional network to predict superpixels on a regular image grid.
1388	SharinGAN: Combining Synthetic and Real Data for Unsupervised Geometry Estimation	Koutilya PNVR; Hao Zhou; David Jacobs;	We propose a novel method for combining synthetic and real images when training networks to determine geometric information from a single image.
1389	Label Distribution Learning on Auxiliary Label Space Graphs for Facial Expression Recognition	Shikai Chen; Jianfeng Wang; Yuedong Chen; Zhongchao Shi; Xin Geng; Yong Rui;	To solve the problem, we propose a novel approach named Label Distribution Learning on Auxiliary Label Space Graphs (LDL-ALSG) that leverages the topological information of the labels from related but more distinct tasks, such as action unit recognition and facial landmark detection.
1390	Deep Residual Flow for Out of Distribution Detection	Ev Zisselman; Aviv Tamar;	In this work, we present a novel approach that improves upon the state-of-the-art by leveraging an expressive density model based on normalizing flows.
1391	FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation	Shurui Gui; Chaoyue Wang; Qihua Chen; Dacheng Tao;	In this work, we devise a novel structure-to-texture generation framework which splits the video interpolation task into two stages: structure-guided interpolation and texture refinement.
1392	Learning Nanoscale Motion Patterns of Vesicles in Living Cells	Arif Ahmed Sekh; Ida Sundvor Opstad; Asa Birna Birgisdottir; Truls Myrmel; Balpreet Singh Ahluwalia; Krishna Agarwal; Dilip K. Prasad;	We propose an integrative approach, built upon physics-based simulations, nanoscopy algorithms, and a shallow residual attention network, that makes it possible, for the first time, to analyze sub-resolution motion patterns in vesicles that may also be of sub-resolution diameter.
1393	Improving Action Segmentation via Graph-Based Temporal Reasoning	Yifei Huang; Yusuke Sugano; Yoichi Sato;	In this paper, we propose a network module called Graph-based Temporal Reasoning Module (GTRM) that can be built on top of existing action segmentation models to learn the relation of multiple action segments in various time spans.
1394	Episode-Based Prototype Generating Network for Zero-Shot Learning	Yunlong Yu; Zhong Ji; Jungong Han; Zhongfei Zhang;	We introduce a simple yet effective episode-based training framework for zero-shot learning (ZSL), where the learning system is required to recognize unseen classes given only the corresponding class semantics.
1395	Learning to Segment the Tail	Xinting Hu; Yi Jiang; Kaihua Tang; Jingyuan Chen; Chunyan Miao; Hanwang Zhang;	We propose a "divide-and-conquer" strategy for the challenging LVIS task: divide the whole data into balanced parts and then apply incremental learning to conquer each one.
1396	Learning to Evaluate Perception Models Using Planner-Centric Metrics	Jonah Philion; Amlan Kar; Sanja Fidler;	In this paper, we propose a principled metric for 3D object detection specifically for the task of self-driving.
1397	Where, What, Whether: Multi-Modal Learning Meets Pedestrian Detection	Yan Luo; Chongyang Zhang; Muming Zhao; Hao Zhou; Jun Sun;	In this paper, we propose W^3Net, which attempts to address the above challenges by decomposing the pedestrian detection task into Where, What and Whether problems, addressing pedestrian localization, scale prediction and classification, respectively.
1398	CoverNet: Multimodal Behavior Prediction Using Trajectory Sets	Tung Phan-Minh; Elena Corina Grigore; Freddy A. Boulton; Oscar Beijbom; Eric M. Wolff;	We present CoverNet, a new method for multimodal, probabilistic trajectory prediction for urban driving.
1399	Real-World Person Re-Identification via Degradation Invariance Learning	Yukun Huang; Zheng-Jun Zha; Xueyang Fu; Richang Hong; Liang Li;	In this paper, to solve the above problem, we propose a degradation invariance learning framework for real-world person Re-ID.
1400	Defending and Harnessing the Bit-Flip Based Adversarial Weight Attack	Zhezhi He; Adnan Siraj Rakin; Jingtao Li; Chaitali Chakrabarti; Deliang Fan;	In this work, we conduct comprehensive investigations on the bit-flip attack (BFA) and propose to leverage binarization-aware training and its relaxation, piece-wise clustering, as simple and effective countermeasures to BFA.
1401	Adversarial Latent Autoencoders	Stanislav Pidhorskyi; Donald A. Adjeroh; Gianfranco Doretto;	We introduce an autoencoder that tackles these issues jointly, which we call Adversarial Latent Autoencoder (ALAE).
1402	Adaptive Fractional Dilated Convolution Network for Image Aesthetics Assessment	Qiuyu Chen; Wei Zhang; Ning Zhou; Peng Lei; Yi Xu; Yu Zheng; Jianping Fan;	In this paper, an adaptive fractional dilated convolution (AFDC), which is aspect-ratio-embedded, composition-preserving and parameter-free, is developed to tackle this issue natively at the convolutional kernel level.
1403	Deep Generative Model for Robust Imbalance Classification	Xinyue Wang; Yilin Lyu; Liping Jing;	In this paper, a deep generative classifier is proposed to mitigate this issue via both data perturbation and model perturbation.
1404	Learning Deep Network for Detecting 3D Object Keypoints and 6D Poses	Wanqing Zhao; Shaobo Zhang; Ziyu Guan; Wei Zhao; Jinye Peng; Jianping Fan;	In this paper, we develop a keypoint-based 6D object pose detection method (and its deep network) called Object Keypoint based POSe Estimation (OK-POSE).
1405	MetaIQA: Deep Meta-Learning for No-Reference Image Quality Assessment	Hancheng Zhu; Leida Li; Jinjian Wu; Weisheng Dong; Guangming Shi;	With this motivation, this paper presents a no-reference IQA metric based on deep meta-learning.
1406	Sketchformer: Transformer-Based Representation for Sketched Structure	Leo Sampaio Ferraz Ribeiro; Tu Bui; John Collomosse; Moacir Ponti;	Sketchformer is a novel transformer-based representation for encoding free-hand sketches given in vector form, i.e., as a sequence of strokes.
1407	Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation	Sunghun Joung; Seungryong Kim; Hanjae Kim; Minsu Kim; Ig-Jae Kim; Junghyun Cho; Kwanghoon Sohn;	To overcome this limitation, we introduce a learnable module, cylindrical convolutional networks (CCNs), that exploit cylindrical representation of a convolutional kernel defined in the 3D space.
1408	Learning a Unified Sample Weighting Network for Object Detection	Qi Cai; Yingwei Pan; Yu Wang; Jingen Liu; Ting Yao; Tao Mei;	To this end, we devise a general loss function to cover most region-based object detectors with various sampling strategies, and then based on it we propose a unified sample weighting network to predict a sample’s task weights.
1409	Old Is Gold: Redefining the Adversarially Learned One-Class Classifier Training Paradigm	Muhammad Zaigham Zaheer; Jin-Ha Lee; Marcella Astrid; Seung-Ik Lee;	In this study, we propose a framework that effectively generates stable results across a wide range of training steps and allows us to use both the generator and the discriminator of an adversarial model for efficient and robust anomaly detection.
1410	An Adaptive Neural Network for Unsupervised Mosaic Consistency Analysis in Image Forensics	Quentin Bammey; Rafael Grompone von Gioi; Jean-Michel Morel;	In this paper we develop a blind method that can train directly on unlabelled and potentially forged images to point out local mosaic inconsistencies.
1411	MCFlow: Monte Carlo Flow Models for Data Imputation	Trevor W. Richardson; Wencheng Wu; Lei Lin; Beilei Xu; Edgar A. Bernal;	To that end, we propose MCFlow, a deep framework for imputation that leverages normalizing flow generative models and Monte Carlo sampling.
1412	Learning to See Through Obstructions	Yu-Lun Liu; Wei-Sheng Lai; Ming-Hsuan Yang; Yung-Yu Chuang; Jia-Bin Huang;	We present a learning-based approach for removing unwanted obstructions, such as window reflections, fence occlusions or raindrops, from a short sequence of images captured by a moving camera.
1413	GaitPart: Temporal Part-Based Model for Gait Recognition	Chao Fan; Yunjie Peng; Chunshui Cao; Xu Liu; Saihui Hou; Jiannan Chi; Yongzhen Huang; Qing Li; Zhiqiang He;	We propose a novel part-based model, GaitPart, which boosts performance in two ways. On the one hand, the Focal Convolution Layer, a new application of convolution, is presented to enhance the fine-grained learning of part-level spatial features. On the other hand, the Micro-motion Capture Module (MCM) is proposed; GaitPart contains several parallel MCMs, each corresponding to one of the pre-defined parts of the human body.
1414	EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege’s Principle	Trisha Mittal; Pooja Guhan; Uttaran Bhattacharya; Rohan Chandra; Aniket Bera; Dinesh Manocha;	We present EmotiCon, a learning-based algorithm for context-aware perceived human emotion recognition from videos and images. We also introduce a new dataset, GroupWalk, which is a collection of videos captured in multiple real-world settings of people walking.
1415	Can Deep Learning Recognize Subtle Human Activities?	Vincent Jacquot; Zhuofan Ying; Gabriel Kreiman;	In this work, we propose a new action classification challenge on which humans perform well but state-of-the-art deep learning models perform poorly.
1416	PhysGAN: Generating Physical-World-Resilient Adversarial Examples for Autonomous Driving	Zelun Kong; Junfeng Guo; Ang Li; Cong Liu;	We present PhysGAN, which generates physical-world-resilient adversarial examples for misleading autonomous driving systems in a continuous manner.
1417	ILFO: Adversarial Attack on Adaptive Neural Networks	Mirazul Haque; Anki Chauhan; Cong Liu; Wei Yang;	In this paper, we investigate the robustness of neural networks against energy-oriented attacks.
1418	On Translation Invariance in CNNs: Convolutional Layers Can Exploit Absolute Spatial Location	Osman Semih Kayhan; Jan C. van Gemert;	In this paper we challenge the common assumption that convolutional layers in modern CNNs are translation invariant.
1419	Diverse Image Generation via Self-Conditioned GANs	Steven Liu; Tongzhou Wang; David Bau; Jun-Yan Zhu; Antonio Torralba;	We introduce a simple but effective unsupervised method for generating diverse images.
1420	Inducing Hierarchical Compositional Model by Sparsifying Generator Network	Xianglei Xing; Tianfu Wu; Song-Chun Zhu; Ying Nian Wu;	This paper proposes to learn a hierarchical compositional AND-OR model for interpretable image synthesis by sparsifying the generator network.
1421	CARP: Compression Through Adaptive Recursive Partitioning for Multi-Dimensional Images	Rongjie Liu; Meng Li; Li Ma;	We present a method for multi-dimensional image compression called Compression via Adaptive Recursive Partitioning (CARP).
1422	GrappaNet: Combining Parallel Imaging With Deep Learning for Multi-Coil MRI Reconstruction	Anuroop Sriram; Jure Zbontar; Tullie Murrell; C. Lawrence Zitnick; Aaron Defazio; Daniel K. Sodickson;	In this paper, we present a novel method to integrate traditional parallel imaging methods into deep neural networks that is able to generate high quality reconstructions even for high acceleration factors.
1423	Can Weight Sharing Outperform Random Architecture Search? An Investigation With TuNAS	Gabriel Bender; Hanxiao Liu; Bo Chen; Grace Chu; Shuyang Cheng; Pieter-Jan Kindermans; Quoc V. Le;	While the efficacies of both methods are problem-dependent, our experiments demonstrate that there are large, realistic tasks where efficient search methods can provide substantial gains over random search.
1424	Context Aware Graph Convolution for Skeleton-Based Action Recognition	Xikun Zhang; Chang Xu; Dacheng Tao;	In this paper, we propose a context aware graph convolutional network (CA-GCN).
1425	Fast(er) Reconstruction of Shredded Text Documents via Self-Supervised Deep Asymmetric Metric Learning	Thiago M. Paixao; Rodrigo F. Berriel; Maria C. S. Boeres; Alessandro L. Koerich; Claudine Badue; Alberto F. De Souza; Thiago Oliveira-Santos;	This work proposes a scalable deep learning approach for measuring pairwise compatibility in which the number of inferences scales linearly (rather than quadratically) with the number of shreds.
1426	Revisiting Pose-Normalization for Fine-Grained Few-Shot Recognition	Luming Tang; Davis Wertheimer; Bharath Hariharan;	A solution is to use pose-normalized representations: first localize semantic parts in each image, and then describe images by characterizing the appearance of each part. While such representations are out of favor for fully supervised classification, we show that they are extremely effective for few-shot fine-grained classification.
1427	RankMI: A Mutual Information Maximizing Ranking Loss	Mete Kemertas; Leila Pishdad; Konstantinos G. Derpanis; Afsaneh Fazly;	We introduce an information-theoretic loss function, RankMI, and an associated training algorithm for deep representation learning for image retrieval.
1428	Learning Memory-Guided Normality for Anomaly Detection	Hyunjong Park; Jongyoun Noh; Bumsub Ham;	To address this problem, we present an unsupervised learning approach to anomaly detection that considers the diversity of normal patterns explicitly, while lessening the representation capacity of CNNs.
1429	Appearance Shock Grammar for Fast Medial Axis Extraction From Real Images	Charles-Olivier Dufresne Camaro; Morteza Rezanejad; Stavros Tsogkas; Kaleem Siddiqi; Sven Dickinson;	We combine ideas from shock graph theory with more recent appearance-based methods for medial axis extraction from complex natural scenes, improving upon the present best unsupervised method, in terms of efficiency and performance.
1430	Generalizing Hand Segmentation in Egocentric Videos With Uncertainty-Guided Model Adaptation	Minjie Cai; Feng Lu; Yoichi Sato;	In this work, we solve the hand segmentation generalization problem without requiring segmentation labels in the target domain.
1431	DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised Representation Learning	Jaime Spencer; Richard Bowden; Simon Hadfield;	We propose DeFeat-Net (Depth & Feature network), an approach to simultaneously learn a cross-domain dense feature representation, alongside a robust depth-estimation framework based on warped feature consistency.
1432	Learning Visual Motion Segmentation Using Event Surfaces	Anton Mitrokhin; Zhiyuan Hua; Cornelia Fermuller; Yiannis Aloimonos;	In this work we present a Graph Convolutional neural network for the task of scene motion segmentation by a moving camera.
1433	Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction	Abduallah Mohamed; Kun Qian; Mohamed Elhoseiny; Christian Claudel;	We propose the Social Spatio-Temporal Graph Convolutional Neural Network (Social-STGCNN), which removes the need for aggregation methods by modeling the interactions as a graph.
1434	Discriminative Multi-Modality Speech Recognition	Bo Xu; Cheng Lu; Yandong Guo; Jacob Wang;	In this paper, we propose a two-stage speech recognition model.
1435	Clean-Label Backdoor Attacks on Video Recognition Models	Shihao Zhao; Xingjun Ma; Xiang Zheng; James Bailey; Jingjing Chen; Yu-Gang Jiang;	In this paper, we show that existing image backdoor attacks are far less effective on videos, and outline 4 strict conditions where existing attacks are likely to fail: 1) scenarios with more input dimensions (e.g., videos), 2) scenarios with high resolution, 3) scenarios with a large number of classes and few examples per class (a “sparse dataset”), and 4) attacks with access to correct labels (e.g., clean-label attacks).
1436	Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors	Gilad Cohen; Guillermo Sapiro; Raja Giryes;	In this work, we present a method for detecting such adversarial attacks, which is suitable for any pre-trained neural network classifier.
1437	Unsupervised Model Personalization While Preserving Privacy and Scalability: An Open Problem	Matthias De Lange; Xu Jia; Sarah Parisot; Ales Leonardis; Gregory Slabaugh; Tinne Tuytelaars;	We aim to address this challenge within the continual learning paradigm and provide a novel Dual User-Adaptation framework (DUA) to explore the problem.
1438	GIFnets: Differentiable GIF Encoding Framework	Innfarn Yoo; Xiyang Luo; Yilin Wang; Feng Yang; Peyman Milanfar;	To reduce artifacts and provide a better and more efficient GIF encoding, we introduce a differentiable GIF encoding pipeline, which includes three novel neural networks: PaletteNet, DitherNet, and BandingNet.
1439	Learning Invariant Representation for Unsupervised Image Restoration	Wenchao Du; Hu Chen; Hongyu Yang;	Instead, we propose an unsupervised learning method that explicitly learns invariant representation from noisy data and reconstructs clear observations.
1440	Improved Few-Shot Visual Classification	Peyman Bateni; Raghav Goyal; Vaden Masrani; Frank Wood; Leonid Sigal;	In this paper, we explore the hypothesis that a simple class-covariance-based distance metric, namely the Mahalanobis distance, adopted into a state-of-the-art few-shot learning approach (CNAPS) can, in and of itself, lead to a significant performance improvement.
1441	Learning Weighted Submanifolds With Variational Autoencoders and Riemannian Variational Autoencoders	Nina Miolane; Susan Holmes;	In this paper, we are interested in variational autoencoder variants that can learn potentially highly curved submanifolds of manifold-valued data.
1442	Learning Geocentric Object Pose in Oblique Monocular Images	Gordon Christie; Rodrigo Rene Rai Munoz Abujder; Kevin Foster; Shea Hagstrom; Gregory D. Hager; Myron Z. Brown;	Inspired by recent work in monocular height above ground prediction and optical flow prediction from static images, we develop an encoding of geocentric pose to address this challenge and train a deep network to compute the representation densely, supervised by publicly available airborne lidar.
1443	Understanding Adversarial Examples From the Mutual Influence of Images and Perturbations	Chaoning Zhang; Philipp Benz; Tooba Imtiaz; In So Kweon;	We propose to treat the DNN logits as a vector for feature representation, and exploit them to analyze the mutual influence of two independent inputs based on the Pearson correlation coefficient (PCC).
1444	Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models	Giannis Daras; Augustus Odena; Han Zhang; Alexandros G. Dimakis;	We introduce a new local sparse attention layer that preserves two-dimensional geometry and locality.
1445	MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion	Kentaro Wada; Edgar Sucar; Stephen James; Daniel Lenton; Andrew J. Davison;	We present a system that can accurately estimate the poses of multiple known objects in contact and occlusion from real-time, embodied multi-view vision.
1446	HCNAF: Hyper-Conditioned Neural Autoregressive Flow and its Application for Probabilistic Occupancy Map Forecasting	Geunseob Oh; Jean-Sebastien Valois;	We introduce Hyper-Conditioned Neural Autoregressive Flow (HCNAF), a powerful universal distribution approximator designed to model arbitrarily complex conditional probability density functions.
1447	Detail-recovery Image Deraining via Context Aggregation Networks	Sen Deng; Mingqiang Wei; Jun Wang; Yidan Feng; Luming Liang; Haoran Xie; Fu Lee Wang; Meng Wang;	We propose an end-to-end detail-recovery image deraining network (termed DRDNet) to solve the problem.
1448	MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model	Han Fu; Rui Wu; Chenghao Liu; Jianling Sun;	In this paper, we focus on the task of cross-modal retrieval between food images and cooking recipes.
1449	Hypergraph Attention Networks for Multimodal Learning	Eun-Sol Kim; Woo Young Kang; Kyoung-Woon On; Yu-Jung Heo; Byoung-Tak Zhang;	To resolve this problem, we propose Hypergraph Attention Networks (HANs), which define a common semantic space among the modalities with symbolic graphs and extract a joint representation of the modalities based on a co-attention map constructed in the semantic space.
1450	Moving in the Right Direction: A Regularization for Deep Metric Learning	Deen Dayal Mohan; Nishant Sankaran; Dennis Fedorishin; Srirangaraj Setlur; Venu Govindaraju;	In this work, we identify a shortcoming of existing loss formulations which fail to consider more optimal directions of pair displacements as another criterion for optimization.
1451	Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets	Daniel Haase; Manuel Amthor;	We introduce blueprint separable convolutions (BSConv) as highly efficient building blocks for CNNs.
1452	Seeing without Looking: Contextual Rescoring of Object Detections for AP Maximization	Lourenco V. Pato; Renato Negrinho; Pedro M. Q. Aguiar;	We propose to incorporate context in object detection by post-processing the output of an arbitrary detector to rescore the confidences of its detections.
1453	End-to-End Adversarial-Attention Network for Multi-Modal Clustering	Runwu Zhou; Yi-Dong Shen;	In this paper, we present an End-to-end Adversarial-attention network for Multi-modal Clustering (EAMC), where adversarial learning and attention mechanism are leveraged to align the latent feature distributions and quantify the importance of modalities respectively.
1454	Fast Sparse ConvNets	Erich Elsen; Marat Dukhan; Trevor Gale; Karen Simonyan;	In this work, we further expand the arsenal of efficient building blocks for neural network architectures; but instead of combining standard primitives (such as convolution), we advocate for the replacement of these dense primitives with their sparse counterparts.
1455	Few Sample Knowledge Distillation for Efficient Network Compression	Tianhong Li; Jianguo Li; Zhuang Liu; Changshui Zhang;	This paper proposes a novel solution for knowledge distillation from label-free few samples to realize both data efficiency and training/processing efficiency.
1456	Predicting Sharp and Accurate Occlusion Boundaries in Monocular Depth Estimation Using Displacement Fields	Michael Ramamonjisoa; Yuming Du; Vincent Lepetit;	We instead learn to predict, given a depth map predicted by some reconstruction method, a 2D displacement field able to re-sample pixels around the occlusion boundaries into sharper reconstructions.
1457	Shape Correspondence Using Anisotropic Chebyshev Spectral CNNs	Qinsong Li; Shengjun Liu; Ling Hu; Xinru Liu;	In this paper, we propose a novel architecture for shape correspondence, termed Anisotropic Chebyshev spectral CNNs (ACSCNNs), based on a new extension of the manifold convolution operator.
1458	RetinaTrack: Online Single Stage Joint Detection and Tracking	Zhichao Lu; Vivek Rathod; Ronny Votel; Jonathan Huang;	In this paper we focus on the tracking-by-detection paradigm for autonomous driving where both tasks are mission critical.
1459	Multimodal Categorization of Crisis Events in Social Media	Mahdi Abavisani; Liwei Wu; Shengli Hu; Joel Tetreault; Alejandro Jaimes;	In this paper, we present a new multimodal fusion method that leverages both images and texts as input.
1460	SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings	Wenyu Han; Siyuan Xiang; Chenhui Liu; Ruoyu Wang; Chen Feng;	Can deep networks be trained to perform spatial reasoning tasks? How can we measure their “spatial intelligence”? To answer these questions, we present the SPARE3D dataset.
1461	SwapText: Image Based Texts Transfer in Scenes	Qiangpeng Yang; Jun Huang; Wei Lin;	In this work, we present SwapText, a three-stage framework to transfer texts across scene images.
1462	OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by learning to unfold	Mohamed Yousef; Tom E. Bishop;	We propose a novel and simple neural network module, termed OrigamiNet, that can augment any CTC-trained, fully convolutional single-line text recognizer and convert it into a multi-line version by giving the model enough spatial capacity to properly collapse a 2D input signal into 1D without losing information.
1463	FroDO: From Detections to 3D Objects	Martin Runz; Kejie Li; Meng Tang; Lingni Ma; Chen Kong; Tanner Schmidt; Ian Reid; Lourdes Agapito; Julian Straub; Steven Lovegrove; Richard Newcombe;	We introduce FroDO, a method for accurate 3D reconstruction of object instances from RGB video that infers their location, pose and shape in a coarse-to-fine manner.
1464	Single-Step Adversarial Training With Dropout Scheduling	Vivek B.S.; R. Venkatesh Babu;	In this work, (i) we show that models trained with single-step adversarial training learn to prevent the generation of single-step adversaries, due to overfitting of the model during the initial stages of training, and (ii) to mitigate this effect, we propose a single-step adversarial training method with dropout scheduling.
1465	Learning to Super Resolve Intensity Images From Events	S. Mohammad Mostafavi I.; Jonghyun Choi; Kuk-Jin Yoon;	We propose an end-to-end network to reconstruct high-resolution, high dynamic range (HDR) images directly from the event stream.
1466	DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection	Liming Jiang; Ren Li; Wayne Wu; Chen Qian; Chen Change Loy;	In this paper, we present our ongoing effort to construct a large-scale benchmark, DeeperForensics-1.0, for face forgery detection.
1467	CNN-Generated Images Are Surprisingly Easy to Spot… for Now	Sheng-Yu Wang; Oliver Wang; Richard Zhang; Andrew Owens; Alexei A. Efros;	In this work, we ask whether it is possible to create a “universal” detector for telling apart real images from those generated by a CNN, regardless of the architecture or dataset used.