Paper Digest: ECCV 2020 Highlights

August 20, 2020 (updated November 10, 2020)

Download ECCV-2020-Paper-Digests.pdf: highlights of all ECCV-2020 papers. Readers can also read this highlight article on our console, which lets users filter papers by keyword and find related papers and patents.

Readers are also encouraged to read our ECCV 2020 Papers with Code/Data Page, which lists those papers that have published their code or data.

The European Conference on Computer Vision (ECCV) is one of the top computer vision conferences in the world. In 2020, it is to be held virtually due to the COVID-19 pandemic.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: Paper Digest: ECCV 2020 Highlights

No.	Title	Authors	Highlight
1	Quaternion Equivariant Capsule Networks for 3D Point Clouds	Yongheng Zhao; Tolga Birdal; Jan Eric Lenssen; Emanuele Menegatti; Leonidas Guibas; Federico Tombari;	We present a 3D capsule module for processing point clouds that is equivariant to 3D rotations and translations, as well as invariant to permutations of the input points.
2	DeepFit: 3D Surface Fitting via Neural Network Weighted Least Squares	Yizhak Ben-Shabat; Stephen Gould;	We propose a surface fitting method for unstructured 3D point clouds.
3	NSGANetV2: Evolutionary Multi-Objective Surrogate-Assisted Neural Architecture Search	Zhichao Lu; Kalyanmoy Deb; Erik Goodman; Wolfgang Banzhaf; Vishnu Naresh Boddeti;	In this paper, we propose an efficient NAS algorithm for generating task-specific models that are competitive under multiple competing objectives.
4	Describing Textures using Natural Language	Chenyun Wu; Mikayla Timm; Subhransu Maji;	In this paper, we study the problem of describing visual attributes of texture on a novel dataset containing rich descriptions of textures, and conduct a systematic study of current generative and discriminative models for grounding language to images on this dataset.
5	Empowering Relational Network by Self-Attention Augmented Conditional Random Fields for Group Activity Recognition	Rizard Renanda Adhi Pramono; Yie Tarng Chen; Wen Hsien Fang;	This paper presents a novel relational network for group activity recognition.
6	AiR: Attention with Reasoning Capability	Shi Chen; Ming Jiang; Jinhui Yang; Qi Zhao;	In this work, we propose an Attention with Reasoning capability (AiR) framework that uses attention to understand and improve the process leading to task outcomes.
7	Self6D: Self-Supervised Monocular 6D Object Pose Estimation	Gu Wang; Fabian Manhardt; Jianzhun Shao; Xiangyang Ji; Nassir Navab; Federico Tombari;	To overcome this shortcoming, we propose the idea of monocular 6D pose estimation by means of self-supervised learning, removing the need for real annotations.
8	Invertible Image Rescaling	Mingqing Xiao; Shuxin Zheng; Chang Liu; Yaolong Wang; Di He; Guolin Ke; Jiang Bian; Zhouchen Lin; Tie-Yan Liu;	In this work, we propose to solve this problem by modeling the downscaling and upscaling processes from a new perspective, i.e. an invertible bijective transformation, which can largely mitigate the ill-posed nature of image upscaling.
9	Synthesize then Compare: Detecting Failures and Anomalies for Semantic Segmentation	Yingda Xia; Yi Zhang; Fengze Liu; Wei Shen; Alan L. Yuille;	In this paper, we systematically study failure and anomaly detection for semantic segmentation and propose a unified framework, consisting of two modules, to address these two related problems.
10	House-GAN: Relational Generative Adversarial Networks for Graph-constrained House Layout Generation	Nelson Nauata; Kai-Hung Chang; Chin-Yi Cheng; Greg Mori; Yasutaka Furukawa;	This paper proposes a novel graph-constrained generative adversarial network, whose generator and discriminator are built upon relational architecture.
11	Crowdsampling the Plenoptic Function	Zhengqi Li; Wenqi Xian; Abe Davis; Noah Snavely;	In this paper, we present a new approach to novel view synthesis under time-varying illumination from such data.
12	VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment	Hanyue Tu; Chunyu Wang; Wenjun Zeng;	We present VoxelPose to estimate 3D poses of multiple people from multiple camera views.
13	End-to-End Object Detection with Transformers	Nicolas Carion; Francisco Massa; Gabriel Synnaeve; Nicolas Usunier; Alexander Kirillov; Sergey Zagoruyko;	We present a new method that views object detection as a direct set prediction.
14	DeepSFM: Structure From Motion Via Deep Bundle Adjustment	Xingkui Wei; Yinda Zhang; Zhuwen Li; Yanwei Fu; Xiangyang Xue;	In this work, we design a physical driven architecture, namely DeepSFM, inspired by traditional Bundle Adjustment (BA), which consists of two cost volume based architectures for depth and pose estimation respectively, iteratively running to improve both.
15	Ladybird: Quasi-Monte Carlo Sampling for Deep Implicit Field Based 3D Reconstruction with Symmetry	Yifan Xu; Tianqi Fan; Yi Yuan; Gurprit Singh;	Based on Farthest Point Sampling algorithm, we propose a sampling scheme that theoretically encourages better generalization performance, and results in fast convergence for SGD-based optimization algorithms.
16	Segment as Points for Efficient Online Multi-Object Tracking and Segmentation	Zhenbo Xu; Wei Zhang; Xiao Tan; Wei Yang; Huan Huang; Shilei Wen; Errui Ding; Liusheng Huang;	In this paper, we propose a highly effective method for learning instance embeddings based on segments by converting the compact image representation to un-ordered 2D point cloud representation.
17	Conditional Convolutions for Instance Segmentation	Zhi Tian; Chunhua Shen; Hao Chen;	We propose a simple yet effective instance segmentation framework, termed CondInst (conditional convolutions for instance segmentation).
18	MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution	Taojiannan Yang; Sijie Zhu; Chen Chen; Shen Yan; Mi Zhang; Andrew Willis;	We propose the width-resolution mutual learning method (MutualNet) to train a network that is executable at dynamic resource constraints to achieve adaptive accuracy-efficiency trade-offs at runtime.
19	Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset	Menglin Jia; Mengyun Shi; Mikhail Sirotenko; Yin Cui; Claire Cardie; Bharath Hariharan; Hartwig Adam; Serge Belongie;	In order to solve this challenging task, we propose a novel Attribute-Mask R-CNN model to jointly perform instance segmentation and localized attribute recognition, and provide a novel evaluation metric for the task.
20	Privacy Preserving Structure-from-Motion	Marcel Geppert; Viktor Larsson; Pablo Speciale; Johannes L. Schönberger; Marc Pollefeys;	In this paper, we further build upon this idea and propose solutions to the different core algorithms of an incremental Structure-from-Motion pipeline based on random line features.
21	Rewriting a Deep Generative Model	David Bau; Steven Liu; Tongzhou Wang; Jun-Yan Zhu; Antonio Torralba;	In this paper, we introduce a new problem setting: manipulation of specific rules encoded by a deep generative model.
22	Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets	Jiuniu Wang; Wenjia Xu; Qingzhong Wang; Antoni B. Chan;	In this paper, we aim to improve the distinctiveness of image captions through training with sets of similar images.
23	Long-term Human Motion Prediction with Scene Context	Zhe Cao; Hang Gao; Karttikeya Mangalam; Qi-Zhi Cai; Minh Vo; Jitendra Malik;	In this work, we propose a novel three-stage framework that exploits scene context to tackle this task.
24	NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis	Ben Mildenhall; Pratul P. Srinivasan; Matthew Tancik; Jonathan T. Barron; Ravi Ramamoorthi; Ren Ng;	We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views.
25	ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes	Panos Achlioptas; Ahmed Abdelreheem; Fei Xia; Mohamed Elhoseiny; Leonidas Guibas;	In this work we study the problem of using referential language to identify common objects in real-world 3D scenes.
26	MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images	Benjamin Attal; Selena Ling; Aaron Gokaslan; Christian Richardt; James Tompkin;	We introduce a method to convert stereo 360 (omnidirectional stereo) imagery into a layered, multi-sphere image representation for six degree-of-freedom (6DoF) rendering.
27	Learning and Aggregating Deep Local Descriptors for Instance-level Recognition	Giorgos Tolias; Tomas Jenicek; Ondřej Chum;	We propose an efficient method to learn deep local descriptors for instance-level recognition.
28	A Consistently Fast and Globally Optimal Solution to the Perspective-n-Point Problem	George Terzakis; Manolis Lourakis;	An approach for estimating the pose of a camera given a set of 3D points and their corresponding 2D image projections is presented.
29	Learn to Recover Visible Color for Video Surveillance in a Day	Guangming Wu; Yinqiang Zheng; Zhiling Guo; Zekun Cai; Xiaodan Shi; Xin Ding; Yifei Huang; Yimin Guo; Ryosuke Shibasaki;	In this paper, we present a deep learning based approach that directly generates human-friendly, visible color for video surveillance in a day.
30	Deep Fashion3D: A Dataset and Benchmark for 3D Garment Reconstruction from Single Images	Heming Zhu; Yu Cao; Hang Jin; Weikai Chen; Dong Du; Zhangye Wang; Shuguang Cui; Xiaoguang Han;	We propose to fill this gap by introducing DeepFashion3D, the largest collection to date of 3D garment models, with the goal of establishing a novel benchmark and dataset for the evaluation of image-based garment reconstruction systems.
31	Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation	Zhenda Xie; Zheng Zhang; Xizhou Zhu; Gao Huang; Stephen Lin;	Towards reducing this superfluous computation, we propose to compute features only at sparsely sampled locations, which are probabilistically chosen according to activation responses, and then densely reconstruct the feature map with an efficient interpolation procedure.
32	BorderDet: Border Feature for Dense Object Detection	Han Qiu; Yuchen Ma; Zeming Li; Songtao Liu; Jian Sun;	In this paper, we propose a simple and efficient operator called Border-Align to extract “border features” from the extreme point of the border to enhance the point feature.
33	Regularization with Latent Space Virtual Adversarial Training	Genki Osada; Budrul Ahsan; Revoti Prasad Bora; Takashi Nishide;	To address this problem we propose LVAT (Latent space VAT), which injects perturbation in the latent space instead of the input space.
34	Du²Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels	Yinda Zhang; Neal Wadhwa; Sergio Orts-Escolano; Christian Häne; Sean Fanello; Rahul Garg;	We present a novel approach based on neural networks for depth estimation that combines stereo from dual cameras with stereo from a dual-pixel sensor, which is increasingly common on consumer cameras.
35	Model-Agnostic Boundary-Adversarial Sampling for Test-Time Generalization in Few-Shot Learning	Jaekyeom Kim; Hyoungseok Kim; Gunhee Kim;	We propose a model-agnostic method that improves the test-time performance of any few-shot learning model with no additional training, and thus is free from the training-test domain gap.
36	Targeted Attack for Deep Hashing based Retrieval	Jiawang Bai; Bin Chen; Yiming Li; Dongxian Wu; Weiwei Guo; Shu-Tao Xia; En-Hui Yang;	In this paper, we propose a novel method, dubbed deep hashing targeted attack (DHTA), to study the targeted attack on such retrieval.
37	Gradient Centralization: A New Optimization Technique for Deep Neural Networks	Hongwei Yong; Jianqiang Huang; Xiansheng Hua; Lei Zhang;	Different from those previous methods that mostly operate on activations or weights, we present a new optimization technique, namely gradient centralization (GC), which operates directly on gradients by centralizing the gradient vectors to have zero mean.
38	Content-Aware Unsupervised Deep Homography Estimation	Jirong Zhang; Chuan Wang; Shuaicheng Liu; Lanpeng Jia; Nianjin Ye; Jue Wang; Ji Zhou; Jian Sun;	To overcome these problems, in this work we propose an unsupervised deep homography method with a new architecture design.
39	Multi-View Optimization of Local Feature Geometry	Mihai Dusmanu; Johannes L. Schönberger; Marc Pollefeys;	In this work, we address the problem of refining the geometry of local image features from multiple views without known scene or camera geometry.
40	The Phong Surface: Efficient 3D Model Fitting using Lifted Optimization	Jingjing Shen; Thomas J. Cashman; Qi Ye; Tim Hutton; Toby Sharp; Federica Bogo; Andrew Fitzgibbon; Jamie Shotton;	To solve model-fitting problems for HoloLens 2 hand tracking, where the computational budget is approximately 100 times smaller than an iPhone 7, we introduce a new surface model: the ‘Phong surface’.
41	Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video	Miao Liu; Siyu Tang; Yin Li; James M. Rehg;	Motivated by this observation, we adopt intentional hand movement as a feature representation, and propose a novel deep network that jointly models and predicts the egocentric hand motion, interaction hotspots and future action.
42	Learning Stereo from Single Images	Jamie Watson; Oisin Mac Aodha; Daniyar Turmukhambetov; Gabriel J. Brostow; Michael Firman;	We propose that it is unnecessary to have such a high reliance on ground truth depths or even corresponding stereo pairs.
43	Prototype Rectification for Few-Shot Learning	Jinlu Liu; Liang Song; Yongqiang Qin;	In this paper, we figure out two key influencing factors of the process: the intra-class bias and the cross-class bias. We then propose a simple yet effective approach for prototype rectification in transductive setting.
44	Learning Feature Descriptors using Camera Pose Supervision	Qianqian Wang; Xiaowei Zhou; Bharath Hariharan; Noah Snavely;	In this paper we propose a novel weakly-supervised framework that can learn feature descriptors solely from relative camera poses between images.
45	Semantic Flow for Fast and Accurate Scene Parsing	Xiangtai Li; Ansheng You; Zhen Zhu; Houlong Zhao; Maoke Yang; Kuiyuan Yang; Shaohua Tan; Yunhai Tong;	In this paper, we focus on designing an effective method for fast and accurate scene parsing.
46	Appearance Consensus Driven Self-Supervised Human Mesh Recovery	Jogendra Nath Kundu; Mugalodi Rakesh; Varun Jampani; Rahul Mysore Venkatesh; R. Venkatesh Babu;	We present a self-supervised human mesh recovery framework to infer human pose and shape from monocular images in the absence of any paired supervision.
47	Diffraction Line Imaging	Mark Sheinin; Dinesh N. Reddy; Matthew O’Toole; Srinivasa G. Narasimhan;	We present a novel computational imaging principle that combines diffractive optics with line (1D) sensing.
48	Aligning and Projecting Images to Class-conditional Generative Networks	Minyoung Huh; Richard Zhang; Jun-Yan Zhu; Sylvain Paris; Aaron Hertzmann;	We present a method for projecting an input image into the space of a class-conditional generative neural network.
49	Suppress and Balance: A Simple Gated Network for Salient Object Detection	Xiaoqi Zhao; Youwei Pang; Lihe Zhang; Huchuan Lu; Lei Zhang;	In this work, we propose a simple gated network (GateNet) to solve both issues at once.
50	Visual Memorability for Robotic Interestingness via Unsupervised Online Learning	Chen Wang; Wenshan Wang; Yuheng Qiu; Yafei Hu; Sebastian Scherer;	In this paper, we explore the problem of interesting scene prediction for mobile robots.
51	Post-Training Piecewise Linear Quantization for Deep Neural Networks	Jun Fang; Ali Shafiee; Hamzah Abdel-Aziz; David Thorsley; Georgios Georgiadis; Joseph H. Hassoun;	In this paper, we propose a PieceWise Linear Quantization (PWLQ) scheme to enable accurate approximation for tensor values that have bell-shaped distributions with long tails.
52	Joint Disentangling and Adaptation for Cross-Domain Person Re-Identification	Yang Zou; Xiaodong Yang; Zhiding Yu; B.V.K. Vijaya Kumar; Jan Kautz;	In this paper, we seek to improve adaptation by purifying the representation space to be adapted.
53	In-Home Daily-Life Captioning Using Radio Signals	Lijie Fan; Tianhong Li; Yuan Yuan; Dina Katabi;	We introduce RF-Diary, a new model for captioning daily life by analyzing the privacy-preserving radio signal in the home with the home’s floormap.
54	Self-Challenging Improves Cross-Domain Generalization	Zeyi Huang; Haohan Wang; Eric P. Xing; Dong Huang;	We introduce a simple training heuristic, Representation Self-Challenging (RSC), that significantly improves the generalization of CNN to the out-of-domain data.
55	A Competence-aware Curriculum for Visual Concepts Learning via Question Answering	Qing Li; Siyuan Huang; Yining Hong; Song-Chun Zhu;	To mimic this efficient learning ability, we propose a competence-aware curriculum for visual concept learning in a question-answering manner.
56	Multitask Learning Strengthens Adversarial Robustness	Chengzhi Mao; Amogh Gupta; Vikram Nitin; Baishakhi Ray; Shuran Song; Junfeng Yang; Carl Vondrick;	We present both theoretical and empirical analyses that connect the adversarial robustness of a model to the number of tasks that it is trained on.
57	S2DNAS: Transforming Static CNN Model for Dynamic Inference via Neural Architecture Search	Zhihang Yuan; Bingzhe Wu; Guangyu Sun; Zheng Liang; Shiwan Zhao; Weichen Bi;	In this paper, we introduce a general framework, S2DNAS, which can transform various static CNN models to support dynamic inference via neural architecture search.
58	Improving Deep Video Compression by Resolution-adaptive Flow Coding	Zhihao Hu; Zhenghao Chen; Dong Xu; Guo Lu; Wanli Ouyang; Shuhang Gu;	In this work, we propose a new framework called Resolution-adaptive Flow Coding (RaFC) to effectively compress the flow maps globally and locally, in which we use multi-resolution representations instead of single-resolution representations for both the input flow maps and the output motion features of the MV encoder.
59	Motion Capture from Internet Videos	Junting Dong; Qing Shuai; Yuanqing Zhang; Xian Liu; Xiaowei Zhou; Hujun Bao;	To address these challenges, we propose a novel optimization-based framework and experimentally demonstrate its ability to recover much more precise and detailed motion from multiple videos, compared against monocular motion capture methods.
60	Appearance-Preserving 3D Convolution for Video-based Person Re-identification	Xinqian Gu; Hong Chang; Bingpeng Ma; Hongkai Zhang; Xilin Chen;	To address this problem, we propose Appearance-Preserving 3D Convolution (AP3D), which is composed of two components: an Appearance-Preserving Module (APM) and a 3D convolution kernel.
61	Solving the Blind Perspective-n-Point Problem End-To-End With Robust Differentiable Geometric Optimization	Dylan Campbell; Liu Liu; Stephen Gould;	We instead propose the first fully end-to-end trainable network for solving the blind PnP problem efficiently and globally, that is, without the need for pose priors.
62	Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation	Xingang Pan; Xiaohang Zhan; Bo Dai; Dahua Lin; Chen Change Loy; Ping Luo;	This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images.
63	Deep Spatial-angular Regularization for Compressive Light Field Reconstruction over Coded Apertures	Mantang Guo; Junhui Hou; Jing Jin; Jie Chen; Lap-Pui Chau;	To tackle this challenge, we propose a novel learning-based framework for the reconstruction of high-quality LFs from acquisitions via learned coded apertures.
64	Video-based Remote Physiological Measurement via Cross-verified Feature Disentangling	Xuesong Niu; Zitong Yu; Hu Han; Xiaobai Li; Shiguang Shan; Guoying Zhao;	To address these challenges, we propose a cross-verified feature disentangling strategy to disentangle the physiological features with non-physiological representations such as head movements and lighting conditions, and then use the distilled physiological features for robust multi-task physiological measurements.
65	Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction	Bharat Lal Bhatnagar; Cristian Sminchisescu; Christian Theobalt; Gerard Pons-Moll;	Given sparse 3D point clouds sampled on the surface of a dressed person, we use an Implicit Part Network (IP-Net) to jointly predict the outer 3D surface of the dressed person, the inner body surface, and the semantic correspondences to a parametric body model.
66	Orientation-aware Vehicle Re-identification with Semantics-guided Part Attention Network	Tsai-Shien Chen; Chih-Ting Liu; Chih-Wei Wu; Shao-Yi Chien;	In this work, we propose a dedicated Semantics-guided Part Attention Network (SPAN) to robustly predict part attention masks for different views of vehicles given only image-level semantic labels during training.
67	Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation	Guolei Sun; Wenguan Wang; Jifeng Dai; Luc Van Gool;	This paper studies the problem of learning semantic segmentation from image-level supervision only.
68	CoReNet: Coherent 3D Scene Reconstruction from a Single RGB Image	Stefan Popov; Pablo Bauszat; Vittorio Ferrari;	Building on common encoder-decoder architectures for this task, we propose three extensions: (1) ray-traced skip connections that propagate local 2D information to the output 3D volume in a physically correct manner (2) a hybrid 3D volume representation that enables building translation equivariant models, while at the same time encoding fine object details without an excessive memory footprint (3) a reconstruction loss tailored to capture overall object geometry.
69	Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs	Lei Huang; Jie Qin; Li Liu; Fan Zhu; Ling Shao;	To this end, we propose layer-wise conditioning analysis, which explores the optimization landscape with respect to each layer independently.
70	RAFT: Recurrent All-Pairs Field Transforms for Optical Flow	Zachary Teed; Jia Deng;	We introduce Recurrent All-Pairs Field Transforms (RAFT), a new deep network architecture for estimating optical flow.
71	Domain-invariant Stereo Matching Networks	Feihu Zhang; Xiaojuan Qi; Ruigang Yang; Victor Prisacariu; Benjamin Wah; Philip Torr;	In this paper, we aim at designing a domain-invariant stereo matching network (DSMNet) that generalizes well to unseen scenes.
72	DeepHandMesh: A Weakly-supervised Deep Encoder-Decoder Framework for High-fidelity Hand Mesh Modeling	Gyeongsik Moon; Takaaki Shiratori; Kyoung Mu Lee;	In this study, we firstly propose DeepHandMesh, a weakly-supervised deep encoder-decoder framework for high-fidelity hand mesh modeling.
73	Content Adaptive and Error Propagation Aware Deep Video Compression	Guo Lu; Chunlei Cai; Xiaoyun Zhang; Li Chen; Wanli Ouyang; Dong Xu; Zhiyong Gao;	To address these two problems, we propose a content adaptive and error propagation aware video compression system.
74	Towards Streaming Perception	Mengtian Li; Yu-Xiong Wang; Deva Ramanan;	To these ends, we present an approach that coherently integrates latency and accuracy into a single metric for real-time online perception, which we refer to as “streaming accuracy”.
75	Towards Automated Testing and Robustification by Semantic Adversarial Data Generation	Rakshith Shetty; Mario Fritz; Bernt Schiele;	In this work we propose semantic adversarial editing, a method to synthesize plausible but difficult data points on which our target model breaks down.
76	Adversarial Generative Grammars for Human Activity Prediction	AJ Piergiovanni; Anelia Angelova; Alexander Toshev; Michael S. Ryoo;	In this paper we propose an adversarial generative grammar model for future prediction.
77	GDumb: A Simple Approach that Questions Our Progress in Continual Learning	Ameya Prabhu; Philip H. S. Torr; Puneet K. Dokania;	To validate this, we propose GDumb that (1) greedily stores samples in memory as they come and (2) at test time, trains a model from scratch using samples only in the memory.
78	Learning Lane Graph Representations for Motion Forecasting	Ming Liang; Bin Yang; Rui Hu; Yun Chen; Renjie Liao; Song Feng; Raquel Urtasun;	We propose a motion forecasting model that exploits a novel structured map representation as well as actor-map interactions.
79	What Matters in Unsupervised Optical Flow	Rico Jonschkowski; Austin Stone; Jonathan T. Barron; Ariel Gordon; Kurt Konolige; Anelia Angelova;	By combining the results of our investigation with our improved model components, we are able to present a new unsupervised flow technique that significantly outperforms the previous unsupervised state-of-the-art and performs on par with supervised FlowNet2 on the KITTI 2015 dataset, while also being significantly simpler than related approaches.
80	Synthesis and Completion of Facades from Satellite Imagery	Xiaowei Zhang; Christopher May; Daniel Aliaga;	We present a machine learning-based inverse procedural modeling method to automatically create synthetic facades from satellite imagery.
81	Mapillary Planet-Scale Depth Dataset	Manuel López Antequera; Pau Gargallo; Markus Hofinger; Samuel Rota Bulò; Yubin Kuang; Peter Kontschieder;	We introduce a new depth dataset that is an order of magnitude larger than previous datasets, but more importantly, contains an unprecedented gamut of locations, camera models and scene types while offering metric depth (not just up-to-scale).
82	V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction	Tsun-Hsuan Wang; Sivabalan Manivasagam; Ming Liang; Bin Yang; Wenyuan Zeng; Raquel Urtasun;	In this paper, we explore the use of vehicle-to-vehicle (V2V) communication to improve the perception and motion forecasting performance of self-driving vehicles.
83	Training Interpretable Convolutional Neural Networks by Differentiating Class-specific Filters	Haoyu Liang; Zhihao Ouyang; Yuyuan Zeng; Hang Su; Zihao He; Shu-Tao Xia; Jun Zhu; Bo Zhang;	Inspired by cellular differentiation, we propose a novel strategy to train interpretable CNNs by encouraging class-specific filters, among which each filter responds to only one (or few) class.
84	EagleEye: Fast Sub-net Evaluation for Efficient Neural Network Pruning	Bailin Li; Bowen Wu; Jiang Su; Guangrun Wang;	In this work, we present a pruning method called EagleEye, in which a simple yet efficient evaluation component based on adaptive batch normalization is applied to unveil a strong correlation between different pruned DNN structures and their final settled accuracy.
85	Intrinsic Point Cloud Interpolation via Dual Latent Space Navigation	Marie-Julie Rakotosaona; Maks Ovsjanikov;	We present a learning-based method for interpolating and manipulating 3D shapes represented as point clouds, that is explicitly designed to preserve intrinsic shape properties.
86	Cross-Domain Cascaded Deep Translation	Oren Katzir; Dani Lischinski; Daniel Cohen-Or;	We mitigate this by descending the deep layers of a pre-trained network, where the deep features contain more semantics, and applying the translation between these deep features.
87	“Look Ma, no landmarks!” – Unsupervised, Model-based Dense Face Alignment	Tatsuro Koizumi; William A. P. Smith;	In this paper, we show how to train an image-to-image network to predict dense correspondence between a face image and a 3D morphable model using only the model for supervision.
88	Online Invariance Selection for Local Feature Descriptors	Rémi Pautrat; Viktor Larsson; Martin R. Oswald; Marc Pollefeys;	We propose to overcome this limitation with a disentanglement of invariance in local descriptors and with an online selection of the most appropriate invariance given the context.
89	Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations	Hongyu Liu; Bin Jiang; Yibing Song; Wei Huang; Chao Yang;	In this paper, we propose a mutual encoder-decoder CNN for joint recovery of both.
90	TextCaps: a Dataset for Image Captioning with Reading Comprehension	Oleksii Sidorov; Ronghang Hu; Marcus Rohrbach; Amanpreet Singh;	To study how to comprehend text in the context of an image we collect a novel dataset, TextCaps, with 145k captions for 28k images.
91	It is not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction	Karttikeya Mangalam; Harshayu Girase; Shreyas Agarwal; Kuan-Hui Lee; Ehsan Adeli; Jitendra Malik; Adrien Gaidon;	In this work, we present Predicted Endpoint Conditioned Network (PECNet) for flexible human trajectory prediction.
92	Learning What to Learn for Video Object Segmentation	Goutam Bhat; Felix Järemo Lawin; Martin Danelljan; Andreas Robinson; Michael Felsberg; Luc Van Gool; Radu Timofte;	We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learner.
93	SIZER: A Dataset and Model for Parsing 3D Clothing and Learning Size Sensitive 3D Clothing	Garvita Tiwari; Bharat Lal Bhatnagar; Tony Tung; Gerard Pons-Moll;	In this paper, we introduce SizerNet to predict 3D clothing conditioned on human body shape and garment size parameters, and ParserNet to infer garment meshes and shape under clothing with personal details in a single pass from an input mesh.
94	LIMP: Learning Latent Shape Representations with Metric Preservation Priors	Luca Cosmo; Antonio Norelli; Oshri Halimi; Ron Kimmel; Emanuele Rodolà;	In this paper, we advocate the adoption of metric preservation as a powerful prior for learning latent representations of deformable 3D shapes.
95	Unsupervised Sketch to Photo Synthesis	Runtao Liu; Qian Yu; Stella X. Yu;	We study unsupervised sketch to photo synthesis for the first time, learning from unpaired sketch and photo data where the target photo for a sketch is unknown during training.
96	A Simple Way to Make Neural Networks Robust Against Diverse Image Corruptions	Evgenia Rusak; Lukas Schott; Roland S. Zimmermann; Julian Bitterwolf; Oliver Bringmann; Matthias Bethge; Wieland Brendel;	Here, we demonstrate that a simple but properly tuned training with additive Gaussian and Speckle noise generalizes surprisingly well to unseen corruptions, easily reaching the state of the art on the corruption benchmark ImageNet-C (with ResNet50) and on MNIST-C.
97	SoftPoolNet: Shape Descriptor for Point Cloud Completion and Classification	Yida Wang; David Joseph Tan; Nassir Navab; Federico Tombari;	In this paper, we propose a method for 3D object completion and classification based on point clouds.
98	Hierarchical Face Aging through Disentangled Latent Characteristics	Peipei Li; Huaibo Huang; Yibo Hu; Xiang Wu; Ran He; Zhenan Sun;	To explore the age effects on facial images, we propose a Disentangled Adversarial Autoencoder (DAAE) to disentangle the facial images into three independent factors: age, identity and extraneous information.
99	Hybrid Models for Open Set Recognition	Hongjie Zhang; Ang Li; Jie Guo; Yanwen Guo;	We propose the OpenHybrid framework, which is composed of an encoder to encode the input data into a joint embedding space, a classifier to classify samples to inlier classes, and a flow-based density estimator to detect whether a sample belongs to the unknown category.
100	TopoGAN: A Topology-Aware Generative Adversarial Network	Fan Wang; Huidong Liu; Dimitris Samaras; Chao Chen;	In this paper, we propose a novel GAN model that learns the topology of real images, i.e., connectedness and loopy-ness.
101	Learning to Localize Actions from Moments	Fuchen Long; Ting Yao; Zhaofan Qiu; Xinmei Tian; Jiebo Luo; Tao Mei;	In this paper, we introduce a new design of transfer learning type to learn action localization for a large set of action categories, but only on action moments from the categories of interest and temporal annotations of untrimmed videos from a small set of action classes.
102	ForkGAN: Seeing into the Rainy Night	Ziqiang Zheng; Yang Wu; Xinran Han; Jianbo Shi;	We present a ForkGAN for task-agnostic image translation that can boost multiple vision tasks in adverse weather conditions.
103	TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning	Xinwei Sun; Yilun Xu; Peng Cao; Yuqing Kong; Lingjing Hu; Shanghang Zhang; Yizhou Wang;	In this paper, we propose a novel information-theoretic approach, namely Total Correlation Gain Maximization (TCGM), for semi-supervised multi-modal learning, which is endowed with promising properties: (i) it can effectively utilize the information across different modalities of unlabeled data points to facilitate training classifiers of each modality, and (ii) it has a theoretical guarantee to identify Bayesian classifiers, i.e., the ground truth posteriors of all modalities.
104	ExchNet: A Unified Hashing Network for Large-Scale Fine-Grained Image Retrieval	Quan Cui; Qing-Yuan Jiang; Xiu-Shen Wei; Wu-Jun Li; Osamu Yoshie;	In this paper, we study the novel fine-grained hashing topic to generate compact binary codes for fine-grained images, leveraging the search and storage efficiency of hash learning to alleviate the aforementioned problems.
105	TSIT: A Simple and Versatile Framework for Image-to-Image Translation	Liming Jiang; Changxu Zhang; Mingyang Huang; Chunxiao Liu; Jianping Shi; Chen Change Loy;	We introduce a simple and versatile framework for image-to-image translation.
106	ProxyBNN: Learning Binarized Neural Networks via Proxy Matrices	Xiangyu He; Zitao Mo; Ke Cheng; Weixiang Xu; Qinghao Hu; Peisong Wang; Qingshan Liu; Jian Cheng;	In this paper, by introducing an appropriate proxy matrix, we reduce the weights quantization error while circumventing explicit binary regularizations on the full-precision auxiliary variables.
107	HMOR: Hierarchical Multi-Person Ordinal Relations for Monocular Multi-Person 3D Pose Estimation	Can Wang; Jiefeng Li; Wentao Liu; Chen Qian; Cewu Lu;	In this paper, we attempt to address the lack of a global perspective of the top-down approaches by introducing a novel form of supervision – Hierarchical Multi-person Ordinal Relations (HMOR).
108	Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve	Weicheng Kuo; Anelia Angelova; Tsung-Yi Lin; Angela Dai;	We present Mask2CAD, which jointly detects objects in real-world images and for each detected object, optimizes for the most similar CAD model and its pose.
109	A Unified Framework of Surrogate Loss by Refactoring and Interpolation	Lanlan Liu; Mingzhe Wang; Jia Deng;	We introduce UniLoss, a unified framework to generate surrogate losses for training deep networks with gradient descent, reducing the amount of manual design of task-specific surrogate losses.
110	Deep Reflectance Volumes: Relightable Reconstructions from Multi-View Photometric Images	Sai Bi; Zexiang Xu; Kalyan Sunkavalli; Miloš Hašan; Yannick Hold-Geoffroy; David Kriegman; Ravi Ramamoorthi;	We present a deep learning approach to reconstruct scene appearance from unstructured images captured under collocated point lighting.
111	Memory-augmented Dense Predictive Coding for Video Representation Learning	Tengda Han; Weidi Xie; Andrew Zisserman;	The objective of this paper is self-supervised learning from video, in particular for representations for action recognition.
112	PointMixup: Augmentation for Point Clouds	Yunlu Chen; Vincent Tao Hu; Efstratios Gavves; Thomas Mensink; Pascal Mettes; Pengwan Yang; Cees G. M. Snoek;	In this paper, we define data augmentation between point clouds as a shortest path linear interpolation.
113	Identity-Guided Human Semantic Parsing for Person Re-Identification	Kuan Zhu; Haiyun Guo; Zhiwei Liu; Ming Tang; Jinqiao Wang;	In this paper, we propose the identity-guided human semantic parsing approach (ISP) to locate both the human body parts and personal belongings at pixel-level for aligned person re-ID only with person identity labels.
114	Learning Gradient Fields for Shape Generation	Ruojin Cai; Guandao Yang; Hadar Averbuch-Elor; Zekun Hao; Serge Belongie; Noah Snavely; Bharath Hariharan;	In this work, we propose a novel technique to generate shapes from point cloud data.
115	COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder	Kuniaki Saito; Kate Saenko; Ming-Yu Liu;	To address the issue, we propose a new few-shot image translation model, COCO-FUNIT, which computes the style embedding of the example images conditioned on the input image and a new module called the constant style bias.
116	Corner Proposal Network for Anchor-free, Two-stage Object Detection	Kaiwen Duan; Lingxi Xie; Honggang Qi; Song Bai; Qingming Huang; Qi Tian;	This paper proposes a novel anchor-free, two-stage framework which first extracts a number of object proposals by finding potential corner keypoint combinations and then assigns a class label to each proposal by a standalone classification stage.
117	PhraseClick: Toward Achieving Flexible Interactive Segmentation by Phrase and Click	Henghui Ding; Scott Cohen; Brian Price; Xudong Jiang;	We propose to employ phrase expressions as another interaction input to infer the attributes of target object.
118	Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing	Yapeng Tian; Dingzeyu Li; Chenliang Xu;	In this paper, we introduce a new problem, named audio-visual video parsing, which aims to parse a video into temporal event segments and label them as either audible, visible, or both.
119	Learning Delicate Local Representations for Multi-Person Pose Estimation	Yuanhao Cai; Zhicheng Wang; Zhengxiong Luo; Binyi Yin; Angang Du; Haoqian Wang; Xiangyu Zhang; Xinyu Zhou; Erjin Zhou; Jian Sun;	In this paper, we propose a novel method called Residual Steps Network (RSN).
120	Learning to Plan with Uncertain Topological Maps	Edward Beeching; Jilles Dibangoye; Olivier Simonin; Christian Wolf;	Our main contribution is a data driven learning based approach for planning under uncertainty in topological maps, requiring an estimate of shortest paths in valued graphs with a probabilistic structure.
121	Neural Design Network: Graphic Layout Generation with Constraints	Hsin-Ying Lee; Lu Jiang; Irfan Essa; Phuong B Le; Haifeng Gong; Ming-Hsuan Yang; Weilong Yang;	We propose a method for design layout generation that can satisfy user-specified constraints.
122	Learning Open Set Network with Discriminative Reciprocal Points	Guangyao Chen; Limeng Qiao; Yemin Shi; Peixi Peng; Jia Li; Tiejun Huang; Shiliang Pu; Yonghong Tian;	In this paper, we propose a new concept, Reciprocal Point, which is the potential representation of the extra-class space corresponding to each known category.
123	Convolutional Occupancy Networks	Songyou Peng; Michael Niemeyer; Lars Mescheder; Marc Pollefeys; Andreas Geiger;	In this paper, we propose Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes.
124	Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry	He Chen; Pengfei Guo; Pengfei Li; Gim Hee Lee; Gregory Chirikjian;	In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
125	TIDE: A General Toolbox for Identifying Object Detection Errors	Daniel Bolya; Sean Foley; James Hays; Judy Hoffman;	We introduce TIDE, a framework and associated toolbox for analyzing the sources of error in object detection and instance segmentation algorithms.
126	PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding	Saining Xie; Jiatao Gu; Demi Guo; Charles R. Qi; Leonidas Guibas; Or Litany;	In this work, we aim at facilitating research on 3D representation learning.
127	DSA: More Efficient Budgeted Pruning via Differentiable Sparsity Allocation	Xuefei Ning; Tianchen Zhao; Wenshuo Li; Peng Lei; Yu Wang; Huazhong Yang;	In this paper, we propose Differentiable Sparsity Allocation (DSA), an efficient end-to-end budgeted pruning flow.
128	Circumventing Outliers of AutoAugment with Knowledge Distillation	Longhui Wei; An Xiao; Lingxi Xie; Xiaopeng Zhang; Xin Chen; Qi Tian;	This paper delves deep into the working mechanism, and reveals that AutoAugment may remove part of discriminative information from the training image and so insisting on the ground-truth label is no longer the best option.
129	S2DNet: Learning Image Features for Accurate Sparse-to-Dense Matching	Hugo Germain; Guillaume Bourmaud; Vincent Lepetit;	In this paper, we introduce S2DNet, a novel feature matching pipeline, designed and trained to efficiently establish both robust and accurate correspondences.
130	RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving	Peixuan Li; Huaici Zhao; Pengfei Liu; Feidao Cao;	In this work, we propose an efficient and accurate monocular 3D detection framework in single shot.
131	Video Object Segmentation with Episodic Graph Memory Networks	Xiankai Lu; Wenguan Wang; Martin Danelljan; Tianfei Zhou; Jianbing Shen; Luc Van Gool;	In this work, a graph memory network is developed to address the novel idea of “learning to update the segmentation model”.
132	Rethinking Bottleneck Structure for Efficient Mobile Network Design	Daquan Zhou; Qibin Hou; Yunpeng Chen; Jiashi Feng; Shuicheng Yan;	In this paper, we rethink the necessity of such design change and find it may bring risks of information loss and gradient confusion.
133	Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks	Jeffrey O. Zhang; Alexander Sax; Amir Zamir; Leonidas Guibas; Jitendra Malik;	The most commonly employed approaches for network adaptation are fine-tuning and using the pre-trained network as a fixed feature extractor, among others. In this paper, we propose a straightforward alternative: side-tuning.
134	Towards Part-aware Monocular 3D Human Pose Estimation: An Architecture Search Approach	Zerui Chen; Yan Huang; Hongyuan Yu; Bin Xue; Ke Han; Yiru Guo; Liang Wang;	To accurately estimate 3D poses of different body parts, we attempt to build a part-aware 3D pose estimator by searching a set of network architectures.
135	REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets	Angelina Wang; Arvind Narayanan; Olga Russakovsky;	Overall, the key aim of our work is to tackle the machine learning bias problem early in the pipeline.
136	Contrastive Learning for Weakly Supervised Phrase Grounding	Tanmay Gupta; Arash Vahdat; Gal Chechik; Xiaodong Yang; Jan Kautz; Derek Hoiem;	We show that phrase grounding can be learned by optimizing word-region attention to maximize a lower bound on mutual information between images and caption words.
137	Collaborative Learning of Gesture Recognition and 3D Hand Pose Estimation with Multi-Order Feature Analysis	Siyuan Yang; Jun Liu; Shijian Lu; Meng Hwa Er; Alex C. Kot;	In this paper, we present a novel collaborative learning network for joint gesture recognition and 3D hand pose estimation.
138	Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors	Zuxuan Wu; Ser-Nam Lim; Larry S. Davis; Tom Goldstein;	We present a systematic study of adversarial attacks on state-of-the-art object detection frameworks.
139	TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images	Jianxin Lin; Yingxue Pang; Yingce Xia; Zhibo Chen; Jiebo Luo;	In this paper, we argue that even if each domain contains a single image, UI2I can still be achieved.
140	Semi-Siamese Training for Shallow Face Learning	Hang Du; Hailin Shi; Yuchi Liu; Jun Wang; Zhen Lei; Dan Zeng; Tao Mei;	In this paper, we aim to address the problem by introducing a novel training method named Semi-Siamese Training (SST).
141	GAN Slimming: All-in-One GAN Compression by A Unified Optimization Framework	Haotao Wang; Shupeng Gui; Haichuan Yang; Ji Liu; Zhangyang Wang;	To this end, we propose the first end-to-end optimization framework combining multiple compression means for GAN compression, dubbed GAN Slimming (GS).
142	Human Interaction Learning on 3D Skeleton Point Clouds for Video Violence Recognition	Yukun Su; Guosheng Lin; Jinhui Zhu; Qingyao Wu;	This paper introduces a new method for recognizing violent behavior by learning contextual relationships between related people from human skeleton points.
143	Binarized Neural Network for Single Image Super Resolution	Jingwei Xin; Nannan Wang; Xinrui Jiang; Jie Li; Heng Huang; Xinbo Gao;	We propose a simple but effective binary neural network (BNN) based SISR model with a novel binarization scheme.
144	Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation	Huiyu Wang; Yukun Zhu; Bradley Green; Hartwig Adam; Alan Yuille; Liang-Chieh Chen;	In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions.
145	Adaptive Computationally Efficient Network for Monocular 3D Hand Pose Estimation	Zhipeng Fan; Jun Liu; Yao Wang;	In this paper, we investigate the problem of reducing the overall computation cost yet maintaining the high accuracy for 3D hand pose estimation from video sequences.
146	Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking	Jinlong Peng; Changan Wang; Fangbin Wan; Yang Wu; Yabiao Wang; Ying Tai; Chengjie Wang; Jilin Li; Feiyue Huang; Yanwei Fu;	Going beyond these sub-optimal frameworks, we propose a simple online model named Chained-Tracker (CTracker), which naturally integrates all the three subtasks into an end-to-end solution (the first as far as we know).
147	Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets	Tong Wu; Qingqiu Huang; Ziwei Liu; Yu Wang; Dahua Lin;	We present a new loss function called Distribution-Balanced Loss for the multi-label recognition problems that exhibit long-tailed class distributions.
148	Hamiltonian Dynamics for Real-World Shape Interpolation	Marvin Eisenberger; Daniel Cremers;	We revisit the classical problem of 3D shape interpolation and propose a novel, physically plausible approach based on Hamiltonian dynamics.
149	Learning to Scale Multilingual Representations for Vision-Language Tasks	Andrea Burns; Donghyun Kim; Derry Wijaya; Kate Saenko; Bryan A. Plummer;	In this paper, we propose a Scalable Multilingual Aligned Language Representation (SMALR) that supports many languages with few model parameters without sacrificing downstream task performance.
150	Multi-modal Transformer for Video Retrieval	Valentin Gabeur; Chen Sun; Karteek Alahari; Cordelia Schmid;	In this paper, we present a multi-modal transformer to jointly encode the different modalities in video, which allows each of them to attend to the others.
151	Feature Representation Matters: End-to-End Learning for Reference-based Image Super-resolution	Yanchun Xie; Jimin Xiao; Mingjie Sun; Chao Yao; Kaizhu Huang;	In this paper, we are aiming for a general reference-based super-resolution setting: it does not require the low-resolution image and the high-resolution reference image to be well aligned or with a similar texture.
152	RobustFusion: Human Volumetric Capture with Data-driven Visual Cues using a RGBD Camera	Zhuo Su; Lan Xu; Zerong Zheng; Tao Yu; Yebin Liu; Lu Fang;	In this paper, inspired by the huge potential of learning-based human modeling, we propose RobustFusion, a robust human performance capture system combined with various data-driven visual cues using a single RGBD camera.
153	Surface Normal Estimation of Tilted Images via Spatial Rectifier	Tien Do; Khiem Vuong; Stergios I. Roumeliotis; Hyun Soo Park;	In this paper, we present a spatial rectifier to estimate surface normals of tilted images.
154	Multimodal Shape Completion via Conditional Generative Adversarial Networks	Rundi Wu; Xuelin Chen; Yixin Zhuang; Baoquan Chen;	Hence, we pose a multi-modal shape completion problem, in which we seek to complete the partial shape with multiple outputs by learning a one-to-many mapping.
155	Generative Sparse Detection Networks for 3D Single-shot Object Detection	JunYoung Gwak; Christopher Choy; Silvio Savarese;	To this end, we propose Generative Sparse Detection Network (GSDN), a fully-convolutional single-shot sparse detection network that efficiently generates the support for object proposals.
156	Grounded Situation Recognition	Sarah Pratt; Mark Yatskar; Luca Weihs; Ali Farhadi; Aniruddha Kembhavi;	We introduce Grounded Situation Recognition (GSR), a task that requires producing structured semantic summaries of images describing: the primary activity, entities engaged in the activity with their roles (e.g. agent, tool), and bounding-box groundings of entities.
157	Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos	Shaoxiang Chen; Wenhao Jiang; Wei Liu; Yu-Gang Jiang;	Inspired by the fact that there exist cross-modal interactions in the human brain, we propose a novel method for learning pairwise modality interactions in order to better exploit complementary information for each pair of modalities in videos and thus improve performances on both tasks.
158	Unpaired Learning of Deep Image Denoising	Xiaohe Wu; Ming Liu; Yue Cao; Dongwei Ren; Wangmeng Zuo;	We investigate the task of learning blind image denoising networks from an unpaired set of clean and noisy images.
159	Self-supervising Fine-grained Region Similarities for Large-scale Image Localization	Yixiao Ge; Haibo Wang; Feng Zhu; Rui Zhao; Hongsheng Li;	To tackle this challenge, we propose to self-supervise image-to-region similarities in order to fully explore the potential of difficult positive images alongside their sub-regions.
160	Rotationally-Temporally Consistent Novel View Synthesis of Human Performance Video	Youngjoong Kwon; Stefano Petrangeli; Dahun Kim; Haoliang Wang; Eunbyung Park; Viswanathan Swaminathan; Henry Fuchs;	To tackle these challenges, we introduce a human-specific framework that employs a learned 3D-aware representation.
161	Side-Aware Boundary Localization for More Precise Object Detection	Jiaqi Wang; Wenwei Zhang; Yuhang Cao; Kai Chen; Jiangmiao Pang; Tao Gong; Jianping Shi; Chen Change Loy; Dahua Lin;	In this paper, we propose an alternative approach, named Side-Aware Boundary Localization (SABL), where each side of the bounding box is localized with its own dedicated network branch.
162	SF-Net: Single-Frame Supervision for Temporal Action Localization	Fan Ma; Linchao Zhu; Yi Yang; Shengxin Zha; Gourab Kundu; Matt Feiszli; Zheng Shou;	In this paper, we study an intermediate form of supervision, i.e., single-frame supervision, for temporal action localization (TAL).
163	Negative Margin Matters: Understanding Margin in Few-shot Classification	Bin Liu; Yue Cao; Yutong Lin; Qi Li; Zheng Zhang; Mingsheng Long; Han Hu;	In this paper, we unconventionally propose to adopt appropriate negative-margin to softmax loss for few-shot classification, which surprisingly works well for the open-set scenarios of few-shot classification.
164	Particularity beyond Commonality: Unpaired Identity Transfer with Multiple References	Ruizheng Wu; Xin Tao; Yingcong Chen; Xiaoyong Shen; Jiaya Jia;	We accordingly propose a new multi-reference identity transfer framework by simultaneously making use of particularity and commonality of reference.
165	Tracking Objects as Points	Xingyi Zhou; Vladlen Koltun; Philipp Krähenbühl;	In this paper, we present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art.
166	CPGAN: Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis	Jiadong Liang; Wenjie Pei; Feng Lu;	In this paper we circumvent this problem by focusing on parsing the content of both the input text and the synthesized image thoroughly to model the text-to-image consistency in the semantic level.
167	Transporting Labels via Hierarchical Optimal Transport for Semi-Supervised Learning	Fariborz Taherkhani; Ali Dabouei; Sobhan Soleymani; Jeremy Dawson; Nasser M. Nasrabadi;	In this work, we consider the general setting of the SSL problem for image classification, where the labeled and unlabeled data come from the same underlying distribution.
168	MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning	Simon Vandenhende; Stamatios Georgoulis; Luc Van Gool;	In this paper, we argue about the importance of considering task interactions at multiple scales when distilling task information in a multi-task learning setup.
169	Learning to Factorize and Relight a City	Andrew Liu; Shiry Ginosar; Tinghui Zhou; Alexei A. Efros; Noah Snavely;	We propose a learning-based framework for disentangling outdoor scenes into temporally-varying illumination and permanent scene factors.
170	Region Graph Embedding Network for Zero-Shot Learning	Guo-Sen Xie; Li Liu; Fan Zhu; Fang Zhao; Zheng Zhang; Yazhou Yao; Jie Qin; Ling Shao;	In this paper, to model the relations among local image regions, we incorporate the region-based relation reasoning into ZSL.
171	GRAB: A Dataset of Whole-Body Human Grasping of Objects	Omid Taheri; Nima Ghorbani; Michael J. Black; Dimitrios Tzionas;	Thus, we collect a new dataset, called GRAB (GRasping Actions with Bodies), of whole-body grasps, containing full 3D shape and pose sequences of 10 subjects interacting with 51 everyday objects of varying shape and size.
172	DEMEA: Deep Mesh Autoencoders for Non-Rigidly Deforming Objects	Edgar Tretschk; Ayush Tewari; Michael Zollhöfer; Vladislav Golyanik; Christian Theobalt;	We propose a general-purpose DEep MEsh Autoencoder (DEMEA) which adds a novel embedded deformation layer to a graph-convolutional mesh autoencoder.
173	RANSAC-Flow: Generic Two-stage Image Alignment	Xi Shen; François Darmon; Alexei A. Efros; Mathieu Aubry;	We propose a two-stage process: first, a feature-based parametric coarse alignment using one or more homographies, followed by non-parametric fine pixel-wise alignment.
174	Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds	Arun Balajee Vasudevan; Dengxin Dai; Luc Van Gool;	We propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight professional binaural microphones and a $360^{
175	Neural Object Learning for 6D Pose Estimation Using a Few Cluttered Images	Kiru Park; Timothy Patten; Markus Vincze;	This paper proposes a method, Neural Object Learning (NOL), that creates synthetic images of objects in arbitrary poses by combining only a few observations from cluttered images.
176	Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking	Jianfeng Yan; Zizhuang Wei; Hongwei Yi; Mingyu Ding; Runze Zhang; Yisong Chen; Guoping Wang; Yu-Wing Tai;	In this paper, we propose an efficient and effective dense hybrid recurrent multi-view stereo net with dynamic consistency checking, namely D²HC-RMVSNet, for accurate dense point cloud reconstruction.
177	Pixel-Pair Occlusion Relationship Map (P2ORM): Formulation, Inference & Application	Xuchong Qiu; Yang Xiao; Chaohui Wang; Renaud Marlet;	The former provides a way to generate large-scale accurate occlusion datasets while, based on the latter, we propose a novel method for task-independent pixel-level occlusion relationship estimation from single images.
178	MovieNet: A Holistic Dataset for Movie Understanding	Qingqiu Huang; Yu Xiong; Anyi Rao; Jiaze Wang; Dahua Lin;	In this paper, we introduce MovieNet — a holistic dataset for movie understanding.
179	Short-Term and Long-Term Context Aggregation Network for Video Inpainting	Ang Li; Shanshan Zhao; Xingjun Ma; Mingming Gong; Jianzhong Qi; Rui Zhang; Dacheng Tao; Ramamohanarao Kotagiri;	In this work, we present a novel context aggregation network to effectively exploit both short-term and long-term frame information for video inpainting.
180	DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization	Juan Du; Rui Wang; Daniel Cremers;	For relocalization in large-scale point clouds, we propose the first approach that unifies global place recognition and local 6DoF pose refinement.
181	Face Super-Resolution Guided by 3D Facial Priors	Xiaobin Hu; Wenqi Ren; John LaMaster; Xiaochun Cao; Xiaoming Li; Zechao Li; Bjoern Menze; Wei Liu;	In this paper, we propose a novel face super-resolution method that explicitly incorporates 3D facial priors which grasp the sharp facial structures.
182	Label Propagation with Augmented Anchors: A Simple Semi-Supervised Learning baseline for Unsupervised Domain Adaptation	Yabin Zhang; Bin Deng; Kui Jia; Lei Zhang;	In this work, we take a step further to study the proper extensions of SSL techniques for UDA.
183	Are Labels Necessary for Neural Architecture Search?	Chenxi Liu; Piotr Dollár; Kaiming He; Ross Girshick; Alan Yuille; Saining Xie;	In this paper, we ask the question: can we find high-quality neural architectures using only images, but no human-annotated labels?
184	BLSM: A Bone-Level Skinned Model of the Human Mesh	Haoyang Wang; Riza Alp Güler; Iasonas Kokkinos; George Papandreou; Stefanos Zafeiriou;	We introduce BLSM, a bone-level skinned model of the human body mesh where bone scales are set prior to template synthesis, rather than the common, inverse practice.
185	Associative Alignment for Few-shot Image Classification	Arman Afrasiyabi; Jean-François Lalonde; Christian Gagné;	This paper proposes the idea of associative alignment for leveraging part of the base data by aligning the novel training instances to the closely related ones in the base training set.
186	Cyclic Functional Mapping: Self-supervised Correspondence between Non-isometric Deformable Shapes	Dvir Ginzburg; Dan Raviv;	We present the first utterly self-supervised network for dense correspondence mapping between non-isometric shapes.
187	View-Invariant Probabilistic Embedding for Human Pose	Jennifer J. Sun; Jiaping Zhao; Liang-Chieh Chen; Florian Schroff; Hartwig Adam; Ting Liu;	In this paper, we propose an approach for learning a compact view-invariant embedding space from 2D joint keypoints alone, without explicitly predicting 3D poses.
188	Contact and Human Dynamics from Monocular Video	Davis Rempe; Leonidas J. Guibas; Aaron Hertzmann; Bryan Russell; Ruben Villegas; Jimei Yang;	In this paper, we present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
189	PointPWC-Net: Cost Volume on Point Clouds for (Self-)Supervised Scene Flow Estimation	Wenxuan Wu; Zhi Yuan Wang; Zhuwen Li; Wei Liu; Li Fuxin;	We propose a novel end-to-end deep scene flow model, called PointPWC-Net, that directly processes 3D point cloud scenes with large motions in a coarse-to-fine fashion.
190	Points2Surf: Learning Implicit Surfaces from Point Clouds	Philipp Erler; Paul Guerrero; Stefan Ohrhallinger; Niloy J. Mitra; Michael Wimmer;	We present Points2Surf, a novel patch-based learning framework that produces accurate surfaces directly from raw scans without normals.
191	Few-Shot Scene-Adaptive Anomaly Detection	Yiwei Lu; Frank Yu; Mahesh Kumar Krishna Reddy; Yang Wang;	In this paper, we propose a novel few-shot scene-adaptive anomaly detection problem to address the limitations of previous approaches.
192	Personalized Face Modeling for Improved Face Reconstruction and Motion Retargeting	Bindita Chaudhuri; Noranart Vesdapunt; Linda Shapiro; Baoyuan Wang;	We propose an end-to-end framework that jointly learns a personalized face model per user and per-frame facial motion parameters from a large corpus of in-the-wild videos of user expressions.
193	Entropy Minimisation Framework for Event-based Vision Model Estimation	Urbano Miguel Nunes; Yiannis Demiris;	We propose a novel EMin framework for event-based vision model estimation.
194	Reconstructing NBA Players	Luyang Zhu; Konstantinos Rematas; Brian Curless; Steven M. Seitz; Ira Kemelmacher-Shlizerman;	Based on these models, we introduce a new method that takes as input a single photo of a clothed player performing any basketball pose and outputs a high resolution mesh and pose of that player.
195	PIoU Loss: Towards Accurate Oriented Object Detection in Complex Environments	Zhiming Chen; Kean Chen; Weiyao Lin; John See; Hui Yu; Yan Ke; Cong Yang;	Therefore, a novel loss, Pixels-IoU (PIoU) Loss, is formulated to exploit both the angle and IoU for accurate OBB regression.
196	TENet: Triple Excitation Network for Video Salient Object Detection	Sucheng Ren; Chu Han; Xin Yang; Guoqiang Han; Shengfeng He;	In this paper, we propose a simple yet effective approach, named Triple Excitation Network, to reinforce the training of video salient object detection (VSOD) from three aspects, spatial, temporal, and online excitations.
197	Deep Feedback Inverse Problem Solver	Wei-Chiu Ma; Shenlong Wang; Jiayuan Gu; Sivabalan Manivasagam; Antonio Torralba; Raquel Urtasun;	We present an efficient, effective, and generic approach towards solving inverse problems.
198	Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification	Liuyu Xiang; Guiguang Ding; Jungong Han;	In this paper, we propose a novel self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
199	Hallucinating Visual Instances in Total Absentia	Jiayan Qiu; Yiding Yang; Xinchao Wang; Dacheng Tao;	In this paper, we investigate a new visual restoration task, termed hallucinating visual instances in total absentia (HVITA).
200	Weakly-supervised 3D Shape Completion in the Wild	Jiayuan Gu; Wei-Chiu Ma; Sivabalan Manivasagam; Wenyuan Zeng; Zihao Wang; Yuwen Xiong; Hao Su; Raquel Urtasun;	To this end, we propose a weakly-supervised method to estimate both 3D canonical shape and 6-DoF pose for alignment, given multiple partial observations associated with the same instance.
201	DTVNet: Dynamic Time-lapse Video Generation via Single Still Image	Jiangning Zhang; Chao Xu; Liang Liu; Mengmeng Wang; Xia Wu; Yong Liu; Yunliang Jiang;	This paper presents a novel end-to-end dynamic time-lapse video generation framework, named DTVNet, to generate diversified time-lapse videos from a single landscape image, which are conditioned on normalized motion vectors.
202	CLIFFNet for Monocular Depth Estimation with Hierarchical Embedding Loss	Lijun Wang; Jianming Zhang; Yifan Wang; Huchuan Lu; Xiang Ruan;	This paper proposes a hierarchical loss for monocular depth estimation, which measures the differences between the prediction and ground truth in hierarchical embedding spaces of depth maps.
203	Collaborative Video Object Segmentation by Foreground-Background Integration	Zongxin Yang; Yunchao Wei; Yi Yang;	This paper investigates the principles of embedding learning to tackle the challenging semi-supervised video object segmentation.
204	Adaptive Margin Diversity Regularizer for handling Data Imbalance in Zero-Shot SBIR	Titir Dutta; Anurag Singh; Soma Biswas;	Since most real-world training data have a fair amount of imbalance, in this work, for the first time in the literature, we extensively study the effect of training data imbalance on the generalization to unseen categories, with ZS-SBIR as the application area.
205	ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation	Xucong Zhang; Seonwook Park; Thabo Beeler; Derek Bradley; Siyu Tang; Otmar Hilliges;	In this paper, we propose a new gaze estimation dataset called ETH-XGaze, consisting of over one million high-resolution images of varying gaze under extreme head poses.
206	Calibration-free Structure-from-Motion with Calibrated Radial Trifocal Tensors	Viktor Larsson; Nicolas Zobernig; Kasim Taskin; Marc Pollefeys;	In this paper we consider the problem of Structure-from-Motion from images with unknown intrinsic calibration.
207	Occupancy Anticipation for Efficient Exploration and Navigation	Santhosh K. Ramakrishnan; Ziad Al-Halah; Kristen Grauman;	We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
208	Unified Image and Video Saliency Modeling	Richard Droste; Jianbo Jiao; J. Alison Noble;	To address this we propose four novel domain adaptation techniques – Domain-Adaptive Priors, Domain-Adaptive Fusion, Domain-Adaptive Smoothing and Bypass-RNN – in addition to an improved formulation of learned Gaussian priors.
209	TAO: A Large-Scale Benchmark for Tracking Any Object	Achal Dave; Tarasha Khurana; Pavel Tokmakov; Cordelia Schmid; Deva Ramanan;	To bridge this gap, we introduce a similarly diverse dataset for Tracking Any Object (TAO).
210	A Generalization of Otsu’s Method and Minimum Error Thresholding	Jonathan T. Barron;	We present Generalized Histogram Thresholding (GHT), a simple, fast, and effective technique for histogram-based image thresholding.
211	A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks	Unnat Jain; Luca Weihs; Eric Kolve; Ali Farhadi; Svetlana Lazebnik; Aniruddha Kembhavi; Alexander Schwing;	Addressing this, we introduce the novel task FurnMove in which agents work together to move a piece of furniture through a living room to a goal.
212	Big Transfer (BiT): General Visual Representation Learning	Alexander Kolesnikov; Lucas Beyer; Xiaohua Zhai; Joan Puigcerver; Jessica Yung; Sylvain Gelly; Neil Houlsby;	We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT).
213	VisualCOMET: Reasoning about the Dynamic Context of a Still Image	Jae Sung Park; Chandra Bhagavatula; Roozbeh Mottaghi; Ali Farhadi; Yejin Choi;	We propose VisualCOMET, a novel framework of visual common-sense reasoning tasks to predict events that might have happened before, events that might happen next, and the intents of the people at present.
214	Few-shot Action Recognition with Permutation-invariant Attention	Hongguang Zhang; Li Zhang; Xiaojuan Qi; Hongdong Li; Philip H. S. Torr; Piotr Koniusz;	Many few-shot learning models focus on recognising images. In contrast, we tackle a challenging task of few-shot action recognition from videos.
215	Character Grounding and Re-Identification in Story of Videos and Text Descriptions	Youngjae Yu; Jongseok Kim; Heeseung Yun; Jiwan Chung; Gunhee Kim;	In order to solve these related tasks in a mutually rewarding way, we propose a model named Character in Story Identification Network (CiSIN).
216	AABO: Adaptive Anchor Box Optimization for Object Detection via Bayesian Sub-sampling	Wenshuo Ma; Tingzhong Tian; Hang Xu; Yimin Huang; Zhenguo Li;	In this paper, we study the problem of automatically optimizing anchor boxes for object detection.
217	Learning Visual Context by Comparison	Minchul Kim; Jongchan Park; Seil Na; Chang Min Park; Donggeun Yoo;	In this paper, we present Attend-and-Compare Module (ACM) for capturing the difference between an object of interest and its corresponding context.
218	Large Scale Holistic Video Understanding	Ali Diba; Mohsen Fayyaz; Vivek Sharma; Manohar Paluri; Jürgen Gall; Rainer Stiefelhagen; Luc Van Gool;	We fill this gap by presenting a large-scale “Holistic Video Understanding Dataset” (HVU).
219	Indirect Local Attacks for Context-aware Semantic Segmentation Networks	Krishna Kanth Nakka; Mathieu Salzmann;	To this end, we introduce an indirect attack strategy, namely adaptive local attacks, aiming to find the best image location to perturb, while preserving the labels at this location and producing a realistic-looking segmentation map.
220	Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings	Anita Rau; Guillermo Garcia-Hernando; Danail Stoyanov; Gabriel J. Brostow; Daniyar Turmukhambetov;	While we don’t obviate the need for geometric verification, we propose an interpretable image-embedding that cuts the search in scale space to essentially a lookup.
221	Connecting Vision and Language with Localized Narratives	Jordi Pont-Tuset; Jasper Uijlings; Soravit Changpinyo; Radu Soricut; Vittorio Ferrari;	We propose Localized Narratives, a new form of multimodal image annotations connecting vision and language.
222	Adversarial T-shirt! Evading Person Detectors in A Physical World	Kaidi Xu; Gaoyuan Zhang; Sijia Liu; Quanfu Fan; Mengshu Sun; Hongge Chen; Pin-Yu Chen; Yanzhi Wang; Xue Lin;	In this work, we proposed adversarial T-shirts, a robust physical adversarial example for evading person detectors even if it could undergo non-rigid deformation due to a moving person’s pose changes.
223	Bounding-box Channels for Visual Relationship Detection	Sho Inayoshi; Keita Otani; Antonio Tejero-de-Pablos; Tatsuya Harada;	In this paper, we propose the bounding-box channels, a novel architecture capable of relating the semantic, spatial, and image features strongly.
224	Minimal Rolling Shutter Absolute Pose with Unknown Focal Length and Radial Distortion	Zuzana Kukelova; Cenek Albl; Akihiro Sugimoto; Konrad Schindler; Tomas Pajdla;	We present the first minimal solutions for the absolute pose of a rolling shutter camera with unknown rolling shutter parameters, focal length, and radial distortion.
225	SRFlow: Learning the Super-Resolution Space with Normalizing Flow	Andreas Lugmayr; Martin Danelljan; Luc Van Gool; Radu Timofte;	In this work, we therefore propose SRFlow: a normalizing flow based super-resolution method capable of learning the conditional distribution of the output given the low-resolution input.
226	DeepGMR: Learning Latent Gaussian Mixture Models for Registration	Wentao Yuan; Benjamin Eckart; Kihwan Kim; Varun Jampani; Dieter Fox; Jan Kautz;	In this paper, we introduce Deep Gaussian Mixture Registration (DeepGMR), the first learning-based registration method that explicitly leverages a probabilistic registration paradigm by formulating registration as the minimization of KL-divergence between two probability distributions modeled as mixtures of Gaussians.
227	Active Perception using Light Curtains for Autonomous Driving	Siddharth Ancha; Yaadhav Raaj; Peiyun Hu; Srinivasa G. Narasimhan; David Held;	In this work, we propose a method for 3D object recognition using light curtains, a resource-efficient active sensor that measures depth at selected locations in the environment in a controllable manner.
228	Invertible Neural BRDF for Object Inverse Rendering	Zhe Chen; Shohei Nobuhara; Ko Nishino;	We introduce a novel neural network-based BRDF model and a Bayesian framework for object inverse rendering, i.e., joint estimation of reflectance and natural illumination from a single image of an object of known geometry.
229	Semi-supervised Semantic Segmentation via Strong-weak Dual-branch Network	Wenfeng Luo; Meng Yang;	To fully explore the potential of the weak labels, we propose to impose separate treatments of strong and weak annotations via a strong-weak dual-branch network, which discriminates the massive inaccurate weak supervisions from those strong ones.
230	Practical Deep Raw Image Denoising on Mobile Devices	Yuzhi Wang; Haibin Huang; Qin Xu; Jiaming Liu; Yiqun Liu; Jue Wang;	In this work, we propose a light-weight, efficient neural network-based raw image denoiser that runs smoothly on mainstream mobile devices, and produces high quality denoising results.
231	SoundSpaces: Audio-Visual Navigation in 3D Environments	Changan Chen; Unnat Jain; Carl Schissler; Sebastia Vicenc Amengual Gari; Ziad Al-Halah; Vamsi Krishna Ithapu; Philip Robinson; Kristen Grauman;	We introduce audio-visual navigation for complex, acoustically and visually realistic 3D environments.
232	Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization	Yuanhao Zhai; Le Wang; Wei Tang; Qilin Zhang; Junsong Yuan; Gang Hua;	In this paper, we present a Two-Stream Consensus Network (TSCN) to simultaneously address these challenges.
233	Erasing Appearance Preservation in Optimization-based Smoothing	Lvmin Zhang; Chengze Li; Yi Ji; Chunping Liu; Tien-tsin Wong;	In this paper, we call this manipulation Erasing Appearance Preservation (EAP).
234	Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler	Tsu-Jui Fu; Xin Eric Wang; Matthew F. Peterson; Scott T. Grafton; Miguel P. Eckstein; William Yang Wang;	We propose an adversarial-driven counterfactual reasoning model that can consider effective conditions instead of low-quality augmented data.
235	Guided Deep Decoder: Unsupervised Image Pair Fusion	Tatsumi Uezato; Danfeng Hong; Naoto Yokoya; Wei He;	To address this limitation, in this study, we propose a guided deep decoder network as a general prior.
236	Filter Style Transfer between Photos	Jonghwa Yim; Jisung Yoo; Won-joon Do; Beomsu Kim; Jihwan Choe;	In this paper, we introduce a new concept of style transfer, Filter Style Transfer (FST).
237	JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image	Linpu Fang; Xingyan Liu; Li Liu; Hang Xu; Wenxiong Kang;	In this paper, a novel pixel-wise prediction-based method is proposed to address the above issues.
238	Dynamic Group Convolution for Accelerating Convolutional Neural Networks	Zhuo Su; Linpu Fang; Wenxiong Kang; Dewen Hu; Matti Pietikäinen; Li Liu;	In this paper, we propose dynamic group convolution (DGC) that adaptively selects which part of input channels to be connected within each group for individual samples on the fly.
239	RD-GAN: Few/Zero-Shot Chinese Character Style Transfer via Radical Decomposition and Rendering	Yaoxiong Huang; Mengchao He; Lianwen Jin; Yongpan Wang;	In this paper, a novel radical decomposition-and-rendering-based GAN (RD-GAN) is proposed to utilize the radical-level compositions of Chinese characters and achieve few-shot/zero-shot Chinese character style transfer.
240	Object-Contextual Representations for Semantic Segmentation	Yuhui Yuan; Xilin Chen; Jingdong Wang;	In this paper, we address the semantic segmentation problem with a focus on the context aggregation strategy.
241	Efficient Spatio-Temporal Recurrent Neural Network for Video Deblurring	Zhihang Zhong; Ye Gao; Yinqiang Zheng; Bo Zheng;	To improve the network efficiency, we adopt residual dense blocks into RNN cells, so as to efficiently extract the spatial features of the current frame.
242	Joint Semantic Instance Segmentation on Graphs with the Semantic Mutex Watershed	Steffen Wolf; Yuyan Li; Constantin Pape; Alberto Bailoni; Anna Kreshuk; Fred A. Hamprecht;	We propose a greedy algorithm for joint graph partitioning and labeling derived from the efficient Mutex Watershed partitioning algorithm.
243	Photon-Efficient 3D Imaging with A Non-Local Neural Network	Jiayong Peng; Zhiwei Xiong; Xin Huang; Zheng-Ping Li; Dong Liu; Feihu Xu;	In this paper, we first analyze the long-range correlations in both spatial and temporal dimensions of the measurements. Then we propose a non-local neural network for depth reconstruction by exploiting the long-range correlations.
244	GeLaTO: Generative Latent Textured Objects	Ricardo Martin-Brualla; Rohit Pandey; Sofien Bouaziz; Matthew Brown; Dan B Goldman;	Inspired by billboards and geometric proxies used in computer graphics, this paper proposes Generative Latent Textured Objects (GeLaTO), a compact representation that combines a set of coarse shape proxies defining low frequency geometry with learned neural textures, to encode both medium and fine scale geometry as well as view-dependent appearance.
245	Improving Vision-and-Language Navigation with Image-Text Pairs from the Web	Arjun Majumdar; Ayush Shrivastava; Stefan Lee; Peter Anderson; Devi Parikh; Dhruv Batra;	Specifically, we develop VLN-BERT, a visiolinguistic transformer-based model for scoring the compatibility between an instruction (‘…stop at the brown sofa’) and a trajectory of panoramic RGB images captured by the agent.
246	Directional Temporal Modeling for Action Recognition	Xinyu Li; Bing Shuai; Joseph Tighe;	In this paper, we introduce a channel independent directional convolution (CIDC) operation, which learns to model the temporal evolution among local features.
247	Shonan Rotation Averaging: Global Optimality by Surfing SO(p)^n	Frank Dellaert; David M. Rosen; Jing Wu; Robert Mahony; Luca Carlone;	Our method employs semidefinite relaxation in order to recover provably globally optimal solutions of the rotation averaging problem.
248	Semantic Curiosity for Active Visual Learning	Devendra Singh Chaplot; Helen Jiang; Saurabh Gupta; Abhinav Gupta;	In this paper, we study the task of embodied interactive learning for object detection.
249	Multi-Temporal Recurrent Neural Networks For Progressive Non-Uniform Single Image Deblurring With Incremental Temporal Training	Dongwon Park; Dong Un Kang; Jisoo Kim; Se Young Chun;	To realize the MT approach, we propose progressive deblurring over iterations and incremental temporal training with temporally augmented training data.
250	ProgressFace: Scale-Aware Progressive Learning for Face Detection	Jiashu Zhu; Dong Li; Tiantian Han; Lu Tian; Yi Shan;	In this work, we propose a novel scale-aware progressive training mechanism to address large scale variations across faces.
251	Learning Multi-layer Latent Variable Model via Variational Optimization of Short Run MCMC for Approximate Inference	Erik Nijkamp; Bo Pang; Tian Han; Linqi Zhou; Song-Chun Zhu; Ying Nian Wu;	In this paper, we propose to use noise initialized non-persistent short run MCMC, such as finite step Langevin dynamics initialized from the prior distribution of the latent variables, as an approximate inference engine, where the step size of the Langevin dynamics is variationally optimized by minimizing the Kullback-Leibler divergence between the distribution produced by the short run MCMC and the posterior distribution.
252	CoTeRe-Net: Discovering Collaborative Ternary Relations in Videos	Zhensheng Shi; Cheng Guan; Liangjie Cao; Qianqian Li; Ju Liang; Zhaorui Gu; Haiyong Zheng; Bing Zheng;	In this paper, we propose a novel relation model that discovers relations of both implicit and explicit cues as well as their collaboration in videos.
253	Modeling the Effects of Windshield Refraction for Camera Calibration	Frank Verbiest; Marc Proesmans; Luc Van Gool;	In this paper, we study the effects of windshield refraction for autonomous driving applications.
254	Unsupervised Domain Adaptation for Semantic Segmentation of NIR Images through Generative Latent Search	Prashant Pandey; Aayush Kumar Tyagi; Sameer Ambekar; Prathosh AP;	We propose a method for target-independent segmentation where the ‘nearest-clone’ of a target image in the source domain is searched and used as a proxy in the segmentation network trained only on the source domain.
255	PROFIT: A Novel Training Method for sub-4-bit MobileNet Models	Eunhyeok Park; Sungjoo Yoo;	In this work, we report that the activation instability induced by weight quantization (AIWQ) is the key obstacle to sub-4-bit quantization of mobile networks.
256	Visual Relation Grounding in Videos	Junbin Xiao; Xindi Shang; Xun Yang; Sheng Tang; Tat-Seng Chua;	In this paper, we explore a novel task named visual Relation Grounding in Videos (vRGV).
257	Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows	Andrei Zanfir; Eduard Gabriel Bazavan; Hongyi Xu; William T. Freeman; Rahul Sukthankar; Cristian Sminchisescu;	In this paper we present new priors as well as large-scale weakly supervised models for 3D human pose and shape estimation.
258	Controlling Style and Semantics in Weakly-Supervised Image Generation	Dario Pavllo; Aurelien Lucchi; Thomas Hofmann;	We propose a weakly-supervised approach for conditional image generation of complex scenes where a user has fine control over objects appearing in the scene.
259	Jointly learning visual motion and confidence from local patches in event cameras	Daniel R. Kepple; Daewon Lee; Colin Prepsius; Volkan Isler; Il Memming Park; Daniel D. Lee;	We propose the first network to jointly learn visual motion and confidence from events in spatially local patches.
260	SODA: Story Oriented Dense Video Captioning Evaluation Framework	Soichiro Fujita; Tsutomu Hirao; Hidetaka Kamigaito; Manabu Okumura; Masaaki Nagata;	This paper proposes a new evaluation framework, Story Oriented Dense video cAptioning evaluation framework (SODA), for measuring the performance of video story description systems.
261	Sketch-Guided Object Localization in Natural Images	Aditay Tripathi; Rajath R. Dani; Anand Mishra; Anirban Chakraborty;	We introduce a novel problem of localizing all the instances of an object (seen or unseen during training) in a natural image via sketch query.
262	A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses	Malik Boudiaf; Jérôme Rony; Imtiaz Masud Ziko; Eric Granger; Marco Pedersoli; Pablo Piantanida; Ismail Ben Ayed;	However, we provide a theoretical analysis that links the cross-entropy to several well-known and recent pairwise losses.
263	Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models	Jize Cao; Zhe Gan; Yu Cheng; Licheng Yu; Yen-Chun Chen; Jingjing Liu;	To reveal the secrets behind the scene, we present VALUE (Vision-And-Language Understanding Evaluation), a set of meticulously designed probing tasks (e.g., Visual Coreference Resolution, Visual Relation Detection) generalizable to standard pre-trained V+L models, to decipher the inner workings of multimodal pre-training (e.g., implicit knowledge garnered in individual attention heads, inherent cross-modal alignment learned through contextualized multimodal embeddings).
264	The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement	William Peebles; John Peebles; Jun-Yan Zhu; Alexei Efros; Antonio Torralba;	In this paper, we propose the Hessian Penalty, a simple regularization function that encourages the input Hessian of a function to be diagonal.
265	STAR: Sparse Trained Articulated Human Body Regressor	Ahmed A. A. Osman; Timo Bolkart; Michael J. Black;	To address this, we define per-joint pose correctives and learn the subset of mesh vertices that are influenced by each joint movement. This sparse formulation results in more realistic deformations and significantly reduces the number of model parameters to 20% of SMPL.
266	Optical Flow Distillation: Towards Efficient and Stable Video Style Transfer	Xinghao Chen; Yiman Zhang; Yunhe Wang; Han Shu; Chunjing Xu; Chang Xu;	This paper proposes to learn a lightweight video style transfer network via knowledge distillation paradigm.
267	Collaboration by Competition: Self-coordinated Knowledge Amalgamation for Multi-talent Student Learning	Sihui Luo; Wenwen Pan; Xinchao Wang; Dazhou Wang; Haihong Tang; Mingli Song;	In this paper, we study how to reuse such heterogeneous pre-trained models as teachers, and build a versatile and compact student model, without accessing human annotations.
268	Do Not Disturb Me: Person Re-identification Under the Interference of Other Pedestrians	Shizhen Zhao; Changxin Gao; Jun Zhang; Hao Cheng; Chuchu Han; Xinyang Jiang; Xiaowei Guo; Wei-Shi Zheng; Nong Sang; Xing Sun;	To address this problem, this paper presents a novel deep network termed Pedestrian-Interference Suppression Network (PISNet).
269	Learning 3D Part Assembly from a Single Image	Yichen Li; Kaichun Mo; Lin Shao; Minhyuk Sung; Leonidas Guibas;	Towards this end, we introduce a novel problem, single-image-guided 3D part assembly, along with a learning-based solution.
270	PT2PC: Learning to Generate 3D Point Cloud Shapes from Part Tree Conditions	Kaichun Mo; He Wang; Xinchen Yan; Leonidas Guibas;	In order to learn such a conditional shape generation procedure in an end-to-end fashion, we propose a conditional GAN “part tree”-to-“point cloud” model (PT2PC) that disentangles the structural and geometric factors.
271	Highly Efficient Salient Object Detection with 100K Parameters	Shang-Hua Gao; Yong-Qiang Tan; Ming-Ming Cheng; Chengze Lu; Yunpeng Chen; Shuicheng Yan;	In this paper, we aim to relieve the contradiction between computation cost and model performance by improving the network efficiency to a higher degree.
272	HardGAN: A Haze-Aware Representation Distillation GAN for Single Image Dehazing	Qili Deng; Ziling Huang; Chung-Chi Tsai; Chia-Wen Lin;	In this paper, we present a Haze-Aware Representation Distillation Generative Adversarial Network named HardGAN for single-image dehazing.
273	Lifespan Age Transformation Synthesis	Roy Or-El; Soumyadip Sengupta; Ohad Fried; Eli Shechtman; Ira Kemelmacher-Shlizerman;	We propose a new multi domain image-to-image generative adversarial network architecture, whose learned latent space accurately models the continuous aging process in both directions.
274	Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation	Xingchao Peng; Yichen Li; Kate Saenko;	To describe and learn relations between different domains, we propose a novel Domain2Vec model to provide vectorial representations of visual domains based on joint learning of feature disentanglement and Gram matrix.
275	Simulating Content Consistent Vehicle Datasets with Attribute Descent	Yue Yao; Liang Zheng; Xiaodong Yang; Milind Naphade; Tom Gedeon;	We propose an attribute descent approach to let VehicleX approximate the attributes in real-world datasets.
276	Multiview Detection with Feature Perspective Transformation	Yunzhong Hou; Liang Zheng; Stephen Gould;	To address these questions, we introduce a novel multiview detector, MVDet.
277	Learning Object Relation Graph and Tentative Policy for Visual Navigation	Heming Du; Xin Yu; Liang Zheng;	Aiming to improve these two components, this paper proposes three complementary techniques: object relation graph (ORG), trial-driven imitation learning (IL), and a memory-augmented tentative policy network (TPN).
278	Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition	Chenyang Si; Xuecheng Nie; Wei Wang; Liang Wang; Tieniu Tan; Jiashi Feng;	To address these issues, we present Adversarial Self-Supervised Learning (ASSL), a novel framework that tightly couples SSL and the semi-supervised scheme via neighbor relation exploration and adversarial learning.
279	Across Scales & Across Dimensions: Temporal Super-Resolution using Deep Internal Learning	Liad Pollak Zuckerman; Eyal Naor; George Pisha; Shai Bagon; Michal Irani;	In this paper we propose a “Deep Internal Learning” approach for true TSR.
280	Inducing Optimal Attribute Representations for Conditional GANs	Binod Bhattarai; Tae-Kyun Kim;	We propose a novel end-to-end learning framework based on Graph Convolutional Networks to learn the attribute representations to condition the generator.
281	AR-Net: Adaptive Frame Resolution for Efficient Action Recognition	Yue Meng; Chung-Ching Lin; Rameswar Panda; Prasanna Sattigeri; Leonid Karlinsky; Aude Oliva; Kate Saenko; Rogerio Feris;	In this paper, we propose a novel approach, called AR-Net (Adaptive Resolution Network), that selects on-the-fly the optimal resolution for each frame conditioned on the input for efficient action recognition in long untrimmed videos.
282	Image-to-Voxel Model Translation for 3D Scene Reconstruction and Segmentation	Vladimir V. Kniaz; Vladimir A. Knyaz; Fabio Remondino; Artem Bordodymov; Petr Moshkantsev;	We propose a single shot image-to-semantic voxel model translation framework. We collected a SemanticVoxels dataset with 116k images, ground-truth semantic voxel models, depth maps, and 6D object poses.
283	Consistency Guided Scene Flow Estimation	Yuhua Chen; Luc Van Gool; Cordelia Schmid; Cristian Sminchisescu;	The model takes two temporal stereo pairs as input, and predicts disparity and scene flow.
284	Autoregressive Unsupervised Image Segmentation	Yassine Ouali; Céline Hudelot; Myriam Tami;	In this work, we propose a new unsupervised image segmentation approach based on mutual information maximization between different constructed views of the inputs.
285	Controllable Image Synthesis via SegVAE	Yen-Chi Cheng; Hsin-Ying Lee; Min Sun; Ming-Hsuan Yang;	In this work, we specifically target generating semantic maps given a label-set consisting of desired categories.
286	Off-Policy Reinforcement Learning for Efficient and Effective GAN Architecture Search	Yuan Tian; Qin Wang; Zhiwu Huang; Wen Li; Dengxin Dai; Minghao Yang; Jun Wang; Olga Fink;	In this paper, we introduce a new reinforcement learning (RL) based neural architecture search (NAS) methodology for effective and efficient generative adversarial network (GAN) architecture search.
287	Efficient Non-Line-of-Sight Imaging from Transient Sinograms	Mariko Isogawa; Dorian Chan; Ye Yuan; Kris Kitani; Matthew O’Toole;	We propose a circular and confocal non-line-of-sight (C²NLOS) scan that involves illuminating and imaging a common point, and scanning this point in a circular path along a wall.
288	Texture Hallucination for Large-Factor Painting Super-Resolution	Yulun Zhang; Zhifei Zhang; Stephen DiVerdi; Zhaowen Wang; Jose Echevarria; Yun Fu;	We aim to super-resolve digital paintings, synthesizing realistic details from high-resolution reference painting materials for very large scaling factors (e.g., 8×, 16×).
289	Learning Progressive Joint Propagation for Human Motion Prediction	Yujun Cai; Lin Huang; Yiwei Wang; Tat-Jen Cham; Jianfei Cai; Junsong Yuan; Jun Liu; Xu Yang; Yiheng Zhu; Xiaohui Shen; Ding Liu; Jing Liu; Nadia Magnenat Thalmann;	In this paper, we address this problem in three aspects. First, to capture the long-range spatial correlations and temporal dependencies, we apply a transformer-based architecture with the global attention mechanism.
290	Image Stitching and Rectification for Hand-Held Cameras	Bingbing Zhuang; Quoc-Huy Tran;	In this paper, we derive a new differential homography that can account for the scanline-varying camera poses in Rolling Shutter (RS) cameras, and demonstrate its application to carry out RS-aware image stitching and rectification at one stroke.
291	ParSeNet: A Parametric Surface Fitting Network for 3D Point Clouds	Gopal Sharma; Difan Liu; Subhransu Maji; Evangelos Kalogerakis; Siddhartha Chaudhuri; Radomír Měch;	We propose a novel, end-to-end trainable, deep network called ParSeNet.
292	The Group Loss for Deep Metric Learning	Ismail Elezi; Sebastiano Vascon; Alessandro Torcinovich; Marcello Pelillo; Laura Leal-Taixé;	We propose Group Loss, a loss function based on a differentiable label-propagation method that enforces embedding similarity across all samples of a group while promoting, at the same time, low-density regions amongst data points belonging to different groups.
293	Learning Object Depth from Camera Motion and Video Object Segmentation	Brent A. Griffin; Jason J. Corso;	To leverage this progress in 3D applications, this paper addresses the problem of learning to estimate the depth of segmented objects given some measurement of camera motion (e.g., from robot kinematics or vehicle odometry).
294	OnlineAugment: Online Data Augmentation with Less Domain Knowledge	Zhiqiang Tang; Yunhe Gao; Leonid Karlinsky; Prasanna Sattigeri; Rogerio Feris; Dimitris Metaxas;	In this work, we offer an orthogonal online data augmentation scheme together with three new augmentation networks, co-trained with the target learning task.
295	Learning Pairwise Inter-Plane Relations for Piecewise Planar Reconstruction	Yiming Qian; Yasutaka Furukawa;	This paper proposes a novel single-image piecewise planar reconstruction technique that infers and enforces inter-plane relationships.
296	Intra-class Feature Variation Distillation for Semantic Segmentation	Yukang Wang; Wei Zhou; Tao Jiang; Xiang Bai; Yongchao Xu;	In this paper, different from previous methods performing knowledge distillation for densely pairwise relations, we propose a novel intra-class feature variation distillation (IFVD) to transfer the intra-class feature variation (IFV) of the cumbersome model (teacher) to the compact model (student).
297	Temporal Distinct Representation Learning for Action Recognition	Junwu Weng; Donghao Luo; Yabiao Wang; Ying Tai; Chengjie Wang; Jilin Li; Feiyue Huang; Xudong Jiang; Junsong Yuan;	In this paper, we attempt to tackle this issue through two ways. 1) Design a sequential channel filtering mechanism, Progressive Enhancement Module (PEM), to excite the discriminative channels of features from different frames step by step, and thus avoid repeated information extraction. 2) Create a Temporal Diversity Loss (TD Loss) to force the kernels to concentrate on and capture the variations among frames rather than the image regions with similar appearance.
298	Representative Graph Neural Network	Changqian Yu; Yifan Liu; Changxin Gao; Chunhua Shen; Nong Sang;	In this paper, we present a Representative Graph (RepGraph) layer to dynamically sample a few representative features, which dramatically reduces redundancy.
299	Deformation-Aware 3D Model Embedding and Retrieval	Mikaela Angelina Uy; Jingwei Huang; Minhyuk Sung; Tolga Birdal; Leonidas Guibas;	We introduce a new problem of retrieving 3D models that are deformable to a given query shape and present a novel deep deformation-aware embedding to solve this retrieval task.
300	Atlas: End-to-End 3D Scene Reconstruction from Posed Images	Zak Murez; Tarrence van As; James Bartolozzi; Ayan Sinha; Vijay Badrinarayanan; Andrew Rabinovich;	We present an end-to-end 3D reconstruction of a scene by directly regressing a truncated signed distance function (TSDF) from a set of posed RGB images.
301	Multiple Class Novelty Detection Under Data Distribution Shift	Poojan Oza; Hien V. Nguyen; Vishal M. Patel;	To this end, we consider the problem of multiple class novelty detection under dataset distribution shift to improve the novelty detection performance.
302	Colorization of Depth Map via Disentanglement	Chung-Sheng Lai; Zunzhi You; Ching-Chun Huang; Yi-Hsuan Tsai; Wei-Chen Chiu;	In this paper, we propose a depth map colorization method via disentangling appearance and structure factors, so that our model could 1) learn depth-invariant appearance features from an appearance reference and 2) generate colorized images by combining a given depth map and the appearance feature obtained from any reference.
303	Beyond Controlled Environments: 3D Camera Re-Localization in Changing Indoor Scenes	Johanna Wald; Torsten Sattler; Stuart Golodetz; Tommaso Cavallari; Federico Tombari;	In this paper, we adapt 3RScan — a recently introduced indoor RGB-D dataset designed for object instance re-localization — to create RIO10, a new long-term camera re-localization benchmark focused on indoor scenes.
304	GeoGraph: Graph-based multi-view object detection with geometric cues end-to-end	Ahmed Samy Nassar; Stefano D’Aronco; Sébastien Lefèvre; Jan D. Wegner;	In this paper, we propose an end-to-end learnable approach that detects static urban objects from multiple views, re-identifies instances, and finally assigns a geographic position per object.
305	Localizing the Common Action Among a Few Videos	Pengwan Yang; Vincent Tao Hu; Pascal Mettes; Cees G. M. Snoek;	To address this task, we introduce a new 3D convolutional network architecture able to align representations from the support videos with the relevant query video segments.
306	TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification	Moshe Lichtenstein; Prasanna Sattigeri; Rogerio Feris; Raja Giryes; Leonid Karlinsky;	In this paper we propose yet another simple technique that is important for the few shot learning performance – a search for a compact feature sub-space that is discriminative for a given few-shot test task.
307	Traffic Accident Benchmark for Causality Recognition	Tackgeun You; Bohyung Han;	We propose a brand new benchmark for analyzing causality in traffic accident videos by decomposing an accident into a pair of events, cause and effect.
308	Face Anti-Spoofing with Human Material Perception	Zitong Yu; Xiaobai Li; Xuesong Niu; Jingang Shi; Guoying Zhao;	In this paper we rephrase face anti-spoofing as a material recognition problem and combine it with classical human material perception, intending to extract discriminative and robust features for FAS.
309	How Can I See My Future? FvTraj: Using First-person View for Pedestrian Trajectory Prediction	Huikun Bi; Ruisi Zhang; Tianlu Mao; Zhigang Deng; Zhaoqi Wang;	This work presents a novel First-person View based Trajectory predicting model (FvTraj) to estimate the future trajectories of pedestrians in a scene given their observed trajectories and the corresponding first-person view images.
310	Multiple Expert Brainstorming for Domain Adaptive Person Re-identification	Yunpeng Zhai; Qixiang Ye; Shijian Lu; Mengxi Jia; Rongrong Ji; Yonghong Tian;	In this paper, we propose a multiple expert brainstorming network (MEB-Net) for domain adaptive person re-ID, opening up a promising direction about model ensemble problem under unsupervised conditions.
311	NASA Neural Articulated Shape Approximation	Boyang Deng; JP Lewis; Timothy Jeruzalski; Gerard Pons-Moll; Geoffrey Hinton; Mohammad Norouzi; Andrea Tagliasacchi;	This paper introduces neural articulated shape approximation (NASA), an alternative framework that enables efficient representation of articulated deformable objects using neural indicator functions that are conditioned on pose.
312	Towards Unique and Informative Captioning of Images	Zeyu Wang; Berthy Feng; Karthik Narasimhan; Olga Russakovsky;	We find that modern captioning systems return higher likelihoods for incorrect distractor sentences compared to ground truth captions, and that evaluation metrics like SPICE can be ‘topped’ using simple captioning systems relying on object detectors.
313	When Does Self-supervision Improve Few-shot Learning?	Jong-Chyi Su; Subhransu Maji; Bharath Hariharan;	Based on this analysis we present a technique that automatically selects images for SSL from a large, generic pool of unlabeled images for a given dataset that provides further improvements.
314	Two-branch Recurrent Network for Isolating Deepfakes in Videos	Iacopo Masi; Aditya Killekar; Royston Marian Mascarenhas; Shenoy Pratik Gurudatt; Wael AbdAlmageed;	We present a method for deepfake detection based on a two-branch network structure that isolates digitally manipulated faces by learning to amplify artifacts while suppressing the high-level face content.
315	Incremental Few-Shot Meta-Learning via Indirect Discriminant Alignment	Qing Liu; Orchid Majumder; Alessandro Achille; Avinash Ravichandran; Rahul Bhotika; Stefano Soatto;	We propose a method to train a model so it can learn new classification tasks while improving with each task solved.
316	BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models	Jiahui Yu; Pengchong Jin; Hanxiao Liu; Gabriel Bender; Pieter-Jan Kindermans; Mingxing Tan; Thomas Huang; Xiaodan Song; Ruoming Pang; Quoc Le;	In this work, we propose BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to get good prediction accuracies.
317	Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation	Sheng Jin; Wentao Liu; Enze Xie; Wenhai Wang; Chen Qian; Wanli Ouyang; Ping Luo;	In this paper, we investigate a new perspective of human part grouping and reformulate it as a graph clustering task.
318	Global Distance-distributions Separation for Unsupervised Person Re-identification	Xin Jin; Cuiling Lan; Wenjun Zeng; Zhibo Chen;	To address this problem, we introduce a global distance-distributions separation (GDS) constraint over the two distributions to encourage the clear separation of positive and negative samples from a global view.
319	I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image	Gyeongsik Moon; Kyoung Mu Lee;	To resolve the above issues, we propose I2L-MeshNet, an image-to-lixel(line+pixel) prediction network.
320	Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose	Hongsuk Choi; Gyeongsik Moon; Kyoung Mu Lee;	To overcome the above weaknesses, we propose Pose2Mesh, a novel graph convolutional neural network (GraphCNN)-based system that estimates the 3D coordinates of human mesh vertices directly from the 2D human pose.
321	ALRe: Outlier Detection for Guided Refinement	Mingzhu Zhu; Zhang Gao; Junzhi Yu; Bingwei He; Jiantao Liu;	In this paper, we propose a general outlier detection method for guided refinement.
322	Weakly-Supervised Crowd Counting Learns from Sorting rather than Locations	Yifan Yang; Guorong Li; Zhe Wu; Li Su; Qingming Huang; Nicu Sebe;	In this paper, we propose a weakly-supervised counting network, which directly regresses the crowd numbers without the location supervision.
323	Unsupervised Domain Attention Adaptation Network for Caricature Attribute Recognition	Wen Ji; Kelei He; Jing Huo; Zheng Gu; Yang Gao;	To facilitate research in attribute learning of caricatures, we propose a caricature attribute dataset, namely WebCariA.
324	Many-shot from Low-shot: Learning to Annotate using Mixed Supervision for Object Detection	Carlo Biffi; Steven McDonagh; Philip Torr; Aleš Leonardis; Sarah Parisot;	Towards solving this problem we introduce, for the first time, an online annotation module (OAM) that learns to generate a many-shot set of reliable annotations from a larger volume of weakly labelled images.
325	Curriculum DeepSDF	Yueqi Duan; Haidong Zhu; He Wang; Li Yi; Ram Nevatia; Leonidas J. Guibas;	In this paper, we design a “shape curriculum” for learning continuous Signed Distance Function (SDF) on shapes, namely Curriculum DeepSDF.
326	Meshing Point Clouds with Predicted Intrinsic-Extrinsic Ratio Guidance	Minghua Liu; Xiaoshuai Zhang; Hao Su;	Instead, we propose to leverage the input point cloud as much as possible, by only adding connectivity information to existing points.
327	Improved Adversarial Training via Learned Optimizer	Yuanhao Xiong; Cho-Jui Hsieh;	In this paper, we empirically demonstrate that the commonly used PGD attack may not be optimal for inner maximization, and that an improved inner optimizer can lead to a more robust model.
328	Component Divide-and-Conquer for Real-World Image Super-Resolution	Pengxu Wei; Ziwei Xie; Hannan Lu; Zongyuan Zhan; Qixiang Ye; Wangmeng Zuo; Liang Lin;	In this paper, we present a large-scale Diverse Real-world image Super-Resolution dataset, i.e., DRealSR, as well as a divide-and-conquer Super-Resolution (SR) network, exploring the utility of guiding SR model with low-level image components.
329	Enabling Deep Residual Networks for Weakly Supervised Object Detection	Yunhang Shen; Rongrong Ji; Yan Wang; Zhiwei Chen; Feng Zheng; Feiyue Huang; Yunsheng Wu;	In this paper, we discover the intrinsic root with sophisticated analysis and propose a sequence of design principles to take full advantage of deep residual learning for WSOD from the perspectives of adding redundancy, improving robustness and aligning features.
330	Deep near-light photometric stereo for spatially varying reflectances	Hiroaki Santo; Michael Waechter; Yasuyuki Matsushita;	This paper presents a near-light photometric stereo method for spatially varying reflectances.
331	Learning Visual Representations with Caption Annotations	Mert Bulent Sariyildiz; Julien Perez; Diane Larlus;	To tackle this task, we propose hybrid models, with dedicated visual and textual encoders, and we show that the visual representations learned as a by-product of solving this task transfer well to a variety of target tasks.
332	Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier	Tz-Ying Wu; Pedro Morgado; Pei Wang; Chih-Hui Ho; Nuno Vasconcelos;	Motivated by this, a deep realistic taxonomic classifier (Deep-RTC) is proposed as a new solution to the long-tail problem, combining realism with hierarchical predictions.
333	Regression of Instance Boundary by Aggregated CNN and GCN	Yanda Meng; Wei Meng; Dongxu Gao; Yitian Zhao; Xiaoyun Yang; Xiaowei Huang; Yalin Zheng;	This paper proposes a straightforward, intuitive deep learning approach for (biomedical) image segmentation tasks.
334	Social Adaptive Module for Weakly-supervised Group Activity Recognition	Rui Yan; Lingxi Xie; Jinhui Tang; Xiangbo Shu; Qi Tian;	This paper presents a new task named weakly-supervised group activity recognition (GAR) which differs from conventional GAR tasks in that only video-level labels are available, yet the important persons within each frame are not provided even in the training data.
335	RGB-D Salient Object Detection with Cross-Modality Modulation and Selection	Chongyi Li; Runmin Cong; Yongri Piao; Qianqian Xu; Chen Change Loy;	We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD).
336	RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval	Hung-Yu Tseng; Hsin-Ying Lee; Lu Jiang; Ming-Hsuan Yang; Weilong Yang;	In this work, we aim to synthesize images from scene description with retrieved patches as reference.
337	Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection	Dongzhan Zhou; Xinchi Zhou; Hongwen Zhang; Shuai Yi; Wanli Ouyang;	In this paper, we propose a general and efficient pre-training paradigm, Montage pre-training, for object detection.
338	Faster Person Re-Identification	Guan’an Wang; Shaogang Gong; Jian Cheng; Zengguang Hou;	In this work, we introduce a new solution for fast ReID by formulating a novel Coarse-to-Fine (CtF) hashing code search strategy, which complementarily uses short and long codes, achieving both faster speed and better accuracy.
339	Quantization Guided JPEG Artifact Correction	Max Ehrlich; Ser-Nam Lim; Larry Davis; Abhinav Shrivastava;	We solve this problem by creating a novel architecture which is parameterized by the JPEG file’s quantization matrix.
340	3PointTM: Faster Measurement of High-Dimensional Transmission Matrices	Yujun Chen; Manoj Kumar Sharma; Ashutosh Sabharwal; Ashok Veeraraghavan; Aswin C. Sankaranarayanan;	In this paper, we propose 3PointTM, an approach for sensing TMs that uses a minimal number of measurements per pixel – reducing the measurement budget by a factor of two as compared to state of the art in phase-shifting holography for measuring TMs – and has a low computational complexity as compared to phase retrieval.
341	Joint Bilateral Learning for Real-time Universal Photorealistic Style Transfer	Xide Xia; Meng Zhang; Tianfan Xue; Zheng Sun; Hui Fang; Brian Kulis; Jiawen Chen;	We propose a new end-to-end model for photorealistic style transfer that is both fast and inherently generates photorealistic results.
342	Beyond 3DMM Space: Towards Fine-grained 3D Face Reconstruction	Xiangyu Zhu; Fan Yang; Di Huang; Chang Yu; Hao Wang; Jianzhu Guo; Zhen Lei; Stan Z. Li;	Secondly, we propose a Fine-Grained reconstruction Network (FGNet) that can concentrate on shape modification by warping the network input and output to the UV space.
343	World-Consistent Video-to-Video Synthesis	Arun Mallya; Ting-Chun Wang; Karan Sapra; Ming-Yu Liu;	In this work, we propose a framework for utilizing all past generated frames when synthesizing each frame.
344	Commonality-Parsing Network across Shape and Appearance for Partially Supervised Instance Segmentation	Qi Fan; Lei Ke; Wenjie Pei; Chi-Keung Tang; Yu-Wing Tai;	We propose to learn the underlying class-agnostic commonalities that can be generalized from mask-annotated categories to novel categories.
345	GMNet: Graph Matching Network for Large Scale Part Semantic Segmentation in the Wild	Umberto Michieli; Edoardo Borsato; Luca Rossi; Pietro Zanuttigh;	In this work, we propose a novel framework combining higher object-level context conditioning and part-level spatial relationships to address the task.
346	Event-based Asynchronous Sparse Convolutional Networks	Nico Messikommer; Daniel Gehrig; Antonio Loquercio; Davide Scaramuzza;	In this work, we present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output, thus directly leveraging the intrinsic asynchronous and sparse nature of the event data.
347	AtlantaNet: Inferring the 3D Indoor Layout from a Single 360° Image beyond the Manhattan World Assumption	Giovanni Pintore; Marco Agus; Enrico Gobbetti;	We introduce a novel end-to-end approach to predict a 3D room layout from a single panoramic image.
348	AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification	Xiaofang Wang; Xuehan Xiong; Maxim Neumann; AJ Piergiovanni; Michael S. Ryoo; Anelia Angelova; Kris M. Kitani; Wei Hua;	We propose a novel search space for spatiotemporal attention cells, which allows the search algorithm to flexibly explore various design choices in the cell.
349	REMIND Your Neural Network to Prevent Catastrophic Forgetting	Tyler L. Hayes; Kushal Kafle; Robik Shrestha; Manoj Acharya; Christopher Kanan;	Here, we propose REMIND, a brain-inspired approach that enables efficient replay with compressed representations.
350	Image Classification in the Dark using Quanta Image Sensors	Abhiram Gnanasambandam; Stanley H. Chan;	In this paper, we present a new low-light image classification solution using Quanta Image Sensors (QIS).
351	n-Reference Transfer Learning for Saliency Prediction	Yan Luo; Yongkang Wong; Mohan S. Kankanhalli; Qi Zhao;	To solve this problem, we propose a few-shot transfer learning paradigm for saliency prediction, which enables efficient transfer of knowledge learned from the existing large-scale saliency datasets to a target domain with limited labeled examples.
352	Progressively Guided Alternate Refinement Network for RGB-D Salient Object Detection	Shuhan Chen; Yun Fu;	In this paper, we aim to develop an efficient and compact deep network for RGB-D salient object detection, where the depth image provides complementary information to boost performance in complex scenarios.
353	Bottom-Up Temporal Action Localization with Mutual Regularization	Peisen Zhao; Lingxi Xie; Chen Ju; Ya Zhang; Yanfeng Wang; Qi Tian;	To alleviate this problem, we introduce two regularization terms to mutually regularize the learning procedure: the Intra-phase Consistency (IntraC) regularization is proposed to make the predictions verified inside each phase and the Inter-phase Consistency (InterC) regularization is proposed to keep consistency between these phases.
354	On Modulating the Gradient for Meta-Learning	Christian Simon; Piotr Koniusz; Richard Nock; Mehrtash Harandi;	Inspired by optimization techniques, we propose a novel meta-learning algorithm with gradient modulation to encourage fast-adaptation of neural networks in the absence of abundant data.
355	Domain-Specific Mappings for Generative Adversarial Style Transfer	Hsin-Yu Chang; Zhixiang Wang; Yung-Yu Chuang;	For addressing this issue, this paper leverages domain-specific mappings for remapping latent features in the shared content space to domain-specific content spaces.
356	DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning	Timo Milbich; Karsten Roth; Homanga Bharadhwaj; Samarth Sinha; Yoshua Bengio; Björn Ommer; Joseph Paul Cohen;	To this end, we propose and study multiple complementary learning tasks, targeting conceptually different data relationships by only resorting to the available training samples and labels of a standard DML setting.
357	DHP: Differentiable Meta Pruning via HyperNetworks	Yawei Li; Shuhang Gu; Kai Zhang; Luc Van Gool; Radu Timofte;	To circumvent this problem, this paper introduces a differentiable pruning method via hypernetworks for automatic network pruning.
358	Deep Transferring Quantization	Zheng Xie; Zhiquan Wen; Jing Liu; Zhiqiang Liu; Xixian Wu; Mingkui Tan;	Specifically, we propose a method named deep transferring quantization (DTQ) to effectively exploit the knowledge in a pre-trained full-precision model.
359	Deep Credible Metric Learning for Unsupervised Domain Adaptation Person Re-identification	Guangyi Chen; Yuhao Lu; Jiwen Lu; Jie Zhou;	In this paper, we propose a deep credible metric learning (DCML) method for unsupervised domain adaptation person re-identification.
360	Temporal Coherence or Temporal Motion: Which is More Critical for Video-based Person Re-identification?	Guangyi Chen; Yongming Rao; Jiwen Lu; Jie Zhou;	To distill the temporal coherence part of video representation from frame representations, we propose a simple yet effective Adversarial Feature Augmentation (AFA) method, which highlights the temporal coherence features by introducing adversarial augmented temporal motion noise.
361	Arbitrary-Oriented Object Detection with Circular Smooth Label	Xue Yang; Junchi Yan;	We design a new rotation detection baseline, to address the boundary problem by transforming angular prediction from a regression problem to a classification task with little accuracy loss, whereby high-precision angle classification is devised in contrast to previous works using coarse-granularity in rotation detection.
362	Learning Event-Driven Video Deblurring and Interpolation	Songnan Lin; Jiawei Zhang; Jinshan Pan; Zhe Jiang; Dongqing Zou; Yongtian Wang; Jing Chen; Jimmy Ren;	In this paper, we propose an effective event-driven video deblurring and interpolation algorithm based on deep convolutional neural networks (CNNs).
363	Vectorizing World Buildings: Planar Graph Reconstruction by Primitive Detection and Relationship Inference	Nelson Nauata; Yasutaka Furukawa;	This paper tackles a 2D architecture vectorization problem, whose task is to infer an outdoor building architecture as a 2D planar graph from a single RGB image.
364	Learning to Combine: Knowledge Aggregation for Multi-Source Domain Adaptation	Hang Wang; Minghao Xu; Bingbing Ni; Wenjun Zhang;	To mitigate these problems, we propose a Learning to Combine for Multi-Source Domain Adaptation (LtC-MSDA) framework via exploring interactions among domains.
365	CSCL: Critical Semantic-Consistent Learning for Unsupervised Domain Adaptation	Jiahua Dong; Yang Cong; Gan Sun; Yuyang Liu; Xiaowei Xu;	To address the above challenges, we develop a new Critical Semantic-Consistent Learning (CSCL) model, which mitigates the discrepancy of both domain-wise and category-wise distributions.
366	Prototype Mixture Models for Few-shot Semantic Segmentation	Boyu Yang; Chang Liu; Bohao Li; Jianbin Jiao; Qixiang Ye;	In this paper, we propose prototype mixture models (PMMs), which correlate diverse image regions with multiple prototypes to enforce the prototype-based semantic representation.
367	Webly Supervised Image Classification with Self-Contained Confidence	Jingkang Yang; Litong Feng; Weirong Chen; Xiaopeng Yan; Huabin Zheng; Ping Luo; Wayne Zhang;	Inspired by DNNs’ ability on confidence prediction, we introduce self-contained confidence (SCC) by adapting model uncertainty for WSL setting and use it to sample-wisely balance $\mathcal{L}_s$ and $\mathcal{L}_w$.
368	Search What You Want: Barrier Panelty NAS for Mixed Precision Quantization	Haibao Yu; Qi Han; Jianbo Li; Jianping Shi; Guangliang Cheng; Bin Fan;	In this paper, we propose a novel soft Barrier Penalty based NAS (BP-NAS) for mixed precision quantization, which ensures all the searched models are inside the valid domain defined by the complexity constraint, thus could return an optimal model under the given constraint by conducting search only one time.
369	Monocular 3D Object Detection via Feature Domain Adaptation	Lele Chen; Guofeng Cui; Celong Liu; Zhong Li; Ziyi Kou; Yi Xu; Chenliang Xu;	In this paper, we propose a novel domain adaptation based monocular 3D object detection framework named DA-3Ddet, which adapts the feature from unsound image-based pseudo-LiDAR domain to the accurate real LiDAR domain for performance boosting.
370	AUTO3D: Novel view synthesis through unsupervisely learned variational viewpoint and global 3D representation	Xiaofeng Liu; Tong Che; Yiqun Lu; Chao Yang; Site Li; Jane You;	In the viewer-centered coordinates, we construct an end-to-end trainable conditional variational framework to disentangle the unsupervisely learned relative-pose/rotation and implicit global 3D representation (shape, texture and the origin of viewer-centered coordinates, etc.).
371	VPN: Learning Video-Pose Embedding for Activities of Daily Living	Srijan Das; Saurav Sharma; Rui Dai; François Brémond; Monique Thonnat;	In this paper, we focus on the spatio-temporal aspect of recognizing Activities of Daily Living (ADL).
372	Soft Anchor-Point Object Detection	Chenchen Zhu; Fangyi Chen; Zhiqiang Shen; Marios Savvides;	In this work, we boost the performance of the anchor-point detector over the key-point counterparts while maintaining the speed advantage.
373	Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid	Jun Gao; Zian Wang; Jinchen Xuan; Sanja Fidler;	We introduce Deformable Grid (Defgrid), a learnable neural network module that predicts location offsets of vertices of a 2-dimensional triangular grid such that the edges of the deformed grid align with image boundaries.
374	Soft Expert Reward Learning for Vision-and-Language Navigation	Hu Wang; Qi Wu; Chunhua Shen;	In this paper, we introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering designing and generalisation problems of the VLN task.
375	Part-aware Prototype Network for Few-shot Semantic Segmentation	Yongfei Liu; Xiangyi Zhang; Songyang Zhang; Xuming He;	In this paper, we propose a novel few-shot semantic segmentation framework based on the prototype representation.
376	Learning from Extrinsic and Intrinsic Supervisions for Domain Generalization	Shujun Wang; Lequan Yu; Caizi Li; Chi-Wing Fu; Pheng-Ann Heng;	To this end, we present a new domain generalization framework that learns how to generalize across domains simultaneously from extrinsic relationship supervision and intrinsic self-supervision for images from multi-source domains.
377	Joint Learning of Social Groups, Individuals Action and Sub-group Activities in Videos	Mahsa Ehsanpour; Alireza Abedin; Fatemeh Saleh; Javen Shi; Ian Reid; Hamid Rezatofighi;	In this paper, we solve the problem of simultaneously grouping people by their social interactions, predicting their individual actions and the social activity of each social group, which we call the social task.
378	Whole-Body Human Pose Estimation in the Wild	Sheng Jin; Lumin Xu; Jin Xu; Can Wang; Wentao Liu; Chen Qian; Wanli Ouyang; Ping Luo;	To fill in this blank, we introduce COCO-WholeBody which extends COCO dataset with whole-body annotations.
379	Relative Pose Estimation of Calibrated Cameras with Known SE(3) Invariants	Bo Li; Evgeniy Martyushev; Gim Hee Lee;	In this paper, we present a complete comprehensive study of the relative pose estimation problem for a calibrated camera constrained by known $\mathrm{SE}(3)$ invariant, which involves 5 minimal problems in total.
380	Sequential Convolution and Runge-Kutta Residual Architecture for Image Compressed Sensing	Runkai Zheng; Yinqi Zhang; Daolang Huang; Qingliang Chen;	To address the two challenges, this paper proposes a novel Runge-Kutta Convolutional Compressed Sensing Network (RK-CCSNet).
381	Deep Hough Transform for Semantic Line Detection	Qi Han; Kai Zhao; Jun Xu; Ming-Ming Cheng;	In this paper, we put forward a simple yet effective method to detect meaningful straight lines, a.k.a. semantic lines, in given scenes.
382	Structured Landmark Detection via Topology-Adapting Deep Graph Learning	Weijian Li; Yuhang Lu; Kang Zheng; Haofu Liao; Chihung Lin; Jiebo Luo; Chi-Tung Cheng; Jing Xiao; Le Lu; Chang-Fu Kuo; Shun Miao;	In this work, we present a new topology-adapting deep graph learning approach for accurate anatomical facial and medical (e.g., hand, pelvis) landmark detection.
383	3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning	Xiangyu Xu; Hao Chen; Francesc Moreno-Noguer; László A. Jeni; Fernando De la Torre;	To address the above issues, this paper proposes a novel algorithm called RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme.
384	Learning to Balance Specificity and Invariance for In and Out of Domain Generalization	Prithvijit Chattopadhyay; Yogesh Balaji; Judy Hoffman;	We introduce Domain-specific Masks for Generalization, a model for improving both in-domain and out-of-domain generalization performance.
385	Contrastive Learning for Unpaired Image-to-Image Translation	Taesung Park; Alexei A. Efros; Richard Zhang; Jun-Yan Zhu;	We propose a straightforward method for doing so: maximizing mutual information between the two, using a framework based on contrastive learning.
386	DLow: Diversifying Latent Flows for Diverse Human Motion Prediction	Ye Yuan; Kris Kitani;	To address these problems, we propose a novel sampling method, Diversifying Latent Flows (DLow), to produce a diverse set of samples from a pretrained deep generative model.
387	GRNet: Gridding Residual Network for Dense Point Cloud Completion	Haozhe Xie; Hongxun Yao; Shangchen Zhou; Jiageng Mao; Shengping Zhang; Wenxiu Sun;	To solve this problem, we introduce 3D grids as intermediate representations to regularize unordered point clouds.
388	Gait Lateral Network: Learning Discriminative and Compact Representations for Gait Recognition	Saihui Hou; Chunshui Cao; Xu Liu; Yongzhen Huang;	In this work, we propose a novel network named Gait Lateral Network (GLN) which can learn both discriminative and compact representations from the silhouettes for gait recognition.
389	Blind Face Restoration via Deep Multi-scale Component Dictionaries	Xiaoming Li; Chaofeng Chen; Shangchen Zhou; Xianhui Lin; Wangmeng Zuo; Lei Zhang;	To address this issue, this paper suggests a deep face dictionary network (termed as DFDNet) to guide the restoration process of degraded observations.
390	Robust Neural Networks inspired by Strong Stability Preserving Runge-Kutta methods	Byungjoo Kim; Bryce Chudomelka; Jinyoung Park; Jaewoo Kang; Youngjoon Hong; Hyunwoo J. Kim;	Motivated by the SSP property and a generalized Runge-Kutta method, we propose Strong Stability Preserving networks (SSP networks), which improve robustness against adversarial attacks.
391	Inequality-Constrained and Robust 3D Face Model Fitting	Evangelos Sariyanidi; Casey J. Zampella; Robert T. Schultz; Birkan Tunc;	We propose a new formulation that does not require the tuning of any weight parameter.
392	Gabor Layers Enhance Network Robustness	Juan C. Pérez; Motasem Alfarra; Guillaume Jeanneret; Adel Bibi; Ali Thabet; Bernard Ghanem; Pablo Arbeláez;	In particular, we explore the effect of replacing the first layers of various deep architectures with Gabor layers (i.e. convolutional layers with filters that are based on learnable Gabor parameters) on robustness against adversarial attacks.
393	Conditional Image Repainting via Semantic Bridge and Piecewise Value Function	Shuchen Weng; Wenbo Li; Dawei Li; Hongxia Jin; Boxin Shi;	In this work, we improve the compositing by breaking through the latent ceiling using a novel piecewise value function.
394	Learnable Cost Volume Using the Cayley Representation	Taihong Xiao; Jinwei Yuan; Deqing Sun; Qifei Wang; Xin-Yu Zhang; Kehan Xu; Ming-Hsuan Yang;	To address this issue, we propose a learnable cost volume (LCV) using an elliptical inner product, which generalizes the standard inner product by a positive definite kernel matrix.
395	HALO: Hardware-Aware Learning to Optimize	Chaojian Li; Tianlong Chen; Haoran You; Zhangyang Wang; Yingyan Lin;	To this end, we propose hardware-aware learning to optimize (HALO), a practical meta optimizer dedicated to resource-efficient on-device adaptation.
396	Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling	Jia Zheng; Junfei Zhang; Jing Li; Rui Tang; Shenghua Gao; Zihan Zhou;	In this paper, we present a new synthetic dataset, Structured3D, with the aim of providing large-scale photo-realistic images with rich 3D structure annotations for a wide spectrum of structured 3D modeling tasks.
397	BroadFace: Looking at Tens of Thousands of People at Once for Face Recognition	Yonghyun Kim; Wonpyo Park; Jongju Shin;	To overcome this difficulty, we propose a novel method called BroadFace, a learning process that comprehensively considers a massive set of identities.
398	Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision	Xinzhe Han; Shuhui Wang; Chi Su; Weigang Zhang; Qingming Huang; Qi Tian;	In this paper, we rethink the implicit reasoning process in VQA, and propose a new formulation which maximizes the log-likelihood of the joint distribution for the observed question and predicted answer.
399	Domain Adaptive Semantic Segmentation Using Weak Labels	Sujoy Paul; Yi-Hsuan Tsai; Samuel Schulter; Amit K. Roy-Chowdhury; Manmohan Chandraker;	We propose a novel framework for domain adaptation in semantic segmentation with image-level weak labels in the target domain. In experiments, we show considerable improvements with respect to the existing state of the art in UDA and present a new benchmark in the WDA setting.
400	Knowledge Distillation Meets Self-Supervision	Guodong Xu; Ziwei Liu; Xiaoxiao Li; Chen Change Loy;	In this paper, we discuss practical ways to exploit those noisy self-supervision signals with selective transfer for distillation.
401	Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions	Ignacio Rocco; Relja Arandjelović; Josef Sivic;	In this work we target the problem of estimating accurately localised correspondences between a pair of images.
402	Reconstructing the Noise Variance Manifold for Image Denoising	Ioannis Marras; Grigorios G. Chrysos; Ioannis Alexiou; Gregory Slabaugh; Stefanos Zafeiriou;	To fill the gap, in this work we introduce the idea of a cGAN which explicitly leverages structure in the image noise variance space.
403	Occlusion-Aware Depth Estimation with Adaptive Normal Constraints	Xiaoxiao Long; Lingjie Liu; Christian Theobalt; Wenping Wang;	We present a new learning-based method for multi-frame depth estimation from a color video, which is a fundamental problem in scene understanding, robot navigation or handheld 3D reconstruction.
404	VisualEchoes: Spatial Image Representation Learning through Echolocation	Ruohan Gao; Changan Chen; Ziad Al-Halah; Carl Schissler; Kristen Grauman;	We explore the spatial cues contained in echoes and how they can benefit vision tasks that require spatial reasoning.
405	Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval	Andrew Brown; Weidi Xie; Vicky Kalogeiton; Andrew Zisserman;	To this end, we introduce an objective that optimises instead a smoothed approximation of AP, coined Smooth-AP.
406	Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation	Liang-Chieh Chen; Raphael Gontijo Lopes; Bowen Cheng; Maxwell D. Collins; Ekin D. Cubuk; Barret Zoph; Hartwig Adam; Jonathon Shlens;	In this work, we ask if we may leverage semi-supervised learning in unlabeled video sequences and extra images to improve the performance on urban scene segmentation, simultaneously tackling semantic, instance, and panoptic segmentation.
407	Spatially Aware Multimodal Transformers for TextVQA	Yash Kant; Dhruv Batra; Peter Anderson; Alexander Schwing; Devi Parikh; Jiasen Lu; Harsh Agrawal;	In contrast, we propose a novel spatially aware self-attention layer such that each visual entity only looks at neighboring entities defined by a spatial graph.
408	Every Pixel Matters: Center-aware Feature Alignment for Domain Adaptive Object Detector	Cheng-Chun Hsu; Yi-Hsuan Tsai; Yen-Yu Lin; Ming-Hsuan Yang;	Different from existing solutions, we propose a domain adaptation framework that accounts for each pixel, especially via predicting pixel-wise objectness and centerness.
409	URIE: Universal Image Enhancement for Visual Recognition in the Wild	Taeyoung Son; Juwon Kang; Namyup Kim; Sunghyun Cho; Suha Kwak;	To tackle this issue, we present a Universal and Recognition-friendly Image Enhancement network, dubbed URIE, which is attached in front of existing recognition models and enhances distorted input to improve their performance without retraining them.
410	Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation	Hongwei Yi; Zizhuang Wei; Mingyu Ding; Runze Zhang; Yisong Chen; Guoping Wang; Yu-Wing Tai;	In this paper, we propose an effective and efficient pyramid multi-view stereo (MVS) net with self-adaptive view aggregation for accurate and complete dense point cloud reconstruction.
411	SPL-MLL: Selecting Predictable Landmarks for Multi-Label Learning	Junbing Li; Changqing Zhang; Pengfei Zhu; Baoyuan Wu; Lei Chen; Qinghua Hu;	In this work, we propose to select a small subset of labels as landmarks which are easy to predict according to input (predictable) and can well recover the other possible labels (representative).
412	Unpaired Image-to-Image Translation using Adversarial Consistency Loss	Yihao Zhao; Ruihai Wu; Hao Dong;	In this paper, we propose a novel adversarial-consistency loss for image-to-image translation.
413	Discriminability Distillation in Group Representation Learning	Manyuan Zhang; Guanglu Song; Hang Zhou; Yu Liu;	We claim that the most significant indicator of whether the group representation can benefit from one of its elements is not the quality or an inexplicable score, but the discriminability w.r.t. the model.
414	Monocular Expressive Body Regression through Body-Driven Attention	Vasileios Choutas; Georgios Pavlakos; Timo Bolkart; Dimitrios Tzionas; Michael J. Black;	We address these limitations by introducing ExPose (EXpressive POse and Shape rEgression), which directly regresses the body, face, and hands, in SMPL-X format, from an RGB image.
415	Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation	Zongsheng Yue; Qian Zhao; Lei Zhang; Deyu Meng;	In this work, we propose a novel unified framework to simultaneously deal with the noise removal and noise generation tasks.
416	Linguistic Structure Guided Context Modeling for Referring Image Segmentation	Tianrui Hui; Si Liu; Shaofei Huang; Guanbin Li; Sansi Yu; Faxi Zhang; Jizhong Han;	To tackle this problem, we propose a “gather-propagate-distribute” scheme to model multimodal context by crossmodal interaction and implement this scheme as a novel Linguistic Structure guided Context Modeling (LSCM) module.
417	Federated Visual Classification with Real-World Data Distribution	Tzu-Ming Harry Hsu; Hang Qi; Matthew Brown;	In this work, we characterize the effect these real-world data distributions have on distributed learning, using as a benchmark the standard Federated Averaging (FedAvg) algorithm.
418	Robust Re-Identification by Multiple Views Knowledge Distillation	Angelo Porrello; Luca Bergamini; Simone Calderara;	In this work, we devise a training strategy that allows the transfer of a superior knowledge, arising from a set of views depicting the target object.
419	Defocus Deblurring Using Dual-Pixel Data	Abdullah Abuolaim; Michael S. Brown;	We propose an effective defocus deblurring method that exploits data available on dual-pixel (DP) sensors found on most modern cameras.
420	RhyRNN: Rhythmic RNN for Recognizing Events in Long and Complex Videos	Tianshu Yu; Yikang Li; Baoxin Li;	To address this, we propose Rhythmic RNN (RhyRNN) which is capable of handling long video sequences (up to 3,000 frames) as well as capturing rhythms at different scales.
421	Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping	Uttaran Bhattacharya; Christian Roncal; Trisha Mittal; Rohan Chandra; Kyra Kapsaskis; Kurt Gray; Aniket Bera; Dinesh Manocha;	We present an autoencoder-based semi-supervised approach to classify perceived human emotions from walking styles obtained from videos or motion-captured data and represented as sequences of 3D poses.
422	Weighing Counts: Sequential Crowd Counting by Reinforcement Learning	Liang Liu; Hao Lu; Hongwei Zou; Haipeng Xiong; Zhiguo Cao; Chunhua Shen;	Inspired by scale weighing, we propose a novel ‘counting scale’ termed LibraNet where the count value is analogized by weight.
423	Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks	Yunfei Liu; Xingjun Ma; James Bailey; Feng Lu;	In this paper, we present a new type of backdoor attack inspired by an important natural phenomenon: reflection.
424	Learning to Learn with Variational Information Bottleneck for Domain Generalization	Yingjun Du; Jun Xu; Huan Xiong; Qiang Qiu; Xiantong Zhen; Cees G. M. Snoek; Ling Shao;	Domain generalization models learn to generalize to previously unseen domains, but suffer from prediction uncertainty and domain shift. In this paper, we address both problems.
425	Deep Positional and Relational Feature Learning for Rotation-Invariant Point Cloud Analysis	Ruixuan Yu; Xin Wei; Federico Tombari; Jian Sun;	In this paper we propose a rotation-invariant deep network for point clouds analysis.
426	Thanks for Nothing: Predicting Zero-Valued Activations with Lightweight Convolutional Neural Networks	Gil Shomron; Ron Banner; Moran Shkolnik; Uri Weiser;	Inspired by the observation that spatial correlation exists in CNN output feature maps (ofms), we propose a method to dynamically predict whether ofm activations are zero-valued or not according to their neighboring activation values, thereby avoiding zero-valued activations and reducing the number of convolution operations.
427	Layered Neighborhood Expansion for Incremental Multiple Graph Matching	Zixuan Chen; Zhihui Xie; Junchi Yan; Yinqiang Zheng; Xiaokang Yang;	In this paper, we treat the graphs as graphs on a super-graph, and propose a novel breadth first search based method for expanding the neighborhood on the super-graph for a new coming graph, such that the matching with the new graph can be efficiently performed within the constructed neighborhood.
428	SCAN: Learning to Classify Images without Labels	Wouter Van Gansbeke; Simon Vandenhende; Stamatios Georgoulis; Marc Proesmans; Luc Van Gool;	In this paper, we deviate from recent works, and advocate a two-step approach where feature learning and clustering are decoupled.
429	Graph convolutional networks for learning with few clean and many noisy labels	Ahmet Iscen; Giorgos Tolias; Yannis Avrithis; Ondřej Chum; Cordelia Schmid;	In this work we consider the problem of learning a classifier from noisy labels when a few clean labeled examples are given.
430	Object-and-Action Aware Model for Visual Language Navigation	Yuankai Qi; Zizheng Pan; Shengping Zhang; Anton van den Hengel; Qi Wu;	In this paper, we propose an Object-and-Action Aware Model (OAAM) that processes these two different forms of natural language based instruction separately.
431	A Comprehensive Study of Weight Sharing in Graph Networks for 3D Human Pose Estimation	Kenkun Liu; Rongqi Ding; Zhiming Zou; Le Wang; Wei Tang;	The objective of this paper is to have a comprehensive and systematic study of weight sharing in GCNs for 3D HPE.
432	MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution	Wenbo Li; Xin Tao; Taian Guo; Lu Qi; Jiangbo Lu; Jiaya Jia;	Motivated by these findings, we propose a temporal multi-correspondence aggregation strategy to leverage most similar patches across frames, and also a cross-scale nonlocal-correspondence aggregation scheme to explore self-similarity of images across scales.
433	Efficient Semantic Video Segmentation with Per-frame Inference	Yifan Liu; Chunhua Shen; Changqian Yu; Jingdong Wang;	In contrast, here we explicitly consider the temporal consistency among frames as extra constraints during training and process each frame independently in the inference phase.
434	Increasing the Robustness of Semantic Segmentation Models with Painting-by-Numbers	Christoph Kamann; Carsten Rother;	We present a new training schema that increases this shape bias.
435	Deep Spiking Neural Network: Energy Efficiency Through Time based Coding	Bing Han; Kaushik Roy;	In this work, we propose an ANN to SNN conversion methodology that uses a time-based coding scheme, named Temporal-Switch-Coding (TSC), and a corresponding TSC spiking neuron model.
436	InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling	Jun Wang; Shiyi Lan; Mingfei Gao; Larry S. Davis;	To address this issue, we propose a novel 3D object detection framework with dynamic information modeling.
437	Utilizing Patch-level Category Activation Patterns for Multiple Class Novelty Detection	Poojan Oza; Vishal M. Patel;	In this paper, we propose a novel method that makes deep convolutional neural networks robust to novel classes.
438	People as Scene Probes	Yifan Wang; Brian L. Curless; Steven M. Seitz;	By analyzing the motion of people and other objects in a scene, we demonstrate how to infer depth, occlusion, lighting, and shadow information from video taken from a single camera viewpoint. This information is then used to composite new objects into the same scene with a high degree of automation and realism.
439	Mapping in a Cycle: Sinkhorn Regularized Unsupervised Learning for Point Cloud Shapes	Lei Yang; Wenxi Liu; Zhiming Cui; Nenglun Chen; Wenping Wang;	We propose an unsupervised learning framework with the pretext task of finding dense correspondences between point cloud shapes from the same category based on the cycle-consistency formulation.
440	Label-Efficient Learning on Point Clouds using Approximate Convex Decompositions	Matheus Gadelha; Aruni RoyChowdhury; Gopal Sharma; Evangelos Kalogerakis; Liangliang Cao; Erik Learned-Miller; Rui Wang; Subhransu Maji;	In this paper, we investigate the use of Approximate Convex Decompositions (ACD) as a self-supervisory signal for label-efficient learning of point cloud representations.
441	TexMesh: Reconstructing Detailed Human Texture and Geometry from RGB-D Video	Tiancheng Zhi; Christoph Lassner; Tony Tung; Carsten Stoll; Srinivasa G. Narasimhan; Minh Vo;	We present TexMesh, a novel approach to reconstruct detailed human meshes with high-resolution full-body texture from RGB-D video.
442	Consistency-based Semi-supervised Active Learning: Towards Minimizing Labeling Cost	Mingfei Gao; Zizhao Zhang; Guo Yu; Sercan Ö. Arık; Larry S. Davis; Tomas Pfister;	Here, we propose to unify unlabeled sample selection and model training towards minimizing labeling cost, and make two contributions towards that end.
443	Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation	Fangyun Wei; Xiao Sun; Hongyang Li; Jingdong Wang; Stephen Lin;	While this center-point regression is simple and efficient, we argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries, due to object deformation and scale/orientation variation. To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions.
444	Modeling 3D Shapes by Reinforcement Learning	Cheng Lin; Tingxiang Fan; Wenping Wang; Matthias Nießner;	Inspired by such artist-based modeling, we propose a two-step neural framework based on RL to learn 3D modeling policies.
445	LST-Net: Learning a Convolutional Neural Network with a Learnable Sparse Transform	Lida Li; Kun Wang; Shuai Li; Xiangchu Feng; Lei Zhang;	In this paper, we propose to mitigate this issue by learning a CNN with a learnable sparse transform (LST), which converts the input features into a more compact and sparser domain so that the spatial and channel-wise redundancy can be more effectively reduced.
446	Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision	Damien Teney; Ehsan Abbasnedjad; Anton van den Hengel;	We propose an auxiliary training objective that improves the generalization capabilities of neural networks by leveraging an overlooked supervisory signal found in existing datasets.
447	CN: Channel Normalization For Point Cloud Recognition	Zetong Yang; Yanan Sun; Shu Liu; Xiaojuan Qi; Jiaya Jia;	In this paper, we deeply analyze these point recognition frameworks and present a factor, called difference ratio, to measure the influence of structure information among different levels on the final representation.
448	Rethinking the Defocus Blur Detection Problem and A Real-Time Deep DBD Model	Ning Zhang; Junchi Yan;	In this work, we propose novel perspectives on the DBD problem and design a convenient approach to build a real-time, cost-effective DBD model.
449	AutoMix: Mixup Networks for Sample Interpolation via Cooperative Barycenter Learning	Jianchao Zhu; Liangliang Shi; Junchi Yan; Hongyuan Zha;	This paper proposes new ways of sample mixing by thinking of the process as generation of barycenter in a metric space for data augmentation.
450	Scene Text Image Super-resolution in the wild	Wenjia Wang; Enze Xie; Xuebo Liu; Wenhai Wang; Ding Liang; Chunhua Shen; Xiang Bai;	For this purpose, a new Text Super-Resolution Network, termed TSRN, with three novel modules is developed.
451	Coupling Explicit and Implicit Surface Representations for Generative 3D Modeling	Omid Poursaeed; Matthew Fisher; Noam Aigerman; Vladimir G. Kim;	We propose a novel neural architecture for representing 3D surfaces, which harnesses two complementary shape representations: (i) an explicit representation via an atlas, i.e., embeddings of 2D domains into 3D (ii) an implicit-function representation, i.e., a scalar function over the 3D volume, with its levels denoting surfaces.
452	Learning Disentangled Representations with Latent Variation Predictability	Xinqi Zhu; Chang Xu; Dacheng Tao;	This paper defines the variation predictability of latent disentangled representations.
453	Deep Space-Time Video Upsampling Networks	Jaeyeon Kang; Younghyun Jo; Seoung Wug Oh; Peter Vajda; Seon Joo Kim;	In this paper, we investigate the problem of jointly upsampling videos both in space and time, which is becoming more important with advances in display systems.
454	Large-Scale Few-Shot Learning via Multi-Modal Knowledge Discovery	Shuo Wang; Jun Yue; Jianzhuang Liu; Qi Tian; Meng Wang;	To solve these problems, we propose a method based on multi-modal knowledge discovery.
455	Fast Video Object Segmentation using the Global Context Module	Yu Li; Zhuoran Shen; Ying Shan;	We developed a real-time, high-quality semi-supervised video object segmentation algorithm.
456	Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos	Anurag Arnab; Chen Sun; Arsha Nagrani; Cordelia Schmid;	In this paper, we present a spatio-temporal action recognition model that is trained with only video-level labels, which are significantly easier to annotate.
457	Selecting Relevant Features from a Multi-domain Representation for Few-shot Classification	Nikita Dvornik; Cordelia Schmid; Julien Mairal;	In this work, we propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches.
458	MessyTable: Instance Association in Multiple Camera Views	Zhongang Cai; Junzhe Zhang; Daxuan Ren; Cunjun Yu; Haiyu Zhao; Shuai Yi; Chai Kiat Yeo; Chen Change Loy;	We present an interesting and challenging dataset that features a large number of scenes with messy tables captured from multiple camera views.
459	A Unified Framework for Shot Type Classification Based on Subject Centric Lens	Anyi Rao; Jiaze Wang; Linning Xu; Xuekun Jiang; Qingqiu Huang; Bolei Zhou; Dahua Lin;	To address these issues, we propose a learning framework Subject Guidance Network (SGNet) for shot type recognition.
460	BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues	Samuel Albanie; Gül Varol; Liliane Momeni; Triantafyllos Afouras; Joon Son Chung; Neil Fox; Andrew Zisserman;	In this work, we introduce a new scalable approach to data collection for sign recognition in continuous videos. We also propose new large-scale evaluation sets for the tasks of sign recognition and sign spotting and provide baselines which we hope will serve to stimulate research in this area.
461	HTML: A Parametric Hand Texture Model for 3D Hand Reconstruction and Personalization	Neng Qian; Jiayi Wang; Franziska Mueller; Florian Bernard; Vladislav Golyanik; Christian Theobalt;	To fill this gap, in this work we present HTML, the first parametric texture model of human hands.
462	CycAs: Self-supervised Cycle Association for Learning Re-identifiable Descriptions	Zhongdao Wang; Jingwei Zhang; Liang Zheng; Yixuan Liu; Yifan Sun; Yali Li; Shengjin Wang;	This paper proposes a self-supervised learning method for the person re-identification (re-ID) problem, where existing unsupervised methods usually rely on pseudo labels, such as those from video tracklets or clustering.
463	Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions	Xihui Liu; Zhe Lin; Jianming Zhang; Handong Zhao; Quan Tran; Xiaogang Wang; Hongsheng Li;	We propose a novel algorithm, named Open-Edit, which is the first attempt on open-domain image manipulation with open-vocabulary instructions.
464	Towards Real-Time Multi-Object Tracking	Zhongdao Wang; Liang Zheng; Yixuan Liu; Yali Li; Shengjin Wang;	In this paper, we propose an MOT system that allows target detection and appearance embedding to be learned in a shared model.
465	A Balanced and Uncertainty-aware Approach for Partial Domain Adaptation	Jian Liang; Yunbo Wang; Dapeng Hu; Ran He; Jiashi Feng;	In this paper, we build on domain adversarial learning and propose a novel domain adaptation method BA$^3$US with two new techniques termed Balanced Adversarial Alignment (BAA) and Adaptive Uncertainty Suppression (AUS), respectively.
466	Unsupervised Deep Metric Learning with Transformed Attention Consistency and Contrastive Clustering Loss	Yang Li; Shichao Kan; Zhihai He;	To characterize the consistent pattern of human attention during image comparisons, we introduce the idea of transformed attention consistency.
467	STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos	Ali Athar; Sabarinath Mahadevan; Aljosa Osep; Laura Leal-Taixé; Bastian Leibe;	In this paper, we propose a different approach that is well-suited to a variety of tasks involving instance segmentation in videos.
468	Hierarchical Style-based Networks for Motion Synthesis	Jingwei Xu; Huazhe Xu; Bingbing Ni; Xiaokang Yang; Xiaolong Wang; Trevor Darrell;	In this paper, we propose an unsupervised method for generating long-range, diverse and plausible behaviors to achieve a specific goal location.
469	Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop	Benjamin Biggs; Oliver Boyne; James Charles; Andrew Fitzgibbon; Roberto Cipolla;	We introduce an automatic, end-to-end method for recovering the 3D pose and shape of dogs from monocular internet images.
470	Learning to Count in the Crowd from Limited Labeled Data	Vishwanath A. Sindagi; Rajeev Yasarla; Deepak Sam Babu; R. Venkatesh Babu; Vishal M. Patel;	In this work, we focus on reducing the annotation efforts by learning to count in the crowd from a limited number of labeled samples while leveraging a large pool of unlabeled data.
471	SPOT: Selective Point Cloud Voting for Better Proposal in Point Cloud Object Detection	Hongyuan Du; Linjun Li; Bo Liu; Nuno Vasconcelos;	In this work, we propose Selective Point clOud voTing (SPOT) module, a simple effective component that can be easily trained end-to-end in point cloud object detectors to solve this problem.
472	Explainable Face Recognition	Jonathan R. Williford; Brandon B. May; Jeffrey Byrne;	In this paper, we provide the first comprehensive benchmark and baseline evaluation for explainable face recognition (XFR). Finally, we provide a comprehensive benchmark on this dataset comparing five state-of-the-art XFR algorithms on three facial matchers.
473	From Shadow Segmentation to Shadow Removal	Hieu Le; Dimitris Samaras;	We propose a shadow removal method that can be trained using only shadow and non-shadow patches cropped from the shadow images themselves.
474	Diverse and Admissible Trajectory Prediction through Multimodal Context Understanding	Seong Hyeon Park; Gyubok Lee; Jimin Seo; Manoj Bhat; Minseok Kang; Jonathan Francis; Ashwin Jadhav; Paul Pu Liang; Louis-Philippe Morency;	In this paper, we propose a model that synthesizes multiple input signals from the multimodal world (the environment’s scene context and interactions between multiple surrounding agents) to best model all diverse and admissible trajectories.
475	CONFIG: Controllable Neural Face Image Generation	Marek Kowalski; Stephan J. Garbin; Virginia Estellers; Tadas Baltrušaitis; Matthew Johnson; Jamie Shotton;	To this end we propose ConfigNet, a neural face model that allows for controlling individual aspects of output images in semantically meaningful ways and that is a significant step on the path towards finely-controllable neural rendering.
476	Single View Metrology in the Wild	Rui Zhu; Xingyi Yang; Yannick Hold-Geoffroy; Federico Perazzi; Jonathan Eisenmann; Kalyan Sunkavalli; Manmohan Chandraker;	We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground as well as camera parameters of orientation and field of view, using just a monocular image acquired in unconstrained condition.
477	Procedure Planning in Instructional Videos	Chien-Yi Chang; De-An Huang; Danfei Xu; Ehsan Adeli; Li Fei-Fei; Juan Carlos Niebles;	In this paper, we study the problem of procedure planning in instructional videos, which can be seen as the first step towards enabling autonomous agents to plan for complex tasks in everyday settings such as cooking.
478	Funnel Activation for Visual Recognition	Ningning Ma; Xiangyu Zhang; Jian Sun;	We present a conceptually simple but effective funnel activation for image recognition tasks, called Funnel activation (FReLU), that extends ReLU and PReLU to a 2D activation by adding a negligible overhead of spatial condition.
479	GIQA: Generated Image Quality Assessment	Shuyang Gu; Jianmin Bao; Dong Chen; Fang Wen;	We introduce three GIQA algorithms from two perspectives: learning-based and data-based.
480	Adversarial Continual Learning	Sayna Ebrahimi; Franziska Meier; Roberto Calandra; Trevor Darrell; Marcus Rohrbach;	We show that shared features are significantly less prone to forgetting and propose a novel hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features required to solve a sequence of tasks.
481	Adapting Object Detectors with Conditional Domain Normalization	Peng Su; Kun Wang; Xingyu Zeng; Shixiang Tang; Dapeng Chen; Di Qiu; Xiaogang Wang;	In this work, we present the Conditional Domain Normalization (CDN) to bridge the domain distribution gap.
482	HARD-Net: Hardness-AwaRe Discrimination Network for 3D Early Activity Prediction	Tianjiao Li; Jun Liu; Wei Zhang; Lingyu Duan;	In this paper, we propose a novel Hardness-AwaRe Discrimination Network (HARD-Net) to specifically investigate the relationships between the similar activity pairs that are hard to be discriminated.
483	Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction	Lokender Tiwari; Pan Ji; Quoc-Huy Tran; Bingbing Zhuang; Saket Anand; Manmohan Chandraker;	In this paper, we demonstrate that the coupling of these two by leveraging the strengths of each mitigates the other’s shortcomings.
484	Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting	Shengcai Liao; Ling Shao;	In this paper, beyond representation learning, we consider how to formulate person image matching directly in deep feature maps.
485	Self-supervised Bayesian Deep Learning for Image Recovery with Applications to Compressive Sensing	Tongyao Pang; Yuhui Quan; Hui Ji;	Motivated by the practical value of reducing the cost and complexity of constructing labeled training datasets, this paper proposes a self-supervised deep learning approach for image recovery, which is dataset-free.
486	Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement	Jian Wang; Xiang Long; Yuan Gao; Errui Ding; Shilei Wen;	In this paper, we aim to find a better approach to get more accurate localization results.
487	Semi-supervised Learning with a Teacher-student Network for Generalized Attribute Prediction	Minchul Shin;	With that in mind, we propose a multi-teacher-single-student (MTSS) approach inspired by the multi-task learning and the distillation of semi-supervised learning.
488	Unsupervised Domain Adaptation with Noise Resistible Mutual-Training for Person Re-identification	Fang Zhao; Shengcai Liao; Guo-Sen Xie; Jian Zhao; Kaihao Zhang; Ling Shao;	To depress noises in pseudo-labels, this paper proposes a Noise Resistible Mutual-Training (NRMT) method, which maintains two networks during training to perform collaborative clustering and mutual instance selection.
489	DPDist: Comparing Point Clouds Using Deep Point Cloud Distance	Dahlia Urbach; Yizhak Ben-Shabat; Michael Lindenbaum;	We introduce a new deep learning method for point cloud comparison.
490	Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation	Xiaokang Chen; Kwan-Yee Lin; Jingbo Wang; Wayne Wu; Chen Qian; Hongsheng Li; Gang Zeng;	In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternatively.
491	DataMix: Efficient Privacy-Preserving Edge-Cloud Inference	Zhijian Liu; Zhanghao Wu; Chuang Gan; Ligeng Zhu; Song Han;	In this paper, we mediate between the resource-constrained edge devices and the privacy-invasive cloud servers by introducing a novel privacy-preserving edge-cloud inference framework, DataMix.
492	Neural Re-Rendering of Humans from a Single Image	Kripasindhu Sarkar; Dushyant Mehta; Weipeng Xu; Vladislav Golyanik; Christian Theobalt;	To address these challenges, we propose a new method for neural re-rendering of a human under a novel user-defined pose and viewpoint given one input image.
493	Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation	Filippo Aleotti; Fabio Tosi; Li Zhang; Matteo Poggi; Stefano Mattoccia;	In contrast, to soften typical stereo artefacts, we propose a novel self-supervised paradigm reversing the link between the two.
494	PIPAL: a Large-Scale Image Quality Assessment Dataset for Perceptual Image Restoration	Jinjin Gu; Haoming Cai; Haoyu Chen; Xiaoxing Ye; Jimmy S. Ren; Chao Dong;	Based on PIPAL, we present new benchmarks for both IQA and super-resolution methods.
495	Why do These Match? Explaining the Behavior of Image Similarity Models	Bryan A. Plummer; Mariya I. Vasileva; Vitali Petsiuk; Kate Saenko; David Forsyth;	In this paper, we introduce Salient Attributes for Network Explanation (SANE) to explain image similarity models, where a model’s output is a score measuring the similarity of two inputs rather than a classification score.
496	CooGAN: A Memory-Efficient Framework for High-Resolution Facial Attribute Editing	Xuanhong Chen; Bingbing Ni; Naiyuan Liu; Ziang Liu; Yiliu Jiang; Loc Truong; Qi Tian;	To address these issues, we propose a novel pixel translation framework called Cooperative GAN (CooGAN) for HR facial image editing.
497	Progressive Transformers for End-to-End Sign Language Production	Ben Saunders; Necati Cihan Camgoz; Richard Bowden;	In this paper, we propose Progressive Transformers, the first SLP model to translate from discrete spoken language sentences to continuous 3D sign pose sequences in an end-to-end manner.
498	Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting	Minghui Liao; Guan Pang; Jing Huang; Tal Hassner; Xiang Bai;	To tackle these problems, we propose Mask TextSpotter v3, an end-to-end trainable scene text spotter that adopts a Segmentation Proposal Network (SPN) instead of an RPN.
499	Making Affine Correspondences Work in Camera Geometry Computation	Daniel Barath; Michal Polic; Wolfgang Förstner; Torsten Sattler; Tomas Pajdla; Zuzana Kukelova;	We propose a method for refining the local feature geometries by symmetric intensity-based matching, combine uncertainty propagation inside RANSAC with preemptive model verification, show a general scheme for computing uncertainty of minimal solvers results, and adapt the sample cheirality check for homography estimation to region-to-region correspondences.
500	Sub-center ArcFace: Boosting Face Recognition by Large-scale Noisy Web Faces	Jiankang Deng; Jia Guo; Tongliang Liu; Mingming Gong; Stefanos Zafeiriou;	In this paper, we relax the intra-class constraint of ArcFace to improve the robustness to label noise.
501	Foley Music: Learning to Generate Music from Videos	Chuang Gan; Deng Huang; Peihao Chen; Joshua B. Tenenbaum; Antonio Torralba;	In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip about people playing musical instruments.
502	Contrastive Multiview Coding	Yonglong Tian; Dilip Krishnan; Phillip Isola;	We study this hypothesis under the framework of multiview contrastive learning, where we learn a representation that aims to maximize mutual information between different views of the same scene but is otherwise compact.
503	Regional Homogeneity: Towards Learning Transferable Universal Adversarial Perturbations Against Defenses	Yingwei Li; Song Bai; Cihang Xie; Zhenyu Liao; Xiaohui Shen; Alan Yuille;	This paper focuses on learning transferable adversarial examples specifically against defense models (models designed to defend against adversarial attacks).
504	Generative Low-bitwidth Data Free Quantization	Shoukai Xu; Haokun Li; Bohan Zhuang; Jing Liu; Jiezhang Cao; Chuangrun Liang; Mingkui Tan;	In this paper, we investigate a simple-yet-effective method called Generative Low-bitwidth Data Free Quantization (GDFQ) to remove the data dependence burden.
505	Local Correlation Consistency for Knowledge Distillation	Xiaojie Li; Jianlong Wu; Hongyu Fang; Yue Liao; Fei Wang; Chen Qian;	In this paper, we propose the local correlation exploration framework for knowledge distillation.
506	Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild	Jason Y. Zhang; Sam Pepose; Hanbyul Joo; Deva Ramanan; Jitendra Malik; Angjoo Kanazawa;	We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single image in-the-wild captured in an uncontrolled environment.
507	Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation	Hang Zhou; Xudong Xu; Dahua Lin; Xiaogang Wang; Ziwei Liu;	To overcome this challenge, we propose to leverage the vastly available mono data to facilitate the generation of stereophonic audio.
508	CelebA-Spoof: Large-Scale Face Anti-Spoofing Dataset with Rich Annotations	Yuanhan Zhang; ZhenFei Yin; Yidong Li; Guojun Yin; Junjie Yan; Jing Shao; Ziwei Liu;	Our key insight is that, compared with the commonly-used binary supervision or mid-level geometric representations, rich semantic annotations as auxiliary tasks can greatly boost the performance and generalizability of face anti-spoofing across a wide range of spoof attacks.
509	Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues	Yuyang Qian; Guojun Yin; Lu Sheng; Zixuan Chen; Jing Shao;	To introduce frequency into face forgery detection, we propose a novel Frequency in Face Forgery Network (F$^3$-Net), taking advantage of two different but complementary frequency-aware clues, 1) frequency-aware decomposed image components, and 2) local frequency statistics, to deeply mine the forgery patterns via our two-stream collaborative learning framework.
510	Weakly-Supervised Cell Tracking via Backward-and-Forward Propagation	Kazuya Nishimura; Junya Hayashida; Chenyang Wang; Dai Fei Elmer Ker; Ryoma Bise;	We propose a weakly-supervised cell tracking method that can train a convolutional neural network (CNN) by using only the annotation of &quot&quotcell detection&quot&quot (i.e., the coordinates of cell positions) without association information, in which cell positions can be easily obtained by nuclear staining.
511	SeqHAND: RGB-Sequence-Based 3D Hand Pose and Shape Estimation	John Yang; Hyung Jin Chang; Seungeui Lee; Nojun Kwak;	In this paper, we attempt to not only consider the appearance of a hand but incorporate the temporal movement information of a hand in motion into the learning framework for better 3D hand pose estimation performance, which leads to the necessity of a large scale dataset with sequential RGB hand images.
512	Rethinking the Distribution Gap of Person Re-identification with Camera-based Batch Normalization	Zijie Zhuang; Longhui Wei; Lingxi Xie; Tianyu Zhang; Hengheng Zhang; Haozhe Wu; Haizhou Ai; Qi Tian;	This paper rethinks the working mechanism of conventional ReID approaches and puts forward a new solution.
513	AMLN: Adversarial-based Mutual Learning Network for Online Knowledge Distillation	Xiaobing Zhang; Shijian Lu; Haigang Gong; Zhipeng Luo; Ming Liu;	In this work, we propose an innovative adversarial-based mutual learning network (AMLN) that introduces process-driven learning beyond outcome-driven learning for augmented online knowledge distillation.
514	Online Multi-modal Person Search in Videos	Jiangyue Xia; Anyi Rao; Qingqiu Huang; Linning Xu; Jiangtao Wen; Dahua Lin;	In this paper, we propose an online person search framework, which can recognize people in a video on the fly.
515	Single Image Super-Resolution via a Holistic Attention Network	Ben Niu; Weilei Wen; Wenqi Ren; Xiangde Zhang; Lianping Yang; Shuzhen Wang; Kaihao Zhang; Xiaochun Cao; Haifeng Shen;	To address this problem, we propose a new holistic attention network (HAN), which consists of a layer attention module (LAM) and a channel-spatial attention module (CSAM), to model the holistic interdependencies among layers, channels, and positions.
516	Can You Read Me Now? Content Aware Rectification using Angle Supervision	Amir Markovitz; Inbal Lavi; Or Perel; Shai Mazor; Roee Litman;	We present CREASE: Content Aware Rectification using Angle Supervision, the first learned method for document rectification that relies on the document’s content, the location of the words and specifically their orientation, as hints to assist in the rectification process.
517	Momentum Batch Normalization for Deep Learning with Small Batch Size	Hongwei Yong; Jianqiang Huang; Deyu Meng; Xiansheng Hua; Lei Zhang;	To gain a deeper understanding of BN, in this work we prove that BN actually introduces a certain level of noise into the sample mean and variance during the training process, while the noise level depends only on the batch size.
518	AdvPC: Transferable Adversarial Perturbations on 3D Point Clouds	Abdullah Hamdi; Sara Rojas; Ali Thabet; Bernard Ghanem;	In this work, we present novel data-driven adversarial attacks against 3D point cloud networks.
519	Edge-aware Graph Representation Learning and Reasoning for Face Parsing	Gusi Te; Yinglu Liu; Wei Hu; Hailin Shi; Tao Mei;	To this end, we propose to model and reason the region-wise relations by learning graph representations, and leverage the edge information between regions for optimized abstraction.
520	BBS-Net: RGB-D Salient Object Detection with a Bifurcated Backbone Strategy Network	Deng-Ping Fan; Yingjie Zhai; Ali Borji; Jufeng Yang; Ling Shao;	In this paper, we make the first attempt to leverage the inherent multi-modal and multi-level nature of RGB-D salient object detection to develop a novel cascaded refinement network.
521	G-LBM: Generative Low-dimensional Background Model Estimation from Video Sequences	Behnaz Rezaei; Amirreza Farnoosh; Sarah Ostadabbas;	In this paper, we propose a computationally tractable and theoretically supported non-linear low-dimensional generative model to represent real-world data in the presence of noise and sparse outliers.
522	H3DNet: 3D Object Detection Using Hybrid Geometric Primitives	Zaiwei Zhang; Bo Sun; Haitao Yang; Qixing Huang;	We introduce H3DNet, which takes a colorless 3D point cloud as input and outputs a collection of oriented object bounding boxes (or BB) and their semantic labels.
523	Expressive Telepresence via Modular Codec Avatars	Hang Chu; Shugao Ma; Fernando De la Torre; Sanja Fidler; Yaser Sheikh;	This paper aims in this direction and presents Modular Codec Avatars (MCA), a method to generate hyper-realistic faces driven by the cameras in the VR headset.
524	Cascade Graph Neural Networks for RGB-D Salient Object Detection	Ao Luo; Xin Li; Fan Yang; Zhicheng Jiao; Hong Cheng; Siwei Lyu;	In this paper, we study the problem of salient object detection for RGB-D images by using both color and depth information.
525	FairALM: Augmented Lagrangian Method for Training Fair Models with Little Regret	Vishnu Suresh Lokhande; Aditya Kumar Akash; Sathya N. Ravi; Vikas Singh;	Here, we study mechanisms that impose fairness concurrently while training the model.
526	Generating Videos of Zero-Shot Compositions of Actions and Objects	Megha Nawhal; Mengyao Zhai; Andreas Lehrmann; Leonid Sigal; Greg Mori;	In this paper we develop methods for generating such videos — making progress toward addressing the important, open problem of video generation in complex scenes.
527	ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language	Zhe Wang; Zhiyuan Fang; Jun Wang; Yezhou Yang;	To be concrete, our Visual-Textual Attribute Alignment model (dubbed as ViTAA) learns to disentangle the feature space of a person into sub-spaces corresponding to attributes using a light auxiliary attribute segmentation layer. It then aligns these visual features with the textual attributes parsed from the sentences via a novel contrastive learning loss.
528	Renovating Parsing R-CNN for Accurate Multiple Human Parsing	Lu Yang; Qing Song; Zhihui Wang; Mengjie Hu; Chun Liu; Xueshi Xin; Wenhe Jia; Songcen Xu;	To reverse this phenomenon, we present Renovating Parsing R-CNN (RP R-CNN), which introduces a global semantic enhanced feature pyramid network and a parsing re-scoring network into the existing high-performance pipeline.
529	Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning	Qing Yu; Daiki Ikami; Go Irie; Kiyoharu Aizawa;	Instead of training an OOD detector and SSL separately, we propose a multi-task curriculum learning framework.
530	Gradient-Induced Co-Saliency Detection	Zhao Zhang; Wenda Jin; Jun Xu; Ming-Ming Cheng;	In this paper, inspired by human behavior, we propose a gradient-induced co-saliency detection (GICD) method. To evaluate the performance of Co-SOD methods on discovering the co-salient object among multiple foregrounds, we construct a challenging CoCA dataset, where each image contains at least one extraneous foreground along with the co-salient object.
531	Nighttime Defogging Using High-Low Frequency Decomposition and Grayscale-Color Networks	Wending Yan; Robby T. Tan; Dengxin Dai;	In this paper, we address the problem of nighttime defogging from a single image.
532	SegFix: Model-Agnostic Boundary Refinement for Segmentation	Yuhui Yuan; Jingyi Xie; Xilin Chen; Jingdong Wang;	We present a model-agnostic post-processing scheme to improve the boundary quality for the segmentation result that is generated by any existing segmentation model.
533	Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction	Cunjun Yu; Xiao Ma; Jiawei Ren; Haiyu Zhao; Shuai Yi;	In this paper, we present STAR, a Spatio-Temporal grAph tRansformer framework, which tackles trajectory prediction using only attention mechanisms.
534	Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars	Egor Zakharov; Aleksei Ivakhnenko; Aliaksandra Shysheya; Victor Lempitsky;	We propose a neural rendering-based system that creates head avatars from a single photograph.
535	Neural Geometric Parser for Single Image Camera Calibration	Jinwoo Lee; Minhyuk Sung; Hyunjoon Lee; Junho Kim;	We propose a neural geometric parser learning single image camera calibration for man-made scenes.
536	Learning Flow-based Feature Warping for Face Frontalization with Illumination Inconsistent Supervision	Yuxiang Wei; Ming Liu; Haolin Wang; Ruifeng Zhu; Guosheng Hu; Wangmeng Zuo;	We propose a novel Flow-based Feature Warping Model (FFWM) which can learn to synthesize photo-realistic and illumination preserving frontal images with illumination inconsistent supervision.
537	Learning Architectures for Binary Networks	Dahyun Kim; Kunal Pratap Singh; Jonghyun Choi;	Questioning whether the architectures designed for FP networks are the best for binary networks, we propose to search architectures for binary networks (BNAS) by defining a new search space for binary architectures and a novel search objective.
538	Semantic View Synthesis	Hsin-Ping Huang; Hung-Yu Tseng; Hsin-Ying Lee; Jia-Bin Huang;	To address the drawbacks, we propose a two-step approach. First, we focus on synthesizing the color and depth of the visible surface of the 3D scene. We then use the synthesized color and depth to impose explicit constraints on the multiple-plane image (MPI) representation prediction process.
539	An Analysis of Sketched IRLS for Accelerated Sparse Residual Regression	Daichi Iwata; Michael Waechter; Wen-Yan Lin; Yasuyuki Matsushita;	This paper studies the problem of sparse residual regression, i.e., learning a linear model using a norm that favors solutions in which the residuals are sparsely distributed.
540	Relative Pose from Deep Learned Depth and a Single Affine Correspondence	Ivan Eichhardt; Daniel Barath;	We propose a new approach for combining deep-learned nonmetric monocular depth with affine correspondences (ACs) to estimate the relative pose of two calibrated cameras from a single correspondence.
541	Video Super-Resolution with Recurrent Structure-Detail Network	Takashi Isobe; Xu Jia; Shuhang Gu; Songjiang Li; Shengjin Wang; Qi Tian;	In this work, we propose a novel recurrent video super-resolution method which is both effective and efficient in exploiting previous frames to super-resolve the current frame.
542	Shape Adaptor: A Learnable Resizing Module	Shikun Liu; Zhe Lin; Yilin Wang; Jianming Zhang; Federico Perazzi; Edward Johns;	We present a novel resizing module for neural networks: shape adaptor, a drop-in enhancement built on top of traditional resizing layers, such as pooling, bilinear sampling, and strided convolution.
543	Shuffle and Attend: Video Domain Adaptation	Jinwoo Choi; Gaurav Sharma; Samuel Schulter; Jia-Bin Huang;	We address the problem of domain adaptation in videos for the task of human action recognition.
544	DRG: Dual Relation Graph for Human-Object Interaction Detection	Chen Gao; Jiarui Xu; Yuliang Zou; Jia-Bin Huang;	In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph (one human-centric and one object-centric).
545	Flow-edge Guided Video Completion	Chen Gao; Ayush Saraf; Jia-Bin Huang; Johannes Kopf;	We present a new flow-based video completion algorithm.
546	End-to-End Trainable Deep Active Contour Models for Automated Image Segmentation: Delineating Buildings in Aerial Imagery	Ali Hatamizadeh; Debleena Sengupta; Demetri Terzopoulos;	As a solution, we present Trainable Deep Active Contours (TDACs), an automatic image segmentation framework that intimately unites Convolutional Neural Networks (CNNs) and Active Contour Models (ACMs).
547	Towards End-to-end Video-based Eye-Tracking	Seonwook Park; Emre Aksan; Xucong Zhang; Otmar Hilliges;	In response to this understanding, we propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships.
548	Generating Handwriting via Decoupled Style Descriptors	Atsunobu Kotani; Stefanie Tellex; James Tompkin;	Instead, we introduce the Decoupled Style Descriptor (DSD) model for handwriting, which factors both character- and writer-level styles and allows our model to represent an overall greater space of styles.
549	LEED: Label-Free Expression Editing via Disentanglement	Rongliang Wu; Shijian Lu;	This paper presents an innovative label-free expression editing via disentanglement (LEED) framework that is capable of editing the expression of both frontal and profile facial images without requiring any expression labels.
550	Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards	Xuewen Yang; Heming Zhang; Di Jin; Yingru Liu; Chi-Hao Wu; Jianchao Tan; Dongliang Xie; Jue Wang; Xin Wang;	The goal of this work is to develop a novel learning framework for accurate and expressive fashion captioning.
551	Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder	Gouthaman KV; Anurag Mittal;	In this work, we propose a novel model-agnostic question encoder, Visually-Grounded Question Encoder (VGQE), for VQA that reduces this effect.
552	Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation	Jogendra Nath Kundu; Ambareesh Revanur; Govind Vitthal Waghmare; Rahul Mysore Venkatesh; R. Venkatesh Babu;	We present a deployment friendly, fast bottom-up framework for multi-person 3D human pose estimation.
553	Class-Incremental Domain Adaptation	Jogendra Nath Kundu; Rahul Mysore Venkatesh; Naveen Venkat; Ambareesh Revanur; R. Venkatesh Babu;	In this work, we effectively identify the limitations of these approaches in the CIDA paradigm.
554	Anti-Bandit Neural Architecture Search for Model Defense	Hanlin Chen; Baochang Zhang; Song Xue; Xuan Gong; Hong Liu; Rongrong Ji; David Doermann;	In this paper, we defend against adversarial attacks using neural architecture search (NAS) which is based on a comprehensive search of denoising blocks, weight-free operations, Gabor filters and convolutions.
555	Wavelet-Based Dual-Branch Network for Image Demoiréing	Lin Liu; Jianzhuang Liu; Shanxin Yuan; Gregory Slabaugh; Aleš Leonardis; Wengang Zhou; Qi Tian;	In this paper, we design a wavelet-based dual-branch network (WDNet) with a spatial attention mechanism for image demoiréing.
556	Low Light Video Enhancement using Synthetic Data Produced with an Intermediate Domain Mapping	Danai Triantafyllidou; Sean Moran; Steven McDonagh; Sarah Parisot; Gregory Slabaugh;	By generating dynamic video data synthetically, we enable a recently proposed state-of-the-art RAW-to-RGB model to attain higher image quality (improved colour, reduced artifacts) and improved temporal consistency, compared to the same model trained with only static real video data.
557	Non-Local Spatial Propagation Network for Depth Completion	Jinsun Park; Kyungdon Joo; Zhe Hu; Chi-Kuei Liu; In So Kweon;	In this paper, we propose a robust and efficient end-to-end non-local spatial propagation network for depth completion.
558	DanbooRegion: An Illustration Region Dataset	Lvmin Zhang; Yi Ji; Chunping Liu;	We detail the challenges in achieving this dataset and present a human-in-the-loop workflow, namely Feasibility-based Assignment Recommendation (FAR), to enable large-scale annotation.
559	Event Enhanced High-Quality Image Recovery	Bishan Wang; Jingwei He; Lei Yu; Gui-Song Xia; Wen Yang;	Based on this, we propose an explainable network, an event-enhanced sparse learning network (eSL-Net), to recover the high-quality images from event cameras.
560	PackDet: Packed Long-Head Object Detector	Kun Ding; Guojin He; Huxiang Gu; Zisha Zhong; Shiming Xiang; Chunhong Pan;	To solve this issue, we propose a packing operator (PackOp) to combine all head branches together spatially.
561	A Generic Graph-based Neural Architecture Encoding Scheme for Predictor-based NAS	Xuefei Ning; Yin Zheng; Tianchen Zhao; Yu Wang; Huazhong Yang;	This work proposes a novel Graph-based neural ArchiTecture Encoding Scheme, a.k.a. GATES, to improve the predictor-based neural architecture search.
562	Learning Semantic Neural Tree for Human Parsing	Ruyi Ji; Dawei Du; Libo Zhang; Longyin Wen; Yanjun Wu; Chen Zhao; Feiyue Huang; Siwei Lyu;	In this paper, we design a novel semantic neural tree for human parsing, which uses a tree architecture to encode the physiological structure of the human body, and design a coarse-to-fine process in a cascade manner to generate accurate results.
563	Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation	Wenbin Wang; Ruiping Wang; Shiguang Shan; Xilin Chen;	Therefore, we argue that a desirable scene graph should also be hierarchically constructed, and introduce a new scheme for modeling scene graphs.
564	Burst Denoising via Temporally Shifted Wavelet Transforms	Xuejian Rong; Denis Demandolx; Kevin Matzen; Priyam Chatterjee; Yingli Tian;	We propose an end-to-end trainable burst denoising pipeline which jointly captures high-resolution and high-frequency deep features derived from wavelet transforms.
565	JSSR: A Joint Synthesis, Segmentation, and Registration System for 3D Multi-Modal Image Alignment of Large-scale Pathological CT Scans	Fengze Liu; Jinzheng Cai; Yuankai Huo; Chi-Tung Cheng; Ashwin Raju; Dakai Jin; Jing Xiao; Alan Yuille; Le Lu; ChienHung Liao; Adam P. Harrison;	In this work, we propose a novel multi-task learning system, JSSR, based on an end-to-end 3D convolutional neural network that is composed of a generator, a registration and a segmentation component.
566	SimAug: Learning Robust Representations from Simulation for Trajectory Prediction	Junwei Liang; Lu Jiang; Alexander Hauptmann;	We propose a novel approach to learn robust representation through augmenting the simulation training data such that the representation can better generalize to unseen real-world test data.
567	ScribbleBox: Interactive Annotation Framework for Video Object Segmentation	Bowen Chen; Huan Ling; Xiaohui Zeng; Jun Gao; Ziyue Xu; Sanja Fidler;	We introduce ScribbleBox, an interactive framework for annotating object instances with masks in videos with a significant boost in efficiency.
568	Rethinking Pseudo-LiDAR Representation	Xinzhu Ma; Shinan Liu; Zhiyi Xia; Hongwen Zhang; Xingyu Zeng; Wanli Ouyang;	In this paper, we perform an in-depth investigation and observe that the pseudo-LiDAR representation is effective because of the coordinate transformation, instead of data representation itself.
569	Deep Multi Depth Panoramas for View Synthesis	Kai-En Lin; Zexiang Xu; Ben Mildenhall; Pratul P. Srinivasan; Yannick Hold-Geoffroy; Stephen DiVerdi; Qi Sun; Kalyan Sunkavalli; Ravi Ramamoorthi;	We propose a learning-based approach for novel view synthesis for multi-camera 360°
570	MINI-Net: Multiple Instance Ranking Network for Video Highlight Detection	Fa-Ting Hong; Xuanteng Huang; Wei-Hong Li; Wei-Shi Zheng;	In this work, we propose casting weakly supervised video highlight detection modeling for a given specific event as a multiple instance ranking network (MINI-Net) learning.
571	ContactPose: A Dataset of Grasps with Object Contact and Hand Pose	Samarth Brahmbhatt; Chengcheng Tang; Christopher D. Twigg; Charles C. Kemp; James Hays;	We introduce ContactPose, the first dataset of hand-object contact paired with hand pose, object pose, and RGB-D images.
572	API-Net: Robust Generative Classifier via a Single Discriminator	Xinshuai Dong; Hong Liu; Rongrong Ji; Liujuan Cao; Qixiang Ye; Jianzhuang Liu; Qi Tian;	This work aims for a solution of generative classifiers that can profit from the merits of both.
573	Bias-based Universal Adversarial Patch Attack for Automatic Check-out	Aishan Liu; Jiakai Wang; Xianglong Liu; Bowen Cao; Chongzhi Zhang; Hang Yu;	To address the problem, this paper proposes a bias-based framework to generate class-agnostic universal adversarial patches with strong generalization ability, which exploits both the perceptual and semantic bias of models.
574	Imbalanced Continual Learning with Partitioning Reservoir Sampling	Chris Dongjoo Kim; Jinseo Jeong; Gunhee Kim;	We jointly address the two independently solved problems, Catastrophic Forgetting and the long-tailed label distribution, by first empirically showing a new challenge of destructive forgetting of the minority concepts on the tail.
575	Guided Collaborative Training for Pixel-wise Semi-Supervised Learning	Zhanghan Ke; Di Qiu; Kaican Li; Qiong Yan; Rynson W.H. Lau;	In this paper, we present a new SSL framework, named Guided Collaborative Training (GCT), for pixel-wise tasks, with two main technical contributions.
576	Stacking Networks Dynamically for Image Restoration Based on the Plug-and-Play Framework	Haixin Wang; Tianhao Zhang; Muzhi Yu; Jinan Sun; Wei Ye; Chen Wang; Shikun Zhang;	To address this challenge, we leverage the iterative process of the traditional plug-and-play method to provide a dynamic stacked network for Image Restoration.
577	Efficient Transfer Learning via Joint Adaptation of Network Architecture and Weight	Ming Sun; Haoxuan Dou; Junjie Yan;	To remedy the above issues, we reduce the super-network size by randomly dropping connections between network blocks while embedding a larger search space.
578	Spatial Attention Pyramid Network for Unsupervised Domain Adaptation	Congcong Li; Dawei Du; Libo Zhang; Longyin Wen; Tiejian Luo; Yanjun Wu; Pengfei Zhu;	To that end, in this paper, we design a new spatial attention pyramid network for unsupervised domain adaptation.
579	GSIR: Generalizable 3D Shape Interpretation and Reconstruction	Jianren Wang; Zhaoyuan Fang;	We propose to recover 3D shape structures as cuboids from partially reconstructed objects and use the predicted structures to further guide 3D reconstruction.
580	Weakly Supervised 3D Object Detection from Lidar Point Cloud	Qinghao Meng; Wenguan Wang; Tianfei Zhou; Jianbing Shen; Luc Van Gool; Dengxin Dai;	This work proposes a weakly supervised approach for 3D object detection, only requiring a small set of weakly annotated scenes, associated with a few precisely labeled object instances.
581	Two-phase Pseudo Label Densification for Self-training based Domain Adaptation	Inkyu Shin; Sanghyun Woo; Fei Pan; In So Kweon;	In order to tackle this problem, we propose a novel Two-phase Pseudo Label Densification framework, referred to as TPLD.
582	Adaptive Offline Quintuplet Loss for Image-Text Matching	Tianlang Chen; Jiajun Deng; Jiebo Luo;	In this paper, we propose solutions by sampling negatives offline from the whole training set.
583	Learning Object Placement by Inpainting for Compositional Data Augmentation	Lingzhi Zhang; Tarmily Wen; Jie Min; Jiancong Wang; David Han; Jianbo Shi;	We propose a self-learning framework that automatically generates the necessary training data without any manual labeling by detecting, cutting, and inpainting objects from an image.
584	Deep Vectorization of Technical Drawings	Vage Egiazarian; Oleg Voynov; Alexey Artemov; Denis Volkhonskiy; Aleksandr Safin; Maria Taktasheva; Denis Zorin; Evgeny Burnaev;	We present a new method for vectorization of technical line drawings, such as floor plans, architectural drawings, and 2D CAD images.
585	CAD-Deform: Deformable Fitting of CAD Models to 3D Scans	Vladislav Ishimtsev; Alexey Bokhovkin; Alexey Artemov; Savva Ignatyev; Matthias Niessner; Denis Zorin; Evgeny Burnaev;	In this work, we address this shortcoming by introducing CAD-Deform, a method which obtains more accurate CAD-to-scan fits by non-rigidly deforming retrieved CAD models.
586	An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices	Xiaolong Ma; Wei Niu; Tianyun Zhang; Sijia Liu; Sheng Lin; Hongjia Li; Wujie Wen; Xiang Chen; Jian Tang; Kaisheng Ma; Bin Ren; Yanzhi Wang;	To solve the problem, we introduce a new sparsity dimension, namely pattern-based sparsity, which comprises pattern and connectivity sparsity and is both highly accurate and hardware friendly.
587	AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points	Yuexin Ma; Xinge Zhu; Xinjing Cheng; Ruigang Yang; Jiming Liu; Dinesh Manocha;	In this paper, we present a novel, label-free algorithm, AutoTrajectory, for trajectory extraction and prediction to use raw videos directly.
588	Multi-Agent Embodied Question Answering in Interactive Environments	Sinan Tan; Weilai Xiang; Huaping Liu; Di Guo; Fuchun Sun;	We investigate a new AI task — Multi-Agent Interactive Question Answering — where several agents explore the scene jointly in interactive environments to answer a question.
589	Conditional Sequential Modulation for Efficient Global Image Retouching	Jingwen He; Yihao Liu; Yu Qiao; Chao Dong;	In this paper, we investigate some commonly-used retouching operations and mathematically find that these pixel-independent operations can be approximated or formulated by multi-layer perceptrons (MLPs).
590	Segmenting Transparent Objects in the Wild	Enze Xie; Wenjia Wang; Wenhai Wang; Mingyu Ding; Chunhua Shen; Ping Luo;	To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10,428 images of real scenarios with careful manual annotations, which is 10 times larger than the existing datasets.
591	Length-Controllable Image Captioning	Chaorui Deng; Ning Ding; Mingkui Tan; Qi Wu;	In this paper, we propose to use a simple length level embedding to endow them with this ability.
592	Few-Shot Semantic Segmentation with Democratic Attention Networks	Haochen Wang; Xudong Zhang; Yutao Hu; Yandan Yang; Xianbin Cao; Xiantong Zhen;	In this paper, we propose the Democratic Attention Network (DAN) for few-shot semantic segmentation.
593	Defocus Blur Detection via Depth Distillation	Xiaodong Cun; Chi-Man Pun;	To solve these problems, we introduce depth information into DBD for the first time.
594	Motion Guided 3D Pose Estimation from Videos	Jingbo Wang; Sijie Yan; Yuanjun Xiong; Dahua Lin;	We propose a new loss function, called motion loss, for the problem of monocular 3D human pose estimation from 2D pose.
595	Reflection Separation via Multi-bounce Polarization State Tracing	Rui Li; Simeng Qiu; Guangming Zang; Wolfgang Heidrich;	In this paper we aim to generalize the reflection removal to real-world scenarios with more complicated light interactions.
596	SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation	Jiale Cao; Rao Muhammad Anwer; Hisham Cholakkal; Fahad Shahbaz Khan; Yanwei Pang; Ling Shao;	We propose a fast single-stage instance segmentation method, called SipMask, that preserves instance-specific spatial information by separating mask prediction of an instance to different sub-regions of a detected bounding-box.
597	SemanticAdv: Generating Adversarial Examples via Attribute-conditioned Image Editing	Haonan Qiu; Chaowei Xiao; Lei Yang; Xinchen Yan; Honglak Lee; Bo Li;	In this paper, we propose SemanticAdv to generate a new type of semantically realistic adversarial examples via attribute-conditioned image editing.
598	Learning with Noisy Class Labels for Instance Segmentation	Longrong Yang; Fanman Meng; Hongliang Li; Qingbo Wu; Qishang Cheng;	To solve this issue, a novel method is proposed in this paper, which uses different losses describing different roles of noisy class labels to enhance the learning.
599	Deep Image Clustering with Category-Style Representation	Junjie Zhao; Donghuan Lu; Kai Ma; Yu Zhang; Yefeng Zheng;	In this paper, we propose a novel deep image clustering framework to learn a category-style latent representation in which the category information is disentangled from image style and can be directly used as the cluster assignment.
600	Self-supervised Motion Representation via Scattering Local Motion Cues	Yuan Tian; Zhaohui Che; Wenbo Bao; Guangtao Zhai; Zhiyong Gao;	In this paper, we leverage the massive unlabeled video data to learn an accurate explicit motion representation that aligns well with the semantic distribution of the moving objects.
601	Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets	Tian Chen; Shijie An; Yuan Zhang; Chongyang Ma; Huayan Wang; Xiaoyan Guo; Wen Zheng;	One key limitation of existing approaches lies in their lack of structural information exploitation, which leads to inaccurate spatial layout, discontinuous surface, and ambiguous boundaries. In this paper, we tackle this problem in three aspects.
602	BMBC: Bilateral Motion Estimation with Bilateral Cost Volume for Video Interpolation	Junheum Park; Keunsoo Ko; Chul Lee; Chang-Su Kim;	We propose a novel deep-learning-based video interpolation algorithm based on bilateral motion estimation.
603	Hard negative examples are hard, but useful	Hong Xuan; Abby Stylianou; Xiaotong Liu; Robert Pless;	In this paper, we characterize the space of triplets and derive why hard negatives make triplet loss training fail.
604	ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions	Zechun Liu; Zhiqiang Shen; Marios Savvides; Kwang-Ting Cheng;	In this paper, we propose several ideas for enhancing a binary network to close its accuracy gap from real-valued networks without incurring any additional computational cost.
605	Video Object Detection via Object-level Temporal Aggregation	Chun-Han Yao; Chen Fang; Xiaohui Shen; Yangyue Wan; Ming-Hsuan Yang;	In this work we propose to improve video object detection via temporal aggregation.
606	Object Detection with a Unified Label Space from Multiple Datasets	Xiangyun Zhao; Samuel Schulter; Gaurav Sharma; Yi-Hsuan Tsai; Manmohan Chandraker; Ying Wu;	Given multiple datasets with different label spaces, the goal of this work is to train a single object detector predicting over the union of all the label spaces.
607	Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D	Jonah Philion; Sanja Fidler;	We propose a new end-to-end architecture that directly extracts a bird’s-eye-view representation of a scene given image data from an arbitrary number of cameras.
608	Comprehensive Image Captioning via Scene Graph Decomposition	Yiwu Zhong; Liwei Wang; Jianshu Chen; Dong Yu; Yin Li;	We address the challenging problem of image captioning by revisiting the representation of image scene graph.
609	Symbiotic Adversarial Learning for Attribute-based Person Search	Yu-Tong Cao; Jingya Wang; Dacheng Tao;	In this paper, we present a symbiotic adversarial learning framework, called SAL.
610	Amplifying Key Cues for Human-Object-Interaction Detection	Yang Liu; Qingchao Chen; Andrew Zisserman;	In this paper we introduce two methods to amplify key cues in the image, and also a method to combine these and other cues when considering the interaction between a human and an object.
611	Rethinking Few-shot Image Classification: A Good Embedding is All You Need?	Yonglong Tian; Yue Wang; Dilip Krishnan; Joshua B. Tenenbaum; Phillip Isola;	In this work, we show that a simple baseline: learning a supervised or self-supervised representation on the meta-training set, followed by training a linear classifier on top of this representation, outperforms state-of-the-art few-shot learning methods.
612	Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization	Kyle Min; Jason J. Corso;	Despite recent advances, existing methods for weakly-supervised temporal activity localization struggle to recognize when an activity is not occurring. To address this issue, we propose a novel method named A2CL-PT.
613	Action Localization through Continual Predictive Learning	Sathyanarayanan Aakur; Sudeep Sarkar;	In this paper, we present a new approach based on continual learning that uses feature-level predictions for self-supervision.
614	Generative View-Correlation Adaptation for Semi-Supervised Multi-View Learning	Yunyu Liu; Lichen Wang; Yue Bai; Can Qin; Zhengming Ding; Yun Fu;	To address the challenges, we propose a novel View-Correlation Adaptation (VCA) framework in a semi-supervised fashion.
615	READ: Reciprocal Attention Discriminator for Image-to-Video Re-Identification	Minho Shim; Hsuan-I Ho; Jinhyung Kim; Dongyoon Wee;	In this work, we focus on image-to-video re-ID which compares a single query image to videos in the gallery.
616	3D Human Shape Reconstruction from a Polarization Image	Shihao Zou; Xinxin Zuo; Yiming Qian; Sen Wang; Chi Xu; Minglun Gong; Li Cheng;	This paper tackles the problem of estimating 3D body shape of clothed humans from single polarized 2D images, i.e. polarization images.
617	The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification	Pirazh Khorramshahi; Neehar Peri; Jun-cheng Chen; Rama Chellappa;	In this paper, we present Self-supervised Attention for Vehicle Re-identification (SAVER), a novel approach to effectively learn vehicle-specific discriminative features.
618	Improving One-stage Visual Grounding by Recursive Sub-query Construction	Zhengyuan Yang; Tianlang Chen; Liwei Wang; Jiebo Luo;	To address this query modeling deficiency, we propose a recursive sub-query construction framework, which reasons between image and query for multiple rounds and reduces the referring ambiguity step by step.
619	Multi-level Wavelet-based Generative Adversarial Network for Perceptual Quality Enhancement of Compressed Video	Jianyi Wang; Xin Deng; Mai Xu; Congyong Chen; Yuhang Song;	In this paper, we focus on enhancing the perceptual quality of compressed video.
620	Example-Guided Image Synthesis using Masked Spatial-Channel Attention and Self-Supervision	Haitian Zheng; Haofu Liao; Lele Chen; Wei Xiong; Tianlang Chen; Jiebo Luo;	In this paper, we tackle a more challenging and general task, where the exemplar is a scene image that is semantically different from the given label map.
621	Content-Consistent Matching for Domain Adaptive Semantic Segmentation	Guangrui Li; Guoliang Kang; Wu Liu; Yunchao Wei; Yi Yang;	This paper considers the adaptation of semantic segmentation from the synthetic source domain to the real target domain.
622	AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting	Wenhai Wang; Xuebo Liu; Xiaozhong Ji; Enze Xie; Ding Liang; ZhiBo Yang; Tong Lu; Chunhua Shen; Ping Luo;	Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.
623	History Repeats Itself: Human Motion Prediction via Motion Attention	Wei Mao; Miaomiao Liu; Mathieu Salzmann;	Here, we introduce an attention-based feed-forward network that explicitly leverages this observation.
624	Unsupervised Video Object Segmentation with Joint Hotspot Tracking	Lu Zhang; Jianming Zhang; Zhe Lin; Radomír Měch; Huchuan Lu; You He;	Specifically, we propose a Weighted Correlation Siamese Network (WCS-Net) which employs a Weighted Correlation Block (WCB) for encoding the pixel-wise correspondence between a template frame and the search frame.
625	SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach	Ailing Zeng; Xiao Sun; Fuyang Huang; Minhao Liu; Qiang Xu; Stephen Lin;	We propose to take advantage of this fact for better generalization to rare and unseen poses.
626	CAFE-GAN: Arbitrary Face Attribute Editing with Complementary Attention Feature	Jeong gi Kwak; David K. Han; Hanseok Ko;	To address this unintended altering problem, we propose a novel GAN model which is designed to edit only the parts of a face pertinent to the target attributes by the concept of Complementary Attention Feature (CAFE).
627	MimicDet: Bridging the Gap Between One-Stage and Two-Stage Object Detection	Xin Lu; Quanquan Li; Buyu Li; Junjie Yan;	In this paper, we propose MimicDet, a novel and efficient framework to train a one-stage detector by directly mimicking the two-stage features, aiming to bridge the accuracy gap between one-stage and two-stage detectors.
628	Latent Topic-aware Multi-Label Classification	Jianghong Ma; Yang Liu;	This paper shows that sample and feature extraction, two important procedures for removing noisy and redundant information encoded in training samples from both the sample and feature perspectives, can be performed effectively and efficiently in the latent topic space by considering topic-based feature-label correlation.
629	Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning	Xiangxi Shi; Xu Yang; Jiuxiang Gu; Shafiq Joty; Jianfei Cai;	In this paper, we propose a novel visual encoder to explicitly distinguish viewpoint changes from semantic changes in the change captioning task.
630	Attract, Perturb, and Explore: Learning a Feature Alignment Network for Semi-supervised Domain Adaptation	Taekyung Kim; Changick Kim;	We propose an SSDA framework that aims to align features via alleviation of the intra-domain discrepancy.
631	Curriculum Manager for Source Selection in Multi-Source Domain Adaptation	Luyu Yang; Yogesh Balaji; Ser-Nam Lim; Abhinav Shrivastava;	In this paper, we propose an adversarial agent that learns a dynamic curriculum for source samples, called Curriculum Manager for Source Selection (CMSS).
632	Powering One-shot Topological NAS with Stabilized Share-parameter Proxy	Ronghao Guo; Chen Lin; Chuming Li; Keyu Tian; Ming Sun; Lu Sheng; Junjie Yan;	In this work, we try to enhance the one-shot NAS by exploring high-performing network architectures in our large-scale Topology Augmented Search Space (i.e., over 3.4×10^10 different topological structures).
633	Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation	Haoran Wang; Tong Shen; Wei Zhang; Ling-Yu Duan; Tao Mei;	To fully exploit the supervision in the source domain, we propose a fine-grained adversarial learning strategy for class-level feature alignment while preserving the internal structure of semantics across domains.
634	Boundary-preserving Mask R-CNN	Tianheng Cheng; Xinggang Wang; Lichao Huang; Wenyu Liu;	To remedy this, we propose a conceptually simple yet effective Boundary-guided Mask R-CNN (BMask R-CNN) to leverage object boundary information to improve mask localization accuracy.
635	Self-supervised Single-view 3D Reconstruction via Semantic Consistency	Xueting Li; Sifei Liu; Kihwan Kim; Shalini De Mello; Varun Jampani; Ming-Hsuan Yang; Jan Kautz;	The key insight of our work is that objects can be represented as a collection of deformable parts, and each part is semantically coherent across different instances of the same category (e.g., wings on birds and wheels on cars).
636	MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation	Benlin Liu; Yongming Rao; Jiwen Lu; Jie Zhou; Cho-Jui Hsieh;	Specifically, we propose that better soft targets with higher compatibility can be generated by using a label generator to fuse the feature maps from deeper stages in a top-down manner, and we can employ the meta-learning technique to optimize this label generator.
637	Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling	Yuliang Zou; Pan Ji; Quoc-Huy Tran; Jia-Bin Huang; Manmohan Chandraker;	In this paper, we present a self-supervised learning method for VO with special consideration for consistency over longer sequences.
638	The Devil is in Classification: A Simple Framework for Long-tail Instance Segmentation	Tao Wang; Yu Li; Bingyi Kang; Junnan Li; Junhao Liew; Sheng Tang; Steven Hoi; Jiashi Feng;	Based on such an observation, we first consider various techniques for improving long-tail classification performance which indeed enhance instance segmentation results. We then propose a simple calibration framework to more effectively alleviate classification head bias with a bi-level class balanced sampling approach.
639	What is Learned in Deep Uncalibrated Photometric Stereo?	Guanying Chen; Michael Waechter; Boxin Shi; Kwan-Yee K. Wong; Yasuyuki Matsushita;	In this paper, we analyze the features learned by this method and find that they strikingly resemble attached shadows, shadings, and specular highlights, which are known to provide useful clues in resolving the generalized bas-relief (GBR) ambiguity.
640	Prior-based Domain Adaptive Object Detection for Hazy and Rainy Conditions	Vishwanath A. Sindagi; Poojan Oza; Rajeev Yasarla; Vishal M. Patel;	To address this issue, we propose an unsupervised prior-based domain adversarial object detection framework for adapting the detectors to hazy and rainy conditions.
641	Adversarial Ranking Attack and Defense	Mo Zhou; Zhenxing Niu; Le Wang; Qilin Zhang; Gang Hua;	In this paper, we propose two attacks against deep ranking systems, i.e., Candidate Attack and Query Attack, that can raise or lower the rank of chosen candidates by adversarial perturbations.
642	ReDro: Efficiently Learning Large-sized SPD Visual Representation	Saimunur Rahman; Lei Wang; Changming Sun; Luping Zhou;	This work proposes a novel scheme called Relation Dropout (ReDro). It is inspired by the fact that eigen-decomposition of a block diagonal matrix can be efficiently obtained by decomposing each of its diagonal square matrices, which are of smaller sizes.
643	Graph-Based Social Relation Reasoning	Wanhua Li; Yueqi Duan; Jiwen Lu; Jianjiang Feng; Jie Zhou;	In this paper, we propose a simpler, faster, and more accurate method named graph relational reasoning network (GR²N) for social relation recognition.
644	EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection	Tengteng Huang; Zhe Liu; Xiwu Chen; Xiang Bai;	In this paper, we aim at addressing two critical issues in the 3D detection task, including the exploitation of multiple sensors (namely LiDAR point cloud and camera image), as well as the inconsistency between the localization and classification confidence.
645	Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency	Jiaxiang Shang; Tianwei Shen; Shiwei Li; Lei Zhou; Mingmin Zhen; Tian Fang; Long Quan;	In contrast to previous works that only enforce 2D feature constraints, we propose a self-supervised training architecture by leveraging the multi-view geometry consistency, which provides reliable constraints on face pose and depth estimation.
646	Asynchronous Interaction Aggregation for Action Detection	Jiajun Tang; Jin Xia; Xinzhi Mu; Bo Pang; Cewu Lu;	We propose the Asynchronous Interaction Aggregation network (AIA) that leverages different interactions to boost action detection.
647	Shape and Viewpoint without Keypoints	Shubham Goel; Angjoo Kanazawa; Jitendra Malik;	We present a learning framework that learns to recover the 3D shape, pose and texture from a single image, trained on an image collection without any ground truth 3D shape, multi-view, camera viewpoints or keypoint supervision.
648	Learning Attentive and Hierarchical Representations for 3D Shape Recognition	Jiaxin Chen; Jie Qin; Yuming Shen; Li Liu; Fan Zhu; Ling Shao;	This paper proposes a novel method for 3D shape representation learning, namely Hyperbolic Embedded Attentive Representation (HEAR).
649	TF-NAS: Rethinking Three Search Freedoms of Latency-Constrained Differentiable Neural Architecture Search	Yibo Hu; Xiang Wu; Ran He;	In this paper, we rethink three freedoms of differentiable NAS, i.e. operation-level, depth-level and width-level, and propose a novel method, named Three-Freedom NAS (TF-NAS), to achieve both good classification accuracy and precise latency constraint.
650	Associative3D: Volumetric Reconstruction from Sparse Views	Shengyi Qian; Linyi Jin; David F. Fouhey;	We propose a new approach that estimates reconstructions, distributions over the camera/object and camera/camera transformations, as well as an inter-view object affinity matrix.
651	PlugNet: Degradation Aware Scene Text Recognition Supervised by a Pluggable Super-Resolution Unit	Yongqiang Mou; Lei Tan; Hui Yang; Jingying Chen; Leyuan Liu; Rui Yan; Yaohong Huang;	In this paper, we address the problem of recognizing degraded images that suffer from heavy blur or low resolution.
652	Memory Selection Network for Video Propagation	Ruizheng Wu; Huaijia Lin; Xiaojuan Qi; Jiaya Jia;	To tackle this challenge, we propose a memory selection network, which learns to select suitable guidance from all previous frames for effective and robust propagation.
653	Disentangled Non-local Neural Networks	Minghao Yin; Zhuliang Yao; Yue Cao; Xiu Li; Zheng Zhang; Stephen Lin; Han Hu;	Based on these findings, we present the disentangled non-local block, where the two terms are decoupled to facilitate learning for both terms.
654	URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark	Seonguk Seo; Joon-Young Lee; Bohyung Han;	We propose a unified referring video object segmentation network (URVOS).
655	Generalizing Person Re-Identification by Camera-Aware Invariance Learning and Cross-Domain Mixup	Chuanchen Luo; Chunfeng Song; Zhaoxiang Zhang;	As for the latter issue, we propose a novel cross-domain mixup scheme.
656	Semi-Supervised Crowd Counting via Self-Training on Surrogate Tasks	Yan Liu; Lingqiao Liu; Peng Wang; Pingping Zhang; Yinjie Lei;	Specifically, we propose a novel semi-supervised crowd counting method built upon two innovative components: (1) a set of inter-related binary segmentation tasks is derived from the original density map regression task as the surrogate prediction target; and (2) the surrogate target predictors are learned from both labeled and unlabeled data by a proposed self-training scheme that fully exploits the underlying constraints of these binary segmentation tasks.
657	Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training	Hongkai Zhang; Hong Chang; Bingpeng Ma; Naiyan Wang; Xilin Chen;	In this work, we first point out the inconsistency problem between the fixed network settings and the dynamic training procedure, which greatly affects the performance.
658	Boosting Decision-based Black-box Adversarial Attacks with Random Sign Flip	Weilun Chen; Zhaoxiang Zhang; Xiaolin Hu; Baoyuan Wu;	In this paper, we show that just randomly flipping the signs of a small number of entries in adversarial perturbations can significantly boost the attack performance.
659	Knowledge Transfer via Dense Cross-Layer Mutual-Distillation	Anbang Yao; Dawei Sun;	In this paper, we propose Dense Cross-layer Mutual-distillation (DCM), an improved two-way KT method in which the teacher and student networks are trained collaboratively from scratch.
660	Matching Guided Distillation	Kaiyu Yue; Jiangfan Deng; Feng Zhou;	In this paper, we present Matching Guided Distillation (MGD) as an efficient and parameter-free way to solve these problems.
661	Clustering Driven Deep Autoencoder for Video Anomaly Detection	Yunpeng Chang; Zhigang Tu; Wei Xie; Junsong Yuan;	Since the abnormal events are usually different from normal events in appearance and/or in motion behavior, we address this issue by designing a novel convolution autoencoder architecture to separately capture spatial and temporal informative representation.
662	Learning to Compose Hypercolumns for Visual Correspondence	Juhong Min; Jongmin Lee; Jean Ponce; Minsu Cho;	In this work, we introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
663	Stochastic Bundle Adjustment for Efficient and Scalable 3D Reconstruction	Lei Zhou; Zixin Luo; Mingmin Zhen; Tianwei Shen; Shiwei Li; Zhuofei Huang; Tian Fang; Long Quan;	In this work, we propose a stochastic bundle adjustment algorithm which seeks to decompose the RCS approximately inside the LM iterations to improve the efficiency and scalability.
664	Object-based Illumination Estimation with Rendering-aware Neural Networks	Xin Wei; Guojun Chen; Yue Dong; Stephen Lin; Xin Tong;	We present a scheme for fast environment light estimation from the RGBD appearance of individual objects and their local image areas.
665	Progressive Point Cloud Deconvolution Generation Network	Le Hui; Rui Xu; Jin Xie; Jianjun Qian; Jian Yang;	In this paper, we propose an effective point cloud generation method, which can generate multi-resolution point clouds of the same shape from a latent vector.
666	SSCGAN: Facial Attribute Editing via Style Skip Connections	Wenqing Chu; Ying Tai; Chengjie Wang; Jilin Li; Feiyue Huang; Rongrong Ji;	In this work, we focus on solving this issue by editing the channel-wise global information denoted as the style feature.
667	Negative Pseudo Labeling using Class Proportion for Semantic Segmentation in Pathology	Hiroki Tokunaga; Brian Kenji Iwana; Yuki Teramoto; Akihiko Yoshizawa; Ryoma Bise;	In this paper, we propose a subtype segmentation method that uses such proportional labels as weakly supervised labels.
668	Learn to Propagate Reliably on Noisy Affinity Graphs	Lei Yang; Qingqiu Huang; Huaiyi Huang; Linning Xu; Dahua Lin;	To overcome these difficulties, we propose a new framework that allows labels to be propagated reliably on large-scale real-world data.
669	Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search	Xiangxiang Chu; Tianbao Zhou; Bo Zhang; Jixiang Li;	Thereby, we present a novel approach called Fair DARTS where the exclusive competition is relaxed to be collaborative.
670	TANet: Towards Fully Automatic Tooth Arrangement	Guodong Wei; Zhiming Cui; Yumeng Liu; Nenglun Chen; Runnan Chen; Guiqing Li; Wenping Wang;	In this work, we propose a learning-based method for fast and automatic tooth arrangement.
671	UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection	Bumsoo Kim; Taeho Choi; Jaewoo Kang; Hyunwoo J. Kim;	To tackle this problem, we propose UnionDet, a one-stage meta architecture for HOI detection powered by a novel union-level detector that eliminates this additional inference stage by directly capturing the region of interaction.
672	GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision	Lei Ke; Shichao Li; Yanan Sun; Yu-Wing Tai; Chi-Keung Tang;	We present a novel end-to-end framework named GSNet (Geometric and Scene-aware Network), which jointly estimates 6DoF poses and reconstructs detailed 3D car shapes from a single urban street view.
673	Resolution Switchable Networks for Runtime Efficient Image Recognition	Yikai Wang; Fuchun Sun; Duo Li; Anbang Yao;	We propose a general method to train a single convolutional neural network which is capable of switching image resolutions at inference.
674	SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation	Jianan Zhen; Qi Fang; Jiaming Sun; Wentao Liu; Wei Jiang; Hujun Bao; Xiaowei Zhou;	In this paper, we propose a novel system that first regresses a set of 2.5D representations of body parts and then reconstructs the 3D absolute poses based on these 2.5D representations with a depth-aware part association algorithm.
675	Learning to Detect Open Classes for Universal Domain Adaptation	Bo Fu; Zhangjie Cao; Mingsheng Long; Jianmin Wang;	Towards accurate open class detection, we propose Calibrated Multiple Uncertainties (CMU) with a novel transferability measure estimated by a mixture of uncertainty quantities in complementation: entropy, confidence and consistency, defined on conditional probabilities calibrated by a multi-classifier ensemble model.
676	Visual Compositional Learning for Human-Object Interaction Detection	Zhi Hou; Xiaojiang Peng; Yu Qiao; Dacheng Tao;	We devise a deep Visual Compositional Learning (VCL) framework, which is a simple yet efficient framework to effectively address this problem.
677	Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches	Shuai Yang; Zhangyang Wang; Jiaying Liu; Zongming Guo;	In this paper, we propose Deep Plastic Surgery, a novel, robust and controllable image editing framework that allows users to interactively edit images using hand-drawn sketch inputs.
678	Rethinking Class Activation Mapping for Weakly Supervised Object Localization	Wonho Bae; Junhyug Noh; Gunhee Kim;	We propose three simple but robust techniques that alleviate the problems, including thresholded average pooling, negative weight clamping, and percentile as a standard for thresholding.
679	OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features	Anton Osokin; Denis Sumin; Vasily Lomakin;	In this paper, we consider the task of one-shot object detection, which consists in detecting objects defined by a single demonstration.
680	Interpretable Neural Network Decoupling	Yuchao Li; Rongrong Ji; Shaohui Lin; Baochang Zhang; Chenqian Yan; Yongjian Wu; Feiyue Huang; Ling Shao;	In this paper, we propose a novel architecture decoupling method to interpret the network from a perspective of investigating its calculation paths.
681	Omni-sourced Webly-supervised Learning for Video Recognition	Haodong Duan; Yue Zhao; Yuanjun Xiong; Wentao Liu; Dahua Lin;	We introduce OmniSource, a novel framework for leveraging web data to train video recognition models.
682	CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending	Hang Xu; Shaoju Wang; Xinyue Cai; Wei Zhang; Xiaodan Liang; Zhenguo Li;	In this paper, we propose a novel lane-sensitive architecture search framework named CurveLane-NAS to automatically capture both long-ranged coherent and accurate short-range curve information while unifying both architecture search and post-processing on curve lane predictions via point blending.
683	Contextual-Relation Consistent Domain Adaptation for Semantic Segmentation	Jiaxing Huang; Shijian Lu; Dayan Guan; Xiaobing Zhang;	This paper presents an innovative local contextual-relation consistent domain adaptation (CrCDA) technique that aims to achieve local-level consistencies during the global-level alignment.
684	Estimating People Flows to Better Count Them in Crowded Scenes	Weizhe Liu; Mathieu Salzmann; Pascal Fua;	In this paper, we advocate estimating people flows across image locations between consecutive images and inferring the people densities from these flows instead of directly regressing.
685	Generate to Adapt: Resolution Adaption Network for Surveillance Face Recognition	Han Fang; Weihong Deng; Yaoyao Zhong; Jiani Hu;	To avoid this problem, we propose a novel resolution adaption network (RAN) which contains Multi-Resolution Generative Adversarial Networks (MR-GAN) followed by a feature adaption network.
686	Learning Feature Embeddings for Discriminant Model based Tracking	Linyu Zheng; Ming Tang; Yingying Chen; Jinqiao Wang; Hanqing Lu;	After observing that the features used in most online discriminatively trained trackers are not optimal, in this paper, we propose a novel and effective architecture to learn optimal feature embeddings for online discriminative tracking.
687	WeightNet: Revisiting the Design Space of Weight Networks	Ningning Ma; Xiangyu Zhang; Jiawei Huang; Jian Sun;	We present a conceptually simple, flexible and effective framework for weight generating networks.
688	Partially-Shared Variational Auto-encoders for Unsupervised Domain Adaptation with Target Shift	Ryuhei Takahashi; Atsushi Hashimoto; Motoharu Sonogashira; Masaaki Iiyama;	This paper discusses unsupervised domain adaptation (UDA) with target shift, i.e., UDA with the non-identical label distributions of the source and target domains.
689	Learning Where to Focus for Efficient Video Object Detection	Zhengkai Jiang; Yu Liu; Ceyuan Yang; Jihao Liu; Peng Gao; Qian Zhang; Shiming Xiang; Chunhong Pan;	Therefore, a novel module called Learnable Spatio-Temporal Sampling (LSTS) has been proposed to learn semantic-level correspondences among frame features accurately.
690	Learning Object Permanence from Video	Aviv Shamsian; Ofri Kleinfeld; Amir Globerson; Gal Chechik;	Here we introduce the setup of learning Object Permanence from labeled videos.
691	Adaptive Text Recognition through Visual Matching	Chuhan Zhang; Ankush Gupta; Andrew Zisserman;	We introduce a new model that exploits the repetitive nature of characters in languages, and decouples the visual decoding and linguistic modelling stages through intermediate representations in the form of similarity maps.
692	Actions as Moving Points	Yixuan Li; Zixu Wang; Limin Wang; Gangshan Wu;	In this paper, we present a conceptually simple, computationally efficient, and more precise action tubelet detection framework, termed MovingCenter Detector (MOC-detector), which treats an action instance as a trajectory of moving points.
693	Learning to Exploit Multiple Vision Modalities by Using Grafted Networks	Yuhuang Hu; Tobi Delbruck; Shih-Chii Liu;	This paper proposes a Network Grafting Algorithm (NGA), where a new front end network driven by unconventional visual inputs replaces the front end network of a pretrained deep network that processes intensity frames.
694	Geometric Correspondence Fields: Learned Differentiable Rendering for 3D Pose Refinement in the Wild	Alexander Grabner; Yaming Wang; Peizhao Zhang; Peihong Guo; Tong Xiao; Peter Vajda; Peter M. Roth; Vincent Lepetit;	We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild.
695	3D Fluid Flow Reconstruction Using Compact Light Field PIV	Zhong Li; Yu Ji; Jingyi Yu; Jinwei Ye;	In this paper, we present a PIV solution that uses a compact lenslet-based light field camera to track dense particles floating in the fluid and reconstruct the 3D fluid flow.
696	Contextual Diversity for Active Learning	Sharat Agarwal; Himanshu Arora; Saket Anand; Chetan Arora;	Since the context is difficult to evaluate in the absence of ground-truth labels, we introduce the notion of contextual diversity that captures the confusion associated with spatially co-occurring classes.
697	Temporal Aggregate Representations for Long-Range Video Understanding	Fadime Sener; Dipika Singhania; Angela Yao;	In this work, we address questions of temporal extent, scaling, and level of semantic abstraction with a flexible multi-granular temporal aggregation framework.
698	Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuous Sign Language Recognition	Zhe Niu; Brian Mak;	In this paper, we propose novel stochastic modeling of various components of a continuous sign language recognition (CSLR) system that is based on the transformer encoder and connectionist temporal classification (CTC).
699	General 3D Room Layout from a Single View by Render-and-Compare	Sinisa Stekovic; Shreyas Hampali; Mahdi Rad; Sayan Deb Sarkar; Friedrich Fraundorfer; Vincent Lepetit;	We present a novel method to reconstruct the 3D layout of a room (walls, floors, ceilings) from a single perspective view in challenging conditions, in contrast to previous single-view methods restricted to cuboid-shaped layouts.
700	Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints	Vikramjit Sidhu; Edgar Tretschk; Vladislav Golyanik; Antonio Agudo; Christian Theobalt;	We introduce the first dense neural non-rigid structure from motion (N-NRSfM) approach, which can be trained end-to-end in an unsupervised manner from 2D point tracks.
701	Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability	Anelise Newman; Camilo Fosco; Vincent Casser; Allen Lee; Barry McNamara; Aude Oliva;	We introduce Memento10k, a new, dynamic video memorability dataset containing human annotations at different viewing delays.
702	Yet Another Intermediate-Level Attack	Qizhang Li; Yiwen Guo; Hao Chen;	In this paper, we propose a novel method to enhance the black-box transferability of baseline adversarial examples.
703	Topology-Change-Aware Volumetric Fusion for Dynamic Scene Reconstruction	Chao Li; Xiaohu Guo;	In this paper, the classic framework is re-designed to enable 4D reconstruction of dynamic scenes under topology changes, by introducing a novel structure of Non-manifold Volumetric Grid to the re-design of both TSDF and EDG, which allows connectivity updates by cell splitting and replication.
704	Early Exit Or Not: Resource-Efficient Blind Quality Enhancement for Compressed Images	Qunliang Xing; Mai Xu; Tianyi Li; Zhenyu Guan;	In this paper, we propose a resource-efficient blind quality enhancement (RBQE) approach for compressed images.
705	PatchNets: Patch-Based Generalizable Deep Implicit 3D Shape Representations	Edgar Tretschk; Ayush Tewari; Vladislav Golyanik; Michael Zollhöfer; Carsten Stoll; Christian Theobalt;	In this paper, we present a mid-level patch-based surface representation.
706	How does Lipschitz Regularization Influence GAN Training?	Yipeng Qin; Niloy Mitra; Peter Wonka;	In this work, we uncover an even more important effect of Lipschitz regularization by examining its impact on the loss function: It degenerates GAN loss functions to almost linear ones by restricting their domain and interval of attainable gradient values.
707	Infrastructure-based Multi-Camera Calibration using Radial Projections	Yukai Lin; Viktor Larsson; Marcel Geppert; Zuzana Kukelova; Marc Pollefeys; Torsten Sattler;	In this paper, we propose to fully calibrate a multi-camera system from scratch using an infrastructure-based approach.
708	MotionSqueeze: Neural Motion Feature Learning for Video Understanding	Heeseung Kwon; Manjin Kim; Suha Kwak; Minsu Cho;	In this work, we replace external and heavy computation of optical flows with internal and light-weight learning of motion features.
709	Polarized Optical-Flow Gyroscope	Masada Tzabari; Yoav Y. Schechner;	We merge by generalization two principles of passive optical sensing of motion.
710	Online Meta-Learning for Multi-Source and Semi-Supervised Domain Adaptation	Da Li; Timothy Hospedales;	In this paper we take an orthogonal perspective and propose a framework to further enhance performance by meta-learning the initial conditions of existing DA algorithms.
711	An Ensemble of Epoch-wise Empirical Bayes for Few-shot Learning	Yaoyao Liu; Bernt Schiele; Qianru Sun;	In this paper, we propose to meta-learn the ensemble of epoch-wise empirical Bayes models (E3BM) to achieve robust predictions.
712	On the Effectiveness of Image Rotation for Open Set Domain Adaptation	Silvia Bucci; Mohammad Reza Loghmani; Tatiana Tommasi;	We propose a novel method to address both of these problems using the self-supervised task of rotation recognition.
713	Combining Task Predictors via Enhancing Joint Predictability	Kwang In Kim; Christian Richardt; Hyung Jin Chang;	We present a new predictor combination algorithm that improves the target by i) measuring the relevance of references based on their capabilities in predicting the target, and ii) strengthening such estimated relevance.
714	Multi-Scale Positive Sample Refinement for Few-Shot Object Detection	Jiaxi Wu; Songtao Liu; Di Huang; Yunhong Wang;	To this end, we propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD.
715	Single-Image Depth Prediction Makes Feature Matching Easier	Carl Toft; Daniyar Turmukhambetov; Torsten Sattler; Fredrik Kahl; Gabriel J. Brostow;	In this paper, we propose a surprisingly effective enhancement to local feature extraction, which improves matching.
716	Deep Reinforced Attention Learning for Quality-Aware Visual Recognition	Duo Li; Qifeng Chen;	In this paper, we build upon the weakly-supervised generation mechanism of intermediate attention maps in any convolutional neural networks and disclose the effectiveness of attention modules more straightforwardly to fully exploit their potential.
717	CFAD: Coarse-to-Fine Action Detector for Spatiotemporal Action Localization	Yuxi Li; Weiyao Lin; John See; Ning Xu; Shugong Xu; Ke Yan; Cong Yang;	In this paper, we propose Coarse-to-Fine Action Detector (CFAD), an original end-to-end trainable framework for efficient spatiotemporal action localization.
718	Learning Joint Spatial-Temporal Transformations for Video Inpainting	Yanhong Zeng; Jianlong Fu; Hongyang Chao;	In this paper, we propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting.
719	Single Path One-Shot Neural Architecture Search with Uniform Sampling	Zichao Guo; Xiangyu Zhang; Haoyuan Mu; Wen Heng; Zechun Liu; Yichen Wei; Jian Sun;	This work proposes a Single Path One-Shot model to address the challenge in the training.
720	Learning to Generate Novel Domains for Domain Generalization	Kaiyang Zhou; Yongxin Yang; Timothy Hospedales; Tao Xiang;	This paper focuses on domain generalization (DG), the task of learning from multiple source domains a model that generalizes well to unseen domains.
721	Continuous Adaptation for Interactive Object Segmentation by Learning from Corrections	Theodora Kontogianni; Michael Gygli; Jasper Uijlings; Vittorio Ferrari;	Instead, we recognize that user corrections can serve as sparse training examples and we propose a method that capitalizes on that idea to update the model parameters on-the-fly to the data at hand.
722	Impact of base dataset design on few-shot image classification	Othman Sbai; Camille Couprie; Mathieu Aubry;	In this paper, we systematically study the effect of variations in the training data by evaluating deep features trained on different image sets in a few-shot classification setting.
723	Invertible Zero-Shot Recognition Flows	Yuming Shen; Jie Qin; Lei Huang; Li Liu; Fan Zhu; Ling Shao;	To tackle the above limitations, for the first time, this work incorporates a new family of generative models (i.e., flow-based models) into ZSL.
724	GeoLayout: Geometry Driven Room Layout Estimation Based on Depth Maps of Planes	Weidong Zhang; Wei Zhang; Yinda Zhang;	In this work, we propose to incorporate geometric reasoning to deep learning for layout estimation. Moreover, we present a new dataset with pixel-level depth annotation of dominant planes.
725	Location Sensitive Image Retrieval and Tagging	Raul Gomez; Jaume Gibert; Lluis Gomez; Dimosthenis Karatzas;	In this work, we address the task of image retrieval related to a given tag conditioned on a certain location on Earth.
726	Joint 3D Layout and Depth Prediction from a Single Indoor Panorama Image	Wei Zeng; Sezer Karaoglu; Theo Gevers;	In this paper, we propose a method which jointly learns layout prediction and depth estimation from a single indoor panorama image.
727	Guessing State Tracking for Visual Dialogue	Wei Pang; Xiaojie Wang;	This paper proposes a guessing state for the Guesser, and regards guessing as a process in which the guessing state changes through a dialogue.
728	Memory-Efficient Incremental Learning Through Feature Adaptation	Ahmet Iscen; Jeffrey Zhang; Svetlana Lazebnik; Cordelia Schmid;	We introduce an approach for incremental learning that preserves feature descriptors of training images from previously learned classes, instead of the images themselves, unlike most existing work.
729	Neural Voice Puppetry: Audio-driven Facial Reenactment	Justus Thies; Mohamed Elgharib; Ayush Tewari; Christian Theobalt; Matthias Nießner;	We present Neural Voice Puppetry, a novel approach for audio-driven facial video synthesis.
730	One-Shot Unsupervised Cross-Domain Detection	Antonio D’Innocente; Francesco Cappio Borlino; Silvia Bucci; Barbara Caputo; Tatiana Tommasi;	This paper addresses this setting, presenting an object detection algorithm able to perform unsupervised adaption across domains by using only one target sample, seen at test time.
731	Stochastic Frequency Masking to Improve Super-Resolution and Denoising Networks	Majed El Helou; Ruofan Zhou; Sabine Süsstrunk;	We present an analysis, in the frequency domain, of degradation-kernel overfitting in super-resolution and introduce a conditional learning perspective that extends to both super-resolution and denoising.
732	Probabilistic Future Prediction for Video Scene Understanding	Anthony Hu; Fergal Cotter; Nikhil Mohan; Corina Gurau; Alex Kendall;	We present a novel deep learning architecture for probabilistic future prediction from video.
733	Suppressing Mislabeled Data via Grouping and Self-Attention	Xiaojiang Peng; Kai Wang; Zhaoyang Zeng; Qing Li; Jianfei Yang; Yu Qiao;	To suppress the impact of mislabeled data, this paper proposes a conceptually simple yet efficient training block, termed as Attentive Feature Mixup (AFM), which allows paying more attention to clean samples and less to mislabeled ones via sample interactions in small groups.
734	Class-wise Dynamic Graph Convolution for Semantic Segmentation	Hanzhe Hu; Deyi Ji; Weihao Gan; Shuai Bai; Wei Wu; Junjie Yan;	In order to avoid potential misleading contextual information aggregation in previous work, we propose a class-wise dynamic graph convolution (CDGC) module to adaptively propagate information.
735	Character-Preserving Coherent Story Visualization	Yun-Zhu Song; Zhi Rui Tam; Hung-Jen Chen; Huiao-Han Lu; Hong-Han Shuai;	Therefore, we propose a new framework named Character-Preserving Coherent Story Visualization (CP-CSV) to tackle the challenges.
736	GINet: Graph Interaction Network for Scene Parsing	Tianyi Wu; Yu Lu; Yu Zhu; Chuang Zhang; Ming Wu; Zhanyu Ma; Guodong Guo;	In this work, we explore how to incorporate the linguistic knowledge to promote context reasoning over image regions by proposing a Graph Interaction unit (GI unit) and a Semantic Context Loss (SC-loss).
737	Tensor Low-Rank Reconstruction for Semantic Segmentation	Wanli Chen; Xinge Zhu; Ruoqi Sun; Junjun He; Ruiyu Li; Xiaoyong Shen; Bei Yu;	In this paper, we propose a new approach to model the 3D context representations, which not only avoids the space compression, but also tackles the high-rank difficulty.
738	Attentive Normalization	Xilai Li; Wei Sun; Tianfu Wu;	In this paper, we propose a light-weight integration between the two schema.
739	Count- and Similarity-aware R-CNN for Pedestrian Detection	Jin Xie; Hisham Cholakkal; Rao Muhammad Anwer; Fahad Shahbaz Khan; Yanwei Pang; Ling Shao; Mubarak Shah;	We propose an approach that leverages pedestrian count and proposal similarity information within a two-stage pedestrian detection framework.
740	TRADI: Tracking Deep Neural network Weight Distributions	Gianni Franchi; Andrei Bursuc; Emanuel Aldea; Séverine Dubuisson; Isabelle Bloch;	In this work we propose to make use of this knowledge and leverage it for computing the distributions of the weights of the DNN.
741	Spatiotemporal Attacks for Embodied Agents	Aishan Liu; Tairan Huang; Xianglong Liu; Yitao Xu; Yuqing Ma; Xinyun Chen; Stephen J. Maybank; Dacheng Tao;	In this work, we take the first step to study adversarial attacks for embodied agents.
742	Caption-Supervised Face Recognition: Training a State-of-the-Art Face Model without Manual Annotation	Qingqiu Huang; Lei Yang; Huaiyi Huang; Tong Wu; Dahua Lin;	In this work, we propose a simple yet effective method, which trains a face recognition model by progressively expanding the labeled set via both selective propagation and caption-driven expansion.
743	Unselfie: Translating Selfies to Neutral-pose Portraits in the Wild	Liqian Ma; Zhe Lin; Connelly Barnes; Alexei A Efros; Jingwan Lu;	To address this issue, we introduce unselfie, a novel photographic transformation that automatically translates a selfie into a neutral-pose portrait.
744	Design and Interpretation of Universal Adversarial Patches in Face Detection	Xiao Yang; Fangyun Wei; Hongyang Zhang; Jun Zhu;	We propose new optimization-based approaches to automatic design of universal adversarial patches for varying goals of the attack, including scenarios in which true positives are suppressed without introducing false positives.
745	Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild	Yang Xiao; Renaud Marlet;	We propose a meta-learning framework that can be applied to both tasks, possibly including 3D data.
746	Weakly Supervised 3D Hand Pose Estimation via Biomechanical Constraints	Adrian Spurr; Umar Iqbal; Pavlo Molchanov; Otmar Hilliges; Jan Kautz;	Embracing this challenge we propose a set of novel losses that constrain the prediction of a neural network to lie within the range of biomechanically feasible 3D hand configurations.
747	Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification	Mang Ye; Jianbing Shen; David J. Crandall; Ling Shao; Jiebo Luo;	In this paper, we propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID.
748	Contextual Heterogeneous Graph Network for Human-Object Interaction Detection	Hai Wang; Wei-shi Zheng; Ling Yingbiao;	In this work, we address such a problem for HOI task by proposing a heterogeneous graph network that models humans and objects as different kinds of nodes and incorporates intra-class messages between homogeneous nodes and inter-class messages between heterogeneous nodes.
749	Zero-Shot Image Super-Resolution with Depth Guided Internal Degradation Learning	Xi Cheng; Zhenyong Fu; Jian Yang;	In this work, we present a simple yet effective zero-shot image super-resolution model.
750	A Closest Point Proposal for MCMC-based Probabilistic Surface Registration	Dennis Madsen; Andreas Morel-Forster; Patrick Kahr; Dana Rahbani; Thomas Vetter; Marcel Lüthi;	We propose to view non-rigid surface registration as a probabilistic inference problem.
751	Interactive Video Object Segmentation Using Global and Local Transfer Modules	Yuk Heo; Yeong Jun Koh; Chang-Su Kim;	An interactive video object segmentation algorithm, which takes scribble annotations on query objects as input, is proposed in this paper.
752	End-to-end Interpretable Learning of Non-blind Image Deblurring	Thomas Eboli; Jian Sun; Jean Ponce;	We propose to precondition the Richardson solver using approximate inverse filters of the (known) blur and natural image prior kernels.
753	Employing Multi-Estimations for Weakly-Supervised Semantic Segmentation	Junsong Fan; Zhaoxiang Zhang; Tieniu Tan;	Instead of struggling to refine a single seed, we propose a novel approach to alleviate the inaccurate seed problem by leveraging the segmentation model’s robustness to learn from multiple seeds.
754	Learning Noise-Aware Encoder-Decoder from Noisy Labels by Alternating Back-Propagation for Saliency Detection	Jing Zhang; Jianwen Xie; Nick Barnes;	In this paper, we propose a noise-aware encoder-decoder framework to disentangle a clean saliency predictor from noisy training examples, where the noisy labels are generated by unsupervised handcrafted feature-based methods.
755	Rethinking Image Deraining via Rain Streaks and Vapors	Yinglong Wang; Yibing Song; Chao Ma; Bing Zeng;	In this work, we reformulate rain streaks as transmission medium together with vapors to model rain imaging.
756	Finding Non-Uniform Quantization Schemes using Multi-Task Gaussian Processes	Marcelo Gennari do Nascimento; Theo W. Costain; Victor Adrian Prisacariu;	We propose a novel method for neural network quantization that casts the neural architecture search problem as one of hyperparameter search to find non-uniform bit distributions throughout the layers of a CNN.
757	Is Sharing of Egocentric Video Giving Away Your Biometric Signature?	Daksh Thapar; Chetan Arora; Aditya Nigam;	In this work, we create a novel kind of privacy attack by extracting the wearer’s gait profile, a well known biometric signature, from such optical flow in the egocentric videos.
758	Captioning Images Taken by People Who Are Blind	Danna Gurari; Yinan Zhao; Meng Zhang; Nilavra Bhattacharya;	Observing that people who are blind have relied on (human-based) image captioning services to learn about images they take for nearly a decade, we introduce the first image captioning dataset to represent this real use case.
759	Improving Semantic Segmentation via Decoupled Body and Edge Supervision	Xiangtai Li; Xia Li; Li Zhang; Guangliang Cheng; Jianping Shi; Zhouchen Lin; Shaohua Tan; Yunhai Tong;	In this paper, a new paradigm for semantic segmentation is proposed.
760	Conditional Entropy Coding for Efficient Video Compression	Jerry Liu; Shenlong Wang; Wei-Chiu Ma; Meet Shah; Rui Hu; Pranaab Dhawan; Raquel Urtasun;	We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames.
761	Differentiable Feature Aggregation Search for Knowledge Distillation	Yushuo Guan; Pengyu Zhao; Bingxuan Wang; Yuanxing Zhang; Cong Yao; Kaigui Bian; Jian Tang;	Specifically, we introduce DFA, a two-stage Differentiable Feature Aggregation search method that is motivated by DARTS in neural architecture search, to efficiently find the aggregations.
762	Attention Guided Anomaly Localization in Images	Shashanka Venkataramanan; Kuan-Chuan Peng; Rajat Vikram Singh; Abhijit Mahalanobis;	Without the need of anomalous training images, we propose Convolutional Adversarial Variational autoencoder with Guided Attention (CAVGA), which localizes the anomaly with a convolutional latent variable to preserve the spatial information.
763	Self-supervised Video Representation Learning by Pace Prediction	Jiangliu Wang; Jianbo Jiao; Yun-Hui Liu;	This paper addresses the problem of self-supervised video representation learning from a new perspective — by video pace prediction.
764	Full-Body Awareness from Partial Observations	Chris Rockwell; David F. Fouhey;	We study this problem and make a number of contributions to address it: (i) we propose a simple but highly effective self-training framework that adapts human 3D mesh recovery systems to consumer videos and demonstrate its application to two recent systems;
765	Reinforced Axial Refinement Network for Monocular 3D Object Detection	Lijie Liu; Chufan Wu; Jiwen Lu; Lingxi Xie; Jie Zhou; Qi Tian;	To improve the efficiency of sampling, we propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
766	Self-Supervised Multi-Task Procedure Learning from Instructional Videos	Ehsan Elhamifar; Dat Huynh;	We address the problem of unsupervised procedure learning from instructional videos of multiple tasks using Deep Neural Networks (DNNs).
767	CosyPose: Consistent multi-view multi-object 6D pose estimation	Yann Labbé; Justin Carpentier; Mathieu Aubry; Josef Sivic;	We introduce an approach for recovering the 6D pose of multiple known objects in a scene captured by a set of input images with unknown camera viewpoints.
768	In-Domain GAN Inversion for Real Image Editing	Jiapeng Zhu; Yujun Shen; Deli Zhao; Bolei Zhou;	To solve this problem, we propose an in-domain GAN inversion approach, which not only faithfully reconstructs the input image but also ensures the inverted code to be semantically meaningful for editing.
769	Key Frame Proposal Network for Efficient Pose Estimation in Videos	Yuexi Zhang; Yin Wang; Octavia Camps; Mario Sznaier;	In this paper, we propose a novel method combining local approaches with global context.
770	Exchangeable Deep Neural Networks for Set-to-Set Matching and Learning	Yuki Saito; Takuma Nakamura; Hirotaka Hachiya; Kenji Fukumizu;	In this study, we propose a novel deep learning architecture to address the abovementioned difficulties and also an efficient training framework for set-to-set matching.
771	Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs	Robin Rombach; Patrick Esser; Björn Ommer;	We present an approach based on INNs that (i) recovers the task-specific, learned invariances by disentangling the remaining factor of variation in the data and that (ii) invertibly transforms these recovered invariances combined with the model representation into an equally expressive one with accessible semantic concepts.
772	Cross-Modal Weighting Network for RGB-D Salient Object Detection	Gongyang Li; Zhi Liu; Linwei Ye; Yang Wang; Haibin Ling;	In this paper, we propose a novel Cross-Modal Weighting (CMW) strategy to encourage comprehensive interactions between RGB and depth channels for RGB-D SOD.
773	Open-set Adversarial Defense	Rui Shao; Pramuditha Perera; Pong C. Yuen; Vishal M. Patel;	In this paper, we show that open-set recognition systems are vulnerable to adversarial attacks.
774	Deep Image Compression using Decoder Side Information	Sharon Ayzik; Shai Avidan;	We present a Deep Image Compression neural network that relies on side information, which is only available to the decoder.
775	Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation	Jeevan Devaranjan; Amlan Kar; Sanja Fidler;	In this paper, we propose a generative model of synthetic scenes that reduces the distribution gap between the scene structure of generated scenes and a real target image dataset.
776	A Generic Visualization Approach for Convolutional Neural Networks	Ahmed Taha; Xitong Yang; Abhinav Shrivastava; Larry Davis;	We formulate attention visualization as a constrained optimization problem.
777	Interactive Annotation of 3D Object Geometry using 2D Scribbles	Tianchang Shen; Jun Gao; Amlan Kar; Sanja Fidler;	In this paper, we propose an interactive framework for annotating 3D object geometry from both point cloud data and RGB imagery.
778	Hierarchical Kinematic Human Mesh Recovery	Georgios Georgakis; Ren Li; Srikrishna Karanam; Terrence Chen; Jana Košecká; Ziyan Wu;	In this work, we address this gap by proposing a new technique for regression of human parametric model that is explicitly informed by the known hierarchical structure, including joint interdependencies of the model.
779	Multi-Loss Rebalancing Algorithm for Monocular Depth Estimation	Jae-Han Lee; Chang-Su Kim;	An algorithm to combine multiple loss terms adaptively for training a monocular depth estimator is proposed in this work.
780	3D Bird Reconstruction: a Dataset, Model, and Shape Recovery from a Single View	Marc Badger; Yufu Wang; Adarsh Modh; Ammon Perkes; Nikos Kolotouros; Bernd G. Pfrommer; Marc F. Schmidt; Kostas Daniilidis;	To address this problem, we first introduce a model and multi-view optimization approach, which we use to capture the unique shape and pose space displayed by live birds. We then introduce a pipeline and experiments for keypoint, mask, pose, and shape regression that recovers accurate avian postures from single views.
781	We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos	Alex Andonian; Camilo Fosco; Mathew Monfort; Allen Lee; Rogerio Feris; Carl Vondrick; Aude Oliva;	Here, we propose an approach for learning semantic relational set abstractions on videos, inspired by human learning.
782	Joint Optimization for Multi-Person Shape Models from Markerless 3D-Scans	Samuel Zeitvogel; Johannes Dornheim; Astrid Laubenheimer;	We propose a markerless end-to-end training framework for parametric 3D human shape models.
783	Accurate RGB-D Salient Object Detection via Collaborative Learning	Wei Ji; Jingjing Li; Miao Zhang; Yongri Piao; Huchuan Lu;	In this paper, we propose a novel collaborative learning framework where edge, depth and saliency are leveraged in a more efficient way, which solves those problems tactfully.
784	Finding Your (3D) Center: 3D Object Detection Using a Learned Loss	David Griffiths; Jan Boehm; Tobias Ritschel;	Addressing this disparity, we introduce a new optimization procedure, which allows training for 3D detection with raw 3D scans while using as little as 5% of the object labels and still achieving comparable performance.
785	Collaborative Training between Region Proposal Localization and Classification for Domain Adaptive Object Detection	Ganlong Zhao; Guanbin Li; Ruijia Xu; Liang Lin;	In this paper, we are the first to reveal that the region proposal network (RPN) and region proposal classifier (RPC) in the endemic two-stage detectors (e.g., Faster RCNN) demonstrate significantly different transferability when facing large domain gap.
786	Two Stream Active Query Suggestion for Active Learning in Connectomics	Zudi Lin; Donglai Wei; Won-Dong Jang; Siyan Zhou; Xupeng Chen; Xueying Wang; Richard Schalek; Daniel Berger; Brian Matejek; Lee Kamentsky; Adi Peleg; Daniel Haehn; Thouis Jones; Toufiq Parag; Jeff Lichtman; Hanspeter Pfister;	To tackle this, we propose a two-stream active query suggestion approach.
787	Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images	Jiahui Lei; Srinath Sridhar; Paul Guerrero; Minhyuk Sung; Niloy Mitra; Leonidas J. Guibas;	We investigate the problem of learning to generate 3D parametric surface representations for novel object instances, as seen from one or more views.
788	6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference	Mai Bui; Tolga Birdal; Haowen Deng; Shadi Albarqouni; Leonidas Guibas; Slobodan Ilic; Nassir Navab;	We present a multimodal camera relocalization framework that captures ambiguities and uncertainties with continuous mixture models defined
789	Modeling Artistic Workflows for Image Generation and Editing	Hung-Yu Tseng; Matthew Fisher; Jingwan Lu; Yijun Li; Vladimir Kim; Ming-Hsuan Yang;	Motivated by the above observations, we propose a generative model that follows a given artistic workflow, enabling both multi-stage image generation as well as multi-stage image editing of an existing piece of art.
790	A Large-scale Annotated Mechanical Components Benchmark for Classification and Retrieval Tasks with Deep Neural Networks	Sangpil Kim; Hyung-gun Chi; Xiao Hu; Qixing Huang; Karthik Ramani;	We introduce a large-scale annotated mechanical components benchmark for classification and retrieval tasks named Mechanical Components Benchmark (MCB): a large-scale dataset of 3D objects of mechanical components.
791	Hidden Footprints: Learning Contextual Walkability from 3D Human Trails	Jin Sun; Hadar Averbuch-Elor; Qianqian Wang; Noah Snavely;	We tackle this problem by leveraging information from existing datasets, without any additional labeling.
792	Self-Supervised Learning of Audio-Visual Objects from Video	Triantafyllos Afouras; Andrew Owens; Joon Son Chung; Andrew Zisserman;	Our objective is to transform a video into a set of discrete audio-visual objects using self-supervised learning.
793	GAN-based Garment Generation Using Sewing Pattern Images	Yu Shen; Junbang Liang; Ming C. Lin;	We propose a unified method using the generative network.
794	Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach	Chaitanya Ahuja; Dong Won Lee; Yukiko I. Nakano; Louis-Philippe Morency;	In this paper, we propose a new model, named Mix-StAGE, which trains a single model for multiple speakers while learning unique style embeddings for each speaker’s gestures in an end-to-end manner.
795	An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds	Rui Huang; Wanyue Zhang; Abhijit Kundu; Caroline Pantofaru; David A Ross; Thomas Funkhouser; Alireza Fathi;	To address this problem, in this paper we propose a sparse LSTM-based multi-frame 3D object detection algorithm.
796	Monotonicity Prior for Cloud Tomography	Tamar Loeub; Aviad Levis; Vadim Holodovsky; Yoav Y. Schechner;	We introduce a differentiable monotonicity prior, useful to express signals of monotonic tendency.
797	Learning Trailer Moments in Full-Length Movies with Co-Contrastive Attention	Lezi Wang; Dong Liu; Rohit Puri; Dimitris N. Metaxas;	We introduce a novel ranking network that utilizes the Co-Attention between movies and trailers as guidance to generate the training pairs, where the moments highly correlated with trailers are expected to be scored higher than the uncorrelated moments.
798	Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval	Christopher Thomas; Adriana Kovashka;	We propose novel within-modality losses which encourage semantic coherency in both the text and image subspaces, which does not necessarily align with visual coherency.
799	Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline	Vishvak Murahari; Dhruv Batra; Devi Parikh; Abhishek Das;	Instead, we present an approach to leverage pretraining on related vision-language datasets before transferring to visual dialog.
800	Learning to Generate Grounded Visual Captions without Localization Supervision	Chih-Yao Ma; Yannis Kalantidis; Ghassan AlRegib; Peter Vajda; Marcus Rohrbach; Zsolt Kira;	In this work, we help the model to achieve this via a novel cyclical training regimen that forces the model to localize each word in the image after the sentence decoder generates it, and then reconstruct the sentence from the localized image region(s) to match the ground-truth.
801	Neural Hair Rendering	Menglei Chai; Jian Ren; Sergey Tulyakov;	In this paper, we propose a generic neural-based hair rendering pipeline that can synthesize photo-realistic images from virtual 3D hair models.
802	JNR: Joint-based Neural Rig Representation for Compact 3D Face Modeling	Noranart Vesdapunt; Mitch Rundle; HsiangTao Wu; Baoyuan Wang;	In this paper, we introduce a novel approach to learn a 3D face model using a joint-based face rig and a neural skinning network.
803	On Disentangling Spoof Trace for Generic Face Anti-Spoofing	Yaojie Liu; Joel Stehouwer; Xiaoming Liu;	This work designs a novel adversarial learning framework to disentangle the spoof traces from input faces as a hierarchical combination of patterns at multiple scales.
804	Streaming Object Detection for 3-D Point Clouds	Wei Han; Zhengdong Zhang; Benjamin Caine; Brandon Yang; Christoph Sprunk; Ouais Alsharif; Jiquan Ngiam; Vijay Vasudevan; Jonathon Shlens; Zhifeng Chen;	In this work, we explore how to build an object detector that removes this artificial latency constraint, and instead operates on native streaming data in order to significantly reduce latency.
805	NAS-DIP: Learning Deep Image Prior with Neural Architecture Search	Yun-Chun Chen; Chen Gao; Esther Robb; Jia-Bin Huang;	Building upon a generic U-Net architecture, our core contribution lies in designing new search spaces for (1) an upsampling cell and (2) a pattern of cross-scale residual connections.
806	Learning to Learn in a Semi-Supervised Fashion	Yun-Chun Chen; Chao-Te Chou; Yu-Chiang Frank Wang;	To address semi-supervised learning from both labeled and unlabeled data, we present a novel meta-learning scheme.
807	FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning	Chia-Wen Kuo; Chih-Yao Ma; Jia-Bin Huang; Zsolt Kira;	In this paper, we propose a novel learned feature-based refinement and augmentation method that produces a varied set of complex transformations.
808	RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects	Bin Yang; Runsheng Guo; Ming Liang; Sergio Casas; Raquel Urtasun;	To better address this, we propose a new solution that exploits both LiDAR and Radar sensors for perception.
809	Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation	Medhini Narasimhan; Erik Wijmans; Xinlei Chen; Trevor Darrell; Dhruv Batra; Devi Parikh; Amanpreet Singh;	We introduce a learning-based approach for room navigation using semantic maps.
810	Learning to Separate: Detecting Heavily-Occluded Objects in Urban Scenes	Chenhongyi Yang; Vitaly Ablavsky; Kaihong Wang; Qi Feng; Margrit Betke;	In this work, we propose a novel Non-Maximum-Suppression (NMS) algorithm that dramatically improves the detection recall while maintaining high precision in scenes with heavy occlusions.
811	Towards causal benchmarking of bias in face analysis algorithms	Guha Balakrishnan; Yuanjun Xiong; Wei Xia; Pietro Perona;	To address this problem we develop an experimental method for measuring algorithmic bias of face analysis algorithms, which directly manipulates the attributes of interest, e.g., gender and skin tone, in order to reveal causal links between attribute variation and performance change.
812	Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation	Tong He; Dong Gong; Zhi Tian; Chunhua Shen;	To tackle the above issue, we propose a memory-augmented network that learns and memorizes the representative prototypes that encode both geometry and semantic information.
813	Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions	Noa Garcia; Yuta Nakashima;	Inspired by this behaviour, we design ROLL, a model for knowledge-based video story question answering that leverages three crucial aspects of movie understanding: dialog comprehension, scene reasoning, and storyline recalling.
814	Transformation Consistency Regularization – A Semi-Supervised Paradigm for Image-to-Image Translation	Aamir Mustafa; Rafal K. Mantiuk;	We propose Transformation Consistency Regularization, which delves into a more challenging setting of image-to-image translation, which remains unexplored by semi-supervised algorithms.
815	LIRA: Lifelong Image Restoration from Unknown Blended Distortions	Jianzhao Liu; Jianxin Lin; Xin Li; Wei Zhou; Sen Liu; Zhibo Chen;	When the input is degraded by a new distortion, inspired by adult neurogenesis in human memory system, we develop a neural growing strategy where the previously trained model can incorporate a new expert branch and continually accumulate new knowledge without interfering with learned knowledge.
816	HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization	Jiahao Lin; Gim Hee Lee;	In this paper, we propose the Human Depth Estimation Network (HDNet), an end-to-end framework for absolute root joint localization in the camera coordinate space.
817	SOLO: Segmenting Objects by Locations	Xinlong Wang; Tao Kong; Chunhua Shen; Yuning Jiang; Lei Li;	We present a new, embarrassingly simple approach to instance segmentation in images.
818	Learning to See in the Dark with Events	Song Zhang; Yu Zhang; Zhe Jiang; Dongqing Zou; Jimmy Ren; Bin Zhou;	In this paper, we propose learning to see in the dark by translating the HDR events in low light to canonical sharp images as if captured in day light.
819	Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data	Tim Salzmann; Boris Ivanovic; Punarjay Chakravarty; Marco Pavone;	Towards this end, we present Trajectron++, a modular, graph-structured recurrent model that forecasts the trajectories of a general number of diverse agents while incorporating agent dynamics and heterogeneous data (e.g., semantic maps).
820	Context-Gated Convolution	Xudong Lin; Lin Ma; Wei Liu; Shih-Fu Chang;	Motivated by this, we propose one novel Context-Gated Convolution (CGC) to explicitly modify the weights of convolutional layers adaptively under the guidance of global context.
821	Polynomial Regression Network for Variable-Number Lane Detection	Bingke Wang; Zilei Wang; Yixin Zhang;	In this work, we propose to use polynomial curves to represent traffic lanes and then propose a novel polynomial regression network (PRNet) to directly predict them, where semantic segmentation is not involved.
822	Structural Deep Metric Learning for Room Layout Estimation	Wenzhao Zheng; Jiwen Lu; Jie Zhou;	In this paper, we propose a structural deep metric learning (SDML) method for room layout estimation, which aims to recover the 3D spatial layout of a cluttered indoor scene from a monocular RGB image.
823	Adaptive Task Sampling for Meta-Learning	Chenghao Liu; Zhihao Wang; Doyen Sahoo; Yuan Fang; Kun Zhang; Steven C.H. Hoi;	In this paper, we propose an adaptive task sampling method to improve the generalization performance.
824	Deep Complementary Joint Model for Complex Scene Registration and Few-shot Segmentation on Medical Images	Yuting He; Tiantian Li; Guanyu Yang; Youyong Kong; Yang Chen; Huazhong Shu; Jean-Louis Coatrieux; Jean-Louis Dillenseger; Shuo Li;	We propose a novel Deep Complementary Joint Model (DeepRS) for complex scene registration and few-shot segmentation.
825	Improving Multispectral Pedestrian Detection by Addressing Modality Imbalance Problems	Kailai Zhou; Linsen Chen; Xun Cao;	Inspired by this observation, we propose Modality Balance Network (MBNet) which facilitates the optimization process in a much more flexible and balanced manner.
826	High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling	Yu Zeng; Zhe Lin; Jimei Yang; Jianming Zhang; Eli Shechtman; Huchuan Lu;	To address this challenge, we propose an iterative inpainting method with a feedback mechanism.
827	Online Ensemble Model Compression using Knowledge Distillation	Devesh Walawalkar; Zhiqiang Shen; Marios Savvides;	This paper presents a novel knowledge distillation based model compression framework consisting of a student ensemble.
828	Deep Learning-based Pupil Center Detection for Fast and Accurate Eye Tracking System	Kang Il Lee; Jung Ho Jeon; Byung Cheol Song;	Thus, we propose more accurate pupil center detection by improving the representation quality of the network in charge of pupil center detection.
829	Efficient Residue Number System Based Winograd Convolution	Zhi-Gang Liu; Matthew Mattina;	Our work extends the Winograd algorithm to Residue Number System (RNS).
830	Robust Tracking against Adversarial Attacks	Shuai Jia; Chao Ma; Yibing Song; Xiaokang Yang;	We apply the proposed adversarial attack and defense approaches to state-of-the-art deep tracking algorithms.
831	Single-Shot Neural Relighting and SVBRDF Estimation	Shen Sang; Manmohan Chandraker;	We present a novel physically-motivated deep network for joint shape and material estimation, as well as relighting under novel illumination conditions, using a single image captured by a mobile phone camera.
832	Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement	Qiang Nie; Ziwei Liu; Yunhui Liu;	In this work, we propose a novel Siamese denoising autoencoder to learn a 3D pose representation by disentangling the pose-dependent and view-dependent feature from the human skeleton data, in a fully unsupervised manner.
833	Angle-based Search Space Shrinking for Neural Architecture Search	Yiming Hu; Yuding Liang; Zichao Guo; Ruosi Wan; Xiangyu Zhang; Yichen Wei; Qingyi Gu; Jian Sun;	In this work, we present a simple and general search space shrinking method, called Angle-Based search space Shrinking (ABS), for Neural Architecture Search (NAS).
834	RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition	Xiaoyu Yue; Zhanghui Kuang; Chenhao Lin; Hongbin Sun; Wayne Zhang;	To suppress such side-effect, we propose a novel position enhancement branch, and dynamically fuse its outputs with those of the decoder attention module for scene text recognition.
835	Towards Fast, Accurate and Stable 3D Dense Face Alignment	Jianzhu Guo; Xiangyu Zhu; Yang Yang; Fan Yang; Zhen Lei; Stan Z. Li;	In this paper, we propose a novel regression framework which makes a balance among speed, accuracy and stability.
836	Iterative Feature Transformation for Fast and Versatile Universal Style Transfer	Tai-Yin Chiu; Danna Gurari;	We propose a new transformation that iteratively stylizes features with analytical gradient descent.
837	CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search	Xin Chen; Yawen Duan; Zewei Chen; Hang Xu; Zihao Chen; Xiaodan Liang; Tong Zhang; Zhenguo Li;	This is the first work to our knowledge that proposes an efficient transferrable NAS solution while maintaining robustness across various settings.
838	Toward Faster and Simpler Matrix Normalization via Rank-1 Update	Tan Yu; Yunfeng Cai; Ping Li;	To overcome these limitations, we propose a rank-1 update normalization (RUN), which only needs matrix-vector multiplications and thus is significantly more efficient than NS iteration using matrix-matrix multiplications.
839	Accurate Polarimetric BRDF for Real Polarization Scene Rendering	Yuhi Kondo; Taishi Ono; Legong Sun; Yasutaka Hirasawa; Jun Murayama;	In this paper, we propose a new polarimetric BRDF (pBRDF) model.
840	Lensless Imaging with Focusing Sparse URA Masks in Long-Wave Infrared and its Application for Human Detection	Ilya Reshetouski; Hideki Oyaizu; Kenichiro Nakamura; Ryuta Satoh; Suguru Ushiki; Ryuichi Tadano; Atsushi Ito; Jun Murayama;	We introduce a lensless imaging framework for contemporary computer vision applications in long-wavelength infrared (LWIR).
841	Topology-Preserving Class-Incremental Learning	Xiaoyu Tao; Xinyuan Chang; Xiaopeng Hong; Xing Wei; Yihong Gong;	On this basis, we propose a novel topology-preserving class-incremental learning (TPCIL) framework.
842	Inter-Image Communication for Weakly Supervised Localization	Xiaolin Zhang; Yunchao Wei; Yi Yang;	In this paper, we propose to leverage pixel-level similarities across different objects for learning more accurate object locations in a complementary way.
843	UFO²: A Unified Framework towards Omni-supervised Object Detection	Zhongzheng Ren; Zhiding Yu; Xiaodong Yang; Ming-Yu Liu; Alexander G. Schwing; Jan Kautz;	In this paper, we present UFO², a unified object detection framework that can handle different forms of supervision simultaneously.
844	iCaps: An Interpretable Classifier via Disentangled Capsule Networks	Dahuin Jung; Jonghyun Lee; Jihun Yi; Sungroh Yoon;	In this work, we address these two limitations using a novel class-supervised disentanglement algorithm and an additional regularizer, respectively.
845	Detecting Natural Disasters, Damage, and Incidents in the Wild	Ethan Weber; Nuria Marzo; Dim P. Papadopoulos; Aritro Biswas; Agata Lapedriza; Ferda Ofli; Muhammad Imran; Antonio Torralba;	In this work, we present the Incidents Dataset, which contains 446,684 images annotated by humans that cover 43 incidents across a variety of scenes.
846	Dynamic ReLU	Yinpeng Chen; Xiyang Dai; Mengchen Liu; Dongdong Chen; Lu Yuan; Zicheng Liu;	In this paper, we propose dynamic ReLU (DY-ReLU), a dynamic rectifier of which parameters are generated by a hyper function over all input elements.
847	Acquiring Dynamic Light Fields through Coded Aperture Camera	Kohei Sakai; Keita Takahashi; Toshiaki Fujii; Hajime Nagahara;	We investigate the problem of compressive acquisition of a dynamic light field.
848	Gait Recognition from a Single Image using a Phase-Aware Gait Cycle Reconstruction Network	Chi Xu; Yasushi Makihara; Xiang Li; Yasushi Yagi; Jianfeng Lu;	We propose a method of gait recognition just from a single image for the first time, which enables latency-free gait recognition.
849	Informative Sample Mining Network for Multi-Domain Image-to-Image Translation	Jie Cao; Huaibo Huang; Yi Li; Ran He; Zhenan Sun;	In this paper, we reveal that improving the sample selection strategy is an effective solution.
850	Spherical Feature Transform for Deep Metric Learning	Yuke Zhu; Yan Bai; Yichen Wei;	This work proposes a novel spherical feature transform approach.
851	Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering	Ruixue Tang; Chao Ma; Wei Emma Zhang; Qi Wu; Xiaokang Yang;	In this paper, instead of directly manipulating images and questions, we use generated adversarial examples for both images and questions as the augmented data.
852	Unsupervised Multi-View CNN for Salient View Selection of 3D Objects and Scenes	Ran Song; Wei Zhang; Yitian Zhao; Yonghuai Liu;	We present an unsupervised 3D deep learning framework based on a ubiquitously true proposition named by us view-object consistency as it states that a 3D object and its projected 2D views always belong to the same object class.
853	Representation Sharing for Fast Object Detector Search and Beyond	Yujie Zhong; Zelu Deng; Sheng Guo; Matthew R. Scott; Weilin Huang;	To enhance such capability, we propose an extremely efficient neural architecture search method, named Fast And Diverse (FAD), to better explore the optimal configuration of receptive fields and convolution types in the sub-networks for one-stage detectors.
854	Peeking into occluded joints: A novel framework for crowd pose estimation	Lingteng Qiu; Xuanye Zhang; Yanran Li; Guanbin Li; Xiaojun Wu; Zixiang Xiong; Xiaoguang Han; Shuguang Cui;	Therefore, we thoroughly pursue this problem and propose a novel OPEC-Net framework together with a new Occluded Pose (OCPose) dataset with 9k annotated images.
855	RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition	Linxi Fan; Shyamal Buch; Guanzhi Wang; Ryan Cao; Yuke Zhu; Juan Carlos Niebles; Li Fei-Fei;	To this end, we introduce RubiksNet, a new efficient architecture for video action recognition which is based on a proposed learnable 3D spatiotemporal shift operation instead.
856	Deep Hashing with Active Pairwise Supervision	Ziwei Wang; Quan Zheng; Jiwen Lu; Jie Zhou;	In this paper, we propose a Deep Hashing method with Active Pairwise Supervision (DH-APS).
857	Graph Edit Distance Reward: Learning to Edit Scene Graph	Lichang Chen; Guosheng Lin; Shijie Wang; Qingyao Wu;	In this paper, we propose a new method to edit the scene graph according to the user instructions, which has never been explored.
858	Malleable 2.5D Convolution: Learning Receptive Fields along the Depth-axis for RGB-D Scene Parsing	Yajie Xing; Jingbo Wang; Gang Zeng;	In this paper, we propose a novel operator called malleable 2.5D convolution to learn the receptive field along the depth-axis.
859	Feature-metric Loss for Self-supervised Learning of Depth and Egomotion	Chang Shu; Kun Yu; Zhixiang Duan; Kuiyuan Yang;	In this work, feature-metric loss is proposed and defined on feature representation, where the feature representation is also learned in a self-supervised manner and regularized by both first-order and second-order derivatives to constrain the loss landscapes to form proper convergence basins.
860	Propagating Over Phrase Relations for One-Stage Visual Grounding	Sibei Yang; Guanbin Li; Yizhou Yu;	In this paper, we propose a linguistic structure guided propagation network for one-stage phrase grounding.
861	Adversarial Semantic Data Augmentation for Human Pose Estimation	Yanrui Bin; Xuan Cao; Xinya Chen; Yanhao Ge; Ying Tai; Chengjie Wang; Jilin Li; Feiyue Huang; Changxin Gao; Nong Sang;	We instead propose Semantic Data Augmentation (SDA), a method that augments images by pasting segmented body parts with various semantic granularity.
862	Free View Synthesis	Gernot Riegler; Vladlen Koltun;	We present a method for novel view synthesis from input images that are freely distributed around a scene.
863	Face Anti-Spoofing via Disentangled Representation Learning	Ke-Yue Zhang; Taiping Yao; Jian Zhang; Ying Tai; Shouhong Ding; Jilin Li; Feiyue Huang; Haichuan Song; Lizhuang Ma;	In this paper, motivated by disentangled representation learning, we propose a novel perspective of face anti-spoofing that disentangles the liveness features and content features from images, and the liveness features are further used for classification.
864	Prime-Aware Adaptive Distillation	Youcai Zhang; Zhonghao Lan; Yuchen Dai; Fangao Zeng; Yan Bai; Jie Chang; Yichen Wei;	This paper introduces the adaptive sample weighting to KD.
865	Meta-Learning with Network Pruning	Hongduan Tian; Bo Liu; Xiao-Tong Yuan; Qingshan Liu;	To remedy this deficiency, we propose a network pruning based meta-learning approach for overfitting reduction via explicitly controlling the capacity of network.
866	Spiral Generative Network for Image Extrapolation	Dongsheng Guo; Hongzhi Liu; Haoru Zhao; Yunhao Cheng; Qingwei Song; Zhaorui Gu; Haiyong Zheng; Bing Zheng;	In this paper, motivated by human natural ability to perceive unseen surroundings imaginatively, we propose a novel Spiral Generative Network, SpiralNet, to perform image extrapolation in a spiral manner, which regards extrapolation as an evolution process growing from an input sub-image along a spiral curve to an expanded full image.
867	SceneSketcher: Fine-Grained Image Retrieval with Scene Sketches	Fang Liu; Changqing Zou; Xiaoming Deng; Ran Zuo; Yu-Kun Lai; Cuixia Ma; Yong-Jin Liu; Hongan Wang;	In this paper, for the first time, we study the fine-grained scene-level SBIR problem which aims at retrieving scene images satisfying the user’s specific requirements via a freehand scene sketch.
868	Few-shot Compositional Font Generation with Dual Memory	Junbum Cha; Sanghyuk Chun; Gayoung Lee; Bado Lee; Seonghyeon Kim; Hwalsuk Lee;	In this paper, we focus on compositional scripts, a widely used letter system in the world, where each glyph can be decomposed by several components.
869	PUGeo-Net: A Geometry-centric Network for 3D Point Cloud Upsampling	Yue Qian; Junhui Hou; Sam Kwong; Ying He;	In this paper, we propose a novel deep neural network based method, called PUGeo-Net, for upsampling 3D point clouds.
870	Handcrafted Outlier Detection Revisited	Luca Cavalli; Viktor Larsson; Martin Ralf Oswald; Torsten Sattler; Marc Pollefeys;	Based on best practices, we propose a hierarchical pipeline for effective outlier detection as well as integrate novel ideas which in sum lead to an efficient and competitive approach to outlier rejection.
871	The Average Mixing Kernel Signature	Luca Cosmo; Giorgia Minello; Michael Bronstein; Luca Rossi; Andrea Torsello;	We introduce the Average Mixing Kernel Signature (AMKS), a novel signature for points on non-rigid three-dimensional shapes based on the average mixing kernel and continuous-time quantum walks.
872	BCNet: Learning Body and Cloth Shape from A Single Image	Boyi Jiang; Juyong Zhang; Yang Hong; Jinhao Luo; Ligang Liu; Hujun Bao;	In this paper, we consider the problem to automatically reconstruct garment and body shapes from a single near-front view RGB image. To train our model, we construct two large scale datasets with ground truth body and garment geometries as well as paired color images.
873	Self-supervised Keypoint Correspondences for Multi-Person Pose Estimation and Tracking in Videos	Umer Rafi; Andreas Doering; Bastian Leibe; Juergen Gall;	To address this issue, we propose an approach that relies on key point correspondences for associating persons in videos.
874	Interactive Multi-Dimension Modulation with Dynamic Controllable Residual Learning for Image Restoration	Jingwen He; Chao Dong; Yu Qiao;	To make a step forward, this paper presents a new problem setup, called multi-dimension (MD) modulation, which aims at modulating output effects across multiple degradation types and levels.
875	Polysemy Deciphering Network for Human-Object Interaction Detection	Xubin Zhong; Changxing Ding; Xian Qu; Dacheng Tao;	To address this issue, in this paper, we propose a novel Polysemy Deciphering Network (PD-Net), which decodes the visual polysemy of verbs for HOI detection in three ways.
876	PODNet: Pooled Outputs Distillation for Small-Tasks Incremental Learning	Arthur Douillard; Matthieu Cord; Charles Ollion; Thomas Robert; Eduardo Valle;	In this work, we propose PODNet, a model inspired by representation learning.
877	Learning Graph-Convolutional Representations for Point Cloud Denoising	Francesca Pistilli; Giulia Fracastoro; Diego Valsesia; Enrico Magli;	We propose a deep neural network based on graph-convolutional layers that can elegantly deal with the permutation-invariance problem encountered by learning-based point cloud processing methods.
878	Semantic Line Detection Using Mirror Attention and Comparative Ranking and Matching	Dongkwon Jin; Jun-Tae Lee; Chang-Su Kim;	A novel algorithm to detect semantic lines is proposed in this paper.
879	A Differentiable Recurrent Surface for Asynchronous Event-Based Data	Marco Cannici; Marco Ciccone; Andrea Romanoni; Matteo Matteucci;	In this paper, we propose Matrix-LSTM, a grid of Long Short-Term Memory (LSTM) cells that efficiently process events and learn end-to-end task-dependent event-surfaces.
880	Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches	Ruoyi Du; Dongliang Chang; Ayan Kumar Bhunia; Jiyang Xie; Zhanyu Ma; Yi-Zhe Song; Jun Guo;	In this work, we propose a novel framework for fine-grained visual classification to tackle these problems.
881	LiteFlowNet3: Resolving Correspondence Ambiguity for More Accurate Optical Flow Estimation	Tak-Wai Hui; Chen Change Loy;	In this paper, we introduce LiteFlowNet3, a deep network consisting of two specialized modules, to address the above challenges.
882	Microscopy Image Restoration with Deep Wiener-Kolmogorov Filters	Valeriya Pronina; Filippos Kokkinos; Dmitry V. Dylov; Stamatios Lefkimmiatis;	In this work, we propose a unifying framework of algorithms for Gaussian image deblurring and denoising.
883	ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language	Dave Zhenyu Chen; Angel X. Chang; Matthias Nießner;	In order to train and benchmark our method, we introduce a new ScanRefer dataset, containing 46,173 descriptions of 9,943 objects from 703 ScanNet scenes.
884	JSENet: Joint Semantic Segmentation and Edge Detection Network for 3D Point Clouds	Zeyu Hu; Mingmin Zhen; Xuyang Bai; Hongbo Fu; Chiew-lan Tai;	In this paper, we tackle the 3D semantic edge detection task for the first time and present a new two-stream fully-convolutional network that jointly performs the two tasks.
885	Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior	Hu Zhang; Linchao Zhu; Yi Zhu; Yi Yang;	In this paper, we aim to attack video models by utilizing intrinsic movement pattern and regional relative motion among video frames.
886	An Inference Algorithm for Multi-Label MRF-MAP Problems with Clique Size 100	Ishant Shanu; Siddhant Bharti; Chetan Arora; S. N. Maheshwari;	In this paper, we propose an algorithm for optimal solutions to submodular higher-order multi-label MRF-MAP energy functions which can handle practical computer vision problems with up to 16 labels and cliques of size 100.
887	Dual Refinement Underwater Object Detection Network	Baojie Fan; Wei Chen; Yang Cong; Jiandong Tian;	To address these problems, we propose an underwater detection framework with feature enhancement and anchor refinement.
888	Multiple Sound Sources Localization from Coarse to Fine	Rui Qian; Di Hu; Heinrich Dinkel; Mengyue Wu; Ning Xu; Weiyao Lin;	To solve this problem, we develop a two-stage audiovisual learning framework that disentangles audio and visual representations of different categories from complex scenes, then performs cross-modal feature alignment in a coarse-to-fine manner.
889	Task-Aware Quantization Network for JPEG Image Compression	Jinyoung Choi; Bohyung Han;	We propose to learn a deep neural network for JPEG image compression, which predicts image-specific optimized quantization tables fully compatible with the standard JPEG encoder and decoder.
890	Energy-Based Models for Deep Probabilistic Regression	Fredrik K. Gustafsson; Martin Danelljan; Goutam Bhat; Thomas B. Schön;	We address these issues by proposing a general and conceptually simple regression method with a clear probabilistic interpretation.
891	CLOTH3D: Clothed 3D Humans	Hugo Bertiche; Meysam Madadi; Sergio Escalera;	We present CLOTH3D, the first big scale synthetic dataset of 3D clothed human sequences.
892	Encoding Structure-Texture Relation with P-Net for Anomaly Detection in Retinal Images	Kang Zhou; Yuting Xiao; Jianlong Yang; Jun Cheng; Wen Liu; Weixin Luo; Zaiwang Gu; Jiang Liu; Shenghua Gao;	Motivated by this, we propose to leverage the relation between the image texture and structure to design a deep neural network for anomaly detection.
893	CLNet: A Compact Latent Network for Fast Adjusting Siamese Trackers	Xingping Dong; Jianbing Shen; Ling Shao; Fatih Porikli;	In this paper, we provide a deep analysis for Siamese-based trackers and find that one core reason for their failure on challenging cases can be attributed to the problem of decisive samples missing during offline training.
894	Occlusion-Aware Siamese Network for Human Pose Estimation	Lu Zhou; Yingying Chen; Yunze Gao; Jinqiao Wang; Hanqing Lu;	To conquer this dilemma, we propose an occlusion-aware siamese network to improve the performance.
895	Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model	Yufan Liu; Minglang Qiao; Mai Xu; Bing Li; Weiming Hu; Ali Borji;	In this paper, we thoroughly investigate such influences by establishing a large-scale eye-tracking database of Multiple-face Video in Visual-Audio condition (MVVA).
896	NormalGAN: Learning Detailed 3D Human from a Single RGB-D Image	Lizhen Wang; Xiaochen Zhao; Tao Yu; Songtao Wang; Yebin Liu;	We propose NormalGAN, a fast adversarial learning-based method to reconstruct the complete and detailed 3D human from a single RGB-D image.
897	Model-based occlusion disentanglement for image-to-image translation	Fabio Pizzati; Pietro Cerri; Raoul de Charette;	Our unsupervised model-based learning disentangles scene and occlusions, while benefiting from an adversarial pipeline to regress physical parameters of the occlusion model.
898	Rotation-robust Intersection over Union for 3D Object Detection	Yu Zheng; Danyang Zhang; Sinan Xie; Jiwen Lu; Jie Zhou;	In this paper, we propose a Rotation-robust Intersection over Union (RIoU) for 3D object detection, which aims to jointly learn the overlap of rotated bounding boxes.
899	New Threats against Object Detector with Non-local Block	Yi Huang; Fan Wang; Adams Wai-Kin Kong; Kwok-Yan Lam;	In this paper, two new threats named disappearing attack and appearing attack against object detectors with a non-local block are investigated.
900	Self-Supervised CycleGAN for Object-Preserving Image-to-Image Domain Adaptation	Xinpeng Xie; Jiawei Chen; Yuexiang Li; Linlin Shen; Kai Ma; Yefeng Zheng;	In this paper, we propose a novel GAN (namely OP-GAN) to address the problem, which involves a self-supervised module to enforce the image content consistency during image-to-image translations without any extra annotations.
901	On the Usage of the Trifocal Tensor in Motion Segmentation	Federica Arrigoni; Luca Magri; Tomas Pajdla;	In this paper we address motion segmentation in multiple images by combining partial results coming from triplets of images, which are obtained by fitting a number of trifocal tensors to correspondences.
902	3D-Rotation-Equivariant Quaternion Neural Networks	Wen Shen; Binbin Zhang; Shikun Huang; Zhihua Wei; Quanshi Zhang;	This paper proposes a set of rules to revise various neural networks for 3D point cloud processing to rotation-equivariant quaternion neural networks (REQNNs).
903	InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image	Gyeongsik Moon; Shoou-I Yu; He Wen; Takaaki Shiratori; Kyoung Mu Lee;	Therefore, we firstly propose (1) a large-scale dataset, InterHand2.6M, and (2) a baseline network, InterNet, for 3D interacting hand pose estimation from a single RGB image.
904	Active Crowd Counting with Limited Supervision	Zhen Zhao; Miaojing Shi; Xiaoxiao Zhao; Li Li;	In the last cycle when the labeling budget is met, the large amount of unlabeled data are also utilized: a distribution classifier is introduced to align the labeled data with unlabeled data; furthermore, we propose to mix up the distribution labels and latent representations of data in the network to particularly improve the distribution alignment in-between training samples.
905	Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance	Marvin Klingner; Jan-Aike Termöhlen; Jonas Mikolajczyk; Tim Fingscheidt;	In this work we present a new self-supervised semantically-guided depth estimation (SGDepth) method to deal with moving dynamic-class (DC) objects, such as moving cars and pedestrians, which violate the static-world assumptions typically made during training of such models.
906	Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language	Shaoxiang Chen; Yu-Gang Jiang;	In this paper, we propose a novel TALL method which builds a Hierarchical Visual-Textual Graph to model interactions between the objects and words as well as among the objects to jointly understand the video contents and the language.
907	Do Not Mask What You Do Not Need to Mask: a Parser-Free Virtual Try-On	Thibaut Issenhuth; Jérémie Mary; Clément Calauzènes;	In this paper, we propose a novel student-teacher paradigm where the teacher is trained in the standard way (reconstruction) before guiding the student to focus on the initial task (changing the cloth).
908	NODIS: Neural Ordinary Differential Scene Understanding	Yuren Cong; Hanno Ackermann; Wentong Liao; Michael Ying Yang; Bodo Rosenhahn;	In this work, we interpret that formulation as Ordinary Differential Equation (ODE).
909	AssembleNet++: Assembling Modality Representations via Attention Connections	Michael S. Ryoo; AJ Piergiovanni; Juhana Kangaspunta; Anelia Angelova;	We create a family of powerful video models which are able to: (i) learn interactions between semantic object information and raw appearance and motion features, and (ii) deploy attention in order to better learn the importance of features at each convolutional block of the network.
910	Learning Propagation Rules for Attribution Map Generation	Yiding Yang; Jiayan Qiu; Mingli Song; Dacheng Tao; Xinchao Wang;	In this paper, we propose a dedicated method to generate attribution maps that allow us to learn the propagation rules automatically, overcoming the flaws of the hand-crafted ones.
911	Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference	Menelaos Kanakis; David Bruggemann; Suman Saha; Stamatios Georgoulis; Anton Obukhov; Luc Van Gool;	In this paper, we show that both can be achieved simply by reparameterizing the convolutions of standard neural network architectures into a non-trainable shared part (filter bank) and task-specific parts (modulators), where each modulator has a fraction of the filter bank parameters.
912	Learning Predictive Models from Observation and Interaction	Karl Schmeckpeper; Annie Xie; Oleh Rybkin; Stephen Tian; Kostas Daniilidis; Sergey Levine; Chelsea Finn;	We address the first challenge by formulating the corresponding graphical model and treating the action as an observed variable for the interaction data and an unobserved variable for the observation data, and the second challenge by using a domain-dependent prior.
913	Unifying Deep Local and Global Features for Image Search	Bingyi Cao; André Araujo; Jack Sim;	In this work, our key contribution is to unify global and local features into a single deep model, enabling accurate retrieval with efficient feature extraction.
914	Human Body Model Fitting by Learned Gradient Descent	Jie Song; Xu Chen; Otmar Hilliges;	We propose a novel algorithm for the fitting of 3D human shape to images.
915	DDGCN: A Dynamic Directed Graph Convolutional Network for Action Recognition	Matthew Korban; Xin Li;	We propose a Dynamic Directed Graph Convolutional Network (DDGCN) to model spatial and temporal features of human actions from their skeletal representations.
916	Learning latent representations across multiple data domains using Lifelong VAEGAN	Fei Ye; Adrian G. Bors;	In this paper, we propose a novel lifelong learning approach, namely the Lifelong VAEGAN (L-VAEGAN), which not only induces a powerful generative replay network but also learns meaningful latent representations, benefiting representation learning.
917	DVI: Depth Guided Video Inpainting for Autonomous Driving	Miao Liao; Feixiang Lu; Dingfu Zhou; Sibo Zhang; Wei Li; Ruigang Yang;	To get clear street-view and photo-realistic simulation in autonomous driving, we present an automatic video inpainting algorithm that can remove traffic agents from videos and synthesize missing regions with the guidance of depth/point cloud.
918	Incorporating Reinforced Adversarial Learning in Autoregressive Image Generation	Kenan E. Ak; Ning Xu; Zhe Lin; Yilin Wang;	To address these limitations, we propose to use Reinforced Adversarial Learning (RAL) based on policy gradient optimization for autoregressive models.
919	APRICOT: A Dataset of Physical Adversarial Attacks on Object Detection	A. Braunegg; Amartya Chakraborty; Michael Krumdick; Nicole Lape; Sara Leary; Keith Manville; Elizabeth Merkhofer; Laura Strickhart; Matthew Walmer;	We present APRICOT, a collection of over 1,000 annotated photographs of printed adversarial patches in public locations.
920	Visual Question Answering on Image Sets	Ankan Bansal; Yuting Zhang; Rama Chellappa;	We introduce the task of Image-Set Visual Question Answering (ISVQA), which generalizes the commonly studied single-image VQA problem to multi-image settings.
921	Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots	Qi Chen; Lin Sun; Zhixin Wang; Kui Jia; Alan Yuille;	We thus argue in this paper for an approach opposite to existing methods using object-level anchors.
922	Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations	Huaiyi Huang; Yuqi Zhang; Qingqiu Huang; Zhengkui Guo; Ziwei Liu; Dahua Lin;	In this work, we contribute Placepedia, a large-scale place dataset with more than 35M photos from 240K unique places.
923	DELTAS: Depth Estimation by Learning Triangulation And densification of Sparse points	Ayan Sinha; Zak Murez; James Bartolozzi; Vijay Badrinarayanan; Andrew Rabinovich;	Distinct from cost volume approaches, we propose an efficient depth estimation approach by first (a) detecting and evaluating descriptors for interest points, then (b) learning to match and triangulate a small set of interest points, and finally densifying this sparse set of 3D points using CNNs.
924	Dynamic Low-light Imaging with Quanta Image Sensors	Yiheng Chi; Abhiram Gnanasambandam; Vladlen Koltun; Stanley H. Chan;	We propose a solution using Quanta Image Sensors (QIS) and present a new image reconstruction algorithm.
925	Disambiguating Monocular Depth Estimation with a Single Transient	Mark Nishimura; David B. Lindell; Christopher Metzler; Gordon Wetzstein;	In this work, we demonstrate how a depth histogram of the scene, which can be readily captured using a single-pixel time-resolved detector, can be fused with the output of existing monocular depth estimation algorithms to resolve the depth ambiguity problem.
926	DSDNet: Deep Structured self-Driving Network	Wenyuan Zeng; Shenlong Wang; Renjie Liao; Yun Chen; Bin Yang; Raquel Urtasun;	In this paper, we propose the Deep Structured self-Driving Network (DSDNet), which performs object detection, motion prediction, and motion planning with a single neural network.
927	QuEST: Quantized Embedding Space for Transferring Knowledge	Himalaya Jain; Spyros Gidaris; Nikos Komodakis; Patrick Pérez; Matthieu Cord;	In this work, we propose a novel way to achieve this goal: by distilling the knowledge through a quantized visual words space.
928	EGDCL: An Adaptive Curriculum Learning Framework for Unbiased Glaucoma Diagnosis	Rongchang Zhao; Xuanlin Chen; Zailiang Chen; Shuo Li;	In this paper, we propose a novel curriculum learning paradigm (EGDCL) to train an unbiased glaucoma diagnosis model with the adaptive dual-curriculum.
929	Backpropagated Gradient Representations for Anomaly Detection	Gukyeong Kwon; Mohit Prabhushankar; Dogancan Temel; Ghassan AlRegib;	Hence, we propose the utilization of backpropagated gradients as representations to characterize model behavior on anomalies and, consequently, detect such anomalies.
930	Dense RepPoints: Representing Visual Objects with Dense Point Sets	Ze Yang; Yinghao Xu; Han Xue; Zheng Zhang; Raquel Urtasun; Liwei Wang; Stephen Lin; Han Hu;	We present a new object representation, called Dense RepPoints, which utilizes a large number of points to describe the multi-grained object representation of both box level and pixel level.
931	On Dropping Clusters to Regularize Graph Convolutional Neural Networks	Xikun Zhang; Chang Xu; Dacheng Tao;	To effectively regularize GCNs, we devise DropCluster which first randomly zeros some seed entries and then zeros entries that are spatially or depth-wisely correlated to those seed entries.
932	Adaptive Video Highlight Detection by Learning from User History	Mrigank Rochan; Mahesh Kumar Krishna Reddy; Linwei Ye; Yang Wang;	In this paper, we propose a simple yet effective framework that learns to adapt highlight detection to a user by exploiting the user’s history in the form of highlights that the user has previously created.
933	Improving 3D Object Detection through Progressive Population Based Augmentation	Shuyang Cheng; Zhaoqi Leng; Ekin Dogus Cubuk; Barret Zoph; Chunyan Bai; Jiquan Ngiam; Yang Song; Benjamin Caine; Vijay Vasudevan; Congcong Li; Quoc V. Le; Jonathon Shlens; Dragomir Anguelov;	In this work, we present the first attempt to automate the design of data augmentation policies for 3D object detection.
934	DR-KFS: A Differentiable Visual Similarity Metric for 3D Shape Reconstruction	Jiongchao Jin; Akshay Gadi Patil; Zhang Xiong; Hao Zhang;	We introduce a differential visual similarity metric to train deep neural networks for 3D reconstruction, aimed at improving reconstruction quality.
935	SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization	Xuefeng Hu; Zhihan Zhang; Zhenye Jiang; Syomantak Chaudhuri; Zhenheng Yang; Ram Nevatia;	We present a novel Spatial Pyramid Attention Network (SPAN) for detection and localization of multiple types of image manipulations.
936	Adversarial Learning for Zero-shot Domain Adaptation	Jinghua Wang; Jianmin Jiang;	With the hypothesis that the shift between a given pair of domains is shared across tasks, we propose a new method for ZSDA by transferring domain shift from an irrelevant task (IrT) to the task of interest (ToI).
937	YOLO in the Dark – Domain Adaptation Method for Merging Multiple Models –	Yukihiro Sasagawa; Hajime Nagahara;	We propose a method of domain adaptation for merging multiple models with less effort than creating an additional dataset.
938	Identity-Aware Multi-Sentence Video Description	Jae Sung Park; Trevor Darrell; Anna Rohrbach;	We propose a multi-sentence Identity-Aware Video Description task, which overcomes this limitation and requires re-identifying persons locally within a set of consecutive clips.
939	VQA-LOL: Visual Question Answering under the Lens of Logic	Tejas Gokhale; Pratyay Banerjee; Chitta Baral; Yezhou Yang;	In this paper, we investigate whether visual question answering (VQA) systems trained to answer a question about an image are able to answer the logical composition of multiple such questions.
940	Piggyback GAN: Efficient Lifelong Learning for Image Conditioned Generation	Mengyao Zhai; Lei Chen; Jiawei He; Megha Nawhal; Frederick Tung; Greg Mori;	In contrast, we propose a parameter efficient framework, Piggyback GAN, which learns the current task by building a set of convolutional and deconvolutional filters that are factorized into filters of the models trained on previous tasks.
941	TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering	Xiaofeng Yang; Guosheng Lin; Fengmao Lv; Fayao Liu;	We propose a novel tiered reasoning method that dynamically selects object level candidates based on language representations and generates robust pairwise relations within the selected candidate objects.
942	Mining Inter-Video Proposal Relations for Video Object Detection	Mingfei Han; Yali Wang; Xiaojun Chang; Yu Qiao;	To address the limitation, we propose a novel Inter-Video Proposal Relation module.
943	TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval	Jie Lei; Licheng Yu; Tamara L. Berg; Mohit Bansal;	We introduce TV show Retrieval (TVR), a new multimodal retrieval dataset.
944	Minimum Class Confusion for Versatile Domain Adaptation	Ying Jin; Ximei Wang; Mingsheng Long; Jianmin Wang;	To this end, this paper studies Versatile Domain Adaptation (VDA), where one method can handle several different DA scenarios without any modification.
945	Large Batch Optimization for Object Detection: Training COCO in 12 Minutes	Tong Wang; Yousong Zhu; Chaoyang Zhao; Wei Zeng; Yaowei Wang; Jinqiao Wang; Ming Tang;	Specifically, we present a novel Periodical Moments Decay LAMB (PMD-LAMB) algorithm to effectively reduce the negative effects of the lagging historical gradients.
946	Towards Practical and Efficient High-Resolution HDR Deghosting with CNN	K. Ram Prabhakar; Susmit Agrawal; Durgesh Kumar Singh; Balraj Ashwath; R. Venkatesh Babu;	In this paper, we present a deep neural network based approach to generate high-quality ghost-free HDR for high-resolution images.
947	Monocular Differentiable Rendering for Self-Supervised 3D Object Detection	Deniz Beker; Hiroharu Kato; Mihai Adrian Morariu; Takahiro Ando; Toru Matsuoka; Wadim Kehl; Adrien Gaidon;	To overcome this ambiguity, we present a novel self-supervised method for textured 3D shape reconstruction and pose estimation of rigid objects with the help of strong shape priors and 2D instance masks.
948	Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation	Meng Tian; Marcelo H Ang Jr; Gim Hee Lee;	We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image.
949	Dynamic and Static Context-aware LSTM for Multi-agent Motion Prediction	Chaofan Tao; Qinhong Jiang; Lixin Duan; Ping Luo;	However, unlike previous work that isolated the spatial interaction, temporal coherence, and scene layout, this paper designs a new mechanism, i.e., Dynamic and Static Context-aware Motion Predictor (DSCMP), to integrate this rich information into the long-short-term-memory (LSTM).
950	Image-based table recognition: data, model, and evaluation	Xu Zhong; Elaheh ShafieiBavani; Antonio Jimeno Yepes;	To facilitate image-based table recognition with deep learning, we develop and release the largest publicly available table recognition dataset PubTabNet, containing 568k table images with corresponding structured HTML representation.
951	Group Activity Prediction with Sequential Relational Anticipation Model	Junwen Chen; Wentao Bao; Yu Kong;	In this paper, we propose a novel approach to predict group activities given the beginning frames with incomplete activity executions.
952	PiP: Planning-informed Trajectory Prediction for Autonomous Driving	Haoran Song; Wenchao Ding; Yuxuan Chen; Shaojie Shen; Michael Yu Wang; Qifeng Chen;	We propose planning-informed trajectory prediction (PiP) to tackle the prediction problem in the multi-agent setting.
953	PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer	Duo Li; Anbang Yao; Qifeng Chen;	We bridge this regret by exploiting multi-scale features in a finer granularity.
954	Hierarchical Context Embedding for Region-based Object Detection	Zhao-Min Chen; Xin Jin; Borui Zhao; Xiu-Shen Wei; Yanwen Guo;	To address this issue, we present a simple but effective Hierarchical Context Embedding (HCE) framework, which can be applied as a plug-and-play component, to facilitate the classification ability of a series of region-based detectors by mining contextual cues.
955	Attention-Driven Dynamic Graph Convolutional Network for Multi-Label Image Recognition	Jin Ye; Junjun He; Xiaojiang Peng; Wenhao Wu; Yu Qiao;	Our goal is to eliminate such bias and enhance the robustness of the learnt features.
956	Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection	Yuliang Guo; Guang Chen; Peitao Zhao; Weide Zhang; Jinghao Miao; Jingao Wang; Tae Eun Choe;	We present a generalized and scalable method, called Gen-LaneNet, to detect 3D lanes from a single image. Moreover, we release a new synthetic dataset and its construction strategy to encourage the development and evaluation of 3D lane detection methods.
957	Sparse-to-Dense Depth Completion Revisited: Sampling Strategy and Graph Construction	Xin Xiong; Haipeng Xiong; Ke Xian; Chen Zhao; Zhiguo Cao; Xin Li;	In this work, we approach this problem by addressing two issues that have been under-researched in the open literature: sampling strategy (data term) and graph construction (prior term).
958	MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation	Kaisiyuan Wang; Qianyi Wu; Linsen Song; Zhuoqian Yang; Wayne Wu; Chen Qian; Ran He; Yu Qiao; Chen Change Loy;	To address this issue, we build the Multi-view Emotional Audio-visual Dataset (MEAD), which is a talking-face video corpus featuring 60 actors and actresses talking with 8 different emotions at 3 different intensity levels.
959	Detecting Human-Object Interactions with Action Co-occurrence Priors	Dong-Jin Kim; Xiao Sun; Jinsoo Choi; Stephen Lin; In So Kweon;	In this paper, we model the correlations as action co-occurrence matrices and present techniques to learn these priors and leverage them for more effective training, especially in rare classes.
960	Learning Connectivity of Neural Networks from a Topological Perspective	Kun Yuan; Quanquan Li; Jing Shao; Junjie Yan;	In this paper, we attempt to optimize the connectivity in neural networks.
961	JSTASR: Joint Size and Transparency-Aware Snow Removal Algorithm Based on Modified Partial Convolution and Veiling Effect Removal	Wei-Ting Chen; Hao-Yu Fang; Jian-Jiun Ding; Cheng-Che Tsai; Sy-Yen Kuo;	In this paper, first, we reformulate the snow model. Different from that in the previous works, in the proposed snow model, the veiling effect is included. Second, a novel joint size and transparency-aware snow removal algorithm called JSTASR is proposed.
962	Ocean: Object-aware Anchor-free Tracking	Zhipeng Zhang; Houwen Peng; Jianlong Fu; Bing Li; Weiming Hu;	In this paper, we propose a novel object-aware anchor-free network to address this issue.
963	Object Tracking using Spatio-Temporal Networks for Future Prediction Location	Yuan Liu; Ruoteng Li; Yu Cheng; Robby T. Tan; Xiubao Sui;	We introduce an object tracking algorithm that predicts the future locations of the target object and assists the tracker to handle object occlusion.
964	Pillar-based Object Detection for Autonomous Driving	Yue Wang; Alireza Fathi; Abhijit Kundu; David A. Ross; Caroline Pantofaru; Tom Funkhouser; Justin Solomon;	We present a simple and flexible object detection framework optimized for autonomous driving.
965	Sparse Adversarial Attack via Perturbation Factorization	Yanbo Fan; Baoyuan Wu; Tuanhui Li; Yong Zhang; Mingyang Li; Zhifeng Li; Yujiu Yang;	This work studies the sparse adversarial attack, which aims to generate adversarial perturbations onto partial positions of one benign image, such that the perturbed image is incorrectly predicted by one deep neural network (DNN) model.
966	3D Scene Reconstruction from a Single Viewport	Maximilian Denninger; Rudolph Triebel;	We present a novel approach to infer volumetric reconstructions from a single viewport, based only on an RGB image and a reconstructed normal image.
967	Learning to Optimize Domain Specific Normalization for Domain Generalization	Seonguk Seo; Yumin Suh; Dongwan Kim; Geeho Kim; Jongwoo Han; Bohyung Han;	We propose a simple but effective multi-source domain generalization technique based on deep neural networks by incorporating optimized normalization layers that are specific to individual domains.
968	Self-supervised Outdoor Scene Relighting	Ye Yu; Abhimitra Meka; Mohamed Elgharib; Hans-Peter Seidel; Christian Theobalt; William A. P. Smith;	In contrast, we propose a self-supervised approach for relighting.
969	Privacy Preserving Visual SLAM	Mikiya Shibuya; Shinya Sumikura; Ken Sakurada;	This study proposes a privacy-preserving Visual SLAM framework for estimating camera poses and performing bundle adjustment with mixed line and point clouds in real time.
970	Leveraging Acoustic Images for Effective Self-Supervised Audio Representation Learning	Valentina Sanguineti; Pietro Morerio; Niccolò Pozzetti; Danilo Greco; Marco Cristani; Vittorio Murino;	In this paper, we propose the use of a new modality characterized by a richer information content, namely acoustic images, for the sake of audio-visual scene understanding.
971	Learning Joint Visual Semantic Matching Embeddings for Language-guided Retrieval	Yanbei Chen; Loris Bazzani;	In this work, we study the problem of composing images and textual modifications for language-guided retrieval in the context of fashion applications.
972	Globally Optimal and Efficient Vanishing Point Estimation in Atlanta World	Haoang Li; Pyojin Kim; Ji Zhao; Kyungdon Joo; Zhipeng Cai; Zhe Liu; Yun-Hui Liu;	To overcome these limitations, we propose the novel mine-and-stab (MnS) algorithm and embed it in the branch-and-bound (BnB) algorithm.
973	StyleGAN2 Distillation for Feed-forward Image Manipulation	Yuri Viazovetskyi; Vladimir Ivashkin; Evgeny Kashin;	We propose a way to distill a particular image manipulation of StyleGAN2 into an image-to-image network trained in a paired way.
974	Self-Prediction for Joint Instance and Semantic Segmentation of Point Clouds	Jinxian Liu; Minghui Yu; Bingbing Ni; Ye Chen;	We develop a novel learning scheme named Self-Prediction for 3D instance and semantic segmentation of point clouds.
975	Learning Disentangled Representations via Mutual Information Estimation	Eduardo Hugo Sanchez; Mathieu Serrurier; Mathias Ortner;	In this paper, we investigate the problem of learning disentangled representations.
976	Challenge-Aware RGBT Tracking	Chenglong Li; Lei Liu; Andong Lu; Qing Ji; Jin Tang;	In this paper, we propose a novel challenge-aware neural network to handle the modality-shared challenges (e.g., fast motion, scale variation and occlusion) and the modality-specific ones (e.g., illumination variation and thermal crossover) for RGBT tracking.
977	Fully Trainable and Interpretable Non-Local Sparse Models for Image Restoration	Bruno Lecouat; Jean Ponce; Julien Mairal;	We propose a novel differentiable relaxation of joint sparsity that exploits both principles and leads to a general framework for image restoration which is (1) trainable end to end, (2) fully interpretable, and (3) much more compact than competing deep learning architectures.
978	AutoSimulate: (Quickly) Learning Synthetic Data Generation	Harkirat Singh Behl; Atilim Güneş Baydin; Ran Gal; Philip H.S. Torr; Vibhav Vineet;	We propose an efficient alternative for optimal synthetic data generation, based on a novel differentiable approximation of the objective.
979	LatticeNet: Towards Lightweight Image Super-resolution with Lattice Block	Xiaotong Luo; Yuan Xie; Yulun Zhang; Yanyun Qu; Cuihua Li; Yun Fu;	To address this problem, we focus on the lightweight models for fast and accurate image SR.
980	Learning from Scale-Invariant Examples for Domain Adaptation in Semantic Segmentation	M. Naseer Subhani; Mohsen Ali;	In this paper, we propose a novel approach of exploiting the scale-invariance property of the semantic segmentation model for self-supervised domain adaptation.
981	Active Visual Information Gathering for Vision-Language Navigation	Hanqing Wang; Wenguan Wang; Tianmin Shu; Wei Liang; Jianbing Shen;	To achieve this, we propose an end-to-end framework for learning an exploration policy that decides i) when and where to explore, ii) what information is worth gathering during exploration, and iii) how to adjust the navigation decision after the exploration.
982	Deep Hough-Transform Line Priors	Yancong Lin; Silvia L. Pintea; Jan C. van Gemert;	Here, we reduce the dependency on labeled data by building on the classic knowledge-based priors while using deep networks to learn features.
983	Unsupervised Shape and Pose Disentanglement for 3D Meshes	Keyang Zhou; Bharat Lal Bhatnagar; Gerard Pons-Moll;	In this paper, we present a simple yet effective approach to learn disentangled shape and pose representations in an unsupervised setting.
984	CLAWS: Clustering Assisted Weakly Supervised Learning with Normalcy Suppression for Anomalous Event Detection	Muhammad Zaigham Zaheer; Arif Mahmood; Marcella Astrid; Seung-Ik Lee;	In this work, we propose a weakly supervised anomaly detection method which has manifold contributions including 1) a random batch based training procedure to reduce inter-batch correlation, 2) a normalcy suppression mechanism to minimize anomaly scores of the normal regions of a video by taking into account the overall information available in one training batch, and 3) a clustering distance based loss to contribute towards mitigating the label noise and to produce better anomaly representations by encouraging our model to generate distinct normal and anomalous clusters.
985	Inclusive GAN: Improving Data and Minority Coverage in Generative Models	Ning Yu; Ke Li; Peng Zhou; Jitendra Malik; Larry Davis; Mario Fritz;	We develop an extension that allows explicit control over the minority subgroups that the model should ensure to include, and validate its effectiveness at little compromise from the overall performance on the entire dataset.
986	SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects	Evangelos Ntavelis; Andrés Romero; Iason Kastanis; Luc Van Gool; Radu Timofte;	To address these limitations, we propose SESAME, a novel generator-discriminator pair for Semantic Editing of Scenes by Adding, Manipulating or Erasing objects.
987	Dive Deeper Into Box for Object Detection	Ran Chen; Yong Liu; Mengdan Zhang; Shu Liu; Bei Yu; Yu-Wing Tai;	This motivates us to investigate a box reorganization method (DDBNet), which can dive deeper into the box to strive for more accurate localization.
988	PG-Net: Pixel to Global Matching Network for Visual Tracking	Bingyan Liao; Chenye Wang; Yayun Wang; Yaonong Wang; Jun Yin;	In this paper, a Pixel to Global Matching Network (PG-Net) is proposed to suppress the influence of background in search image while achieving state-of-the-art tracking performance.
989	Why Are Deep Representations Good Perceptual Quality Features?	Taimoor Tariq; Okan Tarhan Tursun; Munchurl Kim; Piotr Didyk;	We introduce two new formulations to measure the frequency and orientation selectivity of the features learned by convolutional layers for evaluating deep features learned by widely-used deep CNNs such as VGG-16.
990	Geometric Estimation via Robust Subspace Recovery	Aoxiang Fan; Xingyu Jiang; Yang Wang; Junjun Jiang; Jiayi Ma;	In this paper, we consider the problem from an optimization perspective, to exploit the intrinsic linear structure of point correspondences to assist estimation.
991	Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification	Sanath Narayan; Akshita Gupta; Fahad Shahbaz Khan; Cees G. M. Snoek; Ling Shao;	We propose to enforce semantic consistency at all stages of (generalized) zero-shot learning: training, feature synthesis and classification.
992	Human Correspondence Consensus for 3D Object Semantic Understanding	Yujing Lou; Yang You; Chengkun Li; Zhoujun Cheng; Liangwei Li; Lizhuang Ma; Weiming Wang; Cewu Lu;	In this paper, we introduce a new dataset named CorresPondenceNet.
993	Learning Memory Augmented Cascading Network for Compressed Sensing of Images	Jiwei Chen; Yubao Sun; Qingshan Liu; Rui Huang;	In this paper, we propose a cascading network for compressed sensing of images with progressive reconstruction.
994	Least squares surface reconstruction on arbitrary domains	Dizhong Zhu; William A. P. Smith;	We propose a new method for computing numerical derivatives based on 2D Savitzky-Golay filters and K-nearest neighbour kernels.
995	Task-conditioned Domain Adaptation for Pedestrian Detection in Thermal Imagery	My Kieu; Andrew D. Bagdanov; Marco Bertini; Alberto del Bimbo;	In this paper, we propose a novel approach to domain adaptation that significantly improves pedestrian detection performance in the thermal domain.
996	Improving the Transferability of Adversarial Examples with Resized-Diverse-Inputs, Diversity-Ensemble and Region Fitting	Junhua Zou; Zhisong Pan; Junyang Qiu; Xin Liu; Ting Rui; Wei Li;	We introduce a three stage pipeline: resized-diverse-inputs (RDIM), diversity-ensemble (DEM) and region fitting, that work together to generate transferable adversarial examples.
997	DADA: Differentiable Automatic Data Augmentation	Yonggang Li; Guosheng Hu; Yongtao Wang; Timothy Hospedales; Neil M. Robertson; Yongxin Yang;	In this paper, we propose Differentiable Automatic Data Augmentation (DADA) which dramatically reduces the cost.
998	SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans	Armen Avetisyan; Tatiana Khanova; Christopher Choy; Denver Dash; Angela Dai; Matthias Nießner;	We present a novel approach to reconstructing lightweight, CAD-based representations of scanned 3D environments from commodity RGB-D sensors.
999	Kinship Identification through Joint Learning using Kinship Verification Ensembles	Wei Wang; Shaodi You; Theo Gevers;	To this end, we propose a novel kinship identification approach based on joint training of kinship verification ensembles and classification modules.
1000	Kernelized Memory Network for Video Object Segmentation	Hongje Seong; Junhyuk Hyun; Euntai Kim;	To solve the mismatch between STM and VOS, we propose a kernelized memory network (KMN).
1001	A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection	Xiaoqi Zhao; Lihe Zhang; Youwei Pang; Huchuan Lu; Lei Zhang;	In this work, we design a single stream network to directly use the depth map to guide early fusion and middle fusion between RGB and depth, which saves the feature encoder of the depth stream and achieves a lightweight and real-time model.
1002	Splitting vs. Merging: Mining Object Regions with Discrepancy and Intersection Loss for Weakly Supervised Semantic Segmentation	Tianyi Zhang; Guosheng Lin; Weide Liu; Jianfei Cai; Alex Kot;	In this paper we focus on the task of weakly-supervised semantic segmentation supervised with image-level labels.
1003	Temporal Keypoint Matching and Refinement Network for Pose Estimation and Tracking	Chunluan Zhou; Zhou Ren; Gang Hua;	In this paper, we mainly focus on improving pose association and estimation in a video to build a strong pose estimator and tracker.
1004	Neural Point-Based Graphics	Kara-Ali Aliev; Artem Sevastopolsky; Maria Kolos; Dmitry Ulyanov; Victor Lempitsky;	We present a new point-based approach for modeling the appearance of real scenes.
1005	FHDe²Net: Full High Definition Demoireing Network	Bin He; Ce Wang; Boxin Shi; Ling-Yu Duan;	We propose the Full High Definition Demoiréing Network (FHDe²Net) to solve such problems.
1006	Learning Structural Similarity of User Interface Layouts using Graph Networks	Dipu Manandhar; Dan Ruta; John Collomosse;	We propose a novel representation learning technique for measuring the similarity of user interface designs.
1007	NAS-Count: Counting-by-Density with Neural Architecture Search	Yutao Hu; Xiaolong Jiang; Xuhui Liu; Baochang Zhang; Jungong Han; Xianbin Cao; David Doermann;	In this work, we automate the design of counting models with Neural Architecture Search (NAS) and introduce an end-to-end searched encoder-decoder architecture, Automatic Multi-Scale Network (AMSNet).
1008	Towards Generalization Across Depth for Monocular 3D Object Detection	Andrea Simonelli; Samuel Rota Buló; Lorenzo Porzi; Elisa Ricci; Peter Kontschieder;	In particular, in this work we show that, thanks to our virtual views generation process, a lightweight, single-stage architecture suffices to set new state-of-the-art results on the popular KITTI3D benchmark.
1009	Margin-Mix: Semi-Supervised Learning for Face Expression Recognition	Corneliu Florea; Mihai Badea; Laura Florea; Andrei Racoviteanu; Constantin Vertan;	In this paper, as we aim to construct a semi-supervised learning algorithm, we exploit the characteristics of the Deep Convolutional Networks to provide, for an input image, both an embedding descriptor and a prediction.
1010	Principal Feature Visualisation in Convolutional Neural Networks	Marianne Bakken; Johannes Kvam; Alexey A. Stepanov; Asbjørn Berge;	We introduce a new visualisation technique for CNNs called Principal Feature Visualisation (PFV).
1011	Progressive Refinement Network for Occluded Pedestrian Detection	Xiaolin Song; Kaili Zhao; Wen-Sheng Chu; Honggang Zhang; Jun Guo;	We present Progressive Refinement Network (PRNet), a novel single-stage detector that tackles occluded pedestrian detection.
1012	Monocular Real-Time Volumetric Performance Capture	Ruilong Li; Yuliang Xiu; Shunsuke Saito; Zeng Huang; Kyle Olszewski; Hao Li;	We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video, eliminating the need for expensive multi-view systems or cumbersome pre-acquisition of a personalized template model.
1013	The Mapillary Traffic Sign Dataset for Detection and Classification on a Global Scale	Christian Ertler; Jerneja Mislej; Tobias Ollmann; Lorenzo Porzi; Gerhard Neuhold; Yubin Kuang;	In this paper, we introduce a new traffic sign dataset of 105K street-level images around the world covering 400 manually annotated traffic sign classes in diverse scenes, wide range of geographical locations, and varying weather and lighting conditions.
1014	Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction	Anil Armagan; Guillermo Garcia-Hernando; Seungryul Baek; Shreyas Hampali; Mahdi Rad; Zhaohui Zhang; Shipeng Xie; MingXiu Chen; Boshen Zhang; Fu Xiong; Yang Xiao; Zhiguo Cao; Junsong Yuan; Pengfei Ren; Weiting Huang; Haifeng Sun; Marek Hrúz; Jakub Kanis; Zdeněk Krňoul; Qingfu Wan; Shile Li; Linlin Yang; Dongheui Lee; Angela Yao; Weiguo Zhou; Sijia Mei; Yunhui Liu; Adrian Spurr; Umar Iqbal; Pavlo Molchanov; Philippe Weinzaepfel; Romain Brégier; Grégory Rogez; Vincent Lepetit; Tae-Kyun Kim;	To address these issues, we designed a public challenge (HANDS’19) to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set.
1015	Disentangling Multiple Features in Video Sequences using Gaussian Processes in Variational Autoencoders	Sarthak Bhagat; Shagun Uppal; Zhuyun Yin; Nengli Lim;	We introduce MGP-VAE (Multi-disentangled-features Gaussian Processes Variational AutoEncoder), a variational autoencoder which uses Gaussian processes (GP) to model the latent space for the unsupervised learning of disentangled representations in video sequences.
1016	SEN: A Novel Feature Normalization Dissimilarity Measure for Prototypical Few-Shot Learning Networks	Van Nhan Nguyen; Sigurd Løkse; Kristoffer Wickstrøm; Michael Kampffmeyer; Davide Roverso; Robert Jenssen;	In this paper, we equip Prototypical Networks (PNs) with a novel dissimilarity measure to enable discriminative feature normalization for few-shot learning.
1017	Kinematic 3D Object Detection in Monocular Video	Garrick Brazil; Gerard Pons-Moll; Xiaoming Liu; Bernt Schiele;	In this work, we propose a novel method for monocular video-based 3D object detection which carefully leverages kinematic motion to improve precision of 3D localization.
1018	Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents	Ye Zhu; Yu Wu; Yi Yang; Yan Yan;	To this end, in this paper, we introduce a new task called video description via two multi-modal cooperative dialog agents, whose ultimate goal is for one conversational agent to describe an unseen video based on the dialog and two static frames.
1019	SACA Net: Cybersickness Assessment of Individual Viewers for VR Content via Graph-based Symptom Relation Embedding	Sangmin Lee; Jung Uk Kim; Hak Gu Kim; Seongyeop Kim; Yong Man Ro;	In this paper, we propose a novel symptom-aware cybersickness assessment network (SACA Net) that quantifies physical symptom levels for assessing cybersickness of individual viewers.
1020	End-to-End Low Cost Compressive Spectral Imaging with Spatial-Spectral Self-Attention	Ziyi Meng; Jiawei Ma; Xin Yuan;	To make solid progress on this challenging yet under-investigated task, we reproduce a stable single disperser (SD) CASSI system to gather large-scale real-world CASSI data and propose a novel deep convolutional network to carry out the real-time reconstruction by using self-attention.
1021	Know Your Surroundings: Exploiting Scene Information for Object Tracking	Goutam Bhat; Martin Danelljan; Luc Van Gool; Radu Timofte;	In this work, we propose a novel tracking architecture which can utilize scene information for tracking.
1022	Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases	Ren Wang; Gaoyuan Zhang; Sijia Liu; Pin-Yu Chen; Jinjun Xiong; Meng Wang;	In this paper, we study the problem of the Trojan network (TrojanNet) detection in the data-scarce regime, where only the weights of a trained DNN are accessed by the detector.
1023	Anatomy-Aware Siamese Network: Exploiting Semantic Asymmetry for Accurate Pelvic Fracture Detection in X-ray Images	Haomin Chen; Yirui Wang; Kang Zheng; Weijian Li; Chi-Tung Chang; Adam P. Harrison; Jing Xiao; Gregory D. Hager; Le Lu; Chien-Hung Liao; Shun Miao;	In this work, we present a new approach to fracture detection that uses a Siamese network to take advantage of the anatomical symmetry of pelvic structures to improve fracture detection.
1024	DeepLandscape: Adversarial Modeling of Landscape Videos	Elizaveta Logacheva; Roman Suvorov; Oleg Khomenko; Anton Mashikhin; Victor Lempitsky;	We propose simple but necessary modifications to the StyleGAN inversion procedure, which lead to in-domain latent codes and allow manipulation of real images.
1025	GANwriting: Content-Conditioned Generation of Styled Handwritten Word Images	Lei Kang; Pau Riba; Yaxing Wang; Marçal Rusiñol; Alicia Fornés; Mauricio Villegas;	In this work, we take a step closer to producing realistic and varied artificially rendered handwritten words.
1026	Spatial-Angular Interaction for Light Field Image Super-Resolution	Yingqian Wang; Longguang Wang; Jungang Yang; Wei An; Jingyi Yu; Yulan Guo;	In this paper, we propose a spatial-angular interactive network (namely, LF-InterNet) for LF image SR.
1027	BATS: Binary ArchitecTure Search	Adrian Bulat; Brais Martinez; Georgios Tzimiropoulos;	This paper proposes Binary ArchitecTure Search (BATS), a framework that drastically reduces the accuracy gap between binary neural networks and their real-valued counterparts by means of Neural Architecture Search (NAS).
1028	A Closer Look at Local Aggregation Operators in Point Cloud Analysis	Ze Liu; Han Hu; Yue Cao; Zheng Zhang; Xin Tong;	In this paper, we revisit the representative local aggregation operators and study their performance using the same deep residual architecture.
1029	Look here! A parametric learning based approach to redirect visual attention	Youssef A. Mejjati; Celso F. Gomez; Kwang In Kim; Eli Shechtman; Zoya Bylinskii;	Motivated by professional work flows, we introduce an automatic method to make an image region more attention-capturing via subtle image edits that maintain realism and fidelity to the original.
1030	Variational Diffusion Autoencoders with Random Walk Sampling	Henry Li; Ofir Lindenbaum; Xiuyuan Cheng; Alexander Cloninger;	We propose a method that combines these approaches into a generative model that inherits the asymptotic guarantees of diffusion maps while preserving the scalability of deep models.
1031	Adaptive Variance Based Label Distribution Learning For Facial Age Estimation	Xin Wen; Biying Li; Haiyun Guo; Zhiwei Liu; Guosheng Hu; Ming Tang; Jinqiao Wang;	To model a sample-specific variance, in this paper, we propose an adaptive variance based distribution learning (AVDL) method for facial age estimation.
1032	Connecting the Dots: Detecting Adversarial Perturbations Using Context Inconsistency	Shasha Li; Shitong Zhu; Sudipta Paul; Amit Roy-Chowdhury; Chengyu Song; Srikanth Krishnamurthy; Ananthram Swami; Kevin S Chan;	In brief, our approach builds a set of autoencoders, one for each object class, appropriately trained so as to output a discrepancy between the input and output if a perturbation was added to the sample and trigger context violation.
1033	Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations	Abbas Sadat; Sergio Casas; Mengye Ren; Xinyu Wu; Pranaab Dhawan; Raquel Urtasun;	In this paper we propose a novel end-to-end learnable network that performs joint perception, prediction and motion planning for self-driving vehicles and produces interpretable intermediate representations.
1034	VarSR: Variational Super-Resolution Network for Very Low Resolution Images	Sangeek Hyun; Jae-Pil Heo;	In this paper, we propose VarSR, Variational Super Resolution Network, that matches latent distributions of LR and HR images to recover the missing details.
1035	Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation	Ashwin Raju; Chi-Tung Cheng; Yuankai Huo; Jinzheng Cai; Junzhou Huang; Jing Xiao; Le Lu; ChienHung Liao; Adam P. Harrison;	In this work, we present a novel segmentation strategy, co-heterogeneous and adaptive segmentation (CHASe), which only requires a small labeled cohort of single phase data to adapt to any unlabeled cohort of heterogeneous multi-phase data with possibly new clinical scenarios and pathologies.
1036	Towards Recognizing Unseen Categories in Unseen Domains	Massimiliano Mancini; Zeynep Akata; Elisa Ricci; Barbara Caputo;	The key idea of CuMix is to simulate the test-time domain and semantic shift using images and features from unseen domains and categories generated by mixing up the multiple source domains and categories available during training.
1037	Square Attack: a query-efficient black-box adversarial attack via random search	Maksym Andriushchenko; Francesco Croce; Nicolas Flammarion; Matthias Hein;	We propose the Square Attack, a score-based black-box $l_2$- and $l_\infty$- adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking.
1038	You Are Here: Geolocation by Embedding Maps and Images	Noe Samano; Mengjie Zhou; Andrew Calway;	We present a novel approach to geolocalising panoramic images on a 2-D cartographic map based on learning a low dimensional embedded space, which allows a comparison between an image captured at a location and local neighbourhoods of the map.
1039	Segmentations-Leak: Membership Inference Attacks and Defenses in Semantic Image Segmentation	Yang He; Shadi Rahimian; Bernt Schiele; Mario Fritz;	We present the first attacks and defenses for complex, state of the art models for semantic segmentation.
1040	From Image to Stability: Learning Dynamics from Human Pose	Jesse Scott; Bharadwaj Ravichandran; Christopher Funk; Robert T. Collins; Yanxi Liu;	We propose and validate two end-to-end deep learning architectures to learn foot pressure distribution maps (dynamics) from 2D or 3D human pose (kinematics).
1041	LevelSet R-CNN: A Deep Variational Method for Instance Segmentation	Namdar Homayounfar; Yuwen Xiong; Justin Liang; Wei-Chiu Ma; Raquel Urtasun;	We propose LevelSet R-CNN, which combines the best of both worlds by obtaining powerful feature representations that are combined in an end-to-end manner with a variational segmentation framework.
1042	Efficient Scale-Permuted Backbone with Learned Resource Distribution	Xianzhi Du; Tsung-Yi Lin; Pengchong Jin; Yin Cui; Mingxing Tan; Quoc Le; Xiaodan Song;	In this work, we propose a simple technique to combine efficient operations and compound scaling with a previously learned scale-permuted architecture.
1043	Reducing Distributional Uncertainty by Mutual Information Maximisation and Transferable Feature Learning	Jian Gao; Yang Hua; Guosheng Hu; Chi Wang; Neil M. Robertson;	In this paper, we propose to formulate the distributional uncertainty both between the source(s) and target domain(s) and within each domain using mutual information.
1044	Bridging Knowledge Graphs to Generate Scene Graphs	Alireza Zareian; Svebor Karaman; Shih-Fu Chang;	In this paper, we present a unified formulation of these two constructs, where a scene graph is seen as an image-conditioned instantiation of a commonsense knowledge graph.
1045	Implicit Latent Variable Model for Scene-Consistent Motion Forecasting	Sergio Casas; Cole Gulino; Simon Suo; Katie Luo; Renjie Liao; Raquel Urtasun;	In this paper, we aim to learn scene-consistent motion forecasts of complex urban traffic directly from sensor data.
1046	Learning Visual Commonsense for Robust Scene Graph Generation	Alireza Zareian; Zhecan Wang; Haoxuan You; Shih-Fu Chang;	We propose the first method to acquire visual commonsense such as affordance and intuitive physics automatically from data, and use that to improve the robustness of scene understanding.
1047	MPCC: Matching Priors and Conditionals for Clustering	Nicolás Astorga; Pablo Huijse; Pavlos Protopapas; Pablo Estévez;	We propose Matching Priors and Conditionals for Clustering (MPCC), a GAN-based model with an encoder to infer latent variables and cluster categories from data, and a flexible decoder to generate samples from a conditional latent space.
1048	PointAR: Efficient Lighting Estimation for Mobile Augmented Reality	Yiqin Zhao; Tian Guo;	We propose an efficient lighting estimation pipeline that is suitable to run on modern mobile devices, with comparable resource complexities to state-of-the-art mobile deep learning models.
1049	Discrete Point Flow Networks for Efficient Point Cloud Generation	Roman Klokov; Edmond Boyer; Jakob Verbeek;	We introduce a latent variable model that builds on normalizing flows with affine coupling layers to generate 3D point clouds of an arbitrary size given a latent shape representation.
1050	Accelerating Deep Learning with Millions of Classes	Zhuoning Yuan; Zhishuai Guo; Xiaotian Yu; Xiaoyu Wang; Tianbao Yang;	To address these issues, we propose an efficient training framework to handle extreme classification tasks based on Random Projection.
1051	Password-conditioned Anonymization and Deanonymization with Face Identity Transformers	Xiuye Gu; Weixin Luo; Michael S. Ryoo; Yong Jae Lee;	We propose a novel face identity transformer which enables automated photo-realistic password-based anonymization and deanonymization of human faces appearing in visual data.
1052	Inertial Safety from Structured Light	Sizhuo Ma; Mohit Gupta;	We present inertial safety maps (ISM), a novel scene representation designed for fast detection of obstacles in scenarios involving camera or scene motion, such as robot navigation and human-robot interaction.
1053	PointTriNet: Learned Triangulation of 3D Point Sets	Nicholas Sharp; Maks Ovsjanikov;	We present PointTriNet, a differentiable and scalable approach enabling point set triangulation as a layer in 3D learning pipelines.
1054	Toward Unsupervised, Multi-Object Discovery in Large-Scale Image Collections	Huy V. Vo; Patrick Pérez; Jean Ponce;	We build on the optimization approach of Vo et al. [34] with several key novelties: (1) We propose a novel saliency-based region proposal algorithm that achieves significantly higher overlap with ground-truth objects than other competitive methods.
1055	Deep Novel View Synthesis from Colored 3D Point Clouds	Zhenbo Song; Wayne Chen; Dylan Campbell; Hongdong Li;	We propose a new deep neural network which takes a colored 3D point cloud of a scene, and directly synthesizes a photo-realistic image from an arbitrary viewpoint.
1056	Consensus-Aware Visual-Semantic Embedding for Image-Text Matching	Haoran Wang; Ying Zhang; Zhong Ji; Yanwei Pang; Lin Ma;	In this paper, we propose a Consensus-aware Visual-Semantic Embedding (CVSE) model to incorporate the consensus information, namely the commonsense knowledge shared between both modalities, into image-text matching.
1057	Spatial Hierarchy Aware Residual Pyramid Network for Time-of-Flight Depth Denoising	Guanting Dong; Yueyi Zhang; Zhiwei Xiong;	In this paper, we propose a Spatial Hierarchy Aware Residual Pyramid Network, called SHARP-Net, to remove the depth noise by fully exploiting the geometry information of the scene on different scales.
1058	Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding	Songtao He; Favyen Bastani; Satvat Jagwani; Mohammad Alizadeh; Hari Balakrishnan; Sanjay Chawla; Mohamed M. Elshrif; Samuel Madden; Mohammad Amin Sadeghi;	In this paper, we propose a new method, Sat2Graph, which combines the advantages of the two prior categories into a unified framework.
1059	Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition	Di Hu; Xuhong Li; Lichao Mou; Pu Jin; Dong Chen; Liping Jing; Xiaoxiang Zhu; Dejing Dou;	Inspired by the multi-channel perception theory in cognitive science, in this paper we explore a novel audiovisual aerial scene recognition task that uses both images and sounds as input to improve aerial scene recognition performance.
1060	Polarimetric Multi-View Inverse Rendering	Jinyu Zhao; Yusuke Monno; Masatoshi Okutomi;	In this paper, we propose a novel 3D reconstruction method called Polarimetric Multi-View Inverse Rendering (Polarimetric MVIR) that effectively exploits geometric, photometric, and polarimetric cues extracted from input multi-view color polarization images.
1061	SideInfNet: A Deep Neural Network for Semi-Automatic Semantic Segmentation with Side Information	Jing Yu Koh; Duc Thanh Nguyen; Quang-Trung Truong; Sai-Kit Yeung; Alexander Binder;	Inspired by the practicality and applicability of the semi-automatic approach, this paper proposes a novel deep neural network architecture, namely SideInfNet that effectively integrates features learnt from images with side information extracted from user annotations.
1062	Improving Face Recognition by Clustering Unlabeled Faces in the Wild	Aruni RoyChowdhury; Xiang Yu; Kihyuk Sohn; Erik Learned-Miller; Manmohan Chandraker;	To address this, we propose a novel identity separation method based on extreme value theory.
1063	NeuRoRA: Neural Robust Rotation Averaging	Pulak Purkait; Tat-Jun Chin; Ian Reid;	In this work, we aim to build a neural network that learns the noise patterns from the data and predict/regress the model parameters from the noisy relative orientations.
1064	SG-VAE: Scene Grammar Variational Autoencoder to generate new indoor scenes	Pulak Purkait; Christopher Zach; Ian Reid;	In this work, we propose a neural network to learn a generative model for sampling consistent indoor scene layouts.
1065	Unsupervised Learning of Optical Flow with Deep Feature Similarity	Woobin Im; Tae-Kyun Kim; Sung-Eui Yoon;	In this work, rather than the handcrafted features i.e. census or pixel values, we propose to use deep self-supervised features with a novel similarity measure, which fuses multi-layer similarities.
1066	Blended Grammar Network for Human Parsing	Xiaomei Zhang; Yingying Chen; Bingke Zhu; Jinqiao Wang; Ming Tang;	In this paper, we propose a Blended Grammar Network (BGNet), to deal with the challenge.
1067	P²Net: Patch-match and Plane-regularization for Unsupervised Indoor Depth Estimation	Zehao Yu; Lei Jin; Shenghua Gao;	In this paper, we argue that the poor performance suffers from the non-discriminative point-based matching.
1068	Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs	Van-Quang Nguyen; Masanori Suganuma; Takayuki Okatani;	In this paper, we present a neural architecture named Light-weight Transformer for Many Inputs (LTMI) that can efficiently deal with all the interactions between multiple such inputs in visual dialog.
1069	Adaptive Mixture Regression Network with Local Counting Map for Crowd Counting	Xiyang Liu; Jie Yang; Wenrui Ding; Tieqiang Wang; Zhijin Wang; Junjun Xiong;	To solve this problem, we introduce a new target, named local counting map (LCM), to obtain more accurate results than density map based approaches.
1070	BIRNAT: Bidirectional Recurrent Neural Networks with Adversarial Training for Video Snapshot Compressive Imaging	Ziheng Cheng; Ruiying Lu; Zhengjue Wang; Hao Zhang; Bo Chen; Ziyi Meng; Xin Yuan;	We consider the problem of video snapshot compressive imaging (SCI), where multiple high-speed frames are coded by different masks and then summed to a single measurement.
1071	Ultra Fast Structure-aware Deep Lane Detection	Zequn Qin; Huanyu Wang; Xi Li;	Motivated by this observation, we propose a novel, simple, yet effective formulation aiming at extremely fast speed and challenging scenarios.
1072	Cross-Identity Motion Transfer for Arbitrary Objects through Pose-Attentive Video Reassembling	Subin Jeon; Seonghyeon Nam; Seoung Wug Oh; Seon Joo Kim;	We propose an attention-based network for transferring motions between arbitrary objects.
1073	Domain Adaptive Object Detection via Asymmetric Tri-way Faster-RCNN	Zhenwei He; Lei Zhang;	Therefore, in order to avoid the source domain collapse risk caused by parameter sharing, we propose an asymmetric tri-way Faster-RCNN (ATF) for domain adaptive object detection.
1074	Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition	Xiaobo Wang; Tianyu Fu; Shengcai Liao; Shuo Wang; Zhen Lei; Tao Mei;	In this paper, we propose a novel position-aware exclusivity to encourage large diversity among different filters of the same layer to alleviate the low-capability of student network.
1075	Learning Camera-Aware Noise Models	Ke-Chi Chang; Ren Wang; Hung-Jin Lin; Yu-Lun Liu; Chia-Ping Chen; Yu-Lin Chang; Hwann-Tzong Chen;	To tackle this issue, we propose a data-driven approach, where a generative noise model is learned from real-world noise.
1076	Towards Precise Completion of Deformable Shapes	Oshri Halimi; Ido Imanuel; Or Litany; Giovanni Trappolini; Emanuele Rodolà; Leonidas Guibas; Ron Kimmel;	More specifically, given the geometry of a full, articulated object in a given pose, as well as a partial scan of the same object in a different pose, we address the new problem of matching the part to the whole while simultaneously reconstructing the new pose from its partial observation.
1077	Iterative Distance-Aware Similarity Matrix Convolution with Mutual-Supervised Point Elimination for Efficient Point Cloud Registration	Jiahao Li; Changhao Zhang; Ziyao Xu; Hangning Zhou; Chi Zhang;	In this paper, we propose a novel learning-based pipeline for partially overlapping 3D point cloud registration.
1078	Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization	Amir Rahimi; Amirreza Shaban; Thalaiyasingam Ajanthan; Richard Hartley; Byron Boots;	We study the problem of learning a localization model on target classes with weakly supervised image labels, helped by a fully annotated source dataset.
1079	Environment-agnostic Multitask Learning for Natural Language Grounded Navigation	Xin Eric Wang; Vihan Jain; Eugene Ie; William Yang Wang; Zornitsa Kozareva; Sujith Ravi;	To close the gap between seen and unseen environments, we aim at learning a generalized navigation model from two novel perspectives: (1) we introduce a multitask navigation model that can be seamlessly trained on both Vision-Language Navigation (VLN) and Navigation from Dialog History (NDH) tasks, which benefits from richer natural language guidance and effectively transfers knowledge across tasks; (2) we propose to learn environment-agnostic representations for the navigation policy that are invariant among the environments seen during training, thus generalizing better on unseen environments.
1080	TPFN: Applying Outer Product along Time to Multimodal Sentiment Analysis Fusion on Incomplete Data	Binghua Li; Chao Li; Feng Duan; Ning Zheng; Qibin Zhao;	To this end, we propose a novel network architecture termed Time Product Fusion Network (TPFN), which takes the high-order statistics over both modalities and temporal dynamics into account.
1081	ProxyNCA++: Revisiting and Revitalizing Proxy Neighborhood Component Analysis	Eu Wern Teh; Terrance DeVries; Graham W. Taylor;	We consider the problem of distance metric learning (DML), where the task is to learn an effective similarity measure between images.
1082	Learning with Privileged Information for Efficient Image Super-Resolution	Wonkyung Lee; Junghyup Lee; Dohyung Kim; Bumsub Ham;	We introduce in this paper a novel distillation framework, consisting of teacher and student networks, that allows us to boost the performance of FSRCNN drastically.
1083	Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification	Jianing Li; Shiliang Zhang;	This paper tackles this challenge through jointly enforcing visual and temporal consistency in the combination of a local one-hot classification and a global multi-class classification.
1084	Autoencoder-based Graph Construction for Semi-supervised Learning	Mingeun Kang; Kiwon Lee; Yong H. Lee; Changho Suh;	In this paper, we propose a holistic approach that employs a parameterized neural-net-based autoencoder for matrix completion, thereby enabling simultaneous training between models of the classifier and matrix completion.
1085	Virtual Multi-view Fusion for 3D Semantic Segmentation	Abhijit Kundu; Xiaoqi Yin; Alireza Fathi; David Ross; Brian Brewington; Thomas Funkhouser; Caroline Pantofaru;	In this paper we revisit the classic multiview representation of 3D meshes and study several techniques that make them effective for 3D semantic segmentation of meshes.
1086	Decoupling GCN with DropGraph Module for Skeleton-Based Action Recognition	Ke Cheng; Yifan Zhang; Congqi Cao; Lei Shi; Jian Cheng; Hanqing Lu;	In this paper, we rethink the spatial aggregation in existing GCN-based skeleton action recognition methods and discover that they are limited by coupling aggregation mechanism.
1087	Deep Shape from Polarization	Yunhao Ba; Alex Gilbert; Franklin Wang; Jinfa Yang; Rui Chen; Yiqin Wang; Lei Yan; Boxin Shi; Achuta Kadambi;	This paper makes a first attempt to bring the Shape from Polarization (SfP) problem to the realm of deep learning.
1088	A Boundary Based Out-of-Distribution Classifier for Generalized Zero-Shot Learning	Xingyu Chen; Xuguang Lan; Fuchun Sun; Nanning Zheng;	To resolve this problem, in this paper, we propose a boundary based Out-of-Distribution (OOD) classifier which classifies the unseen and seen domains by only using seen samples for training.
1089	Mind the Discriminability: Asymmetric Adversarial Domain Adaptation	Jianfei Yang; Han Zou; Yuxun Zhou; Zhaoyang Zeng; Lihua Xie;	In this paper, we tackle this problem by designing a simple yet effective scheme, namely Asymmetric Adversarial Domain Adaptation (AADA).
1090	SeqXY2SeqZ: Structure Learning for 3D Shapes by Sequentially Predicting 1D Occupancy Segments From 2D Coordinates	Zhizhong Han; Guanhui Qiao; Yu-Shen Liu; Matthias Zwicker;	To avoid dense and irregular sampling in 3D, we propose to represent shapes using 2D functions, where the output of the function at each 2D location is a sequence of line segments inside the shape.
1091	Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking	ShiJie Sun; Naveed Akhtar; XiangYu Song; HuanSheng Song; Ajmal Mian; Mubarak Shah;	To resolve this issue, we introduce Deep Motion Modeling Network (DMM-Net) that can estimate multiple objects’ motion parameters to perform joint detection and association in an end-to-end manner.
1092	Deep FusionNet for Point Cloud Semantic Segmentation	Feihu Zhang; Jin Fang; Benjamin Wah; Philip Torr;	To address these issues, we propose a deep fusion network architecture (FusionNet) with a unique voxel-based mini-PointNet point cloud representation and a new feature aggregation module (fusion module) for large-scale 3D semantic segmentation.
1093	Deep Material Recognition in Light-Fields via Disentanglement of Spatial and Angular Information	Bichuan Guo; Jiangtao Wen; Yuxing Han;	In this paper, we propose an approach that achieves decoupling of angular and spatial information by establishing correspondences in the angular domain, then employs regularization to enforce a rotational invariance.
1094	Dual Adversarial Network for Deep Active Learning	Shuo Wang; Yuexiang Li; Kai Ma; Ruhui Ma; Haibing Guan; Yefeng Zheng;	In this paper, we investigate the overlapping problem of recent uncertainty-based approaches and propose to alleviate the issue by taking representativeness into consideration.
1095	Fully Convolutional Networks for Continuous Sign Language Recognition	Ka Leong Cheng; Zhaoyang Yang; Qifeng Chen; Yu-Wing Tai;	In this paper, we propose a fully convolutional network (FCN) for online SLR to concurrently learn spatial and temporal features from weakly annotated video sequences with only sentence-level annotations given.
1096	Self-adapting confidence estimation for stereo	Matteo Poggi; Filippo Aleotti; Fabio Tosi; Giulio Zaccaroni; Stefano Mattoccia;	In this paper, we propose a flexible and lightweight solution enabling self-adapting confidence estimation agnostic to the stereo algorithm or network.
1097	Deep Surface Normal Estimation on the 2-Sphere with Confidence Guided Semantic Attention	Quewei Li; Jie Guo; Yang Fei; Qinyu Tang; Wenxiu Sun; Jin Zeng; Yanwen Guo;	We propose a deep convolutional neural network (CNN) to estimate surface normal from a single color image accompanied with a low-quality depth channel.
1098	AutoSTR: Efficient Backbone Search for Scene Text Recognition	Hui Zhang; Quanming Yao; Mingkun Yang; Yongchao Xu; Xiang Bai;	In this work, inspired by the success of neural architecture search (NAS), we propose automated STR (AutoSTR), which can address the above issue by searching data-dependent backbones.
1099	Mitigating Embedding and Class Assignment Mismatch in Unsupervised Image Classification	Sungwon Han; Sungwon Park; Sungkyu Park; Sundong Kim; Meeyoung Cha;	To address this limitation, we propose a novel two-stage algorithm in which an embedding module for pretraining precedes a refining module that concurrently performs embedding and class assignment.
1100	Adversarial Training with Bi-directional Likelihood Regularization for Visual Classification	Weitao Wan; Jiansheng Chen; Ming-Hsuan Yang;	We propose that this problem can be solved by explicitly modeling the deep feature distribution, for example as a Gaussian Mixture, and then properly introducing the likelihood regularization into the loss function.
1101	Faster AutoAugment: Learning Augmentation Strategies Using Backpropagation	Ryuichiro Hataya; Zdenek Jan; Kazuki Yoshizoe; Hideki Nakayama;	In this paper, we propose a differentiable policy search pipeline for data augmentation, which is much faster than previous methods.
1102	Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation	Lin Huang; Jianchao Tan; Ji Liu; Junsong Yuan;	To borrow wisdom from this structured learning framework while avoiding the sequential modeling for hand pose, taking a 3D point set as input, we propose to leverage the Transformer architecture with a novel non-autoregressive structured decoding mechanism.
1103	Boundary-Aware Cascade Networks for Temporal Action Segmentation	Zhenzhi Wang; Ziteng Gao; Limin Wang; Zhifeng Li; Gangshan Wu;	To address these problems, we present a new boundary-aware cascade network by introducing two novel components.
1104	Towards Content-Independent Multi-Reference Super-Resolution: Adaptive Pattern Matching and Feature Aggregation	Xu Yan; Weibing Zhao; Kun Yuan; Ruimao Zhang; Zhen Li; Shuguang Cui;	This work investigates a novel multi-reference based super-resolution problem by proposing a Content Independent Multi-Reference Super-Resolution (CIMR-SR) model, which is able to adaptively match the visual pattern between references and target image in the low resolution and enhance the feature representation of the target image in the higher resolution.
1105	Inference Graphs for CNN Interpretation	Yael Konforti; Alon Shpigler; Boaz Lerner; Aharon Bar-Hillel;	We propose to model the network hidden layers activity using probabilistic models.
1106	An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension	Liangcheng Li; Feiyu Gao; Jiajun Bu; Yongpan Wang; Zhi Yu; Qi Zheng;	To tackle the above problems, we propose a novel end-to-end OCR text reorganizing model.
1107	Improving Query Efficiency of Black-box Adversarial Attack	Yang Bai; Yuyuan Zeng; Yong Jiang; Yisen Wang; Shu-Tao Xia; Weiwei Guo;	Therefore, in order to improve query efficiency, we explore the distribution of adversarial examples around benign inputs with the help of image structure information characterized by a Neural Process, and propose a Neural Process based black-box adversarial attack (NP-Attack) in this paper.
1108	Self-similarity Student for Partial Label Histopathology Image Segmentation	Hsien-Tzu Cheng; Chun-Fu Yeh; Po-Chen Kuo; Andy Wei; Keng-Chi Liu; Mong-Chi Ko; Kuan-Hua Chao; Yu-Ching Peng; Tyng-Luh Liu;	To learn from these patches, we propose Self-similarity Student, combining teacher-student model paradigm with similarity learning.
1109	BioMetricNet: deep unconstrained face verification through learning of metrics regularized onto Gaussian distributions	Arslan Ali; Matteo Testa; Tiziano Bianchi; Enrico Magli;	We present BioMetricNet: a novel framework for deep unconstrained face verification which learns a regularized metric to compare facial features.
1110	A Decoupled Learning Scheme for Real-world Burst Denoising from Raw Images	Zhetong Liang; Shi Guo; Hong Gu; Huaqi Zhang; Lei Zhang;	In this paper, a novel multi-frame CNN model is carefully designed, which decouples the learning of motion from the learning of noise statistics.
1111	Global-and-Local Relative Position Embedding for Unsupervised Video Summarization	Yunjae Jung; Donghyeon Cho; Sanghyun Woo; In So Kweon;	In this paper, we therefore present a novel input decomposition strategy, which samples the input both globally and locally.
1112	Real-World Blur Dataset for Learning and Benchmarking Deblurring Algorithms	Jaesung Rim; Haeyun Lee; Jucheol Won; Sunghyun Cho;	In this work, we present a large-scale dataset of real-world blurred images and ground truth sharp images for learning and benchmarking single image deblurring methods.
1113	SPARK: Spatial-aware Online Incremental Attack Against Visual Tracking	Qing Guo; Xiaofei Xie; Felix Juefei-Xu; Lei Ma; Zhongguo Li; Wanli Xue; Wei Feng; Yang Liu;	In this paper, we identify a new task for the adversarial attack to visual tracking: online generating imperceptible perturbations that mislead trackers along with an incorrect (Untargeted Attack, UA) or specified trajectory (Targeted Attack, TA).
1114	CenterNet Heatmap Propagation for Real-time Video Object Detection	Zhujun Xu; Emir Hrustic; Damien Vivet;	In this work, we introduce a method based on a one-stage detector called CenterNet.
1115	Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection	Youwei Pang; Lihe Zhang; Xiaoqi Zhao; Huchuan Lu;	In the end, we implement a kind of more flexible and efficient multi-scale cross-modal feature processing, i.e. dynamic dilated pyramid module.
1116	SOLAR: Second-Order Loss and Attention for Image Retrieval	Tony Ng; Vassileios Balntas; Yurun Tian; Krystian Mikolajczyk;	In this work, we explore two second-order components. One is focused on second-order spatial information to increase the performance of image descriptors, both local and global. It is used to re-weight feature maps, and thus emphasise salient image locations that are subsequently used for description. The second component is concerned with a second-order similarity (SOS) loss, that we extend to global descriptors for image retrieval, and is used to enhance the triplet loss with hard-negative mining.
1117	Fixing Localization Errors to Improve Image Classification	Guolei Sun; Salman Khan; Wen Li; Hisham Cholakkal; Fahad Shahbaz Khan; Luc Van Gool;	In this work, we explore a new direction towards the possible use of CAM in deep network learning process.
1118	PatchPerPix for Instance Segmentation	Lisa Mais; Peter Hirsch; Dagmar Kainmueller;	In this paper we present a novel method for proposal-free instance segmentation that can handle sophisticated object shapes that span large parts of an image and form dense object clusters with crossovers.
1119	Attend and Segment: Attention Guided Active Semantic Segmentation	Soroush Seifi; Tinne Tuytelaars;	In this paper we propose a method to gradually segment a scene given a sequence of partial observations.
1120	Accelerating CNN Training by Pruning Activation Gradients	Xucheng Ye; Pengcheng Dai; Junyu Luo; Xin Guo; Yingjie Qi; Jianlei Yang; Yiran Chen;	Hence, we consider pruning these very small gradients randomly to accelerate CNN training according to the statistical distribution of activation gradients.
1121	Global and Local Enhancement Networks for Paired and Unpaired Image Enhancement	Han-Ul Kim; Young Jun Koh; Chang-Su Kim;	A novel approach for paired and unpaired image enhancement is proposed in this work.
1122	Probabilistic Anchor Assignment with IoU Prediction for Object Detection	Kang Kim; Hee Seok Lee;	In this paper we propose a novel anchor assignment strategy that adaptively separates anchors into positive and negative samples for a ground truth bounding box according to the model’s learning status such that it is able to reason the separation in a probabilistic manner.
1123	Eyeglasses 3D shape reconstruction from a single face image	Yating Wang; Quan Wang; Feng Xu;	In this paper, we present an automatic system that recovers the 3D shape of eyeglasses from a single face image with an arbitrary head pose.
1124	Temporal Complementary Learning for Video Person Re-Identification	Ruibing Hou; Hong Chang; Bingpeng Ma; Shiguang Shan; Xilin Chen;	This paper proposes a Temporal Complementary Learning Network that extracts complementary features of consecutive video frames for video person re-identification.
1125	HoughNet: Integrating near and long-range evidence for bottom-up object detection	Nermin Samet; Samet Hicsonmez; Emre Akbas;	This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method.
1126	Graph Wasserstein Correlation Analysis for Movie Retrieval	Xueya Zhang; Tong Zhang; Xiaobin Hong; Zhen Cui; Jian Yang;	In this work, we propose Graph Wasserstein Correlation Analysis (GWCA) to deal with the core issue therein, i.e., cross heterogeneous graph comparison.
1127	Context-Aware RCNN: A Baseline for Action Detection in Videos	Jianchao Wu; Zhanghui Kuang; Limin Wang; Wayne Zhang; Gangshan Wu;	Thus, we revisit RCNN for actor-centric action recognition via cropping and resizing image patches around actors before feature extraction with I3D deep network.
1128	Full-Time Monocular Road Detection Using Zero-Distribution Prior of Angle of Polarization	Ning Li; Yongqiang Zhao; Quan Pan; Seong G. Kong; Jonathan Cheung-Wai Chan;	This paper presents a road detection technique based on long-wave infrared (LWIR) polarization imaging for autonomous navigation regardless of illumination conditions, day and night.
1129	A Flexible Recurrent Residual Pyramid Network for Video Frame Interpolation	Haoxian Zhang; Yang Zhao; Ronggang Wang;	Inspired by classical pyramid energy minimization optical flow algorithms, this paper proposes a recurrent residual pyramid network (RRPN) for video frame interpolation.
1130	Learning Enriched Features for Real Image Restoration and Enhancement	Syed Waqas Zamir; Aditya Arora; Salman Khan; Munawar Hayat; Fahad Shahbaz Khan; Ming-Hsuan Yang; Ling Shao;	In this paper, we present an architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network and receiving strong contextual information from the low-resolution representations.
1131	Detail Preserved Point Cloud Completion via Separated Feature Aggregation	Wenxiao Zhang; Qingan Yan; Chunxia Xiao;	In this work, instead of using a global feature to recover the whole complete surface, we explore multi-level features by hierarchical feature learning and represent the existing-part and the missing-part respectively.
1132	LabelEnc: A New Intermediate Supervision Method for Object Detection	Miao Hao; Yitao Liu; Xiangyu Zhang; Jian Sun;	In this paper we propose a new intermediate supervision method, named LabelEnc, to boost the training of object detection systems.
1133	Unsupervised Learning of Category-Specific Symmetric 3D Keypoints from Point Sets	Clara Fernandez-Labrador; Ajad Chhatkuli; Danda Pani Paudel; Jose J. Guerrero; Cédric Demonceaux; Luc Van Gool;	This paper aims at learning such 3D keypoints, in an unsupervised manner, using a collection of misaligned 3D point clouds of objects from an unknown category.
1134	PAMS: Quantized Super-Resolution via Parameterized Max Scale	Huixia Li; Chenqian Yan; Shaohui Lin; Xiawu Zheng; Baochang Zhang; Fan Yang; Rongrong Ji;	To address these two issues, we propose a new quantization scheme termed PArameterized Max Scale (PAMS), which applies the trainable truncated parameter to explore the upper bound of the quantization range adaptively.
1135	SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds	Xinge Zhu; Yuexin Ma; Tai Wang; Yan Xu; Jianping Shi; Dahua Lin;	In this paper, we propose a novel 3D shape signature to explore the shape information from point clouds.
1136	OID: Outlier Identifying and Discarding in Blind Image Deblurring	Liang Chen; Faming Fang; Jiawei Zhang; Jun Liu; Guixu Zhang;	To address these problems, this paper develops a simple yet effective Outlier Identifying and Discarding (OID) method, which alleviates limitations in existing Maximum A Posteriori (MAP)-based deblurring models when significant outliers are present.
1137	Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors	Mateusz Michalkiewicz; Sarah Parisot; Stavros Tsogkas; Mahsa Baktashmotlagh; Anders Eriksson; Eugene Belilovsky;	In this work we demonstrate experimentally that naive baselines do not apply when the goal is to learn to reconstruct novel objects using very few examples, and that in a few-shot learning setting, the network must learn concepts that can be applied to new categories, avoiding rote memorization.
1138	Enhanced Sparse Model for Blind Deblurring	Liang Chen; Faming Fang; Shen Lei; Fang Li; Guixu Zhang;	In this paper, we develop a new term to better fit the complex natural noise.
1139	SumGraph: Video Summarization via Recursive Graph Modeling	Jungin Park; Jiyoung Lee; Ig-Jae Kim; Kwanghoon Sohn;	We propose recursive graph modeling networks for video summarization, termed SumGraph, to represent a relation graph, where frames are regarded as nodes and nodes are connected by semantic relationships among frames.
1140	Feature Normalized Knowledge Distillation for Image Classification	Kunran Xu; Lai Rui; Yishi Li; Lin Gu;	From this perspective, we systematically analyze the distillation mechanism and demonstrate that the L2-norm of the feature in penultimate layer would be too large under the influence of label noise, and the temperature T in KD could be regarded as a correction factor for L2-norm to suppress the impact of noise.
1141	A Metric Learning Reality Check	Kevin Musgrave; Serge Belongie; Ser-Nam Lim;	Deep metric learning papers from the past four years have consistently claimed great advances in accuracy, often more than doubling the performance of decade-old methods. In this paper, we take a closer look at the field to see if this is actually true.
1142	FTL: A universal framework for training low-bit DNNs via Feature Transfer	Kunyuan Du; Ya Zhang; Haibing Guan; Qi Tian; Shenggan Cheng; James Lin;	Here we introduce a novel feature-based knowledge transfer framework, which utilizes a 32-bit DNN to guide the training of a low-bit DNN via feature maps.
1143	XingGAN for Person Image Generation	Hao Tang; Song Bai; Li Zhang; Philip H.S. Torr; Nicu Sebe;	We propose a novel Generative Adversarial Network (XingGAN or CrossingGAN) for person image generation tasks, i.e., translating the pose of a given person to a desired one.
1144	GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering	Chuang Niu; Jun Zhang; Ge Wang; Jimin Liang;	We propose a self-supervised Gaussian ATtention network for image Clustering (GATCluster).
1145	VCNet: A Robust Approach to Blind Image Inpainting	Yi Wang; Ying-Cong Chen; Xin Tao; Jiaya Jia;	In this paper, we relax the assumption by defining a new blind inpainting setting, making training a blind inpainting neural system robust against various unknown missing region patterns.
1146	Learning to Predict Context-adaptive Convolution for Semantic Segmentation	Jianbo Liu; Junjun He; Yu Qiao; Jimmy S. Ren; Hongsheng Li;	In this paper, we propose a Context-adaptive Convolution Network (CaC-Net) to predict a spatially-varying feature weighting vector for each spatial location of the semantic feature maps.
1147	EfficientFCN: Holistically-guided Decoding for Semantic Segmentation	Jianbo Liu; Junjun He; Jiawei Zhang; Jimmy S. Ren; Hongsheng Li;	In this paper, we propose the EfficientFCN, whose backbone is a common ImageNet pretrained network without any dilated convolution.
1148	GroSS: Group-Size Series Decomposition for Grouped Architecture Search	Henry Howard-Jenkins; Yiwen Li; Victor Adrian Prisacariu;	We present a novel approach which is able to explore the configuration of grouped convolutions within neural networks.
1149	Efficient Adversarial Attacks for Visual Object Tracking	Siyuan Liang; Xingxing Wei; Siyuan Yao; Xiaochun Cao;	We present an end-to-end network FAN (Fast Attack Network) that uses a novel drift loss combined with the embedded feature loss to attack the Siamese network based trackers.
1150	Globally-Optimal Event Camera Motion Estimation	Xin Peng; Yifu Wang; Ling Gao; Laurent Kneip;	The present paper looks at fronto-parallel motion estimation of an event camera.
1151	Weakly-supervised Learning of Human Dynamics	Petrissa Zell; Bodo Rosenhahn; Bastian Wandt;	This paper proposes a weakly-supervised learning framework for dynamics estimation from human motion.
1152	Journey Towards Tiny Perceptual Super-Resolution	Royson Lee; Łukasz Dudziak; Mohamed Abdelfattah; Stylianos I. Venieris; Hyeji Kim; Hongkai Wen; Nicholas D. Lane;	In this work, we propose a neural architecture search (NAS) approach that integrates NAS and generative adversarial networks (GANs) with recent advances in perceptual SR and pushes the efficiency of small perceptual SR models to facilitate on-device execution.
1153	What makes fake images detectable? Understanding properties that generalize	Lucy Chai; David Bau; Ser-Nam Lim; Phillip Isola;	We seek to understand what properties of these fake images make them detectable and identify what generalizes across different model architectures, datasets, and variations in training.
1154	Embedding Propagation: Smoother Manifold for Few-Shot Classification	Pau Rodríguez; Issam Laradji; Alexandre Drouin; Alexandre Lacoste;	In this work, we propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification.
1155	Category Level Object Pose Estimation via Neural Analysis-by-Synthesis	Xu Chen; Zijian Dong; Jie Song; Andreas Geiger; Otmar Hilliges;	In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module that is capable of implicitly representing the appearance, shape and pose of entire object categories, thus rendering the need for explicit CAD models per object instance unnecessary.
1156	High-Fidelity Synthesis with Disentangled Representation	Wonkwang Lee; Donggyun Kim; Seunghoon Hong; Honglak Lee;	We propose an Information-Distillation Generative Adversarial Network (ID-GAN), a simple yet generic framework that can easily incorporate the existing state-of-the-art models for both disentanglement learning and high-fidelity synthesis.
1157	PL₁P – Point-line Minimal Problems under Partial Visibility in Three Views	Timothy Duff; Kathlén Kohn; Anton Leykin; Tomas Pajdla;	We present a complete classification of minimal problems for generic arrangements of points and lines in space observed partially by three calibrated perspective cameras when each line is incident to at most one point.
1158	Prediction and Recovery for Adaptive Low-Resolution Person Re-Identification	Ke Han; Yan Huang; Zerui Chen; Liang Wang; Tieniu Tan;	In this paper, we propose a novel Prediction, Recovery and Identification (PRI) model for LR re-id, which adaptively recovers missing details by predicting a preferable scale factor based on the image content.
1159	Learning Canonical Representations for Scene Graph to Image Generation	Roei Herzig; Amir Bar; Huijuan Xu; Gal Chechik; Trevor Darrell; Amir Globerson;	In this work, we show that one limitation of current methods is their inability to capture semantic equivalence in graphs.
1160	Adversarial Robustness on In- and Out-Distribution Improves Explainability	Maximilian Augustin; Alexander Meinke; Matthias Hein;	In this work we propose RATIO, a training procedure for Robustness via Adversarial Training on In- and Out-distribution, which leads to robust models with reliable and robust confidence estimates on the out-distribution.
1161	Deformable Style Transfer	Sunnie S. Y. Kim; Nicholas Kolkin; Jason Salavon; Gregory Shakhnarovich;	We propose deformable style transfer (DST), an optimization-based approach that jointly stylizes the texture and geometry of a content image to better match a style image.
1162	Aligning Videos in Space and Time	Senthil Purushwalkam; Tian Ye; Saurabh Gupta; Abhinav Gupta;	In this paper, we focus on the task of extracting visual correspondences across videos.
1163	Neural Wireframe Renderer: Learning Wireframe to Image Translations	Yuan Xue; Zihan Zhou; Xiaolei Huang;	In this paper, we bridge the information gap by generating photo-realistic rendering of indoor scenes from wireframe models in an image translation framework.
1164	RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax	Xiao Zhang; Rui Zhao; Yu Qiao; Hongsheng Li;	To address this problem, this paper introduces novel Radial Basis Function (RBF) distances to replace the commonly used inner products in the softmax loss function, such that it can adaptively assign losses to regularize the intra-class and inter-class distances by reshaping the relative differences, thus creating more representative prototypes of classes to improve optimization.
1165	Testing the Safety of Self-driving Vehicles by Simulating Perception and Prediction	Kelvin Wong; Qiang Zhang; Ming Liang; Bin Yang; Renjie Liao; Abbas Sadat; Raquel Urtasun;	We present a novel method for testing the safety of self-driving vehicles in simulation.
1166	Determining the Relevance of Features for Deep Neural Networks	Christian Reimers; Jakob Runge; Joachim Denzler;	In this work, we present a novel method to identify whether a specific feature is relevant to a classifier’s decision or not.
1167	Weakly Supervised Semantic Segmentation with Boundary Exploration	Liyi Chen; Weiwei Wu; Chenchen Fu; Xiao Han; Yuntao Zhang;	To obtain semantic segmentation under weak supervision, this paper presents a simple yet effective approach based on the idea of explicitly exploring object boundaries from training images to keep coincidence of segmentation and boundaries.
1168	GANHopper: Multi-Hop GAN for Unsupervised Image-to-Image Translation	Wallace Lira; Johannes Merz; Daniel Ritchie; Daniel Cohen-Or; Hao Zhang;	We introduce GANHopper, an unsupervised image-to-image translation network that transforms images gradually between two domains, through multiple hops.
1169	DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild	Philippe Weinzaepfel; Romain Brégier; Hadrien Combaluzier; Vincent Leroy; Grégory Rogez;	We introduce DOPE, the first method to detect and estimate whole-body 3D human poses, including bodies, hands and faces, in the wild.
1170	Multi-view adaptive graph convolutions for graph classification	Nikolas Adaloglou; Nicholas Vretos; Petros Daras;	In this paper, a novel multi-view methodology for graph-based neural networks is proposed.
1171	Instance Adaptive Self-Training for Unsupervised Domain Adaptation	Ke Mei; Chuang Zhu; Jiaqi Zou; Shanghang Zhang;	In this paper, we propose an instance adaptive self-training framework for UDA on the task of semantic segmentation.
1172	Weight Decay Scheduling and Knowledge Distillation for Active Learning	Juseung Yun; Byungjoo Kim; Junmo Kim;	However, in this paper, we focus on the data-incremental nature of active learning, and propose a method for properly tuning the weight decay as the amount of data increases.
1173	HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs	Hai Victor Habi; Roy H. Jennings; Arnon Netzer;	In this work, we introduce the Hardware Friendly Mixed Precision Quantization Block (HMQ) in order to meet this requirement.
1174	Truncated Inference for Latent Variable Optimization Problems: Application to Robust Estimation and Learning	Christopher Zach; Huu Le;	We aim to remove the need to maintain the latent variables and propose two formally justified methods, that dynamically adapt the required accuracy of latent variable inference.
1175	Geometry Constrained Weakly Supervised Object Localization	Weizeng Lu; Xi Jia; Weicheng Xie; Linlin Shen; Yicong Zhou; Jinming Duan;	We propose a geometry constrained network, termed GCNet, for weakly supervised object localization (WSOL).
1176	Duality Diagram Similarity: a generic framework for initialization selection in task transfer learning	Kshitij Dwivedi; Jiahui Huang; Radoslaw Martin Cichy; Gemma Roig;	In this paper, we tackle an open research question in transfer learning, which is selecting a model initialization to achieve high performance on a new task, given several pre-trained models.
1177	OneGAN: Simultaneous Unsupervised Learning of Conditional Image Generation, Foreground Segmentation, and Fine-Grained Clustering	Yaniv Benny; Lior Wolf;	We present a method for simultaneously learning, in an unsupervised manner, (i) a conditional image generator, (ii) foreground extraction and segmentation, (iii) clustering into a two-level class hierarchy, and (iv) object removal and background completion, all done without any use of annotation.
1178	Mining self-similarity: Label super-resolution with epitomic representations	Nikolay Malkin; Anthony Ortiz; Nebojsa Jojic;	We derive a new training algorithm for epitomes which allows, for the first time, learning from very large data sets and derive a label super-resolution algorithm as a statistical inference algorithm over epitomic representations.
1179	AE-OT-GAN: Training GANs from data specific latent distribution	Dongsheng An; Yang Guo; Min Zhang; Xin Qi; Na Lei; Xianfang Gu;	In this paper, we propose the AE-OT-GAN model to utilize the advantages of both models: generate high quality images and at the same time overcome the mode collapse/mixture problems.
1180	Null-sampling for Interpretable and Fair Representations	Thomas Kehrenberg; Myles Bartlett; Oliver Thomas; Novi Quadrianto;	We propose to learn invariant representations, in the data domain, to achieve interpretability in algorithmic fairness.
1181	Guiding Monocular Depth Estimation Using Depth-Attention Volume	Lam Huynh; Phong Nguyen-Ha; Jiri Matas; Esa Rahtu; Janne Heikkilä;	In this paper, we propose guiding depth estimation to favor planar structures that are ubiquitous especially in indoor environments.
1182	Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping	Adam W. Harley; Shrinidhi Kowshika Lakshmikanth; Paul Schydlo; Katerina Fragkiadaki;	We propose to leverage multiview data of static points in arbitrary scenes (static or dynamic), to learn a neural 3D mapping module which produces features that are correspondable across time.
1183	Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer	Yuanyi Zhong; Jianfeng Wang; Jian Peng; Lei Zhang;	In this paper, we propose an effective knowledge transfer framework to boost the weakly supervised object detection accuracy with the help of an external fully-annotated source dataset, whose categories may not overlap with the target domain.
1184	BézierSketch: A generative model for scalable vector sketches	Ayan Das; Yongxin Yang; Timothy Hospedales; Tao Xiang; Yi-Zhe Song;	In this paper we present BézierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution.
1185	Semantic Relation Preserving Knowledge Distillation for Image-to-Image Translation	Zeqi Li; Ruowei Jiang; Parham Aarabi;	In this work, we propose a novel method to address this problem by applying knowledge distillation together with distillation of a semantic relation preserving matrix.
1186	Domain Adaptation Through Task Distillation	Brady Zhou; Nimit Kalra; Philipp Krähenbühl;	We use these recognition datasets to link up a source and target domain to transfer models between them in a task distillation framework.
1187	PatchAttack: A Black-box Texture-based Attack with Reinforcement Learning	Chenglin Yang; Adam Kortylewski; Cihang Xie; Yinzhi Cao; Alan Yuille;	Our proposed PatchAttack is query efficient and can break models for both targeted and non-targeted attacks.
1188	More Classifiers, Less Forgetting: A Generic Multi-classifier Paradigm for Incremental Learning	Yu Liu; Sarah Parisot; Gregory Slabaugh; Xu Jia; Ales Leonardis; Tinne Tuytelaars;	Since those regularization strategies are mostly associated with classifier outputs, we propose a MUlti-Classifier (MUC) incremental learning paradigm that integrates an ensemble of auxiliary classifiers to estimate more effective regularization constraints.
1189	Extending and Analyzing Self-Supervised Learning Across Domains	Bram Wallace; Bharath Hariharan;	We discover, among other findings, that Rotation is the most semantically meaningful task, while much of the performance of Jigsaw is attributable to the nature of its induced distribution rather than semantic understanding.
1190	Multi-Source Open-Set Deep Adversarial Domain Adaptation	Sayan Rakshit; Dipesh Tamboli; Pragati Shuddhodhan Meshram; Biplab Banerjee; Gemma Roig; Subhasis Chaudhuri;	As a remedy, we propose a novel adversarial learning-driven approach to deal with the MS-OSDA setup.
1191	Neural Batch Sampling with Reinforcement Learning for Semi-Supervised Anomaly Detection	Wen-Hsuan Chu; Kris M. Kitani;	In particular, we propose a novel semi-supervised learning algorithm for anomaly detection and segmentation using an anomaly classifier that uses as input the loss profile of a data sample processed through an autoencoder.
1192	LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities	Baoxiong Jia; Yixin Chen; Siyuan Huang; Yixin Zhu; Song-Chun Zhu;	We introduce the LEMMA dataset to provide a single home to address these missing dimensions with carefully designed settings, wherein the numbers of tasks and agents vary to highlight different learning objectives.
1193	Teaching Cameras to Feel: Estimating Tactile Physical Properties of Surfaces From Images	Matthew Purri; Kristin Dana;	In this work, we introduce the challenging task of estimating a set of tactile physical properties from visual information.
1194	Accurate Optimization of Weighted Nuclear Norm for Non-Rigid Structure from Motion	José Pedro Iglesias; Carl Olsson; Marcus Valtonen Örnhag;	In this paper we show that more accurate results can in many cases be achieved with 2nd order methods.
1195	Proposal-based Video Completion	Yuan-Ting Hu; Heng Wang; Nicolas Ballas; Kristen Grauman; Alexander G. Schwing;	In contrast, in this paper, we propose a video inpainting algorithm based on proposals: we use 3D convolutions to obtain an initial inpainting estimate which is subsequently refined by fusing a generated set of proposals.
1196	HGNet: Hybrid Generative Network for Zero-shot Domain Adaptation	Haifeng Xia; Zhengming Ding;	In this paper, we propose a novel algorithm, Hybrid Generative Network (HGNet) for Zero-shot Domain Adaptation, which embeds an adaptive feature separation (AFS) module into generative architecture.
1197	Beyond Monocular Deraining: Stereo Image Deraining via Semantic Understanding	Kaihao Zhang; Wenhan Luo; Wenqi Ren; Jingwen Wang; Fang Zhao; Lin Ma; Hongdong Li;	In this paper, we present a Paired Rain Removal Network (PRRNet), which exploits both stereo images and semantic information.
1198	DBQ: A Differentiable Branch Quantizer for Lightweight Deep Neural Networks	Hassan Dbouk; Hetul Sanghvi; Mahesh Mehendale; Naresh Shanbhag;	To this end, we present a novel fully differentiable non-uniform quantizer that can be seamlessly mapped onto efficient ternary-based dot product engines.
1199	All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced Motion Modeling	Zhixiang Chi; Rasoul Mohammadi Nasiri; Zheng Liu; Juwei Lu; Jin Tang; Konstantinos N Plataniotis;	Departing from the state-of-the-art, this work introduces a true multi-frame interpolator.
1200	A Broader Study of Cross-Domain Few-Shot Learning	Yunhui Guo; Noel C. Codella; Leonid Karlinsky; James V. Codella; John R. Smith; Kate Saenko; Tajana Rosing; Rogerio Feris;	In this paper, we propose the Broader Study of Cross-Domain Few-Shot Learning (BSCD-FSL) benchmark, consisting of image data from a diverse assortment of image acquisition methods.
1201	Practical Poisoning Attacks on Neural Networks	Junfeng Guo; Cong Liu;	This paper presents a new, practical targeted poisoning attack method on neural networks in vision domain, namely BlackCard.
1202	Unsupervised Domain Adaptation in the Dissimilarity Space for Person Re-identification	Djebril Mekhazni; Amran Bhuiyan; George Ekladious; Eric Granger;	In this paper, we propose a novel Dissimilarity-based Maximum Mean Discrepancy (D-MMD) loss for aligning pair-wise distances that can be optimized via gradient descent using relatively small batch sizes.
1203	Learn distributed GAN with Temporary Discriminators	Hui Qu; Yikai Zhang; Qi Chang; Zhennan Yan; Chao Chen; Dimitris Metaxas;	In this work, we propose a method for training distributed GAN with sequential temporary discriminators.
1204	SemifreddoNets: Partially Frozen Neural Networks for Efficient Computer Vision Systems	Leo F Isikdogan; Bhavin V Nayak; Chyuan-Tyng Wu; Joao Peralta Moreira; Sushma Rao; Gilad Michael;	We propose a system comprised of fixed-topology neural networks having partially frozen weights, named SemifreddoNets.
1205	Improving Adversarial Robustness by Enforcing Local and Global Compactness	Anh Bui; Trung Le; He Zhao; Paul Montague; Olivier deVel; Tamas Abraham; Dinh Phung;	In this work, based on an observation from a previous study that the representations of a clean data example and its adversarial examples become more divergent in higher layers of a deep neural net, we propose the Adversary Divergence Reduction Network which enforces local/global compactness and the clustering assumption over an intermediate layer of a deep neural network.
1206	TopoAL: An Adversarial Learning Approach for Topology-Aware Road Segmentation	Subeesh Vasu; Mateusz Kozinski; Leonardo Citraro; Pascal Fua;	To address this issue, we introduce an Adversarial Learning (AL) strategy tailored for our purposes.
1207	Channel selection using Gumbel Softmax	Charles Herrmann; Richard Strong Bowen; Ramin Zabih;	We propose a single end-to-end framework that can improve inference efficiency in both settings.
1208	Exploiting Temporal Coherence for Self-Supervised One-shot Video Re-identification	Dripta S. Raychaudhuri; Amit K. Roy-Chowdhury;	In this paper, we propose a new framework named Temporal Consistency Progressive Learning, which uses temporal coherence as a novel self-supervised auxiliary task in the one-shot learning paradigm to capture such relationships amongst the unlabeled tracklets.
1209	An Efficient Training Framework for Reversible Neural Architectures	Zixuan Jiang; Keren Zhu; Mingjie Liu; Jiaqi Gu; David Z. Pan;	In this work, we formulate the decision problem for reversible operators with training time as the objective function and memory usage as the constraint.
1210	Box2Seg: Attention Weighted Loss and Discriminative Feature Learning for Weakly Supervised Segmentation	Viveka Kulharia; Siddhartha Chandra; Amit Agrawal; Philip Torr; Ambrish Tyagi;	We propose a weakly supervised approach to semantic segmentation using bounding box annotations.
1211	FreeCam3D: Snapshot Structured Light 3D with Freely-Moving Cameras	Yicheng Wu; Vivek Boominathan; Xuan Zhao; Jacob T. Robinson; Hiroshi Kawasaki; Aswin Sankaranarayanan; Ashok Veeraraghavan;	We propose a freeform structured light system that does not rigidly constrain camera(s) to the projector.
1212	One-Pixel Signature: Characterizing CNN Models for Backdoor Detection	Shanjiaoyang Huang; Weiqi Peng; Zhiwei Jia; Zhuowen Tu;	We tackle the backdoor detection problem for convolutional neural networks (CNNs) by proposing a new representation called one-pixel signature.
1213	Learning to Transfer Learn: Reinforcement Learning-Based Selection for Adaptive Transfer Learning	Linchao Zhu; Sercan Ö. Arık; Yi Yang; Tomas Pfister;	We propose a novel adaptive transfer learning framework, learning to transfer learn (L2TL), to improve performance on a target dataset by careful extraction of the related information from a source dataset.
1214	Structure-Aware Generation Network for Recipe Generation from Images	Hao Wang; Guosheng Lin; Steven C. H. Hoi; Chunyan Miao;	In this paper, we are interested in automatically generating cooking instructions for food.
1215	A Simple and Effective Framework for Pairwise Deep Metric Learning	Qi Qi; Yan Yan; Zixuan Wu; Xiaoyu Wang; Tianbao Yang;	In this paper, we cast DML as a simple pairwise binary classification problem that classifies a pair of examples as similar or dissimilar.
1216	Meta-rPPG: Remote Heart Rate Estimation Using a Transductive Meta-Learner	Eugene Lee; Evan Chen; Chen-Yi Lee;	To cope with the unforeseeable distributional changes during deployment, we propose a transductive meta-learner that takes unlabeled samples during testing (deployment) for a self-supervised weight adjustment (also known as transductive inference), providing fast adaptation to the distributional changes.
1217	A Recurrent Transformer Network for Novel View Action Synthesis	Kara Marie Schatz; Erik Quintanilla; Shruti Vyas; Yogesh S Rawat;	In this work, we address the problem of synthesizing human actions from novel views.
1218	Multi-view Action Recognition using Cross-view Video Prediction	Shruti Vyas; Yogesh S Rawat; Mubarak Shah;	In this work, we address the problem of action recognition in a multi-view environment.
1219	Learning Discriminative Feature with CRF for Unsupervised Video Object Segmentation	Mingmin Zhen; Shiwei Li; Lei Zhou; Jiaxiang Shang; Haoan Feng; Tian Fang; Long Quan;	In this paper, we introduce a novel network, called discriminative feature network (DFNet), to address the unsupervised video object segmentation task.
1220	SMART: Simultaneous Multi-Agent Recurrent Trajectory Prediction	Sriram N N; Buyu Liu; Francesco Pittaluga; Manmohan Chandraker;	We propose advances that address two key challenges in future trajectory prediction: (i) multimodality in both training data and predictions and (ii) constant time inference regardless of number of agents.
1221	Label-Driven Reconstruction for Domain Adaptation in Semantic Segmentation	Jinyu Yang; Weizhi An; Sheng Wang; Xinliang Zhu; Chaochao Yan; Junzhou Huang;	Here, we present an innovative framework, designed to mitigate the image translation bias and align cross-domain features with the same category.
1222	Efficient Outdoor 3D Point Cloud Semantic Segmentation for Critical Road Objects and Distributed Contexts	Chi-Chong Wong; Chi-Man Vong;	In this work, we propose a novel neural network model called Attention-based Dynamic Convolution Network with Self-Attention Global Contexts (ADConvnet-SAGC), which (i) applies an attention mechanism to adaptively focus on the most related neighboring points for learning the point features of 3D objects, especially small objects with diverse shapes, (ii) applies a self-attention module to efficiently capture long-range distributed contexts from the input, and (iii) uses a more reasonable and compact architecture for efficient inference.
1223	Attributional Robustness Training using Input-Gradient Spatial Alignment	Mayank Singh; Nupur Kumari; Puneet Mangla; Abhishek Sinha; Vineeth N Balasubramanian; Balaji Krishnamurthy;	In this work, we study the problem of attributional robustness (i.e. models having robust explanations) by showing an upper bound for attributional vulnerability in terms of spatial correlation between the input image and its explanation map.
1224	Reducing the Sim-to-Real Gap for Event Cameras	Timo Stoffregen; Cedric Scheerlinck; Davide Scaramuzza; Tom Drummond; Nick Barnes; Lindsay Kleeman; Robert Mahony;	To address this, we present a new High Quality Frames (HQF) dataset, containing events and ground truth frames from a DAVIS240C that are well-exposed and minimally motion-blurred.
1225	Spatial Geometric Reasoning for Room Layout Estimation via Deep Reinforcement Learning	Liangliang Ren; Yangyang Song; Jiwen Lu; Jie Zhou;	We formulate the problem as a Markov decision process, in which the layout is incrementally adjusted based on the difference between the current layout and the target image, and the policy is learned via deep reinforcement learning.
1226	Learning Data Augmentation Strategies for Object Detection	Barret Zoph; Ekin D. Cubuk; Golnaz Ghiasi; Tsung-Yi Lin; Jonathon Shlens; Quoc V. Le;	First, we propose to use AutoAugment [3] to design better data augmentation strategies for object detection because it can address the difficulty of designing them. Second, we use the method to assess the value of data augmentation in object detection and compare it against the value of architecture.
1227	DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search	Xiyang Dai; Dongdong Chen; Mengchen Liu; Yinpeng Chen; Lu Yuan;	In this paper, we present DA-NAS that can directly search the architecture for large-scale target tasks while allowing a large candidate set in a more efficient manner.
1228	A Closer Look at Generalisation in RAVEN	Steven Spratley; Krista Ehinger; Tim Miller;	We revise the existing evaluation, and introduce two relational models, Rel-Base and Rel-AIR, that significantly improve this performance.
1229	Supervised Edge Attention Network for Accurate Image Instance Segmentation	Xier Chen; Yanchao Lian; Licheng Jiao; Haoran Wang; YanJie Gao; Shi Lingling;	To circumvent this issue, we propose a fully convolutional box head and a supervised edge attention module in mask head.
1230	Discriminative Partial Domain Adversarial Network	Jian Hu; Hongya Tuo; Chao Wang; Lingfeng Qiao; Haowen Zhong; Junchi Yan; Zhongliang Jing; Henry Leung;	In this paper, a novel Discriminative Partial Domain Adversarial Network (DPDAN) is developed.
1231	Differentiable Programming for Hyperspectral Unmixing using a Physics-based Dispersion Model	John Janiczek; Parth Thaker; Gautam Dasarathy; Christopher S. Edwards; Philip Christensen; Suren Jayasuriya;	In this paper, spectral variation is considered from a physics-based approach and incorporated into an end-to-end spectral unmixing algorithm via differentiable programming.
1232	Deep Cross-species Feature Learning for Animal Face Recognition via Residual Interspecies Equivariant Network	Xiao Shi; Chenxue Yang; Xue Xia; Xiujuan Chai;	In this work, we propose a novel Residual InterSpecies Equivariant Network (RiseNet) to deal with the animal face recognition task with limited training samples.
1233	Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes	Liang Liao; Jing Xiao; Zheng Wang; Chia-Wen Lin; Shin’ichi Satoh;	In this paper, we propose a Semantic Guidance and Evaluation Network (SGE-Net) to iteratively update the structural priors and the inpainted image in an interplay framework of semantics extraction and image inpainting.
1234	Sound2Sight: Generating Visual Dynamics from Sound and Context	Moitreya Chatterjee; Anoop Cherian;	In this paper, we study this problem in the context of audio-conditioned visual synthesis — a task that is important, for example, in occlusion reasoning.
1235	3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection	Jin Hyeok Yoo; Yecheol Kim; Jisong Kim; Jun Won Choi;	In this paper, we propose a new deep architecture for fusing camera and LiDAR sensors for 3D object detection.
1236	NoiseRank: Unsupervised Label Noise Reduction with Dependence Models	Karishma Sharma; Pinar Donmez; Enming Luo; Yan Liu; I. Zeki Yalniz;	In this paper, we propose NoiseRank, for unsupervised label noise reduction using Markov Random Fields (MRF).
1237	Fast Adaptation to Super-Resolution Networks via Meta-Learning	Seobin Park; Jinsu Yoo; Donghyeon Cho; Jiwon Kim; Tae Hyun Kim;	In this work, we observe the opportunity for further improvement of the performance of SISR without changing the architecture of conventional SR networks by practically exploiting additional information given from the input image.
1238	TP-LSD: Tri-Points Based Line Segment Detector	Siyu Huang; Fangbo Qin; Pengfei Xiong; Ning Ding; Yijia He; Xiao Liu;	This paper proposes a novel deep convolutional model, Tri-Points Based Line Segment Detector (TP-LSD), to detect line segments in an image at real-time speed.
1239	SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation	Chenfeng Xu; Bichen Wu; Zining Wang; Wei Zhan; Peter Vajda; Kurt Keutzer; Masayoshi Tomizuka;	To fix this, we propose Spatially-Adaptive Convolution (SAC) to adopt different filters for different locations according to the input image.
1240	An Attention-driven Two-stage Clustering Method for Unsupervised Person Re-Identification	Zilong Ji; Xiaolong Zou; Xiaohan Lin; Xiao Liu; Tiejun Huang; Si Wu;	In the present study, we propose an attention-driven two-stage clustering (ADTC) method to solve this problem.
1241	Toward Fine-grained Facial Expression Manipulation	Jun Ling; Han Xue; Li Song; Shuhui Yang; Rong Xie; Xiao Gu;	In this study, we take these two objectives into consideration and propose a novel method.
1242	Adaptive Object Detection with Dual Multi-Label Prediction	Zhen Zhao; Yuhong Guo; Haifeng Shen; Jieping Ye;	In this paper, we propose a novel end-to-end unsupervised deep domain adaptation model for adaptive object detection by exploiting multi-label object recognition as a dual auxiliary task.
1243	Table Structure Recognition using Top-Down and Bottom-Up Cues	Sachin Raja; Ajoy Mondal; C V Jawahar;	In our work, we focus on tables that have complex structures, dense content, and varying layouts with no dependency on meta-features and/or OCR.
1244	Novel View Synthesis on Unpaired Data by Conditional Deformable Variational Auto-Encoder	Mingyu Yin; Li Sun; Qingli Li;	This paper proposes a view translation model within the cVAE-GAN framework for the purpose of unpaired training.
1245	Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments	Jacob Krantz; Erik Wijmans; Arjun Majumdar; Dhruv Batra; Stefan Lee;	We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.
1246	Boundary Content Graph Neural Network for Temporal Action Proposal Generation	Yueran Bai; Yingying Wang; Yunhai Tong; Yang Yang; Qiyue Liu; Junhui Liu;	To address this issue, we propose a novel Boundary Content Graph Neural Network (BC-GNN) to model the insightful relations between the boundary and action content of temporal proposals by the graph neural networks.
1247	Pose Augmentation: Class-agnostic Object Pose Transformation for Object Recognition	Yunhao Ge; Jiaping Zhao; Laurent Itti;	Here, we propose a different approach: a class-agnostic object pose transformation network (OPT-Net) that can transform an image along 3D yaw and pitch axes to synthesize additional poses continuously.
1248	VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval	Minuk Ma; Sunjae Yoon; Junyeong Kim; Youngjoon Lee; Sunghun Kang; Chang D. Yoo;	This paper explores a method for performing VMR in a weakly-supervised manner (wVMR): training is performed without temporal moment labels but only with the text query that describes a segment of the video.
1249	Attention-Based Query Expansion Learning	Albert Gordo; Filip Radenovic; Tamara Berg;	In this paper we propose a more principled framework for query expansion, where one trains, in a discriminative manner, a model that learns how images should be aggregated to form the expanded query.
1250	Interpretable Foreground Object Search As Knowledge Distillation	Boren Li; Po-Yu Zhuang; Jian Gu; Mingyang Li; Ping Tan;	This paper proposes a knowledge distillation method for foreground object search (FoS).
1251	Improving Knowledge Distillation via Category Structure	Zailiang Chen; Xianxian Zheng; Hailan Shen; Ziyang Zeng; Yukun Zhou; Rongchang Zhao;	In this paper, a novel Category Structure is proposed to transfer category-level structured relations for knowledge distillation.
1252	High Resolution Zero-Shot Domain Adaptation of Synthetically Rendered Face Images	Stephan J. Garbin; Marek Kowalski; Matthew Johnson; Jamie Shotton;	In this work, we propose an algorithm that matches a non-photorealistic, synthetically generated image to a latent vector of a pretrained StyleGAN2 model which, in turn, maps the vector to a photorealistic image of a person of the same pose, expression, hair, and lighting.
1253	Attentive Prototype Few-shot Learning with Capsule Network-based Embedding	Fangyu Wu; Jeremy S. Smith; Wenjin Lu; Chaoyi Pang; Bailing Zhang;	Our contributions include (1) a new embedding structure that encodes relative spatial relationships between features by applying a capsule network; (2) a new triplet loss designed to enhance the semantic feature embedding, where similar samples are close to each other while dissimilar samples are farther apart; and (3) an effective non-parametric classifier, termed attentive prototypes, in place of the simple prototypes in current few-shot learning.
1254	Weakly Supervised Instance Segmentation by Learning Annotation Consistent Instances	Aditya Arun; C.V. Jawahar; M. Pawan Kumar;	Unlike previous approaches, we explicitly model the uncertainty in the pseudo label generation process using a conditional distribution.
1255	DA4AD: End-to-End Deep Attention-based Visual Localization for Autonomous Driving	Yao Zhou; Guowei Wan; Shenhua Hou; Li Yu; Gang Wang; Xiaofei Rui; Shiyu Song;	We present a visual localization framework based on novel deep attention aware features for autonomous driving that achieves centimeter level localization accuracy.
1256	Visual-Relation Conscious Image Generation from Structured-Text	Duc Minh Vo; Akihiro Sugimoto;	We propose an end-to-end network for image generation from given structured-text that consists of the visual-relation layout module and the pyramid of GANs, namely stacking-GANs.
1257	Patch-wise Attack for Fooling Deep Neural Network	Lianli Gao; Qilong Zhang; Jingkuan Song; Xianglong Liu; Heng Tao Shen;	Motivated by this, we propose a patch-wise iterative algorithm – a black-box attack on mainstream normally trained and defense models, which differs from existing attack methods that manipulate pixel-wise noise.
1258	Feature Pyramid Transformer	Dong Zhang; Hanwang Zhang; Jinhui Tang; Meng Wang; Xiansheng Hua; Qianru Sun;	To this end, we propose a fully active feature interaction across both space and scales, called Feature Pyramid Transformer (FPT).
1259	MABNet: A Lightweight Stereo Network Based on Multibranch Adjustable Bottleneck Module	Jiabin Xing; Zhi Qi; Jiying Dong; Jiaxuan Cai; Hao Liu;	To address the issue, we propose two compact stereo networks, MABNet and its light version MABNet_tiny.
1260	Guided Saliency Feature Learning for Person Re-identification in Crowded Scenes	Lingxiao He; Wu Liu;	In this paper, we propose a simple occlusion-aware approach to address the problem.
1261	Asymmetric Two-Stream Architecture for Accurate RGB-D Saliency Detection	Miao Zhang; Sun Xiao Fei; Jie Liu; Shuang Xu; Yongri Piao; Huchuan Lu;	In this paper, we propose an asymmetric two-stream architecture taking account of the inherent differences between RGB and depth data for saliency detection.
1262	Explaining Image Classifiers using Statistical Fault Localization	Youcheng Sun; Hana Chockler; Xiaowei Huang; Daniel Kroening;	In this paper, we show that statistical fault localization (SFL) techniques from software engineering deliver high quality explanations of the outputs of DNNs, where we define an explanation as a minimal subset of features sufficient for making the same decision as for the original input.
1263	Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers	Michal Rolínek; Paul Swoboda; Dominik Zietlow; Anselm Paulus; Vít Musil; Georg Martius;	Building on recent progress at the intersection of combinatorial optimization and deep learning, we propose an end-to-end trainable architecture for deep graph matching that contains unmodified combinatorial solvers.
1264	Learning Video Representations by Transforming Time	Simon Jenni; Givi Meishvili; Paolo Favaro;	We introduce a novel self-supervised learning approach to learn representations of videos that are responsive to changes in the motion dynamics.
1265	Unsupervised Monocular Depth Estimation for Night-time Images using Adversarial Domain Feature Adaptation	Madhu Vankadari; Sourav Garg; Anima Majumder; Swagat Kumar; Ardhendu Behera;	In this paper, we look into the problem of estimating per-pixel depth maps from unconstrained RGB monocular night-time images which is a difficult task that has not been addressed adequately in the literature.
1266	Variational Connectionist Temporal Classification	Linlin Chao; Jingdong Chen; Wei Chu;	To remedy this, we propose variational CTC (Var-CTC) to enhance the learning of non-blank symbols.
1267	End-to-end Dynamic Matching Network for Multi-view Multi-person 3d Pose Estimation	Congzhentao Huang; Shuai Jiang; Yang Li; Ziyue Zhang; Jason Traish; Chen Deng; Sam Ferguson; Richard Yi Da Xu;	To address this phenomenon, we propose a novel end-to-end training scheme that brings the three separate modules into a single model.
1268	Orderly Disorder in Point Cloud Domain	Morteza Ghahremani; Bernard Tiddeman; Yonghuai Liu; Ardhendu Behera;	In this paper, we propose a smart yet simple deep network for analysis of 3D models using ‘orderly disorder’ theory.
1269	Deep Decomposition Learning for Inverse Imaging Problems	Dongdong Chen; Mike E. Davies;	In this paper, inspired by the geometric fact that data can be decomposed into two components, one from the null-space of the forward operator and one from the range space of its pseudo-inverse, we train neural networks to learn the two components and therefore the decomposition; that is, we explicitly reformulate the neural network layers as learning range-nullspace decomposition functions with reference to the layer inputs, instead of learning unreferenced functions.
1270	FLOT: Scene Flow on Point Clouds guided by Optimal Transport	Gilles Puy; Alexandre Boulch; Renaud Marlet;	We propose and study a method called FLOT that estimates scene flow on point clouds.
1271	Accurate Reconstruction of Oriented 3D Points using Affine Correspondences	Carolina Raposo; Joao P. Barreto;	This article provides new formulations for achieving epipolar geometry-consistent ACs that, besides leading to linear solvers that are up to 30× faster than the state-of-the-art alternatives, allow for a fast refinement scheme that significantly improves the quality of the noisy ACs.
1272	Volumetric Transformer Networks	Seungryong Kim; Sabine Süsstrunk; Mathieu Salzmann;	To overcome this limitation, we introduce a learnable module, the volumetric transformer network (VTN), that predicts channel-wise warping fields so as to reconfigure intermediate CNN features both spatially and channel-wise.
1273	360° Camera Alignment via Segmentation	Benjamin Davidson; Mohsan S. Alvi; João F. Henriques;	In this work, we investigate how to solve this problem by fusing purely geometric cues, such as apparent vanishing points, with learned semantic cues, such as the expectation that some visual elements (e.g. doors) have a natural upright position.
1274	A Novel Line Integral Transform for 2D Affine-Invariant Shape Retrieval	Bin Wang; Yongsheng Gao;	Although its extended version, the trace transform, allows us to construct affine invariants, these are less informative and computationally expensive due to the loss of the spatial relationship between trace lines and the extensive repeated calculation of the transform. To address this issue, a novel line integral transform is proposed.
1275	Explanation-based Weakly-supervised Learning of Visual Relations with Graph Networks	Federico Baldassarre; Kevin Smith; Josephine Sullivan; Hossein Azizpour;	This paper introduces a novel weakly-supervised method for visual relationship detection that relies on minimal image-level predicate labels.
1276	Guided Semantic Flow	Sangryul Jeon; Dongbo Min; Seungryong Kim; Jihwan Choe; Kwanghoon Sohn;	To address such severe matching ambiguities, we introduce a novel approach, called guided semantic flow, based on the key insight that sparse yet reliable matches can effectively capture non-rigid geometric variations, and these confident matches can guide adjacent pixels to have similar solution spaces, reducing the matching ambiguities significantly.
1277	Document Structure Extraction using Prior based High Resolution Hierarchical Semantic Segmentation	Mausoom Sarkar; Milan Aggarwal; Arneh Jain; Hiresh Gupta; Balaji Krishnamurthy;	In this paper, we share our findings on employing a hierarchical semantic segmentation network for this task of structure extraction.
1278	Measuring the Importance of Temporal Features in Video Saliency	Matthias Tangemann; Matthias Kümmerer; Thomas S.A. Wallis; Matthias Bethge;	In this work, we test this assumption by quantifying to what extent gaze on recent video saliency benchmarks can be predicted by a static baseline model.
1279	Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution	Haotian Tang; Zhijian Liu; Shengyu Zhao; Yujun Lin; Ji Lin; Hanrui Wang; Song Han;	To this end, we propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with the high-resolution point-based branch.
1280	Towards Reliable Evaluation of Algorithms for Road Network Reconstruction from Aerial Images	Leonardo Citraro; Mateusz Koziński; Pascal Fua;	To provide more reliable evaluation, we design three new metrics that are sensitive to all classes of errors.
1281	Online Continual Learning under Extreme Memory Constraints	Enrico Fini; Stéphane Lathuilière; Enver Sangineto; Moin Nabi; Elisa Ricci;	In this paper, we introduce the novel problem of Memory-Constrained Online Continual Learning (MC-OCL) which imposes strict constraints on the memory overhead that a possible algorithm can use to avoid catastrophic forgetting.
1282	Learning to Cluster under Domain Shift	Willi Menapace; Stéphane Lathuilière; Elisa Ricci;	In this work we overcome this assumption and we address the problem of transferring knowledge from a source to a target domain when both source and target data have no annotations.
1283	Defense Against Adversarial Attacks via Controlling Gradient Leaking on Embedded Manifolds	Yueru Li; Shuyu Cheng; Hang Su; Jun Zhu;	In this paper, we present a new perspective, namely gradient leaking hypothesis, to understand the existence of adversarial examples and to further motivate effective defense strategies.
1284	Improving Optical Flow on a Pyramid Level	Markus Hofinger; Samuel Rota Bulò; Lorenzo Porzi; Arno Knapitsch; Thomas Pock; Peter Kontschieder;	In this work we review the coarse-to-fine spatial feature pyramid concept, which is used in state-of-the-art optical flow estimation networks to make exploration of the pixel flow search space computationally tractable and efficient.
1285	Procrustean Regression Networks: Learning 3D Structure of Non-Rigid Objects from 2D Annotations	Sungheon Park; Minsik Lee; Nojun Kwak;	We propose a novel framework for training neural networks which is capable of learning 3D information of non-rigid objects when only 2D annotations are available as ground truths.
1286	Learning to Learn Parameterized Classification Networks for Scalable Input Images	Duo Li; Anbang Yao; Qifeng Chen;	To achieve efficient and flexible image classification at runtime, we employ meta learners to generate convolutional weights of main networks for various input scales and maintain privatized Batch Normalization layers per scale.
1287	Stereo Event-based Particle Tracking Velocimetry for 3D Fluid Flow Reconstruction	Yuanhao Wang; Ramzi Idoughi; Wolfgang Heidrich;	In this paper, we present a new framework that retrieves dense 3D measurements of the fluid velocity field using a pair of event-based cameras.
1288	Simplicial Complex based Point Correspondence between Images warped onto Manifolds	Charu Sharma; Manohar Kaul;	In this paper, we pose the assignment problem as finding a bijective map between two graph induced simplicial complexes, which are higher-order analogues of graphs.
1289	Representation Learning on Visual-Symbolic Graphs for Video Understanding	Effrosyni Mavroudi; Benjamín Béjar Haro; René Vidal;	To capture this rich visual and semantic context, we propose using two graphs: (1) an attributed spatio-temporal visual graph whose nodes correspond to actors and objects and whose edges encode different types of interactions, and (2) a symbolic graph that models semantic relationships.
1290	Distance-Normalized Unified Representation for Monocular 3D Object Detection	Xuepeng Shi; Zhixiang Chen; Tae-Kyun Kim;	To achieve fast and accurate monocular 3D object detection, we introduce a single-stage and multi-scale framework to learn a unified representation for objects within different distance ranges, termed as UR3D.
1291	Sequential Deformation for Accurate Scene Text Detection	Shanyu Xiao; Liangrui Peng; Ruijie Yan; Keyu An; Gang Yao; Jaesik Min;	In this paper, we propose a novel sequential deformation method to effectively model the line-shape of scene text.
1292	Where to Explore Next? ExHistCNN for History-aware Autonomous 3D Exploration	Yiming Wang; Alessio Del Bue;	In this work we address the problem of autonomous 3D exploration of an unknown indoor environment using a depth camera.
1293	Semi-Supervised Segmentation based on Error-Correcting Supervision	Robert Mendel; Luis Antonio de Souza Jr; David Rauber; João Paulo Papa; Christoph Palm;	In this work, we augment such supervised segmentation models by allowing them to learn from unlabeled data.
1294	Quantum-soft QUBO Suppression for Accurate Object Detection	Junde Li; Swaroop Ghosh;	In this paper, we first map the task of removing redundant detections into a Quadratic Unconstrained Binary Optimization (QUBO) framework that consists of the detection score of each bounding box and the overlap ratio between each pair of bounding boxes. Next, we solve the QUBO problem using the proposed Quantum-soft QUBO Suppression algorithm for fast and accurate detection by exploiting quantum computing advantages.
1295	Label-similarity Curriculum Learning	Ürün Dogan; Aniket Anand Deshmukh; Marcin Bronislaw Machura; Christian Igel;	We propose a novel curriculum learning approach for image classification that adapts the loss function by changing the label representation.
1296	Recurrent Image Annotation With Explicit Inter-Label Dependencies	Ayushi Dutta; Yashaswi Verma; C.V. Jawahar;	In this paper, we address this limitation and propose a novel approach in which the RNN is explicitly forced to learn multiple relevant inter-label dependencies, without the need of feeding the ground-truth in any particular order.
1297	Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution	Jing Yao; Danfeng Hong; Jocelyn Chanussot; Deyu Meng; Xiaoxiang Zhu; Zongben Xu;	To this end, we propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet for short, to enhance the spatial resolution of HSI by means of a higher-spatial-resolution multispectral image (MSI).
1298	SimPose: Effectively Learning DensePose and Surface Normals of People from Simulated Data	Tyler Zhu; Per Karlsson; Christoph Bregler;	With a proliferation of generic domain-adaptation approaches, we report a simple yet effective technique for learning difficult per-pixel 2.5D and 3D regression representations of articulated people.
1299	ByeGlassesGAN: Identity Preserving Eyeglasses Removal for Face Images	Yu-Hui Lee; Shang-Hong Lai;	In this paper, we propose a novel image-to-image GAN framework for eyeglasses removal, called ByeGlassesGAN, which is used to automatically detect the position of eyeglasses and then remove them from face images.
1300	Differentiable Joint Pruning and Quantization for Hardware Efficiency	Ying Wang; Yadong Lu; Tijmen Blankevoort;	We present a differentiable joint pruning and quantization (DJPQ) scheme.
1301	Learning to Generate Customized Dynamic 3D Facial Expressions	Rolandos Alexandros Potamias; Jiali Zheng; Stylianos Ploumpis; Giorgos Bouritsas; Evangelos Ververas; Stefanos Zafeiriou;	In this paper, we extrapolate those advances to the 3D domain, by studying 3D image-to-video translation with a particular focus on 4D facial expressions.
1302	LandscapeAR: Large Scale Outdoor Augmented Reality by Matching Photographs with Terrain Models Using Learned Descriptors	Jan Brejcha; Michal Lukáč; Yannick Hold-Geoffroy; Oliver Wang; Martin Čadík;	We introduce a solution to large scale Augmented Reality for outdoor scenes by registering camera images to textured Digital Elevation Models (DEMs).
1303	Learning Disentangled Feature Representation for Hybrid-distorted Image Restoration	Xin Li; Xin Jin; Jianxin Lin; Sen Liu; Yaojun Wu; Tao Yu; Wei Zhou; Zhibo Chen;	To decompose such interference, we introduce the concept of Disentangled Feature Learning to achieve the feature-level divide-and-conquer of hybrid distortions.
1304	Jointly De-biasing Face Recognition and Demographic Attribute Estimation	Sixue Gong; Xiaoming Liu; Anil K. Jain;	We present a novel de-biasing adversarial network (DebFace) that learns to extract disentangled feature representations for both unbiased face recognition and demographics estimation.
1305	Regularized Loss for Weakly Supervised Single Class Semantic Segmentation	Olga Veksler;	We propose a new weakly supervised method for training CNNs to segment an object of a single class of interest.
1306	Spike-FlowNet: Event-based Optical Flow Estimation with Energy-Efficient Hybrid Neural Networks	Chankyu Lee; Adarsh Kumar Kosta; Alex Zihao Zhu; Kenneth Chaney; Kostas Daniilidis; Kaushik Roy;	To overcome these issues, we present Spike-FlowNet, a deep hybrid neural network architecture integrating SNNs and ANNs for efficiently estimating optical flow from sparse event camera outputs without sacrificing the performance.
1307	Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations	Aditya Golatkar; Alessandro Achille; Stefano Soatto;	We describe a procedure for removing dependency on a cohort of training data from a trained deep network that improves upon and generalizes previous methods to different readout functions, and can be extended to ensure forgetting in the final activations of the network.
1308	Inherent Adversarial Robustness of Deep Spiking Neural Networks: Effects of Discrete Input Encoding and Non-Linear Activations	Saima Sharmin; Nitin Rathi; Priyadarshini Panda; Kaushik Roy;	In this work, we demonstrate that the adversarial accuracy of SNNs under gradient-based attacks is higher than that of their non-spiking counterparts for CIFAR datasets on deep VGG and ResNet architectures, particularly in the black-box attack scenario.
1309	Synthesizing Coupled 3D Face Modalities by Trunk-Branch Generative Adversarial Networks	Baris Gecer; Alexandros Lattas; Stylianos Ploumpis; Jiankang Deng; Athanasios Papaioannou; Stylianos Moschoglou; Stefanos Zafeiriou;	In this paper, we present the first methodology that generates high-quality texture, shape, and normals jointly, which can be used for photo-realistic synthesis.
1310	Learning to Learn Words from Visual Scenes	Dídac Surís; Dave Epstein; Heng Ji; Shih-Fu Chang; Carl Vondrick;	We introduce a meta-learning framework that learns how to learn word representations from unconstrained scenes.
1311	On Transferability of Histological Tissue Labels in Computational Pathology	Mahdi S. Hosseini; Lyndon Chan; Weimin Huang; Yichen Wang; Danial Hasan; Corwyn Rowsell; Savvas Damaskinos; Konstantinos N. Plataniotis;	In this paper, we explore the possibility of transferring diagnostically-relevant histology labels from a source-domain into multiple target-domains to classify similar tissue structures and cancer grades.
1312	Learning Actionness via Long-range Temporal Order Verification	Dimitri Zhukov; Jean-Baptiste Alayrac; Ivan Laptev; Josef Sivic;	To address these challenges, we propose a self-supervised and generic method to isolate actions from their background.
1313	Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays	Laurie Bose; Piotr Dudek; Jianing Chen; Stephen J. Carey; Walterio W. Mayol-Cuevas;	We present a novel method of CNN inference for pixel processor array (PPA) vision sensors, designed to take advantage of their massive parallelism and analog compute capabilities.
1314	Character Region Attention For Text Spotting	Youngmin Baek; Seung Shin; Jeonghun Baek; Sungrae Park; Junyeop Lee; Daehyun Nam; Hwalsuk Lee;	Based on the insight, we construct a tightly coupled single pipeline model.
1315	Stable Low-rank Tensor Decomposition for Compression of Convolutional Neural Network	Anh-Huy Phan; Konstantin Sobolev; Konstantin Sozykin; Dmitry Ermilov; Julia Gusak; Petr Tichavský; Valeriy Glukhov; Ivan Oseledets; Andrzej Cichocki;	We present a novel method, which can stabilize the low-rank approximation of convolutional kernels and ensure efficient compression while preserving the high-quality performance of the neural networks.
1316	Dual Mixup Regularized Learning for Adversarial Domain Adaptation	Yuan Wu; Diana Inkpen; Ahmed El-Roby;	In order to alleviate the above issues, we propose a dual mixup regularized learning (DMRL) method for UDA, which not only guides the classifier in enhancing consistent predictions in-between samples, but also enriches the intrinsic structures of the latent space.
1317	Robust and On-the-fly Dataset Denoising for Image Classification	Jiaming Song; Yann Dauphin; Michael Auli; Tengyu Ma;	We address this problem by reasoning counterfactually about the loss distribution of examples with uniform random labels had they been trained with the real examples, and use this information to remove noisy examples from the training set.
1318	Imaging Behind Occluders Using Two-Bounce Light	Connor Henley; Tomohiro Maeda; Tristan Swedish; Ramesh Raskar;	We introduce the new non-line-of-sight imaging problem of imaging behind an occluder.
1319	Improving Object Detection with Selective Self-Supervised Self-Training	Yandong Li; Di Huang; Danfeng Qin; Liqiang Wang; Boqing Gong;	To tackle this challenge, we propose a selective net to rectify the supervision signals in Web images.
1320	Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction	Rohan Chabra; Jan E. Lenssen; Eddy Ilg; Tanner Schmidt; Julian Straub; Steven Lovegrove; Richard Newcombe;	To address this problem we introduce Deep Local Shapes (DeepLS), a deep shape representation that enables high-quality 3D shape representation without prohibitive memory requirements.
1321	Info3D: Representation Learning on 3D Objects using Mutual Information Maximization and Contrastive Learning	Aditya Sanghi;	To solve these issues we propose to extend the InfoMax and contrastive learning principles on 3D shapes.
1322	Adversarial Data Augmentation via Deformation Statistics	Sahin Olut; Zhengyang Shen; Zhenlin Xu; Samuel Gerber; Marc Niethammer;	To that end, we explore an augmentation strategy which builds statistical deformation models from unlabeled data via principal component analysis and uses the resulting statistical deformation space to augment the labeled training samples.
1323	Neural Predictor for Neural Architecture Search	Wei Wen; Hanxiao Liu; Yiran Chen; Hai Li; Gabriel Bender; Pieter-Jan Kindermans;	We propose an approach with three basic steps that is conceptually much simpler.
1324	Learning Permutation Invariant Representations using Memory Networks	Shivam Kalra; Mohammed Adnan; Graham Taylor; H.R. Tizhoosh;	In this work, we present a permutation invariant neural network called Memory-based Exchangeable Model (MEM) for learning universal set functions.
1325	Feature Space Augmentation for Long-Tailed Data	Peng Chu; Xiao Bian; Shaopeng Liu; Haibin Ling;	In this work, we present a novel approach to address the long-tailed problem by augmenting the under-represented classes in the feature space with the features learned from the classes with ample samples.
1326	Laying the Foundations of Deep Long-Term Crowd Flow Prediction	Samuel S. Sohn; Honglu Zhou; Seonghyeon Moon; Sejong Yoon; Vladimir Pavlovic; Mubbasir Kapadia;	We propose the first deep framework to instantly predict the long-term flow of crowds in arbitrarily large, realistic environments.
1327	Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning	Zhekun Luo; Devin Guillory; Baifeng Shi; Wei Ke; Fang Wan; Trevor Darrell; Huijuan Xu;	In this work, we explicitly model the key instances assignment as a hidden variable and adopt an Expectation-Maximization (EM) framework.
1328	Fairness by Learning Orthogonal Disentangled Representations	Mhd Hasan Sarhan; Nassir Navab; Abouzar Eslami; Shadi Albarqouni;	In this paper, we propose a novel disentanglement approach to the invariant representation problem.
1329	Self-supervision with Superpixels: Training Few-shot Medical Image Segmentation without Annotation	Cheng Ouyang; Carlo Biffi; Chen Chen; Turkay Kart; Huaqi Qiu; Daniel Rueckert;	To address this problem we make several contributions: (1) A novel self-supervised FSS framework for medical images in order to eliminate the requirement for annotations during training.
1330	On Diverse Asynchronous Activity Anticipation	He Zhao; Richard P. Wildes;	We investigate the joint anticipation of long-term activity labels and their corresponding times with the aim of improving both the naturalness and diversity of predictions. We address these matters using Conditional Adversarial Generative Networks for Discrete Sequences.
1331	Representative-Discriminative Learning for Open-set Land Cover Classification of Satellite Imagery	Razieh Kaviani Baghbaderani; Ying Qu; Hairong Qi; Craig Stutts;	In this paper, we study the problem of open-set land cover classification that identifies the samples belonging to unknown classes during testing, while maintaining performance on known classes.
1332	Structure-Aware Human-Action Generation	Ping Yu; Yang Zhao; Chunyuan Li; Junsong Yuan; Changyou Chen;	To overcome this challenge, we propose a variant of GCNs to leverage the self-attention mechanism to prune a complete action graph in the temporal space.
1333	Towards Efficient Coarse-to-Fine Networks for Action and Gesture Recognition	Niamul Quader; Juwei Lu; Peng Dai; Wei Li;	First, we systematically yield enhanced receptive fields for complementary feature extraction via coarse-to-fine decomposition of input imagery along the spatial and temporal dimensions, and adaptively focus on training important feature pathways using a reparameterized fully connected layer. Second, we develop a ‘use when needed’ scheme with a ‘coarse-exit’ strategy that allows selective use of expensive high-resolution processing in a data-dependent fashion to retain accuracy while reducing computation cost.
1334	S³Net: Semantic-Aware Self-supervised Depth Estimation with Monocular Videos and Synthetic Data	Bin Cheng; Inderjot Singh Saggu; Raunak Shah; Gaurav Bansal; Dinesh Bharadia;	We present S3Net, a self-supervised framework which combines these complementary features: we use synthetic and real-world images for training while exploiting geometric, temporal, as well as semantic constraints.
1335	Leveraging Seen and Unseen Semantic Relationships for Generative Zero-Shot Learning	Maunil R Vyas; Hemanth Venkateswara; Sethuraman Panchanathan;	To address this concern, we propose the novel LsrGAN, a generative model that Leverages the Semantic Relationship between seen and unseen categories and explicitly performs knowledge transfer by incorporating a novel Semantic Regularized Loss (SR-Loss).
1336	Weight Excitation: Built-in Attention Mechanisms in Convolutional Neural Networks	Niamul Quader; Md Mafijul Islam Bhuiyan; Juwei Lu; Peng Dai; Wei Li;	We propose novel approaches for simultaneously identifying important weights of a convolutional neural network (ConvNet) and providing more attention to the important weights during training.
1337	UNITER: UNiversal Image-TExt Representation Learning	Yen-Chun Chen; Linjie Li; Licheng Yu; Ahmed El Kholy; Faisal Ahmed; Zhe Gan; Yu Cheng; Jingjing Liu;	In this paper, we introduce UNITER, a UNiversal Image-TExt Representation, learned through large-scale pre-training over four image-text datasets (COCO, Visual Genome, Conceptual Captions, and SBU Captions), which can power heterogeneous downstream V+L tasks with joint multimodal embeddings.
1338	Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks	Xiujun Li; Xi Yin; Chunyuan Li; Pengchuan Zhang; Xiaowei Hu; Lei Zhang; Lijuan Wang; Houdong Hu; Li Dong; Furu Wei; Yejin Choi; Jianfeng Gao;	While existing methods simply concatenate image region features and text features as input to the model to be pre-trained and use self-attention to learn image-text semantic alignments in a brute force manner, in this paper, we propose a new learning method Oscar, which uses object tags detected in images as anchor points to significantly ease the learning of alignments.
1339	Improving Face Recognition from Hard Samples via Distribution Distillation Loss	Yuge Huang; Pengcheng Shen; Ying Tai; Shaoxin Li; Xiaoming Liu; Jilin Li; Feiyue Huang; Rongrong Ji;	To improve the performance on hard samples, we propose a novel Distribution Distillation Loss to narrow the performance gap between easy and hard samples, which is simple, effective and generic for various types of facial variations.
1340	Extract and Merge: Superpixel Segmentation with Regional Attributes	Jianqiao An; Yucheng Shi; Yahong Han; Meijun Sun; Qi Tian;	In this work, we propose the concept of regional attribute, which indicates the location of a certain region in the object.
1341	Spatial-Adaptive Network for Single Image Denoising	Meng Chang; Qi Li; Huajun Feng; Zhihai Xu;	In this paper, we propose a novel spatial-adaptive denoising network (SADNet) for efficient single image blind noise removal.
1342	Physics-based Feature Dehazing Networks	Jiangxin Dong; Jinshan Pan;	We propose a physics-based feature dehazing network for image dehazing.
1343	Learning Surrogates via Deep Embedding	Yash Patel; Tomáš Hodaň; Jiří Matas;	This paper proposes a technique for training neural networks by minimizing surrogate losses that approximate the target evaluation metric, which may be non-differentiable.
1344	An Asymmetric Modeling for Action Assessment	Jibin Gao; Wei-Shi Zheng; Jia-Hui Pan; Chengying Gao; Yaowei Wang; Wei Zeng; Jianhuang Lai;	In this work, we model the asymmetric interactions among agents for action assessment.
1345	High-quality Single-model Deep Video Compression with Frame-Conv3D and Multi-frame Differential Modulation	Wenyu Sun; Chen Tang; Weigui Li; Zhuqing Yuan; Huazhong Yang; Yongpan Liu;	This paper proposes a deep video compression method to simultaneously encode multiple frames with Frame-Conv3D and differential modulation.
1346	Instance-Aware Embedding for Point Cloud Instance Segmentation	Tong He; Yifan Liu; Chunhua Shen; Xinlong Wang; Changming Sun;	In this work, we study the influence of instance-aware knowledge by proposing an Instance-Aware Module (IAM).
1347	Self-Paced Deep Regression Forests with Consideration on Underrepresented Examples	Lili Pan; Shijie Ai; Yazhou Ren; Zenglin Xu;	To this end, this paper proposes a new deep discriminative model: self-paced deep regression forests with consideration on underrepresented examples (SPUDRFs).
1348	Manifold Projection for Adversarial Defense on Face Recognition	Jianli Zhou; Chao Liang; Jun Chen;	In this paper, we propose Adversarial Variational AutoEncoder (A-VAE), a novel framework to tackle both types of attacks.
1349	Weakly Supervised Learning with Side Information for Noisy Labeled Images	Lele Cheng; Xiangzeng Zhou; Liming Zhao; Dangwei Li; Hong Shang; Yun Zheng; Pan Pan; Yinghui Xu;	In this paper, we present an efficient weakly-supervised learning by using a Side Information Network (SINet), which aims to effectively carry out a large scale classification with severely noisy labels.
1350	Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision	Peng Wu; Jing Liu; Yujia Shi; Yujia Sun; Fangtao Shao; Zhaoyang Wu; Zhiwei Yang;	To address this problem, in this work we first release a large-scale and multi-scene dataset named XD-Violence with a total duration of 217 hours, containing 4754 untrimmed videos with audio signals and weak labels.
1351	SNE-RoadSeg: Incorporating Surface Normal Information into Semantic Segmentation for Accurate Freespace Detection	Rui Fan; Hengli Wang; Peide Cai; Ming Liu;	Hence, in this paper, we first introduce a novel module, named surface normal estimator (SNE), which can infer surface normal information from dense depth/disparity images with high accuracy and efficiency. Furthermore, we propose a data-fusion CNN architecture, referred to as RoadSeg, which can extract and fuse features from both RGB images and the inferred surface normal information for accurate freespace detection.
1352	Modeling the Space of Point Landmark Constrained Diffeomorphisms	Chengfeng Wen; Yang Guo; Xianfeng Gu;	In order to fulfill these requirements, this work proposes a novel model of the space of point landmark constrained diffeomorphisms.
1353	PieNet: Personalized Image Enhancement Network	Han-Ul Kim; Young Jun Koh; Chang-Su Kim;	In this paper, we propose the first deep learning approach to personalized image enhancement, which can enhance new images for a new user, by asking him or her to select about 10 to 20 preferred images from a random set of images.
1354	Rotational Outlier Identification in Pose Graphs Using Dual Decomposition	Arman Karimian; Ziqi Yang; Roberto Tron;	In this paper, we contribute to the state of the art of the latter, by proposing a method to detect incorrect orientation measurements prior to pose graph optimization by checking the geometric consistency of rotation measurements.
1355	Speech-driven Facial Animation using Cascaded GANs for Learning of Motion and Texture	Dipanjan Das; Sandika Biswas; Sanjana Sinha; Brojeshwar Bhowmick;	In this paper, we propose a novel strategy where we partition the problem and learn the motion and texture separately.
1356	Solving Phase Retrieval with a Learned Reference	Rakib Hyder; Zikui Cai; M. Salman Asif;	In this paper, we assume that a known (learned) reference is added to the signal before capturing the Fourier amplitude measurements. Our method is inspired by the principle of adding a reference signal in holography.
1357	Dual Grid Net: Hand Mesh Vertex Regression from Single Depth Maps	Chengde Wan; Thomas Probst; Luc Van Gool; Angela Yao;	We aim to recover the dense 3D surface of the hand from depth maps and propose a network that can predict mesh vertices, transformation matrices for every joint and joint coordinates in a single forward pass.