Paper Digest: CVPR 2019 Highlights
You can also download paper highlights by session (15 sessions in total):
3D Multiview; 3D Single View & RGBD; Action & Video;
Applications; Computational Photography & Graphics; Deep Learning;
Face & Body; Language & Reasoning; Low-Level & Optimization;
Motion & Biometrics; Recognition; Scenes & Representation;
Segmentation, Grouping, & Shape; Statistics, Physics, Theory, & Datasets; Synthesis.
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) is one of the top computer vision conferences in the world. In 2019, it was held in Long Beach, California. There were more than 5,000 paper submissions, of which 1,294 were accepted. More than 100 papers also released their code (download link).
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get new paper updates customized to your own interests on a daily basis. You are also welcome to follow us on Twitter and LinkedIn for the most recent updates.
Paper Digest Team
team@paperdigest.org
TABLE 1: CVPR 2019 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Finding Task-Relevant Features for Few-Shot Learning by Category Traversal | Hongyang Li, David Eigen, Samuel Dodge, Matthew Zeiler, Xiaogang Wang | In this work, we introduce a Category Traversal Module that can be inserted as a plug-and-play module into most metric-learning based few-shot learners. |
2 | Edge-Labeling Graph Neural Network for Few-Shot Learning | Jongmin Kim, Taesup Kim, Sungwoong Kim, Chang D. Yoo | In this paper, we propose a novel edge-labeling graph neural network (EGNN), which adapts a deep neural network on the edge-labeling graph, for few-shot learning. |
3 | Generating Classification Weights With GNN Denoising Autoencoders for Few-Shot Learning | Spyros Gidaris, Nikos Komodakis | Given an initial recognition model already trained on a set of base classes, the goal of this work is to develop a meta-model for few-shot learning. |
4 | Kervolutional Neural Networks | Chen Wang, Jianfei Yang, Lihua Xie, Junsong Yuan | To solve this problem, a new operation, kervolution (kernel convolution), is introduced to approximate complex behaviors of human perception systems leveraging the kernel trick. |
5 | Why ReLU Networks Yield High-Confidence Predictions Far Away From the Training Data and How to Mitigate the Problem | Matthias Hein, Maksym Andriushchenko, Julian Bitterwolf | For bounded domains like images we propose a new robust optimization technique similar to adversarial training which enforces low confidence predictions far away from the training data. |
6 | On the Structural Sensitivity of Deep Convolutional Networks to the Directions of Fourier Basis Functions | Yusuke Tsuzuku, Issei Sato | As a byproduct of the analysis, we propose an algorithm to create shift-invariant universal adversarial perturbations available in black-box settings. |
7 | Neural Rejuvenation: Improving Deep Network Training by Enhancing Computational Resource Utilization | Siyuan Qiao, Zhe Lin, Jianming Zhang, Alan L. Yuille | In this paper, we study the problem of improving computational resource utilization of neural networks. |
8 | Hardness-Aware Deep Metric Learning | Wenzhao Zheng, Zhaodong Chen, Jiwen Lu, Jie Zhou | This paper presents a hardness-aware deep metric learning (HDML) framework. |
9 | Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation | Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan L. Yuille, Li Fei-Fei | In this paper, we study NAS for semantic image segmentation. |
10 | Learning Loss for Active Learning | Donggeun Yoo, In So Kweon | In this paper, we propose a novel active learning method that is simple but task-agnostic, and works efficiently with the deep networks. |
11 | Striking the Right Balance With Uncertainty | Salman Khan, Munawar Hayat, Syed Waqas Zamir, Jianbing Shen, Ling Shao | In this paper, we demonstrate that the Bayesian uncertainty estimates directly correlate with the rarity of classes and the difficulty level of individual samples. |
12 | AutoAugment: Learning Augmentation Strategies From Data | Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le | In this paper, we describe a simple procedure called AutoAugment to automatically search for improved data augmentation policies. |
13 | SDRSAC: Semidefinite-Based Randomized Approach for Robust Point Cloud Registration Without Correspondences | Huu M. Le, Thanh-Toan Do, Tuan Hoang, Ngai-Man Cheung | This paper presents a novel randomized algorithm for robust point cloud registration without correspondences. |
14 | BAD SLAM: Bundle Adjusted Direct RGB-D SLAM | Thomas Schops, Torsten Sattler, Marc Pollefeys | In contrast, in this paper we present a novel, fast direct BA formulation which we implement in a real-time dense RGB-D SLAM algorithm. In order to facilitate state-of-the-art research on direct RGB-D SLAM, we propose a novel, well-calibrated benchmark for this task that uses synchronized global shutter RGB and depth cameras. |
15 | Revealing Scenes by Inverting Structure From Motion Reconstructions | Francesco Pittaluga, Sanjeev J. Koppal, Sing Bing Kang, Sudipta N. Sinha | In this paper, we show, for the first time, that such point clouds retain enough information to reveal scene appearance and compromise privacy. |
16 | Strand-Accurate Multi-View Hair Capture | Giljoo Nam, Chenglei Wu, Min H. Kim, Yaser Sheikh | In this paper, we present the first method to capture high-fidelity hair geometry with strand-level accuracy. |
17 | DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation | Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, Steven Lovegrove | In this work, we introduce DeepSDF, a learned continuous Signed Distance Function (SDF) representation of a class of shapes that enables high quality shape representation, interpolation and completion from partial and noisy 3D input data. |
18 | Pushing the Boundaries of View Extrapolation With Multiplane Images | Pratul P. Srinivasan, Richard Tucker, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng, Noah Snavely | We explore the problem of view synthesis from a narrow baseline pair of images, and focus on generating high-quality view extrapolations with plausible disocclusions. |
19 | GA-Net: Guided Aggregation Net for End-To-End Stereo Matching | Feihu Zhang, Victor Prisacariu, Ruigang Yang, Philip H.S. Torr | We propose two novel neural net layers, aimed at capturing local and the whole-image cost dependencies respectively. |
20 | Real-Time Self-Adaptive Deep Stereo | Alessio Tonioni, Fabio Tosi, Matteo Poggi, Stefano Mattoccia, Luigi Di Stefano | Instead, we propose to perform unsupervised and continuous online adaptation of a deep stereo network, which allows for preserving its accuracy in any environment. |
21 | LAF-Net: Locally Adaptive Fusion Networks for Stereo Confidence Estimation | Sunok Kim, Seungryong Kim, Dongbo Min, Kwanghoon Sohn | We present a novel method that estimates the confidence map of an initial disparity by making full use of tri-modal input, including matching cost, disparity, and color image through deep networks. |
22 | NM-Net: Mining Reliable Neighbors for Robust Feature Correspondences | Chen Zhao, Zhiguo Cao, Chi Li, Xin Li, Jiaqi Yang | To address this issue, we present a compatibility-specific mining method to search for consistent neighbors. |
23 | Coordinate-Free Carlsson-Weinshall Duality and Relative Multi-View Geometry | Matthew Trager, Martial Hebert, Jean Ponce | We present a coordinate-free description of Carlsson-Weinshall duality between scene points and camera pinholes and use it to derive a new characterization of primal/dual multi-view geometry. |
24 | Deep Reinforcement Learning of Volume-Guided Progressive View Inpainting for 3D Point Scene Completion From a Single Depth Image | Xiaoguang Han, Zhaoxuan Zhang, Dong Du, Mingdai Yang, Jingming Yu, Pan Pan, Xin Yang, Ligang Liu, Zixiang Xiong, Shuguang Cui | We present a deep reinforcement learning method of progressive view inpainting for 3D point scene completion under volume guidance, achieving high-quality scene reconstruction from only a single depth image with severe occlusion. |
25 | Video Action Transformer Network | Rohit Girdhar, Joao Carreira, Carl Doersch, Andrew Zisserman | We introduce the Action Transformer model for recognizing and localizing human actions in video clips. |
26 | Timeception for Complex Action Recognition | Noureldien Hussein, Efstratios Gavves, Arnold W.M. Smeulders | This paper focuses on the temporal aspect of recognizing human activities in videos, an important visual cue that has long been undervalued. |
27 | STEP: Spatio-Temporal Progressive Learning for Video Action Detection | Xitong Yang, Xiaodong Yang, Ming-Yu Liu, Fanyi Xiao, Larry S. Davis, Jan Kautz | In this paper, we propose the Spatio-TEmporal Progressive (STEP) action detector, a progressive learning framework for spatio-temporal action detection in videos. |
28 | Relational Action Forecasting | Chen Sun, Abhinav Shrivastava, Carl Vondrick, Rahul Sukthankar, Kevin Murphy, Cordelia Schmid | This paper focuses on multi-person action forecasting in videos. |
29 | Long-Term Feature Banks for Detailed Video Understanding | Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krahenbuhl, Ross Girshick | In this paper, we enable existing video models to do the same. |
30 | Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic Scenes | Yuke Li | To this end, we propose a novel Imitative Decision Learning (IDL) approach. |
31 | What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment | Paritosh Parmar, Brendan Tran Morris | In this paper, we propose to learn spatio-temporal features that explain three related tasks – fine-grained action recognition, commentary generation, and estimating the AQA score. |
32 | MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation | Shuangjie Xu, Daizong Liu, Linchao Bao, Wei Liu, Pan Zhou | In this paper, we propose a novel approach to defer the decision making for a target object in each frame, until a global view can be established with the entire video being taken into consideration. |
33 | Language-Driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model | Weining Wang, Yan Huang, Liang Wang | Considering that current methods are generally time-consuming due to the dense frame-processing manner, we propose a recurrent neural network based reinforcement learning model which selectively observes a sequence of frames and associates the given sentence with video content in a matching-based manner. |
34 | Gaussian Temporal Awareness Networks for Action Localization | Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei | In this paper, we propose to address the problem by introducing Gaussian kernels to dynamically optimize temporal scale of each action proposal. |
35 | Efficient Video Classification Using Fewer Frames | Shweta Bhardwaj, Mukundhan Srinivasan, Mitesh M. Khapra | In this work, we focus on building compute-efficient video classification models which process fewer frames and hence require fewer FLOPs. |
36 | Parsing R-CNN for Instance-Level Human Analysis | Lu Yang, Qing Song, Zhihui Wang, Ming Jiang | In this paper, we present an end-to-end pipeline for solving the instance-level human analysis, named Parsing R-CNN. |
37 | Large Scale Incremental Learning | Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, Yun Fu | We propose a simple and effective method to address this data imbalance issue. |
38 | TopNet: Structural Point Cloud Decoder | Lyne P. Tchapmi, Vineet Kosaraju, Hamid Rezatofighi, Ian Reid, Silvio Savarese | In this work, we propose a novel decoder that generates a structured point cloud without assuming any specific structure or topology on the underlying point set. |
39 | Perceive Where to Focus: Learning Visibility-Aware Part-Level Features for Partial Person Re-Identification | Yifan Sun, Qin Xu, Yali Li, Chi Zhang, Yikang Li, Shengjin Wang, Jian Sun | We propose a Visibility-aware Part Model (VPM) for partial re-ID, which learns to perceive the visibility of regions through self-supervision. |
40 | Meta-Transfer Learning for Few-Shot Learning | Qianru Sun, Yaoyao Liu, Tat-Seng Chua, Bernt Schiele | In this paper, we propose a novel few-shot learning method called meta-transfer learning (MTL), which learns to adapt a deep NN for few-shot learning tasks. |
41 | Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation | Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid | In this paper, we propose to train convolutional neural networks (CNNs) with both binarized weights and activations, leading to quantized models specifically for mobile devices with limited power capacity and computation resources. |
42 | Deep RNN Framework for Visual Sequential Applications | Bo Pang, Kaiwen Zha, Hanwen Cao, Chen Shi, Cewu Lu | To deal with this, we propose a new recurrent neural framework that can be stacked deep effectively. |
43 | Graph-Based Global Reasoning Networks | Yunpeng Chen, Marcus Rohrbach, Zhicheng Yan, Yan Shuicheng, Jiashi Feng, Yannis Kalantidis | In this work, we propose a new approach for reasoning globally in which a set of features are globally aggregated over the coordinate space and then projected to an interaction space where relational reasoning can be efficiently computed. |
44 | SSN: Learning Sparse Switchable Normalization via SparsestMax | Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo | This work addresses this issue by presenting Sparse Switchable Normalization (SSN) where the importance ratios are constrained to be sparse. |
45 | Spherical Fractal Convolutional Neural Networks for Point Cloud Recognition | Yongming Rao, Jiwen Lu, Jie Zhou | We present a generic, flexible and 3D rotation invariant framework based on spherical symmetry for point cloud recognition. |
46 | Learning to Generate Synthetic Data via Compositing | Shashank Tripathi, Siddhartha Chandra, Amit Agrawal, Ambrish Tyagi, James M. Rehg, Visesh Chari | We present a task-specific approach to synthetic data generation. |
47 | Divide and Conquer the Embedding Space for Metric Learning | Artsiom Sanakoyeu, Vadim Tschernezki, Uta Buchler, Bjorn Ommer | In this work, we propose a novel easy-to-implement divide and conquer approach for deep metric learning, which significantly improves the state-of-the-art performance of metric learning. |
48 | Latent Space Autoregression for Novelty Detection | Davide Abati, Angelo Porrello, Simone Calderara, Rita Cucchiara | In our proposal, we design a general unsupervised framework where we equip a deep autoencoder with a parametric density estimator that learns the probability distribution underlying the latent representations with an autoregressive procedure. |
49 | Attending to Discriminative Certainty for Domain Adaptation | Vinod Kumar Kurmi, Shanu Kumar, Vinay P. Namboodiri | In this paper, we aim to solve for unsupervised domain adaptation of classifiers where we have access to label information for the source domain while these are not available for a target domain. |
50 | Feature Denoising for Improving Adversarial Robustness | Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan L. Yuille, Kaiming He | Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising. |
51 | Selective Kernel Networks | Xiang Li, Wenhai Wang, Xiaolin Hu, Jian Yang | We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. |
52 | On Implicit Filter Level Sparsity in Convolutional Neural Networks | Dushyant Mehta, Kwang In Kim, Christian Theobalt | We investigate filter level sparsity that emerges in convolutional neural networks (CNNs) which employ Batch Normalization and ReLU activation, and are trained with adaptive gradient descent techniques and L2 regularization or weight decay. |
53 | FlowNet3D: Learning Scene Flow in 3D Point Clouds | Xingyu Liu, Charles R. Qi, Leonidas J. Guibas | In this work, we propose a novel deep neural network named FlowNet3D that learns scene flow from point clouds in an end-to-end fashion. |
54 | Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks | Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese | In this work, we propose a novel memory-based policy, named Scene Memory Transformer (SMT). |
55 | Co-Occurrent Features in Semantic Segmentation | Hang Zhang, Han Zhang, Chenguang Wang, Junyuan Xie | In this paper, we go beyond global context and explore the fine-grained representation using co-occurrent features by introducing Co-occurrent Feature Model, which predicts the distribution of co-occurrent features for a given target. |
56 | Bag of Tricks for Image Classification with Convolutional Neural Networks | Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li | In this paper, we will examine a collection of such refinements and empirically evaluate their impact on the final model accuracy through ablation study. |
57 | Learning Channel-Wise Interactions for Binary Convolutional Neural Networks | Ziwei Wang, Jiwen Lu, Chenxin Tao, Jie Zhou, Qi Tian | In this paper, we propose a channel-wise interaction based binary convolutional neural network learning method (CI-BCNN) for efficient inference. |
58 | Knowledge Adaptation for Efficient Semantic Segmentation | Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan | To tackle this dilemma, we propose a knowledge distillation method tailored for semantic segmentation to improve the performance of the compact FCNs with large overall stride. |
59 | Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness Against Adversarial Attack | Zhezhi He, Adnan Siraj Rakin, Deliang Fan | In this work, we propose Parametric-Noise-Injection (PNI) which involves trainable Gaussian noise injection at each layer on either activation or weights through solving the Min-Max optimization problem, embedded with adversarial training. |
60 | Invariance Matters: Exemplar Memory for Domain Adaptive Person Re-Identification | Zhun Zhong, Liang Zheng, Zhiming Luo, Shaozi Li, Yi Yang | In this work, we comprehensively investigate the intra-domain variations of the target domain and propose to generalize the re-ID model w.r.t. three types of underlying invariance, i.e., exemplar-invariance, camera-invariance and neighborhood-invariance. |
61 | Dissecting Person Re-Identification From the Viewpoint of Viewpoint | Xiaoxiao Sun, Liang Zheng | To derive insights in this scientific campaign, this paper makes an early attempt in studying a particular factor, viewpoint. |
62 | Learning to Reduce Dual-Level Discrepancy for Infrared-Visible Person Re-Identification | Zhixiang Wang, Zheng Wang, Yinqiang Zheng, Yung-Yu Chuang, Shin’ichi Satoh | To address the problem, this paper introduces a novel Dual-level Discrepancy Reduction Learning (D^2RL) scheme which handles the two discrepancies separately. |
63 | Progressive Feature Alignment for Unsupervised Domain Adaptation | Chaoqi Chen, Weiping Xie, Wenbing Huang, Yu Rong, Xinghao Ding, Yue Huang, Tingyang Xu, Junzhou Huang | In this paper, we propose the Progressive Feature Alignment Network (PFAN) to align the discriminative features across domains progressively and effectively, via exploiting the intra-class variation in the target domain. |
64 | Feature-Level Frankenstein: Eliminating Variations for Discriminative Recognition | Xiaofeng Liu, Site Li, Lingsheng Kong, Wanqing Xie, Ping Jia, Jane You, B.V.K. Kumar | In this paper, we cast these problems as an adversarial minimax game in the latent space. |
65 | Learning a Deep ConvNet for Multi-Label Classification With Partial Labels | Thibaut Durand, Nazanin Mehrasa, Greg Mori | To reduce the annotation cost, we propose to train a model with partial labels i.e. only some labels are known per image. |
66 | Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression | Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, Silvio Savarese | In this paper, we address this weakness by introducing a generalized version of IoU as both a new loss and a new metric. |
67 | Densely Semantically Aligned Person Re-Identification | Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Zhibo Chen | We propose a densely semantically aligned person re-identification (re-ID) framework. By leveraging the estimation of the dense semantics of a person image, we construct a set of densely semantically aligned part images (DSAP-images), where the same spatial positions have the same semantics across different person images. |
68 | Generalising Fine-Grained Sketch-Based Image Retrieval | Kaiyue Pang, Ke Li, Yongxin Yang, Honggang Zhang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song | In this paper, we identify cross-category generalisation for FG-SBIR as a domain generalisation problem, and propose the first solution. |
69 | Adapting Object Detectors via Selective Cross-Domain Alignment | Xinge Zhu, Jiangmiao Pang, Ceyuan Yang, Jianping Shi, Dahua Lin | Motivated by this, we propose a novel approach to domain adaptation for object detection to handle the issues in “where to look” and “how to align”. |
70 | Cyclic Guidance for Weakly Supervised Joint Detection and Segmentation | Yunhang Shen, Rongrong Ji, Yan Wang, Yongjian Wu, Liujuan Cao | In particular, we present an efficient and effective framework termed Weakly Supervised Joint Detection and Segmentation (WS-JDS). |
71 | Thinking Outside the Pool: Active Training Image Creation for Relative Attributes | Aron Yu, Kristen Grauman | We propose an active image generation approach to address this issue. |
72 | Generalizable Person Re-Identification by Domain-Invariant Mapping Network | Jifei Song, Yongxin Yang, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales | In this work, a novel deep ReID model termed Domain-Invariant Mapping Network (DIMN) is proposed. |
73 | Visual Attention Consistency Under Image Transforms for Multi-Label Image Classification | Hao Guo, Kang Zheng, Xiaochuan Fan, Hongkai Yu, Song Wang | To address this problem, we propose a two-branch network with an original image and its transformed image as inputs and introduce a new attention consistency loss that measures the attention heatmap consistency between two branches. |
74 | Re-Ranking via Metric Fusion for Object Retrieval and Person Re-Identification | Song Bai, Peng Tang, Philip H.S. Torr, Longin Jan Latecki | Based on the analysis, we propose a unified yet robust algorithm which inherits their advantages and discards their disadvantages. |
75 | Unsupervised Open Domain Recognition by Semantic Discrepancy Minimization | Junbao Zhuo, Shuhui Wang, Shuhao Cui, Qingming Huang | We address the unsupervised open domain recognition (UODR) problem, where the categories in the labeled source domain S are only a subset of those in the unlabeled target domain T. |
76 | Weakly Supervised Person Re-Identification | Jingke Meng, Sheng Wu, Wei-Shi Zheng | We cast this weakly supervised person re-id challenge into a multi-instance multi-label learning (MIML) problem. |
77 | PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud | Shaoshuai Shi, Xiaogang Wang, Hongsheng Li | In this paper, we propose PointRCNN for 3D object detection from raw point cloud. |
78 | Automatic Adaptation of Object Detectors to New Domains Using Self-Training | Aruni RoyChowdhury, Prithvijit Chakrabarty, Ashish Singh, SouYoung Jin, Huaizu Jiang, Liangliang Cao, Erik Learned-Miller | A modified knowledge distillation loss is proposed, and we investigate several ways of assigning soft-labels to the training examples from the target domain. |
79 | Deep Sketch-Shape Hashing With Segmented 3D Stochastic Viewing | Jiaxin Chen, Jie Qin, Li Liu, Fan Zhu, Fumin Shen, Jin Xie, Ling Shao | In this paper, we propose a novel framework for efficient sketch-based 3D shape retrieval, i.e., Deep Sketch-Shape Hashing (DSSH), which tackles the challenging problem from two perspectives. |
80 | Generative Dual Adversarial Network for Generalized Zero-Shot Learning | He Huang, Changhu Wang, Philip S. Yu, Chang-Dong Wang | In this paper, we propose a novel model that provides a unified framework for three different approaches: visual->semantic mapping, semantic->visual mapping, and metric learning. |
81 | Query-Guided End-To-End Person Search | Bharti Munjal, Sikandar Amin, Federico Tombari, Fabio Galasso | We introduce a novel query-guided end-to-end person search network (QEEPS) to address both aspects. |
82 | Libra R-CNN: Towards Balanced Learning for Object Detection | Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, Dahua Lin | In this work, we carefully revisit the standard training practice of detectors, and find that the detection performance is often limited by imbalance during the training process, which generally consists of three levels: sample level, feature level, and objective level. |
83 | Learning a Unified Classifier Incrementally via Rebalancing | Saihui Hou, Xinyu Pan, Chen Change Loy, Zilei Wang, Dahua Lin | In this work, we develop a new framework for incrementally learning a unified classifier, e.g. a classifier that treats both old and new classes uniformly. |
84 | Feature Selective Anchor-Free Module for Single-Shot Object Detection | Chenchen Zhu, Yihui He, Marios Savvides | We motivate and present feature selective anchor-free (FSAF) module, a simple and effective building block for single-shot object detectors. |
85 | Bottom-Up Object Detection by Grouping Extreme and Center Points | Xingyi Zhou, Jiacheng Zhuo, Philipp Krahenbuhl | In this paper, we show that bottom-up approaches still perform competitively. |
86 | Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples | Zihao Liu, Qi Liu, Tao Liu, Nuo Xu, Xue Lin, Yanzhi Wang, Wujie Wen | To overcome these limitations, we propose a JPEG-based defensive compression framework, namely “feature distillation”, to effectively rectify adversarial examples without impacting classification accuracy on benign data. |
87 | SCOPS: Self-Supervised Co-Part Segmentation | Wei-Chih Hung, Varun Jampani, Sifei Liu, Pavlo Molchanov, Ming-Hsuan Yang, Jan Kautz | We propose a self-supervised deep learning approach for part segmentation, where we devise several loss functions that aid in predicting part segments that are geometrically concentrated, robust to object variations, and semantically consistent across different object instances. |
88 | Unsupervised Moving Object Detection via Contextual Information Separation | Yanchao Yang, Antonio Loquercio, Davide Scaramuzza, Stefano Soatto | We propose an adversarial contextual model for detecting moving objects in images. |
89 | Pose2Seg: Detection Free Human Instance Segmentation | Song-Hai Zhang, Ruilong Li, Xin Dong, Paul Rosin, Zixi Cai, Xi Han, Dingcheng Yang, Haozhi Huang, Shi-Min Hu | In this paper, we present a brand new pose-based instance segmentation framework for humans which separates instances based on human pose, rather than proposal region detection. We also introduce a new benchmark, “Occluded Human (OCHuman)”, which focuses on occluded humans with comprehensive annotations including bounding-box, human pose and instance masks. |
90 | DrivingStereo: A Large-Scale Dataset for Stereo Matching in Autonomous Driving Scenarios | Guorun Yang, Xiao Song, Chaoqin Huang, Zhidong Deng, Jianping Shi, Bolei Zhou | In this paper, we construct a novel large-scale stereo dataset named DrivingStereo. |
91 | PartNet: A Large-Scale Benchmark for Fine-Grained and Hierarchical Part-Level 3D Object Understanding | Kaichun Mo, Shilin Zhu, Angel X. Chang, Li Yi, Subarna Tripathi, Leonidas J. Guibas, Hao Su | We present PartNet: a consistent, large-scale dataset of 3D objects annotated with fine-grained, instance-level, and hierarchical 3D part information. |
92 | A Dataset and Benchmark for Large-Scale Multi-Modal Face Anti-Spoofing | Shifeng Zhang, Xiaobo Wang, Ajian Liu, Chenxu Zhao, Jun Wan, Sergio Escalera, Hailin Shi, Zezheng Wang, Stan Z. Li | To facilitate face anti-spoofing research, we introduce a large-scale multi-modal dataset, namely CASIA-SURF, which is the largest publicly available dataset for face anti-spoofing in terms of both subjects and visual modalities. We also provide a measurement set, evaluation protocol and training/validation/testing subsets, developing a new benchmark for face anti-spoofing. |
93 | Unsupervised Learning of Consensus Maximization for 3D Vision Problems | Thomas Probst, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool | In this paper, we propose for the first time an unsupervised learning framework for consensus maximization, in the context of solving 3D vision problems. |
94 | VizWiz-Priv: A Dataset for Recognizing the Presence and Purpose of Private Visual Information in Images Taken by Blind People | Danna Gurari, Qing Li, Chi Lin, Yinan Zhao, Anhong Guo, Abigale Stangl, Jeffrey P. Bigham | We introduce the first visual privacy dataset originating from people who are blind in order to better understand their privacy disclosures and to encourage the development of algorithms that can assist in preventing their unintended disclosures. |
95 | Structural Relational Reasoning of Point Clouds | Yueqi Duan, Yu Zheng, Jiwen Lu, Jie Zhou, Qi Tian | In this paper, we propose an effective plug-and-play module called the structural relation network (SRN) to reason about the structural dependencies of local regions in 3D point clouds. |
96 | MVF-Net: Multi-View 3D Face Morphable Model Regression | Fanzi Wu, Linchao Bao, Yajing Chen, Yonggen Ling, Yibing Song, Songnan Li, King Ngi Ngan, Wei Liu | We in this paper explore 3DMM-based shape recovery in a different setting, where a set of multi-view facial images are given as input. |
97 | Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction | Chen-Hsuan Lin, Oliver Wang, Bryan C. Russell, Eli Shechtman, Vladimir G. Kim, Matthew Fisher, Simon Lucey | In this paper, we address the problem of 3D object mesh reconstruction from RGB videos. |
98 | Guided Stereo Matching | Matteo Poggi, Davide Pallotti, Fabio Tosi, Stefano Mattoccia | Therefore, in this paper, we introduce Guided Stereo Matching, a novel paradigm that leverages a small amount of sparse, yet reliable depth measurements retrieved from an external source to ameliorate this weakness. |
99 | Unsupervised Event-Based Learning of Optical Flow, Depth, and Egomotion | Alex Zihao Zhu, Liangzhe Yuan, Kenneth Chaney, Kostas Daniilidis | In this work, we propose a novel framework for unsupervised learning for event cameras that learns motion information from only the event stream. |
100 | Modeling Local Geometric Structure of 3D Point Clouds Using Geo-CNN | Shiyi Lan, Ruichi Yu, Gang Yu, Larry S. Davis | To address this problem, we propose Geo-CNN, which applies a generic convolution-like operation dubbed as GeoConv to each point and its local neighborhood. |
101 | 3D Point Capsule Networks | Yongheng Zhao, Tolga Birdal, Haowen Deng, Federico Tombari | In this paper, we propose 3D point-capsule networks, an auto-encoder designed to process sparse 3D point clouds while preserving spatial arrangements of the input data. |
102 | GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving | Buyu Li, Wanli Ouyang, Lu Sheng, Xingyu Zeng, Xiaogang Wang | We present an efficient 3D object detection framework based on a single RGB image in the scenario of autonomous driving. |
103 | Single-Image Piece-Wise Planar 3D Reconstruction via Associative Embedding | Zehao Yu, Jia Zheng, Dongze Lian, Zihan Zhou, Shenghua Gao | To tackle this problem, we propose a novel two-stage method based on associative embedding, inspired by its recent success in instance segmentation. |
104 | 3DN: 3D Deformation Network | Weiyue Wang, Duygu Ceylan, Radomir Mech, Ulrich Neumann | Given such a source 3D model and a target which can be a 2D image, 3D model, or a point cloud acquired as a depth scan, we introduce 3DN, an end-to-end network that deforms the source model to resemble the target. |
105 | HorizonNet: Learning Room Layout With 1D Representation and Pano Stretch Data Augmentation | Cheng Sun, Chi-Wei Hsiao, Min Sun, Hwann-Tzong Chen | We present a new approach to the problem of estimating the 3D room layout from a single panoramic image. |
106 | Deep Fitting Degree Scoring Network for Monocular 3D Object Detection | Lijie Liu, Jiwen Lu, Chunjing Xu, Qi Tian, Jie Zhou | In this paper, we propose to learn a deep fitting degree scoring network for monocular 3D object detection, which aims to conclusively score the fitting degree between proposals and objects. |
107 | Pushing the Envelope for RGB-Based Dense 3D Hand Pose Estimation via Neural Rendering | Seungryul Baek, Kwang In Kim, Tae-Kyun Kim | Pushing the Envelope for RGB-Based Dense 3D Hand Pose Estimation via Neural Rendering |
108 | Self-Supervised Learning of 3D Human Pose Using Multi-View Geometry | Muhammed Kocabas, Salih Karagoz, Emre Akbas | To address these problems, we present EpipolarPose, a self-supervised learning method for 3D human pose estimation, which does not need any 3D ground-truth data or camera extrinsics. |
109 | FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation From a Single Image | Tsun-Yi Yang, Yi-Ting Chen, Yen-Yu Lin, Yung-Yu Chuang | This paper proposes a method for head pose estimation from a single image. |
110 | Dense 3D Face Decoding Over 2500FPS: Joint Texture & Shape Convolutional Mesh Decoders | Yuxiang Zhou, Jiankang Deng, Irene Kotsia, Stefanos Zafeiriou | In this paper, we present the first, to the best of our knowledge, non-linear 3DMMs by learning joint texture and shape auto-encoders using direct mesh convolutions. |
111 | Does Learning Specific Features for Related Parts Help Human Pose Estimation? | Wei Tang, Ying Wu | Ablation experiments indicate learning specific features significantly improves the localization of occluded parts and thus benefits HPE. |
112 | Linkage Based Face Clustering via Graph Convolution Network | Zhongdao Wang, Liang Zheng, Yali Li, Shengjin Wang | In this paper, we present an accurate and scalable approach to the face clustering task. |
113 | Towards High-Fidelity Nonlinear 3D Face Morphable Model | Luan Tran, Feng Liu, Xiaoming Liu | To address this problem, this paper presents a novel approach that learns additional proxies as a means to side-step strong regularizations, as well as leverages to promote detailed shape/albedo. |
114 | RegularFace: Deep Face Recognition via Exclusive Regularization | Kai Zhao, Jingyi Xu, Ming-Ming Cheng | In this paper, we propose the 'exclusive regularization' that focuses on the other aspect of discriminability, the inter-class separability, which is neglected in many recent approaches. |
115 | BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation | Wanhua Li, Jiwen Lu, Jianjiang Feng, Chunjing Xu, Jie Zhou, Qi Tian | In this paper, we propose BridgeNet for age estimation, which aims to mine the continuous relation between age labels effectively. |
116 | GANFIT: Generative Adversarial Network Fitting for High Fidelity 3D Face Reconstruction | Baris Gecer, Stylianos Ploumpis, Irene Kotsia, Stefanos Zafeiriou | In this paper, we take a radically different approach and harness the power of Generative Adversarial Networks (GANs) and DCNNs in order to reconstruct the facial texture and shape from single images. |
117 | Improving the Performance of Unimodal Dynamic Hand-Gesture Recognition With Multimodal Training | Mahdi Abavisani, Hamid Reza Vaezi Joze, Vishal M. Patel | We present an efficient approach for leveraging the knowledge from multiple modalities in training unimodal 3D convolutional neural networks (3D-CNNs) for the task of dynamic hand gesture recognition. |
118 | Learning to Reconstruct People in Clothing From a Single RGB Camera | Thiemo Alldieck, Marcus Magnor, Bharat Lal Bhatnagar, Christian Theobalt, Gerard Pons-Moll | We present Octopus, a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving with a reconstruction accuracy of 4 to 5mm, while being orders of magnitude faster than previous methods. |
119 | Distilled Person Re-Identification: Towards a More Scalable System | Ancong Wu, Wei-Shi Zheng, Xiaowei Guo, Jian-Huang Lai | To solve these problems in a unified system, we propose a Multi-teacher Adaptive Similarity Distillation Framework, which requires only a few labelled identities of target domain to transfer knowledge from multiple teacher models to a user-specified lightweight student model without accessing source domain data. |
120 | A Perceptual Prediction Framework for Self Supervised Event Segmentation | Sathyanarayanan N. Aakur, Sudeep Sarkar | In this paper, we tackle the problem of self-supervised temporal segmentation that alleviates the need for any supervision in the form of labels (full supervision) or temporal ordering (weak supervision). |
121 | COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis | Yansong Tang, Dajun Ding, Yongming Rao, Yu Zheng, Danyang Zhang, Lili Zhao, Jiwen Lu, Jie Zhou | To address these problems, we introduce a large-scale dataset called “COIN” for COmprehensive INstruction video analysis. |
122 | Recurrent Attentive Zooming for Joint Crowd Counting and Precise Localization | Chenchen Liu, Xinyu Weng, Yadong Mu | To address this issue, this work proposes a novel framework that simultaneously solves two inherently related tasks: crowd counting and localization. |
123 | An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition | Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan | In this paper, we propose a novel Attention Enhanced Graph Convolutional LSTM Network (AGC-LSTM) for human action recognition from skeleton data. |
124 | Graph Convolutional Label Noise Cleaner: Train a Plug-And-Play Action Classifier for Anomaly Detection | Jia-Xing Zhong, Nannan Li, Weijie Kong, Shan Liu, Thomas H. Li, Ge Li | In this paper, we provide a new perspective, i.e., a supervised learning task under noisy labels. |
125 | MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment | Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang, Larry S. Davis | In this paper, we present Moment Alignment Network (MAN), a novel framework that unifies the candidate moment encoding and temporal structural reasoning in a single-shot feed-forward network. |
126 | Less Is More: Learning Highlight Detection From Video Duration | Bo Xiong, Yannis Kalantidis, Deepti Ghadiyaram, Kristen Grauman | We propose a scalable unsupervised solution that exploits video duration as an implicit supervision signal. |
127 | DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition | Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan | To remedy these issues, we propose a lightweight generator network, which reduces noises in motion vectors and captures fine motion details, achieving a more Discriminative Motion Cue (DMC) representation. |
128 | AdaFrame: Adaptive Frame Selection for Fast Video Recognition | Zuxuan Wu, Caiming Xiong, Chih-Yao Ma, Richard Socher, Larry S. Davis | We present AdaFrame, a framework that adaptively selects relevant frames on a per-input basis for fast video recognition. |
129 | Spatio-Temporal Video Re-Localization by Warp LSTM | Yang Feng, Lin Ma, Wei Liu, Jiebo Luo | In this paper, we answer the question of when and where by formulating a new task, namely spatio-temporal video re-localization. |
130 | Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization | Daochang Liu, Tingting Jiang, Yizhou Wang | In this work, we first identify two underexplored problems posed by the weak supervision for temporal action localization, namely action completeness modeling and action-context separation. We then present a novel network architecture and training strategy that explicitly address these two problems. |
131 | Unsupervised Deep Tracking | Ning Wang, Yibing Song, Chao Ma, Wengang Zhou, Wei Liu, Houqiang Li | We propose an unsupervised visual tracking method in this paper. |
132 | Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers | Zhen He, Jian Li, Daxue Liu, Hangen He, David Barber | To achieve both label-free and end-to-end learning of MOT, we propose a Tracking-by-Animation framework, where a differentiable neural model first tracks objects from input frames and then animates these objects into reconstructed frames. |
133 | Fast Online Object Tracking and Segmentation: A Unifying Approach | Qiang Wang, Li Zhang, Luca Bertinetto, Weiming Hu, Philip H.S. Torr | In this paper we illustrate how to perform both visual object tracking and semi-supervised video object segmentation, in real-time, with a single simple approach. |
134 | Object Tracking by Reconstruction With View-Specific Discriminative Correlation Filters | Ugur Kart, Alan Lukezic, Matej Kristan, Joni-Kristian Kamarainen, Jiri Matas | Object Tracking by Reconstruction With View-Specific Discriminative Correlation Filters |
135 | SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints | Amir Sadeghian, Vineet Kosaraju, Ali Sadeghian, Noriaki Hirose, Hamid Rezatofighi, Silvio Savarese | We present SoPhie, an interpretable framework based on a Generative Adversarial Network (GAN), which leverages two sources of information: the path history of all the agents in a scene and the scene context information, using images of the scene. |
136 | Leveraging Shape Completion for 3D Siamese Tracking | Silvio Giancola, Jesus Zarzar, Bernard Ghanem | In this paper, we investigate the versatility of Shape Completion for 3D Object Tracking in LIDAR point clouds. |
137 | Target-Aware Deep Tracking | Xin Li, Chao Ma, Baoyuan Wu, Zhenyu He, Ming-Hsuan Yang | In this paper, we propose a novel scheme to learn target-aware features, which can better recognize the targets undergoing significant appearance variations than pre-trained deep features. |
138 | Spatiotemporal CNN for Video Object Segmentation | Kai Xu, Longyin Wen, Guorong Li, Liefeng Bo, Qingming Huang | In this paper, we present a unified, end-to-end trainable spatiotemporal CNN model for VOS, which consists of two branches, i.e., the temporal coherence branch and the spatial segmentation branch. |
139 | Towards Rich Feature Discovery With Class Activation Maps Augmentation for Person Re-Identification | Wenjie Yang, Houjing Huang, Zhang Zhang, Xiaotang Chen, Kaiqi Huang, Shu Zhang | This paper proposes to discover diverse discriminative visual cues without extra assistance, e.g., pose estimation, human parsing. |
140 | Wide-Context Semantic Image Extrapolation | Yi Wang, Xin Tao, Xiaoyong Shen, Jiaya Jia | We propose a semantic regeneration network with several special contributions and use multiple spatial related losses to address these issues. |
141 | End-To-End Time-Lapse Video Synthesis From a Single Outdoor Image | Seonghyeon Nam, Chongyang Ma, Menglei Chai, William Brendel, Ning Xu, Seon Joo Kim | In this paper, we present an end-to-end solution to synthesize a time-lapse video from a single outdoor image using deep neural networks. |
142 | GIF2Video: Color Dequantization and Temporal Interpolation of GIF Images | Yang Wang, Haibin Huang, Chuan Wang, Tong He, Jue Wang, Minh Hoai | In this paper, we propose GIF2Video, the first learning-based method for enhancing the visual quality of GIFs in the wild. We introduce two large datasets, namely GIF-Faces and GIF-Moments, for both training and evaluation. |
143 | Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis | Qi Mao, Hsin-Ying Lee, Hung-Yu Tseng, Siwei Ma, Ming-Hsuan Yang | In this work, we propose a simple yet effective regularization term to address the mode collapse issue for cGANs. |
144 | Pluralistic Image Completion | Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai | In this paper, we present an approach for pluralistic image completion – the task of generating multiple and diverse plausible solutions for image completion. |
145 | Salient Object Detection With Pyramid Attention and Salient Edges | Wenguan Wang, Shuyang Zhao, Jianbing Shen, Steven C. H. Hoi, Ali Borji | This paper presents a new method for detecting salient objects in images using convolutional neural networks (CNNs). |
146 | Latent Filter Scaling for Multimodal Unsupervised Image-To-Image Translation | Yazeed Alharbi, Neil Smith, Peter Wonka | We present a simple method that produces higher-quality images than the current state of the art while maintaining the same amount of multimodal diversity. |
147 | Attention-Aware Multi-Stroke Style Transfer | Yuan Yao, Jianqiang Ren, Xuansong Xie, Weidong Liu, Yong-Jin Liu, Jun Wang | In this paper, we tackle these limitations by developing an attention-aware multi-stroke style transfer model. |
148 | Feedback Adversarial Learning: Spatial Feedback for Improving Generative Adversarial Networks | Minyoung Huh, Shao-Hua Sun, Ning Zhang | We propose a feedback adversarial learning (FAL) framework that can improve existing generative adversarial networks by leveraging spatial feedback from the discriminator. |
149 | Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting | Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo | In this paper, we propose a Pyramid-context Encoder Network (denoted as PEN-Net) for image inpainting by deep generative models. |
150 | Example-Guided Style-Consistent Image Synthesis From Semantic Labeling | Miao Wang, Guo-Ye Yang, Ruilong Li, Run-Ze Liang, Song-Hai Zhang, Peter M. Hall, Shi-Min Hu | We propose a solution to the example-guided image synthesis problem using conditional generative adversarial networks with style consistency. |
151 | MirrorGAN: Learning Text-To-Image Generation by Redescription | Tingting Qiao, Jing Zhang, Duanqing Xu, Dacheng Tao | In this paper, we address this problem by proposing a novel global-local attentive and semantic-preserving text-to-image-to-text framework called MirrorGAN. |
152 | Light Field Messaging With Deep Photographic Steganography | Eric Wengrowski, Kristin Dana | We develop Light Field Messaging (LFM), a process of embedding, transmitting, and receiving hidden information in video that is displayed on a screen and captured by a handheld camera. To learn this CDTF we introduce a dataset (Camera-Display 1M) of 1,000,000 camera-captured images collected from 25 camera-display pairs. |
153 | Im2Pencil: Controllable Pencil Illustration From Photographs | Yijun Li, Chen Fang, Aaron Hertzmann, Eli Shechtman, Ming-Hsuan Yang | We propose a high-quality photo-to-pencil translation method with fine-grained control over the drawing style. |
154 | When Color Constancy Goes Wrong: Correcting Improperly White-Balanced Images | Mahmoud Afifi, Brian Price, Scott Cohen, Michael S. Brown | This paper introduces the first method to explicitly address this problem. |
155 | Beyond Volumetric Albedo — A Surface Optimization Framework for Non-Line-Of-Sight Imaging | Chia-Yin Tsai, Aswin C. Sankaranarayanan, Ioannis Gkioulekas | We introduce an analysis-by-synthesis framework that can reconstruct complex shape and reflectance of an NLOS object. |
156 | Reflection Removal Using a Dual-Pixel Sensor | Abhijith Punnappurath, Michael S. Brown | In this paper, we show that most cameras have an overlooked mechanism that can greatly simplify this task. As part of this work, we provide the first image dataset for reflection removal consisting of the sub-aperture views from the DP sensor. |
157 | Practical Coding Function Design for Time-Of-Flight Imaging | Felipe Gutierrez-Barragan, Syed Azer Reza, Andreas Velten, Mohit Gupta | We present a constrained optimization approach for designing practical coding functions that adhere to hardware constraints. |
158 | Meta-SR: A Magnification-Arbitrary Network for Super-Resolution | Xuecai Hu, Haoyuan Mu, Xiangyu Zhang, Zilei Wang, Tieniu Tan, Jian Sun | In this work, we propose a novel method called Meta-SR to solve, for the first time, super-resolution of arbitrary scale factors (including non-integer scale factors) with a single model. |
159 | Multispectral and Hyperspectral Image Fusion by MS/HS Fusion Net | Qi Xie, Minghao Zhou, Qian Zhao, Deyu Meng, Wangmeng Zuo, Zongben Xu | In this paper, we propose a model-based deep learning approach for merging an HrMS image and an LrHS image to generate a high-resolution hyperspectral (HrHS) image. |
160 | Learning Attraction Field Representation for Robust Line Segment Detection | Nan Xue, Song Bai, Fudong Wang, Gui-Song Xia, Tianfu Wu, Liangpei Zhang | This paper presents a region-partition based attraction field dual representation for line segment maps, and thus poses the problem of line segment detection (LSD) as the region coloring problem. |
161 | Blind Super-Resolution With Iterative Kernel Correction | Jinjin Gu, Hannan Lu, Wangmeng Zuo, Chao Dong | In this paper, we propose an Iterative Kernel Correction (IKC) method for blur kernel estimation in blind SR problem, where the blur kernels are unknown. |
162 | Video Magnification in the Wild Using Fractional Anisotropy in Temporal Distribution | Shoichiro Takeda, Yasunori Akagi, Kazuki Okami, Megumi Isogai, Hideaki Kimata | In this paper, we present a novel method using fractional anisotropy (FA) to detect only meaningful subtle changes without the aforementioned requirements. |
163 | Attentive Feedback Network for Boundary-Aware Salient Object Detection | Mengyang Feng, Huchuan Lu, Errui Ding | In this paper, we design the Attentive Feedback Modules (AFMs) to better explore the structure of objects. |
164 | Heavy Rain Image Restoration: Integrating Physics Model and Conditional Adversarial Learning | Ruoteng Li, Loong-Fah Cheong, Robby T. Tan | In this paper, we propose a novel method to address these problems. |
165 | Learning to Calibrate Straight Lines for Fisheye Image Rectification | Zhucun Xue, Nan Xue, Gui-Song Xia, Weiming Shen | This paper presents a new deep-learning based method to simultaneously calibrate the intrinsic parameters of a fisheye lens and rectify the distorted images. To train and evaluate the proposed model, we also create a new large-scale dataset labeled with corresponding distortion parameters and well-annotated distorted lines. |
166 | Camera Lens Super-Resolution | Chang Chen, Zhiwei Xiong, Xinmei Tian, Zheng-Jun Zha, Feng Wu | In this paper, we investigate SR from the perspective of camera lenses, named as CameraSR, which aims to alleviate the intrinsic tradeoff between resolution (R) and field-of-view (V) in realistic imaging systems. |
167 | Frame-Consistent Recurrent Video Deraining With Dual-Level Flow | Wenhan Yang, Jiaying Liu, Jiashi Feng | In this paper, we address the problem of rain removal from videos by proposing a more comprehensive framework that considers the additional degradation factors in real scenes neglected in previous works. |
168 | Deep Plug-And-Play Super-Resolution for Arbitrary Blur Kernels | Kai Zhang, Wangmeng Zuo, Lei Zhang | In this paper, we propose a principled formulation and framework by extending bicubic degradation based deep SISR with the help of plug-and-play framework to handle LR images with arbitrary blur kernels. |
169 | Sea-Thru: A Method for Removing Water From Underwater Images | Derya Akkaynak, Tali Treibitz | Here, we present a method that recovers color with the revised model using RGBD images. |
170 | Deep Network Interpolation for Continuous Imagery Effect Transition | Xintao Wang, Ke Yu, Chao Dong, Xiaoou Tang, Chen Change Loy | Unlike existing methods that require a specific design to achieve one particular transition (e.g., style transfer), we propose a simple yet universal approach to attain a smooth control of diverse imagery effects in many low-level vision tasks, including image restoration, image-to-image translation, and style transfer. |
171 | Spatially Variant Linear Representation Models for Joint Filtering | Jinshan Pan, Jiangxin Dong, Jimmy S. Ren, Liang Lin, Jinhui Tang, Ming-Hsuan Yang | Different from existing algorithms that rely on locally linear models or hand-designed objective functions to extract the structural information from the guidance image, we propose a new joint filter based on a spatially variant linear representation model (SVLRM), where the target image is linearly represented by the guidance image. |
172 | Toward Convolutional Blind Denoising of Real Photographs | Shi Guo, Zifei Yan, Kai Zhang, Wangmeng Zuo, Lei Zhang | In order to improve the generalization ability of deep CNN denoisers, we suggest training a convolutional blind denoising network (CBDNet) with a more realistic noise model and real-world noisy-clean image pairs. |
173 | Towards Real Scene Super-Resolution With Raw Images | Xiangyu Xu, Yongrui Ma, Wenxiu Sun | To solve the first problem, we propose a new pipeline to generate realistic training data by simulating the imaging process of digital cameras. |
174 | ODE-Inspired Network Design for Single Image Super-Resolution | Xiangyu He, Zitao Mo, Peisong Wang, Yang Liu, Mingyuan Yang, Jian Cheng | In this paper, we propose to adopt an ordinary differential equation (ODE)-inspired design scheme for single image super-resolution, which have brought us a new understanding of ResNet in classification problems. |
175 | Blind Image Deblurring With Local Maximum Gradient Prior | Liang Chen, Faming Fang, Tingting Wang, Guixu Zhang | In this paper, we present a blind deblurring method based on Local Maximum Gradient (LMG) prior. |
176 | Attention-Guided Network for Ghost-Free High Dynamic Range Imaging | Qingsen Yan, Dong Gong, Qinfeng Shi, Anton van den Hengel, Chunhua Shen, Ian Reid, Yanning Zhang | To avoid the ghosting from the source, we propose a novel attention-guided end-to-end deep neural network (AHDRNet) to produce high-quality ghost-free HDR images. |
177 | Searching for a Robust Neural Architecture in Four GPU Hours | Xuanyi Dong, Yi Yang | We propose an efficient NAS approach, which learns the searching approach by gradient descent. |
178 | Hierarchy Denoising Recursive Autoencoders for 3D Scene Layout Prediction | Yifei Shi, Angel X. Chang, Zhelun Wu, Manolis Savva, Kai Xu | We present a variational denoising recursive autoencoder (VDRAE) that generates and iteratively refines a hierarchical representation of 3D object layouts, interleaving bottom-up encoding for context aggregation and top-down decoding for propagation. |
179 | Adaptively Connected Neural Networks | Guangrun Wang, Keze Wang, Liang Lin | This paper presents a novel adaptively connected neural network (ACNet) to improve the traditional convolutional neural networks (CNNs) in two aspects. |
180 | CrDoCo: Pixel-Level Domain Transfer With Cross-Domain Consistency | Yun-Chun Chen, Yen-Yu Lin, Ming-Hsuan Yang, Jia-Bin Huang | In this paper, we present a novel pixel-wise adversarial domain adaptation algorithm. |
181 | Temporal Cycle-Consistency Learning | Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman | We introduce a self-supervised representation learning method based on the task of temporal alignment between videos. |
182 | Predicting Future Frames Using Retrospective Cycle GAN | Yong-Hoon Kwon, Min-Gyu Park | In this paper, we propose a unified generative adversarial network for predicting accurate and temporally consistent future frames over time, even in a challenging environment. |
183 | Density Map Regression Guided Detection Network for RGB-D Crowd Counting and Localization | Dongze Lian, Jing Li, Jia Zheng, Weixin Luo, Shenghua Gao | To simultaneously estimate head counts and localize heads with bounding boxes, a regression guided detection network (RDNet) is proposed for RGB-D crowd counting. |
184 | TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning | Xin Wang, Fisher Yu, Ruth Wang, Trevor Darrell, Joseph E. Gonzalez | In this work, we propose Task Aware Feature Embedding Networks (TAFE-Nets) to learn how to adapt the image representation to a new task in a meta learning fashion. |
185 | Learning Semantic Segmentation From Synthetic Data: A Geometrically Guided Input-Output Adaptation Approach | Yuhua Chen, Wen Li, Xiaoran Chen, Luc Van Gool | In this work, we propose an approach to cross-domain semantic segmentation with the auxiliary geometric information, which can also be easily obtained from virtual environments. |
186 | Attentive Single-Tasking of Multiple Tasks | Kevis-Kokitsi Maninis, Ilija Radosavovic, Iasonas Kokkinos | In this work we address task interference in universal networks by considering that a network is trained on multiple tasks, but performs one task at a time, an approach we refer to as “single-tasking multiple tasks”. |
187 | Deep Metric Learning to Rank | Fatih Cakir, Kun He, Xide Xia, Brian Kulis, Stan Sclaroff | We propose a novel deep metric learning method by revisiting the learning to rank approach. |
188 | End-To-End Multi-Task Learning With Attention | Shikun Liu, Edward Johns, Andrew J. Davison | We propose a novel multi-task learning architecture, which allows learning of task-specific feature-level attention. |
189 | Self-Supervised Learning via Conditional Motion Propagation | Xiaohang Zhan, Xingang Pan, Ziwei Liu, Dahua Lin, Chen Change Loy | In this work, we design a new learning-from-motion paradigm to bridge these gaps. |
190 | Bridging Stereo Matching and Optical Flow via Spatiotemporal Correspondence | Hsueh-Ying Lai, Yi-Hsuan Tsai, Wei-Chen Chiu | In this paper, we propose a single and principled network to jointly learn spatiotemporal correspondence for stereo matching and flow estimation, with a newly designed geometric connection as the unsupervised signal for temporally adjacent stereo pairs. |
191 | All About Structure: Adapting Structural Information Across Domains for Boosting Semantic Segmentation | Wei-Lun Chang, Hui-Po Wang, Wen-Hsiao Peng, Wei-Chen Chiu | In this paper we tackle the problem of unsupervised domain adaptation for the task of semantic segmentation, where we attempt to transfer the knowledge learned upon synthetic datasets with ground-truth labels to real-world images without any annotation. |
192 | Iterative Reorganization With Weak Spatial Constraints: Solving Arbitrary Jigsaw Puzzles for Unsupervised Representation Learning | Chen Wei, Lingxi Xie, Xutong Ren, Yingda Xia, Chi Su, Jiaying Liu, Qi Tian, Alan L. Yuille | This paper presents a novel approach which applies to jigsaw puzzles with an arbitrary grid size and dimensionality. |
193 | Revisiting Self-Supervised Visual Representation Learning | Alexander Kolesnikov, Xiaohua Zhai, Lucas Beyer | Therefore, we revisit numerous previously proposed self-supervised models, conduct a thorough large scale study and, as a result, uncover multiple crucial insights. |
194 | It’s Not About the Journey; It’s About the Destination: Following Soft Paths Under Question-Guidance for Visual Reasoning | Monica Haurilet, Alina Roitberg, Rainer Stiefelhagen | We present a new model for Visual Reasoning, aimed at capturing the interplay among individual objects in the image represented as a scene graph. |
195 | Actively Seeking and Learning From Live Data | Damien Teney, Anton van den Hengel | The approach we propose is a step toward overcoming this limitation by searching for the information required at test time. |
196 | Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing | Xihui Liu, Zihao Wang, Jing Shao, Xiaogang Wang, Hongsheng Li | To tackle this issue, we design a novel cross-modal attention-guided erasing approach, where we discard the most dominant information from either textual or visual domains to generate difficult training samples online, and to drive the model to discover complementary textual-visual correspondences. |
197 | Neighbourhood Watch: Referring Expression Comprehension via Language-Guided Graph Attention Networks | Peng Wang, Qi Wu, Jiewei Cao, Chunhua Shen, Lianli Gao, Anton van den Hengel | To capture and exploit this important information we propose a graph-based, language-guided attention mechanism. |
198 | Scene Graph Generation With External Knowledge and Image Reconstruction | Jiuxiang Gu, Handong Zhao, Zhe Lin, Sheng Li, Jianfei Cai, Mingyang Ling | In this paper, we propose a novel scene graph generation algorithm with external knowledge and image reconstruction loss to overcome these dataset issues. |
199 | Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval | Yale Song, Mohammad Soleymani | In this work, we introduce Polysemous Instance Embedding Networks (PIE-Nets) that compute multiple and diverse representations of an instance by combining global context with locally-guided features via multi-head self-attention and residual learning. To facilitate further research in video-text retrieval, we release a new dataset of 50K video-sentence pairs collected from social media, dubbed MRW (my reaction when). |
200 | MUREL: Multimodal Relational Reasoning for Visual Question Answering | Remi Cadene, Hedi Ben-younes, Matthieu Cord, Nicolas Thome | In this paper, we propose MuRel, a multimodal relational network which is learned end-to-end to reason over real images. |
201 | Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering | Chenyou Fan, Xiaofan Zhang, Shu Zhang, Wensheng Wang, Chi Zhang, Heng Huang | In this paper, we propose a novel end-to-end trainable Video Question Answering (VideoQA) framework with three major components: 1) a new heterogeneous memory which can effectively learn global context information from appearance and motion features; 2) a redesigned question memory which helps understand the complex semantics of question and highlights queried subjects; and 3) a new multimodal fusion layer which performs multi-step reasoning by attending to relevant visual and textual hints with self-updated attention. |
202 | Information Maximizing Visual Question Generation | Ranjay Krishna, Michael Bernstein, Li Fei-Fei | To overcome the non-differentiability of discrete natural language tokens, we introduce a variational continuous latent space onto which the expected answers project. |
203 | Learning to Detect Human-Object Interactions With Knowledge | Bingjie Xu, Yongkang Wong, Junnan Li, Qi Zhao, Mohan S. Kankanhalli | In this work, we focus on detecting human-object interactions (HOIs) in images, an essential step towards deeper scene understanding. |
204 | Learning Words by Drawing Images | Didac Suris, Adria Recasens, David Bau, David Harwath, James Glass, Antonio Torralba | We propose a framework for learning through drawing. |
205 | Factor Graph Attention | Idan Schwartz, Seunghak Yu, Tamir Hazan, Alexander G. Schwing | We address this issue and develop a general attention mechanism for visual dialog which operates on any number of data utilities. |
206 | Reducing Uncertainty in Undersampled MRI Reconstruction With Active Acquisition | Zizhao Zhang, Adriana Romero, Matthew J. Muckley, Pascal Vincent, Lin Yang, Michal Drozdzal | In this paper, we present a novel method for MRI reconstruction that, at inference time, dynamically selects the measurements to take and iteratively refines the prediction in order to best reduce the reconstruction error and, thus, its uncertainty. |
207 | ESIR: End-To-End Scene Text Recognition via Iterative Image Rectification | Fangneng Zhan, Shijian Lu | This paper presents an end-to-end trainable scene text recognition system (ESIR) that iteratively removes perspective distortion and text line curvature as driven by better scene text recognition performance. |
208 | ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape | Fabian Manhardt, Wadim Kehl, Adrien Gaidon | We present a deep learning method for end-to-end monocular 3D object detection and metric shape retrieval. |
209 | Collaborative Learning of Semi-Supervised Segmentation and Classification for Medical Images | Yi Zhou, Xiaodong He, Lei Huang, Li Liu, Fan Zhu, Shanshan Cui, Ling Shao | In this paper, we propose a collaborative learning method to jointly improve the performance of disease grading and lesion segmentation by semi-supervised learning with an attention mechanism. |
210 | Biologically-Constrained Graphs for Global Connectomics Reconstruction | Brian Matejek, Daniel Haehn, Haidong Zhu, Donglai Wei, Toufiq Parag, Hanspeter Pfister | We propose a third step for connectomics reconstruction pipelines to refine an over-segmentation using both local and global context with an emphasis on adhering to the underlying biology. |
211 | P3SGD: Patient Privacy Preserving SGD for Regularizing Deep CNNs in Pathological Image Classification | Bingzhe Wu, Shiwan Zhao, Guangyu Sun, Xiaolu Zhang, Zhong Su, Caihong Zeng, Zhihong Liu | To tackle the above two challenges, we introduce a novel stochastic gradient descent (SGD) scheme, named patient privacy preserving SGD (P3SGD), which performs the model update of the SGD at the patient level via a large-step update built upon each patient’s data. |
212 | Elastic Boundary Projection for 3D Medical Image Segmentation | Tianwei Ni, Lingxi Xie, Huangjie Zheng, Elliot K. Fishman, Alan L. Yuille | In this paper, we bridge the gap between 2D and 3D using a novel approach named Elastic Boundary Projection (EBP). |
213 | SIXray: A Large-Scale Security Inspection X-Ray Benchmark for Prohibited Item Discovery in Overlapping Images | Caijing Miao, Lingxi Xie, Fang Wan, Chi Su, Hongye Liu, Jianbin Jiao, Qixiang Ye | In this paper, we present a large-scale dataset and establish a baseline for prohibited item discovery in Security Inspection X-ray images. |
214 | Noise2Void – Learning Denoising From Single Noisy Images | Alexander Krull, Tim-Oliver Buchholz, Florian Jug | Here, we introduce Noise2Void (N2V), a training scheme that takes this idea one step further. |
215 | Joint Discriminative and Generative Learning for Person Re-Identification | Zhedong Zheng, Xiaodong Yang, Zhiding Yu, Liang Zheng, Yi Yang, Jan Kautz | In this paper, we seek to improve learned re-id embeddings by better leveraging the generated data. |
216 | Unsupervised Person Re-Identification by Soft Multilabel Learning | Hong-Xing Yu, Wei-Shi Zheng, Ancong Wu, Xiaowei Guo, Shaogang Gong, Jian-Huang Lai | To overcome this problem, we propose a deep model for soft multilabel learning for unsupervised RE-ID. |
217 | Learning Context Graph for Person Search | Yichao Yan, Qiang Zhang, Bingbing Ni, Wendong Zhang, Minghao Xu, Xiaokang Yang | In this work, we take a step further and consider employing context information for person search. |
218 | Gradient Matching Generative Networks for Zero-Shot Learning | Mert Bulent Sariyildiz, Ramazan Gokberk Cinbis | In contrast, we propose a generative model that can naturally learn from unsupervised examples, and synthesize training examples for unseen classes purely based on their class embeddings, and therefore reduce the zero-shot learning problem to a supervised classification task. |
219 | Doodle to Search: Practical Zero-Shot Sketch-Based Image Retrieval | Sounak Dey, Pau Riba, Anjan Dutta, Josep Llados, Yi-Zhe Song | In this paper, we investigate the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to conduct retrieval of photos from unseen categories. |
220 | Zero-Shot Task Transfer | Arghya Pal, Vineeth N Balasubramanian | In this work, we present a novel meta-learning algorithm that regresses model parameters for novel tasks for which no ground truth is available (zero-shot tasks). |
221 | C-MIL: Continuation Multiple Instance Learning for Weakly Supervised Object Detection | Fang Wan, Chang Liu, Wei Ke, Xiangyang Ji, Jianbin Jiao, Qixiang Ye | In this paper, we introduce a continuation optimization method into MIL, thereby creating continuation multiple instance learning (C-MIL), with the intention of alleviating the non-convexity problem in a systematic way. |
222 | Weakly Supervised Learning of Instance Segmentation With Inter-Pixel Relations | Jiwoon Ahn, Sunghyun Cho, Suha Kwak | This paper presents a novel approach for learning instance segmentation with image-level class labels as supervision. |
223 | Attention-Based Dropout Layer for Weakly Supervised Object Localization | Junsuk Choe, Hyunjung Shim | To address this problem, we propose an Attention-based Dropout Layer (ADL), which utilizes the self-attention mechanism to process the feature maps of the model. |
224 | Domain Generalization by Solving Jigsaw Puzzles | Fabio M. Carlucci, Antonio D’Innocente, Silvia Bucci, Barbara Caputo, Tatiana Tommasi | In this paper we propose to apply a similar approach to the task of object recognition across domains: our model learns the semantic labels in a supervised fashion, and broadens its understanding of the data by learning from self-supervised signals how to solve a jigsaw puzzle on the same images. |
225 | Transferrable Prototypical Networks for Unsupervised Domain Adaptation | Yingwei Pan, Ting Yao, Yehao Li, Yu Wang, Chong-Wah Ngo, Tao Mei | In this paper, we introduce a new idea for unsupervised domain adaptation via a remold of Prototypical Networks, which learn an embedding space and perform classification via the distances to the prototype of each class. |
226 | Blending-Target Domain Adaptation by Adversarial Meta-Adaptation Networks | Ziliang Chen, Jingyu Zhuang, Xiaodan Liang, Liang Lin | In this paper, we consider a more realistic transfer scenario: our target domain is comprised of multiple sub-targets implicitly blended with each other so that learners could not identify which sub-target each unlabeled sample belongs to. |
227 | ELASTIC: Improving CNNs With Dynamic Scaling Policies | Huiyu Wang, Aniruddha Kembhavi, Ali Farhadi, Alan L. Yuille, Mohammad Rastegari | In this paper, we introduce Elastic, a simple, efficient and yet very effective approach to learn a dynamic scale policy from data. |
228 | ScratchDet: Training Single-Shot Object Detectors From Scratch | Rui Zhu, Shifeng Zhang, Xiaobo Wang, Longyin Wen, Hailin Shi, Liefeng Bo, Tao Mei | In this paper, we explore how to train object detectors from scratch robustly. |
229 | SFNet: Learning Object-Aware Semantic Correspondence | Junghyup Lee, Dohyung Kim, Jean Ponce, Bumsub Ham | We propose a new CNN architecture, dubbed SFNet, which implements this idea. |
230 | Deep Metric Learning Beyond Binary Supervision | Sungyeon Kim, Minkyo Seo, Ivan Laptev, Minsu Cho, Suha Kwak | Motivated by this, we present a novel method for deep metric learning using continuous labels. |
231 | Learning to Cluster Faces on an Affinity Graph | Lei Yang, Xiaohang Zhan, Dapeng Chen, Junjie Yan, Chen Change Loy, Dahua Lin | Specifically, we propose a framework based on graph convolutional network, which combines a detection and a segmentation module to pinpoint face clusters. |
232 | C2AE: Class Conditioned Auto-Encoder for Open-Set Recognition | Poojan Oza, Vishal M. Patel | In this paper, we propose an open-set recognition algorithm using class conditioned auto-encoders with novel training and testing methodologies. |
233 | Shapes and Context: In-The-Wild Image Synthesis & Manipulation | Aayush Bansal, Yaser Sheikh, Deva Ramanan | We introduce a data-driven model for interactively synthesizing in-the-wild images from semantic label input masks. |
234 | Semantics Disentangling for Text-To-Image Generation | Guojun Yin, Bin Liu, Lu Sheng, Nenghai Yu, Xiaogang Wang, Jing Shao | In this paper, we consider semantics from the input text descriptions in helping render photo-realistic images. |
235 | Semantic Image Synthesis With Spatially-Adaptive Normalization | Taesung Park, Ming-Yu Liu, Ting-Chun Wang, Jun-Yan Zhu | We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. |
236 | Progressive Pose Attention Transfer for Person Image Generation | Zhen Zhu, Tengteng Huang, Baoguang Shi, Miao Yu, Bofei Wang, Xiang Bai | This paper proposes a new generative adversarial network to the problem of pose transfer, i.e., transferring the pose of a given person to a target one. |
237 | Unsupervised Person Image Generation With Semantic Parsing Transformation | Sijie Song, Wei Zhang, Jiaying Liu, Tao Mei | In this paper, we address unsupervised pose-guided person image generation, which is known to be challenging due to non-rigid deformation. |
238 | DeepView: View Synthesis With Learned Gradient Descent | John Flynn, Michael Broxton, Paul Debevec, Matthew DuVall, Graham Fyffe, Ryan Overbeck, Noah Snavely, Richard Tucker | We present a novel approach to view synthesis using multiplane images (MPIs). |
239 | Animating Arbitrary Objects via Deep Motion Transfer | Aliaksandr Siarohin, Stephane Lathuiliere, Sergey Tulyakov, Elisa Ricci, Nicu Sebe | This paper introduces a novel deep learning framework for image animation. |
240 | Textured Neural Avatars | Aliaksandra Shysheya, Egor Zakharov, Kara-Ali Aliev, Renat Bashirov, Egor Burkov, Karim Iskakov, Aleksei Ivakhnenko, Yury Malkov, Igor Pasechnik, Dmitry Ulyanov, Alexander Vakhitov, Victor Lempitsky | We present a system for learning full body neural avatars, i.e. deep networks that produce full body renderings of a person for varying body pose and varying camera pose. |
241 | IM-Net for High Resolution Video Frame Interpolation | Tomer Peleg, Pablo Szekely, Doron Sabo, Omry Sendik | In this paper we propose IM-Net: an interpolated motion neural network. |
242 | Homomorphic Latent Space Interpolation for Unpaired Image-To-Image Translation | Ying-Cong Chen, Xiaogang Xu, Zhuotao Tian, Jiaya Jia | In this paper, we propose an alternative framework, as an extension of latent space interpolation, to consider the intermediate region between two domains during translation. |
243 | Multi-Channel Attention Selection GAN With Cascaded Semantic Guidance for Cross-View Image Translation | Hao Tang, Dan Xu, Nicu Sebe, Yanzhi Wang, Jason J. Corso, Yan Yan | In this paper, we propose a novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that makes it possible to generate images of natural scenes in arbitrary viewpoints, based on an image of the scene and a novel semantic map. |
244 | Geometry-Consistent Generative Adversarial Networks for One-Sided Unsupervised Domain Mapping | Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Batmanghelich, Kun Zhang, Dacheng Tao | Based on this special property, we develop a geometry-consistent generative adversarial network (Gc-GAN), which enables one-sided unsupervised domain mapping. |
245 | DeepVoxels: Learning Persistent 3D Feature Embeddings | Vincent Sitzmann, Justus Thies, Felix Heide, Matthias Niessner, Gordon Wetzstein, Michael Zollhofer | In this work, we address the lack of 3D understanding of generative neural networks by introducing a persistent 3D feature embedding for view synthesis. |
246 | Inverse Path Tracing for Joint Material and Lighting Estimation | Dejan Azinovic, Tzu-Mao Li, Anton Kaplanyan, Matthias Niessner | We introduce Inverse Path Tracing, a novel approach to jointly estimate the material properties of objects and light sources in indoor scenes by using an invertible light transport simulation. |
247 | The Visual Centrifuge: Model-Free Layered Video Representations | Jean-Baptiste Alayrac, Joao Carreira, Andrew Zisserman | Here we propose a learning-based approach for multi-layered video representation: we introduce novel uncertainty-capturing 3D convolutional architectures and train them to separate blended videos. |
248 | Label-Noise Robust Generative Adversarial Networks | Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada | To remedy this, we propose a novel family of GANs called label-noise robust GANs (rGANs), which, by incorporating a noise transition model, can learn a clean label conditional generative distribution even when training labels are noisy. |
249 | DLOW: Domain Flow for Adaptation and Generalization | Rui Gong, Wen Li, Yuhua Chen, Luc Van Gool | In this work, we present a domain flow generation (DLOW) model to bridge two different domains by generating a continuous sequence of intermediate domains flowing from one domain to the other. |
250 | CollaGAN: Collaborative GAN for Missing Image Data Imputation | Dongwook Lee, Junyoung Kim, Won-Jin Moon, Jong Chul Ye | To address this problem, we propose a novel framework for missing image data imputation, called Collaborative Generative Adversarial Network (CollaGAN). |
251 | d-SNE: Domain Adaptation Using Stochastic Neighborhood Embedding | Xiang Xu, Xiong Zhou, Ragav Venkatesan, Gurumurthy Swaminathan, Orchid Majumder | In this paper, we propose a new technique (d-SNE) of domain adaptation that cleverly uses stochastic neighborhood embedding techniques and a novel modified-Hausdorff distance. |
252 | Taking a Closer Look at Domain Shift: Category-Level Adversaries for Semantics Consistent Domain Adaptation | Yawei Luo, Liang Zheng, Tao Guan, Junqing Yu, Yi Yang | To address this problem, this paper introduces a category-level adversarial network, aiming to enforce local semantic consistency during the trend of global alignment. |
253 | ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation | Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, Patrick Perez | In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. |
254 | ContextDesc: Local Descriptor Augmentation With Cross-Modality Context | Zixin Luo, Tianwei Shen, Lei Zhou, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang, Long Quan | In this paper, we go beyond the local detail representation by introducing context awareness to augment off-the-shelf local feature descriptors. |
255 | Large-Scale Long-Tailed Recognition in an Open World | Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, Stella X. Yu | We develop an integrated OLTR algorithm that maps an image to a feature space such that visual concepts can easily relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world. |
256 | SDC – Stacked Dilated Convolution: A Unified Descriptor Network for Dense Matching Tasks | Rene Schuster, Oliver Wasenmuller, Christian Unger, Didier Stricker | We present a robust, unified descriptor network that considers a large context region with high spatial variance. |
257 | Learning Correspondence From the Cycle-Consistency of Time | Xiaolong Wang, Allan Jabri, Alexei A. Efros | We introduce a self-supervised method for learning visual correspondence from unlabeled video. |
258 | AE2-Nets: Autoencoder in Autoencoder Networks | Changqing Zhang, Yeqing Liu, Huazhu Fu | In contrast, in this paper, we focus on unsupervised representation learning and propose a novel framework termed Autoencoder in Autoencoder Networks (AE^2-Nets), which integrates information from heterogeneous sources into an intact representation by the nested autoencoder framework. |
259 | Mitigating Information Leakage in Image Representations: A Maximum Entropy Approach | Proteek Chandan Roy, Vishnu Naresh Boddeti | We formulate the problem as an adversarial non-zero sum game of finding a good embedding function with two competing goals: to retain as much task dependent discriminative image information as possible, while simultaneously minimizing the amount of information, as measured by entropy, about other sensitive attributes of the user. |
260 | Learning Spatial Common Sense With Geometry-Aware Recurrent Networks | Hsiao-Yu Fish Tung, Ricson Cheng, Katerina Fragkiadaki | We integrate two powerful ideas, geometry and deep visual representation learning, into recurrent network architectures for mobile visual scene understanding. |
261 | Structured Knowledge Distillation for Semantic Segmentation | Yifan Liu, Ke Chen, Chris Liu, Zengchang Qin, Zhenbo Luo, Jingdong Wang | In this paper, we investigate the issue of knowledge distillation for training compact semantic segmentation networks by making use of cumbersome networks. |
262 | Scan2CAD: Learning CAD Model Alignment in RGB-D Scans | Armen Avetisyan, Manuel Dahnert, Angela Dai, Manolis Savva, Angel X. Chang, Matthias Niessner | We present Scan2CAD, a novel data-driven method that learns to align clean 3D CAD models from a shape database to the noisy and incomplete geometry of a commodity RGB-D scan. |
263 | Towards Scene Understanding: Unsupervised Monocular Depth Estimation With Semantic-Aware Representation | Po-Yi Chen, Alexander H. Liu, Yen-Cheng Liu, Yu-Chiang Frank Wang | In this paper, we propose SceneNet to overcome this limitation with the aid of semantic understanding from segmentation. |
264 | Tell Me Where I Am: Object-Level Scene Context Prediction | Xiaotian Qiao, Quanlong Zheng, Ying Cao, Rynson W.H. Lau | In this paper, we consider an inverse problem of how to hallucinate missing contextual information from the properties of a few standalone objects. |
265 | Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation | He Wang, Srinath Sridhar, Jingwei Huang, Julien Valentin, Shuran Song, Leonidas J. Guibas | The goal of this paper is to estimate the 6D pose and dimensions of unseen object instances in an RGB-D image. To further improve our model and evaluate its performance on real data, we also provide a fully annotated real-world dataset with large environment and instance variation. |
266 | Supervised Fitting of Geometric Primitives to 3D Point Clouds | Lingxiao Li, Minhyuk Sung, Anastasia Dubrovina, Li Yi, Leonidas J. Guibas | In this work, we introduce Supervised Primitive Fitting Network (SPFN), an end-to-end neural network that can robustly detect a varying number of primitives at different scales without any user control. |
267 | Do Better ImageNet Models Transfer Better? | Simon Kornblith, Jonathon Shlens, Quoc V. Le | Here, we compare the performance of 16 classification networks on 12 image classification datasets. Together, our results show that ImageNet architectures generalize well across datasets, but ImageNet features are less general than previously suggested. |
268 | Gotta Adapt ‘Em All: Joint Pixel and Feature-Level Domain Adaptation for Recognition in the Wild | Luan Tran, Kihyuk Sohn, Xiang Yu, Xiaoming Liu, Manmohan Chandraker | Recent developments in deep domain adaptation have allowed knowledge transfer from a labeled source domain to an unlabeled target domain at the level of intermediate features or input pixels. We propose that advantages may be derived by combining them, in the form of different insights that lead to a novel design and complementary properties that result in better performance. |
269 | Understanding the Disharmony Between Dropout and Batch Normalization by Variance Shift | Xiang Li, Shuo Chen, Xiaolin Hu, Jian Yang | Theoretically, we find that Dropout shifts the variance of a specific neural unit when we transfer the state of that network from training to test. However, BN maintains its statistical variance, which is accumulated from the entire learning procedure, in the test phase. The inconsistency of variances in Dropout and BN (we name this scheme “variance shift”) causes unstable numerical behavior at inference that ultimately leads to erroneous predictions. |
270 | Circulant Binary Convolutional Networks: Enhancing the Performance of 1-Bit DCNNs With Circulant Back Propagation | Chunlei Liu, Wenrui Ding, Xin Xia, Baochang Zhang, Jiaxin Gu, Jianzhuang Liu, Rongrong Ji, David Doermann | To address this problem, we propose new circulant filters (CiFs) and a circulant binary convolution (CBConv) to enhance the capacity of binarized convolutional features via our circulant back propagation (CBP). |
271 | DeFusionNET: Defocus Blur Detection via Recurrently Fusing and Refining Multi-Scale Deep Features | Chang Tang, Xinzhong Zhu, Xinwang Liu, Lizhe Wang, Albert Zomaya | To deal with these issues, we propose a deep neural network which recurrently fuses and refines multi-scale deep features (DeFusionNet) for defocus blur detection. |
272 | Deep Virtual Networks for Memory Efficient Inference of Multiple Tasks | Eunwoo Kim, Chanho Ahn, Philip H.S. Torr, Songhwai Oh | In particular, in this work we address the problem of memory efficient learning for multiple tasks. |
273 | Universal Domain Adaptation | Kaichao You, Mingsheng Long, Zhangjie Cao, Jianmin Wang, Michael I. Jordan | To solve the universal domain adaptation problem, we propose Universal Adaptation Network (UAN). |
274 | Improving Transferability of Adversarial Examples With Input Diversity | Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jianyu Wang, Zhou Ren, Alan L. Yuille | To this end, we propose to improve the transferability of adversarial examples by creating diverse input patterns. |
275 | Sequence-To-Sequence Domain Adaptation Network for Robust Text Image Recognition | Yaping Zhang, Shuai Nie, Wenju Liu, Xing Xu, Dongxiang Zhang, Heng Tao Shen | In this paper, we develop a Sequence-to-Sequence Domain Adaptation Network (SSDAN) for robust text image recognition, which could exploit unsupervised sequence data by an attention-based sequence encoder-decoder network. |
276 | Hybrid-Attention Based Decoupled Metric Learning for Zero-Shot Image Retrieval | Binghui Chen, Weihong Deng | In this paper, we first emphasize the importance of learning a visual discriminative metric and preventing the partial/selective learning behavior of the learner in ZSIR, and then propose the Decoupled Metric Learning (DeML) framework to achieve these individually. |
277 | Learning to Sample | Oren Dovrat, Itai Lang, Shai Avidan | To do that, we propose a deep network to simplify 3D point clouds. |
278 | Few-Shot Learning via Saliency-Guided Hallucination of Samples | Hongguang Zhang, Jing Zhang, Piotr Koniusz | In this paper, we follow the latter direction and present a novel data hallucination model. |
279 | Variational Convolutional Neural Network Pruning | Chenglong Zhao, Bingbing Ni, Jian Zhang, Qiwei Zhao, Wenjun Zhang, Qi Tian | We propose a variational Bayesian scheme for pruning convolutional neural networks in channel level. |
280 | Towards Optimal Structured CNN Pruning via Generative Adversarial Learning | Shaohui Lin, Rongrong Ji, Chenqian Yan, Baochang Zhang, Liujuan Cao, Qixiang Ye, Feiyue Huang, David Doermann | In this paper, we propose an effective structured pruning approach that jointly prunes filters as well as other structures in an end-to-end manner. |
281 | Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression | Yuchao Li, Shaohui Lin, Baochang Zhang, Jianzhuang Liu, David Doermann, Yongjian Wu, Feiyue Huang, Rongrong Ji | In this paper, we investigate the problem of CNN compression from a novel interpretable perspective. |
282 | Fully Quantized Network for Object Detection | Rundong Li, Yan Wang, Feng Liang, Hongwei Qin, Junjie Yan, Rui Fan | In this paper, we demonstrate that many of these difficulties arise because of instability during the fine-tuning stage of the quantization process, and propose several novel techniques to overcome these instabilities. |
283 | MnasNet: Platform-Aware Neural Architecture Search for Mobile | Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le | In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. |
284 | Student Becoming the Master: Knowledge Amalgamation for Joint Scene Parsing, Depth Estimation, and More | Jingwen Ye, Yixin Ji, Xinchao Wang, Kairi Ou, Dapeng Tao, Mingli Song | In this paper, we investigate a novel deep-model reusing task. |
285 | K-Nearest Neighbors Hashing | Xiangyu He, Peisong Wang, Jian Cheng | In this work, we revisit the sign() function from the perspective of space partitioning. |
286 | Learning RoI Transformer for Oriented Object Detection in Aerial Images | Jian Ding, Nan Xue, Yang Long, Gui-Song Xia, Qikai Lu | In this paper, we propose a RoI Transformer to address these problems. |
287 | Snapshot Distillation: Teacher-Student Optimization in One Generation | Chenglin Yang, Lingxi Xie, Chi Su, Alan L. Yuille | This paper presents snapshot distillation (SD), the first framework which enables teacher-student optimization in one generation. |
288 | Geometry-Aware Distillation for Indoor Semantic Segmentation | Jianbo Jiao, Yunchao Wei, Zequn Jie, Honghui Shi, Rynson W.H. Lau, Thomas S. Huang | In this paper, we propose to jointly infer the semantic and depth information by distilling geometry-aware embedding to eliminate such strong constraint while still exploiting the helpful depth domain information. |
289 | LiveSketch: Query Perturbations for Guided Sketch-Based Visual Search | John Collomosse, Tu Bui, Hailin Jin | Our technical contributions are: a triplet convnet architecture that incorporates an RNN based variational autoencoder to search for images using vector (stroke-based) queries; real-time clustering to identify likely search intents (and so, targets within the search embedding); and the use of backpropagation from those targets to perturb the input stroke sequence, so suggesting alterations to the query in order to guide the search. |
290 | Bounding Box Regression With Uncertainty for Accurate Object Detection | Yihui He, Chenchen Zhu, Jianren Wang, Marios Savvides, Xiangyu Zhang | In this paper, we propose a novel bounding box regression loss for learning bounding box transformation and localization variance together. |
291 | OCGAN: One-Class Novelty Detection Using GANs With Constrained Latent Representations | Pramuditha Perera, Ramesh Nallapati, Bing Xiang | We present a novel model called OCGAN for the classical problem of one-class novelty detection, where, given a set of examples from a particular class, the goal is to determine if a query example is from the same class. |
292 | Learning Metrics From Teachers: Compact Networks for Image Embedding | Lu Yu, Vacit Oguz Yazici, Xialei Liu, Joost van de Weijer, Yongmei Cheng, Arnau Ramisa | In this paper, we propose to use network distillation to efficiently compute image embeddings with small networks. |
293 | Activity Driven Weakly Supervised Object Detection | Zhenheng Yang, Dhruv Mahajan, Deepti Ghadiyaram, Ram Nevatia, Vignesh Ramanathan | In our work, we try to leverage not only the object class labels but also the action labels associated with the data. |
294 | Separate to Adapt: Open Set Domain Adaptation via Progressive Separation | Hong Liu, Zhangjie Cao, Mingsheng Long, Jianmin Wang, Qiang Yang | To this end, this paper presents Separate to Adapt (STA), an end-to-end approach to open set domain adaptation. |
295 | Layout-Graph Reasoning for Fashion Landmark Detection | Weijiang Yu, Xiaodan Liang, Ke Gong, Chenhan Jiang, Nong Xiao, Liang Lin | In this paper, we propose to seamlessly enforce structural layout relationships among landmarks on the intermediate representations via multiple stacked layout-graph reasoning layers. |
296 | DistillHash: Unsupervised Deep Hashing by Distilling Data Pairs | Erkun Yang, Tongliang Liu, Cheng Deng, Wei Liu, Dacheng Tao | To address this problem, in this paper, we propose a new deep unsupervised hashing model, called DistillHash, which can learn a distilled data set, where data pairs have confident similarity signals. |
297 | Mind Your Neighbours: Image Annotation With Metadata Neighbourhood Graph Co-Attention Networks | Junjie Zhang, Qi Wu, Jian Zhang, Chunhua Shen, Jianfeng Lu | In this paper, we propose a Metadata Neighbourhood Graph Co-Attention Network (MangoNet) to model the correlations between each target image and its neighbours. |
298 | Region Proposal by Guided Anchoring | Jiaqi Wang, Kai Chen, Shuo Yang, Chen Change Loy, Dahua Lin | In this paper, we revisit this foundational stage. |
299 | Distant Supervised Centroid Shift: A Simple and Efficient Approach to Visual Domain Adaptation | Jian Liang, Ran He, Zhenan Sun, Tieniu Tan | This paper provides a simple and efficient solution, which could be regarded as a well-performing baseline for domain adaptation tasks. |
300 | Learning to Transfer Examples for Partial Domain Adaptation | Zhangjie Cao, Kaichao You, Mingsheng Long, Jianmin Wang, Qiang Yang | In this work, we propose a unified approach to PDA, Example Transfer Network (ETN), which jointly learns domain-invariant representations across domains and a progressive weighting scheme to quantify the transferability of source examples. |
301 | Generalized Zero-Shot Recognition Based on Visually Semantic Embedding | Pengkai Zhu, Hanxiao Wang, Venkatesh Saligrama | We propose a novel Generalized Zero-Shot learning (GZSL) method that is agnostic to both unseen images and unseen semantic vectors during training. |
302 | Towards Visual Feature Translation | Jie Hu, Rongrong Ji, Hong Liu, Shengchuan Zhang, Cheng Deng, Qi Tian | In this paper, we make the first attempt towards visual feature translation to break through the barrier of using features across different visual search systems. |
303 | Amodal Instance Segmentation With KINS Dataset | Lu Qi, Li Jiang, Shu Liu, Xiaoyong Shen, Jiaya Jia | In this paper, we augment KITTI with more instance pixel-level annotation for 8 categories, which we call KITTI INStance dataset (KINS). |
304 | Global Second-Order Pooling Convolutional Networks | Zilin Gao, Jiangtao Xie, Qilong Wang, Peihua Li | In this paper, we propose a novel network model introducing GSoP from lower to higher layers for exploiting holistic image information throughout a network. |
305 | Weakly Supervised Complementary Parts Models for Fine-Grained Image Classification From the Bottom Up | Weifeng Ge, Xiangru Lin, Yizhou Yu | In this paper, we approach this problem from a different perspective. |
306 | NetTailor: Tuning the Architecture, Not Just the Weights | Pedro Morgado, Nuno Vasconcelos | To address these problems, we propose a transfer learning procedure, denoted NetTailor, in which layers of a pre-trained CNN are used as universal blocks that can be combined with small task-specific layers to generate new networks. |
307 | Learning-Based Sampling for Natural Image Matting | Jingwei Tang, Yagiz Aksoy, Cengiz Oztireli, Markus Gross, Tunc Ozan Aydin | In this paper, we propose the estimation of the layer colors through the use of deep neural networks prior to the opacity estimation. |
308 | Learning Unsupervised Video Object Segmentation Through Visual Attention | Wenguan Wang, Hongmei Song, Shuyang Zhao, Jianbing Shen, Sanyuan Zhao, Steven C. H. Hoi, Haibin Ling | This paper conducts a systematic study on the role of visual attention in Unsupervised Video Object Segmentation (UVOS) tasks. |
309 | 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks | Christopher Choy, JunYoung Gwak, Silvio Savarese | In this work, we propose 4-dimensional convolutional neural networks for spatio-temporal perception that can directly process such 3D-videos using high-dimensional convolutions. |
310 | Pyramid Feature Attention Network for Saliency Detection | Ting Zhao, Xiangqian Wu | To address this problem, a novel CNN named pyramid feature attention network (PFAN) is proposed to enhance the high-level context features and the low-level spatial structural features. |
311 | Co-Saliency Detection via Mask-Guided Fully Convolutional Networks With Multi-Scale Label Smoothing | Kaihua Zhang, Tengpeng Li, Bo Liu, Qingshan Liu | In this paper, we propose a hierarchical image co-saliency detection framework as a coarse to fine strategy to capture this pattern. |
312 | SAIL-VOS: Semantic Amodal Instance Level Video Object Segmentation – A Synthetic Dataset and Baselines | Yuan-Ting Hu, Hong-Shuo Chen, Kexin Hui, Jia-Bin Huang, Alexander G. Schwing | We introduce SAIL-VOS (Semantic Amodal Instance Level Video Object Segmentation), a new dataset aiming to stimulate semantic amodal segmentation research. To address this issue, we present a synthetic dataset extracted from the photo-realistic game GTA-V. |
313 | Learning Instance Activation Maps for Weakly Supervised Instance Segmentation | Yi Zhu, Yanzhao Zhou, Huijuan Xu, Qixiang Ye, David Doermann, Jianbin Jiao | In this work, we tackle this challenging problem by using a novel instance extent filling approach. |
314 | Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation | Zhi Tian, Tong He, Chunhua Shen, Youliang Yan | In this work, we propose a data-dependent upsampling (DUpsampling) to replace bilinear upsampling, which takes advantage of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs. |
315 | Box-Driven Class-Wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation | Chunfeng Song, Yan Huang, Wanli Ouyang, Liang Wang | In this paper, we first introduce a box-driven class-wise masking model (BCM) to remove irrelevant regions of each class. Moreover, based on the pixel-level segment proposal generated from the bounding box supervision, we can calculate the mean filling rate of each class to serve as an important prior cue; we then propose a filling rate guided adaptive loss (FR-Loss) to help the model ignore the wrongly labeled pixels in proposals. |
316 | Dual Attention Network for Scene Segmentation | Jun Fu, Jing Liu, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang, Hanqing Lu | In this paper, we address the scene segmentation task by capturing rich contextual dependencies based on the self-attention mechanism. |
317 | InverseRenderNet: Learning Single Image Inverse Rendering | Ye Yu, William A. P. Smith | We show how to train a fully convolutional neural network to perform inverse rendering from a single, uncontrolled image. |
318 | A Variational Auto-Encoder Model for Stochastic Point Processes | Nazanin Mehrasa, Akash Abdu Jyothi, Thibaut Durand, Jiawei He, Leonid Sigal, Greg Mori | We propose a novel probabilistic generative model for action sequences. |
319 | Unifying Heterogeneous Classifiers With Distillation | Jayakorn Vongkulbhisal, Phongtharin Vinayavekhin, Marco Visentini-Scarzanella | In this paper, we study the problem of unifying knowledge from a set of classifiers with different architectures and target classes into a single classifier, given only a generic set of unlabelled data. |
320 | Assessment of Faster R-CNN in Man-Machine Collaborative Search | Arturo Deza, Amit Surana, Miguel P. Eckstein | With the advent of modern expert systems driven by deep learning that supplement human experts (e.g. radiologists, dermatologists, surveillance scanners), we analyze how and when such expert systems enhance human performance in a fine-grained small target visual search task. |
321 | OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge | Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi | In this paper, we address the task of knowledge-based visual question answering and provide a benchmark, called OK-VQA, where the image content is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources. |
322 | NDDR-CNN: Layerwise Feature Fusing in Multi-Task CNNs by Neural Discriminative Dimensionality Reduction | Yuan Gao, Jiayi Ma, Mingbo Zhao, Wei Liu, Alan L. Yuille | In this paper, we propose a novel Convolutional Neural Network (CNN) structure for general-purpose multi-task learning (MTL), which enables automatic feature fusing at every layer from different tasks. |
323 | Spectral Metric for Dataset Complexity Assessment | Frederic Branchaud-Charron, Andrew Achkar, Pierre-Marc Jodoin | In this paper, we propose a new measure to gauge the complexity of image classification problems. |
324 | ADCrowdNet: An Attention-Injective Deformable Convolutional Network for Crowd Understanding | Ning Liu, Yongchao Long, Changqing Zou, Qun Niu, Li Pan, Hefeng Wu | We propose an attention-injective deformable convolutional network called ADCrowdNet for crowd understanding that can address the accuracy degradation problem of highly congested noisy scenes. |
325 | VERI-Wild: A Large Dataset and a New Method for Vehicle Re-Identification in the Wild | Yihang Lou, Yan Bai, Jun Liu, Shiqi Wang, Lingyu Duan | To promote the research of vehicle ReID in the wild, we collect a new dataset called VERI-Wild with the following distinct features: 1) The vehicle images are captured by a large surveillance system containing 174 cameras covering a large urban district (more than 200 km^2). 2) The camera network continuously captures vehicles 24 hours a day for 1 month. |
326 | 3D Local Features for Direct Pairwise Registration | Haowen Deng, Tolga Birdal, Slobodan Ilic | We present a novel, data driven approach for solving the problem of registration of two point cloud scans. |
327 | HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-Scale Point Clouds | Xiuye Gu, Yijie Wang, Chongruo Wu, Yong Jae Lee, Panqu Wang | We present a novel deep neural network architecture for end-to-end scene flow estimation that directly operates on large-scale 3D point clouds. |
328 | GPSfM: Global Projective SFM Using Algebraic Constraints on Multi-View Fundamental Matrices | Yoni Kasten, Amnon Geifman, Meirav Galun, Ronen Basri | This paper addresses the problem of recovering projective camera matrices from collections of fundamental matrices in multiview settings. |
329 | Group-Wise Correlation Stereo Network | Xiaoyang Guo, Kai Yang, Wukui Yang, Xiaogang Wang, Hongsheng Li | In this paper, we propose to construct the cost volume by group-wise correlation. |
330 | Multi-Level Context Ultra-Aggregation for Stereo Matching | Guang-Yu Nie, Ming-Ming Cheng, Yun Liu, Zhengfa Liang, Deng-Ping Fan, Yue Liu, Yongtian Wang | In this paper, we propose a unary features descriptor using multi-level context ultra-aggregation (MCUA), which encapsulates all convolutional features into a more discriminative representation by intra- and inter-level features combination. |
331 | Large-Scale, Metric Structure From Motion for Unordered Light Fields | Sotiris Nousias, Manolis Lourakis, Christos Bergeles | This paper presents a large scale, metric Structure from Motion (SfM) pipeline for generalised cameras with overlapping fields-of-view, and demonstrates it using Light Field (LF) images. |
332 | Understanding the Limitations of CNN-Based Absolute Camera Pose Regression | Torsten Sattler, Qunjie Zhou, Marc Pollefeys, Laura Leal-Taixe | To understand this behavior, we develop a theoretical model for camera pose regression. |
333 | DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene From Sparse LiDAR Data and Single Color Image | Jiaxiong Qiu, Zhaopeng Cui, Yinda Zhang, Xingdi Zhang, Shuaicheng Liu, Bing Zeng, Marc Pollefeys | In this paper, we propose a deep learning architecture that produces accurate dense depth for the outdoor scene from a single color image and a sparse depth. |
334 | Modeling Point Clouds With Self-Attention and Gumbel Subset Sampling | Jiancheng Yang, Qiang Zhang, Bingbing Ni, Linguo Li, Jinxian Liu, Mengdie Zhou, Qi Tian | We thereby propose, for the first time, an end-to-end learnable and task-agnostic sampling operation, named Gumbel Subset Sampling (GSS), to select a representative subset of input points. |
335 | Learning With Batch-Wise Optimal Transport Loss for 3D Shape Recognition | Lin Xu, Han Sun, Yuai Liu | In this paper, we show how to learn an importance-driven distance metric via optimal transport programming from batches of samples. |
336 | DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion | Chen Wang, Danfei Xu, Yuke Zhu, Roberto Martin-Martin, Cewu Lu, Li Fei-Fei, Silvio Savarese | In this work, we present DenseFusion, a generic framework for estimating 6D pose of a set of known objects from RGB-D images. |
337 | Dense Depth Posterior (DDP) From Single Image and Sparse Range | Yanchao Yang, Alex Wong, Stefano Soatto | We present a deep learning system to infer the posterior distribution of a dense depth map associated with an image, by exploiting sparse range measurements, for instance from a lidar. |
338 | DuLa-Net: A Dual-Projection Network for Estimating Room Layouts From a Single RGB Panorama | Shang-Ta Yang, Fu-En Wang, Chi-Han Peng, Peter Wonka, Min Sun, Hung-Kuo Chu | We present a deep learning framework, called DuLa-Net, to predict Manhattan-world 3D room layouts from a single RGB panorama. To learn more complex room layouts, we introduce the Realtor360 dataset that contains panoramas of Manhattan-world room layouts with different numbers of corners. |
339 | Veritatem Dies Aperit – Temporally Consistent Depth Prediction Enabled by a Multi-Task Geometric and Semantic Scene Understanding Approach | Amir Atapour-Abarghouei, Toby P. Breckon | In this paper, we propose a multi-task learning-based approach capable of jointly performing geometric and semantic scene understanding, namely depth prediction (monocular depth estimation and depth completion) and semantic scene segmentation. |
340 | Segmentation-Driven 6D Object Pose Estimation | Yinlin Hu, Joachim Hugonot, Pascal Fua, Mathieu Salzmann | In this paper, we introduce a segmentation-driven 6D pose estimation framework where each visible part of the objects contributes a local pose prediction in the form of 2D keypoint locations. |
341 | Exploiting Temporal Context for 3D Human Pose Estimation in the Wild | Anurag Arnab, Carl Doersch, Andrew Zisserman | We present a bundle-adjustment-based algorithm for recovering accurate 3D human pose and meshes from monocular videos. Building upon our algorithm, we present a new dataset of more than 3 million frames of YouTube videos from Kinetics with automatically generated 3D poses and meshes. |
342 | What Do Single-View 3D Reconstruction Networks Learn? | Maxim Tatarchenko, Stephan R. Richter, Rene Ranftl, Zhuwen Li, Vladlen Koltun, Thomas Brox | In this work, we set up two alternative approaches that perform image classification and retrieval respectively. |
343 | UniformFace: Learning Deep Equidistributed Representation for Face Recognition | Yueqi Duan, Jiwen Lu, Jie Zhou | In this paper, we propose a new supervision objective named uniform loss to learn deep equidistributed representations for face recognition. |
344 | Semantic Graph Convolutional Networks for 3D Human Pose Regression | Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, Dimitris N. Metaxas | In this paper, we study the problem of learning Graph Convolutional Networks (GCNs) for regression. |
345 | Mask-Guided Portrait Editing With Conditional GANs | Shuyang Gu, Jianmin Bao, Hao Yang, Dong Chen, Fang Wen, Lu Yuan | In this paper, we argue about three issues in existing techniques: diversity, quality, and controllability for portrait synthesis and editing. |
346 | Group Sampling for Scale Invariant Face Detection | Xiang Ming, Fangyun Wei, Ting Zhang, Dong Chen, Fang Wen | In this paper, we carefully examine the factors affecting face detection across a large range of scales, and conclude that the balance of training samples, including both positive and negative ones, at different scales is the key. |
347 | Joint Representation and Estimator Learning for Facial Action Unit Intensity Estimation | Yong Zhang, Baoyuan Wu, Weiming Dong, Zhifeng Li, Wei Liu, Bao-Gang Hu, Qiang Ji | In this paper, a novel general framework for AU intensity estimation is presented, which differs from traditional estimation methods in two aspects. |
348 | Semantic Alignment: Finding Semantically Consistent Ground-Truth for Facial Landmark Detection | Zhiwei Liu, Xiangyu Zhu, Guosheng Hu, Haiyun Guo, Ming Tang, Zhen Lei, Neil M. Robertson, Jinqiao Wang | In this paper, we propose a novel probabilistic model which introduces a latent variable, i.e. ‘real’ groundtruth which is semantically consistent, to optimize. |
349 | LAEO-Net: Revisiting People Looking at Each Other in Videos | Manuel J. Marin-Jimenez, Vicky Kalogeiton, Pablo Medina-Suarez, Andrew Zisserman | For this purpose, we propose LAEO-Net, a new deep CNN for determining LAEO in videos. Moreover, we introduce two new LAEO datasets: UCO-LAEO and AVA-LAEO. |
350 | Robust Facial Landmark Detection via Occlusion-Adaptive Deep Networks | Meilu Zhu, Daming Shi, Mingjie Zheng, Muhammad Sadiq | In this paper, we present a simple and effective framework called Occlusion-adaptive Deep Networks (ODN) with the purpose of solving the occlusion problem for facial landmark detection. |
351 | Learning Individual Styles of Conversational Gesture | Shiry Ginosar, Amir Bar, Gefen Kohavi, Caroline Chan, Andrew Owens, Jitendra Malik | We present a method for cross-modal translation from “in-the-wild” monologue speech of a single speaker to their conversational gesture motion. To support research toward obtaining a computational understanding of the relationship between gesture and speech, we release a large video dataset of person-specific gestures. |
352 | Face Anti-Spoofing: Model Matters, so Does Data | Xiao Yang, Wenhan Luo, Linchao Bao, Yuan Gao, Dihong Gong, Shibao Zheng, Zhifeng Li, Wei Liu | In this paper, we present a data collection solution along with a data synthesis technique to simulate digital medium-based face spoofing attacks, which can easily help us obtain a large amount of training data well reflecting the real-world scenarios. |
353 | Fast Human Pose Estimation | Feng Zhang, Xiatian Zhu, Mao Ye | In this work, we investigate the under-studied but practically critical pose model efficiency problem. |
354 | Decorrelated Adversarial Learning for Age-Invariant Face Recognition | Hao Wang, Dihong Gong, Zhifeng Li, Wei Liu | To implement this idea, we propose the Decorrelated Adversarial Learning (DAL) algorithm, where a Canonical Mapping Module (CMM) is introduced to find maximum correlation of the paired features generated by the backbone network, while the backbone network and the factorization module are trained to generate features reducing the correlation. |
355 | Cross-Task Weakly Supervised Learning From Instructional Videos | Dimitri Zhukov, Jean-Baptiste Alayrac, Ramazan Gokberk Cinbis, David Fouhey, Ivan Laptev, Josef Sivic | In this paper we investigate learning visual models for the steps of ordinary tasks using weak supervision via instructional narrations and an ordered list of steps instead of strong supervision via temporal annotations. |
356 | D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation | Chien-Yi Chang, De-An Huang, Yanan Sui, Li Fei-Fei, Juan Carlos Niebles | We propose Discriminative Differentiable Dynamic Time Warping (D3TW), the first discriminative model using weak ordering supervision. |
357 | Progressive Teacher-Student Learning for Early Action Prediction | Xionghui Wang, Jian-Fang Hu, Jian-Huang Lai, Jianguo Zhang, Wei-Shi Zheng | In this paper, we aim at improving early action prediction by proposing a novel teacher-student learning framework. |
358 | Social Relation Recognition From Videos via Multi-Scale Spatial-Temporal Reasoning | Xinchen Liu, Wu Liu, Meng Zhang, Jingwen Chen, Lianli Gao, Chenggang Yan, Tao Mei | To overcome these challenges, we propose a Multi-scale Spatial-Temporal Reasoning (MSTR) framework to recognize social relations from videos. |
359 | MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation | Yazan Abu Farha, Jurgen Gall | In this paper, we introduce a multi-stage architecture for the temporal action segmentation task. |
360 | Transferable Interactiveness Knowledge for Human-Object Interaction Detection | Yong-Lu Li, Siyuan Zhou, Xijie Huang, Liang Xu, Ze Ma, Hao-Shu Fang, Yanfeng Wang, Cewu Lu | In this paper, we explore Interactiveness Knowledge which indicates whether human and object interact with each other or not. |
361 | Actional-Structural Graph Convolutional Networks for Skeleton-Based Action Recognition | Maosen Li, Siheng Chen, Xu Chen, Ya Zhang, Yanfeng Wang, Qi Tian | To capture richer dependencies, we introduce an encoder-decoder structure, called A-link inference module, to capture action-specific latent dependencies, i.e. actional links, directly from actions. |
362 | Multi-Granularity Generator for Temporal Action Proposal | Yuan Liu, Lin Ma, Yifeng Zhang, Wei Liu, Shih-Fu Chang | In this paper, we propose a multi-granularity generator (MGG) to perform the temporal action proposal from different granularity perspectives, relying on the video visual features equipped with the position embedding information. |
363 | Deep Rigid Instance Scene Flow | Wei-Chiu Ma, Shenlong Wang, Rui Hu, Yuwen Xiong, Raquel Urtasun | In this paper we tackle the problem of scene flow estimation in the context of self-driving. |
364 | See More, Know More: Unsupervised Video Object Segmentation With Co-Attention Siamese Networks | Xiankai Lu, Wenguan Wang, Chao Ma, Jianbing Shen, Ling Shao, Fatih Porikli | We propose a unified and end-to-end trainable framework where different co-attention variants can be derived for mining the rich context within videos. |
365 | Patch-Based Discriminative Feature Learning for Unsupervised Person Re-Identification | Qize Yang, Hong-Xing Yu, Ancong Wu, Wei-Shi Zheng | In this work, we overcome this problem by proposing a patch-based unsupervised learning framework in order to learn discriminative features from patches instead of whole images. |
366 | SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking | Guangting Wang, Chong Luo, Zhiwei Xiong, Wenjun Zeng | In this paper, we propose a SiamFC-based tracker, named SPM-Tracker, to tackle this challenge. |
367 | Spatial Fusion GAN for Image Synthesis | Fangneng Zhan, Hongyuan Zhu, Shijian Lu | This paper presents an innovative Spatial Fusion GAN (SF-GAN) that combines a geometry synthesizer and an appearance synthesizer to achieve synthesis realism in both geometry and appearance spaces. |
368 | Text Guided Person Image Synthesis | Xingran Zhou, Siyu Huang, Bin Li, Yingming Li, Jiachen Li, Zhongfei Zhang | This paper presents a novel method to manipulate the visual appearance (pose and attribute) of a person image according to natural language descriptions. |
369 | STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing | Ming Liu, Yukang Ding, Min Xia, Xiao Liu, Errui Ding, Wangmeng Zuo, Shilei Wen | In this work, we suggest to address these issues from selective transfer perspective. |
370 | Towards Instance-Level Image-To-Image Translation | Zhiqiang Shen, Mingyang Huang, Jianping Shi, Xiangyang Xue, Thomas S. Huang | In this paper, we present a simple yet effective instance-aware image-to-image translation approach (INIT), which applies fine-grained local (instance) and global styles to the target image spatially. We also collect a large-scale benchmark for the new instance-level translation task. |
371 | Dense Intrinsic Appearance Flow for Human Pose Transfer | Yining Li, Chen Huang, Chen Change Loy | We present a novel approach for the task of human pose transfer, which aims at synthesizing a new image of a person from an input image of that person and a target pose. |
372 | Depth-Aware Video Frame Interpolation | Wenbo Bao, Wei-Sheng Lai, Chao Ma, Xiaoyun Zhang, Zhiyong Gao, Ming-Hsuan Yang | In this work, we propose a video frame interpolation method which explicitly detects the occlusion by exploring the depth information. |
373 | Sliced Wasserstein Generative Models | Jiqing Wu, Zhiwu Huang, Dinesh Acharya, Wen Li, Janine Thoma, Danda Pani Paudel, Luc Van Gool | In this paper, we introduce novel approximations of the primal and dual SWD. |
374 | Deep Flow-Guided Video Inpainting | Rui Xu, Xiaoxiao Li, Bolei Zhou, Chen Change Loy | In this work we propose a novel flow-guided video inpainting approach. |
375 | Video Generation From Single Semantic Label Map | Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang | This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process. |
376 | Polarimetric Camera Calibration Using an LCD Monitor | Zhixiang Wang, Yinqiang Zheng, Yung-Yu Chuang | In this paper, we propose to jointly calibrate the polarizer angles and the inverse CRF (ICRF) using a slightly adapted checker pattern displayed on a liquid crystal display (LCD) monitor. |
377 | Fully Automatic Video Colorization With Self-Regularization and Diversity | Chenyang Lei, Qifeng Chen | We present a fully automatic approach to video colorization with self-regularization and diversity. |
378 | Zoom to Learn, Learn to Zoom | Xuaner Zhang, Qifeng Chen, Ren Ng, Vladlen Koltun | This paper shows that when applying machine learning to digital zoom, it is beneficial to operate on real, RAW sensor data. |
379 | Single Image Reflection Removal Beyond Linearity | Qiang Wen, Yinjie Tan, Jing Qin, Wenxi Liu, Guoqiang Han, Shengfeng He | In this paper, we inject non-linearity into reflection removal from two aspects. |
380 | Learning to Separate Multiple Illuminants in a Single Image | Zhuo Hui, Ayan Chakrabarti, Kalyan Sunkavalli, Aswin C. Sankaranarayanan | We present a method to separate a single image captured under two illuminants, with different spectra, into the two images corresponding to the appearance of the scene under each individual illuminant. |
381 | Shape Unicode: A Unified Shape Representation | Sanjeev Muralikrishnan, Vladimir G. Kim, Matthew Fisher, Siddhartha Chaudhuri | We propose a unified code for 3D shapes, dubbed Shape Unicode, that imbibes shape cues across these representations into a single code, and a novel framework to learn such a code space for any 3D shape dataset. |
382 | Robust Video Stabilization by Optimization in CNN Weight Space | Jiyang Yu, Ravi Ramamoorthi | We propose a novel robust video stabilization method. |
383 | Learning Linear Transformations for Fast Image and Video Style Transfer | Xueting Li, Sifei Liu, Jan Kautz, Ming-Hsuan Yang | In this work, we present an approach for universal style transfer that learns the transformation matrix in a data-driven fashion. |
384 | Local Detection of Stereo Occlusion Boundaries | Jialiang Wang, Todd Zickler | This paper describes the local signatures for stereo occlusion boundaries that exist in a stereo cost volume, and it introduces a local detector for them based on a simple feedforward network with relatively small receptive fields. |
385 | Bi-Directional Cascade Network for Perceptual Edge Detection | Jianzhong He, Shiliang Zhang, Ming Yang, Yanhu Shan, Tiejun Huang | To extract edges at dramatically different scales, we propose a Bi-Directional Cascade Network (BDCN) structure, where an individual layer is supervised by labeled edges at its specific scale, rather than directly applying the same supervision to all CNN outputs. |
386 | Single Image Deraining: A Comprehensive Benchmark Analysis | Siyuan Li, Iago Breno Araujo, Wenqi Ren, Zhangyang Wang, Eric K. Tokuda, Roberto Hirata Junior, Roberto Cesar-Junior, Jiawan Zhang, Xiaojie Guo, Xiaochun Cao | We present a comprehensive study and evaluation of existing single image deraining algorithms, using a new large-scale benchmark consisting of both synthetic and real-world rainy images. This dataset highlights diverse data sources and image contents, and is divided into three subsets (rain streak, rain drop, rain and mist), each serving different training or evaluation purposes. |
387 | Dynamic Scene Deblurring With Parameter Selective Sharing and Nested Skip Connections | Hongyun Gao, Xin Tao, Xiaoyong Shen, Jiaya Jia | Inside the subnetwork of each scale, we propose a nested skip connection structure for the nonlinear transformation modules to replace stacked convolution layers or residual blocks. In addition, we build a new large dataset of blurred/sharp image pairs towards better restoration quality. |
388 | Events-To-Video: Bringing Modern Computer Vision to Event Cameras | Henri Rebecq, Rene Ranftl, Vladlen Koltun, Davide Scaramuzza | In this work, we take a different view and propose to apply existing, mature computer vision techniques to videos reconstructed from event data. |
389 | Feedback Network for Image Super-Resolution | Zhen Li, Jinglei Yang, Zheng Liu, Xiaomin Yang, Gwanggil Jeon, Wei Wu | In this paper, we propose an image super-resolution feedback network (SRFBN) to refine low-level representations with high-level information. |
390 | Semi-Supervised Transfer Learning for Image Rain Removal | Wei Wei, Deyu Meng, Qian Zhao, Zongben Xu, Ying Wu | Semi-Supervised Transfer Learning for Image Rain Removal. |
391 | EventNet: Asynchronous Recursive Event Processing | Yusuke Sekikawa, Kosuke Hara, Hideo Saito | We propose EventNet, a neural network designed for real-time processing of asynchronous event streams in a recursive and event-wise manner. |
392 | Recurrent Back-Projection Network for Video Super-Resolution | Muhammad Haris, Gregory Shakhnarovich, Norimichi Ukita | We propose a novel architecture for the problem of video super-resolution. We also propose a new video super-resolution benchmark, allowing evaluation at a larger scale and considering videos in different motion regimes. |
393 | Cascaded Partial Decoder for Fast and Accurate Salient Object Detection | Zhe Wu, Li Su, Qingming Huang | In this paper, we propose a novel Cascaded Partial Decoder (CPD) framework for fast and accurate salient object detection. |
394 | A Simple Pooling-Based Design for Real-Time Salient Object Detection | Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Jiashi Feng, Jianmin Jiang | We solve the problem of salient object detection by investigating how to expand the role of pooling in convolutional neural networks. |
395 | Contrast Prior and Fluid Pyramid Integration for RGBD Salient Object Detection | Jia-Xing Zhao, Yang Cao, Deng-Ping Fan, Ming-Ming Cheng, Xuan-Yi Li, Le Zhang | In this paper, we incorporate the contrast prior, which used to be a dominant cue in non-deep-learning SOD approaches, into a CNN-based architecture to enhance the depth information. |
396 | Progressive Image Deraining Networks: A Better and Simpler Baseline | Dongwei Ren, Wangmeng Zuo, Qinghua Hu, Pengfei Zhu, Deyu Meng | To handle this issue, this paper provides a better and simpler baseline deraining network by considering network architecture, input and output, and loss functions. |
397 | GSPN: Generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud | Li Yi, Wang Zhao, He Wang, Minhyuk Sung, Leonidas J. Guibas | We introduce a novel 3D object proposal approach named Generative Shape Proposal Network (GSPN) for instance segmentation in point cloud data. |
398 | Attentive Relational Networks for Mapping Images to Scene Graphs | Mengshi Qi, Weijian Li, Zhengyuan Yang, Yunhong Wang, Jiebo Luo | In this study, we propose a novel Attentive Relational Network that consists of two key modules with an object detection backbone to approach this problem. |
399 | Relational Knowledge Distillation | Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho | We introduce a novel approach, dubbed relational knowledge distillation (RKD), that transfers mutual relations of data examples instead. |
400 | Compressing Convolutional Neural Networks via Factorized Convolutional Filters | Tuanhui Li, Baoyuan Wu, Yujiu Yang, Yanbo Fan, Yong Zhang, Wei Liu | In this work, we propose to conduct filter selection and filter learning simultaneously, in a unified model. |
401 | On the Intrinsic Dimensionality of Image Representations | Sixue Gong, Vishnu Naresh Boddeti, Anil K. Jain | This paper addresses the following problems pertaining to the intrinsic dimensionality of any given image representation: (i) estimating its intrinsic dimensionality, (ii) developing a deep neural network based non-linear mapping, dubbed DeepMDS, that transforms the ambient representation to the minimal intrinsic space, and (iii) validating the veracity of the mapping through image matching in the intrinsic space. |
402 | Part-Regularized Near-Duplicate Vehicle Re-Identification | Bing He, Jia Li, Yifan Zhao, Yonghong Tian | In this paper, we propose a simple but efficient part-regularized discriminative feature preserving method which enhances the perception of subtle discrepancies. |
403 | Self-Supervised Spatio-Temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics | Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu | In this paper we propose a novel self-supervised approach to learn spatio-temporal features for video representation. |
404 | Classification-Reconstruction Learning for Open-Set Recognition | Ryota Yoshihashi, Wen Shao, Rei Kawakami, Shaodi You, Makoto Iida, Takeshi Naemura | In contrast, we train networks for joint classification and reconstruction of input data. |
405 | Emotion-Aware Human Attention Prediction | Macario O. Cordel II, Shaojing Fan, Zhiqi Shen, Mohan S. Kankanhalli | In this work, we investigate the relation between object sentiment and human attention. |
406 | Residual Regression With Semantic Prior for Crowd Counting | Jia Wan, Wenhan Luo, Baoyuan Wu, Antoni B. Chan, Wei Liu | In this paper, a residual regression framework is proposed for crowd counting utilizing the correlation information among samples. |
407 | Context-Reinforced Semantic Segmentation | Yizhou Zhou, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng | In this paper, we propose a dedicated module, Context Net, to better explore the context information in p-maps. |
408 | Adversarial Structure Matching for Structured Prediction Tasks | Jyh-Jing Hwang, Tsung-Wei Ke, Jianbo Shi, Stella X. Yu | We, on the other hand, approach this problem from an opposing angle and propose a new framework, Adversarial Structure Matching (ASM), for training such structured prediction networks via an adversarial process, in which we train a structure analyzer that provides the supervisory signals, the ASM loss. |
409 | Deep Spectral Clustering Using Dual Autoencoder Network | Xu Yang, Cheng Deng, Feng Zheng, Junchi Yan, Wei Liu | In this paper, we propose a joint learning framework for discriminative embedding and spectral clustering. |
410 | Deep Asymmetric Metric Learning via Rich Relationship Mining | Xinyi Xu, Yanhua Yang, Cheng Deng, Feng Zheng | This motivates us to propose a novel framework, named deep asymmetric metric learning via rich relationship mining (DAMLRRM), to mine rich relationship under satisfying sampling size. |
411 | Did It Change? Learning to Detect Point-Of-Interest Changes for Proactive Map Updates | Jerome Revaud, Minhyeok Heo, Rafael S. Rezende, Chanmi You, Seong-Gyun Jeong | Motivated by the broad availability of geo-tagged street-view images, we propose a new task aiming to make the map update process more proactive. Faced with the lack of an appropriate benchmark, we build and release a large dataset, captured in two large shopping centers, that comprises 33K geo-localized images and 578 POIs. |
412 | Associatively Segmenting Instances and Semantics in Point Clouds | Xinlong Wang, Shu Liu, Xiaoyong Shen, Chunhua Shen, Jiaya Jia | In this paper, we first introduce a simple and flexible framework to segment instances and semantics in point clouds simultaneously. Then, we propose two approaches which make the two tasks take advantage of each other, leading to a win-win situation. |
413 | Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation | Zhenyu Zhang, Zhen Cui, Chunyan Xu, Yan Yan, Nicu Sebe, Jian Yang | In this paper, we propose a novel Pattern-Affinitive Propagation (PAP) framework to jointly predict depth, surface normal and semantic segmentation. |
414 | Scene Categorization From Contours: Medial Axis Based Salience Measures | Morteza Rezanejad, Gabriel Downs, John Wilder, Dirk B. Walther, Allan Jepson, Sven Dickinson, Kaleem Siddiqi | Specifically, we use off-the-shelf pre-trained Convolutional Neural Networks (CNNs) to perform scene classification given only contour information as input, and find performance levels well above chance. |
415 | Unsupervised Image Captioning | Yang Feng, Lin Ma, Wei Liu, Jiebo Luo | In this paper, we make the first attempt to train an image captioning model in an unsupervised manner. |
416 | Exact Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables | Yan Xu, Baoyuan Wu, Fumin Shen, Yanbo Fan, Yong Zhang, Heng Tao Shen, Wei Liu | In this work, we study the robustness of a CNN+RNN based image captioning system when subjected to adversarial noise. |
417 | Cross-Modal Relationship Inference for Grounding Referring Expressions | Sibei Yang, Guanbin Li, Yizhou Yu | In this paper, we propose a Cross-Modal Relationship Extractor (CMRE) to adaptively highlight objects and relationships, that have connections with a given expression, with a cross-modal attention mechanism, and represent the extracted information as a language-guided visual relation graph. |
418 | What’s to Know? Uncertainty as a Guide to Asking Goal-Oriented Questions | Ehsan Abbasnejad, Qi Wu, Qinfeng Shi, Anton van den Hengel | We propose a solution to this problem based on a Bayesian model of the uncertainty in the implicit model maintained by the visual dialogue agent, and in the function used to select an appropriate output. |
419 | Iterative Alignment Network for Continuous Sign Language Recognition | Junfu Pu, Wengang Zhou, Houqiang Li | In this paper, we propose an alignment network with iterative optimization for weakly supervised continuous sign language recognition. |
420 | Neural Sequential Phrase Grounding (SeqGROUND) | Pelin Dogan, Leonid Sigal, Markus Gross | We propose an end-to-end approach for phrase grounding in images. |
421 | CLEVR-Ref+: Diagnosing Visual Reasoning With Referring Expressions | Runtao Liu, Chenxi Liu, Yutong Bai, Alan L. Yuille | In particular, we present two interesting and important findings using IEP-Ref: (1) the module trained to transform feature maps into segmentation masks can be attached to any intermediate module to reveal the entire reasoning process step-by-step; (2) even though all training data contain at least one referred object, IEP-Ref can correctly predict no foreground when presented with false-premise referring expressions. To address these issues and complement similar efforts in visual question answering, we build CLEVR-Ref+, a synthetic diagnostic dataset for referring expression comprehension. We will release data and code for CLEVR-Ref+. |
422 | Describing Like Humans: On Diversity in Image Captioning | Qingzhong Wang, Antoni B. Chan | In this paper, we propose a new metric for measuring the diversity of image captions, which is derived from latent semantic analysis and kernelized to use CIDEr similarity. |
423 | MSCap: Multi-Style Image Captioning With Unpaired Stylized Text | Longteng Guo, Jing Liu, Peng Yao, Jiangwei Li, Hanqing Lu | In this paper, we propose an adversarial learning network for the task of multi-style image captioning (MSCap) with a standard factual image caption dataset and a multi-stylized language corpus without paired images. |
424 | CRAVES: Controlling Robotic Arm With a Vision-Based Economic System | Yiming Zuo, Weichao Qiu, Lingxi Xie, Fangwei Zhong, Yizhou Wang, Alan L. Yuille | In this paper, we present an alternative solution, which uses a 3D model to create a large number of synthetic data, trains a vision model in this virtual domain, and applies it to real-world images after domain adaptation. |
425 | Networks for Joint Affine and Non-Parametric Image Registration | Zhengyang Shen, Xu Han, Zhenlin Xu, Marc Niethammer | We introduce an end-to-end deep-learning framework for 3D medical image registration. |
426 | Learning Shape-Aware Embedding for Scene Text Detection | Zhuotao Tian, Michelle Shu, Pengyuan Lyu, Ruiyu Li, Chao Zhou, Xiaoyong Shen, Jiaya Jia | Specifically, we treat text detection as instance segmentation and propose a segmentation-based framework, which extracts each text instance as an independent connected component. |
427 | Learning to Film From Professional Human Motion Videos | Chong Huang, Chuan-En Lin, Zhenyu Yang, Yan Kong, Peng Chen, Xin Yang, Kwang-Ting Cheng | In this study, we propose a learning-based framework which incorporates the video contents and previous camera motions to predict the future camera motions that enable the capture of professional videos. |
428 | Pay Attention! – Robustifying a Deep Visuomotor Policy Through Task-Focused Visual Attention | Pooya Abolghasemi, Amir Mazaheri, Mubarak Shah, Ladislau Boloni | In this paper, we propose an approach for augmenting a deep visuomotor policy trained through demonstrations with Task Focused visual Attention (TFA). |
429 | Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence | Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon | In this paper, we propose a simple yet effective framework for fast blind video decaptioning. |
430 | Learning Video Representations From Correspondence Proposals | Xingyu Liu, Joon-Young Lee, Hailin Jin | In this paper, we propose a novel neural network that learns video representations by aggregating information from potential correspondences. |
431 | SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks | Bo Li, Wei Wu, Qiang Wang, Fangyi Zhang, Junliang Xing, Junjie Yan | In this work, we prove that the core reason comes from the lack of strict translation invariance. |
432 | Sphere Generative Adversarial Network Based on Geometric Moment Matching | Sung Woo Park, Junseok Kwon | In this paper, we mathematically prove the good properties of sphere GAN. |
433 | Adversarial Attacks Beyond the Image Space | Xiaohui Zeng, Chenxi Liu, Yu-Siang Wang, Weichao Qiu, Lingxi Xie, Yu-Wing Tai, Chi-Keung Tang, Alan L. Yuille | Most existing approaches generate perturbations in the image space, i.e., each pixel can be modified independently. However, in this paper we pay special attention to the subset of adversarial examples that correspond to meaningful changes in 3D physical properties (like rotation and translation, illumination condition, etc.). |
434 | Evading Defenses to Transferable Adversarial Examples by Translation-Invariant Attacks | Yinpeng Dong, Tianyu Pang, Hang Su, Jun Zhu | In this paper, we propose a translation-invariant attack method to generate more transferable adversarial examples against the defense models. |
435 | Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses | Jerome Rony, Luiz G. Hafemann, Luiz S. Oliveira, Ismail Ben Ayed, Robert Sabourin, Eric Granger | In this paper, an efficient approach is proposed to generate gradient-based attacks that induce misclassifications with low L2 norm, by decoupling the direction and the norm of the adversarial perturbation that is added to the image. |
436 | A General and Adaptive Robust Loss Function | Jonathan T. Barron | We present a generalization of the Cauchy/Lorentzian, Geman-McClure, Welsch/Leclerc, generalized Charbonnier, Charbonnier/pseudo-Huber/L1-L2, and L2 loss functions. |
437 | Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration | Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, Yi Yang | To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. |
438 | Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss | Sangil Jung, Changyong Son, Seohyung Lee, Jinwoo Son, Jae-Joon Han, Youngjun Kwak, Sung Ju Hwang, Changkyu Choi | To tackle this problem, we propose to learn to quantize activations and weights via a trainable quantizer that transforms and discretizes them. |
439 | Not All Areas Are Equal: Transfer Learning for Semantic Segmentation via Hierarchical Region Selection | Ruoqi Sun, Xinge Zhu, Chongruo Wu, Chen Huang, Jianping Shi, Lizhuang Ma | In this paper, we consider transfer learning for semantic segmentation that aims to mitigate the gap between abundant synthetic data (source domain) and limited real data (target domain). |
440 | Unsupervised Learning of Dense Shape Correspondence | Oshri Halimi, Or Litany, Emanuele Rodola, Alex M. Bronstein, Ron Kimmel | We introduce the first completely unsupervised correspondence learning approach for deformable 3D shapes. |
441 | Unsupervised Visual Domain Adaptation: A Deep Max-Margin Gaussian Process Approach | Minyoung Kim, Pritish Sahu, Behnam Gholami, Vladimir Pavlovic | In this paper, we take this principle further by proposing a more systematic and effective way to achieve hypothesis consistency using Gaussian processes (GP). |
442 | Balanced Self-Paced Learning for Generative Adversarial Clustering Network | Kamran Ghasedi, Xiaoqian Wang, Cheng Deng, Heng Huang | In this paper, we propose a deep Generative Adversarial Clustering Network (ClusterGAN), which tackles the problem of training deep clustering models in an unsupervised manner. |
443 | A Style-Based Generator Architecture for Generative Adversarial Networks | Tero Karras, Samuli Laine, Timo Aila | We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. Finally, we introduce a new, highly varied and high-quality dataset of human faces. |
444 | Parallel Optimal Transport GAN | Gil Avraham, Yan Zuo, Tom Drummond | To address these issues, we introduce an additional regularisation term which performs optimal transport in parallel within a low dimensional representation space. |
445 | 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans | Ji Hou, Angela Dai, Matthias Niessner | We introduce 3D-SIS, a novel neural network architecture for 3D semantic instance segmentation in commodity RGB-D scans. |
446 | Causes and Corrections for Bimodal Multi-Path Scanning With Structured Light | Yu Zhang, Daniel L. Lau, Ying Yu | In this paper, we present a general mathematical model to address the bimodal multi-path issue in a phase-measuring-profilometry scanner to measure the constructive and destructive interference between the two light paths, and by taking advantage of this interesting cue, separate the paths and make two decoupled phase measurements. |
447 | TextureNet: Consistent Local Parametrizations for Learning From High-Resolution Signals on Meshes | Jingwei Huang, Haotian Zhang, Li Yi, Thomas Funkhouser, Matthias Niessner, Leonidas J. Guibas | We introduce TextureNet, a neural network architecture designed to extract features from high-resolution signals associated with 3D surface meshes (e.g., color texture maps). |
448 | PlaneRCNN: 3D Plane Detection and Reconstruction From a Single Image | Chen Liu, Kihwan Kim, Jinwei Gu, Yasutaka Furukawa, Jan Kautz | This paper proposes a deep neural architecture, PlaneRCNN, that detects and reconstructs piecewise planar regions from a single RGB image. The paper also presents a new benchmark with more fine-grained plane segmentations in the ground truth, on which PlaneRCNN outperforms existing state-of-the-art methods by significant margins in the plane detection, segmentation, and reconstruction metrics. |
449 | Occupancy Networks: Learning 3D Reconstruction in Function Space | Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, Andreas Geiger | In this paper, we propose Occupancy Networks, a new representation for learning-based 3D reconstruction methods. |
450 | 3D Shape Reconstruction From Images in the Frequency Domain | Weichao Shen, Yunde Jia, Yuwei Wu | In this paper, we propose a Fourier-based method that reconstructs a 3D shape from images in a 2D space by predicting slices in the frequency domain. |
451 | SiCloPe: Silhouette-Based Clothed People | Ryota Natsume, Shunsuke Saito, Zeng Huang, Weikai Chen, Chongyang Ma, Hao Li, Shigeo Morishima | We introduce a new silhouette-based representation for modeling clothed human bodies using deep generative models. |
452 | Detailed Human Shape Estimation From a Single Image by Hierarchical Mesh Deformation | Hao Zhu, Xinxin Zuo, Sen Wang, Xun Cao, Ruigang Yang | This paper presents a novel framework to recover detailed human body shapes from a single image. |
453 | Convolutional Mesh Regression for Single-Image Human Shape Reconstruction | Nikos Kolotouros, Georgios Pavlakos, Kostas Daniilidis | In our work, we propose to relax this heavy reliance on the model’s parameter space. |
454 | H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions | Bugra Tekin, Federica Bogo, Marc Pollefeys | We present a unified framework for understanding 3D hand and object interactions in raw image sequences from egocentric RGB cameras. |
455 | Learning the Depths of Moving People by Watching Frozen People | Zhengqi Li, Tali Dekel, Forrester Cole, Richard Tucker, Noah Snavely, Ce Liu, William T. Freeman | We present a method for predicting dense depth in scenarios where both a monocular camera and people in the scene are freely moving. |
456 | Extreme Relative Pose Estimation for RGB-D Scans via Scene Completion | Zhenpei Yang, Jeffrey Z. Pan, Linjie Luo, Xiaowei Zhou, Kristen Grauman, Qixing Huang | We introduce a novel approach that extends the scope to extreme relative poses, with little or even no overlap between the input scans. |
457 | A Skeleton-Bridged Deep Learning Approach for Generating Meshes of Complex Topologies From Single RGB Images | Jiapeng Tang, Xiaoguang Han, Junyi Pan, Kui Jia, Xin Tong | To this end, we propose in this paper a skeleton-bridged, stage-wise learning approach to address the challenge. |
458 | Learning Structure-And-Motion-Aware Rolling Shutter Correction | Bingbing Zhuang, Quoc-Huy Tran, Pan Ji, Loong-Fah Cheong, Manmohan Chandraker | Our method learns from a large-scale dataset synthesized in a geometrically meaningful way where the RS effect is generated in a manner consistent with the camera motion and scene structure. |
459 | PVNet: Pixel-Wise Voting Network for 6DoF Pose Estimation | Sida Peng, Yuan Liu, Qixing Huang, Xiaowei Zhou, Hujun Bao | Instead, we introduce a Pixel-wise Voting Network (PVNet) to regress pixel-wise vectors pointing to the keypoints and use these vectors to vote for keypoint locations. We further create a Truncation LINEMOD dataset to validate the robustness of our approach against truncation. |
460 | SelFlow: Self-Supervised Learning of Optical Flow | Pengpeng Liu, Michael Lyu, Irwin King, Jia Xu | We present a self-supervised learning approach for optical flow. |
461 | Taking a Deeper Look at the Inverse Compositional Algorithm | Zhaoyang Lv, Frank Dellaert, James M. Rehg, Andreas Geiger | In this paper, we provide a modern synthesis of the classic inverse compositional algorithm for dense image alignment. |
462 | Deeper and Wider Siamese Networks for Real-Time Visual Tracking | Zhipeng Zhang, Houwen Peng | In this paper, we investigate how to leverage deeper and wider convolutional neural networks to enhance tracking robustness and accuracy. |
463 | Self-Supervised Adaptation of High-Fidelity Face Models for Monocular Performance Tracking | Jae Shin Yoon, Takaaki Shiratori, Shoou-I Yu, Hyun Soo Park | In this paper, we propose a self-supervised domain adaptation approach to enable the animation of high-fidelity face models from a commodity camera. |
464 | Diverse Generation for Multi-Agent Sports Games | Raymond A. Yeh, Alexander G. Schwing, Jonathan Huang, Kevin Murphy | In this paper, we propose a new generative model for multi-agent trajectory data, focusing on the case of multi-player sports games. |
465 | Efficient Online Multi-Person 2D Pose Tracking With Recurrent Spatio-Temporal Affinity Fields | Yaadhav Raaj, Haroon Idrees, Gines Hidalgo, Yaser Sheikh | We present an online approach to efficiently and simultaneously detect and track 2D poses of multiple people in a video sequence. |
466 | GFrames: Gradient-Based Local Reference Frame for 3D Shape Matching | Simone Melzi, Riccardo Spezialetti, Federico Tombari, Michael M. Bronstein, Luigi Di Stefano, Emanuele Rodola | We introduce GFrames, a novel local reference frame (LRF) construction for 3D meshes and point clouds. |
467 | Eliminating Exposure Bias and Metric Mismatch in Multiple Object Tracking | Andrii Maksai, Pascal Fua | In this paper, we introduce a new training procedure that confronts the algorithm to its own mistakes while explicitly attempting to minimize the number of switches, which results in better training. |
468 | Graph Convolutional Tracking | Junyu Gao, Tianzhu Zhang, Changsheng Xu | To comprehensively leverage the spatial-temporal structure of historical target exemplars and benefit from the context information, in this work we present a novel Graph Convolutional Tracking (GCT) method for high-performance visual tracking. |
469 | ATOM: Accurate Tracking by Overlap Maximization | Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg | We address this problem by proposing a novel tracking architecture, consisting of dedicated target estimation and classification components. |
470 | Visual Tracking via Adaptive Spatially-Regularized Correlation Filters | Kenan Dai, Dong Wang, Huchuan Lu, Chong Sun, Jianhua Li | In this work, we propose a novel adaptive spatially-regularized correlation filters (ASRCF) model to simultaneously optimize the filter coefficients and the spatial regularization weight. |
471 | Deep Tree Learning for Zero-Shot Face Anti-Spoofing | Yaojie Liu, Joel Stehouwer, Amin Jourabloo, Xiaoming Liu | In this work, we expand the ZSFA problem to a wide range of 13 types of spoof attacks, including print attacks, replay attacks, 3D mask attacks, and so on. In addition, to enable the study of ZSFA, we introduce the first face anti-spoofing database that contains diverse types of spoof attacks. |
472 | ArcFace: Additive Angular Margin Loss for Deep Face Recognition | Jiankang Deng, Jia Guo, Niannan Xue, Stefanos Zafeiriou | In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. |
473 | Learning Joint Gait Representation via Quintuplet Loss Minimization | Kaihao Zhang, Wenhan Luo, Lin Ma, Wei Liu, Hongdong Li | In this paper, we propose a new Joint Unique-gait and Cross-gait Network (JUCNet) to combine the advantages of unique-gait representation with those of cross-gait representation, leading to significantly improved performance. |
474 | Gait Recognition via Disentangled Representation Learning | Ziyuan Zhang, Luan Tran, Xi Yin, Yousef Atoum, Xiaoming Liu, Jian Wan, Nanxin Wang | To remedy this issue, we propose a novel AutoEncoder framework to explicitly disentangle pose and appearance features from RGB imagery; LSTM-based integration of the pose features over time then produces the gait feature. In addition, we collect a Frontal-View Gait (FVG) dataset to focus on gait recognition from frontal-view walking, which is a challenging problem since it contains minimal gait cues compared to other views. |
475 | Reversible GANs for Memory-Efficient Image-To-Image Translation | Tycho F.A. van der Ouderaa, Daniel E. Worrall | We extend this framework by exploring approximately invertible architectures which are well suited to these losses. |
476 | Sensitive-Sample Fingerprinting of Deep Neural Networks | Zecheng He, Tianwei Zhang, Ruby Lee | In this paper, we propose a novel and practical methodology to verify the integrity of remote deep learning models, with only black-box access to the target models. |
477 | Soft Labels for Ordinal Regression | Raul Diaz, Amit Marathe | We present a simple and effective method that constrains these relationships among categories by seamlessly incorporating metric penalties into ground truth label representations. |
478 | Local to Global Learning: Gradually Adding Classes for Training Deep Neural Networks | Hao Cheng, Dongze Lian, Bowen Deng, Shenghua Gao, Tao Tan, Yanlin Geng | In this paper, we incorporate the idea of LGL into the learning objective of DNNs and explain why LGL works better from an information-theoretic perspective. |
479 | What Does It Mean to Learn in Deep Networks? And, How Does One Detect Adversarial Attacks? | Ciprian A. Corneanu, Meysam Madadi, Sergio Escalera, Aleix M. Martinez | Here, we derive a novel approach to define what it means to learn in deep networks, and how to use this knowledge to detect adversarial attacks. |
480 | Handwriting Recognition in Low-Resource Scripts Using Adversarial Learning | Ayan Kumar Bhunia, Abhirup Das, Ankan Kumar Bhunia, Perla Sai Raj Kishore, Partha Pratim Roy | We propose an Adversarial Feature Deformation Module (AFDM) that learns ways to elastically warp extracted features in a scalable manner. |
481 | Adversarial Defense Through Network Profiling Based Path Extraction | Yuxian Qiu, Jingwen Leng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, Yuhao Zhu | This work proposes a profiling-based method to decompose DNN models into different functional blocks, which leads to the effective path as a new approach to exploring DNNs’ internal organization. |
482 | RENAS: Reinforced Evolutionary Neural Architecture Search | Yukang Chen, Gaofeng Meng, Qian Zhang, Shiming Xiang, Chang Huang, Lisen Mu, Xinggang Wang | To address this issue, we propose the Reinforced Evolutionary Neural Architecture Search (RENAS), which is an evolutionary method with reinforced mutation for NAS. |
483 | Co-Occurrence Neural Network | Irina Shevlev, Shai Avidan | We show how to train the filter as part of the network and report results on several data sets. |
484 | SpotTune: Transfer Learning Through Adaptive Fine-Tuning | Yunhui Guo, Honghui Shi, Abhishek Kumar, Kristen Grauman, Tajana Rosing, Rogerio Feris | In this paper, we propose an adaptive fine-tuning approach, called SpotTune, which finds the optimal fine-tuning strategy per instance for the target data. |
485 | Signal-To-Noise Ratio: A Robust Distance Metric for Deep Metric Learning | Tongtong Yuan, Weihong Deng, Jian Tang, Yinan Tang, Binghui Chen | In this paper, different from the approaches on learning the loss structures, we propose a robust SNR distance metric based on Signal-to-Noise Ratio (SNR) for measuring the similarity of image pairs for deep metric learning. |
486 | Detection Based Defense Against Adversarial Examples From the Steganalysis Point of View | Jiayang Liu, Weiming Zhang, Yiwei Zhang, Dongdong Hou, Yujia Liu, Hongyue Zha, Nenghai Yu | In this paper, we point out that steganalysis can be applied to adversarial examples detection, and propose a method to enhance steganalysis features by estimating the probability of modifications caused by adversarial attacks. |
487 | HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs | Pravendra Singh, Vinay Kumar Verma, Piyush Rai, Vinay P. Namboodiri | We present a novel deep learning architecture in which the convolution operation leverages heterogeneous kernels. |
488 | Strike (With) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects | Michael A. Alcorn, Qi Li, Zhitao Gong, Chengfei Wang, Long Mai, Wei-Shinn Ku, Anh Nguyen | In this paper, we present a framework for discovering DNN failures that harnesses 3D renderers and 3D models. |
489 | Blind Geometric Distortion Correction on Images Through Deep Learning | Xiaoyu Li, Bo Zhang, Pedro V. Sander, Jing Liao | We propose the first general framework to automatically correct different types of geometric distortion in a single input image. |
490 | Instance-Level Meta Normalization | Songhao Jia, Ding-Jie Chen, Hwann-Tzong Chen | This paper presents a normalization mechanism called Instance-Level Meta Normalization (ILM Norm) to address a learning-to-normalize problem. |
491 | Iterative Normalization: Beyond Standardization Towards Efficient Whitening | Lei Huang, Yi Zhou, Fan Zhu, Li Liu, Ling Shao | We propose Iterative Normalization (IterNorm), which employs Newton’s iterations for much more efficient whitening, while simultaneously avoiding the eigen-decomposition. |
492 | On Learning Density Aware Embeddings | Soumyadeep Ghosh, Richa Singh, Mayank Vatsa | In this paper, a novel noise tolerant deep metric learning algorithm is proposed. |
493 | Contrastive Adaptation Network for Unsupervised Domain Adaptation | Guoliang Kang, Lu Jiang, Yi Yang, Alexander G. Hauptmann | To address this issue, this paper proposes Contrastive Adaptation Network (CAN) optimizing a new metric which explicitly models the intra-class domain discrepancy and the inter-class domain discrepancy. |
494 | LP-3DCNN: Unveiling Local Phase in 3D Convolutional Neural Networks | Sudhakar Kumawat, Shanmuganathan Raman | To address these issues, we propose Rectified Local Phase Volume (ReLPV) block, an efficient alternative to the standard 3D convolutional layer. |
495 | Attribute-Driven Feature Disentangling and Temporal Aggregation for Video Person Re-Identification | Yiru Zhao, Xu Shen, Zhongming Jin, Hongtao Lu, Xian-sheng Hua | In this paper, we propose an attribute-driven method for feature disentangling and frame re-weighting. |
496 | Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit? | Shilin Zhu, Xin Dong, Hao Su | Inspired by this investigation, we propose the Binary Ensemble Neural Network (BENN) which leverages ensemble methods to improve the performance of BNNs with limited efficiency cost. |
497 | Distilling Object Detectors With Fine-Grained Feature Imitation | Tao Wang, Li Yuan, Xiaopeng Zhang, Jiashi Feng | To address the challenge of distilling knowledge in detection model, we propose a fine-grained feature imitation method exploiting the cross-location discrepancy of feature response. |
498 | Centripetal SGD for Pruning Very Deep Convolutional Networks With Complicated Structure | Xiaohan Ding, Guiguang Ding, Yuchen Guo, Jungong Han | To this end, we propose Centripetal SGD (C-SGD), a novel optimization method, which can train several filters to collapse into a single point in the parameter hyperspace. |
499 | Knockoff Nets: Stealing Functionality of Black-Box Models | Tribhuvanesh Orekondy, Bernt Schiele, Mario Fritz | In this work, we ask to what extent can an adversary steal functionality of such “victim” models based solely on blackbox interactions: image in, predictions out. |
500 | Deep Embedding Learning With Discriminative Sampling Policy | Yueqi Duan, Lei Chen, Jiwen Lu, Jie Zhou | In this paper, we propose a deep embedding with discriminative sampling policy (DE-DSP) learning framework by simultaneously training two models: a deep sampler network that learns effective sampling strategies, and a feature embedding that maps samples to the feature space. |
501 | Hybrid Task Cascade for Instance Segmentation | Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin | In this work, we propose a new framework, Hybrid Task Cascade (HTC), which differs in two important aspects: (1) instead of performing cascaded refinement on these two tasks separately, it interweaves them for a joint multi-stage processing; (2) it adopts a fully convolutional branch to provide spatial context, which can help distinguish hard foreground from cluttered background. |
502 | Multi-Task Self-Supervised Object Detection via Recycling of Bounding Box Annotations | Wonhee Lee, Joonil Na, Gunhee Kim | To make better use of given limited labels, we propose a novel object detection approach that takes advantage of both multi-task learning (MTL) and self-supervised learning (SSL). We propose a set of auxiliary tasks that help improve the accuracy of object detection. |
503 | ClusterNet: Deep Hierarchical Cluster Network With Rigorously Rotation-Invariant Representation for Point Cloud Analysis | Chao Chen, Guanbin Li, Ruijia Xu, Tianshui Chen, Meng Wang, Liang Lin | In this paper, we address the issue by introducing a novel point cloud representation that can be mathematically proven to be rigorously rotation-invariant, i.e., identical point clouds in different orientations are unified as a unique and consistent representation. |
504 | Learning to Learn Relation for Important People Detection in Still Images | Wei-Hong Li, Fa-Ting Hong, Wei-Shi Zheng | In this work, we propose a deep imPOrtance relatIon NeTwork (POINT) that combines both relation modeling and feature learning. |
505 | Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition | Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo | In this paper, we propose to learn such fine-grained features from hundreds of part proposals by Trilinear Attention Sampling Network (TASN) in an efficient teacher-student manner. |
506 | Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning | Xun Wang, Xintong Han, Weilin Huang, Dengke Dong, Matthew R. Scott | Our contributions are three-fold: (1) we establish a General Pair Weighting (GPW) framework, which casts the sampling problem of deep metric learning into a unified view of pair weighting through gradient analysis, providing a powerful tool for understanding recent pair-based loss functions; (2) we show that with GPW, various existing pair-based methods can be compared and discussed comprehensively, with clear differences and key limitations identified; (3) we propose a new loss called multi-similarity loss (MS loss) under the GPW, which is implemented in two iterative steps (i.e., mining and weighting). |
507 | Domain-Symmetric Networks for Adversarial Domain Adaptation | Yabin Zhang, Hui Tang, Kui Jia, Mingkui Tan | To train the SymNet, we propose a novel adversarial learning objective whose key design is based on a two-level domain confusion scheme, where the category-level confusion loss improves over the domain-level one by driving the learning of intermediate network features to be invariant at the corresponding categories of the two domains. |
508 | End-To-End Supervised Product Quantization for Image Search and Retrieval | Benjamin Klein, Lior Wolf | This work presents Deep Product Quantization (DPQ), a technique that leads to more accurate retrieval and classification than the latest state of the art methods, while having similar computational complexity and memory footprint as the Product Quantization method. |
509 | Learning to Learn From Noisy Labeled Data | Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli | To overcome this problem, we propose a noise-tolerant training algorithm, where a meta-learning update is performed prior to conventional gradient update. |
510 | DSFD: Dual Shot Face Detector | Jian Li, Yabiao Wang, Changan Wang, Ying Tai, Jianjun Qian, Jian Yang, Chengjie Wang, Jilin Li, Feiyue Huang | In this paper, we propose a novel detection network named Dual Shot Face Detector (DSFD). |
511 | Label Propagation for Deep Semi-Supervised Learning | Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Ondrej Chum | In this work, we employ a transductive label propagation method that is based on the manifold assumption to make predictions on the entire dataset and use these predictions to generate pseudo-labels for the unlabeled data and train a deep neural network. |
512 | Deep Global Generalized Gaussian Networks | Qilong Wang, Peihua Li, Qinghua Hu, Pengfei Zhu, Wangmeng Zuo | To handle this issue, this paper proposes a novel deep global generalized Gaussian network (3G-Net), whose core is to estimate a global covariance of generalized Gaussian for modeling the last convolutional activations. |
513 | Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-Based Image Retrieval | Anjan Dutta, Zeynep Akata | In this work, we propose a semantically aligned paired cycle-consistent generative (SEM-PCYC) model for zero-shot SBIR, where each branch maps the visual information to a common semantic space via an adversarial training. |
514 | Context-Aware Crowd Counting | Weizhe Liu, Mathieu Salzmann, Pascal Fua | In this paper, we introduce an end-to-end trainable deep architecture that combines features obtained using multiple receptive field sizes and learns the importance of each such feature at each image location. |
515 | Detect-To-Retrieve: Efficient Regional Aggregation for Image Search | Marvin Teichmann, Andre Araujo, Menglong Zhu, Jack Sim | In this paper, we first fill the void by providing a new dataset of landmark bounding boxes, based on the Google Landmarks dataset, that includes 94k images with manually curated boxes from 15k unique landmarks. |
516 | Towards Accurate One-Stage Object Detection With AP-Loss | Kean Chen, Jianguo Li, Weiyao Lin, John See, Ji Wang, Lingyu Duan, Zhibo Chen, Changwei He, Junni Zou | This paper alleviates this issue by proposing a novel framework to replace the classification task in one-stage detectors with a ranking task, and adopting the Average-Precision loss (AP-loss) for the ranking problem. |
517 | On Exploring Undetermined Relationships for Visual Relationship Detection | Yibing Zhan, Jun Yu, Ting Yu, Dacheng Tao | In this paper, we explore the beneficial effect of undetermined relationships on visual relationship detection. |
518 | Learning Without Memorizing | Prithviraj Dhar, Rajat Vikram Singh, Kuan-Chuan Peng, Ziyan Wu, Rama Chellappa | Hence, we propose a novel approach, called ‘Learning without Memorizing (LwM)’, to preserve the information about existing (base) classes, without storing any of their data, while making the classifier progressively learn the new classes. |
519 | Dynamic Recursive Neural Network | Qiushan Guo, Zhipeng Yu, Yichao Wu, Ding Liang, Haoyu Qin, Junjie Yan | This paper proposes the dynamic recursive neural network (DRNN), which simplifies the duplicated building blocks in deep neural networks. |
520 | Destruction and Construction Learning for Fine-Grained Image Recognition | Yue Chen, Yalong Bai, Wei Zhang, Tao Mei | In this paper, we propose a novel “Destruction and Construction Learning” (DCL) method to enhance the difficulty of fine-grained recognition and exercise the classification model to acquire expert knowledge. |
521 | Distraction-Aware Shadow Detection | Quanlong Zheng, Xiaotian Qiao, Ying Cao, Rynson W.H. Lau | In this paper, we propose a Distraction-aware Shadow Detection Network (DSDNet) by explicitly learning and integrating the semantics of visual distraction regions in an end-to-end framework. |
522 | Multi-Label Image Recognition With Graph Convolutional Networks | Zhao-Min Chen, Xiu-Shen Wei, Peng Wang, Yanwen Guo | To capture and explore such important dependencies, we propose a multi-label classification model based on Graph Convolutional Network (GCN). |
523 | High-Level Semantic Feature Detection: A New Perspective for Pedestrian Detection | Wei Liu, Shengcai Liao, Weiqiang Ren, Weidong Hu, Yinan Yu | In this paper, taking pedestrian detection as an example, we provide a new perspective where detecting objects is motivated as a high-level semantic feature detection task. |
524 | RepMet: Representative-Based Metric Learning for Classification and Few-Shot Object Detection | Leonid Karlinsky, Joseph Shtok, Sivan Harary, Eli Schwartz, Amit Aides, Rogerio Feris, Raja Giryes, Alex M. Bronstein | In this work, we propose a new method for DML that simultaneously learns the backbone network parameters, the embedding space, and the multi-modal distribution of each of the training categories in that space, in a single end-to-end training process. |
525 | Ranked List Loss for Deep Metric Learning | Xinshao Wang, Yang Hua, Elyor Kodirov, Guosheng Hu, Romain Garnier, Neil M. Robertson | In this work, we present two limitations of existing ranking-motivated structured losses and propose a novel ranked list loss to solve both of them. |
526 | CANet: Class-Agnostic Segmentation Networks With Iterative Refinement and Attentive Few-Shot Learning | Chi Zhang, Guosheng Lin, Fayao Liu, Rui Yao, Chunhua Shen | In this paper, we present CANet, a class-agnostic segmentation network that performs few-shot segmentation on new classes with only a few annotated images available. |
527 | Precise Detection in Densely Packed Scenes | Eran Goldman, Roei Herzig, Aviv Eisenschtat, Jacob Goldberger, Tal Hassner | We propose a novel, deep-learning based method for precise object detection, designed for such challenging settings. |
528 | KE-GAN: Knowledge Embedded Generative Adversarial Networks for Semi-Supervised Scene Parsing | Mengshi Qi, Yunhong Wang, Jie Qin, Annan Li | In this paper, we propose a novel Knowledge Embedded Generative Adversarial Networks, dubbed as KE-GAN, to tackle the challenging problem in a semi-supervised fashion. |
529 | Fast User-Guided Video Object Segmentation by Interaction-And-Propagation Networks | Seoung Wug Oh, Joon-Young Lee, Ning Xu, Seon Joo Kim | We present a deep learning method for interactive video object segmentation. |
530 | Fast Interactive Object Annotation With Curve-GCN | Huan Ling, Jun Gao, Amlan Kar, Wenzheng Chen, Sanja Fidler | We propose a new framework that alleviates the sequential nature of Polygon-RNN, by predicting all vertices simultaneously using a Graph Convolutional Network (GCN). |
531 | FickleNet: Weakly and Semi-Supervised Semantic Image Segmentation Using Stochastic Inference | Jungbeom Lee, Eunji Kim, Sungmin Lee, Jangho Lee, Sungroh Yoon | FickleNet explores diverse combinations of locations on feature maps created by generic deep neural networks. |
532 | RVOS: End-To-End Recurrent Network for Video Object Segmentation | Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques, Xavier Giro-i-Nieto | In our work, we propose a Recurrent network for multiple object Video Object Segmentation (RVOS) that is fully end-to-end trainable. |
533 | DeepFlux for Skeletons in the Wild | Yukang Wang, Yongchao Xu, Stavros Tsogkas, Xiang Bai, Sven Dickinson, Kaleem Siddiqi | In the present article, we depart from this strategy by training a CNN to predict a two-dimensional vector field, which maps each scene point to a candidate skeleton pixel, in the spirit of flux-based skeletonization algorithms. |
534 | Interactive Image Segmentation via Backpropagating Refinement Scheme | Won-Dong Jang, Chang-Su Kim | An interactive image segmentation algorithm, which accepts user-annotations about a target object and the background, is proposed in this work. |
535 | Scene Parsing via Integrated Classification Model and Variance-Based Regularization | Hengcan Shi, Hongliang Li, Qingbo Wu, Zichen Song | In this paper, we propose an integrated classification model and a variance-based regularization to achieve more accurate classifications. |
536 | RAVEN: A Dataset for Relational and Analogical Visual REasoNing | Chi Zhang, Feng Gao, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu | In this work, we propose a new dataset, built in the context of Raven’s Progressive Matrices (RPM) and aimed at lifting machine intelligence by associating vision with structural, relational, and analogical reasoning in a hierarchical representation. |
537 | Surface Reconstruction From Normals: A Robust DGP-Based Discontinuity Preservation Approach | Wuyuan Xie, Miaohui Wang, Mingqiang Wei, Jianmin Jiang, Jing Qin | This paper introduces a robust approach to preserving surface discontinuities in a discrete geometry setting. |
538 | DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images | Yuying Ge, Ruimao Zhang, Xiaogang Wang, Xiaoou Tang, Ping Luo | DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images |
539 | Jumping Manifolds: Geometry Aware Dense Non-Rigid Structure From Motion | Suryansh Kumar | Given dense image feature correspondences of a non-rigidly moving object across multiple frames, this paper proposes an algorithm to estimate its 3D shape for each frame. |
540 | LVIS: A Dataset for Large Vocabulary Instance Segmentation | Agrim Gupta, Piotr Dollar, Ross Girshick | In this work, we introduce LVIS (pronounced ‘el-vis’): a new dataset for Large Vocabulary Instance Segmentation. |
541 | Fast Object Class Labelling via Speech | Michael Gygli, Vittorio Ferrari | Instead, we propose a new interface where classes are annotated via speech. |
542 | LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking | Heng Fan, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Hexin Bai, Yong Xu, Chunyuan Liao, Haibin Ling | In this paper, we present LaSOT, a high-quality benchmark for Large-scale Single Object Tracking. |
543 | Creative Flow+ Dataset | Maria Shugrina, Ziheng Liang, Amlan Kar, Jiaman Li, Angad Singh, Karan Singh, Sanja Fidler | We present the Creative Flow+ Dataset, the first diverse multi-style artistic video dataset richly labeled with per-pixel optical flow, occlusions, correspondences, segmentation labels, normals, and depth. |
544 | Weakly Supervised Open-Set Domain Adaptation by Dual-Domain Collaboration | Shuhan Tan, Jiening Jiao, Wei-Shi Zheng | To address this practical setting, we propose the Collaborative Distribution Alignment (CDA) method, which performs knowledge transfer bilaterally and works collaboratively to classify unlabeled data and identify outlier samples. |
545 | A Neurobiological Evaluation Metric for Neural Network Model Search | Nathaniel Blanchard, Jeffery Kinnison, Brandon RichardWebster, Pouya Bashivan, Walter J. Scheirer | In this paper we introduce a human-model similarity (HMS) metric, which quantifies the similarity of human fMRI and network activation behavior. |
546 | Iterative Projection and Matching: Finding Structure-Preserving Representatives and Its Application to Computer Vision | Alireza Zaeemzadeh, Mohsen Joneidi, Nazanin Rahnavard, Mubarak Shah | This paper presents a fast and accurate data selection method, in which the selected samples are optimized to span the subspace of all data. |
547 | Efficient Multi-Domain Learning by Covariance Normalization | Yunsheng Li, Nuno Vasconcelos | The problem of multi-domain learning of deep networks is considered. An adaptive layer is induced per target domain, and a novel procedure, denoted covariance normalization (CovNorm), is proposed to reduce its parameters. |
548 | Predicting Visible Image Differences Under Varying Display Brightness and Viewing Distance | Nanyang Ye, Krzysztof Wolski, Rafal K. Mantiuk | In this paper, we propose a CNN-based visibility metric, which maintains the accuracy of deep network solutions and accounts for viewing conditions. |
549 | A Bayesian Perspective on the Deep Image Prior | Zezhou Cheng, Matheus Gadelha, Subhransu Maji, Daniel Sheldon | We show that the deep image prior is asymptotically equivalent to a stationary Gaussian process prior in the limit as the number of channels in each layer of the network goes to infinity, and derive the corresponding kernel. |
550 | ApolloCar3D: A Large 3D Car Instance Understanding Benchmark for Autonomous Driving | Xibin Song, Peng Wang, Dingfu Zhou, Rui Zhu, Chenye Guan, Yuchao Dai, Hao Su, Hongdong Li, Ruigang Yang | In this paper, we contribute the first large scale database suitable for 3D car instance understanding – ApolloCar3D. |
551 | Compressing Unknown Images With Product Quantizer for Efficient Zero-Shot Classification | Jin Li, Xuguang Lan, Yang Liu, Le Wang, Nanning Zheng | Based on this intuition, a Product Quantization Zero-Shot Learning (PQZSL) method is proposed to learn embeddings as well as quantizers to compress visual features into compact codes for Approximate NN (ANN) search. |
552 | Self-Supervised Convolutional Subspace Clustering Network | Junjian Zhang, Chun-Guang Li, Chong You, Xianbiao Qi, Honggang Zhang, Jun Guo, Zhouchen Lin | To achieve simultaneous feature learning and subspace clustering, we propose an end-to-end trainable framework, called Self-Supervised Convolutional Subspace Clustering Network (S^2ConvSCN), that combines a ConvNet module (for feature learning), a self-expression module (for subspace clustering) and a spectral clustering module (for self-supervision) into a joint optimization framework. |
553 | Multi-Scale Geometric Consistency Guided Multi-View Stereo | Qingshan Xu, Wenbing Tao | In this paper, we propose an efficient multi-scale geometric consistency guided multi-view stereo method for accurate and complete depth map estimation. |
554 | Privacy Preserving Image-Based Localization | Pablo Speciale, Johannes L. Schonberger, Sing Bing Kang, Sudipta N. Sinha, Marc Pollefeys | This paper proposes the first solution to what we call privacy preserving image-based localization. |
555 | SimulCap : Single-View Human Performance Capture With Cloth Simulation | Tao Yu, Zerong Zheng, Yuan Zhong, Jianhui Zhao, Qionghai Dai, Gerard Pons-Moll, Yebin Liu | This paper proposes a new method for live free-viewpoint human performance capture with dynamic details (e.g., cloth wrinkles) using a single RGBD camera. |
556 | Hierarchical Deep Stereo Matching on High-Resolution Images | Gengshan Yang, Joshua Manela, Michael Happold, Deva Ramanan | To address this issue, we propose an end-to-end framework that searches for correspondences incrementally over a coarse-to-fine hierarchy. Because high-res stereo datasets are relatively rare, we introduce a dataset with high-res stereo pairs for both training and evaluation. |
557 | Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference | Yao Yao, Zixin Luo, Shiwei Li, Tianwei Shen, Tian Fang, Long Quan | In this paper, we introduce a scalable multi-view stereo framework based on the recurrent neural network. |
558 | Synthesizing 3D Shapes From Silhouette Image Collections Using Multi-Projection Generative Adversarial Networks | Xiao Li, Yue Dong, Pieter Peers, Xin Tong | We present a new weakly supervised learning-based method for generating novel category-specific 3D shapes from unoccluded image collections. |
559 | The Perfect Match: 3D Point Cloud Matching With Smoothed Densities | Zan Gojcic, Caifa Zhou, Jan D. Wegner, Andreas Wieser | We propose 3DSmoothNet, a full workflow to match 3D point clouds with a siamese deep learning architecture and fully convolutional layers using a voxelized smoothed density value (SDV) representation. |
560 | Recurrent Neural Network for (Un-)Supervised Learning of Monocular Video Visual Odometry and Depth | Rui Wang, Stephen M. Pizer, Jan-Michael Frahm | We propose a learning-based, multi-view dense depth map and odometry estimation method that uses Recurrent Neural Networks (RNN) and trains utilizing multi-view image reprojection and forward-backward flow-consistency losses. |
561 | PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing | Hengshuang Zhao, Li Jiang, Chi-Wing Fu, Jiaya Jia | This paper presents PointWeb, a new approach to extract contextual features from local neighborhoods in a point cloud. |
562 | Scan2Mesh: From Unstructured Range Scans to 3D Meshes | Angela Dai, Matthias Niessner | We introduce Scan2Mesh, a novel data-driven generative approach which transforms an unstructured and potentially incomplete range scan into a structured 3D mesh representation. |
563 | Unsupervised Domain Adaptation for ToF Data Denoising With Adversarial Learning | Gianluca Agresti, Henrik Schaefer, Piergiorgio Sartor, Pietro Zanuttigh | In this paper, we avoid relying on labeled real data in the learning framework. |
564 | Learning Independent Object Motion From Unlabelled Stereoscopic Videos | Zhe Cao, Abhishek Kar, Christian Hane, Jitendra Malik | We present a system for learning motion maps of independently moving objects from stereo videos. |
565 | Learning Single-Image Depth From Videos Using Quality Assessment Networks | Weifeng Chen, Shengyi Qian, Jia Deng | In this paper we propose a method to automatically generate such data through Structure-from-Motion (SfM) on Internet videos. Using this method, we collect single-view depth training data from a large number of YouTube videos and construct a new dataset called YouTube3D. |
566 | Learning 3D Human Dynamics From Video | Angjoo Kanazawa, Jason Y. Zhang, Panna Felsen, Jitendra Malik | We present a framework that can similarly learn a representation of 3D dynamics of humans from video via a simple but effective temporal encoding of image features. |
567 | Lending Orientation to Neural Networks for Cross-View Geo-Localization | Liu Liu, Hongdong Li | Inspired by this insight, this paper proposes a novel method which endows deep neural networks with the 'commonsense' of orientation. To evaluate the generalization of our method, we also created a large-scale cross-view localization benchmark containing 100K geotagged ground-aerial pairs covering a city. |
568 | Visual Localization by Learning Objects-Of-Interest Dense Match Regression | Philippe Weinzaepfel, Gabriela Csurka, Yohann Cabon, Martin Humenberger | We introduce a novel CNN-based approach for visual localization from a single RGB image that relies on densely matching a set of Objects-of-Interest (OOIs). Given these 2D-2D matches, together with the 3D world coordinates of each reference image, we obtain a set of 2D-3D matches from which solving a Perspective-n-Point problem gives a pose estimate. |
569 | Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction | Alex Wong, Stefano Soatto | We propose a novel objective function that exploits the bilateral cyclic relationship between the left and right disparities and we introduce an adaptive regularization scheme that allows the network to handle both the co-visible and occluded regions in a stereo pair. |
570 | Face Parsing With RoI Tanh-Warping | Jinpeng Lin, Hao Yang, Dong Chen, Ming Zeng, Fang Wen, Lu Yuan | Inspired by the physiological vision system of human, we propose a novel RoI Tanh-warping operator that combines the central vision and the peripheral vision together. |
571 | Multi-Person Articulated Tracking With Spatial and Temporal Embeddings | Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian | We propose a unified framework for multi-person pose estimation and tracking. |
572 | Multi-Person Pose Estimation With Enhanced Channel-Wise and Spatial Information | Kai Su, Dongdong Yu, Zhenqi Xu, Xin Geng, Changhu Wang | In this paper, we propose two novel modules that enhance channel-wise and spatial information for multi-person pose estimation. |
573 | A Compact Embedding for Facial Expression Similarity | Raviteja Vemulapalli, Aseem Agarwala | Different from previous work, our goal is to describe facial expressions in a continuous fashion using a compact embedding space that mimics human visual preferences. To achieve this goal, we collect a large-scale faces-in-the-wild dataset with human annotations in the form: Expressions A and B are visually more similar when compared to expression C, and use this dataset to train a neural network that produces a compact (16-dimensional) expression embedding. |
574 | Deep High-Resolution Representation Learning for Human Pose Estimation | Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang | In this paper, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. |
575 | Feature Transfer Learning for Face Recognition With Under-Represented Data | Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, Manmohan Chandraker | In this paper, we propose a center-based feature transfer framework to augment the feature space of under-represented subjects from the regular subjects that have sufficiently diverse samples. |
576 | Unsupervised 3D Pose Estimation With Geometric Self-Supervision | Ching-Hang Chen, Ambrish Tyagi, Amit Agrawal, Dylan Drover, Rohith MV, Stefan Stojanov, James M. Rehg | We present an unsupervised learning approach to recover 3D human pose from 2D skeletal joints extracted from a single image. |
577 | Peeking Into the Future: Predicting Future Person Activities and Locations in Videos | Junwei Liang, Lu Jiang, Juan Carlos Niebles, Alexander G. Hauptmann, Li Fei-Fei | We propose an end-to-end, multi-task learning system utilizing rich visual features about human behavioral information and interaction with their surroundings. |
578 | Re-Identification With Consistent Attentive Siamese Networks | Meng Zheng, Srikrishna Karanam, Ziyan Wu, Richard J. Radke | We propose a new deep architecture for person re-identification (re-id). |
579 | On the Continuity of Rotation Representations in Neural Networks | Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, Hao Li | In this paper, we advance a definition of a continuous representation, which can be helpful for training deep neural networks. |
580 | Iterative Residual Refinement for Joint Optical Flow and Occlusion Estimation | Junhwa Hur, Stefan Roth | Taking inspiration from both classical energy minimization approaches as well as residual networks, we propose an iterative residual refinement (IRR) scheme based on weight sharing that can be combined with several backbone networks. |
581 | Inverse Discriminative Networks for Handwritten Signature Verification | Ping Wei, Huan Li, Ping Hu | In this paper, we propose an inverse discriminative network (IDN) for writer-independent handwritten signature verification, which aims to determine whether a test signature is genuine or forged compared to the reference signature. Since there was no proper Chinese signature dataset in the community, we collected a large-scale Chinese signature dataset with approximately 29,000 images of 749 individuals’ signatures. |
582 | Led3D: A Lightweight and Efficient Deep Approach to Recognizing Low-Quality 3D Faces | Guodong Mu, Di Huang, Guosheng Hu, Jia Sun, Yunhong Wang | In this paper, we focus on 3D FR using low-quality data, targeting an efficient and accurate deep learning solution. |
583 | ROI Pooled Correlation Filters for Visual Tracking | Yuxuan Sun, Chong Sun, Dong Wang, You He, Huchuan Lu | In this paper, we propose a novel ROI pooled correlation filter (RPCF) algorithm for robust visual tracking. |
584 | Deep Video Inpainting | Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon | In this work, we propose a novel deep network architecture for fast video inpainting. |
585 | DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-To-Image Synthesis | Minfeng Zhu, Pingbo Pan, Wei Chen, Yi Yang | In this paper, we focus on generating realistic images from text descriptions. |
586 | Non-Adversarial Image Synthesis With Generative Latent Nearest Neighbors | Yedid Hoshen, Ke Li, Jitendra Malik | In this work, we present a novel method – Generative Latent Nearest Neighbors (GLANN) – for training generative models without adversarial training. |
587 | Mixture Density Generative Adversarial Networks | Hamid Eghbal-zadeh, Werner Zellinger, Gerhard Widmer | In this paper, we propose a new GAN variant called Mixture Density GAN that overcomes this problem by encouraging the Discriminator to form clusters in its embedding space, which in turn leads the Generator to exploit these and discover different modes in the data. |
588 | SketchGAN: Joint Sketch Completion and Recognition With Generative Adversarial Network | Fang Liu, Xiaoming Deng, Yu-Kun Lai, Yong-Jin Liu, Cuixia Ma, Hongan Wang | In this paper, we propose SketchGAN, a new generative adversarial network (GAN) based approach that jointly completes and recognizes a sketch, boosting the performance of both tasks. |
589 | Foreground-Aware Image Inpainting | Wei Xiong, Jiahui Yu, Zhe Lin, Jimei Yang, Xin Lu, Connelly Barnes, Jiebo Luo | To address the problem, we propose a foreground-aware image inpainting system that explicitly disentangles structure inference and content completion. |
590 | Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-To-Image Translation | Matteo Tomei, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara | In this paper, we propose a semantic-aware architecture which can translate artworks to photo-realistic visualizations, thus reducing the gap between visual features of artistic and realistic data. |
591 | Structure-Preserving Stereoscopic View Synthesis With Multi-Scale Adversarial Correlation Matching | Yu Zhang, Dongqing Zou, Jimmy S. Ren, Zhe Jiang, Xiaohao Chen | Regarding this issue, this work proposes Multi-Scale Adversarial Correlation Matching (MS-ACM), a novel learning framework for structure-aware view synthesis. |
592 | DynTypo: Example-Based Dynamic Text Effects Transfer | Yifang Men, Zhouhui Lian, Yingmin Tang, Jianguo Xiao | In this paper, we present a novel approach for dynamic text effects transfer by using example-based texture synthesis. |
593 | Arbitrary Style Transfer With Style-Attentional Networks | Dae Young Park, Kwang Hee Lee | In this paper, we introduce a novel style-attentional network (SANet) that efficiently and flexibly integrates the local style patterns according to the semantic spatial distribution of the content image. |
594 | Typography With Decor: Intelligent Text Style Transfer | Wenjing Wang, Jiaying Liu, Shuai Yang, Zongming Guo | In this paper, we present a novel framework to stylize the text with exquisite decor, which is ignored by the previous text stylization methods. |
595 | RL-GAN-Net: A Reinforcement Learning Agent Controlled GAN Network for Real-Time Point Cloud Shape Completion | Muhammad Sarmad, Hyunjoo Jenny Lee, Young Min Kim | We present RL-GAN-Net, where a reinforcement learning (RL) agent provides fast and robust control of a generative adversarial network (GAN). |
596 | Photo Wake-Up: 3D Character Animation From a Single Photo | Chung-Yi Weng, Brian Curless, Ira Kemelmacher-Shlizerman | We present a method and application for animating a human subject from a single photo. |
597 | DeepLight: Learning Illumination for Unconstrained Mobile Mixed Reality | Chloe LeGendre, Wan-Chun Ma, Graham Fyffe, John Flynn, Laurent Charbonnel, Jay Busch, Paul Debevec | We present a learning-based method to infer plausible high dynamic range (HDR), omnidirectional illumination given an unconstrained, low dynamic range (LDR) image from a mobile phone camera with a limited field of view (FOV). |
598 | Iterative Residual CNNs for Burst Photography Applications | Filippos Kokkinos, Stamatis Lefkimmiatis | In this work, we focus on the fact that every frame of a burst sequence can be accurately described by a forward (physical) model. |
599 | Learning Implicit Fields for Generative Shape Modeling | Zhiqin Chen, Hao Zhang | We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder, called IM-NET, for shape generation, aimed at improving the visual quality of the generated shapes. |
600 | Reliable and Efficient Image Cropping: A Grid Anchor Based Approach | Hui Zeng, Lida Li, Zisheng Cao, Lei Zhang | This work revisits the problem of image cropping, and presents a grid anchor based formulation by considering the special properties and requirements (e.g., local redundancy, content preservation, aspect ratio) of image cropping. |
601 | Patch-Based Progressive 3D Point Set Upsampling | Wang Yifan, Shihao Wu, Hui Huang, Daniel Cohen-Or, Olga Sorkine-Hornung | We propose a series of architectural design contributions that lead to a substantial performance boost. |
602 | An Iterative and Cooperative Top-Down and Bottom-Up Inference Network for Salient Object Detection | Wenguan Wang, Jianbing Shen, Ming-Ming Cheng, Ling Shao | This paper presents a salient object detection method that integrates both top-down and bottom-up saliency inference in an iterative and cooperative manner. |
603 | Deep Stacked Hierarchical Multi-Patch Network for Image Deblurring | Hongguang Zhang, Yuchao Dai, Hongdong Li, Piotr Koniusz | To tackle the above problems, we present a deep hierarchical multi-patch network inspired by Spatial Pyramid Matching to deal with blurry images via a fine-to-coarse hierarchical representation. |
604 | Turn a Silicon Camera Into an InGaAs Camera | Feifan Lv, Yinqiang Zheng, Bohan Zhang, Feng Lu | In this paper, we propose a novel solution for SWIR imaging using a common Silicon sensor, which offers a lower price, higher resolution, and better technical maturity than the specialized InGaAs sensor. |
605 | Low-Rank Tensor Completion With a New Tensor Nuclear Norm Induced by Invertible Linear Transforms | Canyi Lu, Xi Peng, Yunchao Wei | Low-Rank Tensor Completion With a New Tensor Nuclear Norm Induced by Invertible Linear Transforms. |
606 | Joint Representative Selection and Feature Learning: A Semi-Supervised Approach | Suchen Wang, Jingjing Meng, Junsong Yuan, Yap-Peng Tan | In this paper, we propose a semi-supervised approach for representative selection, which finds a small set of representatives that can well summarize a large data collection. |
607 | The Domain Transform Solver | Akash Bapat, Jan-Michael Frahm | We present a novel framework for edge-aware optimization that is an order of magnitude faster than the state of the art while maintaining comparable results. |
608 | CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection | Lu Zhang, Jianming Zhang, Zhe Lin, Huchuan Lu, You He | To this end, we propose to leverage captioning as an auxiliary semantic task to boost salient object detection in complex scenarios. |
609 | Phase-Only Image Based Kernel Estimation for Single Image Blind Deblurring | Liyuan Pan, Richard Hartley, Miaomiao Liu, Yuchao Dai | Unlike existing approaches, which tackle the problem by enforcing various priors on the blur kernel and the latent image, we aim to obtain a high-quality blur kernel directly by studying the problem in the frequency domain. |
610 | Hierarchical Discrete Distribution Decomposition for Match Density Estimation | Zhichao Yin, Trevor Darrell, Fisher Yu | In this paper, we propose Hierarchical Discrete Distribution Decomposition (HD^3), a framework suitable for learning probabilistic pixel correspondences in both optical flow and stereo matching. |
611 | FOCNet: A Fractional Optimal Control Network for Image Denoising | Xixi Jia, Sanyang Liu, Xiangchu Feng, Lei Zhang | Inspired by the fact that the fractional-order differential equation has long-term memory, in this paper we develop an advanced image denoising network, namely FOCNet, by solving a fractional optimal control (FOC) problem. |
612 | Orthogonal Decomposition Network for Pixel-Wise Binary Classification | Chang Liu, Fang Wan, Wei Ke, Zhuowei Xiao, Yuan Yao, Xiaosong Zhang, Qixiang Ye | In this paper, we implement an Orthogonal Decomposition Unit (ODU) that transforms a convolutional feature map into orthogonal bases, aiming to de-correlate neighboring pixels on convolutional features. |
613 | Multi-Source Weak Supervision for Saliency Detection | Yu Zeng, Yunzhi Zhuge, Huchuan Lu, Lihe Zhang, Mingyang Qian, Yizhou Yu | To this end, we propose a unified framework to train saliency detection models with diverse weak supervision sources. |
614 | ComDefend: An Efficient Image Compression Model to Defend Adversarial Examples | Xiaojun Jia, Xingxing Wei, Xiaochun Cao, Hassan Foroosh | In this paper, we propose an end-to-end image compression model to defend adversarial examples: ComDefend. |
615 | Combinatorial Persistency Criteria for Multicut and Max-Cut | Jan-Hendrik Lange, Bjoern Andres, Paul Swoboda | We propose persistency criteria for the multicut and max-cut problem as well as fast combinatorial routines to verify them. |
616 | S4Net: Single Stage Salient-Instance Segmentation | Ruochen Fan, Ming-Ming Cheng, Qibin Hou, Tai-Jiang Mu, Jingdong Wang, Shi-Min Hu | We consider an interesting problem—salient instance segmentation. |
617 | A Decomposition Algorithm for the Sparse Generalized Eigenvalue Problem | Ganzhao Yuan, Li Shen, Wei-Shi Zheng | In this paper, we consider a new effective decomposition method to tackle this problem. |
618 | Polynomial Representation for Persistence Diagram | Zhichao Wang, Qian Li, Gang Li, Guandong Xu | In this work, we discover a set of general polynomials that vanish on vectorized PDs and extract the task-adapted feature representation from these polynomials. |
619 | Crowd Counting and Density Estimation by Trellis Encoder-Decoder Networks | Xiaolong Jiang, Zehao Xiao, Baochang Zhang, Xiantong Zhen, Xianbin Cao, David Doermann, Ling Shao | In this paper, we propose a trellis encoder-decoder network (TEDnet) for crowd counting, which focuses on generating high-quality density estimation maps. |
620 | Cross-Atlas Convolution for Parameterization Invariant Learning on Textured Mesh Surface | Shiwei Li, Zixin Luo, Mingmin Zhen, Yao Yao, Tianwei Shen, Tian Fang, Long Quan | We present a convolutional network architecture for direct feature learning on mesh surfaces through their atlases of texture maps. |
621 | Deep Surface Normal Estimation With Hierarchical RGB-D Fusion | Jin Zeng, Yanfeng Tong, Yunmu Huang, Qiong Yan, Wenxiu Sun, Jing Chen, Yongtian Wang | In this paper, a hierarchical fusion network with adaptive feature re-weighting is proposed for surface normal estimation from a single RGB-D image. |
622 | Knowledge-Embedded Routing Network for Scene Graph Generation | Tianshui Chen, Weihao Yu, Riquan Chen, Liang Lin | In this work, we find that the statistical correlations between object pairs and their relationships can effectively regularize semantic space and make prediction less ambiguous, and thus well address the unbalanced distribution issue. |
623 | An End-To-End Network for Panoptic Segmentation | Huanyu Liu, Chao Peng, Changqian Yu, Jingbo Wang, Xu Liu, Gang Yu, Wei Jiang | To address the problems, we propose a novel end-to-end Occlusion Aware Network (OANet) for panoptic segmentation, which can efficiently and effectively predict both the instance and stuff segmentation in a single network. |
624 | Fast and Flexible Indoor Scene Synthesis via Deep Convolutional Generative Models | Daniel Ritchie, Kai Wang, Yu-An Lin | We present a new, fast and flexible pipeline for indoor scene synthesis that is based on deep convolutional generative models. |
625 | Marginalized Latent Semantic Encoder for Zero-Shot Learning | Zhengming Ding, Hongfu Liu | In this paper, we attempt to exploit the intrinsic relationship in the semantic manifold when the given semantics are not enough to describe the visual objects, and enhance the generalization ability of the visual-semantic function with a marginalized strategy. |
626 | Scale-Adaptive Neural Dense Features: Learning via Hierarchical Context Aggregation | Jaime Spencer, Richard Bowden, Simon Hadfield | Instead, we propose SAND features, a dedicated deep learning solution to feature extraction capable of providing hierarchical context information. |
627 | Unsupervised Embedding Learning via Invariant and Spreading Instance Feature | Mang Ye, Xu Zhang, Pong C. Yuen, Shih-Fu Chang | Motivated by the positive-concentrated and negative-separated properties observed in category-wise supervised learning, we propose to utilize instance-wise supervision to approximate these properties, aiming to learn data-augmentation-invariant and instance-spread-out features. |
628 | AOGNets: Compositional Grammatical Architectures for Deep Learning | Xilai Li, Xi Song, Tianfu Wu | This paper presents deep compositional grammatical architectures which harness the best of two worlds: grammar models and DNNs. |
629 | A Robust Local Spectral Descriptor for Matching Non-Rigid Shapes With Incompatible Shape Structures | Yiqun Wang, Jianwei Guo, Dong-Ming Yan, Kai Wang, Xiaopeng Zhang | Focusing on this issue, in this paper, we present a more discriminative local descriptor for deformable 3D shapes with incompatible structures. Finally, for training and evaluation, we present a new benchmark dataset by extending the widely used FAUST dataset. |
630 | Context and Attribute Grounded Dense Captioning | Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao | In this work, we investigate contextual reasoning based on multi-scale message propagations from the neighboring contents to the target ROIs. |
631 | Spot and Learn: A Maximum-Entropy Patch Sampler for Few-Shot Image Classification | Wen-Hsuan Chu, Yu-Jhe Li, Jing-Cheng Chang, Yu-Chiang Frank Wang | In this work, we propose a sampling method that de-correlates an image based on maximum entropy reinforcement learning, and extracts varying sequences of patches on every forward-pass with discriminative information observed. |
632 | Interpreting CNNs via Decision Trees | Quanshi Zhang, Yu Yang, Haotian Ma, Ying Nian Wu | This paper aims to quantitatively explain the rationales of each prediction that is made by a pre-trained convolutional neural network (CNN). |
633 | Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning | Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, In So Kweon | Our goal in this work is to train an image captioning model that generates more dense and informative captions. |
634 | Deep Modular Co-Attention Networks for Visual Question Answering | Zhou Yu, Jun Yu, Yuhao Cui, Dacheng Tao, Qi Tian | In this paper, we propose a deep Modular Co-Attention Network (MCAN) that consists of Modular Co-Attention (MCA) layers cascaded in depth. |
635 | Synthesizing Environment-Aware Activities via Activity Sketches | Yuan-Hong Liao, Xavier Puig, Marko Boben, Antonio Torralba, Sanja Fidler | In this work, we address the problem of environment-aware program generation. |
636 | Self-Critical N-Step Training for Image Captioning | Junlong Gao, Shiqi Wang, Shanshe Wang, Siwei Ma, Wen Gao | In this paper, we estimate state value without using a parametrized value estimator. |
637 | Multi-Target Embodied Question Answering | Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L. Berg, Dhruv Batra | We present a generalization of EQA — Multi-Target EQA (MT-EQA). |
638 | Visual Question Answering as Reading Comprehension | Hui Li, Peng Wang, Chunhua Shen, Anton van den Hengel | Rather than struggling with multimodal feature fusion, in this paper we propose to unify all the input information in natural language, so as to convert VQA into a machine reading comprehension problem. |
639 | StoryGAN: A Sequential Conditional GAN for Story Visualization | Yitong Li, Zhe Gan, Yelong Shen, Jingjing Liu, Yu Cheng, Yuexin Wu, Lawrence Carin, David Carlson, Jianfeng Gao | In this work, we propose a new task called Story Visualization. |
640 | Noise-Aware Unsupervised Deep Lidar-Stereo Fusion | Xuelian Cheng, Yiran Zhong, Yuchao Dai, Pan Ji, Hongdong Li | In this paper, we present LidarStereoNet, the first unsupervised Lidar-stereo fusion network, which can be trained in an end-to-end manner without the need of ground truth depth maps. |
641 | Versatile Multiple Choice Learning and Its Application to Vision Computing | Kai Tian, Yi Xu, Shuigeng Zhou, Jihong Guan | In this paper, a new MCL method, called vMCL (versatile Multiple Choice Learning), is developed to extend the application scenarios of MCL methods by ensembling deep neural networks. |
642 | EV-Gait: Event-Based Robust Gait Recognition Using Dynamic Vision Sensors | Yanxiang Wang, Bowen Du, Yiran Shen, Kai Wu, Guangrong Zhao, Jianguo Sun, Hongkai Wen | In this paper, we introduce a new type of sensing modality, the Dynamic Vision Sensors (Event Cameras), for the task of gait recognition. To evaluate the performance of EV-Gait, we collect two event-based gait datasets, one from real-world experiments and the other by converting the publicly available RGB gait recognition benchmark CASIA-B. |
643 | ToothNet: Automatic Tooth Instance Segmentation and Identification From Cone Beam CT Images | Zhiming Cui, Changjian Li, Wenping Wang | This paper proposes a method that uses deep convolutional neural networks to achieve automatic and accurate tooth instance segmentation and identification from CBCT (cone beam CT) images for digital dentistry. |
644 | Modularized Textual Grounding for Counterfactual Resilience | Zhiyuan Fang, Shu Kong, Charless Fowlkes, Yezhou Yang | To address these issues, we propose a visual grounding system which is 1) end-to-end trainable in a weakly supervised fashion with only image-level annotations, and 2) counterfactually resilient owing to the modular design. |
645 | L3-Net: Towards Learning Based LiDAR Localization for Autonomous Driving | Weixin Lu, Yao Zhou, Guowei Wan, Shenhua Hou, Shiyu Song | We present L3-Net – a novel learning-based LiDAR localization system that achieves centimeter-level localization accuracy, comparable to prior state-of-the-art systems with hand-crafted pipelines. |
646 | Panoptic Feature Pyramid Networks | Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollar | In this work, we aim to unify these methods at the architectural level, designing a single network for both tasks. |
647 | Mask Scoring R-CNN | Zhaojin Huang, Lichao Huang, Yongchao Gong, Chang Huang, Xinggang Wang | In this paper, we study this problem and propose Mask Scoring R-CNN which contains a network block to learn the quality of the predicted instance masks. |
648 | Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection | Hang Xu, Chenhan Jiang, Xiaodan Liang, Liang Lin, Zhenguo Li | In this paper, we address the large-scale object detection problem with thousands of categories, which poses severe challenges due to long-tail data distributions, heavy occlusions, and class ambiguities. |
649 | Cross-Modality Personalization for Retrieval | Nils Murrugarra-Llerena, Adriana Kovashka | In this work, we propose a model for cross-modality personalized retrieval. |
650 | Composing Text and Image for Image Retrieval – an Empirical Odyssey | Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays | In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image. |
651 | Arbitrary Shape Scene Text Detection With Adaptive Text Region Representation | Xiaobing Wang, Yingying Jiang, Zhenbo Luo, Cheng-Lin Liu, Hyunsoo Choi, Sungjin Kim | To solve the problem, we propose a robust scene text detection method with adaptive text region representation. |
652 | Adaptive NMS: Refining Pedestrian Detection in a Crowd | Songtao Liu, Di Huang, Yunhong Wang | The contributions are threefold: (1) we propose adaptive-NMS, which applies a dynamic suppression threshold to an instance, according to the target density; (2) we design an efficient subnetwork to learn density scores, which can be conveniently embedded into both the single-stage and two-stage detectors; and (3) we achieve state of the art results on the CityPersons and CrowdHuman benchmarks. |
653 | Point in, Box Out: Beyond Counting Persons in Crowds | Yuting Liu, Miaojing Shi, Qijun Zhao, Xiaofang Wang | In this work, we instead propose a new deep detection network with only point supervision required. |
654 | Locating Objects Without Bounding Boxes | Javier Ribera, David Guera, Yuhao Chen, Edward J. Delp | In this paper, we address the task of estimating object locations without annotated bounding boxes which are typically hand-drawn and time consuming to label. |
655 | FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery | Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee | We propose FineGAN, a novel unsupervised GAN framework, which disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained object categories. |
656 | Mutual Learning of Complementary Networks via Residual Correction for Improving Semi-Supervised Classification | Si Wu, Jichang Li, Cheng Liu, Zhiwen Yu, Hau-San Wong | In this paper, we explore how to capture the complementary information to enhance mutual learning. |
657 | Sampling Techniques for Large-Scale Object Detection From Sparsely Annotated Objects | Yusuke Niitani, Takuya Akiba, Tommi Kerola, Toru Ogawa, Shotaro Sano, Shuji Suzuki | In this study, we propose part-aware sampling, a method that uses human intuition for the hierarchical relation between objects. |
658 | Curls & Whey: Boosting Black-Box Adversarial Attacks | Yucheng Shi, Siyu Wang, Yahong Han | In this work, we propose Curls & Whey black-box attack to fix the above two defects. |
659 | Barrage of Random Transforms for Adversarially Robust Defense | Edward Raff, Jared Sylvester, Steven Forsyth, Mark McLean | In this paper, we explore the idea of stochastically combining a large number of individually weak defenses into a single barrage of randomized transformations to build a strong defense against adversarial attacks. |
660 | Aggregation Cross-Entropy for Sequence Recognition | Zecheng Xie, Yaoxiong Huang, Yuanzhi Zhu, Lianwen Jin, Yuliang Liu, Lele Xie | In this paper, we propose a novel method, aggregation cross-entropy (ACE), for sequence recognition from a brand new perspective. |
661 | LaSO: Label-Set Operations Networks for Multi-Label Few-Shot Learning | Amit Alfassy, Leonid Karlinsky, Amit Aides, Joseph Shtok, Sivan Harary, Rogerio Feris, Raja Giryes, Alex M. Bronstein | In this work, we propose a novel technique for synthesizing samples with multiple labels for the (yet unhandled) multi-label few-shot classification scenario. We propose a benchmark for this new and challenging task and show that our method compares favorably to all the common baselines. |
662 | Few-Shot Learning With Localization in Realistic Settings | Davis Wertheimer, Bharath Hariharan | We introduce three parameter-free improvements: (a) better training procedures based on adapting cross-validation to meta-learning, (b) novel architectures that localize objects using limited bounding box annotations before classification, and (c) simple parameter-free expansions of the feature space based on bilinear pooling. |
663 | AdaGraph: Unifying Predictive and Continuous Domain Adaptation Through Graphs | Massimiliano Mancini, Samuel Rota Bulo, Barbara Caputo, Elisa Ricci | Our contribution is the first deep architecture that tackles predictive domain adaptation, able to leverage the information provided by the auxiliary domains through a graph. |
664 | Grounded Video Description | Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason J. Corso, Marcus Rohrbach | In this work, we explicitly link the sentence to the evidence in the video by annotating each noun phrase in a sentence with the corresponding bounding box in one of the frames of a video. |
665 | Streamlined Dense Video Captioning | Jonghwan Mun, Linjie Yang, Zhou Ren, Ning Xu, Bohyung Han | To tackle this challenge, we propose a novel dense video captioning framework, which models temporal dependency across events in a video explicitly and leverages visual and linguistic context from prior events for coherent storytelling. |
666 | Adversarial Inference for Multi-Sentence Video Description | Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach | In this work, we instead propose to apply adversarial techniques during inference, designing a discriminator which encourages better multi-sentence video description. |
667 | Unified Visual-Semantic Embeddings: Bridging Vision and Language With Structured Meaning Representations | Hao Wu, Jiayuan Mao, Yufeng Zhang, Yuning Jiang, Lei Li, Weiwei Sun, Wei-Ying Ma | We propose the Unified Visual-Semantic Embeddings (Unified VSE) for learning a joint space of visual representation and textual semantics. |
668 | Learning to Compose Dynamic Tree Structures for Visual Contexts | Kaihua Tang, Hanwang Zhang, Baoyuan Wu, Wenhan Luo, Wei Liu | We propose to compose dynamic tree structures that place the objects in an image into a visual context, helping visual reasoning tasks such as scene graph generation and visual Q&A. |
669 | Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation | Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang | In this paper, we study how to address three critical challenges for this task: the cross-modal grounding, the ill-posed feedback, and the generalization problems. |
670 | Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering | Peng Gao, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven C. H. Hoi, Xiaogang Wang, Hongsheng Li | We propose a novel method that dynamically fuses multi-modal features with intra- and inter-modality information flow, alternately passing dynamic information between and across the visual and language modalities. |
671 | Cycle-Consistency for Robust Visual Question Answering | Meet Shah, Xinlei Chen, Marcus Rohrbach, Devi Parikh | As a step towards improving robustness of VQA models, we propose a model-agnostic framework that exploits cycle consistency. We introduce a new evaluation protocol and associated dataset (VQA-Rephrasings) and show that state-of-the-art VQA models are notoriously brittle to linguistic variations in questions. |
672 | Embodied Question Answering in Photorealistic Environments With Point Cloud Perception | Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra | To help bridge the gap between internet vision-style problems and the goal of vision for embodied perception we instantiate a large-scale navigation task — Embodied Question Answering [1] in photo-realistic environments (Matterport 3D). |
673 | Reasoning Visual Dialogs With Structural and Partial Observations | Zilong Zheng, Wenguan Wang, Siyuan Qi, Song-Chun Zhu | We propose a novel model to address the task of Visual Dialog which exhibits complex dialog structures. |
674 | Recursive Visual Attention in Visual Dialog | Yulei Niu, Hanwang Zhang, Manli Zhang, Jianhong Zhang, Zhiwu Lu, Ji-Rong Wen | In this work, to resolve the visual co-reference for visual dialog, we propose a novel attention mechanism called Recursive Visual Attention (RvA). |
675 | Two Body Problem: Collaborative Visual Task Completion | Unnat Jain, Luca Weihs, Eric Kolve, Mohammad Rastegari, Svetlana Lazebnik, Ali Farhadi, Alexander G. Schwing, Aniruddha Kembhavi | In this paper we study the problem of learning to collaborate directly from pixels in AI2-THOR and demonstrate the benefits of explicit and implicit modes of communication to perform visual tasks. |
676 | GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering | Drew A. Hudson, Christopher D. Manning | We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets. |
677 | Text2Scene: Generating Compositional Scenes From Textual Descriptions | Fuwen Tan, Song Feng, Vicente Ordonez | In this paper, we propose Text2Scene, a model that generates various forms of compositional scene representations from natural language descriptions. |
678 | From Recognition to Cognition: Visual Commonsense Reasoning | Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi | To move towards cognition-level understanding, we present a new reasoning engine, Recognition to Cognition Networks (R2C), that models the necessary layered inferences for grounding, contextualization, and reasoning. Next, we introduce a new dataset, VCR, consisting of 290k multiple choice QA problems derived from 110k movie scenes. |
679 | The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation | Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira | In this paper, inspired by the intuition of viewing the problem as search on a navigation graph, we propose to use a progress monitor developed in prior work as a learnable heuristic for search. |
680 | Tactical Rewind: Self-Correction via Backtracking in Vision-And-Language Navigation | Liyiming Ke, Xiujun Li, Yonatan Bisk, Ari Holtzman, Zhe Gan, Jingjing Liu, Jianfeng Gao, Yejin Choi, Siddhartha Srinivasa | We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding, that achieves state-of-the-art results on the 2018 Room-to-Room (R2R) Vision-and-Language navigation challenge. |
681 | Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning | Mitchell Wortsman, Kiana Ehsani, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi | In this paper we study the problem of learning to learn at both training and test time in the context of visual navigation. |
682 | High Flux Passive Imaging With Single-Photon Sensors | Atul Ingle, Andreas Velten, Mohit Gupta | We propose passive free-running SPAD (PF-SPAD) imaging, an imaging modality that uses SPADs for capturing 2D intensity images with unprecedented dynamic range under ambient lighting, without any active light source. |
683 | Photon-Flooded Single-Photon 3D Cameras | Anant Gupta, Atul Ingle, Andreas Velten, Mohit Gupta | In this paper, we address the following basic question: what is the optimal photon flux that a SPAD-based LiDAR should be operated in? |
684 | Acoustic Non-Line-Of-Sight Imaging | David B. Lindell, Gordon Wetzstein, Vladlen Koltun | We introduce acoustic NLOS imaging, which is orders of magnitude less expensive than most optical systems and captures hidden 3D geometry at longer ranges with shorter acquisition times compared to state-of-the-art optical methods. |
685 | Steady-State Non-Line-Of-Sight Imaging | Wenzheng Chen, Simon Daneau, Fahim Mannan, Felix Heide | To tackle the shape-dependence of these variations, we propose a trainable architecture which learns to map diffuse indirect reflections to scene reflectance using only synthetic training data. |
686 | A Theory of Fermat Paths for Non-Line-Of-Sight Shape Reconstruction | Shumian Xin, Sotiris Nousias, Kiriakos N. Kutulakos, Aswin C. Sankaranarayanan, Srinivasa G. Narasimhan, Ioannis Gkioulekas | We present a novel theory of Fermat paths of light between a known visible scene and an unknown object not in the line of sight of a transient camera. |
687 | End-To-End Projector Photometric Compensation | Bingyao Huang, Haibin Ling | In this paper, for the first time, we formulate the compensation problem as an end-to-end learning problem and propose a convolutional neural network, named CompenNet, to implicitly learn the complex compensation function. |
688 | Bringing a Blurry Frame Alive at High Frame-Rate With an Event Camera | Liyuan Pan, Cedric Scheerlinck, Xin Yu, Richard Hartley, Miaomiao Liu, Yuchao Dai | In this paper, we propose a simple and effective approach, the Event-based Double Integral (EDI) model, to reconstruct a high frame-rate, sharp video from a single blurry frame and its event data. |
689 | Bringing Alive Blurred Moments | Kuldeep Purohit, Anshul Shah, A. N. Rajagopalan | We present a solution for the goal of extracting a video from a single motion blurred image to sequentially reconstruct the clear views of a scene as beheld by the camera during the time of exposure. |
690 | Learning to Synthesize Motion Blur | Tim Brooks, Jonathan T. Barron | We present a technique for synthesizing a motion blurred image from a pair of unblurred images captured in succession. |
691 | Underexposed Photo Enhancement Using Deep Illumination Estimation | Ruixing Wang, Qing Zhang, Chi-Wing Fu, Xiaoyong Shen, Wei-Shi Zheng, Jiaya Jia | This paper presents a new neural network for enhancing underexposed photos. |
692 | Blind Visual Motif Removal From a Single Image | Amir Hertz, Sharon Fogel, Rana Hanocka, Raja Giryes, Daniel Cohen-Or | This work proposes a deep learning based technique for blind removal of such objects. |
693 | Non-Local Meets Global: An Integrated Paradigm for Hyperspectral Denoising | Wei He, Quanming Yao, Chao Li, Naoto Yokoya, Qibin Zhao | In this paper, we claim that the HSI lies in a global spectral low-rank subspace, and the spectral subspaces of each full band patch groups should lie in this global low-rank subspace. |
694 | Neural Rerendering in the Wild | Moustafa Meshry, Dan B. Goldman, Sameh Khamis, Hugues Hoppe, Rohit Pandey, Noah Snavely, Ricardo Martin-Brualla | We explore total scene capture — recording, modeling, and rerendering a scene under varying appearance such as season and time of day. |
695 | GeoNet: Deep Geodesic Networks for Point Cloud Analysis | Tong He, Haibin Huang, Li Yi, Yuqian Zhou, Chihao Wu, Jue Wang, Stefano Soatto | Thus we introduce GeoNet, the first deep learning architecture trained to model the intrinsic structure of surfaces represented as point clouds. |
696 | MeshAdv: Adversarial Meshes for Visual Recognition | Chaowei Xiao, Dawei Yang, Bo Li, Jia Deng, Mingyan Liu | In this paper, we propose meshAdv to generate “adversarial 3D meshes” from objects that have rich shape features but minimal textural variation. |
697 | Fast Spatially-Varying Indoor Lighting Estimation | Mathieu Garon, Kalyan Sunkavalli, Sunil Hadap, Nathan Carr, Jean-Francois Lalonde | We propose a real-time method to estimate spatially-varying indoor lighting from a single RGB image. |
698 | Neural Illumination: Lighting Prediction for Indoor Environments | Shuran Song, Thomas Funkhouser | Instead, we propose “Neural Illumination,” a new approach that decomposes illumination prediction into several simpler differentiable sub-tasks: 1) geometry estimation, 2) scene completion, and 3) LDR-to-HDR estimation. |
699 | Deep Sky Modeling for Single Image Outdoor Lighting Estimation | Yannick Hold-Geoffroy, Akshaya Athawale, Jean-Francois Lalonde | We propose a data-driven learned sky model, which we use for outdoor lighting estimation from a single image. |
700 | Bidirectional Learning for Domain Adaptation of Semantic Segmentation | Yunsheng Li, Lu Yuan, Nuno Vasconcelos | In this paper, we propose a novel bidirectional learning framework for domain adaptation of segmentation. |
701 | Enhanced Bayesian Compression via Deep Reinforcement Learning | Xin Yuan, Liangliang Ren, Jiwen Lu, Jie Zhou | In this paper, we propose an Enhanced Bayesian Compression method to flexibly compress the deep networks via reinforcement learning. |
702 | Strong-Weak Distribution Alignment for Adaptive Object Detection | Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko | We propose an approach for unsupervised adaptation of object detectors from label-rich to label-poor domains which can significantly reduce annotation costs associated with detection. |
703 | MFAS: Multimodal Fusion Architecture Search | Juan-Manuel Perez-Rua, Valentin Vielzeuf, Stephane Pateux, Moez Baccouche, Frederic Jurie | We propose a novel and generic search space that spans a large number of possible fusion architectures. |
704 | Disentangling Adversarial Robustness and Generalization | David Stutz, Matthias Hein, Bernt Schiele | In an effort to clarify the relationship between robustness and generalization, we assume an underlying, low-dimensional data manifold and show that: 1. regular adversarial examples leave the manifold; 2. adversarial examples constrained to the manifold, i.e., on-manifold adversarial examples, exist; 3. on-manifold adversarial examples are generalization errors, and on-manifold adversarial training boosts generalization; 4. regular robustness and generalization are not necessarily contradicting goals. |
705 | ShieldNets: Defending Against Adversarial Attacks Using Probabilistic Adversarial Robustness | Rajkumar Theagarajan, Ming Chen, Bir Bhanu, Jing Zhang | In this work, ShieldNet is implemented using PixelCNN as a demonstration of probabilistic adversarial robustness (PAR). |
706 | Deeply-Supervised Knowledge Synergy | Dawei Sun, Anbang Yao, Aojun Zhou, Hao Zhao | In this paper, we propose Deeply-supervised Knowledge Synergy (DKS), a new method aiming to train CNNs with improved generalization ability for image classification tasks without introducing extra computational cost during inference. |
707 | Dual Residual Networks Leveraging the Potential of Paired Operations for Image Restoration | Xing Liu, Masanori Suganuma, Zhun Sun, Takayuki Okatani | In this paper, we study design of deep neural networks for tasks of image restoration. |
708 | Probabilistic End-To-End Noise Correction for Learning With Noisy Labels | Kun Yi, Jianxin Wu | To address this problem, we propose an end-to-end framework called PENCIL, which can update both network parameters and label estimations as label distributions. |
709 | Attention-Guided Unified Network for Panoptic Segmentation | Yanwei Li, Xinze Chen, Zheng Zhu, Lingxi Xie, Guan Huang, Dalong Du, Xingang Wang | Existing methods mostly deal with these two problems separately; in this paper, we reveal the underlying relationship between them, in particular that foreground (FG) objects provide complementary cues to assist background (BG) understanding. |
710 | NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection | Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le | Here we aim to learn a better architecture of feature pyramid network for object detection. |
711 | OICSR: Out-In-Channel Sparsity Regularization for Compact Deep Neural Networks | Jiashi Li, Qi Qi, Jingyu Wang, Ce Ge, Yujian Li, Zhangzhang Yue, Haifeng Sun | Our proposed Out-In-Channel Sparsity Regularization (OICSR) considers correlations between successive layers to further retain predictive power of the compact network. |
712 | Semantically Aligned Bias Reducing Zero Shot Learning | Akanksha Paul, Narayanan C. Krishnan, Prateek Munjal | In this work, we propose a novel approach, Semantically Aligned Bias Reducing (SABR) ZSL, which focuses on solving both the problems. |
713 | Feature Space Perturbations Yield More Transferable Adversarial Examples | Nathan Inkawhich, Wei Wen, Hai (Helen) Li, Yiran Chen | This work describes a transfer-based blackbox targeted adversarial attack of deep feature space representations that also provides insights into cross-model class representations of deep CNNs. |
714 | IGE-Net: Inverse Graphics Energy Networks for Human Pose Estimation and Single-View Reconstruction | Dominic Jack, Frederic Maire, Sareh Shirazi, Anders Eriksson | We propose using a deep-learning based energy minimization framework to learn a consistency measure between 2D observations and a proposed world model, and demonstrate that this framework can be trained end-to-end to produce consistent and realistic inferences. |
715 | Accelerating Convolutional Neural Networks via Activation Map Compression | Georgios Georgiadis | Towards this end, we propose a three-stage compression and acceleration pipeline that sparsifies, quantizes and entropy encodes activation maps of Convolutional Neural Networks. |
716 | Knowledge Distillation via Instance Relationship Graph | Yufan Liu, Jiajiong Cao, Bing Li, Chunfeng Yuan, Weiming Hu, Yangxi Li, Yunqiang Duan | In this paper, a novel Instance Relationship Graph (IRG) is proposed for knowledge distillation. |
717 | PPGNet: Learning Point-Pair Graph for Line Segment Detection | Ziheng Zhang, Zhengxin Li, Ning Bi, Jia Zheng, Jinlei Wang, Kun Huang, Weixin Luo, Yanyu Xu, Shenghua Gao | In this paper, we present a novel framework to detect line segments in man-made environments. |
718 | Building Detail-Sensitive Semantic Segmentation Networks With Polynomial Pooling | Zhen Wei, Jingyi Zhang, Li Liu, Fan Zhu, Fumin Shen, Yi Zhou, Si Liu, Yao Sun, Ling Shao | In this work, we propose a polynomial pooling (P-pooling) function that finds an intermediate form between max and average pooling to provide an optimally balanced and self-adjusted pooling strategy for semantic segmentation. |
719 | Variational Bayesian Dropout With a Hierarchical Prior | Yuhang Liu, Wenyong Dong, Lei Zhang, Dong Gong, Qinfeng Shi | To address this problem, we present a new generalization of Gaussian dropout, termed variational Bayesian dropout (VBD), which turns to exploit a hierarchical prior on the network weights and infer a new joint posterior. |
720 | AANet: Attribute Attention Network for Person Re-Identifications | Chiat-Pin Tay, Sharmili Roy, Kim-Hui Yap | This paper proposes Attribute Attention Network (AANet), a new architecture that integrates person attributes and attribute attention maps into a classification framework to solve the person re-identification (re-ID) problem. |
721 | Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction | Osama Makansi, Eddy Ilg, Ozgun Cicek, Thomas Brox | In this work, we present an approach that involves the prediction of several samples of the future with a winner-takes-all loss and iterative grouping of samples to multiple modes. |
722 | A Main/Subsidiary Network Framework for Simplifying Binary Neural Networks | Yinghao Xu, Xin Dong, Yudian Li, Hao Su | In this paper, we, for the first time, define the filter-level pruning problem for binary neural networks, which cannot be solved by simply migrating existing structural pruning methods for full-precision models. |
723 | PointNetLK: Robust & Efficient Point Cloud Registration Using PointNet | Yasuhiro Aoki, Hunter Goforth, Rangaprasad Arun Srivatsan, Simon Lucey | In this paper we argue that PointNet itself can be thought of as a learnable “imaging” function. |
724 | Few-Shot Adaptive Faster R-CNN | Tao Wang, Xiaopeng Zhang, Li Yuan, Jiashi Feng | To mitigate the detection performance drop caused by domain shift, we aim to develop a novel few-shot adaptation approach that requires only a few target domain images with limited bounding box annotations. |
725 | VRSTC: Occlusion-Free Video Person Re-Identification | Ruibing Hou, Bingpeng Ma, Hong Chang, Xinqian Gu, Shiguang Shan, Xilin Chen | In this paper, we propose a novel network, called Spatio-Temporal Completion network (STCnet), to explicitly handle partial occlusion problem. |
726 | Compact Feature Learning for Multi-Domain Image Classification | Yajing Liu, Xinmei Tian, Ya Li, Zhiwei Xiong, Feng Wu | Therefore, we propose an end-to-end network to obtain the more optimal features, which we call compact features. |
727 | Adaptive Transfer Network for Cross-Domain Person Re-Identification | Jiawei Liu, Zheng-Jun Zha, Di Chen, Richang Hong, Meng Wang | In this work, we propose a novel adaptive transfer network (ATNet) for effective cross-domain person re-identification. |
728 | Large-Scale Few-Shot Learning: Knowledge Transfer With Class Hierarchy | Aoxue Li, Tiange Luo, Zhiwu Lu, Tao Xiang, Liwei Wang | To overcome the challenge, we propose a novel large-scale FSL model by learning transferable visual features with the class hierarchy which encodes the semantic relations between source and target classes. |
729 | Moving Object Detection Under Discontinuous Change in Illumination Using Tensor Low-Rank and Invariant Sparse Decomposition | Moein Shakeri, Hong Zhang | Our method relies on the multilinear (tensor) data low-rank and sparse decomposition framework to address the weaknesses of existing methods. |
730 | Pedestrian Detection With Autoregressive Network Phases | Garrick Brazil, Xiaoming Liu | We present an autoregressive pedestrian detection framework with cascaded phases designed to progressively improve precision. |
731 | All You Need Is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification | Weijie Chen, Di Xie, Yuan Zhang, Shiliang Pu | To push this direction forward, a novel basic component named Sparse Shift Layer (SSL) is introduced in this paper to construct efficient convolutional neural networks. |
732 | Stochastic Class-Based Hard Example Mining for Deep Metric Learning | Yumin Suh, Bohyung Han, Wonsik Kim, Kyoung Mu Lee | To alleviate this limitation, we propose a stochastic hard negative mining method. |
733 | Revisiting Local Descriptor Based Image-To-Class Measure for Few-Shot Learning | Wenbin Li, Lei Wang, Jinglin Xu, Jing Huo, Yang Gao, Jiebo Luo | In this paper, we argue that a measure at such a level may not be effective enough in light of the scarcity of examples in few-shot learning. |
734 | Towards Robust Curve Text Detection With Conditional Spatial Expansion | Zichuan Liu, Guosheng Lin, Sheng Yang, Fayao Liu, Weisi Lin, Wang Ling Goh | Instead of regarding the curve text detection as a polygon regression or a segmentation problem, we formulate it as a sequence prediction on the spatial domain. |
735 | Revisiting Perspective Information for Efficient Crowd Counting | Miaojing Shi, Zhaohui Yang, Chao Xu, Qijun Chen | In this work, we propose a perspective-aware convolutional neural network (PACNN) for efficient crowd counting, which integrates the perspective information into density regression to provide additional knowledge of the person scale change in an image. |
736 | Towards Universal Object Detection by Domain Attention | Xudong Wang, Zhaowei Cai, Dashan Gao, Nuno Vasconcelos | In this paper, we develop an effective and efficient universal object detection system that is capable of working on various image domains, from human faces and traffic signs to medical CT images. |
737 | Ensemble Deep Manifold Similarity Learning Using Hard Proxies | Nicolas Aziere, Sinisa Todorovic | We introduce a new time- and memory-efficient method for estimating the manifold similarities by using a closed-form convergence solution of the Random Walk algorithm. |
738 | Quantization Networks | Jiwei Yang, Xu Shen, Jun Xing, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xian-sheng Hua | In this paper, we provide a simple and uniform way to quantize weights and activations by formulating quantization as a differentiable non-linear function. |
739 | RES-PCA: A Scalable Approach to Recovering Low-Rank Matrices | Chong Peng, Chenglizhao Chen, Zhao Kang, Jianbo Li, Qiang Cheng | To combat this drawback, in this paper we propose a new type of RPCA method, RES-PCA, which is linearly efficient and scalable in both data size and dimension. |
740 | Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks | N. Dinesh Reddy, Minh Vo, Srinivasa G. Narasimhan | We present Occlusion-Net, a framework to predict 2D and 3D locations of occluded keypoints for objects, in a largely self-supervised manner. |
741 | Efficient Featurized Image Pyramid Network for Single Shot Detector | Yanwei Pang, Tiancai Wang, Rao Muhammad Anwer, Fahad Shahbaz Khan, Ling Shao | In this paper, we introduce a light-weight architecture to efficiently produce featurized image pyramid in a single-stage detection framework. |
742 | Multi-Task Multi-Sensor Fusion for 3D Object Detection | Ming Liang, Bin Yang, Yun Chen, Rui Hu, Raquel Urtasun | In this paper we propose to exploit multiple related tasks for accurate multi-sensor 3D object detection. |
743 | Domain-Specific Batch Normalization for Unsupervised Domain Adaptation | Woong-Gi Chang, Tackgeun You, Seonguk Seo, Suha Kwak, Bohyung Han | We propose a novel unsupervised domain adaptation framework based on domain-specific batch normalization in deep neural networks. |
744 | Grid R-CNN | Xin Lu, Buyu Li, Yuxin Yue, Quanquan Li, Junjie Yan | This paper proposes a novel object detection framework named Grid R-CNN, which adopts a grid guided localization mechanism for accurate object detection. |
745 | MetaCleaner: Learning to Hallucinate Clean Representations for Noisy-Labeled Visual Recognition | Weihe Zhang, Yali Wang, Yu Qiao | To alleviate this problem, we propose a conceptually simple but effective MetaCleaner, which can learn to hallucinate a clean representation of an object category, according to a small noisy subset from the same category. |
746 | Mapping, Localization and Path Planning for Image-Based Navigation Using Visual Features and Map | Janine Thoma, Danda Pani Paudel, Ajad Chhatkuli, Thomas Probst, Luc Van Gool | A contribution of this paper is to formulate such a set of requirements for the two sub-tasks involved: compact map construction and accurate self localization. |
747 | Triply Supervised Decoder Networks for Joint Detection and Segmentation | Jiale Cao, Yanwei Pang, Xuelong Li | In this paper, we propose a framework called TripleNet to deeply boost these two tasks. |
748 | Leveraging the Invariant Side of Generative Zero-Shot Learning | Jingjing Li, Mengmeng Jing, Ke Lu, Zhengming Ding, Lei Zhu, Zi Huang | In this paper, we take advantage of generative adversarial networks (GANs) and propose a novel method, named leveraging invariant side GAN (LisGAN), which can directly generate unseen features from random noise conditioned on the semantic descriptions. |
749 | Exploring the Bounds of the Utility of Context for Object Detection | Ehud Barnea, Ohad Ben-Shahar | In this work we seek to improve our understanding of this phenomenon, in part by pursuing an opposite approach. |
750 | A-CNN: Annularly Convolutional Neural Networks on Point Clouds | Artem Komarichev, Zichun Zhong, Jing Hua | This paper presents a new method to define and compute convolution directly on 3D point clouds by the proposed annular convolution. |
751 | DARNet: Deep Active Ray Network for Building Segmentation | Dominic Cheng, Renjie Liao, Sanja Fidler, Raquel Urtasun | In this paper, we propose a Deep Active Ray Network (DARNet) for automatic building segmentation. |
752 | Point Cloud Oversegmentation With Graph-Structured Deep Metric Learning | Loic Landrieu, Mohamed Boussaha | We propose a new supervised learning framework for oversegmenting 3D point clouds into superpoints. |
753 | Graphonomy: Universal Human Parsing via Graph Transfer Learning | Ke Gong, Yiming Gao, Xiaodan Liang, Xiaohui Shen, Meng Wang, Liang Lin | In this paper, we aim to learn a single universal human parsing model that can tackle all kinds of human parsing needs by unifying label annotations from different domains or at various levels of granularity. |
754 | Fitting Multiple Heterogeneous Models by Multi-Class Cascaded T-Linkage | Luca Magri, Andrea Fusiello | This paper addresses the problem of multiple models fitting in the general context where the sought structures can be described by a mixture of heterogeneous parametric models drawn from different classes. |
755 | A Late Fusion CNN for Digital Matting | Yunke Zhang, Lixue Gong, Lubin Fan, Peiran Ren, Qixing Huang, Hujun Bao, Weiwei Xu | This paper studies the structure of a deep convolutional neural network to predict the foreground alpha matte by taking a single RGB image as input. |
756 | BASNet: Boundary-Aware Salient Object Detection | Xuebin Qin, Zichen Zhang, Chenyang Huang, Chao Gao, Masood Dehghan, Martin Jagersand | In this paper, we propose a predict-refine architecture, BASNet, and a new hybrid loss for Boundary-Aware Salient object detection. |
757 | ZigZagNet: Fusing Top-Down and Bottom-Up Context for Object Segmentation | Di Lin, Dingguo Shen, Siting Shen, Yuanfeng Ji, Dani Lischinski, Daniel Cohen-Or, Hui Huang | In this work, we introduce ZigZagNet, which aggregates a richer multi-context feature map by using not only dense top-down and bottom-up propagation, but also by introducing pathways crossing between different levels of the top-down and the bottom-up hierarchies, in a zig-zag fashion. |
758 | Object Instance Annotation With Deep Extreme Level Set Evolution | Zian Wang, David Acuna, Huan Ling, Amlan Kar, Sanja Fidler | In this paper, we tackle the task of interactive object segmentation. |
759 | Leveraging Crowdsourced GPS Data for Road Extraction From Aerial Imagery | Tao Sun, Zonglin Di, Pengyu Che, Chun Liu, Yin Wang | In this paper, we propose to leverage crowdsourced GPS data to improve and support road extraction from aerial imagery. |
760 | Adaptive Pyramid Context Network for Semantic Segmentation | Junjun He, Zhongying Deng, Lei Zhou, Yali Wang, Yu Qiao | Based on this analysis, this paper proposes Adaptive Pyramid Context Network (APCNet) for semantic segmentation. |
761 | Isospectralization, or How to Hear Shape, Style, and Correspondence | Luca Cosmo, Mikhail Panine, Arianna Rampini, Maks Ovsjanikov, Michael M. Bronstein, Emanuele Rodola | In this paper, we introduce a numerical procedure called isospectralization, consisting of deforming one shape to make its Laplacian spectrum match that of another. |
762 | Speech2Face: Learning the Face Behind a Voice | Tae-Hyun Oh, Tali Dekel, Changil Kim, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Wojciech Matusik | In this paper, we study the task of reconstructing a facial image of a person from a short audio recording of that person speaking. |
763 | Joint Manifold Diffusion for Combining Predictions on Decoupled Observations | Kwang In Kim, Hyung Jin Chang | We present a new predictor combination algorithm that improves a given task predictor based on potentially relevant reference predictors. |
764 | Audio Visual Scene-Aware Dialog | Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K. Marks, Chiori Hori, Peter Anderson, Stefan Lee, Devi Parikh | We introduce the task of scene-aware dialog. |
765 | Learning to Minify Photometric Stereo | Junxuan Li, Antonio Robles-Kelly, Shaodi You, Yasuyuki Matsushita | We propose a method that can dramatically decrease the demands on the number of images by learning the most informative ones under different illumination conditions. |
766 | Reflective and Fluorescent Separation Under Narrow-Band Illumination | Koji Koyamatsu, Daichi Hidaka, Takahiro Okabe, Hendrik P. A. Lensch | In this paper, we address the separation of reflective and fluorescent components in RGB images taken under narrow-band light sources such as LEDs. |
767 | Depth From a Polarisation + RGB Stereo Pair | Dizhong Zhu, William A. P. Smith | In this paper, we propose a hybrid depth imaging system in which a polarisation camera is augmented by a second image from a standard digital camera. |
768 | Rethinking the Evaluation of Video Summaries | Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkila | In this paper, we will provide in-depth assessment of this pipeline using two popular benchmark datasets. |
769 | What Object Should I Use? – Task Driven Object Detection | Johann Sawatzky, Yaser Souri, Christian Grund, Jurgen Gall | We therefore introduce the COCO-Tasks dataset which comprises about 40,000 images where the most suitable objects for 14 tasks have been annotated. We furthermore propose an approach that detects the most suitable objects for a given task. |
770 | Triangulation Learning Network: From Monocular to Stereo 3D Object Detection | Zengyi Qin, Jinglu Wang, Yan Lu | In this paper, we study the problem of 3D object detection from stereo images, in which the key challenge is how to effectively utilize stereo information. |
771 | Connecting the Dots: Learning Representations for Active Monocular Depth Estimation | Gernot Riegler, Yiyi Liao, Simon Donne, Vladlen Koltun, Andreas Geiger | We propose a technique for depth estimation with a monocular structured-light camera, i.e., a calibrated stereo set-up with one camera and one laser projector. |
772 | Learning Non-Volumetric Depth Fusion Using Successive Reprojections | Simon Donne, Andreas Geiger | In this work we propose to learn an auto-regressive depth refinement directly from data. Due to the limited availability of high-quality reconstruction datasets with ground truth, we introduce two novel synthetic datasets to (pre-)train our network. |
773 | Stereo R-CNN Based 3D Object Detection for Autonomous Driving | Peiliang Li, Xiaozhi Chen, Shaojie Shen | We propose a 3D object detection method for autonomous driving by fully exploiting the sparse and dense, semantic and geometry information in stereo imagery. |
774 | Hybrid Scene Compression for Visual Localization | Federico Camposeco, Andrea Cohen, Marc Pollefeys, Torsten Sattler | In this work, we introduce a new hybrid compression algorithm that uses a given memory limit in a more effective way. |
775 | MMFace: A Multi-Metric Regression Network for Unconstrained Face Reconstruction | Hongwei Yi, Chen Li, Qiong Cao, Xiaoyong Shen, Sheng Li, Guoping Wang, Yu-Wing Tai | We propose to address the face reconstruction in the wild by using a multi-metric regression network, MMFace, to align a 3D face morphable model (3DMM) to an input image. |
776 | 3D Motion Decomposition for RGBD Future Dynamic Scene Synthesis | Xiaojuan Qi, Zhengzhe Liu, Qifeng Chen, Jiaya Jia | In this paper, we propose a RGBD scene forecasting model with 3D motion decomposition. |
777 | Single Image Depth Estimation Trained via Depth From Defocus Cues | Shir Gur, Lior Wolf | In this work, we rely, instead of different views, on depth from focus cues. |
778 | RGBD Based Dimensional Decomposition Residual Network for 3D Semantic Scene Completion | Jie Li, Yu Liu, Dong Gong, Qinfeng Shi, Xia Yuan, Chunxia Zhao, Ian Reid | We introduce a light-weight Dimensional Decomposition Residual network (DDR) for 3D dense prediction tasks. |
779 | Neural Scene Decomposition for Multi-Person Motion Capture | Helge Rhodin, Victor Constantin, Isinsu Katircioglu, Mathieu Salzmann, Pascal Fua | In this paper, we therefore propose an approach to learning features that are useful for this purpose. |
780 | Efficient Decision-Based Black-Box Adversarial Attacks on Face Recognition | Yinpeng Dong, Hang Su, Baoyuan Wu, Zhifeng Li, Wei Liu, Tong Zhang, Jun Zhu | In this paper, we evaluate the robustness of state-of-the-art face recognition models in the decision-based black-box attack setting, where the attackers have no access to the model parameters and gradients, but can only acquire hard-label predictions by sending queries to the target model. |
781 | FA-RPN: Floating Region Proposals for Face Detection | Mahyar Najibi, Bharat Singh, Larry S. Davis | We propose a novel approach for generating region proposals for performing face detection. |
782 | Bayesian Hierarchical Dynamic Model for Human Action Recognition | Rui Zhao, Wanru Xu, Hui Su, Qiang Ji | To address this issue, we propose a probabilistic model called Hierarchical Dynamic Model (HDM). |
783 | Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation | Yunyang Xiong, Hyunwoo J. Kim, Vikas Singh | The goal of this paper is to adapt these “mixed effects” ideas from statistics within a deep neural network architecture for gaze estimation, based on eye images. |
784 | 3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training | Dario Pavllo, Christoph Feichtenhofer, David Grangier, Michael Auli | In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. |
785 | Learning to Regress 3D Face Shape and Expression From an Image Without 3D Supervision | Soubhik Sanyal, Timo Bolkart, Haiwen Feng, Michael J. Black | To train a network without any 2D-to-3D supervision, we present RingNet, which learns to compute 3D face shape from a single image. Additionally we create a new database of faces “not quite in-the-wild” (NoW) with 3D head scans and high-resolution images of the subjects in a wide variety of conditions. |
786 | PoseFix: Model-Agnostic General Human Pose Refinement Network | Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee | In this paper, we propose a human pose refinement network that estimates a refined pose from a tuple of an input image and input pose. |
787 | RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation | Bastian Wandt, Bodo Rosenhahn | This paper addresses the problem of 3D human pose estimation from single images. |
788 | Fast and Robust Multi-Person 3D Pose Estimation From Multiple Views | Junting Dong, Wen Jiang, Qixing Huang, Hujun Bao, Xiaowei Zhou | We propose a fast and robust approach to solve this problem. |
789 | Face-Focused Cross-Stream Network for Deception Detection in Videos | Mingyu Ding, An Zhao, Zhiwu Lu, Tao Xiang, Ji-Rong Wen | In this work, both problems are addressed. |
790 | Unequal-Training for Deep Face Recognition With Long-Tailed Noisy Data | Yaoyao Zhong, Weihong Deng, Mei Wang, Jiani Hu, Jianteng Peng, Xunqiang Tao, Yaohai Huang | In this paper, we propose a training strategy that treats the head data and the tail data in an unequal way, accompanying with noise-robust loss functions, to take full advantage of their respective characteristics. |
791 | T-Net: Parametrizing Fully Convolutional Nets With a Single High-Order Tensor | Jean Kossaifi, Adrian Bulat, Georgios Tzimiropoulos, Maja Pantic | In this paper, we propose to fully parametrize Convolutional Neural Networks (CNNs) with a single high-order, low-rank tensor. |
792 | Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss | Lele Chen, Ross K. Maddox, Zhiyao Duan, Chenliang Xu | To avoid those pixel jittering problems and to enforce the network to focus on audiovisual-correlated regions, we propose a novel dynamically adjustable pixel-wise loss with an attention mechanism. |
793 | Object-Centric Auto-Encoders and Dummy Anomalies for Abnormal Event Detection in Video | Radu Tudor Ionescu, Fahad Shahbaz Khan, Mariana-Iuliana Georgescu, Ling Shao | In this work, we formalize abnormal event detection as a one-versus-rest binary classification problem. |
794 | DDLSTM: Dual-Domain LSTM for Cross-Dataset Action Recognition | Toby Perrett, Dima Damen | In this paper we introduce Dual-Domain LSTM (DDLSTM), an architecture that is able to learn temporal dependencies from two domains concurrently. |
795 | The Pros and Cons: Rank-Aware Temporal Attention for Skill Determination in Long Videos | Hazel Doughty, Walterio Mayol-Cuevas, Dima Damen | We present a new model to determine relative skill from long videos, through learnable temporal attention modules. We evaluate our approach on the EPIC-Skills dataset and additionally annotate a larger dataset from YouTube videos for skill determination with five previously unexplored tasks. |
796 | Collaborative Spatiotemporal Feature Learning for Video Action Recognition | Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu | In this paper, we propose a novel neural operation which encodes spatiotemporal features collaboratively by imposing a weight-sharing constraint on the learnable parameters. |
797 | MARS: Motion-Augmented RGB Stream for Action Recognition | Nieves Crasto, Philippe Weinzaepfel, Karteek Alahari, Cordelia Schmid | In this paper, we introduce two learning approaches to train a standard 3D CNN, operating on RGB frames, that mimics the motion stream, and as a result avoids flow computation at test time. |
798 | Convolutional Relational Machine for Group Activity Recognition | Sina Mokhtarzadeh Azar, Mina Ghadimi Atigh, Ahmad Nickabadi, Alexandre Alahi | We present an end-to-end deep Convolutional Neural Network called Convolutional Relational Machine (CRM) for recognizing group activities that utilizes the information in spatial relations between individual persons in image or video. |
799 | Video Summarization by Learning From Unpaired Data | Mrigank Rochan, Yang Wang | We present an approach that learns to generate optimal video summaries using a set of raw videos (V) and a set of summary videos (S), where there exists no correspondence between V and S. |
800 | Skeleton-Based Action Recognition With Directed Graph Neural Networks | Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu | In this work, we represent the skeleton data as a directed acyclic graph based on the kinematic dependency between the joints and bones in the natural human body. |
801 | PA3D: Pose-Action 3D Machine for Video Recognition | An Yan, Yali Wang, Zhifeng Li, Yu Qiao | To fill this gap, we propose a concise Pose-Action 3D Machine (PA3D), which can effectively encode multiple pose modalities within a unified 3D framework, and consequently learn spatio-temporal pose representations for action recognition. |
802 | Deep Dual Relation Modeling for Egocentric Interaction Recognition | Haoxin Li, Yijun Cai, Wei-Shi Zheng | To exploit the strong relations for egocentric interaction recognition, we introduce a dual relation modeling framework which learns to model the relations between the camera wearer and the interactor based on the individual action representations of the two persons. |
803 | MOTS: Multi-Object Tracking and Segmentation | Paul Voigtlaender, Michael Krause, Aljosa Osep, Jonathon Luiten, Berin Balachandar Gnana Sekar, Andreas Geiger, Bastian Leibe | This paper extends the popular task of multi-object tracking to multi-object tracking and segmentation (MOTS). We make our annotations, code, and models available at https://www.vision.rwth-aachen.de/page/mots. |
804 | Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking | Heng Fan, Haibin Ling | Addressing these issues, we propose a multi-stage tracking framework, Siamese Cascaded RPN (C-RPN), which consists of a sequence of RPNs cascaded from deep high-level to shallow low-level layers in a Siamese network. |
805 | PointFlowNet: Learning Representations for Rigid Motion Estimation From Point Clouds | Aseem Behl, Despoina Paschalidou, Simon Donne, Andreas Geiger | In this paper, we propose to estimate 3D motion from such unstructured point clouds using a deep neural network. |
806 | Listen to the Image | Di Hu, Dong Wang, Xuelong Li, Feiping Nie, Qi Wang | To improve the translation quality, the task performances of the blind are usually employed to evaluate different encoding schemes. In contrast to this toilsome human-based assessment, we argue that a machine model can also be developed for evaluation, and more efficiently. |
807 | Image Super-Resolution by Neural Texture Transfer | Zhifei Zhang, Zhaowen Wang, Zhe Lin, Hairong Qi | This paper aims to unleash the potential of RefSR by leveraging more texture details from Ref images with stronger robustness even when irrelevant Ref images are provided. We build a benchmark dataset for the general research of RefSR, which contains Ref images paired with LR inputs with varying levels of similarity. |
808 | Conditional Adversarial Generative Flow for Controllable Image Synthesis | Rui Liu, Yu Liu, Xinyu Gong, Xiaogang Wang, Hongsheng Li | In this paper, based on modeling a joint probabilistic density of an image and its conditions, we propose a novel flow-based generative model named conditional adversarial generative flow (CAGlow). |
809 | How to Make a Pizza: Learning a Compositional Layer-Based GAN Model | Dim P. Papadopoulos, Youssef Tamaazousti, Ferda Ofli, Ingmar Weber, Antonio Torralba | In this paper, we aim to teach a machine how to make a pizza by building a generative model that mirrors this step-by-step procedure. |
810 | TransGaGa: Geometry-Aware Unsupervised Image-To-Image Translation | Wayne Wu, Kaidi Cao, Cheng Li, Chen Qian, Chen Change Loy | In this work, we present a novel disentangle-and-translate framework to tackle the complex objects image-to-image translation task. |
811 | Depth-Attentional Features for Single-Image Rain Removal | Xiaowei Hu, Chi-Wing Fu, Lei Zhu, Pheng-Ann Heng | In this work, we first analyze the visual effects of rain subject to scene depth and formulate a rain imaging model collectively with rain streaks and fog; by then, we prepare a new dataset called RainCityscapes with rain streaks and fog on real outdoor photos. |
812 | Hyperspectral Image Reconstruction Using a Deep Spatial-Spectral Prior | Lizhi Wang, Chen Sun, Ying Fu, Min H. Kim, Hua Huang | In this paper, we present a novel hyperspectral image reconstruction algorithm that substitutes the traditional hand-crafted prior with a data-driven prior, based on an optimization-inspired network. |
813 | LiFF: Light Field Features in Scale and Depth | Donald G. Dansereau, Bernd Girod, Gordon Wetzstein | Building on spatio-angular imaging modalities offered by emerging light field cameras, we introduce a new and computationally efficient 4D light field feature detector and descriptor: LiFF. |
814 | Deep Exemplar-Based Video Colorization | Bo Zhang, Mingming He, Jing Liao, Pedro V. Sander, Lu Yuan, Amine Bermak, Dong Chen | To address this issue, we introduce a recurrent framework that unifies the semantic correspondence and color propagation steps. |
815 | On Finding Gray Pixels | Yanlin Qian, Joni-Kristian Kamarainen, Jarno Nikkanen, Jiri Matas | We propose a novel grayness index for finding gray pixels and demonstrate its effectiveness and efficiency in illumination estimation. |
816 | UnOS: Unified Unsupervised Optical-Flow and Stereo-Depth Estimation by Watching Videos | Yang Wang, Peng Wang, Zhenheng Yang, Chenxu Luo, Yi Yang, Wei Xu | In this paper, we propose UnOS, a unified system for unsupervised optical flow and stereo depth estimation using a convolutional neural network (CNN), by taking advantage of their inherent geometrical consistency based on the rigid-scene assumption. |
817 | Learning Transformation Synchronization | Xiangru Huang, Zhenxiao Liang, Xiaowei Zhou, Yao Xie, Leonidas J. Guibas, Qixing Huang | Instead of merely using the relative transformations as the input to perform transformation synchronization, we propose to use a neural network to learn the weights associated with each relative transformation. |
818 | D2-Net: A Trainable CNN for Joint Description and Detection of Local Features | Mihai Dusmanu, Ignacio Rocco, Tomas Pajdla, Marc Pollefeys, Josef Sivic, Akihiko Torii, Torsten Sattler | In this work we address the problem of finding reliable pixel-level correspondences under difficult imaging conditions. |
819 | Recurrent Neural Networks With Intra-Frame Iterations for Video Deblurring | Seungjun Nah, Sanghyun Son, Kyoung Mu Lee | In this work, we aim to improve the accuracy of recurrent models by adapting the hidden states transferred from past frames to the frame being processed so that the relations between video frames could be better used. |
820 | Learning to Extract Flawless Slow Motion From Blurry Videos | Meiguang Jin, Zhe Hu, Paolo Favaro | In this paper, we introduce the task of generating a sharp slow-motion video given a low frame rate blurry video. |
821 | Natural and Realistic Single Image Super-Resolution With Explicit Natural Manifold Discrimination | Jae Woong Soh, Gu Yong Park, Junho Jo, Nam Ik Cho | Therefore, in this paper, we present a new approach to reconstructing realistic super-resolved images with high perceptual quality, while maintaining the naturalness of the result. |
822 | RF-Net: An End-To-End Image Matching Network Based on Receptive Field | Xuelun Shen, Cheng Wang, Xin Li, Zenglei Yu, Jonathan Li, Chenglu Wen, Ming Cheng, Zijian He | This paper proposes a new end-to-end trainable matching network based on receptive field, RF-Net, to compute sparse correspondence between images. |
823 | Fast Single Image Reflection Suppression via Convex Optimization | Yang Yang, Wenye Ma, Yin Zheng, Jian-Feng Cai, Weiyu Xu | We propose a convex model to suppress the reflection from a single input image. |
824 | A Mutual Learning Method for Salient Object Detection With Intertwined Multi-Supervision | Runmin Wu, Mengyang Feng, Wenlong Guan, Dong Wang, Huchuan Lu, Errui Ding | To alleviate these issues, we propose to train saliency detection networks by exploiting the supervision from not only salient object detection, but also foreground contour detection and edge detection. |
825 | Enhanced Pix2pix Dehazing Network | Yanyun Qu, Yizi Chen, Jingying Huang, Yuan Xie | In this paper, we reduce the image dehazing problem to an image-to-image translation problem, and propose Enhanced Pix2pix Dehazing Network (EPDN), which generates a haze-free image without relying on the physical scattering model. |
826 | Assessing Personally Perceived Image Quality via Image Features and Collaborative Filtering | Jari Korhonen | In this study, we aim to predict personally perceived image quality by combining classical image feature analysis with collaborative filtering, an approach known from recommender systems. |
827 | Single Image Reflection Removal Exploiting Misaligned Training Data and Network Enhancements | Kaixuan Wei, Jiaolong Yang, Ying Fu, David Wipf, Hua Huang | In this paper, we address these issues by exploiting targeted network enhancements and the novel use of misaligned data. |
828 | Exploring Context and Visual Pattern of Relationship for Scene Graph Generation | Wenbin Wang, Ruiping Wang, Shiguang Shan, Xilin Chen | Therefore, we present our so-called Relationship Context – InterSeCtion Region (CISC) method. |
829 | Learning From Synthetic Data for Crowd Counting in the Wild | Qi Wang, Junyu Gao, Wei Lin, Yuan Yuan | Learning From Synthetic Data for Crowd Counting in the Wild. The dataset and source code are available at https://gjy3035.github.io/GCC-CL/. |
830 | A Local Block Coordinate Descent Algorithm for the CSC Model | Ev Zisselman, Jeremias Sulam, Michael Elad | In this work we propose a new and simple approach that adopts a localized strategy, based on the Block Coordinate Descent algorithm. |
831 | Not Using the Car to See the Sidewalk — Quantifying and Controlling the Effects of Context in Classification and Segmentation | Rakshith Shetty, Bernt Schiele, Mario Fritz | We propose a method to quantify the sensitivity of black-box vision models to visual context by editing images to remove selected objects and measuring the response of the target models. |
832 | Discovering Fair Representations in the Data Domain | Novi Quadrianto, Viktoriia Sharmanska, Oliver Thomas | We propose to cast this problem as data-to-data translation, i.e. learning a mapping from an input domain to a fair target domain, where a fairness definition is being enforced. |
833 | Actor-Critic Instance Segmentation | Nikita Araslanov, Constantin A. Rothkopf, Stefan Roth | In this work, we revisit the recurrent formulation of this challenging problem in the context of reinforcement learning. |
834 | Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders | Edgar Schonfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata | In this work, we take feature generation one step further and propose a model where a shared latent space of image features and class embeddings is learned by modality-specific aligned variational autoencoders. |
835 | Semantic Projection Network for Zero- and Few-Label Semantic Segmentation | Yongqin Xian, Subhabrata Choudhury, Yang He, Bernt Schiele, Zeynep Akata | In this paper we take this one step further and focus on the challenging task of zero- and few-shot learning of semantic segmentation. |
836 | GCAN: Graph Convolutional Adversarial Network for Unsupervised Domain Adaptation | Xinhong Ma, Tianzhu Zhang, Changsheng Xu | Different from existing methods, we propose an end-to-end Graph Convolutional Adversarial Network (GCAN) for unsupervised domain adaptation by jointly modeling data structure, domain label, and class label in a unified deep framework. |
837 | Seamless Scene Segmentation | Lorenzo Porzi, Samuel Rota Bulo, Aleksander Colovic, Peter Kontschieder | In this work we introduce a novel, CNN-based architecture that can be trained end-to-end to deliver seamless scene segmentation results. |
838 | Unsupervised Image Matching and Object Discovery as Optimization | Huy V. Vo, Francis Bach, Minsu Cho, Kai Han, Yann LeCun, Patrick Perez, Jean Ponce | We focus here on the unsupervised discovery and matching of object categories among images in a collection, following the work of Cho et al. [12]. |
839 | Wide-Area Crowd Counting via Ground-Plane Density Maps and Multi-View Fusion CNNs | Qi Zhang, Antoni B. Chan | In this paper, we propose a deep neural network framework for multi-view crowd counting, which fuses information from multiple camera views to predict a scene-level density map on the ground-plane of the 3D world. |
840 | Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions | Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara | In this paper, we introduce a novel framework for image captioning which can generate diverse descriptions by allowing both grounding and controllability. |
841 | Towards VQA Models That Can Read | Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach | Studies have shown that a dominant class of questions asked by visually impaired users on images of their surroundings involves reading text in the image. But today’s VQA models cannot read! Our paper takes a first step towards addressing this problem. First, we introduce a new “TextVQA” dataset to facilitate progress on this important problem. |
842 | Object-Aware Aggregation With Bidirectional Temporal Graph for Video Captioning | Junchao Zhang, Yuxin Peng | In this paper, we propose a new video captioning approach based on object-aware aggregation with bidirectional temporal graph (OA-BTG), which captures detailed temporal dynamics for salient objects in video, and learns discriminative spatio-temporal representations by performing object-aware local feature aggregation on detected object regions. |
843 | Progressive Attention Memory Network for Movie Story Question Answering | Junyeong Kim, Minuk Ma, Kyungsu Kim, Sungjin Kim, Chang D. Yoo | This paper proposes the progressive attention memory network (PAMN) for movie story question answering (QA). |
844 | Memory-Attended Recurrent Network for Video Captioning | Wenjie Pei, Jiyuan Zhang, Xiangrong Wang, Lei Ke, Xiaoyong Shen, Yu-Wing Tai | To tackle this limitation, we propose the Memory-Attended Recurrent Network (MARN) for video captioning, in which a memory structure is designed to explore the full-spectrum correspondence between a word and its various similar visual contexts across videos in training data. |
845 | Visual Query Answering by Entity-Attribute Graph Matching and Reasoning | Peixi Xiong, Huayi Zhan, Xin Wang, Baivab Sinha, Ying Wu | This paper proposes a novel method to address the VQA problem. We also create a dataset of soccer matches (Soccer-VQA) with rich annotations. |
846 | Look Back and Predict Forward in Image Captioning | Yu Qin, Jiajun Du, Yonghua Zhang, Hongtao Lu | We propose the Look Back (LB) method to embed visual information from the past and the Predict Forward (PF) approach to look into the future. |
847 | Explainable and Explicit Visual Reasoning Over Scene Graphs | Jiaxin Shi, Hanwang Zhang, Juanzi Li | We aim to dismantle the prevalent black-box neural architectures used in complex visual reasoning tasks, into the proposed eXplainable and eXplicit Neural Modules (XNMs), which advance beyond existing neural module networks towards using scene graphs — objects as nodes and the pairwise relationships as edges — for explainable and explicit reasoning with structured knowledge. |
848 | Transfer Learning via Unsupervised Task Discovery for Visual Question Answering | Hyeonwoo Noh, Taehoon Kim, Jonghwan Mun, Bohyung Han | We tackle this problem in two steps: 1) learning a task conditional visual classifier, which is capable of solving diverse question-specific visual recognition tasks, based on unsupervised task discovery and 2) transferring the task conditional visual classifier to visual question answering models. |
849 | Intention Oriented Image Captions With Guiding Objects | Yue Zheng, Yali Li, Shengjin Wang | In this paper, we propose a novel approach for generating image captions with guiding objects (CGO). With CGO, we can extend the ability of description to the objects being neglected in image caption labels and provide a set of more comprehensive and diverse descriptions for an image. |
850 | Uncertainty Guided Multi-Scale Residual Learning-Using a Cycle Spinning CNN for Single Image De-Raining | Rajeev Yasarla, Vishal M. Patel | The proposed Uncertainty guided Multi-scale Residual Learning (UMRL) network attempts to address this issue by learning the rain content at different scales and using them to estimate the final de-rained output. |
851 | Toward Realistic Image Compositing With Adversarial Learning | Bor-Chun Chen, Andrew Kae | In this work we propose a generative adversarial network (GAN) architecture for automatic image compositing. |
852 | Cross-Classification Clustering: An Efficient Multi-Object Tracking Technique for 3-D Instance Segmentation in Connectomics | Yaron Meirovitch, Lu Mi, Hayk Saribekyan, Alexander Matveev, David Rolnick, Nir Shavit | Here we introduce cross-classification clustering (3C), a technique that simultaneously tracks complex, interrelated objects in an image stack. |
853 | Deep ChArUco: Dark ChArUco Marker Pose Estimation | Danying Hu, Daniel DeTone, Tomasz Malisiewicz | We present Deep ChArUco, a real-time pose estimation system which combines two custom deep networks, ChArUcoNet and RefineNet, with the Perspective-n-Point (PnP) algorithm to estimate the marker’s 6DoF pose. |
854 | Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving | Yan Wang, Wei-Lun Chao, Divyansh Garg, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger | Taking the inner workings of convolutional neural networks into consideration, we propose to convert image-based depth maps to pseudo-LiDAR representations — essentially mimicking the LiDAR signal. |
855 | Rules of the Road: Predicting Driving Behavior With a Convolutional Model of Semantic Interactions | Joey Hong, Benjamin Sapp, James Philbin | We present a unified representation which encodes such high-level semantic information in a spatial grid, allowing the use of deep convolutional models to fuse complex scene context. We introduce a novel dataset providing industry-grade rich perception and semantic inputs, and empirically show we can effectively learn fundamentals of driving behavior. |
856 | Metric Learning for Image Registration | Marc Niethammer, Roland Kwitt, Francois-Xavier Vialard | Instead of learning the entire registration approach, we learn a spatially-adaptive regularizer within a registration model. |
857 | LO-Net: Deep Real-Time Lidar Odometry | Qing Li, Shaoyang Chen, Cheng Wang, Xin Li, Chenglu Wen, Ming Cheng, Jonathan Li | We present a novel deep convolutional network pipeline, LO-Net, for real-time lidar odometry estimation. |
858 | TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions | Rohan Chandra, Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha | We present a new algorithm for predicting the near-term trajectories of road agents in dense traffic videos. We evaluate the performance of our prediction algorithm, TraPHic, on the standard datasets and also introduce a new dense, heterogeneous traffic dataset corresponding to urban Asian videos and agent trajectories. |
859 | World From Blur | Jiayan Qiu, Xinchao Wang, Stephen J. Maybank, Dacheng Tao | We show in this paper that a 3D scene can be revealed from a single motion-blurred image. |
860 | Topology Reconstruction of Tree-Like Structure in Images via Structural Similarity Measure and Dominant Set Clustering | Jianyang Xie, Yitian Zhao, Yonghuai Liu, Pan Su, Yifan Zhao, Jun Cheng, Yalin Zheng, Jiang Liu | In this paper, we propose a novel curvilinear structural similarity measure to guide a dominant-set clustering approach to address this indispensable issue. |
861 | Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training | Feng Zheng, Cheng Deng, Xing Sun, Xinyang Jiang, Xiaowei Guo, Zongqiao Yu, Feiyue Huang, Rongrong Ji | In this paper, we propose a novel coarse-to-fine pyramid model to relax the need of bounding boxes, which not only incorporates local and global information, but also integrates the gradual cues between them. |
862 | Holistic and Comprehensive Annotation of Clinically Significant Findings on Diverse CT Images: Learning From Radiology Reports and Label Ontology | Ke Yan, Yifan Peng, Veit Sandfort, Mohammadhadi Bagheri, Zhiyong Lu, Ronald M. Summers | In this paper, we study the lesion description or annotation problem. |
863 | Robust Histopathology Image Analysis: To Label or to Synthesize? | Le Hou, Ayush Agarwal, Dimitris Samaras, Tahsin M. Kurc, Rajarsi R. Gupta, Joel H. Saltz | We propose an unsupervised approach for histopathology image segmentation that synthesizes heterogeneous sets of training image patches, of every tissue type. |
864 | Data Augmentation Using Learned Transformations for One-Shot Medical Image Segmentation | Amy Zhao, Guha Balakrishnan, Fredo Durand, John V. Guttag, Adrian V. Dalca | We present an automated data augmentation method for synthesizing labeled medical images. |
865 | Shifting More Attention to Video Salient Object Detection | Deng-Ping Fan, Wenguan Wang, Ming-Ming Cheng, Jianbing Shen | This is the first work that explicitly emphasizes the challenge of saliency shift, i.e., the video salient object(s) may dynamically change. To address this issue, we elaborately collected a visual-attention-consistent Densely Annotated VSOD (DAVSOD) dataset, which contains 226 videos with 23,938 frames that cover diverse realistic scenes, objects, instances and motions. |
866 | Neural Task Graphs: Generalizing to Unseen Tasks From a Single Video Demonstration | De-An Huang, Suraj Nair, Danfei Xu, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles | Our goal is to generate a policy to complete an unseen task given just a single video demonstration of the task in a given domain. |
867 | Beyond Tracking: Selecting Memory and Refining Poses for Deep Visual Odometry | Fei Xue, Xin Wang, Shunkai Li, Qiuyuan Wang, Junqiu Wang, Hongbin Zha | In contrast, we present a VO framework by incorporating two additional components called Memory and Refining. |
868 | Image Generation From Layout | Bo Zhao, Lili Meng, Weidong Yin, Leonid Sigal | To address these challenges, we propose a novel approach for layout-based image generation; we call it Layout2Im. |
869 | Multimodal Explanations by Predicting Counterfactuality in Videos | Atsushi Kanehira, Kentaro Takemoto, Sho Inayoshi, Tatsuya Harada | Our goal is not only to classify a video into a specific category, but also to provide explanations on why it is not categorized to a specific class with combinations of visual-linguistic information. |
870 | Learning to Explain With Complemental Examples | Atsushi Kanehira, Tatsuya Harada | We propose a novel framework to generate complemental explanations, on which the joint distribution of the variables to explain, and those to be explained is parameterized by three different neural networks: predictor, linguistic explainer, and example selector. |
871 | HAQ: Hardware-Aware Automated Quantization With Mixed Precision | Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han | In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework which leverages the reinforcement learning to automatically determine the quantization policy, and we take the hardware accelerator’s feedback in the design loop. |
872 | Content Authentication for Neural Imaging Pipelines: End-To-End Optimization of Photo Provenance in Complex Distribution Channels | Pawel Korus, Nasir Memon | This paper explores end-to-end optimization of the entire image acquisition and distribution workflow to facilitate reliable forensic analysis at the end of the distribution channel. |
873 | Inverse Procedural Modeling of Knitwear | Elena Trunz, Sebastian Merzbach, Jonathan Klein, Thomas Schulze, Michael Weinmann, Reinhard Klein | While recent approaches are focused on woven cloth, we present a novel practical approach for the inference of more complex knitwear structures as well as the respective knitting instructions from only a single image without attached annotations. |
874 | Estimating 3D Motion and Forces of Person-Object Interactions From Monocular Video | Zongmian Li, Jiri Sedlar, Justin Carpentier, Ivan Laptev, Nicolas Mansard, Josef Sivic | In this paper, we introduce a method to automatically reconstruct the 3D motion of a person interacting with an object from a single RGB video. |
875 | DeepMapping: Unsupervised Map Estimation From Multiple Point Clouds | Li Ding, Chen Feng | We propose DeepMapping, a novel registration framework using deep neural networks (DNNs) as auxiliary functions to align multiple point clouds from scratch to a globally consistent frame. |
876 | End-To-End Interpretable Neural Motion Planner | Wenyuan Zeng, Wenjie Luo, Simon Suo, Abbas Sadat, Bin Yang, Sergio Casas, Raquel Urtasun | In this paper, we propose a neural motion planner for learning to drive autonomously in complex urban scenarios that include traffic-light handling, yielding, and interactions with multiple road-users. |
877 | Divergence Triangle for Joint Training of Generator Model, Energy-Based Model, and Inferential Model | Tian Han, Erik Nijkamp, Xiaolin Fang, Mitch Hill, Song-Chun Zhu, Ying Nian Wu | This paper proposes the divergence triangle as a framework for joint training of a generator model, energy-based model and inference model. |
878 | Image Deformation Meta-Networks for One-Shot Learning | Zitian Chen, Yanwei Fu, Yu-Xiong Wang, Lin Ma, Wei Liu, Martial Hebert | Our key insight is that, while the deformed images may not be visually realistic, they still maintain critical semantic information and contribute significantly to formulating classifier decision boundaries. |
879 | Online High Rank Matrix Completion | Jicong Fan, Madeleine Udell | In this paper, we develop a new model for high rank matrix completion (HRMC), together with batch and online methods to fit the model and out-of-sample extension to complete new data. |
880 | Multispectral Imaging for Fine-Grained Recognition of Powders on Complex Backgrounds | Tiancheng Zhi, Bernardo R. Pires, Martial Hebert, Srinivasa G. Narasimhan | We present a method to select discriminative spectral bands to significantly reduce acquisition time while improving recognition accuracy. To address these challenges, we present the first comprehensive dataset and approach for powder recognition using multi-spectral imaging. |
881 | ContactDB: Analyzing and Predicting Grasp Contact via Thermal Imaging | Samarth Brahmbhatt, Cusuh Ham, Charles C. Kemp, James Hays | We present ContactDB, a novel dataset of contact maps for household objects that captures the rich hand-object contact that occurs during grasping, enabled by use of a thermal camera. |
882 | Robust Subspace Clustering With Independent and Piecewise Identically Distributed Noise Modeling | Yuanman Li, Jiantao Zhou, Xianwei Zheng, Jinyu Tian, Yuan Yan Tang | In this work, we propose an independent and piecewise identically distributed (i.p.i.d.) noise model, where the i.i.d. property only holds locally. |
883 | What Correspondences Reveal About Unknown Camera and Motion Models? | Thomas Probst, Ajad Chhatkuli, Danda Pani Paudel, Luc Van Gool | In this paper, we tackle this problem in two steps. |
884 | Self-Calibrating Deep Photometric Stereo Networks | Guanying Chen, Kai Han, Boxin Shi, Yasuyuki Matsushita, Kwan-Yee K. Wong | This paper proposes an uncalibrated photometric stereo method for non-Lambertian scenes based on deep learning. |
885 | Argoverse: 3D Tracking and Forecasting With Rich Maps | Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, James Hays | We present Argoverse, a dataset designed to support autonomous vehicle perception tasks including 3D tracking and motion forecasting. |
886 | Side Window Filtering | Hui Yin, Yuanhao Gong, Guoping Qiu | Based on this insight, we propose a new Side Window Filtering (SWF) technique which aligns the window’s side or corner with the pixel being processed. |
887 | Defense Against Adversarial Images Using Web-Scale Nearest-Neighbor Search | Abhimanyu Dubey, Laurens van der Maaten, Zeki Yalniz, Yixuan Li, Dhruv Mahajan | In this work, we hypothesize that adversarial perturbations move the image away from the image manifold in the sense that there exists no physical process that could have produced the adversarial image. |
888 | Incremental Object Learning From Contiguous Views | Stefan Stojanov, Samarth Mishra, Ngoc Anh Thai, Nikhil Dhanda, Ahmad Humayun, Chen Yu, Linda B. Smith, James M. Rehg | In this work, we present CRIB (Continual Recognition Inspired by Babies), a synthetic incremental object learning environment that can produce data that models visual imagery produced by object exploration in early infancy. |
889 | IP102: A Large-Scale Benchmark Dataset for Insect Pest Recognition | Xiaoping Wu, Chi Zhan, Yu-Kun Lai, Ming-Ming Cheng, Jufeng Yang | In this paper, we collect a large-scale dataset named IP102 for insect pest recognition. |
890 | CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification | Zheng Tang, Milind Naphade, Ming-Yu Liu, Xiaodong Yang, Stan Birchfield, Shuo Wang, Ratnesh Kumar, David Anastasiu, Jenq-Neng Hwang | This work introduces CityFlow, a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km. |
891 | Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence | Amir Zadeh, Michael Chan, Paul Pu Liang, Edmund Tong, Louis-Philippe Morency | In this paper, we introduce Social-IQ, an unconstrained benchmark specifically designed to train and evaluate socially intelligent technologies. |
892 | UPSNet: A Unified Panoptic Segmentation Network | Yuwen Xiong, Renjie Liao, Hengshuang Zhao, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun | In this paper, we propose a unified panoptic segmentation network (UPSNet) for tackling the newly proposed panoptic segmentation task. |
893 | JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds With Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields | Quang-Hieu Pham, Thanh Nguyen, Binh-Son Hua, Gemma Roig, Sai-Kit Yeung | In this work, we jointly address the problems of semantic and instance segmentation of 3D point clouds. |
894 | Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth | Davy Neven, Bert De Brabandere, Marc Proesmans, Luc Van Gool | In this work we propose a new clustering loss function for proposal-free instance segmentation. |
895 | DeepCO3: Deep Instance Co-Segmentation by Co-Peak Search and Co-Saliency Detection | Kuang-Jui Hsu, Yen-Yu Lin, Yung-Yu Chuang | In this paper, we address a new task called instance co-segmentation. |
896 | Improving Semantic Segmentation via Video Propagation and Label Relaxation | Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro | In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks. |
897 | Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video | Samvit Jain, Xin Wang, Joseph E. Gonzalez | We present Accel, a novel semantic video segmentation system that achieves high accuracy at low inference cost by combining the predictions of two network branches: (1) a reference branch that extracts high-detail features on a reference keyframe, and warps these features forward using frame-to-frame optical flow estimates, and (2) an update branch that computes features of adjustable quality on the current frame, performing a temporal update at each video frame. |
898 | Shape2Motion: Joint Analysis of Motion Parts and Attributes From 3D Shapes | Xiaogang Wang, Bin Zhou, Yahao Shi, Xiaowu Chen, Qinping Zhao, Kai Xu | For the task of mobility analysis of 3D shapes, we propose joint analysis for simultaneous motion part segmentation and motion attribute estimation, taking a single 3D model as input. |
899 | Semantic Correlation Promoted Shape-Variant Context for Segmentation | Henghui Ding, Xudong Jiang, Bing Shuai, Ai Qun Liu, Gang Wang | In this work, we propose to generate a scale- and shape-variant semantic mask for each pixel to confine its contextual region. |
900 | Relation-Shape Convolutional Neural Network for Point Cloud Analysis | Yongcheng Liu, Bin Fan, Shiming Xiang, Chunhong Pan | In this paper, we propose RS-CNN, namely, Relation-Shape Convolutional Neural Network, which extends regular grid CNN to irregular configuration for point cloud analysis. |
901 | Enhancing Diversity of Defocus Blur Detectors via Cross-Ensemble Network | Wenda Zhao, Bowen Zheng, Qiuhua Lin, Huchuan Lu | In this paper, we propose a novel learning strategy that breaks the DBD problem into multiple smaller defocus blur detectors so that their estimation errors can cancel each other out. |
902 | BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames | Brent A. Griffin, Jason J. Corso | This paper addresses the problem of learning to suggest the single best frame across the video for user annotation; this is, in fact, never the first frame of the video. |
903 | Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-High Resolution Images | Wuyang Chen, Ziyu Jiang, Zhangyang Wang, Kexin Cui, Xiaoning Qian | We propose collaborative Global-Local Networks (GLNet) to effectively preserve both global and local information in a highly memory-efficient manner. |
904 | Efficient Parameter-Free Clustering Using First Neighbor Relations | Saquib Sarfraz, Vivek Sharma, Rainer Stiefelhagen | We present a new clustering method in the form of a single clustering equation that is able to directly discover groupings in the data. |
905 | Learning Personalized Modular Network Guided by Structured Knowledge | Xiaodan Liang | In this paper, we treat the structured commonsense knowledge (e.g. concept hierarchy) as the guidance of customizing more powerful and explainable network structures for distinct inputs, leading to dynamic and individualized inference paths. |
906 | A Generative Appearance Model for End-To-End Video Object Segmentation | Joakim Johnander, Martin Danelljan, Emil Brissman, Fahad Shahbaz Khan, Michael Felsberg | To address these issues, we propose a network architecture that learns a powerful representation of the target and background appearance in a single forward pass. |
907 | A Flexible Convolutional Solver for Fast Style Transfers | Gilles Puy, Patrick Perez | We propose a new flexible deep convolutional neural network (convnet) to perform fast neural style transfers. |
908 | Cross Domain Model Compression by Structurally Weight Sharing | Shangqian Gao, Cheng Deng, Heng Huang | In this paper, thus, we propose a new robust cross domain model compression method. |
909 | TraVeLGAN: Image-To-Image Translation by Transformation Vector Learning | Matthew Amodio, Smita Krishnaswamy | For this purpose, we introduce a novel GAN based on preserving intra-domain vector transformations in a latent space learned by a siamese network. |
910 | Deep Robust Subjective Visual Property Prediction in Crowdsourcing | Qianqian Xu, Zhiyong Yang, Yangbangyan Jiang, Xiaochun Cao, Qingming Huang, Yuan Yao | In this paper, we construct a deep SVP prediction model which not only leads to better detection of annotation outliers but also enables learning with extremely sparse annotations. |
911 | Transferable AutoML by Model Sharing Over Grouped Datasets | Chao Xue, Junchi Yan, Rong Yan, Stephen M. Chu, Yonggang Hu, Yonghua Lin | This paper presents a transferable AutoML approach that leverages previously trained models to speed up the search process for new tasks and datasets. |
912 | Learning Not to Learn: Training Deep Neural Networks With Biased Data | Byungju Kim, Hyunwoo Kim, Kyungsu Kim, Sungjin Kim, Junmo Kim | We propose a novel regularization algorithm for training deep neural networks when the training data is severely biased. |
913 | IRLAS: Inverse Reinforcement Learning for Architecture Search | Minghao Guo, Zhao Zhong, Wei Wu, Dahua Lin, Junjie Yan | In this paper, we propose an inverse reinforcement learning method for architecture search (IRLAS), which trains an agent to learn to search for network structures that are topologically inspired by human-designed networks. |
914 | Learning for Single-Shot Confidence Calibration in Deep Neural Networks Through Stochastic Inferences | Seonguk Seo, Paul Hongsuck Seo, Bohyung Han | We propose a generic framework to calibrate accuracy and confidence of a prediction in deep neural networks through stochastic inferences. |
915 | Attention-Based Adaptive Selection of Operations for Image Restoration in the Presence of Unknown Combined Distortions | Masanori Suganuma, Xing Liu, Takayuki Okatani | For this purpose, we propose a simple yet effective layer architecture of neural networks. |
916 | Fully Learnable Group Convolution for Acceleration of Deep Neural Networks | Xijun Wang, Meina Kan, Shiguang Shan, Xilin Chen | To reduce the high computational and memory cost, in this work, we propose a fully learnable group convolution module (FLGC for short) which is quite efficient and can be embedded into any deep neural networks for acceleration. |
917 | EIGEN: Ecologically-Inspired GENetic Approach for Neural Network Structure Searching From Scratch | Jian Ren, Zhe Li, Jianchao Yang, Ning Xu, Tianbao Yang, David J. Foran | In this paper, we propose an Ecologically-Inspired GENetic (EIGEN) approach that uses the concepts of succession, extinction, mimicry, and gene duplication to search neural network structures from scratch, starting from a poorly initialized simple network with few constraints enforced during the evolution, as we assume no prior knowledge about the task domain. |
918 | Deep Incremental Hashing Network for Efficient Image Retrieval | Dayan Wu, Qi Dai, Jing Liu, Bo Li, Weiping Wang | In this paper, we propose a novel deep hashing framework, called Deep Incremental Hashing Network (DIHN), for learning hash codes in an incremental manner. |
919 | Robustness via Curvature Regularization, and Vice Versa | Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Jonathan Uesato, Pascal Frossard | In this paper, we investigate the effect of adversarial training on the geometry of the classification landscape and decision boundaries. |
920 | SparseFool: A Few Pixels Make a Big Difference | Apostolos Modas, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard | In this paper, we exploit the low mean curvature of the decision boundary, and propose SparseFool, a geometry inspired sparse attack that controls the sparsity of the perturbations. |
921 | Interpretable and Fine-Grained Visual Explanations for Convolutional Neural Networks | Jorg Wagner, Jan Mathias Kohler, Tobias Gindele, Leon Hetzel, Jakob Thaddaus Wiedemer, Sven Behnke | In this work, we propose a post-hoc, optimization-based visual explanation method, which highlights the evidence in the input image for a specific prediction. |
922 | Structured Pruning of Neural Networks With Budget-Aware Regularization | Carl Lemaire, Andrew Achkar, Pierre-Marc Jodoin | To overcome this, we introduce a budgeted regularized pruning framework for deep CNNs. |
923 | MBS: Macroblock Scaling for CNN Model Reduction | Yu-Hsun Lin, Chun-Nan Chou, Edward Y. Chang | In this paper we propose the macroblock scaling (MBS) algorithm, which can be applied to various CNN architectures to reduce their model size. |
924 | Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells | Vladimir Nekrasov, Hao Chen, Chunhua Shen, Ian Reid | In this work, we are particularly interested in searching for high-performance compact segmentation architectures, able to run in real-time using limited resources. |
925 | Generating 3D Adversarial Point Clouds | Chong Xiang, Charles R. Qi, Bo Li | In this work, we propose several novel algorithms to craft adversarial point clouds against PointNet, a widely used deep neural network for point cloud processing. |
926 | Partial Order Pruning: For Best Speed/Accuracy Trade-Off in Neural Architecture Search | Xin Li, Yiming Zhou, Zheng Pan, Jiashi Feng | In this work, we propose an algorithm that can offer better speed/accuracy trade-off of searched networks, which is termed “Partial Order Pruning”. |
927 | Memory in Memory: A Predictive Neural Network for Learning Higher-Order Non-Stationarity From Spatiotemporal Dynamics | Yunbo Wang, Jianjin Zhang, Hongyu Zhu, Mingsheng Long, Jianmin Wang, Philip S. Yu | We propose the Memory In Memory (MIM) networks and corresponding recurrent blocks for this purpose. |
928 | Variational Information Distillation for Knowledge Transfer | Sungsoo Ahn, Shell Xu Hu, Andreas Damianou, Neil D. Lawrence, Zhenwen Dai | We propose an information-theoretic framework for knowledge transfer which formulates knowledge transfer as maximizing the mutual information between the teacher and the student networks. |
929 | You Look Twice: GaterNet for Dynamic Filter Selection in CNNs | Zhourong Chen, Yang Li, Samy Bengio, Si Si | In this paper, we investigate input-dependent dynamic filter selection in deep convolutional neural networks (CNNs). |
930 | SpherePHD: Applying CNNs on a Spherical PolyHeDron Representation of 360° Images | Yeonkun Lee, Jaeseok Jeong, Jongseob Yun, Wonjune Cho, Kuk-Jin Yoon | This paper presents a novel method to resolve such problems of applying CNNs to omni-directional images. |
931 | ESPNetv2: A Light-Weight, Power Efficient, and General Purpose Convolutional Neural Network | Sachin Mehta, Mohammad Rastegari, Linda Shapiro, Hannaneh Hajishirzi | We introduce a light-weight, power efficient, and general purpose convolutional neural network, ESPNetv2, for modeling visual and sequential data. |
932 | Assisted Excitation of Activations: A Learning Technique to Improve Object Detectors | Mohammad Mahdi Derakhshani, Saeed Masoudnia, Amir Hossein Shaker, Omid Mersa, Mohammad Amin Sadeghi, Mohammad Rastegari, Babak N. Araabi | We present a simple yet effective learning technique that significantly improves mAP of YOLO object detectors without compromising their speed. |
933 | Exploiting Edge Features for Graph Neural Networks | Liyu Gong, Qiang Cheng | In this paper, we build a new framework for a family of new graph neural network models that can more fully exploit edge features, including those of undirected or multi-dimensional edges. |
934 | Propagation Mechanism for Deep and Wide Neural Networks | Dejiang Xu, Mong Li Lee, Wynne Hsu | In this paper, we propose a new propagation mechanism called channel-wise addition (cAdd) to deal with the vanishing gradients problem without sacrificing the complexity of the learned features. |
935 | Catastrophic Child’s Play: Easy to Perform, Hard to Defend Adversarial Attacks | Chih-Hui Ho, Brandon Leung, Erik Sandstrom, Yen Chang, Nuno Vasconcelos | A framework for the study of such attacks is proposed, using real world object manipulations. |
936 | Embedding Complementary Deep Networks for Image Classification | Qiuyu Chen, Wei Zhang, Jun Yu, Jianping Fan | In this paper, a deep embedding algorithm is developed to achieve higher accuracy rates on large-scale image classification. |
937 | Deep Multimodal Clustering for Unsupervised Audiovisual Learning | Di Hu, Feiping Nie, Xuelong Li | To address this problem, we propose to adequately excavate audio and visual components and perform elaborate correspondence learning among them. |
938 | Dense Classification and Implanting for Few-Shot Learning | Yann Lifchitz, Yannis Avrithis, Sylvaine Picard, Andrei Bursuc | We propose two simple and effective solutions: (i) dense classification over feature maps, which for the first time studies local activations in the domain of few-shot learning, and (ii) implanting, that is, attaching new neurons to a previously trained network to learn new, task-specific features. |
939 | Class-Balanced Loss Based on Effective Number of Samples | Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge Belongie | In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. |
940 | Discovering Visual Patterns in Art Collections With Spatially-Consistent Feature Learning | Xi Shen, Alexei A. Efros, Mathieu Aubry | Our goal in this paper is to discover near duplicate patterns in large collections of artworks. |
941 | Min-Max Statistical Alignment for Transfer Learning | Samitha Herath, Mehrtash Harandi, Basura Fernando, Richard Nock | We question the capability of this school of thought and propose to minimize the maximum disparity between domains. |
942 | Spatial-Aware Graph Relation Network for Large-Scale Object Detection | Hang Xu, Chenhan Jiang, Xiaodan Liang, Zhenguo Li | In this work, we introduce a Spatial-aware Graph Relation Network (SGRN) to adaptively discover and incorporate key semantic and spatial relationships for reasoning over each object. |
943 | Deformable ConvNets V2: More Deformable, Better Results | Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai | To address this problem, we present a reformulation of Deformable ConvNets that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training. |
944 | Interaction-And-Aggregation Network for Person Re-Identification | Ruibing Hou, Bingpeng Ma, Hong Chang, Xinqian Gu, Shiguang Shan, Xilin Chen | In this paper, we propose a novel network structure, Interaction-and-Aggregation (IA), to enhance the feature representation capability of CNNs. |
945 | Rare Event Detection Using Disentangled Representation Learning | Ryuhei Hamaguchi, Ken Sakurada, Ryosuke Nakamura | This paper presents a novel method for rare event detection from an image pair with class-imbalanced datasets. |
946 | Shape Robust Text Detection With Progressive Scale Expansion Network | Wenhai Wang, Enze Xie, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao | To address these two challenges, in this paper, we propose a novel Progressive Scale Expansion Network (PSENet), which can precisely detect text instances with arbitrary shapes. |
947 | Dual Encoding for Zero-Example Video Retrieval | Jianfeng Dong, Xirong Li, Chaoxi Xu, Shouling Ji, Yuan He, Gang Yang, Xun Wang | In contrast, this paper takes a concept-free approach, proposing a dual deep encoding network that encodes videos and queries into powerful dense representations of their own. |
948 | MaxpoolNMS: Getting Rid of NMS Bottlenecks in Two-Stage Object Detectors | Lile Cai, Bin Zhao, Zhe Wang, Jie Lin, Chuan Sheng Foo, Mohamed Sabry Aly, Vijay Chandrasekhar | In this paper, we introduce MaxpoolNMS, a parallelizable alternative to the NMS algorithm, which is based on max-pooling classification score maps. |
949 | Character Region Awareness for Text Detection | Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee | In this paper, we propose a new scene text detection method to effectively detect text area by exploring each character and affinity between characters. |
950 | Effective Aesthetics Prediction With Multi-Level Spatially Pooled Features | Vlad Hosu, Bastian Goldlucke, Dietmar Saupe | We propose an effective deep learning approach to aesthetics quality assessment that relies on a new type of pre-trained features, and apply it to the AVA data set, currently the largest aesthetics database. |
951 | Attentive Region Embedding Network for Zero-Shot Learning | Guo-Sen Xie, Li Liu, Xiaobo Jin, Fan Zhu, Zheng Zhang, Jie Qin, Yazhou Yao, Ling Shao | In this paper, to discover (semantic) regions, we propose the attentive region embedding network (AREN), which is tailored to advance the ZSL task. |
952 | Explicit Spatial Encoding for Deep Local Descriptors | Arun Mukundan, Giorgos Tolias, Ondrej Chum | We propose a kernelized deep local-patch descriptor based on efficient match kernels of neural network activations. |
953 | Panoptic Segmentation | Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollar | The aim of our work is to revive the interest of the community in a more unified view of image segmentation. |
954 | You Reap What You Sow: Using Videos to Generate High Precision Object Proposals for Weakly-Supervised Object Detection | Krishna Kumar Singh, Yong Jae Lee | We propose a novel way of using videos to obtain high precision object proposals for weakly-supervised object detection. |
955 | Explore-Exploit Graph Traversal for Image Retrieval | Cheng Chang, Guangwei Yu, Chundi Liu, Maksims Volkovs | We propose a novel graph-based approach for image retrieval. |
956 | Dissimilarity Coefficient Based Weakly Supervised Object Detection | Aditya Arun, C.V. Jawahar, M. Pawan Kumar | We consider the problem of weakly supervised object detection, where the training samples are annotated using only image-level labels that indicate the presence or absence of an object category. |
957 | Kernel Transformer Networks for Compact Spherical Convolution | Yu-Chuan Su, Kristen Grauman | We present the Kernel Transformer Network (KTN) to efficiently transfer convolution kernels from perspective images to the equirectangular projection of 360° images. |
958 | Object Detection With Location-Aware Deformable Convolution and Backward Attention Filtering | Chen Zhang, Joohee Kim | In this paper, we propose a location-aware deformable convolution and a backward attention filtering to improve the detection performance. |
959 | Variational Prototyping-Encoder: One-Shot Learning With Prototypical Images | Junsik Kim, Tae-Hyun Oh, Seokju Lee, Fei Pan, In So Kweon | We propose a new approach called variational prototyping-encoder (VPE) that learns the image translation task from real-world input images to their corresponding prototypical images as a meta-task. |
960 | Unsupervised Domain Adaptation Using Feature-Whitening and Consensus Loss | Subhankar Roy, Aliaksandr Siarohin, Enver Sangineto, Samuel Rota Bulo, Nicu Sebe, Elisa Ricci | In this work we introduce a novel deep learning framework which unifies different paradigms in unsupervised domain adaptation. |
961 | FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation | Paul Voigtlaender, Yuning Chai, Florian Schroff, Hartwig Adam, Bastian Leibe, Liang-Chieh Chen | In this work, we propose FEELVOS as a simple and fast method which does not rely on fine-tuning. |
962 | PartNet: A Recursive Part Decomposition Network for Fine-Grained and Hierarchical Shape Segmentation | Fenggen Yu, Kun Liu, Yan Zhang, Chenyang Zhu, Kai Xu | PartNet achieves state-of-the-art performance, for both fine-grained and semantic segmentation, on the public benchmark and on a new benchmark of fine-grained segmentation proposed in this work. |
963 | Learning Multi-Class Segmentations From Single-Class Datasets | Konstantin Dmitriev, Arie E. Kaufman | While existing segmentation research in such domains use private multi-class datasets or focus on single-class segmentations, we propose a unified highly efficient framework for robust simultaneous learning of multi-class segmentations by combining single-class datasets and utilizing a novel way of conditioning a convolutional network for the purpose of segmentation. |
964 | Convolutional Recurrent Network for Road Boundary Extraction | Justin Liang, Namdar Homayounfar, Wei-Chiu Ma, Shenlong Wang, Raquel Urtasun | In this paper, we tackle the problem of drivable road boundary extraction from LiDAR and camera imagery. |
965 | DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation | Hanchao Li, Pengfei Xiong, Haoqiang Fan, Jian Sun | This paper introduces an extremely efficient CNN architecture named DFANet for semantic segmentation under resource constraints. |
966 | A Cross-Season Correspondence Dataset for Robust Semantic Segmentation | Mans Larsson, Erik Stenborg, Lars Hammarstrand, Marc Pollefeys, Torsten Sattler, Fredrik Kahl | In this paper, we present a method to utilize 2D-2D point matches between images taken during different image conditions to train a convolutional neural network for semantic segmentation. |
967 | ManTra-Net: Manipulation Tracing Network for Detection and Localization of Image Forgeries With Anomalous Features | Yue Wu, Wael AbdAlmageed, Premkumar Natarajan | To fight against real-life image forgery, which commonly involves different types and combined manipulations, we propose a unified deep neural architecture called ManTra-Net. |
968 | On Zero-Shot Recognition of Generic Objects | Tristan Hascoet, Yasuo Ariki, Tetsuya Takiguchi | In this paper, we argue that the main reason behind this apparent lack of progress is the poor quality of this benchmark. |
969 | Explicit Bias Discovery in Visual Question Answering Models | Varun Manjunatha, Nirat Saini, Larry S. Davis | It is of interest to the community to explicitly discover such biases, both for understanding the behavior of such models, and towards debugging them. Our work addresses this problem. |
970 | REPAIR: Removing Representation Bias by Dataset Resampling | Yi Li, Nuno Vasconcelos | The tools used for characterizing representation bias, and the proposed dataset REPAIR algorithm, are available at https://github.com/JerryYLi/Dataset-REPAIR/. |
971 | Label Efficient Semi-Supervised Learning via Graph Filtering | Qimai Li, Xiao-Ming Wu, Han Liu, Xiaotong Zhang, Zhichao Guan | In this paper, we address label efficient semi-supervised learning from a graph filtering perspective. |
972 | MVTec AD — A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection | Paul Bergmann, Michael Fauser, David Sattlegger, Carsten Steger | We introduce the MVTec Anomaly Detection (MVTec AD) dataset containing 5354 high-resolution color images of different object and texture categories. |
973 | ABC: A Big CAD Model Dataset for Geometric Deep Learning | Sebastian Koch, Albert Matveev, Zhongshi Jiang, Francis Williams, Alexey Artemov, Evgeny Burnaev, Marc Alexa, Denis Zorin, Daniele Panozzo | We introduce ABC-Dataset, a collection of one million Computer-Aided Design (CAD) models for research of geometric deep learning methods and applications. |
974 | Tightness-Aware Evaluation Protocol for Scene Text Detection | Yuliang Liu, Lianwen Jin, Zecheng Xie, Canjie Luo, Shuaitao Zhang, Lele Xie | Therefore, this paper proposes a novel evaluation protocol called the Tightness-aware Intersect-over-Union (TIoU) metric, which quantifies completeness of ground truth, compactness of detection, and tightness of matching degree. |
975 | PointConv: Deep Convolutional Networks on 3D Point Clouds | Wenxuan Wu, Zhongang Qi, Li Fuxin | In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. |
976 | Octree Guided CNN With Spherical Kernels for 3D Point Clouds | Huan Lei, Naveed Akhtar, Ajmal Mian | We propose an octree guided neural network architecture and spherical convolutional kernel for machine learning from arbitrary 3D point clouds. |
977 | VITAMIN-E: VIsual Tracking and MappINg With Extremely Dense Feature Points | Masashi Yokozuka, Shuji Oishi, Simon Thompson, Atsuhiko Banno | In this paper, we propose a novel indirect monocular simultaneous localization and mapping (SLAM) algorithm called “VITAMIN-E,” which is highly accurate and robust as a result of tracking extremely dense feature points. |
978 | Conditional Single-View Shape Generation for Multi-View Stereo Reconstruction | Yi Wei, Shaohui Liu, Wang Zhao, Jiwen Lu | In this paper, we present a new perspective towards image-based shape generation. |
979 | Learning to Adapt for Stereo | Alessio Tonioni, Oscar Rahnama, Thomas Joy, Luigi Di Stefano, Thalaiyasingam Ajanthan, Philip H.S. Torr | In this work, we introduce a “learning-to-adapt” framework that enables deep stereo methods to continuously adapt to new target domains in an unsupervised manner. |
980 | 3D Appearance Super-Resolution With Deep Learning | Yawei Li, Vagia Tsiminaki, Radu Timofte, Marc Pollefeys, Luc Van Gool | We tackle the problem of retrieving high-resolution (HR) texture maps of objects that are captured from multiple view points. |
981 | Radial Distortion Triangulation | Zuzana Kukelova, Viktor Larsson | This paper presents the first optimal, maximum-likelihood solution to the triangulation problem for radially distorted cameras. |
982 | Robust Point Cloud Based Reconstruction of Large-Scale Outdoor Scenes | Ziquan Lan, Zi Jian Yew, Gim Hee Lee | To alleviate this problem, we propose a probabilistic approach for robust back-end optimization in the presence of outliers. |
983 | Minimal Solvers for Mini-Loop Closures in 3D Multi-Scan Alignment | Pedro Miraldo, Surojit Saha, Srikumar Ramalingam | In this paper, we take a different approach and develop minimal solvers for jointly computing the initial poses of cameras in small loops such as 3-, 4-, and 5-cycles. |
984 | Volumetric Capture of Humans With a Single RGBD Camera via Semi-Parametric Learning | Rohit Pandey, Anastasia Tkach, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Ricardo Martin-Brualla, Andrea Tagliasacchi, George Papandreou, Philip Davidson, Cem Keskin, Shahram Izadi, Sean Fanello | Thus, in this work, we propose a method to synthesize free viewpoint renderings using a single RGBD camera. |
985 | Joint Face Detection and Facial Motion Retargeting for Multiple Faces | Bindita Chaudhuri, Noranart Vesdapunt, Baoyuan Wang | In this paper, we present a single end-to-end network to jointly predict the bounding box locations and 3DMM parameters for multiple faces. |
986 | Monocular Depth Estimation Using Relative Depth Maps | Jae-Han Lee, Chang-Su Kim | We propose a novel algorithm for monocular depth estimation using relative depth maps. |
987 | Unsupervised Primitive Discovery for Improved 3D Generative Modeling | Salman H. Khan, Yulan Guo, Munawar Hayat, Nick Barnes | Here, we propose a novel factorized generative model for 3D shape generation that sequentially transitions from coarse to fine scale shape generation. |
988 | Learning to Explore Intrinsic Saliency for Stereoscopic Video | Qiudan Zhang, Xu Wang, Shiqi Wang, Shikai Li, Sam Kwong, Jianmin Jiang | In this paper, we argue that the high-level features are crucial and resort to the deep learning framework to learn the saliency map of stereoscopic videos. |
989 | Spherical Regression: Learning Viewpoints, Surface Normals and 3D Rotations on N-Spheres | Shuai Liao, Efstratios Gavves, Cees G. M. Snoek | By introducing a spherical exponential mapping on n-spheres at the regression output, we obtain well-behaved gradients, leading to stable training. |
990 | Refine and Distill: Exploiting Cycle-Inconsistency and Knowledge Distillation for Unsupervised Monocular Depth Estimation | Andrea Pilzer, Stephane Lathuiliere, Nicu Sebe, Elisa Ricci | Following these works, we propose a novel self-supervised deep model for estimating depth maps. |
991 | Learning View Priors for Single-View 3D Reconstruction | Hiroharu Kato, Tatsuya Harada | To reconstruct shapes that look reasonable from any viewpoint, we propose to train a discriminator that learns prior knowledge regarding possible views. |
992 | Geometry-Aware Symmetric Domain Adaptation for Monocular Depth Estimation | Shanshan Zhao, Huan Fu, Mingming Gong, Dacheng Tao | Motivated by the observation, we propose a geometry-aware symmetric domain adaptation framework (GASDA) to explore the labels in the synthetic data and epipolar geometry in the real data jointly. |
993 | Learning Monocular Depth Estimation Infusing Traditional Stereo Knowledge | Fabio Tosi, Filippo Aleotti, Matteo Poggi, Stefano Mattoccia | To this aim we propose monoResMatch, a novel deep architecture designed to infer depth from a single input image by synthesizing features from a different point of view, horizontally aligned with the input image, performing stereo matching between the two cues. |
994 | SIGNet: Semantic Instance Aided Unsupervised 3D Geometry Perception | Yue Meng, Yongxi Lu, Aman Raj, Samuel Sunarjo, Rui Guo, Tara Javidi, Gaurav Bansal, Dinesh Bharadia | This paper introduces SIGNet, a novel framework that provides robust geometry perception without requiring geometrically informative labels. |
995 | 3D Guided Fine-Grained Face Manipulation | Zhenglin Geng, Chen Cao, Sergey Tulyakov | We present a method for fine-grained face manipulation. |
996 | Neuro-Inspired Eye Tracking With Eye Movement Dynamics | Kang Wang, Hui Su, Qiang Ji | To address this issue, we propose to leverage on eye movement dynamics inspired by neurological studies. |
997 | Facial Emotion Distribution Learning by Exploiting Low-Rank Label Correlations Locally | Xiuyi Jia, Xiang Zheng, Weiwei Li, Changqing Zhang, Zechao Li | Therefore, to depict facial expressions more accurately, this paper adopts a label distribution learning approach for emotion recognition that can address the ambiguity of “how to describe the expression” and proposes an emotion distribution learning method that exploits label correlations locally. |
998 | Unsupervised Face Normalization With Extreme Pose and Expression in the Wild | Yichen Qian, Weihong Deng, Jiani Hu | To this end, we propose a Face Normalization Model (FNM) to generate a frontal, neutral expression, photorealistic face image for face recognition. |
999 | Semantic Component Decomposition for Face Attribute Manipulation | Ying-Cong Chen, Xiaohui Shen, Zhe Lin, Xin Lu, I-Ming Pao, Jiaya Jia | In this paper, we address these issues by proposing a semantic component model. |
1000 | R3 Adversarial Network for Cross Model Face Recognition | Ken Chen, Yichao Wu, Haoyu Qin, Ding Liang, Xuebo Liu, Junjie Yan | In this paper, we raise a new problem, namely cross model face recognition (CMFR), which has considerable economic and social significance. |
1001 | Disentangling Latent Hands for Image Synthesis and Pose Estimation | Linlin Yang, Angela Yao | To better analyze these factors of variation, we propose the use of disentangled representations and a disentangled variational autoencoder (dVAE) that allows for specific sampling and inference of these factors. |
1002 | Generating Multiple Hypotheses for 3D Human Pose Estimation With Mixture Density Network | Chen Li, Gim Hee Lee | In this paper, we propose a novel approach to generate multiple feasible hypotheses of the 3D pose from 2D joints. |
1003 | CrossInfoNet: Multi-Task Information Sharing Based Hand Pose Estimation | Kuo Du, Xiangbo Lin, Yi Sun, Xiaohong Ma | Our main contributions lie in designing a new pose regression network architecture named CrossInfoNet. |
1004 | P2SGrad: Refined Gradients for Optimizing Deep Face Models | Xiao Zhang, Rui Zhao, Junjie Yan, Mengya Gao, Yu Qiao, Xiaogang Wang, Hongsheng Li | This paper addresses this challenge by directly designing the gradients for training in an adaptive manner. |
1005 | Action Recognition From Single Timestamp Supervision in Untrimmed Videos | Davide Moltisanti, Sanja Fidler, Dima Damen | We propose a method that is supervised by single timestamps located around each action instance, in untrimmed videos. |
1006 | Time-Conditioned Action Anticipation in One Shot | Qiuhong Ke, Mario Fritz, Bernt Schiele | In this paper, we propose a novel time-conditioned method for efficient and effective long-term action anticipation. |
1007 | Dance With Flow: Two-In-One Stream Action Detection | Jiaojiao Zhao, Cees G. M. Snoek | The goal of this paper is to detect the spatio-temporal extent of an action. |
1008 | Representation Flow for Action Recognition | AJ Piergiovanni, Michael S. Ryoo | In this paper, we propose a convolutional layer inspired by optical flow algorithms to learn motion representations. |
1009 | LSTA: Long Short-Term Attention for Egocentric Action Recognition | Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz | In this paper we propose LSTA as a mechanism to focus on features from spatial relevant parts while attention is being tracked smoothly across the video sequence. |
1010 | Learning Actor Relation Graphs for Group Activity Recognition | Jianchao Wu, Limin Wang, Li Wang, Jie Guo, Gangshan Wu | This paper aims at learning discriminative relation between actors efficiently using deep models. |
1011 | A Structured Model for Action Detection | Yubo Zhang, Pavel Tokmakov, Martial Hebert, Cordelia Schmid | To address this limitation, we propose to incorporate domain knowledge into the structure of the model, simplifying optimization. |
1012 | Out-Of-Distribution Detection for Generalized Zero-Shot Action Recognition | Devraj Mandal, Sanath Narayan, Sai Kumar Dwivedi, Vikram Gupta, Shuaib Ahmed, Fahad Shahbaz Khan, Ling Shao | In this paper, we set out to tackle this issue by arguing for a separate treatment of seen and unseen action categories in generalized zero-shot action recognition. |
1013 | Object Discovery in Videos as Foreground Motion Clustering | Christopher Xie, Yu Xiang, Zaid Harchaoui, Dieter Fox | We consider the problem of providing dense segmentation masks for object discovery in videos. |
1014 | Towards Natural and Accurate Future Motion Prediction of Humans and Animals | Zhenguang Liu, Shuang Wu, Shuyuan Jin, Qi Liu, Shijian Lu, Roger Zimmermann, Li Cheng | To address these problems, we propose to explicitly encode anatomical constraints by modeling their skeletons with a Lie algebra representation. |
1015 | Automatic Face Aging in Videos via Deep Reinforcement Learning | Chi Nhan Duong, Khoa Luu, Kha Gia Quach, Nghia Nguyen, Eric Patterson, Tien D. Bui, Ngan Le | This paper presents a novel approach for synthesizing automatically age-progressed facial images in video sequences using Deep Reinforcement Learning. |
1016 | Multi-Adversarial Discriminative Deep Domain Generalization for Face Presentation Attack Detection | Rui Shao, Xiangyuan Lan, Jiawei Li, Pong C. Yuen | We propose to learn a generalized feature space via a novel multi-adversarial discriminative deep domain generalization framework. |
1017 | A Content Transformation Block for Image Style Transfer | Dmytro Kotovenko, Artsiom Sanakoyeu, Pingchuan Ma, Sabine Lang, Bjorn Ommer | Therefore, we introduce a content transformation module between the encoder and decoder. |
1018 | BeautyGlow: On-Demand Makeup Transfer Framework With Reversible Generative Network | Hung-Jen Chen, Ka-Ming Hui, Szu-Yu Wang, Li-Wu Tsao, Hong-Han Shuai, Wen-Huang Cheng | To facilitate on-demand makeup transfer, in this work, we propose BeautyGlow, which decomposes the latent vectors of face images derived from the Glow model into makeup and non-makeup latent vectors. |
1019 | Style Transfer by Relaxed Optimal Transport and Self-Similarity | Nicholas Kolkin, Jason Salavon, Gregory Shakhnarovich | We propose Style Transfer by Relaxed Optimal Transport and Self-Similarity (STROTSS), a new optimization-based style transfer algorithm. |
1020 | Inserting Videos Into Videos | Donghoon Lee, Tomas Pfister, Ming-Hsuan Yang | In this paper, we introduce a new problem of manipulating a given video by inserting other videos into it. |
1021 | Learning Image and Video Compression Through Spatial-Temporal Energy Compaction | Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto | Our basic idea is to realize spatial-temporal energy compaction in learning image and video compression. |
1022 | Event-Based High Dynamic Range Image and Very High Frame Rate Video Generation Using Conditional Generative Adversarial Networks | Lin Wang, S. Mohammad Mostafavi I., Yo-Sung Ho, Kuk-Jin Yoon | In this paper, we unlock the potential of event camera-based conditional generative adversarial networks to create images/videos from an adjustable portion of the event data stream. |
1023 | Enhancing TripleGAN for Semi-Supervised Conditional Instance Synthesis and Classification | Si Wu, Guangchang Deng, Jichang Li, Rui Li, Zhiwen Yu, Hau-San Wong | To improve both instance synthesis and classification in this setting, we propose an enhanced TripleGAN (EnhancedTGAN) model in this work. |
1024 | Capture, Learning, and Synthesis of 3D Speaking Styles | Daniel Cudeiro, Timo Bolkart, Cassidy Laidlaw, Anurag Ranjan, Michael J. Black | To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. |
1025 | Nesti-Net: Normal Estimation for Unstructured 3D Point Clouds Using Convolutional Neural Networks | Yizhak Ben-Shabat, Michael Lindenbaum, Anath Fischer | In this paper, we propose a normal estimation method for unstructured 3D point clouds. |
1026 | Ray-Space Projection Model for Light Field Camera | Qi Zhang, Jinbo Ling, Qing Wang, Jingyi Yu | In this paper, we propose a novel ray-space projection model to transform sets of rays captured by multiple light field cameras in terms of Plucker coordinates. |
1027 | Deep Geometric Prior for Surface Reconstruction | Francis Williams, Teseo Schneider, Claudio Silva, Denis Zorin, Joan Bruna, Daniele Panozzo | We propose the use of a deep neural network as a geometric prior for surface reconstruction. |
1028 | Analysis of Feature Visibility in Non-Line-Of-Sight Measurements | Xiaochun Liu, Sebastian Bauer, Andreas Velten | We formulate an equation describing a general Non-line-of-sight (NLOS) imaging measurement and analyze the properties of the measurement in the Fourier domain regarding the spatial frequencies of the scene it encodes. |
1029 | Hyperspectral Imaging With Random Printed Mask | Yuanyuan Zhao, Hui Guo, Zhan Ma, Xun Cao, Tao Yue, Xuemei Hu | In this paper, based on the simple but not widely noticed phenomenon that a color printer can print color masks with a large number of independent spectral transmission responses, we propose a simple and low-budget scheme to capture hyperspectral images with a random mask printed by a consumer-level color printer. |
1030 | All-Weather Deep Outdoor Lighting Estimation | Jinsong Zhang, Kalyan Sunkavalli, Yannick Hold-Geoffroy, Sunil Hadap, Jonathan Eisenman, Jean-Francois Lalonde | We present a neural network that predicts HDR outdoor illumination from a single LDR image. |
1031 | A Variational EM Framework With Adaptive Edge Selection for Blind Motion Deblurring | Liuge Yang, Hui Ji | This paper presents an interpretation of edge selection/reweighting in terms of variational Bayes inference and develops a novel variational expectation maximization (VEM) algorithm with built-in adaptive edge selection for blind deblurring. |
1032 | Viewport Proposal CNN for 360° Video Quality Assessment | Chen Li, Mai Xu, Lai Jiang, Shanyi Zhang, Xiaoming Tao | Thus, this paper proposes a viewport-based convolutional neural network (V-CNN) approach for VQA on 360° video, considering both auxiliary tasks of viewport proposal and viewport saliency prediction. |
1033 | Beyond Gradient Descent for Regularized Segmentation Losses | Dmitrii Marin, Meng Tang, Ismail Ben Ayed, Yuri Boykov | Our work suggests that network design/training should pay more attention to optimization methods. |
1034 | MAGSAC: Marginalizing Sample Consensus | Daniel Barath, Jiri Matas, Jana Noskova | A method called sigma-consensus is proposed to eliminate the need for a user-defined inlier-outlier threshold in RANSAC. |
1035 | Understanding and Visualizing Deep Visual Saliency Models | Sen He, Hamed R. Tavakoli, Ali Borji, Yang Mi, Nicolas Pugeault | This article attempts to answer these questions by analyzing the representations learned by individual neurons located at the intermediate layers of deep saliency models. |
1036 | Divergence Prior and Vessel-Tree Reconstruction | Zhongwen Zhang, Dmitrii Marin, Egor Chesakov, Marc Moreno Maza, Maria Drangova, Yuri Boykov | We propose a new geometric regularization principle for reconstructing vector fields based on prior knowledge about their divergence. |
1037 | Unsupervised Domain-Specific Deblurring via Disentangled Representations | Boyu Lu, Jun-Cheng Chen, Rama Chellappa | In this paper, we present an unsupervised method for domain-specific, single-image deblurring based on disentangled representations. |
1038 | Douglas-Rachford Networks: Learning Both the Image Prior and Data Fidelity Terms for Blind Image Deconvolution | Raied Aljadaany, Dipan K. Pal, Marios Savvides | In this paper, we present a method called Dr-Net, which does not require any such estimate and is further able to invert the effects of the blurring in blind image recovery tasks. |
1039 | Speed Invariant Time Surface for Learning to Detect Corner Points With Event-Based Cameras | Jacques Manderscheid, Amos Sironi, Nicolas Bourdis, Davide Migliore, Vincent Lepetit | We propose a learning approach to corner detection for event-based cameras that is stable even under fast and abrupt motions. Moreover, we introduce a high-resolution dataset suitable for quantitative evaluation and comparison of corner detection methods for event-based cameras. |
1040 | Training Deep Learning Based Image Denoisers From Undersampled Measurements Without Ground Truth and Without Image Prior | Magauiya Zhussip, Shakarim Soltanayev, Se Young Chun | To resolve this dilemma, we propose novel methods based on two well-grounded theories: denoiser-approximate message passing (D-AMP) and Stein’s unbiased risk estimator (SURE). |
1041 | A Variational Pan-Sharpening With Local Gradient Constraints | Xueyang Fu, Zihuang Lin, Yue Huang, Xinghao Ding | In this paper, a new variational model based on a local gradient constraint for pan-sharpening is proposed. |
1042 | F-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning | Yongqin Xian, Saurabh Sharma, Bernt Schiele, Zeynep Akata | In this paper, we tackle any-shot learning problems i.e. zero-shot and few-shot, in a unified feature generating framework that operates in both inductive and transductive learning settings. |
1043 | Sliced Wasserstein Discrepancy for Unsupervised Domain Adaptation | Chen-Yu Lee, Tanmay Batra, Mohammad Haris Baig, Daniel Ulbricht | In this work, we connect two distinct concepts for unsupervised domain adaptation: feature distribution alignment between domains by utilizing the task-specific decision boundary and the Wasserstein metric. |
1044 | Graph Attention Convolution for Point Cloud Semantic Segmentation | Lei Wang, Yuchun Huang, Yaolin Hou, Shenman Zhang, Jie Shan | This paper proposes a novel graph attention convolution (GAC), whose kernels can be dynamically carved into specific shapes to adapt to the structure of an object. |
1045 | Normalized Diversification | Shaohui Liu, Xiao Zhang, Jianqiao Wangni, Jianbo Shi | We introduce the concept of normalized diversity, which forces the model to preserve the normalized pairwise distance between the sparse samples from a latent parametric distribution and their corresponding high-dimensional outputs. We demonstrate that by combining the normalized diversity loss and the adversarial loss, we generate diverse data without suffering from mode collapse. |
1046 | Learning to Localize Through Compressed Binary Maps | Xinkai Wei, Ioan Andrei Barsan, Shenlong Wang, Julieta Martinez, Raquel Urtasun | In this paper we propose to learn to compress the map representation such that it is optimal for the localization task. |
1047 | A Parametric Top-View Representation of Complex Road Scenes | Ziyan Wang, Buyu Liu, Samuel Schulter, Manmohan Chandraker | In this paper, we address the problem of inferring the layout of complex road scenes given a single camera as input. |
1048 | Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction | Dejing Xu, Jun Xiao, Zhou Zhao, Jian Shao, Di Xie, Yueting Zhuang | We propose a self-supervised spatiotemporal learning technique which leverages the chronological order of videos. |
1049 | Superquadrics Revisited: Learning 3D Shape Parsing Beyond Cuboids | Despoina Paschalidou, Ali Osman Ulusoy, Andreas Geiger | This paper presents a learning-based solution to this problem which goes beyond the traditional 3D cuboid representation by exploiting superquadrics as atomic elements. |
1050 | Unsupervised Disentangling of Appearance and Geometry by Deformable Generator Network | Xianglei Xing, Tian Han, Ruiqi Gao, Song-Chun Zhu, Ying Nian Wu | We present a deformable generator model to disentangle the appearance and geometric information in a purely unsupervised manner. |
1051 | Self-Supervised Representation Learning by Rotation Feature Decoupling | Zeyu Feng, Chang Xu, Dacheng Tao | We introduce a self-supervised learning method that focuses on beneficial properties of representation and their abilities in generalizing to real-world tasks. |
1052 | Weakly Supervised Deep Image Hashing Through Tag Embeddings | Vijetha Gattupalli, Yaoxin Zhuo, Baoxin Li | Motivated by this scenario, we formulate the problem of semantic image hashing as a weakly-supervised learning problem. |
1053 | Improved Road Connectivity by Joint Learning of Orientation and Segmentation | Anil Batra, Suriya Singh, Guan Pang, Saikat Basu, C.V. Jawahar, Manohar Paluri | In this paper, we propose a connectivity task called Orientation Learning, motivated by the human behavior of annotating roads by tracing them at a specific orientation. |
1054 | Deep Supervised Cross-Modal Retrieval | Liangli Zhen, Peng Hu, Xu Wang, Dezhong Peng | In this paper, we present a novel cross-modal retrieval method, called Deep Supervised Cross-modal Retrieval (DSCMR). |
1055 | A Theoretically Sound Upper Bound on the Triplet Loss for Improving the Efficiency of Deep Distance Metric Learning | Thanh-Toan Do, Toan Tran, Ian Reid, Vijay Kumar, Tuan Hoang, Gustavo Carneiro | We propose a method that substantially improves the efficiency of deep distance metric learning based on the optimization of the triplet loss function. |
1056 | Data Representation and Learning With Graph Diffusion-Embedding Networks | Bo Jiang, Doudou Lin, Jin Tang, Bin Luo | In this paper, we present Graph Diffusion-Embedding networks (GDENs), a new model for graph-structured data representation and learning. |
1057 | Video Relationship Reasoning Using Gated Spatio-Temporal Energy Graph | Yao-Hung Hubert Tsai, Santosh Divvala, Louis-Philippe Morency, Ruslan Salakhutdinov, Ali Farhadi | In this paper, we construct a Conditional Random Field on a fully-connected spatio-temporal graph that exploits the statistical dependency between relational entities spatially and temporally. |
1058 | Image-Question-Answer Synergistic Network for Visual Dialog | Dalu Guo, Chang Xu, Dacheng Tao | In this paper, we devise a novel image-question-answer synergistic network to value the role of the answer for precise visual dialog. |
1059 | Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses | Jing Shi, Jia Xu, Boqing Gong, Chenliang Xu | In this work, we address these issues by extending frame-level MIL with a false positive frame-bag constraint and modeling the visual feature consistency in the video. |
1060 | Inverse Cooking: Recipe Generation From Food Images | Amaia Salvador, Michal Drozdzal, Xavier Giro-i-Nieto, Adriana Romero | Therefore, in this paper we introduce an inverse cooking system that recreates cooking recipes given food images. |
1061 | Adversarial Semantic Alignment for Improved Image Captions | Pierre Dognin, Igor Melnyk, Youssef Mroueh, Jerret Ross, Tom Sercu | In this paper, we study image captioning as a conditional GAN training, proposing both a context-aware LSTM captioner and co-attentive discriminator, which enforces semantic alignment between images and captions. |
1062 | Answer Them All! Toward Universal Visual Question Answering Models | Robik Shrestha, Kushal Kafle, Christopher Kanan | To address this problem, we propose a new VQA algorithm that rivals or exceeds the state-of-the-art for both domains. |
1063 | Unsupervised Multi-Modal Neural Machine Translation | Yuanhang Su, Kai Fan, Nguyen Bach, C.-C. Jay Kuo, Fei Huang | We propose an unsupervised multi-modal machine translation (UMNMT) framework based on the language translation cycle consistency loss conditional on the image, targeting to learn the bidirectional multi-modal translation simultaneously. |
1064 | Multi-Task Learning of Hierarchical Vision-Language Representation | Duy-Kien Nguyen, Takayuki Okatani | We propose a multi-task learning approach that enables to learn vision-language representation that is shared by many tasks from their diverse datasets. |
1065 | Cross-Modal Self-Attention Network for Referring Image Segmentation | Linwei Ye, Mrigank Rochan, Zhi Liu, Yang Wang | In this paper, we propose a cross-modal self-attention (CMSA) module that effectively captures the long-range dependencies between linguistic and visual features. |
1066 | DuDoNet: Dual Domain Network for CT Metal Artifact Reduction | Wei-An Lin, Haofu Liao, Cheng Peng, Xiaohang Sun, Jingdan Zhang, Jiebo Luo, Rama Chellappa, Shaohua Kevin Zhou | To overcome these difficulties, we propose an end-to-end trainable Dual Domain Network (DuDoNet) to simultaneously restore sinogram consistency and enhance CT images. |
1067 | Fast Spatio-Temporal Residual Network for Video Super-Resolution | Sheng Li, Fengxiang He, Bo Du, Lefei Zhang, Yonghao Xu, Dacheng Tao | In this paper, we present a novel fast spatio-temporal residual network (FSTRN) to adopt 3D convolutions for the video SR task in order to enhance the performance while maintaining a low computational load. |
1068 | Complete the Look: Scene-Based Complementary Product Recommendation | Wang-Cheng Kang, Eric Kim, Jure Leskovec, Charles Rosenberg, Julian McAuley | In this work, we propose a new task called ‘Complete the Look’, which seeks to recommend visually compatible products based on scene images. |
1069 | Selective Sensor Fusion for Neural Visual-Inertial Odometry | Changhao Chen, Stefano Rosa, Yishu Miao, Chris Xiaoxuan Lu, Wei Wu, Andrew Markham, Niki Trigoni | We propose a novel end-to-end selective sensor fusion framework for monocular VIO, which fuses monocular images and inertial measurements in order to estimate the trajectory whilst improving robustness to real-life issues, such as missing and corrupted data or bad sensor synchronization. |
1070 | Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes | Chengquan Zhang, Borong Liang, Zuming Huang, Mengyi En, Junyu Han, Errui Ding, Xinghao Ding | To address these two problems, we present a novel text detector, namely LOMO, which localizes the text progressively multiple times (or, in other words, LOoks More than Once). |
1071 | Learning Binary Code for Personalized Fashion Recommendation | Zhi Lu, Yang Hu, Yunchao Jiang, Yan Chen, Bing Zeng | In this paper, we propose to learn binary code for efficient personalized fashion outfits recommendation. We collect outfit data together with user label information from a fashion-focused social website for the personalized recommendation task. |
1072 | Attention Based Glaucoma Detection: A Large-Scale Database and CNN Model | Liu Li, Mai Xu, Xiaofei Wang, Lai Jiang, Hanruo Liu | This paper proposes an attention-based CNN for glaucoma detection (AG-CNN). |
1073 | Privacy Protection in Street-View Panoramas Using Depth and Multi-View Imagery | Ries Uittenbogaard, Clint Sebastian, Julien Vijverberg, Bas Boom, Dariu M. Gavrila, Peter H.N. de With | In this paper, we propose a framework that is an alternative to blurring, which automatically removes and inpaints moving objects (e.g. pedestrians, vehicles) in street-view imagery. |
1074 | Grounding Human-To-Vehicle Advice for Self-Driving Vehicles | Jinkyu Kim, Teruhisa Misu, Yi-Ting Chen, Ashish Tawari, John Canny | Here, we propose to address this issue by augmenting training data with natural language advice from a human. |
1075 | Multi-Step Prediction of Occupancy Grid Maps With Recurrent Neural Networks | Nima Mohajerin, Mohsen Rohani | We investigate the multi-step prediction of the drivable space, represented by Occupancy Grid Maps (OGMs), for autonomous vehicles. |
1076 | Connecting Touch and Vision via Cross-Modal Prediction | Yunzhu Li, Jun-Yan Zhu, Russ Tedrake, Antonio Torralba | In this work, we investigate the cross-modal connection between vision and touch. To accomplish our goals, we first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. |
1077 | X2CT-GAN: Reconstructing CT From Biplanar X-Rays With Generative Adversarial Networks | Xingde Ying, Heng Guo, Kai Ma, Jian Wu, Zhengxin Weng, Yefeng Zheng | In this work, we propose to reconstruct CT from two orthogonal X-rays using the generative adversarial network (GAN) framework. |
1078 | Practical Full Resolution Learned Lossless Image Compression | Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, Luc Van Gool | We propose the first practical learned lossless image compression system, L3C, and show that it outperforms the popular engineered codecs, PNG, WebP and JPEG 2000. |
1079 | Image-To-Image Translation via Group-Wise Deep Whitening-And-Coloring Transformation | Wonwoong Cho, Sungha Choi, David Keetae Park, Inkyu Shin, Jaegul Choo | In response, this paper proposes an end-to-end approach tailored for image translation that efficiently approximates this transformation with our novel regularization methods. |
1080 | Max-Sliced Wasserstein Distance and Its Use for GANs | Ishan Deshpande, Yuan-Ting Hu, Ruoyu Sun, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Zhizhen Zhao, David Forsyth, Alexander G. Schwing | Max-Sliced Wasserstein Distance and Its Use for GANs |
1081 | Meta-Learning With Differentiable Convex Optimization | Kwonjoon Lee, Subhransu Maji, Avinash Ravichandran, Stefano Soatto | Our objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories. |
1082 | RePr: Improved Training of Convolutional Filters | Aaditya Prakash, James Storer, Dinei Florencio, Cha Zhang | Innovations in network architecture such as skip/dense connections and inception units have mitigated this problem to some extent, but these improvements come with increased computation and memory requirements at run-time. We attempt to address this problem from another angle – not by changing the network structure but by altering the training method. |
1083 | Tangent-Normal Adversarial Regularization for Semi-Supervised Learning | Bing Yu, Jingfeng Wu, Jinwen Ma, Zhanxing Zhu | In this work, we propose tangent-normal adversarial regularization (TNAR) as an extension of VAT by taking the data manifold into consideration. |
1084 | Auto-Encoding Scene Graphs for Image Captioning | Xu Yang, Kaihua Tang, Hanwang Zhang, Jianfei Cai | We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions. |
1085 | Fast, Diverse and Accurate Image Captioning Guided by Part-Of-Speech | Aditya Deshpande, Jyoti Aneja, Liwei Wang, Alexander G. Schwing, David Forsyth | In this paper, we first predict a meaningful summary of the image, then generate the caption based on that summary. |
1086 | Attention Branch Network: Learning of Attention Mechanism for Visual Explanation | Hiroshi Fukui, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi | In this paper, we focus on the attention map for visual explanation, which represents a high response value as the attention location in image recognition. |
1087 | Cascaded Projection: End-To-End Network Compression and Acceleration | Breton Minnehan, Andreas Savakis | We propose a data-driven approach for deep convolutional neural network compression that achieves high accuracy with high throughput and low memory requirements. |
1088 | DeepCaps: Going Deeper With Capsule Networks | Jathushan Rajasegaran, Vinoj Jayasundara, Sandaru Jayasekara, Hirunima Jayasekara, Suranga Seneviratne, Ranga Rodrigo | Drawing intuition from the success achieved by Convolutional Neural Networks (CNNs) by going deeper, we introduce DeepCaps, a deep capsule network architecture which uses a novel 3D convolution based dynamic routing algorithm. |
1089 | FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search | Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer | To address these, we propose a differentiable neural architecture search (DNAS) framework that uses gradient-based methods to optimize ConvNet architectures, avoiding enumerating and training individual architectures separately as in previous methods. |
1090 | APDrawingGAN: Generating Artistic Portrait Drawings From Face Photos With Hierarchical GANs | Ran Yi, Yong-Jin Liu, Yu-Kun Lai, Paul L. Rosin | To address these challenges, we propose APDrawingGAN, a novel GAN based architecture that builds upon hierarchical generators and discriminators combining both a global network (for images as a whole) and local networks (for individual facial regions). To train APDrawingGAN, we construct an artistic drawing dataset containing high-resolution portrait photos and corresponding professional artistic drawings. |
1091 | Constrained Generative Adversarial Networks for Interactive Image Generation | Eric Heim | In this work we develop a novel GAN framework that allows humans to be “in-the-loop” of the image generation process. |
1092 | WarpGAN: Automatic Caricature Generation | Yichun Shi, Debayan Deb, Anil K. Jain | We propose, WarpGAN, a fully automatic network that can generate caricatures given an input face photo. |
1093 | Explainability Methods for Graph Convolutional Neural Networks | Phillip E. Pope, Soheil Kolouri, Mohammad Rostami, Charles E. Martin, Heiko Hoffmann | In this paper, we introduce explainability methods for GCNNs. |
1094 | A Generative Adversarial Density Estimator | M. Ehsan Abbasnejad, Qinfeng Shi, Anton van den Hengel, Lingqiao Liu | We propose a Generative Adversarial Density Estimator, a density estimation approach that bridges the gap between the two. |
1095 | SoDeep: A Sorting Deep Net to Learn Ranking Loss Surrogates | Martin Engilberge, Louis Chevallier, Patrick Perez, Matthieu Cord | In the present work, we introduce a new method to learn approximations of such non-differentiable objective functions. |
1096 | High-Quality Face Capture Using Anatomical Muscles | Michael Bao, Matthew Cong, Stephane Grabli, Ronald Fedkiw | Thus, we propose modifying a recently developed rather expressive muscle-based system in order to make it fully-differentiable; in fact, our proposed modifications allow this physically robust and anatomically accurate muscle model to conveniently be driven by an underlying blendshape basis. |
1097 | FML: Face Model Learning From Videos | Ayush Tewari, Florian Bernard, Pablo Garrido, Gaurav Bharaj, Mohamed Elgharib, Hans-Peter Seidel, Patrick Perez, Michael Zollhofer, Christian Theobalt | In contrast, we propose multi-frame video-based self-supervised training of a deep network that (i) learns a face identity model both in shape and appearance while (ii) jointly learning to reconstruct 3D faces. |
1098 | AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations | Xiao Zhang, Rui Zhao, Yu Qiao, Xiaogang Wang, Hongsheng Li | In this paper, we investigate in depth the effects of two important hyperparameters of cosine-based softmax losses, the scale parameter and angular margin parameter, by analyzing how they modulate the predicted classification probability. |
1099 | 3D Hand Shape and Pose Estimation From a Single RGB Image | Liuhao Ge, Zhou Ren, Yuncheng Li, Zehao Xue, Yingying Wang, Jianfei Cai, Junsong Yuan | In contrast, we propose a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of hand surface that contains richer information of both 3D hand shape and pose. |
1100 | 3D Hand Shape and Pose From Images in the Wild | Adnane Boukhayma, Rodrigo de Bem, Philip H.S. Torr | We present in this work the first end-to-end deep learning based method that predicts both 3D hand shape and pose from RGB images in the wild. |
1101 | Self-Supervised 3D Hand Pose Estimation Through Training by Fitting | Chengde Wan, Thomas Probst, Luc Van Gool, Angela Yao | We present a self-supervision method for 3D hand pose estimation from depth maps. |
1102 | CrowdPose: Efficient Crowded Scenes Pose Estimation and a New Benchmark | Jiefeng Li, Can Wang, Hao Zhu, Yihuan Mao, Hao-Shu Fang, Cewu Lu | In this paper, we propose a novel and efficient method to tackle the problem of pose estimation in the crowd and a new dataset to better evaluate algorithms. |
1103 | Towards Social Artificial Intelligence: Nonverbal Social Signal Prediction in a Triadic Interaction | Hanbyul Joo, Tomas Simon, Mina Cikara, Yaser Sheikh | We present a new research task and a dataset to understand human social interactions via computational methods, to ultimately endow machines with the ability to encode and decode a broad channel of social signals humans use. We then present a new 3D motion capture dataset to explore this problem, where the broad spectrum of social signals (3D body, face, and hand motions) is captured in a triadic social interaction scenario. |
1104 | HoloPose: Holistic 3D Human Reconstruction In-The-Wild | Riza Alp Guler, Iasonas Kokkinos | We introduce HoloPose, a method for holistic monocular 3D human body reconstruction. |
1105 | Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation | Xipeng Chen, Kwan-Yee Lin, Wentao Liu, Chen Qian, Liang Lin | In this work, we propose a geometry-aware 3D representation for the human pose to address this limitation by using multiple views in a simple auto-encoder model at the training stage and only 2D keypoint information as supervision. |
1106 | In the Wild Human Pose Estimation Using Explicit 2D Features and Intermediate 3D Representations | Ikhsanul Habibie, Weipeng Xu, Dushyant Mehta, Gerard Pons-Moll, Christian Theobalt | We therefore propose a new deep learning based method for monocular 3D human pose estimation that shows high accuracy and generalizes better to in-the-wild scenes. |
1107 | Slim DensePose: Thrifty Learning From Sparse Annotations and Motion Cues | Natalia Neverova, James Thewlis, Riza Alp Guler, Iasonas Kokkinos, Andrea Vedaldi | In this work, we thus seek methods to significantly slim down the DensePose annotations, proposing more efficient data collection strategies. |
1108 | Self-Supervised Representation Learning From Videos for Facial Action Unit Detection | Yong Li, Jiabei Zeng, Shiguang Shan, Xilin Chen | In this paper, we aim to learn discriminative representation for facial action unit (AU) detection from large amount of videos without manual annotations. |
1109 | Combining 3D Morphable Models: A Large Scale Face-And-Head Model | Stylianos Ploumpis, Haoyang Wang, Nick Pears, William A. P. Smith, Stefanos Zafeiriou | In answering this question, we make two contributions. First, we propose two methods for solving this problem: i. use a regressor to complete missing parts of one model using the other, ii. use the Gaussian Process framework to blend covariance matrices from multiple models. Second, as an example application of our approach, we build a new head and face model that combines the variability and facial detail of the LSFM with the full head modelling of the LYHM. |
1110 | Boosting Local Shape Matching for Dense 3D Face Correspondence | Zhenfeng Fan, Xiyuan Hu, Chen Chen, Silong Peng | In this paper, we explicitly formulate the deformation as locally rigid motions guided by some seed points, and the formulated deformation satisfies coherent local motions everywhere on a face. |
1111 | Unsupervised Part-Based Disentangling of Object Shape and Appearance | Dominik Lorenz, Leonard Bereska, Timo Milbich, Bjorn Ommer | We present an unsupervised approach for disentangling appearance and shape by learning parts consistently over all instances of a category. |
1112 | Monocular Total Capture: Posing Face, Body, and Hands in the Wild | Donglai Xiang, Hanbyul Joo, Yaser Sheikh | We present the first method to capture the 3D total motion of a target person from a monocular view input. To train our network, we collect a new 3D human motion dataset capturing diverse total body motion of 40 subjects in a multiview system. |
1113 | Expressive Body Capture: 3D Hands, Face, and Body From a Single Image | Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, Michael J. Black | To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. |
1114 | Neural RGB→D Sensing: Depth and Uncertainty From a Video Camera | Chao Liu, Jinwei Gu, Kihwan Kim, Srinivasa G. Narasimhan, Jan Kautz | In this paper, we propose a deep learning (DL) method to estimate per-pixel depth and its uncertainty continuously from a monocular video stream, with the goal of effectively turning an RGB camera into an RGB-D camera. |
1115 | DAVANet: Stereo Deblurring With View Aggregation | Shangchen Zhou, Jiawei Zhang, Wangmeng Zuo, Haozhe Xie, Jinshan Pan, Jimmy S. Ren | By exploiting the two-view nature of stereo images, we propose a novel stereo image deblurring network with Depth Awareness and View Aggregation, named DAVANet. Moreover, we present a large-scale multi-scene dataset for stereo deblurring, containing 20,637 blurry-sharp stereo image pairs from 135 diverse sequences and their corresponding bidirectional disparities. |
1116 | DVC: An End-To-End Deep Video Compression Framework | Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, Zhiyong Gao | In this paper, taking advantage of both classical architecture in the conventional video compression method and the powerful non-linear representation ability of neural networks, we propose the first end-to-end video compression deep model that jointly optimizes all the components for video compression. |
1117 | SOSNet: Second Order Similarity Regularization for Local Descriptor Learning | Yurun Tian, Xin Yu, Bin Fan, Fuchao Wu, Huub Heijnen, Vassileios Balntas | In this work, we explore the potential of second order similarity (SOS) in the field of descriptor learning by building upon the intuition that a positive pair of matching points should exhibit similar distances with respect to other points in the embedding space. |
1118 | “Double-DIP”: Unsupervised Image Decomposition via Coupled Deep-Image-Priors | Yosef Gandelsman, Assaf Shocher, Michal Irani | In this paper we propose a unified framework for unsupervised layer decomposition of a single image, based on coupled “Deep-image-Prior” (DIP) networks. |
1119 | Unprocessing Images for Learned Raw Denoising | Tim Brooks, Ben Mildenhall, Tianfan Xue, Jiawen Chen, Dillon Sharlet, Jonathan T. Barron | To address this, we present a technique to “unprocess” images by inverting each step of an image processing pipeline, thereby allowing us to synthesize realistic raw sensor measurements from commonly available Internet photos. |
1120 | Residual Networks for Light Field Image Super-Resolution | Shuo Zhang, Youfang Lin, Hao Sheng | In this paper, a learning-based method using residual convolutional networks is proposed to reconstruct light fields with higher spatial resolution. |
1121 | Modulating Image Restoration With Continual Levels via Adaptive Feature Modification Layers | Jingwen He, Chao Dong, Yu Qiao | We take a step forward by proposing a unified CNN framework that adds only a few parameters over a single-level model yet can handle arbitrary restoration levels between a start and an end level. |
1122 | Second-Order Attention Network for Single Image Super-Resolution | Tao Dai, Jianrui Cai, Yongbing Zhang, Shu-Tao Xia, Lei Zhang | To address this issue, in this paper, we propose a second-order attention network (SAN) for more powerful feature expression and feature correlation learning. |
1123 | Devil Is in the Edges: Learning Semantic Boundaries From Noisy Annotations | David Acuna, Amlan Kar, Sanja Fidler | We propose a simple new layer and loss that can be used with existing learning-based boundary detectors. |
1124 | Path-Invariant Map Networks | Zaiwei Zhang, Zhenxiao Liang, Lemeng Wu, Xiaowei Zhou, Qixing Huang | In this paper, we study a natural self-supervision constraint for directed map networks called path-invariance, which enforces that composite maps along different paths between a fixed pair of source and target domains are identical. |
1125 | FilterReg: Robust and Efficient Probabilistic Point-Set Registration Using Gaussian Filter and Twist Parameterization | Wei Gao, Russ Tedrake | In this paper, we contribute a novel probabilistic registration method that achieves state-of-the-art robustness as well as substantially faster computational performance than modern ICP implementations. |
1126 | Probabilistic Permutation Synchronization Using the Riemannian Structure of the Birkhoff Polytope | Tolga Birdal, Umut Simsekli | We present an entirely new geometric and probabilistic approach to synchronization of correspondences across multiple sets of objects or images. |
1127 | Lifting Vectorial Variational Problems: A Natural Formulation Based on Geometric Measure Theory and Discrete Exterior Calculus | Thomas Mollenhoff, Daniel Cremers | We approach the relaxation and convexification of such vectorial variational problems via a lifting to the space of currents. |
1128 | A Sufficient Condition for Convergences of Adam and RMSProp | Fangyu Zou, Li Shen, Zequn Jie, Weizhong Zhang, Wei Liu | In contrast with existing approaches, we introduce an alternative easy-to-check sufficient condition, which merely depends on the parameters of the base learning rate and combinations of historical second-order moments, to guarantee the global convergence of generic Adam/RMSProp for solving large-scale non-convex stochastic optimization. |
1129 | Guaranteed Matrix Completion Under Multiple Linear Transformations | Chao Li, Wei He, Longhao Yuan, Zhun Sun, Qibin Zhao | To tackle this problem, we propose a more general framework for LRMC, in which the linear transformations of the data are taken into account. |
1130 | MAP Inference via Block-Coordinate Frank-Wolfe Algorithm | Paul Swoboda, Vladimir Kolmogorov | We present a new proximal bundle method for Maximum-A-Posteriori (MAP) inference in structured energy minimization problems. |
1131 | A Convex Relaxation for Multi-Graph Matching | Paul Swoboda, Dagmar Kainmüller, Ashkan Mokarian, Christian Theobalt, Florian Bernard | We present a convex relaxation for the multi-graph matching problem. |
1132 | Pixel-Adaptive Convolutional Neural Networks | Hang Su, Varun Jampani, Deqing Sun, Orazio Gallo, Erik Learned-Miller, Jan Kautz | We propose a pixel-adaptive convolution (PAC) operation, a simple yet effective modification of standard convolutions, in which the filter weights are multiplied with a spatially varying kernel that depends on learnable, local pixel features. |
1133 | Single-Frame Regularization for Temporally Stable CNNs | Gabriel Eilertsen, Rafal K. Mantiuk, Jonas Unger | We take a different approach to the problem, posing temporal stability as a regularization of the cost function. |
1134 | An End-To-End Network for Generating Social Relationship Graphs | Arushi Goel, Keng Teck Ma, Cheston Tan | We introduce a novel end-to-end-trainable neural network that is capable of generating a Social Relationship Graph – a structured, unified representation of social relationships and attributes – from a given input image. |
1135 | Meta-Learning Convolutional Neural Architectures for Multi-Target Concrete Defect Classification With the COncrete DEfect BRidge IMage Dataset | Martin Mundt, Sagnik Majumder, Sreenivas Murali, Panagiotis Panetsos, Visvanathan Ramesh | In this work we introduce the novel COncrete DEfect BRidge IMage dataset (CODEBRIM) for multi-target classification of five commonly appearing concrete defects. |
1136 | ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model | Haichuan Yang, Yuhao Zhu, Ji Liu | This paper proposes ECC, a framework that compresses DNNs to meet a given energy constraint while minimizing accuracy loss. |
1137 | SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization | Shijie Cao, Lingxiao Ma, Wencong Xiao, Chen Zhang, Yunxin Liu, Lintao Zhang, Lanshun Nie, Zhi Yang | In this paper we present a novel and general method to accelerate convolutional neural network (CNN) inference by taking advantage of feature map sparsity. |
1138 | Defending Against Adversarial Attacks by Randomized Diversification | Olga Taran, Shideh Rezaeifar, Taras Holotyak, Slava Voloshynovskiy | In this paper, we propose randomized diversification as a defense strategy. |
1139 | Rob-GAN: Generator, Discriminator, and Adversarial Attacker | Xuanqing Liu, Cho-Jui Hsieh | Combining these two insights, we develop a framework called Rob-GAN to jointly optimize generator and discriminator in the presence of adversarial attacks—the generator generates fake images to fool discriminator; the adversarial attacker perturbs real images to fool discriminator, and the discriminator wants to minimize loss under fake and adversarial images. |
1140 | Learning From Noisy Labels by Regularized Estimation of Annotator Confusion | Ryutaro Tanno, Ardavan Saeedi, Swami Sankaranarayanan, Daniel C. Alexander, Nathan Silberman | In this work, we present a method for simultaneously learning the individual annotator model and the underlying true label distribution, using only noisy observations. |
1141 | Task-Free Continual Learning | Rahaf Aljundi, Klaas Kelchtermans, Tinne Tuytelaars | Therefore we investigate how to transform continual learning to an online setup. |
1142 | Importance Estimation for Neural Network Pruning | Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, Jan Kautz | We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. |
1143 | Detecting Overfitting of Deep Generative Networks via Latent Recovery | Ryan Webster, Julien Rabin, Loic Simon, Frederic Jurie | We address this question by i) showing how simple losses are highly effective at reconstructing images for deep generators, and ii) analyzing the statistics of reconstruction errors for training versus validation images. |
1144 | Coloring With Limited Data: Few-Shot Colorization via Memory Augmented Networks | Seungjoo Yoo, Hyojin Bahng, Sunghyo Chung, Junsoo Lee, Jaehyuk Chang, Jaegul Choo | To tackle this issue, we present a novel memory-augmented colorization model MemoPainter that can produce high-quality colorization with limited data. |
1145 | Characterizing and Avoiding Negative Transfer | Zirui Wang, Zihang Dai, Barnabas Poczos, Jaime Carbonell | This paper proposes a formal definition of negative transfer and analyzes three important aspects thereof. |
1146 | Building Efficient Deep Neural Networks With Unitary Group Convolutions | Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang | We propose unitary group convolutions (UGConvs), a building block for CNNs which compose a group convolution with unitary transforms in feature space to learn a richer set of representations than group convolution alone. |
1147 | Semi-Supervised Learning With Graph Learning-Convolutional Networks | Bo Jiang, Ziyan Zhang, Doudou Lin, Jin Tang, Bin Luo | In this paper, we propose a novel Graph Learning-Convolutional Network (GLCN) for graph data representation and semi-supervised learning. |
1148 | Learning to Remember: A Synaptic Plasticity Driven Framework for Continual Learning | Oleksiy Ostapenko, Mihai Puscas, Tassilo Klein, Patrick Jahnichen, Moin Nabi | In order to tackle these challenges, we introduce Dynamic Generative Memory (DGM), a synaptic plasticity driven framework for continual learning. |
1149 | AIRD: Adversarial Learning Framework for Image Repurposing Detection | Ayush Jaiswal, Yue Wu, Wael AbdAlmageed, Iacopo Masi, Premkumar Natarajan | In this paper, we present a novel method for image repurposing detection that is based on the real-world adversarial interplay between a bad actor who repurposes images with counterfeit metadata and a watchdog who verifies the semantic consistency between images and their accompanying metadata, where both players have access to a reference dataset of verified content, which they can use to achieve their goals. |
1150 | A Kernelized Manifold Mapping to Diminish the Effect of Adversarial Perturbations | Saeid Asgari Taghanaki, Kumar Abhishek, Shekoofeh Azizi, Ghassan Hamarneh | To tackle this problem, we propose a non-linear radial basis convolutional feature mapping by learning a Mahalanobis-like distance function. |
1151 | Trust Region Based Adversarial Attack on Neural Networks | Zhewei Yao, Amir Gholami, Peng Xu, Kurt Keutzer, Michael W. Mahoney | To address this problem, we present a new family of trust region based adversarial attacks, with the goal of computing adversarial perturbations efficiently. |
1152 | PEPSI : Fast Image Inpainting With Parallel Decoding Network | Min-cheol Sagong, Yong-goo Shin, Seung-wook Kim, Seung Park, Sung-jea Ko | To solve this problem, in this paper, we present a novel network structure, called PEPSI: parallel extended-decoder path for semantic inpainting. |
1153 | Model-Blind Video Denoising via Frame-To-Frame Training | Thibaud Ehret, Axel Davy, Jean-Michel Morel, Gabriele Facciolo, Pablo Arias | In this paper we propose a fully blind video denoising method, with two versions: off-line and on-line. |
1154 | End-To-End Efficient Representation Learning via Cascading Combinatorial Optimization | Yeonwoo Jeong, Yoonsung Kim, Hyun Oh Song | We develop hierarchically quantized efficient embedding representations for similarity-based search and show that this representation not only provides state-of-the-art search accuracy but also yields several orders of magnitude speed-up during inference. |
1155 | Sim-Real Joint Reinforcement Transfer for 3D Indoor Navigation | Fengda Zhu, Linchao Zhu, Yi Yang | Specifically, our method employs an adversarial feature adaptation model for visual representation transfer and a policy mimic strategy for policy behavior imitation. |
1156 | ChamNet: Towards Efficient Network Design Through Platform-Aware Model Adaptation | Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, Niraj K. Jha | We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors. |
1157 | Regularizing Activation Distribution for Training Binarized Deep Networks | Ruizhou Ding, Ting-Wu Chin, Zeye Liu, Diana Marculescu | In this paper, we propose to use distribution loss to explicitly regularize the activation flow, and develop a framework to systematically formulate the loss. |
1158 | Robustness Verification of Classification Deep Neural Networks via Linear Programming | Wang Lin, Zhengfeng Yang, Xin Chen, Qingye Zhao, Xiangkun Li, Zhiming Liu, Jifeng He | In this paper, we develop a novel method for robustness verification of CDNNs with sigmoid activation functions. |
1159 | Additive Adversarial Learning for Unbiased Authentication | Jian Liang, Yuren Cao, Chenbin Zhang, Shiyu Chang, Kun Bai, Zenglin Xu | To address this issue, we propose a novel two-stage method that disentangles the class/identity from domain-differences, and we consider multiple types of domain-difference. |
1160 | Simultaneously Optimizing Weight and Quantizer of Ternary Neural Network Using Truncated Gaussian Approximation | Zhezhi He, Deliang Fan | In this work, we propose a novel ternarized neural network training method which simultaneously optimizes both weights and quantizer during training, in contrast to prior works. |
1161 | Adversarial Defense by Stratified Convolutional Sparse Coding | Bo Sun, Nian-Hsuan Tsai, Fangchen Liu, Ronald Yu, Hao Su | We propose an adversarial defense method that achieves state-of-the-art performance among attack-agnostic adversarial defense methods while also maintaining robustness to input resolution, scale of adversarial perturbation, and scale of dataset size. |
1162 | Exploring Object Relation in Mean Teacher for Cross-Domain Detection | Qi Cai, Yingwei Pan, Chong-Wah Ngo, Xinmei Tian, Lingyu Duan, Ting Yao | In this work, we advance this Mean Teacher paradigm to be applicable for cross-domain detection. |
1163 | Hierarchical Disentanglement of Discriminative Latent Features for Zero-Shot Learning | Bin Tong, Chao Wang, Martin Klinkigt, Yoshiyuki Kobayashi, Yuuichi Nonaka | In this paper, we discuss two questions about generalization that are seldom discussed. |
1164 | R2GAN: Cross-Modal Recipe Retrieval With Generative Adversarial Network | Bin Zhu, Chong-Wah Ngo, Jingjing Chen, Yanbin Hao | This paper studies a new version of GAN, named Recipe Retrieval Generative Adversarial Network (R2GAN), to explore the feasibility of generating images from procedure text for the retrieval problem. |
1165 | Rethinking Knowledge Graph Propagation for Zero-Shot Learning | Michael Kampffmeyer, Yinbo Chen, Xiaodan Liang, Hao Wang, Yujia Zhang, Eric P. Xing | In order to still enjoy the benefit brought by the graph structure while preventing dilution of knowledge from distant nodes, we propose a Dense Graph Propagation (DGP) module with carefully designed direct links among distant nodes. |
1166 | Learning to Learn Image Classifiers With Visual Analogy | Linjun Zhou, Peng Cui, Shiqiang Yang, Wenwu Zhu, Qi Tian | In this paper, we attempt to investigate a new human-like learning method by organically combining these two mechanisms. |
1167 | Where’s Wally Now? Deep Generative and Discriminative Embeddings for Novelty Detection | Philippe Burlina, Neil Joshi, I-Jeng Wang | We address these challenges via the following contributions: we propose a novel framework for measuring the performance of novelty detection methods, using a trade-space that shows performance (measured by ROC AUC) as a function of problem complexity. |
1168 | Weakly Supervised Image Classification Through Noise Regularization | Mengying Hu, Hu Han, Shiguang Shan, Xilin Chen | In this work, we propose an effective approach for weakly supervised image classification utilizing massive noisy labeled data with only a small set of clean labels (e.g., 5%). |
1169 | Data-Driven Neuron Allocation for Scale Aggregation Networks | Yi Li, Zhanghui Kuang, Yimin Chen, Wayne Zhang | In this paper, we propose to learn the neuron allocation for aggregating multi-scale information in different building blocks of a deep network. |
1170 | Graphical Contrastive Losses for Scene Graph Parsing | Ji Zhang, Kevin J. Shih, Ahmed Elgammal, Andrew Tao, Bryan Catanzaro | We propose a set of contrastive loss formulations that specifically target these types of errors within the scene graph parsing problem, collectively termed the Graphical Contrastive Losses. |
1171 | Deep Transfer Learning for Multiple Class Novelty Detection | Pramuditha Perera, Vishal M. Patel | We propose a transfer learning-based solution for the problem of multiple class novelty detection. |
1172 | QATM: Quality-Aware Template Matching for Deep Learning | Jiaxin Cheng, Yue Wu, Wael AbdAlmageed, Premkumar Natarajan | In this paper, we propose a novel quality-aware template matching method, which is not only used as a standalone template matching algorithm, but also a trainable layer that can be easily plugged in any deep neural network. |
1173 | Retrieval-Augmented Convolutional Neural Networks Against Adversarial Examples | Jake Zhao (Junbo), Kyunghyun Cho | We propose a retrieval-augmented convolutional network (RaCNN) and propose to train it with local mixup, a novel variant of the recently proposed mixup algorithm. |
1174 | Learning Cross-Modal Embeddings With Adversarial Networks for Cooking Recipes and Food Images | Hao Wang, Doyen Sahoo, Chenghao Liu, Ee-peng Lim, Steven C. H. Hoi | In this paper, we investigate an open research task of cross-modal retrieval between cooking recipes and food images, and propose a novel framework Adversarial Cross-Modal Embedding (ACME) to resolve the cross-modal retrieval task in food domains. |
1175 | FastDraw: Addressing the Long Tail of Lane Detection by Adapting a Sequential Prediction Network | Jonah Philion | In this paper, we use lane detection to study modeling and training techniques that yield better performance on real world test drives. |
1176 | Weakly Supervised Video Moment Retrieval From Text Queries | Niluthpol Chowdhury Mithun, Sujoy Paul, Amit K. Roy-Chowdhury | In order to cope with this issue, in this work, we introduce the problem of learning from weak labels for the task of text to video moment retrieval. |
1177 | Content-Aware Multi-Level Guidance for Interactive Instance Segmentation | Soumajit Majumder, Angela Yao | We propose a novel transformation of user clicks to generate content-aware guidance maps that leverage the hierarchical structural information present in an image. |
1178 | Greedy Structure Learning of Hierarchical Compositional Models | Adam Kortylewski, Aleksander Wieczorek, Mario Wieser, Clemens Blumer, Sonali Parbhoo, Andreas Morel-Forster, Volker Roth, Thomas Vetter | In this work, we consider the problem of learning a hierarchical generative model of an object from a set of images which show examples of the object in the presence of variable background clutter. |
1179 | Interactive Full Image Segmentation by Considering All Regions Jointly | Eirikur Agustsson, Jasper R. R. Uijlings, Vittorio Ferrari | We propose an interactive, scribble-based annotation framework which operates on the whole image to produce segmentations for all regions. |
1180 | Learning Active Contour Models for Medical Image Segmentation | Xu Chen, Bryan M. Williams, Srinivasa R. Vallabhaneni, Gabriela Czanner, Rachel Williams, Yalin Zheng | Our aim was to tackle this limitation by developing a new deep learning based model which takes into account the area inside and outside the region of interest, as well as the size of boundaries, during learning. |
1181 | Customizable Architecture Search for Semantic Segmentation | Yiheng Zhang, Zhaofan Qiu, Jingen Liu, Ting Yao, Dong Liu, Tao Mei | In this paper, we propose a Customizable Architecture Search (CAS) approach to automatically generate a network architecture for semantic image segmentation. |
1182 | Local Features and Visual Words Emerge in Activations | Oriane Simeoni, Yannis Avrithis, Ondrej Chum | We propose a novel method of deep spatial matching (DSM) for image retrieval. |
1183 | Hyperspectral Image Super-Resolution With Optimized RGB Guidance | Ying Fu, Tao Zhang, Yinqiang Zheng, Debing Zhang, Hua Huang | In this paper, we first present a simple and efficient convolutional neural network (CNN) based method for HSI super-resolution in an unsupervised way, without any prior training. We then append a CSR optimization layer onto the HSI super-resolution network, either to automatically select the best CSR in a given CSR dataset, or to design the optimal CSR under some physical restrictions. |
1184 | Adaptive Confidence Smoothing for Generalized Zero-Shot Learning | Yuval Atzmon, Gal Chechik | Here we describe a probabilistic approach that breaks the model into three modular components, and then combines them in a consistent way. |
1185 | PMS-Net: Robust Haze Removal Based on Patch Map for Single Images | Wei-Ting Chen, Jian-Jiun Ding, Sy-Yen Kuo | In this paper, we propose a novel haze removal algorithm based on a new feature called the patch map. |
1186 | Deep Spherical Quantization for Image Search | Sepehr Eghbali, Ladan Tahvildari | In this paper, we put forward Deep Spherical Quantization (DSQ), a novel method to make deep convolutional neural networks generate supervised and compact binary codes for efficient image search. |
1187 | Large-Scale Interactive Object Segmentation With Human Annotators | Rodrigo Benenson, Stefan Popov, Vittorio Ferrari | In this paper we make several contributions to interactive segmentation: (1) we systematically explore in simulation the design space of deep interactive segmentation models and report new insights and caveats; (2) we execute a large-scale annotation campaign with real human annotators, producing masks for 2.5M instances on the OpenImages dataset. We released this data publicly, forming the largest existing dataset for instance segmentation. |
1188 | A Poisson-Gaussian Denoising Dataset With Real Fluorescence Microscopy Images | Yide Zhang, Yinhao Zhu, Evan Nichols, Qingfei Wang, Siyuan Zhang, Cody Smith, Scott Howard | In this paper, we fill this gap by constructing a dataset – the Fluorescence Microscopy Denoising (FMD) dataset – that is dedicated to Poisson-Gaussian denoising. |
1189 | Task Agnostic Meta-Learning for Few-Shot Learning | Muhammad Abdullah Jamal, Guo-Jun Qi | Specifically, we present an entropy-based approach that meta-learns an unbiased initial model with the largest uncertainty over the output labels by preventing it from over-performing in classification tasks. |
1190 | Progressive Ensemble Networks for Zero-Shot Recognition | Meng Ye, Yuhong Guo | In this paper, we propose a novel progressive ensemble network model with multiple projected label embeddings to address zero-shot image recognition. |
1191 | Direct Object Recognition Without Line-Of-Sight Using Optical Coherence | Xin Lei, Liangyu He, Yixuan Tan, Ken Xingze Wang, Xinggang Wang, Yihan Du, Shanhui Fan, Zongfu Yu | We introduce a novel approach based on speckle pattern recognition with a deep neural network, which is simpler and more robust than other NLOS recognition methods. |
1192 | Atlas of Digital Pathology: A Generalized Hierarchical Histological Tissue Type-Annotated Database for Deep Learning | Mahdi S. Hosseini, Lyndon Chan, Gabriel Tse, Michael Tang, Jun Deng, Sajad Norouzi, Corwyn Rowsell, Konstantinos N. Plataniotis, Savvas Damaskinos | In this paper, we propose a new digital pathology database, the “Atlas of Digital Pathology” (or ADP), which comprises 17,668 patch images extracted from 100 slides annotated with up to 57 hierarchical HTTs. |
1193 | Perturbation Analysis of the 8-Point Algorithm: A Case Study for Wide FoV Cameras | Thiago L. T. da Silveira, Claudio R. Jung | This paper presents a perturbation analysis for the estimate of epipolar matrices using the 8-Point Algorithm (8-PA). |
1194 | Robustness of 3D Deep Learning in an Adversarial Setting | Matthew Wicker, Marta Kwiatkowska | In this work, we develop an algorithm for analysis of pointwise robustness of neural networks that operate on 3D data. |
1195 | SceneCode: Monocular Dense Semantic Reconstruction Using Learned Encoded Scene Representations | Shuaifeng Zhi, Michael Bloesch, Stefan Leutenegger, Andrew J. Davison | We introduce a new compact and optimisable semantic representation by training a variational auto-encoder that is conditioned on a colour image. |
1196 | StereoDRNet: Dilated Residual StereoNet | Rohan Chabra, Julian Straub, Christopher Sweeney, Richard Newcombe, Henry Fuchs | We propose a system that uses a convolutional neural network (CNN) to estimate depth from a stereo pair followed by volumetric fusion of the predicted depth maps to produce a 3D reconstruction of a scene. |
1197 | The Alignment of the Spheres: Globally-Optimal Spherical Mixture Alignment for Camera Pose Estimation | Dylan Campbell, Lars Petersson, Laurent Kneip, Hongdong Li, Stephen Gould | Hence, we cast the problem as a 2D-3D mixture model alignment task and propose the first globally-optimal solution to this formulation under the robust L2 distance between mixture distributions. |
1198 | Learning Joint Reconstruction of Hands and Manipulated Objects | Yana Hasson, Gul Varol, Dimitrios Tzionas, Igor Kalevatykh, Michael J. Black, Ivan Laptev, Cordelia Schmid | In this work, we regularize the joint reconstruction of hands and objects with manipulation constraints. To train and evaluate the model, we also propose a new large-scale synthetic dataset, ObMan, with hand-object manipulations. |
1199 | Deep Single Image Camera Calibration With Radial Distortion | Manuel Lopez, Roger Mari, Pau Gargallo, Yubin Kuang, Javier Gonzalez-Jimenez, Gloria Haro | In this work we propose a method to predict extrinsic (tilt and roll) and intrinsic (focal length and radial distortion) parameters from a single image. |
1200 | CAM-Convs: Camera-Aware Multi-Scale Convolutions for Single-View Depth | Jose M. Facil, Benjamin Ummenhofer, Huizhong Zhou, Luis Montesano, Thomas Brox, Javier Civera | In this work, we propose a new type of convolution that can take the camera parameters into account, thus allowing neural networks to learn calibration-aware patterns. |
1201 | Translate-to-Recognize Networks for RGB-D Scene Recognition | Dapeng Du, Limin Wang, Huiling Wang, Kai Zhao, Gangshan Wu | To this end, this paper presents a unified framework to integrate the tasks of cross-modal translation and modality-specific recognition, termed Translate-to-Recognize Network (TRecgNet). |
1202 | Re-Identification Supervised Texture Generation | Jian Wang, Yunshan Zhong, Yachun Li, Chi Zhang, Yichen Wei | In this paper, we propose an end-to-end learning strategy to generate textures of human bodies under the supervision of person re-identification. |
1203 | Action4D: Online Action Recognition in the Crowd and Clutter | Quanzeng You, Hao Jiang | As a first step, we propose a new method to track people in 4D, which can reliably detect and follow each person in real time. |
1204 | Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction | Jason Ku, Alex D. Pon, Steven L. Waslander | We present MonoPSR, a monocular 3D object detection method that leverages proposals and shape reconstruction. |
1205 | Attribute-Aware Face Aging With Wavelet-Based Generative Adversarial Networks | Yunfan Liu, Qi Li, Zhenan Sun | In this paper, we propose an attribute-aware face aging model with wavelet based Generative Adversarial Networks (GANs) to address the above issues. |
1206 | Noise-Tolerant Paradigm for Training Face Recognition CNNs | Wei Hu, Yangyu Huang, Fan Zhang, Ruirui Li | Thus, we propose a novel training paradigm that employs the idea of weighting samples based on the above probability. |
1207 | Low-Rank Laplacian-Uniform Mixed Model for Robust Face Recognition | Jiayu Dong, Huicheng Zheng, Lina Lian | In this paper, we aim at recognizing identities from faces with varying levels of noises of various forms such as occlusion, pixel corruption, or disguise, and take improving the fitting ability of the error model as the key to addressing this problem. |
1208 | Generalizing Eye Tracking With Bayesian Adversarial Learning | Kang Wang, Rui Zhao, Hui Su, Qiang Ji | To improve the generalization performance, we propose to incorporate adversarial learning and Bayesian inference into a unified framework. |
1209 | Local Relationship Learning With Person-Specific Shape Regularization for Facial Action Unit Detection | Xuesong Niu, Hu Han, Songfan Yang, Yan Huang, Shiguang Shan | To resolve these issues, in this work, we propose a novel AU detection method by utilizing local information and the relationship of individual local face regions. |
1210 | Point-To-Pose Voting Based Hand Pose Estimation Using Residual Permutation Equivariant Layer | Shile Li, Dongheui Lee | In this paper, we present a novel deep learning hand pose estimation method for an unordered point cloud. |
1211 | Improving Few-Shot User-Specific Gaze Adaptation via Gaze Redirection Synthesis | Yu Yu, Gang Liu, Jean-Marc Odobez | In doing so, our contributions are threefold: (i) we design our gaze redirection framework from synthetic data, allowing us to benefit from aligned training sample pairs to predict accurate inverse mapping fields; (ii) we propose a self-supervised approach for domain adaptation; (iii) we exploit the gaze redirection to improve the performance of person-specific gaze estimation. |
1212 | AdaptiveFace: Adaptive Margin and Sampling for Face Recognition | Hao Liu, Xiangyu Zhu, Zhen Lei, Stan Z. Li | In this paper, we argue that the margin should be adapted to different classes. |
1213 | Disentangled Representation Learning for 3D Face Shape | Zi-Hang Jiang, Qianyi Wu, Keyu Chen, Juyong Zhang | In this paper, we present a novel strategy to design disentangled 3D face shape representation. |
1214 | LBS Autoencoder: Self-Supervised Fitting of Articulated Meshes to Point Clouds | Chun-Liang Li, Tomas Simon, Jason Saragih, Barnabas Poczos, Yaser Sheikh | We present LBS-AE, a self-supervised autoencoding algorithm for fitting articulated mesh models to point clouds. |
1215 | PifPaf: Composite Fields for Human Pose Estimation | Sven Kreiss, Lorenzo Bertoni, Alexandre Alahi | We propose a new bottom-up method for multi-person 2D human pose estimation that is particularly well suited for urban mobility such as self-driving cars and delivery robots. |
1216 | TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection | Lin Song, Shiwei Zhang, Gang Yu, Hongbin Sun | In this paper, we define these ambiguous samples as “transitional states”, and propose a Transition-Aware Context Network (TACNet) to distinguish transitional states. |
1217 | Learning Regularity in Skeleton Trajectories for Anomaly Detection in Videos | Romero Morais, Vuong Le, Truyen Tran, Budhaditya Saha, Moussa Mansour, Svetha Venkatesh | We propose a new method to model the normal patterns of human movements in surveillance video for anomaly detection using dynamic skeleton features. |
1218 | Local Temporal Bilinear Pooling for Fine-Grained Action Parsing | Yan Zhang, Siyu Tang, Krikamol Muandet, Christian Jarvers, Heiko Neumann | In this paper we propose a novel bilinear pooling operation, which is used in intermediate layers of a temporal convolutional encoder-decoder net. |
1219 | Improving Action Localization by Progressive Cross-Stream Cooperation | Rui Su, Wanli Ouyang, Luping Zhou, Dong Xu | In this work, we propose a new Progressive Cross-stream Cooperation (PCSC) framework to iteratively improve action localization results and generate better bounding boxes for one stream (i.e., Flow/RGB) by leveraging both region proposals and features from the other stream (i.e., RGB/Flow). |
1220 | Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition | Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu | In this work, we propose a novel two-stream adaptive graph convolutional network (2s-AGCN) for skeleton-based action recognition. |
1221 | A Neural Network Based on SPD Manifold Learning for Skeleton-Based Hand Gesture Recognition | Xuan Son Nguyen, Luc Brun, Olivier Lezoray, Sebastien Bougleux | This paper proposes a new neural network based on SPD manifold learning for skeleton-based hand gesture recognition. |
1222 | Large-Scale Weakly-Supervised Pre-Training for Video Action Recognition | Deepti Ghadiyaram, Du Tran, Dhruv Mahajan | This paper presents an in-depth study of using large volumes of web videos for pre-training video models for the task of action recognition. |
1223 | Learning Spatio-Temporal Representation With Local and Global Diffusion | Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Xinmei Tian, Tao Mei | In this paper, we present a novel framework to boost the spatio-temporal representation learning by Local and Global Diffusion (LGD). |
1224 | Unsupervised Learning of Action Classes With Continuous Temporal Embedding | Anna Kukleva, Hilde Kuehne, Fadime Sener, Jurgen Gall | To address this issue, we propose an unsupervised approach for learning action classes from untrimmed video sequences. |
1225 | Double Nuclear Norm Based Low Rank Representation on Grassmann Manifolds for Clustering | Xinglin Piao, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin | In this paper, we propose a new low-rank model for the high-dimensional data clustering task on the Grassmann manifold, based on the double nuclear norm, which better approximates the rank minimization of a matrix. |
1226 | SR-LSTM: State Refinement for LSTM Towards Pedestrian Trajectory Prediction | Pu Zhang, Wanli Ouyang, Pengfei Zhang, Jianru Xue, Nanning Zheng | In order to address this issue, we propose a data-driven state refinement module for LSTM network (SR-LSTM), which activates the utilization of the current intention of neighbors, and jointly and iteratively refines the current states of all participants in the crowd through a message passing mechanism. |
1227 | Unsupervised Deep Epipolar Flow for Stationary or Dynamic Scenes | Yiran Zhong, Pan Ji, Jianyuan Wang, Yuchao Dai, Hongdong Li | In this paper, we propose Deep Epipolar Flow, an unsupervised optical flow method which incorporates global geometric constraints into network learning. |
1228 | An Efficient Schmidt-EKF for 3D Visual-Inertial SLAM | Patrick Geneva, James Maley, Guoquan Huang | In this paper, we propose a novel, high-precision, efficient visual-inertial (VI)-SLAM algorithm, termed Schmidt-EKF VI-SLAM (SEVIS), which optimally fuses IMU measurements and monocular images in a tightly-coupled manner to provide 3D motion tracking with bounded error. |
1229 | A Neural Temporal Model for Human Motion Prediction | Anand Gopalakrishnan, Ankur Mali, Dan Kifer, Lee Giles, Alexander G. Ororbia | We propose novel neural temporal models for predicting and synthesizing human motion, achieving state-of-the-art performance in modeling long-term motion trajectories while being competitive with prior work in short-term prediction and requiring significantly less computation. |
1230 | Multi-Agent Tensor Fusion for Contextual Trajectory Prediction | Tianyang Zhao, Yifei Xu, Mathew Monfort, Wongun Choi, Chris Baker, Yibiao Zhao, Yizhou Wang, Ying Nian Wu | Specifically, the model encodes multiple agents’ past trajectories and the scene context into a Multi-Agent Tensor, then applies convolutional fusion to capture multiagent interactions while retaining the spatial structure of agents and the scene context. |
1231 | Coordinate-Based Texture Inpainting for Pose-Guided Human Image Generation | Artur Grigorev, Artem Sevastopolsky, Alexander Vakhitov, Victor Lempitsky | We present a new deep learning approach to pose-guided resynthesis of human photographs. |
1232 | On Stabilizing Generative Adversarial Training With Noise | Simon Jenni, Paolo Favaro | We present a novel method and analysis to train generative adversarial networks (GAN) in a stable manner. |
1233 | Self-Supervised GANs via Auxiliary Rotation Loss | Ting Chen, Xiaohua Zhai, Marvin Ritter, Mario Lucic, Neil Houlsby | In this work we exploit two popular unsupervised learning techniques, adversarial training and self-supervision, and take a step towards bridging the gap between conditional and unconditional GANs. |
1234 | Texture Mixer: A Network for Controllable Synthesis and Interpolation of Texture | Ning Yu, Connelly Barnes, Eli Shechtman, Sohrab Amirghodsi, Michal Lukac | To solve it we propose a neural network trained simultaneously on a reconstruction task and a generation task, which can project texture examples onto a latent space where they can be linearly interpolated and projected back onto the image domain, thus ensuring both intuitive control and realistic results. |
1235 | Object-Driven Text-To-Image Synthesis via Adversarial Training | Wenbo Li, Pengchuan Zhang, Lei Zhang, Qiuyuan Huang, Xiaodong He, Siwei Lyu, Jianfeng Gao | In this paper, we propose Object-driven Attentive Generative Adversarial Networks (Obj-GANs) that allow attention-driven, multi-stage refinement for synthesizing complex images from text descriptions. |
1236 | Zoom-In-To-Check: Boosting Video Interpolation via Instance-Level Discrimination | Liangzhe Yuan, Yibo Chen, Hantian Liu, Tao Kong, Jianbo Shi | We propose a light-weight video frame interpolation algorithm. |
1237 | Disentangling Latent Space for VAE by Label Relevant/Irrelevant Dimensions | Zhilin Zheng, Li Sun | Different from CVAE, we present a method for disentangling the latent space into the label relevant and irrelevant dimensions, zs and zu, for a single input. |
1238 | Spectral Reconstruction From Dispersive Blur: A Novel Light Efficient Spectral Imager | Yuanyuan Zhao, Xuemei Hu, Hui Guo, Zhan Ma, Tao Yue, Xun Cao | In this work, we propose a novel multispectral imaging technique that can capture multispectral images with high light efficiency. |
1239 | Quasi-Unsupervised Color Constancy | Simone Bianco, Claudio Cusano | We present here a method for computational color constancy in which a deep convolutional neural network is trained to detect achromatic pixels in color images after they have been converted to grayscale. |
1240 | Deep Defocus Map Estimation Using Domain Adaptation | Junyong Lee, Sungkil Lee, Sunghyun Cho, Seungyong Lee | In this paper, we propose the first end-to-end convolutional neural network (CNN) architecture, Defocus Map Estimation Network (DMENet), for spatially varying defocus map estimation. |
1241 | Using Unknown Occluders to Recover Hidden Scenes | Adam B. Yedidia, Manel Baradad, Christos Thrampoulidis, William T. Freeman, Gregory W. Wornell | In this paper, we relax this often impractical assumption, extending the range of applications for passive occluder-based NLoS imaging systems. |
1242 | Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation | Anurag Ranjan, Varun Jampani, Lukas Balles, Kihwan Kim, Deqing Sun, Jonas Wulff, Michael J. Black | To that end, we introduce Competitive Collaboration, a framework that facilitates the coordinated training of multiple specialized neural networks to solve complex problems. |
1243 | Learning Parallax Attention for Stereo Image Super-Resolution | Longguang Wang, Yingqian Wang, Zhengfa Liang, Zaiping Lin, Jungang Yang, Wei An, Yulan Guo | In this paper, we propose a parallax-attention stereo superresolution network (PASSRnet) to integrate the information from a stereo image pair for SR. |
1244 | Knowing When to Stop: Evaluation and Verification of Conformity to Output-Size Specifications | Chenglong Wang, Rudy Bunel, Krishnamurthy Dvijotham, Po-Sen Huang, Edward Grefenstette, Pushmeet Kohli | In this paper, we study the vulnerability of these models to attacks aimed at changing the output-size that can have undesirable consequences including increased computation and inducing faults in downstream modules that expect outputs of a certain length. |
1245 | Spatial Attentive Single-Image Deraining With a High Quality Real Rain Dataset | Tianyu Wang, Xin Yang, Ke Xu, Shaozhe Chen, Qiang Zhang, Rynson W.H. Lau | In this paper, we address the single image rain removal problem in two ways. Using this method, we construct a large-scale dataset of 29.5K rain/rain-free image pairs that covers a wide range of natural rain scenes. |
1246 | Focus Is All You Need: Loss Functions for Event-Based Vision | Guillermo Gallego, Mathias Gehrig, Davide Scaramuzza | We present a collection and taxonomy of twenty-two objective functions to analyze event alignment in motion compensation approaches. |
1247 | Scalable Convolutional Neural Network for Image Compressed Sensing | Wuzhen Shi, Feng Jiang, Shaohui Liu, Debin Zhao | In this paper, we propose a scalable convolutional neural network (dubbed SCSNet) to achieve scalable sampling and scalable reconstruction with only one model. |
1248 | Event Cameras, Contrast Maximization and Reward Functions: An Analysis | Timo Stoffregen, Lindsay Kleeman | In this work we examine the choice of reward used in contrast maximization, propose a classification of different rewards and show how a reward can be constructed that is more robust to noise and aperture uncertainty. |
1249 | Convolutional Neural Networks Can Be Deceived by Visual Illusions | Alexander Gomez-Villa, Adrian Martin, Javier Vazquez-Corral, Marcelo Bertalmio | In particular, we show that CNNs trained for image denoising, image deblurring, and computational color constancy are able to replicate the human response to visual illusions, and that the extent of this replication varies with respect to variation in architecture and spatial pattern size. |
1250 | PDE Acceleration for Active Contours | Anthony Yezzi, Ganesh Sundaramoorthi, Minas Benyamin | We extend their formulation to the PDE framework, specifically for the infinite dimensional manifold of continuous curves, to introduce acceleration, and its added robustness, into the broad range of PDE based active contours. |
1251 | Dichromatic Model Based Temporal Color Constancy for AC Light Sources | Jun-Sang Yoo, Jong-Ok Kim | In this paper, we propose a novel approach to estimate the illuminant chromaticity of an AC light source using a high-speed camera. |
1252 | Semantic Attribute Matching Networks | Seungryong Kim, Dongbo Min, Somi Jeong, Sunok Kim, Sangryul Jeon, Kwanghoon Sohn | We present semantic attribute matching networks (SAM-Net) for jointly establishing correspondences and transferring attributes across semantically similar images, which intelligently weaves the advantages of the two tasks while overcoming their limitations. |
1253 | Skin-Based Identification From Multispectral Image Data Using CNNs | Takeshi Uemori, Atsushi Ito, Yusuke Moriuchi, Alexander Gatto, Jun Murayama | In this paper, we propose a new biometric identification system based solely on a skin patch from a multispectral image. |
1254 | Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks | Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka | We propose an alternative approach using a second order optimization method that shows similar generalization capability to first order methods, but converges faster and can handle larger mini-batches. |
1255 | Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments | Xueting Li, Sifei Liu, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz | In this paper, we aim to predict affordances of 3D indoor scenes, specifically what human poses are afforded by a given indoor environment, such as sitting on a chair or standing on the floor. |
1256 | PIEs: Pose Invariant Embeddings | Chih-Hui Ho, Pedro Morgado, Amir Persekian, Nuno Vasconcelos | A taxonomic classification of embeddings, according to their level of invariance, is introduced and used to clarify connections between existing embeddings, identify missing approaches, and propose invariant generalizations. |
1257 | Representation Similarity Analysis for Efficient Task Taxonomy & Transfer Learning | Kshitij Dwivedi, Gemma Roig | We address this problem by proposing an approach to assess the relationship between visual tasks and their task-specific models. |
1258 | Object Counting and Instance Segmentation With Image-Level Supervision | Hisham Cholakkal, Guolei Sun, Fahad Shahbaz Khan, Ling Shao | We propose an image-level supervised approach that provides both the global object count and the spatial distribution of object instances by constructing an object category density map. |
1259 | Variational Autoencoders Pursue PCA Directions (by Accident) | Michal Rolinek, Dominik Zietlow, Georg Martius | Alongside providing an intuitive understanding, we justify the statement with full theoretical analysis as well as with experiments. |
1260 | A Relation-Augmented Fully Convolutional Network for Semantic Segmentation in Aerial Scenes | Lichao Mou, Yuansheng Hua, Xiao Xiang Zhu | In this work, we introduce two simple yet effective network units, the spatial relation module and the channel relation module, to learn and reason about global relationships between any two spatial positions or feature maps, and then produce relation-augmented feature representations. |
1261 | Temporal Transformer Networks: Joint Learning of Invariant and Discriminative Time Warping | Suhas Lohit, Qiao Wang, Pavan Turaga | In this paper, we propose a hybrid model-based and data-driven approach to learn warping functions that not just reduce intra-class variability, but also increase inter-class separation. |
1262 | PCAN: 3D Attention Map Learning Using Contextual Information for Point Cloud Based Retrieval | Wenxiao Zhang, Chunxia Xiao | In this paper, we propose a Point Contextual Attention Network (PCAN), which can predict the significance of each local point feature based on point context. |
1263 | Depth Coefficients for Depth Completion | Saif Imran, Yunfei Long, Xiaoming Liu, Daniel Morris | We propose a new representation for depth called Depth Coefficients (DC) to address this problem. |
1264 | Diversify and Match: A Domain Adaptive Representation Learning Paradigm for Object Detection | Taekyung Kim, Minki Jeong, Seunghyeon Kim, Seokeon Choi, Changick Kim | We introduce a novel unsupervised domain adaptation approach for object detection. |
1265 | Good News, Everyone! Context Driven Entity-Aware Captioning for News Images | Ali Furkan Biten, Lluis Gomez, Marcal Rusinol, Dimosthenis Karatzas | In this work, we aim to take a step closer to producing captions that offer a plausible interpretation of the scene, by integrating such contextual information into the captioning pipeline. |
1266 | Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding | Hassan Akbari, Svebor Karaman, Surabhi Bhargava, Brian Chen, Carl Vondrick, Shih-Fu Chang | We address the problem of phrase grounding by learning a multi-level common semantic space shared by the textual and visual modalities. |
1267 | Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning | Nayyer Aafaq, Naveed Akhtar, Wei Liu, Syed Zulqarnain Gilani, Ajmal Mian | We argue that careful designing of visual features for this task is equally important, and present a visual feature encoding technique to generate semantically rich captions using Gated Recurrent Units (GRUs). |
1268 | Pointing Novel Objects in Image Captioning | Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei | In this paper, we propose to address the problem by augmenting standard deep captioning architectures with object learners. |
1269 | Informative Object Annotations: Tell Me Something I Don’t Know | Lior Bracha, Gal Chechik | Motivated by cognitive theories of categorization and communication, we present a new unsupervised approach to model this prior knowledge and quantify the informativeness of a description. |
1270 | Engaging Image Captioning via Personality | Kurt Shuster, Samuel Humeau, Hexiang Hu, Antoine Bordes, Jason Weston | We build models that combine existing work from (i) sentence representations [36] with Transformers trained on 1.7 billion dialogue examples; and (ii) image representations [32] with ResNets trained on 3.5 billion social media images. We collect and release a large dataset of 241,858 such captions conditioned over 215 possible traits. |
1271 | Vision-Based Navigation With Language-Based Assistance via Imitation Learning With Indirect Intervention | Khanh Nguyen, Debadeepta Dey, Chris Brockett, Bill Dolan | To model language-based assistance, we develop a general framework termed Imitation Learning with Indirect Intervention (I3L), and propose a solution that is effective on the VNLA task. |
1272 | TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments | Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, Yoav Artzi | We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent must first follow navigation instructions in a Street View environment to a goal position, and then guess a location in its observed environment described in natural language to find a hidden object. |
1273 | A Simple Baseline for Audio-Visual Scene-Aware Dialog | Idan Schwartz, Alexander G. Schwing, Tamir Hazan | Therefore, in this paper, we provide and carefully analyze a simple baseline for audio-visual scene-aware dialog which is trained end-to-end. |
1274 | End-To-End Learned Random Walker for Seeded Image Segmentation | Lorenzo Cerrone, Alexander Zeilmann, Fred A. Hamprecht | We present an end-to-end learned algorithm for seeded segmentation. |
1275 | Efficient Neural Network Compression | Hyeji Kim, Muhammad Umar Karim Khan, Chong-Min Kyung | In this paper we propose an efficient method for obtaining the rank configuration of the whole network. |
1276 | Cascaded Generative and Discriminative Learning for Microcalcification Detection in Breast Mammograms | Fandong Zhang, Ling Luo, Xinwei Sun, Zhen Zhou, Xiuli Li, Yizhou Yu, Yizhou Wang | In this paper, we propose a hybrid approach that takes advantage of both generative and discriminative models. |
1277 | C3AE: Exploring the Limits of Compact Model for Age Estimation | Chao Zhang, Shuaicheng Liu, Xun Xu, Ce Zhu | In this work, we investigate the limits of compact models for small-scale images and propose an extremely Compact yet efficient Cascade Context-based Age Estimation model (C3AE). |
1278 | Adaptive Weighting Multi-Field-Of-View CNN for Semantic Segmentation in Pathology | Hiroki Tokunaga, Yuki Teramoto, Akihiko Yoshizawa, Ryoma Bise | In this paper, we propose a novel semantic segmentation method, called Adaptive-Weighting-Multi-Field-of-View-CNN (AWMF-CNN), which can adaptively use image features from images with different magnifications to segment multiple cancer subtype regions in the input image. |
1279 | In Defense of Pre-Trained ImageNet Architectures for Real-Time Semantic Segmentation of Road-Driving Images | Marin Orsic, Ivan Kreso, Petra Bevandic, Sinisa Segvic | We propose an alternative approach which achieves a significantly better performance across a wide range of computing budgets. |
1280 | Context-Aware Visual Compatibility Prediction | Guillem Cucurull, Perouz Taslakian, David Vazquez | In this work we propose a method that predicts compatibility between two items based on their visual features, as well as their context. |
1281 | Sim-To-Real via Sim-To-Sim: Data-Efficient Robotic Grasping via Randomized-To-Canonical Adaptation Networks | Stephen James, Paul Wohlhart, Mrinal Kalakrishnan, Dmitry Kalashnikov, Alex Irpan, Julian Ibarz, Sergey Levine, Raia Hadsell, Konstantinos Bousmalis | In this paper, we present Randomized-to-Canonical Adaptation Networks (RCANs), a novel approach to crossing the visual reality gap that uses no real-world data. |
1282 | Multiview 2D/3D Rigid Registration via a Point-Of-Interest Network for Tracking and Triangulation | Haofu Liao, Wei-An Lin, Jiarui Zhang, Jingdan Zhang, Jiebo Luo, S. Kevin Zhou | We propose to tackle the problem of multiview 2D/3D rigid registration for intervention via a Point-Of-Interest Network for Tracking and Triangulation (POINT^2). |
1283 | Context-Aware Spatio-Recurrent Curvilinear Structure Segmentation | Feigege Wang, Yue Gu, Wenxi Liu, Yuanlong Yu, Shengfeng He, Jia Pan | In this paper, we propose a novel curvilinear structure segmentation approach using context-aware spatio-recurrent networks. |
1284 | An Alternative Deep Feature Approach to Line Level Keyword Spotting | George Retsinas, Georgios Louloudis, Nikolaos Stamatopoulos, Giorgos Sfikas, Basilis Gatos | In this work, we propose a time and storage-efficient, deep feature-based approach that enables both the image and textual search options. |
1285 | Dynamics Are Important for the Recognition of Equine Pain in Video | Sofia Broome, Karina Bech Gleerup, Pia Haubro Andersen, Hedvig Kjellstrom | In this study, we propose a deep recurrent two-stream architecture for the task of distinguishing pain from non-pain in videos of horses. |
1286 | LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving | Gregory P. Meyer, Ankit Laddha, Eric Kee, Carlos Vallespi-Gonzalez, Carl K. Wellington | In this paper, we present LaserNet, a computationally efficient method for 3D object detection from LiDAR data for autonomous driving. |
1287 | Machine Vision Guided 3D Medical Image Compression for Efficient Transmission and Accurate Segmentation in the Clouds | Zihao Liu, Xiaowei Xu, Tao Liu, Qi Liu, Yanzhi Wang, Yiyu Shi, Wujie Wen, Meiping Huang, Haiyun Yuan, Jian Zhuang | In this paper, we use deep learning-based medical image segmentation as a vehicle and demonstrate that, interestingly, machines and humans view compression quality differently. |
1288 | PointPillars: Fast Encoders for Object Detection From Point Clouds | Alex H. Lang, Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, Oscar Beijbom | In this paper, we consider the problem of encoding a point cloud into a format appropriate for a downstream detection pipeline. |
1289 | Motion Estimation of Non-Holonomic Ground Vehicles From a Single Feature Correspondence Measured Over N Views | Kun Huang, Yifu Wang, Laurent Kneip | We present the complete theory of this novel solver, and test it on both simulated and real data. |
1290 | From Coarse to Fine: Robust Hierarchical Localization at Large Scale | Paul-Edouard Sarlin, Cesar Cadena, Roland Siegwart, Marcin Dymczyk | In this paper we propose HF-Net, a hierarchical localization approach based on a monolithic CNN that simultaneously predicts local features and global descriptors for accurate 6-DoF localization. |
1291 | Large Scale High-Resolution Land Cover Mapping With Multi-Resolution Data | Caleb Robinson, Le Hou, Kolya Malkin, Rachel Soobitsky, Jacob Czawlytko, Bistra Dilkina, Nebojsa Jojic | In this paper we propose multi-resolution data fusion methods for deep learning-based high-resolution land cover mapping from aerial imagery. |
1292 | Leveraging Heterogeneous Auxiliary Tasks to Assist Crowd Counting | Muming Zhao, Jian Zhang, Chongyang Zhang, Wenjun Zhang | In this paper, we propose to address these issues by leveraging the heterogeneous attributes compounded in the density map. |
1293 | AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations rather than Data | Liheng Zhang, Guo-Jun Qi, Liqiang Wang, Jiebo Luo | In this paper, we present a novel paradigm of unsupervised representation learning by Auto-Encoding Transformation (AET) in contrast to the conventional Auto-Encoding Data (AED) approach. |
1294 | 2.5D Visual Sound | Ruohan Gao, Kristen Grauman | We propose to convert common monaural audio into binaural audio by leveraging video. |