Paper Digest: CVPR 2019 Highlights
You can also download paper highlights by session (15 sessions in total):
3D Multiview; 3D Single View & RGBD; Action & Video;
Applications; Computational Photography & Graphics; Deep Learning;
Face & Body; Language & Reasoning; Low-Level & Optimization;
Motion & Biometrics; Recognition; Scenes & Representation;
Segmentation, Grouping, & Shape; Statistics, Physics, Theory, & Datasets; Synthesis.
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) is one of the top computer vision conferences in the world. In 2019, it was held in Long Beach, California. There were more than 5,000 paper submissions, of which 1,294 were accepted. More than 100 papers also released their code (download link).
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
We thank all authors for writing these interesting papers, and readers for reading our digests. If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get new paper updates customized to your own interests on a daily basis. You are also welcome to follow us on Twitter and LinkedIn for the most recent updates.
Paper Digest Team
team@paperdigest.org
TABLE 1: CVPR 2019 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Finding Task-Relevant Features for Few-Shot Learning by Category Traversal | Hongyang Li, David Eigen, Samuel Dodge, Matthew Zeiler, Xiaogang Wang | In this work, we introduce a Category Traversal Module that can be inserted as a plug-and-play module into most metric-learning based few-shot learners. |
2 | Edge-Labeling Graph Neural Network for Few-Shot Learning | Jongmin Kim, Taesup Kim, Sungwoong Kim, Chang D. Yoo | In this paper, we propose a novel edge-labeling graph neural network (EGNN), which adapts a deep neural network on the edge-labeling graph, for few-shot learning. |
3 | Generating Classification Weights With GNN Denoising Autoencoders for Few-Shot Learning | Spyros Gidaris, Nikos Komodakis | Given an initial recognition model already trained on a set of base classes, the goal of this work is to develop a meta-model for few-shot learning. |
4 | Kervolutional Neural Networks | Chen Wang, Jianfei Yang, Lihua Xie, Junsong Yuan | To solve this problem, a new operation, kervolution (kernel convolution), is introduced to approximate complex behaviors of human perception systems leveraging the kernel trick. |
5 | Why ReLU Networks Yield High-Confidence Predictions Far Away From the Training Data and How to Mitigate the Problem | Matthias Hein, Maksym Andriushchenko, Julian Bitterwolf | For bounded domains like images we propose a new robust optimization technique similar to adversarial training which enforces low confidence predictions far away from the training data. |
6 | On the Structural Sensitivity of Deep Convolutional Networks to the Directions of Fourier Basis Functions | Yusuke Tsuzuku, Issei Sato | As a byproduct of the analysis, we propose an algorithm to create shift-invariant universal adversarial perturbations available in black-box settings. |
7 | Neural Rejuvenation: Improving Deep Network Training by Enhancing Computational Resource Utilization | Siyuan Qiao, Zhe Lin, Jianming Zhang, Alan L. Yuille | In this paper, we study the problem of improving computational resource utilization of neural networks. |
8 | Hardness-Aware Deep Metric Learning | Wenzhao Zheng, Zhaodong Chen, Jiwen Lu, Jie Zhou | This paper presents a hardness-aware deep metric learning (HDML) framework. |
9 | Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation | Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan L. Yuille, Li Fei-Fei | In this paper, we study NAS for semantic image segmentation. |
10 | Learning Loss for Active Learning | Donggeun Yoo, In So Kweon | In this paper, we propose a novel active learning method that is simple but task-agnostic, and works efficiently with the deep networks. |
11 | Striking the Right Balance With Uncertainty | Salman Khan, Munawar Hayat, Syed Waqas Zamir, Jianbing Shen, Ling Shao | In this paper, we demonstrate that the Bayesian uncertainty estimates directly correlate with the rarity of classes and the difficulty level of individual samples. |
12 | AutoAugment: Learning Augmentation Strategies From Data | Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le | In this paper, we describe a simple procedure called AutoAugment to automatically search for improved data augmentation policies. |
13 | SDRSAC: Semidefinite-Based Randomized Approach for Robust Point Cloud Registration Without Correspondences | Huu M. Le, Thanh-Toan Do, Tuan Hoang, Ngai-Man Cheung | This paper presents a novel randomized algorithm for robust point cloud registration without correspondences. |
14 | BAD SLAM: Bundle Adjusted Direct RGB-D SLAM | Thomas Schops, Torsten Sattler, Marc Pollefeys | In contrast, in this paper we present a novel, fast direct BA formulation which we implement in a real-time dense RGB-D SLAM algorithm. In order to facilitate state-of-the-art research on direct RGB-D SLAM, we propose a novel, well-calibrated benchmark for this task that uses synchronized global shutter RGB and depth cameras. |
15 | Revealing Scenes by Inverting Structure From Motion Reconstructions | Francesco Pittaluga, Sanjeev J. Koppal, Sing Bing Kang, Sudipta N. Sinha | In this paper, we show, for the first time, that such point clouds retain enough information to reveal scene appearance and compromise privacy. |
16 | Strand-Accurate Multi-View Hair Capture | Giljoo Nam, Chenglei Wu, Min H. Kim, Yaser Sheikh | In this paper, we present the first method to capture high-fidelity hair geometry with strand-level accuracy. |
17 | DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation | Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, Steven Lovegrove | In this work, we introduce DeepSDF, a learned continuous Signed Distance Function (SDF) representation of a class of shapes that enables high quality shape representation, interpolation and completion from partial and noisy 3D input data. |
18 | Pushing the Boundaries of View Extrapolation With Multiplane Images | Pratul P. Srinivasan, Richard Tucker, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng, Noah Snavely | We explore the problem of view synthesis from a narrow baseline pair of images, and focus on generating high-quality view extrapolations with plausible disocclusions. |
19 | GA-Net: Guided Aggregation Net for End-To-End Stereo Matching | Feihu Zhang, Victor Prisacariu, Ruigang Yang, Philip H.S. Torr | We propose two novel neural net layers, aimed at capturing local and the whole-image cost dependencies respectively. |
20 | Real-Time Self-Adaptive Deep Stereo | Alessio Tonioni, Fabio Tosi, Matteo Poggi, Stefano Mattoccia, Luigi Di Stefano | Instead, we propose to perform unsupervised and continuous online adaptation of a deep stereo network, which allows for preserving its accuracy in any environment. |
21 | LAF-Net: Locally Adaptive Fusion Networks for Stereo Confidence Estimation | Sunok Kim, Seungryong Kim, Dongbo Min, Kwanghoon Sohn | We present a novel method that estimates the confidence map of an initial disparity by making full use of tri-modal input, including matching cost, disparity, and color image through deep networks. |
22 | NM-Net: Mining Reliable Neighbors for Robust Feature Correspondences | Chen Zhao, Zhiguo Cao, Chi Li, Xin Li, Jiaqi Yang | To address this issue, we present a compatibility-specific mining method to search for consistent neighbors. |
23 | Coordinate-Free Carlsson-Weinshall Duality and Relative Multi-View Geometry | Matthew Trager, Martial Hebert, Jean Ponce | We present a coordinate-free description of Carlsson-Weinshall duality between scene points and camera pinholes and use it to derive a new characterization of primal/dual multi-view geometry. |
24 | Deep Reinforcement Learning of Volume-Guided Progressive View Inpainting for 3D Point Scene Completion From a Single Depth Image | Xiaoguang Han, Zhaoxuan Zhang, Dong Du, Mingdai Yang, Jingming Yu, Pan Pan, Xin Yang, Ligang Liu, Zixiang Xiong, Shuguang Cui | We present a deep reinforcement learning method of progressive view inpainting for 3D point scene completion under volume guidance, achieving high-quality scene reconstruction from only a single depth image with severe occlusion. |
25 | Video Action Transformer Network | Rohit Girdhar, Joao Carreira, Carl Doersch, Andrew Zisserman | We introduce the Action Transformer model for recognizing and localizing human actions in video clips. |
26 | Timeception for Complex Action Recognition | Noureldien Hussein, Efstratios Gavves, Arnold W.M. Smeulders | This paper focuses on the temporal aspect of recognizing human activities in videos, an important visual cue that has long been undervalued. |
27 | STEP: Spatio-Temporal Progressive Learning for Video Action Detection | Xitong Yang, Xiaodong Yang, Ming-Yu Liu, Fanyi Xiao, Larry S. Davis, Jan Kautz | In this paper, we propose the Spatio-TEmporal Progressive (STEP) action detector, a progressive learning framework for spatio-temporal action detection in videos. |
28 | Relational Action Forecasting | Chen Sun, Abhinav Shrivastava, Carl Vondrick, Rahul Sukthankar, Kevin Murphy, Cordelia Schmid | This paper focuses on multi-person action forecasting in videos. |
29 | Long-Term Feature Banks for Detailed Video Understanding | Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krahenbuhl, Ross Girshick | In this paper, we enable existing video models to do the same. |
30 | Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic Scenes | Yuke Li | To this end, we propose a novel Imitative Decision Learning (IDL) approach. |
31 | What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment | Paritosh Parmar, Brendan Tran Morris | In this paper, we propose to learn spatio-temporal features that explain three related tasks – fine-grained action recognition, commentary generation, and estimating the AQA score. |
32 | MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation | Shuangjie Xu, Daizong Liu, Linchao Bao, Wei Liu, Pan Zhou | In this paper, we propose a novel approach to defer the decision making for a target object in each frame, until a global view can be established with the entire video being taken into consideration. |
33 | Language-Driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model | Weining Wang, Yan Huang, Liang Wang | Considering that current methods are generally time-consuming due to the dense frame-processing manner, we propose a recurrent neural network based reinforcement learning model which selectively observes a sequence of frames and associates the given sentence with video content in a matching-based manner. |
34 | Gaussian Temporal Awareness Networks for Action Localization | Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei | In this paper, we propose to address the problem by introducing Gaussian kernels to dynamically optimize temporal scale of each action proposal. |
35 | Efficient Video Classification Using Fewer Frames | Shweta Bhardwaj, Mukundhan Srinivasan, Mitesh M. Khapra | In this work, we focus on building compute-efficient video classification models which process fewer frames and hence require fewer FLOPs. |
36 | Parsing R-CNN for Instance-Level Human Analysis | Lu Yang, Qing Song, Zhihui Wang, Ming Jiang | In this paper, we present an end-to-end pipeline for solving the instance-level human analysis, named Parsing R-CNN. |
37 | Large Scale Incremental Learning | Yue Wu, Yinpeng Chen, Lijuan Wang, Yuancheng Ye, Zicheng Liu, Yandong Guo, Yun Fu | We propose a simple and effective method to address this data imbalance issue. |
38 | TopNet: Structural Point Cloud Decoder | Lyne P. Tchapmi, Vineet Kosaraju, Hamid Rezatofighi, Ian Reid, Silvio Savarese | In this work, we propose a novel decoder that generates a structured point cloud without assuming any specific structure or topology on the underlying point set. |
39 | Perceive Where to Focus: Learning Visibility-Aware Part-Level Features for Partial Person Re-Identification | Yifan Sun, Qin Xu, Yali Li, Chi Zhang, Yikang Li, Shengjin Wang, Jian Sun | We propose a Visibility-aware Part Model (VPM) for partial re-ID, which learns to perceive the visibility of regions through self-supervision. |
40 | Meta-Transfer Learning for Few-Shot Learning | Qianru Sun, Yaoyao Liu, Tat-Seng Chua, Bernt Schiele | In this paper, we propose a novel few-shot learning method called meta-transfer learning (MTL), which learns to adapt a deep NN for few-shot learning tasks. |
41 | Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation | Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid | In this paper, we propose to train convolutional neural networks (CNNs) with both binarized weights and activations, leading to quantized models specifically for mobile devices with limited power capacity and computation resources. |
42 | Deep RNN Framework for Visual Sequential Applications | Bo Pang, Kaiwen Zha, Hanwen Cao, Chen Shi, Cewu Lu | To deal with this, we propose a new recurrent neural framework that can be stacked deep effectively. |
43 | Graph-Based Global Reasoning Networks | Yunpeng Chen, Marcus Rohrbach, Zhicheng Yan, Yan Shuicheng, Jiashi Feng, Yannis Kalantidis | In this work, we propose a new approach for reasoning globally in which a set of features are globally aggregated over the coordinate space and then projected to an interaction space where relational reasoning can be efficiently computed. |
44 | SSN: Learning Sparse Switchable Normalization via SparsestMax | Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo | This work addresses this issue by presenting Sparse Switchable Normalization (SSN) where the importance ratios are constrained to be sparse. |
45 | Spherical Fractal Convolutional Neural Networks for Point Cloud Recognition | Yongming Rao, Jiwen Lu, Jie Zhou | We present a generic, flexible and 3D rotation invariant framework based on spherical symmetry for point cloud recognition. |
46 | Learning to Generate Synthetic Data via Compositing | Shashank Tripathi, Siddhartha Chandra, Amit Agrawal, Ambrish Tyagi, James M. Rehg, Visesh Chari | We present a task-specific approach to synthetic data generation. |
47 | Divide and Conquer the Embedding Space for Metric Learning | Artsiom Sanakoyeu, Vadim Tschernezki, Uta Buchler, Bjorn Ommer | In this work, we propose a novel easy-to-implement divide and conquer approach for deep metric learning, which significantly improves the state-of-the-art performance of metric learning. |
48 | Latent Space Autoregression for Novelty Detection | Davide Abati, Angelo Porrello, Simone Calderara, Rita Cucchiara | In our proposal, we design a general unsupervised framework where we equip a deep autoencoder with a parametric density estimator that learns the probability distribution underlying the latent representations with an autoregressive procedure. |
49 | Attending to Discriminative Certainty for Domain Adaptation | Vinod Kumar Kurmi, Shanu Kumar, Vinay P. Namboodiri | In this paper, we aim to solve for unsupervised domain adaptation of classifiers where we have access to label information for the source domain while these are not available for a target domain. |
50 | Feature Denoising for Improving Adversarial Robustness | Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan L. Yuille, Kaiming He | Motivated by this observation, we develop new network architectures that increase adversarial robustness by performing feature denoising. |
51 | Selective Kernel Networks | Xiang Li, Wenhai Wang, Xiaolin Hu, Jian Yang | We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. |
52 | On Implicit Filter Level Sparsity in Convolutional Neural Networks | Dushyant Mehta, Kwang In Kim, Christian Theobalt | We investigate filter level sparsity that emerges in convolutional neural networks (CNNs) which employ Batch Normalization and ReLU activation, and are trained with adaptive gradient descent techniques and L2 regularization or weight decay. |
53 | FlowNet3D: Learning Scene Flow in 3D Point Clouds | Xingyu Liu, Charles R. Qi, Leonidas J. Guibas | In this work, we propose a novel deep neural network named FlowNet3D that learns scene flow from point clouds in an end-to-end fashion. |
54 | Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks | Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese | In this work, we propose a novel memory-based policy, named Scene Memory Transformer (SMT). |
55 | Co-Occurrent Features in Semantic Segmentation | Hang Zhang, Han Zhang, Chenguang Wang, Junyuan Xie | In this paper, we go beyond global context and explore the fine-grained representation using co-occurrent features by introducing Co-occurrent Feature Model, which predicts the distribution of co-occurrent features for a given target. |
56 | Bag of Tricks for Image Classification with Convolutional Neural Networks | Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li | In this paper, we will examine a collection of such refinements and empirically evaluate their impact on the final model accuracy through ablation study. |
57 | Learning Channel-Wise Interactions for Binary Convolutional Neural Networks | Ziwei Wang, Jiwen Lu, Chenxin Tao, Jie Zhou, Qi Tian | In this paper, we propose a channel-wise interaction based binary convolutional neural network learning method (CI-BCNN) for efficient inference. |
58 | Knowledge Adaptation for Efficient Semantic Segmentation | Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan | To tackle this dilemma, we propose a knowledge distillation method tailored for semantic segmentation to improve the performance of the compact FCNs with large overall stride. |
59 | Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness Against Adversarial Attack | Zhezhi He, Adnan Siraj Rakin, Deliang Fan | In this work, we propose Parametric-Noise-Injection (PNI) which involves trainable Gaussian noise injection at each layer on either activation or weights through solving the Min-Max optimization problem, embedded with adversarial training. |
60 | Invariance Matters: Exemplar Memory for Domain Adaptive Person Re-Identification | Zhun Zhong, Liang Zheng, Zhiming Luo, Shaozi Li, Yi Yang | In this work, we comprehensively investigate the intra-domain variations of the target domain and propose to generalize the re-ID model w.r.t. three types of underlying invariance, i.e., exemplar-invariance, camera-invariance and neighborhood-invariance. |
61 | Dissecting Person Re-Identification From the Viewpoint of Viewpoint | Xiaoxiao Sun, Liang Zheng | To derive insights in this scientific campaign, this paper makes an early attempt in studying a particular factor, viewpoint. |
62 | Learning to Reduce Dual-Level Discrepancy for Infrared-Visible Person Re-Identification | Zhixiang Wang, Zheng Wang, Yinqiang Zheng, Yung-Yu Chuang, Shin’ichi Satoh | To address the problem, this paper introduces a novel Dual-level Discrepancy Reduction Learning (D^2RL) scheme which handles the two discrepancies separately. |
63 | Progressive Feature Alignment for Unsupervised Domain Adaptation | Chaoqi Chen, Weiping Xie, Wenbing Huang, Yu Rong, Xinghao Ding, Yue Huang, Tingyang Xu, Junzhou Huang | In this paper, we propose the Progressive Feature Alignment Network (PFAN) to align the discriminative features across domains progressively and effectively, via exploiting the intra-class variation in the target domain. |
64 | Feature-Level Frankenstein: Eliminating Variations for Discriminative Recognition | Xiaofeng Liu, Site Li, Lingsheng Kong, Wanqing Xie, Ping Jia, Jane You, B.V.K. Kumar | In this paper, we cast these problems as an adversarial minimax game in the latent space. |
65 | Learning a Deep ConvNet for Multi-Label Classification With Partial Labels | Thibaut Durand, Nazanin Mehrasa, Greg Mori | To reduce the annotation cost, we propose to train a model with partial labels i.e. only some labels are known per image. |
66 | Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression | Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, Silvio Savarese | In this paper, we address this weakness by introducing a generalized version of IoU as both a new loss and a new metric. |
67 | Densely Semantically Aligned Person Re-Identification | Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Zhibo Chen | We propose a densely semantically aligned person re-identification (re-ID) framework. By leveraging the estimation of the dense semantics of a person image, we construct a set of densely semantically aligned part images (DSAP-images), where the same spatial positions have the same semantics across different person images. |
68 | Generalising Fine-Grained Sketch-Based Image Retrieval | Kaiyue Pang, Ke Li, Yongxin Yang, Honggang Zhang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song | In this paper, we identify cross-category generalisation for FG-SBIR as a domain generalisation problem, and propose the first solution. |
69 | Adapting Object Detectors via Selective Cross-Domain Alignment | Xinge Zhu, Jiangmiao Pang, Ceyuan Yang, Jianping Shi, Dahua Lin | Motivated by this, we propose a novel approach to domain adaptation for object detection to handle the issues in “where to look” and “how to align”. |
70 | Cyclic Guidance for Weakly Supervised Joint Detection and Segmentation | Yunhang Shen, Rongrong Ji, Yan Wang, Yongjian Wu, Liujuan Cao | In particular, we present an efficient and effective framework termed Weakly Supervised Joint Detection and Segmentation (WS-JDS). |
71 | Thinking Outside the Pool: Active Training Image Creation for Relative Attributes | Aron Yu, Kristen Grauman | We propose an active image generation approach to address this issue. |
72 | Generalizable Person Re-Identification by Domain-Invariant Mapping Network | Jifei Song, Yongxin Yang, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales | In this work, a novel deep ReID model termed Domain-Invariant Mapping Network (DIMN) is proposed. |
73 | Visual Attention Consistency Under Image Transforms for Multi-Label Image Classification | Hao Guo, Kang Zheng, Xiaochuan Fan, Hongkai Yu, Song Wang | To address this problem, we propose a two-branch network with an original image and its transformed image as inputs and introduce a new attention consistency loss that measures the attention heatmap consistency between two branches. |
74 | Re-Ranking via Metric Fusion for Object Retrieval and Person Re-Identification | Song Bai, Peng Tang, Philip H.S. Torr, Longin Jan Latecki | Based on the analysis, we propose a unified yet robust algorithm which inherits their advantages and discards their disadvantages. |
75 | Unsupervised Open Domain Recognition by Semantic Discrepancy Minimization | Junbao Zhuo, Shuhui Wang, Shuhao Cui, Qingming Huang | We address the unsupervised open domain recognition (UODR) problem, where the categories in the labeled source domain S are only a subset of those in the unlabeled target domain T. |
76 | Weakly Supervised Person Re-Identification | Jingke Meng, Sheng Wu, Wei-Shi Zheng | We cast this weakly supervised person re-id challenge into a multi-instance multi-label learning (MIML) problem. |
77 | PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud | Shaoshuai Shi, Xiaogang Wang, Hongsheng Li | In this paper, we propose PointRCNN for 3D object detection from raw point cloud. |
78 | Automatic Adaptation of Object Detectors to New Domains Using Self-Training | Aruni RoyChowdhury, Prithvijit Chakrabarty, Ashish Singh, SouYoung Jin, Huaizu Jiang, Liangliang Cao, Erik Learned-Miller | A modified knowledge distillation loss is proposed, and we investigate several ways of assigning soft-labels to the training examples from the target domain. |
79 | Deep Sketch-Shape Hashing With Segmented 3D Stochastic Viewing | Jiaxin Chen, Jie Qin, Li Liu, Fan Zhu, Fumin Shen, Jin Xie, Ling Shao | In this paper, we propose a novel framework for efficient sketch-based 3D shape retrieval, i.e., Deep Sketch-Shape Hashing (DSSH), which tackles the challenging problem from two perspectives. |
80 | Generative Dual Adversarial Network for Generalized Zero-Shot Learning | He Huang, Changhu Wang, Philip S. Yu, Chang-Dong Wang | In this paper, we propose a novel model that provides a unified framework for three different approaches: visual->semantic mapping, semantic->visual mapping, and metric learning. |
81 | Query-Guided End-To-End Person Search | Bharti Munjal, Sikandar Amin, Federico Tombari, Fabio Galasso | We introduce a novel query-guided end-to-end person search network (QEEPS) to address both aspects. |
82 | Libra R-CNN: Towards Balanced Learning for Object Detection | Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, Dahua Lin | In this work, we carefully revisit the standard training practice of detectors, and find that the detection performance is often limited by imbalance during the training process, which generally consists of three levels: sample level, feature level, and objective level. |
83 | Learning a Unified Classifier Incrementally via Rebalancing | Saihui Hou, Xinyu Pan, Chen Change Loy, Zilei Wang, Dahua Lin | In this work, we develop a new framework for incrementally learning a unified classifier, e.g. a classifier that treats both old and new classes uniformly. |
84 | Feature Selective Anchor-Free Module for Single-Shot Object Detection | Chenchen Zhu, Yihui He, Marios Savvides | We motivate and present feature selective anchor-free (FSAF) module, a simple and effective building block for single-shot object detectors. |
85 | Bottom-Up Object Detection by Grouping Extreme and Center Points | Xingyi Zhou, Jiacheng Zhuo, Philipp Krahenbuhl | In this paper, we show that bottom-up approaches still perform competitively. |
86 | Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples | Zihao Liu, Qi Liu, Tao Liu, Nuo Xu, Xue Lin, Yanzhi Wang, Wujie Wen | To overcome these limitations, we propose a JPEG-based defensive compression framework, namely “feature distillation”, to effectively rectify adversarial examples without impacting classification accuracy on benign data. |
87 | SCOPS: Self-Supervised Co-Part Segmentation | Wei-Chih Hung, Varun Jampani, Sifei Liu, Pavlo Molchanov, Ming-Hsuan Yang, Jan Kautz | We propose a self-supervised deep learning approach for part segmentation, where we devise several loss functions that aid in predicting part segments that are geometrically concentrated, robust to object variations, and semantically consistent across different object instances. |
88 | Unsupervised Moving Object Detection via Contextual Information Separation | Yanchao Yang, Antonio Loquercio, Davide Scaramuzza, Stefano Soatto | We propose an adversarial contextual model for detecting moving objects in images. |
89 | Pose2Seg: Detection Free Human Instance Segmentation | Song-Hai Zhang, Ruilong Li, Xin Dong, Paul Rosin, Zixi Cai, Xi Han, Dingcheng Yang, Haozhi Huang, Shi-Min Hu | In this paper, we present a brand new pose-based instance segmentation framework for humans which separates instances based on human pose, rather than proposal region detection. We also introduce a new benchmark, “Occluded Human (OCHuman)”, which focuses on occluded humans with comprehensive annotations including bounding-box, human pose and instance masks. |
90 | DrivingStereo: A Large-Scale Dataset for Stereo Matching in Autonomous Driving Scenarios | Guorun Yang, Xiao Song, Chaoqin Huang, Zhidong Deng, Jianping Shi, Bolei Zhou | In this paper, we construct a novel large-scale stereo dataset named DrivingStereo. |
91 | PartNet: A Large-Scale Benchmark for Fine-Grained and Hierarchical Part-Level 3D Object Understanding | Kaichun Mo, Shilin Zhu, Angel X. Chang, Li Yi, Subarna Tripathi, Leonidas J. Guibas, Hao Su | We present PartNet: a consistent, large-scale dataset of 3D objects annotated with fine-grained, instance-level, and hierarchical 3D part information. |
92 | A Dataset and Benchmark for Large-Scale Multi-Modal Face Anti-Spoofing | Shifeng Zhang, Xiaobo Wang, Ajian Liu, Chenxu Zhao, Jun Wan, Sergio Escalera, Hailin Shi, Zezheng Wang, Stan Z. Li | To facilitate face anti-spoofing research, we introduce a large-scale multi-modal dataset, namely CASIA-SURF, which is the largest publicly available dataset for face anti-spoofing in terms of both subjects and visual modalities. We also provide a measurement set, evaluation protocol and training/validation/testing subsets, developing a new benchmark for face anti-spoofing. |
93 | Unsupervised Learning of Consensus Maximization for 3D Vision Problems | Thomas Probst, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool | In this paper, we propose for the first time an unsupervised learning framework for consensus maximization, in the context of solving 3D vision problems. |
94 | VizWiz-Priv: A Dataset for Recognizing the Presence and Purpose of Private Visual Information in Images Taken by Blind People | Danna Gurari, Qing Li, Chi Lin, Yinan Zhao, Anhong Guo, Abigale Stangl, Jeffrey P. Bigham | We introduce the first visual privacy dataset originating from people who are blind in order to better understand their privacy disclosures and to encourage the development of algorithms that can assist in preventing their unintended disclosures. |
95 | Structural Relational Reasoning of Point Clouds | Yueqi Duan, Yu Zheng, Jiwen Lu, Jie Zhou, Qi Tian | In this paper, we propose an effective plug-and-play module called the structural relation network (SRN) to reason about the structural dependencies of local regions in 3D point clouds. |
96 | MVF-Net: Multi-View 3D Face Morphable Model Regression | Fanzi Wu, Linchao Bao, Yajing Chen, Yonggen Ling, Yibing Song, Songnan Li, King Ngi Ngan, Wei Liu | We in this paper explore 3DMM-based shape recovery in a different setting, where a set of multi-view facial images are given as input. |
97 | Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction | Chen-Hsuan Lin, Oliver Wang, Bryan C. Russell, Eli Shechtman, Vladimir G. Kim, Matthew Fisher, Simon Lucey | In this paper, we address the problem of 3D object mesh reconstruction from RGB videos. |
98 | Guided Stereo Matching | Matteo Poggi, Davide Pallotti, Fabio Tosi, Stefano Mattoccia | Therefore, in this paper, we introduce Guided Stereo Matching, a novel paradigm that leverages a small amount of sparse, yet reliable depth measurements retrieved from an external source to ameliorate this weakness. |
99 | Unsupervised Event-Based Learning of Optical Flow, Depth, and Egomotion | Alex Zihao Zhu, Liangzhe Yuan, Kenneth Chaney, Kostas Daniilidis | In this work, we propose a novel framework for unsupervised learning for event cameras that learns motion information from only the event stream. |
100 | Modeling Local Geometric Structure of 3D Point Clouds Using Geo-CNN | Shiyi Lan, Ruichi Yu, Gang Yu, Larry S. Davis | To address this problem, we propose Geo-CNN, which applies a generic convolution-like operation dubbed as GeoConv to each point and its local neighborhood. |
101 | 3D Point Capsule Networks | Yongheng Zhao, Tolga Birdal, Haowen Deng, Federico Tombari | In this paper, we propose 3D point-capsule networks, an auto-encoder designed to process sparse 3D point clouds while preserving spatial arrangements of the input data. |
102 | GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving | Buyu Li, Wanli Ouyang, Lu Sheng, Xingyu Zeng, Xiaogang Wang | We present an efficient 3D object detection framework based on a single RGB image in the scenario of autonomous driving. |
103 | Single-Image Piece-Wise Planar 3D Reconstruction via Associative Embedding | Zehao Yu, Jia Zheng, Dongze Lian, Zihan Zhou, Shenghua Gao | To tackle this problem, we propose a novel two-stage method based on associative embedding, inspired by its recent success in instance segmentation. |
104 | 3DN: 3D Deformation Network | Weiyue Wang, Duygu Ceylan, Radomir Mech, Ulrich Neumann | Given such a source 3D model and a target which can be a 2D image, 3D model, or a point cloud acquired as a depth scan, we introduce 3DN, an end-to-end network that deforms the source model to resemble the target. |
105 | HorizonNet: Learning Room Layout With 1D Representation and Pano Stretch Data Augmentation | Cheng Sun, Chi-Wei Hsiao, Min Sun, Hwann-Tzong Chen | We present a new approach to the problem of estimating the 3D room layout from a single panoramic image. |
106 | Deep Fitting Degree Scoring Network for Monocular 3D Object Detection | Lijie Liu, Jiwen Lu, Chunjing Xu, Qi Tian, Jie Zhou | In this paper, we propose to learn a deep fitting degree scoring network for monocular 3D object detection, which aims to conclusively score the fitting degree between proposals and objects. |
107 | Pushing the Envelope for RGB-Based Dense 3D Hand Pose Estimation via Neural Rendering | Seungryul Baek, Kwang In Kim, Tae-Kyun Kim | Pushing the Envelope for RGB-Based Dense 3D Hand Pose Estimation via Neural Rendering |
108 | Self-Supervised Learning of 3D Human Pose Using Multi-View Geometry | Muhammed Kocabas, Salih Karagoz, Emre Akbas | To address these problems, we present EpipolarPose, a self-supervised learning method for 3D human pose estimation, which does not need any 3D ground-truth data or camera extrinsics. |
109 | FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose Estimation From a Single Image | Tsun-Yi Yang, Yi-Ting Chen, Yen-Yu Lin, Yung-Yu Chuang | This paper proposes a method for head pose estimation from a single image. |
110 | Dense 3D Face Decoding Over 2500FPS: Joint Texture & Shape Convolutional Mesh Decoders | Yuxiang Zhou, Jiankang Deng, Irene Kotsia, Stefanos Zafeiriou | In this paper, we present the first, to the best of our knowledge, non-linear 3DMMs by learning joint texture and shape auto-encoders using direct mesh convolutions. |
111 | Does Learning Specific Features for Related Parts Help Human Pose Estimation? | Wei Tang, Ying Wu | Ablation experiments indicate learning specific features significantly improves the localization of occluded parts and thus benefits HPE. |
112 | Linkage Based Face Clustering via Graph Convolution Network | Zhongdao Wang, Liang Zheng, Yali Li, Shengjin Wang | In this paper, we present an accurate and scalable approach to the face clustering task. |
113 | Towards High-Fidelity Nonlinear 3D Face Morphable Model | Luan Tran, Feng Liu, Xiaoming Liu | To address this problem, this paper presents a novel approach that learns additional proxies as a means to side-step strong regularizations, as well as leverages to promote detailed shape/albedo. |
114 | RegularFace: Deep Face Recognition via Exclusive Regularization | Kai Zhao, Jingyi Xu, Ming-Ming Cheng | In this paper, we propose the 'exclusive regularization' that focuses on the other aspect of discriminability, the inter-class separability, which is neglected in many recent approaches. |
115 | BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation | Wanhua Li, Jiwen Lu, Jianjiang Feng, Chunjing Xu, Jie Zhou, Qi Tian | In this paper, we propose BridgeNet for age estimation, which aims to mine the continuous relation between age labels effectively. |
116 | GANFIT: Generative Adversarial Network Fitting for High Fidelity 3D Face Reconstruction | Baris Gecer, Stylianos Ploumpis, Irene Kotsia, Stefanos Zafeiriou | In this paper, we take a radically different approach and harness the power of Generative Adversarial Networks (GANs) and DCNNs in order to reconstruct the facial texture and shape from single images. |
117 | Improving the Performance of Unimodal Dynamic Hand-Gesture Recognition With Multimodal Training | Mahdi Abavisani, Hamid Reza Vaezi Joze, Vishal M. Patel | We present an efficient approach for leveraging the knowledge from multiple modalities in training unimodal 3D convolutional neural networks (3D-CNNs) for the task of dynamic hand gesture recognition. |
118 | Learning to Reconstruct People in Clothing From a Single RGB Camera | Thiemo Alldieck, Marcus Magnor, Bharat Lal Bhatnagar, Christian Theobalt, Gerard Pons-Moll | We present Octopus, a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving with a reconstruction accuracy of 4 to 5mm, while being orders of magnitude faster than previous methods. |
119 | Distilled Person Re-Identification: Towards a More Scalable System | Ancong Wu, Wei-Shi Zheng, Xiaowei Guo, Jian-Huang Lai | To solve these problems in a unified system, we propose a Multi-teacher Adaptive Similarity Distillation Framework, which requires only a few labelled identities of target domain to transfer knowledge from multiple teacher models to a user-specified lightweight student model without accessing source domain data. |
120 | A Perceptual Prediction Framework for Self Supervised Event Segmentation | Sathyanarayanan N. Aakur, Sudeep Sarkar | In this paper, we tackle the problem of self-supervised temporal segmentation that alleviates the need for any supervision in the form of labels (full supervision) or temporal ordering (weak supervision). |
121 | COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis | Yansong Tang, Dajun Ding, Yongming Rao, Yu Zheng, Danyang Zhang, Lili Zhao, Jiwen Lu, Jie Zhou | To address these problems, we introduce a large-scale dataset called “COIN” for COmprehensive INstruction video analysis. |
122 | Recurrent Attentive Zooming for Joint Crowd Counting and Precise Localization | Chenchen Liu, Xinyu Weng, Yadong Mu | To address this issue, this work proposes a novel framework that simultaneously solves two inherently related tasks: crowd counting and localization. |
123 | An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition | Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan | In this paper, we propose a novel Attention Enhanced Graph Convolutional LSTM Network (AGC-LSTM) for human action recognition from skeleton data. |
124 | Graph Convolutional Label Noise Cleaner: Train a Plug-And-Play Action Classifier for Anomaly Detection | Jia-Xing Zhong, Nannan Li, Weijie Kong, Shan Liu, Thomas H. Li, Ge Li | In this paper, we provide a new perspective, i.e., a supervised learning task under noisy labels. |
125 | MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment | Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang, Larry S. Davis | In this paper, we present Moment Alignment Network (MAN), a novel framework that unifies the candidate moment encoding and temporal structural reasoning in a single-shot feed-forward network. |
126 | Less Is More: Learning Highlight Detection From Video Duration | Bo Xiong, Yannis Kalantidis, Deepti Ghadiyaram, Kristen Grauman | We propose a scalable unsupervised solution that exploits video duration as an implicit supervision signal. |
127 | DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition | Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan | To remedy these issues, we propose a lightweight generator network, which reduces noises in motion vectors and captures fine motion details, achieving a more Discriminative Motion Cue (DMC) representation. |
128 | AdaFrame: Adaptive Frame Selection for Fast Video Recognition | Zuxuan Wu, Caiming Xiong, Chih-Yao Ma, Richard Socher, Larry S. Davis | We present AdaFrame, a framework that adaptively selects relevant frames on a per-input basis for fast video recognition. |
129 | Spatio-Temporal Video Re-Localization by Warp LSTM | Yang Feng, Lin Ma, Wei Liu, Jiebo Luo | In this paper, we answer the question of when and where by formulating a new task, namely spatio-temporal video re-localization. |
130 | Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization | Daochang Liu, Tingting Jiang, Yizhou Wang | In this work, we first identify two underexplored problems posed by the weak supervision for temporal action localization, namely action completeness modeling and action-context separation. We then present a novel network architecture and training strategy that explicitly address these two problems. |
131 | Unsupervised Deep Tracking | Ning Wang, Yibing Song, Chao Ma, Wengang Zhou, Wei Liu, Houqiang Li | We propose an unsupervised visual tracking method in this paper. |
132 | Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers | Zhen He, Jian Li, Daxue Liu, Hangen He, David Barber | To achieve both label-free and end-to-end learning of MOT, we propose a Tracking-by-Animation framework, where a differentiable neural model first tracks objects from input frames and then animates these objects into reconstructed frames. |
133 | Fast Online Object Tracking and Segmentation: A Unifying Approach | Qiang Wang, Li Zhang, Luca Bertinetto, Weiming Hu, Philip H.S. Torr | In this paper we illustrate how to perform both visual object tracking and semi-supervised video object segmentation, in real-time, with a single simple approach. |
134 | Object Tracking by Reconstruction With View-Specific Discriminative Correlation Filters | Ugur Kart, Alan Lukezic, Matej Kristan, Joni-Kristian Kamarainen, Jiri Matas | Object Tracking by Reconstruction With View-Specific Discriminative Correlation Filters |
135 | SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints | Amir Sadeghian, Vineet Kosaraju, Ali Sadeghian, Noriaki Hirose, Hamid Rezatofighi, Silvio Savarese | We present SoPhie, an interpretable framework based on a Generative Adversarial Network (GAN), which leverages two sources of information: the path history of all the agents in a scene and the scene context information, using images of the scene. |
136 | Leveraging Shape Completion for 3D Siamese Tracking | Silvio Giancola, Jesus Zarzar, Bernard Ghanem | In this paper, we investigate the versatility of Shape Completion for 3D Object Tracking in LIDAR point clouds. |
137 | Target-Aware Deep Tracking | Xin Li, Chao Ma, Baoyuan Wu, Zhenyu He, Ming-Hsuan Yang | In this paper, we propose a novel scheme to learn target-aware features, which can better recognize the targets undergoing significant appearance variations than pre-trained deep features. |
138 | Spatiotemporal CNN for Video Object Segmentation | Kai Xu, Longyin Wen, Guorong Li, Liefeng Bo, Qingming Huang | In this paper, we present a unified, end-to-end trainable spatiotemporal CNN model for VOS, which consists of two branches, i.e., the temporal coherence branch and the spatial segmentation branch. |
139 | Towards Rich Feature Discovery With Class Activation Maps Augmentation for Person Re-Identification | Wenjie Yang, Houjing Huang, Zhang Zhang, Xiaotang Chen, Kaiqi Huang, Shu Zhang | This paper proposes to discover diverse discriminative visual cues without extra assistance, e.g., pose estimation, human parsing. |
140 | Wide-Context Semantic Image Extrapolation | Yi Wang, Xin Tao, Xiaoyong Shen, Jiaya Jia | We propose a semantic regeneration network with several special contributions and use multiple spatial related losses to address these issues. |
141 | End-To-End Time-Lapse Video Synthesis From a Single Outdoor Image | Seonghyeon Nam, Chongyang Ma, Menglei Chai, William Brendel, Ning Xu, Seon Joo Kim | In this paper, we present an end-to-end solution to synthesize a time-lapse video from a single outdoor image using deep neural networks. |
142 | GIF2Video: Color Dequantization and Temporal Interpolation of GIF Images | Yang Wang, Haibin Huang, Chuan Wang, Tong He, Jue Wang, Minh Hoai | In this paper, we propose GIF2Video, the first learning-based method for enhancing the visual quality of GIFs in the wild. We introduce two large datasets, namely GIF-Faces and GIF-Moments, for both training and evaluation. |
143 | Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis | Qi Mao, Hsin-Ying Lee, Hung-Yu Tseng, Siwei Ma, Ming-Hsuan Yang | In this work, we propose a simple yet effective regularization term to address the mode collapse issue for cGANs. |
144 | Pluralistic Image Completion | Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai | In this paper, we present an approach for pluralistic image completion – the task of generating multiple and diverse plausible solutions for image completion. |
145 | Salient Object Detection With Pyramid Attention and Salient Edges | Wenguan Wang, Shuyang Zhao, Jianbing Shen, Steven C. H. Hoi, Ali Borji | This paper presents a new method for detecting salient objects in images using convolutional neural networks (CNNs). |
146 | Latent Filter Scaling for Multimodal Unsupervised Image-To-Image Translation | Yazeed Alharbi, Neil Smith, Peter Wonka | We present a simple method that produces higher-quality images than the current state of the art while maintaining the same amount of multimodal diversity. |
147 | Attention-Aware Multi-Stroke Style Transfer | Yuan Yao, Jianqiang Ren, Xuansong Xie, Weidong Liu, Yong-Jin Liu, Jun Wang | In this paper, we tackle these limitations by developing an attention-aware multi-stroke style transfer model. |
148 | Feedback Adversarial Learning: Spatial Feedback for Improving Generative Adversarial Networks | Minyoung Huh, Shao-Hua Sun, Ning Zhang | We propose a feedback adversarial learning (FAL) framework that can improve existing generative adversarial networks by leveraging spatial feedback from the discriminator. |
149 | Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting | Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo | In this paper, we propose a Pyramid-context Encoder Network (denoted as PEN-Net) for image inpainting by deep generative models. |
150 | Example-Guided Style-Consistent Image Synthesis From Semantic Labeling | Miao Wang, Guo-Ye Yang, Ruilong Li, Run-Ze Liang, Song-Hai Zhang, Peter M. Hall, Shi-Min Hu | We propose a solution to the example-guided image synthesis problem using conditional generative adversarial networks with style consistency. |
151 | MirrorGAN: Learning Text-To-Image Generation by Redescription | Tingting Qiao, Jing Zhang, Duanqing Xu, Dacheng Tao | In this paper, we address this problem by proposing a novel global-local attentive and semantic-preserving text-to-image-to-text framework called MirrorGAN. |
152 | Light Field Messaging With Deep Photographic Steganography | Eric Wengrowski, Kristin Dana | We develop Light Field Messaging (LFM), a process of embedding, transmitting, and receiving hidden information in video that is displayed on a screen and captured by a handheld camera. To learn this CDTF we introduce a dataset (Camera-Display 1M) of 1,000,000 camera-captured images collected from 25 camera-display pairs. |
153 | Im2Pencil: Controllable Pencil Illustration From Photographs | Yijun Li, Chen Fang, Aaron Hertzmann, Eli Shechtman, Ming-Hsuan Yang | We propose a high-quality photo-to-pencil translation method with fine-grained control over the drawing style. |
154 | When Color Constancy Goes Wrong: Correcting Improperly White-Balanced Images | Mahmoud Afifi, Brian Price, Scott Cohen, Michael S. Brown | This paper introduces the first method to explicitly address this problem. |
155 | Beyond Volumetric Albedo — A Surface Optimization Framework for Non-Line-Of-Sight Imaging | Chia-Yin Tsai, Aswin C. Sankaranarayanan, Ioannis Gkioulekas | We introduce an analysis-by-synthesis framework that can reconstruct complex shape and reflectance of an NLOS object. |
156 | Reflection Removal Using a Dual-Pixel Sensor | Abhijith Punnappurath, Michael S. Brown | In this paper, we show that most cameras have an overlooked mechanism that can greatly simplify this task. As part of this work, we provide the first image dataset for reflection removal consisting of the sub-aperture views from the DP sensor. |
157 | Practical Coding Function Design for Time-Of-Flight Imaging | Felipe Gutierrez-Barragan, Syed Azer Reza, Andreas Velten, Mohit Gupta | We present a constrained optimization approach for designing practical coding functions that adhere to hardware constraints. |
158 | Meta-SR: A Magnification-Arbitrary Network for Super-Resolution | Xuecai Hu, Haoyuan Mu, Xiangyu Zhang, Zilei Wang, Tieniu Tan, Jian Sun | In this work, we propose a novel method called Meta-SR to solve, for the first time, super-resolution of arbitrary scale factors (including non-integer scale factors) with a single model. |
159 | Multispectral and Hyperspectral Image Fusion by MS/HS Fusion Net | Qi Xie, Minghao Zhou, Qian Zhao, Deyu Meng, Wangmeng Zuo, Zongben Xu | In this paper, we propose a model-based deep learning approach for merging an HrMS image and an LrHS image to generate a high-resolution hyperspectral (HrHS) image. |
160 | Learning Attraction Field Representation for Robust Line Segment Detection | Nan Xue, Song Bai, Fudong Wang, Gui-Song Xia, Tianfu Wu, Liangpei Zhang | This paper presents a region-partition based attraction field dual representation for line segment maps, and thus poses the problem of line segment detection (LSD) as the region coloring problem. |
161 | Blind Super-Resolution With Iterative Kernel Correction | Jinjin Gu, Hannan Lu, Wangmeng Zuo, Chao Dong | In this paper, we propose an Iterative Kernel Correction (IKC) method for blur kernel estimation in blind SR problem, where the blur kernels are unknown. |
162 | Video Magnification in the Wild Using Fractional Anisotropy in Temporal Distribution | Shoichiro Takeda, Yasunori Akagi, Kazuki Okami, Megumi Isogai, Hideaki Kimata | In this paper, we present a novel method using fractional anisotropy (FA) to detect only meaningful subtle changes without the aforementioned requirements. |
163 | Attentive Feedback Network for Boundary-Aware Salient Object Detection | Mengyang Feng, Huchuan Lu, Errui Ding | In this paper, we design the Attentive Feedback Modules (AFMs) to better explore the structure of objects. |
164 | Heavy Rain Image Restoration: Integrating Physics Model and Conditional Adversarial Learning | Ruoteng Li, Loong-Fah Cheong, Robby T. Tan | In this paper, we propose a novel method to address these problems. |
165 | Learning to Calibrate Straight Lines for Fisheye Image Rectification | Zhucun Xue, Nan Xue, Gui-Song Xia, Weiming Shen | This paper presents a new deep-learning based method to simultaneously calibrate the intrinsic parameters of a fisheye lens and rectify the distorted images. To train and evaluate the proposed model, we also create a new large-scale dataset labeled with corresponding distortion parameters and well-annotated distorted lines. |
166 | Camera Lens Super-Resolution | Chang Chen, Zhiwei Xiong, Xinmei Tian, Zheng-Jun Zha, Feng Wu | In this paper, we investigate SR from the perspective of camera lenses, named as CameraSR, which aims to alleviate the intrinsic tradeoff between resolution (R) and field-of-view (V) in realistic imaging systems. |
167 | Frame-Consistent Recurrent Video Deraining With Dual-Level Flow | Wenhan Yang, Jiaying Liu, Jiashi Feng | In this paper, we address the problem of rain removal from videos by proposing a more comprehensive framework that considers the additional degradation factors in real scenes neglected in previous works. |
168 | Deep Plug-And-Play Super-Resolution for Arbitrary Blur Kernels | Kai Zhang, Wangmeng Zuo, Lei Zhang | In this paper, we propose a principled formulation and framework by extending bicubic degradation based deep SISR with the help of plug-and-play framework to handle LR images with arbitrary blur kernels. |
169 | Sea-Thru: A Method for Removing Water From Underwater Images | Derya Akkaynak, Tali Treibitz | Here, we present a method that recovers color with the revised model using RGBD images. |
170 | Deep Network Interpolation for Continuous Imagery Effect Transition | Xintao Wang, Ke Yu, Chao Dong, Xiaoou Tang, Chen Change Loy | Unlike existing methods that require a specific design to achieve one particular transition (e.g., style transfer), we propose a simple yet universal approach to attain a smooth control of diverse imagery effects in many low-level vision tasks, including image restoration, image-to-image translation, and style transfer. |
171 | Spatially Variant Linear Representation Models for Joint Filtering | Jinshan Pan, Jiangxin Dong, Jimmy S. Ren, Liang Lin, Jinhui Tang, Ming-Hsuan Yang | Different from existing algorithms that rely on locally linear models or hand-designed objective functions to extract the structural information from the guidance image, we propose a new joint filter based on a spatially variant linear representation model (SVLRM), where the target image is linearly represented by the guidance image. |
172 | Toward Convolutional Blind Denoising of Real Photographs | Shi Guo, Zifei Yan, Kai Zhang, Wangmeng Zuo, Lei Zhang | In order to improve the generalization ability of deep CNN denoisers, we suggest training a convolutional blind denoising network (CBDNet) with a more realistic noise model and real-world noisy-clean image pairs. |
173 | Towards Real Scene Super-Resolution With Raw Images | Xiangyu Xu, Yongrui Ma, Wenxiu Sun | To solve the first problem, we propose a new pipeline to generate realistic training data by simulating the imaging process of digital cameras. |
174 | ODE-Inspired Network Design for Single Image Super-Resolution | Xiangyu He, Zitao Mo, Peisong Wang, Yang Liu, Mingyuan Yang, Jian Cheng | In this paper, we propose to adopt an ordinary differential equation (ODE)-inspired design scheme for single image super-resolution, which have brought us a new understanding of ResNet in classification problems. |
175 | Blind Image Deblurring With Local Maximum Gradient Prior | Liang Chen, Faming Fang, Tingting Wang, Guixu Zhang | In this paper, we present a blind deblurring method based on Local Maximum Gradient (LMG) prior. |
176 | Attention-Guided Network for Ghost-Free High Dynamic Range Imaging | Qingsen Yan, Dong Gong, Qinfeng Shi, Anton van den Hengel, Chunhua Shen, Ian Reid, Yanning Zhang | To avoid the ghosting from the source, we propose a novel attention-guided end-to-end deep neural network (AHDRNet) to produce high-quality ghost-free HDR images. |
177 | Searching for a Robust Neural Architecture in Four GPU Hours | Xuanyi Dong, Yi Yang | We propose an efficient NAS approach, which learns the searching approach by gradient descent. |
178 | Hierarchy Denoising Recursive Autoencoders for 3D Scene Layout Prediction | Yifei Shi, Angel X. Chang, Zhelun Wu, Manolis Savva, Kai Xu | We present a variational denoising recursive autoencoder (VDRAE) that generates and iteratively refines a hierarchical representation of 3D object layouts, interleaving bottom-up encoding for context aggregation and top-down decoding for propagation. |
179 | Adaptively Connected Neural Networks | Guangrun Wang, Keze Wang, Liang Lin | This paper presents a novel adaptively connected neural network (ACNet) to improve the traditional convolutional neural networks (CNNs) in two aspects. |
180 | CrDoCo: Pixel-Level Domain Transfer With Cross-Domain Consistency | Yun-Chun Chen, Yen-Yu Lin, Ming-Hsuan Yang, Jia-Bin Huang | In this paper, we present a novel pixel-wise adversarial domain adaptation algorithm. |
181 | Temporal Cycle-Consistency Learning | Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman | We introduce a self-supervised representation learning method based on the task of temporal alignment between videos. |
182 | Predicting Future Frames Using Retrospective Cycle GAN | Yong-Hoon Kwon, Min-Gyu Park | In this paper, we propose a unified generative adversarial network for predicting accurate and temporally consistent future frames over time, even in a challenging environment. |
183 | Density Map Regression Guided Detection Network for RGB-D Crowd Counting and Localization | Dongze Lian, Jing Li, Jia Zheng, Weixin Luo, Shenghua Gao | To simultaneously estimate head counts and localize heads with bounding boxes, a regression guided detection network (RDNet) is proposed for RGB-D crowd counting. |
184 | TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning | Xin Wang, Fisher Yu, Ruth Wang, Trevor Darrell, Joseph E. Gonzalez | In this work, we propose Task Aware Feature Embedding Networks (TAFE-Nets) to learn how to adapt the image representation to a new task in a meta learning fashion. |
185 | Learning Semantic Segmentation From Synthetic Data: A Geometrically Guided Input-Output Adaptation Approach | Yuhua Chen, Wen Li, Xiaoran Chen, Luc Van Gool | In this work, we propose an approach to cross-domain semantic segmentation with the auxiliary geometric information, which can also be easily obtained from virtual environments. |
186 | Attentive Single-Tasking of Multiple Tasks | Kevis-Kokitsi Maninis, Ilija Radosavovic, Iasonas Kokkinos | In this work we address task interference in universal networks by considering that a network is trained on multiple tasks, but performs one task at a time, an approach we refer to as “single-tasking multiple tasks”. |
187 | Deep Metric Learning to Rank | Fatih Cakir, Kun He, Xide Xia, Brian Kulis, Stan Sclaroff | We propose a novel deep metric learning method by revisiting the learning to rank approach. |
188 | End-To-End Multi-Task Learning With Attention | Shikun Liu, Edward Johns, Andrew J. Davison | We propose a novel multi-task learning architecture, which allows learning of task-specific feature-level attention. |
189 | Self-Supervised Learning via Conditional Motion Propagation | Xiaohang Zhan, Xingang Pan, Ziwei Liu, Dahua Lin, Chen Change Loy | In this work, we design a new learning-from-motion paradigm to bridge these gaps. |
190 | Bridging Stereo Matching and Optical Flow via Spatiotemporal Correspondence | Hsueh-Ying Lai, Yi-Hsuan Tsai, Wei-Chen Chiu | In this paper, we propose a single and principled network to jointly learn spatiotemporal correspondence for stereo matching and flow estimation, with a newly designed geometric connection as the unsupervised signal for temporally adjacent stereo pairs. |
191 | All About Structure: Adapting Structural Information Across Domains for Boosting Semantic Segmentation | Wei-Lun Chang, Hui-Po Wang, Wen-Hsiao Peng, Wei-Chen Chiu | In this paper we tackle the problem of unsupervised domain adaptation for the task of semantic segmentation, where we attempt to transfer the knowledge learned upon synthetic datasets with ground-truth labels to real-world images without any annotation. |
192 | Iterative Reorganization With Weak Spatial Constraints: Solving Arbitrary Jigsaw Puzzles for Unsupervised Representation Learning | Chen Wei, Lingxi Xie, Xutong Ren, Yingda Xia, Chi Su, Jiaying Liu, Qi Tian, Alan L. Yuille | This paper presents a novel approach which applies to jigsaw puzzles with an arbitrary grid size and dimensionality. |
193 | Revisiting Self-Supervised Visual Representation Learning | Alexander Kolesnikov, Xiaohua Zhai, Lucas Beyer | Therefore, we revisit numerous previously proposed self-supervised models, conduct a thorough large scale study and, as a result, uncover multiple crucial insights. |
194 | It’s Not About the Journey; It’s About the Destination: Following Soft Paths Under Question-Guidance for Visual Reasoning | Monica Haurilet, Alina Roitberg, Rainer Stiefelhagen | We present a new model for Visual Reasoning, aimed at capturing the interplay among individual objects in the image represented as a scene graph. |
195 | Actively Seeking and Learning From Live Data | Damien Teney, Anton van den Hengel | The approach we propose is a step toward overcoming this limitation by searching for the information required at test time. |
196 | Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing | Xihui Liu, Zihao Wang, Jing Shao, Xiaogang Wang, Hongsheng Li | To tackle this issue, we design a novel cross-modal attention-guided erasing approach, where we discard the most dominant information from either textual or visual domains to generate difficult training samples online, and to drive the model to discover complementary textual-visual correspondences. |
197 | Neighbourhood Watch: Referring Expression Comprehension via Language-Guided Graph Attention Networks | Peng Wang, Qi Wu, Jiewei Cao, Chunhua Shen, Lianli Gao, Anton van den Hengel | To capture and exploit this important information we propose a graph-based, language-guided attention mechanism. |
198 | Scene Graph Generation With External Knowledge and Image Reconstruction | Jiuxiang Gu, Handong Zhao, Zhe Lin, Sheng Li, Jianfei Cai, Mingyang Ling | In this paper, we propose a novel scene graph generation algorithm with external knowledge and image reconstruction loss to overcome these dataset issues. |
199 | Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval | Yale Song, Mohammad Soleymani | In this work, we introduce Polysemous Instance Embedding Networks (PIE-Nets) that compute multiple and diverse representations of an instance by combining global context with locally-guided features via multi-head self-attention and residual learning. To facilitate further research in video-text retrieval, we release a new dataset of 50K video-sentence pairs collected from social media, dubbed MRW (my reaction when). |
200 | MUREL: Multimodal Relational Reasoning for Visual Question Answering | Remi Cadene, Hedi Ben-younes, Matthieu Cord, Nicolas Thome | In this paper, we propose MuRel, a multimodal relational network which is learned end-to-end to reason over real images. |
201 | Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering | Chenyou Fan, Xiaofan Zhang, Shu Zhang, Wensheng Wang, Chi Zhang, Heng Huang | In this paper, we propose a novel end-to-end trainable Video Question Answering (VideoQA) framework with three major components: 1) a new heterogeneous memory which can effectively learn global context information from appearance and motion features; 2) a redesigned question memory which helps understand the complex semantics of question and highlights queried subjects; and 3) a new multimodal fusion layer which performs multi-step reasoning by attending to relevant visual and textual hints with self-updated attention. |
202 | Information Maximizing Visual Question Generation | Ranjay Krishna, Michael Bernstein, Li Fei-Fei | To overcome the non-differentiability of discrete natural language tokens, we introduce a variational continuous latent space onto which the expected answers project. |
203 | Learning to Detect Human-Object Interactions With Knowledge | Bingjie Xu, Yongkang Wong, Junnan Li, Qi Zhao, Mohan S. Kankanhalli | In this work, we focus on detecting human-object interactions (HOIs) in images, an essential step towards deeper scene understanding. |
204 | Learning Words by Drawing Images | Didac Suris, Adria Recasens, David Bau, David Harwath, James Glass, Antonio Torralba | We propose a framework for learning through drawing. |
205 | Factor Graph Attention | Idan Schwartz, Seunghak Yu, Tamir Hazan, Alexander G. Schwing | We address this issue and develop a general attention mechanism for visual dialog which operates on any number of data utilities. |
206 | Reducing Uncertainty in Undersampled MRI Reconstruction With Active Acquisition | Zizhao Zhang, Adriana Romero, Matthew J. Muckley, Pascal Vincent, Lin Yang, Michal Drozdzal | In this paper, we present a novel method for MRI reconstruction that, at inference time, dynamically selects the measurements to take and iteratively refines the prediction in order to best reduce the reconstruction error and, thus, its uncertainty. |
207 | ESIR: End-To-End Scene Text Recognition via Iterative Image Rectification | Fangneng Zhan, Shijian Lu | This paper presents an end-to-end trainable scene text recognition system (ESIR) that iteratively removes perspective distortion and text line curvature as driven by better scene text recognition performance. |
208 | ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape | Fabian Manhardt, Wadim Kehl, Adrien Gaidon | We present a deep learning method for end-to-end monocular 3D object detection and metric shape retrieval. |
209 | Collaborative Learning of Semi-Supervised Segmentation and Classification for Medical Images | Yi Zhou, Xiaodong He, Lei Huang, Li Liu, Fan Zhu, Shanshan Cui, Ling Shao | In this paper, we propose a collaborative learning method to jointly improve the performance of disease grading and lesion segmentation by semi-supervised learning with an attention mechanism. |
210 | Biologically-Constrained Graphs for Global Connectomics Reconstruction | Brian Matejek, Daniel Haehn, Haidong Zhu, Donglai Wei, Toufiq Parag, Hanspeter Pfister | We propose a third step for connectomics reconstruction pipelines to refine an over-segmentation using both local and global context with an emphasis on adhering to the underlying biology. |
211 | P3SGD: Patient Privacy Preserving SGD for Regularizing Deep CNNs in Pathological Image Classification | Bingzhe Wu, Shiwan Zhao, Guangyu Sun, Xiaolu Zhang, Zhong Su, Caihong Zeng, Zhihong Liu | To tackle the above two challenges, we introduce a novel stochastic gradient descent (SGD) scheme, named patient privacy preserving SGD (P3SGD), which performs the model update of the SGD at the patient level via a large-step update built upon each patient’s data. |
212 | Elastic Boundary Projection for 3D Medical Image Segmentation | Tianwei Ni, Lingxi Xie, Huangjie Zheng, Elliot K. Fishman, Alan L. Yuille | In this paper, we bridge the gap between 2D and 3D using a novel approach named Elastic Boundary Projection (EBP). |
213 | SIXray: A Large-Scale Security Inspection X-Ray Benchmark for Prohibited Item Discovery in Overlapping Images | Caijing Miao, Lingxi Xie, Fang Wan, Chi Su, Hongye Liu, Jianbin Jiao, Qixiang Ye | In this paper, we present a large-scale dataset and establish a baseline for prohibited item discovery in Security Inspection X-ray images. |
214 | Noise2Void – Learning Denoising From Single Noisy Images | Alexander Krull, Tim-Oliver Buchholz, Florian Jug | Here, we introduce Noise2Void (N2V), a training scheme that takes this idea one step further. |
215 | Joint Discriminative and Generative Learning for Person Re-Identification | Zhedong Zheng, Xiaodong Yang, Zhiding Yu, Liang Zheng, Yi Yang, Jan Kautz | In this paper, we seek to improve learned re-id embeddings by better leveraging the generated data. |
216 | Unsupervised Person Re-Identification by Soft Multilabel Learning | Hong-Xing Yu, Wei-Shi Zheng, Ancong Wu, Xiaowei Guo, Shaogang Gong, Jian-Huang Lai | To overcome this problem, we propose a deep model for soft multilabel learning for unsupervised RE-ID. |
217 | Learning Context Graph for Person Search | Yichao Yan, Qiang Zhang, Bingbing Ni, Wendong Zhang, Minghao Xu, Xiaokang Yang | In this work, we take a step further and consider employing context information for person search. |
218 | Gradient Matching Generative Networks for Zero-Shot Learning | Mert Bulent Sariyildiz, Ramazan Gokberk Cinbis | In contrast, we propose a generative model that can naturally learn from unsupervised examples, and synthesize training examples for unseen classes purely based on their class embeddings, and therefore reduce the zero-shot learning problem to a supervised classification task. |
219 | Doodle to Search: Practical Zero-Shot Sketch-Based Image Retrieval | Sounak Dey, Pau Riba, Anjan Dutta, Josep Llados, Yi-Zhe Song | In this paper, we investigate the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to conduct retrieval of photos from unseen categories. |
220 | Zero-Shot Task Transfer | Arghya Pal, Vineeth N Balasubramanian | In this work, we present a novel meta-learning algorithm that regresses model parameters for novel tasks for which no ground truth is available (zero-shot tasks). |
221 | C-MIL: Continuation Multiple Instance Learning for Weakly Supervised Object Detection | Fang Wan, Chang Liu, Wei Ke, Xiangyang Ji, Jianbin Jiao, Qixiang Ye | In this paper, we introduce a continuation optimization method into MIL, thereby creating continuation multiple instance learning (C-MIL), with the intention of alleviating the non-convexity problem in a systematic way. |
222 | Weakly Supervised Learning of Instance Segmentation With Inter-Pixel Relations | Jiwoon Ahn, Sunghyun Cho, Suha Kwak | This paper presents a novel approach for learning instance segmentation with image-level class labels as supervision. |
223 | Attention-Based Dropout Layer for Weakly Supervised Object Localization | Junsuk Choe, Hyunjung Shim | To address this problem, we propose an Attention-based Dropout Layer (ADL), which utilizes the self-attention mechanism to process the feature maps of the model. |
224 | Domain Generalization by Solving Jigsaw Puzzles | Fabio M. Carlucci, Antonio D’Innocente, Silvia Bucci, Barbara Caputo, Tatiana Tommasi | In this paper we propose to apply a similar approach to the task of object recognition across domains: our model learns the semantic labels in a supervised fashion, and broadens its understanding of the data by learning from self-supervised signals how to solve a jigsaw puzzle on the same images. |
225 | Transferrable Prototypical Networks for Unsupervised Domain Adaptation | Yingwei Pan, Ting Yao, Yehao Li, Yu Wang, Chong-Wah Ngo, Tao Mei | In this paper, we introduce a new idea for unsupervised domain adaptation via a remold of Prototypical Networks, which learn an embedding space and perform classification via the distances to the prototype of each class. |
226 | Blending-Target Domain Adaptation by Adversarial Meta-Adaptation Networks | Ziliang Chen, Jingyu Zhuang, Xiaodan Liang, Liang Lin | In this paper, we consider a more realistic transfer scenario: our target domain is comprised of multiple sub-targets implicitly blended with each other so that learners could not identify which sub-target each unlabeled sample belongs to. |
227 | ELASTIC: Improving CNNs With Dynamic Scaling Policies | Huiyu Wang, Aniruddha Kembhavi, Ali Farhadi, Alan L. Yuille, Mohammad Rastegari | In this paper, we introduce Elastic, a simple, efficient and yet very effective approach to learn a dynamic scale policy from data. |
228 | ScratchDet: Training Single-Shot Object Detectors From Scratch | Rui Zhu, Shifeng Zhang, Xiaobo Wang, Longyin Wen, Hailin Shi, Liefeng Bo, Tao Mei | In this paper, we explore how to train object detectors from scratch robustly. |
229 | SFNet: Learning Object-Aware Semantic Correspondence | Junghyup Lee, Dohyung Kim, Jean Ponce, Bumsub Ham | We propose a new CNN architecture, dubbed SFNet, which implements this idea. |
230 | Deep Metric Learning Beyond Binary Supervision | Sungyeon Kim, Minkyo Seo, Ivan Laptev, Minsu Cho, Suha Kwak | Motivated by this, we present a novel method for deep metric learning using continuous labels. |
231 | Learning to Cluster Faces on an Affinity Graph | Lei Yang, Xiaohang Zhan, Dapeng Chen, Junjie Yan, Chen Change Loy, Dahua Lin | Specifically, we propose a framework based on graph convolutional network, which combines a detection and a segmentation module to pinpoint face clusters. |
232 | C2AE: Class Conditioned Auto-Encoder for Open-Set Recognition | Poojan Oza, Vishal M. Patel | In this paper, we propose an open-set recognition algorithm using class conditioned auto-encoders with novel training and testing methodologies. |
233 | Shapes and Context: In-The-Wild Image Synthesis & Manipulation | Aayush Bansal, Yaser Sheikh, Deva Ramanan | We introduce a data-driven model for interactively synthesizing in-the-wild images from semantic label input masks. |
234 | Semantics Disentangling for Text-To-Image Generation | Guojun Yin, Bin Liu, Lu Sheng, Nenghai Yu, Xiaogang Wang, Jing Shao | In this paper, we consider semantics from the input text descriptions in helping render photo-realistic images. |
235 | Semantic Image Synthesis With Spatially-Adaptive Normalization | Taesung Park, Ming-Yu Liu, Ting-Chun Wang, Jun-Yan Zhu | We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. |
236 | Progressive Pose Attention Transfer for Person Image Generation | Zhen Zhu, Tengteng Huang, Baoguang Shi, Miao Yu, Bofei Wang, Xiang Bai | This paper proposes a new generative adversarial network to the problem of pose transfer, i.e., transferring the pose of a given person to a target one. |
237 | Unsupervised Person Image Generation With Semantic Parsing Transformation | Sijie Song, Wei Zhang, Jiaying Liu, Tao Mei | In this paper, we address unsupervised pose-guided person image generation, which is known to be challenging due to non-rigid deformation. |
238 | DeepView: View Synthesis With Learned Gradient Descent | John Flynn, Michael Broxton, Paul Debevec, Matthew DuVall, Graham Fyffe, Ryan Overbeck, Noah Snavely, Richard Tucker | We present a novel approach to view synthesis using multiplane images (MPIs). |
239 | Animating Arbitrary Objects via Deep Motion Transfer | Aliaksandr Siarohin, Stephane Lathuiliere, Sergey Tulyakov, Elisa Ricci, Nicu Sebe | This paper introduces a novel deep learning framework for image animation. |
240 | Textured Neural Avatars | Aliaksandra Shysheya, Egor Zakharov, Kara-Ali Aliev, Renat Bashirov, Egor Burkov, Karim Iskakov, Aleksei Ivakhnenko, Yury Malkov, Igor Pasechnik, Dmitry Ulyanov, Alexander Vakhitov, Victor Lempitsky | We present a system for learning full body neural avatars, i.e. deep networks that produce full body renderings of a person for varying body pose and varying camera pose. |
241 | IM-Net for High Resolution Video Frame Interpolation | Tomer Peleg, Pablo Szekely, Doron Sabo, Omry Sendik | In this paper we propose IM-Net: an interpolated motion neural network. |
242 | Homomorphic Latent Space Interpolation for Unpaired Image-To-Image Translation | Ying-Cong Chen, Xiaogang Xu, Zhuotao Tian, Jiaya Jia | In this paper, we propose an alternative framework, as an extension of latent space interpolation, to consider the intermediate region between two domains during translation. |
243 | Multi-Channel Attention Selection GAN With Cascaded Semantic Guidance for Cross-View Image Translation | Hao Tang, Dan Xu, Nicu Sebe, Yanzhi Wang, Jason J. Corso, Yan Yan | In this paper, we propose a novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that makes it possible to generate images of natural scenes in arbitrary viewpoints, based on an image of the scene and a novel semantic map. |
244 | Geometry-Consistent Generative Adversarial Networks for One-Sided Unsupervised Domain Mapping | Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Batmanghelich, Kun Zhang, Dacheng Tao | Based on this special property, we develop a geometry-consistent generative adversarial network (Gc-GAN), which enables one-sided unsupervised domain mapping. |
245 | DeepVoxels: Learning Persistent 3D Feature Embeddings | Vincent Sitzmann, Justus Thies, Felix Heide, Matthias Niessner, Gordon Wetzstein, Michael Zollhofer | In this work, we address the lack of 3D understanding of generative neural networks by introducing a persistent 3D feature embedding for view synthesis. |
246 | Inverse Path Tracing for Joint Material and Lighting Estimation | Dejan Azinovic, Tzu-Mao Li, Anton Kaplanyan, Matthias Niessner | We introduce Inverse Path Tracing, a novel approach to jointly estimate the material properties of objects and light sources in indoor scenes by using an invertible light transport simulation. |
247 | The Visual Centrifuge: Model-Free Layered Video Representations | Jean-Baptiste Alayrac, Joao Carreira, Andrew Zisserman | Here we propose a learning-based approach for multi-layered video representation: we introduce novel uncertainty-capturing 3D convolutional architectures and train them to separate blended videos. |
248 | Label-Noise Robust Generative Adversarial Networks | Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada | To remedy this, we propose a novel family of GANs called label-noise robust GANs (rGANs), which, by incorporating a noise transition model, can learn a clean label conditional generative distribution even when training labels are noisy. |
249 | DLOW: Domain Flow for Adaptation and Generalization | Rui Gong, Wen Li, Yuhua Chen, Luc Van Gool | In this work, we present a domain flow generation (DLOW) model to bridge two different domains by generating a continuous sequence of intermediate domains flowing from one domain to the other. |
250 | CollaGAN: Collaborative GAN for Missing Image Data Imputation | Dongwook Lee, Junyoung Kim, Won-Jin Moon, Jong Chul Ye | To address this problem, we propose a novel framework for missing image data imputation, called Collaborative Generative Adversarial Network (CollaGAN). |
251 | d-SNE: Domain Adaptation Using Stochastic Neighborhood Embedding | Xiang Xu, Xiong Zhou, Ragav Venkatesan, Gurumurthy Swaminathan, Orchid Majumder | In this paper, we propose a new technique (d-SNE) of domain adaptation that cleverly uses stochastic neighborhood embedding techniques and a novel modified-Hausdorff distance. |
252 | Taking a Closer Look at Domain Shift: Category-Level Adversaries for Semantics Consistent Domain Adaptation | Yawei Luo, Liang Zheng, Tao Guan, Junqing Yu, Yi Yang | To address this problem, this paper introduces a category-level adversarial network, aiming to enforce local semantic consistency during the trend of global alignment. |
253 | ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation | Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, Patrick Perez | In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. |
254 | ContextDesc: Local Descriptor Augmentation With Cross-Modality Context | Zixin Luo, Tianwei Shen, Lei Zhou, Jiahui Zhang, Yao Yao, Shiwei Li, Tian Fang, Long Quan | In this paper, we go beyond the local detail representation by introducing context awareness to augment off-the-shelf local feature descriptors. |
255 | Large-Scale Long-Tailed Recognition in an Open World | Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, Stella X. Yu | We develop an integrated OLTR algorithm that maps an image to a feature space such that visual concepts can easily relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world. |
256 | SDC – Stacked Dilated Convolution: A Unified Descriptor Network for Dense Matching Tasks | Rene Schuster, Oliver Wasenmuller, Christian Unger, Didier Stricker | We present a robust, unified descriptor network that considers a large context region with high spatial variance. |
257 | Learning Correspondence From the Cycle-Consistency of Time | Xiaolong Wang, Allan Jabri, Alexei A. Efros | We introduce a self-supervised method for learning visual correspondence from unlabeled video. |
258 | AE2-Nets: Autoencoder in Autoencoder Networks | Changqing Zhang, Yeqing Liu, Huazhu Fu | In contrast, in this paper, we focus on unsupervised representation learning and propose a novel framework termed Autoencoder in Autoencoder Networks (AE^2-Nets), which integrates information from heterogeneous sources into an intact representation by the nested autoencoder framework. |
259 | Mitigating Information Leakage in Image Representations: A Maximum Entropy Approach | Proteek Chandan Roy, Vishnu Naresh Boddeti | We formulate the problem as an adversarial non-zero sum game of finding a good embedding function with two competing goals: to retain as much task dependent discriminative image information as possible, while simultaneously minimizing the amount of information, as measured by entropy, about other sensitive attributes of the user. |
260 | Learning Spatial Common Sense With Geometry-Aware Recurrent Networks | Hsiao-Yu Fish Tung, Ricson Cheng, Katerina Fragkiadaki | We integrate two powerful ideas, geometry and deep visual representation learning, into recurrent network architectures for mobile visual scene understanding. |
261 | Structured Knowledge Distillation for Semantic Segmentation | Yifan Liu, Ke Chen, Chris Liu, Zengchang Qin, Zhenbo Luo, Jingdong Wang | In this paper, we investigate the issue of knowledge distillation for training compact semantic segmentation networks by making use of cumbersome networks. |
262 | Scan2CAD: Learning CAD Model Alignment in RGB-D Scans | Armen Avetisyan, Manuel Dahnert, Angela Dai, Manolis Savva, Angel X. Chang, Matthias Niessner | We present Scan2CAD, a novel data-driven method that learns to align clean 3D CAD models from a shape database to the noisy and incomplete geometry of a commodity RGB-D scan. |
263 | Towards Scene Understanding: Unsupervised Monocular Depth Estimation With Semantic-Aware Representation | Po-Yi Chen, Alexander H. Liu, Yen-Cheng Liu, Yu-Chiang Frank Wang | In this paper, we propose SceneNet to overcome this limitation with the aid of semantic understanding from segmentation. |
264 | Tell Me Where I Am: Object-Level Scene Context Prediction | Xiaotian Qiao, Quanlong Zheng, Ying Cao, Rynson W.H. Lau | In this paper, we consider an inverse problem of how to hallucinate missing contextual information from the properties of a few standalone objects. |
265 | Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation | He Wang, Srinath Sridhar, Jingwei Huang, Julien Valentin, Shuran Song, Leonidas J. Guibas | The goal of this paper is to estimate the 6D pose and dimensions of unseen object instances in an RGB-D image. To further improve our model and evaluate its performance on real data, we also provide a fully annotated real-world dataset with large environment and instance variation. |
266 | Supervised Fitting of Geometric Primitives to 3D Point Clouds | Lingxiao Li, Minhyuk Sung, Anastasia Dubrovina, Li Yi, Leonidas J. Guibas | In this work, we introduce Supervised Primitive Fitting Network (SPFN), an end-to-end neural network that can robustly detect a varying number of primitives at different scales without any user control. |
267 | Do Better ImageNet Models Transfer Better? | Simon Kornblith, Jonathon Shlens, Quoc V. Le | Here, we compare the performance of 16 classification networks on 12 image classification datasets. Together, our results show that ImageNet architectures generalize well across datasets, but ImageNet features are less general than previously suggested. |
268 | Gotta Adapt ‘Em All: Joint Pixel and Feature-Level Domain Adaptation for Recognition in the Wild | Luan Tran, Kihyuk Sohn, Xiang Yu, Xiaoming Liu, Manmohan Chandraker | Recent developments in deep domain adaptation have allowed knowledge transfer from a labeled source domain to an unlabeled target domain at the level of intermediate features or input pixels. We propose that advantages may be derived by combining them, in the form of different insights that lead to a novel design and complementary properties that result in better performance. |
269 | Understanding the Disharmony Between Dropout and Batch Normalization by Variance Shift | Xiang Li, Shuo Chen, Xiaolin Hu, Jian Yang | Theoretically, we find that Dropout shifts the variance of a specific neural unit when we transfer the state of that network from training to test. However, BN maintains its statistical variance, which is accumulated from the entire learning procedure, in the test phase. The inconsistency of variances in Dropout and BN (we name this scheme “variance shift”) causes unstable numerical behavior at inference that ultimately leads to erroneous predictions. |
270 | Circulant Binary Convolutional Networks: Enhancing the Performance of 1-Bit DCNNs With Circulant Back Propagation | Chunlei Liu, Wenrui Ding, Xin Xia, Baochang Zhang, Jiaxin Gu, Jianzhuang Liu, Rongrong Ji, David Doermann | To address this problem, we propose new circulant filters (CiFs) and a circulant binary convolution (CBConv) to enhance the capacity of binarized convolutional features via our circulant back propagation (CBP). |
271 | DeFusionNET: Defocus Blur Detection via Recurrently Fusing and Refining Multi-Scale Deep Features | Chang Tang, Xinzhong Zhu, Xinwang Liu, Lizhe Wang, Albert Zomaya | To deal with these issues, we propose a deep neural network which recurrently fuses and refines multi-scale deep features (DeFusionNet) for defocus blur detection. |
272 | Deep Virtual Networks for Memory Efficient Inference of Multiple Tasks | Eunwoo Kim, Chanho Ahn, Philip H.S. Torr, Songhwai Oh | In particular, in this work we address the problem of memory efficient learning for multiple tasks. |
273 | Universal Domain Adaptation | Kaichao You, Mingsheng Long, Zhangjie Cao, Jianmin Wang, Michael I. Jordan | To solve the universal domain adaptation problem, we propose Universal Adaptation Network (UAN). |
274 | Improving Transferability of Adversarial Examples With Input Diversity | Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jianyu Wang, Zhou Ren, Alan L. Yuille | To this end, we propose to improve the transferability of adversarial examples by creating diverse input patterns. |
275 | Sequence-To-Sequence Domain Adaptation Network for Robust Text Image Recognition | Yaping Zhang, Shuai Nie, Wenju Liu, Xing Xu, Dongxiang Zhang, Heng Tao Shen | In this paper, we develop a Sequence-to-Sequence Domain Adaptation Network (SSDAN) for robust text image recognition, which could exploit unsupervised sequence data by an attention-based sequence encoder-decoder network. |
276 | Hybrid-Attention Based Decoupled Metric Learning for Zero-Shot Image Retrieval | Binghui Chen, Weihong Deng | In this paper, we first emphasize the importance of learning a visual discriminative metric and preventing the partial/selective learning behavior of the learner in ZSIR, and then propose the Decoupled Metric Learning (DeML) framework to achieve these individually. |
277 | Learning to Sample | Oren Dovrat, Itai Lang, Shai Avidan | To do that, we propose a deep network to simplify 3D point clouds. |
278 | Few-Shot Learning via Saliency-Guided Hallucination of Samples | Hongguang Zhang, Jing Zhang, Piotr Koniusz | In this paper, we follow the latter direction and present a novel data hallucination model. |
279 | Variational Convolutional Neural Network Pruning | Chenglong Zhao, Bingbing Ni, Jian Zhang, Qiwei Zhao, Wenjun Zhang, Qi Tian | We propose a variational Bayesian scheme for pruning convolutional neural networks in channel level. |
280 | Towards Optimal Structured CNN Pruning via Generative Adversarial Learning | Shaohui Lin, Rongrong Ji, Chenqian Yan, Baochang Zhang, Liujuan Cao, Qixiang Ye, Feiyue Huang, David Doermann | In this paper, we propose an effective structured pruning approach that jointly prunes filters as well as other structures in an end-to-end manner. |
281 | Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression | Yuchao Li, Shaohui Lin, Baochang Zhang, Jianzhuang Liu, David Doermann, Yongjian Wu, Feiyue Huang, Rongrong Ji | In this paper, we investigate the problem of CNN compression from a novel interpretable perspective. |
282 | Fully Quantized Network for Object Detection | Rundong Li, Yan Wang, Feng Liang, Hongwei Qin, Junjie Yan, Rui Fan | In this paper, we demonstrate that many of these difficulties arise because of instability during the fine-tuning stage of the quantization process, and propose several novel techniques to overcome these instabilities. |
283 | MnasNet: Platform-Aware Neural Architecture Search for Mobile | Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le | In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. |
284 | Student Becoming the Master: Knowledge Amalgamation for Joint Scene Parsing, Depth Estimation, and More | Jingwen Ye, Yixin Ji, Xinchao Wang, Kairi Ou, Dapeng Tao, Mingli Song | In this paper, we investigate a novel deep-model reusing task. |
285 | K-Nearest Neighbors Hashing | Xiangyu He, Peisong Wang, Jian Cheng | In this work, we revisit the sign() function from the perspective of space partitioning. |
286 | Learning RoI Transformer for Oriented Object Detection in Aerial Images | Jian Ding, Nan Xue, Yang Long, Gui-Song Xia, Qikai Lu | In this paper, we propose a RoI Transformer to address these problems. |
287 | Snapshot Distillation: Teacher-Student Optimization in One Generation | Chenglin Yang, Lingxi Xie, Chi Su, Alan L. Yuille | This paper presents snapshot distillation (SD), the first framework which enables teacher-student optimization in one generation. |
288 | Geometry-Aware Distillation for Indoor Semantic Segmentation | Jianbo Jiao, Yunchao Wei, Zequn Jie, Honghui Shi, Rynson W.H. Lau, Thomas S. Huang | In this paper, we propose to jointly infer the semantic and depth information by distilling geometry-aware embedding to eliminate such strong constraint while still exploiting the helpful depth domain information. |
289 | LiveSketch: Query Perturbations for Guided Sketch-Based Visual Search | John Collomosse, Tu Bui, Hailin Jin | Our technical contributions are: a triplet convnet architecture that incorporates an RNN based variational autoencoder to search for images using vector (stroke-based) queries; real-time clustering to identify likely search intents (and so, targets within the search embedding); and the use of backpropagation from those targets to perturb the input stroke sequence, so suggesting alterations to the query in order to guide the search. |
290 | Bounding Box Regression With Uncertainty for Accurate Object Detection | Yihui He, Chenchen Zhu, Jianren Wang, Marios Savvides, Xiangyu Zhang | In this paper, we propose a novel bounding box regression loss for learning bounding box transformation and localization variance together. |
291 | OCGAN: One-Class Novelty Detection Using GANs With Constrained Latent Representations | Pramuditha Perera, Ramesh Nallapati, Bing Xiang | We present a novel model called OCGAN for the classical problem of one-class novelty detection, where, given a set of examples from a particular class, the goal is to determine if a query example is from the same class. |
292 | Learning Metrics From Teachers: Compact Networks for Image Embedding | Lu Yu, Vacit Oguz Yazici, Xialei Liu, Joost van de Weijer, Yongmei Cheng, Arnau Ramisa | In this paper, we propose to use network distillation to efficiently compute image embeddings with small networks. |
293 | Activity Driven Weakly Supervised Object Detection | Zhenheng Yang, Dhruv Mahajan, Deepti Ghadiyaram, Ram Nevatia, Vignesh Ramanathan | In our work, we try to leverage not only the object class labels but also the action labels associated with the data. |
294 | Separate to Adapt: Open Set Domain Adaptation via Progressive Separation | Hong Liu, Zhangjie Cao, Mingsheng Long, Jianmin Wang, Qiang Yang | To this end, this paper presents Separate to Adapt (STA), an end-to-end approach to open set domain adaptation. |
295 | Layout-Graph Reasoning for Fashion Landmark Detection | Weijiang Yu, Xiaodan Liang, Ke Gong, Chenhan Jiang, Nong Xiao, Liang Lin | In this paper, we propose to seamlessly enforce structural layout relationships among landmarks on the intermediate representations via multiple stacked layout-graph reasoning layers. |
296 | DistillHash: Unsupervised Deep Hashing by Distilling Data Pairs | Erkun Yang, Tongliang Liu, Cheng Deng, Wei Liu, Dacheng Tao | To address this problem, in this paper, we propose a new deep unsupervised hashing model, called DistillHash, which can learn a distilled data set, where data pairs have confident similarity signals. |
297 | Mind Your Neighbours: Image Annotation With Metadata Neighbourhood Graph Co-Attention Networks | Junjie Zhang, Qi Wu, Jian Zhang, Chunhua Shen, Jianfeng Lu | In this paper, we propose a Metadata Neighbourhood Graph Co-Attention Network (MangoNet) to model the correlations between each target image and its neighbours. |
298 | Region Proposal by Guided Anchoring | Jiaqi Wang, Kai Chen, Shuo Yang, Chen Change Loy, Dahua Lin | In this paper, we revisit this foundational stage. |
299 | Distant Supervised Centroid Shift: A Simple and Efficient Approach to Visual Domain Adaptation | Jian Liang, Ran He, Zhenan Sun, Tieniu Tan | This paper provides a simple and efficient solution, which could be regarded as a well-performing baseline for domain adaptation tasks. |
300 | Learning to Transfer Examples for Partial Domain Adaptation | Zhangjie Cao, Kaichao You, Mingsheng Long, Jianmin Wang, Qiang Yang | In this work, we propose a unified approach to PDA, Example Transfer Network (ETN), which jointly learns domain-invariant representations across domains and a progressive weighting scheme to quantify the transferability of source examples. |
301 | Generalized Zero-Shot Recognition Based on Visually Semantic Embedding | Pengkai Zhu, Hanxiao Wang, Venkatesh Saligrama | We propose a novel Generalized Zero-Shot learning (GZSL) method that is agnostic to both unseen images and unseen semantic vectors during training. |
302 | Towards Visual Feature Translation | Jie Hu, Rongrong Ji, Hong Liu, Shengchuan Zhang, Cheng Deng, Qi Tian | In this paper, we make the first attempt towards visual feature translation to break through the barrier of using features across different visual search systems. |
303 | Amodal Instance Segmentation With KINS Dataset | Lu Qi, Li Jiang, Shu Liu, Xiaoyong Shen, Jiaya Jia | In this paper, we augment KITTI with more instance pixel-level annotation for 8 categories, which we call KITTI INStance dataset (KINS). |
304 | Global Second-Order Pooling Convolutional Networks | Zilin Gao, Jiangtao Xie, Qilong Wang, Peihua Li | In this paper, we propose a novel network model introducing GSoP from lower to higher layers for exploiting holistic image information throughout a network. |
305 | Weakly Supervised Complementary Parts Models for Fine-Grained Image Classification From the Bottom Up | Weifeng Ge, Xiangru Lin, Yizhou Yu | In this paper, we approach this problem from a different perspective. |
306 | NetTailor: Tuning the Architecture, Not Just the Weights | Pedro Morgado, Nuno Vasconcelos | To address these problems, we propose a transfer learning procedure, denoted NetTailor, in which layers of a pre-trained CNN are used as universal blocks that can be combined with small task-specific layers to generate new networks. |
307 | Learning-Based Sampling for Natural Image Matting | Jingwei Tang, Yagiz Aksoy, Cengiz Oztireli, Markus Gross, Tunc Ozan Aydin | In this paper, we propose the estimation of the layer colors through the use of deep neural networks prior to the opacity estimation. |
308 | Learning Unsupervised Video Object Segmentation Through Visual Attention | Wenguan Wang, Hongmei Song, Shuyang Zhao, Jianbing Shen, Sanyuan Zhao, Steven C. H. Hoi, Haibin Ling | This paper conducts a systematic study on the role of visual attention in Unsupervised Video Object Segmentation (UVOS) tasks. |
309 | 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks | Christopher Choy, JunYoung Gwak, Silvio Savarese | In this work, we propose 4-dimensional convolutional neural networks for spatio-temporal perception that can directly process such 3D-videos using high-dimensional convolutions. |
310 | Pyramid Feature Attention Network for Saliency Detection | Ting Zhao, Xiangqian Wu | To address this problem, a novel CNN named pyramid feature attention network (PFAN) is proposed to enhance the high-level context features and the low-level spatial structural features. |
311 | Co-Saliency Detection via Mask-Guided Fully Convolutional Networks With Multi-Scale Label Smoothing | Kaihua Zhang, Tengpeng Li, Bo Liu, Qingshan Liu | In this paper, we propose a hierarchical image co-saliency detection framework as a coarse to fine strategy to capture this pattern. |
312 | SAIL-VOS: Semantic Amodal Instance Level Video Object Segmentation – A Synthetic Dataset and Baselines | Yuan-Ting Hu, Hong-Shuo Chen, Kexin Hui, Jia-Bin Huang, Alexander G. Schwing | We introduce SAIL-VOS (Semantic Amodal Instance Level Video Object Segmentation), a new dataset aiming to stimulate semantic amodal segmentation research. To address this issue, we present a synthetic dataset extracted from the photo-realistic game GTA-V. |
313 | Learning Instance Activation Maps for Weakly Supervised Instance Segmentation | Yi Zhu, Yanzhao Zhou, Huijuan Xu, Qixiang Ye, David Doermann, Jianbin Jiao | In this work, we tackle this challenging problem by using a novel instance extent filling approach. |
314 | Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation | Zhi Tian, Tong He, Chunhua Shen, Youliang Yan | In this work, we propose a data-dependent upsampling (DUpsampling) to replace bilinear upsampling, which takes advantage of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs. |
315 | Box-Driven Class-Wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation | Chunfeng Song, Yan Huang, Wanli Ouyang, Liang Wang | In this paper, we first introduce a box-driven class-wise masking model (BCM) to remove irrelevant regions of each class. Moreover, based on the pixel-level segment proposal generated from the bounding box supervision, we can calculate the mean filling rate of each class to serve as an important prior cue; we then propose a filling rate guided adaptive loss (FR-Loss) to help the model ignore the wrongly labeled pixels in proposals. |
316 | Dual Attention Network for Scene Segmentation | Jun Fu, Jing Liu, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang, Hanqing Lu | In this paper, we address the scene segmentation task by capturing rich contextual dependencies based on the self-attention mechanism. |
317 | InverseRenderNet: Learning Single Image Inverse Rendering | Ye Yu, William A. P. Smith | We show how to train a fully convolutional neural network to perform inverse rendering from a single, uncontrolled image. |
318 | A Variational Auto-Encoder Model for Stochastic Point Processes | Nazanin Mehrasa, Akash Abdu Jyothi, Thibaut Durand, Jiawei He, Leonid Sigal, Greg Mori | We propose a novel probabilistic generative model for action sequences. |
319 | Unifying Heterogeneous Classifiers With Distillation | Jayakorn Vongkulbhisal, Phongtharin Vinayavekhin, Marco Visentini-Scarzanella | In this paper, we study the problem of unifying knowledge from a set of classifiers with different architectures and target classes into a single classifier, given only a generic set of unlabelled data. |
320 | Assessment of Faster R-CNN in Man-Machine Collaborative Search | Arturo Deza, Amit Surana, Miguel P. Eckstein | With the advent of modern expert systems driven by deep learning that supplement human experts (e.g. radiologists, dermatologists, surveillance scanners), we analyze how and when such expert systems enhance human performance in a fine-grained small target visual search task. |
321 | OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge | Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi | In this paper, we address the task of knowledge-based visual question answering and provide a benchmark, called OK-VQA, where the image content is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources. |
322 | NDDR-CNN: Layerwise Feature Fusing in Multi-Task CNNs by Neural Discriminative Dimensionality Reduction | Yuan Gao, Jiayi Ma, Mingbo Zhao, Wei Liu, Alan L. Yuille | In this paper, we propose a novel Convolutional Neural Network (CNN) structure for general-purpose multi-task learning (MTL), which enables automatic feature fusing at every layer from different tasks. |
323 | Spectral Metric for Dataset Complexity Assessment | Frederic Branchaud-Charron, Andrew Achkar, Pierre-Marc Jodoin | In this paper, we propose a new measure to gauge the complexity of image classification problems. |
324 | ADCrowdNet: An Attention-Injective Deformable Convolutional Network for Crowd Understanding | Ning Liu, Yongchao Long, Changqing Zou, Qun Niu, Li Pan, Hefeng Wu | We propose an attention-injective deformable convolutional network called ADCrowdNet for crowd understanding that can address the accuracy degradation problem of highly congested noisy scenes. |
325 | VERI-Wild: A Large Dataset and a New Method for Vehicle Re-Identification in the Wild | Yihang Lou, Yan Bai, Jun Liu, Shiqi Wang, Lingyu Duan | To promote the research of vehicle ReID in the wild, we collect a new dataset called VERI-Wild with the following distinct features: 1) The vehicle images are captured by a large surveillance system containing 174 cameras covering a large urban district (more than 200 km^2). 2) The camera network continuously captures vehicles 24 hours a day for 1 month. |
326 | 3D Local Features for Direct Pairwise Registration | Haowen Deng, Tolga Birdal, Slobodan Ilic | We present a novel, data driven approach for solving the problem of registration of two point cloud scans. |
327 | HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-Scale Point Clouds | Xiuye Gu, Yijie Wang, Chongruo Wu, Yong Jae Lee, Panqu Wang | We present a novel deep neural network architecture for end-to-end scene flow estimation that directly operates on large-scale 3D point clouds. |
328 | GPSfM: Global Projective SFM Using Algebraic Constraints on Multi-View Fundamental Matrices | Yoni Kasten, Amnon Geifman, Meirav Galun, Ronen Basri | This paper addresses the problem of recovering projective camera matrices from collections of fundamental matrices in multiview settings. |
329 | Group-Wise Correlation Stereo Network | Xiaoyang Guo, Kai Yang, Wukui Yang, Xiaogang Wang, Hongsheng Li | In this paper, we propose to construct the cost volume by group-wise correlation. |
330 | Multi-Level Context Ultra-Aggregation for Stereo Matching | Guang-Yu Nie, Ming-Ming Cheng, Yun Liu, Zhengfa Liang, Deng-Ping Fan, Yue Liu, Yongtian Wang | In this paper, we propose a unary features descriptor using multi-level context ultra-aggregation (MCUA), which encapsulates all convolutional features into a more discriminative representation by intra- and inter-level features combination. |
331 | Large-Scale, Metric Structure From Motion for Unordered Light Fields | Sotiris Nousias, Manolis Lourakis, Christos Bergeles | This paper presents a large scale, metric Structure from Motion (SfM) pipeline for generalised cameras with overlapping fields-of-view, and demonstrates it using Light Field (LF) images. |
332 | Understanding the Limitations of CNN-Based Absolute Camera Pose Regression | Torsten Sattler, Qunjie Zhou, Marc Pollefeys, Laura Leal-Taixe | To understand this behavior, we develop a theoretical model for camera pose regression. |
333 | DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene From Sparse LiDAR Data and Single Color Image | Jiaxiong Qiu, Zhaopeng Cui, Yinda Zhang, Xingdi Zhang, Shuaicheng Liu, Bing Zeng, Marc Pollefeys | In this paper, we propose a deep learning architecture that produces accurate dense depth for the outdoor scene from a single color image and a sparse depth. |
334 | Modeling Point Clouds With Self-Attention and Gumbel Subset Sampling | Jiancheng Yang, Qiang Zhang, Bingbing Ni, Linguo Li, Jinxian Liu, Mengdie Zhou, Qi Tian | We thereby propose, for the first time, an end-to-end learnable and task-agnostic sampling operation, named Gumbel Subset Sampling (GSS), to select a representative subset of input points. |
335 | Learning With Batch-Wise Optimal Transport Loss for 3D Shape Recognition | Lin Xu, Han Sun, Yuai Liu | In this paper, we show how to learn an importance-driven distance metric via optimal transport programming from batches of samples. |
336 | DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion | Chen Wang, Danfei Xu, Yuke Zhu, Roberto Martin-Martin, Cewu Lu, Li Fei-Fei, Silvio Savarese | In this work, we present DenseFusion, a generic framework for estimating 6D pose of a set of known objects from RGB-D images. |
337 | Dense Depth Posterior (DDP) From Single Image and Sparse Range | Yanchao Yang, Alex Wong, Stefano Soatto | We present a deep learning system to infer the posterior distribution of a dense depth map associated with an image, by exploiting sparse range measurements, for instance from a lidar. |
338 | DuLa-Net: A Dual-Projection Network for Estimating Room Layouts From a Single RGB Panorama | Shang-Ta Yang, Fu-En Wang, Chi-Han Peng, Peter Wonka, Min Sun, Hung-Kuo Chu | We present a deep learning framework, called DuLa-Net, to predict Manhattan-world 3D room layouts from a single RGB panorama. To learn more complex room layouts, we introduce the Realtor360 dataset that contains panoramas of Manhattan-world room layouts with different numbers of corners. |
339 | Veritatem Dies Aperit – Temporally Consistent Depth Prediction Enabled by a Multi-Task Geometric and Semantic Scene Understanding Approach | Amir Atapour-Abarghouei, Toby P. Breckon | In this paper, we propose a multi-task learning-based approach capable of jointly performing geometric and semantic scene understanding, namely depth prediction (monocular depth estimation and depth completion) and semantic scene segmentation. |
340 | Segmentation-Driven 6D Object Pose Estimation | Yinlin Hu, Joachim Hugonot, Pascal Fua, Mathieu Salzmann | In this paper, we introduce a segmentation-driven 6D pose estimation framework where each visible part of the objects contributes a local pose prediction in the form of 2D keypoint locations. |
341 | Exploiting Temporal Context for 3D Human Pose Estimation in the Wild | Anurag Arnab, Carl Doersch, Andrew Zisserman | We present a bundle-adjustment-based algorithm for recovering accurate 3D human pose and meshes from monocular videos. Building upon our algorithm, we present a new dataset of more than 3 million frames of YouTube videos from Kinetics with automatically generated 3D poses and meshes. |
342 | What Do Single-View 3D Reconstruction Networks Learn? | Maxim Tatarchenko, Stephan R. Richter, Rene Ranftl, Zhuwen Li, Vladlen Koltun, Thomas Brox | In this work, we set up two alternative approaches that perform image classification and retrieval respectively. |
343 | UniformFace: Learning Deep Equidistributed Representation for Face Recognition | Yueqi Duan, Jiwen Lu, Jie Zhou | In this paper, we propose a new supervision objective named uniform loss to learn deep equidistributed representations for face recognition. |
344 | Semantic Graph Convolutional Networks for 3D Human Pose Regression | Long Zhao, Xi Peng, Yu Tian, Mubbasir Kapadia, Dimitris N. Metaxas | In this paper, we study the problem of learning Graph Convolutional Networks (GCNs) for regression. |
345 | Mask-Guided Portrait Editing With Conditional GANs | Shuyang Gu, Jianmin Bao, Hao Yang, Dong Chen, Fang Wen, Lu Yuan | In this paper, we argue about three issues in existing techniques: diversity, quality, and controllability for portrait synthesis and editing. |
346 | Group Sampling for Scale Invariant Face Detection | Xiang Ming, Fangyun Wei, Ting Zhang, Dong Chen, Fang Wen | In this paper, we carefully examine the factors affecting face detection across a large range of scales, and conclude that the balance of training samples, including both positive and negative ones, at different scales is the key. |
347 | Joint Representation and Estimator Learning for Facial Action Unit Intensity Estimation | Yong Zhang, Baoyuan Wu, Weiming Dong, Zhifeng Li, Wei Liu, Bao-Gang Hu, Qiang Ji | In this paper, a novel general framework for AU intensity estimation is presented, which differs from traditional estimation methods in two aspects. |
348 | Semantic Alignment: Finding Semantically Consistent Ground-Truth for Facial Landmark Detection | Zhiwei Liu, Xiangyu Zhu, Guosheng Hu, Haiyun Guo, Ming Tang, Zhen Lei, Neil M. Robertson, Jinqiao Wang | In this paper, we propose a novel probabilistic model which introduces a latent variable, i.e. ‘real’ groundtruth which is semantically consistent, to optimize. |
349 | LAEO-Net: Revisiting People Looking at Each Other in Videos | Manuel J. Marin-Jimenez, Vicky Kalogeiton, Pablo Medina-Suarez, Andrew Zisserman | For this purpose, we propose LAEO-Net, a new deep CNN for determining LAEO in videos. Moreover, we introduce two new LAEO datasets: UCO-LAEO and AVA-LAEO. |
350 | Robust Facial Landmark Detection via Occlusion-Adaptive Deep Networks | Meilu Zhu, Daming Shi, Mingjie Zheng, Muhammad Sadiq | In this paper, we present a simple and effective framework called Occlusion-adaptive Deep Networks (ODN) with the purpose of solving the occlusion problem for facial landmark detection. |
351 | Learning Individual Styles of Conversational Gesture | Shiry Ginosar, Amir Bar, Gefen Kohavi, Caroline Chan, Andrew Owens, Jitendra Malik | We present a method for cross-modal translation from “in-the-wild” monologue speech of a single speaker to their conversational gesture motion. To support research toward obtaining a computational understanding of the relationship between gesture and speech, we release a large video dataset of person-specific gestures. |
352 | Face Anti-Spoofing: Model Matters, so Does Data | Xiao Yang, Wenhan Luo, Linchao Bao, Yuan Gao, Dihong Gong, Shibao Zheng, Zhifeng Li, Wei Liu | In this paper, we present a data collection solution along with a data synthesis technique to simulate digital medium-based face spoofing attacks, which can easily help us obtain a large amount of training data well reflecting the real-world scenarios. |
353 | Fast Human Pose Estimation | Feng Zhang, Xiatian Zhu, Mao Ye | In this work, we investigate the under-studied but practically critical pose model efficiency problem. |
354 | Decorrelated Adversarial Learning for Age-Invariant Face Recognition | Hao Wang, Dihong Gong, Zhifeng Li, Wei Liu | To implement this idea, we propose the Decorrelated Adversarial Learning (DAL) algorithm, where a Canonical Mapping Module (CMM) is introduced to find maximum correlation of the paired features generated by the backbone network, while the backbone network and the factorization module are trained to generate features reducing the correlation. |
355 | Cross-Task Weakly Supervised Learning From Instructional Videos | Dimitri Zhukov, Jean-Baptiste Alayrac, Ramazan Gokberk Cinbis, David Fouhey, Ivan Laptev, Josef Sivic | In this paper we investigate learning visual models for the steps of ordinary tasks using weak supervision via instructional narrations and an ordered list of steps instead of strong supervision via temporal annotations. |
356 | D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation | Chien-Yi Chang, De-An Huang, Yanan Sui, Li Fei-Fei, Juan Carlos Niebles | We propose Discriminative Differentiable Dynamic Time Warping (D3TW), the first discriminative model using weak ordering supervision. |
357 | Progressive Teacher-Student Learning for Early Action Prediction | Xionghui Wang, Jian-Fang Hu, Jian-Huang Lai, Jianguo Zhang, Wei-Shi Zheng | In this paper, we aim at improving early action prediction by proposing a novel teacher-student learning framework. |
358 | Social Relation Recognition From Videos via Multi-Scale Spatial-Temporal Reasoning | Xinchen Liu, Wu Liu, Meng Zhang, Jingwen Chen, Lianli Gao, Chenggang Yan, Tao Mei | To overcome these challenges, we propose a Multi-scale Spatial-Temporal Reasoning (MSTR) framework to recognize social relations from videos. |
359 | MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation | Yazan Abu Farha, Jurgen Gall | In this paper, we introduce a multi-stage architecture for the temporal action segmentation task. |
360 | Transferable Interactiveness Knowledge for Human-Object Interaction Detection | Yong-Lu Li, Siyuan Zhou, Xijie Huang, Liang Xu, Ze Ma, Hao-Shu Fang, Yanfeng Wang, Cewu Lu | In this paper, we explore Interactiveness Knowledge which indicates whether human and object interact with each other or not. |
361 | Actional-Structural Graph Convolutional Networks for Skeleton-Based Action Recognition | Maosen Li, Siheng Chen, Xu Chen, Ya Zhang, Yanfeng Wang, Qi Tian | To capture richer dependencies, we introduce an encoder-decoder structure, called A-link inference module, to capture action-specific latent dependencies, i.e. actional links, directly from actions. |
362 | Multi-Granularity Generator for Temporal Action Proposal | Yuan Liu, Lin Ma, Yifeng Zhang, Wei Liu, Shih-Fu Chang | In this paper, we propose a multi-granularity generator (MGG) to perform the temporal action proposal from different granularity perspectives, relying on the video visual features equipped with the position embedding information. |
363 | Deep Rigid Instance Scene Flow | Wei-Chiu Ma, Shenlong Wang, Rui Hu, Yuwen Xiong, Raquel Urtasun | In this paper we tackle the problem of scene flow estimation in the context of self-driving. |
364 | See More, Know More: Unsupervised Video Object Segmentation With Co-Attention Siamese Networks | Xiankai Lu, Wenguan Wang, Chao Ma, Jianbing Shen, Ling Shao, Fatih Porikli | We propose a unified and end-to-end trainable framework where different co-attention variants can be derived for mining the rich context within videos. |
365 | Patch-Based Discriminative Feature Learning for Unsupervised Person Re-Identification | Qize Yang, Hong-Xing Yu, Ancong Wu, Wei-Shi Zheng | In this work, we overcome this problem by proposing a patch-based unsupervised learning framework in order to learn discriminative features from patches instead of whole images. |
366 | SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking | Guangting Wang, Chong Luo, Zhiwei Xiong, Wenjun Zeng | In this paper, we propose a SiamFC-based tracker, named SPM-Tracker, to tackle this challenge. |
367 | Spatial Fusion GAN for Image Synthesis | Fangneng Zhan, Hongyuan Zhu, Shijian Lu | This paper presents an innovative Spatial Fusion GAN (SF-GAN) that combines a geometry synthesizer and an appearance synthesizer to achieve synthesis realism in both geometry and appearance spaces. |
368 | Text Guided Person Image Synthesis | Xingran Zhou, Siyu Huang, Bin Li, Yingming Li, Jiachen Li, Zhongfei Zhang | This paper presents a novel method to manipulate the visual appearance (pose and attribute) of a person image according to natural language descriptions. |
369 | STGAN: A Unified Selective Transfer Network for Arbitrary Image Attribute Editing | Ming Liu, Yukang Ding, Min Xia, Xiao Liu, Errui Ding, Wangmeng Zuo, Shilei Wen | In this work, we suggest to address these issues from selective transfer perspective. |
370 | Towards Instance-Level Image-To-Image Translation | Zhiqiang Shen, Mingyang Huang, Jianping Shi, Xiangyang Xue, Thomas S. Huang | In this paper, we present a simple yet effective instance-aware image-to-image translation approach (INIT), which applies fine-grained local (instance) and global styles to the target image spatially. We also collect a large-scale benchmark for the new instance-level translation task. |
371 | Dense Intrinsic Appearance Flow for Human Pose Transfer | Yining Li, Chen Huang, Chen Change Loy | We present a novel approach for the task of human pose transfer, which aims at synthesizing a new image of a person from an input image of that person and a target pose. |
372 | Depth-Aware Video Frame Interpolation | Wenbo Bao, Wei-Sheng Lai, Chao Ma, Xiaoyun Zhang, Zhiyong Gao, Ming-Hsuan Yang | In this work, we propose a video frame interpolation method which explicitly detects the occlusion by exploring the depth information. |
373 | Sliced Wasserstein Generative Models | Jiqing Wu, Zhiwu Huang, Dinesh Acharya, Wen Li, Janine Thoma, Danda Pani Paudel, Luc Van Gool | In this paper, we introduce novel approximations of the primal and dual SWD. |
374 | Deep Flow-Guided Video Inpainting | Rui Xu, Xiaoxiao Li, Bolei Zhou, Chen Change Loy | In this work we propose a novel flow-guided video inpainting approach. |
375 | Video Generation From Single Semantic Label Map | Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang | This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process. |
376 | Polarimetric Camera Calibration Using an LCD Monitor | Zhixiang Wang, Yinqiang Zheng, Yung-Yu Chuang | In this paper, we propose to jointly calibrate the polarizer angles and the inverse CRF (ICRF) using a slightly adapted checker pattern displayed on a liquid crystal display (LCD) monitor. |
377 | Fully Automatic Video Colorization With Self-Regularization and Diversity | Chenyang Lei, Qifeng Chen | We present a fully automatic approach to video colorization with self-regularization and diversity. |
378 | Zoom to Learn, Learn to Zoom | Xuaner Zhang, Qifeng Chen, Ren Ng, Vladlen Koltun | This paper shows that when applying machine learning to digital zoom, it is beneficial to operate on real, RAW sensor data. |
379 | Single Image Reflection Removal Beyond Linearity | Qiang Wen, Yinjie Tan, Jing Qin, Wenxi Liu, Guoqiang Han, Shengfeng He | In this paper, we inject non-linearity into reflection removal from two aspects. |
380 | Learning to Separate Multiple Illuminants in a Single Image | Zhuo Hui, Ayan Chakrabarti, Kalyan Sunkavalli, Aswin C. Sankaranarayanan | We present a method to separate a single image captured under two illuminants, with different spectra, into the two images corresponding to the appearance of the scene under each individual illuminant. |
381 | Shape Unicode: A Unified Shape Representation | Sanjeev Muralikrishnan, Vladimir G. Kim, Matthew Fisher, Siddhartha Chaudhuri | We propose a unified code for 3D shapes, dubbed Shape Unicode, that imbibes shape cues across these representations into a single code, and a novel framework to learn such a code space for any 3D shape dataset. |
382 | Robust Video Stabilization by Optimization in CNN Weight Space | Jiyang Yu, Ravi Ramamoorthi | We propose a novel robust video stabilization method. |
383 | Learning Linear Transformations for Fast Image and Video Style Transfer | Xueting Li, Sifei Liu, Jan Kautz, Ming-Hsuan Yang | In this work, we present an approach for universal style transfer that learns the transformation matrix in a data-driven fashion. |
384 | Local Detection of Stereo Occlusion Boundaries | Jialiang Wang, Todd Zickler | This paper describes the local signatures for stereo occlusion boundaries that exist in a stereo cost volume, and it introduces a local detector for them based on a simple feedforward network with relatively small receptive fields. |
385 | Bi-Directional Cascade Network for Perceptual Edge Detection | Jianzhong He, Shiliang Zhang, Ming Yang, Yanhu Shan, Tiejun Huang | To extract edges at dramatically different scales, we propose a Bi-Directional Cascade Network (BDCN) structure, where an individual layer is supervised by labeled edges at its specific scale, rather than directly applying the same supervision to all CNN outputs. |
386 | Single Image Deraining: A Comprehensive Benchmark Analysis | Siyuan Li, Iago Breno Araujo, Wenqi Ren, Zhangyang Wang, Eric K. Tokuda, Roberto Hirata Junior, Roberto Cesar-Junior, Jiawan Zhang, Xiaojie Guo, Xiaochun Cao | We present a comprehensive study and evaluation of existing single image deraining algorithms, using a new large-scale benchmark consisting of both synthetic and real-world rainy images. This dataset highlights diverse data sources and image contents, and is divided into three subsets (rain streak, rain drop, rain and mist), each serving different training or evaluation purposes. |
387 | Dynamic Scene Deblurring With Parameter Selective Sharing and Nested Skip Connections | Hongyun Gao, Xin Tao, Xiaoyong Shen, Jiaya Jia | Inside the subnetwork of each scale, we propose a nested skip connection structure for the nonlinear transformation modules to replace stacked convolution layers or residual blocks. In addition, we build a new large dataset of blurred/sharp image pairs towards better restoration quality. |
388 | Events-To-Video: Bringing Modern Computer Vision to Event Cameras | Henri Rebecq, Rene Ranftl, Vladlen Koltun, Davide Scaramuzza | In this work, we take a different view and propose to apply existing, mature computer vision techniques to videos reconstructed from event data. |
389 | Feedback Network for Image Super-Resolution | Zhen Li, Jinglei Yang, Zheng Liu, Xiaomin Yang, Gwanggil Jeon, Wei Wu | In this paper, we propose an image super-resolution feedback network (SRFBN) to refine low-level representations with high-level information. |
390 | Semi-Supervised Transfer Learning for Image Rain Removal | Wei Wei, Deyu Meng, Qian Zhao, Zongben Xu, Ying Wu | Semi-Supervised Transfer Learning for Image Rain Removal. |
391 | EventNet: Asynchronous Recursive Event Processing | Yusuke Sekikawa, Kosuke Hara, Hideo Saito | We propose EventNet, a neural network designed for real-time processing of asynchronous event streams in a recursive and event-wise manner. |
392 | Recurrent Back-Projection Network for Video Super-Resolution | Muhammad Haris, Gregory Shakhnarovich, Norimichi Ukita | We propose a novel architecture for the problem of video super-resolution. We also propose a new video super-resolution benchmark, allowing evaluation at a larger scale and considering videos in different motion regimes. |
393 | Cascaded Partial Decoder for Fast and Accurate Salient Object Detection | Zhe Wu, Li Su, Qingming Huang | In this paper, we propose a novel Cascaded Partial Decoder (CPD) framework for fast and accurate salient object detection. |
394 | A Simple Pooling-Based Design for Real-Time Salient Object Detection | Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Jiashi Feng, Jianmin Jiang | We solve the problem of salient object detection by investigating how to expand the role of pooling in convolutional neural networks. |
395 | Contrast Prior and Fluid Pyramid Integration for RGBD Salient Object Detection | Jia-Xing Zhao, Yang Cao, Deng-Ping Fan, Ming-Ming Cheng, Xuan-Yi Li, Le Zhang | In this paper, we incorporate the contrast prior, which used to be a dominant cue in non-deep-learning SOD approaches, into a CNN-based architecture to enhance the depth information. |
396 | Progressive Image Deraining Networks: A Better and Simpler Baseline | Dongwei Ren, Wangmeng Zuo, Qinghua Hu, Pengfei Zhu, Deyu Meng | To handle this issue, this paper provides a better and simpler baseline deraining network by considering network architecture, input and output, and loss functions. |
397 | GSPN: Generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud | Li Yi, Wang Zhao, He Wang, Minhyuk Sung, Leonidas J. Guibas | We introduce a novel 3D object proposal approach named Generative Shape Proposal Network (GSPN) for instance segmentation in point cloud data. |
398 | Attentive Relational Networks for Mapping Images to Scene Graphs | Mengshi Qi, Weijian Li, Zhengyuan Yang, Yunhong Wang, Jiebo Luo | In this study, we propose a novel Attentive Relational Network that consists of two key modules with an object detection backbone to approach this problem. |
399 | Relational Knowledge Distillation | Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho | We introduce a novel approach, dubbed relational knowledge distillation (RKD), that transfers mutual relations of data examples instead. |
400 | Compressing Convolutional Neural Networks via Factorized Convolutional Filters | Tuanhui Li, Baoyuan Wu, Yujiu Yang, Yanbo Fan, Yong Zhang, Wei Liu | In this work, we propose to conduct filter selection and filter learning simultaneously, in a unified model. |
401 | On the Intrinsic Dimensionality of Image Representations | Sixue Gong, Vishnu Naresh Boddeti, Anil K. Jain | This paper addresses the following problems pertaining to the intrinsic dimensionality of any given image representation: (i) estimating its intrinsic dimensionality, (ii) developing a deep neural network based non-linear mapping, dubbed DeepMDS, that transforms the ambient representation to the minimal intrinsic space, and (iii) validating the veracity of the mapping through image matching in the intrinsic space. |
402 | Part-Regularized Near-Duplicate Vehicle Re-Identification | Bing He, Jia Li, Yifan Zhao, Yonghong Tian | In this paper, we propose a simple but efficient part-regularized discriminative feature preserving method which enhances the perception of subtle discrepancies. |
403 | Self-Supervised Spatio-Temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics | Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu | In this paper we propose a novel self-supervised approach to learn spatio-temporal features for video representation. |
404 | Classification-Reconstruction Learning for Open-Set Recognition | Ryota Yoshihashi, Wen Shao, Rei Kawakami, Shaodi You, Makoto Iida, Takeshi Naemura | In contrast, we train networks for joint classification and reconstruction of input data. |
405 | Emotion-Aware Human Attention Prediction | Macario O. Cordel II, Shaojing Fan, Zhiqi Shen, Mohan S. Kankanhalli | In this work, we investigate the relation between object sentiment and human attention. |
406 | Residual Regression With Semantic Prior for Crowd Counting | Jia Wan, Wenhan Luo, Baoyuan Wu, Antoni B. Chan, Wei Liu | In this paper, a residual regression framework is proposed for crowd counting utilizing the correlation information among samples. |
407 | Context-Reinforced Semantic Segmentation | Yizhou Zhou, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng | In this paper, we propose a dedicated module, Context Net, to better explore the context information in p-maps. |
408 | Adversarial Structure Matching for Structured Prediction Tasks | Jyh-Jing Hwang, Tsung-Wei Ke, Jianbo Shi, Stella X. Yu | We, on the other hand, approach this problem from an opposing angle and propose a new framework, Adversarial Structure Matching (ASM), for training such structured prediction networks via an adversarial process, in which we train a structure analyzer that provides the supervisory signals, the ASM loss. |
409 | Deep Spectral Clustering Using Dual Autoencoder Network | Xu Yang, Cheng Deng, Feng Zheng, Junchi Yan, Wei Liu | In this paper, we propose a joint learning framework for discriminative embedding and spectral clustering. |
410 | Deep Asymmetric Metric Learning via Rich Relationship Mining | Xinyi Xu, Yanhua Yang, Cheng Deng, Feng Zheng | This motivates us to propose a novel framework, named deep asymmetric metric learning via rich relationship mining (DAMLRRM), to mine rich relationship under satisfying sampling size. |
411 | Did It Change? Learning to Detect Point-Of-Interest Changes for Proactive Map Updates | Jerome Revaud, Minhyeok Heo, Rafael S. Rezende, Chanmi You, Seong-Gyun Jeong | Motivated by the broad availability of geo-tagged street-view images, we propose a new task aiming to make the map update process more proactive. Faced with the lack of an appropriate benchmark, we build and release a large dataset, captured in two large shopping centers, that comprises 33K geo-localized images and 578 POIs. |
412 | Associatively Segmenting Instances and Semantics in Point Clouds | Xinlong Wang, Shu Liu, Xiaoyong Shen, Chunhua Shen, Jiaya Jia | In this paper, we first introduce a simple and flexible framework to segment instances and semantics in point clouds simultaneously. Then, we propose two approaches which make the two tasks take advantage of each other, leading to a win-win situation. |
413 | Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation | Zhenyu Zhang, Zhen Cui, Chunyan Xu, Yan Yan, Nicu Sebe, Jian Yang | In this paper, we propose a novel Pattern-Affinitive Propagation (PAP) framework to jointly predict depth, surface normal and semantic segmentation. |
414 | Scene Categorization From Contours: Medial Axis Based Salience Measures | Morteza Rezanejad, Gabriel Downs, John Wilder, Dirk B. Walther, Allan Jepson, Sven Dickinson, Kaleem Siddiqi | Specifically, we use off-the-shelf pre-trained Convolutional Neural Networks (CNNs) to perform scene classification given only contour information as input, and find performance levels well above chance. |
415 | Unsupervised Image Captioning | Yang Feng, Lin Ma, Wei Liu, Jiebo Luo | In this paper, we make the first attempt to train an image captioning model in an unsupervised manner. |
416 | Exact Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables | Yan Xu, Baoyuan Wu, Fumin Shen, Yanbo Fan, Yong Zhang, Heng Tao Shen, Wei Liu | In this work, we study the robustness of a CNN+RNN based image captioning system when subjected to adversarial noise. |
417 | Cross-Modal Relationship Inference for Grounding Referring Expressions | Sibei Yang, Guanbin Li, Yizhou Yu | In this paper, we propose a Cross-Modal Relationship Extractor (CMRE) to adaptively highlight objects and relationships, that have connections with a given expression, with a cross-modal attention mechanism, and represent the extracted information as a language-guided visual relation graph. |
418 | What’s to Know? Uncertainty as a Guide to Asking Goal-Oriented Questions | Ehsan Abbasnejad, Qi Wu, Qinfeng Shi, Anton van den Hengel | We propose a solution to this problem based on a Bayesian model of the uncertainty in the implicit model maintained by the visual dialogue agent, and in the function used to select an appropriate output. |
419 | Iterative Alignment Network for Continuous Sign Language Recognition | Junfu Pu, Wengang Zhou, Houqiang Li | In this paper, we propose an alignment network with iterative optimization for weakly supervised continuous sign language recognition. |
420 | Neural Sequential Phrase Grounding (SeqGROUND) | Pelin Dogan, Leonid Sigal, Markus Gross | We propose an end-to-end approach for phrase grounding in images. |
421 | CLEVR-Ref+: Diagnosing Visual Reasoning With Referring Expressions | Runtao Liu, Chenxi Liu, Yutong Bai, Alan L. Yuille | In particular, we present two interesting and important findings using IEP-Ref: (1) the module trained to transform feature maps into segmentation masks can be attached to any intermediate module to reveal the entire reasoning process step-by-step; (2) even though all training data contain at least one referred object, IEP-Ref can correctly predict no foreground when presented with false-premise referring expressions. To address these issues and complement similar efforts in visual question answering, we build CLEVR-Ref+, a synthetic diagnostic dataset for referring expression comprehension. We will release data and code for CLEVR-Ref+. |
422 | Describing Like Humans: On Diversity in Image Captioning | Qingzhong Wang, Antoni B. Chan | In this paper, we propose a new metric for measuring the diversity of image captions, which is derived from latent semantic analysis and kernelized to use CIDEr similarity. |
423 | MSCap: Multi-Style Image Captioning With Unpaired Stylized Text | Longteng Guo, Jing Liu, Peng Yao, Jiangwei Li, Hanqing Lu | In this paper, we propose an adversarial learning network for the task of multi-style image captioning (MSCap) with a standard factual image caption dataset and a multi-stylized language corpus without paired images. |
424 | CRAVES: Controlling Robotic Arm With a Vision-Based Economic System | Yiming Zuo, Weichao Qiu, Lingxi Xie, Fangwei Zhong, Yizhou Wang, Alan L. Yuille | In this paper, we present an alternative solution, which uses a 3D model to create a large number of synthetic data, trains a vision model in this virtual domain, and applies it to real-world images after domain adaptation. |
425 | Networks for Joint Affine and Non-Parametric Image Registration | Zhengyang Shen, Xu Han, Zhenlin Xu, Marc Niethammer | We introduce an end-to-end deep-learning framework for 3D medical image registration. |
426 | Learning Shape-Aware Embedding for Scene Text Detection | Zhuotao Tian, Michelle Shu, Pengyuan Lyu, Ruiyu Li, Chao Zhou, Xiaoyong Shen, Jiaya Jia | Specifically, we treat text detection as instance segmentation and propose a segmentation-based framework, which extracts each text instance as an independent connected component. |
427 | Learning to Film From Professional Human Motion Videos | Chong Huang, Chuan-En Lin, Zhenyu Yang, Yan Kong, Peng Chen, Xin Yang, Kwang-Ting Cheng | In this study, we propose a learning-based framework which incorporates the video contents and previous camera motions to predict the future camera motions that enable the capture of professional videos. |
428 | Pay Attention! – Robustifying a Deep Visuomotor Policy Through Task-Focused Visual Attention | Pooya Abolghasemi, Amir Mazaheri, Mubarak Shah, Ladislau Boloni | In this paper, we propose an approach for augmenting a deep visuomotor policy trained through demonstrations with Task Focused visual Attention (TFA). |
429 | Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence | Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon | In this paper, we propose a simple yet effective framework for fast blind video decaptioning. |
430 | Learning Video Representations From Correspondence Proposals | Xingyu Liu, Joon-Young Lee, Hailin Jin | In this paper, we propose a novel neural network that learns video representations by aggregating information from potential correspondences. |
431 | SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks | Bo Li, Wei Wu, Qiang Wang, Fangyi Zhang, Junliang Xing, Junjie Yan | In this work, we prove that the core reason comes from the lack of strict translation invariance. |
432 | Sphere Generative Adversarial Network Based on Geometric Moment Matching | Sung Woo Park, Junseok Kwon | In this paper, we mathematically prove the good properties of sphere GAN. |
433 | Adversarial Attacks Beyond the Image Space | Xiaohui Zeng, Chenxi Liu, Yu-Siang Wang, Weichao Qiu, Lingxi Xie, Yu-Wing Tai, Chi-Keung Tang, Alan L. Yuille | Most existing approaches generate perturbations in the image space, i.e., each pixel can be modified independently. However, in this paper we pay special attention to the subset of adversarial examples that correspond to meaningful changes in 3D physical properties (like rotation and translation, illumination condition, etc.). |
434 | Evading Defenses to Transferable Adversarial Examples by Translation-Invariant Attacks | Yinpeng Dong, Tianyu Pang, Hang Su, Jun Zhu | In this paper, we propose a translation-invariant attack method to generate more transferable adversarial examples against the defense models. |
435 | Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses | Jerome Rony, Luiz G. Hafemann, Luiz S. Oliveira, Ismail Ben Ayed, Robert Sabourin, Eric Granger | In this paper, an efficient approach is proposed to generate gradient-based attacks that induce misclassifications with low L2 norm, by decoupling the direction and the norm of the adversarial perturbation that is added to the image. |
436 | A General and Adaptive Robust Loss Function | Jonathan T. Barron | We present a generalization of the Cauchy/Lorentzian, Geman-McClure, Welsch/Leclerc, generalized Charbonnier, Charbonnier/pseudo-Huber/L1-L2, and L2 loss functions. |
437 | Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration | Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, Yi Yang | To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. |
438 | Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss | Sangil Jung, Changyong Son, Seohyung Lee, Jinwoo Son, Jae-Joon Han, Youngjun Kwak, Sung Ju Hwang, Changkyu Choi | To tackle this problem, we propose to learn to quantize activations and weights via a trainable quantizer that transforms and discretizes them. |
439 | Not All Areas Are Equal: Transfer Learning for Semantic Segmentation via Hierarchical Region Selection | Ruoqi Sun, Xinge Zhu, Chongruo Wu, Chen Huang, Jianping Shi, Lizhuang Ma | In this paper, we consider transfer learning for semantic segmentation that aims to mitigate the gap between abundant synthetic data (source domain) and limited real data (target domain). |
440 | Unsupervised Learning of Dense Shape Correspondence | Oshri Halimi, Or Litany, Emanuele Rodola, Alex M. Bronstein, Ron Kimmel | We introduce the first completely unsupervised correspondence learning approach for deformable 3D shapes. |
441 | Unsupervised Visual Domain Adaptation: A Deep Max-Margin Gaussian Process Approach | Minyoung Kim, Pritish Sahu, Behnam Gholami, Vladimir Pavlovic | In this paper, we take this principle further by proposing a more systematic and effective way to achieve hypothesis consistency using Gaussian processes (GP). |
442 | Balanced Self-Paced Learning for Generative Adversarial Clustering Network | Kamran Ghasedi, Xiaoqian Wang, Cheng Deng, Heng Huang | In this paper, we propose a deep Generative Adversarial Clustering Network (ClusterGAN), which tackles the problem of training deep clustering models in an unsupervised manner. |
443 | A Style-Based Generator Architecture for Generative Adversarial Networks | Tero Karras, Samuli Laine, Timo Aila | We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. Finally, we introduce a new, highly varied and high-quality dataset of human faces. |
444 | Parallel Optimal Transport GAN | Gil Avraham, Yan Zuo, Tom Drummond | To address these issues, we introduce an additional regularisation term which performs optimal transport in parallel within a low dimensional representation space. |
445 | 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans | Ji Hou, Angela Dai, Matthias Niessner | We introduce 3D-SIS, a novel neural network architecture for 3D semantic instance segmentation in commodity RGB-D scans. |
446 | Causes and Corrections for Bimodal Multi-Path Scanning With Structured Light | Yu Zhang, Daniel L. Lau, Ying Yu | In this paper, we present a general mathematical model to address the bimodal multi-path issue in a phase-measuring-profilometry scanner to measure the constructive and destructive interference between the two light paths, and by taking advantage of this interesting cue, separate the paths and make two decoupled phase measurements. |
447 | TextureNet: Consistent Local Parametrizations for Learning From High-Resolution Signals on Meshes | Jingwei Huang, Haotian Zhang, Li Yi, Thomas Funkhouser, Matthias Niessner, Leonidas J. Guibas | We introduce TextureNet, a neural network architecture designed to extract features from high-resolution signals associated with 3D surface meshes (e.g., color texture maps). |
448 | PlaneRCNN: 3D Plane Detection and Reconstruction From a Single Image | Chen Liu, Kihwan Kim, Jinwei Gu, Yasutaka Furukawa, Jan Kautz | This paper proposes a deep neural architecture, PlaneRCNN, that detects and reconstructs piecewise planar regions from a single RGB image. The paper also presents a new benchmark with more fine-grained plane segmentations in the ground truth, on which PlaneRCNN outperforms existing state-of-the-art methods by significant margins in the plane detection, segmentation, and reconstruction metrics. |
449 | Occupancy Networks: Learning 3D Reconstruction in Function Space | Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, Andreas Geiger | In this paper, we propose Occupancy Networks, a new representation for learning-based 3D reconstruction methods. |
450 | 3D Shape Reconstruction From Images in the Frequency Domain | Weichao Shen, Yunde Jia, Yuwei Wu | In this paper, we propose a Fourier-based method that reconstructs a 3D shape from images in a 2D space by predicting slices in the frequency domain. |
451 | SiCloPe: Silhouette-Based Clothed People | Ryota Natsume, Shunsuke Saito, Zeng Huang, Weikai Chen, Chongyang Ma, Hao Li, Shigeo Morishima | We introduce a new silhouette-based representation for modeling clothed human bodies using deep generative models. |
452 | Detailed Human Shape Estimation From a Single Image by Hierarchical Mesh Deformation | Hao Zhu, Xinxin Zuo, Sen Wang, Xun Cao, Ruigang Yang | This paper presents a novel framework to recover detailed human body shapes from a single image. |
453 | Convolutional Mesh Regression for Single-Image Human Shape Reconstruction | Nikos Kolotouros, Georgios Pavlakos, Kostas Daniilidis | In our work, we propose to relax this heavy reliance on the model’s parameter space. |
454 | H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions | Bugra Tekin, Federica Bogo, Marc Pollefeys | We present a unified framework for understanding 3D hand and object interactions in raw image sequences from egocentric RGB cameras. |
455 | Learning the Depths of Moving People by Watching Frozen People | Zhengqi Li, Tali Dekel, Forrester Cole, Richard Tucker, Noah Snavely, Ce Liu, William T. Freeman | We present a method for predicting dense depth in scenarios where both a monocular camera and people in the scene are freely moving. |
456 | Extreme Relative Pose Estimation for RGB-D Scans via Scene Completion | Zhenpei Yang, Jeffrey Z. Pan, Linjie Luo, Xiaowei Zhou, Kristen Grauman, Qixing Huang | We introduce a novel approach that extends the scope to extreme relative poses, with little or even no overlap between the input scans. |
457 | A Skeleton-Bridged Deep Learning Approach for Generating Meshes of Complex Topologies From Single RGB Images | Jiapeng Tang, Xiaoguang Han, Junyi Pan, Kui Jia, Xin Tong | To this end, we propose in this paper a skeleton-bridged, stage-wise learning approach to address the challenge. |
458 | Learning Structure-And-Motion-Aware Rolling Shutter Correction | Bingbing Zhuang, Quoc-Huy Tran, Pan Ji, Loong-Fah Cheong, Manmohan Chandraker | Our method learns from a large-scale dataset synthesized in a geometrically meaningful way where the RS effect is generated in a manner consistent with the camera motion and scene structure. |
459 | PVNet: Pixel-Wise Voting Network for 6DoF Pose Estimation | Sida Peng, Yuan Liu, Qixing Huang, Xiaowei Zhou, Hujun Bao | Instead, we introduce a Pixel-wise Voting Network (PVNet) to regress pixel-wise vectors pointing to the keypoints and use these vectors to vote for keypoint locations. We further create a Truncation LINEMOD dataset to validate the robustness of our approach against truncation. |
460 | SelFlow: Self-Supervised Learning of Optical Flow | Pengpeng Liu, Michael Lyu, Irwin King, Jia Xu | We present a self-supervised learning approach for optical flow. |
461 | Taking a Deeper Look at the Inverse Compositional Algorithm | Zhaoyang Lv, Frank Dellaert, James M. Rehg, Andreas Geiger | In this paper, we provide a modern synthesis of the classic inverse compositional algorithm for dense image alignment. |
462 | Deeper and Wider Siamese Networks for Real-Time Visual Tracking | Zhipeng Zhang, Houwen Peng | In this paper, we investigate how to leverage deeper and wider convolutional neural networks to enhance tracking robustness and accuracy. |
463 | Self-Supervised Adaptation of High-Fidelity Face Models for Monocular Performance Tracking | Jae Shin Yoon, Takaaki Shiratori, Shoou-I Yu, Hyun Soo Park | In this paper, we propose a self-supervised domain adaptation approach to enable the animation of high-fidelity face models from a commodity camera. |
464 | Diverse Generation for Multi-Agent Sports Games | Raymond A. Yeh, Alexander G. Schwing, Jonathan Huang, Kevin Murphy | In this paper, we propose a new generative model for multi-agent trajectory data, focusing on the case of multi-player sports games. |
465 | Efficient Online Multi-Person 2D Pose Tracking With Recurrent Spatio-Temporal Affinity Fields | Yaadhav Raaj, Haroon Idrees, Gines Hidalgo, Yaser Sheikh | We present an online approach to efficiently and simultaneously detect and track 2D poses of multiple people in a video sequence. |
466 | GFrames: Gradient-Based Local Reference Frame for 3D Shape Matching | Simone Melzi, Riccardo Spezialetti, Federico Tombari, Michael M. Bronstein, Luigi Di Stefano, Emanuele Rodola | We introduce GFrames, a novel local reference frame (LRF) construction for 3D meshes and point clouds. |
467 | Eliminating Exposure Bias and Metric Mismatch in Multiple Object Tracking | Andrii Maksai, Pascal Fua | In this paper, we introduce a new training procedure that confronts the algorithm to its own mistakes while explicitly attempting to minimize the number of switches, which results in better training. |
468 | Graph Convolutional Tracking | Junyu Gao, Tianzhu Zhang, Changsheng Xu | To comprehensively leverage the spatial-temporal structure of historical target exemplars and benefit from the context information, in this work we present a novel Graph Convolutional Tracking (GCT) method for high-performance visual tracking. |
469 | ATOM: Accurate Tracking by Overlap Maximization | Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg | We address this problem by proposing a novel tracking architecture, consisting of dedicated target estimation and classification components. |
470 | Visual Tracking via Adaptive Spatially-Regularized Correlation Filters | Kenan Dai, Dong Wang, Huchuan Lu, Chong Sun, Jianhua Li | In this work, we propose a novel adaptive spatially-regularized correlation filters (ASRCF) model to simultaneously optimize the filter coefficients and the spatial regularization weight. |
471 | Deep Tree Learning for Zero-Shot Face Anti-Spoofing | Yaojie Liu, Joel Stehouwer, Amin Jourabloo, Xiaoming Liu | In this work, we expand the ZSFA problem to a wide range of 13 types of spoof attacks, including print attacks, replay attacks, 3D mask attacks, and so on. In addition, to enable the study of ZSFA, we introduce the first face anti-spoofing database that contains diverse types of spoof attacks. |
472 | ArcFace: Additive Angular Margin Loss for Deep Face Recognition | Jiankang Deng, Jia Guo, Niannan Xue, Stefanos Zafeiriou | In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. |
473 | Learning Joint Gait Representation via Quintuplet Loss Minimization | Kaihao Zhang, Wenhan Luo, Lin Ma, Wei Liu, Hongdong Li | In this paper, we propose a new Joint Unique-gait and Cross-gait Network (JUCNet) to combine the advantages of unique-gait representation with those of cross-gait representation, leading to significantly improved performance. |
474 | Gait Recognition via Disentangled Representation Learning | Ziyuan Zhang, Luan Tran, Xi Yin, Yousef Atoum, Xiaoming Liu, Jian Wan, Nanxin Wang | To remedy this issue, we propose a novel AutoEncoder framework to explicitly disentangle pose and appearance features from RGB imagery; LSTM-based integration of the pose features over time then produces the gait feature. In addition, we collect a Frontal-View Gait (FVG) dataset to focus on gait recognition from frontal-view walking, which is a challenging problem since it contains minimal gait cues compared to other views. |
475 | Reversible GANs for Memory-Efficient Image-To-Image Translation | Tycho F.A. van der Ouderaa, Daniel E. Worrall | We extend this framework by exploring approximately invertible architectures which are well suited to these losses. |
476 | Sensitive-Sample Fingerprinting of Deep Neural Networks | Zecheng He, Tianwei Zhang, Ruby Lee | In this paper, we propose a novel and practical methodology to verify the integrity of remote deep learning models, with only black-box access to the target models. |
477 | Soft Labels for Ordinal Regression | Raul Diaz, Amit Marathe | We present a simple and effective method that constrains these relationships among categories by seamlessly incorporating metric penalties into ground truth label representations. |
478 | Local to Global Learning: Gradually Adding Classes for Training Deep Neural Networks | Hao Cheng, Dongze Lian, Bowen Deng, Shenghua Gao, Tao Tan, Yanlin Geng | In this paper, we incorporate the idea of LGL into the learning objective of DNNs and explain why LGL works better from an information-theoretic perspective. |
479 | What Does It Mean to Learn in Deep Networks? And, How Does One Detect Adversarial Attacks? | Ciprian A. Corneanu, Meysam Madadi, Sergio Escalera, Aleix M. Martinez | Here, we derive a novel approach to define what it means to learn in deep networks, and how to use this knowledge to detect adversarial attacks. |
480 | Handwriting Recognition in Low-Resource Scripts Using Adversarial Learning | Ayan Kumar Bhunia, Abhirup Das, Ankan Kumar Bhunia, Perla Sai Raj Kishore, Partha Pratim Roy | We propose an Adversarial Feature Deformation Module (AFDM) that learns ways to elastically warp extracted features in a scalable manner. |
481 | Adversarial Defense Through Network Profiling Based Path Extraction | Yuxian Qiu, Jingwen Leng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, Yuhao Zhu | This work proposes a profiling-based method to decompose DNN models into different functional blocks, which leads to the effective path as a new approach to exploring DNNs’ internal organization. |
482 | RENAS: Reinforced Evolutionary Neural Architecture Search | Yukang Chen, Gaofeng Meng, Qian Zhang, Shiming Xiang, Chang Huang, Lisen Mu, Xinggang Wang | To address this issue, we propose the Reinforced Evolutionary Neural Architecture Search (RENAS), which is an evolutionary method with reinforced mutation for NAS. |
483 | Co-Occurrence Neural Network | Irina Shevlev, Shai Avidan | We show how to train the filter as part of the network and report results on several data sets. |
484 | SpotTune: Transfer Learning Through Adaptive Fine-Tuning | Yunhui Guo, Honghui Shi, Abhishek Kumar, Kristen Grauman, Tajana Rosing, Rogerio Feris | In this paper, we propose an adaptive fine-tuning approach, called SpotTune, which finds the optimal fine-tuning strategy per instance for the target data. |
485 | Signal-To-Noise Ratio: A Robust Distance Metric for Deep Metric Learning | Tongtong Yuan, Weihong Deng, Jian Tang, Yinan Tang, Binghui Chen | In this paper, different from the approaches on learning the loss structures, we propose a robust SNR distance metric based on Signal-to-Noise Ratio (SNR) for measuring the similarity of image pairs for deep metric learning. |
486 | Detection Based Defense Against Adversarial Examples From the Steganalysis Point of View | Jiayang Liu, Weiming Zhang, Yiwei Zhang, Dongdong Hou, Yujia Liu, Hongyue Zha, Nenghai Yu | In this paper, we point out that steganalysis can be applied to adversarial examples detection, and propose a method to enhance steganalysis features by estimating the probability of modifications caused by adversarial attacks. |
487 | HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs | Pravendra Singh, Vinay Kumar Verma, Piyush Rai, Vinay P. Namboodiri | We present a novel deep learning architecture in which the convolution operation leverages heterogeneous kernels. |
488 | Strike (With) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects | Michael A. Alcorn, Qi Li, Zhitao Gong, Chengfei Wang, Long Mai, Wei-Shinn Ku, Anh Nguyen | In this paper, we present a framework for discovering DNN failures that harnesses 3D renderers and 3D models. |
489 | Blind Geometric Distortion Correction on Images Through Deep Learning | Xiaoyu Li, Bo Zhang, Pedro V. Sander, Jing Liao | We propose the first general framework to automatically correct different types of geometric distortion in a single input image. |
490 | Instance-Level Meta Normalization | Songhao Jia, Ding-Jie Chen, Hwann-Tzong Chen | This paper presents a normalization mechanism called Instance-Level Meta Normalization (ILM Norm) to address a learning-to-normalize problem. |
491 | Iterative Normalization: Beyond Standardization Towards Efficient Whitening | Lei Huang, Yi Zhou, Fan Zhu, Li Liu, Ling Shao | We propose Iterative Normalization (IterNorm), which employs Newton’s iterations for much more efficient whitening, while simultaneously avoiding the eigen-decomposition. |
492 | On Learning Density Aware Embeddings | Soumyadeep Ghosh, Richa Singh, Mayank Vatsa | In this paper, a novel noise tolerant deep metric learning algorithm is proposed. |
493 | Contrastive Adaptation Network for Unsupervised Domain Adaptation | Guoliang Kang, Lu Jiang, Yi Yang, Alexander G. Hauptmann | To address this issue, this paper proposes Contrastive Adaptation Network (CAN) optimizing a new metric which explicitly models the intra-class domain discrepancy and the inter-class domain discrepancy. |
494 | LP-3DCNN: Unveiling Local Phase in 3D Convolutional Neural Networks | Sudhakar Kumawat, Shanmuganathan Raman | To address these issues, we propose Rectified Local Phase Volume (ReLPV) block, an efficient alternative to the standard 3D convolutional layer. |
495 | Attribute-Driven Feature Disentangling and Temporal Aggregation for Video Person Re-Identification | Yiru Zhao, Xu Shen, Zhongming Jin, Hongtao Lu, Xian-sheng Hua | In this paper, we propose an attribute-driven method for feature disentangling and frame re-weighting. |
496 | Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit? | Shilin Zhu, Xin Dong, Hao Su | Inspired by this investigation, we propose the Binary Ensemble Neural Network (BENN) which leverages ensemble methods to improve the performance of BNNs with limited efficiency cost. |
497 | Distilling Object Detectors With Fine-Grained Feature Imitation | Tao Wang, Li Yuan, Xiaopeng Zhang, Jiashi Feng | To address the challenge of distilling knowledge in detection model, we propose a fine-grained feature imitation method exploiting the cross-location discrepancy of feature response. |
498 | Centripetal SGD for Pruning Very Deep Convolutional Networks With Complicated Structure | Xiaohan Ding, Guiguang Ding, Yuchen Guo, Jungong Han | To this end, we propose Centripetal SGD (C-SGD), a novel optimization method, which can train several filters to collapse into a single point in the parameter hyperspace. |
499 | Knockoff Nets: Stealing Functionality of Black-Box Models | Tribhuvanesh Orekondy, Bernt Schiele, Mario Fritz | In this work, we ask to what extent can an adversary steal functionality of such “victim” models based solely on blackbox interactions: image in, predictions out. |
500 | Deep Embedding Learning With Discriminative Sampling Policy | Yueqi Duan, Lei Chen, Jiwen Lu, Jie Zhou | In this paper, we propose a deep embedding with discriminative sampling policy (DE-DSP) learning framework by simultaneously training two models: a deep sampler network that learns effective sampling strategies, and a feature embedding that maps samples to the feature space. |
501 | Hybrid Task Cascade for Instance Segmentation | Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin | In this work, we propose a new framework, Hybrid Task Cascade (HTC), which differs in two important aspects: (1) instead of performing cascaded refinement on these two tasks separately, it interweaves them for a joint multi-stage processing; (2) it adopts a fully convolutional branch to provide spatial context, which can help distinguish hard foreground from cluttered background. |
502 | Multi-Task Self-Supervised Object Detection via Recycling of Bounding Box Annotations | Wonhee Lee, Joonil Na, Gunhee Kim | To make better use of given limited labels, we propose a novel object detection approach that takes advantage of both multi-task learning (MTL) and self-supervised learning (SSL). We propose a set of auxiliary tasks that help improve the accuracy of object detection. |
503 | ClusterNet: Deep Hierarchical Cluster Network With Rigorously Rotation-Invariant Representation for Point Cloud Analysis | Chao Chen, Guanbin Li, Ruijia Xu, Tianshui Chen, Meng Wang, Liang Lin | In this paper, we address the issue by introducing a novel point cloud representation that can be mathematically proven to be rigorously rotation-invariant, i.e., identical point clouds in different orientations are unified as a unique and consistent representation. |
504 | Learning to Learn Relation for Important People Detection in Still Images | Wei-Hong Li, Fa-Ting Hong, Wei-Shi Zheng | In this work, we propose a deep imPOrtance relatIon NeTwork (POINT) that combines both relation modeling and feature learning. |
505 | Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition | Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo | In this paper, we propose to learn such fine-grained features from hundreds of part proposals by Trilinear Attention Sampling Network (TASN) in an efficient teacher-student manner. |
506 | Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning | Xun Wang, Xintong Han, Weilin Huang, Dengke Dong, Matthew R. Scott | Our contributions are three-fold: (1) we establish a General Pair Weighting (GPW) framework, which casts the sampling problem of deep metric learning into a unified view of pair weighting through gradient analysis, providing a powerful tool for understanding recent pair-based loss functions; (2) we show that with GPW, various existing pair-based methods can be compared and discussed comprehensively, with clear differences and key limitations identified; (3) we propose a new loss called multi-similarity loss (MS loss) under the GPW, which is implemented in two iterative steps (i.e., mining and weighting). |
507 | Domain-Symmetric Networks for Adversarial Domain Adaptation | Yabin Zhang, Hui Tang, Kui Jia, Mingkui Tan | To train the SymNet, we propose a novel adversarial learning objective whose key design is based on a two-level domain confusion scheme, where the category-level confusion loss improves over the domain-level one by driving the learning of intermediate network features to be invariant at the corresponding categories of the two domains. |
508 | End-To-End Supervised Product Quantization for Image Search and Retrieval | Benjamin Klein, Lior Wolf | This work presents Deep Product Quantization (DPQ), a technique that leads to more accurate retrieval and classification than the latest state of the art methods, while having similar computational complexity and memory footprint as the Product Quantization method. |
509 | Learning to Learn From Noisy Labeled Data | Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli | To overcome this problem, we propose a noise-tolerant training algorithm, where a meta-learning update is performed prior to conventional gradient update. |
510 | DSFD: Dual Shot Face Detector | Jian Li, Yabiao Wang, Changan Wang, Ying Tai, Jianjun Qian, Jian Yang, Chengjie Wang, Jilin Li, Feiyue Huang | In this paper, we propose a novel detection network named Dual Shot Face Detector (DSFD). |
511 | Label Propagation for Deep Semi-Supervised Learning | Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Ondrej Chum | In this work, we employ a transductive label propagation method that is based on the manifold assumption to make predictions on the entire dataset and use these predictions to generate pseudo-labels for the unlabeled data and train a deep neural network. |
512 | Deep Global Generalized Gaussian Networks | Qilong Wang, Peihua Li, Qinghua Hu, Pengfei Zhu, Wangmeng Zuo | To handle this issue, this paper proposes a novel deep global generalized Gaussian network (3G-Net), whose core is to estimate a global covariance of generalized Gaussian for modeling the last convolutional activations. |
513 | Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-Based Image Retrieval | Anjan Dutta, Zeynep Akata | In this work, we propose a semantically aligned paired cycle-consistent generative (SEM-PCYC) model for zero-shot SBIR, where each branch maps the visual information to a common semantic space via an adversarial training. |
514 | Context-Aware Crowd Counting | Weizhe Liu, Mathieu Salzmann, Pascal Fua | In this paper, we introduce an end-to-end trainable deep architecture that combines features obtained using multiple receptive field sizes and learns the importance of each such feature at each image location. |
515 | Detect-To-Retrieve: Efficient Regional Aggregation for Image Search | Marvin Teichmann, Andre Araujo, Menglong Zhu, Jack Sim | In this paper, we first fill the void by providing a new dataset of landmark bounding boxes, based on the Google Landmarks dataset, that includes 94k images with manually curated boxes from 15k unique landmarks. |
516 | Towards Accurate One-Stage Object Detection With AP-Loss | Kean Chen, Jianguo Li, Weiyao Lin, John See, Ji Wang, Lingyu Duan, Zhibo Chen, Changwei He, Junni Zou | This paper alleviates this issue by proposing a novel framework to replace the classification task in one-stage detectors with a ranking task, and adopting the Average-Precision loss (AP-loss) for the ranking problem. |
517 | On Exploring Undetermined Relationships for Visual Relationship Detection | Yibing Zhan, Jun Yu, Ting Yu, Dacheng Tao | In this paper, we explore the beneficial effect of undetermined relationships on visual relationship detection. |
518 | Learning Without Memorizing | Prithviraj Dhar, Rajat Vikram Singh, Kuan-Chuan Peng, Ziyan Wu, Rama Chellappa | Hence, we propose a novel approach, called ‘Learning without Memorizing (LwM)’, to preserve the information about existing (base) classes, without storing any of their data, while making the classifier progressively learn the new classes. |
519 | Dynamic Recursive Neural Network | Qiushan Guo, Zhipeng Yu, Yichao Wu, Ding Liang, Haoyu Qin, Junjie Yan | This paper proposes the dynamic recursive neural network (DRNN), which simplifies the duplicated building blocks in deep neural networks. |
520 | Destruction and Construction Learning for Fine-Grained Image Recognition | Yue Chen, Yalong Bai, Wei Zhang, Tao Mei | In this paper, we propose a novel “Destruction and Construction Learning” (DCL) method to enhance the difficulty of fine-grained recognition and exercise the classification model to acquire expert knowledge. |
521 | Distraction-Aware Shadow Detection | Quanlong Zheng, Xiaotian Qiao, Ying Cao, Rynson W.H. Lau | In this paper, we propose a Distraction-aware Shadow Detection Network (DSDNet) by explicitly learning and integrating the semantics of visual distraction regions in an end-to-end framework. |
522 | Multi-Label Image Recognition With Graph Convolutional Networks | Zhao-Min Chen, Xiu-Shen Wei, Peng Wang, Yanwen Guo | To capture and explore such important dependencies, we propose a multi-label classification model based on Graph Convolutional Network (GCN). |
523 | High-Level Semantic Feature Detection: A New Perspective for Pedestrian Detection | Wei Liu, Shengcai Liao, Weiqiang Ren, Weidong Hu, Yinan Yu | In this paper, taking pedestrian detection as an example, we provide a new perspective where detecting objects is motivated as a high-level semantic feature detection task. |
524 | RepMet: Representative-Based Metric Learning for Classification and Few-Shot Object Detection | Leonid Karlinsky, Joseph Shtok, Sivan Harary, Eli Schwartz, Amit Aides, Rogerio Feris, Raja Giryes, Alex M. Bronstein | In this work, we propose a new method for DML that simultaneously learns the backbone network parameters, the embedding space, and the multi-modal distribution of each of the training categories in that space, in a single end-to-end training process. |
525 | Ranked List Loss for Deep Metric Learning | Xinshao Wang, Yang Hua, Elyor Kodirov, Guosheng Hu, Romain Garnier, Neil M. Robertson | In this work, we present two limitations of existing ranking-motivated structured losses and propose a novel ranked list loss to solve both of them. |
526 | CANet: Class-Agnostic Segmentation Networks With Iterative Refinement and Attentive Few-Shot Learning | Chi Zhang, Guosheng Lin, Fayao Liu, Rui Yao, Chunhua Shen | In this paper, we present CANet, a class-agnostic segmentation network that performs few-shot segmentation on new classes with only a few annotated images available. |
527 | Precise Detection in Densely Packed Scenes | Eran Goldman, Roei Herzig, Aviv Eisenschtat, Jacob Goldberger, Tal Hassner | We propose a novel, deep-learning based method for precise object detection, designed for such challenging settings. |
528 | KE-GAN: Knowledge Embedded Generative Adversarial Networks for Semi-Supervised Scene Parsing | Mengshi Qi, Yunhong Wang, Jie Qin, Annan Li | In this paper, we propose a novel Knowledge Embedded Generative Adversarial Networks, dubbed as KE-GAN, to tackle the challenging problem in a semi-supervised fashion. |
529 | Fast User-Guided Video Object Segmentation by Interaction-And-Propagation Networks | Seoung Wug Oh, Joon-Young Lee, Ning Xu, Seon Joo Kim | We present a deep learning method for interactive video object segmentation. |
530 | Fast Interactive Object Annotation With Curve-GCN | Huan Ling, Jun Gao, Amlan Kar, Wenzheng Chen, Sanja Fidler | We propose a new framework that alleviates the sequential nature of Polygon-RNN, by predicting all vertices simultaneously using a Graph Convolutional Network (GCN). |
531 | FickleNet: Weakly and Semi-Supervised Semantic Image Segmentation Using Stochastic Inference | Jungbeom Lee, Eunji Kim, Sungmin Lee, Jangho Lee, Sungroh Yoon | FickleNet explores diverse combinations of locations on feature maps created by generic deep neural networks. |
532 | RVOS: End-To-End Recurrent Network for Video Object Segmentation | Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques, Xavier Giro-i-Nieto | In our work, we propose a Recurrent network for multiple object Video Object Segmentation (RVOS) that is fully end-to-end trainable. |
533 | DeepFlux for Skeletons in the Wild | Yukang Wang, Yongchao Xu, Stavros Tsogkas, Xiang Bai, Sven Dickinson, Kaleem Siddiqi | In the present article, we depart from this strategy by training a CNN to predict a two-dimensional vector field, which maps each scene point to a candidate skeleton pixel, in the spirit of flux-based skeletonization algorithms. |
534 | Interactive Image Segmentation via Backpropagating Refinement Scheme | Won-Dong Jang, Chang-Su Kim | An interactive image segmentation algorithm, which accepts user-annotations about a target object and the background, is proposed in this work. |
535 | Scene Parsing via Integrated Classification Model and Variance-Based Regularization | Hengcan Shi, Hongliang Li, Qingbo Wu, Zichen Song | In this paper, we propose an integrated classification model and a variance-based regularization to achieve more accurate classifications. |
536 | RAVEN: A Dataset for Relational and Analogical Visual REasoNing | Chi Zhang, Feng Gao, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu | In this work, we propose a new dataset, built in the context of Raven’s Progressive Matrices (RPM) and aimed at lifting machine intelligence by associating vision with structural, relational, and analogical reasoning in a hierarchical representation. |
537 | Surface Reconstruction From Normals: A Robust DGP-Based Discontinuity Preservation Approach | Wuyuan Xie, Miaohui Wang, Mingqiang Wei, Jianmin Jiang, Jing Qin | This paper introduces a robust approach to preserving surface discontinuities in a discrete geometry setting. |
538 | DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images | Yuying Ge, Ruimao Zhang, Xiaogang Wang, Xiaoou Tang, Ping Luo | DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images |
539 | Jumping Manifolds: Geometry Aware Dense Non-Rigid Structure From Motion | Suryansh Kumar | Given dense image feature correspondences of a non-rigidly moving object across multiple frames, this paper proposes an algorithm to estimate its 3D shape for each frame. |
540 | LVIS: A Dataset for Large Vocabulary Instance Segmentation | Agrim Gupta, Piotr Dollar, Ross Girshick | In this work, we introduce LVIS (pronounced ‘el-vis’): a new dataset for Large Vocabulary Instance Segmentation. |
541 | Fast Object Class Labelling via Speech | Michael Gygli, Vittorio Ferrari | Instead, we propose a new interface where classes are annotated via speech. |
542 | LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking | Heng Fan, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Hexin Bai, Yong Xu, Chunyuan Liao, Haibin Ling | In this paper, we present LaSOT, a high-quality benchmark for Large-scale Single Object Tracking. |
543 | Creative Flow+ Dataset | Maria Shugrina, Ziheng Liang, Amlan Kar, Jiaman Li, Angad Singh, Karan Singh, Sanja Fidler | We present the Creative Flow+ Dataset, the first diverse multi-style artistic video dataset richly labeled with per-pixel optical flow, occlusions, correspondences, segmentation labels, normals, and depth. |
544 | Weakly Supervised Open-Set Domain Adaptation by Dual-Domain Collaboration | Shuhan Tan, Jiening Jiao, Wei-Shi Zheng | To address this practical setting, we propose the Collaborative Distribution Alignment (CDA) method, which performs knowledge transfer bilaterally and works collaboratively to classify unlabeled data and identify outlier samples. |
545 | A Neurobiological Evaluation Metric for Neural Network Model Search | Nathaniel Blanchard, Jeffery Kinnison, Brandon RichardWebster, Pouya Bashivan, Walter J. Scheirer | In this paper we introduce a human-model similarity (HMS) metric, which quantifies the similarity of human fMRI and network activation behavior. |
546 | Iterative Projection and Matching: Finding Structure-Preserving Representatives and Its Application to Computer Vision | Alireza Zaeemzadeh, Mohsen Joneidi, Nazanin Rahnavard, Mubarak Shah | This paper presents a fast and accurate data selection method, in which the selected samples are optimized to span the subspace of all data. |
547 | Efficient Multi-Domain Learning by Covariance Normalization | Yunsheng Li, Nuno Vasconcelos | The problem of multi-domain learning of deep networks is considered. An adaptive layer is induced per target domain, and a novel procedure, denoted covariance normalization (CovNorm), is proposed to reduce its parameters. |
548 | Predicting Visible Image Differences Under Varying Display Brightness and Viewing Distance | Nanyang Ye, Krzysztof Wolski, Rafal K. Mantiuk | In this paper, we propose a CNN-based visibility metric, which maintains the accuracy of deep network solutions and accounts for viewing conditions. |
549 | A Bayesian Perspective on the Deep Image Prior | Zezhou Cheng, Matheus Gadelha, Subhransu Maji, Daniel Sheldon | We show that the deep image prior is asymptotically equivalent to a stationary Gaussian process prior in the limit as the number of channels in each layer of the network goes to infinity, and derive the corresponding kernel. |
550 | ApolloCar3D: A Large 3D Car Instance Understanding Benchmark for Autonomous Driving | Xibin Song, Peng Wang, Dingfu Zhou, Rui Zhu, Chenye Guan, Yuchao Dai, Hao Su, Hongdong Li, Ruigang Yang | In this paper, we contribute the first large scale database suitable for 3D car instance understanding – ApolloCar3D. |
551 | Compressing Unknown Images With Product Quantizer for Efficient Zero-Shot Classification | Jin Li, Xuguang Lan, Yang Liu, Le Wang, Nanning Zheng | Based on this intuition, a Product Quantization Zero-Shot Learning (PQZSL) method is proposed to learn embeddings as well as quantizers to compress visual features into compact codes for Approximate NN (ANN) search. |
552 | Self-Supervised Convolutional Subspace Clustering Network | Junjian Zhang, Chun-Guang Li, Chong You, Xianbiao Qi, Honggang Zhang, Jun Guo, Zhouchen Lin | To achieve simultaneous feature learning and subspace clustering, we propose an end-to-end trainable framework, called Self-Supervised Convolutional Subspace Clustering Network (S^2ConvSCN), that combines a ConvNet module (for feature learning), a self-expression module (for subspace clustering) and a spectral clustering module (for self-supervision) into a joint optimization framework. |
553 | Multi-Scale Geometric Consistency Guided Multi-View Stereo | Qingshan Xu, Wenbing Tao | In this paper, we propose an efficient multi-scale geometric consistency guided multi-view stereo method for accurate and complete depth map estimation. |
554 | Privacy Preserving Image-Based Localization | Pablo Speciale, Johannes L. Schonberger, Sing Bing Kang, Sudipta N. Sinha, Marc Pollefeys | This paper proposes the first solution to what we call privacy preserving image-based localization. |
555 | SimulCap : Single-View Human Performance Capture With Cloth Simulation | Tao Yu, Zerong Zheng, Yuan Zhong, Jianhui Zhao, Qionghai Dai, Gerard Pons-Moll, Yebin Liu | This paper proposes a new method for live free-viewpoint human performance capture with dynamic details (e.g., cloth wrinkles) using a single RGBD camera. |
556 | Hierarchical Deep Stereo Matching on High-Resolution Images | Gengshan Yang, Joshua Manela, Michael Happold, Deva Ramanan | To address this issue, we propose an end-to-end framework that searches for correspondences incrementally over a coarse-to-fine hierarchy. Because high-res stereo datasets are relatively rare, we introduce a dataset with high-res stereo pairs for both training and evaluation. |
557 | Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference | Yao Yao, Zixin Luo, Shiwei Li, Tianwei Shen, Tian Fang, Long Quan | In this paper, we introduce a scalable multi-view stereo framework based on the recurrent neural network. |
558 | Synthesizing 3D Shapes From Silhouette Image Collections Using Multi-Projection Generative Adversarial Networks | Xiao Li, Yue Dong, Pieter Peers, Xin Tong | We present a new weakly supervised learning-based method for generating novel category-specific 3D shapes from unoccluded image collections. |
559 | The Perfect Match: 3D Point Cloud Matching With Smoothed Densities | Zan Gojcic, Caifa Zhou, Jan D. Wegner, Andreas Wieser | We propose 3DSmoothNet, a full workflow to match 3D point clouds with a siamese deep learning architecture and fully convolutional layers using a voxelized smoothed density value (SDV) representation. |
560 | Recurrent Neural Network for (Un-)Supervised Learning of Monocular Video Visual Odometry and Depth | Rui Wang, Stephen M. Pizer, Jan-Michael Frahm | We propose a learning-based, multi-view dense depth map and odometry estimation method that uses Recurrent Neural Networks (RNN) and trains utilizing multi-view image reprojection and forward-backward flow-consistency losses. |
561 | PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing | Hengshuang Zhao, Li Jiang, Chi-Wing Fu, Jiaya Jia | This paper presents PointWeb, a new approach to extract contextual features from local neighborhoods in a point cloud. |
562 | Scan2Mesh: From Unstructured Range Scans to 3D Meshes | Angela Dai, Matthias Niessner | We introduce Scan2Mesh, a novel data-driven generative approach which transforms an unstructured and potentially incomplete range scan into a structured 3D mesh representation. |
563 | Unsupervised Domain Adaptation for ToF Data Denoising With Adversarial Learning | Gianluca Agresti, Henrik Schaefer, Piergiorgio Sartor, Pietro Zanuttigh | In this paper, we avoid relying on labeled real data in the learning framework. |
564 | Learning Independent Object Motion From Unlabelled Stereoscopic Videos | Zhe Cao, Abhishek Kar, Christian Hane, Jitendra Malik | We present a system for learning motion maps of independently moving objects from stereo videos. |
565 | Learning Single-Image Depth From Videos Using Quality Assessment Networks | Weifeng Chen, Shengyi Qian, Jia Deng | In this paper we propose a method to automatically generate such data through Structure-from-Motion (SfM) on Internet videos. Using this method, we collect single-view depth training data from a large number of YouTube videos and construct a new dataset called YouTube3D. |
566 | Learning 3D Human Dynamics From Video | Angjoo Kanazawa, Jason Y. Zhang, Panna Felsen, Jitendra Malik | We present a framework that can similarly learn a representation of 3D dynamics of humans from video via a simple but effective temporal encoding of image features. |
567 | Lending Orientation to Neural Networks for Cross-View Geo-Localization | Liu Liu, Hongdong Li | Inspired by this insight, this paper proposes a novel method which endows deep neural networks with the 'commonsense' of orientation. To evaluate the generalization of our method, we also created a large-scale cross-view localization benchmark containing 100K geotagged ground-aerial pairs covering a city. |
568 | Visual Localization by Learning Objects-Of-Interest Dense Match Regression | Philippe Weinzaepfel, Gabriela Csurka, Yohann Cabon, Martin Humenberger | We introduce a novel CNN-based approach for visual localization from a single RGB image that relies on densely matching a set of Objects-of-Interest (OOIs). Given these 2D-2D matches, together with the 3D world coordinates of each reference image, we obtain a set of 2D-3D matches from which solving a Perspective-n-Point problem gives a pose estimate. |
569 | Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction | Alex Wong, Stefano Soatto | We propose a novel objective function that exploits the bilateral cyclic relationship between the left and right disparities and we introduce an adaptive regularization scheme that allows the network to handle both the co-visible and occluded regions in a stereo pair. |
570 | Face Parsing With RoI Tanh-Warping | Jinpeng Lin, Hao Yang, Dong Chen, Ming Zeng, Fang Wen, Lu Yuan | Inspired by the physiological vision system of human, we propose a novel RoI Tanh-warping operator that combines the central vision and the peripheral vision together. |
571 | Multi-Person Articulated Tracking With Spatial and Temporal Embeddings | Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian | We propose a unified framework for multi-person pose estimation and tracking. |
572 | Multi-Person Pose Estimation With Enhanced Channel-Wise and Spatial Information | Kai Su, Dongdong Yu, Zhenqi Xu, Xin Geng, Changhu Wang | In this paper, we propose two novel modules that enhance channel-wise and spatial information for multi-person pose estimation. |
573 | A Compact Embedding for Facial Expression Similarity | Raviteja Vemulapalli, Aseem Agarwala | Different from previous work, our goal is to describe facial expressions in a continuous fashion using a compact embedding space that mimics human visual preferences. To achieve this goal, we collect a large-scale faces-in-the-wild dataset with human annotations in the form: Expressions A and B are visually more similar when compared to expression C, and use this dataset to train a neural network that produces a compact (16-dimensional) expression embedding. |
574 | Deep High-Resolution Representation Learning for Human Pose Estimation | Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang | In this paper, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. |
575 | Feature Transfer Learning for Face Recognition With Under-Represented Data | Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, Manmohan Chandraker | In this paper, we propose a center-based feature transfer framework to augment the feature space of under-represented subjects from the regular subjects that have sufficiently diverse samples. |
576 | Unsupervised 3D Pose Estimation With Geometric Self-Supervision | Ching-Hang Chen, Ambrish Tyagi, Amit Agrawal, Dylan Drover, Rohith MV, Stefan Stojanov, James M. Rehg | We present an unsupervised learning approach to recover 3D human pose from 2D skeletal joints extracted from a single image. |
577 | Peeking Into the Future: Predicting Future Person Activities and Locations in Videos | Junwei Liang, Lu Jiang, Juan Carlos Niebles, Alexander G. Hauptmann, Li Fei-Fei | We propose an end-to-end, multi-task learning system utilizing rich visual features about human behavioral information and interaction with their surroundings. |
578 | Re-Identification With Consistent Attentive Siamese Networks | Meng Zheng, Srikrishna Karanam, Ziyan Wu, Richard J. Radke | We propose a new deep architecture for person re-identification (re-id). |
579 | On the Continuity of Rotation Representations in Neural Networks | Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, Hao Li | In this paper, we advance a definition of a continuous representation, which can be helpful for training deep neural networks. |
580 | Iterative Residual Refinement for Joint Optical Flow and Occlusion Estimation | Junhwa Hur, Stefan Roth | Taking inspiration from both classical energy minimization approaches as well as residual networks, we propose an iterative residual refinement (IRR) scheme based on weight sharing that can be combined with several backbone networks. |
581 | Inverse Discriminative Networks for Handwritten Signature Verification | Ping Wei, Huan Li, Ping Hu | In this paper, we propose an inverse discriminative network (IDN) for writer-independent handwritten signature verification, which aims to determine whether a test signature is genuine or forged compared to the reference signature. Since there was no proper Chinese signature dataset in the community, we collected a large-scale Chinese signature dataset with approximately 29,000 images of 749 individuals’ signatures. |
582 | Led3D: A Lightweight and Efficient Deep Approach to Recognizing Low-Quality 3D Faces | Guodong Mu, Di Huang, Guosheng Hu, Jia Sun, Yunhong Wang | In this paper, we focus on 3D FR using low-quality data, targeting an efficient and accurate deep learning solution. |
583 | ROI Pooled Correlation Filters for Visual Tracking | Yuxuan Sun, Chong Sun, Dong Wang, You He, Huchuan Lu | In this paper, we propose a novel ROI pooled correlation filter (RPCF) algorithm for robust visual tracking. |
584 | Deep Video Inpainting | Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon | In this work, we propose a novel deep network architecture for fast video inpainting. |
585 | DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-To-Image Synthesis | Minfeng Zhu, Pingbo Pan, Wei Chen, Yi Yang | In this paper, we focus on generating realistic images from text descriptions. |
586 | Non-Adversarial Image Synthesis With Generative Latent Nearest Neighbors | Yedid Hoshen, Ke Li, Jitendra Malik | In this work, we present a novel method – Generative Latent Nearest Neighbors (GLANN) – for training generative models without adversarial training. |
587 | Mixture Density Generative Adversarial Networks | Hamid Eghbal-zadeh, Werner Zellinger, Gerhard Widmer | In this paper, we propose a new GAN variant called Mixture Density GAN that overcomes this problem by encouraging the Discriminator to form clusters in its embedding space, which in turn leads the Generator to exploit these and discover different modes in the data. |
588 | SketchGAN: Joint Sketch Completion and Recognition With Generative Adversarial Network | Fang Liu, Xiaoming Deng, Yu-Kun Lai, Yong-Jin Liu, Cuixia Ma, Hongan Wang | In this paper, we propose SketchGAN, a new generative adversarial network (GAN) based approach that jointly completes and recognizes a sketch, boosting the performance of both tasks. |
589 | Foreground-Aware Image Inpainting | Wei Xiong, Jiahui Yu, Zhe Lin, Jimei Yang, Xin Lu, Connelly Barnes, Jiebo Luo | To address the problem, we propose a foreground-aware image inpainting system that explicitly disentangles structure inference and content completion. |
590 | Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-To-Image Translation | Matteo Tomei, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara | In this paper, we propose a semantic-aware architecture which can translate artworks to photo-realistic visualizations, thus reducing the gap between visual features of artistic and realistic data. |
591 | Structure-Preserving Stereoscopic View Synthesis With Multi-Scale Adversarial Correlation Matching | Yu Zhang, Dongqing Zou, Jimmy S. Ren, Zhe Jiang, Xiaohao Chen | Regarding this issue, this work proposes Multi-Scale Adversarial Correlation Matching (MS-ACM), a novel learning framework for structure-aware view synthesis. |
592 | DynTypo: Example-Based Dynamic Text Effects Transfer | Yifang Men, Zhouhui Lian, Yingmin Tang, Jianguo Xiao | In this paper, we present a novel approach for dynamic text effects transfer by using example-based texture synthesis. |
593 | Arbitrary Style Transfer With Style-Attentional Networks | Dae Young Park, Kwang Hee Lee | In this paper, we introduce a novel style-attentional network (SANet) that efficiently and flexibly integrates the local style patterns according to the semantic spatial distribution of the content image. |
594 | Typography With Decor: Intelligent Text Style Transfer | Wenjing Wang, Jiaying Liu, Shuai Yang, Zongming Guo | In this paper, we present a novel framework to stylize the text with exquisite decor, which is ignored by the previous text stylization methods. |
595 | RL-GAN-Net: A Reinforcement Learning Agent Controlled GAN Network for Real-Time Point Cloud Shape Completion | Muhammad Sarmad, Hyunjoo Jenny Lee, Young Min Kim | We present RL-GAN-Net, where a reinforcement learning (RL) agent provides fast and robust control of a generative adversarial network (GAN). |
596 | Photo Wake-Up: 3D Character Animation From a Single Photo | Chung-Yi Weng, Brian Curless, Ira Kemelmacher-Shlizerman | We present a method and application for animating a human subject from a single photo. |
597 | DeepLight: Learning Illumination for Unconstrained Mobile Mixed Reality | Chloe LeGendre, Wan-Chun Ma, Graham Fyffe, John Flynn, Laurent Charbonnel, Jay Busch, Paul Debevec | We present a learning-based method to infer plausible high dynamic range (HDR), omnidirectional illumination given an unconstrained, low dynamic range (LDR) image from a mobile phone camera with a limited field of view (FOV). |
598 | Iterative Residual CNNs for Burst Photography Applications | Filippos Kokkinos, Stamatis Lefkimmiatis | In this work, we focus on the fact that every frame of a burst sequence can be accurately described by a forward (physical) model. |
599 | Learning Implicit Fields for Generative Shape Modeling | Zhiqin Chen, Hao Zhang | We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder, called IM-NET, for shape generation, aimed at improving the visual quality of the generated shapes. |
600 | Reliable and Efficient Image Cropping: A Grid Anchor Based Approach | Hui Zeng, Lida Li, Zisheng Cao, Lei Zhang | This work revisits the problem of image cropping, and presents a grid anchor based formulation by considering the special properties and requirements (e.g., local redundancy, content preservation, aspect ratio) of image cropping. |
601 | Patch-Based Progressive 3D Point Set Upsampling | Wang Yifan, Shihao Wu, Hui Huang, Daniel Cohen-Or, Olga Sorkine-Hornung | We propose a series of architectural design contributions that lead to a substantial performance boost. |
602 | An Iterative and Cooperative Top-Down and Bottom-Up Inference Network for Salient Object Detection | Wenguan Wang, Jianbing Shen, Ming-Ming Cheng, Ling Shao | This paper presents a salient object detection method that integrates both top-down and bottom-up saliency inference in an iterative and cooperative manner. |
603 | Deep Stacked Hierarchical Multi-Patch Network for Image Deblurring | Hongguang Zhang, Yuchao Dai, Hongdong Li, Piotr Koniusz | To tackle the above problems, we present a deep hierarchical multi-patch network inspired by Spatial Pyramid Matching to deal with blurry images via a fine-to-coarse hierarchical representation. |
604 | Turn a Silicon Camera Into an InGaAs Camera | Feifan Lv, Yinqiang Zheng, Bohan Zhang, Feng Lu | In this paper, we propose a novel solution for SWIR imaging using a common Silicon sensor, which offers a lower price, higher resolution, and better technical maturity than the specialized InGaAs sensor. |
605 | Low-Rank Tensor Completion With a New Tensor Nuclear Norm Induced by Invertible Linear Transforms | Canyi Lu, Xi Peng, Yunchao Wei | Low-Rank Tensor Completion With a New Tensor Nuclear Norm Induced by Invertible Linear Transforms. |
606 | Joint Representative Selection and Feature Learning: A Semi-Supervised Approach | Suchen Wang, Jingjing Meng, Junsong Yuan, Yap-Peng Tan | In this paper, we propose a semi-supervised approach for representative selection, which finds a small set of representatives that can well summarize a large data collection. |
607 | The Domain Transform Solver | Akash Bapat, Jan-Michael Frahm | We present a novel framework for edge-aware optimization that is an order of magnitude faster than the state of the art while maintaining comparable results. |
608 | CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection | Lu Zhang, Jianming Zhang, Zhe Lin, Huchuan Lu, You He | To this end, we propose to leverage captioning as an auxiliary semantic task to boost salient object detection in complex scenarios. |
609 | Phase-Only Image Based Kernel Estimation for Single Image Blind Deblurring | Liyuan Pan, Richard Hartley, Miaomiao Liu, Yuchao Dai | Unlike existing approaches, which tackle the problem by enforcing various priors on the blur kernel and the latent image, we aim to obtain a high-quality blur kernel directly by studying the problem in the frequency domain. |
610 | Hierarchical Discrete Distribution Decomposition for Match Density Estimation | Zhichao Yin, Trevor Darrell, Fisher Yu | In this paper, we propose Hierarchical Discrete Distribution Decomposition (HD^3), a framework suitable for learning probabilistic pixel correspondences in both optical flow and stereo matching. |
611 | FOCNet: A Fractional Optimal Control Network for Image Denoising | Xixi Jia, Sanyang Liu, Xiangchu Feng, Lei Zhang | Inspired by the fact that the fractional-order differential equation has long-term memory, in this paper we develop an advanced image denoising network, namely FOCNet, by solving a fractional optimal control (FOC) problem. |
612 | Orthogonal Decomposition Network for Pixel-Wise Binary Classification | Chang Liu, Fang Wan, Wei Ke, Zhuowei Xiao, Yuan Yao, Xiaosong Zhang, Qixiang Ye | In this paper, we implement an Orthogonal Decomposition Unit (ODU) that transforms a convolutional feature map into orthogonal bases, aiming to de-correlate neighboring pixels on convolutional features. |
613 | Multi-Source Weak Supervision for Saliency Detection | Yu Zeng, Yunzhi Zhuge, Huchuan Lu, Lihe Zhang, Mingyang Qian, Yizhou Yu | To this end, we propose a unified framework to train saliency detection models with diverse weak supervision sources. |
614 | ComDefend: An Efficient Image Compression Model to Defend Adversarial Examples | Xiaojun Jia, Xingxing Wei, Xiaochun Cao, Hassan Foroosh | In this paper, we propose an end-to-end image compression model to defend adversarial examples: ComDefend. |
615 | Combinatorial Persistency Criteria for Multicut and Max-Cut | Jan-Hendrik Lange, Bjoern Andres, Paul Swoboda | We propose persistency criteria for the multicut and max-cut problem as well as fast combinatorial routines to verify them. |
616 | S4Net: Single Stage Salient-Instance Segmentation | Ruochen Fan, Ming-Ming Cheng, Qibin Hou, Tai-Jiang Mu, Jingdong Wang, Shi-Min Hu | We consider an interesting problem—salient instance segmentation. |
617 | A Decomposition Algorithm for the Sparse Generalized Eigenvalue Problem | Ganzhao Yuan, Li Shen, Wei-Shi Zheng | In this paper, we consider a new effective decomposition method to tackle this problem. |
618 | Polynomial Representation for Persistence Diagram | Zhichao Wang, Qian Li, Gang Li, Guandong Xu | In this work, we discover a set of general polynomials that vanish on vectorized PDs and extract the task-adapted feature representation from these polynomials. |
619 | Crowd Counting and Density Estimation by Trellis Encoder-Decoder Networks | Xiaolong Jiang, Zehao Xiao, Baochang Zhang, Xiantong Zhen, Xianbin Cao, David Doermann, Ling Shao | In this paper, we propose a trellis encoder-decoder network (TEDnet) for crowd counting, which focuses on generating high-quality density estimation maps. |
620 | Cross-Atlas Convolution for Parameterization Invariant Learning on Textured Mesh Surface | Shiwei Li, Zixin Luo, Mingmin Zhen, Yao Yao, Tianwei Shen, Tian Fang, Long Quan | We present a convolutional network architecture for direct feature learning on mesh surfaces through their atlases of texture maps. |
621 | Deep Surface Normal Estimation With Hierarchical RGB-D Fusion | Jin Zeng, Yanfeng Tong, Yunmu Huang, Qiong Yan, Wenxiu Sun, Jing Chen, Yongtian Wang | In this paper, a hierarchical fusion network with adaptive feature re-weighting is proposed for surface normal estimation from a single RGB-D image. |
622 | Knowledge-Embedded Routing Network for Scene Graph Generation | Tianshui Chen, Weihao Yu, Riquan Chen, Liang Lin | In this work, we find that the statistical correlations between object pairs and their relationships can effectively regularize semantic space and make prediction less ambiguous, and thus well address the unbalanced distribution issue. |
623 | An End-To-End Network for Panoptic Segmentation | Huanyu Liu, Chao Peng, Changqian Yu, Jingbo Wang, Xu Liu, Gang Yu, Wei Jiang | To address the problems, we propose a novel end-to-end Occlusion Aware Network (OANet) for panoptic segmentation, which can efficiently and effectively predict both the instance and stuff segmentation in a single network. |
624 | Fast and Flexible Indoor Scene Synthesis via Deep Convolutional Generative Models | Daniel Ritchie, Kai Wang, Yu-An Lin | We present a new, fast and flexible pipeline for indoor scene synthesis that is based on deep convolutional generative models. |
625 | Marginalized Latent Semantic Encoder for Zero-Shot Learning | Zhengming Ding, Hongfu Liu | In this paper, we attempt to exploit the intrinsic relationship in the semantic manifold when the given semantics are not enough to describe the visual objects, and enhance the generalization ability of the visual-semantic function with a marginalized strategy. |
626 | Scale-Adaptive Neural Dense Features: Learning via Hierarchical Context Aggregation | Jaime Spencer, Richard Bowden, Simon Hadfield | Instead, we propose SAND features, a dedicated deep learning solution to feature extraction capable of providing hierarchical context information. |
627 | Unsupervised Embedding Learning via Invariant and Spreading Instance Feature | Mang Ye, Xu Zhang, Pong C. Yuen, Shih-Fu Chang | Motivated by the positive-concentrated and negative-separated properties observed in category-wise supervised learning, we propose to utilize instance-wise supervision to approximate these properties, aiming to learn data-augmentation-invariant and instance-spread-out features. |
628 | AOGNets: Compositional Grammatical Architectures for Deep Learning | Xilai Li, Xi Song, Tianfu Wu | This paper presents deep compositional grammatical architectures which harness the best of two worlds: grammar models and DNNs. |
629 | A Robust Local Spectral Descriptor for Matching Non-Rigid Shapes With Incompatible Shape Structures | Yiqun Wang, Jianwei Guo, Dong-Ming Yan, Kai Wang, Xiaopeng Zhang | Focusing on this issue, in this paper, we present a more discriminative local descriptor for deformable 3D shapes with incompatible structures. Finally, for training and evaluation, we present a new benchmark dataset by extending the widely used FAUST dataset. |
630 | Context and Attribute Grounded Dense Captioning | Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao | In this work, we investigate contextual reasoning based on multi-scale message propagations from the neighboring contents to the target ROIs. |
631 | Spot and Learn: A Maximum-Entropy Patch Sampler for Few-Shot Image Classification | Wen-Hsuan Chu, Yu-Jhe Li, Jing-Cheng Chang, Yu-Chiang Frank Wang | In this work, we propose a sampling method that de-correlates an image based on maximum entropy reinforcement learning, and extracts varying sequences of patches on every forward-pass with discriminative information observed. |
632 | Interpreting CNNs via Decision Trees | Quanshi Zhang, Yu Yang, Haotian Ma, Ying Nian Wu | This paper aims to quantitatively explain the rationales of each prediction that is made by a pre-trained convolutional neural network (CNN). |
633 | Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning | Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, In So Kweon | Our goal in this work is to train an image captioning model that generates more dense and informative captions. |
634 | Deep Modular Co-Attention Networks for Visual Question Answering | Zhou Yu, Jun Yu, Yuhao Cui, Dacheng Tao, Qi Tian | In this paper, we propose a deep Modular Co-Attention Network (MCAN) that consists of Modular Co-Attention (MCA) layers cascaded in depth. |
635 | Synthesizing Environment-Aware Activities via Activity Sketches | Yuan-Hong Liao, Xavier Puig, Marko Boben, Antonio Torralba, Sanja Fidler | In this work, we address the problem of environment-aware program generation. |
636 | Self-Critical N-Step Training for Image Captioning | Junlong Gao, Shiqi Wang, Shanshe Wang, Siwei Ma, Wen Gao | In this paper, we estimate state value without using a parametrized value estimator. |
637 | Multi-Target Embodied Question Answering | Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L. Berg, Dhruv Batra | We present a generalization of EQA — Multi-Target EQA (MT-EQA). |
638 | Visual Question Answering as Reading Comprehension | Hui Li, Peng Wang, Chunhua Shen, Anton van den Hengel | Rather than struggling with multimodal feature fusion, in this paper we propose to unify all the input information in natural language, so as to convert VQA into a machine reading comprehension problem. |
639 | StoryGAN: A Sequential Conditional GAN for Story Visualization | Yitong Li, Zhe Gan, Yelong Shen, Jingjing Liu, Yu Cheng, Yuexin Wu, Lawrence Carin, David Carlson, Jianfeng Gao | In this work, we propose a new task called Story Visualization. |
640 | Noise-Aware Unsupervised Deep Lidar-Stereo Fusion | Xuelian Cheng, Yiran Zhong, Yuchao Dai, Pan Ji, Hongdong Li | In this paper, we present LidarStereoNet, the first unsupervised Lidar-stereo fusion network, which can be trained in an end-to-end manner without the need of ground truth depth maps. |
641 | Versatile Multiple Choice Learning and Its Application to Vision Computing | Kai Tian, Yi Xu, Shuigeng Zhou, Jihong Guan | In this paper, a new MCL method, called vMCL (versatile Multiple Choice Learning), is developed to extend the application scenarios of MCL methods by ensembling deep neural networks. |
642 | EV-Gait: Event-Based Robust Gait Recognition Using Dynamic Vision Sensors | Yanxiang Wang, Bowen Du, Yiran Shen, Kai Wu, Guangrong Zhao, Jianguo Sun, Hongkai Wen | In this paper, we introduce a new type of sensing modality, the Dynamic Vision Sensors (Event Cameras), for the task of gait recognition. To evaluate the performance of EV-Gait, we collect two event-based gait datasets, one from real-world experiments and the other by converting the publicly available RGB gait recognition benchmark CASIA-B. |
643 | ToothNet: Automatic Tooth Instance Segmentation and Identification From Cone Beam CT Images | Zhiming Cui, Changjian Li, Wenping Wang | This paper proposes a method that uses deep convolutional neural networks to achieve automatic and accurate tooth instance segmentation and identification from CBCT (cone beam CT) images for digital dentistry. |
644 | Modularized Textual Grounding for Counterfactual Resilience | Zhiyuan Fang, Shu Kong, Charless Fowlkes, Yezhou Yang | To address these issues, we propose a visual grounding system which is 1) end-to-end trainable in a weakly supervised fashion with only image-level annotations, and 2) counterfactually resilient owing to the modular design. |
645 | L3-Net: Towards Learning Based LiDAR Localization for Autonomous Driving | Weixin Lu, Yao Zhou, Guowei Wan, Shenhua Hou, Shiyu Song | We present L3-Net – a novel learning-based LiDAR localization system that achieves centimeter-level localization accuracy, comparable to prior state-of-the-art systems with hand-crafted pipelines. |
646 | Panoptic Feature Pyramid Networks | Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollar | In this work, we aim to unify these methods at the architectural level, designing a single network for both tasks. |
647 | Mask Scoring R-CNN | Zhaojin Huang, Lichao Huang, Yongchao Gong, Chang Huang, Xinggang Wang | In this paper, we study this problem and propose Mask Scoring R-CNN which contains a network block to learn the quality of the predicted instance masks. |
648 | Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection | Hang Xu, Chenhan Jiang, Xiaodan Liang, Liang Lin, Zhenguo Li | In this paper, we address the large-scale object detection problem with thousands of categories, which poses severe challenges due to long-tail data distributions, heavy occlusions, and class ambiguities. |
649 | Cross-Modality Personalization for Retrieval | Nils Murrugarra-Llerena, Adriana Kovashka | In this work, we propose a model for cross-modality personalized retrieval. |
650 | Composing Text and Image for Image Retrieval – an Empirical Odyssey | Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays | In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image. |
651 | Arbitrary Shape Scene Text Detection With Adaptive Text Region Representation | Xiaobing Wang, Yingying Jiang, Zhenbo Luo, Cheng-Lin Liu, Hyunsoo Choi, Sungjin Kim | To solve the problem, we propose a robust scene text detection method with adaptive text region representation. |
652 | Adaptive NMS: Refining Pedestrian Detection in a Crowd | Songtao Liu, Di Huang, Yunhong Wang | The contributions are threefold: (1) we propose adaptive-NMS, which applies a dynamic suppression threshold to an instance, according to the target density; (2) we design an efficient subnetwork to learn density scores, which can be conveniently embedded into both the single-stage and two-stage detectors; and (3) we achieve state of the art results on the CityPersons and CrowdHuman benchmarks. |
653 | Point in, Box Out: Beyond Counting Persons in Crowds | Yuting Liu, Miaojing Shi, Qijun Zhao, Xiaofang Wang | In this work, we instead propose a new deep detection network with only point supervision required. |
654 | Locating Objects Without Bounding Boxes | Javier Ribera, David Guera, Yuhao Chen, Edward J. Delp | In this paper, we address the task of estimating object locations without annotated bounding boxes which are typically hand-drawn and time consuming to label. |
655 | FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery | Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee | We propose FineGAN, a novel unsupervised GAN framework, which disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained object categories. |
656 | Mutual Learning of Complementary Networks via Residual Correction for Improving Semi-Supervised Classification | Si Wu, Jichang Li, Cheng Liu, Zhiwen Yu, Hau-San Wong | In this paper, we explore how to capture the complementary information to enhance mutual learning. |
657 | Sampling Techniques for Large-Scale Object Detection From Sparsely Annotated Objects | Yusuke Niitani, Takuya Akiba, Tommi Kerola, Toru Ogawa, Shotaro Sano, Shuji Suzuki | In this study, we propose part-aware sampling, a method that uses human intuition for the hierarchical relation between objects. |
658 | Curls & Whey: Boosting Black-Box Adversarial Attacks | Yucheng Shi, Siyu Wang, Yahong Han | In this work, we propose Curls & Whey black-box attack to fix the above two defects. |
659 | Barrage of Random Transforms for Adversarially Robust Defense | Edward Raff, Jared Sylvester, Steven Forsyth, Mark McLean | In this paper, we explore the idea of stochastically combining a large number of individually weak defenses into a single barrage of randomized transformations to build a strong defense against adversarial attacks. |
660 | Aggregation Cross-Entropy for Sequence Recognition | Zecheng Xie, Yaoxiong Huang, Yuanzhi Zhu, Lianwen Jin, Yuliang Liu, Lele Xie | In this paper, we propose a novel method, aggregation cross-entropy (ACE), for sequence recognition from a brand new perspective. |
661 | LaSO: Label-Set Operations Networks for Multi-Label Few-Shot Learning | Amit Alfassy, Leonid Karlinsky, Amit Aides, Joseph Shtok, Sivan Harary, Rogerio Feris, Raja Giryes, Alex M. Bronstein | In this work, we propose a novel technique for synthesizing samples with multiple labels for the (yet unhandled) multi-label few-shot classification scenario. We propose a benchmark for this new and challenging task and show that our method compares favorably to all the common baselines. |
662 | Few-Shot Learning With Localization in Realistic Settings | Davis Wertheimer, Bharath Hariharan | We introduce three parameter-free improvements: (a) better training procedures based on adapting cross-validation to meta-learning, (b) novel architectures that localize objects using limited bounding box annotations before classification, and (c) simple parameter-free expansions of the feature space based on bilinear pooling. |
663 | AdaGraph: Unifying Predictive and Continuous Domain Adaptation Through Graphs | Massimiliano Mancini, Samuel Rota Bulo, Barbara Caputo, Elisa Ricci | Our contribution is the first deep architecture that tackles predictive domain adaptation, able to leverage the information provided by the auxiliary domains through a graph. |
664 | Grounded Video Description | Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason J. Corso, Marcus Rohrbach | In this work, we explicitly link the sentence to the evidence in the video by annotating each noun phrase in a sentence with the corresponding bounding box in one of the frames of a video. |
665 | Streamlined Dense Video Captioning | Jonghwan Mun, Linjie Yang, Zhou Ren, Ning Xu, Bohyung Han | To tackle this challenge, we propose a novel dense video captioning framework, which models temporal dependency across events in a video explicitly and leverages visual and linguistic context from prior events for coherent storytelling. |
666 | Adversarial Inference for Multi-Sentence Video Description | Jae Sung Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach | In this work, we instead propose to apply adversarial techniques during inference, designing a discriminator which encourages better multi-sentence video description. |
667 | Unified Visual-Semantic Embeddings: Bridging Vision and Language With Structured Meaning Representations | Hao Wu, Jiayuan Mao, Yufeng Zhang, Yuning Jiang, Lei Li, Weiwei Sun, Wei-Ying Ma | We propose the Unified Visual-Semantic Embeddings (Unified VSE) for learning a joint space of visual representation and textual semantics. |
668 | Learning to Compose Dynamic Tree Structures for Visual Contexts | Kaihua Tang, Hanwang Zhang, Baoyuan Wu, Wenhan Luo, Wei Liu | We propose to compose dynamic tree structures that place the objects in an image into a visual context, helping visual reasoning tasks such as scene graph generation and visual Q&A. |
669 | Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation | Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang | In this paper, we study how to address three critical challenges for this task: the cross-modal grounding, the ill-posed feedback, and the generalization problems. |
670 | Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering | Peng Gao, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven C. H. Hoi, Xiaogang Wang, Hongsheng Li | We propose a novel method that dynamically fuses multi-modal features with intra- and inter-modality information flow, alternately passing dynamic information between and across the visual and language modalities. |
671 | Cycle-Consistency for Robust Visual Question Answering | Meet Shah, Xinlei Chen, Marcus Rohrbach, Devi Parikh | As a step towards improving robustness of VQA models, we propose a model-agnostic framework that exploits cycle consistency. We introduce a new evaluation protocol and associated dataset (VQA-Rephrasings) and show that state-of-the-art VQA models are notoriously brittle to linguistic variations in questions. |
672 | Embodied Question Answering in Photorealistic Environments With Point Cloud Perception | Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra | To help bridge the gap between internet vision-style problems and the goal of vision for embodied perception we instantiate a large-scale navigation task — Embodied Question Answering [1] in photo-realistic environments (Matterport 3D). |
673 | Reasoning Visual Dialogs With Structural and Partial Observations | Zilong Zheng, Wenguan Wang, Siyuan Qi, Song-Chun Zhu | We propose a novel model to address the task of Visual Dialog which exhibits complex dialog structures. |
674 | Recursive Visual Attention in Visual Dialog | Yulei Niu, Hanwang Zhang, Manli Zhang, Jianhong Zhang, Zhiwu Lu, Ji-Rong Wen | In this work, to resolve the visual co-reference for visual dialog, we propose a novel attention mechanism called Recursive Visual Attention (RvA). |
675 | Two Body Problem: Collaborative Visual Task Completion | Unnat Jain, Luca Weihs, Eric Kolve, Mohammad Rastegari, Svetlana Lazebnik, Ali Farhadi, Alexander G. Schwing, Aniruddha Kembhavi | In this paper we study the problem of learning to collaborate directly from pixels in AI2-THOR and demonstrate the benefits of explicit and implicit modes of communication to perform visual tasks. |
676 | GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering | Drew A. Hudson, Christopher D. Manning | We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets. |
677 | Text2Scene: Generating Compositional Scenes From Textual Descriptions | Fuwen Tan, Song Feng, Vicente Ordonez | In this paper, we propose Text2Scene, a model that generates various forms of compositional scene representations from natural language descriptions. |
678 | From Recognition to Cognition: Visual Commonsense Reasoning | Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi | To move towards cognition-level understanding, we present a new reasoning engine, Recognition to Cognition Networks (R2C), that models the necessary layered inferences for grounding, contextualization, and reasoning. Next, we introduce a new dataset, VCR, consisting of 290k multiple choice QA problems derived from 110k movie scenes. |
679 | The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation | Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira | In this paper, inspired by the intuition of viewing the problem as search on a navigation graph, we propose to use a progress monitor developed in prior work as a learnable heuristic for search. |
680 | Tactical Rewind: Self-Correction via Backtracking in Vision-And-Language Navigation | Liyiming Ke, Xiujun Li, Yonatan Bisk, Ari Holtzman, Zhe Gan, Jingjing Liu, Jianfeng Gao, Yejin Choi, Siddhartha Srinivasa | We present the Frontier Aware Search with backTracking (FAST) Navigator, a general framework for action decoding, that achieves state-of-the-art results on the 2018 Room-to-Room (R2R) Vision-and-Language navigation challenge. |
681 | Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning | Mitchell Wortsman, Kiana Ehsani, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi | In this paper we study the problem of learning to learn at both training and test time in the context of visual navigation. |
682 | High Flux Passive Imaging With Single-Photon Sensors | Atul Ingle, Andreas Velten, Mohit Gupta | We propose passive free-running SPAD (PF-SPAD) imaging, an imaging modality that uses SPADs for capturing 2D intensity images with unprecedented dynamic range under ambient lighting, without any active light source. |
683 | Photon-Flooded Single-Photon 3D Cameras | Anant Gupta, Atul Ingle, Andreas Velten, Mohit Gupta | In this paper, we address the following basic question: what is the optimal photon flux that a SPAD-based LiDAR should be operated in? |
684 | Acoustic Non-Line-Of-Sight Imaging | David B. Lindell, Gordon Wetzstein, Vladlen Koltun | We introduce acoustic NLOS imaging, which is orders of magnitude less expensive than most optical systems and captures hidden 3D geometry at longer ranges with shorter acquisition times compared to state-of-the-art optical methods. |
685 | Steady-State Non-Line-Of-Sight Imaging | Wenzheng Chen, Simon Daneau, Fahim Mannan, Felix Heide | To tackle the shape-dependence of these variations, we propose a trainable architecture which learns to map diffuse indirect reflections to scene reflectance using only synthetic training data. |
686 | A Theory of Fermat Paths for Non-Line-Of-Sight Shape Reconstruction | Shumian Xin, Sotiris Nousias, Kiriakos N. Kutulakos, Aswin C. Sankaranarayanan, Srinivasa G. Narasimhan, Ioannis Gkioulekas | We present a novel theory of Fermat paths of light between a known visible scene and an unknown object not in the line of sight of a transient camera. |
687 | End-To-End Projector Photometric Compensation | Bingyao Huang, Haibin Ling | In this paper, for the first time, we formulate the compensation problem as an end-to-end learning problem and propose a convolutional neural network, named CompenNet, to implicitly learn the complex compensation function. |
688 | Bringing a Blurry Frame Alive at High Frame-Rate With an Event Camera | Liyuan Pan, Cedric Scheerlinck, Xin Yu, Richard Hartley, Miaomiao Liu, Yuchao Dai | In this paper, we propose a simple and effective approach, the Event-based Double Integral (EDI) model, to reconstruct a high frame-rate, sharp video from a single blurry frame and its event data. |
689 | Bringing Alive Blurred Moments | Kuldeep Purohit, Anshul Shah, A. N. Rajagopalan | We present a solution for the goal of extracting a video from a single motion blurred image to sequentially reconstruct the clear views of a scene as beheld by the camera during the time of exposure. |
690 | Learning to Synthesize Motion Blur | Tim Brooks, Jonathan T. Barron | We present a technique for synthesizing a motion blurred image from a pair of unblurred images captured in succession. |
691 | Underexposed Photo Enhancement Using Deep Illumination Estimation | Ruixing Wang, Qing Zhang, Chi-Wing Fu, Xiaoyong Shen, Wei-Shi Zheng, Jiaya Jia | This paper presents a new neural network for enhancing underexposed photos. |
692 | Blind Visual Motif Removal From a Single Image | Amir Hertz, Sharon Fogel, Rana Hanocka, Raja Giryes, Daniel Cohen-Or | This work proposes a deep learning based technique for blind removal of such objects. |
693 | Non-Local Meets Global: An Integrated Paradigm for Hyperspectral Denoising | Wei He, Quanming Yao, Chao Li, Naoto Yokoya, Qibin Zhao | In this paper, we claim that the HSI lies in a global spectral low-rank subspace, and the spectral subspaces of each full band patch groups should lie in this global low-rank subspace. |
694 | Neural Rerendering in the Wild | Moustafa Meshry, Dan B. Goldman, Sameh Khamis, Hugues Hoppe, Rohit Pandey, Noah Snavely, Ricardo Martin-Brualla | We explore total scene capture — recording, modeling, and rerendering a scene under varying appearance such as season and time of day. |
695 | GeoNet: Deep Geodesic Networks for Point Cloud Analysis | Tong He, Haibin Huang, Li Yi, Yuqian Zhou, Chihao Wu, Jue Wang, Stefano Soatto | Thus we introduce GeoNet, the first deep learning architecture trained to model the intrinsic structure of surfaces represented as point clouds. |
696 | MeshAdv: Adversarial Meshes for Visual Recognition | Chaowei Xiao, Dawei Yang, Bo Li, Jia Deng, Mingyan Liu | In this paper, we propose meshAdv to generate “adversarial 3D meshes” from objects that have rich shape features but minimal textural variation. |
697 | Fast Spatially-Varying Indoor Lighting Estimation | Mathieu Garon, Kalyan Sunkavalli, Sunil Hadap, Nathan Carr, Jean-Francois Lalonde | We propose a real-time method to estimate spatially-varying indoor lighting from a single RGB image. |
698 | Neural Illumination: Lighting Prediction for Indoor Environments | Shuran Song, Thomas Funkhouser | Instead, we propose “Neural Illumination,” a new approach that decomposes illumination prediction into several simpler differentiable sub-tasks: 1) geometry estimation, 2) scene completion, and 3) LDR-to-HDR estimation. |
699 | Deep Sky Modeling for Single Image Outdoor Lighting Estimation | Yannick Hold-Geoffroy, Akshaya Athawale, Jean-Francois Lalonde | We propose a data-driven learned sky model, which we use for outdoor lighting estimation from a single image. |
700 | Bidirectional Learning for Domain Adaptation of Semantic Segmentation | Yunsheng Li, Lu Yuan, Nuno Vasconcelos | In this paper, we propose a novel bidirectional learning framework for domain adaptation of segmentation. |
701 | Enhanced Bayesian Compression via Deep Reinforcement Learning | Xin Yuan, Liangliang Ren, Jiwen Lu, Jie Zhou | In this paper, we propose an Enhanced Bayesian Compression method to flexibly compress the deep networks via reinforcement learning. |
702 | Strong-Weak Distribution Alignment for Adaptive Object Detection | Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko | We propose an approach for unsupervised adaptation of object detectors from label-rich to label-poor domains which can significantly reduce annotation costs associated with detection. |
703 | MFAS: Multimodal Fusion Architecture Search | Juan-Manuel Perez-Rua, Valentin Vielzeuf, Stephane Pateux, Moez Baccouche, Frederic Jurie | We propose a novel and generic search space that spans a large number of possible fusion architectures. |
704 | Disentangling Adversarial Robustness and Generalization | David Stutz, Matthias Hein, Bernt Schiele | In an effort to clarify the relationship between robustness and generalization, we assume an underlying, low-dimensional data manifold and show that: 1. regular adversarial examples leave the manifold; 2. adversarial examples constrained to the manifold, i.e., on-manifold adversarial examples, exist; 3. on-manifold adversarial examples are generalization errors, and on-manifold adversarial training boosts generalization; 4. regular robustness and generalization are not necessarily contradicting goals. |
705 | ShieldNets: Defending Against Adversarial Attacks Using Probabilistic Adversarial Robustness | Rajkumar Theagarajan, Ming Chen, Bir Bhanu, Jing Zhang | In this work, ShieldNet is implemented using PixelCNN as a demonstration of probabilistic adversarial robustness (PAR). |
706 | Deeply-Supervised Knowledge Synergy | Dawei Sun, Anbang Yao, Aojun Zhou, Hao Zhao | In this paper, we propose Deeply-supervised Knowledge Synergy (DKS), a new method aiming to train CNNs with improved generalization ability for image classification tasks without introducing extra computational cost during inference. |
707 | Dual Residual Networks Leveraging the Potential of Paired Operations for Image Restoration | Xing Liu, Masanori Suganuma, Zhun Sun, Takayuki Okatani | In this paper, we study design of deep neural networks for tasks of image restoration. |
708 | Probabilistic End-To-End Noise Correction for Learning With Noisy Labels | Kun Yi, Jianxin Wu | To address this problem, we propose an end-to-end framework called PENCIL, which can update both network parameters and label estimations as label distributions. |
709 | Attention-Guided Unified Network for Panoptic Segmentation | Yanwei Li, Xinze Chen, Zheng Zhu, Lingxi Xie, Guan Huang, Dalong Du, Xingang Wang | Existing methods mostly deal with these two problems separately; in this paper, we reveal the underlying relationship between them, in particular that foreground (FG) objects provide complementary cues to assist background (BG) understanding. |
710 | NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection | Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le | Here we aim to learn a better architecture of feature pyramid network for object detection. |
711 | OICSR: Out-In-Channel Sparsity Regularization for Compact Deep Neural Networks | Jiashi Li, Qi Qi, Jingyu Wang, Ce Ge, Yujian Li, Zhangzhang Yue, Haifeng Sun | Our proposed Out-In-Channel Sparsity Regularization (OICSR) considers correlations between successive layers to further retain predictive power of the compact network. |
712 | Semantically Aligned Bias Reducing Zero Shot Learning | Akanksha Paul, Narayanan C. Krishnan, Prateek Munjal | In this work, we propose a novel approach, Semantically Aligned Bias Reducing (SABR) ZSL, which focuses on solving both the problems. |
713 | Feature Space Perturbations Yield More Transferable Adversarial Examples | Nathan Inkawhich, Wei Wen, Hai (Helen) Li, Yiran Chen | This work describes a transfer-based blackbox targeted adversarial attack of deep feature space representations that also provides insights into cross-model class representations of deep CNNs. |
714 | IGE-Net: Inverse Graphics Energy Networks for Human Pose Estimation and Single-View Reconstruction | Dominic Jack, Frederic Maire, Sareh Shirazi, Anders Eriksson | We propose using a deep-learning based energy minimization framework to learn a consistency measure between 2D observations and a proposed world model, and demonstrate that this framework can be trained end-to-end to produce consistent and realistic inferences. |
715 | Accelerating Convolutional Neural Networks via Activation Map Compression | Georgios Georgiadis | Towards this end, we propose a three-stage compression and acceleration pipeline that sparsifies, quantizes and entropy encodes activation maps of Convolutional Neural Networks. |
716 | Knowledge Distillation via Instance Relationship Graph | Yufan Liu, Jiajiong Cao, Bing Li, Chunfeng Yuan, Weiming Hu, Yangxi Li, Yunqiang Duan | In this paper, a novel Instance Relationship Graph (IRG) is proposed for knowledge distillation. |
717 | PPGNet: Learning Point-Pair Graph for Line Segment Detection | Ziheng Zhang, Zhengxin Li, Ning Bi, Jia Zheng, Jinlei Wang, Kun Huang, Weixin Luo, Yanyu Xu, Shenghua Gao | In this paper, we present a novel framework to detect line segments in man-made environments. |
718 | Building Detail-Sensitive Semantic Segmentation Networks With Polynomial Pooling | Zhen Wei, Jingyi Zhang, Li Liu, Fan Zhu, Fumin Shen, Yi Zhou, Si Liu, Yao Sun, Ling Shao | In this work, we propose a polynomial pooling (P-pooling) function that finds an intermediate form between max and average pooling to provide an optimally balanced and self-adjusted pooling strategy for semantic segmentation. |
719 | Variational Bayesian Dropout With a Hierarchical Prior | Yuhang Liu, Wenyong Dong, Lei Zhang, Dong Gong, Qinfeng Shi | To address this problem, we present a new generalization of Gaussian dropout, termed variational Bayesian dropout (VBD), which turns to exploit a hierarchical prior on the network weights and infer a new joint posterior. |
720 | AANet: Attribute Attention Network for Person Re-Identifications | Chiat-Pin Tay, Sharmili Roy, Kim-Hui Yap | This paper proposes Attribute Attention Network (AANet), a new architecture that integrates person attributes and attribute attention maps into a classification framework to solve the person re-identification (re-ID) problem. |
721 | Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction | Osama Makansi, Eddy Ilg, Ozgun Cicek, Thomas Brox | In this work, we present an approach that involves the prediction of several samples of the future with a winner-takes-all loss and iterative grouping of samples to multiple modes. |
722 | A Main/Subsidiary Network Framework for Simplifying Binary Neural Networks | Yinghao Xu, Xin Dong, Yudian Li, Hao Su | In this paper, we, for the first time, define the filter-level pruning problem for binary neural networks, which cannot be solved by simply migrating existing structural pruning methods for full-precision models. |
723 | PointNetLK: Robust & Efficient Point Cloud Registration Using PointNet | Yasuhiro Aoki, Hunter Goforth, Rangaprasad Arun Srivatsan, Simon Lucey | In this paper we argue that PointNet itself can be thought of as a learnable “imaging” function. |
724 | Few-Shot Adaptive Faster R-CNN | Tao Wang, Xiaopeng Zhang, Li Yuan, Jiashi Feng | To mitigate the detection performance drop caused by domain shift, we aim to develop a novel few-shot adaptation approach that requires only a few target domain images with limited bounding box annotations. |
725 | VRSTC: Occlusion-Free Video Person Re-Identification | Ruibing Hou, Bingpeng Ma, Hong Chang, Xinqian Gu, Shiguang Shan, Xilin Chen | In this paper, we propose a novel network, called Spatio-Temporal Completion network (STCnet), to explicitly handle partial occlusion problem. |
726 | Compact Feature Learning for Multi-Domain Image Classification | Yajing Liu, Xinmei Tian, Ya Li, Zhiwei Xiong, Feng Wu | Therefore, we propose an end-to-end network to obtain the more optimal features, which we call compact features. |
727 | Adaptive Transfer Network for Cross-Domain Person Re-Identification | Jiawei Liu, Zheng-Jun Zha, Di Chen, Richang Hong, Meng Wang | In this work, we propose a novel adaptive transfer network (ATNet) for effective cross-domain person re-identification. |
728 | Large-Scale Few-Shot Learning: Knowledge Transfer With Class Hierarchy | Aoxue Li, Tiange Luo, Zhiwu Lu, Tao Xiang, Liwei Wang | To overcome the challenge, we propose a novel large-scale FSL model by learning transferable visual features with the class hierarchy which encodes the semantic relations between source and target classes. |
729 | Moving Object Detection Under Discontinuous Change in Illumination Using Tensor Low-Rank and Invariant Sparse Decomposition | Moein Shakeri, Hong Zhang | Our method relies on the multilinear (tensor) data low-rank and sparse decomposition framework to address the weaknesses of existing methods. |
730 | Pedestrian Detection With Autoregressive Network Phases | Garrick Brazil, Xiaoming Liu | We present an autoregressive pedestrian detection framework with cascaded phases designed to progressively improve precision. |
731 | All You Need Is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification | Weijie Chen, Di Xie, Yuan Zhang, Shiliang Pu | To push this direction forward, a novel basic component named Sparse Shift Layer (SSL) is introduced in this paper to construct efficient convolutional neural networks. |
732 | Stochastic Class-Based Hard Example Mining for Deep Metric Learning | Yumin Suh, Bohyung Han, Wonsik Kim, Kyoung Mu Lee | To alleviate this limitation, we propose a stochastic hard negative mining method. |
733 | Revisiting Local Descriptor Based Image-To-Class Measure for Few-Shot Learning | Wenbin Li, Lei Wang, Jinglin Xu, Jing Huo, Yang Gao, Jiebo Luo | In this paper, we argue that a measure at such a level may not be effective enough in light of the scarcity of examples in few-shot learning. |
734 | Towards Robust Curve Text Detection With Conditional Spatial Expansion | Zichuan Liu, Guosheng Lin, Sheng Yang, Fayao Liu, Weisi Lin, Wang Ling Goh | Instead of regarding the curve text detection as a polygon regression or a segmentation problem, we formulate it as a sequence prediction on the spatial domain. |
735 | Revisiting Perspective Information for Efficient Crowd Counting | Miaojing Shi, Zhaohui Yang, Chao Xu, Qijun Chen | In this work, we propose a perspective-aware convolutional neural network (PACNN) for efficient crowd counting, which integrates the perspective information into density regression to provide additional knowledge of the person scale change in an image. |
736 | Towards Universal Object Detection by Domain Attention | Xudong Wang, Zhaowei Cai, Dashan Gao, Nuno Vasconcelos | In this paper, we develop an effective and efficient universal object detection system that is capable of working on various image domains, from human faces and traffic signs to medical CT images. |
737 | Ensemble Deep Manifold Similarity Learning Using Hard Proxies | Nicolas Aziere, Sinisa Todorovic | We introduce a new time- and memory-efficient method for estimating the manifold similarities by using a closed-form convergence solution of the Random Walk algorithm. |
738 | Quantization Networks | Jiwei Yang, Xu Shen, Jun Xing, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xian-sheng Hua | In this paper, we provide a simple and uniform way to quantize weights and activations by formulating quantization as a differentiable non-linear function. |
739 | RES-PCA: A Scalable Approach to Recovering Low-Rank Matrices | Chong Peng, Chenglizhao Chen, Zhao Kang, Jianbo Li, Qiang Cheng | To combat this drawback, in this paper we propose a new type of RPCA method, RES-PCA, which is linearly efficient and scalable in both data size and dimension. |
740 | Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks | N. Dinesh Reddy, Minh Vo, Srinivasa G. Narasimhan | We present Occlusion-Net, a framework to predict 2D and 3D locations of occluded keypoints for objects, in a largely self-supervised manner. |
741 | Efficient Featurized Image Pyramid Network for Single Shot Detector | Yanwei Pang, Tiancai Wang, Rao Muhammad Anwer, Fahad Shahbaz Khan, Ling Shao | In this paper, we introduce a light-weight architecture to efficiently produce featurized image pyramid in a single-stage detection framework. |
742 | Multi-Task Multi-Sensor Fusion for 3D Object Detection | Ming Liang, Bin Yang, Yun Chen, Rui Hu, Raquel Urtasun | In this paper we propose to exploit multiple related tasks for accurate multi-sensor 3D object detection. |
743 | Domain-Specific Batch Normalization for Unsupervised Domain Adaptation | Woong-Gi Chang, Tackgeun You, Seonguk Seo, Suha Kwak, Bohyung Han | We propose a novel unsupervised domain adaptation framework based on domain-specific batch normalization in deep neural networks. |
744 | Grid R-CNN | Xin Lu, Buyu Li, Yuxin Yue, Quanquan Li, Junjie Yan | This paper proposes a novel object detection framework named Grid R-CNN, which adopts a grid guided localization mechanism for accurate object detection. |
745 | MetaCleaner: Learning to Hallucinate Clean Representations for Noisy-Labeled Visual Recognition | Weihe Zhang, Yali Wang, Yu Qiao | To alleviate this problem, we propose a conceptually simple but effective MetaCleaner, which can learn to hallucinate a clean representation of an object category, according to a small noisy subset from the same category. |
746 | Mapping, Localization and Path Planning for Image-Based Navigation Using Visual Features and Map | Janine Thoma, Danda Pani Paudel, Ajad Chhatkuli, Thomas Probst, Luc Van Gool | A contribution of this paper is to formulate such a set of requirements for the two sub-tasks involved: compact map construction and accurate self localization. |
747 | Triply Supervised Decoder Networks for Joint Detection and Segmentation | Jiale Cao, Yanwei Pang, Xuelong Li | In this paper, we propose a framework called TripleNet to deeply boost these two tasks. |
748 | Leveraging the Invariant Side of Generative Zero-Shot Learning | Jingjing Li, Mengmeng Jing, Ke Lu, Zhengming Ding, Lei Zhu, Zi Huang | In this paper, we take advantage of generative adversarial networks (GANs) and propose a novel method, named leveraging invariant side GAN (LisGAN), which can directly generate unseen features from random noise conditioned on the semantic descriptions. |
749 | Exploring the Bounds of the Utility of Context for Object Detection | Ehud Barnea, Ohad Ben-Shahar | In this work we seek to improve our understanding of this phenomenon, in part by pursuing an opposite approach. |
750 | A-CNN: Annularly Convolutional Neural Networks on Point Clouds | Artem Komarichev, Zichun Zhong, Jing Hua | This paper presents a new method to define and compute convolution directly on 3D point clouds by the proposed annular convolution. |
751 | DARNet: Deep Active Ray Network for Building Segmentation | Dominic Cheng, Renjie Liao, Sanja Fidler, Raquel Urtasun | In this paper, we propose a Deep Active Ray Network (DARNet) for automatic building segmentation. |
752 | Point Cloud Oversegmentation With Graph-Structured Deep Metric Learning | Loic Landrieu, Mohamed Boussaha | We propose a new supervised learning framework for oversegmenting 3D point clouds into superpoints. |
753 | Graphonomy: Universal Human Parsing via Graph Transfer Learning | Ke Gong, Yiming Gao, Xiaodan Liang, Xiaohui Shen, Meng Wang, Liang Lin | In this paper, we aim to learn a single universal human parsing model that can tackle all kinds of human parsing needs by unifying label annotations from different domains or at various levels of granularity. |
754 | Fitting Multiple Heterogeneous Models by Multi-Class Cascaded T-Linkage | Luca Magri, Andrea Fusiello | This paper addresses the problem of multiple models fitting in the general context where the sought structures can be described by a mixture of heterogeneous parametric models drawn from different classes. |
755 | A Late Fusion CNN for Digital Matting | Yunke Zhang, Lixue Gong, Lubin Fan, Peiran Ren, Qixing Huang, Hujun Bao, Weiwei Xu | This paper studies the structure of a deep convolutional neural network to predict the foreground alpha matte by taking a single RGB image as input. |
756 | BASNet: Boundary-Aware Salient Object Detection | Xuebin Qin, Zichen Zhang, Chenyang Huang, Chao Gao, Masood Dehghan, Martin Jagersand | In this paper, we propose a predict-refine architecture, BASNet, and a new hybrid loss for Boundary-Aware Salient object detection. |
757 | ZigZagNet: Fusing Top-Down and Bottom-Up Context for Object Segmentation | Di Lin, Dingguo Shen, Siting Shen, Yuanfeng Ji, Dani Lischinski, Daniel Cohen-Or, Hui Huang | In this work, we introduce ZigZagNet, which aggregates a richer multi-context feature map by using not only dense top-down and bottom-up propagation, but also by introducing pathways crossing between different levels of the top-down and the bottom-up hierarchies, in a zig-zag fashion. |
758 | Object Instance Annotation With Deep Extreme Level Set Evolution | Zian Wang, David Acuna, Huan Ling, Amlan Kar, Sanja Fidler | In this paper, we tackle the task of interactive object segmentation. |
759 | Leveraging Crowdsourced GPS Data for Road Extraction From Aerial Imagery | Tao Sun, Zonglin Di, Pengyu Che, Chun Liu, Yin Wang | In this paper, we propose to leverage crowdsourced GPS data to improve and support road extraction from aerial imagery. |
760 | Adaptive Pyramid Context Network for Semantic Segmentation | Junjun He, Zhongying Deng, Lei Zhou, Yali Wang, Yu Qiao | Based on this analysis, this paper proposes Adaptive Pyramid Context Network (APCNet) for semantic segmentation. |
761 | Isospectralization, or How to Hear Shape, Style, and Correspondence | Luca Cosmo, Mikhail Panine, Arianna Rampini, Maks Ovsjanikov, Michael M. Bronstein, Emanuele Rodola | In this paper, we introduce a numerical procedure called isospectralization, consisting of deforming one shape to make its Laplacian spectrum match that of another. |
762 | Speech2Face: Learning the Face Behind a Voice | Tae-Hyun Oh, Tali Dekel, Changil Kim, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Wojciech Matusik | In this paper, we study the task of reconstructing a facial image of a person from a short audio recording of that person speaking. |
763 | Joint Manifold Diffusion for Combining Predictions on Decoupled Observations | Kwang In Kim, Hyung Jin Chang | We present a new predictor combination algorithm that improves a given task predictor based on potentially relevant reference predictors. |
764 | Audio Visual Scene-Aware Dialog | Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K. Marks, Chiori Hori, Peter Anderson, Stefan Lee, Devi Parikh | We introduce the task of scene-aware dialog. |
765 | Learning to Minify Photometric Stereo | Junxuan Li, Antonio Robles-Kelly, Shaodi You, Yasuyuki Matsushita | We propose a method that can dramatically decrease the demands on the number of images by learning the most informative ones under different illumination conditions. |
766 | Reflective and Fluorescent Separation Under Narrow-Band Illumination | Koji Koyamatsu, Daichi Hidaka, Takahiro Okabe, Hendrik P. A. Lensch | In this paper, we address the separation of reflective and fluorescent components in RGB images taken under narrow-band light sources such as LEDs. |
767 | Depth From a Polarisation + RGB Stereo Pair | Dizhong Zhu, William A. P. Smith | In this paper, we propose a hybrid depth imaging system in which a polarisation camera is augmented by a second image from a standard digital camera. |
768 | Rethinking the Evaluation of Video Summaries | Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkila | In this paper, we will provide in-depth assessment of this pipeline using two popular benchmark datasets. |
769 | What Object Should I Use? – Task Driven Object Detection | Johann Sawatzky, Yaser Souri, Christian Grund, Jurgen Gall | We therefore introduce the COCO-Tasks dataset which comprises about 40,000 images where the most suitable objects for 14 tasks have been annotated. We furthermore propose an approach that detects the most suitable objects for a given task. |
770 | Triangulation Learning Network: From Monocular to Stereo 3D Object Detection | Zengyi Qin, Jinglu Wang, Yan Lu | In this paper, we study the problem of 3D object detection from stereo images, in which the key challenge is how to effectively utilize stereo information. |
771 | Connecting the Dots: Learning Representations for Active Monocular Depth Estimation | Gernot Riegler, Yiyi Liao, Simon Donne, Vladlen Koltun, Andreas Geiger | We propose a technique for depth estimation with a monocular structured-light camera, i.e., a calibrated stereo set-up with one camera and one laser projector. |
772 | Learning Non-Volumetric Depth Fusion Using Successive Reprojections | Simon Donne, Andreas Geiger | In this work we propose to learn an auto-regressive depth refinement directly from data. Due to the limited availability of high-quality reconstruction datasets with ground truth, we introduce two novel synthetic datasets to (pre-)train our network. |
773 | Stereo R-CNN Based 3D Object Detection for Autonomous Driving | Peiliang Li, Xiaozhi Chen, Shaojie Shen | We propose a 3D object detection method for autonomous driving by fully exploiting the sparse and dense, semantic and geometry information in stereo imagery. |
774 | Hybrid Scene Compression for Visual Localization | Federico Camposeco, Andrea Cohen, Marc Pollefeys, Torsten Sattler | In this work, we introduce a new hybrid compression algorithm that uses a given memory limit in a more effective way. |
775 | MMFace: A Multi-Metric Regression Network for Unconstrained Face Reconstruction | Hongwei Yi, Chen Li, Qiong Cao, Xiaoyong Shen, Sheng Li, Guoping Wang, Yu-Wing Tai | We propose to address the face reconstruction in the wild by using a multi-metric regression network, MMFace, to align a 3D face morphable model (3DMM) to an input image. |
776 | 3D Motion Decomposition for RGBD Future Dynamic Scene Synthesis | Xiaojuan Qi, Zhengzhe Liu, Qifeng Chen, Jiaya Jia | In this paper, we propose a RGBD scene forecasting model with 3D motion decomposition. |
777 | Single Image Depth Estimation Trained via Depth From Defocus Cues | Shir Gur, Lior Wolf | In this work, we rely, instead of different views, on depth from focus cues. |
778 | RGBD Based Dimensional Decomposition Residual Network for 3D Semantic Scene Completion | Jie Li, Yu Liu, Dong Gong, Qinfeng Shi, Xia Yuan, Chunxia Zhao, Ian Reid | We introduce a light-weight Dimensional Decomposition Residual network (DDR) for 3D dense prediction tasks. |
779 | Neural Scene Decomposition for Multi-Person Motion Capture | Helge Rhodin, Victor Constantin, Isinsu Katircioglu, Mathieu Salzmann, Pascal Fua | In this paper, we therefore propose an approach to learning features that are useful for this purpose. |
780 | Efficient Decision-Based Black-Box Adversarial Attacks on Face Recognition | Yinpeng Dong, Hang Su, Baoyuan Wu, Zhifeng Li, Wei Liu, Tong Zhang, Jun Zhu | In this paper, we evaluate the robustness of state-of-the-art face recognition models in the decision-based black-box attack setting, where the attackers have no access to the model parameters and gradients, but can only acquire hard-label predictions by sending queries to the target model. |
781 | FA-RPN: Floating Region Proposals for Face Detection | Mahyar Najibi, Bharat Singh, Larry S. Davis | We propose a novel approach for generating region proposals for performing face detection. |
782 | Bayesian Hierarchical Dynamic Model for Human Action Recognition | Rui Zhao, Wanru Xu, Hui Su, Qiang Ji | To address this issue, we propose a probabilistic model called Hierarchical Dynamic Model (HDM). |
783 | Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation | Yunyang Xiong, Hyunwoo J. Kim, Vikas Singh | The goal of this paper is to adapt these “mixed effects” ideas from statistics within a deep neural network architecture for gaze estimation, based on eye images. |
784 | 3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training | Dario Pavllo, Christoph Feichtenhofer, David Grangier, Michael Auli | In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. |
785 | Learning to Regress 3D Face Shape and Expression From an Image Without 3D Supervision | Soubhik Sanyal, Timo Bolkart, Haiwen Feng, Michael J. Black | To train a network without any 2D-to-3D supervision, we present RingNet, which learns to compute 3D face shape from a single image. Additionally we create a new database of faces “not quite in-the-wild” (NoW) with 3D head scans and high-resolution images of the subjects in a wide variety of conditions. |
786 | PoseFix: Model-Agnostic General Human Pose Refinement Network | Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee | In this paper, we propose a human pose refinement network that estimates a refined pose from a tuple of an input image and input pose. |
787 | RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation | Bastian Wandt, Bodo Rosenhahn | This paper addresses the problem of 3D human pose estimation from single images. |
788 | Fast and Robust Multi-Person 3D Pose Estimation From Multiple Views | Junting Dong, Wen Jiang, Qixing Huang, Hujun Bao, Xiaowei Zhou | We propose a fast and robust approach to solve this problem. |
789 | Face-Focused Cross-Stream Network for Deception Detection in Videos | Mingyu Ding, An Zhao, Zhiwu Lu, Tao Xiang, Ji-Rong Wen | In this work, both problems are addressed. |
790 | Unequal-Training for Deep Face Recognition With Long-Tailed Noisy Data | Yaoyao Zhong, Weihong Deng, Mei Wang, Jiani Hu, Jianteng Peng, Xunqiang Tao, Yaohai Huang | In this paper, we propose a training strategy that treats the head data and the tail data in an unequal way, accompanying with noise-robust loss functions, to take full advantage of their respective characteristics. |
791 | T-Net: Parametrizing Fully Convolutional Nets With a Single High-Order Tensor | Jean Kossaifi, Adrian Bulat, Georgios Tzimiropoulos, Maja Pantic | In this paper, we propose to fully parametrize Convolutional Neural Networks (CNNs) with a single high-order, low-rank tensor. |
792 | Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss | Lele Chen, Ross K. Maddox, Zhiyao Duan, Chenliang Xu | To avoid those pixel jittering problems and to enforce the network to focus on audiovisual-correlated regions, we propose a novel dynamically adjustable pixel-wise loss with an attention mechanism. |
793 | Object-Centric Auto-Encoders and Dummy Anomalies for Abnormal Event Detection in Video | Radu Tudor Ionescu, Fahad Shahbaz Khan, Mariana-Iuliana Georgescu, Ling Shao | In this work, we formalize abnormal event detection as a one-versus-rest binary classification problem. |
794 | DDLSTM: Dual-Domain LSTM for Cross-Dataset Action Recognition | Toby Perrett, Dima Damen | In this paper we introduce Dual-Domain LSTM (DDLSTM), an architecture that is able to learn temporal dependencies from two domains concurrently. |
795 | The Pros and Cons: Rank-Aware Temporal Attention for Skill Determination in Long Videos | Hazel Doughty, Walterio Mayol-Cuevas, Dima Damen | We present a new model to determine relative skill from long videos, through learnable temporal attention modules. We evaluate our approach on the EPIC-Skills dataset and additionally annotate a larger dataset from YouTube videos for skill determination with five previously unexplored tasks. |
796 | Collaborative Spatiotemporal Feature Learning for Video Action Recognition | Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu | In this paper, we propose a novel neural operation which encodes spatiotemporal features collaboratively by imposing a weight-sharing constraint on the learnable parameters. |
797 | MARS: Motion-Augmented RGB Stream for Action Recognition | Nieves Crasto, Philippe Weinzaepfel, Karteek Alahari, Cordelia Schmid | In this paper, we introduce two learning approaches to train a standard 3D CNN, operating on RGB frames, that mimics the motion stream, and as a result avoids flow computation at test time. |
798 | Convolutional Relational Machine for Group Activity Recognition | Sina Mokhtarzadeh Azar, Mina Ghadimi Atigh, Ahmad Nickabadi, Alexandre Alahi | We present an end-to-end deep Convolutional Neural Network called Convolutional Relational Machine (CRM) for recognizing group activities that utilizes the information in spatial relations between individual persons in image or video. |
799 | Video Summarization by Learning From Unpaired Data | Mrigank Rochan, Yang Wang | We present an approach that learns to generate optimal video summaries using a set of raw videos (V) and a set of summary videos (S), where there exists no correspondence between V and S. |
800 | Skeleton-Based Action Recognition With Directed Graph Neural Networks | Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu | In this work, we represent the skeleton data as a directed acyclic graph based on the kinematic dependency between the joints and bones in the natural human body. |
801 | PA3D: Pose-Action 3D Machine for Video Recognition | An Yan, Yali Wang, Zhifeng Li, Yu Qiao | To fill this gap, we propose a concise Pose-Action 3D Machine (PA3D), which can effectively encode multiple pose modalities within a unified 3D framework, and consequently learn spatio-temporal pose representations for action recognition. |
802 | Deep Dual Relation Modeling for Egocentric Interaction Recognition | Haoxin Li, Yijun Cai, Wei-Shi Zheng | To exploit the strong relations for egocentric interaction recognition, we introduce a dual relation modeling framework which learns to model the relations between the camera wearer and the interactor based on the individual action representations of the two persons. |
803 | MOTS: Multi-Object Tracking and Segmentation | Paul Voigtlaender, Michael Krause, Aljosa Osep, Jonathon Luiten, Berin Balachandar Gnana Sekar, Andreas Geiger, Bastian Leibe | This paper extends the popular task of multi-object tracking to multi-object tracking and segmentation (MOTS). We make our annotations, code, and models available at https://www.vision.rwth-aachen.de/page/mots. |
804 | Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking | Heng Fan, Haibin Ling | Addressing these issues, we propose a multi-stage tracking framework, Siamese Cascaded RPN (C-RPN), which consists of a sequence of RPNs cascaded from deep high-level to shallow low-level layers in a Siamese network. |
805 | PointFlowNet: Learning Representations for Rigid Motion Estimation From Point Clouds | Aseem Behl, Despoina Paschalidou, Simon Donne, Andreas Geiger | In this paper, we propose to estimate 3D motion from such unstructured point clouds using a deep neural network. |
806 | Listen to the Image | Di Hu, Dong Wang, Xuelong Li, Feiping Nie, Qi Wang | To improve the translation quality, the task performances of the blind are usually employed to evaluate different encoding schemes. In contrast to this toilsome human-based assessment, we argue that a machine model can also be developed for evaluation, and more efficiently. |
807 | Image Super-Resolution by Neural Texture Transfer | Zhifei Zhang, Zhaowen Wang, Zhe Lin, Hairong Qi | This paper aims to unleash the potential of RefSR by leveraging more texture details from Ref images with stronger robustness even when irrelevant Ref images are provided. We build a benchmark dataset for the general research of RefSR, which contains Ref images paired with LR inputs with varying levels of similarity. |
808 | Conditional Adversarial Generative Flow for Controllable Image Synthesis | Rui Liu, Yu Liu, Xinyu Gong, Xiaogang Wang, Hongsheng Li | In this paper, based on modeling a joint probabilistic density of an image and its conditions, we propose a novel flow-based generative model named conditional adversarial generative flow (CAGlow). |
809 | How to Make a Pizza: Learning a Compositional Layer-Based GAN Model | Dim P. Papadopoulos, Youssef Tamaazousti, Ferda Ofli, Ingmar Weber, Antonio Torralba | In this paper, we aim to teach a machine how to make a pizza by building a generative model that mirrors this step-by-step procedure. |
810 | TransGaGa: Geometry-Aware Unsupervised Image-To-Image Translation | Wayne Wu, Kaidi Cao, Cheng Li, Chen Qian, Chen Change Loy | In this work, we present a novel disentangle-and-translate framework to tackle the complex objects image-to-image translation task. |
811 | Depth-Attentional Features for Single-Image Rain Removal | Xiaowei Hu, Chi-Wing Fu, Lei Zhu, Pheng-Ann Heng | In this work, we first analyze the visual effects of rain subject to scene depth and formulate a rain imaging model collectively with rain streaks and fog; by then, we prepare a new dataset called RainCityscapes with rain streaks and fog on real outdoor photos. |
812 | Hyperspectral Image Reconstruction Using a Deep Spatial-Spectral Prior | Lizhi Wang, Chen Sun, Ying Fu, Min H. Kim, Hua Huang | In this paper, we present a novel hyperspectral image reconstruction algorithm that substitutes the traditional hand-crafted prior with a data-driven prior, based on an optimization-inspired network. |
813 | LiFF: Light Field Features in Scale and Depth | Donald G. Dansereau, Bernd Girod, Gordon Wetzstein | Building on spatio-angular imaging modalities offered by emerging light field cameras, we introduce a new and computationally efficient 4D light field feature detector and descriptor: LiFF. |
814 | Deep Exemplar-Based Video Colorization | Bo Zhang, Mingming He, Jing Liao, Pedro V. Sander, Lu Yuan, Amine Bermak, Dong Chen | To address this issue, we introduce a recurrent framework that unifies the semantic correspondence and color propagation steps. |
815 | On Finding Gray Pixels | Yanlin Qian, Joni-Kristian Kamarainen, Jarno Nikkanen, Jiri Matas | We propose a novel grayness index for finding gray pixels and demonstrate its effectiveness and efficiency in illumination estimation. |
816 | UnOS: Unified Unsupervised Optical-Flow and Stereo-Depth Estimation by Watching Videos | Yang Wang, Peng Wang, Zhenheng Yang, Chenxu Luo, Yi Yang, Wei Xu | In this paper, we propose UnOS, a unified system for unsupervised optical flow and stereo depth estimation using a convolutional neural network (CNN), by taking advantage of their inherent geometrical consistency based on the rigid-scene assumption. |
817 | Learning Transformation Synchronization | Xiangru Huang, Zhenxiao Liang, Xiaowei Zhou, Yao Xie, Leonidas J. Guibas, Qixing Huang | Instead of merely using the relative transformations as the input to perform transformation synchronization, we propose to use a neural network to learn the weights associated with each relative transformation. |
818 | D2-Net: A Trainable CNN for Joint Description and Detection of Local Features | Mihai Dusmanu, Ignacio Rocco, Tomas Pajdla, Marc Pollefeys, Josef Sivic, Akihiko Torii, Torsten Sattler | In this work we address the problem of finding reliable pixel-level correspondences under difficult imaging conditions. |
819 | Recurrent Neural Networks With Intra-Frame Iterations for Video Deblurring | Seungjun Nah, Sanghyun Son, Kyoung Mu Lee | In this work, we aim to improve the accuracy of recurrent models by adapting the hidden states transferred from past frames to the frame being processed so that the relations between video frames could be better used. |
820 | Learning to Extract Flawless Slow Motion From Blurry Videos | Meiguang Jin, Zhe Hu, Paolo Favaro | In this paper, we introduce the task of generating a sharp slow-motion video given a low frame rate blurry video. |
821 | Natural and Realistic Single Image Super-Resolution With Explicit Natural Manifold Discrimination | Jae Woong Soh, Gu Yong Park, Junho Jo, Nam Ik Cho | Therefore, in this paper, we present a new approach to reconstructing realistic super-resolved images with high perceptual quality, while maintaining the naturalness of the result. |
822 | RF-Net: An End-To-End Image Matching Network Based on Receptive Field | Xuelun Shen, Cheng Wang, Xin Li, Zenglei Yu, Jonathan Li, Chenglu Wen, Ming Cheng, Zijian He | This paper proposes a new end-to-end trainable matching network based on receptive field, RF-Net, to compute sparse correspondence between images. |
823 | Fast Single Image Reflection Suppression via Convex Optimization | Yang Yang, Wenye Ma, Yin Zheng, Jian-Feng Cai, Weiyu Xu | We propose a convex model to suppress the reflection from a single input image. |
824 | A Mutual Learning Method for Salient Object Detection With Intertwined Multi-Supervision | Runmin Wu, Mengyang Feng, Wenlong Guan, Dong Wang, Huchuan Lu, Errui Ding | To alleviate these issues, we propose to train saliency detection networks by exploiting the supervision from not only salient object detection, but also foreground contour detection and edge detection. |
825 | Enhanced Pix2pix Dehazing Network | Yanyun Qu, Yizi Chen, Jingying Huang, Yuan Xie | In this paper, we reduce the image dehazing problem to an image-to-image translation problem, and propose Enhanced Pix2pix Dehazing Network (EPDN), which generates a haze-free image without relying on the physical scattering model. |
826 | Assessing Personally Perceived Image Quality via Image Features and Collaborative Filtering | Jari Korhonen | In this study, we aim to predict personally perceived image quality by combining classical image feature analysis with collaborative filtering, an approach known from recommender systems. |
827 | Single Image Reflection Removal Exploiting Misaligned Training Data and Network Enhancements | Kaixuan Wei, Jiaolong Yang, Ying Fu, David Wipf, Hua Huang | In this paper, we address these issues by exploiting targeted network enhancements and the novel use of misaligned data. |
828 | Exploring Context and Visual Pattern of Relationship for Scene Graph Generation | Wenbin Wang, Ruiping Wang, Shiguang Shan, Xilin Chen | Therefore, we present our so-called Relationship Context – InterSeCtion Region (CISC) method. |
829 | Learning From Synthetic Data for Crowd Counting in the Wild | Qi Wang, Junyu Gao, Wei Lin, Yuan Yuan | Learning From Synthetic Data for Crowd Counting in the Wild. The dataset and source code are available at https://gjy3035.github.io/GCC-CL/. |
830 | A Local Block Coordinate Descent Algorithm for the CSC Model | Ev Zisselman, Jeremias Sulam, Michael Elad | In this work we propose a new and simple approach that adopts a localized strategy, based on the Block Coordinate Descent algorithm. |
831 | Not Using the Car to See the Sidewalk — Quantifying and Controlling the Effects of Context in Classification and Segmentation | Rakshith Shetty, Bernt Schiele, Mario Fritz | We propose a method to quantify the sensitivity of black-box vision models to visual context by editing images to remove selected objects and measuring the response of the target models. |
832 | Discovering Fair Representations in the Data Domain | Novi Quadrianto, Viktoriia Sharmanska, Oliver Thomas | We propose to cast this problem as data-to-data translation, i.e. learning a mapping from an input domain to a fair target domain, where a fairness definition is being enforced. |
833 | Actor-Critic Instance Segmentation | Nikita Araslanov, Constantin A. Rothkopf, Stefan Roth | In this work, we revisit the recurrent formulation of this challenging problem in the context of reinforcement learning. |
834 | Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders | Edgar Schonfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, Zeynep Akata | In this work, we take feature generation one step further and propose a model where a shared latent space of image features and class embeddings is learned by modality-specific aligned variational autoencoders. |
835 | Semantic Projection Network for Zero- and Few-Label Semantic Segmentation | Yongqin Xian, Subhabrata Choudhury, Yang He, Bernt Schiele, Zeynep Akata | In this paper we take this one step further and focus on the challenging task of zero- and few-shot learning of semantic segmentation. |
836 | GCAN: Graph Convolutional Adversarial Network for Unsupervised Domain Adaptation | Xinhong Ma, Tianzhu Zhang, Changsheng Xu | Different from existing methods, we propose an end-to-end Graph Convolutional Adversarial Network (GCAN) for unsupervised domain adaptation by jointly modeling data structure, domain label, and class label in a unified deep framework. |
837 | Seamless Scene Segmentation | Lorenzo Porzi, Samuel Rota Bulo, Aleksander Colovic, Peter Kontschieder | In this work we introduce a novel, CNN-based architecture that can be trained end-to-end to deliver seamless scene segmentation results. |
838 | Unsupervised Image Matching and Object Discovery as Optimization | Huy V. Vo, Francis Bach, Minsu Cho, Kai Han, Yann LeCun, Patrick Perez, Jean Ponce | We focus here on the unsupervised discovery and matching of object categories among images in a collection, following the work of Cho et al. [12]. |
839 | Wide-Area Crowd Counting via Ground-Plane Density Maps and Multi-View Fusion CNNs | Qi Zhang, Antoni B. Chan | In this paper, we propose a deep neural network framework for multi-view crowd counting, which fuses information from multiple camera views to predict a scene-level density map on the ground-plane of the 3D world. |
840 | Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions | Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara | In this paper, we introduce a novel framework for image captioning which can generate diverse descriptions by allowing both grounding and controllability. |
841 | Towards VQA Models That Can Read | Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach | Studies have shown that a dominant class of questions asked by visually impaired users on images of their surroundings involves reading text in the image. But today’s VQA models cannot read! Our paper takes a first step towards addressing this problem. First, we introduce a new “TextVQA” dataset to facilitate progress on this important problem. |
842 | Object-Aware Aggregation With Bidirectional Temporal Graph for Video Captioning | Junchao Zhang, Yuxin Peng | In this paper, we propose a new video captioning approach based on object-aware aggregation with bidirectional temporal graph (OA-BTG), which captures detailed temporal dynamics for salient objects in video, and learns discriminative spatio-temporal representations by performing object-aware local feature aggregation on detected object regions. |
843 | Progressive Attention Memory Network for Movie Story Question Answering | Junyeong Kim, Minuk Ma, Kyungsu Kim, Sungjin Kim, Chang D. Yoo | This paper proposes the progressive attention memory network (PAMN) for movie story question answering (QA). |
844 | Memory-Attended Recurrent Network for Video Captioning | Wenjie Pei, Jiyuan Zhang, Xiangrong Wang, Lei Ke, Xiaoyong Shen, Yu-Wing Tai | To tackle this limitation, we propose the Memory-Attended Recurrent Network (MARN) for video captioning, in which a memory structure is designed to explore the full-spectrum correspondence between a word and its various similar visual contexts across videos in training data. |
845 | Visual Query Answering by Entity-Attribute Graph Matching and Reasoning | Peixi Xiong, Huayi Zhan, Xin Wang, Baivab Sinha, Ying Wu | This paper proposes a novel method to address the VQA problem. We also create a dataset of soccer matches (Soccer-VQA) with rich annotations. |
846 | Look Back and Predict Forward in Image Captioning | Yu Qin, Jiajun Du, Yonghua Zhang, Hongtao Lu | We propose the Look Back (LB) method to embed visual information from the past and the Predict Forward (PF) approach to look into the future. |
847 | Explainable and Explicit Visual Reasoning Over Scene Graphs | Jiaxin Shi, Hanwang Zhang, Juanzi Li | We aim to dismantle the prevalent black-box neural architectures used in complex visual reasoning tasks, into the proposed eXplainable and eXplicit Neural Modules (XNMs), which advance beyond existing neural module networks towards using scene graphs — objects as nodes and the pairwise relationships as edges — for explainable and explicit reasoning with structured knowledge. |
848 | Transfer Learning via Unsupervised Task Discovery for Visual Question Answering | Hyeonwoo Noh, Taehoon Kim, Jonghwan Mun, Bohyung Han | We tackle this problem in two steps: 1) learning a task conditional visual classifier, which is capable of solving diverse question-specific visual recognition tasks, based on unsupervised task discovery and 2) transferring the task conditional visual classifier to visual question answering models. |
849 | Intention Oriented Image Captions With Guiding Objects | Yue Zheng, Yali Li, Shengjin Wang | In this paper, we propose a novel approach for generating image captions with guiding objects (CGO). With CGO, we can extend the ability of description to the objects being neglected in image caption labels and provide a set of more comprehensive and diverse descriptions for an image. |
850 | Uncertainty Guided Multi-Scale Residual Learning-Using a Cycle Spinning CNN for Single Image De-Raining | Rajeev Yasarla, Vishal M. Patel | The proposed Uncertainty guided Multi-scale Residual Learning (UMRL) network attempts to address this issue by learning the rain content at different scales and using them to estimate the final de-rained output. |
851 | Toward Realistic Image Compositing With Adversarial Learning | Bor-Chun Chen, Andrew Kae | In this work we propose a generative adversarial network (GAN) architecture for automatic image compositing. |
852 | Cross-Classification Clustering: An Efficient Multi-Object Tracking Technique for 3-D Instance Segmentation in Connectomics | Yaron Meirovitch, Lu Mi, Hayk Saribekyan, Alexander Matveev, David Rolnick, Nir Shavit | Here we introduce cross-classification clustering (3C), a technique that simultaneously tracks complex, interrelated objects in an image stack. |
853 | Deep ChArUco: Dark ChArUco Marker Pose Estimation | Danying Hu, Daniel DeTone, Tomasz Malisiewicz | We present Deep ChArUco, a real-time pose estimation system which combines two custom deep networks, ChArUcoNet and RefineNet, with the Perspective-n-Point (PnP) algorithm to estimate the marker’s 6DoF pose. |
854 | Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving | Yan Wang, Wei-Lun Chao, Divyansh Garg, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger | Taking the inner workings of convolutional neural networks into consideration, we propose to convert image-based depth maps to pseudo-LiDAR representations — essentially mimicking the LiDAR signal. |
855 | Rules of the Road: Predicting Driving Behavior With a Convolutional Model of Semantic Interactions | Joey Hong, Benjamin Sapp, James Philbin | We present a unified representation which encodes such high-level semantic information in a spatial grid, allowing the use of deep convolutional models to fuse complex scene context. We introduce a novel dataset providing industry-grade rich perception and semantic inputs, and empirically show we can effectively learn fundamentals of driving behavior. |
856 | Metric Learning for Image Registration | Marc Niethammer, Roland Kwitt, Francois-Xavier Vialard | Instead of learning the entire registration approach, we learn a spatially-adaptive regularizer within a registration model. |
857 | LO-Net: Deep Real-Time Lidar Odometry | Qing Li, Shaoyang Chen, Cheng Wang, Xin Li, Chenglu Wen, Ming Cheng, Jonathan Li | We present a novel deep convolutional network pipeline, LO-Net, for real-time lidar odometry estimation. |
858 | TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions | Rohan Chandra, Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha | We present a new algorithm for predicting the near-term trajectories of road agents in dense traffic videos. We evaluate the performance of our prediction algorithm, TraPHic, on the standard datasets and also introduce a new dense, heterogeneous traffic dataset corresponding to urban Asian videos and agent trajectories. |
859 | World From Blur | Jiayan Qiu, Xinchao Wang, Stephen J. Maybank, Dacheng Tao | We show in this paper that a 3D scene can be revealed from a single motion-blurred image. |
860 | Topology Reconstruction of Tree-Like Structure in Images via Structural Similarity Measure and Dominant Set Clustering | Jianyang Xie, Yitian Zhao, Yonghuai Liu, Pan Su, Yifan Zhao, Jun Cheng, Yalin Zheng, Jiang Liu | In this paper, we propose a novel curvilinear structural similarity measure to guide a dominant-set clustering approach to address this indispensable issue. |
861 | Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training | Feng Zheng, Cheng Deng, Xing Sun, Xinyang Jiang, Xiaowei Guo, Zongqiao Yu, Feiyue Huang, Rongrong Ji | In this paper, we propose a novel coarse-to-fine pyramid model to relax the need of bounding boxes, which not only incorporates local and global information, but also integrates the gradual cues between them. |
862 | Holistic and Comprehensive Annotation of Clinically Significant Findings on Diverse CT Images: Learning From Radiology Reports and Label Ontology | Ke Yan, Yifan Peng, Veit Sandfort, Mohammadhadi Bagheri, Zhiyong Lu, Ronald M. Summers | In this paper, we study the lesion description or annotation problem. |
863 | Robust Histopathology Image Analysis: To Label or to Synthesize? | Le Hou, Ayush Agarwal, Dimitris Samaras, Tahsin M. Kurc, Rajarsi R. Gupta, Joel H. Saltz | We propose an unsupervised approach for histopathology image segmentation that synthesizes heterogeneous sets of training image patches, of every tissue type. |
864 | Data Augmentation Using Learned Transformations for One-Shot Medical Image Segmentation | Amy Zhao, Guha Balakrishnan, Fredo Durand, John V. Guttag, Adrian V. Dalca | We present an automated data augmentation method for synthesizing labeled medical images. |
865 | Shifting More Attention to Video Salient Object Detection | Deng-Ping Fan, Wenguan Wang, Ming-Ming Cheng, Jianbing Shen | This is the first work that explicitly emphasizes the challenge of saliency shift, i.e., the video salient object(s) may dynamically change. To address this issue, we elaborately collected a visual-attention-consistent Densely Annotated VSOD (DAVSOD) dataset, which contains 226 videos with 23,938 frames that cover diverse realistic scenes, objects, instances and motions. |
866 | Neural Task Graphs: Generalizing to Unseen Tasks From a Single Video Demonstration | De-An Huang, Suraj Nair, Danfei Xu, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles | Our goal is to generate a policy to complete an unseen task given just a single video demonstration of the task in a given domain. |
867 | Beyond Tracking: Selecting Memory and Refining Poses for Deep Visual Odometry | Fei Xue, Xin Wang, Shunkai Li, Qiuyuan Wang, Junqiu Wang, Hongbin Zha | In contrast, we present a VO framework by incorporating two additional components called Memory and Refining. |
868 | Image Generation From Layout | Bo Zhao, Lili Meng, Weidong Yin, Leonid Sigal | To address these challenges, we propose a novel approach for layout-based image generation; we call it Layout2Im. |
869 | Multimodal Explanations by Predicting Counterfactuality in Videos | Atsushi Kanehira, Kentaro Takemoto, Sho Inayoshi, Tatsuya Harada | Our goal is not only to classify a video into a specific category, but also to provide explanations on why it is not categorized to a specific class with combinations of visual-linguistic information. |
870 | Learning to Explain With Complemental Examples | Atsushi Kanehira, Tatsuya Harada | We propose a novel framework to generate complemental explanations, on which the joint distribution of the variables to explain, and those to be explained is parameterized by three different neural networks: predictor, linguistic explainer, and example selector. |
871 | HAQ: Hardware-Aware Automated Quantization With Mixed Precision | Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han | In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework which leverages the reinforcement learning to automatically determine the quantization policy, and we take the hardware accelerator’s feedback in the design loop. |
872 | Content Authentication for Neural Imaging Pipelines: End-To-End Optimization of Photo Provenance in Complex Distribution Channels | Pawel Korus, Nasir Memon | This paper explores end-to-end optimization of the entire image acquisition and distribution workflow to facilitate reliable forensic analysis at the end of the distribution channel. |
873 | Inverse Procedural Modeling of Knitwear | Elena Trunz, Sebastian Merzbach, Jonathan Klein, Thomas Schulze, Michael Weinmann, Reinhard Klein | While recent approaches are focused on woven cloth, we present a novel practical approach for the inference of more complex knitwear structures as well as the respective knitting instructions from only a single image without attached annotations. |
874 | Estimating 3D Motion and Forces of Person-Object Interactions From Monocular Video | Zongmian Li, Jiri Sedlar, Justin Carpentier, Ivan Laptev, Nicolas Mansard, Josef Sivic | In this paper, we introduce a method to automatically reconstruct the 3D motion of a person interacting with an object from a single RGB video. |
875 | DeepMapping: Unsupervised Map Estimation From Multiple Point Clouds | Li Ding, Chen Feng | We propose DeepMapping, a novel registration framework using deep neural networks (DNNs) as auxiliary functions to align multiple point clouds from scratch to a globally consistent frame. |
876 | End-To-End Interpretable Neural Motion Planner | Wenyuan Zeng, Wenjie Luo, Simon Suo, Abbas Sadat, Bin Yang, Sergio Casas, Raquel Urtasun | In this paper, we propose a neural motion planner for learning to drive autonomously in complex urban scenarios that include traffic-light handling, yielding, and interactions with multiple road-users. |
877 | Divergence Triangle for Joint Training of Generator Model, Energy-Based Model, and Inferential Model | Tian Han, Erik Nijkamp, Xiaolin Fang, Mitch Hill, Song-Chun Zhu, Ying Nian Wu | This paper proposes the divergence triangle as a framework for joint training of a generator model, energy-based model and inference model. |
878 | Image Deformation Meta-Networks for One-Shot Learning | Zitian Chen, Yanwei Fu, Yu-Xiong Wang, Lin Ma, Wei Liu, Martial Hebert | Our key insight is that, while the deformed images may not be visually realistic, they still maintain critical semantic information and contribute significantly to formulating classifier decision boundaries. |
879 | Online High Rank Matrix Completion | Jicong Fan, Madeleine Udell | In this paper, we develop a new model for high rank matrix completion (HRMC), together with batch and online methods to fit the model and out-of-sample extension to complete new data. |
880 | Multispectral Imaging for Fine-Grained Recognition of Powders on Complex Backgrounds | Tiancheng Zhi, Bernardo R. Pires, Martial Hebert, Srinivasa G. Narasimhan | We present a method to select discriminative spectral bands to significantly reduce acquisition time while improving recognition accuracy. To address these challenges, we present the first comprehensive dataset and approach for powder recognition using multi-spectral imaging. |
881 | ContactDB: Analyzing and Predicting Grasp Contact via Thermal Imaging | Samarth Brahmbhatt, Cusuh Ham, Charles C. Kemp, James Hays | We present ContactDB, a novel dataset of contact maps for household objects that captures the rich hand-object contact that occurs during grasping, enabled by use of a thermal camera. |
882 | Robust Subspace Clustering With Independent and Piecewise Identically Distributed Noise Modeling | Yuanman Li, Jiantao Zhou, Xianwei Zheng, Jinyu Tian, Yuan Yan Tang | In this work, we propose an independent and piecewise identically distributed (i.p.i.d.) noise model, where the i.i.d. property only holds locally. |
883 | What Correspondences Reveal About Unknown Camera and Motion Models? | Thomas Probst, Ajad Chhatkuli, Danda Pani Paudel, Luc Van Gool | In this paper, we tackle this problem in two steps. |
884 | Self-Calibrating Deep Photometric Stereo Networks | Guanying Chen, Kai Han, Boxin Shi, Yasuyuki Matsushita, Kwan-Yee K. Wong | This paper proposes an uncalibrated photometric stereo method for non-Lambertian scenes based on deep learning. |
885 | Argoverse: 3D Tracking and Forecasting With Rich Maps | Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, James Hays | We present Argoverse, a dataset designed to support autonomous vehicle perception tasks including 3D tracking and motion forecasting. |
886 | Side Window Filtering | Hui Yin, Yuanhao Gong, Guoping Qiu | Based on this insight, we propose a new Side Window Filtering (SWF) technique which aligns the window’s side or corner with the pixel being processed. |
887 | Defense Against Adversarial Images Using Web-Scale Nearest-Neighbor Search | Abhimanyu Dubey, Laurens van der Maaten, Zeki Yalniz, Yixuan Li, Dhruv Mahajan | In this work, we hypothesize that adversarial perturbations move the image away from the image manifold in the sense that there exists no physical process that could have produced the adversarial image. |
888 | Incremental Object Learning From Contiguous Views | Stefan Stojanov, Samarth Mishra, Ngoc Anh Thai, Nikhil Dhanda, Ahmad Humayun, Chen Yu, Linda B. Smith, James M. Rehg | In this work, we present CRIB (Continual Recognition Inspired by Babies), a synthetic incremental object learning environment that can produce data that models visual imagery produced by object exploration in early infancy. |
889 | IP102: A Large-Scale Benchmark Dataset for Insect Pest Recognition | Xiaoping Wu, Chi Zhan, Yu-Kun Lai, Ming-Ming Cheng, Jufeng Yang | In this paper, we collect a large-scale dataset named IP102 for insect pest recognition. |
890 | CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification | Zheng Tang, Milind Naphade, Ming-Yu Liu, Xiaodong Yang, Stan Birchfield, Shuo Wang, Ratnesh Kumar, David Anastasiu, Jenq-Neng Hwang | This work introduces CityFlow, a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km. |
891 | Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence | Amir Zadeh, Michael Chan, Paul Pu Liang, Edmund Tong, Louis-Philippe Morency | In this paper, we introduce Social-IQ, an unconstrained benchmark specifically designed to train and evaluate socially intelligent technologies. |
892 | UPSNet: A Unified Panoptic Segmentation Network | Yuwen Xiong, Renjie Liao, Hengshuang Zhao, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun | In this paper, we propose a unified panoptic segmentation network (UPSNet) for tackling the newly proposed panoptic segmentation task. |
893 | JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds With Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields | Quang-Hieu Pham, Thanh Nguyen, Binh-Son Hua, Gemma Roig, Sai-Kit Yeung | In this work, we jointly address the problems of semantic and instance segmentation of 3D point clouds. |
894 | Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth | Davy Neven, Bert De Brabandere, Marc Proesmans, Luc Van Gool | In this work we propose a new clustering loss function for proposal-free instance segmentation. |
895 | DeepCO3: Deep Instance Co-Segmentation by Co-Peak Search and Co-Saliency Detection | Kuang-Jui Hsu, Yen-Yu Lin, Yung-Yu Chuang | In this paper, we address a new task called instance co-segmentation. |
896 | Improving Semantic Segmentation via Video Propagation and Label Relaxation | Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro | In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks. |
897 | Accel: A Corrective Fusion Network for Efficient Semantic Segmentation on Video | Samvit Jain, Xin Wang, Joseph E. Gonzalez | We present Accel, a novel semantic video segmentation system that achieves high accuracy at low inference cost by combining the predictions of two network branches: (1) a reference branch that extracts high-detail features on a reference keyframe, and warps these features forward using frame-to-frame optical flow estimates, and (2) an update branch that computes features of adjustable quality on the current frame, performing a temporal update at each video frame. |
898 | Shape2Motion: Joint Analysis of Motion Parts and Attributes From 3D Shapes | Xiaogang Wang, Bin Zhou, Yahao Shi, Xiaowu Chen, Qinping Zhao, Kai Xu | For the task of mobility analysis of 3D shapes, we propose joint analysis for simultaneous motion part segmentation and motion attribute estimation, taking a single 3D model as input. |
899 | Semantic Correlation Promoted Shape-Variant Context for Segmentation | Henghui Ding, Xudong Jiang, Bing Shuai, Ai Qun Liu, Gang Wang | In this work, we propose to generate a scale- and shape-variant semantic mask for each pixel to confine its contextual region. |
900 | Relation-Shape Convolutional Neural Network for Point Cloud Analysis | Yongcheng Liu, Bin Fan, Shiming Xiang, Chunhong Pan | In this paper, we propose RS-CNN, namely, Relation-Shape Convolutional Neural Network, which extends regular grid CNN to irregular configuration for point cloud analysis. |
901 | Enhancing Diversity of Defocus Blur Detectors via Cross-Ensemble Network | Wenda Zhao, Bowen Zheng, Qiuhua Lin, Huchuan Lu | In this paper, we propose a novel learning strategy that breaks the DBD problem into multiple smaller defocus blur detectors so that their estimation errors can cancel each other out. |
902 | BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames | Brent A. Griffin, Jason J. Corso | This paper addresses the problem of learning to suggest the single best frame across the video for user annotation; this is, in fact, never the first frame of the video. |
903 | Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-High Resolution Images | Wuyang Chen, Ziyu Jiang, Zhangyang Wang, Kexin Cui, Xiaoning Qian | We propose collaborative Global-Local Networks (GLNet) to effectively preserve both global and local information in a highly memory-efficient manner. |
904 | Efficient Parameter-Free Clustering Using First Neighbor Relations | Saquib Sarfraz, Vivek Sharma, Rainer Stiefelhagen | We present a new clustering method in the form of a single clustering equation that is able to directly discover groupings in the data. |
905 | Learning Personalized Modular Network Guided by Structured Knowledge | Xiaodan Liang | In this paper, we treat the structured commonsense knowledge (e.g. concept hierarchy) as the guidance of customizing more powerful and explainable network structures for distinct inputs, leading to dynamic and individualized inference paths. |
906 | A Generative Appearance Model for End-To-End Video Object Segmentation | Joakim Johnander, Martin Danelljan, Emil Brissman, Fahad Shahbaz Khan, Michael Felsberg | To address these issues, we propose a network architecture that learns a powerful representation of the target and background appearance in a single forward pass. |
907 | A Flexible Convolutional Solver for Fast Style Transfers | Gilles Puy, Patrick Perez | We propose a new flexible deep convolutional neural network (convnet) to perform fast neural style transfers. |
908 | Cross Domain Model Compression by Structurally Weight Sharing | Shangqian Gao, Cheng Deng, Heng Huang | In this paper, thus, we propose a new robust cross domain model compression method. |
909 | TraVeLGAN: Image-To-Image Translation by Transformation Vector Learning | Matthew Amodio, Smita Krishnaswamy | For this purpose, we introduce a novel GAN based on preserving intra-domain vector transformations in a latent space learned by a siamese network. |
910 | Deep Robust Subjective Visual Property Prediction in Crowdsourcing | Qianqian Xu, Zhiyong Yang, Yangbangyan Jiang, Xiaochun Cao, Qingming Huang, Yuan Yao | In this paper, we construct a deep SVP prediction model which not only leads to better detection of annotation outliers but also enables learning with extremely sparse annotations. |
911 | Transferable AutoML by Model Sharing Over Grouped Datasets | Chao Xue, Junchi Yan, Rong Yan, Stephen M. Chu, Yonggang Hu, Yonghua Lin | This paper presents a transferable AutoML approach that leverages previously trained models to speed up the search process for new tasks and datasets. |
912 | Learning Not to Learn: Training Deep Neural Networks With Biased Data | Byungju Kim, Hyunwoo Kim, Kyungsu Kim, Sungjin Kim, Junmo Kim | We propose a novel regularization algorithm for training deep neural networks when the training data is severely biased. |
913 | IRLAS: Inverse Reinforcement Learning for Architecture Search | Minghao Guo, Zhao Zhong, Wei Wu, Dahua Lin, Junjie Yan | In this paper, we propose an inverse reinforcement learning method for architecture search (IRLAS), which trains an agent to learn to search for network structures that are topologically inspired by human-designed networks. |
914 | Learning for Single-Shot Confidence Calibration in Deep Neural Networks Through Stochastic Inferences | Seonguk Seo, Paul Hongsuck Seo, Bohyung Han | We propose a generic framework to calibrate accuracy and confidence of a prediction in deep neural networks through stochastic inferences. |
915 | Attention-Based Adaptive Selection of Operations for Image Restoration in the Presence of Unknown Combined Distortions | Masanori Suganuma, Xing Liu, Takayuki Okatani | For this purpose, we propose a simple yet effective layer architecture of neural networks. |
916 | Fully Learnable Group Convolution for Acceleration of Deep Neural Networks | Xijun Wang, Meina Kan, Shiguang Shan, Xilin Chen | To reduce the high computational and memory cost, in this work, we propose a fully learnable group convolution module (FLGC for short) which is quite efficient and can be embedded into any deep neural networks for acceleration. |
917 | EIGEN: Ecologically-Inspired GENetic Approach for Neural Network Structure Searching From Scratch | Jian Ren, Zhe Li, Jianchao Yang, Ning Xu, Tianbao Yang, David J. Foran | In this paper, we propose an Ecologically-Inspired GENetic (EIGEN) approach that uses the concepts of succession, extinction, mimicry, and gene duplication to search neural network structures from scratch, starting from a poorly initialized simple network with few constraints enforced during the evolution, as we assume no prior knowledge about the task domain. |
918 | Deep Incremental Hashing Network for Efficient Image Retrieval | Dayan Wu, Qi Dai, Jing Liu, Bo Li, Weiping Wang | In this paper, we propose a novel deep hashing framework, called Deep Incremental Hashing Network (DIHN), for learning hash codes in an incremental manner. |
919 | Robustness via Curvature Regularization, and Vice Versa | Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Jonathan Uesato, Pascal Frossard | In this paper, we investigate the effect of adversarial training on the geometry of the classification landscape and decision boundaries. |
920 | SparseFool: A Few Pixels Make a Big Difference | Apostolos Modas, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard | In this paper, we exploit the low mean curvature of the decision boundary, and propose SparseFool, a geometry inspired sparse attack that controls the sparsity of the perturbations. |
921 | Interpretable and Fine-Grained Visual Explanations for Convolutional Neural Networks | Jorg Wagner, Jan Mathias Kohler, Tobias Gindele, Leon Hetzel, Jakob Thaddaus Wiedemer, Sven Behnke | In this work, we propose a post-hoc, optimization-based visual explanation method, which highlights the evidence in the input image for a specific prediction. |
922 | Structured Pruning of Neural Networks With Budget-Aware Regularization | Carl Lemaire, Andrew Achkar, Pierre-Marc Jodoin | To overcome this, we introduce a budgeted regularized pruning framework for deep CNNs. |
923 | MBS: Macroblock Scaling for CNN Model Reduction | Yu-Hsun Lin, Chun-Nan Chou, Edward Y. Chang | In this paper we propose the macroblock scaling (MBS) algorithm, which can be applied to various CNN architectures to reduce their model size. |
924 | Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells | Vladimir Nekrasov, Hao Chen, Chunhua Shen, Ian Reid | In this work, we are particularly interested in searching for high-performance compact segmentation architectures, able to run in real-time using limited resources. |
925 | Generating 3D Adversarial Point Clouds | Chong Xiang, Charles R. Qi, Bo Li | In this work, we propose several novel algorithms to craft adversarial point clouds against PointNet, a widely used deep neural network for point cloud processing. |
926 | Partial Order Pruning: For Best Speed/Accuracy Trade-Off in Neural Architecture Search | Xin Li, Yiming Zhou, Zheng Pan, Jiashi Feng | In this work, we propose an algorithm that can offer better speed/accuracy trade-off of searched networks, which is termed “Partial Order Pruning”. |
927 | Memory in Memory: A Predictive Neural Network for Learning Higher-Order Non-Stationarity From Spatiotemporal Dynamics | Yunbo Wang, Jianjin Zhang, Hongyu Zhu, Mingsheng Long, Jianmin Wang, Philip S. Yu | We propose the Memory In Memory (MIM) networks and corresponding recurrent blocks for this purpose. |
928 | Variational Information Distillation for Knowledge Transfer | Sungsoo Ahn, Shell Xu Hu, Andreas Damianou, Neil D. Lawrence, Zhenwen Dai | We propose an information-theoretic framework for knowledge transfer which formulates knowledge transfer as maximizing the mutual information between the teacher and the student networks. |
929 | You Look Twice: GaterNet for Dynamic Filter Selection in CNNs | Zhourong Chen, Yang Li, Samy Bengio, Si Si | In this paper, we investigate input-dependent dynamic filter selection in deep convolutional neural networks (CNNs). |
930 | SpherePHD: Applying CNNs on a Spherical PolyHeDron Representation of 360° Images | Yeonkun Lee, Jaeseok Jeong, Jongseob Yun, Wonjune Cho, Kuk-Jin Yoon | This paper presents a novel method to resolve such problems of applying CNNs to omni-directional images. |
931 | ESPNetv2: A Light-Weight, Power Efficient, and General Purpose Convolutional Neural Network | Sachin Mehta, Mohammad Rastegari, Linda Shapiro, Hannaneh Hajishirzi | We introduce a light-weight, power efficient, and general purpose convolutional neural network, ESPNetv2, for modeling visual and sequential data. |
932 | Assisted Excitation of Activations: A Learning Technique to Improve Object Detectors | Mohammad Mahdi Derakhshani, Saeed Masoudnia, Amir Hossein Shaker, Omid Mersa, Mohammad Amin Sadeghi, Mohammad Rastegari, Babak N. Araabi | We present a simple yet effective learning technique that significantly improves mAP of YOLO object detectors without compromising their speed. |
933 | Exploiting Edge Features for Graph Neural Networks | Liyu Gong, Qiang Cheng | In this paper, we build a new framework for a family of new graph neural network models that can more fully exploit edge features, including those of undirected or multi-dimensional edges. |
934 | Propagation Mechanism for Deep and Wide Neural Networks | Dejiang Xu, Mong Li Lee, Wynne Hsu | In this paper, we propose a new propagation mechanism called channel-wise addition (cAdd) to deal with the vanishing gradients problem without sacrificing the complexity of the learned features. |
935 | Catastrophic Child’s Play: Easy to Perform, Hard to Defend Adversarial Attacks | Chih-Hui Ho, Brandon Leung, Erik Sandstrom, Yen Chang, Nuno Vasconcelos | A framework for the study of such attacks is proposed, using real world object manipulations. |
936 | Embedding Complementary Deep Networks for Image Classification | Qiuyu Chen, Wei Zhang, Jun Yu, Jianping Fan | In this paper, a deep embedding algorithm is developed to achieve higher accuracy rates on large-scale image classification. |
937 | Deep Multimodal Clustering for Unsupervised Audiovisual Learning | Di Hu, Feiping Nie, Xuelong Li | To address this problem, we propose to adequately excavate audio and visual components and perform elaborate correspondence learning among them. |
938 | Dense Classification and Implanting for Few-Shot Learning | Yann Lifchitz, Yannis Avrithis, Sylvaine Picard, Andrei Bursuc | We propose two simple and effective solutions: (i) dense classification over feature maps, which for the first time studies local activations in the domain of few-shot learning, and (ii) implanting, that is, attaching new neurons to a previously trained network to learn new, task-specific features. |
939 | Class-Balanced Loss Based on Effective Number of Samples | Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge Belongie | In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. |
940 | Discovering Visual Patterns in Art Collections With Spatially-Consistent Feature Learning | Xi Shen, Alexei A. Efros, Mathieu Aubry | Our goal in this paper is to discover near duplicate patterns in large collections of artworks. |
941 | Min-Max Statistical Alignment for Transfer Learning | Samitha Herath, Mehrtash Harandi, Basura Fernando, Richard Nock | We question the capability of this school of thought and propose to minimize the maximum disparity between domains. |
942 | Spatial-Aware Graph Relation Network for Large-Scale Object Detection | Hang Xu, Chenhan Jiang, Xiaodan Liang, Zhenguo Li | In this work, we introduce a Spatial-aware Graph Relation Network (SGRN) to adaptively discover and incorporate key semantic and spatial relationships for reasoning over each object. |
943 | Deformable ConvNets V2: More Deformable, Better Results | Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai | To address this problem, we present a reformulation of Deformable ConvNets that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training. |
944 | Interaction-And-Aggregation Network for Person Re-Identification | Ruibing Hou, Bingpeng Ma, Hong Chang, Xinqian Gu, Shiguang Shan, Xilin Chen | In this paper, we propose a novel network structure, Interaction-and-Aggregation (IA), to enhance the feature representation capability of CNNs. |
945 | Rare Event Detection Using Disentangled Representation Learning | Ryuhei Hamaguchi, Ken Sakurada, Ryosuke Nakamura | This paper presents a novel method for rare event detection from an image pair with class-imbalanced datasets. |
946 | Shape Robust Text Detection With Progressive Scale Expansion Network | Wenhai Wang, Enze Xie, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao | To address these two challenges, in this paper, we propose a novel Progressive Scale Expansion Network (PSENet), which can precisely detect text instances with arbitrary shapes. |
947 | Dual Encoding for Zero-Example Video Retrieval | Jianfeng Dong, Xirong Li, Chaoxi Xu, Shouling Ji, Yuan He, Gang Yang, Xun Wang | In contrast, this paper takes a concept-free approach, proposing a dual deep encoding network that encodes videos and queries into powerful dense representations of their own. |
948 | MaxpoolNMS: Getting Rid of NMS Bottlenecks in Two-Stage Object Detectors | Lile Cai, Bin Zhao, Zhe Wang, Jie Lin, Chuan Sheng Foo, Mohamed Sabry Aly, Vijay Chandrasekhar | In this paper, we introduce MaxpoolNMS, a parallelizable alternative to the NMS algorithm, which is based on max-pooling classification score maps. |
949 | Character Region Awareness for Text Detection | Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee | In this paper, we propose a new scene text detection method to effectively detect text area by exploring each character and affinity between characters. |
950 | Effective Aesthetics Prediction With Multi-Level Spatially Pooled Features | Vlad Hosu, Bastian Goldlucke, Dietmar Saupe | We propose an effective deep learning approach to aesthetics quality assessment that relies on a new type of pre-trained features, and apply it to the AVA data set, currently the largest aesthetics database. |
951 | Attentive Region Embedding Network for Zero-Shot Learning | Guo-Sen Xie, Li Liu, Xiaobo Jin, Fan Zhu, Zheng Zhang, Jie Qin, Yazhou Yao, Ling Shao | In this paper, to discover (semantic) regions, we propose the attentive region embedding network (AREN), which is tailored to advance the ZSL task. |
952 | Explicit Spatial Encoding for Deep Local Descriptors | Arun Mukundan, Giorgos Tolias, Ondrej Chum | We propose a kernelized deep local-patch descriptor based on efficient match kernels of neural network activations. |
953 | Panoptic Segmentation | Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollar | The aim of our work is to revive the interest of the community in a more unified view of image segmentation. |
954 | You Reap What You Sow: Using Videos to Generate High Precision Object Proposals for Weakly-Supervised Object Detection | Krishna Kumar Singh, Yong Jae Lee | We propose a novel way of using videos to obtain high precision object proposals for weakly-supervised object detection. |
955 | Explore-Exploit Graph Traversal for Image Retrieval | Cheng Chang, Guangwei Yu, Chundi Liu, Maksims Volkovs | We propose a novel graph-based approach for image retrieval. |
956 | Dissimilarity Coefficient Based Weakly Supervised Object Detection | Aditya Arun, C.V. Jawahar, M. Pawan Kumar | We consider the problem of weakly supervised object detection, where the training samples are annotated using only image-level labels that indicate the presence or absence of an object category. |
957 | Kernel Transformer Networks for Compact Spherical Convolution | Yu-Chuan Su, Kristen Grauman | We present the Kernel Transformer Network (KTN) to efficiently transfer convolution kernels from perspective images to the equirectangular projection of 360° images. |
958 | Object Detection With Location-Aware Deformable Convolution and Backward Attention Filtering | Chen Zhang, Joohee Kim | In this paper, we propose a location-aware deformable convolution and a backward attention filtering to improve the detection performance. |
959 | Variational Prototyping-Encoder: One-Shot Learning With Prototypical Images | Junsik Kim, Tae-Hyun Oh, Seokju Lee, Fei Pan, In So Kweon | We propose a new approach called variational prototyping-encoder (VPE) that learns the image translation task from real-world input images to their corresponding prototypical images as a meta-task. |
960 | Unsupervised Domain Adaptation Using Feature-Whitening and Consensus Loss | Subhankar Roy, Aliaksandr Siarohin, Enver Sangineto, Samuel Rota Bulo, Nicu Sebe, Elisa Ricci | In this work we introduce a novel deep learning framework which unifies different paradigms in unsupervised domain adaptation. |
961 | FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation | Paul Voigtlaender, Yuning Chai, Florian Schroff, Hartwig Adam, Bastian Leibe, Liang-Chieh Chen | In this work, we propose FEELVOS as a simple and fast method which does not rely on fine-tuning. |
962 | PartNet: A Recursive Part Decomposition Network for Fine-Grained and Hierarchical Shape Segmentation | Fenggen Yu, Kun Liu, Yan Zhang, Chenyang Zhu, Kai Xu | PartNet achieves state-of-the-art performance, for both fine-grained and semantic segmentation, on the public benchmark and on a new benchmark of fine-grained segmentation proposed in this work. |
963 | Learning Multi-Class Segmentations From Single-Class Datasets | Konstantin Dmitriev, Arie E. Kaufman | While existing segmentation research in such domains use private multi-class datasets or focus on single-class segmentations, we propose a unified highly efficient framework for robust simultaneous learning of multi-class segmentations by combining single-class datasets and utilizing a novel way of conditioning a convolutional network for the purpose of segmentation. |
964 | Convolutional Recurrent Network for Road Boundary Extraction | Justin Liang, Namdar Homayounfar, Wei-Chiu Ma, Shenlong Wang, Raquel Urtasun | In this paper, we tackle the problem of drivable road boundary extraction from LiDAR and camera imagery. |
965 | DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation | Hanchao Li, Pengfei Xiong, Haoqiang Fan, Jian Sun | This paper introduces an extremely efficient CNN architecture named DFANet for semantic segmentation under resource constraints. |
966 | A Cross-Season Correspondence Dataset for Robust Semantic Segmentation | Mans Larsson, Erik Stenborg, Lars Hammarstrand, Marc Pollefeys, Torsten Sattler, Fredrik Kahl | In this paper, we present a method to utilize 2D-2D point matches between images taken during different image conditions to train a convolutional neural network for semantic segmentation. |
967 | ManTra-Net: Manipulation Tracing Network for Detection and Localization of Image Forgeries With Anomalous Features | Yue Wu, Wael AbdAlmageed, Premkumar Natarajan | To fight against real-life image forgery, which commonly involves different types and combined manipulations, we propose a unified deep neural architecture called ManTra-Net. |
968 | On Zero-Shot Recognition of Generic Objects | Tristan Hascoet, Yasuo Ariki, Tetsuya Takiguchi | In this paper, we argue that the main reason behind this apparent lack of progress is the poor quality of this benchmark. |
969 | Explicit Bias Discovery in Visual Question Answering Models | Varun Manjunatha, Nirat Saini, Larry S. Davis | It is of interest to the community to explicitly discover such biases, both for understanding the behavior of such models, and towards debugging them. Our work addresses this problem. |
970 | REPAIR: Removing Representation Bias by Dataset Resampling | Yi Li, Nuno Vasconcelos | The tools used for characterizing representation bias, and the proposed dataset REPAIR algorithm, are available at https://github.com/JerryYLi/Dataset-REPAIR/. |
971 | Label Efficient Semi-Supervised Learning via Graph Filtering | Qimai Li, Xiao-Ming Wu, Han Liu, Xiaotong Zhang, Zhichao Guan | In this paper, we address label efficient semi-supervised learning from a graph filtering perspective. |
972 | MVTec AD — A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection | Paul Bergmann, Michael Fauser, David Sattlegger, Carsten Steger | We introduce the MVTec Anomaly Detection (MVTec AD) dataset containing 5354 high-resolution color images of different object and texture categories. |
973 | ABC: A Big CAD Model Dataset for Geometric Deep Learning | Sebastian Koch, Albert Matveev, Zhongshi Jiang, Francis Williams, Alexey Artemov, Evgeny Burnaev, Marc Alexa, Denis Zorin, Daniele Panozzo | We introduce ABC-Dataset, a collection of one million Computer-Aided Design (CAD) models for research of geometric deep learning methods and applications. |
974 | Tightness-Aware Evaluation Protocol for Scene Text Detection | Yuliang Liu, Lianwen Jin, Zecheng Xie, Canjie Luo, Shuaitao Zhang, Lele Xie | Therefore, this paper proposes a novel evaluation protocol called the Tightness-aware Intersect-over-Union (TIoU) metric, which quantifies completeness of ground truth, compactness of detection, and tightness of matching degree. |
975 | PointConv: Deep Convolutional Networks on 3D Point Clouds | Wenxuan Wu, Zhongang Qi, Li Fuxin | In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. |
976 | Octree Guided CNN With Spherical Kernels for 3D Point Clouds | Huan Lei, Naveed Akhtar, Ajmal Mian | We propose an octree guided neural network architecture and spherical convolutional kernel for machine learning from arbitrary 3D point clouds. |
977 | VITAMIN-E: VIsual Tracking and MappINg With Extremely Dense Feature Points | Masashi Yokozuka, Shuji Oishi, Simon Thompson, Atsuhiko Banno | In this paper, we propose a novel indirect monocular simultaneous localization and mapping (SLAM) algorithm called “VITAMIN-E,” which is highly accurate and robust as a result of tracking extremely dense feature points. |
978 | Conditional Single-View Shape Generation for Multi-View Stereo Reconstruction | Yi Wei, Shaohui Liu, Wang Zhao, Jiwen Lu | In this paper, we present a new perspective towards image-based shape generation. |
979 | Learning to Adapt for Stereo | Alessio Tonioni, Oscar Rahnama, Thomas Joy, Luigi Di Stefano, Thalaiyasingam Ajanthan, Philip H.S. Torr | In this work, we introduce a “learning-to-adapt” framework that enables deep stereo methods to continuously adapt to new target domains in an unsupervised manner. |
980 | 3D Appearance Super-Resolution With Deep Learning | Yawei Li, Vagia Tsiminaki, Radu Timofte, Marc Pollefeys, Luc Van Gool | We tackle the problem of retrieving high-resolution (HR) texture maps of objects that are captured from multiple view points. |
981 | Radial Distortion Triangulation | Zuzana Kukelova, Viktor Larsson | This paper presents the first optimal, maximum-likelihood solution to the triangulation problem for radially distorted cameras. |
982 | Robust Point Cloud Based Reconstruction of Large-Scale Outdoor Scenes | Ziquan Lan, Zi Jian Yew, Gim Hee Lee | To alleviate this problem, we propose a probabilistic approach for robust back-end optimization in the presence of outliers. |
983 | Minimal Solvers for Mini-Loop Closures in 3D Multi-Scan Alignment | Pedro Miraldo, Surojit Saha, Srikumar Ramalingam | In this paper, we take a different approach and develop minimal solvers for jointly computing the initial poses of cameras in small loops such as 3-, 4-, and 5-cycles. |
984 | Volumetric Capture of Humans With a Single RGBD Camera via Semi-Parametric Learning | Rohit Pandey, Anastasia Tkach, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Ricardo Martin-Brualla, Andrea Tagliasacchi, George Papandreou, Philip Davidson, Cem Keskin, Shahram Izadi, Sean Fanello | Thus, in this work, we propose a method to synthesize free viewpoint renderings using a single RGBD camera. |
985 | Joint Face Detection and Facial Motion Retargeting for Multiple Faces | Bindita Chaudhuri, Noranart Vesdapunt, Baoyuan Wang | In this paper, we present a single end-to-end network to jointly predict the bounding box locations and 3DMM parameters for multiple faces. |
986 | Monocular Depth Estimation Using Relative Depth Maps | Jae-Han Lee, Chang-Su Kim | We propose a novel algorithm for monocular depth estimation using relative depth maps. |
987 | Unsupervised Primitive Discovery for Improved 3D Generative Modeling | Salman H. Khan, Yulan Guo, Munawar Hayat, Nick Barnes | Here, we propose a novel factorized generative model for 3D shape generation that sequentially transitions from coarse to fine scale shape generation. |
988 | Learning to Explore Intrinsic Saliency for Stereoscopic Video | Qiudan Zhang, Xu Wang, Shiqi Wang, Shikai Li, Sam Kwong, Jianmin Jiang | In this paper, we argue that the high-level features are crucial and resort to the deep learning framework to learn the saliency map of stereoscopic videos. |
989 | Spherical Regression: Learning Viewpoints, Surface Normals and 3D Rotations on N-Spheres | Shuai Liao, Efstratios Gavves, Cees G. M. Snoek | By introducing a spherical exponential mapping on n-spheres at the regression output, we obtain well-behaved gradients, leading to stable training. |
990 | Refine and Distill: Exploiting Cycle-Inconsistency and Knowledge Distillation for Unsupervised Monocular Depth Estimation | Andrea Pilzer, Stephane Lathuiliere, Nicu Sebe, Elisa Ricci | Following these works, we propose a novel self-supervised deep model for estimating depth maps. |
991 | Learning View Priors for Single-View 3D Reconstruction | Hiroharu Kato, Tatsuya Harada | To reconstruct shapes that look reasonable from any viewpoint, we propose to train a discriminator that learns prior knowledge regarding possible views. |
992 | Geometry-Aware Symmetric Domain Adaptation for Monocular Depth Estimation | Shanshan Zhao, Huan Fu, Mingming Gong, Dacheng Tao | Motivated by the observation, we propose a geometry-aware symmetric domain adaptation framework (GASDA) to explore the labels in the synthetic data and epipolar geometry in the real data jointly. |
993 | Learning Monocular Depth Estimation Infusing Traditional Stereo Knowledge | Fabio Tosi, Filippo Aleotti, Matteo Poggi, Stefano Mattoccia | To this aim we propose monoResMatch, a novel deep architecture designed to infer depth from a single input image by synthesizing features from a different point of view, horizontally aligned with the input image, performing stereo matching between the two cues. |
994 | SIGNet: Semantic Instance Aided Unsupervised 3D Geometry Perception | Yue Meng, Yongxi Lu, Aman Raj, Samuel Sunarjo, Rui Guo, Tara Javidi, Gaurav Bansal, Dinesh Bharadia | This paper introduces SIGNet, a novel framework that provides robust geometry perception without requiring geometrically informative labels. |
995 | 3D Guided Fine-Grained Face Manipulation | Zhenglin Geng, Chen Cao, Sergey Tulyakov | We present a method for fine-grained face manipulation. |
996 | Neuro-Inspired Eye Tracking With Eye Movement Dynamics | Kang Wang, Hui Su, Qiang Ji | To address this issue, we propose to leverage on eye movement dynamics inspired by neurological studies. |
997 | Facial Emotion Distribution Learning by Exploiting Low-Rank Label Correlations Locally | Xiuyi Jia, Xiang Zheng, Weiwei Li, Changqing Zhang, Zechao Li | Therefore, to depict facial expressions more accurately, this paper adopts a label distribution learning approach for emotion recognition that can address the ambiguity of “how to describe the expression” and proposes an emotion distribution learning method that exploits label correlations locally. |
998 | Unsupervised Face Normalization With Extreme Pose and Expression in the Wild | Yichen Qian, Weihong Deng, Jiani Hu | To this end, we propose a Face Normalization Model (FNM) to generate a frontal, neutral expression, photorealistic face image for face recognition. |
999 | Semantic Component Decomposition for Face Attribute Manipulation | Ying-Cong Chen, Xiaohui Shen, Zhe Lin, Xin Lu, I-Ming Pao, Jiaya Jia | In this paper, we address these issues by proposing a semantic component model. |
1000 | R3 Adversarial Network for Cross Model Face Recognition | Ken Chen, Yichao Wu, Haoyu Qin, Ding Liang, Xuebo Liu, Junjie Yan | In this paper, we raise a new problem, namely cross model face recognition (CMFR), which has considerable economic and social significance. |
1001 | Disentangling Latent Hands for Image Synthesis and Pose Estimation | Linlin Yang, Angela Yao | To better analyze these factors of variation, we propose the use of disentangled representations and a disentangled variational autoencoder (dVAE) that allows for specific sampling and inference of these factors. |
1002 | Generating Multiple Hypotheses for 3D Human Pose Estimation With Mixture Density Network | Chen Li, Gim Hee Lee | In this paper, we propose a novel approach to generate multiple feasible hypotheses of the 3D pose from 2D joints. |
1003 | CrossInfoNet: Multi-Task Information Sharing Based Hand Pose Estimation | Kuo Du, Xiangbo Lin, Yi Sun, Xiaohong Ma | Our main contributions lie in designing a new pose regression network architecture named CrossInfoNet. |
1004 | P2SGrad: Refined Gradients for Optimizing Deep Face Models | Xiao Zhang, Rui Zhao, Junjie Yan, Mengya Gao, Yu Qiao, Xiaogang Wang, Hongsheng Li | This paper addresses this challenge by directly designing the gradients for training in an adaptive manner. |
1005 | Action Recognition From Single Timestamp Supervision in Untrimmed Videos | Davide Moltisanti, Sanja Fidler, Dima Damen | We propose a method that is supervised by single timestamps located around each action instance, in untrimmed videos. |
1006 | Time-Conditioned Action Anticipation in One Shot | Qiuhong Ke, Mario Fritz, Bernt Schiele | In this paper, we propose a novel time-conditioned method for efficient and effective long-term action anticipation. |
1007 | Dance With Flow: Two-In-One Stream Action Detection | Jiaojiao Zhao, Cees G. M. Snoek | The goal of this paper is to detect the spatio-temporal extent of an action. |
1008 | Representation Flow for Action Recognition | AJ Piergiovanni, Michael S. Ryoo | In this paper, we propose a convolutional layer inspired by optical flow algorithms to learn motion representations. |
1009 | LSTA: Long Short-Term Attention for Egocentric Action Recognition | Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz | In this paper we propose LSTA as a mechanism to focus on features from spatial relevant parts while attention is being tracked smoothly across the video sequence. |
1010 | Learning Actor Relation Graphs for Group Activity Recognition | Jianchao Wu, Limin Wang, Li Wang, Jie Guo, Gangshan Wu | This paper aims at learning discriminative relation between actors efficiently using deep models. |
1011 | A Structured Model for Action Detection | Yubo Zhang, Pavel Tokmakov, Martial Hebert, Cordelia Schmid | To address this limitation, we propose to incorporate domain knowledge into the structure of the model, simplifying optimization. |
1012 | Out-Of-Distribution Detection for Generalized Zero-Shot Action Recognition | Devraj Mandal, Sanath Narayan, Sai Kumar Dwivedi, Vikram Gupta, Shuaib Ahmed, Fahad Shahbaz Khan, Ling Shao | In this paper, we set out to tackle this issue by arguing for a separate treatment of seen and unseen action categories in generalized zero-shot action recognition. |
1013 | Object Discovery in Videos as Foreground Motion Clustering | Christopher Xie, Yu Xiang, Zaid Harchaoui, Dieter Fox | We consider the problem of providing dense segmentation masks for object discovery in videos. |
1014 | Towards Natural and Accurate Future Motion Prediction of Humans and Animals | Zhenguang Liu, Shuang Wu, Shuyuan Jin, Qi Liu, Shijian Lu, Roger Zimmermann, Li Cheng | To address these problems, we propose to explicitly encode anatomical constraints by modeling their skeletons with a Lie algebra representation. |
1015 | Automatic Face Aging in Videos via Deep Reinforcement Learning | Chi Nhan Duong, Khoa Luu, Kha Gia Quach, Nghia Nguyen, Eric Patterson, Tien D. Bui, Ngan Le | This paper presents a novel approach for synthesizing automatically age-progressed facial images in video sequences using Deep Reinforcement Learning. |
1016 | Multi-Adversarial Discriminative Deep Domain Generalization for Face Presentation Attack Detection | Rui Shao, Xiangyuan Lan, Jiawei Li, Pong C. Yuen | We propose to learn a generalized feature space via a novel multi-adversarial discriminative deep domain generalization framework. |
1017 | A Content Transformation Block for Image Style Transfer | Dmytro Kotovenko, Artsiom Sanakoyeu, Pingchuan Ma, Sabine Lang, Bjorn Ommer | Therefore, we introduce a content transformation module between the encoder and decoder. |
1018 | BeautyGlow: On-Demand Makeup Transfer Framework With Reversible Generative Network | Hung-Jen Chen, Ka-Ming Hui, Szu-Yu Wang, Li-Wu Tsao, Hong-Han Shuai, Wen-Huang Cheng | To facilitate on-demand makeup transfer, in this work, we propose BeautyGlow, which decomposes the latent vectors of face images derived from the Glow model into makeup and non-makeup latent vectors. |
1019 | Style Transfer by Relaxed Optimal Transport and Self-Similarity | Nicholas Kolkin, Jason Salavon, Gregory Shakhnarovich | We propose Style Transfer by Relaxed Optimal Transport and Self-Similarity (STROTSS), a new optimization-based style transfer algorithm. |
1020 | Inserting Videos Into Videos | Donghoon Lee, Tomas Pfister, Ming-Hsuan Yang | In this paper, we introduce a new problem of manipulating a given video by inserting other videos into it. |
1021 | Learning Image and Video Compression Through Spatial-Temporal Energy Compaction | Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto | Our basic idea is to realize spatial-temporal energy compaction in learning image and video compression. |
1022 | Event-Based High Dynamic Range Image and Very High Frame Rate Video Generation Using Conditional Generative Adversarial Networks | Lin Wang, S. Mohammad Mostafavi I., Yo-Sung Ho, Kuk-Jin Yoon | In this paper, we unlock the potential of event camera-based conditional generative adversarial networks to create images/videos from an adjustable portion of the event data stream. |
1023 | Enhancing TripleGAN for Semi-Supervised Conditional Instance Synthesis and Classification | Si Wu, Guangchang Deng, Jichang Li, Rui Li, Zhiwen Yu, Hau-San Wong | To improve both instance synthesis and classification in this setting, we propose an enhanced TripleGAN (EnhancedTGAN) model in this work. |
1024 | Capture, Learning, and Synthesis of 3D Speaking Styles | Daniel Cudeiro, Timo Bolkart, Cassidy Laidlaw, Anurag Ranjan, Michael J. Black | To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. |
1025 | Nesti-Net: Normal Estimation for Unstructured 3D Point Clouds Using Convolutional Neural Networks | Yizhak Ben-Shabat, Michael Lindenbaum, Anath Fischer | In this paper, we propose a normal estimation method for unstructured 3D point clouds. |
1026 | Ray-Space Projection Model for Light Field Camera | Qi Zhang, Jinbo Ling, Qing Wang, Jingyi Yu | In this paper, we propose a novel ray-space projection model to transform sets of rays captured by multiple light field cameras in terms of Plucker coordinates. |
1027 | Deep Geometric Prior for Surface Reconstruction | Francis Williams, Teseo Schneider, Claudio Silva, Denis Zorin, Joan Bruna, Daniele Panozzo | We propose the use of a deep neural network as a geometric prior for surface reconstruction. |
1028 | Analysis of Feature Visibility in Non-Line-Of-Sight Measurements | Xiaochun Liu, Sebastian Bauer, Andreas Velten | We formulate an equation describing a general Non-line-of-sight (NLOS) imaging measurement and analyze the properties of the measurement in the Fourier domain regarding the spatial frequencies of the scene it encodes. |
1029 | Hyperspectral Imaging With Random Printed Mask | Yuanyuan Zhao, Hui Guo, Zhan Ma, Xun Cao, Tao Yue, Xuemei Hu | In this paper, based on the simple but not widely noticed phenomenon that a color printer can print color masks with a large number of independent spectral transmission responses, we propose a simple and low-budget scheme to capture hyperspectral images with a random mask printed by a consumer-level color printer. |
1030 | All-Weather Deep Outdoor Lighting Estimation | Jinsong Zhang, Kalyan Sunkavalli, Yannick Hold-Geoffroy, Sunil Hadap, Jonathan Eisenman, Jean-Francois Lalonde | We present a neural network that predicts HDR outdoor illumination from a single LDR image. |
1031 | A Variational EM Framework With Adaptive Edge Selection for Blind Motion Deblurring | Liuge Yang, Hui Ji | This paper presents an interpretation of edge selection/reweighting in terms of variational Bayes inference and develops a novel variational expectation maximization (VEM) algorithm with built-in adaptive edge selection for blind deblurring. |
1032 | Viewport Proposal CNN for 360° Video Quality Assessment | Chen Li, Mai Xu, Lai Jiang, Shanyi Zhang, Xiaoming Tao | Thus, this paper proposes a viewport-based convolutional neural network (V-CNN) approach for VQA on 360° video, considering both auxiliary tasks of viewport proposal and viewport saliency prediction. |
1033 | Beyond Gradient Descent for Regularized Segmentation Losses | Dmitrii Marin, Meng Tang, Ismail Ben Ayed, Yuri Boykov | Our work suggests that network design/training should pay more attention to optimization methods. |
1034 | MAGSAC: Marginalizing Sample Consensus | Daniel Barath, Jiri Matas, Jana Noskova | A method called sigma-consensus is proposed to eliminate the need for a user-defined inlier-outlier threshold in RANSAC. |
1035 | Understanding and Visualizing Deep Visual Saliency Models | Sen He, Hamed R. Tavakoli, Ali Borji, Yang Mi, Nicolas Pugeault | This article attempts to answer these questions by analyzing the representations learned by individual neurons located at the intermediate layers of deep saliency models. |
1036 | Divergence Prior and Vessel-Tree Reconstruction | Zhongwen Zhang, Dmitrii Marin, Egor Chesakov, Marc Moreno Maza, Maria Drangova, Yuri Boykov | We propose a new geometric regularization principle for reconstructing vector fields based on prior knowledge about their divergence. |
1037 | Unsupervised Domain-Specific Deblurring via Disentangled Representations | Boyu Lu, Jun-Cheng Chen, Rama Chellappa | In this paper, we present an unsupervised method for domain-specific, single-image deblurring based on disentangled representations. |
1038 | Douglas-Rachford Networks: Learning Both the Image Prior and Data Fidelity Terms for Blind Image Deconvolution | Raied Aljadaany, Dipan K. Pal, Marios Savvides | In this paper, we present a method called Dr-Net, which does not require any such estimate and is further able to invert the effects of the blurring in blind image recovery tasks. |
1039 | Speed Invariant Time Surface for Learning to Detect Corner Points With Event-Based Cameras | Jacques Manderscheid, Amos Sironi, Nicolas Bourdis, Davide Migliore, Vincent Lepetit | We propose a learning approach to corner detection for event-based cameras that is stable even under fast and abrupt motions. Moreover, we introduce a high-resolution dataset suitable for quantitative evaluation and comparison of corner detection methods for event-based cameras. |
1040 | Training Deep Learning Based Image Denoisers From Undersampled Measurements Without Ground Truth and Without Image Prior | Magauiya Zhussip, Shakarim Soltanayev, Se Young Chun | To resolve this dilemma, we propose novel methods based on two well-grounded theories: denoiser-approximate message passing (D-AMP) and Stein’s unbiased risk estimator (SURE). |
1041 | A Variational Pan-Sharpening With Local Gradient Constraints | Xueyang Fu, Zihuang Lin, Yue Huang, Xinghao Ding | In this paper, a new variational model based on a local gradient constraint for pan-sharpening is proposed. |
1042 | F-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning | Yongqin Xian, Saurabh Sharma, Bernt Schiele, Zeynep Akata | In this paper, we tackle any-shot learning problems i.e. zero-shot and few-shot, in a unified feature generating framework that operates in both inductive and transductive learning settings. |
1043 | Sliced Wasserstein Discrepancy for Unsupervised Domain Adaptation | Chen-Yu Lee, Tanmay Batra, Mohammad Haris Baig, Daniel Ulbricht | In this work, we connect two distinct concepts for unsupervised domain adaptation: feature distribution alignment between domains by utilizing the task-specific decision boundary and the Wasserstein metric. |
1044 | Graph Attention Convolution for Point Cloud Semantic Segmentation | Lei Wang, Yuchun Huang, Yaolin Hou, Shenman Zhang, Jie Shan | This paper proposes a novel graph attention convolution (GAC), whose kernels can be dynamically carved into specific shapes to adapt to the structure of an object. |
1045 | Normalized Diversification | Shaohui Liu, Xiao Zhang, Jianqiao Wangni, Jianbo Shi | We introduce the concept of normalized diversity, which forces the model to preserve the normalized pairwise distance between the sparse samples from a latent parametric distribution and their corresponding high-dimensional outputs. We demonstrate that by combining the normalized diversity loss and the adversarial loss, we generate diverse data without suffering from mode collapse. |
1046 | Learning to Localize Through Compressed Binary Maps | Xinkai Wei, Ioan Andrei Barsan, Shenlong Wang, Julieta Martinez, Raquel Urtasun | In this paper we propose to learn to compress the map representation such that it is optimal for the localization task. |
1047 | A Parametric Top-View Representation of Complex Road Scenes | Ziyan Wang, Buyu Liu, Samuel Schulter, Manmohan Chandraker | In this paper, we address the problem of inferring the layout of complex road scenes given a single camera as input. |
1048 | Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction | Dejing Xu, Jun Xiao, Zhou Zhao, Jian Shao, Di Xie, Yueting Zhuang | We propose a self-supervised spatiotemporal learning technique which leverages the chronological order of videos. |
1049 | Superquadrics Revisited: Learning 3D Shape Parsing Beyond Cuboids | Despoina Paschalidou, Ali Osman Ulusoy, Andreas Geiger | This paper presents a learning-based solution to this problem which goes beyond the traditional 3D cuboid representation by exploiting superquadrics as atomic elements. |
1050 | Unsupervised Disentangling of Appearance and Geometry by Deformable Generator Network | Xianglei Xing, Tian Han, Ruiqi Gao, Song-Chun Zhu, Ying Nian Wu | We present a deformable generator model to disentangle the appearance and geometric information in a purely unsupervised manner. |
1051 | Self-Supervised Representation Learning by Rotation Feature Decoupling | Zeyu Feng, Chang Xu, Dacheng Tao | We introduce a self-supervised learning method that focuses on beneficial properties of representation and their abilities in generalizing to real-world tasks. |
1052 | Weakly Supervised Deep Image Hashing Through Tag Embeddings | Vijetha Gattupalli, Yaoxin Zhuo, Baoxin Li | Motivated by this scenario, we formulate the problem of semantic image hashing as a weakly-supervised learning problem. |
1053 | Improved Road Connectivity by Joint Learning of Orientation and Segmentation | Anil Batra, Suriya Singh, Guan Pang, Saikat Basu, C.V. Jawahar, Manohar Paluri | In this paper, we propose a connectivity task called Orientation Learning, motivated by the human behavior of annotating roads by tracing them at a specific orientation. |
1054 | Deep Supervised Cross-Modal Retrieval | Liangli Zhen, Peng Hu, Xu Wang, Dezhong Peng | In this paper, we present a novel cross-modal retrieval method, called Deep Supervised Cross-modal Retrieval (DSCMR). |
1055 | A Theoretically Sound Upper Bound on the Triplet Loss for Improving the Efficiency of Deep Distance Metric Learning | Thanh-Toan Do, Toan Tran, Ian Reid, Vijay Kumar, Tuan Hoang, Gustavo Carneiro | We propose a method that substantially improves the efficiency of deep distance metric learning based on the optimization of the triplet loss function. |
1056 | Data Representation and Learning With Graph Diffusion-Embedding Networks | Bo Jiang, Doudou Lin, Jin Tang, Bin Luo | In this paper, we present Graph Diffusion-Embedding networks (GDENs), a new model for graph-structured data representation and learning. |
1057 | Video Relationship Reasoning Using Gated Spatio-Temporal Energy Graph | Yao-Hung Hubert Tsai, Santosh Divvala, Louis-Philippe Morency, Ruslan Salakhutdinov, Ali Farhadi | In this paper, we construct a Conditional Random Field on a fully-connected spatio-temporal graph that exploits the statistical dependency between relational entities spatially and temporally. |
1058 | Image-Question-Answer Synergistic Network for Visual Dialog | Dalu Guo, Chang Xu, Dacheng Tao | In this paper, we devise a novel image-question-answer synergistic network to value the role of the answer for precise visual dialog. |
1059 | Not All Frames Are Equal: Weakly-Supervised Video Grounding With Contextual Similarity and Visual Clustering Losses | Jing Shi, Jia Xu, Boqing Gong, Chenliang Xu | In this work, we address these issues by extending frame-level MIL with a false positive frame-bag constraint and modeling the visual feature consistency in the video. |
1060 | Inverse Cooking: Recipe Generation From Food Images | Amaia Salvador, Michal Drozdzal, Xavier Giro-i-Nieto, Adriana Romero | Therefore, in this paper we introduce an inverse cooking system that recreates cooking recipes given food images. |
1061 | Adversarial Semantic Alignment for Improved Image Captions | Pierre Dognin, Igor Melnyk, Youssef Mroueh, Jerret Ross, Tom Sercu | In this paper, we study image captioning as a conditional GAN training, proposing both a context-aware LSTM captioner and co-attentive discriminator, which enforces semantic alignment between images and captions. |
1062 | Answer Them All! Toward Universal Visual Question Answering Models | Robik Shrestha, Kushal Kafle, Christopher Kanan | To address this problem, we propose a new VQA algorithm that rivals or exceeds the state-of-the-art for both domains. |
1063 | Unsupervised Multi-Modal Neural Machine Translation | Yuanhang Su, Kai Fan, Nguyen Bach, C.-C. Jay Kuo, Fei Huang | We propose an unsupervised multi-modal machine translation (UMNMT) framework based on the language translation cycle consistency loss conditional on the image, targeting to learn the bidirectional multi-modal translation simultaneously. |
1064 | Multi-Task Learning of Hierarchical Vision-Language Representation | Duy-Kien Nguyen, Takayuki Okatani | We propose a multi-task learning approach that enables to learn vision-language representation that is shared by many tasks from their diverse datasets. |
1065 | Cross-Modal Self-Attention Network for Referring Image Segmentation | Linwei Ye, Mrigank Rochan, Zhi Liu, Yang Wang | In this paper, we propose a cross-modal self-attention (CMSA) module that effectively captures the long-range dependencies between linguistic and visual features. |
1066 | DuDoNet: Dual Domain Network for CT Metal Artifact Reduction | Wei-An Lin, Haofu Liao, Cheng Peng, Xiaohang Sun, Jingdan Zhang, Jiebo Luo, Rama Chellappa, Shaohua Kevin Zhou | To overcome these difficulties, we propose an end-to-end trainable Dual Domain Network (DuDoNet) to simultaneously restore sinogram consistency and enhance CT images. |
1067 | Fast Spatio-Temporal Residual Network for Video Super-Resolution | Sheng Li, Fengxiang He, Bo Du, Lefei Zhang, Yonghao Xu, Dacheng Tao | In this paper, we present a novel fast spatio-temporal residual network (FSTRN) to adopt 3D convolutions for the video SR task in order to enhance the performance while maintaining a low computational load. |
1068 | Complete the Look: Scene-Based Complementary Product Recommendation | Wang-Cheng Kang, Eric Kim, Jure Leskovec, Charles Rosenberg, Julian McAuley | In this work, we propose a new task called ‘Complete the Look’, which seeks to recommend visually compatible products based on scene images. |
1069 | Selective Sensor Fusion for Neural Visual-Inertial Odometry | Changhao Chen, Stefano Rosa, Yishu Miao, Chris Xiaoxuan Lu, Wei Wu, Andrew Markham, Niki Trigoni | We propose a novel end-to-end selective sensor fusion framework for monocular VIO, which fuses monocular images and inertial measurements in order to estimate the trajectory whilst improving robustness to real-life issues, such as missing and corrupted data or bad sensor synchronization. |
1070 | Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes | Chengquan Zhang, Borong Liang, Zuming Huang, Mengyi En, Junyu Han, Errui Ding, Xinghao Ding | To address these two problems, we present a novel text detector, namely LOMO, which localizes the text progressively multiple times (or, in other words, LOoks More than Once). |
1071 | Learning Binary Code for Personalized Fashion Recommendation | Zhi Lu, Yang Hu, Yunchao Jiang, Yan Chen, Bing Zeng | In this paper, we propose to learn binary code for efficient personalized fashion outfits recommendation. We collect outfit data together with user label information from a fashion-focused social website for the personalized recommendation task. |
1072 | Attention Based Glaucoma Detection: A Large-Scale Database and CNN Model | Liu Li, Mai Xu, Xiaofei Wang, Lai Jiang, Hanruo Liu | This paper proposes an attention-based CNN for glaucoma detection (AG-CNN). |
1073 | Privacy Protection in Street-View Panoramas Using Depth and Multi-View Imagery | Ries Uittenbogaard, Clint Sebastian, Julien Vijverberg, Bas Boom, Dariu M. Gavrila, Peter H.N. de With | In this paper, we propose a framework that is an alternative to blurring, which automatically removes and inpaints moving objects (e.g. pedestrians, vehicles) in street-view imagery. |
1074 | Grounding Human-To-Vehicle Advice for Self-Driving Vehicles | Jinkyu Kim, Teruhisa Misu, Yi-Ting Chen, Ashish Tawari, John Canny | Here, we propose to address this issue by augmenting training data with natural language advice from a human. |
1075 | Multi-Step Prediction of Occupancy Grid Maps With Recurrent Neural Networks | Nima Mohajerin, Mohsen Rohani | We investigate the multi-step prediction of the drivable space, represented by Occupancy Grid Maps (OGMs), for autonomous vehicles. |
1076 | Connecting Touch and Vision via Cross-Modal Prediction | Yunzhu Li, Jun-Yan Zhu, Russ Tedrake, Antonio Torralba | In this work, we investigate the cross-modal connection between vision and touch. To accomplish our goals, we first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. |
1077 | X2CT-GAN: Reconstructing CT From Biplanar X-Rays With Generative Adversarial Networks | Xingde Ying, Heng Guo, Kai Ma, Jian Wu, Zhengxin Weng, Yefeng Zheng | In this work, we propose to reconstruct CT from two orthogonal X-rays using the generative adversarial network (GAN) framework. |
1078 | Practical Full Resolution Learned Lossless Image Compression | Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, Luc Van Gool | We propose the first practical learned lossless image compression system, L3C, and show that it outperforms the popular engineered codecs, PNG, WebP and JPEG 2000. |
1079 | Image-To-Image Translation via Group-Wise Deep Whitening-And-Coloring Transformation | Wonwoong Cho, Sungha Choi, David Keetae Park, Inkyu Shin, Jaegul Choo | In response, this paper proposes an end-to-end approach tailored for image translation that efficiently approximates this transformation with our novel regularization methods. |
1080 | Max-Sliced Wasserstein Distance and Its Use for GANs | Ishan Deshpande, Yuan-Ting Hu, Ruoyu Sun, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Zhizhen Zhao, David Forsyth, Alexander G. Schwing | Max-Sliced Wasserstein Distance and Its Use for GANs |
1081 | Meta-Learning With Differentiable Convex Optimization | Kwonjoon Lee, Subhransu Maji, Avinash Ravichandran, Stefano Soatto | Our objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories. |
1082 | RePr: Improved Training of Convolutional Filters | Aaditya Prakash, James Storer, Dinei Florencio, Cha Zhang | Innovations in network architecture such as skip/dense connections and inception units have mitigated this problem to some extent, but these improvements come with increased computation and memory requirements at run-time. We attempt to address this problem from another angle – not by changing the network structure but by altering the training method. |
1083 | Tangent-Normal Adversarial Regularization for Semi-Supervised Learning | Bing Yu, Jingfeng Wu, Jinwen Ma, Zhanxing Zhu | In this work, we propose tangent-normal adversarial regularization (TNAR) as an extension of VAT by taking the data manifold into consideration. |
1084 | Auto-Encoding Scene Graphs for Image Captioning | Xu Yang, Kaihua Tang, Hanwang Zhang, Jianfei Cai | We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions. |
1085 | Fast, Diverse and Accurate Image Captioning Guided by Part-Of-Speech | Aditya Deshpande, Jyoti Aneja, Liwei Wang, Alexander G. Schwing, David Forsyth | In this paper, we first predict a meaningful summary of the image, then generate the caption based on that summary. |
1086 | Attention Branch Network: Learning of Attention Mechanism for Visual Explanation | Hiroshi Fukui, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi | In this paper, we focus on the attention map for visual explanation, which represents a high response value as the attention location in image recognition. |
1087 | Cascaded Projection: End-To-End Network Compression and Acceleration | Breton Minnehan, Andreas Savakis | We propose a data-driven approach for deep convolutional neural network compression that achieves high accuracy with high throughput and low memory requirements. |
1088 | DeepCaps: Going Deeper With Capsule Networks | Jathushan Rajasegaran, Vinoj Jayasundara, Sandaru Jayasekara, Hirunima Jayasekara, Suranga Seneviratne, Ranga Rodrigo | Drawing intuition from the success achieved by Convolutional Neural Networks (CNNs) by going deeper, we introduce DeepCaps, a deep capsule network architecture which uses a novel 3D convolution based dynamic routing algorithm. |
1089 | FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search | Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer | To address these, we propose a differentiable neural architecture search (DNAS) framework that uses gradient-based methods to optimize ConvNet architectures, avoiding enumerating and training individual architectures separately as in previous methods. |
1090 | APDrawingGAN: Generating Artistic Portrait Drawings From Face Photos With Hierarchical GANs | Ran Yi, Yong-Jin Liu, Yu-Kun Lai, Paul L. Rosin | To address these challenges, we propose APDrawingGAN, a novel GAN based architecture that builds upon hierarchical generators and discriminators combining both a global network (for images as a whole) and local networks (for individual facial regions). To train APDrawingGAN, we construct an artistic drawing dataset containing high-resolution portrait photos and corresponding professional artistic drawings. |
1091 | Constrained Generative Adversarial Networks for Interactive Image Generation | Eric Heim | In this work we develop a novel GAN framework that allows humans to be “in-the-loop” of the image generation process. |
1092 | WarpGAN: Automatic Caricature Generation | Yichun Shi, Debayan Deb, Anil K. Jain | We propose, WarpGAN, a fully automatic network that can generate caricatures given an input face photo. |
1093 | Explainability Methods for Graph Convolutional Neural Networks | Phillip E. Pope, Soheil Kolouri, Mohammad Rostami, Charles E. Martin, Heiko Hoffmann | In this paper, we introduce explainability methods for GCNNs. |
1094 | A Generative Adversarial Density Estimator | M. Ehsan Abbasnejad, Qinfeng Shi, Anton van den Hengel, Lingqiao Liu | We propose a Generative Adversarial Density Estimator, a density estimation approach that bridges the gap between the two. |
1095 | SoDeep: A Sorting Deep Net to Learn Ranking Loss Surrogates | Martin Engilberge, Louis Chevallier, Patrick Perez, Matthieu Cord | In the present work, we introduce a new method to learn approximations of such non-differentiable objective functions. |
1096 | High-Quality Face Capture Using Anatomical Muscles | Michael Bao, Matthew Cong, Stephane Grabli, Ronald Fedkiw | Thus, we propose modifying a recently developed rather expressive muscle-based system in order to make it fully-differentiable; in fact, our proposed modifications allow this physically robust and anatomically accurate muscle model to conveniently be driven by an underlying blendshape basis. |
1097 | FML: Face Model Learning From Videos | Ayush Tewari, Florian Bernard, Pablo Garrido, Gaurav Bharaj, Mohamed Elgharib, Hans-Peter Seidel, Patrick Perez, Michael Zollhofer, Christian Theobalt | In contrast, we propose multi-frame video-based self-supervised training of a deep network that (i) learns a face identity model both in shape and appearance while (ii) jointly learning to reconstruct 3D faces. |
1098 | AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations | Xiao Zhang, Rui Zhao, Yu Qiao, Xiaogang Wang, Hongsheng Li | In this paper, we investigate in depth the effects of two important hyperparameters of cosine-based softmax losses, the scale parameter and angular margin parameter, by analyzing how they modulate the predicted classification probability. |
1099 | 3D Hand Shape and Pose Estimation From a Single RGB Image | Liuhao Ge, Zhou Ren, Yuncheng Li, Zehao Xue, Yingying Wang, Jianfei Cai, Junsong Yuan | In contrast, we propose a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of hand surface that contains richer information of both 3D hand shape and pose. |
1100 | 3D Hand Shape and Pose From Images in the Wild | Adnane Boukhayma, Rodrigo de Bem, Philip H.S. Torr | We present in this work the first end-to-end deep learning based method that predicts both 3D hand shape and pose from RGB images in the wild. |
1101 | Self-Supervised 3D Hand Pose Estimation Through Training by Fitting | Chengde Wan, Thomas Probst, Luc Van Gool, Angela Yao | We present a self-supervision method for 3D hand pose estimation from depth maps. |
1102 | CrowdPose: Efficient Crowded Scenes Pose Estimation and a New Benchmark | Jiefeng Li, Can Wang, Hao Zhu, Yihuan Mao, Hao-Shu Fang, Cewu Lu | In this paper, we propose a novel and efficient method to tackle the problem of pose estimation in the crowd and a new dataset to better evaluate algorithms. |
1103 | Towards Social Artificial Intelligence: Nonverbal Social Signal Prediction in a Triadic Interaction | Hanbyul Joo, Tomas Simon, Mina Cikara, Yaser Sheikh | We present a new research task and a dataset to understand human social interactions via computational methods, to ultimately endow machines with the ability to encode and decode a broad channel of social signals humans use. We then present a new 3D motion capture dataset to explore this problem, where the broad spectrum of social signals (3D body, face, and hand motions) is captured in a triadic social interaction scenario. |
1104 | HoloPose: Holistic 3D Human Reconstruction In-The-Wild | Riza Alp Guler, Iasonas Kokkinos | We introduce HoloPose, a method for holistic monocular 3D human body reconstruction. |
1105 | Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation | Xipeng Chen, Kwan-Yee Lin, Wentao Liu, Chen Qian, Liang Lin | In this work, we propose a geometry-aware 3D representation for the human pose to address this limitation by using multiple views in a simple auto-encoder model at the training stage and only 2D keypoint information as supervision. |
1106 | In the Wild Human Pose Estimation Using Explicit 2D Features and Intermediate 3D Representations | Ikhsanul Habibie, Weipeng Xu, Dushyant Mehta, Gerard Pons-Moll, Christian Theobalt | We therefore propose a new deep learning based method for monocular 3D human pose estimation that shows high accuracy and generalizes better to in-the-wild scenes. |
1107 | Slim DensePose: Thrifty Learning From Sparse Annotations and Motion Cues | Natalia Neverova, James Thewlis, Riza Alp Guler, Iasonas Kokkinos, Andrea Vedaldi | In this work, we thus seek methods to significantly slim down the DensePose annotations, proposing more efficient data collection strategies. |
1108 | Self-Supervised Representation Learning From Videos for Facial Action Unit Detection | Yong Li, Jiabei Zeng, Shiguang Shan, Xilin Chen | In this paper, we aim to learn discriminative representation for facial action unit (AU) detection from large amount of videos without manual annotations. |
1109 | Combining 3D Morphable Models: A Large Scale Face-And-Head Model | Stylianos Ploumpis, Haoyang Wang, Nick Pears, William A. P. Smith, Stefanos Zafeiriou | In answering this question, we make two contributions. First, we propose two methods for solving this problem: i. use a regressor to complete missing parts of one model using the other, ii. use the Gaussian Process framework to blend covariance matrices from multiple models. Second, as an example application of our approach, we build a new head and face model that combines the variability and facial detail of the LSFM with the full head modelling of the LYHM. |
1110 | Boosting Local Shape Matching for Dense 3D Face Correspondence | Zhenfeng Fan, Xiyuan Hu, Chen Chen, Silong Peng | In this paper, we explicitly formulate the deformation as locally rigid motions guided by some seed points, and the formulated deformation satisfies coherent local motions everywhere on a face. |
1111 | Unsupervised Part-Based Disentangling of Object Shape and Appearance | Dominik Lorenz, Leonard Bereska, Timo Milbich, Bjorn Ommer | We present an unsupervised approach for disentangling appearance and shape by learning parts consistently over all instances of a category. |
1112 | Monocular Total Capture: Posing Face, Body, and Hands in the Wild | Donglai Xiang, Hanbyul Joo, Yaser Sheikh | We present the first method to capture the 3D total motion of a target person from a monocular view input. To train our network, we collect a new 3D human motion dataset capturing diverse total body motion of 40 subjects in a multiview system. |
1113 | Expressive Body Capture: 3D Hands, Face, and Body From a Single Image | Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, Michael J. Black | To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. |
1114 | Neural RGB→D Sensing: Depth and Uncertainty From a Video Camera | Chao Liu, Jinwei Gu, Kihwan Kim, Srinivasa G. Narasimhan, Jan Kautz | In this paper, we propose a deep learning (DL) method to estimate per-pixel depth and its uncertainty continuously from a monocular video stream, with the goal of effectively turning an RGB camera into an RGB-D camera. |
1115 | DAVANet: Stereo Deblurring With View Aggregation | Shangchen Zhou, Jiawei Zhang, Wangmeng Zuo, Haozhe Xie, Jinshan Pan, Jimmy S. Ren | By exploiting the two-view nature of stereo images, we propose a novel stereo image deblurring network with Depth Awareness and View Aggregation, named DAVANet. Moreover, we present a large-scale multi-scene dataset for stereo deblurring, containing 20,637 blurry-sharp stereo image pairs from 135 diverse sequences and their corresponding bidirectional disparities. |
1116 | DVC: An End-To-End Deep Video Compression Framework | Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, Zhiyong Gao | In this paper, taking advantage of both classical architecture in the conventional video compression method and the powerful non-linear representation ability of neural networks, we propose the first end-to-end video compression deep model that jointly optimizes all the components for video compression. |
1117 | SOSNet: Second Order Similarity Regularization for Local Descriptor Learning | Yurun Tian, Xin Yu, Bin Fan, Fuchao Wu, Huub Heijnen, Vassileios Balntas | In this work, we explore the potential of second order similarity (SOS) in the field of descriptor learning by building upon the intuition that a positive pair of matching points should exhibit similar distances with respect to other points in the embedding space. |
1118 | “Double-DIP”: Unsupervised Image Decomposition via Coupled Deep-Image-Priors | Yosef Gandelsman, Assaf Shocher, Michal Irani | In this paper we propose a unified framework for unsupervised layer decomposition of a single image, based on coupled “Deep-image-Prior” (DIP) networks. |
1119 | Unprocessing Images for Learned Raw Denoising | Tim Brooks, Ben Mildenhall, Tianfan Xue, Jiawen Chen, Dillon Sharlet, Jonathan T. Barron | To address this, we present a technique to “unprocess” images by inverting each step of an image processing pipeline, thereby allowing us to synthesize realistic raw sensor measurements from commonly available Internet photos. |
1120 | Residual Networks for Light Field Image Super-Resolution | Shuo Zhang, Youfang Lin, Hao Sheng | In this paper, a learning-based method using residual convolutional networks is proposed to reconstruct light fields with higher spatial resolution. |
1121 | Modulating Image Restoration With Continual Levels via Adaptive Feature Modification Layers | Jingwen He, Chao Dong, Yu Qiao | We take a step forward by proposing a unified CNN framework that adds only a few parameters over a single-level model yet can handle arbitrary restoration levels between a start and an end level. |
1122 | Second-Order Attention Network for Single Image Super-Resolution | Tao Dai, Jianrui Cai, Yongbing Zhang, Shu-Tao Xia, Lei Zhang | To address this issue, in this paper, we propose a second-order attention network (SAN) for more powerful feature expression and feature correlation learning. |
1123 | Devil Is in the Edges: Learning Semantic Boundaries From Noisy Annotations | David Acuna, Amlan Kar, Sanja Fidler | We propose a simple new layer and loss that can be used with existing learning-based boundary detectors. |
1124 | Path-Invariant Map Networks | Zaiwei Zhang, Zhenxiao Liang, Lemeng Wu, Xiaowei Zhou, Qixing Huang | In this paper, we study a natural self-supervision constraint for directed map networks called path-invariance, which enforces that composite maps along different paths between a fixed pair of source and target domains are identical. |
1125 | FilterReg: Robust and Efficient Probabilistic Point-Set Registration Using Gaussian Filter and Twist Parameterization | Wei Gao, Russ Tedrake | In this paper, we contribute a novel probabilistic registration method that achieves state-of-the-art robustness as well as substantially faster computational performance than modern ICP implementations. |
1126 | Probabilistic Permutation Synchronization Using the Riemannian Structure of the Birkhoff Polytope | Tolga Birdal, Umut Simsekli | We present an entirely new geometric and probabilistic approach to synchronization of correspondences across multiple sets of objects or images. |
1127 | Lifting Vectorial Variational Problems: A Natural Formulation Based on Geometric Measure Theory and Discrete Exterior Calculus | Thomas Mollenhoff, Daniel Cremers | We approach the relaxation and convexification of such vectorial variational problems via a lifting to the space of currents. |
1128 | A Sufficient Condition for Convergences of Adam and RMSProp | Fangyu Zou, Li Shen, Zequn Jie, Weizhong Zhang, Wei Liu | In contrast with existing approaches, we introduce an alternative easy-to-check sufficient condition, which merely depends on the parameters of the base learning rate and combinations of historical second-order moments, to guarantee the global convergence of generic Adam/RMSProp for solving large-scale non-convex stochastic optimization. |
1129 | Guaranteed Matrix Completion Under Multiple Linear Transformations | Chao Li, Wei He, Longhao Yuan, Zhun Sun, Qibin Zhao | To tackle this problem, we propose a more general framework for LRMC, in which the linear transformations of the data are taken into account. |
1130 | MAP Inference via Block-Coordinate Frank-Wolfe Algorithm | Paul Swoboda, Vladimir Kolmogorov | We present a new proximal bundle method for Maximum-A-Posteriori (MAP) inference in structured energy minimization problems. |
1131 | A Convex Relaxation for Multi-Graph Matching | Paul Swoboda, Dagmar Kainmüller, Ashkan Mokarian, Christian Theobalt, Florian Bernard | We present a convex relaxation for the multi-graph matching problem. |
1132 | Pixel-Adaptive Convolutional Neural Networks | Hang Su, Varun Jampani, Deqing Sun, Orazio Gallo, Erik Learned-Miller, Jan Kautz | We propose a pixel-adaptive convolution (PAC) operation, a simple yet effective modification of standard convolutions, in which the filter weights are multiplied with a spatially varying kernel that depends on learnable, local pixel features. |
1133 | Single-Frame Regularization for Temporally Stable CNNs | Gabriel Eilertsen, Rafal K. Mantiuk, Jonas Unger | We take a different approach to the problem, posing temporal stability as a regularization of the cost function. |
1134 | An End-To-End Network for Generating Social Relationship Graphs | Arushi Goel, Keng Teck Ma, Cheston Tan | We introduce a novel end-to-end-trainable neural network that is capable of generating a Social Relationship Graph – a structured, unified representation of social relationships and attributes – from a given input image. |
1135 | Meta-Learning Convolutional Neural Architectures for Multi-Target Concrete Defect Classification With the COncrete DEfect BRidge IMage Dataset | Martin Mundt, Sagnik Majumder, Sreenivas Murali, Panagiotis Panetsos, Visvanathan Ramesh | In this work we introduce the novel COncrete DEfect BRidge IMage dataset (CODEBRIM) for multi-target classification of five commonly appearing concrete defects. |
1136 | ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model | Haichuan Yang, Yuhao Zhu, Ji Liu | This paper proposes ECC, a framework that compresses DNNs to meet a given energy constraint while minimizing accuracy loss. |
1137 | SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization | Shijie Cao, Lingxiao Ma, Wencong Xiao, Chen Zhang, Yunxin Liu, Lintao Zhang, Lanshun Nie, Zhi Yang | In this paper we present a novel and general method to accelerate convolutional neural network (CNN) inference by taking advantage of feature map sparsity. |
1138 | Defending Against Adversarial Attacks by Randomized Diversification | Olga Taran, Shideh Rezaeifar, Taras Holotyak, Slava Voloshynovskiy | In this paper, we propose randomized diversification as a defense strategy. |
1139 | Rob-GAN: Generator, Discriminator, and Adversarial Attacker | Xuanqing Liu, Cho-Jui Hsieh | Combining these two insights, we develop a framework called Rob-GAN to jointly optimize generator and discriminator in the presence of adversarial attacks—the generator generates fake images to fool discriminator; the adversarial attacker perturbs real images to fool discriminator, and the discriminator wants to minimize loss under fake and adversarial images. |
1140 | Learning From Noisy Labels by Regularized Estimation of Annotator Confusion | Ryutaro Tanno, Ardavan Saeedi, Swami Sankaranarayanan, Daniel C. Alexander, Nathan Silberman | In this work, we present a method for simultaneously learning the individual annotator model and the underlying true label distribution, using only noisy observations. |
1141 | Task-Free Continual Learning | Rahaf Aljundi, Klaas Kelchtermans, Tinne Tuytelaars | Therefore we investigate how to transform continual learning to an online setup. |
1142 | Importance Estimation for Neural Network Pruning | Pavlo Molchanov, Arun Mallya, Stephen Tyree, Iuri Frosio, Jan Kautz | We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. |
1143 | Detecting Overfitting of Deep Generative Networks via Latent Recovery | Ryan Webster, Julien Rabin, Loic Simon, Frederic Jurie | We address this question by i) showing how simple losses are highly effective at reconstructing images for deep generators, and ii) analyzing the statistics of reconstruction errors for training versus validation images. |
1144 | Coloring With Limited Data: Few-Shot Colorization via Memory Augmented Networks | Seungjoo Yoo, Hyojin Bahng, Sunghyo Chung, Junsoo Lee, Jaehyuk Chang, Jaegul Choo | To tackle this issue, we present a novel memory-augmented colorization model MemoPainter that can produce high-quality colorization with limited data. |
1145 | Characterizing and Avoiding Negative Transfer | Zirui Wang, Zihang Dai, Barnabas Poczos, Jaime Carbonell | This paper proposes a formal definition of negative transfer and analyzes three important aspects thereof. |
1146 | Building Efficient Deep Neural Networks With Unitary Group Convolutions | Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang | We propose unitary group convolutions (UGConvs), a building block for CNNs which compose a group convolution with unitary transforms in feature space to learn a richer set of representations than group convolution alone. |
1147 | Semi-Supervised Learning With Graph Learning-Convolutional Networks | Bo Jiang, Ziyan Zhang, Doudou Lin, Jin Tang, Bin Luo | In this paper, we propose a novel Graph Learning-Convolutional Network (GLCN) for graph data representation and semi-supervised learning. |
1148 | Learning to Remember: A Synaptic Plasticity Driven Framework for Continual Learning | Oleksiy Ostapenko, Mihai Puscas, Tassilo Klein, Patrick Jahnichen, Moin Nabi | In order to tackle these challenges, we introduce Dynamic Generative Memory (DGM), a synaptic plasticity driven framework for continual learning. |
1149 | AIRD: Adversarial Learning Framework for Image Repurposing Detection | Ayush Jaiswal, Yue Wu, Wael AbdAlmageed, Iacopo Masi, Premkumar Natarajan | In this paper, we present a novel method for image repurposing detection that is based on the real-world adversarial interplay between a bad actor who repurposes images with counterfeit metadata and a watchdog who verifies the semantic consistency between images and their accompanying metadata, where both players have access to a reference dataset of verified content, which they can use to achieve their goals. |
1150 | A Kernelized Manifold Mapping to Diminish the Effect of Adversarial Perturbations | Saeid Asgari Taghanaki, Kumar Abhishek, Shekoofeh Azizi, Ghassan Hamarneh | To tackle this problem, we propose a non-linear radial basis convolutional feature mapping by learning a Mahalanobis-like distance function. |
1151 | Trust Region Based Adversarial Attack on Neural Networks | Zhewei Yao, Amir Gholami, Peng Xu, Kurt Keutzer, Michael W. Mahoney | To address this problem, we present a new family of trust region based adversarial attacks, with the goal of computing adversarial perturbations efficiently. |
1152 | PEPSI : Fast Image Inpainting With Parallel Decoding Network | Min-cheol Sagong, Yong-goo Shin, Seung-wook Kim, Seung Park, Sung-jea Ko | To solve this problem, in this paper, we present a novel network structure, called PEPSI: parallel extended-decoder path for semantic inpainting. |
1153 | Model-Blind Video Denoising via Frame-To-Frame Training | Thibaud Ehret, Axel Davy, Jean-Michel Morel, Gabriele Facciolo, Pablo Arias | In this paper we propose a fully blind video denoising method, with two versions: off-line and on-line. |
1154 | End-To-End Efficient Representation Learning via Cascading Combinatorial Optimization | Yeonwoo Jeong, Yoonsung Kim, Hyun Oh Song | We develop hierarchically quantized efficient embedding representations for similarity-based search and show that this representation not only provides state-of-the-art search accuracy but also yields several orders of magnitude speed-up during inference. |
1155 | Sim-Real Joint Reinforcement Transfer for 3D Indoor Navigation | Fengda Zhu, Linchao Zhu, Yi Yang | Specifically, our method employs an adversarial feature adaptation model for visual representation transfer and a policy mimic strategy for policy behavior imitation. |
1156 | ChamNet: Towards Efficient Network Design Through Platform-Aware Model Adaptation | Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, Niraj K. Jha | We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors. |
1157 | Regularizing Activation Distribution for Training Binarized Deep Networks | Ruizhou Ding, Ting-Wu Chin, Zeye Liu, Diana Marculescu | In this paper, we propose to use distribution loss to explicitly regularize the activation flow, and develop a framework to systematically formulate the loss. |
1158 | Robustness Verification of Classification Deep Neural Networks via Linear Programming | Wang Lin, Zhengfeng Yang, Xin Chen, Qingye Zhao, Xiangkun Li, Zhiming Liu, Jifeng He | In this paper, we develop a novel method for robustness verification of CDNNs with sigmoid activation functions. |
1159 | Additive Adversarial Learning for Unbiased Authentication | Jian Liang, Yuren Cao, Chenbin Zhang, Shiyu Chang, Kun Bai, Zenglin Xu | To address this issue, we propose a novel two-stage method that disentangles the class/identity from domain-differences, and we consider multiple types of domain-difference. |
1160 | Simultaneously Optimizing Weight and Quantizer of Ternary Neural Network Using Truncated Gaussian Approximation | Zhezhi He, Deliang Fan | In this work, we propose a novel ternarized neural network training method which simultaneously optimizes both weights and quantizer during training, in contrast to prior works. |
1161 | Adversarial Defense by Stratified Convolutional Sparse Coding | Bo Sun, Nian-Hsuan Tsai, Fangchen Liu, Ronald Yu, Hao Su | We propose an adversarial defense method that achieves state-of-the-art performance among attack-agnostic adversarial defense methods while also maintaining robustness to input resolution, scale of adversarial perturbation, and scale of dataset size. |
1162 | Exploring Object Relation in Mean Teacher for Cross-Domain Detection | Qi Cai, Yingwei Pan, Chong-Wah Ngo, Xinmei Tian, Lingyu Duan, Ting Yao | In this work, we advance this Mean Teacher paradigm to be applicable for cross-domain detection. |
1163 | Hierarchical Disentanglement of Discriminative Latent Features for Zero-Shot Learning | Bin Tong, Chao Wang, Martin Klinkigt, Yoshiyuki Kobayashi, Yuuichi Nonaka | In this paper, we discuss two questions about generalization that are seldom discussed. |
1164 | R2GAN: Cross-Modal Recipe Retrieval With Generative Adversarial Network | Bin Zhu, Chong-Wah Ngo, Jingjing Chen, Yanbin Hao | This paper studies a new version of GAN, named Recipe Retrieval Generative Adversarial Network (R2GAN), to explore the feasibility of generating images from procedure text for the retrieval problem. |
1165 | Rethinking Knowledge Graph Propagation for Zero-Shot Learning | Michael Kampffmeyer, Yinbo Chen, Xiaodan Liang, Hao Wang, Yujia Zhang, Eric P. Xing | In order to still enjoy the benefit brought by the graph structure while preventing dilution of knowledge from distant nodes, we propose a Dense Graph Propagation (DGP) module with carefully designed direct links among distant nodes. |
1166 | Learning to Learn Image Classifiers With Visual Analogy | Linjun Zhou, Peng Cui, Shiqiang Yang, Wenwu Zhu, Qi Tian | In this paper, we attempt to investigate a new human-like learning method by organically combining these two mechanisms. |
1167 | Where’s Wally Now? Deep Generative and Discriminative Embeddings for Novelty Detection | Philippe Burlina, Neil Joshi, I-Jeng Wang | We address these challenges via the following contributions: we propose a novel framework for measuring the performance of novelty detection methods, using a trade-space that shows performance (measured by ROC AUC) as a function of problem complexity. |
1168 | Weakly Supervised Image Classification Through Noise Regularization | Mengying Hu, Hu Han, Shiguang Shan, Xilin Chen | In this work, we propose an effective approach for weakly supervised image classification utilizing massive noisy labeled data with only a small set of clean labels (e.g., 5%). |
1169 | Data-Driven Neuron Allocation for Scale Aggregation Networks | Yi Li, Zhanghui Kuang, Yimin Chen, Wayne Zhang | In this paper, we propose to learn the neuron allocation for aggregating multi-scale information in different building blocks of a deep network. |
1170 | Graphical Contrastive Losses for Scene Graph Parsing | Ji Zhang, Kevin J. Shih, Ahmed Elgammal, Andrew Tao, Bryan Catanzaro | We propose a set of contrastive loss formulations that specifically target these types of errors within the scene graph parsing problem, collectively termed the Graphical Contrastive Losses. |
1171 | Deep Transfer Learning for Multiple Class Novelty Detection | Pramuditha Perera, Vishal M. Patel | We propose a transfer learning-based solution for the problem of multiple class novelty detection. |
1172 | QATM: Quality-Aware Template Matching for Deep Learning | Jiaxin Cheng, Yue Wu, Wael AbdAlmageed, Premkumar Natarajan | In this paper, we propose a novel quality-aware template matching method, which is not only used as a standalone template matching algorithm, but also a trainable layer that can be easily plugged in any deep neural network. |
1173 | Retrieval-Augmented Convolutional Neural Networks Against Adversarial Examples | Jake Zhao (Junbo), Kyunghyun Cho | We propose a retrieval-augmented convolutional network (RaCNN) and propose to train it with local mixup, a novel variant of the recently proposed mixup algorithm. |
1174 | Learning Cross-Modal Embeddings With Adversarial Networks for Cooking Recipes and Food Images | Hao Wang, Doyen Sahoo, Chenghao Liu, Ee-peng Lim, Steven C. H. Hoi | In this paper, we investigate an open research task of cross-modal retrieval between cooking recipes and food images, and propose a novel framework Adversarial Cross-Modal Embedding (ACME) to resolve the cross-modal retrieval task in food domains. |
1175 | FastDraw: Addressing the Long Tail of Lane Detection by Adapting a Sequential Prediction Network | Jonah Philion | In this paper, we use lane detection to study modeling and training techniques that yield better performance on real world test drives. |
1176 | Weakly Supervised Video Moment Retrieval From Text Queries | Niluthpol Chowdhury Mithun, Sujoy Paul, Amit K. Roy-Chowdhury | In order to cope with this issue, in this work, we introduce the problem of learning from weak labels for the task of text to video moment retrieval. |
1177 | Content-Aware Multi-Level Guidance for Interactive Instance Segmentation | Soumajit Majumder, Angela Yao | We propose a novel transformation of user clicks to generate content-aware guidance maps that leverage the hierarchical structural information present in an image. |
1178 | Greedy Structure Learning of Hierarchical Compositional Models | Adam Kortylewski, Aleksander Wieczorek, Mario Wieser, Clemens Blumer, Sonali Parbhoo, Andreas Morel-Forster, Volker Roth, Thomas Vetter | In this work, we consider the problem of learning a hierarchical generative model of an object from a set of images which show examples of the object in the presence of variable background clutter. |
1179 | Interactive Full Image Segmentation by Considering All Regions Jointly | Eirikur Agustsson, Jasper R. R. Uijlings, Vittorio Ferrari | We propose an interactive, scribble-based annotation framework which operates on the whole image to produce segmentations for all regions. |
1180 | Learning Active Contour Models for Medical Image Segmentation | Xu Chen, Bryan M. Williams, Srinivasa R. Vallabhaneni, Gabriela Czanner, Rachel Williams, Yalin Zheng | Our aim was to tackle this limitation by developing a new deep learning based model which takes into account the area inside and outside the region of interest, as well as the size of boundaries, during learning. |
1181 | Customizable Architecture Search for Semantic Segmentation | Yiheng Zhang, Zhaofan Qiu, Jingen Liu, Ting Yao, Dong Liu, Tao Mei | In this paper, we propose a Customizable Architecture Search (CAS) approach to automatically generate a network architecture for semantic image segmentation. |
1182 | Local Features and Visual Words Emerge in Activations | Oriane Simeoni, Yannis Avrithis, Ondrej Chum | We propose a novel method of deep spatial matching (DSM) for image retrieval. |
1183 | Hyperspectral Image Super-Resolution With Optimized RGB Guidance | Ying Fu, Tao Zhang, Yinqiang Zheng, Debing Zhang, Hua Huang | In this paper, we first present a simple and efficient convolutional neural network (CNN) based method for HSI super-resolution in an unsupervised way, without any prior training. We then append a CSR optimization layer onto the HSI super-resolution network, either to automatically select the best CSR in a given CSR dataset, or to design the optimal CSR under some physical restrictions. |
1184 | Adaptive Confidence Smoothing for Generalized Zero-Shot Learning | Yuval Atzmon, Gal Chechik | Here we describe a probabilistic approach that breaks the model into three modular components, and then combines them in a consistent way. |
1185 | PMS-Net: Robust Haze Removal Based on Patch Map for Single Images | Wei-Ting Chen, Jian-Jiun Ding, Sy-Yen Kuo | In this paper, we propose a novel haze removal algorithm based on a new feature called the patch map. |
1186 | Deep Spherical Quantization for Image Search | Sepehr Eghbali, Ladan Tahvildari | In this paper, we put forward Deep Spherical Quantization (DSQ), a novel method to make deep convolutional neural networks generate supervised and compact binary codes for efficient image search. |
1187 | Large-Scale Interactive Object Segmentation With Human Annotators | Rodrigo Benenson, Stefan Popov, Vittorio Ferrari | In this paper we make several contributions to interactive segmentation: (1) we systematically explore in simulation the design space of deep interactive segmentation models and report new insights and caveats; (2) we execute a large-scale annotation campaign with real human annotators, producing masks for 2.5M instances on the OpenImages dataset. We released this data publicly, forming the largest existing dataset for instance segmentation. |
1188 | A Poisson-Gaussian Denoising Dataset With Real Fluorescence Microscopy Images | Yide Zhang, Yinhao Zhu, Evan Nichols, Qingfei Wang, Siyuan Zhang, Cody Smith, Scott Howard | In this paper, we fill this gap by constructing a dataset – the Fluorescence Microscopy Denoising (FMD) dataset – that is dedicated to Poisson-Gaussian denoising. |
1189 | Task Agnostic Meta-Learning for Few-Shot Learning | Muhammad Abdullah Jamal, Guo-Jun Qi | Specifically, we present an entropy-based approach that meta-learns an unbiased initial model with the largest uncertainty over the output labels by preventing it from over-performing in classification tasks. |
1190 | Progressive Ensemble Networks for Zero-Shot Recognition | Meng Ye, Yuhong Guo | In this paper, we propose a novel progressive ensemble network model with multiple projected label embeddings to address zero-shot image recognition. |
1191 | Direct Object Recognition Without Line-Of-Sight Using Optical Coherence | Xin Lei, Liangyu He, Yixuan Tan, Ken Xingze Wang, Xinggang Wang, Yihan Du, Shanhui Fan, Zongfu Yu | We introduce a novel approach based on speckle pattern recognition with a deep neural network, which is simpler and more robust than other NLOS recognition methods. |
1192 | Atlas of Digital Pathology: A Generalized Hierarchical Histological Tissue Type-Annotated Database for Deep Learning | Mahdi S. Hosseini, Lyndon Chan, Gabriel Tse, Michael Tang, Jun Deng, Sajad Norouzi, Corwyn Rowsell, Konstantinos N. Plataniotis, Savvas Damaskinos | In this paper, we propose a new digital pathology database, the “Atlas of Digital Pathology” (or ADP), which comprises 17,668 patch images extracted from 100 slides annotated with up to 57 hierarchical HTTs. |
1193 | Perturbation Analysis of the 8-Point Algorithm: A Case Study for Wide FoV Cameras | Thiago L. T. da Silveira, Claudio R. Jung | This paper presents a perturbation analysis for the estimate of epipolar matrices using the 8-Point Algorithm (8-PA). |
1194 | Robustness of 3D Deep Learning in an Adversarial Setting | Matthew Wicker, Marta Kwiatkowska | In this work, we develop an algorithm for analysis of pointwise robustness of neural networks that operate on 3D data. |
1195 | SceneCode: Monocular Dense Semantic Reconstruction Using Learned Encoded Scene Representations | Shuaifeng Zhi, Michael Bloesch, Stefan Leutenegger, Andrew J. Davison | We introduce a new compact and optimisable semantic representation by training a variational auto-encoder that is conditioned on a colour image. |
1196 | StereoDRNet: Dilated Residual StereoNet | Rohan Chabra, Julian Straub, Christopher Sweeney, Richard Newcombe, Henry Fuchs | We propose a system that uses a convolutional neural network (CNN) to estimate depth from a stereo pair followed by volumetric fusion of the predicted depth maps to produce a 3D reconstruction of a scene. |
1197 | The Alignment of the Spheres: Globally-Optimal Spherical Mixture Alignment for Camera Pose Estimation | Dylan Campbell, Lars Petersson, Laurent Kneip, Hongdong Li, Stephen Gould | Hence, we cast the problem as a 2D-3D mixture model alignment task and propose the first globally-optimal solution to this formulation under the robust L2 distance between mixture distributions. |
1198 | Learning Joint Reconstruction of Hands and Manipulated Objects | Yana Hasson, Gul Varol, Dimitrios Tzionas, Igor Kalevatykh, Michael J. Black, Ivan Laptev, Cordelia Schmid | In this work, we regularize the joint reconstruction of hands and objects with manipulation constraints. To train and evaluate the model, we also propose a new large-scale synthetic dataset, ObMan, with hand-object manipulations. |
1199 | Deep Single Image Camera Calibration With Radial Distortion | Manuel Lopez, Roger Mari, Pau Gargallo, Yubin Kuang, Javier Gonzalez-Jimenez, Gloria Haro | In this work we propose a method to predict extrinsic (tilt and roll) and intrinsic (focal length and radial distortion) parameters from a single image. |
1200 | CAM-Convs: Camera-Aware Multi-Scale Convolutions for Single-View Depth | Jose M. Facil, Benjamin Ummenhofer, Huizhong Zhou, Luis Montesano, Thomas Brox, Javier Civera | In this work, we propose a new type of convolution that can take the camera parameters into account, thus allowing neural networks to learn calibration-aware patterns. |
1201 | Translate-to-Recognize Networks for RGB-D Scene Recognition | Dapeng Du, Limin Wang, Huiling Wang, Kai Zhao, Gangshan Wu | To this end, this paper presents a unified framework to integrate the tasks of cross-modal translation and modality-specific recognition, termed Translate-to-Recognize Network (TRecgNet). |
1202 | Re-Identification Supervised Texture Generation | Jian Wang, Yunshan Zhong, Yachun Li, Chi Zhang, Yichen Wei | In this paper, we propose an end-to-end learning strategy to generate textures of human bodies under the supervision of person re-identification. |
1203 | Action4D: Online Action Recognition in the Crowd and Clutter | Quanzeng You, Hao Jiang | As a first step, we propose a new method to track people in 4D, which can reliably detect and follow each person in real time. |
1204 | Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction | Jason Ku, Alex D. Pon, Steven L. Waslander | We present MonoPSR, a monocular 3D object detection method that leverages proposals and shape reconstruction. |
1205 | Attribute-Aware Face Aging With Wavelet-Based Generative Adversarial Networks | Yunfan Liu, Qi Li, Zhenan Sun | In this paper, we propose an attribute-aware face aging model with wavelet based Generative Adversarial Networks (GANs) to address the above issues. |
1206 | Noise-Tolerant Paradigm for Training Face Recognition CNNs | Wei Hu, Yangyu Huang, Fan Zhang, Ruirui Li | Thus, we propose a novel training paradigm that employs the idea of weighting samples based on the above probability. |
1207 | Low-Rank Laplacian-Uniform Mixed Model for Robust Face Recognition | Jiayu Dong, Huicheng Zheng, Lina Lian | In this paper, we aim at recognizing identities from faces with varying levels of noises of various forms such as occlusion, pixel corruption, or disguise, and take improving the fitting ability of the error model as the key to addressing this problem. |
1208 | Generalizing Eye Tracking With Bayesian Adversarial Learning | Kang Wang, Rui Zhao, Hui Su, Qiang Ji | To improve the generalization performance, we propose to incorporate adversarial learning and Bayesian inference into a unified framework. |
1209 | Local Relationship Learning With Person-Specific Shape Regularization for Facial Action Unit Detection | Xuesong Niu, Hu Han, Songfan Yang, Yan Huang, Shiguang Shan | To resolve these issues, in this work, we propose a novel AU detection method by utilizing local information and the relationship of individual local face regions. |
1210 | Point-To-Pose Voting Based Hand Pose Estimation Using Residual Permutation Equivariant Layer | Shile Li, Dongheui Lee | In this paper, we present a novel deep learning hand pose estimation method for an unordered point cloud. |
1211 | Improving Few-Shot User-Specific Gaze Adaptation via Gaze Redirection Synthesis | Yu Yu, Gang Liu, Jean-Marc Odobez | In doing so, our contributions are threefold: (i) we design our gaze redirection framework from synthetic data, allowing us to benefit from aligned training sample pairs to predict accurate inverse mapping fields; (ii) we propose a self-supervised approach for domain adaptation; (iii) we exploit the gaze redirection to improve the performance of person-specific gaze estimation. |
1212 | AdaptiveFace: Adaptive Margin and Sampling for Face Recognition | Hao Liu, Xiangyu Zhu, Zhen Lei, Stan Z. Li | In this paper, we argue that the margin should be adapted to different classes. |
1213 | Disentangled Representation Learning for 3D Face Shape | Zi-Hang Jiang, Qianyi Wu, Keyu Chen, Juyong Zhang | In this paper, we present a novel strategy to design disentangled 3D face shape representation. |
1214 | LBS Autoencoder: Self-Supervised Fitting of Articulated Meshes to Point Clouds | Chun-Liang Li, Tomas Simon, Jason Saragih, Barnabas Poczos, Yaser Sheikh | We present LBS-AE, a self-supervised autoencoding algorithm for fitting articulated mesh models to point clouds. |
1215 | PifPaf: Composite Fields for Human Pose Estimation | Sven Kreiss, Lorenzo Bertoni, Alexandre Alahi | We propose a new bottom-up method for multi-person 2D human pose estimation that is particularly well suited for urban mobility such as self-driving cars and delivery robots. |
1216 | TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection | Lin Song, Shiwei Zhang, Gang Yu, Hongbin Sun | In this paper, we define these ambiguous samples as “transitional states”, and propose a Transition-Aware Context Network (TACNet) to distinguish transitional states. |
1217 | Learning Regularity in Skeleton Trajectories for Anomaly Detection in Videos | Romero Morais, Vuong Le, Truyen Tran, Budhaditya Saha, Moussa Mansour, Svetha Venkatesh | We propose a new method to model the normal patterns of human movements in surveillance video for anomaly detection using dynamic skeleton features. |
1218 | Local Temporal Bilinear Pooling for Fine-Grained Action Parsing | Yan Zhang, Siyu Tang, Krikamol Muandet, Christian Jarvers, Heiko Neumann | In this paper we propose a novel bilinear pooling operation, which is used in intermediate layers of a temporal convolutional encoder-decoder net. |
1219 | Improving Action Localization by Progressive Cross-Stream Cooperation | Rui Su, Wanli Ouyang, Luping Zhou, Dong Xu | In this work, we propose a new Progressive Cross-stream Cooperation (PCSC) framework to iteratively improve action localization results and generate better bounding boxes for one stream (i.e., Flow/RGB) by leveraging both region proposals and features from the other stream (i.e., RGB/Flow). |
1220 | Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition | Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu | In this work, we propose a novel two-stream adaptive graph convolutional network (2s-AGCN) for skeleton-based action recognition. |
1221 | A Neural Network Based on SPD Manifold Learning for Skeleton-Based Hand Gesture Recognition | Xuan Son Nguyen, Luc Brun, Olivier Lezoray, Sebastien Bougleux | This paper proposes a new neural network based on SPD manifold learning for skeleton-based hand gesture recognition. |
1222 | Large-Scale Weakly-Supervised Pre-Training for Video Action Recognition | Deepti Ghadiyaram, Du Tran, Dhruv Mahajan | This paper presents an in-depth study of using large volumes of web videos for pre-training video models for the task of action recognition. |
1223 | Learning Spatio-Temporal Representation With Local and Global Diffusion | Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Xinmei Tian, Tao Mei | In this paper, we present a novel framework to boost the spatio-temporal representation learning by Local and Global Diffusion (LGD). |
1224 | Unsupervised Learning of Action Classes With Continuous Temporal Embedding | Anna Kukleva, Hilde Kuehne, Fadime Sener, Jurgen Gall | To address this issue, we propose an unsupervised approach for learning action classes from untrimmed video sequences. |
1225 | Double Nuclear Norm Based Low Rank Representation on Grassmann Manifolds for Clustering | Xinglin Piao, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin | In this paper, we propose a new low-rank model for the high-dimensional data clustering task on the Grassmann manifold, based on the double nuclear norm, which better approximates the rank minimization of a matrix. |
1226 | SR-LSTM: State Refinement for LSTM Towards Pedestrian Trajectory Prediction | Pu Zhang, Wanli Ouyang, Pengfei Zhang, Jianru Xue, Nanning Zheng | In order to address this issue, we propose a data-driven state refinement module for LSTM network (SR-LSTM), which activates the utilization of the current intention of neighbors, and jointly and iteratively refines the current states of all participants in the crowd through a message passing mechanism. |
1227 | Unsupervised Deep Epipolar Flow for Stationary or Dynamic Scenes | Yiran Zhong, Pan Ji, Jianyuan Wang, Yuchao Dai, Hongdong Li | In this paper, we propose Deep Epipolar Flow, an unsupervised optical flow method which incorporates global geometric constraints into network learning. |
1228 | An Efficient Schmidt-EKF for 3D Visual-Inertial SLAM | Patrick Geneva, James Maley, Guoquan Huang | In this paper, we propose a novel, high-precision, efficient visual-inertial (VI)-SLAM algorithm, termed Schmidt-EKF VI-SLAM (SEVIS), which optimally fuses IMU measurements and monocular images in a tightly-coupled manner to provide 3D motion tracking with bounded error. |
1229 | A Neural Temporal Model for Human Motion Prediction | Anand Gopalakrishnan, Ankur Mali, Dan Kifer, Lee Giles, Alexander G. Ororbia | We propose novel neural temporal models for predicting and synthesizing human motion, achieving state-of-the-art performance in modeling long-term motion trajectories while being competitive with prior work in short-term prediction and requiring significantly less computation. |
1230 | Multi-Agent Tensor Fusion for Contextual Trajectory Prediction | Tianyang Zhao, Yifei Xu, Mathew Monfort, Wongun Choi, Chris Baker, Yibiao Zhao, Yizhou Wang, Ying Nian Wu | Specifically, the model encodes multiple agents’ past trajectories and the scene context into a Multi-Agent Tensor, then applies convolutional fusion to capture multiagent interactions while retaining the spatial structure of agents and the scene context. |
1231 | Coordinate-Based Texture Inpainting for Pose-Guided Human Image Generation | Artur Grigorev, Artem Sevastopolsky, Alexander Vakhitov, Victor Lempitsky | We present a new deep learning approach to pose-guided resynthesis of human photographs. |
1232 | On Stabilizing Generative Adversarial Training With Noise | Simon Jenni, Paolo Favaro | We present a novel method and analysis to train generative adversarial networks (GAN) in a stable manner. |
1233 | Self-Supervised GANs via Auxiliary Rotation Loss | Ting Chen, Xiaohua Zhai, Marvin Ritter, Mario Lucic, Neil Houlsby | In this work we exploit two popular unsupervised learning techniques, adversarial training and self-supervision, and take a step towards bridging the gap between conditional and unconditional GANs. |
1234 | Texture Mixer: A Network for Controllable Synthesis and Interpolation of Texture | Ning Yu, Connelly Barnes, Eli Shechtman, Sohrab Amirghodsi, Michal Lukac | To solve it we propose a neural network trained simultaneously on a reconstruction task and a generation task, which can project texture examples onto a latent space where they can be linearly interpolated and projected back onto the image domain, thus ensuring both intuitive control and realistic results. |
1235 | Object-Driven Text-To-Image Synthesis via Adversarial Training | Wenbo Li, Pengchuan Zhang, Lei Zhang, Qiuyuan Huang, Xiaodong He, Siwei Lyu, Jianfeng Gao | In this paper, we propose Object-driven Attentive Generative Adversarial Networks (Obj-GANs) that allow attention-driven, multi-stage refinement for synthesizing complex images from text descriptions. |
1236 | Zoom-In-To-Check: Boosting Video Interpolation via Instance-Level Discrimination | Liangzhe Yuan, Yibo Chen, Hantian Liu, Tao Kong, Jianbo Shi | We propose a light-weight video frame interpolation algorithm. |
1237 | Disentangling Latent Space for VAE by Label Relevant/Irrelevant Dimensions | Zhilin Zheng, Li Sun | Different from CVAE, we present a method for disentangling the latent space into the label relevant and irrelevant dimensions, zs and zu, for a single input. |
1238 | Spectral Reconstruction From Dispersive Blur: A Novel Light Efficient Spectral Imager | Yuanyuan Zhao, Xuemei Hu, Hui Guo, Zhan Ma, Tao Yue, Xun Cao | In this work, we propose a novel multispectral imaging technique that can capture multispectral images with high light efficiency. |
1239 | Quasi-Unsupervised Color Constancy | Simone Bianco, Claudio Cusano | We present here a method for computational color constancy in which a deep convolutional neural network is trained to detect achromatic pixels in color images after they have been converted to grayscale. |
1240 | Deep Defocus Map Estimation Using Domain Adaptation | Junyong Lee, Sungkil Lee, Sunghyun Cho, Seungyong Lee | In this paper, we propose the first end-to-end convolutional neural network (CNN) architecture, Defocus Map Estimation Network (DMENet), for spatially varying defocus map estimation. |
1241 | Using Unknown Occluders to Recover Hidden Scenes | Adam B. Yedidia, Manel Baradad, Christos Thrampoulidis, William T. Freeman, Gregory W. Wornell | In this paper, we relax this often impractical assumption, extending the range of applications for passive occluder-based NLoS imaging systems. |
1242 | Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation | Anurag Ranjan, Varun Jampani, Lukas Balles, Kihwan Kim, Deqing Sun, Jonas Wulff, Michael J. Black | To that end, we introduce Competitive Collaboration, a framework that facilitates the coordinated training of multiple specialized neural networks to solve complex problems. |
1243 | Learning Parallax Attention for Stereo Image Super-Resolution | Longguang Wang, Yingqian Wang, Zhengfa Liang, Zaiping Lin, Jungang Yang, Wei An, Yulan Guo | In this paper, we propose a parallax-attention stereo superresolution network (PASSRnet) to integrate the information from a stereo image pair for SR. |
1244 | Knowing When to Stop: Evaluation and Verification of Conformity to Output-Size Specifications | Chenglong Wang, Rudy Bunel, Krishnamurthy Dvijotham, Po-Sen Huang, Edward Grefenstette, Pushmeet Kohli | In this paper, we study the vulnerability of these models to attacks aimed at changing the output-size that can have undesirable consequences including increased computation and inducing faults in downstream modules that expect outputs of a certain length. |
1245 | Spatial Attentive Single-Image Deraining With a High Quality Real Rain Dataset | Tianyu Wang, Xin Yang, Ke Xu, Shaozhe Chen, Qiang Zhang, Rynson W.H. Lau | In this paper, we address the single image rain removal problem in two ways. Using this method, we construct a large-scale dataset of 29.5K rain/rain-free image pairs that covers a wide range of natural rain scenes. |
1246 | Focus Is All You Need: Loss Functions for Event-Based Vision | Guillermo Gallego, Mathias Gehrig, Davide Scaramuzza | We present a collection and taxonomy of twenty-two objective functions to analyze event alignment in motion compensation approaches. |
1247 | Scalable Convolutional Neural Network for Image Compressed Sensing | Wuzhen Shi, Feng Jiang, Shaohui Liu, Debin Zhao | In this paper, we propose a scalable convolutional neural network (dubbed SCSNet) to achieve scalable sampling and scalable reconstruction with only one model. |
1248 | Event Cameras, Contrast Maximization and Reward Functions: An Analysis | Timo Stoffregen, Lindsay Kleeman | In this work we examine the choice of reward used in contrast maximization, propose a classification of different rewards and show how a reward can be constructed that is more robust to noise and aperture uncertainty. |
1249 | Convolutional Neural Networks Can Be Deceived by Visual Illusions | Alexander Gomez-Villa, Adrian Martin, Javier Vazquez-Corral, Marcelo Bertalmio | In particular, we show that CNNs trained for image denoising, image deblurring, and computational color constancy are able to replicate the human response to visual illusions, and that the extent of this replication varies with respect to variation in architecture and spatial pattern size. |
1250 | PDE Acceleration for Active Contours | Anthony Yezzi, Ganesh Sundaramoorthi, Minas Benyamin | We extend their formulation to the PDE framework, specifically for the infinite dimensional manifold of continuous curves, to introduce acceleration, and its added robustness, into the broad range of PDE based active contours. |
1251 | Dichromatic Model Based Temporal Color Constancy for AC Light Sources | Jun-Sang Yoo, Jong-Ok Kim | In this paper, we propose a novel approach to estimate the illuminant chromaticity of an AC light source using a high-speed camera. |
1252 | Semantic Attribute Matching Networks | Seungryong Kim, Dongbo Min, Somi Jeong, Sunok Kim, Sangryul Jeon, Kwanghoon Sohn | We present semantic attribute matching networks (SAM-Net) for jointly establishing correspondences and transferring attributes across semantically similar images, which intelligently weaves the advantages of the two tasks while overcoming their limitations. |
1253 | Skin-Based Identification From Multispectral Image Data Using CNNs | Takeshi Uemori, Atsushi Ito, Yusuke Moriuchi, Alexander Gatto, Jun Murayama | In this paper, we propose a new biometric identification system based solely on a skin patch from a multispectral image. |
1254 | Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks | Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka | We propose an alternative approach using a second order optimization method that shows similar generalization capability to first order methods, but converges faster and can handle larger mini-batches. |
1255 | Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments | Xueting Li, Sifei Liu, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz | In this paper, we aim to predict affordances of 3D indoor scenes, specifically what human poses are afforded by a given indoor environment, such as sitting on a chair or standing on the floor. |
1256 | PIEs: Pose Invariant Embeddings | Chih-Hui Ho, Pedro Morgado, Amir Persekian, Nuno Vasconcelos | A taxonomic classification of embeddings, according to their level of invariance, is introduced and used to clarify connections between existing embeddings, identify missing approaches, and propose invariant generalizations. |
1257 | Representation Similarity Analysis for Efficient Task Taxonomy & Transfer Learning | Kshitij Dwivedi, Gemma Roig | We address this problem by proposing an approach to assess the relationship between visual tasks and their task-specific models. |
1258 | Object Counting and Instance Segmentation With Image-Level Supervision | Hisham Cholakkal, Guolei Sun, Fahad Shahbaz Khan, Ling Shao | We propose an image-level supervised approach that provides both the global object count and the spatial distribution of object instances by constructing an object category density map. |
1259 | Variational Autoencoders Pursue PCA Directions (by Accident) | Michal Rolinek, Dominik Zietlow, Georg Martius | Alongside providing an intuitive understanding, we justify the statement with full theoretical analysis as well as with experiments. |
1260 | A Relation-Augmented Fully Convolutional Network for Semantic Segmentation in Aerial Scenes | Lichao Mou, Yuansheng Hua, Xiao Xiang Zhu | In this work, we introduce two simple yet effective network units, the spatial relation module and the channel relation module, to learn and reason about global relationships between any two spatial positions or feature maps, and then produce relation-augmented feature representations. |
1261 | Temporal Transformer Networks: Joint Learning of Invariant and Discriminative Time Warping | Suhas Lohit, Qiao Wang, Pavan Turaga | In this paper, we propose a hybrid model-based and data-driven approach to learn warping functions that not just reduce intra-class variability, but also increase inter-class separation. |
1262 | PCAN: 3D Attention Map Learning Using Contextual Information for Point Cloud Based Retrieval | Wenxiao Zhang, Chunxia Xiao | In this paper, we propose a Point Contextual Attention Network (PCAN), which can predict the significance of each local point feature based on point context. |
1263 | Depth Coefficients for Depth Completion | Saif Imran, Yunfei Long, Xiaoming Liu, Daniel Morris | We propose a new representation for depth called Depth Coefficients (DC) to address this problem. |
1264 | Diversify and Match: A Domain Adaptive Representation Learning Paradigm for Object Detection | Taekyung Kim, Minki Jeong, Seunghyeon Kim, Seokeon Choi, Changick Kim | We introduce a novel unsupervised domain adaptation approach for object detection. |
1265 | Good News, Everyone! Context Driven Entity-Aware Captioning for News Images | Ali Furkan Biten, Lluis Gomez, Marcal Rusinol, Dimosthenis Karatzas | In this work, we aim to take a step closer to producing captions that offer a plausible interpretation of the scene, by integrating such contextual information into the captioning pipeline. |
1266 | Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding | Hassan Akbari, Svebor Karaman, Surabhi Bhargava, Brian Chen, Carl Vondrick, Shih-Fu Chang | We address the problem of phrase grounding by learning a multi-level common semantic space shared by the textual and visual modalities. |
1267 | Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning | Nayyer Aafaq, Naveed Akhtar, Wei Liu, Syed Zulqarnain Gilani, Ajmal Mian | We argue that careful designing of visual features for this task is equally important, and present a visual feature encoding technique to generate semantically rich captions using Gated Recurrent Units (GRUs). |
1268 | Pointing Novel Objects in Image Captioning | Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, Tao Mei | In this paper, we propose to address the problem by augmenting standard deep captioning architectures with object learners. |
1269 | Informative Object Annotations: Tell Me Something I Don’t Know | Lior Bracha, Gal Chechik | Motivated by cognitive theories of categorization and communication, we present a new unsupervised approach to model this prior knowledge and quantify the informativeness of a description. |
1270 | Engaging Image Captioning via Personality | Kurt Shuster, Samuel Humeau, Hexiang Hu, Antoine Bordes, Jason Weston | We build models that combine existing work from (i) sentence representations [36] with Transformers trained on 1.7 billion dialogue examples; and (ii) image representations [32] with ResNets trained on 3.5 billion social media images. We collect and release a large dataset of 241,858 such captions conditioned over 215 possible traits. |
1271 | Vision-Based Navigation With Language-Based Assistance via Imitation Learning With Indirect Intervention | Khanh Nguyen, Debadeepta Dey, Chris Brockett, Bill Dolan | To model language-based assistance, we develop a general framework termed Imitation Learning with Indirect Intervention (I3L), and propose a solution that is effective on the VNLA task. |
1272 | TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments | Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, Yoav Artzi | We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent must first follow navigation instructions in a Street View environment to a goal position, and then guess a location in its observed environment described in natural language to find a hidden object. |
1273 | A Simple Baseline for Audio-Visual Scene-Aware Dialog | Idan Schwartz, Alexander G. Schwing, Tamir Hazan | Therefore, in this paper, we provide and carefully analyze a simple baseline for audio-visual scene-aware dialog which is trained end-to-end. |
1274 | End-To-End Learned Random Walker for Seeded Image Segmentation | Lorenzo Cerrone, Alexander Zeilmann, Fred A. Hamprecht | We present an end-to-end learned algorithm for seeded segmentation. |
1275 | Efficient Neural Network Compression | Hyeji Kim, Muhammad Umar Karim Khan, Chong-Min Kyung | In this paper we propose an efficient method for obtaining the rank configuration of the whole network. |
1276 | Cascaded Generative and Discriminative Learning for Microcalcification Detection in Breast Mammograms | Fandong Zhang, Ling Luo, Xinwei Sun, Zhen Zhou, Xiuli Li, Yizhou Yu, Yizhou Wang | In this paper, we propose a hybrid approach that takes advantage of both generative and discriminative models. |
1277 | C3AE: Exploring the Limits of Compact Model for Age Estimation | Chao Zhang, Shuaicheng Liu, Xun Xu, Ce Zhu | In this work, we investigate the limits of compact models for small-scale images and propose an extremely Compact yet efficient Cascade Context-based Age Estimation model (C3AE). |
1278 | Adaptive Weighting Multi-Field-Of-View CNN for Semantic Segmentation in Pathology | Hiroki Tokunaga, Yuki Teramoto, Akihiko Yoshizawa, Ryoma Bise | In this paper, we propose a novel semantic segmentation method, called Adaptive-Weighting-Multi-Field-of-View-CNN (AWMF-CNN), which can adaptively use image features from images with different magnifications to segment multiple cancer subtype regions in the input image. |
1279 | In Defense of Pre-Trained ImageNet Architectures for Real-Time Semantic Segmentation of Road-Driving Images | Marin Orsic, Ivan Kreso, Petra Bevandic, Sinisa Segvic | We propose an alternative approach which achieves a significantly better performance across a wide range of computing budgets. |
1280 | Context-Aware Visual Compatibility Prediction | Guillem Cucurull, Perouz Taslakian, David Vazquez | In this work we propose a method that predicts compatibility between two items based on their visual features, as well as their context. |
1281 | Sim-To-Real via Sim-To-Sim: Data-Efficient Robotic Grasping via Randomized-To-Canonical Adaptation Networks | Stephen James, Paul Wohlhart, Mrinal Kalakrishnan, Dmitry Kalashnikov, Alex Irpan, Julian Ibarz, Sergey Levine, Raia Hadsell, Konstantinos Bousmalis | In this paper, we present Randomized-to-Canonical Adaptation Networks (RCANs), a novel approach to crossing the visual reality gap that uses no real-world data. |
1282 | Multiview 2D/3D Rigid Registration via a Point-Of-Interest Network for Tracking and Triangulation | Haofu Liao, Wei-An Lin, Jiarui Zhang, Jingdan Zhang, Jiebo Luo, S. Kevin Zhou | We propose to tackle the problem of multiview 2D/3D rigid registration for intervention via a Point-Of-Interest Network for Tracking and Triangulation (POINT^2). |
1283 | Context-Aware Spatio-Recurrent Curvilinear Structure Segmentation | Feigege Wang, Yue Gu, Wenxi Liu, Yuanlong Yu, Shengfeng He, Jia Pan | In this paper, we propose a novel curvilinear structure segmentation approach using context-aware spatio-recurrent networks. |
1284 | An Alternative Deep Feature Approach to Line Level Keyword Spotting | George Retsinas, Georgios Louloudis, Nikolaos Stamatopoulos, Giorgos Sfikas, Basilis Gatos | In this work, we propose a time and storage-efficient, deep feature-based approach that enables both the image and textual search options. |
1285 | Dynamics Are Important for the Recognition of Equine Pain in Video | Sofia Broome, Karina Bech Gleerup, Pia Haubro Andersen, Hedvig Kjellstrom | In this study, we propose a deep recurrent two-stream architecture for the task of distinguishing pain from non-pain in videos of horses. |
1286 | LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving | Gregory P. Meyer, Ankit Laddha, Eric Kee, Carlos Vallespi-Gonzalez, Carl K. Wellington | In this paper, we present LaserNet, a computationally efficient method for 3D object detection from LiDAR data for autonomous driving. |
1287 | Machine Vision Guided 3D Medical Image Compression for Efficient Transmission and Accurate Segmentation in the Clouds | Zihao Liu, Xiaowei Xu, Tao Liu, Qi Liu, Yanzhi Wang, Yiyu Shi, Wujie Wen, Meiping Huang, Haiyun Yuan, Jian Zhuang | In this paper, we use deep learning-based medical image segmentation as a vehicle and demonstrate that, interestingly, machines and humans view compression quality differently. |
1288 | PointPillars: Fast Encoders for Object Detection From Point Clouds | Alex H. Lang, Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, Oscar Beijbom | In this paper, we consider the problem of encoding a point cloud into a format appropriate for a downstream detection pipeline. |
1289 | Motion Estimation of Non-Holonomic Ground Vehicles From a Single Feature Correspondence Measured Over N Views | Kun Huang, Yifu Wang, Laurent Kneip | We present the complete theory of this novel solver, and test it on both simulated and real data. |
1290 | From Coarse to Fine: Robust Hierarchical Localization at Large Scale | Paul-Edouard Sarlin, Cesar Cadena, Roland Siegwart, Marcin Dymczyk | In this paper we propose HF-Net, a hierarchical localization approach based on a monolithic CNN that simultaneously predicts local features and global descriptors for accurate 6-DoF localization. |
1291 | Large Scale High-Resolution Land Cover Mapping With Multi-Resolution Data | Caleb Robinson, Le Hou, Kolya Malkin, Rachel Soobitsky, Jacob Czawlytko, Bistra Dilkina, Nebojsa Jojic | In this paper we propose multi-resolution data fusion methods for deep learning-based high-resolution land cover mapping from aerial imagery. |
1292 | Leveraging Heterogeneous Auxiliary Tasks to Assist Crowd Counting | Muming Zhao, Jian Zhang, Chongyang Zhang, Wenjun Zhang | In this paper, we propose to address these issues by leveraging the heterogeneous attributes compounded in the density map. |
1293 | AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations rather than Data | Liheng Zhang, Guo-Jun Qi, Liqiang Wang, Jiebo Luo | In this paper, we present a novel paradigm of unsupervised representation learning by Auto-Encoding Transformation (AET) in contrast to the conventional Auto-Encoding Data (AED) approach. |
1294 | 2.5D Visual Sound | Ruohan Gao, Kristen Grauman | We propose to convert common monaural audio into binaural audio by leveraging video. |